k Mismatch Average Common Substring

Introduction

kmacs is a new approach to alignment-free sequence comparison. While most alignment-free methods rely on exact word matches, kmacs uses a distance measure based on inexact substring matches. To define the distance between two DNA or protein sequences, kmacs estimates, for each position i of the first sequence, the length of the longest substring starting at i that matches a substring of the second sequence with up to k mismatches. It defines the average of these lengths as a measure of similarity between the sequences and turns this into a symmetric distance measure. This can be regarded as a generalization of the average common substring (ACS) approach (Ulitsky et al., 2006). kmacs does not compute exact k-mismatch substrings, since this would be computationally too costly, but approximates them. Details of this heuristic are described in the references cited below.
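To make the quantity being averaged concrete, the following Python sketch computes it by brute force. It is not the kmacs heuristic: it computes the exact k-mismatch lengths that kmacs deliberately only approximates, so it is practical only for very short sequences. The function names are hypothetical, and the final conversion of the average into a symmetric distance is left out because it is not specified in this text.

```python
def longest_k_mismatch_lengths(s1, s2, k):
    """For each position i of s1, return the length of the longest substring
    starting at i that matches a substring of s2 with at most k mismatches.

    Brute-force illustration only (hypothetical helper, not kmacs code).
    """
    lengths = []
    for i in range(len(s1)):
        best = 0
        for j in range(len(s2)):
            mismatches = 0
            length = 0
            # Extend the gap-free match until the (k+1)-th mismatch
            # or until either sequence ends.
            while i + length < len(s1) and j + length < len(s2):
                if s1[i + length] != s2[j + length]:
                    mismatches += 1
                    if mismatches > k:
                        break
                length += 1
            best = max(best, length)
        lengths.append(best)
    return lengths


def average_k_mismatch_length(s1, s2, k):
    """Average of the per-position lengths: the similarity value described above."""
    if not s1:
        raise ValueError("s1 must be non-empty")
    lengths = longest_k_mismatch_lengths(s1, s2, k)
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    a, b = "ACGT", "AGGT"
    # The average is direction-dependent in general, so both directions are shown.
    for k in (0, 1):
        print(k, average_k_mismatch_length(a, b, k), average_k_mismatch_length(b, a, k))
```

With k = 0 this reduces to the exact-match lengths used by the ACS approach. Allowing mismatches can only lengthen the matches, so the average can only increase as k grows.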

Availability

To make our new approach easily accessible to the scientific community, we set up a web interface at the Göttingen Bioinformatics Compute Server (GOBICS).
In addition, the source code of our approach can be freely downloaded here.

The web server returns a distance matrix for the input sequences.

kmacs - the k Mismatch Average Common Substring Approach
to alignment-free sequence comparison
