Introduction
kmacs is a new approach to alignment-free sequence comparison. While most alignment-free methods rely on exact word matches, kmacs uses a distance measure based on inexact substring matches. To define the distance between two DNA or protein sequences, kmacs estimates, for each position i of the first sequence, the length of the longest substring starting at i that matches a substring of the second sequence with up to k mismatches. It takes the average of these values as a measure of similarity between the sequences and turns this into a symmetric distance measure. This can be regarded as a generalization of the average common substring (ACS) approach (Ulitsky et al., 2006).
kmacs does not compute exact k-mismatch substrings, since this would be computationally too costly, but approximates such substrings. Details of this heuristic are described in the references cited below.
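To make the definition concrete, the following is a minimal brute-force sketch of the exact quantity that kmacs approximates: the average length of the longest k-mismatch substring match. It is not the kmacs heuristic itself, and the function names `longest_k_mismatch` and `average_k_mismatch_length` are hypothetical names chosen for illustration. The conversion of this similarity into a symmetric distance is left out here, since it is described in the cited references rather than on this page.

```python
def longest_k_mismatch(s1, s2, i, k):
    """Length of the longest substring of s1 starting at position i that
    matches some substring of s2 with at most k mismatches.

    Brute force over all start positions j in s2; illustrative only,
    not the heuristic used by kmacs."""
    best = 0
    for j in range(len(s2)):
        mismatches = 0
        length = 0
        # Extend the match until either sequence ends or the
        # mismatch budget k is exceeded.
        while i + length < len(s1) and j + length < len(s2):
            if s1[i + length] != s2[j + length]:
                mismatches += 1
                if mismatches > k:
                    break
            length += 1
        best = max(best, length)
    return best


def average_k_mismatch_length(s1, s2, k):
    """Average over all positions i of s1 of the longest k-mismatch match
    length against s2 (a similarity value, not yet a distance)."""
    if not s1:
        raise ValueError("s1 must be non-empty")
    total = sum(longest_k_mismatch(s1, s2, i, k) for i in range(len(s1)))
    return total / len(s1)
```

With k = 0 this reduces to the exact-match setting on which the ACS approach is based. The run time of this sketch grows with the product of the two sequence lengths times the match length, which illustrates why an exact computation is too costly for long sequences.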
Availability
- To make our new approach easily accessible to the scientific community, we set up a web interface at the Göttingen Bioinformatics Compute Server (GOBICS).
- In addition, the source code of our approach can be freely downloaded here.
The web server returns a distance matrix for the input sequences.