kmacs is a new approach to alignment-free sequence comparison. While most alignment-free methods rely on exact word matches, kmacs
uses a distance measure based on inexact substring matches. To define the distance between two DNA or protein sequences,
kmacs estimates, for each position i of the first sequence, the longest substring starting at i that matches a substring of the second sequence with up to k mismatches. It defines the average length of these substrings as a measure of similarity between the sequences and turns this into a symmetric distance measure. This can be regarded as a generalization of the average common substring (ACS) approach (Ulitsky et al., 2006). kmacs does not compute exact k-mismatch substrings, since this would be computationally too costly, but approximates such substrings. Details of this heuristic are described in the references cited below.
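The similarity measure described above can be illustrated with a small brute-force sketch in Python. It is not the kmacs implementation: it computes the exact k-mismatch substring lengths that kmacs only approximates, so it is practical for short sequences only. The function names are hypothetical, and the conversion from average length to a symmetric distance follows an ACS-style normalization, which is an assumption here rather than something stated on this page.

```python
import math


def longest_k_mismatch(s1, i, s2, k):
    """Length of the longest substring of s1 starting at position i that
    matches some substring of s2 with at most k mismatches (brute force)."""
    best = 0
    for j in range(len(s2)):
        mismatches = 0
        length = 0
        # Extend without gaps until the (k+1)-th mismatch or a sequence end.
        while i + length < len(s1) and j + length < len(s2):
            if s1[i + length] != s2[j + length]:
                mismatches += 1
                if mismatches > k:
                    break
            length += 1
        best = max(best, length)
    return best


def average_k_mismatch_length(s1, s2, k):
    """Average, over all positions i of s1, of the longest k-mismatch match."""
    total = sum(longest_k_mismatch(s1, i, s2, k) for i in range(len(s1)))
    return total / len(s1)


def acs_style_distance(s1, s2, k):
    """Symmetric distance derived from the average match lengths.

    ASSUMPTION: the normalization below is modeled on the ACS distance;
    it is an illustration, not a formula taken from this page.
    """
    def one_way(a, b):
        avg = average_k_mismatch_length(a, b, k)
        if avg == 0:
            raise ValueError("sequences share no matching substring")
        return math.log(len(b)) / avg - 2 * math.log(len(a)) / len(a)

    # Averaging both directions makes the measure symmetric.
    return (one_way(s1, s2) + one_way(s2, s1)) / 2
```

For example, `longest_k_mismatch("ACGT", 0, "ACTT", 1)` extends across the single mismatch at the third position, whereas with `k = 0` the match stops after the first two characters.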
To make our new approach easily accessible to the scientific community, we set up a web interface at the Göttingen Bioinformatics Compute Server (GOBICS).
In addition, the source code of our approach can be freely downloaded here.
The web server returns a distance matrix for the input sequences.