Pre-computed genome comparisons
Blast comparisons
The pre-computed comparisons available within WebACT currently encompass all
publicly available annotated prokaryotic genomes stored in the Genome Reviews
database at the EBI. The comparisons were carried out with the BLASTN
program of NCBI blast.
To compare two genomes, a blast database is created from one of them
(genome B) as a single sequence.
Although the bl2seq program supplied with NCBI blast can carry out
pairwise blast comparisons, it cannot be used for comparisons of
this scale. Furthermore, the blast algorithm itself is only considered
suitable for sequences up to a maximum length of around one megabase
(Korf I. et al. 2003), considerably shorter than the majority of bacterial
genomes.
The genome sequence to be used as the query (genome A) is therefore divided
into 100 kb segments, with a 1 kb overlap between adjacent segments. The
segments are searched sequentially against the blast database of
genome B, and the results are obtained in blast's tab-delimited output
format, since the actual alignments are not required for visualisation of
comparisons using ACT.
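The segmentation step can be sketched as follows. This is a minimal illustration, not the WebACT implementation: the function name and the choice to return each segment with its start offset are assumptions, and only the 100 kb segment length and 1 kb overlap come from the description above.

```python
def segment_genome(sequence, segment_len=100_000, overlap=1_000):
    """Yield (start_offset, segment) pairs that cover the whole sequence.

    Adjacent segments share `overlap` bases, so a match spanning a
    segment boundary is still seen in full by at least one segment
    (provided it is no longer than the overlap).
    """
    if overlap >= segment_len:
        raise ValueError("overlap must be smaller than the segment length")
    if not sequence:
        return
    step = segment_len - overlap
    start = 0
    while True:
        yield start, sequence[start:start + segment_len]
        if start + segment_len >= len(sequence):
            break  # this segment reached the end of the genome
        start += step


# Hypothetical 250 kb genome: three segments, the last one shorter.
genome = "A" * 250_000
segments = list(segment_genome(genome))
```

Keeping the start offset with each segment makes it possible to convert hit positions reported relative to a segment back into genome A coordinates by adding the offset.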
Comparisons were carried out using the following blast parameters:
- program : blastn
- nucleotide match score (-r) : 1
- nucleotide mismatch penalty (-q) : -1
- gap opening penalty (-G) : 1
- gap extension penalty (-E) : 2
- wordsize (-W) : 9
- low complexity filtering (-F) : "m D" (soft DUST masking)
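As an illustration, the parameters above could be assembled into a command line for the legacy NCBI `blastall` executable. This is a sketch under assumptions: the file names are hypothetical, and `-m 8` (tabular output in legacy blast) is assumed to correspond to the tab-delimited format mentioned earlier. The command is only built here, not executed.

```python
import shlex


def build_blastall_command(query_fasta, database, output_path):
    """Return the argument list for one segment-vs-genome search."""
    return [
        "blastall",
        "-p", "blastn",      # program
        "-i", query_fasta,   # query: one segment of genome A
        "-d", database,      # blast database built from genome B
        "-o", output_path,   # where to write the hits
        "-r", "1",           # nucleotide match score
        "-q", "-1",          # nucleotide mismatch penalty
        "-G", "1",           # gap opening penalty
        "-E", "2",           # gap extension penalty
        "-W", "9",           # word size
        "-F", "m D",         # soft DUST masking
        "-m", "8",           # tabular output (assumed format option)
    ]


# Hypothetical file names, for illustration only.
command = build_blastall_command("genomeA_seg001.fasta", "genomeB", "seg001.tab")
print(shlex.join(command))
```

Passing the arguments as a list rather than a single string means the `"m D"` filter value, which contains a space, would not need manual shell quoting if the list were handed to `subprocess.run`.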
No e-value cut-off is specified, so that this can be selected at the time
of viewing the comparison.
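Selecting a cut-off at viewing time amounts to filtering the stored hits by their e-value. The sketch below assumes the standard 12-column blast tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score); the type and function names are hypothetical and the sample hits are made up.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_hit(line):
    """Parse one tab-delimited hit line into a Hit record."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
               int(f[6]), int(f[7]), int(f[8]), int(f[9]),
               float(f[10]), float(f[11]))


def filter_by_evalue(hits, max_evalue):
    """Keep only hits whose e-value does not exceed the chosen cut-off."""
    return [h for h in hits if h.evalue <= max_evalue]


# Made-up example lines, for illustration only.
lines = [
    "A\tB\t98.50\t200\t3\t0\t100\t299\t5000\t5199\t1e-50\t350",
    "A\tB\t85.00\t40\t6\t0\t900\t939\t7000\t7039\t0.5\t30.2",
]
hits = [parse_hit(line) for line in lines]
strong_hits = filter_by_evalue(hits, 1e-10)
```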
To avoid redundant computation, each pair of genomes has been compared
only once, i.e. genome A vs. genome B.
Should a comparison of genome B vs. genome A be requested, the data from the
genome A vs. genome B comparison are used, but are presented such that they
represent the comparison in the requested direction.
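One way to present a stored comparison in the opposite direction is to swap the query and subject columns of each hit. The sketch below assumes the 12-column blast tabular layout and is not taken from WebACT itself. The function name is hypothetical, and the step that keeps the new query coordinates in ascending order for reverse-strand hits is an assumption about how such hits might be presented.

```python
def reverse_direction(line):
    """Rewrite an A-vs-B tabular hit line as the equivalent B-vs-A hit."""
    f = line.rstrip("\n").split("\t")
    # Swap query and subject identifiers (columns 1 and 2).
    f[0], f[1] = f[1], f[0]
    # Swap query start/end (columns 7, 8) with subject start/end (9, 10).
    f[6], f[7], f[8], f[9] = f[8], f[9], f[6], f[7]
    # For a reverse-strand hit the new query range is now descending.
    # Flip both ranges so the query ascends and the strand information
    # is carried by the subject instead (assumed convention).
    if int(f[6]) > int(f[7]):
        f[6], f[7] = f[7], f[6]
        f[8], f[9] = f[9], f[8]
    return "\t".join(f)


# Made-up reverse-strand hit, for illustration only.
forward = "A\tB\t98.50\t200\t3\t0\t100\t299\t5000\t4801\t1e-50\t350"
reverse = reverse_direction(forward)
```

Scores, identities and e-values are left unchanged, since they describe the alignment itself rather than its direction.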