Pre-computed genome comparisons

Blast comparisons

The pre-computed comparisons available within WebACT currently cover all publicly available annotated prokaryotic genomes stored in the Genome Reviews database at the EBI. The comparisons were carried out with the BLASTN program from NCBI blast.
To compare two genomes, one of them (genome B) is formatted as a blast database containing a single sequence.

Although the bl2seq program supplied with NCBI blast can carry out pairwise blast comparisons, it cannot be used for comparisons of this scale. Furthermore, the blast algorithm itself is only considered suitable for sequences up to around one megabase in length (Korf I. et al. 2003), considerably shorter than the majority of bacterial genomes.

The genome to be used as the query (genome A) is therefore divided into 100 kb segments, with a 1 kb overlap between consecutive segments. The segments are searched in turn against the blast database of genome B, and the results are written in blast's tab-delimited output format, since the alignments themselves are not required to visualise comparisons in ACT.
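The segmentation step can be sketched as follows. This is a minimal illustration, not the WebACT code: the function name segment_genome is hypothetical, and it assumes that "a 1 kb overlap" means each segment starts 99 kb after the previous one, with a shorter final segment.

```python
def segment_genome(sequence, segment_len=100_000, overlap=1_000):
    """Yield (offset, segment) pairs covering the whole sequence.

    offset is the 0-based start of the segment within the genome.
    Consecutive segments share `overlap` bases (assumed interpretation).
    """
    if not sequence:
        return
    step = segment_len - overlap
    offset = 0
    while True:
        yield offset, sequence[offset:offset + segment_len]
        # Stop once this segment has reached the end of the genome.
        if offset + segment_len >= len(sequence):
            break
        offset += step


if __name__ == "__main__":
    genome_a = "A" * 250_000  # placeholder sequence, not real data
    for offset, segment in segment_genome(genome_a):
        print(offset, len(segment))
```

Keeping the offset alongside each segment makes it possible to relate positions within a segment back to positions in the full genome A.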

Comparisons were carried out using the following blast parameters:

  • program : blastn
  • nucleotide match score (-r) : 1
  • nucleotide mismatch penalty (-q) : -1
  • gap opening penalty (-G) : 1
  • gap extension penalty (-E) : 2
  • wordsize (-W) : 9
  • low complexity filtering (-F) : "m D" (soft DUST masking)
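As an illustration, the parameters above could be assembled into a command line like the one below. The flag letters -r, -q, -G, -E, -W and -F are those listed above; the executable name blastall, the -p, -d, -i and -o options, and -m 8 for tab-delimited output are assumptions based on the legacy NCBI blast command-line tools. The function name and file names are hypothetical.

```python
def blastn_command(query_fasta, database, output_path):
    """Build an argument list for one segment-vs-genome search.

    Assumes the legacy NCBI 'blastall' executable; no e-value option
    is passed, matching the description in the text.
    """
    return [
        "blastall",
        "-p", "blastn",      # program
        "-d", database,      # blast database built from genome B
        "-i", query_fasta,   # one 100 kb segment of genome A
        "-o", output_path,
        "-r", "1",           # nucleotide match score
        "-q", "-1",          # nucleotide mismatch penalty
        "-G", "1",           # gap opening penalty
        "-E", "2",           # gap extension penalty
        "-W", "9",           # word size
        "-F", "m D",         # soft DUST masking
        "-m", "8",           # tab-delimited output (assumed flag)
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(blastn_command("segment_0.fa", "genomeB", "segment_0.tab")))
```

Passing the list to subprocess.run(cmd, check=True) would execute the search, provided the executable and database exist on the system.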

No e-value cut-off is specified, so that a cut-off can be chosen when the comparison is viewed.
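Applying a cut-off at viewing time then amounts to filtering the stored hits. The sketch below assumes the standard 12-column blast tabular layout, in which the e-value is the eleventh column; the function name filter_hits is hypothetical and the two sample rows are made up for illustration.

```python
import csv
import io

EVALUE_COLUMN = 10  # 0-based index of the e-value in 12-column tabular output


def filter_hits(tabular_text, max_evalue):
    """Return the rows whose e-value is at most max_evalue."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        if float(row[EVALUE_COLUMN]) <= max_evalue:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Invented example rows, not real comparison data.
    sample = (
        "segA\tgenomeB\t98.50\t2000\t30\t0\t1\t2000\t5001\t7000\t0.0\t3500\n"
        "segA\tgenomeB\t85.00\t40\t6\t0\t300\t339\t90040\t90001\t2.5\t30.2\n"
    )
    for row in filter_hits(sample, max_evalue=1e-5):
        print("\t".join(row))
```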

To avoid redundant computation, each pair of genomes has been compared only once, in one direction (genome A vs. genome B).
Should a comparison of genome B vs. genome A be requested, the data from the genome A vs. genome B comparison is used, but is presented so that it represents the comparison in the requested direction.
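One way to present a stored comparison in the opposite direction is to exchange the query and subject coordinates of each hit. The sketch below is an assumption about how this could be done, not a description of the WebACT implementation; the function name reverse_hit is hypothetical. It also assumes that a reverse-strand match is indicated by the subject start being greater than the subject end, and keeps that convention after the swap.

```python
def reverse_hit(hit):
    """Swap the roles of query and subject for one hit.

    hit is a tuple (qstart, qend, sstart, send). The result keeps the
    new query coordinates in ascending order, so a reverse-strand match
    is still signalled by subject start > subject end.
    """
    qstart, qend, sstart, send = hit
    new_query = (sstart, send)
    new_subject = (qstart, qend)
    if new_query[0] > new_query[1]:
        # Flip both ends together to preserve the relative orientation.
        new_query = new_query[::-1]
        new_subject = new_subject[::-1]
    return (*new_query, *new_subject)


if __name__ == "__main__":
    print(reverse_hit((100, 200, 500, 600)))  # forward-strand match
    print(reverse_hit((100, 200, 600, 500)))  # reverse-strand match
```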