University of Göttingen | Faculty of Biology | Inst. of Microbiology and Genetics | Dep. of Bioinformatics

DIALIGN-TX [algorithm]

IRMBASE 2 and DIRMBASE 1:

IRMBASE 2 and DIRMBASE 1 each consist of four reference sets, ref1, ref2, ref3 and ref4, with one, two, three and four randomly implanted ROSE motifs, respectively. The major difference from the old IRMBASE 1 is that the occurrence of a motif in a sequence has been omitted at random in 1/s of the cases, where s is the number of sequences in the sequence family. The results on IRMBASE 2 and DIRMBASE 1 therefore show how the alignment programs perform when it is unknown whether every motif occurs in every sequence. This provides a more realistic basis for assessing alignment quality on locally related sequences than the old IRMBASE 1, where every motif occurred in every sequence.

Each reference set in IRMBASE 2 and DIRMBASE 1 consists of 48 sequence families, 24 of which contain ROSE motifs of length 30 while the remaining 24 families contain motifs of length 60. In each reference set, 16 sequence families consist of 4 sequences each, another 16 families consist of 8 sequences and the remaining 16 families consist of 16 sequences. In ref1, random sequences of length 400 are added to the conserved ROSE motif, while in ref2 and ref3 random sequences of length 500 are added. In ref4, random sequences of length 600 are added.

For information on ROSE, consult:

Results on IRMBASE 2 (Protein)

The following table contains the average sum-of-pairs scores (SPS, before the slash) and column scores (CS, after the slash) on the implanted ROSE motifs (aligned positions in the random parts of the sequences are ignored), together with the average CPU time of the evaluated alignment programs. CPU times were measured on an Intel Pentium 4 3.5 GHz workstation with 2 GB RAM running Red Hat Linux. In the Total column only, the symbols - and -- denote statistically significant superiority of DIALIGN-TX over the respective method, and the symbols + and ++ denote statistically significant inferiority of DIALIGN-TX. The symbols + and - denote significance according to the Wilcoxon Matched-Pairs Signed-Rank Test with p ≤ 0.05, and the symbols ++ and -- with p ≤ 0.001.

Method              ref1           ref2           ref3           ref4           Total                CPU time
DIALIGN-TX          89.42 / 64.17  94.90 / 77.36  93.75 / 70.30  93.64 / 72.23  92.93    / 71.02     4.47 secs
DIALIGN-T 0.2.2     64.67 / 67.04  94.19 / 75.81  93.93 / 70.40  93.12 / 70.44  92.73    / 70.93     2.73 secs
DIALIGN 2.2         90.43 / 68.52  93.40 / 73.32  91.78 / 65.34  92.98 / 69.50  92.15 -- / 69.17 --  4.98 secs
CLUSTAL W2          07.13 / 00.00  10.63 / 00.00  19.87 / 00.11  26.17 / 02.86  15.95 -- / 00.74 --  1.86 secs
T-COFFEE 5.56       72.67 / 34.84  77.80 / 40.87  83.03 / 43.62  83.48 / 49.56  79.24 -- / 42.22 --  26.41 secs
POA V2              87.56 / 50.99  49.57 / 16.95  41.90 / 11.79  37.56 / 10.18  54.15 -- / 22.47 --  1.81 secs
MAFFT 6.240 L-INSi  82.78 / 37.81  84.29 / 39.54  84.15 / 32.79  82.42 / 38.75  84.41 -- / 32.22 --  8.47 secs
MAFFT 6.240 E-INSi  90.53 / 45.70  94.37 / 52.37  93.11 / 43.11  94.79 / 54.82  93.20 +  / 49.00 --  15.35 secs
MUSCLE 3.7          32.67 / 04.65  34.82 / 06.87  54.19 / 14.80  57.84 / 19.65  44.88 -- / 11.49 --  6.34 secs
PROBCONS 1.12       78.78 / 36.77  86.82 / 43.47  87.29 / 41.89  87.69 / 43.56  85.15 -- / 41.42 --  28.27 secs

According to the Wilcoxon Matched-Pairs Signed-Rank Test, DIALIGN-TX is significantly superior to all other methods on IRMBASE 2, except for MAFFT E-INSi on the sum-of-pairs scores. However, DIALIGN-TX strongly outperforms MAFFT E-INSi on the column scores and is 3.4 times faster.
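
To illustrate what the sum-of-pairs and column scores reported above measure, the following is a minimal Python sketch, not the evaluation code used for these benchmarks. It assumes the usual definitions: SPS is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and CS is the fraction of reference columns reproduced exactly. The function names, the `core_columns` parameter (used here to restrict scoring to the motif columns) and the toy alignments are hypothetical. Both alignments must contain the same sequences in the same order, with `-` as the gap symbol.

```python
from itertools import combinations

GAP = "-"

def residue_columns(alignment):
    """Per column, the residue index of each sequence (None where it has a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for i, ch in enumerate(col):
            if ch == GAP:
                entry.append(None)
            else:
                entry.append(counters[i])
                counters[i] += 1
        columns.append(tuple(entry))
    return columns

def aligned_pairs(columns):
    """All pairs of (sequence, residue) positions that share a column."""
    pairs = set()
    for col in columns:
        present = [(i, r) for i, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs

def sps_and_cs(reference, test, core_columns=None):
    """Return (SPS, CS) as fractions; multiply by 100 for percentages."""
    ref_cols = residue_columns(reference)
    if core_columns is not None:
        # Score only the selected reference columns, e.g. the implanted motif.
        ref_cols = [ref_cols[c] for c in core_columns]
    test_cols = residue_columns(test)

    ref_pairs = aligned_pairs(ref_cols)
    sps = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)

    # Map every residue to the test column in which it was placed.
    where = {(i, r): c
             for c, col in enumerate(test_cols)
             for i, r in enumerate(col) if r is not None}
    # Only reference columns with at least two residues can be scored.
    scored = [[(i, r) for i, r in enumerate(col) if r is not None]
              for col in ref_cols]
    scored = [present for present in scored if len(present) >= 2]
    correct = sum(len({where[p] for p in present}) == 1 for present in scored)
    return sps, correct / len(scored)

# Toy example (hypothetical data): one residue of the third sequence is misplaced.
reference = ["MKV-L", "MKVAL", "M-VAL"]
test      = ["MKV-L", "MKVAL", "MV-AL"]
sps, cs = sps_and_cs(reference, test)
print(f"{100 * sps:.2f} / {100 * cs:.2f}")  # prints "81.82 / 80.00"
```

Restricting the score to selected reference columns mirrors the idea of ignoring the random flanking regions: calling `sps_and_cs(reference, test, core_columns=[0, 4])` scores only the first and last reference columns of the toy example.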

Results on DIRMBASE 1 (DNA)

The following table contains the benchmark results on DIRMBASE 1, restricted to the conserved ROSE motifs.

Method              ref1           ref2           ref3           ref4           Total                CPU time
DIALIGN-TX          94.38 / 74.39  92.85 / 69.03  95.44 / 71.57  95.70 / 75.11  94.59    / 75.22     9.87 secs
DIALIGN-T 0.2.2     64.00 / 29.60  61.22 / 28.63  64.96 / 35.51  65.24 / 35.85  63.85 -- / 32.40 --  2.31 secs
DIALIGN 2.2         92.61 / 69.95  91.10 / 68.19  94.62 / 71.25  94.13 / 72.48  93.12 -- / 70.47 --  4.82 secs
CLUSTAL W2          06.79 / 00.00  08.27 / 00.00  18.51 / 02.19  29.09 / 04.99  15.66 -- / 01.80 --  1.36 secs
T-COFFEE 5.56       14.71 / 00.00  18.88 / 00.18  32.08 / 04.01  43.39 / 08.44  27.62 -- / 03.16 --  365.88 secs
POA V2              32.03 / 05.63  27.40 / 07.32  28.78 / 04.12  32.18 / 06.81  30.10 -- / 05.97 --  1.20 secs
MAFFT 6.240 L-INSi  52.40 / 21.45  48.81 / 11.93  49.77 / 16.02  57.47 / 22.30  52.36 -- / 17.93 --  5.33 secs
MAFFT 6.240 E-INSi  92.42 / 40.28  84.15 / 41.99  87.91 / 45.77  89.36 / 51.01  88.46 -- / 44.76 --  8.39 secs
MUSCLE 3.7          48.17 / 14.18  54.40 / 16.18  56.57 / 19.62  60.24 / 30.43  56.84 -- / 20.10 --  4.87 secs
PROBCONSRNA 1.10    13.00 / 00.73  12.94 / 00.05  20.28 / 01.34  32.56 / 04.31  19.69 -- / 01.61 --  18.54 secs

According to the Wilcoxon Matched-Pairs Signed-Rank Test, DIALIGN-TX is significantly superior to all other methods on DIRMBASE 1.
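
The significance symbols used in the tables can be illustrated with a small pure-Python sketch of a two-sided Wilcoxon matched-pairs signed-rank test. This is a simplified illustration under stated assumptions, not the procedure actually used for the results on this page: it uses the normal approximation without continuity or tie correction, drops zero differences, and assumes that the paired observations are per-family scores of two programs. The function names and the example scores are hypothetical.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test, normal approximation.

    Returns (W_plus, W_minus, p_value) for the paired samples x and y.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value

def significance_symbol(dialign_tx_scores, other_scores):
    """Symbol convention of the tables: '-' / '--' if DIALIGN-TX is superior,
    '+' / '++' if it is inferior, '' if the difference is not significant."""
    w_plus, w_minus, p = wilcoxon_signed_rank(dialign_tx_scores, other_scores)
    if p > 0.05:
        return ""
    sign = "-" if w_plus > w_minus else "+"
    return sign * 2 if p <= 0.001 else sign

# Hypothetical per-family scores: the first program is better on all 20 families.
tx = [50 + k for k in range(1, 21)]
other = [50] * 20
print(significance_symbol(tx, other))  # prints "--"
```

For small numbers of families an exact test would be preferable to the normal approximation used in this sketch.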

Results on BALIBASE 3 (Protein)

The following table contains the benchmark results on BALIBASE 3, restricted to the so-called core blocks:

Method              RV11           RV12           RV20           RV30           RV40           RV50           Total                CPU time
DIALIGN-TX          51.52 / 26.53  89.18 / 75.23  87.87 / 30.49  76.18 / 38.53  83.65 / 44.28  82.28 / 46.56  78.83    / 44.34     33.37 secs
DIALIGN-T 0.2.2     49.30 / 25.32  88.76 / 72.55  86.29 / 29.20  74.66 / 34.90  81.95 / 45.23  80.14 / 44.25  77.31 -- / 42.76 -   27.79 secs
DIALIGN 2.2         50.73 / 26.50  86.66 / 69.55  86.91 / 29.22  74.05 / 31.23  83.31 / 44.12  80.69 / 42.50  77.52 -- / 41.49 --  45.41 secs
CLUSTAL W2          50.06 / 22.74  86.43 / 71.59  85.16 / 21.98  72.50 / 27.23  78.93 / 39.55  74.24 / 30.75  75.36 -- / 37.35 --  8.72 secs
T-COFFEE 5.56       58.22 / 31.34  92.27 / 81.18  90.93 / 37.81  79.09 / 36.57  86.03 / 48.20  86.09 / 50.63  82.41 ++ / 48.54 ++  315.78 secs
POA V2              37.96 / 15.26  83.19 / 63.84  85.28 / 23.34  71.93 / 28.23  78.22 / 33.67  71.49 / 27.00  72.17 -- / 33.37 --  8.07 secs
MAFFT 6.240 L-INSi  67.11 / 44.61  93.63 / 83.75  92.67 / 45.27  85.55 / 56.93  91.97 / 59.69  90.00 / 56.19  87.07 ++ / 58.57 ++  19.51 secs
MAFFT 6.240 E-INSi  66.00 / 43.71  93.61 / 83.43  92.64 / 44.63  86.12 / 58.80  91.46 / 58.33  89.91 / 58.94  86.83 ++ / 58.37 ++  28.26 secs
MUSCLE 3.7          57.90 / 33.03  91.67 / 80.46  89.17 / 35.22  80.60 / 38.77  87.26 / 45.96  83.39 / 44.94  82.19 ++ / 47.58 ++  10.49 secs
PROBCONS 1.12       66.99 / 41.68  94.12 / 85.52  91.68 / 40.49  84.61 / 54.37  90.24 / 52.90  89.28 / 56.50  86.40 ++ / 55.66 ++  168.65 secs

According to the Wilcoxon Matched-Pairs Signed-Rank Test, DIALIGN-TX is significantly superior to DIALIGN-T, DIALIGN 2.2, POA and CLUSTAL W2 on BALIBASE 3. Note that the previous versions of the DIALIGN approach do not outperform CLUSTAL W2 on BALIBASE 3. DIALIGN-TX is still outperformed by methods that focus strongly on global alignment, such as MAFFT, PROBCONS, MUSCLE and T-COFFEE.

Results on BRAliBase II (DNA)

The following table contains the benchmark results on BRAliBase II:

Method              G2In           rRNA           SRP            tRNA           U5             Total                CPU time
DIALIGN-TX          72.08 / 60.85  91.69 / 84.33  82.92 / 70.95  78.53 / 68.05  77.80 / 62.71  80.42    / 69.03     0.15 secs
DIALIGN-T 0.2.2     54.68 / 36.51  69.13 / 50.00  60.81 / 42.34  64.44 / 52.01  67.87 / 50.34  63.53 -- / 46.43 -   0.08 secs
DIALIGN 2.2         71.72 / 90.90  89.89 / 81.08  81.47 / 68.53  78.57 / 67.59  76.16 / 60.11  79.37 -- / 67.29 --  0.09 secs
CLUSTAL W2          72.68 / 61.24  93.25 / 87.72  87.40 / 76.61  86.96 / 76.20  79.56 / 65.11  83.80 ++ / 72.85 ++  0.07 secs
T-COFFEE 5.56       73.79 / 60.24  90.94 / 82.56  83.90 / 71.63  81.65 / 69.23  79.13 / 62.93  81.73 +  / 69.01     1.95 secs
POA V2              67.22 / 55.21  88.92 / 80.38  85.47 / 73.77  76.91 / 66.03  77.28 / 61.63  79.02 -- / 67.12 --  0.04 secs
MAFFT 6.240 L-INSi  78.93 / 65.23  93.85 / 87.49  87.46 / 76.75  91.79 / 84.59  82.80 / 68.46  88.84 ++ / 76.25 ++  0.26 secs
MAFFT 6.240 E-INSi  77.39 / 63.84  93.80 / 87.34  87.24 / 76.59  90.60 / 83.29  80.46 / 65.71  85.71 ++ / 75.04 ++  0.27 secs
MUSCLE 3.7          76.42 / 63.20  94.04 / 87.97  87.06 / 76.57  87.27 / 78.01  79.71 / 64.34  84.69 ++ / 73.64 ++  0.05 secs
PROBCONSRNA 1.10    80.08 / 68.70  94.48 / 88.60  88.07 / 77.55  92.58 / 85.46  84.76 / 71.73  87.90 ++ / 78.19 ++  0.24 secs

According to the Wilcoxon Matched-Pairs Signed-Rank Test, DIALIGN-TX is significantly superior to DIALIGN-T, DIALIGN 2.2 and POA on BRAliBase II. With respect to column scores there is no significant difference between DIALIGN-TX and T-COFFEE, while on the sum-of-pairs scores T-COFFEE still outperforms DIALIGN-TX. All other methods outperform DIALIGN-TX on the strongly global BRAliBase II, on which PROBCONSRNA and MAFFT have been trained.

Further Information and Citation

More detailed descriptions of the results can be found here:
