Sequence Assembly, Quality Report and Closest Match FAQ

This program assembles sequences and compares them to reference sequences. It does this using the following steps:
  1. Assembles the sequences
  2. Reports the quality of the individual sequences and the overall consensus
  3. Displays an overview of the assembly
  4. Aligns the consensus sequence with a file containing one or more reference sequences (must be in FASTA format)
  5. Reports the percentage identity to the closest match and displays the alignment between the user-submitted sequence and the reference sequence
Details of the various steps are shown below:
Overview
Basecalling
Contig Assembly
Quality Reports
Quality Scores
Assembly Information
Trimming
Contig Overview
Alignment with reference sequence
Further Reading
Overview

This Assembly Tool allows users to upload trace files that belong to the same locus for one isolate. The Tool will then attempt to carry out basecalling, assign quality scores and assemble contig(s) from the traces. The Tool was developed to address problems caused by low-quality trace files. It does so by using Phred to:

i) Recall bases with high accuracy, and

ii) Assign position-specific quality scores.


Basecalling

The Tool will try to carry out basecalling using the peak data in the traces (not the text file associated with the traces). This will work even if the dye chemistry of the traces is not recognised.

Basecalling is done using Phred version 0.020425.c, which can read '.abi' and '.scf' files. Because the bases are called by Phred from the trace data, they may differ from the base calls stored in the original trace files. If you get an error message saying "Dye Chemistry is unknown", please contact Anthony Underwood.


Contig Assembly

The Tool will assemble contigs from the trace files you upload. This may not be possible for the following reasons:

1) The peak data is of low quality, so there are too many disagreements between the forward and reverse sequences for an assembly to be carried out.

2) You have uploaded fewer than two trace files for one contig.

Contig assembly is done using Phrap version 0.990329.


Quality Reports

The Tool produces a sequence report for each sequence, which can be viewed by toggling the detail view button (+) in section 1) of the results. The quality scores for the consensus sequence and the trimmed consensus are shown in the section labelled 'Quality reports' and can be revealed by toggling the + button. The reports are colour-coded to represent variations in sequence quality, with a different colour for each range of quality scores. A complete table of colours for each score interval is displayed by the Tool.


Quality Scores

Quality scores are linked to the probability of a wrong base-call. The score is logarithmically related to the error probability: a score of 20 means that there is a 1 in 100 probability that the base-call is wrong, while a score of 30 means that the error probability is 1 in 1000. A high quality score is therefore desirable.
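
The relationship described above is the standard Phred definition, Q = -10 * log10(P), where P is the probability that the base-call is wrong. The short Python sketch below converts between the two; the function names are illustrative only and are not part of the Tool.

```python
import math


def phred_to_error_probability(quality):
    """Convert a Phred quality score to the probability of a wrong base-call."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be greater than 0 and at most 1")
    return -10 * math.log10(probability)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_probability(q))
```

Each increase of 10 in the quality score corresponds to a tenfold drop in the error probability.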


Assembly Information
If the sequences are assembled correctly then information about the resulting assembly will be reported.
Section 1a) reports the total length and average quality for the assembled sequence, and by using the + button in this section the consensus sequence can be viewed.
Section 1b) reports the length of sequence that is double-stranded.
Section 1c) describes what gaps are present in the assembly. High quality gaps are those where the gap is caused by the insertion of one or more low quality bases, so there is a high probability that the position should be regarded as a gap and removed. Low quality gaps are those where the gap is caused by a high quality base in one sequence for which the other sequence(s) have no corresponding base. Such a gap cannot be removed and remains in the consensus sequence.
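
As an illustration of the double-stranded length reported in Section 1b), the Python sketch below counts the consensus positions covered by at least one forward and at least one reverse read. This is an assumption about what "double-stranded" means here. The read representation (start, end, strand) and the function name are hypothetical and are not taken from the Tool or from Phrap.

```python
def double_stranded_length(reads, consensus_length):
    """Count consensus positions covered by both a forward and a reverse read.

    reads is a list of (start, end, strand) tuples in consensus coordinates,
    where start is inclusive, end is exclusive and strand is "+" or "-".
    """
    forward = [False] * consensus_length
    reverse = [False] * consensus_length
    for start, end, strand in reads:
        covered = forward if strand == "+" else reverse
        # Clip each read to the consensus so out-of-range ends are ignored.
        for position in range(max(start, 0), min(end, consensus_length)):
            covered[position] = True
    return sum(1 for f, r in zip(forward, reverse) if f and r)


if __name__ == "__main__":
    example_reads = [(0, 60, "+"), (40, 100, "-")]
    print(double_stranded_length(example_reads, 100))
```

In the example, the forward read covers positions 0 to 59 and the reverse read covers positions 40 to 99, so only the 20 positions where they overlap are counted.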

Trimming
Section 2 reports the length and quality of the consensus sequence after low quality sequence has been removed. Again, the trimmed sequence can be revealed by toggling the + button in this section.
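
The exact trimming rule used by the Tool is not described here. As a minimal sketch of the general idea, assuming a simple rule that removes bases from each end until a base at or above a chosen quality threshold is reached, trimming could look like the following Python. The threshold of 20 and the function name are assumptions made for illustration.

```python
def trim_low_quality_ends(sequence, qualities, min_quality=20):
    """Remove leading and trailing bases whose quality is below min_quality.

    Returns the trimmed sequence and its matching list of quality scores.
    Low quality bases in the interior of the sequence are left in place.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    start = 0
    end = len(sequence)
    while start < end and qualities[start] < min_quality:
        start += 1
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], qualities[start:end]


if __name__ == "__main__":
    seq = "ACGTACGT"
    quals = [5, 10, 30, 40, 35, 8, 30, 4]
    print(trim_low_quality_ends(seq, quals))
```

Real trimming schemes often use a sliding window or an error-probability cutoff rather than a per-base threshold, so the lengths reported by the Tool may differ from what this sketch would give.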

Contig Overview
This section displays a graphical overview of the assembled sequences. Sequences in the forward direction are shown in green and those in the reverse direction are shown in red. The regions trimmed off due to poor quality are shaded in grey. A more detailed view of the assembly can be opened by clicking on the 'Click to view assembly' link.

Alignment with reference sequence
If you included a reference sequence file, this section will display the reference sequence that most closely matches your consensus sequence and the percentage identity to it. The alignment between this closest match and your sequence will be displayed beneath. Bases highlighted in blue are those that differ from the reference sequence. Those coloured in red are bases with a low quality score.
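
How the Tool calculates percentage identity is not specified here. One common definition, shown in the Python sketch below, is the number of matching columns divided by the number of aligned columns, with a gap opposite a base counted as a mismatch. The function names and the gap character '-' are assumptions made for illustration.

```python
def percent_identity(aligned_query, aligned_reference, gap="-"):
    """Percentage of aligned columns in which the two sequences agree."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if q == gap and r == gap:
            continue  # column is a gap in both rows, so it is ignored
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def closest_match(aligned_pairs):
    """Return (name, identity) for the reference with the highest identity.

    aligned_pairs maps a reference name to a tuple of
    (aligned_query, aligned_reference).
    """
    scored = {
        name: percent_identity(query, reference)
        for name, (query, reference) in aligned_pairs.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


if __name__ == "__main__":
    pairs = {
        "ref_A": ("ACGTAC", "ACGTTC"),
        "ref_B": ("ACGTAC", "ACGTAC"),
    }
    print(closest_match(pairs))
```

The sketch expects sequences that have already been aligned. It does not perform the alignment itself.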

Further Reading

Base-calling of automated sequencer traces using phred. II. Error probabilities. (Genome Res. 1998 Mar;8(3):186-94.)

Base-calling of automated sequencer traces using phred. I. Accuracy assessment. (Genome Res. 1998 Mar;8(3):175-85.)

Estimation of errors in "raw" DNA sequences: a validation study. (Genome Res. 1998 Mar;8(3):251-9.)

Evaluation of window cohabitation of DNA sequencing errors and lowest PHRED quality values. (Genet Mol Res. 2004 Dec 30;3(4):483-92.)

DNA Sequences Base Calling by PHRED: Error Pattern Analysis. (RTInfo 3: 107-10, 2003.)
