1. Using the weight matrix calculated from the frequency table compiled by Hawley & McClure (1983), as shown in the slide, evaluate the likelihood that each of the following sequences is a -10 promoter of E. coli.

1.1 TAAATT

1.2 TAGAAT

1.3 TGTAAT

The scores of the three sequences, taken from the weight matrix calculated from the frequency table compiled by Hawley & McClure (1983), are:

| Sequence No | Sequence | Score from weight matrix (E. coli: 25.4% G or C) | Sum | 10^Sum |
|---|---|---|---|---|
| 1.1 | TAAATT | 0.51 + 0.58 + 0.02 + 0.38 - 0.16 + 0.59 | 1.92 | 83.18 |
| 1.2 | TAGAAT | 0.51 + 0.58 - 0.20 + 0.38 + 0.32 + 0.59 | 2.18 | 151.36 |
| 1.3 | TGTAAT | 0.51 - 1.45 + 0.25 + 0.38 + 0.32 + 0.59 | 0.60 | 3.98 |

TAGAAT has the highest score: it is 10^2.18 = 151.36 times more likely to be a -10 promoter of E. coli than a random sequence, compared with 83.18 times for TAAATT and 3.98 times for TGTAAT.
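
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: `PARTIAL_WEIGHTS`, `log_odds` and `odds_ratio` are hypothetical names, and the matrix holds only the (position, base) scores that appear in the table for these three sequences, not the full Hawley & McClure matrix.

```python
# Partial weight matrix (assumption: only the scores quoted in the table are
# listed; e.g. T at position 1 scores 0.51, as in rows 1.2 and 1.3).
PARTIAL_WEIGHTS = [
    {"T": 0.51},
    {"A": 0.58, "G": -1.45},
    {"A": 0.02, "G": -0.20, "T": 0.25},
    {"A": 0.38},
    {"A": 0.32, "T": -0.16},
    {"T": 0.59},
]


def log_odds(sequence):
    """Sum the per-position log10 scores for a 6-base candidate."""
    if len(sequence) != len(PARTIAL_WEIGHTS):
        raise ValueError("expected a 6-base sequence")
    # A base missing from the partial matrix raises KeyError on purpose.
    return sum(column[base] for column, base in zip(PARTIAL_WEIGHTS, sequence))


def odds_ratio(sequence):
    """Convert the summed log10 score into a ratio versus a random sequence."""
    return 10 ** log_odds(sequence)


for candidate in ("TAAATT", "TAGAAT", "TGTAAT"):
    print(candidate, round(log_odds(candidate), 2), round(odds_ratio(candidate), 2))
```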
2. Use the dot matrix method to identify the similarity between these two sequences.

2.1 ATGCTACGGGTAATC

2.2 GGTAATCATGCTACGG

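Columns: sequence 2.1 (bases 1 to 15). Rows: sequence 2.2 (bases 1 to 16). A dot marks identical bases.

|   | A | T | G | C | T | A | C | G | G | G | T | A | A | T | C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G |   |   | . |   |   |   |   | . | . | . |   |   |   |   |   |
| G |   |   | . |   |   |   |   | . | . | . |   |   |   |   |   |
| T |   | . |   |   | . |   |   |   |   |   | . |   |   | . |   |
| A | . |   |   |   |   | . |   |   |   |   |   | . | . |   |   |
| A | . |   |   |   |   | . |   |   |   |   |   | . | . |   |   |
| T |   | . |   |   | . |   |   |   |   |   | . |   |   | . |   |
| C |   |   |   | . |   |   | . |   |   |   |   |   |   |   | . |
| A | . |   |   |   |   | . |   |   |   |   |   | . | . |   |   |
| T |   | . |   |   | . |   |   |   |   |   | . |   |   | . |   |
| G |   |   | . |   |   |   |   | . | . | . |   |   |   |   |   |
| C |   |   |   | . |   |   | . |   |   |   |   |   |   |   | . |
| T |   | . |   |   | . |   |   |   |   |   | . |   |   | . |   |
| A | . |   |   |   |   | . |   |   |   |   |   | . | . |   |   |
| C |   |   |   | . |   |   | . |   |   |   |   |   |   |   | . |
| G |   |   | . |   |   |   |   | . | . | . |   |   |   |   |   |
| G |   |   | . |   |   |   |   | . | . | . |   |   |   |   |   |

Two diagonal lines, both parallel to the main diagonal, show local identity between the two sequences: the first 9 bases of the first sequence (ATGCTACGG) match bases 8 to 16 of the second, and the last 7 bases of the first sequence (GGTAATC) match bases 1 to 7 of the second.

There is also a sign of DNA rearrangement: a short line perpendicular to the diagonals at bases 11 to 14 of the first sequence (TAAT).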
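
As a cross-check on the plot, the diagonals can also be found programmatically. The sketch below is a minimal illustration, not a standard tool: `dot_matrix` and `diagonal_runs` are hypothetical helper names, and the `min_length` thresholds are arbitrary choices to suppress short chance matches. Running the same search against the reversed second sequence is one simple way to look for lines perpendicular to the diagonal.

```python
def dot_matrix(seq_x, seq_y):
    """One text row per base of seq_y; '.' where it equals the base of seq_x."""
    return ["".join("." if x == y else " " for x in seq_x) for y in seq_y]


def diagonal_runs(seq_x, seq_y, min_length=5):
    """Find maximal runs of consecutive matches along each forward diagonal.

    Returns (start_in_x, start_in_y, length) tuples with 1-based positions.
    """
    runs = []
    for offset in range(-(len(seq_y) - 1), len(seq_x)):
        i = max(offset, 0)    # 0-based index into seq_x
        j = max(-offset, 0)   # 0-based index into seq_y
        length = 0
        while i < len(seq_x) and j < len(seq_y):
            if seq_x[i] == seq_y[j]:
                length += 1
            else:
                if length >= min_length:
                    runs.append((i - length + 1, j - length + 1, length))
                length = 0
            i += 1
            j += 1
        if length >= min_length:  # run that reaches the edge of the matrix
            runs.append((i - length + 1, j - length + 1, length))
    return runs


seq1 = "ATGCTACGGGTAATC"
seq2 = "GGTAATCATGCTACGG"

print("  " + seq1)
for base, row in zip(seq2, dot_matrix(seq1, seq2)):
    print(base + " " + row)

# Lines parallel to the main diagonal.
print(diagonal_runs(seq1, seq2))
# Lines perpendicular to it: positions in seq2 are counted from its 3' end here.
print(diagonal_runs(seq1, seq2[::-1], min_length=4))
```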

3. Submit the following sequences to Genbank and explain the results.

3.1 agtcgaacgg aaaggtctct

3.2 agtcgaacgg aaaggtctct tcgagtggcg aacgggtgag taacacgtgg

When the sequences are submitted to GenBank, the results are:

3.1 agtcgaacgg aaaggtctct

Query= prasit

(20 letters)

Database: nt

591,775 sequences; 1,603,993,870 total letters

Distribution of 100 Blast Hits on the Query Sequence

Score E

Sequences producing significant alignments: (bits) Value N

emb|X58888.1|ML16SRRN M.leprae gene for 16S ribosomal RNA 40 0.008 1
gb|M29575.1|MSGRR16SQ M.kansasii 16S ribosomal RNA 40 0.008 1
emb|AJ007315.1|MCA7315 Mycobacterium canettii 16S rRNA ... 40 0.008 1
emb|Z83862.1|MTCY149 Mycobacterium tuberculosis H37Rv c... 40 0.008 1
gb|AF152559.1|AF152559 Mycobacterium sp. 'MCRO 33' 16S ... 40 0.008 1
emb|Z13990.1|MU16SRRN M.ulcerans 16S ribosomal RNA 40 0.008 1
emb|X88923.1|MHR16SRNA M.haemophilum 16S rRNA gene 40 0.008 1
emb|X52919.1|MGA16SRN Mycobacterium gastri 16S rRNA gene 40 0.008 1
gb|U06638.1|MHU06638 Mycobacterium haemophilum 16S ribo... 40 0.008 1

3.2 agtcgaacgg aaaggtctct tcgagtggcg aacgggtgag taacacgtgg

```
Query= prasit2
         (50 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
           527,570 sequences; 1,523,322,523 total letters

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

emb|X70960.1|MH16SRRNA  M.heidelbergense 16S rRNA                  60  4e-08
gb|AF152560.1|AF152560  Mycobacterium malmoense 16S ribosoma...    60  4e-08
gb|AF152559.1|AF152559  Mycobacterium sp. 'MCRO 33' 16S ribo...    60  4e-08
gb|AF115940.1|AF115940  Uncultured Corynebacterium sp. MTcor...    60  4e-08
emb|AJ131120.1|MTU131120  Mycobacterium tuberculosis 16S rRN...    60  4e-08
emb|AJ007315.1|MCA7315  Mycobacterium canettii 16S rRNA gene...    60  4e-08
emb|Z83862|MTCY149  Mycobacterium tuberculosis H37Rv complet...    60  4e-08
gb|AF059853|AF059853  Mycobacterium avium strain ATCC25291 1...    60  4e-08
gb|AF059851|AF059851  Mycobacterium fortuitum strain ATCC684...    60  4e-08
```

The short sequence result: `emb|AJ007315.1|MCA7315 Mycobacterium canettii 16S rRNA ... 40 0.008 1`

The long sequence result: `emb|AJ007315.1|MCA7315  Mycobacterium canettii 16S rRNA gene...    60  4e-08`

The results show the effect of query sequence length. Against the same database sequence (for example AJ007315.1, Mycobacterium canettii 16S rRNA), the 50-base query gives a higher bit score (60 versus 40) and a much lower E value (4e-08 versus 0.008), meaning that a match this good is far less likely to arise by chance, without homology, than the match to the 20-base query. A BLAST hit from a longer query therefore gives stronger statistical support for homology than a hit from a shorter one.
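
The link between bit score, sequence lengths and E value can be illustrated with the standard relation E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the total database length. The sketch below is a rough estimate under stated assumptions: `approx_e_value` is a hypothetical helper, and it uses the raw lengths and the rounded bit scores from the output above, whereas BLAST itself uses effective lengths and unrounded scores. The estimates therefore agree with the reported 0.008 and 4e-08 only in order of magnitude.

```python
def approx_e_value(bit_score, query_length, database_length):
    """Rough E value from a bit score: E = m * n * 2**(-S').

    Assumption: raw (not effective) lengths are used, so the result is only
    an order-of-magnitude estimate of what BLAST reports.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Values taken from the two searches above.
short_query = approx_e_value(40, 20, 1_603_993_870)
long_query = approx_e_value(60, 50, 1_523_322_523)

print(f"20-base query: E is roughly {short_query:.2g}")
print(f"50-base query: E is roughly {long_query:.2g}")
```

The 20 extra bits of score shrink the estimate by a factor of about a million, which far outweighs the modest increase in search space from the longer query.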