Consider three enzymes with recognition sequences as indicated
(a caret symbol (^) or asterisk (*) is often inserted to mark the place where the
enzyme breaks the phosphodiester backbone)
It is important that you recognize the differences between the
three types of ends generated by restriction enzymes, and the three examples above
In the example of digestion with the enzyme BamHI, it's obvious
that the newly created ends of the DNA do not line up evenly with each other. On
each fragment, there is a four-nucleotide sequence 5'-GATC that hangs off the end
and doesn't base-pair (because the other fragment has broken away and moved off).
Since the end that hangs over past the other has a free 5'
end, we say that BamHI digestion creates a "5' overhanging end,"
which we sometimes call a "5' overhang."
Overhanging ends are also called "cohesive ends" or "sticky ends,"
meaning that they can hydrogen bond to other compatible complementary strands (compatible in the sense of Watson-Crick base pairing).
By our usual convention of writing DNA, a 5' overhanging end has a characteristic
Some restriction enzymes leave a 3' overhanging end.
An example would be the enzyme Sac I:
Sac I searches for the sequence GAGCTC on each strand (once again,
GAGCTC reads the same off of both strands because the sequence is palindromic). The
enzyme breaks the phosphodiester bonds between the fifth and sixth nucleotides in
the recognition sequence.
5'-GAGCTC-3'   Sac I   5'-GAGCT-3'   +   5'-    C-3'
3'-CTCGAG-5'   ---->   3'-C    -5'       3'-TCGAG-5'
Some restriction enzymes leave a blunt end.
What do we call a DNA molecule that has ends that line up evenly
with each other (i.e. neither end is overhanging)? We say the ends are "blunt" (meaning "not sharp")
or "flush" (meaning "level or even").
For example, the enzyme Sma I cuts in the middle of its six-nucleotide recognition sequence:
5'-CCCGGG-3'   Sma I   5'-CCC-3'   +   5'-GGG-3'
3'-GGGCCC-5'   ---->   3'-GGG-5'       3'-CCC-5'
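The three end types can be told apart from where the caret falls in a palindromic recognition sequence. Here is a minimal Python sketch of that idea, not a standard tool: the function name `classify_end` is made up for illustration, and it assumes the site is palindromic, so the cut on the bottom strand mirrors the cut on the top strand. The BamHI site is written as G^GATCC.

```python
def classify_end(site_with_caret):
    """Classify the end left by a palindromic site written with a caret, e.g. 'G^GATCC'.

    Assumption: the site is palindromic, so the bottom-strand cut mirrors
    the top-strand cut. Returns (end_type, overhang read 5'->3').
    """
    cut = site_with_caret.index("^")      # number of bases left of the top-strand cut
    site = site_with_caret.replace("^", "")
    mirror = len(site) - cut              # mirrored cut position for the bottom strand
    if cut < mirror:
        return "5' overhang", site[cut:mirror]
    if cut > mirror:
        return "3' overhang", site[mirror:cut]
    return "blunt", ""


for name, site in [("BamHI", "G^GATCC"), ("Sac I", "GAGCT^C"), ("Sma I", "CCC^GGG")]:
    print(name, classify_end(site))
```

A cut left of the center gives a 5' overhang, a cut right of the center gives a 3' overhang, and a cut exactly at the center gives a blunt end.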
Not all restriction enzymes recognize sequences that are palindromic.
For example, the enzyme Bsr I cuts as follows (where "N"
can represent any nucleotide):
5'-ACTGGNN-3' Bsr I 5'-ACTGGN N-3'
3'-TGACCNN-5' ----> 3'-TGAC CNN-5'
The reason this is said not to be a palindromic sequence
is that the two strands read differently in their antiparallel directions.
The top strand is 5'-ACTGGNN while the bottom strand is 5'-NNCCAGT. Compare that
with the recognition sequence for an enzyme like Sma I, which is palindromic and
reads 5'-CCCGGG on both the top and bottom strands, and you can see the difference.
Restriction enzymes that do not recognize palindromic sequences might therefore be
described in some references using multiple sequences, since the top and bottom strand
read differently. That is useful if you are only scanning one strand of DNA for sites.
For example, the enzyme Bsr I, which was just described, is represented by the two sequences
ACTGGN^ and C^CAGT.
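Whether a site is palindromic can be checked by comparing it with its own reverse complement. The following is a small Python sketch under simple assumptions: the helper names `reverse_complement` and `is_palindromic` are illustrative, and only A, C, G, T and the placeholder N are handled.

```python
# Watson-Crick pairs; N (any nucleotide) pairs with N.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq):
    """True when both strands read the same 5'->3'."""
    return seq == reverse_complement(seq)


print(reverse_complement("ACTGGNN"))  # bottom strand of the Bsr I site
print(is_palindromic("CCCGGG"))       # Sma I
print(is_palindromic("ACTGG"))        # Bsr I
```

For the Bsr I site the reverse complement differs from the original, which is why both ACTGG and CCAGT have to be searched for when scanning a single strand.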
Some restriction enzymes cut outside
of their recognition sequence
An example of this is the enzyme Bsr I, just described.
5'-A C T G G N^N-3'
3'-T G A C^C N N-5'
On one strand, the enzyme breaks the phosphodiester backbone between the two unspecified
"N" bases just 3' to the ACTGG. Here are some other examples of enzymes
that have an extended "reach":
What type of DNA end does Mnl I leave?
...sometimes written CTGAAG (16/14)
Here's an interesting trick. Ksp632I can be used in combination
with a single-stranded DNA nuclease, such as mung bean nuclease or S1 nuclease, to
generate a 3-nucleotide deletion. Thus, you could use this trick to repetitively
remove three nucleotides in a protein coding sequence (not changing the reading frame,
but possibly introducing a mutation as well as a deletion). Don't you agree that
site-directed mutagenesis is easier with PCR?
Some enzymes have split recognition sequences
Consider the enzyme Asp 700, with the restriction enzyme recognition sequence:
The six nucleotides of the recognition sequence are split and palindromic,
with the four internal nucleotides not specified. What type of end does Asp 700 leave?
Can you figure out how the enzyme Dra III cuts, from the recognition sequence:
Would the overhanging end left by Dra III be the same at every site?
Some enzymes accept degenerate sequences
We've been using "N" nucleotides in our recognition sequences,
but the N is obviously non-specific. There are enzymes that have partial degeneracies
in their recognition sequence.
An example of this is Aha II, recognizing the sequence GR^CGYC, where R = G or A,
and Y = T or C. For Aha II then, the following are all acceptable recognition sequences:
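The concrete sequences covered by a degenerate site can be listed mechanically. This is a hedged Python sketch: the table holds only the codes used in this text (R, Y and N) rather than the full set of ambiguity codes, and the function name `expand_site` is made up for illustration.

```python
from itertools import product

# Only the codes used in this text: R = G or A, Y = T or C, N = any nucleotide.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def expand_site(site):
    """List every concrete sequence matched by a degenerate recognition site."""
    choices = [CODES[base] for base in site]
    return ["".join(combo) for combo in product(*choices)]


# Aha II, written without the caret.
print(expand_site("GRCGYC"))
```

With two two-fold degenerate positions, Aha II accepts 2 x 2 = 4 different sequences.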
Additional examples follow:
Not all restriction enzymes recognize six-nucleotide pair sequences.
You may have already noticed this is true, but here are two examples:
5'-TTAATTAA-3' Pac I 5'-TTAAT TAA-3'
3'-AATTAATT-5' ----> 3'-AAT TAATT-5'
5'-NAATTN-3' Tsp509 I 5'-N AATTN-3'
3'-NTTAAN-5' ----> 3'-NTTAA N-5'
Note that there are 8 nucleotides that specify the location of
a PacI restriction site, and 4 that specify the location of a Tsp509I site.
I can hear you thinking:
"Wait a minute! The Tsp509I has 6 nucleotides
in its sequence, not 4!"
That's true, but since the two "N" nucleotides could
be anything, they don't really add to the specificity of the recognition sequence.
They are included only as "placeholders" to help indicate where the phosphodiester
bonds are broken. The following are all valid Tsp509I sites (the central four
nucleotides of the top strand are always AATT):
AAATTG TAATTA CAATTC GAATTA AAATTC etc.
TTTAAC ATTAAT GTTAAG CTTAAT TTTAAG
You may be amazed to notice that the recognition sequence for PacI
(TTAATTAA) also has a Tsp509I site in it!
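Finding one site inside another is just a substring search. Here is a minimal Python sketch, assuming a single strand written 5' to 3' and an exact (non-degenerate) site; the function name `find_sites` is illustrative.

```python
def find_sites(sequence, site):
    """Return the 0-based start of every occurrence of site, overlaps included."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)  # step by one so overlaps are kept
    return positions


# Look for the Tsp509I core (AATT) inside the PacI site.
print(find_sites("TTAATTAA", "AATT"))
```

The single hit starts at the third nucleotide of the PacI site.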
A calculation to ponder:
The AATT sequence is something that occurs by chance pretty frequently.
If a DNA sequence is evenly made up of G, A, T, and C nucleotides (i.e. 25% of each),
we would expect to find the sequence "AATT" by chance about every 256 nucleotides
on the average. Why is that? Because if we point to a nucleotide in a sequence at
random, the chances would be one in four that it would be "A" (the first
nucleotide in the recognition sequence). The chance that the next nucleotide is also
"A" is also 1 in 4; the chance that the nucleotide after that is "T"
is 1 in 4; and the chance that the next one is also "T" is also 1 in 4.
Therefore, the chance that we have randomly pointed to a sequence that reads "AATT" is:
(1/4) x (1/4) x (1/4) x (1/4) = 1/256
Any recognition sequence that was four nucleotides in length could
be found every 256 nucleotides (on the average) in this simple scenario. In actuality,
sequences are usually not evenly made up of G, A, T, and C nucleotides, which skews
the statistics a bit. In addition, certain short sequences may be more or less common
in the DNA, which will also affect the frequency with which a recognition sequence
is found. The dinucleotide CG is very uncommon in mammalian DNA, which makes it less
likely that you will find a recognition sequence for the enzyme Hpa II (C^CGG).
Longer recognition sequences lead to a lower probability of having
a site at any point in a DNA strand. In our simplistic scenario where every nucleotide
is evenly distributed in DNA, you would expect to find a PacI site about every 65,000 nucleotides
(on the average). That's because there's a one in four chance that each of the eight
nucleotides (taken individually) in a random sequence is just right for PacI. The
chance of being lucky eight times in a row is one over four to the eighth power (1/65,536),
or about one over sixty-five thousand.
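The same arithmetic can be written as a short Python sketch. It assumes each position is independent of its neighbors, so it does not capture effects like the rarity of the CG dinucleotide in mammalian DNA. The function names and the AT-rich base frequencies are made-up illustrations, not measured values.

```python
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def site_probability(site, freqs=UNIFORM):
    """Chance that a randomly chosen position starts the given site."""
    p = 1.0
    for base in site:
        if base != "N":          # an N matches anything, so it adds no specificity
            p *= freqs[base]
    return p


def expected_spacing(site, freqs=UNIFORM):
    """Average number of nucleotides between sites."""
    return 1.0 / site_probability(site, freqs)


print(expected_spacing("AATT"))       # 4-cutter
print(expected_spacing("NAATTN"))     # same specificity as AATT
print(expected_spacing("TTAATTAA"))   # 8-cutter (PacI)

# Hypothetical AT-rich composition, to show how the statistics get skewed.
at_rich = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}
print(round(expected_spacing("AATT", at_rich)))
```

In the AT-rich example the AATT site turns up roughly twice as often as in the evenly mixed case.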
Enzymes with recognition sequences from 4 to 8 nucleotides in length each have uses
in genetic engineering.
6-cutters (i.e. enzymes that have recognition sequences specified
by six nucleotides) are good for day-to-day cloning work: You are using 6-cutters
in the experiments you are performing in the lab, because they cut frequently enough
that there are one or two sites in the plasmid, but infrequently enough that they
do not cut into the essential elements such as the origin of replication or ampicillin
resistance gene. An example of a 6-cutter is HindIII (A^AGCTT) which cuts the genome
of bacteriophage lambda (48 kbp) at 7 sites.
8-cutters are good for carving up chromosomes into specific pieces
that are still quite large. PacI might cut the E. coli chromosome into only
about 20 pieces, for example, whereas BamHI might cut it into about 300 pieces. Here is
an example of where an 8-cutter might be of use: suppose you were trying to obtain a specific
fragment of a yeast chromosome. It would be impractical to use
a 6-cutter enzyme because you would generate too many small fragments during your
digestion. An 8-cutter would cut less frequently, generating larger and more useful
products. An example of an 8-cutter is NotI (GC^GGCCGC); the NotI recognition sequence
is not present in the genome of bacteriophage lambda.
4-cutters are good for experiments where you want the possibility
of cleavage at many potential sites. For example, if you want to gather a collection
of random DNA fragments, some of which may contain a gene you want, you can perform
a partial digestion (not all sites are cleaved due to limitation in enzyme activity)
using a 4-cutter. One that is commonly used for this purpose is Sau3AI, which cleaves:
5'-NGATCN-3' Sau3AI 5'-N GATCN-3'
3'-NCTAGN-5' ----> 3'-NCTAG N-5'
There are 116 Sau3AI sites in the genome of bacteriophage lambda.
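It is instructive to compare the site counts quoted above for bacteriophage lambda with what the simple 25%-of-each-nucleotide model predicts. The Python sketch below is only a rough estimate: it uses the approximate 48 kbp genome size given in the text, ignores base composition, and the function name `expected_sites` is made up for illustration.

```python
GENOME_LENGTH = 48_000  # approximate lambda genome size (the "48 kbp" quoted in the text)

# enzyme: (number of specified nucleotides, number of sites quoted in the text)
observed = {"HindIII": (6, 7), "NotI": (8, 0), "Sau3AI": (4, 116)}


def expected_sites(genome_length, n_specified):
    """Expected number of sites if all four nucleotides are equally likely."""
    return genome_length / 4 ** n_specified


for enzyme, (n, seen) in observed.items():
    estimate = expected_sites(GENOME_LENGTH, n)
    print(f"{enzyme}: expected about {estimate:.1f}, observed {seen}")
```

The observed counts fall in the same general range as the estimates but do not match them exactly, which is what you would expect given that real genomes are not random sequences.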