Sunday, 19 June 2011

What is DNA

DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other organisms. Nearly every cell in a person’s body has the same DNA. Most DNA is located in the cell nucleus (where it is called nuclear DNA), but a small amount of DNA can also be found in the mitochondria (where it is called mitochondrial DNA or mtDNA).
The information in DNA is stored as a code made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Human DNA consists of about 3 billion bases, and more than 99 percent of those bases are the same in all people. The order, or sequence, of these bases determines the information available for building and maintaining an organism, similar to the way in which letters of the alphabet appear in a certain order to form words and sentences.
DNA bases pair up with each other, A with T and C with G, to form units called base pairs. Each base is also attached to a sugar molecule and a phosphate molecule. Together, a base, sugar, and phosphate are called a nucleotide. Nucleotides are arranged in two long strands that form a spiral called a double helix. The structure of the double helix is somewhat like a ladder, with the base pairs forming the ladder’s rungs and the sugar and phosphate molecules forming the vertical sidepieces of the ladder.
An important property of DNA is that it can replicate, or make copies of itself. Each strand of DNA in the double helix can serve as a pattern for duplicating the sequence of bases. This is critical when cells divide because each new cell needs to have an exact copy of the DNA present in the old cell.
DNA is a double helix formed by base pairs attached to a sugar-phosphate backbone.

For more information about DNA:

The National Human Genome Research Institute fact sheet Deoxyribonucleic Acid (DNA) provides an introduction to this molecule.
For additional information about the structure of DNA, please refer to the chapter called What Is A Genome? in the NCBI Science Primer. Scroll down to the heading “The Physical Structure of the Human Genome.”


Deoxyribonucleic acid (/diˌɒksiˌraɪbɵ.njuːˌkleɪ.ɨk ˈæsɪd/), or DNA, is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms (with the exception of RNA viruses). The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints, like a recipe or a code, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information. Along with RNA and proteins, DNA is one of the three major macromolecules that are essential for all known forms of life.
DNA consists of two long polymers of simple units called nucleotides, with backbones made of sugars and phosphate groups joined by ester bonds. These two strands run in opposite directions to each other and are therefore anti-parallel. Attached to each sugar is one of four types of molecules called nucleobases (informally, bases). It is the sequence of these four nucleobases along the backbone that encodes information. This information is read using the genetic code, which specifies the sequence of the amino acids within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA, in a process called transcription.
Within cells, DNA is organized into long structures called chromosomes. These chromosomes are duplicated before cells divide, in a process called DNA replication. Eukaryotic organisms (animals, plants, fungi, and protists) store most of their DNA inside the cell nucleus and some of their DNA in organelles, such as mitochondria or chloroplasts.[1] In contrast, prokaryotes (bacteria and archaea) store their DNA only in the cytoplasm. Within the chromosomes, chromatin proteins such as histones compact and organize DNA. These compact structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed.

United States - African ancestry

Y-DNA and mtDNA testing may be able to determine with which peoples in present-day African countries a person shares a direct line of part of his or her ancestry, but patterns of historic migration and historical events cloud the tracing of ancestral groups. Testing company African Ancestry[20] maintains an "African Lineage Database" of African lineages from 30 countries and over 160 ethnic groups. Due to joint long histories in the US, approximately 30% of African American males have a European Y chromosome haplogroup.[21] Approximately 58% of African Americans have the equivalent of one great-grandparent (12.5 percent) of European ancestry. Only about 5% have the equivalent of one great-grandparent of Native American ancestry. By the early 19th century, substantial families of Free Persons of Color had been established in the Chesapeake Bay area who were descended from people free during the colonial period; most of those have been documented as descended from white men and African women (servant, slave or free). Over time various groups married more within mixed-race, black or white communities.[22]
According to authorities like Salas, nearly three-quarters of the ancestors of African Americans taken in slavery came from regions of West Africa. The African-American movement to discover and identify with ancestral tribes has burgeoned since DNA testing became available. Often members of African-American churches take the test as groups.[citation needed] African Americans cannot easily trace their ancestry during the years of slavery through surname research, census and property records, and other traditional means. Genealogical DNA testing may provide a tie to regional African heritage.

Biogeographical ancestry

Autosomal DNA testing purports either to determine the "genetic percentages" of a person's ancestry from particular continents/regions or to identify the countries and "tribes" of origin on an overall basis. Admixture tests arrive at these percentages by examining SNPs, which are locations on the DNA where one nucleotide has "mutated" or "switched" to a different nucleotide. Tests listing geographical places of origin use alleles, that is, individual and family variations on various chromosomes across the genome, analyzed with the aid of population databases. As further detailed below, this latter type of test concentrates on standard identity markers, such as the CODIS profile, combined with databases such as OmniPop, ENFSI and proprietary adaptations of published studies.
The admixture tests are designed to tell what percentages a person has of ancestry of Native American, "European", East Asian, and Sub-Saharan African. One company[14] describes these four biogeographic groups as follows:
  • Native American: Populations that migrated from Asia to inhabit North, South and Central America.
  • European: European, Middle Eastern and South Asian populations from the Indian subcontinent, including India, Pakistan and Sri Lanka.
  • East Asian: Japanese, Chinese, Mongolian, Korean, Southeast Asian and Pacific Islander populations, including populations native to the Philippines.
  • African: Populations from Sub-Saharan Africa such as Nigeria and Congo region.
Based on customer feedback, the company in June 2007 introduced a new version of its EURO DNA test, with a more limited range of countries, that promises to provide more meaningful clues to one's European ancestry. Both tests, the four-part ethnicity estimate and the EURO DNA test, identify a high number of so-called Ancestry Informative Markers (AIMs), whose genetic distance between populations reflects the populations' geographic distance from each other. The location and variation of these AIMs are proprietary to the company and have never been published.
In 2006, another company[15] developed an autosomal DNA ancestry-tracing product that combined the traditional CODIS markers used by law enforcement officers and the judicial system with OmniPop, a population database developed by San Diego detective Brian Burritt. Customers received matches to their profile's frequency of occurrence in world populations, as well as a breakout for European ancestry based on the European Network of Forensic Science Institutes (ENFSI).[16] As a public service, the company has supported the expansion of OmniPop, which currently encompasses over 360 populations, double that of its first release. The ENFSI calculator uses data from 24 European populations (5700 profiles). The two databases must be searched separately, because they are based on two different sets of markers. The company sells its product as the DNA Fingerprint Test. The 16 markers incorporated in its results include: D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, VWA, TPOX, D18S51, D5S818, and FGA.
The theory behind using a forensic profile for ancestry tracing is that the alleles' respective frequency of occurrence develops over generations with equal input of the two parents, since for each location we take one value from our mother and one from our father. It thus serves as a window into a person's total ancestral composition. The configuration of scores reflects inherited changes from all previous generations in all ancestral lines, and can predict an individual's unique probable ethnic matches based on the profile's frequency or rarity in different populations.[17]
To give an idea of the inclusiveness of the latest version of OmniPop, the following are the last populations that have been added:
  • Greek
  • Sikkim (India)
  • Bhutia (India)
  • Italian
  • Argentinian (Misiones)
  • Hungarian (E. Romani)
  • Hungarian (Ashkenazim)
  • Romanian (Szekler)
  • Romanian (Csango)
  • Tibet (Luoba)
As studies from more populations are included, the accuracy of results should improve, leading to a more informative picture of one's ancestry.
Along the same lines, yet another company[18] identifies the indigenous and diaspora populations in which an individual's autosomal STR profile is most common. This test examines autosomal STRs, which are locations on a chromosome where a pattern of two or more nucleotides is repeated and the repetitions are directly adjacent to each other. The populations in which the individual's profile is most common are identified and assigned a likelihood score. The individual's profile is assigned a likelihood of membership in each of thirty-four world regions:[19]

Mitochondrial DNA (mtDNA) testing

A person's matrilineal or mother-line ancestry can be traced using the DNA in his or her mitochondria, the mtDNA, as follows: This mtDNA is passed down by the mother unchanged, to all children. If a perfect match is found to another person's mtDNA test results, one may find a common ancestor in the other relative's (matrilineal) "information table", similar to the patrilineal or Y-DNA testing case above. However, because mtDNA mutations are very rare, a nearly perfect match is not as helpful as it is for the above patrilineal case. In the matrilineal case, it takes a perfect match to be very helpful.[6]
Note that, in cultures lacking matrilineal surnames to pass down, neither relative above is likely to have as many generations of ancestors in their matrilineal information table as in the above patrilineal or Y-DNA case: for further information on this difficulty in traditional genealogy, due to lack of matrilineal surnames, see Matrilineality's section Matrilineal surname.[7]
Some people cite paternal mtDNA transmission as invalidating mtDNA testing,[8] but this has not been found problematic in genealogical DNA testing, nor in scholarly population genetics studies. See the rest of this article.

mtDNA by current conventions is divided into three regions. They are the coding region (00577-16023) and two Hyper Variable Regions (HVR1 [16024-16569], and HVR2 [00001-00576]).[9] All test results are compared to the mtDNA of a European in Haplogroup H2a2a. This early sample is known as the Cambridge Reference Sequence (CRS). A list of single nucleotide polymorphisms (SNPs) is returned. The relatively few "mutations" or "transitions" that are found are then reported simply as differences from the CRS, such as in the examples just below.
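As a rough illustration of the two ideas in this paragraph, the following Python sketch looks up which region a position falls in and lists the differences between a sample and a reference. The region boundaries are the ones quoted above; the function names and the four-letter toy sequences are assumptions made up for the example, and real test reports come from aligned laboratory sequence data, not from code like this.

```python
# Region boundaries as quoted in the text (1-based, inclusive positions).
MTDNA_REGIONS = {
    "HVR2": (1, 576),
    "coding": (577, 16023),
    "HVR1": (16024, 16569),
}


def mtdna_region(position):
    """Return the name of the mtDNA region that contains a position."""
    for name, (start, end) in MTDNA_REGIONS.items():
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside 1..16569")


def differences_from_reference(reference, sample, first_position=1):
    """List (position, reference_base, sample_base) wherever two aligned,
    equal-length sequences differ. Hypothetical helper for illustration."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (first_position + i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy example: a made-up 4-base stretch starting at position 100.
print(mtdna_region(16519))
print(differences_from_reference("ACGT", "ACTT", first_position=100))
```

Each tuple in the result corresponds to one reported difference from the reference sequence.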

The two most common mtDNA tests are a sequence of HVR1 and a sequence of both HVR1 and HVR2. Some mtDNA tests may only analyze a partial range in these regions. Some people are now choosing to have a full sequence performed, to maximize their genealogical help. The full sequence is still somewhat controversial because it may reveal medical information.

DNA Testing for Pakistan

Bio-Synthesis, Inc. (BSI) is an AABB accredited laboratory and supplier of DNA testing services for the United States Citizenship and Immigration Services (USCIS), U.S. Embassies, and many immigration offices around the world. BSI provides the DNA parentage tests for immigration purposes which are required to be performed by an AABB accredited member. With more than a decade of experience in providing familial DNA Testing worldwide, BSI has become an expert on immigration DNA testing interpretations for government agencies, local Embassies, and our immigration clients.
Our immigration consultants are dedicated to serving our clients in Pakistan by working closely with U.S. Embassies and petitioners, as well as with immigration attorneys, and Embassy-approved panel physicians within Pakistan. Each of the testing processes is carefully monitored to ensure appropriate family verification, accuracy, and a satisfying experience. 
What do you need to know?
  • Most tests require 3-way shipping to and from the Embassy in Pakistan.
  • Bio-Synthesis, Inc. will ship the kit from the U.S. to the Embassy.
  • Upon completion of the test, the DNA Identity Testing Laboratory will ship the test results via FedEx to the U.S. Embassy in Pakistan and send a copy to our clients.

 How to start?
  1. Fill out Immigration DNA Test Application and fax your DNA request letter provided by the Embassy to 972-420-0442 or call 1-800-DNA-EXAM (800-362-3926) for a complete quote for your DNA Testing needs. We will review your case promptly and provide a total quote for your ease.
  2. We will include all costs of shipping and DNA Testing fees with the exception of the panel physician fee. We cannot pay for the collection of international DNA Samples because these fees must be paid in local Pakistan currency. Tested parties within the U.S. pay their collection fees when they pay for the test.
  3. Once samples are received we will begin analyzing them. After the testing is complete a certified copy of the results will be shipped to the Embassy and to you. We also include a letter of explanation for your convenience.

DNA Testing: An Introduction For Non-Scientists An Illustrated Explanation

The explanation of DNA testing that follows is intended as an introduction to the subject for those who may have limited backgrounds in biological science.  While basically accurate, this explanation involves liberal use of illustration and, in some cases, over-simplification.  Although intended to be informative, this is a brief and incomplete explanation of a complex subject.  The author suggests consulting the scientific literature for more rigorous details and alternative views.
DNA is material that governs inheritance of eye color, hair color, stature, bone density and many other human and animal traits.  DNA is a long, but narrow string-like object.  A one foot long string or strand of DNA is normally packed into a space roughly equal to a cube 1/millionth of an inch on a side.  This is possible only because DNA is a very thin string.
Our body's cells each contain a complete sample of our DNA.  One cell is roughly equal in size to the cube described in the previous paragraph.  There are muscle cells, brain cells, liver cells, blood cells, sperm cells and others.  Basically, every part of the body is made up of these tiny cells and each contains a sample or complement of DNA identical to that of every other cell within a given person.  There are a few exceptions.  For example, our red blood cells lack DNA.  Blood itself can be typed because of the DNA contained in our white blood cells.  
Not only does the human body rely on DNA but so do most living things including plants, animals and bacteria.  
A strand of DNA is made up of tiny building-blocks.  There are only four, different basic building-blocks.  Scientists usually refer to these using four letters, A,  T,  G,  and C.  These four letters are short nicknames for more complicated building-block chemical names, but actually the letters (A,T, G and C) are used much more commonly than the chemical names so the latter will not be mentioned here.  Another term for DNA's building blocks is the term, "bases."  A, T, G and C are bases. 
For example, to refer to a particular piece of DNA, we might write:  AATTGCCTTTTAAAAA.  This is a perfectly acceptable way of describing a piece of DNA. Someone with a machine called a DNA synthesizer could actually synthesize the same piece of DNA from the information AATTGCCTTTTAAAAA alone.  
The sequence of bases (letters) can code for many properties of the body's cells.  The cells can read this code.  Some DNA sequences encode important information for the cell.  Such DNA is called, not surprisingly, "coding DNA."  Our cells also contain much DNA that doesn't encode anything that we know about.  If the DNA doesn't encode anything, it is called non-coding DNA or sometimes, "junk DNA."[1]  
The DNA code, or genetic code as it is called, is passed through the sperm and egg to the offspring.  A single sperm cell contains about three billion bases consisting of A, T, G and C that follow each other in a well defined sequence along the strand of DNA.  Each egg cell also contains three billion bases arranged in a well-defined sequence very similar, but not identical to the sperm. 
Both coding and non-coding DNAs may vary from one individual to another.  These DNA variations can be used to identify people or at least distinguish one person from another. 
What is a Locus?
A locus (with a hard "c", LOW-KUS) is simply a location in the DNA.  The plural of locus is loci (with a soft "c", pronounced LOW-S-EYE).  Again, the DNA is a long string-like object as illustrated below.  Such locations, or loci, reside at specific places on chromosomes.
What is a Chromosome?
When a cell is getting ready to divide creating two daughter cells, it packs its DNA into bundles called chromosomes.  Chromosomes are just bundles of DNA.  For humans, there are consistently 23 pairs of chromosomes, each with a consistent size and shape.  Chromosomes are numbered.  Chromosome number 1 is the largest chromosome; chromosome number 2 a little smaller and so on.  Among the 23 pairs of chromosomes there is a pair called the sex chromosomes.  This is something of a misnomer, since there are many functions on the "sex" chromosomes that have nothing to do with sex.  In females, the sex-chromosome pair consists of two similar size chromosomes called X chromosomes.   Males have one X and one small Y chromosome.

Unless it has been purified, our DNA is actually not a loosely tangled string as illustrated but rather is well organized and packaged into what are called chromosomes.  A chromosome is a tightly folded bundle of DNA.  Chromosomes are most visible when cells divide.  In a microscope, chromosomes look something like this without the numbers and letters:

The illustration shows a pair of chromosomes named chromosome number 4, one pair among 23 pairs of chromosomes.  The illustration also shows the position of a locus that happens to be called "GYPA."   In this example, the chromosome on the left has the variation called the B allele while the chromosome on the right has the variation called the A allele. 
What are alleles?
Alleles (ALL-EELS') are just variations at a particular site on a chromosome.  Since each chromosome has a similar chromosome partner (except for males with their X and Y chromosomes) each locus is duplicated.  Loci can vary a bit.  If a person has two identical versions of the locus, they are said to be homozygous (HOMO-Z-EYE'-GUS).  If there is a difference, they are said to be heterozygous (HETERO-Z-EYE'-GUS).


There have been two main types of forensic DNA testing.  They are often called RFLP and PCR-based testing, although these terms are not very descriptive.  Generally, RFLP testing requires larger amounts of DNA and the DNA must be undegraded.  Crime-scene evidence that is old or that is present in small amounts is often unsuitable for RFLP testing.  Warm, moist conditions may accelerate DNA degradation, rendering it unsuitable for RFLP in a relatively short period of time.
PCR-based testing often requires less DNA than RFLP testing and the DNA may be partially degraded, more so than is the case with RFLP.  However, PCR still has sample size and degradation limitations that sometimes may be under-appreciated.  PCR-based tests are also extremely sensitive to contaminating DNA at the crime scene and within the test laboratory.  During PCR, contaminants may be amplified up to a billion times their original concentration.  Contamination can influence PCR results, particularly in the absence of proper handling techniques and proper controls for contamination.
PCR is less direct and somewhat more prone to error than RFLP.  However, PCR has tended to replace RFLP in forensic testing primarily because PCR based tests are faster and more sensitive.   
RFLP has been almost entirely replaced by PCR-based testing.  The following description of RFLP is included here primarily for historic reasons (for more current formats, see below).
 RFLP DNA testing has four  basic steps:
1.  The DNA from crime-scene evidence or from a reference sample is cut with something called a restriction enzyme.  The restriction enzyme recognizes a particular short sequence such as AATT that occurs many times in a given cell's DNA.  One enzyme commonly used is called Hae III (pronounced: Hay Three) but the choice of enzyme varies.  For RFLP to work, the analyst needs thousands of cells.  If thousands of cells are present from a single individual, they will all be cut in the same place along their DNA by the enzyme because each cell's DNA is identical to that of every other cell of that person.
2.  The cut DNA pieces are now sorted  according to size by a device called a gel.  The DNA is placed at one end of a slab of gelatin and it is drawn through the gel by an electric current.  The gel acts like a sieve allowing small DNA fragments to move more rapidly than larger ones.  
3.  After the gel has separated the DNA pieces according to size, a blot or replica of the gel is made to trap the DNA in the positions that they end up in, with small DNA fragments near one end of the blot and large ones near the other end.  The blot is now treated with a piece of DNA called a probe.  The probe is simply a piece of DNA that binds to the DNA on the blot in the position where a similar sequence (the target sequence) is located.
4.  The size or sizes of the target DNA fragments recognized by the probe are measured.  Using the same probe and enzyme, the test lab will perform these same steps for many people.  These sizes, and how they are distributed among large groups of people, form a database.  The database gives a rough idea of how common a given DNA size, as measured by a given probe, is.  The commonness of a given size of DNA fragment is called a population frequency.
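Steps 1 and 2 above can be imitated on a short string of letters.  The sketch below is only an illustration under stated assumptions: the recognition sequence AATT is the example used in step 1, while the cut position inside that sequence (the cut_offset parameter), the function name and the toy DNA string are invented for the example and do not describe any particular enzyme.

```python
def digest(dna, site, cut_offset):
    """Cut a DNA string at every occurrence of a recognition site.

    cut_offset is how many bases into the site the cut falls; this is
    an assumption for the example, since real enzymes each have their
    own cut position.
    """
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        if cut > start:  # skip empty fragments
            fragments.append(dna[start:cut])
            start = cut
        position = dna.find(site, position + 1)
    if start < len(dna):
        fragments.append(dna[start:])
    return fragments


toy_dna = "GGAATTCCCCAATTG"  # made-up sequence containing AATT twice
pieces = digest(toy_dna, "AATT", cut_offset=2)

# A gel sorts by size; the smallest fragments travel farthest.
sorted_by_size = sorted(pieces, key=len)
print(pieces)
print([len(piece) for piece in sorted_by_size])
```

Because every cell from one person carries the same sequence, every copy is cut at the same places and yields the same set of fragment sizes.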

The restriction enzyme cuts the DNA into thousands of fragments of nearly all possible sizes.  The sample is then electrophoretically separated.  The DNA at this point is invisible in the gel unless the DNA is stained with a dye.  A replica of the gel's DNA is made on something called a blot (also called a Southern blot) or membrane.  The blot is then probed with (mixed with) a special preparation of DNA that recognizes a specific DNA sequence or locus.  Often, the probe is a radioactively labeled DNA sequence (represented by the * labeled object in the figure above).  Excess probe is washed off the blot, then the blot is laid onto X-ray film.  Development reveals bands indicating the sizes of the alleles for the locus within each sample.  The film is now called an "autorad."  The band sizes are measured by comparing them with a "ladder" of known DNA sizes that is run next to the sample.  A match may be declared if two samples have RFLP band sizes that are all within 5% of one another in size.
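The 5% matching rule can be written down in a few lines.  This is a minimal sketch, not a laboratory's actual procedure: the text does not say what the 5% is measured against, so the code assumes it is taken relative to the larger of the two bands, and the function names and band sizes are made up for the example.

```python
def bands_match(size_a, size_b, tolerance=0.05):
    """True if two band sizes differ by no more than the tolerance,
    measured against the larger band (an assumption for this sketch)."""
    return abs(size_a - size_b) <= tolerance * max(size_a, size_b)


def profiles_match(bands_a, bands_b, tolerance=0.05):
    """Compare two samples band by band, smallest to largest."""
    if len(bands_a) != len(bands_b):
        return False
    return all(
        bands_match(a, b, tolerance)
        for a, b in zip(sorted(bands_a), sorted(bands_b))
    )


# Made-up band sizes in base pairs.
print(profiles_match([1000, 2500], [2450, 1030]))
print(profiles_match([1000], [1100]))
```

Note that a declared match only says the measured sizes are indistinguishable within the window; how common those sizes are is a separate question answered by the population database.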


PCR is an abbreviation for "polymerase chain reaction" (POLL'-IM-ER-ACE).  This term applies to a wide variety of different DNA tests that differ in reliability and effectiveness.  The reliability of each kind of PCR test needs independent verification.  PCR itself doesn't accomplish DNA typing; it only increases the amount of DNA available for typing.
PCR uses constant regions of DNA sequence to prime the copying of variable regions of DNA sequence.

PCR typically uses two short pieces of known DNA called primers (small arrows below).  These serve as starting points for the copying of a region of DNA.

Many forensic laboratories use commercially supplied DNA testing kits that contain key components for certain PCR-based tests.  PM plus DQA1™, Profiler Plus™, Cofiler™ and Identifiler™ are all test kits commercially supplied by PE Applied Biosystems.  PowerPlex™ is another test kit, with variations, supplied by Promega.  PowerPlex kits have published primers, an advantage if the precise DNA targeted is to be recorded for posterity or studied for research.  As of 2005, Profiler Plus™, Cofiler™ and PowerPlex™ are probably the most commonly used test kits in US forensic laboratories.
PCR Contamination
            It is worth considering contamination early in this discussion since this is a well-recognized limitation.  Unfortunately, the importance of contamination in PCR is often underestimated.  PCR copies DNA efficiently if the initial DNA is in good condition.  A single DNA entity (molecule) can become millions or billions of DNA molecules in about three hours.  The PCR process is sometimes compared to a Xerox machine since many copies are made.  While this is initially a useful comparison, it doesn't communicate the true, chain-reaction nature of PCR.  In PCR, the original DNA is copied, then the copies are copied, those copies are copied and so on.  This results in dramatic increases in the amount of DNA that couldn't be easily accomplished in the Xeroxing analogy.  The PCR process deserves its classification as a "chain-reaction" because it has much in common with other chain reactions such as avalanches.
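The difference between photocopying and a chain reaction is easy to see with a little arithmetic.  The sketch below assumes an idealized PCR in which every molecule is copied once per cycle (efficiency 1.0); real reactions fall short of this, and the cycle count of 30 is only an illustrative choice, not a figure from the text.

```python
def pcr_copies(cycles, starting_copies=1, efficiency=1.0):
    """Molecules after PCR: each cycle copies the originals AND the copies.
    efficiency=1.0 means perfect doubling, an idealized assumption."""
    return starting_copies * (1.0 + efficiency) ** cycles


def photocopier_copies(cycles, starting_copies=1):
    """Molecules if only the originals were copied once per cycle."""
    return starting_copies * (1 + cycles)


cycles = 30  # illustrative cycle count
print(pcr_copies(cycles))          # about a billion from one molecule
print(photocopier_copies(cycles))  # 31 from one molecule
```

The same arithmetic applies to a single contaminating molecule, which is why contamination matters so much.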
            PCR is also very similar to what happens when a clinical infection occurs.  Clinicians have known for many years that a single germ (bacterial cell or virus) contaminating a wound can produce a massive infection.  Similarly, a DNA molecule can contaminate (infect) a PCR and become a significant problem.  The ability of small amounts of DNA to produce false and misleading results is well-known and well-documented within the research community, where the technology originated.  Anyone who has caught a cold from an unknown source, or who has a pollen allergy should have some sense of how easily PCRs are contaminated.  Actually, it is probably easier to contaminate a PCR than to catch a cold since unlike our bodies, PCRs lack immune systems.  The only protection PCRs have is the technique of the analyst, use of control samples to monitor contaminants and careful interpretation. 
            Prevention of false results involves the use of carefully applied controls and techniques.  As described later, such controls and techniques can rarely guarantee that contamination hasn't influenced the results.  In forensic DNA testing, some of the scientifically worst-case scenarios can be prevented by keeping DNA samples from known individuals well out of range of other items of evidence at all stages.  Most forensic DNA laboratories perform negative controls, blank samples that will often detect contaminants in the laboratory.  The blanks detect contaminants by showing partial or full DNA profiles representing the contaminants.  Alternatively, the blank may show no profile, consistent with, but not proving that contamination didn't occur.  Unfortunately, a few forensic DNA laboratories omit their controls.  A few favor the controls by using special equipment on them, or by not carrying them through the entire procedure.  Such practices are hazardous, especially when an important evidentiary sample has a low amount of DNA, degraded DNA, or otherwise presents as a minimal or partial (see below) sample.  In short, while PCR is a useful research tool, all applications require extreme care and vigilance.   


This will be presented in some detail because STRs are important in current, forensic DNA testing.  The abbreviation, STR stands for Short Tandem Repeat.  STRs are the type of DNA used in most of the currently popular forensic DNA tests.  STR is a generic term that describes any short, repeating DNA sequence.  For example, the DNA sequence ATATATATATAT is an STR that has a repeating motif consisting of two bases, A and T.  It turns out that our DNA has a variety of STRs scattered among DNA sequences that encode cellular functions.  For reasons that are not entirely understood, people vary from one another in the number of repeats they have, at least for some STR loci.  For example, person #1 may have ATATAT at a particular locus while person #2 may have ATATATATATAT.  Thus, STRs are often variable (polymorphic) and these variations are used to try and distinguish people.  The term, STR doesn't necessarily imply PCR.  PCR is one of many methods that might be used to help analyze STRs.  STRs have also been analyzed by DNA sequencing for example.  To understand PCR-assisted STR typing, it is useful to briefly consider how such PCRs are designed.
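Counting repeats is simple enough to show in code.  The following is a small sketch that counts the longest uninterrupted run of a motif in a sequence, using the AT examples from the paragraph above; the function name is an assumption, and real STR typing measures fragment lengths rather than reading letters like this.

```python
import re


def longest_repeat_run(sequence, motif):
    """Number of times the motif repeats back-to-back in its longest run."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


person_1 = "ATATAT"
person_2 = "ATATATATATAT"
print(longest_repeat_run(person_1, "AT"))
print(longest_repeat_run(person_2, "AT"))
```

Two people who differ in this count at a locus can be told apart at that locus.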
Suppose that laboratory data revealed the following DNA sequence:
The STR is underlined and consists of the sequence, GATA repeated 7 times.  The dashes at the beginning and end of the overall sequence shown indicate that there is more sequence available both upstream and downstream of the region shown.  Remember, DNA is relatively very long and linear and we are just going to look at a small region of it. 
Now, let's say we want to design a PCR to examine this same locus in other people.  To design the PCR, we need two primers, short synthetic DNA molecules that recognize the region.  One primer might be, ATGCTAGTA (Italics, in the above sequence) a sequence that would recognize the DNA flanking the left side of the STR.  The second primer might be, AAAAAAAATTTTTT.  This is called the downstream primer and it might be difficult to recognize in the sequence.  The reason it is difficult to recognize at first is that it is the complement of the sequence, AAAAAAAATTTTTT (italics, on the right in the longer sequence above).  See "General Considerations", for a more detailed discussion.
 What is the complement of a DNA sequence?  This might be more information than you would like, but to really understand PCR primers, try to walk through this: 
The complement of a DNA sequence is the sequence written backwards exchanging all A's for T's, all T's for A's, all G's for C's and all C's for G's.  For example, the complement of the sequence, AGTA is TACT.  An easy way to get the complement of a DNA sequence is to write another line below the original sequence remembering that A replaces T and G replaces C.  Then read the lower line backwards: 
So, for the sequence:
 write the complementary line below it giving:
 Then, just read the lower line backwards (from right to left) giving the complement: 
In practical terms, the upstream (left) primer can be a direct reading of the target sequence, while the downstream (right) primer must be the complement of the directly read sequence.
If the above is confusing, it may suffice to think of the primers as  two arrows that point at one another with the STR located between them.  This is how the PCR targets the locus and the STR.
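For readers who would like to see the "write the lower line, then read it backwards" recipe spelled out, the following Python sketch does exactly those two steps. It uses the short AGTA example from the text; the function name is simply a convenient label.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(sequence: str) -> str:
    """Write the paired base under each base, then read that lower line backwards."""
    lower_line = "".join(PAIRING[base] for base in sequence.upper())
    return lower_line[::-1]

print(reverse_complement("AGTA"))  # TACT
```

Applying the function twice returns the original sequence, which is a quick way to check a primer that was worked out by hand.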
In practice, PCR primers are usually at least 17 bases in length. The point here is that to use PCR to target an STR, the primers recognize constant, conserved sequences that flank the actual STR. This means that the actual length of the target sequence depends on where the primers are placed in the flanking sequence. For example, the Promega and PE Applied Biosystems test kits use mostly different primers. The upstream primer could, for instance, be designed to recognize DNA 100 bases upstream of the sequence shown. Similarly, the downstream primer could be designed to recognize DNA further downstream. Placing the primers further upstream and downstream by design would make all alleles (variations) of the STR appear larger than if the primers were placed close to the STR itself. Wherever the primers are placed, that defines the region we will examine. That region will then vary among individuals due to changes in the STR itself, as explained above for the simple STR based on the repeating AT motif.
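The relationship between primer placement, repeat number and product size can be sketched as a small "in silico PCR" in Python. This is only an illustration: the upstream primer ATGCTAGTA and the GATA motif come from the text, but the flanking bases, the downstream primer and the function names are hypothetical, and the primers are far shorter than the 17 or more bases used in practice.

```python
from typing import Optional

def reverse_complement(sequence: str) -> str:
    """Swap A<->T and G<->C, then read the result backwards."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]

UPSTREAM_PRIMER = "ATGCTAGTA"        # from the text
DOWNSTREAM_PRIMER = "TCCTAGGTACCG"   # hypothetical, chosen for this sketch

def make_template(repeat_count: int) -> str:
    """Build a made-up locus: flank, upstream site, GATA repeats, downstream site, flank."""
    return ("CCTTGACC" + UPSTREAM_PRIMER + "GATA" * repeat_count
            + "CGGTACCTAGGA" + "TTCAGG")

def pcr_product_length(template: str, upstream_primer: str,
                       downstream_primer: str) -> Optional[int]:
    """Length of the region spanned by the two primers, or None if either fails to bind."""
    start = template.find(upstream_primer)
    if start == -1:
        return None
    # The downstream primer binds where its reverse complement appears in the template.
    downstream_site = reverse_complement(downstream_primer)
    site_start = template.find(downstream_site, start + len(upstream_primer))
    if site_start == -1:
        return None
    return site_start + len(downstream_site) - start

for repeats in (7, 9):
    length = pcr_product_length(make_template(repeats), UPSTREAM_PRIMER, DOWNSTREAM_PRIMER)
    print(repeats, "repeats ->", length, "bases")
```

Two extra GATA repeats add eight bases to the product, and moving either primer further out would add the same fixed amount to every allele.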

After PCR is used to provide many copies of a given person's STR, the products (copies) are separated according to size on an electrophoretic gel (see RFLP above for more details about gels).  The gel can be flat, as for RFLP, or it can be in a round tube, called a capillary with a detector at the end of it.  Typical flat gel STR results look like this:
The black bars are called bands. Each band is made up of many identical-size DNA molecules that were produced by PCR. The gel separates smaller DNA molecules from larger ones. The bands near the lower end of the gel are smaller (i.e., the DNA fragments are shorter in length) than those near the top. For example, looking at the reference ladder, the first band near the lower end of the gel is the smallest STR. For simplicity, let's say this smallest band contains a single repeat such as CATG, flanked by other DNA that the primers actually recognize in everyone's DNA. The next higher band in the ladder would then contain 2 repeats, CATGCATG; the next, 3 repeats; and so on. By comparing the positions of bands in the unknown samples with the reference ladder, the allele sizes are deduced. In this example, Sample A had bands at the 2-repeat position and the 5-repeat position. Common terminology would call this sample a 2,5 type. Sample B would be called 2,4. For a single person, each locus normally has two alleles, and these can be different (heterozygous) or the same (homozygous).
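Reading a type off the ladder amounts to matching each band to the nearest ladder rung. Here is a minimal Python sketch of that comparison. The flank length of 100 bases, the one-base tolerance and the band sizes are all assumptions made up for the example; only the 4-base CATG motif and the 2,5 and 2,4 types come from the text.

```python
FLANK_LENGTH = 100   # hypothetical: flanking DNA copied along with the STR
MOTIF_LENGTH = 4     # the CATG motif has four bases

# Reference ladder: fragment size in bases -> number of repeats.
LADDER = {FLANK_LENGTH + MOTIF_LENGTH * n: n for n in range(1, 9)}

def call_allele(band_size, ladder=LADDER, tolerance=1.0):
    """Return the repeat number of the nearest ladder rung, or None if nothing is close."""
    nearest_rung = min(ladder, key=lambda rung: abs(rung - band_size))
    if abs(nearest_rung - band_size) > tolerance:
        return None
    return ladder[nearest_rung]

def call_genotype(band_sizes, ladder=LADDER):
    """Turn one or two band sizes into a type such as '2,5'."""
    alleles = [call_allele(size, ladder) for size in band_sizes]
    if any(allele is None for allele in alleles):
        raise ValueError("a band does not line up with any ladder rung")
    alleles.sort()
    if len(alleles) == 1:          # a single band is read as homozygous
        alleles = alleles * 2
    return ",".join(str(allele) for allele in alleles)

print(call_genotype([108, 120]))  # Sample A: 2,5
print(call_genotype([108, 116]))  # Sample B: 2,4
print(call_genotype([112]))       # one band only: 3,3
```

A band that falls between rungs raises an error here instead of being forced onto the nearest allele, since such bands need a human decision.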

DQA1 (also known as DQ alpha)

The PM plus DQA1™ (PE Applied Biosystems) typing kit targets six genetic loci. All six are copied in the initial PCR. The products from this reaction are then placed onto two separate typing strips. One strip is for DQ alpha and the other types the remaining five loci.
There are several steps in a DQ alpha PCR test: 
1.  DNA from 50 or more cells is extracted. Notice that this test requires fewer cells than the RFLP test. Sensitivity (the number of cells needed) is the main advantage of PCR tests. However, the increased sensitivity also makes PCR tests more vulnerable to trace contaminants, in other words, DNA from unexpected sources.
2.  The DNA from the sample is copied over and over resulting in amplification of the original target sequence.  The copying or amplification is accomplished in a machine specially designed for this purpose.  This machine is called a thermal cycler. 
3.  The amplified DNA is now treated with a variety of probes that are bound to a blot. (Compare RFLP: there, the target DNA is bound to the blot and the probe DNA is added. For the DQ alpha dot blot, the probe DNAs are bound to a small blot strip and the target DNA is added.)
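The "copied over and over" of step 2 is a doubling process, which is why so few cells are needed. The Python sketch below shows the arithmetic under idealized assumptions: every cycle doubles every copy (efficiency 1.0), each cell contributes two copies of the locus, and 30 cycles are run. The cycle number and the function name are illustrative, not taken from any kit protocol.

```python
def copies_after_pcr(starting_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    return starting_copies * (1.0 + efficiency) ** cycles

# 50 cells, assuming two copies of the target locus per cell.
starting = 50 * 2
print(f"{copies_after_pcr(starting, 30):.3g} copies after 30 ideal cycles")
print(f"{copies_after_pcr(starting, 30, efficiency=0.8):.3g} copies at 80% efficiency")
```

The same arithmetic applies to a stray contaminating molecule, which is the reason the sensitivity of PCR cuts both ways.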

From the pattern of probes that the amplified DNA binds to, a potential DNA type, also called a genotype, can be inferred. 
DQ alpha typing strips look like this before any types are obtained:

The invisible dot to the right of the number 1 has a DNA probe for the 1-allele (variation) for DQ alpha. The invisible dot to the right of the 2 has a DNA probe for the 2-allele, and so on. The 1-allele itself has variations, the 1.1, 1.2 and 1.3 subtypes, also called alleles. Notice that the typing strip has no specific dot or probe for the 1.2 subtype. Also, the typing strip can't distinguish between the 4.2 and 4.3 subtypes, and there is a single dot for these. It is quite possible that there exist DQ alpha alleles that would go undetected by the typing strip, and alleles that may be further subtypes of the alleles that the strip does detect.
 Here are some examples of how the strips are read: 


This last example brings up an important issue with DQ alpha typing.  The 1.2 allele is actually the second most common allele in most populations.  This means there will be frequent situations where the 1.2 allele may be present but undetected as in the last example.  An obvious question is:  Why not just have a specific probe for the 1.2 allele?  The answer is that the typing strip already maximizes the probing of a relatively short stretch of DNA.  That is, the DQ alpha locus itself is only about 240 base pairs long.  The multiple probe typing strip was probably about the best that could be done in terms of detecting multiple alleles of this small locus in a single typing step.  
Historically, DQ alpha was often the first PCR-based test that forensic labs used. Actually, the DQ alpha system is quite different from the majority of PCR applications in the scientific community. This will be explained in more detail below.



Native or natural DNA usually has two complementary strands.  The G residues on one strand bind C residues on the complementary strand and A residues bind T's.


Notice that the G-C pairs are depicted with three lines, or bonds, between them while A-T base pairs have only 2 bonds. This property of DNA has been recognized since 1953. The bonds between G-C base pairs and A-T base pairs are called hydrogen bonds. It is well known that G-C base pairs are stronger than A-T pairings because of the extra hydrogen bond for G-C base pairs. This means that the stability of the DNA can be predicted based on the % G+C content. For example, the sequence shown above has 12 G-C base pairs and a total of 25 base pairs, for a G+C content of 48%, or roughly 50%. The two strands of this sequence are held together more tightly than those of a similar-length sequence with a 40% G+C content, for example. Such considerations are fairly important for DNA testing since any use of PCR or probe hybridization involves the disruption and reformation of the two strands.
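The % G+C figure is simple to compute. Here is a short Python sketch; the function name is made up, and the test sequences are arbitrary examples and not the sequence from the figure.

```python
def gc_content_percent(sequence: str) -> float:
    """Percentage of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc_bases = sum(1 for base in sequence if base in "GC")
    return 100.0 * gc_bases / len(sequence)

print(gc_content_percent("ATATATATATAT"))  # an AT-only STR: 0.0
print(gc_content_percent("GATAGATA"))      # two G's in eight bases: 25.0

# 12 G-C pairs out of 25 base pairs, as in the example in the text:
print(100.0 * 12 / 25)                     # 48.0, i.e. roughly half
```

Only one strand needs to be counted, because every G or C on one strand is paired with a C or G on the other.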
For example, each cycle of PCR involves heating the DNA to separate the strands followed by cooling to the appropriate temperature to allow the primer DNAs to bind accurately to their complementary sequences.  This process is also important in hybridizations involving dot strips or Southern blots where the single-stranded probes must bind accurately to their complementary target DNA sequences. 
If the temperature of the cooling step is too warm (warmer than optimal) the probe may not bind to its target sequence.  If the temperature of the cooling step is too cool, the probe may bind incorrect targets as well as correct ones.  The latter effect is called cross-hybridization and has been documented for some of the probes of PM plus DQA1.  Incorrect binding can also happen to the primers of STR based PCR tests if conditions are improper.
With regard to the accuracy of hybridization (for example, the binding of a DNA probe to its complementary DNA target), there may be an important difference between the common research use of probes and primers and systems like PM plus DQA1 and even multiplex STRs. The common use is to target a single sequence in each PCR using two primers to flank the sequence. This is followed by some form of analysis of the sole PCR product. Analysis may involve a Southern blot to size and probe the PCR product, or DNA sequencing of the product to determine the precise sequence of bases.
In contrast, multiplex PCRs begin with the simultaneous binding of many different primers (two for each of the loci).  If 14 loci are targeted, there are at least 28 different primers involved.  For Polymarker, this is followed by simultaneous probings of the PCR products.  The PM typing strip alone has 14 different probes (one of the 13 dots has two probes) while the DQA1 strip has 11 probes.  Thus these systems are far more complex than usual applications of PCR.  The complexity was added to speed the analysis.  All of the loci of PM plus DQA1 could be analyzed one at a time.  
One would think that the sequences of the probes for PM plus DQA1 would have been chosen to all have roughly the same G+C content so that they could all be used at the same temperature with the same relative accuracy. However, sequence inspection reveals that the probes were in fact not designed that way. Based on empirically tested formulas for predicting the best temperatures of probe binding, the S and C dot probes in particular appear to be as much as 20 °C away from their temperature optima. It is possible that this observation may account for some of the known artifacts that have been observed. There is some evidence that PM plus DQA1 can function consistently when provided with relatively undegraded, unmixed DNA samples available in ample amounts. However, there is evidence that this system can be fooled by aged or degraded DNA, mixtures and low input amounts of DNA. For multiplex systems with unpublished primers, it is difficult for the scientific community to evaluate the general thermal equality of the primers.
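To give a feel for how G+C content translates into a binding temperature, the Python sketch below uses one widely quoted rule of thumb for short oligonucleotides, often called the Wallace rule: 2 °C for each A or T plus 4 °C for each G or C. This is not necessarily the formula used for the comparison described above, and the two 16-base probe sequences are invented for illustration; they are not the actual kit probes.

```python
def wallace_tm_celsius(oligo: str) -> int:
    """Rule-of-thumb melting temperature: 2 C per A/T base plus 4 C per G/C base."""
    oligo = oligo.upper()
    at_bases = oligo.count("A") + oligo.count("T")
    gc_bases = oligo.count("G") + oligo.count("C")
    return 2 * at_bases + 4 * gc_bases

# Two hypothetical probes of equal length but very different G+C content.
probe_low_gc = "ATTATAGATTTAAGTA"
probe_high_gc = "GCCGTGGCACCGGCTC"

for name, probe in (("low G+C", probe_low_gc), ("high G+C", probe_high_gc)):
    print(name, wallace_tm_celsius(probe), "C")

print("difference:", wallace_tm_celsius(probe_high_gc) - wallace_tm_celsius(probe_low_gc), "C")
```

Probes that differ this much cannot both be near their optimum when a whole strip is hybridized at a single temperature.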
Multiplex systems have the limitations of any PCR system in terms of the influence of contamination. Stray DNA molecules can contribute alleles or complete DNA profiles. PCR is a replication process similar to the replication of an infectious agent. Contamination of a PCR can occur as easily as the spread of the common cold virus. In fact, it may be easier to contaminate a PCR than it is to catch a cold, since PCRs have no immune system to ward off the contaminating DNA.
PCR is potentially useful since it is the only method of amplifying really minuscule amounts of DNA.  However, it is important to recognize that PCR methods are sometimes problematic, exquisitely sensitive to contamination and need to be interpreted with extreme caution. 
Analysis of Separated Sperm and Non-Sperm Fractions. 
In order to perform DNA typing on sperm DNA, it is desirable to separate the sperm DNA from any other DNA that may be present.  For example, in swabbed materials from a rape evidence kit, the swabs may contain non-sperm cells from the victim as well as sperm and non-sperm cells from the rapist.  To accomplish separation of the sperm cells, a process known as differential extraction is often performed.  This involves lysing (breaking open) the non-sperm cells followed by spinning (centrifugation) the mixture to remove the still unbroken sperm cells.  To do this, chemicals, usually an enzyme called proteinase K (PROTEIN-ACE-K) (breaks down most proteins), and a mild detergent (breaks down cellular membranes) are added to the original mixture of sperm and non-sperm cells.  The enzyme and the mild detergent can lyse most cell types but not sperm because the sperm cell membranes have cross-linking chemical bonds called disulfides (pronounced DI-SUL-FIDES).  Actually, the illustration below is slightly incorrect because the proteinase K does remove most of the sperm tails.  These were left in the illustration to assist in following what happens to the sperm. 

Y chromosome STR testing

Another way of getting information on a male contributor is to use PCR to target STRs on the Y chromosome. Since females have two X chromosomes, instead of an X and a Y, the male DNA can sometimes be distinguished even if there is more female than male DNA present. Such Y-chromosome STR tests are in use, but they tend to be used only after the other tests have failed to give clear results. There are several reasons. First, the Y chromosome is a small chromosome with no pairing partner. Pairing of the other chromosomes promotes exchange of DNA, effectively a scrambling event, known as recombination. An effect of this is that various loci, when far enough apart, can become independent markers (they are said to be in linkage equilibrium). This means that having allele type 1 at locus A on the chromosome doesn't imply an increased or decreased probability of having a particular allele at locus B on the same chromosome. This is idealized and requires that the loci be far apart on the chromosome, and also that functional products in the two regions don't interact. The Y chromosome, on the other hand, only recombines (with the X chromosome) at a small region of the short end of the chromosome. As far as we know, it doesn't recombine in other regions of its length. One probable result of this is that loci on the Y chromosome may be more dependent on one another than loci on other chromosomes. Y-chromosome STR testing is an active area of research since, despite the limitations, there may eventually prove to be some advantages.
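Independence between two loci can be expressed as a number. A common measure is the difference between how often two alleles are seen together and how often they would be seen together if the loci were independent; a value of zero corresponds to equilibrium. The Python sketch below computes this from lists of two-locus types. Both data sets are invented purely for illustration and do not describe any real population.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b) -> float:
    """D = freq(A and B together) - freq(A) * freq(B); zero means the loci look independent."""
    total = len(haplotypes)
    freq_a = sum(1 for a, _ in haplotypes if a == allele_a) / total
    freq_b = sum(1 for _, b in haplotypes if b == allele_b) / total
    freq_ab = sum(1 for a, b in haplotypes if (a, b) == (allele_a, allele_b)) / total
    return freq_ab - freq_a * freq_b

# Hypothetical recombining loci: every combination of alleles is equally common.
scrambled = [(1, 1), (1, 2), (2, 1), (2, 2)]
# Hypothetical non-recombining loci: allele 1 at locus A always travels with allele 1 at locus B.
locked = [(1, 1), (1, 1), (2, 2), (2, 2)]

print(linkage_disequilibrium(scrambled, 1, 1))  # 0.0
print(linkage_disequilibrium(locked, 1, 1))     # 0.25
```

When loci behave like the second data set, knowing the allele at one locus tells you the allele at the other, so the two loci carry less information than two independent ones.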

Understanding PCR Contamination

Early in the history of PCR, its pioneers recommended certain techniques and practices for preventing and recognizing contamination. A parallel with sterile technique in medical clinics is often drawn. By definition, sterile means the absence of all living organisms, including bacteria and viruses. For example, sterile technique is used when working in the vicinity of an open wound. PCR technique is similar to sterile technique and even borrows many basic concepts from it. This includes the use of sterile instruments and pipettes that may contact the samples under analysis. Similar sterile techniques are used by scientists who grow cells in culture dishes, which are easily contaminated.
PCR technique differs from sterile technique in that a clinically sterile solution or instrument may still harbor DNA. DNA usually survives the heat sterilization used to make clinical solutions and instruments sterile. Presence of a single bacterium or virus would violate sterility. Doctors and nurses think in terms of a sterile "field", an area where everything present is sterile and meticulous effort is expended to maintain that condition of sterility. Once a non-sterile object, or even one whose sterility can be questioned, enters the area, the field is no longer considered sterile. Sterile technique training involves the development of a mental image of the sterile field and how to protect it. Finally, one does not assume success just because the mental picture seems un-breached. Post-sterile technique practices include monitoring patients for fever and other signs of infection and giving antibiotics in advance, actions that basically assume that technique may well have failed. As rigorous as sterile technique concepts are, PCR technique involves the same concepts and more, since a properly sterilized item of equipment, or a sterilized solution, may contain DNA that would potentially influence a PCR. For example, large pressure cookers called autoclaves are effectively used to sterilize instruments and some solutions by heating to temperatures (slightly higher than the temperature of boiling water) that most infectious organisms can't survive. However, such temperatures are insufficient to destroy contaminating DNA. Thus, autoclaves, while they achieve the condition of clinical sterility by getting rid of all bacteria, are not infallible for PCR. In short, PCR technique needs to go beyond sterile technique. Disposable instruments and pipettes and proper design of PCR laboratories are helpful considerations in this regard.
Good PCR technique is no guarantee that contamination didn't influence the results.  Steps must be taken to try and detect contamination.  Negative controls are blank PCRs that have all the components of the evidentiary PCRs but have no other DNA added intentionally.  Fortunately, there are often two negative controls used, one when the DNA is extracted, and another when the PCR is set up.  Any PCR signal in the negative control would warn that contamination has occurred.  Unfortunately, the negative controls are virtually the only warning of PCR contamination.  Negative controls may alert the analyst to general contamination occurring within the lab or the lab reagents.  These controls don't offer protection against contamination occurring before the samples arrived at the PCR lab.  Negative controls also can't rule out contamination of individual samples.  The individual samples lack individual signs of contamination if it occurs.  Unlike a human patient, a PCR is incapable of showing signs of infection (contamination) such as fever or undue pain.  PCRs also have no immune system to ward off contaminants.  
It is often said that the most critical source of PCR contamination is DNA from previous PCRs. Again, a PCR produces many DNA copies of the target DNA sequences. Due to their sheer number, these copies (called amplicons) are a hazard for future PCRs. In terms of DNA typing, stray amplicons could contribute single or multiple alleles to a genetic profile. This would manifest itself as, for example, an extra dot on a DQA1 or PM typing strip or an extra band in an STR profile. The fact that the contaminating dot or band is in fact extra may or may not reveal itself. Thus, amplicons can lead to mistyping.
However, a more dangerous source of contamination is what is called genomic DNA.  This is DNA that hasn't yet been amplified.  Genomic DNA doesn't have the high concentration of the target DNA copies but is a hazard because genomic DNA could produce an entirely false DNA profile.  Full profile contaminants have been documented on multiple occasions and in multiple laboratories.  Partial profile contaminants are more common and sometimes constitute a poorly recognized risk in using partial profiles in evidentiary samples as evidence.  When contamination occurs there is rarely any way to confirm how it happened.  
For example, suppose evidence item #1 has little to no DNA or has DNA degraded beyond the ability to function in a PCR. Suppose further that item #2 is a defendant's reference blood stain that would typically have a high concentration of undegraded genomic DNA from the defendant. If item #2 comes in close proximity with item #1, or comes in contact with item #1, the genomic DNA from item #2 may contaminate item #1. Subsequent DNA typing of contaminated item #1 will give the false impression that the defendant contributed DNA to item #1 during a crime. Similarly, when there are multiple items of evidence with some having larger amounts of DNA and some much lower, cross-contamination is an important consideration.
This is not to say that all PCR-based results are due to cross-contamination.  However, the ease of cross-contamination and its potentially misleading effects may sometimes be under-appreciated, especially in the context of match probabilities reported to be extremely rare.
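The two warning signs discussed in this section, a signal in a negative control and an extra allele at a locus, can be written as a simple checklist. The Python sketch below is a hypothetical illustration only; the control names, locus names and allele values are made up, and no real laboratory software is being described.

```python
def contamination_warnings(negative_controls, sample_profile):
    """List warning signs; an empty list does NOT prove that a sample is uncontaminated."""
    warnings = []
    for control_name, alleles in negative_controls.items():
        if alleles:  # a blank PCR should show no signal at all
            warnings.append(f"negative control '{control_name}' shows signal: {sorted(alleles)}")
    for locus, alleles in sample_profile.items():
        if len(set(alleles)) > 2:  # one person normally has at most two alleles per locus
            warnings.append(f"{locus} shows more than two alleles: {sorted(set(alleles))}")
    return warnings

# Hypothetical run: an extraction blank, a PCR set-up blank, and one evidence sample.
controls = {"extraction blank": [], "PCR blank": [7]}
evidence = {"locus_A": [2, 5], "locus_B": [3, 6, 8]}

for warning in contamination_warnings(controls, evidence):
    print(warning)
```

A contaminant that happens to share alleles with the true contributor, or that supplies the only DNA in a degraded sample, would pass both checks, which is the point made in the text about samples lacking individual signs of contamination.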

Dealing with contamination

PCR-based technology has an interesting history.  In PCR's history, contamination has often led to false results, and erroneous actions have been taken based on those results.  This has led some  investigators to discontinue and denounce the technology as being, "too sensitive."  In research, PCR-based results face routine, often vigorous scrutiny for the possibility that contamination may have influenced the results.  
But, PCR also has an unparalleled advantage of powerfully increasing the amount of DNA from small samples.  This can be a great advantage in both research and forensics.  For that reason, many investigators use PCR.  
Fortunately, there are ways of dealing with contamination, or at least limiting its influence:
1.  It is extremely important to run negative controls and background controls through the entire procedure.  Such controls are virtually the only way of detecting low-level contaminating DNA molecules.
2.  Once contamination has been detected, it is important to discard all current reagents and clean relevant equipment and work surfaces.  Bleach is useful for cleaning.  However, not all equipment can be cleaned with bleach.  Some laboratories effectively use gas flames to rid metal utensils of DNA.
3.  Thermal cyclers (where PCR is carried out) need to be cleaned. It is not unusual for sample tubes to leak DNA in the thermal cycler. Such tubes become soft during temperature extremes and they do not always seal properly. It is also not unusual for sample tubes to have minuscule pin-holes. Sample contamination due to contaminated thermal cyclers has been documented. Hot soapy water, a sponge and a round scrub brush are useful for cleaning thermal cyclers and their sample-tube wells.
4.  Of course the contamination event should be discussed.  However, discussion alone is rarely, if ever, sufficient since it may lead to rationalization of the event and failure to correct it.
5.  It is critically important to store samples in proper containers and keep known samples well-segregated from other evidence, particularly evidentiary samples that have small amounts of DNA.  Paper envelopes or wax-paper folds are unsuitable containers.
6.  The laboratory should be extremely careful not to overstate the scientific value of the evidence. For example, reports that a profile occurs in 1 in a billion randomly selected individuals imply far greater reliability than the proven error rate of the technology supports, since false convictions based on DNA evidence have been established. Perhaps such rare match probabilities could be reached if thoroughly independent samples produced the same results in multiple, independent, non-communicating laboratories. But, for single laboratories, extremely rare match probabilities misrepresent the scientific value of the technology.
Some laboratories prefer to trace the contaminants. But many find it is more time-efficient to perform a general cleaning and reagent replacement. The latter makes sense because contaminant sources often vary, and time spent tracing the contaminant can be easily wasted. It is important for laboratories to have procedures that effectively detect, acknowledge and deal with contamination.
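The concern in point 6 about very rare match probabilities can be illustrated with a little arithmetic. The Python sketch below combines a random match probability with a laboratory error rate, assuming the two are independent. The error rate of 1 in 1,000 is a purely hypothetical number chosen for illustration; it is not a measured rate for any laboratory or method.

```python
def chance_of_false_match_report(random_match_probability: float,
                                 lab_error_rate: float) -> float:
    """Chance that a reported match is wrong through coincidence or through error,
    assuming the two are independent."""
    return 1.0 - (1.0 - random_match_probability) * (1.0 - lab_error_rate)

random_match_probability = 1e-9   # "1 in a billion", as in the example in the text
lab_error_rate = 1e-3             # hypothetical, for illustration only

combined = chance_of_false_match_report(random_match_probability, lab_error_rate)
print(f"combined chance of a false match report: about 1 in {1 / combined:,.0f}")
```

Whatever the true error rate is, the combined figure can never be better than that rate, so the rarer number quoted alone gives a misleading picture.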