Biology 332 BIOINFORMATICS
EXERCISE 6. STR AND DNA FORENSICS (CODIS)
REVIEW OF MENDEL’S LAWS OF INHERITANCE, HARDY-WEINBERG EQUILIBRIUM, AND DNA FORENSICS
DNA forensics has become an important tool for solving social problems in modern times. Part of the power of this technology lies in providing robust statistical values to refute or support statements of guilt, paternity, ancestry, and identity. To fully understand the technology, it is imperative to understand the ideas and principles that form the biological basis of its power. The first two ideas came from Gregor Mendel (1865): the law of segregation and the law of independent assortment. They describe how you inherit your genes and how genetic diversity is generated. The third idea is the Hardy-Weinberg equilibrium, which describes how genetic material, in the form of genotypes and alleles, is structured in a population. Your instructor will give a brief discussion of these ideas.
Population geneticists use the Hardy-Weinberg equilibrium to determine whether evolution has occurred. The formula is based on Mendel’s laws of inheritance and allows us to track changes in allelic and genotypic frequencies over time. It is also the theoretical foundation of modern DNA forensic analysis. A short lecture by the instructor will explain how this is done.
Any change in the gene frequencies of a population over time can be detected. The law states that if no evolution is occurring, then an equilibrium of allele frequencies will remain in effect in each succeeding generation of sexually reproducing individuals. In order for equilibrium to remain in effect (i.e., for no evolution to occur), the following five conditions must be met:
No mutations must occur so that new alleles do not enter the population.
No gene flow can occur (i.e. no migration of individuals into, or out of, the population).
Random mating must occur (i.e., individuals must pair by chance).
The population must be large so that no genetic drift (random chance) can cause the allele frequencies to change.
No selection can occur so that certain alleles are not selected for, or against.
Obviously, the Hardy-Weinberg equilibrium cannot exist in real life. Some or all of these forces act on living populations at various times, and evolution at some level occurs in all living organisms. The Hardy-Weinberg formulas allow us to detect allele frequencies that change from generation to generation, thus providing a simplified method of determining that evolution is occurring. There are two formulas that must be memorized:
$p^2 + 2pq + q^2 = 1$ and $p + q = 1$
$p$ = frequency of the dominant allele in the population
$q$ = frequency of the recessive allele in the population
$p^2$ = proportion of homozygous dominant individuals
$q^2$ = proportion of homozygous recessive individuals
$2pq$ = proportion of heterozygous individuals
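The two formulas can be applied mechanically once the proportion of homozygous recessive individuals is known. The following is a minimal sketch, assuming a single gene with two alleles and complete dominance; the function name and the example value of 0.16 are illustrative assumptions and are not taken from the problems in this exercise.

```python
import math


def hardy_weinberg_from_recessive(q2):
    """Derive allele and genotype frequencies from the homozygous
    recessive proportion q^2, assuming Hardy-Weinberg equilibrium."""
    if not 0.0 <= q2 <= 1.0:
        raise ValueError("q2 must be a proportion between 0 and 1")
    q = math.sqrt(q2)   # recessive allele frequency
    p = 1.0 - q         # dominant allele frequency, from p + q = 1
    return {
        "p": p,
        "q": q,
        "AA": p * p,      # homozygous dominant
        "Aa": 2 * p * q,  # heterozygous
        "aa": q * q,      # homozygous recessive
    }


# Illustrative value only: 16% of individuals are homozygous recessive.
freqs = hardy_weinberg_from_recessive(0.16)
for name, value in freqs.items():
    print(name, round(value, 4))
```

With a recessive proportion of 0.16, the sketch gives q = 0.4 and p = 0.6, so the three genotype proportions are 0.36, 0.48, and 0.16, which add up to 1 as the first formula requires.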
For a multiple-allele system, such as a gene with three (3) alleles in Hardy-Weinberg equilibrium, the allelic frequencies satisfy:
$p + q + r = 1$
and the genotypic frequency equation is:
$p^2 + 2pq + q^2 + 2pr + 2qr + r^2 = 1$
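The three-allele equation is the expansion of (p + q + r) squared, and the same pattern extends to any number of alleles. The sketch below enumerates every genotype and its expected frequency; the allele labels A, B, C and the frequencies 0.5, 0.3, 0.2 are made-up values chosen only for illustration.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium
    for a gene with any number of alleles."""
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    result = {}
    # Each unordered pair of alleles is one genotype.
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        fa, fb = allele_freqs[a], allele_freqs[b]
        # Homozygotes contribute p^2; heterozygotes contribute 2pq.
        result[a + b] = fa * fa if a == b else 2 * fa * fb
    return result


# Hypothetical allele frequencies for illustration.
example = genotype_frequencies({"A": 0.5, "B": 0.3, "C": 0.2})
for genotype, freq in example.items():
    print(genotype, round(freq, 4))
print("total", round(sum(example.values()), 4))
```

Three alleles yield six genotypes (AA, AB, AC, BB, BC, CC), matching the six terms of the genotypic frequency equation.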
INTRODUCTION TO STR
DNA is present in nearly every cell of our bodies, and we leave cells behind everywhere we go without even realizing it. Flakes of skin, drops of blood, hair, and saliva all contain DNA that can be used to identify us. In fact, the study of forensics, commonly used by police departments and prosecutors around the world, frequently relies upon these small bits of shed DNA to link criminals to the crimes they commit. This fascinating science is often portrayed on popular television shows as a simple, exact, and infallible method of finding a perpetrator and bringing him or her to justice. In truth, however, teasing out a DNA fingerprint and determining the likelihood of a match between a suspect and a crime scene is a complicated process that relies upon probability to a greater extent than most people realize. Government-administered DNA databases, such as the Combined DNA Index System (CODIS), do help speed the process, but they also bring to light complex ethical issues involving the rights of victims and suspects alike. Thus, understanding the ways in which DNA evidence is obtained and analyzed, what this evidence can tell investigators, and how this evidence is used within the legal system is critical to appreciating the true ethical and legal impact of forensic genetics.
How Does DNA Identification Work?
Although the overwhelming majority of the human genome is identical across all individuals, there are regions of variation. This variation can occur anywhere in the genome, including areas that are not known to code for proteins. Investigation into these noncoding regions reveals repeated units of DNA that vary in length among individuals. Scientists have found that one particular type of repeat, known as a short tandem repeat (STR), is relatively easily measured and compared between different individuals. In fact, the Federal Bureau of Investigation (FBI) has identified 13 core STR loci that are now routinely used in the identification of individuals in the United States, and Interpol has identified 10 standard loci for the United Kingdom and Europe. Nine STR loci have also been identified for Indian populations.
As its name implies, an STR contains repeating units of a short (typically three- to four-nucleotide) DNA sequence. The number of repeats within an STR is referred to as an allele. For instance, the STR known as D7S820, found on chromosome 7, contains between 5 and 16 repeats of GATA. Therefore, there are 12 different alleles possible for the D7S820 STR. An individual with D7S820 alleles 10 and 15, for example, would have inherited a copy of D7S820 with 10 GATA repeats from one parent, and a copy of D7S820 with 15 GATA repeats from his or her other parent. Because there are 12 different alleles for this STR, there are 78 different possible genotypes, or pairs of alleles. Specifically, there are 12 homozygotes, in which the same allele is received from each parent, as well as 66 heterozygotes, in which the two alleles are different.
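The genotype count above follows from simple counting: n alleles give n homozygotes plus n(n - 1)/2 heterozygotes. A short sketch of that arithmetic, using the D7S820 repeat range from the text (the function name is an illustrative choice):

```python
from math import comb


def genotype_counts(n_alleles):
    """Return (homozygotes, heterozygotes, total genotypes) for a locus."""
    homozygotes = n_alleles             # one per allele
    heterozygotes = comb(n_alleles, 2)  # unordered pairs of different alleles
    return homozygotes, heterozygotes, homozygotes + heterozygotes


# D7S820 alleles range from 5 to 16 repeats, inclusive.
d7s820_alleles = list(range(5, 17))
print(len(d7s820_alleles))                   # 12
print(genotype_counts(len(d7s820_alleles)))  # (12, 66, 78)
```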
The Statistical Strength of a 13-STR Profile
Within the U.S., the 13-STR profile is a widely used means of identification, and this technology is now routinely employed to identify human remains, to establish or exclude paternity, or to match a suspect to a crime scene sample.
In order to utilize STR information as a means of human identification, the FBI established the frequency with which each allele of each of the 13 core STRs naturally occurs in people of different ethnic backgrounds. To this end, the FBI analyzed DNA samples from hundreds of unrelated Caucasian, African American, Hispanic, and Asian individuals. Assuming that all 13 STRs follow the principle of independent assortment (and they should, as they are scattered widely across the genome) and that the population randomly mates, a statistical calculation based upon the FBI-determined STR allele frequencies reveals that the probability of two unrelated Caucasians having identical STR profiles, or so-called “DNA fingerprints,” is approximately 1 in 575 trillion (Reilly, 2001).
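The calculation described above combines two rules: within a locus, the expected genotype frequency is p^2 for a homozygote and 2pq for a heterozygote; across independent loci, the per-locus frequencies are multiplied together. The sketch below illustrates this under those assumptions. The locus names and allele frequencies are invented placeholders, not values from the FBI database.

```python
def locus_genotype_frequency(allele_freqs, allele1, allele2):
    """Expected frequency of one genotype at one locus under
    Hardy-Weinberg equilibrium."""
    p = allele_freqs[allele1]
    q = allele_freqs[allele2]
    return p * p if allele1 == allele2 else 2 * p * q


def profile_match_probability(profile, freq_table):
    """Multiply per-locus genotype frequencies (product rule),
    assuming the loci assort independently."""
    probability = 1.0
    for locus, (allele1, allele2) in profile.items():
        probability *= locus_genotype_frequency(
            freq_table[locus], allele1, allele2
        )
    return probability


# HYPOTHETICAL allele frequencies, for illustration only.
freq_table = {
    "LocusA": {10: 0.20, 15: 0.10},
    "LocusB": {8: 0.30},
}
# Heterozygous 10,15 at LocusA; homozygous 8,8 at LocusB.
profile = {"LocusA": (10, 15), "LocusB": (8, 8)}
print(profile_match_probability(profile, freq_table))
```

In this made-up case the per-locus values are 2(0.20)(0.10) = 0.04 and (0.30)^2 = 0.09, so the combined value is 0.0036, or about 1 in 278. Each additional independent locus multiplies in another small number, which is why a 13-locus profile reaches such extreme odds.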
This very small number needs to be put into perspective. Note that this figure refers to pairs of people, and there are many pairs of people in the world. Indeed, for the 100 million Caucasians in the world, there are 5,000 trillion pairs of people, so roughly eight or nine pairs would be expected to match at the 13 STR loci. This predicted matching does not specify which profile is shared by two people, and the chance that anyone matches the particular profile associated with a crime is still very small. The distinction between two people sharing a profile and one person having a particular profile is an example of the so-called “birthday problem.” Here, the probability that a person has a particular birthday is 1 in 365, ignoring February 29, but there is a 50% chance that two people in a random group of 23 people have the same unspecified birthday (Weir, 2007).
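Both figures in this paragraph can be checked with a few lines of arithmetic. The sketch below counts the pairs in a population of 100 million, multiplies by the 1 in 575 trillion pairwise match probability quoted above, and computes the classic birthday-problem probability for a group of 23. The function names are illustrative.

```python
from math import comb


def expected_matching_pairs(population_size, pair_match_probability):
    """Expected number of pairs sharing a profile: the number of
    distinct pairs times the per-pair match probability."""
    return comb(population_size, 2) * pair_match_probability


def shared_birthday_probability(group_size, days=365):
    """Probability that at least two people in the group share a
    birthday, ignoring February 29 and assuming uniform birthdays."""
    p_all_distinct = 1.0
    for i in range(group_size):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


pairs = comb(100_000_000, 2)
print(pairs)  # about 5 x 10^15, i.e. roughly 5,000 trillion pairs
print(expected_matching_pairs(100_000_000, 1 / 575e12))
print(shared_birthday_probability(23))
```

The expected number of matching pairs comes out between eight and nine, and the birthday probability for 23 people is just over 50%, consistent with the figures in the text.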
Figure: Location of the 13 STR sequences on human chromosomes.
Figure: DNA sequence of one of the STRs used by CODIS.
The number of repeats determines the allele identity. This STR has 12 repeats of the sequence AGAT and is named allele 12. The allele for this STR having 13 repeats is named allele 13.
Does the DNA Databank System Help Solve Crimes?
The current DNA database maintained by the FBI, known as the Combined DNA Index System (CODIS), contains case samples (DNA samples from crime scenes or “rape kits”) and individuals’ samples (collected from convicted felons or arrestees) that are compared automatically by the system’s software as new samples are entered. As of February 2007, CODIS had produced over 45,400 “hits,” which assisted in more than 46,300 investigations (Federal Bureau of Investigation, n.d.). However, contrary to how DNA analysis is portrayed on popular television shows, DNA samples are not analyzed within the course of an hour. Rather, the U.S. currently has an enormous backlog of samples waiting to be typed and entered into the database. Some of these samples are from cases that have outlasted their statutes of limitation, so even if these samples could help solve a crime, the crime can no longer be tried.
This delay brings up the dilemma of the validity of statutes of limitation. These statutes were established at a time when large quantities of physical evidence were required to match a suspect to a sample and when extended time periods significantly decreased law enforcement’s ability to find a match, as well as the likelihood of successful prosecution. With the advent of DNA databanks and the possibility of storing samples indefinitely, the very notion of a statute of limitation now seems extremely outdated.
Of course, there are many other debatable issues concerning DNA banking. For instance, should the original tissue sample be stored indefinitely after the DNA profile has been entered into the database? Detractors note threats to genetic privacy, but proponents argue that future DNA typing methods will undoubtedly be developed and that old samples might have to be reanalyzed using new techniques. Also at issue is the reopening of old cases on the basis of new (DNA-based) evidence. Which cases should be eligible for reanalysis in light of this new evidence? Can equitable rules be established to allow reexamination of cases that were analyzed with less powerful lab techniques? Further public awareness of the power of DNA forensic technology will help lawmakers decide these issues in a way that seeks to strike a balance between protecting individuals’ genetic privacy and protecting innocent citizens from crime.
SOLVING POPULATION GENETICS PROBLEMS
PROBLEM #1. You have sampled a population in which you know that the percentage of the homozygous recessive genotype (aa) is 36%. Using that 36%, calculate the following:
The frequency of the “aa” genotype.
The frequency of the “a” allele.
The frequency of the “A” allele.
The frequencies of the genotypes “AA” and “Aa.”
The frequencies of the two possible phenotypes if “A” is completely dominant over “a.”
PROBLEM #2. Sickle-cell anemia is an interesting genetic disease. Normal homozygous individuals (SS) have normal blood cells that are easily infected with the malarial parasite. Thus, many of these individuals become very ill from the parasite and many die. Individuals homozygous for the sickle-cell trait (ss) have red blood cells that readily collapse when deoxygenated. Although malaria cannot grow in these red blood cells, individuals often die because of the genetic defect. However, individuals with the heterozygous condition (Ss) have some sickling of red blood cells, but generally not enough to cause mortality. In addition, malaria cannot survive well within these “partially defective” red blood cells. Thus, heterozygotes tend to survive better than either of the homozygous conditions. If 9% of an African population is born with a severe form of sickle-cell anemia (ss), what percentage of the population will be more resistant to malaria because they are heterozygous (Ss) for the sickle-cell gene?
PROBLEM #3. There are 100 students in a class. Ninety-six did well in the course whereas four blew it totally and received a grade of F. Sorry. In the highly unlikely event that these traits are genetic rather than environmental, if these traits involve dominant and recessive alleles, and if the four (4%) represent the frequency of the homozygous recessive condition, please calculate the following:
The frequency of the recessive allele.
The frequency of the dominant allele.
The frequency of heterozygous individuals.
PROBLEM #4. A population of 1000 is in Hardy-Weinberg equilibrium for a gene with three alleles (A, B, and C). The AA genotype (frequency p^2) has 100 individuals, the BB genotype (frequency q^2) has an unknown number of individuals, and the CC genotype (frequency r^2) has 16 individuals. Estimate the allelic and genotypic frequencies of that population. How many individuals would be AB heterozygotes? How many would be BC heterozygotes?
GIVEN THE FBI STR ALLELIC FREQUENCY DATABASE ASSIGNED TO YOU, CALCULATE THE MATCH PROBABILITY BASED ON THE THREE STR LOCI ASSIGNED TO YOU AND DESCRIBE WHAT IT MEANS IN A COURT OF LAW (in terms of a perfect match between the DNA evidence found and the suspect’s DNA).
Mendel, G. 1865. Versuche über Pflanzenhybriden (Experiments on Plant Hybridization). Verhandlungen des naturforschenden Vereines in Brünn, Bd. IV für das Jahr 1865, Abhandlungen, 3–47.
Norrgard, K. 2008. Forensics, DNA fingerprinting, and CODIS. Nature Education 1:35.
CHOICE 1. Name : __________________________________
Calculate the match probability value P for each of the following loci, and then the total match probability.
Locus TH01 alleles 6, 7 SE Hispanics
Locus D22S1045 alleles 10, 11 SE Hispanics
Locus D2S441 alleles 13, 14 SE Hispanics
Calculate the match probability value P for each of the following loci, and then the total match probability.
Locus D3S1358 alleles 13, 13 Caucasian
Locus D8S1179 alleles 12, 15 Caucasian
Locus DYS391 alleles 9, 10 Caucasian