Experiment 4: Defining significance
After you have completed the previous experiments, you should have found that there are three different classes of organisms in the F2 generation. Approximately
How do we make sense of these ratios? Mendel's solution was to postulate that traits were produced, or determined, by cellular factors. Mendel's factors were given their current name, genes, by the Danish scientist and plant breeder Wilhelm Johannsen. Genes can exist in different forms, originally called allelomorphs but now known as alleles. In diploid organisms, such as peas and people, each gene is present in two copies: one allele supplied by the maternal parent and one allele supplied by the paternal parent (both parents contribute equally). These maternal and paternal alleles can be the same or different. For example, consider flower color. Assume that the dominant purple color is determined by the presence of the "P" allele of a specific gene. The inbred purple parental strain has two copies of this allele; it is "PP". The gametes that an organism produces contain one and only one copy of each gene, so a PP organism can only produce gametes containing the P version of the gene.
In contrast, white flower color is determined by the presence of a different allele, which we will call "p"; the inbred white-flowered parental strain is "pp" and can produce only gametes that contain p. If we cross true-breeding purple-flowered and white-flowered plants, all of the F1 offspring will be "Pp". These individuals can produce two types of gametes: ones that contain the P allele and ones that contain the p allele.
At this point Mendel made another assumption: he assumed that an F1 Pp organism is just as likely to produce a P gamete as a p gamete. Based on this assumption, it is possible to make some very specific numerical predictions. Our task is to determine whether the data we obtain when we cross plants are consistent with these predictions or contradict them. This generally involves what is known as a "test for statistical significance".
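The counting argument that leads from Mendel's equal-gamete assumption to numerical predictions can be sketched in a few lines of Python. This is a minimal illustration, assuming a single gene with two alleles, complete dominance of P, and equally likely gametes; the function names `cross` and `phenotypes` are hypothetical.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes, treating every pairing of gametes as equally likely."""
    offspring = Counter()
    # Each parent passes on exactly one of its two alleles; product() pairs every
    # possible maternal gamete with every possible paternal gamete exactly once.
    for maternal, paternal in product(parent1, parent2):
        # Sort so "pP" and "Pp" are counted together (uppercase sorts first).
        offspring["".join(sorted(maternal + paternal))] += 1
    return offspring


def phenotypes(genotypes: Counter, dominant: str = "P") -> Counter:
    """Any genotype carrying the dominant allele shows the dominant (purple) color."""
    result = Counter()
    for genotype, count in genotypes.items():
        result["purple" if dominant in genotype else "white"] += count
    return result


f1 = cross("PP", "pp")
f2 = cross("Pp", "Pp")
print(f1)              # Counter({'Pp': 4})
print(f2)              # Counter({'Pp': 2, 'PP': 1, 'pp': 1})
print(phenotypes(f2))  # Counter({'purple': 3, 'white': 1})
```

The F2 cross gives a 1 PP : 2 Pp : 1 pp genotype ratio and therefore a 3 purple : 1 white phenotype ratio. Multiplying these fractions by the number of F2 plants actually counted gives the expected values that a significance test compares against the observed values.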
In the case of genetic data, a common statistical test is the χ² (chi-squared) test. This test was developed by Karl Pearson (1857-1936), one of the founders of modern statistics and of its application to genetics and evolutionary studies.
In a χ² test, we begin by determining the number of "degrees of freedom" (df) in our system. We can think of degrees of freedom as the number of independent measurements needed to completely define the behavior of a system. Consider dice: a conventional "western" die has six sides. If we know the total number of times we threw the die, and the number of times any five of the six faces came up, we automatically know how many times the sixth face came up. The system is therefore said to have five degrees of freedom. We will explore the use of the χ² test in a set of experiments to determine whether a particular die is "fair". First, what does fair mean? For a standard die to be fair, the probability that it will land on any of its six faces should be equal.
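The bookkeeping behind degrees of freedom can be checked with a few lines of Python. This is a minimal sketch: the tallies are hypothetical, the helper names are made up for illustration, and the rule df = (number of categories) - 1 assumes, as in the die example, that the expected proportions are fixed in advance rather than estimated from the data.

```python
def degrees_of_freedom(num_categories: int) -> int:
    """df for a goodness-of-fit test whose expected proportions are fixed in advance."""
    return num_categories - 1


def infer_last_count(total_rolls: int, known_counts: list[int]) -> int:
    """Once the total and all but one count are known, the last count is forced."""
    return total_rolls - sum(known_counts)


# Hypothetical tallies for faces 1-5 out of 60 rolls (illustrative only).
counts_1_to_5 = [8, 12, 9, 11, 6]
print(infer_last_count(60, counts_1_to_5))  # 14: the sixth face is not free to vary
print(degrees_of_freedom(6))                # 5
```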
As in any experiment, we begin by forming a working hypothesis. In this case, we assume that the die is fair, although we could equally well begin with the opposite hypothesis, that it is not fair. We choose "fair" because it makes a very simple prediction, namely that the probabilities of rolling a 1, 2, 3, 4, 5 or 6 are all equal: we expect to see each face 1/6 of the time.
To test our hypothesis (the die is fair), we do an experiment: we roll the die some number of times and note how many times each face comes up. (Are there other ways to test for fairness?) If we roll the die 60 times, we expect each face to appear 10 times. However, because each roll of the die is independent and random, it is extremely unlikely that each face will come up exactly 10 times in any particular set of 60 rolls. How, then, do we decide whether the difference between the "expected" number of times a face appears (10, or one roll in six) and the "observed" number of times it actually appeared is due to chance or to the die being unfair? We use the χ² formula, summing over every possible outcome i (here, the six faces):
$$\chi^2 = \sum_{i} \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i}$$
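For each outcome we want a positive contribution, so we square the difference between the observed and expected counts; dividing by the expected count scales each squared difference relative to the size of the prediction. If observed_i = expected_i for every outcome, χ² is zero; the smaller χ², the more closely our observations agree with the predictions of our hypothesis.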
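As a worked example of the formula, the Python sketch below computes χ² for one set of 60 rolls. The observed counts are hypothetical, chosen only to illustrate the arithmetic, and `chi_square_statistic` is an illustrative helper name rather than a library function.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Pearson's chi-squared: sum of (observed - expected)^2 / expected over all outcomes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of one die (illustrative, not real data).
observed = [8, 12, 9, 11, 6, 14]
expected = [60 / 6] * 6  # a fair die predicts 10 of each face
print(round(chi_square_statistic(observed, expected), 3))  # 4.2
```

Each face here misses the prediction of 10 by 1, 2 or 4, giving contributions of 0.4, 0.4, 0.1, 0.1, 1.6 and 1.6, which sum to 4.2. Note how the two faces that are off by 4 dominate the total, because the differences are squared.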
So, how large can χ² be before we seriously question the validity of our hypothesis? Because unlikely events (like winning the lottery) do occur, we must look at our data skeptically. We are not trying to determine with absolute certainty whether our hypothesis is right or wrong; rather, we are asking how likely our observations would be if the hypothesis were correct. We seek to estimate the probability that chance alone would produce a deviation from the expected values at least as large as the one we observed, assuming our hypothesis is true. To analyze our results, we use a table of critical values. Which critical value we use is determined by the degrees of freedom in the system (df) and by how stringently we seek to test our hypothesis.
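The typical standard is a significance level of α = 0.05. If our calculated χ² value is larger than the critical value for α = 0.05, a deviation that large would arise simply by chance less than 1 time out of 20 if our hypothesis were correct, so we reject the hypothesis. If χ² is smaller than the critical value, our results are consistent with the hypothesis (although this does not prove it is true). A stricter significance level (e.g., α = 0.01) corresponds to a larger critical value: we require an even larger deviation before rejecting the hypothesis, which lowers the chance of rejecting a correct hypothesis purely by chance.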
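Chi-square critical values (α = significance level)
df | α = 0.05 | α = 0.01
1 | 3.841 | 6.635
2 | 5.991 | 9.210
3 | 7.815 | 11.345
4 | 9.488 | 13.277
5 | 11.070 | 15.086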
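Printed tables list only a few α levels. As an alternative, the upper-tail probability (the p-value) of the χ² distribution can be computed directly and compared with α. The sketch below is illustrative, not a reference implementation: it assumes only Python's standard library and uses the standard series expansion of the regularized incomplete gamma function, and `chi_square_p_value` and `is_significant` are hypothetical helper names. In practice, a statistics package such as SciPy (`scipy.stats.chi2.sf`) would normally be used.

```python
import math


def chi_square_p_value(statistic: float, df: int) -> float:
    """Probability that a chi-squared variable with df degrees of freedom is >= statistic."""
    if statistic <= 0:
        return 1.0
    a = df / 2.0         # chi-squared is a gamma distribution with shape df/2 ...
    x = statistic / 2.0  # ... and scale 2
    # Series expansion of the regularized lower incomplete gamma function P(a, x).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)  # upper tail = the p-value


def is_significant(statistic: float, df: int, alpha: float = 0.05) -> bool:
    """True if a deviation this large would arise by chance less often than alpha."""
    return chi_square_p_value(statistic, df) < alpha


print(round(chi_square_p_value(11.070, 5), 3))  # 0.05
print(is_significant(4.2, 5))                     # False
```

A statistic equal to the α = 0.05 critical value for 5 degrees of freedom (11.070) gives a p-value of about 0.05. A χ² of 4.2 with 5 degrees of freedom is far from that threshold, so a die producing it would not be judged unfair.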
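A result is called "statistically significant" at the 0.05 level when its deviation from the prediction is so large that it would occur by chance less than 5% of the time if the hypothesis were true; in that case we reject the hypothesis. Statistical significance is different from the everyday meaning of "significant" (important). It is important to remember that we cannot attain certainty through the use of the chi-square test, but we can assess our uncertainties.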
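One way to see what the 0.05 standard means is to simulate a die that is fair by construction and count how often its χ² value nonetheless exceeds the α = 0.05 critical value for 5 degrees of freedom (11.070). The sketch below assumes Python's `random` module as a stand-in for real rolls, and its helper names are hypothetical. If the logic of the test holds, a fair die should be wrongly flagged as unfair in roughly 5% of 60-roll experiments.

```python
import random

FACES = 6


def chi_square_statistic(observed, expected):
    """Pearson's chi-squared: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def roll_counts(rng: random.Random, rolls: int) -> list[int]:
    """Tally the faces of a simulated fair die (faces indexed 0..5)."""
    counts = [0] * FACES
    for _ in range(rolls):
        counts[rng.randrange(FACES)] += 1
    return counts


def false_rejection_rate(trials=5000, rolls=60, critical_value=11.070, seed=1):
    """Fraction of fair-die experiments whose chi-squared exceeds the critical value."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    expected = [rolls / FACES] * FACES
    rejections = 0
    for _ in range(trials):
        if chi_square_statistic(roll_counts(rng, rolls), expected) > critical_value:
            rejections += 1
    return rejections / trials


print(false_rejection_rate())  # typically close to 0.05
```

The simulated die is fair, yet a small fraction of experiments still "fail" the test. This is the uncertainty the χ² test lets us quantify but never eliminate.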
Experiment 4 directions:
Use Wikipedia to look up concepts.
Edited/revised 09-Dec-2005