Vol. 6, nos. 4-5

SOME STATISTICAL CONCEPTS IN TAXONOMY

by Nicholas Shoumatoff

In my profession of process engineering in the pulp and paper industry, I have been confronted on numerous occasions with problems involving the statistical analysis of observational and experimental data. I have been impressed with the power of analysis made available by modern statistical methods. The possibilities of applying these methods to the taxonomy of Lepidoptera, as discussed in the recent series of articles in the Lep. News by F. Martin Brown, have therefore interested me keenly.

The statistical methods of modern experimental science were largely developed in biology, although some of the most significant early contributions (1908) were made by "Student", an anonymous industrial engineer. The full power of today's methods, however, based on the principle of fiducial probability, is largely due to R. A. Fisher, whose specialty of genetics is certainly related to taxonomy. I therefore anticipated no difficulty in attempting statistical reversion from technology to Lepidoptera. For this reason it was quite a surprise to find that, in Mr. Brown's article in the News, Vol. 5: pp. 64-66, some basic concepts are recommended which are quite different from those which I have encountered elsewhere.

Mr. Brown indicates that in taxonomy valid tests of significance should be based on probability levels of a vastly different order of magnitude from those commonly used in other branches of science. He further indicates that the required probability level can be uniquely determined from the size of the sample and the size of the population from which it is drawn. This was illustrated with two examples, without details of the mathematical procedure employed.

I am not in a position to take issue with these recommendations directly, but I would like to point out (in the hope of obtaining information which may resolve this conflict) how they differ from what I have understood to be the basic concepts underlying modern procedures of statistical inference.

The first principle involved in the case is that statistical methods do not establish what can be known with certainty, but only what can be expected with any desired degree of confidence. The specification of the degree of confidence, which is necessary for any test of significance, is purely subjective, although certain definite criteria have been established by custom. Published tables of statistical functions frequently do not extend beyond the 99.9% level of confidence.

The implications of selecting the level of confidence should be clearly understood. If in a given test the 95% confidence level is established as the criterion of significance, it means that in accepting the significance of results which meet this criterion the chance of error is 5%. The chance of error in rejecting the significance of a result which does not meet this criterion is not uniquely determined by the selected confidence level, but depends on the degree of difference between the true and assumed values of the quantity being investigated. It may be as high as 95%. It may be seen that if the confidence level is changed to a higher value to reduce the chance of accepting a non-significant result, the chance of rejecting a result which may be significant is correspondingly increased. Greater certainty in the first type of judgement can be acquired only at the price of reduced sensitivity in the

1952

The Lepidopterists' News


second type of judgement. In certain investigations, particularly those which are in a preliminary stage, it is often desirable to follow up an indication even though in the end it may prove to be insignificant. In such cases a lower confidence criterion must be allowed. However, the minimum confidence level commonly used is 95%, which with large samples is approximately equivalent to two standard deviations.
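The trade-off between the two types of judgement can be made concrete with a minimal sketch. It assumes a two-sided test on a normally distributed statistic with a known standard error, a large-sample simplification that the text does not itself specify; the function name `miss_chance` and the chosen shifts are purely illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; assumes a large-sample (z) test


def miss_chance(confidence: float, true_shift: float) -> float:
    """Chance of failing to declare significance when the true value differs
    from the assumed value by `true_shift` standard errors (two-sided test)."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + confidence / 2.0)
    # The result is called "not significant" whenever the observed statistic
    # falls between -z_crit and +z_crit.
    return STD_NORMAL.cdf(z_crit - true_shift) - STD_NORMAL.cdf(-z_crit - true_shift)


if __name__ == "__main__":
    for confidence in (0.95, 0.99, 0.999):
        row = ", ".join(
            f"shift {shift:.0f}: {miss_chance(confidence, shift):.3f}"
            for shift in (0.0, 1.0, 2.0, 3.0)
        )
        print(f"confidence {confidence:.1%} -> chance of a miss at {row}")
```

Under these assumptions the chance of a miss approaches the confidence level itself as the true difference shrinks toward zero, and for any fixed true difference it rises when the confidence criterion is raised.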

It is understandable that a taxonomist would like to exercise the highest degree of confidence in assigning names to populations of Lepidoptera so that the names will have enduring validity rather than clutter up the literature with synonyms. On the other hand, it is doubtful whether, in certain groups at least, the subspecific structure has been so clearly defined that one can afford to overlook indicated differences at a lower level of confidence. In "The Karanasa Butterflies" (Annals Carnegie Museum, 1951) Avinoff and Sweadner assigned names to every local population that they were able to distinguish. In doing so they realized that future investigations based on more complete data might not uphold all these names. This was felt to be a lesser evil, however, than the danger of confusing two really distinct entities under the same name, as has frequently happened in earlier literature on this group.

It should always be borne in mind that the magnitude of variation corresponding to any given level of confidence is not fixed but depends on the amount of information available. In the absence of a complete census, the true random variation of an entire population is never known as such, but must be estimated from a sample. A most important concept in this connection is the number of degrees of freedom (number of observations minus number of restraints) available for calculation of and comparison with the estimate of error. For example, the number of standard deviations at each probability level is a variable, depending on the number of degrees of freedom, in accordance with "Student's" t-function, tables of which can be found in almost any current book on statistics. The fixed values listed in Mr. Brown's article correspond to infinite degrees of freedom, and are approximately true for large samples only. In most actual cases there are several methods of calculating the estimate of error from the same data, each with a corresponding number of degrees of freedom. A typical example is testing the difference between the means of two samples. If specimens are available from two localities so that they may be arranged in two parallel time series to form, say, ten simultaneous pairs, and if one measurement is made on each specimen, the mean difference between the two localities can be tested by comparison with four different estimates of error as follows:

Square deviations of                                Degrees of    Value of "t" at
                                                    freedom       95% confidence

Individual values from general mean                     19            2.093
Individual values from mean of each locality            18            2.101
Differences in each pair from zero                      10            2.228
Differences in each pair from average difference         9            2.262

The last of these methods has the fewest degrees of freedom and the highest "t," yet it is often the most sensitive method because the variance among pairs and between localities has been eliminated from the estimate of error. Whichever of these four methods yields the highest confidence level is the one whose result must be considered. These differences in sensitivity due to different sources of variation should not be confused with the general principle that, regardless of the methods of statistical reduction employed, all tests of significance of the same hypothesis based on the same sources of variation in the same set of data are bound, if correctly carried out, to yield exactly the same result, barring only the use of inefficient statistics.
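The difference in sensitivity between the second and fourth estimates of error in the table can be illustrated with a short sketch. The measurements below are hypothetical and chosen so that both localities drift together over time; the function names are illustrative, and only the two textbook forms (pooled, 18 degrees of freedom; paired, 9 degrees of freedom) are shown, not all four rows.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements (not from the article): ten simultaneous pairs,
# one specimen per locality per date. Both series drift together over time.
LOCALITY_A = [20.1, 22.3, 24.8, 21.0, 26.2, 23.5, 25.1, 19.8, 27.0, 22.9]
LOCALITY_B = [20.6, 23.1, 25.1, 21.6, 27.1, 23.9, 25.8, 20.3, 27.6, 23.6]

# Two-sided 95% values of "t" taken from the table above.
T_95 = {18: 2.101, 9: 2.262}


def pooled_t(a, b):
    """t for the mean difference, with error estimated from deviations of
    individual values about the mean of each locality (18 d.f. here)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(b) - mean(a)) / std_err, df


def paired_t(a, b):
    """t for the mean difference, with error estimated from deviations of the
    pair differences about their average difference (9 d.f. here)."""
    diffs = [y - x for x, y in zip(a, b)]
    df = len(diffs) - 1
    std_err = sqrt(variance(diffs) / len(diffs))
    return mean(diffs) / std_err, df


if __name__ == "__main__":
    for label, (t, df) in (("pooled", pooled_t(LOCALITY_A, LOCALITY_B)),
                           ("paired", paired_t(LOCALITY_A, LOCALITY_B))):
        verdict = "significant" if abs(t) > T_95[df] else "not significant"
        print(f"{label}: t = {t:.2f} on {df} d.f., "
              f"criterion {T_95[df]} -> {verdict} at 95%")
```

With these invented figures the paired test clears its higher criterion comfortably while the pooled test does not, because the date-to-date variation common to both localities inflates only the pooled estimate of error.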

With the small-sample theory illustrated above, confidence limits can be established just as exactly from small samples as from large samples. However, with larger samples, the limits are narrower. This is due to three separate effects:

1.     The number of degrees of freedom, not the total number of observations, is used in calculating the mean square deviation.

2.     The value of "t" depends on the degrees of freedom.

3.     The standard error of a sample mean varies inversely as the square root of the sample size.

Eventually a point is reached where relatively little is gained by increasing the sample size. This has been called the principle of inertia for large samples.
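The second and third effects can be tabulated with a minimal sketch. It evaluates the two-sided 95% value of "t" by numerical integration of the t density followed by bisection, a simple stand-in for a published table rather than a production routine, and then prints t divided by the square root of the sample size, which is the half-width of the confidence limits for a mean in units of the estimated standard deviation. The names `t_value` and `half_width_factor` are illustrative; the first effect acts through the estimated standard deviation itself and is not shown.

```python
from math import exp, lgamma, log, pi, sqrt


def t_density(x: float, df: int) -> float:
    """Density of "Student's" t distribution with `df` degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def t_central_area(x: float, df: int, steps: int = 1000) -> float:
    """Probability that t lies between -x and +x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_density(0.0, df) + t_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 2 * total * h / 3


def t_value(df: int, confidence: float = 0.95) -> float:
    """Two-sided critical value of t, found by bisection on the central area."""
    low, high = 0.0, 100.0
    for _ in range(40):
        mid = (low + high) / 2
        if t_central_area(mid, df) < confidence:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def half_width_factor(n: int) -> float:
    """Half-width of the 95% limits for a mean, in units of the estimated
    standard deviation, for a single sample of n observations."""
    return t_value(n - 1) / sqrt(n)


if __name__ == "__main__":
    for n in (5, 10, 20, 50, 100, 500, 1000):
        print(f"n = {n:4d}: t = {t_value(n - 1):.3f}, "
              f"half-width factor = {half_width_factor(n):.3f}")
```

The value of "t" settles toward its large-sample limit quickly, after which the only remaining gain comes from the square root of the sample size, so each doubling of the sample buys progressively less.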

The previous discussion is intended to show that statistical methods are objective only insofar as they establish accurate betting odds, but the final step of the procedure, whether or not to accept these odds, involves a subjective decision. In contrast, Mr. Brown has, I believe, suggested that there is an absolute scientific basis for completing this final step.

A second fundamental concept involved in this case has to do with the character of the population. The basic calculus of statistical analysis has been derived from the assumption of random sampling from an infinite population. All actual populations are of course finite. Fortunately, if the populations are very large in proportion to the sample, the calculus of infinite populations can be applied with entirely negligible error. The principle of inertia applies to population size as well as sample size. In taxonomy, on the other hand, one is not primarily concerned with actual populations. A sample containing specimens taken over a period of time exceeding the life span of individuals certainly represents more than one actual population. Conclusions drawn from the study of the sample usually refer not only to the actual populations represented, but also to an unspecified number of future populations. The taxonomic unit is an abstraction which does not actually exist in its entirety at any one time. It is partly actual, partly hypothetical, and in effect infinite. It does not appear, therefore, that significance tests based on the calculated size of an actual population are pertinent to the problems of taxonomy, whether or not they are statistically correct. If Mr. Brown's reasoning on this point is followed to its logical conclusion, an infinite deviation would be required if an infinite population is considered.
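The remark that the calculus of infinite populations applies with negligible error to large finite populations can be checked with the usual finite population correction for the standard error of a mean under sampling without replacement. This is the standard textbook correction, offered here only as a sketch; it is not a reconstruction of Mr. Brown's procedure, whose details are not given, and the sample and population sizes are illustrative.

```python
from math import sqrt


def finite_population_factor(sample_size: int, population_size: int) -> float:
    """Factor by which the infinite-population standard error of a mean is
    multiplied when sampling without replacement from a finite population."""
    if population_size < 2 or not 1 <= sample_size <= population_size:
        raise ValueError(
            "need population_size >= 2 and 1 <= sample_size <= population_size"
        )
    return sqrt((population_size - sample_size) / (population_size - 1))


if __name__ == "__main__":
    sample_size = 20  # illustrative sample of twenty specimens
    for population_size in (25, 100, 1_000, 100_000, 10_000_000):
        factor = finite_population_factor(sample_size, population_size)
        print(f"population {population_size:>10,d}: factor = {factor:.6f}")
```

The factor is already close to one when the population is a few hundred times the sample, and it never exceeds one, so a growing population size cannot by itself call for ever wider limits.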

In conclusion, I would like to repeat that these thoughts have been assembled not in a spirit of criticism but in the hope of reaching a more complete mutual understanding, as all those who work in the same field should have. Properly used statistical methods can do much to promote mutual understanding in taxonomy, and Mr. Brown's articles with their high standard of clarity are undoubtedly a most significant contribution in this direction.

Box 333, Bedford, N. Y., U.S.A.