Phylogeny from Craniometrics
As we saw in the previous dispatch, different craniofacial characters are variously under the control of neutral drift, sexual selection, and thermoregulatory adaptation to the paleoclimate. I suggested that kosher inference of phylogeny (i.e., lineage) is difficult because the population-history signal is confounded by natural selection. One way around this would be to control for dimorphism and absolute latitude. That doesn't seem to work: one gets nonsense trees. Might there not be another way?
The law of large numbers suggests that if we average over a large number of characters, second-order factors will get averaged away (provided they do not all push in the same direction), revealing the dominant, first-order term. There is good reason to believe that the controlling factor for cranial morphology is neutral drift. As we shall see, this turns out to be right. In what follows we obtain phylogenetic trees from Hanihara's and Howells' craniometric datasets. The idea is to standardize all characters to have mean 0 and variance 1, and then use distances between populations in this space to back out the underlying phylogenetic relationships.
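The post does not spell out the computation, so here is a minimal sketch of the standardize-then-measure step. It assumes the input is one row of character means per population, and it assumes the distance is the root-mean-square difference across characters (Euclidean distance divided by the square root of the number of characters); other metrics would serve as well. The function names `standardize` and `rms_distance_matrix` are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every column (character) across rows: mean 0, variance 1.

    Assumes `rows` holds one vector of character means per population.
    A character with zero spread carries no information, so it is set to 0.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in rows
    ]


def rms_distance_matrix(z):
    """Root-mean-square difference over characters for every population pair.

    Averaging the squared differences over all k characters means that no
    single character dominates the distance.
    """
    n, k = len(z), len(z[0])
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            msd = sum((a - b) ** 2 for a, b in zip(z[i], z[j])) / k
            d[i][j] = d[j][i] = sqrt(msd)
    return d
```

Because every character is put on the same unit-variance scale first, a measurement recorded in larger units (say, cranial length versus a small foramen) cannot swamp the others.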
Figure 1. Phylogeny from craniofacial characters. Source: Hanihara (2000), author's computations.
Figure 1 displays the phylogenetic tree obtained from the Hanihara sample. Read from the top right, at the base of the tree. The first cluster of seven correctly identifies western Eurasian phylogeny: eastern and western Europe are closest to western Asia, and Europe, North Africa, and western Asia split last from northern and southern India. The second cluster, of Sahul, Negrito, and southern Africa, identifies the most ancient populations. The split between this group and the others is correctly identified as having great time-depth. The bottom supercluster correctly identifies the eastern world, although the tree gives the East-West split as great a time-depth as the San-Sahul split (which is known to be older). But the subdivisions within the eastern supercluster are correctly identified, with the possible exception of the close phylogenetic relationship between circumpolar peoples and Polynesians. All in all, not bad for just half a dozen craniofacial characters.
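The post does not say which tree-building method produced the figures. To illustrate how a distance matrix becomes a tree with split depths, here is a sketch of UPGMA (average-linkage clustering), swapped in as an assumption; neighbor-joining is a common alternative. UPGMA places each merge at half the average distance between the two clusters it joins, so reading merge heights as time-depths assumes morphology diverges at a roughly constant rate. The function `upgma` is hypothetical.

```python
def upgma(names, dist):
    """Average-linkage (UPGMA) tree from a symmetric distance matrix.

    Returns (newick, heights): a Newick string with branch lengths and the
    height of each merge in the order the merges happened. A merge height is
    half the average distance between the two clusters joined.
    """
    # Active clusters: id -> (Newick label, number of leaves, height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    heights = []
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        del d[(a, b)]
        la, na, ha = clusters.pop(a)
        lb, nb, hb = clusters.pop(b)
        h = dab / 2
        heights.append(h)
        label = f"({la}:{h - ha:.4g},{lb}:{h - hb:.4g})"
        # Distance from the new cluster to each remaining one is the
        # leaf-count-weighted average of the two old distances.
        for k in clusters:
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (na * dak + nb * dbk) / (na + nb)
        clusters[next_id] = (label, na + nb, h)
        next_id += 1
    label, _, _ = next(iter(clusters.values()))
    return label + ";", heights
```

Given a population distance matrix, this yields a rooted tree in which each internal node's height serves as a relative time-depth. If the constant-rate assumption fails (for instance, if some lineages drift faster than others), deep splits can come out compressed or stretched relative to one another.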
The Howells dataset has 82 linear measurements on each of 2,524 individuals from thirty populations. The sample is thus large enough for the averaging method to really work. Figure 2 displays the longitude and latitude of the populations in the sample, which will help us judge whether the inferred tree makes sense.
Figure 2. Locations of demes in the Howells sample.
Figure 3. Phylogeny from craniometrics. Source: Howells Craniometric Dataset, author's computations.
The accuracy of the inferred phylogenetic tree is simply astonishing. Read from the bottom right. The Sahul peoples (Australia, Tasmania, and Tolai) are correctly identified as having split last from Bushman and Zulu; the great time-depth of the Andamanese and their close relationship to the ancient clade are likewise correctly identified. (Perhaps it is best to mentally pull the ancient cluster to the left and place it at the root of the tree.) The precise pattern of the eastern cluster (the next 13 above the Andamanese) is exactly right. The phylogenetic relationships within the Austronesian cluster, from New Zealand to Easter Island (a result of recent Holocene expansions), are correctly identified in detail, as is the cluster's close phylogenetic relationship with the eastern cluster. Further up, the Americans (Peru, Santa Cruz, Arikara) are mixed up with the medieval Austro-Hungarians (Berg, Zalavar). Similarly, while the circumpolar peoples are identified as having split recently, as are the Dogon and Teita, these groups are placed next to each other and next to the ancient Egyptians and the medieval Norse. This may be due to diachronic patterns in morphology, such as those we identified in the European case. In any case, every recent split in the phylogram (the last of the forks) is accurate, and the great time-depth of the ancient cluster is spot-on.
So the algorithm does an excellent job of predicting phylogeny, although second- and third-level branches are sometimes confounded (particularly for ancient and medieval populations, so this is presumably due to diachronic patterns). What is clear is that craniometric distance is an excellent signal of phylogeny, suggesting both that cranial characters are under tight genetic control and that neutral drift is the controlling factor in craniometric variation.
Finally, we check that postcranial osteometric data do not predict phylogeny as well as craniometric data do. Presumably this is because either postcranial skeletal morphology is not under tight genetic control, or it is but neutral drift is not the factor controlling osteometric variation. Whatever the case may be, as we shall see, the population-history signal in the postcranial skeleton is relatively weak and easily confounded by selection.
Figure 4. Phylogeny from osteometrics. Source: Goldman Data Set, author's computations.
We look at the Goldman osteometric dataset. We restrict the sample to the Old World and apply the same algorithm as before. Figure 4 displays the phylogram thus obtained. Although most recent splits are not too far off the mark, the prediction is unimpressive. Malaysia ends up with the Europeans, Australia with Madagascar, Tasmania and South Africa with China and the Philippines, and the Andamanese with the Congolese and the Indonesians! The algorithm thus fails catastrophically in predicting phylogenetic relationships between populations.
In sum, kosher inference of phylogeny is possible from craniometrics but not from osteometrics. This suggests that the former is under tighter genetic control than the latter, and/or contains a stronger population history signal that is less confounded by natural selection.
Postscript. The Howells phylogram can be improved by including the second moments. I figured out that the reason Hanihara's craniofacial data yield such a convincing phylogram is that they contain both the means and the variances of the measurements. The second moments also carry phylogenetic information, since both the variance of metric characters and the second moment of the frequency of neutrally derived discrete polymorphisms are functions of distance from Africa. Indeed, including them fixes the problem with our previous estimate.
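A minimal sketch of the postscript's idea, assuming individual-level measurements grouped by population: each population is summarized by the mean and the sample variance of every measurement, and the concatenated vector then goes through the same standardize-and-distance procedure as before. The function name `moment_features` and the dict-of-lists input layout are hypothetical.

```python
from statistics import mean, variance


def moment_features(samples):
    """Map each population to [means..., variances...] of its measurements.

    `samples` maps a population name to a list of individuals, each a list
    of k measurements. Sample variance needs at least two individuals.
    """
    features = {}
    for pop, individuals in samples.items():
        if len(individuals) < 2:
            raise ValueError(f"{pop}: need at least two individuals")
        cols = list(zip(*individuals))  # one tuple per measurement
        features[pop] = [mean(c) for c in cols] + [variance(c) for c in cols]
    return features
```

With 82 measurements this doubles each population's vector to 164 features. Because every feature column is standardized across populations afterwards, means and variances end up on a common scale, so neither kind dominates the distance merely because of its units.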
Figure 5. A more precise phylogram from craniometrics. (Corrected version.)
The new estimate is hard to argue with. The Teita and Dogon are correctly identified as closest to the Zulu and Bushman. All others are descendants of this ancient cluster of demes around the San. The San-Sahul split occurs at great time-depth, followed by the ancient split with the Andaman Islanders. The medieval Austro-Hungarians (Berg and Zalavar) are accurately placed closest to the medieval Norse and the ancient Egyptians. The Americans (Arikara, Santa Cruz, and Peru) are correctly shown as closely related to eastern Eurasians (Japanese, Atayal, Hainan, and the Philippines). The Austronesians are placed close to the Anyang and the circumpolar peoples (Eskimo and Buriat). The algorithm does, however, get the time-depths of the most ancient splits wrong. The ancient subtree including the San and the Andamanese should have the greatest time-depth (60ka), the Euro-Asian split the second greatest (45ka), then the Americans and circumpolar peoples splitting off (14ka), and finally the Austronesians splitting off from Taiwan (5ka). The phylogram, although highly accurate in detail, gets the greatest time-depths catastrophically wrong.