Disadvantages of Principal Component Analysis in Human Genome Analysis

In 2021, Eran Elhaik published "Why Most Principal Component Analyses (PCA) in Population Genetic Studies Are Wrong". Working from both a mathematical and an empirical point of view, the paper analyzes the errors produced by genome analysis tools such as PCA and admixture.

First, the author points out that principal component analysis (PCA) is a multivariate analysis that reduces the complexity of a dataset while preserving data covariance, and that lets the information be visualized on a colored scatterplot, ideally with minimal loss of information.

In a PCA-based analysis, the main factors that affect the results are the sample size and the number of components. Because ancient human samples are far fewer than modern ones, this imbalance often introduces errors.

The author combined different numbers of samples and obtained completely different results. For example, in the author's figure, simply changing the number of samples per group rearranges the relative positions of Europeans, Asians, Oceanians and Africans on the scatterplot, which is clearly unreasonable.
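
The sample-size effect described above can be reproduced with a toy example. The sketch below is only an illustration under stated assumptions, not the author's data or code: the four populations A, B, C and D, their three-dimensional "genotype" centers, the noise level, and the helper names (pca_scores, simulate, centroid_distance) are all hypothetical. It runs PCA twice, once with equal sample sizes and once with population C heavily under-sampled, and measures how far apart C and D appear on the first two components.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def simulate(sizes, rng, noise=0.02):
    """Draw noisy samples around four hypothetical population centers."""
    centers = {"A": (1, 0, 0), "B": (0, 1, 0), "C": (0, 0, 1), "D": (0, 0, 0)}
    rows, labels = [], []
    for name, n in sizes.items():
        center = np.asarray(centers[name], dtype=float)
        rows.append(center + rng.normal(0.0, noise, size=(n, 3)))
        labels += [name] * n
    return np.vstack(rows), np.array(labels)


def centroid_distance(scores, labels, p, q):
    """Distance between the centroids of populations p and q in PC space."""
    cp = scores[labels == p].mean(axis=0)
    cq = scores[labels == q].mean(axis=0)
    return float(np.linalg.norm(cp - cq))


rng = np.random.default_rng(0)
designs = {
    "balanced": {"A": 100, "B": 100, "C": 100, "D": 100},
    "unbalanced": {"A": 100, "B": 100, "C": 5, "D": 100},  # C under-sampled
}

results = {}
for design, sizes in designs.items():
    X, labels = simulate(sizes, rng)
    scores = pca_scores(X)
    results[design] = centroid_distance(scores, labels, "C", "D")
    print(f"{design}: C-D distance on PC1/PC2 = {results[design]:.2f}")
```

In the underlying three-dimensional data, C and D are always about one unit apart. In the balanced design they remain clearly separated on the plot, while in the unbalanced design C collapses toward D, because the axis that distinguishes C contributes too little variance to shape the first two components. Nothing about the populations changed, only how many samples were drawn from each.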

With regard to admixture software, the author notes that Lawson et al. (2018) commented on the misuse of admixture-like tools and stated that they should not be used to draw historical inferences. So far, however, no review has thoroughly explored the use and accuracy of PCA in the most common study designs. The admixture problem arises mainly because, when ancient data are analyzed, the sample size is too small, and if the K value (the number of assumed ancestral components) is set too high, the result will be skewed. In addition, because so few allele combinations are represented in such small samples, too high a K value amplifies this deviation.

This kind of error is common in casual discussions on the Internet, especially where an admixture plot with a very high K value (for example, K > 10) is used to compare ancient and modern groups. Because the ancient sample size is too small and K is too high, the result will be skewed. A better approach is to reduce the number of groups being compared while keeping the K value under control. Given the actual size of ancient samples, a K value between 4 and 6 is generally preferable.
