Genetic Epidemiology, Psychiatric Genetics, Asthma Genetics and Statistical Genetics Laboratories investigate the pattern of disease in families, particularly identical and non-identical twins, to assess the relative importance of genes and environment in a variety of important health problems.
PMID
24026092
TITLE
Genetic and nongenetic variation revealed for the principal components of human gene expression.
ABSTRACT
Principal components analysis has been employed in gene expression studies to correct for population substructure and batch and environmental effects. This method typically involves the removal of variation contained in as many as 50 principal components (PCs), which can constitute a large proportion of total variation present in the data. Each PC, however, can detect many sources of variation, including gene expression networks and genetic variation influencing transcript levels. We demonstrate that PCs generated from gene expression data can simultaneously contain both genetic and nongenetic factors. From heritability estimates we show that all PCs contain a considerable portion of genetic variation while nongenetic artifacts such as batch effects were associated to varying degrees with the first 60 PCs. These PCs demonstrate an enrichment of biological pathways, including core immune function and metabolic pathways. The use of PC correction in two independent data sets resulted in a reduction in the number of cis- and trans-expression QTL detected. Comparisons of PC and linear model correction revealed that PC correction was not as efficient at removing known batch effects and had a higher penalty on genetic variation. Therefore, this study highlights the danger of eliminating biologically relevant data when employing PC correction in gene expression data.
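The contrast the abstract draws between PC correction and linear model correction can be illustrated with a minimal sketch on simulated data. This is not the authors' pipeline: the helper names (remove_top_pcs, regress_out, simulate, genetic_signal), the effect sizes, the sample and gene counts, and the choice of k=2 are all assumptions made for illustration. The simulation gives every gene a known batch shift and gives a subset of genes a shared genotype effect, then compares how much of the genotype signal survives each correction.

```python
import numpy as np

def remove_top_pcs(expr, k):
    """Zero out the first k principal components of a samples x genes matrix."""
    centered = expr - expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0            # discard variation captured by the top k PCs
    return (u * s) @ vt    # rebuild the centered matrix without them

def regress_out(expr, covariates):
    """Residualize every gene on known covariates (plus an intercept) by OLS."""
    design = np.column_stack([np.ones(expr.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ beta

def simulate(n_samples=300, n_genes=50, n_genetic=20, seed=0):
    """Hypothetical data: one known batch factor and one genotype (0/1/2)."""
    rng = np.random.default_rng(seed)
    batch = rng.integers(0, 2, n_samples).astype(float)
    genotype = rng.integers(0, 3, n_samples).astype(float)
    expr = rng.normal(size=(n_samples, n_genes))
    expr += 1.0 * batch[:, None]                     # batch shifts all genes
    expr[:, :n_genetic] += 0.8 * genotype[:, None]   # genotype shifts a subset
    return expr, batch, genotype

def genetic_signal(expr, genotype, n_genetic=20):
    """Absolute correlation between genotype and the mean of the affected genes."""
    signature = expr[:, :n_genetic].mean(axis=1)
    return abs(np.corrcoef(signature, genotype)[0, 1])

expr, batch, genotype = simulate()
pc_corrected = remove_top_pcs(expr, k=2)      # blind correction
lm_corrected = regress_out(expr, batch)       # correction for the known batch only

print("genetic signal after PC correction:",
      round(genetic_signal(pc_corrected, genotype), 3))
print("genetic signal after linear model: ",
      round(genetic_signal(lm_corrected, genotype), 3))
```

In this toy setting the genotype effect is shared across many genes, so it loads on a leading PC and is largely removed along with the batch effect, whereas regressing out only the recorded batch covariate leaves it in place. Real expression data have many more sources of variation, so the sketch shows only the mechanism, not the magnitude of the penalty reported in the study.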
DATE PUBLISHED
2013 Nov
HISTORY
PUBSTATUS PUBSTATUSDATE
aheadofprint 2013/09/11
entrez 2013/09/13 06:00
pubmed 2013/09/13 06:00
medline 2014/06/03 06:00
AUTHORS
NAME | COLLECTIVENAME | LASTNAME | FORENAME | INITIALS | AFFILIATION | AFFILIATIONINFO
Goldinger A | | Goldinger | Anita | A | University of Queensland Diamantina Institute, The Translational Research Institute, Brisbane, Queensland 4102, Australia. |
Henders AK | | Henders | Anjali K | AK | |
McRae AF | | McRae | Allan F | AF | |
Martin NG | | Martin | Nicholas G | NG | |
Gibson G | | Gibson | Greg | G | |
Montgomery GW | | Montgomery | Grant W | GW | |
Visscher PM | | Visscher | Peter M | PM | |
Powell JE | | Powell | Joseph E | JE | |
JOURNAL
VOLUME: 195
ISSUE: 3
TITLE: Genetics
ISOABBREVIATION: Genetics
YEAR: 2013
MONTH: Nov
CITEDMEDIUM: Internet
ISSN: 1943-2631
ISSNTYPE: Electronic
MEDLINE JOURNAL
MEDLINETA: Genetics
COUNTRY: United States
ISSNLINKING: 0016-6731
NLMUNIQUEID: 0374636
PUBLICATION TYPE
PUBLICATIONTYPE TEXT
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
COMMENTS AND CORRECTIONS
REFTYPE REFSOURCE REFPMID NOTE
Cites Am J Hum Genet. 2000 Jan;66(1):279-92 10631157
Cites PLoS Genet. 2013;9(8):e1003649 23935528
Cites Pac Symp Biocomput. 2000;:455-66 10902193
Cites Bioinformatics. 2001 Sep;17(9):763-74 11590094
Cites Genome Res. 2002 Jul;12(7):1112-20 12097349
Cites Nat Genet. 2002 Dec;32 Suppl:490-5 12454643
Cites Anal Chem. 2003 Sep 1;75(17):4672-5 14632079
Cites Nat Biotechnol. 2004 Jan;22(1):86-92 14647306
Cites Mol Chem Neuropathol. 1992 Jun;16(3):207-24 1418218
Cites Ann Neurol. 1994 Nov;36(5):747-51 7979220
Cites Ann Neurol. 1997 May;41(5):646-53 9153527
Cites Nat Methods. 2005 May;2(5):345-50 15846361
Cites Nat Rev Genet. 2006 Jan;7(1):55-65 16369572
Cites Biostatistics. 2007 Jan;8(1):118-27 16632515
Cites Proc Natl Acad Sci U S A. 2007 Apr 10;104(15):6478-83 17420480
Cites Nat Genet. 2007 Jul;39(7):807-8; author reply 808-9 17597765
Cites Nucleic Acids Res. 2007 Jul;35(Web Server issue):W169-75 17576678
Cites Am J Hum Genet. 2007 Sep;81(3):559-75 17701901
Cites PLoS Genet. 2007 Sep;3(9):1724-35 17907809
Cites J Clin Oncol. 2008 Mar 1;26(7):1186-7; author reply 1187-8 18309960
Cites Nature. 2008 Mar 27;452(7186):429-35 18344982
Cites BMC Bioinformatics. 2008;9:244 18492285
Cites BMC Genomics. 2008;9:285 18549499
Cites Immunity. 2008 Jul 18;29(1):150-64 18631455
Cites Hum Mol Genet. 2008 Oct 15;17(R2):R129-34 18852201
Cites DNA Res. 2008 Dec;15(6):367-74 18931094
Cites Nat Protoc. 2009;4(1):44-57 19131956
Cites Bioinformatics. 2009 Apr 1;25(7):882-9 19223452
Cites Proc Natl Acad Sci U S A. 2009 Nov 24;106(47):20057-62 19897719
Cites Nature. 2010 Apr 1;464(7289):768-72 20220758
Cites PLoS Comput Biol. 2010 May;6(5):e1000770 20463871
Cites Nat Genet. 2010 Jul;42(7):570-5 20562874
Cites Pharmacogenomics J. 2010 Aug;10(4):278-91 20676067
Cites Nat Rev Genet. 2010 Oct;11(10):733-9 20838408
Cites PLoS Genet. 2011 Feb;7(2):e1001317 21383966
Cites PLoS One. 2011;6(2):e17238 21386892
Cites PLoS Genet. 2011 Aug;7(8):e1002197 21829388
Cites PLoS Genet. 2012 Jan;8(1):e1002431 22275870
Cites Genome Res. 2012 Mar;22(3):456-66 22183966
Cites PLoS Genet. 2012;8(4):e1002639 22532805
Cites PLoS One. 2012;7(4):e35430 22563384
Cites PLoS Genet. 2012;8(5):e1002704 22589741
Cites Nature. 2012 Aug 2;488(7409):96-9 22801501
Cites PLoS One. 2012;7(8):e43301 22905253
Cites Nat Genet. 2012 Oct;44(10):1084-9 22941192
Cites PLoS One. 2012;7(10):e46612 23094028
Cites Proc Natl Acad Sci U S A. 2000 Jul 18;97(15):8409-14 10890920
GRANTS
GRANTID AGENCY COUNTRY
GM057091 NIGMS NIH HHS United States
P01 GM099568 NIGMS NIH HHS United States
KEYWORDS
KEYWORD
batch effects
gene expression
heritability
linear models
normalization
principal components analysis
MESH HEADINGS
DESCRIPTORNAME QUALIFIERNAME
Female
Gene Expression Profiling statistics & numerical data
Genetic Variation statistics & numerical data
Genome-Wide Association Study statistics & numerical data
Humans statistics & numerical data
Linear Models statistics & numerical data
Male statistics & numerical data
Models, Genetic statistics & numerical data
Principal Component Analysis statistics & numerical data
Quantitative Trait Loci statistics & numerical data
Quantitative Trait, Heritable statistics & numerical data
OTHER IDs
OTHERID SOURCE
PMC3813841 NLM