Genetic Epidemiology, Psychiatric Genetics, Asthma Genetics and Statistical Genetics Laboratories investigate the pattern of disease in families, particularly identical and non-identical twins, to assess the relative importance of genes and environment in a variety of important health problems.
PMID
24404405
TITLE
Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data.
ABSTRACT
Typically, genome-wide association studies consist of regressing the phenotype on each SNP separately using an additive genetic model. Although statistical models for recessive, dominant, SNP-SNP, or SNP-environment interactions exist, the testing burden makes an evaluation of all possible effects impractical for genome-wide data. We advocate a two-step approach where the first step consists of a filter that is sensitive to different types of SNP main and interaction effects. The aim is to substantially reduce the number of SNPs such that more specific modeling becomes feasible in a second step. We provide an evaluation of a statistical learning method called "gradient boosting machine" (GBM) that can be used as a filter. GBM does not require an a priori specification of a genetic model, and permits inclusion of large numbers of covariates. GBM can therefore be used to explore multiple GxE interactions, which would not be feasible within the parametric framework used in GWAS. We show in a simulation that GBM performs well even under conditions favorable to the standard additive regression model commonly used in GWAS, and is sensitive to interaction effects even if one of the interacting variables has a zero main effect. Such an effect would not be detected in GWAS. Our evaluation is accompanied by an analysis of empirical data concerning hair morphology. We estimate the phenotypic variance explained by increasing numbers of highest ranked SNPs, and show that it is sufficient to select 10K-20K SNPs in the first step of a two-step approach.
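ILLUSTRATIVE SKETCH
The filtering step described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation or data: it is a toy gradient boosting routine with squared-error loss and depth-2 regression trees, written in plain Python, and every name (best_split, grow, gbm_rank), the simulated effect sizes, and the tuning values are assumptions chosen for illustration. SNP 1 is simulated with a zero main effect and acts only through its interaction with SNP 2; SNPs are ranked by the squared-error reduction accumulated over all splits.

```python
import random

def best_split(X, r, idx):
    """Best single split of rows idx on residuals r: (gain, snp, threshold)."""
    n, total = len(idx), sum(r[i] for i in idx)
    best = (0.0, None, None)
    for j in range(len(X[0])):
        for t in (0, 1):  # genotypes <= t go to the left child
            left = [i for i in idx if X[i][j] <= t]
            nl = len(left)
            if nl == 0 or nl == n:
                continue
            sl = sum(r[i] for i in left)
            # reduction in squared error achieved by this split
            gain = sl * sl / nl + (total - sl) ** 2 / (n - nl) - total * total / n
            if gain > best[0]:
                best = (gain, j, t)
    return best

def grow(X, r, idx, depth, importance):
    """Grow a small regression tree and credit each split's gain to its SNP."""
    gain, j, t = best_split(X, r, idx) if depth > 0 else (0.0, None, None)
    if j is None:
        return sum(r[i] for i in idx) / len(idx)  # leaf: mean residual
    importance[j] += gain
    left = [i for i in idx if X[i][j] <= t]
    right = [i for i in idx if X[i][j] > t]
    return (j, t, grow(X, r, left, depth - 1, importance),
            grow(X, r, right, depth - 1, importance))

def predict(tree, x):
    """Walk down the nested-tuple tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if x[j] <= t else hi
    return tree

def gbm_rank(X, y, n_trees=30, depth=2, lr=0.2):
    """Boost on residuals; return SNP indices sorted by accumulated gain."""
    n, p = len(X), len(X[0])
    pred = [sum(y) / n] * n
    importance = [0.0] * p
    for _ in range(n_trees):
        r = [y[i] - pred[i] for i in range(n)]  # negative gradient of squared loss
        tree = grow(X, r, list(range(n)), depth, importance)
        for i in range(n):
            pred[i] += lr * predict(tree, X[i])
    return sorted(range(p), key=lambda j: -importance[j])

# Simulated data (assumed values): 500 individuals, 20 SNPs coded 0/1/2.
random.seed(1)
n, p = 500, 20
X = [[random.randint(0, 1) + random.randint(0, 1) for _ in range(p)] for _ in range(n)]
# SNP 0: additive effect. SNP 1: zero main effect, interacts with SNP 2.
y = [x[0] + x[1] * (x[2] - 1) + random.gauss(0, 0.5) for x in X]

ranked = gbm_rank(X, y)
keep = ranked[:5]  # step 1: SNPs passed on to more specific modeling in step 2
print(keep)
```

In this toy setting a per-SNP additive regression has little power for SNP 1, because its average effect across SNP 2 genotypes is zero, whereas trees of depth 2 can split on SNP 2 first and then on SNP 1. With genome-wide data an established GBM library would be used instead of this plain-Python loop.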
DATE PUBLISHED
2013 Oct 20
HISTORY
PUBSTATUS PUBSTATUSDATE
entrez 2014/01/10 06:00
pubmed 2014/01/10 06:00
medline 2014/01/10 06:00
AUTHORS
NAME COLLECTIVENAME LASTNAME FORENAME INITIALS AFFILIATION AFFILIATIONINFO
Lubke G Lubke Gh G Department of Psychology, University of Notre Dame, Notre Dame, IN, USA; Department of Biological Psychology, VU University Amsterdam, Amsterdam, Netherlands.
Laurin C Laurin C C Department of Psychology, University of Notre Dame, Notre Dame, IN, USA.
Walters R Walters R R Department of Psychology, University of Notre Dame, Notre Dame, IN, USA.
Eriksson N Eriksson N N 23andMe, Inc., Mountain View, CA, USA.
Hysi P Hysi P P Twin Research and Genetic Epidemiology, Genetic Epidemiologist, King's College London, London, England.
Spector T Spector Td T Twin Research and Genetic Epidemiology, Genetic Epidemiologist, King's College London, London, England.
Montgomery G Montgomery Gw G Genetic Epidemiology Laboratory, Queensland Institute of Medical Research, Brisbane, Australia.
Martin N Martin Ng N Genetic Epidemiology Laboratory, Queensland Institute of Medical Research, Brisbane, Australia.
Medland S Medland Se S Genetic Epidemiology Laboratory, Queensland Institute of Medical Research, Brisbane, Australia.
Boomsma D Boomsma DI D Department of Biological Psychology, VU University Amsterdam, Amsterdam, Netherlands.
INVESTIGATORS
JOURNAL
VOLUME: 4
ISSUE:
TITLE: Journal of data mining in genomics & proteomics
ISOABBREVIATION: J Data Mining Genomics Proteomics
YEAR: 2013
MONTH: Oct
DAY: 20
MEDLINEDATE:
SEASON:
CITEDMEDIUM: Print
ISSN: 2153-0602
ISSNTYPE: Print
MEDLINE JOURNAL
MEDLINETA: J Data Mining Genomics Proteomics
COUNTRY:
ISSNLINKING:
NLMUNIQUEID: 101560244
PUBLICATION TYPE
PUBLICATIONTYPE TEXT
JOURNAL ARTICLE
COMMENTS AND CORRECTIONS
GRANTS
GRANTID AGENCY COUNTRY
R37 DA018673 NIDA NIH HHS United States
GENERAL NOTE
KEYWORDS
KEYWORD
Boosting
GCTA
GWAS