eprintid: 21982
rev_number: 21
userid: 3671
importid: 2458
dir: disk0/00/02/19/82
datestamp: 2014-06-23 20:54:25
lastmod: 2019-01-29 15:55:20
status_changed: 2014-06-23 20:54:25
type: article
metadata_visibility: show
item_issues_count: 0
eprint_status: archive
creators_name: Chikina, MD
creators_name: Sealfon, SC
creators_email: MCHIKINA@pitt.edu
creators_email:
creators_id: MCHIKINA
creators_id:
contributors_type: http://www.loc.gov/loc.terms/relators/EDT
contributors_name: Hatzis, Christos
title: Increasing consistency of disease biomarker prediction across datasets
ispublished: pub
divisions: sch_med_Computational_Systems_Biology
full_text_status: public
abstract: Microarray studies with human subjects often have limited sample sizes, which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparations, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis (SVA), but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern. © 2014 Chikina, Sealfon.
date: 2014-04-16
date_type: published
publication: PLoS ONE
volume: 9
number: 4
refereed: TRUE
id_number: 10.1371/journal.pone.0091272
citation: Chikina, MD and Sealfon, SC (2014) Increasing consistency of disease biomarker prediction across datasets. PLoS ONE, 9 (4).
document_url: http://d-scholarship-dev.library.pitt.edu/21982/1/journal.pone.0091272.pdf
document_url: http://d-scholarship-dev.library.pitt.edu/21982/8/licence.txt