Foreign-Language Science and Technology Book Overview

Title: Probabilistic graphical models for genetics, genomics and postgenomics

Responsibility: Christine Sinoquet | Raphael Mourad

ISBN/ISSN: 9780198709022

Publication date: 2014

Publisher: Oxford University Press

Classification: Biological sciences


Abstract

Nowadays bioinformaticians and geneticists are faced with myriad high-throughput data usually presenting the characteristics of uncertainty, high dimensionality and large complexity.
Insight into this wealth of so-called 'omics' data is possible only if the data are represented by flexible and scalable models prior to any further analysis. At the interface between statistics and machine learning, probabilistic graphical models (PGMs) represent a powerful formalism to discover complex networks of relations.
These models are also amenable to incorporating a priori biological information. Network reconstruction from gene expression data represents perhaps the most emblematic area of research where PGMs have been successfully applied. However, these models have also created renewed interest in genetics in the broad sense, in particular regarding association genetics, causality discovery, prediction of outcomes, detection of copy number variations, and epigenetics. This book provides an overview of the applications of PGMs to genetics, genomics and postgenomics to meet this increased interest.


Table of Contents

Abbreviations xix

List of Contributors xxiii

Part I. INTRODUCTION

1. Probabilistic Graphical Models for Next-generation Genomics and Genetics 3

1.1. Fine-grained Description of Living Systems 4

      1.1.1. DNA and the Genome 4

      1.1.2. Genes and Proteins 5

      1.1.3. Phenotype and Genotype 5

      1.1.4. Molecular Biology, Genetics, Genomics, and Postgenomics 6

1.2. Higher Description Levels of Living Systems 6

      1.2.1. Complexity in Cells 7

      1.2.2. Genetics, Epigenetics, and Copy Number Polymorphism 9

      1.2.3. Epigenetics with Additional Prior Knowledge on the Genome 11

      1.2.4. Transcriptomics 11

      1.2.5. Transcriptomics with Prior Biological Knowledge 13

      1.2.6. Integrating Data from Several Levels 13

      1.2.7. Recapitulation 16

1.3. An Era of High-throughput Genomic Technologies 16

      1.3.1. Genotyping 16

      1.3.2. Copy Number Polymorphism 19

      1.3.3. DNA Methylation Measurements 19

      1.3.4. Gene Expression Data 20

      1.3.5. Quantitative Trait Loci 21

      1.3.6. The Challenge of Handling Omics Data 23

1.4. Probabilistic Graphical Models to Infer Novel Knowledge from Omics Data 23

      1.4.1. Gene Network Inference 24

      1.4.2. Causality Discovery 24

      1.4.3. Association Genetics 26

      1.4.4. Epigenetics 26

      1.4.5. Detection of Copy Number Variations 26

      1.4.6. Prediction of Outcomes from High-dimensional Genomic Data 26

2. Essentials to Understand Probabilistic Graphical Models: A Tutorial about Inference and Learning 30

2.1. Introduction 32

2.2. Reminders 32

2.3. Various Classes of Probabilistic Graphical Models 38

      2.3.1. Markov Chains and Hidden Markov Models 38

      2.3.2. Markov Random Fields 39

      2.3.3. Variants around the Concept of Markov random field 41

      2.3.4. Bayesian networks 41

      2.3.5. Unifying Model and Model Extension 45

2.4. Probabilistic Inference 46

      2.4.1. Exact Inference 46

      2.4.2. Approximate Inference 51

2.5. Learning Bayesian networks 57

      2.5.1. Parameter Learning 58

      2.5.2. Structure Learning 61

2.6. Learning Markov random fields 69

      2.6.1. Parameter Learning 69

      2.6.2. Structure Learning 72

2.7. Causal Networks 75

2.8. List of General Monographs and Focused Chapter Books 77

Part II. GENE EXPRESSION

3. Graphical Models and Multivariate Analysis of Microarray Data 85

3.1. Introduction 85

3.2. The Model 87

3.3. Model Fitting 88

      3.3.1. Maximum Likelihood Estimation when the Zero Pattern is Known 89

      3.3.2. Determining the Pattern of Zeroes in the Inverse Covariance Matrix 90

3.4. Hypothesis Testing 92

      3.4.1. Null Distributions by Permutation 92

      3.4.2. A Multivariate Test Statistic 93

      3.4.3. Partitioning of the Test Statistic 94

      3.4.4. Testing Strategies 95

3.5. Example 96

3.6. Discussion and Conclusions 99

4. Comparison of Mixture Bayesian and Mixture Regression Approaches to Infer Gene Networks 105

4.1. Introduction 106

4.2. Methods 107

      4.2.1. Mixture Bayesian Network 107

      4.2.2. Mixture Regression Approach 108

      4.2.3. Data 110

4.3. Results 112

      4.3.1. Comparison of Mixtures 112

      4.3.2. Mixture Modeling of Changes in Gene Relationships 112

      4.3.3. Interpretation of Mixtures 114

      4.3.4. Inference of Large Networks 116

4.4. Conclusions 116

5. Network Inference in Breast Cancer with Gaussian Graphical Models and Extensions 121

5.1. Introduction 122

5.2. Modeling of Gene Networks by Gaussian Graphical Networks 123

      5.2.1. Simple Gaussian graphical network 123

      5.2.2. Extensions Motivated by Regulatory Network Modeling 127

5.3. Application to Estrogen Receptor Status in Breast Cancer 134

      5.3.1. Context 134

      5.3.2. Biological Prior Definition 135

      5.3.3. Network Inference from Biological Prior: Application and Interpretation 139

5.4. Conclusions and Discussion 141

Part III. CAUSALITY DISCOVERY

6. Utilizing Genotypic Information as a Prior for Learning Gene Networks 149

6.1. Introduction 149

6.2. Methods 151

      6.2.1. eQTL Data sets 151

      6.2.2. LCMS Method for Learning a Prior Matrix of Causal Relationships 151

      6.2.3. Bayesian Network Structure Learning 154

      6.2.4. Integrating the Prior Matrix 155

      6.2.5. Stochastic Causal Tree Method 156

6.3. Conclusion 161

7. Bayesian Causal Phenotype Network Incorporating Genetic Variation and Biological Knowledge 165

7.1. Introduction 166

7.2. Joint Inference of Causal Phenotype Network and Causal QTLs 167

      7.2.1. Standard Bayesian Network Model 168

      7.2.2. HCGR Model 169

      7.2.3. Systems Genetics and Causal Inference 170

      7.2.4. QTL Mapping Conditional on Phenotype Network Structure 172

      7.2.5. Joint Inference of Phenotype Network and Causal QTLs 173

7.3. Causal Phenotype Network Incorporating Biological Knowledge 174

      7.3.1. Model 175

      7.3.2. Sketch of MCMC 178

      7.3.3. Summary of Encoding of Biological Knowledge 180

7.4. Simulations 183

7.5. Analysis of Yeast Cell-Cycle Genes 185

7.6. Conclusion 188

8. Structural Equation Models for Studying Causal Phenotype Networks in Quantitative Genetics 196

8.1. Introduction 196

8.2. Classical Linear Mixed-effects Models in Quantitative Genetics 197

8.3. Mixed-effects Structural Equation Models 202

8.4. Data-driven Search for Phenotypic Causal Relationships 204

      8.4.1. General Overview 204

      8.4.2. Search Algorithms 206

8.5. Inferring Causal Structures in Genetics Applications 207

      8.5.1. Genotypic information as Instrumental Variable 207

      8.5.2. Accounting for Polygenic Confounding Effects 208

8.6. Concluding Remarks 210

Part IV. GENETIC ASSOCIATION STUDIES

9. Modeling Linkage Disequilibrium and Performing Association Studies through Probabilistic Graphical Models: a Visiting Tour of Recent Advances 217

9.1. Introduction 218

9.2. Modeling Linkage Disequilibrium 219

      9.2.1. General Panorama 221

      9.2.2. Decomposable Markov Random Fields 221

      9.2.3. Bayesian Network-based Approaches without Latent Variables 223

      9.2.4. Bayesian Network-based Approaches with Latent Variables 224

      9.2.5. Recapitulation 226

9.3. Single-SNP Approaches for Genome-wide Association Studies 228

      9.3.1. Integration of Confounding Factors 228

      9.3.2. GWAS Multilocus Approach 230

      9.3.3. Strengths and Limitations 235

9.4. Identifying Epistasis at the Genome Scale 237

      9.4.1. Bayesian Network-based Approaches 237

      9.4.2. Markov Blanket-based Method 239

      9.4.3. Recapitulation 240

9.5. Discussion 241

9.6. Perspectives 242

10. Modeling Linkage Disequilibrium with Decomposable Graphical Models 247

10.1. Introduction 248

10.2. Methods 249

      10.2.1. Decomposable Graphical Models 249

      10.2.2. Estimating Decomposable Graphical Models 251

      10.2.3. Application to Diploid Data by Phase Imputation 254

      10.2.4. Estimation on the Genome-Wide Scale 256

10.3. Applications 258

      10.3.1. Phasing 258

      10.3.2. Unconditional Simulation 260

      10.3.3. Phenotypes and Covariates 261

      10.3.4. Admixture Mapping 263

10.4. Application to Sequence Data 265

11. Scoring, Searching and Evaluating Bayesian Network Models of Gene-phenotype Association 269

11.1. Introduction 270

11.2. Background 270

      11.2.1. Epistasis 270

      11.2.2. Genome-wide association studies 271

11.3. A Bayesian Network Model 272

11.4. Scoring Candidate Models 273

      11.4.1. Bayesian Network Scoring Criteria 273

      11.4.2. Experiments 275

11.5. Searching over the Space of Models 278

      11.5.1. Experiments 280

11.6. Determining Whether a Model is Sufficiently Noteworthy 280

      11.6.1. The Bayesian Network Posterior Probability (BNPP) 282

      11.6.2. Prior Probabilities 285

      11.6.3. Experiments 287

11.7. Discussion and Further Research 290

12. Graphical Modeling of Biological Pathways in Genome-wide Association Studies 294

12.1. Introduction 295

12.2. MRF Modeling of Gene Pathways 296

12.3. A Bayesian Framework 300

      12.3.1. Prior Specification and Likelihood Function 300

      12.3.2. Posterior Distribution 302

      12.3.3. Making Inference Based on the Posterior Distribution 304

      12.3.4. Numerical Studies 305

      12.3.5. Real Data Example: Crohn's Disease Data 309

12.4. Discussion 312

13. Bayesian, Systems-based, Multilevel Analysis of Associations for Complex Phenotypes: from Interpretation to Decision 318

13.1. Introduction 319

13.2. Bayesian network-based Concepts of Association and Relevance 320

      13.2.1. Association and Strong Relevance 320

      13.2.2. Stable Distributions, Markov Blankets and Markov Boundaries 322

      13.2.3. Further relevance types 323

      13.2.4. Necessary Subsets and Sufficient Supersets in Strong Relevance 326

      13.2.5. Relevance for Multiple Targets 327

13.3. A Bayesian View of Relevance for Complex Phenotypes 328

      13.3.1. Estimating the Posteriors of Complex Features 330

      13.3.2. Sufficiency of the Data for Full Multivariate Analysis 332

      13.3.3. Rate of Learning: Effect of Feature and Model Complexity 333

      13.3.4. Bayesian network-based Bayesian Multilevel Analysis of Relevance 336

      13.3.5. Posteriors for Multiple Target Variables 339

      13.3.6. Subtypes of Strong and Weak Relevance 340

      13.3.7. Interaction-redundancy Scores Based on Posteriors of Strong Relevance 342

13.4. Bayes Optimal Decisions about Multivariate Relevance 344

      13.4.1. Optimal Decision about Univariate Relevance 344

      13.4.2. Optimal Bayesian Decision to Control FDR 345

      13.4.3. General Bayes Optimal Decision about Multivariate Relevance 348

13.5. Knowledge Fusion: Relevance of Genes and Annotations 350

13.6. Conclusion 352

Part V. EPIGENETICS

14. Bayesian Networks in the Study of Genome-wide DNA Methylation 363

14.1. Introduction to Epigenetics 364

14.2. Next-generation Sequencing and DNA Methylation 365

      14.2.1. Assaying Genome-wide DNA Methylation 366

      14.2.2. The methyl-Seq Method 368

14.3. A Bayesian network for methyl-Seq Analysis 370

      14.3.1. Notation 371

      14.3.2. A Generative Model 371

      14.3.3. Parameter Learning and Inference of Posterior Probabilities 372

14.4. Genomic Structure as a Prior on Methylation Status 375

14.5. Application: Methyltyping the Human Neutrophil 379

      14.5.1. Unmethylated Clusters 379

14.6. Conclusions 381

15. Latent Variable Models for Analyzing DNA Methylation 387

15.1. Introduction 388

15.2. Latent Variable Methods for DNA Methylation in Low-dimensional Settings 390

      15.2.1. Discrete Latent Variables 391

      15.2.2. Continuous Latent Variables 392

15.3. Latent Variable Methods for DNA Methylation in High-dimensional Settings 396

      15.3.1. Model-based Clustering: Recursively Partitioned Mixture Models 396

      15.3.2. Semi-Supervised Recursively Partitioned Mixture Models 399

15.4. Conclusion 401

Part VI. DETECTION OF COPY NUMBER VARIATIONS

16. Detection of Copy Number Variations from Array Comparative Genomic Hybridization Data Using Linear-chain Conditional Random Field Models 409

16.1. Introduction 410

16.2. aCGH Data and Analysis 411

      16.2.1. aCGH Data 411

      16.2.2. Existing Algorithms 412

16.3. Linear-chain CRF Model for aCGH Data 413

      16.3.1. Feature Functions 415

      16.3.2. Parameter Estimation 417

      16.3.3. Evaluation Methods 421

16.4. Experimental Results 421

      16.4.1. A Real Example 421

      16.4.2. Simulated Data 424

16.5. Conclusion 425

Part VII. PREDICTION OF OUTCOMES FROM HIGH-DIMENSIONAL GENOMIC DATA

17. Prediction of Clinical Outcomes from Genome-wide Data 431

17.1. Introduction 431

17.2. Challenges with Genome-wide Data 432

17.3. Background 433

      17.3.1. The Naive Bayes Model 433

      17.3.2. Bayesian Model Averaging 434

      17.3.3. Alzheimer's Disease 434

17.4. The Model-Averaged Naive Bayes (MANB) Algorithm 435

      17.4.1. Overview of the MANB Algorithm 435

      17.4.2. Details of the MANB Algorithm 436

17.5. Evaluation Protocol 438

      17.5.1. Data set 438

      17.5.2. Protocol 438

17.6. Results 439

17.7. Conclusion 440

Index 447


About the Authors

Raphaël Mourad received his PhD from the University of Nantes in September 2011. His first postdoc (2011-2012) was at the Lang Li lab, Center for Computational Biology and Bioinformatics, Indiana University-Purdue University Indianapolis (IUPUI), where he notably worked on the genome-wide analysis of chromatin interactions. His second postdoc (2012-2013) was at the Carole Ober Laboratory and Dan Nicolae Laboratory, Department of Human Genetics, University of Chicago, where he worked on whole-genome sequencing data in asthma. In November 2013, he started a third postdoc at the LIRMM in Montpellier (France), focused on the bioinformatics of HIV.


Holding Institution

National Science Library, Chinese Academy of Sciences (中科院文献情报中心)