Title: Introduction to Statistical Methods for Biosurveillance
Author: Ronald D. Fricker, Jr.
Publication date: 2013
Publisher: Cambridge University Press
Preface
This book is about basic statistical methods useful for biosurveillance. The focus on basic methods has a twofold motivation. First, there is a need for a text that starts from the fundamentals, both of public health surveillance and statistics, and weaves them together into a foundation for biosurveillance. Only from a solid foundation can an enduring edifice be built.
Second, while there is a large and growing literature about biosurveillance that includes the application of some very complicated and sophisticated statistical methods, it has been my experience that more complicated methods and models do not always result in better performance. And even when they do, there is often an inherent trade-off made in terms of transparency and interpretability.
Indeed, a real challenge in today's data-rich environment is deciding when enough complication is enough. More is not always better, whether we're talking about eating dessert, building a model, or developing a detection algorithm. There is a rich history that speaks to this point:
Occam's razor: "All other things being equal, a simpler explanation is better than a more complex one."
Blaise Pascal (1623-1662): "Je n'ai fait cette lettre-ci plus longue que parce que je n'ai pas eu le loisir de la faire plus courte." (I have made this letter longer than usual, only because I have not had time to make it shorter.)
Albert Einstein (1879-1955): "Make everything as simple as possible, but not simpler," and "Any intelligent fool can make things bigger, more complex.... It takes a touch of genius...to move in the opposite direction."
Note the theme in these quotes is not one of just simplicity but also that it takes effort and insight to appropriately simplify. Hence, I do not claim that the methods in this book are necessarily the best or most correct ones for biosurveillance. Most of the research necessary to reach such a determination is yet to be done. However, the philosophy on which this book is predicated is that biosurveillance should start with basic methods such as those described herein and, only after empirically demonstrating the added value of more complicated methods, extend from there.
This text presumes a familiarity with basic probability and statistics at the level of an advanced undergraduate or beginning graduate-level course. For readers requiring a probability refresher, Appendix A provides a brief review of many of the basic concepts used throughout the text. However, the text also uses some statistical methods that are often not taught in introductory courses, such as ROC (receiver operating characteristic) curves, imputation, and time series modeling. In presenting these and other methods, the goal has been to make the exposition as accessible and as relevant to the widest audience possible. However, this inevitably means that some of the concepts and methods will be insufficiently explained for some readers, while others may have preferred a more advanced treatment. In an attempt to accommodate all levels of interest, the end of each chapter contains an "additional reading" section with pointers to other resources, some providing more background and introductory material and others providing a more advanced treatment of the material.
That said, this book is largely focused on univariate temporal data. More complicated data, whether multivariate or spatio-temporal, will by definition require more complicated statistical methods. In this book, I touch on these types of data, but they require a treatment more in depth than a text of this length will allow.
As a statistician with a background in industrial quality control, I approach the problem of biosurveillance early event detection from the perspective of statistical process control (SPC). This is, of course, only one way to approach the problem, and different disciplines have different viewpoints.
SPC methods were first developed to monitor industrial processes, which are generally more controlled and for which the data are often easier to distributionally characterize than biosurveillance data. Nonetheless, I am of the opinion that, appropriately applied to biosurveillance data, these methods have much to offer in terms of (1) their performance and (2) a rich, quantitatively rigorous literature that both develops the methods and describes their performance characteristics. Thus, returning to a previous point, my motivation for starting from an SPC perspective is that it provides biosurveillance with a solid methodological foundation on which to build.
It is also important to note that I tend to look at biosurveillance as a tool for guarding against bioterrorism. Of course, a system designed to detect a bioterrorism attack is also useful for detecting natural disease outbreaks, but it's not necessarily true that a biosurveillance system designed for natural disease detection will be optimal for bioterrorism applications. Just as the person who tries to please everyone ends up pleasing no one, so it is with biosurveillance. Thus, while these systems do have dual-use possibilities, I am of the opinion that first and foremost they should be designed for thwarting bioterrorism.
Additional material related to this book, including errata, can be found at http://facultly.nps.edu/rdfricke/biosurveillance_book/. Please feel free to e-mail me at rdfricker@nps.edu with any comments, thoughts, or material that might be relevant and useful in the next revision.
In conclusion, I hope this book contributes to the effective design and implementation of biosurveillance systems. Given the increasingly dangerous threats that face humankind, some of natural origin and some not, and all magnified by our increasingly interconnected world, biosurveillance systems are truly a first line of defense.
Table of Contents
Preface
Acknowledgments
Part I Introduction to Biosurveillance
1 Overview
1.1 What Is Biosurveillance?
1.2 Biosurveillance Systems
1.3 Biosurveillance Utility and Effectiveness
1.4 Discussion and Summary
2 Biosurveillance Data
2.1 Types of Data
2.2 Types of Biosurveillance Data
2.3 Data Preparation
2.4 Discussion and Summary
Part II Situational Awareness
3 Situational Awareness for Biosurveillance
3.1 What Is Situational Awareness?
3.2 A Theoretical Situational Awareness Model
3.3 Biosurveillance Situational Awareness
3.4 Extending the Situational Awareness Model: Situated Cognition
3.5 Discussion and Summary
4 Descriptive Statistics for Comprehending the Situation
4.1 Numerical Descriptive Statistics
4.2 Graphical Descriptive Statistics
4.3 Discussion and Summary
5 Statistical Models for Projecting the Situation
5.1 Modeling Time Series Data
5.2 Smoothing Models
5.3 Regression-Based Models
5.4 ARMA and ARIMA Models
5.5 Change Point Analysis
5.6 Discussion and Summary
Part III Early Event Detection
6 Early Event Detection Design and Performance Evaluation
6.1 Notation and Assumptions
6.2 Design Points and Principles
6.3 Early Event Detection Methods Differ from Other Statistical Tests
6.4 Measuring Early Event Detection Performance
6.5 Discussion and Summary
7 Univariate Temporal Methods
7.1 Historical Limits Detection Method
7.2 Shewhart Detection Method
7.3 Cumulative Sum Detection Method
7.4 Exponentially Weighted Moving Average Detection Method
7.5 Other Methods
7.6 Discussion and Summary
8 Multivariate Temporal and Spatio-temporal Methods
8.1 Multivariate Temporal Methods
8.2 Spatio-temporal Methods
8.3 Discussion and Summary
Part IV Putting It All Together
9 Applying the Temporal Methods to Real Data
9.1 Using Early Event Detection Methods to Detect Outbreaks and Attacks
9.2 Assessing How Syndrome Definitions Affect Early Event Detection Performance
9.3 Discussion and Summary
10 Comparing Methods to Better Understand and Improve Biosurveillance Performance
10.1 Performance Comparisons: A Univariate Example
10.2 Performance Comparisons: A Multivariate Example
10.3 Discussion and Summary
Part V Appendices
A A Brief Review of Probability, Random Variables, and Some Important Distributions
A.1 Probability
A.2 Random Variables
A.3 Some Important Probability Distributions
B Simulating Biosurveillance Data
B.1 Types of Simulation
B.2 Simulating Biosurveillance Data
B.3 Discussion and Summary
C Tables
References
Author Index
Subject Index
Holding institution
National Science Library, Chinese Academy of Sciences (中科院文献情报中心)