Foreign-Language Science and Technology Book Profile

Title: Data Mining and Analysis

Authors: Mohammed J. Zaki | Wagner Meira Jr.

ISBN/ISSN: 9780521766333

Publication date: 2014

Publisher: Cambridge University Press

Classification: Automation Technology, Computer Technology


Abstract

The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike.

Table of Contents

Preface ix

1 Data Mining and Analysis 1

1.1 Data Matrix 1

1.2 Attributes 3

1.3 Data: Algebraic and Geometric View 4

1.4 Data: Probabilistic View 14

1.5 Data Mining 25

1.6 Further Reading 30

1.7 Exercises 30

PART ONE: DATA ANALYSIS FOUNDATIONS

2 Numeric Attributes 33

2.1 Univariate Analysis 33

2.2 Bivariate Analysis 42

2.3 Multivariate Analysis 48

2.4 Data Normalization 52

2.5 Normal Distribution 54

2.6 Further Reading 60

2.7 Exercises 60

3 Categorical Attributes 63

3.1 Univariate Analysis 63

3.2 Bivariate Analysis 72

3.3 Multivariate Analysis 82

3.4 Distance and Angle 87

3.5 Discretization 89

3.6 Further Reading 91

3.7 Exercises 91

4 Graph Data 93

4.1 Graph Concepts 93

4.2 Topological Attributes 97

4.3 Centrality Analysis 102

4.4 Graph Models 112

4.5 Further Reading 132

4.6 Exercises 132

5 Kernel Methods 134

5.1 Kernel Matrix 138

5.2 Vector Kernels 144

5.3 Basic Kernel Operations in Feature Space 148

5.4 Kernels for Complex Objects 154

5.5 Further Reading 163

5.6 Exercises 163

6 High-dimensional Data 163

6.1 High-dimensional Objects 163

6.2 High-dimensional Volumes 165

6.3 Hypersphere Inscribed within Hypercube 168

6.4 Volume of Thin Hypersphere Shell 169

6.5 Diagonals in Hyperspace 171

6.6 Density of the Multivariate Normal 172

6.7 Appendix: Derivation of Hypersphere Volume 175

6.8 Further Reading 180

6.9 Exercises 180

7 Dimensionality Reduction 183

7.1 Background 183

7.2 Principal Component Analysis 187

7.3 Kernel Principal Component Analysis 202

7.4 Singular Value Decomposition 208

7.5 Further Reading 213

7.6 Exercises 214

PART TWO: FREQUENT PATTERN MINING

8 Itemset Mining 217

8.1 Frequent Itemsets and Association Rules 217

8.2 Itemset Mining Algorithms 221

8.3 Generating Association Rules 234

8.4 Further Reading 236

8.5 Exercises 237

9 Summarizing Itemsets 242

9.1 Maximal and Closed Frequent Itemsets 242

9.2 Mining Maximal Frequent Itemsets: GenMax Algorithm 245

9.3 Mining Closed Frequent Itemsets: Charm Algorithm 248

9.4 Nonderivable Itemsets 250

9.5 Further Reading 256

9.6 Exercises 256

10 Sequence Mining 259

10.1 Frequent Sequences 259

10.2 Mining Frequent Sequences 260

10.3 Substring Mining via Suffix Trees 267

10.4 Further Reading 277

10.5 Exercises 277

11 Graph Pattern Mining 280

11.1 Isomorphism and Support 280

11.2 Candidate Generation 284

11.3 The gSpan Algorithm 288

11.4 Further Reading 296

11.5 Exercises 297

12 Pattern and Rule Assessment 301

12.1 Rule and Pattern Assessment Measures 301

12.2 Significance Testing and Confidence Intervals 316

12.3 Further Reading 328

12.4 Exercises 328

PART THREE: CLUSTERING

13 Representative-based Clustering 333

13.1 K-means Algorithm 333

13.2 Kernel K-means 338

13.3 Expectation-Maximization Clustering 342

13.4 Further Reading 360

13.5 Exercises 361

14 Hierarchical Clustering 364

14.1 Preliminaries 364

14.2 Agglomerative Hierarchical Clustering 366

14.3 Further Reading 372

14.4 Exercises and Projects 373

15 Density-based Clustering 375

15.1 The DBSCAN Algorithm 375

15.2 Kernel Density Estimation 379

15.3 Density-based Clustering: DENCLUE 385

15.4 Further Reading 390

15.5 Exercises 391

16 Spectral and Graph Clustering 394

16.1 Graphs and Matrices 394

16.2 Clustering as Graph Cuts 401

16.3 Markov Clustering 416

16.4 Further Reading 422

16.5 Exercises 423

17 Clustering Validation 425

17.1 External Measures 425

17.2 Internal Measures 440

17.3 Relative Measures 448

17.4 Further Reading 461

17.5 Exercises 462

PART FOUR: CLASSIFICATION

18 Probabilistic Classification 467

18.1 Bayes Classifier 467

18.2 Naive Bayes Classifier 473

18.3 K Nearest Neighbors Classifier 477

18.4 Further Reading 479

18.5 Exercises 479

19 Decision Tree Classifier 481

19.1 Decision Trees 483

19.2 Decision Tree Algorithm 485

19.3 Further Reading 496

19.4 Exercises 496

20 Linear Discriminant Analysis 49

20.1 Optimal Linear Discriminant 498

20.2 Kernel Discriminant Analysis 505

20.3 Further Reading 511

20.4 Exercises 512

21 Support Vector Machines 514

21.1 Support Vectors and Margins 514

21.2 SVM: Linear and Separable Case 520

21.3 Soft Margin SVM: Linear and Nonseparable Case 524

21.4 Kernel SVM: Nonlinear Case 530

21.5 SVM Training Algorithms 534

21.6 Further Reading 545

21.7 Exercises 546

22 Classification Assessment 54

22.1 Classification Performance Measures 548

22.2 Classifier Evaluation 562

22.3 Bias-Variance Decomposition 572

22.4 Further Reading 581

22.5 Exercises 582

Index 585

Holding Library

National Science Library, Chinese Academy of Sciences (中科院文献情报中心)