书名:Data mining and analysis
责任者:Mohammed J. Zaki | Wagner Meira | Jr.
出版时间:2014
出版社:Cambridge University Press,
摘要
The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike.
查看更多
目录
Preface page ix
1 Data Mi ni ng and Analysis 1
1.1 Data Matrix 1
1.2 Att ributes 3
1.3 Data: Algebraic and Geometric View 4
1.4 Data: Probabilistic View 14
1.5 Data Mining 25
1.6 Further Reading 30
1.7 Exercises 30
PART ONE: DATA ANALYSIS FOU NDATIONS
2 Numeric Attributes 33
2.1 Univariate Anallysis 33
2.2 Bivariate Analysis 42
2.3 Multivariate Analysis 48
2.4 Data Normalization 52
2.5 Normal Distribution 54
2.6 Further Reading 60
2.7 Exercises 60
3 Categorical Attri butes 63
3.1 Univariate Analysis 63
3.2 Bivariate Analysis 72
3.3 Multivariate Analysis 82
3.4 Distance and Angle 87
3.5 Discretization 89
3.6 Further Reading 91
3.7 Exercises 91
4 Graph Data 93
4.1 Graph Concepts 93
4.2 Topological Attributes 97
4.3 Centrality Analysis 102
4.4 Graph Models 112
4.5 Further Reading 132
4.6 Exercises 132
5 Kernel Methods 134
5.1 Kernel Matrix 138
5.2 Vector Kernels 144
5.3 Basic Kernel Operations in Feature Space 148
5.4 Kernels for Complex Objects 154
5.5 Further Reading 163
5.6 Exercises163
6 High-dimensional Data 163
6.1 High-dimensional Objects 163
6.2 High-dimensional Volumes 165
6.3 Hypersphere Inscribed within Hypercube 168
6.4 Volume of Thin Hypersphere Shell 169
6.5 Diagonals in Hyperspace 171
6.6 Density of the Multivariate Normal 172
6.7 Appendix:Derivation of Hypersphere Volume 175
6.8 Further Reading 180
6.9 Exercises 180
7 Dimensionality Reduction 183
7.1 Background 183
7.2 Principal Component Analysis 187
7.3 Kernel Principal Component Analysis 202
7.4 Singular Value Decomposition 208
7.5 Further Reading 213
7.6 Exercises 214
PART TWO: FREQUENT PAπERN MINlNG
8 ltemset Mining 217
8.1 Frequent Itemsets and Association Rules 217
8.2 ltemset Mining Algorithms 221
8.3 Generating Association Rules 234
8.4 Further Reading 236
8.5 Exercises 237
9 Summarizing ltemsets 242
9.1 Maximal and Closed Frequent ltemsets 242
9.2 Mining Maximal Frequent Itemsets:GenMax Algorithm 245
9.3 Mining Closed Frequent ltemsets: Charm Algorithm 248
9.4 Nonderivable Itemsets 250
9.5 Further Reading 256
9.6 Exercises 256
10 Sequ ence tining 259
10.1 Freq uent Sequences 259
10.2 M ining Frequent Sequences 260
10.3 Substri ng Mining via Suffix Trees 267
10.4 Further Reading 277
10.5 Exercises 277
11 Graph Pattern Mining 280
11.1 Isomorphism and Support 280
11.2 Candidate Generation 284
11.3 The gSpan Algorithm 288
11.4 Further Reading 296
11.5 Exercises 297
12 Pattern and Rule Assessment 301
12.1 Rule and Pattern Assessment Measures 301
12.2 Significance Testing and Con fidence Intervals 316
12.3 Further Reading 328
12.4 Exercises 328
PART THREE: CLUSTERI NG
13 Representative-based Clusteri ng - 333
13.1 K-means Algorithm 333
13.2 Kernel K-means 338
13.3 Expectation-Maximization Clustering 342
13.4 Further Reading 360
13.5 Exercises 361
14 Hierarchical Clustering 364
14.1 Preliminaries 364
14.2 Agglomerative Hierarchical Clustering 366
14.3 Further Reading 372
14.4 Exercises and P叫ects 373
15 Density-based Clustering 375
15.1 The DBSCAN A lgorith m 375
15.2 Kernel Density Estimation 379
15.3 Density-based Clustering: DENCLUE 385
15.4 Further Reading 390
15.5 Exercises 391
16 Spectral an d G raph Clusteri ng 394
16.1 Graphs and Matrices 394
16.2 Clustering as Graph Cuts 401
16.3 Markov Clustering 416
16.4 Furt her Reading 422
16.5 Exercises 423
17 Clustering Validation 425
17.1 External Measures 425
17.2 Internal Measures 440
17.3 Relative Measures 448
17.4 Further Reading 461
17.5 Exercises 462
PART FOUR: CιASSIFICATION
18 Probabilistic Classification 467
18.1 Bayes Classifier 467
18.2 Naive Bayes Classifier 473
18.3 K Nearest Neighbors Classifier 477
18.4 Further Reading 479
18.5 Exercises 479
19 Decision Tree Classifier 481
19.1 Decision Trees 483
19.2 Decision Tree Algorithm 485
19.3 Further Reading 496
19.4 Exercises 496
20 Linear Discriminant Analysis 49
20.1 Optimal Linear Discriminant 498
20.2 Kernel Discriminant Analysis 505
20.3 Further Reading 511
20.4 Exercises 512
21 Support Vector Machines 514
21.1 Support Vectors and Margins 514
21.2 SVM: Linear and Separable Case 520
21.3 Soft Margin SVM: Linear and Nonseparable Case 524
21.4 Kernel SVM: Nonlinear Case 530
21.5 SVM Training Algorithms 534
21.6 Further Reading 545
21.7 Exercises 546
22 Classification Assessment 54
22.1 Classification Performance Measures 548
22.2 Classifier Evaluation 562
22.3 Bias-Variance Decomposition 572
22.4 Further Reading 581
22.5 Exercises 582
lndex 585
查看PDF
查看更多
馆藏单位
中科院文献情报中心