Title: Mathematical chemistry and chemoinformatics
Authors: Adalbert Kerber | Reinhard Laue | Markus Meringer | Christoph Rücker | Emma Schymanski
ISBN/ISSN: 9783110300079, 3110300079
Publication date: 2014
Publisher: de Gruyter
Preface
In this book, we describe, extend and apply methods of computer chemistry and chemoinformatics suitable for molecular structure generation, structure elucidation, combinatorial chemistry, QSPRs, the generation of chemical patent libraries and so on. The tools come from discrete mathematics (graph theory, constructive combinatorics), stochastics (explorative data analysis, supervised and unsupervised learning), computer science (data structures, algorithms) and chemistry (combinatorial chemistry, molecular structure elucidation).
The book evolved from research on constructive combinatorics at the University of Bayreuth, guided by A. Kerber and R. Laue, and based on the use of finite group actions. Combinatorial structures in the focus of this research are in particular codes, designs, groups and graphs. In the present case the emphasis is, of course, on molecular graphs, i.e. multigraphs where the nodes are colored by atom symbols and atom states. They form the model of molecules used in the generator MOLGEN. For this purpose, new methods had to be developed, see e.g. [146, 147, 174] and the present bibliography. The book is in a sense a summary of research projects (DFG Ke201/16-1, Ke201/19-1, BMBF 03KE7BA1-4, 03CO318C) which led to implementations of the software packages MOLGEN (several versions) for molecular structure generation, MOLGEN-MS and MOLGEN-MS/MS for molecular structure elucidation using mass spectroscopy, MOLGEN-COMB for combinatorial chemistry and MOLGEN-QSPR to support the search for quantitative structure-property relationships. We are indebted to the Deutsche Forschungsgemeinschaft (DFG) and the Bundesministerium für Bildung und Forschung (BMBF) for this long-ranging support, which not only made the development of MOLGEN possible but also had an impact on the general theoretical research. Several theses originated directly from these projects, e.g. [17, 23, 24, 32, 75, 76, 94, 95, 96, 97, 102, 202]. The aim of this research was to complement the well-known powerful counting methods with constructions. While counting gives the number of objects without listing any of them, the structures themselves are essential in chemistry.
Historically, the widely known project DENDRAL started in the early 1970s in the US. It can be considered as a precursor of the MOLGEN project in Bayreuth that refined the approach and added theoretical as well as practical material. An early version of the MOLGEN structure generator was awarded the German-Austrian University Software Prize for Chemistry (Deutsch-Österreichischer Hochschul-Software-Preis für Chemie) in 1993.
The book is based on the dissertation of M. Meringer [202] and also contains the main results of the dissertation of R. Gugisch [102]. C. Rücker used the mathematical tools detailed herein to develop software to find quantitative structure-property relationships (MOLGEN-QSPR) and software for teaching and studying a few facets of organic chemistry, isomerism and in particular stereoisomerism (UNMOLIS).
Finally, E. Schymanski used several MOLGEN products during her dissertation [283] at the Helmholtz Centre for Environmental Research (UFZ, Leipzig, Germany) to integrate analytical and computational methods to identify unknown toxicants isolated during effect-directed analysis. As usual in mathematics, the order of the book's authors is alphabetical and as such does not reflect the merits of individual authors. The book's PDF version can be used as an interactive e-book, which can be accessed via http://www.degruyter.com/view/product/185915?format=G.
The exercises contain illustrative examples that can be evaluated using software packages such as MOLGEN-ONLINE, SYMMETRICA, MAGMA and others, directly via the respective homepages.
The book is written for the users of such software, as they need to know what is really meant when we speak of the generation of molecular graphs, of substructures, of a goodlist of prescribed substructures, of overlapping substructures, of non-overlapping substructures, of closed substructures, of substructure counts, of molecular descriptors, and so on. Otherwise, users will not be able to achieve the full potential of the software. It is also meant to provide documentation of the mathematical basics required for the designers of software for computational chemistry or chemoinformatics.
We emphasize in particular the following aspects:
The basic mathematical concepts for representation and evaluation of molecular structures: molecular graphs, substructures, restrictions, reactions, structure generation, molecular descriptors and the statistical learning methods that play a central role in the applications.
The most important results and facts, in particular the extensions of the MOLGEN class library: reaction-based generation of structures, QSPR studies using different kinds of molecular descriptors, various methods for prediction, ranking and classification of mass spectra, relations between spectra and properties and CASE using electron impact (EI) mass spectrometry.
Perspectives and suggestions for further research: new approaches to the interpretation and verification of mass spectra, to stereoisomer and conformer generation, normal forms for patents in chemistry and CASE using high resolution mass spectrometry.
We would like to thank D. Moser, whose diploma thesis [214] was the starting point for our research, a quarter of a century ago. It contained the first version of the generator MOLGEN, showing that it is possible to make an efficient molecular generator available to the scientific community. The contributions of R. Grund to orderly generation of molecular graphs, and those of R. Hohberger (evaluation of 2D and 3D placements of molecular graphs) are very useful. Together with T. Wieland and C. Benecke they were responsible for the implementation of MOLGEN up to version 3.5. Thanks are also due to T. Grüner for the development of constrained construction strategies for molecular graphs which are used in MOLGEN 4 and MOLGEN-MS, and his orderly generation of double cosets for the generation of combinatorial libraries for MOLGEN-COMB. We gratefully acknowledge J. Braun's implementation of molecular descriptors and aromaticity detection, which are important modules of MOLGEN-QSPR, as well as the work of R. Gugisch in developing MOLGEN 5.0.
Moreover, we should like to emphasize the enormous transfer of know-how in chemoinformatics we received from many people, in particular from K. Varmuza (Vienna University of Technology) and W. Werther (University of Vienna). Thanks are also due to R. Neudert (Chemical Concepts) and S. Stein (NIST) for the MS databases, W. Brack (UFZ Leipzig) for the excellent supervision and support during E. Schymanski's thesis, as well as many other colleagues from UFZ Leipzig, and finally to S. Reinker (NIBR Basel), who provided first measurements from newest generation tandem mass spectrometers with ultrahigh mass resolution for the evaluation of MOLGEN-MS/MS.
Our interest in mathematical chemistry was stimulated in particular by A. Dreiding, A. Dress, M. E. Elyashberg, H. Gerlach, I. Gutman, W. Hässelbarth, O. E. Polansky, E. Ruch, S. Tratch, I. Ugi, N. Zefirov and by the foundation of MATCH in 1975.
Bayreuth, München, Lüneburg, Zürich, June 4, 2013
A. Kerber, R. Laue, M. Meringer, C. Rücker, E. Schymanski
Contents
Preface v
List of figures xiv
List of tables xvi
List of symbols xxi
Introduction and outline 1
1 Basics of graphs and molecular graphs 13
1.1 Graphs 13
1.1.1 Labeled graphs 14
1.1.2 Unlabeled graphs 17
1.2 Molecular graphs,constitutional isomers 25
1.2.1 Atom states in organic chemistry 26
1.2.2 Constitutional isomers 29
1.2.3 The existence of molecular graphs 34
1.3 Group actions on molecular graphs 35
1.3.1 Counting unlabeled structures 36
1.3.2 Counting by weight 44
1.3.3 Constructive methods 47
1.3.4 Generating samples 52
2 Advanced properties of molecular graphs 56
2.1 Substructures 56
2.1.1 Graph-theoretical elements 56
2.1.2 Subgraphs and their embeddings 59
2.2 Molecular substructures 62
2.2.1 Ambiguous molecular graphs 62
2.2.2 Substructure restrictions 64
2.3 Chemical reactions 66
2.4 Mesomerism 69
2.5 Existing chemical compounds 72
2.6 Molecular descriptors 76
2.6.1 Arithmetical descriptors 78
2.6.2 Topological descriptors 79
2.6.3 Geometrical descriptors 87
3 Chirality 91
3.1 Orientation and chirality 92
3.1.1 Barycentric placement of molecules in space 93
3.1.2 Symmetry operations, the point group 98
3.1.3 Chirality and handedness 102
3.2 Permutational isomers 106
3.2.1 Counting permutational isomers 109
3.2.2 Permutational isomers by content 113
3.2.3 Enumeration by symmetry 118
3.2.4 Constructive aspects 124
4 Stereoisomers 132
4.1 Basic stereochemistry 132
4.1.1 Symmetry, the orientational automorphism group 140
4.1.2 Partial orientation functions (POFs) 141
4.1.3 Generation of abstract POFs 143
4.1.4 Tests for chemical realizability 146
4.2 Radon partitions 150
4.3 Binary Grassmann-Plücker relations 154
4.4 Chemical conformation and cyclohexane 158
4.5 Perspectives 162
5 Molecular structure generation 164
5.1 Formula-based structure generation 165
5.1.1 Orderly generation of simple graphs 165
5.1.2 Introducing constraints 171
5.1.3 Variations and refinements 172
5.1.4 From simple graphs to multigraphs 173
5.1.5 Applying the Homomorphism Principle 174
5.1.6 Orderly generation 176
5.1.7 Beyond orderly generation 179
5.2 Constrained generation and fuzzy formulas 180
5.2.1 Restrictions for a molecular formula 181
5.2.2 Structural restrictions 182
5.3 Reaction-based structure generation 183
5.3.1 Libraries of permutational isomers 183
5.3.2 Attaching substituents to a central molecule 190
5.3.3 Generation using the network principle 191
5.3.4 Generation of MS fragments 193
5.3.5 Construction using the network principle 194
5.3.6 Combinatorial libraries 195
5.3.7 Ugi's seven component reaction 196
5.4 Generic structural formulas 199
5.4.1 A simple generic structural formula 199
5.4.2 Patents in chemistry 202
5.5 Canonizing molecular graphs 204
5.5.1 Initial classification 206
5.5.2 Iterative refinement 207
5.5.3 Labeling by backtracking 209
5.5.4 Pruning the backtrack tree 210
5.5.5 Profiting from symmetry 214
5.6 Data structures for molecular graphs 219
6 Supervised statistical learning 221
6.1 Variables and predicting functions 221
6.1.1 Regression and classification 222
6.1.2 Validation of the predicting function 224
6.1.3 Preprocessing of data 227
6.1.4 Selection of variables 228
6.2 Models for predicting functions 231
6.2.1 Linear models 231
6.2.2 Neural networks 233
6.2.3 Support vector machines 234
6.2.4 Decision trees 236
6.2.5 Nearest neighbors 238
7 Quantitative structure-property relationships 240
7.1 Optimization of experiments in combinatorial chemistry 240
7.2 The use of molecular descriptors 242
7.2.1 Arithmetical, topological, and geometrical descriptors 243
7.2.2 Substructure counts 250
7.3 Mathematical composition of QSPRs 251
7.4 Case studies of QSPRs obtained by linear modeling 254
7.4.1 Linear modeling using topological indices 255
7.4.2 Linear modeling using substructure counts 261
7.4.3 Linear modeling using TI and SC 265
7.4.4 Further descriptors and regression methods 268
7.4.5 Prediction 270
7.5 Case studies with separate learning and test sets 270
7.5.1 Preprocessing of structures 271
7.5.2 Choice of descriptors 273
7.5.3 Linear modeling by best subset selection 275
7.5.4 Linear modeling by stepwise subset selection 277
7.5.5 Linear modeling using principal component regression 282
7.6 A case study of QSARs with discrete values 284
7.6.1 Choice and redundancy of descriptors 284
7.6.2 Regression 286
7.6.3 Multi-classification 288
7.6.4 Binary classification 290
7.6.5 Prediction 294
7.7 Outlook: Unsupervised learning and diversity considerations 295
8 Molecular structure elucidation 297
8.1 Spectroscopic methods 297
8.2 Automated molecular structure elucidation 298
8.3 Basics of mass spectrometry 301
8.3.1 Mode of operation of an EI mass spectrometer 302
8.3.2 Problems in EI mass spectrometry 303
8.3.3 Mass spectra and isotope distributions 306
8.3.4 Database of elucidated mass spectra 311
8.4 Ranking functions for mass spectra 314
8.4.1 Ranking of molecular formulas 319
8.4.2 Ranking of structural formulas 327
8.5 Classification of mass spectra 338
8.5.1 MS descriptors 340
8.5.2 MS classifiers 341
8.5.3 Search for substructures amenable to MS classification 355
8.6 Automated structure elucidation via MS 356
8.6.1 Example methyl n-pentanoate 357
8.6.2 Example ethyl 3-hydroxyphenylacetate 361
8.7 High resolution MS 363
8.7.1 Exact isotope masses 363
8.7.2 Molecular formulas of identical exact mass 364
8.7.3 Mass differences between molecular formulas 365
8.7.4 Molecular formulas from exact molecular masses 368
8.8 High resolution MS/MS 372
8.8.1 Generating molecular formulas 373
8.8.2 Calculating MS match values 374
8.8.3 Calculating MS/MS match values 376
8.8.4 Verifying MS/MS match values experimentally 379
8.8.5 Scope, limitations and outlook for HR-MS 390
9 Case studies of CASE 393
9.1 CASE with MOLGEN-MS 393
9.1.1 Example for a single spectrum 393
9.1.2 Multiple spectra 395
9.2 Calculated properties to improve CASE 396
9.2.1 Mass spectrum prediction 397
9.2.2 Retention properties 398
9.2.3 Partitioning properties 399
9.2.4 Steric energy 400
9.2.5 Filtering candidates by calculated properties 401
9.2.6 Consensus scoring 406
9.3 Examples of CASE at work 407
9.3.1 Blue rayon unknown 1 408
9.3.2 Blue rayon unknown 2 410
9.3.3 Diclofenac transformation product 412
9.4 CASE conclusions and outlook 415
9.4.1 GC-EI-MS 415
9.4.2 CASE with high accuracy data 417
A Lists of molecular descriptors 418
A.1 Arithmetical descriptors 418
A.2 Topological descriptors 418
A.3 Geometrical descriptors 421
B Substructures for MS classifiers 422
B.1 Alkyls 423
B.2 Aromatics 425
B.3 Bonds 436
B.4 Elements 437
B.5 Functional groups 438
B.6 Rings 442
C Molecular formulas by mass and ion type 443
D Isomers by mass and molecular formula 447
Bibliography 459
List of abbreviations 475
Index 477
Holding library
National Science Library, Chinese Academy of Sciences