Foreign-Language Science and Technology Books: Book Profile

Title: Bitemporal data

Author: Tom Johnston

ISBN/ISSN: 9780124080676, 0124080677

Publication date: 2014

Publisher: Morgan Kaufmann, an imprint of Elsevier

Classification: Automation technology, computer technology


Abstract

Bitemporal data has always been important. But it was not until 2011 that the ISO released a SQL standard that supported it. Among major DBMS vendors, Oracle, IBM and Teradata now provide at least some bitemporal functionality in their flagship products. But to use these products effectively, someone in your IT organization needs to know more than how to code bitemporal SQL statements. Perhaps, in your organization, that person is you.
To correctly interpret business requests for temporal data, to correctly specify requirements to your IT development staff, and to correctly design bitemporal databases and applications, someone in your enterprise needs a deep understanding of both the theory and the practice of managing bitemporal data. Someone also needs to understand what the future may bring in the way of additional temporal functionality, so their enterprise can plan for it. Perhaps, in your organization, that person is you.
This is the book that will show the do-it-yourself IT professional how to design and build bitemporal databases and how to write bitemporal transactions and queries, and will show those who will direct the use of vendor-provided bitemporal DBMSs exactly what is going on "under the covers" of that software.
Key Features
Explains the business value of bitemporal data in terms of the information that can be provided by bitemporal tables and not by any other form of temporal data, including history tables, version tables, snapshot tables, or slowly-changing dimensions
Provides an integrated account of the mathematics, logic, ontology and semantics of relational theory and relational databases, in terms of which current relational theory and practice can be seen as unnecessarily constrained to the management of nontemporal and incompletely temporal data
Explains how bitemporal tables can provide the time-variance and nonvolatility hitherto lacking in Inmon historical data warehouses
Explains how bitemporal dimensions can replace slowly-changing dimensions in Kimball star schemas, and why they should do so
Describes several extensions to the current theory and practice of bitemporal data, including the use of episodes, "whenever" temporal transactions and queries, and future transaction time
Points out a basic error in the ISO’s bitemporal SQL standard, and warns practitioners against the use of that faulty functionality. Recommends six extensions to the ISO standard which will increase the business value of bitemporal data
Points towards a tritemporal future for bitemporal data, in which an Aristotelian ontology and a speech-act semantics support the direct management of the statements inscribed in the rows of relational tables, and add the ability to track the provenance of database content to existing bitemporal databases
This book also provides the background needed to become a business ontologist, and explains why an IT data management person, deeply familiar with corporate databases, is best suited to play that role. Perhaps, in your organization, that person is you.


Preface

Time present and time past
Are both perhaps present in time future
And time future contained in time past.
T. S. Eliot, The Four Quartets: Burnt Norton
In this fragment from Burnt Norton, Eliot describes a Buddhist conception of time, one which encourages us to think of past time, present time and future time as interwoven with one another. This Buddhist concept is a useful counter-balance to our mechanistic notion of time as a linear sequence of moments which occur one after the other, and which constitute a series which can be traversed in one direction only.
Anything at all (you, or me, or any of the changeable objects around us) is at the present moment the latest stage in the history of what we are. With a different history, we would, at this present moment, be other than what we are now. In this sense, William Faulkner was correct when he wrote (in Requiem for a Nun), "The past is never dead. It's not even past."
It is perhaps with human beings, and the short-term and long-term projects and plans that inform their lives, that it is most obviously true that time present and time past are present in time future. Somewhere, a store manager is reviewing a history of product price changes and their effect on sales. She isn't doing this out of simple curiosity. She is doing it because she wants to maximize future profits for her store. Somewhere, an author is working on the Great American Novel. He isn't doing it just to pass the time. He imagines a future in which he has accomplished the great work of his life, in which accolades are heaped on him, and in which royalty checks are more than pittances. If and when either of those futures is achieved, it will be because of a history of present moments, each the culmination of a sequence of past moments during which those people worked towards those future goals.
So the intimate relationships of past, present and future manifest themselves in the changes that take place in the world. But they also manifest themselves in the changes that take place in what we say about the world.
This brings us to the subject of this book: temporal data and, in particular, bitemporal data. Bitemporal data is data that is associated with two kinds of time. One of these is the time in which things happen in the world; the other is the time in which descriptions of the world accumulate. The first kind of time is about when things were, are, or will be as the data which describes those things says they were, are, or will be. The second kind of time is about that data itself. It is about when we once thought, or still think, or may eventually come to think, that that data correctly describes what things were, are, or will be like; or at least when we once thought or still think that that data constitutes the best descriptions currently available to us.
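The two kinds of time can be made concrete with a small sketch. The Python below is illustrative only: the class name `BitemporalRow`, its field names, the half-open period convention, the `FOREVER` marker, and the sample customer data are all assumptions made for this example, not the syntax used in this book or in any DBMS.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # hypothetical stand-in for "until further notice"


@dataclass(frozen=True)
class BitemporalRow:
    key: str            # identifies the thing the row describes
    value: str          # what the row says about that thing
    state_begin: date   # first kind of time: when things are as described...
    state_end: date     # ...up to, but not including, this date
    known_begin: date   # second kind of time: when we held this description...
    known_end: date     # ...up to, but not including, this date


def lookup(rows, key, state_on, known_on):
    """What did we believe on `known_on` about `key` as it was on `state_on`?"""
    for r in rows:
        if (r.key == key
                and r.state_begin <= state_on < r.state_end
                and r.known_begin <= known_on < r.known_end):
            return r.value
    return None


# On 2013-01-01 we record that customer C1 is "silver". On 2013-03-01 we learn
# that C1 had in fact been "gold" since 2013-02-01, and record that instead.
rows = [
    BitemporalRow("C1", "silver", date(2013, 1, 1), FOREVER,
                  date(2013, 1, 1), date(2013, 3, 1)),
    BitemporalRow("C1", "silver", date(2013, 1, 1), date(2013, 2, 1),
                  date(2013, 3, 1), FOREVER),
    BitemporalRow("C1", "gold", date(2013, 2, 1), FOREVER,
                  date(2013, 3, 1), FOREVER),
]

print(lookup(rows, "C1", date(2013, 2, 15), date(2013, 2, 20)))  # silver
print(lookup(rows, "C1", date(2013, 2, 15), date(2013, 3, 5)))   # gold
```

The same question about mid-February gets two answers, depending on when it is asked. Neither answer is lost, because the earlier description is closed off in the second kind of time rather than overwritten.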
This book is about bitemporal data that is persisted in relational databases, and about the information which that data provides. However, the extension to non-relational ways of persisting data is straightforward. I talk about data in relational databases, first of all, because that is the prevalent way of storing character set data, and because character set data is still the prevalent kind of data that describes the things an enterprise engages with, and the processes in which it engages with them.
I talk about data in relational databases, secondly, because the language of relational data and relational databases is a lingua franca among data management professionals. For example, we all know what tables, rows and columns are, and we all know what entity integrity and referential integrity are. Or, at least, we all should know these things.
But I also talk about data in relational databases, thirdly and most importantly, because relational theory is the richest and most mathematically informed of theories of data management. It is thus best suited to incorporate extensions needed to manage bitemporal data while itself remaining stable and well-grounded.
Relational theory also has both an ontology and a semantics, although neither is much discussed. To the best of my knowledge, little has been written about how the ontology and the semantics of the Relational Paradigm (as I will call the use of relational theory in data management) give meaning to the mathematical structures of sets, Cartesian Products and relations, and to their concrete manifestations as tables, columns and rows.
But in this book, I would like to say something about the ontology and the semantics of the Relational Paradigm, a set of concepts based on the relational theory invented by Dr. E. F. Codd, and on the implementation of that theory in the world's major Database Management Systems (DBMSs). In fact, I don't think that the Relational Paradigm can be correctly extended to accommodate bitemporal data unless these perspectives are understood and taken into consideration.
PERSPECTIVES ON THE RELATIONAL PARADIGM OF DATA
One of the distinctive features of this book is that it discusses relational concepts, and their extension into the realm of bitemporal data, from several perspectives. In these discussions, I try to avoid explanations which mix these perspectives because I think that when that happens, explanations become pseudo-explanations which in fact explain nothing at all. In these discussions, I will occasionally point out examples of perspectival confusion so the reader may be better prepared to recognize it when she encounters it in her own working environment.
One perspectival distinction is the distinction between syntax and semantics. This distinction will become clearer through repeated use, but this much can be said at the outset. The syntax of the Relational Paradigm describes relational data structures, instances of those structures, and transformations made to those instances. It's about the things that DBAs and programmers construct and manipulate. A Customer table is a data structure, for example, and one row in that table is an instance of that structure. An update to a row in that table is a transformation made to that instance. Syntax describes structures and their instances, and transformations on those instances. Those transformations add instances to a database, change instances in a database, and remove instances from a database. The instances have the structure described by their syntax. The transformations add and remove syntactically valid instances, and change valid instances into other valid instances.
The semantics of the Relational Paradigm is about the information expressed in those data structures and in their instances. Data is created and modified so that it accurately conveys information. If customer Smith changes her name to "Jones", then we change her name on her row in the Customer table to reflect that change.
The important point here is that what we do to data, we do in order to preserve its value as an embodiment of information. That is all too obvious, of course. But once we get deep into the syntax of data and its management, it is easy to lose sight of this important fact. Information is the master; data is the servant.
Here is a brief example. Relational entity integrity is often explained as the rule that no primary key in a table can be null, and that each primary key must be unique. That is a rule of syntax that a relational DBMS enforces.
Is the semantics of entity integrity left undescribed because it is too obvious to be worth mentioning? Well, consider the fact that the semantics of entity integrity is that a database may never contain contradictory statements. Is this so widely recognized and so obvious as to not be worth mentioning? I don't think so.
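The syntactic rule and its semantic reading can be set side by side in a short sketch. This is a minimal illustration in Python, not how any DBMS enforces the constraint: the function name `entity_integrity_violations`, the `cust_id` column, and the sample rows are hypothetical.

```python
def entity_integrity_violations(rows, key_column):
    """Check the syntactic rule: every primary key is non-null and unique.

    Returns a list of human-readable violations (empty if the rule holds).
    """
    first_seen = {}   # key value -> position of the first row that used it
    problems = []
    for position, row in enumerate(rows):
        key = row.get(key_column)
        if key is None:
            problems.append(f"row {position}: null primary key")
        elif key in first_seen:
            problems.append(
                f"row {position}: duplicates key {key!r} of row {first_seen[key]}")
        else:
            first_seen[key] = position
    return problems


customers = [
    {"cust_id": "C1", "name": "Smith"},
    # Read semantically, this row and the one above make contradictory
    # statements: the customer identified by C1 is named Smith, and is named Jones.
    {"cust_id": "C1", "name": "Jones"},
    # A row with a null key is a statement about no identifiable customer.
    {"cust_id": None, "name": "Brown"},
]

for problem in entity_integrity_violations(customers, "cust_id"):
    print(problem)
```

The check itself is purely syntactic: it compares key values and knows nothing about customers. The comments on the sample rows state what each violation would mean if the rows were read as statements.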
A consideration of contradictory statements is an entry into the realm of propositional logic and predicate logic. I discuss these perspectives on the Relational Paradigm in this book because we data management professionals should have some understanding of that logic, of how it is expressed in the Relational Paradigm, and of how it is used to manage data in relational databases.
We are all willing to do the hand-waving which acknowledges that relational theory is based on mathematics and logic. But if we can catch on to the trick of seeing mathematics and logic embedded in the data structures and transformations that we manage, then we will build better databases and better applications. In particular, we will be more likely to provide generalized solutions to specific problems. These solutions are always more stable in the face of changing requirements than point solutions to specific problems are. They are easier to code and to maintain because they express simpler and clearer patterns than do idiosyncratic implementations of solutions to narrowly conceptualized problems. They are always better solutions.
THE TEMPORAL SQL STANDARDS: ISO 9075:2011 AND TSQL2
In late 2011, the ISO published the latest release of its SQL standard, ISO 9075:2011. This was the first ISO release to include support for bitemporal data. Prior to that, in 1994, a group of computer scientists published the TSQL2 proposed standard for the management of bitemporal data, but this proposal was never accepted by the ISO. Nonetheless, I will refer to it as a standard because it is a draft standard which represented, at the time, a consensus among a significant part of the computer science community.
A current implementation of the ISO SQL standard can be found in IBM's DB2 10 DBMS and its successive releases, and a current implementation of the TSQL2 standard can be found in the Teradata 13 DBMS and its successive releases.
This book is not an introduction to either of these standards, or to either of these families of products. For example, the insert, update and delete transactions that I describe in this book (in Chapter 11), and the queries that I also describe (in Chapters 12 and 13) use my own syntax. More
1 However, I don't claim that these two commercial products are complete with respect to their regulating standards documents, or even that they are fully conformant with them. They may or may not be. I haven't looked into the issue closely enough to make that determination.
From this perspective, you will be better prepared to understand the good, the bad, and perhaps even the ugly, in the current temporal standards and in current vendor implementations of those standards. With that understanding, you will be better prepared to utilize those vendor implementations in a manner that maximizes their value to your enterprise, and minimizes the penalties of their shortcomings. You will also be in a better position to influence the evolution of those standards and those products.
AUDIENCE
The "Theory" chapters of this book contain material that may be unfamiliar to many IT professionals, while the "Practice" chapters of this book contain material that may be unfamiliar to many computer scientists. Not everyone will find every part of this book equally relevant to her work and to her interests. But theorists can always benefit from seeing how their theories are put to use, and practitioners can always benefit from understanding the theory behind the data constructs they build and maintain.
This suggests two different reading strategies. I recommend that both the theorist and the practitioner begin each chapter by reading the Glossary entry for each term in that chapter's Glossary List. Then, for the theorist, reading from Chapter 1 through to the end of the book is probably the best strategy. But for the practitioner, especially the IT professional who does not have a strong background in one or more of mathematics, logic, ontology or semantics, the better strategy might be to read Chapter 1 and then skip ahead to Part 2 and use the rest of Part 1 as reference material. Whenever the narrative in Part 2 is confusing, this reader should first identify unfamiliar technical terms and look up their definitions. The key technical terms which I have introduced are defined in the Glossary for this book (available at http://booksite.elsevier.com/9780124080676). The definitions of other important technical terms will be available from reliable sources on the internet.
If this review of definitions doesn't provide enough clarification, then this reader should go back to Part 1 and at least skim the relevant chapter or chapters there. After finishing the book by reading chapters in this sequence, the practitioner should be well-equipped to go back and study the chapters in Part 1.
All this theory and practice, of course, has a focus. That focus is bitemporal data. But this book is not for IT professionals whose only interest is in gaining proficiency in how bitemporal data is managed by a specific DBMS, or for those whose only interest is in a commentary on one or other of the temporal SQL standards. It is, rather, for IT professionals who will work with specific vendor support for bitemporal data, but who will benefit from understanding the theory behind the constructs and functions, behind the DDL and DML, of those specific vendor products. It is also for computer scientists and their students, not only those interested in the management of temporal data, but also those interested in formal ontologies and their role in expressing the semantics of data. So this audience includes:
business analysts;
enterprise data modelers;
enterprise data architects;
database, data warehouse and data mart developers;
business ontologists;
computer science students; and
computer scientists.
BUSINESS ANALYSTS
The role of the business analyst is not a passive one. Subject matter experts can seldom express their requirements clearly enough that what they really want can be built from an initial statement of those requirements. Developing accurate requirements is an interactive process in which the business analyst must bring out distinctions that were "too obvious" to the experts to mention, and must describe relevant possibilities to those experts that exist because of the capabilities of DBMSs and of related hardware and software technologies.
When temporal requirements are the issue, it is especially easy for subject matter experts to fail to communicate what they really want. When the requirement is to "add history" to one or more tables, the business analyst will often write up requirements that developers will implement by adding a timestamp to the primary keys of those tables. After implementation, when the business users find that the "history" that is available does not allow them to rerun a report and get the same results as when the report was originally run, they will wonder what the business analyst meant when she promised them support for "as-was" as well as "as-is" reporting.
The as-was capabilities that she promised, in fact, were never anything more than the ability to report on the past, using current data. But if the data as originally entered was incorrect or incomplete, and was later corrected or completed, then that as-was reporting will not reproduce what was originally reported.
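The difference between the two kinds of "history" can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the product, the prices, the dates, the tuple layout, and the function name `price_report` are hypothetical, and the correction mechanism is simplified to a single overwritten value.

```python
from datetime import date

FOREVER = date.max  # hypothetical "until further notice" marker

# "History" implemented by putting a timestamp in the primary key:
# (product, effective_date) -> price.
history = {("widget", date(2014, 1, 1)): 10}
original_report = history[("widget", date(2014, 1, 1))]   # run on 2014-01-31

# On 2014-02-10 the price is found to have been entered incorrectly and is
# corrected in place. The original value is gone.
history[("widget", date(2014, 1, 1))] = 12
rerun_report = history[("widget", date(2014, 1, 1))]

print(original_report, rerun_report)  # 10 12: the rerun does not match

# Bitemporal version: the correction closes the old row's recorded period
# and adds a new row, so both descriptions survive.
# (product, effective_from, price, recorded_from, recorded_until)
bitemporal = [
    ("widget", date(2014, 1, 1), 10, date(2014, 1, 1), date(2014, 2, 10)),
    ("widget", date(2014, 1, 1), 12, date(2014, 2, 10), FOREVER),
]


def price_report(rows, product, effective_on, as_known_on):
    """Price in effect on `effective_on`, according to the data as it stood on `as_known_on`."""
    candidates = [r for r in rows
                  if r[0] == product
                  and r[1] <= effective_on
                  and r[3] <= as_known_on < r[4]]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[1])[2]  # latest effective row wins


print(price_report(bitemporal, "widget", date(2014, 1, 15), date(2014, 1, 31)))  # 10
print(price_report(bitemporal, "widget", date(2014, 1, 15), date(2014, 3, 1)))   # 12
```

With the second structure, the January report can be reproduced exactly by asking the question as of the date on which the report was originally run.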
Occasionally, the business analyst is confronted with another important requirement which, in the absence of a yet-to-be-implemented feature of bitemporal data, cannot be fulfilled. That is the requirement to implement a go-live event for a very large number of rows in a set of database tables. This is a situation in which there are too many rows to load in one off-line session. But the business may be unhappy when IT tells them that it will take multiple off-line sessions to complete the load, the reason for their unhappiness being that when the database comes back up after each session, the results of that session immediately become visible in the database. And so there isn't one go-live event. There are as many events as there are batch load sessions.
The business analyst should understand that this problem is now solvable. The solution isn't to attach a future effectivity date to the data being loaded, because it may be that the data is a mix of rows which are to become immediately effective, rows which are to become effective on specified future dates, and even rows which became effective on specified past dates. It is all of these rows, those with past effectivity, those with future effectivity, and those with current effectivity, that the business wants to become available to database users at the same point in time, on the same go-live event.
Unfortunately, the solution to this problem is neither specified in the current SQL standards nor implemented in current DBMSs which support bitemporal data. The solution to this problem lies in an extension to current bitemporal theory which I first described, with my co-author Randy Weis, in our 2010 book Managing Time in Relational Databases, and which I describe in this book in Chapter 14. But when IT professionals understand that this solution has been defined, then together we may be able to encourage standards bodies and DBMS vendors to implement it.
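One way to picture a single go-live event is to separate the date on which a row is physically loaded from the date on which it starts to count as part of what the database says. The Python below is a sketch under that assumption and should not be read as the mechanism defined in Chapter 14. The names `LoadedRow`, `load_batch`, `visible`, and `GO_LIVE`, and all dates, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

GO_LIVE = date(2014, 6, 1)  # hypothetical go-live date agreed with the business


@dataclass(frozen=True)
class LoadedRow:
    key: str
    effective_from: date   # may be past, current, or future relative to go-live
    loaded_on: date        # when a batch session physically created the row
    asserted_from: date    # when the row starts to count as a statement we make


def load_batch(table, keys_and_dates, session_date, go_live=GO_LIVE):
    """Physically add rows now, but defer their visibility until go-live."""
    for key, effective_from in keys_and_dates:
        table.append(LoadedRow(key, effective_from, session_date, go_live))


def visible(table, today):
    """Rows that database users can see on `today`."""
    return [r for r in table if r.asserted_from <= today]


table = []
# Two off-line sessions, a week apart, loading rows with past, current
# and future effectivity.
load_batch(table, [("A", date(2014, 1, 1)), ("B", date(2014, 6, 1))],
           session_date=date(2014, 5, 10))
load_batch(table, [("C", date(2014, 9, 1))],
           session_date=date(2014, 5, 17))

print(len(visible(table, date(2014, 5, 20))))  # 0: nothing shows between sessions
print(len(visible(table, GO_LIVE)))            # 3: everything appears at once
```

Note that the effectivity dates play no part in visibility. That is why attaching a future effectivity date to the loaded data would not solve the problem.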
This is another reason to understand more about bitemporal data than just how to code it. If we understand bitemporal data itself, we don't have to passively accept what standards bodies define and what vendors implement. We can have a say in the process because we will know what is possible, and we can prioritize what we want. Perhaps it is not customary for IT professionals in end-user organizations to attempt to influence standards committees or DBMS vendors in any serious way. But there is no reason why it shouldn't be done.
ENTERPRISE DATA MODELERS
Bitemporal requirements do not have to be hand-designed into data models. And they shouldn't be. A declarative specification of temporal requirements is far less costly than coding those requirements in data models. It is also a more reliable way of expressing those requirements.
One reason we can use a declarative approach is that, for a given DBMS, there will be no two ways about how to make tables bitemporal. The CREATE TABLE statements for bitemporal tables are well-defined in the standards. Unitemporal tables are also defined in the standards, and so there is a choice there for the modeler to express. But that choice, and all other choices whose end result is to define a set of one or more bitemporal or unitemporal tables, can easily be expressed in metadata by selecting options from a list from which temporal DDL is then generated.
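Generating temporal DDL from a list of options might look something like the following Python sketch. The function `temporal_ddl` and its option names are hypothetical. The period and versioning clauses it emits are only loosely modeled on the style of the SQL:2011 standard; their exact spelling is an assumption here and varies by DBMS, so the output should be checked against a specific product's documentation before use.

```python
def temporal_ddl(table, columns, state_time=False, system_time=False):
    """Build a CREATE TABLE statement from declarative temporal options.

    `columns` is a list of (name, sql_type) pairs for the non-temporal columns.
    Selecting neither option yields a conventional table, one option a
    unitemporal table, and both options a bitemporal table.
    """
    lines = [f"  {name} {sql_type}" for name, sql_type in columns]
    if state_time:
        # Period describing when things are as the row says they are.
        lines += ["  state_begin DATE NOT NULL",
                  "  state_end DATE NOT NULL",
                  "  PERIOD FOR state_time (state_begin, state_end)"]
    if system_time:
        # Period maintained by the DBMS, describing the row itself.
        lines += ["  sys_begin TIMESTAMP(12) GENERATED ALWAYS AS ROW START",
                  "  sys_end TIMESTAMP(12) GENERATED ALWAYS AS ROW END",
                  "  PERIOD FOR SYSTEM_TIME (sys_begin, sys_end)"]
    suffix = " WITH SYSTEM VERSIONING" if system_time else ""
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + f"\n){suffix};"


customer_columns = [("cust_id", "INTEGER NOT NULL"), ("name", "VARCHAR(40)")]
print(temporal_ddl("customer", customer_columns, state_time=True, system_time=True))
```

The modeler's choice is reduced to two flags per table, and the temporal columns and clauses are never hand-designed.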
Nonetheless, data modelers will have to specify those options, and so there will be something related to bitemporal data for the modeler to do. Also, although the bitemporal target of a table conversion will be well-defined, the source table will be as idiosyncratic as you care to imagine. The mapping from a non-bitemporal source table to a bitemporal target table will require a careful inventory and labelling of the components of the non-temporal table. For example, primary keys, foreign keys, surrogate keys, natural keys and non-keys will all have to be identified, because they must all be mapped onto corresponding components of the target bitemporal table. If the source table is the result of a prior attempt to "add history", the mapping will be even more complex.
There is also another major role for the data modeler that is on the horizon, and the chapter on the ontology of relational data (Chapter 5) introduces that role. The role is that of a business ontologist. Many major corporations do not yet have ontologists on their IT staffs. Others have ontologists, and employ them to develop ontologies for the information expressed in emails, documents, and other semi-structured data. But the most important ontology an enterprise can develop is the ontology that formalizes the types of which the rows in that enterprise's production tables are instances.
It is the enterprise data modelers of a company who are most familiar with that data and with those types, and it is those data modelers who are best qualified to become the ontologists of their companies' production databases. But for this role, the mathematics, logic, ontology, and semantics of relational databases are all important. Without some grounding in all four of these perspectives on relational databases, these IT professionals will be unable to take that considerable step from data modeler to data ontologist.
I hope this book will be useful as an introduction to enterprise data ontology and to related theoretical topics for those IT professionals who have the best understanding of an enterprise's core data: enterprise data modelers.
ENTERPRISE DATA ARCHITECTS
Enterprise data architecture, in some considerable part, is enterprise data modeling writ large. Of course, data architecture must consider data in motion as well as data at rest. But even with respect to data in motion, the structure of the data itself is important. For example, as I will describe in Chapter 17, the canonical message model in an SOA architecture can in large part replace an enterprise data model.
The ontologies discovered in semi-structured data will have much in common with the ontologies discovered in an enterprise's production databases. It is the natural role of the enterprise data architect to oversee the integration of these ontologies.
DATABASE, DATA WAREHOUSE, AND DATA MART DEVELOPERS
In this book, I will demonstrate that Inmon-style data warehouses and Kimball-style data marts are not able to provide the historical information that bitemporal data can provide. The accumulation of data in today's data warehouses does not distinguish between a history of what the data is about and a history of the data itself. Yet this distinction is vital to supporting the real requirements that business users have in mind when they request "historical data". The history provided by the slowly-changing dimensions of star schemas is also unable to make this distinction and provide this information.
This will change. The added information that becomes available only with bitemporal data is too important to ignore. Developers will have to make their databases, data warehouses, and data marts bitemporal, although this conversion to bitemporal data can take place gradually, one set of tables at a time. So when the first projects roll out to make these modifications, developers had better have some understanding of what bitemporal structures look like, and of the differences between entity integrity and temporal entity integrity and between referential integrity and temporal referential integrity. This book provides a DBMS-neutral description of these structures and these constraints.
BUSINESS ONTOLOGISTS
Many business ontologists will have some familiarity with semantics, and should have some familiarity with both classical and formal ontologies. If not, I discuss those topics in Chapters 5 and 6. But if the business ontologist has not been put to work developing ontologies for the entity, attribute and relationship types of the production data of the enterprise he works for, he may not be familiar with the mathematics of relational databases, or with the propositional and predicate logic used to access and manipulate those databases. If he is not, he should pay special attention to Chapters 3 and 4.
I especially recommend to the ontologist's attention the Relational Paradigm Ontology presented in Chapter 5. It is an upper-level ontology which is based on Aristotle's work. I present it, not as one more among several current upper-level ontologies, but rather as the ontology common to all relational databases.
This Relational Paradigm Ontology is not a carefully crafted prescriptive structured ontology, by which I mean one which formalizes someone's preferred theories of important concepts like space, time, matter, or mind. Nor is it a descriptive unstructured ontology, by which I mean an ontology based on translating general-purpose or subject-matter-specific dictionaries into an ontology management tool. Rather, it is a carefully selected, extensible, descriptive, structured ontology.
It is descriptive rather than prescriptive because it expresses the ontology underlying our everyday experiences. It is a folk ontology. For that reason, as readers focus on Chapter 5, they will likely find the ontology described there to be familiar, even if the terminology used to describe it is not.
It is selective rather than universal in scope because it expresses that part of this folk ontology which is implicit in the structures of relational databases. That part includes objects and how they are distinguished, properties and relationships of objects, and events that objects take part in. The expression of those ontological categories in the tables, rows, columns, primary keys and foreign keys of relational databases is straightforward, and constitutes an upper-level ontology common to all relational databases.
It is structured rather than unstructured because its components are mapped onto the mathematical constructs of relational databases, expressing those upper-level ontological commitments common to all databases. Because of this mapping onto mathematical structures, formal logic can be used to express the ontology, and to do theorem-proving and other reasoning on it.
It is extensible rather than static because by means of subtyping, middle-level and lower-level ontologies can be formalized which express the ontological commitments of an enterprise in greater and greater detail. With a rich set of distinctions mapped onto mathematical structures, semantic interoperability among databases can be expressed formally and verified across the Semantic Web. And this can be done automatically, i.e. by software. Databases will be able to talk to databases directly, and know when the same names of data objects mean the same thing and when they don't. Currently, OWL (W3C's Web Ontology Language) and RDF (W3C's Resource Description Framework) are the technologies in which these ontologies are most likely to be expressed. But enterprise-level ontologies themselves remain for the most part unformulated.
There is important ontology work to be done at the enterprise level.
COMPUTER SCIENCE STUDENTS
This book may also be useful to computer science students, especially at the graduate level. It shows how much work remains to be done after the mathematics of bitemporal data (its structures and algorithms) are fully defined. The best direction for any topic in computer science to evolve, of course, is towards reducing the additional useful work that always remains after current theory has straightened up what it can. Several of the advanced topics discussed in this book illustrate this point. I would mention, in particular, the distinction between the physical serialization of transactions and the semantic serialization of the statements they make, a topic discussed in Chapter 14.
COMPUTER SCIENTISTS
These advanced topics may also prove of interest to computer scientists working on bitemporal data. An example is the explicit representation of statements as managed objects in databases. The standard theory of bitemporal data does not distinguish between rows of data and the statements which they express. That theory defines the transaction time of a row as the time which begins when the row is physically created. In my previous book, I and my co-author replaced transaction time with assertion time, and defined assertion time as the time which begins when the owners of a database are ready to assign the status of making a true statement to a row. This point in time is normally identical to the point in time at which the row is physically created. But there is no reason why the statement-making status of a row cannot be postponed until some time after the row is physically created.
The issue here is simply that transaction time is a physical concept and assertion time is a semantic one. Transaction time is about rows of data. Assertion time is about statements expressed by those rows.
In this book, statements are explicitly represented as managed objects. Statements may be asserted to be true, or asserted to be false, by those who make them. Other people may assent to a statement, or dissent from it. Still others may take note of a statement, but withhold judgment as to its truth value. Thus, truth values are not inherent in statements. They are ascribed to statements by people. These different stances that different people may take to the same statement are propositional attitudes these people have to these statements. These propositional attitudes are expressed in speech acts made by people in which they indicate what they think about the possible truth of a statement. Speech act time is thus a period of time which relates a person or a group of persons to a statement. It is a generalization of the assertion time of my previous book, and a relativization of that time to one person's or one group's relationship to one statement.
Rows are different managed objects. They are not statements. They are physical inscriptions of statements. Hence the transaction time of the standard theory is called inscription time in this book. However, creating inscriptions, copying and/or moving inscriptions, and physically removing inscriptions, are also acts performed by people. Thus, in this book, different inscriptional actions are relationships between the people who perform those actions, and the rows which appear in and disappear from databases as a result of those actions.
Valid time was never in dispute. It is what, in this book, I call state time. So although most of this book is a description of bitemporal data, and specifically a vendor-neutral description of how the bitemporal data defined in the ISO and TSQL2 standards is managed in databases, I currently believe that the most complete and semantically consistent theory of temporal data is a tritemporal theory. The three temporal dimensions of this tritemporal theory of time are these:
Speech Act Time. Different people have different attitudes, at different times, about the truth value of different statements. The temporal intervals of these person-relative attitudes exist in a second temporal dimension which I call speech-act time.
Inscription Time. Rows physically exist in a database from the moment they are created to the moment they are physically deleted from that database. The temporal intervals of these inscriptions exist in a third temporal dimension which I call inscription time.
This tritemporal theory of temporal data is described in Chapter 19 of this book.
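As a rough illustration only, and not the book's own implementation, the three dimensions can be pictured as three independent time periods attached to one row. The sketch below is Python; the class names, the column names (`policy_id`, `premium`, `party`), the `FOREVER` sentinel and the half-open period convention are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sentinel standing in for "until further notice".
FOREVER = date.max


@dataclass(frozen=True)
class Period:
    """A half-open time period [begin, end); the convention is an assumption."""
    begin: date
    end: date = FOREVER

    def contains(self, point: date) -> bool:
        return self.begin <= point < self.end


@dataclass(frozen=True)
class TritemporalRow:
    """One row annotated with three independent time periods (illustrative)."""
    policy_id: str       # hypothetical identifier of the thing described
    premium: int         # hypothetical attribute of that thing
    party: str           # hypothetical person or group making the speech act
    state: Period        # when the thing is in the described state
    speech_act: Period   # when the party asserts the statement the row makes
    inscription: Period  # when the row physically exists in the database


def is_inscribed(row: TritemporalRow, on: date) -> bool:
    """True if the row physically exists in the database on the given date."""
    return row.inscription.contains(on)


def is_asserted(row: TritemporalRow, on: date) -> bool:
    """True if the party stands behind the row's statement on the given date."""
    return row.speech_act.contains(on)


# A row written on 1 March 2014 whose statement is not asserted until 1 April:
# the physical timeline and the semantic timeline are allowed to differ.
row = TritemporalRow(
    policy_id="P861",
    premium=120,
    party="underwriting",
    state=Period(date(2014, 1, 1)),
    speech_act=Period(date(2014, 4, 1)),
    inscription=Period(date(2014, 3, 1)),
)
```

On 15 March 2014 this row is inscribed but not yet asserted, which is the case the standard theory's single transaction time cannot express.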
A COMPANION VOLUME TO MANAGING TIME IN RELATIONAL DATABASES
Many of the concepts discussed in this book were also discussed in my earlier book Managing Time in Relational Databases, co-authored with Randy Weis. When referring to that book, I will use the abbreviation "MTRD". These two books can usefully be considered companion volumes. MTRD focused on the syntax of how bitemporal data is implemented, although it did not ignore the semantics of the information that bitemporal data alone can make available.
This book extends both discussions. It brings in concepts from logic, ontology and linguistics which deepen our understanding of the semantics of bitemporal data. Among implementation issues, it shows how bitemporal data can be used in star schema databases, and demonstrates that none of the varieties of slowly-changing dimensions can provide the information that bitemporal data provides.
These two books can also be usefully contrasted. In MTRD, my co-author and I essentially argued for replacing the standard theory's transaction time with our assertion time. In this book, I argue that we need to keep transaction time and add assertion time as a distinct temporal dimension. This gives us one temporal dimension for things (state time), one temporal dimension for statements (speech act time), and one temporal dimension for rows (inscription time).
AN EXTENSIVE GLOSSARY
As in MTRD, an extensive Glossary accompanies this book. Unlike MTRD, that Glossary is not contained in this book. Instead, it is available from Morgan-Kaufmann at (http://booksite.elsevier.com/9780124080676).
As in MTRD, there is also a Glossary List at the end of each chapter. These are lists of the most important of the temporal data technical terms introduced in this book that are used in the chapter. Each of these terms is defined in the Glossary.
A NOTE ON STYLE
Some people who come up with ideas like to publish their results, and nothing but those results. For them, the thought process which led to the results is a private matter. All that anyone else is entitled to know are those results. All that anyone else cares to know, perhaps they think, are those results. The mathematician Carl Friedrich Gauss put it this way: "One does not leave the scaffolding around the cathedral" (quoted in Rockmore, 2006, pp. 39-40).
For some topics, that's the right way to do things. In a manual for a programming language, for example, no one can reasonably be expected to be interested in what led the designers of the language to include the features they did. But in a book on how to design programming languages, how one thinks about designing a language is at least as important as a list of features that the author believes all programming languages should include.
So should a book on bitemporal data be more like a language manual, or more like a book on how to design a language? Should it be a book exclusively about bitemporal data structures and the algorithms that manipulate instances of those structures? Or should it be a book which explains those things in the context of why we are beginning to use bitemporal data, why the bitemporal structures of that data include the components they include, and why the algorithms that transform that data do what they do?
Most computer science publications about bitemporal data tend to follow Gauss' lead. They emphasize results, and they demonstrate the correctness of those results. They are not written, for the most part, to show the train of thought that led to those results. Temporal SQL standards documents, and the technical manuals for DBMSs which support bitemporal data, are also written without any mention of scaffolding.
My objective is to do both things. I have written to show both the cathedral and the scaffolding. I have attempted to show how I thought my way through to the conclusions I reached. That should make it easier for others to discover my mistakes, and to do better themselves.
I have written to show, for example, not only what temporal entity integrity and temporal referential integrity are, but in what sense they are temporal extensions of entity and referential integrity, and in what sense each form of the two constraints implements rules which give data its meaning, rules without which that data would not express information. Which, after all, is the reason we go to the expense of managing data in the first place.
This makes this book more discursive, more conversational, than most technical books. It is my sincere hope that, at the end of this book, readers will have gained insights into bitemporal data that they would not have gained from a book written more like a technical manual or a standards document.
LOOKING FORWARD
Bitemporal data provides a history, and a future. On one timeline, which I call state time, it is the history and future of the things data is about. On another timeline, which I call assertion time, it is the history and future of the data itself. Specifically, bitemporal data makes the following kinds of information available. 2
2 "State time" is my term for what computer scientists, vendors, and the SQL standards (i.e. pretty much everybody else) call "valid time". "Assertion time", excluding its extension into future time, is my term for what nearly everybody else calls "transaction time". My reasons for being a terminological iconoclast will become clear as this book progresses. My basic reason is that bad terminology makes you think badly.
As-was states of things. Past state time makes it possible for queries and reports to be run that show what the things we are interested in used to be like, based on any corrections that may have been made to the data in the meantime.
As-will-be states of things. Future state time makes it possible for inserts, updates and deletes to be made proactively, to make information available about the anticipated future states of things.
As-was data about things. Past assertion time makes it possible for queries and reports to be rerun that show what we used to say the things we are interested in were like, are like, or will be like. It makes it possible to do this without restoring old data, without redirection to different table names, and it produces exactly the same results that appeared when the queries and reports were originally run, even if corrections to that data were later made.
As-will-be data about things. Future assertion time, introduced in MTRD as part of the Asserted Versioning method of managing temporal data, makes it possible for staging areas and sandboxes to co-exist in the same tables as the production data they are about, and makes it possible to switch such data into production status instantaneously, without latency delays, and regardless of volume. It makes it possible to load a large volume of transactions over multiple update cycles, whose results must all become visible in the database at the same moment in time.
These capabilities are clearly useful in almost every industry and every subject area. So imagine that these capabilities are available in every database, that all existing transactions, queries and reports against those databases work correctly, without modification, and that all temporal updates and queries are seamlessly integrated with them.
This is possible. This is the future of relational databases. From this perspective, non-temporal tables in non-temporal databases are like a tiny moving window on this two-dimensional realm of data. This tiny window shows us, with each tick of the clock, what the things our data is about are like at that moment, based on what the data itself is like at that moment. This tiny window loses, with each tick of the clock, all information about the past. This tiny window is unable, at any tick of the clock, to show us what things may be like in the future.
Bitemporal data expands this tiny window to the infinite horizons of past and future, along both temporal dimensions.
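The four kinds of information listed above can be sketched with a small in-memory example. This is a minimal illustration in Python under stated assumptions, not the book's method and not any vendor's SQL syntax: the `BitemporalRow` layout, the `query` helper, the customer data and the half-open period convention are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

FOREVER = date.max  # hypothetical "until further notice" sentinel


@dataclass(frozen=True)
class BitemporalRow:
    customer: str
    status: str
    state_begin: date   # state time: when the customer has this status
    state_end: date
    assert_begin: date  # assertion time: when we claim the row is true
    assert_end: date


def query(rows: List[BitemporalRow], customer: str,
          state_at: date, asserted_at: date) -> Optional[str]:
    """Status of `customer` at `state_at`, according to what was asserted at `asserted_at`."""
    for r in rows:
        if (r.customer == customer
                and r.state_begin <= state_at < r.state_end
                and r.assert_begin <= asserted_at < r.assert_end):
            return r.status
    return None


rows = [
    # Original statement, withdrawn on 1 March 2014 when it was found to be wrong.
    BitemporalRow("C1", "silver", date(2014, 1, 1), FOREVER,
                  date(2014, 1, 1), date(2014, 3, 1)),
    # The correction, asserted from 1 March 2014 onwards.
    BitemporalRow("C1", "gold", date(2014, 1, 1), date(2015, 1, 1),
                  date(2014, 3, 1), FOREVER),
    # An anticipated future state, entered proactively.
    BitemporalRow("C1", "platinum", date(2015, 1, 1), FOREVER,
                  date(2014, 3, 1), FOREVER),
    # Staged data: already in the table, but not asserted until 1 September 2014.
    BitemporalRow("C2", "silver", date(2014, 9, 1), FOREVER,
                  date(2014, 9, 1), FOREVER),
]

today = date(2014, 6, 1)

# As-was state, with later corrections applied.
as_was_state = query(rows, "C1", date(2014, 2, 1), today)            # "gold"
# As-was data: what we used to say, exactly as first reported.
as_was_data = query(rows, "C1", date(2014, 2, 1), date(2014, 2, 1))  # "silver"
# As-will-be state: an anticipated future state.
as_will_be_state = query(rows, "C1", date(2015, 6, 1), today)        # "platinum"
# As-will-be data: invisible today, visible once its assertion time begins.
staged_today = query(rows, "C2", date(2014, 10, 1), today)           # None
staged_later = query(rows, "C2", date(2014, 10, 1), date(2014, 9, 1))  # "silver"
```

A non-temporal table corresponds to always calling `query` with both dates set to the current moment, which is the "tiny moving window" described above.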


Table of Contents

Foreword

Preface

Acknowledgments

CHAPTER 1 Bitemporal Data: Preliminaries

Nontemporal, Unitemporal and Bitemporal Data

      Nontemporal Tables

      Unitemporal Tables

      Bitemporal Tables

Semantics and its Implementations

Glossary List

PART 1 THEORY

CHAPTER 2 Time and Temporal Terminology

      Time

      Instants and Moments

      Clock Ticks

      Time Periods

      Temporal Terminology

      Temporal Dimensions

      Types of Tables

      A Choice of Terminologies

      Glossary List

CHAPTER 3 The Relational Paradigm: Mathematics

      Tables and Columns

      Columns and Domains

      Cartesian Products

      Functions and Primary Keys

      Relations

      Glossary List

CHAPTER 4 The Relational Paradigm: Logic

      Propositional Logic

      Connectives

      Well-Formed Formulas

      Transformation Rules

      Rules of Inference

      Predicate Logic

      Statements and Statement Schemas

      Logic and the Relational Paradigm

      Glossary List

CHAPTER 5 The Relational Paradigm: Ontology

      Types and Instances

      A Data Modeling Perspective

      A Philosophical Perspective

      A Set Theoretic Perspective

      A Logic and Language Perspective

      An Analogy

      Summary

      Instances and Identity

      The Relational Paradigm Ontology: Aristotelian Roots

      Aristotle on Substance

      Aristotle on Accidents

      Beyond the Aristotelian Roots

      The Relational Paradigm Ontology

      A Middle Level Extension to the Relational Paradigm Ontology

      States and Change

      Primary Keys, Natural Keys, Foreign Keys

      Objects, Events and Change

      On Using Ontologies

      Integrating the Mathematics and Ontology of the Relational Paradigm

      Glossary List

CHAPTER 6 The Relational Paradigm: Semantics

      Rows, Statements, Assertions and Kindred Notions

      Rows, Inscriptions and Sentences

      Statements

      Disambiguating Statements

      Statements and Statement Schemas

      Speech Acts

      Statements and Assertions

      Propositions

      Expressing Assertions Explicitly

      Glossary List

CHAPTER 7 The Allen Relationships

      Why the Allen Relationships are Important

      A Taxonomy of the Allen Relationships

      The Basic Allen Relationships

      Combinations of the Allen Relationships

      A Binary Partitioning of the Allen Relationships Taxonomy

      Common Timeline Time Periods: [Includes] or [Excludes]

      [Includes]:[Contains] or [Overlaps]

      [Contains]:[Equals] or [Encloses]

      [Encloses]:[Aligns With] or [During]

      [Aligns With]:[Starts] or [Finishes]

      [Excludes]:[Before] or [Meets]

      An Allen Relationship Thought Experiment

      Glossary List

CHAPTER 8 Temporal Integrity Concepts and Notations

      Cubes, Slices and Cells: Data in Three-Dimensional Temporal Space

      Semantically Anomalous Relational Tables

      Implicit Bitemporal Time

      Glossary List

CHAPTER 9 Temporal Entity Integrity

      Entity Integrity

      Bitemporal Entity Integrity

      Some Bitemporal Transactions

      State-Time Entity Integrity

      Conventional Entity Integrity

      Glossary List

CHAPTER 10 Temporal Referential Integrity

      Temporal Foreign Keys

      Episodes

      State-Time Referential Integrity

      A State-Time Delete: Block Mode

      A State-Time Delete: Cascade Mode

      A State-Time Delete: Set Null Mode

      Bitemporal Referential Integrity

      Conventional Referential Integrity

      Glossary List

PART 2 PRACTICE

CHAPTER 11 Temporal Transactions

      An Overview of Temporal Transactions

      Basic Temporal Transactions on State-Time Tables

      A State-Time Insert With Default Time

      A State-Time Update With Default Time

      A State-Time Update With Specified State Time

      A State-Time Delete With Default Time

      Basic Temporal Transactions on Bitemporal Tables

      A Bitemporal Insert With Default Time

      A Bitemporal Update With Default Time

      A Bitemporal Update With Specified State Time

      A Bitemporal Delete With Default Time

      Whenever Temporal Transactions

      A Whenever Insert Transaction

      A Whenever Update Transaction

      A Whenever Delete Transaction

      Temporal Merge Transactions

      Glossary List

CHAPTER 12 Basic Temporal Queries

      Temporal Query Syntax

      Bitemporal Tables and Views

      The Conventional Table View

      The Logfile View

      The Version View

      Point-In-Time Range Queries

      Range Queries

      Glossary List

CHAPTER 13 Advanced Temporal Queries

      A Basic Temporal Range Multi-Table Query

      Step 1: Decoalesce and Restrict on Assertion Time

      Step 2: Decoalesce and Restrict on State Time

      Step 3: Drop Assertion-Time Period Columns

      Step 4: Align on State-Time Boundaries

      Step 5: Join on RefId and State Time

      A Complex Temporal Range Multi-Table Query

      Step 1: Decoalesce and Restrict on Assertion Time

      Step 2: Decoalesce and Restrict on State Time

      Step 3: Drop Assertion-Time Period Columns

      Step 4: Align on State-Time Boundaries

      Step 5: Join on RefId and State-Time

      Why Temporal Range Multi-Table Queries are Complex

      Glossary List

CHAPTER 14 Future Assertion Time

      Future Assertion Time: Semantics

      The Six-Fold Way

      Challenging the Six-Fold Way

      The Nine-Fold Way

      Future Assertion Time: Implementation

      The Time Travel Paradox

      Future Assertion Time Locking

      Future Transactions With Assertion-Time Locking

      Glossary List

CHAPTER 15 Temporal Requirements

      Updates and Corrections to Conventional Tables

      Timestamped Tables

      Double-Timestamped Tables

      Double-Timestamps and Corrections

      The Double-Timestamped Dilemma

      The Bitemporal Data Solution

      Glossary List

CHAPTER 16 Bitemporal Data and the Inmon Data Warehouse

      A Brief History of the Data Warehouse

      What is an Inmon Data Warehouse?

      Subject Orientation

      Integration

      Time-variance

      Nonvolatility

      Support for Management Decision-Making

      Why Unitemporal Tables Cannot Be Both Time-Variant and Nonvolatile

      Two Senses of "As-Was"

      The Enterprise Data Warehouse Redefined

      The Semantics of the EDW and the Question of its Physical Instantiation

      Inmon's Arguments for a Physical EDW

      Glossary List

      Inmon Terms

CHAPTER 17 Semantic Integration via Messaging

      The Objectives of an Enterprise Database

      Two Paths to Semantic Integration

      The Enterprise Data Model as a Canonical Message Model

      The Failed Mission of the Enterprise Data Model

      A New Mission for the Enterprise Data Model

      Glossary List

CHAPTER 18 Bitemporal Data and the Kimball Data Warehouse

      Star Schemas and Relational Databases

      The Star Schema Data Warehouse Architecture

      The Star Schema Design Pattern

      Reconceptualizing Star Schemas: Fact Tables and Dimension Tables

      Events and Objects

      Surrogate Keys and Natural Keys

      A Bitemporal Star Schema

      A Bitemporal Dimension Case Study

      Fact Table Analysis

      Summary of the Case Study

      Bitemporal Dimensions Versus Slowly-Changing Dimensions

      Glossary List

      Kimball and Ross Terms

CHAPTER 19 Time, Types and the Future of Relational Databases

      Tritemporal Data and Statement Provenance

      Inscription Time, State Time, Speech Act Time

      Ontologizing Relational Databases

      The Extended Relational Paradigm Ontology

      The Extended Relational Paradigm Metamodel

      Atomic Statements and Binary Tables

      Looking Ahead

      Glossary List

CHAPTER 20 Recommendations

      Recommendations for IT Professionals in End-User IT Organizations

      Recommendations for Standards Committees and Vendors

      Remove the Ability to Correct Data in State-Time Tables

      Specify Referent Identifiers in SQL Table Definitions

      Specify Temporal Unique Identifiers in SQL Table Definitions

      Package the Bitemporalization of Conventional Tables

      Modify SQL Query Syntax to Clarify Semantics

      Add Whenever Temporal Transactions to the Standard

      Add Future Transaction Time to the Standard

      Glossary List

Afterword: Reflections on Mindfulness and Bitemporality

Bibliography

Index


About the Author

Dr. Tom Johnston is the Chief Scientist at Asserted Versioning, LLC, which has developed a middleware product which supports the standard theory of bitemporal data, and which also implements the Asserted Versioning extensions to that standard theory. He is the co-author of Managing Time in Relational Databases (Morgan-Kaufmann, 2010). He lives in Atlanta, Georgia.


Holding Library

National Science Library, Chinese Academy of Sciences (中科院文献情报中心)