The Database reference article from the English Wikipedia on 24-Apr-2004
(provided by Fixed Reference: snapshots of Wikipedia from wikipedia.org)

Database

A database is an information set with a regular structure. A database is usually, but not necessarily, stored in some machine-readable format accessed by a computer. There is a wide variety of databases, from simple tables stored in a single file to very large databases with many millions of records, stored in rooms full of disk drives.

Databases resembling modern ones were first developed in the 1960s. A pioneer in the field was Charles Bachman.

The most useful way of classifying databases is by the programming model associated with the database. Several models have been in wide use for some time. Historically, the hierarchical model was implemented first, then the network model; the relational model then prevailed, with the so-called flat model accompanying it for low-end use. The hierarchical, network, and flat models were never formalized as theories and came to be called data models only by contrast with the relational model, having no conceptual underpinnings of their own; they arose simply from physical implementation constraints and from programming models, not data models.

Database models

The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be values of the same kind, and all members of a row are assumed to be related to one another. For instance, columns for name and password might be used as part of a system security database. Each row would have the specific password associated with a specific user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating-point numbers. This model is the basis of the spreadsheet.
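The security-database example above can be sketched as a single flat table. This is a minimal illustration in Python, not any particular product: the column list, the helper names check_row and password_for, and the sample rows are all hypothetical. Each column carries an assumed type, and a lookup has to scan the rows because the flat model offers no other structure.

```python
from datetime import date

# Flat (table) model: a single two-dimensional array of values.
# The column names and types are hypothetical, chosen to match the security example.
COLUMNS = [("name", str), ("password", str), ("created", date)]

rows = [
    ["alice", "s3cret", date(2004, 4, 24)],
    ["bob", "hunter2", date(2004, 3, 1)],
]

def check_row(row):
    """Raise if a row does not have one value of the declared type per column."""
    if len(row) != len(COLUMNS):
        raise ValueError("row has the wrong number of columns")
    for value, (col_name, col_type) in zip(row, COLUMNS):
        if not isinstance(value, col_type):
            raise TypeError(f"column {col_name!r} expects {col_type.__name__}")

def password_for(user):
    """Scan every row; the values in one row all describe the same user."""
    for name, password, _created in rows:
        if name == user:
            return password
    return None

# Every stored row must respect the column types.
for row in rows:
    check_row(row)
```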

The network model allows multiple tables to be used together through the use of pointers (or references). Some columns contain pointers to different tables instead of data. Thus, the tables are related by references, which can be viewed as a network structure. A particular subset of the network model, the hierarchical model, limits the relationships to a tree structure, instead of the more general directed graph structure implied by the full network model.
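The difference between the network and hierarchical models can be sketched with records that point at one another. In this hypothetical Python example, ordinary object references stand in for stored pointers, and the record types Department, Project, and Employee are invented for illustration. An employee owned by both a department and a project has two parents, which a tree-shaped hierarchy would not permit.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Network model sketch. Python object references stand in for the stored pointers;
# the record types (Department, Project, Employee) are hypothetical.

@dataclass(eq=False)
class Department:
    name: str
    members: List["Employee"] = field(default_factory=list)  # pointers to employees

@dataclass(eq=False)
class Project:
    title: str
    staff: List["Employee"] = field(default_factory=list)    # pointers to employees

@dataclass(eq=False)
class Employee:
    name: str
    department: Optional[Department] = None                   # pointer to one department
    projects: List[Project] = field(default_factory=list)     # pointers to projects

def assign(emp, dept):
    """Link an employee and a department in both directions."""
    emp.department = dept
    dept.members.append(emp)

def join_project(emp, proj):
    """Link an employee and a project in both directions."""
    emp.projects.append(proj)
    proj.staff.append(emp)

def owners(emp):
    """Records that point at emp. A strict hierarchy would allow exactly one."""
    result = [emp.department] if emp.department is not None else []
    return result + list(emp.projects)

sales, launch, ann = Department("Sales"), Project("Launch"), Employee("Ann")
assign(ann, sales)
join_project(ann, launch)
# Ann is reachable from two different parents, so this is a graph, not a tree.
print(len(owners(ann)))  # 2
```

Navigating such a database means following pointers from record to record, which is why queries in this model are tied to the paths the designer built in.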

Relational databases consist not of tables, but of three components: a collection of data structures, namely relations, sometimes incorrectly identified with tables; a collection of operators, the relational algebra and calculus; and a collection of integrity constraints, defining the set of consistent database states and changes of state. The integrity constraints can be of four types: domain (also known as type), attribute, relvar, and database constraints.
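The three components can be illustrated with a toy sketch in Python. Here a relation is modeled as a set of tuples, each tuple a set of (attribute, value) pairs; three relational-algebra operators (restriction, projection, natural join) act on relations; and two example integrity constraints are written as predicates. All relation, attribute, and function names are hypothetical, and this is a teaching sketch rather than a complete relational engine.

```python
# Toy relational sketch: a relation is a *set* of tuples, and each tuple is a set of
# (attribute, value) pairs. All relation and attribute names below are hypothetical.

def relation(*rows):
    """Build a relation from dicts; duplicate rows collapse because the body is a set."""
    return frozenset(frozenset(r.items()) for r in rows)

def select(rel, predicate):
    """Restriction: keep the tuples that satisfy predicate."""
    return frozenset(t for t in rel if predicate(dict(t)))

def project(rel, attrs):
    """Projection: keep only the named attributes."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in rel)

def natural_join(r, s):
    """Pair up tuples that agree on every attribute the two relations share."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

# Integrity constraints, expressed as predicates over relations.
def is_key(rel, attrs):
    """Relvar constraint: no two tuples agree on attrs."""
    return len(project(rel, attrs)) == len(rel)

def references(r, attr, s):
    """Database constraint spanning two relations: every r.attr value occurs in s."""
    return project(r, {attr}) <= project(s, {attr})

employees = relation({"emp": "Ann", "dept": "D1"}, {"emp": "Bob", "dept": "D2"},
                     {"emp": "Cy", "dept": "D1"})
depts = relation({"dept": "D1", "dname": "Sales"}, {"dept": "D2", "dname": "Research"})

# Which employees work in Sales? Join, restrict, then project.
in_sales = project(select(natural_join(employees, depts),
                          lambda t: t["dname"] == "Sales"), {"emp"})
```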

Unlike the hierarchical and network models, the relational model uses no pointers whatsoever, in accordance with the Information Principle: all information must be represented as data values, and relationships between relations are represented by attribute values of any type. Relational databases allow users (including programmers) to write queries that were not anticipated by the database designer. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for decades. This has made relational databases very popular with businesses.
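As an illustration of relationships carried by values and of unanticipated queries, the sketch below uses SQLite through Python's standard sqlite3 module (by this article's terminology an SQL DBMS, used here only as a convenient stand-in). The table names, columns, and data are hypothetical. An order is linked to its customer solely by the shared cust_id value, so a new question such as "total sales per city" can be answered with a fresh query and no change to the stored structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
                         amount INTEGER NOT NULL);
    INSERT INTO customer VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Lima');
    INSERT INTO orders VALUES (10, 1, 30), (11, 1, 20), (12, 2, 5);
""")

# The link between an order and its customer is just the shared cust_id value,
# so any application can combine the tables in a new way at query time.
totals_by_city = conn.execute("""
    SELECT c.city, SUM(o.amount) AS total
    FROM customer AS c JOIN orders AS o ON o.cust_id = c.cust_id
    GROUP BY c.city
    ORDER BY total DESC
""").fetchall()
print(totals_by_city)  # [('Oslo', 50), ('Lima', 5)]
```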

The relational model is a mathematical theory developed by Ted Codd to describe how relational databases should work. Although this theory is the basis for relational database software, very few database management systems follow the model closely, and all have features that violate the theory, which increases complexity and reduces their power. Therefore they should not be called relational DBMSs, but SQL (or some other language) DBMSs.
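Two SQL features commonly cited by critics as departures from the theory are duplicate rows and NULL. The sketch below, assuming SQLite via Python's sqlite3 module and a hypothetical visit table, shows both: a table without a declared key stores the same row twice (a bag rather than a set of tuples), and comparing NULL with NULL yields neither true nor false but NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key is declared, so SQL accepts the same row twice.
conn.execute("CREATE TABLE visit (person TEXT, city TEXT)")
conn.executemany("INSERT INTO visit VALUES (?, ?)",
                 [("Ann", "Oslo"), ("Ann", "Oslo"), ("Bob", "Lima")])

# SQL tables are bags (multisets): the duplicate row is kept...
bag = conn.execute("SELECT person, city FROM visit").fetchall()
# ...whereas a relation is a set, which DISTINCT has to recover explicitly.
as_set = set(conn.execute("SELECT DISTINCT person, city FROM visit").fetchall())

# NULL introduces a third truth value: NULL = NULL is NULL (None in Python).
null_check = conn.execute("SELECT NULL = NULL").fetchone()
```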

Implementations and indexing

All of these kinds of database can take advantage of indexing to increase their speed. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the rows associated with each value. An index allows a set of table rows matching some criterion to be located quickly. Various indexing methods are in common use, including B-trees, hashes, and linked lists.
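The sorted-list index described above can be sketched in a few lines of Python with the standard bisect module. The table contents and the helper names build_index and lookup are hypothetical. The index holds (value, row position) pairs in sorted order, so a binary search finds the first match without scanning the whole table. Production systems typically use structures such as B-trees, which keep keys sorted while also supporting efficient inserts; a plain sorted list is the simplest form of the same idea.

```python
import bisect

# Toy table: a row's position in the list plays the role of its address.
table = [
    ("Carol", 41),
    ("Ann", 29),
    ("Bob", 35),
    ("Ann", 52),
]

def build_index(rows, col):
    """A sorted list of (value, row_position) pairs for one column."""
    return sorted((row[col], pos) for pos, row in enumerate(rows))

def lookup(index, value):
    """Binary-search the index and return positions of rows whose column equals value."""
    # (value,) sorts before every (value, pos) pair, so this finds the first match.
    i = bisect.bisect_left(index, (value,))
    positions = []
    while i < len(index) and index[i][0] == value:
        positions.append(index[i][1])
        i += 1
    return positions

name_index = build_index(table, 0)
matches = [table[p] for p in lookup(name_index, "Ann")]
print(matches)  # [('Ann', 29), ('Ann', 52)]
```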

Relational and SQL DBMSs have the advantage that indexes can be created or dropped without changing existing applications, because applications don't use the indexes directly. Instead, the database software decides on behalf of the application which indexes to use, choosing among many different strategies based on which one it estimates will run fastest.
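This independence can be observed with SQLite through Python's sqlite3 module. In the hypothetical sketch below, the application's query text never names an index; after an index is created, the same query returns the same rows, but SQLite's EXPLAIN QUERY PLAN output shows that the engine has switched from scanning the table to searching the index. The exact wording of the plan differs between SQLite versions, so the code only checks whether the index name appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)",
                 [(f"user{i}", f"user{i}@example.com") for i in range(1000)])

# The application's query never mentions an index.
QUERY = "SELECT id, email FROM users WHERE name = ?"

def plan(sql, params):
    """Join the 'detail' text (last column) of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(str(r[-1]) for r in rows)

before_rows = conn.execute(QUERY, ("user42",)).fetchall()
before_plan = plan(QUERY, ("user42",))   # a full scan of users

conn.execute("CREATE INDEX idx_users_name ON users(name)")

after_rows = conn.execute(QUERY, ("user42",)).fetchall()
after_plan = plan(QUERY, ("user42",))    # now a search using idx_users_name
```

Dropping the index with DROP INDEX idx_users_name would likewise leave QUERY working unchanged, only slower on large tables.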

Mapping objects into databases

In recent years, the object-oriented paradigm has been applied to databases as well, creating a new programming model known as object databases. These databases attempt to overcome some of the difficulties of using objects with SQL DBMSs. An object-oriented program allows objects of the same type to have different implementations and behave differently, so long as they have the same interface (polymorphism). This doesn't fit well with an SQL database, in which user-defined types are difficult to define and use, and in which the Two Great Blunders prevail: the identification of classes with tables (the correct identification is of classes with types, and of objects with values), and the use of pointers.
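Polymorphism of the kind described above can be sketched in Python with an abstract base class. Shape, Circle, and Rectangle are hypothetical names. Callers use only the shared area() interface, while each implementation carries different attributes (a radius versus a width and height), which hints at why mapping such a class onto one fixed set of table columns is awkward.

```python
import math
from abc import ABC, abstractmethod

class Shape(ABC):
    """The shared interface: every shape can report its area."""
    @abstractmethod
    def area(self) -> float: ...

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius          # only circles have a radius

    def area(self):
        return math.pi * self.radius ** 2

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width, self.height = width, height   # rectangles have two dimensions

    def area(self):
        return self.width * self.height

# Code written against Shape works for any implementation.
shapes = [Circle(1.0), Rectangle(2.0, 3.0)]
total = sum(s.area() for s in shapes)
```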

A variety of ways have been tried for storing objects in a database, but there is little consensus on how this should be done. Object database implementations undo the benefits of the relational model by introducing pointers and making ad hoc queries more difficult. This is because they are essentially adaptations of obsolete network and hierarchical databases to object-oriented programming. As a result, object databases tend to be used for specialized applications, and general-purpose object databases have not been very popular. Instead, objects are often stored in SQL databases using complicated mapping software. At the same time, SQL DBMS vendors have added features to allow objects to be stored more conveniently, drifting even further away from the relational model.
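The mapping software mentioned above can be illustrated with a deliberately minimal, hypothetical mapper written in Python over sqlite3. The Contact class and the Mapper class are invented for this sketch. It handles only the easy case: one flat class stored as one table, with each field becoming a column. Real mapping tools typically also have to deal with inheritance, object identity, and references between objects, which is where much of their complexity comes from.

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

@dataclass
class Contact:
    name: str
    email: str

class Mapper:
    """Hypothetical minimal object-relational mapper for flat dataclasses."""

    def __init__(self, conn, cls, table):
        self.conn, self.cls, self.table = conn, cls, table
        self.cols = [f.name for f in fields(cls)]   # one column per dataclass field
        # Table and column names come from trusted code, never from user input.
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(self.cols)})")

    def save(self, obj):
        """Write an object's field values as one row (values are bound parameters)."""
        placeholders = ", ".join("?" for _ in self.cols)
        sql = f"INSERT INTO {self.table} ({', '.join(self.cols)}) VALUES ({placeholders})"
        self.conn.execute(sql, astuple(obj))

    def find(self, **criteria):
        """Rebuild objects from rows whose columns equal the given values."""
        where = " AND ".join(f"{k} = ?" for k in criteria) or "1"
        sql = f"SELECT {', '.join(self.cols)} FROM {self.table} WHERE {where}"
        return [self.cls(*row) for row in self.conn.execute(sql, tuple(criteria.values()))]

conn = sqlite3.connect(":memory:")
contacts = Mapper(conn, Contact, "contact")
contacts.save(Contact("Ann", "ann@example.com"))
contacts.save(Contact("Bob", "bob@example.com"))
```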

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, though, and many electronic mail programs and personal organizers are based on standard database technology.

Database application

A database application is a type of computer application dedicated to managing a database. Database applications span a huge variety of needs and purposes, from small user-oriented tools such as an address book, to huge enterprise-wide systems for tasks like accounting.

The term "database application" usually refers to software providing a user interface to a database. The software that actually manages the data is usually called a database management system (DBMS) or (if it is embedded) a database engine.

Examples of database applications include Microsoft Access, dBASE, FileMaker and (to some degree) HyperCard.

In March 2004, AMR Research (as cited in the CNET News.com article listed in the "References" section) predicted that open-source database applications would come into wide acceptance in 2006.

Transactions and concurrency

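In addition to their data model, most practical databases attempt to enforce a database transaction model that has desirable data integrity properties. Ideally, the database software should enforce the ACID rules:

Atomicity: either all of the tasks in a transaction are performed, or none of them are.
Consistency: the database is in a legal state when a transaction begins and when it ends; a transaction cannot break the integrity constraints.
Isolation: concurrently executing transactions do not interfere with one another; none can see another's uncommitted intermediate results.
Durability: once a transaction has been committed, its effects persist, even after a system failure.

In practice, many DBMSs allow some of these rules to be relaxed for better performance.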
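Atomicity and consistency can be demonstrated with SQLite through Python's sqlite3 module, whose connection object, used as a context manager, commits on success and rolls back if an exception escapes. In this hypothetical bank-transfer sketch, a CHECK constraint forbids negative balances. When a transfer is too large, the debit fails after the credit has already been applied, and the rollback undoes the credit too, so the database never shows half a transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction: both updates happen or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

transfer(conn, "A", "B", 30)          # succeeds: A=70, B=30
try:
    transfer(conn, "A", "B", 500)     # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                              # the earlier credit to B was rolled back
print(balances(conn))  # {'A': 70, 'B': 30}
```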

Concurrency control is a method used to ensure that transactions are executed safely and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
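One standard way to reason about whether a schedule is serializable is the precedence-graph test for conflict serializability, sketched below in Python. The schedule format (a list of transaction, action, item triples) and the function names are hypothetical. Two operations conflict when they belong to different transactions, touch the same item, and at least one is a write; the schedule is conflict-serializable exactly when the resulting graph has no cycle. This is a sufficient condition for serializability, it does not check recoverability, and real DBMSs usually enforce serializability with protocols such as two-phase locking rather than by inspecting schedules after the fact.

```python
from collections import defaultdict

# A schedule is a list of (transaction, action, item); action is "R" (read) or "W" (write).

def precedence_graph(schedule):
    """Edge Ti -> Tj when an operation of Ti conflicts with a later operation of Tj."""
    edges = defaultdict(set)
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search; reaching a node still on the stack (GREY) means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(n):
        color[n] = GREY
        for m in edges[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# T1 runs entirely before T2: equivalent to a serial order.
serial = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
# Both read x before either writes it: the classic lost update.
lost_update = [("T1", "R", "x"), ("T2", "R", "x"), ("T2", "W", "x"), ("T1", "W", "x")]
```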


References