review: indexing

A database index is a data structure that improves the speed of operations on a database table. Indices can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

The disk space required to store the index is typically less than that required by the table (since indices usually contain only the key-fields according to which the table is to be arranged, and excludes all the other details in the table), yielding the possibility to store indices in memory for a table whose data is too large to store in memory.

In a relational database an index is a copy of part of a table. Some databases extend the power of indexing by allowing indices to be created on functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper case versions of the last_name field in the index. Another option sometimes supported is the use of "filtered" indices, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions. All of these indexing refinements are supported in Visual FoxPro, for example.^[1]

Indices may be defined as unique or non-unique. A unique index acts as a constraint on the table by preventing identical rows in the index and thus, the original columns.

Column order

The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.

For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name. Some databases can do this, others just won’t use the index. kopf ujhfojp f]

[edit] Applications and limitations

Indices are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the b-tree data structure until the Smith entry has been found; this is much less computationally expensive than a full table scan.

Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';. This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed the database still must perform a full table scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term the database software is unable to use the underlying b-tree data structure. This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@yahoo.com');. This puts the wild-card at the right-most part of the query (now moc.oohay@%) which the index on reverse(email_address) can satisfy.

[edit] Types

[edit] Bitmap index

Main article: Bitmap index

A bitmap index is a special kind of index that stores the bulk of its data as bitmaps and answers most queries by performing bitwise logical operations on these bitmaps. The most commonly used index, such as B+trees, are most effective if the values it indexes do not repeat or repeat a relatively smaller number of times. In contrast, the bitmap index is designed for cases where the values of a variable repeat very frequently. For example, the gender field in a customer database usually contains two distinct values, male or female. For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.

[edit] Dense index

A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file. In clustered indices with duplicate keys the dense index points to the first record with that key.^[5]

[edit] Sparse index

A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys the sparse index points to the lowest search key in each block.

review

Tuesday, October 21, 2008

indexing

Column order

[edit] Applications and limitations

[edit] Types

[edit] Bitmap index

[edit] Dense index

[edit] Sparse index

No comments:

Followers

Blog Archive

About Me