Wednesday, December 24, 2008
squirrel sql
Name: give any name
Driver: MySQL Driver
URL: jdbc:mysql://localhost:<port>/<database>
User: root
Password: give the required password.
Before adding the alias, add the MySQL driver:
right-click the MySQL driver -> Extra Class Path -> add mysql.jar
(mysql.jar is added externally),
then click OK.
cvs
Before you start checking out on a Windows system, the directory must be configured in CVS by the CVS admin.
Binary files must be marked as binary in CVS.
To mark a file as binary:
cvs admin -kb name_of_binary_file
Wednesday, December 17, 2008
mysql
1) To report the status of the instances:
mysqld_multi report
2) To start the instances:
mysqld_multi start
3) To use the database in an instance:
mysql -u demouser -p -h 192.168.1.88 -P 4007 qamart
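Once connected, a quick sanity check against that instance might look like this (assuming the qamart database above exists):
SHOW DATABASES;
USE qamart;
SHOW TABLES;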
Tuesday, December 16, 2008
To run the process in the Tomcat server
1) When you run the build, a WAR file is created in a folder,
e.g. yumeetl.war is created in the temp folder.
2) Copy the WAR file into the webapps folder.
3) Check that the path in log4configuration.xml is correct:
file:/home/sathish/tomcat/webapps/yumeetl/WEB-INF/classes/log4configuration.xml
4) Copy yumeetl.xml to /home/sathish/tomcat/conf/Catalina/localhost.
5) Check that the URL value in server.xml is correct.
6) Start the Tomcat server.
To start the process
Execute queries in qadb: truncate and re-insert the QRTZ_TRIGGERS, QRTZ_CRONTRIGGERS, and QRTZ_JOBDETAILS tables, then update QRTZ_TRIGGERS (see the sketch below).
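A rough sketch of those queries, using the table names exactly as written above (DELETE is used in case foreign keys block TRUNCATE; the INSERT statements depend on the specific job and trigger definitions, so they are not reproduced here):
DELETE FROM QRTZ_CRONTRIGGERS;  -- child rows first
DELETE FROM QRTZ_TRIGGERS;
DELETE FROM QRTZ_JOBDETAILS;
-- re-insert the job and trigger rows for this deployment, then update QRTZ_TRIGGERS as required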
To view the log: tail -100f yumeappserver.log
To remove: rm *
TOMCAT
1) Go to tomcat/bin.
2) Set environment variables:
a) export JAVA_HOME=/usr/java/jdk1.6.0_03/
b) export CATALINA_HOME=/usr/local/tomcat
3) Enter the command to start:
./startup.sh
Friday, October 31, 2008
JavaServer Faces
JavaServer Faces (JSF) is a Java-based Web application framework intended to simplify development of user interfaces for Java EE applications. Unlike other traditional request-driven MVC web frameworks, JSF uses a component-based approach. The state of UI components is saved when the client requests a new page and then is restored when the request is returned. Out of the box, JSF uses JavaServer Pages (JSP) for its display technology, but JSF can also accommodate other display technologies (such as XUL). JSF includes:
- A set of APIs for representing user interface (UI) components and managing their state, handling events and input validation, converting values, defining page navigation, and supporting internationalization and accessibility
- A default set of UI components
- Two JavaServer Pages (JSP) custom tag libraries for expressing a JavaServer Faces interface within a JSP page.
- A server-side event model
- State management
- Managed Beans (JavaBeans created with dependency injection)
EJB
Enterprise JavaBeans™ (EJB) is a managed, server-side component architecture for modular construction of enterprise applications.
The EJB specification is one of several Java APIs in the Java Platform, Enterprise Edition. EJB is a server-side model that encapsulates the business logic of an application. The EJB specification was originally developed in 1997 by IBM and later adopted by Sun Microsystems (EJB 1.0 and 1.1) and enhanced under the Java Community Process as JSR 19 (EJB 2.0), JSR 153 (EJB 2.1) and JSR 220 (EJB 3.0).
The EJB specification intends to provide a standard way to implement the back-end 'business' code typically found in enterprise applications (as opposed to 'front-end' user-interface code). Such code was frequently found to reproduce the same types of problems, and it was found that solutions to these problems are often repeatedly re-implemented by programmers. Enterprise JavaBeans were intended to handle such common concerns as persistence, transactional integrity, and security in a standard way, leaving programmers free to concentrate on the particular problem at hand.
Types:
Session Beans are distributed objects having state: that is, they keep track of which calling program they are dealing with throughout a session. For example, checking out in a web store might be handled by a stateful session bean, which would use its state to keep track of where the customer is in the checkout process. Stateful session beans' state may be persisted, but access to the bean instance is limited to only one client.
Stateless Session Beans are distributed objects that do not have state associated with them thus allowing concurrent access to the bean. The contents of instance variables are not guaranteed to be preserved across method calls. The lack of overhead to maintain a conversation with the calling program makes them less resource-intensive than stateful beans. Sending an e-mail to customer support might be handled by a stateless bean, since this is a one-off operation and not part of a multi-step process.
Message Driven Beans were introduced in the EJB 2.0 specification, which is supported by Java 2 Platform, Enterprise Edition 1.3 or higher. The message bean represents the integration of JMS (Java Message Service) with EJB to create an entirely new type of bean designed to handle asynchronous JMS messages. Message Driven Beans are distributed objects that behave asynchronously: that is, they handle operations that do not require an immediate response.
Thursday, October 30, 2008
Application Layer
The Application Layer is the seventh level of the seven-layer OSI model, and the top layer of the TCP/IP model. It interfaces directly to and performs common application services for the application processes; it also issues requests to the Presentation Layer (OSI).
The common application layer services provide semantic conversion between associated application processes. Note: Examples of common application services of general interest include the virtual file, virtual terminal, and job transfer and manipulation protocols.
The application layer of the TCP/IP model corresponds to the combination of the Application Layer, Presentation Layer, and Session Layer of the seven-layer OSI model.
The Data Link Layer is Layer 2 of the seven-layer OSI model. It responds to service requests from the Network Layer and issues service requests to the Physical Layer.
The Data Link Layer is the protocol layer which transfers data between adjacent network nodes in a wide area network or between nodes on the same local area network segment. The Data Link Layer provides the functional and procedural means to transfer data between network entities and might provide the means to detect and possibly correct errors that may occur in the Physical Layer. Examples of data link protocols are Ethernet for local area networks (multi-node) and PPP, HDLC and ADCCP for point-to-point (dual-node) connections.
The data link is all about getting information from one place to a selection of other close, local places. At this layer one does not need to be able to go everywhere globally, just to be able to go somewhere else locally. In the OSI protocol stack the Network Layer, which sits on top of the Data Link Layer, is analogous to the post office making a best effort to deliver international mail. If a parcel is to be delivered from London to New York it can be sent by a variety of means: it can travel across the Atlantic by air or by sea, and the exact route itself can also vary. The post office (the Network Layer) only needs to try to get the parcel from the source to the correct destination, regardless of the exact path it takes. The Data Link Layer in this analogy is more akin to the role of a truck driver: the driver needs to know the local route to get from the post office to the airport or port. In fact, the driver does not need to know that the parcel he or she is delivering is ultimately bound for New York.
The Data Link Layer also serves the function of media access control. For example, suppose an apartment building has a WLAN access point (AP) in each of two neighboring apartments. A client can request access to one of the APs (say, AP A) by sending radio-frequency signals from his or her laptop. Since the two APs are in close proximity, they may both be able to receive the request signals sent out by the client. It is the job of the Data Link Layer protocol to let AP B know that the client's signals it receives are not intended for it but for another AP. For AP A, the decision as to whether the client is permitted access can also occur at the Data Link Layer.
The data link thus provides data transfer across the physical link. That transfer might or might not be reliable; many data link protocols do not have acknowledgments of successful frame reception and acceptance, and some data link protocols might not even have any form of checksum to check for transmission errors. In those cases, higher-level protocols must provide flow control, error checking, and acknowledgments and retransmission.
Presentation Layer
The Presentation Layer is the sixth level of the seven layer OSI model. It responds to service requests from the Application Layer and issues service requests to the Session Layer.
The Presentation Layer is responsible for the delivery and formatting of information to the application layer for further processing or display. It relieves the application layer of concern regarding syntactical differences in data representation within the end-user systems. Note: An example of a presentation service would be the conversion of an EBCDIC-coded text file to an ASCII-coded file.
The Presentation Layer is the first one where people start to care about what they are sending at a more advanced level than just a bunch of ones and zeros. This layer deals with issues like how strings are represented - whether they use the Pascal method (an integer length field followed by the specified amount of bytes) or the C/C++ method (null-terminated strings, i.e. "thisisastring\0"). The idea is that the application layer should be able to point at the data to be moved, and the Presentation Layer will deal with the rest.
Encryption is typically done at this level too, although it can be done at the Application, Session, Transport, or Network Layer; each having its own advantages and disadvantages. Another example is representing structure, which is normally standardized at this level, often by using XML. As well as simple pieces of data, like strings, more complicated things are standardized in this layer. Two common examples are 'objects' in object-oriented programming, and the exact way that streaming video is transmitted.
In many widely used applications and protocols, no distinction is made between the presentation and application layers. For example, HTTP, generally regarded as an application layer protocol, has Presentation Layer aspects such as the ability to identify character encoding for proper conversion, which is then done in the Application Layer.
Data Types in MySQL
MySQL supports a number of data types in several categories: numeric types, date and time types, and string (character) types. This chapter first gives an overview of these data types, and then provides a more detailed description of the properties of the types in each category, and a summary of the data type storage requirements. The initial overview is intentionally brief. The more detailed descriptions later in the chapter should be consulted for additional information about particular data types, such as the allowable formats in which you can specify values.
Data type descriptions use these conventions:
- M indicates the maximum display width for integer types. For floating-point and fixed-point types, M is the total number of digits that can be stored. For string types, M is the maximum length. The maximum allowable value of M depends on the data type.
- D applies to floating-point and fixed-point types and indicates the number of digits following the decimal point. The maximum possible value is 30, but should be no greater than M-2.
- Square brackets ("[" and "]") indicate optional parts of type definitions.
As an extension to the SQL standard, MySQL also supports the integer types TINYINT, MEDIUMINT, and BIGINT. The following table shows the required storage and range for each of the integer types.
Type | Bytes | Minimum Value (Signed/Unsigned) | Maximum Value (Signed/Unsigned) |
TINYINT | 1 | -128 / 0 | 127 / 255 |
SMALLINT | 2 | -32768 / 0 | 32767 / 65535 |
MEDIUMINT | 3 | -8388608 / 0 | 8388607 / 16777215 |
INT | 4 | -2147483648 / 0 | 2147483647 / 4294967295 |
BIGINT | 8 | -9223372036854775808 / 0 | 9223372036854775807 / 18446744073709551615 |
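As a quick illustration of the ranges above, a hypothetical table mixing signed and unsigned integer columns:
CREATE TABLE int_demo (
  tiny_signed   TINYINT,           -- -128 to 127
  tiny_unsigned TINYINT UNSIGNED,  -- 0 to 255
  big_unsigned  BIGINT UNSIGNED    -- 0 to 18446744073709551615
);
INSERT INTO int_demo VALUES (127, 255, 18446744073709551615);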
The date and time types for representing temporal values are DATETIME, DATE, TIMESTAMP, TIME, and YEAR. Each temporal type has a range of legal values, as well as a “zero” value that may be used when you specify an illegal value that MySQL cannot represent. The TIMESTAMP type has special automatic updating behavior, described later on. For temporal type storage requirements, see Section 10.5, “Data Type Storage Requirements”.
MySQL gives warnings or errors if you try to insert an illegal date. By setting the SQL mode to the appropriate value, you can specify more exactly what kind of dates you want MySQL to support. (See Section 5.1.7, “SQL Modes”.) You can get MySQL to accept certain dates, such as '2009-11-31', by using the ALLOW_INVALID_DATES SQL mode. This is useful when you want to store a “possibly wrong” value which the user has specified (for example, in a web form) in the database for future processing. Under this mode, MySQL verifies only that the month is in the range from 0 to 12 and that the day is in the range from 0 to 31. These ranges are defined to include zero because MySQL allows you to store dates where the day or month and day are zero in a DATE or DATETIME column. This is extremely useful for applications that need to store a birthdate for which you do not know the exact date. In this case, you simply store the date as '2009-00-00' or '2009-01-00'. If you store dates such as these, you should not expect to get correct results for functions such as DATE_SUB() or DATE_ADD() that require complete dates. (If you do not want to allow zero in dates, you can use the NO_ZERO_IN_DATE SQL mode.)
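A session-level sketch of the behavior described above (the table and column names are made up for illustration):
SET sql_mode = 'ALLOW_INVALID_DATES';
CREATE TABLE people (name VARCHAR(30), birthdate DATE);
INSERT INTO people VALUES ('Alice', '2009-01-00');  -- day unknown
INSERT INTO people VALUES ('Bob',   '2009-00-00');  -- month and day unknown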
Prior to MySQL 5.1.18, when DATE values are compared with DATETIME values, the time portion of the DATETIME value is ignored, or the comparison could be performed as a string compare. Starting from MySQL 5.1.18, a DATE value is coerced to the DATETIME type by adding the time portion as '00:00:00'. To mimic the old behavior, use the CAST() function to cause the comparison operands to be treated as previously. For example:
date_col = CAST(NOW() AS DATE)
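A complete form of that comparison (the table name events is a placeholder):
SELECT * FROM events WHERE date_col = CAST(NOW() AS DATE);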
MySQL also allows you to store '0000-00-00' as a “dummy date” (if you are not using the NO_ZERO_DATE SQL mode). This is in some cases more convenient (and uses less data and index space) than using NULL values.
Here are some general considerations to keep in mind when working with date and time types:
MySQL retrieves values for a given date or time type in a standard output format, but it attempts to interpret a variety of formats for input values that you supply (for example, when you specify a value to be assigned to or compared to a date or time type). Only the formats described in the following sections are supported. It is expected that you supply legal values. Unpredictable results may occur if you use values in other formats.
- Dates containing two-digit year values are ambiguous because the century is unknown. MySQL interprets two-digit year values using the following rules: year values in the range 70-99 are converted to 1970-1999; year values in the range 00-69 are converted to 2000-2069.
- Although MySQL tries to interpret values in several formats, dates always must be given in year-month-day order (for example, '98-09-04'), rather than in the month-day-year or day-month-year orders commonly used elsewhere (for example, '09-04-98', '04-09-98').
- MySQL automatically converts a date or time type value to a number if the value is used in a numeric context, and vice versa.
- By default, when MySQL encounters a value for a date or time type that is out of range or otherwise illegal for the type (as described at the beginning of this section), it converts the value to the “zero” value for that type. The exception is that out-of-range TIME values are clipped to the appropriate endpoint of the TIME range.
- The following table shows the format of the “zero” value for each type. Note that the use of these values produces warnings if the NO_ZERO_DATE SQL mode is enabled.
Data Type | “Zero” Value |
DATETIME | '0000-00-00 00:00:00' |
DATE | '0000-00-00' |
TIMESTAMP | '0000-00-00 00:00:00' |
TIME | '00:00:00' |
YEAR | 0000 |
- The “zero” values are special, but you can store or refer to them explicitly using the values shown in the table. You can also do this using the values '0' or 0, which are easier to write.
- “Zero” date or time values used through MyODBC are converted automatically to NULL in MyODBC 2.50.12 and above, because ODBC cannot handle such values.
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, and two length bytes if values may require more than 255 bytes.
If strict SQL mode is not enabled and you assign a value to a CHAR or VARCHAR column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of non-space characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 5.1.7, “SQL Modes”.
For VARCHAR columns, excess trailing spaces are truncated prior to insertion and a warning is generated, regardless of the SQL mode in use. For CHAR columns, truncation of excess trailing spaces from inserted values is performed silently regardless of the SQL mode.
VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.
The following table illustrates the differences between CHAR and VARCHAR by showing the result of storing various string values into CHAR(4) and VARCHAR(4) columns (assuming that the column uses a single-byte character set such as latin1):
Value | CHAR(4) | Storage Required | VARCHAR(4) | Storage Required |
'' | ' ' | 4 bytes | '' | 1 byte |
'ab' | 'ab ' | 4 bytes | 'ab' | 3 bytes |
'abcd' | 'abcd' | 4 bytes | 'abcd' | 5 bytes |
'abcdefgh' | 'abcd' | 4 bytes | 'abcd' | 5 bytes |
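A small sketch that reproduces the comparison above (the table name is hypothetical; CONCAT makes the trailing-space behavior visible):
CREATE TABLE vc_demo (c CHAR(4), v VARCHAR(4)) CHARACTER SET latin1;
INSERT INTO vc_demo VALUES ('ab', 'ab');
-- CHAR pads to the declared length on storage and strips trailing spaces on retrieval;
-- VARCHAR returns exactly what was stored
SELECT CONCAT('(', c, ')'), CONCAT('(', v, ')') FROM vc_demo;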
The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they contain binary strings rather than non-binary strings. That is, they contain byte strings rather than character strings. This means that they have no character set, and sorting and comparison are based on the numeric values of the bytes in the values.
The allowable maximum length is the same for BINARY and VARBINARY as it is for CHAR and VARCHAR, except that the length for BINARY and VARBINARY is a length in bytes rather than in characters.
The BINARY and VARBINARY data types are distinct from the CHAR BINARY and VARCHAR BINARY data types. For the latter types, the BINARY attribute does not cause the column to be treated as a binary string column. Instead, it causes the binary collation for the column character set to be used, and the column itself contains non-binary character strings rather than binary byte strings. For example, CHAR(5) BINARY is treated as CHAR(5) CHARACTER SET latin1 COLLATE latin1_bin, assuming that the default character set is latin1. This differs from BINARY(5), which stores 5-byte binary strings that have no character set or collation. For information about the differences between non-binary string binary collations and binary strings, see the MySQL reference manual.
A BLOB is a binary large object that can hold a variable amount of data. The four BLOB types are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. These differ only in the maximum length of the values they can hold. The four TEXT types are TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. These correspond to the four BLOB types and have the same maximum lengths and storage requirements.
An ENUM is a string object with a value chosen from a list of allowed values that are enumerated explicitly in the column specification at table creation time.
CREATE TABLE sizes (
name ENUM('small', 'medium', 'large')
);
A SET is a string object that can have zero or more values, each of which must be chosen from a list of allowed values specified when the table is created. SET column values that consist of multiple set members are specified with members separated by commas (","). A consequence of this is that SET member values should not themselves contain commas.
For example, a column specified as SET('one', 'two') NOT NULL can have any of these values:
''
'one'
'two'
'one,two'
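A minimal sketch of such a column (the table name is made up):
CREATE TABLE set_demo (
  options SET('one', 'two') NOT NULL
);
INSERT INTO set_demo (options) VALUES (''), ('one'), ('two'), ('one,two');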
Tuesday, October 28, 2008
timestamp
Data logging is the practice of recording sequential data, often chronologically.

Wednesday, October 22, 2008
joins
A SQL JOIN clause combines records from two tables in a relational database, resulting in a new, temporary table, sometimes called a "joined table". A JOIN may also be thought of as a SQL operation that relates tables by means of values common between them. SQL specifies four types of JOIN: INNER, OUTER, LEFT, and RIGHT. In special cases, a table (base table, view, or joined table) can JOIN to itself in a self-join.
A programmer writes a JOIN predicate to identify the records for joining. If the predicate evaluates positively, the combined record is inserted into the temporary (or "joined") table. Any predicate supported by SQL can become a JOIN predicate, for example, WHERE clauses.
An inner join requires each record in the two joined tables to have a matching record. An inner join essentially combines the records from two tables (A and B) based on a given join-predicate. The result of the join can be defined as the outcome of first taking the Cartesian product (or cross join) of all records in the tables (combining every record in table A with every record in table B) and then returning all records which satisfy the join predicate. Actual SQL implementations will normally use other approaches where possible, since computing the Cartesian product is not very efficient. This type of join occurs most commonly in applications, and represents the default join-type.
SQL:2003 specifies two different syntactical ways to express joins. The first, called "explicit join notation", uses the keyword JOIN, whereas the second uses the "implicit join notation". The implicit join notation lists the tables for joining in the FROM clause of a SELECT statement, using commas to separate them. Thus, it specifies a cross join, and the WHERE clause may apply additional filter-predicates. Those filter-predicates function comparably to join-predicates in the explicit notation.
One can further classify inner joins as equi-joins, as natural joins, or as cross-joins (see below).
Programmers should take special care when joining tables on columns that can contain NULL values, since NULL will never match any other value (or even NULL itself), unless the join condition explicitly uses the IS NULL or IS NOT NULL predicates.
Equi-join
An equi-join, also known as an equijoin, is a specific type of comparator-based join, or theta join, that uses only equality comparisons in the join-predicate. Using other comparison operators (such as <) disqualifies a join as an equi-join. The following query is an example of an equi-join:
SELECT Employee.lastName, Employee.DepartmentID, Department.DepartmentName
FROM Employee INNER JOIN Department
ON Employee.DepartmentID = Department.DepartmentID
ORDER BY Employee.lastName;
The resulting joined table contains two columns named DepartmentID: one from table Employee and one from table Department.
SQL:2003 does not have a specific syntax to express equi-joins, but some database engines provide a shorthand syntax: for example, MySQL and PostgreSQL support USING(DepartmentID) in addition to the ON ... syntax.
Natural join
A natural join offers a further specialization of equi-joins. The join predicate arises implicitly by comparing all columns in both tables that have the same column-name in the joined tables. The resulting joined table contains only one column for each pair of equally-named columns.
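For instance, the equi-join shown earlier could be written as a natural join, assuming DepartmentID is the only column name the Employee and Department tables have in common:
SELECT *
FROM Employee NATURAL JOIN Department;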
Cross join
A cross join, cartesian join or product provides the foundation upon which all types of inner joins operate. A cross join returns the cartesian product of the sets of records from the two joined tables. Thus, it equates to an inner join where the join-condition always evaluates to True or the join-condition is absent from the statement.
If A and B are two sets, then the cross join is written as A × B.
The SQL code for a cross join lists the tables for joining (FROM), but does not include any filtering join-predicate.
Example of an explicit cross join:
SELECT *
FROM employee CROSS JOIN department
Example of an implicit cross join:
SELECT *
FROM employee, department;
Outer joins
An outer join does not require each record in the two joined tables to have a matching record. The joined table retains each record—even if no other matching record exists. Outer joins subdivide further into left outer joins, right outer joins, and full outer joins, depending on which table(s) one retains the rows from (left, right, or both).
(For a table to qualify as left or right, its name has to appear after the FROM or JOIN keyword, respectively.)
No implicit join-notation for outer joins exists in SQL:2003.
Left outer join
The result of a left outer join (or simply left join) for tables A and B always contains all records of the "left" table (A), even if the join-condition does not find any matching record in the "right" table (B). This means that if the ON clause matches 0 (zero) records in B, the join will still return a row in the result, but with NULL in each column from B. This means that a left outer join returns all the values from the left table, plus matched values from the right table (or NULL in case of no matching join predicate).
For example, this allows us to find an employee's department, but still to show the employee even when their department does not exist (contrary to the inner-join example above, where employees in non-existent departments are excluded from the result).
Example of a left outer join:
SELECT *
FROM employee LEFT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;
Right outer join
A right outer join (or right join) closely resembles a left outer join, except with the tables reversed. Every record from the "right" table (B) will appear in the joined table at least once. If no matching row from the "left" table (A) exists, NULL will appear in the columns from A for those B records that have no match in A.
A right outer join returns all the values from the right table and matched values from the left table (NULL in case of no matching join predicate).
For example, this allows us to find each employee and their department, but still show departments that have no employees.
Example of a right outer join:
SELECT *
FROM employee RIGHT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;
Full outer join
A full outer join combines the results of both left and right outer joins. The joined table will contain all records from both tables, and fill in NULLs for missing matches on either side.
For example, this allows us to see each employee who is in a department and each department that has an employee, but also to see each employee who is not part of a department and each department that doesn't have an employee.
Example full outer join:
SELECT *
FROM employee
FULL OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;
CVS
In the field of software development, the Concurrent Versions System (CVS), also known as the Concurrent Versioning System, provides a version control system based on open-source code. Version control system software keeps track of all work and all changes in a set of files, and allows several developers (potentially widely separated in space and/or time) to collaborate. Dick Grune developed CVS in the 1980s. CVS has become popular in the open source software world and is released under the GNU General Public License.
CVS uses a client-server architecture: a server stores the current version(s) of a project and its history, and clients connect to the server in order to "check out" a complete copy of the project, work on this copy and then later "check in" their changes. Typically, the client and server connect over a LAN or over the Internet, but client and server may both run on the same machine if CVS has the task of keeping track of the version history of a project with only local developers. The server software normally runs on Unix (although at least the CVSNT server supports various flavors of Microsoft Windows and Linux), while CVS clients may run on any major operating-system platform.
Several developers may work on the same project concurrently, each one editing files within their own "working copy" of the project, and sending (or checking in) their modifications to the server. To avoid the possibility of people stepping on each other's toes, the server will only accept changes made to the most recent version of a file. Developers are therefore expected to keep their working copy up-to-date by incorporating other people's changes on a regular basis. This task is mostly handled automatically by the CVS client, requiring manual intervention only when a conflict arises between a checked-in modification and the yet-unchecked local version of a file.
If the check-in operation succeeds, then the version numbers of all files involved automatically increment, and the CVS-server writes a user-supplied description line, the date and the author's name to its log files. CVS can also run external, user-specified log processing scripts following each commit. These scripts are installed by an entry in CVS's loginfo file, which can trigger email notification or convert the log data into a Web-based format.
Clients can also compare versions, request a complete history of changes, or check out a historical snapshot of the project as of a given date or as of a revision number. Many open-source projects allow "anonymous read access", a feature pioneered by OpenBSD. This means that clients may check out and compare versions with either a blank or simple published password (e.g., "anoncvs"); only the check-in of changes requires a personal account and password in these scenarios.
Clients can also use the "update" command in order to bring their local copies up-to-date with the newest version on the server. This eliminates the need for repeated downloading of the whole project.
CVS can also maintain different "branches" of a project. For instance, a released version of the software project may form one branch, used for bug fixes, while a version under current development, with major changes and new features, can form a separate branch.
CVS uses delta compression for efficient storage of different versions of the same file. The implementation favors files with many lines (usually text files) - in extreme cases the system may store individual copies of each version rather than deltas.
operational data store
An operational data store (or "ODS") is a database designed to integrate data from multiple sources to make analysis and reporting easier. Because the data originates from multiple sources, the integration often involves cleaning, resolving redundancy and checking against business rules for integrity. An ODS is usually designed to contain low level or atomic (indivisible) data (such as transactions and prices) with limited history that is captured "real time" or "near real time" as opposed to the much greater volumes of data stored in the Data warehouse generally on a less frequent basis.
According to Bill Inmon, the originator of the concept, an ODS is "a subject-oriented, integrated, volatile, current-valued, detailed-only collection of data in support of an organization's need for up-to-the-second, operational, integrated, collective information."
An ODS differs from Inmon's definition of the enterprise data warehouse (EDW) in having a limited history and more frequent updates. In practice, ODSs tend to be more reflective of source structures in order to speed implementations and provide a truer representation of production data.
Tuesday, October 21, 2008
data mart
In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However, most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs and that a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that may or may not derive from a data warehouse and that emphasizes ease of access and usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.
One writer, Marc Demerest, suggests combining the ideas into a Universal Data Architecture (UDA). In practice, many products and companies offering data warehouse services also tend to offer data mart capabilities or services.
There can be multiple data marts inside a single corporation; each one relevant to one or more business units for which it was designed. DMs may or may not be dependent or related to other data marts in a single corporation. If the data marts are designed using conformed facts and dimensions, then they will be related. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data.[2] This enables each department to use, manipulate and develop their data any way they see fit; without altering information inside other data marts or the data warehouse. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions like customer, product, etc.
Data warehouse
A data warehouse is classically defined, following Bill Inmon, as a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process. This classic definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.
In contrast to data warehouses are operational systems which perform day-to-day transaction processing.
Benefits of data warehousing
Some of the benefits that a data warehouse provides are as follows: [2][3]
- A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
- Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
- Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
- Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
- Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.
- Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
Data warehouse architecture
Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture. The worthiness of the architecture can be judged in how the conceptualization aids in the building, maintenance, and usage of the data warehouse.
One possible simple conceptualization of a data warehouse architecture consists of the following interconnected layers:
- Operational database layer
- The source data for the data warehouse - An organization's ERP systems fall into this layer.
- Informational access layer
- The data accessed for reporting and analyzing and the tools for reporting and analyzing data - Business intelligence tools fall into this layer. And the Inmon-Kimball differences about design methodology, discussed later in this article, have to do with this layer.
- Data access layer
- The interface between the operational and informational access layer - Tools to extract, transform, load data into the warehouse fall into this layer.
- Metadata layer
- The data directory - This is usually more detailed than an operational system data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.
Bottom-up design
Ralph Kimball, a well-known author on data warehousing, [4] is a proponent of the bottom-up approach to data warehouse design. In the bottom-up approach data marts are first created to provide reporting and analytical capabilities for specific business processes. Data marts contain atomic data and, if necessary, summarized data. These data marts can eventually be unioned together to create a comprehensive data warehouse. The combination of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture".[5]
Business value can be returned as quickly as the first data marts can be created. Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".
Top-down design
Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise.[5] Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities. The CIF is driven by data provided from business operations
normalization
Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.
Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. Accordingly, less highly normalized tables are typically used in database applications involving many isolated transactions (e.g. an automated teller machine), while more normalized tables tend to be used in database applications that need to map complex relationships between data entities and data attributes (e.g. a reporting application, or a full-text search application).[citation needed]
Database theory describes a table's degree of normalization in terms of normal forms of successively higher degrees of strictness. A table in third normal form (3NF), for example, is consequently in second normal form (2NF) as well; but the reverse is not necessarily the case.
Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally, it is meaningful to consider the extent to which the representation is normalized.
Normal forms
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.
The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.
Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.
Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These normal forms have been summarized as requiring that all non-key attributes be dependent on "the key, the whole key and nothing but the key". The fourth and fifth normal forms (4NF and 5NF) deal specifically with the representation of many-to-many and one-to-many relationships among attributes. Sixth normal form (6NF) incorporates considerations relevant to temporal databases.
First normal form
A table is in first normal form (1NF) if and only if it represents a relation.[3] Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF.
Note that the restriction on nullable columns as a 1NF requirement, as espoused by Chris Date, et. al., is controversial. This particular requirement for 1NF is a direct contradiction to Dr. Codd's vision of the relational database, in which he stated that "null values" must be supported in a fully relational DBMS in order to represent "missing information and inapplicable information in a systematic way, independent of data type."[4] By redefining 1NF to exclude nullable columns in 1NF, no level of normalization can ever be achieved unless all nullable columns are completely eliminated from the entire database. This is in line with Date's and Darwen's vision of the perfect relational database, but can introduce additional complexities in SQL databases to the point of impracticality.[5]
One requirement of a relation is that every table contains exactly one value for each attribute. This is sometimes expressed as "no repeating groups"[6]. While that statement itself is axiomatic, experts disagree about what qualifies as a "repeating group", in particular whether a value may be a relation value; thus the precise definition of 1NF is the subject of some controversy. Notwithstanding, this theoretical uncertainty applies to relations, not tables. Table manifestations are intrinsically free of variable repeating groups because they are structurally constrained to the same number of columns in all rows.
Put at its simplest, when applying 1NF to a database, every record must be the same length. This means that each record has the same number of fields, and none of them contains a null value.
Second normal form
The criteria for second normal form (2NF) are:
- The table must be in 1NF.
- None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies.[7] For example, consider an "Employees' Skills" table whose attributes are Employee ID, Employee Name, and Skill; and suppose that the combination of Employee ID and Skill uniquely identifies records within the table. Given that Employee Name depends on only one of those attributes – namely, Employee ID – the table is not in 2NF.
- In simple terms, a table is in 2NF if it is in 1NF and all fields are dependent on the whole of the primary key; equivalently, a relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.
- Note that if none of a 1NF table's candidate keys are composite – i.e. every candidate key consists of just one attribute – then we can say immediately that the table is in 2NF.
- All columns must be a fact about the entire key, and not a subset of the key.
Third normal form
The criteria for third normal form (3NF) are:
- The table must be in 2NF.
- Every non-key attribute must be non-transitively dependent on the primary key.
All attributes must rely only on the primary key. So, if a database has a table with columns Student ID, Student, Company, and Company Phone Number, it is not in 3NF. This is because the Phone number relies on the Company. So, for it to be in 3NF, there must be a second table with Company and Company Phone Number columns; the Phone Number column in the first table would be removed.
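A sketch of that decomposition, using the column names from the example (the data types are assumptions):
-- Company Phone Number depends on Company, not on Student ID, so it moves to its own table
CREATE TABLE student_company (
  student_id INT PRIMARY KEY,
  student    VARCHAR(50),
  company    VARCHAR(50)
);
CREATE TABLE company (
  company       VARCHAR(50) PRIMARY KEY,
  company_phone VARCHAR(20)
);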
Boyce-Codd normal form
A table is in Boyce-Codd normal form (BCNF) if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey—that is, X is either a candidate key or a superset thereof.[8]
Fourth normal form
A table is in fourth normal form (4NF) if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey; that is, X is either a candidate key or a superset thereof.[9]
- For example, if you can have two phone number values and two email address values for a record, then you should not have them in the same table.
Fifth normal form
The criteria for fifth normal form (5NF and also PJ/NF) are:
- The table must be in 4NF.
- There must be no non-trivial join dependencies that do not follow from the key constraints. A 4NF table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.
Domain/key normal form
Domain/key normal form (or DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.
Sixth normal form
According to the definition by Christopher J. Date and others, who extended database theory to take account of temporal and other interval data, a table is in sixth normal form (6NF) if and only if it satisfies no non-trivial (in the formal sense) join dependencies at all,[10], meaning that the fifth normal form is also satisfied. When referring to "join" in this context it should be noted that Date et al. additionally use generalized definitions of relational operators that also take account of interval data (e.g. from-date to-date) by conceptually breaking them down ("unpacking" them) into atomic units (e.g. individual days), with defined rules for joining interval data, for instance.[11]
Sixth normal form is intended to decompose relation variables to irreducible components. Though this may be relatively unimportant for non-temporal relation variables, it can be important when dealing with temporal variables or other interval data. For instance, if a relation comprises a supplier's name, status, and city, we may also want to add temporal data, such as the time during which these values are, or were, valid (e.g. for historical data) but the three values may vary independently of each other and at different rates. We may, for instance, wish to trace the history of changes to Status.
For further discussion on Temporal Aggregation in SQL, see also Zimyani, [12] For a non-relational approach, see TSQL2.
In a different meaning, sixth normal form may also be used by some to refer to Domain/key normal form (DKNF).
indexing
A database index is a data structure that improves the speed of operations on a database table. Indices can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
The disk space required to store the index is typically less than that required by the table (since indices usually contain only the key-fields according to which the table is to be arranged, and excludes all the other details in the table), yielding the possibility to store indices in memory for a table whose data is too large to store in memory.
In a relational database an index is a copy of part of a table. Some databases extend the power of indexing by allowing indices to be created on functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper-case versions of the last_name field in the index. Another option sometimes supported is the use of "filtered" indices, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions. All of these indexing refinements are supported in Visual FoxPro, for example.[1]
Indices may be defined as unique or non-unique. A unique index acts as a constraint on the table by preventing identical rows in the index and thus, the original columns.
Column order
The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.
For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name. Some databases can do this, others just won't use the index.
Applications and limitations
Indices are useful for many applications but come with some limitations. Consider the following SQL statement:
SELECT first_name FROM people WHERE last_name = 'Smith';
To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the b-tree data structure until the Smith entry has been found; this is much less computationally expensive than a full table scan.
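For example, the index that lets the query above avoid a full table scan could be declared like this (the people table is hypothetical):
CREATE INDEX idx_people_last_name ON people (last_name);
-- the predicate can now be answered by walking the index instead of scanning the table
SELECT first_name FROM people WHERE last_name = 'Smith';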
Consider this SQL statement:
SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';
This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed the database still must perform a full table scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search term the database software is unable to use the underlying b-tree data structure. This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this:
SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@yahoo.com');
This puts the wildcard at the right-most part of the query (now moc.oohay@%), which the index on reverse(email_address) can satisfy.
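A sketch of that reversed index, using PostgreSQL-style expression-index syntax (MySQL itself only added functional indexes much later, in 8.0):
CREATE INDEX idx_rev_email ON customers (reverse(email_address));
SELECT email_address
FROM customers
WHERE reverse(email_address) LIKE reverse('%@yahoo.com');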
Types
Bitmap index
A bitmap index is a special kind of index that stores the bulk of its data as bitmaps and answers most queries by performing bitwise logical operations on these bitmaps. The most commonly used indexes, such as B+ trees, are most effective if the values they index do not repeat, or repeat only a relatively small number of times. In contrast, the bitmap index is designed for cases where the values of a variable repeat very frequently. For example, the gender field in a customer database usually contains two distinct values, male or female. For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.
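MySQL does not offer bitmap indexes, but in a database that does (Oracle, for example), the declaration for the gender case above would look roughly like this:
CREATE BITMAP INDEX idx_customers_gender ON customers (gender);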
Dense index
A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file. In clustered indices with duplicate keys the dense index points to the first record with that key.[5]
Sparse index
A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys the sparse index points to the lowest search key in each block.
entity model
An entity-relationship model (ERM) is an abstract conceptual representation of structured data. Entity-relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created using this process are called entity-relationship diagrams, or ER diagrams or ERDs for short. The definitive reference for entity relationship modelling is generally given as Peter Chen's 1976 paper[1]. However, variants of the idea existed previously (see for example A. P. G. Brown[2]) and have been devised subsequently.
Overview
The first stage of information system design uses these models during the requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain universe of discourse (i.e. area of interest). In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are referred to as "physical design".
There are a number of conventions for entity-relationship diagrams (ERDs). The classical notation is described in the remainder of this article, and mainly relates to conceptual modeling. There are a range of notations more typically employed in logical and physical database design, including IDEF1x (ICAM DEFinition Language) and dimensional modeling.
An entity may be defined as a thing which is recognised as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world (Beynon-Davies, 2004).
An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for this term.
Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem. Entities are represented as rectangles.
A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds, connected by lines to each of the entities in the relationship.
Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute. Attributes are represented as ellipses connected to their owning entity sets by a line.
Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.
Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets. Example: a particular song is an entity. The collection of all songs in a database is an entity set. The eaten relationship between a child and her lunch is a single relationship. The set of all such child-lunch relationships in a database is a relationship set.
Lines are drawn between entity sets and the relationship sets they are involved in. If all entities in an entity set must participate in the relationship set, a thick or double line is drawn. This is called a participation constraint. If each entity of the entity set can participate in at most one relationship in the relationship set, an arrow is drawn from the entity set to the relationship set. This is called a key constraint. To indicate that each entity in the entity set is involved in exactly one relationship, a thick arrow is drawn.