Thursday 6 March 2014

Data Warehouse Concept

The data warehouse is comprised of data that is stored over a well-defined period of time thus enabling trend analysis to be carried out.  This information is stored in such a way as to facilitate user-defined analysis.

What is a data mart?
A data mart is a segment of a data warehouse typically organized by subject area or focused on a particular client group within an enterprise (ie. Indirect Sourcing) that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, and production.  In short, a data mart is usually created for a specific department or individual need.

What is a data warehouse?
A data warehouse is a collection of data in a database that is organized representing historical data from different operations in the company.  A data warehouse can also be considered a collection of data marts.  This data is stored in a structure optimized for querying and data analysis as a data warehouse. Table design, dimensions and organization should be consistent throughout a data warehouse so that reports or queries across the data warehouse are consistent.

What are dimensions?
Dimensions are categories by which factual data can be viewed and summarized. E.g. a profit summary in a fact table can be viewed by a Time dimension (profit by month, quarter, year), Region dimension (profit by country, state, city), Product dimension (profit for product1, product2).

What are fact tables?
A fact table is a table that contains transactional data or individual facts, for example a sales transaction, and a multipart index composed of foreign keys from the primary keys of related dimension tables.

What are measures?
Measures are numeric data based on columns in a fact table. They are the primary data which end users are interested in. E.g. a sales fact table may contain a profit measure, which represents profit on each sale.

What are aggregations?
Aggregations are precalculated numeric data at
different levels of granularity. By calculating and storing the answers to a query before users ask for it, the query processing time can be reduced. This is key in providing fast query performance in OLAP.

What are cubes?
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional views of data, querying and analytical capabilities to clients.

What is OLAP?
OLAP stands for Online Analytical Processing.
It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data. E.g. OLAP technology could provide management with fast answers to complex queries on their operational data or enable them to analyze their company's historical data for trends and patterns.



What is OLTP?
OLTP stands for Online Transaction Processing.
OLTP uses normalized data models to quickly record large numbers of concurrent transactions while ensuring that these updates of data occur in as few places as possible. Consequently OLTP database are designed for recording the daily operations and transactions of a business. E.g. a timecard system that supports a large production environment must record successfully a large number of updates during critical periods like lunch hour, breaks, startup and close of work.
Imagine a call center that takes orders. Call takers are continually taking calls and entering orders that may contain numerous items. Each order and each item must be inserted into a database. Since the performance of the database is critical, we want to maximize the speed of inserts (and updates and deletes).

Data warehouse and Business Intelligence Testing

DW/BI testing is one the most challenging and emerging career in the field of software testing.
This is not as similar as manual testing or database testing .This requires a lot of technical expertise, knowledge of database schema, ETL tools, Reporting tools, Complex SQL queries.

Anyone who wants to pursue his or her career as DW/BI testing professional should have following skill set.

1. Knowledge of Data warehouse concept.
2. Knowledge of tools like Data stage, informatica, Teradata etc
3. Proficient in writing complex SQL queries.
4. Test Data management.
5. Knowledge of reporting tool like SAP BO, Cognos, Microstrategy etc