HOME     ABOUT US   CONTACT       SITEMAP     
 NEWS
   
| PRODUCTS | | DOWNLOADS | | ORDER | | SUPPORT | | SERVICES |
 
 Support
   Support Information
 
 
   Updates
 
   Articles
 Product Registration
  Enable technical support and free minor updates.
 Purchase Products
  Shop online
 VisualETL
 Free Demo Downloads
 Newsletter
 
 



What is ETL and Why is it Important


[Business houses strive to create meaningful and integrated views of their business by harnessing their data into business intelligence systems and data warehouses. Efficient management of data requires that the data so harnessed is cleansed of anomalies and gives an accurate picture of the state of the business and helps the organization in making informed mission critical decisions. Extract, Transform and Load is a process of ensuring that data integrated from multitude of sources meets the stringent norms of data validity, provides for seamless integration and supports decision making activities. Inefficient ETL systems will result in distortion of information and corruption of target data systems.]

The modern day knowledge based economy demands that organizations take informed mission critical decisions for retaining a competitive edge. The required information is garnered from multifarious sources and formats. Historical repositories within the organization, online transactional databases and external sources of information are integrated for purposes of multidimensional analysis and interpretation. Database integration and data mining are mantras of the information age business houses. Data mining software, data mining tools and data warehouses are the gold mines from which informed business decisions are to be made. Implicit in this process of mixing and matching data from different sources, is the need to identify relevant information, reconcile inconsistencies and deficiencies in data and incompatibility of data types and to harvest information that is clean, consistent and correct. One of the powerful methodologies adopted by business entities to ease the process of data integration and data transfer is the ‘Extract, Transform and Load’ or ETL.

What is ETL? ETL is the process that enables companies to move data from multiple sources, reformat and cleanse it, and load it into another database, a data mart or a data warehouse for analysis, or on to another operational system to support a business process. It presupposes that the organization has defined its business needs and identified the relevant sources of data for extraction, transformation and loading. It assumes that the organization needs to now understand the data and examine the incompleteness, inaccuracy or invalidity of source data it is harnessing in order to extract it, transform it and load it into the new repository for meaningful analysis.

The process of data integration, therefore, begins with data profiling —which aims at discovering the quality and characteristics of organizational data sources. Data profiling automatically identifies quality data by mapping basic statistics, frequencies and ranges; identifying multiple spellings of the same content; discovering and validating data patterns and formats; setting out numeric range analysis; identifying and validating redundant data and defining primary/foreign key relationships across data sources; identifying duplicate name and address or null information; validating data specific business rules within a record or across sources and so on. In other words, it uncovers the anomalies of data by examining the content, structure and hidden relationships between data sources and lays the foundation for monitoring the quality of data.

The next stage is the process of transforming data or modifying data. Data Transformation involves the correction of errors, reformatting of data, standardization of data across tables, validation of inconsistent and inaccurate data and even summarization of data based on content. Transformation could also include transforming fixed record formats to variable formats and vice versa. Expansion of records with additional fields containing information from other demographic information would also fall within the purview of data transformation. The process helps to cleanse data as well as integrate data and make it ready for loading into the data mart or data warehouse.

The load process can be simple or extremely complex. Old data can be overwritten or new data can be appended to the old data. Audit trial of changes can be maintained for historical review of the changes made to data.

It is apparent from the above that a majority of the data warehousing project timeline is spent in selecting the data sources, extracting, transforming and loading data. The direction of the effort is to create a cost effective and efficient ETL System to extract, transform, and load data into the integrated database. Though, ETL systems can be created from the scratch by organizations using almost any kind of programming language, improperly designed ETL systems and data transfer tools can be disastrous to the organization. Unexpected changes in the format of the source systems can destroy or corrupt significant amounts of data in the target system. Moreover, if update cycles of data are not in Synchronous mode, the ETL system will hold back data until all systems are synchronized and can be updated. Fractures in integration of data could result in information being incorrectly presented. An error of any kind or lack of foresight while designing the system can result in the entire process becoming a costly white elephant. It is for this reason that organizations are increasingly buying off the shelf ETL tools that help in creating ETL systems and processes without the hiccups of labored and experiential creativity.

A good ETL tool can communicate with many different kinds of data sources. ETL tools are designed to read various file formats and harvest data stored in them. They can gather excel worksheet data and import it into Access and from Access to SQL Sever; from SQL Server to Oracle and so on. ETL tools have been integrated into Enterprise Application Integration (EAI) systems. Today they not only help users in extraction, transformation and loading of data, they also help the user understand the inadequacies of the data they are handling and provide them with tools to transform such data and to ensure data quality. The metadata capabilities of these tools also streamline the process of ETL.

Though the ETL market is rapidly growing, it is still in a state of flux. There are no major vendors with integrated Data warehouse solutions. Packaged ETL systems include Microsoft, which offers data transformation services bundled with its SQL Server database. Oracle has embedded some ETL capabilities in its database, and IBM offers a DB2 Information Integrator component for its warehouse offerings. DataFlux, Informatica, Ascential, DataMirror iDeliver are some of the ETL tools available in the very high end market. There are very few tools that are priced for the small and mid size market. VisualETL is one such tool that while powerful, it is also reasonably priced so you can receive ROI after just one small ETL project. Therefore, it is important to make a detailed assessment of the business requirements and understand the data that will be handled before selecting an ETL software tool that is the best fit for your particular needs and budget.

<< Back to List of Articles
 
| Home | About Us | Contact Us | Sitemap | FAQ | News | Products | Downloads | Order | Support | Services |