Visual ETL
Visual Database Migration Tool

  Extract, Transform and Load
Complex Data from Multiple
Data Sources using a Simple
Visual Wizard Like Interface.

Visual ETL: Making Data Migrations Easy
 
 
 



Planning for an ETL Project


[There are no silver bullets in ETL project planning process. It is sheer hard work and dedication that will determine the success or failure of the project. The process of collecting, collating, extracting and processing data is not very simple. The complexity of the process is best understood when you begin to plan for an ETL project. Planning for an ETL project begins with a needs analysis and includes planning for an entire gamut of activities that begins with the creation of the basic ETL design and ends with the process being moved into the production environment after testing and user validation]

All organizations today are inundated with data. Disparate legacy systems and non-standardized business practices add to the growing volume of unstructured information that is available with organizations. Data warehouse and business intelligence projects are initiated with the view to create repositories of data that support decision-making and involves extraction of mission critical data, cleaning of data and organization of data in meaningful ways. A large portion of this exercise is devoted to the process of Extract, Transform and Load (ETL) development. This is the stage at which all data challenges are addressed and resolved and it consumes about 40-70% of the time and resources of the Data warehouse project. Therefore the design of a robust ETL architecture is of prime importance to the success of the business intelligence project and is a key factor in determining whether the information is tailored to the needs of the various stakeholders of the organization.

Every project begins with a concept that is defined by the vision, mission, goals and objectives of the organization. The commitment of the organization to the achievement of these goals and objectives is a driver of the initiative. The ETL project, too, begins with a felt need for structured information that will support management decisions. It is the determinant of the environment in which the project is conceived and executed.

The project life cycle of an ETL project is no different from the life cycle of other projects. The questions that need to be asked are those that will be asked at the inception of every kind of project.

1. Is the management committed to project?
2. Are the stakeholders available for any kind of discussion?
3. Are they willing to give the necessary time and effort to the initiative?
4. What is the project scope and what are the deliverables?
5. What is the time line?
6. How detailed are the requirements?
7. Are there sufficient resources committed to the project?
8. Are there any gaps in resource deployment that need to be taken care of?
9. Has a cost-benefit analysis been performed?
10. What is the return on investment expected?
11. Is the project outcomes scalable and will it be able to meet the changing needs of the organization for a specified time frame?
12. Are the team members identified?
13. Are the member’s full time or part time?
14. What are their skill levels?
15. What kind of training needs to be imparted to such members?

At the core of the ETL project is the needs analysis--a definition of the scope of work. An understanding of the business needs is a good starting point. While the process sounds very simple, it is extremely complicated. Users are often not prepared to discuss dynamic long-term needs of the organization. Static reporting requirements are uppermost in their minds and they often do not think beyond the immediate needs. While static reporting requirements is a good starting point, it cannot be the goal of an ETL project. The project head must make an all out effort to discover data elements by framing a series of questions that help discover sources of data that can be used by the organization to empower itself. However, the danger that lurks in the discovery of additional data sources is the possibility of data overload, which can ruin the ETL architecture. So, the ETL project plan should ensure that the project starts small and remains simple, while retaining flexibility and scalability.

The output of this stage of the ETL project plan, is a logical description of the process of identifying the source of data, designing the physical structure for the data, extracting, transforming and loading data and identifying the deliverables at every stage. Business Requirements Specification templates are often used at this stage of the project to define the different parameters. The record layouts, location of source and target, File/table sizing information and documentation of data transformation are built up to assist the project manager in orderly implementation of the project. These are known as mapping specifications. Standardized mapping specification templates are used or customized to suit the needs of the organization.

The project implementation phase that follows is the source of great concern and great frustration for the project manager. After all, the objective of the ETL exercise is to manage data and provide the required information to the end users in formats specified by them. Data has to be acquired, moved from one location to another, validated, transformed and loaded into its destination. He needs to understand the data combinations and define the expected results.

To this end, the project manager must choose the architecture, which will support the extraction, transformation and loading of the data. If he has to deal with large volumes of data, he must consider the potential of partitioning and running multiple processes to optimize on the ETL architecture. He must grapple with issues relating to Data profiling, monitoring of source systems, need for continuous ETL and set profiling in order to avoid the common pitfalls of an ETL project. He must include error checks and decide how many test cases he will build. He must have a bird’s eye view of the technical risks and security. Finally he must be conscious that the test results must validate the effort to meet business requirements. This stage of the project plan would include documentation of the test plan; the development of the test plan and the performance test plan using pre defined or customized templates.

The project manager at this stage of the project will be defining the ETL object Migration Plan and the Database object migration plan. He will be completing the ETL mapping specifications and defining the cleanup process for log and bad files. The test plan specifications drawn up will include plans for unit, volume and performance tests.

The next stage in the ETL project plan would be a Walkthrough ETL process. The project manager would identify the common modules; evaluate the efficiency of the ETL code, the business logic, accuracy and standardization. The documentation of the test results is a deliverable.

The project then moves rapidly on to user validation tests wherein the user validates the end product and certifies whether or not it meets the business specifications set forth at the start of the project. The business requirement specification document forms the basis of the assessment.

If all business needs have been met the project manager begins the move from prototype to production stage. Thereafter the ETL process maintenance process needs to kick in. This would include ETL process break definitions and ETL process update specifications. The deliverables would include Updated business requirements specifications, updated mapping specifications, revised mapping specification, updated ETL object migration form, developer check methodologies for final results in production and monitoring of ETL process.