Visual ETL
Visual Database Migration Tool

  Extract, Transform and Load
Complex Data from Multiple
Data Sources using a Simple
Visual Wizard Like Interface.

Visual ETL: Making Data Migrations Easy
 
 

Managing an ETL Project

[The life cycle of an ETL project is no different from the life cycle of other projects. It begins with a vision, mission, objective and goal that translates into a scope of work, which must be executed to create the desired outcomes—in this instance a data warehouse. Managing an ETL project is no different from managing other projects. The level of detail and the skill of execution of the project will determine the success or failure of the project]

ETL Project management is a challenge. It is the first step in a data warehouse initiative and perhaps the most important. If ETL project fails the data warehouse will not deliver the desired outcomes. Since these projects are high risk, high cost projects, organizations are very anxious about the success or failure of these projects. Yet, statistical reports reveal that a large number of data warehouse projects fail for want of effective management and imperfect understanding of the needs of the organization.

So what are the factors that must be taken into consideration while launching ourselves into the management of a ETL project? The answer is very simple—adopt best practices in management. However, the application of the percept is not so simple. Best practices can at best provide guidance for leveraging technology (such as ETL tools, data integration software), mitigating risk, and optimizing resources. It cannot replace the native common sense, the experience and the leadership drive of a good ETL project manager.

The ETL project manager is as constrained as the manager of other projects. He needs to garner the necessary resources and find the relevant answers to the universal questions of who, when, why, where, what and how of the ETL project. Why do we need a data warehouse? What is the data that will be collected, collated, cleaned and used? What ETL tools are available for the ETL process? Who will be the beneficiaries of the ETL process and who will define the process needs of Extract, Transform, Load? Where will the data be extracted? Where will it be staged? Where will it be transformed and loaded? How will the data be used? The replies will give him a fairly good idea of the need for the ETL process and the kinds of data that he will need to provide to the end user. This will set in focus the short term and long term objectives of the project. The project manager can never lose sight of the objectives of the exercise and he must ensure that the project team members also, keep the vision, mission, goals and objectives of the project in mind at operational levels and delivery levels. His task will be a definition of a strategic plan to achieve these objectives.

The first phase of project management is the development and deployment of the ETL project methodology. Experience will help him identify potential conflicts, misunderstandings and to provide for them in order to mitigate project risks and control costs. This is also the phase in which the project manager educates his team members and stakeholders on the project management approach and ensures that complete and effective communication is established between the different individuals involved. The ETL project manager also uses this time for defining the procedures for issue, risk and change management, project management monitoring and communication; project management quality assurance and meeting schedules. He details the strategies for stakeholder acceptance, support, training and marketing. He focuses on configuration management, document management, backup and recovery, creation of archives. He also defines the scope of the ETL project, the performance parameters, and the measures of success, ETL project life cycle methodology and the ETL project plan. The ETL tools, data integration software and the battery of hardware required for the ETL project are identified and requisitioned.

The management of an ETL project begins with a Work Breakdown Structure (WBS). The ETL project manager will have to ensure that all the phases of the project are listed; activities and tasks identified and allotted appropriate slots in the time line. He will have to provide detailed descriptions of each phase, activity and task required in order to ensure that all the ETL tasks are carried out as planned and within the time frame specified. He will also have to identify the ETL project process dependencies and list the critical project milestones and deliverables at each stage.

The next step would involve the identification of the required resources for the project. These could be generic resources required for each task such as full time or part time staff, contractors, equipment and material. Each resource type will have to be quantified and allocated against the activities and tasks and scheduled for availability in accordance with the delivery dates and timelines. The purpose of this exercise is to ensure that each resource is used optimally. The required ETL tools and data integration tools are acquired installed and tested for high-availability.

Once the above steps are complete, the manager will proceed to construct an ETL project schedule. This ETL project schedule will list the phases, activities, task sequences and tasks. All key internal and external dependencies will be identified and defined. Time frames will be allocated for each ETL process phase and additional contingencies required to mitigate risks will be incorporated into the plan. Finally resources will be assigned to the ETL task and the work will begin keeping in mind the critical milestones of the project. All assumptions and constraints will be described and inbuilt into the project process definition for further review and action.

The first proof of concept is created when the data warehouse architecture and the data warehouse concept is set up and a small prototype is created for user validation and testing This validates the ETL process blue print and gets the buy in of the users and stakeholders of the project. It mitigates the risk of the project and enables incremental delivery of the concept until complete production scale is reached and the data is ready to be loaded into the data warehouse.

However, the task of the ETL manager does not end here. He needs to ascertain the refresh rate of the data warehouse and the flow of data that will have to be extracted, staged, cleaned, transformed and loaded into the warehouse years after the data warehouse has been in operation. His role never ceases to be valued and he never becomes redundant.