Managing an ETL Project
[The life cycle of an ETL project is no different from the life cycle of other projects. It begins with a vision, mission, objective and goal that translates into a scope of work, which must be executed to create the desired outcomes—in this instance a data warehouse. Managing an ETL project is no different from managing other projects. The level of detail and the skill of execution of the project will determine the success or failure of the project]
ETL Project management is a challenge. It is
the first step in a data warehouse initiative and perhaps
the most important. If ETL project fails the data warehouse
will not deliver the desired outcomes. Since these projects
are high risk, high cost projects, organizations are very
anxious about the success or failure of these projects. Yet,
statistical reports reveal that a large number of data
warehouse projects fail for want of effective management and
imperfect understanding of the needs of the organization.
So what are the factors that must be taken into
consideration while launching ourselves into the management
of a ETL project? The answer is very simple—adopt best
practices in management. However, the application of the
percept is not so simple. Best practices can at best provide
guidance for leveraging technology (such as ETL tools, data
integration software), mitigating risk, and optimizing
resources. It cannot replace the native common sense, the
experience and the leadership drive of a good ETL project
manager.
The ETL project manager is as constrained as the manager of
other projects. He needs to garner the necessary resources
and find the relevant answers to the universal questions of
who, when, why, where, what and how of the ETL project. Why
do we need a data warehouse? What is the data that will be
collected, collated, cleaned and used? What ETL tools are
available for the ETL process? Who will be the beneficiaries
of the ETL process and who will define the process needs of
Extract, Transform, Load? Where will the data be extracted?
Where will it be staged? Where will it be transformed and
loaded? How will the data be used? The replies will give him
a fairly good idea of the need for the ETL process and the
kinds of data that he will need to provide to the end user.
This will set in focus the short term and long term
objectives of the project. The project manager can never
lose sight of the objectives of the exercise and he must
ensure that the project team members also, keep the vision,
mission, goals and objectives of the project in mind at
operational levels and delivery levels. His task will be a
definition of a strategic plan to achieve these objectives.
The first phase of project management is the development and
deployment of the ETL project methodology. Experience will
help him identify potential conflicts, misunderstandings and
to provide for them in order to mitigate project risks and
control costs. This is also the phase in which the project
manager educates his team members and stakeholders on the
project management approach and ensures that complete and
effective communication is established between the different
individuals involved. The ETL project manager also uses this
time for defining the procedures for issue, risk and change
management, project management monitoring and communication;
project management quality assurance and meeting schedules.
He details the strategies for stakeholder acceptance,
support, training and marketing. He focuses on configuration
management, document management, backup and recovery,
creation of archives. He also defines the scope of the ETL
project, the performance parameters, and the measures of
success, ETL project life cycle methodology and the ETL
project plan. The ETL tools, data integration software and
the battery of hardware required for the ETL project are
identified and requisitioned.
The management of an ETL project begins with a Work
Breakdown Structure (WBS). The ETL project manager will have
to ensure that all the phases of the project are listed;
activities and tasks identified and allotted appropriate
slots in the time line. He will have to provide detailed
descriptions of each phase, activity and task required in
order to ensure that all the ETL tasks are carried out as
planned and within the time frame specified. He will also
have to identify the ETL project process dependencies and
list the critical project milestones and deliverables at
each stage.
The next step would involve the identification of the
required resources for the project. These could be generic
resources required for each task such as full time or part
time staff, contractors, equipment and material. Each
resource type will have to be quantified and allocated
against the activities and tasks and scheduled for
availability in accordance with the delivery dates and
timelines. The purpose of this exercise is to ensure that
each resource is used optimally. The required ETL tools and
data integration tools are acquired installed and tested for
high-availability.
Once the above steps are complete, the manager will proceed
to construct an ETL project schedule. This ETL project
schedule will list the phases, activities, task sequences
and tasks. All key internal and external dependencies will
be identified and defined. Time frames will be allocated for
each ETL process phase and additional contingencies required
to mitigate risks will be incorporated into the plan.
Finally resources will be assigned to the ETL task and the
work will begin keeping in mind the critical milestones of
the project. All assumptions and constraints will be
described and inbuilt into the project process definition
for further review and action.
The first proof of concept is created when the data
warehouse architecture and the data warehouse concept is set
up and a small prototype is created for user validation and
testing This validates the ETL process blue print and gets
the buy in of the users and stakeholders of the project. It
mitigates the risk of the project and enables incremental
delivery of the concept until complete production scale is
reached and the data is ready to be loaded into the data
warehouse.
However, the task of the ETL manager does not end here. He
needs to ascertain the refresh rate of the data warehouse
and the flow of data that will have to be extracted, staged,
cleaned, transformed and loaded into the warehouse years
after the data warehouse has been in operation. His role
never ceases to be valued and he never becomes redundant.