Managing an ETL Project
[The life cycle of an ETL project is no different from
the life cycle of other projects. It begins with a vision,
mission, objective and goal that translates into a scope of
work, which must be executed to create the desired outcomes—in
this instance a data warehouse. Managing an ETL project is no
different from managing other projects. The level of detail and
the skill of execution of the project will determine the
success or failure of the project]
ETL Project management is a challenge. It is the
first step in a data warehouse initiative and perhaps the most
important. If ETL project fails the data warehouse will not
deliver the desired outcomes. Since these projects are high
risk, high cost projects, organizations are very anxious about
the success or failure of these projects. Yet, statistical
reports reveal that a large number of data warehouse projects
fail for want of effective management and imperfect
understanding of the needs of the organization.
So what are the factors that must be taken into consideration
while launching ourselves into the management of a ETL project?
The answer is very simple—adopt best practices in management.
However, the application of the percept is not so simple. Best
practices can at best provide guidance for leveraging
technology (such as ETL tools, data integration software),
mitigating risk, and optimizing resources. It cannot replace
the native common sense, the experience and the leadership
drive of a good ETL project manager.
The ETL project manager is as constrained as the manager of
other projects. He needs to garner the necessary resources and
find the relevant answers to the universal questions of who,
when, why, where, what and how of the ETL project. Why do we
need a data warehouse? What is the data that will be collected,
collated, cleaned and used? What ETL tools are available for
the ETL process? Who will be the beneficiaries of the ETL
process and who will define the process needs of Extract,
Transform, Load? Where will the data be extracted? Where will
it be staged? Where will it be transformed and loaded? How will
the data be used? The replies will give him a fairly good idea
of the need for the ETL process and the kinds of data that he
will need to provide to the end user. This will set in focus
the short term and long term objectives of the project. The
project manager can never lose sight of the objectives of the
exercise and he must ensure that the project team members also,
keep the vision, mission, goals and objectives of the project
in mind at operational levels and delivery levels. His task
will be a definition of a strategic plan to achieve these
objectives.
The first phase of project management is the development and
deployment of the ETL project methodology. Experience will help
him identify potential conflicts, misunderstandings and to
provide for them in order to mitigate project risks and control
costs. This is also the phase in which the project manager
educates his team members and stakeholders on the project
management approach and ensures that complete and effective
communication is established between the different individuals
involved. The ETL project manager also uses this time for
defining the procedures for issue, risk and change management,
project management monitoring and communication; project
management quality assurance and meeting schedules. He details
the strategies for stakeholder acceptance, support, training
and marketing. He focuses on configuration management, document
management, backup and recovery, creation of archives. He also
defines the scope of the ETL project, the performance
parameters, and the measures of success, ETL project life cycle
methodology and the ETL project plan. The ETL tools, data
integration software and the battery of hardware required for
the ETL project are identified and requisitioned.
The management of an ETL project begins with a Work Breakdown
Structure (WBS). The ETL project manager will have to ensure
that all the phases of the project are listed; activities and
tasks identified and allotted appropriate slots in the time
line. He will have to provide detailed descriptions of each
phase, activity and task required in order to ensure that all
the ETL tasks are carried out as planned and within the time
frame specified. He will also have to identify the ETL project
process dependencies and list the critical project milestones
and deliverables at each stage.
The next step would involve the identification of the required
resources for the project. These could be generic resources
required for each task such as full time or part time staff,
contractors, equipment and material. Each resource type will
have to be quantified and allocated against the activities and
tasks and scheduled for availability in accordance with the
delivery dates and timelines. The purpose of this exercise is
to ensure that each resource is used optimally. The required
ETL tools and data integration tools are acquired installed and
tested for high-availability.
Once the above steps are complete, the manager will proceed to
construct an ETL project schedule. This ETL project schedule
will list the phases, activities, task sequences and tasks. All
key internal and external dependencies will be identified and
defined. Time frames will be allocated for each ETL process
phase and additional contingencies required to mitigate risks
will be incorporated into the plan. Finally resources will be
assigned to the ETL task and the work will begin keeping in
mind the critical milestones of the project. All assumptions
and constraints will be described and inbuilt into the project
process definition for further review and action.
The first proof of concept is created when the data warehouse
architecture and the data warehouse concept is set up and a
small prototype is created for user validation and testing This
validates the ETL process blue print and gets the buy in of the
users and stakeholders of the project. It mitigates the risk of
the project and enables incremental delivery of the concept
until complete production scale is reached and the data is
ready to be loaded into the data warehouse.
However, the task of the ETL manager does not end here. He
needs to ascertain the refresh rate of the data warehouse and
the flow of data that will have to be extracted, staged,
cleaned, transformed and loaded into the warehouse years after
the data warehouse has been in operation. His role never ceases
to be valued and he never becomes redundant.
|