Wednesday, 10 September 2014

Basics of ETL and Informatica


Basics of ETL and Informatica

Posted by Koti Reddy on September 10, 2014
Thought to provide some basics of ETL and Informatica PowerCenter 8.6
What is ETL?
ETL stands for Extraction, Transformation and Load. It is the process of transferring heterogeneous data from different sources to target system called as data warehouse as per the business requirements for different purpose like management reporting, data analysis etc. ETL Tools provide facility to extract data from different non-coherent systems, cleanse it, merge it and load into target systems.
Informatica is one of the powerful ETL tool, easy to work which supports all the steps of Extraction, Transformation and Load process. It has got a simple visual interface like forms in visual basic. You just need to drag and drop different objects (known as transformations) and design process flow for Data extraction transformation and load.
Main Components of Informatica Power Center
1. Repository:
Repository is the heart of Informatica tool. Repository is a kind of data inventory where all the data related to mappings, sources, targets etc is kept. All the client tools and Informatica Server fetch data from Repository.
2. Informatica PowerCenter Server:
All the executions take place in server, it makes physical connections to sources/targets, fetches data, applies the transformations mentioned in the mapping and loads the data in the target system.
3. Informatica PowerCenter Client Tools:
These tools enable a developer to define transformation process, known as mapping in Designer, define run-time properties for a mapping, known as sessions in Workflow Manager, monitor execution of sessions in Workflow Monitor and manage repository, useful for administrators in Repository Manager.
Clients Tools
Administration Console
It’s a web based client tool which is responsible for creating and managing services, manipulate services, manages to create users/groups and privileges. It creates repositories, services and initialization of the services.
It creates and maintains two important services 1. Repository Services – Interacts with Repository and 2. Integration Services – Main services which used to execute the process through repository services.
Repository Manager
It is responsible for creating the folders to manage the Meta data used by different users.
Designer
It is responsible for defining the flows between source system and target systems through transformation logics called as mappings. There are five components of Designer
1. Source Analyzer: Defines the source definition through ODBC connection.
2. Target Designer: Defines the target definition through ODBC connection
3. Transformation Developer: Transformations help to transform the source data according to the requirements of target system. Sorting, Filtering, Aggregation, Joining are some of the examples of transformation.
4. Mapplet Designer: A Mapplet is a reusable object that you create in the Mapplet Designer. It contains a set of transformations and lets you reuse the transformation logic in multiple mappings.
5. Mappings Designer: Mappings are developed in mappings designer. A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation.
Workflow Manager
Workflow helps to load the data from source to target in a sequential manner. A workflow is a set of instructions that tells the Integration service how to run tasks such as sessions, email notifications, and shell commands. After you create tasks in the Task Developer and Workflow Designer, you connect the tasks with links to create a workflow. There are three components of workflow manager
1. Task Developer: Defines the tasks like sessions, email, command. Session is a object of an repository which sets instructions to the server how to execute by carrying the mapping definitions and production connections.
2. Worklet Designer: It’s an object which carries set of sessions for execution in a serial or parallel flow.
3. Workflow Designer: it’s an object of repository which sets instructions to the server, when to execute the workflow.
Workflow Monitor
This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.
Simple steps to start with Informatica 8.6
1. Install the Informatica
2. Create a Repository in Admin Console
3. Create Integration Services in Admin Console
4. Connect to Repository Manager (First time, add the repository created and connect it, create the folder)
5. Work on the Designer to create Mappings
6. Work on the Workflow Manager to create workflow
7. Work on the Workflow Monitor to track the changes
- Koti Reddy

No comments:

Post a Comment