Title: Informatica - An ETL Tool | Informatica Tutorial
Informatica ETL Tool and Its Features
Informatica Tutorial - Mindmajix
The overview of Informatica is explained in the previous article, Informatica PowerCenter. Informatica relies on the ETL concept, where ETL stands for Extract-Transform-Load.
What is ETL? It is a data warehousing concept in which data is extracted from numerous different databases.
Who invented ETL? Ab Initio, a multinational software company based in Lexington, Massachusetts, United States, developed GUI-based parallel-processing ETL software. The other historic milestones in the ETL journey are briefly covered here.
Where do Informatica ETL concepts apply in real-time business? Informatica is a company that offers data integration products for ETL, data masking, data quality, data replication, data virtualization, master data management, etc. Informatica ETL is the most commonly used data integration tool for connecting to and fetching data from different data sources.
Some of the typical use cases for this software are:
- An organization migrating from an existing software system to a new database system.
- Setting up a data warehouse in an organization, where data is moved from the production/data-gathering systems to the warehouse.
- Data cleansing, where corrupt or inaccurate records are detected, corrected, or removed from a database.
How is the Informatica ETL tool implemented?
1. Extract
The data is extracted from different data sources. Common data-source formats include relational databases, XML, flat files, Information Management System (IMS), and other data structures. An instant data validation is performed to confirm whether the data pulled from the sources has the correct values in a given domain.
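The Extract step above can be sketched in Python. This is a minimal illustration, not Informatica's actual implementation: it pulls rows from two hypothetical sources (a SQLite table and a CSV flat file) and performs an instant validation on each row before passing it on. The table name `orders` and file name are assumptions for the example.

```python
import csv
import sqlite3

def validate(row):
    """Instant validation: confirm the value falls in the expected domain."""
    return row["quantity"] is not None and int(row["quantity"]) >= 0

def extract_from_csv(path):
    """Extract rows from a flat-file source, keeping only valid rows."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if validate(row):
                yield row

def extract_from_db(conn):
    """Extract rows from a relational source, keeping only valid rows."""
    cur = conn.execute("SELECT id, quantity FROM orders")
    cols = [d[0] for d in cur.description]
    for values in cur:
        row = dict(zip(cols, values))
        if validate(row):
            yield row
```

Both extractors yield plain dicts, so downstream transformation code does not need to know which source a row came from.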
2. Transform
A set of rules or logical functions, such as data cleaning, is applied to the extracted data in order to prepare it for loading into a target data source. Cleaning the data means passing only the "proper" data into the target source. There are many transformation types that can be applied to the data as per the business need. Some of them are column- or row-based transformations, coded and calculated values, key-based transformations, joins across different data sources, etc.
3. Load
The data is simply loaded into the target data source.
All three phases can execute in parallel, without waiting for the others to complete or begin.
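The Transform and Load steps can be sketched in the same style, assuming rows are plain dicts. This example applies one cleaning rule (drop rows with a missing price), one calculated value (a `total` column), and then loads the result into a hypothetical `sales` target table.

```python
import sqlite3

def transform(rows):
    """Apply cleaning and calculated-value transformations."""
    for row in rows:
        if row.get("price") is None:   # cleaning: pass only "proper" data
            continue
        row["total"] = row["price"] * row["quantity"]  # calculated value
        yield row

def load(conn, rows):
    """Load transformed rows into the target data source."""
    data = [(r["id"], r["total"]) for r in rows]
    conn.executemany("INSERT INTO sales (id, total) VALUES (?, ?)", data)
```

Because `transform` is a generator, it can consume rows as the extractor produces them, which is what allows the three phases to run concurrently rather than strictly one after another.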
What are the other features of an ETL tool?
Parallel Processing
ETL is implemented using a concept called parallel processing, i.e., computation executed on multiple processors simultaneously. ETL can work with three types of parallelism:
- Data: splitting a single file into smaller data files.
- Pipeline: allowing several components to run simultaneously on the same data stream.
- Component: running multiple processes simultaneously on different data to do the same job.
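Data parallelism, the first type above, can be sketched with Python's standard thread pool: a single input is split into smaller chunks, and the same transformation runs on each chunk concurrently. The doubling transformation is a stand-in for any real per-row logic.

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(rows, size):
    """Split one input into smaller pieces (data parallelism)."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def transform_chunk(chunk):
    """The same job, applied to different pieces of the data."""
    return [value * 2 for value in chunk]

def parallel_transform(rows, size=2):
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(transform_chunk, chunks(rows, size)))
    return [value for chunk in results for value in chunk]
```

`pool.map` preserves chunk order, so the flattened output matches what a sequential run would produce.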
Data Reuse, Data Re-Run and Data Recovery
Each data row is provided with a row_id and each piece of the process is provided with a run_id, so that the data can be tracked by these ids. Checkpoints are created to mark certain phases of the process as completed; if a run fails, these checkpoints tell us which parts of the task need to be re-run to complete it.
Visual ETL
Advanced ETL tools like PowerCenter, Metadata Messenger, etc. help you produce faster, automated, and highly impactful structured data as per your business needs.
You can drag and drop ready-made database and metadata modules onto a solution, which automatically configures, connects, extracts, transfers, and loads data onto your target system.
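The row_id/run_id bookkeeping described under Data Reuse, Data Re-Run and Data Recovery can be sketched as follows. This is an illustrative model only; the class and function names are invented for the example, not part of any Informatica API.

```python
import itertools

class CheckpointLog:
    """Record which (run_id, phase) pairs have completed."""
    def __init__(self):
        self.done = set()

    def mark(self, run_id, phase):
        self.done.add((run_id, phase))

    def needs_rerun(self, run_id, phase):
        # A phase with no checkpoint must be re-run after a failure.
        return (run_id, phase) not in self.done

def tag_rows(rows, run_id):
    """Attach a row_id and the current run_id to every row for tracking."""
    row_ids = itertools.count(1)
    return [{"row_id": next(row_ids), "run_id": run_id, **row} for row in rows]
```

On recovery, the job checks `needs_rerun` for each phase and restarts only the incomplete ones, instead of repeating work that already reached a checkpoint.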