Monday, February 23, 2009

Moving data between various databases

If the complex enterprise world was synonymous with an automobile, data can be considered as the fuel revving its engines. Enterprise applications either consume or generate the data to deliver the functionality for which they are designed.ETL_Basic

Enterprises persist data into various different source systems. Most of the organizations use standard RDBMS (i.e. MySQL, Oracle, DB2, MSSQL etc.) while some of them keep it simple by dumping data into spreadsheets, text files/csv, xml etc. There is also a considerable amount of data floating around as RSS, HTML and Emails and many more such bizarre data sources.

It can get extremely chaotic if there is a requirement to move such data between heterogeneous platforms using traditional data extraction and loading techniques. Data movement requirements can be envisaged with following few scenarios :

  • Database upgrades i.e. from oracle 1.x to 1.y
  • Database schema upgrades i.e. structural changes w.r.t. fields and tables.
  • Bulk loading of data from multiple and possibly heterogeneous data sources into a target database i.e. loading data into MySQL partly from DB2 and partly from Oracle
  • Moving data from source to target while transforming information being moved i.e.  First name and Last name fields being combined from source to Full Name field in target database.
  • Reconciliation of information from various dynamic data sources into a data warehouse.
  • Simple data archival requirements.

ETL (Extract-Transform-Load) is a function of Data Integration that provides the technology for moving data between variety of such data sources. ETL technology is being provided by variety of vendors today. Each offering has some or the other uniqueness in terms of  data sources it supports. Here are some of the prominent ETL products listed against vendors providing it :

  • Ab Initio
  • Adeptia [Data Tranaformation Server]
  • Altova [Mapforce]
  • Advanced ETL Processor
  • Barracuda Software [Barracuda Integrator]
  • CA [Data Integrator]
  • Corporator [Transformer]
  • CoSORT [CoSORT ETL Tools]
  • Crossflo Systems [Data Exchance]
  • DataHabitat [DH ETL]
  • Djuggler
  • Enhydra [Open Source]
  • Group 1 [Data Flow]
  • IBM [WebSphere DataStage]
  • iSoft [Amadea]
  • Ikan [ETL4ALL]
  • LogiXML [LogiETL]
  • Microsoft [SSIS]
  • Oracle [Oracle Data Integrator(ODI) / Oracle Warehouse Builder(OWB)]
  • Pentaho [Pentaho Data Integration, PDI]
  • Pervasive [Data Junction]
  • Platinum [Info Pump]
  • SAP/Business Objects [Data Integrator & Services]
  • SAS [Data Integration]
  • Sagent Technologies [Sagent Solution]
  • Solonde [Warehouse Workbench]
  • Sybase [Data Integration Suite & Data federation]
  • WisdomForce [FastReader]

Also, there are variety of Open Source ETL solutions available.

  • Apatar
  • Benetl
  • kJube
  • QXchange
  • Scriptella [Open Source ETL]
  • SUN Microsystems [SUN Data Integrator]
  • Teland [Talend Open Studio]

Here are some of the other prominent (Open Source) once - Browse List Here

Sphere: Related Content