Showing posts with label Data Integration.

Monday, February 23, 2009

Moving data between various databases

If the complex enterprise world were an automobile, data would be the fuel revving its engine. Enterprise applications either consume or generate data to deliver the functionality for which they are designed.

Enterprises persist data in many different source systems. Most organizations use a standard RDBMS (e.g. MySQL, Oracle, DB2, MSSQL) while some keep it simple by dumping data into spreadsheets, text/CSV files, XML and so on. There is also a considerable amount of data floating around as RSS, HTML, emails and many more such bizarre data sources.

It can get extremely chaotic when such data has to be moved between heterogeneous platforms using traditional data extraction and loading techniques. Data movement requirements typically arise in scenarios such as the following:

  • Database upgrades, e.g. from Oracle 1.x to 1.y
  • Database schema upgrades, i.e. structural changes to fields and tables.
  • Bulk loading of data from multiple and possibly heterogeneous data sources into a target database, e.g. loading data into MySQL partly from DB2 and partly from Oracle
  • Moving data from source to target while transforming the information being moved, e.g. combining the First Name and Last Name fields of the source into a Full Name field in the target database.
  • Reconciliation of information from various dynamic data sources into a data warehouse.
  • Simple data archival requirements.
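The transformation scenario above (First Name and Last Name combined into Full Name) can be sketched in a few lines of Python. This is only a minimal illustration using the standard `sqlite3` module with two in-memory databases standing in for the source and the target; the `customers` table, its columns, the sample rows and the helper names `combine_names` and `run_etl` are all assumptions made up for this example, not part of any product listed in this post.

```python
import sqlite3


def combine_names(first, last):
    """Transform step: join first and last name into a single full name."""
    parts = (first, last)
    return " ".join(p.strip() for p in parts if p and p.strip())


def run_etl(source, target):
    """Extract rows from source, transform them, load them into target."""
    # Extract: read the raw rows from the (hypothetical) source table.
    rows = source.execute(
        "SELECT id, first_name, last_name FROM customers ORDER BY id"
    ).fetchall()
    # Transform: two source fields become one target field.
    transformed = [(cid, combine_names(first, last)) for cid, first, last in rows]
    # Load: write the transformed rows into the target table.
    target.executemany(
        "INSERT INTO customers (id, full_name) VALUES (?, ?)", transformed
    )
    target.commit()
    return len(transformed)


# Two in-memory databases stand in for the source and target systems.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Jerry", "Smith"), (2, " Ann ", "Lee")],  # made-up sample rows
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT)")

loaded = run_etl(source, target)
```

A real ETL tool adds what this sketch leaves out: connectors for many source types, type mapping between databases, error handling and restartable loads.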

ETL (Extract-Transform-Load) is a function of Data Integration that provides the technology for moving data between a variety of such data sources. ETL technology is offered by a variety of vendors today, and each offering is unique in some way in terms of the data sources it supports. Here are some of the prominent ETL products, listed by vendor with the product name in brackets:

  • Ab Initio
  • Adeptia [Data Transformation Server]
  • Altova [Mapforce]
  • Advanced ETL Processor
  • Barracuda Software [Barracuda Integrator]
  • CA [Data Integrator]
  • Corporator [Transformer]
  • CoSORT [CoSORT ETL Tools]
  • Crossflo Systems [Data Exchange]
  • DataHabitat [DH ETL]
  • Djuggler
  • Enhydra [Open Source]
  • Group 1 [Data Flow]
  • IBM [WebSphere DataStage]
  • iSoft [Amadea]
  • Ikan [ETL4ALL]
  • LogiXML [LogiETL]
  • Microsoft [SSIS]
  • Oracle [Oracle Data Integrator(ODI) / Oracle Warehouse Builder(OWB)]
  • Pentaho [Pentaho Data Integration, PDI]
  • Pervasive [Data Junction]
  • Platinum [Info Pump]
  • SAP/Business Objects [Data Integrator & Services]
  • SAS [Data Integration]
  • Sagent Technologies [Sagent Solution]
  • Solonde [Warehouse Workbench]
  • Sybase [Data Integration Suite & Data federation]
  • WisdomForce [FastReader]

There is also a variety of Open Source ETL solutions available.

  • Apatar
  • Benetl
  • kJube
  • QXchange
  • Scriptella [Open Source ETL]
  • SUN Microsystems [SUN Data Integrator]
  • Talend [Talend Open Studio]

Here are some of the other prominent (Open Source) ones - Browse List Here


Sunday, January 4, 2009

Moving from QoS to QoE with Single Customer View (Master Indexed Data)

QoS (Quality of Service) has to be delivered. Organizations worldwide have already recognized and mastered this in order to survive in today's customer centric markets. QoE (Quality of Experience), however, has been maturing fast and is taking over from QoS as a key differentiator.

Organization wide indexing of customer data helps the enterprise deliver QoE by uniquely identifying customers across the various functions of the business. It also allows customer centric collaboration between integrated business models, thus driving QoE along with QoS. This is best understood with the simple story below:

Jerry is a frequent flyer and often needs the dial-a-cab service "Cool-Cabs" to get to the airport. He has to provide his name, address and telephone number along with the time when the cab is required, and the cab is always at his doorstep well in time. Jerry is happy with this service.

Due to the Christmas season rush, all "Cool-Cabs" were booked, so this time Jerry tried calling a similar cab service named "Super-cool-cabs". On placing the call from his mobile, Jerry was identified and greeted by his first name, his address was promptly identified and confirmed, and the cab was booked in less than 30 seconds. Jerry was awestruck when the cab arrived with a morning copy of his favorite newspaper with a personalized welcome note stuck on it.

Later, on inquiring how this was made possible, Jerry learns that the "Super-cool-cabs" service is run by the same "Baron-business-group" that owns a popular book store in the city where he usually picks up a copy of this daily newspaper. Baron maintains a master index of its customers and leverages it across all of its customer centric businesses.

"Super-cool-cabs" service turned out to be as efficient as "Cool-Cabs" service and Jerry, a new customer, was WON!

SUN Master Index Studio provides the capability to create any domain specific master index through the matching, de-duplication, merging, and cleansing of data from various data sources.

Project Mural is an open source MDM (Master Data Management) solution from SUN.
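To make the idea of a master index a little more concrete, here is a toy Python sketch of the matching, de-duplication and merging steps mentioned above. It is not how SUN Master Index Studio or Project Mural works internally: it simply assumes that a normalized phone number is a good enough match key and that the first non-empty value of a field wins when records are merged. The record layout, the sample values and the function names are all invented for illustration.

```python
import re


def normalize_phone(phone):
    """Matching key: keep digits only so differently formatted numbers match."""
    return re.sub(r"\D", "", phone)


def build_master_index(records):
    """Group records by normalized phone and merge them into one master record.

    Merge rule (an assumption of this sketch): the first non-empty value
    seen for a field is kept.
    """
    index = {}
    for record in records:
        key = normalize_phone(record["phone"])
        master = index.setdefault(key, {"phone": key, "sources": []})
        master["sources"].append(record["source"])
        for field in ("name", "address", "newspaper"):
            value = record.get(field, "").strip()
            if value and not master.get(field):
                master[field] = value
    return index


# Made-up records from two businesses of the same group.
records = [
    {"source": "bookstore", "name": "Jerry", "phone": "555-0100",
     "address": "12 Elm St", "newspaper": "Daily Times"},
    {"source": "cabs", "name": "jerry", "phone": "(555) 0100", "address": ""},
    {"source": "cabs", "name": "Tom", "phone": "555-0199", "address": "3 Oak Ave"},
]

index = build_master_index(records)
```

With such an index, the cab service can look up a caller by phone number and find the address and favorite newspaper that were originally captured by the book store. Real MDM products use far more robust matching than a single exact key.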


Monday, December 8, 2008

Demystifying Enhanced Data Integrator with some Demos

The Enhanced SUN Data Integrator lays the foundation for a platform that facilitates data movement between virtually any of a wide variety of data sources and targets.

Typically, the Enhanced Data Integrator works in three different modes:

  1. Basic ETL Mode : Data movement from a relational source to a relational target database.
  2. Advanced ETL Mode : Data movement from any data source (File and JDBC) to a relational data target. This mode also supports generating a staging database for injecting data into SUN MDM projects (Project Mural).
  3. Bulk Loader Mode : Supports loading data from delimited files whose structure is replicated in the relational target. This is typically useful for loading indexed data (from Project Mural) into its target schema/database.
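As a rough illustration of what bulk loading a delimited file into a matching relational target involves, here is a small Python sketch. It does not use or reproduce the SUN Data Integrator itself; it relies only on the standard `csv` and `sqlite3` modules, and the `bulk_load` function, the `customers` table and the sample data are assumptions made for this example. The sketch assumes the file header names the target columns exactly, and that table and column names come from a trusted source since they are interpolated into the SQL statement.

```python
import csv
import io
import sqlite3


def bulk_load(delimited_text, connection, table, delimiter=","):
    """Load delimited rows into a table whose columns match the file header."""
    reader = csv.reader(io.StringIO(delimited_text), delimiter=delimiter)
    header = next(reader)  # first line names the target columns
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip empty lines
    # Table and column names are trusted input in this sketch.
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    with connection:  # one transaction for the whole batch
        connection.executemany(statement, rows)
    return len(rows)


# Made-up pipe-delimited data standing in for an exported file.
data = "id|full_name\n1|Jerry Smith\n2|Ann Lee\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, full_name TEXT)")
count = bulk_load(data, conn, "customers", delimiter="|")
```

Loading the whole batch in a single transaction is what makes this kind of load fast compared to committing row by row.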

To get a flavor of this, here are a couple of videos from the Enhanced SUN Data Integrator bean bag that demo the basic and advanced data integrator modes:

SUN Data Integrator - Basic ETL

 

SUN Data Integrator - Advanced ETL

Enhanced SUN Data Integrator projects can be created with Glassfish ESB RC1 builds under the SOA category. You can download the installer here


Friday, November 28, 2008

Get your data dancing to your tunes with SUN Data Integrator (ETL)

SUN Data Integrator (SUN ETL) is getting ready to be even more user friendly with its revamped wizard experience in the NetBeans IDE.

The new experience will let users craft ETL (Extract-Transform-Load) collaborations from data residing in disparate data sources such as flat files, CSV files, spreadsheets (XLS) and relational database tables. It will also support sourcing data from RSS feeds and some simple HTML web tables. Users will be able to craft ETL collaborations by selecting a combination of such data sources and moving the source data to a target database.

A wide variety of relational databases is supported by SUN Data Integrator; some of the common ones are MySQL, Microsoft SQL Server, Oracle, Derby and PostgreSQL.

SUN Data Integrator can run as a JBI service (JSR 208 compliant) on the Glassfish Application Server, thus enabling Data Integration/ETL as a SOA enabled service. It is also one of the key components of Project Mural (SUN MDM Suite), where it facilitates raw data loading from source systems into the MDM infrastructure. It is also used for loading indexed data (from an MDM project) into its target database platform.

The Enhanced ETL projects also support generating portable, reusable ETL packages which can be run from the command line for simultaneous data loading from various distributed host machines into a specified target database.

Here is what the integrated story would look like:

[Image: New_DI_Overview]

Stay tuned to find out more on this topic.
