INTRODUCTION & ARCHITECTURE OF AB INITIO

Ab Initio is Latin for “from the beginning”. Ab Initio software follows a client-server model. The client is the Graphical Development Environment (GDE), which resides on the user's desktop. The server, or back end, is the Co>Operating System, which can reside on a mainframe or a remote UNIX machine. Ab Initio code is called a graph and is saved with the .mp extension. A graph built in the GDE must be deployed as a corresponding .ksh script, and the Co>Operating System runs that .ksh script to do the required job.

Ab Initio Capabilities

Ab Initio, classically an ETL vendor, has gradually emerged as a strong player in application integration, supporting a wide range of enterprise-scale, mission-critical applications that include:

  • Data warehousing (ETL)
  • Real-time analytics
  • Customer relationship management (CRM)
  • Enterprise application integration (EAI)

Ab Initio provides a robust architecture that allows simple, fast, and highly secure integration of systems and applications. It can run on heterogeneous platforms with parallel execution over distributed networks. It can integrate diverse, complex, and continuous data streams ranging in size from several gigabytes to tens of terabytes, supporting both ETL (Extraction, Transformation and Loading) and EAI (Enterprise Application Integration) tasks within a single, consistent framework.

Ab Initio ETL tool architecture

Ab Initio is a business intelligence software suite comprising six data processing products:

  1. Co>Operating System (Co>Op v2.14, 2.15…)
  2. The Component Library
  3. Graphical Development Environment (GDE v1.14, 1.15…)
  4. Enterprise Meta>Environment (EME v3.0…)
  5. Data Profiler
  6. Conduct>It

  1. Co>Operating System:

The Ab Initio Co>Operating System is the foundation for all Ab Initio applications and provides a base for all Ab Initio processes. It runs on a variety of system environments, including AIX, HP-UX, Solaris, Linux, z/OS and Windows.

It provides the following features:

  • Managing and running Ab Initio graphs and controlling ETL processes
  • Providing Ab Initio extensions to the operating system
  • Monitoring and debugging ETL processes
  • Managing metadata by interacting with the EME

  2. The Component Library:

The Ab Initio Component Library is a set of reusable software modules for sorting, data transformation, and high-speed database loading and unloading.

  3. Graphical Development Environment:

The GDE provides an intuitive graphical interface for editing and executing applications. Developers drag and drop components from the library onto a canvas, configure them, and connect them into flowcharts.

Compiling a graph in the GDE generates a UNIX shell script, which can be executed on a machine where the GDE is not installed.
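As a rough illustration of how such a generated script might be called from outside the GDE, the sketch below launches a deployed graph script from Python. The path /apps/etl/run/load_customers.ksh and the function names are hypothetical placeholders, and the sketch assumes a ksh interpreter is available on the server. It is not part of Ab Initio's own tooling, only one way a scheduler or wrapper could invoke the script.

```python
import subprocess
from typing import List


def build_run_command(script_path: str, shell: str = "ksh") -> List[str]:
    """Return the argv used to run a deployed graph script with the given shell."""
    if not script_path.endswith(".ksh"):
        raise ValueError(f"expected a deployed .ksh script, got: {script_path}")
    return [shell, script_path]


def run_deployed_graph(script_path: str, shell: str = "ksh") -> int:
    """Run the script and return its exit code (0 conventionally means success)."""
    completed = subprocess.run(build_run_command(script_path, shell), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical path; replace it with a script deployed from your own GDE.
    print(build_run_command("/apps/etl/run/load_customers.ksh"))
```

Keeping the command construction separate from the execution makes it possible to check what would be run without needing the server environment.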

  4. Enterprise Meta>Environment:

The EME is a repository and environment for storing and managing metadata. It can store both business and technical metadata. EME metadata can be accessed from the Ab Initio GDE, a web browser, or the Ab Initio Co>Op command line.

  5. Data Profiler:

Data Profiler is an analytical application that characterizes the range, scope, distribution, variance, and quality of data. It runs in a graphical environment on top of the Co>Op.

  6. Conduct>It:

Conduct>It is a tool for developing high-volume data processing systems. It enables combining graphs from the GDE with custom scripts and programs from other vendors. Ab Initio provides both graphical and command-line interfaces to Conduct>It.

Application Integration Implementation Using Ab Initio

The following section discusses an application integration process facilitated by Ab Initio. The diagram below describes the general design architecture used to integrate data from disparate sources into an Enterprise Data Warehouse and load it into a target CRM application.

Figure 1: Implementation Architecture

The major challenges in such an implementation are:

  1. Multiple sources – Data arrives from disparate sources, such as mainframe systems and Oracle tables, that use different technologies, data formats, and load frequencies.
  2. Complex business logic – The data must be brought into a common format aligned with the target systems, mapped to generic entities, and cleansed.
  3. Redundancy – Data duplication produces multiple versions of the truth in the source data.

Ab Initio's batch and real-time (continuous flow) execution mechanisms can provide a cost-effective solution. They can also provide a scalable solution that extracts data from distributed and disparate systems, transforms data in multiple formats into a common format, builds the data warehouse, operational data stores, and aggregations and derivations for business intelligence, and loads data into the target systems.

The following schematic diagram shows a solution that implements the above architecture.

Figure 2

In this design, data from different sources is loaded into the data reception area (DRA), both through periodic batch execution and as a near-real-time data flow using MQSeries/AI queues. The DRA can handle data from multiple sources in different formats. The data is then moved to the data staging area, where it is converted to a common format.
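The conversion to a common format in the staging area can be sketched in plain Python. This is an illustration only and not Ab Initio code: the fixed-width layout, the field names, and the source labels "mainframe" and "oracle" are all assumptions made up for the example.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Hypothetical layout of a fixed-width mainframe extract: (field, start, end).
FIXED_WIDTH_LAYOUT = [("customer_id", 0, 6), ("name", 6, 26), ("balance", 26, 36)]


def parse_fixed_width(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Slice each non-empty fixed-width line into named string fields."""
    for line in lines:
        if line.strip():
            yield {name: line[start:end].strip() for name, start, end in FIXED_WIDTH_LAYOUT}


def parse_csv(text: str) -> Iterator[Dict[str, str]]:
    """Read delimited text (for example, a table unload) with a header row."""
    yield from csv.DictReader(io.StringIO(text))


def to_common_format(record: Dict[str, str], source: str) -> Dict[str, object]:
    """Convert a raw record from any source into one shared, typed layout."""
    return {
        "customer_id": int(record["customer_id"]),
        "name": record["name"].strip().title(),
        "balance": round(float(record["balance"]), 2),
        "source": source,
    }


if __name__ == "__main__":
    mainframe_lines = ["000042" + "JOHN SMITH".ljust(20) + "0000123.45"]
    oracle_unload = "customer_id,name,balance\n43,jane doe,99.5\n"

    staged = [to_common_format(r, "mainframe") for r in parse_fixed_width(mainframe_lines)]
    staged += [to_common_format(r, "oracle") for r in parse_csv(oracle_unload)]
    for row in staged:
        print(row)
```

Each source gets its own parser, while a single conversion function defines the common format, so adding a new source only requires a new parser.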

The generic loader and extractor processes are Ab Initio-based applications that can perform a range of functions:

  • Load data into operational data stores
  • Use a metadata-driven rules engine to generate code
  • Provide an API facility for running queries
  • Interface with databases for data extraction and data load
  • Provide delta and before-after images of data
  • Feed target systems, reporting tools, or message queues
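To make the delta and before-after image idea concrete, the sketch below compares two snapshots of a table keyed by primary key and reports inserts, updates, and deletes together with the before and after images of each changed row. It is a generic Python illustration, not Ab Initio's implementation, and the function name, keys, and sample rows are invented for the example.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def compute_delta(before: Dict[str, Row], after: Dict[str, Row]) -> List[dict]:
    """Return one change entry per key that was inserted, updated, or deleted."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        old: Optional[Row] = before.get(key)
        new: Optional[Row] = after.get(key)
        if old is None:
            op = "insert"
        elif new is None:
            op = "delete"
        elif old != new:
            op = "update"
        else:
            continue  # unchanged rows are not part of the delta
        changes.append({"key": key, "op": op, "before": old, "after": new})
    return changes


if __name__ == "__main__":
    yesterday = {"C1": {"city": "Pune"}, "C2": {"city": "Delhi"}, "C3": {"city": "Goa"}}
    today = {"C1": {"city": "Pune"}, "C2": {"city": "Mumbai"}, "C4": {"city": "Agra"}}
    for change in compute_delta(yesterday, today):
        print(change)
```

Downstream consumers can then apply only the changed rows to the target system instead of reloading the full data set.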

Conclusion:

Ab Initio, classically an ETL vendor, has gradually emerged as a strong player in application integration, supporting a wide range of enterprise-scale, mission-critical applications. The Ab Initio products are provided on a user-friendly platform for data processing applications. These applications cover fourth-generation data analysis, batch processing, data manipulation, and GUI-based parallel processing, and are commonly used to extract, transform, and load (ETL) data.
