Example 3: Consolidating data in real-time

A company has expanded via mergers and acquisitions. This expansion has resulted in new data continually being added by departments and business units at various geographical locations. This data is inaccessible due to incompatible legacy systems and data schemas.

Product and customer data is stored in heterogeneous databases such as Sybase ASE, Microsoft SQL Server, Oracle, and legacy mainframe systems. Numerous in-house tools are used to extract data from heterogeneous data stores and load data into the central warehouse. Data in the central repository is not synchronized, and this lag in information yields imprecise data for analysis.

Each office sends data changes in batches, which are consolidated into the central data repository. The company processes batch updates two to four times a day, which does not provide real-time data. More frequent updates place burden on the operational system, while less frequent updates increase latency.

In-house client applications are used to access data from the central repository.

The primary requirement in this example is to gain a holistic view of the entire business and integrate data from multiple heterogeneous databases into one central repository. Information in the repository must be constantly updated and synchronized with real-time changes in various locations.

Sybase Replication provides bidirectional, heterogeneous replication and synchronization of operational data across enterprise, client/server, and mobile systems in near real-time.

Figure 2-3: Sybase Replication data flow

This is a Sybase Replication data flow diagram. It illustrates the data flow from heterogeneous source databases to a central repository via Replication Server. The data in the central repository is connected to client applications that require data for their processing. It also illustrates the development and administration capabilities provided by Sybase WorkSpace and Data Services Administrator, of the Data Integration Suite.

Data flow

At each site, the databases that contain customer and product data are identified. The central repository is initially loaded using tools such as Sybase ETL or other Sybase-certified ETL products. This information is then kept up to date using replication, explained in detail below.

NoteSybase ETL provides Extract, Transform, and Load capabilities and is available outside of the DI Suite. For more information about this product, go to Data Integration Suite.

Transactions from heterogeneous databases are received through the Replication Agents connected to the Replication Server. These replication agents read transaction log files from various databases and pass the transactions to the Replication Server at each geographical site.

NoteFor mainframe systems, the Replication Agent for DB2 UDB for OS/390 reads the various DB2 UDB logs, extracts transactional data, and passes it to the relevant Replication Server at that site. Replication Agent for DB2 UDB for OS/390 is an option, which you need to purchase separately.

Transactions from the Replication Servers at each site are distributed to a central Replication Server. The central Replication Server connects to a repository that stores the consolidated information using DirectConnect. The central repository is now the data warehouse for the company. It is loaded with complete product and customer data and is in synchronization with the remote databases.

Any existing in-house or external client application can access this central repository and perform tasks on the consolidated data.

Development

Use Sybase WorkSpace to set up the heterogeneous replication system. It provides clients using databases in the replication system with local data access, thereby reducing load on the network and centralized computer systems. After installing and configuring Replication Server, you can connect to and manage the replication system with replication definitions, publications, articles, and subscriptions.

Administration

Use Data Services Administrator (DSA) to administer the Sybase Replication component through GUI-based server managers accessible via Sybase Central plug-ins. DSA enables you to manage and monitor all distributed components of the replication environment from a single site.