A company has expanded its business via mergers and acquisitions and now has data from many business units and acquired companies. There is no central repository that consolidates and integrates data to provide a complete view of the business. The lack of data integration has resulted in users writing and running individual reports and consolidating this information to analyze trends.
At various locations, customer and sales data is stored in relational databases, enterprise package applications, and legacy systems. Some custom reporting programs are used to extract data from relational databases and legacy systems.
Different reports run against different systems, thereby making report generation resource intensive. Multiple reports reduce the ability to correlate data from different systems.
Data is extracted from data sources and cleaned. The cleaned data is then loaded into the warehouse using an ETL tool. However, there still exists problems such as:
Individual data stores are of poor quality, and include duplicate entries, orphaned data, spelling variations, and data entry errors.
Data is not consolidated well enough to support ad-hoc analysis.
The main requirement is to consolidate business data into one central repository. Then, consolidated data must be cleaned, aggregated, and restructured in the central repository. Data in the central repository must enable reporting tools to generate ad-hoc reports on the consolidated information for analyzing trends.
Figure 2-5: Sybase ETL data flow
The data stores that contain the business data are identified. Sybase ETL extracts data from individual data stores and loads it into a central repository. The raw consolidated data in the central repository is cleaned, aggregated, and restructured and made ready for access. You can use reporting tools on this central repository to generate ad-hoc reports and trend analyses.
Sybase ETL Development provides GUI tools for administration and development of Sybase ETL.