I work for a transportation and logistics company based in Atlanta. Several years ago, I was involved in the design and development of a Data Warehouse (DW). The warehouse was intended to be the data integration point that would serve as a global repository of the company’s customer and revenue data. It was built primarily to support Sales Compensation, but over time it has evolved into the company’s main data source for managerial reporting systems. To date, about 30 reporting systems rely on data sourced from the Data Warehouse.
The team that supports the ETL (Extract, Transform, Load) processes that load and integrate data in the DW spends about 50% of its time resolving data integrity issues, which occur frequently given the disparate and inconsistent data sources. The DW reflects the fragmented and semi-automated nature of our financial systems as they exist today. The company has aggressively pursued inorganic growth by acquiring smaller logistics and supply chain companies globally. As a result, we now have 25+ financial systems (some manual) that load data into the DW.
We receive about 220 files in various formats through different transmission modes: from emailed Excel sheets and CSV files, to direct database-to-database connections, to FTP'd flat files. Going forward, as we rationalize our billing systems and migrate them onto strategic applications, the number of data feeds should decrease, and their quality and consistency should improve. But that’s in the future…
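To make the integration challenge concrete, here is a minimal sketch, in Python with pandas (not necessarily the tooling our ETL team actually uses), of what it takes to normalize two hypothetical feeds, an emailed Excel sheet and an FTP'd pipe-delimited flat file, into one common staging layout. Every file name, column name, and source-system label below is invented for illustration.

```python
import pandas as pd

# Common staging layout that every inbound feed is mapped into
STAGING_COLUMNS = ["source_system", "customer_id", "invoice_date", "revenue_usd"]

def load_excel_feed(path: str) -> pd.DataFrame:
    """Hypothetical emailed Excel feed with columns 'CustNo', 'InvDate', 'Amount'."""
    raw = pd.read_excel(path)
    return pd.DataFrame({
        "source_system": "acquired_co_a",  # invented label
        "customer_id": raw["CustNo"].astype(str).str.strip(),
        "invoice_date": pd.to_datetime(raw["InvDate"], errors="coerce"),
        "revenue_usd": pd.to_numeric(raw["Amount"], errors="coerce"),
    })

def load_flat_file_feed(path: str) -> pd.DataFrame:
    """Hypothetical FTP'd pipe-delimited feed with columns 'cust', 'dt', 'rev'."""
    raw = pd.read_csv(path, sep="|")
    return pd.DataFrame({
        "source_system": "acquired_co_b",  # invented label
        "customer_id": raw["cust"].astype(str).str.strip(),
        "invoice_date": pd.to_datetime(raw["dt"], format="%Y%m%d", errors="coerce"),
        "revenue_usd": pd.to_numeric(raw["rev"], errors="coerce"),
    })

def build_staging(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Union all normalized feeds so downstream jobs see a single schema."""
    return pd.concat(frames, ignore_index=True)[STAGING_COLUMNS]
```

The appeal of the adapter-per-feed shape is that each new acquisition costs only one small mapping function; everything downstream of the staging layer sees a single schema regardless of how the data arrived.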
In the meantime, I am interested in hearing your ideas in the following areas:
1) Methodologies and technologies for integrating data from disparate data sources
2) Methodologies and technologies for ensuring data quality (a sketch of the kind of checks I have in mind follows below)
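On point 2, here is a minimal sketch of the sort of row-level quality gate I have in mind, reusing the invented staging columns from the earlier sketch; a real implementation would add referential checks against dimension tables, duplicate-key detection, and reconciliation against source control totals.

```python
import pandas as pd

def quality_report(staged: pd.DataFrame) -> dict:
    """Basic data-quality metrics for one staged feed.
    Column names match the hypothetical staging layout above."""
    return {
        "row_count": len(staged),
        "missing_customer_id": int(staged["customer_id"].isna().sum()),
        "unparseable_dates": int(staged["invoice_date"].isna().sum()),
        "negative_revenue": int((staged["revenue_usd"] < 0).sum()),
        "duplicate_rows": int(staged.duplicated().sum()),
    }

def passes_gate(staged: pd.DataFrame, max_bad_ratio: float = 0.01) -> bool:
    """Reject a feed when more than max_bad_ratio of its rows fail a hard check."""
    report = quality_report(staged)
    bad_rows = report["missing_customer_id"] + report["unparseable_dates"]
    return report["row_count"] > 0 and bad_rows / report["row_count"] <= max_bad_ratio
```

Whether to gate feeds like this before they land, or to reconcile after the fact, is exactly the sort of trade-off I would like to hear opinions on.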