Data Integration (Looking for Ideas)

I work for a transportation and logistics company based in Atlanta. Several years ago, I was involved in the design and development of a Data Warehouse (DW). The data warehouse was to become the data integration point which would serve as a global repository of the company’s customer and revenue data. It was built primarily to support Sales Compensation, but over time it has evolved to become the company’s main data source for managerial reporting systems. To date, there are about 30 reporting systems that rely on data sourced from the Data Warehouse.

The team that supports the ETL (Extract, Transform, Load) processes that load and integrate data into the DW spends about 50% of its time resolving data integrity issues, which occur frequently given the disparate and inconsistent data sources. The DW reflects the fragmented and semi-automated nature of our financial systems as they exist today. The company has aggressively pursued inorganic growth by acquiring smaller logistics and supply chain companies globally. As a result, we now have 25+ financial systems (some manual) that load data into the DW.

We receive about 220 files in various formats through different transmission modes: from emailed Excel sheets and CSV files, to direct database-to-database connections and FTP'd flat files. Going forward, as we rationalize our billing systems and migrate them into strategic applications, the number of data feeds should decrease, and their quality and consistency should improve. But that’s in the future…

In the meantime, I am interested in hearing your ideas in the following areas:
1) Methodologies and technologies for integrating data from disparate data sources
2) Methodologies and technologies for ensuring data quality

4 comments:

Dmitriy said...

Have them fill out a template (Excel as a basic one) and send it to you. Putting a mandate in place that says that unless it's in the mandated format, it won't get processed and you won't get paid (or some other punishment) would get them in line very quickly. That's my non-technical business approach to this.

Also, a finance app that lives on a website, which your partners can use to upload or manually enter their data, can be used by you as a single source to pull from.
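
A minimal sketch of how the mandated-template idea could be enforced at intake, assuming the feeds arrive as CSV text. The column names in REQUIRED_COLUMNS and the function name validate_feed are hypothetical and chosen only for illustration; the post does not say what the real template would contain.

```python
import csv
import io

# Hypothetical mandated template: these column names are assumptions.
REQUIRED_COLUMNS = ["customer_id", "invoice_date", "currency", "revenue"]


def validate_feed(text):
    """Return a list of problems found in a CSV feed; an empty list means it passes."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    # Reject the whole file if it does not follow the template header.
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Every mandated column must have a non-blank value.
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")

        # Revenue, when present, must parse as a number.
        revenue = (row["revenue"] or "").strip()
        if revenue:
            try:
                float(revenue)
            except ValueError:
                problems.append(f"line {line_no}: revenue is not a number")
    return problems
```

A feed that returns a non-empty list could be bounced back to the sender with the list attached, instead of being loaded and repaired later by the ETL team.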

Max said...

Another way would be to not necessarily force your own template on them, but let them come up with and control the template. At the same time, charge either their IT staff or the central IT staff with creating a mapping for importing the data. This minimizes their resentment about compliance, since they had a say in the data template, while you would still get the necessary data into the central system.
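
The mapping approach could look roughly like the sketch below: each source keeps its own column names, and a small per-source table translates them into the central field names. The source names, column names, and the to_central function are all hypothetical, used only to illustrate the idea.

```python
# Hypothetical per-source mappings: source column name -> central DW field name.
SOURCE_MAPPINGS = {
    "acquired_co_a": {"CustNo": "customer_id", "Amt": "revenue", "Curr": "currency"},
    "acquired_co_b": {"client": "customer_id", "net_revenue": "revenue", "ccy": "currency"},
}


def to_central(source, record):
    """Rename one source record's columns to the central field names."""
    mapping = SOURCE_MAPPINGS[source]

    # Fail loudly on columns nobody has mapped yet, rather than dropping them silently.
    unmapped = sorted(set(record) - set(mapping))
    if unmapped:
        raise ValueError(f"{source}: unmapped columns {unmapped}")

    return {mapping[name]: value for name, value in record.items()}
```

Keeping the mappings as plain data means a source team can change its template and only the corresponding entry needs to be updated, not the loading code.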

Jack G. Zheng said...

Thanks for an excellent case for integration. This again proves that technology does not solve all problems. I believe in organizational policies that will make things better. I agree with Dmitriy that some degree of organization-wide policy should be enforced. In this case, the solution may depend more on the business side than on the technical side.

phillc said...

To me, it sounds like hell that you have to receive so many different data types. I think if I were in that spot, I would hire a bunch of college interns to deal with it and get it to me standardized, and then I would do the cool stuff with it.
