News & blog

Read the latest news and blogs from The Data Lab and Scotland’s data science community.

Forget Blue Sky Thinking, Where Data Quality is Concerned it’s all about “Life Cycle Thinking”

Guest blog post by Brian Rutherford, Director at Eyecademy.

With corporate data growing at over 40% each year and up to 25% of that data corrupted, there are teams sitting in meeting rooms up and down the country saying: “We know we have a data quality problem, but we don’t know what to do about it.”

“Begin at the beginning,” the King said, very gravely, “and go on till you come to the end: then stop.” – Lewis Carroll, Alice’s Adventures in Wonderland

Where to start? It’s an obvious enough question, but the answer can be more elusive. You know where the issues are. For example, when examining a customer account you discover duplicate records, misassigned balances and misapplied credit ratings. The knock-on effects are dissatisfied customers and damage to your reputation. Root cause analysis is needed, but your data goes through so many stages before it reaches end users that getting to the root of the problem looks as though it will demand a large investment of time and budget.

One approach that helps bring clarity to the problem is ‘Life Cycle Thinking’, which refocuses your analysis from the data itself to the data’s journey through your organisation and the departments and functions that interact with it. Life Cycle Thinking segments activities so that you can identify what is happening and at which stage of the life cycle it takes place. The data life cycle has six stages, although it’s likely that only a few functions will encounter all of them.

  • Plan: Prepare for the data.
  • Obtain: Acquire the data.
  • Store & Share: Hold the data electronically or in hardcopy and share it through some kind of distribution method.
  • Maintain: Ensure the data continues to work properly.
  • Apply: Use the data to accomplish your goals.
  • Delete: Discard the data that is no longer in use.

A Reusable Resource

Data quality is affected by activities in every phase of the lifecycle. Every stage of the Data Lifecycle has a cost, but only when you Apply the data do you get value from it. This means that when you Apply bad data, the impact will be either reduced or negative. Not only that: since data is a reusable resource, that negative impact can be felt again and again. Given that information increases in value the more you use it, the converse is also true: the more you use bad data, the greater the negative effect.

So Far, So Gloomy

The good news is that by applying Data Life Cycle Thinking you can identify the activities that affect data quality and make the start that has so far eluded you. Suppose, for example, that you are responsible for customer credit data in a large financial institution. The head of Credit Decision Monitoring is concerned about the quality of the customer credit data that supports his department. As he describes the organisation to you, you need to consider the teams involved in each area. Which teams have input into planning the customer information? Which obtain the data? Who uses, or applies, the customer credit information? Who maintains the data, and who can dispose of it? At a high level you might end up with the diagram below.

A quick 10-minute conversation reveals that the Electronic Lending Platform (ELP) team and the IT team both Obtain information, in different ways, and both Maintain it. This makes sense: the ELP team often obtains data directly from customers through telephone conversations and occasionally face-to-face meetings, while IT obtains its data from a large customer information data warehouse. Therefore, to avoid duplicate customer records, there needs to be a process for identifying existing customers when new customers apply through the ELP.
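The existing-customer check described above could start as simply as comparing a normalised key before a new record is created. The Python below is a minimal sketch under stated assumptions, not the process used in the example: the `Customer` record, its fields, and the choice of name, date of birth and postcode as the matching key are all hypothetical. Real customer matching usually needs fuzzier comparison than exact key equality.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical customer record; the fields are illustrative assumptions.
@dataclass(frozen=True)
class Customer:
    customer_number: str
    name: str
    date_of_birth: str  # assumed ISO format, e.g. "1980-04-12"
    postcode: str


def match_key(name: str, date_of_birth: str, postcode: str) -> tuple[str, str, str]:
    """Normalise case and spacing so trivially different entries produce the same key."""
    norm_name = " ".join(name.lower().split())
    norm_postcode = postcode.replace(" ", "").upper()
    return (norm_name, date_of_birth.strip(), norm_postcode)


def find_existing(existing: list[Customer], name: str, date_of_birth: str,
                  postcode: str) -> Customer | None:
    """Return the first existing customer whose key matches the new application, if any."""
    key = match_key(name, date_of_birth, postcode)
    for customer in existing:
        if match_key(customer.name, customer.date_of_birth, customer.postcode) == key:
            return customer
    return None


# Usage: an application keyed in slightly differently still finds the existing record.
warehouse = [Customer("C0001", "Jane Smith", "1980-04-12", "EH1 1AB")]
match = find_existing(warehouse, "jane  SMITH", "1980-04-12", "eh1 1ab")
print(match.customer_number if match else "new customer")  # C0001
```

Only when `find_existing` returns `None` would the ELP team go on to create a new customer number.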

Do all of the teams Obtaining data for your organisation receive the same data entry training? Do they work to the same set of data entry standards?

If the answer is no, then you have a data quality problem; you just don’t know how big it is or which pieces of data are most affected. To return to our finance example, the ELP team Obtains, Maintains and Applies the data but is not involved in the Plan stage. This suggests that important requirements are being missed, which could be affecting data quality. The ELP and IT teams are both able to Maintain and Dispose of customer records. Unless both teams strictly follow clear guidelines, then once again you almost certainly have a data quality problem.
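The reasoning in the last two paragraphs can be made mechanical by recording which team performs which stage and looking for two warning signs: stages shared by several teams, which need common training and standards, and teams that work with the data without taking part in Plan. The Python sketch below encodes only the ELP and IT stages named in the text. Because the text does not say whether IT takes part in Plan, the mapping assumes it does not, so treat the output as questions to ask rather than findings. The function names are illustrative.

```python
from __future__ import annotations

# The six life cycle stages, in order.
STAGES = ("Plan", "Obtain", "Store & Share", "Maintain", "Apply", "Delete")

# Team-to-stage mapping from the finance example. IT's involvement in Plan
# is not stated in the text and is assumed absent here.
TEAM_STAGES: dict[str, set[str]] = {
    "ELP": {"Obtain", "Maintain", "Apply", "Delete"},
    "IT": {"Obtain", "Maintain", "Delete"},
}


def shared_stages(team_stages: dict[str, set[str]]) -> dict[str, list[str]]:
    """Stages carried out by more than one team: each needs shared standards and training."""
    shared = {}
    for stage in STAGES:
        teams = sorted(team for team, stages in team_stages.items() if stage in stages)
        if len(teams) > 1:
            shared[stage] = teams
    return shared


def missing_from_planning(team_stages: dict[str, set[str]]) -> list[str]:
    """Teams that Obtain, Maintain or Apply data but have no input into Plan."""
    hands_on = {"Obtain", "Maintain", "Apply"}
    return sorted(team for team, stages in team_stages.items()
                  if stages & hands_on and "Plan" not in stages)


# Catch typos in stage names before they hide a gap.
unknown = {stage for stages in TEAM_STAGES.values() for stage in stages} - set(STAGES)
if unknown:
    raise ValueError(f"Unknown stages: {unknown}")

print(shared_stages(TEAM_STAGES))
# {'Obtain': ['ELP', 'IT'], 'Maintain': ['ELP', 'IT'], 'Delete': ['ELP', 'IT']}
print(missing_from_planning(TEAM_STAGES))
# ['ELP', 'IT']
```

Adding a team, or recording that a team does take part in Plan, changes the output immediately, so the mapping is easy to revise as the conversation uncovers more detail.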

Are all of the needs of those entering data being met? For example, during a data project for a large financial institution, I discovered that one team was able to create customer numbers manually and would often add extra letters to the end of the customer number to indicate whether the customer was retail or non-retail. The result was non-standard customer numbers and a mountain to climb to link them to customer data elsewhere. In our example the ELP team might be doing something very similar. If so, we have discovered yet another data quality problem.
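A simple validation step at the point of entry would catch suffixed customer numbers like these. The sketch below assumes, purely for illustration, that a standard customer number is exactly eight digits and that the ad hoc suffix is one or more letters; the real format is not given in the text. Splitting the suffix out, rather than rejecting the record outright, keeps the base number linkable while the retail or non-retail marker can move into a field of its own.

```python
from __future__ import annotations

import re

# Hypothetical standard format: exactly eight digits.
STANDARD = re.compile(r"[0-9]{8}")
# Hypothetical non-standard variant: the eight digits plus hand-added letters.
SUFFIXED = re.compile(r"([0-9]{8})([A-Za-z]+)")


def check_customer_number(raw: str) -> tuple[str, str | None, str]:
    """Return (base_number, suffix, status); status is 'ok', 'suffixed' or 'invalid'."""
    value = raw.strip()
    if STANDARD.fullmatch(value):
        return value, None, "ok"
    match = SUFFIXED.fullmatch(value)
    if match:
        # Keep the digits as the linkable key and report the suffix separately.
        return match.group(1), match.group(2).upper(), "suffixed"
    return value, None, "invalid"


for raw in ["12345678", "12345678R", "12345678nr", "1234-5678"]:
    print(raw, check_customer_number(raw))
# 12345678 ('12345678', None, 'ok')
# 12345678R ('12345678', 'R', 'suffixed')
# 12345678nr ('12345678', 'NR', 'suffixed')
# 1234-5678 ('1234-5678', None, 'invalid')
```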

The process of mapping lifecycle thinking to the business can be as high level or as detailed as you want. Although I’ve used only a finance example here, the process can be applied to many areas of your business data, such as mapping where your company interacts with customers, especially if your organisation deals with customers both online and offline. Your business might communicate with a customer in several ways, store customer data in several places and apply it in various ways. Some of those communications might even involve re-contacting the customer. Applying the Data Lifecycle method by ‘tagging’ the different customer interfaces can provide a clear, structured way to identify where your data quality issues might be.

There is no magic bullet, but as data quality management becomes an increasingly important and integral part of business success, Data Life Cycle Thinking provides a clear and consistent methodology for tackling the ever-increasing challenges that the data age poses.

“Begin at the beginning with Data Lifecycle Thinking”.

Brian Rutherford – Director, Eyecademy

Brian is director and co-founder of Eyecademy. He has over 25 years’ experience working in business intelligence and data, leading complex data and business analytics projects in finance, manufacturing, health and energy sectors.

Eyecademy

Eyecademy provides Business Intelligence services tailored to their clients’ needs. They have been helping organisations use their data as an important business resource since 2008. Their approach is client-focused, business-led and driven by gaining a clear understanding of the key data within an organisation, providing solutions that are easy to use and that make the most of today’s extensive visualisation and mobility capabilities to ease consumption and interpretation.
