Lineage is a term we usually associate with heredity or a family tree. The digital world borrowed the word a couple of decades ago and applied it to data. So what is the definition of data lineage, and why is it important?
Very simply, data lineage is a cradle-to-grave view of a piece of data in the enterprise from the moment of its creation (or its import) to its final resting place (or its export). That view is analogous to a detailed biography of a famous individual. Much like a celebrity’s life, a piece of data:
- Often lives in multiple places
- May have significant impacts on other pieces of data
- May be impacted by other pieces of data
- May undergo one or more transformations
- Requires care and maintenance
- Must be reliable and trustworthy throughout its life
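To make the "biography" idea concrete, lineage is commonly modeled as a directed graph: each node is one instance of a piece of data, and each edge means "this instance was copied or derived from that one." The sketch below is a minimal illustration under that assumption, not Orion Governance's implementation. The class, its methods, and the dataset names are all hypothetical. Walking the edges forward answers "what does this data impact?", and walking them backward answers "what impacts this data?"

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage graph: an edge source -> target means target derives from source."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def add_flow(self, source: str, target: str) -> None:
        """Record that `target` was copied or transformed from `source`."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def _walk(self, start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first traversal; `seen` prevents revisiting nodes if the graph has cycles.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impacted_by(self, node: str) -> set[str]:
        """Everything downstream of `node`, i.e. what a change to it could affect."""
        return self._walk(node, self._downstream)

    def depends_on(self, node: str) -> set[str]:
        """Everything upstream of `node`, i.e. what it was derived from."""
        return self._walk(node, self._upstream)


# Hypothetical example: one email address living in several places.
g = LineageGraph()
g.add_flow("crm.customers.email", "staging.customers.email")
g.add_flow("web.signups.email", "staging.customers.email")
g.add_flow("staging.customers.email", "warehouse.dim_customer.email")
g.add_flow("warehouse.dim_customer.email", "reports.marketing_list")

print(sorted(g.impacted_by("crm.customers.email")))
# ['reports.marketing_list', 'staging.customers.email', 'warehouse.dim_customer.email']
print(sorted(g.depends_on("reports.marketing_list")))
# ['crm.customers.email', 'staging.customers.email', 'warehouse.dim_customer.email', 'web.signups.email']
```

Keeping separate upstream and downstream indexes makes both questions equally cheap to answer, at the cost of storing each edge twice.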
Why Data Lineage Is Important
It’s helpful to think about the evolution of enterprise computing over the last sixty years. In the Stone Age of data management, there were only big mainframes and a lot of paper. Digital data typically existed only in a few places:
- A sequential file on a low-capacity, enormously expensive hard disk
- Backup tapes
- Archive vaults
- Punch cards
With the exception of the largest corporations, no one needed to document where digital data lived: the staff of the so-called data processing department simply kept those details in their heads.
The problems started with technology advances that made computing cheaper: less expensive mainframes, minicomputers, microcomputers, personal computers, and so forth. Today, any employee with a credit card can buy their own computing environment in the cloud in a few minutes. In a modern enterprise, a piece of data can live in dozens or hundreds of places, and many of those copies, if not most, are invisible to the people accountable for corporate data management. That is a very risky state of affairs for any business.
That’s why data lineage is so important.
Data Lineage Challenges
Data isn’t always represented in convenient rows and columns. In fact, most data assets in an enterprise are unstructured (e.g., documents, emails, graphics), and extracting lineage from unstructured data is far more challenging. That’s why it is so important to employ a comprehensive, fully automated system that surfaces the lineage of every data asset in the enterprise.
A complete definition of data lineage requires transparency at every stage of a data item’s lifecycle. It isn’t enough to look only at the inputs and outputs of the software processes along the way; full data lineage reveals the inner workings of each process that touches the data, including both how and why each transformation happens.
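One way to capture both the "how" and the "why" is to store, for every hop in a data item's life, the transformation logic alongside the business reason for it. The sketch below assumes a simple record format of our own invention (`LineageHop`, `trace_history`, and all process and dataset names are hypothetical, not part of any Orion Governance product) and assumes each item has a single direct source, so its history is a chain that can be walked upstream and replayed oldest-first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageHop:
    source: str   # upstream data item
    target: str   # data item produced by this step
    process: str  # hypothetical name of the job or program that ran
    how: str      # the transformation logic itself
    why: str      # the business reason for the transformation


def trace_history(item: str, hops: list[LineageHop]) -> list[LineageHop]:
    """Walk upstream from `item` and return its hops, oldest first.

    Assumes each item has at most one direct source (a simple chain);
    items built from several inputs would need a graph walk instead.
    """
    by_target = {hop.target: hop for hop in hops}
    history: list[LineageHop] = []
    seen: set[str] = set()  # stops the walk if the records contain a cycle
    current = item
    while current in by_target and current not in seen:
        seen.add(current)
        hop = by_target[current]
        history.append(hop)
        current = hop.source
    history.reverse()
    return history


# Hypothetical example: an order amount on its way to a regulatory report.
hops = [
    LineageHop("erp.orders.amount_cents", "staging.orders.amount", "load_orders",
               "amount_cents / 100", "Reports use currency units, not cents"),
    LineageHop("staging.orders.amount", "warehouse.fact_sales.amount_usd", "convert_currency",
               "amount * fx_rate_to_usd", "Regulatory reports are consolidated in USD"),
]

for hop in trace_history("warehouse.fact_sales.amount_usd", hops):
    print(f"{hop.source} -> {hop.target} via {hop.process}: {hop.how} ({hop.why})")
```

Recording the logic and the rationale on the same record means an auditor reading the trail never has to cross-reference a separate document to learn why a value changed.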
How does this visibility help businesses? It enables key processes or makes existing processes more efficient. For example:
- Regulatory reporting
- Software testing
- Data migration
- Error resolution
- Discovery of opportunities for new or better uses of data assets
Orion Governance Data Lineage Solution
An important part of data lineage is organizing and displaying lineage information in a way that’s comprehensible and fit for purpose. Graphical representations are typical. One highly practical tool is the Orion Governance Enterprise Information Intelligence Graph (EIIG), a visual representation of data lineages derived from the metadata of existing data assets.
Orion Governance’s platform is a comprehensive, enterprise-wide solution that reveals data lineage in ways other platforms cannot. See Orion Governance data lineage in action with a free demo today.