Data lineage is indispensable to data governance, compliance, cloud migration, and data-driven decision making in general. Although quite a few vendors claim to offer data lineage solutions, not all of them are created equal. In fact, some so-called data lineage solutions are so inadequate that they are more misleading than useful. So how can an organization determine whether a data lineage offering is effective?
You Can’t Link What You Can’t See
The prerequisite to data lineage is the ability to ingest all types of metadata (business, technical, operational, and social) from a wide range of technology sources. Can the tool handle PL/I, JCL, and COBOL on the mainframe? Can it parse ETL job scripts written in Perl? Does it support NoSQL databases such as MongoDB? Can it scan Python or Java code automatically? How much of the SAP environment does it cover?
If the answer is no, not really, or not much, then the tool is insufficient, to say the least, and end-to-end traceability is only a pipe dream. You can’t visualize what is unknown, and you can’t link what you can’t see. It is as simple as that.
Lineage Is Not an Orphan
When data lineage is positioned as a standalone product, organizations should treat it as a warning sign: extra work, often by expensive consultants, will be needed to integrate it with the rest of the data governance regime.
An effective data lineage solution should be natively integrated with other capabilities in a data fabric, such as data catalog, active metadata, impact analysis, and metadata analytics. Only in this way can data lineage become truly effective and enterprises achieve the lowest total cost of ownership.
The Layer Cake Is Delicious
Another key characteristic of an effective data lineage solution is the ability to offer different perspectives for different personas. A CDO or CFO may want a high-level view, and a business manager may be interested only in Line of Business (LOB) information, while a data analyst or data engineer tends to need more detail.
A comprehensive data lineage solution should offer all of these layers, from organization, LOB, and domain through application and system down to individual entities and their full details, so that users can get what they need right away.
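To make the layer idea concrete, here is a minimal Python sketch of how fine-grained lineage edges could be rolled up into coarser views. It is an illustration under stated assumptions, not how any particular product works: the `Node` type, the `roll_up` function, and every system, application, and entity name below are hypothetical.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """A lineage node addressed by layer, from coarsest to finest (hypothetical model)."""
    system: str
    application: str
    entity: str


# Hypothetical entity-level lineage edges (source -> target); all names are made up.
EDGES = [
    (Node("mainframe", "billing", "INVOICE"),
     Node("warehouse", "finance_mart", "fact_invoice")),
    (Node("mainframe", "billing", "CUSTOMER"),
     Node("warehouse", "finance_mart", "dim_customer")),
    (Node("warehouse", "finance_mart", "dim_customer"),
     Node("warehouse", "finance_mart", "fact_invoice")),
    (Node("warehouse", "finance_mart", "fact_invoice"),
     Node("bi", "revenue_dashboard", "monthly_revenue")),
]


def roll_up(edges, level):
    """Collapse entity-level edges to a coarser layer such as 'system' or 'application'."""
    # Keep every field from the coarsest layer down to the requested one.
    fields = Node._fields[: Node._fields.index(level) + 1]
    rolled = set()
    for src, dst in edges:
        src_key = tuple(getattr(src, f) for f in fields)
        dst_key = tuple(getattr(dst, f) for f in fields)
        if src_key != dst_key:  # drop edges that stay inside one group at this layer
            rolled.add((src_key, dst_key))
    return sorted(rolled)


system_view = roll_up(EDGES, "system")            # the view an executive might want
application_view = roll_up(EDGES, "application")  # one layer closer to the detail
```

The same edge list serves every persona; only the grouping key changes, which is why a single underlying graph can feed both the high-level and the detailed view.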
Data Lineage Reimagined
Orion Governance’s Enterprise Information Intelligence Graph (EIIG) offers industry-leading data lineage in a self-defined fabric. EIIG automatically scans metadata from 70+ technology sources, including the mainframe, AS/400, SAP, NoSQL, Java, Scala, and Python. It then stitches the metadata into a knowledge graph and at the same time establishes data lineage and builds a data catalog.
Because EIIG scans the source code, it gets the DNA of the datasets and thus provides much more granular details in the lineage. Users can move back and forth between data lineage and data catalog effortlessly. They can perform impact analysis right in the data lineage and get results in seconds just by clicking a data element in the lineage graph. They can also conduct metadata analytics right in the lineage. An example is similarity analysis. EIIG enables users to identify similar tables, reports, or ETL jobs, again right in the lineage graph. Furthermore, metadata is activated in the lineage.
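Conceptually, impact analysis on a lineage graph is a downstream traversal from the selected element. The Python sketch below shows that general technique with a breadth-first search. It is not EIIG's implementation, and the edge list, element names, and the `downstream_impact` function are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (upstream -> downstream); all names are made up.
LINEAGE_EDGES = [
    ("crm.customer.email", "etl.load_customers"),
    ("etl.load_customers", "dw.dim_customer.email"),
    ("dw.dim_customer.email", "report.churn_dashboard"),
    ("dw.dim_customer.email", "report.marketing_list"),
    ("erp.orders.amount", "dw.fact_orders.amount"),
]


def downstream_impact(edges, start):
    """Return every element reachable downstream of `start`, in breadth-first order."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)

    seen = {start}          # guards against cycles in the lineage graph
    queue = deque([start])
    impacted = []
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                impacted.append(nxt)
                queue.append(nxt)
    return impacted


impacted = downstream_impact(LINEAGE_EDGES, "crm.customer.email")
print(impacted)
```

Here a change to the hypothetical `crm.customer.email` field reaches one ETL job, one warehouse column, and two reports, while the unrelated orders flow is left out.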
Data citizens can view quality scores at each stage of the data flow and see whether and how they change after transformations. In the same way, EIIG propagates trust all the way to the destination. Through integration with a BI tool such as Tableau, Qlik, or Power BI, users can see metrics such as quality score, trust score, and value score right in the report and dive into the data lineage directly from there when desired.
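As a rough illustration of what propagating a score along lineage can mean, the Python sketch below walks a small graph in dependency order and applies one simple rule: an element is only as trustworthy as its weakest input. That rule is an assumption chosen for the example, not necessarily the one EIIG uses, and the element names, scores, and the `propagate_trust` function are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical lineage: each element maps to the set of elements it is built from.
UPSTREAM = {
    "staging.orders": {"source.orders"},
    "dw.fact_orders": {"staging.orders", "source.fx_rates"},
    "report.revenue": {"dw.fact_orders"},
}

# Hypothetical quality score measured at each stage on its own (0.0 to 1.0).
OWN_SCORE = {
    "source.orders": 0.95,
    "source.fx_rates": 0.80,
    "staging.orders": 0.90,
    "dw.fact_orders": 0.99,
    "report.revenue": 1.00,
}


def propagate_trust(upstream, own_score):
    """Assumed rule: trust = min(own score, trust of every upstream element)."""
    trust = {}
    # static_order() yields each element only after all of its upstream elements.
    for node in TopologicalSorter(upstream).static_order():
        inherited = [trust[parent] for parent in upstream.get(node, ())]
        trust[node] = min([own_score[node], *inherited])
    return trust


trust = propagate_trust(UPSTREAM, OWN_SCORE)
print(trust["report.revenue"])
```

Under this rule the hypothetical revenue report inherits the 0.80 score of the exchange-rate source even though every later stage scores higher, which is the kind of signal a report consumer would want to see before relying on the numbers.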
To find out how data lineage is reimagined, please click here to request a demo.
About the Author: Niu Bai, Ph.D. is the Head of Global Business Development at Orion Governance, Inc. Connect with Niu on LinkedIn.