According to the PhD thesis of Kalle Tomingas, our Chief Scientist, the main idea of the semantic layer in lineage is to narrow down the possible and expected data flows across all connected graph nodes by cutting unlikely or disallowed connections from the graph. The cuts are based on additional query filters, the semantic interpretation of those filters, and calculated transformation expression weights. Depending on user choice and interaction, the semantic layer of the data lineage graph either hides irrelevant graph nodes and edges or highlights relevant ones. This matters when the underlying data structures are abstract enough that independent data flows store and use independent “horizontal” slices of the same data. In essence, the semantic layer uses the available query and schema information to estimate row-level data flows without additional row-level lineage information, which is unavailable at the schema level and expensive or impossible to collect at the row level.
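To make the idea concrete, here is a minimal sketch of filter-based pruning. It is an illustration under simplifying assumptions, not the method from the thesis: filters are reduced to "column is one of these values", transformation expression weights are not modeled, and all names (`Edge`, `compatible`, `semantic_paths`, the `PARTY` table and its loaders and reports) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# Assumed simplification: a filter maps a column name to the set of values it allows.
Filter = Dict[str, FrozenSet[str]]


@dataclass
class Edge:
    """One schema-level data flow, with the query filter that applies to it."""
    source: str
    target: str
    filters: Filter = field(default_factory=dict)


def compatible(a: Filter, b: Filter) -> bool:
    """Two filters can select common rows unless some shared column
    is constrained to disjoint value sets."""
    return all(a[col] & b[col] for col in a.keys() & b.keys())


def semantic_paths(edges: List[Edge], table: str) -> List[Tuple[str, str]]:
    """Pair every write into `table` with every read from it,
    keeping only the pairs whose filters are compatible."""
    writes = [e for e in edges if e.target == table]
    reads = [e for e in edges if e.source == table]
    return [
        (w.source, r.target)
        for w in writes
        for r in reads
        if compatible(w.filters, r.filters)
    ]


# A generic PARTY table holds independent "horizontal" slices of data.
edges = [
    Edge("crm_load", "PARTY", {"party_type": frozenset({"CUSTOMER"})}),
    Edge("erp_load", "PARTY", {"party_type": frozenset({"SUPPLIER"})}),
    Edge("PARTY", "customer_report", {"party_type": frozenset({"CUSTOMER"})}),
    Edge("PARTY", "supplier_report", {"party_type": frozenset({"SUPPLIER"})}),
    Edge("PARTY", "audit_export"),  # unfiltered read: sees every slice
]

flows = semantic_paths(edges, "PARTY")
for source, target in flows:
    print(f"{source} -> {target}")
```

At the schema level, both loaders connect to all three consumers through `PARTY`, giving six possible flows. With the filters taken into account, the two cross-slice flows (`crm_load` to `supplier_report` and `erp_load` to `customer_report`) are cut, while the unfiltered `audit_export` stays connected to both loaders.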

Semantic Lineage