No Schema? No Problem: Auto-Trace Pipeline Hits 85% Next-Event Accuracy

A schema-agnostic pipeline reconstructs process execution traces from raw relational data, achieving 85% next-event prediction accuracy and 82% precedence recovery on TPC-H/E and industry benchmarks.

Tags: temporal convolutional network, tpch e, process mining, schema agnostic, trace construction, arxiv

85% next-event prediction accuracy from raw, schema-less tables. That is what a Temporal Convolutional Network (TCN) paired with statistical key, timestamp, and join discovery delivers for process trace reconstruction in OLTP environments where schemas drift and keys are sparse.

Why Schemas Fail in Modern OLTP

Traditional information systems engineering assumes stable schemas, explicit foreign keys, and curated event logs. In practice, modern OLTP systems produce the opposite: schemas drift continuously, keys are optional or missing, and execution traces are scattered across loosely connected tables. Manual trace construction becomes costly and error-prone. The old approach of relying on ER diagrams and domain templates simply does not scale when the database changes weekly.

Four Steps from Raw Tables to Ordered Events

The proposed pipeline takes raw relational data and does four things automatically. First, it identifies columns that behave like keys or timestamps using statistical signals rather than predefined schema metadata. Second, it discovers table-to-table connections without any schema hints. Third, it assembles and orders events for each case, handling multiple date fields gracefully. Fourth, it learns likely ordering and flow relations using a Temporal Convolutional Network that models long-range dependencies.
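The first three steps can be illustrated with a minimal sketch. This is not the paper's implementation: the toy tables, the column names, the thresholds, and the helper functions (`profile_column`, `find_links`, `build_traces`) are all hypothetical, and the statistical signals used here (value uniqueness, share of date-like values, value-set containment) are simple stand-ins for whatever signals the authors actually use.

```python
from datetime import datetime

# Toy tables as lists of dict rows. All names and values are hypothetical.
orders = [
    {"o_id": 1, "cust": "A", "created": "2024-01-03", "shipped": "2024-01-05"},
    {"o_id": 2, "cust": "B", "created": "2024-01-04", "shipped": "2024-01-09"},
    {"o_id": 3, "cust": "A", "created": "2024-01-06", "shipped": None},
]
payments = [
    {"p_id": 10, "order_ref": 1, "paid_at": "2024-01-04"},
    {"p_id": 11, "order_ref": 2, "paid_at": "2024-01-10"},
    {"p_id": 12, "order_ref": 2, "paid_at": "2024-01-11"},
]

def parse_date(value):
    """Return a datetime if the value is an ISO-formatted string, else None."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None

def profile_column(rows, col):
    """Step 1: per-column signals computed from values only, no schema metadata."""
    values = [r[col] for r in rows if r[col] is not None]
    if not values:
        return {"uniqueness": 0.0, "date_ratio": 0.0}
    uniqueness = len(set(values)) / len(values)
    date_ratio = sum(parse_date(v) is not None for v in values) / len(values)
    return {"uniqueness": uniqueness, "date_ratio": date_ratio}

def timestamp_columns(rows, min_ratio=0.9):
    """Columns whose non-null values are (almost) all date-like."""
    return [c for c in rows[0] if profile_column(rows, c)["date_ratio"] >= min_ratio]

def key_columns(rows):
    """Non-timestamp columns whose non-null values are all distinct."""
    ts = set(timestamp_columns(rows))
    return [c for c in rows[0]
            if c not in ts and profile_column(rows, c)["uniqueness"] == 1.0]

def containment(child_rows, child_col, parent_rows, parent_col):
    """Share of distinct child values that also appear in the parent column."""
    child = {r[child_col] for r in child_rows if r[child_col] is not None}
    parent = {r[parent_col] for r in parent_rows if r[parent_col] is not None}
    return len(child & parent) / len(child) if child else 0.0

def find_links(child_name, child_rows, parent_name, parent_rows, min_containment=0.95):
    """Step 2: propose child-column -> parent-key links from value overlap alone."""
    skip = set(timestamp_columns(child_rows)) | set(key_columns(child_rows))
    links = []
    for pk in key_columns(parent_rows):
        for col in child_rows[0]:
            if col in skip:
                continue
            if containment(child_rows, col, parent_rows, pk) >= min_containment:
                links.append((f"{child_name}.{col}", f"{parent_name}.{pk}"))
    return links

def build_traces(parent_name, parent_rows, pk, child_name, child_rows, fk):
    """Step 3: one event per non-null date field, grouped by case and time-ordered."""
    traces = {}
    sources = ((parent_name, parent_rows, pk), (child_name, child_rows, fk))
    for table, rows, case_col in sources:
        ts_cols = timestamp_columns(rows)
        for row in rows:
            for col in ts_cols:
                ts = parse_date(row[col])
                if ts is not None:
                    traces.setdefault(row[case_col], []).append((ts, f"{table}.{col}"))
    return {case: [name for _, name in sorted(events)]
            for case, events in traces.items()}

links = find_links("payments", payments, "orders", orders)
traces = build_traces("orders", orders, "o_id", "payments", payments, "order_ref")
```

On this toy data, `links` proposes `payments.order_ref` as a reference to `orders.o_id`, and the trace for order 1 interleaves events from both tables in time order. Real data would need sampling, tolerance for dirty values, and disambiguation among several candidate keys, none of which is attempted here.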

The TCN is the key enabler. It captures patterns across sequences of varying length, which is critical when execution behavior spans months and hundreds of steps. No hand-crafted rules for ordering; the network learns from the data itself.
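To make the long-range claim concrete, here is a minimal pure-Python sketch of the building block a TCN stacks: a causal, dilated 1-D convolution. It is illustrative only. The article does not give the paper's layer count, kernel size, or dilation schedule, so the values below are assumptions, and the sketch uses one convolution per layer with no residual connections or nonlinearities.

```python
def causal_dilated_conv(seq, weights, dilation):
    """y[t] = sum_j weights[j] * seq[t - j * dilation], treating seq as zero before t=0.

    Causal: the output at position t never reads inputs after t.
    """
    out = []
    for t in range(len(seq)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                total += w * seq[idx]
        out.append(total)
    return out

def stack(seq, weights, dilations):
    """Apply one causal dilated convolution per dilation, reusing the same weights."""
    for d in dilations:
        seq = causal_dilated_conv(seq, weights, d)
    return seq

def receptive_field(kernel_size, dilations):
    """Number of past positions one output can see, with one convolution per layer."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Hypothetical configuration: kernel size 2, dilations doubling per layer.
impulse = [1.0] + [0.0] * 11
response = stack(impulse, [1.0, 1.0], [1, 2, 4])
span = receptive_field(2, [1, 2, 4])
```

With these settings a single input at position 0 influences outputs at positions 0 through 7, matching `span == 8`. Because the dilation doubles with each layer, the receptive field grows exponentially with depth, which is how a stack of this kind can cover traces with hundreds of steps.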

Benchmarks That Matter: TPC-H/E and Industry Data

Evaluation on the TPC-H and TPC-E benchmarks, synthetic corpora, and a real industry dataset shows consistent results: 85% accuracy in predicting the next event, and 82% of ground-truth precedence relations recovered correctly. These are not cherry-picked numbers from a toy dataset. TPC-H is a standard decision-support benchmark and TPC-E is a standard OLTP benchmark; the industry dataset reflects actual ERP-like environments.
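For readers unfamiliar with the two metrics, the sketch below shows one plausible way to compute them. The article does not say how the paper defines a precedence relation; this sketch assumes directly-follows pairs and measures the share of ground-truth pairs that were recovered. The function names and the toy traces are hypothetical and are not data from the paper.

```python
def next_event_accuracy(predicted, actual):
    """Share of positions where the predicted next event equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def directly_follows(traces):
    """Set of (a, b) pairs where b immediately follows a in at least one trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}

def precedence_recall(truth_pairs, recovered_pairs):
    """Share of ground-truth precedence pairs present in the recovered set."""
    if not truth_pairs:
        return 0.0
    return len(truth_pairs & recovered_pairs) / len(truth_pairs)

# Toy ground truth and toy reconstruction (made-up values for illustration).
truth_traces = [["create", "pay", "ship"], ["create", "ship", "pay"]]
recovered_traces = [["create", "pay", "ship"], ["create", "pay", "ship"]]

accuracy = next_event_accuracy(
    ["pay", "ship", "pay", "ship"],   # model's guesses
    ["pay", "ship", "ship", "pay"],   # what actually happened next
)
recall = precedence_recall(
    directly_follows(truth_traces),
    directly_follows(recovered_traces),
)
```

Here both toy scores come out at 0.5: two of four next-event guesses are right, and two of the four ground-truth precedence pairs appear in the reconstruction.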

By eliminating dependence on predefined schemas, this pipeline opens a path to automated process mining in continuously evolving information systems. No more weeks spent reverse-engineering a database just to build a process model.

A natural next step is to integrate this pipeline into live observability stacks so that trace reconstruction happens on every schema change, not once per quarter.


Source: Schema-Agnostic Process Trace Construction: From Raw Tables to Execution Behavior
Domain: arxiv.org
