
CR-Miner Cuts Data Profiling Costs by 40-50% with Change Rules

A new formalization, Change Rules (CRs), models sequential changes in ordered tuples, and its CR-Miner algorithm runs 40-50% faster than existing baselines.

change rules, CR-Miner, data profiling, relational databases, causal analysis, data quality

Database change management today is a hack: triggers, constraints, and ad-hoc statistics. None of them captures the causal 'why' behind attribute changes. A new formalization, Change Rules (CRs), models sequential change across ordered tuples, and the CR-Miner algorithm that discovers them runs 40-50% faster on average than prior baselines.

Why Triggers and Constraints Fall Short

Existing database systems lean on triggers, constraints, and statistical aggregates to track change. Those tools can tell you a value changed, but they can't represent the sequence of attributes that led to the change or the context in which it occurred. Change Rules (CRs) address that gap by quantifying sequential changes in both antecedent and consequent attributes, enabling trend analysis and causal reasoning that declarative dependencies can't express.

How Change Rules Work

CRs operate on ordered tuples — that's the key difference. They model change intervals over a restricted set of attributes, but unlike prior work, they also capture the context under which attribute changes happen. The paper defines CRs as dependencies that connect sequences of attribute values, so you can ask not just "what changed" but "what pattern of changes triggered this outcome?" That's a shift from passive monitoring to active causal profiling.
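To make the idea concrete, here is a minimal Python sketch of checking one rule of the form "when attribute A changes by this much, attribute B changes by that much" over consecutive ordered tuples. It is an illustration under stated assumptions, not the paper's formal definition: the `ChangeInterval` class, the `rule_stats` function, the support and confidence measures, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeInterval:
    """Hypothetical: bounds on how much one attribute changes between consecutive tuples."""
    attribute: str
    low: float
    high: float

    def matches(self, prev: dict, curr: dict) -> bool:
        delta = curr[self.attribute] - prev[self.attribute]
        return self.low <= delta <= self.high


def rule_stats(rows, order_by, antecedent, consequent):
    """Return (support, confidence) of 'antecedent change -> consequent change'
    measured over consecutive pairs of rows sorted by `order_by`."""
    ordered = sorted(rows, key=lambda r: r[order_by])
    pairs = list(zip(ordered, ordered[1:]))
    # Pairs where the antecedent change occurs.
    fired = [(p, c) for p, c in pairs if antecedent.matches(p, c)]
    # Of those, pairs where the consequent change also occurs.
    held = [(p, c) for p, c in fired if consequent.matches(p, c)]
    support = len(held) / len(pairs) if pairs else 0.0
    confidence = len(held) / len(fired) if fired else 0.0
    return support, confidence


# Made-up sample data, ordered by day.
rows = [
    {"day": 1, "price": 10.0, "sales": 100},
    {"day": 2, "price": 12.0, "sales": 90},
    {"day": 3, "price": 12.5, "sales": 88},
    {"day": 4, "price": 15.0, "sales": 70},
    {"day": 5, "price": 14.0, "sales": 75},
]
price_up = ChangeInterval("price", 1.0, 5.0)     # price rises by 1 to 5
sales_down = ChangeInterval("sales", -30, -5)    # sales fall by 5 to 30
support, confidence = rule_stats(rows, "day", price_up, sales_down)
print(support, confidence)
```

The ordering step is what separates this from a plain association check: the rule is evaluated on differences between neighboring tuples, so the same rows sorted by a different attribute could yield different results.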

CR-Miner's Performance Gain

The authors propose CR-Miner, an automated algorithm that discovers CRs by generating candidate change intervals in a level-wise manner — think Apriori-style search tailored for sequences. Experimental results show an average runtime improvement of 40-50% over existing baselines. That's not marginal; it makes previously impractical analyses feasible on large databases.
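The level-wise idea can be sketched with a generic Apriori-style search in Python. This is not CR-Miner itself, whose candidate generation and pruning are specified in the paper. The function name `level_wise_frequent_sets`, the string labels standing in for change intervals, and the sample transactions are all assumptions made for illustration.

```python
from itertools import combinations


def level_wise_frequent_sets(transactions, min_support):
    """Generic Apriori-style search (a stand-in, not the paper's CR-Miner).

    transactions: list of sets of hashable items.
    Returns a dict mapping each frequent frozenset to its support count.
    """
    min_count = min_support * len(transactions)

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 1
    while level:
        # Join: union pairs of frequent k-sets that form a (k+1)-set.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: keep a candidate only if every k-subset is itself frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical input: each set lists the change intervals observed
# between one pair of consecutive tuples.
transactions = [
    {"price:+", "sales:-"},
    {"price:+", "sales:-", "stock:-"},
    {"price:-", "sales:+"},
    {"price:+", "sales:-"},
]
frequent = level_wise_frequent_sets(transactions, min_support=0.5)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The pruning step is where a level-wise search saves time: a candidate is counted against the data only if all of its smaller subsets already passed the support threshold.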

That speedup means data teams can profile change patterns at scale with shorter waits. CR-Miner's level-wise generation and the CR formalism's expressiveness suggest that causal data profiling is moving toward production-ready tooling.


Source: Data Profiling for Change Rules
Domain: arxiv.org
