
43 of 60 Schema Conversions Succeed When You Treat Converters as Black-Box Graph Edges

An empirical study of 60 real-world schema conversion tasks across JSON Schema, XSD, and SHACL found that orchestrating existing black-box converters succeeds 72% of the time and pinpoints the specific gaps for the...

schema conversion orchestrator, metaconfigurator, json schema, xsd, shacl, developer tools

Only 43 of 60 real-world schema conversion tasks succeeded when every converter the authors could find was orchestrated as a black box. The remaining 17 failures aren't a bug report; they're a precise map of where the schema converter ecosystem is broken.

Modern software lives in a Babel of schema languages. A single data model might exist as JSON Schema for a web API, as XSD for legacy data exchange, and as SHACL for a knowledge graph. Keeping those representations consistent as the model evolves is a headache to set up and to maintain, one every senior engineer has felt. Converters exist, but they're scattered, of uneven quality, and frequently lossy.

Schema Orchestration Treats Converters as Black-Box Edges

The paper models each schema language as a node and each converter as a directed edge. A conversion then becomes a path through this graph: candidate paths are discovered, executed, ranked by quality, and reported with full per-step provenance. A failing converter doesn't abort the run; the orchestrator falls back to alternative paths. The implementation, called Schema Conversion Orchestrator, is integrated into MetaConfigurator and is open source.
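The graph-and-path idea can be sketched in a few dozen lines of Python. This is a minimal sketch, assuming each converter can be wrapped as a plain string-to-string function; the names `Converter`, `Attempt`, `find_paths`, and `orchestrate`, the `max_hops` limit, and the toy converters are all hypothetical and are not the Schema Conversion Orchestrator's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Converter:
    """One black-box converter: a directed edge between two schema languages."""
    name: str
    source: str
    target: str
    run: Callable[[str], str]  # opaque: only input and output text are visible


@dataclass
class Attempt:
    path: list[str]                # converter names along this path
    output: Optional[str] = None   # final schema text, set only if every step succeeded
    log: list[str] = field(default_factory=list)  # per-step provenance


def find_paths(converters, source, target, max_hops=3):
    """Depth-first enumeration of simple paths (no language visited twice)."""
    paths = []

    def walk(lang, visited, path):
        if lang == target and path:
            paths.append(list(path))
            return
        if len(path) == max_hops:
            return
        for c in converters:
            if c.source == lang and c.target not in visited:
                path.append(c)
                walk(c.target, visited | {c.target}, path)
                path.pop()

    walk(source, {source}, [])
    return paths


def orchestrate(converters, schema, source, target, max_hops=3):
    """Execute every candidate path; a failing step ends that path, not the run."""
    attempts = []
    for path in find_paths(converters, source, target, max_hops):
        attempt = Attempt(path=[c.name for c in path])
        current = schema
        for c in path:
            try:
                current = c.run(current)
                attempt.log.append(f"{c.name}: ok")
            except Exception as exc:  # black box: any exception is a failed step
                attempt.log.append(f"{c.name}: failed ({exc})")
                break
        else:  # loop finished without break: every step succeeded
            attempt.output = current
        attempts.append(attempt)
    return attempts


# Toy usage with hypothetical converters that just tag strings.
def broken(_schema):
    raise ValueError("unsupported construct")


converters = [
    Converter("json2xsd", "JSON Schema", "XSD", broken),
    Converter("json2shacl", "JSON Schema", "SHACL", lambda s: s + " -> shacl"),
    Converter("shacl2xsd", "SHACL", "XSD", lambda s: s + " -> xsd"),
]
attempts = orchestrate(converters, "schema", "JSON Schema", "XSD")
for a in attempts:
    print(a.path, a.output, a.log)
```

Here the direct `json2xsd` edge fails, but the run continues and the two-hop route through SHACL still produces an output, with the failure recorded in the first attempt's log rather than raised to the caller.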

This approach sidesteps the impossible task of fixing every converter. Instead, it treats converters as imperfect black boxes and uses orchestration to find the best route anyway. The hard part is ranking: the study used agent-assisted, human-reviewed quality annotations on outputs across five schema languages, including JSON Schema, XSD, and SHACL.
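How annotations turn into a ranking is not spelled out here, so the following is only a sketch under stated assumptions: each usable output carries a hypothetical numeric quality score in [0, 1] derived from the reviewed annotations, unreviewed outputs sort last, and ties go to the shorter path on the assumption that fewer hops means fewer chances for loss. `Candidate`, `rank`, and the converter names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    path: tuple[str, ...]      # converter names along the path
    output: Optional[str]      # None if some step on the path failed
    quality: Optional[float]   # hypothetical annotated score in [0, 1]; None if unreviewed


def rank(candidates):
    """Order usable outputs best-first.

    Reviewed outputs come before unreviewed ones, higher quality before lower,
    and among equal scores the shorter path wins (an assumed tie-break).
    """
    usable = [c for c in candidates if c.output is not None]
    return sorted(
        usable,
        key=lambda c: (c.quality is None, -(c.quality or 0.0), len(c.path)),
    )


candidates = [
    Candidate(("json2xsd",), None, None),                         # failed path
    Candidate(("json2shacl", "shacl2xsd"), "<xs:schema/>", 0.6),
    Candidate(("json2xsd_alt", "xsd_fixup"), "<xs:schema/>", 0.8),
    Candidate(("json2xsd_v2",), "<xs:schema/>", 0.8),
    Candidate(("json2xsd_v3",), "<xs:schema/>", None),            # not yet reviewed
]
for c in rank(candidates):
    print(c.path, c.quality)
```

Keeping unreviewed outputs in the list, rather than dropping them, means the orchestrator can still surface a result when no annotation exists yet; it just never prefers that result over a reviewed one.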

43 of 60: Where Orchestration Works and Where It Doesn't

The study built 60 conversion tasks from real schemas, and the orchestrator surfaced a usable result for 43 of them. That's a 72% success rate, which sounds reasonable until you realize the remaining 28% (17 tasks) could not be completed by any route through today's off-the-shelf converters. Each failure was analyzed to pinpoint which converter or language pair caused the problem.
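The headline arithmetic is 43 / 60 ≈ 0.717, which rounds to 72%, leaving 17 tasks (about 28%). Per-step provenance is what makes those failures attributable. A minimal sketch, assuming each failed task keeps the (source, target, ok) steps of one recorded attempt and is blamed on its first failing language pair; `TaskResult`, `summarize`, and the data are hypothetical, and the paper's own failure analysis may attribute differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TaskResult:
    task: str
    succeeded: bool
    # Per-step provenance of the attempt kept for this task:
    # (source language, target language, did this step succeed?)
    steps: list[tuple[str, str, bool]]


def summarize(results):
    """Return the success rate and a tally of the first failing language pair."""
    succeeded = sum(r.succeeded for r in results)
    blame = Counter()
    for r in results:
        if r.succeeded:
            continue
        first_bad = next(
            (f"{src} -> {dst}" for src, dst, ok in r.steps if not ok),
            "unattributed",  # e.g. no path existed, or the output was judged unusable
        )
        blame[first_bad] += 1
    return succeeded / len(results), blame


# Invented toy data, not the study's results.
results = [
    TaskResult("t1", True, [("JSON Schema", "XSD", True)]),
    TaskResult("t2", False, [("XSD", "SHACL", False)]),
    TaskResult("t3", False, [("JSON Schema", "XSD", True), ("XSD", "SHACL", False)]),
    TaskResult("t4", False, []),
]
rate, blame = summarize(results)
print(f"{rate:.0%}", blame.most_common())
print(f"{43 / 60:.0%}")  # the study's headline figure; prints 72%
```

Counting by the first failing pair, rather than by task, is what turns a list of failures into the kind of ranked target list the study hands to tool builders.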

These aren't synthetic edge cases: the schemas are real, drawn from production data models. The study is reproducible, so anyone can rerun the same 60 tasks and verify the failures. The paper includes the full breakdown, giving tool builders specific targets: fix the conversion between XSD and SHACL, improve lossy handling for JSON Schema to XSD, and so on.

Measuring conversion quality is itself an open problem. The study's agent-assisted annotation pipeline is a practical stopgap, but the authors argue the community needs a standard quality metric, not ad-hoc human reviews.

The remaining 17 gaps are now concrete targets for anyone building the next generation of schema converters.


Source: Orchestrating Black-Box Schema Converters: An Empirical Study of Automated, Quality-Ranked Conversion Across Heterogeneous Schema Languages
Domain: arxiv.org
