
Flink 2.3's Native S3 Filesystem Checkpoints 2x Faster, Cuts the Hadoop Dependency

flink.apache.org · @vast_condor · 4 hours ago · Systems Engineering · 3 comments

Flink's new flink-s3-fs-native plugin averages 48.8s checkpoints vs. 90.1s with the Presto plugin, sheds Hadoop's CVE baggage, and supports exactly-once sinks with AWS SDK v2.

apache flink · flink · s3 · filesystem · checkpointing · aws sdk v2

Flink 2.3 ships a native S3 filesystem that cuts checkpoint times roughly in half and drops the entire Hadoop dependency stack. In the Flink team's benchmark, the flink-s3-fs-native plugin averages 48.8 seconds per checkpoint versus 90.1 seconds with the Presto plugin. At small state sizes it can be 4.5x faster.

Two Plugins, One Bad Trade-off

Anyone who has configured S3 for Flink knows the old choice: the Hadoop plugin supported exactly-once sinks via RecoverableWriter but pulled in the full Hadoop dependency tree and AWS SDK v1. The Presto plugin was faster for reads and checkpoints but couldn't do exactly-once file sinks. You could have one feature or the other, not both. Both shared a Hadoop adaptation layer that blocked Flink-specific optimizations and forced a tangled configuration namespace (fs.s3a.* mapped to s3.*).

The native plugin removes that trade-off entirely. It supports exactly-once sinks, fast checkpoints, and uses AWS SDK v2 natively. No Hadoop, no adaptation layer, no split-personality config.

The Numbers: 2x Faster, 7x Lighter

Credit where due: the Flink team benchmarked checkpoints across three plugins with consistent settings. The Presto plugin averaged 90.1 seconds. The Hadoop plugin averaged 111.7 seconds. The native plugin? 48.8 seconds, which is roughly 1.8x faster than Presto and 2.3x faster than Hadoop. That is not a micro-optimization. For any application where checkpoint duration directly impacts end-to-end latency -- streaming sinks, exactly-once semantics, recovery SLAs -- this is a concrete win.

JAR size tells its own story. The Hadoop plugin ships at ~30 MB. The Presto plugin balloons to ~93 MB, largely from Hadoop transitive dependencies. The native plugin weighs 13 MB, about 7x smaller than the Presto plugin. No Jackson, Guava, protobuf, Jetty, Kerberos, or Zookeeper coming along for the ride. Security teams will appreciate the reduced CVE surface area: one fewer reason to rush emergency patch cycles.
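
The headline ratios follow directly from the figures quoted above. A quick check, using only the numbers in this article (the helper name `ratio_vs_native` is made up for illustration, and the Hadoop and Presto JAR sizes are the approximate values given):

```python
# Figures quoted in the article: checkpoint averages in seconds, JAR sizes in MB.
checkpoint_seconds = {"native": 48.8, "presto": 90.1, "hadoop": 111.7}
jar_size_mb = {"native": 13, "hadoop": 30, "presto": 93}  # hadoop/presto are approximate

def ratio_vs_native(values):
    """Return how many times larger each entry is than the native plugin's value."""
    base = values["native"]
    return {name: round(value / base, 2)
            for name, value in values.items() if name != "native"}

speedup = ratio_vs_native(checkpoint_seconds)
size_ratio = ratio_vs_native(jar_size_mb)

print(speedup)     # checkpoint time relative to the native plugin
print(size_ratio)  # JAR size relative to the native plugin
```

So "2x faster" is a rounded figure that sits between the Presto and Hadoop comparisons, and "7x lighter" refers to the Presto plugin specifically.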

Why the AWS SDK v2 Shift Matters Beyond Performance

Both old plugins depend on AWS SDK for Java 1.x, which reached end-of-support on December 31, 2025. No new features, no bug fixes, no security patches. That makes every Flink cluster still running the Hadoop or Presto S3 plugin an accumulating compliance liability. The native plugin builds entirely on AWS SDK v2, which is actively maintained and supports async-first I/O via the S3TransferManager backed by Netty NIO multiplexed connections. No thread-per-request bottleneck.
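
To see why multiplexed, non-blocking I/O avoids the thread-per-request bottleneck, here is a small analogy in Python's asyncio. It is not the AWS SDK, Netty, or Flink code: `upload_part` is a hypothetical stand-in that sleeps instead of touching the network, and the latency and request count are assumed values.

```python
import asyncio
import threading
import time

SIMULATED_LATENCY_S = 0.05  # assumed per-request latency, for illustration only
PART_COUNT = 50             # assumed number of in-flight requests

async def upload_part(part_number, seen_threads):
    """Stand-in for one non-blocking S3 request; sleeps instead of doing network I/O."""
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(SIMULATED_LATENCY_S)  # yields the thread while "waiting on the network"
    return part_number

async def upload_all():
    seen_threads = set()
    started = time.perf_counter()
    # All requests are in flight at once on the same event-loop thread.
    done = await asyncio.gather(*(upload_part(n, seen_threads) for n in range(PART_COUNT)))
    elapsed = time.perf_counter() - started
    return done, elapsed, seen_threads

parts, elapsed, threads_used = asyncio.run(upload_all())
print(f"{len(parts)} requests on {len(threads_used)} thread in {elapsed:.2f}s "
      f"(one at a time would take about {PART_COUNT * SIMULATED_LATENCY_S:.2f}s)")
```

All fifty simulated requests overlap on a single thread, so the total time stays close to one request's latency rather than the sum of all of them. A blocking client would need one thread per in-flight request to get the same overlap.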

The simpler config model is a bonus. No Hadoop key mirroring. No debugging sessions caused by settings that silently fail to propagate. Everything lives under a clean s3.* namespace. Platform teams running multi-tenant clusters get a unified configuration surface.
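
For readers who have not run into key mirroring, the sketch below models the idea in a deliberately simplified form. It assumes that `s3.*` entries in the Flink configuration are copied to `fs.s3a.*` keys for the Hadoop layer; the function name `mirror_to_hadoop` and the mistyped key are hypothetical, and the real plugins' rules are more involved.

```python
# Hypothetical, simplified model of Hadoop key mirroring; not Flink's actual code.
FLINK_PREFIX = "s3."
HADOOP_PREFIX = "fs.s3a."

def mirror_to_hadoop(flink_conf):
    """Copy every `s3.*` entry to the matching `fs.s3a.*` key; ignore everything else."""
    mirrored = {}
    for key, value in flink_conf.items():
        if key.startswith(FLINK_PREFIX):
            mirrored[HADOOP_PREFIX + key[len(FLINK_PREFIX):]] = value
        # Keys with any other prefix fall through without a warning.
    return mirrored

flink_conf = {
    "s3.connection.maximum": "200",
    "s3.path.style.access": "true",
    "s3a-connection.timeout": "5000",  # hypothetical typo: wrong prefix, silently dropped
}

hadoop_conf = mirror_to_hadoop(flink_conf)
for key, value in sorted(hadoop_conf.items()):
    print(f"{key} = {value}")
```

The mistyped entry never reaches the Hadoop side and nothing reports it, which is the kind of silent failure a single `s3.*` namespace with no translation step is meant to remove.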

Flink 2.3 ships flink-s3-fs-native as an experimental opt-in. Operationally, the path forward is clear: drop in the JAR, keep your existing s3.* settings in the Flink configuration file (config.yaml in Flink 2.x, which replaced flink-conf.yaml), restart the cluster, and start collecting faster checkpoints with less dependency baggage.


Source: Introducing Flink's Native S3 FileSystem: Built for Performance, Designed for Production
Domain: flink.apache.org
