FLOSS Weekly with Doc Searls

Nov 8th 2017

FLOSS Weekly 458

Crail

Open source user-level I/O architecture for the Apache data processing ecosystem

Although the show is no longer in production at TWiT, you can enjoy episodes from our archives.
Category: News

Crail is a storage platform for sharing performance-critical data in distributed data processing jobs at very high speed. Crail is built entirely upon principles of user-level I/O and specifically targets data center deployments with fast network and storage hardware (e.g., 100Gbps RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes of operation such as resource disaggregation or serverless computing. Crail is written in Java and integrates seamlessly with the Apache data processing ecosystem (e.g., Spark, Hadoop, Flink). It can be used as (i) a backbone to accelerate high-level data operations such as shuffle, reduce, or broadcast; (ii) a cache to store hot data that is queried repeatedly; or (iii) a storage platform for sharing inter-job data in complex multi-job pipelines. Last week, Crail was voted in as an Apache Incubator project.

Links