FeatherCast

The voice of The Apache Software Foundation

Writing the Hazelcast Jet Runner Jozsef Bartok

September 13, 2019
timothyarthur

Hazelcast Jet is a distributed data processing engine that treats all data as a stream. Jet is built on top of Hazelcast IMDG and thanks to this the Jet cluster can also play the role of the data source and sink. If you use Jet this way, you can achieve perfect data locality and top-of- the-class throughputs.
Jet uses cooperative multithreading (comparable to green threads), its processors correspond to standalone single-threaded tasks that process streams of data. By not depending on OS-level thread scheduling Jet can achieve greater throughput and better saturate the CPU cores. Since the theory underlying Apache Beam has also influenced the development of Jet and there are many conceptual similarities, attempting to write a Runner based on Jet came naturally to us. When we started however the first big obstacle we hit was the relative lack of documentation. Figuring out what exactly a Runner needed to do required reverse engineering existing runners.
A second obstacle turned out to be the generic nature of the user defined functions used by Beam. While Jet’s cooperative execution model allows for great performance, it requires the processing of the data to be non-blocking and this is hard to assure when running completely opaque and generic DoFns.
Yet another difficulty was presented by trying to adapt Jet’s fault tolerance and processing guarantee mechanism for the Runner, a process that, even though difficult, might be more solvable than the above mentioned non-blocking requirement.
Despite these difficulties implementing an initial version of our runner turned out to be quite doable, even though experimental it’s quite capable of running a majority of the existing Beam test suits and we are proud to say that we are now listed on the Capability Matrix. We can also showcase Nexmark test results of our runner and describe possible avenues for future development.

Leave a Reply

Powered by WordPress.com.
%d bloggers like this: