Apache Arrow Homepage

Fast

Arrow enables execution engines to take advantage of the latest SIMD (single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. A columnar data layout also makes better use of CPU caches by placing all data relevant to a column operation in as compact a format as possible.
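As a rough illustration of the columnar idea, the sketch below stores the same records row-wise and column-wise. It uses only the Python standard library, not Arrow's API, and the record fields and variable names are made up for the example. Python itself will not vectorize the loop; the point is only the layout that a native engine could run SIMD instructions over.

```python
from array import array

# Hypothetical records, stored row-wise: each row mixes all fields.
rows = [
    {"id": 1, "price": 9.5, "qty": 3},
    {"id": 2, "price": 4.0, "qty": 10},
    {"id": 3, "price": 7.25, "qty": 1},
]

# The same data stored column-wise: one contiguous, typed buffer per field.
columns = {
    "id": array("q", (r["id"] for r in rows)),        # 64-bit signed ints
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
    "qty": array("q", (r["qty"] for r in rows)),
}

# Row layout: summing prices walks every record and skips unrelated fields.
total_from_rows = sum(r["price"] for r in rows)

# Column layout: the sum reads a single compact buffer of doubles.
total_from_columns = sum(columns["price"])

# The price column occupies 8 bytes per value, stored back to back.
price_bytes = columns["price"].itemsize * len(columns["price"])
```

Both totals are the same; what differs is how much unrelated data sits between the values being read.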

Flexible

Arrow acts as a new high-performance interface between various systems. It also aims to support a wide variety of industry-standard programming languages. Implementations in Java, C, C++, and Python are underway, and more languages are expected soon.

Standard

Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm, making it the de facto standard for columnar in-memory analytics.

Each system has its own internal memory format
70-80% CPU wasted on serialization and deserialization
Similar functionality implemented in multiple projects

All systems utilize the same memory format
No overhead for cross-system communication
Projects can share functionality (e.g., a Parquet-to-Arrow reader)
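The contrast between the two lists can be sketched within a single Python process. This is only an analogy using the standard library (`json`, `array`, `memoryview`), not Arrow's actual mechanism for cross-system sharing, and the variable names are hypothetical.

```python
import json
from array import array

# A column of 64-bit floats held in one contiguous buffer.
prices = array("d", [9.5, 4.0, 7.25])

# Without a shared format: the producer serializes, the consumer parses,
# and the consumer ends up with its own separate copy of the data.
wire = json.dumps(prices.tolist())
copied = array("d", json.loads(wire))

# With a shared format: the consumer views the producer's buffer directly.
shared = memoryview(prices)

# A write by the producer is visible through the view, since no copy exists.
prices[0] = 10.0
```

After the final write, `shared[0]` reflects the new value while `copied[0]` still holds the old one, which shows that the serialized path produced an independent copy and the shared path did not.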

Name	Alias (email is <alias>@apache.org)
Jacques Nadeau	jacques
Todd Lipcon	todd
Ted Dunning	tdunning
Michael Stack	stack
P. Taylor Goetz	ptgoetz
Julian Hyde	jhyde
Reynold Xin	rxin
James Taylor	jamestaylor
Julien Le Dem	julien
Jake Luciani	jake
Jason Altekruse	json
Alex Levenson	alexlevenson
Parth Chandra	parthc
Marcel Kornacker	marcel
Steven Phillips	smp
Hanifi Gunes	hg
Abdelhakim Deneche	adeneche
Wes McKinney	wesm

Performance Advantage of Columnar In-Memory

Advantages of a Common Data Layer

Committers