
· 2 min read


We are extremely excited to introduce Velox as an open-source project with a mission to define common standards for modular data processing systems. Velox provides reusable, extensible, high-performance, and dialect-agnostic data processing components for building execution engines and enhancing data management systems. We envision Velox becoming the de facto execution engine for Arrow-compatible data formats, powering ML and analytical workloads.

The Velox team has been partnering with companies such as Ahana, Intel, and Voltron Data as well as various academic institutions to accelerate innovation and development in the data management industry.

Looking to the future, we believe Velox’s unified and modular nature has the potential to disrupt the data management industry. It will allow us to deepen our partnership with hardware vendors and proactively adapt our unified software stack as hardware advances. We believe that modularity and reusability are the future of database system development, and we hope a vibrant open-source community will help us on this journey.

Quick Introduction

If you are excited to learn what’s under the hood, refer to the Velox research paper.

If you are interested in contributing, visit our Contributing guide on GitHub. For technical discussions and exploring what’s happening within our community, refer to the discussions section.

We will be publishing a series of technical blogs on various topics on our website soon!

For more updates and exciting news, follow us on Twitter.

· 2 min read
Deepak Majeti

Documentation

  • Add documentation for complex types writers.

Core Library

  • Add support for INTERVAL DAY TO SECOND Presto type.
  • Allow cast between DATE and TIMESTAMP types.
  • Allow cast from JSON to scalar, ARRAY, and MAP types.
  • Add GroupIdNode and GroupId operator to support aggregations over grouping sets.
  • Add support for function signatures with DECIMAL input and return types. Formulas that compute the return precision and scale from the input precisions and scales are evaluated with a parser built using flex and bison.
  • Add support for conversion of DuckDB DECIMAL values to Velox DECIMAL values.
  • Add support for running tasks on the caller's thread.
  • Fix expression evaluation to disable sub-expression optimization for non-deterministic functions.
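
The last item above concerns sharing the result of a repeated sub-expression. The sketch below is not Velox code. It is a minimal Python illustration, under assumed names (`FUNCTIONS`, `next_id`, `evaluate`), of why such sharing is only safe for deterministic sub-expressions: a repeated call to a non-deterministic function has to be evaluated each time it appears.

```python
from itertools import count

# Hypothetical function registry: name -> (callable, is_deterministic).
_counter = count(1)
FUNCTIONS = {
    "plus": (lambda a, b: a + b, True),
    # Stands in for any non-deterministic function (random numbers, UUIDs, ...).
    "next_id": (lambda: next(_counter), False),
}


def is_deterministic(expr):
    """An expression is deterministic if every call inside it is."""
    if expr[0] != "call":
        return True
    _, name, *args = expr
    return FUNCTIONS[name][1] and all(is_deterministic(a) for a in args)


def evaluate(expr, row, cache):
    """Evaluate an expression tree made of nested tuples.

    Results of identical sub-expressions are shared through `cache`,
    but only when the sub-expression is deterministic.
    """
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "col":
        return row[expr[1]]
    _, name, *args = expr
    shareable = is_deterministic(expr)
    if shareable and expr in cache:
        return cache[expr]
    result = FUNCTIONS[name][0](*(evaluate(a, row, cache) for a in args))
    if shareable:
        cache[expr] = result
    return result
```

With this sketch, `plus(next_id(), next_id())` calls `next_id` twice and adds two different values, whereas sharing the first result would silently return twice the same value.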

Presto Functions

  • Add degrees, e, and sha512 functions.
  • Add aggregate function map_union.
  • Optimize zip for the case when all arrays are flat and the same size.
  • Extend plus and minus functions to support DATE and INTERVAL DAY TO SECOND argument types.
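
For readers unfamiliar with zip, the following Python sketch illustrates the semantics of the Presto function (element-wise merge into rows, with NULL padding for shorter arrays) and why equal-size inputs allow a simpler path. It is an illustration only: `zip_arrays` is a hypothetical name, Python lists stand in for array vectors, and the "flat" encoding condition from the release note is not modeled.

```python
from itertools import zip_longest


def zip_arrays(*arrays):
    """Merge arrays element-wise into a list of rows (tuples).

    Shorter arrays are padded with None, mirroring NULL padding in Presto's zip.
    """
    if len({len(a) for a in arrays}) == 1:
        # Fast path: all inputs have the same size, so no padding is needed.
        return list(zip(*arrays))
    # General path: pad the shorter inputs with None.
    return list(zip_longest(*arrays, fillvalue=None))
```

For example, `zip_arrays([1, 2], ["a", "b"])` takes the fast path, while `zip_arrays([1, 2], ["a"])` falls back to the padding path.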

Hive Connector

  • Add support for reading files from HDFS.
  • Add limited ORC support.
  • Optimize NOT IN (<list of integers>) filters pushed down into DWRF reader.

TPC-H Connector

  • Add totalParts and partNumber to TpchSplit.
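
One way to picture what a (totalParts, partNumber) pair can express is a split of a table's rows into contiguous ranges, one per part. The sketch below is an assumption made for illustration, not the connector's actual logic: the helper name `part_row_range`, zero-based part numbers, and the even distribution of the remainder are all hypothetical.

```python
def part_row_range(total_rows, total_parts, part_number):
    """Return the half-open row range [start, end) for one part.

    Assumes zero-based part numbers; the first `total_rows % total_parts`
    parts each receive one extra row so that sizes differ by at most one.
    """
    if total_parts <= 0 or not 0 <= part_number < total_parts:
        raise ValueError("part_number must be in [0, total_parts)")
    base, extra = divmod(total_rows, total_parts)
    start = part_number * base + min(part_number, extra)
    size = base + (1 if part_number < extra else 0)
    return start, start + size
```

Under these assumptions the ranges for all parts are disjoint and together cover every row exactly once, so independent workers can each generate one part.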

Performance and Correctness

  • Add q3 to TPC-H benchmark.
  • Add a utility to the TPC-H connector for benchmarking dataset generation speed.
  • Optimize constant aggregation mask.
  • Optimize VectorWriter for a subset of simple functions that return strings.
  • Optimize DictionaryVector wrapping LazyVector to load only necessary rows.
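
The last item above can be pictured with a small model. The sketch below is a conceptual Python illustration rather than Velox's implementation: `LazyColumn` and `read_dictionary` are hypothetical names, a dictionary is modeled as a list of indices into a lazily loaded base column, and the point shown is that only the base rows actually referenced by the selected positions get loaded.

```python
class LazyColumn:
    """A column whose values are fetched on demand by a loader callback."""

    def __init__(self, loader):
        # loader(rows) returns {row: value} for the requested base rows.
        self._loader = loader
        self.loaded = {}

    def load(self, rows):
        # Fetch only the rows that have not been loaded yet.
        missing = sorted(set(rows) - set(self.loaded))
        if missing:
            self.loaded.update(self._loader(missing))

    def value_at(self, row):
        return self.loaded[row]


def read_dictionary(indices, lazy, selected_positions):
    """Read the selected positions of a dictionary wrapped around a lazy column."""
    # Translate the selected positions into the base rows they point to.
    needed = {indices[p] for p in selected_positions}
    lazy.load(needed)
    return [lazy.value_at(indices[p]) for p in selected_positions]
```

If a dictionary with indices `[4, 0, 4, 2]` is read at positions 0 and 2 only, the loader is asked for base row 4 and nothing else.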

Debugging Experience

  • Control stack trace collection separately for user exceptions and runtime exceptions.

Credits

Adam Simpkins, Aditi Pandit, Amit Dutta, Behnam Robatmili, Chad Austin, Connor Devlin, Daniel Ng, Dark Knight, Deepak Majeti, Denis Yaroshevskiy, Huameng Jiang, Jake Jung, Jialiang Tan, Jie1 Zhang, Jimmy Lu, Karteek Murthy, Katie Mancini, Ke Jia, Kevin Wilfong, Krishna Pai, Laith Sakka, Masha Basmanova, Michael Shang, Mindaugas Rukas, Orri Erling, Patrick Stuedi, Paul Saab, Pedro Eugenio Rocha Pedreira, Pramod Sathyanarayana, Sahana CB, Sergey Pershin, Wei He, Xavier Deguillard, Xiaoxuan Meng, Yating Zhou, Yoav Helfman, Zeyi (Rice) Fan, Zhenyuan Zhao, artem.malyshev, benitakbritto, frankobe, usurai, yingsu00, zhaozhenhui