Skip to main content

3 posts tagged with "primer"

View All Tags

A Velox Primer, Part 3

· 10 min read
Orri Erling
Software Engineer @ Meta
Pedro Pedreira
Software Engineer @ Meta

At the end of the previous article, we were halfway through running our first distributed query:

SELECT l_partkey, count(*) FROM lineitem GROUP BY l_partkey;

We discussed how a query starts, how tasks are set up, and the interactions between plans, operators, and drivers. We have also presented how the first stage of the query is executed, from table scan to partitioned output - or the producer side of the shuffle.  

In this article, we will discuss the second query stage, or the consumer side of the shuffle.

A Velox Primer, Part 2

· 9 min read
Orri Erling
Software Engineer @ Meta
Pedro Pedreira
Software Engineer @ Meta

In this article, we will discuss how a distributed compute engine executes a query similar to the one presented in our first article:

SELECT l_partkey, count(*) FROM lineitem GROUP BY l_partkey;

We use the TPC-H schema to illustrate the example, and Prestissimo as the compute engine orchestrating distributed query execution. Prestissimo is responsible for the query engine frontend (parsing, resolving metadata, planning, optimizing) and distributed execution (allocating resources and shipping query fragments), and Velox is responsible for the execution of plan fragments within a single worker node. Throughout this article, we will present which functions are performed by Velox and which by the distributed engine - Prestissimo, in this example.

A Velox Primer, Part 1

· 6 min read
Orri Erling
Software Engineer @ Meta
Pedro Pedreira
Software Engineer @ Meta

This is the first part of a series of short articles that will take you through Velox’s internal structures and concepts. In this first part, we will discuss how distributed queries are executed, how data is shuffled among different stages, and present Velox concepts such as Tasks, Splits, Pipelines, Drivers, and Operators that enable such functionality.