One post tagged with "query-execution"

Why RIGHT SEMI JOIN Can Be Slower Than LEFT SEMI JOIN in Velox

June 26, 2026 · 5 min read

Software Engineer

TL;DR

A query rewrite from A LEFT SEMI JOIN B to B RIGHT SEMI JOIN A can be much slower in Velox, even though the two forms are semantically equivalent.

At first glance, this rewrite looks attractive when A is much smaller than B: building on A should reduce build-side work and often helps regular hash joins. In one real query, we made exactly this change expecting a speedup, but instead observed roughly a 10x regression.

The root cause is execution asymmetry:

RIGHT SEMI needs to mark build-side rows as matched.
That marking (setProbedFlag) is random-memory-write heavy.
With duplicate or skewed keys, redundant marking work grows quickly.

A targeted optimization for RIGHT SEMI FILTER without extra filter moves marking into probe-time hit traversal and adds early stop logic for duplicate chains, significantly reducing redundant work.

TL;DR​

TL;DR