Skip to main content

One post tagged with "query-execution"

View All Tags

Why RIGHT SEMI JOIN Can Be Slower Than LEFT SEMI JOIN in Velox

· 5 min read
Rui Mo
Software Engineer

TL;DR

A query rewrite from A LEFT SEMI JOIN B to B RIGHT SEMI JOIN A can be much slower in Velox, even though the two forms are semantically equivalent.

At first glance, this rewrite looks attractive when A is much smaller than B: building on A should reduce build-side work and often helps regular hash joins. In one real query, we made exactly this change expecting a speedup, but instead observed roughly a 10x regression.

The root cause is execution asymmetry:

  • RIGHT SEMI needs to mark build-side rows as matched.
  • That marking (setProbedFlag) is random-memory-write heavy.
  • With duplicate or skewed keys, redundant marking work grows quickly.

A targeted optimization for RIGHT SEMI FILTER without extra filter moves marking into probe-time hit traversal and adds early stop logic for duplicate chains, significantly reducing redundant work.