Skip to main content

3 posts tagged with "performance"

View All Tags

Accelerating Unicode string processing with SIMD in Velox

· 8 min read
Ping Liu
Software Engineer
Yuhta
Software Engineer @ Meta
Masha Basmanova
Software Engineer @ Meta

TL;DR

We optimized two Unicode string helpers — cappedLengthUnicode and cappedByteLengthUnicode — by replacing byte-by-byte utf8proc_char_length calls with a SIMD-based scanning loop. The new implementation processes register-width blocks at a time: pure-ASCII blocks skip in one step, while mixed blocks use bitmask arithmetic to count character starts. Both helpers now share a single parameterized template, eliminating code duplication.

On a comprehensive benchmark matrix covering string lengths from 4 to 1024 bytes and ASCII ratios from 0% to 100%, we measured 2–15× speedups across most configurations, with no regressions on Unicode-heavy inputs. The optimization benefits all callers of these helpers, including the Iceberg truncate transform and various string functions.

Improve LIKE's performance

· 5 min read
James Xu
Software Engineer @ Alibaba

What is LIKE?

LIKE is a very useful SQL operator. It is used to do string pattern matching. The following examples for LIKE usage are from the Presto doc:

SELECT * FROM (VALUES ('abc'), ('bcd'), ('cde')) AS t (name)
WHERE name LIKE '%b%'
--returns 'abc' and 'bcd'

SELECT * FROM (VALUES ('abc'), ('bcd'), ('cde')) AS t (name)
WHERE name LIKE '_b%'
--returns 'abc'

SELECT * FROM (VALUES ('a_c'), ('_cd'), ('cde')) AS t (name)
WHERE name LIKE '%#_%' ESCAPE '#'
--returns 'a_c' and '_cd'

Learnings from optimizing try_cast

· 4 min read
Laith Sakka
Software Engineer @ Meta

One of the queries shadowed internally at Meta was much slower in Velox compared to presto(2 CPU days vs. 4.5 CPU hours). Initial investigation identified that the overhead is related to casting empty strings inside a try_cast.

In this blogpost I summarize my learnings from investigating and optimizing try_cast.