Accelerating Unicode string processing with SIMD in Velox
TL;DR
We optimized two Unicode string helpers — cappedLengthUnicode and
cappedByteLengthUnicode — by replacing byte-by-byte utf8proc_char_length
calls with a SIMD-based scanning loop. The new implementation processes one
register-width block at a time: pure-ASCII blocks are skipped in a single step, while
mixed blocks use bitmask arithmetic to count character starts. Both helpers now
share a single parameterized template, eliminating code duplication.
On a comprehensive benchmark matrix covering string lengths from 4 to 1024 bytes and ASCII ratios from 0% to 100%, we measured 2–15× speedups across most configurations, with no regressions on Unicode-heavy inputs. The optimization benefits all callers of these helpers, including the Iceberg truncate transform and various string functions.
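
The block-scanning idea can be illustrated with a minimal sketch. This is not the Velox implementation: it uses a portable 8-byte uint64_t word as a stand-in for a SIMD register, and the names cappedScanSketch, cappedLengthSketch, and cappedByteLengthSketch are hypothetical. It assumes valid UTF-8 input, and it assumes the two helpers return, respectively, the character count capped at maxChars and the byte length of the first maxChars characters.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, not Velox code. One template serves both helpers:
// kReturnByteLength selects whether bytes or characters are returned.
template <bool kReturnByteLength>
std::size_t cappedScanSketch(
    const char* input, std::size_t size, std::size_t maxChars) {
  constexpr std::uint64_t kHighBits = 0x8080808080808080ULL;
  std::size_t pos = 0;   // bytes consumed so far
  std::size_t chars = 0; // character starts counted so far

  // Block loop: only runs while a full block cannot overshoot maxChars.
  while (pos + 8 <= size && maxChars - chars >= 8) {
    std::uint64_t block;
    std::memcpy(&block, input + pos, 8);
    const std::uint64_t high = block & kHighBits;
    if (high == 0) {
      // Pure-ASCII block: every byte is one character.
      chars += 8;
    } else {
      // Continuation bytes look like 10xxxxxx: bit 7 set, bit 6 clear.
      // Shifting left by one moves each byte's bit 6 into its bit 7.
      const std::uint64_t cont = high & ~(block << 1);
      chars += 8 - std::bitset<64>(cont).count();
    }
    pos += 8;
  }

  // Scalar tail: finish the remaining bytes and stop at the cap. Trailing
  // continuation bytes of the last counted character are still consumed.
  while (pos < size) {
    const bool isStart =
        (static_cast<unsigned char>(input[pos]) & 0xC0) != 0x80;
    if (isStart) {
      if (chars == maxChars) {
        break;
      }
      ++chars;
    }
    ++pos;
  }
  return kReturnByteLength ? pos : chars;
}

// Thin wrappers over the shared template (names are illustrative).
inline std::size_t cappedLengthSketch(
    const char* input, std::size_t size, std::size_t maxChars) {
  return cappedScanSketch<false>(input, size, maxChars);
}

inline std::size_t cappedByteLengthSketch(
    const char* input, std::size_t size, std::size_t maxChars) {
  return cappedScanSketch<true>(input, size, maxChars);
}
```

A real SIMD version would presumably load a wider register and derive the same two masks (high bit set, continuation byte) with vector compares and a movemask-style reduction. The structure stays the same: an ASCII fast path, a popcount over the mask for mixed blocks, and a scalar tail near the cap or the end of the string.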




