FlatMapVector Adoption for Scaling High-Performance AI/ML Data Pre-Processing

· 7 min read
Peter Enescu
Software Engineer @ Meta
Pedro Pedreira
Software Engineer @ Meta
Kk Pulla
Software Engineer @ Meta
Yuhta
Software Engineer @ Meta
Kevin Wilfong
Software Engineer @ Meta

Context

At Meta, features used for AI use cases are largely combined and stored in warehouse tables as map columns. Because these features are accessed and manipulated so frequently, modeling each one as a top-level column scales poorly: it would result in extremely wide tables and frequent schema changes. Modeling features as maps avoids both problems and provides maximum flexibility.

In a traditional columnar layout, map columns are typically represented in memory by a few data streams. The diagram below illustrates an example dataset. Two main buffers or streams are allocated for map keys and values. Additional buffers are used for null flags and map offsets or lengths (note that map keys are non-nullable):
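To make the layout concrete in code, here is a minimal Python sketch of how a column of maps could be flattened into those streams. It is an illustration only, not the actual engine implementation: the function names `to_columnar_map` and `read_row`, the use of plain Python lists as buffers, and the choice to store both offsets and lengths are all assumptions made for this example.

```python
from typing import Dict, List, Optional

Row = Optional[Dict[int, Optional[float]]]


def to_columnar_map(rows: List[Row]) -> dict:
    """Flatten a column of maps into keys/values/offsets/lengths/nulls streams."""
    keys: List[int] = []                 # all map keys, back to back (never null)
    values: List[Optional[float]] = []   # all map values, aligned with keys
    offsets: List[int] = []              # where each row's entries start in keys/values
    lengths: List[int] = []              # how many entries each row has
    nulls: List[bool] = []               # True when the whole map is null

    for row in rows:
        offsets.append(len(keys))
        if row is None:
            # A null map contributes no keys or values.
            nulls.append(True)
            lengths.append(0)
            continue
        nulls.append(False)
        lengths.append(len(row))
        for key, value in row.items():
            keys.append(key)
            values.append(value)

    return {
        "keys": keys,
        "values": values,
        "offsets": offsets,
        "lengths": lengths,
        "nulls": nulls,
    }


def read_row(column: dict, i: int) -> Row:
    """Rebuild the map stored at row i from the flattened streams."""
    if column["nulls"][i]:
        return None
    start = column["offsets"][i]
    end = start + column["lengths"][i]
    return dict(zip(column["keys"][start:end], column["values"][start:end]))


# Four rows: a two-entry map, a null map, an empty map, a one-entry map.
rows = [{1: 0.5, 2: 1.5}, None, {}, {2: 3.0}]
column = to_columnar_map(rows)
```

Note that reading a single feature (one key) from every row requires walking each row's slice of the keys stream, since entries for the same key are scattered across the buffer rather than stored contiguously.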