3 posts tagged with "packaging"

Enabling Shared Library Builds in Velox

· 8 min read
Jacob Wujciak-Jens
Software Engineer @ Voltron Data

In this post, I’ll share how we unblocked shared library builds in Velox, the challenges we encountered with our large CMake build system, and the creative solution that let us move forward without disrupting contributors or downstream users.

The State of the Velox Build System

Velox’s codebase was started in Meta’s internal monorepo, which still serves as the source of truth. Changes from pull requests in the GitHub repository are not merged directly via the web UI. Instead, they are imported as a ‘diff’ into Phabricator, Meta’s internal review and CI tool. There, each diff has to pass an additional set of CI checks before being merged into the monorepo and, in turn, exported to GitHub as a commit on the ‘main’ branch.

Internally, Meta uses the Buck2 build system which, like its predecessor Buck, promotes small, granular build targets to improve caching and distributed compilation. Externally, however, the open-source community relies on CMake as a build system. When the CMake build system was first added, its structure mirrored the granular nature of the internal build targets.

The result was hundreds of targets: over one hundred library targets, built as static libraries, and more than two hundred executables for tests and examples. While Buck(2) is optimized for managing such granular targets, the same cannot be said for CMake. Each subdirectory maintained its own CMakeLists.txt, and header includes were managed through global include_directories(), with no direct link between targets and their associated header files. This approach encouraged tight coupling across module boundaries. Over the years, dependencies accumulated organically, resulting in a highly interconnected, and in parts cyclic, dependency graph.
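
To illustrate the directory-scoped pattern (the target and file names below are made up for the example): once the global call is made, any target can include any header without declaring a dependency on the library that owns it.

# Directory-scoped include handling: every target declared after this
# call sees all headers in the source tree, whether or not it links
# against the library that owns them.
include_directories(${CMAKE_SOURCE_DIR})

# Hypothetical library target; nothing ties it to its own headers.
add_library(velox_example_lib ExampleFile.cpp)

# The target-based alternative attaches the include path to the target:
# target_include_directories(velox_example_lib PUBLIC ${CMAKE_SOURCE_DIR})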

Static Linking vs. Shared Linking

The combination of 300+ targets and static linking resulted in a massive build tree, dozens of GiB when building in release mode and several times larger when built with debug information. This grew to the point where we had to provision larger CI runners just to accommodate the build tree in debug mode!

Individual test executables could reach several GiB, making it impractical to transfer them between runners to parallelize test execution across different CI machines. This significantly delayed CI feedback for developers pushing changes to a PR.

This size is the result of each static library containing a full copy of all of its dependencies' object files. With more than one hundred strongly connected libraries, we end up with countless redundant copies of the same objects scattered throughout the build tree, blowing up the total size.

CMake usually requires the library dependency graph to be acyclic (a DAG). However, it allows circular dependencies between static libraries, because the linker invocation can be adjusted to work around the resulting missing-symbol issues.

For example, say library A depends on foo() from library B, and B in turn depends on bar() defined in A. If we link them in the order A B, the linker will find foo() in B for use in A but will fail to find bar() for use in B. This happens because the linker processes the libraries from left to right, and symbols that are not (yet) required are effectively ignored.

The trick is simply to repeat the entire circular group in the linker arguments: A B A B. Previously ignored symbols are now in the list of required symbols and can be resolved when the linker processes the repeated libraries.
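
In CMake terms, declaring such a cycle between two static library targets is enough; CMake applies the repetition on the link line automatically. A minimal sketch with two hypothetical targets:

# Two hypothetical static libraries with a circular dependency.
add_library(A STATIC a.cpp)
add_library(B STATIC b.cpp)

# CMake accepts this cycle because both targets are static libraries;
# it repeats the group on the link line (roughly "libA.a libB.a libA.a libB.a")
# so that all symbols can be resolved.
target_link_libraries(A PUBLIC B)
target_link_libraries(B PUBLIC A)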

However, there is no such workaround for shared libraries. So the moment we attempted to build Velox as a set of shared libraries, CMake failed to configure. The textbook solution would involve refactoring the dependencies between targets, explicitly marking each one as PUBLIC (propagated to both direct and transitive dependents) or PRIVATE (required only to build the target itself), and manually breaking the cycles. But with hundreds of targets and years of organic growth, this became an unmanageable task.
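
For reference, explicit scoping looks like this in CMake; the targets shown are illustrative, not necessarily Velox's real ones:

# velox_vector exposes velox_type in its public headers, so that
# dependency must propagate to anything linking velox_vector.
# fmt is only used in .cpp files, so it stays an implementation detail.
target_link_libraries(
  velox_vector
  PUBLIC velox_type
  PRIVATE fmt::fmt)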

Solution Attempt 1: Automate

Manually addressing this was unmanageable, so our first instinct as software engineers was, of course, to automate the process. I wrote a Python package that parses all CMakeLists.txt files in the repository using ANTLR4, tracks which source files and headers belong to which target, and reconstructs the library dependency graph. Based on the graph, and on whether a header is included from a source file or from another header, the tool classifies each linked target as either PRIVATE or PUBLIC.

While this worked locally and made it easier to resolve the circular dependencies, it proved far too brittle and disruptive to implement at scale. Velox's high development velocity meant many parts of the code base changed daily, making it impossible to land modifications across more than 250 CMake files in a timely manner while also allowing for thorough review and testing. Maintaining the carefully constructed DAG of dependencies would also have required an unsustainable amount of specialized reviewer attention, time that could be spent much more productively elsewhere.

Solution: Monolithic Library via CMake Wrappers

Given these practical constraints, we needed a solution that would unblock shared builds without disrupting ongoing development. To achieve this, we used a pattern common in larger CMake code bases: project-specific wrapper functions. A special thanks to Marcus Hanwell for introducing me to the pattern he successfully used in VTK.

We created wrapper functions for a selection of the CMake commands that create and manage build targets: add_library became velox_add_library, target_link_libraries turned into velox_link_libraries, and so on. These functions simply forward their arguments to the wrapped core command, unless the option to build Velox as a monolithic library is set. In that case, only a single library target is created instead of roughly 100.

Each call to velox_add_library then just adds the relevant source files to the main velox target. To ensure drop-in compatibility, the wrapper creates an ALIAS target referring to velox for every original library target. This way, all existing code (downstream consumers as well as old and new test targets) can continue to link against the accustomed velox_xyz targets, which all effectively point to the same monolithic library. There is no need to adjust CMake files throughout the codebase, keeping the migration quick and entirely transparent to developers and users.

Some special-case libraries, such as those involving the CUDA driver, still needed to be built separately for runtime or integration reasons. The wrapper logic allows exemptions for these, letting them be built as standalone libraries even in monolithic mode.

Here's a simplified version of the relevant logic:

function(velox_add_library TARGET)
  if(VELOX_MONO_LIBRARY)
    if(TARGET velox)
      # Target already exists, append the sources to it.
      target_sources(velox PRIVATE ${ARGN})
    else()
      add_library(velox ${ARGN})
    endif()
    # Create an alias so existing consumers can keep using ${TARGET}.
    add_library(${TARGET} ALIAS velox)
  else()
    # Monolithic mode is off, just call add_library as usual.
    add_library(${TARGET} ${ARGN})
  endif()
endfunction()
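
The link wrapper follows the same idea. Here is a minimal sketch of what it could look like (an assumption for illustration, not the exact upstream implementation): in monolithic mode every internal velox_* name is just an alias of velox, so only external dependencies need to be forwarded to the single target.

function(velox_link_libraries TARGET)
  if(VELOX_MONO_LIBRARY)
    foreach(dep IN LISTS ARGN)
      # Skip scope keywords and internal targets, which alias 'velox'
      # anyway; everything else is an external dependency of 'velox'.
      if(dep MATCHES "^(PUBLIC|PRIVATE|INTERFACE)$" OR dep MATCHES "^velox_")
        continue()
      endif()
      target_link_libraries(velox PRIVATE ${dep})
    endforeach()
  else()
    target_link_libraries(${TARGET} ${ARGN})
  endif()
endfunction()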

This new monolithic library obviously doesn’t suffer from cyclic dependencies, as it only has external dependencies. So after implementing the wrapper functions, building Velox as a shared library only required some minor adjustments!

We have since switched the default configuration to build the monolithic library, and our main PR CI and macOS builds now build it as a shared library.

Result

The results of our efforts can be clearly seen in our Build Metrics Report, with the size of the resulting binaries being the biggest win, as shown in Table 1. The workflow that collects these metrics uses our regular CI image with GCC 12.2 and ld 2.38.

Velox Build Metrics Report

The new executables built against the shared libvelox.so are a mere ~4% of their previous size! This unlocks improvements across the board: the local developer experience, packaging and deployments for downstream users, and CI.

Size of executables | Static | Shared
Release             | 18.31G | 0.76G
Debug               | 244G   | 8.8G

Table 1: Total size of test, fuzzer and benchmark executables

Less impressive but still notable is the reduction of the debug build time by almost two hours¹. The improvement is entirely explained by the much faster link time for the shared build: ~30 minutes, compared to the 2 hours and 20 minutes it takes to link the static build.

Total CPU Time | Static | Shared
Release        | 13:21h | 13:23h
Debug          | 12:50h | 11:07h

Table 2: Total time spent on compilation and linking across all cores

In addition to unblocking shared builds, the wrapper function pattern allows future feature additions or build-system-wide adjustments with minimal disruption. We already use the wrappers to automate the installation of header files alongside the Velox libraries, with no changes to the targets themselves.
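
As a rough illustration (a sketch under assumptions, not the actual Velox implementation), such a wrapper can derive each library's header install location from its source directory:

# Hypothetical helper: install all headers of the current directory,
# preserving the directory layout below the project root.
function(velox_install_headers)
  file(GLOB headers CONFIGURE_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/*.h")
  file(RELATIVE_PATH subdir "${PROJECT_SOURCE_DIR}"
       "${CMAKE_CURRENT_SOURCE_DIR}")
  install(FILES ${headers} DESTINATION "include/${subdir}")
endfunction()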

The wrapper functions will also help with defining and enforcing the public Velox API, by allowing us to manage header visibility, component libraries (such as the GPU and cloud extensions), and versioning in a consistent and centralized manner.

Footnotes

  1. The wall time the user experiences is of course much lower: roughly CPU time divided by n, where n is the number of cores used for the build.

Optimizing and Migrating Velox CI Workloads to Github Actions

· 4 min read
Jacob Wujciak-Jens
Software Engineer @ Voltron Data
Krishna Pai
Software Engineer @ Meta

TL;DR

In late 2023, the Meta OSS (Open Source Software) team asked all Meta teams to move their CI deployments from CircleCI to GitHub Actions. Voltron Data and Meta collaborated to migrate all of the deployed Velox CI jobs. At the time, Velox CI spend for 2024 was on track to overshoot the allocated resources by a considerable amount of money. As part of this migration effort, the CI workloads were consolidated and optimized by Q2 2024, bringing the projected 2024 CI spend down by 51%.

Velox’s Continuous Integration Workload

Continuous Integration (CI) is crucial for Velox’s success as an open source project: it helps protect against bugs and errors, reduces the likelihood of conflicts, and increases community trust in the project. CI ensures that Velox builds and works well on a myriad of system architectures, operating systems and compilers, in addition to the ones used internally at Meta. The OSS build of Velox also supports additional features that aren't used internally at Meta (for example, support for cloud blob stores).

When a pull request is submitted to Velox, the following jobs are executed:

  1. Linting and Formatting workflows:
    1. Header checks
    2. License checks
    3. Basic Linters
  2. Ensure Velox builds on various platforms
    1. macOS (Intel, M1)
    2. Linux (Ubuntu/CentOS)
  3. Ensure Velox builds under its various configurations
    1. Debug / Release builds
    2. Default Velox build
    3. Velox with support for Parquet, Arrow and external adapters (S3/HDFS/GCS etc.)
    4. PyVelox builds
  4. Run prerequisite tests
    1. Unit Tests
    2. Benchmarking Tests
      1. Conbench is used to store and compare results, and also alert users on regressions
    3. Various Fuzzer Tests (Expression / Aggregation / Exchange / Join etc.)
    4. Signature Check and Biased Fuzzer Tests (Expression / Aggregation)
    5. Fuzzer Tests using Presto as source of truth
  5. Docker Image build jobs
    1. If an underlying dependency is changed, a new Docker CI image is built for
      1. Ubuntu Linux
      2. CentOS
      3. Presto Linux image
  6. Documentation build and publish Job
    1. If underlying documentation is changed, Velox documentation pages are rebuilt and published
    2. Netlify is used for publishing Velox web pages

Velox CI Optimization

The previous CircleCI implementation grew organically and was largely unoptimized, resulting in long build times and significantly higher cost. The migration to GitHub Actions was an opportunity to take a holistic view of the CI deployments and actively optimize them to reduce build times and CI spend. Note, however, that we continue to invest in reducing test times to further improve Velox reliability, stability and developer experience. Some of the completed optimizations are:

  1. Persisting build artifacts across builds: During every build, the object files and binaries produced are cached. In addition, artifacts such as scalar and aggregate function signatures are produced. These signatures are compared against the baseline version to determine whether the changes in the current PR are backwards incompatible, and to bias fuzzer runs towards the newly added changes. Persisting these artifacts in a stash saves one build cycle.

  2. Optimizing our instances: Building Velox is a memory- and CPU-intensive job, so beefy instances (16-core machines) are used for the build. After the build, the artifacts are copied to smaller instances (4-core machines) to run fuzzer tests and other jobs. Since these fuzzers often run for an hour and are less resource-intensive than the build process, this resulted in significant CI savings while increasing test coverage.

Instrumenting Velox CI Builds

Velox CI builds were instrumented with Conbench so that we can capture various metrics about the builds:

  1. Build times at the translation unit, library and project level.
  2. Binary sizes at the translation unit (.o), library (.a / .so) and executable level.
  3. Memory pressure.
  4. How changes affect binary sizes over time.

A nightly job captures these build metrics and uploads them to Conbench. The Velox build metrics report is available here: Velox Build Metrics Report

Acknowledgements

A large part of the credit goes to Jacob Wujciak and the team at Voltron Data. We would also like to thank other collaborators in the Open Source Community and at Meta, including but not limited to:

Meta: Sridhar Anumandla, Pedro Eugenio Rocha Pedreira, Deepak Majeti, Meta OSS Team, and others

Voltron Data: Jacob Wujciak, Austin Dickey, Marcus Hanwell, Sri Nadukudy, and others

Improving the Velox Build Experience

· 5 min read
Jacob Wujciak-Jens
Software Engineer @ Voltron Data
Raúl Cumplido
Software Engineer @ Voltron Data
Krishna Pai
Software Engineer @ Meta

When Velox was open sourced in August 2021, it was not nearly as usable and portable as it is today. For Velox to become the unified execution engine blurring the boundaries between data analytics and ML, we needed it to be easy to build and package on multiple platforms and to support a wide range of hardware architectures. Supporting all of these platforms also means ensuring that Velox remains fast and that regressions are caught early.

To improve the Velox experience for users and community developers, Velox has partnered with Voltron Data to help make Velox more accessible and user-friendly. In this blog post, we will examine the challenges we faced, the improvements that have already been made, and the ones yet to come.

Enhancements & Improvements

Velox was a product of the monorepo and required installing its dependencies on the host system via a script. Any change in the state of the host system could cause a build failure or introduce version conflicts between dependencies. Fixing these challenges to help the Velox community was a big focus of our collaboration with the Voltron Data team: we wanted to improve the overall Velox user experience by making Velox easy to consume across a wide range of platforms and thereby accelerate its adoption.

We chose hermetic builds as a solution to the aforementioned problems, as they provide a number of benefits. Hermetic builds¹ improve reproducibility by isolating the build from the state of the host machine, producing the same result for any given commit in the Velox repository. This requires precise dependency management.

The first major step in moving towards hermetic builds was the integration of a new dependency management system that is able to download, configure and build the necessary dependencies within the Velox build process. This new system also gives users the option to use already installed system dependencies. We hope this work will increase adoption of Velox in downstream projects and make troubleshooting of build issues easier, as well as improve overall reliability and stability.
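
A common way to implement such a system in CMake is sketched below, assuming a find_package lookup with a FetchContent fallback (the dependency, version and URL are only illustrative, not Velox's actual mechanism):

include(FetchContent)

# Prefer a copy of the dependency already installed on the system...
find_package(fmt QUIET)
if(NOT fmt_FOUND)
  # ...otherwise download and build a pinned version as part of the build.
  FetchContent_Declare(
    fmt
    URL https://github.com/fmtlib/fmt/archive/refs/tags/10.2.1.tar.gz)
  FetchContent_MakeAvailable(fmt)
endif()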

We also wanted to lower the barrier to entry for contributions to Velox. Therefore, we created Docker development images for both Ubuntu and CentOS, and we now publish them automatically when changes are merged. We hope this work will help speed up the development process by allowing developers to stand up a development environment quickly, without having to install third-party dependencies locally. We also use these images in the Velox CI to lower build times and speed up the feedback loop when proposing a PR.

# Run the development image from the root of the Velox repository
# to build and test Velox
docker compose run --rm ubuntu-cpp

An important non-technical improvement is the introduction of new issue templates and utility scripts. These guide troubleshooting and make it easier for users and contributors to get help and support from the relevant Velox developers via GitHub when they need it, improving the overall experience for the community.

Lastly, we implemented new nightly builds to increase the overall reliability and stability of Velox, as well as test the integration with downstream community projects.

To enable easy access to Velox from Python, we built CI infrastructure to generate and publish pre-built binary wheels for PyVelox (the Velox Python Bindings). We also improved Conda support by contributing to upstream feedstocks.

# Try PyVelox today!
pip install pyvelox

Future Goals

We will continue the work of moving all dependencies to the new dependency management system to move closer to hermetic builds and make development and usage as smooth as possible.

In the same theme, the next major goal is refactoring the existing CMake build system into a target-based "modern" style. This will allow us to properly install Velox as a system library for use by other projects. The refactoring will improve the overall development experience by creating a stable, reliable build infrastructure. It will also allow us to publish Velox as a conda-forge package and make it easier to improve support for non-x86_64 architectures such as Apple Silicon and other arm64 systems, for a wider range of compilers, and for older CPUs that don’t support the currently required instruction sets like BMI2, making Velox available to an even larger community.

Confidence in the stability and reliability of a project are key when you want to deploy it as part of your stack. Therefore, we are working on a release process and versioning scheme for Velox so that you can deploy with confidence!

In conclusion, the collaboration between Velox and Voltron Data has led to several key improvements in Velox's packaging and CI. Setting up a new environment with Velox has never been easier! With these improvements, the broader community of developers and contributors can expect a smoother and more user-friendly experience when using Velox. The Velox team is continuously working towards further improving the developer and user experience, and we invite you to join us in building the next generation unified execution engine!

Footnotes

  1. Hermeticity - why hermetic builds are recommended