
2 posts tagged with "build"


Velox switches to C++20 standard

· 2 min read
Christian Zentgraf
Software Engineer @ IBM

Background

The C++ standard a project targets determines which language and library features developers can rely on, both to simplify development and to enable new capabilities.

From its inception in August 2021 until this change, Velox used the C++17 standard.

Benefits

Switching to C++20 helps keep the ecosystem modern: attractive to use, to develop in, and to maintain. Going forward, Velox aims to improve the codebase by using newer compiler functionality, such as sanitization checks, and by supporting newer versions of both GCC and Clang.

Changes

New minimum compiler versions

The switch also raises the minimum required compiler versions for GCC and Clang. The following minimum versions are now required:

  • GCC 11 and later
  • Clang 15 and later

C++20 major new features relevant to Velox

  • coroutines language feature
  • modules language feature
  • Calendar and Timezone library <chrono>
  • constraints and concepts language feature
  • three-way comparison operator (<=>) language feature
  • Ranges library feature

Please refer to the C++20 standard for a complete list of new features.

The minimum targeted compiler versions support

  • coroutines
  • Calendar and Timezone library <chrono>

Newer compiler versions implement more and more C++20 language features and library support. Supporting Velox on newer compiler versions is a continuous effort.

Lessons

The compilers showed some interesting behavior. Switching to the C++20 standard caused some compile errors in the existing code. One of these errors was caused by an issue in the compiler itself.

In GCC 12.2.1, which is used in the CI, the std::string + operator implementation ran into a known issue that causes a warning. Because all warnings are treated as errors, we had to explicitly add an exemption for this particular compiler version.

In general, however, the compile errors we found were reasonably easy to fix. Most of the changes were compatible with C++17 as well, which means the code is a bit cleaner. One change did cause slight trouble, though: it emitted warnings in C++17, and those broke the build because all warnings are turned into errors. This was the change to explicitly name this in the lambda capture where applicable. On the other hand, not updating the lambda captures caused errors in C++20.

Overall, the switch to C++20 took some time but was not overly complicated. No changes to the CI pipelines used in the project were needed. The work was limited to CMake and code changes.
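The lambda capture change can be sketched as follows. The Counter class and makeIncrementer are hypothetical names used only for illustration, not Velox code. The sketch assumes the usual cause of this class of error: C++20 deprecates capturing this implicitly through a [=] capture default, while C++17 compilers accept the explicit [=, this] spelling only as an extension and warn about it.

```cpp
#include <functional>

// Hypothetical class, for illustration only.
class Counter {
 public:
  explicit Counter(int start) : value_(start) {}

  // Returns a callable that adds `step` to the member and returns the
  // updated value. The Counter must outlive the returned callable.
  std::function<int()> makeIncrementer(int step) {
    // `this` is named explicitly next to the [=] default. In C++20 the
    // implicit form `[=]() { value_ += step; ... }` is deprecated.
    return [=, this]() {
      value_ += step;
      return value_;
    };
  }

  int value() const { return value_; }

 private:
  int value_;
};
```

Where no capture default is needed, listing every capture explicitly, as in [this, step], is accepted without warnings by both C++17 and C++20, which is one way to keep a code base buildable under both standards during a transition.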

SEGFAULT due to Dependency Update

· 4 min read
Deepak Majeti
Software Engineer @ IBM
Christian Zentgraf
Software Engineer @ IBM

Background

Velox depends on several libraries. Some of these dependencies include open-source libraries from Meta, including Folly and Facebook Thrift. These libraries are in active development and also depend on each other, so they all have to be updated to the same version at the same time.

Updating these dependencies typically involves modifying the Velox code to align with any public API or semantic changes in these dependencies. However, a recent upgrade of Folly and Facebook Thrift to version v2025.04.28.00 caused a SEGFAULT in just one Velox unit test, velox_functions_remote_client_test.

Investigation

We immediately put on our gdb gloves and looked at the stack traces. The issue was also reproducible in a debug build. The SEGFAULT occurred in Facebook Thrift's ThriftServer class during its initialization, but the offending call was invoking the destructor of a certain handler. The corresponding source code, however, pointed to an invocation of a different function, located in a Facebook Thrift header called AsyncProcessor.h.

This handler (RemoteServer) was implemented in Velox as a Thrift definition. Velox compiled this Thrift file using Facebook Thrift, and the generated code used the ServerInterface class in Facebook Thrift. The ServerInterface class in turn derives from both the AsyncProcessorFactory and ServiceHandlerBase interfaces in Facebook Thrift.

One culprit behind SEGFAULTs in the past was a conflict between Apache Thrift and Facebook Thrift when both were in use. However, this was not the root cause this time, because we were able to reproduce the issue by building the test without the Apache Thrift dependency installed. We were entering new territory and were not sure where to start.

We then compiled an example called EchoService in Facebook Thrift that was very similar to the RemoteServer, and it worked. Then we copied and compiled the Velox RemoteServer in Facebook Thrift and that worked too! So the culprit was likely in the compilation flags, which likely differed between Facebook Thrift and Velox. We enabled the verbose logging for both builds and were able to spot one difference. We saw the GCC coroutines flag being used in the Facebook Thrift build.

We were also curious about the invocation of the destructor instead of the actual function. We put our gdb gloves back on and dumped the entire vtable for the RemoteServer class and its base classes. The vtables were different when it was built in Velox vs. Facebook Thrift. Specifically, the list of functions inherited from ServiceHandlerBase was different.

The vtable for the RemoteServer handler in the Velox build had the following entries:

folly::SemiFuture<folly::Unit> semifuture_onStartServing()
folly::SemiFuture<folly::Unit> semifuture_onStopRequested()
Thunk ServiceHandlerBase::~ServiceHandlerBase

The vtable for the RemoteServer handler in the Facebook Thrift build had the following entries:

folly::coro::Task<void> co_onStartServing()
folly::coro::Task<void> co_onStopRequested()
folly::SemiFuture<folly::Unit> semifuture_onStartServing()
folly::SemiFuture<folly::Unit> semifuture_onStopRequested()
Thunk ServiceHandlerBase::~ServiceHandlerBase

Tying both pieces of evidence together, we could conclude that Velox generated a different vtable structure from the one Facebook Thrift (and thus ThriftServer) was built with. Looking around further, we noticed that ServiceHandlerBase conditionally adds functions based on the coroutines compile flag, which influences the FOLLY_HAS_COROUTINES macro from the portability header.

class ServiceHandlerBase {
  ....
 public:
#if FOLLY_HAS_COROUTINES
  virtual folly::coro::Task<void> co_onStartServing() { co_return; }
  virtual folly::coro::Task<void> co_onStopRequested() {
    throw MethodNotImplemented();
  }
#endif

  virtual folly::SemiFuture<folly::Unit> semifuture_onStartServing() {
#if FOLLY_HAS_COROUTINES
    return co_onStartServing().semi();
#else
    return folly::makeSemiFuture();
#endif
  }

As a result, the ThriftServer, which was compiled with coroutines enabled, would call into the third entry of the handler's table expecting the initialization function (semifuture_onStartServing, third in the second vtable above) and instead reach the ~ServiceHandlerBase destructor (third in the first vtable above), thus resulting in a SEGFAULT. We recompiled the Facebook Thrift dependency for Velox with the coroutines compile flag disabled, and the test passed.
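The mismatch can be replayed with a toy model. The sketch below does not inspect real vtables, which contain additional ABI-specific entries. It only models the ordered list of slots shown in the two listings and shows how an index computed against one layout lands on a different function in the other. The function names handlerSlots, slotIndexOf, and resolveCall are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model: the virtual slots ServiceHandlerBase contributes, in declaration
// order, for a build with or without coroutine support.
std::vector<std::string> handlerSlots(bool hasCoroutines) {
  std::vector<std::string> slots;
  if (hasCoroutines) {
    slots.push_back("co_onStartServing");
    slots.push_back("co_onStopRequested");
  }
  slots.push_back("semifuture_onStartServing");
  slots.push_back("semifuture_onStopRequested");
  slots.push_back("~ServiceHandlerBase");
  return slots;
}

// Zero-based position of `name` in `slots`. This stands in for the slot index
// that the compiler bakes into the calling code.
std::size_t slotIndexOf(
    const std::vector<std::string>& slots,
    const std::string& name) {
  for (std::size_t i = 0; i < slots.size(); ++i) {
    if (slots[i] == name) {
      return i;
    }
  }
  assert(false && "slot not found");
  return slots.size();
}

// Which function a caller actually reaches when the caller and the object
// were compiled with different coroutine settings.
std::string resolveCall(
    bool callerHasCoroutines,
    bool objectHasCoroutines,
    const std::string& name) {
  const std::size_t index =
      slotIndexOf(handlerSlots(callerHasCoroutines), name);
  const std::vector<std::string> objectSlots =
      handlerSlots(objectHasCoroutines);
  return index < objectSlots.size() ? objectSlots[index]
                                    : std::string("<past end of table>");
}
```

In this model, resolveCall(true, false, "semifuture_onStartServing") yields "~ServiceHandlerBase", mirroring the destructor call seen in the stack trace, while matching settings on both sides resolve to the intended function.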