Further Optimizing TRY_CAST and TRY

May 31, 2024 · 9 min read

Software Engineer @ Meta

TL;DR

Queries that use TRY or TRY_CAST may experience poor performance and high CPU usage due to excessive exception throwing. We optimized CAST to indicate failure without throwing and introduced a mechanism for scalar functions to do the same. Microbenchmark measuring worst case performance of CAST improved 100x. Samples of production queries show 30x cpu time improvement.

TRY and TRY(CAST)

TRY construct can be applied to any expression to suppress errors and turn them into NULL results. TRY_CAST is a version of CAST that suppresses errors and returns NULL instead.

For example, parse_datetime('2024-05-', 'YYYY-MM-DD') fails:

	 Invalid format: "2024-05-" is too short

, but TRY(parse_datetime('2024-05-', 'YYYY-MM-DD')) succeeds and returns NULL.

Similarly, CAST('foo' AS INTEGER) fails:

	Cannot cast 'foo' to INT

, but TRY_CAST('foo' AS INTEGER) succeeds and returns NULL.

TRY_CAST vs. TRY(CAST)

TRY can wrap any expression, so one can wrap CAST as well:

  TRY(CAST('foo' AS INTEGER))

Wrapping CAST in TRY is similar to TRY_CAST, but not equivalent. TRY_CAST suppresses only cast errors, while TRY suppresses any error in the expression tree.

For example, CAST(1/0 AS VARCHAR) fails:

  Division by zero

, TRY_CAST(1/0 AS VARCHAR) also fails:

  Division by zero

, but TRY(CAST(1/0 AS VARCHAR)) succeeds and returns NULL.

In this case, the error is generated by division operation (1/0). TRY_CAST cannot suppress that error, but TRY can. More generally, TRY(CAST(...)) suppresses all errors in all expressions that are evaluated to produce an input for CAST as well as errors in CAST itself, but TRY_CAST suppresses errors in CAST only.

What happens when many rows fail?

In most cases only a fraction of rows generates an error. However, there are queries where a large percentage of rows fail. In these cases, a lot of CPU time goes into handling exceptions.

For example, one Prestissimo query used 3 weeks of CPU time, 93% of which was spent processing try(date_parse(...)) expressions where most rows failed. Here is a profile for that query that shows that all the time went into stack unwinding:

This query processes 14B rows, ~70% of which fail in date_parse(...) function due to the date string being empty.

    presto> select try(date_parse('', '%Y-%m-%d'));
     _col0
    -------
     NULL
    (1 row)


    – TRY suppressed Invalid format: "" error and produced a NULL.

Velox tracks the number of suppressed exceptions per operator / plan node and reports these as numSilentThrow runtime stat. For this query, Velox reported 21B throws for a single FilterProject node that processed 14B rows. Before the optimizations, each failing row used to throw twice. An earlier blog post from Laith Sakka explains why. After the optimizations this query’s CPU time dropped to 17h: 30x difference from the original cpu time. Compared to Presto Java, this query uses 4x less cpu time (originally it used 6x more).

We observed similar issues with queries that use other functions that parse strings as well as casts from strings.

Solution

To avoid the performance penalty of throwing exceptions we need to report errors differently. Google’s Abseil library uses absl::Status to return errors from void functions and absl::StatusOr to return value or error from non-void functions. Arrow library has similar Status and Result. Our own Folly has folly::Expected. Inspired by these examples we introduced velox::Status and velox::Expected.

velox::Status holds a generic error code and an error message.

velox::Expected<T> is a typedef for folly::Expected<T, velox::Status>.

For example, a non-throwing modulo operation can be implemented like this:

  Expected<int> mod(int a, int b) {
    if (b == 0) {
      return folly::makeUnexpected(Status::UserError(“Division by zero”));
    }

    return a % b;
  }

Non-throwing Simple Functions

We extended the Simple Function API to allow authoring non-throwing scalar functions. The function author can now define a ‘call’ method that returns Status. Such a function can indicate an error by returning a non-OK status.

  Status call(result&, arg1, arg2,..)

These functions are still allowed to throw and exceptions will be handled properly, but not throwing improves performance of expressions that use TRY.

Modulo SQL function would look like this:

    template <typename TExec>
    struct NoThrowModFunction {
      VELOX_DEFINE_FUNCTION_TYPES(TExec);

      Status call(int64_t& result, const int64_t& a, const int64_t& b) {
        if (b == 0) {
          return Status::UserError("Division by zero");
        }

        result = a % b;
        return Status::OK();
      }
    };

We changed date_parse, parse_datetime, and from_iso8601_date Presto functions to use the new API and report errors without throwing.

Non-throwing Vector functions

Vector functions can implement non-throwing behavior by leveraging the new EvalCtx::setStatus(row, status) API. However, nowadays we expect virtually all functions to be written using Simple Function API.

Non-throwing CAST

CAST is complex. A single name refers to multiple dozen individual operations. The full matrix of supported conversions is available in the Velox documentation. Not all casts throw. For example, cast from an integer to a string does not throw. However, casts from strings may fail in multiple ways. A common failure scenario is cast from an empty string. Laith Sakka optimized this use case earlier.

> select cast('' as integer);
Cannot cast '' to INT

However, we are also seeing failures in casting non-empty strings and NaN floating point values to integers.

> select cast(nan() as bigint);
Unable to cast NaN to bigint

> select cast('123x' as integer);
Cannot cast '123x' to INT

CAST from string to integer and floating point value is implemented using folly::to template. Luckily there is a non-throwing version: folly::tryTo. We changed our CAST implementation to use folly::tryTo to avoid throwing. Not throwing helped improve performance of TRY_CAST by 20x.

Still, the profile showed that there is room for further improvement.

Do not produce or store error messages under TRY

After switching to non-throwing implementation, the profile showed that half the cpu time went into folly::makeConversionError. folly::tryTo returns result or ConversionCode enum. CAST uses folly::makeConversionError to convert ConversionCode into a user-friendly error message. This involves allocating and populating a string for the error message, copying it into the std::range_error object, then copying it again into Status. This error message is very helpful if it is being propagated all the way to the user, but it is not needed if the error is suppressed via TRY or TRY_CAST.

To solve this problem we introduced a thread-local flag, threadSkipErrorDetails, that indicates whether Status needs to include a detailed error message or not. By default, this flag is ‘false’, but TRY and TRY_CAST set it to ‘true’. CAST logic checks this flag to decide whether to call folly::makeConversionError or not. This change gives a 3x performance boost to TRY_CAST and 2x to TRY.

    if (threadSkipErrorDetails()) {
      return folly::makeUnexpected(Status::UserError());
    }

    return folly::makeUnexpected(Status::UserError(
        "{}", folly::makeConversionError(result.error(), "").what()));

After this optimization, we observed that TRY(CAST(...)) is up to 5x slower than TRY_CAST when many rows fail.

The profile revealed that 30% of cpu time went to EvalCtx::ensureErrorsVectorSize. For every row that fails, we call EvalCtx::ensureErrorsVectorSize to resize the error vector to accommodate that row. When many rows fail we end up resizing a lot: resize(1), resize(2), resize(3),...resize(n). We fixed this by pre-allocating the error vector in the TRY expression.

Another 30% of cpu time went into managing reference counts for std::shared_ptr<std::exception_ptr> stored in the errors vector. We do not need error details for TRY, hence, no need to store these values. We fixed this by making error values in error vector optional and updating EvalCtx::setStatus to skip writing these under TRY.

After all these optimizations, the microbenchmark that measures performance of casting invalid strings into integers showed 100x improvement. The benchmark evaluates 4 expressions:

TRY_CAST(‘’ AS INTEGER)
TRY(CAST(‘’ AS INTEGER))
TRY_CAST(‘$’ AS INTEGER)
TRY(CAST(‘$’ AS INTEGER))

When we started, the benchmark results were:

===============================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
================================================================
cast##try_cast_invalid_empty_input                          2.40ms    417.47
cast##tryexpr_cast_invalid_empty_input                    402.63ms      2.48
cast##try_cast_invalid_nan                                392.14ms      2.55
cast##tryexpr_cast_invalid_nan                            827.09ms      1.21

At the end the numbers improved 100x:

cast##try_cast_invalid_empty_input                          2.16ms    463.62
cast##tryexpr_cast_invalid_empty_input                      4.29ms    232.95
cast##try_cast_invalid_nan                                  5.47ms    182.83
cast##tryexpr_cast_invalid_nan                              7.76ms    128.81

Note: The performance of TRY_CAST(‘’ AS INTEGER) hasn’t changed because this particular use case has been optimized by Laith Sakka earlier.

Next steps

We can identify queries with a high percentage of numSilentThrow rows and change throwing functions to not throw.

For simple functions this involves changing the ‘call’ method to return Status and replacing ‘throw’ statements with return Status::UserError(...). You get extra points for producing error messages conditionally based on thread-local flag threadSkipErrorDetails().

template <typename TExec>
struct NoThrowModFunction {
  VELOX_DEFINE_FUNCTION_TYPES(TExec);

  Status call(int64_t& result, const int64_t& a, const int64_t& b) {
    if (b == 0) {
      If (threadSkipErrorDetails()) {
          return Status::UserError();
      }
      return Status::UserError("Division by zero");
    }

    result = a % b;
    return Status::OK();
  }
};

We are changing CAST(varchar AS date) to not throw.

We provided a non-throwing ‘call’ API for simple functions that never return a NULL for a non-NULL input. This covers the majority of Presto functions. For completeness, we would want to provide non-throwing ‘call’ APIs for all other use cases:

bool call() for returning NULL sometimes
callAscii for processing all-ASCII inputs
callNullable for processing possibly NULL inputs
callNullFree for processing complex inputs with all NULLs removed.

Acknowledgements

Thank you Laith Sakka for doing the initial work to investigate and optimize TRY_CAST for empty strings and sharing your findings in a blog post.

Thank you Orri Erling for adding numSilentThrow runtime stat to report number of suppressed exceptions.

Thank you Pedro Eugenio Rocha Pedreira for introducing the velox::Status class.

Thank you Bikramjeet Vig, Jimmy Lu, Orri Erling, Pedro Eugenio Rocha Pedreira and Xiaoxuan Meng for brainstorming and helping with code reviews.

TL;DR​

TRY and TRY(CAST)​

TRY_CAST vs. TRY(CAST)​

What happens when many rows fail?​

Solution​

Non-throwing Simple Functions​

Non-throwing Vector functions​

Non-throwing CAST​

Do not produce or store error messages under TRY​

Next steps​

Acknowledgements​