Skip to content

perf: Optimize to_char to allocate less, fix NULL handling#20635

Open
neilconway wants to merge 2 commits intoapache:mainfrom
neilconway:neilc/optimize-to-char
Open

perf: Optimize to_char to allocate less, fix NULL handling#20635
neilconway wants to merge 2 commits intoapache:mainfrom
neilconway:neilc/optimize-to-char

Conversation

@neilconway
Copy link
Contributor

@neilconway neilconway commented Mar 1, 2026

Which issue does this PR close?

Rationale for this change

The current to_char implementation (both scalar and array paths) allocates a new string for every input row to hold the result of the ArrayFormatter. To produce the results, it uses StringArray::from, which copies.

We can do better by using a reusable buffer to store the result of ArrayFormatter, and then append to StringBuilder. We can then construct the final result values from the StringBuilder very efficiently. This yields a 10-22% improvement on the to_char microbenchmarks.

In the scalar path, we can do a bit better by having the ArrayFormatter write directly into the StringBuilder's buffer, which saves a copy.

In the array path, we can't easily do this, because to_char has some (dubious) logic to retry errors on Date32 inputs by casting to Date64 and retrying the ArrayFormatter. Since the StringBuilder is shared between all rows and there's no easy way to remove the partial write that might happen in the case of an error, we use the intermediate buffer here instead.

This PR also cleans up various code in to_char, and fixes a bug in NULL handling: in the array case, if the current data value is NULL but the format string is non-NULL, we incorrectly returned an empty string instead of NULL.

What changes are included in this PR?

  • Optimize to_char (scalar and array paths) as described above
  • Fix bug in NULL handling
  • Add SLT test case for NULL handling bug
  • Simplify and refactor various parts of to_char, particularly around error handling
  • Revise to_char benchmarks: the previous version tested to_char for Date32 inputs where the format string specifies a time value, which is an odd corner case. Also get rid of benchmarking the scalar-scalar case; this is very fast and occurs rarely in practice (will usually be constant-folded before query execution).

Are these changes tested?

Yes. Benchmarked and added new test.

Are there any user-facing changes?

No.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Mar 1, 2026
@neilconway neilconway changed the title perf: Optimize to_char to reduce allocations, fix NULL handling perf: Optimize to_char to allocate less, fix NULL handling Mar 1, 2026
@neilconway neilconway force-pushed the neilc/optimize-to-char branch from f56f972 to 0a1245c Compare March 1, 2026 23:11
@neilconway
Copy link
Contributor Author

Benchmarks:

  ┌───────────────────────────────┬───────────┬───────────────────┬────────┐
  │           Benchmark           │ Data type │      Format       │ Change │
  ├───────────────────────────────┼───────────┼───────────────────┼────────┤
  │ array_date_only_patterns_1000 │ Date32    │ array, date-only  │ -10%   │
  ├───────────────────────────────┼───────────┼───────────────────┼────────┤
  │ array_datetime_patterns_1000  │ Date64    │ array, datetime   │ -10%   │
  ├───────────────────────────────┼───────────┼───────────────────┼────────┤
  │ array_mixed_patterns_1000     │ Date64    │ array, mixed      │ -10%   │
  ├───────────────────────────────┼───────────┼───────────────────┼────────┤
  │ scalar_date_only_pattern_1000 │ Date32    │ scalar, date-only │ -22%   │
  ├───────────────────────────────┼───────────┼───────────────────┼────────┤
  │ scalar_datetime_pattern_1000  │ Date64    │ scalar, datetime  │ -17%   │
  └───────────────────────────────┴───────────┴───────────────────┴────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avoid per-row allocations in to_char

1 participant