perf: Optimize to_char to allocate less, fix NULL handling#20635
Open
neilconway wants to merge 2 commits intoapache:mainfrom
Open
perf: Optimize to_char to allocate less, fix NULL handling#20635neilconway wants to merge 2 commits intoapache:mainfrom
to_char to allocate less, fix NULL handling#20635neilconway wants to merge 2 commits intoapache:mainfrom
Conversation
to_char to reduce allocations, fix NULL handlingto_char to allocate less, fix NULL handling
f56f972 to
0a1245c
Compare
Contributor
Author
|
Benchmarks: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
to_char#20634.Rationale for this change
The current
to_charimplementation (both scalar and array paths) allocates a new string for every input row to hold the result of theArrayFormatter. To produce the results, it usesStringArray::from, which copies.We can do better by using a reusable buffer to store the result of
ArrayFormatter, and then append toStringBuilder. We can then construct the final result values from theStringBuildervery efficiently. This yields a 10-22% improvement on theto_charmicrobenchmarks.In the scalar path, we can do a bit better by having the
ArrayFormatterwrite directly into theStringBuilder's buffer, which saves a copy.In the array path, we can't easily do this, because
to_charhas some (dubious) logic to retry errors onDate32inputs by casting toDate64and retrying the ArrayFormatter. Since theStringBuilderis shared between all rows and there's no easy way to remove the partial write that might happen in the case of an error, we use the intermediate buffer here instead.This PR also cleans up various code in
to_char, and fixes a bug inNULLhandling: in the array case, if the current data value is NULL but the format string is non-NULL, we incorrectly returned an empty string instead of NULL.What changes are included in this PR?
to_char(scalar and array paths) as described aboveto_char, particularly around error handlingto_charbenchmarks: the previous version testedto_charforDate32inputs where the format string specifies a time value, which is an odd corner case. Also get rid of benchmarking the scalar-scalar case; this is very fast and occurs rarely in practice (will usually be constant-folded before query execution).Are these changes tested?
Yes. Benchmarked and added new test.
Are there any user-facing changes?
No.