make Coder.getEncodedElementByteSize public to allow performance improvements in higher level coders#33626
Conversation
|
Alternative to breaking change would be keeping things as is and moving problematic accumulator to sdk but it won't help if someone else writes problematic coder outside of beam. |
|
Assigning reviewers. If you would like to opt out of this review, comment R: @robertwb for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
| long size = 0; | ||
| for (int i = 0; i < value.length; ++i) { | ||
| Coder<Object> objectCoder = coders.get(i); | ||
| if (objectCoder instanceof StructuredCoder) { |
There was a problem hiding this comment.
I've seen this type of conditionhere. StructuredCoder doesn't enforce overriding getEncodedElementByteSize so I'm not sure if this is best practice. Ideally I would like to loop and invoke this method for every coder (extending CustomCoder/StructuredCoder/Coder) on the list. If underlying coder has default implementation, it will just serialize as is.
There was a problem hiding this comment.
I would remove the typecheck on StructuredCoder, just loop and add for every coder. The only downside is that there may be overhead constructing multiple CountingOutputStreams rather than creating one, but as you say there's no contract between StructuredCoder and implementing getEncodedElementByteSize.
Go ahead and remove the check in LengthPrefixCoder too, there's no benefit for ever skipping this attempt.
robertwb
left a comment
There was a problem hiding this comment.
I am a bit concerned about the backwards incompatibility implications of changing the visibility here. If we do need to call it from outside a Coder (which seems OK to me), what if we instead make a public static method on Coder that calls this?
| long size = 0; | ||
| for (int i = 0; i < value.length; ++i) { | ||
| Coder<Object> objectCoder = coders.get(i); | ||
| if (objectCoder instanceof StructuredCoder) { |
There was a problem hiding this comment.
I would remove the typecheck on StructuredCoder, just loop and add for every coder. The only downside is that there may be overhead constructing multiple CountingOutputStreams rather than creating one, but as you say there's no contract between StructuredCoder and implementing getEncodedElementByteSize.
Go ahead and remove the check in LengthPrefixCoder too, there's no benefit for ever skipping this attempt.
fc7d114 to
880563a
Compare
added static method, thanks for suggestions! |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #33626 +/- ##
=========================================
Coverage 57.46% 57.46%
Complexity 1474 1474
=========================================
Files 984 984
Lines 155676 155676
Branches 1076 1076
=========================================
Hits 89463 89463
Misses 64005 64005
Partials 2208 2208 ☔ View full report in Codecov by Sentry. |
My change in Coder apache#33626 has introduced breaking change leading to Coder serial version to change. This mostly affects Coders extending CustomCoder and Coder.
* Add 2.63 breaking incompatibility to CHANGES.md My change in Coder #33626 has introduced breaking change leading to Coder serial version to change. This mostly affects Coders extending CustomCoder and Coder. * Update CHANGES.md * Update CHANGES.md
Fixes #33620
Half of the sdk coders have getEncodedElementByteSize public because they are invoked outside of beam.sdk.coders package.
Some accumulators like ComposedAccumulatorCoder should have access to optimized methods for calculating size but due to it can be any Coder.
Optimized way of calculating size shouldn't be limited to Coders implemented in beam.sdk.coders only.
This change changes it to public and fixes all the coders in beam sdk.
This will break coders implemented in customer codebase requiring small patch to change visibility.