make Coder.getEncodedElementByteSize public to allow performance improvements in higher level coders by stankiewicz · Pull Request #33626 · apache/beam

stankiewicz · 2025-01-16T20:37:02Z

Roughly half of the SDK coders already declare getEncodedElementByteSize as public, because they are invoked from outside the beam.sdk.coders package.
Some accumulator coders, such as ComposedAccumulatorCoder, should be able to use the optimized size calculation of their component coders, but they cannot, because a component can be any Coder and the method is not public on Coder.
The optimized size calculation shouldn't be limited to coders implemented in beam.sdk.coders.
This change makes the method public and updates all the coders in the Beam SDK accordingly.

This will break coders implemented in user codebases, which will need a small patch to change the visibility of their overrides.
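
For illustration, here is a minimal sketch of what such a patch could look like in a user-defined coder. `BaseCoder` and `Utf8StringCoder` are hypothetical stand-ins, not Beam classes. The sketch relies only on the Java rule that an override may widen visibility but never narrow it, so an override declared `public` compiles whether the base method is `protected` or `public`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the SDK base class; this is NOT the real Beam Coder.
abstract class BaseCoder<T> {
  // Modeled as protected here. If a base class widens this to public,
  // any subclass override still declared protected stops compiling.
  protected long getEncodedElementByteSize(T value) {
    return -1; // placeholder meaning "no optimized size available"
  }
}

// Hypothetical user-defined coder. Declaring the override public is legal
// against a protected base method and also against a public one.
class Utf8StringCoder extends BaseCoder<String> {
  @Override
  public long getEncodedElementByteSize(String value) {
    // Size of the UTF-8 encoding, computed without writing to a stream.
    return value.getBytes(StandardCharsets.UTF_8).length;
  }
}
```

With this shape, `new Utf8StringCoder().getEncodedElementByteSize("abc")` can be called from any package.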

stankiewicz · 2025-01-17T07:01:43Z

An alternative to the breaking change would be to keep things as they are and move the problematic accumulator coder into the SDK, but that won't help if someone else writes a similarly problematic coder outside of Beam.

github-actions · 2025-01-17T07:07:14Z

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java.
R: @chamikaramj for label io.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

stankiewicz · 2025-01-17T12:37:26Z

sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineFns.java

+      long size = 0;
+      for (int i = 0; i < value.length; ++i) {
+        Coder<Object> objectCoder = coders.get(i);
+        if (objectCoder instanceof StructuredCoder) {


I've seen this type of condition here. StructuredCoder doesn't enforce overriding getEncodedElementByteSize, so I'm not sure this is best practice. Ideally I would like to loop and invoke this method for every coder on the list, whether it extends CustomCoder, StructuredCoder, or Coder. If the underlying coder only has the default implementation, it will just serialize the value as is.

I would remove the typecheck on StructuredCoder, just loop and add for every coder. The only downside is that there may be overhead constructing multiple CountingOutputStreams rather than creating one, but as you say there's no contract between StructuredCoder and implementing getEncodedElementByteSize.

Go ahead and remove the check in LengthPrefixCoder too, there's no benefit for ever skipping this attempt.

@robertwb thanks, changed.
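
A rough sketch of the loop without the type check, for readers following along. All classes here (`MiniCoder`, `CountingStream`, `FixedLongCoder`, `PlainStringCoder`, `ComposedSize`) are hypothetical simplifications and not the Beam implementation. The assumption is that the default size method encodes into a byte-counting stream, while coders that know their size override it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical, simplified coder; not org.apache.beam.sdk.coders.Coder.
abstract class MiniCoder<T> {
  abstract void encode(T value, OutputStream out) throws IOException;

  // Default: encode into a stream that only counts bytes (the slow path).
  long getEncodedElementByteSize(T value) throws IOException {
    CountingStream counter = new CountingStream();
    encode(value, counter);
    return counter.count;
  }
}

// Discards the data and keeps only the number of bytes written.
final class CountingStream extends OutputStream {
  long count = 0;

  @Override
  public void write(int b) {
    count++;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    count += len;
  }
}

// Knows its size up front, so it overrides the size method (the fast path).
final class FixedLongCoder extends MiniCoder<Object> {
  @Override
  void encode(Object value, OutputStream out) throws IOException {
    long v = (Long) value;
    for (int shift = 56; shift >= 0; shift -= 8) {
      out.write((int) (v >>> shift)); // big-endian, one byte per write
    }
  }

  @Override
  long getEncodedElementByteSize(Object value) {
    return 8;
  }
}

// Does not override the size method, so the counting default is used.
final class PlainStringCoder extends MiniCoder<Object> {
  @Override
  void encode(Object value, OutputStream out) throws IOException {
    out.write(((String) value).getBytes(StandardCharsets.UTF_8));
  }
}

final class ComposedSize {
  // No instanceof check: every component coder is asked for its size.
  static long of(List<MiniCoder<Object>> coders, Object[] values) {
    long size = 0;
    try {
      for (int i = 0; i < values.length; ++i) {
        size += coders.get(i).getEncodedElementByteSize(values[i]);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return size;
  }
}
```

In this sketch each coder without an override builds its own `CountingStream`, which is the overhead mentioned above; coders with an override skip encoding entirely.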

robertwb

I am a bit concerned about the backwards incompatibility implications of changing the visibility here. If we do need to call it from outside a Coder (which seems OK to me), what if we instead make a public static method on Coder that calls this?

robertwb · 2025-01-18T01:35:58Z

sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineFns.java

+      long size = 0;
+      for (int i = 0; i < value.length; ++i) {
+        Coder<Object> objectCoder = coders.get(i);
+        if (objectCoder instanceof StructuredCoder) {


I would remove the typecheck on StructuredCoder, just loop and add for every coder. The only downside is that there may be overhead constructing multiple CountingOutputStreams rather than creating one, but as you say there's no contract between StructuredCoder and implementing getEncodedElementByteSize.

Go ahead and remove the check in LengthPrefixCoder too, there's no benefit for ever skipping this attempt.

stankiewicz · 2025-01-18T22:48:57Z

I am a bit concerned about the backwards incompatibility implications of changing the visibility here. If we do need to call it from outside a Coder (which seems OK to me), what if we instead make a public static method on Coder that calls this?

added static method, thanks for suggestions!
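
A minimal sketch of the static-method idea, under stated assumptions: `SizedCoder`, `byteSizeOf`, `FourByteIntCoder`, and `TextCoder` are hypothetical names chosen for illustration, and the actual method name added to Coder is not shown in this thread. The point is that the instance method keeps its `protected` visibility, so existing overrides still compile, while callers in other packages go through a public static forwarder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the SDK's abstract coder; names are illustrative.
abstract class SizedCoder<T> {
  abstract byte[] encode(T value);

  // Stays protected: subclasses that override it as protected keep compiling.
  protected long getEncodedElementByteSize(T value) {
    return encode(value).length; // default: encode, then measure
  }

  // Public static entry point. Code inside SizedCoder may call the protected
  // method on any SizedCoder instance, so callers in other packages can reach
  // the optimized override through this forwarder.
  public static <T> long byteSizeOf(SizedCoder<T> target, T value) {
    return target.getEncodedElementByteSize(value);
  }
}

// Overrides the size method with a constant, avoiding the encode call.
final class FourByteIntCoder extends SizedCoder<Integer> {
  @Override
  byte[] encode(Integer value) {
    int v = value;
    return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
  }

  @Override
  protected long getEncodedElementByteSize(Integer value) {
    return 4;
  }
}

// No override: the default encode-and-measure path is used.
final class TextCoder extends SizedCoder<String> {
  @Override
  byte[] encode(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }
}
```

A caller would write `SizedCoder.byteSizeOf(coder, value)` instead of invoking the protected method directly.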

sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java

codecov · 2025-01-24T16:17:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.46%. Comparing base (b5fa883) to head (e490bed).
Report is 68 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##             master   #33626   +/-   ##
=========================================
  Coverage     57.46%   57.46%           
  Complexity     1474     1474           
=========================================
  Files           984      984           
  Lines        155676   155676           
  Branches       1076     1076           
=========================================
  Hits          89463    89463           
  Misses        64005    64005           
  Partials       2208     2208


My change in Coder (apache#33626) introduced a breaking change that causes the Coder serial version to change. This mostly affects coders extending CustomCoder and Coder.
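
The note above does not spell out the mechanism. One plausible reading, assuming the affected classes are Serializable and do not declare an explicit serialVersionUID, is that Java derives a default value from the class shape, including its non-private methods, so adding a method changes the value. The sketch below uses hypothetical classes (`UnpinnedCoder`, `PinnedCoder`, `SerialVersionCheck`) to show how the value can be inspected and how an explicit declaration pins it.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// No explicit serialVersionUID: Java computes a default from the class name,
// interfaces, fields, and non-private methods, so adding a public method
// to this class would produce a different value.
class UnpinnedCoder implements Serializable {
  public long size() {
    return 0;
  }
}

// Explicit serialVersionUID: the value stays 1L even if methods are added,
// such as the static helper below.
class PinnedCoder implements Serializable {
  private static final long serialVersionUID = 1L;

  public long size() {
    return 0;
  }

  public static long sizeOf(PinnedCoder coder) {
    return coder.size();
  }
}

final class SerialVersionCheck {
  // Returns the serialVersionUID that Java serialization would use for the type.
  static long uidOf(Class<?> type) {
    return ObjectStreamClass.lookup(type).getSerialVersionUID();
  }
}
```

Comparing `SerialVersionCheck.uidOf(...)` for a class before and after an upgrade is one way to check whether previously serialized instances would still deserialize.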


github-actions bot added java io extensions runners dataflow gcp fn-execution protobuf sketching labels Jan 16, 2025

stankiewicz marked this pull request as ready for review January 17, 2025 06:58

github-actions bot added the Next Action: Reviewers label Jan 17, 2025

stankiewicz commented Jan 17, 2025

stankiewicz changed the title ~~make Coder.getEncodedElementByteSize public.~~ make Coder.getEncodedElementByteSize public to allow performance improvements in higher level coders Jan 17, 2025

robertwb reviewed Jan 18, 2025

stankiewicz force-pushed the make_getEncodedElementByteSize_public branch from fc7d114 to 880563a Compare January 18, 2025 21:56

github-actions bot added io extensions runners dataflow gcp and removed io extensions runners dataflow gcp fn-execution protobuf sketching labels Jan 18, 2025

github-actions bot added dataflow gcp fn-execution protobuf sketching labels Jan 18, 2025

rename

e490bed

github-actions bot added io extensions runners dataflow gcp fn-execution protobuf sketching and removed io extensions runners dataflow gcp fn-execution protobuf sketching labels Jan 18, 2025

robertwb approved these changes Jan 24, 2025

robertwb merged commit 2fc7646 into apache:master Jan 24, 2025
23 checks passed

This was referenced May 13, 2025

Add 2.63 breaking incompatibility to CHANGES.md #34931

Merged

[Bug]: Breaking change in Coder introduced in 2.63 is leading to incompatiblities #34933

Closed

robertwb merged 2 commits into apache:master from stankiewicz:make_getEncodedElementByteSize_public
