libn/d/overlay: calculate SPI like older engines #51951

Merged: vvoland merged 1 commit into moby:master from corhere:fix-overlay-spi-hash on Jan 29, 2026

@corhere (Contributor) commented Jan 27, 2026

- What I did

The Security Parameter Index value signals to the recipient which key to decrypt the packet with. The overlay driver derives the SPI value for a flow from a hash digest of the source and destination IP addresses. The source and destination need to derive the same digest given the same information as the SPI values are not signaled over the overlay driver's control plane. Refactoring the overlay driver to use netip types accidentally changed the hash function to digest IPv4 addresses in 4-byte form, causing newer engines to calculate a different SPI value for a flow than older engines would. Restore the original calculation by hashing IPv4 addresses in their 16-byte form, and refactor the buildSPI function to take netip.Addr parameters to prevent 16-byte vs 4-byte mixups from being possible in the future.

- How I did it

While it would be straightforward to get the receiving side to decrypt packets tagged with either possible SPI value by programming two sets of states into the kernel, there is no easy solution for the sending side. The sender would need to know which algorithm each recipient uses to calculate its SPIs so that it can program the kernel to transmit with the matching SPI; otherwise it would have to transmit every packet twice. Learning the recipient's algorithm may be possible, e.g. by looking up the Engine version in Swarm's node inventory, but it would be a significant amount of work. Since we recommend against running Swarm clusters with mixed engine versions and only provide best-effort support, YAGNI. Mixed-version clusters should be a transient condition that only occurs during a rolling upgrade, so whatever heroics would be needed to get the latest engine to pass encrypted overlay traffic with engine versions that use the bugged SPI calculation would only be active for that single maintenance window, where degraded availability is already expected.

- How to verify it

  1. Create a Swarm cluster with an engine running this code and a v28.5.2 engine.
  2. Create an encrypted overlay network.
  3. Start a container on each of the nodes, attached to the encrypted overlay network.
  4. One container should be able to ping the other over the encrypted overlay network.

- Human readable description for the release notes

- Fix encrypted overlay networks not passing traffic to containers on v28 and older Engines. Encrypted overlay networks will no longer pass traffic to containers on v29.0.0 through v29.2.0, v28.2.2, v25.0.14 or v25.0.13.

- A picture of a cute animal (not mandatory but encouraged)

The Security Parameter Index value signals to the recipient which key to
decrypt the packet with. The overlay driver derives the SPI value for a
flow from a hash digest of the source and destination IP addresses. The
source and destination need to derive the same digest given the same
information as the SPI values are not signaled over the overlay driver's
control plane. Refactoring the overlay driver to use netip types
accidentally changed the hash function to digest IPv4 addresses in
4-byte form, causing newer engines to calculate a different SPI value
for a flow than older engines would. Restore the original calculation
by hashing IPv4 addresses in their 16-byte form, and refactor the
buildSPI function to take netip.Addr parameters to prevent 16-byte vs
4-byte mixups from being possible in the future.

Signed-off-by: Cory Snider <csnider@mirantis.com>
@corhere force-pushed the fix-overlay-spi-hash branch from 7b430df to 51664a2 on January 27, 2026 23:32
@thaJeztah (Member) commented

GHA oddness;

/usr/bin/docker buildx bake https://github.com/moby/moby.git#refs/pull/51951/merge --allow fs=* --set *.platform=linux/arm/v5 --metadata-file /home/runner/work/_temp/docker-actions-toolkit-Whd2BI/bake-metadata-61f172c9e0.json --set all.attest=type=provenance,mode=max,builder-id=https://github.com/moby/moby/actions/runs/21418435774/attempts/1 all
#0 building with "builder-5bc76c5b-55bc-4b66-93c6-fd1edbb3e3ae" instance using docker-container driver

#1 [internal] load git source https://github.com/moby/moby.git#refs/pull/51951/merge
#1 0.080 fatal: could not read Username for 'https://github.com/': terminal prompts disabled
#1 ERROR: failed to fetch remote https://github.com/moby/moby.git: git stderr:
fatal: could not read Username for 'https://github.com/': terminal prompts disabled
: exit status 128
------
 > [internal] load git source https://github.com/moby/moby.git#refs/pull/51951/merge:
0.080 fatal: could not read Username for 'https://github.com/': terminal prompts disabled
------

@thaJeztah (Member) left a comment

LGTM

@robmry @vvoland ptal

@vvoland modified the milestones: 29.3.0, 29.2.1 on Jan 29, 2026
@vvoland merged commit 6772371 into moby:master on Jan 29, 2026
221 of 225 checks passed
@corhere deleted the fix-overlay-spi-hash branch on January 29, 2026 21:47
Development

Successfully merging this pull request may close these issues.

Encrypted overlay network between nodes on 29.1.3 and 28.x versions doesn't pass traffic