
Wait for container dependencies upon daemon start up#50327

Merged
robmry merged 1 commit into moby:master from Adrien-Atmosphere:50326-wait-for-dependent-containers
Jul 17, 2025

Conversation

@Adrien-Atmosphere
Contributor

@Adrien-Atmosphere Adrien-Atmosphere commented Jul 4, 2025

- What I did
Wait for container dependencies upon daemon start up

- How I did it

  • List all dependent containers upon registration
  • Dependent containers can be either from legacy link or container network
  • Wait on a best effort basis for the dependent containers

- How to verify it

  1. Create a docker-dind container with a bind-mounted /var/lib/docker:
    docker run -d --privileged --name docker -v ./docker:/var/lib/docker docker-dind:latest --debug

  2. Launch a Compose application inside the docker-dind container that creates 500 containers, to maximize the chances of encountering the issue

services:
  net:
    image: alpine
    command: sleep infinity
    restart: always

  app:
    image: alpine
    command: sleep infinity
    network_mode: "service:net"
    restart: always
    scale: 500

  3. Stop and restart the docker-dind container and check for stopped containers
CONTAINER=docker
EXEC="docker exec $CONTAINER"
while true; do
    echo "Stopping $CONTAINER"
    docker stop $CONTAINER
    echo "Removing $CONTAINER"
    docker rm -v $CONTAINER
    echo "Starting $CONTAINER"
    docker run -d --privileged --name docker -v ./docker:/var/lib/docker docker-dind:latest --debug
    timeout 30 bash -c "until $EXEC docker info >/dev/null 2>&1; do sleep 1; done"
    echo "$CONTAINER started successfully"

    containers=$($EXEC docker ps -aq --filter status=created --filter status=exited --filter status=dead)
    if [ -n "$containers" ]; then
        echo "Inspecting stopped/exited/dead containers:"
        for i in $containers; do
            echo "Container ID: $i"
            $EXEC docker inspect "$i" -f "{{ .State | json }}" | jq
        done
        exit
    else
        echo "No created/exited/dead containers found."
    fi
done

- Human readable description for the release notes

- On daemon startup, restart containers that share their network stacks before containers that need those stacks.


@thaJeztah
Member

Thanks for contributing. This may need some review/discussion; I'm a bit concerned about possible consequences of this - at least it could use more eyes.

Currently, we create somewhat of a hard dependency when joining namespaces with another container; while this "works", it's also causing issues (like the one this PR is trying to address). Ultimately, we should try to make things more dynamic and resilient against situations where (e.g.) the linked container is restarted, removed, fails to start, or otherwise; we could still have some dependency, but ideally, it would be more dynamic.

For the legacy-links, it's definitely something we want to move away from; it's a broken concept, has security implications, and has many limitations and side-effects (also see #22295); non-legacy (DNS-based) links are more flexible in this respect.

The --network=container:XX is something we should look at, and possibly re-architect (I recall I brought this idea up in some discussions we had, but never wrote it up; I'll probably write up something in a separate ticket for further discussion).

Effectively, --network=container:XX indicates that a container wants to share its networking-namespace with an existing network-namespace; in this case a namespace that was created as part of another container.

  • container-A is created, and a network namespace is created for it
  • container-B -> network=container-A (IOW: "share network-namespace with container A") joins the network namespace that was created for container-A
  • container-A and container-B now both use the same network namespace

For convenience (and UX), it uses the --network flag, but to some extent it's a bit of an odd-one-out; for other namespaces, we use a flag more specific to "namespace". From the docker run --help output;

--pid string                       PID namespace to use
--userns string                    User namespace to use
--uts string                       UTS namespace to use

While the original "donor" container is required when creating the container that joins its networking namespace, that's only at creation time. Currently we still consider that namespace to be owned by the original container, but I think we should treat it more as "co-ownership". Both containers share the same namespace, and we can consider the namespace to be no longer needed once the last container is gone.

If we take that approach, things become more similar to, for example, --volumes-from;

--volumes-from list                Mount volumes from the specified container(s)

Similar to --network=container:XXX, the container using the --volumes-from option needs the donor container to exist at creation time. Volumes are created (or defined) on the donor container:

docker run -d -v shared:/foo --name container-a nginx:alpine
docker run -d --volumes-from=container-a --name container-b nginx:alpine

But once created, there's no longer a hard dependency; container-b now just shares the same volume(s);

docker system df --verbose
...
Local Volumes space usage:

VOLUME NAME                                                        LINKS     SIZE
docker-dev-cache                                                   1         2.209GB
shared                                                             2         0B
acd3650aa324548e56d29276001b13c8ad7e493a881b63106aaec0b6be542e9d   0         3.997MB

The original "donor" container (container-a) can be removed without impacting container-b;

docker container rm -f container-a
container-a

docker system df --verbose
...
Local Volumes space usage:

VOLUME NAME                                                        LINKS     SIZE
docker-dev-cache                                                   1         2.209GB
shared                                                             1         0B
acd3650aa324548e56d29276001b13c8ad7e493a881b63106aaec0b6be542e9d   0         3.997MB

I'm wondering if we could get to a similar state for networking; the devil may be in the detail though!

@Adrien-Atmosphere
Contributor Author

Adrien-Atmosphere commented Jul 7, 2025

Thank you for your feedback. I thought a simple "wait for dependent containers" mechanism could be useful, and would leave room to add a --depends_on flag in the future, like the one in Docker Compose.
I understand that you would prefer to completely rework the container network namespace sharing functionality. In that case, making it interoperate with Linux "ip netns" would be great.

@robmry
Contributor

robmry commented Jul 16, 2025

I'll take a more detailed look at the code if we can agree this change is needed - but, I think it is ...

@thaJeztah - as you say, un-hooking netns's is probably a separate discussion ... but, as you also say, the container that shares its netns would need to start first - even if we allow it to exit before its dependents and leave its netns behind. (Maybe the first container has CAP_NET_ADMIN and needs to do some setup before other containers arrive.) So, we'd still need restart dependencies like this? (At least on initial startup, maybe we could allow a restart without the dependency if another container sharing the netns has held it open - but that's about design for a future feature, and this change wouldn't hold it back.)

At the moment - if containers with netns dependencies don't necessarily restart when they should, that's a bug. Even if we want to re-design the way container-networking works in future, it should work properly until then. The mechanism for queuing container-starts exists, this change adds a missing dependency. I think it makes sense?

@akerouanton
Member

I think this PR makes sense overall. As @robmry outlined, whether we want to rearchitect --network=container:… is a separate matter, and until we settle on that, we should treat the restart scenario as a bug that should be fixed.

@robmry
Contributor

robmry commented Jul 16, 2025

(The Windows test failures are unrelated to this PR.)

Contributor

@corhere corhere left a comment


I agree that this change makes a lot of sense. The PR could use another iteration, though.

// It is used to determine which containers need to be restarted when the given container is restarted.
// It returns a list of containers that depend on the given container.
// Upon error, it returns the last known dependent containers, which may be empty.
dependentContainers := make([]*container.Container, 0)

Nit: https://go.dev/wiki/CodeReviewComments#declaring-empty-slices

Suggested change
dependentContainers := make([]*container.Container, 0)
var dependentContainers []*container.Container

Comment on lines 220 to 225
	if c.HostConfig.Links != nil {
		for _, link := range c.HostConfig.Links {
			name, _, err := opts.ParseLink(link)
			if err != nil {
				log.G(context.TODO()).WithError(err).Errorf("Could not parse link %s for %s", link, c.ID)
				return dependentContainers
			}
			child, err := daemon.GetContainer(name)
			if err != nil {
				if cerrdefs.IsNotFound(err) {
					// Trying to link to a non-existing container is not valid, and
					// should return an "invalid parameter" error. Returning a "not
					// found" error here would make the client report the container's
					// image could not be found (see moby/moby#39823)
					err = errdefs.InvalidParameter(err)
				}
				log.G(context.TODO()).WithError(err).Errorf("Could not get container for %s", name)
				return dependentContainers
			}

			dependentContainers = append(dependentContainers, child)
		}
	}
	return dependentContainers


There's a more efficient way to do this.

Suggested change
	if c.HostConfig.Links != nil {
		for _, link := range c.HostConfig.Links {
			name, _, err := opts.ParseLink(link)
			if err != nil {
				log.G(context.TODO()).WithError(err).Errorf("Could not parse link %s for %s", link, c.ID)
				return dependentContainers
			}
			child, err := daemon.GetContainer(name)
			if err != nil {
				if cerrdefs.IsNotFound(err) {
					// Trying to link to a non-existing container is not valid, and
					// should return an "invalid parameter" error. Returning a "not
					// found" error here would make the client report the container's
					// image could not be found (see moby/moby#39823)
					err = errdefs.InvalidParameter(err)
				}
				log.G(context.TODO()).WithError(err).Errorf("Could not get container for %s", name)
				return dependentContainers
			}
			dependentContainers = append(dependentContainers, child)
		}
	}
	return dependentContainers
	return append(dependentContainers, maps.Values(daemon.linkIndex.children(c)))

// This function is used to retrieve the containers that depend on the given container.
// It is used to determine which containers need to be restarted when the given container is restarted.
// It returns a list of containers that depend on the given container.
// Upon error, it returns the last known dependent containers, which may be empty.

How come this isn't part of the function's doc comment?

daemon/daemon.go Outdated
// (legacy links) to be running before we try to start the container
if children := daemon.linkIndex.children(c); len(children) > 0 {
// (legacy links or container network) to be running before we try to start the container
if children := dependentContainers[c]; len(children) > 0 {

Accumulating the dependent containers into a map doesn't really afford us anything, at the cost of some memory.

Suggested change
if children := dependentContainers[c]; len(children) > 0 {
if children := daemon.GetDependentContainers(c); len(children) > 0 {


@robmry robmry left a comment


Looks good - thank you ... but please could you squash the commits? Looks like it needs a rebase too, just a conflict in imports.

@Adrien-Atmosphere force-pushed the 50326-wait-for-dependent-containers branch from 3c948af to 3a3b9f9 on July 17, 2025 15:03
@robmry
Contributor

robmry commented Jul 17, 2025

Looks like something odd happened in the rebase, an already merged commit ended up in this PR?

@Adrien-Atmosphere force-pushed the 50326-wait-for-dependent-containers branch from 3a3b9f9 to 8f677ab on July 17, 2025 15:55
@Adrien-Atmosphere
Contributor Author

Thanks, I didn't notice

Contributor

@corhere corhere left a comment


LGTM!

@robmry
Contributor

robmry commented Jul 17, 2025

There's a test failure in https://github.com/moby/moby/actions/runs/16349928056/job/46194686893?pr=50327 ...

# github.com/docker/docker/daemon
../../daemon/container.go:223:52: implicit function instantiation requires go1.18 or later (-lang was set to go1.16; check go.mod)

That means we need this comment at the top of daemon/container.go (because of the generic slices/maps functions) ...

// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23

@robmry
Contributor

robmry commented Jul 17, 2025

I've added a changelog comment to the description - hopefully it makes sense?!

@robmry added the area/networking, impact/changelog, area/daemon, and kind/bugfix labels Jul 17, 2025
@robmry robmry added this to the 29.0.0 milestone Jul 17, 2025
@Adrien-Atmosphere
Contributor Author

For the comment at the top of container.go, do I make a separate commit or do I squash them?

@robmry
Contributor

robmry commented Jul 17, 2025

For the comment at the top of container.go, do I make a separate commit or do I squash them?

Squash please!

@Adrien-Atmosphere force-pushed the 50326-wait-for-dependent-containers branch from 8f677ab to 2e492de on July 17, 2025 16:54
@Adrien-Atmosphere force-pushed the 50326-wait-for-dependent-containers branch from 2e492de to d2f8e7b on July 17, 2025 16:59
@robmry
Contributor

robmry commented Jul 17, 2025

Sorry this is getting a bit tedious - nearly there, but ...

daemon/container.go:1:1: File is not properly formatted (gofmt)
// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:

I think it wants a blank line after the build directive ...

// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23

package daemon

@Adrien-Atmosphere
Contributor Author

I just ran gofmt -s and it swapped the two comments, like so:

//go:build go1.23

// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:

I think I should push this, no?

@robmry
Contributor

robmry commented Jul 17, 2025

It should be ok the original way around - we've got quite a few of these now (and should be able to remove them soon), so it's best if they're all the same. For example ...

moby/daemon/create.go

Lines 1 to 5 in c616e76

// TODO(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23
package daemon

(Oh, except I see we have a mix of TODO and FIXME! Either is ok, most are FIXME.)

- Get dependent containers before starting containers
- Dependent containers can be either from legacy link or container network
- Wait on a best effort basis for the dependent containers

Fixes: moby#50326

Signed-off-by: Adrien Pompée <adrien.pompee@atmosphere.aero>
@Adrien-Atmosphere force-pushed the 50326-wait-for-dependent-containers branch from d2f8e7b to d4e026f on July 17, 2025 17:42
Contributor

@robmry robmry left a comment


LGTM - thank you @Adrien-Atmosphere!

The BuildKit Windows test failure is unrelated ... I'll get this merged.

@robmry robmry merged commit 2f4f78b into moby:master Jul 17, 2025
227 of 232 checks passed
@Adrien-Atmosphere Adrien-Atmosphere deleted the 50326-wait-for-dependent-containers branch July 17, 2025 19:28
@thompson-shaun thompson-shaun moved this from Open to Complete in 🔦 Maintainer spotlight Jul 24, 2025
Successfully merging this pull request may close these issues.

Containers with "restart: always" and shared network namespace may fail to start with "cannot join network namespace of a non running container"

5 participants