Wait for container dependencies upon daemon start up #50327

robmry merged 1 commit into moby:master
Conversation
Thanks for contributing. This may need some review/discussion; I'm a bit concerned about possible consequences of this - at least it could use more eyes.

Currently, we create somewhat of a hard dependency when joining namespaces with another container; while this "works", it's also causing issues (like the one this PR is trying to address). Ultimately, we should try to make things more dynamic and resilient against situations where (e.g.) the linked container is restarted, removed, fails to start, or otherwise; we could still have some dependency, but ideally, it would be more dynamic.

For the legacy-links, it's definitely something we want to move away from; it's a broken concept, has security implications, and has many limitations and side-effects (also see #22295); non-legacy (DNS-based) links are more flexible in this respect.

Effectively, for convenience (and UX), joining another container's namespace uses the same shape as the other namespace options:

```
--pid string      PID namespace to use
--userns string   User namespace to use
--uts string      UTS namespace to use
```

While the original "donor" container is required when creating the container to join its networking namespace, ultimately that's only when creating the container. Currently we still consider that namespace to be owned by the original container, but I think we should consider it more closely to a "co-ownership". Both containers share the same namespace, and we can consider the namespace to be no longer needed once the last container is gone. If we take that approach, things become more similar to, for example:

```
--volumes-from list   Mount volumes from the specified container(s)
```

Similar to:

```
docker run -d -v shared:/foo --name container-a nginx:alpine
docker run -d --volumes-from=container-a --name container-b nginx:alpine
```

But once created, there's no longer a hard dependency; container-b now just shares the same volume(s):

```
docker system df --verbose
...
Local Volumes space usage:

VOLUME NAME                                                        LINKS   SIZE
docker-dev-cache                                                   1       2.209GB
shared                                                             2       0B
acd3650aa324548e56d29276001b13c8ad7e493a881b63106aaec0b6be542e9d   0       3.997MB
```

The original "donor" container (container-a) can then be removed:

```
docker container rm -f container-a
container-a

docker system df --verbose
...
Local Volumes space usage:

VOLUME NAME                                                        LINKS   SIZE
docker-dev-cache                                                   1       2.209GB
shared                                                             1       0B
acd3650aa324548e56d29276001b13c8ad7e493a881b63106aaec0b6be542e9d   0       3.997MB
```

I'm wondering if we could get to a similar state for networking; the devil may be in the detail though!
Thank you for your feedback. I thought it could be useful to have a simple "wait on dependent containers" functionality, so that in the future we could add a "--depends_on" flag like the one in Docker Compose.
I'll take a more detailed look at the code if we can agree this change is needed - but, I think it is ...

@thaJeztah - as you say, un-hooking netns's is probably a separate discussion ... but, as you also say, the container that shares its netns would need to start first - even if we allow it to exit before its dependents and leave its netns behind. (Maybe the first container has CAP_NET_ADMIN and needs to do some setup before other containers arrive.) So, we'd still need restart dependencies like this? (At least on initial startup; maybe we could allow a restart without the dependency if another container sharing the netns has held it open - but that's about design for a future feature, and this change wouldn't hold it back.)

At the moment - if containers with netns dependencies don't necessarily restart when they should, that's a bug. Even if we want to re-design the way container-networking works in future, it should work properly until then. The mechanism for queuing container-starts exists, and this change adds a missing dependency. I think it makes sense?
I think this PR makes sense overall. As @robmry outlined, whether we want to rearchitect …
(The Windows test failures are unrelated to this PR.)
corhere left a comment:

I agree that this change makes a lot of sense. The PR could use another iteration, though.
daemon/container.go (Outdated)

```go
// It is used to determine which containers need to be restarted when the given container is restarted.
// It returns a list of containers that depend on the given container.
// Upon error, it returns the last known dependent containers, which may be empty.
dependentContainers := make([]*container.Container, 0)
```

Nit: https://go.dev/wiki/CodeReviewComments#declaring-empty-slices

Suggested change:

```diff
-	dependentContainers := make([]*container.Container, 0)
+	var dependentContainers []*container.Container
```
daemon/container.go (Outdated)

```go
if c.HostConfig.Links != nil {
	for _, link := range c.HostConfig.Links {
		name, _, err := opts.ParseLink(link)
		if err != nil {
			log.G(context.TODO()).WithError(err).Errorf("Could not parse link %s for %s", link, c.ID)
			return dependentContainers
		}
		child, err := daemon.GetContainer(name)
		if err != nil {
			if cerrdefs.IsNotFound(err) {
				// Trying to link to a non-existing container is not valid, and
				// should return an "invalid parameter" error. Returning a "not
				// found" error here would make the client report the container's
				// image could not be found (see moby/moby#39823)
				err = errdefs.InvalidParameter(err)
			}
			log.G(context.TODO()).WithError(err).Errorf("Could not get container for %s", name)
			return dependentContainers
		}
		dependentContainers = append(dependentContainers, child)
	}
}
return dependentContainers
```

There's a more efficient way to do this.

Suggested change: replace the block above with

```go
return append(dependentContainers, maps.Values(daemon.linkIndex.children(c))...)
```
daemon/container.go (Outdated)

```go
// This function is used to retrieve the containers that depend on the given container.
// It is used to determine which containers need to be restarted when the given container is restarted.
// It returns a list of containers that depend on the given container.
// Upon error, it returns the last known dependent containers, which may be empty.
```

How come this isn't part of the function's doc comment?
daemon/daemon.go (Outdated)

```diff
-	// (legacy links) to be running before we try to start the container
-	if children := daemon.linkIndex.children(c); len(children) > 0 {
+	// (legacy links or container network) to be running before we try to start the container
+	if children := dependentContainers[c]; len(children) > 0 {
```

Accumulating the dependent containers into a map doesn't really afford us anything, at the cost of some memory.

Suggested change:

```diff
-	if children := dependentContainers[c]; len(children) > 0 {
+	if children := daemon.GetDependentContainers(c); len(children) > 0 {
```
(force-pushed 3c948af to 3a3b9f9)
Looks like something odd happened in the rebase - an already merged commit ended up in this PR?
(force-pushed 3a3b9f9 to 8f677ab)
Thanks, I didn't notice.
There's a test failure in https://github.com/moby/moby/actions/runs/16349928056/job/46194686893?pr=50327 ... That means we need this comment at the top of container.go.
I've added a changelog comment to the description - hopefully it makes sense?!
For the comment at the top of container.go, should I make a separate commit or squash them?

Squash please!
(force-pushed 8f677ab to 2e492de)
(force-pushed 2e492de to d2f8e7b)
Sorry this is getting a bit tedious - nearly there, but ... I think it wants a blank line after the build directive ...
I just did a gofmt -s and it reversed both comments, like so: … I think I should push this, no?
It should be ok the original way around - we've got quite a few of these now (and should be able to remove them soon), so it's best if they're all the same. For example, lines 1 to 5 in c616e76. (Oh, except I see we have a mix of …)
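For reference on the "blank line after the build directive" point: `gofmt` expects a build constraint to be separated from the package clause (and any package doc comment) by a blank line, otherwise the directive is treated as part of the package comment. A minimal sketch of the expected file header (package name and comment are illustrative, not the actual contents of container.go):

```go
//go:build linux

// Package daemon ... (package doc comment goes here, after the blank line)
package daemon
```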
- Get dependent containers before starting containers
- Dependent containers can be either from legacy link or container network
- Wait on a best effort basis for the dependent containers

Fixes: moby#50326

Signed-off-by: Adrien Pompée <adrien.pompee@atmosphere.aero>
(force-pushed d2f8e7b to d4e026f)
robmry left a comment:

LGTM - thank you @Adrien-Atmosphere!

The BuildKit Windows test failure is unrelated ... I'll get this merged.
- What I did

Wait for container dependencies upon daemon start up.

- How I did it

- How to verify it

Create a docker-dind container with a bind-mounted /var/lib/docker:

```
docker run -d --privileged --name docker -v ./docker:/var/lib/docker docker-dind:latest --debug
```

Launch a compose application that creates 500 containers in the docker-dind instance to maximize the chances of encountering the issue.

- Human readable description for the release notes

On daemon startup, restart containers that share their network stacks before containers that need those stacks.

- A picture of a cute animal (not mandatory but encouraged)