[docker-29.x backport] daemon: clean up dead containers on start #51693

vvoland merged 1 commit into moby:docker-29.x from
Conversation
Stopping the Engine while a container with autoremove set is running may leave behind dead containers on disk. These containers aren't reclaimed on next start, appear as "dead" in `docker ps -a`, and can't be inspected or removed by the user.

This bug has existed for a long time but became user-visible with 9f5f4f5. Prior to that commit, containers with no rwlayer weren't added to the in-memory viewdb, so they weren't visible in `docker ps -a`. However, some dangling files would still live on disk (e.g. the folder in /var/lib/docker/containers, mount points, etc.).

The underlying issue is that when the daemon stops, it tries to stop all running containers and then closes the containerd client. This leaves a small window of time during which the Engine might receive 'task stop' events from containerd and trigger autoremove. If the containerd client is closed in parallel, the Engine is unable to complete the removal, leaving the container in the 'dead' state. In that case, the Engine logs the following error:

```
cannot remove container "bcbc98b4f5c2b072eb3c4ca673fa1c222d2a8af00bf58eae0f37085b9724ea46": Canceled: grpc: the client connection is closing: context canceled
```

Solving the underlying issue would require complex changes to the shutdown sequence. Moreover, the same issue could also happen if the daemon crashes while it is deleting a container. Thus, add a cleanup step on daemon startup to remove these dead containers.

Signed-off-by: Albin Kerouanton <albin.kerouanton@docker.com>
(cherry picked from commit ec9315c)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Hey @thaJeztah @vvoland, sorry to bother on a closed PR. At Portainer we are still getting reports of users with Docker 29.1.3 having dead containers that are not automatically cleaned up on restart. The ghosts/dead containers are not listed by

Reports can be read at:

- portainer/portainer#12959
- portainer/portainer#12987
- portainer/portainer#12948

I wasn't able to recreate a dead container even by manipulating the files (the containers always get a new random name on restart), so I don't know what you can do to reproduce the issue. This is just a heads-up so you are aware of this situation, I'm not asking for a specific fix 😄
I recall there was an additional fix that was not merged yet; possibly it could be related to that.
- What I did
Stopping the Engine while a container with autoremove set is running may leave behind dead containers on disk. These containers aren't reclaimed on next start, appear as "dead" in `docker ps -a`, and can't be inspected or removed by the user.

This bug has existed for a long time but became user-visible with 9f5f4f5. Prior to that commit, containers with no rwlayer weren't added to the in-memory viewdb, so they weren't visible in `docker ps -a`. However, some dangling files would still live on disk (e.g. the folder in /var/lib/docker/containers, mount points, etc.).

The underlying issue is that when the daemon stops, it tries to stop all running containers and then closes the containerd client. This leaves a small window of time during which the Engine might receive 'task stop' events from containerd and trigger autoremove. If the containerd client is closed in parallel, the Engine is unable to complete the removal, leaving the container in the 'dead' state. In that case, the Engine logs the following error:

```
cannot remove container "bcbc98b4f5c2b072eb3c4ca673fa1c222d2a8af00bf58eae0f37085b9724ea46": Canceled: grpc: the client connection is closing: context canceled
```

Solving the underlying issue would require complex changes to the shutdown sequence. Moreover, the same issue could also happen if the daemon crashes while it is deleting a container. Thus, add a cleanup step on daemon startup to remove these dead containers.
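The cleanup step added on startup can be sketched roughly as follows. This is a hypothetical, simplified illustration (the `container` type and `cleanupDeadContainers` function are invented for this sketch; the real moby types and removal path are far richer):

```go
package main

import "fmt"

// container is a simplified, hypothetical stand-in for the daemon's
// container record.
type container struct {
	ID   string
	Dead bool
}

// cleanupDeadContainers sketches the idea of the fix: on daemon startup,
// walk the containers restored from disk and reclaim any left in the
// "dead" state. It returns the surviving containers and the removed IDs.
func cleanupDeadContainers(all []container) (kept []container, removed []string) {
	for _, c := range all {
		if c.Dead {
			// The real daemon would also delete on-disk state here
			// (the container's directory, mount points, etc.).
			removed = append(removed, c.ID)
			continue
		}
		kept = append(kept, c)
	}
	return kept, removed
}

func main() {
	kept, removed := cleanupDeadContainers([]container{
		{ID: "bcbc98b4f5c2", Dead: true},
		{ID: "a1b2c3d4e5f6"},
	})
	fmt.Println(len(kept), removed)
}
```

Running the cleanup at startup rather than fixing the shutdown ordering also covers the crash case mentioned above, since any removal interrupted mid-flight is retried on the next boot.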
- How to verify it
A new integration test has been added.
- Human readable description for the release notes