cri: replace busy-wait loop with direct UpdateImage in checkpoint restore #12940

Open
veophi wants to merge 1 commit into containerd:main from veophi:fix/cri-checkpoint-remove-busy-wait-for-image-resolve

@veophi veophi commented Feb 26, 2026

During checkpoint restore, the code used a busy-wait loop (up to 500 iterations) to poll LocalResolve() waiting for the CRI image store cache to be updated by an asynchronous ImageCreate event. This was unreliable and the original author noted 'This is probably wrong'.

The root cause is that CRImportCheckpoint creates images via the containerd ImageService directly (bypassing CRI), so the CRI in-memory image store cache only gets updated asynchronously through the event handler. Instead of polling, we can proactively call UpdateImage() to synchronously refresh the CRI cache, which is the same operation the event handler would eventually perform.

This change:

  • Removes the busy-wait polling loop
  • Calls UpdateImage() to refresh the CRI cache synchronously with the newly imported image
  • Removes the unused imagestore import
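
To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. It does not use the real containerd types: `backend`, `imageCache`, `resolveByPolling`, and `resolveDirect` are hypothetical stand-ins, and the `LocalResolve`/`UpdateImage` methods on the toy cache only mirror the names used in this description, not the actual signatures or error behavior in the CRI image store.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrNotFound is returned when a reference is missing (toy error, not containerd's).
var ErrNotFound = errors.New("image not found")

// backend is a hypothetical stand-in for the image service that is the
// source of truth. Images created here do not appear in the cache by themselves.
type backend struct {
	mu     sync.Mutex
	images map[string]string // reference -> image ID
}

func newBackend() *backend { return &backend{images: make(map[string]string)} }

func (b *backend) create(ref, id string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.images[ref] = id
}

func (b *backend) get(ref string) (string, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id, ok := b.images[ref]
	return id, ok
}

// imageCache is a hypothetical stand-in for the CRI in-memory image store.
type imageCache struct {
	mu      sync.RWMutex
	backend *backend
	refs    map[string]string
}

func newImageCache(b *backend) *imageCache {
	return &imageCache{backend: b, refs: make(map[string]string)}
}

// LocalResolve only consults the cache; it never touches the backend.
func (c *imageCache) LocalResolve(ref string) (string, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if id, ok := c.refs[ref]; ok {
		return id, nil
	}
	return "", ErrNotFound
}

// UpdateImage synchronously copies the backend's state for ref into the cache.
func (c *imageCache) UpdateImage(ref string) error {
	id, ok := c.backend.get(ref)
	c.mu.Lock()
	defer c.mu.Unlock()
	if !ok {
		delete(c.refs, ref)
		return ErrNotFound
	}
	c.refs[ref] = id
	return nil
}

// resolveByPolling is the old pattern: retry LocalResolve and hope that
// something else (an asynchronous event handler) fills the cache in time.
func resolveByPolling(c *imageCache, ref string, attempts int, interval time.Duration) (string, error) {
	for i := 0; i < attempts; i++ {
		if id, err := c.LocalResolve(ref); err == nil {
			return id, nil
		}
		time.Sleep(interval)
	}
	return "", ErrNotFound
}

// resolveDirect is the new pattern: refresh the cache first, then resolve.
func resolveDirect(c *imageCache, ref string) (string, error) {
	if err := c.UpdateImage(ref); err != nil {
		return "", err
	}
	return c.LocalResolve(ref)
}
```

In this toy model, `resolveByPolling` can only succeed if some other goroutine updates the cache before the attempts run out, whereas `resolveDirect` succeeds as soon as the image exists in the backend, with no dependence on event timing.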

Signed-off-by: sunweixiang <sunweixiang@xiaohongshu.com>
@github-project-automation github-project-automation bot moved this to Needs Triage in Pull Request Review Feb 26, 2026
@dosubot dosubot bot added the area/cri Container Runtime Interface (CRI) label Feb 26, 2026

veophi commented Feb 26, 2026

/cc @adrianreber

@k8s-ci-robot

@veophi: GitHub didn't allow me to request PR reviews from the following users: adrianreber.

Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs.

@adrianreber
Contributor

@veophi If this works better, then I am all for it. I didn't know how to implement it better when I did it initially. I was looking for a blocking call to download the image and was not able to find one. If your change blocks until the image has been downloaded, then this PR looks better than what was there initially.

Labels

area/cri Container Runtime Interface (CRI) size/S

Projects

Status: Needs Triage

3 participants