Skip to content

NRI: allow plugins to see/modify env vars#51674

Merged
akerouanton merged 2 commits intomoby:masterfrom
robmry:nri-env-vars
Dec 10, 2025
Merged

NRI: allow plugins to see/modify env vars#51674
akerouanton merged 2 commits intomoby:masterfrom
robmry:nri-env-vars

Conversation

@robmry
Copy link
Contributor

@robmry robmry commented Dec 9, 2025

- What I did

- How I did it

  • Call NRI's ContainerCreate hook.
  • Tell NRI plugins about env-vars, and allow plugins to modify them.
  • Add an integration test

- How to verify it

- Human readable description for the release notes

Add experimental NRI support

@robmry robmry added this to the 29.2.0 milestone Dec 9, 2025
@robmry robmry self-assigned this Dec 9, 2025
@robmry robmry added area/runtime Runtime kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny area/daemon Core Engine labels Dec 9, 2025
@robmry robmry force-pushed the nri-env-vars branch 2 times, most recently from 73156ed to de3712e Compare December 9, 2025 11:03
@robmry robmry marked this pull request as ready for review December 9, 2025 11:08
return nil
}
ctr.State.Lock()
defer ctr.State.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the container hasn't been made visible to other goroutines when this method is called, so there should be no reasons to lock its state. Thus, I think this is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true at the moment ... but locking the lock makes sure, it's super-cheap if there really is no contention, and particularly as there's a package boundary - just acquiring the lock anyway seemed best to me.

(If the implementation carries on past a minimal initial version, there are more hooks / lifecycle events to add. They'll definitely need to lock the container. If they copy from CreateContainer, they won't forget!)

So I can remove it, but would prefer not to - do you (or others) feel more strongly that it should go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it's better to remove it, but leave a comment explaining why we don't need a lock here (because it's the initial creation and container hasn't been registered yet).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - it's gone!

}

func containerToNRI(ctr *container.Container) (*adaptation.PodSandbox, *adaptation.Container, error) {
nriPod := &adaptation.PodSandbox{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a dumb question, is this mandatory? If so, is it required by github.com/containerd/nri or individual plugins?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't actually tried passing in a nil PodSandbox, or looked at whether it's nil-checked in the NRI code... but, although we don't have pods, the PodSandbox is where the netns lives in the plugin interface. So, it seems likely individual plugins will want that at some point.

My thinking is that we'll tell plugins there's a pod-per-container, with our non-pods having the same lifecycle as our containers. (And we'll probably represent container-networking as containers sharing a pod.)

For now though, in this super-minimal implementation that only allows env-var and mount updates - it's not populated. Only because the implementation is incomplete.

I've added comments to that effect ... I hope the plan makes sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +218 to +237
Linux: &adaptation.LinuxContainer{
Namespaces: nil,
Devices: nil,
Resources: nil,
OomScoreAdj: nil,
CgroupsPath: "",
IoPriority: nil,
SeccompProfile: nil,
SeccompPolicy: nil,
},
Mounts: nil,
Pid: uint32(ctr.Pid),
Rlimits: nil,
CreatedAt: 0,
StartedAt: 0,
FinishedAt: 0,
ExitCode: 0,
StatusReason: "",
StatusMessage: "",
CDIDevices: nil,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to not fill these fields? Is this coming in a later PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For most fields ... potentially coming in a later PR (there's not currently any plan beyond an initial implementation that allows modification of env-vars and mounts).

To support that minimal case, a few more fields are coming soon - master...robmry:moby:nri

Most of the fields have a fairly obvious mapping from our internal data - but each thing that's populated here will need a test and, if it's adjustable by the plugin, tests for that too ... I've expanded the structs with zero values here to show what's missing.

The other big gap in the initial implementation is that it'll only support the "creation" lifecycle event / hook point ... not-implemented yet are updating and stopping (which allow adjustments), and informative events post-creation, starting, post-start, post-update and removal.

return nil
}

func (n *NRI) syncFn(ctx context.Context, syncCB adaptation.SyncCB) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I get it correctly, syncFn is called by the adaptation package when a NRI plugin request a 'container update' to update its internal state. Maybe worth documenting for laymen? (But that's not directly related to this PR, so feel free to address that whenever you want if that makes sense)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's meant for newly registered plugins to learn about existing containers - I've added a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those plugins able to modify state on those containers as part of their registration? i.e., can we expect them to modify them after-the-fact when registering?

(mostly curious how such plugins hook into the container lifecycle; I'd expect them to act when creating containers, so slightly wondering why they need to be aware of existing containers)

Also, like Albin's comment; non-blocking, but we can improve docs and/or look at the integration in follow-ups.

return err
}

if resp.GetUpdate() != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's an update? Looking at NRI types, it seems the type returned by GetUpdate only holds resource limits. Any links to share, etc.? Maybe a few code comments could help understand what's supported or not, and where to learn more about NRI.

Should we check that GetEvict() == nil too, and log / return an error otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates won't be supported in the initial implementation - but they do map (ish) to what can be modified by our "docker update", so it should be possible to support them at some point.

I've added a missing package comment, with a link to https://github.com/containerd/nri - which describes the NRI lifecycle, what mods it can make, and when.

There's a commit to check for unsupported adjustments in the queue ("NRI: error on unsupported adjustment") - but I missed GetEvict, so will add a check for it there.

return errors.New("empty environment variable key")
}
val := kv.Key + "=" + kv.Value
log.G(ctx).Debugf("Applying NRI env var adjustment: %s", val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug logs aren't enabled by default, but this could leak secrets, etc. I think it'd be better to only log the list of updated keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done.

The modified env-var's value will always show up in container-inspect. So, storing secrets directly in env-vars set by NRI may not be the best idea anyway.

p.stub = stub
err = p.stub.Start(ctx)
assert.Assert(t, err)
t.Cleanup(p.stub.Stop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a matter of personal taste, but I feel uncomfortable when a stop is burried inside of a start func. From a caller's perspective, it seems the plugin is started but never stopped.

You could instead return that stop func and call it through a defer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; the mutex can be handled in follow up

Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
Copy link
Member

@akerouanton akerouanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@akerouanton akerouanton merged commit 0702422 into moby:master Dec 10, 2025
185 of 186 checks passed
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// the same level of trust as the daemon itself, because they can make arbitrary
// modifications to container settings.
//
// The NRI framework is implemented by https://github.com/containerd/nri - see that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change this to a docs-link, then GoDoc automatically links it to the right place (on pkg.go.dev);

Suggested change
// The NRI framework is implemented by https://github.com/containerd/nri - see that
// The NRI framework is implemented by [github.com/containerd/nri] - see that

Before / After;

Screenshot 2025-12-10 at 15 32 26 Screenshot 2025-12-10 at 15 31 32

func (n *NRI) CreateContainer(ctx context.Context, ctr *container.Container) error {
n.mu.RLock()
defer n.mu.RUnlock()
if n.nri == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow-up, we should look if we can come with more descriptive names for these; there's various places where we use nri (both in package names and field-names), and it's not always clear "NRI-what? is it a bird, a plane, a service?"; e.g. also this one, which is a service that's started stopped, I THINK ?

nri *nri.NRI

Comment on lines +112 to +113
// ctr.State can safely be locked here, but there's no need because it's expected
// to be newly created and not yet accessible in any other thread.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this would be something we should also mention in the CreateContainer GoDoc; i.e., that it's not designed for concurrent use (on the container) or something along those lines (because it doesn't lock the container's state).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(definitely non-blocking, just reading along and fine for a follow-up)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - just taking the lock would've made it impossible to get wrong ... but I was outvoted!

return nil
}

func (n *NRI) syncFn(ctx context.Context, syncCB adaptation.SyncCB) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those plugins able to modify state on those containers as part of their registration? i.e., can we expect them to modify them after-the-fact when registering?

(mostly curious how such plugins hook into the container lifecycle; I'd expect them to act when creating containers, so slightly wondering why they need to be aware of existing containers)

Also, like Albin's comment; non-blocking, but we can improve docs and/or look at the integration in follow-ups.


func TestNRIContainerCreateEnvVarMod(t *testing.T) {
skip.If(t, testEnv.IsRemoteDaemon, "cannot run daemon when remote daemon")
skip.If(t, testEnv.DaemonInfo.OSType == "windows")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing "non windows" is also because we can't start a custom daemon with these options, right? Perhaps we should add a comment to the skip (Looking at "patch env-vars", I could imagine that can be supported on Windows as well, just not easy to test it).


ctx := testutil.StartSpan(baseContext, t)

tmp := t.TempDir()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really (really) nit; I think in most places we use tmpDir as var-name for these.

git grep -i 'tmp := t.TempDir()' | wc -l
       2
git grep -i 'tmpDir := t.TempDir()' | wc -l
      34
git grep -i 'tempDir := t.TempDir()' | wc -l
       7

defer d.Stop(t)
c := d.NewClientT(t)

testcases := []struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar nit;

git grep -i ':= range testcases' | wc -l
     109
git grep -i ':= range tests' | wc -l
     177

@thaJeztah
Copy link
Member

Ah, LOL, race;

Screenshot 2025-12-10 at 15 53 39

@robmry robmry mentioned this pull request Dec 10, 2025
@robmry
Copy link
Contributor Author

robmry commented Dec 10, 2025

Ah, LOL, race;

Raised #51678 to address the extra comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/daemon Core Engine area/runtime Runtime impact/changelog kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny platform/windows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants