NRI: allow plugins to see/modify env vars#51674
Conversation
73156ed to
de3712e
Compare
daemon/internal/nri/nri.go
Outdated
| return nil | ||
| } | ||
| ctr.State.Lock() | ||
| defer ctr.State.Unlock() |
There was a problem hiding this comment.
AFAIK the container hasn't been made visible to other goroutines when this method is called, so there should be no reasons to lock its state. Thus, I think this is not needed.
There was a problem hiding this comment.
Yes, that's true at the moment ... but locking the lock makes sure, it's super-cheap if there really is no contention, and particularly as there's a package boundary - just acquiring the lock anyway seemed best to me.
(If the implementation carries on past a minimal initial version, there are more hooks / lifecycle events to add. They'll definitely need to lock the container. If they copy from CreateContainer, they won't forget!)
So I can remove it, but would prefer not to - do you (or others) feel more strongly that it should go?
There was a problem hiding this comment.
IMO, it's better to remove it, but leave a comment explaining why we don't need a lock here (because it's the initial creation and container hasn't been registered yet).
| } | ||
|
|
||
| func containerToNRI(ctr *container.Container) (*adaptation.PodSandbox, *adaptation.Container, error) { | ||
| nriPod := &adaptation.PodSandbox{ |
There was a problem hiding this comment.
Probably a dumb question, is this mandatory? If so, is it required by github.com/containerd/nri or individual plugins?
There was a problem hiding this comment.
I haven't actually tried passing in a nil PodSandbox, or looked at whether it's nil-checked in the NRI code... but, although we don't have pods, the PodSandbox is where the netns lives in the plugin interface. So, it seems likely individual plugins will want that at some point.
My thinking is that we'll tell plugins there's a pod-per-container, with our non-pods having the same lifecycle as our containers. (And we'll probably represent container-networking as containers sharing a pod.)
For now though, in this super-minimal implementation that only allows env-var and mount updates - it's not populated. Only because the implementation is incomplete.
I've added comments to that effect ... I hope the plan makes sense?
| Linux: &adaptation.LinuxContainer{ | ||
| Namespaces: nil, | ||
| Devices: nil, | ||
| Resources: nil, | ||
| OomScoreAdj: nil, | ||
| CgroupsPath: "", | ||
| IoPriority: nil, | ||
| SeccompProfile: nil, | ||
| SeccompPolicy: nil, | ||
| }, | ||
| Mounts: nil, | ||
| Pid: uint32(ctr.Pid), | ||
| Rlimits: nil, | ||
| CreatedAt: 0, | ||
| StartedAt: 0, | ||
| FinishedAt: 0, | ||
| ExitCode: 0, | ||
| StatusReason: "", | ||
| StatusMessage: "", | ||
| CDIDevices: nil, |
There was a problem hiding this comment.
Any particular reason to not fill these fields? Is this coming in a later PR?
There was a problem hiding this comment.
For most fields ... potentially coming in a later PR (there's not currently any plan beyond an initial implementation that allows modification of env-vars and mounts).
To support that minimal case, a few more fields are coming soon - master...robmry:moby:nri
Most of the fields have a fairly obvious mapping from our internal data - but each thing that's populated here will need a test and, if it's adjustable by the plugin, tests for that too ... I've expanded the structs with zero values here to show what's missing.
The other big gap in the initial implementation is that it'll only support the "creation" lifecycle event / hook point ... not-implemented yet are updating and stopping (which allow adjustments), and informative events post-creation, starting, post-start, post-update and removal.
| return nil | ||
| } | ||
|
|
||
| func (n *NRI) syncFn(ctx context.Context, syncCB adaptation.SyncCB) error { |
There was a problem hiding this comment.
If I get it correctly, syncFn is called by the adaptation package when a NRI plugin request a 'container update' to update its internal state. Maybe worth documenting for laymen? (But that's not directly related to this PR, so feel free to address that whenever you want if that makes sense)
There was a problem hiding this comment.
I think it's meant for newly registered plugins to learn about existing containers - I've added a comment.
There was a problem hiding this comment.
Are those plugins able to modify state on those containers as part of their registration? i.e., can we expect them to modify them after-the-fact when registering?
(mostly curious how such plugins hook into the container lifecycle; I'd expect them to act when creating containers, so slightly wondering why they need to be aware of existing containers)
Also, like Albin's comment; non-blocking, but we can improve docs and/or look at the integration in follow-ups.
| return err | ||
| } | ||
|
|
||
| if resp.GetUpdate() != nil { |
There was a problem hiding this comment.
What's an update? Looking at NRI types, it seems the type returned by GetUpdate only holds resource limits. Any links to share, etc.? Maybe a few code comments could help understand what's supported or not, and where to learn more about NRI.
Should we check that GetEvict() == nil too, and log / return an error otherwise?
There was a problem hiding this comment.
Updates won't be supported in the initial implementation - but they do map (ish) to what can be modified by our "docker update", so it should be possible to support them at some point.
I've added a missing package comment, with a link to https://github.com/containerd/nri - which describes the NRI lifecycle, what mods it can make, and when.
There's a commit to check for unsupported adjustments in the queue ("NRI: error on unsupported adjustment") - but I missed GetEvict, so will add a check for it there.
daemon/internal/nri/nri.go
Outdated
| return errors.New("empty environment variable key") | ||
| } | ||
| val := kv.Key + "=" + kv.Value | ||
| log.G(ctx).Debugf("Applying NRI env var adjustment: %s", val) |
There was a problem hiding this comment.
Debug logs aren't enabled by default, but this could leak secrets, etc. I think it'd be better to only log the list of updated keys.
There was a problem hiding this comment.
Good point, done.
The modified env-var's value will always show up in container-inspect. So, storing secrets directly in env-vars set by NRI may not be the best idea anyway.
integration/daemon/nri/plugin.go
Outdated
| p.stub = stub | ||
| err = p.stub.Start(ctx) | ||
| assert.Assert(t, err) | ||
| t.Cleanup(p.stub.Stop) |
There was a problem hiding this comment.
It might be a matter of personal taste, but I feel uncomfortable when a stop is burried inside of a start func. From a caller's perspective, it seems the plugin is started but never stopped.
You could instead return that stop func and call it through a defer.
vvoland
left a comment
There was a problem hiding this comment.
LGTM; the mutex can be handled in follow up
Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
| // the same level of trust as the daemon itself, because they can make arbitrary | ||
| // modifications to container settings. | ||
| // | ||
| // The NRI framework is implemented by https://github.com/containerd/nri - see that |
There was a problem hiding this comment.
| func (n *NRI) CreateContainer(ctx context.Context, ctr *container.Container) error { | ||
| n.mu.RLock() | ||
| defer n.mu.RUnlock() | ||
| if n.nri == nil { |
There was a problem hiding this comment.
As a follow-up, we should look if we can come with more descriptive names for these; there's various places where we use nri (both in package names and field-names), and it's not always clear "NRI-what? is it a bird, a plane, a service?"; e.g. also this one, which is a service that's started stopped, I THINK ?
Line 120 in 186a5ab
| // ctr.State can safely be locked here, but there's no need because it's expected | ||
| // to be newly created and not yet accessible in any other thread. |
There was a problem hiding this comment.
Wondering if this would be something we should also mention in the CreateContainer GoDoc; i.e., that it's not designed for concurrent use (on the container) or something along those lines (because it doesn't lock the container's state).
There was a problem hiding this comment.
(definitely non-blocking, just reading along and fine for a follow-up)
There was a problem hiding this comment.
Yeah - just taking the lock would've made it impossible to get wrong ... but I was outvoted!
| return nil | ||
| } | ||
|
|
||
| func (n *NRI) syncFn(ctx context.Context, syncCB adaptation.SyncCB) error { |
There was a problem hiding this comment.
Are those plugins able to modify state on those containers as part of their registration? i.e., can we expect them to modify them after-the-fact when registering?
(mostly curious how such plugins hook into the container lifecycle; I'd expect them to act when creating containers, so slightly wondering why they need to be aware of existing containers)
Also, like Albin's comment; non-blocking, but we can improve docs and/or look at the integration in follow-ups.
|
|
||
| func TestNRIContainerCreateEnvVarMod(t *testing.T) { | ||
| skip.If(t, testEnv.IsRemoteDaemon, "cannot run daemon when remote daemon") | ||
| skip.If(t, testEnv.DaemonInfo.OSType == "windows") |
There was a problem hiding this comment.
I'm guessing "non windows" is also because we can't start a custom daemon with these options, right? Perhaps we should add a comment to the skip (Looking at "patch env-vars", I could imagine that can be supported on Windows as well, just not easy to test it).
|
|
||
| ctx := testutil.StartSpan(baseContext, t) | ||
|
|
||
| tmp := t.TempDir() |
There was a problem hiding this comment.
Really (really) nit; I think in most places we use tmpDir as var-name for these.
git grep -i 'tmp := t.TempDir()' | wc -l
2
git grep -i 'tmpDir := t.TempDir()' | wc -l
34
git grep -i 'tempDir := t.TempDir()' | wc -l
7
| defer d.Stop(t) | ||
| c := d.NewClientT(t) | ||
|
|
||
| testcases := []struct { |
There was a problem hiding this comment.
Similar nit;
git grep -i ':= range testcases' | wc -l
109
git grep -i ':= range tests' | wc -l
177
Raised #51678 to address the extra comments. |



- What I did
- How I did it
- How to verify it
- Human readable description for the release notes