Skip to content

If container will run as non root user, drop permitted, effective caps early#36587

Merged
vdemeester merged 1 commit intomoby:masterfrom
justincormack:unpriv-caps
Mar 22, 2018
Merged

If container will run as non root user, drop permitted, effective caps early#36587
vdemeester merged 1 commit intomoby:masterfrom
justincormack:unpriv-caps

Conversation

@justincormack
Copy link
Contributor

As soon as the initial executable in the container is executed as a non root user,
permitted and effective capabilities are dropped. Drop them earlier than this, so
that they are dropped before executing the file. The main effect of this is that
if CAP_DAC_OVERRIDE is set (the default) the user will not be able to execute
files they do not have permission to execute, which previously they could.

The old behaviour was somewhat surprising and the new one is definitely correct,
but it is not in any meaningful way exploitable, and I do not think it is
necessary to backport this fix. It is unlikely to have any negative effects as
almost all executables have world execute permission anyway.

Use the bounding set not the effective set as the canonical set of capabilities, as
effective will now vary.

Signed-off-by: Justin Cormack justin.cormack@docker.com

owl surprise

@thaJeztah
Copy link
Member

Could you add a test for this? Looks like something that's easy to miss if the behaviour changes

@justincormack
Copy link
Contributor Author

justincormack commented Mar 14, 2018 via email

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when the user specifies --cap-(add|drop) on the CLI, this is available in the bounding set? I thought that we also wanted the effective set to remain consistent with what was requested from the end-user across the "user context" switch.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nevermind, I had to read more carefully CAPABILITIES(7). Yes, this seems like the behavior we should have had from the start.

@n4ss
Copy link

n4ss commented Mar 14, 2018

LGTM

This fixes also all the duplicates of: #27590 I guess

@crosbymichael
Copy link
Contributor

LGTM

@kolyshkin
Copy link
Contributor

kolyshkin commented Mar 16, 2018

windows failure (from here):

06:05:40 FAIL: docker_cli_run_test.go:4345: DockerSuite.TestSlowStdinClosing
06:05:40
06:05:40 docker_cli_run_test.go:4361:
06:05:40 c.Fatal("running container timed out") // cleanup in teardown
06:05:40 ... Error: running container timed out

I hope to fix that one in #36571 (commit 5043639) which was merged recently. @justincormack could you please rebase (for the sake of checking if this one goes away)?

…s early

As soon as the initial executable in the container is executed as a non root user,
permitted and effective capabilities are dropped. Drop them earlier than this, so
that they are dropped before executing the file. The main effect of this is that
if `CAP_DAC_OVERRIDE` is set (the default) the user will not be able to execute
files they do not have permission to execute, which previously they could.

The old behaviour was somewhat surprising and the new one is definitely correct,
but it is not in any meaningful way exploitable, and I do not think it is
necessary to backport this fix. It is unlikely to have any negative effects as
almost all executables have world execute permission anyway.

Use the bounding set not the effective set as the canonical set of capabilities, as
effective will now vary.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
@codecov
Copy link

codecov bot commented Mar 19, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@60e2dc2). Click here to learn what that means.
The diff coverage is 16.66%.

@@            Coverage Diff            @@
##             master   #36587   +/-   ##
=========================================
  Coverage          ?    34.7%           
=========================================
  Files             ?      612           
  Lines             ?    45413           
  Branches          ?        0           
=========================================
  Hits              ?    15761           
  Misses            ?    27590           
  Partials          ?     2062

@justincormack
Copy link
Contributor Author

@kolyshkin rebased

@kolyshkin
Copy link
Contributor

SGTM

Copy link
Contributor

@boaz0 boaz0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@boaz0
Copy link
Contributor

boaz0 commented Mar 20, 2018

hmmm... 4 tests failed in CI:

01:39:12 FAIL: docker_api_build_test.go:518: DockerSuite.TestBuildWithSession
01:39:12 
01:39:12 assertion failed: 
01:39:12 {int}:
01:39:12 	-: 500
01:39:12 	+: 200
01:39:12 
01:39:12 assertion failed: string "{\"message\":\"failed to copy to /var/lib/docker/builder/2e666379ecc5beac4c5f81bd80ba26ae3213f00abacc934f2b9e5f80f0b9669e: rpc error: code = Internal desc = transport is closing\"}\n" does not contain "Successfully built"
01:39:12 assertion failed: 0 (int) != 2 (int)
01:39:12 assertion failed: string "{\"message\":\"failed to copy to /var/lib/docker/builder/2e666379ecc5beac4c5f81bd80ba26ae3213f00abacc934f2b9e5f80f0b9669e: rpc error: code = Internal desc = transport is closing\"}\n" does not contain "contentcontent"
01:39:12 assertion failed: 2 (int) != 4 (int)
01:39:12 assertion failed: 12 (du.BuilderSize int64) != 26 (du2.BuilderSize int64)
[...]
02:03:27 FAIL: docker_api_swarm_service_test.go:32: DockerSwarmSuite.TestAPIServiceUpdatePort
02:03:27 
02:03:27 [daa986865c326] waiting for daemon to start
02:03:27 [daa986865c326] daemon started
02:03:27 
02:03:27 docker_api_swarm_service_test.go:38:
02:03:27     waitAndAssert(c, defaultReconciliationTimeout, d.CheckActiveContainerCount, checker.Equals, 1)
02:03:27 docker_utils_test.go:465:
02:03:27     c.Assert(v, checker, args...)
02:03:27 ... obtained int = 0
02:03:27 ... expected int = 1
02:03:27 
02:03:27 [daa986865c326] exiting daemon
[...]
02:19:33 FAIL: docker_cli_swarm_test.go:1377: DockerSwarmSuite.TestSwarmClusterRotateUnlockKey
02:19:33 
02:19:33 [d219a6baa1b4b] waiting for daemon to start
02:19:33 [d219a6baa1b4b] daemon started
02:19:33 
02:19:33 [d72a1573299a1] waiting for daemon to start
02:19:33 [d72a1573299a1] daemon started
02:19:33 
02:19:33 [d2b5c5a9133d2] waiting for daemon to start
02:19:33 [d2b5c5a9133d2] daemon started
02:19:33 
02:19:33 [d72a1573299a1] exiting daemon
02:19:33 [d72a1573299a1] waiting for daemon to start
02:19:33 [d72a1573299a1] daemon started
02:19:33 
02:19:33 [d2b5c5a9133d2] exiting daemon
02:19:33 [d2b5c5a9133d2] waiting for daemon to start
02:19:33 [d2b5c5a9133d2] daemon started
02:19:33 
02:19:33 [d72a1573299a1] exiting daemon
02:19:33 [d72a1573299a1] waiting for daemon to start
02:19:33 [d72a1573299a1] daemon started
02:19:33 
02:19:33 [d2b5c5a9133d2] daemon started
02:19:33 Attempt #2: daemon is still running with pid 6685
02:19:33 Attempt #3: daemon is still running with pid 6685
02:19:33 Attempt #4: daemon is still running with pid 6685
02:19:33 [d2b5c5a9133d2] exiting daemon
02:19:33 docker_cli_swarm_test.go:1410:
02:19:33     d3.Restart(c)
02:19:33 daemon/daemon.go:389:
02:19:33     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
02:19:33 ... Error: Error while stopping the daemon d2b5c5a9133d2 : exit status 130
02:19:33 
02:19:33 [d219a6baa1b4b] exiting daemon
02:19:33 [d72a1573299a1] exiting daemon
[...]
02:25:42 FAIL: docker_cli_swarm_test.go:1682: DockerSwarmSuite.TestSwarmPublishDuplicatePorts
02:25:42 
02:25:42 [d345e0848b9b7] waiting for daemon to start
02:25:42 [d345e0848b9b7] daemon started
02:25:42 
02:25:42 docker_cli_swarm_test.go:1690:
02:25:42     // make sure task has been deployed.
02:25:42     waitAndAssert(c, defaultReconciliationTimeout, d.CheckActiveContainerCount, checker.Equals, 1)
02:25:42 docker_utils_test.go:465:
02:25:42     c.Assert(v, checker, args...)
02:25:42 ... obtained int = 0
02:25:42 ... expected int = 1
02:25:42 
02:25:42 [d345e0848b9b7] exiting daemon

None of those seem like related to this change, though.

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :lion:

@vdemeester
Copy link
Member

experimental failure unrelated 👼 let's get this merged 👼

@vdemeester vdemeester merged commit b67c1e0 into moby:master Mar 22, 2018
@justincormack justincormack deleted the unpriv-caps branch March 22, 2018 22:38
justincormack added a commit to justincormack/docker that referenced this pull request Mar 28, 2018
Make the same changes for exec as for run in moby#36587.

This means that capabilities are set early to how they will be set
after exec. This largely affects ability to exec a binary that
would not be allowed by a user if `CAP_DAC_OVERRIDE` is not set.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
@nfnty
Copy link

nfnty commented Apr 20, 2018

This (with the upgrade to Docker 18.04.0) broke all my non-root containers with added capabilities.

How are you supposed to be able to add capabilities to non-root containers now? I really don't see the relevance of following execve behavior here, since Docker is managing containers, not executing random files as non-root. The much better option would be to change default capabilities instead, since forcing the use of root containers is much more dangerous.

@thaJeztah
Copy link
Member

@nfnty could you open an issue with details, and a minimal example that broke with this change?

@nfnty
Copy link

nfnty commented Apr 20, 2018

Sure, but maybe there's something I'm missing. Would be great to get an answer to how non-root containers are supposed to handle capabilities now.

@thaJeztah
Copy link
Member

IIRC this fix just requires the containers entrypoint to have the executable bit set, whereas previously those permissions were not respected (ie, a binary without executable bit set would still run as if it had the bit set); setting the correct permissions on the executable is now needed (just as would be the case in a regular shell)

@nfnty
Copy link

nfnty commented Apr 20, 2018

No, CAP_DAC_OVERRIDE being dropped is just a side effect, since it's one of the default capabilities. This drops all capabilities for all UID != 0 (i.e. non-root) containers.

@nfnty
Copy link

nfnty commented May 20, 2018

ping @justincormack

@justincormack
Copy link
Contributor Author

Hmm, I didn't think non root capabilities worked previously (other than for suid executables); that what ambient capabilities are for which we have not added support for yet. Let me experiment a bit.

@justincormack
Copy link
Contributor Author

@nfnty I am quite confused about how you might be getting this to work, testing on an older Docker I get:

docker run --user=100 --cap-add sys_admin justincormack/debian capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=100(_apt)
gid=65534(nogroup)
groups=

Note the +i rather than the +eip you get as root; this means caps can be inherited (and used by suid binaries) but are not effective now. Are you using filesystem capabilities? Which version of Docker were you using before?

@nfnty
Copy link

nfnty commented May 21, 2018

@justincormack Indeed, a combination of CapDrop: [ALL], CapAdd, and filesystem capabilities. See nfnty/arch-kea:latest and the kea container spec.

I'm currently using 18.03.0

@justincormack
Copy link
Contributor Author

Hmm a minimal simple example with filesystem caps works for me on Docker 18.05.0-ce

Dockerfile:

FROM alpine
RUN apk add --update libcap

RUN ls -la /usr/sbin/capsh
RUN setcap 'cap_sys_admin=ep' /usr/sbin/capsh
docker build -t cap .
docker run --cap-drop all --cap-add sys_admin --user 100:100 cap capsh --print
Current: = cap_sys_admin+eip
Bounding set =cap_sys_admin
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=100(???)
gid=100(users)
groups=

So that is running with uid 100 and CAP_SYS_ADMIN as +eip ie including effective capability.

@justincormack
Copy link
Contributor Author

(I wouldn't expect this change to affect filesystem capabilities; it is possible some other change has affected round tripping on filesystem capabilities though, are you sure they are in the image at runtime? At one point only some graph drivers would round trip filesystem capabilities, and we also made some changes to the tar code recently which could have affected it, maybe do an lsattr in the image at runtime to check they are still there).

@nfnty
Copy link

nfnty commented May 22, 2018

Thanks. I've pinpointed the problem to --security-opt=no-new-privileges.

@nfnty
Copy link

nfnty commented May 22, 2018

This patch combined with no-new-privileges makes capabilities unusable. Do you know of a workaround? Disabling no-new-privileges is not an option.

@justincormack
Copy link
Contributor Author

Yes --no-new-privileges stops all filesystem capabilities and suid working, so yes it does stop this. Not really sure you should both require --no-new-privileges and require the use of filesystem capabilities as they are exclusive. You could immediately set no new privileges in the container process yourself at the beginning of the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants