Skip to content

Fuzzing: documentation updates, build script changes, new fuzzers#12896

Draft
samuelkarp wants to merge 16 commits intocontainerd:mainfrom
samuelkarp:fuzzers
Draft

Fuzzing: documentation updates, build script changes, new fuzzers#12896
samuelkarp wants to merge 16 commits intocontainerd:mainfrom
samuelkarp:fuzzers

Conversation

@samuelkarp
Copy link
Member

@samuelkarp samuelkarp commented Feb 14, 2026

⚠️ AI-assisted code and documentation ⚠️

This series of commits was generated in response to a threat modeling exercise and some identified areas where untrusted input is consumed. In terms of AI influence level the commits range from AIL-3 to AIL-5 with the actual fuzzers mostly being AIL-5.

In terms of changes I think are still worth discussing is the move to tool dependencies for github.com/AdamKorcz/go-118-fuzz-build, since the build script cloning it seemed less than ideal.

There are more opportunities to make the development process better; one problem with the current state is that building fuzzers with the containerd repo mounted into the oss-fuzz container ends up littering the repo with a bunch of fuzz_ binary files and protoc zips that never get cleaned up. There's also something odd the oss-fuzz container is doing around changing file ownership during fuzzer build.

The commits have been edited decently beyond the agent's creation in terms of history curation: the guide was the last thing produced but is the first commit in the series as it's probably the most important, the build script/dependency changes are also things that were found later but commits are placed earlier. I had to do a bit of commit-splitting too.

This is in draft for now as I still want to iterate and review (at least a little more) the fuzzers themselves. As this adds 5 new fuzzers it's a bit of a big PR; I can easily split it into one per fuzzer to make reviewing easier but I wanted to post the whole series first just so it can all be seen.

cc @AdamKorcz since you did a lot of the initial fuzzer work.

Adds comprehensive documentation for developing and testing fuzzers within
containerd's OSS-Fuzz integration.

This guide covers:
- Fuzzer location and structure (using `pull_fuzz_test.go` as an example).
- The OSS-Fuzz build process using `helper.py`.
- Required adjustments to `oss_fuzz_build.sh` (git config, robust vendor
  cleanup, gofmt-compliant dependency registration).
- Troubleshooting common build issues.
- Running fuzzer binaries locally.

This documentation serves as a reference for future fuzzer development,
codifying best practices and common pitfalls.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Refactor how fuzzer-specific build dependencies are managed to improve
build robustness and resolve linting issues on generated code.

Specifically:
- Migrated the dependency on `github.com/AdamKorcz/go-118-fuzz-build` and
  its `testing` subpackage to use the Go 1.24 `tool` directive in
  `go.mod`. This is the modern, recommended approach for managing
  build-time tools.
- Updated `contrib/fuzz/oss_fuzz_build.sh` to remove the fragile logic
  that generated a temporary `client/registerfuzzdep.go` file and
  manually ran `go mod tidy`. These steps are no longer necessary with
  the `tool` directive.
- Streamlined `oss_fuzz_build.sh` to be more maintainable by eliminating
  redundant filesystem manipulations.

These changes ensure a cleaner workspace and a more reliable build
process for all fuzzers in the containerd repository.

Testing:
- Ran `make check` to verify the workspace is lint-clean.
- Ran the full OSS-Fuzz build process to confirm successful fuzzer
  compilation.
- Verified fuzzer execution with `run_fuzzer`.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Streamlined `contrib/fuzz/oss_fuzz_build.sh` by removing redundant Go
installation steps and fragile vendoring logic. The script now correctly
utilizes the system-provided Go and the dependencies managed via the
`tool` directive.

Testing:
- Verified successful fuzzer compilation with the streamlined script.
- Verified system Go version (1.25.0) is sufficient for the build.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Adds a new fuzzer for the `client.Pull` API to improve fuzzing coverage.

This fuzzer helps identify potential vulnerabilities or unexpected behaviors
when pulling container images, contributing to the overall security and
stability of containerd.

As part of integrating this new fuzzer, and to ensure consistent and
reliable builds for all fuzzers within the OSS-Fuzz environment, the
`contrib/fuzz/oss_fuzz_build.sh` script was improved:
- Configured git to mark `/src/containerd` as a safe directory to prevent
  "dubious ownership" errors during containerized builds.
- Changed `rm -r vendor` to `rm -rf vendor` for more robust vendor
  directory cleanup, preventing build failures if the directory is missing.
- Ensured the dynamically generated `client/registerfuzzdep.go` is
  gofmt-compliant, resolving previous linting issues on generated code.

These enhancements to the build script facilitate the seamless integration
and future development of fuzzers for containerd.

Testing:
- Ran `make check` to ensure lint passes.
- Ran `python3 /tmp/oss-fuzz/infra/helper.py build_fuzzers containerd <local_path>`
  to build all fuzzers.
- Ran `python3 /tmp/oss-fuzz/infra/helper.py run_fuzzer containerd fuzz_FuzzPull
  -- -max_total_time=10` to execute the new fuzzer.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Implement a new fuzzer targeting the shim's `NewBinaryIO` and
`NewBinaryCmd` functions. These functions are responsible for parsing
logging URIs and constructing external commands for pluggable shim
logging.

The fuzzer:
- Tests `NewBinaryCmd` with various URI schemes and query parameters.
- Tests `NewBinaryIO` by attempting to start processes based on the
  fuzzed URIs.
- Includes a safety mechanism to prevent the accidental execution of
  random binaries found on the system during fuzzing, while still
  allowing the execution path to be tested with `/bin/true`.

This helps ensure that URI parsing and command construction in the shim
are robust against malformed or malicious inputs.

Testing:
- Verified successful fuzzer compilation in the OSS-Fuzz environment.
- Ran the fuzzer for 10 seconds to confirm functionality and coverage
  growth.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Implement a new fuzzer targeting the shim's `copyPipes` function. This
function is responsible for copying data between the container's I/O
(FIFOs or pipes) and external files or logs.

The fuzzer:
- Tests various combinations of stdout and stderr paths, including
  sharing the same file.
- Mocks the `runc.IO` interface using `io.Pipe()` to provide and
  consume data asynchronously.
- Exercises the data copy logic for stdin, stdout, and stderr.
- Uses a simplified approach with normal files to ensure robustness and
  avoid deadlocks that can occur with complex FIFO synchronization.

This helps ensure that the shim's core I/O copying logic is robust
against malformed data and edge cases in file path configurations.

Testing:
- Verified successful fuzzer compilation in the OSS-Fuzz environment.
- Ran the fuzzer for 10 seconds to confirm functionality and coverage.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Implement a new fuzzer targeting the shim's `createIO` function.

The `FuzzCreateIO` fuzzer:
- Fuzzes container ID, UID, GID, and the `stdio.Stdio` struct.
- Exercises URI parsing and scheme dispatching (fifo, binary, file).
- Safely handles the "binary" scheme by ensuring only known-safe
  binaries are used.

This helps ensure that the shim's I/O infrastructure is robust against
malformed data and various edge cases in configuration.

Testing:
- Verified successful compilation in OSS-Fuzz.
- Ran fuzzer for 30 seconds, confirming stable operation and high
  execution rate.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Implement a new fuzzer targeting the shim's Task Service API. This
fuzzer exercises the primary ttrpc/gRPC interface used by containerd to
interact with the shim.

The fuzzer:
- Performs sequences of fuzzed operations (up to 10 per iteration) to
  test state transitions and edge cases.
- Covers all major Task Service methods: Create, Start, Delete, Exec,
  ResizePty, State, Pause, Resume, Kill, Pids, CloseIO, Checkpoint,
  Update, Wait, Connect, Shutdown, and Stats.
- Mocks external dependencies like the event publisher and shutdown
  service.
- Uses temporary directories and namespaces to ensure isolation during
  fuzzing.

This helps verify that the shim's API has robust input validation and
state management, protecting against potential vulnerabilities from
malformed or malicious requests.

Testing:
- Verified successful fuzzer compilation in the OSS-Fuzz environment.
- Ran the fuzzer for 10 seconds, confirming high execution rates and
  stable operation.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Fixes a goroutine and subscription leak in the shim task service where
the service would subscribe to the global reaper but never unsubscribe
upon shutdown. This caused the reaper's subscriber list to grow
indefinitely and leaked processExits goroutines in long-running
scenarios or tests (like fuzzing) that repeatedly initialize the service.

The fix ensures reaper.Default.Unsubscribe(s.ec) is called during
shutdown. Since Unsubscribe closes the channel, this also signals the
processExits goroutine to terminate.

Additionally, a recover() block was added to the send method to
handle a race condition where an event might be sent to the events
channel after it has been closed during shutdown, protecting the process
from crashing with a panic.

These issues were discovered by the FuzzTaskService fuzzer.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Update the Fuzzing workflow to upload crash artifacts found during the
go_test_fuzz job. Currently, when `go test -fuzz` fails, the crash
reproducers are generated but not preserved, making it difficult to
diagnose and fix the issues discovered in CI.

This change adds an upload-artifact step that captures all files in
testdata/fuzz directories across the repository upon failure.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Address intermittent `exit status 2` crashes and hangs in
FuzzTaskService caused by resource exhaustion (goroutine and file
descriptor leaks).

The primary leak occurred because each fuzzer iteration created a new
shim task service, which in turn initialized a new `stdio.Platform`.
In the Linux implementation, `NewPlatform` spawns a background epoll
goroutine that was never closed, leading to thousands of leaked
goroutines during high-frequency fuzzing.

Changes:
- Implement functional options for `NewTaskService` to allow dependency
  injection.
- Add a private `withPlatform` option to provide an existing
  `stdio.Platform` instance.
- Update `FuzzTaskService` to use a single, shared `stdio.Platform`
  across all iterations via the `withPlatform` option.
- Make `mockPublisher` thread-safe by adding a `sync.Mutex` to protect
  the `events` slice.
- Synchronize fuzzer shutdown by waiting for both the service's shutdown
  callbacks and the event forwarding goroutine to complete.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Include lessons learned about resource management, high-frequency
iterations, and debugging techniques for fuzzers.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Address `FuzzPull` failures caused by permission denied errors when
attempting to chown the gRPC socket and "address already in use" errors
during parallel fuzzing.

Changes:
- Use the current user's UID and GID for the gRPC socket configuration
  instead of defaulting to 0 (root).
- Use a unique temporary directory for the daemon's root, state, and
  socket path to avoid collisions between multiple fuzzer workers.
- Change `defaultRoot`, `defaultState`, and `defaultAddress` from
  constants to variables to allow runtime updates.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Optimize `FuzzImageStore`, `FuzzLeaseManager`, `FuzzContainerStore`, and
`FuzzContentStore` to share a single database instance across all fuzzer
iterations.

Previously, each iteration created a new temporary directory and a new
database file, which is highly inefficient and caused `context deadline
exceeded` errors in CI environments with slow I/O.

Changes:
- Refactor all fuzzers in `contrib/fuzz/metadata_fuzz_test.go` to
  initialize the database and other persistent resources once per
  fuzzer run.
- Update `testDB` to accept `*testing.F` for managing the shared
  temporary directory.
- Remove the redundant `testEnv` helper function.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Update `script/go-test-fuzz.sh` to skip the `docs/` directory when
searching for fuzz targets. This prevents the script from attempting to
run code snippets found in documentation as actual tests, which causes
failures in CI because the `docs/` directory contains no Go source
files.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Update `FuzzCreateIO` and `FuzzCopyPipes` to use `t.TempDir()` for
temporary directory management and increase the context timeout in
`FuzzCreateIO` to 1 second.

The previous 100ms timeout in `FuzzCreateIO` was too aggressive and
caused `context deadline exceeded` errors in CI when the system was
under high concurrent load.

Assisted-by: gemini-cli
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Needs Triage

Development

Successfully merging this pull request may close these issues.

2 participants