Fuzzing: documentation updates, build script changes, new fuzzers#12896
Draft
samuelkarp wants to merge 16 commits intocontainerd:mainfrom
Draft
Fuzzing: documentation updates, build script changes, new fuzzers#12896samuelkarp wants to merge 16 commits intocontainerd:mainfrom
samuelkarp wants to merge 16 commits intocontainerd:mainfrom
Conversation
Adds comprehensive documentation for developing and testing fuzzers within containerd's OSS-Fuzz integration. This guide covers: - Fuzzer location and structure (using `pull_fuzz_test.go` as an example). - The OSS-Fuzz build process using `helper.py`. - Required adjustments to `oss_fuzz_build.sh` (git config, robust vendor cleanup, gofmt-compliant dependency registration). - Troubleshooting common build issues. - Running fuzzer binaries locally. This documentation serves as a reference for future fuzzer development, codifying best practices and common pitfalls. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Refactor how fuzzer-specific build dependencies are managed to improve build robustness and resolve linting issues on generated code. Specifically: - Migrated the dependency on `github.com/AdamKorcz/go-118-fuzz-build` and its `testing` subpackage to use the Go 1.24 `tool` directive in `go.mod`. This is the modern, recommended approach for managing build-time tools. - Updated `contrib/fuzz/oss_fuzz_build.sh` to remove the fragile logic that generated a temporary `client/registerfuzzdep.go` file and manually ran `go mod tidy`. These steps are no longer necessary with the `tool` directive. - Streamlined `oss_fuzz_build.sh` to be more maintainable by eliminating redundant filesystem manipulations. These changes ensure a cleaner workspace and a more reliable build process for all fuzzers in the containerd repository. Testing: - Ran `make check` to verify the workspace is lint-clean. - Ran the full OSS-Fuzz build process to confirm successful fuzzer compilation. - Verified fuzzer execution with `run_fuzzer`. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Streamlined `contrib/fuzz/oss_fuzz_build.sh` by removing redundant Go installation steps and fragile vendoring logic. The script now correctly utilizes the system-provided Go and the dependencies managed via the `tool` directive. Testing: - Verified successful fuzzer compilation with the streamlined script. - Verified system Go version (1.25.0) is sufficient for the build. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Adds a new fuzzer for the `client.Pull` API to improve fuzzing coverage. This fuzzer helps identify potential vulnerabilities or unexpected behaviors when pulling container images, contributing to the overall security and stability of containerd. As part of integrating this new fuzzer, and to ensure consistent and reliable builds for all fuzzers within the OSS-Fuzz environment, the `contrib/fuzz/oss_fuzz_build.sh` script was improved: - Configured git to mark `/src/containerd` as a safe directory to prevent "dubious ownership" errors during containerized builds. - Changed `rm -r vendor` to `rm -rf vendor` for more robust vendor directory cleanup, preventing build failures if the directory is missing. - Ensured the dynamically generated `client/registerfuzzdep.go` is gofmt-compliant, resolving previous linting issues on generated code. These enhancements to the build script facilitate the seamless integration and future development of fuzzers for containerd. Testing: - Ran `make check` to ensure lint passes. - Ran `python3 /tmp/oss-fuzz/infra/helper.py build_fuzzers containerd <local_path>` to build all fuzzers. - Ran `python3 /tmp/oss-fuzz/infra/helper.py run_fuzzer containerd fuzz_FuzzPull -- -max_total_time=10` to execute the new fuzzer. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Implement a new fuzzer targeting the shim's `NewBinaryIO` and `NewBinaryCmd` functions. These functions are responsible for parsing logging URIs and constructing external commands for pluggable shim logging. The fuzzer: - Tests `NewBinaryCmd` with various URI schemes and query parameters. - Tests `NewBinaryIO` by attempting to start processes based on the fuzzed URIs. - Includes a safety mechanism to prevent the accidental execution of random binaries found on the system during fuzzing, while still allowing the execution path to be tested with `/bin/true`. This helps ensure that URI parsing and command construction in the shim are robust against malformed or malicious inputs. Testing: - Verified successful fuzzer compilation in the OSS-Fuzz environment. - Ran the fuzzer for 10 seconds to confirm functionality and coverage growth. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Implement a new fuzzer targeting the shim's `copyPipes` function. This function is responsible for copying data between the container's I/O (FIFOs or pipes) and external files or logs. The fuzzer: - Tests various combinations of stdout and stderr paths, including sharing the same file. - Mocks the `runc.IO` interface using `io.Pipe()` to provide and consume data asynchronously. - Exercises the data copy logic for stdin, stdout, and stderr. - Uses a simplified approach with normal files to ensure robustness and avoid deadlocks that can occur with complex FIFO synchronization. This helps ensure that the shim's core I/O copying logic is robust against malformed data and edge cases in file path configurations. Testing: - Verified successful fuzzer compilation in the OSS-Fuzz environment. - Ran the fuzzer for 10 seconds to confirm functionality and coverage. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Implement a new fuzzer targeting the shim's `createIO` function. The `FuzzCreateIO` fuzzer: - Fuzzes container ID, UID, GID, and the `stdio.Stdio` struct. - Exercises URI parsing and scheme dispatching (fifo, binary, file). - Safely handles the "binary" scheme by ensuring only known-safe binaries are used. This helps ensure that the shim's I/O infrastructure is robust against malformed data and various edge cases in configuration. Testing: - Verified successful compilation in OSS-Fuzz. - Ran fuzzer for 30 seconds, confirming stable operation and high execution rate. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Implement a new fuzzer targeting the shim's Task Service API. This fuzzer exercises the primary ttrpc/gRPC interface used by containerd to interact with the shim. The fuzzer: - Performs sequences of fuzzed operations (up to 10 per iteration) to test state transitions and edge cases. - Covers all major Task Service methods: Create, Start, Delete, Exec, ResizePty, State, Pause, Resume, Kill, Pids, CloseIO, Checkpoint, Update, Wait, Connect, Shutdown, and Stats. - Mocks external dependencies like the event publisher and shutdown service. - Uses temporary directories and namespaces to ensure isolation during fuzzing. This helps verify that the shim's API has robust input validation and state management, protecting against potential vulnerabilities from malformed or malicious requests. Testing: - Verified successful fuzzer compilation in the OSS-Fuzz environment. - Ran the fuzzer for 10 seconds, confirming high execution rates and stable operation. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Fixes a goroutine and subscription leak in the shim task service where the service would subscribe to the global reaper but never unsubscribe upon shutdown. This caused the reaper's subscriber list to grow indefinitely and leaked processExits goroutines in long-running scenarios or tests (like fuzzing) that repeatedly initialize the service. The fix ensures reaper.Default.Unsubscribe(s.ec) is called during shutdown. Since Unsubscribe closes the channel, this also signals the processExits goroutine to terminate. Additionally, a recover() block was added to the send method to handle a race condition where an event might be sent to the events channel after it has been closed during shutdown, protecting the process from crashing with a panic. These issues were discovered by the FuzzTaskService fuzzer. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Update the Fuzzing workflow to upload crash artifacts found during the go_test_fuzz job. Currently, when `go test -fuzz` fails, the crash reproducers are generated but not preserved, making it difficult to diagnose and fix the issues discovered in CI. This change adds an upload-artifact step that captures all files in testdata/fuzz directories across the repository upon failure. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Address intermittent `exit status 2` crashes and hangs in FuzzTaskService caused by resource exhaustion (goroutine and file descriptor leaks). The primary leak occurred because each fuzzer iteration created a new shim task service, which in turn initialized a new `stdio.Platform`. In the Linux implementation, `NewPlatform` spawns a background epoll goroutine that was never closed, leading to thousands of leaked goroutines during high-frequency fuzzing. Changes: - Implement functional options for `NewTaskService` to allow dependency injection. - Add a private `withPlatform` option to provide an existing `stdio.Platform` instance. - Update `FuzzTaskService` to use a single, shared `stdio.Platform` across all iterations via the `withPlatform` option. - Make `mockPublisher` thread-safe by adding a `sync.Mutex` to protect the `events` slice. - Synchronize fuzzer shutdown by waiting for both the service's shutdown callbacks and the event forwarding goroutine to complete. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Include lessons learned about resource management, high-frequency iterations, and debugging techniques for fuzzers. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Address `FuzzPull` failures caused by permission denied errors when attempting to chown the gRPC socket and "address already in use" errors during parallel fuzzing. Changes: - Use the current user's UID and GID for the gRPC socket configuration instead of defaulting to 0 (root). - Use a unique temporary directory for the daemon's root, state, and socket path to avoid collisions between multiple fuzzer workers. - Change `defaultRoot`, `defaultState`, and `defaultAddress` from constants to variables to allow runtime updates. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Optimize `FuzzImageStore`, `FuzzLeaseManager`, `FuzzContainerStore`, and `FuzzContentStore` to share a single database instance across all fuzzer iterations. Previously, each iteration created a new temporary directory and a new database file, which is highly inefficient and caused `context deadline exceeded` errors in CI environments with slow I/O. Changes: - Refactor all fuzzers in `contrib/fuzz/metadata_fuzz_test.go` to initialize the database and other persistent resources once per fuzzer run. - Update `testDB` to accept `*testing.F` for managing the shared temporary directory. - Remove the redundant `testEnv` helper function. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Update `script/go-test-fuzz.sh` to skip the `docs/` directory when searching for fuzz targets. This prevents the script from attempting to run code snippets found in documentation as actual tests, which causes failures in CI because the `docs/` directory contains no Go source files. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
Update `FuzzCreateIO` and `FuzzCopyPipes` to use `t.TempDir()` for temporary directory management and increase the context timeout in `FuzzCreateIO` to 1 second. The previous 100ms timeout in `FuzzCreateIO` was too aggressive and caused `context deadline exceeded` errors in CI when the system was under high concurrent load. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This series of commits was generated in response to a threat modeling exercise and some identified areas where untrusted input is consumed. In terms of AI influence level the commits range from AIL-3 to AIL-5 with the actual fuzzers mostly being AIL-5.
In terms of changes I think are still worth discussing is the move to
tooldependencies for github.com/AdamKorcz/go-118-fuzz-build, since the build script cloning it seemed less than ideal.There are more opportunities to make the development process better; one problem with the current state is that building fuzzers with the containerd repo mounted into the oss-fuzz container ends up littering the repo with a bunch of
fuzz_binary files andprotoczips that never get cleaned up. There's also something odd the oss-fuzz container is doing around changing file ownership during fuzzer build.The commits have been edited decently beyond the agent's creation in terms of history curation: the guide was the last thing produced but is the first commit in the series as it's probably the most important, the build script/dependency changes are also things that were found later but commits are placed earlier. I had to do a bit of commit-splitting too.
This is in draft for now as I still want to iterate and review (at least a little more) the fuzzers themselves. As this adds 5 new fuzzers it's a bit of a big PR; I can easily split it into one per fuzzer to make reviewing easier but I wanted to post the whole series first just so it can all be seen.
cc @AdamKorcz since you did a lot of the initial fuzzer work.