
fix: use flexible datetime parsing for start_date in file-based connectors #887

Merged
Aaron ("AJ") Steers (aaronsteers) merged 8 commits into main from devin/1769813348-fix-start-date-pattern on Jan 31, 2026
Conversation

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Jan 30, 2026

Summary

Replaces the strict regex pattern for start_date validation in AbstractFileBasedSpec with a Pydantic validator using ab_datetime_try_parse. This fixes issues where valid ISO8601/RFC3339 datetime strings like 2025-01-01T00:00:00Z (without microseconds) were incorrectly rejected.

Before: Only accepted YYYY-MM-DDTHH:mm:ss.SSSSSSZ (exactly 6 microsecond digits required)
After: Accepts any valid ISO8601/RFC3339 format via ab_datetime_try_parse
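
The before/after difference can be sketched with the standard library alone. This is a minimal illustration, not the CDK code: the regex is reconstructed from the "Before" description (the literal pattern in the CDK may differ), and `flexible_parse` is a hypothetical stand-in for `ab_datetime_try_parse` built on `datetime.fromisoformat`, so it is likely narrower than the real helper.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: reconstructed from the "Before" description, not copied from the CDK.
STRICT_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z$")


def strict_accepts(value: str) -> bool:
    """Old behavior: exactly six fractional-second digits and a trailing Z."""
    return STRICT_PATTERN.match(value) is not None


def flexible_parse(value: str) -> Optional[datetime]:
    """Hypothetical stand-in for ab_datetime_try_parse: a datetime, or None."""
    try:
        # Older Python versions reject a trailing "Z" in fromisoformat,
        # so rewrite it as an explicit UTC offset first.
        normalized = value[:-1] + "+00:00" if value.endswith("Z") else value
        return datetime.fromisoformat(normalized)
    except ValueError:
        return None


# Compare both checks on a value with and without microseconds.
for candidate in ("2025-01-01T00:00:00.000000Z", "2025-01-01T00:00:00Z"):
    print(candidate, strict_accepts(candidate), flexible_parse(candidate) is not None)
```

The strict check only passes the six-digit form, while the parse-based check passes both, which is the behavior change this PR is after.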

Fixes: airbytehq/oncall#9390

Updates since last revision

  • Updated csv_scenarios.py test expectations to remove pattern and pattern_descriptor fields from the expected spec, aligning with the new Field definition

Review & Testing Checklist for Human

  • Schema impact: The pattern and pattern_descriptor fields were removed from the Field definition. Verify this doesn't break UI rendering or connector spec generation for file-based sources.
  • Validation permissiveness: ab_datetime_try_parse is quite flexible (accepts date-only, timestamps, various offsets). Confirm this level of permissiveness is acceptable for start_date.
  • Test with affected connectors: Test a file-based connector (e.g., S3, SharePoint) with various start_date formats including 2025-01-01T00:00:00Z and 2025-01-01T00:00:00.000000Z.

Notes



Summary by CodeRabbit

  • New Features

    • Enhanced start_date field validation to accept flexible date/time formats, including date-only entries, various time precisions, and timezone offsets.
    • Improved error guidance for invalid date inputs.
  • Tests

    • Added comprehensive validation tests for start_date format handling across multiple scenarios.


…ctors

Replace strict regex pattern with ab_datetime_try_parse validator to accept
any valid ISO8601/RFC3339 datetime format. This fixes issues where valid
datetime strings like '2025-01-01T00:00:00Z' (without microseconds) were
incorrectly rejected.

Fixes: airbytehq/oncall#9390
Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1769813348-fix-start-date-pattern#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1769813348-fix-start-date-pattern

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment


github-actions bot commented Jan 30, 2026

PyTest Results (Fast)

3 848 tests (+9): 3 836 passed ✅ (+9), 12 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0), duration 6m 33s ⏱️ (-15s)

Results for commit a29abec. ± Comparison against base commit 313db66.

♻️ This comment has been updated with latest results.


github-actions bot commented Jan 30, 2026

PyTest Results (Full)

3 851 tests (+9): 3 839 passed ✅ (+9), 12 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0), duration 10m 52s ⏱️ (-10s)

Results for commit a29abec. ± Comparison against base commit 313db66.

♻️ This comment has been updated with latest results.

devin-ai-integration bot and others added 2 commits January 30, 2026 23:34
@aaronsteers Aaron ("AJ") Steers (aaronsteers) marked this pull request as ready for review January 30, 2026 23:38
Copilot AI review requested due to automatic review settings January 30, 2026 23:38
Copilot AI left a comment


Pull request overview

This PR replaces strict regex validation for start_date in file-based connectors with a flexible Pydantic validator using ab_datetime_try_parse. This allows valid ISO8601/RFC3339 datetime formats (like 2025-01-01T00:00:00Z) that were previously rejected due to the requirement of exactly 6 microsecond digits.

Changes:

  • Added a Pydantic validator to AbstractFileBasedSpec that uses ab_datetime_try_parse for flexible datetime parsing
  • Updated the pattern and pattern_descriptor fields to reflect the broader range of accepted formats
  • Added test coverage for various datetime format inputs including edge cases

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py | Added validator method and updated field metadata to support flexible datetime formats
unit_tests/sources/file_based/config/test_abstract_file_based_spec.py | Added comprehensive test coverage for the new validator with various datetime formats
unit_tests/sources/file_based/scenarios/csv_scenarios.py | Updated test expectations to match new pattern and examples



coderabbitai bot commented Jan 30, 2026

Walkthrough

This PR enhances start_date validation in file-based source configurations by introducing Pydantic validators to enforce ISO8601/RFC3339 format parsing, expanding supported date-time formats, and adding comprehensive test coverage.

Changes

Cohort: Start Date Validation Logic
File(s): airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py
Summary: Adds validator from pydantic.v1 and ab_datetime_try_parse helper. Introduces validate_start_date() classmethod that enforces ISO8601/RFC3339 format parsing at runtime, raising ValueError for unparseable strings. Expands start_date field constraints with additional examples and a more permissive regex pattern accepting optional time components and timezones.

Cohort: Test Coverage & Configuration
File(s): unit_tests/sources/file_based/config/test_abstract_file_based_spec.py, unit_tests/sources/file_based/scenarios/csv_scenarios.py
Summary: Adds parametrized test test_start_date_validation covering valid (microseconds, milliseconds, timezone offsets, date-only, None) and invalid format cases. Updates CSV scenario start_date schema with expanded examples and relaxed regex pattern to accept date-only and various time precision formats.
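
The validator behavior described in this walkthrough (pass None through, raise ValueError for unparseable strings) might look roughly like the sketch below. It is a plain function rather than the pydantic.v1 classmethod in the PR, `_try_parse` is a hypothetical stdlib stand-in for `ab_datetime_try_parse`, and the error message wording is an assumption.

```python
from datetime import datetime
from typing import Optional


def _try_parse(value: str) -> Optional[datetime]:
    """Hypothetical stand-in for ab_datetime_try_parse (stdlib only)."""
    try:
        # Rewrite a trailing "Z" as an explicit UTC offset for older Pythons.
        normalized = value[:-1] + "+00:00" if value.endswith("Z") else value
        return datetime.fromisoformat(normalized)
    except ValueError:
        return None


def validate_start_date(value: Optional[str]) -> Optional[str]:
    """Return the value unchanged if absent or parseable; raise otherwise."""
    if value is None:
        return None  # start_date is optional, so None passes through
    if _try_parse(value) is None:
        # Assumed wording; the message in the PR may differ.
        raise ValueError(f"'{value}' is not a valid ISO8601/RFC3339 date or datetime.")
    return value
```

Returning the original string rather than the parsed datetime keeps the stored config value unchanged, so downstream code that already parses start_date is unaffected; whether the PR's validator does the same is not stated here.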

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 66.67%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly summarizes the main change: replacing strict regex validation with flexible datetime parsing for start_date in file-based connectors.



@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View issue and 3 additional flags in Devin Review.



Aaron ("AJ") Steers (aaronsteers) commented Jan 30, 2026

/prerelease

Prerelease Job Info

This job triggers the publish workflow with default arguments to create a prerelease.

Prerelease job started... Check job output.

✅ Prerelease workflow triggered successfully.

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/21534542274


APPROVED


@devin-ai-integration

Validation Evidence: Flexible start_date Parsing

I've validated that the flexible datetime parsing works correctly on the SharePoint Enterprise connector (which inherits from AbstractFileBasedSpec).

Unit Tests Added

Added 9 parametrized test cases in unit_tests/test_spec.py covering:

  • With microseconds: 2021-01-01T00:00:00.000000Z
  • Without microseconds: 2021-01-01T00:00:00Z (Terraform provider format)
  • With milliseconds: 2021-01-01T00:00:00.000Z
  • With timezone offset: 2021-01-01T00:00:00+00:00
  • Date only: 2021-01-01
  • None value
  • Invalid strings (correctly rejected)
  • Empty strings (correctly rejected)
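
The cases listed above can be expressed as a small table-driven check. This is only a sketch: `is_accepted` is a hypothetical stdlib stand-in for the real validator (which uses `ab_datetime_try_parse`), and the actual tests in the PR are parametrized pytest cases rather than a plain loop.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def is_accepted(value: Optional[str]) -> bool:
    """Hypothetical stand-in: True if the value is None or parses as ISO8601."""
    if value is None:
        return True
    try:
        # Rewrite a trailing "Z" as an explicit UTC offset for older Pythons.
        normalized = value[:-1] + "+00:00" if value.endswith("Z") else value
        datetime.fromisoformat(normalized)
        return True
    except ValueError:
        return False


# (input, expected acceptance), mirroring the cases listed above.
CASES: List[Tuple[Optional[str], bool]] = [
    ("2021-01-01T00:00:00.000000Z", True),  # with microseconds
    ("2021-01-01T00:00:00Z", True),         # without microseconds
    ("2021-01-01T00:00:00.000Z", True),     # with milliseconds
    ("2021-01-01T00:00:00+00:00", True),    # with timezone offset
    ("2021-01-01", True),                   # date only
    (None, True),                           # None value
    ("not-a-date", False),                  # invalid string
    ("", False),                            # empty string
]

failures = [(value, expected) for value, expected in CASES if is_accepted(value) != expected]
print("All cases passed" if not failures else f"Failures: {failures}")
```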

Direct Validation Results

Testing start_date validation on SourceMicrosoftSharePointSpec:
======================================================================
PASS: with_microseconds: "2021-01-01T00:00:00.000000Z" -> accepted
PASS: without_microseconds (Terraform format): "2021-01-01T00:00:00Z" -> accepted
PASS: with_milliseconds: "2021-01-01T00:00:00.000Z" -> accepted
PASS: terraform_provider_format: "2025-01-01T00:00:00Z" -> accepted
PASS: with_timezone_offset: "2021-01-01T00:00:00+00:00" -> accepted
PASS: date_only: "2021-01-01" -> accepted
PASS: none_value: "None" -> accepted
PASS: invalid_string: "not-a-date" -> correctly rejected
PASS: empty_string: "" -> correctly rejected
======================================================================
All tests passed!

This confirms the fix resolves the original issue where Terraform provider's Go time.Time serialization (without microseconds) was being rejected.

@aaronsteers Aaron ("AJ") Steers (aaronsteers) merged commit efad73e into main Jan 31, 2026
28 of 29 checks passed
@aaronsteers Aaron ("AJ") Steers (aaronsteers) deleted the devin/1769813348-fix-start-date-pattern branch January 31, 2026 00:04
4 participants