[Azure Logs] Expand date formats for parsing time fields#16328
Conversation
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
Force-pushed from 0518b40 to f64bbff
For a bug fix, I would expect to see some tests that verify that the bug has been fixed. There are no changes to the tests, so running the test suite only validates that the old tests did not break.
kaiyan-sheng left a comment
Maybe we can add a sample log for each additional format, as @StacieClark-Elastic mentioned about tests?
I agree, it makes sense to add more tests to prove the change is effective. For more context: this is not addressing a user-reported occurrence of the problem. It is mostly a defense against future off-spec changes, so we don't know yet which data stream will receive off-spec dates. Azure often emits off-spec values in dates and all sorts of other fields. I classified this PR as a bug fix because Azure-focused integrations have to work in this environment.
Yeah, and I guess we need to do it for every data stream in the Azure Logs package.
Expands the fallback formats based on the observed frequency of the format for each given log category.
"M/d/yyyy h:mm:ss a XXX"
f64bbff to
2b76017
Compare
Tests related to the changes would be nice. Otherwise, LGTM!
Thanks for all the comments and suggestions: collecting time format samples to add a "corner cases time formats" test file for each data stream.
@StacieClark-Elastic, added the following test cases with the known time formats:

{"time": "01/09/2007 09:41:00", "category": "Administrative"}
{"time": "1/9/2007 09:41:00", "category": "Administrative"}
{"time": "01/09/2007 09:41:00 AM", "category": "Administrative"}
{"time": "1/9/2007 9:41:00 AM", "category": "Administrative"}
{"time": "1/9/2007 10:41:00 AM +01:00", "category": "Administrative"}
{"time": "2007-01-09T09:41:00", "category": "Administrative"}
{"time": "2007-01-09T09:41:00.22Z", "category": "Administrative"}
{"time": "2007-01-09T09:41:00.6816663Z", "category": "Administrative"}
{"time": "2007-01-09T09:41:00.535404056Z", "category": "Administrative"}
{"time": "2007-01-09T09:41:00.992099+00:00", "category": "Administrative"}
{"time": "2007-01-09T11:41:00+02:00", "category": "Administrative"}Then I added the test case to all data streams that attempt to parse the This should ensure all data streams support the known data format variations we spotted in the wild. Please let me know what you think of this approach. |
@StacieClark-Elastic, please let me know if you think the change requires more or different test coverage.
💚 Build Succeeded
cc @zmoog
@StacieClark-Elastic, I'd appreciate your review when you get a moment! 🙇 |
Package azure - 1.31.1 containing this change is available at https://epr.elastic.co/package/azure/1.31.1/ |
Proposed commit message
We are revising the date processor format list to improve robustness. Formats are prioritized by usage frequency: the first format is the most common, followed by the necessary fallbacks.
This addresses the fact that Azure Logs do not adhere to a single date format. Ordering by frequency maximizes efficiency while maintaining parsing resilience.
Checklist

- changelog.yml file.
- I have verified that any added dashboard complies with Kibana's Dashboard good practices.

How to test this PR locally
Related issues

- time format across all Azure Logs integrations #15977