[AWS][Kinesis] Add dimension fields for TSDB support #5891
constanca-m merged 3 commits into elastic:main
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
agithomas left a comment
Review feedback shared.
type: group
fields:
  - name: StreamName
    dimension: true
Please add the reason for making this specific field a dimension field here. Documenting the reason is one of the best practices for TSDB enablement.
By "here", do you mean in the manifest, @agithomas? It's explained under "Details" in the PR description.
It can be added as an inline comment. (Reference)
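For illustration, an inline comment of that kind could look roughly like this in the data stream's fields.yml. This is a sketch: the group nesting and the description text are assumptions, and the justification is taken from the PR description (stream names are unique per AWS region).

```yaml
- name: aws
  type: group
  fields:
    - name: dimensions
      type: group
      fields:
        # TSDB dimension: a Kinesis stream name is only unique per AWS region,
        # so together with cloud.account.id and cloud.region it identifies one stream.
        - name: StreamName
          type: keyword
          dimension: true
          description: Name of the Kinesis stream (illustrative text).
```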
I have some thoughts on a better way to handle AWS dimensions. Dimension fields have a length limit, and AWS permits up to 30 dimensions.
If all 30 names and values were used up to their maximum length, we would hit the 32 KB dimension field length limit.
Could we apply the fingerprint processor to all AWS dimensions and use the new field (holding the fingerprint) as the dimension field instead?
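To make the proposal concrete, here is a sketch of the suggested ingest pipeline step, assuming the Elasticsearch `fingerprint` ingest processor is available in the target stack version. The target field `aws.dimensions_fingerprint` is a hypothetical name; it would also need a `keyword` mapping with `dimension: true` in fields.yml, replacing the individual `aws.dimensions.*` dimensions.

```yaml
---
description: Sketch only. Hypothetical fingerprint step for AWS dimensions.
processors:
  # Hash every key/value pair under aws.dimensions into a single fixed-size value,
  # so the dimension size no longer grows with the number of AWS dimensions.
  - fingerprint:
      fields:
        - aws.dimensions
      target_field: aws.dimensions_fingerprint  # hypothetical field name
      ignore_missing: true
```

Because the hash is opaque, the original `aws.dimensions.*` fields would still be kept for filtering and display.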
OK, I will try to add it to the document without making it confusing. I don't understand the other part, though: only 3 fields are set as dimensions, so why would we need a processor? @agithomas
Please validate the proposal against:
- how often AWS changes the dimensions of a managed service
- the feasibility of adding an ingest pipeline and a new field solely to implement TSDB.
Sorry, I don't understand: is there a reason to create a new dimension from the aws.dimensions.* fields? The 3 fields currently set as dimensions should be enough. @agithomas
The proposal above is based only on what is convenient for TSDB. Please compare the approaches and choose the best one.
@agithomas, as I understand it, AWS dimensions shouldn't be a concern, at least for this data stream, since we set StreamName as a dimension (a dimension in the TSDB sense). Here is a sample event.
So we set aws.dimensions.StreamName (a field of type keyword) as the dimension, not aws.dimensions.* (a field of type object).
  name: cloud
- external: ecs
  name: cloud.account.id
  dimension: true
There was a suggestion to align on a list of fields (elastic/ecs#2172), and it was suggested to use cloud.project.id. Is this field available for AWS?
I haven't checked, but adding it as a dimension would be redundant, since we don't need it as a dimension. @tetianakravchenko
We checked with @agithomas: cloud.project.id does not exist for all cloud providers; it is only present for GCP.
type: group
fields:
  - name: StreamName
    dimension: true
@constanca-m, how is this name defined? As far as I can see, it is not set in the configuration.

What if two Kinesis data streams are created in the same region?
When we connect to AWS, the integration fetches data from the existing data streams; it doesn't create any data stream when the integration is added. The stream name is unique per region, and since the region is a dimension, this shouldn't be a problem. @tetianakravchenko
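As a sketch of how the region and account complete the stream's identity, the two ECS fields can be declared as dimensions following the same `external: ecs` pattern used in the diff. The placement in the data stream's ecs.yml and the comment text are assumptions for illustration.

```yaml
# StreamName is only unique per region, so the account and region are
# declared as TSDB dimensions too; together they identify a single stream.
- external: ecs
  name: cloud.account.id
  dimension: true
- external: ecs
  name: cloud.region
  dimension: true
```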
tetianakravchenko left a comment
As agreed, the agent.id field might need to be added as a dimension in the future; for now, we are keeping it as is.
@constanca-m, please confirm whether the tests below have been conducted: Verification and validation
Yes, the dimensions are correct. I didn't check off the tasks because they are affected by two PRs, one for dimensions and one for metrics. @agithomas
OK, thanks. Please include this validation in the metric_type mapping PR.

Package aws 1.34.2, containing this change, is available at https://epr.elastic.co/search?package=aws
What does this PR do?
Adds dimension fields to the Kinesis data stream.
Details
To uniquely identify a Kinesis stream, we need the combination of the stream name (unique per AWS region), the account ID, and the region. No metrics are split by labels, so no further dimensions should be needed. Tests with TSDB enabled and disabled showed no change in the number of documents received.
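For the enabled/disabled comparison, TSDB is switched on per data stream in its manifest. A minimal sketch, assuming the package-spec `elasticsearch.index_mode` setting; the file path and the omission of the other manifest keys are assumptions for illustration.

```yaml
# data_stream/kinesis/manifest.yml (assumed path; excerpt, other keys omitted)
elasticsearch:
  index_mode: time_series
```

Removing the `index_mode` setting gives the TSDB-disabled run to compare document counts against.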
Checklist
- changelog.yml file.

How to test this PR locally
Refer to #5864
Related issues