[Nvidia_GPU] Nvidia GPU Integration Enhancements #14081

Merged: Linu-Elias merged 17 commits into elastic:main from Linu-Elias:nvidia_enhancements on Jun 16, 2025

Conversation

@Linu-Elias (Contributor) commented May 30, 2025

Proposed commit message

  • Revised field group and included all fields
  • Dashboard Enhancements
  • Updated Documentation
  • Generated sample_event.json file

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Screenshots

[Images: nvidia_dashboard1 1, nvidia_dashboard1 2, nvidia_dashboard2 1, nvidia_dashboard2 2]

@Linu-Elias Linu-Elias requested a review from a team as a code owner May 30, 2025 11:38
@andrewkroh added the dashboard, Integration:nvidia_gpu, and Team:Obs-InfraObs labels May 30, 2025
@Linu-Elias Linu-Elias self-assigned this May 30, 2025
@Linu-Elias Linu-Elias force-pushed the nvidia_enhancements branch from 4817e6a to 8e1956c Compare May 30, 2025 13:18
@Linu-Elias Linu-Elias requested a review from agithomas June 2, 2025 05:04
@agithomas agithomas requested a review from daniela-elastic June 2, 2025 10:19
@agithomas (Contributor) commented:

Please make sure that errors are not displayed when the BIOS version details are not available.

@agithomas (Contributor) left a comment

LGTM!

The dashboard description says "The following dashboard provides insights into the following". To avoid repeating the word "following", replace it with "The Overview dashboard provides insights into the following".

Is this a second dashboard (via a link panel)? If so, I can't see the name of the dashboard. We should update the screenshot. I assume this is the dashboard named "GPU-level Metrics", as per the previous screenshot?

Also, in the "Brand name" widget, what values would we show there? It looks like a widget meant to show a number, but a brand name sounds like it will be a string. Can you give an example of a brand name value?

@Linu-Elias (Contributor, Author) replied:

This section displays labels in general. It includes the version, model name, brand name, last gathered date, and the number of GPUs monitored. The brand name is most likely a string, similar to the model name.

Replace "Error count" with "Errors"

Do we want to show screenshots of dashboard sections with empty widgets?

@daniela-elastic left a comment

Left feedback, please address before merging. Conditionally approved.

Contributor

Could you re-check whether it is appropriate to remove these mappings? They may be important dimensions that should be included.

@Linu-Elias (Contributor, Author) replied:

The fields are provided as labels, so we have not removed the mappings. Instead, we have mapped them using labels.*, which covers all GPU labels.
I understand that we will need to update this and implement individual mappings during the TSDB enablement process. I am currently making the necessary changes while working on this task.

@elasticmachine commented:

💚 Build Succeeded

cc @Linu-Elias

@Linu-Elias (Contributor, Author) commented:

Quoting @daniela-elastic: "Left feedback, please address before merging. Conditionally approved"

@daniela-elastic, I have addressed all the comments and made the changes accordingly.
Kindly review.

@Linu-Elias Linu-Elias merged commit c6cf3d5 into elastic:main Jun 16, 2025
7 checks passed
@elastic-vault-github-plugin-prod

Package nvidia_gpu - 0.2.0 containing this change is available at https://epr.elastic.co/package/nvidia_gpu/0.2.0/

@andrewkroh added the documentation label Jul 1, 2025
Labels

  • dashboard: Relates to a Kibana dashboard bug, enhancement, or modification.
  • documentation: Improvements or additions to documentation. Applied to PRs that modify *.md files.
  • Integration:nvidia_gpu: NVIDIA GPU Monitoring
  • Team:Obs-InfraObs: Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations]

5 participants