A system administrator manages a high-performance computing (HPC) cluster that uses an InfiniBand fabric for its high-speed interconnect between nodes. Researchers report unusually slow data transfer rates between two specific compute nodes, and the administrator needs to verify that the path between these two nodes is optimal.
What command should be used?
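As an illustrative sketch (assuming the infiniband-diags tools are installed and the node LIDs are known; the LID values 42 and 57 below are placeholders), the fabric path between two nodes can be inspected like this:

```shell
# Show local HCA state and LID on each node
ibstat

# Check the width/speed of every link in the fabric for degraded links
# (e.g. a link negotiated at 1x instead of 4x)
iblinkinfo

# Trace the switch-by-switch route between the two nodes' LIDs
# (replace 42 and 57 with the actual source and destination LIDs)
ibtracert 42 57
```

ibtracert prints each hop along the route, which makes a suboptimal or congested path between the two nodes visible.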
An administrator requires full access to the NGC Base Command Platform CLI.
Which command should be used to accomplish this action?
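A minimal sketch of configuring the NGC CLI for full access (assuming the CLI is already installed and an API key has been generated from the NGC web console):

```shell
# Interactively store credentials and defaults for the NGC CLI
# Prompts for: API key, CLI output format, org, team, and ACE
ngc config set

# Confirm the active configuration
ngc config current
```

Once the API key is stored, subsequent `ngc` commands authenticate automatically against the configured org and team.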
A system administrator is troubleshooting a Docker container that crashes unexpectedly due to a segmentation fault. They want to generate and analyze core dumps to identify the root cause of the crash.
Why would generating core dumps be a critical step in troubleshooting this issue?
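A hedged sketch of one way to capture and inspect a core dump from a crashing container (paths, the image name `myimage`, and the binary name are placeholders; note that `kernel.core_pattern` is host-wide and inherited by containers):

```shell
# On the Docker host: write core files to a known location
echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern

# Start the container with an unlimited core-file size limit,
# bind-mounting the dump directory so the core file survives the crash
docker run --ulimit core=-1 \
  --mount type=bind,source=/tmp,target=/tmp \
  myimage

# After the crash, open the dump in gdb and print the faulting backtrace
gdb /path/to/binary /tmp/core.myapp.1234 -ex bt
```

The backtrace from the core dump points to the exact function and stack state at the moment of the segmentation fault, which is what makes the dump critical for root-cause analysis.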
A cloud engineer is looking to provision a virtual machine for machine learning using the NVIDIA Virtual Machine Image (VMI) and RAPIDS.
What technology stack will be set up for the development team automatically when the VMI is deployed?
A system administrator needs to lower latency for an AI application by utilizing GPUDirect Storage.
What two (2) bottlenecks are avoided with this approach? (Choose two.)
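As a related sketch, whether GPUDirect Storage is actually supported and enabled on a node can be checked with the gdscheck utility (the install path below is the common CUDA location but may vary by version):

```shell
# Print the GPUDirect Storage platform configuration:
# driver status, supported filesystems, and whether GDS I/O is enabled
/usr/local/cuda/gds/tools/gdscheck -p
```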
You are managing a high-performance computing environment. Users have reported storage performance degradation, particularly during peak usage hours when both small metadata-intensive operations and large sequential I/O operations are being performed simultaneously. You suspect that the mixed workload is causing contention on the storage system.
Which of the following actions is most likely to improve overall storage performance in this mixed workload environment?
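One possible tuning, sketched here under the assumption of a Lustre-based parallel filesystem, is to separate the two workload types into directories with different striping layouts so they stop contending for the same targets (the mount points below are placeholders):

```shell
# Stripe large sequential I/O wide across 8 OSTs with a 4 MiB stripe size
lfs setstripe -c 8 -S 4M /mnt/lustre/large_io

# Keep small, metadata-intensive workloads on a single stripe
lfs setstripe -c 1 /mnt/lustre/small_files
```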
An administrator wants to check if the BlueMan service can access the DPU.
How can this be done?
If a Magnum IO-enabled application experiences delays during the ETL phase, what troubleshooting step should be taken?
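One way to locate where the ETL time is actually going, assuming Nsight Systems is available and `etl_job.py` stands in for the real application, is to profile the run and compare time spent in storage I/O, host-to-device copies, and GPU kernels:

```shell
# Trace CUDA activity and OS runtime (I/O) calls during the ETL phase
nsys profile -o etl_report --trace=cuda,osrt python etl_job.py

# Summarize the captured report
nsys stats etl_report.nsys-rep
```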
Your Kubernetes cluster is running a mixture of AI training and inference workloads. You want to ensure that inference services have higher priority over training jobs during peak resource usage times.
How would you configure Kubernetes to prioritize inference workloads?
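A minimal sketch using Kubernetes PriorityClass objects (the class names and priority values below are illustrative choices, not required names):

```shell
kubectl apply -f - <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: inference-high
value: 100000
globalDefault: false
description: "High priority for inference services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-low
value: 1000
description: "Lower priority for training jobs"
EOF
```

Pods then reference a class via `priorityClassName` in their spec (e.g. `priorityClassName: inference-high`); during resource pressure the scheduler favors, and can preempt in favor of, the higher-priority inference pods.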