An administrator needs to perform a comprehensive pre-production stress test on a DGX H100 system. Which command validates GPU, CPU, memory, and storage components while following NVIDIA’s recommended procedure?

A.

nvidia-smi -q | grep "GPU Stress Test"

B.

sudo nvsm stress-test --force

C.

stress --cpu $(nproc) --io $(nproc) --timeout 600

D.

./gpu_burn 60

During cluster deployment, the UFM Cable Validation Tool reports "Wrong-neighbor" errors on multiple InfiniBand links. What is the most efficient way to resolve this issue?

A.

Reboot all leaf switches to force LLDP rediscovery.

B.

Replace all affected cables with higher-grade OM5 fiber optics.

C.

Verify LLDP data against topology files and remediate.

D.

Disable FEC on all switches to bypass neighbor validation.
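
A wrong-neighbor report means that a port is cabled to a different peer than the planned topology expects. As a minimal sketch of that comparison, the Python below checks a planned port map against observed neighbor data. The function name `find_wrong_neighbors`, the dictionary layout, and the switch and port names are all hypothetical; this is not the UFM tool's file format or API.

```python
def find_wrong_neighbors(expected, observed):
    """Return ports whose observed peer differs from the planned peer.

    Both arguments map (device, port) -> (peer_device, peer_port).
    The layout is an assumption made for illustration only.
    """
    mismatches = {}
    for local_port, planned_peer in expected.items():
        actual_peer = observed.get(local_port)
        # A port with no observed neighbor is a different fault (link down),
        # so only report ports that see the wrong peer.
        if actual_peer is not None and actual_peer != planned_peer:
            mismatches[local_port] = (planned_peer, actual_peer)
    return mismatches


# Hypothetical planned topology and observed neighbor data with two swapped cables.
expected = {
    ("leaf01", "p1"): ("spine01", "p1"),
    ("leaf01", "p2"): ("spine02", "p1"),
}
observed = {
    ("leaf01", "p1"): ("spine02", "p1"),
    ("leaf01", "p2"): ("spine01", "p1"),
}

for port, (planned, actual) in sorted(find_wrong_neighbors(expected, observed).items()):
    print(port, "planned", planned, "observed", actual)
```

In this sample both ports are reported, which is the typical signature of two swapped cables rather than a faulty cable or switch.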

During BCM cluster setup, an engineer must configure bonded network interfaces on DGX nodes for high availability. Which cmsh command sequence properly configures a bond0 interface with two physical NICs?

A.

device use dgx001 ; interfaces add vlan vlan100 ; set parent bond0 ; set mode 1 ; set network internalnet

B.

device use dgx001 ; interfaces add bond bond0 ; append interfaces enp225s0f1np1 enp97s0f1np1 ; set mode 1 ; set network internalnet

C.

device use dgx001 ; interfaces set enp225s0f1np1 network internalnet ; interfaces set enp97s0f1np1 network internalnet

D.

device use dgx001 ; interfaces delete enp225s0f1np1 ; interfaces delete enp97s0f1np1

An infrastructure engineer is preparing a new AI cluster for production use, relying on NVIDIA switches and high-speed optical transceivers for node connectivity. The team is finalizing network validation before launching large-scale training jobs. Why is it critical to confirm and align the firmware version on all switch transceivers prior to production?

A.

To guarantee that hardware inventory tools can report serial numbers and manufacturer codes for asset management, which is critical for future support and troubleshooting.

B.

To ensure stability, bandwidth, and compatibility across the cluster, avoiding link issues and performance loss.

C.

To allow the network operating system to automatically discover all connected transceivers with heterogeneous firmware.

D.

To reduce GPU memory consumption during distributed training jobs.

A network engineer is tasked with configuring the management, storage, and compute networks for a new DGX BasePOD deployment. Which statement best describes the network segmentation required for optimal operation?

A.

A single VLAN for all types of network traffic.

B.

Two networks: one for management and one for compute.

C.

Four networks: compute, storage, out-of-band, and management.

A user wants to restrict a Docker container to use only GPUs 0 and 2. Which command achieves this?

A.

docker run --gpus '"device=0,2"' nvidia/cuda:12.1-base nvidia-smi

B.

docker run -e NVIDIA_VISIBLE_DEVICES=0,2 nvidia/cuda:12.1-base nvidia-smi

C.

docker run --gpus all nvidia/cuda:12.1-base nvidia-smi -id=0,2

D.

docker run --device /dev/nvidia0,/dev/nvidia2 nvidia/cuda:12.1-base nvidia-smi
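
The nested quoting in the `--gpus` form is easy to get wrong, because Docker needs to receive the inner double quotes so that the comma is not treated as a field separator. The Python sketch below builds that command line for a list of GPU indices. The helper name `docker_gpu_command` is hypothetical, and the code only constructs the string; it does not run Docker.

```python
import shlex


def docker_gpu_command(gpu_indices, image, command):
    """Build a shell-safe `docker run` line restricted to the given GPU indices."""
    # The double quotes are part of the value handed to Docker,
    # so they are added here as literal characters.
    device_spec = '"device=' + ",".join(str(i) for i in gpu_indices) + '"'
    argv = ["docker", "run", "--gpus", device_spec, image, *command]
    # shlex.join adds the outer single quotes needed by a POSIX shell.
    return shlex.join(argv)


print(docker_gpu_command([0, 2], "nvidia/cuda:12.1-base", ["nvidia-smi"]))
```

When the command is launched from Python without a shell, for example by passing the argument list to `subprocess.run`, the outer single quotes are unnecessary and only the inner double quotes should remain in the argument.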