
When configuring an out-of-core HPL burn-in for a 40B matrix on 8x H100 nodes, which environment variable prevents GPU out-of-memory errors while reserving space for drivers?

A.

export HPL_OOC_SAFE_SIZE=4.0

B.

export HPL_OOC_MODE=0

C.

export HPL_OOC_NUM_STREAMS=8

D.

export HPL_OOC_MAX_GPU_MEM=90

A team is installing the NVIDIA Run:ai control plane on a Kubernetes cluster. Which two (2) options are most critical to validate before proceeding? (Pick the 2 correct responses below)

A.

Helm is installed on the installer machine.

B.

Ensure Kubernetes is running on the cluster.

C.

All cluster nodes have NVIDIA GPUs installed.

D.

NTP is disabled to simplify time synchronization.

An enterprise IT team has completed the physical installation of an AI Factory with a Spectrum-X Ethernet network connected to all GPU servers. They now need to ensure the environment is ready for scalable AI workload deployment. What is the recommended sequence of validation steps?

A.

Set up Active Directory and LDAP, configure role-based access controls and security settings first, install users, and skip network or hardware performance validation.

B.

Perform application benchmarking first, use performance logs to identify bottlenecks, update switch and server firmware afterward, and then tune the network using performance tests.

C.

Validate the software stack, test link connectivity and port health, run network benchmarks, run OSPF, ensure neighbors are exchanging route information, then stage AI workload tests.

D.

Confirm switch and server firmware configuration, test link connectivity and port health, run network benchmarks, validate the software stack, then stage AI workload tests.

A customer has just completed the first boot of their DGX system and is prompted to create an administrative user. What is the correct approach for setting up this user to ensure secure BMC and GRUB access?

A.

Create a unique, strong, lower-case username and password that will be used for both BMC and GRUB access, avoiding default or weak credentials.

B.

Create separate usernames for BMC and GRUB to maximize flexibility.

C.

Skip the creation of a new user and retain the default admin account for BMC and GRUB access.

D.

Use “sysadmin” as the username and a simple password for ease of management.

For a 48-hour NCCL burn-in test, which parameters ensure sustained fabric stress while detecting silent data corruption?

A.

broadcast_perf -b 4G -e 16G -w 160

B.

all_reduce_perf -b 8G -e 32G -c 1000 -z 1 -G 1000

C.

all_reduce_perf -b 8G -e 32G -z 1 -G 1000

D.

reduce_scatter_perf -f 2 -g 8
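
The four candidate command lines above differ only in the binary and a handful of flags, so it can help to compare them flag by flag. The sketch below is one way to do that in Python. The `describe` helper and the `FLAG_NOTES` table are hypothetical names introduced here, and the flag descriptions are recalled from the nccl-tests README, so treat them as assumptions and confirm them against the `--help` output of your own build.

```python
import shlex

# Flag meanings as recalled from the nccl-tests README (assumption:
# verify against the --help output of the binaries you actually run).
FLAG_NOTES = {
    "-b": "minimum message size",
    "-e": "maximum message size",
    "-f": "multiplication factor between sizes",
    "-g": "GPUs per thread",
    "-w": "warmup iterations",
    "-c": "correctness-check iterations",
    "-z": "blocking collectives (0/1)",
    "-G": "CUDA graph launches",
}


def describe(command):
    """Split a command line into (binary, {flag: value}).

    Assumes every flag is followed by exactly one value, which holds
    for the four options listed in the question.
    """
    tokens = shlex.split(command)
    binary, args = tokens[0], tokens[1:]
    flags = {}
    for i in range(0, len(args) - 1, 2):
        flags[args[i]] = args[i + 1]
    return binary, flags


candidates = [
    "broadcast_perf -b 4G -e 16G -w 160",
    "all_reduce_perf -b 8G -e 32G -c 1000 -z 1 -G 1000",
    "all_reduce_perf -b 8G -e 32G -z 1 -G 1000",
    "reduce_scatter_perf -f 2 -g 8",
]

for command in candidates:
    binary, flags = describe(command)
    print(binary)
    for flag, value in flags.items():
        note = FLAG_NOTES.get(flag, "unknown flag")
        print(f"  {flag} {value}: {note}")
```

Listing the flags side by side shows which options set large message sizes and which mention a correctness-check flag at all, which are the two properties the question asks about.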

Which statement best explains why maintaining high cable signal quality is essential in modern high-speed data centers?

A.

High cable signal quality ensures that cable length and connector type do not play as big a role in deploying new infrastructure in the data center.

B.

High cable signal quality minimizes bit error rates and supports reliable, high-throughput communication, reducing retransmissions and congestion across the network.

C.

High cable signal quality reduces electromagnetic interference (EMI) and crosstalk, helping prevent unexpected packet drops during sustained workloads.

D.

High cable signal quality enables effective use of Forward Error Correction (FEC), which is required for reliable operation at high data rates such as 200GbE and above.
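
The link between bit error rate and retransmissions mentioned in the options above can be made concrete with a little arithmetic. The Python sketch below uses illustrative numbers only: the line rate, frame size, and BER values are assumptions chosen for the example, and the model treats bit errors as independent with no Forward Error Correction applied.

```python
import math


def bit_errors_per_second(line_rate_bps, ber):
    """Expected number of raw bit errors per second on a saturated link."""
    return line_rate_bps * ber


def frame_error_probability(frame_bytes, ber):
    """Probability that a frame contains at least one bit error.

    Assumes independent bit errors and no FEC. Computed as
    1 - (1 - ber) ** bits, using log1p/expm1 to stay accurate
    for very small BER values.
    """
    bits = frame_bytes * 8
    return -math.expm1(bits * math.log1p(-ber))


# Illustrative comparison of a clean link and a marginal one (assumed values).
for ber in (1e-12, 1e-8):
    errors = bit_errors_per_second(400e9, ber)
    p_frame = frame_error_probability(1500, ber)
    print(f"BER {ber:g}: {errors:g} bit errors/s, frame error probability {p_frame:.3g}")
```

Under these assumptions, degrading the BER by four orders of magnitude raises the share of damaged 1500-byte frames by roughly the same factor, and every damaged frame that is not corrected has to be dropped and resent.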

After configuring HA, the administrator runs cmsh status and notices the secondary head node reports mysql [FAIL]. What is the most likely cause?

A.

The BCM license expired after HA configuration.

B.

Network connectivity issues between the primary and secondary head nodes.

C.

The secondary head node lacks NVIDIA GPU drivers.

D.

The cluster nodes are powered on during the HA configuration.

What command sequence is used to identify the exact name of the server that runs as the master SM in a multi-node fabric?

A.

sminfo, then smpquery ND

B.

ibstat, then sminfo

C.

ibnetdiscover, then ibsim

D.

sminfo, then smpquery NI
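
Several of the options above chain `sminfo` into a follow-up `smpquery` lookup. As a rough sketch of how that chaining could be scripted, the Python below extracts the subnet manager's LID from a line of `sminfo` output and builds a node-description query from it. The sample line, the GUID in it, and the helper names are illustrative assumptions, and the exact output format may differ between infiniband-diags versions.

```python
import re

# Illustrative sminfo output line (assumed format and made-up values).
SAMPLE_SMINFO = (
    "sminfo: sm lid 1 sm guid 0x248a0703009c0196, "
    "activity count 1205 priority 15 state 3 SMINFO_MASTER"
)

SMINFO_PATTERN = re.compile(
    r"sm lid (?P<lid>\d+) sm guid (?P<guid>0x[0-9a-fA-F]+).*state \d+ (?P<state>\w+)"
)


def parse_sminfo(line):
    """Return (lid, guid, state) parsed from one line of sminfo output."""
    match = SMINFO_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized sminfo output: {line!r}")
    return int(match.group("lid")), match.group("guid"), match.group("state")


def node_description_query(lid):
    """Build the argument list for a NodeDescription query against a LID."""
    return ["smpquery", "ND", str(lid)]


lid, guid, state = parse_sminfo(SAMPLE_SMINFO)
print(f"SM at LID {lid} ({guid}) reports state {state}")
print("follow-up command:", " ".join(node_description_query(lid)))
```

On a real fabric the argument list could be passed to `subprocess.run`. It is kept as plain data here so the sketch runs without InfiniBand hardware.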

A systems administrator is preparing a new DGX server for deployment. What is the most secure approach to configuring the BMC port during initial setup?

A.

Enable remote access to the BMC over the internet using the default admin credentials for initial troubleshooting.

B.

Connect the BMC port directly to the production network and retain default admin credentials for convenience.

C.

Leave the BMC port disconnected until after the operating system is fully configured and in production.

D.

Connect the BMC port to a dedicated and firewalled network and change the default admin credentials.

An administrator needs to verify HA functionality after configuring BCM (Base Command Manager, formerly Bright Cluster Manager). Which command confirms the active head node and failover readiness?

A.

cmsh status to check HA status and active/standby roles.

B.

nvsm show health to validate GPU status on both head nodes.

C.

systemctl restart cmdaemon to force a failover test.

D.

ping <secondary-head-node-ip> to test basic connectivity.