You are running a Hadoop cluster with all monitoring facilities properly configured.

Which scenario will go undetected?

A.

HDFS is almost full

B.

The NameNode goes down

C.

A DataNode is disconnected from the cluster

D.

Map or reduce tasks that are stuck in an infinite loop

E.

MapReduce jobs are causing excessive memory swaps

Your cluster’s mapred-site.xml includes the following parameters:

mapreduce.map.memory.mb

4096

mapreduce.reduce.memory.mb

8192

And your cluster’s yarn-site.xml includes the following parameters:

yarn.nodemanager.vmem-pmem-ratio

2.1

What is the maximum amount of virtual memory allocated for each map task before YARN will kill its Container?

A.

4 GB

B.

17.2 GB

C.

8.9 GB

D.

8.2 GB

E.

24.6 GB
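
The arithmetic behind this question can be sketched as follows. This is a minimal illustration, assuming the virtual memory ceiling for a container is its configured physical memory multiplied by yarn.nodemanager.vmem-pmem-ratio; the function name `vmem_limit_mb` is hypothetical and not part of any Hadoop API.

```python
def vmem_limit_mb(container_mb: int, vmem_pmem_ratio: float) -> float:
    """Virtual memory ceiling for one container, in MB.

    Assumes the ceiling is simply physical memory times the ratio.
    """
    return container_mb * vmem_pmem_ratio


# Values taken from the question: map = 4096 MB, reduce = 8192 MB, ratio = 2.1
map_limit = vmem_limit_mb(4096, 2.1)
reduce_limit = vmem_limit_mb(8192, 2.1)

print(f"map task virtual memory ceiling:    {map_limit:.1f} MB")
print(f"reduce task virtual memory ceiling: {reduce_limit:.1f} MB")
```

The map and reduce ceilings differ only because their physical memory settings differ; the ratio is applied per container.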

Why should you run the HDFS balancer periodically? (Choose three)

A.

To ensure that there is capacity in HDFS for additional data

B.

To ensure that all blocks in the cluster are 128MB in size

C.

To help HDFS deliver consistent performance under heavy loads

D.

To ensure that there is consistent disk utilization across the DataNodes

E.

To improve data locality for MapReduce
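
To make "consistent disk utilization" concrete, here is a rough sketch of how a balancer-style check might classify DataNodes. It assumes a node counts as balanced when its utilization is within a threshold (in percentage points) of the overall cluster utilization; the node names and byte counts are made up for illustration, and this is not the balancer's actual implementation.

```python
def classify_datanodes(nodes: dict, threshold: float = 10.0) -> dict:
    """Label each DataNode as over-utilized, under-utilized, or balanced.

    nodes maps a (hypothetical) node name to (used, capacity) in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    labels = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold:
            labels[name] = "over-utilized"
        elif node_pct < cluster_pct - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "balanced"
    return labels


# Illustrative data only: three nodes with equal capacity and uneven usage.
example = {"dn1": (90, 100), "dn2": (45, 100), "dn3": (20, 100)}
labels = classify_datanodes(example)
print(labels)
```

Moving blocks from over-utilized to under-utilized nodes is what evens out the spread.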

You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?

A.

Sample the web server logs from the web servers and copy them into HDFS using curl

B.

Ingest the web server logs into HDFS using Flume

C.

Channel these clickstreams into Hadoop using Hadoop Streaming

D.

Import all user clicks from your OLTP databases into Hadoop using Sqoop

E.

Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers

You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?

A.

Run hdfs fs -du / and locate the DFS Remaining value

B.

Run hdfs dfsadmin -report and locate the DFS Remaining value

C.

Run hdfs dfs / and subtract DFS Used from Configured Capacity

D.

Connect to http://mynamenode:50070/dfshealth.jsp and locate the DFS remaining value
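
For readers who want to pull the DFS Remaining figure out of report text programmatically, the following is a small sketch. It assumes the report contains a line of the form `DFS Remaining: <bytes> (<human-readable>)` and that the first such line is the cluster-wide summary; the sample text and its numbers are illustrative, not captured from a real cluster.

```python
import re
from typing import Optional

# Illustrative sample only; the byte counts below are invented.
SAMPLE_REPORT = """\
Configured Capacity: 42275430400 (39.37 GB)
Present Capacity: 36213334016 (33.73 GB)
DFS Remaining: 36213276672 (33.73 GB)
DFS Used: 57344 (56 KB)
"""


def dfs_remaining_bytes(report: str) -> Optional[int]:
    """Return the first 'DFS Remaining' value in bytes, or None if absent."""
    match = re.search(r"^DFS Remaining:\s*(\d+)", report, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


remaining = dfs_remaining_bytes(SAMPLE_REPORT)
print(remaining)
```

In practice the report text would come from running the command and capturing its standard output.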

You use the hadoop fs -put command to add a file “sales.txt” to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of the file in this situation?

A.

The file will remain under-replicated until the administrator brings that node back online

B.

The cluster will re-replicate the file the next time the system administrator reboots the NameNode daemon (as long as the file’s replication factor doesn’t fall below)

C.

The file will be immediately re-replicated and all other HDFS operations on the cluster will halt until the cluster’s replication values are restored

D.

The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes
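
The idea of spotting under-replicated blocks from block reports can be illustrated with a toy model. This sketch assumes each live DataNode reports the set of block IDs it holds and that a block is under-replicated when fewer nodes report it than the target replication factor; the node names and block ID are hypothetical, and real HDFS bookkeeping is far more involved.

```python
from collections import Counter


def under_replicated(block_reports: dict, replication: int) -> dict:
    """Map each under-replicated block ID to the number of missing replicas.

    block_reports maps a live DataNode name to the set of block IDs it holds.
    """
    counts = Counter()
    for blocks in block_reports.values():
        counts.update(blocks)
    return {
        block: replication - seen
        for block, seen in counts.items()
        if seen < replication
    }


# Toy scenario: the single block of sales.txt was on three nodes and one failed,
# so only two nodes still report it.
reports = {"dn1": {"blk_1"}, "dn2": {"blk_1"}}
missing = under_replicated(reports, replication=3)
print(missing)
```

A failed node simply stops appearing in the reports, which is why the replica count drops without any explicit notification.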

Which three basic configuration parameters must you set to migrate your cluster from MapReduce 1 (MRv1) to MapReduce V2 (MRv2)? (Choose three)

A.

Configure the NodeManager to enable MapReduce services on YARN by setting the following property in yarn-site.xml:

yarn.nodemanager.hostname

your_nodeManager_shuffle

B.

Configure the NodeManager hostname and enable node services on YARN by setting the following property in yarn-site.xml:

yarn.nodemanager.hostname

your_nodeManager_hostname

C.

Configure a default scheduler to run on YARN by setting the following property in mapred-site.xml:

mapreduce.jobtracker.taskScheduler

org.apache.hadoop.mapred.JobQueueTaskScheduler

D.

Configure the number of map tasks per job on YARN by setting the following property in mapred-site.xml:

mapreduce.job.maps

2

E.

Configure the ResourceManager hostname and enable node services on YARN by setting the following property in yarn-site.xml:

yarn.resourcemanager.hostname

your_resourceManager_hostname

F.

Configure MapReduce as a Framework running on YARN by setting the following property in mapred-site.xml:

mapreduce.framework.name

yarn
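
The name/value pairs in these options are normally written as property elements inside a configuration file. As a sketch of that layout, the snippet below renders a pair into the usual `<configuration>`/`<property>` structure using Python's standard library; the helper name `site_xml` is hypothetical, and the example reuses a name and value from the options purely as sample input.

```python
import xml.etree.ElementTree as ET


def site_xml(props: dict) -> str:
    """Render name/value pairs as a Hadoop-style configuration document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Sample input taken from one of the options above.
mapred_site = site_xml({"mapreduce.framework.name": "yarn"})
print(mapred_site)
```

The same helper could render yarn-site.xml entries by passing a different dictionary.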

Assuming you’re not running HDFS Federation, what is the maximum number of NameNode daemons you should run on your cluster in order to avoid a “split-brain” scenario with your NameNode when running HDFS High Availability (HA) using Quorum-based storage?

A.

Two active NameNodes and two Standby NameNodes

B.

One active NameNode and one Standby NameNode

C.

Two active NameNodes and one Standby NameNode

D.

Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy

Identify two features/issues that YARN is designed to address: (Choose two)

A.

Standardize on a single MapReduce API

B.

Single point of failure in the NameNode

C.

Reduce complexity of the MapReduce APIs

D.

Resource pressure on the JobTracker

E.

Ability to run frameworks other than MapReduce, such as MPI

F.

HDFS latency

You are configuring a server running HDFS, MapReduce version 2 (MRv2) on YARN running Linux. How must you format the underlying file system of each DataNode?

A.

They must be formatted as HDFS

B.

They must be formatted as either ext3 or ext4

C.

They may be formatted in any Linux file system

D.

They must not be formatted; HDFS will format the file system automatically