
Which action can a Cloud Dataproc Viewer perform?

A. Submit a job.
B. Create a cluster.
C. Delete a cluster.
D. List the jobs.
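
For reference, a minimal sketch (placeholder project and region) of the read-only action that roles/dataproc.viewer permits: listing jobs via the google-cloud-dataproc Python client. Submitting, creating, or deleting resources would require broader roles.

```python
# Minimal sketch: listing Dataproc jobs, the read-only action a
# Dataproc Viewer (roles/dataproc.viewer) is allowed to perform.
# Project ID and region below are placeholder assumptions.
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"   # hypothetical project
REGION = "us-central1"      # hypothetical region

# The Job Controller client must point at the regional endpoint.
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

# list_jobs succeeds with only dataproc.jobs.list permission;
# submitting, creating, or deleting would raise PermissionDenied.
for job in client.list_jobs(request={"project_id": PROJECT_ID, "region": REGION}):
    print(job.reference.job_id, job.status.state.name)
```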

The Dataflow SDKs have been recently transitioned into which Apache service?

A. Apache Spark
B. Apache Hadoop
C. Apache Kafka
D. Apache Beam

Which TensorFlow function can you use to configure a categorical column if you don't know all of the possible values for that column?

A. categorical_column_with_vocabulary_list
B. categorical_column_with_hash_bucket
C. categorical_column_with_unknown_values
D. sparse_column_with_keys
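
A minimal sketch of the hash-bucket approach for a column whose vocabulary is unknown, alongside the vocabulary-list variant for comparison. This uses the legacy tf.feature_column API; the feature names and bucket size are illustrative.

```python
# Minimal sketch: hashing a categorical feature whose full vocabulary
# is unknown, using the (now legacy) tf.feature_column API.
import tensorflow as tf

# With an unknown vocabulary, hash each raw string into one of
# hash_bucket_size buckets instead of enumerating values up front.
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    key="occupation",          # feature name in the input dict
    hash_bucket_size=1000,     # assumption: 1000 buckets is illustrative
    dtype=tf.string,
)

# Contrast: when the vocabulary IS known, enumerate it explicitly.
education = tf.feature_column.categorical_column_with_vocabulary_list(
    key="education",
    vocabulary_list=["Bachelors", "Masters", "Doctorate"],
)
```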

Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?

A. You expect to store at least 10 TB of data.
B. You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.
C. You need to integrate with Google BigQuery.
D. You will not use the data to back a user-facing or latency-sensitive application.
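
For context, a minimal sketch of provisioning a Bigtable instance on HDD storage with the Python admin client; the instance IDs and zone are assumptions. Note that BigQuery can query Bigtable regardless of the underlying storage type, so BigQuery integration is not an HDD-specific consideration.

```python
# Minimal sketch (assumed IDs/zone): creating a Bigtable instance whose
# cluster uses HDD storage, appropriate for large (>= 10 TB), batch-oriented,
# latency-tolerant workloads.
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)

instance = client.instance("batch-signals", display_name="Batch signals")
cluster = instance.cluster(
    "batch-signals-c1",
    location_id="us-central1-b",                  # hypothetical zone
    serve_nodes=3,
    default_storage_type=enums.StorageType.HDD,   # HDD instead of SSD
)
operation = instance.create(clusters=[cluster])
operation.result(timeout=300)  # block until the instance is ready
```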

Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?

A. Dataproc Worker
B. Dataproc Viewer
C. Dataproc Runner
D. Dataproc Editor
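
A minimal sketch (placeholder project and service-account names) of granting roles/dataproc.worker to the VM service account through the Resource Manager API. The read-modify-write IAM pattern shown here is standard, though project-level bindings can equally be managed in the console or with gcloud.

```python
# Minimal sketch (placeholder project/service-account names): granting
# roles/dataproc.worker to the service account that Dataproc VMs run as,
# so the cluster nodes can access cluster buckets and report job state.
from google.cloud import resourcemanager_v3

PROJECT = "projects/my-project"
SA_MEMBER = "serviceAccount:dataproc-vm@my-project.iam.gserviceaccount.com"

client = resourcemanager_v3.ProjectsClient()

# Read-modify-write the project IAM policy.
policy = client.get_iam_policy(request={"resource": PROJECT})
policy.bindings.add(role="roles/dataproc.worker", members=[SA_MEMBER])
client.set_iam_policy(request={"resource": PROJECT, "policy": policy})
```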

Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?

A. Redefine the schema by evenly distributing reads and writes across the row space of the table.
B. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.
C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.
D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.
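
A minimal sketch of the row-key idea behind evenly distributed reads and writes: a short hash prefix spreads hot users across tablets, whereas sequentially increasing IDs concentrate traffic on a single node. The key layout and salt length are illustrative choices.

```python
# Minimal sketch: building Bigtable row keys that distribute load evenly.
# A short hash prefix derived from the user ID spreads consecutive writes
# across tablets; sequential numeric prefixes would hotspot a single node.
import hashlib

def distributed_row_key(user_id: str, event_ts: int) -> bytes:
    """Prefix the key with a stable hash of the user ID."""
    prefix = hashlib.md5(user_id.encode()).hexdigest()[:4]  # 4-hex-char salt
    return f"{prefix}#{user_id}#{event_ts}".encode()

# Anti-pattern (option D): monotonically increasing IDs pile up on one tablet.
def hotspotting_row_key(seq_id: int) -> bytes:
    return f"{seq_id:012d}".encode()

print(distributed_row_key("user42", 1700000000))
```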

You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need to process, store and analyze these very large datasets in real time. What should you do?

A. Send the data to Google Cloud Datastore and then export to BigQuery.
B. Send the data to Google Cloud Pub/Sub, stream Cloud Pub/Sub to Google Cloud Dataflow, and store the data in Google BigQuery.
C. Send the data to Cloud Storage and then spin up an Apache Hadoop cluster as needed in Google Cloud Dataproc whenever analysis is required.
D. Export logs in batch to Google Cloud Storage and then spin up a Google Cloud SQL instance, import the data from Cloud Storage, and run an analysis as needed.
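
For context, a minimal Apache Beam sketch of the Pub/Sub-to-Dataflow-to-BigQuery pattern in option B; the subscription, table, and schema names are assumptions.

```python
# Minimal sketch (hypothetical topic/table names): streaming IoT readings
# from Pub/Sub through a Beam pipeline into BigQuery in real time.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # real-time processing

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadTemps" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/iot-temps")
        | "Parse" >> beam.Map(json.loads)      # bytes -> dict rows
        | "Store" >> beam.io.WriteToBigQuery(
            "my-project:warehouse.temperatures",
            schema="device_id:STRING,temp_c:FLOAT,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```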

Your team runs a complex analytical query daily that processes terabytes of data. Recently, after running for 20 minutes, the query fails with a "Resources exceeded" error. You need to resolve this issue. What should you do?

A. Increase your project's BigQuery API request quota.
B. Increase the maximum table size limit.
C. Analyze the SQL syntax for errors.
D. Move from BigQuery on-demand to slot reservations.
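
A minimal sketch (assumed project, location, and slot count) of moving the workload onto dedicated slots with the BigQuery Reservation API, as in option D; in practice the reservation must also be backed by a capacity commitment.

```python
# Minimal sketch (names/region assumed): routing the daily query onto
# dedicated slots with the BigQuery Reservation API.
from google.cloud import bigquery_reservation_v1 as reservation

client = reservation.ReservationServiceClient()
parent = "projects/my-project/locations/US"

res = client.create_reservation(
    parent=parent,
    reservation_id="daily-analytics",
    reservation=reservation.Reservation(slot_capacity=500,
                                        ignore_idle_slots=False),
)

# Assign this project's query jobs to the reservation.
client.create_assignment(
    parent=res.name,
    assignment=reservation.Assignment(
        assignee="projects/my-project",
        job_type=reservation.Assignment.JobType.QUERY,
    ),
)
```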

You are configuring networking for a Dataflow job. The data pipeline uses custom container images with the libraries that are required for the transformation logic preinstalled. The data pipeline reads the data from Cloud Storage and writes the data to BigQuery. You need to ensure cost-effective and secure communication between the pipeline and Google APIs and services. What should you do?

A. Leave external IP addresses assigned to worker VMs while enforcing firewall rules.
B. Disable external IP addresses and establish a Private Service Connect endpoint IP address.
C. Disable external IP addresses from worker VMs and enable Private Google Access.
D. Enable Cloud NAT to provide outbound internet connectivity while enforcing firewall rules.
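
A minimal sketch of launching the pipeline with external IPs disabled, using Beam's Dataflow pipeline options; the project, subnetwork, and container image are placeholders. Private Google Access must be enabled on the chosen subnetwork so workers can still reach Cloud Storage and BigQuery without public IPs.

```python
# Minimal sketch (placeholder project/subnetwork/image): Dataflow worker
# VMs get no external IPs; Private Google Access on the subnetwork keeps
# Cloud Storage and BigQuery reachable over Google's internal network.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--no_use_public_ips",                     # workers get no external IPs
    "--subnetwork=regions/us-central1/subnetworks/dataflow-subnet",
    "--sdk_container_image=us-docker.pkg.dev/my-project/repo/transform:latest",
    "--experiments=use_runner_v2",             # needed for custom containers
])
```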

Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products in their projects. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

A. Enable data access logs in each Data Analyst’s project. Restrict access to Stackdriver Logging via Cloud IAM roles.
B. Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts’ projects. Restrict access to the Cloud Storage bucket.
C. Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project with the exported logs.
D. Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project that contains the exported logs.
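
A minimal sketch (placeholder organization ID and bucket) of the aggregated sink in option D, created at the organization level with include_children so BigQuery data access logs from every child project land in a bucket owned by a dedicated audit project.

```python
# Minimal sketch (org ID and bucket are placeholders): an aggregated
# (organization-level) sink that exports BigQuery data access logs from
# all child projects into a bucket in a dedicated audit project.
from google.cloud.logging_v2.services.config_service_v2 import (
    ConfigServiceV2Client,
)
from google.cloud.logging_v2.types import LogSink

client = ConfigServiceV2Client()

sink = LogSink(
    name="bq-data-access-audit",
    destination="storage.googleapis.com/audit-logs-bucket",  # audit project
    filter='log_id("cloudaudit.googleapis.com/data_access") '
           'AND protoPayload.serviceName="bigquery.googleapis.com"',
    include_children=True,  # this is what makes the sink "aggregated"
)

client.create_sink(parent="organizations/123456789", sink=sink)
```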