New Year Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: sntaclus

Your company operates in three domains: airlines, hotels, and ride-hailing services. Each domain has two teams: analytics and data science, which create data assets in BigQuery with the help of a central data platform team. However, as each domain is evolving rapidly, the central data platform team is becoming a bottleneck. This is causing delays in deriving insights from data, and resulting in stale data when pipelines are not kept up to date. You need to design a data mesh architecture by using Dataplex to eliminate the bottleneck. What should you do?

A.

1. Create one lake for each team. Inside each lake, create one zone for each domain.2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.3. Have the central data platform team manage all zones' data assets.

B.

1 Create one lake for each team. Inside each lake, create one zone for each domain.2. Attach each to the BigQuory datasets created by the individual teams as assets to the respective zone.3. Direct each domain to manage their own zone's data assets.

C.

1 Create one lake for each domain. Inside each lake, create one zone for each team.2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.3. Direct each domain to manage their own lake's data assets.

D.

1 Create one lake for each domain. Inside each lake, create one zone for each team.2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.3. Have the central data platform team manage all lakes' data assets.

Your startup has a web application that currently serves customers out of a single region in Asia. You are targeting funding that will allow your startup lo serve customers globally. Your current goal is to optimize for cost, and your post-funding goat is to optimize for global presence and performance. You must use a native JDBC driver. What should you do?

A.

Use Cloud Spanner to configure a single region instance initially. and then configure multi-region C oud Spanner instances after securing funding.

B.

Use a Cloud SQL for PostgreSQL highly available instance first, and 8»gtable with US. Europe, and Asiareplication alter securing funding

C.

Use a Cloud SQL for PostgreSQL zonal instance first and Bigtable with US. Europe, and Asia after securing funding.

D.

Use a Cloud SOL for PostgreSQL zonal instance first, and Cloud SOL for PostgreSQL with highly available configuration after securing funding.

You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accurately using a linear algorithm.

To do this you need to add a synthetic feature. What should the value of that feature be?

A.

X^2+Y^2

B.

X^2

C.

Y^2

D.

cos(X)

You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

A.

Organize your data in a single table, export, and compress and store the BigQuery data in Cloud Storage.

B.

Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.

C.

Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.

D.

Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.

You are developing an application on Google Cloud that will automatically generate subject labels for users’ blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?

A.

Call the Cloud Natural Language API from your application. Process the generated Entity Analysis aslabels.

B.

Call the Cloud Natural Language API from your application. Process the generated Sentiment Analysis as labels.

C.

Build and train a text classification model using TensorFlow. Deploy the model using Cloud MachineLearning Engine. Call the model from your application and process the results as labels.

D.

Build and train a text classification model using TensorFlow. Deploy the model using a Kubernetes Engine cluster. Call the model from your application and process the results as labels.

Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of some errors in the input data, and you need to improve reliability of the pipeline (incl. being able to reprocess all failing data).

What should you do?

A.

Add a filtering step to skip these types of errors in the future, extract erroneous rows from logs.

B.

Add a try… catch block to your DoFn that transforms the data, extract erroneous rows from logs.

C.

Add a try… catch block to your DoFn that transforms the data, write erroneous rows to PubSub directly from the DoFn.

D.

Add a try… catch block to your DoFn that transforms the data, use a sideOutput to create a PCollection that can be stored to PubSub later.

You work for an advertising company, and you’ve developed a Spark ML model to predict click-through rates at advertisement blocks. You’ve been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data center will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?

A.

Use Cloud ML Engine for training existing Spark ML models

B.

Rewrite your models on TensorFlow, and start using Cloud ML Engine

C.

Use Cloud Dataproc for training existing Spark ML models, but start reading data directly from BigQuery

D.

Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery

You are implementing several batch jobs that must be executed on a schedule. These jobs have many interdependent steps that must be executed in a specific order. Portions of the jobs involve executing shell scripts, running Hadoop jobs, and running queries in BigQuery. The jobs are expected to run for many minutes up to several hours. If the steps fail, they must be retried a fixed number of times. Which service should you use to manage the execution of these jobs?

A.

Cloud Scheduler

B.

Cloud Dataflow

C.

Cloud Functions

D.

Cloud Composer

You are preparing an organization-wide dataset. You need to preprocess customer data stored in a restricted bucket in Cloud Storage. The data will be used to create consumer analyses. You need to follow data privacy requirements, including protecting certain sensitive data elements, while also retaining all of the data for potential future use cases. What should you do?

A.

Use Dataflow and the Cloud Data Loss Prevention API to mask sensitive data. Write the processed data in BigQuery.

B.

Use the Cloud Data Loss Prevention API and Dataflow to detect and remove sensitive fields from the data in Cloud Storage. Write the filtered data in BigQuery.

C.

Use Dataflow and Cloud KMS to encrypt sensitive fields and write the encrypted data in BigQuery. Share the encryption key by following the principle of least privilege.

D.

Use customer-managed encryption keys (CMEK) to directly encrypt the data in Cloud Storage. Use federated queries from BigQuery. Share the encryption key by following the principle of least privilege.

You have an Apache Kafka Cluster on-prem with topics containing web application logs. You need to replicate the data to Google Cloud for analysis in BigQuery and Cloud Storage. The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins.

What should you do?

A.

Deploy a Kafka cluster on GCE VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in GCE. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.

B.

Deploy a Kafka cluster on GCE VM Instances with the PubSub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.

C.

Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Source connector. Use a Dataflow job to read fron PubSub and write to GCS.

D.

Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Sink connector. Use a Dataflow job to read fron PubSub and write to GCS.