Pre-Summer Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: sntaclus

You are designing a stateful data processing pipeline that reads data from a Cloud Storage bucket and writes transformed data to a BigQuery table. The pipeline must be highly available and resilient to zonal failures within the us-central1 region. You need to configure a Dataflow pipeline ensuring minimal disruption during a zonal outage. What should you do?

A.

Deploy the Dataflow job to a single zone within us-central1 and configure it to use a regional persistent disk to store its state.

B.

Launch the Dataflow job with the --region us-central1 parameter.

C.

Deploy the Dataflow job to a single zone within us-central1 and use a multi-regional Cloud Storage bucket to store its state.

D.

Launch the Dataflow job with the --zone us-central1-a parameter.

You are migrating your on-premises data warehouse to BigQuery. As part of the migration, you want to facilitate cross-team collaboration to get the most value out of the organization's data. You need to design an architecture that would allow teams within the organization to securely publish, discover, and subscribe to read-only data in a self-service manner. You need to minimize costs while also maximizing data freshness What should you do?

A.

Create authorized datasets to publish shared data in the subscribing team's project.

B.

Create a new dataset for sharing in each individual team's project. Grant the subscribing team the bigquery. dataViewer role on thedataset.

C.

Use BigQuery Data Transfer Service to copy datasets to a centralized BigQuery project for sharing.

D.

Use Analytics Hub to facilitate data sharing.

You work for a large real estate firm and are preparing 6 TB of home sales data lo be used for machine learning You will use SOL to transform the data and use BigQuery ML lo create a machine learning model. You plan to use the model for predictions against a raw dataset that has not been transformed. How should you set up your workflow in order to prevent skew at prediction time?

A.

When creating your model, use BigQuerys TRANSFORM clause to define preprocessing stops. At prediction time, use BigQuery"s ML. EVALUATE clause without specifying any transformations on the raw input data.

B.

When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps Before requesting predictions, use a saved query to transform your raw input data, and then use ML. EVALUATE

C.

Use a BigOuery to define your preprocessing logic. When creating your model, use the view as your model training data. At prediction lime, use BigQuery's ML EVALUATE clause without specifying any transformations on the raw input data.

D.

Preprocess all data using Dataflow. At prediction time, use BigOuery"s ML. EVALUATE clause without specifying any further transformations on the input data.

You plan to deploy Cloud SQL using MySQL. You need to ensure high availability in the event of a zone failure. What should you do?

A.

Create a Cloud SQL instance in one zone, and create a failover replica in another zone within the same region.

B.

Create a Cloud SQL instance in one zone, and create a read replica in another zone within the same region.

C.

Create a Cloud SQL instance in one zone, and configure an external read replica in a zone in a different region.

D.

Create a Cloud SQL instance in a region, and configure automatic backup to a Cloud Storage bucket in the same region.

You have a streaming pipeline that ingests data from Pub/Sub in production. You need to update this streaming pipeline with improved business logic. You need to ensure that the updated pipeline reprocesses the previous two days of delivered Pub/Sub messages. What should you do?

Choose 2 answers

A.

Use Pub/Sub Seek with a timestamp.

B.

Use the Pub/Sub subscription clear-retry-policy flag.

C.

Create a new Pub/Sub subscription two days before the deployment.

D.

Use the Pub/Sub subscription retain-asked-messages flag.

E.

Use Pub/Sub Snapshot capture two days before the deployment.

Flowlogistic’s CEO wants to gain rapid insight into their customer base so his sales team can be better informed in the field. This team is not very technical, so they’ve purchased a visualization tool to simplify the creation of BigQuery reports. However, they’ve been overwhelmed by all thedata in the table, and are spending a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-effective way. What should you do?

A.

Export the data into a Google Sheet for virtualization.

B.

Create an additional table with only the necessary columns.

C.

Create a view on the table to present to the virtualization tool.

D.

Create identity and access management (IAM) roles on the appropriate columns, so only they appear in a query.

Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.

Which approach should you take?

A.

Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received.

B.

Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Clod Pub/Sub.

C.

Use the NOW () function in BigQuery to record the event’s time.

D.

Use the automatically generated timestamp from Cloud Pub/Sub to order the data.

Flowlogistic’s management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?

A.

Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage

B.

Cloud Pub/Sub, Cloud Dataflow, and Local SSD

C.

Cloud Pub/Sub, Cloud SQL, and Cloud Storage

D.

Cloud Load Balancing, Cloud Dataflow, and Cloud Storage

Flowlogistic wants to use Google BigQuery as their primary analysis system, but they still have Apache Hadoop and Spark workloads that they cannot move to BigQuery. Flowlogistic does not know how to store the data that is common to both workloads. What should they do?

A.

Store the common data in BigQuery as partitioned tables.

B.

Store the common data in BigQuery and expose authorized views.

C.

Store the common data encoded as Avro in Google Cloud Storage.

D.

Store he common data in the HDFS storage for a Google Cloud Dataproc cluster.

You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity ‘Movie’ the property ‘actors’ and the property ‘tags’ have multiple values but the property ‘date released’ does not. A typical query would ask for all movies with actor= ordered by date_released or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?

A.

Option A

B.

Option B.

C.

Option C

D.

Option D