
You are migrating your on-premises data warehouse to BigQuery. As part of the migration, you want to facilitate cross-team collaboration to get the most value out of the organization's data. You need to design an architecture that would allow teams within the organization to securely publish, discover, and subscribe to read-only data in a self-service manner. You need to minimize costs while also maximizing data freshness. What should you do?

A.

Create authorized datasets to publish shared data in the subscribing team's project.

B.

Create a new dataset for sharing in each individual team's project. Grant the subscribing team the bigquery.dataViewer role on the dataset.

C.

Use BigQuery Data Transfer Service to copy datasets to a centralized BigQuery project for sharing.

D.

Use Analytics Hub to facilitate data sharing.

You want to optimize your queries for cost and performance. How should you structure your data?

A.

Partition table data by create_date, location_id, and device_version.

B.

Partition table data by create_date; cluster table data by location_id and device_version.

C.

Cluster table data by create_date, location_id, and device_version.

D.

Cluster table data by create_date; partition by location_id and device_version.
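For context, partitioning by a date column and clustering by the other two columns corresponds to a BigQuery DDL statement like the sketch below. The dataset/table name and the column types are hypothetical; only the column names come from the answer choices.

```python
# Sketch: assemble the BigQuery DDL for a table partitioned by create_date
# and clustered by location_id and device_version.
# The table name and column types here are hypothetical.

def partitioned_clustered_ddl(table, partition_col, cluster_cols):
    """Return a CREATE TABLE statement with partitioning and clustering."""
    return (
        f"CREATE TABLE {table} (\n"
        f"  create_date DATE,\n"
        f"  location_id STRING,\n"
        f"  device_version STRING,\n"
        f"  reading FLOAT64\n"
        f")\n"
        f"PARTITION BY {partition_col}\n"
        f"CLUSTER BY {', '.join(cluster_cols)}"
    )

ddl = partitioned_clustered_ddl(
    "mydataset.device_readings",
    "create_date",
    ["location_id", "device_version"],
)
print(ddl)
```

Partitioning prunes whole date partitions at scan time (lowering cost), while clustering sorts data within each partition so filters on location_id and device_version read fewer blocks.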

You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?

A.

Use bq load to load a batch of sensor data every 60 seconds.

B.

Use a Cloud Dataflow pipeline to stream data into the BigQuery table.

C.

Use the INSERT statement to insert a batch of data every 60 seconds.

D.

Use the MERGE statement to apply updates in batch every 60 seconds.
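Streaming the rows in (rather than batching every 60 seconds) makes each reading available within seconds; the per-minute trend aggregation such a pipeline would compute can be sketched in plain Python, with no Beam dependency. The sample readings are fabricated for illustration.

```python
# Toy sketch of the per-minute aggregation a streaming pipeline would
# apply to sensor readings; the readings below are fabricated.
from collections import defaultdict

def minute_averages(readings):
    """Group (epoch_seconds, value) readings into 1-minute windows
    and return {window_start_seconds: mean_value}."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % 60].append(value)
    return {start: sum(vs) / len(vs) for start, vs in buckets.items()}

readings = [(0, 10.0), (30, 20.0), (65, 5.0)]
print(minute_averages(readings))  # windows starting at 0 and 60
```

In a real Dataflow job the same grouping is expressed with fixed one-minute windows over the stream, and results land in BigQuery continuously instead of once per batch.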

Your company needs to ingest and transform streaming data from IoT devices and store it for analysis. The data is sensitive and requires encryption with your own key in transit and at rest. The volume of data is expected to fluctuate significantly throughout the day. You need to identify a solution that is managed and elastic. What should you do?

A.

Write data directly into BigQuery by using the Storage Write API, and process it in BigQuery by using SQL functions, selecting a Google-managed encryption key for each service.

B.

Publish data to Pub/Sub, process it with Dataflow and store it in Cloud SQL, selecting your key from Cloud HSM for each service.

C.

Publish data to Pub/Sub, process it with Dataflow and store it in BigQuery, selecting your key from Cloud KMS for each service.

D.

Write data directly into Cloud Storage, process it with Dataproc, and store it in BigQuery, selecting a customer-managed encryption key (CMEK) for each service.
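Using your own key from Cloud KMS means pointing each service (Pub/Sub topic, Dataflow job, BigQuery table) at the key's fully qualified resource name. A sketch of assembling that name; the project, location, key-ring, and key names below are hypothetical.

```python
# Sketch: build the Cloud KMS CryptoKey resource name used when
# configuring CMEK on Pub/Sub, Dataflow, and BigQuery.
# All names below are hypothetical placeholders.

def kms_key_name(project, location, key_ring, key):
    """Return the fully qualified CryptoKey resource name."""
    return (f"projects/{project}/locations/{location}"
            f"/keyRings/{key_ring}/cryptoKeys/{key}")

name = kms_key_name("my-project", "us-central1", "iot-ring", "iot-key")
print(name)
```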

You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? Choose 2 answers.

A.

Denormalize the data as much as possible.

B.

Preserve the structure of the data as much as possible.

C.

Use BigQuery UPDATE to further reduce the size of the dataset.

D.

Develop a data pipeline where status updates are appended to BigQuery instead of updated.

E.

Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery’s support for external data sources to query.
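The append-only approach in option D defers deduplication to query time: every status change is appended, and the latest status per transaction is recovered when reading. A plain-Python sketch of that logic (in BigQuery the equivalent is a window function such as ROW_NUMBER() partitioned by transaction; the sample rows are fabricated):

```python
# Sketch: recover the latest status per transaction from append-only rows,
# the query-time equivalent of keeping only
# ROW_NUMBER() OVER (PARTITION BY txn_id ORDER BY ts DESC) = 1.
# Sample rows are fabricated.

def latest_status(rows):
    """Given (txn_id, ts, status) rows, return {txn_id: latest status}."""
    latest = {}
    for txn_id, ts, status in rows:
        if txn_id not in latest or ts > latest[txn_id][0]:
            latest[txn_id] = (ts, status)
    return {txn: status for txn, (ts, status) in latest.items()}

rows = [("t1", 1, "PENDING"), ("t1", 2, "SETTLED"), ("t2", 1, "PENDING")]
print(latest_status(rows))
```

Appending avoids the cost and DML-quota pressure of running UPDATEs against a petabyte-scale table while still exposing current state to the data science team.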

You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting messages from IoT devices globally. Because large parts of the globe have poor internet connectivity, messages sometimes batch at the edge, come in all at once, and cause a spike in load on your Kafka cluster. This is becoming difficult to manage and prohibitively expensive. What is the Google-recommended cloud-native architecture for this scenario?

A.

Edge TPUs as sensor devices for storing and transmitting the messages.

B.

Cloud Dataflow connected to the Kafka cluster to scale the processing of incoming messages.

C.

An IoT gateway connected to Cloud Pub/Sub, with Cloud Dataflow to read and process the messages from Cloud Pub/Sub.

D.

A Kafka cluster virtualized on Compute Engine in us-east with Cloud Load Balancing to connect to the devices around the world.

You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure that the data is current. What should you do?

A.

Use Analytics Hub to control data access, and provide third-party companies with access to the dataset.

B.

Create a Dataflow job that reads the data in frequent time intervals and writes it to the relevant BigQuery dataset or Cloud Storage bucket for third-party companies to use.

C.

Use Cloud Scheduler to export the data on a regular basis to Cloud Storage, and provide third-party companies with access to the bucket.

D.

Create a separate dataset in BigQuery that contains the relevant data to share, and provide third-party companies with access to the new dataset.

You are designing a cloud-native historical data processing system to meet the following conditions:

The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Cloud Dataproc, BigQuery, and Compute Engine.

A streaming data pipeline stores new data daily.

Performance is not a factor in the solution.

The solution design should maximize availability.

How should you design data storage for this solution?

A.

Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.

B.

Store the data in BigQuery. Access the data using the BigQuery Connector or Cloud Dataproc and Compute Engine.

C.

Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.

D.

Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.

You have a data pipeline that writes data to Cloud Bigtable using well-designed row keys. You want to monitor your pipeline to determine when to increase the size of your Cloud Bigtable cluster. Which two actions can you take to accomplish this? Choose 2 answers.

A.

Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Read pressure index is above 100.

B.

Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Write pressure index is above 100.

C.

Monitor the latency of write operations. Increase the size of the Cloud Bigtable cluster when there is a sustained increase in write latency.

D.

Monitor storage utilization. Increase the size of the Cloud Bigtable cluster when utilization increases above 70% of max capacity.

E.

Monitor the latency of read operations. Increase the size of the Cloud Bigtable cluster if read operations take longer than 100 ms.
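The monitoring signals from options C and D can be combined into a simple scale-up rule: grow the cluster when storage utilization passes 70% of capacity, or when write latency stays elevated across several consecutive samples. A sketch of that decision logic; the thresholds come from the answer choices, and the sample metric values are fabricated.

```python
# Sketch: decide whether to grow a Bigtable cluster based on storage
# utilization and sustained write latency. Threshold and sample values
# are illustrative, not official limits.

def should_scale_up(storage_utilization, write_latencies_ms,
                    storage_threshold=0.70, latency_threshold_ms=50,
                    sustained_samples=3):
    """Recommend scaling up when storage exceeds the threshold, or when
    the last few latency samples are all above the latency threshold."""
    if storage_utilization > storage_threshold:
        return True
    recent = write_latencies_ms[-sustained_samples:]
    return (len(recent) == sustained_samples
            and all(lat > latency_threshold_ms for lat in recent))

print(should_scale_up(0.75, [10, 12, 11]))  # storage-driven
print(should_scale_up(0.40, [60, 70, 80]))  # sustained latency
print(should_scale_up(0.40, [10, 70, 12]))  # neither condition holds
```

Requiring several consecutive high-latency samples mirrors the "sustained increase" wording in option C and avoids reacting to a single transient spike.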

You are building a streaming Dataflow pipeline that ingests noise level data from hundreds of sensors placed near construction sites across a city. The sensors measure noise level every ten seconds, and send that data to the pipeline when levels reach above 70 dBA. You need to detect the average noise level from a sensor when data is received for a duration of more than 30 minutes, but the window ends when no data has been received for 15 minutes. What should you do?

A.

Use session windows with a 30-minute gap duration.

B.

Use tumbling windows with a 15-minute window and a fifteen-minute .withAllowedLateness operator.

C.

Use session windows with a 15-minute gap duration.

D.

Use hopping windows with a 15-minute window, and a thirty-minute period.