Google Professional-Data-Engineer Free Certification Exam Questions Answer Aug 2026 update

Question # 11

You are designing the database schema for a machine learning-based food ordering service that will predict what users want to eat. Here is some of the information you need to store:

The user profile: What the user likes and doesn’t like to eat

The user account information: Name, address, preferred meal times

The order information: When orders are made, from where, to whom

The database will be used to store all the transactional data of the product. You want to optimize the data schema. Which Google Cloud Platform product should you use?

BigQuery

Cloud SQL

Cloud Bigtable

Cloud Datastore

Question # 12

Your company has recently grown rapidly and is now ingesting data at a significantly higher rate than before. You manage the daily batch MapReduce analytics jobs in Apache Hadoop. However, the recent increase in data means the batch jobs are falling behind. You were asked to recommend ways the development team could increase the responsiveness of the analytics without increasing costs. What should you recommend they do?

Rewrite the job in Pig.

Rewrite the job in Apache Spark.

Increase the size of the Hadoop cluster.

Decrease the size of the Hadoop cluster but also rewrite the job in Hive.

Question # 13

You work for a manufacturing plant that batches application log files together into a single log file once a day at 2:00 AM. You have written a Google Cloud Dataflow job to process that log file. You need to make sure the log file is processed once per day as inexpensively as possible. What should you do?

Change the processing job to use Google Cloud Dataproc instead.

Manually start the Cloud Dataflow job each morning when you get into the office.

Create a cron job with Google App Engine Cron Service to run the Cloud Dataflow job.

Configure the Cloud Dataflow job as a streaming job so that it processes the log data immediately.

Question # 14

You work for a large fast food restaurant chain with over 400,000 employees. You store employee information in Google BigQuery in a Users table consisting of a FirstName field and a LastName field. A member of IT is building an application and asks you to modify the schema and data in BigQuery so the application can query a FullName field consisting of the value of the FirstName field concatenated with a space, followed by the value of the LastName field for each employee. How can you make that data available while minimizing cost?

Create a view in BigQuery that concatenates the FirstName and LastName field values to produce the FullName.

Add a new column called FullName to the Users table. Run an UPDATE statement that updates the FullName column for each user with the concatenation of the FirstName and LastName values.

Create a Google Cloud Dataflow job that queries BigQuery for the entire Users table, concatenates the FirstName value and LastName value for each user, and loads the proper values for FirstName, LastName, and FullName into a new table in BigQuery.

Use BigQuery to export the data for the table to a CSV file. Create a Google Cloud Dataproc job to process the CSV file and output a new CSV file containing the proper values for FirstName, LastName and FullName. Run a BigQuery load job to load the new CSV file into BigQuery.
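As a rough illustration of what the view-based option describes, the sketch below defines a view that derives FullName at query time instead of storing it. It uses Python's built-in sqlite3 module purely as a local stand-in for BigQuery; the view name UsersWithFullName and the sample rows are hypothetical. In BigQuery the same expression could also be written as CONCAT(FirstName, ' ', LastName).

```python
import sqlite3


def build_full_name_view(conn):
    """Create a Users table and a view that derives FullName on read."""
    conn.execute("CREATE TABLE Users (FirstName TEXT, LastName TEXT)")
    # Hypothetical sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO Users VALUES (?, ?)",
        [("Ada", "Lovelace"), ("Alan", "Turing")],
    )
    # The view stores no data of its own; FullName is computed per query.
    conn.execute(
        "CREATE VIEW UsersWithFullName AS "
        "SELECT FirstName, LastName, "
        "FirstName || ' ' || LastName AS FullName "
        "FROM Users"
    )


conn = sqlite3.connect(":memory:")  # sqlite3 is only a stand-in for BigQuery
build_full_name_view(conn)
rows = conn.execute(
    "SELECT FullName FROM UsersWithFullName ORDER BY LastName"
).fetchall()
print(rows)
```

Because the view only wraps a query, the underlying Users table is left unchanged and no rows are rewritten or duplicated.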

Question # 15

You are running a streaming pipeline with Dataflow and are using hopping windows to group the data as the data arrives. You noticed that some data is arriving late but is not being marked as late data, which is resulting in inaccurate aggregations downstream. You need to find a solution that allows you to capture the late data in the appropriate window. What should you do?

Change your windowing function to session windows to define your windows based on certain activity.

Change your windowing function to tumbling windows to avoid overlapping window periods.

Expand your hopping window so that the late data has more time to arrive within the grouping.

Use watermarks to define the expected data arrival window. Allow late data as it arrives.
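The terms in this question (hopping windows, watermarks, late data) can be made concrete with a small simulation. The sketch below is plain Python, not the Apache Beam or Dataflow API; the function names, the allowed_lateness parameter, and the sample values are all assumptions chosen for illustration. It shows how one event timestamp falls into several overlapping hopping windows, and how a watermark plus an allowed-lateness setting decides whether the element still counts toward each window.

```python
def hopping_windows(ts, size, period):
    """Return the [start, end) windows of length `size`, started every
    `period` seconds, that contain event time `ts`."""
    windows = []
    start = ts - (ts % period)  # latest window start at or before ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def classify(ts, watermark, size, period, allowed_lateness):
    """Label each window of an element given the current watermark.

    Simplified model: the element is on time while the watermark is before
    the window end, late but accepted within `allowed_lateness` after the
    window end, and dropped once that grace period has passed.
    """
    labels = {}
    for start, end in hopping_windows(ts, size, period):
        if watermark < end:
            labels[(start, end)] = "on_time"
        elif watermark < end + allowed_lateness:
            labels[(start, end)] = "late_but_accepted"
        else:
            labels[(start, end)] = "dropped"
    return labels


# Event at t=70s arrives when the watermark is already at t=100s.
with_lateness = classify(70, 100, size=60, period=30, allowed_lateness=30)
without_lateness = classify(70, 100, size=60, period=30, allowed_lateness=0)
print(with_lateness)
print(without_lateness)
```

In this model the element belongs to the windows (30, 90) and (60, 120). With no allowed lateness it is dropped from the first window, while a 30-second allowance keeps it in both.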

Question # 16

Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during the campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all streaming inserts. What is the most likely cause of this problem?

They have not assigned the timestamp, which causes the job to fail

They have not set the triggers to accommodate the data coming in late, which causes the job to fail

They have not applied a global windowing function, which causes the job to fail when the pipeline is created

They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created

Question # 17

You have an Apache Kafka Cluster on-prem with topics containing web application logs. You need to replicate the data to Google Cloud for analysis in BigQuery and Cloud Storage. The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins.

What should you do?

Deploy a Kafka cluster on GCE VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in GCE. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.

Deploy a Kafka cluster on GCE VM Instances with the PubSub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.

Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Source connector. Use a Dataflow job to read from PubSub and write to GCS.

Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Sink connector. Use a Dataflow job to read from PubSub and write to GCS.

Question # 18

You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and issue an order to the backend systems. Which solution should you choose?

Cloud Speech-to-Text API

Cloud Natural Language API

Dialogflow Enterprise Edition

Cloud AutoML Natural Language

Question # 19

You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose?

Cloud SQL

Cloud Bigtable

Cloud Spanner

Cloud Datastore

Question # 20

Your chemical company needs to manually check documentation for customer orders. You use a pull subscription in Pub/Sub so that sales agents get details from each order. You must ensure that you do not process orders twice with different sales agents and that you do not add more complexity to this workflow. What should you do?

Create a transactional database that monitors the pending messages.

Create a new Pub/Sub push subscription to monitor the orders processed in the agent's system.

Use Pub/Sub exactly-once delivery in your pull subscription.

Use a Deduplicate PTransform in Dataflow before sending the messages to the sales agents.
