Run a Big Data Text Processing Pipeline in Cloud Dataflow

40 minutes 7 Credits

GSP047

Google Cloud Self-Paced Labs

Overview

Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns, including ETL, batch computation, and continuous computation. Because Dataflow is a managed service, it can allocate resources on demand to minimize latency while maintaining high utilization efficiency.

The Dataflow model combines batch and stream processing so developers don't have to make tradeoffs between correctness, cost, and processing time. In this lab, you'll learn how to run a Dataflow pipeline that counts the occurrences of unique words in a text file.
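
To make the word-count computation concrete, here is a minimal sketch in plain Java of the logic such a pipeline performs: split each line into words, then count how often each distinct word appears. It deliberately does not use the Cloud Dataflow SDK and runs on an in-memory list rather than a text file. The class name `WordCountSketch`, the method `countWords`, and the tokenizing pattern are assumptions made for illustration, not part of the lab.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; it only illustrates the counting logic.
final class WordCountSketch {

    // Splits every line on runs of non-letter characters (an assumed
    // tokenizing rule) and counts occurrences of each distinct word.
    // Counting is case-sensitive. A TreeMap keeps the words sorted.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("[^\\p{L}]+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Sample input standing in for the lines of a text file.
        List<String> lines = Arrays.asList(
                "to be or not to be",
                "that is the question");

        // Print one "word: count" pair per line.
        countWords(lines).forEach(
                (word, count) -> System.out.println(word + ": " + count));
    }
}
```

In a real Dataflow pipeline the same three stages (read lines, split into words, count per word) would be expressed as transforms over a distributed collection, so the service can spread the work across workers.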

What you'll learn

  • How to create a Maven project with the Cloud Dataflow SDK

  • How to run an example pipeline using the Google Cloud Platform Console

  • How to delete the associated Cloud Storage bucket and its contents
