Example: Hadoop MapReduce job with Bigtable
This example uses Hadoop to perform a simple MapReduce job that
counts the number of times each word appears in a text file. The MapReduce job
uses Bigtable to store its results: the final count for each word. The code for
this example is in the GitHub repository
GoogleCloudPlatform/cloud-bigtable-examples, in the directory
java/dataproc-wordcount.
Overview of the code sample
The code sample provides a simple command-line interface that takes one or more
text files and a table name as input, finds all of the words that appear in the
files, and counts how many times each word appears. The MapReduce logic appears
in the WordCountHBase class.
First, a mapper tokenizes the text file's contents and generates key-value
pairs, where the key is a word from the text file and the value is 1.
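To make the mapper's output concrete without running a cluster, here is a plain-Java sketch that models the map step for a single line of input. It assumes words are separated by whitespace (the default behavior of java.util.StringTokenizer). The class and method names are hypothetical. In the actual job, this logic runs inside a Hadoop Mapper's map method, which emits each pair with context.write instead of collecting it in a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical in-memory stand-in for the mapper: turns one line of text into (word, 1) pairs.
// In the real Hadoop job, each pair would be emitted with context.write(...) instead.
class WordCountMapperSketch {
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // Assumption: tokens are separated by whitespace, as StringTokenizer does by default.
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            // Every occurrence of a word is emitted separately, always with the value 1.
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints one "word<TAB>1" line per token, including duplicates such as "the".
        for (Map.Entry<String, Integer> pair : map("the cat saw the dog")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that the mapper does no counting of its own. A repeated word such as "the" simply produces two separate pairs, and the totals are computed later by the reducer.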
A reducer then sums the values for each key and writes the results to the
Bigtable table that you specified. Each row key is a word from the
input text, and each row contains a cf:count column that holds the number of
times the row key appears in the input.
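The sketch below models the reduce side in plain Java, again without a cluster. It groups the (word, 1) pairs by word, as Hadoop's shuffle does before calling the reducer, sums each group, and stores the total under a cf:count entry in a row keyed by the word. The class and method names are hypothetical, and the nested TreeMap is only an in-memory stand-in for the Bigtable table. In the actual job, each row is written to Bigtable rather than to a map.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the shuffle and reduce steps of the word count job.
class WordCountReducerSketch {
    // Column family "cf" and qualifier "count", as described in the text.
    static final String COLUMN = "cf:count";

    // Reduce step for a single key: add up all the 1s emitted for that word.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Groups mapper output by word (the shuffle), then reduces each group into one "row".
    // Returns rowKey -> (column -> count). TreeMap keeps rows sorted by key, like Bigtable.
    static Map<String, Map<String, Integer>> run(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Map<String, Integer>> rows = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            Map<String, Integer> row = new TreeMap<>();
            row.put(COLUMN, reduce(group.getValue()));
            rows.put(group.getKey(), row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Build the (word, 1) pairs a mapper would emit for one line of input.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : "the cat saw the dog".split(" ")) {
            pairs.add(new SimpleEntry<>(word, 1));
        }
        // Prints: {cat={cf:count=1}, dog={cf:count=1}, saw={cf:count=1}, the={cf:count=2}}
        System.out.println(run(pairs));
    }
}
```

Because every word becomes its own row key, looking up a single word's count in the output table is a single-row read.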