Mantenha tudo organizado com as coleções
Salve e categorize o conteúdo com base nas suas preferências.
Job de MapReduce do Hadoop com o Bigtable
Neste exemplo, usamos o Hadoop para executar um job de MapReduce simples que conta o número de vezes que uma palavra aparece em um arquivo de texto. O job de MapReduce usa o Bigtable para armazenar os resultados da operação de mapeamento. O código deste exemplo está no repositório do GitHub GoogleCloudPlatform/cloud-bigtable-examples, no diretório java/dataproc-wordcount.
Configurar a autenticação
Para usar os exemplos Java desta página em um
ambiente de desenvolvimento local, instale e inicialize a CLI gcloud e
configure o Application Default Credentials com suas credenciais de usuário.
O exemplo de código oferece uma interface da linha de comando simples que usa um ou mais
arquivos de texto e um nome de tabela como entrada, localiza todas as palavras que aparecem no
arquivo e conta quantas vezes cada palavra aparece. A lógica do MapReduce aparece na classe WordCountHBase.
Primeiro, um mapeador tokeniza o conteúdo do arquivo de texto e gera pares de chave-valor, em que a chave é uma palavra do arquivo de texto e o valor é 1:
Com um redutor, são somados os valores para cada chave e gravados os resultados em uma tabela do Cloud Bigtable que você especificou. Cada chave de linha é uma palavra do arquivo de texto. Cada linha contém uma coluna cf:count, que contém o número de vezes que a chave de linha aparece no arquivo de texto.
[[["Fácil de entender","easyToUnderstand","thumb-up"],["Meu problema foi resolvido","solvedMyProblem","thumb-up"],["Outro","otherUp","thumb-up"]],[["Difícil de entender","hardToUnderstand","thumb-down"],["Informações incorretas ou exemplo de código","incorrectInformationOrSampleCode","thumb-down"],["Não contém as informações/amostras de que eu preciso","missingTheInformationSamplesINeed","thumb-down"],["Problema na tradução","translationIssue","thumb-down"],["Outro","otherDown","thumb-down"]],["Última atualização 2025-09-04 UTC."],[[["\u003cp\u003eThis example demonstrates a Hadoop MapReduce job that counts word occurrences in a text file, storing the results in Bigtable.\u003c/p\u003e\n"],["\u003cp\u003eThe code, located in the \u003ccode\u003eGoogleCloudPlatform/cloud-bigtable-examples\u003c/code\u003e GitHub repository, uses the \u003ccode\u003eWordCountHBase\u003c/code\u003e class to implement the MapReduce logic.\u003c/p\u003e\n"],["\u003cp\u003eA mapper tokenizes the text and generates key-value pairs where each word is a key and the value is 1.\u003c/p\u003e\n"],["\u003cp\u003eA reducer sums the values for each word and writes the final count to a specified Bigtable table in a \u003ccode\u003ecf:count\u003c/code\u003e column.\u003c/p\u003e\n"],["\u003cp\u003eTo run this example in a local environment, you will need to install and initialize the gcloud CLI, then set up application default credentials.\u003c/p\u003e\n"]]],[],null,["Hadoop MapReduce job with Bigtable\n\nThis example uses [Hadoop](https://hadoop.apache.org/) to perform a simple MapReduce job that\ncounts the number of times a word appears in a text file. The MapReduce job\nuses Bigtable to store the results of the map operation. The code for\nthis example is in the GitHub repository\n[GoogleCloudPlatform/cloud-bigtable-examples](https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/), in the directory\n`java/dataproc-wordcount`.\n\nSet up authentication\n\n\nTo use the Java samples on this page in a local\ndevelopment environment, install and initialize the gcloud CLI, and\nthen set up Application Default Credentials with your user credentials.\n\n1. [Install](/sdk/docs/install) the Google Cloud CLI.\n2. If you're using an external identity provider (IdP), you must first [sign in to the gcloud CLI with your federated identity](/iam/docs/workforce-log-in-gcloud).\n3. If you're using a local shell, then create local authentication credentials for your user account: \n\n```bash\ngcloud auth application-default login\n```\n4. You don't need to do this if you're using Cloud Shell.\n5. If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have [signed in to the gcloud CLI with your federated identity](/iam/docs/workforce-log-in-gcloud).\n\n\nFor more information, see\n[Set up authentication for a local development environment](/bigtable/docs/authentication#local-development).\n\nOverview of the code sample\n\nThe code sample provides a simple command-line interface that takes one or more\ntext files and a table name as input, finds all of the words that appear in the\nfile, and counts how many times each word appears. The MapReduce logic appears\nin the [`WordCountHBase` class](https://github.com/GoogleCloudPlatform/cloud-bigtable-examples//blob/master/java/dataproc-wordcount/src/main/java/com/example/bigtable/sample/WordCountHBase.java).\n\nFirst, a mapper tokenizes the text file's contents and generates key-value\npairs, where the key is a word from the text file and the value is `1`: \n\n public static class TokenizerMapper extends\n Mapper\u003cObject, Text, ImmutableBytesWritable, IntWritable\u003e {\n\n private final static IntWritable one = new IntWritable(1);\n\n @Override\n public void map(Object key, Text value, Context context) throws IOException,\n InterruptedException {\n StringTokenizer itr = new StringTokenizer(value.toString());\n ImmutableBytesWritable word = new ImmutableBytesWritable();\n while (itr.hasMoreTokens()) {\n word.set(Bytes.toBytes(itr.nextToken()));\n context.write(word, one);\n }\n }\n }\n\nA reducer then sums the values for each key and writes the results to a\nBigtable table that you specified. Each row key is a word from the\ntext file. Each row contains a `cf:count` column, which contains the number of\ntimes the row key appears in the text file. \n\n public static class MyTableReducer extends\n TableReducer\u003cImmutableBytesWritable, IntWritable, ImmutableBytesWritable\u003e {\n\n @Override\n public void reduce(ImmutableBytesWritable key, Iterable\u003cIntWritable\u003e values, Context context)\n throws IOException, InterruptedException {\n int sum = sum(values);\n Put put = new Put(key.get());\n put.addColumn(COLUMN_FAMILY, COUNT_COLUMN_NAME, Bytes.toBytes(sum));\n context.write(null, put);\n }\n\n public int sum(Iterable\u003cIntWritable\u003e values) {\n int i = 0;\n for (IntWritable val : values) {\n i += val.get();\n }\n return i;\n }\n }"]]