Hadoop MapReduce job with Bigtable
This example uses Hadoop to run a simple MapReduce job that counts the number of
times each word appears in a text file. The MapReduce job uses Bigtable to store
its output, one row per word. The code for this example is in the GitHub
repository GoogleCloudPlatform/cloud-bigtable-examples
(https://github.com/GoogleCloudPlatform/cloud-bigtable-examples), in the
java/dataproc-wordcount directory.
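To make the data flow concrete before looking at the Hadoop code, here is a minimal plain-Java sketch of the same word count written as three explicit phases: map, shuffle, and reduce. It uses neither Hadoop nor Bigtable. The class name WordCountSimulation is hypothetical, the shuffle that Hadoop performs between the phases is written out by hand, and a sorted in-memory map stands in for the Bigtable table (row key = word, value = count).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a word-count MapReduce job.
// No Hadoop or Bigtable APIs are used; a TreeMap plays the output table.
final class WordCountSimulation {

  // Map phase: emit a (word, 1) pair for every whitespace-separated token.
  static List<Map.Entry<String, Integer>> map(String line) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      pairs.add(Map.entry(itr.nextToken(), 1));
    }
    return pairs;
  }

  // Shuffle phase: group every emitted value under its key.
  static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
    Map<String, List<Integer>> grouped = new TreeMap<>();
    for (Map.Entry<String, Integer> pair : pairs) {
      grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
    }
    return grouped;
  }

  // Reduce phase: sum the values for each key; the result is the "table".
  static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
    Map<String, Integer> table = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
      int sum = 0;
      for (int v : e.getValue()) {
        sum += v;
      }
      table.put(e.getKey(), sum);
    }
    return table;
  }

  // Runs all three phases over the given input lines.
  static Map<String, Integer> countWords(List<String> lines) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (String line : lines) {
      pairs.addAll(map(line));
    }
    return reduce(shuffle(pairs));
  }

  public static void main(String[] args) {
    System.out.println(countWords(List.of("the cat and the hat", "the end")));
    // prints {and=1, cat=1, end=1, hat=1, the=3}
  }
}
```

Because StringTokenizer splits only on whitespace, punctuation stays attached to words and matching is case-sensitive, so "The" and "the" are counted as different words.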
Set up authentication
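To use the Java samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.
1. Install the Google Cloud CLI.
2. If you're using an external identity provider (IdP), first sign in to the gcloud CLI with your federated identity.
3. If you're using a local shell, create local authentication credentials for your user account:
   gcloud auth application-default login
   You don't need to do this if you're using Cloud Shell.
4. If an authentication error is returned and you're using an external IdP, confirm that you have signed in to the gcloud CLI with your federated identity.
For more information, see "Set up authentication for a local development environment" in the Bigtable documentation.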
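After setting up Application Default Credentials (for example with gcloud auth application-default login), it can help to check where the credentials file is expected. The following is a simplified, hypothetical helper (AdcLocator is not part of any Google library) that mirrors the first two places ADC looks on a workstation: the file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable, and otherwise the gcloud well-known file, ~/.config/gcloud/application_default_credentials.json on Linux and macOS or %APPDATA%\gcloud\application_default_credentials.json on Windows. The real auth libraries handle more cases (such as the metadata server on Google Cloud), so treat this only as a diagnostic sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical diagnostic helper: a simplified view of the local ADC lookup,
// not the implementation used by the Google auth libraries.
final class AdcLocator {

  static final String ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS";
  static final String FILE_NAME = "application_default_credentials.json";

  // Returns the credentials file ADC would try first on this machine,
  // or empty if no candidate location can be determined.
  static Optional<Path> candidatePath(Map<String, String> env, String osName, String userHome) {
    // 1. An explicit path in GOOGLE_APPLICATION_CREDENTIALS takes precedence.
    String explicit = env.get(ENV_VAR);
    if (explicit != null && !explicit.isEmpty()) {
      return Optional.of(Paths.get(explicit));
    }
    // 2. Otherwise, the file written by `gcloud auth application-default login`.
    if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
      String appData = env.get("APPDATA");
      return appData == null
          ? Optional.empty()
          : Optional.of(Paths.get(appData, "gcloud", FILE_NAME));
    }
    return Optional.of(Paths.get(userHome, ".config", "gcloud", FILE_NAME));
  }

  public static void main(String[] args) {
    Optional<Path> path = candidatePath(
        System.getenv(), System.getProperty("os.name"), System.getProperty("user.home"));
    if (path.isPresent()) {
      Path p = path.get();
      System.out.println(p + (Files.exists(p) ? " exists" : " not found"));
    } else {
      System.out.println("No candidate credentials location found");
    }
  }
}
```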
Overview of the code sample
The code sample provides a simple command-line interface that takes one or more text files and a table name as input, finds all of the words that appear in the files, and counts how many times each word appears. The MapReduce logic is in the WordCountHBase class.
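The overview says the tool takes one or more text files plus a table name. One common way to express that on a command line, assumed here rather than taken from the sample, is to pass the input paths first and the table name last, the same layout Hadoop's classic WordCount example uses for its output path. The hypothetical WordCountArgs class below shows that split in plain Java, without Hadoop dependencies.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical argument layout: <input-file> [<input-file> ...] <table-name>.
// An illustration only, not the sample's actual command-line parser.
final class WordCountArgs {
  final List<String> inputPaths;
  final String tableName;

  private WordCountArgs(List<String> inputPaths, String tableName) {
    this.inputPaths = inputPaths;
    this.tableName = tableName;
  }

  // Every argument except the last is an input path; the last is the table name.
  static WordCountArgs parse(String[] args) {
    if (args.length < 2) {
      throw new IllegalArgumentException(
          "Usage: <input-file> [<input-file> ...] <table-name>");
    }
    List<String> inputs = List.copyOf(Arrays.asList(args).subList(0, args.length - 1));
    return new WordCountArgs(inputs, args[args.length - 1]);
  }

  public static void main(String[] args) {
    WordCountArgs parsed = parse(new String[] {"a.txt", "b.txt", "word-counts"});
    System.out.println(parsed.inputPaths + " -> " + parsed.tableName);
    // prints [a.txt, b.txt] -> word-counts
  }
}
```

Requiring at least two arguments guarantees at least one input file in addition to the table name.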
First, a mapper tokenizes the text file's contents and generates key-value pairs, where the key is a word from the text file and the value is 1:

```java
// Emits a (word, 1) pair for every whitespace-separated token in each input line.
public static class TokenizerMapper extends
    Mapper<Object, Text, ImmutableBytesWritable, IntWritable> {

  private final static IntWritable one = new IntWritable(1);

  @Override
  public void map(Object key, Text value, Context context) throws IOException,
      InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    // Reused for every token: Hadoop serializes the pair when it is written.
    ImmutableBytesWritable word = new ImmutableBytesWritable();
    while (itr.hasMoreTokens()) {
      word.set(Bytes.toBytes(itr.nextToken()));
      context.write(word, one);
    }
  }
}
```

A reducer then sums the values for each key and writes the results to the Bigtable table that you specified. Each row key is a word from the text file. Each row contains a cf:count column, which holds the number of times the row key appears in the text file.

```java
// Sums the 1s emitted for each word and writes the total to the cf:count column.
public static class MyTableReducer extends
    TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {

  @Override
  public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = sum(values);
    // The row key is the word. COLUMN_FAMILY and COUNT_COLUMN_NAME are constants
    // of the enclosing WordCountHBase class that identify the cf:count column.
    Put put = new Put(key.get());
    put.addColumn(COLUMN_FAMILY, COUNT_COLUMN_NAME, Bytes.toBytes(sum));
    // The table output format ignores the output key, so null is passed.
    context.write(null, put);
  }

  // Adds up all counts emitted for one word.
  public int sum(Iterable<IntWritable> values) {
    int i = 0;
    for (IntWritable val : values) {
      i += val.get();
    }
    return i;
  }
}
```

Last updated 2025-09-04 UTC.
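The reducer stores each count with Bytes.toBytes(sum), which in HBase produces a 4-byte big-endian integer rather than a decimal string, so a client reading cf:count back has to decode those four bytes (in HBase code, Bytes.toInt does this). The sketch below reproduces that layout with the JDK's ByteBuffer, whose default byte order is big-endian, so cell values can be checked or decoded without HBase on the classpath. CountCodec and its methods are illustrative names, not part of any library.

```java
import java.nio.ByteBuffer;

// Illustrative codec matching the 4-byte big-endian layout that
// HBase's Bytes.toBytes(int) writes into the cf:count cells.
final class CountCodec {

  // Encodes a count the same way Bytes.toBytes(int) does.
  static byte[] encode(int count) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(count).array();
  }

  // Decodes a cf:count cell value back into an int.
  static int decode(byte[] cell) {
    if (cell.length != Integer.BYTES) {
      throw new IllegalArgumentException("expected 4 bytes, got " + cell.length);
    }
    return ByteBuffer.wrap(cell).getInt();
  }

  // Renders bytes as \xNN escapes for inspection.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("\\x%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] cell = encode(42);
    System.out.println(toHex(cell)); // prints \x00\x00\x00\x2a
    System.out.println(decode(cell)); // prints 42
  }
}
```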