Hadoop MapReduce job with Bigtable

Last updated: 2025-09-04 (UTC)

This example uses [Hadoop](https://hadoop.apache.org/) to run a simple MapReduce job that counts the number of times each word appears in a text file. The job stores the resulting word counts in Bigtable. The code for this example is in the GitHub repository [GoogleCloudPlatform/cloud-bigtable-examples](https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/), in the directory `java/dataproc-wordcount`.

Set up authentication

To use the Java samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

1. [Install](/sdk/docs/install) the Google Cloud CLI.
2. If you're using an external identity provider (IdP), you must first [sign in to the gcloud CLI with your federated identity](/iam/docs/workforce-log-in-gcloud).
3. If you're using a local shell, create local authentication credentials for your user account:

   ```bash
   gcloud auth application-default login
   ```

   You don't need to do this if you're using Cloud Shell. If an authentication error is returned and you are using an external IdP, confirm that you have [signed in to the gcloud CLI with your federated identity](/iam/docs/workforce-log-in-gcloud).

For more information, see [Set up authentication for a local development environment](/bigtable/docs/authentication#local-development).

Overview of the code sample

The code sample provides a simple command-line interface that takes one or more text files and a table name as input, finds all of the words that appear in the files, and counts how many times each word appears. The MapReduce logic appears in the [`WordCountHBase` class](https://github.com/GoogleCloudPlatform/cloud-bigtable-examples//blob/master/java/dataproc-wordcount/src/main/java/com/example/bigtable/sample/WordCountHBase.java).

First, a mapper tokenizes the text file's contents and generates key-value pairs, where the key is a word from the text file and the value is `1`:

```java
public static class TokenizerMapper extends
    Mapper<Object, Text, ImmutableBytesWritable, IntWritable> {

  // Reused for every emitted pair: each word occurrence counts as 1.
  private final static IntWritable one = new IntWritable(1);

  @Override
  public void map(Object key, Text value, Context context) throws IOException,
      InterruptedException {
    // Split the input text on whitespace (the StringTokenizer default delimiters).
    StringTokenizer itr = new StringTokenizer(value.toString());
    ImmutableBytesWritable word = new ImmutableBytesWritable();
    while (itr.hasMoreTokens()) {
      // Emit (word, 1), with the word encoded as bytes.
      word.set(Bytes.toBytes(itr.nextToken()));
      context.write(word, one);
    }
  }
}
```

A reducer then sums the values for each key and writes the results to the Bigtable table that you specified. Each row key is a word from the text file. Each row contains a `cf:count` column, which holds the number of times the row key appears in the text file.

```java
public static class MyTableReducer extends
    TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {

  @Override
  public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = sum(values);
    // The word's bytes become the row key.
    Put put = new Put(key.get());
    // Write the total to the count column of the row.
    put.addColumn(COLUMN_FAMILY, COUNT_COLUMN_NAME, Bytes.toBytes(sum));
    context.write(null, put);
  }

  // Adds up the values that the mapper emitted for one word.
  public int sum(Iterable<IntWritable> values) {
    int i = 0;
    for (IntWritable val : values) {
      i += val.get();
    }
    return i;
  }
}
```
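To follow the data flow without a Hadoop cluster or a Bigtable instance, the mapper and reducer logic can be approximated in plain Java. The following is a minimal sketch, not code from the repository. The class `WordCountSketch` and its methods are hypothetical names, the shuffle step between map and reduce is collapsed into a `Map.merge` call, and `encodeCount` assumes the 4-byte big-endian layout that HBase's `Bytes.toBytes(int)` produces for the count cell.

```java
import java.nio.ByteBuffer;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Plain-JDK illustration of the word-count phases; uses no Hadoop or Bigtable APIs. */
public class WordCountSketch {

  /** Map phase: emit a (word, 1) pair for every whitespace-separated token. */
  public static List<Map.Entry<String, Integer>> map(String text) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    StringTokenizer itr = new StringTokenizer(text);
    while (itr.hasMoreTokens()) {
      pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
    }
    return pairs;
  }

  /** Shuffle and reduce phases: group the pairs by word and sum each group's values. */
  public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
    // TreeMap keeps the words sorted so the printed output is stable.
    Map<String, Integer> counts = new TreeMap<>();
    for (Map.Entry<String, Integer> pair : pairs) {
      counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
    }
    return counts;
  }

  /** Encodes a count as 4 big-endian bytes (assumed to match Bytes.toBytes(int)). */
  public static byte[] encodeCount(int count) {
    // ByteBuffer writes in big-endian order by default.
    return ByteBuffer.allocate(Integer.BYTES).putInt(count).array();
  }

  /** Decodes a 4-byte big-endian cell value back into an int. */
  public static int decodeCount(byte[] cell) {
    return ByteBuffer.wrap(cell).getInt();
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = reduce(map("the quick fox jumps over the lazy dog the end"));
    for (Map.Entry<String, Integer> row : counts.entrySet()) {
      byte[] cell = encodeCount(row.getValue());
      System.out.println(row.getKey() + " -> count = " + decodeCount(cell)
          + " (" + cell.length + " bytes)");
    }
  }
}
```

Because the count is stored as raw bytes rather than as text, a client that reads the table back has to decode the cell value as an integer, as `decodeCount` does here, instead of treating it as a string.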