Extracting metadata from Apache Hive for migration
This document shows how you can use the dwh-migration-dumper tool to extract the necessary metadata before running an Apache Hive data or permissions migration.
This document covers metadata extraction from the following data sources:
Apache Hive
Apache Hadoop Distributed File System (HDFS)
Apache Ranger
Cloudera Manager
Apache Hive query logs
Before you begin
Before you can use the dwh-migration-dumper tool, do the following:
Install Java
The server on which you plan to run the dwh-migration-dumper tool must have Java 8 or later installed. If it doesn't, download Java from the Java downloads page and install it.
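To confirm which runtime is already on the server, you can run a quick version check (this is only a sanity check, not an additional requirement):

```bash
# Prints the installed Java version; it should report 1.8 or later.
java -version
```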
Required permissions
The user account that you specify for connecting the dwh-migration-dumper tool to the source system must have permission to read metadata from that system. Confirm that this account has the appropriate role membership to query the metadata resources available for your platform. For example, INFORMATION_SCHEMA is a metadata resource that is common across several platforms.
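As a quick sanity check of these read permissions, here is a minimal sketch assuming a Hive 3.x cluster where HiveServer2 listens on its default port 10000 and exposes the information_schema database; HIVE-HOST is a placeholder for your HiveServer2 hostname:

```bash
# Hypothetical smoke test: list a few tables visible to the extraction account.
beeline -u "jdbc:hive2://HIVE-HOST:10000/default" \
  -e "SELECT table_schema, table_name FROM information_schema.tables LIMIT 10;"
```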
Install the dwh-migration-dumper tool
To install the dwh-migration-dumper tool, follow these steps:

1. On the machine where you want to run the dwh-migration-dumper tool, download the zip file from the dwh-migration-dumper tool GitHub repository (https://github.com/google/dwh-migration-tools/releases/latest).

2. To validate the dwh-migration-dumper tool zip file, download the SHA256SUMS.txt file from the same release and run the following command:
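Bash:

```bash
sha256sum --check SHA256SUMS.txt
```

Windows PowerShell:

```powershell
(Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split " ")[0]
```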
Replace RELEASE_ZIP_FILENAME with the downloaded zip filename of the dwh-migration-dumper command-line extraction tool release, for example dwh-migration-tools-v1.0.52.zip.

A True result confirms successful checksum verification. A False result indicates a verification error. Make sure the checksum and zip files were downloaded from the same release version and placed in the same directory.
3. Extract the zip file. The extraction tool binary is in the /bin subdirectory of the folder created by extracting the zip file.

4. Update the PATH environment variable to include the installation path of the extraction tool.
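As an illustration only, assuming the zip file was extracted to /opt/dwh-migration-tools (a hypothetical location), the PATH update could look like this:

```bash
# Hypothetical installation directory; adjust to wherever you extracted the zip file.
export PATH="$PATH:/opt/dwh-migration-tools/bin"
```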
Extracting metadata for migration
Select one of the following options to learn how to extract metadata for your data source:
Apache Hive
Perform the steps in the Apache Hive section of Extract metadata and query logs from your data warehouse to extract your Apache Hive metadata. You can then upload the metadata to the Cloud Storage bucket that contains your migration files.
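As a minimal sketch of that upload, assuming the Google Cloud CLI is installed and the extraction produced a file named hive-dumper-output.zip (the filename is only a placeholder; use your actual output file):

```bash
# Copy the extracted Hive metadata into the migration bucket.
gcloud storage cp hive-dumper-output.zip gs://MIGRATION-BUCKET/
```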
HDFS
Run the following command to extract metadata from HDFS using the dwh-migration-dumper tool.
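```bash
dwh-migration-dumper \
  --connector hdfs \
  --host HDFS-HOST \
  --port HDFS-PORT \
  --output gs://MIGRATION-BUCKET/hdfs-dumper-output.zip \
  --assessment
```

Replace the following:

HDFS-HOST: the HDFS NameNode hostname.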
HDFS-PORT: the HDFS NameNode port number. You can skip this argument if you are using the default port 8020.
MIGRATION-BUCKET: the Cloud Storage bucket that you are using to store the migration files.
This command extracts metadata from HDFS to a file named hdfs-dumper-output.zip in the MIGRATION-BUCKET directory.
There are several known limitations when extracting metadata from HDFS:
Some tasks in this connector are optional and can fail, logging a full stack trace in the output. As long as the required tasks have succeeded and hdfs-dumper-output.zip is generated, you can proceed with the HDFS migration.
The extraction process might fail or run slower than expected if the configured thread pool size is too large. If you encounter these issues, we recommend decreasing the thread pool size with the --thread-pool-size command-line argument, as shown in the example after this list.
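For example, a re-run with a smaller pool could look like the following sketch; the value 8 is only an illustration, not a recommended setting:

```bash
# Re-run the HDFS extraction with a smaller thread pool (example value).
dwh-migration-dumper \
  --connector hdfs \
  --host HDFS-HOST \
  --port HDFS-PORT \
  --output gs://MIGRATION-BUCKET/hdfs-dumper-output.zip \
  --assessment \
  --thread-pool-size 8
```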
Apache Ranger
Run the following command to extract metadata from Apache Ranger using the dwh-migration-dumper tool.
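```bash
dwh-migration-dumper \
  --connector ranger \
  --host RANGER-HOST \
  --port 6080 \
  --user RANGER-USER \
  --password RANGER-PASSWORD \
  --ranger-scheme RANGER-SCHEME \
  --output gs://MIGRATION-BUCKET/ranger-dumper-output.zip \
  --assessment
```

Replace the following:

RANGER-HOST: the hostname of the Apache Ranger instance.
RANGER-USER: the username of the Apache Ranger user.
RANGER-PASSWORD: the password of the Apache Ranger user.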
RANGER-SCHEME: specify whether Apache Ranger is using http or https. The default value is http.
MIGRATION-BUCKET: the Cloud Storage bucket that you are using to store the migration files.
You can also include the following optional flags:
--kerberos-auth-for-hadoop: replaces --user and --password if Apache Ranger is protected by Kerberos instead of basic authentication. You must run the kinit command before running the dwh-migration-dumper tool to use this flag; see the sketch after this list.
--ranger-disable-tls-validation: include this flag if the https certificate used by the API is self-signed, for example, when using Cloudera.
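As a minimal sketch of the Kerberos variant, assuming a valid ticket can be obtained for a principal such as admin@EXAMPLE.COM (the principal name is only an illustration):

```bash
# Obtain a Kerberos ticket first, then run the extraction without --user and --password.
kinit admin@EXAMPLE.COM
dwh-migration-dumper \
  --connector ranger \
  --host RANGER-HOST \
  --port 6080 \
  --kerberos-auth-for-hadoop \
  --ranger-scheme RANGER-SCHEME \
  --output gs://MIGRATION-BUCKET/ranger-dumper-output.zip \
  --assessment
```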
This command extracts metadata from Apache Ranger to a file named ranger-dumper-output.zip in the MIGRATION-BUCKET directory.
Cloudera
Run the following command to extract metadata from Cloudera using the dwh-migration-dumper tool.
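```bash
dwh-migration-dumper \
  --connector cloudera-manager \
  --url CLOUDERA-URL \
  --user CLOUDERA-USER \
  --password CLOUDERA-PASSWORD \
  --output gs://MIGRATION-BUCKET/cloudera-dumper-output.zip \
  --yarn-application-types APPLICATION-TYPES \
  --pagination-page-size PAGE-SIZE \
  --assessment
```

Replace the following:

CLOUDERA-URL: the URL for Cloudera Manager.
CLOUDERA-USER: the username of the Cloudera user.
CLOUDERA-PASSWORD: the password of the Cloudera user.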
MIGRATION-BUCKET: the Cloud Storage bucket that you are using to store the migration files.
APPLICATION-TYPES: (Optional) a list of all existing application types from Hadoop YARN, for example, SPARK, MAPREDUCE.
PAGE-SIZE: (Optional) specify how much data is fetched from third-party services, such as the Hadoop YARN API. The default value is 1000, which represents 1,000 entities per request.
This command extracts metadata from Cloudera to a file named dwh-migration-cloudera.zip in the MIGRATION-BUCKET directory.
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-09 UTC."],[],[],null,["# Extracting metadata from Apache Hive for migration\n==================================================\n\n|\n| **Preview**\n|\n|\n| This feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n| **Note:** To get support or provide feedback for this feature, contact [bigquery-permission-migration-support@google.com](mailto:bigquery-permission-migration-support@google.com).\n\nThis document shows how you can use the `dwh-migration-dumper` tool to extract\nthe necessary metadata before running a Apache Hive data or permissions\nmigration.\n\nThis document covers metadata extraction from the following data sources:\n\n- Apache Hive\n- Apache Hadoop Distributed File System (HDFS)\n- Apache Ranger\n- Cloudera Manager\n- Apache Hive query logs\n\nBefore you begin\n----------------\n\nBefore you can use the `dwh-migration-dumper` tool, do the following:\n\n### Install Java\n\nThe server on which you plan to run `dwh-migration-dumper` tool must have\nJava 8 or higher installed. If it doesn't, download Java from the\n[Java downloads page](https://www.java.com/download/)\nand install it.\n\n### Required permissions\n\nThe user account that you specify for connecting the `dwh-migration-dumper` tool to\nthe source system must have permissions to read metadata from that system.\nConfirm that this account has appropriate role membership to query the metadata\nresources available for your platform. For example, `INFORMATION_SCHEMA` is a\nmetadata resource that is common across several platforms.\n\nInstall the `dwh-migration-dumper` tool\n---------------------------------------\n\nTo install the `dwh-migration-dumper` tool, follow these steps:\n\n1. On the machine where you want to run the `dwh-migration-dumper` tool, download the zip file from the [`dwh-migration-dumper` tool GitHub repository](https://github.com/google/dwh-migration-tools/releases/latest).\n2. To validate the `dwh-migration-dumper` tool zip file, download the\n [`SHA256SUMS.txt` file](https://github.com/google/dwh-migration-tools/releases/latest/download/SHA256SUMS.txt)\n and run the following command:\n\n ### Bash\n\n ```bash\n sha256sum --check SHA256SUMS.txt\n ```\n\n If verification fails, see [Troubleshooting](#corrupted_zip_file).\n\n ### Windows PowerShell\n\n ```bash\n (Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split \" \")[0]\n ```\n\n Replace the \u003cvar translate=\"no\"\u003eRELEASE_ZIP_FILENAME\u003c/var\u003e with the downloaded\n zip filename of the `dwh-migration-dumper` command-line extraction tool release---for example,\n `dwh-migration-tools-v1.0.52.zip`\n\n The `True` result confirms successful checksum verification.\n\n The `False` result indicates verification error. 
Make sure the checksum and\n zip files are downloaded from the same release version and placed in the\n same directory.\n3. Extract the zip file. The extraction tool binary is in the\n `/bin` subdirectory of the folder created by extracting the zip file.\n\n4. Update the `PATH` environment variable to include the installation path for\n the extraction tool.\n\nExtracting metadata for migration\n---------------------------------\n\nSelect one of the following options to learn how to extract metadata for your\ndata source: \n\n### Apache Hive\n\nPerform the steps in the Apache Hive section [Extract metadata and query logs from your data warehouse](/bigquery/docs/migration-assessment#apache-hive)\nto extract your Apache Hive metadata. You can then upload the metadata\nto your Cloud Storage bucket containing your migration files.\n\n### HDFS\n\nRun the following command to extract extract metadata from HDFS\nusing the `dwh-migration-dumper` tool. \n\n dwh-migration-dumper \\\n --connector hdfs \\\n --host \u003cvar translate=\"no\"\u003eHDFS-HOST\u003c/var\u003e \\\n --port \u003cvar translate=\"no\"\u003eHDFS-PORT\u003c/var\u003e \\\n --output gs://\u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e/hdfs-dumper-output.zip \\\n --assessment \\\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eHDFS-HOST\u003c/var\u003e: the HDFS NameNode hostname\n- \u003cvar translate=\"no\"\u003eHDFS-PORT\u003c/var\u003e: the HDFS NameNode port number. You can skip this argument if you are using the default `8020` port.\n- \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e: the Cloud Storage bucket that you are using to store the migration files.\n\nThis command extracts metadata from HDFS to a\nfile named `hdfs-dumper-output.zip` in the \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e\ndirectory.\n\nThere are several known limitations when extracting metadata from HDFS:\n\n- Some tasks in this connector are optional and can fail, logging a full stack trade in the output. As long as the required tasks have succeeded and the `hdfs-dumper-output.zip` is generated, then you can proceed with the HDFS migration.\n- The extraction process might fail or run slower than expected if the configured thread pool size is too large. If you are encountering these issues, we recommend decreasing the thread pool size using the command line argument `--thread-pool-size`.\n\n### Apache Ranger\n\nRun the following command to extract extract metadata from Apache Ranger\nusing the `dwh-migration-dumper` tool. \n\n dwh-migration-dumper \\\n --connector ranger \\\n --host \u003cvar translate=\"no\"\u003eRANGER-HOST\u003c/var\u003e \\\n --port 6080 \\\n --user \u003cvar translate=\"no\"\u003eRANGER-USER\u003c/var\u003e \\\n --password \u003cvar translate=\"no\"\u003eRANGER-PASSWORD\u003c/var\u003e \\\n --ranger-scheme \u003cvar translate=\"no\"\u003eRANGER-SCHEME\u003c/var\u003e \\\n --output gs://\u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e/ranger-dumper-output.zip \\\n --assessment \\\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eRANGER-HOST\u003c/var\u003e: the hostname of the Apache Ranger instance\n- \u003cvar translate=\"no\"\u003eRANGER-USER\u003c/var\u003e: the username of the Apache Ranger user\n- \u003cvar translate=\"no\"\u003eRANGER-PASSWORD\u003c/var\u003e: the password of the Apache Ranger user\n- \u003cvar translate=\"no\"\u003eRANGER-SCHEME\u003c/var\u003e: specify if Apache Ranger is using `http` or `https`. 
Default value is `http`.\n- \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e: the Cloud Storage bucket that you are using to store the migration files.\n\nYou can also include the following optional flags:\n\n- `--kerberos-auth-for-hadoop`: replaces `--user` and `--password`, if Apache Ranger is protected by kerberos instead of basic authentication. You must run the `kinit` command before the `dwh-migration-dumper` tool tool to use this flag.\n- `--ranger-disable-tls-validation`: include this flag if the https certificate used by the API is self signed. For example, when using Cloudera.\n\nThis command extracts metadata from Apache Ranger to a\nfile named `ranger-dumper-output.zip` in the \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e\ndirectory.\n\n### Cloudera\n\nRun the following command to extract metadata from Cloudera\nusing the `dwh-migration-dumper` tool. \n\n dwh-migration-dumper \\\n --connector cloudera-manager \\\n --url \u003cvar translate=\"no\"\u003eCLOUDERA-URL\u003c/var\u003e \\\n --user \u003cvar translate=\"no\"\u003eCLOUDERA-USER\u003c/var\u003e \\\n --password \u003cvar translate=\"no\"\u003eCLOUDERA-PASSWORD\u003c/var\u003e \\\n --output gs://\u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e/cloudera-dumper-output.zip \\\n --yarn-application-types \u003cvar translate=\"no\"\u003eAPPLICATION-TYPES\u003c/var\u003e \\\n --pagination-page-size \u003cvar translate=\"no\"\u003ePAGE-SIZE\u003c/var\u003e \\\n --assessment \\\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eCLOUDERA-URL\u003c/var\u003e: the URL for Cloudera Manager\n- \u003cvar translate=\"no\"\u003eCLOUDERA-USER\u003c/var\u003e: the username of the Cloudera user\n- \u003cvar translate=\"no\"\u003eCLOUDERA-PASSWORD\u003c/var\u003e: the password of the Cloudera user\n- \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e: the Cloud Storage bucket that you are using to store the migration files.\n- \u003cvar translate=\"no\"\u003eAPPLICATION-TYPES\u003c/var\u003e: (Optional) list of all existing application types from Hadoop YARN. For example, `SPARK, MAPREDUCE`.\n- \u003cvar translate=\"no\"\u003ePAGE-SIZE\u003c/var\u003e: (Optional) specify how much data is fetched from 3rd party services, like the Hadoop YARN API. The default value is `1000`, which represents 1000 entities per request.\n\nThis command extracts metadata from Cloudera to a\nfile named `dwh-migration-cloudera.zip` in the \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e\ndirectory.\n\n### Apache Hive query logs\n\nPerform the steps in the Apache Hive section [Extract query logs with the `hadoop-migration-assessment` logging hook](/bigquery/docs/migration-assessment#apache-hive)\nto extract your Apache Hive query logs. You can then upload the logs\nto your Cloud Storage bucket containing your migration files.\n\nWhat's next\n-----------\n\nWith your extracted metadata from Hadoop, you can use\nthese metadata files to do the following:\n\n- [Migrate permissions from Hadoop](/bigquery/docs/hadoop-permissions-migration)\n- [Schedule a Hadoop transfer](/bigquery/docs/hadoop-transfer)"]]