Extracting metadata from Apache Hive for migration
This document shows how to use the dwh-migration-dumper tool to extract the necessary metadata before running an Apache Hive data or permissions migration.
This document covers metadata extraction from the following data sources:
Apache Hive
Apache Hadoop Distributed File System (HDFS)
Apache Ranger
Cloudera Manager
Apache Hive query logs
Before you begin
Before you can use the dwh-migration-dumper tool, do the following:
Install Java
The server on which you plan to run the dwh-migration-dumper tool must have Java 8 or later installed. If it doesn't, download Java from the Java downloads page and install it.
Required permissions
The user account that you specify for connecting the dwh-migration-dumper tool to the source system must have permissions to read metadata from that system. Confirm that this account has the appropriate roles to query the metadata resources available on your platform. For example, INFORMATION_SCHEMA is a metadata resource that is common across several platforms.
Install the dwh-migration-dumper tool
To install the dwh-migration-dumper tool, follow these steps:
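On the machine where you plan to run the dwh-migration-dumper tool, download the zip file from the dwh-migration-tools GitHub releases page.

To validate the dwh-migration-dumper tool zip file, download the SHA256SUMS.txt file from the same release and run one of the following commands:

Bash:

```bash
# Verify the downloaded archive against the published checksums
sha256sum --check SHA256SUMS.txt
```

Windows PowerShell:

```powershell
# Compare the file hash with the checksum published in SHA256SUMS.txt
(Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split " ")[0]
```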
Replace RELEASE_ZIP_FILENAME with the filename of the zip file that you downloaded for the dwh-migration-dumper command-line extraction tool release, for example, dwh-migration-tools-v1.0.52.zip.
A True result confirms successful checksum verification.
A False result indicates a verification error. Make sure that the checksum and zip files were downloaded from the same release version and placed in the same directory.
Extract the zip file. The extraction tool binary is in the /bin subdirectory of the folder created by extracting the zip file.
Update the PATH environment variable to include the installation path for the extraction tool.
Extracting metadata for migration
Select one of the following options to learn how to extract metadata from your data source:
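Apache Hive

To extract your Apache Hive metadata, perform the steps in the Apache Hive section Extract metadata and query logs from your data warehouse. You can then upload the metadata to the Cloud Storage bucket that contains your migration files.

HDFS

Run the following command to extract metadata from HDFS using the dwh-migration-dumper tool:

```bash
# Extract HDFS metadata for assessment and write the result to Cloud Storage
dwh-migration-dumper \
  --connector hdfs \
  --host HDFS-HOST \
  --port HDFS-PORT \
  --output gs://MIGRATION-BUCKET/hdfs-dumper-output.zip \
  --assessment
```

Replace the following:

HDFS-HOST: the HDFS NameNode hostname.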
HDFS-PORT: the HDFS NameNode port number. You can skip this argument if you are using the default 8020 port.
MIGRATION-BUCKET: the Cloud Storage bucket that you are using to store the migration files.
This command extracts metadata from HDFS to a file named hdfs-dumper-output.zip in the MIGRATION-BUCKET directory.
There are several known limitations when extracting metadata from HDFS:
Some tasks in this connector are optional and can fail, logging a full stack trace in the output. As long as the required tasks have succeeded and hdfs-dumper-output.zip is generated, you can proceed with the HDFS migration.
The extraction process might fail or run more slowly than expected if the configured thread pool size is too large. If you encounter these issues, we recommend decreasing the thread pool size by using the --thread-pool-size command-line argument.
Apache Ranger
Run the following command to extract metadata from Apache Ranger using the dwh-migration-dumper tool:
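```bash
# Extract Apache Ranger policies and metadata for assessment
dwh-migration-dumper \
  --connector ranger \
  --host RANGER-HOST \
  --port 6080 \
  --user RANGER-USER \
  --password RANGER-PASSWORD \
  --ranger-scheme RANGER-SCHEME \
  --output gs://MIGRATION-BUCKET/ranger-dumper-output.zip \
  --assessment
```

Replace the following: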
RANGER-HOST: the hostname of the Apache Ranger instance.
RANGER-USER: the username of the Apache Ranger user.
RANGER-PASSWORD: the password of the Apache Ranger user.
RANGER-SCHEME: specify whether Apache Ranger is using http or https. The default value is http.
MIGRATION-BUCKET: the Cloud Storage bucket that you are using to store the migration files.
You can also include the following optional flags:
--kerberos-auth-for-hadoop: replaces --user and --password if Apache Ranger is protected by Kerberos instead of basic authentication. To use this flag, run the kinit command before running the dwh-migration-dumper tool.
--ranger-disable-tls-validation: include this flag if the HTTPS certificate used by the API is self-signed, for example, when using Cloudera.
This command extracts metadata from Apache Ranger to a file named ranger-dumper-output.zip in the MIGRATION-BUCKET directory.
Cloudera
Run the following command to extract metadata from Cloudera using the dwh-migration-dumper tool:
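```bash
# Extract Cloudera Manager metadata, including YARN application history, for assessment
dwh-migration-dumper \
  --connector cloudera-manager \
  --url CLOUDERA-URL \
  --user CLOUDERA-USER \
  --password CLOUDERA-PASSWORD \
  --output gs://MIGRATION-BUCKET/cloudera-dumper-output.zip \
  --yarn-application-types APPLICATION-TYPES \
  --pagination-page-size PAGE-SIZE \
  --assessment
```

Replace the following:

CLOUDERA-URL: the URL for Cloudera Manager.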
CLOUDERA-USER: the username of the Cloudera user.
CLOUDERA-PASSWORD: the password of the Cloudera user.
MIGRATION-BUCKET: the Cloud Storage bucket that you are using to store the migration files.
APPLICATION-TYPES: (optional) a list of all existing application types from Hadoop YARN, for example, SPARK, MAPREDUCE.
PAGE-SIZE: (optional) specify how much data is fetched from third-party services, such as the Hadoop YARN API. The default value is 1000, which represents 1,000 entities per request.
This command extracts metadata from Cloudera to a file named dwh-migration-cloudera.zip in the MIGRATION-BUCKET directory.
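Apache Hive query logs

To extract your Apache Hive query logs, perform the steps in the Apache Hive section Extract query logs with the hadoop-migration-assessment logging hook. You can then upload the logs to the Cloud Storage bucket that contains your migration files.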
[[["Fácil de entender","easyToUnderstand","thumb-up"],["Meu problema foi resolvido","solvedMyProblem","thumb-up"],["Outro","otherUp","thumb-up"]],[["Difícil de entender","hardToUnderstand","thumb-down"],["Informações incorretas ou exemplo de código","incorrectInformationOrSampleCode","thumb-down"],["Não contém as informações/amostras de que eu preciso","missingTheInformationSamplesINeed","thumb-down"],["Problema na tradução","translationIssue","thumb-down"],["Outro","otherDown","thumb-down"]],["Última atualização 2025-09-04 UTC."],[],[],null,["# Extracting metadata from Apache Hive for migration\n==================================================\n\n|\n| **Preview**\n|\n|\n| This feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n| **Note:** To get support or provide feedback for this feature, contact [bigquery-permission-migration-support@google.com](mailto:bigquery-permission-migration-support@google.com).\n\nThis document shows how you can use the `dwh-migration-dumper` tool to extract\nthe necessary metadata before running a Apache Hive data or permissions\nmigration.\n\nThis document covers metadata extraction from the following data sources:\n\n- Apache Hive\n- Apache Hadoop Distributed File System (HDFS)\n- Apache Ranger\n- Cloudera Manager\n- Apache Hive query logs\n\nBefore you begin\n----------------\n\nBefore you can use the `dwh-migration-dumper` tool, do the following:\n\n### Install Java\n\nThe server on which you plan to run `dwh-migration-dumper` tool must have\nJava 8 or higher installed. If it doesn't, download Java from the\n[Java downloads page](https://www.java.com/download/)\nand install it.\n\n### Required permissions\n\nThe user account that you specify for connecting the `dwh-migration-dumper` tool to\nthe source system must have permissions to read metadata from that system.\nConfirm that this account has appropriate role membership to query the metadata\nresources available for your platform. For example, `INFORMATION_SCHEMA` is a\nmetadata resource that is common across several platforms.\n\nInstall the `dwh-migration-dumper` tool\n---------------------------------------\n\nTo install the `dwh-migration-dumper` tool, follow these steps:\n\n1. On the machine where you want to run the `dwh-migration-dumper` tool, download the zip file from the [`dwh-migration-dumper` tool GitHub repository](https://github.com/google/dwh-migration-tools/releases/latest).\n2. To validate the `dwh-migration-dumper` tool zip file, download the\n [`SHA256SUMS.txt` file](https://github.com/google/dwh-migration-tools/releases/latest/download/SHA256SUMS.txt)\n and run the following command:\n\n ### Bash\n\n ```bash\n sha256sum --check SHA256SUMS.txt\n ```\n\n If verification fails, see [Troubleshooting](#corrupted_zip_file).\n\n ### Windows PowerShell\n\n ```bash\n (Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split \" \")[0]\n ```\n\n Replace the \u003cvar translate=\"no\"\u003eRELEASE_ZIP_FILENAME\u003c/var\u003e with the downloaded\n zip filename of the `dwh-migration-dumper` command-line extraction tool release---for example,\n `dwh-migration-tools-v1.0.52.zip`\n\n The `True` result confirms successful checksum verification.\n\n The `False` result indicates verification error. 
Make sure the checksum and\n zip files are downloaded from the same release version and placed in the\n same directory.\n3. Extract the zip file. The extraction tool binary is in the\n `/bin` subdirectory of the folder created by extracting the zip file.\n\n4. Update the `PATH` environment variable to include the installation path for\n the extraction tool.\n\nExtracting metadata for migration\n---------------------------------\n\nSelect one of the following options to learn how to extract metadata for your\ndata source: \n\n### Apache Hive\n\nPerform the steps in the Apache Hive section [Extract metadata and query logs from your data warehouse](/bigquery/docs/migration-assessment#apache-hive)\nto extract your Apache Hive metadata. You can then upload the metadata\nto your Cloud Storage bucket containing your migration files.\n\n### HDFS\n\nRun the following command to extract extract metadata from HDFS\nusing the `dwh-migration-dumper` tool. \n\n dwh-migration-dumper \\\n --connector hdfs \\\n --host \u003cvar translate=\"no\"\u003eHDFS-HOST\u003c/var\u003e \\\n --port \u003cvar translate=\"no\"\u003eHDFS-PORT\u003c/var\u003e \\\n --output gs://\u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e/hdfs-dumper-output.zip \\\n --assessment \\\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eHDFS-HOST\u003c/var\u003e: the HDFS NameNode hostname\n- \u003cvar translate=\"no\"\u003eHDFS-PORT\u003c/var\u003e: the HDFS NameNode port number. You can skip this argument if you are using the default `8020` port.\n- \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e: the Cloud Storage bucket that you are using to store the migration files.\n\nThis command extracts metadata from HDFS to a\nfile named `hdfs-dumper-output.zip` in the \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e\ndirectory.\n\nThere are several known limitations when extracting metadata from HDFS:\n\n- Some tasks in this connector are optional and can fail, logging a full stack trade in the output. As long as the required tasks have succeeded and the `hdfs-dumper-output.zip` is generated, then you can proceed with the HDFS migration.\n- The extraction process might fail or run slower than expected if the configured thread pool size is too large. If you are encountering these issues, we recommend decreasing the thread pool size using the command line argument `--thread-pool-size`.\n\n### Apache Ranger\n\nRun the following command to extract extract metadata from Apache Ranger\nusing the `dwh-migration-dumper` tool. \n\n dwh-migration-dumper \\\n --connector ranger \\\n --host \u003cvar translate=\"no\"\u003eRANGER-HOST\u003c/var\u003e \\\n --port 6080 \\\n --user \u003cvar translate=\"no\"\u003eRANGER-USER\u003c/var\u003e \\\n --password \u003cvar translate=\"no\"\u003eRANGER-PASSWORD\u003c/var\u003e \\\n --ranger-scheme \u003cvar translate=\"no\"\u003eRANGER-SCHEME\u003c/var\u003e \\\n --output gs://\u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e/ranger-dumper-output.zip \\\n --assessment \\\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eRANGER-HOST\u003c/var\u003e: the hostname of the Apache Ranger instance\n- \u003cvar translate=\"no\"\u003eRANGER-USER\u003c/var\u003e: the username of the Apache Ranger user\n- \u003cvar translate=\"no\"\u003eRANGER-PASSWORD\u003c/var\u003e: the password of the Apache Ranger user\n- \u003cvar translate=\"no\"\u003eRANGER-SCHEME\u003c/var\u003e: specify if Apache Ranger is using `http` or `https`. 
Default value is `http`.\n- \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e: the Cloud Storage bucket that you are using to store the migration files.\n\nYou can also include the following optional flags:\n\n- `--kerberos-auth-for-hadoop`: replaces `--user` and `--password`, if Apache Ranger is protected by kerberos instead of basic authentication. You must run the `kinit` command before the `dwh-migration-dumper` tool tool to use this flag.\n- `--ranger-disable-tls-validation`: include this flag if the https certificate used by the API is self signed. For example, when using Cloudera.\n\nThis command extracts metadata from Apache Ranger to a\nfile named `ranger-dumper-output.zip` in the \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e\ndirectory.\n\n### Cloudera\n\nRun the following command to extract metadata from Cloudera\nusing the `dwh-migration-dumper` tool. \n\n dwh-migration-dumper \\\n --connector cloudera-manager \\\n --url \u003cvar translate=\"no\"\u003eCLOUDERA-URL\u003c/var\u003e \\\n --user \u003cvar translate=\"no\"\u003eCLOUDERA-USER\u003c/var\u003e \\\n --password \u003cvar translate=\"no\"\u003eCLOUDERA-PASSWORD\u003c/var\u003e \\\n --output gs://\u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e/cloudera-dumper-output.zip \\\n --yarn-application-types \u003cvar translate=\"no\"\u003eAPPLICATION-TYPES\u003c/var\u003e \\\n --pagination-page-size \u003cvar translate=\"no\"\u003ePAGE-SIZE\u003c/var\u003e \\\n --assessment \\\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eCLOUDERA-URL\u003c/var\u003e: the URL for Cloudera Manager\n- \u003cvar translate=\"no\"\u003eCLOUDERA-USER\u003c/var\u003e: the username of the Cloudera user\n- \u003cvar translate=\"no\"\u003eCLOUDERA-PASSWORD\u003c/var\u003e: the password of the Cloudera user\n- \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e: the Cloud Storage bucket that you are using to store the migration files.\n- \u003cvar translate=\"no\"\u003eAPPLICATION-TYPES\u003c/var\u003e: (Optional) list of all existing application types from Hadoop YARN. For example, `SPARK, MAPREDUCE`.\n- \u003cvar translate=\"no\"\u003ePAGE-SIZE\u003c/var\u003e: (Optional) specify how much data is fetched from 3rd party services, like the Hadoop YARN API. The default value is `1000`, which represents 1000 entities per request.\n\nThis command extracts metadata from Cloudera to a\nfile named `dwh-migration-cloudera.zip` in the \u003cvar translate=\"no\"\u003eMIGRATION-BUCKET\u003c/var\u003e\ndirectory.\n\n### Apache Hive query logs\n\nPerform the steps in the Apache Hive section [Extract query logs with the `hadoop-migration-assessment` logging hook](/bigquery/docs/migration-assessment#apache-hive)\nto extract your Apache Hive query logs. You can then upload the logs\nto your Cloud Storage bucket containing your migration files.\n\nWhat's next\n-----------\n\nWith your extracted metadata from Hadoop, you can use\nthese metadata files to do the following:\n\n- [Migrate permissions from Hadoop](/bigquery/docs/hadoop-permissions-migration)\n- [Schedule a Hadoop transfer](/bigquery/docs/hadoop-transfer)"]]