You can use Dataplex Universal Catalog to build a data mesh architecture. This quickstart shows you how to use Dataplex Universal Catalog features, such as lakes, zones, and assets, to build a data mesh.
A data mesh is an organizational and technical approach that decentralizes data ownership among domain data owners. These owners provide their data as a product in a standard way and facilitate communication among different parts of the organization to distribute datasets across different locations. Learn more about data mesh architectures: https://services.google.com/fh/files/misc/build-a-modern-distributed-datamesh-with-google-cloud-whitepaper.pdf
Objectives
In this guide, you use Dataplex Universal Catalog entities to build a data mesh architecture:
Create a Dataplex Universal Catalog lake that acts as the domain for your data mesh.
Add zones to your lake that represent individual teams within each domain and provide managed data contracts.
Attach assets that map to data stored in Cloud Storage.
Costs
In this document, you use the following billable components of Google Cloud:
Dataplex Universal Catalog
Cloud Storage
To generate a cost estimate based on your projected usage, use the pricing calculator.
New Google Cloud users might be eligible for a free trial.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.
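Before you begin
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Verify that billing is enabled for your Google Cloud project.
Enable the Dataplex API.
Create a Dataproc Metastore service.
Note: You can attach each Dataproc Metastore service to only one Dataplex Universal Catalog lake. Enable gRPC for your metastore.
Create a Cloud Storage bucket
You need a Cloud Storage bucket to store the data assets of your data mesh. When you create the bucket, note the following:
Name your bucket.
For Location type, choose Region, and select us-central1 (Iowa) from the menu.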
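Create a domain
In the Google Cloud console, go to the Dataplex Universal Catalog page.
Navigate to the Manage view.
Click Create to create a new lake, which acts as your data mesh.
In the Display name field, enter My data mesh. Dataplex Universal Catalog automatically generates a lake ID.
For Region, select us-central1.
Note: The region you select for your data mesh determines the location of the data (not including attached assets) managed by Dataplex Universal Catalog. The same region is used when Dataplex Universal Catalog creates resources in other services, but not for data contained within assets.
Select the Dataproc Metastore service that you created and configured earlier as the associated metastore.
Click Create.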
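The console steps above can also be expressed as a request to the Dataplex REST API. The following is a minimal sketch, not an official sample: it only builds the URL and JSON body for a lake-creation call and does not send it. The project ID `my-project`, the lake ID `my-data-mesh`, and the metastore service name are hypothetical placeholders, and the field names (`displayName`, `metastore.service`, the `lakeId` query parameter) reflect the v1 API as best understood here, so check them against the API reference before relying on them.

```python
import json
from urllib.parse import urlencode

# Base URL of the Dataplex REST API (v1).
DATAPLEX_API = "https://dataplex.googleapis.com/v1"


def build_create_lake_request(project_id, region, lake_id, display_name, metastore_service):
    """Return the (url, body) pair for a POST that would create a lake."""
    parent = f"projects/{project_id}/locations/{region}"
    # The lake ID travels as a query parameter, not in the body.
    url = f"{DATAPLEX_API}/{parent}/lakes?{urlencode({'lakeId': lake_id})}"
    body = {
        "displayName": display_name,
        # Full resource name of the Dataproc Metastore service to associate.
        "metastore": {"service": metastore_service},
    }
    return url, body


# Hypothetical values that mirror the console walkthrough.
url, body = build_create_lake_request(
    project_id="my-project",
    region="us-central1",
    lake_id="my-data-mesh",
    display_name="My data mesh",
    metastore_service="projects/my-project/locations/us-central1/services/my-metastore",
)
print("POST", url)
print(json.dumps(body, indent=2))
```

Sending the request would additionally require an authenticated HTTP client, which is left out of this sketch.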
Create zones in your lake
After you create a domain by creating a Dataplex Universal Catalog lake, you can use zones to host managed data contracts and individual teams within the domain.
There are two types of zones:
Raw zones are typically used to store data in any format from external sources in Cloud Storage. Raw zones are useful for data that requires further processing before it's ready for consumption.
Curated zones are used for structured data in Cloud Storage that must conform to certain file formats and be organized in a Hive-compatible directory layout. Curated zones are most useful for data that's ready for consumption and analysis.
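To make "Hive-compatible directory layout" concrete: Hive-style partitioning encodes partition columns as `key=value` directory names. The sketch below builds such an object path for a date-partitioned table. The table name `orders`, the `year`/`month`/`day` partition keys, and the file name are illustrative assumptions, not values required by Dataplex Universal Catalog.

```python
from datetime import date


def hive_partition_path(table, day, filename):
    """Build a Hive-style object path: table/key=value/.../file."""
    return (
        f"{table}/year={day.year}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


# Hypothetical table and file names.
path = hive_partition_path("orders", date(2025, 9, 5), "part-0000.parquet")
print(path)  # orders/year=2025/month=09/day=05/part-0000.parquet
```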
Each domain (for example, sales, customers, products) should have at least a raw zone and a curated zone.
Additional zones are used to manage data contracts between teams or to provide a more granular breakdown for teams within a given domain, for example, inventory management within the products domain. Data owners can manage and access the data within their domain.
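The guideline that each domain should have at least a raw zone and a curated zone can be checked mechanically when you plan a mesh. The following sketch works on a plain in-memory description of domains and zones; the domain names, zone names, and the `RAW`/`CURATED` labels are hypothetical planning data, and nothing here calls Dataplex Universal Catalog.

```python
# Zone types that every domain is expected to have.
REQUIRED_ZONE_TYPES = {"RAW", "CURATED"}


def missing_zone_types(domains):
    """Map each domain name to the required zone types it lacks."""
    gaps = {}
    for domain, zones in domains.items():
        present = {zone_type for _, zone_type in zones}
        missing = REQUIRED_ZONE_TYPES - present
        if missing:
            gaps[domain] = sorted(missing)
    return gaps


# Hypothetical plan: each domain lists (zone name, zone type) pairs.
domains = {
    "sales": [("sales-landing", "RAW"), ("sales-reporting", "CURATED")],
    "products": [("inventory-management", "RAW")],
}
gaps = missing_zone_types(domains)
print(gaps)  # {'products': ['CURATED']}
```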
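In the Google Cloud console, navigate to the Dataplex Universal Catalog Manage view.
Click the name of the lake (My data mesh) that you want to add a zone to.
In the Zones tab, click Add Zone.
In the Display name field, enter My sub domain. Dataplex Universal Catalog automatically generates an ID for your zone.
Note: The zone name becomes the name of a BigQuery dataset. Therefore, all zones hosted in the same Google Cloud project must have a unique ID, even if they exist within different lakes.
For Type, select Raw zone.
Click Create.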
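As a rough equivalent of the zone steps above, the sketch below builds the URL and JSON body for a zone-creation request against the Dataplex REST API without sending it. The lake resource name and the zone ID `my-sub-domain` are hypothetical, and the body fields (`type`, `resourceSpec.locationType`) are this sketch's reading of the v1 API rather than a confirmed sample.

```python
import json
from urllib.parse import urlencode

DATAPLEX_API = "https://dataplex.googleapis.com/v1"


def build_create_zone_request(lake_name, zone_id, display_name, zone_type="RAW"):
    """Return the (url, body) pair for a POST that would create a zone."""
    if zone_type not in ("RAW", "CURATED"):
        raise ValueError(f"unsupported zone type: {zone_type}")
    url = f"{DATAPLEX_API}/{lake_name}/zones?{urlencode({'zoneId': zone_id})}"
    body = {
        "displayName": display_name,
        "type": zone_type,
        # A single-region zone, matching the us-central1 lake in this guide.
        "resourceSpec": {"locationType": "SINGLE_REGION"},
    }
    return url, body


# Hypothetical resource names that mirror the console walkthrough.
zone_url, zone_body = build_create_zone_request(
    lake_name="projects/my-project/locations/us-central1/lakes/my-data-mesh",
    zone_id="my-sub-domain",
    display_name="My sub domain",
)
print("POST", zone_url)
print(json.dumps(zone_body, indent=2))
```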
Attach assets to your zones
Attach data assets to your zone. A data asset is a storage resource that contains your data; it can be a Cloud Storage bucket or a BigQuery dataset. This is the final step in creating your data mesh architecture.
In the Dataplex Universal Catalog Manage view, click the lake you created (My data mesh).
In the Zones tab, click the zone (My sub domain) that you want to add the asset to.
In the Assets tab, click Add assets.
Click Add an Asset.
For Type, select Cloud Storage bucket.
In the Display name field, enter Data mesh asset. Dataplex Universal Catalog automatically generates an asset ID for you.
In the Bucket field, click Browse.
Select your bucket from the list.
Click Select.
Click Done, and then click Continue.
Click Continue to accept the default Advanced settings.
Click Submit.
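The asset steps above map to one more REST request. This sketch builds the URL and body for attaching a Cloud Storage bucket as an asset, again without sending anything. The zone resource name, the asset ID `data-mesh-asset`, and the bucket name `my-mesh-bucket` are hypothetical, and the `resourceSpec` shape (`type` of `STORAGE_BUCKET` plus a `projects/.../buckets/...` name) is an assumption to verify against the API reference.

```python
import json
from urllib.parse import urlencode

DATAPLEX_API = "https://dataplex.googleapis.com/v1"


def build_create_asset_request(zone_name, asset_id, display_name, project_id, bucket):
    """Return the (url, body) pair for a POST that would attach a bucket asset."""
    url = f"{DATAPLEX_API}/{zone_name}/assets?{urlencode({'assetId': asset_id})}"
    body = {
        "displayName": display_name,
        "resourceSpec": {
            "type": "STORAGE_BUCKET",
            # Resource name of the bucket that holds the data.
            "name": f"projects/{project_id}/buckets/{bucket}",
        },
    }
    return url, body


# Hypothetical resource names that mirror the console walkthrough.
asset_url, asset_body = build_create_asset_request(
    zone_name=(
        "projects/my-project/locations/us-central1/"
        "lakes/my-data-mesh/zones/my-sub-domain"
    ),
    asset_id="data-mesh-asset",
    display_name="Data mesh asset",
    project_id="my-project",
    bucket="my-mesh-bucket",
)
print("POST", asset_url)
print(json.dumps(asset_body, indent=2))
```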
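Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
Caution: Deleting a project deletes everything in the project. If you used an existing project for the tasks in this document, deleting it also deletes any other work you've done in the project. Custom project IDs are also lost.
In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.
Delete your data mesh architecture
In the Google Cloud console, navigate to the Dataplex Universal Catalog Manage view.
For the lake that you want to delete, click View more, and then click Delete.
To confirm the action, enter delete and click Delete lake.
What's next
Learn about data processing tasks.
Learn about discovering data.
Learn about using data quality tasks.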