Dimensionality reduction overview
Dimensionality reduction is the common term for a set of mathematical techniques
used to capture the shape and relationships of data in a high-dimensional space
and translate this information into a low-dimensional space.
Reducing dimensionality is important when you work with large datasets
that can contain thousands of features. In such a large data space, the wider range of distances
between data points can make model output harder to
interpret. For example, it becomes difficult to tell which data points
lie closer together and therefore represent more similar data.
Dimensionality reduction helps you reduce the number of features while retaining the most important characteristics of the dataset. Reducing the number of
features also helps reduce the training time of any model that uses the data as
input.
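To make the idea of keeping the most important characteristics with fewer features concrete, here is a minimal sketch of principal component analysis in plain Python. It is an illustration only: the sample rows are hypothetical, power iteration is a technique chosen here for brevity, and nothing in it describes how BigQuery ML implements its PCA model internally.

```python
import math

def mean_center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov with power iteration."""
    dims = len(cov)
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical dataset: three strongly correlated features per row.
rows = [
    [1.0, 2.0, 3.1],
    [2.0, 4.1, 5.9],
    [3.0, 6.0, 9.2],
    [4.0, 7.9, 12.0],
    [5.0, 10.1, 14.8],
]

centered = mean_center(rows)
cov = covariance(centered)
component = first_component(cov)

# Project every row onto the component: three features become one.
reduced = [sum(a * b for a, b in zip(row, component)) for row in centered]

# Share of the total variance that the single component retains.
dims = len(cov)
explained = (
    sum(component[i] * cov[i][j] * component[j]
        for i in range(dims) for j in range(dims))
    / sum(cov[i][i] for i in range(dims))
)
print(reduced)
print(explained)
```

Because the three sample features move together, a single derived feature retains almost all of the variance. Real datasets usually need more than one component to do so.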
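BigQuery ML offers the following models for dimensionality reduction:
Principal component analysis (PCA)
Autoencoder
You can use PCA and autoencoder models with the ML.PREDICT or ML.GENERATE_EMBEDDING functions to embed data into a lower-dimensional space, and with the ML.DETECT_ANOMALIES function to perform anomaly detection.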
You can use the output from dimensionality reduction models for tasks such as
the following:
Similarity search: Find data points that are similar to each other based on their embeddings. This is useful for finding related products,
recommending similar content, or identifying duplicate or anomalous items.
Clustering: Use embeddings as input features for k-means models to group data points based on their similarities.
This can help you discover hidden patterns and insights in your data.
Machine learning: Use embeddings as input features for
classification or regression models.
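As a rough illustration of the similarity search task listed above, the following sketch ranks items by the cosine similarity of their embeddings. The item names and two-dimensional vectors are made up for the example, and cosine similarity is only one possible distance measure. In practice, the embeddings would come from a trained dimensionality reduction model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_id, embeddings, top_k=2):
    """Return the top_k other items ranked by similarity to query_id."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical low-dimensional embeddings keyed by item name.
embeddings = {
    "red_shirt": [0.9, 0.1],
    "blue_shirt": [0.8, 0.2],
    "coffee_mug": [0.1, 0.9],
    "tea_cup": [0.2, 0.8],
}

neighbors = most_similar("red_shirt", embeddings, top_k=2)
for item, score in neighbors:
    print(item, round(score, 3))
```

The same ranking logic applies to recommending similar content. For duplicate detection, you would look for pairs whose similarity is close to 1.0.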
Recommended knowledge
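By using the default settings in the CREATE MODEL statements and the inference functions, you can create and use a dimensionality reduction model even without much ML knowledge. However, basic knowledge of ML development helps you optimize both your data and your model to deliver better results. We recommend the following resources for developing
familiarity with ML techniques and processes:
Machine Learning Crash Course (https://developers.google.com/machine-learning/crash-course)
Intro to Machine Learning (https://www.kaggle.com/learn/intro-to-machine-learning)
Intermediate Machine Learning (https://www.kaggle.com/learn/intermediate-machine-learning)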