-
annotation set
- An annotation set contains the labels associated with the uploaded source files within a dataset. An annotation set is associated with both a data type and an objective (for example, video/classification).
-
API endpoints
- API endpoints are a service configuration aspect that specifies the network addresses of a service, also known as service endpoints (for example, aiplatform.googleapis.com).
-
artifact
- An artifact is a discrete entity or piece of data produced and consumed by a machine learning workflow. Examples of artifacts include datasets, models, input files, and training logs.
-
Artifact Registry
- Artifact Registry is a universal artifact management service. It is the recommended service for managing containers and other artifacts on Google Cloud. For more information, see Artifact Registry.
-
batch prediction
- Batch prediction takes a group of prediction requests and outputs the results in one file. For more information, see Getting batch predictions.
-
bounding box
- A bounding box for an object in a video frame can be specified in either of two ways: (i) using two vertices (sets of x,y coordinates) that are diagonally opposite points of the rectangle, for example: x_relative_min,y_relative_min,,,x_relative_max,y_relative_max,,; or (ii) using all four vertices. For more information, see Prepare video data.
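As an illustration of the two-vertex form, the following sketch assembles a simplified annotation row with the four empty fields standing in for the omitted second and fourth vertices. The bucket URI and label are hypothetical, and a real import file may carry additional fields such as time offsets.

```python
def bounding_box_row(uri, label, x_min, y_min, x_max, y_max):
    """Build a simplified two-vertex bounding-box row: the empty
    fields stand in for the omitted second and fourth vertices."""
    return f"{uri},{label},{x_min},{y_min},,,{x_max},{y_max},,"

# Coordinates are relative (0.0-1.0) to the frame's width and height.
row = bounding_box_row("gs://my-bucket/video1.mp4", "car", 0.1, 0.2, 0.9, 0.8)
```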
-
classification metrics
- Supported classification metrics in the Vertex AI SDK for Python are confusion matrix and ROC curve.
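To make the confusion matrix concrete, here is a minimal pure-Python sketch (not the Vertex AI SDK itself) that tallies actual versus predicted labels:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows index the actual label, columns the predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

cm = confusion_matrix(["cat", "dog", "cat", "dog"],
                      ["cat", "cat", "cat", "dog"],
                      labels=["cat", "dog"])
# Both "cat" examples were predicted correctly; one "dog" was mislabeled "cat".
```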
-
context
- A context is used to group artifacts and executions together under a single, queryable, and typed category. Contexts can be used to represent sets of metadata. An example of a Context would be a run of a machine learning pipeline.
-
Customer-managed encryption keys (CMEK)
- Customer-managed encryption keys (CMEK) are integrations that let customers encrypt data in existing Google services using a key they manage in Cloud KMS. The key in Cloud KMS is the key encryption key that protects their data.
-
dataset
- A dataset is broadly defined as a collection of structured or unstructured data records. For more information, see Create a dataset.
-
event
- An event describes the relationship between artifacts and executions. Each artifact can be produced by an execution and consumed by other executions. Events help you determine the provenance of artifacts in your ML workflows by chaining artifacts and executions together.
-
execution
- An execution is a record of an individual machine learning workflow step, typically annotated with its runtime parameters. Examples of executions include data ingestion, data validation, model training, model evaluation, and model deployment.
-
experiment
- An experiment is a context that can contain a set of experiment runs, in addition to pipeline runs, where a user can investigate, as a group, different configurations such as input artifacts or hyperparameters.
-
experiment run
- An experiment run can contain user-defined metrics, parameters, executions, artifacts, and Vertex resources (for example, PipelineJob).
-
exploratory data analysis
- In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
-
Google Cloud pipeline components SDK
- The Google Cloud pipeline components (GCPC) SDK provides a set of prebuilt Kubeflow Pipelines components that are production quality, performant, and easy to use. You can use Google Cloud Pipeline Components to define and run ML pipelines in Vertex AI Pipelines and in other ML pipeline execution backends conformant with Kubeflow Pipelines.
-
histogram
- A graphical display of the variation in a set of data using bars. A histogram visualizes patterns that are difficult to detect in a simple table of numbers.
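The binning behind a histogram can be sketched in a few lines of plain Python (an illustration, not a plotting library):

```python
def histogram(values, bins, lo, hi):
    """Count values into equal-width bins over the half-open range [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) // width)] += 1
    return counts

# Four bins of width 2 over [0, 8): the cluster at 1-3 is easy to spot.
counts = histogram([1, 2, 2, 3, 7, 8], bins=4, lo=0, hi=8)
```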
-
index
- A collection of vectors deployed together for similarity search. Vectors can be added to an index or removed from an index. Similarity search queries are issued to a specific index and will search over the vectors in that index.
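A brute-force sketch of a similarity-search query against a small in-memory index (illustrative only; production indexes use approximate nearest-neighbor structures, and the ids and vectors below are made up):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query(index, vector, k):
    """Return the ids of the k vectors in the index closest to `vector`."""
    ranked = sorted(index, key=lambda item: euclidean(item[1], vector))
    return [vec_id for vec_id, _ in ranked[:k]]

index = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [5.0, 5.0])]
nearest = query(index, [0.9, 1.1], k=2)  # "b" is closest, then "a"
```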
-
Machine Learning Metadata
- ML Metadata (MLMD) is a library for recording and retrieving metadata associated with ML developer and data scientist workflows. MLMD is an integral part of TensorFlow Extended (TFX), but is designed so that it can be used independently. As part of the broader TFX platform, most users only interact with MLMD when examining the results of pipeline components, for example in notebooks or in TensorBoard.
-
managed dataset
- A dataset object created in and hosted by Vertex AI.
-
metadata resources
- Vertex ML Metadata exposes a graph-like data model for representing metadata produced and consumed from ML workflows. The primary concepts are artifacts, executions, events, and contexts.
-
MetadataSchema
- A MetadataSchema describes the schema for particular types of artifacts, executions, or contexts. MetadataSchemas are used to validate the key-value pairs during creation of the corresponding Metadata resources. Schema validation is only performed on matching fields between the resource and the MetadataSchema. Type schemas are represented using OpenAPI Schema Objects, which should be described using YAML.
-
MetadataStore
- A MetadataStore is the top-level container for metadata resources. MetadataStore is regionalized and associated with a specific Google Cloud project. Typically, an organization uses one shared MetadataStore for metadata resources within each project.
-
ML pipelines
- ML pipelines are portable and scalable ML workflows that are based on containers.
-
model
- Any model, whether pre-trained or not.
-
model resource name
- The resource name for a model is as follows: projects/<PROJECT_ID>/locations/<LOCATION_ID>/models/<MODEL_ID>. You can find the model's ID in the Cloud console on the 'Model Registry' page.
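Assembling the fully qualified name is a simple string template; the project, location, and model IDs below are placeholders:

```python
def model_resource_name(project_id, location_id, model_id):
    """Build the fully qualified resource name for a model."""
    return f"projects/{project_id}/locations/{location_id}/models/{model_id}"

name = model_resource_name("my-project", "us-central1", "1234567890")
```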
-
parameters
- Parameters are keyed input values that configure a run, regulate the behavior of the run, and affect the results of the run. Examples include learning rate, dropout rate, and number of training steps.
-
pipeline
- ML pipelines are portable and scalable ML workflows that are based on containers.
-
pipeline component
- A self-contained set of code that performs one step in a pipeline's workflow, such as data preprocessing, data transformation, and training a model.
-
pipeline job
- A resource in the Vertex AI API corresponding to Vertex Pipeline Jobs. Users create a PipelineJob when they want to run an ML Pipeline on Vertex AI.
-
pipeline run
- One or more Vertex PipelineJobs can be associated with an experiment where each PipelineJob is represented as a single run. In this context, the parameters of the run are inferred by the parameters of the PipelineJob. The metrics are inferred from the system.Metric artifacts produced by that PipelineJob. The artifacts of the run are inferred from artifacts produced by that PipelineJob.
-
pipeline template
- An ML workflow definition that a single user or multiple users can reuse to create multiple pipeline runs.
-
recall
- The percentage of true nearest neighbors returned by the index. For example, if a nearest neighbor query for 20 nearest neighbors returned 19 of the "ground truth" nearest neighbors, the recall is 19/20 × 100 = 95%.
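The calculation in the example can be sketched directly (the neighbor ids here are arbitrary):

```python
def recall(returned, ground_truth):
    """Fraction of true nearest neighbors present in the returned set."""
    return len(set(returned) & set(ground_truth)) / len(ground_truth)

# 19 of the 20 ground-truth neighbors were returned -> 0.95
r = recall(returned=list(range(19)) + [99], ground_truth=list(range(20)))
```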
-
restricts
- Functionality to "restrict" searches to a subset of the index by using Boolean rules.
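A toy sketch of Boolean restrict rules over tagged index entries (this is an illustration of the idea, not the Vertex AI restrict API; the tags and ids are hypothetical):

```python
def restrict(index, allow=None, deny=None):
    """Keep only entries whose tags contain every allow tag and no deny tag."""
    allow, deny = set(allow or []), set(deny or [])
    return [item for item in index
            if allow <= item["tags"] and not (deny & item["tags"])]

index = [{"id": "a", "tags": {"red", "shoe"}},
         {"id": "b", "tags": {"blue", "shoe"}}]
subset = restrict(index, allow=["shoe"], deny=["blue"])  # only "a" survives
```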
-
service account
- In Google Cloud, a service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person. Applications use service accounts to make authorized API calls.
-
summary metrics
- Summary metrics are a single value for each metric key in an experiment run. For example, the test accuracy of an experiment is the accuracy calculated against a test dataset at the end of training that can be captured as a single value summary metric.
-
TensorBoard
- TensorBoard is a suite of web applications for visualizing and understanding TensorFlow runs and models. For more information, see Vertex AI TensorBoard.
-
TensorBoard instance
- A TensorBoard instance is a regionalized resource that stores Vertex AI TensorBoard experiments. You can create multiple TensorBoard instances in a project. This is the same as the TensorBoard resource in the API.
-
TensorFlow Extended (TFX)
- TensorFlow Extended (TFX) is an end-to-end platform for deploying production machine learning pipelines, based on the TensorFlow platform.
-
time offset
- Time offset is relative to the beginning of a video.
-
time segment
- A time segment is identified by beginning and ending time offsets.
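Representing a segment as a pair of offsets makes operations such as overlap checks straightforward; a minimal sketch (the clip values are made up):

```python
def overlaps(a, b):
    """True if two time segments (start_offset, end_offset in seconds,
    relative to the beginning of the video) share any time span."""
    a_start, a_end = a
    b_start, b_end = b
    return a_start < b_end and b_start < a_end

clip = (12.5, 30.0)
shared = overlaps(clip, (25.0, 40.0))  # the segments share 25.0-30.0
```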
-
time series metrics
- Time series metrics are longitudinal metric values where each value represents a step in the training routine portion of a run. Time series metrics are stored in Vertex AI TensorBoard. Vertex AI Experiments stores a reference to the Vertex TensorBoard resource.
-
unmanaged artifacts
- An artifact that exists outside of the Vertex AI context.
-
Vertex AI Experiments
- Vertex AI Experiments enables users to track (i) steps of an experiment run, for example, preprocessing, training, (ii) inputs, for example, algorithm, parameters, datasets, (iii) outputs of those steps, for example, models, checkpoints, metrics.
-
Vertex data type
- Vertex AI data types are "image," "text," "tabular," and "video".
-
video segment
- A video segment is identified by the beginning and ending time offsets of a video.
-
virtual private cloud (VPC)
- Virtual private cloud is an on-demand, configurable pool of shared computing resources that's allocated in a public cloud environment and provides a level of isolation between different organizations using those resources.