Vertex AI glossary

  • annotation set
    • An annotation set contains the labels associated with the uploaded source files within a dataset. An annotation set is associated with both a data type and an objective (for example, video/classification).
  • API endpoints
    • API endpoints is a service configuration aspect that specifies the network addresses, also known as service endpoints (for example, aiplatform.googleapis.com).
  • Approximate Nearest Neighbor (ANN)
    • The Approximate Nearest Neighbor (ANN) service is a high-scale, low-latency solution for finding similar vectors (or, more specifically, embeddings) in a large corpus.
  • artifact
    • An artifact is a discrete entity or piece of data produced and consumed by a machine learning workflow. Examples of artifacts include datasets, models, input files, and training logs.
  • Artifact Registry
    • Artifact Registry is a universal artifact management service. It is the recommended service for managing containers and other artifacts on Google Cloud. For more information, see Artifact Registry.
  • batch prediction
    • Batch prediction takes a group of prediction requests and outputs the results in one file. For more information, see Getting batch predictions.
  • bounding box
    • A bounding box for an object in a video frame can be specified in either of two ways: (i) using two vertices, each a set of x,y coordinates, that are diagonally opposite points of the rectangle, for example x_relative_min,y_relative_min,,,x_relative_max,y_relative_max,, (the empty fields are placeholders for the unused vertices); or (ii) using all four vertices. For more information, see Prepare video data.
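The two-vertex CSV form above can be sketched in code. This is an illustrative helper, not official tooling; the function name and pixel values are hypothetical:

```python
# Sketch: convert absolute pixel coordinates of a bounding box into the
# relative two-vertex CSV fragment described above. The empty fields stand
# in for the two unused vertices.

def bounding_box_row(x_min, y_min, x_max, y_max, frame_width, frame_height):
    """Return the fragment 'x_rel_min,y_rel_min,,,x_rel_max,y_rel_max,,'."""
    x_rel_min = x_min / frame_width
    y_rel_min = y_min / frame_height
    x_rel_max = x_max / frame_width
    y_rel_max = y_max / frame_height
    return f"{x_rel_min},{y_rel_min},,,{x_rel_max},{y_rel_max},,"

row = bounding_box_row(96, 108, 512, 432, 1280, 720)
print(row)  # 0.075,0.15,,,0.4,0.6,,
```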
  • classification metrics
    • Supported classification metrics in the Vertex AI SDK for Python are confusion matrix and ROC curve.
  • context
    • A context is used to group artifacts and executions together under a single, queryable, and typed category. Contexts can be used to represent sets of metadata. An example of a Context would be a run of a machine learning pipeline.
  • Customer-managed encryption keys (CMEK)
    • Customer-managed encryption keys (CMEK) are integrations that allow customers to encrypt data in existing Google services using a key they manage in Cloud KMS. The key in Cloud KMS is the key encryption key protecting their data.
  • dataset
    • A dataset is broadly defined as a collection of structured or unstructured data records. For more information, see Create a dataset.
  • embedding
    • An embedding is a type of vector that's used to represent data in a way that captures its semantic meaning. Embeddings are typically created using machine learning techniques, and they are often used in natural language processing (NLP) and other machine learning applications.
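Because embeddings capture semantic meaning, similar items end up with similar vectors, which is typically measured with cosine similarity. A minimal sketch, with made-up three-dimensional vectors standing in for real embedding model output:

```python
# Sketch: comparing embeddings by cosine similarity. The vectors here are
# hypothetical; in practice they would come from an embedding model.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat = [0.9, 0.1, 0.3]
kitten = [0.8, 0.2, 0.35]
car = [0.1, 0.9, 0.2]

# Semantically similar items should score closer to 1.0.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```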
  • event
    • An event describes the relationship between artifacts and executions. Each artifact can be produced by an execution and consumed by other executions. Events help you to determine the provenance of artifacts in their ML workflows by chaining together artifacts and executions.
  • execution
    • An execution is a record of an individual machine learning workflow step, typically annotated with its runtime parameters. Examples of executions include data ingestion, data validation, model training, model evaluation, and model deployment.
  • experiment
    • An experiment is a context that can contain a set of experiment runs, in addition to pipeline runs, where a user can investigate, as a group, different configurations such as input artifacts or hyperparameters.
  • experiment run
    • An experiment run can contain user-defined metrics, parameters, executions, artifacts, and Vertex resources (for example, PipelineJob).
  • exploratory data analysis
    • In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
  • feature
    • In machine learning (ML), a feature is a characteristic or attribute of an instance or entity that's used as an input to train an ML model or to make predictions.
  • feature engineering
    • Feature engineering is the process of transforming raw machine learning (ML) data into features that can be used to train ML models or to make predictions.
  • feature value
    • A feature value corresponds to the actual and measurable value of a feature (attribute) of an instance or entity. A collection of feature values for a unique entity represents the feature record corresponding to that entity.
  • feature serving
    • Feature serving is the process of exporting or fetching feature values for training or inference. In Vertex AI, there are two types of feature serving—online serving and offline serving. Online serving retrieves the latest feature values of a subset of the feature data source for online predictions. Offline or batch serving exports high volumes of feature data for offline processing, such as ML model training.
  • feature timestamp
    • A feature timestamp indicates the time at which the set of feature values in a specific feature record for an entity was generated.
  • feature record
    • A feature record is an aggregation of all feature values that describe the attributes of a unique entity at a specific point in time.
  • feature registry
    • A feature registry is a central interface for recording feature data sources that you want to serve for online predictions.
  • feature group
    • A feature group is a feature registry resource that corresponds to a BigQuery source table or view containing feature data. A feature group can contain features and can be thought of as a logical grouping of feature columns in the data source.
  • feature view
    • A feature view is a logical collection of features materialized from a BigQuery data source to an online store instance. A feature view stores the customer's feature data, which is refreshed periodically from the BigQuery source. A feature view is associated with the feature data storage either directly or through associations to feature registry resources.
  • Google Cloud pipeline components SDK
    • The Google Cloud pipeline components (GCPC) SDK provides a set of prebuilt Kubeflow Pipelines components that are production quality, performant, and easy to use. You can use Google Cloud pipeline components to define and run ML pipelines in Vertex AI Pipelines and other ML pipeline execution backends conformant with Kubeflow Pipelines.
  • histogram
    • A graphical display of the variation in a set of data using bars. A histogram visualizes patterns that are difficult to detect in a simple table of numbers.
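The binning behind a histogram can be sketched in a few lines. This is an illustrative text histogram (the values and bin width are made up), not a charting recipe:

```python
# Sketch: a minimal text histogram, binning values and drawing bars, to
# illustrate the pattern a histogram surfaces that a raw list of numbers hides.
from collections import Counter

values = [1.2, 1.9, 2.1, 2.4, 2.6, 2.8, 3.1, 3.3, 3.5, 4.7]
bin_width = 1.0

counts = Counter(int(v // bin_width) for v in values)
for bin_start in sorted(counts):
    bar = "#" * counts[bin_start]
    print(f"[{bin_start:.0f}, {bin_start + bin_width:.0f}): {bar}")
```

Running this prints one bar per bin, making the cluster around 2–3 obvious at a glance.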
  • index
    • A collection of vectors deployed together for similarity search. Vectors can be added to an index or removed from an index. Similarity search queries are issued to a specific index and will search over the vectors in that index.
  • ground truth
    • A term that refers to verifying a machine learning model's output for accuracy against the real world, such as with a ground truth dataset.
  • Machine Learning Metadata
    • ML Metadata (MLMD) is a library for recording and retrieving metadata associated with ML developer and data scientist workflows. MLMD is an integral part of TensorFlow Extended (TFX), but is designed so that it can be used independently. As part of the broader TFX platform, most users only interact with MLMD when examining the results of pipeline components, for example in notebooks or in TensorBoard.
  • managed dataset
    • A dataset object created in and hosted by Vertex AI.
  • metadata resources
    • Vertex ML Metadata exposes a graph-like data model for representing metadata produced and consumed from ML workflows. The primary concepts are artifacts, executions, events, and contexts.
  • MetadataSchema
    • A MetadataSchema describes the schema for particular types of artifacts, executions, or contexts. MetadataSchemas are used to validate the key-value pairs during creation of the corresponding Metadata resources. Schema validation is only performed on matching fields between the resource and the MetadataSchema. Type schemas are represented using OpenAPI Schema Objects, which should be described using YAML.
  • MetadataStore
    • A MetadataStore is the top-level container for metadata resources. MetadataStore is regionalized and associated with a specific Google Cloud project. Typically, an organization uses one shared MetadataStore for metadata resources within each project.
  • ML pipelines
    • ML pipelines are portable and scalable ML workflows that are based on containers.
  • model
    • Any model, whether pre-trained or not.
  • model resource name
    • The resource name for a model is as follows: projects/<PROJECT_ID>/locations/<LOCATION_ID>/models/<MODEL_ID>. You can find the model's ID in the Google Cloud console on the Model Registry page.
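A quick sketch of assembling that resource name from its parts; the IDs below are placeholders for illustration:

```python
# Sketch: building a model resource name from its components.
def model_resource_name(project_id: str, location_id: str, model_id: str) -> str:
    return f"projects/{project_id}/locations/{location_id}/models/{model_id}"

name = model_resource_name("my-project", "us-central1", "1234567890")
print(name)  # projects/my-project/locations/us-central1/models/1234567890
```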
  • offline store
    • The offline store is a storage facility that stores recent and historical feature data, typically used for training ML models. An offline store also contains the latest feature values, which you can serve for online predictions.
  • online store
    • In feature management, an online store is a storage facility for the latest feature values to be served for online predictions.
  • parameters
    • Parameters are keyed input values that configure a run, regulate the behavior of the run, and affect the results of the run. Examples include learning rate, dropout rate, and number of training steps.
  • pipeline
    • ML pipelines are portable and scalable ML workflows that are based on containers.
  • pipeline component
    • A self-contained set of code that performs one step in a pipeline's workflow, such as data preprocessing, data transformation, and training a model.
  • pipeline job
    • A resource in the Vertex AI API corresponding to Vertex Pipeline Jobs. Users create a PipelineJob when they want to run an ML Pipeline on Vertex AI.
  • pipeline run
    • One or more Vertex PipelineJobs can be associated with an experiment where each PipelineJob is represented as a single run. In this context, the parameters of the run are inferred by the parameters of the PipelineJob. The metrics are inferred from the system.Metric artifacts produced by that PipelineJob. The artifacts of the run are inferred from artifacts produced by that PipelineJob.
  • pipeline template
    • An ML workflow definition that a single user or multiple users can reuse to create multiple pipeline runs.
  • recall
    • The percentage of true nearest neighbors returned by the index. For example, if a nearest neighbor query for 20 nearest neighbors returned 19 of the "ground truth" nearest neighbors, the recall is 19/20 × 100 = 95%.
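The recall calculation in the example can be sketched directly; the neighbor IDs below are made up for illustration:

```python
# Sketch: computing recall for an ANN query by comparing the returned
# neighbors against the exact ("ground truth") nearest neighbors.
def ann_recall(returned_ids, ground_truth_ids):
    hits = len(set(returned_ids) & set(ground_truth_ids))
    return hits / len(ground_truth_ids)

ground_truth = list(range(20))       # the 20 true nearest neighbors
returned = list(range(19)) + [99]    # the ANN index found 19 of them
print(ann_recall(returned, ground_truth))  # 0.95
```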
  • restricts
    • Functionality to "restrict" searches to a subset of the index by using Boolean rules. Restrict is also referred to as "filtering". With Vector Search, you can use numeric filtering and text attribute filtering.
  • service account
    • In Google Cloud, a service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person. Applications use service accounts to make authorized API calls.
  • summary metrics
    • Summary metrics are a single value for each metric key in an experiment run. For example, the test accuracy of an experiment is the accuracy calculated against a test dataset at the end of training that can be captured as a single value summary metric.
  • TensorBoard
    • TensorBoard is a suite of web applications for visualizing and understanding TensorFlow runs and models. For more information, see TensorBoard.
  • TensorBoard Resource name
    • A TensorBoard Resource name is used to fully identify a Vertex AI TensorBoard instance. The format is as follows: projects/PROJECT_ID_OR_NUMBER/locations/REGION/tensorboards/TENSORBOARD_INSTANCE_ID.
  • TensorBoard instance
    • A TensorBoard instance is a regionalized resource that stores Vertex AI TensorBoard Experiments associated with a Project. You can create multiple TensorBoard instances in a project if, for example, you want multiple CMEK enabled instances. This is the same as the TensorBoard resource in the API.
  • TensorFlow Extended (TFX)
    • TensorFlow Extended (TFX) is an end-to-end platform for deploying production machine learning pipelines, based on the TensorFlow platform.
  • time offset
    • A time offset is measured relative to the beginning of a video.
  • time segment
    • A time segment is identified by beginning and ending time offsets.
  • time series metrics
    • Time series metrics are longitudinal metric values where each value represents a step in the training routine portion of a run. Time series metrics are stored in Vertex AI TensorBoard. Vertex AI Experiments stores a reference to the Vertex TensorBoard resource.
  • token
    • A token in a language model is the atomic unit that the model trains on and makes predictions on, namely words, morphemes, and characters. In domains outside of language models, tokens can represent other kinds of atomic units. For example, in computer vision, a token might be a subset of an image.
  • unmanaged artifacts
    • An artifact that exists outside of the Vertex AI context.
  • vector
    • A vector is a list of float values that has magnitude and direction. It can be used to represent any kind of data, such as numbers, points in space, or directions.
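The magnitude and direction mentioned above can be made concrete with a short sketch; the example vector is arbitrary:

```python
# Sketch: a vector's magnitude (Euclidean norm) and its direction
# (the unit vector pointing the same way).
import math

def magnitude(v):
    return math.sqrt(sum(x * x for x in v))

def direction(v):
    m = magnitude(v)
    return [x / m for x in v]

v = [3.0, 4.0]
print(magnitude(v))   # 5.0
print(direction(v))   # [0.6, 0.8]
```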
  • Vertex AI Experiments
    • Vertex AI Experiments enables users to track (i) the steps of an experiment run (for example, preprocessing and training), (ii) inputs (for example, algorithm, parameters, and datasets), and (iii) outputs of those steps (for example, models, checkpoints, and metrics).
  • Vertex AI TensorBoard Experiment
    • The data associated with a Vertex AI TensorBoard Experiment can be viewed in the TensorBoard web application (scalars, histograms, distributions, and so on). Time series scalars can be viewed in the Google Cloud console. For more information, see Compare and analyze runs.
  • Vertex data type
    • Vertex AI data types are "image," "text," "tabular," and "video".
  • video segment
    • A video segment is identified by beginning and ending time offsets of a video.
  • virtual private cloud (VPC)
    • Virtual private cloud is an on-demand, configurable pool of shared computing resources that's allocated in a public cloud environment and provides a level of isolation between different organizations using those resources.