Classes for working with vision models.
Classes
ControlImageConfig
ControlImageConfig(
    control_type: typing.Literal[
        "CONTROL_TYPE_DEFAULT",
        "CONTROL_TYPE_SCRIBBLE",
        "CONTROL_TYPE_FACE_MESH",
        "CONTROL_TYPE_CANNY",
    ],
    enable_control_image_computation: typing.Optional[bool] = False,
)
Control image config.
ControlReferenceImage
ControlReferenceImage(
    reference_id,
    image: typing.Optional[
        typing.Union[bytes, vertexai.vision_models.Image, str]
    ] = None,
    control_type: typing.Optional[
        typing.Literal["default", "scribble", "face_mesh", "canny"]
    ] = None,
    enable_control_image_computation: typing.Optional[bool] = False,
)
Control reference image.
This encapsulates the control reference image type.
EntityLabel
EntityLabel(
    label: typing.Optional[str] = None, score: typing.Optional[float] = None
)
Entity label holding a text label and any associated confidence score.
GeneratedImage
GeneratedImage(
    image_bytes: typing.Optional[bytes],
    generation_parameters: typing.Dict[str, typing.Any],
    gcs_uri: typing.Optional[str] = None,
)
Generated image.
GeneratedMask
GeneratedMask(
    image_bytes: typing.Optional[bytes],
    gcs_uri: typing.Optional[str] = None,
    labels: typing.Optional[
        typing.List[vertexai.preview.vision_models.EntityLabel]
    ] = None,
)
Generated image mask.
Image
Image(
    image_bytes: typing.Optional[bytes] = None, gcs_uri: typing.Optional[str] = None
)
Image.
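Example (a minimal construction sketch; the Cloud Storage path is a placeholder)::

    # Construct from raw bytes read from a local file.
    with open("image.png", "rb") as f:
        image = Image(image_bytes=f.read())

    # Or reference an image already stored in Cloud Storage.
    image = Image(gcs_uri="gs://your-bucket/image.png")

    # Convenience constructor used in the examples below.
    image = Image.load_from_file("image.png")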
ImageCaptioningModel
ImageCaptioningModel(model_id: str, endpoint_name: typing.Optional[str] = None)
Generates captions from an image.
Examples::

    model = ImageCaptioningModel.from_pretrained("imagetext@001")
    image = Image.load_from_file("image.png")
    captions = model.get_captions(
        image=image,
        # Optional:
        number_of_results=1,
        language="en",
    )
ImageGenerationModel
ImageGenerationModel(model_id: str, endpoint_name: typing.Optional[str] = None)
Generates images from a text prompt.
Examples::

    model = ImageGenerationModel.from_pretrained("imagegeneration@002")
    response = model.generate_images(
        prompt="Astronaut riding a horse",
        # Optional:
        number_of_images=1,
        seed=0,
    )
    response[0].show()
    response[0].save("image1.png")
ImageGenerationResponse
ImageGenerationResponse(images: typing.List[GeneratedImage])
Image generation response.
ImageQnAModel
ImageQnAModel(model_id: str, endpoint_name: typing.Optional[str] = None)
Answers questions about an image.
Examples::

    model = ImageQnAModel.from_pretrained("imagetext@001")
    image = Image.load_from_file("image.png")
    answers = model.ask_question(
        image=image,
        question="What color is the car in this image?",
        # Optional:
        number_of_results=1,
    )
ImageSegmentationModel
ImageSegmentationModel(model_id: str, endpoint_name: typing.Optional[str] = None)
Segments an image.
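Example (a sketch only; the model ID, the segment_image method, and its parameters are assumptions, not confirmed by this reference)::

    # Hypothetical usage; verify the model ID and method signature in the SDK.
    model = ImageSegmentationModel.from_pretrained("image-segmentation-001")
    image = Image.load_from_file("image.png")
    response = model.segment_image(
        base_image=image,        # assumed parameter name
        prompt="the car",        # assumed parameter name
    )
    for mask in response.masks:  # masks field per ImageSegmentationResponse below
        print(mask.labels)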
ImageSegmentationResponse
ImageSegmentationResponse(
    _prediction_response: typing.Any,
    masks: typing.List[vertexai.preview.vision_models.GeneratedMask],
)
Image segmentation response.
ImageTextModel
ImageTextModel(model_id: str, endpoint_name: typing.Optional[str] = None)
Generates text from images.
Examples::

    model = ImageTextModel.from_pretrained("imagetext@001")
    image = Image.load_from_file("image.png")
    captions = model.get_captions(
        image=image,
        # Optional:
        number_of_results=1,
        language="en",
    )
    answers = model.ask_question(
        image=image,
        question="What color is the car in this image?",
        # Optional:
        number_of_results=1,
    )
MaskImageConfig
MaskImageConfig(
    mask_mode: typing.Literal[
        "MASK_MODE_DEFAULT",
        "MASK_MODE_USER_PROVIDED",
        "MASK_MODE_BACKGROUND",
        "MASK_MODE_FOREGROUND",
        "MASK_MODE_SEMANTIC",
    ],
    segmentation_classes: typing.Optional[typing.List[int]] = None,
    dilation: typing.Optional[float] = None,
)
Mask image config.
MaskReferenceImage
MaskReferenceImage(
    reference_id,
    image: typing.Optional[
        typing.Union[bytes, vertexai.vision_models.Image, str]
    ] = None,
    mask_mode: typing.Optional[
        typing.Literal[
            "default", "user_provided", "background", "foreground", "semantic"
        ]
    ] = None,
    dilation: typing.Optional[float] = None,
    segmentation_classes: typing.Optional[typing.List[int]] = None,
)
Mask reference image. This encapsulates the mask reference image type.
MultiModalEmbeddingModel
MultiModalEmbeddingModel(model_id: str, endpoint_name: typing.Optional[str] = None)
Generates embedding vectors from images and videos.
Examples::

    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    image = Image.load_from_file("image.png")
    video = Video.load_from_file("video.mp4")
    embeddings = model.get_embeddings(
        image=image,
        video=video,
        contextual_text="Hello world",
    )
    image_embedding = embeddings.image_embedding
    video_embeddings = embeddings.video_embeddings
    text_embedding = embeddings.text_embedding
MultiModalEmbeddingResponse
MultiModalEmbeddingResponse(
    _prediction_response: typing.Any,
    image_embedding: typing.Optional[typing.List[float]] = None,
    video_embeddings: typing.Optional[
        typing.List[vertexai.vision_models.VideoEmbedding]
    ] = None,
    text_embedding: typing.Optional[typing.List[float]] = None,
)
The multimodal embedding response.
RawReferenceImage
RawReferenceImage(
    reference_id,
    image: typing.Optional[
        typing.Union[bytes, vertexai.vision_models.Image, str]
    ] = None,
)
Raw reference image.
This encapsulates the raw reference image type.
ReferenceImage
ReferenceImage(
    reference_id,
    image: typing.Optional[
        typing.Union[bytes, vertexai.vision_models.Image, str]
    ] = None,
)
Reference image.
This is a new base API object for Imagen 3.0 capabilities.
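Example (a sketch of how reference images feed an Imagen 3.0 capability model; the model ID, the edit_image method, and its keyword arguments are assumptions, not confirmed by this reference)::

    # Hypothetical editing flow; verify method and parameter names in the SDK.
    model = ImageGenerationModel.from_pretrained("imagen-3.0-capability-001")
    base = RawReferenceImage(
        reference_id=0, image=Image.load_from_file("image.png")
    )
    mask = MaskReferenceImage(reference_id=1, mask_mode="background", dilation=0.02)
    edited = model.edit_image(
        prompt="Snow-capped mountains in the background",
        reference_images=[base, mask],  # assumed keyword argument
        edit_mode="inpainting-insert",  # assumed keyword argument and value
    )
    edited[0].save("edited.png")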
Scribble
Scribble(image_bytes: typing.Optional[bytes], gcs_uri: typing.Optional[str] = None)
Input scribble for image segmentation.
StyleImageConfig
StyleImageConfig(style_description: str)
Style image config.
StyleReferenceImage
StyleReferenceImage(
    reference_id,
    image: typing.Optional[
        typing.Union[bytes, vertexai.vision_models.Image, str]
    ] = None,
    style_description: typing.Optional[str] = None,
)
Style reference image. This encapsulates the style reference image type.
SubjectImageConfig
SubjectImageConfig(
    subject_description: str,
    subject_type: typing.Literal[
        "SUBJECT_TYPE_DEFAULT",
        "SUBJECT_TYPE_PERSON",
        "SUBJECT_TYPE_ANIMAL",
        "SUBJECT_TYPE_PRODUCT",
    ],
)
Subject image config.
SubjectReferenceImage
SubjectReferenceImage(
    reference_id,
    image: typing.Optional[
        typing.Union[bytes, vertexai.vision_models.Image, str]
    ] = None,
    subject_description: typing.Optional[str] = None,
    subject_type: typing.Optional[
        typing.Literal["default", "person", "animal", "product"]
    ] = None,
)
Subject reference image.
This encapsulates the subject reference image type.
Video
Video(
    video_bytes: typing.Optional[bytes] = None, gcs_uri: typing.Optional[str] = None
)
Video.
VideoEmbedding
VideoEmbedding(
    start_offset_sec: int, end_offset_sec: int, embedding: typing.List[float]
)
Embeddings generated from video with offset times.
VideoSegmentConfig
VideoSegmentConfig(
    start_offset_sec: int = 0, end_offset_sec: int = 120, interval_sec: int = 16
)
Specifies the video segments (in seconds) for which embeddings are generated.
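Example (a sketch; passing the config to MultiModalEmbeddingModel.get_embeddings via a video_segment_config keyword is an assumption, not confirmed by this reference)::

    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    video = Video.load_from_file("video.mp4")
    # Embed only the first 60 seconds, one embedding per 10-second interval.
    segment_config = VideoSegmentConfig(
        start_offset_sec=0,
        end_offset_sec=60,
        interval_sec=10,
    )
    embeddings = model.get_embeddings(
        video=video,
        video_segment_config=segment_config,  # assumed keyword argument
    )
    for segment in embeddings.video_embeddings:
        print(segment.start_offset_sec, segment.end_offset_sec, len(segment.embedding))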
WatermarkVerificationModel
WatermarkVerificationModel(
    model_id: str, endpoint_name: typing.Optional[str] = None
)
Verifies whether an image has a watermark.
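Example (a sketch only; the model ID and the verify_image method are assumptions, not confirmed by this reference)::

    # Hypothetical usage; verify the model ID and method name in the SDK.
    model = WatermarkVerificationModel.from_pretrained("imageverification@001")
    image = Image.load_from_file("image.png")
    response = model.verify_image(image)  # assumed method name
    # Field name per WatermarkVerificationResponse below.
    print(response.watermark_verification_result)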
WatermarkVerificationResponse
WatermarkVerificationResponse(
    _prediction_response: Any, watermark_verification_result: Optional[str] = None
)
Watermark verification response.