- 1.122.0 (latest)
- 1.121.0
- 1.120.0
- 1.119.0
- 1.118.0
- 1.117.0
- 1.95.1
- 1.94.0
- 1.93.1
- 1.92.0
- 1.91.0
- 1.90.0
- 1.89.0
- 1.88.0
- 1.87.0
- 1.86.0
- 1.85.0
- 1.84.0
- 1.83.0
- 1.82.0
- 1.81.0
- 1.80.0
- 1.79.0
- 1.78.0
- 1.77.0
- 1.76.0
- 1.75.0
- 1.74.0
- 1.73.0
- 1.72.0
- 1.71.1
- 1.70.0
- 1.69.0
- 1.68.0
- 1.67.1
- 1.66.0
- 1.65.0
- 1.63.0
- 1.62.0
- 1.60.0
- 1.59.0
API documentation for vision_models package.
Classes
GeneratedImage
Generated image.
Image
Image.
ImageCaptioningModel
Generates captions from image.
Examples::
model = ImageCaptioningModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
captions = model.get_captions(
image=image,
# Optional:
number_of_results=1,
language="en",
)
ImageGenerationModel
Generates images from text prompt.
Examples::
model = ImageGenerationModel.from_pretrained("imagegeneration@002")
response = model.generate_images(
prompt="Astronaut riding a horse",
# Optional:
number_of_images=1,
seed=0,
)
response[0].show()
response[0].save("image1.png")
ImageGenerationResponse
Image generation response.
ImageQnAModel
Answers questions about an image.
Examples::
model = ImageQnAModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
answers = model.ask_question(
image=image,
question="What color is the car in this image?",
# Optional:
number_of_results=1,
)
ImageTextModel
Generates text from images.
Examples::
model = ImageTextModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
captions = model.get_captions(
image=image,
# Optional:
number_of_results=1,
language="en",
)
answers = model.ask_question(
image=image,
question="What color is the car in this image?",
# Optional:
number_of_results=1,
)
MultiModalEmbeddingModel
Generates embedding vectors from images and videos.
Examples::
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image = Image.load_from_file("image.png")
video = Video.load_from_file("video.mp4")
embeddings = model.get_embeddings(
image=image,
video=video,
contextual_text="Hello world",
)
image_embedding = embeddings.image_embedding
video_embeddings = embeddings.video_embeddings
text_embedding = embeddings.text_embedding
MultiModalEmbeddingResponse
The multimodal embedding response.
Video
Video.
VideoEmbedding
Embeddings generated from video with offset times.
VideoSegmentConfig
The specific video segments (in seconds) the embeddings are generated for.