ComfyUI_Gemini_Flash 002

ComfyUI_Gemini_Flash is a updated version of the custom node for ComfyUI that integrates the powerful Gemini 1.5 Flash 002 model from Google. This node allows users to leverage Gemini's capabilities for various AI tasks, including text generation, image analysis, video processing, and audio transcription.

node.video.mp4

Features

Multimodal Input Support: Process text, images, videos, and audio using the Gemini 1.5 Flash model.
Long Context Window: Utilize Gemini 1.5 Flash's 1-million-token context window for processing large inputs.
API Key Management: Securely save and manage your Gemini API key.
Proxy Support: Configure proxy settings for API requests.
Output Control: Adjust max output tokens and temperature for fine-tuned responses.

Installation

Clone this repository to your ComfyUI custom nodes directory:

git clone https://github.com/YourUsername/ComfyUI_Gemini_Flash.git

Install the required dependencies:

pip install google-generativeai pillow torchaudio

Ensure you have a valid Gemini API key from Google AI Studio.

Configuration

Open the config.json file in the ComfyUI_Gemini_Flash directory.
Replace "your key" with your actual Gemini API key.
Optionally, set a proxy if required.

Proxy Configuration

The Gemini Flash 002 node supports the use of a proxy for API requests. This can be useful if you're behind a corporate firewall or need to route your requests through a specific server. To use a proxy:

In the ComfyUI interface, locate the "Gemini Flash 002" node.
Find the "proxy" input field.
Enter your proxy URL in the following format:
```
http://username:password@proxy_host:proxy_port
```
or if no authentication is required:
```
http://proxy_host:proxy_port
```

For example:

With authentication: http://john:[email protected]:8080
Without authentication: http://proxy.example.com:8080

You can also set a default proxy in the config.json file:

{
  "GEMINI_API_KEY": "your_api_key_here",
  "PROXY": "http://proxy.example.com:8080"
}

The proxy set in the node input will override the one in the config file.

Note: Make sure your proxy is compatible with HTTPS traffic, as the Gemini API uses secure connections.

Usage

In ComfyUI, locate the "Gemini Flash 002" node in the "Gemini Flash 002" category.
Connect the appropriate inputs based on your use case (text, image, video, or audio).
Set the prompt and any additional parameters (max output tokens, temperature).
Run the workflow to generate content using Gemini 1.5 Flash.

Input Types and Parameters

Required Inputs:

prompt: (STRING) The main instruction or question for the Gemini model.
input_type: (["text", "image", "video", "audio"]) Specifies the type of input being processed.
api_key: (STRING) Your Gemini API key.
proxy: (STRING) Proxy settings (if needed).

Optional Inputs:

text_input: (STRING) Additional text input for text-based tasks.
image: (IMAGE) Image input for image analysis tasks.
video: (IMAGE) Video input (processed as a sequence of frames).
audio: (AUDIO) Audio input for transcription or analysis.
max_output_tokens: (INT, default: 1000) Controls the length of the generated output.
temperature: (FLOAT, default: 0.4, range: 0.0 to 1.0) Adjusts the randomness of the output.

Versions and Capabilities

New Version (Gemini Flash 002)

Supports text, image, video, and audio inputs.
Improved video handling with frame sampling and resizing to manage payload size.
Better prompts for video analysis, considering movement and changes across frames.
Robust error handling and payload size management.

Old Version (Gemini Flash)

Supported text and image inputs.
Basic video support (treated similarly to images).
Limited audio support.

Input Processing Details

Text

Directly processed by the Gemini model.

Image

Resized to a maximum of 1024x1024 pixels to manage payload size.
Analyzed for content, objects, and visual elements.

Video

Processed as a sequence of frames.
Samples up to 10 frames evenly distributed throughout the video.
Each frame is resized to 512x512 pixels.
The model analyzes changes and movements across frames.

Audio

Converted to mono and resampled to 16kHz if necessary.
Sent to the Gemini model for transcription or analysis.

Output

The node returns a STRING containing the generated content or analysis from the Gemini model.

About Gemini 1.5 Flash

Gemini 1.5 Flash is designed to be the fastest and most cost-efficient model for high-volume tasks. It features a 1-million-token context window, enabling processing of large amounts of text, long videos, or extended audio clips in a single request.

Troubleshooting

If you encounter any issues:

Ensure your API key is correctly set in the config.json file.
Check that your input size is not exceeding the model's limits (especially for video and audio).
Verify that all required dependencies are installed.

Contributing

Contributions to improve ComfyUI_Gemini_Flash are welcome! Please feel free to submit issues or pull requests.

Acknowledgements

This project was made possible thanks to the contributions of the ComfyUI community and the developers of the Gemini model at Google. Special thanks to all the contributors and users for their continuous support and feedback.

License

[Specify your license here]

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
.github/workflows		.github/workflows
example		example
.gitattributes		.gitattributes
.gitignore		.gitignore
Gemini_Flash_Node.py		Gemini_Flash_Node.py
README.md		README.md
__init__.py		__init__.py
config.json		config.json
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ComfyUI_Gemini_Flash 002

Features

Installation

Configuration

Proxy Configuration

Usage

Input Types and Parameters

Required Inputs:

Optional Inputs:

Versions and Capabilities

New Version (Gemini Flash 002)

Old Version (Gemini Flash)

Input Processing Details

Text

Image

Video

Audio

Output

About Gemini 1.5 Flash

Troubleshooting

Contributing

Acknowledgements

License

About

Releases

Packages

Contributors 3

Languages

ShmuelRonen/ComfyUI_Gemini_Flash

Folders and files

Latest commit

History

Repository files navigation

ComfyUI_Gemini_Flash 002

Features

Installation

Configuration

Proxy Configuration

Usage

Input Types and Parameters

Required Inputs:

Optional Inputs:

Versions and Capabilities

New Version (Gemini Flash 002)

Old Version (Gemini Flash)

Input Processing Details

Text

Image

Video

Audio

Output

About Gemini 1.5 Flash

Troubleshooting

Contributing

Acknowledgements

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages