mutimodal
Here are 8 public repositories matching this topic...
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification
Updated Apr 15, 2025 - Python
A private, local OCR solution using Meta's Llama 3.2 Vision model with a Streamlit interface. Processes images entirely offline and supports formats such as JPEG, PNG, and BMP.
Updated Nov 21, 2024 - Python
Gemini 2 Pro app for Image, Audio, and Document understanding + Code Execution.
Updated Feb 9, 2025 - Python
A multimodal RAG application using Qwen 2.5 VL, ColPali, and QdrantDB for text and image-based retrieval.
Updated Mar 20, 2025 - Jupyter Notebook
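A multimodal large model built on the Qwen Agent framework, integrating a JAKA robotic arm, visual detection, speech recognition and synthesis, and an MCP database.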
Updated May 26, 2025 - Python
QD-RetNet: Efficient Retinal Disease Classification via Quantized Knowledge Distillation [MIUA-2025]
Updated May 26, 2025 - Python