# Meissa-4B: Multi-modal Medical Agentic Intelligence

Meissa-4B is a lightweight 4B-parameter medical multi-modal LLM with full agentic capabilities. Instead of relying on proprietary frontier models (GPT, Gemini), Meissa brings tool calling, multi-agent collaboration, and clinical simulation offline by distilling structured agent trajectories from frontier systems into a compact vision-language model.
## Key Features
- 4 agentic paradigms in a single model: continuous tool calling, interleaved thinking with images, multi-agent collaboration, and multi-turn clinical simulation
- Offline deployment: runs entirely locally with vLLM, no API calls needed
- Tool calling: native `<tool_call>` support via the Hermes format, compatible with vLLM's tool-call parser
- Thinking: built-in `<think>` chain-of-thought reasoning before actions
## Model Details

| Property | Value |
|---|---|
| Base model | Qwen3-VL-4B-Instruct |
| Architecture | Qwen3VLForConditionalGeneration |
| Parameters | 4B |
| Precision | bfloat16 |
| Training method | LoRA SFT (rank=32, alpha=64), merged |
| Training data | 43,210 medical agentic trajectories (open subset) |
| Training framework | LLaMA-Factory |
| Context length | 8,192 tokens (training) |
## Quickstart
### Load with Transformers

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Qwen3-VL is a vision-language model, so load it with the
# image-text-to-text auto class rather than AutoModelForCausalLM.
model = AutoModelForImageTextToText.from_pretrained(
    "CYX1998/Meissa-4B",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("CYX1998/Meissa-4B", trust_remote_code=True)
```
### Serve with vLLM (Recommended)

For agentic use cases, serve Meissa with vLLM to enable tool calling:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model CYX1998/Meissa-4B \
  --port 8877 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.85 \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

Then point any OpenAI-compatible client at the server:

```bash
export OPENAI_BASE_URL="http://127.0.0.1:8877/v1"
export OPENAI_API_KEY="dummy"  # vLLM does not validate the key
```

The `--enable-auto-tool-choice` and `--tool-call-parser hermes` flags are required for tool calling.
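Under the Hermes format, the model emits each tool call as a JSON object wrapped in `<tool_call>` tags; vLLM's parser converts these into structured tool calls for you. A minimal illustrative sketch of what that wire format looks like and how it could be parsed (this is not vLLM's actual parser, and the sample output is hypothetical):

```python
import json
import re

def parse_hermes_tool_calls(text: str) -> list:
    """Extract JSON tool calls wrapped in <tool_call>...</tool_call> tags."""
    calls = []
    for blob in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL):
        calls.append(json.loads(blob))
    return calls

# Example raw model output: a <think> block followed by a Hermes tool call.
sample = (
    "<think>The user wants a chest X-ray analysis.</think>\n"
    '<tool_call>\n{"name": "ChestXRayClassifier", '
    '"arguments": {"image_path": "/path/to/cxr.jpg"}}\n</tool_call>'
)
print(parse_hermes_tool_calls(sample))
# → [{'name': 'ChestXRayClassifier', 'arguments': {'image_path': '/path/to/cxr.jpg'}}]
```

When served through vLLM you never need to do this yourself; the parsed calls arrive in `response.choices[0].message.tool_calls`.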
### Example: Tool Calling

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8877/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "ChestXRayClassifier",
        "description": "Classify pathologies in a chest X-ray image.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_path": {"type": "string", "description": "Path to the chest X-ray image"}
            },
            "required": ["image_path"],
        },
    },
}]

response = client.chat.completions.create(
    model="CYX1998/Meissa-4B",
    messages=[{"role": "user", "content": "Analyze this chest X-ray: /path/to/cxr.jpg"}],
    tools=tools,
)
print(response.choices[0].message)
```
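If the response contains tool calls, the caller executes the named tool locally and sends the result back as a `tool`-role message before the next model turn. A minimal sketch of that dispatch step; the classifier stub, its return fields, and the call dict layout are illustrative assumptions, not part of Meissa's API:

```python
import json

# Illustrative stand-in for a real local chest X-ray classifier tool.
def chest_xray_classifier(image_path: str) -> dict:
    return {"image_path": image_path, "findings": ["cardiomegaly"], "confidence": 0.91}

TOOL_REGISTRY = {"ChestXRayClassifier": chest_xray_classifier}

def run_tool_call(tool_call: dict) -> dict:
    """Execute one OpenAI-style tool call and wrap the result as a `tool` message."""
    fn = TOOL_REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    result = fn(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Shape of a single tool call as returned by an OpenAI-compatible endpoint.
call = {
    "id": "call_0",
    "function": {
        "name": "ChestXRayClassifier",
        "arguments": '{"image_path": "/path/to/cxr.jpg"}',
    },
}
msg = run_tool_call(call)
print(msg["role"])  # → tool
```

Appending `msg` to the conversation and calling `chat.completions.create` again lets the model continue reasoning over the tool's output.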
## Supported Agentic Frameworks
| Framework | Description | Tools |
|---|---|---|
| I: Continuous Tool Calling | Sequential tool use for radiology analysis | 8 chest X-ray tools (classifier, report generator, VQA, segmentation, etc.) |
| II: Interleaved Thinking with Images | Iterative visual reasoning with zoom | ZoomInSubfigure, SegmentRegion, Terminate |
| III: Multi-Agent Collaboration | Multi-agent medical consultation | AssessDifficulty, RecruitExperts, ConsultExperts, FacilitateDebate |
| IV: Clinical Simulation | Multi-turn doctor-patient interaction | RequestPhysicalExam, RequestTest, Terminate |
## Training Data
Trained on 43,210 medical agentic SFT trajectories distilled from Gemini:
| Framework | Samples | Source Datasets |
|---|---|---|
| I: Continuous Tool Calling | 4,898 | MIMIC-CXR-VQA |
| II: Interleaved Thinking | 15,211 | PathVQA, MIMIC-CXR-VQA, SLAKE, VQA-RAD |
| III: Multi-Agent Collaboration | 15,427 | MIMIC-CXR-VQA, PathVQA, MedQA, PubMedQA |
| IV: Clinical Simulation | 7,674 | MedQA, MIMIC-CXR |
The open-source subset (25,018 samples) is available at `CYX1998/Meissa-SFT`.
## Evaluation
Meissa-4B matches or exceeds GPT-4o and Gemini-3-flash on multiple medical agentic benchmarks while being deployable offline on a single GPU. See our paper for full results.
## Limitations

- **Not for clinical use**: This model is a research prototype and should NOT be used for real clinical decision-making.
- **English only**: Trained and evaluated on English medical data only.
- **Domain scope**: Primarily trained on radiology, pathology, and general clinical reasoning. Performance on other medical specialties may vary.
- **Hallucination**: Like all LLMs, Meissa may generate plausible but incorrect medical information.
## Citation

```bibtex
@article{chen2026meissa,
  title={Meissa: Multi-modal Medical Agentic Intelligence},
  author={Chen, Yixiong and Bai, Xinyi and Pan, Yue and Zhou, Zongwei and Yuille, Alan},
  journal={arXiv preprint arXiv:2603.09018},
  year={2026}
}
```
## License
This model is released under Apache 2.0. The base model Qwen3-VL-4B-Instruct is subject to the Qwen License.