Model Card for Model ID

A fine-tuned Mistral-7B-Instruct-v0.3 model specifically trained for generating medical rationales and explanations. The model was trained using QLoRA on a custom dataset of medical rationales.

Model Details

Model Description

This model is a fine-tuned version of Mistral-7B-Instruct-v0.3, optimized for generating detailed medical rationales and explanations. It is mainly intended for use in METEORA rerankers within medical RAG systems. It was trained with QLoRA (quantized Low-Rank Adaptation) on a dataset of medical reasoning tasks, which reduced perplexity and cross-entropy loss by roughly 80% relative to the base model on the evaluation set.

Developed by: Chidiebere Okoene
Model type: Causal Language Model (Decoder-only)
Language(s) (NLP): English
License: MIT
Finetuned from model: mistralai/Mistral-7B-Instruct-v0.3

Model Sources [optional]

Repository: https://github.com/ChidiOkoene/METEORA_Med-Reraker/tree/feat/V1
Paper [optional]: [More Information Needed]
Demo [optional]: [More Information Needed]

Direct Use

This model is intended for generating medical rationales, explanations, and reasoning for healthcare-related queries. It can be used by:

Medical educators creating teaching materials
Healthcare professionals seeking second opinions or explanations
Medical students learning diagnostic reasoning
Researchers exploring medical AI applications

Downstream Use [optional]

This model can be integrated into:

METEORA Reranker for Medical RAG systems
Clinical decision support systems
Healthcare chatbots for patient education
Medical documentation assistants

Out-of-Scope Use

This model should not be used for:

Direct patient diagnosis without human supervision
Making treatment decisions without clinical validation
Replacing licensed medical professionals
Generating medical advice for serious conditions

Bias, Risks, and Limitations

Training Data Bias: The model was trained on a specific dataset of medical rationales and may not cover all medical specialties or rare conditions
Accuracy Limitations: While performance improved significantly, the model may still generate incorrect or incomplete information
Temporal Limitations: Medical knowledge evolves rapidly, and the model may not reflect the latest guidelines or research
Demographic Biases: The training data may not adequately represent all patient populations

Recommendations

Always verify model outputs with current medical literature and guidelines
Use this model as an educational tool rather than a diagnostic tool
Implement human oversight for any clinical applications
Regularly update the model with new medical knowledge
Disclose the AI-assisted nature of generated content to end users

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "chidiokoene/mistral-7b-med-rationales-finetuned" 

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate rationales
def generate_rationale(prompt):
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
# Adjacent string literals are joined, so the prompt stays one valid string.
prompt = (
    "Given the user query below, generate 3 concise rationales (1–2 sentences each) "
    "describing what evidence a correct passage should contain.\n"
    "Explain the mechanism of action of metformin in type 2 diabetes."
)
rationale = generate_rationale(prompt)
print(rationale)
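
The example prompt pairs a fixed instruction with a user query. A minimal sketch of a helper that builds the same kind of prompt for any query is shown here; the function name `build_rationale_prompt` and its `num_rationales` parameter are illustrative and not part of the released code, and the exact template used during fine-tuning is not documented in this card.

```python
def build_rationale_prompt(query: str, num_rationales: int = 3) -> str:
    """Build an instruction prompt in the style of the example in this card.

    Hypothetical helper: the wording mirrors the card's example prompt,
    which may differ from the template used during fine-tuning.
    """
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")
    instruction = (
        f"Given the user query below, generate {num_rationales} concise rationales "
        "(1–2 sentences each) describing what evidence a correct passage should contain."
    )
    # Instruction first, then the user query on its own line.
    return f"{instruction}\n{query}"


prompt = build_rationale_prompt(
    "Explain the mechanism of action of metformin in type 2 diabetes."
)
print(prompt)
```

The returned string can be passed directly to `generate_rationale`.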

Training Details

Training Data

The model was fine-tuned on a proprietary dataset of medical rationales with 11,362 training examples and 3,246 validation examples. Each example pairs a medical question with a detailed explanatory rationale.

Training Procedure

Preprocessing [optional]

Text was tokenized using the Mistral tokenizer

Sequences were truncated or padded to 1024 tokens

Special tokens were added for instruction following
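
The truncate-or-pad step can be sketched in plain Python. This is an illustration under assumptions: the pad token ID of 0 and right-side padding are placeholder choices, and in practice the tokenizer's own `truncation`, `padding`, and `max_length` arguments would normally do this work.

```python
from typing import List, Tuple

MAX_LENGTH = 1024  # sequence length stated in this card
PAD_TOKEN_ID = 0   # assumption: placeholder value, not a documented pad ID


def pad_or_truncate(token_ids: List[int],
                    max_length: int = MAX_LENGTH,
                    pad_id: int = PAD_TOKEN_ID) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                    # drop tokens beyond the limit
    n_pad = max_length - len(kept)                   # zero when the input was truncated
    input_ids = kept + [pad_id] * n_pad              # right-pad short sequences
    attention_mask = [1] * len(kept) + [0] * n_pad   # 0 marks padding positions
    return input_ids, attention_mask


ids, mask = pad_or_truncate([5, 6, 7], max_length=5)
print(ids, mask)  # [5, 6, 7, 0, 0] [1, 1, 1, 0, 0]
```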

Training Hyperparameters

Training regime: bf16 mixed precision with QLoRA
Learning rate: 2e-4
Batch size: 2 per device, with 4 gradient accumulation steps (effective batch size: 8)
Epochs: 3
LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.05
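
A few derived quantities follow from these hyperparameters and the training set size reported under Training Data. The sketch below assumes single-GPU training (as stated under Hardware) and the standard LoRA scaling factor of alpha divided by rank. The optimizer step count is approximate, since the exact figure depends on how the trainer handles the final partial accumulation step.

```python
import math

# Values reported in this card
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_epochs = 3
num_train_examples = 11_362
lora_rank = 16
lora_alpha = 32
num_gpus = 1  # assumption: single-GPU training, per the Hardware section

# Examples consumed per optimizer update
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Approximate optimizer updates (rounding the last partial batch up)
steps_per_epoch = math.ceil(num_train_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs

# Standard LoRA scaling applied to the low-rank update: alpha / r
lora_scaling = lora_alpha / lora_rank

print(effective_batch_size)  # 8
print(steps_per_epoch)       # 1421
print(total_steps)           # 4263
print(lora_scaling)          # 2.0
```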

Speeds, Sizes, Times [optional]

Training time: ~13 hours on a single GPU with 15GB VRAM
Model size: ~15GB (7B parameters stored in FP16; 4-bit quantization applies to the base model during QLoRA training)
Inference speed: ~2.9 samples/second

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a held-out validation set of 1,624 medical rationale examples.

Factors

[More Information Needed]

Metrics

Perplexity (lower is better)
Average cross-entropy loss (lower is better)
Inference speed (samples per second)

Results

| Metric | Baseline Model | Fine-tuned Model | Improvement |
| --- | --- | --- | --- |
| Perplexity | 7.78 | 1.51 | 80.6% |
| Average Loss | 2.05 | 0.41 | 79.9% |
| Inference Speed | 5.17 samples/sec | 2.91 samples/sec | -43.7% |

On the evaluation set, fine-tuning reduced perplexity by 80.6% (7.78 to 1.51) and average cross-entropy loss by 79.9% (2.05 to 0.41). Inference throughput fell to about 56% of the baseline; this reduction is expected due to the added LoRA parameters.

{
  "baseline_model": {
    "perplexity": 7.784124134664591,
    "average_loss": 2.0520862921697764,
    "loss_std": 0.2737355939406239,
    "evaluation_time_seconds": 313.9927325248718,
    "samples_per_second": 5.1720942295101064
  },
  "fine_tuned_model": {
    "perplexity": 1.5100232168650496,
    "average_loss": 0.4121250261159502,
    "loss_std": 0.147794492117157,
    "evaluation_time_seconds": 557.3957495689392,
    "samples_per_second": 2.9135493072129037
  },
  "comparison": {
    "perplexity_improvement_percent": 80.60124439510734,
    "loss_improvement_percent": 79.9167789537647,
    "relative_speed": 0.5633210026586989
  },
  "evaluation_parameters": {
    "max_length": 1024,
    "batch_size": 1,
    "num_samples_evaluated": 1624
  }
}
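
The relationships between the reported numbers can be checked with a short script. The sketch below assumes that perplexity is the exponential of the average loss and that throughput is the number of samples divided by evaluation time; the reported values are consistent with both assumptions. The helper names `derived` and `percent_reduction` are illustrative.

```python
import math

# Raw values copied from the JSON results in this card
baseline = {"average_loss": 2.0520862921697764,
            "evaluation_time_seconds": 313.9927325248718}
fine_tuned = {"average_loss": 0.4121250261159502,
              "evaluation_time_seconds": 557.3957495689392}
num_samples = 1624


def derived(run: dict) -> dict:
    """Perplexity and throughput derived from a run's raw measurements."""
    return {
        "perplexity": math.exp(run["average_loss"]),
        "samples_per_second": num_samples / run["evaluation_time_seconds"],
    }


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


base, tuned = derived(baseline), derived(fine_tuned)
ppl_gain = percent_reduction(base["perplexity"], tuned["perplexity"])
loss_gain = percent_reduction(baseline["average_loss"], fine_tuned["average_loss"])
relative_speed = tuned["samples_per_second"] / base["samples_per_second"]

print(round(base["perplexity"], 2), round(tuned["perplexity"], 2))  # 7.78 1.51
print(round(ppl_gain, 1), round(loss_gain, 1))                      # 80.6 79.9
print(round(relative_speed, 3))                                     # 0.563
```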

Summary

Fine-tuning lowered perplexity on medical rationale generation from 7.78 to 1.51 relative to the base Mistral-7B-Instruct-v0.3 model, at the cost of lower inference throughput (2.91 vs. 5.17 samples/sec).

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: NVIDIA GPU with 15GB VRAM
Hours used: ~13 hours for training
Carbon Emitted: Estimated based on Machine Learning Impact calculator
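
A rough version of that estimate multiplies energy use by the carbon intensity of the electricity grid. In the sketch below, only the 13 training hours come from this card; the GPU power draw and grid carbon intensity are placeholder assumptions that would need to be replaced with figures for the actual hardware and region, and the function name is illustrative.

```python
def estimate_co2_kg(hours: float, gpu_power_watts: float,
                    grid_kg_co2_per_kwh: float) -> float:
    """Estimate emissions as energy used (kWh) times grid carbon intensity."""
    energy_kwh = gpu_power_watts / 1000 * hours
    return energy_kwh * grid_kg_co2_per_kwh


TRAINING_HOURS = 13          # from this card
GPU_POWER_WATTS = 100.0      # assumption: placeholder, not a measured value
GRID_KG_CO2_PER_KWH = 0.4    # assumption: placeholder, varies by region

estimate = estimate_co2_kg(TRAINING_HOURS, GPU_POWER_WATTS, GRID_KG_CO2_PER_KWH)
print(f"{estimate:.2f} kg CO2eq under the placeholder assumptions")
```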

Technical Specifications [optional]

Model Architecture and Objective

Architecture: Transformer-based decoder-only model

Objective: Causal language modeling with instruction tuning

Parameters: 7 billion

Context length: 4096 tokens

Compute Infrastructure

[More Information Needed]

Hardware

Single GPU training

Software

PyTorch, Transformers, PEFT, Accelerate

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]
