Model Card for Model ID

A fine-tuned Mistral-7B-Instruct-v0.3 model specifically trained for generating medical rationales and explanations. The model was trained using QLoRA on a custom dataset of medical rationales.

Model Details

Model Description

This model is a fine-tuned version of Mistral-7B-Instruct-v0.3, specifically optimized for generating detailed medical rationales and explanations. It is mainly intended to be used in METEORA Rerankers of medical RAG systems. It was trained using Low-Rank Adaptation (LoRA) on a dataset of medical reasoning tasks, resulting in an 80%+ improvement in performance metrics compared to the base model.

  • Developed by: Chidiebere Okoene
  • Model type: Causal Language Model (Decoder-only)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: mistralai/Mistral-7B-Instruct-v0.3

Model Sources [optional]

Direct Use

This model is intended for generating medical rationales, explanations, and reasoning for healthcare-related queries. It can be used by:

  • Medical educators creating teaching materials
  • Healthcare professionals seeking second opinions or explanations
  • Medical students learning diagnostic reasoning
  • Researchers exploring medical AI applications

Downstream Use [optional]

This model can be integrated into:

  • METEORA Reranker for Medical RAG systems
  • Clinical decision support systems
  • Healthcare chatbots for patient education
  • Medical documentation assistants

Out-of-Scope Use

This model should not be used for:

  • Direct patient diagnosis without human supervision
  • Making treatment decisions without clinical validation
  • Replacing licensed medical professionals
  • Generating medical advice for serious conditions

Bias, Risks, and Limitations

  • Training Data Bias: The model was trained on a specific dataset of medical rationales and may not cover all medical specialties or rare conditions
  • Accuracy Limitations: While performance improved significantly, the model may still generate incorrect or incomplete information
  • Temporal Limitations: Medical knowledge evolves rapidly, and the model may not reflect the latest guidelines or research
  • Demographic Biases: The training data may not adequately represent all patient populations

Recommendations

  • Always verify model outputs with current medical literature and guidelines
  • Use this model as an educational tool rather than a diagnostic tool
  • Implement human oversight for any clinical applications
  • Regularly update the model with new medical knowledge
  • Disclose the AI-assisted nature of generated content to end users

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "chidiokoene/mistral-7b-med-rationales-finetuned" 

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate rationales
def generate_rationale(prompt):
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Given the user query below, generate 3 concise rationales (1โ€“2 sentences each) describing what evidence a correct passage should contain.
          Explain the mechanism of action of metformin in type 2 diabetes."
rationale = generate_rationale(prompt)
print(rationale)

Training Details

Training Data

The model was fine-tuned on a proprietary dataset of medical rationales containing approximately 11,362 training examples and 3,246 validation examples. The data consisted of medical questions paired with detailed explanatory rationales.

Training Procedure

Preprocessing [optional]

Text was tokenized using the Mistral tokenizer

Sequences were truncated or padded to 1024 tokens

Special tokens were added for instruction following

Training Hyperparameters

  • Training regime:
  • Training regime: bf16 mixed precision with QLoRA
  • Learning rate: 2e-4
  • Batch size: 2 (with gradient accumulation steps: 4)
  • Epochs: 3
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05

Speeds, Sizes, Times [optional]

  • Training time: ~13 hours on a single GPU with 15GB VRAM
  • Model size: ~15GB (4-bit quantized)
  • Inference speed: ~2.9 samples/second

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a held-out validation set of 1,624 medical rationale examples.

Factors

[More Information Needed]

Metrics

  • Perplexity (lower is better)
  • Average cross-entropy loss (lower is better)
  • Inference speed (samples per second)

Results

Metric	Baseline Model	Fine-tuned Model	Improvement
Perplexity	7.78	1.51	80.6%
Average Loss	2.05	0.41	79.9%
Inference Speed	5.17 samples/sec	2.91 samples/sec	-43.7%

The fine-tuned model shows exceptional improvement in understanding and generating medical rationales, with over 80% improvement in both perplexity and loss metrics. The reduction in inference speed is expected due to the added LoRA parameters.

{
  "baseline_model": {
    "perplexity": 7.784124134664591,
    "average_loss": 2.0520862921697764,
    "loss_std": 0.2737355939406239,
    "evaluation_time_seconds": 313.9927325248718,
    "samples_per_second": 5.1720942295101064
  },
  "fine_tuned_model": {
    "perplexity": 1.5100232168650496,
    "average_loss": 0.4121250261159502,
    "loss_std": 0.147794492117157,
    "evaluation_time_seconds": 557.3957495689392,
    "samples_per_second": 2.9135493072129037
  },
  "comparison": {
    "perplexity_improvement_percent": 80.60124439510734,
    "loss_improvement_percent": 79.9167789537647,
    "relative_speed": 0.5633210026586989
  },
  "evaluation_parameters": {
    "max_length": 1024,
    "batch_size": 1,
    "num_samples_evaluated": 1624
  }

Summary

The fine-tuning process was highly successful, resulting in a model that significantly outperforms the base Mistral-7B model on medical rationale generation tasks while maintaining reasonable inference speed.

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: NVIDIA GPU with 15GB VRAM

  • Hours used: ~13 hours for training

  • Carbon Emitted: Estimated based on Machine Learning Impact calculator

Technical Specifications [optional]

Model Architecture and Objective

Architecture: Transformer-based decoder-only model

Objective: Causal language modeling with instruction tuning

Parameters: 7 billion

Context length: 4096 tokens

Compute Infrastructure

[More Information Needed]

Hardware

Single GPU training

Software

PyTorch, Transformers, PEFT, Accelerate

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Downloads last month
11
Safetensors
Model size
7B params
Tensor type
F16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ 1 Ask for provider support

Model tree for chidiokoene/okoene-med-rationale

Finetuned
(408)
this model