Florence-2 Coordinate Detection Model

A LoRA fine-tune of microsoft/Florence-2-base that predicts click coordinates on captcha images.

Usage

from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import re

model = AutoModelForCausalLM.from_pretrained("sadasd67/florence2-coordinates", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("sadasd67/florence2-coordinates", trust_remote_code=True)

# Convert to RGB so palette or RGBA PNGs become a 3-channel image
image = Image.open("captcha.png").convert("RGB")
inputs = processor(text="<COORDINATE>", images=image, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=64)
result = processor.batch_decode(generated, skip_special_tokens=False)[0]

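# Parse <loc_N> tokens; they alternate x, y and are normalized to 0-999
locs = [int(x) for x in re.findall(r'<loc_(\d+)>', result)]
w, h = image.size
# Scale to pixels; stop at len(locs) - 1 so an unpaired trailing value is skipped
coords = [(locs[i] * w / 999, locs[i+1] * h / 999) for i in range(0, len(locs) - 1, 2)]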
print(coords)

Training Details

  • Base Model: microsoft/Florence-2-base
  • Dataset: 74,233 captcha images with click coordinates
  • Training: LoRA fine-tuning, 6 epochs
  • Best Validation Loss: 3.8039
  • Training Time: 50.0 minutes
  • GPU: NVIDIA A10G 24GB (Modal.com)

Coordinate Format

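  • Model outputs Florence-2 location tokens: <loc_XXX>, where XXX is an integer from 0 to 999 (a normalized position)
  • Tokens come in pairs: the first is x, the second is y
  • Denormalize: pixel_x = loc_value * image_width / 999, pixel_y = loc_value * image_height / 999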
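
The denormalization rule can be wrapped in a small helper. The following is a minimal sketch, not part of the released model code: the function name `decode_points` and the `bins` parameter are hypothetical, and it assumes the tokens alternate x, y as in the usage snippet.

```python
import re

# Matches Florence-2 location tokens such as <loc_512>
LOC_PATTERN = re.compile(r"<loc_(\d+)>")


def decode_points(text, width, height, bins=999):
    """Convert '<loc_x><loc_y>...' text into a list of (pixel_x, pixel_y) tuples.

    `bins` is the divisor from the rule above (999); it is a parameter only
    so the scaling is easy to adjust. This helper is illustrative.
    """
    values = [int(v) for v in LOC_PATTERN.findall(text)]
    # Ignore a trailing unpaired value instead of raising IndexError
    usable = len(values) - len(values) % 2
    return [
        (values[i] * width / bins, values[i + 1] * height / bins)
        for i in range(0, usable, 2)
    ]


# Example: one point decoded for a 400x200 image
print(decode_points("<loc_500><loc_250>", width=400, height=200))
```

With this helper, the parsing step in the usage section reduces to `decode_points(result, *image.size)`.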