Florence-2 Coordinate Detection Model
Fine-tuned Florence-2-base for captcha coordinate prediction.
Usage
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import re
model = AutoModelForCausalLM.from_pretrained("sadasd67/florence2-coordinates", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("sadasd67/florence2-coordinates", trust_remote_code=True)
image = Image.open("captcha.png").convert("RGB")  # ensure 3-channel input for palette/RGBA PNGs
inputs = processor(text="<COORDINATE>", images=image, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=64)
result = processor.batch_decode(generated, skip_special_tokens=False)[0]
# Parse coordinates
locs = [int(x) for x in re.findall(r'<loc_(\d+)>', result)]
w, h = image.size
# Pair values as (x, y); the range bound skips a trailing unpaired value
coords = [(locs[i] * w / 999, locs[i+1] * h / 999) for i in range(0, len(locs) - 1, 2)]
print(coords)
Training Details
- Base Model: microsoft/Florence-2-base
- Dataset: 74233 captcha images with click coordinates
- Training: LoRA fine-tuning, 6 epochs
- Best Val Loss: 3.8039
- Training Time: 50.0 minutes
- GPU: Nvidia A10G 24GB (Modal.com)
Coordinate Format
- Model outputs Florence-2 location tokens: <loc_XXX> (normalized 0-999)
- Denormalize: pixel_x = loc_value * image_width / 999 (and likewise pixel_y = loc_value * image_height / 999)
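The format above can be wrapped in a small helper. The following is a minimal sketch, not part of the released model code: the name `parse_points` and the constant `LOC_MAX` are hypothetical, and it assumes the decoded string lists tokens in x, y order, as the usage snippet does.

```python
import re

LOC_MAX = 999  # assumed upper bound of the normalized range described above


def parse_points(decoded: str, width: int, height: int) -> list[tuple[float, float]]:
    """Convert '<loc_x><loc_y>...' tokens into (pixel_x, pixel_y) pairs.

    Hypothetical helper: assumes tokens alternate x, y.
    """
    locs = [int(v) for v in re.findall(r"<loc_(\d+)>", decoded)]
    # zip stops at the shorter slice, so a trailing unpaired value is dropped
    pairs = zip(locs[0::2], locs[1::2])
    return [(x * width / LOC_MAX, y * height / LOC_MAX) for x, y in pairs]


if __name__ == "__main__":
    # Example string written by hand, not real model output
    print(parse_points("<loc_0><loc_999><loc_500>", width=200, height=100))
```

Passing `result` and `image.size` from the usage snippet (`parse_points(result, *image.size)`) should give the same pairs as the inline list comprehension there whenever the model emits an even number of location tokens.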