Full-text search
1,000+ results
Aleistar / Gengar_toy_example
README.md
dataset
1 match
dhirajudhani / image
README.md
model
5 matches
tags: diffusers, flux, lora, replicate, text-to-image, en, base_model:black-forest-labs/FLUX.1-dev, base_model:adapter:black-forest-labs/FLUX.1-dev, license:other, region:us
# Image
⋯
You should use `dhiraj` to trigger the image generation.
⋯
pipeline.load_lora_weights('dhirajudhani/image', weight_name='lora.safetensors')
image = pipeline('your prompt').images[0]
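The two code lines above are only a fragment of the card's example. A minimal sketch of the surrounding diffusers flow, assuming the FLUX.1-dev base model and `FluxPipeline` from the tags (the dtype and helper function are illustrative, not the card's exact code):

```python
# Hypothetical sketch around the snippet above. The repo id, weight name,
# and trigger word come from the search result; the rest is an assumption.

def build_prompt(subject: str, trigger: str = "dhiraj") -> str:
    # The card says `dhiraj` must appear in the prompt to trigger the LoRA.
    return f"{trigger}, {subject}"

def generate(subject: str):
    # Heavy imports kept inside the function so the sketch can be read
    # without downloading model weights.
    import torch
    from diffusers import FluxPipeline  # assumes the FLUX.1-dev base model

    pipeline = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    pipeline.load_lora_weights("dhirajudhani/image", weight_name="lora.safetensors")
    return pipeline(build_prompt(subject)).images[0]
```

Calling `generate("a portrait")` would run the full pipeline; `build_prompt` alone shows how the trigger word is prepended.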
nvidia / nemotron-ocr-v2
README.md
model
28 matches
tags: image, ocr, object recognition, text recognition, layout analysis, ingestion, multilingual, image-to-text, en, zh, ja, ko, ru, license:other, region:us
...ptical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a dete...
...on speed and accuracy on both document and natural scene images.
⋯
...cy and high-speed extraction of textual information from images across multiple languages, making it ideal for powering ...
⋯
...ne for high-accuracy localization of text regions within images.
⋯
... and production-ready OCR for diverse document and scene images.
⋯
| Input Type & Format | Image (RGB, PNG/JPEG, float32/uint8), aggregation level (word, sentence, or paragraph) |
| Input Parameters (Two-Dimensional) | 3 x H x W (single image) or B x 3 x H x W (batch) |
| Input Range | [0, 1] (float32) or [0, 255] (uint8, auto-converted) |
| Other Properties | Handles both single images and batches. Automatic multi-scale resizing for best accuracy. |
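The input-range row says uint8 arrays in [0, 255] are auto-converted to float32 in [0, 1]. A sketch of what that contract implies, using NumPy; this mirrors the stated behavior, not the model's own preprocessing code:

```python
import numpy as np

def to_model_range(img: np.ndarray) -> np.ndarray:
    # uint8 in [0, 255] -> float32 in [0, 1]; float inputs pass through
    # as float32 (an assumption matching the table's input range row).
    if img.dtype == np.uint8:
        return img.astype(np.float32) / 255.0
    return img.astype(np.float32)

u8 = np.array([[0, 128, 255]], dtype=np.uint8)
f32 = to_model_range(u8)
print(f32.dtype, f32.min(), f32.max())  # float32 0.0 1.0
```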
nvidia / nemotron-ocr-v1
README.md
model
24 matches
tags: image, ocr, object recognition, text recognition, layout analysis, ingestion, image-to-text, en, license:other, region:us
*Preview of the model output on the example image.*
⋯
...ptical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a dete...
...on speed and accuracy on both document and natural scene images.
⋯
...cy and high-speed extraction of textual information from images, making it ideal for powering multimodal retrieval syste...
⋯
...ne for high-accuracy localization of text regions within images.
⋯
... and production-ready OCR for diverse document and scene images.
qpqpqpqpqpqp / Ovis_Image_7B_fp8
README.md
model
2 matches
tags: image generation, comfyui, text-to-image, en, zh, base_model:AIDC-AI/Ovis-Image-7B, base_model:finetune:AIDC-AI/Ovis-Image-7B, license:apache-2.0, region:us
<div align="center">The world's first fp8 quants of Ovis Image 7B!
<img src=https://cdn-uploads.huggingface.co/production/uploads/636f4c6b5d2050767e4a1491/cfsnngElzYv8DbTKsLohl.png widt...
</div>
nvidia / nemotron-page-elements-v3
README.md
model
16 matches
tags: image, detection, pdf, ingestion, yolox, object-detection, en, arxiv:2107.08430, license:other, region:us
*Preview of the model output on the example image.*
⋯
**Input Type(s)**: Image <br>
⋯
**Other Properties Related to Input**: Image size resized to `(1024, 1024)`
⋯
from PIL import Image
⋯
# Load image
path = "./example.png"
img = Image.open(path).convert("RGB")
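The snippet loads an image with Pillow, and the card separately states inputs are resized to `(1024, 1024)`. A self-contained sketch combining the two, with an in-memory image standing in for `./example.png` (the `resize` call is an assumed client-side step, not the card's exact code):

```python
from PIL import Image

# In-memory RGB image standing in for "./example.png".
img = Image.new("RGB", (640, 480), color=(255, 255, 255))

# The card says input images are resized to (1024, 1024);
# doing it with Image.resize here is an illustrative assumption.
resized = img.resize((1024, 1024))
print(resized.size)  # (1024, 1024)
```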
nvidia / nemotron-graphic-elements-v1
README.md
model
24 matches
tags: image, detection, pdf, ingestion, yolox, object-detection, en, arxiv:2107.08430, arxiv:2305.04151, license:other, region:us
*Preview of the model output on the example image.*

The input of this model is expected to be a chart image. You can use the [Nemotron Page Element v3](https://huggingface....
⋯
...ing and localizing various graphic elements within chart images, including titles, axis labels, legends, and data point ...
⋯
**Input Type(s)**: Image <br>
⋯
**Other Properties Related to Input**: Image size resized to `(1024, 1024)`
⋯
from PIL import Image
nvidia / nemotron-table-structure-v1
README.md
model
23 matches
tags: image, detection, pdf, ingestion, yolox, object-detection, en, arxiv:2107.08430, license:other, region:us
*Preview of the model output on the example image.*

The input of this model is expected to be a table image. You can use the [Nemotron Page Element v3](https://huggingface....
⋯
...igned to identify and extract the structure of tables in images. Based on YOLOX, an anchor-free version of YOLO (You Onl...
⋯
The **Nemotron Table Structure v1** model specializes in analyzing images containing tables by:
⋯
3. Enable accurate extraction of tabular data from images
⋯
**Input Type(s)**: Image <br>
xinyu1205 / recognize_anything_model
README.md
model
12 matches
tags: image tagging, image captioning, image-to-text, en, arxiv:2306.03514, arxiv:2303.05657, license:mit, region:us
...ognize-anything.github.io/">Recognize Anything: A Strong Image Tagging Model </a> and <a href="https://tag2text.github.i...
⋯
|  |
|:--:|
| <b> Pull figure from recognize-anything official repo | Image source: https://recognize-anything.github.io/ </b>|
⋯
...nize Anything Model~(RAM): a strong foundation model for image tagging. RAM makes a substantial step for large models in...
⋯
title={Recognize Anything: A Strong Image Tagging Model},
⋯
title={Tag2Text: Guiding Vision-Language Model via Image Tagging},
kviai / Kvi-Upscale-V1
README.md
model
5 matches
huwhitememes / laptophunterbiden_v1-qwen_image
README.md
model
6 matches
tags: image, lora, qwen, hunter-biden, generative-image, huwhitememes, Meme King Studio, Green Frog Labs, NSFW, text-to-image, base_model:Qwen/Qwen-Image, base_model:adapter:Qwen/Qwen-Image, license:apache-2.0, region:us
# Laptop Hunter Biden LoRA for Qwen Image V1

... a custom-trained **LoRA (Low-Rank Adapter)** for **Qwen Image**, fine-tuned on 85+ upscaled and varied images sourced f...
⋯
- **GPU**: Nvidia H100 (WaveSpeedAI)
- **Image Count**: 85 (curated, upscaled, real-world lighting)
- **Trigger Word**: `Hunt3r Bid3n` (recommended at start of prompt)
gymball / FatimaFellowship-UpsideDown
README.md
model
2 matches
unography / PP-HumanSegV1-Lite
README.md
model
2 matches
unography / PP-HumanSegV2-Lite
README.md
model
2 matches
johko / capdec_015
README.md
model
3 matches
johko / capdec_001
README.md
model
3 matches
johko / capdec_005
README.md
model
3 matches
johko / capdec_025
README.md
model
3 matches