Full-text search
1,000+ results
Aleistar / Gengar_toy_example
README.md
dataset
1 match
dhirajudhani / image
README.md
model
5 matches
tags: diffusers, flux, lora, replicate, text-to-image, en, base_model:black-forest-labs/FLUX.1-dev, base_model:adapter:black-forest-labs/FLUX.1-dev, license:other, region:us
# Image
⋯
You should use `dhiraj` to trigger the image generation.
⋯
pipeline.load_lora_weights('dhirajudhani/image', weight_name='lora.safetensors')
image = pipeline('your prompt').images[0]
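The two code lines above are only a fragment of the card's example. A minimal sketch of the surrounding diffusers flow, assuming the FLUX.1-dev base model and `FluxPipeline` from the tags (the dtype and helper function are illustrative, not the card's exact code):

```python
# Hypothetical sketch around the snippet above. The repo id, weight name,
# and trigger word come from the search result; the rest is an assumption.

def build_prompt(subject: str, trigger: str = "dhiraj") -> str:
    # The card says `dhiraj` must appear in the prompt to trigger the LoRA.
    return f"{trigger}, {subject}"

def generate(subject: str):
    # Heavy imports kept inside the function so the sketch can be read
    # without downloading model weights.
    import torch
    from diffusers import FluxPipeline  # assumes the FLUX.1-dev base model

    pipeline = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    pipeline.load_lora_weights("dhirajudhani/image", weight_name="lora.safetensors")
    return pipeline(build_prompt(subject)).images[0]
```

Calling `generate("a portrait")` would run the full pipeline; `build_prompt` alone shows how the trigger word is prepended.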
nvidia / nemotron-ocr-v2
README.md
model
28 matches
tags: image, ocr, object recognition, text recognition, layout analysis, ingestion, multilingual, image-to-text, en, zh, ja, ko, ru, license:other, region:us
...ptical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a dete...
...on speed and accuracy on both document and natural scene images.
⋯
...cy and high-speed extraction of textual information from images across multiple languages, making it ideal for powering ...
⋯
...ne for high-accuracy localization of text regions within images.
⋯
... and production-ready OCR for diverse document and scene images.
⋯
| Input Type & Format | Image (RGB, PNG/JPEG, float32/uint8), aggregation level (word, sentence, or paragraph) |
| Input Parameters (Two-Dimensional) | 3 x H x W (single image) or B x 3 x H x W (batch) |
| Input Range | [0, 1] (float32) or [0, 255] (uint8, auto-converted) |
| Other Properties | Handles both single images and batches. Automatic multi-scale resizing for best accuracy. |
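The input-range row says uint8 arrays in [0, 255] are auto-converted to float32 in [0, 1]. A sketch of what that contract implies, using NumPy; this mirrors the stated behavior, not the model's own preprocessing code:

```python
import numpy as np

def to_model_range(img: np.ndarray) -> np.ndarray:
    # uint8 in [0, 255] -> float32 in [0, 1]; float inputs pass through
    # as float32 (an assumption matching the table's input range row).
    if img.dtype == np.uint8:
        return img.astype(np.float32) / 255.0
    return img.astype(np.float32)

u8 = np.array([[0, 128, 255]], dtype=np.uint8)
f32 = to_model_range(u8)
print(f32.dtype, f32.min(), f32.max())  # float32 0.0 1.0
```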
nvidia / nemotron-ocr-v1
README.md
model
24 matches
tags: image, ocr, object recognition, text recognition, layout analysis, ingestion, image-to-text, en, license:other, region:us
*Preview of the model output on the example image.*
⋯
...ptical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a dete...
...on speed and accuracy on both document and natural scene images.
⋯
...cy and high-speed extraction of textual information from images, making it ideal for powering multimodal retrieval syste...
⋯
...ne for high-accuracy localization of text regions within images.
⋯
... and production-ready OCR for diverse document and scene images.
qpqpqpqpqpqp / Ovis_Image_7B_fp8
README.md
model
2 matches
tags: image generation, comfyui, text-to-image, en, zh, base_model:AIDC-AI/Ovis-Image-7B, base_model:finetune:AIDC-AI/Ovis-Image-7B, license:apache-2.0, region:us
<div align="center">The world's first fp8 quants of Ovis Image 7B!
<img src=https://cdn-uploads.huggingface.co/production/uploads/636f4c6b5d2050767e4a1491/cfsnngElzYv8DbTKsLohl.png widt...
</div>
nvidia / nemotron-page-elements-v3
README.md
model
16 matches
tags: image, detection, pdf, ingestion, yolox, object-detection, en, arxiv:2107.08430, license:other, region:us
*Preview of the model output on the example image.*
⋯
**Input Type(s)**: Image <br>
⋯
**Other Properties Related to Input**: Image size resized to `(1024, 1024)`
⋯
from PIL import Image
⋯
# Load image
path = "./example.png"
img = Image.open(path).convert("RGB")
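The snippet loads an image with Pillow, and the card separately states inputs are resized to `(1024, 1024)`. A self-contained sketch combining the two, with an in-memory image standing in for `./example.png` (the `resize` call is an assumed client-side step, not the card's exact code):

```python
from PIL import Image

# In-memory RGB image standing in for "./example.png".
img = Image.new("RGB", (640, 480), color=(255, 255, 255))

# The card says input images are resized to (1024, 1024);
# doing it with Image.resize here is an illustrative assumption.
resized = img.resize((1024, 1024))
print(resized.size)  # (1024, 1024)
```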
nvidia / nemotron-graphic-elements-v1
README.md
model
24 matches
tags: image, detection, pdf, ingestion, yolox, object-detection, en, arxiv:2107.08430, arxiv:2305.04151, license:other, region:us
*Preview of the model output on the example image.*

The input of this model is expected to be a chart image. You can use the [Nemotron Page Element v3](https://huggingface....
⋯
...ing and localizing various graphic elements within chart images, including titles, axis labels, legends, and data point ...
⋯
**Input Type(s)**: Image <br>
⋯
**Other Properties Related to Input**: Image size resized to `(1024, 1024)`
⋯
from PIL import Image
nvidia / nemotron-table-structure-v1
README.md
model
23 matches
tags: image, detection, pdf, ingestion, yolox, object-detection, en, arxiv:2107.08430, license:other, region:us
*Preview of the model output on the example image.*

The input of this model is expected to be a table image. You can use the [Nemotron Page Element v3](https://huggingface....
⋯
...igned to identify and extract the structure of tables in images. Based on YOLOX, an anchor-free version of YOLO (You Onl...
⋯
The **Nemotron Table Structure v1** model specializes in analyzing images containing tables by:
⋯
3. Enable accurate extraction of tabular data from images
⋯
**Input Type(s)**: Image <br>
xinyu1205 / recognize_anything_model
README.md
model
12 matches
tags: image tagging, image captioning, image-to-text, en, arxiv:2306.03514, arxiv:2303.05657, license:mit, region:us
...ognize-anything.github.io/">Recognize Anything: A Strong Image Tagging Model </a> and <a href="https://tag2text.github.i...
⋯
|  |
|:--:|
| <b> Pull figure from recognize-anything official repo | Image source: https://recognize-anything.github.io/ </b>|
⋯
...nize Anything Model~(RAM): a strong foundation model for image tagging. RAM makes a substantial step for large models in...
⋯
title={Recognize Anything: A Strong Image Tagging Model},
⋯
title={Tag2Text: Guiding Vision-Language Model via Image Tagging},
kviai / Kvi-Upscale-V1
README.md
model
5 matches
huwhitememes / laptophunterbiden_v1-qwen_image
README.md
model
6 matches
tags: image, lora, qwen, hunter-biden, generative-image, huwhitememes, Meme King Studio, Green Frog Labs, NSFW, text-to-image, base_model:Qwen/Qwen-Image, base_model:adapter:Qwen/Qwen-Image, license:apache-2.0, region:us
# Laptop Hunter Biden LoRA for Qwen Image V1

... a custom-trained **LoRA (Low-Rank Adapter)** for **Qwen Image**, fine-tuned on 85+ upscaled and varied images sourced f...
⋯
- **GPU**: Nvidia H100 (WaveSpeedAI)
- **Image Count**: 85 (curated, upscaled, real-world lighting)
- **Trigger Word**: `Hunt3r Bid3n` (recommended at start of prompt)
gymball / FatimaFellowship-UpsideDown
README.md
model
2 matches
unography / PP-HumanSegV1-Lite
README.md
model
2 matches
unography / PP-HumanSegV2-Lite
README.md
model
2 matches
johko / capdec_015
README.md
model
3 matches
johko / capdec_001
README.md
model
3 matches
johko / capdec_005
README.md
model
3 matches
johko / capdec_025
README.md
model
3 matches