OpenGVLab

community

https://github.com/opengvlab

opengvlab

OpenGVLab

Activity Feed Request to join this org

AI & ML interests

Computer Vision

Recent Activity

YYangzzzz authored a paper 1 day ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

vansin submitted a paper 2 days ago

End-to-End Video Character Replacement without Structural Guidance

heroding77 authored a paper 3 days ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

View all activity

Papers

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

View all Papers

Eurayka

submitted a paper to Daily Papers about 1 hour ago

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Paper • 2601.10129 • Published about 22 hours ago • 2

YYangzzzz

authored a paper 1 day ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published 3 days ago • 24

vansin

submitted a paper to Daily Papers 2 days ago

End-to-End Video Character Replacement without Structural Guidance

Paper • 2601.08587 • Published 3 days ago • 6

heroding77

authored a paper 3 days ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published 3 days ago • 24

heroding77

submitted a paper to Daily Papers 3 days ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published 3 days ago • 24

prithivMLmods

posted an update 5 days ago

Post

4086

LTX-2 Camera-Control LoRA demo with dolly-in/out and dolly-left/right is now available on Hugging Face, paired with ltx-2-19b-distilled-lora for fast inference. It also includes dynamic GPU duration adjustments for long video generations. Click the related Space links below.

🤗Try it now on : prithivMLmods/LTX-2-LoRAs-Camera-Control-Dolly
⭐Github: https://github.com/PRITHIVSAKTHIUR/LTX-2-LoRAs-Camera-Control-Dolly
🕹️Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To learn more, visit the app page or the respective model pages.

2 replies

Nymbo

posted an update 8 days ago

Post

1682

Genuine recommendation: You should really use this AutoHotKey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, press Ctrl + Alt + 1 and paste your prompt to any regular chatbot. Then send the output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro prolly 100-200 times per day. AutoHotKey isn't as new or hype as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <variable>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return

In my experience the past few months, Ctrl + Alt + 1 works best with Instruct models (non-thinking). Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.

prithivMLmods

posted an update 12 days ago

Post

2392

Dropping Image Edit (Object Manipulator): Add or remove specified objects/designs, with flexible support for both single-image and multi-image modes.

🤗 Demo: prithivMLmods/Qwen-Image-Edit-Object-Manipulator

Qwen-Image-Edit-2511-Object-Remover is an adapter (LoRA) developed for Qwen’s Qwen-Image-Edit-2511 image-to-image model. It is specifically designed for precise object removal from images.

⭐ Model: prithivMLmods/Qwen-Image-Edit-2511-Object-Remover

Qwen-Image-Edit-2511-Object-Adder is an adapter (LoRA) developed for Qwen’s Qwen-Image-Edit-2511 image-to-image model. It is specifically designed for precise object addition to images.

⭐ Model: prithivMLmods/Qwen-Image-Edit-2511-Object-Adder

🕹️ Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-object-manipulator
🕹️ github: https://github.com/PRITHIVSAKTHIUR/Qwen-Image-Edit-Object-Manipulator

To learn more, visit the app page or the respective model pages.

kpzhang996

submitted a paper to Daily Papers 17 days ago

Yume-1.5: A Text-Controlled Interactive World Generation Model

Paper • 2512.22096 • Published 20 days ago • 59

prithivMLmods

posted an update 18 days ago

Post

4134

Update: TRELLIS.2 (Text to 3D, Image to 3D) Gradio with Rerun Embedded demo with improved visualization of the 3D model previewer is now available on Hugging Face. Generate assets and view them in the 3D viewer, powered and streamlined with Microsoft’s TRELLIS.2 and Tongyi-MAI’s Z-Image-Turbo models.

🤗 TRELLIS.2 (Demo): prithivMLmods/TRELLIS.2-Text-to-3D
🕹️ GitHub: https://github.com/PRITHIVSAKTHIUR/TRELLIS.2-Text-to-3D-RERUN
🕹️ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!

prithivMLmods

posted an update 20 days ago

Post

4207

Introducing the Qwen-Image-Edit-2511-LoRAs-Fast demo, featuring image property comparison and contrast, built on top of Gradio and the combined Rerun SDK. It supports single and multi-image edits with existing LoRAs that are lazily loaded. (Note: This is still an experimental Space for Qwen-Image-Edit-2511.)

⭐ Space Demo: prithivMLmods/Qwen-Image-Edit-2511-LoRAs-Fast
⭐ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-Image-Edit-2511-LoRAs-Fast-Multi-Image-Rerun
⭐ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To know more about it, visit the app page or the respective model page!

2 replies

lll2343

updated 3 models 20 days ago

revliter

updated a collection 23 days ago

InternVideo-Next

Collection

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision • 5 items • Updated 23 days ago • 4

prithivMLmods

posted an update 26 days ago

Post

3709

Introducing demos for new SOTA models from AI2: SAGE-MM (Smart Any-Horizon Agents for Long-Video Reasoning) and Molmo-2, an open vision-language model that supports multi-image (QA and pointing) and video (QA, pointing, and tracking). The respective demo-related collections are listed below. 🎃🔥

✨ SAGE-MM [Video-Reasoning]: prithivMLmods/SAGE-MM-Video-Reasoning
✨ Molmo2 [Demo]: prithivMLmods/Molmo2-HF-Demo

🎃 GitHub[SAGE-MM]: https://github.com/PRITHIVSAKTHIUR/SAGE-MM-Video-Reasoning
🎃 GitHub[Molmo2]: https://github.com/PRITHIVSAKTHIUR/Molmo2-HF-Demo
🎃 Multimodal Implementations: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

To know more about it, visit the app page or the respective model page!

1 reply

vansin

posted an update 27 days ago

Post

266

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

prithivMLmods

posted an update 27 days ago

Post

2070

Introducing TRELLIS.2 Text-to-3D. The demo for the TRELLIS.2-4B (Image-to-3D) model is streamlined with the Z-Image Turbo image generation model to enable Text-to-3D functionality. There is no need for input assets, making a small leap forward for ideation. Optionally, it also includes default support for Image-to-3D inference using direct image assets. Find the demo and related collections below... 🤗🔥

✨ TRELLIS.2-Text-to-3D [Demo]: prithivMLmods/TRELLIS.2-Text-to-3D
✨ Multimodal Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Github: https://github.com/PRITHIVSAKTHIUR/TRELLIS.2-Text-to-3D

To know more about it, visit the app page or the respective model page!

Eurayka

authored a paper 28 days ago

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

Paper • 2511.20272 • Published Nov 25, 2025 • 1

Nymbo

posted an update 28 days ago

Post

2104

🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: The tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, no new infrastructure required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets

I've included a music-downloader skill as a working demo, it wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot

AI & ML interests

Recent Activity

Papers

Team members 117

OpenGVLab's activity