Hugging Face
Pavan Kumar Balijepalli (pavankumarbalijepalli)
5 followers · 6 following
pavankumarbalijepalli · pavan-kumar-balijepalli
AI & ML interests
Learn. Build. Teach.
Recent Activity
posted an update 9 days ago:

Long-context LLMs just hit a massive speed wall: processing long sequences is computationally expensive due to the quadratic complexity of self-attention. Existing sparse attention methods often rely on sorting or cumulative summation (Top-k/Top-p), which are slow and struggle to prune the "long-tail" of irrelevant tokens.

- FlashPrefill achieves a 27.78× speedup on 256K sequences by replacing heavy sorting with a Max-based Dynamic Thresholding mechanism.
- It introduces "Instantaneous Pattern Discovery" using block-level approximations, bypassing the need for expensive full-attention score calculations.
- Unlike previous methods, which struggle with shorter contexts, it maintains a 1.71× speedup even at 4K, showing robustness across scales.
- The framework is compatible with existing LLM/VLM architectures and integrates into vLLM for real-world deployment.

This significantly reduces Time-to-First-Token (TTFT) for long-context applications, making massive document analysis and long-video understanding practical and cost-effective. It turns a major performance bottleneck into a streamlined, hardware-efficient process.

How much compute are we wasting on "long-tail" tokens that don't actually matter? FlashPrefill suggests the answer is: a lot.

#AI #LLMs #MachineLearning #DeepLearning #TechInnovation #GPUComputing

Source: https://arxiv.org/pdf/2603.06199
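The post doesn't show the mechanism itself, but the core idea (keep a key block if its approximate score clears a fraction of the row maximum, instead of sorting for Top-k) can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the paper's exact formulation: the mean-pooled block approximation and the `alpha` keep-fraction are hypothetical stand-ins for whatever FlashPrefill actually uses.

```python
import numpy as np

def select_blocks_max_threshold(scores, alpha):
    """Max-based dynamic thresholding (sketch): keep every key block whose
    approximate attention score is at least `alpha` times the maximum.
    A single max + elementwise compare replaces the O(n log n) sort that
    Top-k needs, and the cutoff adapts per query, pruning the long tail
    of near-zero blocks."""
    threshold = alpha * scores.max()
    return np.flatnonzero(scores >= threshold)

# Block-level approximation (assumed): score each key block by the query's
# dot product with the block's mean key, rather than computing full
# token-level attention scores.
rng = np.random.default_rng(0)
num_blocks, block_size, dim = 64, 128, 64
keys = rng.normal(size=(num_blocks, block_size, dim))
query = rng.normal(size=dim)

block_keys = keys.mean(axis=1)              # (num_blocks, dim) pooled keys
approx_scores = np.exp(block_keys @ query)  # unnormalised attention mass

kept = select_blocks_max_threshold(approx_scores, alpha=0.5)
print(f"kept {kept.size}/{num_blocks} candidate blocks")
```

Only the kept blocks would then get exact attention, which is where the prefill savings come from; the highest-scoring block always survives the threshold by construction.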
updated a Space 16 days ago: pavankumarbalijepalli/portfolio
published a Space 16 days ago: pavankumarbalijepalli/portfolio
Organizations

pavankumarbalijepalli's activity
liked 3 models about 2 months ago:
- Disty0/LTX-2-SDNQ-4bit-dynamic • Updated Jan 8 • 339 downloads • 12 likes
- tencent/HunyuanVideo-1.5 • Text-to-Video • Updated Dec 25, 2025 • 627 downloads • 583 likes
- meituan-longcat/LongCat-Video • Text-to-Video • Updated Oct 29, 2025 • 746 downloads • 448 likes
liked a model 4 months ago:
- neuphonic/neutts-air • Text-to-Speech • 0.7B • Updated Feb 12 • 11.2k downloads • 860 likes
liked a model 7 months ago:
- rednote-hilab/dots.ocr • Image-Text-to-Text • 3B • Updated Oct 31, 2025 • 263k downloads • 1.27k likes
liked a dataset 12 months ago:
- NousResearch/hermes-function-calling-v1 • Viewer • Updated Jan 3 • 11.6k rows • 5.26k downloads • 385 likes
liked 3 models 12 months ago:
- OpenHands/openhands-lm-32b-v0.1 • Text Generation • 33B • Updated Apr 16, 2025 • 316 downloads • 392 likes
- Qwen/Qwen2.5-Omni-7B • Any-to-Any • Updated Apr 30, 2025 • 442k downloads • 1.87k likes
- rasbt/llama-3.2-from-scratch • Updated Jun 12, 2025 • 284
liked 4 models about 1 year ago:
- google/gemma-3-12b-it • Image-Text-to-Text • Updated Mar 21, 2025 • 1.99M downloads • 681 likes
- pavankumarbalijepalli/telLM-gemma2-9b-16bit • Text Generation • 9B • Updated May 15, 2025 • 1 download • 1 like
- pavankumarbalijepalli/phi2-nl2sql-lora • Text Generation • 3B • Updated Feb 28, 2025 • 8 downloads • 1 like
- pavankumarbalijepalli/telLM-gemma2-9b • Updated Mar 1, 2025 • 1
liked a dataset about 1 year ago:
- eswardivi/telugu_instruction_dataset • Viewer • Updated Feb 1, 2024 • 145k rows • 33 downloads • 5 likes
liked 2 models about 1 year ago:
- sarvamai/sarvam-1 • Text Generation • 3B • Updated Nov 8, 2024 • 8.98k downloads • 132 likes
- watt-ai/watt-tool-8B • Updated Dec 20, 2024 • 227k downloads • 117 likes
liked 3 datasets about 1 year ago:
- Salesforce/xlam-function-calling-60k • Viewer • Updated Jan 24, 2025 • 60k rows • 6.48k downloads • 581 likes
- indiehackers/hellaswag-telugu-custom • Viewer • Updated Apr 22, 2024 • 10k rows • 8 downloads • 1 like
- indiehackers/Telugu_InstructData • Viewer • Updated Mar 2, 2024 • 33.4k rows • 6 downloads • 1 like
liked a model about 1 year ago:
- microsoft/phi-4 • Text Generation • Updated Nov 24, 2025 • 986k downloads • 2.22k likes