Hugging Face
Pavan Kumar Balijepalli (pavankumarbalijepalli)
5 followers · 6 following
pavankumarbalijepalli · pavan-kumar-balijepalli
AI & ML interests
Learn. Build. Teach.
Recent Activity
posted an update 9 days ago:

Long-context LLMs just hit a massive speed wall: processing long sequences is computationally expensive due to the quadratic complexity of self-attention. Existing sparse attention methods often rely on sorting or cumulative summation (Top-k/Top-p), which are slow and struggle to prune the "long-tail" of irrelevant tokens.

- FlashPrefill achieves a 27.78× speedup on 256K sequences by replacing heavy sorting with a Max-based Dynamic Thresholding mechanism.
- It introduces "Instantaneous Pattern Discovery" using block-level approximations, bypassing the need for expensive full-attention score calculations.
- Unlike previous methods, which struggle with shorter contexts, it maintains a 1.71× speedup even at 4K, showing robustness across scales.
- The framework is compatible with existing LLM/VLM architectures and integrates into vLLM for real-world deployment.

This significantly reduces Time-to-First-Token (TTFT) for long-context applications, making massive document analysis and long-video understanding practical and cost-effective. It turns a major performance bottleneck into a streamlined, hardware-efficient process.

How much compute are we wasting on "long-tail" tokens that don't actually matter? FlashPrefill suggests the answer is: a lot.

#AI #LLMs #MachineLearning #DeepLearning #TechInnovation #GPUComputing

Source: https://arxiv.org/pdf/2603.06199
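The post doesn't show the mechanism itself, but the core idea (keep a key block if its approximate score clears a fraction of the row maximum, instead of sorting for Top-k) can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the paper's exact formulation: the mean-pooled block approximation and the `alpha` keep-fraction are hypothetical stand-ins for whatever FlashPrefill actually uses.

```python
import numpy as np

def select_blocks_max_threshold(scores, alpha):
    """Max-based dynamic thresholding (sketch): keep every key block whose
    approximate attention score is at least `alpha` times the maximum.
    A single max + elementwise compare replaces the O(n log n) sort that
    Top-k needs, and the cutoff adapts per query, pruning the long tail
    of near-zero blocks."""
    threshold = alpha * scores.max()
    return np.flatnonzero(scores >= threshold)

# Block-level approximation (assumed): score each key block by the query's
# dot product with the block's mean key, rather than computing full
# token-level attention scores.
rng = np.random.default_rng(0)
num_blocks, block_size, dim = 64, 128, 64
keys = rng.normal(size=(num_blocks, block_size, dim))
query = rng.normal(size=dim)

block_keys = keys.mean(axis=1)              # (num_blocks, dim) pooled keys
approx_scores = np.exp(block_keys @ query)  # unnormalised attention mass

kept = select_blocks_max_threshold(approx_scores, alpha=0.5)
print(f"kept {kept.size}/{num_blocks} candidate blocks")
```

Only the kept blocks would then get exact attention, which is where the prefill savings come from; the highest-scoring block always survives the threshold by construction.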
updated a Space 16 days ago: pavankumarbalijepalli/portfolio
published a Space 16 days ago: pavankumarbalijepalli/portfolio
Organizations

pavankumarbalijepalli's activity
liked 3 models about 2 months ago:
- Disty0/LTX-2-SDNQ-4bit-dynamic • Updated Jan 8 • 339 downloads • 12 likes
- tencent/HunyuanVideo-1.5 • Text-to-Video • Updated Dec 25, 2025 • 627 downloads • 583 likes
- meituan-longcat/LongCat-Video • Text-to-Video • Updated Oct 29, 2025 • 746 downloads • 448 likes
liked a model 4 months ago:
- neuphonic/neutts-air • Text-to-Speech • 0.7B • Updated Feb 12 • 11.2k downloads • 860 likes
liked a model 7 months ago:
- rednote-hilab/dots.ocr • Image-Text-to-Text • 3B • Updated Oct 31, 2025 • 263k downloads • 1.27k likes
liked a dataset 12 months ago:
- NousResearch/hermes-function-calling-v1 • Viewer • Updated Jan 3 • 11.6k rows • 5.26k downloads • 385 likes
liked 3 models 12 months ago:
- OpenHands/openhands-lm-32b-v0.1 • Text Generation • 33B • Updated Apr 16, 2025 • 316 downloads • 392 likes
- Qwen/Qwen2.5-Omni-7B • Any-to-Any • Updated Apr 30, 2025 • 442k downloads • 1.87k likes
- rasbt/llama-3.2-from-scratch • Updated Jun 12, 2025 • 284
liked 4 models about 1 year ago:
- google/gemma-3-12b-it • Image-Text-to-Text • Updated Mar 21, 2025 • 1.99M downloads • 681 likes
- pavankumarbalijepalli/telLM-gemma2-9b-16bit • Text Generation • 9B • Updated May 15, 2025 • 1 download • 1 like
- pavankumarbalijepalli/phi2-nl2sql-lora • Text Generation • 3B • Updated Feb 28, 2025 • 8 downloads • 1 like
- pavankumarbalijepalli/telLM-gemma2-9b • Updated Mar 1, 2025 • 1
liked a dataset about 1 year ago:
- eswardivi/telugu_instruction_dataset • Viewer • Updated Feb 1, 2024 • 145k rows • 33 downloads • 5 likes
liked 2 models about 1 year ago:
- sarvamai/sarvam-1 • Text Generation • 3B • Updated Nov 8, 2024 • 8.98k downloads • 132 likes
- watt-ai/watt-tool-8B • Updated Dec 20, 2024 • 227k downloads • 117 likes
liked 3 datasets about 1 year ago:
- Salesforce/xlam-function-calling-60k • Viewer • Updated Jan 24, 2025 • 60k rows • 6.48k downloads • 581 likes
- indiehackers/hellaswag-telugu-custom • Viewer • Updated Apr 22, 2024 • 10k rows • 8 downloads • 1 like
- indiehackers/Telugu_InstructData • Viewer • Updated Mar 2, 2024 • 33.4k rows • 6 downloads • 1 like
liked a model about 1 year ago:
- microsoft/phi-4 • Text Generation • Updated Nov 24, 2025 • 986k downloads • 2.22k likes