AbstractPhila (PRO)
AbstractPhil
69 followers · 85 following
https://civitai.com/user/AbstractPhila
AbstractEyes
AI & ML interests
datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.
Recent Activity
replied to their post · about 23 hours ago
The long version: this is a proof of concept. The ensemble-compilation vmap prototype is functional and can be used to increase throughput for wider batches on FFNs, MLPs, LLMs, and other models, not just ensembles. The system traces your model and splits it into stages of functional execution; based on the stage, it combines or prunes stage groupings so that your layers map onto batched functional calls, keeping pressure on the GPU with fewer loops and with directly curated CUDA-graph compliance where applicable. Identical weights yield identical results, at the cost of extra hardware and VRAM.

TL;DR: this is an ensemble optimization adapted to standard models. It yields large speed improvements through increased throughput for inference and training alike, using carefully traced, staged vmap structures.

https://github.com/AbstractEyes/pytorch-parallel-compiler

The supported layer list isn't fully covered yet, so this is a preliminary look at what the structure can do once fully fleshed out.

MLP (N=100, batch=32, CUDA):
```
Eager: 2-3x speedup
Compiled: 35-40x speedup
```

ResBlock (N=20, batch=8, CUDA):
```
Eager: ~5x speedup
Compiled: ~10x speedup
```

This is early testing, but so far the results indicate that widening your model with adjacent shared batched vmaps over uniformly staged models yields considerably higher inference throughput at the cost of additional hardware utilization. It is akin to lining up all your systems and uniformly passing them through a shared frozen representation gate. Training is not tested or supported yet; use at your own risk.
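The repo's compiler handles the tracing and staging itself; for readers who want the core idea, here is a minimal sketch of the underlying "widen with vmap" pattern using plain torch.func (stack_module_state, functional_call, vmap). The MLP definition, its width, and the N/batch values below are illustrative stand-ins that mirror the benchmark config quoted above; they are not the library's API or its benchmark code.

```python
import copy
import torch
from torch import nn
from torch.func import stack_module_state, functional_call

class MLP(nn.Module):
    """A small stand-in ensemble member (illustrative size, not from the repo)."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
N, batch, dim = 100, 32, 256  # roughly mirrors the MLP benchmark shape above

# N independent members whose parameters are stacked along a new leading
# dimension, so one vmapped call replaces N Python-level forward loops.
members = [MLP(dim).to(device) for _ in range(N)]
params, buffers = stack_module_state(members)

# A meta-device copy supplies the module structure for functional_call
# without holding its own parameters.
base = copy.deepcopy(members[0]).to("meta")

def call_one(p, b, x):
    return functional_call(base, (p, b), (x,))

# vmap over the stacked parameter dim (0) and a per-member input batch.
ensemble_forward = torch.vmap(call_one, in_dims=(0, 0, 0))

# torch.compile can then capture the widened call as a single graph, which is
# roughly where the larger "Compiled" speedups quoted above come from.
compiled_forward = torch.compile(ensemble_forward)

x = torch.randn(N, batch, dim, device=device)
out = compiled_forward(params, buffers, x)  # shape: (N, batch, dim)
```

For a single model rather than a true ensemble, the same trick appears to apply by replicating the frozen weights across the stacked dimension, which is the VRAM-for-throughput trade the post describes.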
posted an update · about 24 hours ago
updated a model · 2 days ago: AbstractPhil/mobiusnet-collective
AbstractPhil's models (105)
Sort: Recently updated
AbstractPhil/mobiusnet-collective • Updated 2 days ago
AbstractPhil/mobiusnet-distillations • Updated 2 days ago
AbstractPhil/mobiusnet • Updated 3 days ago
AbstractPhil/vit-beatrix-contrarian • Updated 15 days ago
AbstractPhil/vit-beatrix-contrarian-baselines • Updated 15 days ago
AbstractPhil/beatrix-diffusion-proto • Updated 16 days ago • 371
AbstractPhil/global_fractal_router • Updated Dec 13, 2025
AbstractPhil/agatha-diffusion-proto • Updated Dec 9, 2025
AbstractPhil/math_collective_v2 • Updated Dec 4, 2025
AbstractPhil/math_collective_v1 • Text Classification • Updated Dec 4, 2025
AbstractPhil/geovit-david-beans • Image Classification • Updated Dec 3, 2025 • 12
AbstractPhil/geovit-david-beans-run002-5expert • Image Classification • Updated Nov 30, 2025 • 8
AbstractPhil/vae-lyra-xl-adaptive-cantor-illustrious • Updated Nov 28, 2025 • 70 • 1
AbstractPhil/bert-beatrix-200_000 • Updated Nov 27, 2025
AbstractPhil/clips • 5B • Updated Nov 25, 2025 • 18 • 19
AbstractPhil/vit-beans-v3 • Image Classification • Updated Nov 24, 2025 • 2
AbstractPhil/lune-leco-adapters • Updated Nov 20, 2025
AbstractPhil/liminal-staircase-v2 • Updated Nov 17, 2025
AbstractPhil/liminal-staircase-danbooru-v2 • Updated Nov 16, 2025 • 3
AbstractPhil/liminal-staircase-danbooru • Updated Nov 16, 2025 • 6
AbstractPhil/sd15-flow-lune-flux • Updated Nov 13, 2025
AbstractPhil/sd15-flow-lune • Text-to-Image • Updated Nov 11, 2025 • 9
AbstractPhil/vae-lyra-xl-adaptive-cantor • Updated Nov 11, 2025 • 4 • 1
AbstractPhil/vae-lyra-sdxl-t5xl • Updated Nov 10, 2025 • 6 • 3
AbstractPhil/vae-lyra • Any-to-Any • Updated Nov 7, 2025 • 21 • 2
AbstractPhil/sd15-flow-matching • Text-to-Image • Updated Nov 7, 2025 • 374 • 3
AbstractPhil/gated-david • Image Classification • Updated Nov 4, 2025
AbstractPhil/sd15-flow-matching-try2 • Text-to-Image • Updated Nov 4, 2025 • 4
AbstractPhil/cantor-linear-imagenet • Updated Oct 30, 2025
AbstractPhil/cantor-resnet-imagenet • Updated Oct 30, 2025