AbstractPhila (PRO)
AbstractPhil
69 followers · 85 following
https://civitai.com/user/AbstractPhila
AbstractEyes
AI & ML interests
datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.
Recent Activity
replied to their post · about 23 hours ago
The long version: this is a proof of concept. The ensemble-compilation vmap prototype is functional and can be used to increase throughput for wider batches on FFNs, MLPs, LLMs, and other models, not just ensembles. The system traces your model and splits it into stages of functional execution; based on the stage, it combines or prunes stage groupings so that your layers map onto batched functional calls, keeping pressure on the GPU with fewer loops and with directly curated CUDA-graph compliance where applicable. Identical weights yield identical results, at the cost of extra hardware and VRAM.

TL;DR: this is an ensemble optimization adapted to standard models. It yields large speed improvements through increased throughput for inference and training alike, using carefully traced, staged vmap structures.

https://github.com/AbstractEyes/pytorch-parallel-compiler

The supported layer list isn't fully covered yet, so this is a preliminary look at what the structure can do once fully fleshed out.

MLP (N=100, batch=32, CUDA):
```
Eager: 2-3x speedup
Compiled: 35-40x speedup
```

ResBlock (N=20, batch=8, CUDA):
```
Eager: ~5x speedup
Compiled: ~10x speedup
```

This is early testing, but so far the results indicate that widening your model with adjacent shared batched vmaps over uniformly staged models yields considerably higher inference throughput at the cost of additional hardware utilization. It is akin to lining up all your systems and uniformly passing them through a shared frozen representation gate. Training is not tested or supported yet; use at your own risk.
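The repo's compiler handles the tracing and staging itself; for readers who want the core idea, here is a minimal sketch of the underlying "widen with vmap" pattern using plain torch.func (stack_module_state, functional_call, vmap). The MLP definition, its width, and the N/batch values below are illustrative stand-ins that mirror the benchmark config quoted above; they are not the library's API or its benchmark code.

```python
import copy
import torch
from torch import nn
from torch.func import stack_module_state, functional_call

class MLP(nn.Module):
    """A small stand-in ensemble member (illustrative size, not from the repo)."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
N, batch, dim = 100, 32, 256  # roughly mirrors the MLP benchmark shape above

# N independent members whose parameters are stacked along a new leading
# dimension, so one vmapped call replaces N Python-level forward loops.
members = [MLP(dim).to(device) for _ in range(N)]
params, buffers = stack_module_state(members)

# A meta-device copy supplies the module structure for functional_call
# without holding its own parameters.
base = copy.deepcopy(members[0]).to("meta")

def call_one(p, b, x):
    return functional_call(base, (p, b), (x,))

# vmap over the stacked parameter dim (0) and a per-member input batch.
ensemble_forward = torch.vmap(call_one, in_dims=(0, 0, 0))

# torch.compile can then capture the widened call as a single graph, which is
# roughly where the larger "Compiled" speedups quoted above come from.
compiled_forward = torch.compile(ensemble_forward)

x = torch.randn(N, batch, dim, device=device)
out = compiled_forward(params, buffers, x)  # shape: (N, batch, dim)
```

For a single model rather than a true ensemble, the same trick appears to apply by replicating the frozen weights across the stacked dimension, which is the VRAM-for-throughput trade the post describes.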
posted an update · about 24 hours ago
updated a model · 2 days ago: AbstractPhil/mobiusnet-collective
AbstractPhil's models (105)
Sort: Recently updated
AbstractPhil/mobiusnet-collective • Updated 2 days ago
AbstractPhil/mobiusnet-distillations • Updated 2 days ago
AbstractPhil/mobiusnet • Updated 3 days ago
AbstractPhil/vit-beatrix-contrarian • Updated 15 days ago
AbstractPhil/vit-beatrix-contrarian-baselines • Updated 15 days ago
AbstractPhil/beatrix-diffusion-proto • Updated 16 days ago • 371
AbstractPhil/global_fractal_router • Updated Dec 13, 2025
AbstractPhil/agatha-diffusion-proto • Updated Dec 9, 2025
AbstractPhil/math_collective_v2 • Updated Dec 4, 2025
AbstractPhil/math_collective_v1 • Text Classification • Updated Dec 4, 2025
AbstractPhil/geovit-david-beans • Image Classification • Updated Dec 3, 2025 • 12
AbstractPhil/geovit-david-beans-run002-5expert • Image Classification • Updated Nov 30, 2025 • 8
AbstractPhil/vae-lyra-xl-adaptive-cantor-illustrious • Updated Nov 28, 2025 • 70 • 1
AbstractPhil/bert-beatrix-200_000 • Updated Nov 27, 2025
AbstractPhil/clips • 5B • Updated Nov 25, 2025 • 18 • 19
AbstractPhil/vit-beans-v3 • Image Classification • Updated Nov 24, 2025 • 2
AbstractPhil/lune-leco-adapters • Updated Nov 20, 2025
AbstractPhil/liminal-staircase-v2 • Updated Nov 17, 2025
AbstractPhil/liminal-staircase-danbooru-v2 • Updated Nov 16, 2025 • 3
AbstractPhil/liminal-staircase-danbooru • Updated Nov 16, 2025 • 6
AbstractPhil/sd15-flow-lune-flux • Updated Nov 13, 2025
AbstractPhil/sd15-flow-lune • Text-to-Image • Updated Nov 11, 2025 • 9
AbstractPhil/vae-lyra-xl-adaptive-cantor • Updated Nov 11, 2025 • 4 • 1
AbstractPhil/vae-lyra-sdxl-t5xl • Updated Nov 10, 2025 • 6 • 3
AbstractPhil/vae-lyra • Any-to-Any • Updated Nov 7, 2025 • 21 • 2
AbstractPhil/sd15-flow-matching • Text-to-Image • Updated Nov 7, 2025 • 374 • 3
AbstractPhil/gated-david • Image Classification • Updated Nov 4, 2025
AbstractPhil/sd15-flow-matching-try2 • Text-to-Image • Updated Nov 4, 2025 • 4
AbstractPhil/cantor-linear-imagenet • Updated Oct 30, 2025
AbstractPhil/cantor-resnet-imagenet • Updated Oct 30, 2025