Interesting - a mhykes Collection

mhykes's Collections

Interesting

updated Jun 4, 2024

Interesting things.

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Paper • 2403.00745 • Published Mar 1, 2024 • 14
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 627
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26, 2024 • 25
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 116
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 45
Learning to Learn Faster from Human Feedback with Language Model Predictive Control

Paper • 2402.11450 • Published Feb 18, 2024 • 22
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

Paper • 2402.06149 • Published Feb 9, 2024 • 18
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay

Paper • 2402.04858 • Published Feb 7, 2024 • 15
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 117
Rethinking Optimization and Architecture for Tiny Language Models

Paper • 2402.02791 • Published Feb 5, 2024 • 13
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 141
LongAlign: A Recipe for Long Context Alignment of Large Language Models

Paper • 2401.18058 • Published Jan 31, 2024 • 24
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29, 2024 • 53
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

Paper • 2401.12963 • Published Jan 23, 2024 • 12
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding

Paper • 2401.12954 • Published Jan 23, 2024 • 33
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Paper • 2401.10774 • Published Jan 19, 2024 • 59
Transformers are Multi-State RNNs

Paper • 2401.06104 • Published Jan 11, 2024 • 39
Learning to Decode Collaboratively with Multiple Language Models

Paper • 2403.03870 • Published Mar 6, 2024 • 21
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

Paper • 2403.04746 • Published Mar 7, 2024 • 24
PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Paper • 2403.10704 • Published Mar 15, 2024 • 60
Larimar: Large Language Models with Episodic Memory Control

Paper • 2403.11901 • Published Mar 18, 2024 • 33
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Paper • 2403.09704 • Published Mar 8, 2024 • 32
Recurrent Drafter for Fast Speculative Decoding in Large Language Models

Paper • 2403.09919 • Published Mar 14, 2024 • 21
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26, 2024 • 82
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs

Paper • 2403.20041 • Published Mar 29, 2024 • 34
Simple and Scalable Strategies to Continually Pre-train Large Language Models

Paper • 2403.08763 • Published Mar 13, 2024 • 51
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 107
Chronos: Learning the Language of Time Series

Paper • 2403.07815 • Published Mar 12, 2024 • 48
MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12, 2024 • 77
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Paper • 2403.07816 • Published Mar 12, 2024 • 44
V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11, 2024 • 30
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6, 2024 • 189
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26
FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published May 2, 2024 • 29
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2, 2024 • 124
Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published May 1, 2024 • 28
Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30, 2024 • 81
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Paper • 2404.19702 • Published Apr 30, 2024 • 20
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

Paper • 2404.15420 • Published Apr 23, 2024 • 11
RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 71
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21, 2024 • 33
Towards Modular LLMs by Building and Reusing a Library of LoRAs

Paper • 2405.11157 • Published May 18, 2024 • 31
Not All Language Model Features Are Linear

Paper • 2405.14860 • Published May 23, 2024 • 40
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Paper • 2405.18386 • Published May 28, 2024 • 22
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published May 30, 2024 • 33
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20, 2024 • 41
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Paper • 2405.09220 • Published May 15, 2024 • 27
Understanding the performance gap between online and offline alignment algorithms

Paper • 2405.08448 • Published May 14, 2024 • 18
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 59
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models

Paper • 2401.06102 • Published Jan 11, 2024 • 22
Secrets of RLHF in Large Language Models Part II: Reward Modeling

Paper • 2401.06080 • Published Jan 11, 2024 • 28
TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering

Paper • 2401.06003 • Published Jan 11, 2024 • 25
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4, 2024 • 65
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

Paper • 2312.16145 • Published Dec 26, 2023 • 10
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Paper • 2312.13913 • Published Dec 21, 2023 • 24
Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Paper • 2312.13150 • Published Dec 20, 2023 • 15
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

Paper • 2312.13763 • Published Dec 21, 2023 • 10
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Paper • 2312.12423 • Published Dec 19, 2023 • 13
LIME: Localized Image Editing via Attention Regularization in Diffusion Models

Paper • 2312.09256 • Published Dec 14, 2023 • 10
FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 27
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Paper • 2312.11461 • Published Dec 18, 2023 • 20
VecFusion: Vector Font Generation with Diffusion

Paper • 2312.10540 • Published Dec 16, 2023 • 22
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Paper • 2404.12387 • Published Apr 18, 2024 • 39
Distributed Inference and Fine-tuning of Large Language Models Over The Internet

Paper • 2312.08361 • Published Dec 13, 2023 • 27
PromptBench: A Unified Library for Evaluation of Large Language Models

Paper • 2312.07910 • Published Dec 13, 2023 • 16
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 53
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

Paper • 2310.17750 • Published Oct 26, 2023 • 9
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search

Paper • 2310.13227 • Published Oct 20, 2023 • 15
ControlLLM: Augment Language Models with Tools by Searching on Graphs

Paper • 2310.17796 • Published Oct 26, 2023 • 18
CodeFusion: A Pre-trained Diffusion Model for Code Generation

Paper • 2310.17680 • Published Oct 26, 2023 • 74
Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Paper • 2310.15008 • Published Oct 23, 2023 • 22
Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 17
Safe RLHF: Safe Reinforcement Learning from Human Feedback

Paper • 2310.12773 • Published Oct 19, 2023 • 28
3D-GPT: Procedural 3D Modeling with Large Language Models

Paper • 2310.12945 • Published Oct 19, 2023 • 61
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 106
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

Paper • 2308.00675 • Published Aug 1, 2023 • 37
