Collections including paper arxiv:2205.13147
-
All you need is a good init
Paper • 1511.06422
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507
Efficient Transformer Encoders for Mask2Former-style models
Paper • 2404.15244
Deep Residual Learning for Image Recognition
Paper • 1512.03385
-
GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
Paper • 2112.07577
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Paper • 2104.06979
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Paper • 2212.03533
SimCSE: Simple Contrastive Learning of Sentence Embeddings
Paper • 2104.08821
-
Wide Residual Networks
Paper • 1605.07146
Characterizing signal propagation to close the performance gap in unnormalized ResNets
Paper • 2101.08692
Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Paper • 2105.03536
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Paper • 2106.01548
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620
-
Matryoshka Diffusion Models
Paper • 2310.15111
SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks
Paper • 2309.00255
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Paper • 2309.08968
Matryoshka Representation Learning
Paper • 2205.13147
-
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253
FlowMind: Automatic Workflow Generation with LLMs
Paper • 2404.13050
How Far Can We Go with Practical Function-Level Program Repair?
Paper • 2404.12833
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Paper • 2404.18796
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Paper • 2410.20771
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954
Qwen Technical Report
Paper • 2309.16609
GPT-4 Technical Report
Paper • 2303.08774
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805