Researcher
Currently, my research focuses on RNNs in large language models (LLMs). I am interested in how transformer-based models, such as the 671B R1, can be adapted to use RNN attention. You can check out my ongoing work on ARWKV here:
https://huggingface.co/papers/2501.15570
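To give a feel for the idea: instead of attending over a growing key/value cache, RNN attention maintains a fixed-size recurrent state that is updated once per token. The sketch below is a generic linear-attention recurrence of the kind RWKV-style models build on; the scalar decay and the exact update rule here are simplifying assumptions for illustration, not ARWKV's actual formulation.

```python
import numpy as np

def linear_attention_rnn(q, k, v, decay=0.9):
    """Toy RNN-style attention: O(1) state per step instead of an O(T) KV cache.

    q, k, v: arrays of shape (T, d), one row per token.
    decay:   scalar forgetting factor (a simplification; real models use
             learned, often per-channel, decays).
    """
    T, d = q.shape
    S = np.zeros((d, d))                       # recurrent state: decayed sum of k_t v_t^T
    outputs = np.empty_like(v)
    for t in range(T):
        S = decay * S + np.outer(k[t], v[t])   # fold the new token into the state
        outputs[t] = q[t] @ S                  # read out with the current query
    return outputs
```

Because the state `S` has a fixed `d × d` size, inference cost per token stays constant in sequence length, which is the main appeal of replacing softmax attention with a recurrence.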
X: https://x.com/xiaolGo
Papers: https://scholar.google.com/citations?user=TPJYxnkAAAAJ