Full Mamba (SSM) with Agent Attention and Fast Feed Forward Sparse Activations | Los Angeles

January 10, 2024 · Los Angeles

Mamba: Agent Attention, Sparse FFN

Overview

I implemented some code that I’d like to share, with a live demonstration of training.

I started with some boilerplate code
https://colab.research.google.com/drive/1g9qpeVcFa0ca0cnhmqusO4RZtQdh9umY
and compared it with stock Mamba from the Hugging Face models
https://huggingface.co/state-spaces/mamba-x

After trying both, I found that the former trained better, which I attribute to its added attention layer.

Using the custom code, I was able to implement the following papers:

Exponentially Faster Language Modelling
https://arxiv.org/abs/2311.10770
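
As a rough illustration of the fast feedforward idea from that paper, here is a minimal NumPy sketch of inference through a binary tree of neurons, where only one root-to-leaf path is evaluated per input. It is not the code from the talk; the function name `fff_forward`, the weight layout, and the toy sizes are assumptions made for this example, and training (which the paper handles differently from inference) is left out.

```python
import numpy as np

def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def fff_forward(x, w_in, w_out, depth):
    """Conditional forward pass through a balanced binary tree of neurons.

    x:     (d_in,) input vector
    w_in:  (n_nodes, d_in) one input weight vector per tree node
    w_out: (n_nodes, d_out) one output weight vector per tree node
    where n_nodes = 2**(depth + 1) - 1 (hypothetical layout for this sketch).
    """
    node = 0                              # start at the root
    y = np.zeros(w_out.shape[1])
    visited = []
    for _ in range(depth + 1):
        visited.append(node)
        logit = w_in[node] @ x            # pre-activation of this one neuron
        y += gelu(logit) * w_out[node]    # its contribution to the output
        # the sign of the pre-activation picks the left or right child
        node = 2 * node + 1 + int(logit > 0)
    return y, visited

rng = np.random.default_rng(0)
depth, d_in, d_out = 3, 8, 8
n_nodes = 2 ** (depth + 1) - 1
w_in = rng.standard_normal((n_nodes, d_in))
w_out = rng.standard_normal((n_nodes, d_out))
x = rng.standard_normal(d_in)

y, visited = fff_forward(x, w_in, w_out, depth)
print(f"evaluated {len(visited)} of {n_nodes} neurons, path: {visited}")
```

The point of the sketch is the sparsity: the number of neurons touched per input grows with the tree depth, not with the total number of neurons.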

Agent Attention: On the Integration of Softmax and Linear Attention
https://arxiv.org/abs/2312.08874
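
The core of agent attention can be sketched in a few lines of NumPy: a small set of agent tokens first aggregates information from the keys and values, then broadcasts it back to the queries. This is a simplified, single-head, non-causal sketch and not the code from the talk. The name `agent_attention`, the average pooling of queries into agents, and the toy shapes are assumptions for this example; the paper's extra components (such as the agent bias and depthwise convolution) and the causal masking a language model would need are omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(q, k, v, n_agents):
    """Single-head agent attention on (n, d) arrays; returns an (n, d) array.

    Assumes n is divisible by n_agents (a simplification for this sketch).
    """
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    # agent tokens: average-pool the queries into n_agents contiguous groups
    agents = q.reshape(n_agents, n // n_agents, d).mean(axis=1)   # (n_agents, d)
    # step 1, agent aggregation: agents act as queries over all keys/values
    agent_v = softmax(agents @ k.T * scale) @ v                   # (n_agents, d)
    # step 2, agent broadcast: original queries attend to the agents
    return softmax(q @ agents.T * scale) @ agent_v                # (n, d)

rng = np.random.default_rng(0)
n, d = 16, 4
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))

out = agent_attention(q, k, v, n_agents=4)
print(out.shape)
```

Both score matrices have shape involving `n_agents` rather than `n` on one side, so the cost scales with `n * n_agents` instead of `n * n` when the number of agents is kept small.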

For small use cases (for example, training on quotes), it shows promise in less than 24 hours.

This will be a live demo. Everything fits in a single file of fewer than 600 lines.

Links

https://gist.github.com/thistleknot/e9227e5149b6fc6f2f0d7443ef7e8456
https://colab.research.google.com/drive/1g9qpeVcFa0ca0cnhmqusO4RZtQ...
Trains a PyTorch Mamba SSM for Shakespearean character-level sequence generation.
https://huggingface.co/state-spaces/mamba-x
https://arxiv.org/abs/2311.10770
https://arxiv.org/abs/2312.08874

Tech stack