Mamba: Agent Attention, Sparse FFN
I implemented some code that I’d like to share, along with a live demonstration of training.
I started with some boilerplate code:
https://colab.research.google.com/drive/1g9qpeVcFa0ca0cnhmqusO4RZtQdh9umY
I compared it with the stock Mamba models available on Hugging Face:
https://huggingface.co/state-spaces/mamba-x
After trying both, I found that the former trained better, thanks to its added attention layer.
Using the custom code, I was able to implement the following papers:
Exponentially Faster Language Modeling
https://arxiv.org/abs/2311.10770
Agent Attention: On the Integration of Softmax and Linear Attention
https://arxiv.org/abs/2312.08874
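To give a rough idea of what the two papers propose, here is a minimal PyTorch sketch. It is not the code from the talk. `FastFeedForward` follows the inference-style tree walk of the first paper, where each input visits only `depth` of the `2**depth - 1` neurons. `agent_attention` follows the two-step softmax scheme of the second paper, with agent tokens pooled from the queries. All names, shapes, and the initialization are assumptions made for illustration. The attention sketch is non-causal and leaves out the paper's agent bias and depthwise convolution terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FastFeedForward(nn.Module):
    """Tree-structured feedforward layer (illustrative sketch)."""

    def __init__(self, dim, depth):
        super().__init__()
        self.depth = depth
        num_nodes = 2 ** depth - 1  # nodes of a balanced binary tree
        self.w_in = nn.Parameter(torch.randn(num_nodes, dim) / dim ** 0.5)
        self.w_out = nn.Parameter(torch.randn(num_nodes, dim) / dim ** 0.5)

    def forward(self, x):
        flat = x.reshape(-1, x.shape[-1])  # treat every token independently
        node = torch.zeros(flat.shape[0], dtype=torch.long, device=flat.device)
        out = torch.zeros_like(flat)
        for _ in range(self.depth):
            # One neuron per level: dot product with the current node's input weights.
            logit = (flat * self.w_in[node]).sum(dim=-1)
            out = out + F.gelu(logit).unsqueeze(-1) * self.w_out[node]
            # The sign of the logit picks the left or right child.
            node = 2 * node + 1 + (logit > 0).long()
        return out.reshape(x.shape)


def agent_attention(q, k, v, num_agents):
    """q, k, v: (batch, seq_len, dim). Returns (batch, seq_len, dim)."""
    scale = q.shape[-1] ** -0.5
    # Agent tokens: pool the queries along the sequence axis.
    agents = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)
    # Step 1 (aggregation): agents act as queries over all keys and values.
    agent_values = torch.softmax(agents @ k.transpose(1, 2) * scale, dim=-1) @ v
    # Step 2 (broadcast): the original queries attend to the few agent tokens.
    return torch.softmax(q @ agents.transpose(1, 2) * scale, dim=-1) @ agent_values


torch.manual_seed(0)
x = torch.randn(2, 16, 8)
print(FastFeedForward(dim=8, depth=3)(x).shape)       # torch.Size([2, 16, 8])
print(agent_attention(x, x, x, num_agents=4).shape)   # torch.Size([2, 16, 8])
```

With `num_agents` much smaller than the sequence length, both attention steps cost on the order of `seq_len * num_agents` rather than `seq_len ** 2`. A language model would also need causal masking, which this sketch does not attempt.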
For small use cases (for example, training on quotes), it shows promise in less than 24 hours.
This is going to be a live demo. It can all be done in one file of fewer than 600 lines.
The demo trains a PyTorch Mamba SSM for character-level generation of Shakespearean text.
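For readers new to character-level modelling, the data side can be sketched in plain Python as below. This is an illustration, not the code from the demo, and the helper names (`build_vocab`, `encode`, `decode`, `make_example`) are hypothetical. Each distinct character gets an integer id, and a training example is a window of ids paired with the same window shifted one step ahead.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


def make_example(ids, start, block_size):
    """Input window and its next-character targets (shifted by one)."""
    x = ids[start:start + block_size]
    y = ids[start + 1:start + block_size + 1]
    return x, y


text = "To be, or not to be"
stoi, itos = build_vocab(text)
ids = encode(text, stoi)
x, y = make_example(ids, start=0, block_size=5)
print(decode(x, itos))  # To be
print(decode(y, itos))  # o be,
```

The model is then trained to predict each id in `y` from the ids in `x` up to that position.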