The Mamba design resembles a Transformer in that it places a language modeling head on top: a linear layer whose weights are tied to the input embeddings.
The block starts with a linear projection that expands the input embeddings. Then, a short causal convolution is applied over the sequence dimension.
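The two components above, a weight-tied language modeling head and a causal convolution, can be sketched in plain Python. This is a minimal illustration, not the real Mamba implementation: the dimensions, function names, and kernel values are all hypothetical, and real implementations use tensor libraries.

```python
# Hypothetical minimal sketch of two Mamba-style components using plain
# Python lists; all dimensions and names here are illustrative.

def matmul(x, w):
    """Multiply a (seq, d_in) matrix by a (d_in, d_out) matrix."""
    return [[sum(xi[k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))]
            for xi in x]

def transpose(w):
    return [list(col) for col in zip(*w)]

# Embedding table: vocab_size x d_model. The LM head reuses (ties) these
# weights, so logits = hidden @ embedding^T -- no separate output matrix.
embedding = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]  # vocab=3, d_model=2

def lm_head(hidden):
    """Project hidden states back to vocabulary logits with tied weights."""
    return matmul(hidden, transpose(embedding))

def causal_conv1d(x, kernel):
    """Depthwise causal conv: each position sees only itself and the past."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        row = []
        for c in range(len(x[0])):
            acc = 0.0
            for i in range(k):
                s = t - (k - 1) + i  # left-padded, so no future leakage
                if s >= 0:
                    acc += kernel[i] * x[s][c]
            row.append(acc)
        out.append(row)
    return out

hidden = [[1.0, 0.0], [0.0, 1.0]]           # (seq=2, d_model=2)
logits = lm_head(hidden)                     # (seq=2, vocab=3)
conv = causal_conv1d(hidden, [0.5, 0.5])     # mix current and previous step
```

Tying the head to the embedding table halves the parameter count of those two layers, and the causal padding guarantees that position `t` never reads positions after `t`, which is what makes the convolution usable for autoregressive modeling.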