TEA — The Embedded Alphabet

TEA (The Embedded Alphabet) uses contrastive learning to convert protein language model embeddings into a new 20-letter alphabet, enabling highly sensitive and efficient large-scale protein homology searches without the need for structure.
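The core idea of mapping per-residue embeddings onto a discrete 20-letter alphabet can be sketched as a nearest-neighbour assignment against 20 learned codebook vectors. This is an illustrative sketch only, not TEA's actual implementation: the `ALPHABET` string, the cosine-similarity assignment, and the random toy inputs are all assumptions for demonstration.

```python
import numpy as np

# Hypothetical 20-letter output alphabet (mirrors the amino-acid alphabet).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def embeddings_to_letters(embeddings: np.ndarray, codebook: np.ndarray) -> str:
    """Assign each residue embedding to its most similar codebook vector.

    embeddings: (L, d) per-residue embeddings from a protein language model.
    codebook:   (20, d) learned prototype vectors, one per output letter.
    """
    # Normalize rows so the dot product equals cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cb = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    # For each residue, pick the index of the most similar codebook entry.
    letters = (emb @ cb.T).argmax(axis=1)
    return "".join(ALPHABET[i] for i in letters)

# Toy example: 5 residues with 8-dimensional embeddings and a random codebook.
rng = np.random.default_rng(0)
seq = embeddings_to_letters(rng.normal(size=(5, 8)), rng.normal(size=(20, 8)))
print(seq)  # one letter per residue, e.g. a 5-character string
```

The resulting strings use a standard character set, so existing sequence-search tooling can index and compare them like ordinary protein sequences.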

The TEA sequence conversion command, model code, training scripts, and documentation are available on GitHub.

The Embedded Alphabet (TEA) on Hugging Face

For more detailed information, please refer to our preprint on bioRxiv.