TEA — The Embedded Alphabet

TEA (The Embedded Alphabet) uses contrastive learning to convert protein language model embeddings into a new 20-letter alphabet, enabling highly sensitive and efficient large-scale protein homology searches without the need for structure.
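The core idea of mapping per-residue embeddings onto a discrete 20-letter alphabet can be sketched as a nearest-neighbour assignment against 20 learned codebook vectors. This is an illustrative sketch only, not TEA's actual implementation: the `ALPHABET` string, the cosine-similarity assignment, and the random toy inputs are all assumptions for demonstration.

```python
import numpy as np

# Hypothetical 20-letter output alphabet (mirrors the amino-acid alphabet).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def embeddings_to_letters(embeddings: np.ndarray, codebook: np.ndarray) -> str:
    """Assign each residue embedding to its most similar codebook vector.

    embeddings: (L, d) per-residue embeddings from a protein language model.
    codebook:   (20, d) learned prototype vectors, one per output letter.
    """
    # Normalize rows so the dot product equals cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cb = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    # For each residue, pick the index of the most similar codebook entry.
    letters = (emb @ cb.T).argmax(axis=1)
    return "".join(ALPHABET[i] for i in letters)

# Toy example: 5 residues with 8-dimensional embeddings and a random codebook.
rng = np.random.default_rng(0)
seq = embeddings_to_letters(rng.normal(size=(5, 8)), rng.normal(size=(20, 8)))
print(seq)  # one letter per residue, e.g. a 5-character string
```

The resulting strings use a standard character set, so existing sequence-search tooling can index and compare them like ordinary protein sequences.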

The TEA sequence conversion command, model code, training scripts, and documentation are available on GitHub.

The Embedded Alphabet (TEA) on Hugging Face

For more detailed information, please refer to our preprint on bioRxiv.