TEA — The Embedded Alphabet
STEAM
Search with TEA against Many
Search PDB, UniRef50 and AFDB Clusters with TEA using your protein sequence.
!! COMING SOON !!
Downloads
All embeddings in FASTA format.
Get the full TEA datasets in FASTA format for your research and applications.
TEA is a novel approach that uses contrastive learning to convert protein language model embeddings into a new 20-letter alphabet, enabling highly sensitive and efficient large-scale protein homology searches without the need for structural information.
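As an illustrative sketch of the general idea, discretizing per-residue embeddings into a 20-letter alphabet can be pictured as assigning each residue's embedding to the nearest of 20 codebook vectors. This is only an assumption-laden toy: TEA's actual mapping is learned with contrastive training (see the preprint), and the centroids, dimensions, and function names below are hypothetical.

```python
import numpy as np

# 20 output letters, reusing the standard amino-acid alphabet for familiarity.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def embeddings_to_letters(embeddings: np.ndarray, centroids: np.ndarray) -> str:
    """Toy quantizer: map per-residue embeddings (L x D) to the nearest
    of 20 codebook vectors (20 x D), emitting one letter per residue.
    (Hypothetical stand-in for TEA's learned conversion.)"""
    # Squared Euclidean distance from every residue to every centroid.
    d2 = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return "".join(ALPHABET[i] for i in d2.argmin(axis=1))

rng = np.random.default_rng(0)
codebook = rng.normal(size=(20, 8))   # hypothetical learned codebook
emb = rng.normal(size=(50, 8))        # per-residue embeddings of a 50-residue protein
tea_seq = embeddings_to_letters(emb, codebook)
print(tea_seq)  # a 50-letter string over the 20-letter alphabet
```

Because the output is an ordinary 20-letter sequence, it can be written to FASTA and searched with standard sequence-search tooling.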
The TEA sequence conversion command, model code, training scripts, and documentation are available on GitHub.
The Embedded Alphabet (TEA) on Hugging Face
For more detailed information, please refer to our preprint on bioRxiv.