1. Introduction

"Propedia is a database of protein-peptide complexes"

Propedia 26 is available at https://bioinfo.dcc.ufmg.br/propedia26

What is Propedia?

PROPEDIA is a database of peptide-protein complexes clusterized in three methodologies: based on peptide sequences; based on structure interface; and based on binding sites. PROPEDIA main goal is to give new insights into peptide design of biotechnological interests.

Propedia 26 stats

Entries summary
	pep-pro complexes	multipro	Total
Unique entries	48,685	0	48,685
Duplicated entries	24,707	19,759	44,466
Total	73,392	19,759	93,151

1.1 Overview

Propedia is a publicly accessible, curated database dedicated to protein-peptide interactions. It serves as a central repository for structural, thermodynamic, and functional data of complexes formed between proteins and peptide ligands. Derived from the Protein Data Bank (PDB), Propedia offers a robust platform for researchers in bioinformatics, structural biology, and drug discovery to explore, analyze, and derive insights from these critical molecular interactions.

Protein-peptide interactions are fundamental to numerous cellular processes, including signal transduction, immune response, and enzyme regulation. Understanding the principles that govern these interactions is crucial for deciphering biological mechanisms and developing novel therapeutics. Propedia addresses this need by providing a systematically organized and enriched dataset that goes beyond the raw structural data available in the PDB.

The database is equipped with a user-friendly web interface and powerful search tools, allowing users to query complexes by PDB ID, peptide sequence, protein sequence, specific interaction motifs, or thermodynamic parameters. Furthermore, Propedia integrates advanced analytical capabilities, such as multiple sequence alignment and clustering based on peptide similarity, enabling comparative studies and the identification of binding patterns.

Key Highlights:

Curated Dataset: A comprehensive collection of protein-peptide complexes from the PDB, carefully validated and annotated.
Dual Search Modes: Supports both text-based queries (e.g., PDB ID, UniProt ID) and sequence-based similarity searches (BLAST).
Advanced Filtering: Enables refinement of results by experimental method, resolution, interaction energy, and more.
Integrated Analysis Tools: Built-in tools for visualizing interfaces, aligning sequences, and clustering complexes.
Open Access: All data is freely available for download, supporting reproducible research.

1.2 What's new in version 26

Propedia v26 introduces major updates that significantly expand the database and enhance its analytical power.

1.2.1 Expanded dataset

Increased complex count: The updated version of Propedia now includes 78,148 protein-peptide complexes, representing nearly a fourfold increase in data coverage compared to the previous release (19,813 complexes), an increase of approximately 3.9-fold, as shown in figure 1.
Updated PDB sources: Includes structures from the Protein Data Bank up to 2023, ensuring researchers have access to the most recent structural data.

**Figure 1.** Expanding the dataset. (A) Latest version of Propedia (2026, Propedia v26); (B) Original version of Propedia (2020).

1.2.2 Redesigned user interface

Modernized layout: Complete visual overhaul with improved navigation and responsive design (Figure 2).
Enhanced search page: More intuitive organization of search options and filters.
Advanced results page: Redesigned results table with better sorting capabilities and immediate access to key complex information.

1.2.3 New analytical tools

Peptide clustering: Implementation of a novel peptide similarity clustering algorithm that groups complexes based on peptide sequence similarity, enabling evolutionary and functional analysis (Figure 3), more details in section 2.3 and 2.4.

1.2.4 Improved search capabilities

BLAST Search: Updated sequence search with better performance and more configurable parameters (Figure 4).

1.2.5 Technical improvements

In version 26, the complex details page has been extensively redesigned to offer a much deeper interaction analysis: it now displays atomic data with precise distance measurements and clear categorization of interaction types (hydrogen bonds, hydrophobic contacts, etc.). In addition, complete structural metrics, such as interface area and interaction energy, which were previously absent or very basic, have been incorporated. The presentation of the data has also been reorganized: in v26, the information is distributed across tabs (structure, energy, sequence) for greater clarity; in the old version, everything was on a single page with less organization. From a computational standpoint, energy calculations have been improved with updated algorithms (e.g., NACCESS or equivalents) with more refined parameterization, while the previous version applied basic calculations with limited validation. These topics are shown in Table 1, and they will be discussed in more detail in the following sections.

*Table 1. News in Propedia's property*
Property	Propedia-legacy	Propedia v26
Description Box
PDB Title	Yes	Updated for PDB ID
Resolution (Å)	Yes	Yes
Classification	Yes	Updated for “Description”
Download the complex (PDB file)	Yes	Yes
Download contacts	No	Yes
Download complex data	No	Yes
Structure method	No	Yes
Peptide chain	No	Yes
Protein chain	No	Yes
Peptide length	No	Yes
Protein length	No	Yes
Links to UniProt, PDB, and PubMed	Yes	Yes
Physical-chemical parameters - Protein/peptide Box
Description	Yes	Yes
Organism	Yes	No
Chain	Yes	Yes
Length	Yes	Yes
Binding Area (Å²)	Yes	Yes* (in a new panel)
Molecular Weight	Yes	Yes
Aromaticity	Yes	No
Instability	Yes	Yes
Isoelectric Point	Yes	Yes
Sequence	Yes	Yes
Aliphatic Index	No	Yes
GRAVY	No	Yes
Hydrophobic (%)	No	Yes
Positive Residues	No	Yes
Negative Residues	No	Yes
Atomic Formula	No	Yes
Total Atoms	No	Yes
Extinction Coeff. (with disulfide)	No	Yes
Extinction Coeff. (no disulfide)	No	Yes
Clustering Classification Box
Sequence cluster	Yes	Yes
Contact cluster	Yes	Yes
Interface cluster	Yes	Yes
Unique complex	No	Yes
Similar complex	No	Yes
Similar peptides	No	Yes
PDB classification	Yes	Yes
CSM-peptides classes
Anti-Angiogenic (AAP)	No	Yes
Anti-Bacterial (ABP)	No	Yes
Anti-Cancer (ACP)	No	Yes
Anti-Inflammatory (AIP)	No	Yes
Quorum Sensing (QSP)	No	Yes
Surface Binding (SBP)	No	Yes
Protein-peptide interactions Box
ASA Complex (Naccess)	No	Yes
ASA (protein)	No	Yes
ASA (peptide)	No	Yes
BProA	No	Yes
BPepA	No	Yes
BPP%	No	Yes
BSA	No	Yes
Interaction energy (Prodigy)
Number of intermolecular contacts	No	Yes
Charged–charged contacts	No	Yes
Charged–polar contacts	No	Yes
Charged–apolar contacts	No	Yes
Polar–polar contacts	No	Yes
Apolar–polar contacts	No	Yes
Apolar–apolar contacts	No	Yes
Percentage of apolar NIS residues	No	Yes
Percentage of charged NIS residues	No	Yes
Predicted binding affinity (kcal·mol⁻¹)	No	Yes
Predicted dissociation constant (M) at 25°C	No	Yes
Interface residues (distmax ≤ 6 Å)	No	Yes
Contacts (Calculated using COCaDA)	No	Yes

1.3 How to Cite & Licenses

To cite PROPEDIA, we recommend referencing both the original article and the most recent publication in the database. If specific features or previous versions are used, the respective publications may also be cited. The original 2021 article presents the first description of the database:


          Martins, P.M., Santos, L.H., Mariano, D. et al. Propedia: a database for protein–peptide identification based on a hybrid clustering algorithm. BMC Bioinformatics 22, 1 (2021). doi: 10.1186/s12859-020-03881-z.

Version 2.3, published in 2023, introduces a new representation approach based on structural signatures:


          Martins P, Mariano D, Carvalho FC, Bastos LL, Moraes L, Paixão V, and docs-cardoso de Melo-Minardi R (2023). Propedia v2.3: A novel representation approach for the peptide-protein interaction database using graph-based structural signatures. Front. Bioinform. 3:1103103. doi: 10.3389/fbinf.2023.1103103.

An article for Propedia v26 is currently under development.

1.3.1 License: CC-BY ND 4.0

Propedia v26 data is available under the Creative Commons Attribution 4.0 International (CC BY ND 4.0) license. This license allows:

Unrestricted use, including commercial use.
Sharing and redistribution of the material in any format.
Reproduction in any medium.

Requirements to use the material:

Give appropriate credit to the original authors (cite the articles).
Include a link to the license.
Indicate if changes have been made.
Cite the papers.

Exceptions

Images or third-party materials included in the article may have specific credits or restrictions.
Content not covered by CC BY 4.0 requires permission from the rights holder.

You can use the Propedia data freely in your research, but there is a restriction if you wish to create a competing database.

The code is available on GitHub and is shared under an MIT license.

2. How to use the platform

Propedia v26 can be accessed directly through the official website at:

https://bioinfo.dcc.ufmg.br/propedia26

Upon accessing the home page, users will find an intuitive navigation panel that allows them to quickly explore the database's main features, including complex search, structural visualization, interaction analysis, and download tools.

The initial interface features a top navigation bar that directs users to the Home, About, Browse, Clusters, Downloads, and Help pages. In addition, there is a quick search field that allows users to search for PDB IDs, peptides, or proteins directly.

The page also includes a highlights panel with information on new features and updates in version 26. These details are shown in the Figure below.

In addition, the home page features a highlights/statistics panel that displays, in a visual and objective manner, the main figures from the database, such as the number of complexes available, the number of clusters, and the total size of the database. This section gives users an immediate sense of the scale and information value of Propedia, enabling them to understand the repository's magnitude on their first visit. The page features a section dedicated to the credibility and authorship of the project, which identifies the developers responsible for Propedia. In a further step, the page includes an area dedicated to use cases and practical examples, illustrating how the user can search using an input code. Users can enter the code for a protein-peptide complex, also known as a “Propedia code” (e.g., 1WRZ-B-A, where the first four characters correspond to the PDB code, the fifth character corresponds to the peptide chain, and the sixth character corresponds to the protein chain) or a multipro (e.g., 1MT1-A), which does not specify the protein chain.

At the bottom of the page, institutional support and funding sources linked to the development of Propedia are also indicated, such as the Bioinformatics and Systems Laboratory (LBS), the Department of Computer Science (DCC), and the Federal University of Minas Gerais (UFMG), reinforcing the transparency and academic origin of the platform.

2.1 Entry page

In Propedia 26, each complex formed by the protein-peptide pair has an entry page. The entry page interface is divided into five parts:

Entry description: Contains information extracted from the PDB.
Physicochemical parameters: Contains predicted information using ProtParam.
Interactive 3D structure visualization panel: Enables analysis of the complex's 3D structure.
Clustering information: Contains similar structures based on various methods, such as sequence identity, structural alignment, prediction using machine learning models, and classes extracted from Propedia v1.
Protein-peptide interaction information: Contains predicted information of the complex, such as interaction surface (predicted with NACCESS), binding energy (predicted with Prodigy), and interface residues and contacts (predicted with COCaDA).

2.1.1 Entry description

Contains information extracted from the PDB file, including PDB ID, structure method, resolution, complex, peptide chain, peptide length, protein chain, protein length, and PDB title (description).

The "complex" field contains a link to the multipro page. Peptides that interact with more than one protein chain have an entry in Propedia multipro. A Propedia multipro entry has a 6-character ID: the PDB ID followed by "-" and the peptide chain ID (e.g., 1A1R-C). Information on protein chains complexed to this peptide can be found in the main table on each Propedia multipro entry page.

The contact map button displays a contact map for each pair of chains in the complex. It is shown on both the individual page for each entry and the multipro page. The charts are generated using the chart.js library, and the contacts are calculated using the COCaDA-CLI tool.

The download button allows you to download the input data, as well as the predicted contacts and structures. To obtain structural signatures, calculated using aCSM, and sequence signatures, calculated using iFeature, go to the "Download" page.

2.1.2 Physicochemical parameters

Chain: Unique identifier assigned to each molecular chain within the same crystallographic structure or PDB entry.
Description: Annotated name or description of the polymer chain, as defined in the PDB file (e.g., “Chain A - β-glucosidase”).
Length (residues): Total number of amino acid residues observed in the polymer chain.
Molecular Weight (Da): Total molecular mass of the chain, expressed in Daltons (Da), calculated as the sum of the atomic masses of all atoms in the protein.
Isoelectric Point (pI): The pH value at which the protein carries no net electrical charge, resulting in minimal electrophoretic mobility.
Instability Index: A computed value that estimates the in vitro stability of a protein. Proteins with an instability index greater than 40 are predicted to be unstable, while lower values indicate greater stability.
Aliphatic Index: A measure of the relative volume occupied by aliphatic side chains (Ala, Val, Ile, and Leu). It is often correlated with the thermostability of the protein.
GRAVY (Grand Average of Hydropathy): The average hydropathy score of all amino acids in the sequence, based on the Kyte-Doolittle scale. Positive values indicate a more hydrophobic protein, while negative values suggest a more hydrophilic character.
Hydrophobic (%): The proportion of residues in the sequence that are classified as hydrophobic (e.g., Ala, Val, Leu, Ile, Phe, Trp, Met), expressed as a percentage of the total number of residues.
Positive Residues: Total number of positively charged amino acids in the sequence (Lys, Arg, and His).
Negative Residues: Total number of negatively charged amino acids in the sequence (Asp and Glu).
Atomic Formula: The complete elemental formula representing the protein’s overall atomic composition (e.g., C₂₆₄₄H₄₂₀₅N₇₅₇O₈₁₆S₁₂).
Total Atoms: The total number of atoms constituting the polypeptide chain.
Extinction Coefficient (with disulfide): Molar extinction coefficient (in M⁻¹ cm⁻¹) calculated assuming all cysteine residues form disulfide bonds (Cys–Cys). This value indicates the protein’s absorbance at 280 nm under these conditions.
Extinction Coefficient (no disulfide): Molar extinction coefficient (in M⁻¹ cm⁻¹) calculated assuming no disulfide bond formation, i.e., all cysteine residues remain in the reduced form.
Sequence: The primary amino acid structure of the protein or peptide, defining its linear arrangement of residues.

2.1.3 Interactive 3D structure visualization panel

Allows you to interact with the 3D structure of the protein-peptide complex. You can click on the atoms to display their labels. Left-click and drag to move the protein. Use the mouse scroll wheel to zoom. The top bar lets you adjust the transparency of the complex's surface. Click the maximize button to view the interface in full screen (residue labels are displayed by default).

2.1.4 Clustering information

Unique complex: Indicates whether a protein-peptide pair exists with both sequences identical.
Similar complex: If there is an identical sequence, it indicates which is the main entry with an exact sequence (if the sequence is unique, the entry itself will be considered the leader).
Similar peptide: Indicates a complex that has a peptide with the exact same sequence.
PDB classification: Molecular classification according to PDB.
CSM-peptides classes: CSM-peptides (link) is a web tool and machine learning model that predicts peptide classes based on their sequence. Using a machine learning model inspired by CSM-peptides, Propedia built six models to predict the function of therapeutic peptides. Here, we present the probability that the current peptide belongs to each class. Values range from 0 to 1 (0 = low likelihood, 1 = high likelihood). For more details, see http://doi.org/10.1002/pro.4442.
Anti-Angiogenic (AAP): Probability that the peptide belongs to the Anti-Angiogenic class (cutoff ≥ 0.9).
Function: Inhibit angiogenesis (formation of new blood vessels).
Importance: Prevent tumor growth by limiting nutrient supply.
Applications: Antitumor and antiviral therapies.
Anti-Bacterial (ABP): Probability that the peptide belongs to the Anti-Bacterial class (cutoff ≥ 0.9).
Function: Destroy or inhibit bacterial growth.
Mechanism: Interact with bacterial membranes, causing lysis.
Importance: Potential alternative to antibiotics in the context of resistance.
Anti-Cancer (ACP): Probability that the peptide belongs to the Anti-Cancer class (cutoff ≥ 0.9).
Function: Selectively kill tumor cells.
Mechanisms: Alter membrane permeability, trigger apoptosis, modulate signaling pathways.
Applications: Next-generation antineoplastic therapies.
Anti-Inflammatory (AIP): Probability that the peptide belongs to the Anti-Inflammatory class (cutoff ≥ 0.9).
Function: Reduce or regulate inflammatory responses.
Mechanism: Inhibit pro-inflammatory cytokines or modulate macrophages.
Applications: Treatment of chronic inflammatory and autoimmune diseases.
Quorum Sensing (QSP): Probability that the peptide belongs to the Quorum Sensing class (cutoff ≥ 0.9).
Function: Participate in bacterial communication (biofilm formation, virulence).
Importance: Target for non-bactericidal infection control strategies.
Surface Binding (SBP): Probability that the peptide belongs to the Surface Binding class (cutoff ≥ 0.9).
Function: Bind to biological or material surfaces (e.g., metals, polymers, minerals).
Biotechnological Uses: Immobilization of enzymes, biomaterials, biosensors, nanodevices.
Examples: Peptides that bind to gold, silica, metal oxides for nanotechnology.

2.1.5 Protein-peptide interaction information

Surface (calculated using Naccess): Accessible surface analyses were performed using the NACCESS program, which implements the classic Lee & Richards algorithm (see reference) to calculate the Accessible Surface Area (ASA). This method simulates the path of a 1.4 Å radius spherical probe — equivalent to the approximate size of a water molecule — over the structure's van der Waals surface, estimating the total area exposed to the solvent.
ASA: Accessible Surface Area (ASA) is the measure of the entire surface area of the molecule that is exposed and can come into contact with the solvent (usually water).
ΔASA (protein): ΔASA_protein represents the surface area that is no longer exposed to the solvent upon complex formation and is calculated by the equation: ΔASA = ASA_unbound − ASA_bound. (Value given in Å²)
ΔASA (peptide): ΔASA_peptide represents the surface area that is no longer exposed to the solvent upon complex formation and is calculated by the equation: ΔASA = ASA_unbound − ASA_bound. (Value given in Å²)
BProA: Buried protein area (value given in Å²).
BPepA: Buried peptide area (value given in Å²).
BPP%: Buried Peptide Percentage (%), obtained by the expression: 100 × BPepA / ΔASA_peptide.
BSA: Buried Surface Area represents the area effectively shared at the binding interface and can be calculated using the formula: BSA = (ASA_protein + ASA_peptide − ASA_complex) / 2.

Interaction Energy (calculated using PRODIGY): Estimated binding free energy (ΔG) of the protein–peptide complex, predicted by the PRODIGY command-line tool. See the documentation for details. More information about the methodology can be found at the Prodigy website.
Number of Intermolecular Contacts: Total number of atomic contacts between the protein and peptide within a specified cutoff distance (typically ≤ 5.5 Å). A higher number of contacts generally indicates a more extensive interaction interface.
Number of Charged–Charged Contacts: Number of interactions between oppositely charged residues (e.g., Lys–Asp, Arg–Glu) across the interface, contributing significantly to electrostatic stabilization.
Number of Charged–Polar Contacts: Count of contacts between charged residues and polar uncharged residues (e.g., Lys–Ser, Asp–Thr), which often form hydrogen bonds or dipole interactions.
Number of Charged–Apolar Contacts: Number of contacts between charged residues and hydrophobic residues (e.g., Arg–Leu, Lys–Val). These interactions contribute less to stability but may influence interface geometry.
Number of Polar–Polar Contacts: Number of interactions between polar uncharged residues (e.g., Ser–Thr, Asn–Gln), frequently involving hydrogen bonding or dipole alignment across the interface.
Number of Apolar–Polar Contacts: Count of interactions between hydrophobic and polar residues, contributing to partial desolvation and interface packing.
Number of Apolar–Apolar Contacts: Number of hydrophobic–hydrophobic interactions (e.g., Leu–Val, Phe–Ile) that stabilize the interface through exclusion of water molecules (hydrophobic effect).
Percentage of Apolar NIS Residues (%): Proportion of residues in the Non-Interacting Surface (NIS) that are apolar, expressed as a percentage. Indicates the hydrophobic character of the exposed surface outside the binding interface.
Percentage of Charged NIS Residues (%): Proportion of residues in the NIS that are charged (positive or negative), indicating the electrostatic character of the surface not involved in binding.
Predicted Binding Affinity (kcal·mol⁻¹): Estimated Gibbs free energy of binding (ΔG), in kilocalories per mole. More negative values correspond to stronger binding.
Predicted Dissociation Constant (M) at 25.0 °C: Predicted dissociation constant (K_d), in molar units (M), at 25 °C. Represents the expected concentration at which half of the binding sites are occupied. Lower values indicate stronger affinity.

Interface Residues (distmax ≤ 6 Å): List of residues located within 6 Å of the interacting partner, defining the binding interface between the protein and peptide.
Contacts (calculated using COCaDA): Number and type of interatomic contacts calculated by the COCaDA tool (https://bioinfo.dcc.ufmg.br/cocada-web), used to characterize specific atom–atom interactions across the interface.

Contact map captions:

HB: Hydrogen bond
HY: Hydrophobic
AT: Attractive
RE: Repulsive
AR: Aromatic
SB: Salt Bridge
DS: Disulfide bonds
UN: Unknown

Criteria for defining contacts:

Contact Type	Distance range (Å)	Description	Acronym
Hydrogen Bond	0 ≤ dist ≤ 3.9	Acceptor and Donor atom pair	HB
Disulfide Bond	0 ≤ dist ≤ 2.8	Cys:SG atom pair	DS
Hydrophobic	2.0 ≤ dist ≤ 4.5	Hydrophobic atom pair	HY
Repulsive	2.0 ≤ dist ≤ 6.0	Equally charged atoms	RE
Attractive	3.9 ≤ dist ≤ 6.0	Differently charged atoms	AT
Sulft Bridge	0 ≤ dist ≤ 3.9	Equally charged atoms AND hydrogen bonding	SB
Aromatic Stacking	2.0 ≤ dist ≤ 5.0	Centroids of two aromatic rings in parallel or perpendicular orientation	AS

Source: https://bioinfo.dcc.ufmg.br/cocada-web/public/documentation

2.2 BLAST tool

The BLAST (Basic Local Alignment Search Tool) identifies local similarities between protein sequences. It compares a query sequence with sequences stored in a database, evaluating the statistical relevance of the matches found (Mariano et al., 2015; Wheeler; Bhagwat, 2016). The BLAST tool available in PROPEDIA allows users to search for peptides or proteins similar to those present in the database, using local alignment based on sequence similarity. This functionality is essential for identifying structurally or functionally related complexes, locating similar peptides already described in the database, and facilitating comparative studies, evolutionary analyses, and functional inference.

The Propedia sequence search system is implemented using the BLAST+ package, as described in Altschul et al. (1990) and Camacho et al. (2009). The tool compares the sequence provided by the user with all sequences deposited in PROPEDIA 26, returning the best local alignments, along with identifiers of the associated complexes, similarity metrics, and coverage and identity information.

The search can be performed for both peptides and proteins, and each type of query uses different parameters, adjusted for greater sensitivity according to the size of the sequence analyzed.

2.2.1 Parameters and Configuration

Peptides have short sequences and require specialized parameters to ensure good sensitivity. For this reason, Propedia uses:

word_size 2

The word-size is a NCBI parameter which determines the minimum size of the fragment (“word”) that must match between the query sequence and the database sequences for the algorithm to initiate an alignment extension. A word is the smallest sequence block that BLAST uses to identify possible regions of similarity between the query sequence and the database. The sequence is fragmented into all possible word sizes. For example, if word-size = 3, the protein ACDEFG becomes: ACD, CDE, DEF, EFG. BLAST searches the database for identical or similar occurrences of these words.

As described in the NCBI documentation (“BLAST Search Parameters - BlastTopics 0.1.1 documentation”, [s.d.]), BLAST operates heuristically, first identifying “hot spots,” i.e., short local matches, which can then expand into more complete alignments. In protein searches, these matches do not need to be identical and may involve similarity based on the substitution matrix. According to BLAST logic, reducing the word size increases sensitivity, as it allows relevant matches to be detected even when the comparison space is limited. Thus, using word size = 2 favors the detection of small hot spots capable of initiating extensions in peptide queries.

task blastp-short

The task blastp-short parameter activates an optimized version of BLASTP specifically configured to handle short protein sequences, typically with fewer than 30 amino acids (Table C3: [blastp application options. The blastp...].”, 2021). This mode automatically adjusts various internal aspects of the algorithm to maximize sensitivity and detection of real similarity, even when the amount of information (sequence length) is very low.

In implementing the sequence search system in Propedia, -word_size 2 was used in conjunction with -task blastp-short. This choice is directly aligned with the expected behavior for searches involving short peptides, whose sequences have few positions for forming larger “words.”

seg no

A tool designed to filter low-complexity segments in amino acid sequences. In alignments, residues that have been masked are displayed as “X.” SEG filtering is no longer the default option in the NCBI blastp service due to the adoption of compositional adjustments for estimating BLAST statistics (Fassler; Cooper, 2011). The -seg no parameter disables complexity masking, which would be undesirable in such short sequences.

evalue 100000

The E-value represents the probability that an observed alignment arose by chance. Under normal conditions, values close to zero indicate highly significant alignments, while high values tend to be disdocs-carded because they represent statistical noise. However, the behavior of the E-value changes dramatically for short sequences, such as peptides, which is exactly the case with Propedia. These settings allow minimal peptides, including fragments with only 5-10 amino acids, to find significant matches in the database.

For complete proteins, Propedia uses a more conservative set of parameters that are better suited for long sequences:

word_size 3

When dealing with full-length protein sequences, the search behavior differs substantially from searches involving short peptides. Longer sequences contain a much larger amount of information, allowing BLAST to reliably detect similarity using more stringent initial seeds. In this context, the parameter word_size 3 is more appropriate because it requires longer contiguous matches (3 amino acids) before extending an alignment. This choice reduces noise, improves specificity, and accelerates the search, as larger words decrease the number of initial “hotspots” generated during the seeding phase. Since full proteins typically range from hundreds to thousands of residues, a word-size of 3 does not compromise sensitivity: even distantly related proteins usually share enough local similarity to satisfy this requirement.

Therefore, for protein-versus-protein searches, PROPEDIA adopts a more conservative configuration to balance sensitivity and performance. This contrasts with peptide searches, where shorter sequences require extremely permissive parameters. The distinction ensures that each type of query, short peptides versus complete proteins, is processed using criteria tailored to its biological characteristics and statistical behavior under the BLAST algorithm.

A summary of all parameters is illustrated in Figure 6. It is important to note that BLAST alignment will always search for peptides if the input is a peptide sequence, or proteins if the input is a protein sequence.

2.2.2 How to use Propedia BLAST?

When you access the Propedia website, the home page displays “BLAST” in the navigation bar (Figure 5, 3). Clicking on it will open a window where you can enter your peptide or protein sequence (Figure 4). Before running BLAST, you must indicate whether your sequence is peptide or protein, because, as seen in section 2.1.1, the parameters for alignment are different for each type of sequence. To start, simply click on the “Run Blast” button and wait a few seconds for the result.

2.2.3 Other search tools

Propedia allows users to search in three ways: (1) BLAST; (2) based on link sites (uses ProBis); and (3) traditional search bar (uses regular expressions to find entries based on descriptions). We will discuss this further in the following sections.

2.3 Clusters

The Clusters page of Propedia v26 presents an organized view of clusters obtained from different methods of similarity between proteins, peptides, and interaction interfaces. These clusters are fundamental for exploratory navigation, redundancy identification, structural comparison, and functional inference.

The different clusters are summarized in the table below.

Table. Propedia's v26 clusters.
Cluster Type	Description
Seq100	Peptides exhibiting complete sequence identity (100%) are clustered within this category
Redundant Sequences	Complexes built from protein–peptide pairs sharing 100% identical sequences are grouped within this category
Classifications (PDB)	Entries in this category are grouped based on the classes defined within their respective PDB files
Sequence (Propedia v1)	Inherited from Propedia v1; see the seq100 category for the clustering approach used in Propedia26 for new entries*
Interface (Propedia v1)	This category originates from Propedia v1*
Binding Site (Propedia v1)	This category originates from Propedia v1*
CSM-peptides inspired	CSM-peptides is a sequence-based prediction server employing machine learning to assign functional categories to biologically active peptides. Using approaches adapted from Rodrigues et al. (2022), we developed models to classify Propedia26 peptides into six classes: AAP, ABP, ACP, AIP, QSP, and SBP.

2.3.1 History and evolution of clustering in Propedia

Early versions of Propedia used three main clustering approaches:

Sequence-based clusters: constructed with Hammock v1.2, which identified identical or highly similar peptide sequences. In the initial version, 3,495 unique sequences were detected, grouped into 771 clusters and 1,074 singletons, totaling 1,845 peptide clusters.
Interface-based clusters: generated using MUSTANG, which performs multiple structural alignments. This method identified 535 clusters and 1,356 singletons, resulting in 1,891 interfaces.
Binding-site-based clusters: defined by the ProBiS algorithm, which detects local similarities between protein surfaces. A total of 521 clusters and 945 singletons were formed, totaling 1,466 distinct binding sites.

These methods allowed the user to identify peptides that could interact with the same site or exhibit similar structural properties. Propedia 2.3 introduced the use of structural signatures to detect similarity patterns. However, this was not used for clustering; it was only used to evaluate previous results.

2.3.2 Redundancy and cluster formation in version 26

In version 26 of Propedia, the pipeline has been expanded and modernized. The main steps include:

Redundancy detection by sequence combination: proteins and peptides have their sequences concatenated, allowing completely identical complexes to be identified. This resulted in 51,082 unique complexes.
Canonical Non-Redundant (CNR) dataset: from all peptides containing only canonical amino acids, 17,509 unique peptide sequences were extracted, forming the new set of non-redundant peptides.
Recalculation of previous clusters: all clusters from past versions were redone using Python scripts and modern structural analysis tools.
Automated annotation and classification: structural and functional parameters were extracted directly from PDB files using the Biopython library (Bio.PDB).

As a result, the clustering process became more robust, scalable, and reproducible.

2.2.3 Practical applications of clusters

The clusters provided by Propedia v26 are a central tool for exploring, comparing, and selecting protein-peptide complexes. In the Clusters tab, users can browse clusters organized by three complementary criteria: peptide sequence similarity, interface structural similarity, and binding site similarity. For each cluster, the interface displays the group size, its members, and similarity metrics. It is also possible to directly access the page for each complex, where relevant structural, physicochemical, and functional properties are available.

These features not only facilitate exploration of the database, but also support several practical applications:

Identification of peptides with similar binding patterns: integrated visualization of interfaces, sites, and alignments allows you to quickly locate peptides that share modes of interaction, helping to identify conserved hotspots and understand the molecular determinants of recognition.
Detection and control of redundancy in experiments and computational analyses: by displaying the composition of clusters and allowing the selection of centroids, the system helps remove redundant complexes before statistical analyses, machine learning training, or docking benchmarks, reducing biases and increasing data representativeness.
Structural comparisons in evolutionary and functional studies: Structural and site clusters allow exploration of relationships between complexes that maintain similar binding modes, even when they have low sequence identity.
Selection of candidates for molecular repositioning or rational peptide design: the combination of sequence, interface, and intra-cluster variability information helps identify peptides with the potential to be reused in new target proteins or as a starting point for rational engineering. Clusters reveal peptides that are structurally compatible or capable of mimicking specific interactions.

By integrating visualization, similarity metrics, and direct access to the structural characteristics of the complexes, the Clusters tab provides a solid foundation for in silico screening, structural biology studies, bioinformatics, and the discovery of new therapeutic molecules. Also, at the bottom of the page, you will find a download button that allows you to download the entire cluster of interest.

2.4 Available downloads

The Downloads section provides access to key files and resources derived from the database. The main list (Propedia v26 - New) is illustrated in the Figure below.

In addition to the main section, the page provides legacy versions (Propedia v2.3 and Propedia v1) with historical files (summarized in Figure 8), for example:

Propedia v2.3: separate sets of PDBs (peptides, receptors, complexes), signatures, and FASTA files. Useful for reproducibility of previous work.
Propedia v1: complete CSV files, PDBs, and SQL dumps from the original database.

2.4.1 Quick usage recommendations

For tabular analysis and subset selection: download propedia_26.csv and open with pandas/R.
For batch structural reprocessing: download propedia_26.zip (or multipro_v6.zip if working with multiprotein inputs).
For peptide-focused studies (peptide signatures/FASTA/PDB): use peptides_pdb.zip, sequence_signature.zip, and structural_signature.zip.
For clustering and redundancy analysis: download clusters.zip and, if necessary, legacy files for historical comparison.

Some important considerations include file sizes and the number of entries specified on the Download page, which may be updated as new versions become available. Furthermore, when reusing the data, please respect the licenses and citations indicated in the “How to cite” section 1.3.

2.5 Explore Page

The Explore page is the main interface for browsing and filtering Propedia protein-peptide complexes. It brings together interactive filters, options to reduce redundancy, and a table of entries that allows quick inspection and direct download of associated files. A quick tutorial is illustrated in the figure below.

2.5.1 Practical search examples, tips and best practices

In the table below, we list some examples for you to practice different ways of exploring in Propedia v26.

Table 3. Practical examples to explore in Propedia v26.
Objectives	What to do
Find short antimicrobial peptides (2-10 aa)	Set Min = 2, Max = 10; select classifications related to ANTIMICROBIAL or ANTIBIOTIC; apply filters
Search for peptides that interact with transcription proteins	Filter by TRANSCRIPTION and then inspect sequences and interfaces on the detail pages
Extract non-redundant canonical set	Check Only canonical amino acids + Remove redundancy and export the list (via Individual download or using propedia_26.csv for batch processing)

For large-scale analyses, we recommend downloading propedia_26.csv from the Downloads page and applying filters locally (pandas/R), it is faster and more reproducible. Use the Remove redundancy option before generating statistics to avoid bias from repeated entries. Combine filters (size + classification + canonicity) to reduce results and facilitate manual inspection. If you want to compare similar groups, use the Clusters tab after identifying relevant hits in Explore.

2.5.2 Troubleshooting

If no results are displayed after applying filters, check that the size range or combination of classifications is not too restrictive; if necessary, remove some filters and try again. If the list displayed is too long or the page appears slow, consider reducing the filters or using the local CSV file to perform the filtering, especially on slower connections where downloading individual PDB files may take time. If the download does not work, check the Download link corresponding to the selected row and, if the problem persists, use the Downloads section to obtain the files in batches. Finally, discrepancies observed in the “Unique?” field are expected, as this attribute reflects the internal methodology for removing redundancy, based on the concatenation of protein and peptide sequences,whose details and criteria can be found in the Clusters area or in the technical documentation.

2.5.3 ID page: e.g.: 1A0N-A-B

The ID page (Figure 10) displays all available data and analyses for a specific protein-peptide complex: metadata (PDB, experimental method, description), sequences, calculated physicochemical properties, cluster classification, surface and energy metrics, atomic contact table, contact map, and 3D viewer. It also provides download links and shortcuts to external resources (RCSB PDB, UniProt, PubMed).

The header and metadata include the Identifier (ID), for example, 1A0N-A-B, which indicates the PDB code accompanied by the peptide and protein chains, as well as external links that provide direct access to the corresponding entries in RCSB PDB, UniProt, and PubMed. The structural method is also presented, containing the experimental technique used (such as SOLUTION NMR) and, when available, the resolution of the structure, as well as a concise description of the complex, such as in “Calmodulin complexed with a peptide...”. The page shows two columns (Protein/Peptide) with automatically calculated sequences and properties, for example: sequence (complete receptor chain and peptide sequence), length, molecular weight, isoelectric point (pI), instability index, aliphatic index, GRAVY, % Hydrophobicity, Residues + / -, atomic formula, total atoms and extinction coefficient. All of these properties are shown in Figure 10. These values are useful for rapid assessment of physicochemical properties and for filtering in pipelines.

2.5.3.1 How should it be interpreted?

In section 2.1, you saw the description of all items in the column that biochemically characterize the protein-peptide complex. The “Classification and Clusters” section presents information on structural similarities and the classification generated by clustering, indicating whether the complex is considered unique and listing other similar complexes or peptides identified by sequence, interface, or binding site grouping methods. It also includes CSM-peptides classes, which provide predictive scores for different functional categories, such as antibacterial, anticancer, or quorum sensing activities, accompanied by their respective confidence values. In practice, this section can be used to locate related complexes, for example, to identify alternative candidates that share the same binding site.

The protein-peptide interaction analysis section presents surface metrics calculated by Naccess, including the solvent-accessible area (ASA) for the complex, the protein, and the peptide, as well as interface-related parameters such as BProA, BPepA, BPP%, and BSA, which describe the contribution of each chain and the buried area in the interaction. For formal details on the formulas Naccess uses to calculate ASA and BSA, we recommend consulting its official documentation. The page also provides energy information generated by Prodigy, displaying the number of intermolecular contacts by type, the predicted affinity in kcal/mol, and the estimated Kd, although some fields may remain empty depending on the structure or availability of calculations. Complementing these analyses, the system lists the interface residues identified by COCaDA and presents a detailed contact table containing atomic or residual pairs, distances in angstroms, interaction type and class, such as hydrogen bonds (HB) or hydrophobic contacts (HY), with the possibility of filtering by backbone, side chain, or interaction category. In practice, it is recommended to observe contacts with a distance of less than 3.5 Å and marked as “HB” to identify potential hydrogen bonds, while “HY” interactions often indicate hydrophobic components relevant to affinity. The contact map should be analyzed in conjunction with the interactive 3D viewer on the page, which allows you to rotate the structure, inspect the interface, highlight residues present in the table, and save images configured from these views. These characteristics are illustrated in the figure below.

The Download / PDB file buttons in the header allow you to download the complex PDB (or open the entry in RCSB) and download any associated reports shown on the page.

2.5.3.2 Troubleshooting

Some energy fields or contact counts may appear empty. This can occur when the calculation failed or was not applicable to the input (e.g., NMR ensemble without a standard model). Check for the presence of expected atomic coordinates in the PDB.

Legends/abbreviations in the contact table may vary; if there is no explicit legend, use the website documentation or inspect the names to infer (HB → Hydrogen Bond, HY → Hydrophobic, etc.).

2.6 Search for Similar Binding Sites (ProBiS)

The Search for Similar Binding Sites tool allows you to identify binding sites that are structurally similar to the one you specify. This feature is handy for:

Finding peptides that bind to equivalent regions in different proteins.
Detecting functional structural conservation even among proteins with low sequence similarity.
Exploring possible molecular recognition mechanisms in distant families.

The search is based on the ProBiS (Protein Binding Sites) algorithm, which performs local structural alignment between protein surfaces. Unlike global methods, ProBiS searches for local three-dimensional patterns of physicochemical properties, including geometry, functional groups, curvature, and electrostatic characteristics. A tutorial is shown in Figure 12.

This tool should be used to locate other experimental complexes in which the peptide interacts with equivalent sites, as well as to predict cross-reactivity, identifying peptides capable of binding to multiple proteins that have similar surfaces. It is also useful for exploring mutations, allowing the evaluation of whether structural changes at the site modify its similarity to already known sites, in addition to assisting in the identification of functional analogues in proteins that have not yet been characterized.

2.6.1 Example (ProBiS)

To perform a search for binding sites, click the option in the top menu. Enter the PDB ID used in the search, including the chain, and the residues that compose the desired binding site. The figure below shows an example for the 1a1m (chain A) structure and their binding site: 60,62-82,146-171.

Wait for the result. Propedia uses the ProBis algorithm to perform parallelized searches (on average, searches take about 10 minutes).

At the end, Propedia returns a list of structures with similar binding sites. Note that similar regions are highlighted in green (the input is displayed on the left, and the result is displayed on the right). Click on the radio input fields to change the structure shown on the right.

3. Final Considerations

This documentation presented the structure, functionalities, and usage flows of Propedia v26, including navigation, data models, structural analyses, algorithms employed, and search methods by interaction and binding sites.

As a database dedicated to protein-peptide complexes, Propedia remains in active development, maintaining its commitment to transparency, reproducibility, and continuous updating. We hope that this tool will provide solid support for research in structural bioinformatics, peptide design, biomolecular interaction mining, and the development of computational methods.

For questions, suggestions, or feature requests, users can contact the team at the address listed on the project's official website.

We appreciate your use of the platform and hope that Propedia will contribute significantly to the advancement of your research.

4. References

ALTSCHUL, S. F., GISH, W., MILLER, W., MYERS, E. W. & LIPMAN, D. J. Basic local alignment search tool. J. Mol. Biol. v. 215, p. 403-410, 1990.

BANK, Protein Data. Protein data bank. Nature New Biol, v. 233, n. 223, p. 10-1038, 1971.

BERMAN, Helen M., et al. “The protein data bank.” Biological Crystallography 58.6 (2002): 899–907.

BLAST Search Parameters - BlastTopics 0.1.1 documentation. Disponível em: <https://blast.ncbi.nlm.nih.gov/doc/blast-topics/blastsearchparams.html#word-size>.

CAMACHO, C. et al. BLAST+: architecture and applications. BMC Bioinformatics. v. 10, p. 421, 2009.

FASSLER, J.; COOPER, P. BLAST Glossary. Disponível em: <https://www.ncbi.nlm.nih.gov/books/NBK62051/>.

GASTEIGER, E. et al. Protein identification and analysis tools on the ExPASy server. In: The proteomics protocols handbook, p. 571–607 (Springer, 2005).

HUBBARD, S. J.; THORNTON, J. M. (1993). "NACCESS", Computer Program, Department of Biochemistry and Molecular Biology, University College London.

LEMOS, Rafael Pereira, et al. “COCαDA - A fast and scalable algorithm for interatomic contact detection in proteins using Cα distance matrices.” Frontiers in Bioinformatics 5 (2025): 1630078.

LEMOS, Rafael P., et al. “Cocαda - large-scale protein interatomic contact cutoff optimization by Cα distance matrices.” Simpósio Brasileiro de Bioinformática (BSB). SBC, 2024.

MARIANO, D. C. B.; BARROSO, J. R. P. M.; CORREIA, T. S.; DE MELO-MINARDI, R. C. Introdução à Programação para Bioinformática com Biopython. 3. ed. North Charleston, SC (EUA): CreateSpace Independent Publishing Platform, v. 1. 230 p. 2015.

Table C3: [blastp application options. The blastp...]. Disponível em: <https://www.ncbi.nlm.nih.gov/books/NBK279684/table/appendices.T.blastp_application_options/>.

WHEELER, D.; BHAGWAT, M. BLAST QuickStart. Disponível em: <https://www.ncbi.nlm.nih.gov/books/NBK1734/>.

XUE, Li C., et al. “PRODIGY: a web server for predicting the binding affinity of protein–protein complexes.” Bioinformatics 32.23 (2016): 3676–3678.

Documentation