1. Introduction

RNApedia is a database of protein–RNA complexes

RNApedia is a publicly available database designed to store, organize, and analyze three-dimensional (3D) structures of protein–RNA complexes. The platform provides a unified environment for accessing structural, sequence-based, and interaction-level information, enabling systematic studies of protein–RNA recognition and binding mechanisms.

RNApedia is available at:

https://bioinfo.dcc.ufmg.br/rnapedia3/


What is RNApedia?

RNApedia is a curated database dedicated to protein–RNA interactions, focusing on experimentally resolved complexes deposited in the Protein Data Bank (PDB) [1]. Unlike general structural repositories, RNApedia organizes data at the protein–RNA pair level, allowing precise characterization of interactions between individual protein chains and RNA molecules.

The primary goal of RNApedia is to provide a standardized and interaction-centered resource that supports both biological interpretation and computational modeling of protein–RNA complexes. The database was designed to facilitate large-scale analyses, comparative studies across RNA classes, and the development of predictive models, including machine learning and artificial intelligence approaches.


1.1 Overview

Protein–RNA interactions play a central role in essential cellular processes such as transcriptional regulation, RNA processing, translation, ribosome assembly, and post-transcriptional control. Despite the availability of raw structural data in the PDB, protein–RNA complexes remain challenging to analyze systematically due to structural heterogeneity, variable annotation standards, and the lack of interaction-focused organization [2].

RNApedia addresses these challenges by extracting, curating, and enriching protein–RNA complexes with uniform annotations, RNA classification, interaction metrics, and structural descriptors. Each entry in the database represents a well-defined protein–RNA interaction pair, enabling consistent comparisons across complexes.

The platform integrates an intuitive web interface with advanced search and exploration tools, allowing users to query complexes by PDB identifiers, RNA type, experimental resolution, interaction features, and sequence-derived properties. All data are freely available, promoting reproducibility and broad reuse within the scientific community.


Key Highlights


1.2 Database content and scope

RNApedia encompasses protein–RNA complexes resolved by experimental methods such as X-ray crystallography, cryo-electron microscopy, and NMR spectroscopy. The database includes a diverse range of RNA classes, including ribosomal RNAs, transfer RNAs, messenger RNAs, and non-coding RNAs, reflecting the broad functional landscape of protein–RNA interactions.

For each complex, RNApedia provides:


1.3 Intended applications

RNApedia was designed to support a wide range of applications, including:


2. How to use RNApedia

2.1 Home page

RNApedia provides an integrated web interface that enables access, exploration, and analysis of protein–RNA complexes in a systematic and user-friendly manner. The homepage of the platform (Figure 1) was designed to centralize the main functionalities, allowing users to quickly initiate data exploration or targeted analyses.

As shown in Figure 1, the top navigation bar organizes the core modules of RNApedia, including institutional information (About), technical documentation (Documentation), data download options (Download), data exploration (Explore), advanced querying (Advanced Search), and a dedicated tool for protein–RNA contact analysis (Contact Analysis Tool). This modular organization facilitates a clear transition between data retrieval, exploration, and analysis workflows.

Also on the homepage (Figure 1), the quick search field enables direct retrieval of specific complexes using a PDB identifier, an internal RNApedia identifier, or chain-specific notation. This feature is particularly useful for users who already have a predefined complex of interest.

The homepage further provides shortcut buttons to the Explore and Documentation sections (Figure 1), offering immediate access to data browsing and methodological descriptions of the database. These shortcuts reduce the number of steps required to begin using the platform, especially for new users.

In addition to navigation elements, the homepage displays a schematic representation of a protein–RNA complex, emphasizing the focus of RNApedia on three-dimensional structures and molecular interaction interfaces. Information on how to cite the database is also provided, ensuring appropriate acknowledgment of RNApedia in scientific publications.

Figure 1. RNApedia homepage
Figure 1
RNApedia homepage. The top navigation bar provides access to core modules: (1) Home, (2) About, (3) Documentation, (4) Download, (5) Explore, (6) Advanced Search, and (7) Contact Analysis Tool. The homepage also includes a (8) quick search box to query by PDB ID or RNApedia ID, and shortcut buttons to (9) Explore the database and (10) open the Documentation page.

2.2 Data download in RNApedia

RNApedia provides open access to all curated datasets through a dedicated Download page, enabling users to retrieve structural, sequence-based, and annotation data for offline analysis and large-scale computational studies. The Download interface (Figure 2) was designed to clearly present the available datasets, their formats, and file sizes, facilitating data reuse and reproducibility.

As shown in Figure 2, each dataset entry is organized into three main columns: file type, file size, and download action. This layout allows users to quickly assess the format and storage requirements of each dataset before initiating a download.

The RNApedia datasets include a comprehensive tabular file listing all curated protein–RNA complexes, complete datasets containing structural files, sequences, and annotations, as well as specialized subsets focused on binding affinity data, RNA-binding structural motifs, and RNA modifications (Figure 2). In addition, RNApedia provides separate FASTA files for protein and RNA sequences, enabling straightforward integration with sequence-based analyses and external bioinformatics pipelines.

All datasets are distributed in standard formats, such as TSV, ZIP, and FASTA, ensuring compatibility with commonly used bioinformatics tools and workflows. This modular organization allows users to download only the datasets relevant to their specific research questions, ranging from exploratory analyses to large-scale modeling and machine learning applications.
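
As a minimal sketch of how the FASTA downloads might be consumed in an external pipeline, the Python function below parses FASTA text into a dictionary keyed by header. The header layout used in the example (an RpID-like label) and the sequences themselves are assumptions made for illustration; the actual headers in the RNApedia files may be formatted differently.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line  # a sequence may wrap over several lines
    return records


# Hypothetical records; real header text may be formatted differently.
example = """>1ABC_A_B
GGCUAGC
UAGC
>2XYZ_C_D
AUGC
""".splitlines()

sequences = parse_fasta(example)
print(sequences["1ABC_A_B"])
```

In practice, the same function could be applied to an open file handle for a downloaded FASTA file, since a file object also yields lines.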

Figure 2. RNApedia download page
Figure 2
RNApedia download page. The download interface lists the datasets available in RNApedia, including a tabular summary of protein–RNA complexes, complete curated datasets, and specialized subsets containing affinity data, RNA-binding motifs, and RNA modifications. Each entry displays the file type (1), file size (2), and a direct download option (3). Protein and RNA sequence files are also provided in FASTA format to support sequence-based analyses.

2.3 Advanced Search on RNApedia

Advanced Search in RNApedia enables refined queries to identify protein–RNA complexes, based on various biological, structural, and annotation criteria. The advanced search interface (Figure 3) is designed to offer flexibility in defining queries while maintaining clarity and control over the returned results.

In Figure 3, the user can enter a search term in the main field and start the query using the Search button (1). Searches use partial matching and are case-insensitive, being applied exclusively to the selected field, which ensures greater accuracy in retrieving entries. The interface allows selection of only one search criterion at a time (Figure 3, 3), avoiding ambiguities in interpreting results. Available criteria include RNA sequence, protein sequence, PDB identifier, RNA organism, protein organism, RNA type, RNA modifications, Pfam-annotated RNA-binding domains, and RNA type classification method.
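
The matching behavior described above (partial, case-insensitive, restricted to one selected field) can be illustrated with a short Python sketch. This is not the RNApedia implementation; the row contents and column names (`pdb_id`, `rna_type`, `rna_organism`) are hypothetical and chosen only to show the semantics.

```python
from typing import Dict, List


def field_search(rows: List[Dict[str, str]], field: str, term: str) -> List[Dict[str, str]]:
    """Return rows whose value in `field` contains `term`, ignoring case."""
    needle = term.lower()
    return [row for row in rows if needle in row.get(field, "").lower()]


# Hypothetical rows and column names, for illustration only.
rows = [
    {"pdb_id": "1ABC", "rna_type": "tRNA", "rna_organism": "Escherichia coli"},
    {"pdb_id": "2XYZ", "rna_type": "rRNA", "rna_organism": "Homo sapiens"},
    {"pdb_id": "3TRN", "rna_type": "mRNA", "rna_organism": "Homo sapiens"},
]

# Only the selected field is inspected, so "3TRN" is not returned here
# even though its PDB identifier contains the letters "trn".
hits = field_search(rows, "rna_type", "trna")
print([row["pdb_id"] for row in hits])
```

Restricting the match to a single field is what prevents a term such as "trna" from matching unrelated columns.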

Advanced Search (Figure 3) also allows downloading results in CSV format via the Download CSV button (2). This functionality enables direct reuse of data for statistical analyses, external computational pipelines, and large-scale comparative studies.

Figure 3. RNApedia advanced search interface
Figure 3
RNApedia advanced search interface. The advanced search interface allows the definition of the query term and the selection of a single active search criterion at a time (3). Searches are initiated by pressing the Search button (1), with partial matching and case-insensitivity. Results can be exported in CSV format using the Download CSV option (2). Search criteria include RNA and protein sequences, PDB identifier, organism information, RNA type, RNA modifications, Pfam-annotated binding domains, and RNA type classification method.

After performing a query using Advanced Search, the retrieved protein–RNA complexes are displayed in a dynamic results table (Figure 4), enabling detailed inspection and efficient navigation of the returned data.

As illustrated in Figure 4, the total number of entries found is explicitly displayed (1), allowing the user to immediately assess the scope of the query. This information is particularly useful for broad searches, such as those based on general RNA types.

The table interface allows control over the number of entries displayed per page using the Show entries option (2), facilitating navigation in extensive datasets. Furthermore, the table provides column sorting and filtering, allowing the user to refine the results visualization based on specific criteria.

Each row in the results table represents an individual protein–RNA pair and presents essential information, including the RNApedia internal identifier, the PDB ID, the structural resolution, the RNA type, and a brief description of the complex. From this table, the user can open the individual page of each complex through the View option and continue with structural, sequence, and interaction analysis. The results displayed in Figure 4 can also be exported in CSV format, enabling reuse in statistical analyses, computational pipelines, and large-scale comparative studies.

Figure 4. Table of results from the advanced search in RNApedia
Figure 4
Table of results from the advanced search in RNApedia. The table presents the protein–RNA complexes retrieved after performing an advanced search. The total number of entries found is explicitly displayed (1), along with options for controlling the number of results displayed per page (2). Each entry includes the RNApedia internal identifier, the PDB ID, the structural resolution, the RNA type, a brief description of the complex, and direct access to individual visualization via the View option.


2.4 Explore

In addition to advanced search, RNApedia offers a general exploration page (Explore), which allows direct navigation through all protein–RNA complexes available in the database. This interface provides a panoramic view of the data, enabling quick inspection, sorting, and filtering of entries (Figure 5).

As illustrated in Figure 5, the exploration table includes a global text search field (1), which allows dynamically filtering the displayed entries based on keywords present in the table columns. This functionality is useful for quickly inspecting recurring terms, without the need to define specific advanced search criteria.

Each row in the table represents an individual protein–RNA pair, identified by an RpID (RNApedia entry ID) (2). The RpID is the unique identifier for each RNApedia entry and consists of an eight-character code in the format:

PDB_ID–RNA_CHAIN–PROTEIN_CHAIN

For example, the identifier 1ABC_A_B (fields joined by underscores) indicates a complex derived from PDB structure 1ABC, where chain A corresponds to the RNA and chain B to the protein. This scheme ensures the unambiguous identification of each protein–RNA pair, even when multiple chains are present in the same structure.
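
When RpIDs are handled programmatically (for example, when processing an exported table), the identifier can be split back into its three fields. The sketch below assumes only what the examples in this documentation show, namely three underscore-separated fields; the `RpID` type and `parse_rpid` function are hypothetical helper names, not part of RNApedia.

```python
from typing import NamedTuple


class RpID(NamedTuple):
    """The three fields of an RNApedia entry identifier."""
    pdb_id: str
    rna_chain: str
    protein_chain: str


def parse_rpid(rpid: str) -> RpID:
    """Split an identifier such as '1ABC_A_B' into its three fields."""
    parts = rpid.strip().split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid RpID: {rpid!r}")
    pdb_id, rna_chain, protein_chain = parts
    # Chain identifiers are case-sensitive, so only the PDB ID is normalized.
    return RpID(pdb_id.upper(), rna_chain, protein_chain)


entry = parse_rpid("1ABC_A_B")
print(entry.pdb_id, entry.rna_chain, entry.protein_chain)
```

The function checks only the number of fields; it does not verify that the PDB identifier or the chains actually exist in the database.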

In addition to the RpID, the table presents relevant structural and functional information, including experimental resolution, structure title, RNA size, protein size, RNA type, and the number of protein–RNA contacts identified. The interface also allows controlling the number of entries displayed per page and sorting the results according to different columns, facilitating exploratory data analysis.

By selecting a specific RpID, the user is directed to the individual page of the complex, where structural, sequence, and interaction information is presented in detail. Thus, the Explore page (Figure 5) complements the advanced search functionalities by offering a broad and interactive navigation tool, suitable for initial exploratory analyses and general inspection of the database content.

Figure 5. General exploration page (Explore) of RNApedia
Figure 5
General exploration page (Explore) of RNApedia. The interface presents a table with all protein–RNA complexes available in the database. A global text search field allows dynamic filtering of the displayed entries (1). Each complex is identified by an internal RNApedia ID (2) and accompanied by structural and functional information, including experimental resolution, structure title, RNA size, protein size, RNA type, and number of protein–RNA contacts. The table allows sorting and interactive navigation between entries.

2.5 Entry page of a complex in RNApedia

The individual page for each complex in RNApedia was designed to integrate, in a single environment, structural visualization, experimental information, and detailed quantitative analyses of the protein–RNA interface. Figure 6 shows this page and its main components, described below.

When accessing an RNApedia entry, the user views a specific protein–RNA pair, identified by an RpID (RNApedia ID). This identifier reflects the database's central strategy, which analyzes interactions at the level of individual protein–RNA pairs.

The RpID is composed of the PDB identifier followed by the identifiers of the RNA and protein chains, in the format:

PDB_ID – RNA_CHAIN – PROTEIN_CHAIN
Example: 1H4Q_B_T

(1) Three-dimensional visualization of the complex

(Figure 6.1)

The panel on the right of the page presents the interactive three-dimensional visualization of the protein–RNA complex. The visualization is fully interactive, allowing rotation, zoom, and detailed inspection of the interaction interface. This visualization serves as a structural reference for all quantitative analyses presented on the page. It can be synchronized with selections made in the sequence and contact panels, facilitating integrated data interpretation.

(2) Protein–RNA contact analysis

(Figure 6.2)

The Contacts analysis button provides access to the module for analyzing intermolecular interactions between protein and RNA. In this module, the user can identify hydrogen bonds as well as hydrophobic, electrostatic, and repulsive interactions; view specific contacts directly in the 3D structure, highlighting the residues and nucleotides involved; and export interaction lists for external analyses.

This feature allows detailed investigation of which protein residues and RNA nucleotides directly participate in the interface, making it particularly useful for functional and comparative studies.

(3) General information about the complex

(Figure 6.3)

The central panel of the page presents the structural and experimental metadata of the complex.

Structural identification

Composition of the original PDB structure

RNApedia explicitly reports the overall composition of the original PDB file:

Important: Although a PDB file can contain multiple chains, RNApedia defines and analyzes interactions at the protein–RNA pair level rather than considering the entire complex.

Protein–RNA interaction metrics

When available, the following are displayed:

Accessible and buried surface area

To characterize the protein–RNA interface, RNApedia calculates surface area metrics using the NACCESS software [6,7]:

These values help understand how the interface is stabilized and the relative roles of polar and hydrophobic interactions.

Interface coverage

These metrics allow comparisons of the relative involvement of each molecule in complex formation.

Ligands and ions

Figure 6. Visualization page of a protein–RNA complex in RNApedia
Figure 6
Visualization page of a protein–RNA complex in RNApedia. The interface presents (1) the interactive three-dimensional visualization of the complex, (2) access to the protein–RNA contact analysis module, and (3) the structural and experimental information panel, including composition of the original PDB, interface metrics, binding affinity data, accessible and buried surface areas, as well as the presence of ligands and ions.

Download options on the complex page

RNApedia offers download options for each protein–RNA complex, allowing users to obtain structural files, sequences, and analytical reports for external analyses. Figure 7 illustrates the download menu available on the complex page, with its components described below.

Downloading the PDB file processed by RNApedia

(Figure 7.1)

The PDB option allows downloading the PDB file corresponding to the protein–RNA pair defined by RNApedia. This file contains only the RNA and protein chains selected for the RpID. It differs from the original PDB by removing chains that are not relevant to the analyzed pair. This standardization ensures consistency between the structural data and the metrics presented on the page.

Accessible Surface Area (ASA) Report

(Figure 7.2)

The ASA report option generates a detailed text file containing the Accessible Surface Area (ASA) calculations performed for the complex.

Download of structural and sequence formats

(Figure 7.3)

The lower block of the download menu brings together different structural and sequence file formats:

These options allow easy integration of RNApedia data into external pipelines for analysis, sequence alignment, structural modeling, or machine learning.

Figure 7. Download menu for a protein–RNA complex page in RNApedia
Figure 7
Download menu for a protein–RNA complex page in RNApedia. The interface allows downloading (1) a processed PDB file containing only the analyzed protein–RNA pair, (2) a detailed accessible surface area (ASA) report, including global and residue-specific metrics, and (3) different structural and sequence formats, such as mmCIF, full PDB, and FASTA.

Accessible Surface Area (ASA) Report

As described previously, RNApedia provides a detailed Accessible Surface Area (ASA) report for each protein–RNA complex. This report is automatically generated and can be accessed through the ASA report option available in the download menu of each complex page.

Global summary of ASA and buried surface area

(Figure 8.1)

The first section of the ASA report presents a global summary of surface area metrics, calculated for the protein–RNA complex and for its individual components.

What is ASA?

Accessible Surface Area (ASA) quantifies the portion of a molecular surface that is exposed to the solvent. In RNApedia, ASA values are calculated separately for:

Additionally, the report distinguishes between contributions from polar atoms and non-polar (hydrophobic) atoms, allowing a detailed characterization of the interaction interface.

Information presented in this section

Together, these measurements provide a quantitative description of the size, composition, and physicochemical nature of the protein–RNA interaction interface.
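
For orientation, a common convention defines the buried surface area (BSA) of an interface as the solvent-accessible area lost when the two partners associate: the ASA of the isolated protein plus the ASA of the isolated RNA, minus the ASA of the complex. The Python sketch below follows that convention; it is an assumption that may differ from the exact calculation used in the RNApedia report, and the numeric values are invented for illustration.

```python
def buried_surface_area(asa_protein: float, asa_rna: float, asa_complex: float) -> float:
    """Surface area (in square angstroms) buried upon complex formation.

    Uses the common convention BSA = ASA(protein) + ASA(RNA) - ASA(complex).
    Some authors report half of this value as the interface area.
    """
    return asa_protein + asa_rna - asa_complex


# Invented example values, in square angstroms.
bsa = buried_surface_area(asa_protein=8000.0, asa_rna=5000.0, asa_complex=11500.0)
print(bsa)
```

The same function can be applied separately to the polar and non-polar ASA values to split the buried area by atom type.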

Per-residue ASA report

(Figure 8.2)

The second section of the file contains the per-residue ASA report, which details the individual contribution of each protein residue to the interaction interface. Each row typically includes residue type (RES), chain identifier, residue number, total ASA, polar ASA, and non-polar ASA.

This level of detail allows users to:

Important note: Although the ASA report contains detailed information about the entire protein surface, RNApedia always interprets these data in the context of specific protein–RNA pairs, as defined by the RpID.

Figure 8. Accessible Surface Area (ASA) report generated by RNApedia
Figure 8
Accessible Surface Area (ASA) report generated by RNApedia. Section (1) presents a global summary of accessible and buried surface areas for the protein–RNA complex, including polar and non-polar contributions and interface coverage. Section (2) shows the per-residue ASA report, detailing the individual solvent accessibility of each protein residue involved in the interaction with RNA.

RNA section

(Figure 9)

In addition to global complex information, RNApedia provides sections dedicated exclusively to RNA and protein, allowing detailed analysis of the sequence, structural, and functional properties of each molecule individually. These sections integrate data derived from external databases, specialized tools, and standardized computational analyses, maintaining consistency with the protein–RNA pair defined by the RpID.

The RNA section presents information on sequence, composition, classification, modifications, and secondary structure, as well as detailed reports accessible for download.

RNA report

(Figure 9.1)

The RNA report option provides a complete report containing detailed analyses of the RNA secondary structure, automatically generated by RNApedia. This report consolidates information from the DSSR [8] and RNAfold tools [9].

RNA identification and classification

(Figure 9.2)

RNA length, modifications, and composition

(Figure 9.3)

Interactive feature: RNA modifications are clickable. Selecting a modification highlights its position in the 3D visualization of the complex.

RNA sequence and secondary structure

(Figure 9.4)

Figure 9. RNA section on the entry page of a protein–RNA complex in RNApedia
Figure 9
RNA section on the entry page of a protein–RNA complex in RNApedia. The interface presents (1) access to the RNA report, (2) RNA identification and classification method (RNAcentral, Infernal, or manual curation), (3) length, chemical modifications, and nucleotide composition, and (4) interactive sequence and secondary structures in dot-bracket notation derived from DSSR and RNAfold.
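
Dot-bracket notation encodes each base pair as a matching pair of parentheses and each unpaired nucleotide as a dot. The following Python sketch recovers the list of base pairs from such a string. It handles only the basic `(`, `)` and `.` symbols; any other character, such as the additional bracket types sometimes used for pseudoknots, is treated as unpaired, which is a simplifying assumption.

```python
from typing import List, Tuple


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return 1-based (i, j) base pairs encoded by matching parentheses."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        # any other symbol (for example '.') is treated as unpaired
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


# A two-base-pair hairpin followed by one unpaired nucleotide.
pairs = dot_bracket_pairs("((..)).")
print(pairs)
```

Comparing the pair lists obtained from two dot-bracket strings for the same RNA is one simple way to see where two secondary structure descriptions agree or differ.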

RNA report – file details

(Figure 10)

The RNA report generated by RNApedia provides an in-depth characterization of the RNA molecule at the level of sequence composition, secondary structure organization, and thermodynamic stability. The file is structured into blocks, each focusing on a specific aspect of RNA architecture.

Structural summary (DSSR)

(Figure 10.1)

The first block corresponds to the global structural summary derived from DSSR. It includes total nucleotides, number of base pairs, helices/stems, hairpins, internal loops, junctions, and special interactions such as wobble pairs (e.g., G–U) and stacking.

Nucleotide composition and base pair classification

(Figure 10.2)

This block includes absolute nucleotide counts (A, C, G, U) and base-pair classification using the Leontis–Westhof scheme, describing interacting edges (Watson–Crick, Hoogsteen, Sugar) and cis/trans orientation.

Structural elements

(Figure 10.3)

Summarizes secondary structure motifs along the sequence, listing structural elements with nucleotide ranges to facilitate mapping between structure and sequence.

Free energy and structural details

(Figure 10.4)

Provides thermodynamic and structural refinement with detailed secondary structure elements and free energy values (ΔG), typically derived from RNAfold predictions. Negative ΔG values indicate energetically favorable conformations.

Figure 10. RNA report generated by RNApedia
Figure 10
RNA report generated by RNApedia. Presents (1) global structural summary of the RNA, (2) nucleotide composition and base pair classification, (3) structural elements such as hairpins, bulges, and junctions, and (4) detailed description of the secondary structure with associated free energies.

Protein section

(Figure 11)

The protein section provides an integrated view of the protein component of each protein–RNA complex, combining sequence information, physicochemical properties, structural annotation, and functional domain mapping. RNApedia also offers a downloadable protein report containing detailed structural analyses.

Protein report

(Figure 11.1)

The Protein report option allows users to download a detailed, automatically generated report summarizing the protein’s secondary structure and functional architecture.

Protein identification and physicochemical properties

This panel summarizes fundamental protein-chain characteristics:

Together, these parameters provide a concise physicochemical profile of the protein, facilitating comparisons across different complexes.

Protein sequence and functional motifs

(Figure 11.2)

The protein sequence is presented in a fully interactive format, tightly integrated with the 3D visualization of the complex. Residues or regions can be selected and synchronized with the three-dimensional structure. Multi-selection is supported (e.g., Ctrl/Cmd for multi-selection, Shift for intervals).

Functional motifs are identified using the Pfam database and displayed directly on the sequence track (Figure 11.3), enabling direct interpretation of functional domains involved in RNA binding.

Figure 11. Protein section on the entry page of a protein–RNA complex in RNApedia
Figure 11
Protein section on the entry page of a protein–RNA complex in RNApedia. The interface features (1) access to the Protein report, (2) the protein sequence displayed interactively and synchronized with the 3D visualization, and (3) protein binding motifs identified from the Pfam database.

Protein report – file details

(Figure 12)

Secondary structure per residue

(Figure 12.1)

Provides residue-by-residue secondary structure assignment computed using STRIDE [14], including secondary structure type, dihedral angles φ and ψ, chain identifier, and residue position.

Global secondary structure statistics

(Figure 12.2)

Includes a quantitative summary of the protein’s overall structural composition, such as percentage and absolute counts of α-helices, β-strands, and coil/loop regions.
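
Global statistics of this kind can be derived from a per-residue assignment string. The Python sketch below groups STRIDE one-letter codes into three broad classes and reports counts and percentages. The grouping (H, G, I as helix; E, B, b as strand; everything else as coil) is an assumption made for illustration and may not match the classes used in the RNApedia report, and the example string is invented.

```python
from collections import Counter
from typing import Dict

# Assumed grouping of STRIDE one-letter codes into three broad classes.
HELIX_CODES = set("HGI")   # alpha, 3-10 and pi helices
STRAND_CODES = set("EBb")  # extended strands and isolated bridges


def secondary_structure_summary(codes: str) -> Dict[str, Dict[str, float]]:
    """Count residues per class and express each count as a percentage."""
    counts: Counter = Counter()
    for code in codes:
        if code in HELIX_CODES:
            counts["helix"] += 1
        elif code in STRAND_CODES:
            counts["strand"] += 1
        else:
            counts["coil"] += 1  # turns, coil and anything unrecognized
    total = len(codes)
    return {
        name: {
            "count": counts[name],
            "percent": 100.0 * counts[name] / total if total else 0.0,
        }
        for name in ("helix", "strand", "coil")
    }


# Invented 16-residue assignment string.
summary = secondary_structure_summary("CCHHHHHHTTEEEECC")
print(summary)
```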

Structural motifs and domains (Pfam)

(Figure 12.3)

Lists all functional domains identified via Pfam, including domain name/description, accession number, residue boundaries, and confidence scores (e.g., E-values), enabling correlation between protein architecture and RNA binding.

Figure 12. Protein report automatically generated by RNApedia
Figure 12
Protein report automatically generated by RNApedia. (1) Secondary structure by residue (STRIDE), (2) global secondary structure statistics, and (3) Pfam domains with identifiers, residue ranges, and significance values.

Contact Analysis – Protein–RNA interactions

The Contact Analysis page presents a detailed analysis of molecular interactions between protein and RNA for each protein–RNA pair defined in RNApedia. Interactions are calculated at the atomic level and integrated into an interactive 3D visualization, allowing simultaneous exploration of tabular and structural data.

Description of interface elements

(Figure 13)

Interaction list (Figure 13.1)

This panel presents the complete list of detected interactions between the protein and RNA. Each row includes:

Interactions can be filtered by category, including:

The list can be exported in CSV format, enabling external quantitative analysis and reproducibility.

3D visualization of interactions (Figure 13.2)

Displays the interactive three-dimensional visualization synchronized with the interaction list. It supports different representation modes (cartoon, stick, sphere), color/opacity adjustment, molecular surface display, and automatic highlighting of residues/atoms when selecting an interaction.

Atomic-level interaction detail (Figure 13.3)

Shows a zoomed inspection of the selected interaction, including precise interatomic distance and interaction type. This is useful for geometric validation and mechanistic studies.

Contact map (Figure 13.4)

The Contact map tool generates a matrix representation of interactions between protein residues and RNA nucleotides, providing a global view of the interface and enabling pattern analyses and comparisons across complexes.

Download interactions (Figure 13.5)

Allows downloading the complete interaction dataset identified for the complex, ensuring direct access to the raw data used in visualization and analysis.

Figure 13. RNApedia contact analysis page
Figure 13
RNApedia contact analysis page. The interface integrates (1) a detailed list of protein–RNA interactions, (2) an interactive 3D visualization, and (3) atomic-level inspection of selected interactions. Additional tools allow the generation of contact maps (4) and the download of interaction data (5).

3. Contact Analysis

Interatomic contacts represent fundamental biochemical interactions that occur between macromolecules and play a central role in stabilizing molecular structures and complexes. In protein–RNA systems, these interactions are particularly important for defining binding specificity, molecular recognition, and functional regulation. Computational identification of contacts allows systematic characterization of these interactions at atomic resolution.

In RNApedia, contacts are defined based on distance-based criteria, the most widely used approach for contact detection, in which two atoms are considered interacting when the distance between them is below a predefined cutoff value [15]. Although individual interactions such as hydrogen bonds or hydrophobic contacts are relatively weak, their cumulative effect is essential for the structural stability of protein–RNA complexes [16,17].
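
The distance-based criterion can be sketched in a few lines of Python. This is a simplified illustration, not the RNApedia pipeline: it applies a single cutoff to every atom pair, whereas the actual calculation also takes atom types into account. The 4.0 Å cutoff, the atom labels, and the coordinates are all hypothetical.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (label, (x, y, z) in angstroms)


def find_contacts(
    protein_atoms: List[Atom],
    rna_atoms: List[Atom],
    cutoff: float = 4.0,  # hypothetical cutoff, not the value used by RNApedia
) -> List[Tuple[str, str, float]]:
    """Return (protein atom, RNA atom, distance) for pairs closer than `cutoff`."""
    contacts = []
    for p_label, p_xyz in protein_atoms:
        for r_label, r_xyz in rna_atoms:
            distance = math.dist(p_xyz, r_xyz)  # Euclidean distance
            if distance < cutoff:
                contacts.append((p_label, r_label, round(distance, 2)))
    return contacts


# Invented atoms and coordinates, for illustration only.
protein = [("ARG10:NH1", (0.0, 0.0, 0.0)), ("LEU20:CD1", (10.0, 0.0, 0.0))]
rna = [("G5:O6", (3.0, 0.0, 0.0)), ("U6:O4", (0.0, 8.0, 0.0))]

contacts = find_contacts(protein, rna)
print(contacts)
```

The nested loop compares every protein atom with every RNA atom, which is adequate for a sketch; for large structures a spatial index such as a grid or k-d tree would normally be used to avoid the quadratic cost.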

Computational framework for contact detection

To ensure a consistent and chemically accurate description of protein–RNA interactions, RNApedia integrates several computational tools and libraries:

RNA definition and modifications

RNA molecules, including all mapped nucleotide modifications, are explicitly defined using the RDKit library [19], which provides a robust cheminformatics framework for molecular representation.

Atom-type classification

For the four standard ribonucleotides (A, C, G, U), atom types are assigned using the LUNA library [18], which enables a detailed characterization of atomic physicochemical properties relevant to molecular interactions. Atoms are classified according to:

The same classification scheme is applied to modified nucleotides, whose atom properties are computed using RDKit [19] to ensure compatibility with the standard nucleotide definitions.

Contact calculation engine

The COCαDA scripts [20,21] were adapted and extended to incorporate the atom-type classification of nucleotides and their modifications. These modified scripts are then used to calculate interatomic contacts between protein and RNA chains.

Protein–RNA pairs that do not present any detectable contacts are excluded from the database. After this filtering step, the final RNApedia dataset comprises 56,133 protein–RNA interaction entries, each supported by explicit atomic contact information.


3.1 Contact Analysis interface

The contact analysis tool can be accessed in two ways: directly from an individual RNApedia entry, for complexes whose contacts have already been calculated, or through the independent Atomic Contact Analysis page, where users can submit the complexes they are interested in analyzing. The display layout and interaction logic are consistent with those used throughout the platform, ensuring a uniform user experience.

Entry page for contact analysis

(Figure 14)

Users can start a contact analysis using one of two input options:

After selecting one of these options, the contact calculation is initiated by clicking on Calculate Contacts.

Figure 14. Atomic Contact Analysis input interface in RNApedia. This page allows users to calculate interatomic contacts for protein–RNA complexes using the RNApedia analysis pipeline. (1) Submit a structure file: upload a local structure file in PDB or CIF format. (2) Type a PDB ID: provide a valid PDB identifier, which is automatically retrieved and processed by RNApedia. After selecting one of the input methods, contact calculation is initiated by clicking Calculate Contacts, leading to the interactive results page.

3.2 Contact analysis results page

(Figure 15)

Once the calculation is complete, RNApedia presents the results in an interactive visualization environment composed of several coordinated panels.

Interaction list

(Figure 15.1)

This panel displays a comprehensive table of all detected protein–RNA contacts. For each interaction, the following information is provided:

Contacts are classified into categories such as:

Interactive filters allow users to isolate specific interaction types, and the table can be exported as a CSV file for downstream analysis.
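For downstream analysis, the exported CSV file can be filtered and summarized with a few lines of Python. The sketch below is only an illustration: the column names (`protein_residue`, `rna_residue`, `interaction_type`, `distance`) and the sample rows are assumptions and may differ from the actual export.

```python
import csv
import io
from collections import Counter

# Hypothetical export content; the real column names may differ.
SAMPLE_CSV = """protein_residue,rna_residue,interaction_type,distance
ARG45,G12,hydrogen_bond,2.9
LYS80,U13,attractive,4.7
PHE22,A14,hydrophobic,4.1
ARG45,U13,hydrogen_bond,3.1
"""


def load_contacts(csv_text):
    """Parse exported CSV text into a list of dictionaries, one per contact."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_by_type(rows, interaction_type, column="interaction_type"):
    """Keep only the contacts of one interaction type."""
    return [row for row in rows if row[column] == interaction_type]


def count_by_type(rows, column="interaction_type"):
    """Count how many contacts fall into each interaction type."""
    return Counter(row[column] for row in rows)


rows = load_contacts(SAMPLE_CSV)
hydrogen_bonds = filter_by_type(rows, "hydrogen_bond")
type_counts = count_by_type(rows)
```

For a file saved to disk, the same functions apply after reading its text, for example with `open(path, newline="").read()`.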

3D visualization

(Figure 15.2)

The central panel shows an interactive three-dimensional representation of the protein–RNA complex. Key features include:

Selecting an interaction in the table automatically highlights the corresponding atoms in the 3D structure, enabling intuitive exploration of contact geometry. This view provides a clear structural interpretation of individual interactions at atomic resolution.

Contact map

(Figure 15.3, Figure 16)

The contact map offers a two-dimensional overview of interactions between protein and RNA residues. Each point represents a detected contact, plotted according to the residue indices of the interacting chains. This representation facilitates the identification of interaction hotspots and global binding patterns.
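The reduction from atomic contacts to contact map points can be sketched as follows. This is an illustrative example, not RNApedia's plotting code: the input tuple layout and the function names are assumptions. Each residue pair yields a single point regardless of how many atomic contacts support it, and counting points per protein residue gives a simple view of interaction hotspots.

```python
from collections import Counter


def contact_map_points(atomic_contacts):
    """Collapse atomic contacts into unique (protein_index, rna_index) points.

    atomic_contacts is an iterable of tuples whose first two items are the
    protein residue index and the RNA residue index (assumed layout).
    """
    return sorted({(p, r) for p, r, *_ in atomic_contacts})


def protein_hotspots(points, top=3):
    """Rank protein residues by the number of distinct RNA residues they contact."""
    counts = Counter(p for p, _ in points)
    return counts.most_common(top)


# Hypothetical atomic contacts: (protein residue index, RNA residue index, contact type)
atomic = [
    (45, 12, "hydrogen_bond"),
    (45, 12, "attractive"),
    (45, 13, "hydrogen_bond"),
    (80, 13, "hydrogen_bond"),
]
points = contact_map_points(atomic)
hotspots = protein_hotspots(points, top=1)
```

The resulting points can be passed to any scatter-plot routine, with protein residue indices on one axis and RNA residue indices on the other.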

Download options

(Figure 15.4)

Users may download the interaction data and the structural files.

Figure 15. Contact analysis interface in RNApedia. The interface enables interactive exploration of protein–RNA contacts at atomic resolution. (1) Interaction list with detailed annotation and interaction-type filtering. (2) Three-dimensional visualization with synchronized highlighting of selected interactions. (3) Contact map summarizing residue–residue interactions between protein and RNA chains. (4) Download options for exporting interaction data and structural files.

Figure 16. Protein–RNA contact map representation. This scatter plot summarizes residue–residue contacts identified between the protein and RNA chains in a given complex. Each point corresponds to at least one interatomic contact detected between a residue of Chain A (x-axis, protein) and a residue of Chain B (y-axis, RNA), as defined by the distance cutoff used in the contact analysis pipeline.

Conclusion

RNApedia is presented as a comprehensive and integrated platform for the structural analysis of protein–RNA complexes, combining curated structural data with robust computational tools and interactive visualizations. By integrating sequence, structural, physicochemical, and molecular interaction information into a single environment, RNApedia enables both global overviews and detailed atomic-level investigations of protein–RNA interfaces.

The availability of dedicated RNA and protein reports, together with consolidated metrics such as accessible surface area (ASA), secondary structure assignment, functional motifs, and detailed contact analyses, ensures methodological consistency and comparability across complexes. The RNA classification strategy, which combines RNAcentral annotations, Infernal-based inference, and manual curation, provides broad coverage and reliability, particularly for complexes containing modified or poorly characterized RNAs. In addition, protein functional motifs annotated using the Pfam database allow direct association between protein functional regions and their involvement in RNA recognition and binding.

The contact analysis tool represents a key contribution of RNApedia, employing a refined methodology for identifying and classifying interatomic and inter-residue interactions based on physicochemical properties of both standard and modified nucleotides. This approach enables a more realistic description of the forces stabilizing protein–RNA complexes, going beyond purely geometric distance-based criteria. The combined use of interaction tables, three-dimensional visualizations, and contact maps provides multiple complementary perspectives, facilitating biological interpretation and downstream computational applications, including structural modeling and machine learning.

Overall, RNApedia establishes itself as a valuable resource for the structural biology and bioinformatics communities, supporting advances in the understanding of protein–RNA interactions and providing a high-quality, large-scale dataset suitable for functional, comparative, and data-driven studies.


References

  1. Hamelryck, T. & Manderick, B. PDB file parser and structure class implemented in Python. Bioinformatics 19, 2308–2310 (2003).
  2. Fierro-Monti, I. RBPs: an RNA editor’s choice. Front. Mol. Biosci. 11 (2024).
  3. Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind Database: Collection of Binding Affinities for Protein–Ligand Complexes with Known Three-Dimensional Structures. J. Med. Chem. 47, 2977–2980 (2004).
  4. Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind Database: Methodologies and Updates. J. Med. Chem. 48, 4111–4119 (2005).
  5. Harini, K., Srivastava, A., Kulandaisamy, A. & Gromiha, M. M. ProNAB: database for binding affinities of protein–nucleic acid complexes and their mutants. Nucleic Acids Res. 50, D1528–D1534 (2022).
  6. Hubbard, S. J. & Thornton, J. M. NACCESS computer program. Department of Biochemistry and Molecular Biology, University College London (1993).
  7. Ding, J. & Arnold, E. NACCESS. In: International Tables for Crystallography, 685–694 (2006).
  8. Lu, X.-J., Bussemaker, H. J. & Olson, W. K. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res. 43, e142 (2015).
  9. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
  10. The RNAcentral Consortium. RNAcentral: a hub of information for non-coding RNA sequences. Nucleic Acids Res. 47, D221–D229 (2019).
  11. The RNAcentral Consortium. RNAcentral: an international database of ncRNA sequences. Nucleic Acids Res. 43, D123–D129 (2015).
  12. The RNAcentral Consortium. RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res. 45, D128–D134 (2017).
  13. Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
  14. Heinig, M. & Frishman, D. STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res. 32, W500–W502 (2004).
  15. Silveira, C. H. da. Protein cutoff scanning: aplicação da varredura exaustiva de distâncias inter-resíduos na análise de contatos intracadeia em proteínas globulares. (2008).
  16. Silva, M. F. M. et al. Proteingo: Motivation, user experience, and learning of molecular interactions in biological complexes. Entertain. Comput. 29, 31–42 (2019).
  17. Fassio, A. V., Santos, L. H., Silveira, S. A., Ferreira, R. S. & de Melo-Minardi, R. C. nAPOLI: A Graph-Based Strategy to Detect and Visualize Conserved Protein-Ligand Interactions in Large-Scale. IEEE/ACM Trans. Comput. Biol. Bioinform. 17, 1317–1328 (2020).
  18. Fassio, A. V. et al. Prioritizing Virtual Screening with Interpretable Interaction Fingerprints. J. Chem. Inf. Model. 62, 4300–4318 (2022).
  19. Landrum, G. RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling.
  20. Lemos, R. P., Mariano, D., Silveira, S. A. & de Melo-Minardi, R. C. COCαDA - a fast and scalable algorithm for interatomic contact detection in proteins using Cα distance matrices. Front. Bioinform. 5 (2025).
  21. Lemos, R. P., Mariano, D., Silveira, S. A. & de Melo-Minardi, R. C. COCαDA - Large-Scale Protein Interatomic Contact Cutoff Optimization by Cα Distance Matrices. In: Simpósio Brasileiro de Bioinformática (BSB), 59–70 (SBC, 2024). doi:10.5753/bsb.2024.245545.

How to cite

If you use RNApedia, please cite the following reference:

BASTOS, Luana Luiza et al.
A strategy for refining the calculation of contacts in protein–RNA complexes.
In: Brazilian Symposium on Bioinformatics (BSB). SBC, 2024. pp. 241–246.


Additional citations

If RNApedia is used, please also cite the following references, as appropriate: