b10prot is an R package designed for the analysis of proteomics data, specifically focusing on protein identification. It is developed as part of the EhuB10 initiative, a collaborative effort between several research groups from the University of the Basque Country (UPV/EHU). The name b10prot is a reference to bioinformatics and proteomics, with “b10” representing “bio” in a way that reflects both biology and binary code.
This package is built with the aim of simplifying the integration of our latest research into proteomics data analysis workflows. It works with data in a “tidy” format, following principles similar to those of the tidyverse.
Key Features
- Protein Inference using the PAnalyzer algorithm:
  - panalyzer runs the PAnalyzer algorithm on peptide-to-protein data.
  - plot_groups plots the composition of PAnalyzer protein groups.
- Rank Identifications using the LPGF score:
  - lpg calculates the different LP Gamma (LPG) scores, including the recommended LPGF score.
  - plot_rank plots decoy scores against their rank to check for a uniform distribution.
- FDR Estimation including the refined FDRr technique:
  - target_decoy_approach calculates p-values and q-values based on the traditional target-decoy approach.
  - refined_fdr computes different FDR estimations using a competitive approach between target and decoy identifications.
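To make the target-decoy idea concrete, the following base R sketch estimates FDR and q-values from a toy identification list. It is a generic illustration of the traditional approach, not the implementation behind target_decoy_approach or refined_fdr; the column names (id, score, is_decoy) and the data are assumptions made up for this example.

```r
# Toy identification list: one row per identification (hypothetical columns).
ids <- data.frame(
  id       = c("a", "b", "c", "d", "e", "f"),
  score    = c(9.1, 8.4, 7.7, 6.0, 5.2, 4.8),  # assumed: higher is better
  is_decoy = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Sort from best to worst score.
ids <- ids[order(ids$score, decreasing = TRUE), ]

# Running counts of decoys and targets accepted at each score threshold.
n_decoy  <- cumsum(ids$is_decoy)
n_target <- cumsum(!ids$is_decoy)

# Estimated FDR at each threshold: decoys / targets (avoiding division by zero).
ids$fdr <- n_decoy / pmax(n_target, 1)

# q-value: the smallest FDR at which each identification would still be accepted.
ids$q_value <- rev(cummin(rev(ids$fdr)))

print(ids)
```

Unlike the raw FDR estimate, the q-value never decreases as the score gets worse, which makes it convenient for thresholding.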
Identification Workflow
The b10prot package includes a set of functions (with the iwf_ prefix) that streamline the protein identification workflow. These functions expect data in a “tidy” format: each variable is stored in its own column and each row represents a single observation.
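As a small illustration of what “tidy” means here, the base R sketch below reshapes a table with several proteins packed into one cell into one row per peptide-protein pair. The column names and values are hypothetical and are not taken from the package.

```r
# A non-tidy table: several proteins packed into a single cell.
untidy <- data.frame(
  peptide  = c("PEPTIDEA", "PEPTIDEB"),
  proteins = c("P1;P2", "P2"),
  stringsAsFactors = FALSE
)

# Split the packed cell into a list with one character vector per peptide.
protein_list <- strsplit(untidy$proteins, ";", fixed = TRUE)

# Emit one row per peptide-protein pair.
tidy <- data.frame(
  peptide = rep(untidy$peptide, lengths(protein_list)),
  protein = unlist(protein_list),
  stringsAsFactors = FALSE
)

print(tidy)
```

In the tidy version each row is a single peptide-to-protein relation, so it can be filtered, joined, and grouped without parsing cell contents.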
This workflow is based on two main types of data:
- Identification Lists containing a list of identifications with their scores:
  - iwf_load_psms loads PSMs from mzIdentML files.
  - iwf_psm2pep aggregates PSMs into peptides.
  - lpg collapses relationships into a list of identifications including LPG scores.
- Identification Relationships between lower-level (e.g., peptide) and higher-level (e.g., protein) identifications:
  - iwf_pep2level maps peptides to the specified level.
  - iwf_grouping performs protein grouping based on peptide-to-protein relations.
  - iwf_pep2group creates peptide-to-group relations from protein grouping relations.
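The two data types above can be pictured as two tidy tables that share a key. The base R sketch below builds a toy identification list and a toy relationship table, joins them, and summarises each protein. The column names and values are hypothetical, and the "best peptide score" summary is only a stand-in: it is not the LPG score computed by lpg, and no b10prot function is called.

```r
# Identification list: one row per peptide with its score (hypothetical columns).
peptides <- data.frame(
  peptide = c("PEPTIDEA", "PEPTIDEB", "PEPTIDEC"),
  score   = c(12.5, 8.0, 3.2)
)

# Identification relationships: one row per peptide-protein pair.
pep2prot <- data.frame(
  peptide = c("PEPTIDEA", "PEPTIDEA", "PEPTIDEB", "PEPTIDEC"),
  protein = c("P1", "P2", "P2", "P3")
)

# Attach the peptide scores to each relation.
rel <- merge(pep2prot, peptides, by = "peptide")

# Collapse relations into a protein-level list:
# number of peptides and best peptide score per protein.
n_peptides <- tapply(rel$peptide, rel$protein, length)
best_score <- tapply(rel$score, rel$protein, max)

proteins <- data.frame(
  protein    = names(n_peptides),
  n_peptides = as.vector(n_peptides),
  best_score = as.vector(best_score)
)

print(proteins)
```

Keeping scores and relationships in separate tables means the same peptide list can be mapped to different higher levels (protein, group) by swapping only the relationship table.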
Installation
You can install the development version of b10prot from GitHub with:
# install.packages("devtools")
devtools::install_github("akrogp/b10prot")
Tutorials
You can learn more in vignette("b10prot").