Refined False Discovery Rate (FDR) Calculation
refined_fdr.Rd
This function computes refined False Discovery Rate (FDR) estimates using a competitive approach between target and decoy identifications. It provides three types of FDR calculations: FDRn, FDRp, and FDRr, which adjust for different competitive scenarios between targets and decoys.
Arguments
- data
A data frame containing the identification data, including columns for the reference level, a score, and whether each identification is a decoy (isDecoy).
- levelRef
The column name containing the reference level for each identification (e.g., protein or gene reference). This should be an unquoted column name.
- score
The column name of the score used to rank the identifications. This should be an unquoted column name.
- lower_better
A logical value indicating whether lower scores are better (default is TRUE).
- affix
A string giving the suffix or prefix used to identify decoy entries in the reference level column. Default is "_REVERSED".
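To illustrate what the affix argument is for, the following base R sketch shows one way a decoy affix could be stripped so that a target and its decoy share a common key. The helper strip_affix is hypothetical and is not part of the documented API; how refined_fdr pairs entries internally is an assumption here.

```r
# Hypothetical helper (an assumption, not the package's code): remove a decoy
# affix from the end or the start of each reference so that targets and
# decoys map to the same base identifier.
strip_affix <- function(ref, affix = "_REVERSED") {
  n <- nchar(affix)
  is_suffix <- endsWith(ref, affix)
  is_prefix <- startsWith(ref, affix) & !is_suffix
  out <- ref
  out[is_suffix] <- substr(ref[is_suffix], 1, nchar(ref[is_suffix]) - n)
  out[is_prefix] <- substr(ref[is_prefix], n + 1, nchar(ref[is_prefix]))
  out
}

refs <- c("P1", "P1_REVERSED", "P2", "P3", "P3_REVERSED")
base_ref <- strip_affix(refs)

# TRUE where a reference has a target/decoy counterpart in the data
has_pair <- duplicated(base_ref) | duplicated(base_ref, fromLast = TRUE)
print(data.frame(refs, base_ref, has_pair))
```

With a shared key such as base_ref, entries can be grouped into target-only, decoy-only, and paired cases, which is the kind of distinction the competitive counts in the Value section describe.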
Value
A data frame with the original data and additional columns for the refined FDR estimates:
- FDRn
Normal FDR estimation as cumulative minimum (q-value).
- FDRp
Picked FDR estimation as cumulative minimum (q-value).
- FDRr
Refined FDR estimation as cumulative minimum (q-value).
- to
Target-only identifications count.
- do
Decoy-only identifications count.
- td
Count of identifications with the same target and decoy scores.
- tb
Target-best identifications count.
- db
Decoy-best identifications count.
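The phrase "cumulative minimum (q-value)" can be made concrete with a generic target-decoy sketch in base R: rank by score, take the running ratio of decoys to targets, then take the cumulative minimum from the worst score upward so the estimate never decreases as the threshold loosens. The function tdc_qvalue is hypothetical and only illustrates the general recipe; the package's own implementation may differ.

```r
# Sketch only (an assumption, not the package's code): generic target-decoy
# q-values from a score vector and a logical decoy flag.
tdc_qvalue <- function(score, is_decoy, lower_better = TRUE) {
  ord <- order(score, decreasing = !lower_better)  # best score first
  n_decoy  <- cumsum(is_decoy[ord])
  n_target <- cumsum(!is_decoy[ord])
  fdr <- n_decoy / pmax(n_target, 1)               # avoid division by zero
  q <- rev(cummin(rev(fdr)))                       # monotone from the bottom up
  q[order(ord)]                                    # back to input order
}

score    <- c(0.1, 0.2, 0.3, 0.5, 0.4)
is_decoy <- c(FALSE, TRUE, FALSE, FALSE, TRUE)
q <- tdc_qvalue(score, is_decoy)
print(round(q, 3))
```

On these five rows the sketch returns 0, 0.5, 0.5, 0.667, 0.667 in input order, which agrees with the FDRn column in the Examples output. This suggests, but does not confirm, that FDRn follows the same recipe.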
See also
Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics for more information on the refined FDR estimation.
Examples
# Example usage with a sample dataset
sample_data <- data.frame(
  proteinRef = c("P1", "P1_REVERSED", "P2", "P3", "P3_REVERSED"),
  score = c(0.1, 0.2, 0.3, 0.5, 0.4),
  isDecoy = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)
refined_fdr(sample_data, levelRef = proteinRef, score = score, lower_better = TRUE)
#> # A tibble: 5 × 11
#>   proteinRef  score isDecoy    to    do    td    tb    db  FDRn  FDRp  FDRr
#>   <chr>       <dbl> <lgl>   <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 P1            0.1 FALSE       1     0     0     0     0 0       0   0
#> 2 P1_REVERSED   0.2 TRUE        0     0     0     1     0 0.5     0   0
#> 3 P2            0.3 FALSE       1     0     0     1     0 0.5     0   0
#> 4 P3_REVERSED   0.4 TRUE        1     1     0     1     0 0.667   0.5 0.5
#> 5 P3            0.5 FALSE       1     0     0     1     1 0.667   0.5 0.667