You can install the development version of SeedMatchR from GitHub or the stable build from CRAN.
This example uses the siRNA sequence, D1, targeting the Ttr gene in rat liver from the publication:
Schlegel MK, Janas MM, Jiang Y, Barry JD, Davis W, Agarwal S, Berman D, Brown CR, Castoreno A, LeBlanc S, Liebow A, Mayo T, Milstein S, Nguyen T, Shulga-Morskaya S, Hyde S, Schofield S, Szeto J, Woods LB, Yilmaz VO, Manoharan M, Egli M, Charissé K, Sepp-Lorenzino L, Haslett P, Fitzgerald K, Jadhav V, Maier MA. From bench to bedside: Improving the clinical safety of GalNAc-siRNA conjugates using seed-pairing destabilization. Nucleic Acids Res. 2022 Jul 8;50(12):6656-6670. doi: 10.1093/nar/gkac539. PMID: 35736224; PMCID: PMC9262600.
The guide sequence of interest is 23 bp long and oriented 5’ -> 3’.
The required inputs to SeedMatchR
are a DESeq2 results
data frame in addition to species specific annoation data such as GTF,
2bit DNA, and 3’ UTR sequences.
SeedMatchR
makes extensive use of
AnnotationDB
objects to help access genomic information in
a reproducible manner. The required annotations are:
res
: a data frame of DESEQ2 results.GTF
: gene transfer file containing species specific
genomic information for gene bodies. This is used to derive the list of
3’ UTRs and other features used in the analysis. This is also used to
map transcript IDs to gene IDs.DNAStringSet
: A
DNAStringSet
object of sequences for each of the features
of interest. The features must be named according to the transcript they
were derived from. Examples include those generated by
GenomicFeatures::extractTranscriptSeqs()
paired with
functions like
GenomicFeatures::threeUTRsByTranscript()
.The function load_species_anno_db()
has built in
annotation data for human, rat, and mouse annotations. We can load the
species specific annotations using the following approach:
We will use the annotations to derive the features and feature sequences that we want to scan for each gene.
SeedMatchR
assumes that you will be performing your
analysis on DESEQ2 results outputs. The first step is to load your
DESEQ2 results file as a data frame.
The test data that is provided with SeedMatchR
was
derived from the 2022 publication by Schlegel et al. The data set
represents a DESeq2 analysis performed on rat liver that had been
treated with Ttr targeting siRNA. We will use this example to explore
seed mediated activity. The data set name is long, so it will be renamed
to res
.
We start by downloading the example data set. This function will download three files from the GEO accession GSE184929. These files represent three samples with different siRNA treatments at two dosages.
We can load the example data into the environment.
The DESeq2 results are available through the names
Schlegel_2022_Ttr_D1_30mkg
,
Schlegel_2022_Ttr_D4_30mkg
and
Schlegel_2022_Ttr_D1_10mkg
. The data set name is long, so
it will be renamed to res
.
The DESeq2 results file is then filtered. The function
filter_deseq()
can be used to filter a results file by
log2FoldChange, padj, baseMean, and remove NA entries.
Use the plot_seeds()
function to visualize the available
SeedMatchR options for your input sequence of interest. The only input
to plot_seeds()
is your input siRNA sequence of interest.
This function assumes that you are using a RNA input.
You can extract the seed sequence information from the siRNA input
sequence using the get_seed()
function. The inputs to the
get_seed()
function in the siRNA sequence of interest and
the name of the seed.
You can perform a seed match for a single seed using the
SeedMatchR()
function.
You can perform seed matching for all available seeds using a for loop. The results will be appended as a new column to the results data frame.
for (seed in c("mer8", "mer6", "mer7A1")){
res <- SeedMatchR(res,
anno.db$gtf,
features$seqs,
guide.seq,
seed.name = seed)
}
head(res)
You can also allow for inexact seed matches in your analysis with the
mismatches
and indels
arguments. The names can
be adjusted to reflect the arguments using the col.name
argument.
for (indel.bool in c(TRUE, FALSE)){
for (mm in c(0,1,2)){
for (seed in c("mer7m8", "mer8", "mer6", "mer7A1")){
res <- SeedMatchR(res,
anno.db$gtf,
features$seqs,
guide.seq,
seed.name = seed,
col.name = paste0(seed, ".", "mm", mm, "_indel", indel.bool),
mismatches = mm,
indels = indel.bool)
}
}
}
head(res)
Many factors that perturb gene expression, like miRNA, show cumulative changes in their targets gene expression. Cumulative changes in the profile of genes expression can be visualized and tested with the emperical distribution function (ecdf) coupled with a statistical test such as the Kolmogorov-Smirnov test.
SeedMatchR
provides functions for comparing the
log2(Fold Change) of two gene sets. The function
deseq_fc_ecdf
is designed to work directly with a DESeq2
results data frame.
Required Inputs:
res
: DESeq2 results data framegene.lists
: A list of lists containing gene names# Gene set 1
mer7m8.list = res$gene_id[res$mer7m8.mm0_indelFALSE >= 1 & res$mer8.mm0_indelFALSE ==0]
# Gene set 2
mer8.list = res$gene_id[res$mer8.mm0_indelFALSE >= 1]
background.list = res$gene_id[res$mer7m8.mm0_indelFALSE == 0 & res$mer8.mm0_indelFALSE == 0]
ecdf.results = de_fc_ecdf(res,
list("Background" = background.list, "mer8" = mer8.list, "mer7m8" = mer7m8.list),
stats.test = "KS",
factor.order = c("Background", "mer8", "mer7m8"),
null.name = "Background",
target.name = "mer8",
alternative = "greater")
ecdf.results$plot
SeedMatchR
to explore potential small activating
RNA effects# Group transcripts by gene
sequences <- transcriptsBy(anno.db$tx.db, by="gene")
# Extract promoter sequences from tx.db object
prom.seq = getPromoterSeq(sequences,
anno.db$dna,
upstream=2000,
downstream=100)
# perform a seed search of the promoter sequences. Set tx.id.col to F to use gene annotations
res = SeedMatchR(res, anno.db$gtf, prom.seq@unlistData, guide.seq, tx.id.col = FALSE, col.name = "promoter.mer7m8")
# Find the genes with matches
promoterWseed = res$gene_id[res$promoter.mer7m8 >= 1]
# Generate the background list of genes
background.list = res$gene_id[!(res$gene_id %in% promoterWseed)]
# Plot ecdf results for promoter matches with stats testing
ecdf.results = de_fc_ecdf(res,
title = "Ttr D1 30mkg",
list("Background" = background.list,
"Promoter w/ mer7m8" = promoterWseed),
stats.test = "KS",
factor.order = c("Background",
"Promoter w/ mer7m8"),
null.name = "Background",
target.name = "Promoter w/ mer7m8",
alternative = "less",
palette = c("black", "#d35400"))
ecdf.results$plot