Package 'SeedMatchR' reference manual

Title:	Find Matches to Canonical SiRNA Seeds in Genomic Features
Description:	On-target gene knockdown using siRNA ideally results from binding fully complementary regions in mRNA transcripts to induce cleavage. Off-target siRNA gene knockdown can occur through several modes, one being a seed-mediated mechanism mimicking miRNA gene regulation. Seed-mediated off-target effects occur when the ~8 nucleotides at the 5’ end of the guide strand, called a seed region, bind the 3’ untranslated regions of mRNA, causing reduced translation. Experiments using siRNA knockdown paired with RNA-seq can be used to detect siRNA sequences with potential off-target effects driven by the seed region. 'SeedMatchR' provides tools for exploring and detecting potential seed-mediated off-target effects of siRNA in RNA-seq experiments. 'SeedMatchR' is designed to extend current differential expression analysis tools, such as 'DESeq2', by annotating results with predicted seed matches. Using publicly available data, we demonstrate the ability of 'SeedMatchR' to detect cumulative changes in differential gene expression attributed to siRNA seed regions.
Authors:	Tareian Cazares [aut, cre] , Gulcin Ozer [aut] , Jibo Wang [aut], Rick Higgs [aut], Eli Lilly and Company [cph]
Maintainer:	Tareian Cazares <[email protected]>
License:	MIT + file LICENSE
Version:	1.1.1
Built:	2025-03-09 05:29:52 UTC
Source:	https://github.com/tacazares/seedmatchr

Check if input gene lists overlap

Description

Check if input gene lists overlap

Usage

check_gene_list_overlap(gene.lists)
check_gene_list_overlap(gene.lists)

Arguments

gene.lists

A list of gene lists. example: list(c("gene1", "gene2"), c("gene1"))

Value

Warning if gene sets overlap

Examples

# Overlap
check_gene_list_overlap(list(c("gene1", "gene2"), c("gene1")))

#No overlap
check_gene_list_overlap(list(c("gene1", "gene2"), c("gene3")))
# Overlap
check_gene_list_overlap(list(c("gene1", "gene2"), c("gene1")))

#No overlap
check_gene_list_overlap(list(c("gene1", "gene2"), c("gene3")))

Plot the ECDF for differential expression analysis log2(Fold Changes)

Description

This functions will take differential expression results, such as from DESEQ2, as a data.frame and plot the ecdf for the input gene.lists.

The gene sets to plot should be provided as a list of lists.

Example:

gene.lists = list("Background" = c("gene1", "gene2"), "Target" = c("gene2", "gene3"), "Overlap" = c("gene2"))

This function will also perform statistical testing if plot.hist is TRUE. The output will be saved to a PDF if an output.filename is provided.

Users can define the groups that are to be compared in the statistical test using the null.name and target.name arguments. The names must be found in gene.lists. The factor.order is used to order the groups in the analysis.

This functions returns:

⁠$plot⁠: The ECDF plot
⁠$stats⁠: The stats results object

Usage

de_fc_ecdf(
  res,
  gene.lists,
  title = "ECDF",
  output.filename = NULL,
  palette = SeedMatchR.palette,
  factor.order = NULL,
  x.lims = c(-1, 1),
  stats.test = c("KS", "Wilcoxen"),
  alternative = c("greater", "less", "two.sided"),
  null.name = 1,
  target.name = 2,
  height = 5,
  width = 5,
  dpi = 320,
  l2fc.col = "log2FoldChange"
)
de_fc_ecdf(
  res,
  gene.lists,
  title = "ECDF",
  output.filename = NULL,
  palette = SeedMatchR.palette,
  factor.order = NULL,
  x.lims = c(-1, 1),
  stats.test = c("KS", "Wilcoxen"),
  alternative = c("greater", "less", "two.sided"),
  null.name = 1,
  target.name = 2,
  height = 5,
  width = 5,
  dpi = 320,
  l2fc.col = "log2FoldChange"
)

Arguments

`res`	The DESeq2 results dataframe
`gene.lists`	A nest list of gene names. Example: gene.lists = list("Background" = gene.list2, "Target" = gene.list1, "Overlap" = gene.list3)
`title`	The tile of the plot
`output.filename`	If the output filename is provided, then the plot is saved.
`palette`	The color palette to use for your curves
`factor.order`	The order to use for the legends
`x.lims`	The xlimits range
`stats.test`	The statistic test to use. Options: KS, Kuiper, DTS, CVM, AD, Wass
`alternative`	The alternative hypothesis to test. Options: greater, less, two.sided
`null.name`	The name in the gene.list to use as the null for ecdf plots
`target.name`	The name in the gene.list to use as the target for ecdf plots
`height`	Plot height in inches
`width`	Plot width in inches
`dpi`	The dpi resolution for the figure
`l2fc.col`	The name of the column containing log2FoldChange values. Based on DESEQ2 names as default.

Value

A ggplot object for the ECDF plot

Examples


library(dplyr)

guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

anno.db = load_species_anno_db("human")

features = get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
get_example_data("sirna")

sirna.data = load_example_data("sirna")

res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

res = SeedMatchR(res, anno.db$gtf, features$seqs, guide.seq, "mer7m8")

# Gene set 1
mer7m8.list = res$gene_id[res$mer7m8 >= 1]

# Gene set 2
background.list = res$gene_id[!(res$mer7m8 %in% mer7m8.list)]

ecdf.results = de_fc_ecdf(res,
list("Background" = background.list, "mer7m8" = mer7m8.list),
stats.test = "KS",
factor.order = c("Background", "mer7m8"),
null.name = "Background",
target.name = "mer7m8")

library(dplyr)

guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

anno.db = load_species_anno_db("human")

features = get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
get_example_data("sirna")

sirna.data = load_example_data("sirna")

res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

res = SeedMatchR(res, anno.db$gtf, features$seqs, guide.seq, "mer7m8")

# Gene set 1
mer7m8.list = res$gene_id[res$mer7m8 >= 1]

# Gene set 2
background.list = res$gene_id[!(res$mer7m8 %in% mer7m8.list)]

ecdf.results = de_fc_ecdf(res,
list("Background" = background.list, "mer7m8" = mer7m8.list),
stats.test = "KS",
factor.order = c("Background", "mer7m8"),
null.name = "Background",
target.name = "mer7m8")

Download and parse DESeq2 output from GSE184929

Description

Download and parse DESeq2 output from GSE184929

Usage

download_parse_file(download.path, output.path)
download_parse_file(download.path, output.path)

Arguments

`download.path`	File url to be downloaded
`output.path`	Filename used for saving downloaded file

Value

DESeq2 results as a data.frame.

Examples


download_parse_file()

download_parse_file()

Test for differences in log2(Fold Change) ECDFs between two gene lists using the `stats` package

Description

This function uses the stats package to test the ECDF of log2(Fold Changes) between two groups based on DESeq2 analysis.

The inputs of this function are a differential expression results data.frame and two sets of gene IDs called gene.list1 and gene.list2. The functions will look for a column called log2FoldChange in the dataframe.

Usage

ecdf_stat_test(
  res,
  gene.list1,
  gene.list2,
  stats.test = c("KS", "Wilcoxen"),
  alternative = "greater",
  l2fc.col = "log2FoldChange"
)
ecdf_stat_test(
  res,
  gene.list1,
  gene.list2,
  stats.test = c("KS", "Wilcoxen"),
  alternative = "greater",
  l2fc.col = "log2FoldChange"
)

Arguments

`res`	Input results file data frame
`gene.list1`	Gene list 1: Usually null distribution
`gene.list2`	Gene list 2: Target set of genes
`stats.test`	Stats test to use. Options: KS or Wilcoxen
`alternative`	The alternative hypothesis to test. Options: greater, less, two.sided
`l2fc.col`	The name of the column containing log2FoldChange values. Based on DESEQ2 names as default.

Value

A vector containing the dstat and pvalue

Examples


library(dplyr)

guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

anno.db = load_species_anno_db("human")

features = get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
get_example_data("sirna")

sirna.data = load_example_data("sirna")

res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

res = SeedMatchR(res, anno.db$gtf, features$seqs, guide.seq, "mer7m8")

# Gene set 1
mer7m8.list = res$gene_id[res$mer7m8 >= 1]

# Gene set 2
background.list = res$gene_id[!(res$mer7m8 %in% mer7m8.list)]

ecdf.res = ecdf_stat_test(res, mer7m8.list, background.list)

library(dplyr)

guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

anno.db = load_species_anno_db("human")

features = get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
get_example_data("sirna")

sirna.data = load_example_data("sirna")

res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

res = SeedMatchR(res, anno.db$gtf, features$seqs, guide.seq, "mer7m8")

# Gene set 1
mer7m8.list = res$gene_id[res$mer7m8 >= 1]

# Gene set 2
background.list = res$gene_id[!(res$mer7m8 %in% mer7m8.list)]

ecdf.res = ecdf_stat_test(res, mer7m8.list, background.list)

Filter DESEQ2 Results for SeedMatchR

Description

Filter DESeqDataSet results for use with seed matching and counting functions.

The filtering criteria are:

Filter out genes that are not expressed or counted at all: baseMean = 0 & pvalue = NA & log2FoldChange = NA

Filter out genes that are expressed, but there is not difference across groups: log2FoldChange = 0

Filter out genes with extreme outliers: pvalue = NA and padj = NA

Filter out genes that have been excluded by independent filtering. padj = NA

Filter results by the fdr.cutoff

Filter the results by the log2FoldChange

Filter the results by the baseMean

Remove NA gene_ids and log2FoldChange values

Usage

filter_deseq(
  res,
  fdr.cutoff = 1,
  fc.cutoff = 0,
  rm.na.name = FALSE,
  rm.na.log2fc = FALSE,
  baseMean.cutoff = 0
)
filter_deseq(
  res,
  fdr.cutoff = 1,
  fc.cutoff = 0,
  rm.na.name = FALSE,
  rm.na.log2fc = FALSE,
  baseMean.cutoff = 0
)

Arguments

`res`	The DESEQ2 results as a data frame
`fdr.cutoff`	The false discovery rate cutoff to use.
`fc.cutoff`	The fold change cutoff to use. The absolute value will be used as the cutoff and values greater-than-or-equal-to will be kept.
`rm.na.name`	Remove na values from the gene_name column
`rm.na.log2fc`	Remove na values from the log2FoldChange column
`baseMean.cutoff`	The minimum baseMean expression cutoff

Value

A modified DESEQ2 results table that has been filtered

Examples


# Load test data
get_example_data("sirna")

sirna.data = load_example_data("sirna")

res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

# Load test data
get_example_data("sirna")

sirna.data = load_example_data("sirna")

res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

Download example DESeq2 data from GEO

Description

This function will download data that can be used for SeedMatchR. Choosing 'sirna' will download 3 DESeq2 results files from GSE184929. Choosing 'mirna' will download the miRDB database as a tsv.

Usage

get_example_data(example.type)
get_example_data(example.type)

Arguments

example.type

Name of the example to load. Options: sirna, mirna

Value

None?

Examples


get_example_data()

get_example_data()

Get transcripts features and feature sequences

Description

This function is used to get the genomic features of interest and the DNA sequences associated with them. This function takes advantage of the GenomicFeatures package functions threeUTRsByTranscript, fiveUTRsByTranscript, exonsBy, intronsByTranscript, and cdsBy. These functions are used to generate the features given an input tx.db object. A 2bit dna input is also required for extracting features sequences.

The output of the this function is:

⁠$db⁠: the feature GRanges object
⁠$seqs⁠: DNAStringSet of sequences associated to those features

Usage

get_feature_seqs(tx.db, dna, feature.type = "3UTR")
get_feature_seqs(tx.db, dna, feature.type = "3UTR")

Arguments

`tx.db`	A tx.db object
`dna`	A 2bit dna sequence
`feature.type`	The type of feature to return. Options: 3UTR, 5UTR, exons, introns, cds

Value

list containing the feature db object and the feature sequences

Examples


anno.db = load_species_anno_db("human")

features = get_feature_seqs(anno.db$tx.db, anno.db$dna)

anno.db = load_species_anno_db("human")

features = get_feature_seqs(anno.db$tx.db, anno.db$dna)

Get the target seed sequence given a canonical seed name and input sequence

Description

Given a sequence greater than 8 bp oriented 5' -> 3' and a seed definition, this function will return an object containing seed-specific sequence information. Users can input a custom seed name, but must provide the start position (start.pos) and stop position (stop.pos) that define the range of the seed sequence.

Built-in options: mer8, mer7A1, mer7m8, mer6

Note: The seed definitions mer8 and mer7A1 force a U at position g1. This results in an A in the target sequence being searched.

Usage

get_seed(guide.seq, seed.name = "mer7m8", start.pos = 1, stop.pos = 8)
get_seed(guide.seq, seed.name = "mer7m8", start.pos = 1, stop.pos = 8)

Arguments

`guide.seq`	A character string greater than 8 bp and oriented 5'-> 3'.
`seed.name`	The seed name of interest. Options: mer8, mer7A1, mer7m8, mer6. If not in the default list, the start.pos and stop.pos arguments will be used to define the seed.
`start.pos`	The start position for a custom seed definition
`stop.pos`	The stop position for a custom seed definition

Value

An object with the entries:

Guide: Input guide sequence. Input is expected to be RNA.
Seed.Name: The seed name.
Seed.Seq.RNA: The seed sequence as a RNAString
Seed.Seq.DNA: The seed sequence as a DNAString
Target.Seq: The target DNA sequence based on the reverse complement of the seed as a DNAString

Examples

# Example Ttr from Schlegel et al. 2022
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

# Get seed match
seed.seq = get_seed(guide.seq, "mer7m8")
# Example Ttr from Schlegel et al. 2022
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

# Get seed match
seed.seq = get_seed(guide.seq, "mer7m8")

Load example DESeq2 data into the environment

Description

Load example DESeq2 data into the environment

Usage

load_example_data(example.type)
load_example_data(example.type)

Arguments

example.type

Name of the example to load. Options: sirna, mirna

Value

Loads either the Schlegel 2022 RNAseq data or miRDB into the environment.

Examples


load_example_data()

load_example_data()

Load species specific `AnnotationDb`

Description

Use AnnotationHub to load species-specific GTF and 2bit DNA sequences. This function currently works for human, rat, and mouse.

The function will return:

⁠$gtf⁠: A GRanges object containing the GTF information
⁠$tx.db⁠: A tx.db object made from the GTF
⁠$dna⁠: The 2bit DNA sequence as a DNAStringSet

Usage

load_species_anno_db(species.name, remove.na.rows = TRUE)
load_species_anno_db(species.name, remove.na.rows = TRUE)

Arguments

`species.name`	Species name. Options: human, rat, mouse
`remove.na.rows`	Remove rows with NA in the gene_id column

Value

Species specific AnnotationDb

Examples


anno.db = load_species_anno_db("human")

anno.db = load_species_anno_db("human")

Plot the Guide Strand with different optional seeds

Description

Plot the Guide Strand with different optional seeds

Usage

plot_seeds(guide.seq)
plot_seeds(guide.seq)

Arguments

guide.seq

Guide a.k.a anti-sense sequence oriented 5' > 3'. Sequence must be greater than 8 bp.

Value

A msaggplot of the guide sequence in addition to the available seed sequences

Examples

library(msa)

# Ttr siRNA sequence
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

# generate seed plot
plotted.seeds = plot_seeds(guide.seq)
library(msa)

# Ttr siRNA sequence
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

# generate seed plot
plotted.seeds = plot_seeds(guide.seq)

Find seed matches in genomic features

Description

Find seed matches in a DNAStringSet object of sequences. This function will use get.seed extract the seed sequence from the guide sequence. The seed is then searched across all rows of the DNAStringSet object using vpatterncount.

This function returns the input DESeq2 results data.frame with an additional column that contains the counts for the input seed.name.

Usage

SeedMatchR(
  res,
  gtf,
  seqs,
  sequence,
  seed.name = "mer7m8",
  col.name = NULL,
  mismatches = 0,
  indels = FALSE,
  tx.id.col = TRUE
)
SeedMatchR(
  res,
  gtf,
  seqs,
  sequence,
  seed.name = "mer7m8",
  col.name = NULL,
  mismatches = 0,
  indels = FALSE,
  tx.id.col = TRUE
)

Arguments

`res`	A DESeq2 results `data.frame`
`gtf`	GTF file used to map features to genes. The object must have columns transcript_id and gene_id
`seqs`	The `DNAStringSet` object with sequence information for features. The names of the sequences should be the transcript names.
`sequence`	The `DNAString` guide sequence oriented 5' > 3'.
`seed.name`	The name of specific seed to extract. Options are: mer8, mer7A1, mer7m8, mer6
`col.name`	The string to use for the column name. Defaults to seed name
`mismatches`	The number of mismatches to allow in search
`indels`	Whether to allow indels in search
`tx.id.col`	Use the transcript_id column instead of gene_id

Value

A modified DESeq2 results dataframe that has column named after the seed of choice representing the number of match counts.

Examples


library(dplyr)

seq = "UUAUAGAGCAAGAACACUGUUUU"

anno.db = load_species_anno_db("human")

features = get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
res <- Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

res = SeedMatchR(res, anno.db$gtf, features$seqs, seq, "mer7m8")

library(dplyr)

seq = "UUAUAGAGCAAGAACACUGUUUU"

anno.db = load_species_anno_db("human")

features = get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
res <- Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

res = SeedMatchR(res, anno.db$gtf, features$seqs, seq, "mer7m8")

Package 'SeedMatchR'

Help Index

Check if input gene lists overlap

Description

Usage

Arguments

Value

Examples

Plot the ECDF for differential expression analysis log2(Fold Changes)

Description

Usage

Arguments

Value

Examples

Download and parse DESeq2 output from GSE184929

Description

Usage

Arguments

Value

Examples

Test for differences in log2(Fold Change) ECDFs between two gene lists using the stats package

Description

Usage

Arguments

Value

Examples

Filter DESEQ2 Results for SeedMatchR

Description

Usage

Arguments

Value

Examples

Download example DESeq2 data from GEO

Description

Usage

Arguments

Value

Examples

Get transcripts features and feature sequences

Description

Usage

Arguments

Value

Examples

Get the target seed sequence given a canonical seed name and input sequence

Description

Usage

Arguments

Value

Examples

Load example DESeq2 data into the environment

Description

Usage

Arguments

Value

Examples

Load species specific AnnotationDb

Description

Usage

Arguments

Value

Examples

Plot the Guide Strand with different optional seeds

Description

Usage

Arguments

Value

Examples

Find seed matches in genomic features

Description

Usage

Arguments

Value

Examples

Test for differences in log2(Fold Change) ECDFs between two gene lists using the `stats` package

Load species specific `AnnotationDb`