gGraph-constructor.Rd
new()
All-purpose constructor of gGraphs from nodes, edges, junctions, or various input formats (JaBbA, Weaver, etc.).
gGraph$new(
genome = NULL,
breaks = NULL,
juncs = NULL,
alignments = NULL,
prego = NULL,
jabba = NULL,
cougar = NULL,
weaver = NULL,
remixt = NULL,
rck = NULL,
walks = NULL,
nodes = NULL,
edges = NULL,
nodeObj = NULL,
edgeObj = NULL,
meta = NULL,
verbose = FALSE
)
genome: Seqinfo or object coercible to Seqinfo
breaks: GRanges whose endpoints specify breakpoints in the genome
juncs: Junction object or GRangesList coercible to Junction
prego: PREGO output directory path
jabba: JaBbA graph rds file
cougar: CouGar output directory path
weaver: Weaver output directory path
remixt: RemiXT output directory path
rck: RCK output directory path
walks()
Exhaustively generates walks (greedy = FALSE) or applies a greedy heuristic (greedy = TRUE), and returns a gWalk object tied to this graph. Warning: greedy = FALSE will not scale to large graphs (it may hang on even a couple of hundred nodes, depending on the topology).
set()
Sets metadata of the gGraph. Right now this is mainly useful for gTrack defaults such as "name" or "colormaps", and for setting the default "by" field for simplify, but it can be used for configuring other settings in the future. Note that this is graph-level metadata, not gNode- or gEdge-level metadata.
queryLookup()
Returns a data.table of the provided snode.ids, their indices, and the indices of their reverse complements in the graph. The data.table is keyed on snode.id.
disjoin()
Disjoins (i.e. collapses) all overlapping nodes in the graph (subject to the "by" argument) and aggregates node and edge metadata among them using FUN. Modifies the current graph.
The optional input gr will first concatenate a reference graph with GRanges gr prior to disjoining.
If collapse = TRUE, the output is a graph with a single node per reference interval. If collapse = FALSE, the nodes in the graph are disjoined but all overlapping nodes are kept separate, i.e. overlapping graphs are composed of a common set of disjoint intervals, but several instances of a given interval are allowed among the different graphs.
gGraph$disjoin(
gr = NULL,
by = NULL,
collapse = TRUE,
na.rm = TRUE,
avg = FALSE,
sep = ",",
FUN = default.agg.fun.generator(na.rm = na.rm, sep = sep, avg = avg)
)
gr: GRanges around which to disjoin the graph
by: metadata field of current graph around which to limit disjoining
collapse: logical scalar specifying whether to collapse graph nodes after disjoining
na.rm: logical scalar specifying whether to remove NAs when aggregating metadata after collapsing
avg: logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = FALSE)
FUN: function which should take (numeric or character) x and na.rm = TRUE and return a scalar value
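As a hedged illustration of the FUN contract described above, the following base R sketch defines a hypothetical aggregator, `my.agg`, which is not part of gGnome. It assumes only what the parameter description states: FUN receives a numeric or character vector x plus na.rm and returns a scalar. The behavior of `default.agg.fun.generator` itself is not reproduced here.

```r
# Hypothetical aggregator following the documented FUN contract:
# takes a (numeric or character) vector x and na.rm, returns a scalar.
my.agg <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  if (is.numeric(x)) {
    sum(x)  # numeric metadata: sum the values
  } else {
    # character metadata: join the unique values with a comma
    paste(unique(as.character(x)), collapse = ",")
  }
}

num.out <- my.agg(c(2, NA, 3))
chr.out <- my.agg(c("a", "b", "a", NA))
```

A function of this shape could presumably be supplied as FUN = my.agg in the disjoin() call shown above.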
simplify()
Simplifies the gGraph by collapsing reference-adjacent nodes that lack a junction or loose end (ignore.loose = FALSE) between them.
Takes an optional "by" column. If by is not NULL, simplify will only collapse adjacent nodes if they share metadata in the columns specified by "by".
gGraph$simplify(
by = private$pmeta$by,
na.rm = TRUE,
avg = TRUE,
sep = ",",
FUN = default.agg.fun.generator(na.rm = na.rm, sep = sep, avg = avg),
ignore.loose = FALSE
)
by: metadata field of current graph around which to limit simplification
na.rm: logical scalar specifying whether to remove NAs when aggregating metadata after collapsing
avg: logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = TRUE)
FUN: function which should take (numeric or character) x and na.rm = TRUE and return a scalar value
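The collapsing rule described for simplify() can be sketched on a toy table of intervals. This is a simplified base R model under stated assumptions, not gGnome's implementation: it treats nodes as sorted intervals on a single chromosome, ignores junctions, loose ends, and strand, and uses hypothetical column names (`cn` as the "by" field, `score` as the metadata being aggregated).

```r
# Toy nodes on one chromosome, sorted by start; 'cn' plays the role of the "by" field.
nodes <- data.frame(
  start = c(1, 101, 201, 301),
  end   = c(100, 200, 300, 400),
  cn    = c(2, 2, 3, 3),
  score = c(10, 20, 30, 50)
)

simplify.toy <- function(nodes, by, avg = TRUE) {
  n <- nrow(nodes)
  # A node joins the previous group only if it abuts the previous node
  # and shares its 'by' value; otherwise it starts a new group.
  adjacent <- nodes$start[-1] == nodes$end[-n] + 1
  same.by  <- nodes[[by]][-1] == nodes[[by]][-n]
  group <- cumsum(c(TRUE, !(adjacent & same.by)))
  agg <- if (avg) mean else sum
  out <- data.frame(
    start = as.numeric(tapply(nodes$start, group, min)),
    end   = as.numeric(tapply(nodes$end, group, max)),
    score = as.numeric(tapply(nodes$score, group, agg))
  )
  out[[by]] <- as.numeric(tapply(nodes[[by]], group, function(v) v[1]))
  out
}

merged <- simplify.toy(nodes, by = "cn")
```

Here the four abutting intervals collapse into two, one per distinct `cn` value, and `score` is averaged within each merged interval (or summed when avg = FALSE).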
reduce()
Reduces the graph, which is a $disjoin() followed by a $simplify(), i.e. collapsing overlapping nodes, then merging adjacent ones subject to (optional) matching on some metadata field.
by: metadata field of current graph around which to limit reduction (i.e. disjoining and simplification)
na.rm: logical scalar specifying whether to remove NAs when aggregating metadata after collapsing
avg: logical scalar specifying whether to average (if TRUE) or sum (if FALSE) numeric metadata during aggregation (default = FALSE)
FUN: function which should take (numeric or character) x and na.rm = TRUE and return a scalar value
subgraph()
Computes the subgraph within a certain distance or degree of separation of all nodes intersecting the given GRanges "seed" window.
clusters()
Marks nodes in the graph with metadata field $cluster based on one of several algorithms, selected by mode. If i and j are specified, the graph is first subsetted, then clusters are computed, then cluster ids are lifted back to mark the original graph.
i: node filter to apply to graph prior to clustering
j: edge filter to apply to graph prior to clustering
weak: character scalar that can take one of the following possible values: "weak" or "strong", specifying weakly or strongly connected components, or "walktrap", specifying cluster_walktrap community detection
eclusters()
gGraph$eclusters(
thresh = 1000,
range = 1e+06,
weak = TRUE,
paths = !weak,
mc.cores = 1,
verbose = FALSE,
chunksize = 1e+30,
method = "single"
)
thresh: the distance threshold with which to group nearby quasi-reciprocal junctions, i.e. if thresh = 0 then we only consider clusters of exactly reciprocal junctions
weak: logical flag; if TRUE, will not differentiate between cycles and paths and will return all weakly connected clusters in the graph [FALSE]
mc.cores: parallel
eclusters2()
Marks ALT edges belonging to (quasi) reciprocal cycles.
gGraph$eclusters2(
thresh = 1000,
range = 1e+06,
weak = TRUE,
paths = !weak,
mc.cores = 1,
verbose = FALSE,
chunksize = 1e+30,
method = "single",
return_pairs = FALSE,
ignore.small = TRUE,
max.small = 10000,
ignore.isolated = TRUE,
strict = c("strict", "one_to_one", "loose"),
min.isolated = max.small,
only_chains = FALSE
)
weak: logical flag; if TRUE, will not differentiate between cycles and paths and will return all weakly connected clusters in the junction graph [FALSE]
mc.cores: parallel
max.small: size below which simple dups and dels are excluded
only_chains: if TRUE, will only pair a breakend to its nearest neighbor if and only if the nearest neighbor is reciprocal
juncs: GRangesList of junctions
ignore.strand: usually TRUE
paths()
Returns shortest paths from query gNode to subject gNode in the graph in the form of gWalks (note: the gNodes must exist in the graph, unlike in the related but more general proximity function).
Each output path is a gWalk that connects query-subject on the genome described by gGraph gg. Each gWalk is annotated by the metadata of the corresponding query-subject GRanges pair as well as fields "altdist" and "refdist" specifying the "alternate" and "reference" gGraph distance of the query-subject pair. The gWalk metadata field "reldist" specifies the relative distance (i.e. ratio of altdist to refdist) for that walk.
NOTE: this operation can be quite expensive for large combinations of query and subject, so the max.dist parameter will by default only compute paths for query-subject pairs that are less than max.dist apart (default 1 Mb). That default is chosen for large queries (e.g. >10K on each side); however, for smaller queries (e.g. length <100) the user may want to set max.dist = Inf.
By default performs a "cartesian" search, i.e. all pairs of query and subject, but if cartesian is set to FALSE will only search for specified pairs of query and subject (then query and subject must be of the same length).
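The difference between the two pairing modes, and the reldist ratio, can be illustrated without any graph at all. The base R sketch below uses made-up identifiers and made-up distances purely for illustration; it does not call gGnome and does not compute real path lengths.

```r
query   <- c("q1", "q2")
subject <- c("s1", "s2")

# cartesian = TRUE: every query is paired with every subject (2 x 2 = 4 pairs).
pairs.cartesian <- expand.grid(query = query, subject = subject,
                               stringsAsFactors = FALSE)

# cartesian = FALSE: query[i] is paired only with subject[i],
# which is why the two inputs must have the same length.
stopifnot(length(query) == length(subject))
pairs.matched <- data.frame(query = query, subject = subject)

# Hypothetical distances (in bp) for the matched pairs;
# reldist is the ratio of altdist to refdist.
pairs.matched$refdist <- c(5e5, 2e5)
pairs.matched$altdist <- c(1e4, 2e5)
pairs.matched$reldist <- pairs.matched$altdist / pairs.matched$refdist
```

A reldist well below 1, as in the first hypothetical pair, would indicate that the pair is much closer on the rearranged graph than on the reference.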
gGraph$paths(
query,
subject = query,
mc.cores = 1,
weight = NULL,
meta = NULL,
ignore.strand = TRUE,
cartesian = TRUE
)
dist()
Computes a distance matrix of query and subject intervals (in base pairs) on the gGraph between any arbitrary pairs of GRanges gr1 and gr2.
gGraph$dist(
query,
subject,
weight = NULL,
ignore.strand = TRUE,
include.internal = TRUE,
verbose = FALSE
)
weight: metadata field of gEdges to use as weight (instead of distance of target node)
include.internal: logical flag whether to allow paths that begin or end inside the query or subject
gr1: GRanges query
gr2: GRanges query (if NULL, will set to gr1)
dt: returns data.table if TRUE, excluding all Inf distances
rep()
Creates "bubbles" in the graph by replicating the nodes or gWalks in the argument. Node replication replicates edges going in and out of all replicated nodes. If an edge connects a pair of replicated nodes, that edge will be replicated across all pairs of those replicated nodes. Walk replication will create "longer bubbles" with fewer edges getting replicated, i.e. it will only replicate intra-walk edges within each walk replicate (but not between separate walk replicates).
(Note that this changes the current gGraph in place, and thus the input gNode or gWalk will no longer apply to the new altered gGraph.)
The new graph keeps track of the parent node and edge ids in the original graph using node metadata parent.node.id and edge metadata parent.edge.id, i.e. the replicated nodes will be connected to the sources of the original nodes, and if replicated nodes connect to each other, then there will exist an edge connecting all of their instances to each other.
swap()
Swaps nodes with GRanges, GRangesList, or gWalks. The provided replacement vector must be the same length as the inputted nodes, resulting in each node being "swapped" by the provided interval, node, grl (representing a walk), or gWalk. The replacement will inherit left and right edges from the removed node. If the replacement is a walk, then the left side of the first node in the walk will inherit the edges that were previously to the left of the node being replaced, and the right side of the last node of the walk will inherit the edges that were previously to the right of the node being replaced.
Note: these replacements obey the orientation of the arguments. So if the node to be replaced is flipped (- orientation with respect to the reference), then its "left" is to the right on the reference. Similarly, for walks whose first interval is flipped with respect to the reference, the left edges will be attached to the right of the node on the reference.
connect()
Connects node pairs in the gGraph by adding (optional) edge metadata and (optionally) inserting nodes or grl / walks in between the given edge. Note: the connections are made with respect to the provided node orientation, so if the node is provided in a "flipped" orientation then its right direction will point left on the reference.
gGraph$connect(
n1,
n2,
n1.side = "right",
n2.side = "left",
type = "ALT",
meta = NULL,
insert = NULL
)
n1: gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object
n2: gNode object that must point to a node in this graph; can also be an index of a node (but not a metadata expression) or a gWalk object
n1.side: character vector of length length(n1) whose value is either "left" or "right" (default 'right')
n2.side: character vector of length length(n1) whose value is either "left" or "right" (default 'left')
print()
Prints out this gGraph: the number of nodes and edges, the gNode associated with this gGraph, and the gEdge associated with this gGraph.
annotate()
Used by the mark() functions in gNode, gEdge, and gWalk to alter the metadata associated with the nodes and edges in this gGraph. Using this function directly is not recommended; it is much safer to use mark.
FYI: the id for nodes is the node.id (not snode.id), and the id for edges is the edge.id (not sedge.id).
maxflow()
Computes the "max flow" between every node pair in self for some metadata field.
The "max flow" for a node pair i, j is the maximum value m of node and/or edge metadata for which there exists a path p between i and j whose nodes n and/or edges e obey field(n) >= m and/or field(e) >= m for all n, e \(\in\) p (i.e. m is the maximum lower bound of the value of nodes / edges across all paths connecting i and j).
The "min version" of this problem (max = FALSE) will determine the min value m for which there exists p whose nodes n and edges e obey field(n) >= m and/or field(e) >= m for all n, e \(\in\) p.
The user can also solve the problem with lower.bound = FALSE, i.e. where m is the (maximum or minimum) upper bound value in each path.
By default, will try to solve the problem across both node and edge metadata if the field is present in either. If the field is only present in one, then will solve for that. This property can be toggled using the edges.only and nodes.only parameters.
gGraph$maxflow(
field = NA,
walk = FALSE,
max = TRUE,
lower.bound = TRUE,
nfield = NA,
efield = NA,
cfield = NA,
path.only = TRUE,
require.nodes = NULL,
multi = FALSE,
ncopies = 1,
reverse.complement = FALSE,
verbose = FALSE
)
field: metadata field to run maxflow on
walk: if TRUE, will return the single walk that maximizes the sum of metadata fields
max: logical flag whether to find maximum path or (if max = FALSE) minimum path
nfield: field to specify a node field to maximize across paths
efield: field to specify an edge field to maximize across paths
cfield: field to specify a node / edge field that limits / caps the dosage at nodes / edges
path.only: logical flag relevant only if walk = TRUE; if path.only is TRUE, will only allow path-based maxflows, i.e. will not return a solution when the graph contains only cycles
multi: logical flag (FALSE); if TRUE, will allow the optimization to compute a solution that outputs multiple disjoint paths
ncopies: positive integer representing the number of copies of the flow that we want the graph to support
reverse.complement: will compute maximum flow between each node i and the reverse complement of node j in a strand-specific way
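The default definition given for maxflow() (max = TRUE, lower.bound = TRUE) matches what is commonly called a widest-path or bottleneck problem. As a minimal sketch, assuming edge-only metadata on a small directed toy graph and ignoring strand, node fields, and the walk, cfield, and multi options, the base R code below computes the value m for every node pair with a Floyd-Warshall-style update. It illustrates the definition only and is not gGnome's implementation; the matrix `w` and the function `widest` are hypothetical names.

```r
# Edge metadata (e.g. copy number) for a 4-node directed toy graph; 0 means no edge.
w <- matrix(0, 4, 4, dimnames = list(1:4, 1:4))
w[1, 2] <- 5; w[2, 4] <- 2   # path 1 -> 2 -> 4 is limited by its weakest edge (2)
w[1, 3] <- 3; w[3, 4] <- 3   # path 1 -> 3 -> 4 is limited by its weakest edge (3)

widest <- function(w) {
  m <- w
  for (k in seq_len(nrow(w))) {
    # Best path routed through node k: limited by the weaker of its two halves.
    via.k <- outer(m[, k], m[k, ], pmin)
    # Keep whichever is better: the current best or the route through k.
    m <- pmax(m, via.k)
  }
  m
}

flow <- widest(w)
```

In this toy graph the best value between nodes 1 and 4 comes from the route through node 3, since its weakest edge is larger than the weakest edge on the route through node 2.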
window()
Returns the region this gGraph spans as a GRanges.
trim()
Trims the current gGraph to the provided GRanges and returns this as a new gGraph.
tile: GRanges to trim on, i.e. the interval around which to trim the gGraph
mod: defaults to FALSE; set to TRUE to modify this gGraph
fix()
Modifies (in place) the current seqlevels of the gGraph, including keeping only certain seqlevels, dropping certain seqlevels, and replacing seqlevels.
Warning: this may modify the graph, including getting rid of nodes and edges (i.e. those outside the retained seqlevels), and may also change coordinates (i.e. move ranges that were previously on different chromosomes to the same chromosome, etc.). Use with caution!
Default behavior is to replace 'chr' with ''.
pattern: character pattern to replace in seqlevels (used in a gsub, can have backreferences)
replacement: character to replace pattern with (used in a gsub, can have backreferences)
drop: character vector of seqlevels to drop or logical TRUE to drop all seqlevels that are unused (TRUE)
seqlengths: new seqlengths, i.e. named integer vector of seqlevels to drop or embed graph into
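Because pattern and replacement are described as being used in a gsub, their effect on seqlevel names can be previewed with base R alone. The sketch below applies gsub() to a hypothetical character vector of seqlevels; it does not call fix(), and whether fix() performs any processing beyond the gsub is not stated in the text above.

```r
# Hypothetical seqlevels with a 'chr' prefix.
seqlevels.in <- c("chr1", "chr2", "chrX", "chrM")

# Default-style behavior described above: replace 'chr' with ''.
stripped <- gsub("chr", "", seqlevels.in)

# A pattern with backreferences: move the prefix to the end, e.g. "chr1" -> "1_chr".
swapped <- gsub("^(chr)(.+)$", "\\2_\\1", seqlevels.in)
```

Previewing the renamed seqlevels this way may help catch collisions (two old names mapping to the same new name) before the graph is modified in place.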
add()
Adds GRanges nodes, edges (data.table), or junctions to the graph. Only one of the below parameters can be specified at a time (since the graph is modified in place, order matters).
json()
Creates a json file for active visualization using gGnome.js. Annotations are node / edge features that will be dumped to json.
filename: character path to save to
save: whether to save or return list object representing json contents
annotations: which graph annotations to dump to json
nfields: which node fields to dump to json (NULL)
efields: which edge fields to dump to json (NULL)
settings: gGnome.js settings values to add to the output JSON files (list)
cid.field: field in the graph edges that should be used for setting the cid values in the JSON (default: 'sedge.id'). This is useful for cases in which there is some unique identifier used across samples to identify identical junctions (for example the "merged.ix" field, which is generated by merge.Junction())
split()