| Title: | Identification and Analysis of Co-Occurrence Networks |
|---|---|
| Description: | Implementation of the NetCutter algorithm described in Müller and Mancuso (2008) <doi:10.1371/journal.pone.0003178>. The package identifies co-occurring terms in a list of containers. For example, it may be used to detect genes that co-occur across genomes. |
| Authors: | Heiko Müller [aut], Francesco Mancuso [aut], Federico Marotta [cre] |
| Maintainer: | Federico Marotta |
| License: | MIT + file LICENSE |
| Version: | 0.3.1 |
| Built: | 2026-05-25 08:40:03 UTC |
| Source: | https://github.com/fmarotta/netcutter |
Helper function to generate the list of co-occurrence terms grouped into modules of a specified size.
nc_define_modules(occ_matrix, terms_of_interest, module_size, min_occurrences)
occ_matrix |
The original occurrence matrix. |
terms_of_interest |
Vector of column names or indices representing the terms that should be included in the analysis. |
module_size |
The number of terms that should be tested for co-occurrence. |
min_occurrences |
Minimum number of occurrences of each term. |
A list of the valid modules.
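As a rough illustration of what "valid modules" could mean, the sketch below filters terms by their number of occurrences and then enumerates every combination of `module_size` terms. The helper `define_modules_sketch()` is hypothetical and written in base R only; it is not the package's `nc_define_modules()`, whose implementation may differ. It also assumes that `terms_of_interest` is given as column names and that a module is kept when it contains at least one of those terms.

```r
# Hypothetical helper for illustration; NOT the package's nc_define_modules().
define_modules_sketch <- function(occ_matrix, terms_of_interest = NULL,
                                  module_size = 2, min_occurrences = 0) {
  # Keep only the terms that occur in at least min_occurrences containers.
  keep <- colnames(occ_matrix)[colSums(occ_matrix) >= min_occurrences]
  if (length(keep) < module_size) return(list())
  # Enumerate every combination of module_size terms.
  modules <- combn(keep, module_size, simplify = FALSE)
  # Assumption: keep modules containing at least one term of interest.
  if (!is.null(terms_of_interest)) {
    modules <- Filter(function(mod) any(mod %in% terms_of_interest), modules)
  }
  modules
}

# Same toy occurrence matrix as in the package examples.
m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- TRUE
m[2, c(1:2, 4:5)] <- TRUE
m[3, c(1, 6:9)] <- TRUE

# Only gene1 and gene2 occur at least twice, so a single pair remains.
mods <- define_modules_sketch(m, module_size = 2, min_occurrences = 2)
```

Enumerating combinations grows quickly with the number of terms, which is why filtering by `min_occurrences` before calling `combn()` is the cheaper order of operations in this sketch.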
The main NetCutter function. It generates p-values for all the co-occurring modules.
nc_eval(occ_matrix, occ_probs, terms_of_interest = NULL, module_size = 2, min_occurrences = 0, min_support = 0, mc.cores = 1)
occ_matrix |
The original occurrence matrix. |
occ_probs |
The matrix of occurrence probabilities, as computed by nc_occ_probs(). |
terms_of_interest |
Vector of column names or indices representing the terms that should be included in the analysis. |
module_size |
The number of terms that should be tested for co-occurrence. |
min_occurrences |
Minimum number of occurrences of each term. |
min_support |
Minimum number of occurrences of each module. |
mc.cores |
Number of parallel computations with mclapply() (set to 1 for serial execution). |
If terms_of_interest is NULL, all the terms in occ_matrix are used. If
it is not NULL, only modules containing at least one of these terms are
considered. min_occurrences and min_support are still applied to further
restrict the terms and modules that are considered.
A data.frame with one row for each valid module, giving its
number of co-occurrences and its p-value.
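The filtering rules described above can be sketched in base R. The example counts, for every pair of terms, the containers in which both terms occur, then keeps the pairs that involve a term of interest and reach `min_support`. The helper `module_support_sketch()` and the `"+"`-joined module labels are hypothetical; this is not the package's `nc_eval()`, and the p-value computation is deliberately left out.

```r
# Hypothetical helper: number of containers in which all terms of a module occur.
module_support_sketch <- function(occ_matrix, module) {
  sum(rowSums(occ_matrix[, module, drop = FALSE]) == length(module))
}

# Same toy occurrence matrix as in the package examples.
m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- TRUE
m[2, c(1:2, 4:5)] <- TRUE
m[3, c(1, 6:9)] <- TRUE

# One row per candidate pair, with its number of co-occurrences.
pairs <- combn(colnames(m), 2, simplify = FALSE)
res <- data.frame(
  module = vapply(pairs, paste, character(1), collapse = "+"),
  support = vapply(pairs, module_support_sketch, integer(1), occ_matrix = m)
)

# Keep modules that contain a term of interest and reach min_support.
terms_of_interest <- c("gene1", "gene2")
min_support <- 2
has_term <- vapply(pairs, function(p) any(p %in% terms_of_interest), logical(1))
kept <- res[has_term & res$support >= min_support, ]
```

In this toy matrix only gene1 and gene2 appear together in more than one container, so `kept` holds a single row.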
# Generate an occurrence matrix.
m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- m[2, c(1:2, 4:5)] <- m[3, c(1, 6:9)] <- TRUE
# Set the seed using the "L'Ecuyer-CMRG" random number generator.
set.seed(1, "L'Ecuyer-CMRG")
# Compute the occurrence probabilities.
occ_probs <- nc_occ_probs(m, R = 20, S = 50)
# Evaluate the co-occurrences of pairs of terms and their statistical significance.
nc_eval(m, occ_probs, module_size = 2)
# Now evaluate triples; no need to recompute the occurrence probabilities.
nc_eval(m, occ_probs, module_size = 3)
# Now consider only modules involving gene1 or gene2.
nc_eval(m, occ_probs, module_size = 2, terms_of_interest = c("gene1", "gene2"))
Use the EdgeSwapping method to find the probability of occurrence of each term in each container under the null hypothesis.
nc_occ_probs(occ_matrix, R = 500, S = sum(occ_matrix) * 10, mc.cores = getOption("mc.cores", 1L), n_batches = ceiling(R/30), verbose = FALSE)
occ_matrix |
The original occurrence matrix. |
R |
The number of randomisations to perform. |
S |
The number of successful edge swaps for each randomisation. |
mc.cores |
Number of parallel computations with mclapply() (set to 1 for serial execution). |
n_batches |
Split the computation into n_batches batches. Fewer batches can be faster but need more RAM. |
verbose |
Print a status message when starting every new batch. |
The occurrence probability matrix.
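One way to picture the result is as the cell-wise average of R randomised copies of the occurrence matrix, accumulated batch by batch so that only one batch of matrices is in memory at a time. The sketch below follows that reading, which is an assumption rather than a description of the package's code. Both `occ_probs_sketch()` and the stand-in randomiser `shuffle_columns()` are hypothetical. The stand-in shuffles each column independently, so it preserves only the term totals; a real edge-swapping randomiser would preserve the container totals as well.

```r
# Hypothetical helper for illustration; NOT the package's nc_occ_probs().
occ_probs_sketch <- function(occ_matrix, R, randomize, n_batches = 1) {
  n_batches <- min(n_batches, R)
  # Assign the R randomisations to batches as evenly as possible.
  batch_id <- rep_len(seq_len(n_batches), R)
  total <- matrix(0, nrow(occ_matrix), ncol(occ_matrix),
                  dimnames = dimnames(occ_matrix))
  for (b in seq_len(n_batches)) {
    n_b <- sum(batch_id == b)
    # Only the matrices of the current batch are held in memory at once.
    batch <- replicate(n_b, randomize(occ_matrix), simplify = FALSE)
    total <- total + Reduce(`+`, batch)
  }
  # Fraction of randomisations in which each term occurs in each container.
  total / R
}

# Stand-in randomiser (assumption): shuffles every column independently.
shuffle_columns <- function(mat) {
  out <- apply(mat, 2, sample)
  dimnames(out) <- dimnames(mat)
  out
}

m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- TRUE
m[2, c(1:2, 4:5)] <- TRUE
m[3, c(1, 6:9)] <- TRUE

set.seed(1)
p <- occ_probs_sketch(m, R = 20, randomize = shuffle_columns, n_batches = 4)
```

Passing the randomiser as an argument keeps the averaging logic separate from the choice of null model, so any function that returns a matrix of the same shape can be plugged in.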
# Generate an occurrence matrix.
m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- m[2, c(1:2, 4:5)] <- m[3, c(1, 6:9)] <- TRUE
# Set the seed using the `rlecuyer` package
rlecuyer::.lec.SetPackageSeed(1:6)
# Compute the occurrence probabilities.
occ_probs <- nc_occ_probs(m, R = 20, S = 50)
# Using `n_batches=1` can speed up the computations at the cost of more RAM.
occ_probs <- nc_occ_probs(m, R = 20, n_batches = 1, mc.cores = 1)
This is a simpler implementation used to check that the official
implementation (nc_occ_probs()) works well.
nc_occ_probs_simple(occ_matrix, R, S)
occ_matrix |
The original occurrence matrix. |
R |
The number of randomisations to perform. |
S |
The number of successful edge swaps for each randomisation. |
Apply an edge-swapping algorithm.
nc_randomize(occ_matrix, S)
occ_matrix |
The original occurrence matrix. |
S |
The number of successful edge swaps to perform. |
A randomized copy of the occurrence matrix.
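A generic edge swap on a logical occurrence matrix can be sketched as follows: pick two existing occurrences, and if exchanging their terms creates no duplicate occurrence, apply the exchange. Each successful swap leaves every container total and every term total unchanged. The helper `edge_swap_sketch()` and its `max_tries` safeguard are hypothetical; this is a textbook-style illustration, not the package's `nc_randomize()`, which may choose candidates differently.

```r
# Hypothetical helper for illustration; NOT the package's nc_randomize().
edge_swap_sketch <- function(occ_matrix, S, max_tries = 1000 * S) {
  # Each row of `edges` is one occurrence: (container index, term index).
  edges <- which(occ_matrix, arr.ind = TRUE)
  if (nrow(edges) < 2) return(occ_matrix)
  done <- 0
  tries <- 0
  while (done < S && tries < max_tries) {
    tries <- tries + 1
    idx <- sample.int(nrow(edges), 2)
    r1 <- edges[idx[1], 1]; c1 <- edges[idx[1], 2]
    r2 <- edges[idx[2], 1]; c2 <- edges[idx[2], 2]
    # The swap is valid only if it creates no duplicate occurrence.
    if (r1 != r2 && c1 != c2 && !occ_matrix[r1, c2] && !occ_matrix[r2, c1]) {
      occ_matrix[r1, c1] <- FALSE
      occ_matrix[r2, c2] <- FALSE
      occ_matrix[r1, c2] <- TRUE
      occ_matrix[r2, c1] <- TRUE
      # Keep the edge list in sync with the matrix.
      edges[idx[1], 2] <- c2
      edges[idx[2], 2] <- c1
      done <- done + 1
    }
  }
  occ_matrix
}

m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- TRUE
m[2, c(1:2, 4:5)] <- TRUE
m[3, c(1, 6:9)] <- TRUE

set.seed(42)
r <- edge_swap_sketch(m, S = 50)
# Container totals and term totals are unchanged by the swaps.
stopifnot(identical(rowSums(r), rowSums(m)), identical(colSums(r), colSums(m)))
```

The `max_tries` limit only guards against matrices in which few or no valid swaps exist; in that case the sketch returns after fewer than `S` successful swaps.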
Faster implementation that samples rows and columns independently.
nc_randomize_fast(occ_matrix, S)
occ_matrix |
The original occurrence matrix. |
S |
The number of successful edge swaps to perform. |
Old implementation in pure R, kept for testing purposes and for reproducibility of old results.
nc_randomize_R(occ_matrix, S)
occ_matrix |
The original occurrence matrix. |
S |
The number of successful edge swaps to perform. |
This is a simpler implementation used to check that the official
implementation (nc_randomize()) works well.
nc_randomize_simple(occ_matrix, S)
occ_matrix |
The original occurrence matrix. |
S |
The number of successful edge swaps to perform. |
Sample one item from a vector, even when the vector has length 1.
safe_sample(x)
x |
Vector of values to sample |
When x is a single number, the sample() function assumes that we want to
sample from 1 to x. However, we deal with vectors of unknown length,
possibly of length 1, and we always want to sample among the values of x.
This function ensures that.
One value from x.
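The pitfall is easy to reproduce in base R. The sketch below contrasts `sample()` with a hypothetical re-implementation, `safe_sample_sketch()`, that samples an index instead of the values themselves. The package's own `safe_sample()` may be written differently.

```r
# Hypothetical re-implementation for illustration only.
safe_sample_sketch <- function(x) {
  # Sampling an index avoids sample()'s special case for a single number.
  x[sample.int(length(x), 1)]
}

set.seed(1)
sample(10, 1)                    # draws from 1:10, so the result may not be 10
safe_sample_sketch(10)           # always returns 10
safe_sample_sketch(c(3, 7, 9))   # returns one of 3, 7 or 9
```

Indexing with `sample.int(length(x), 1)` works for vectors of any type with at least one element, which is why the sketch never inspects the values of `x`.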