Skip to contents

Generate a precision matrix that exhibits block structure induced by a stochastic block model (SBM).

Usage

gen_prec_sbm(
  p,
  block.sizes = NULL,
  K = 3,
  prob.mat = NULL,
  within.prob = 0.25,
  between.prob = 0.05,
  weight.mat = NULL,
  weight.dists = list("gamma", "unif"),
  weight.paras = list(c(shape = 10000, rate = 100), c(min = 0, max = 5)),
  min.eig = 0.1
)

Arguments

p

An integer specifying the number of variables (dimensions).

block.sizes

An integer vector (default = NULL) specifying the size of each group. If NULL, the \(p\) variables are divided as evenly as possible across \(K\) groups.

K

An integer (default = 3) specifying the number of groups. Ignored if block.sizes is provided; then K <- length(block.sizes).

prob.mat

A \(K\)-by-\(K\) symmetric matrix (default = NULL) specifying the Bernoulli rates. Element (i,j) gives the probability of creating an edge between vertices from groups i and j. If NULL, a matrix with within.prob on the diagonal and between.prob off-diagonal is used.

within.prob

A scalar in [0,1] (default = 0.25) specifying the probability of creating an edge between vertices within the same group. This argument is used only when prob.mat = NULL.

between.prob

A scalar in [0,1] (default = 0.05) specifying the probability of creating an edge between vertices from different groups. This argument is used only when prob.mat = NULL.

weight.mat

A \(p\)-by-\(p\) symmetric matrix (default = NULL) specifying the edge weights. If NULL, weights are generated block-wise according to weight.dists and weight.paras.

weight.dists

A list (default = list("gamma", "unif")) specifying the sampling distribution for each block of weights. Its length determines how the distributions are assigned:

  • length = 1: Same specification for all blocks.

  • length = 2: First for within-group blocks, second for between-group blocks.

  • length = \(K + K(K-1)/2\): Full specification for each block. The first \(K\) elements correspond to within-group blocks with indices 1, ..., K, and the remaining \(K(K-1)/2\) elements correspond to between-group blocks ordered as (1,2), (1,3), ..., (1,K), (2,3), ..., (K-1,K).

Each element of weight.dists can be:

  1. A string specifying the distribution family. Accepted distributions (base R samplers in parentheses) include:

    • "beta": Beta distribution (rbeta)

    • "cauchy": Cauchy distribution (rcauchy).

    • "chisq": Chi-squared distribution (rchisq).

    • "exp": Exponential distribution (rexp).

    • "f": F distribution (rf).

    • "gamma": Gamma distribution (rgamma).

    • "lnorm": Log normal distribution (rlnorm).

    • "norm": Normal distribution (rnorm).

    • "t": Student's t distribution (rt).

    • "unif": Uniform distribution (runif).

    • "weibull": Weibull distribution (rweibull).

  2. A user-supplied function used for sampling. The function must accept an argument n specifying the number of samples.

weight.paras

A list (default = list(c(shape = 1e4, rate = 1e2), c(min = 0, max = 5))) specifying the parameters associated with weight.dists. It must follow the same length rules as weight.dists. Each element should be a named vector or list suitable for the corresponding sampler.

min.eig

A scalar (default = 0.1) specifying the minimum eigenvalue target for the precision matrix. A diagonal shift ensures positive definiteness.

Value

A list with the following components:

Omega

The precision matrix with SBM block structure.

Sigma

The covariance matrix, i.e., the inverse of Omega.

sparsity

Proportion of zero entries in Omega.

membership

An integer vector specifying the group membership.

Details

Edge sampling. Within- and between-group edges are sampled independently according to Bernoulli distributions specified by prob.mat, or by within.prob and between.prob if prob.mat is not supplied.

Weight sampling. Conditional on the adjacency structure, edge weights are sampled block-wise from samplers specified in weight.dists and weight.paras. The length of weight.dists (and weight.paras) determines how weight distributions are assigned:

  • length = 1: Same specification for all blocks.

  • length = 2: first for within-group blocks, second for between-group blocks.

  • length = \(K + K(K - 1)/2\): Full specification for each block.

Block indexing. The order for blocks is:

  • Within-group blocks: Indices 1, ..., K.

  • Between-group blocks: \(K(K-1)/2\) blocks in order (1,2), (1,3), ..., (1,K), (2,3), ..., (K-1,K).

Positive definiteness. The weighted adjacency matrix is symmetrized and used as the precision matrix \(\Omega\). Since arbitrary block-structured weights may not be positive definite, a diagonal adjustment is applied to ensure the minimum eigenvalue of \(\Omega\) is at least min.eig.

Examples

## reproducibility for everything
set.seed(1234)

## user-defined sampler
my_gamma <- function(n) {
  rgamma(n, shape = 10, scale = 0.5)
}

sim <- gen_prec_sbm(p = 20, K = 3,
                    within.prob = 0.25, between.prob = 0.05,
                    weight.dists = list(my_gamma, "unif"),
                    weight.paras = list(NULL, c(min = 0, max = 5)),
                    min.eig = 0.1)