Provides a collection of statistical methods that incorporate both element-wise and group-wise penalties to estimate a precision matrix.
Usage
grasps(
X,
n = nrow(X),
membership,
penalty,
diag.ind = TRUE,
diag.grp = TRUE,
diag.include = FALSE,
lambda = NULL,
alpha = NULL,
gamma = NULL,
nlambda = 10,
lambda.min.ratio = 0.01,
growiter.lambda = 30,
tol.lambda = 0.001,
maxiter.lambda = 50,
rho = 2,
tau.incr = 2,
tau.decr = 2,
nu = 10,
tol.abs = 1e-04,
tol.rel = 1e-04,
maxiter = 10000,
crit = "BIC",
kfold = 5,
ebic.tuning = 0.5
)
Arguments
- X
An n-by-p data matrix with sample size n and dimension p, or a p-by-p sample covariance/correlation matrix with dimension p.
- n
An integer (default = the number of rows of
X) specifying the sample size. This is only required when the input matrix X is a p-by-p sample covariance/correlation matrix with dimension p.
- membership
An integer vector specifying the group membership. The length of
membership must be consistent with the dimension p.
- penalty
A character string specifying the penalty for estimating the precision matrix. Available options include:
"adapt": adaptive lasso (Zou 2006; Fan et al. 2009).
"lasso": lasso (Tibshirani 1996; Friedman et al. 2008).
"mcp": minimax concave penalty (Zhang 2010).
"scad": smoothly clipped absolute deviation (Fan and Li 2001; Fan et al. 2009).
- diag.ind
A boolean (default = TRUE) specifying whether to penalize the diagonal elements.
- diag.grp
A boolean (default = TRUE) specifying whether to penalize the within-group blocks.
- diag.include
A boolean (default = FALSE) specifying whether to include the diagonal entries in the penalty for within-group blocks when
diag.grp = TRUE.
- lambda
A grid of non-negative scalars for the regularization parameter. The default is
NULL, which generates its own lambda sequence based on nlambda and lambda.min.ratio.
- alpha
A grid of scalars in [0,1] specifying the mixing parameter that balances the element-wise individual L1 penalty and the block-wise group L2 penalty. An alpha of 1 corresponds to the element-wise penalty only; an alpha of 0 corresponds to the group penalty only. The default is a sequence from 0.05 to 0.95 in increments of 0.05.
- gamma
A scalar specifying the hyperparameter for the chosen
penalty. Default values:
"adapt": 0.5
"mcp": 3
"scad": 3.7
- nlambda
An integer (default = 10) specifying the number of
lambda values to be generated when lambda = NULL.
- lambda.min.ratio
A scalar (default = 0.01) specifying the fraction of the maximum
lambda value \(\lambda_{\max}\) used to generate the minimum lambda \(\lambda_{\min}\). If lambda = NULL, the program automatically generates a lambda grid as a sequence of length nlambda, equally spaced on the log scale from \(\lambda_{\min}\) to \(\lambda_{\max}\).
- growiter.lambda
An integer (default = 30) specifying the maximum number of exponential growth steps during the initial search for an admissible upper bound \(\lambda_{\max}\).
- tol.lambda
A scalar (default = 1e-03) specifying the relative tolerance for the bisection stopping rule on the interval width.
- maxiter.lambda
An integer (default = 50) specifying the maximum number of bisection iterations in the line search for \(\lambda_{\max}\).
- rho
A scalar > 0 (default = 2) specifying the ADMM augmented-Lagrangian penalty parameter (often called the ADMM step size). Larger values typically put more weight on enforcing the consensus constraints each iteration; smaller values yield more conservative updates.
- tau.incr
A scalar > 1 (default = 2) specifying the multiplicative factor used to increase
rho when the primal residual dominates the dual residual in ADMM.
- tau.decr
A scalar > 1 (default = 2) specifying the multiplicative factor used to decrease
rho when the dual residual dominates the primal residual in ADMM.
- nu
A scalar > 1 (default = 10) controlling how aggressively
rho is rescaled in the adaptive rho scheme (residual balancing).
- tol.abs
A scalar > 0 (default = 1e-04) specifying the absolute tolerance for ADMM stopping (applied to primal/dual residual norms).
- tol.rel
A scalar > 0 (default = 1e-04) specifying the relative tolerance for ADMM stopping (applied to primal/dual residual norms).
- maxiter
An integer (default = 1e+04) specifying the maximum number of ADMM iterations.
- crit
A string (default = "BIC") specifying the parameter selection method to use. Available options include:
"AIC": Akaike information criterion (Akaike 1973).
"BIC": Bayesian information criterion (Schwarz 1978).
"EBIC": extended Bayesian information criterion (Foygel and Drton 2010).
"HBIC": high-dimensional Bayesian information criterion (Wang et al. 2013; Fan et al. 2017).
"CV": k-fold cross validation with negative log-likelihood loss.
- kfold
An integer (default = 5) specifying the number of folds used for
crit = "CV".
- ebic.tuning
A scalar (default = 0.5) specifying the tuning parameter used to compute the score when
crit = "EBIC".
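The penalties, the alpha mixing, the log-scale lambda grid, and the adaptive-rho update described above can be sketched as follows. This is a hedged illustration following the cited papers and standard ADMM residual balancing, not the package internals; all function names here are hypothetical.

```r
## Log-scale lambda grid from lambda.min.ratio and nlambda.
lambda_grid <- function(lambda.max, nlambda = 10, lambda.min.ratio = 0.01) {
  exp(seq(log(lambda.min.ratio * lambda.max), log(lambda.max),
          length.out = nlambda))
}

## Element-wise penalty values on a scalar x: lasso, MCP (default
## gamma = 3), and SCAD (default gamma = 3.7), as in the cited papers.
lasso_pen <- function(x, lambda) lambda * abs(x)
mcp_pen <- function(x, lambda, gamma = 3) {
  ifelse(abs(x) <= gamma * lambda,
         lambda * abs(x) - x^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}
scad_pen <- function(x, lambda, gamma = 3.7) {
  ax <- abs(x)
  ifelse(ax <= lambda,
         lambda * ax,
         ifelse(ax <= gamma * lambda,
                (2 * gamma * lambda * ax - ax^2 - lambda^2) /
                  (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))
}

## alpha mixes an element-wise L1 term with a block-wise group L2
## (Frobenius) term on a group block B of the precision matrix.
mixed_pen <- function(B, lambda, alpha) {
  lambda * (alpha * sum(abs(B)) + (1 - alpha) * sqrt(sum(B^2)))
}

## Residual balancing for the ADMM step size rho: grow rho by tau.incr
## when the primal residual exceeds nu times the dual residual, and
## shrink it by tau.decr in the opposite case.
update_rho <- function(rho, r_norm, s_norm,
                       nu = 10, tau.incr = 2, tau.decr = 2) {
  if (r_norm > nu * s_norm) rho * tau.incr
  else if (s_norm > nu * r_norm) rho / tau.decr
  else rho
}
```

For example, `mixed_pen(B, lambda, 1)` reduces to the pure element-wise L1 penalty on the block, and `mixed_pen(B, lambda, 0)` to the pure group L2 penalty.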
Value
A list containing the following components:
- hatOmega
The estimated precision matrix.
- lambda
The optimal regularization parameter.
- alpha
The optimal penalty balancing parameter.
- initial
The initial estimate of
hatOmega when penalty is set to "adapt", "mcp", or "scad".
- gamma
The optimal hyperparameter when
penalty is set to "adapt", "mcp", or "scad".
- iterations
The number of ADMM iterations.
- lambda.grid
The actual lambda grid used in the program.
- alpha.grid
The actual alpha grid used in the program.
- lambda.safe
The bisection-refined upper bound \(\lambda_{\max}\), corresponding to
alpha.grid, when lambda = NULL.
- loss
The optimal k-fold loss when
crit = "CV".
- CV.loss
Matrix of CV losses, with rows for parameter combinations and columns for CV folds, when
crit = "CV".
- score
The optimal information criterion score when
crit is set to "AIC", "BIC", "EBIC", or "HBIC".
- IC.score
The information criterion score for each parameter combination when
crit is set to "AIC", "BIC", "EBIC", or "HBIC".
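As an illustration of the scores reported above, the EBIC of Foygel and Drton (2010) can be sketched as follows; this is a hedged sketch of the general formula, not the package's exact implementation, where `loglik` is the Gaussian log-likelihood of the estimate and `n.edges` the number of nonzero off-diagonal pairs.

```r
## Hypothetical EBIC sketch: ebic.tuning = 0 recovers the ordinary BIC.
ebic_score <- function(loglik, n, p, n.edges, ebic.tuning = 0.5) {
  -2 * loglik + n.edges * log(n) + 4 * ebic.tuning * n.edges * log(p)
}
```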
References
Akaike H (1973).
“Information Theory and an Extension of the Maximum Likelihood Principle.”
In Petrov BN, Csáki F (eds.), Second International Symposium on Information Theory, 267–281.
Akadémiai Kiadó, Budapest, Hungary.
Fan J, Feng Y, Wu Y (2009).
“Network Exploration via the Adaptive LASSO and SCAD Penalties.”
The Annals of Applied Statistics, 3(2), 521–541.
doi:10.1214/08-aoas215.
Fan J, Li R (2001).
“Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties.”
Journal of the American Statistical Association, 96(456), 1348–1360.
doi:10.1198/016214501753382273.
Fan J, Liu H, Ning Y, Zou H (2017).
“High Dimensional Semiparametric Latent Graphical Model for Mixed Data.”
Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(2), 405–421.
doi:10.1111/rssb.12168.
Foygel R, Drton M (2010).
“Extended Bayesian Information Criteria for Gaussian Graphical Models.”
In Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A (eds.), Advances in Neural Information Processing Systems 23 (NIPS 2010), 604–612.
Friedman J, Hastie T, Tibshirani R (2008).
“Sparse Inverse Covariance Estimation with the Graphical Lasso.”
Biostatistics, 9(3), 432–441.
doi:10.1093/biostatistics/kxm045.
Schwarz G (1978).
“Estimating the Dimension of a Model.”
The Annals of Statistics, 6(2), 461–464.
doi:10.1214/aos/1176344136.
Tibshirani R (1996).
“Regression Shrinkage and Selection via the Lasso.”
Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
doi:10.1111/j.2517-6161.1996.tb02080.x.
Wang L, Kim Y, Li R (2013).
“Calibrating Nonconvex Penalized Regression in Ultra-High Dimension.”
The Annals of Statistics, 41(5), 2505–2536.
doi:10.1214/13-AOS1159.
Zhang C (2010).
“Nearly Unbiased Variable Selection under Minimax Concave Penalty.”
The Annals of Statistics, 38(2), 894–942.
doi:10.1214/09-AOS729.
Zou H (2006).
“The Adaptive Lasso and Its Oracle Properties.”
Journal of the American Statistical Association, 101(476), 1418–1429.
doi:10.1198/016214506000000735.
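A minimal usage sketch on simulated data, assuming the function is exported by a package of the same name; the sample size, dimension, and group sizes below are arbitrary illustrations.

```r
## Illustrative only: assumes a 'grasps' package exporting grasps().
library(grasps)

set.seed(1)
n <- 100; p <- 15
X <- matrix(rnorm(n * p), n, p)   # n-by-p data matrix
membership <- rep(1:3, each = 5)  # three groups of five variables

fit <- grasps(X, membership = membership, penalty = "lasso", crit = "BIC")
fit$lambda       # selected regularization parameter
dim(fit$hatOmega) # p-by-p estimated precision matrix
```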
