Title: | Correcting Misclassified Mediation Analysis |
---|---|
Description: | Use three methods to estimate parameters from a mediation analysis with a binary misclassified mediator. These methods correct for the problem of "label switching" using Youden's J criteria. A detailed description of the analysis methods is available in Webb and Wells (2024), "Effect estimation in the presence of a misclassified binary mediator" <doi:10.48550/arXiv.2407.06970>. |
Authors: | Kimberly Webb [aut, cre] |
Maintainer: | Kimberly Webb <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.1 |
Built: | 2025-01-15 06:12:39 UTC |
Source: | https://github.com/kimberlywebb/comma |
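The description above mentions correcting "label switching" using Youden's J. As a rough illustration only, the sketch below computes Youden's J (sensitivity + specificity - 1) from estimated classification probabilities and flags a negative value as a sign that the mediator labels are flipped. The helper names `youden_j` and `check_label_switching` are hypothetical and are not part of the package; how the package applies the criterion internally is not shown here.

```r
# Hypothetical helper: Youden's J statistic.
youden_j <- function(sensitivity, specificity) {
  sensitivity + specificity - 1
}

# Hypothetical check for label switching (an assumption, not package code).
# p_obs1_true1: P(observed = 1 | true = 1) for each subject (sensitivity)
# p_obs2_true2: P(observed = 2 | true = 2) for each subject (specificity)
check_label_switching <- function(p_obs1_true1, p_obs2_true2) {
  j <- youden_j(mean(p_obs1_true1), mean(p_obs2_true2))
  list(J = j, switched = j < 0)
}

# Estimated labels agree with the observed labels: J is positive
ok <- check_label_switching(c(0.90, 0.85), c(0.80, 0.95))

# Estimated labels are flipped relative to the observed labels: J is negative
flipped <- check_label_switching(c(0.10, 0.15), c(0.20, 0.05))
```

A negative J means the fitted model classifies worse than chance under the current labelling, which is the situation a label-switching correction is meant to catch.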
Jointly estimate β and γ parameters from the true outcome
and observation mechanisms, respectively, in a binary outcome misclassification
model.
COMBO_EM_algorithm( Ystar, x_matrix, z_matrix, beta_start, gamma_start, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem" )
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
beta_start |
A numeric vector or column matrix of starting values for the |
gamma_start |
A numeric vector or matrix of starting values for the |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is 1e-07. |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is 1500. |
em_method |
A character string specifying which EM algorithm will be applied.
The default is "squarem". |
COMBO_EM_algorithm returns a data frame containing four columns. The first
column, Parameter, represents a unique parameter value for each row.
The next column contains the parameter Estimates, followed by the standard
error estimates, SE. The final column, Convergence, reports
whether or not the algorithm converged for a given parameter estimate.
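To show how `tolerance` and `max_em_iterations` interact, here is a minimal sketch of an EM loop that stops when successive log-likelihood values differ by less than the tolerance or when the iteration cap is reached. It uses a toy two-component normal mixture, not the misclassification model, and a plain EM loop rather than the accelerated "squarem" scheme; `log_lik`, `em_update`, and `run_em` are hypothetical names.

```r
set.seed(1)
# Toy data: mixture of N(0, 1) and N(3, 1) with true mixing weight 0.3
y <- c(rnorm(700, mean = 0), rnorm(300, mean = 3))

# Observed-data log-likelihood as a function of the mixing weight p
log_lik <- function(p) sum(log((1 - p) * dnorm(y, 0) + p * dnorm(y, 3)))

em_update <- function(p) {
  # E-step: posterior probability that each point came from component 2
  w <- p * dnorm(y, 3) / ((1 - p) * dnorm(y, 0) + p * dnorm(y, 3))
  # M-step: the updated mixing weight is the mean posterior probability
  mean(w)
}

run_em <- function(p_start, tolerance = 1e-07, max_em_iterations = 1500) {
  p <- p_start
  ll_old <- log_lik(p)
  converged <- FALSE
  for (iter in seq_len(max_em_iterations)) {
    p <- em_update(p)
    ll_new <- log_lik(p)
    # Stop once the log-likelihood change falls below the tolerance
    if (abs(ll_new - ll_old) < tolerance) {
      converged <- TRUE
      break
    }
    ll_old <- ll_new
  }
  list(estimate = p, iterations = iter, converged = converged)
}

fit <- run_em(p_start = 0.5)
fit$estimate
fit$converged
```

If the loop exhausts `max_em_iterations` without meeting the tolerance, `converged` stays FALSE, which is the kind of outcome a Convergence column would report.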
EM-Algorithm Function for Estimation of the Misclassification Model
COMBO_EM_function(param_current, obs_Y_matrix, X, Z, sample_size, n_cat)
param_current |
A numeric vector of regression parameters, in the order
|
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
COMBO_EM_function returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
Compute E-step for Binary Outcome Misclassification Model Estimated With the EM-Algorithm
COMBO_weight(ystar_matrix, pistar_matrix, pi_matrix, sample_size, n_cat)
ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
pi_matrix |
A numeric matrix of probabilities obtained from the internal
function |
sample_size |
An integer value specifying the number of observations in
the sample. This value should be equal to the number of rows of the observed
outcome matrix, |
n_cat |
The number of categorical values that the true outcome, |
COMBO_weight returns a matrix of E-step weights for the EM-algorithm.
Rows of the matrix correspond to each subject. Columns of the matrix correspond
to the n_cat true outcome categories.
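E-step weights of this kind are posterior probabilities of each true outcome category given the observed outcome. A minimal sketch follows, assuming the usual Bayes-rule form w_ij = sum_k y*_ik pi*_ikj pi_ij / sum_l pi*_ikl pi_il. The function `e_step_weights` and the three-dimensional `pistar_array` layout are assumptions made for readability; the package stores these probabilities in a matrix, and its internal code is not reproduced here.

```r
# Hypothetical re-implementation for illustration; not the package's code.
# ystar_matrix: n x n_cat indicator matrix of observed outcomes
# pi_matrix:    n x n_cat matrix, [i, j] = P(Y_i = j)
# pistar_array: n x n_cat x n_cat array, [i, k, j] = P(Y*_i = k | Y_i = j)
e_step_weights <- function(ystar_matrix, pistar_array, pi_matrix) {
  n <- nrow(ystar_matrix)
  n_cat <- ncol(pi_matrix)
  weights <- matrix(0, nrow = n, ncol = n_cat)
  for (k in seq_len(n_cat)) {
    # Joint probability P(Y* = k, Y = j): subjects in rows, j in columns
    joint <- matrix(pistar_array[, k, ], nrow = n) * pi_matrix
    # Posterior P(Y = j | Y* = k), counted only where Y* = k was observed
    weights <- weights + ystar_matrix[, k] * joint / rowSums(joint)
  }
  weights
}

# Two subjects: the first observed in category 1, the second in category 2
ystar_matrix <- matrix(c(1, 0,
                         0, 1), nrow = 2, byrow = TRUE)
pi_matrix <- matrix(c(0.6, 0.4,
                      0.6, 0.4), nrow = 2, byrow = TRUE)
pistar_array <- array(0, dim = c(2, 2, 2))
pistar_array[, 1, 1] <- 0.9  # P(Y* = 1 | Y = 1)
pistar_array[, 2, 1] <- 0.1  # P(Y* = 2 | Y = 1)
pistar_array[, 1, 2] <- 0.2  # P(Y* = 1 | Y = 2)
pistar_array[, 2, 2] <- 0.8  # P(Y* = 2 | Y = 2)

w <- e_step_weights(ystar_matrix, pistar_array, pi_matrix)
w
```

Each row of `w` sums to 1, since it is a probability distribution over the true outcome categories for that subject.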
Generate Bootstrap Samples for Estimating Standard Errors
COMMA_boot_sample( parameter_estimates, sigma_estimate = 1, outcome_distribution, interaction_indicator, x_matrix, z_matrix, c_matrix )
parameter_estimates |
A column matrix of |
sigma_estimate |
A numeric value specifying the estimated
standard deviation. This value is only required if |
outcome_distribution |
A character string specifying the distribution of
the outcome variable. Options are |
interaction_indicator |
A logical value indicating if an interaction between x and m is included
in the outcome mechanism. |
x_matrix |
A numeric matrix of predictors in the true mediator and outcome mechanisms.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
COMMA_boot_sample returns a list with the bootstrap sample data:
obs_mediator |
A vector of observed mediator values. |
true_mediator |
A vector of true mediator values. |
outcome |
A vector of outcome values. |
x_matrix |
A matrix of predictor values in the true mediator mechanism. Identical to that supplied by the user. |
z_matrix |
A matrix of predictor values in the observed mediator mechanism. Identical to that supplied by the user. |
c_matrix |
A matrix of covariates. Identical to that supplied by the user. |
Generate Data to use in COMMA Functions
COMMA_data( sample_size, x_mu, x_sigma, z_shape, c_shape, interaction_indicator, outcome_distribution, true_beta, true_gamma, true_theta )
sample_size |
An integer specifying the sample size of the generated data set. |
x_mu |
A numeric value specifying the mean of |
x_sigma |
A positive numeric value specifying the standard deviation of
|
z_shape |
A positive numeric value specifying the shape parameter of
|
c_shape |
A positive numeric value specifying the shape parameter of
|
interaction_indicator |
A logical value indicating if an interaction between x and m is included
in the outcome mechanism. |
outcome_distribution |
A character string specifying the distribution of
the outcome variable. Options are |
true_beta |
A column matrix of |
true_gamma |
A numeric matrix of |
true_theta |
A column matrix of |
COMMA_data returns a list of generated data elements:
obs_mediator |
A vector of observed mediator values. |
true_mediator |
A vector of true mediator values. |
outcome |
A vector of outcome values. |
x |
A vector of generated predictor values in the true mediator mechanism, from the Normal distribution. |
z |
A vector of generated predictor values in the observed mediator mechanism from the Gamma distribution. |
c |
A vector of generated covariates. |
x_design_matrix |
The design matrix for the |
z_design_matrix |
The design matrix for the |
c_design_matrix |
The design matrix for the |
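To make the three mechanisms concrete, the following base R sketch generates data with the same broad structure: a Normal predictor, Gamma covariates, a logistic true mediator, a misclassified observed mediator, and a Bernoulli outcome. The parameter ordering, the reference categories, and the layout of the gamma matrix are assumptions chosen for illustration; they may not match the package's internal parameterization.

```r
set.seed(123)
n <- 1000

# Predictors, following the distributions named above
x <- rnorm(n, mean = 0, sd = 1)   # x_mu = 0, x_sigma = 1
z <- rgamma(n, shape = 1)         # z_shape = 1
c_cov <- rgamma(n, shape = 1)     # c_shape = 1

# Illustrative parameter values (assumed ordering: intercept, then slopes)
beta  <- c(1, -2, 0.5)                          # true mediator: intercept, x, c
gamma <- matrix(c(1, 1, -0.5, -1.5), nrow = 2)  # column j: intercept, z given M = j
theta <- c(1, 1.5, -2, -0.2)                    # outcome: intercept, x, m, c

# True mediator: category 1 with a logistic probability, otherwise category 2
p_m1 <- plogis(beta[1] + beta[2] * x + beta[3] * c_cov)
true_mediator <- ifelse(runif(n) < p_m1, 1, 2)

# Observed mediator: P(M* = 1 | M = j) depends on z through column j of gamma
p_obs1 <- plogis(gamma[1, true_mediator] + gamma[2, true_mediator] * z)
obs_mediator <- ifelse(runif(n) < p_obs1, 1, 2)

# Bernoulli outcome with no x-by-m interaction term
m_indicator <- as.numeric(true_mediator == 1)
p_y <- plogis(theta[1] + theta[2] * x + theta[3] * m_indicator +
                theta[4] * c_cov)
outcome <- rbinom(n, size = 1, prob = p_y)

# Share of subjects whose observed mediator differs from the true mediator
misclassification_rate <- mean(obs_mediator != true_mediator)
misclassification_rate
```

Changing the values in `gamma` changes how often the observed mediator disagrees with the true mediator, which mirrors the comment in the package example that the gamma terms set the misclassification rate.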
set.seed(20240709)
sample_size <- 10000
n_cat <- 2 # Number of categories in the binary mediator

# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1

# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)

example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
                           interaction_indicator = FALSE,
                           outcome_distribution = "Bernoulli",
                           true_beta, true_gamma, true_theta)

head(example_data$obs_mediator)
head(example_data$true_mediator)
Jointly estimate β, γ, and θ parameters from
the true mediator, observed mediator, and outcome mechanisms, respectively,
in a binary mediator misclassification model.
COMMA_EM( Mstar, outcome, outcome_distribution, interaction_indicator, x_matrix, z_matrix, c_matrix, beta_start, gamma_start, theta_start, sigma_start = NULL, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem" )
Mstar |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
outcome |
A vector containing the outcome variables of interest. There
should be no |
outcome_distribution |
A character string specifying the distribution of
the outcome variable. Options are |
interaction_indicator |
A logical value indicating if an interaction between x and m is included
in the outcome mechanism. |
x_matrix |
A numeric matrix of predictors in the true mediator and outcome mechanisms.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
beta_start |
A numeric vector or column matrix of starting values for the |
gamma_start |
A numeric vector or matrix of starting values for the |
theta_start |
A numeric vector or column matrix of starting values for the |
sigma_start |
A numeric value specifying the starting value for the
standard deviation. This value is only required if |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is 1e-07. |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is 1500. |
em_method |
A character string specifying which EM algorithm will be applied.
The default is "squarem". |
COMMA_EM returns a data frame containing four columns. The first
column, Parameter, represents a unique parameter value for each row.
The next column contains the parameter Estimates, followed by the standard
error estimates, SE. The final column, Convergence, reports
whether or not the algorithm converged for a given parameter estimate.
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator

# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1

# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)

example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
                           interaction_indicator = FALSE,
                           outcome_distribution = "Bernoulli",
                           true_beta, true_gamma, true_theta)

beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)

Mstar = example_data[["obs_mediator"]]
outcome = example_data[["outcome"]]
x_matrix = example_data[["x"]]
z_matrix = example_data[["z"]]
c_matrix = example_data[["c"]]

EM_results <- COMMA_EM(Mstar, outcome, "Bernoulli", FALSE,
                       x_matrix, z_matrix, c_matrix,
                       beta_start, gamma_start, theta_start)

EM_results
Estimate Bootstrap Standard Errors using EM
COMMA_EM_bootstrap_SE( parameter_estimates, sigma_estimate = 1, n_bootstrap, n_parallel, outcome_distribution, interaction_indicator, x_matrix, z_matrix, c_matrix, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem", random_seed = NULL )
parameter_estimates |
A column matrix of |
sigma_estimate |
A numeric value specifying the estimated
standard deviation. This value is only required if |
n_bootstrap |
A numeric value specifying the number of bootstrap samples to draw. |
n_parallel |
A numeric value specifying the number of parallel cores to run the computation on. |
outcome_distribution |
A character string specifying the distribution of
the outcome variable. Options are |
interaction_indicator |
A logical value indicating if an interaction between x and m is included
in the outcome mechanism. |
x_matrix |
A numeric matrix of predictors in the true mediator and outcome mechanisms.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is 1e-07. |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is 1500. |
em_method |
A character string specifying which EM algorithm will be applied.
The default is "squarem". |
random_seed |
A numeric value specifying the random seed to set for bootstrap
sampling. Default is NULL. |
COMMA_EM_bootstrap_SE returns a list with two elements: 1) bootstrap_df
and 2) bootstrap_SE. bootstrap_df is a data frame containing COMMA_EM
output for each bootstrap sample. bootstrap_SE
is a data frame containing bootstrap standard error estimates for each parameter.
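As a rough picture of how the two returned elements relate, a bootstrap standard error is the standard deviation of a parameter's estimates across bootstrap samples. The sketch below applies that to a small made-up data frame; the column name `bootstrap_sample`, the parameter labels, and the numeric values are hypothetical and do not come from the package.

```r
# Hypothetical stacked output: one row per parameter per bootstrap sample
bootstrap_df <- data.frame(
  Parameter = rep(c("beta_1", "beta_2"), times = 3),
  Estimates = c(1.0, -2.1, 1.2, -1.9, 0.8, -2.0),
  bootstrap_sample = rep(1:3, each = 2)
)

# Bootstrap SE: standard deviation of each parameter's estimates across samples
bootstrap_SE <- aggregate(Estimates ~ Parameter, data = bootstrap_df, FUN = sd)
names(bootstrap_SE)[2] <- "SE"
bootstrap_SE
```

With only three bootstrap samples, as in the package examples, such estimates are very noisy; the small number keeps example run time short.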
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator

# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1

# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)

example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
                           interaction_indicator = FALSE,
                           outcome_distribution = "Bernoulli",
                           true_beta, true_gamma, true_theta)

beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)

Mstar = example_data[["obs_mediator"]]
outcome = example_data[["outcome"]]
x_matrix = example_data[["x"]]
z_matrix = example_data[["z"]]
c_matrix = example_data[["c"]]

EM_results <- COMMA_EM(Mstar, outcome, "Bernoulli", FALSE,
                       x_matrix, z_matrix, c_matrix,
                       beta_start, gamma_start, theta_start)

EM_results

EM_SEs <- COMMA_EM_bootstrap_SE(EM_results$Estimates, sigma_estimate = NULL,
                                n_bootstrap = 3, n_parallel = 1,
                                outcome_distribution = "Bernoulli",
                                interaction_indicator = FALSE,
                                x_matrix, z_matrix, c_matrix,
                                random_seed = 1)

EM_SEs$bootstrap_SE
Estimate β, γ, and θ parameters from
the true mediator, observed mediator, and outcome mechanisms, respectively,
in a binary mediator misclassification model using an ordinary least squares
correction.
COMMA_OLS( Mstar, outcome, x_matrix, z_matrix, c_matrix, beta_start, gamma_start, theta_start, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem" )
Mstar |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
outcome |
A vector containing the outcome variables of interest. There
should be no |
x_matrix |
A numeric matrix of predictors in the true mediator and outcome mechanisms.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
beta_start |
A numeric vector or column matrix of starting values for the |
gamma_start |
A numeric vector or matrix of starting values for the |
theta_start |
A numeric vector or column matrix of starting values for the |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is 1e-07. |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is 1500. |
em_method |
A character string specifying which EM algorithm will be applied.
The default is "squarem". |
Note that this method can only be used for Normal outcome models, and interaction
terms (between x and m) are not supported.
COMMA_OLS returns a data frame containing four columns. The first
column, Parameter, represents a unique parameter value for each row.
The next column contains the parameter Estimates. The third column,
Convergence, reports whether or not the algorithm converged for a
given parameter estimate. The final column, Method, reports
the procedure used to obtain the estimates.
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator

# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1

# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, 2), ncol = 1)

example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
                           interaction_indicator = FALSE,
                           outcome_distribution = "Normal",
                           true_beta, true_gamma, true_theta)

beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)

Mstar = example_data[["obs_mediator"]]
outcome = example_data[["outcome"]]
x_matrix = example_data[["x"]]
z_matrix = example_data[["z"]]
c_matrix = example_data[["c"]]

OLS_results <- COMMA_OLS(Mstar, outcome,
                         x_matrix, z_matrix, c_matrix,
                         beta_start, gamma_start, theta_start)

OLS_results
Estimate Bootstrap Standard Errors using OLS
COMMA_OLS_bootstrap_SE( parameter_estimates, sigma_estimate = 1, n_bootstrap, n_parallel, x_matrix, z_matrix, c_matrix, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem", random_seed = NULL )
parameter_estimates |
A column matrix of |
sigma_estimate |
A numeric value specifying the estimated standard deviation. Default is 1. |
n_bootstrap |
A numeric value specifying the number of bootstrap samples to draw. |
n_parallel |
A numeric value specifying the number of parallel cores to run the computation on. |
x_matrix |
A numeric matrix of predictors in the true mediator and outcome mechanisms.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is 1e-07. |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is 1500. |
em_method |
A character string specifying which EM algorithm will be applied.
The default is "squarem". |
random_seed |
A numeric value specifying the random seed to set for bootstrap
sampling. Default is NULL. |
COMMA_OLS_bootstrap_SE returns a list with two elements: 1) bootstrap_df
and 2) bootstrap_SE. bootstrap_df is a data frame containing COMMA_OLS
output for each bootstrap sample. bootstrap_SE
is a data frame containing bootstrap standard error estimates for each parameter.
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator

# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1

# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, 2), ncol = 1)

example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
                           interaction_indicator = FALSE,
                           outcome_distribution = "Normal",
                           true_beta, true_gamma, true_theta)

beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)

Mstar = example_data[["obs_mediator"]]
outcome = example_data[["outcome"]]
x_matrix = example_data[["x"]]
z_matrix = example_data[["z"]]
c_matrix = example_data[["c"]]

OLS_results <- COMMA_OLS(Mstar, outcome,
                         x_matrix, z_matrix, c_matrix,
                         beta_start, gamma_start, theta_start)

OLS_results

OLS_SEs <- COMMA_OLS_bootstrap_SE(OLS_results$Estimates, sigma_estimate = 1,
                                  n_bootstrap = 3, n_parallel = 1,
                                  x_matrix, z_matrix, c_matrix,
                                  random_seed = 1)

OLS_SEs$bootstrap_SE
Estimate β, γ, and θ parameters from
the true mediator, observed mediator, and outcome mechanisms, respectively,
in a binary mediator misclassification model using a predictive value weighting
approach.
COMMA_PVW( Mstar, outcome, outcome_distribution, interaction_indicator, x_matrix, z_matrix, c_matrix, beta_start, gamma_start, theta_start, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem" )
Mstar |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
outcome |
A vector containing the outcome variables of interest. There
should be no |
outcome_distribution |
A character string specifying the distribution of
the outcome variable. Options are |
interaction_indicator |
A logical value indicating if an interaction between x and m is included
in the outcome mechanism. |
x_matrix |
A numeric matrix of predictors in the true mediator and outcome mechanisms.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
beta_start |
A numeric vector or column matrix of starting values for the |
gamma_start |
A numeric vector or matrix of starting values for the |
theta_start |
A numeric vector or column matrix of starting values for the |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is 1e-07. |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is 1500. |
em_method |
A character string specifying which EM algorithm will be applied.
The default is "squarem". |
Note that this method can only be used for binary outcome models.
COMMA_PVW returns a data frame containing four columns. The first
column, Parameter, represents a unique parameter value for each row.
The next column contains the parameter Estimates. The third column,
Convergence, reports whether or not the algorithm converged for a
given parameter estimate. The final column, Method, reports
that the estimates are obtained from the "PVW" procedure.
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator

# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1

# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)

example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
                           interaction_indicator = FALSE,
                           outcome_distribution = "Bernoulli",
                           true_beta, true_gamma, true_theta)

beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)

Mstar = example_data[["obs_mediator"]]
outcome = example_data[["outcome"]]
x_matrix = example_data[["x"]]
z_matrix = example_data[["z"]]
c_matrix = example_data[["c"]]

PVW_results <- COMMA_PVW(Mstar, outcome, outcome_distribution = "Bernoulli",
                         interaction_indicator = FALSE,
                         x_matrix, z_matrix, c_matrix,
                         beta_start, gamma_start, theta_start)

PVW_results
Estimate Bootstrap Standard Errors using PVW
COMMA_PVW_bootstrap_SE( parameter_estimates, sigma_estimate, n_bootstrap, n_parallel, outcome_distribution, interaction_indicator, x_matrix, z_matrix, c_matrix, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem", random_seed = NULL )
parameter_estimates |
A column matrix of |
sigma_estimate |
A numeric value specifying the estimated
standard deviation. This value is only required if |
n_bootstrap |
A numeric value specifying the number of bootstrap samples to draw. |
n_parallel |
A numeric value specifying the number of parallel cores to run the computation on. |
outcome_distribution |
A character string specifying the distribution of
the outcome variable. Options are |
interaction_indicator |
A logical value indicating if an interaction between x and m is included
in the outcome mechanism. |
x_matrix |
A numeric matrix of predictors in the true mediator and outcome mechanisms.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is 1e-07. |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is 1500. |
em_method |
A character string specifying which EM algorithm will be applied.
The default is "squarem". |
random_seed |
A numeric value specifying the random seed to set for bootstrap
sampling. Default is NULL. |
COMMA_PVW_bootstrap_SE returns a list with two elements: 1) bootstrap_df
and 2) bootstrap_SE. bootstrap_df is a data frame containing COMMA_PVW
output for each bootstrap sample. bootstrap_SE
is a data frame containing bootstrap standard error estimates for each parameter.
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator

# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1

# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)

example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
                           interaction_indicator = FALSE,
                           outcome_distribution = "Bernoulli",
                           true_beta, true_gamma, true_theta)

beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)

Mstar = example_data[["obs_mediator"]]
outcome = example_data[["outcome"]]
x_matrix = example_data[["x"]]
z_matrix = example_data[["z"]]
c_matrix = example_data[["c"]]

PVW_results <- COMMA_PVW(Mstar, outcome, outcome_distribution = "Bernoulli",
                         interaction_indicator = FALSE,
                         x_matrix, z_matrix, c_matrix,
                         beta_start, gamma_start, theta_start)

PVW_results

PVW_SEs <- COMMA_PVW_bootstrap_SE(PVW_results$Estimates, sigma_estimate = NULL,
                                  n_bootstrap = 3, n_parallel = 1,
                                  outcome_distribution = "Bernoulli",
                                  interaction_indicator = FALSE,
                                  x_matrix, z_matrix, c_matrix,
                                  random_seed = 1)

PVW_SEs$bootstrap_SE
set.seed(20240709) sample_size <- 2000 n_cat <- 2 # Number of categories in the binary mediator # Data generation settings x_mu <- 0 x_sigma <- 1 z_shape <- 1 c_shape <- 1 # True parameter values (gamma terms set the misclassification rate) true_beta <- matrix(c(1, -2, .5), ncol = 1) true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE) true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1) example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape, interaction_indicator = FALSE, outcome_distribution = "Bernoulli", true_beta, true_gamma, true_theta) beta_start <- matrix(rep(1, 3), ncol = 1) gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2) theta_start <- matrix(rep(1, 4), ncol = 1) Mstar = example_data[["obs_mediator"]] outcome = example_data[["outcome"]] x_matrix = example_data[["x"]] z_matrix = example_data[["z"]] c_matrix = example_data[["c"]] PVW_results <- COMMA_PVW(Mstar, outcome, outcome_distribution = "Bernoulli", interaction_indicator = FALSE, x_matrix, z_matrix, c_matrix, beta_start, gamma_start, theta_start) PVW_results PVW_SEs <- COMMA_PVW_bootstrap_SE(PVW_results$Estimates, sigma_estimate = NULL, n_bootstrap = 3, n_parallel = 1, outcome_distribution = "Bernoulli", interaction_indicator = FALSE, x_matrix, z_matrix, c_matrix, random_seed = 1) PVW_SEs$bootstrap_SE
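As a rough illustration of how a standard error can be summarized from bootstrap output, the sketch below takes the standard deviation of each parameter's estimates across bootstrap samples. The data frame toy_bootstrap_df and its columns (Parameter, Estimates, Sample) are hypothetical stand-ins. The actual layout of bootstrap_df and the exact computation inside COMMA_PVW_bootstrap_SE may differ.

```r
# Hypothetical long-format bootstrap output: one row per parameter per sample.
# Column names are assumptions made for this illustration only.
toy_bootstrap_df <- data.frame(
  Parameter = rep(c("beta_0", "beta_1"), times = 3),
  Estimates = c(1.0, -2.0, 1.2, -1.8, 0.8, -2.2),
  Sample    = rep(1:3, each = 2)
)

# Bootstrap SE: standard deviation of the estimates across bootstrap samples,
# computed separately for each parameter.
toy_bootstrap_SE <- aggregate(Estimates ~ Parameter,
                              data = toy_bootstrap_df, FUN = sd)
names(toy_bootstrap_SE)[2] <- "SE"
toy_bootstrap_SE
```

With only a handful of bootstrap samples (as in the n_bootstrap = 3 example above), such a summary is very noisy; a larger number of samples is usually needed for a stable estimate.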
Function is for cases with a Bernoulli outcome and with no interaction term
in the outcome mechanism.
EM_function_bernoulliY( param_current, obs_mediator, obs_outcome, X, Z, c_matrix, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
obs_mediator |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
obs_outcome |
A vector containing the outcome variables of interest. There
should be no |
X |
A numeric design matrix for the true mediator mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
EM_function_bernoulliY
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
Function is for cases with a Bernoulli outcome and with an interaction term
in the outcome mechanism.
EM_function_bernoulliY_XM( param_current, obs_mediator, obs_outcome, X, Z, c_matrix, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
obs_mediator |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
obs_outcome |
A vector containing the outcome variables of interest. There
should be no |
X |
A numeric design matrix for the true mediator mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
EM_function_bernoulliY_XM
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
Function is for cases with a Normal outcome and with no interaction term
in the outcome mechanism.
EM_function_normalY( param_current, obs_mediator, obs_outcome, X, Z, c_matrix, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
obs_mediator |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
obs_outcome |
A vector containing the outcome variables of interest. There
should be no |
X |
A numeric design matrix for the true mediator mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
EM_function_normalY
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
Function is for cases with a Normal outcome and with an interaction term
in the outcome mechanism.
EM_function_normalY_XM( param_current, obs_mediator, obs_outcome, X, Z, c_matrix, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
obs_mediator |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
obs_outcome |
A vector containing the outcome variables of interest. There
should be no |
X |
A numeric design matrix for the true mediator mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
EM_function_normalY_XM
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
Function is for cases with a Poisson outcome and without an interaction term
in the outcome mechanism.
EM_function_poissonY( param_current, obs_mediator, obs_outcome, X, Z, c_matrix, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
obs_mediator |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
obs_outcome |
A vector containing the outcome variables of interest. There
should be no |
X |
A numeric design matrix for the true mediator mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
EM_function_poissonY
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
Function is for cases with a Poisson outcome and with an interaction term
in the outcome mechanism.
EM_function_poissonY_XM( param_current, obs_mediator, obs_outcome, X, Z, c_matrix, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
obs_mediator |
A numeric vector of indicator variables (1, 2) for the observed
mediator |
obs_outcome |
A vector containing the outcome variables of interest. There
should be no |
X |
A numeric design matrix for the true mediator mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
EM_function_poissonY_XM
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
Compute the conditional probability of observing mediator category $M^* = k$ given
the latent true mediator category $M = j$, $P(M^*_i = k \mid M_i = j, Z_i)$ with $k, j \in \{1, 2\}$,
for each of the $n$ subjects.
misclassification_prob(gamma_matrix, z_matrix)
gamma_matrix |
A numeric matrix of estimated regression parameters for the
observation mechanism, |
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
misclassification_prob returns a data frame containing four columns.
The first column, Subject, represents the subject ID, from 1 to n,
where n is the sample size, or equivalently, the number of rows in z_matrix.
The second column, M, represents a true, latent mediator category $j \in \{1, 2\}$.
The third column, Mstar, represents an observed mediator category $k \in \{1, 2\}$.
The last column, Probability, is the conditional probability $P(M^* = k \mid M = j, Z)$
computed for each subject, observed mediator category, and true, latent mediator category.
set.seed(123)
sample_size <- 1000
cov1 <- rnorm(sample_size)
cov2 <- rnorm(sample_size, 1, 2)
z_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE)
estimated_gammas <- matrix(c(1, -1, .5, .2, -.6, 1.5), ncol = 2)
P_Ystar_M <- misclassification_prob(estimated_gammas, z_matrix)
head(P_Ystar_M)
Example data from the National Vital Statistics System of the National Center for Health Statistics (NCHS), 2022
NCHS2022_sample
A data frame with 30 columns, including demographic and birth information for a random sample of 20,000 singleton births from nulliparous mothers in the US in 2022.
https://data.nber.org/nvss/natality/inputs/raw/2022/
## Not run:
data("NCHS2022_sample")
head(NCHS2022_sample)
## End(Not run)
Compute Probability of Each True Outcome, for Every Subject
pi_compute(beta, X, n, n_cat)
beta |
A numeric column matrix of regression parameters for the
|
X |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pi_compute returns a matrix of probabilities, $P(Y_i = j \mid X_i)$,
for each of the $n$ subjects. Rows of the matrix
correspond to each subject. Columns of the matrix correspond to the true outcome
categories $j = 1, \dots,$ n_cat.
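The shape of this return value can be illustrated with a minimal sketch. It assumes a binary logistic model in which category 2 is the reference category; the helper name pi_sketch is hypothetical and this is not the package's internal code.

```r
# Hypothetical helper for illustration only (assumes a logistic link with
# category 2 as the reference category).
pi_sketch <- function(beta, X) {
  # Probability of true category 1 for each subject.
  p1 <- as.vector(plogis(X %*% beta))
  # One row per subject, one column per true category.
  cbind(p1, 1 - p1)
}

set.seed(1)
n <- 5
X <- cbind(1, rnorm(n))             # design matrix with an intercept column
beta <- matrix(c(1, -2), ncol = 1)  # illustrative parameter values

pi_matrix <- pi_sketch(beta, X)
dim(pi_matrix)       # n rows, 2 columns
rowSums(pi_matrix)   # each row sums to 1
```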
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome, for Every Subject
pistar_compute(gamma, Z, n, n_cat)
gamma |
A numeric matrix of regression parameters for the observed
outcome mechanism, |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pistar_compute returns a matrix of conditional probabilities,
$P(Y^*_i = k \mid Y_i = j, Z_i)$, for each of the $n$ subjects. Rows of the matrix
correspond to each subject and observed outcome. Specifically, the probability
for subject $i$ and observed category $1$ occurs at row $i$. The probability
for subject $i$ and observed category $2$ occurs at row $i + n$.
Columns of the matrix correspond to the true outcome categories $j = 1, \dots,$ n_cat.
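The stacked row layout can be sketched as follows. This is an illustration under assumptions: a logistic model with observed category 2 as the reference, and a gamma matrix with one column per true category. The helper name pistar_sketch is hypothetical and is not the package's internal code.

```r
# Hypothetical helper illustrating a stacked (2n x 2) layout of
# conditional probabilities.
pistar_sketch <- function(gamma, Z) {
  # Column j holds P(observed = 1 | true = j) for each subject.
  p_obs1 <- 1 / (1 + exp(-(Z %*% gamma)))
  # Rows 1..n: observed category 1; rows (n + 1)..2n: observed category 2.
  rbind(p_obs1, 1 - p_obs1)
}

set.seed(2)
n <- 4
Z <- cbind(1, rnorm(n))                          # intercept plus one covariate
gamma <- matrix(c(1, 0.5, -1, -0.5), nrow = 2)   # columns: true category 1, 2

pistar_matrix <- pistar_sketch(gamma, Z)

i <- 3
pistar_matrix[i, ]       # subject i, observed category 1
pistar_matrix[i + n, ]   # subject i, observed category 2
```

For a fixed subject and true category, the two stacked entries sum to 1, which is a convenient sanity check on the indexing.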
Sum Every "n"th Element
sum_every_n(x, n)
x |
A numeric vector to sum over |
n |
A numeric value specifying the distance between the reference index and the next index to be summed |
sum_every_n returns a vector of sums of every nth element of the vector x.
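One reading of "sum every nth element" is that entries of x that sit n positions apart are added together, giving a result of length n. The sketch below implements that reading; the function name sum_every_n_sketch is hypothetical and the package's internal implementation may differ.

```r
# Hypothetical re-implementation for illustration only.
sum_every_n_sketch <- function(x, n) {
  # Split x into consecutive blocks of length n, then add the blocks
  # elementwise, so element i of the result is x[i] + x[i + n] + x[i + 2n] + ...
  blocks <- split(x, ceiling(seq_along(x) / n))
  Reduce(`+`, blocks)
}

x <- c(1, 2, 3, 10, 20, 30)
sum_every_n_sketch(x, 3)   # 11 22 33
```

Under this reading, the helper pairs naturally with matrices that stack one block of n rows per observed category, since summing entries n positions apart collapses across categories for each subject.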
Sum Every "n"th Element, then add 1
sum_every_n1(x, n)
x |
A numeric vector to sum over |
n |
A numeric value specifying the distance between the reference index and the next index to be summed |
sum_every_n1 returns a vector of sums of every nth element of the vector x, plus 1.
Likelihood Function for Normal Outcome Mechanism with a Binary Mediator
theta_optim(param_start, m, x, c_matrix, outcome, sample_size, n_cat)
param_start |
A numeric vector or column matrix of starting values for the |
m |
A vector or column matrix containing the true binary mediator or the
E-step weight (with values between 0 and 1). There
should be no |
x |
A vector or column matrix of the predictor or exposure of interest. There
should be no |
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
outcome |
A vector containing the outcome variables of interest. There
should be no |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
theta_optim
returns a numeric value of the (negative) log-likelihood function.
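To make the idea of a weighted negative log-likelihood concrete, the sketch below lets each subject contribute a Normal log-density under both mediator values, weighted by the E-step weight. The function name neg_loglik_sketch, the parameter order (intercept, exposure, mediator, covariate), the single covariate, and the fixed sigma are all assumptions made for illustration; theta_optim itself may be parameterized differently.

```r
# Hypothetical sketch; argument names and parameter order are assumptions.
neg_loglik_sketch <- function(theta, w, x, c_vec, y, sigma = 1) {
  # Mean of the outcome under each possible value of the true mediator.
  mu_m0 <- theta[1] + theta[2] * x + theta[4] * c_vec
  mu_m1 <- mu_m0 + theta[3]
  # Each subject contributes both cases, weighted by w (weight on M = 1).
  ll <- w * dnorm(y, mu_m1, sigma, log = TRUE) +
    (1 - w) * dnorm(y, mu_m0, sigma, log = TRUE)
  -sum(ll)
}

set.seed(3)
n <- 500
x <- rnorm(n)
c_vec <- rgamma(n, shape = 1)
m <- rbinom(n, 1, 0.5)   # mediator treated as known in this toy example
y <- 1 + 1.5 * x - 2 * m - 0.2 * c_vec + rnorm(n)

# Minimize the negative log-likelihood; with 0/1 weights this reduces to
# ordinary least squares for the mean parameters.
fit <- optim(rep(0, 4), neg_loglik_sketch,
             w = m, x = x, c_vec = c_vec, y = y, method = "BFGS")
round(fit$par, 2)
```

When the weights are exactly 0 or 1, the minimizer should agree closely with coef(lm(y ~ x + m + c_vec)), which gives a simple check that the objective is coded as intended.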
Likelihood Function for Normal Outcome Mechanism with a Binary Mediator and an Interaction Term
theta_optim_XM(param_start, m, x, c_matrix, outcome, sample_size, n_cat)
param_start |
A numeric vector or column matrix of starting values for the |
m |
A vector or column matrix containing the true binary mediator or the
E-step weight (with values between 0 and 1). There
should be no |
x |
A vector or column matrix of the predictor or exposure of interest. There
should be no |
c_matrix |
A numeric matrix of covariates in the true mediator and outcome mechanisms.
|
outcome |
A vector containing the outcome variables of interest. There
should be no |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
theta_optim_XM
returns a numeric value of the (negative) log-likelihood function.
Compute the probability of each latent true mediator category, $P(M_i = j \mid X_i)$ with $j \in \{1, 2\}$,
for each of the $n$ subjects.
true_classification_prob(beta_matrix, x_matrix)
beta_matrix |
A numeric column matrix of estimated regression parameters for the
true mediator mechanism, |
x_matrix |
A numeric matrix of covariates in the true mediator mechanism.
|
true_classification_prob returns a data frame containing three columns.
The first column, Subject, represents the subject ID, from 1 to n,
where n is the sample size, or equivalently, the number of rows in x_matrix.
The second column, M, represents a true, latent mediator category $j \in \{1, 2\}$.
The last column, Probability, is the probability $P(M = j \mid X)$ computed
for each subject and true, latent mediator category.
set.seed(123)
sample_size <- 1000
cov1 <- rnorm(sample_size)
cov2 <- rnorm(sample_size, 1, 2)
x_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE)
estimated_betas <- matrix(c(1, -1, .5), ncol = 1)
P_M <- true_classification_prob(estimated_betas, x_matrix)
head(P_M)
Note that this function should only be used for Binary outcome models.
w_m_binaryY( mstar_matrix, outcome_matrix, pistar_matrix, pi_matrix, p_yi_m0, p_yi_m1, sample_size, n_cat )
mstar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
mediator |
outcome_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
pi_matrix |
A numeric matrix of probabilities obtained from the internal
function |
p_yi_m0 |
A numeric vector of outcome probabilities computed assuming a true mediator value of 0. |
p_yi_m1 |
A numeric vector of outcome probabilities computed assuming a true mediator value of 1. |
sample_size |
An integer value specifying the number of observations in
the sample. This value should be equal to the number of rows of the observed
mediator matrix, |
n_cat |
The number of categorical values that the true outcome, |
w_m_binaryY returns a matrix of E-step weights for the EM-algorithm.
Rows of the matrix correspond to each subject. Columns of the matrix correspond
to the true mediator categories $j = 1, \dots,$ n_cat.
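An E-step weight of this kind is typically a posterior probability of each true mediator category, obtained with Bayes' rule. The sketch below shows that general computation on toy inputs; the function name e_step_sketch and its three n x 2 input matrices are hypothetical simplifications and do not mirror the argument layout of w_m_binaryY.

```r
# Hypothetical sketch of an E-step weight via Bayes' rule; not the package's code.
e_step_sketch <- function(p_mstar_given_m, p_m, p_y_given_m) {
  # Each argument is an n x 2 matrix; column j corresponds to true category j.
  # Numerator: P(observed M* | M = j) * P(M = j) * P(observed Y | M = j).
  joint <- p_mstar_given_m * p_m * p_y_given_m
  # Normalize so that each subject's weights sum to 1.
  joint / rowSums(joint)
}

# Toy inputs for two subjects (values are made up for illustration).
p_mstar_given_m <- matrix(c(0.9, 0.2,
                            0.8, 0.3), ncol = 2, byrow = TRUE)
p_m <- matrix(c(0.5, 0.5,
                0.7, 0.3), ncol = 2, byrow = TRUE)
p_y_given_m <- matrix(c(0.6, 0.4,
                        0.1, 0.9), ncol = 2, byrow = TRUE)

weights <- e_step_sketch(p_mstar_given_m, p_m, p_y_given_m)
weights
rowSums(weights)   # each row sums to 1
```

The same structure applies to the Normal and Poisson variants if the outcome term is replaced by the corresponding likelihood evaluated under each mediator value.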
Note that this function should only be used for Normal outcome models.
w_m_normalY( mstar_matrix, pistar_matrix, pi_matrix, p_yi_m0, p_yi_m1, sample_size, n_cat )
mstar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
mediator |
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
pi_matrix |
A numeric matrix of probabilities obtained from the internal
function |
p_yi_m0 |
A numeric vector of Normal outcome likelihoods computed assuming a true mediator value of 0. |
p_yi_m1 |
A numeric vector of Normal outcome likelihoods computed assuming a true mediator value of 1. |
sample_size |
An integer value specifying the number of observations in
the sample. This value should be equal to the number of rows of the observed
mediator matrix, |
n_cat |
The number of categorical values that the true outcome, |
w_m_normalY returns a matrix of E-step weights for the EM-algorithm.
Rows of the matrix correspond to each subject. Columns of the matrix correspond
to the true mediator categories $j = 1, \dots,$ n_cat.
Note that this function should only be used for Poisson outcome models.
w_m_poissonY( mstar_matrix, outcome_matrix, pistar_matrix, pi_matrix, p_yi_m0, p_yi_m1, sample_size, n_cat )
mstar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
mediator |
outcome_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
pi_matrix |
A numeric matrix of probabilities obtained from the internal
function |
p_yi_m0 |
A numeric vector of outcome probabilities computed assuming a true mediator value of 0. |
p_yi_m1 |
A numeric vector of outcome probabilities computed assuming a true mediator value of 1. |
sample_size |
An integer value specifying the number of observations in
the sample. This value should be equal to the number of rows of the observed
mediator matrix, |
n_cat |
The number of categorical values that the true outcome, |
w_m_poissonY returns a matrix of E-step weights for the EM-algorithm.
Rows of the matrix correspond to each subject. Columns of the matrix correspond
to the true mediator categories $j = 1, \dots,$ n_cat.