Title: | Correcting Misclassified Binary Outcomes in Association Studies |
---|---|
Description: | Use frequentist and Bayesian methods to estimate parameters from a binary outcome misclassification model. These methods correct for the problem of "label switching" by assuming that the sum of outcome sensitivity and specificity is at least 1. A description of the analysis methods is available in Hochstedler and Wells (2023) <doi:10.48550/arXiv.2303.10215>. |
Authors: | Kimberly Hochstedler Webb [aut, cre] |
Maintainer: | Kimberly Hochstedler Webb <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.0 |
Built: | 2024-10-30 15:21:32 UTC |
Source: | https://github.com/kimberlywebb/combo |
Check Assumption and Fix Label Switching if Assumption is Broken for a List of MCMC Samples
check_and_fix_chains( n_chains, chains_list, pistarjj_matrix, dim_x, dim_z, n_cat )
check_and_fix_chains( n_chains, chains_list, pistarjj_matrix, dim_x, dim_z, n_cat )
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
pistarjj_matrix |
A numeric matrix of the average
conditional probability |
dim_x |
The number of columns of the design matrix of the true outcome mechanism, |
dim_z |
The number of columns of the design matrix of the observation mechanism, |
n_cat |
The number of categorical values that the true outcome, |
check_and_fix_chains
returns a numeric list of the samples from
n_chains
MCMC chains which have been corrected for label switching if
the following assumption is not met: .
Check Assumption and Fix Label Switching if Assumption is Broken for a List of MCMC Samples
check_and_fix_chains_2stage( n_chains, chains_list, pistarjj_matrix, pitildejjj_matrix, dim_x, dim_z, dim_v, n_cat )
check_and_fix_chains_2stage( n_chains, chains_list, pistarjj_matrix, pitildejjj_matrix, dim_x, dim_z, dim_v, n_cat )
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
pistarjj_matrix |
A numeric matrix of the average
conditional probability |
pitildejjj_matrix |
A numeric matrix of the average conditional probability
|
dim_x |
The number of columns of the design matrix of the true outcome mechanism, |
dim_z |
The number of columns of the design matrix of the first-stage observation mechanism, |
dim_v |
The number of columns of the design matrix of the second-stage observation mechanism, |
n_cat |
The number of categorical values that the true outcome, |
check_and_fix_chains
returns a numeric list of the samples from
n_chains
MCMC chains which have been corrected for label switching if
the following assumption is not met: .
Generate Data to use in COMBO Functions
COMBO_data(sample_size, x_mu, x_sigma, z_shape, beta, gamma)
COMBO_data(sample_size, x_mu, x_sigma, z_shape, beta, gamma)
sample_size |
An integer specifying the sample size of the generated data set. |
x_mu |
A numeric value specifying the mean of |
x_sigma |
A positive numeric value specifying the standard deviation of
|
z_shape |
A positive numeric value specifying the shape parameter of
|
beta |
A column matrix of |
gamma |
A numeric matrix of |
COMBO_data
returns a list of generated data elements:
obs_Y |
A vector of observed outcomes. |
true_Y |
A vector of true outcomes. |
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
x |
A vector of generated predictor values in the true outcome mechanism, from the Normal distribution. |
z |
A vector of generated predictor values in the observation mechanism from the Gamma distribution. |
x_design_matrix |
The design matrix for the |
z_design_matrix |
The design matrix for the |
set.seed(123) n <- 500 x_mu <- 0 x_sigma <- 1 z_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) my_data <- COMBO_data(sample_size = n, x_mu = x_mu, x_sigma = x_sigma, z_shape = z_shape, beta = true_beta, gamma = true_gamma) table(my_data[["obs_Y"]], my_data[["true_Y"]])
set.seed(123) n <- 500 x_mu <- 0 x_sigma <- 1 z_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) my_data <- COMBO_data(sample_size = n, x_mu = x_mu, x_sigma = x_sigma, z_shape = z_shape, beta = true_beta, gamma = true_gamma) table(my_data[["obs_Y"]], my_data[["true_Y"]])
Generate data to use in two-stage COMBO Functions
COMBO_data_2stage( sample_size, x_mu, x_sigma, z1_shape, z2_shape, beta, gamma1, gamma2 )
COMBO_data_2stage( sample_size, x_mu, x_sigma, z1_shape, z2_shape, beta, gamma1, gamma2 )
sample_size |
An integer specifying the sample size of the generated data set. |
x_mu |
A numeric value specifying the mean of |
x_sigma |
A positive numeric value specifying the standard deviation of
|
z1_shape |
A positive numeric value specifying the shape parameter of
|
z2_shape |
A positive numeric value specifying the shape parameter of
|
beta |
A column matrix of |
gamma1 |
A numeric matrix of |
gamma2 |
A numeric array of |
COMBO_data_2stage
returns a list of generated data elements:
obs_Ystar1 |
A vector of first-stage observed outcomes. |
obs_Ystar2 |
A vector of second-stage observed outcomes. |
true_Y |
A vector of true outcomes. |
obs_Ystar1_matrix |
A numeric matrix of indicator variables (0, 1) for the first-stage observed
outcome |
obs_Ystar2_matrix |
A numeric matrix of indicator variables (0, 1) for the second-stage observed
outcome |
x |
A vector of generated predictor values in the true outcome mechanism, from the Normal distribution. |
z1 |
A vector of generated predictor values in the first-stage observation mechanism from the Gamma distribution. |
z2 |
A vector of generated predictor values in the second-stage observation mechanism from the Gamma distribution. |
x_design_matrix |
The design matrix for the |
z1_design_matrix |
The design matrix for the |
z2_design_matrix |
The design matrix for the |
set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z1_shape <- 1 z2_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2)) my_data <- COMBO_data_2stage(sample_size = n, x_mu = x_mu, x_sigma = x_sigma, z1_shape = z1_shape, z2_shape = z2_shape, beta = true_beta, gamma1 = true_gamma1, gamma2 = true_gamma2) table(my_data[["obs_Ystar2"]], my_data[["obs_Ystar1"]], my_data[["true_Y"]])
set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z1_shape <- 1 z2_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2)) my_data <- COMBO_data_2stage(sample_size = n, x_mu = x_mu, x_sigma = x_sigma, z1_shape = z1_shape, z2_shape = z2_shape, beta = true_beta, gamma1 = true_gamma1, gamma2 = true_gamma2) table(my_data[["obs_Ystar2"]], my_data[["obs_Ystar1"]], my_data[["true_Y"]])
Jointly estimate and
parameters from the true outcome
and observation mechanisms, respectively, in a binary outcome misclassification
model.
COMBO_EM( Ystar, x_matrix, z_matrix, beta_start, gamma_start, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem" )
COMBO_EM( Ystar, x_matrix, z_matrix, beta_start, gamma_start, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem" )
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
beta_start |
A numeric vector or column matrix of starting values for the |
gamma_start |
A numeric vector or matrix of starting values for the |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is |
em_method |
A character string specifying which EM algorithm will be applied.
Options are |
COMBO_EM
returns a data frame containing four columns. The first
column, Parameter
, represents a unique parameter value for each row.
The next column contains the parameter Estimates
, followed by the standard
error estimates, SE
. The final column, Convergence
, reports
whether or not the algorithm converged for a given parameter estimate.
Estimates are provided for the binary misclassification model, as well as two
additional cases. The "SAMBA" parameter estimates are from the R Package,
SAMBA, which uses the EM algorithm to estimate a binary outcome misclassification
model that assumes there is perfect specificity. The "PSens" parameter estimates
are estimated using the EM algorithm for the binary outcome misclassification
model that assumes there is perfect sensitivitiy. The "Naive" parameter
estimates are from a simple logistic regression Y* ~ X
.
Beesley, L. and Mukherjee, B. (2020). Statistical inference for association studies using electronic health records: Handling both selection bias and outcome misclassification. Biometrics, 78, 214-226.
set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1) X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE) z_matrix = matrix(rgamma(n, z_shape), ncol = 1) Z = matrix(c(rep(1, n), z_matrix[,1]), ncol = 2, byrow = FALSE) exp_xb = exp(X %*% true_beta) pi_result = exp_xb[,1] / (exp_xb[,1] + 1) pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE) true_Y <- rep(NA, n) for(i in 1:n){ true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1) } exp_zg = exp(Z %*% true_gamma) pistar_denominator = matrix(c(1 + exp_zg[,1], 1 + exp_zg[,2]), ncol = 2, byrow = FALSE) pistar_result = exp_zg / pistar_denominator pistar_matrix = matrix(c(pistar_result[,1], 1 - pistar_result[,1], pistar_result[,2], 1 - pistar_result[,2]), ncol = 2, byrow = FALSE) obs_Y <- rep(NA, n) for(i in 1:n){ true_j = true_Y[i] obs_Y[i] = which(rmultinom(1, 1, pistar_matrix[c(i, n + i), true_j]) == 1) } Ystar <- obs_Y starting_values <- rep(1,6) beta_start <- matrix(starting_values[1:2], ncol = 1) gamma_start <- matrix(starting_values[3:6], ncol = 2, nrow = 2, byrow = FALSE) EM_results <- COMBO_EM(Ystar, x_matrix = x_matrix, z_matrix = z_matrix, beta_start = beta_start, gamma_start = gamma_start) EM_results
set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1) X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE) z_matrix = matrix(rgamma(n, z_shape), ncol = 1) Z = matrix(c(rep(1, n), z_matrix[,1]), ncol = 2, byrow = FALSE) exp_xb = exp(X %*% true_beta) pi_result = exp_xb[,1] / (exp_xb[,1] + 1) pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE) true_Y <- rep(NA, n) for(i in 1:n){ true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1) } exp_zg = exp(Z %*% true_gamma) pistar_denominator = matrix(c(1 + exp_zg[,1], 1 + exp_zg[,2]), ncol = 2, byrow = FALSE) pistar_result = exp_zg / pistar_denominator pistar_matrix = matrix(c(pistar_result[,1], 1 - pistar_result[,1], pistar_result[,2], 1 - pistar_result[,2]), ncol = 2, byrow = FALSE) obs_Y <- rep(NA, n) for(i in 1:n){ true_j = true_Y[i] obs_Y[i] = which(rmultinom(1, 1, pistar_matrix[c(i, n + i), true_j]) == 1) } Ystar <- obs_Y starting_values <- rep(1,6) beta_start <- matrix(starting_values[1:2], ncol = 1) gamma_start <- matrix(starting_values[3:6], ncol = 2, nrow = 2, byrow = FALSE) EM_results <- COMBO_EM(Ystar, x_matrix = x_matrix, z_matrix = z_matrix, beta_start = beta_start, gamma_start = gamma_start) EM_results
Jointly estimate ,
,
parameters from the true outcome,
first-stage observation, and second-stage observation mechanisms, respectively,
in a two-stage binary outcome misclassification model.
COMBO_EM_2stage( Ystar1, Ystar2, x_matrix, z1_matrix, z2_matrix, beta_start, gamma1_start, gamma2_start, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem" )
COMBO_EM_2stage( Ystar1, Ystar2, x_matrix, z1_matrix, z2_matrix, beta_start, gamma1_start, gamma2_start, tolerance = 1e-07, max_em_iterations = 1500, em_method = "squarem" )
Ystar1 |
A numeric vector of indicator variables (1, 2) for the first-stage observed
outcome |
Ystar2 |
A numeric vector of indicator variables (1, 2) for the second-stage
observed outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z1_matrix |
A numeric matrix of covariates in the first-stage observation mechanism.
|
z2_matrix |
A numeric matrix of covariates in the second-stage observation mechanism.
|
beta_start |
A numeric vector or column matrix of starting values for the |
gamma1_start |
A numeric vector or matrix of starting values for the |
gamma2_start |
A numeric array of starting values for the |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is |
em_method |
A character string specifying which EM algorithm will be applied.
Options are |
COMBO_EM_2stage
returns a data frame containing four columns. The first
column, Parameter
, represents a unique parameter value for each row.
The next column contains the parameter Estimates
, followed by the standard
error estimates, SE
. The final column, Convergence
, reports
whether or not the algorithm converged for a given parameter estimate.
Estimates are provided for the two-stage binary misclassification model.
set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z1_shape <- 1 z2_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2)) my_data <- COMBO_data_2stage(sample_size = n, x_mu = x_mu, x_sigma = x_sigma, z1_shape = z1_shape, z2_shape = z2_shape, beta = true_beta, gamma1 = true_gamma1, gamma2 = true_gamma2) table(my_data[["obs_Ystar2"]], my_data[["obs_Ystar1"]], my_data[["true_Y"]]) beta_start <- rnorm(length(c(true_beta))) gamma1_start <- rnorm(length(c(true_gamma1))) gamma2_start <- rnorm(length(c(true_gamma2))) EM_results <- COMBO_EM_2stage(Ystar1 = my_data[["obs_Ystar1"]], Ystar2 = my_data[["obs_Ystar2"]], x_matrix = my_data[["x"]], z1_matrix = my_data[["z1"]], z2_matrix = my_data[["z2"]], beta_start = beta_start, gamma1_start = gamma1_start, gamma2_start = gamma2_start) EM_results
set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z1_shape <- 1 z2_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2)) my_data <- COMBO_data_2stage(sample_size = n, x_mu = x_mu, x_sigma = x_sigma, z1_shape = z1_shape, z2_shape = z2_shape, beta = true_beta, gamma1 = true_gamma1, gamma2 = true_gamma2) table(my_data[["obs_Ystar2"]], my_data[["obs_Ystar1"]], my_data[["true_Y"]]) beta_start <- rnorm(length(c(true_beta))) gamma1_start <- rnorm(length(c(true_gamma1))) gamma2_start <- rnorm(length(c(true_gamma2))) EM_results <- COMBO_EM_2stage(Ystar1 = my_data[["obs_Ystar1"]], Ystar2 = my_data[["obs_Ystar2"]], x_matrix = my_data[["x"]], z1_matrix = my_data[["z1"]], z2_matrix = my_data[["z2"]], beta_start = beta_start, gamma1_start = gamma1_start, gamma2_start = gamma2_start) EM_results
A dataset for testing the COMBO_EM
function, generated from the
COMBO_data
function.
COMBO_EM_data
COMBO_EM_data
A list containing 6 variables for 1000 observations:
The true outcome variable
The observed outcome variable
A matrix of predictor values in the true outcome mechanism
A matrix of predictor values in the observed outcome mechanism
Beta parameter values used for data generation in the true outcome mechanism
Gamma parameter values used for data generation in the observed outcome mechanism
## Not run: data("COMBO_EM_data") head(COMBO_EM_data) ## End(Not run)
## Not run: data("COMBO_EM_data") head(COMBO_EM_data) ## End(Not run)
Jointly estimate and
parameters from the true outcome
and observation mechanisms, respectively, in a binary outcome misclassification
model.
COMBO_MCMC( Ystar, x_matrix, z_matrix, prior, beta_prior_parameters, gamma_prior_parameters, number_MCMC_chains = 4, MCMC_sample = 2000, burn_in = 1000, display_progress = TRUE )
COMBO_MCMC( Ystar, x_matrix, z_matrix, prior, beta_prior_parameters, gamma_prior_parameters, number_MCMC_chains = 4, MCMC_sample = 2000, burn_in = 1000, display_progress = TRUE )
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
prior |
A character string specifying the prior distribution for the
|
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma_prior_parameters |
A numeric list of prior distribution parameters
for the |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute.
The default is |
MCMC_sample |
An integer specifying the number of MCMC samples to draw.
The default is |
burn_in |
An integer specifying the number of MCMC samples to discard
for the burn-in period. The default is |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
COMBO_MCMC
returns a list of the posterior samples and posterior
means for both the binary outcome misclassification model and a naive logistic
regression of the observed outcome, Y*
, predicted by the matrix x
.
The list contains the following components:
posterior_sample_df |
A data frame containing three columns. The first
column indicates the chain from which a sample is taken, from 1 to |
posterior_means_df |
A data frame containing three columns. The first
column specifies the parameter associated with a given row. Parameters are
indexed as in the |
naive_posterior_sample_df |
A data frame containing three columns.
The first column indicates the chain from which a sample is taken, from
1 to |
naive_posterior_means_df |
A data frame containing three columns. The first
column specifies the naive parameter associated with a given row. Parameters are
indexed as in the |
set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1) X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE) z_matrix = matrix(rgamma(n, z_shape), ncol = 1) Z = matrix(c(rep(1, n), z_matrix[,1]), ncol = 2, byrow = FALSE) exp_xb = exp(X %*% true_beta) pi_result = exp_xb[,1] / (exp_xb[,1] + 1) pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE) true_Y <- rep(NA, n) for(i in 1:n){ true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1) } exp_zg = exp(Z %*% true_gamma) pistar_denominator = matrix(c(1 + exp_zg[,1], 1 + exp_zg[,2]), ncol = 2, byrow = FALSE) pistar_result = exp_zg / pistar_denominator pistar_matrix = matrix(c(pistar_result[,1], 1 - pistar_result[,1], pistar_result[,2], 1 - pistar_result[,2]), ncol = 2, byrow = FALSE) obs_Y <- rep(NA, n) for(i in 1:n){ true_j = true_Y[i] obs_Y[i] = which(rmultinom(1, 1, pistar_matrix[c(i, n + i), true_j]) == 1) } Ystar <- obs_Y unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE) unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE) unif_lower_gamma <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA), dim = c(2,2,2)) unif_upper_gamma <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA), dim = c(2,2,2)) beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta) gamma_prior_parameters <- list(lower = unif_lower_gamma, upper = unif_upper_gamma) MCMC_results <- COMBO_MCMC(Ystar, x = x_matrix, z = z_matrix, prior = "uniform", beta_prior_parameters = beta_prior_parameters, gamma_prior_parameters = gamma_prior_parameters, number_MCMC_chains = 2, MCMC_sample = 200, burn_in = 100) MCMC_results$posterior_means_df
set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1) X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE) z_matrix = matrix(rgamma(n, z_shape), ncol = 1) Z = matrix(c(rep(1, n), z_matrix[,1]), ncol = 2, byrow = FALSE) exp_xb = exp(X %*% true_beta) pi_result = exp_xb[,1] / (exp_xb[,1] + 1) pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE) true_Y <- rep(NA, n) for(i in 1:n){ true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1) } exp_zg = exp(Z %*% true_gamma) pistar_denominator = matrix(c(1 + exp_zg[,1], 1 + exp_zg[,2]), ncol = 2, byrow = FALSE) pistar_result = exp_zg / pistar_denominator pistar_matrix = matrix(c(pistar_result[,1], 1 - pistar_result[,1], pistar_result[,2], 1 - pistar_result[,2]), ncol = 2, byrow = FALSE) obs_Y <- rep(NA, n) for(i in 1:n){ true_j = true_Y[i] obs_Y[i] = which(rmultinom(1, 1, pistar_matrix[c(i, n + i), true_j]) == 1) } Ystar <- obs_Y unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE) unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE) unif_lower_gamma <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA), dim = c(2,2,2)) unif_upper_gamma <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA), dim = c(2,2,2)) beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta) gamma_prior_parameters <- list(lower = unif_lower_gamma, upper = unif_upper_gamma) MCMC_results <- COMBO_MCMC(Ystar, x = x_matrix, z = z_matrix, prior = "uniform", beta_prior_parameters = beta_prior_parameters, gamma_prior_parameters = gamma_prior_parameters, number_MCMC_chains = 2, MCMC_sample = 200, burn_in = 100) MCMC_results$posterior_means_df
Jointly estimate ,
, and
parameters from the true outcome
first-stage observation, and second-stage observation mechanisms, respectively,
in a two-stage binary outcome misclassification model.
COMBO_MCMC_2stage( Ystar1, Ystar2, x_matrix, z1_matrix, z2_matrix, prior, beta_prior_parameters, gamma1_prior_parameters, gamma2_prior_parameters, naive_gamma2_prior_parameters, number_MCMC_chains = 4, MCMC_sample = 2000, burn_in = 1000, display_progress = TRUE )
COMBO_MCMC_2stage( Ystar1, Ystar2, x_matrix, z1_matrix, z2_matrix, prior, beta_prior_parameters, gamma1_prior_parameters, gamma2_prior_parameters, naive_gamma2_prior_parameters, number_MCMC_chains = 4, MCMC_sample = 2000, burn_in = 1000, display_progress = TRUE )
Ystar1 |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
Ystar2 |
A numeric vector of indicator variables (1, 2) for the second-stage
observed outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z1_matrix |
A numeric matrix of covariates in the observation mechanism.
|
z2_matrix |
A numeric matrix of covariates in the second-stage observation mechanism.
|
prior |
A character string specifying the prior distribution for the
|
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma1_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma2_prior_parameters |
A numeric list of prior distribution parameters
for the |
naive_gamma2_prior_parameters |
A numeric list of prior distribution parameters
for the naive model |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute.
The default is |
MCMC_sample |
An integer specifying the number of MCMC samples to draw.
The default is |
burn_in |
An integer specifying the number of MCMC samples to discard
for the burn-in period. The default is |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
COMBO_MCMC_2stage
returns a list of the posterior samples and posterior
means for both the binary outcome misclassification model and a naive logistic
regression of the observed outcome, Y*
, predicted by the matrix x
.
The list contains the following components:
posterior_sample_df |
A data frame containing three columns. The first
column indicates the chain from which a sample is taken, from 1 to |
posterior_means_df |
A data frame containing three columns. The first
column specifies the parameter associated with a given row. Parameters are
indexed as in the |
naive_posterior_sample_df |
A data frame containing three columns.
The first column indicates the chain from which a sample is taken, from
1 to |
naive_posterior_means_df |
A data frame containing three columns. The first
column specifies the naive parameter associated with a given row. Parameters are
indexed as in the |
# Helper functions sum_every_n <- function(x, n){ vector_groups = split(x, ceiling(seq_along(x) / n)) sum_x = Reduce(`+`, vector_groups) return(sum_x) } sum_every_n1 <- function(x, n){ vector_groups = split(x, ceiling(seq_along(x) / n)) sum_x = Reduce(`+`, vector_groups) + 1 return(sum_x) } # Example set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z1_shape <- 1 z2_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2)) x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1) X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE) z1_matrix = matrix(rgamma(n, z1_shape), ncol = 1) Z1 = matrix(c(rep(1, n), z1_matrix[,1]), ncol = 2, byrow = FALSE) z2_matrix = matrix(rgamma(n, z2_shape), ncol = 1) Z2 = matrix(c(rep(1, n), z2_matrix[,1]), ncol = 2, byrow = FALSE) exp_xb = exp(X %*% true_beta) pi_result = exp_xb[,1] / (exp_xb[,1] + 1) pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE) true_Y <- rep(NA, n) for(i in 1:n){ true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1) } exp_z1g1 = exp(Z1 %*% true_gamma1) pistar1_denominator = matrix(c(1 + exp_z1g1[,1], 1 + exp_z1g1[,2]), ncol = 2, byrow = FALSE) pistar1_result = exp_z1g1 / pistar1_denominator pistar1_matrix = matrix(c(pistar1_result[,1], 1 - pistar1_result[,1], pistar1_result[,2], 1 - pistar1_result[,2]), ncol = 2, byrow = FALSE) obs_Y1 <- rep(NA, n) for(i in 1:n){ true_j = true_Y[i] obs_Y1[i] = which(rmultinom(1, 1, pistar1_matrix[c(i, n + i), true_j]) == 1) } Ystar1 <- obs_Y1 exp_z2g2_1 = exp(Z2 %*% true_gamma2[,,1]) exp_z2g2_2 = exp(Z2 %*% true_gamma2[,,2]) pi_denominator1 = apply(exp_z2g2_1, FUN = sum_every_n1, n, MARGIN = 2) pi_result1 = exp_z2g2_1 / rbind(pi_denominator1) pi_denominator2 = apply(exp_z2g2_2, FUN = sum_every_n1, n, MARGIN = 2) pi_result2 = exp_z2g2_2 / rbind(pi_denominator2) pistar2_matrix1 = rbind(pi_result1, 1 - apply(pi_result1, FUN = sum_every_n, n = n, MARGIN = 2)) pistar2_matrix2 = rbind(pi_result2, 1 - apply(pi_result2, FUN = sum_every_n, n = n, MARGIN = 2)) pistar2_array = array(c(pistar2_matrix1, pistar2_matrix2), dim = c(dim(pistar2_matrix1), 2)) obs_Y2 <- rep(NA, n) for(i in 1:n){ true_j = true_Y[i] obs_k = Ystar1[i] obs_Y2[i] = which(rmultinom(1, 1, pistar2_array[c(i,n+ i), obs_k, true_j]) == 1) } Ystar2 <- obs_Y2 unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE) unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE) unif_lower_gamma1 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA), dim = c(2,2,2)) unif_upper_gamma1 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA), dim = c(2,2,2)) unif_upper_gamma2 <- array(rep(c(5, NA), 8), dim = c(2,2,2,2)) unif_lower_gamma2 <- array(rep(c(-5, NA), 8), dim = c(2,2,2,2)) unif_lower_naive_gamma2 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA), dim = c(2,2,2)) unif_upper_naive_gamma2 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA), dim = c(2,2,2)) beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta) gamma1_prior_parameters <- list(lower = unif_lower_gamma1, upper = unif_upper_gamma1) gamma2_prior_parameters <- list(lower = unif_lower_gamma2, upper = unif_upper_gamma2) naive_gamma2_prior_parameters <- list(lower = unif_lower_naive_gamma2, upper = unif_upper_naive_gamma2) MCMC_results <- COMBO_MCMC_2stage(Ystar1, Ystar2, x_matrix = x_matrix, z1_matrix = z1_matrix, z2_matrix = z2_matrix, prior = "uniform", beta_prior_parameters = beta_prior_parameters, gamma1_prior_parameters = gamma1_prior_parameters, gamma2_prior_parameters = gamma2_prior_parameters, naive_gamma2_prior_parameters = naive_gamma2_prior_parameters, number_MCMC_chains = 2, MCMC_sample = 200, burn_in = 100) MCMC_results$posterior_means_df
# Helper functions sum_every_n <- function(x, n){ vector_groups = split(x, ceiling(seq_along(x) / n)) sum_x = Reduce(`+`, vector_groups) return(sum_x) } sum_every_n1 <- function(x, n){ vector_groups = split(x, ceiling(seq_along(x) / n)) sum_x = Reduce(`+`, vector_groups) + 1 return(sum_x) } # Example set.seed(123) n <- 1000 x_mu <- 0 x_sigma <- 1 z1_shape <- 1 z2_shape <- 1 true_beta <- matrix(c(1, -2), ncol = 1) true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE) true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2)) x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1) X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE) z1_matrix = matrix(rgamma(n, z1_shape), ncol = 1) Z1 = matrix(c(rep(1, n), z1_matrix[,1]), ncol = 2, byrow = FALSE) z2_matrix = matrix(rgamma(n, z2_shape), ncol = 1) Z2 = matrix(c(rep(1, n), z2_matrix[,1]), ncol = 2, byrow = FALSE) exp_xb = exp(X %*% true_beta) pi_result = exp_xb[,1] / (exp_xb[,1] + 1) pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE) true_Y <- rep(NA, n) for(i in 1:n){ true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1) } exp_z1g1 = exp(Z1 %*% true_gamma1) pistar1_denominator = matrix(c(1 + exp_z1g1[,1], 1 + exp_z1g1[,2]), ncol = 2, byrow = FALSE) pistar1_result = exp_z1g1 / pistar1_denominator pistar1_matrix = matrix(c(pistar1_result[,1], 1 - pistar1_result[,1], pistar1_result[,2], 1 - pistar1_result[,2]), ncol = 2, byrow = FALSE) obs_Y1 <- rep(NA, n) for(i in 1:n){ true_j = true_Y[i] obs_Y1[i] = which(rmultinom(1, 1, pistar1_matrix[c(i, n + i), true_j]) == 1) } Ystar1 <- obs_Y1 exp_z2g2_1 = exp(Z2 %*% true_gamma2[,,1]) exp_z2g2_2 = exp(Z2 %*% true_gamma2[,,2]) pi_denominator1 = apply(exp_z2g2_1, FUN = sum_every_n1, n, MARGIN = 2) pi_result1 = exp_z2g2_1 / rbind(pi_denominator1) pi_denominator2 = apply(exp_z2g2_2, FUN = sum_every_n1, n, MARGIN = 2) pi_result2 = exp_z2g2_2 / rbind(pi_denominator2) pistar2_matrix1 = rbind(pi_result1, 1 - apply(pi_result1, FUN = sum_every_n, n = n, MARGIN = 2)) pistar2_matrix2 = rbind(pi_result2, 1 - apply(pi_result2, FUN = sum_every_n, n = n, MARGIN = 2)) pistar2_array = array(c(pistar2_matrix1, pistar2_matrix2), dim = c(dim(pistar2_matrix1), 2)) obs_Y2 <- rep(NA, n) for(i in 1:n){ true_j = true_Y[i] obs_k = Ystar1[i] obs_Y2[i] = which(rmultinom(1, 1, pistar2_array[c(i,n+ i), obs_k, true_j]) == 1) } Ystar2 <- obs_Y2 unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE) unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE) unif_lower_gamma1 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA), dim = c(2,2,2)) unif_upper_gamma1 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA), dim = c(2,2,2)) unif_upper_gamma2 <- array(rep(c(5, NA), 8), dim = c(2,2,2,2)) unif_lower_gamma2 <- array(rep(c(-5, NA), 8), dim = c(2,2,2,2)) unif_lower_naive_gamma2 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA), dim = c(2,2,2)) unif_upper_naive_gamma2 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA), dim = c(2,2,2)) beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta) gamma1_prior_parameters <- list(lower = unif_lower_gamma1, upper = unif_upper_gamma1) gamma2_prior_parameters <- list(lower = unif_lower_gamma2, upper = unif_upper_gamma2) naive_gamma2_prior_parameters <- list(lower = unif_lower_naive_gamma2, upper = unif_upper_naive_gamma2) MCMC_results <- COMBO_MCMC_2stage(Ystar1, Ystar2, x_matrix = x_matrix, z1_matrix = z1_matrix, z2_matrix = z2_matrix, prior = "uniform", beta_prior_parameters = beta_prior_parameters, gamma1_prior_parameters = gamma1_prior_parameters, gamma2_prior_parameters = gamma2_prior_parameters, naive_gamma2_prior_parameters = naive_gamma2_prior_parameters, number_MCMC_chains = 2, MCMC_sample = 200, burn_in = 100) MCMC_results$posterior_means_df
EM-Algorithm Function for Estimation of the Misclassification Model
em_function(param_current, obs_Y_matrix, X, Z, sample_size, n_cat)
em_function(param_current, obs_Y_matrix, X, Z, sample_size, n_cat)
param_current |
A numeric vector of regression parameters, in the order
|
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
em_function
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
EM-Algorithm Function for Estimation of the Two-Stage Misclassification Model
em_function_2stage( param_current, obs_Ystar_matrix, obs_Ytilde_matrix, X, Z, V, sample_size, n_cat )
em_function_2stage( param_current, obs_Ystar_matrix, obs_Ytilde_matrix, X, Z, V, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
obs_Ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the first-stage observed
outcome |
obs_Ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the second-stage observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the first-stage observation mechanism. |
V |
A numeric design matrix for the second-stage observation mechanism. |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrices, |
n_cat |
The number of categorical values that the true outcome, |
em_function_2stage
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
expit(x)
expit(x)
x |
A numeric value or vector to compute the expit function on. |
expit
returns the result of the function
for a given
x
.
jags.model
Object for a Given PriorSet up a Binary Outcome Misclassification jags.model
Object for a Given Prior
jags_picker( prior, sample_size, dim_x, dim_z, n_cat, Ystar, X, Z, beta_prior_parameters, gamma_prior_parameters, number_MCMC_chains, model_file, display_progress = TRUE )
jags_picker( prior, sample_size, dim_x, dim_z, n_cat, Ystar, X, Z, beta_prior_parameters, gamma_prior_parameters, number_MCMC_chains, model_file, display_progress = TRUE )
prior |
A character string specifying the prior distribution for the
|
sample_size |
An integer value specifying the number of observations in the sample. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
dim_z |
An integer specifying the number of columns of the design matrix of the observation mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma_prior_parameters |
A numeric list of prior distribution parameters
for the |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute. |
model_file |
A .BUG file and used
for MCMC estimation with |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
jags_picker
returns a jags.model
object for a binary
outcome misclassification model. The object includes the specified
prior distribution, model, number of chains, and data.
jags.model
Object for a Given PriorSet up a Two-Stage Binary Outcome Misclassification jags.model
Object for a Given Prior
jags_picker_2stage( prior, sample_size, dim_x, dim_z, dim_v, n_cat, Ystar, Ytilde, X, Z, V, beta_prior_parameters, gamma_prior_parameters, delta_prior_parameters, number_MCMC_chains, model_file, display_progress = TRUE )
jags_picker_2stage( prior, sample_size, dim_x, dim_z, dim_v, n_cat, Ystar, Ytilde, X, Z, V, beta_prior_parameters, gamma_prior_parameters, delta_prior_parameters, number_MCMC_chains, model_file, display_progress = TRUE )
prior |
A character string specifying the prior distribution for the
|
sample_size |
An integer value specifying the number of observations in the sample. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
dim_z |
An integer specifying the number of columns of the design matrix of the first-stage observation mechanism, |
dim_v |
An integer specifying the number of columns of the design matrix of the second-stage observation mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
Ystar |
A numeric vector of indicator variables (1, 2) for the first-stage observed
outcome |
Ytilde |
A numeric vector of indicator variables (1, 2) for the second-stage observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the first-stage observation mechanism. |
V |
A numeric design matrix for the second-stage observation mechanism. |
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma_prior_parameters |
A numeric list of prior distribution parameters
for the |
delta_prior_parameters |
A numeric list of prior distribution parameters
for the |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute. |
model_file |
A .BUG file and used
for MCMC estimation with |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
jags_picker
returns a jags.model
object for a two-stage binary
outcome misclassification model. The object includes the specified
prior distribution, model, number of chains, and data.
Fix Label Switching in MCMC Results from a Binary Outcome Misclassification Model
label_switch(chain_matrix, dim_x, dim_z, n_cat)
label_switch(chain_matrix, dim_x, dim_z, n_cat)
chain_matrix |
A numeric matrix containing the posterior samples for all
parameters in a given MCMC chain. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
dim_z |
An integer specifying the number of columns of the design matrix of the observation mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
label_switch
returns a named matrix of MCMC posterior samples for
all parameters after performing label switching according the following pattern:
all terms are multiplied by -1, all
terms are "swapped"
with the opposite
j
index.
Fix Label Switching in MCMC Results from a Binary Outcome Misclassification Model
label_switch_2stage(chain_matrix, dim_x, dim_z, dim_v, n_cat)
label_switch_2stage(chain_matrix, dim_x, dim_z, dim_v, n_cat)
chain_matrix |
A numeric matrix containing the posterior samples for all
parameters in a given MCMC chain. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
dim_z |
An integer specifying the number of columns of the design matrix of the first-stage observation mechanism, |
dim_v |
An integer specifying the number of columns of the design matrix of the second-stage observation mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
label_switch_2stage
returns a named matrix of MCMC posterior samples for
all parameters after performing label switching according the following pattern:
all terms are multiplied by -1, all
and
terms are "swapped"
with the opposite
j
index.
Expected Complete Data Log-Likelihood Function for Estimation of the Misclassification Model
loglik(param_current, obs_Y_matrix, X, Z, sample_size, n_cat)
loglik(param_current, obs_Y_matrix, X, Z, sample_size, n_cat)
param_current |
A numeric vector of regression parameters, in the order
|
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
sample_size |
Integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
loglik
returns the negative value of the expected log-likelihood function,
,
at the provided inputs.
Expected Complete Data Log-Likelihood Function for Estimation of the Two-Stage Misclassification Model
loglik_2stage( param_current, obs_Ystar_matrix, obs_Ytilde_matrix, X, Z, V, sample_size, n_cat )
loglik_2stage( param_current, obs_Ystar_matrix, obs_Ytilde_matrix, X, Z, V, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
obs_Ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the first-stage observed
outcome |
obs_Ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the second-stage observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the first-stage observation mechanism. |
V |
A numeric design matrix for the second-stage observation mechanism. |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrices, |
n_cat |
The number of categorical values that the true outcome, |
loglik_2stage
returns the negative value of the expected log-likelihood function,
,
at the provided inputs.
Example data from The Law School Admissions Council's (LSAC) National Bar Passage Study (Linda Wightman, 1998)
LSAC_data
LSAC_data
A dataframe 39 columns, including background and demographic information, as well as if the candidates passed the bar exam to become lawyers in the USA.
## Not run: data("LSAC_data") head(LSAC_data) ## End(Not run)
## Not run: data("LSAC_data") head(LSAC_data) ## End(Not run)
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects
mean_pistarjj_compute(pistar_matrix, j, sample_size)
mean_pistarjj_compute(pistar_matrix, j, sample_size)
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
j |
An integer value representing the true outcome category to compute
the average conditional probability of correct classification for.
|
sample_size |
An integer value specifying the number of observations in the sample. |
mean_pistarjj_compute
returns a numeric value equal to the average
conditional probability across all subjects.
Compute the conditional probability of observing outcome given
the latent true outcome
as
for each of the
n
subjects.
misclassification_prob(gamma_matrix, z_matrix)
misclassification_prob(gamma_matrix, z_matrix)
gamma_matrix |
A numeric matrix of estimated regression parameters for the
observation mechanism, |
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
misclassification_prob
returns a dataframe containing four columns.
The first column, Subject
, represents the subject ID, from to
n
,
where n
is the sample size, or equivalently, the number of rows in z_matrix
.
The second column, Y
, represents a true, latent outcome category .
The third column,
Ystar
, represents an observed outcome category .
The last column,
Probability
, is the value of the equation
computed for each subject, observed outcome category, and true, latent outcome category.
set.seed(123) sample_size <- 1000 cov1 <- rnorm(sample_size) cov2 <- rnorm(sample_size, 1, 2) z_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE) estimated_gammas <- matrix(c(1, -1, .5, .2, -.6, 1.5), ncol = 2) P_Ystar_Y <- misclassification_prob(estimated_gammas, z_matrix) head(P_Ystar_Y)
set.seed(123) sample_size <- 1000 cov1 <- rnorm(sample_size) cov2 <- rnorm(sample_size, 1, 2) z_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE) estimated_gammas <- matrix(c(1, -1, .5, .2, -.6, 1.5), ncol = 2) P_Ystar_Y <- misclassification_prob(estimated_gammas, z_matrix) head(P_Ystar_Y)
Compute the conditional probability of observing second-stage outcome given
the latent true outcome
and the first-stage outcome
as
for each of the
subjects.
misclassification_prob2(gamma2_array, z2_matrix)
misclassification_prob2(gamma2_array, z2_matrix)
gamma2_array |
A numeric array of estimated regression parameters for the
observation mechanism, |
z2_matrix |
A numeric matrix of covariates in the second-stage observation mechanism.
|
misclassification_prob2
returns a dataframe containing five columns.
The first column, Subject
, represents the subject ID, from to
n
,
where n
is the sample size, or equivalently, the number of rows in z2_matrix
.
The second column, Y
, represents a true, latent outcome category .
The third column,
Ystar1
, represents a first-stage observed outcome category .
The fourth column,
Ystar2
, represents a second-stage observed outcome category .
The last column,
Probability
, is the value of the equation
computed for each subject, first-stage observed outcome category, second-stage
observed outcome category, and true, latent outcome category.
set.seed(123) sample_size <- 1000 cov1 <- rnorm(sample_size) cov2 <- rnorm(sample_size, 1, 2) z2_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE) estimated_gamma2 <- array(c(1, -1, .5, .2, -.6, 1.5, -1, .5, -1, -.5, -1, -.5), dim = c(3,2,2)) P_Ystar2_Ystar1_Y <- misclassification_prob2(estimated_gamma2, z2_matrix) head(P_Ystar2_Ystar1_Y)
set.seed(123) sample_size <- 1000 cov1 <- rnorm(sample_size) cov2 <- rnorm(sample_size, 1, 2) z2_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE) estimated_gamma2 <- array(c(1, -1, .5, .2, -.6, 1.5, -1, .5, -1, -.5, -1, -.5), dim = c(3,2,2)) P_Ystar2_Ystar1_Y <- misclassification_prob2(estimated_gamma2, z2_matrix) head(P_Ystar2_Ystar1_Y)
Select a Binary Outcome Misclassification Model for a Given Prior
model_picker(prior)
model_picker(prior)
prior |
A character string specifying the prior distribution for the
|
model_picker
returns a character string specifying the binary
outcome misclassification model to be turned into a .BUG file and used
for MCMC estimation with rjags
.
Select a Two-Stage Binary Outcome Misclassification Model for a Given Prior
model_picker_2stage(prior)
model_picker_2stage(prior)
prior |
A character string specifying the prior distribution for the
|
model_picker
returns a character string specifying the two-stage binary
outcome misclassification model to be turned into a .BUG file and used
for MCMC estimation with rjags
.
jags.model
Object for a Given PriorSet up a Naive Logistic Regression jags.model
Object for a Given Prior
naive_jags_picker( prior, sample_size, dim_x, n_cat, Ystar, X, beta_prior_parameters, number_MCMC_chains, naive_model_file, display_progress = TRUE )
naive_jags_picker( prior, sample_size, dim_x, n_cat, Ystar, X, beta_prior_parameters, number_MCMC_chains, naive_model_file, display_progress = TRUE )
prior |
character string specifying the prior distribution for the naive
|
sample_size |
An integer value specifying the number of observations in the sample. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute. |
naive_model_file |
A .BUG file and used
for MCMC estimation with |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
naive_jags_picker
returns a jags.model
object for a naive
logistic regression model predicting the potentially misclassified Y*
from the predictor matrix x
. The object includes the specified
prior distribution, model, number of chains, and data.
jags.model
Object for a Given PriorSet up a Naive Two-Stage Regression jags.model
Object for a Given Prior
naive_jags_picker_2stage( prior, sample_size, dim_x, dim_v, n_cat, Ystar, Ytilde, X, V, beta_prior_parameters, delta_prior_parameters, number_MCMC_chains, naive_model_file, display_progress = TRUE )
naive_jags_picker_2stage( prior, sample_size, dim_x, dim_v, n_cat, Ystar, Ytilde, X, V, beta_prior_parameters, delta_prior_parameters, number_MCMC_chains, naive_model_file, display_progress = TRUE )
prior |
character string specifying the prior distribution for the naive
|
sample_size |
An integer value specifying the number of observations in the sample. |
dim_x |
An integer specifying the number of columns of the design matrix of the first-stage outcome mechanism, |
dim_v |
An integer specifying the number of columns of the design matrix of the second-stage outcome mechanism, |
n_cat |
An integer specifying the number of categorical values that the observed outcomes can take. |
Ystar |
A numeric vector of indicator variables (1, 2) for the first-stage observed
outcome |
Ytilde |
A numeric vector of indicator variables (1, 2) for the second-stage observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
V |
A numeric design matrix for the second-stage outcome mechanism. |
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
delta_prior_parameters |
A numeric list of prior distribution parameters
for the naive |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute. |
naive_model_file |
A .BUG file and used
for MCMC estimation with |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
naive_jags_picker_2stage
returns a jags.model
object for a naive
two-stage regression model predicting the potentially misclassified Y*
from the predictor matrix x
and the potentially misclassified
from the predictor matrix
v
. The object includes the specified
prior distribution, model, number of chains, and data.
Observed Data Log-Likelihood Function for Estimation of the Naive Two-Stage Misclassification Model
naive_loglik_2stage( param_current, X, V, obs_Ystar_matrix, obs_Ytilde_matrix, sample_size, n_cat )
naive_loglik_2stage( param_current, X, V, obs_Ystar_matrix, obs_Ytilde_matrix, sample_size, n_cat )
param_current |
A numeric vector of regression parameters, in the order
|
X |
A numeric design matrix for the first-stage observed mechanism. |
V |
A numeric design matrix for the second-stage observed mechanism. |
obs_Ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the first-stage observed
outcome |
obs_Ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the second-stage observed
outcome |
sample_size |
Integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the first- and second-stage outcomes,
|
naive_loglik_2stage
returns the negative value of the observed data log-likelihood function,
,
at the provided inputs.
Select a Logisitic Regression Model for a Given Prior
naive_model_picker(prior)
naive_model_picker(prior)
prior |
A character string specifying the prior distribution for the naive
|
naive_model_picker
returns a character string specifying the
logistic regression model to be turned into a .BUG file and used
for MCMC estimation with rjags
.
Select a Naive Two-Stage Regression Model for a Given Prior
naive_model_picker_2stage(prior)
naive_model_picker_2stage(prior)
prior |
A character string specifying the prior distribution for the naive
|
naive_model_picker_2stage
returns a character string specifying the
logistic regression model to be turned into a .BUG file and used
for MCMC estimation with rjags
.
Code is adapted by the SAMBA R package from Lauren Beesley and Bhramar Mukherjee.
perfect_sensitivity_EM( Ystar, Z, X, start, beta0_fixed = NULL, weights = NULL, expected = TRUE, tolerance = 1e-07, max_em_iterations = 1500 )
perfect_sensitivity_EM( Ystar, Z, X, start, beta0_fixed = NULL, weights = NULL, expected = TRUE, tolerance = 1e-07, max_em_iterations = 1500 )
Ystar |
A numeric vector of indicator variables (1, 0) for the observed
outcome |
Z |
A numeric matrix of covariates in the true outcome mechanism.
|
X |
A numeric matrix of covariates in the observation mechanism.
|
start |
Numeric vector of starting values for parameters in the true
outcome mechanism ( |
beta0_fixed |
Optional numeric vector of values of the observation mechanism
intercept to profile over. If a single value is entered, this corresponds to
fixing the intercept at the specified value. The default is |
weights |
Optional vector of row-specific weights used for selection bias
adjustment. The default is |
expected |
A logical value indicating whether or not to calculate the
covariance matrix via the expected Fisher information matrix. The default is |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is |
perfect_sensitivity_EM
returns a list containing nine elements.
The elements are detailed in ?SAMBA::obsloglikEM
documentation. Code
is adapted from the SAMBA::obsloglikEM
function.
Beesley, L. and Mukherjee, B. (2020). Statistical inference for association studies using electronic health records: Handling both selection bias and outcome misclassification. Biometrics, 78, 214-226.
Compute Probability of Each True Outcome, for Every Subject
pi_compute(beta, X, n, n_cat)
pi_compute(beta, X, n, n_cat)
beta |
A numeric column matrix of regression parameters for the
|
X |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pi_compute
returns a matrix of probabilities,
for each of the
n
subjects. Rows of the matrix
correspond to each subject. Columns of the matrix correspond to the true outcome
categories
n_cat
.
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects for each MCMC Chain
pistar_by_chain(n_chains, chains_list, Z, n, n_cat)
pistar_by_chain(n_chains, chains_list, Z, n, n_cat)
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pistar_by_chain
returns a numeric matrix of the average
conditional probability across all subjects for
each MCMC chain. Rows of the matrix correspond to MCMC chains, up to
n_chains
.
The first column contains the conditional probability .
The second column contains the conditional probability
.
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects for each MCMC Chain for a 2-stage model
pistar_by_chain_2stage(n_chains, chains_list, Z, n, n_cat)
pistar_by_chain_2stage(n_chains, chains_list, Z, n, n_cat)
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pistar_by_chain
returns a numeric matrix of the average
conditional probability across all subjects for
each MCMC chain. Rows of the matrix correspond to MCMC chains, up to
n_chains
.
The first column contains the conditional probability .
The second column contains the conditional probability
.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome, for Every Subject
pistar_compute(gamma, Z, n, n_cat)
pistar_compute(gamma, Z, n, n_cat)
gamma |
A numeric matrix of regression parameters for the observed
outcome mechanism, |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pistar_compute
returns a matrix of conditional probabilities,
for each of the
n
subjects. Rows of the matrix
correspond to each subject and observed outcome. Specifically, the probability
for subject and observed category $1$ occurs at row
. The probability
for subject
and observed category $2$ occurs at row
n
.
Columns of the matrix correspond to the true outcome categories
n_cat
.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject
pistar_compute_for_chains(chain_colMeans, Z, n, n_cat)
pistar_compute_for_chains(chain_colMeans, Z, n, n_cat)
chain_colMeans |
A numeric vector containing the posterior means for all
sampled parameters in a given MCMC chain. |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pistar_compute_for_chains
returns a matrix of conditional probabilities,
for each of the
n
subjects. Rows of the matrix
correspond to each subject and observed outcome. Specifically, the probability
for subject and observed category $0$ occurs at row
. The probability
for subject
and observed category $1$ occurs at row
n
.
Columns of the matrix correspond to the true outcome categories
n_cat
.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject for 2-stage models
pistar_compute_for_chains_2stage(chain_colMeans, Z, n, n_cat)
pistar_compute_for_chains_2stage(chain_colMeans, Z, n, n_cat)
chain_colMeans |
A numeric vector containing the posterior means for all
sampled parameters in a given MCMC chain. |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pistar_compute_for_chains
returns a matrix of conditional probabilities,
for each of the
n
subjects. Rows of the matrix
correspond to each subject and observed outcome. Specifically, the probability
for subject and observed category $0$ occurs at row
. The probability
for subject
and observed category $1$ occurs at row
n
.
Columns of the matrix correspond to the true outcome categories
n_cat
.
Compute the Mean Conditional Probability of Second-Stage Correct Classification, by First-Stage and True Outcome Across all Subjects for each MCMC Chain
pitilde_by_chain(n_chains, chains_list, V, n, n_cat)
pitilde_by_chain(n_chains, chains_list, V, n, n_cat)
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
V |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pitilde_by_chain
returns a numeric matrix of the average
conditional probability across all subjects for
each MCMC chain. Rows of the matrix correspond to MCMC chains, up to
n_chains
.
The first column contains the conditional probability .
The second column contains the conditional probability
.
Compute Conditional Probability of Each Second-Stage Observed Outcome Given Each True Outcome and First-Stage Observed Outcome, for Every Subject
pitilde_compute(delta, V, n, n_cat)
pitilde_compute(delta, V, n, n_cat)
delta |
A numeric array of regression parameters for the second-stage observed
outcome mechanism, |
V |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pitilde_compute
returns an array of conditional probabilities,
for each of the
n
subjects. Rows of the matrix
correspond to each subject and second-stage observed outcome. Specifically, the probability
for subject and observed category $1$ occurs at row
. The probability
for subject
and observed category $2$ occurs at row
n
.
Columns of the matrix correspond to the first-stage outcome categories,
n_cat
.
The third dimension of the array corresponds to the true outcome categories,
n_cat
.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject
pitilde_compute_for_chains(chain_colMeans, V, n, n_cat)
pitilde_compute_for_chains(chain_colMeans, V, n, n_cat)
chain_colMeans |
A numeric vector containing the posterior means for all
sampled parameters in a given MCMC chain. |
V |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
pitilde_compute_for_chains
returns a matrix of conditional probabilities,
corresponding to each subject and observed outcome. Specifically, the probability
for subject
and second-stage observed category $1$ occurs at row
. The probability
for subject
and second-stage observed category $2$ occurs at row
n
.
Columns of the matrix correspond to the first-stage outcome categories
n_cat
.
The third dimension of the array corresponds to the true outcome categories,
n_cat
.
Objective function of the form:
.
Used to obtain estimates of
parameters.
q_beta_f(beta, X, w_mat, sample_size, n_cat)
q_beta_f(beta, X, w_mat, sample_size, n_cat)
beta |
A numeric vector of regression parameters for the
|
X |
A numeric design matrix. |
w_mat |
Matrix of E-step weights obtained from |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
q_beta_f
returns the negative value of the expected log-likelihood function,
,
at the provided inputs.
Objective function of the form:
.
Used to obtain estimates of
parameters.
q_delta_f( delta_v, V, obs_Ystar_matrix, obs_Ytilde_matrix, w_mat, sample_size, n_cat )
q_delta_f( delta_v, V, obs_Ystar_matrix, obs_Ytilde_matrix, w_mat, sample_size, n_cat )
delta_v |
A numeric array of regression parameters for the second-stage observed
outcome mechanism, |
V |
A numeric design matrix. |
obs_Ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
obs_Ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
w_mat |
Matrix of E-step weights obtained from |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
q_beta_f
returns the negative value of the expected log-likelihood function,
,
at the provided inputs.
Objective function of the form:
.
Used to obtain estimates of
parameters.
q_gamma_f(gamma_v, Z, obs_Y_matrix, w_mat, sample_size, n_cat)
q_gamma_f(gamma_v, Z, obs_Y_matrix, w_mat, sample_size, n_cat)
gamma_v |
A numeric vector of regression parameters for the observed
outcome mechanism, |
Z |
A numeric design matrix. |
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
w_mat |
Matrix of E-step weights obtained from |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
q_beta_f
returns the negative value of the expected log-likelihood function,
,
at the provided inputs.
Sum Every "n"th Element
sum_every_n(x, n)
sum_every_n(x, n)
x |
A numeric vector to sum over |
n |
A numeric value specifying the distance between the reference index and the next index to be summed |
sum_every_n
returns a vector of sums of every n
th element of the vector x
.
Sum Every "n"th Element, then add 1
sum_every_n1(x, n)
sum_every_n1(x, n)
x |
A numeric vector to sum over |
n |
A numeric value specifying the distance between the reference index and the next index to be summed |
sum_every_n1
returns a vector of sums of every n
th element of the vector x
, plus 1.
Compute the probability of the latent true outcome as
for each of the
n
subjects.
true_classification_prob(beta_matrix, x_matrix)
true_classification_prob(beta_matrix, x_matrix)
beta_matrix |
A numeric column matrix of estimated regression parameters for the
true outcome mechanism, |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
true_classification_prob
returns a dataframe containing three columns.
The first column, Subject
, represents the subject ID, from to
n
,
where n
is the sample size, or equivalently, the number of rows in x_matrix
.
The second column, Y
, represents a true, latent outcome category .
The last column,
Probability
, is the value of the equation
computed
for each subject and true, latent outcome category.
set.seed(123) sample_size <- 1000 cov1 <- rnorm(sample_size) cov2 <- rnorm(sample_size, 1, 2) x_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE) estimated_betas <- matrix(c(1, -1, .5), ncol = 1) P_Y <- true_classification_prob(estimated_betas, x_matrix) head(P_Y)
set.seed(123) sample_size <- 1000 cov1 <- rnorm(sample_size) cov2 <- rnorm(sample_size, 1, 2) x_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE) estimated_betas <- matrix(c(1, -1, .5), ncol = 1) P_Y <- true_classification_prob(estimated_betas, x_matrix) head(P_Y)
Synthetic example data of pretrial failure risk factors and outcomes, VPRAI recommendations, and judge decisions
VPRAI_synthetic_data
VPRAI_synthetic_data
A dataframe 1990 columns, including defendant race, risk factors, VPRAI recommendations, judge decisions, and pretrial failure outcomes.
## Not run: data("VPRAI_synthetic_data") head(VPRAI_synthetic_data) ## End(Not run)
## Not run: data("VPRAI_synthetic_data") head(VPRAI_synthetic_data) ## End(Not run)
Compute E-step for Binary Outcome Misclassification Model Estimated With the EM-Algorithm
w_j(ystar_matrix, pistar_matrix, pi_matrix, sample_size, n_cat)
w_j(ystar_matrix, pistar_matrix, pi_matrix, sample_size, n_cat)
ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
pi_matrix |
A numeric matrix of probabilities obtained from the internal
function |
sample_size |
An integer value specifying the number of observations in
the sample. This value should be equal to the number of rows of the observed
outcome matrix, |
n_cat |
The number of categorical values that the true outcome, |
w_j
returns a matrix of E-step weights for the EM-algorithm,
computed as follows:
.
Rows of the matrix correspond to each subject. Columns of the matrix correspond
to the true outcome categories
n_cat
.
Compute E-step for Two-Stage Binary Outcome Misclassification Model Estimated With the EM-Algorithm
w_j_2stage( ystar_matrix, ytilde_matrix, pitilde_array, pistar_matrix, pi_matrix, sample_size, n_cat )
w_j_2stage( ystar_matrix, ytilde_matrix, pitilde_array, pistar_matrix, pi_matrix, sample_size, n_cat )
ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
pitilde_array |
A numeric array of conditional probabilities obtained from
the internal function |
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
pi_matrix |
A numeric matrix of probabilities obtained from the internal
function |
sample_size |
An integer value specifying the number of observations in
the sample. This value should be equal to the number of rows of the observed
outcome matrices, |
n_cat |
The number of categorical values that the true outcome, |
w_j
returns a matrix of E-step weights for the EM-algorithm,
computed as follows:
.
Rows of the matrix correspond to each subject. Columns of the matrix correspond
to the true outcome categories
n_cat
.