The optimize method of a CmdStanModel object runs Stan's optimizer.

Details

CmdStan can find the posterior mode (assuming there is one). If the posterior is not convex, there is no guarantee Stan will be able to find the global mode as opposed to a local optimum of log probability. For optimization, the mode is calculated without the Jacobian adjustment for constrained variables, which shifts the mode due to the change of variables. Thus modes correspond to modes of the model as written.

-- CmdStan Interface User's Guide

Usage

$optimize(
  data = NULL,
  seed = NULL,
  refresh = NULL,
  init = NULL,
  algorithm = NULL,
  init_alpha = NULL,
  iter = NULL
)

Arguments shared by all fitting methods

The following arguments can be specified for any of the fitting methods (sample, optimize, variational). Arguments left at NULL are set to the defaults used by the installed version of CmdStan.

  • data (multiple options): The data to use:

    • A named list of R objects (as in RStan);

    • A path to a data file compatible with CmdStan (R dump or JSON). See the appendices in the CmdStan manual for details on using these formats.

  • seed: (positive integer) A seed for the (P)RNG to pass to CmdStan.

  • refresh: (non-negative integer) The number of iterations between screen updates.

  • init: (multiple options) The initialization method:

    • A real number x>0 initializes all parameters randomly on the interval [-x, x] (on the unconstrained parameter space);

    • The number 0 initializes all parameters to 0 (on the unconstrained parameter space);

    • A character vector of paths to initialization files (one per chain).
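As a sketch of how the init options above might be used (assuming a compiled CmdStanModel object `mod` and a data list `standata`, both hypothetical here):

```r
# Random initialization on [-2, 2] (unconstrained scale)
fit1 <- mod$optimize(data = standata, init = 2)

# Initialize all parameters to 0
fit2 <- mod$optimize(data = standata, init = 0)

# Initialize each chain from its own file (paths are hypothetical)
fit3 <- mod$sample(data = standata, init = c("inits1.json", "inits2.json"))
```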

Arguments unique to the optimize method

In addition to the arguments above, the optimize method has its own set of arguments. These arguments are described briefly here and in greater detail in the CmdStan manual. Arguments left at NULL are set to the defaults used by the installed version of CmdStan.

  • algorithm: (string) The optimization algorithm. One of "lbfgs", "bfgs", or "newton".

  • iter: (positive integer) The number of iterations.

  • init_alpha: (non-negative real) The line search step size for the first iteration. Not applicable if algorithm="newton".
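Putting the optimize-specific arguments together, a call might look like the following sketch (the argument values are illustrative, not recommendations; `mod` and `standata` are assumed to exist):

```r
fit <- mod$optimize(
  data = standata,
  algorithm = "lbfgs",  # one of "lbfgs", "bfgs", "newton"
  iter = 1000,          # number of iterations
  init_alpha = 0.001    # line search step size for the first iteration
)                       # (init_alpha not used by "newton")
```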

Value

The optimize method returns a CmdStanMLE object.

See also

The CmdStanR website (mc-stan.org/cmdstanr) for online documentation and tutorials.

The Stan and CmdStan documentation: mc-stan.org/users/documentation.

Other CmdStanModel methods: CmdStanModel-method-compile, CmdStanModel-method-sample, CmdStanModel-method-variational

Examples

# \dontrun{
# Set path to cmdstan
# Note: if you installed CmdStan via install_cmdstan() with default settings
# then the default below should work. Otherwise use the `path` argument to
# specify the location of your CmdStan installation.
set_cmdstan_path(path = NULL)
#> CmdStan path set to: /Users/jgabry/.cmdstanr/cmdstan
# Create a CmdStan model object from a Stan program,
# here using the example model that comes with CmdStan
stan_program <- file.path(cmdstan_path(), "examples/bernoulli/bernoulli.stan")
mod <- cmdstan_model(stan_program)
mod$print()
#> data {
#>   int<lower=0> N;
#>   int<lower=0,upper=1> y[N];
#> }
#> parameters {
#>   real<lower=0,upper=1> theta;
#> }
#> model {
#>   theta ~ beta(1,1);
#>   for (n in 1:N)
#>     y[n] ~ bernoulli(theta);
#> }
# Compile to create executable
mod$compile()
#> Running make /Users/jgabry/.cmdstanr/cmdstan/examples/bernoulli/bernoulli
#> make: `/Users/jgabry/.cmdstanr/cmdstan/examples/bernoulli/bernoulli' is up to date.
# Run sample method (MCMC via Stan's dynamic HMC/NUTS),
# specifying data as a named list (like RStan)
standata <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1))
fit_mcmc <- mod$sample(data = standata, seed = 123, num_chains = 2)
#> method = sample (Default)
#>   sample
#>     num_samples = 1000 (Default)
#>     num_warmup = 1000 (Default)
#>     save_warmup = 0 (Default)
#>     thin = 1 (Default)
#>     adapt
#>       engaged = 1 (Default)
#>       gamma = 0.050000000000000003 (Default)
#>       delta = 0.80000000000000004 (Default)
#>       kappa = 0.75 (Default)
#>       t0 = 10 (Default)
#>       init_buffer = 75 (Default)
#>       term_buffer = 50 (Default)
#>       window = 25 (Default)
#>     algorithm = hmc (Default)
#>       hmc
#>         engine = nuts (Default)
#>           nuts
#>             max_depth = 10 (Default)
#>         metric = diag_e (Default)
#>         metric_file = (Default)
#>         stepsize = 1 (Default)
#>         stepsize_jitter = 0 (Default)
#> id = 1
#> data
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/Rtmpy8TKSY/standata-b02e4d2c66.data.R
#> init = 2 (Default)
#> random
#>   seed = 123
#> output
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-sample-1.csv
#>   diagnostic_file = (Default)
#>   refresh = 100 (Default)
#> 
#> 
#> Gradient evaluation took 2e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
#> Adjust your expectations accordingly!
#> 
#> 
#> Iteration:    1 / 2000 [  0%]  (Warmup)
#> Iteration:  100 / 2000 [  5%]  (Warmup)
#> Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Iteration:  300 / 2000 [ 15%]  (Warmup)
#> Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Iteration:  500 / 2000 [ 25%]  (Warmup)
#> Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Iteration:  700 / 2000 [ 35%]  (Warmup)
#> Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Iteration:  900 / 2000 [ 45%]  (Warmup)
#> Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Iteration: 1100 / 2000 [ 55%]  (Sampling)
#> Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Iteration: 1300 / 2000 [ 65%]  (Sampling)
#> Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Iteration: 1500 / 2000 [ 75%]  (Sampling)
#> Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Iteration: 1700 / 2000 [ 85%]  (Sampling)
#> Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Iteration: 1900 / 2000 [ 95%]  (Sampling)
#> Iteration: 2000 / 2000 [100%]  (Sampling)
#> 
#>  Elapsed Time: 0.012707 seconds (Warm-up)
#>                0.0214 seconds (Sampling)
#>                0.034107 seconds (Total)
#> 
#> method = sample (Default)
#>   sample
#>     num_samples = 1000 (Default)
#>     num_warmup = 1000 (Default)
#>     save_warmup = 0 (Default)
#>     thin = 1 (Default)
#>     adapt
#>       engaged = 1 (Default)
#>       gamma = 0.050000000000000003 (Default)
#>       delta = 0.80000000000000004 (Default)
#>       kappa = 0.75 (Default)
#>       t0 = 10 (Default)
#>       init_buffer = 75 (Default)
#>       term_buffer = 50 (Default)
#>       window = 25 (Default)
#>     algorithm = hmc (Default)
#>       hmc
#>         engine = nuts (Default)
#>           nuts
#>             max_depth = 10 (Default)
#>         metric = diag_e (Default)
#>         metric_file = (Default)
#>         stepsize = 1 (Default)
#>         stepsize_jitter = 0 (Default)
#> id = 2
#> data
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/Rtmpy8TKSY/standata-b02e4d2c66.data.R
#> init = 2 (Default)
#> random
#>   seed = 124
#> output
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-sample-2.csv
#>   diagnostic_file = (Default)
#>   refresh = 100 (Default)
#> 
#> 
#> Gradient evaluation took 2.1e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
#> Adjust your expectations accordingly!
#> 
#> 
#> Iteration:    1 / 2000 [  0%]  (Warmup)
#> Iteration:  100 / 2000 [  5%]  (Warmup)
#> Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Iteration:  300 / 2000 [ 15%]  (Warmup)
#> Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Iteration:  500 / 2000 [ 25%]  (Warmup)
#> Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Iteration:  700 / 2000 [ 35%]  (Warmup)
#> Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Iteration:  900 / 2000 [ 45%]  (Warmup)
#> Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Iteration: 1100 / 2000 [ 55%]  (Sampling)
#> Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Iteration: 1300 / 2000 [ 65%]  (Sampling)
#> Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Iteration: 1500 / 2000 [ 75%]  (Sampling)
#> Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Iteration: 1700 / 2000 [ 85%]  (Sampling)
#> Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Iteration: 1900 / 2000 [ 95%]  (Sampling)
#> Iteration: 2000 / 2000 [100%]  (Sampling)
#> 
#>  Elapsed Time: 0.012695 seconds (Warm-up)
#>                0.019917 seconds (Sampling)
#>                0.032612 seconds (Total)
#> 
# Call CmdStan's bin/stansummary
fit_mcmc$summary()
#> Running bin/stansummary \
#>   /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-sample-1.csv \
#>   /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-sample-2.csv
#> Inference for Stan model: bernoulli_model
#> 2 chains: each with iter=(1000,1000); warmup=(0,0); thin=(1,1); 2000 iterations saved.
#> 
#> Warmup took (0.013, 0.013) seconds, 0.025 seconds total
#> Sampling took (0.021, 0.020) seconds, 0.041 seconds total
#> 
#>                 Mean     MCSE  StdDev     5%   50%   95%  N_Eff  N_Eff/s    R_hat
#> lp__            -7.3  2.7e-02 7.3e-01   -8.9  -7.0  -6.8    738    17873  1.0e+00
#> accept_stat__   0.92  3.1e-03 1.3e-01   0.64  0.97   1.0   1670    40408  1.0e+00
#> stepsize__      0.92  1.7e-03 1.7e-03   0.92  0.92  0.92    1.0       24  1.4e+12
#> treedepth__      1.3  1.1e-02 4.7e-01    1.0   1.0   2.0   1968    47626  1.0e+00
#> n_leapfrog__     2.4  2.5e-02 1.0e+00    1.0   3.0   3.0   1640    39697  1.0e+00
#> divergent__     0.00  0.0e+00 0.0e+00   0.00  0.00  0.00   1000    24203      nan
#> energy__         7.8  4.0e-02 1.0e+00    6.8   7.5   9.8    647    15654  1.0e+00
#> theta           0.24  4.6e-03 1.2e-01  0.077  0.22  0.47    720    17429  1.0e+00
#> 
#> Samples were drawn using hmc with nuts.
#> For each parameter, N_Eff is a crude measure of effective sample size,
#> and R_hat is the potential scale reduction factor on split chains (at
#> convergence, R_hat=1).
# Run optimization method (default is Stan's LBFGS algorithm)
# and also demonstrate specifying data as a path to a file (readable by CmdStan)
my_data_file <- file.path(cmdstan_path(), "examples/bernoulli/bernoulli.data.R")
fit_optim <- mod$optimize(data = my_data_file, seed = 123)
#> Warning: Optimization method is experimental and the structure of returned object may change.
#> method = optimize
#>   optimize
#>     algorithm = lbfgs (Default)
#>       lbfgs
#>         init_alpha = 0.001 (Default)
#>         tol_obj = 9.9999999999999998e-13 (Default)
#>         tol_rel_obj = 10000 (Default)
#>         tol_grad = 1e-08 (Default)
#>         tol_rel_grad = 10000000 (Default)
#>         tol_param = 1e-08 (Default)
#>         history_size = 5 (Default)
#>     iter = 2000 (Default)
#>     save_iterations = 0 (Default)
#> id = 1
#> data
#>   file = /Users/jgabry/.cmdstanr/cmdstan/examples/bernoulli/bernoulli.data.R
#> init = 2 (Default)
#> random
#>   seed = 123
#> output
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-optimize-1.csv
#>   diagnostic_file = (Default)
#>   refresh = 100 (Default)
#> 
#> Initial log joint probability = -9.51104
#>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
#>        6      -5.00402   0.000103557   2.55661e-07           1           1        9
#> Optimization terminated normally:
#>   Convergence detected: relative gradient magnitude is below tolerance
# Print estimates
fit_optim$summary()
#> Estimates from optimization:
#>    theta     lp__
#>  0.20000 -5.00402
# Run variational Bayes method (default is meanfield ADVI)
fit_vb <- mod$variational(data = standata, seed = 123)
#> Warning: Variational inference method is experimental and the structure of returned object may change.
#> method = variational
#>   variational
#>     algorithm = meanfield (Default)
#>       meanfield
#>     iter = 10000 (Default)
#>     grad_samples = 1 (Default)
#>     elbo_samples = 100 (Default)
#>     eta = 1 (Default)
#>     adapt
#>       engaged = 1 (Default)
#>       iter = 50 (Default)
#>     tol_rel_obj = 0.01 (Default)
#>     eval_elbo = 100 (Default)
#>     output_samples = 1000 (Default)
#> id = 1
#> data
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/Rtmpy8TKSY/standata-b026c9de3df.data.R
#> init = 2 (Default)
#> random
#>   seed = 123
#> output
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-variational-1.csv
#>   diagnostic_file = (Default)
#>   refresh = 100 (Default)
#> 
#> ------------------------------------------------------------
#> EXPERIMENTAL ALGORITHM:
#>   This procedure has not been thoroughly tested and may be unstable
#>   or buggy. The interface is subject to change.
#> ------------------------------------------------------------
#> 
#> 
#> 
#> Gradient evaluation took 2.1e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
#> Adjust your expectations accordingly!
#> 
#> 
#> Begin eta adaptation.
#> Iteration:   1 / 250 [  0%]  (Adaptation)
#> Iteration:  50 / 250 [ 20%]  (Adaptation)
#> Iteration: 100 / 250 [ 40%]  (Adaptation)
#> Iteration: 150 / 250 [ 60%]  (Adaptation)
#> Iteration: 200 / 250 [ 80%]  (Adaptation)
#> Success! Found best value [eta = 1] earlier than expected.
#> 
#> Begin stochastic gradient ascent.
#>   iter       ELBO   delta_ELBO_mean   delta_ELBO_med   notes
#>    100     -6.258             1.000            1.000
#>    200     -6.475             0.517            1.000
#>    300     -6.228             0.358            0.040
#>    400     -6.220             0.269            0.040
#>    500     -6.379             0.220            0.034
#>    600     -6.195             0.188            0.034
#>    700     -6.262             0.163            0.030
#>    800     -6.345             0.144            0.030
#>    900     -6.201             0.131            0.025
#>   1000     -6.307             0.119            0.025
#>   1100     -6.290             0.020            0.023
#>   1200     -6.238             0.017            0.017
#>   1300     -6.182             0.014            0.013
#>   1400     -6.167             0.014            0.013
#>   1500     -6.219             0.012            0.011
#>   1600     -6.164             0.010            0.009   MEDIAN ELBO CONVERGED
#> 
#> Drawing a sample of size 1000 from the approximate posterior...
#> COMPLETED.
# Call CmdStan's bin/stansummary
fit_vb$summary()
#> Running bin/stansummary \
#>   /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-variational-1.csv
#> Warning: non-fatal error reading adapation data
#> Inference for Stan model: bernoulli_model
#> 1 chains: each with iter=(1001); warmup=(0); thin=(0); 1001 iterations saved.
#> 
#> Warmup took (0.00) seconds, 0.00 seconds total
#> Sampling took (0.00) seconds, 0.00 seconds total
#> 
#>            Mean     MCSE  StdDev     5%    50%       95%  N_Eff  N_Eff/s    R_hat
#> lp__       0.00  0.0e+00    0.00   0.00   0.00   0.0e+00    500      inf      nan
#> log_p__    -7.2  2.5e-02    0.72   -8.6   -7.0  -6.8e+00    789      inf  1.0e+00
#> log_g__   -0.54  2.9e-02    0.76   -2.1  -0.27  -1.5e-03    679      inf  1.0e+00
#> theta      0.26  4.2e-03    0.12  0.091   0.23   4.9e-01    823      inf  1.0e+00
#> 
#> Samples were drawn using meanfield with .
#> For each parameter, N_Eff is a crude measure of effective sample size,
#> and R_hat is the potential scale reduction factor on split chains (at
#> convergence, R_hat=1).
# For models fit using MCMC, if you like working with RStan's stanfit objects
# then you can create one with rstan::read_stan_csv()
if (require(rstan, quietly = TRUE)) {
  stanfit <- rstan::read_stan_csv(fit_mcmc$output_files())
  print(stanfit)
}
#> Inference for Stan model: bernoulli-stan-sample-1.
#> 2 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=2000.
#> 
#>        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
#> theta  0.24    0.00 0.12  0.06  0.14  0.22  0.32  0.52   720    1
#> lp__  -7.32    0.03 0.73 -9.37 -7.53 -7.05 -6.81 -6.75   737    1
#> 
#> Samples were drawn using NUTS(diag_e) at Mon Oct 14 21:41:30 2019.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
# }