An Introduction to AD MODEL BUILDER for Use in Nonlinear Modeling and Statistics Version 11.1  (2013-05-01)
Revised manual  (2013-06-24)

David Fournier

admb-project.org
-- This is an unofficial version. --
The official version, with the bibliography and proper citations, is on the official ADMB website. This is a preliminary attempt at an HTML manual; if anything seems to be missing, consult the official version, available at the link below.

http://admb-project.org/documentation/manuals

 

ADMB Foundation, Honolulu.

This is the manual for AD Model Builder (ADMB) version 11.1.

Copyright © 1993, 1994, 1996, 2000, 2001, 2004, 2007, 2008, 2011, 2013 David Fournier

The latest edition of the manual is available at:
http://admb-project.org/documentation/manuals

Contents

Chapter 1
Getting Started with AD Model Builder

This manual describes AD Model Builder, the fastest, most powerful software available for the rapid development and fitting of general nonlinear statistical models. The accompanying demonstration disk has a number of example programs from various fields, including chemical engineering, natural resource modeling, and financial modeling. As you will see, with a few statements, you can build powerful programs to solve problems that would completely defeat other modeling environments. The AD Model Builder environment makes it simple to deal with recurring difficulties in nonlinear modeling, such as restricting the values that parameters can assume, carrying out the optimization in a stepwise manner, and producing a report of the estimates of the standard deviations of the parameter estimates. In addition, these techniques scale up to models with at least 5000 independent parameters on a 1000 MHz Pentium III, and more on more powerful platforms. So, if you are interested in a really powerful environment for nonlinear modeling, read on!

AD Model Builder provides a template-like approach to code generation. Instead of needing to write all the code for the model, the user can employ any ASCII file editor to simply fill in the template, describing the particular aspects of the model (data, model parameters, and the fitting criterion) to be used. With this approach, the specification of the model is reduced to the absolute minimum number of statements. Reasonable default behavior for various aspects of modeling, such as the input of data and initial parameters, and the reporting of results, is provided. Of course, it is possible to override this default behavior to customize an application when desired. For example, the command-line argument -ind NAME, where NAME is any file name you like, changes the default data input file to NAME.

The various concepts embodied in AD Model Builder are introduced in a series of examples. You should at least skim through each of the examples in the order they appear, so that you will be familiar with the concepts used in the later examples. The examples disk contains the AD Model Builder template code, the C++ code produced by AD Model Builder, and the executable programs produced by compiling the C++ code. This process of producing the executable is automated, so that the user who doesn’t wish to consider the vagaries of C++ programming can go from the AD Model Builder template to the compiled executable in one step. Assuming that the C++ compiler and the AD Model Builder and AUTODIF libraries have been properly installed, to produce an AD Model Builder executable, it is only necessary to type makeadm root, where root.tpl is the name of the ASCII file containing the template specification. To simplify model development, two modes of operation are provided: a safe mode with bounds checking on all array objects, and an optimized mode, for fastest execution.

AD Model Builder achieves its high performance levels by employing the AUTODIF C++ class library. AUTODIF combines an array language with the reverse mode of automatic differentiation, supplemented with precompiled adjoint code for the derivatives of common array and matrix operations. However, all of this is completely transparent to the AD Model Builder user. It is only necessary to provide a simple description of the statistical model desired, and the entire process of fitting the model to data and reporting the results is taken care of automatically.

Although C++ potentially provides good support for mathematical modeling, the language is rather complex—it cannot be learned in a few days. Moreover, many features of the language are not needed for mathematical modeling. A novice user who wishes to build mathematical models may have a difficult time deciding what features of the language to learn and what features can be ignored until later. AD Model Builder is intended to help overcome these difficulties and to speed up model development. When using AD Model Builder, most of the aspects of C++ programming are hidden from the user. In fact, the beginning user can be almost unaware that C++ underlies the implementation of AD Model Builder. It is only necessary to be familiar with some of the simpler aspects of C or C++ syntax.

To interpret the results of the statistical analysis, AD Model Builder provides simple methods for calculating the profile likelihood and Markov chain simulation estimates of the posterior distribution for parameters of interest (Hastings-Metropolis algorithm).
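The Markov chain simulation referred to above is commonly implemented as a random-walk Metropolis sampler. The Python sketch below shows the bare algorithm applied to a standard normal "posterior" as a sanity check; it is a generic illustration only, and the function names, proposal step size, and seed are illustrative assumptions, not ADMB's implementation.

```python
import math
import random

def metropolis(logpost, theta0, step, n_samples, seed=1):
    """Random-walk Metropolis sampler for a one-dimensional posterior."""
    random.seed(seed)
    theta = theta0
    lp = logpost(theta)
    samples = []
    for _ in range(n_samples):
        prop = theta + random.gauss(0.0, step)      # symmetric proposal
        lp_prop = logpost(prop)
        # accept with probability min(1, exp(lp_prop - lp))
        if math.log(random.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples

# Target: standard normal, log-density -t^2/2 (up to a constant).
draws = metropolis(lambda t: -0.5 * t * t, 0.0, 1.0, 20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The sample mean and variance should be close to 0 and 1, the moments of the target distribution.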

1.1 What are nonlinear statistical models?

AD Model Builder is software for creating computer programs to estimate the parameters (or the probability distribution of parameters) for nonlinear statistical models. This raises the question: “What is a nonlinear statistical model?” Consider the following model. We have a set of observations Y_i and x_{ij}, where it is assumed that

Y_i = ∑_{j=1}^{m} a_j x_{ij} + ϵ_i        (1.1)

and where the ϵ_i are assumed to be normally distributed random variables with equal variance σ². Given these assumptions, it can be shown that “good” estimates for the unknown parameters a_j are obtained by minimizing

∑_i ( Y_i − ∑_{j=1}^{m} a_j x_{ij} )²        (1.2)

with respect to these parameters. These minimizing values can be found by taking the derivatives with respect to the a_j and setting them equal to zero. Since equation (1.1) is linear in the a_j and equation (1.2) is quadratic, it follows that the equations given by setting the derivatives equal to zero are linear in the a_j. So, the estimates can be found by solving a system of linear equations. For this reason, such a statistical model is referred to as “linear.” Over time, very good numerically stable methods have been developed for calculating these least-squares estimates. For situations where either the equations in the model corresponding to equation (1.1) are not linear, or the statistical assumptions involve non-normal random variables, the methods for finding good parameter estimates will involve minimizing functions that are not quadratic in the unknown parameters a_j.
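To make this reasoning concrete, the Python sketch below (with purely illustrative data) shows that setting the derivatives of (1.2) to zero yields the linear "normal equations" in the a_j, solved here directly for a model with m = 2 predictors:

```python
# Least squares for Y_i = a1*x_i1 + a2*x_i2 + eps_i.
# Setting the derivative of (1.2) with respect to a_j to zero gives
#   sum_i x_ij * (Y_i - sum_k a_k x_ik) = 0,
# which is linear in the a_k (the "normal equations").

x1 = [1.0, 1.0, 1.0, 1.0, 1.0]      # constant (intercept) predictor
x2 = [0.0, 1.0, 2.0, 3.0, 4.0]
Y  = [2.0, 5.0, 8.0, 11.0, 14.0]    # exactly Y = 2 + 3*x2, no noise

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# 2x2 system: [s11 s12; s12 s22] (a1, a2)' = (t1, t2)'
s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
t1, t2 = dot(x1, Y), dot(x2, Y)
det = s11 * s22 - s12 * s12
a1 = (t1 * s22 - t2 * s12) / det
a2 = (s11 * t2 - s12 * t1) / det
```

Because the data lie exactly on the plane, the solution recovers a1 = 2 and a2 = 3.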

In general, these optimization problems are much more difficult than those arising in least-squares problems. There are, however, various techniques that render the estimation of parameters in such nonlinear models more tractable. The AD Model Builder package is intended to organize these techniques in such a way that they are easy to employ (where possible, employing them in a way that the user does not need to be aware of them), so that investigating nonlinear statistical models becomes—so far as possible—as simple as for linear statistical models.

1.2 Installing the software

AD Model Builder is available without charge from http://admb-project.org/downloads/. Libraries compiled for the most common combinations of computer architectures, operating systems, and compilers (including Windows, Linux, Mac OS, and OpenSolaris), as well as the complete source code, are available.

Installation instructions for different compilers are also available on-line at
http://admb-project.org/documentation/installation

1.3 The sections in an AD Model Builder TPL file

An AD Model Builder template (TPL file) consists of up to 11 sections. Eight of these sections are optional. Optional sections are enclosed in brackets [ ]. The optional FUNCTION keyword defines a subsection of the PROCEDURE_SECTION.

The simplest model contains only the three required sections: the DATA_SECTION, the PARAMETER_SECTION, and the PROCEDURE_SECTION. The full set of sections, in the order in which they must appear, is:

 DATA_SECTION 
 
 [INITIALIZATION_SECTION] 
 
 PARAMETER_SECTION 
 
 [PRELIMINARY_CALCS_SECTION] 
 
 PROCEDURE_SECTION 
   [FUNCTION] 
 
 [REPORT_SECTION] 
 
 [RUNTIME_SECTION] 
 
 [TOP_OF_MAIN_SECTION] 
 
 [GLOBALS_SECTION] 
 
 [BETWEEN_PHASES_SECTION] 
 
 [FINAL_SECTION]

1.4 The original AD Model Builder examples

This section includes a short description of the original examples distributed with AD Model Builder. There are now many more examples, which are discussed in subsequent chapters.

A very simple example. This is a trivial least-squares linear model, which is included simply to introduce the basics of AD Model Builder.

A simple nonlinear regression model for estimating the parameters describing a von Bertalanffy growth curve from size-at-age data. AD Model Builder’s robust regression routine is introduced and used to illustrate how problems caused by “outliers” in the data can be avoided.

A chemical kinetics problem. A model defined by a system of ordinary differential equations. The purpose is to estimate the parameters that describe the chemical reaction.

A problem in financial modeling. A Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model is used to attempt to describe the time series of returns from some market instrument.

A problem in natural resource management. The Schaeffer-Pella-Tomlinson model for investigating the response of an exploited fish population is developed and extended to include a Bayesian time series treatment of time-varying carrying capacity. This example is interesting because the model is rather temperamental. Several techniques for producing reliable convergence of the estimation procedure to the correct answer are described. For one of the data sets, over 100 parameters are estimated.

A simple fisheries catch-at-age model. These models are used to try to estimate the exploitation rates, etc., in exploited fish populations.

More complex examples are presented in subsequent chapters.

1.5 Example 1: linear least squares

To illustrate this method, we begin with a simple statistical model, which is to estimate the parameters of a linear relationship of the form

Y_i = a x_i + b    for 1 ≤ i ≤ n,

where the x_i and Y_i are the observed data, and a and b are the model parameters that are to be estimated. The parameters are estimated by the method of least squares; that is, we find the values of a and b such that the sum of the squared differences between the observed values Y_i and the predicted values a x_i + b is minimized. That is, we want to solve the problem

min_{a,b} ∑_{i=1}^{n} (Y_i − a x_i − b)²

The template for this model is in the file SIMPLE.TPL. To make the model, one would type makeadm simple. The resulting executable for the model is in the file SIMPLE.EXE. The contents of SIMPLE.TPL are below. (Anything following “//” is a comment.)

DATA_SECTION 
 init_int nobs         // nobs is the number of observations 
 init_vector Y(1,nobs)  // the observed Y values 
 init_vector x(1,nobs) 
PARAMETER_SECTION 
 init_number a 
 init_number b 
 vector pred_Y(1,nobs) 
 objective_function_value f 
PROCEDURE_SECTION 
 pred_Y=a*x+b;    // calculate the predicted Y values 
 f=regression(Y,pred_Y); // do the regression---the vector of 
                    // observations goes first

The main requirement is that all keywords must begin in column 1, while the code itself must be indented.

1.6 The data section

Roughly speaking, the data consist of the stuff in the real world that you observe and want to analyze. The data section describes the structure of the data in your model. Data objects consist of integers (int) and floating point numbers (number). These can be grouped into 1-dimensional (ivector and vector) and 2-dimensional (imatrix and matrix) arrays. The ‘i’ in ivector distinguishes a vector of type int from a vector of type number. Arrays of type number can currently have up to 7 dimensions.

Some of your data must be read in from somewhere—that is, you need to start with something. These data objects are referred to as “initial objects” and are distinguished by the prefix init, as in init_int or init_number. All objects prefaced with init in the DATA_SECTION are read in from a data file in the order in which they are declared. The default file names for various files are derived from the name of the executable program. For example, if the executable file is named ROOT.EXE, then the default input data file name is ROOT.DAT. For this example, the executable file is named SIMPLE.EXE, so the default data file is SIMPLE.DAT. Notice that once an object has been read in, its value is available to be used to describe other data objects. In this case, the value of nobs can be used to define the size of the vectors Y and x. The next line

 init_vector Y(1,nobs)

defines an initial vector object Y whose minimum valid index is 1 and whose maximum valid index is nobs. This vector object will be read in next from the data file. The contents of the file SIMPLE.DAT are shown below.

# number of observations 
   10 
# observed Y values 
   1.4 4.7 5.1 8.3 9.0 14.5 14.0 13.4 19.2 18 
# observed x values 
   -1 0 1 2 3 4 5 6 7 8

It is possible to put comment lines in the data files. Comment lines must have the character # in the first column.

It is often useful to have data objects that are not initial. Such objects have their values calculated from the values of initial data objects. Examples of the use of non-initial data objects are given below.

1.7 The parameter section

It is the parameters of your model that provide the analysis of the data. (Perhaps more correctly, it is the values of these parameters, as picked by the fitting criterion for the model, that provide the analysis of the data.) The PARAMETER_SECTION is used to describe the structure of the parameters in your model. The description of the model parameters is similar to that used for the data in the DATA_SECTION.

All parameters are floating point numbers (or arrays of floating point numbers). The statement init_number b defines a floating point (actually, a double) number. The preface init means that this is an initial parameter. Initial parameters have two properties that distinguish them from other model parameters. First, all of the other model parameters are calculated from the initial parameters. This means that in order to calculate the values of the model parameters, it is first necessary to have values for the initial parameters. A major difference between initial data objects (which must be read in from a data file) and initial parameters is that since parameters are estimated in the model, it is possible to assign initial default values to them.

The default file name for the file that contains initial values for the initial model parameters is ROOT.PIN. If no file named ROOT.PIN is found, default values are supplied for the initial parameters. (Methods for changing the default values for initial parameters are described below.) The statement

 vector pred_Y(1,nobs)

defines a vector of parameters. Since it is not prefaced with init, the values for this vector will not be read in from a file or given default values. It is expected that the value of the elements of this vector will be calculated in terms of other parameters.

The statement objective_function_value f defines a floating point (again, actually a double) number. It will hold the value of the fitting criterion. The parameters of the model are chosen so that this value is minimized. (Thus, it should be set equal to minus the log-likelihood function, if that criterion is used.) Every AD Model Builder template must include a declaration of an object of type objective_function_value, and this object must be set equal to a fitting criterion. (Don’t worry. For many models, the fitting criterion is provided for you, as in the regression and robust_regression fitting criterion functions in the current and next examples.)

1.8 The procedure section

The PROCEDURE_SECTION contains the actual model calculations. This section contains C++ code, so C++ syntax must be obeyed. (Those familiar with C or C++ will notice that the usual methods for defining and ending a function are not necessary and, in fact, cannot be used for the routine in the main part of this section.) Statements must end with a “;”—exactly as with C or C++. The ‘;’ is optional in the DATA_SECTION and the PARAMETER_SECTION. The code uses AUTODIF’s vector operations, which enable you to avoid writing a lot of code for loops. In the statement pred_Y=a*x+b; the expression a*x forms the product of the number a and the components of the vector x, while +b adds the value of the number b to this product, so that pred_Y has the components axi + b. In the line

 f=regression(Y,pred_Y);

the function regression calculates the log-likelihood function for the regression and assigns this value to the object f, which is of type objective_function_value. This code generalizes immediately to nonlinear regression models and can be trivially modified (with the addition of one word) to perform the robust nonlinear regression. This is discussed in the second example. For the reader who wants to know, the form of the regression function is described in Appendix A.

Note that the vector of observed values goes first. The use of the regression function makes the purpose of the calculations clearer, and it prepares the way for modifying the routine to use AD Model Builder’s robust regression function.
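The exact form of the regression function is given in Appendix A. As a rough sketch of the underlying idea (an assumption for illustration, not ADMB's actual routine): for normal errors with the variance profiled out, a concentrated negative log-likelihood proportional to (n/2) log(RSS) has the same minimizer as the raw sum of squares, which is why a log-likelihood function can stand in for least squares.

```python
import math

def concentrated_nll(obs, pred):
    """(n/2)*log(RSS): one common concentrated negative log-likelihood
    for normal errors with sigma profiled out. Shown only to illustrate
    the idea; the exact form used by ADMB's regression() is in Appendix A."""
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 0.5 * len(obs) * math.log(rss)

obs  = [1.0, 2.0, 3.0]
good = [1.1, 1.9, 3.0]   # predictions close to the observations
bad  = [0.0, 0.0, 0.0]   # predictions far from the observations
```

A smaller residual sum of squares always gives a smaller value of this criterion, so minimizing it is equivalent to least squares.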

1.9 The preliminary calculations section

Note that LOCAL_CALCS and its variants in the DATA_SECTION and the PROCEDURE_SECTION have greatly reduced the need for the PRELIMINARY_CALCS_SECTION.

The PRELIMINARY_CALCS_SECTION, as its name implies, permits one to do preliminary calculations with the data before getting into the model proper. Often the input data are not in a convenient form for doing the analysis and one wants to carry out some calculations with the input data to put them in a more convenient form. Suppose that the input data for the simple regression model are in the form

# number of observations 
   10 
# observed Y values observed x values 
   1.4               -1 
   4.7               0 
   5.1               1 
   8.3               2 
   9.0               3 
   14.5               4 
   14.0               5 
   13.4               6 
   19.2               7 
   18                8

The problem is that the data are in pairs of the form (Y_i, x_i), so that we can’t read in all of the x_i or all of the Y_i first. To read in the data in this format, we will define a matrix with nobs rows and two columns. The DATA_SECTION becomes

DATA_SECTION 
 init_int nobs 
 init_matrix Obs(1,nobs,1,2) 
 vector Y(1,nobs) 
 vector x(1,nobs)

Notice that since we do not want to read in Y or x, these objects are no longer initial objects, so their declarations are no longer prefaced with init. The observations will be read into the initial matrix object Obs so that Y is in the first column of Obs, while x is in the second column. If we don’t want to change the rest of the code, the next problem is to get the first column of Obs into Y, and the second column of Obs into x. The following code in the PRELIMINARY_CALCS_SECTION will accomplish this objective. It uses the function column, which extracts a column from a matrix object so that it can be put into a vector object.

PRELIMINARY_CALCS_SECTION 
Y=column(Obs,1); // extract the first column 
x=column(Obs,2); // extract the second column

1.10 The use of loops and element-wise operations

This section can be skipped on first reading.

To accomplish the column-wise extraction presented above, you would have to know that AUTODIF provides the column operation. What if you didn’t know that and don’t feel like reading the manual yet? For those who are familiar with C, it is generally possible to use lower level “C-like” operations to accomplish the same objective as AUTODIF’s array and matrix operations. In this case, the columns of the matrix Obs can also be copied to the vectors x and Y by using a standard for loop and the following element-wise operations

PRELIMINARY_CALCS_SECTION 
 for (int i=1;i<=nobs;i++) 
 { 
   Y[i]=Obs[i][1]; 
   x[i]=Obs[i][2]; 
 }

Incidentally, the C-like operation [] was used for indexing members of arrays. AD Model Builder also supports the use of (), so that the above code could be written as

PRELIMINARY_CALCS_SECTION 
 for (int i=1;i<=nobs;i++) 
 { 
   Y(i)=Obs(i,1); 
   x(i)=Obs(i,2); 
 }

which may be more readable for some users. Notice that it is also possible to define C objects, such as the object of type int i used as the index for the for loop, “on the fly” in the PRELIMINARY_CALCS_SECTION or the PROCEDURE_SECTION.

1.11 The default output from AD Model Builder

By default, AD Model Builder produces three or more files: ROOT.PAR, which contains the parameter estimates in ASCII format, ROOT.BAR, which is the parameter estimates in a binary file format, and ROOT.COR, which contains the estimated standard deviations and correlations of the parameter estimates.

The template code for the simple model is in the file SIMPLE.TPL. The input data is in the file SIMPLE.DAT. The parameter estimates are in the file SIMPLE.PAR. By default, the standard deviations and the correlation matrix for the model parameters are estimated. They are in the file SIMPLE.COR:

index    value    std.dev     1     2 
   1  a 1.9091e+00 1.5547e-01   1 
   2  b 4.0782e+00 7.0394e-01 -0.773    1

The format of the standard deviations report is to give the name of the parameter followed by its value and standard deviation. After that, the correlation matrix for the parameters is given.
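These estimates can be checked by hand: for a one-predictor linear model, a and b have standard closed-form least-squares solutions. The Python sketch below applies them to the SIMPLE.DAT values and reproduces the reported estimates (the formulas are textbook results; only the data are taken from this example):

```python
# Closed-form least-squares fit of Y = a*x + b to the SIMPLE.DAT values.
x = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
Y = [1.4, 4.7, 5.1, 8.3, 9.0, 14.5, 14.0, 13.4, 19.2, 18.0]

n = len(x)
sx, sy = sum(x), sum(Y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, Y))

a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b = (sy - a * sx) / n                           # intercept
```

Rounded to four decimals, a and b agree with the values 1.9091 and 4.0782 in SIMPLE.COR.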

1.12 Robust nonlinear regression with AD Model Builder

The code for the model template for this example is found in the file VONB.TPL. This example is intended to demonstrate the advantages of using AD Model Builder’s robust regression routine over standard nonlinear least-squares regression procedures. Further discussion of the underlying theory can be found in the AUTODIF user’s manual, but it is not necessary to understand the theory to make use of the procedure.


[Figure 1.1 plot: observed size-at-age data (Age 0 to 16, Size 0 to 50) with the fitted von Bertalanffy growth curve.]

index      value    std.dev     1     2     3 
   1  Linf 5.4861e+01 2.4704e+00 1.0000 
   2  K    1.7985e-01 2.7127e-02 -0.9191 1.0000 
   3  t0   1.8031e-01 2.9549e-01 -0.5856 0.7821 1.0000

Figure 1.1: Results for nonlinear regression with good data set.


This example estimates the parameters describing a growth curve from a set of data consisting of ages and size-at-age data; the results are shown in Figure 1.1. The form of the (von Bertalanffy) growth curve is assumed to be
s(a) = L∞ (1 − exp ( − K (a − t0)))
(1.3)

The three parameters of the curve to be estimated are L∞, K, and t0.

Let O_i and a_i be the observed size and age of the i-th animal. The predicted size s(a_i) is given by equation (1.3). The least-squares estimates for the parameters are found by minimizing

min_{L∞, K, t0} ∑_i (O_i − s(a_i))²

DATA_SECTION 
 init_int nobs; 
 init_matrix data(1,nobs,1,2) 
 vector age(1,nobs); 
 vector size(1,nobs); 
PARAMETER_SECTION 
 init_number Linf; 
 init_number K; 
 init_number t0; 
 vector pred_size(1,nobs) 
 objective_function_value f; 
PRELIMINARY_CALCS_SECTION 
 // get the data out of the columns of the data matrix 
 age=column(data,1); 
 size=column(data,2); 
 Linf=1.1*max(size); // set Linf to 1.1 times the longest observed length 
PROCEDURE_SECTION 
 pred_size=Linf*(1.-exp(-K*(age-t0))); 
 f=regression(size,pred_size);

Notice the use of the regression function, which calculates the log-likelihood function of the nonlinear least-squares regression. This part of the code is formally identical to the code for the linear regression problem in the simple example, even though we are now doing nonlinear regression. A graph of the least-squares estimated growth curve and the observed data is given in Figure 1.1. The parameter estimates and their estimated standard deviations produced by AD Model Builder are also given. For example, the estimate for L∞ is 54.86, with a standard deviation of 2.47. Since a 95% confidence limit is about two standard deviations, the usual 95% confidence limit of L∞ for this analysis would be 54.86 ± 4.94.
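As a quick numerical check, the fitted curve can be evaluated directly using the point estimates from Figure 1.1 (the helper function below is purely illustrative):

```python
import math

def vonb(age, Linf, K, t0):
    """Von Bertalanffy growth curve, equation (1.3)."""
    return Linf * (1.0 - math.exp(-K * (age - t0)))

# Point estimates from Figure 1.1.
Linf, K, t0 = 54.861, 0.17985, 0.18031

# Predicted sizes at ages 1 through 16.
sizes = [vonb(a, Linf, K, t0) for a in range(1, 17)]
```

The curve is zero at age t0, increases monotonically, and approaches the asymptotic length L∞ at large ages, which matches the shape of the plotted fit.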

A disadvantage of least-squares regression is the sensitivity of the estimates to a few “bad” data points or outliers. Figure 1.2 shows the least-squares estimates when the observed sizes for age 2 and age 14 have been moved off the curve. There has been a rather large change in some of the parameter estimates. For example, the estimate for L∞ has changed from 54.86 to 48.91, and the estimated standard deviation for this parameter has increased to 5.99. This is a common effect of outliers on least-squares estimates: they greatly increase the size of the estimates of the standard deviations. As a result, the confidence limits for the parameters are increased. In this case, the 95% confidence limits for L∞ have widened from 54.86 ± 4.94 to 48.91 ± 11.98.
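The sensitivity of least squares to even a single outlier is easy to demonstrate with the closed-form slope formula; the data below are synthetic and purely illustrative:

```python
def ls_slope(x, Y):
    """Closed-form least-squares slope of Y on x."""
    n = len(x)
    sx, sy = sum(x), sum(Y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, Y))
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

x = list(range(10))
clean = [2.0 * xi + 1.0 for xi in x]   # exactly on the line Y = 2x + 1
dirty = clean[:]
dirty[9] = clean[9] - 15.0             # move one point far off the line

slope_clean = ls_slope(x, clean)       # recovers the true slope, 2.0
slope_dirty = ls_slope(x, dirty)       # dragged well below 2.0
```

Moving one of ten points shifts the estimated slope from 2.0 to about 1.18, illustrating the kind of distortion described above.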

Of course, for this simple example, it could be argued that a visual examination of the residuals would identify the outliers so that they could be removed. This is true, but in larger nonlinear models, it is often not possible, or convenient, to identify and remove all the outliers in this fashion. Also, the process of removing “inconvenient” observations from data can be uncomfortably close to “cooking” the data in order to obtain the desired result from the analysis. An alternative approach, which avoids these difficulties, is to employ AD Model Builder’s robust regression procedure, which removes the undue influence of outlying points without the need to expressly remove them from the data.


[Figure 1.2 plot: size-at-age data with the outliers at ages 2 and 14 (Age 0 to 16, Size 0 to 50) and the least-squares fit.]

Nonlinear regression with bad data set 
index      value    std.dev     1     2     3 
   1  Linf 4.8905e+01 5.9938e+00 1.0000 
   2  K    2.1246e-01 1.2076e-01 -0.8923 1.0000 
   3  t0  -5.9153e-01 1.4006e+00 -0.6548 0.8707 1.0000

Figure 1.2: Nonlinear regression with bad data set.


1.13 Modifying the model to use robust nonlinear regression

To invoke the robust regression procedure, it is necessary to make three changes to the existing code. The template for the robust regression version of the model can be found in the file VONBR.TPL.

DATA_SECTION 
 init_int nobs; 
 init_matrix data(1,nobs,1,2) 
 vector age(1,nobs) 
 vector size(1,nobs) 
PARAMETER_SECTION 
 init_number Linf 
 init_number K 
 init_number t0 
 vector pred_size(1,nobs) 
 objective_function_value f 
 init_bounded_number a(0.0,0.7,2) 
PRELIMINARY_CALCS_SECTION 
 // get the data out of the columns of the data matrix 
 age=column(data,1); 
 size=column(data,2); 
 Linf=1.1*max(size); // set Linf to 1.1 times the longest observed length 
 a=0.7; 
PROCEDURE_SECTION 
 pred_size=Linf*(1.-exp(-K*(age-t0))); 
 f=robust_regression(size,pred_size,a);

The main modification to the model involves the addition of the parameter a, which is used to estimate the amount of contamination by outliers. This parameter is declared in the PARAMETER_SECTION:

 init_bounded_number a(0.0,0.7,2)

This introduces two concepts: 1) putting bounds on the values that initial parameters can take on and 2) carrying out the minimization in a number of stages. The value of a should be restricted to lie between 0.0 and 0.7. (See the discussion on robust regression in the AUTODIF user’s manual if you want to know where the 0.0 and 0.7 come from.) This is accomplished by declaring a to be of type init_bounded_number. In general, it is not possible to estimate the parameter a determining the amount of contamination by outliers until the other parameters of the model have been “almost” estimated—that is, until we have done a preliminary fit of the model. This is a common situation in nonlinear modeling and is discussed further in some later examples. So, we want to carry out the minimization in two phases.

During the first phase, a should be held constant. In general, for any initial parameter, the last number in its declaration, if present, determines the number of the phase in which that parameter becomes active. If no number is given, the parameter becomes active in phase 1. Note: For an init_bounded_number, the upper and lower bounds must be given, so the declaration

 init_bounded_number a(0.0,0.7)

would use the default phase 1. The 2 in the declaration for a causes a to be constant until the second phase of the minimization.
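Internally, a bounded parameter is typically handled by transforming an unconstrained value into the interval, so the minimizer itself never sees the bounds. The logistic transform below is one standard choice, shown only as a sketch; ADMB's actual internal parameterization of init_bounded_number may differ.

```python
import math

def boundp(u, lo, hi):
    """Map an unconstrained value u smoothly into (lo, hi).
    A logistic transform is used here for illustration; ADMB's
    internal transform for init_bounded_number may differ."""
    return lo + (hi - lo) / (1.0 + math.exp(-u))

# The bounds used for a in the robust regression example.
vals = [boundp(u, 0.0, 0.7) for u in (-30.0, 0.0, 30.0)]
```

However extreme the unconstrained value, the result stays strictly inside (0.0, 0.7), and u = 0 maps to the midpoint of the bounds, consistent with the default initial value for a bounded number.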

The second change to the model involves the default initial value of a. The default value for a bounded number is the average of the upper and lower bounds. For a, this would be 0.35, which is too small. We want to use the upper bound of 0.7. This is done by adding the line

 a=0.7;

in the PRELIMINARY_CALCS_SECTION. Finally, we modify the statement in the PROCEDURE_SECTION that includes the regression function to be

 f=robust_regression(size,pred_size,a);

to invoke the robust regression function.

That’s all there is to it! These three changes will convert any AD Model Builder template from a nonlinear regression model to a robust nonlinear regression model.
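The theory behind robust_regression is described in the AUTODIF user's manual. As a rough sketch of the general idea only, a robust criterion can be built from a contaminated-normal mixture in which each residual comes from a wide "outlier" component with probability a; the density below, including sigma and the width factor, is an illustrative assumption, not the exact form ADMB uses.

```python
import math

def robust_nll(obs, pred, a, sigma, fat=3.0):
    """Negative log-likelihood of a contaminated-normal mixture:
    each residual is N(0, sigma^2) with probability (1 - a) and
    N(0, (fat*sigma)^2) with probability a. A sketch of the idea
    behind robust_regression(); the exact density is in the
    AUTODIF manual."""
    nll = 0.0
    for o, p in zip(obs, pred):
        r = o - p
        dens = ((1.0 - a) * math.exp(-0.5 * (r / sigma) ** 2) / sigma
                + a * math.exp(-0.5 * (r / (fat * sigma)) ** 2) / (fat * sigma))
        nll -= math.log(dens / math.sqrt(2.0 * math.pi))
    return nll

obs  = [0.0, 0.0, 10.0]   # one gross outlier
pred = [0.0, 0.0, 0.0]
```

With a gross outlier present, allowing a moderate contamination fraction gives a much better (smaller) criterion value than forcing a to be nearly zero, which is why estimating a downweights outlying points.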


[Figure: plot of Size (0–50) against Age (0–16), showing the data and the fitted growth curve.]

index      value    std.dev     1     2     3     4 
   1  Linf 5.6184e+01 3.6796e+00 1.0000 
   2  K    1.6818e-01 3.4527e-02 -0.9173 1.0000 
   3  t0   6.5129e-04 4.5620e-01 -0.5483 0.7724 1.0000 
   4  a    3.6144e-01 1.0721e-01 -0.1946 0.0367 -0.2095 1.0000

Figure 1.3: Robust Nonlinear regression with bad data set.


The results for the robust regression fit to the bad data set are shown in Figure 1.3. The estimate for Linf is 56.18, with a standard deviation of 3.68, giving a 95% confidence interval of about 56.18 ± 7.36. Both the parameter estimates and the confidence limits are much less affected by the outliers for the robust regression estimates than they are for the least-squares estimates. The parameter a is estimated to be equal to 0.36, which indicates that the robust procedure has detected some moderately large outliers.

The results for the robust regression fit to the good data set are shown in Figure 1.4. The estimates are almost identical to the least-squares estimates for the same data. This is a property of the robust estimates: they do almost as well as the least-squares estimates when the assumption of normally distributed errors in the statistical model is satisfied exactly, and they do much better than least-squares estimates in the presence of moderate or large outliers. You can lose only a little, and you stand to gain a lot, by using these estimators.


[Figure: plot of Size (0–50) against Age (0–16), showing the data and the fitted growth curve.]

index      value    std.dev     1     2     3     4 
   1  Linf 5.5707e+01 1.9178e+00 1.0000 
   2  K    1.7896e-01 1.9697e-02 -0.9148 1.0000 
   3  t0   2.1490e-01 2.0931e-01 -0.5604 0.7680 1.0000 
   4  a    7.0000e-01 3.2246e-05 -0.0001 0.0000 -0.0000 1.0000

Figure 1.4: Robust Nonlinear regression with good data set.


1.14 Chemical engineering: a chemical kinetics problem

This example may strike you as being fairly complicated. If so, you should compare it with the original solution using the so-called sensitivity equations. The reference is [?], Chapter 8. We consider the chemical kinetics problem introduced on page 233. This is a model defined by a first order system of two ordinary differential equations.
ds1/dt = −θ1 exp(−θ2/T)(s1 − e^(−1000/T) s2²)/(1 + θ3 exp(−θ4/T) s1)²
ds2/dt = 2θ1 exp(−θ2/T)(s1 − e^(−1000/T) s2²)/(1 + θ3 exp(−θ4/T) s1)²
(1.4)
The differential equations describe the evolution over time of the concentrations of the two reactants, s1 and s2. There are 10 initial parameters in the model: θ1, …, θ10. T is the temperature at which the reaction takes place. To integrate the system of differential equations, we require the initial concentrations of the reactants, s1(0) and s2(0), at time zero.

The reaction was carried out three times at temperatures of 200, 400, and 600 degrees. For the first run, there were initially equal concentrations of the two reactants. The second run initially consisted of only the first reactant, and the third run initially consisted of only the second reactant. The initial concentrations of the reactants are known only approximately. See Table 1.1 for what they are.


 Run 1:  s1(0) = θ5 = 1 ± 0.05    s2(0) = θ6 = 1 ± 0.05
 Run 2:  s1(0) = θ7 = 1 ± 0.05    s2(0) = 0
 Run 3:  s1(0) = 0                s2(0) = θ8 = 1 ± 0.05

Table 1.1: Initial concentrations of the reactants for the three runs.

The unknown initial concentrations are treated as parameters to be estimated, with Bayesian prior distributions on them reflecting our level of certainty about their true values. The concentrations of the reactants were not measured directly. Rather, the mixture was analyzed by a “densitometer,” whose response to the concentrations of the reactants is

y = 1 + θ9s1 + θ10s2
where θ9 = 1 ± 0.05 and θ10 = 2 ± 0.05. The differences between the predicted and observed responses of the densitometer are assumed to be normally distributed, so least squares is used to fit the model. Bard employs an “explicit” method for integrating these differential equations, that is, the equations are approximated by a finite difference scheme, such as
s1(tn+1) = s1(tn) − hθ1 exp(−θ2/T)(s1(tn) − e^(−1000/T) s2(tn)²)/(1 + θ3 exp(−θ4/T) s1(tn))²
s2(tn+1) = s2(tn) + 2hθ1 exp(−θ2/T)(s1(tn) − e^(−1000/T) s2(tn)²)/(1 + θ3 exp(−θ4/T) s1(tn))²
(1.5)
over the time period tn to tn+1 of length h. Equations (1.5) are called “explicit,” because the values of s1 and s2 at time tn+1 are given explicitly in terms of the values of s1 and s2 at time tn.

The advantage of using an explicit scheme for integrating the model differential equations is that the derivatives of the model functions with respect to the model parameters also satisfy differential equations. They are called “sensitivity equations” (see [?], pages 227–229). It is possible to integrate these equations, as well as the model equations, to get values for the derivatives. However, this involves generating a lot of extra code, as well as carrying out a lot of extra calculations. Since with AD Model Builder it is not necessary to produce any code for derivative calculations, it is possible to employ alternate schemes for integrating the differential equations.

Let A = θ1 exp(−θ2/T), B = exp(−1000/T), and C = (1 + θ3 exp(−θ4/T) s1)². In terms of A, B, and C, we can replace the explicit finite difference scheme by the semi-implicit scheme
s1(tn+1) = s1(tn) − hA(s1(tn+1) − B s2(tn)²)/C
s2(tn+1) = s2(tn) + 2hA(s1(tn) − B s2(tn) s2(tn+1))/C
(1.6)
Now let D = hA/C and solve equations (1.6) for s1(tn+1) and s2(tn+1) to obtain
s1(tn+1) = (s1(tn) + DB s2(tn)²)/(1 + D)
s2(tn+1) = (s2(tn) + 2D s1(tn))/(1 + 2DB s2(tn))
(1.7)
Implicit and semi-implicit schemes tend to be more stable than explicit schemes over large time steps and large values of some of the model parameters. This stability is especially important when fitting nonlinear models, because the algorithms for function minimization will pick very large, or “bad,” values of the parameters from time to time, and the minimization procedure will generally perform better when a more stable scheme is employed.
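The updates (1.7) are cheap to compute once the common subexpressions A, B, C, and D are gathered. The following stand-alone C++ sketch performs one step; the function name is illustrative, and the test values below are rough parameter values taken from the reported fit, not part of the template.

```cpp
#include <cassert>
#include <cmath>

// One semi-implicit step of scheme (1.7) for the kinetics model.
// theta[0..3] hold the rate parameters theta_1..theta_4, T is the
// temperature, h the step length; s1 and s2 are updated in place.
void semi_implicit_step(const double theta[4], double T, double h,
                        double& s1, double& s2) {
  double A = theta[0] * std::exp(-theta[1] / T); // common subexpressions,
  double B = std::exp(-1000.0 / T);              // as in the text
  double C = 1.0 + theta[2] * std::exp(-theta[3] / T) * s1;
  C *= C;
  double D = h * A / C;
  // equations (1.7): the denominators are what make the step semi-implicit
  double s1_new = (s1 + D * B * s2 * s2) / (1.0 + D);
  double s2_new = (s2 + 2.0 * D * s1) / (1.0 + 2.0 * D * B * s2);
  s1 = s1_new;
  s2 = s2_new;
}
```

Because every parameter-dependent term sits in a denominator of the form 1 + (positive quantity), the step stays finite even when the minimizer tries very large parameter values, which is the stability property discussed above.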

DATA_SECTION 
 init_matrix Data(1,10,1,3) 
 init_vector T(1,3) // the initial temperatures for the three runs 
 init_vector stepsize(1,3) // the stepsize to use for numerical integration 
 matrix data(1,3,1,10) 
 matrix sample_times(1,3,1,10) // times at which reaction was sampled 
 vector x0(1,3)       // the beginning time for each of the three 
                    // runs 
 vector x1(1,3)       // the ending time for each of the three runs 
 // for each of the three runs 
 
PARAMETER_SECTION 
 init_vector theta(1,10) // the model parameters 
 matrix init_conc(1,3,1,2) // the initial concentrations of the two 
                     // reactants over three time periods 
 vector instrument(1,2) // determines the response of the densitometer 
 matrix y_samples(1,10,1,2)// the predicted concentrations of the two 
                     // reactants at the ten sampling periods 
                     // obtained by integrating the differential 
                     // equations 
 vector diff(1,10)     // the difference between the observed and 
                     // readings of the densitometer 
 objective_function_value f // the log_likelihood function 
 number bayes_part     // the Bayesian contribution 
 number y2 
 number x_n 
 vector y_n(1,2) 
 vector y_n1(1,2) 
 number A // A B C D hold some common subexpressions 
 number B 
 number C 
 number D 
PRELIMINARY_CALCS_SECTION 
 data=trans(Data); // it is more convenient to work with the transposed 
               // matrix 
PROCEDURE_SECTION 
 
   // set up the beginning and ending times for the three runs 
   x0(1)=0; 
   x1(1)=90; 
   x0(2)=0; 
   x1(2)=18; 
   x0(3)=0; 
   x1(3)=4.5; 
   // set up the sample times for each of the three runs 
   sample_times(1).fill_seqadd(0,10); // fill with 0,10,20,...,90 
   sample_times(2).fill_seqadd(0,2); // fill with 0,2,4,...,18 
   sample_times(3).fill_seqadd(0,0.5); // fill with 0,0.5,1.0,...,4.5 
 
   // set up the initial concentrations of the two reactants for 
   // each of the three runs 
   init_conc(1,1)=theta(5); 
   init_conc(1,2)=theta(6); 
   init_conc(2,1)=theta(7); 
   init_conc(2,2)=0.0;  // the initial concentration is known to be 0 
   init_conc(3,1)=0.0;  // the initial concentration is known to be 0 
   init_conc(3,2)=theta(8); 
 
   // coefficients which determine the response of the densitometer 
   instrument(1)=theta(9); 
   instrument(2)=theta(10); 
   f=0.0; 
   for (int run=1;run<=3;run++) 
   { 
     // integrate the differential equations to get the predicted 
     // values for the y_samples 
    int nstep=(x1(run)-x0(run))/stepsize(run); 
    nstep++; 
    double h=(x1(run)-x0(run))/nstep; // h is the stepsize for integration 
 
    int is=1; 
    // get the initial conditions for this run 
    x_n=x0(run); 
    y_n=init_conc(run); 
    for (int i=1;i<=nstep+1;i++) 
    { 
      // gather common subexpressions 
      y2=y_n(2)*y_n(2); 
      A=theta(1)*exp(-theta(2)/T(run)); 
      B=exp(-1000/T(run)); 
      C=(1+theta(3)*exp(-theta(4)/T(run))*y_n(1)); 
      C=C*C; 
      D=h*A/C; 
      // get the y vector for the next time step 
      y_n1(1)=(y_n(1)+D*B*y2)/(1.+D); 
      y_n1(2)=(y_n(2)+2.*D*y_n(1))/(1.+(2*D*B*y_n(2))); 
 
      // if an observation occurred during this time period save 
      // the predicted value 
      if (is <=10) 
      { 
       if (x_n<=sample_times(run,is) && x_n+h >= sample_times(run,is)) 
       { 
         y_samples(is++)=y_n; 
       } 
      } 
      x_n+=h; // increment the time step 
      y_n=y_n1; // update the y vector for the next step 
 
    } 
    diff=(1.0+y_samples*instrument)-data(run); //differences between the 
            // predicted and observed values of the densitometer 
    f+=diff*diff; // sum of squared differences 
   } 
   // take the log of f and multiply by nobs/2 to get log-likelihood 
   f=15.*log(f); // This is (number of obs)/2. It is wrong in Bard (pg 236). 
 
   // Add the Bayesian stuff 
   bayes_part=0.0; 
   for (int i=5;i<=9;i++) 
   { 
    bayes_part+=(theta(i)-1)*(theta(i)-1); 
   } 
   bayes_part+=(theta(10)-2)*(theta(10)-2); 
   f+=1./(2.*.05*.05)*bayes_part;

AD Model Builder produces a report containing the values, standard deviations, and correlation matrix of the parameter estimates. As discussed below, any parameter or group of parameters can easily be included in this report. For models with a large number of parameters, this report can be a bit unwieldy, so options are provided to exclude parameters from the report, if desired.

   index          value   std.dev     1     2     3     4     5     6     7     8     9     10
      1   theta  1.37e+00 2.09e-01     1
      2   theta  1.12e+03 7.70e+01  0.95     1
      3   theta  1.80e+00 7.95e-01   0.9  0.98     1
      4   theta  3.58e+02 1.94e+02  0.91  0.98  0.99     1
      5   theta  1.00e+00 4.49e-02  0.20  0.28  0.12  0.17     1
      6   theta  9.94e-01 2.99e-02 -0.42 -0.35 -0.25 -0.22 -0.58     1
      7   theta  9.86e-01 2.59e-02  0.01  0.22  0.22  0.28  0.26  0.42     1
      8   theta  1.02e+00 1.69e-02 -0.38 -0.34 -0.36 -0.30  0.09  0.63  0.34     1
      9   theta  1.00e+00 2.59e-02 -0.02 -0.23 -0.23 -0.30 -0.28 -0.43 -0.98 -0.37     1
     10   theta  1.97e+00 3.23e-02  0.44  0.37  0.40  0.32 -0.09 -0.65 -0.37  -0.93  0.40    1

1.15 Financial Modelling: a generalized autoregressive conditional heteroskedasticity or GARCH model

Time series models are often used in financial modeling. For these models, the parameters are often extremely badly determined. With the stable numerical environment produced by AD Model Builder, it is a simple matter to fit such models.

Consider a time series of returns rt, where t = 0, …, T, available from some type of financial instrument. The model assumptions are

rt = μ + ϵt,    ht = a0 + a1ϵt−1² + a2ht−1    for 1 ≤ t ≤ T,   a0 ≥ 0, a1 ≥ 0, a2 ≥ 0

where the ϵt are independent normally distributed random variables with mean 0 and variance ht. We assume ϵ0 = 0 and h0 = Σi=0..T (ri − r̄)²/(T + 1), where r̄ is the mean of the ri. There are four initial parameters to be estimated for this model: μ, a0, a1, and a2. The log-likelihood function for the vector rt is equal to a constant plus

−0.5 Σt=1..T [ log(ht) + (rt − μ)²/ht ]
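As a stand-alone illustration, the ht recursion and minus the log-likelihood can be computed in plain C++ as follows. This is not ADMB template code; the function name and the sample series used in testing are made up.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minus the log-likelihood (up to a constant) of the GARCH(1,1) model:
// h_t = a0 + a1*eps_{t-1}^2 + a2*h_{t-1}, eps_t = r_t - mu, eps_0 = 0,
// h_0 = sample variance of r with divisor T+1, matching the text.
double garch_negloglike(const std::vector<double>& r,
                        double mu, double a0, double a1, double a2) {
  int T = (int)r.size() - 1;   // r holds r_0 .. r_T
  double rbar = 0.0;
  for (double ri : r) rbar += ri;
  rbar /= (T + 1);
  double h = 0.0;              // h_0 = sum (r_i - rbar)^2 / (T+1)
  for (double ri : r) h += (ri - rbar) * (ri - rbar);
  h /= (T + 1);
  double eps2_prev = 0.0;      // eps_0 = 0
  double nll = 0.0;
  for (int t = 1; t <= T; ++t) {
    h = a0 + a1 * eps2_prev + a2 * h;        // variance recursion
    double eps2 = (r[t] - mu) * (r[t] - mu);
    nll += 0.5 * (std::log(h) + eps2 / h);   // minus log-likelihood term
    eps2_prev = eps2;
  }
  return nll;
}
```

The ADMB template below computes the same quantity with vector operations such as square and elem_div.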

DATA_SECTION 
 init_int T 
 init_vector r(0,T) 
 vector sub_r(1,T) 
 number h0 
INITIALIZATION_SECTION 
 a0 .1 
 a1 .1 
 a2 .1 
PARAMETER_SECTION 
 init_bounded_number a0(0.0,1.0) 
 init_bounded_number a1(0.0,1.0,2) 
 init_bounded_number a2(0.0,1.0,3) 
 init_number Mean 
 vector eps2(1,T) 
 vector h(1,T) 
 objective_function_value log_likelihood 
PRELIMINARY_CALCS_SECTION 
 h0=square(std_dev(r)); // square forms the element-wise square 
 sub_r=r(1,T);  // form a subvector so we can use vector operations 
 Mean=mean(r);  // calculate the mean of the vector r 
PROCEDURE_SECTION 
 eps2=square(sub_r-Mean); 
 h(1)=a0+a2*h0; 
 for (int t=2;t<=T;t++) 
 { 
   h(t)=a0+a1*eps2(t-1)+a2*h(t-1); 
 } 
 // calculate minus the log-likelihood function 
 log_likelihood=.5*sum(log(h)+elem_div(eps2,h)); // elem_div performs 
                        // element-wise division of vectors 
RUNTIME_SECTION 
 convergence_criteria .1, .1, .001 
 maximum_function_evaluations 20, 20, 1000

We have used vector operations such as elem_div and sum to simplify the code. Of course, the code could also have employed loops and element-wise operations. The parameter values and standard deviation report for this model appears below.

index      value    std.dev     1     2     3     4 
   1  a0   1.6034e-04 2.3652e-05 1.0000 
   2  a1   9.3980e-02 2.0287e-02 0.1415 1.0000 
   3  a2   3.7263e-01 8.2333e-02 -0.9640 -0.3309 1.0000 
   4  Mean -1.7807e-04 3.0308e-04 0.0216 -0.1626 0.0144 1.0000

This example employs bounded initial parameters. Often it is necessary to put bounds on parameters in nonlinear modeling, to ensure that the minimization is stable. In this example, a0 is constrained to lie between 0.0 and 1.0:

 init_bounded_number a0(0.0,1.0) 
 init_bounded_number a1(0.0,1.0,2) 
 init_bounded_number a2(0.0,1.0,3)

1.16 Carrying out the minimization in a number of phases

For linear models, one can simply estimate all the model parameters simultaneously. For nonlinear models, this simple approach often does not work very well. It may be necessary to keep some of the parameters fixed during the initial part of the minimization process and to carry out the minimization over a subset of the parameters. The remaining parameters are then included in the minimization in a number of phases, until all of the parameters have been included. AD Model Builder provides support for this multi-phase approach. In the declaration of any initial parameter, the last number, if present, determines the phase of the minimization during which this parameter is included (becomes active). If no number is present, the initial parameter becomes active in phase 1. In this case, a0 has no phase number and so becomes active in phase 1, a1 becomes active in phase 2, and a2 becomes active in phase 3. In this example, phase 3 is the last phase of the optimization.

It is often convenient to modify aspects of the code depending on what phase of the minimization procedure is the current phase, or on whether a particular initial parameter is active. The function

current_phase()

returns an integer (an object of type int) that is the value of the current phase. The function

 last_phase()

returns the value “true” (a nonzero value) if the current phase is the last phase, and “false” (0) otherwise. If xxx is the name of any initial parameter, the function

 active(xxx)

returns the value “true” if xxx is active during the current phase and false otherwise.

After the minimization of the objective function has been completed, AD Model Builder calculates the estimated covariance matrix for the initial parameters, as well as any other desired parameters that have been declared to be of sd_report type. Often, these additional parameters may involve considerable additional computational overhead. If the values of these parameters are not used in calculations proper, it is possible to only calculate them during the standard deviations report phase.

sd_phase()

The sd_phase function returns the value “true” if we are in the standard deviations report phase and “false” otherwise. It can be used in a conditional statement to determine whether to perform calculations associated with some sd_report object.

When estimating the parameters of a model by a multi-phase minimization procedure, the default behavior of AD Model Builder is to carry out the default number of function evaluations until convergence is achieved in each stage. If we are only interested in the parameter estimates obtained after the last stage of the minimization, it is often not necessary to carry out the full minimization in each stage. Sometimes, considerable time can be saved by relaxing the convergence criterion in the initial stages of the optimization.

The RUNTIME_SECTION allows the user to modify the default behavior of the function minimizer during the phases of the estimation process:

RUNTIME_SECTION 
 convergence_criteria .1, .1, .001 
 maximum_function_evaluations 20, 20, 1000

The convergence_criteria affects the criterion used by the function minimizer to decide when the optimization process has converged. The function minimizer compares the maximum value of the vector of derivatives of the objective function, with respect to the independent variables, to the numbers after the convergence_criteria keyword. The first number is used in the first phase of the optimization, the second number in the second phase, and so on. If there are more phases to the optimization than there are numbers, the last number is used for the rest of the phases of the optimization. The numbers must be separated by commas; the spaces are optional. The maximum_function_evaluations keyword controls the maximum number of evaluations of the objective function that will be performed by the function minimizer in each phase of the minimization procedure.
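The fallback rule (use the last number when there are more phases than numbers) amounts to clamping the phase index into the list of supplied values. A minimal illustrative sketch in C++, not ADMB source code:

```cpp
#include <cassert>
#include <vector>

// Runtime-settings lookup: use the value given for this (1-based) phase,
// falling back to the last value when fewer values than phases were given.
double setting_for_phase(const std::vector<double>& values, int phase) {
  int n = (int)values.size();
  int idx = (phase <= n) ? phase : n; // clamp to the last supplied value
  return values[idx - 1];
}
```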

1.17 Natural resource management: the Schaeffer-Pella-Tomlinson Model

It is typical of many models in natural resource management that the model tends to be rather unstable numerically. In addition, some of the model parameters are often poorly determined. Notwithstanding these difficulties, it is often necessary to make decisions about resource management based on the analysis provided by these models. This example provides a good opportunity for presenting some more advanced features of AD Model Builder that are designed to overcome these difficulties.

The Schaeffer-Pella-Tomlinson model is employed in fisheries management. The model assumes that the total biomass of an exploited fish stock satisfies an ordinary differential equation of the form
dB/dt = rB(1 − (B/k)^(m−1)) − FB,    where m > 1
(1.8)

([?], page 303), where B is the biomass, F is the instantaneous fishing mortality rate, r is a parameter often referred to as the “intrinsic rate of increase,” k is the unfished equilibrium stock size,
C  = F B
(1.9)

is the catch rate, and m is a parameter that determines where the maximum productivity of the stock occurs. If the value of m is fixed at 2, the model is referred to as the “Schaeffer model.” The explicit form of the difference equation corresponding to this differential equation is
Bt+δ = Bt + rBtδ − rBt(Bt/k)^(m−1)δ − FtBtδ
(1.10)

To get a semi-implicit form of this difference equation that has better numerical stability than the explicit version, we replace some of the terms Bt on the right hand side of 1.10 by Bt+δ, to get
Bt+δ = Bt + rBtδ − rBt+δ(Bt/k)^(m−1)δ − FtBt+δδ
(1.11)

We solve for Bt+δ to give
Bt+δ = Bt(1 + rδ) / (1 + (r(Bt/k)^(m−1) + Ft)δ)
(1.12)

The catch Ct+δ over the period (t, t + δ) is given by
Ct+δ = FtBt+δδ
(1.13)
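Equations (1.12) and (1.13) can be sketched as a stand-alone C++ step function; the function name and the test values are illustrative, not from the template.

```cpp
#include <cassert>
#include <cmath>

// One semi-implicit biomass step, equation (1.12), and the catch taken
// over the step, equation (1.13).
double pella_tomlinson_step(double B, double r, double k, double m,
                            double F, double delta, double& catch_out) {
  double B_next = B * (1.0 + r * delta) /
                  (1.0 + (r * std::pow(B / k, m - 1.0) + F) * delta);
  catch_out = F * B_next * delta; // equation (1.13)
  return B_next;
}
```

Note that, as with the kinetics example, the parameter-dependent terms appear in the denominator, so the update remains finite even when the minimizer tries extreme parameter values.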

1.18 Bayesian considerations in the Pella-Tomlinson Model

The parameter k is referred to as the “carrying capacity” or the “unfished equilibrium biomass level,” because it is the value that the biomass of the population will eventually assume if there is no fishing. For a given value of k, the parameter m determines the level of maximum productivity—that is, the level of biomass BMAX for which the removals from fishing can be the greatest:

BMAX = k / m^(1/(m−1))

For m = 2, maximum productivity is obtained by that level of fishing pressure that reduces the stock to 50% of the carrying capacity. For the data available in many real fisheries problems, the parameter m is very poorly determined. It is common practice, therefore, to simply assume that m = 2. Similarly, it is commonly assumed that the carrying capacity k does not change over time, even though changes such as habitat degradation may well lead to changes in k.
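The location of maximum productivity, BMAX = k/m^(1/(m−1)), can be checked numerically by maximizing the production term rB(1 − (B/k)^(m−1)) on a grid. This is an illustrative sketch, not part of the model code.

```cpp
#include <cassert>
#include <cmath>

// Surplus production of the Pella-Tomlinson model at biomass B.
double production(double B, double r, double k, double m) {
  return r * B * (1.0 - std::pow(B / k, m - 1.0));
}

// Brute-force grid search for the most productive biomass level.
double most_productive_biomass(double r, double k, double m) {
  double best_B = 1.0, best_p = production(1.0, r, k, m);
  for (double B = 2.0; B < k; B += 1.0) {
    double p = production(B, r, k, m);
    if (p > best_p) { best_p = p; best_B = B; }
  }
  return best_B;
}
```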

We want to construct a statistical model where the carrying capacity can vary slowly over time if there appears to be any information in the fisheries data supporting this hypothesis. What is meant by “slowly”? The answer to this question will depend on the particular situation. For our purposes, “slowly” means “slowly enough” that the model has some chance of supplying a useful analysis of the situation at hand. We refer to this as “the assumption of manageability.” The point is that since we are going to use this model anyway to try to manage a resource, we may as well assume that the model’s assumptions are satisfied—at least well enough that we have some hope of success. This may seem extremely arbitrary, and it is. However, it is not as arbitrary as assuming that the carrying capacity is constant.

We assume that ki+1 = ki exp(κi), where the κi are independent normally distributed random variables with mean 0, and that log(m − 1) is normally distributed with mean 0. The parameters log(ki) are assumed to have the structure of a random walk, which is the simplest type of time series. This Bayesian approach is a very simple method for including a time series structure in the parameters of a nonlinear model.

We don’t know the true catches Ci in each year. What we have are estimates Ciobs of the catch. We assume that the quantities log(Ciobs/Ci) are normally distributed with mean 0.

Finally, we must deal with the fishing mortality F. Estimates of F are not obtained directly. Instead, what is observed is an index of fishing mortality—in this case, fishing effort. We assume that for each year, we have an estimate Ei of fishing effort and that the fishing mortality rate Fi in year i satisfies the relationship Fi = qEi exp(ηi), where q is a parameter referred to as the “catchability” and the ηi are normally distributed random variables with mean 0.

We assume that the variance of the ηi is 10 times the variance in the observed catch errors, and that the variance of the κi is 0.1 times the variance in the observed catch errors. We assume that the variance of log(m − 1) is 0.25. Then, given the data, minus the log of the Bayesian posterior distribution for the model parameters is, up to a constant, equal to
((3n − 1)/2) log( Σi=1..n (log(Ciobs) − log(Ci))² + 0.1 Σi=1..n ηi² + 10 Σi=2..n κi² ) + 2 log(m − 1)²
(1.14)
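Expression (1.14) is simple to evaluate given the three sets of residuals. The following stand-alone C++ sketch (function names are illustrative) mirrors the weighting used in calculate_the_objective_function below:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum of squares of a vector of residuals.
static double norm2(const std::vector<double>& v) {
  double s = 0.0;
  for (double x : v) s += x * x;
  return s;
}

// Minus the log posterior, expression (1.14), up to a constant.
// catch_resids holds the n log-catch residuals, eta the n effort
// deviations, and kappa the n-1 carrying-capacity deviations.
double neg_log_posterior(const std::vector<double>& catch_resids,
                         const std::vector<double>& eta,
                         const std::vector<double>& kappa, double m) {
  int n = (int)catch_resids.size();
  double ssq = norm2(catch_resids) + 0.1 * norm2(eta) + 10.0 * norm2(kappa);
  double lg = std::log(m - 1.0);
  return 0.5 * (3 * n - 1) * std::log(ssq) + 2.0 * lg * lg;
}
```

The weights 0.1 and 10 are the inverses of the assumed variance ratios for the ηi and κi, and the count 3n − 1 is the total number of residuals being concentrated into the log term.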

The number of initial parameters in the model (that is, the number of independent variables in the function to be minimized) is 2n + 4. For the halibut data, there are 56 years of data, which gives 116 parameters. As estimates of the model parameters, we use the mode of the posterior distribution, which can be found by minimizing expression (1.14). The covariance matrix of the model parameters is estimated by computing the inverse of the Hessian of expression (1.14) at the minimum. The template for the model follows. To improve readability, the entire template has been included. The various sections are discussed below.

DATA_SECTION 
 init_int nobs; 
 init_matrix data(1,nobs,1,3) 
 vector obs_catch(1,nobs); 
 vector cpue(1,nobs); 
 vector effort(1,nobs); 
 number avg_effort 
INITIALIZATION_SECTION 
 m 2. 
 beta 1. 
 r 1. 
PARAMETER_SECTION 
 init_bounded_number q(0.,1.) 
 init_bounded_number beta(0.,5.) 
 init_bounded_number r(0.,5,2) 
 init_number log_binit(2) 
 init_bounded_dev_vector effort_devs(1,nobs,-5.,5.,3) 
 init_bounded_number m(1,10.,4) 
 init_bounded_vector k_devs(2,nobs,-5.,5.,4) 
 number binit 
 vector pred_catch(1,nobs) 
 vector biomass(1,nobs) 
 vector f(1,nobs) 
 vector k(1,nobs) 
 vector k_trend(1,nobs) 
 sdreport_number k_1 
 sdreport_number k_last 
 sdreport_number k_change 
 sdreport_number k_ratio 
 sdreport_number B_projected 
 number tmp_mort; 
 number bio_tmp; 
 number c_tmp; 
 objective_function_value ff; 
PRELIMINARY_CALCS_SECTION 
 // get the data out of the data matrix into 
 obs_catch=column(data,2); 
 cpue=column(data,3); 
 // divide the catch by the cpue to get the effort 
 effort=elem_div(obs_catch,cpue); 
 // normalize the effort and save the average 
 avg_effort=mean(effort); 
 effort/=avg_effort; 
 cout << " beta" << beta << endl; 
PROCEDURE_SECTION 
 // calculate the fishing mortality 
 calculate_fishing_mortality(); 
 // calculate the biomass and predicted catch 
 calculate_biomass_and_predicted_catch(); 
 // calculate the objective function 
 calculate_the_objective_function(); 
 
FUNCTION calculate_fishing_mortality 
 // calculate the fishing mortality 
 f=q*effort; 
 if (active(effort_devs)) f=elem_prod(f,exp(effort_devs)); 
 
FUNCTION calculate_biomass_and_predicted_catch 
 // calculate the biomass and predicted catch 
 if (!active(log_binit)) 
 { 
   log_binit=log(obs_catch(1)/(q*effort(1))); 
 } 
 binit=exp(log_binit); 
 biomass[1]=binit; 
 if (active(k_devs)) 
 { 
   k(1)=binit/beta; 
   for (int i=2;i<=nobs;i++) 
   { 
    k(i)=k(i-1)*exp(k_devs(i)); 
   } 
 } 
 else 
 { 
   // set the whole vector equal to the constant k value 
   k=binit/beta; 
 } 
 // only calculate these for the standard deviation report 
 if (sd_phase()) 
 { 
   k_1=k(1); 
   k_last=k(nobs); 
   k_ratio=k(nobs)/k(1); 
   k_change=k(nobs)-k(1); 
 } 
 // two time steps per year desired 
 int nsteps=2; 
 double delta=1./nsteps; 
 // Integrate the logistic dynamics over n time steps per year 
 for (int i=1; i<=nobs; i++) 
 { 
   bio_tmp=1.e-20+biomass[i]; 
   c_tmp=0.; 
   for (int j=1; j<=nsteps; j++) 
   { 
    //This is the new biomass after time step delta 
    bio_tmp=bio_tmp*(1.+r*delta)/ 
      (1.+ (r*pow(bio_tmp/k(i),m-1.)+f(i))*delta ); 
    // This is the catch over time step delta 
    c_tmp+=f(i)*delta*bio_tmp; 
   } 
   pred_catch[i]=c_tmp;    // This is the catch for this year 
   if (i<nobs) 
   { 
    biomass[i+1]=bio_tmp;// This is the biomass at the beginning of the next 
   }                // year 
   else 
   { 
    B_projected=bio_tmp; // get the projected biomass for std dev report 
   } 
 } 
 
FUNCTION calculate_the_objective_function 
 if (!active(effort_devs)) 
 { 
   ff=nobs/2.*log(norm2(log(obs_catch)-log(1.e-10+pred_catch))); 
 } 
 else if(!active(k_devs)) 
 { 
   ff= .5*(size_count(obs_catch)+size_count(effort_devs))* 
    log(norm2(log(obs_catch)-log(1.e-10+pred_catch)) 
    +0.1*norm2(effort_devs)); 
 } 
 else 
 { 
   ff= .5*( size_count(obs_catch)+size_count(effort_devs) 
    +size_count(k_devs) )* 
    log(norm2(log(obs_catch)-log(pred_catch)) 
    + 0.1*norm2(effort_devs)+10.*norm2(k_devs)); 
 } 
 // Bayesian contribution for Pella Tomlinson m 
 ff+=2.*square(log(m-1.)); 
 if (current_phase()<3) 
 { 
   ff+=1000.*square(log(mean(f)/.4)); 
 }

The data are contained in three columns, with the catch and catch per unit effort data contained in the second and third columns. The matrix data is defined in order to read the data. The second and third columns of data, which is what we are interested in, will then be put into the vectors obs_catch and cpue. (Later, we get the fishing effort by dividing the obs_catch by the cpue.)

DATA_SECTION 
 init_int nobs 
 init_matrix data(1,nobs,1,3) 
 vector obs_catch(1,nobs) 
 vector cpue(1,nobs) 
 vector effort(1,nobs) 
 number avg_effort

The INITIALIZATION_SECTION is used to define default values for some model parameters if the standard default provided by AD Model Builder is not acceptable. If the model finds the parameter file (whose default name is admodel.par), it will read in the initial values for the parameters from there. Otherwise, the default values will be used—unless the parameters appear in the INITIALIZATION_SECTION, in which case, those values will be used.

INITIALIZATION_SECTION 
 m 2. 
 beta 1. 
 r 1.

The PARAMETER_SECTION for this model introduces several new features of AD Model Builder. The statement

 init_bounded_number r(0.,5.,2)

declares an initial parameter whose value will be constrained to lie between 0.0 and 5.0. It is often necessary to put bounds on the initial parameters in nonlinear models to get stable model performance. This is accomplished in AD Model Builder simply by declaring the initial parameter to be bounded and providing the desired bounds. The default initial value for a bounded object is the average of the lower and upper bounds.

The third number ‘2’ in the declaration determines that this initial parameter will not be made active until the second phase of the minimization. This introduces the concept of “phases” in the minimization process.

As soon as nonlinear statistical models become a bit complicated, one often finds that simply attempting to estimate all the parameters simultaneously does not work very well. In short, “You can’t get there from here.” A better strategy is to keep some of the parameters fixed and to first minimize the function with respect to the other parameters. More parameters are added in a stepwise relaxation process. In AD Model Builder, each step of this relaxation process is termed a “phase.” The parameter r is not allowed to vary until the second phase. Initial parameters that are allowed to vary are termed “active.” In the first phase, the active parameters are q and beta. The default phase for an initial parameter is phase 1 if no phase number is included in its declaration. The phase number for an initial parameter is the last number in the declaration for that parameter. The general order for the arguments in the definition of any initial parameter is: the size data for a vector or matrix object (if needed), the bounds for a bounded object (if needed), followed by the phase number (if desired).

It is often a difficult problem to decide what the order of relaxation for the initial parameters should be. This must sometimes be done by trial and error. However, AD Model Builder makes the process a lot simpler. One only needs to change the phase numbers of the initial parameters in the PARAMETER_SECTION and recompile the program.
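The phase mechanism can be pictured with a small C++ sketch (names invented for illustration; this is not ADMB's implementation): each initial parameter carries a phase number, and the minimizer varies only those parameters whose phase number is less than or equal to the current phase.

```cpp
#include <vector>

// Illustrative sketch of phased estimation: a parameter becomes
// "active" once the minimization reaches its phase number.
struct Param { double value; int phase; };

bool is_active(const Param& p, int current_phase) {
    return p.phase <= current_phase;
}

// Count how many parameters the minimizer would vary in a given phase.
int count_active(const std::vector<Param>& params, int current_phase) {
    int n = 0;
    for (const Param& p : params)
        if (is_active(p, current_phase)) ++n;
    return n;
}
```

Changing the relaxation order in a real model is just a matter of editing the phase numbers in the PARAMETER_SECTION and recompiling.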

Often in statistical modeling, it is useful to regard a vector of quantities x_i as consisting of an overall mean, μ, and a set of deviations from that mean, δ_i, so that

 x_i = μ + δ_i,   where   ∑_i δ_i = 0

AD Model Builder provides support for this modeling construction with the

 init_bounded_dev_vector

declaration. The components of an object created by this declaration will automatically sum to zero without any user intervention. The line

 init_bounded_dev_vector effort_devs(1,nobs,-5.,5.,3)

declares effort_devs to be this kind of object. The bounds -5.,5. control the range of the deviations. Putting reasonable bounds on such deviations often improves the stability of the estimation procedure.
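One way to picture how the components of a dev_vector can sum to zero (ADMB handles this internally; the plain C++ sketch below, with an invented helper name, is only illustrative) is to treat n − 1 of the deviations as free parameters and let the last one be the negative of their sum.

```cpp
#include <vector>
#include <numeric>

// Illustrative sketch: n-1 deviations are free parameters and the
// last one is forced to be the negative of their sum, so the full
// vector always sums to zero without any user intervention.
std::vector<double> make_dev_vector(const std::vector<double>& free_devs) {
    std::vector<double> devs(free_devs);
    double s = std::accumulate(free_devs.begin(), free_devs.end(), 0.0);
    devs.push_back(-s);  // forces the components to sum to zero
    return devs;
}
```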

AD Model Builder has sdreport_number, sdreport_vector, and sdreport_matrix declarations in the PARAMETER_SECTION. These objects behave the same as number, vector, and matrix objects, with the additional property that they are included in the report of the estimated standard deviations and correlation matrix.

For example, merely by including the statement sdreport_number B_projected, one can obtain the estimated standard deviation of the biomass projection for the next year. (Of course, you must also set B_projected equal to the projected biomass. This is done in the PROCEDURE_SECTION.)

PARAMETER_SECTION 
 init_bounded_number q(0.,1.) 
 init_bounded_number beta(0.,5.) 
 init_bounded_number r(0.,5,2) 
 init_number log_binit(2) 
 init_bounded_dev_vector effort_devs(1,nobs,-5.,5.,3) 
 init_bounded_number m(1,10.,4) 
 init_bounded_vector k_devs(2,nobs,-5.,5.,4) 
 number binit 
 vector pred_catch(1,nobs) 
 vector biomass(1,nobs) 
 vector f(1,nobs) 
 vector k(1,nobs) 
 vector k_trend(1,nobs) 
 sdreport_number k_1 
 sdreport_number k_last 
 sdreport_number k_change 
 sdreport_number k_ratio 
 sdreport_number B_projected 
 number tmp_mort; 
 number bio_tmp; 
 number c_tmp; 
 objective_function_value ff;

The PRELIMINARY_CALCS_SECTION carries out a few simple operations on the data. The model expects to have catch and effort data, but the input file contained catch and “cpue” (“catch/effort”) data. We divide the catch data by the cpue data to get the effort data. The AUTODIF operation elem_div, which performs element-wise divisions of vector objects, is used. As usual, the same thing could have been accomplished by employing a loop and writing element-wise code. The effort data are then normalized—that is, they are divided by their average so that their average becomes 1. This is done so that we have a good idea what the catchability parameter q should be to give reasonable values for the fishing mortality rate (since F = qE).

Notice that the PRELIMINARY_CALCS_SECTION is C++ code, so statements must be terminated with a semicolon (;). The column function is used to extract a column from the matrix data.

PRELIMINARY_CALCS_SECTION 
 // get the data out of the data matrix into 
 obs_catch=column(data,2); 
 cpue=column(data,3); 
 // divide the catch by the cpue to get the effort 
 effort=elem_div(obs_catch,cpue); 
 // normalize the effort and save the average 
 avg_effort=mean(effort); 
 effort/=avg_effort;
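For readers unfamiliar with the vector operations, the same preliminary calculations can be written out in plain C++ (illustrative only; the function name here is invented, and in the template the AUTODIF functions elem_div and mean do this work):

```cpp
#include <vector>
#include <cstddef>

// Illustrative sketch: divide catch by cpue element-wise to get the
// effort, then normalize the effort so that its mean is 1.
std::vector<double> effort_from_cpue(const std::vector<double>& obs_catch,
                                     const std::vector<double>& cpue) {
    std::vector<double> effort(obs_catch.size());
    for (std::size_t i = 0; i < obs_catch.size(); ++i)
        effort[i] = obs_catch[i] / cpue[i];   // element-wise division
    double avg = 0.0;
    for (std::size_t i = 0; i < effort.size(); ++i) avg += effort[i];
    avg /= effort.size();
    for (std::size_t i = 0; i < effort.size(); ++i)
        effort[i] /= avg;                     // normalize so the mean is 1
    return effort;
}
```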

The PROCEDURE_SECTION contains several new AD Model Builder features. Some have to do with the notion of carrying out the minimization in a number of steps or phases. The line

 if (active(effort_devs)) f=elem_prod(f,exp(effort_devs));

introduces the active function. This function can be used on any initial parameter and will return a value “true” if that parameter is active in the current phase. The idea here is that if the initial parameters effort_devs are not active, then since their value is zero, carrying out the calculations will have no effect, and we can save time by avoiding the calculations. The active function is also used in the statement

 if (!active(log_binit)) 
 { 
   log_binit=log(obs_catch(1)/(q*effort(1))); 
 }

The idea is that if the log_binit initial parameter (the logarithm of the biomass at the beginning of the first year) is not active, then we set it equal to the value that produces the observed catch (using the relationship C = qEB, so that B = C/(qE)). The active function is also used in the calculations of the objective function, so that unnecessary calculations are avoided.

The following code helps to deal with convergence problems in this type of nonlinear model. The problem is that the starting parameter values are often so bad that the optimization procedure will try to make the population very large and the exploitation rate very small, because this is the best local solution near the starting parameter values. To circumvent this problem, we include a penalty function to keep the average value of the fishing mortality rate f close to 0.4 during the first two phases of the minimization. In the final phase, the size of the penalty term is reduced to a very small value. The function current_phase() returns the value of the current phase of the minimization:

if (current_phase()<3) 
 { 
   ff+=1000.*square(log(mean(f)/.4)); 
 }

1.19 Using functions to improve code organization

Subroutines or functions are used to improve the organization of the code. The code for the main part of the PROCEDURE_SECTION that invokes the functions should be placed at the top of the PROCEDURE_SECTION.

PROCEDURE_SECTION 
 // calculate the fishing mortality 
 calculate_fishing_mortality(); 
 // calculate the biomass and predicted catch 
 calculate_biomass_and_predicted_catch(); 
 // calculate the objective function 
 calculate_the_objective_function();

There are three user-defined functions called at the beginning of the PROCEDURE_SECTION. The code to define the functions comes next. To define a function whose name is, say, name, the template directive FUNCTION name is used. Notice that no parentheses () are used in the definition of the function, but to call the function, the statement takes the form name();

1.20 A fisheries catch-at-age model

This section describes a simple catch-at-age model. The data input to this model include estimates of the numbers at age caught by the fishery each year and estimates of the fishing effort each year. This example introduces AD Model Builder’s ability to automatically calculate profile likelihoods for carrying out Bayesian inference. To cause the profile likelihood calculations to be carried out, use the -lprof command line argument.

Let i index fishing years, 1 ≤ i ≤ n, and j index age classes, with 1 ≤ j ≤ r. The instantaneous fishing mortality rate is assumed to have the form F_ij = qE_i s_j exp(δ_i), where q is called the “catchability,” E_i is the observed fishing effort, s_j is an age-dependent effect termed the “selectivity,” and the δ_i are deviations from the expected relationship between the observed fishing effort and the resulting fishing mortality. The δ_i are assumed to be normally distributed, with mean 0. The instantaneous natural mortality rate M is assumed to be independent of year and age class. It is not estimated in this version of the model. The instantaneous total mortality rate is given by Z_ij = F_ij + M. The survival rate is given by S_ij = exp(−Z_ij). The number of age-class j fish in the population in year i is denoted by N_ij. The relationship N_{i+1,j+1} = N_ij S_ij is assumed to hold. Note that using this relationship, if one knows S_ij, then all the N_ij can be calculated from knowledge of the initial population in year 1, N_11, N_12, …, N_1r, and knowledge of the recruitment in each year, N_21, N_31, …, N_n1.

The purpose of the model is to estimate quantities of interest to managers, such as the population size and exploitation rates, and to make projections about the population. In particular, we can get an estimate of the numbers of fish in the population in year n + 1 for age classes 2 or greater from the relationship N_{n+1,j+1} = N_nj S_nj. If we have estimates m_j for the mean weight at age j, then the projected biomass level B_{n+1} of age class 2+ fish for year n + 1 can be computed from the relationship B_{n+1} = ∑_{j=2}^{r} m_j N_{n+1,j}.
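The dynamics just described can be sketched in plain C++ (illustrative only; indices here are 0-based and the function names are invented). Given the survival rates, the first row and first column of N determine the whole matrix, and the projected biomass follows from the mean weights at age:

```cpp
#include <vector>
#include <cstddef>

typedef std::vector<std::vector<double> > Matrix;  // 0-based indices here

// Fill in the numbers-at-age matrix from the relationship
// N(i+1, j+1) = N(i, j) * S(i, j), given the first row (initial
// population) and first column (recruitments) of N.
Matrix fill_numbers_at_age(Matrix N, const Matrix& S) {
    std::size_t nyrs = N.size(), nages = N[0].size();
    for (std::size_t i = 0; i + 1 < nyrs; ++i)
        for (std::size_t j = 0; j + 1 < nages; ++j)
            N[i + 1][j + 1] = N[i][j] * S[i][j];
    return N;
}

// Projected biomass for year n+1: each age a in the last observed year
// survives into age a+1, weighted by the mean weight at that age.
double projected_biomass(const std::vector<double>& last_year_N,
                         const std::vector<double>& last_year_S,
                         const std::vector<double>& mean_weight) {
    double B = 0.0;
    for (std::size_t j = 0; j + 1 < last_year_N.size(); ++j)
        B += mean_weight[j + 1] * last_year_N[j] * last_year_S[j];
    return B;
}
```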

Besides getting a point estimate for quantities of interest like Bn+1, we also want to get an idea of how well determined the estimate is. AD Model Builder has completely automated the process of deriving good confidence limits for these parameters in a Bayesian context. One simply needs to declare the parameter to be of type likeprof_number. The results are given in Section 1.21.

The code for the catch-at-age model is:

DATA_SECTION 
 // the number of years of data 
 init_int nyrs 
 // the number of age classes in the population 
 init_int nages 
 // the catch-at-age data 
 init_matrix obs_catch_at_age(1,nyrs,1,nages) 
 //estimates of fishing effort 
 init_vector effort(1,nyrs) 
 // natural mortality rate 
 init_number M 
 // need to have relative weight at age to calculate biomass of 2+ 
 vector relwt(2,nages) 
INITIALIZATION_SECTION 
 log_q -1 
 log_P 5 
PARAMETER_SECTION 
 init_number log_q(1) // log of the catchability 
 init_number log_P(1) // overall population scaling parameter 
 init_bounded_dev_vector log_sel_coff(1,nages-1,-15.,15.,2) 
 init_bounded_dev_vector log_relpop(1,nyrs+nages-1,-15.,15.,2) 
 init_bounded_dev_vector effort_devs(1,nyrs,-5.,5.,3) 
 vector log_sel(1,nages) 
 vector log_initpop(1,nyrs+nages-1); 
 matrix F(1,nyrs,1,nages) // the instantaneous fishing mortality 
 matrix Z(1,nyrs,1,nages) // the instantaneous total mortality 
 matrix S(1,nyrs,1,nages) // the survival rate 
 matrix N(1,nyrs,1,nages) // the predicted numbers at age 
 matrix C(1,nyrs,1,nages) // the predicted catch at age 
 objective_function_value f 
 sdreport_number avg_F 
 sdreport_vector predicted_N(2,nages) 
 sdreport_vector ratio_N(2,nages) 
 likeprof_number pred_B 
PRELIMINARY_CALCS_SECTION 
 // this is just to invent some relative average 
 // weight numbers 
 relwt.fill_seqadd(1.,1.); 
 relwt=pow(relwt,.5); 
 relwt/=max(relwt); 
PROCEDURE_SECTION 
 // example of using FUNCTION to structure the procedure section 
 get_mortality_and_survival_rates(); 
 
 get_numbers_at_age(); 
 
 get_catch_at_age(); 
 
 evaluate_the_objective_function(); 
 
FUNCTION get_mortality_and_survival_rates 
 // calculate the selectivity from the sel_coffs 
 for (int j=1;j<nages;j++) 
 { 
   log_sel(j)=log_sel_coff(j); 
 } 
 // the selectivity is the same for the last two age classes 
 log_sel(nages)=log_sel_coff(nages-1); 
 
 // This is the same as F(i,j)=exp(log_q)*effort(i)*exp(log_sel(j)); 
 F=outer_prod(mfexp(log_q)*effort,mfexp(log_sel)); 
 if (active(effort_devs)) 
 { 
   for (int i=1;i<=nyrs;i++) 
   { 
    F(i)=F(i)*exp(effort_devs(i)); 
   } 
 } 
 // get the total mortality 
 Z=F+M; 
 // get the survival rate 
 S=mfexp(-1.0*Z); 
 
FUNCTION get_numbers_at_age 
 log_initpop=log_relpop+log_P; 
 for (int i=1;i<=nyrs;i++) 
 { 
   N(i,1)=mfexp(log_initpop(i)); 
 } 
 for (int j=2;j<=nages;j++) 
 { 
   N(1,j)=mfexp(log_initpop(nyrs+j-1)); 
 } 
 for (int i=1;i<nyrs;i++) 
 { 
   for (int j=1;j<nages;j++) 
   { 
    N(i+1,j+1)=N(i,j)*S(i,j); 
   } 
 } 
 // calculated predicted numbers at age for next year 
 for (int j=1;j<nages;j++) 
 { 
   predicted_N(j+1)=N(nyrs,j)*S(nyrs,j); 
   ratio_N(j+1)=predicted_N(j+1)/N(1,j+1); 
 } 
 // calculate predicted biomass for profile 
 // likelihood report 
 pred_B=predicted_N*relwt; 
 
FUNCTION get_catch_at_age 
 C=elem_prod(elem_div(F,Z),elem_prod(1.-S,N)); 
 
FUNCTION evaluate_the_objective_function 
 // penalty functions to "regularize" the solution 
 f+=.01*norm2(log_relpop); 
 avg_F=sum(F)/double(size_count(F)); 
 if (last_phase()) 
 { 
   // a very small penalty on the average fishing mortality 
   f+= .001*square(log(avg_F/.2)); 
 } 
 else 
 { 
   // use a large penalty during the initial phases to keep the 
   // fishing mortality high 
   f+= 1000.*square(log(avg_F/.2)); 
 } 
 // errors in variables type objective function with errors in 
 // the catch at age and errors in the effort fishing mortality 
 // relationship 
 if (active(effort_devs)) 
 { 
   // only include the effort_devs in the objective function if 
   // they are active parameters 
   f+=0.5*double(size_count(C)+size_count(effort_devs)) 
    * log( sum(elem_div(square(C-obs_catch_at_age),.01+C)) 
    + 0.1*norm2(effort_devs)); 
 } 
 else 
 { 
   // objective function without the effort_devs 
   f+=0.5*double(size_count(C)) 
    * log( sum(elem_div(square(C-obs_catch_at_age),.01+C))); 
 } 
REPORT_SECTION 
 report << "Estimated numbers of fish " << endl; 
 report << N << endl; 
 report << "Estimated numbers in catch " << endl; 
 report << C << endl; 
 report << "Observed numbers in catch " << endl; 
 report << obs_catch_at_age << endl; 
 report << "Estimated fishing mortality " << endl; 
 report << F << endl;

This model employs several instances of the init_bounded_dev_vector type. This type consists of a vector of numbers that sum to zero—that is, they are deviations from a common mean—and are bounded. For example, the quantities log_relpop are used to parameterize the logarithm of the variations in year class strength of the fish population. Putting bounds on the magnitude of the deviations helps to improve the stability of the model. The bounds are from −15.0 to 15.0, which gives the estimates of relative year class strength a dynamic range of exp(30.0).

The FUNCTION keyword has been employed a number of times in the PROCEDURE_SECTION to help structure the code. A function is defined simply by using the FUNCTION keyword, followed by the name of the function.

 FUNCTION get_mortality_and_survival_rates

Don’t include the parentheses or semicolon there. To use the function, simply write its name in the procedure section.

 get_mortality_and_survival_rates();

You must include the parentheses and the semicolon here.

The REPORT_SECTION shows how to generate a report for an AD Model Builder program. The default report generating machinery utilizes the C++ “stream formalism.” You don’t need to know much about streams to make a report, but a few comments are in order. The stream formalism associates a stream object with a file. In this case, the stream object associated with the AD Model Builder report file is report. To write an object xxx into the report file, you insert the line

 report << xxx;

into the REPORT_SECTION. If you want to skip to a new line after writing the object, you can include the stream manipulator endl as in

 report << "Estimated numbers of fish " << endl;

Notice that the stream operations know about common C objects, such as strings, so that it is a simple matter to put comments or labels into the report file.

1.21 Bayesian inference and the profile likelihood

AD Model Builder enables one to quickly build models with large numbers of parameters. This is especially useful for employing Bayesian analysis. Traditionally, however, it has been difficult to interpret the results of analysis using such models. In a Bayesian context, the results are represented by the posterior probability distribution for the model parameters. To get exact results from the posterior distribution, it is necessary to evaluate integrals over large-dimensional spaces. This can be computationally intractable. AD Model Builder provides approximations to these integrals in the form of the profile likelihood. The profile likelihood can be used to estimate the probability of extreme values (such as estimating a value β so that, for a parameter b, the probability that b < β is 0.10, or the probability that b > β is 0.10) for any model parameter. To use this facility, simply declare the parameter of interest to be of type likeprof_number in the PARAMETER_SECTION and assign the correct value to the parameter in the PROCEDURE_SECTION.

The code for the catch-at-age model estimates the profile likelihood for the projected biomass of age class 2+ fish. (Age class 2+ has been used to avoid the extra problem of dealing with the uncertainty of the recruitment of age class 1 fish.) As a typical application of the method, the user of the model can estimate the probability of biomass of fish for next year being larger or smaller than a certain value. Estimates like these are obviously of great interest to managers of natural resources.

The profile likelihood report for a variable is in a file with the same name as the variable (truncated to eight letters, if necessary, with the suffix .PLT appended). For this example, the report is in the file PRED_B.PLT. Part of the file is shown here:

pred_B: 
Profile likelihood 
-1411.23 1.1604e-09 
-1250.5 1.71005e-09 
-1154.06 2.22411e-09 
 ...................   // skip some here 
 ................... 
278.258 2.79633e-05 
324.632 5.28205e-05 
388.923 6.89413e-05 
453.214 8.84641e-05 
517.505 0.0001116 
581.796 0.000138412 
 ................... 
 ................... 
1289 0.000482459 
1353.29 0.000494449 
1417.58 0.000503261 
1481.87 0.000508715 
1546.16 0.0005107 
1610.45 0.000509175 
1674.74 0.000504171 
1739.03 0.000490788 
1803.32 0.000476089 
1867.61 0.000460214 
1931.91 0.000443313 
1996.2 0.000425539 
2060.49 0.000407049 
2124.78 0.000388 
2189.07 0.00036855 
 ................... 
 ................... 
4503.55 2.27712e-05 
4599.98 2.00312e-05 
4760.71 1.48842e-05 
4921.44 1.07058e-05 
5082.16 7.45383e-06 
 ................... 
 ................... 
6528.71 6.82689e-07 
6689.44 6.91085e-07 
6850.17 7.3193e-07 
Minimum width confidence limits: 
      significance level lower bound upper bound 
           0.90         572.537   3153.43 
           0.95         453.214   3467.07 
           0.975        347.024   3667.76 
 
One sided confidence limits for the profile likelihood: 
 
The probability is  0.9 that pred_B is greater than 943.214 
The probability is 0.95 that pred_B is greater than 750.503 
The probability is 0.975 that pred_B is greater than 602.507 
 
The probability is  0.9 that pred_B is less than 3173.97 
The probability is 0.95 that pred_B is less than 3682.75 
The probability is 0.975 that pred_B is less than 4199.03

The file contains the probability density function and the approximate confidence limits for the profile likelihood and the normal approximation. Since the format is the same for both, we only discuss the profile likelihood here. The first part of the report contains pairs of numbers (x_i, y_i), which consist of values of the parameter in the report (in this case, pred_B) and the estimated value of the probability density for the parameter at that point. The probability that the parameter x lies in the interval x_r ≤ x ≤ x_s, where x_r < x_s, can be estimated from the sum

 ∑_{i=r}^{s} (x_{i+1} − x_i) y_i.
The reports of the one and two-sided confidence limits for the parameter were produced this way. Also, a plot of yi versus xi gives the user an indication of what the probability distribution of the parameter looks like. (See Figure 1.5.)
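The sum above is easy to compute from the (x_i, y_i) pairs in the .PLT file; a minimal C++ sketch (with an invented function name) follows.

```cpp
#include <vector>
#include <utility>
#include <cstddef>

// Approximate the probability that the parameter lies between x_r and
// x_s, given the (x_i, y_i) density pairs from the .PLT report, by
// summing (x_{i+1} - x_i) * y_i over the subintervals.
double interval_probability(const std::vector<std::pair<double,double> >& pts,
                            std::size_t r, std::size_t s) {
    double p = 0.0;
    for (std::size_t i = r; i < s && i + 1 < pts.size(); ++i)
        p += (pts[i + 1].first - pts[i].first) * pts[i].second;
    return p;
}
```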

[Figure: plot of the profile likelihood (“Prof. like.”) and the normal approximation (“Normal approx.”) for the predicted biomass.]

Figure 1.5: Predicted biomass of (2+) fish ×103.


The profile likelihood reflects the fact that the biomass cannot be less than zero. The normal approximation is not very useful for calculating the probability that the biomass is very low—a question of great interest to managers, who are probably not going to be impressed by the knowledge that there is an estimated probability of 0.975 that the biomass is greater than −52.660.

One sided confidence limits for the normal approximation 
 
The probability is  0.9 that pred_B is greater than 551.235 
The probability is 0.95 that pred_B is greater than 202.374 
The probability is 0.975 that pred_B is greater than -52.660

1.22 Saving the output from profile likelihood to use as starting values for MCMC analysis

If the profile likelihood calculations are carried out with the -prsave option, the values of the independent variables for each point on the profile are saved in a file named, say, xxx.pvl, where xxx is the name of the variable being profiled.

#Step -8 
#num sigmas -27 
-2.96325 6.98069 -2.96893 -1.15811 0.417864 1.5352 1.50556 
0.668417 1.29106 2.04238 1.85167 1.02342 1.03264 1.35247 
1.5832 1.87033 1.67212 0.984254 -0.58013 -8.10159 0.757686 
0.958038 0.414446 -1.48443 -2.57572 -4.09184 -0.869426 -0.545055 
-0.333125 -0.350978 -0.487261 -0.123192 -0.158569 -0.434328 
-0.609651 -0.684244 -0.405214 5.00104 
#Step -7 
#num sigmas -22 
 // ............................. 
#Step 7 
#num sigmas 22 
-5.94034 9.29211 -2.6122 0.0773101 1.54853 1.91895 0.578923 
-1.51152 0.0124827 0.712157 0.520084 -0.202059 -0.0505122 
0.284112 0.469956 0.731273 0.664325 0.642344 0.691073 
-1.10233 -0.362781 0.034522 0.0127999 -0.538117 -0.575466 
-1.94386 -0.544077 -0.0349702 0.349352 0.355073 0.237236 
0.335559 0.177427 -0.0507647 -0.167382 -0.303103 -0.249956 -0.104393 
#Step 8 
#num sigmas 27 
-6.09524 9.43103 -2.59874 0.0930842 1.55938 
1.91285 0.561478 -1.52804 -0.0139936 0.687758 0.502089 
-0.212203 -0.0519722 0.287149 0.474422 0.739316 0.678415 
0.663857 0.71933 -1.07637 -0.387684 0.0146463 0.00647923 
-0.530625 -0.566471 -1.93414 -0.521944 -0.0111346 0.372352 
0.372706 0.247599 0.333505 0.171122 -0.0585298 -0.177735 
-0.319115 -0.273111 -0.135715

To use these values as a starting point for the MCMC analysis, use a text editor to put the desired starting values in a file by themselves. Suppose that the file name is mcmc.dat. Run the MCMC analysis with the option -mcpin mcmc.dat, and it will begin from the parameter values in that file.

1.23 The profile likelihood calculations

We have been told that the profile likelihood as calculated in AD Model Builder for dependent variables may differ from that calculated by other authors. This section will clarify what we mean by the term and will motivate our calculation.

Let (x_1, …, x_n) be n independent variables, f(x_1, …, x_n) be a probability density function, and g denote a dependent variable—that is, a real-valued function of (x_1, …, x_n). The profile likelihood calculation for g is intended to produce an approximation to the probability density function for g.

Consider first the case where g is equal to one of the independent variables, say g = x_1. In this simple case, the marginal distribution of x_1 is given by the integral

 ∫ f(x_1, …, x_n) dx_2 dx_3 ⋯ dx_n
(1.15)

The use of the profile likelihood in this case is based on the assumption (or hope) that there exists a constant λ, independent of x_1, such that λ max_{x_2,…,x_n} f(x_1, …, x_n) is a good approximation to this integral.

This approach should be useful for many applications, because the central limit theorem implies that, given enough observations, the posterior probability distribution is more or less well approximated by a multivariate normal distribution, and for such distributions the assumption holds exactly. So the profile likelihood is calculated by computing the conditional maximum of the likelihood function and then normalizing it so that it integrates to 1.
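A minimal sketch of that normalization step in C++ (invented name; assumes equally spaced profile points and a simple rectangle rule):

```cpp
#include <vector>
#include <cstddef>

// Rescale the conditional maxima of the likelihood along the profile
// so that the resulting curve integrates (rectangle rule, spacing dx)
// to 1, giving an approximate probability density.
std::vector<double> normalize_profile(std::vector<double> values, double dx) {
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) total += values[i] * dx;
    for (std::size_t i = 0; i < values.size(); ++i) values[i] /= total;
    return values;
}
```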

For an arbitrary dependent variable, the situation is a bit more complicated. A good approximation to a probability distribution should have the property of parameter invariance—that is, Pr{a ≤ x ≤ b} = Pr{h(a) ≤ h(x) ≤ h(b)} for any monotonically increasing function h. To achieve the property of parameter invariance, we modify the definition of profile likelihood for dependent variables.

Fix a value g_0 for g and consider the integral

 ∫_{ {x : g_0 − ϵ/2 ≤ g(x) ≤ g_0 + ϵ/2} } f(x_1, …, x_n) dx_1 dx_2 ⋯ dx_n

which is the probability that g(x) has a value between g_0 − ϵ/2 and g_0 + ϵ/2. This probability depends on two quantities: the value of f(x) and the thickness of the region being integrated over. We approximate f(x) by its maximum value f(x̂), where x̂ = argmax_{x : g(x) = g_0} f(x). For the thickness, we have g(x̂ + h) ≈ g(x̂) + ⟨∇g(x̂), h⟩, where h is a vector perpendicular to the level set of g at x̂, so that ⟨∇g(x̂), h⟩ = ϵ/2. However, ∇g(x̂) is also perpendicular to the level set, so ⟨∇g(x̂), h⟩ = ‖∇g(x̂)‖ ‖h‖, such that ‖h‖ = ϵ/(2‖∇g(x̂)‖). Thus, the integral is approximated by ϵ f(x̂)/‖∇g(x̂)‖. Taking the derivative with respect to ϵ yields f(x̂)/‖∇g(x̂)‖, which is the profile likelihood expression for a dependent variable. For an independent variable, ‖∇g(x̂)‖ = 1, so our definition of the profile likelihood corresponds to the usual one in this case.

1.24 Modifying the profile likelihood approximation procedure

The functions set_stepnumber() and set_stepsize() can be used to modify the number of points used to approximate the profile likelihood, or to change the stepsize between the points. This can be carried out in the PRELIMINARY_CALCS_SECTION. If u has been declared to be of type likeprof_number,

PRELIMINARY_CALCS_SECTION 
 u.set_stepnumber(10); // default value is 8 
 u.set_stepsize(0.2); // default value is 0.5

will set the number of steps equal to 21 (from −10 to 10) and will set the step size equal to 0.2 times the estimated standard deviation for the parameter u.
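In other words, with stepnumber n and stepsize h, the profile is evaluated at 2n + 1 points of the form û + k·h·σ for k = −n, …, n. A small C++ sketch (invented name) of the resulting grid:

```cpp
#include <vector>

// Build the grid of profile points: u_hat is the point estimate,
// sigma its estimated standard deviation, n the stepnumber, and
// h the stepsize (in units of sigma).
std::vector<double> profile_grid(double u_hat, double sigma, int n, double h) {
    std::vector<double> pts;
    for (int k = -n; k <= n; ++k)
        pts.push_back(u_hat + k * h * sigma);
    return pts;
}
```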

1.25 Changing the default file names for data and parameter input

The following code fragment illustrates how the files used for input of the data and parameter values can be changed. This code has been taken from the example catage.tpl and modified. In the DATA_SECTION, the data are first read in from the file catchdat.dat. Then the effort data are read in from the file effort.dat. The remainder of the data are read in from the file catchdat.dat. It is necessary to save the current file position in an object of type streampos. This object is used to position the file properly. The escape sequence !! can be used to include one line of the user’s code in the DATA_SECTION or PARAMETER_SECTION. This is more compact than the LOCAL_CALCS construction.

DATA_SECTION 
// will read data from file catchdat.dat 
!! ad_comm::change_datafile_name("catchdat.dat"); 
 init_int nyrs 
 init_int nages 
 init_matrix obs_catch_at_age(1,nyrs,1,nages) 
// now read the effort data from the file effort.dat and save the current 
// file position in catchdat.dat in the object tmp 
!! streampos tmp = ad_comm::change_datafile_name("effort.dat"); 
 init_vector effort(1,nyrs) 
// now read the rest of the data from the file catchdat.dat 
// passing the optional argument tmp will reset the file to that position 
!! ad_comm::change_datafile_name("catchdat.dat",tmp); 
 init_number M 
 
// .... 
 
PARAMETER_SECTION 
// will read parameters from file catch.par 
!! ad_comm::change_parfile_name("catch.par");

1.26 Using the subvector operation to avoid writing loops

If v is a vector object, then for integers l and u, the expression v(l,u) is a vector object of the same type, with minimum valid index l and maximum valid index u. (Of course, l and u must be within the valid index range for v, and l must be less than or equal to u.) The subvector formed by this operation can be used on both sides of the equals sign in an arithmetic expression. The number of loops that must be written can be significantly reduced in this manner. We shall use the subvector operator to remove some of the loops in the catch-at-age model code.

 // calculate the selectivity from the sel_coffs 
 for (int j=1;j<nages;j++) 
 { 
   log_sel(j)=log_sel_coff(j); 
 } 
 // the selectivity is the same for the last two age classes 
 log_sel(nages)=log_sel_coff(nages-1); 
 
 // same code using the subvector operation 
 log_sel(1,nages-1)=log_sel_coff; 
 // the selectivity is the same for the last two age classes 
 log_sel(nages)=log_sel_coff(nages-1);

Notice that log_sel(1,nage-1) is not a distinct vector from log_sel. This means that an assignment to log_sel(1,nage-1) is an assignment to a part of log_sel. The next example is a bit more complicated. It involves taking a row of a matrix to form a vector, forming a subvector, and changing the valid index range for the vector.

 // loop form of the code 
 for (int i=1;i<nyrs;i++) 
 { 
   for (int j=1;j<nages;j++) 
   { 
    N(i+1,j+1)=N(i,j)*S(i,j); 
   } 
 } 
 
 // can only eliminate the inside loop 
 for (i=1;i<nyrs;i++) 
 { 
   // ++ increments the index bounds by 1 
    N(i+1)(2,nages)=++elem_prod(N(i)(1,nages-1),S(i)(1,nages-1)); 
 }

Notice that N(i+1) is a vector object, so N(i+1)(2,nages) is a subvector of N(i+1). Another point is that elem_prod(N(i)(1,nages-1),S(i)(1,nages-1)) is a vector object with minimum valid index 1 and maximum valid index nages-1. The operator ++ applied to a subvector increments the valid index range by 1, so that it has the same range of valid index values as N(i+1)(2,nages). The operator -- would decrement the valid index range by 1.
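The aliasing and index-shifting behavior described above can be mimicked in ordinary C++ with a lightweight view type. The following sketch is plain C++, not ADMB code; the names VecView, subvector, and shift_up are invented here purely for illustration. It shows that writing through a subvector writes into the original storage, and that shifting the view renumbers its valid indices without copying any data.

```cpp
#include <cassert>
#include <vector>

// A minimal "view" over a std::vector, mimicking ADMB subvectors:
// writing through the view writes into the aliased original storage,
// and shifting the view renumbers its valid indices.
struct VecView {
    std::vector<double>* data; // aliased storage, used 1-based
    int lo, hi;                // current valid index range
    int offset;                // logical index i maps to (*data)[i - offset]
    double& operator()(int i) { return (*data)[i - offset]; }
};

// v(l,u): a view with valid indices l..u, aliasing v.
VecView subvector(std::vector<double>& v, int l, int u) {
    return VecView{&v, l, u, 0};
}

// Like ADMB's ++ on a subvector: shift the valid index range up by 1,
// so the element formerly addressed by index i is now addressed by i+1.
VecView shift_up(const VecView& v) {
    return VecView{v.data, v.lo + 1, v.hi + 1, v.offset + 1};
}
```

Assigning through such a view, as in the log_sel example above, modifies the underlying vector in place.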

1.27 The use of higher-dimensional arrays

The example contained in the file FOURD.TPL illustrates some aspects of the use of 3- and 4-dimensional arrays. There are now examples of the use of arrays up to dimension 7 in the documentation. See Chapter 4 for an example of the use of higher-dimensional arrays.

DATA_SECTION 
 init_4darray d4(1,2,1,2,1,3,1,3) 
 init_3darray d3(1,2,1,3,1,3) 
PARAMETER_SECTION 
 init_matrix M(1,3,1,3) 
 4darray p4(1,2,1,2,1,3,2,3) 
 objective_function_value f 
PRELIMINARY_CALCS_SECTION 
for (int i=1;i<=3;i++) 
{ 
  M(i,i)=1; // set M equal to the identity matrix to start 
} 
PROCEDURE_SECTION 
for (int i=1;i<=2;i++) 
{ 
  for (int j=1;j<=2;j++) 
  { 
   // d4(i,j) is a 3x3 matrix -- d3(i) is a 3x3 matrix 
   // d4(i,j)*M is matrix multiplication -- inv(M) is matrix inverse 
   f+= norm2( d4(i,j)*M + d3(i)+ inv(M) ); 
  } 
} 
REPORT_SECTION 
 report << "Printout of a 4 dimensional array" << endl << endl; 
 report << d4 << endl << endl; 
 report << "Printout of a 3 dimensional array" << endl << endl; 
 report << d3 << endl << endl;

In the DATA_SECTION, you can use 3darray, 4darray, …, 7darray objects, and init_3darray, init_4darray, …, init_7darray objects. In the PARAMETER_SECTION, you can use 3darray, 4darray, …, 7darray objects, but only init_3darray, init_4darray, and init_5darray objects, at the time of writing.

If d4 is a 4darray, then d4(i) is a 3-dimensional array and d4(i,j) is a matrix object, so d4(i,j)*M is matrix multiplication. Similarly, if d3 is a 3darray, then d3(i) is a matrix object, so d4(i,j)*M + d3(i) + inv(M) combines matrix multiplication, matrix inversion, and matrix addition.
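The "drop one index, get a lower-dimensional slice" behavior can be illustrated in plain C++ with nested containers. In this sketch (not ADMB code; matmul is a hypothetical stand-in for ADMB's matrix product), d4[0][1] plays the role of d4(i,j) as a matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Array3 = std::vector<Matrix>;
using Array4 = std::vector<Array3>;

// Plain matrix product, standing in for ADMB's d4(i,j)*M.
Matrix matmul(const Matrix& A, const Matrix& B) {
    std::size_t n = A.size(), m = B[0].size(), k = B.size();
    Matrix C(n, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t l = 0; l < k; ++l)
                C[i][j] += A[i][l] * B[l][j];
    return C;
}
```

Indexing a 4-dimensional nested container once yields a 3-dimensional one, twice yields a matrix, and so on, just as with ADMB's darray types.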

1.28 The TOP_OF_MAIN section

The TOP_OF_MAIN section is intended to allow the programmer to insert any desired C++ code at the top of the main() function in the program. The code is copied literally from the template to the program. This section can be used to set the AUTODIF global variables. (See the AUTODIF user’s manual chapter on AUTODIF global variables.) The following code fragment will set these variables:

TOP_OF_MAIN_SECTION 
 arrmblsize = 200000; // use instead of 
              // gradient_structure::set_ARRAY_MEMBLOCK_SIZE 
 gradient_structure::set_GRADSTACK_BUFFER_SIZE(100000); // this may be 
                             // incorrect in the AUTODIF manual. 
 gradient_structure::set_CMPDIF_BUFFER_SIZE(50000); 
 gradient_structure::set_MAX_NVAR_OFFSET(500); // can have up to 500 
                                 // independent variables 
 gradient_structure::set_MAX_NUM_DEPENDENT_VARIABLES(500); // can have 
                               // up to 500 dependent variables

Note that within AD Model Builder, one doesn’t use the function

 gradient_structure::set_ARRAY_MEMBLOCK_SIZE

to set the amount of memory available for variable arrays. Instead, use the line of code

 arrmblsize = nnn;

where nnn is the amount of memory desired.

1.29 The GLOBALS_SECTION

The GLOBALS_SECTION is intended to allow the programmer to insert any desired C++ code before the main() function in the program. The code is copied literally from the template to the program. This enables the programmer to define global objects, and to include header files and user-defined functions in the generated C++ code.

1.30 The BETWEEN_PHASES_SECTION

Code in BETWEEN_PHASES_SECTION is executed before each phase of the minimization. It is possible to carry out different actions that depend on what phase of the minimization is to begin, by using a switch statement (you can read about this in a book on C or C++), together with the current_phase() function.

switch (current_phase()) 
{ 
case 1: 
 // some action 
 cout << "Before phase 1 minimization " << endl; 
 break; 
case 2: 
 // some action 
 cout << "Before phase 2 minimization " << endl; 
 break; 
// .... 
}

Chapter 2
Markov Chain Simulation

2.1 Introduction to the Markov Chain Monte Carlo Approach in Bayesian Analysis

The reference for this chapter is [?] (Chapter 11).

The Markov chain Monte Carlo method (MCMC) is a method for approximating the posterior distribution for parameters of interest in the Bayesian framework. This option is invoked by using the command line option -mcmc N, where N is the number of simulations performed. You will probably also want to include the option -mcscale, which dynamically scales the covariance matrix until a reasonable acceptance rate is observed. You may also want to use the -mcmult n option, which scales the initial covariance matrix if the initial values are so large that arithmetic errors occur. One advantage of AD Model Builder over some other implementations of MCMC is that the mode of the posterior distribution, together with the Hessian at the mode, is available to use for the MCMC routine. This information is used to implement a version of the Hastings-Metropolis algorithm. Another advantage is that with AD Model Builder, it is possible to calculate the profile likelihood for a parameter of interest and compare the distribution to the MCMC distribution for that parameter. A large discrepancy may indicate that one or both estimates are inadequate. If you wish to do more simulations (and to carry on from where the last one ended), use the -mcr option. Figure 2.1 compares the profile likelihood for the projected biomass to the estimates produced by the MCMC method, for different sample sizes (25,000 and 2,500,000 samples) for the catage example.
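For readers unfamiliar with the algorithm, a minimal random-walk Metropolis sampler for a one-dimensional standard normal target can be sketched as follows. This is illustrative plain C++ only, with names invented here; ADMB's implementation additionally uses the posterior mode and Hessian to shape the proposal distribution.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Log of an (unnormalized) standard normal density, our stand-in target.
double log_target(double x) { return -0.5 * x * x; }

// Random-walk Metropolis: propose x' = x + step*N(0,1) and accept with
// probability min(1, target(x')/target(x)); otherwise keep x.
std::vector<double> metropolis(int n, double step, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> jump(0.0, step);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> chain;
    chain.reserve(n);
    double x = 0.0;
    for (int i = 0; i < n; ++i) {
        double y = x + jump(gen);
        if (std::log(unif(gen)) < log_target(y) - log_target(x)) x = y;
        chain.push_back(x);
    }
    return chain;
}
```

The chain's empirical distribution approximates the target; in ADMB, the target is the posterior and the proposal covariance is derived from the Hessian at the mode.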


[Plot omitted: profile likelihood ("Prof. like.") compared with MCMC estimates based on 25,000 and 2,500,000 samples.]

Figure 2.1: Predicted biomass of (2+) fish ×103.


A report containing the observed distributions is produced in the file root.hst. All objects of type sdreport, i.e., number, vector, or matrix, are included. It is possible to save the results of every nth simulation by using the -mcsave n option. Afterwards, these values can be used by running the model with the -mceval option, which will evaluate the user function once for every saved simulation value. At this time, the function mceval_phase() will return the value "true," and can be used as a switch to perform desired calculations. The results are saved in a binary file root.psv. If you want to convert this file into ASCII, see the next section. If you have a large number of variables of type sdreport, calculating their values for the MCMC can appreciably slow down the calculations. To turn off these calculations during the -mcsave phase, use the option -nosdmcmc. Note: If you use this option and restart the MCMC calculations with the -mcr option, you must use -nosdmcmc as well. Otherwise, the program will try to read in the nonexistent histogram data.

AD Model Builder uses the Hessian to produce an (almost) multivariate normal distribution for the Metropolis-Hastings algorithm. It is not exactly multivariate normal, because the random vectors produced are modified to satisfy any bounds on the parameters.

There is also an option for using a fatter-tailed distribution. This distribution is a mixture of the multivariate normal and a fat-tailed distribution. It is invoked with the -mcprobe n option, where n is the proportion of the fat-tailed distribution in the mixture. A value of n between 0.05 and 0.10 is probably best.

2.2 Reading AD Model Builder binary files

Often, the data that AD Model Builder needs to save are saved in the form of a binary file, using the uistream and uostream classes. If these data consist of a series of vectors, all of which have the same dimension, they are often saved in this form, with the dimension saved at the top of the file and the vectors saved afterwards. It may be useful to convert these numbers into ASCII form, so they can be put into other programs, such as spreadsheets. The following code will read the contents of these binary files. You should call the program readbin.cpp. It should be a simple matter to modify this program for other uses.

#include <fvar.hpp> 
/* program to read a binary file (written using ADMB's uistream and 
   uostream stream classes) of vectors of length n. 
   It is assumed that the size n is stored at the top of 
   the file. There is no information about how many vectors 
   are stored, so we must check for an eof after each read. 
   To use the program, you type: 
 
            readbin filename 
*/ 
void produce_comma_delimited_output(dvector& v) 
{ 
 int i1=v.indexmin(); 
 int i2=v.indexmax(); 
 for (int i=i1;i<=i2;i++) 
 { 
   cout << v(i) << ","; 
 } 
 cout << endl; 
} 
 
int main(int argc, char * argv[]) 
{ 
 if (argc < 2) 
 { 
   cerr << " Usage: progname inputfilename" << endl; 
   exit(1); 
 } 
 uistream uis = uistream(argv[1]); 
 if (!uis) 
 { 
   cerr << " Error trying to open binary input file " 
      << argv[1] << endl; 
   exit(1); 
 } 
 int ndim; 
 uis >> ndim; 
 if (!uis) 
 { 
   cerr << " Error trying to read dimension of the vector" 
         " from the top of the file " 
      << argv[1] << endl; 
   exit(1); 
 } 
 if (ndim <=0) 
 { 
   cerr << " Read invalid dimension for the vector" 
         " from the top of the file " 
      << argv[1] << " the number was " << ndim << endl; 
   exit(1); 
 } 
 
 int nswitch; 
 cout << " 1 to see all records" << endl 
     << " 2 then after the prompts n1 and n2 to see all" << endl 
     << " records between n1 and n2 inclusive" << endl 
     << " 3 to see the dimension of the vector" << endl 
     << " 4 to see how many vectors there are" << endl; 
 cin >> nswitch; 
 dvector rec(1,ndim); 
 int n1=0; 
 int n2=0; 
 int ii=0; 
 switch(nswitch) 
 { 
 case 2: 
   cout << " Put in the number for the first record you want to see" 
      << endl; 
   cin >> n1; 
   cout << " Put in the number for the second record you want to see" 
      << endl; 
   cin >> n2; 
 case 1: 
   do 
   { 
    uis >> rec; 
    if (uis.eof()) break; 
    if (!uis) 
    { 
      cerr << " Error trying to read vector number " << ii 
         << " from file " << argv[1] << endl; 
      exit(1); 
    } 
    ii++; 
    if (!n1) 
    { 
      // comment out the one you dont want 
      //cout << rec << endl; 
      produce_comma_delimited_output(rec); 
    } 
    else 
    { 
      if (n1<=ii && ii<=n2) 
      { 
       // comment out the one you dont want 
       //cout << rec << endl; 
       produce_comma_delimited_output(rec); 
      } 
    } 
   } 
   while (1); 
   break; 
 case 4: 
   do 
   { 
    uis >> rec; 
    if (uis.eof()) break; 
    if (!uis) 
    { 
      cerr << " Error trying to read vector number " << ii 
         << " from file " << argv[1] << endl; 
      exit(1); 
    } 
    ii++; 
   } 
   while (1); 
   cout << " There are " << ii << " vectors" << endl; 
   break; 
 case 3: 
   cout << " Dimension = " << ndim << endl; 
 default: 
   ; 
 } 
}
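A companion writer can be sketched in plain C++, assuming the file layout is native binary with one int (the vector dimension) followed by the vectors' doubles back to back. This assumption may not match the exact byte layout produced by ADMB's uostream class, so treat the sketch as an illustration of the format described above, not as a drop-in replacement.

```cpp
#include <fstream>
#include <vector>

// Sketch of a writer for the layout described above, ASSUMING the
// file format is plain native binary: one int (the vector dimension)
// followed by the vectors' doubles back to back.
void write_vectors(const char* fname,
                   const std::vector<std::vector<double>>& vecs) {
    std::ofstream os(fname, std::ios::binary);
    int ndim = static_cast<int>(vecs[0].size());
    os.write(reinterpret_cast<const char*>(&ndim), sizeof ndim);
    for (const auto& v : vecs)
        os.write(reinterpret_cast<const char*>(v.data()),
                 ndim * sizeof(double));
}

// Matching reader: returns all vectors found after the dimension,
// stopping at end of file, as readbin.cpp does.
std::vector<std::vector<double>> read_vectors(const char* fname) {
    std::ifstream is(fname, std::ios::binary);
    int ndim = 0;
    is.read(reinterpret_cast<char*>(&ndim), sizeof ndim);
    std::vector<std::vector<double>> out;
    std::vector<double> rec(ndim);
    while (is.read(reinterpret_cast<char*>(rec.data()),
                   ndim * sizeof(double)))
        out.push_back(rec);
    return out;
}
```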

2.3 Convergence diagnostics for MCMC analysis

A major difficulty with MCMC analysis is determining whether or not the chain has converged to the underlying distribution. In general, it is never possible to prove that this convergence has occurred. In this section, we concentrate on methods that hopefully will detect situations when convergence has not occurred.

The default MCMC method employed in AD Model Builder takes advantage of the fact that AD Model Builder can find the mode of the posterior distribution and compute the Hessian at the mode. If the posterior distribution is well approximated by a multivariate normal centered at the mode, with covariance matrix equal to the inverse of the Hessian, this method can be extremely efficient for problems with many parameters, especially when compared to simpler methods, such as the Gibbs sampler. The price one pays for this increased efficiency is that the method is not as robust as the Gibbs sampler, and for some problems, it will perform much more poorly than the Gibbs sampler.

As an example of this poor performance, we consider a simple three-parameter model developed by Vivian Haist to analyze Bowhead whale data.

The data for the model consist of total catches between 1848 and 1993, together with an estimate of the biomass in 1988, and an estimate of the change in relative biomass between 1978 and 1988.

DATA_SECTION 
 init_vector cat(1848,1993) 
 
PARAMETER_SECTION 
 init_bounded_number k(5000,40000,1) 
 init_bounded_number r(0,0.10,1) 
 init_bounded_number p(0.5,1,2) 
 number delta; 
 vector bio(1848,1994); 
 likeprof_number fink 
!! fink.set_stepsize(.003); 
!! fink.set_stepnumber(20); 
 sdreport_number finr 
 sdreport_number finp 
 objective_function_value f 
PROCEDURE_SECTION 
 if (initial_params::mc_phase) 
 { 
   cout << k << endl; 
   cout << r << endl; 
   cout << p << endl; 
 } 
 bio(1848)=k*p; 
 
 for (int iy=1848; iy<=1993; iy++) 
 { 
   dvariable fpen1=0.0; 
   bio(iy+1)=posfun(bio(iy)+r*bio(iy)*(1.-(bio(iy)/k)),100.0,fpen1); 
   dvariable sr=1.- cat(iy)/bio(iy); 
   dvariable kcat=cat(iy); 
    f+=1000*fpen1; 
   if(sr< 0.05) 
   { 
     dvariable fpen=0.; 
     kcat=bio(iy)*posfun(sr,0.05,fpen); 
     f+=10000*fpen; 
//   cout << " kludge "<<iy <<" "<<kcat<<" "<<cat(iy)<<" "<<fpen<<endl; 
   } 
   bio(iy+1)-=kcat; 
 } 
 finr=r; 
 fink=k; 
 finp=p; 
 delta=(bio(1988)-bio(1978))/bio(1978); 
 f+=log(sqrt(2.*PI)*500)+square(bio(1988)-7635.)/(2.*square(500)); 
 f+=log(sqrt(2.*PI)*.03)+square(delta-0.15)/(2.*square(.03));

This is a biomass dynamic model, where the biomass is assumed to satisfy the difference equation
Bi+1 = Bi + r ∗ Bi(1 − Bi∕k) − Ci
(2.1)

For this formulation, there is no guarantee that the biomass will remain positive, so the posfun function has been used in the program to ensure that this condition will hold. This is a very “data-poor” design.
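The idea behind posfun can be sketched as follows. The exact formula ADMB uses may differ (check the ADMB source); the point is that above the threshold eps the argument passes through unchanged, while below it a smooth, strictly positive substitute is returned and a penalty accumulates, steering the minimizer back toward the feasible region.

```cpp
// Sketch of the idea behind ADMB's posfun (the exact form in ADMB may
// differ): above eps, the argument passes through unchanged; below eps,
// a smooth positive substitute is returned, the substitute matches x at
// x = eps, and a quadratic penalty accumulates in pen.
double posfun_sketch(double x, double eps, double& pen) {
    if (x >= eps) return x;
    pen += 0.01 * (x - eps) * (x - eps);
    return eps / (2.0 - x / eps); // positive for all x < eps
}
```

In the model above, the accumulated penalty is multiplied by a large weight and added to the objective function, as in f += 1000*fpen1.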

The model was fit to the data and the standard MCMC analysis was performed for it. The results were compared to an MCMC analysis performed with the Gibbs sampler. It was found that the Gibbs sampler performed better.

It is not difficult to determine why the MCMC performed so poorly. The estimated covariance matrix for the parameters is shown below. To four significant figures, the correlation between r and k is -1.0000. Thus, the Hessian matrix is almost singular.

  index  name     value      std.dev      1       2       3 
    1    k     1.0404e+04  8.8390e+05   1.0000 
    2    r     4.8838e-02  9.0337e+00  -1.0000  1.0000 
    3    p     5.7293e-01  3.6946e+00   0.9998 -0.9998  1.0000

If the posterior distribution were exactly normally distributed, then the Hessian would be constant, i.e., not depend on the point at which it is calculated, and its use would produce the most efficient MCMC procedure. However, in nonlinear models, the posterior distribution is not normally distributed, so the Hessian changes as we move away from the mode. Using an almost singular Hessian can make things perform very badly, as in the present case.

To deal with almost singular Hessians, we have added the -mcrb N option. This option reduces the amount of correlation in the Hessian, while leaving the standard deviations fixed. The number N should be between 1 and 9. The smaller the number, the more the correlation is reduced. For this example (see Figure 2.2), a value of 3 seemed to perform well.
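The effect described, reducing correlation while leaving the standard deviations fixed, can be illustrated for a 2x2 covariance matrix. This is only a sketch of the idea, not ADMB's actual -mcrb implementation: scaling the off-diagonal covariance by a factor between 0 and 1 shrinks the correlation but leaves the diagonal (and hence the standard deviations) untouched.

```cpp
#include <cmath>

// Illustration of the idea behind -mcrb (not ADMB's actual code):
// shrink the off-diagonal covariance by a factor 0 < s < 1, which
// reduces the correlation while keeping the standard deviations fixed.
void reduce_correlation(double cov[2][2], double s) {
    cov[0][1] *= s;
    cov[1][0] *= s;
}
```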


PIC

Figure 2.2:


Chapter 3
A Forestry Model: Estimating the Size Distribution of Wildfires

3.1 Model description

This example highlights two features of AD Model Builder: 1) the use of a numerical integration routine within a statistical parameter estimation model, and 2) the use of the ad_begin_funnel mechanism to reduce the size of the temporary file storage required. It also provides a performance comparison between AD Model Builder and Splus.

This problem investigates a model that predicts a relationship between the size and frequency of wildfires. It is assumed that the probability of observing a wildfire in size category i is given by Pi, where
ln(Pi) = ln(Si − Si+1) − ln(S1).

If fi is the number of wildfires observed to lie in size category i, the log-likelihood function for the problem is given by
l(τ,ν,β,σ) = Σi fi [ln(Si − Si+1) − ln(S1)]
(3.1)

where Si is defined by the integral
Si = ∫_{−∞}^{∞} exp{−z²/2 + τ(−1 + exp(−ν ai^β exp(σz)))} dz
(3.2)

The parameters τ, ν, β, and σ are functions of the parameters of the original model, and don’t have a simple interpretation. Fitting the model to data involves maximizing the above log-likelihood (3.1). While the gradient can be calculated (in integral form), coding it is cumbersome. Numerically maximizing the log-likelihood without specifying the gradient is preferable.
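As a sketch of how Si in (3.2) might be approximated numerically, the following plain C++ function applies the trapezoid rule over a truncated range (a Gauss-Hermite rule would be the natural refinement, and is closer to what a production numerical integration routine would use). The function name, the truncation bounds, and the default step count are choices made here for illustration, not part of the original model code.

```cpp
#include <cmath>

// Trapezoid-rule approximation to the integral (3.2):
//   Si = integral over z of
//        exp(-z^2/2 + tau*(-1 + exp(-nu * a_i^beta * exp(sigma*z))))
// truncated to [-8, 8], where the exp(-z^2/2) factor makes the tails
// negligible.
double S_integral(double tau, double nu, double beta,
                  double sigma, double a_i, int n = 2000) {
    double lo = -8.0, hi = 8.0, h = (hi - lo) / n, sum = 0.0;
    for (int j = 0; j <= n; ++j) {
        double z = lo + j * h;
        double g = std::exp(-0.5 * z * z
            + tau * (-1.0 + std::exp(-nu * std::pow(a_i, beta)
                                     * std::exp(sigma * z))));
        sum += (j == 0 || j == n) ? 0.5 * g : g;
    }
    return sum * h;
}
```

A quick sanity check: with τ = 0 the integrand reduces to exp(−z²/2), whose integral is √(2π) ≈ 2.5066.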

The parameter β is related to the fractal dimension of the perimeter of the fire. One hypothesis of interest is that β = 2/3, which is related to hypotheses about the nature of the mechanism by which fires spread. The AD Model Builder code for the model follows.

DATA_SECTION 
int time0 
init_int nsteps 
init_int k 
init_vector a(1,k+1) 
init_vector freq(1,k) 
int a_index; 
number sum_freq 
!! sum_freq=sum(freq); 
PARAMETER_SECTION 
 init_number log_tau 
 init_number log_nu 
 init_number log_beta(2) 
 init_number log_sigma 
 sdreport_number tau 
 sdreport_number nu 
 sdreport_number sigma 
 sdreport_number beta 
 vector S(1,k+1) 
 objective_function_value f 
INITIALIZATION_SECTION 
 log_tau 0