## Calibration

The next step in analyzing the model is to compare its output with the other data available about the system. In many cases we have better data about the dependent variables in the model than about the independent variables or parameters. For instance, the US Geological Survey (USGS) provides extensive data sets for water flows measured over a network of river gages. For a stream hydrology model that is meant to reproduce river flow dynamics, we will most likely have good information about the flows but poor data about the hydrologic coefficients, such as infiltration, transpiration, and evaporation rates. By solving an inverse problem we determine the parameter values for which the model output is as close as possible to the observed data. This process of refining a model in an attempt to match an existing data set is called 'model calibration'. We compare the model output to the data points and change the model parameters or structure to get as close a fit as possible.

Sensitivity analysis may have already told us which parameters need to be modified to produce a certain change in the model trajectories. Now we actually change them so that the trajectory of the model output matches the data 'closely enough'. How closely? That depends on our level of confidence in the data and on the goals of the study. It also depends on the model that we built. Sometimes it is very difficult to produce the change in the model output that is needed to bring the trajectories closer to the data points. Sometimes it is simply impossible, and we have to find other ways to fix the model: digging into its structure, or realizing that we have misinterpreted something in the conceptual model or chosen the wrong time or space scales. Modeling is an iterative process, and it is perfectly fine to go back and re-evaluate our assumptions and formalizations.

Note that the data set used for calibration is, in a way, also a model of the observed process. The data are also a simplification of the real process. They may contain errors and are never perfect. Besides, they have been collected with a certain goal in mind, which does not necessarily match the goal of the newly built numerical model. We may call these monitoring results an experimental model or a 'data model'. In the process of calibration we are actually comparing two models and modifying one of them (the simulation) to better match the other one (the data).

When comparing models, it makes sense to think of a measure of their closeness, or a measure of the fit of the simulation model to the data model. This comparison is important both for calibration and for further testing of the model (validation, verification). In all these analyses we want to see how far the model results deviate from the other information we have about the system (both qualitative and quantitative). We may call this measure the 'error model'. There are very different ways to represent this error, but they all share one feature: they represent the distance between two models, in this case the data model and the formal model. The simplest error model is 'eyeballing', or visual comparison: you simply look at the graphs and decide whether they are close enough or not.

However, this may become difficult as we get closer to the target, or when the graph is closer in one time range for one set of parameters and closer in a different time range for another set of parameters. In those cases visual comparisons can fail. Mathematical formulas can then become useful. One simple formula for the error model is

where $y_i$ are the data points, and $x_i$ are the values in the model output that correspond in time or space to the data points.

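Very often the metric used to compare the models is the Pearson product-moment correlation coefficient:

$$
r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}},
$$

where $\bar{x}$ and $\bar{y}$ are the means of the model values and of the data points, or the '$R^2$ value', which is equal to $r^2$.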
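As an illustration of such error models, here is a minimal sketch in Python. It assumes the sum of squared deviations as the simple error measure (a common choice, not necessarily the formula intended above), and the function names and the two short series are hypothetical.

```python
import math

def sum_squared_error(model, data):
    """Sum of squared deviations between paired model values x_i and data y_i."""
    return sum((x - y) ** 2 for x, y in zip(model, data))

def pearson_r(model, data):
    """Pearson product-moment correlation coefficient of two paired series."""
    n = len(data)
    mean_x = sum(model) / n
    mean_y = sum(data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(model, data))
    var_x = sum((x - mean_x) ** 2 for x in model)
    var_y = sum((y - mean_y) ** 2 for y in data)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observations and model output at the same time points.
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [3.0, 5.0, 7.0, 9.0]  # right shape, constant bias of +1

error = sum_squared_error(simulated, observed)
r = pearson_r(simulated, observed)
r_squared = r ** 2
print(error, r, r_squared)
```

In this example the simulated series has a constant bias, so $r$ equals 1 even though the squared error is 4. A correlation coefficient alone cannot detect such an offset, which is one reason to look at more than one metric.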

There may be many other ways to estimate the error model. For example, a model performance index was proposed that incorporates some 12 metrics to estimate the deviation between the two time series (Table 1).

Many other statistical tools (e.g., in Excel or in statistical software packages) can be used to refine these comparisons further.

Calibrating empirical models differs from calibrating process-based models. In empirical models, we rely entirely on the information that we have in the data sets. We come up with some type of equation and then quite mechanically adjust the parameters in an attempt to reproduce the data as well as possible. All the information we have about the system is in the data, and the parameters can usually take any values as long as the error is minimal.

In process-based models calibration is different, since we are restricted by the ecological, physical, or chemical meaning of the parameters that we change. Besides, there are usually some estimates for the size of the parameters: they are rarely measured precisely, but at least the order of magnitude or a range is usually known. Moreover, other factors may play a role, such as our confidence in the available estimates for a parameter and the sensitivity of the model to that parameter. These are important considerations in the calibration process.

Table 1 Available variable tests in the MPI software package

| Test | Description |
| --- | --- |
| BOUNDS | Percentage of points falling into a reference interval |
| WBOUNDS | Like BOUNDS, weighted according to distance of outliers from nearest limit of interval |
| CINT | Proportion of points falling into 95% confidence interval of reference data |
| WCINT | Like CINT, weighted according to distance of outliers |
| THEIL | Theil's (1961) coefficient of inequality between paired data |
| DBK | Result of simultaneous test of slope = 1, intercept = 0 in linear regression of observed vs. simulated data, according to Dent and Blackie (1979) |
| | Steady state, done as piecewise linear regression and test of slopes |
| | Monotonic increase, done as piecewise linear regression and test of slopes |
| | Monotonic decrease, done as piecewise linear regression and test of slopes |
| | Known trend, intended as slope of linear regression line |
| | Compares the structure of autocorrelation of simulated and observed data to identify common frequencies of oscillation, or looks for specified periods in the simulated data |
| ERRCOMP | Concordance between the simulated data error composition and specified admissible percentages of mean, variance, and random error |
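To make two of the simpler tests in Table 1 concrete, here is a minimal Python sketch of a BOUNDS-style percentage and a Theil-type inequality coefficient. The function names, the example numbers, and the exact form of the coefficient (one commonly quoted version, the root-mean-square difference divided by the sum of the root-mean-square values of the two series) are assumptions; the MPI package may define these tests differently.

```python
import math

def bounds_percentage(simulated, lower, upper):
    """Percentage of simulated points that fall inside [lower, upper]."""
    inside = sum(1 for s in simulated if lower <= s <= upper)
    return 100.0 * inside / len(simulated)

def theil_inequality(simulated, observed):
    """Assumed form of Theil's coefficient: 0 means a perfect match."""
    n = len(observed)
    rms_diff = math.sqrt(
        sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n
    )
    rms_sim = math.sqrt(sum(s ** 2 for s in simulated) / n)
    rms_obs = math.sqrt(sum(o ** 2 for o in observed) / n)
    return rms_diff / (rms_sim + rms_obs)

# Hypothetical paired series and reference interval.
simulated = [1.0, 2.5, 3.0, 7.0]
observed = [1.0, 2.0, 3.0, 4.0]

pct = bounds_percentage(simulated, lower=0.0, upper=5.0)  # 3 of 4 points inside
u = theil_inequality(simulated, observed)
print(pct, u)
```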

At the heart of any calibration is an optimization problem: we seek a minimum of the error model. In most cases, some parameters have known values while others are only estimated within a certain range. We call the latter 'free' parameters. They are the ones we change in the model in order to minimize the error. To perform the optimization we first formulate a 'goal function' (also called an 'objective function'). We then try to make this function as large or as small as possible by changing the parameters involved. In the case of calibration the goal function is the error model E = f(P, C, R), described as a function of the parameter vector P, the vector of initial conditions C, and the vector of restrictions R. We look for a minimum, min E, over the space of the free parameters P and initial conditions C, making sure that the restrictions R (such as a requirement that all state variables are positive) hold. Rarely does a model allow this task to be solved analytically. It usually requires a numerical procedure and fairly sophisticated software.
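The formulation above can be sketched numerically. The following Python example is only an illustration under stated assumptions: the first-order decay model, the data values, the allowed ranges, and the coarse-to-fine grid search are all hypothetical stand-ins. The free parameter P is the rate `k`, the initial condition C is `c0`, and the restrictions R are represented simply as allowed ranges for both.

```python
import math

def simulate(k, c0, times):
    """Hypothetical process model: first-order decay x(t) = c0 * exp(-k * t)."""
    return [c0 * math.exp(-k * t) for t in times]

def error(k, c0, times, data):
    """Goal function E: sum of squared deviations between model and data."""
    return sum((x - y) ** 2 for x, y in zip(simulate(k, c0, times), data))

def calibrate(times, data, k_range, c0_range, steps=20, rounds=6):
    """Coarse-to-fine grid search for min E inside the allowed ranges."""
    (k_lo, k_hi), (c_lo, c_hi) = k_range, c0_range
    best = None  # (error, k, c0) of the best point seen so far
    for _ in range(rounds):
        for i in range(steps + 1):
            k = k_lo + (k_hi - k_lo) * i / steps
            for j in range(steps + 1):
                c0 = c_lo + (c_hi - c_lo) * j / steps
                e = error(k, c0, times, data)
                if best is None or e < best[0]:
                    best = (e, k, c0)
        # Shrink the search box around the best point, never leaving
        # the original ranges (the restrictions).
        _, k_b, c_b = best
        dk = (k_hi - k_lo) / steps
        dc = (c_hi - c_lo) / steps
        k_lo, k_hi = max(k_range[0], k_b - dk), min(k_range[1], k_b + dk)
        c_lo, c_hi = max(c0_range[0], c_b - dc), min(c0_range[1], c_b + dc)
    return best

# Hypothetical observations, roughly following k = 0.3 and c0 = 10.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
data = [10.2, 7.3, 5.6, 4.0, 3.1, 2.2]

best_error, best_k, best_c0 = calibrate(
    times, data, k_range=(0.0, 1.0), c0_range=(5.0, 15.0)
)
print(best_error, best_k, best_c0)
```

A grid search is used here only because it is easy to read. Its cost grows quickly with the number of free parameters, which is one reason dedicated optimization software is normally employed.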

There are different ways to solve this problem. One approach is to do it manually with the so-called 'trial and error' method or 'educated guess' approach. The model is run, then a parameter is changed, then the model is rerun, the output is compared, then the same or another parameter is changed, and so on. It may seem tiresome and boring, but this process is actually extremely useful for understanding how the system works. By playing with the parameters you learn how they affect the output (as in the sensitivity analysis stage), but you also come to understand the synergistic effects that parameters may have. In some cases you get quite unexpected behavior, and it takes some thought and analysis to explain to yourself how and why a specific change in parameters had this effect. If you cannot find any reasonable explanation, chances are that there is a bug in the model. A closer look at the equations may solve the problem: something may have been missed or entered with the wrong sign, or some effect was not accounted for.
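The run, change, rerun, compare cycle can be mimicked in a few lines of Python. This is a minimal sketch under assumptions: the logistic growth model, its parameter values, and the sequence of guesses are hypothetical, and the 'observations' are generated by the same model so that the correct answer is known.

```python
def run_model(growth_rate, capacity, n0=1.0, steps=10):
    """Hypothetical logistic growth model advanced with a unit time step."""
    n, output = n0, []
    for _ in range(steps):
        output.append(n)
        n += growth_rate * n * (1.0 - n / capacity)
    return output

def squared_error(output, data):
    """Sum of squared deviations between model output and data."""
    return sum((x - y) ** 2 for x, y in zip(output, data))

# Stand-in for observations: the same model run with growth_rate = 0.5.
observed = run_model(growth_rate=0.5, capacity=50.0)

# Each guess plays the role of one manual trial: change the parameter,
# rerun the model, and record how far the output is from the data.
trial_log = []
for guess in [0.2, 0.4, 0.6, 0.5]:
    e = squared_error(run_model(growth_rate=guess, capacity=50.0), observed)
    trial_log.append((guess, e))
    print(f"growth_rate={guess:.1f}  error={e:.3f}")

best_guess, best_error = min(trial_log, key=lambda item: item[1])
```

Keeping a log of the trials, as `trial_log` does here, makes it easier to see how the error responds to each change and to notice behavior that needs an explanation.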

In addition to the educated guess approach, there are also formal mathematical methods that are available for calibration. They are usually based on the solution of the so-called optimization problem.

Some modeling systems have built-in functionality to solve the optimization problem and do the curve fitting for models. One such package is 'Madonna'. One big advantage of Madonna is that it can take Stella equations almost as is and run them under its own shell. Madonna also has a nice graphical user interface of its own, so you may as well start putting your model together directly in Madonna if you expect some optimization to be needed.

The calibration problem may not have a unique solution. There may be several parameter vectors P that deliver the same or almost the same minima to the optimization task. In that case it may be unclear what parameters to choose for the model. Other considerations and restrictions may be used to make the decision.
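A small Python sketch shows how such non-uniqueness can arise. It assumes a deliberately simple hypothetical model in which two parameters, `a` and `b`, enter only through their product, so the data cannot tell them apart; the data values and the admissible range for `b` are also made up for illustration.

```python
def model(a, b, times):
    """Hypothetical model in which only the product a * b affects the output."""
    return [a * b * t for t in times]

def squared_error(output, data):
    """Sum of squared deviations between model output and data."""
    return sum((x - y) ** 2 for x, y in zip(output, data))

times = [0.0, 1.0, 2.0, 3.0]
observed = [0.0, 1.1, 1.9, 3.0]

# Four different parameter vectors with the same product a * b = 1.
candidates = [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0), (4.0, 0.25)]
errors = [squared_error(model(a, b, times), observed) for a, b in candidates]
print(errors)  # all four errors are identical

# An additional restriction, here an assumed independent estimate that
# b lies between 0.4 and 0.6, is what singles out one candidate.
b_range = (0.4, 0.6)
admissible = [(a, b) for a, b in candidates if b_range[0] <= b <= b_range[1]]
print(admissible)
```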

If we have done our best to find the values for all the parameters in the simulation model and the error is still unacceptably large, then something is wrong in one of the models that we are comparing. Either the conceptual model needs to be revised (the structure changed or the equations modified), or the chosen scales were incorrect and we need to reconsider the spatial or temporal resolution. Alternatively, the data are wrong, which also happens quite often and cannot be dismissed.

To conclude, there are different ways to describe systems by means of models. There are different models that may be built. The process of adjustment of one model to match the output from another model is called calibration. This is probably the most general definition. In most cases we would speak of calibration as the process of fitting the model output to the available data points or 'curve-fitting'. In this case it is the data model that is used to calibrate the mathematical model.

Note that there is hardly any reason to always give preference to the data model. The uncertainty in the data model may be as high as the uncertainty in the simulation model. The mathematical model may in fact cover areas that are not yet represented in the data at all. However, in most cases data models will precede mathematical models, and, at least initially, we assume that the data models convey our knowledge about the system.

Empirical models are based entirely on data models and may be considered 'extensions' of them. They attempt to generalize the available data and present them in a different form. Process-based models, in addition to knowledge about the modeled system, may also employ information about similar systems studied elsewhere, or they may incorporate theoretical knowledge about the processes involved. In a way, these process-based models can be even better than the data available for the particular system that is modeled. Therefore, we may hope that process-based models will perform better outside the data domains that were used for their calibration. So it may be easier to apply process-based models to other similar systems than empirical models, which would require a whole new calibration effort.