# Testing

Now we have a simulation model that represents the data set closely enough. Does this mean that we have a reliable model of the system, which we can use for forecasting or management? Did we really capture the essence of the system behavior, and do we really understand how the system works, or have we simply tweaked a set of parameters to produce the needed output? Are we representing the system and the processes in it or, as in empirical models, do we see only an artifact of the data set used?

We build process-based models with the presumption that they describe the guts of the system and therefore are general enough to be reapplied in different conditions, since they actually describe how the system works. This would indeed be true if all the parameters in the process formulations could be measured in experiments and then simply substituted into the model. However, such data are usually nonexistent or imprecise for at least some of the parameters.

The solution we found was to approximate the parameter values based on the data we had about the dynamics of state variables, or flows; that was the model calibration procedure. We were solving an inverse problem: finding the parameters based on the dynamics of the unknowns. This would be fine if we could really solve that problem and find the exact values for the parameters. However, in most cases that is also impossible, and, instead, we are finding approximate solutions that come from model fitting. But then how is this different from the fitting we do when we deal with empirical models? In that case, we also had a curve equation with unknown coefficients, which we determined empirically by finding the best combination of parameters that made the model output as close as possible to the data.

The only difference is that instead of some kind of generic equation in the empirical models (e.g., a polynomial of some form), in process-based models we have particular equations that have some ecological meaning. These equations display certain behavior by themselves, no matter what parameters are inserted. A polynomial can generate pretty much arbitrary dynamics as long as the right coefficients are chosen. However, a classic predator-prey system will always produce a certain type of dynamics, no matter what coefficients we insert. So we may conclude that, to a large extent, we are building a good model as long as we choose the right dynamic equations to describe our system.
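As a rough illustration of this point, the sketch below integrates the classic Lotka-Volterra predator-prey equations for three arbitrary coefficient sets and counts the peaks in the prey trajectory. The coefficient values, the initial state, the time step, and all function names are assumptions made for the example; they are not taken from any particular study.

```python
def lotka_volterra(state, a, b, c, d):
    """Right-hand side of the classic predator-prey equations."""
    prey, predator = state
    return (a * prey - b * prey * predator,
            -c * predator + d * prey * predator)


def rk4_step(state, dt, params):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(s + h * k for s, k in zip(base, slope))

    k1 = lotka_volterra(state, *params)
    k2 = lotka_volterra(shifted(state, k1, dt / 2), *params)
    k3 = lotka_volterra(shifted(state, k2, dt / 2), *params)
    k4 = lotka_volterra(shifted(state, k3, dt), *params)
    return tuple(
        s + dt / 6 * (p + 2 * q + 2 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )


def prey_peaks(params, state=(10.0, 5.0), dt=0.01, t_end=100.0):
    """Count local maxima of the prey trajectory (a crude sign of cycling)."""
    prey = [state[0]]
    for _ in range(int(t_end / dt)):
        state = rk4_step(state, dt, params)
        prey.append(state[0])
    return sum(
        1 for i in range(1, len(prey) - 1)
        if prey[i - 1] < prey[i] > prey[i + 1]
    )


# Three hypothetical coefficient sets (a, b, c, d), chosen only for illustration.
parameter_sets = [
    (1.0, 0.1, 1.5, 0.075),
    (0.5, 0.02, 0.5, 0.01),
    (2.0, 0.5, 1.0, 0.2),
]
peak_counts = [prey_peaks(p) for p in parameter_sets]
print(peak_counts)
```

The period and amplitude differ from one coefficient set to another, but in each case the prey population keeps cycling: the qualitative behavior comes from the form of the equations rather than from the fitted values.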

On top of the basic dynamic equations we overlay the many other descriptions for the processes that need to be included in the model. These may be the limiting factors, describing the modifying effect of temperature, light, or other external conditions. There may be some other details that we wish to add to the system. However, if these processes are not studied in an experiment, and if the related coefficients are not measured, their role in the model is not any different from that of the coefficients that we have in an empirical model. In both cases we figure out their values based on a time series of model output; in both cases the values are approximate and uncertain. They are merely the best values found so far: we can never be sure that a better parameter set does not exist.

So the bottom line is that there is a good deal of empiricism in most process-based models. The more parameters we have estimated in the calibration process, the more empiricism is involved, and the less applicable the model will be in situations outside the existing data range. How can we make sure that we have really captured the essence of the system dynamics and can reproduce the system behavior beyond the domain that we have already studied?

To answer all these questions, the model needs to undergo a process of vigorous testing. There is not, and probably never will be, a definitive procedure for model testing and comparison. The obvious reason is that models are built for various purposes; their goals may be very different. Moreover, these goals may easily change when the project is already underway. There is no reason why goal setting should be left out of the iterative modeling process. As we start generating new knowledge and understanding with a model, its goals may very well change. We may start asking new questions and will need to modify the model, even though it has not yet been brought to perfection.

Besides, most ecological systems are open, which makes modeling them similar to shooting at a moving target. While we study the system and build a model of it, the system is already evolving. It evolves even more when we start administering control, when we try to manage the ecosystem. As a result, models can very well become obsolete even before they are used to produce results. We are modeling the system as it was until a year ago, but during the last year, because of some external conditions (e.g., global climate change), the system has already evolved and the model is no longer relevant.

Nevertheless, there are several model testing procedures that have become part of good modeling practice and should certainly be encouraged. Ironically, in various applications you may find the names for these processes used interchangeably, which can only add to the confusion. Model testing is probably a more neutral and general term.

One way to test the model is to compare its output with some independent data set, which has not been used previously for model calibration. This is important to make sure that the model output is not an artifact of the model formalization and that the processes in the model indeed represent reality and are not just empirical constructs based on the calibrated parameters. This process is called validation. There is no agreed-upon procedure for model validation (called verification in some texts), especially when models become complex and difficult to parametrize and analyze.

One way is to run the model for spatial or temporal domains that were not used for building the model. We can run the model for places and time periods for which we either did not have data, or have deliberately set the data aside and not used it for model calibration. We may have the luxury of waiting until new data sets are acquired, making our predictions first and then comparing them to what we measure, or we may set aside a part of the data set that is already available and pretend that we do not know it while constructing the model. Then, when the model is built and calibrated based on the remaining data, we bring the other portion of the data to light and see if we have matched it equally well. This time we do not do any calibration and we do not tweak model parameters or functions; we only compare and estimate the model error. If the error is small, we may conclude that the model is good and may have a certain confidence in applying the model for future predictions.
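The split between calibration and validation data can be sketched in a few lines. In the example below a logistic growth curve stands in for the process-based model, the "observations" are synthetic (generated from the same curve with a deterministic perturbation), and the growth rate is calibrated by a simple grid search. All names, the 20/10 split, and the parameter ranges are assumptions made for illustration only.

```python
import math


def logistic(t, r, capacity=100.0, n0=5.0):
    """Closed-form logistic growth, standing in for a process-based model."""
    return capacity / (1.0 + (capacity / n0 - 1.0) * math.exp(-r * t))


def rmse(modeled, observed):
    """Root-mean-square error between model output and data."""
    squared = [(m - o) ** 2 for m, o in zip(modeled, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic observations: a "true" rate of 0.3 plus a deterministic wiggle.
times = list(range(30))
observed = [logistic(t, 0.3) + 2.0 * math.sin(1.7 * t) for t in times]

# Set the last ten points aside; they are never shown to the calibration.
calib_t, calib_obs = times[:20], observed[:20]
valid_t, valid_obs = times[20:], observed[20:]


def calibrate(ts, obs):
    """Grid search for the growth rate that minimizes RMSE on the given data."""
    candidates = [0.05 + 0.01 * i for i in range(96)]  # 0.05 ... 1.00
    return min(candidates,
               key=lambda r: rmse([logistic(t, r) for t in ts], obs))


r_fit = calibrate(calib_t, calib_obs)

# Validation: no further tuning, only a comparison with the held-out data.
calib_error = rmse([logistic(t, r_fit) for t in calib_t], calib_obs)
valid_error = rmse([logistic(t, r_fit) for t in valid_t], valid_obs)
print(f"r = {r_fit:.2f}, calibration RMSE = {calib_error:.2f}, "
      f"validation RMSE = {valid_error:.2f}")
```

The essential design choice is that `r_fit` is frozen before the held-out points are touched. If `valid_obs` were passed to `calibrate`, the second error would no longer be a validation result.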

In reality, unfortunately, it rarely happens like this. First of all, the temptation to use all the available data when building the model is too strong. As a result, we usually do not have sufficient data sets for a true validation. Besides, even when the validation is undertaken, in most cases it proves to be less accurate than the calibration, and therefore the researcher is likely to jump into model modifications and improvements to make the validation result look better. However, this immediately defeats the purpose of validation. Once you start using the validation data set for model adjustments, you have abandoned your validation attempts and gone back to further calibration.

Actually, this has become quite standard in many ongoing modeling projects and is called data assimilation. Special procedures are designed to constantly update and improve models based on the incoming flow of new experimental data. This becomes crucial for a complex open system (which is most often the case for ecological and socioeconomic systems), which is always changing and evolving. As a result, the data set collected during one period and used for calibration may not represent quite the same system as the one that produced the other data set intended for validation. We might be calibrating a model of one system and then trying to validate the same model, but for a different system.

Another important step in model analysis is 'verification'. A model is verified when it is scrupulously checked for all sorts of internal inconsistencies, errors, and bugs. These can be in the equations chosen, in the units used, or in links and connections. They can be simply programming bugs in the code that is used to solve the model on the computer. They may be conceptual, as when wrong data sets are used to drive the model. Once again, there is hardly a prescribed method to weed them out. Just check and recheck. Run the model and rerun it. Test it and test again.

One efficient method of model testing is to run the model with extreme values of forcing functions and parameters. There are always certain ranges within which the forcing functions can vary. Suppose we are talking about temperature. Make the temperature as high as it can get in this system, or as low as it can be. See what happens to the model. Will it still perform reasonably well? Will the output stay within certain plausible values? Will the model crash? If so, try to figure out why. Is it something you can explain? If yes, then the model can probably still be salvaged, and you may simply need to remember that the forcing function should stay within certain allowed limits. If the behavior cannot be explained, keep digging; most likely there is something wrong.
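A test of this kind is easy to automate. The sketch below assumes a hypothetical one-variable biomass model with a bell-shaped temperature limitation; the model, its coefficients, the list of temperatures, and the plausibility bounds are all invented for illustration. Each run is classified as plausible, implausible, or crashed.

```python
import math


def temperature_factor(temp_c, t_opt=20.0, width=10.0):
    """Hypothetical bell-shaped temperature limitation, between 0 and 1."""
    return math.exp(-((temp_c - t_opt) / width) ** 2)


def simulate_biomass(temp_c, days=365, b0=1.0, growth=0.2,
                     mortality=0.05, capacity=100.0):
    """Daily Euler steps of temperature-limited logistic growth with mortality."""
    biomass = b0
    for _ in range(days):
        gain = (growth * temperature_factor(temp_c)
                * biomass * (1.0 - biomass / capacity))
        biomass += gain - mortality * biomass
    return biomass


def stress_test(temperatures, lower=0.0, upper=100.0):
    """Run the model at each forcing value and classify the outcome."""
    report = {}
    for temp in temperatures:
        try:
            final = simulate_biomass(temp)
        except (OverflowError, ZeroDivisionError, ValueError) as exc:
            report[temp] = ("crash", repr(exc))
            continue
        plausible = math.isfinite(final) and lower <= final <= upper
        report[temp] = ("ok" if plausible else "implausible", final)
    return report


# Extremes well beyond the usual range, plus the assumed optimum of 20 degrees.
report = stress_test([-40.0, 0.0, 20.0, 45.0, 60.0])
for temp, (status, value) in report.items():
    print(temp, status, value)
```

Any run flagged as implausible or crashed is a starting point for the questions above: either the behavior can be explained and the forcing range documented, or there is an error to find.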

Another important check is based on first principles, such as mass and energy conservation. Make sure that there is a mass balance in the model, so that nothing gets created from nowhere and nothing is lost.
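One way to make such a check routine is to keep a running account of everything that enters and leaves the model and compare it with the change in the stocks. The sketch below does this for a hypothetical two-compartment nutrient model (water and sediment); the compartments, rate constants, and names are assumptions chosen only to show the bookkeeping.

```python
def run_nutrient_model(steps=1000, dt=0.1):
    """Two-compartment nutrient model with explicit mass bookkeeping."""
    water, sediment = 10.0, 50.0          # initial stocks
    initial_total = water + sediment
    total_in = 0.0                        # cumulative external loading
    total_out = 0.0                       # cumulative export from the system

    for _ in range(steps):
        loading = 2.0 * dt                # outside -> water
        settling = 0.3 * water * dt       # water -> sediment
        release = 0.05 * sediment * dt    # sediment -> water
        outflow = 0.1 * water * dt        # water -> outside

        water += loading + release - settling - outflow
        sediment += settling - release
        total_in += loading
        total_out += outflow

    # Mass balance: final stocks must equal initial stocks plus net input.
    residual = (water + sediment) - (initial_total + total_in - total_out)
    return water, sediment, residual


water, sediment, residual = run_nutrient_model()
print(f"water = {water:.2f}, sediment = {sediment:.2f}, "
      f"residual = {residual:.2e}")
```

The residual should differ from zero only by floating-point rounding. A flow that is subtracted from one compartment but never added to another, or a unit mismatch between flows, shows up as a residual that grows with the length of the run.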

The bottom line of all this testing is that there is no perfect model. It is hardly possible to get a perfect calibration, and the validation results will likely be even worse. No matter how long you spend debugging the model and the code, there will always be another bug, another imperfection. Does this mean that this is all futile?

By no means! As long as you get new understanding of the system, and as long as the model helps communicate understanding to others and helps manage and control the system, you are on the right path, and the efforts will be fruitful. 'Any model that is useful is a good model'.

See also: Ecological Models, Optimization; Empirical Models; Model Development and Analysis; Parameters; Sensitivity and Uncertainty; Statistical Prediction.