Population dynamics studies the behavior of a given group of living organisms (population) over time, usually taking into account abiotic factors and possibly other populations in the environment. For example, one might study the population of phytoplankton in a given lake and its relation to water temperature, concentrations of nutrients/pollutants (such as nitrogen and phosphorus), and the biomass of zooplankton (which feeds on phytoplankton). The modeling formalism most often used by ecological experts is the formalism of differential equations, which describe the change of state of a dynamic system over time. A typical approach to modeling population dynamics is as follows: an ecological expert writes a set of differential equations that capture the most important relationships in the domain. These are often linear differential equations. The coefficients of these equations are then determined (calibrated) using measured data.
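The calibration step described above can be illustrated with a minimal sketch. It assumes a hypothetical one-variable linear model dx/dt = a*x + b, approximates the derivative by forward differences, and fits a and b by ordinary least squares. The function name, the model form, and the synthetic data are illustrative assumptions and are not taken from the studies discussed here.

```python
def calibrate_linear_ode(times, values):
    """Fit a and b in dx/dt = a*x + b by least squares on forward differences."""
    xs, dxs = [], []
    for i in range(len(values) - 1):
        dt = times[i + 1] - times[i]
        xs.append(values[i])
        dxs.append((values[i + 1] - values[i]) / dt)  # forward-difference derivative
    n = len(xs)
    mean_x = sum(xs) / n
    mean_dx = sum(dxs) / n
    # Closed-form simple linear regression of dx/dt on x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (d - mean_dx) for x, d in zip(xs, dxs))
    a = sxy / sxx
    b = mean_dx - a * mean_x
    return a, b


# Synthetic "measurements" (made up): Euler steps of dx/dt = -0.5*x + 2
times = [0.1 * k for k in range(50)]
values = [1.0]
for k in range(49):
    values.append(values[-1] + 0.1 * (-0.5 * values[-1] + 2.0))

a, b = calibrate_linear_ode(times, values)
print(round(a, 3), round(b, 3))
```

Because the synthetic series is noise-free, the fit recovers the generating coefficients almost exactly. With real measurements, noise in the data is amplified by the differencing step, so the estimates are correspondingly less reliable.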
Relationships among living communities and their abiotic environment can be highly nonlinear. Population dynamics (and other ecological) models have to reflect this to be realistic. This has caused a surge of interest in the use of techniques such as neural networks for ecological modeling. Measured data are used to train a neural network, which can then be used to predict the future behavior of the studied population. In this fashion, population dynamics of algae, aquatic fauna, fish, phytoplankton, and zooplankton, among others, have been modeled.
While regression tree induction has also been used to model population dynamics, systems for the discovery of differential equations have proved most useful in this respect, since differential equations are the prevailing formalism used for ecological modeling. Algal growth has been modeled for the Lagoon of Venice and the Slovenian Lake of Bled, while phytoplankton growth has been modeled for the Danish Lake Glumsoe.
Case study: Modeling algal growth in the Lagoon of Venice
The beautiful and shallow Lagoon of Venice is under heavy pollution stress due to agricultural activities (use of fertilizers) on the neighboring mainland. Pollutants are food (nutrients) for algae, which have on occasion grown excessively to the point of suffocating themselves, then decayed and caused unpleasant odors (noticed by tourists as well). Models of algal growth are needed to support environmental management decisions and answer questions such as: "Would a reduction in the use of phosphorus-rich fertilizers reduce algal growth?"
Kompare and Dzeroski use regression trees and equation discovery to model the growth of the dominant species of algae (Ulva rigida) in the Lagoon of Venice in relation to water temperature, dissolved nitrogen and phosphorus, and dissolved oxygen. The trees give a rough picture of the relative importance of the factors influencing algal growth (cf. Figure 5), revealing that nitrogen is the limiting factor (and thus providing a negative answer to the question in the above paragraph). The equations discovered, on the other hand, give better prediction of the peaks and crashes of algal biomass.
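How a regression tree chooses a split can be sketched as follows. This is a minimal illustration, not the induction system used in the study: it searches a single feature for the threshold that minimizes the summed squared error of the two resulting leaves, which is the usual variance-reduction criterion. The function name and the toy numbers are hypothetical.

```python
def best_split(feature, target):
    """Return (threshold, sse) of the single split with the lowest squared error."""
    def sse(ys):
        # Sum of squared deviations from the leaf mean
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    pairs = sorted(zip(feature, target))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data (made up): biomass change is low when nitrate is scarce, high otherwise
nitrate = [0.1, 0.2, 0.3, 0.4, 2.0, 2.5, 3.0, 3.5]
biomass_change = [-1.0, -0.8, -1.2, -0.9, 2.0, 2.2, 1.9, 2.1]
threshold, error = best_split(nitrate, biomass_change)
print(threshold, round(error, 4))
```

A full tree applies the same search recursively to each leaf and across all candidate features. The feature chosen near the root is the one that reduces the error most, which is why the tree gives a rough ranking of factor importance.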
Severe problems of data quality were encountered in this application:
1. Dissolved oxygen, for example, was measured at the water surface at approximately noon (when oxygen is produced by photosynthesis and is plentiful). Hence, the data do not reveal potential anoxic conditions, which might occur at night.
2. Measurement errors of algal biomass were estimated to be quite large by the domain experts (up to 50% relative error).
3. Finally, winds were not taken into account: these might move algae away from the sampling stations and cause huge variations in the observed biomass values.
Case study: Phytoplankton growth in Lake Glumsoe
The shallow Lake Glumsoe is situated in a subglacial valley in Denmark. It has received mechanically-biologically treated waste water, as well as non-point-source pollution due to agricultural activities in the surrounding area. High concentrations of pollutants (food for phytoplankton) have led to excessive growth of phytoplankton and consequently to an absence of submerged vegetation, due to low transparency of the water and oxygen deficit (anoxia) at the bottom of the lake. It was thus important to have a good model of phytoplankton growth to support environmental management decisions.
Figure 5 A regression tree for predicting algal growth, i.e., change in biomass. Bio(t), DO(t), and NO3(t) stand for the concentrations of biomass, dissolved oxygen, and nitrates at time t, respectively, and ΔX(t) = X(t) - X(t - 1).
Table 3 The discovered model for phytoplankton growth in Lake Glumsoe: $\dot{phyt} = 0.553 \cdot temp \cdot phyt \cdot \frac{phosp}{\ldots}\ \ldots$
We used KDD methods for the discovery of differential equations to relate phytoplankton (phyt) growth to water temperature (temp), nutrient concentrations (nitrogen, nitro, and phosphorus, phosp) and zooplankton concentration (zoo). Some elementary knowledge on population dynamics modeling was taken into account during the discovery process. This domain knowledge tells us that a term called Monod's term, which has the form Nutrient/ (Nutrient + constant), is a reasonable term to be expected in differential equations describing the growth of an organism that feeds on Nutrient. It describes the saturation of the population of organisms with the nutrient.
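The saturation behavior of Monod's term can be made concrete with a short sketch. The function name `monod` and the constant 0.5 are illustrative assumptions; the text above gives only the general form Nutrient / (Nutrient + constant).

```python
def monod(nutrient, constant):
    """Monod term: nutrient / (nutrient + constant), which stays in [0, 1)."""
    return nutrient / (nutrient + constant)


K = 0.5  # hypothetical constant, chosen only for illustration
for concentration in [0.0, 0.5, 5.0, 50.0]:
    print(concentration, round(monod(concentration, K), 3))
```

When the nutrient concentration equals the constant, the term is exactly one half. As the concentration grows, the term approaches 1, so supplying more nutrient no longer speeds up growth.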
The discovered model is given in Table 3. Here $\dot{phyt}$ denotes the rate of change of phytoplankton concentration. The model reveals that phosphorus is the limiting nutrient for phytoplankton growth, as it includes a Monod term with phosphorus as a nutrient. This model made better predictions than a linear model, which has the form
$\dot{phyt} = -5.41 - 0.0439 \cdot phyt - 13.5 \cdot nitro - 38.2 \cdot zoo + 93.9 \cdot phosp + 3.20 \cdot temp$
The discovered model was also more understandable to domain experts: its first term describes phytoplankton growth, where temperature and phosphorus are limiting factors. Its last two terms describe phytoplankton death and the feeding of zooplankton on phytoplankton.
The following issues were raised in this application:
1. Data quantity and preprocessing. Measurements were only made at 14 time points during 2 months (once or twice weekly). Some preprocessing/interpolation was thus necessary to generate enough data for discovering differential equations.
2. Data quality. Ecological experts often have a poor understanding of modeling concepts, which strongly influences the way data are collected. An electrical engineer with knowledge of control theory would be much more aware that sampling frequency has to be increased at times when the system under study has faster dynamics (e.g., at peaks of phytoplankton growth).
3. The need for taking into account domain knowledge during the KDD process. This can compensate to a certain extent for poor data quality and quantity (as was the case in this application). This issue is of great importance, yet few KDD methods allow for the provision of domain knowledge by experts.
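The preprocessing mentioned in the first item can be sketched as follows. This is a minimal illustration under assumptions: it uses piecewise-linear interpolation to resample sparse measurements onto a regular grid, and central differences to estimate the derivative that equation discovery needs. The source does not say which interpolation method was actually used, and the sample values are made up.

```python
def interpolate(times, values, t):
    """Piecewise-linear interpolation of (times, values) at time t."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return values[i] + w * (values[i + 1] - values[i])
    raise ValueError("t lies outside the measured interval")


def resample_with_derivative(times, values, step):
    """Resample on a regular grid and estimate dx/dt by central differences."""
    n = int((times[-1] - times[0]) / step) + 1
    # min() guards against floating-point overshoot past the last measurement
    grid = [min(times[0] + k * step, times[-1]) for k in range(n)]
    xs = [interpolate(times, values, t) for t in grid]
    dxs = [(xs[k + 1] - xs[k - 1]) / (2 * step) for k in range(1, n - 1)]
    # Drop the two end points, where no central difference is available
    return grid[1:-1], xs[1:-1], dxs


# Made-up measurements at irregular sampling days
days = [0.0, 4.0, 7.0, 11.0, 14.0]
phyt = [1.0, 3.0, 6.0, 4.0, 1.0]
grid, xs, dxs = resample_with_derivative(days, phyt, step=1.0)
print(len(grid), xs[0], dxs[0])
```

Interpolation adds no new information: the resampled points are only as reliable as the original measurements, and the estimated derivatives inherit any error in them.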
Case study: Modeling the interactions of a red deer population with the new growth in a forest
Here we studied the interactions between a population of red deer and new forest growth in a naturally regenerated forest in Slovenia. Ideally, foresters would like to keep the size of the deer population and the rate of regeneration of the forest in balance: if the deer population is large, so are the browsing rates of new forest growth, and regeneration slows down. Understanding the relationship between the two is crucial for managing the balance. Our study has shown that meteorological parameters strongly influence this relationship and have to be taken into account.
A preliminary study using regression trees to model the interactions was performed by Stankovski et al. Here we summarize the results of a follow-up study that used a slightly larger data set, cleaner data, and more reliable methods of regression tree induction. The induced models show that the degree of browsing for maple (the preferred browse species of red deer) depends directly on the size of the population. The degree of beech browsing, on the other hand, was most strongly influenced by meteorological parameters, that is, winter monthly quantity of precipitation (snow) and average monthly minimal diurnal air temperature (cf. Figure 6). While beech is not the preferred browse species of red deer, it is consumed yearlong; it is also elastic and snow resistant and thus more exposed to the reach of red deer even in deeper snow.
The following issues were raised by this application:
1. Data quantity. The size of the deer population and browsing rates are only estimated once a year. Even though we were dealing with 18 years' worth of data, these were still only 18 data points.
2. Data quality. Some of the data collected in this domain were unreliable and had to be cleaned, corrected, or removed before reasonable results could be obtained.
3. Missing information. The outcome of the data analysis process suggested that measuring winter and summer browsing rates separately would greatly improve the models. This information was not captured in the measured data, but should be measured in the future.