M D Wilson, University of Georgia, Aiken, SC, USA © 2008 Elsevier B.V. All rights reserved.
Introduction
Support Vector Machines
Support Vectors in Nonlinear Feature Spaces
Leave-One-Out Cross-Validation
Further Reading
Support vector machines (SVMs) were introduced fairly recently into the field of ecology. Their most common use is in pattern recognition and classification problems in remote sensing. Investigators using remote sensing in ecological and environmental applications embraced SVMs earlier than those in other fields, perhaps because data-intensive technology developed rapidly while analytical tools lagged behind. More recently, however, the use of SVMs has grown rapidly across a variety of ecological disciplines. SVMs now appear in species identification, the mapping of plant species and disease distributions, and the estimation of age in fish. Indeed, wherever high-dimensional data exist alongside little knowledge of the underlying distribution (and perhaps a relatively small sample size), SVMs have strong potential to overcome the resulting difficulties in data analysis. SVMs also work well on more tractable data sets of smaller dimension, and they carry no distributional assumptions other than that the data are independent and identically distributed, a constraint increasingly acknowledged in modern science and statistics.
Support vector classification is based on a particular type of statistical learning machine, with supporting theory well developed by Vapnik. Support vector classification requires no assumptions on the distribution of the underlying population, other than that the data are independent and identically distributed (iid). Furthermore, SVMs exploit theorems bounding the actual risk in terms of the empirical risk, rather than estimating error using asymptotic convergence to normality. Hence, even small sample sizes can produce accurate estimates of the prediction error, while making no distributional assumptions. The optimal machine achieves a balance between consistency in the training set and generalization to future data sets. Additionally, SVMs allow us to avoid the degradation of computational performance that often occurs in high dimensions. Because of these important qualities, support vector classification is a good candidate for the often high dimensional, noisy, and messy data found in ecology.
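As a concrete illustration of estimating prediction error from a small sample without distributional assumptions, the sketch below uses leave-one-out cross-validation with a linear support vector classifier. It assumes the scikit-learn library; the two simulated, well-separated classes are purely hypothetical data.

```python
# Sketch: leave-one-out error estimate for a support vector classifier
# on a small iid sample (simulated data; scikit-learn assumed available).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Two small, well-separated classes of 10 observations each in 2 dimensions.
X = np.vstack([rng.normal(0.0, 0.5, (10, 2)),
               rng.normal(3.0, 0.5, (10, 2))])
y = np.array([-1] * 10 + [1] * 10)

clf = SVC(kernel="linear", C=1.0)
# Each of the 20 observations is held out once and predicted by a model
# trained on the remaining 19; the mean accuracy yields an error estimate.
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
error_estimate = 1.0 - scores.mean()
print(error_estimate)
```

With clearly separated classes as here, the leave-one-out error estimate is at or near zero even though only 20 observations are available.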
In the binary classification scheme, we have l observations along with their labels. For example, y_i = 1 could indicate that the ith observation came from an invasive plant, and y_i = -1 that the ith observation came from a native plant. Thus we have

{(x_i, y_i)}, i = 1, ..., l, where x_i is a feature vector and y_i is in {-1, 1}
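The labeling scheme above can be sketched in code as follows. This is a minimal example assuming scikit-learn; the feature values and the invasive/native interpretation of the labels are hypothetical.

```python
# Sketch of the binary classification setup: l = 6 observations x_i,
# each with label y_i = +1 ("invasive", hypothetical) or y_i = -1 ("native").
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.2], [0.3, 0.1], [0.2, 0.4],   # labeled +1
              [2.0, 2.1], [2.3, 1.9], [1.8, 2.2]])  # labeled -1
y = np.array([1, 1, 1, -1, -1, -1])

# Fit a linear support vector classifier to the labeled data.
clf = SVC(kernel="linear").fit(X, y)

# A new observation near the first cluster falls on the +1 side.
label = clf.predict([[0.1, 0.3]])[0]
```

The fitted classifier then assigns one of the two labels to any new feature vector.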