## Data

The input to a DM algorithm is most commonly a single flat table comprising a number of fields (columns) and records (rows). In general, each row represents an object and columns represent properties of objects. A hypothetical example of such a table is given in Table 1. We will use this example in the remainder of this article to illustrate the different DM tasks and the different types of patterns considered by DM algorithms.

Here rows correspond to persons that have recently (in the last month) visited a small shop and columns carry some information collected on these persons (such as their age, gender, and income). Of particular interest to the store is the amount each person has spent at the store this year (over multiple visits), stored in the field 'Total'. One can easily imagine that data from a transaction table,

 CID Gender Age Income Total BigSpender c1 Male 30 214000 18 800 Yes c2 Female 19 139000 15100 Yes c3 Male 55 50 000 12 400 No c4 Female 48 26 000 8600 No c5 Male 63 191 000 28100 Yes c6 Male 63 114000 20 400 Yes c7 Male 58 38 000 11 800 No c8 Male 22 39 000 5 700 No c9 Male 49 102000 16 400 Yes c10 Male 19 125 000 15 700 Yes c11 Male 52 38 000 10 600 No c12 Female 62 64 000 15 200 Yes c13 Male 37 66 000 10 400 No c14 Female 61 95 000 18100 Yes c15 Male 56 44 000 12 000 No c16 Male 36 102000 13 800 No c17 Female 57 215000 29 300 Yes c18 Male 33 67 000 9 700 No c19 Female 26 95 000 11 000 No c20 Female 55 214000 28 800 Yes

where each purchase is recorded, have been aggregated over all purchases for each customer to derive the values for this field. Customers that have spent over 15 000 in total are of special value to the shop. An additional field has been created ('BigSpender') that has value 'Yes' if a customer has spent over 15 000 and 'No' otherwise.

In data mining terminology, rows are called examples and columns are called attributes (or sometimes features). Attributes that have numeric (real) values are called continuous attributes: 'Age', 'Income', and 'Total' are continuous attributes. Attributes that have nominal values (such as 'Gender' and 'BigSpender') are called discrete attributes.