Data

The input to a DM algorithm is most commonly a single flat table comprising a number of fields (columns) and records (rows). In general, each row represents an object and columns represent properties of objects. A hypothetical example of such a table is given in Table 1. We will use this example in the remainder of this article to illustrate the different DM tasks and the different types of patterns considered by DM algorithms.

Here rows correspond to persons that have recently (in the last month) visited a small shop and columns carry some information collected on these persons (such as their age, gender, and income). Of particular interest to the store is the amount each person has spent at the store this year (over multiple visits), stored in the field 'Total'. One can easily imagine that data from a transaction table, where each purchase is recorded, have been aggregated over all purchases for each customer to derive the values for this field. Customers that have spent over 15 000 in total are of special value to the shop. An additional field has been created ('BigSpender') that has value 'Yes' if a customer has spent over 15 000 and 'No' otherwise.

Table 1 A single table with data on customers (table 'Customer')

| CID | Gender | Age | Income | Total | BigSpender |
|-----|--------|-----|---------|--------|------------|
| c1 | Male | 30 | 214 000 | 18 800 | Yes |
| c2 | Female | 19 | 139 000 | 15 100 | Yes |
| c3 | Male | 55 | 50 000 | 12 400 | No |
| c4 | Female | 48 | 26 000 | 8 600 | No |
| c5 | Male | 63 | 191 000 | 28 100 | Yes |
| c6 | Male | 63 | 114 000 | 20 400 | Yes |
| c7 | Male | 58 | 38 000 | 11 800 | No |
| c8 | Male | 22 | 39 000 | 5 700 | No |
| c9 | Male | 49 | 102 000 | 16 400 | Yes |
| c10 | Male | 19 | 125 000 | 15 700 | Yes |
| c11 | Male | 52 | 38 000 | 10 600 | No |
| c12 | Female | 62 | 64 000 | 15 200 | Yes |
| c13 | Male | 37 | 66 000 | 10 400 | No |
| c14 | Female | 61 | 95 000 | 18 100 | Yes |
| c15 | Male | 56 | 44 000 | 12 000 | No |
| c16 | Male | 36 | 102 000 | 13 800 | No |
| c17 | Female | 57 | 215 000 | 29 300 | Yes |
| c18 | Male | 33 | 67 000 | 9 700 | No |
| c19 | Female | 26 | 95 000 | 11 000 | No |
| c20 | Female | 55 | 214 000 | 28 800 | Yes |
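The derivation of 'Total' and 'BigSpender' described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not show the transaction table, so the individual purchases below are invented (only their per-customer sums agree with Table 1), and the names `transactions`, `aggregate_totals`, and `big_spender` are hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction records (customer id, purchase amount).
# The individual purchases are invented for illustration; only the
# per-customer sums match the 'Total' values of c1, c3, and c4 in Table 1.
transactions = [
    ("c1", 10_000), ("c1", 8_800),
    ("c3", 12_400),
    ("c4", 5_000), ("c4", 3_600),
]

THRESHOLD = 15_000  # spending strictly above this marks a big spender


def aggregate_totals(rows):
    """Sum purchase amounts per customer to obtain the 'Total' field."""
    totals = defaultdict(int)
    for cid, amount in rows:
        totals[cid] += amount
    return dict(totals)


def big_spender(total, threshold=THRESHOLD):
    """Return 'Yes' if total exceeds the threshold, 'No' otherwise."""
    return "Yes" if total > threshold else "No"


totals = aggregate_totals(transactions)
customer = {
    cid: {"Total": total, "BigSpender": big_spender(total)}
    for cid, total in totals.items()
}
```

Note that the comparison is strict, following the wording "spent over 15 000": a customer with a total of exactly 15 000 would be labeled 'No' under this reading.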

In data mining terminology, rows are called examples and columns are called attributes (or sometimes features). Attributes that have numeric (real) values are called continuous attributes: 'Age', 'Income', and 'Total' are continuous attributes. Attributes that have nominal values (such as 'Gender' and 'BigSpender') are called discrete attributes.
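To make the terminology concrete, the sketch below stores the first three examples of Table 1 as dictionaries and labels each attribute as continuous or discrete. The rule used (an attribute is continuous if all its values are numbers) is a simplifying assumption for this illustration, not a definition from the article; it would mislabel nominal values that happen to be encoded as numbers. The name `attribute_kind` is hypothetical.

```python
# First three examples (rows) of Table 1, one dict per example.
# The identifier column 'CID' is left out of this sketch.
examples = [
    {"Gender": "Male", "Age": 30, "Income": 214_000, "Total": 18_800, "BigSpender": "Yes"},
    {"Gender": "Female", "Age": 19, "Income": 139_000, "Total": 15_100, "BigSpender": "Yes"},
    {"Gender": "Male", "Age": 55, "Income": 50_000, "Total": 12_400, "BigSpender": "No"},
]


def attribute_kind(rows, name):
    """Label an attribute 'continuous' if every value is numeric, else 'discrete'.

    Assumption: the Python type of the stored values reflects the kind of
    attribute. Booleans are excluded from the numeric case on purpose.
    """
    values = [row[name] for row in rows]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "continuous" if numeric else "discrete"


# Map every attribute (column) name to its kind.
kinds = {name: attribute_kind(examples, name) for name in examples[0]}
```

With these rows, 'Age', 'Income', and 'Total' come out as continuous, while 'Gender' and 'BigSpender' come out as discrete, matching the classification given in the text.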
