time | outlook | temperature | humidity | windy | play |
1 | sunny | 85 | 85 | false | 0 |
2 | sunny | 80 | 90 | true | 0 |
3 | overcast | 83 | 86 | false | 1 |
4 | rainy | 70 | 96 | false | 1 |
5 | rainy | 68 | 80 | false | 1 |
6 | rainy | 65 | 70 | true | 0 |
7 | overcast | 64 | 65 | true | 1 |
8 | sunny | 72 | 95 | false | 0 |
9 | sunny | 69 | 70 | false | 1 |
10 | rainy | 75 | 80 | false | 1 |
11 | sunny | 75 | 70 | true | 1 |
12 | overcast | 72 | 90 | true | 1 |
13 | overcast | 81 | 75 | false | 1 |
14 | rainy | 71 | 91 | true | 0 |
day:We may also extend the sets of numbers or replace them with intervals, which will make the hierarchy complete (covering all possible values). For example, humidity may look like this:
all
______|_________
| |
week 1 week 2
_____|_____ _______|_______
| | | | | | | | | | | | | |
1 2 3 4 5 6 7 8 9 10 11 12 13 14
temperature:
all
____________|_____________
| | |
hot mild cool
____|___ ___|____ ___|____
| | | | | | | | | | | |
80 81 83 85 70 71 72 75 64 65 68 69
humidity:
all
______|________
| |
high normal
______|_______ ___|____
| | | | | | | | | |
85 86 90 91 95 96 65 70 75 80
allFor the nominal (non numeric) attributes outlook and windy we define one-level hierarchies, as their values cannot be ordered or grouped.
____|____
| |
high normal
| |
[85,96] [65,84]
outlook:
all
_______|________
| | |
sunny rainy overcast
windy:
all
___|____
| |
true false
The numbers in the internal cells are obtained by adding up the values of the play attribute, where the time and the temperature attribute are equal to the values in the corresponding row and column. For example the value 2 (row 2, column 8) means that tennis was played two days during week 2 when the temperature was 75.
64 65 68 69 70 71 72 75 80 81 83 85 week 1 1 0 1 0 1 0 0 0 0 0 1 0 week 2 0 0 0 1 0 0 1 2 0 1 0 0
In other words, climbing up the concept hierarchy produces roll-up's. Inversely, climbing down the concept hierarchy expands the table and is called drill-down. For example, the drill down of the above data cube over the time dimension produces the following:
cool mild hot week 1 2 1 1 week 2 1 3 1
cool mild hot day 1 0 0 0 day 2 0 0 0 day 3 0 0 1 day 4 0 1 0 day 5 1 0 0 day 6 0 0 0 day 7 1 0 0 day 8 0 0 0 day 9 1 0 0 day 10 0 1 0 day 11 0 1 0 day 12 0 1 0 day 13 0 0 1 day 14 0 0 0
{}
_____|______
| |
... {outlook}
{temperature} ...
___________|________
|
|
... {temperature,humidity} {outlook,temperature}
...
|
|
...
...
...
|
{outlook,temperature,humidity,windy} {time,temperature,humidity,windy}
|____________________________________|
|
{time,outlook,temperature,humidity,windy}
In the above terms the selection of dimensions actually means selection of a cube, i.e. an element of the above lattice.
There are two other OLAP operations that are related to the selection of a cube - slice and dice. Slice performs a selection on one dimension of the given cube, thus resulting in a subcube. For example, if we make the selection (temperature=cool) we will reduce the dimensions of the cube from two to one, resulting in just a single column from the table above.
The dice operation works similarly and performs a selection on two or more dimensions. For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) to the original cube we get the following subcube (still two-dimensional):
cool day 1 0 day 2 0 day 3 0 day 4 0 day 5 1 day 6 0 day 7 1 day 8 0 day 9 1 day 10 0 day 11 0 day 12 0 day 13 0 day 14 0
cool hot day 3 0 1 day 4 0 0
{ALL,ALL,temperature,ALL,ALL}
__________________|_________________
|
|
{ALL,ALL,temperature,humidity,ALL}
{ALL,outlook,temperature,ALL,ALL}
Using this technique the whole data cube can be represented as a single relational table as follows (we use higher levels in the concept hierarchies and omit some rows for brevity):
time | outlook | temperature | humidity | windy | play |
week 1 | sunny | cool | normal | true | 0 |
week 1 | sunny | cool | normal | false | 0 |
week 1 | sunny | cool | normal | ALL | 0 |
week 1 | sunny | cool | high | true | 0 |
week 1 | sunny | cool | high | false | 0 |
week 1 | sunny | cool | high | ALL | 0 |
week 1 | sunny | cool | ALL | true | 0 |
week 1 | sunny | cool | ALL | false | 0 |
week 1 | sunny | cool | ALL | ALL | 0 |
week 1 | sunny | mild | normal | true | 0 |
... | ... | ... | ... | ... | ... |
week 1 | overcast | ALL | ALL | ALL | 2 |
week 1 | ALL | ALL | ALL | ALL | 4 |
week 2 | sunny | cool | normal | true | 0 |
week 2 | sunny | cool | normal | false | 1 |
week 2 | sunny | cool | normal | ALL | 1 |
week 2 | sunny | cool | high | true | 0 |
... | ... | ... | ... | ... | ... |
ALL | ALL | ALL | high | ALL | 3 |
ALL | ALL | ALL | ALL | true | 3 |
ALL | ALL | ALL | ALL | false | 6 |
ALL | ALL | ALL | ALL | ALL | 9 |
can be extracted from the table by selecting the rows that match the pattern (*, ALL, *, ALL, ALL), where * matches all legitimate values for the corresponding dimension except for ALL.
cool mild hot week 1 2 1 1 week 2 1 3 1