fredag 9. oktober 2015

SQL self joins explained

In my quest to transform a database of SNMP traps into actual information, I need to master joins. Or specifically self joins. What follows is an attempt at explaining this.
I'll be using two examples. The first goes straight to the solution. The next example is more long-winded.

The TL;DR version

Consider the following table:

+------+--------+---------+------+
| id   | c1     | c2      | c3   |
+------+--------+---------+------+
|    1 | name   | letters | adam |
|    1 | weight | kg      | 80   |
|    1 | height | cm      | 180  |
|    2 | name   | letters | brad |
|    2 | weight | kg      | 70   |
|    2 | height | cm      | 170  |
+------+--------+---------+------+
One may find it natural to think that the data were entered in groups by id. But records may actually be grouped by any of the columns. From a database perspective, the table design may not appear optimal. But for illustrating self joins it works well.

How do I get from that table......to this:

+------+--------+---------+------+
| id   | name   | weight  |height|
+------+--------+---------+------+
|    1 | adam   | 80      | 180  |
|    2 | brad   | 70      | 170  |
+------+--------+---------+------+


The solution is shown below. To give you a head start understanding it:
Any column you specify in your query must be unambigous.
AS means we assign an alias to something.
The column headers in the output do not originate in the first table above, but is assigned in the query.


SELECT 
nameTable.id AS id,
nameTable.c3 AS name,
weightTable.c3 AS weight,
heightTable.c3 AS height 
FROM example AS nameTable  
JOIN example AS weightTable ON nameTable.id = weightTable.id AND weightTable.c1 = 'weight'  
JOIN example AS heightTable ON nameTable.id = heightTable.id AND heightTable.c1 = 'height' 
WHERE nameTable.c1 = 'name';


Got that? Great. You have mastered self joins and may now return to the/your original problem. Or read on.


Expanded example

The next example makes use of two tables ('beings' and 'stuff').

SELECT * FROM beings; 
+------+------------+---------+
| id   | recordtype | content |
+------+------------+---------+
|    1 | firstname  | Arnold  |
|    1 | height     | 180     |
|    1 | weight     | 80      |
|    1 | age        | 50      |
|    1 | species    | human   |
|    2 | firstname  | Camilla |
|    2 | width      | 48      |
|    2 | species    | dog     |
|    3 | firstname  | Arnold  |
|    3 | length     | 30      |
|    3 | food       | tuna    |
|    3 | species    | cat     |
|    4 | firstname  | Edmund  |
|    4 | length     | 32      |
|    4 | food       | chicken |
|    4 | species    | cat     |
+------+------------+---------+


Notice how the table beings contain a variable number of records per id. And to spice it up even further, two different ids share the same recordtype and content.

SELECT * FROM stuff; 
+------+-----------+
| id   | birthyear |
+------+-----------+
|    1 |      2001 |
|    2 |      2002 |
|    3 |      2003 |
|    4 |      2004 |
+------+-----------+

Nothing exciting about this table.



First, we try to join data from the two tables. We will join data where the column 'id' matches between the tables.

SELECT 
birthyear,
content
FROM beings 
JOIN stuff ON beings.id = stuff.id;
+-----------+---------+
| birthyear | content |
+-----------+---------+
|      2001 | Arnold  |
|      2001 | 180     |
|      2001 | 80      |
|      2001 | 50      |
|      2001 | human   |
|      2002 | Camilla |
|      2002 | 48      |
|      2002 | dog     |
|      2003 | Arnold  |
|      2003 | 30      |
|      2003 | tuna    |
|      2003 | cat     |
|      2004 | Edmund  |
|      2004 | 32      |
|      2004 | chicken |
|      2004 | cat     |
+-----------+---------+

So... we got what we asked for. Right? Notice how we have to prefix 'id' with the tablename and a period on the last line of the query.

SELECT 
birthyear,
content
FROM beings 
JOIN stuff ON beings.id = stuff.id
WHERE recordtype = 'firstname';
+-----------+---------+
| birthyear | content |
+-----------+---------+
|      2001 | Arnold  |
|      2002 | Camilla |
|      2003 | Arnold  |
|      2004 | Edmund  |
+-----------+---------+

Now we're moving in the right direction. Can we get the id in the output as well?

SELECT 
id,
birthyear,
content
FROM beings 
JOIN stuff ON beings.id = stuff.id
WHERE recordtype = 'firstname';
ERROR 1052 (23000): Column 'id' in field list is ambiguous


Not like that, anyway. Remember being unambiguous when you ask for something. As a general rule in life. And in particular when querying a database. Prefix the id with tablename and a period. It does not matter which table...

SELECT 
stuff.id,
birthyear,
content
FROM beings 
JOIN stuff ON beings.id = stuff.id
WHERE recordtype = 'firstname';
+------+-----------+---------+
| id   | birthyear | content |
+------+-----------+---------+
|    1 |      2001 | Arnold  |
|    2 |      2002 | Camilla |
|    3 |      2003 | Arnold  |
|    4 |      2004 | Edmund  |
+------+-----------+---------+


Much better!
From here on we'll skip the 'stuff' table and see if we can join table 'beings' with itself! To do that, we employ the 'AS'...verb? subject? thingy?. Whatever. We use 'AS'. Look:
SELECT 
idalias.id,
idalias.recordtype,
idalias.content,
namealias.recordtype,
namealias.content
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname';
+------+------------+---------+------------+---------+
| id   | recordtype | content | recordtype | content |
+------+------------+---------+------------+---------+
|    1 | firstname  | Arnold  | firstname  | Arnold  |
|    1 | height     | 180     | firstname  | Arnold  |
|    1 | weight     | 80      | firstname  | Arnold  |
|    1 | age        | 50      | firstname  | Arnold  |
|    1 | species    | human   | firstname  | Arnold  |
|    2 | firstname  | Camilla | firstname  | Camilla |
|    2 | width      | 48      | firstname  | Camilla |
|    2 | species    | dog     | firstname  | Camilla |
|    3 | firstname  | Arnold  | firstname  | Arnold  |
|    3 | length     | 30      | firstname  | Arnold  |
|    3 | food       | tuna    | firstname  | Arnold  |
|    3 | species    | cat     | firstname  | Arnold  |
|    4 | firstname  | Edmund  | firstname  | Edmund  |
|    4 | length     | 32      | firstname  | Edmund  |
|    4 | food       | chicken | firstname  | Edmund  |
|    4 | species    | cat     | firstname  | Edmund  |
+------+------------+---------+------------+---------+


(I believe the term may be 'clause'.)
What happened here, is that we named table 'beings' as 'idalias' and extracted the columns 'id', 'recordtype' and 'content', and then we joined each of the new records with the same records from the 'beings' table (now renamed to 'namealias'), but only where the 'id' matches in both tables, and only where the 'recordtype' of namealias contains 'firstname'. Why on earth would I do that?!
Look what happens when I add one more condition for the join:

SELECT 
idalias.id,
idalias.recordtype,
idalias.content,
namealias.recordtype,
namealias.content
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.recordtype = 'species';
+------+------------+---------+------------+---------+
| id   | recordtype | content | recordtype | content |
+------+------------+---------+------------+---------+
|    1 | species    | human   | firstname  | Arnold  |
|    2 | species    | dog     | firstname  | Camilla |
|    3 | species    | cat     | firstname  | Arnold  |
|    4 | species    | cat     | firstname  | Edmund  |
+------+------------+---------+------------+---------+


If I know I don't care for all species, but only our feline friends, I can limit my selection even furrierfurther:

SELECT 
idalias.id,
idalias.recordtype,
idalias.content,
namealias.recordtype,
namealias.content
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.content = 'cat';
+------+------------+---------+------------+---------+
| id   | recordtype | content | recordtype | content |
+------+------------+---------+------------+---------+
|    3 | species    | cat     | firstname  | Arnold  |
|    4 | species    | cat     | firstname  | Edmund  |
+------+------------+---------+------------+---------+


We're getting there

SELECT 
idalias.id,
idalias.recordtype,
idalias.content,
namealias.content,
foodalias.content
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.content = 'cat'
JOIN beings AS foodalias ON foodalias.id = idalias.id AND foodalias.recordtype = 'food'
+------+------------+---------+---------+---------+
| id   | recordtype | content | content | content |
+------+------------+---------+---------+---------+
|    3 | species    | cat     | Arnold  | tuna    |
|    4 | species    | cat     | Edmund  | chicken |
+------+------------+---------+---------+---------+


Let us trim the number of columns. And give those columns more descriptive names:

SELECT 
idalias.id AS id,
namealias.content AS firstname,
foodalias.content AS food,
speciesalias.content AS species
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.content = 'cat'
JOIN beings AS foodalias ON foodalias.id = idalias.id AND foodalias.recordtype = 'food'
JOIN beings AS speciesalias ON speciesalias.id = idalias.id AND speciesalias.recordtype = 'species';
+------+-----------+---------+---------+
| id   | firstname | food    | species |
+------+-----------+---------+---------+
|    3 | Arnold    | tuna    | cat     |
|    4 | Edmund    | chicken | cat     |
+------+-----------+---------+---------+


Much better! Now, if we wanted to figure out the dietary needs of one furry friend in particular, we could add a WHERE clause. Like this:

SELECT 
idalias.id AS id,
namealias.content AS firstname,
foodalias.content AS food,
speciesalias.content AS species
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.content = 'cat'
JOIN beings AS foodalias ON foodalias.id = idalias.id AND foodalias.recordtype = 'food'
JOIN beings AS speciesalias ON speciesalias.id = idalias.id AND speciesalias.recordtype = 'species'
WHERE namealias.content = 'Arnold';
+------+-----------+------+---------+
| id   | firstname | food | species |
+------+-----------+------+---------+
|    3 | Arnold    | tuna | cat     |
+------+-----------+------+---------+


I'll let this rest for now. Hope it was of use to you!

Ingen kommentarer:

Legg inn en kommentar