Feb 15, 2013

Hierarchical queries in MySQL, Tree structures

Hierarchical queries in MySQL, Managing tree like structures

I was wondering how to implement a hierarchical query in MySQL (using the ancestry chains version) for a single row, such that it picks up the parents (if any) and any children (if any).
The idea is, I want to be able to jump in at any point, provide an Id of some sort, and be able to draw out the entire hierarchy for that Id, both upwards and downwards.
We need to combine two queries here:
  1. Original hierarchical query that returns all descendants of a given id (a descendancy chain)
  2. A query that would return all ancestors of a given id (an ancestry chain)
An id can have only one parent, that’s why we can employ a linked list technique to build an ancestry chain, like shown in this article:
Here’s the query to to this (no functions required):
SELECT  @r AS _id,
         (
         SELECT  @r := parent
         FROM    t_hierarchy
         WHERE   id = _id
         ) AS parent,
         @l := @l + 1 AS lvl
 FROM    (
         SELECT  @r := 1218,
                 @l := 0,
                 @cl := 0
         ) vars,
         t_hierarchy h
WHERE    @r <> 0
To combine two queries, we can employ a simple UNION ALL.
The only problem that is left to preserve the correct level, since the ancestry chain query conts levelbackwards, and the hierarchical query will count it starting from zero.

Let’s create a sample table and see what we get:


CREATE TABLE t_hierarchy (
02.id int(10) unsigned NOT NULL AUTO_INCREMENT,
03.parent int(10) unsigned NOT NULL,
04.PRIMARY KEY (id),
05.KEY ix_hierarchy_parent (parent, id)
06.) ENGINE=InnoDB DEFAULT CHARSET=utf8;
07. 
08.DELIMITER $$
09.CREATE PROCEDURE prc_fill_hierarchy (level INT, fill INT)
10.BEGIN
11.DECLARE _level INT;
12.DECLARE _fill INT;
13.INSERT
14.INTO    t_hierarchy (id, parent)
15.VALUES  (1, 0);
16.SET _fill = 0;
17.WHILE _fill < fill DO
18.INSERT
19.INTO    t_hierarchy (parent)
20.VALUES  (1);
21.SET _fill = _fill + 1;
22.END WHILE;
23.SET _fill = 1;
24.SET _level = 0;
25.WHILE _level < level DO
26.INSERT
27.INTO    t_hierarchy (parent)
28.SELECT  hn.id
29.FROM    t_hierarchy ho, t_hierarchy hn
30.WHERE   ho.parent = 1
31.AND hn.id > _fill;
32.SET _level = _level + 1;
33.SET _fill = _fill + POWER(fill, _level);
34.END WHILE;
35.END
36.$$
37.DELIMITER ;
38. 
39.DROP FUNCTION IF EXISTS hierarchy_connect_by_parent_eq_prior_id;
40. 
41.DELIMITER $$
42. 
43.CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INTRETURNS INT
44.NOT DETERMINISTIC
45.READS SQL DATA
46.BEGIN
47.DECLARE _id INT;
48.DECLARE _parent INT;
49.DECLARE _next INT;
50.DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;
51. 
52.SET _parent = @id;
53.SET _id = -1;
54. 
55.IF @id IS NULL THEN
56.RETURN NULL;
57.END IF;
58. 
59.LOOP
60.SELECT  MIN(id)
61.INTO    @id
62.FROM    t_hierarchy
63.WHERE   parent = _parent
64.AND id > _id;
65.IF @id IS NOT NULL OR _parent = @start_with THEN
66.SET @level = @level + 1;
67.RETURN @id;
68.END IF;
69.SET @level := @level - 1;
70.SELECT  id, parent
71.INTO    _id, _parent
72.FROM    t_hierarchy
73.WHERE   id = _parent;
74.END LOOP;      
75.END
76.$$
77. 
78.DELIMITER ;
79. 
80.START TRANSACTION;
81.CALL prc_fill_hierarchy(6, 5);
82.COMMIT;


Now, let’s try to UNION ALL the queries as is:
SELECT  CONCAT(REPEAT('    ', lvl  - 1), _id) AS treeitem, parent, lvl AS level
FROM    (
        SELECT  @r AS _id,
                (
                SELECT  @r := parent
                FROM    t_hierarchy
                WHERE   id = _id
                ) AS parent,
                @l := @l + 1 AS lvl
        FROM    (
                SELECT  @r := 1218,
                        @l := 0,
                        @cl := 0
                ) vars,
                t_hierarchy h
        WHERE   @r <> 0
        ORDER BY
                lvl DESC
        ) qi
UNION ALL
SELECT  CONCAT(REPEAT('    ', level  - 1), CAST(hi.id AS CHAR)), parent, level
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
        FROM    (
                SELECT  @start_with := 1218,
                        @id := @start_with,
                        @level := 0
                ) vars, t_hierarchy
        WHERE   @id IS NOT NULL
        ) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id

treeitemparentlevel
106
215
724
3873
165382
12181651
547312181
2479654732
2479754732
2479854732
2479954732
2480054732
547412181
2480154742
2480254742
2480354742
2480454742
2480554742
547512181
2480654752
2480754752
2480854752
2480954752
2481054752
547612181
2481154762
2481254762
2481354762
2481454762
2481554762
547712181
2481654772
2481754772
2481854772
2481954772
2482054772
36 rows fetched in 0.0014s (0.1447s)
We see that the hierarchical order is mangled: the first resultset is upside down, the second one is starting from level = 1.
To fix it, we need to change the code that calculates level a little.
First, we need to reverse the ancestry part.
This can be easily done by sorting it on lvl DESC:
SELECT  CONCAT(REPEAT('    ', level  - 1), _id) AS treeitem, parent, level
FROM    (
        SELECT  @r AS _id,
                (
                SELECT  @r := parent
                FROM    t_hierarchy
                WHERE   id = _id
                ) AS parent,
                @l := @l + 1 AS level
        FROM    (
                SELECT  @r := 1218,
                        @l := 0,
                        @cl := 0
                ) vars,
                t_hierarchy h
        WHERE   @r <> 0
        ORDER BY
                level DESC
        ) qi

treeitemparentlevel
106
215
724
3873
165382
12181651
6 rows fetched in 0.0003s (0.0684s)
We now have it in correct order but with wrong level values.
Since a level is essentially a rownum here, we can just calculate it as a rownum instead:
SELECT  CONCAT(REPEAT('    ', level  - 1), id) AS treeitem, parent, level
FROM    (
        SELECT  _id AS id, parent,
                @cl := @cl + 1 AS level
        FROM    (
                SELECT  @r AS _id,
                        (
                        SELECT  @r := parent
                        FROM    t_hierarchy
                        WHERE   id = _id
                        ) AS parent,
                        @l := @l + 1 AS level
                FROM    (
                        SELECT  @r := 1218,
                                @l := 0,
                                @cl := 0
                        ) vars,
                        t_hierarchy h
                WHERE   @r <> 0
                ORDER BY
                        level DESC
                ) qi
        ) qo
treeitemparentlevel
101
212
723
3874
165385
12181656
6 rows fetched in 0.0003s (0.0712s)
We disregard the previously selected level at all, leaving it only inside the inline view qi for ordering purposes.
Now, we need to merge the descendancy tree query, but with levels starting from 6, not from 1.
Since this query will come after the ancestry one, we can use accumulated value of @cl (which we used to calculate level in the previous query) as a seed.
To do that, we can take the level returned by the descendancy tree query, and just add @cl to it.
The only problem it to determine when we should increment @cl and when we should add it to level.
We will just use a boolean field which will help us to tell the sets apart.
Here’s the query to do it:
SELECT  CONCAT(REPEAT('    ', level - 1), CAST(id AS CHAR)),
        parent,
        level
FROM    (
        SELECT  id, parent, IF(ancestry, @cl := @cl + 1, level + @cl) AS level
        FROM    (
                SELECT  TRUE AS ancestry, _id AS id, parent, level
                FROM    (
                        SELECT  @r AS _id,
                                (
                                SELECT  @r := parent
                                FROM    t_hierarchy
                                WHERE   id = _id
                                ) AS parent,
                                @l := @l + 1 AS level
                        FROM    (
                                SELECT  @r := 1218,
                                        @l := 0,
                                        @cl := 0
                                ) vars,
                                t_hierarchy h
                        WHERE   @r <> 0
                        ORDER BY
                                level DESC
                        ) qi
                UNION ALL
                SELECT  FALSE, hi.id, parent, level
                FROM    (
                        SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
                        FROM    (
                                SELECT  @start_with := 1218,
                                        @id := @start_with,
                                        @level := 0
                                ) vars, t_hierarchy
                        WHERE   @id IS NOT NULL
                        ) ho
                JOIN    t_hierarchy hi
                ON      hi.id = ho.id
                ) q
        ) q2

CONCAT(REPEAT(‘ ‘, level – 1), CAST(id AS CHAR))parentlevel
101
212
723
3874
165385
12181656
547312187
2479654738
2479754738
2479854738
2479954738
2480054738
547412187
2480154748
2480254748
2480354748
2480454748
2480554748
547512187
2480654758
2480754758
2480854758
2480954758
2481054758
547612187
2481154768
2481254768
2481354768
2481454768
2481554768
547712187
2481654778
2481754778
2481854778
2481954778
2482054778
36 rows fetched in 0.0016s (0.1514s)
One more thing:  use this if extremely necessary, and for small trees,, some times it crashes the Mysql server,
im still working on some more efficient solution.

Till Happy coding :) 

1 comment:

  1. Hi There

    Thank you for an excellent article, it is more or less what I was looking for, however I only needed the very first query you posted on Sorting Lists for use in my Application.

    But I have come across a problem and hope you may help.

    I need to query a DB that has a Hierachical Structure, See Below: Table: xmlgames1

    id | GameID | ParentGameID
    11 | 570 | 1
    13 | 5157 | 570
    20 | 7878 | 5175
    67 | 8252 | 7878

    When I try out your query above as follows:

    SELECT @r AS _id,
    (
    SELECT @r := ParentGameID
    FROM xmlgames1
    WHERE GameID = _id
    ) AS parent,
    @l := @l + 1 AS lvl
    FROM (
    SELECT @r := 8252,
    @l := 0,
    @cl := 0) vars, xmlgames1 h
    WHERE @r <> 0


    The query returns an Error: [Err] 1242 - Subquery returns more than 1 row

    Can you explain what I am doing wrong?

    ReplyDelete