### Monday, March 14, 2005

## Path Enumeration using Prime Number Products

Managing Hierarchies in Relational Database Environment is always a challenging task. Over the years many models like Adjacency List Model, Nested Set Model, Nested Interval Model, Path Enumeration Model etc. were proposed and implemented. There are also other popular methods using recursion and user defined function. SQL Server 2005 T-SQL Extensions has special operators for managing recursive queries. Joe CELKO has written a book exclusively on Managing Trees and Hierarchies.

The basic Hierarchical operations in a typical Employee-Manager type of relation, include

1. Add a new Employee

2. Show the whole Hierarchy with levels.

3. Show the superiors of a given employee.

4. Show the subordinates of a given employee.

5. Show the immediate subordinates of a given employee.

6. List all the Leaf Level Employees.

7. Change the manager of an Employee

8. Delete an Employee and all his subordiantes.

9. Delete an Employee.

Here I am trying to propose a simple model which works almost like Path Enumeration model. This model uses a

This model works on two basic concepts.

That is, Let product of n prime numbers be Px, then, the factors of Px can only be the Prime Numbers that participated in the multiplication, or the sub products.

So the product cannot be divided by any other Prime Number.

For eg.

P1, P2, P3,...Pn is the list of Prime Numbers.

Let P be the Product.

Then P1 * P2 * P3 * ... * Pn = P

Then the factors of P can be either P1, P2, P3,...Pn Or the sub product of P1, P2, P3,...Pn.

Now let's try to answer the above questions.

The following is the table used in the examples.

CREATE TABLE dbo.EmpHierarchy

(EmpId int PRIMARY KEY,

EmpName VARCHAR(15),

ManagerId int,

HKey bigint UNIQUE,

Path bigint)

GO

The purpose of the first three columns are obvious. The column Hkey is a Unique Prime Number assigned to each employee. The column Path Holds the product of the Hkey values of the current node and nodes above the current node.

The script used for creating and Populating the PrimeNumber table is here.

For inserting a new node (employee), we will get the least available Prime Number and use it as the Hkey of the node. The Path will be the Product of the Hkey and the Path of the parent node. For the root level entry the path will be the Hkey itself.

Here is the script for the stored procedure dbo.InsertNode

Now lets insert some sample data.

EXEC dbo.InsertNode 1,'Albert', NULL

EXEC dbo.InsertNode 2,'Bert', 1

EXEC dbo.InsertNode 3,'Chuck', 1

EXEC dbo.InsertNode 4,'Dona', 3

EXEC dbo.InsertNode 5,'Eddie', 3

EXEC dbo.InsertNode 6,'Fred', 3

SELECT * FROM EmpHierarchy

Now we have this data in the table.

The level of a node in a hierarchy depends on the number of levels above it. So the level of a given node can be found by counting the levels above it. Thats the technique used in the stored procedure dbo.PrintHierarchy. The SP also accepts an EmployeeId, to display a partial hierarchy.

EXEC dbo.PrintHierarchy

Here is the output

The path of a given employee is the product of the Hkeys of his superiors and his Hkey. As per our rule the path can be divided only by the Prime Numbers participated in it. Based on that finding the superiors logic is failrly easy. You can get the list of the superios of a given employee by

SELECT A.EmpId, A.EmpName

FROM EmpHierarchy A

WHERE [Path of the given Employee] % A.Hkey = 0

I have this Stored Procedure dbo.getSuperiors which accepts an Employee Id and return the list of his superiors.

To get the Superiors of 'Fred', execute

EXEC dbo. getSuperiors 6

Here is the output

The Hkey of a given employee will be a factor of the Path of all his subordinates and only his subordinates.

So to get the list of the subordinates of an employee, you can run a query like

SELECT A.EmpId, A.EmpName AS Employee

FROM EmpHierarchy A

WHERE A.Path % [HKey of the given Employee] = 0

Here is the stored procedure dbo. getSubordinates for doing the same.

Now to get the list of subordiantes of 'Chuck', execute

EXEC dbo. getSubordinates 3

Here is the output

The path of the immediate subordiantes of a given employee will be the product of the path of the manager and the HKey of the subordinate.

So one can find the immediate subordinates of a given employee by writing a query like

SELECT EmpId, EmpName

FROM EmpHierarchy

WHERE Path = HKey * [Path of the Given Employee]

For eg, to get All the immediate subordinates of Albert, the query will be

SELECT EmpId, EmpName

FROM EmpHierarchy

WHERE Path = HKey *2 --Path of Albert

Here is the output

Leaf level employees are those who dont have any subordinates. So finding them is a trivial task.

The following query gives you the list of leaf level employees.

SELECT EmpId, EmpName

FROM EmpHierarchy A

WHERE NOT Exists(SELECT 1 FROM EmpHierarchy T

WHERE T.Path % A.Hkey = 0

AND T.Path > A.Path)

And here is the output.

Updating can be of two types, promotion and demotion. When updating an Employees manager, you need to update the rows of the given employee and all his subordinates Path. Also you need to perform some validation to avoid cyclic reference. Here is the stored Procedure dbo.UpdateNode to update a given node.

So now to change Fred's Manager to Bert from Chuck, you can Execute

EXEC dbo.UpdateNode EmployeeId, NewManagerId

EXEC dbo.UpdateNode 6, 2

After the update, the hierarchy will look like

EXEC dbo.PrintHierarchy

Now, lets try a promotion. Eddie directly reports to Albert instead of Chuck

EXEC dbo.UpdateNode 5, 1

This is how the hierarchy will look like now.

EXEC dbo.PrintHierarchy

Finally lets demote chuck one level down the hierarchy. Instead of reporting to Albert, he will now report to Bert

EXEC dbo.UpdateNode 3, 2

EXEC dbo.PrintHierarchy

Deleting a subtree is fairly easy. we can use the same logic of finding subordinates and apply that in the where clause.

Here is the stored procedure dbo.deleteSubTree

Now to fire the whole team of Chuck, you can execute

EXEC dbo.deleteSubTree 3 --EmployeeId of chuck

After the deletion, the Hierarchy will look like

EXEC dbo.PrintHierarchy

So Now lets try deleting Bert.

EXEC dbo.deleteNode 2

After the deletion, the hierarchy will look like

And thats it :)

These are the pro's and con's of the model from my point of view.

1. Use only standard SQL constructs.

2. Use only set based solutions.

3. Can be implemented as self manageable using triggers.

4. Gives better performance because it uses Numeric operations instead of string operations.

1. For larger Hierarchies, you may encounter out of range errors because the Path value grows in an exponential rate.

Still you can replace the bigint with a larger datatype like numeric and replace the % operator with division operator and overcome the issue.

AS I mentioned earlier, this model is still in its evolving face. So as always, your feedback is highly appreciated.

The basic Hierarchical operations in a typical Employee-Manager type of relation, include

1. Add a new Employee

2. Show the whole Hierarchy with levels.

3. Show the superiors of a given employee.

4. Show the subordinates of a given employee.

5. Show the immediate subordinates of a given employee.

6. List all the Leaf Level Employees.

7. Change the manager of an Employee

8. Delete an Employee and all his subordiantes.

9. Delete an Employee.

Here I am trying to propose a simple model which works almost like Path Enumeration model. This model uses a

**Prime Number**table and basic mathematical operations to answer the above questions. Also the Model only uses set based solutions and can be implemented as a self managing system with the help of triggers.This model works on two basic concepts.

**1. There is a unique path to every node in a hierarchy from the root node.**

2. The Prime Number that can be a divisor of the Product of a set of Prime Numbers, are only the Prime numbers in the set.

2. The Prime Number that can be a divisor of the Product of a set of Prime Numbers, are only the Prime numbers in the set.

That is, Let product of n prime numbers be Px, then, the factors of Px can only be the Prime Numbers that participated in the multiplication, or the sub products.

So the product cannot be divided by any other Prime Number.

For eg.

P1, P2, P3,...Pn is the list of Prime Numbers.

Let P be the Product.

Then P1 * P2 * P3 * ... * Pn = P

Then the factors of P can be either P1, P2, P3,...Pn Or the sub product of P1, P2, P3,...Pn.

Now let's try to answer the above questions.

The following is the table used in the examples.

CREATE TABLE dbo.EmpHierarchy

(EmpId int PRIMARY KEY,

EmpName VARCHAR(15),

ManagerId int,

HKey bigint UNIQUE,

Path bigint)

GO

The purpose of the first three columns are obvious. The column Hkey is a Unique Prime Number assigned to each employee. The column Path Holds the product of the Hkey values of the current node and nodes above the current node.

The script used for creating and Populating the PrimeNumber table is here.

**1. Add a new Employee**For inserting a new node (employee), we will get the least available Prime Number and use it as the Hkey of the node. The Path will be the Product of the Hkey and the Path of the parent node. For the root level entry the path will be the Hkey itself.

Here is the script for the stored procedure dbo.InsertNode

Now lets insert some sample data.

EXEC dbo.InsertNode 1,'Albert', NULL

EXEC dbo.InsertNode 2,'Bert', 1

EXEC dbo.InsertNode 3,'Chuck', 1

EXEC dbo.InsertNode 4,'Dona', 3

EXEC dbo.InsertNode 5,'Eddie', 3

EXEC dbo.InsertNode 6,'Fred', 3

SELECT * FROM EmpHierarchy

Now we have this data in the table.

EmpId | EmpName | ManagerId | HKey | Path |

2 | Bert | 1 | 3 | 6 |

3 | Chuck | 1 | 5 | 10 |

4 | Dona | 3 | 7 | 70 |

5 | Eddie | 3 | 11 | 110 |

6 | Fred | 3 | 13 | 130 |

**2. Show the whole Hierarchy with levels.**The level of a node in a hierarchy depends on the number of levels above it. So the level of a given node can be found by counting the levels above it. Thats the technique used in the stored procedure dbo.PrintHierarchy. The SP also accepts an EmployeeId, to display a partial hierarchy.

EXEC dbo.PrintHierarchy

Here is the output

Hierarchy | Level | Employee | Manager |

- | 1 | Albert | NULL |

-- | 2 | Bert | Albert |

-- | 2 | Chuck | Albert |

--- | 3 | Dona | Chuck |

--- | 3 | Eddie | Chuck |

--- | 3 | Fred | Chuck |

**3. Show the superiors of a given employee.**The path of a given employee is the product of the Hkeys of his superiors and his Hkey. As per our rule the path can be divided only by the Prime Numbers participated in it. Based on that finding the superiors logic is failrly easy. You can get the list of the superios of a given employee by

SELECT A.EmpId, A.EmpName

FROM EmpHierarchy A

WHERE [Path of the given Employee] % A.Hkey = 0

I have this Stored Procedure dbo.getSuperiors which accepts an Employee Id and return the list of his superiors.

To get the Superiors of 'Fred', execute

EXEC dbo. getSuperiors 6

Here is the output

EmpId | Employee | Manager |

6 | Fred | Chuck |

3 | Chuck | Albert |

1 | Albert | NULL |

**4. Show the subordinates of a given employee.**The Hkey of a given employee will be a factor of the Path of all his subordinates and only his subordinates.

So to get the list of the subordinates of an employee, you can run a query like

SELECT A.EmpId, A.EmpName AS Employee

FROM EmpHierarchy A

WHERE A.Path % [HKey of the given Employee] = 0

Here is the stored procedure dbo. getSubordinates for doing the same.

Now to get the list of subordiantes of 'Chuck', execute

EXEC dbo. getSubordinates 3

Here is the output

EmpId | Employee | Manager |

3 | Chuck | Albert |

4 | Dona | Chuck |

5 | Eddie | Chuck |

6 | Fred | Chuck |

**5. Show the immediate subordinates of a given employee.**The path of the immediate subordiantes of a given employee will be the product of the path of the manager and the HKey of the subordinate.

So one can find the immediate subordinates of a given employee by writing a query like

SELECT EmpId, EmpName

FROM EmpHierarchy

WHERE Path = HKey * [Path of the Given Employee]

For eg, to get All the immediate subordinates of Albert, the query will be

SELECT EmpId, EmpName

FROM EmpHierarchy

WHERE Path = HKey *2 --Path of Albert

Here is the output

EmpId | EmpName |

2 | Bert |

3 | Chuck |

**6. List all the Leaf Level Employees.**Leaf level employees are those who dont have any subordinates. So finding them is a trivial task.

The following query gives you the list of leaf level employees.

SELECT EmpId, EmpName

FROM EmpHierarchy A

WHERE NOT Exists(SELECT 1 FROM EmpHierarchy T

WHERE T.Path % A.Hkey = 0

AND T.Path > A.Path)

And here is the output.

EmpId | EmpName |

2 | Bert |

4 | Dona |

5 | Eddie |

6 | Fred |

**7. Change the manager of an Employee**Updating can be of two types, promotion and demotion. When updating an Employees manager, you need to update the rows of the given employee and all his subordinates Path. Also you need to perform some validation to avoid cyclic reference. Here is the stored Procedure dbo.UpdateNode to update a given node.

So now to change Fred's Manager to Bert from Chuck, you can Execute

EXEC dbo.UpdateNode EmployeeId, NewManagerId

EXEC dbo.UpdateNode 6, 2

After the update, the hierarchy will look like

EXEC dbo.PrintHierarchy

Hierarchy | Level | Employee | Manager |

- | 1 | Albert | NULL |

-- | 2 | Bert | Albert |

-- | 2 | Chuck | Albert |

--- | 3 | Dona | Chuck |

--- | 3 | Eddie | Chuck |

--- | 3 | Fred | Bert |

Now, lets try a promotion. Eddie directly reports to Albert instead of Chuck

EXEC dbo.UpdateNode 5, 1

This is how the hierarchy will look like now.

EXEC dbo.PrintHierarchy

Hierarchy | Level | Employee | Manager |

- | 1 | Albert | NULL |

-- | 2 | Bert | Albert |

-- | 2 | Chuck | Albert |

-- | 2 | Eddie | Albert |

--- | 3 | Fred | Bert |

--- | 3 | Dona | Chuck |

Finally lets demote chuck one level down the hierarchy. Instead of reporting to Albert, he will now report to Bert

EXEC dbo.UpdateNode 3, 2

EXEC dbo.PrintHierarchy

Hierarchy | Level | Employee | Manager |

- | 1 | Albert | NULL |

-- | 2 | Bert | Albert |

-- | 2 | Eddie | Albert |

--- | 3 | Fred | Bert |

--- | 3 | Chuck | Bert |

---- | 4 | Dona | Chuck |

**8. Delete an Employee and all his subordiantes.**Deleting a subtree is fairly easy. we can use the same logic of finding subordinates and apply that in the where clause.

Here is the stored procedure dbo.deleteSubTree

Now to fire the whole team of Chuck, you can execute

EXEC dbo.deleteSubTree 3 --EmployeeId of chuck

After the deletion, the Hierarchy will look like

EXEC dbo.PrintHierarchy

Hierarchy | Level | Employee | Manager |

- | 1 | Albert | NULL |

-- | 2 | Bert | Albert |

-- | 2 | Eddie | Albert |

--- | 3 | Fred | Bert |

**9. Delete an Employee.**

This can be more tricky, because when you delete an Employee, If he has subordinates, the hierarchy will be broken. So all the subordinates of the deleted employee should report to the Manager of the deleted employee. Here is the Stored Procedure dbo.deleteNode, which takes care of it.So Now lets try deleting Bert.

EXEC dbo.deleteNode 2

After the deletion, the hierarchy will look like

Hierarchy | Level | Employee | Manager |

- | 1 | Albert | NULL |

-- | 2 | Eddie | Albert |

-- | 2 | Fred | Albert |

And thats it :)

These are the pro's and con's of the model from my point of view.

**Pros**1. Use only standard SQL constructs.

2. Use only set based solutions.

3. Can be implemented as self manageable using triggers.

4. Gives better performance because it uses Numeric operations instead of string operations.

**Cons**1. For larger Hierarchies, you may encounter out of range errors because the Path value grows in an exponential rate.

Still you can replace the bigint with a larger datatype like numeric and replace the % operator with division operator and overcome the issue.

AS I mentioned earlier, this model is still in its evolving face. So as always, your feedback is highly appreciated.

**EmpHierarchy Table Script**

This is the table script used in the examples for Path enumeration using Prime Number Products.

CREATE TABLE dbo.EmpHierarchy(EmpId int PRIMARY KEY,

EmpName VARCHAR(50),

ManagerId int,

HKey bigint UNIQUE,

Path bigint)

**Prime Numbers table and Insert Script**

Here is the script to create the Prime Numbers table and populate the first 1000 Prime numbers.

CREATE TABLE dbo.PrimeNumbers(Number bigint PRIMARY KEY)

GO

--Insert First 1000 Prime Numbers

DECLARE @num bigint

DECLARE @cnt int

DECLARE @isPrime bit

DECLARE @nsqrt int

DECLARE @divisor int

INSERT INTO PrimeNumbers VALUES(2) -- First Prime Number

SET @num = 3 --Next Prime Number

SET @cnt = 1 --One row already inserted.

WHILE @cnt <>

BEGIN

SET @isPrime = 1 -- Be Optimistic ;)

SET @nsqrt = FLOOR(SQRT(@num))

SET @divisor = 3

WHILE @divisor <= @nsqrt

BEGIN

IF @num % @divisor = 0

BEGIN

SET @isPrime = 0

BREAK

END

SET @divisor = @divisor + 2

END

IF @isPrime = 1

BEGIN

SET @cnt = @cnt + 1

INSERT INTO PrimeNumbers VALUES(@num)

END

SET @num = @num+2

END

--SELECT * FROM PrimeNumbers

**Insert Node**

Stored Procedure for Inserting a new Employee

/*

Inserts a new Employee into the table

*/

CREATE PROC dbo.InsertNode

(@empid int,

@empName VARCHAR(50),

@MgrId int)

AS

BEGIN

DECLARE @Hkey int

DECLARE @PathVal Numeric(32,0)

BEGIN TRAN

--Get the next Prime Number

SELECT @Hkey = Min(Number) FROM PrimeNumbers

WHERE Number > (SELECT ISNULL(Max(Hkey),0) FROM EmpHierarchy)

IF @MgrID IS NULL OR @MgrId = @EmpId --Root Node

SET @PathVal = @Hkey

ELSE --Get Managers Path and Calculate the path of the new node

SELECT @PathVal = Path * @Hkey FROM EmpHierarchy

WHERE EmpId = @MgrId

INSERT INTO EmpHierarchy

VALUES(@empid, @empName, @MgrId,@Hkey, @PathVal)

IF @@ERROR <> 0

ROLLBACK TRAN

ELSE

COMMIT TRAN

END

**Print Hierarchy**

/*

Print Hierarchy with Graphical representation of the Level

*/

CREATE PROC dbo.PrintHierarchy

@EmpID as bigint = 1

AS

BEGIN

SET NOCOUNT ON

DECLARE @PathVal as bigint

SELECT @PathVal = Path FROM EmpHierarchy

WHERE EmpId = @EmpID

SELECT REPLICATE('-',(SELECT COUNT(*) FROM EmpHierarchy B WHERE A.Path % B.Path = 0)) As Hierarchy,

(SELECT COUNT(*) FROM EmpHierarchy B WHERE A.Path % B.Path = 0) as [Level],

A.EmpName AS Employee, C.EmpName As Manager

FROM EmpHierarchy A

LEFT JOIN EmpHierarchy C

ON A.ManagerId = C.EmpId

WHERE A.Path % @PathVal = 0

ORDER By [Level]

END

**List of Superiors**

/*

Get the List of Superiors and their managers

*/

CREATE PROC dbo.getSuperiors

(@EmpId int)

AS

BEGIN

SET NOCOUNT ON

DECLARE @Path bigint

SELECT @Path = Path

FROM EmpHierarchy

WHERE EmpId = @EmpId

SELECT A.EmpId, A.EmpName AS Employee, C.EmpName As Manager

FROM EmpHierarchy A

LEFT JOIN EmpHierarchy C

ON A.ManagerId = C.EmpId

WHERE @Path % A.Hkey = 0

ORDER By A.Path DESC

END

**List of Subordinates**

/*

Get the List of Subordinates and their managers

*/

CREATE PROC dbo.getSubordinates

(@EmpId int)

AS

BEGIN

SET NOCOUNT ON

DECLARE @Hkey bigint

SELECT @HKey = Hkey

FROM EmpHierarchy

WHERE EmpId = @EmpId

SELECT A.EmpId, A.EmpName AS Employee, C.EmpName As Manager

FROM EmpHierarchy A

LEFT JOIN EmpHierarchy C

ON A.ManagerId = C.EmpId

WHERE A.Path % @HKey = 0

ORDER By A.Path

END

GO

**Update Node**

/*

Update an Employee with a new Manager. Also adjusts the path of the

Employee and his Subordinates

*/

CREATE PROC dbo.UpdateNode

(@Empid bigint,

@NewMgrId bigint)

AS

BEGIN

SET NOCOUNT ON

DECLARE @CurrentMgrPath bigint

DECLARE @NewMgrPath bigint

DECLARE @CurrentMgr bigint

DECLARE @CurrentPath bigint

DECLARE @NewPath bigint

--Employee Exists?

IF NOT EXISTS(SELECT 1 FROM EmpHierarchy WHERE EmpId = @empid)

RAISERROR('Employee Id %d Does not exists',16,1,@Empid)

--Manager Exists

IF NOT EXISTS(SELECT 1 FROM EmpHierarchy WHERE EmpId = @NewMgrId)

RAISERROR('Manager %d Does not exists',16,1,@NewMgrId)

--Get the required values to variables

SELECT @CurrentMgr = ManagerId,

@CurrentPath = Path

FROM EmpHierarchy

WHERE EmpId = @empid

SELECT @CurrentMgrPath = Path

FROM EmpHierarchy

WHERE EmpId = @CurrentMgr

SELECT @NewMgrPath = Path

FROM EmpHierarchy

WHERE EmpId = @NewMgrId

--Cyclic Reference

IF EXISTS(SELECT 1 FROM EmpHierarchy WHERE EmpId = @NewMgrId

AND Path % @CurrentPath = 0)

RAISERROR('Error: Cyclic Reference',16,1)

--Seems OK. Update.

BEGIN TRAN

--Update Manager

UPDATE EmpHierarchy

SET ManagerId = @NewMgrId

WHERE EmpId = @Empid

IF @@ROWCOUNT = 0 OR @@ERROR <> 0

BEGIN

ROLLBACK

RAISERROR('Failed To update',16,1)

END

--Update Path

UPDATE EmpHierarchy

SET Path = ((Path / @CurrentMgrPath)*@NewMgrPath)

WHERE Path % @CurrentPath = 0

IF @@ROWCOUNT = 0 OR @@ERROR <> 0

BEGIN

ROLLBACK

RAISERROR('Failed To update',16,1)

END

ELSE

BEGIN

COMMIT

RETURN 1

END

END

GO

**Delete a SubTree**

/*

Delete a subtree from the hierarchy

*/

CREATE PROC dbo.deleteSubTree

(@EmpId int)

AS

BEGIN

SET NOCOUNT ON

DECLARE @Hkey bigint

SELECT @Hkey = Hkey

FROM EmpHierarchy

WHERE EmpId = @EmpId

DELETE FROM EmpHierarchy

WHERE Path % @Hkey = 0

END

**Delete a Single Node**

/*

Delete a given node. All Employees reporting to

the deleted employee will now report to the manager of the deleted employee

*/

ALTER PROC dbo.deleteNode

(@EmpId int)

AS

BEGIN

SET NOCOUNT ON

DECLARE @Hkey bigint

DECLARE @MgrId bigint

DECLARE @EmpPath bigint

DECLARE @MgrPath bigint

SELECT @Hkey = Hkey,

@MgrId = ManagerId,

@EmpPath = Path

FROM EmpHierarchy

WHERE EmpId = @EmpId

SELECT @MgrPath = Path

FROM EmpHierarchy

WHERE EmpId = @MgrId

--Delete Employee

DELETE FROM EmpHierarchy

WHERE EmpId = @EmpId

--Update ManagerID of Immediate Subordiantes

UPDATE EmpHierarchy

SET ManagerId = @MgrId

WHERE Path / Hkey = @EmpPath

--Connect subordinates to the immediate superior

UPDATE EmpHierarchy

SET Path = ((Path/@EmpPath)*@MgrPath)

WHERE PAth % @Hkey = 0

END

GO