Monday, March 14, 2005

 

Path Enumeration using Prime Number Products

Managing Hierarchies in Relational Database Environment is always a challenging task. Over the years many models like Adjacency List Model, Nested Set Model, Nested Interval Model, Path Enumeration Model etc. were proposed and implemented. There are also other popular methods using recursion and user defined function. SQL Server 2005 T-SQL Extensions has special operators for managing recursive queries. Joe CELKO has written a book exclusively on Managing Trees and Hierarchies.

The basic Hierarchical operations in a typical Employee-Manager type of relation, include

1. Add a new Employee
2. Show the whole Hierarchy with levels.
3. Show the superiors of a given employee.
4. Show the subordinates of a given employee.
5. Show the immediate subordinates of a given employee.
6. List all the Leaf Level Employees.
7. Change the manager of an Employee
8. Delete an Employee and all his subordiantes.
9. Delete an Employee.

Here I am trying to propose a simple model which works almost like Path Enumeration model. This model uses a Prime Number table and basic mathematical operations to answer the above questions. Also the Model only uses set based solutions and can be implemented as a self managing system with the help of triggers.

This model works on two basic concepts.

1. There is a unique path to every node in a hierarchy from the root node.
2. The Prime Number that can be a divisor of the Product of a set of Prime Numbers, are only the Prime numbers in the set.

That is, Let product of n prime numbers be Px, then, the factors of Px can only be the Prime Numbers that participated in the multiplication, or the sub products.
So the product cannot be divided by any other Prime Number.

For eg.

P1, P2, P3,...Pn is the list of Prime Numbers.

Let P be the Product.

Then P1 * P2 * P3 * ... * Pn = P

Then the factors of P can be either P1, P2, P3,...Pn Or the sub product of P1, P2, P3,...Pn.


Now let's try to answer the above questions.

The following is the table used in the examples.

CREATE TABLE dbo.EmpHierarchy
(EmpId int PRIMARY KEY,
EmpName VARCHAR(15),
ManagerId int,
HKey bigint UNIQUE,
Path bigint)

GO

The purpose of the first three columns are obvious. The column Hkey is a Unique Prime Number assigned to each employee. The column Path Holds the product of the Hkey values of the current node and nodes above the current node.

The script used for creating and Populating the PrimeNumber table is here.

1. Add a new Employee
For inserting a new node (employee), we will get the least available Prime Number and use it as the Hkey of the node. The Path will be the Product of the Hkey and the Path of the parent node. For the root level entry the path will be the Hkey itself.

Here is the script for the stored procedure dbo.InsertNode

Now lets insert some sample data.

EXEC dbo.InsertNode 1,'Albert', NULL
EXEC dbo.InsertNode 2,'Bert', 1
EXEC dbo.InsertNode 3,'Chuck', 1
EXEC dbo.InsertNode 4,'Dona', 3
EXEC dbo.InsertNode 5,'Eddie', 3
EXEC dbo.InsertNode 6,'Fred', 3

SELECT * FROM EmpHierarchy

Now we have this data in the table.

EmpIdEmpNameManagerIdHKeyPath
2Bert136
3Chuck1510
4Dona3770
5Eddie311110
6Fred313130

2. Show the whole Hierarchy with levels.
The level of a node in a hierarchy depends on the number of levels above it. So the level of a given node can be found by counting the levels above it. Thats the technique used in the stored procedure dbo.PrintHierarchy. The SP also accepts an EmployeeId, to display a partial hierarchy.

EXEC dbo.PrintHierarchy

Here is the output
HierarchyLevelEmployeeManager
-1AlbertNULL
--2BertAlbert
--2ChuckAlbert
---3DonaChuck
---3EddieChuck
---3FredChuck

3. Show the superiors of a given employee.
The path of a given employee is the product of the Hkeys of his superiors and his Hkey. As per our rule the path can be divided only by the Prime Numbers participated in it. Based on that finding the superiors logic is failrly easy. You can get the list of the superios of a given employee by

SELECT A.EmpId, A.EmpName
FROM EmpHierarchy A
WHERE [Path of the given Employee] % A.Hkey = 0


I have this Stored Procedure dbo.getSuperiors which accepts an Employee Id and return the list of his superiors.

To get the Superiors of 'Fred', execute

EXEC dbo. getSuperiors 6

Here is the output
EmpIdEmployeeManager
6FredChuck
3ChuckAlbert
1AlbertNULL

4. Show the subordinates of a given employee.
The Hkey of a given employee will be a factor of the Path of all his subordinates and only his subordinates.
So to get the list of the subordinates of an employee, you can run a query like

SELECT A.EmpId, A.EmpName AS Employee
FROM EmpHierarchy A
WHERE A.Path % [HKey of the given Employee] = 0

Here is the stored procedure dbo. getSubordinates for doing the same.

Now to get the list of subordiantes of 'Chuck', execute

EXEC dbo. getSubordinates 3

Here is the output
EmpIdEmployeeManager
3ChuckAlbert
4DonaChuck
5EddieChuck
6FredChuck

5. Show the immediate subordinates of a given employee.
The path of the immediate subordiantes of a given employee will be the product of the path of the manager and the HKey of the subordinate.
So one can find the immediate subordinates of a given employee by writing a query like

SELECT EmpId, EmpName
FROM EmpHierarchy
WHERE Path = HKey * [Path of the Given Employee]

For eg, to get All the immediate subordinates of Albert, the query will be

SELECT EmpId, EmpName
FROM EmpHierarchy
WHERE Path = HKey *2 --Path of Albert

Here is the output

EmpIdEmpName
2Bert
3Chuck

6. List all the Leaf Level Employees.
Leaf level employees are those who dont have any subordinates. So finding them is a trivial task.
The following query gives you the list of leaf level employees.

SELECT EmpId, EmpName
FROM EmpHierarchy A
WHERE NOT Exists(SELECT 1 FROM EmpHierarchy T
WHERE T.Path % A.Hkey = 0
AND T.Path > A.Path)

And here is the output.
EmpIdEmpName
2Bert
4Dona
5Eddie
6Fred

7. Change the manager of an Employee
Updating can be of two types, promotion and demotion. When updating an Employees manager, you need to update the rows of the given employee and all his subordinates Path. Also you need to perform some validation to avoid cyclic reference. Here is the stored Procedure dbo.UpdateNode to update a given node.

So now to change Fred's Manager to Bert from Chuck, you can Execute

EXEC dbo.UpdateNode EmployeeId, NewManagerId

EXEC dbo.UpdateNode 6, 2

After the update, the hierarchy will look like

EXEC dbo.PrintHierarchy
HierarchyLevelEmployeeManager
-1AlbertNULL
--2BertAlbert
--2ChuckAlbert
---3DonaChuck
---3EddieChuck
---3FredBert

Now, lets try a promotion. Eddie directly reports to Albert instead of Chuck

EXEC dbo.UpdateNode 5, 1

This is how the hierarchy will look like now.

EXEC dbo.PrintHierarchy
HierarchyLevelEmployeeManager
-1AlbertNULL
--2BertAlbert
--2ChuckAlbert
--2EddieAlbert
---3FredBert
---3DonaChuck

Finally lets demote chuck one level down the hierarchy. Instead of reporting to Albert, he will now report to Bert

EXEC dbo.UpdateNode 3, 2

EXEC dbo.PrintHierarchy
HierarchyLevelEmployeeManager
-1AlbertNULL
--2BertAlbert
--2EddieAlbert
---3FredBert
---3ChuckBert
----4DonaChuck

8. Delete an Employee and all his subordiantes.
Deleting a subtree is fairly easy. we can use the same logic of finding subordinates and apply that in the where clause.
Here is the stored procedure dbo.deleteSubTree

Now to fire the whole team of Chuck, you can execute

EXEC dbo.deleteSubTree 3 --EmployeeId of chuck

After the deletion, the Hierarchy will look like

EXEC dbo.PrintHierarchy
HierarchyLevelEmployeeManager
-1AlbertNULL
--2BertAlbert
--2EddieAlbert
---3FredBert

9. Delete an Employee.
This can be more tricky, because when you delete an Employee, If he has subordinates, the hierarchy will be broken. So all the subordinates of the deleted employee should report to the Manager of the deleted employee. Here is the Stored Procedure dbo.deleteNode, which takes care of it.

So Now lets try deleting Bert.

EXEC dbo.deleteNode 2

After the deletion, the hierarchy will look like
HierarchyLevelEmployeeManager
-1AlbertNULL
--2EddieAlbert
--2FredAlbert

And thats it :)

These are the pro's and con's of the model from my point of view.

Pros
1. Use only standard SQL constructs.
2. Use only set based solutions.
3. Can be implemented as self manageable using triggers.
4. Gives better performance because it uses Numeric operations instead of string operations.

Cons
1. For larger Hierarchies, you may encounter out of range errors because the Path value grows in an exponential rate.
Still you can replace the bigint with a larger datatype like numeric and replace the % operator with division operator and overcome the issue.

AS I mentioned earlier, this model is still in its evolving face. So as always, your feedback is highly appreciated.

 
EmpHierarchy Table Script
This is the table script used in the examples for Path enumeration using Prime Number Products.


CREATE TABLE dbo.EmpHierarchy(EmpId int PRIMARY KEY,
EmpName VARCHAR(50),
ManagerId int,
HKey bigint UNIQUE,
Path bigint)

 
Prime Numbers table and Insert Script

Here is the script to create the Prime Numbers table and populate the first 1000 Prime numbers.

CREATE TABLE dbo.PrimeNumbers(Number bigint PRIMARY KEY)
GO

--Insert First 1000 Prime Numbers

DECLARE @num bigint
DECLARE @cnt int
DECLARE @isPrime bit
DECLARE @nsqrt int
DECLARE @divisor int

INSERT INTO PrimeNumbers VALUES(2) -- First Prime Number

SET @num = 3 --Next Prime Number
SET @cnt = 1 --One row already inserted.

WHILE @cnt <>
BEGIN
SET @isPrime = 1 -- Be Optimistic ;)
SET @nsqrt = FLOOR(SQRT(@num))
SET @divisor = 3
WHILE @divisor <= @nsqrt
BEGIN
IF @num % @divisor = 0
BEGIN
SET @isPrime = 0
BREAK
END

SET @divisor = @divisor + 2
END

IF @isPrime = 1
BEGIN
SET @cnt = @cnt + 1
INSERT INTO PrimeNumbers VALUES(@num)
END

SET @num = @num+2
END

--SELECT * FROM PrimeNumbers

 
Insert Node
Stored Procedure for Inserting a new Employee


/*
Inserts a new Employee into the table
*/
CREATE PROC dbo.InsertNode
(@empid int,
@empName VARCHAR(50),
@MgrId int)
AS

BEGIN
DECLARE @Hkey int
DECLARE @PathVal Numeric(32,0)

BEGIN TRAN

--Get the next Prime Number
SELECT @Hkey = Min(Number) FROM PrimeNumbers
WHERE Number > (SELECT ISNULL(Max(Hkey),0) FROM EmpHierarchy)


IF @MgrID IS NULL OR @MgrId = @EmpId --Root Node
SET @PathVal = @Hkey
ELSE --Get Managers Path and Calculate the path of the new node
SELECT @PathVal = Path * @Hkey FROM EmpHierarchy
WHERE EmpId = @MgrId

INSERT INTO EmpHierarchy
VALUES(@empid, @empName, @MgrId,@Hkey, @PathVal)

IF @@ERROR <> 0
ROLLBACK TRAN
ELSE
COMMIT TRAN


END


 
Print Hierarchy


/*
Print Hierarchy with Graphical representation of the Level
*/

CREATE PROC dbo.PrintHierarchy
@EmpID as bigint = 1
AS
BEGIN
SET NOCOUNT ON
DECLARE @PathVal as bigint
SELECT @PathVal = Path FROM EmpHierarchy
WHERE EmpId = @EmpID

SELECT REPLICATE('-',(SELECT COUNT(*) FROM EmpHierarchy B WHERE A.Path % B.Path = 0)) As Hierarchy,
(SELECT COUNT(*) FROM EmpHierarchy B WHERE A.Path % B.Path = 0) as [Level],
A.EmpName AS Employee, C.EmpName As Manager
FROM EmpHierarchy A
LEFT JOIN EmpHierarchy C
ON A.ManagerId = C.EmpId
WHERE A.Path % @PathVal = 0
ORDER By [Level]
END





 
List of Superiors

/*
Get the List of Superiors and their managers
*/

CREATE PROC dbo.getSuperiors
(@EmpId int)
AS
BEGIN
SET NOCOUNT ON
DECLARE @Path bigint

SELECT @Path = Path
FROM EmpHierarchy
WHERE EmpId = @EmpId


SELECT A.EmpId, A.EmpName AS Employee, C.EmpName As Manager
FROM EmpHierarchy A
LEFT JOIN EmpHierarchy C
ON A.ManagerId = C.EmpId
WHERE @Path % A.Hkey = 0
ORDER By A.Path DESC

END



 
List of Subordinates

/*
Get the List of Subordinates and their managers
*/


CREATE PROC dbo.getSubordinates
(@EmpId int)
AS
BEGIN
SET NOCOUNT ON
DECLARE @Hkey bigint

SELECT @HKey = Hkey
FROM EmpHierarchy
WHERE EmpId = @EmpId


SELECT A.EmpId, A.EmpName AS Employee, C.EmpName As Manager
FROM EmpHierarchy A
LEFT JOIN EmpHierarchy C
ON A.ManagerId = C.EmpId
WHERE A.Path % @HKey = 0
ORDER By A.Path

END

GO

 
Update Node

/*
Update an Employee with a new Manager. Also adjusts the path of the
Employee and his Subordinates
*/

CREATE PROC dbo.UpdateNode
(@Empid bigint,
@NewMgrId bigint)
AS

BEGIN
SET NOCOUNT ON

DECLARE @CurrentMgrPath bigint
DECLARE @NewMgrPath bigint
DECLARE @CurrentMgr bigint
DECLARE @CurrentPath bigint
DECLARE @NewPath bigint

--Employee Exists?
IF NOT EXISTS(SELECT 1 FROM EmpHierarchy WHERE EmpId = @empid)
RAISERROR('Employee Id %d Does not exists',16,1,@Empid)

--Manager Exists
IF NOT EXISTS(SELECT 1 FROM EmpHierarchy WHERE EmpId = @NewMgrId)
RAISERROR('Manager %d Does not exists',16,1,@NewMgrId)

--Get the required values to variables
SELECT @CurrentMgr = ManagerId,
@CurrentPath = Path
FROM EmpHierarchy
WHERE EmpId = @empid

SELECT @CurrentMgrPath = Path
FROM EmpHierarchy
WHERE EmpId = @CurrentMgr

SELECT @NewMgrPath = Path
FROM EmpHierarchy
WHERE EmpId = @NewMgrId

--Cyclic Reference
IF EXISTS(SELECT 1 FROM EmpHierarchy WHERE EmpId = @NewMgrId
AND Path % @CurrentPath = 0)
RAISERROR('Error: Cyclic Reference',16,1)


--Seems OK. Update.
BEGIN TRAN
--Update Manager
UPDATE EmpHierarchy
SET ManagerId = @NewMgrId
WHERE EmpId = @Empid

IF @@ROWCOUNT = 0 OR @@ERROR <> 0
BEGIN
ROLLBACK
RAISERROR('Failed To update',16,1)
END

--Update Path
UPDATE EmpHierarchy
SET Path = ((Path / @CurrentMgrPath)*@NewMgrPath)
WHERE Path % @CurrentPath = 0

IF @@ROWCOUNT = 0 OR @@ERROR <> 0
BEGIN
ROLLBACK
RAISERROR('Failed To update',16,1)
END
ELSE
BEGIN
COMMIT
RETURN 1
END
END

GO

 
Delete a SubTree


/*
Delete a subtree from the hierarchy
*/


CREATE PROC dbo.deleteSubTree
(@EmpId int)
AS
BEGIN
SET NOCOUNT ON
DECLARE @Hkey bigint

SELECT @Hkey = Hkey
FROM EmpHierarchy
WHERE EmpId = @EmpId

DELETE FROM EmpHierarchy
WHERE Path % @Hkey = 0


END



 
Delete a Single Node



/*
Delete a given node. All Employees reporting to
the deleted employee will now report to the manager of the deleted employee

*/


ALTER PROC dbo.deleteNode
(@EmpId int)
AS
BEGIN
SET NOCOUNT ON
DECLARE @Hkey bigint
DECLARE @MgrId bigint
DECLARE @EmpPath bigint
DECLARE @MgrPath bigint

SELECT @Hkey = Hkey,
@MgrId = ManagerId,
@EmpPath = Path
FROM EmpHierarchy
WHERE EmpId = @EmpId

SELECT @MgrPath = Path
FROM EmpHierarchy
WHERE EmpId = @MgrId

--Delete Employee
DELETE FROM EmpHierarchy
WHERE EmpId = @EmpId

--Update ManagerID of Immediate Subordiantes
UPDATE EmpHierarchy
SET ManagerId = @MgrId
WHERE Path / Hkey = @EmpPath

--Connect subordinates to the immediate superior
UPDATE EmpHierarchy
SET Path = ((Path/@EmpPath)*@MgrPath)
WHERE PAth % @Hkey = 0


END

GO



This page is powered by Blogger. Isn't yours?