Tuesday, March 28, 2006

Hierarchies in RDBMS

Here's one I've been knocking around while: how do you most effectively model an infinite-depth hierarchical structure in a relational database like PostgreSQL, MySQL, etc.?

A couple of choices come to mind:
1. a table with a field to contain the key value of its parent record
2. a table with a field containing a delimited value, like a filesystem pathname
3. A "link" table to map parent and child record IDs from the "data" table. This has the added capability of a record with multiple ancestors.

It seems that options 1 & 2 are similar, the main difference being 3 supports multiple ancestors. This could be extra confusing if you don't need this capability. Finding all children of a parent would difficult to do, and I think impossible with SQL alone--some sort of iteration or recursion logic would need to be written, and it is likely to be very slow with large datasets--you can index the keys to find records quickly, but you have to query the index many times.

Option 3 would support speedy lookups, but reorganizing the structure would be messy, especially for large datasets.

Is there a way to get the best of both worlds? How can lookups be quick and re-orgs also?

Someone must have an answer out there...