Explanation:
I'll use the following tree from this article in the explanation that follows:
Given the tree above, we know that in order to retrieve the descendants of the Meat
node, we need to execute an SQL query that looks like this:
SELECT * FROM tree WHERE lft BETWEEN 12 AND 17;
And for the ancestors of the Pork
node, we'd need to execute an SQL query that looks like this:
SELECT * FROM tree WHERE lft < 15 AND rght > 16;
But what if we wanted the all of the descendants of both the Fruit
and Meat
nodes? We could do the following:
SELECT * FROM tree WHERE lft BETWEEN 2 AND 11 OR lft BETWEEN 12 AND 17;
And if we wanted to retrieve all of the ancestors of the Red
, Beef
and Pork
nodes, we could do this:
SELECT * FROM tree WHERE lft < 3 AND rght > 6 OR lft < 13 AND rght > 14 OR lft < 15 AND rght > 16;
Now take a look at that last query again. Couldn't it be simplified and still accomplish the same result? Sure it could:
SELECT * FROM tree WHERE lft < 3 AND rght > 6 OR lft < 15 AND rght > 14;
The difference is that we've put the nodes into groups and then identified the highest lft
value and the lowest rght
value per group. A group is defined as any collection of nodes that share a tree_id
and a parent_id
. Any ancestors of these groups of nodes will have a lft
value less than the highest lft
value and a rght
value greater than the lowest rght
value.
~~For the descendants of a group of nodes, it is just the opposite:~~ (See the edit below, addressing the problem with this method.) Any descendants of a group of nodes will have a lft
value greater than the lowest lft
value and a rght
value less than the highest rght
value. For example, if we wanted to retrieve the descendants of Red
, Yellow
, and Meat
:
SELECT * FROM tree WHERE lft > 3 AND rght < 10 OR lft > 12 AND rght < 17;
(Edit: 30 Sept. 2014) However, this method returns more nodes than is required, as pointed out by craigds in the comments below. I'm currently working on a method to fix this.
Why?
Both get_queryset_descendants
and get_queryset_ancestors
use a Q
object in order to accomplish their job: retrieving either the ancestors or descendants of a queryset (i.e., group) of nodes. This Q
object can become quite large and in some cases it can raise an OperationalError
if there are too many SQL variables involved. For example, sqlite
only allows a single query to have <1000 SQL variables; well, if you wanted to get the ancestors of a queryset that contains 400 distinct nodes, you're looking at a query that contains 1200 variables. Now we could discuss the various techniques involved in shrinking that queryset, but with sufficiently large sets of data, sometimes there just isn't a way to do it.
That's where the aggregate
kwarg comes into play. This kwarg tells get_queryset_ancestors
and get_queryset_descendants
to group nodes first by their tree_id
then by their parent_id
. Then, for each "group", it finds the the appropriate lft
and rght
values to ensure that there is only one Q
object per group of nodes. To reuse the example above, if you had a queryset with 400 distinct nodes but 200 of them shared the same tree_id
and parent_id
and the other 200 shared a different tree_id
/parent_id
combination, you would have two groups, with a total of 6 SQL variables (tree_id
, lft
, rght
variables for each group.) Not only does this make for faster queries, but it has the potential to avoid the scenario I mentioned earlier where too many SQL variables raises an OperationalError
.
Examples:
models.py
from django.db import models
from mptt.models import TreeForeignKey, MPTTModel
class Directory(MPTTModel):
parent = TreeForeignKey(
'self',
null = True,
blank = True,
related_name = 'children')
path = models.CharField(
max_length = 255)
def __unicode__(self):
return self.path
class File(models.Model):
parent = models.ForeignKey(
Directory,
null = True,
blank = True,
related_name = 'files')
path = models.CharField(
max_length = 255)
def __unicode__(self):
return self.path
Our dataset is a directory structure that matches the example tree in the above graphic:
mkdir -p Food/Fruit/Red/Cherry
mkdir -p Food/Fruit/Yellow/Banana
mkdir -p Food/Meat/{Beef,Pork}
Then:
import os
from myapp.filesanddirectories.models import *
for root, directories, files in os.walk('/Food'):
parent_dir, created = Directory.objects.get_or_create(path = root)
for directory in directories:
path = os.path.join(root, directory)
print path
object, created = Directory.objects.get_or_create(path = path, parent = parent_dir)
if created:
print("CREATED: %s" % path)
Running in the python interpreter (with debugging output):
>>> from myapp.filesanddirectories.models import *
>>> from django.db.models import Q
>>> from django.db import transaction
>>> dirs = Directory.objects.filter(Q(path = '/Food/Fruit') | Q(path = '/Food/Meat'))
>>> dirs
[<Directory: /Food/Fruit>, <Directory: /Food/Meat>]
>>>
>>> ### get_queryset_descendants ###
>>>
>>> transaction.commit()
>>> Directory.objects.get_queryset_descendants(dirs)
Finished in 0.000601
Q Object: (OR: (AND: (u'lft__gt', 2), (u'tree_id', 1), (u'rght__lt', 11)), (AND: (u'lft__gt', 12), (u'tree_id', 1), (u'rght__lt', 17)))
[<Directory: /Food/Fruit/Red>, <Directory: /Food/Fruit/Red/Cherry>, <Directory: /Food/Fruit/Yellow>, <Directory: /Food/Fruit/Yellow/Banana>, <Directory: /Food/Meat/Pork>, <Directory: /Food/Meat/Beef>]
>>>
>>> transaction.commit()
>>> Directory.objects.get_queryset_descendants(dirs, aggregate=True)
Finished in 0.000036
Q Object: (AND: (u'lft__gt', 2), (u'tree_id', 1), (u'rght__lt', 17))
[<Directory: /Food/Fruit/Red>, <Directory: /Food/Fruit/Red/Cherry>, <Directory: /Food/Fruit/Yellow>, <Directory: /Food/Fruit/Yellow/Banana>, <Directory: /Food/Meat/Pork>, <Directory: /Food/Meat/Beef>]
>>>
>>> transaction.commit()
>>> Directory.objects.get_queryset_descendants(dirs, include_self=True)
Finished in 0.000043
Q Object: (OR: (AND: (u'lft__gt', 1), (u'tree_id', 1), (u'rght__lt', 12)), (AND: (u'lft__gt', 11), (u'tree_id', 1), (u'rght__lt', 18)))
[<Directory: /Food/Fruit>, <Directory: /Food/Fruit/Red>, <Directory: /Food/Fruit/Red/Cherry>, <Directory: /Food/Fruit/Yellow>, <Directory: /Food/Fruit/Yellow/Banana>, <Directory: /Food/Meat>, <Directory: /Food/Meat/Pork>, <Directory: /Food/Meat/Beef>]
>>>
>>> transaction.commit()
>>> Directory.objects.get_queryset_descendants(dirs, include_self=True, aggregate=True)
Finished in 0.000027
Q Object: (AND: (u'lft__gt', 1), (u'tree_id', 1), (u'rght__lt', 18))
[<Directory: /Food/Fruit>, <Directory: /Food/Fruit/Red>, <Directory: /Food/Fruit/Red/Cherry>, <Directory: /Food/Fruit/Yellow>, <Directory: /Food/Fruit/Yellow/Banana>, <Directory: /Food/Meat>, <Directory: /Food/Meat/Pork>, <Directory: /Food/Meat/Beef>]
>>>
>>>
>>> ### get_queryset_ancestors ###
>>>
>>> transaction.commit()
>>> Directory.objects.get_queryset_ancestors(dirs)
Finished in 0.000042
Q Object: (OR: (AND: (u'rght__gt', 11), (u'lft__lt', 2), (u'tree_id', 1)), (AND: (u'rght__gt', 17), (u'lft__lt', 12), (u'tree_id', 1)))
[<Directory: /Food>]
>>>
>>> transaction.commit()
>>> Directory.objects.get_queryset_ancestors(dirs, aggregate=True)
Finished in 0.000027
Q Object: (AND: (u'rght__gt', 11), (u'lft__lt', 12), (u'tree_id', 1))
[<Directory: /Food>]
>>>
>>> transaction.commit()
>>> Directory.objects.get_queryset_ancestors(dirs, include_self=True)
Finished in 0.000045
Q Object: (OR: (AND: (u'rght__gt', 10), (u'lft__lt', 3), (u'tree_id', 1)), (AND: (u'rght__gt', 16), (u'lft__lt', 13), (u'tree_id', 1)))
[<Directory: /Food>, <Directory: /Food/Fruit>, <Directory: /Food/Meat>]
>>>
>>> transaction.commit()
>>> Directory.objects.get_queryset_ancestors(dirs, include_self=True, aggregate=True)
Finished in 0.000091
Q Object: (AND: (u'rght__gt', 10), (u'lft__lt', 13), (u'tree_id', 1))
[<Directory: /Food>, <Directory: /Food/Fruit>, <Directory: /Food/Meat>]
>>>
Summary
Please tear this to shreds. Seriously. I haven't been able to find any bugs with this implementation but that doesn't mean that they aren't there. Tests pass though so ... yay. :-)