-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate using subqueries to filter across relationships #99
Comments
Results: q1 is the current behavior, but generally incorrect and has poor performance (n table joins for n filter statements). This also requires a q2 is unfeasible to generate, but correct. q3/q4 and q5/q6 should be equivalent, with the exception that q3 & q4 rely on a join. That said, q3 & q4 are easier to generate than the subquery approach. The subquery approach is more difficult since it doesn't automatically account for
q1 = Blog.objects \
.filter(entry__headline__contains='Lennon') \
.filter(entry__pub_date__year=2008)
q2 = Blog.objects.filter(
entry__headline__contains='Lennon',
entry__pub_date__year=2008,
)
q3 = Blog.objects.filter(
entry__in=Entry.objects \
.filter(headline__contains='Lennon') \
.filter(pub_date__year=2008)
)
q4 = Blog.objects.filter(
entry__in=Entry.objects.filter(
headline__contains='Lennon',
pub_date__year=2008,
),
)
q5 = Blog.objects.filter(
id__in=Entry.objects \
.filter(headline__contains='Lennon') \
.filter(pub_date__year=2008) \
.values('blog')
)
q6 = Blog.objects.filter(
id__in=Entry.objects.filter(
headline__contains='Lennon',
pub_date__year=2008,
).values('blog'),
) q1 SELECT "testapp_blog"."id",
"testapp_blog"."name"
FROM "testapp_blog"
INNER JOIN "testapp_entry" ON ("testapp_blog"."id" = "testapp_entry"."blog_id")
INNER JOIN "testapp_entry" T3 ON ("testapp_blog"."id" = T3."blog_id")
WHERE ("testapp_entry"."headline" LIKE %Lennon% ESCAPE '\'
AND T3."pub_date" BETWEEN 2008-01-01 AND 2008-12-31) q2 SELECT "testapp_blog"."id",
"testapp_blog"."name"
FROM "testapp_blog"
INNER JOIN "testapp_entry" ON ("testapp_blog"."id" = "testapp_entry"."blog_id")
WHERE ("testapp_entry"."headline" LIKE %Lennon% ESCAPE '\'
AND "testapp_entry"."pub_date" BETWEEN 2008-01-01 AND 2008-12-31) q3 & q4 SELECT "testapp_blog"."id",
"testapp_blog"."name"
FROM "testapp_blog"
INNER JOIN "testapp_entry" ON ("testapp_blog"."id" = "testapp_entry"."blog_id")
WHERE "testapp_entry"."id" IN
(SELECT U0."id"
FROM "testapp_entry" U0
WHERE (U0."headline" LIKE %Lennon% ESCAPE '\'
AND U0."pub_date" BETWEEN 2008-01-01 AND 2008-12-31)) q5 & q6 SELECT "testapp_blog"."id",
"testapp_blog"."name"
FROM "testapp_blog"
WHERE "testapp_blog"."id" IN
(SELECT U0."blog_id"
FROM "testapp_entry" U0
WHERE (U0."headline" LIKE %Lennon% ESCAPE '\'
AND U0."pub_date" BETWEEN 2008-01-01 AND 2008-12-31)) |
The upcoming v1 will change how queries are constructed, going from queries like q1 to queries like q3. However, this issue is now more about the differences q3 and q5, and that will not be resolved in the release. For now, removing from the milestone. The change from q3 to q5 would also make the |
Just to rehash, the goal of this issue is to investigate the differences between the following two queries:
In the latter query, by filtering on the
pk
of the blogs with theblog_id
s of the filtered entries, we avoid a having to JOIN and having to specify DISTINCT.As detailed here, there are nuances to filtering across to-many relationships. In general, most users would want to find blogs with entries about Lennon during 2008, not blogs that have entries about Lennon and that also have entries during 2008.
In general, the combined join statements syntax (1st case above) is not possible to construct with the current implementation of django-filter. To make this possible, the filter API would need to return
Q
objects instead of invokingfilter()
andexclude()
directly. Additionally,exclude()
does not follow the same rules asfilter()
, and combined joins will behave like chained excludes.Either way, it will be necessary to use either the nested join or nested subquery approach. This makes it possible to filter the default queryset (permissions handling).
Todo:
are chained and combined subqueries equivalent?
The text was updated successfully, but these errors were encountered: