-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Complex filters? #54
Comments
I should note - I've already written some code to support this functionality in plain django - https://gist.github.com/chozabu/86b60caa0ce211f232da - the function takes a table name, list of filters, list of excludes, sort method and gives back JSON. So, my above question would be answered like:
(probably got URL formatting wrong, could be using JSON post there) I'm not convinced this method is production-ready (hence saying do not use in its title), and would be much more interested replicating the important bits in DFF - interested to hear thoughts |
I believe the only problem here is the repeated |
@philipn so - if we were to assume the HTTP spec allows for repeated query string keys - and they were decoded into a list of pairs, with order maintained, We could do a query like:
But still would it work? if the filters are all applied in one go, or one at a time, rather than in two "filtersets" would we not get results about users with gmail and ymail post in the wrong dates - or no results atall? |
To expand on above thoughts:
would resolve to either: (4 separate filters)
or (1 combined filter)
but what we really want is: (two separate filters)
|
Ah, I see what you're saying, now. I'm not sure what / if we should do here. Do other libraries allow queries as expressive as these? It's almost as if we would want to duplicate all possible operations on querysets -- that could get messy. Maybe we should suggest, for people who want multiple levels of queryset operations, that they define the operation explicitly on the FilterSet instead? |
All possible operations on a query set would be a bit much, including annotation and aggregation - though, I'd sure be interested in such a lib! I'm not aware of libraries that allow this functionality - though I'd be surprised if something does not exist... I can see how multiple querysets/excludesets are "on the line" - though this functionality is sure something that would be very handy - particularly on top of the sensible filter/security systems in DRFF. I'm not sure of a compatible way of encoding into the url query section - it seems sending lists of filtersets is vital. |
That's kind of the problem with the standard query string syntax - it's a single set of key/value pairs. We would need to define a more complex syntax that supports multiple filter sets. If we defined a multiple set syntax, then that would also allow us to add in OR and negation operations. eg, something like:
Which would map to f1 = Widget.objects.filter(field1=value1).exclude(field2=value2)
f2 = Widget.objects.ftiler(field3=value3).filter(field4=value4)
return f1 | ~f2 @chozabu As to your original question. Even if we implement this more complex syntax, your problem wouldn't be completely solved. The behavior you describe in your original post only applies to filters spanning multi-valued relationships. django-filter processes each filter serially, resulting in uo.filter(posts__title__icontains="gmail").filter(posts__publish_date__lte="01/01/2009") not uo.filter(posts__title__icontains="gmail", posts__publish_date__lte="01/01/2009") You would probably need to create a method filter that takes both values and does the combined filtering manually. eg, /api/users?old_post_title=gmail,01/01/2009&recent_post_title=ymail,01/01/2015 |
Also, a couple of quick notes on query strings:
|
I've just pushed out a new release (0.7.0), so you should be able to do a normal |
So, for query strings, it looks like there are non-standard ways of passing arrays - http://stackoverflow.com/a/9547490/445831 But perhaps when a query complexity is getting to this level, it could make more sense to POST the query as json? Also - nice links, was not aware you could link to a line range on github! And 0.7.0 seems good :) |
I'm hitting a point in my project where Think kind of functionality would be really handy Writing quite a few filters like:
where my models are like:
I can query like
This works - with some limitations, hard to find the admin of two groups, becomes a bit more complex for some other filters like:
to find questions tagged cheese, with a rating above or equal to 13. but, I cannot easily query for two tags matching some criteria. And If i want to be able to filter by the existing matching methods (icontains,iexact,gte,gt,lte,lt etc), and the model has a few different fields to filter on, and there are quite a few more models to do the same tasks for - this looks more and more like something that should be automated. It is somewhat tempting to accept syntax like
or even
Which I think should be happily parsable by literal_eval. not to awful to look at, and not too hard to do a backend for. Main problem is "the easy way" would be to filter directly on the users given values - which is as secure as a wet paper wall... Is there an easy/practical way I can check if a filterstring (like Hope you don't mind the rambling post! |
Is it necessary to do this all in one request? The problems you're running into only apply when filtering across relationships. If you remove the relationship from the query and use multiple requests, the issue is no longer relevant. You could get your user ids through: /api/group_memberships?role=admin&group__in=6,7 Let's say the above query returns memberships for users 1 and 2. You can filter your questions by doing: /api/questions?user__in=1,2 The same applies to filtering your tags. /api/tags?name__in=cheese,tomato&rated__gte=13
/api/questions?tag__in=1,2 |
@rpkilby I am doing similar things in quite a few places, for something like group admin it is just kinda nice to not have to make the extra request, but there are many situations where it will not scale very well - tags being quite a good example say I have 1million questions, the first 750k are tagged cheese, and the last 750k tomato.
could well return ~400k items. assuming we were doing something like this
which I do do on many routes, returning |
See #99 - I'm thinking about a reimplementation based on subqueries, which should address the issue. As this thread points out, users are generally going to want to filter criteria together, not separately. In the meantime, I'm not sure why you have 400k tags for two names. You should just have the two tags, right? I'd argue that the extra request is okay if it's relatively small. eg, getting a handful of groups or tags. Obviously, this is problematic for several hundreds or hundreds of thousands of results. |
@rpkilby #99 looks interesting - the same direction for sure - though I'm not sure what your plan is for how this will look in a url? As for the tags - sorry, I may have oversimplified my example a little, current setup is: Table of Posts each Post can have multiple TagConnections to relate it to a Tag I was prototyping a system along these lines in plain 'ol Django last year - Though no longer developed it may be interesting to have a glance at this page: http://voteflow.net/agora/woi/ (may require a second load to work) - it lets you build a query and graph your choice of two data items Hope you don't mind the rambling posts, I'm usually more concise - #99 looks like it may be a better start to the same conversation :) |
btw, it looks like I mistyped the subquery example in #99 - it's been corrected.
The idea is that this request: /api/users?memberships__role=admin&memberships__group__in=6,7 would result in this query+subquery: User.objects.filter(
group_memberships=GroupMembership.objects.filter(
role="admin"
).filter(
group__in=[6, 7]
)
) which would ideally return admin users for groups 6 and 7. See the first TODO in #99, as the above query may not actually work as intended.
Ah, yeah - the through model would present a problem as all of those results would be distinct. This should be solved by the subquery approach (again, probably...). eg, /api/questions?connection__rated_gte=13&connection__tag__name__in=cheese,tomato Note that the above doesn't solve your original problem of being able to achieve: uo = User.objects
uo = uo.filter(posts__title__icontains="gmail", posts__publish_date__lte="01/01/2009")
uo = uo.filter(posts__title__icontains="ymail", posts__publish_date__gte="01/01/2015") which is related to your point here:
At this point, there isn't anything much to do except to define a custom set syntax, urlencode it, then parse it inside the FilterBackend. You would need to manually combine the FilterSet results. It's been on my mind, but it seems somewhat out of scope for the moment. |
@rpkilby Ah-ha, good correction - the subquery now makes more sense - though, I think perhaps it should be I too mistyped my custom query syntax thoughts, using And I see what you mean - combining the parts of the query by default - which to me is by no means ideal, but sure sounds like a better default than not doing it thinking more on custom syntax for complex queries a more refined & flexible version could look like
which de-uglified looks like
only uses This would provide all the query power I want from the frontend (and then some!) To me - the hard part would be getting it to work as part of DRFF, using it to prevent querys that reveal private data - but perhaps I should get out the elbow grease and look into implementing it. Would you be interested in having this functionality in DRFF? |
Ach, copied it from the django docs, forgot to change the example to use filtering. And to answer in reverse:
Something well tested and well specified - I'd definitely be interested. :)
I've put down some thoughts in #100. The problem is that we're currently filtering on the model's default queryset (aka, all of the objects) without taking into account default filtering based on the request user. For example, a posts end point may allow the logged in user to view all of their posts + all other published posts - all other draft posts. Default user filtering is usually handled in DRF ViewSet's The downside is that this couples the FilterSet to the request, but I think we can structure this in a way that makes request handling optional and dry.
By all means, feel free to take a crack at it. It's definitely on my todo list too, and I have every intent of tackling these issues. Current priorities are:
Your example seems like a good start for another, more complex FilterBackend. A couple of ideas I've been floating around:
As far as getting around URL syntax, I think this could actually be done by two levels of url escaping. escape the Again, with the email posts example: >>> from urllib2 import quote
>>> set1 = quote('posts__title__icontains=gmail&posts__publish_date__lte=2009-01-01')
>>> set2 = quote('posts__title__icontains=ymail&posts__publish_date__gte=2015-01-01')
>>> set1
'posts__title__icontains%3Dgmail%26posts__publish_date__lte%3D2009-01-01'
>>> f = quote("(%s) & (%s)" % (set1, set2))
>>> f
'%28posts__title__icontains%253Dgmail%2526posts__publish_date__lte%253D2009-01-01%29%20%26%20%28posts__title__icontains%253Dymail%2526posts__publish_date__gte%253D2015-01-01%29' That value should be entirely querystring safe, so you could then just GET /api/users?%28posts__title__icontains%253Dgmail%2526posts__publish_date__lte%253D2009-01-01%29%20%26%20%28posts__title__icontains%253Dymail%2526posts__publish_date__gte%253D2015-01-01%29 |
@rpkilby That looks fantastic - perhaps harder to implement, and URLs have more chance of being a bit ugly, but very worth it for the extra features and safety Few thoughts: How does the performance of a large sub-query compare to doing it as a filterset? (hopefully very similar?) are there situations where we would need sub-sub-querys?
Perhaps not too important - I think this subquery approach is generally better. This also means should be able do even more complex querys and remain secure/private. On the system I am working on, we have a voting system - but some votes may be private. For normal querys, this is an issue - as I would like to be able to filter "Questions" based on a users vote - but only for public votes. I don't think this can be done without a subquery. (which can be done clientside) Currently not much of a security issue, but with the system I was suggesting for filtersets it would be |
The URLs are going to be ugly. I guarantee it.
No clue. It'd be worth coming up with some complex examples, then looking at the differences in the resulting SQL and query plans.
Yes. As far as I can tell, nested subqueries should work just fine. Related filters should handle this automatically, as they're processed recursively. |
Just performed a very quick test to compare subquerys: https://gist.github.com/chozabu/e2f7e9e56f828feaaf390bd01627de1f Looks like subquerys are very slightly slower in my test. some info on the output attached in this post
|
Funnily enough - looking at http://scottlobdell.me/2015/01/sql-database-best-practices-django-orm/ it is suggested that
mistest - the list evaluated subquery does indeed pass in all the ids from the initial query updated gist here: https://gist.github.com/chozabu/6f41a359423c7ef07171279471d981d0 |
Yeah, I would probably disregard the article as it comes from a complete lack of understanding. As you saw, evaluating the query w/ |
@chozabu - btw, still churning on related work. Some interesting results. Anyway, as a quick note on subqueries the two below are actually a bit different: q1 = User.objects.filter(
posts=Post.objects
.filter(title__icontains="gmail")
.filter(publish_date__lte=datetime(2009, 1, 1)))
q2 = User.objects.filter(
id__in=Post.objects
.filter(title__icontains="gmail")
.filter(publish_date__lte=datetime(2009, 1, 1))
.values('author')) Both use a subquery, but q1 relies on a join while q2 is just a subquery. I'd be curious to see if there's any performance difference there. |
Closed by #198. |
How complex can our filters be?
Say I have a Post model:
I want to find users who have an old post about gmail AND a new post about ymail
I don't see how I could currently do this with DRFF? I'd actually like to be able to do much more complex filters too
The text was updated successfully, but these errors were encountered: