-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for POSIX regular expression operators #2236
Comments
We shied away from exposing regex because of ReDoS in the past #539 (comment). Another concern I have: can we ensure a regex op will be a fast operation? According to this so question, an index won't help to speed up a regex operation. |
@steve-chavez The ReDoS issue shouldn't exist anymore since those ancient PostgreSQL versions have long been EOL'd. If there are still people out there using them, I bet there are other syntactical issues when trying to run PostgREST on those. For the indexing you are totally correct as there currently isn't any way to speed them up, but this is true for many other operators. Only the possibility to create an index doesn't mean that it actually exists and even if it does also doesn't imply that the query planner will actually make use of it. So I see no guarantees anywhere, at least not something that PostgREST would be able to hold in practice. In the end, it's all in the operators/DBA's hands to do optimizations or not. We could put a warning into the docs to emphasize the fact that there are no ways to speed up execution of regular expressions using indexes, so when not combined with other filters will always result in a full table scan. |
True, agree.
Hm, what other operators do we expose that cannot be sped up with an index?
Oh, but that would be a problem for every API(that might not need a regex op) since clients could probe this regex operator and deplete the db resources.
If using the regex will always result in an expensive plan, maybe we should apply the idea we've been discussing in #915 (comment). Basically, for operations that are known to be expensive, we run an additional @enote-kane WDYT? |
Hm. How would you protect against that, even with the current operators? As long as a big table is accessible anonymously, you will always have that problem. It's unlikely you'll create indexes for every column and all operators, right? Even more so, because it's a big table... Edit: So basically, I think, the question of whether it's possible to create an index for an operator is orthogonal to the question of DoS attacks. The cost threshold is a way to protect against that. But the attack surface would not really be different, even if we introduced other operators. |
@steve-chavez I generally like the idea with protecting expensive operations with an However, if PostgREST goes this route then it would have to do it for almost any filter, since no filter is "guaranteed" to perform well. Also, statistics may be out-of-date so the planner would chose the wrong plan anyway and that could result into situations where sometimes, requests get rejected (data modification bursts) while most of the time the requests are working just fine and are lightning fast. I think that would be even more confusing for users. I've recently had a situation where a query took 1.5 hours and after So I think the additional |
Hm, but for the regex we'd have the guarantee to always to perform bad(full table scan). The current defenses against slow queries are not perfect but they're something.
All true regarding the EXPLAIN cost, I guess we need to look at other options. For this particular case, should we restrict the regex input so at least we know that an index can speed it up? I just saw this article about pg_trgm.
So perhaps we can forbid left-anchored inputs. @enote-kane Would that still be flexible enough for your use case? Any other ideas wrt ensuring indexes speed up the operation? |
@steve-chavez Just tested it and indeed, adding trgm indexes can convert from a sequential scan into an index scan for regular expression operations (also case-insensitive). Both types seem to work the same here (GIN/GIST). So, we DO have a possible optimization for DBA's. And this is totally suitable for my use case. |
Great news. So for now could you add the non-left-anchored input safeguard to your PR? |
The pg_trgm docs also say:
So we should also ensure there's at least one trigram present? |
@steve-chavez For like/ilke there is no such check as far as I can see and using GIN/GIST indexes makes the left-anchored argument more or less irrelevant. Sadly, the DbStructure doesn't contain index information, so we can't make a guess here. For checking for trigrams: how do you envision that to be implemented? I am really new to Haskell and the internal concepts of PostgREST. Curious to learn more here. |
Oh, we don't have a dedicated module for checking a filter's input(we have never done it) so it would have to be brand new. Probably just in
Right, and I've noted that one can also force a seq scan with LIKE(ref). It also seems complex to make sure the input doesn't trigger a seq scan.
So whatever solution we land on(probably the cost plus some smartness), it should also be applied for LIKE. This can be done in a later enhancement - not a requirement for the regex op. Will review your PR now 👍 |
@steve-chavez Would you mind creating a new issue for the checks (which most likely needs more research)? Then we could close this one as the PR's are merged already. |
@enote-kane Sure, sorry for the delay. Opened #2249. |
Environment
Description of issue
PostgREST should support comparison operators as defined in: https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
It already does support
LIKE
/ILIKE
, so an extension here would be very beneficial.The text was updated successfully, but these errors were encountered: