Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft][Please review] Make Paginator use simpler queries when there are no sql joins used (#8278) #11595

Draft
wants to merge 3 commits into
base: 3.3.x
Choose a base branch
from

Conversation

d-ph
Copy link

@d-ph d-ph commented Sep 5, 2024

May I ask for an initial review of the code in this PR before I move forward with it, please?

This PR implements the scope of work discussed in #8278 (comment). It only implements the "no joins used" case. It doesn't implement the "only ToOne joins used" case. I plan do implement that other case in a future PR, after this one gets reviewed and hopefully merged.

A few comments:

  1. This PR adds quite a few instance variables to the Paginator class, and also adds a few big docblocks that explain things. Please let me know whether this is acceptable.
  2. As discussed in the github issue mentioned above, the feature introduced by this PR is only ever effective if the users explicitly use a certain argument value (namely: $fetchJoinCollection = null). This means that there could be a long period of time before the "auto-detection of better defaults" gets enabled by default (if ever).
  3. The PR lacks unit tests. I plan to add them after I get an initial "ok" for this PR.
  4. I see that this PR says that there are failing unit tests. They don't seem related to this PR's code change. I would look into reproducing them in my machine after the initial code review is done.

Thank you.

@d-ph d-ph changed the title [Draft] Make Paginator use simpler queries when there are no sql joins used (#8278) [Draft][Please review] Make Paginator use simpler queries when there are no sql joins used (#8278) Sep 5, 2024
}

// When not joining onto *ToMany relations, then do not use the more complex CountOutputWalker.
// phpcs:ignore SlevomatCodingStandard.ControlStructures.UselessIfConditionWithReturn.UselessIfCondition
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I think it's better not to enforce the failing coding standard on that if-condition because otherwise it's a multiline return abomination.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me why the return abomination would be worse than the if abomination.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The way I see it is that this function is a set of if-statements that each check for a very individual use case. If none of the statements "wants to make a decision", then the default "return true;" is returned.
  2. The if-condition for the new if-statement is quite elaborate, so I didn't want it to be the last statement of the function.
  3. The next dev that wants to add more if-statements to this function, would have to "un-return" my if-statement. This argument is my common argument for all phpstan rules that enforce "return statement code style" rules, which often in my mind are "false positives".

These points are semantics + I'm the "guest" here, so I've merged the last two return statements, as per phpstan detection.

@greg0ire greg0ire closed this Sep 5, 2024
@greg0ire greg0ire reopened this Sep 5, 2024
@greg0ire
Copy link
Member

greg0ire commented Sep 5, 2024

Triggering a new build to address 4. , #11596 should fix it.

public function __construct(
Query|QueryBuilder $query,
private readonly bool $fetchJoinCollection = true,
bool|null $fetchJoinCollection = true,
Copy link
Member

@greg0ire greg0ire Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would revert this and instead introduce a static method.

public static function newWithAutoDetection(QueryBuilder $queryBuilder): self
{
        // do the autodetection here
        // use the unaltered constructor here
        
}

That way:

  • You remove the possibility of passing a Query instance and null to the constructor.
  • You don't complexify the constructor that much.
  • You don't rely on null for communicating failure.
  • You can reduce the comments a bit because you don't need to explain how both parameters related to each other.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I've introduced the static function as you advised and the constructor returned to its original form in its entirety. Thanks.

}

// When not joining onto *ToMany relations, then do not use the more complex CountOutputWalker.
// phpcs:ignore SlevomatCodingStandard.ControlStructures.UselessIfConditionWithReturn.UselessIfCondition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me why the return abomination would be worse than the if abomination.

if (
$forCountQuery
&& $this->queryStyleAutoDetectionEnabled
&& ! $this->fetchJoinCollection === false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't compare boolean variable with boolean constants

Suggested change
&& ! $this->fetchJoinCollection === false
&& $this->fetchJoinCollection

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's surprising to me that I left that in the code, despite reviewing it twice myself. Thanks.

src/Tools/Pagination/Paginator.php Outdated Show resolved Hide resolved
$countQuery->setHint(CountWalker::HINT_DISTINCT, true);
$hintDistinctDefaultTrue = true;

// When not joining onto *ToMany relations, then use a simpler COUNT query in the CountWalker.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetchJoinCollection is not only about whether we join on a ToMany relation, but whether we include this relation in what get fetched by the ORM. So the condition you use does not match what the comment says.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. It appears I misunderstood the purpose of that variable.

I gave a thought about what to do with it, and I initially concluded that that variable is just misnamed. Consider the following:

  1. The toIterator()'s main if-statement (link) executes the extra "SELECT DISTINCT" subquery only if fetchJoinCollection is true. Otherwise, the original query is executed as-provided (with an OFFSET and LIMIT attached to it). Note that if fetchJoinCollection is supposed to be true only when (a) the original query joins onto a *ToMany relation AND only when (b) the query fetch-loads that relation, then toInterator() would not run the extra "SELECT DISTINCT" subquery when the query does the first without the second, which would produce an incorrect result for the "current pagination page". The result would be incorrect because even though the *ToMany relation is not fetch-loaded, the database would still return (and paginate on) "duplicate records" of the entities that are fetch-loaded (which most likely is only the "root" entity).

In other words, both the following DQL queries require the extra "SELECT DISTINCT" subquery to happen in the toIterator(), otherwise the result is incorrect:

SELECT u FROM User u JOIN u.tags ut;
SELECT u, ut FROM User u JOIN u.tags ut;
  1. The docblocks for the fetchJoinCollection consistently say the following:
/** @param bool $fetchJoinCollection Whether the query joins a collection (true by default). */

// (...)

/**
 * Returns whether the query joins a collection.
 *
 * @return bool Whether the query joins a collection.
 */
public function getFetchJoinCollection(): bool {}

It appears then that the docblocks are under an impression that that variable is "true" regardless of whether a *ToMany relation is fetch-loaded or not.


Do you agree that that instance variable is not named correctly (or the author incorrectly thought that not fetch-loading a *ToMany relation is immune to the "duplicated records being returned from the database" problem)? Would you agree for me to just rename that instance variable to something like: "queryHasToManyJoins"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe it is indeed badly named.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is a case where joined collections that are not selected will not impact the count or the iteration: when using SELECT DISTINCT in the query (see SelectClause::$isDistinct)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so that I understand: you're trying to say that running the following DQLs without an "outputWalker" would be fine because it would produce the correct results, therefore the literal meaning of fetchJoinCollection is the correct meaning?

SELECT DISTINCT u FROM User u JOIN u.userGroups;
SELECT COUNT(DISTINCT u) FROM User u JOIN u.userGroups;

I agree, however note that in the current code there is nothing that would ensure that DISTINCT is added/present in the toInterator()'s else branch of the main if-else-statement. I.e. what you say is correct, but the current code would still either not work, or work by an extreme fluke (i.e. because someone did add a DISTINCT keyword to their $query prior to using the Paginator). Based on this, I would still say that fetchJoinCollection should just be renamed to reflect the fact that it's actual present meaning is that " whether the query contains ToMany joins (true/false)".

In case there is a legitimate worry that there might be someone in the userspace that does the (undocumented) due diligence and adds the DISTINCT keyword to their queries when they set fetchJoinCollection to false, and we don't want to break that code, then I can introduce a new queryHasToManyJoins instance variable, and use that instead of fetchJoinCollection (in which case, I would also need to find out what I need to default fetchJoinCollection to, but that would not be an unsurmountable problem).

Other that that, I admit that there is a potential future improvement, where useOutputWalkers(forCountQuery: true) could still returns false (i.e. saying that the "simple" count query should be used) even when queryHasToManyJoins is true, however fetchJoinCollection is false, and we know that a COUNT(DISTINCT foo) is going to be used.

So, are you ok for me to rename that instance variable, or would you rather that I should not touch it, and instead introduce a new instance variable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a lot to take in for me.

After a substantial amount of thought, I would like to propose and get your opinion on the following.

  1. Rename the $fetchJoinCollection to $query(Potentially)ProducesDuplicateRows: bool.
    1.1. Why: The new name expresses the "root of the problem", and does not go into details about "how it comes about". It also succinctly expresses why pagination is not a trivial task.
    1.2. What it achieves: it allows the end user to correctly express their confidence (or lack of it) in whether their query produces duplicate rows (which necessitate that the Paginator should run more complex queries). In particular: it puts the onus on the end user to ensure that this "declaration" (in particular: when they "declare false") is accurate.
    1.3. What it achieves: In case the end user's query has a ToMany join, but it doesn't select any fields from it (or only selects SQL aggregate functions that use those fields), and the query uses a "SELECT DISTINCT" (or an appropriate GROUP BY), then the end user can succinctly inform the Paginator that their query is safe for the "fast result & count queries" without explaining the minute details about what exactly allows those fast queries to be used.
    1.4. Bonus: It would "fix" what I said somewhere above that currently when $fetchJoinCollection is false, then the toIterator() runs the "fast result query", however there is nothing that indicates that the end user did the due diligence of using SELECT DISTINCT. The new name of $queryProducesDuplicateRows=false would in this case communicate the intended use case (and requirements) to the end user in a more complete way than $fetchJoinCollection.
    1.5. Breaking change handling: the getFetchJoinCollection() getter would be preserved but deprecated. A new appropriate getter would be introduced and recommended as a replacement.
    1.6. Why: This new name would resolve the review comment that was raised by you in relation to line 257 in the PR.
    1.7. Side note: "$fetchJoinCollection" is just a terrible name, as it suggests that the Paginator would do something to the joined collections in the query.

  2. Split $useOutputWalkers into $useResultQueryOutputWalker and $useCountQueryOutputWalker.
    2.1. Why: Apart from what you said, splitting this config variable just makes sense. The two output walkers (i.e. on the result query and the count query) have different reasons to be enabled/disabled.
    2.2. Breaking change handling: the (get|set)UseOutputWalkers() functions would be preserved but deprecated. A new set of functions (4 in total) would be introduced and recommended as a replacement.
    2.3. Downside: There would effectively be one more pair of public functions to (optionally) configure the Paginator. But again: it should be viewed as a long overdue code change, rather than an questionable and optional addition.
    2.4. Benefit: It would allow end users to manually control whether they are confident to run the "fast" count query, without affecting the runtime of the toIterator() function.

  3. The auto-detection/guessing would investigate the given Query and appropriately set the variables mentioned above.
    3.1. The auto-detection would initially not attempt to auto-detect whether a presence of a ToMany join is compatible with the "fast" queries. This is to limit the scope of this PR. It obviously could be attempted in future PRs. As such, as soon as there is a ToMany join detected, the "fast" queries would not be used (which is equivalent to the current (default) behaviour of the Paginator).

Q1. Are you personally ok with the proposition above?


the fast counting query is compatible with toMany joins when using CountWalker::HINT_DISTINCT as it will count distinct values

Q2. Could you please confirm that you meant: is compatible with toMany joins when using CountWalker::HINT_DISTINCT, AND when the query does NOT select any of the toMany entities/fields? Because when the query selects any of the toMany fields, then it will make the CountWalker (i.e. the fast query) not work correctly, regardless of whether CountWalker::HINT_DISTINCT is added or not.

when no fields from the grouping condition is dependant on the ToMany join (i.e. is part of the joined entity or of other entities joined again from that one)

Thanks for mentioning the case of other entities joined again from that one. This is quite important and I did not initially think of it.

[the points about when the walker-based fetching queries and tree-walker fetching queries are supported]

That is good to know, however note that I do not intend to touch the region of the code that makes the decision whether the walker-based or tree-walker fetching query should be used.

[The "fast" result query] is also the only result fetching mode compatible with composite primary keys.

Fair point, however this again is not something that I'm going to change or touch in my PR. Note that if a query uses composite primary keys AND produces duplicate rows (i.e. is not compatible with the "fast" result query), then the Paginator will not able to be used at all, and it's no different to the current situation in the Paginator.

Regarding output walker vs tree walker, the output walker supports more cases (...) but is generally less performant than then tree-walker query when both are supported.

This again is good to know, however I don't intend to auto-detect/guess in this PR which of the two walkers should be used by the result-fetching query.


Could you reply to Q1 and Q2, please, and share any other thought you have about what I said above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is good to know, however note that I do not intend to touch the region of the code that makes the decision whether the walker-based or tree-walker fetching query should be used.

your code provides guessing for use*OutputWalker variables, which is precisely about making that decision. So I don't understand what you intend to do if you don't intend to touch that decision.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, are you referring to the following diff in the current PR?

    private function useOutputWalker(Query $query): bool
    {
        if ($this->useOutputWalkers === null) {
            return (bool) $query->getHint(Query::HINT_CUSTOM_OUTPUT_WALKER) === false;
        }

        return $this->useOutputWalkers;
    }

->

    private function useOutputWalker(Query $query, bool $forCountQuery = false): bool
    {
        if ($this->useOutputWalkers !== null) {
            return $this->useOutputWalkers;
        }

        // When a custom output walker already present, then do not use the Paginator's.
        if ($query->getHint(Query::HINT_CUSTOM_OUTPUT_WALKER) !== false) {
            return false;
        }

        // When not joining onto *ToMany relations, then do not use the more complex CountOutputWalker.
        return ! (
            $forCountQuery
// (rest of the diff not relevant)
        );
    }

Do we both agree that this code diff does not result in any change to whether the walker-based or the tree-walker result-fetching query should be used?

Or do you refer to the proposition of splitting useOutputWalker into two variables? In that case I can assure you that I do not intend to modify the current logic for the $useResultQueryOutputWalker variable. It could be done in future PRs, most likely by different contributors than me.

Note that I started this PR with only the intention of adding code that "if there are any dql-joins present, then use the fast version of the queries". I believe this alone would immediately address 80% of the use cases in the wild. Anything more than this, I consider a nicety. However, I do agree that going the extra mile and checking for *ToMany dql-joins should be easy enough, so as an exception from my original plan, I'm willing to attempt to make that change in the current PR. Anything other than that is a "less than 5% of use cases" matter in my mind, and although I do appreciate us discussing it, I do not intend to address it in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when $fetchJoinCollection is true (the default), forbidding to use the fast result fetching query, useOutputWalker is what decides between fetching results with an output walker or a tree walker.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when $fetchJoinCollection is true (the default), forbidding to use the fast result fetching query,

I agree.

when $fetchJoinCollection is true (the default) (...), useOutputWalker is what decides between fetching results with an output walker or a tree walker.

I agree.

useOutputWalker is what decides between fetching results with an output walker or a tree walker.

Do we both agree that my code diff does not change the behaviour of useOutputWalker?

when $fetchJoinCollection is true (the default)

Are you trying to say that because the new Paginator::newWithAutoDetection() function in this PR is now going to decide whether the (former) $fetchJoinCollection is true or false, it means that I should pay more attention to when the function decides that $fetchJoinCollection should be "set" to true, because this implies that useOutputWalker is going to be run? To this I would say that I partially see the concern here, however I don't view Paginator::newWithAutoDetection() function as deciding whether the $fetchJoinCollection should be set to true, but rather whether the $fetchJoinCollection should "revert to the current behaviour of the Paginator" (i.e. setting the variable to true). It's not ideal the the default behaviour of Paginator::newWithAutoDetection() could lead to problems in some cases, but likewise the current default behaviour of the Paginator class would lead to the same problems. And for this reason, I do not see why I should go even farther with this PR by making sure that the details about the output/tree walkers in the result-fetching query are taken into the consideration. Please correct me if I'm wrong.

If you're not referring to the consequences of the Paginator::newWithAutoDetection() function, then I truly struggle to see what we are discussing here, and I would appreciate some more help in understanding your standpoint. Why exactly do you believe I should concern myself with the the output/tree walkers in the result-fetching query in this PR?


// When not joining onto *ToMany relations, then do not use the more complex CountOutputWalker.
return ! (
$forCountQuery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have a separate detection for count query and result query, I would rather extract the check for $forCountQuery in a separate query (as maybe we will also have smarter detection for the case of the result query in the future)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said:

extract the check for $forCountQuery in a separate query

Did you mean to say "extract (...) in a separate function"? In other words, are you proposing splitting the useOutputWalker() into useResultOutputWalker() and useCountOutputWalker()?

Copy link
Member

@stof stof Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, I meant a separate condition (no need for a separate function which would force to duplicate the previous logic). We could even factor out the auto-detection check:

This would look like that:

// ...

if (!$this->queryStyleAutoDetectionEnabled) {
    return true; // Old simple guessing
}

if ($forCountQuery) {
    return /* condition for the support of CountWalker */;
}

return /* condition for the support of LimitSubqueryWalker */;

public static function newWithAutoDetection(QueryBuilder $queryBuilder): self
{
/** @var array<string, Join[]> $joinsPerRootAlias */
$joinsPerRootAlias = $queryBuilder->getDQLPart('join');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to also support Query by analyzing the DQL AST instead of the QueryBuilder ? this would make this method more useful.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I would prefer to support Query in addition to QueryBuilder, not instead of. This is because constructing an AST from a Query's DQL is not instantaneous, while checking for presence of DQL parts in the QueryBuilder is.
  2. My main use case for the Paginator (and I would assume that of a lot of other real-world usages too), is to implement CRUD listings that support filtering/conditions. The conditions are only added when selected by the user in the website gui. Since the conditions are added in a "conditional manner", the DQL query that retrieves the records for the CRUD listing is often built incrementally via a QueryBuilder and a bunch of if-statements (in one form or another -- I actually use a Symfony Bundle for the "bunch of if-statement" part). My point being: supporting the instances of Query is not my personal main priority. I could look into doing it, if I have the time after getting the "initial approval" for this PR (and I still have the unit tests to write).
  3. Supporting Query|QueryBuilder is definitely an option.

Are you ok with this new function's supporting only instances of QueryBuilder if I decide that I ran out of time for this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-ph Supporting QueryBuilder in addition to Query would then work the same than what is done in the constructor: calling getQuery on it and then dealing with the Query.

I'm not asking for the public API to drop support for QueryBuilder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting queries is important to support cases using Query::setFetchMode to trigger eager loading of some toOne relations (which would load them in batch in a single query instead of doing N lazy-loading queries).
In my experience, this can result in better performance in some cases compared to fetch-joining the relation in the paginated query (assuming the paginated query does not already join it for filtering purposes of course):

  • if the toOne relation is nullable and is often null
  • if you have multiple toOne relations pointing to the same entity (the batch loading is per target entity, not per relation, so they would be grouped)
  • if the target entity has a non-negligeable probably to refer to an instance that is already initialized in the identity map (if the current page only contains values using null or an initialized entity, the eager loading won't need to perform a second query at all).

The QueryBuilder cannot configuring the fetch mode of relations. So even when using the QueryBuilder (which I often do), I often have cases where I pass a Query to the Paginator instead of passing the QueryBuilder.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting QueryBuilder in addition to Query would then work the same than what is done in the constructor: calling getQuery on it and then dealing with the Query.

What about the comment I raised in point 1. above about the cost of constructing a Query's AST?

My intention was to at some point add the support for Query, but in a form where there would be separate routines for inspecting the provided instance of Query or QueryBuilder, so that the AST construction is avoided when a QueryBuilder is provided. Something along the following lines:

$sqlJoinsPresent = false;

if ($queryOrQueryBuilder instance of Query) {
  $ast = $queryOrQueryBuilder->getAST();
  // inspect the AST
  $sqlJoinsPresent = true;
} else {
  // do not construct any ASTs
  // use QueryBuilder::getPart(partName) to inspect the query 
  $sqlJoinsPresent = true;
}

Are you saying that you'd like to disregard the fact that constructing an AST is not a cheap operation (and which depends on the complexity of the query)?


The QueryBuilder cannot configure the fetch mode of relations.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could accept a fast path introspecting the QueryBuilder instead of parsing the Query as an optimization if it has indeed an impact. However, it would have to support the same capabilities. See my other comment related to the detection of joining collections rather than any joins. Maybe the QueryBuilder introspection would not be viable for that.

However, my personal opinion would be that we should not merge this without support for Query as it would prevent downstream projects from migrating to it (for instance, the Pagerfanta DoctrineOrmAdapter uses this Paginator and currently supports Query|QueryBuilder).
Note that this is only my own opinion and the ORM maintainers might take a different decision (I'm maintainer on DoctrineBundle but not on the ORM itself, even though I regularly help with reviews here as well)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stof when there are multiple to-one associations and the eager loading fetch mode is used, does this cause multiple result rows for a single entity and this is something where we need to be careful with regard to the goal of this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to-one associations don't create multiple rows in the SQL results. Only toMany relations do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, even when there are multiple of them 🤦🏻‍♂️

@@ -38,6 +40,16 @@ class Paginator implements Countable, IteratorAggregate
private readonly Query $query;
private bool|null $useOutputWalkers = null;
private int|null $count = null;
/**
* @var bool The auto-detection of queries style was added a lot later to this class, and this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @var bool The auto-detection of queries style was added a lot later to this class, and this
* The auto-detection of queries style was added a lot later to this class, and this

For a property phpdoc, the description is just the phpdoc description, not something attached to the tag (and then, we don't need @var at all as there is a native type).

/** @var array<string, Join[]> $joinsPerRootAlias */
$joinsPerRootAlias = $queryBuilder->getDQLPart('join');

// For now, only check whether there are any sql joins present in the query. Note,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to implement the proper detection of joining collections.

Taking the code of my work project, almost all my paginated queries involve some toOne joins but the majority of them don't join a collection (making them compatible with the fast queries, which would then mean that switching to auto-detection would be a performance regression right now).


// For now, only check whether there are any sql joins present in the query. Note,
// however, that it is doable to detect a presence of only *ToOne joins via the access to
// joined entity classes' metadata (see: QueryBuilder::getEntityManager()->getClassMetadata(className)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performing this analysis on a QueryBuilder will be complex, as it has to account for joins done on previous joins and not on the root entity, replicating work already done by the query parser as part of the AST.

$countQuery->setHint(CountWalker::HINT_DISTINCT, true);
$hintDistinctDefaultTrue = true;

// When not joining onto *ToMany relations, then use a simpler COUNT query in the CountWalker.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is a case where joined collections that are not selected will not impact the count or the iteration: when using SELECT DISTINCT in the query (see SelectClause::$isDistinct)

public static function newWithAutoDetection(QueryBuilder $queryBuilder): self
{
/** @var array<string, Join[]> $joinsPerRootAlias */
$joinsPerRootAlias = $queryBuilder->getDQLPart('join');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could accept a fast path introspecting the QueryBuilder instead of parsing the Query as an optimization if it has indeed an impact. However, it would have to support the same capabilities. See my other comment related to the detection of joining collections rather than any joins. Maybe the QueryBuilder introspection would not be viable for that.

However, my personal opinion would be that we should not merge this without support for Query as it would prevent downstream projects from migrating to it (for instance, the Pagerfanta DoctrineOrmAdapter uses this Paginator and currently supports Query|QueryBuilder).
Note that this is only my own opinion and the ORM maintainers might take a different decision (I'm maintainer on DoctrineBundle but not on the ORM itself, even though I regularly help with reviews here as well)

@d-ph
Copy link
Author

d-ph commented Oct 1, 2024

@greg0ire

Hi,

While I wait for some final answers from stof to my (regrettably) wall of text, may I know your initial thoughts on my proposition to rename some instance variables in the Paginator class, please?

I explained it in more details in the discussion above, but the bottom line is:

  1. Rename Paginator::$fetchJoinCollection to something along the following line: $query(Potentially)ProducesDuplicateRows(ForTheRootAlias). It's just a change of a variable name, because the "semantics" of that variable would remain the same (i.e. it being a boolean, defaulted to "true"). The potentially undesired consequence is that the Paginator::getFetchJoinCollection() would have to remain in the class, and be deprecated (a new getter would also be added as a replacement).

  2. Split Paginator::$useOutputWalkers into two variables: $useResultQueryOutputWalker and $useCountQueryOutputWalker. The potentially undesired consequence is that, again, the current getter and setter for that instance variable would be deprecated, and 4 new instance functions would be added: getters and setters for the 2 new variables. I'm not totally crazy about it myself, but it is what it is.


Would you say that you would initially allow for the changes described above to be made to the Paginator by me?

Thanks.

@greg0ire
Copy link
Member

greg0ire commented Oct 1, 2024

  1. If fetchJoinCollection is truly inaccurate, then yes, I'd be open to renaming it, but maybe to something shorter like $queryProducesDuplicates. The ensuing deprecation is no big deal IMO.
  2. Nothing to comment here, I think it's fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants