
Fixes #118 - [Performance] Excel-like filter for huge data sets #119

Merged: fipro78 merged 2 commits into master on Sep 30, 2024


fipro78 (Contributor) commented Sep 27, 2024

No description provided.

Switched from IntList to IntSet in NatCombo#select(int[])

Signed-off-by: Dirk Fauth <[email protected]>
fipro78 merged commit e22407a into master on Sep 30, 2024
2 checks passed

mmnze left a comment


And I found another one I'd like to add:
ComboBoxFilterUtils.isAllSelected, l. 69. Temporarily copying the dataCollection into a set should do the trick, because containsAll calls contains on the receiver ("this"), not on the collection passed as the argument.
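A minimal sketch of the suggested fix. The actual code at ComboBoxFilterUtils.isAllSelected, l. 69 is not shown here, so the helper name `containsAllFast` and its parameters are hypothetical. It relies on AbstractCollection#containsAll iterating the argument and calling contains on the receiver, so copying the large collection into a HashSet turns every lookup from a linear scan into a hash lookup:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

public class ContainsAllSketch {

    // Hypothetical helper: true if 'container' contains every element of 'values'.
    // containsAll iterates 'values' and calls contains() on its receiver, so the
    // receiver is copied into a HashSet first: one O(n) copy, then amortized O(1) lookups.
    static boolean containsAllFast(Collection<?> container, Collection<?> values) {
        return new HashSet<Object>(container).containsAll(values);
    }

    public static void main(String[] args) {
        List<String> data = List.of("Simpson", "Flanders", "Skinner");
        System.out.println(containsAllFast(data, List.of("Simpson", "Skinner"))); // prints true
        System.out.println(containsAllFast(data, List.of("Simpson", "Wiggum")));  // prints false
    }
}
```

The copy only pays off when the receiver is large and more than a handful of values are checked; for tiny collections the extra allocation can cost more than it saves.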

    @@ -264,7 +264,7 @@ && getComboBoxDataProvider() instanceof FilterRowComboBoxDataProvider
         // available items
         List<?> allValues = ((FilterRowComboBoxDataProvider) getComboBoxDataProvider()).getAllValues(getColumnIndex());
         List<?> visibleValues = getComboBoxDataProvider().getValues(getColumnIndex(), getRowIndex());
    -    List<?> diffValues = new ArrayList<>(allValues);
    +    HashSet<?> diffValues = new HashSet<>(allValues);
         diffValues.removeAll(visibleValues);

mmnze commented Oct 1, 2024


I just realized that this does not actually solve the issue: the expensive contains operation within removeAll is not executed on "this" (i.e. diffValues) but on the method argument, an ArrayList in this case. See AbstractSet#removeAll, which HashSet inherits (l. 175 in Java 11).
Sorry, my bad.
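A small probe makes the behavior visible. HashSet inherits removeAll from AbstractSet, whose Javadoc describes the strategy: if the set is larger than the argument, it iterates the argument and calls remove on itself; otherwise it iterates itself and calls contains on the argument. The `CountingCollection` wrapper and `containsCallsOnArgument` method below are hypothetical helpers that only count how often contains is called on the argument:

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;

public class RemoveAllProbe {

    // Wraps a list and counts how often contains() is invoked on it.
    static final class CountingCollection<E> extends AbstractCollection<E> {
        private final List<E> delegate;
        int containsCalls = 0;

        CountingCollection(List<E> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean contains(Object o) {
            containsCalls++;
            return delegate.contains(o); // linear scan, like ArrayList#contains
        }

        @Override
        public Iterator<E> iterator() {
            return delegate.iterator();
        }

        @Override
        public int size() {
            return delegate.size();
        }
    }

    // Returns how many times HashSet#removeAll called contains() on its argument.
    static int containsCallsOnArgument(int receiverSize, int argumentSize) {
        HashSet<Integer> receiver = new HashSet<>();
        for (int i = 0; i < receiverSize; i++) {
            receiver.add(i);
        }
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < argumentSize; i++) {
            values.add(i);
        }
        CountingCollection<Integer> argument = new CountingCollection<>(values);
        receiver.removeAll(argument);
        return argument.containsCalls;
    }

    public static void main(String[] args) {
        // Receiver larger than argument: iterates the argument, calls remove() on the set.
        System.out.println(containsCallsOnArgument(1_000, 10)); // prints 0
        // Receiver not larger than argument: iterates the set, calls contains() on the argument.
        System.out.println(containsCallsOnArgument(10, 1_000)); // prints 10
    }
}
```

So whether the slow contains of the ArrayList argument is hit depends on the relative sizes, not only on the receiver type.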

fipro78 (Contributor Author) commented

Before switching from ArrayList to HashSet, I tested locally with this code:

    import java.util.ArrayList;
    import java.util.HashSet;

    import org.junit.jupiter.api.Test;

    // Illustrative class name; only the test method itself was posted.
    class CollectionPerformanceTest {

        @Test
        void testListPerformance() {
            ArrayList<String> all = new ArrayList<>();
            ArrayList<String> flanders = new ArrayList<>();

            // 'all' gets 200,000 entries, 'flanders' the 100,000 "Flanders_" ones
            for (int i = 0; i < 100_000; i++) {
                all.add("Simpson_" + i);
                all.add("Flanders_" + i);
                flanders.add("Flanders_" + i);
            }

            // 1) ArrayList receiver: linear contains() scan of 'flanders' for every entry of 'diff'
            long start = System.currentTimeMillis();

            ArrayList<String> diff = new ArrayList<>(all);
            diff.removeAll(flanders);

            long end = System.currentTimeMillis();

            System.out.println("ArrayList#removeAll(ArrayList) - " + (end - start));

            // 2) HashSet receiver, ArrayList argument
            start = System.currentTimeMillis();

            HashSet<String> diffSet = new HashSet<>(all);
            diffSet.removeAll(flanders);

            end = System.currentTimeMillis();

            System.out.println("HashSet#removeAll(ArrayList) - " + (end - start));

            // 3) HashSet receiver, HashSet argument
            start = System.currentTimeMillis();

            HashSet<String> diffSetSet = new HashSet<>(all);
            diffSetSet.removeAll(new HashSet<>(flanders));

            end = System.currentTimeMillis();

            System.out.println("HashSet#removeAll(HashSet) - " + (end - start));
        }
    }

With diff as an ArrayList, the operation takes about 60 s, while the diffSet variant takes around 30 ms. I'd say the test is similar to the code under discussion. Or what am I missing?

fipro78 (Contributor Author) commented

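OK, I found out why my test was fast. It strongly depends on the size of the collection passed as the argument. As long as that collection is smaller than the receiver ("this"), removeAll iterates the argument and removes each element from the HashSet, which is fast. If the argument is at least as big as the receiver, removeAll iterates the receiver and calls contains on the argument, which for an ArrayList is a linear scan, and the performance is a mess.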

I adjusted the collection creation to verify it:

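        // Replaces the first loop of the test: 'all' still has 200,000 entries,
        // but 'flanders' now has 210,000, i.e. more than the receiver of removeAll
        for (int i = 0; i < 100_000; i++) {
            all.add("Simpson_" + i);
            all.add("Flanders_" + i);
            flanders.add("Flanders_" + i);
            flanders.add("FlandersO_" + i);
        }

        for (int i = 0; i < 10_000; i++) {
            flanders.add("FlandersXxx_" + i);
        }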

Now HashSet#removeAll(ArrayList) takes 123 s, while HashSet#removeAll(HashSet) still takes about 30 ms.
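A minimal sketch of a variant that does not depend on the relative sizes at all (an alternative, not necessarily what the NatTable code ends up doing): hash the collection to remove once, then use Collection#removeIf, which always iterates the receiver and consults the predicate. The class and method names `DiffSketch` and `difference` are hypothetical. Unlike copying the values into a HashSet, this also keeps the order of the input list:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DiffSketch {

    // Hypothetical helper: returns the elements of 'all' that do not occur in 'toRemove',
    // keeping the original order of 'all'.
    static <T> List<T> difference(Collection<T> all, Collection<?> toRemove) {
        // Hash the argument once so every lookup is amortized O(1),
        // regardless of which collection is bigger.
        Set<Object> lookup = new HashSet<>(toRemove);
        List<T> result = new ArrayList<>(all);
        // removeIf iterates the receiver and always asks the predicate, so lookups
        // go to the HashSet (AbstractSet#removeAll instead picks a strategy by size).
        result.removeIf(lookup::contains);
        return result;
    }

    public static void main(String[] args) {
        List<String> all = List.of("Simpson_1", "Flanders_1", "Simpson_2", "Flanders_2");
        List<String> flanders = List.of("Flanders_1", "Flanders_2", "FlandersO_1", "FlandersXxx_1");
        System.out.println(difference(all, flanders)); // prints [Simpson_1, Simpson_2]
    }
}
```

The cost is one pass to build the HashSet plus one pass over the receiver, instead of a nested scan whose shape depends on which collection happens to be larger.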


mmnze commented Oct 1, 2024

And some more (this time I searched the filter code "systematically" for potentially expensive collection operations, instead of stopping a running program whenever it took too long ;-) ):

  • org.eclipse.nebula.widgets.nattable.filterrow.combobox.FilterRowComboBoxCellEditor.setCanonicalValue(Object), l. 273ff (after the code we already talked about). The collections involved should be much smaller here, but we can probably apply the "List to Set" performance fix here as well.
  • org.eclipse.nebula.widgets.nattable.filterrow.combobox.FilterRowComboBoxDataProvider.doCommand(ILayer, UpdateDataCommand), l. 1085, 1094, 1095; probably the same idea
  • org.eclipse.nebula.widgets.nattable.extension.glazedlists.filterrow.ComboBoxFilterRowHeaderComposite.handleEvent(FilterRowComboUpdateEvent), l. 907ff
  • org.eclipse.nebula.widgets.nattable.filterrow.combobox.FilterRowComboBoxDataProvider.buildUpdateEvent(FilterRowComboUpdateEvent, int, List, List), l. 764ff
