Print the feature names in report.print_tree() #23

Open
ablaom opened this issue Apr 19, 2022 · 5 comments
Labels
easy good first issue Good for newcomers

Comments

@ablaom
Member

ablaom commented Apr 19, 2022

This is actually possible, because DecisionTree.print_tree() has an option to pass the feature names: https://github.com/bensadeghi/DecisionTree.jl/blob/3fcb5b083e9abf45773ad1f22945473a7cc4ef89/src/DecisionTree.jl#L86

cc @roland-KA

@adarshpalaskar1
Contributor

Hello, can I work on this issue?

I modified the TreePrinter struct and fit function to include the feature_names parameter.
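For readers who want to see the idea in isolation, here is a minimal sketch of a tree printer that takes optional feature names. It is written in Python purely for illustration and is not the MLJDecisionTreeInterface or DecisionTree.jl code; the `Leaf`, `Node`, and `format_tree` names are hypothetical, and the tree and its counts are illustrative values only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: int    # predicted class
    correct: int  # samples of that class in the leaf
    total: int    # all samples in the leaf


@dataclass
class Node:
    feature: int      # 1-based feature index, as in the printed output
    threshold: float
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def format_tree(tree, feature_names=None, depth=-1, indent=0, prefix=""):
    """Return the printed lines; depth=-1 means no depth limit."""
    pad = "    " * max(indent - 1, 0)
    if isinstance(tree, Leaf):
        return [f"{pad}{prefix}{tree.label} : {tree.correct}/{tree.total}"]
    if depth == indent:
        # Depth limit reached: show only the branch marker.
        return [(pad + prefix).rstrip()]
    name = ""
    if feature_names is not None:
        # Append the quoted name after the numeric index.
        name = f': "{feature_names[tree.feature - 1]}"'
    lines = [f"{pad}{prefix}Feature {tree.feature}{name} < {tree.threshold} ?"]
    lines += format_tree(tree.left, feature_names, depth, indent + 1, "├─ ")
    lines += format_tree(tree.right, feature_names, depth, indent + 1, "└─ ")
    return lines


# Illustrative tree and names (hypothetical values, not real model output).
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
tree = Node(4, 0.8, Leaf(1, 50, 50), Node(4, 1.75, Leaf(2, 49, 54), Leaf(3, 45, 46)))
print("\n".join(format_tree(tree, names)))
```

Leaving `feature_names` as `None` falls back to the index-only form, so existing callers would see unchanged output.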

I ran the example from the documentation (https://docs.juliahub.com/MLJDecisionTreeInterface/QLzS8/0.2.5/autodocs/#MLJDecisionTreeInterface.DecisionTreeClassifier).

Current output:

julia> report(mach).print_tree(3)
Feature 4 < 0.8 ?
├─ 1 : 50/50
└─ Feature 4 < 1.75 ?
    ├─ Feature 3 < 4.95 ?
        ├─
        └─
    └─ Feature 3 < 4.85 ?
        ├─
        └─ 3 : 43/43
julia> report(mach).print_tree(6)
Feature 4 < 0.8 ?
├─ 1 : 50/50
└─ Feature 4 < 1.75 ?
    ├─ Feature 3 < 4.95 ?
        ├─ Feature 4 < 1.65 ?
            ├─ 2 : 47/47
            └─ 3 : 1/1
        └─ Feature 4 < 1.55 ?
            ├─ 3 : 3/3
            └─ 2 : 2/3
    └─ Feature 3 < 4.85 ?
        ├─ Feature 2 < 3.1 ?
            ├─ 3 : 2/2
            └─ 2 : 1/1
        └─ 3 : 43/43

New output:

julia> report(mach).print_tree(3)
Feature 4: "petal_width" < 0.8 ?
├─ 1 : 50/50
└─ Feature 4: "petal_width" < 1.75 ?
    ├─ Feature 3: "petal_length" < 4.95 ?
        ├─
        └─
    └─ Feature 3: "petal_length" < 4.85 ?
        ├─
        └─ 3 : 43/43
julia> report(mach).print_tree(6)
Feature 4: "petal_width" < 0.8 ?
├─ 1 : 50/50
└─ Feature 4: "petal_width" < 1.75 ?
    ├─ Feature 3: "petal_length" < 4.95 ?
        ├─ Feature 4: "petal_width" < 1.65 ?
            ├─ 2 : 47/47
            └─ 3 : 1/1
        └─ Feature 4: "petal_width" < 1.55 ?
            ├─ 3 : 3/3
            └─ 2 : 2/3
    └─ Feature 3: "petal_length" < 4.85 ?
        ├─ Feature 1: "sepal_length" < 5.95 ?
            ├─ 2 : 1/1
            └─ 3 : 2/2
        └─ 3 : 43/43

Is the new output consistent with the required output of this issue? Please let me know if any further changes are required.

@roland-KA
Collaborator

This looks good to me with respect to the feature names.

The only strange thing is that the last part of the decision tree differs in the new output example (Feature 2 < 3.1 vs. Feature 1 < 5.95). This shouldn't happen if the same data and the same algorithm were used.

Current output:

└─ Feature 3 < 4.85 ?
        ├─ Feature 2 < 3.1 ?
            ├─ 3 : 2/2
            └─ 2 : 1/1
        └─ 3 : 43/43

New output:

└─ Feature 3: "petal_length" < 4.85 ?
        ├─ Feature 1: "sepal_length" < 5.95 ?
            ├─ 2 : 1/1
            └─ 3 : 2/2
        └─ 3 : 43/43

@adarshpalaskar1
Contributor

Yes, I think this is because of tie-breaking when selecting the feature. Since both conditions (Feature 2 < 3.1 vs. Feature 1 < 5.95) give us the same pair of leaves,

├─ 3 : 2/2
└─ 2 : 1/1

they have the same score under the split criterion (entropy, Gini index, etc.). In such cases, the algorithm may pick a feature at random or take whichever tied feature it encountered first during iteration. I think this is a likely reason for the observed difference.
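The tie can be reproduced with a small stand-alone sketch. This is Python written for illustration, not DecisionTree.jl's actual split search; the `gini`, `split_impurity`, and `best_split` helpers are hypothetical, and the three samples are illustrative values chosen to match the thresholds printed above.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity after splitting on rows[i][feature] < threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] < threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] >= threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


def best_split(rows, labels, feature_order):
    """Scan features in the given order; keep the first split with the lowest score."""
    best = None
    for f in feature_order:
        values = sorted(set(x[f] for x in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            score = split_impurity(rows, labels, f, t)
            if best is None or score < best[0]:  # strict '<': ties keep the earlier split
                best = (score, f, t)
    return best


# Columns: (sepal_length, sepal_width); illustrative values for the 3-sample node.
rows = [(5.9, 3.2), (6.0, 3.0), (6.2, 2.8)]
labels = [2, 3, 3]

print(best_split(rows, labels, [0, 1]))  # sepal_length scanned first
print(best_split(rows, labels, [1, 0]))  # sepal_width scanned first
```

Both calls reach a weighted impurity of 0.0, but the selected feature follows the scan order. If the real implementation visits features in a randomized order without a fixed seed, that alone would be enough to make two runs print different but equally good splits.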

Also, I re-executed the code for the new output:

julia> report(mach).print_tree(6)
Feature 4: "petal_width" < 0.8 ?
├─ 1 : 50/50
└─ Feature 4: "petal_width" < 1.75 ?
    ├─ Feature 3: "petal_length" < 4.95 ?
        ├─ Feature 4: "petal_width" < 1.65 ?
            ├─ 2 : 47/47
            └─ 3 : 1/1
        └─ Feature 4: "petal_width" < 1.55 ?
            ├─ 3 : 3/3
            └─ 2 : 2/3
    └─ Feature 3: "petal_length" < 4.85 ?
        ├─ Feature 2: "sepal_width" < 3.1 ?
            ├─ 3 : 2/2
            └─ 2 : 1/1
        └─ 3 : 43/43

which is now the same as the current output.

Let me know if it is okay or if I should dig deeper.

@roland-KA
Collaborator

Ah, I think that explains the situation well. So everything seems to work perfectly! 👍

adarshpalaskar1 added a commit to adarshpalaskar1/MLJDecisionTreeInterface.jl that referenced this issue Feb 20, 2024
This commit addresses issue JuliaAI#23 by modifying TreePrinter struct and fit function to include the feature_names parameter.
@ablaom
Member Author

ablaom commented Feb 22, 2024

closed by #54
