missing taxonomy python notebook to script #245
Thanks @park2454! It looks like there are some lint errors related to unused imports in the code. Did you run pre-commit on your changes? If not, try:
pre-commit install
pre-commit run --files tools/missing_taxonomy.py
Then fix the errors, and commit/push again.
Hi Brian,
Thank you for the suggestions! I will try that and recommit once they are
fixed.
Thank you,
Sungmin
On Mon, Feb 6, 2023 at 2:24 PM Brian Healy wrote:
Thanks @park2454 <https://github.com/park2454>! It looks like there are
some lint errors related to unused imports in the code. Did you run
pre-commit on your changes? If not, try:
pre-commit install
pre-commit run --files tools/missing_taxonomy.py
Then fix the errors, and commit/push again.
Thanks for helping to make this notebook become part of the codebase! Please see the comments below for recommended changes.
parser.add_argument(
    "-merge_features",
    type=bool,
    nargs='?',
    const=True,
    default=False,
    help="merge downloaded results with features from Kowalski",
)
parser.add_argument(
    "-features_catalog",
    type=str,
    default='ZTF_source_features_DR5',
    help="catalog of features on Kowalski",
)
These arguments are not used by the code above, so they can be removed.
# Read in golden dataset (downloaded from Fritz), mapper
parquet_path = os.path.join(os.path.dirname(__file__), parquet)
mapper_path = os.path.join(os.path.dirname(__file__), mapper)
output_path = os.path.join(os.path.dirname(__file__), "golden_missing_labels.csv")
It would be best to allow the user to customize the name of this output file using another argument.
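One way this suggestion could look is sketched below. It is only an illustration: the `-output` flag name and the helpers `build_parser` and `resolve_output_path` are hypothetical and not part of the PR. The default keeps the current file name, so existing behavior would not change.

```python
import argparse
import os


def build_parser():
    """Build a parser with a hypothetical -output argument for the CSV name."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-output",  # hypothetical flag name, styled like the script's other arguments
        type=str,
        default="golden_missing_labels.csv",
        help="name of the output csv file",
    )
    return parser


def resolve_output_path(output, base_dir):
    """Join the user-supplied file name onto a base directory."""
    return os.path.join(base_dir, output)


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-output", "my_missing_labels.csv"])

# The script itself uses os.path.dirname(__file__) as the base directory;
# the current working directory stands in for it here.
output_path = resolve_output_path(args.output, os.getcwd())
print(output_path)
```

Running the script without `-output` would fall back to the default name.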
gold_map = gold_map.reset_index(drop=False).set_index('fritz_label')
gold_dict = gold_map.transpose().to_dict()

labels_gold = gold_new.set_index('obj_id')[gold_new.columns[1:54]]
The columns corresponding to the labels may not always be 1:54. Perhaps we could use the mapper's keys or the config file to generate a list of classifications?
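A rough sketch of the mapper-keys idea follows. It assumes the mapper's top-level keys are the label column names, which the PR does not confirm. The helper `label_columns` and the toy column names are hypothetical.

```python
def label_columns(columns, mapper_keys):
    """Return the columns that are named in the mapper, in their original order."""
    known = set(mapper_keys)
    return [col for col in columns if col in known]


# Toy stand-ins for the golden dataset's columns and the mapper (hypothetical values).
columns = ["obj_id", "variable", "periodic", "ra"]
mapper = {
    "variable": {"fritz_label": "variable"},
    "periodic": {"fritz_label": "periodic"},
}

selected = label_columns(columns, mapper)
print(selected)  # ['variable', 'periodic']

# With pandas, the hard-coded slice could then become something like:
# labels_gold = gold_new.set_index('obj_id')[label_columns(gold_new.columns, mapper)]
```

Selecting by name rather than position means the script keeps working if columns are added to or reordered in the parquet file.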
tools/missing_taxonomy.py
Outdated
import matplotlib.pyplot as plt
from collections import Counter
We can remove these imports - while the notebook used them, this code does not.
@@ -0,0 +1,113 @@
import argparse
Above the first import, add this shebang line: #!/usr/bin/env python
This allows users to run the script using ./missing_taxonomy.py (provided the file is marked executable) in addition to python missing_taxonomy.py.
missing_taxonomy.py creates a CSV file of objects with missing labels, based on Brian (@bfhealy)'s Jupyter notebook. It requires a classification feature parquet file and a dataset mapper JSON file as input, and stores the output in "golden_missing_labels.csv".