Adjust data cleaning script to prune data points outside of LA Neighborhood Districts #1708

Open
4 tasks
ryanfchase opened this issue Apr 12, 2024 · 7 comments · May be fixed by #1744
Labels: Dependency (An issue that includes dependencies), p-feature: data, Role: Data Science (Data management, loading, or analysis), Role: Frontend (React front end work), Size: 2pt (Can be done in 7-12 hours)

Comments

@ryanfchase
Member

ryanfchase commented Apr 12, 2024

Dependency

  • Obtain the 2024 (or most recent) Boundaries JSON file

Overview

We need to remove 311 service requests from our map that do not fall within the boundaries of a Neighborhood Council (NC), since they are inaccessible and mostly confuse users.

Action Items

  • Write a Python script that takes as input a CSV containing a subset of the 311 dataset (see example dataset below) and identifies the 311 requests that do not fall within a Neighborhood Council boundary (provide a proof of concept)
  • Report on this ticket how many 2024 service requests fall outside NC boundaries
  • Incorporate the functionality into our daily Hugging Face cron job at updateHfDataset.py
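
As a starting point for the proof of concept, here is a minimal sketch of the boundary check in plain Python. It is not the project's implementation. It assumes the Boundaries JSON file is a GeoJSON FeatureCollection of Polygon/MultiPolygon features and that the CSV has `Latitude` and `Longitude` columns; both are assumptions to verify against the real files. The function names (`point_in_ring`, `split_requests`, and so on) are hypothetical, and treating rows with missing coordinates as "outside" is a choice made here, not something decided on this ticket.

```python
import csv
import io


def point_in_ring(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the closed ring."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Count edges that a ray cast to the east of the point crosses.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, rings):
    """GeoJSON polygon: first ring is the exterior, any others are holes."""
    if not point_in_ring(lon, lat, rings[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in rings[1:])


def point_in_geometry(lon, lat, geometry):
    """Handle the two GeoJSON geometry types a boundary file is assumed to use."""
    if geometry["type"] == "Polygon":
        return point_in_polygon(lon, lat, geometry["coordinates"])
    if geometry["type"] == "MultiPolygon":
        return any(point_in_polygon(lon, lat, p) for p in geometry["coordinates"])
    return False


def split_requests(rows, boundaries, lat_col="Latitude", lon_col="Longitude"):
    """Split rows into (inside, outside) relative to any NC boundary.

    Column names are assumptions; rows with missing or unparseable
    coordinates are placed in `outside`.
    """
    inside, outside = [], []
    for row in rows:
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            outside.append(row)
            continue
        hit = any(
            point_in_geometry(lon, lat, feature["geometry"])
            for feature in boundaries["features"]
        )
        (inside if hit else outside).append(row)
    return inside, outside


if __name__ == "__main__":
    # Toy data only: one square "boundary" and three made-up requests.
    toy_boundaries = {
        "type": "FeatureCollection",
        "features": [{
            "type": "Feature",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]],
            },
        }],
    }
    toy_csv = "SRNumber,Latitude,Longitude\na,3,3\nb,9,9\nc,,\n"
    rows = list(csv.DictReader(io.StringIO(toy_csv)))
    kept, pruned = split_requests(rows, toy_boundaries)
    print(f"{len(pruned)} of {len(rows)} requests fall outside the boundaries")
```

For the real data, the boundaries would be loaded with `json.load` and the rows with `csv.DictReader` on the CSV file; the length of the second returned list gives the count asked for in the second action item. Checking every point against every polygon is slow on a full year of data, so a spatial library or a database spatial extension is probably the better fit for the cron job.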

Resources/Instructions

Screenshot of requests outside NC boundaries

image

Useful Links

@ryanfchase
Member Author

Adjusting this ticket. Most likely @mru-hub will pick this up once there are enough instructions to get started.

@ryanfchase removed their assignment Apr 27, 2024
@ryanfchase added the "ready for dev lead" (ready for developer lead to review the issue) label Apr 27, 2024
@ryanfchase added the "Role: Frontend" (React front end work) label May 8, 2024
@ryanfchase
Member Author

ryanfchase commented May 8, 2024

Note: I'm realizing that we may need to do work to prune old data. Adding a check into our cleaning logic will simply stop new data (e.g. requests falling outside NC boundaries) from being added -- we'll still need to handle old data that has the same problem.

Follow up ticket: Make sure we are cleaning 2023 and prior data with the same logic

@ryanfchase added "Size: 2pt" (Can be done in 7-12 hours) and removed "Size: 1pt" (Can be done in 6 hours) labels May 8, 2024
@ryanfchase
Member Author

Note for Ryan: provide an example resource for checking whether a lat/long falls within a provided boundary in DuckDB

@ryanfchase added "ready for prioritization" and removed "ready for dev lead" (ready for developer lead to review the issue) and "ready for prioritization" labels May 8, 2024
@ryanfchase
Member Author

This ticket is ready to be picked up

@mru-hub self-assigned this May 15, 2024
@mru-hub linked a pull request May 24, 2024 that will close this issue
4 tasks
@ryanfchase
Member Author

@mru-hub's update from this past week is on the PR: #1736 (comment)

@ryanfchase
Member Author

Latest PR: #1744

@ryanfchase self-assigned this Aug 31, 2024
@ryanfchase added the "Dependency" (An issue that includes dependencies) label Aug 31, 2024
@ryanfchase
Member Author

Update: added a dependency on obtaining the 2024 NC boundary JSON file

Projects
Status: Icebox (on hold)
2 participants