This repository has been archived by the owner on Oct 24, 2024. It is now read-only.
NYC Cab Data Science and Machine Learning Test

pagerinc/ct-cabnyc

Commuting to Work: Finding a place to live in NYC

The NYC Taxi and Limousine Commission (TLC) releases monthly trip record data, including fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data is available at http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

Using data from 2016, please build a model that would help a prospective employee determine the best place to live in 2017 based on the following:

Considerations

  1. The employee is moving to New York to start a new job at 1 Irving Pl, New York, NY 10003
  2. The employee wants to live in one of the following areas: a) within 2 blocks of Lincoln Center, b) Sutton Place, or c) Two Bridges.
  3. The employee prefers an efficient commute (she doesn't like to ride her bike or take the subway). Her employer pays for yellow cab rides.
  4. She aims to get into work before 9:00 AM, and leave around 6:00 PM.
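
Considerations 3 and 4 imply that only a slice of the 2016 trips is relevant: weekday rides that fit her morning or evening commute. The snippet below is a minimal sketch of that filter, not a prescribed approach. The window boundaries and the function name `commute_leg` are assumptions made for illustration; the brief only says "before 9:00 AM" and "around 6:00 PM".

```python
from datetime import datetime, time

# Assumed windows (not given in the brief beyond "before 9:00 AM"
# and "around 6:00 PM"); adjust them and document the choice.
MORNING_EARLIEST_PICKUP = time(7, 0)
MORNING_LATEST_DROPOFF = time(9, 0)
EVENING_EARLIEST_PICKUP = time(17, 30)
EVENING_LATEST_PICKUP = time(18, 30)


def commute_leg(pickup: datetime, dropoff: datetime):
    """Classify a trip as 'morning', 'evening', or None (not a commute trip)."""
    if dropoff <= pickup:
        return None  # corrupt or zero-length record
    if pickup.date() != dropoff.date():
        return None  # trips spanning midnight are not commute trips
    if pickup.weekday() >= 5:
        return None  # Saturday (5) and Sunday (6) are excluded
    # Morning leg: she must be dropped off before 9:00 AM.
    if pickup.time() >= MORNING_EARLIEST_PICKUP and dropoff.time() < MORNING_LATEST_DROPOFF:
        return "morning"
    # Evening leg: she leaves the office around 6:00 PM.
    if EVENING_EARLIEST_PICKUP <= pickup.time() <= EVENING_LATEST_PICKUP:
        return "evening"
    return None
```

A trip would additionally need to start near one of the three candidate areas and end near the office (or the reverse in the evening); that spatial filter is left out here because it depends on how locations are encoded in the files you use.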

Bonus: Factor in real estate prices. Her budget is around 2500 USD.

Deliverable:

  • A data science model that predicts which of her three preferred areas gives the employee the most efficient commute, based on her commute times.
  • Clear assumptions on how efficiency is defined.
  • A visualization of the sample data and of the method used to compute commute times. Which statistical methods did you use? Be sure to document your assumptions and thought process.
  • A report that details your process of experimenting and building the above.
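
One possible way to make "efficiency" explicit is to score each area by its typical trip duration plus a penalty for unreliability. The sketch below assumes that definition (median duration plus a weighted gap between the 90th percentile and the median); the function names, the `reliability_weight` parameter, and the scoring formula are illustrative choices, not part of the brief.

```python
from statistics import median, quantiles


def summarize(durations_min):
    """Median and 90th percentile of trip durations (minutes).

    Needs at least two observations.
    """
    # quantiles(..., n=10) returns the 9 decile cut points; the last is P90.
    p90 = quantiles(durations_min, n=10, method="inclusive")[-1]
    return {"median": median(durations_min), "p90": p90}


def rank_areas(durations_by_area, reliability_weight=0.5):
    """Rank areas from most to least efficient under the assumed score.

    score = median + reliability_weight * (p90 - median)
    A lower score means a shorter and more predictable commute.
    """
    scores = {}
    for area, durations in durations_by_area.items():
        stats = summarize(durations)
        scores[area] = stats["median"] + reliability_weight * (stats["p90"] - stats["median"])
    return sorted(scores.items(), key=lambda item: item[1])
```

Calling `rank_areas` with a dictionary that maps each candidate area to its list of observed commute durations returns the areas ordered by score. Whatever definition you settle on, state it in the report together with the reasoning behind the weight.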

Submission Guidelines

In your report, be sure to include answers to the following:

  • Where should the candidate live and why? What's the commute time from that location?
  • Which data science problem are you tackling?
  • Which features do you find most relevant? Why?
  • Which subset of the data are you using? Which (if any) sampling methods did you apply?
  • We are very interested in your thought process, assumptions, and design decisions. Please document them in your report.
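
As an illustration of the kind of sampling method the questions above refer to, here is a sketch of reservoir sampling (Algorithm R), which draws a uniform random sample of fixed size from a stream without loading the full file into memory. It is one option among many; the function name and the `seed` parameter are assumptions for this example.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return up to k items drawn uniformly at random from an iterable.

    Uses Algorithm R: the first k items fill the reservoir, and item i
    (0-indexed, i >= k) replaces a random slot with probability k / (i + 1).
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
            if j < k:
                sample[j] = row
    return sample
```

Because `rows` can be any iterable, it can be fed directly from `csv.DictReader` over a monthly trip file. A fixed seed also helps with the requirement that results be reproducible.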

Submission Rules

  • The time limit for this challenge is 72 hours. You can use whichever programming languages or stack you feel most comfortable with.
  • Please submit a PR to this repository, with the code that you have produced and the report on your process.
  • Your solution should be functional, and we should be able to reproduce the results in your report.

Honor code

As data scientists, an invaluable part of our skill set is knowing how to effectively Google our problems and bugs. As such, it is OK for you to use resources on the Internet for this challenge. We only ask you to refrain from doing two things:

  1. Copying and pasting code samples from the Internet and presenting them as your own work. This would be considered plagiarism and disqualify you immediately.
  2. Googling anything specific to this dataset. Please treat the dataset as if it is novel and unique to you.

Contact/Progress

VERY IMPORTANT: Don't hesitate to contact us along the way and update us on your progress, so we can provide feedback on your direction.
