This is the source code for my submission. To run the tools, the challenge data files (not included) need to be placed in the `data` directory.
In general, with the exception of part three, each sub-question has a shell script that produces its solution files, for example `part1a.sh`. Intermediate data is created in `solution/.tmp`.
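A typical invocation from the repository root might look like this (a minimal sketch, assuming the scripts are executable and write their output as described above):

```sh
# Produce the solution files for part 1a
./part1a.sh

# Intermediate data accumulates in solution/.tmp; safe to remove between runs
rm -rf solution/.tmp
```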
The data files provided as part of the challenge (`PCDR2011.ZIP`, `PNTDUMP.ZIP` and `REVIEW.ZIP`) go in the `data` directory. There are scripts in the `0-exploring` directory for extracting and processing these files.
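A minimal sketch of staging the data by hand (the scripts in `0-exploring` automate this and may expect a slightly different layout):

```sh
# Place the challenge archives in data/ and unpack them
mkdir -p data
cp PCDR2011.ZIP PNTDUMP.ZIP REVIEW.ZIP data/
cd data && unzip PCDR2011.ZIP && unzip PNTDUMP.ZIP && unzip REVIEW.ZIP
```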
I developed and tested all of the scripts on Mac OS X. A Hadoop MapReduce and Hive environment is required to extract and transform the patient data; I used Amazon EMR for this. A recent JDK is required to build the Hadoop job in `tools/medicare`, and a Spark 1.x cluster is required for part three. The remaining scripts depend on R or Python.
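As a rough sketch of the build-and-submit flow (the build tool, JAR name, main class, and script path below are assumptions, not the project's actual names):

```sh
# Build the Hadoop job (assuming a Maven build)
cd tools/medicare && mvn package

# Run the job on the Hadoop/EMR cluster (JAR, class, and paths are hypothetical)
hadoop jar target/medicare-job.jar com.example.medicare.ExtractJob input/ output/

# Part three runs on a Spark 1.x cluster, e.g. via spark-submit (script name hypothetical)
spark-submit --master yarn part3/analysis.py
```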
See `Challenge Instructions.pdf` for the original questions to be answered and other challenge information, and `Solution Abstract.pdf` for more details on my solution.