Data Mining and Milwaukee: Mining Community Health Risk Factors

About This Presentation

Slides: 18

Provided by: itsl6

Transcript and Presenter's Notes

1
Data Mining and Milwaukee: Mining Community Health Risk Factors

2
Goal of the Project

3
Data Mining Tasks

Classification
Build a good classifier that predicts the overall health status of individuals
Feature Selection
Identify the meaningful attributes that affect overall health status

4
Data Transformation

1200 instances
275 attributes
Cleaned and consolidated the attributes down to 94
Eliminated nulls
Consolidated attributes that had been split out by gender
Eliminated irrelevant attributes (zip code, state, city)
Converted the SPSS file to Excel format for cleanup
Loaded the cleaned data into an Access database
This makes loading into WEKA easier
Modified WEKA's jar file to add the new database connection to the JDBC connection file
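
The cleanup steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`zip`, `exam_male`, `exam_female`, `health`) are hypothetical, and "eliminated nulls" is assumed here to mean dropping rows that still have a missing value after consolidation. The project itself did this work in Excel and Access, not in code.

```python
# Hypothetical survey rows; column names are illustrative, not from the real data set.
rows = [
    {"zip": "53202", "city": "Milwaukee", "state": "WI",
     "exam_male": "yes", "exam_female": None, "health": "good"},
    {"zip": "53204", "city": "Milwaukee", "state": "WI",
     "exam_male": None, "exam_female": "no", "health": "poor"},
    {"zip": "53206", "city": "Milwaukee", "state": "WI",
     "exam_male": None, "exam_female": None, "health": "fair"},
]

# Attributes treated as irrelevant to health status (per the slide).
IRRELEVANT = {"zip", "city", "state"}

# Hypothetical mapping: merged attribute name -> (male column, female column).
GENDER_PAIRS = {"exam": ("exam_male", "exam_female")}


def clean(rows):
    """Drop irrelevant columns, merge gender-split columns, drop rows with nulls."""
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in IRRELEVANT}
        for merged, (male_col, female_col) in GENDER_PAIRS.items():
            male = out.pop(male_col, None)
            female = out.pop(female_col, None)
            # Only one of the two is expected to be filled in for a respondent.
            out[merged] = male if male is not None else female
        # Assumption: a row with any remaining null is discarded.
        if all(v is not None for v in out.values()):
            cleaned.append(out)
    return cleaned


cleaned_rows = clean(rows)
print(cleaned_rows)
```

The third row is dropped because neither gender-specific column has a value, so the merged attribute is still null.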

5
Methods

Split the data set into 2/3 and 1/3 subsets (800 training instances and 400 test instances)
Classification
ZeroR model
OneR model
Decision tree (J48 algorithm)
Naïve Bayes
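
As a rough sketch of the 2/3 and 1/3 split and the two baseline models, the following Python uses synthetic data. The attribute names `smoker` and `noise`, the fixed random seed, and the helper function names are all assumptions made for illustration. These are simplified versions for nominal attributes only, not WEKA's implementations.

```python
import random
from collections import Counter, defaultdict


def split_two_thirds(instances, seed=0):
    """Shuffle, then return (training set, test set) in a 2/3 : 1/3 ratio."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def zero_r(train, target):
    """ZeroR: always predict the majority class of the training set."""
    majority = Counter(row[target] for row in train).most_common(1)[0][0]
    return lambda row: majority


def one_r(train, target):
    """OneR: pick the single attribute whose value -> class rule makes
    the fewest errors on the training set."""
    default = Counter(row[target] for row in train).most_common(1)[0][0]
    best_attr, best_rule, best_errors = None, None, None
    for attr in (a for a in train[0] if a != target):
        counts = defaultdict(Counter)
        for row in train:
            counts[row[attr]][row[target]] += 1
        # For each attribute value, predict its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in train if rule[row[attr]] != row[target])
        if best_errors is None or errors < best_errors:
            best_attr, best_rule, best_errors = attr, rule, errors
    # Fall back to the majority class for attribute values unseen in training.
    return lambda row: best_rule.get(row[best_attr], default)


def accuracy(model, test, target):
    """Fraction of test instances the model classifies correctly."""
    return sum(model(row) == row[target] for row in test) / len(test)


# Synthetic data: 'smoker' fully determines 'health'; 'noise' does not.
data = []
for i in range(1200):
    smoker = "yes" if i % 3 == 0 else "no"
    data.append({
        "smoker": smoker,
        "noise": "a" if i % 2 == 0 else "b",
        "health": "poor" if smoker == "yes" else "good",
    })

train, test = split_two_thirds(data)
zero_r_acc = accuracy(zero_r(train, "health"), test, "health")
one_r_acc = accuracy(one_r(train, "health"), test, "health")
print(len(train), len(test))
print("ZeroR accuracy:", zero_r_acc)
print("OneR accuracy:", one_r_acc)
```

ZeroR ignores every attribute, so its accuracy is just the share of the majority class in the test set. That makes it a useful floor against which OneR, J48, and Naïve Bayes can be compared.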

6
Baseline Models

7
Model Interpretation (Before Feature Selection)

8
Attribute Selection

9
Information Gain
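
Information gain scores an attribute by how much knowing its value reduces the entropy of the class: IG(class, A) = H(class) - H(class | A). The following is a minimal sketch on hypothetical toy data. The attribute names `smoker` and `exercise` are illustrative and not taken from the survey, and the code is not WEKA's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """H(target) minus the weighted entropy of target within each attribute value."""
    by_value = {}
    for row in rows:
        by_value.setdefault(row[attribute], []).append(row[target])
    remainder = sum((len(subset) / len(rows)) * entropy(subset)
                    for subset in by_value.values())
    return entropy([row[target] for row in rows]) - remainder


# Toy data: 'smoker' separates the classes perfectly, 'exercise' not at all.
toy = [
    {"smoker": "yes", "exercise": "yes", "health": "poor"},
    {"smoker": "yes", "exercise": "no", "health": "poor"},
    {"smoker": "no", "exercise": "yes", "health": "good"},
    {"smoker": "no", "exercise": "no", "health": "good"},
]

gains = {attr: information_gain(toy, attr, "health")
         for attr in ("smoker", "exercise")}
ranking = sorted(gains, key=gains.get, reverse=True)
print(gains)
print(ranking)
```

Ranking attributes by this score and keeping the top ones is one common way to carry out the feature-selection task described earlier.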

10
Relief-F

11
Model Interpretation (After Feature Selection)

12
Principal Components

Previously the analysis would not finish
WEKA failed to finish because of out-of-memory errors
Modified the runweka.bat file to instruct the Java virtual machine to use more system memory by adding -Xmx1000M
WEKA now completes the analysis, but we have not had time to interpret the results.

13
Where do we go from here?