- Download KNIME to complete this assignment.
- Before you start working on this assignment, study the KNIME Quickstart Guide to get familiar with the tool.
- Download the file adult.csv available in the data folder on the KNIME Hub. The data are provided by the UCI Machine Learning Repository.
- Create a new workflow and name it your_firstname_lastname_decision_tree.
- In your workflow, train a Decision Tree to predict whether or not a person earns more than $50K per year:
  - Partition the dataset into a training set (75%) and a test set (25%). In your Partitioning node, select the stratified sampling option on the income column.
  - Train a Decision Tree model on the training set, and apply the model to the test set.
  - Use the Scorer node to evaluate the accuracy of the model.
- Change the label of each node from the default (such as “Node 1”) to a brief description (e.g. “Read data from adult.csv”).
- Try out other parameter settings to get a higher accuracy. For example, change the quality measure, pruning method, or minimum number of records.
- When you are satisfied, export your workflow project and change the file name to include your full name, e.g. “potter_henry_decision_tree.knwf”.
- Research and prepare answers to the following questions in a Word or PDF file:
  - What is the purpose of applying the stratified sampling option on the income column?
  - What is the purpose of pruning and of the minimum number of records setting?
  - How did you change the parameter settings to get a higher accuracy?
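
For readers who want to see the partitioning and scoring steps outside KNIME, the following is a minimal Python sketch of a stratified 75/25 split and an accuracy calculation. It is not KNIME's implementation: the function names `stratified_split` and `accuracy` are hypothetical, and the toy rows are made up for illustration. Only the 75/25 proportions and the `income` column come from the steps above.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, label_key, train_fraction=0.75, seed=0):
    """Split rows so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Shuffle and cut each class separately, then merge the pieces.
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Toy, made-up data (not adult.csv): 20 rows of ">50K" and 80 rows of "<=50K".
rows = [{"id": i, "income": ">50K" if i < 20 else "<=50K"} for i in range(100)]
train, test = stratified_split(rows, "income")

train_counts = Counter(r["income"] for r in train)
test_counts = Counter(r["income"] for r in test)
print(train_counts)  # class counts in the training set
print(test_counts)   # class counts in the test set

# Baseline "model" that always predicts the majority class of the training set.
majority = train_counts.most_common(1)[0][0]
baseline_accuracy = accuracy([r["income"] for r in test], [majority] * len(test))
print(baseline_accuracy)
```

In this sketch both parts keep the same 20/80 class ratio as the full toy dataset. The majority-class baseline is included only as a reference point: a trained model is worth keeping when its test accuracy is clearly above that figure.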