{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "# Decision Trees\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Importing needed packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "button": false, "jupyter": { "outputs_hidden": true }, "new_sheet": false, "run_control": { "read_only": false } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from sklearn import preprocessing\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn import metrics\n", "from sklearn import tree\n", "from io import StringIO  # standard library; no need for the third-party six package\n", "import pydotplus\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "import matplotlib as mpl\n", "mpl.style.use(['ggplot'])\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "## About the dataset\n", "\n", "Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of 5 medications: Drug A, Drug B, Drug C, Drug X, and Drug Y.\n", "\n", "Part of your job is to build a model that suggests which drug might be appropriate for a future patient with the same illness. The features in this dataset are each patient's Age, Sex, Blood Pressure (BP), Cholesterol, and Na_to_K, and the target is the drug that each patient responded to.\n", "\n", "Because the target has five possible values, this is an example of a multiclass classification problem. You can use the training part of the dataset to build a decision tree, and then use it to predict the class of an unknown patient, that is, to suggest a drug for a new patient."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, read the data into a pandas DataFrame:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "button": false, "jupyter": { "outputs_hidden": true }, "new_sheet": false, "run_control": { "read_only": false } }, "outputs": [ { "data": { "text/html": [ "
|   | Age | Sex | BP | Cholesterol | Na_to_K | Drug |
|---|---|---|---|---|---|---|
| 0 | 23 | F | HIGH | HIGH | 25.355 | drugY |
| 1 | 47 | M | LOW | HIGH | 13.093 | drugC |
| 2 | 47 | M | LOW | HIGH | 10.114 | drugC |
| 3 | 28 | F | NORMAL | HIGH | 7.798 | drugX |
| 4 | 61 | F | LOW | HIGH | 18.043 | drugY |