{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SVM (Support Vector Machines)\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Importing Needed packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn import preprocessing, svm\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import classification_report, confusion_matrix, f1_score\n", "import itertools\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline " ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "## Load the Cancer data\n", "\n", "The example is based on a dataset that is publicly available from the UCI Machine Learning Repository (Asuncion and Newman, 2007) http://mlearn.ics.uci.edu/MLRepository.html. The dataset consists of several hundred human cell sample records, each of which contains the values of a set of cell characteristics. The fields in each record are:\n", "\n", "|Field name|Description|\n", "|--- |--- |\n", "|ID|Clump thickness|\n", "|Clump|Clump thickness|\n", "|UnifSize|Uniformity of cell size|\n", "|UnifShape|Uniformity of cell shape|\n", "|MargAdh|Marginal adhesion|\n", "|SingEpiSize|Single epithelial cell size|\n", "|BareNuc|Bare nuclei|\n", "|BlandChrom|Bland chromatin|\n", "|NormNucl|Normal nucleoli|\n", "|Mit|Mitoses|\n", "|Class|Benign or malignant|\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "### Load Data From CSV File " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDClumpUnifSizeUnifShapeMargAdhSingEpiSizeBareNucBlandChromNormNuclMitClass
010000255111213112
1100294554457103212
210154253111223112
310162776881343712
410170234113213112
\n", "
" ], "text/plain": [ " ID Clump UnifSize UnifShape MargAdh SingEpiSize BareNuc \\\n", "0 1000025 5 1 1 1 2 1 \n", "1 1002945 5 4 4 5 7 10 \n", "2 1015425 3 1 1 1 2 2 \n", "3 1016277 6 8 8 1 3 4 \n", "4 1017023 4 1 1 3 2 1 \n", "\n", " BlandChrom NormNucl Mit Class \n", "0 3 1 1 2 \n", "1 3 2 1 2 \n", "2 3 1 1 2 \n", "3 3 7 1 2 \n", "4 3 1 1 2 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cell_df = pd.read_csv(\"cell_samples.csv\")\n", "cell_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ID field contains the patient identifiers. The characteristics of the cell samples from each patient are contained in fields Clump to Mit. The values are graded from 1 to 10, with 1 being the closest to benign.\n", "\n", "The Class field contains the diagnosis, as confirmed by separate medical procedures, as to whether the samples are benign (value = 2) or malignant (value = 4).\n", "\n", "Lets look at the distribution of the classes based on Clump thickness and Uniformity of cell size:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAhKklEQVR4nO3df3Rc9Xnn8fdjS4pHYInEFq2LWY2TxUUCE4GFjovdYn4Y5xwcIBzajQO74Ci42yInKSGBpCf8yGlzUpJNSEObXTfCuEktSJ1ACCVgnMIGKIksYweMpphSxsaNWo8FqyaMiCTr2T9mJGvkH/o1c++V7ud1js7VPB7Nffydq4+uvjO6X3N3REQkPmaF3YCIiARLwS8iEjMKfhGRmFHwi4jEjIJfRCRmysJuYDzmz5/vyWQy7DZERKaVnTt3HnL3mtH1aRH8yWSSjo6OsNsQEZlWzGzfseqa6hERiRkFv4hIzCj4RURiZlrM8R9Lf38/Bw4c4J133gm7lRllzpw5LFy4kPLy8rBbEZESmbbBf+DAAebOnUsymcTMwm5nRnB3uru7OXDgAIsWLQq7HREpkZJN9ZjZfWZ20Mz2jKi9x8yeNLNX89t3T/bx33nnHebNm6fQLyIzY968ebH4LSqTybJjRxeZTDbUPlKpbjZv3kMq1R1qH1ERlfGIyvHx6KOv8bGPPcGjj75W1Mct5Rn//cC9wN+OqN0G/Njdv2Rmt+Vv3zrZHSj0iy8OY9rWlqK5+QkqKmbR1zdIa+tq1q6tC7yPDRu2c++9u4dvt7Q08I1vXBp4H1ERlfGIyvGxZMkm9uzJ/QBsbX2JJUvm8eKL64ry2CU743f3nwBvjipfCWzOf74ZuKpU+xc5lkwmS3PzE/T2DtDT00dv7wDNzU8EfmaXSnUXhBzAvffuDv1MNyxRGY+oHB+PPvracOgPeeml7qKd+Qf9rp7fcPcugPz21OPd0czWm1mHmXVkMpnAGgzK008/zZo1awB45JFH+NKXvhTYvnfv3s1jjz0W2P6iJJ3uoaKi8LAvL59FOt0TaB/t7V0Tqs90URmPqBwfDz/8LxOqT1Rk387p7hvdvdHdG2tqjvqL4xnliiuu4Lbbbgtsf3EO/mSymr6+wYJaf/8gyWR1oH00NS2YUH2mi8p4ROX4uOqq/zqh+kQFHfz/YWYLAPLbg0HuvNgv2KTTac4880w+9rGPcfbZZ3Pttdeyfft2li9fzhlnnEF7ezvt7e1ccMEFnHvuuVxwwQW88sorRz3O/fffT0tLCwCvvfYay5Yt4/zzz+f222/n5JNPBnK/IaxcuZJrrrmGM888k2uvvZah1dO+8IUvcP7553P22Wezfv364frKlSu59dZbaWpqYvHixTzzzDP09fVx++238+CDD9LQ0MCDDz5YlLGYLmpqKmltXU0iUUZVVQWJRBmtraupqakMtI+6unm0tDQU1FpaGqirmxdoH1ERlfGIyvGxZs37WLKk8P++ZMk81qx5X3F24O4l+wCSwJ4Rt78M3Jb//Dbg7vE8ztKlS320zs7Oo2onsmVLpycSX/Pq6q97IvE137JlYl9/LK+//rrPnj3bX3zxRT98+LCfd955vm7dOh8cHPSHH37Yr7zySu/p6fH+/n53d3/yySf96quvdnf3p556yi+//HJ3d9+0aZPfdNNN7u5++eWX+5YtW9zd/Zvf/KafdNJJw/evqqryN954ww8fPuzLli3zZ555xt3du7u7h3u67rrr/JFHHnF39wsvvNBvvvlmd3f/h3/4B7/kkkuO2t+xTHRsp6ODB9/29vZf+MGDb4faR2fnIb///pe8s/NQqH1ERVTGIyrHxw9/+C/e3Py4//CH/zKprwc6/BiZWrJ39ZhZG7ASmG9mB4A7gC8B3zWzZmA/8Pul2v9II1+w6e3N1Zqbn+DSS2un/JN80aJFLFmyBICzzjqLSy65BDNjyZIlpNNpenp6uP7663n11VcxM/r7+0/4eM8//zwPP/wwAB/5yEe45ZZbhv+tqamJhQsXAtDQ0EA6nWbFihU89dRT3H333WSzWd58803OOussPvjBDwJw9dVXA7B06VLS6fSU/q8zSU1NZeBnccdSVzcvtmf5xxKV8YjK8bFmzfuKd5Y/QsmC393XHuefLinVPo9n6AWbodCHIy/YTPXJfde73jX8+axZs4Zvz5o1i4GBAT7/+c9z0UUX8dBDD5FOp1m5cmVR9jV79mwGBgZ45513+OM//mM6Ojo4/fTTufPOOwvehz/0NUP3FxGJ7Iu7xRTmCzY9PT2cdtppQG4ufyzLli3je9/7HgAPPPDAmPcfCvn58+fzq1/9iq1bt475NXPnzuWXv/zlmPcTkZkpFsEf5gs2n/nMZ/jsZz/L8uXLOXz48Jj3v+eee/jqV79KU1MTXV1dVFef+IfTKaecwo033siSJUu46qqrOP/888fcx0UXXURnZ2csX9wVETDPvwMkyhobG330QiypVIq6uon9NV0mkyWd7iGZrI7E/N2xZLNZEokEZsYDDzxAW1sbP/jBDwLtYTJjKyLRY2Y73b1xdH3aXqRtMqLygs2J7Ny5k5aWFtydU045hfvuuy/slkRkholV8E8Hv/u7v8vPf/7zsNsQkRksFnP8IiJyhIJfRCRmFPwiIjGj4BcRiRkF/xSk02nOPvvsKT9OR0cHH//4x4vQkYjI2PSunghobGyksfGot9qKiJREzM74M8CO/LY4BgYGuP766znnnHO45ppryGaz7Ny5kwsvvJClS5eyevVqurpyi0kc6zLJULgoSyaTYdWqVZx33nn84R/+IbW1tRw6dIh0Ok1dXR033ngjZ511Fpdddhm9Iy8+JCIyTjEK/jagFliV37YV5VFfeeUV1q9fz4svvkhVVRV/9Vd/xYYNG9i6dSs7d+7kox/9KH/6p386fP+BgQHa29u55557uOuuu456vLvuuouLL76YF154gQ996EPs379/+N9effVVbrrpJl5++WVOOeWU4Wv6iIhMREymejJAM9Cb/yB/+1Jgaqt7nX766SxfvhyA6667ji9+8Yvs2bOHVatWAXD48GEWLDiyitBYl0l+9tlneeihhwD4wAc+wLvf/e7hf1u0aBENDQ0n/HoRkbHEJPjTQAVHQh+gPF+fWvCbWcHtuXPnctZZZ/H8888f8/5jXSb5RNdOGn1ZZk31iMhkxGSqJwn0jar15+tTs3///uGQb2trY9myZWQymeFaf38/L7/88rgfb8WKFXz3u98FYNu2bbz11ltT7lFEZKSYBH8N0AokgKr8tpWpnu0D1NXVsXnzZs455xzefPPN4fn9W2+9lfe///00NDTwT//0T+N+vDvuuINt27Zx3nnn8aMf/YgFCxYwd+7cKfcpIjIkVpdlzs31p8md6U899Evh17/+NbNnz6asrIznn3+eP/qjP2L37t2B9qDLMovMDLosM5AL+2gG/pD9+/fzB3/wBwwODlJRUcHf/M3fhN2SiMwwMQv+6DvjjDPYtWtX2G2IyAw2ref4p8M01XSjMRWZ+aZt8M+ZM4fu7m4FVRG5O93d3cyZMyfsVkSkhKbtVM/ChQs5cOAAmUzxLr8guR+oCxcuDLsNESmhaRv85eXlLFq0KOw2RESmnWk71SMiIpOj4BcRiRkFv4hIzCj4RURiRsEvIhIzCn4RkZhR8IuIxIyCX0QkZhT8IiIxo+AXEYkZBb+ISMyEEvxm9idm9rKZ7TGzNjPT5SAlpjLAjvw2xC4yWXbs6CKTyYbahwQj8OA3s9OAjwON7n42MBv4cNB9iISvDagFVuW3beF00ZaitnYjq1b9PbW1G2lrS4XShwQnrKmeMiBhZmVAJfCLkPoQCUkGaAZ6gZ78tpmgz/wzmSzNzU/Q2ztAT08fvb0DNDc/oTP/GS7w4Hf3fwO+AuwHuoAed982+n5mtt7MOsysQ9fcl5knDVSMqpXn6wF2ke6hoqIwBsrLZ5FO9wTahwQrjKmedwNXAouA3wJOMrPrRt/P3Te6e6O7N9bURHuBdJGJSwJ9o2r9+XqAXSSr6esbLOyif5BksjrQPiRYYUz1XAq87u4Zd+8Hvg9cEEIfIiGqAVqBBFCV37bm6wF2UVNJa+tqEokyqqoqSCTKaG1dTU1NZaB9SLDCWIFrP7DMzCrJTWxeAnSE0IdIyNaSOw9KkzvTD+c327Vr67j00lrS6R6SyWqFfgwEHvzu/jMz2wq8AAwAu4CNQfchEg01hBX4BV3UVCrwYySUNXfd/Q7gjjD2LSISd/rLXRGRmFHwi4jEjIJfRCRmFPwiIjGj4BcRiRkFv4hIzCj4RURiRsEvIhIzCn4RkZhR8IuIxIyCX0QkZhT8IiIxo+AXEYkZBX+MZDJZduzo0nqqElk6RgulUt1s3ryHVKq7qI8bymWZJXhtbSmam5+gomIWfX2DtLauZu3aurDbEhmmY7TQhg3buffe3cO3W1oa+MY3Li3KY5u7F+WBSqmxsdE7OrRI12RlMllqazfS2zswXEskyti3b70W35BI0DFaKJXqpr5+01H1zs511NXNG/fjmNlOd28cXddUTwyk0z1UVBQ+1eXls0ine0LqSKSQjtFC7e1dE6pPlII/BpLJavr6Bgtq/f2DJJPVIXUkUkjHaKGmpgUTqk+Ugj8GamoqaW1dTSJRRlVVBYlEGa2tq2P5K7REk47RQnV182hpaSiotbQ0TGia50Q0xx8jmUyWdLqHZLI6tt9QEm06RgulUt20t3fR1LRgUqF/vDl+vasnRmpqKvXNJJGmY7RQXd28op3lj6SpHhGRmFHwi4jEjIJfRCRmFPwiIjGj4BcRiRkFv4hIzCj4RURiRsEvIhIzCn4RkZhR8IuIxMyEgt/MTipVIyIiEoxxBb+ZXWBmnUAqf/v9ZvbXJe1MRERKYrxn/F8DVgPdAO7+c+D3StWUiIiUzrinetz9jVGlw0XuRUREAjDe4H/DzC4A3MwqzOwW8tM+k2Fmp5jZVjP7ZzNLmdnvTPaxZPwymSw7dnSRyWTDbiV00RmLDLAjv5WoiMrxkUp1s3nzHlKp7qI+7niD/38CNwGnAQeAhvztyfo68Li7nwm8nyn8EJHxaWtLUVu7kVWr/p7a2o20tcV3yKMzFm1ALbAqv20LqQ8ZKSrHx4YN26mv38QNNzxOff0mNmzYXrTHHtcKXGZ2+uipHjP7TXf/9wnv0KwK+DnwXh/n8l9agWtqMpkstbUb6e0dGK4lEmXs27c+doteRGcsMuTCvndELQHsA2oC7ENGisrxkUp1U1+/6ah6Z+e6CS3McrwVuMZ7xv+6mbWZWWJE7bFx773Qe8kd9ZvMbJeZfetYbxM1s/Vm1mFmHZmMfg2einS6h4qKwqe6vHwW6XRPSB2FJzpjkQYqRtXK83UJS1SOj/b2rgnVJ2q8wf8S8AzwrJm9L1+zSe6zDDgP+Ka7nwu8Ddw2+k7uvtHdG929saZGZ0BTkUxW09c3WFDr7x8kmawOqaPwRGcskkDfqFp/vi5hicrx0dS0YEL1iRpv8Lu7/zXwceCHZvZBYLKrtB8ADrj7z/K3t5L7QSAlUlNTSWvrahKJMqqqKkgkymhtXR27aR6I0ljUAK3kpneq8ttWNM0TrqgcH3V182hpaSiotbQ0FG393fHO8e/Kn51jZguAB4FGd5/UaJjZM8DH3P0VM7sTOMndP328+2uOvzgymSzpdA/JZHUsQ3+k6IxFhtz0ThKFfnRE5fhIpbppb++iqWnBpEL/eHP84w3+Be7eNeJ2GXCBu/9kwp3kvr4B+Ba5Sc5/Bda5+1vHu7+CX0Rk4o4X/GVjfNF17v4dYK3ZMaf0JxX87r4bOKoZEREpvRMGPzD0bpu5pW5ERESCccLgd/f/k9/eFUw7IiJSaid8V4+Z3WhmZ+Q/NzO7z8x6zOxFMzs3mBZFRKSYxno75yc48hcla8ldXuG9wM3AX5auLRERKZWxgn/A3fvzn68B/tbdu919O0fm/0VEZBoZK/gHzWyBmc0BLgFGXiUocZyvERGRCBvrXT2fBzqA2cAj7v4ygJldSO799yIiMs2MFfyV5C4heI67vzCi3gH8t5J1JSIiJTPWVM9n3X2A3F/ZDnP3t939V6VrS0RESmWsM/5uM3sKWGRmj4z+R3e/ojRtiYhIqYwV/JeTu3Lmt4H/Vfp2RESk1Mb6y90+4KdmdoG7azUUEZEZYKyLtN3j7p8E7jOzoy7jqakeEZHpZ6ypnm/nt18pdSMzWVSu7R2VPqJAY1FI41FoqtfBj3ofY0317Mxv/2/R9hgzbW0pmpufoKJiFn19g7S2rmbt2rrY9hEFGotCGo9CGzZs5957dw/fbmlp4BvfuHRG9THehViWA3eSe09/Gbn1dt3d31uULsYwXRdiyWSy1NZupLd3YLiWSJSxb9/6QM+qotJHFGgsCmk8CqVS3dTXbzqq3tm5LtAz/2L1cbyFWMa75m4r8FVgBXA+uUVUzh/33mMqne6hoqJwiMvLZ5FO98SyjyjQWBTSeBRqb++aUH269jHWHP+QHnf/UVH2GCPJZDV9fYMFtf7+QZLJ6lj2EQUai0Iaj0JNTQsmVJ+ufYz3jP8pM/uymf2OmZ039FGUDmawmppKWltXk0iUUVVVQSJRRmvr6sB/hY5KH1GgsSik8ShUVzePlpaGglpLS0PgL/CWuo/xzvE/lf906M5Dc/wXF6WLMUzXOf4hUXnHRFT6iAKNRSGNR6GZ8q6e483xnzD4zezmoU/zWwcywLPu/vqEu5ik6R78IiJhmOyLu3PzHyfnP+aSe2H3R2b24aJ3KSIiJTfW+/iPuci6mb2H3KIsD5SiKRERKZ3xvrhbwN3f5Mj0j4iITCOTCn4zuxh4q8i9iIhIAMa6SNtLHHknz5D3AL8A/kepmhIRkdIZ6w+41oy67UC3u79don5ERKTExnpxd19QjYiISDAmNccvIiLTl4JfRCRmFPwiIjGj4BcRiRkFv4hIzCj4RURiRsEvIhIzoQW/mc02s11m9mhYPUh8HTq0j87OH3HokP5UBXLX49+xo4tMJqs+ItRHKtXN5s17SKW6i/q4YZ7xfwJIhbh/iannnvsKlZVncNppV1NZeQbPPfeVsFsKVVtbitrajaxa9ffU1m6krS2cb0v1UWjDhu3U12/ihhsep75+Exs2bC/aY49rBa5iM7OFwGbgz4Gb3X30pSEKaCEWKZZDh/ZRWXkGlZX9w7Vstpxs9lXmz68NsbNwZDJZams30ts7MFxLJMrYt299oCtxqY9CqVQ39fWbjqp3dq6b0Epck12IpVTuAT4DDB7vDma23sw6zKwjk8kE1pjMbAcPdtLfP7ug1t8/m4MHO0PqKFzpdA8VFYUxUF4+i3S6R32E2Ed7e9eE6hMVePCb2RrgoLvvPNH93H2juze6e2NNTU1A3clMd+qp9ZSXHy6olZcf5tRT60PqKFzJZDV9fYXnX/39gyST1eojxD6amhZMqD5RYZzxLweuMLM0uRW8Ljaz74TQh8TQ/Pm17Nr1RbLZcnp65pDNlrNr1xdjOc0DUFNTSWvrahKJMqqqKkgkymhtXR34guvqo1Bd3TxaWhoKai0tDUVb+D2UOf7hnZutBG7RHL8E7dChfRw82Mmpp9bHNvRHymSypNM9JJPVgYec+ji+VKqb9vYumpoWTCr0jzfHP9b1+EVmpPnzaxX4I9TUVIYacOrj2Orq5hXtLH+kUIPf3Z8Gng6zBxGRuNFf7oqIxIyCX0QkZhT8IiIxo+AXEYkZBb+ISMwo+EVEYkbBLyISMwp+EZGYUfCLiMSMgl9EJGYU/CIiMaPgFxGJGQW/iEjMzPDgzwA78tsQu8hk2bGji0wmqz4iYtu2dv7sz/6SbdvaQ+0jKs9JKtXN5s17SKW6Q+0jKuMRFSV7Xtw98h9Lly71idvi7gl3r85vt0ziMaZuy5ZOTyS+5tXVX/dE4mu+ZUtnrPuIgk9/er2//Xa5v/XWHH/77XL/9KfXh9JHVJ6TlpYnHb48/NHS8mQofURlPKKiGM8L0OHHyNRQV+Aar4mvwJUBaoHeEbUEsA8Ibv3eTCZLbe1GensHjnSRKGPfvvWBLvIQlT6iYNu2dlasWEFlZf9wLZst59lnn+Wyy5oC6yMqz0kq1U19/aaj6p2d60qyAMjxRGU8oqJYz8vxVuCaoVM9aaBiVK08Xw+wi3QPFRWFQ1xePot0uieWfURBe/tP6eubXVDr759Fe/tPA+0jKs9Je3vXhOqlEpXxiIpSPy8zNPiTQN+oWn++HmAXyWr6+gYLu+gfJJmsjmUfUdDUtIyKisMFtfLyQZqalgXaR1Sek6amBROql0pUxiMqSv28zNDgrwFayU3vVOW3rQQ5zQO5dTtbW1eTSJRRVVVBIlFGa+vqwH91jUofUXDZZU3ceec6stlyenreRTZbzp13rgt0mgei85zU1c2jpaWhoNbS0hDoNA9EZzyiotTPywyd4x+SITe9kyTo0C/oIpMlne4hmawO9UCOSh9RsG1bO+3tP6WpaVngoT9SVJ6TVKqb9vYumpoWBB76I0VlPKJiqs/L8eb4Z3jwi4jEV8xe3BURkeNR8IuIxIyCX0QkZhT8IiIxo+AXEYkZBb+ISMwo+EVEYkbBLyISMwp+EZGYUfCLiMSMgl9EJGYU/CIiMaPgFxGJGQW/iEjMBB78Zna6mT1lZikze9nMPhF0D8HLADvyW/URhT727n2FRx99gL17Xwmth5zwxyJKUqluNm/eQyrVHXYrM1oYZ/wDwKfcvQ5YBtxkZvUh9BGQNnILv6/Kb9vUR8h9bNr0ORYuXMKKFetYuHAJmzZ9LvAecsIfiyjZsGE79fWbuOGGx6mv38SGDdvDbmnGCn0hFjP7AXCvuz95vPtM34VYMuS+oXtH1BLAPoJdEUx9DNm79xUWLlxCZWX/cC2bLefAgZdYvPi3A+khJ/yxiJJUqpv6+k1H1Ts714W6Ith0F8mFWMwsCZwL/OwY/7bezDrMrCOTma6/BqeBilG18nxdfYTRx969u+jrm11Q6++fxd69uwLrISdN2GMRJe3tXROqy9SEFvxmdjLwPeCT7v6fo//d3Te6e6O7N9bUTNczoCTQN6rWn6+rjzD6WLz4XCoqDhfUyssHWbz43MB6yEkS9lhESVPTggnVZWpCCX4zKycX+n/n7t8Po4dg1ACt5H6Fr8pvWwn+V3n1MWTx4t/mwQdvIZstp6fnXWSz5Tz44C0BT/NAFMYiSurq5tHS0lBQa2lp0DRPiQQ+x29mBmwG3nT3T47na6bvHP+QDLlf4ZOE+42tPobs3fsKe/fuYvHic0MI/ZHCH4soSaW6aW/voqlpgUK/CI43xx9G8K8AngFeAgbz5c+5+2PH+5rpH/wiIsE7XvCXBd2Iuz8LWND7FRGRHP3lrohIzCj4RURiRsEvIhIzCn4RkZhR8IuIxIyCX0QkZhT8IiIxo+AXEYkZBb+ISMwo+EVEYkbBLyISMwp+EZGYUfCLiMTMDA/+q4CT89swfYrc+qqfCrmPu8mtdHm3+uA54I78NkwZYEd+KxKM0BdbH4/JXY//WFd+DuP/Opsjyw4M3R4IoY+TgOyo27+KaR+XAU+Ouv1EwD0AtAHN5Nbe7SO3AtfaEPqQmSqSi62XzlUTrJfKpygMfYDDBH/mfzeFYQvwNsGfcUehj+coDH2AbQR/5p8hF/q9QE9+24zO/CUIMzT4t0+wXipbJ1gvlbYJ1kslCn1sm2C9VNLkzvRHKs/XRUprhgb/pROsl8o1E6yXyvGmD4KeVohCH5dNsF4qSXLTOyP15+sipaU5/pIrIze9MySsOf6TyU2rDAlrjj8Kfaym8Aw/7Dn+cnKhrzl+Ka6YzfFDLuSvJBcsVxJO6EMu5G8G/kt+G0boQy5c/wJoyG/DCP2o9PEE8Cxwe34bRuhDLuT3kZuC3IdCX4Iyg8/4RUTiLYZn/CIiciwKfhGRmFHwi4jEjIJfRCRmFPwiIjEzLd7VY2YZcu93m87mA4fCbiJCNB5HaCwKaTwKTWU8at29ZnRxWgT/TGBmHcd6W1VcaTyO0FgU0ngUKsV4aKpHRCRmFPwiIjGj4A/OxrAbiBiNxxEai0Iaj0JFHw/N8YuIxIzO+EVEYkbBLyISMwr+EjOz083sKTNLmdnLZvaJsHsKm5nNNrNdZvZo2L2EzcxOMbOtZvbP+WPkd8LuKSxm9if575E9ZtZmZnPC7ilIZnafmR00sz0jau8xsyfN7NX89t3F2JeCv/QGgE+5ex2wDLjJzOpD7ilsnwBSYTcREV8HHnf3M4H3E9NxMbPTgI8Dje5+NrkViz4cbleBux/4wKjabcCP3f0M4Mf521Om4C8xd+9y9xfyn/+S3Df2aeF2FR4zWwhcDnwr7F7CZmZVwO+RW3oLd+9z9/8XalPhKgMSZlYGVAK/CLmfQLn7T4A3R5WvBDbnP98MXFWMfSn4A2RmSeBc4GchtxKme4DPAIMh9xEF7wUywKb81Ne3zOyksJsKg7v/G/AVYD/QBfS4+7YTf1Us/Ia7d0HuJBI4tRgPquAPiJmdDHwP+KS7/2fY/YTBzNYAB919Z9i9REQZcB7wTXc/l9xixEX5VX66yc9dXwksAn4LOMnMrgu3q5lLwR8AMysnF/p/5+7fD7ufEC0HrjCzNPAAcLGZfSfclkJ1ADjg7kO/AW4l94Mgji4FXnf3jLv3A98HLgi5pyj4DzNbAJDfHizGgyr4S8zMjNwcbsrdvxp2P2Fy98+6+0J3T5J74e4f3T22Z3Xu/u/AG2b22/nSJUBniC2FaT+wzMwq898zlxDTF7pHeQS4Pv/59cAPivGgZcV4EDmh5cB/B14ys9352ufc/bHwWpII2QD8nZlVAP8KrAu5n1C4+8/MbCvwArl3wu0iZpduMLM2YCUw38wOAHcAXwK+a2bN5H44/n5R9qVLNoiIxIumekREYkbBLyISMwp+EZGYUfCLiMSMgl9EJGYU/CKAmf2mmT1gZq+ZWaeZPWZmi0deKVFkptD7+CX28n8w9BCw2d0/nK81AL8RZl8ipaIzfhG4COh39/89VHD33cAbQ7fN7AYzu3fE7UfNbGX+81+Z2V+Y2U4z225mTWb2tJn9q5ldMeLrf2Bmj5vZK2Z2R0D/N5GjKPhF4GxgKheOOwl42t2XAr8E/gxYBXwI+MKI+zUB1wINwO+bWeMU9ikyaZrqEZm6PuDx/OcvAb92934zewlIjrjfk+7eDWBm3wdWAB1BNioCOuMXAXgZWDrGfQYo/H4ZuSxgvx+59skg8GsAdx+k8ORq9PVRdL0UCYWCXwT+EXiXmd04VDCz84HaEfdJAw1mNsvMTic3bTNRq/JrqCbIraT03ORbFpk8Bb/EXv5s/UPkgvk1M3sZuJPCpf+eA14nN5XzFXJXkZyoZ4FvA7uB77m7pnkkFLo6p0gAzOwGcguJt4Tdi4jO+EVEYkZn/CIiMaMzfhGRmFHwi4jEjIJfRCRmFPwiIjGj4BcRiZn/D4jvLygcakNpAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = cell_df[cell_df['Class'] == 4][0:50].plot(kind='scatter', x='Clump', y='UnifSize', color='DarkBlue', label='malignant');\n", "cell_df[cell_df['Class'] == 2][0:50].plot(kind='scatter', x='Clump', y='UnifSize', color='Yellow', label='benign', ax=ax);\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data pre-processing and selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets first look at columns data types:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ID int64\n", "Clump int64\n", "UnifSize int64\n", "UnifShape int64\n", "MargAdh int64\n", "SingEpiSize int64\n", "BareNuc object\n", "BlandChrom int64\n", "NormNucl int64\n", "Mit int64\n", "Class int64\n", "dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cell_df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like the __BareNuc__ column includes some values that are not numerical. We can drop those rows:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ID int64\n", "Clump int64\n", "UnifSize int64\n", "UnifShape int64\n", "MargAdh int64\n", "SingEpiSize int64\n", "BareNuc int32\n", "BlandChrom int64\n", "NormNucl int64\n", "Mit int64\n", "Class int64\n", "dtype: object" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cell_df = cell_df[pd.to_numeric(cell_df['BareNuc'], errors='coerce').notnull()]\n", "cell_df['BareNuc'] = cell_df['BareNuc'].astype('int')\n", "cell_df.dtypes" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 5, 1, 1, 1, 2, 1, 3, 1, 1],\n", " [ 5, 4, 4, 5, 7, 10, 3, 2, 1],\n", " [ 3, 1, 1, 1, 2, 2, 3, 1, 1],\n", " [ 6, 8, 8, 1, 3, 4, 3, 7, 1],\n", " [ 4, 1, 1, 3, 2, 1, 3, 1, 1]], dtype=int64)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "feature_df = cell_df[['Clump', 'UnifSize', 'UnifShape', 'MargAdh', 'SingEpiSize', 'BareNuc', 'BlandChrom', 'NormNucl', 'Mit']]\n", "X = np.asarray(feature_df)\n", "X[0:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want the model to predict the value of Class (that is, benign (=2) or malignant (=4)). As this field can have one of only two possible values, we need to change its measurement level to reflect this." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 4, 2, 4])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cell_df['Class'] = cell_df['Class'].astype('int')\n", "y = np.asarray(cell_df['Class'])\n", "y [0:15]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train/Test dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, we split our dataset into train and test set:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train set: (546, 9) (546,)\n", "Test set: (137, 9) (137,)\n" ] } ], "source": [ "X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=4)\n", "print ('Train set:', X_train.shape, y_train.shape)\n", "print ('Test set:', X_test.shape, y_test.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Modeling (SVM with Scikit-learn)

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The SVM algorithm offers a choice of kernel functions for performing its processing. Basically, mapping data into a higher dimensional space is called kernelling. The mathematical function used for the transformation is known as the kernel function, and can be of different types, such as:\n", "\n", " 1.Linear\n", " 2.Polynomial\n", " 3.Radial basis function (RBF)\n", " 4.Sigmoid\n", "Each of these functions has its characteristics, its pros and cons, and its equation, but as there's no easy way of knowing which function performs best with any given dataset, we usually choose different functions in turn and compare the results. Let's just use the default, RBF (Radial Basis Function) for this lab." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "SVC()" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf = svm.SVC(kernel='rbf')\n", "clf.fit(X_train, y_train) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After being fitted, the model can then be used to predict new values:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 4, 2, 4, 2])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yhat = clf.predict(X_test)\n", "yhat [0:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Evaluation

" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def plot_confusion_matrix(cm, classes,\n", " normalize=False,\n", " title='Confusion matrix',\n", " cmap=plt.cm.Blues):\n", " \"\"\"\n", " This function prints and plots the confusion matrix.\n", " Normalization can be applied by setting `normalize=True`.\n", " \"\"\"\n", " if normalize:\n", " cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n", " print(\"Normalized confusion matrix\")\n", " else:\n", " print('Confusion matrix, without normalization')\n", "\n", " print(cm)\n", "\n", " plt.imshow(cm, interpolation='nearest', cmap=cmap)\n", " plt.title(title)\n", " plt.colorbar()\n", " tick_marks = np.arange(len(classes))\n", " plt.xticks(tick_marks, classes, rotation=45)\n", " plt.yticks(tick_marks, classes)\n", "\n", " fmt = '.2f' if normalize else 'd'\n", " thresh = cm.max() / 2.\n", " for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n", " plt.text(j, i, format(cm[i, j], fmt),\n", " horizontalalignment=\"center\",\n", " color=\"white\" if cm[i, j] > thresh else \"black\")\n", "\n", " plt.tight_layout()\n", " plt.ylabel('True label')\n", " plt.xlabel('Predicted label')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 2 1.00 0.94 0.97 90\n", " 4 0.90 1.00 0.95 47\n", "\n", " accuracy 0.96 137\n", " macro avg 0.95 0.97 0.96 137\n", "weighted avg 0.97 0.96 0.96 137\n", "\n" ] } ], "source": [ "# Compute confusion matrix\n", "cnf_matrix = confusion_matrix(y_test, yhat, labels=[2,4])\n", "np.set_printoptions(precision=2)\n", "\n", "print (classification_report(y_test, yhat))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confusion matrix, without normalization\n", "[[85 5]\n", " [ 0 47]]\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVMAAAEmCAYAAADfpHMGAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAp6klEQVR4nO3dd5xcZdnG8d+1m0BCChASWugKxBAkQOiCIBZABFQQFHxFrK8FFRUbLwj2DtIUEGlSREGRIFUiRWpCQDpIqAmQhJZAKEnu94/nGZism5nZnTM7M7vXl898duacM2fu3bD3PuWc51ZEYGZm9elodgBmZv2Bk6mZWQGcTM3MCuBkamZWACdTM7MCOJmamRXAydSaStJQSX+T9Lyk8+s4z/6SLi8ytmaRtL2k+5odh/WMfJ2p1ULSR4BDgHHAPGA68IOIuK7O834U+CKwbUQsrDfOVicpgPUj4sFmx2LFcsvUqpJ0CHA08ENgFWAt4ARgzwJOvzZw/0BIpLWQNKjZMVgvRYQffiz1ASwPzAf2qXDMsqRkOzM/jgaWzft2BB4Hvgo8DcwCPp73HQm8CryWP+MTwHeBs8rOvQ4QwKD8+kDgIVLreAawf9n268rety1wC/B8/rpt2b4pwPeA6/N5LgdGL+V7K8V/aFn8ewG7AfcDzwDfLjt+S+AG4Ll87HHAMnnfNfl7eTF/v/uWnf8bwJPAmaVt+T1vyp+xWX69OjAH2LHZ/2/4seTDLVOrZhtgCHBhhWO+A2wNTAQ2ISWUw8r2r0pKymNJCfN4SStGxBGk1u55ETE8In5XKRBJw4BfA7tGxAhSwpzezXGjgMn52JWAXwKTJa1UdthHgI8DKwPLAF+r8NGrkn4GY4HDgZOBA4DNge2BwyWtl49dBHwFGE362e0MfA4gInbIx2ySv9/zys4/itRK/3T5B0fEf0iJ9g+SlgN+D5wWEVMqxGtN4GRq1awEzInK3fD9gaMi4umImE1qcX60bP9ref9rEXEJqVW2YS/jWQxMkDQ0ImZFxF3dHPNe4IGIODMiFkbEOcC9wPvKjvl9RNwfEQuAP5L+ECzNa6Tx4deAc0mJ8piImJc//y7grQARMTUibsyf+zDwW+DtNXxPR0TEKzmeJUTEycADwE3AaqQ/XtZinEytmrnA6CpjeasDj5S9fiRve/0cXZLxS8DwngYSES+SusafBWZJmixpXA3xlGIaW/b6yR7EMzciFuXnpWT3VNn+BaX3S9pA0sWSnpT0AqnlPbrCuQFmR8TLVY45GZgAHBsRr1Q51prAydSquQF4mTROuDQzSV3UkrXytt54EViu7PWq5Tsj4rKIeBephXYvKclUi6cU0xO9jKknTiTFtX5EjAS+DajKeypeUiNpOGkc+nfAd/MwhrUYJ1OrKCKeJ40THi9pL0nLSRosaVdJP82HnQMcJmmMpNH5+LN6+ZHTgR0krSVpeeBbpR2SVpG0Rx47fYU0XLCom3NcAmwg6SOSBknaFxgPXNzLmHpiBPACMD+3mv+3y/6ngPX+612VHQNMjYhPksaCf1N3lFY4J1OrKiJ+SbrG9DBgNvAY8AXgL/mQ7wO3AncA/wam5W29+awrgPPyuaayZALsIF0VMJM0w/128uROl3PMBXbPx84lzcTvHhFzehNTD32NNLk1j9RqPq/L/u8Cp0t6TtKHqp1M0p7ALqShDUj/DptJ2r+wiK0QvmjfzKwAbpmamRXAydTMrABOpmZmBXAyNTMrgBdV6GMaNDS0zIhmhzFgvXXcms0OYcB67NFHmDtnTrVrbmvWOXLtiIX/dcPYEmLB7MsiYpeiPrMSJ9M+pmVGsOyGVa+IsQa58pqjmx3CgPXOHbYq9HyxcEHV36WXpx9f7e6zwjiZmll7kqCjs9lRvM7J1Mzal1pn2sfJ1Mzalwobgq2bk6mZtSl3883M6ifczTczq59aqpvfOmndzKynOjorP2og6SuS7pJ0p6RzJA2RNErSFZIeyF9XrBpK3d+MmVlTKHXzKz2qnUEaCxwMTIqICUAnsB/wTeCqiFgfuCq/rsjJ1Mzak0jd/EqP2gwChubSPMuR1svdEzg97z+dypUmACdTM2tbgo5BlR9VRMQTwM+BR0mluZ+PiMuBVSJiVj5mFqmKbUVOpmbWvjpU+ZGKQd5a9liilHYeC90TWJdUiHGYpAN6E4pn882sPdV2adSciJhUYf87gRm5RDmSLgC2BZ6StFpEzJK0GvB0tQ9yy9TM2pSKmM1/FNg6F4oUsDNwD3AR8LF8zMeAv1Y7kVumZta+6rzONCJukvQnUhHIhcBtwEnAcOCPkj5BSrj7VDuXk6mZta8C7oCKiCOAI7psfoXUSq2Zk6mZtScvwWdmVpAWup3UydTM2pS80ImZWd2Eu/lmZvVzy9TMrBgeMzUzK4C7+WZmdZK7+WZmxXA338ysPgI6OtwyNTOrj/KjRTiZmlmbEnI338ysfu7mm5kVoJVapq2T1s3MekI1PKqdQtpQ0vSyxwuSvuxSz2Y2YAjR0dFR8VFNRNwXERMjYiKwOfAScCEu9WxmA4mkio8e2hn4T0Q8Qi9KPXvM1MzaVg0Jc7SkW8tenxQRJy3l2P2Ac/LzJUo9S6pa6tnJ1Mzak0AdVZNpteqk6VTSMsAewLd6G467+WbWlkTlLn4Pu/m7AtMi4qn8+qlc4hmXejazfq/AZPph3ujig0s9m9mAUVs3v/pppOWAdwGfKdv8Y1zq2cwGiiIu2o+Il4CVumybi0s9m9lAULrOtFU4mdpSfXH/nTjw/dsSEdz14Ew+fcRZfO3j7+agD2zL7GfnA3DEcRdx2XV3NznS/m+zjd7M8OHD6ejsZNCgQVx5zU3NDqk1tM7dpE6m1r3VxyzP5z78djb94A94+ZXXOOsnB7HPezYH4NizruboM69qcoQDz4WTr2Sl0aObHUbrkO/NtzYxqLOTocsOprOzg6FDlmHW7OebHZLZEuq9nbTQWPr006xtzJz9PEefcRX3//17zLjiB7wwfwFX3XgvAJ/dbwduPu9b/OaI/VlhxNAmRzowSGKfvXZl5+235IxTT252OK2jzoVOitSSyVTSoryCy+2Spknato5zHSXpnXW8f6ikf0rqlDRR0g2S7pJ0h6R9y447V9L6vf2cVrPCiKHsvuPGvGX3I1jv3d9h2NBl2G+3LTj5/GsZ/77vstV+P+bJOS/w40M+0OxQB4TJV/yTf1x3C+decDGnnnwi/7ru2maH1BIKvje/Li2ZTIEFeSWXTUi3d/2otyeKiMMj4so6YjkIuCAiFpFWlPmfiNgI2AU4WtIK+bgTgUPr+JyW8o6txvHwzLnMeXY+Cxcu5i//uJ2tN1mXp5+Zx+LFQURw6gXXM2nC2s0OdUBYdbXVARgzZmV2e99e3Db1liZH1HxS/atGFalVk2m5kcCzpReSvi7pltwyPDJvW0fSPZJOzq3GyyUNzftOk7R3fr6bpHslXSfp15Iuztu/K+lUSVMkPSTp4LLP359890NE3B8RD+TnM0m3mI3Jx10LvFNSv5jUe+zJZ9hy43UZOmQwADttuSH3zXiKVUePfP2YPd+xCXf/Z1azQhwwXnzxRebPm/f68ylXXcG48Rs1OarW0Eot01b9xR8qaTowBFgNeAeApHcD6wNbkkZELpK0A+kOhfWBD0fEpyT9EfggcFbphJKGAL8FdoiIGZLKbx0DGAfsBIwA7pN0Yv6M9SLi4a4BStoSWAb4D0BELJb0ILAJMLXLsZ8GPg3A4OG9+4n0sVvufIQLr7yNG87+BgsXLeb2ex/nd3++nhMP/whv3XANIoJHZj3DF7/f9cdoRZv99FMc+JG9AVi4cBEf+NB+7Pyu9zQ5qhbROpP5LZtMF+TFWpG0DXCGpAnAu/PjtnzccFISfRSYERHT8/apwDpdzjkOeCgiZuTX51BKcMnkiHgFeEXS08AqwGLgua7B5YUPzgQ+FhGLy3Y9DaxOl2Sal/w6CaBjuZWj6nffIr7/m0v4/m8uWWLbJ/7vjCZFM3Cts+56TLlhWrPDaD1yDageiYgbJI0mdacF/Cgiflt+jKR1gFfKNi0Cuk4zV/sb1vX9g4DnSa3j8s8aCUwGDouIG7ucYwiwoMrnmFkBBLTQZaatP2YqaRzQCcwFLgMOkjQ87xtby6Kt2b3AejnxAuxb4VgAIuJZoDMPEZTWPLwQOCMizu/mLRsAd9UYj5nVpdAl+OrWqi3T0pgppD9AH8uz6ZdLegtwQ/5BzQcOILUkK4qIBZI+B1wqaQ5wc42xXA68DbgS+BCwA7CSpAPz/gMjYrqkVUjDE56RMesjHQWsGlWUlkymEdFZYd8xwDHd7JpQdszPy54fWHbM1RExTikTHw/cmo/5bpfPmFD28jjgEODKiDiLskmtLj5CmuAys76gYrr5+fLGU0g5JEiXQ94HnEeae3kY+FDuqS5Vy3fzC/ap3OK9C1ieGpJfRNwGXC1pqQk+e443CnCZWYOJ1DKt9KjRMcClETGOdDXOPfSiOmlLtkwbJSJ+BfyqF+87tYZjft+roMys1+rt5ucJ5R2AAwEi4lXgVUl7Ajvmw04HpgDfqBhLXZGYmTVL7uZXetRgPWA28HtJt0k6RdIwulQnBapOdDuZmllbSpdGVZ3NHy3p1rLHp7ucZhCwGXBiRGwKvEgNXfruDKhuvpn1JzWNi1Yr9fw48HhElFbb/hMpmT4labWImCVXJzWz/q7e60wj4kngMUkb5k07A3fj6qRmNmAUdGkU8EXgD/mmnIeAj5Mamq5Oamb9X+nSqHrlNT26GwpwdVIzGxj6+pbRSpxMzaxttVAudTI1s/Yk+d58M7MC9P3KUJU4mZpZ22qhXOpkamZtyt18M7P6lW4nbRVOpmbWtpxMzcwK4G6+mVm9irudtBBOpmbWluRLo8zMitHZDt18SceSikt1KyIObkhEZmY1aqGGacWW6a19FoWZWQ+l0iStk02XmkwjYolKm5KGRcSLjQ/JzKw2RXTzJT0MzAMWAQsjYpKkURRd6lnSNpLuJpU/RdImkk6oK3ozswIUUFCvZKeImFhW4qTHpZ5rKVtyNPAeYC5ARNxOKo1qZtY0Is/oV/ivDnuSSjyTv+5V7Q011YCKiMe6bFrUo7DMzIom0dlR+UH16qSQJtovlzS1bH+PSz3XcmnUY5K2BSLXSDmY3OU3M2umGrry1aqTAmwXETMlrQxcIene3sRSS8v0s8DngbHAE8DE/NrMrGkEdEgVH7WIiJn569PAhcCW5FLPALWWeq7aMo2IOcD+NUVlZtaH6r03X9IwoCMi5uXn7waO4o1Szz+mqFLPktYDjgG2Jo0t3AB8JSIe6vV3YGZWp17M2HdnFeDCfL3qIODsiLhU0i00oNTz2cDxwPvz6/2Ac4CtehG4mVlhau3KL01uFG7Szfa59LDUcy1jpoqIMyNiYX6cRYXbTM3M+koRY6ZFqXRv/qj89GpJ3wTOJSXRfYHJfRCbmdlSpQmoZkfxhkrd/Kmk5FkK9zNl+wL4XqOCMjOrSm2yBF9ErNuXgZiZ9VTbrbQvaQIwHhhS2hYRZzQqKDOzatqpmw+ApCOAHUnJ9BJgV+A6wMnUzJqqlbr5tczm7026RODJiPg46TKCZRsalZlZFRJ0ShUffamWbv6CiFgsaaGkkaTbqtZrcFxmZlW1UMO0pmR6q6QVgJNJM/zzgZsbGZSZWS1aqZtfy735n8tPfyPpUmBkRNzR2LDMzCoTapuCeptV2hcR0xoTkplZDYq5N78wlVqmv6iwL4B3FBzLgLDpW9bi+puOa3YYA9ZB50xvdggD1mPPLij8nG3RzY+InfoyEDOznhD0+Yx9JTWVLTEza0UdqvyolaROSbdJuji/HiXpCkkP5K8rVo2l99+GmVlzFZVMgS+xZDmmhlQnNTNrORK1FNSr4TxaA3gvcErZ5uKrkyo5QNLh+fVakrasKUozswYqrba/tEeNjgYOBRaXbetxddJaWqYnANsAH86v55FW3jcza5oaC+pVLPUsaXfg6YiYWm88tdwBtVVEbCbpNoCIeDaXfDYza6rO+ks9bwfsIWk30qp4IyWdRa5OGhGzaq1OWkvL9DVJneRSJZLGsGRz2Mysz6lKq7SWsiUR8a2IWCMi1iHVt/tHRBzAG9VJoajqpMCvSbWkV5b0A9IqUofV8D4zs4bqbNwU+o8pujppRPxB0lTSMnwC9oqIe6q8zcysoUpjpkWJiCnAlPy8x9VJa1kcei3gJeBv5dsi4tGefJCZWdFa6Aaomrr5k3mjsN4QYF3gPmCjBsZlZlaZWut20lq6+RuXv86rSX1mKYebmfWJtqsB1VVETJO0RSOCMTPribZKppIOKXvZAWwGzG5YRGZmNRC0x+LQZUaUPV9IGkP9c2PCMTOrURstDk2+WH94RHy9j+IxM6tZkZdG1atS2ZJBEbGwUvkSM7NmSd38Zkfxhkot05tJ46PTJV0EnA+8WNoZERc0ODYzswpEB23QMi0zCphLqvlUut40ACdTM2sa0T5jpivnmfw7eSOJlkRDozIzq0YwqE1m8zuB4dBtO9rJ1Myaqp1aprMi4qg+i8TMrIfaYjaf7lukZmYtIZV6bnYUb6h0YUGPlp8yM+tTSgtEV3pUPYU0RNLNkm6XdJekI/P24ko9R8QzPfrGzMz6mKo8avAK8I6I2ASYCOwiaWtc6tnMBorUzVfFRzWRzM8vB+dH0IhSz2ZmraqGUs8Vq5Omc6hT0nRS0bwrIuImelHqucdL8JmZtYaaxkWrVSclIhYBEyWtAFwoaUJvonEyNbO2VOrmFyUinpM0BdiFBpV6NjNrSfVOQEkak1ukSBoKvBO4lwaVejYzaz350qg6rQacnpcb7QD+GBEXS7qBoks9m5m1oiK6+RFxB7BpN9uLL/VsZtaqWugGKCdTM2tfLXRrvpOpmbWnomfz6+VkamZtSqiFOvpOpmbWtlqoYepkambtSXI338ysEC2US51MrTaXX3YpXzvkSyxatIgDD/okXz+06opkVicJfrDbBjzz0mv8/OoZfHH7tVlt5BAAhi3TyYuvLuLbk+9rcpTN5TFTayuLFi3iywd/nsl/v4Kxa6zB27begt1334O3jB/f7ND6tV3HjeGJ519h6OB01/ex1z7y+r79N1+dl15d1KzQWkKrzeb73nyr6pabb+ZNb3oz6663Hsssswz77LsfF/+t6q3KVodRyw1m4tiRXP3g3G73b732Ctzw8LN9HFXrqWEJvj7jZGpVzZz5BGussebrr8eOXYMnnniiiRH1fx+dNJZzps0kuqkDPG7lYTz/8kKenPdq3wfWYlTlv77UsGQqKSSdWfZ6kKTZki6u8r4dS8dI2kNSnw3OSZooabcu2/aSdHiXbXvn729Sfj1G0qV9FWdfi25+owtYYMKWYtOxI3nh5YXMeGZBt/u3XWdF/jXDrVJReZX9vh4CaOSY6YvABElDI2IB8C6gR82ZiLiItBRWX5kITAIuKdt2KLBH6YWkEcDBwE2lbRExW9IsSdtFxPV9FGufGTt2DR5//LHXXz/xxOOsvvrqTYyof9tg5WFstsZIJo4dz+BOMXRwJ5/bbi1OuP5ROgRbrLU837nk/maH2XxN6MpX0uhu/t+B9+bnHwbOKe2QtKWkf0m6LX/dsOubJR0o6bj8/E2SbpR0i6SjJM3P23eUNEXSnyTdK+kPys0mSYfn4++UdFLZ9imSfpKrEt4vaXtJywBHAftKmi5pX0kbAK9ExJyysL4H/BR4uUu4fwH2r/9H1nombbEFDz74AA/PmMGrr77K+eedy3t336P6G61XzrttFl+84G6+dOHdHHvtI9z15DxOuP5RACasNoKZL7zCMy+91uQoW0MB65muKelqSffk6qRfytuLq05akHOB/SQNAd5KWWuOtADrDhGxKXA48MMq5zoGOCYitgBmdtm3KfBlYDywHrBd3n5cRGwREROAocDuZe8ZFBFb5vcdERGv5jjOi4iJEXFePs+00hskbQqsGRHdDVXcCmzfXeCSPl2qQTN7zuwq32brGTRoEL865jje9973MHHjt/DBfT7E+I02anZYA9I27uK/roiCesBC4KsR8RZga+DzksbTi+qkDb00KiLukLQOqVV6SZfdy5MWZV2fVA1wcJXTbcMbFQLPBn5etu/miHgcIBfGWge4DthJ0qHAcsAo4C7gb/k9F+SvU/Px3VkNmJ3P2wH8CjhwKcc+DXTb942Ik4CTADbffFI3Uwqtb5ddd2OXXXerfqAV6p6n5nPPU/Nff/3bfz3axGhaUJ3d/Fwsr1Q4b56ke4CxpOqkO+bDTgemAN+odK6+mM2/iJT4zumy/XvA1bnV+D5gSB2f8UrZ80XAoNwaPgHYOyI2Bk7u8hmvlB+/lPMuKHvPCGACMEXSw6S/YheVJqHycd3PGJhZQxQ5m58bfpuSetA9rk7aF8n0VOCoiPh3l+3L88aE1IE1nOdG4IP5+X41HF9KgnMkDQf2ruE980hJs+Qe4M0AEfF8RIyOiHUiYp0czx4RcWs+dgPgzho+w8wK0qHKD2oo9QyQc8SfgS9HxAu9iqXX30WNIuLxiDimm10/BX4k6Xqgs4ZTfRk4RNLNpO7381U+9zlSa/TfpMmhW2r4jKuB8aUJKOAaYNPSxFUVOwGTazjOzIpSfQZqTkRMKnuc9F+nkAaTEukfIqI0/PdUrkpKrdVJGzZmGhHDu9k2hTT2QETcQGrNlfxfN8ecBpyW9z8BbB0RIWk/0oTPEsfn118oe34YcFg3cexY9nwOecw0Ip4Btig/VtKVpFowVy7tHNkepHEWM+sDKV/WN2iaG0q/A+6JiF+W7SpVJ/0x/bA66ebAcfmbfw44qI8+94fAVpUOkDQG+GVEeJrVrK+80ZWvx3bAR4F/58lrgG+Tkmj/rE4aEdcCmzThc5+iyo0DETGbNJRgZn2p/tn86yqcxdVJzWwgcNkSM7O6iUK6+YVxMjWz9uVkamZWP3fzzcwK4G6+mVm9al0aqo84mZpZ23I338ysTp7NNzMripOpmVn93M03MyuAu/lmZkVwMjUzq08RS/AVycnUzNpTMUvwFaYvypaYmTVGnbWeJZ0q6WlJd5Zt63GZZ3AyNbO2JTpU+VGD04BdumzrcZlncDI1szZVrVFaSyqNiGuAZ7ps3pNU3pn8da9a4vGYqZm1r+oZc7SkW8ten9RdUb0ulijzLKlqmWdwMjWzNlZDV35OREzqk1j64kPMzBqh3m7+UvS4zDM4mZpZuxKoyqOXSmWeocYyz+Buvpm1KQGqI2OS3n8OsCNpbPVx4Ah6UeYZnEzNrI3Ve81+RHx4Kbt6VOYZnEzNrI3V2TAtlJOpmbWterv5RXIyNbO21Tqp1MnUzNpUnTP2hXMyNbO25W6+mVkBWieVOpmaWRtroYapk6mZtSdR8zJ7fcK3k5qZFcAtUzNrWy3UMHUyNbM2pZqW4OszTqZm1pbqXGavcE6mZta+WiibegLKzNpWAQX1kLSLpPskPSippuJ53cbS2zeamTVbvSvtS+oEjgd2BcYDH5Y0vjexOJmaWfuqv27JlsCDEfFQRLwKnEuqTtpjTqZm1pZEId38scBjZa8fz9t6zBNQfWzatKlzhg7WI82Oow6jgTnNDmKAavef/dpFnmzatKmXDR2s0VUOG1Kl1HN3GTd6E4+TaR+LiDHNjqEekm7tq9K5tiT/7JcUEbsUcJrHgTXLXq8BzOzNidzNN7OB7BZgfUnrSloG2I9UnbTH3DI1swErIhZK+gJwGdAJnBoRd/XmXE6m1lMnVT/EGsQ/+waIiEuAS+o9jyJ6NdZqZmZlPGZqZlYAJ1MzswI4mVqfUCtVPjNrAE9AWcNIWgl4GVgUES9L6oiIxc2Oy6wR3DK1RvoG8H/AWZLWdCJtLkkrS/qhpFUlLd/sePobz+Zbw0gaRPqD/XXgPcAxwJSImNvUwAYoSUOA7wOLgWHA6RFxc3Oj6j+cTK1QktYFNgFei4jJZdv3Ja3Gc1FEnCtJ4f/5+kRuhc4r9QwkrQ/sQOo1fDIirmxmfP2Fu/lWGEnjSBc/bwucLOmQ0r6IOA+4APiUpA0iIjwp1XiSxgAPAP8jaSRARDwQEb8jDcOcIWmbZsbYXziZWiEkjQJ+D/wyIg4FdgG+IGliKWlGxJ+AS4ETJA1zy7RPDCZNAn4A2FXSsNKO/AfuMNK/06pNiq/f8Gy+FeU54JfAn/Ks/R2SrgU6y5NmRPwsdzvHAvc3J9SBIyJmSjoTmA18HpgvaTppiO9x4O/AOsAI4MlmxdkfuGVqdZE0WlLpnvFLIynN2i8kr2EpaW1Jy+dW6qPAqCaEOyBIGiNpxfxcwHKk1ZE+QxonvQvYAiAiZgGLgI83J9r+w8nU6hIRc4CXgI0jYh68XlcHYCjwmqTNgMnAqrmVegpwZzPi7e8kdQB7A+MA8s/778AGwL3AKsDTwEJJQ/MxRwIXSxrclKD7CSdT6zUlHcCzwG7dHHIrsA/wa+A7EXFfnsVfHBHz+zLWgSL3ClYCPlK2eR7waeBu4CfAQcCh+biSGyPitb6Ksz/ypVFWN0lrAlcDx0XE0WXbv0765d0lIi735VCNU/6zzYscnw5cERGn5m3HAw9FxC/y61Ui4qmmBdwPOZlaj5V+cSWNBjoi4unclT+VtLjur/NxE4BhEXGTE2nj5IvxJ0bEjblM8bKkSaXtgcsi4rIux3eQRgD871Egz+Zbj+VEugfwLWCEpMMj4gJJnwDOlzQcODsi7oQ3fnmbGHJ/NwbYPF/XuzGwE/AEadz03ZJGRcQ5pYN9W29jeMzUekzSRsAXgE8B3wG+K2m/iJgKvB1YDfi8pO9J6sxjpE6mDRIRj5F+lz9AGvt8MiKeBs4GpgN7SDpe0lqlSScrnrv51iOSVge+B6weEbvmbbvnbcdExGl5zA5gc+D2iHipOdH2b13GSYcBewHjSffe/yoinslDMS8BPyDdCXU/cJX/uBXPydRqJmntiHhE0keBDwHnA3/My+vtBfwI2DkielUq13pO0i7AlsDMiDhF0lbAAcBTwDTgXcDXc+G4TtLv/MLmRdx/uZtvFZVuBc2LY5wq6UsRcSYpkW4B7C1pSET8BdjBibTxyv5NNiWtxPUy8H5JZ0bETaSZ/FHAccA/SskzIhY5kTaOW6ZWVZ5s+jip+ziatPLTLyQdAOwIXAucQfr/yZMbfUDSlsDHgOsj4uw8FvonYG5E/E8+ZmxEPOErKfqGW6b2XyQNl7Rcfr4CeZIJ+DBwOLCVpM9HxFnA9cC0LreRWuOtTlpG762SlouIBcAHgTUl/TUfMxNevwvKGsyXRtkScvL8MnCcpAXAq3nX/Ih4VdI04A7gQEkv56XcrMHKru1dA5gVEX+R9DxwBLCbpMkRsUDSrsAEcBLta26Z2hIi4jngJGAI8P48E38R8AtJa+T7728HrgHeprQYtDVYTqTvA/4AHC/ps6Thle8D/wt8QNLQiHg5Im5tZqwDlZOpvS5fXE+eRNoDOCDP0p8L3AhcJemrwNHAX0kLmYxoSrADjKTtgCNJQy2DSNf4/h/pj9ovSAnVdZ2ayN18A17vRi4u3bMdESdIegF4P+mP7tHAg8AKpOsZlwPWB55pTsQDg96o6Lo28ElSSZhNSAn0QNL1vUcCN4VrazWVk6kBr3cjdwN+IulG0j3dZ+WrcPYg/b9yUb6mdBvgp8BBeYFhK1jZDPwQ4KU8Yz8I+C3w0Yi4N19juhowNiIeaGa85mRqmaRJwL7AIaR7ureXtGJEnJzXudwL+CfpmsangX19TWnj5D9uuwCfU6pYcFFewnA08E1Jx5Jaq192Im0Nvs7UyL+gU0i3fu4vaVnSfd5bAffnLv/qTp59J19H+iPgLFI9rYfy8yeB3wEjgWMj4sKmBWlLcDI1ACR9CDge+FxEnJ+7lPuTEuqP8mIa1gckrU1apOTPEfFLSWuRFpZZDJwXEbflXsOzviC/dTiZDkBl1yxuT7ol9A7gNlLi/DFwZET8OSfUld0ibbwui5YsR7pNdFtgr4h4QNJY4Juk636PjIgXmhetdceXRg1AZeNxJwEvACcAB0TEJaS7nX4m6UMRsdCJtPHK/rhtK+km4DVS8bsLgB9JWj8iniBVLTjZibQ1OZkOQEqllncH3gf8m7RE2x/z7smkSahZzYluYMmXPoWknYB3AuuSJvoGkS7I/zfwa0kbRMTjEXFvE8O1CtzNHwAkvYl0beKiiPhr3nYoqRu5GrBPRDyaL9CfGxHX5mM8HtcgeaWtl/PzjYGLSVdTPEm6dvStwCRgMPBt4IKImNakcK0Gbpn2c5I2IN2ttB3wjXwbIsB/gFWBn+VEOonUjSyVafa93Q0iaRTwVUkj86aXgcsj4saIeDgiPgrMB/4BvBoRhzmRtj4n035MqbjaecC3IuKrpLFRSdowIv5Mqqf+fkmTSZfbfD0ipjQt4AEgT+otIM3Wj8i3iT5Duq63vFz2ScAywNml23yttbmb349JehtwTUR05Nd3kAqtjQWujYjPS1oFeBOpe3+fu/aNk1ukvwJ+HBH3SPomqQDet0nVRC/MzxcAHwV+SLqd95BwTfuW5794/VhEXAe8V9JDkq4E/hSpbtMk4F2Svpnvw/9XRNyX3+NE2jiDgUeBoyStQ2p9TictWPIQ8B7S3WfvAr5CaplOIq2DYC3OLdMBQNLOwGXAMqUFnJXKMq8QEb9oanADjKTVgINIxQYPAZ4lrfi0LnBCRNyey5LsAJxCWgbxzmbFa7Vzy3QAiIirSIuV3A8g6c3A10mX3ViDlWo2AUTELOBYUov0l8CKwImk4ZeDJa1IapEuBHZ1Im0fbpkOIPlC/QuAGcBXI+LSJofU75VdkP8e0uVprwC/ITVkDgU2Ar5BmoRaKSIealqwVhcn0wEmd/lHeoGMviNpd+AoUomRr5IS5/7AIlJtrY1Iq3C93KwYrX5OpgOUZ+0bJy9MsmZEXJ9rah1DupvpraT6Wk+Quvd7ApGPfbA50VpRnEzNCpLHRkcA9wHPA1+MiCskrU6qUHAW6TZeATeRxrB39h+1/sETUGYFieQF4DTgMeArkj6QF4sJ4Ob8fF3STRLfciLtP7zSvlkBJA0uu7B+Cqkl+ndSSWyAK4D1JZ1Iqlqwf0Tc1PeRWqO4m29WJ0njSHcunRoRU3J3/w/Aw8DNpEqiPwDuAsYDi51I+x+3TM3qtzJwALCRpN+QVsT/Fqks802kktg/BI6OiL80K0hrLCdTszpFxDWSdiDdZTaTtLThuaQ1EK4irRUrwJVc+zF3880Kki/MP5p0CdSmwK7A9RFxpaRBEbGwmfFZYzmZmhVI0nuBnwNbR8TzXSamrB9zN9+sQBExWdIi4H5J4yLi2WbHZH3DLVOzBsgt1Be92PbA4WRq1kC+bXfgcDI1MyuAbyc1MyuAk6mZWQGcTM3MCuBkag0jaZGk6ZLulHS+pF4XhpN0mqS98/NTchnrpR27o6Rte/EZD0saXev2LsfM7+FnfVfS13oao7UuJ1NrpAURMTEiJgCvAp8t3ympszcnjYhPRsTdFQ7ZkXRLp1mfcTK1vnIt8Obcarxa0tnAvyV1SvqZpFsk3SHpM5AuKZJ0nKS7JU0mLSZC3jdF0qT8fBdJ0yTdLumqXEL5s6S1RKdL2l7SGEl/zp9xi6Tt8ntXknS5pNsk/ZZ0/3xFkv4iaaqkuyR9usu+X+RYrpI0Jm97k6RL83uuzStMWT/kO6Cs4SQNIt2nXirgtyUwISJm5IT0fERsIWlZ4HpJl5Pubd8Q2BhYBbgbOLXLeccAJwM75HONiohn8spN8yPi5/m4s4FfRcR1uaTIZcBbSDWZrouIo/JF9kskx6U4KH/GUOAWSX+OiLnAMGBaRHxV0uH53F8ATgI+GxEPSNoKOAF4Ry9+jNbinEytkYZKmp6fX0taXX5b0orzM/L2dwNvLY2HAssD65Pqxp8TEYuAmZL+0c35twauKZ0rIp5ZShzvBMaXVVweKWlE/owP5PdOllTLrZ8HS3p/fr5mjnUuadm98/L2s4ALJA3P3+/5ZZ+9bA2fYW3IydQaaUFETCzfkJPKi+WbSLWSLuty3G6kUh+VqIZjIA1nbRMRC7qJpea7ViTtSErM20TES5KmAEOWcnjkz32u68/A+iePmVqzXQb8r6TBAJI2kDQMuAbYL4+prgbs1M17bwDeLmnd/N5Refs8UmG7kstJXW7ycRPz02tIJZeRtCupYmglywPP5kQ6jtQyLukASq3rj5CGD14AZkjaJ3+GJG1S5TOsTTmZWrOdQhoPnSbpTuC3pB7ThcADwL+BE4F/dn1jRMwmjXNeIOl23uhm/w14f2kCCjgYmJQnuO7mjasKjgR2kDSNNNzwaJVYLwUGSboD+B5wY9m+F0kr7U8ljYkelbfvD3wix3cXqbyz9UO+N9/MrABumZqZFcDJ1MysAE6mZmYFcDI1MyuAk6mZWQGcTM3MCuBkamZWgP8HpKn7/Mv2YUkAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot non-normalized confusion matrix\n", "plt.figure()\n", "plot_confusion_matrix(cnf_matrix, classes=['Benign(2)','Malignant(4)'],normalize= False, title='Confusion matrix')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also easily use the __f1_score__ from sklearn library:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9639038982104676" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f1_score(y_test, yhat, average='weighted') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Thanks for reading :)\n", "Created by [Saeed Aghabozorgi](https://www.linkedin.com/in/saeedaghabozorgi/) and modified by [Tarun Kamboj](https://www.linkedin.com/in/kambojtarun/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }