{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python for Signal Processing\n", "Danilo Greco, PhD - danilo.greco@uniparthenope.it - University of Naples Parthenope" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# TASK: Develop a kNN classifier from scratch\n", "The k-Nearest Neighbors algorithm is readily available in standard machine learning libraries, but implementing it from scratch is a good exercise for understanding important aspects of the approach and for experimenting with your own variations.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The approach\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The k-Nearest Neighbors algorithm\n", "k-Nearest Neighbors is a very simple technique based on the similarity between records. The training dataset is stored upfront and, when a prediction is required, the algorithm identifies the $k$ most similar training records and returns either their most common class value, for classification, or the average of their target values, for regression.\n", "\n", "___\n", "\n", "In this workbook, we will use the publicly available **Iris Dataset**, which poses a multiclass classification problem: predicting the flower species from 4 characterizing features.\n", "\n", "This dataset and many others can be downloaded at https://archive.ics.uci.edu/ml/datasets.php\n", "\n", "The dataset, available in the file `iris.csv`, has the following columns:\n", "\n", "* Sepal length in cm\n", "* Sepal width in cm\n", "* Petal length in cm\n", "* Petal width in cm\n", "* Class\n", "\n", "We will also use some accessory libraries (pandas, numpy) to make loading the dataset and other array management operations easier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### First Step: Analyze your dataset\n", "\n", "The next cell shows how to load a CSV file into a pandas dataframe and print the first rows of the dataset." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | sepal-length | sepal-width | petal-length | petal-width | class |\n",
"|---|---|---|---|---|---|\n",
"| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |\n",
"| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |\n",
"| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |\n",
"| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |\n",
"| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |\n",
"