FlowPicRefresh/_reference/Intrusion-Detection-System-.../1-Data_pre-processing_CAN.i...

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Transfer Learning and Optimized CNN Based Intrusion Detection System for Internet of Vehicles \n",
    "This is the code for the paper entitled \"**A Transfer Learning and Optimized CNN Based Intrusion Detection System for Internet of Vehicles**\" accepted in IEEE International Conference on Communications (IEEE ICC).  \n",
    "Authors: Li Yang (lyang339@uwo.ca) and Abdallah Shami (Abdallah.Shami@uwo.ca)  \n",
    "Organization: The Optimized Computing and Communications (OC2) Lab, ECE Department, Western University\n",
    "\n",
    "**Notebook 1: Data pre-processing**  \n",
    "Procedures:  \n",
    "&nbsp; 1): Read the dataset  \n",
    "&nbsp; 2): Transform the tabular data into images  \n",
    "&nbsp; 3): Display the transformed images  \n",
    "&nbsp; 4): Split the training and test set  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:07.788679800Z",
     "start_time": "2023-07-06T09:03:07.746481Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import os\n",
    "import cv2\n",
    "import math\n",
    "import random\n",
    "import matplotlib.pyplot as plt\n",
    "import shutil\n",
    "from sklearn.preprocessing import QuantileTransformer\n",
    "from PIL import Image\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read the Car-Hacking/CAN-Intrusion dataset\n",
    "The complete Car-Hacking dataset is publicly available at: https://ocslab.hksecurity.net/Datasets/CAN-intrusion-dataset  \n",
    "In this repository, due to the file size limit of GitHub, we use the 5% subset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:08.040220300Z",
     "start_time": "2023-07-06T09:03:07.750003500Z"
    }
   },
   "outputs": [],
   "source": [
    "#Read dataset\n",
    "df=pd.read_csv('data/Car_Hacking_5%.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true,
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:08.050784400Z",
     "start_time": "2023-07-06T09:03:08.042218700Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "        CAN ID  DATA[0]  DATA[1]  DATA[2]  DATA[3]  DATA[4]  DATA[5]  DATA[6]  \\\n0         1201       41       39       39       35        0        0        0   \n1          809       64      187      127       20       17       32        0   \n2         1349      216        0        0      136        0        0        0   \n3         1201       41       39       39       35        0        0        0   \n4            2        0        0        0        0        0        3        2   \n...        ...      ...      ...      ...      ...      ...      ...      ...   \n818435     848        5       32       52      104      117        0        0   \n818436    1088      255        0        0        0      255      134        9   \n818437     848        5       32      100      104      117        0        0   \n818438    1349      216       90        0      137        0        0        0   \n818439     790        5       33       48       10       33       30        0   \n\n        DATA[7] Label  \n0           154     R  \n1            20     R  \n2             0     R  \n3           154     R  \n4           228     R  \n...         ...   ...  \n818435       12     R  \n818436        0     R  \n818437       92     R  \n818438        0     R  \n818439      111     R  \n\n[818440 rows x 10 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>CAN ID</th>\n      <th>DATA[0]</th>\n      <th>DATA[1]</th>\n      <th>DATA[2]</th>\n      <th>DATA[3]</th>\n      <th>DATA[4]</th>\n      <th>DATA[5]</th>\n      <th>DATA[6]</th>\n      <th>DATA[7]</th>\n      <th>Label</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1201</td>\n      <td>41</td>\n      <td>39</td>\n      <td>39</td>\n      <td>35</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>154</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>809</td>\n      <td>64</td>\n      <td>187</td>\n      <td>127</td>\n      <td>20</td>\n      <td>17</td>\n      <td>32</td>\n      <td>0</td>\n      <td>20</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>1349</td>\n      <td>216</td>\n      <td>0</td>\n      <td>0</td>\n      <td>136</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>1201</td>\n      <td>41</td>\n      <td>39</td>\n      <td>39</td>\n      <td>35</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>154</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>3</td>\n      <td>2</td>\n      <td>228</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>818435</th>\n      <td>848</td>\n      <td>5</td>\n      <td>32</td>\n      <td>52</td>\n      <td>104</td>\n      <td>117</td>\n      <td>0</td>\n      <td>0</td>\n      <td>12</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>818436</th>\n      <td>1088</td>\n      <td>255</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>255</td>\n      <td>134</td>\n      <td>9</td>\n      <td>0</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>818437</th>\n      <td>848</td>\n      <td>5</td>\n      <td>32</td>\n      <td>100</td>\n      <td>104</td>\n      <td>117</td>\n      <td>0</td>\n      <td>0</td>\n      <td>92</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>818438</th>\n      <td>1349</td>\n      <td>216</td>\n      <td>90</td>\n      <td>0</td>\n      <td>137</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>R</td>\n    </tr>\n    <tr>\n      <th>818439</th>\n      <td>790</td>\n      <td>5</td>\n      <td>33</td>\n      <td>48</td>\n      <td>10</td>\n      <td>33</td>\n      <td>30</td>\n      <td>0</td>\n      <td>111</td>\n      <td>R</td>\n    </tr>\n  </tbody>\n</table>\n<p>818440 rows × 10 columns</p>\n</div>"
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:08.131587100Z",
     "start_time": "2023-07-06T09:03:08.052784200Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "Label\nR        701832\nRPM       32539\ngear      29944\nDoS       29501\nFuzzy     24624\nName: count, dtype: int64"
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The labels of the dataset. \"R\" indicates normal patterns, and there are four types of attack (DoS, fuzzy. gear spoofing, and RPM spoofing zttacks)\n",
    "df.Label.value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Data Transformation\n",
    "Convert tabular data to images\n",
    "Procedures:\n",
    "1. Use quantile transform to transform the original data samples into the scale of [0,255], representing pixel values\n",
    "2. Generate images for each category (Normal, DoS, Fuzzy, Gear, RPM), each image consists of 27 data samples with 9 features. Thus, the size of each image is 9*9*3, length 9, width 9, and 3 color channels (RGB)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:09.029917800Z",
     "start_time": "2023-07-06T09:03:08.087993Z"
    }
   },
   "outputs": [],
   "source": [
    "# Transform all features into the scale of [0,1]\n",
    "numeric_features = df.dtypes[df.dtypes != 'object'].index\n",
    "scaler = QuantileTransformer() \n",
    "df[numeric_features] = scaler.fit_transform(df[numeric_features])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:09.083315300Z",
     "start_time": "2023-07-06T09:03:09.030919300Z"
    }
   },
   "outputs": [],
   "source": [
    "# Multiply the feature values by 255 to transform them into the scale of [0,255]\n",
    "df[numeric_features] = df[numeric_features].apply(\n",
    "    lambda x: (x*255))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:09.331286300Z",
     "start_time": "2023-07-06T09:03:09.084313100Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "              CAN ID        DATA[0]        DATA[1]        DATA[2]  \\\ncount  818440.000000  818440.000000  818440.000000  818440.000000   \nmean      127.457890     113.711554     107.926505      89.813595   \nstd        73.812063      89.982269      93.314034     100.866477   \nmin         0.000000       0.000000       0.000000       0.000000   \n25%        66.621622       0.000000       0.000000       0.000000   \n50%       122.267267     126.223724     115.630631       0.000000   \n75%       190.292793     192.590090     192.972973     199.992492   \nmax       255.000000     255.000000     255.000000     255.000000   \n\n             DATA[3]        DATA[4]        DATA[5]        DATA[6]  \\\ncount  818440.000000  818440.000000  818440.000000  818440.000000   \nmean      109.978430     105.412321     112.250627      84.973873   \nstd       103.679776      95.557986      91.033532     101.390068   \nmin         0.000000       0.000000       0.000000       0.000000   \n25%         0.000000       0.000000       0.000000       0.000000   \n50%       130.690691     127.244745     129.159159       0.000000   \n75%       191.186186     192.717718     190.420420     192.207207   \nmax       255.000000     255.000000     255.000000     255.000000   \n\n             DATA[7]  \ncount  818440.000000  \nmean       93.112763  \nstd       100.247486  \nmin         0.000000  \n25%         0.000000  \n50%         0.000000  \n75%       190.675676  \nmax       255.000000  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>CAN ID</th>\n      <th>DATA[0]</th>\n      <th>DATA[1]</th>\n      <th>DATA[2]</th>\n      <th>DATA[3]</th>\n      <th>DATA[4]</th>\n      <th>DATA[5]</th>\n      <th>DATA[6]</th>\n      <th>DATA[7]</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>count</th>\n      <td>818440.000000</td>\n      <td>818440.000000</td>\n      <td>818440.000000</td>\n      <td>818440.000000</td>\n      <td>818440.000000</td>\n      <td>818440.000000</td>\n      <td>818440.000000</td>\n      <td>818440.000000</td>\n      <td>818440.000000</td>\n    </tr>\n    <tr>\n      <th>mean</th>\n      <td>127.457890</td>\n      <td>113.711554</td>\n      <td>107.926505</td>\n      <td>89.813595</td>\n      <td>109.978430</td>\n      <td>105.412321</td>\n      <td>112.250627</td>\n      <td>84.973873</td>\n      <td>93.112763</td>\n    </tr>\n    <tr>\n      <th>std</th>\n      <td>73.812063</td>\n      <td>89.982269</td>\n      <td>93.314034</td>\n      <td>100.866477</td>\n      <td>103.679776</td>\n      <td>95.557986</td>\n      <td>91.033532</td>\n      <td>101.390068</td>\n      <td>100.247486</td>\n    </tr>\n    <tr>\n      <th>min</th>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n    </tr>\n    <tr>\n      <th>25%</th>\n      <td>66.621622</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n    </tr>\n    <tr>\n      <th>50%</th>\n      <td>122.267267</td>\n      <td>126.223724</td>\n      <td>115.630631</td>\n      <td>0.000000</td>\n      <td>130.690691</td>\n      <td>127.244745</td>\n      <td>129.159159</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n    </tr>\n    <tr>\n      <th>75%</th>\n      <td>190.292793</td>\n      <td>192.590090</td>\n      <td>192.972973</td>\n      <td>199.992492</td>\n      <td>191.186186</td>\n      <td>192.717718</td>\n      <td>190.420420</td>\n      <td>192.207207</td>\n      <td>190.675676</td>\n    </tr>\n    <tr>\n      <th>max</th>\n      <td>255.000000</td>\n      <td>255.000000</td>\n      <td>255.000000</td>\n      <td>255.000000</td>\n      <td>255.000000</td>\n      <td>255.000000</td>\n      <td>255.000000</td>\n      <td>255.000000</td>\n      <td>255.000000</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "All features are in the same scale of [0,255]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate images for each class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:09.553237400Z",
     "start_time": "2023-07-06T09:03:09.334282600Z"
    }
   },
   "outputs": [],
   "source": [
    "df0=df[df['Label']=='R'].drop(['Label'],axis=1)\n",
    "df1=df[df['Label']=='RPM'].drop(['Label'],axis=1)\n",
    "df2=df[df['Label']=='gear'].drop(['Label'],axis=1)\n",
    "df3=df[df['Label']=='DoS'].drop(['Label'],axis=1)\n",
    "df4=df[df['Label']=='Fuzzy'].drop(['Label'],axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:09.561532500Z",
     "start_time": "2023-07-06T09:03:09.557535700Z"
    }
   },
   "outputs": [],
   "source": [
    "# Generate 9*9 color images for class 0 (Normal)\n",
    "count=0\n",
    "ims = []\n",
    "\n",
    "image_path = \"train/0/\"\n",
    "os.makedirs(image_path)\n",
    "\n",
    "for i in range(0, 2):\n",
    "    count=count+1\n",
    "    if count<=27: \n",
    "        im=df0.iloc[i].values\n",
    "        ims=np.append(ims,im)\n",
    "    else:\n",
    "        print(ims)\n",
    "        ims=np.array(ims).reshape(9,9,3)\n",
    "        print(ims)\n",
    "        array = np.array(ims, dtype=np.uint8)\n",
    "        new_image = Image.fromarray(array)\n",
    "        new_image.save(image_path+str(i)+'.png')\n",
    "        count=0\n",
    "        ims = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:11.704231100Z",
     "start_time": "2023-07-06T09:03:09.564039300Z"
    }
   },
   "outputs": [],
   "source": [
    "# Generate 9*9 color images for class 1 (RPM spoofing)\n",
    "count=0\n",
    "ims = []\n",
    "\n",
    "image_path = \"train/1/\"\n",
    "os.makedirs(image_path)\n",
    "\n",
    "for i in range(0, len(df1)):  \n",
    "    count=count+1\n",
    "    if count<=27: \n",
    "        im=df1.iloc[i].values\n",
    "        ims=np.append(ims,im)\n",
    "    else:\n",
    "        ims=np.array(ims).reshape(9,9,3)\n",
    "        array = np.array(ims, dtype=np.uint8)\n",
    "        new_image = Image.fromarray(array)\n",
    "        new_image.save(image_path+str(i)+'.png')\n",
    "        count=0\n",
    "        ims = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:13.510844700Z",
     "start_time": "2023-07-06T09:03:11.707374200Z"
    }
   },
   "outputs": [],
   "source": [
    "# Generate 9*9 color images for class 2 (Gear spoofing)\n",
    "count=0\n",
    "ims = []\n",
    "\n",
    "image_path = \"train/2/\"\n",
    "os.makedirs(image_path)\n",
    "\n",
    "for i in range(0, len(df2)):  \n",
    "    count=count+1\n",
    "    if count<=27: \n",
    "        im=df2.iloc[i].values\n",
    "        ims=np.append(ims,im)\n",
    "    else:\n",
    "        ims\n",
    "        ims=np.array(ims).reshape(9,9,3)\n",
    "        ims\n",
    "        array = np.array(ims, dtype=np.uint8)\n",
    "        new_image = Image.fromarray(array)\n",
    "        new_image.save(image_path+str(i)+'.png')\n",
    "        count=0\n",
    "        ims = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:15.293229800Z",
     "start_time": "2023-07-06T09:03:13.514351300Z"
    }
   },
   "outputs": [],
   "source": [
    "# Generate 9*9 color images for class 3 (DoS attack)\n",
    "count=0\n",
    "ims = []\n",
    "\n",
    "image_path = \"train/3/\"\n",
    "os.makedirs(image_path)\n",
    "\n",
    "\n",
    "for i in range(0, len(df3)):  \n",
    "    count=count+1\n",
    "    if count<=27: \n",
    "        im=df3.iloc[i].values\n",
    "        ims=np.append(ims,im)\n",
    "    else:\n",
    "        ims=np.array(ims).reshape(9,9,3)\n",
    "        array = np.array(ims, dtype=np.uint8)\n",
    "        new_image = Image.fromarray(array)\n",
    "        new_image.save(image_path+str(i)+'.png')\n",
    "        count=0\n",
    "        ims = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:16.797229300Z",
     "start_time": "2023-07-06T09:03:15.294734300Z"
    }
   },
   "outputs": [],
   "source": [
    "# Generate 9*9 color images for class 4 (Fuzzy attack)\n",
    "count=0\n",
    "ims = []\n",
    "\n",
    "image_path = \"train/4/\"\n",
    "os.makedirs(image_path)\n",
    "\n",
    "\n",
    "for i in range(0, len(df4)):  \n",
    "    count=count+1\n",
    "    if count<=27: \n",
    "        im=df4.iloc[i].values\n",
    "        ims=np.append(ims,im)\n",
    "    else:\n",
    "        ims=np.array(ims).reshape(9,9,3)\n",
    "        array = np.array(ims, dtype=np.uint8)\n",
    "        new_image = Image.fromarray(array)\n",
    "        new_image.save(image_path+str(i)+'.png')\n",
    "        count=0\n",
    "        ims = []"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Split the training and test set "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:16.815834800Z",
     "start_time": "2023-07-06T09:03:16.797229300Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4163\n"
     ]
    }
   ],
   "source": [
    "# Create folders to store images\n",
    "Train_Dir='./train/'\n",
    "Val_Dir='./test/'\n",
    "allimgs=[]\n",
    "for subdir in os.listdir(Train_Dir):\n",
    "    for filename in os.listdir(os.path.join(Train_Dir,subdir)):\n",
    "        filepath=os.path.join(Train_Dir,subdir,filename)\n",
    "        allimgs.append(filepath)\n",
    "print(len(allimgs)) # Print the total number of images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:16.838914900Z",
     "start_time": "2023-07-06T09:03:16.818833300Z"
    }
   },
   "outputs": [],
   "source": [
    "#split a test set from the dataset, train/test size = 80%/20%\n",
    "Numbers=len(allimgs)//5 \t#size of test set (20%)\n",
    "\n",
    "def mymovefile(srcfile,dstfile):\n",
    "    if not os.path.isfile(srcfile):\n",
    "        print (\"%s not exist!\"%(srcfile))\n",
    "    else:\n",
    "        fpath,fname=os.path.split(dstfile)    \n",
    "        if not os.path.exists(fpath):\n",
    "            os.makedirs(fpath)               \n",
    "        shutil.move(srcfile,dstfile)          \n",
    "        #print (\"move %s -> %s\"%(srcfile,dstfile))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "scrolled": true,
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:16.838914900Z",
     "start_time": "2023-07-06T09:03:16.822343500Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "832"
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The size of test set\n",
    "Numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:17.654719900Z",
     "start_time": "2023-07-06T09:03:16.832397200Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finish creating test set\n"
     ]
    }
   ],
   "source": [
    "# Create the test set\n",
    "val_imgs=random.sample(allimgs,Numbers)\n",
    "for img in val_imgs:\n",
    "    dest_path=img.replace(Train_Dir,Val_Dir)\n",
    "    mymovefile(img,dest_path)\n",
    "print('Finish creating test set')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:17.660725400Z",
     "start_time": "2023-07-06T09:03:17.658724800Z"
    }
   },
   "outputs": [],
   "source": [
    "#resize the images 224*224 for better CNN training\n",
    "def get_224(folder,dstdir):\n",
    "    imgfilepaths=[]\n",
    "    for root,dirs,imgs in os.walk(folder):\n",
    "        for thisimg in imgs:\n",
    "            thisimg_path=os.path.join(root,thisimg)\n",
    "            imgfilepaths.append(thisimg_path)\n",
    "    for thisimg_path in imgfilepaths:\n",
    "        dir_name,filename=os.path.split(thisimg_path)\n",
    "        dir_name=dir_name.replace(folder,dstdir)\n",
    "        new_file_path=os.path.join(dir_name,filename)\n",
    "        if not os.path.exists(dir_name):\n",
    "            os.makedirs(dir_name)\n",
    "        img=cv2.imread(thisimg_path)\n",
    "        img=cv2.resize(img,(224,224))\n",
    "        cv2.imwrite(new_file_path,img)\n",
    "    print('Finish resizing'.format(folder=folder))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:22.772090900Z",
     "start_time": "2023-07-06T09:03:17.661728600Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finish resizing\n"
     ]
    }
   ],
   "source": [
    "DATA_DIR_224='./train_224/'\n",
    "get_224(folder='./train/',dstdir=DATA_DIR_224)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:24.056886300Z",
     "start_time": "2023-07-06T09:03:22.772621Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finish resizing\n"
     ]
    }
   ],
   "source": [
    "DATA_DIR2_224='./test_224/'\n",
    "get_224(folder='./test/',dstdir=DATA_DIR2_224)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Display samples for each category"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-07-06T09:03:24.562540100Z",
     "start_time": "2023-07-06T09:03:24.056886300Z"
    }
   },
   "outputs": [
    {
     "ename": "FileNotFoundError",
     "evalue": "[Errno 2] No such file or directory: './train_224/0/27.png'",
     "output_type": "error",
     "traceback": [
      "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m",
      "\u001B[1;31mFileNotFoundError\u001B[0m                         Traceback (most recent call last)",
      "Cell \u001B[1;32mIn[34], line 2\u001B[0m\n\u001B[0;32m      1\u001B[0m \u001B[38;5;66;03m# Read the images for each category, the file name may vary (27.png, 83.png...)\u001B[39;00m\n\u001B[1;32m----> 2\u001B[0m img1 \u001B[38;5;241m=\u001B[39m \u001B[43mImage\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mopen\u001B[49m\u001B[43m(\u001B[49m\u001B[38;5;124;43m'\u001B[39;49m\u001B[38;5;124;43m./train_224/0/27.png\u001B[39;49m\u001B[38;5;124;43m'\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[0;32m      3\u001B[0m img2 \u001B[38;5;241m=\u001B[39m Image\u001B[38;5;241m.\u001B[39mopen(\u001B[38;5;124m'\u001B[39m\u001B[38;5;124m./train_224/1/83.png\u001B[39m\u001B[38;5;124m'\u001B[39m)\n\u001B[0;32m      4\u001B[0m img3 \u001B[38;5;241m=\u001B[39m Image\u001B[38;5;241m.\u001B[39mopen(\u001B[38;5;124m'\u001B[39m\u001B[38;5;124m./train_224/2/27.png\u001B[39m\u001B[38;5;124m'\u001B[39m)\n",
      "File \u001B[1;32m~\\anaconda3\\envs\\FlowPicRefresh\\lib\\site-packages\\PIL\\Image.py:3227\u001B[0m, in \u001B[0;36mopen\u001B[1;34m(fp, mode, formats)\u001B[0m\n\u001B[0;32m   3224\u001B[0m     filename \u001B[38;5;241m=\u001B[39m fp\n\u001B[0;32m   3226\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m filename:\n\u001B[1;32m-> 3227\u001B[0m     fp \u001B[38;5;241m=\u001B[39m \u001B[43mbuiltins\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mopen\u001B[49m\u001B[43m(\u001B[49m\u001B[43mfilename\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mrb\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[0;32m   3228\u001B[0m     exclusive_fp \u001B[38;5;241m=\u001B[39m \u001B[38;5;28;01mTrue\u001B[39;00m\n\u001B[0;32m   3230\u001B[0m \u001B[38;5;28;01mtry\u001B[39;00m:\n",
      "\u001B[1;31mFileNotFoundError\u001B[0m: [Errno 2] No such file or directory: './train_224/0/27.png'"
     ]
    }
   ],
   "source": [
    "# Read the images for each category, the file name may vary (27.png, 83.png...)\n",
    "img1 = Image.open('./train_224/0/27.png')\n",
    "img2 = Image.open('./train_224/1/83.png')\n",
    "img3 = Image.open('./train_224/2/27.png')\n",
    "img4 = Image.open('./train_224/3/27.png')\n",
    "img5 = Image.open('./train_224/4/27.png')\n",
    "\n",
    "plt.figure(figsize=(10, 10)) \n",
    "plt.subplot(1,5,1)\n",
    "plt.imshow(img1)\n",
    "plt.title(\"Normal\")\n",
    "plt.subplot(1,5,2)\n",
    "plt.imshow(img2)\n",
    "plt.title(\"RPM Spoofing\")\n",
    "plt.subplot(1,5,3)\n",
    "plt.imshow(img3)\n",
    "plt.title(\"Gear Spoofing\")\n",
    "plt.subplot(1,5,4)\n",
    "plt.imshow(img4)\n",
    "plt.title(\"DoS Attack\")\n",
    "plt.subplot(1,5,5)\n",
    "plt.imshow(img5)\n",
    "plt.title(\"Fuzzy Attack\")\n",
    "plt.show()  # display it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "start_time": "2023-07-06T09:03:24.562540100Z"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}