Ashing's Blog: Machine Learning (8)--Implementing a Multilayer Perceptron (MLP) for Handwritten Digit Recognition

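In this chapter we implement a multilayer perceptron (MLP) in Python for handwritten digit recognition. The model is trained and tested on the MNIST dataset using the backpropagation algorithm. The trained weights are saved to files so that new data can be predicted directly, without retraining the model every time. Finally, we try recognizing digits drawn by hand in Windows' built-in Paint program. The results are shown in the figure below: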

<Figure 1> Rows 1 and 3 are handwritten digits drawn in Windows Paint; rows 2 and 4 are the corresponding predictions.

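For the derivation of the backpropagation formulas, see the link below.
It works through a hand calculation with concrete numbers, and the author, Matt Mazur, explains it very clearly.
I recommend following along and doing the calculation once yourself.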
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
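To connect those formulas with code, here is a small NumPy sketch that replays the first training step of Matt Mazur's 2-2-2 example (inputs 0.05 and 0.10, targets 0.01 and 0.99, learning rate 0.5, biases held fixed). The numbers come from that article; the vectorized layout (row j of a weight matrix holds the weights into neuron j) is my own choice and is not necessarily how NeuralNetMLP stores its weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Values from the worked example: row j holds the weights feeding neuron j
x = np.array([0.05, 0.10])                    # inputs i1, i2
t = np.array([0.01, 0.99])                    # targets for o1, o2
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])   # w1..w4 (input -> hidden)
b1 = 0.35
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])   # w5..w8 (hidden -> output)
b2 = 0.60
eta = 0.5

# Forward pass
h = sigmoid(W1 @ x + b1)                      # hidden activations
o = sigmoid(W2 @ h + b2)                      # output activations
E_total = np.sum(0.5 * (t - o) ** 2)          # squared error summed over outputs

# Backward pass: every delta uses sigmoid'(z) = s * (1 - s)
delta_o = (o - t) * o * (1 - o)
delta_h = (W2.T @ delta_o) * h * (1 - h)      # uses the *old* W2, as in the article
grad_W2 = np.outer(delta_o, h)
grad_W1 = np.outer(delta_h, x)

# Gradient-descent update (the biases are not updated in this example)
W2_new = W2 - eta * grad_W2
W1_new = W1 - eta * grad_W1

print("E_total =", round(E_total, 9))
print("w5 ->", round(W2_new[0, 0], 8), " w1 ->", round(W1_new[0, 0], 9))
```

Running it should reproduce the article's figures: a total error of about 0.298371109, an updated w5 of about 0.35891648 and an updated w1 of about 0.149780716.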

Supplement: if you have doubts about the derivative of the sigmoid activation function, you can refer to my handwritten proof below:
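The same derivation written out in LaTeX (the standard result; it is where the o(1 - o) factor in the backpropagation deltas comes from):

```latex
\begin{aligned}
\sigma(z) &= \frac{1}{1+e^{-z}} \\
\sigma'(z) &= \frac{d}{dz}\left(1+e^{-z}\right)^{-1}
            = -\left(1+e^{-z}\right)^{-2}\cdot\left(-e^{-z}\right)
            = \frac{e^{-z}}{\left(1+e^{-z}\right)^{2}} \\
           &= \frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}}
            = \frac{1}{1+e^{-z}}\cdot\left(1-\frac{1}{1+e^{-z}}\right)
            = \sigma(z)\bigl(1-\sigma(z)\bigr)
\end{aligned}
```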

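The multilayer perceptron (MLP) is shown in the figure below.
Because we use it for MNIST handwriting recognition, and each MNIST sample is a 28x28-pixel grayscale digit image, we flatten each image into a single vector of 28x28=784 pixels and use it as the input features; the n of the input layer Xn is therefore the 784 features.
For the hidden layer in between, the number of perceptrons m can be set in the program; the default is n_hidden=100.
All layers use the sigmoid activation function, and the output layer has l=10 units, one for each digit 0-9.
In other words, each sample's 784 feature values pass through the hidden layer to the output layer, where the sigmoid function produces 10 output values between 0 and 1, and the digit with the largest output is the prediction. Because each output goes through its own sigmoid, the 10 values are not normalized and need not sum to 1, so they behave as scores rather than true probabilities.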
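A minimal NumPy sketch of that forward pass (784 inputs, 100 hidden units, 10 outputs, sigmoid in every layer, prediction = index of the largest output). I assume each weight matrix stores its bias weights in the first column, which is how I understand Raschka's NeuralNetMLP lays out w1 and w2; the random weights and inputs are placeholders only.

```python
import numpy as np

def sigmoid(z):
    z = np.clip(z, -500, 500)          # avoid overflow in exp for raw 0-255 pixel inputs
    return 1.0 / (1.0 + np.exp(-z))

def add_bias_column(X):
    """Prepend a column of ones so the bias weights can live inside the weight matrix."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def forward(X, w1, w2):
    """X: (n_samples, 784); w1: (n_hidden, 785); w2: (10, n_hidden + 1)."""
    a2 = sigmoid(add_bias_column(X) @ w1.T)     # hidden-layer activations
    a3 = sigmoid(add_bias_column(a2) @ w2.T)    # 10 output activations per sample
    return a3

def predict(X, w1, w2):
    return np.argmax(forward(X, w1, w2), axis=1)   # digit with the largest output

rng = np.random.RandomState(1)
n_features, n_hidden, n_output = 784, 100, 10
w1 = rng.uniform(-1.0, 1.0, size=(n_hidden, n_features + 1))
w2 = rng.uniform(-1.0, 1.0, size=(n_output, n_hidden + 1))
X = rng.randint(0, 256, size=(5, n_features)).astype(float)   # 5 fake "images"

print(forward(X, w1, w2).shape)   # (5, 10)
print(predict(X, w1, w2))         # five digits in 0..9, meaningless with random weights
```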

The raw MNIST data can be downloaded from: http://yann.lecun.com/exdb/mnist/

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

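After decompression, train-images contains 60000 training samples and train-labels holds their correct labels.
The test set t10k-images contains 10000 samples and t10k-labels holds their correct labels.
The program therefore uses two 2-D matrices as input: the training samples (60000, 784) and the test samples (10000, 784).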
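The files use the simple IDX binary layout described on the MNIST page: a big-endian header (magic number 2051 for image files, 2049 for label files, then the item count and, for images, the row and column counts) followed by raw uint8 bytes. Below is my own minimal loader under that assumption; it reads either the .gz downloads or decompressed copies and returns the flattened (n, 784) matrices described above.

```python
import gzip
import struct
import numpy as np

def _open(path):
    # Accept both the downloaded .gz files and decompressed copies
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")

def load_idx_images(path):
    with _open(path) as f:
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError("not an IDX image file: magic=%d" % magic)
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
    return pixels.reshape(n, rows * cols)      # one flattened row (784 pixels) per image

def load_idx_labels(path):
    with _open(path) as f:
        magic, n = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError("not an IDX label file: magic=%d" % magic)
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    if labels.size != n:
        raise ValueError("expected %d labels, found %d" % (n, labels.size))
    return labels
```

Usage: `X_train = load_idx_images("train-images-idx3-ubyte.gz")` should give shape (60000, 784), and `y_train = load_idx_labels("train-labels-idx1-ubyte.gz")` shape (60000,). The arrays returned by `np.frombuffer` are read-only, so call `.copy()` before modifying them in place.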

The figure below shows the first 100 training images from train-images, restored to 28x28-pixel images.
The sample code can be downloaded from my GitHub:
https://github.com/Ashing00/Load_MNIST/blob/master/mnist_load.py
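As a rough idea of how such a 10x10 preview can be assembled, here is a NumPy-only sketch that tiles the first 100 flattened rows into one 280x280 image. The function name `make_montage` is my own and is not taken from mnist_load.py.

```python
import numpy as np

def make_montage(X, n_rows=10, n_cols=10, side=28):
    """Tile the first n_rows * n_cols flattened images of X into one large image."""
    tiles = X[: n_rows * n_cols].reshape(n_rows, n_cols, side, side)
    # (row, col, y, x) -> (row, y, col, x) so that each 28x28 block stays contiguous
    return tiles.transpose(0, 2, 1, 3).reshape(n_rows * side, n_cols * side)
```

Usage: `plt.imshow(make_montage(X_train), cmap='gray')` followed by `plt.show()` with matplotlib.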

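Below I describe the parameter settings in the example program and the code I added myself.
The original NeuralNetMLP model code is available on the GitHub of its author, Sebastian Raschka: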
https://github.com/rasbt/python-machine-learning-book

Parameters used for the NeuralNetMLP model:

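```python
nn = NeuralNetMLP(n_output=10,                  # output units, one per digit 0-9
                  n_features=X_train.shape[1],  # input features per sample, should be 784
                  n_hidden=100,                 # number of hidden units (configurable)
                  l2=0.5,                       # L2 regularization strength; 0 disables it
                  l1=0.0,                       # L1 regularization strength; 0 disables it
                  epochs=1000,                  # run 1000 epochs
                  eta=0.002,                    # learning rate
                  alpha=0.001,                  # momentum
                  decrease_const=0.00001,       # shrinks the learning rate as epochs progress
                  minibatches=100,              # split each epoch into 100 minibatches
                  shuffle=True,                 # reshuffle the training data every epoch (SGD)
                  random_state=1)               # random seed, for reproducible results
```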
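To make the interplay of eta, alpha, decrease_const, minibatches and shuffle concrete, here is a stripped-down sketch of such an update loop. It follows my reading of Raschka's NeuralNetMLP.fit (learning rate divided by 1 + decrease_const * epoch at the start of every epoch, reshuffled indices split into equal minibatches, a momentum term alpha times the previous update), but `sgd_sketch` and its toy least-squares gradient are illustrative assumptions, not the book's code.

```python
import numpy as np

def sgd_sketch(grad_fn, w, n_samples, epochs=1000, eta=0.002, alpha=0.001,
               decrease_const=0.00001, minibatches=100, shuffle=True, random_state=1):
    """Minibatch gradient descent with momentum and a decaying learning rate."""
    rng = np.random.RandomState(random_state)
    delta_prev = np.zeros_like(w)
    for epoch in range(epochs):
        eta /= (1 + decrease_const * epoch)           # adaptive learning rate
        idx = rng.permutation(n_samples) if shuffle else np.arange(n_samples)
        for batch in np.array_split(idx, minibatches):
            delta = eta * grad_fn(w, batch)
            w = w - (delta + alpha * delta_prev)      # momentum: add part of the last step
            delta_prev = delta
    return w, eta

# Toy usage: noiseless least squares, so every minibatch shares the same minimizer
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

def grad(w, b):
    return X[b].T @ (X[b] @ w - y[b]) / len(b)     # mean-squared-error gradient on batch b

w_fit, eta_final = sgd_sketch(grad, np.zeros(3), len(y), epochs=200, eta=0.1, minibatches=10)
print(np.round(w_fit, 3), eta_final)
```

With this post's settings (eta=0.002, decrease_const=0.00001, epochs=1000) the per-epoch divisors multiply out to roughly 145, so by the last epoch the learning rate is about 0.002 / 145, or 1.4e-5.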

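Below is the code I added for saving and reloading the weights. After training, we save the weights; when new samples need to be predicted, the trained weights can be reloaded and used for prediction directly, without retraining the model first. This matters because one training run can take 10 to 30 minutes; on my laptop with an i7-6700 CPU, one training run takes about 15 minutes.
In the program, train_flag controls whether to train the model or to load the saved weights and predict directly.
The default is train_flag=True.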

```python
import csv
import numpy as np

def saveweight(w1, w2):
    """Save both weight matrices to CSV, one matrix row per line."""
    for filename, w in (('weight01.csv', w1), ('weight02.csv', w2)):
        # newline='' is what the csv module expects for files it writes
        with open(filename, mode='w', newline='') as write_file:
            writer = csv.writer(write_file)
            for row in w:
                # one field per weight; writing [row] would store str(row),
                # a bracketed, line-wrapped string that float() cannot parse back
                writer.writerow(row)

def _load_csv_matrix(filename):
    """Read a CSV written by saveweight back into a float matrix."""
    with open(filename, newline='') as file:
        rows = [line for line in csv.reader(file)]
    return np.array(rows).astype(float)

def loadWeight1():
    return _load_csv_matrix('weight01.csv')

def loadWeight2():
    return _load_csv_matrix('weight02.csv')
```
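A simpler alternative to CSV, which I suggest here but which is not what the original script does, is NumPy's own .npz format: it keeps the matrix shapes and full float precision in one file. Combined with the train_flag switch described above, the flow could look like this. `TinyModel` is a hypothetical stand-in for NeuralNetMLP that exposes w1/w2 attributes the way I understand Raschka's class does, and `weights.npz` is an arbitrary file name.

```python
import os
import numpy as np

class TinyModel:
    """Hypothetical stand-in: only needs fit() and the w1/w2 weight attributes."""
    def __init__(self):
        rng = np.random.RandomState(1)
        self.w1 = rng.uniform(-1, 1, size=(100, 785))
        self.w2 = rng.uniform(-1, 1, size=(10, 101))

    def fit(self):
        pass  # the real model would train here (about 15 minutes on the author's CPU)

def train_or_load(nn, train_flag, path="weights.npz"):
    """Train and save when train_flag is set (or no file exists), else reload."""
    if train_flag or not os.path.exists(path):
        nn.fit()
        np.savez(path, w1=nn.w1, w2=nn.w2)        # both matrices in a single file
    else:
        with np.load(path) as data:               # reuse trained weights, skip training
            nn.w1, nn.w2 = data["w1"], data["w2"]
    return nn
```

Falling back to training when the file is missing is a design choice of this sketch; it avoids a crash on the first run with train_flag=False.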

Below is the average cost during MNIST training; it converges at around epoch 800.
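A convergence plot like this needs one cost value per epoch. Assuming, as in Raschka's NeuralNetMLP, that the trained model keeps one cost per minibatch in a `cost_` list (1000 epochs x 100 minibatches here), the values can be averaged per epoch as follows; `epoch_average_cost` is a helper name I made up.

```python
import numpy as np

def epoch_average_cost(cost_per_batch, epochs):
    """Average consecutive minibatch costs into one value per epoch."""
    cost = np.asarray(cost_per_batch, dtype=float)
    batches = np.array_split(np.arange(len(cost)), epochs)
    return np.array([cost[b].mean() for b in batches])

print(epoch_average_cost([4.0, 2.0, 3.0, 1.0], epochs=2))
```

Usage: `avg = epoch_average_cost(nn.cost_, 1000)` and then `plt.plot(range(len(avg)), avg)`.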

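Below, the prediction accuracy is 99.34% on the training samples and 96.87% on the test samples. This shows some overfitting, but it is acceptable. In my experiments, increasing the number of hidden units made overfitting more likely; feel free to try it yourself.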
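Accuracy here is simply the fraction of predictions equal to the labels, the same formula the custom-digit test later in this post uses. A small sketch, with the train/test gap from the numbers above worked out as a rough overfitting indicator:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted digit matches the label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sum(y_true == y_pred) / y_true.shape[0]

train_acc, test_acc = 0.9934, 0.9687          # results reported in this post
gap = (train_acc - test_acc) * 100
print('Train-test gap: %.2f percentage points' % gap)
```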

The following code uses OpenCV to load digit images I drew myself in Paint and runs predictions on them.

```python
##My Predict testing +
## Predict my own handwritten digit images
import cv2
import numpy as np
import matplotlib.pyplot as plt

# 20 hand-drawn digits: 0_1.jpg ... 9_1.jpg, then 0_2.jpg ... 9_2.jpg
Input_Numer = ["%d_%d.jpg" % (d, k) for k in (1, 2) for d in range(10)]
# The correct (expected) digit for each of the 20 images
My_Yd = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9] * 2, dtype=int)
# Each image must be 28x28 grayscale, flattened to 28x28=784 pixels
My_X = np.zeros((20, 784), dtype=int)
img_num = [None] * 20
img_res = [None] * 20

for i in range(20):  # read the 20 digit pictures
    img = cv2.imread(Input_Numer[i], cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(Input_Numer[i])
    if img.shape != (28, 28):
        raise ValueError("%s must be 28x28, got %s" % (Input_Numer[i], img.shape))
    img_num[i] = img.copy()
    My_X[i] = img.reshape(My_X.shape[1])

# nn is the NeuralNetMLP trained (or re-loaded) earlier
My_test_pred = nn.predict(My_X)
print("Expected: ", My_Yd)
print("Predicted:", My_test_pred)
acc = np.sum(My_Yd == My_test_pred, axis=0) / My_X.shape[0]
print('Test accuracy: %.2f%%' % (acc * 100))

# Draw each prediction on a white 64x64 tile: green if correct, red if wrong
# (colors are RGB because the tiles are displayed with matplotlib, not cv2)
font = cv2.FONT_HERSHEY_SIMPLEX
for i in range(20):
    img_res[i] = np.full((64, 64, 3), 255, np.uint8)
    color = (0, 255, 0) if My_test_pred[i] == My_Yd[i] else (255, 0, 0)
    cv2.putText(img_res[i], str(My_test_pred[i]), (15, 52), font, 2, color, 3, cv2.LINE_AA)

Input_Numer_name = ['Input %d' % d for d in My_Yd]
predict_Numer_name = ['predict %d' % d for d in My_Yd]

# 4x10 grid: rows 1 and 3 show the inputs, rows 2 and 4 the predictions
for i in range(20):
    offset = 0 if i < 10 else 10          # the second set of 10 starts on row 3
    plt.subplot(4, 10, i + 1 + offset)
    plt.imshow(img_num[i], cmap='gray')
    plt.title(Input_Numer_name[i]), plt.xticks([]), plt.yticks([])
    plt.subplot(4, 10, i + 11 + offset)
    plt.imshow(img_res[i])
    plt.title(predict_Numer_name[i]), plt.xticks([]), plt.yticks([])
plt.show()
##My Predict testing -
```

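The result is shown in the image below.
Although accuracy is high on both the training and test samples, the model performs somewhat worse in real use.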
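One mismatch worth ruling out when real drawings score worse: MNIST digits are light strokes (high values) on a dark background (0), scaled to fit a 20x20 box and placed roughly in the middle of the 28x28 frame, whereas Paint's default canvas is white with black ink. The NumPy-only sketch below is my own assumption of a reasonable preprocessing step, not part of the original program: it inverts a light-background drawing, crops to the ink's bounding box, pads to a square, shrinks to 20x20 with nearest-neighbour sampling and centres the result in 28x28. It centres by bounding box rather than by centre of mass, and `cv2.resize` with `cv2.INTER_AREA` would give smoother strokes than nearest-neighbour sampling.

```python
import numpy as np

def to_mnist_like(gray, box=20, size=28, threshold=128):
    """gray: 2-D uint8 drawing of any size. Returns a flattened (size*size,) int vector."""
    img = gray.astype(float)
    if img.mean() > 127:                    # light background: invert to light-on-dark
        img = 255.0 - img
    ys, xs = np.nonzero(img > threshold)    # pixels that look like ink
    if ys.size == 0:
        return np.zeros(size * size, dtype=int)
    img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]   # crop to the digit
    # pad the shorter side so the digit keeps its aspect ratio
    h, w = img.shape
    side = max(h, w)
    square = np.zeros((side, side))
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = img
    # nearest-neighbour resize of the square to box x box
    idx = (np.arange(box) * side / box).astype(int)
    small = square[np.ix_(idx, idx)]
    # centre the box x box digit inside a size x size frame
    out = np.zeros((size, size))
    m = (size - box) // 2
    out[m:m + box, m:m + box] = small
    return out.astype(int).reshape(size * size)
```

In the prediction loop, `My_X[i] = to_mnist_like(img)` would then replace the fixed 28x28 reshape and accept drawings of any size; its effect on the 20 drawings here has not been measured.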

<Complete Multilayer Perceptron (MLP) example program>

https://github.com/Ashing00/Multilayer-Perceptron/blob/master/MLP.py

<References> Book: Python Machine Learning, author: Sebastian Raschka

https://github.com/rasbt/python-machine-learning-book

Saturday, March 11, 2017