Original article: http://ihoge.cn/2018/PCASVM人脸识别.html
Loading the data

The test data used here contains photos of 40 people, 10 photos per person. You can also visit http://www.cl.cam.ac.uk/research/dtg/attarchive/facesataglance.html to see thumbnails of all 400 photos.
import time
import logging
from sklearn.datasets import fetch_olivetti_faces

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')  # note: INFO must be uppercase here
data_home = 'code/datasets/'
logging.info('Start loading data')
faces = fetch_olivetti_faces(data_home=data_home)
logging.info('Done loading data')
A brief explanation:

The loaded images are stored in the faces variable. sklearn has already preprocessed each photo: the hair is cropped out, the image is 64x64 pixels, and the face is centered. In a real production environment this step is very important; otherwise the model would be swamped by noise, since features such as the varying backgrounds and hairstyles in the photos should be excluded from the input. Finally, to download the dataset successfully you also need the Python imaging library Pillow installed, or the images cannot be decoded. Let's print a summary of the data:
import numpy as np

X = faces.data
y = faces.target
targets = np.unique(faces.target)
target_names = np.array(['p%d' % t for t in targets])  # make a label for each person
n_targets = target_names.shape[0]
n_samples, h, w = faces.images.shape
print('Samples count:{}\nTarget count:{}'.format(n_samples, n_targets))
print('Image size:{}x{}\nData shape:{}'.format(w, h, X.shape))
Samples count:400
Target count:40
Image size:64x64
Data shape:(400, 4096)

The output shows 40 people with 400 photos in total, and 64x64 = 4096 input features per photo.
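The (400, 4096) shape is simply the (400, 64, 64) image stack with each image flattened row by row. A quick sketch with synthetic data (the array below is invented for illustration, not the Olivetti data) shows the correspondence:

```python
import numpy as np

# Fake "image" data standing in for faces.images: 3 images of 4x4 pixels.
images = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)

# Flattening each image row by row gives the 2-D design matrix,
# the same layout faces.data uses.
data = images.reshape(3, -1)
print(data.shape)  # (3, 16)

# Reshaping a flat row back recovers the original image.
print(np.array_equal(data[0].reshape(4, 4), images[0]))  # True
```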
To get an intuitive look at the data, let's randomly pick one photo of each person to display, and first define a plotting helper.

The images parameter is a 2-D array where every row is one image. When loading the data, fetch_olivetti_faces() has already done the preprocessing: each pixel's RGB value has been converted to a float in [0, 1], so the plotted photos are grayscale. In image recognition, grayscale photos are generally sufficient; they reduce the computation while remaining accurate.
%matplotlib inline
from matplotlib import pyplot as plt

def plot_gallery(images, titles, h, w, n_row=2, n_col=5):
    # Show an array of images
    plt.figure(figsize=(2 * n_col, 2.2 * n_row), dpi=140)
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.01)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i])
        plt.axis('off')

n_row = 2
n_col = 6
sample_images = None
sample_titles = []
for i in range(n_targets):
    people_images = X[y == i]                 # note: select by label i here
    people_sample_index = np.random.randint(0, people_images.shape[0], 1)
    people_sample_image = people_images[people_sample_index, :]
    if sample_images is not None:
        sample_images = np.concatenate((sample_images, people_sample_image), axis=0)
    else:
        sample_images = people_sample_image
    sample_titles.append(target_names[i])     # target_names was generated earlier

plot_gallery(sample_images, sample_titles, h, w, n_row, n_col)

In the code, X[y == i] selects all photos of one particular person; the randomly chosen photos are collected in the sample_images array, and finally the function defined above draws them. Then split the data into training and test sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
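With only 10 photos per person, a purely random 80/20 split can leave some people with no photo at all in the test set (the zero-support rows such as p11, p28, and p34 in the report further down come from exactly this). A sketch using synthetic labels (the arrays below are invented for illustration) shows how train_test_split's stratify option keeps every class represented proportionally:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 40 classes, 10 samples each, like the Olivetti labels.
y = np.repeat(np.arange(40), 10)
X = np.random.RandomState(0).rand(400, 5)

# stratify=y forces each class to appear in both splits in proportion.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=4, stratify=y)

# Every person now has exactly 2 of their 10 photos in the test set.
counts = np.bincount(y_te, minlength=40)
print(counts.min(), counts.max())  # 2 2
```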
First attempt with a support vector machine

Use a support vector machine directly for face recognition:
from time import time
from sklearn.svm import SVC

t = time()
clf = SVC(class_weight='balanced')
clf.fit(X_train, y_train)
print('Elapsed {} seconds'.format(time() - t))
Elapsed 1.0220119953155518 seconds

1. Next, run predictions on the face data and check the accuracy with confusion_matrix:
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=range(n_targets))
print('confusion_matrix:\n')
# np.set_printoptions(threshold=np.nan)
print(cm[:10])
confusion_matrix:

[[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

The np.set_printoptions() call is there to make sure cm is printed in full; the array is 40x40 and would otherwise be truncated by default.
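As a reminder of how to read this, here is a tiny invented 3-class example (the labels are made up): rows are true classes, columns are predicted classes, so a perfect classifier puts all counts on the diagonal:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 2, 1, 2, 2]  # one sample of class 1 misclassified as class 2

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]
```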
2. Check the accuracy with classification_report.

The output is clearly very poor. Ideally the confusion matrix should have non-zero counts only on its diagonal, with zeros everywhere else, yet here many photos are being predicted as the class with index 11. Let's also look at the classification_report:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred, target_names=target_names))  # note: do not swap y_test and y_pred

             precision    recall  f1-score   support

         p0       0.00      0.00      0.00         1
         p1       0.00      0.00      0.00         3
         p2       0.00      0.00      0.00         2
         p3       0.00      0.00      0.00         1
         p4       0.00      0.00      0.00         1
         p5       0.00      0.00      0.00         1
         p6       0.00      0.00      0.00         4
         p7       0.00      0.00      0.00         2
         p8       0.00      0.00      0.00         4
         p9       0.00      0.00      0.00         2
        p10       0.00      0.00      0.00         1
        p11       0.00      0.00      0.00         0
        p12       0.00      0.00      0.00         4
        p13       0.00      0.00      0.00         4
        p14       0.00      0.00      0.00         1
        p15       0.00      0.00      0.00         1
        p16       0.00      0.00      0.00         3
        p17       0.00      0.00      0.00         2
        p18       0.00      0.00      0.00         2
        p19       0.00      0.00      0.00         2
        p20       0.00      0.00      0.00         1
        p21       0.00      0.00      0.00         2
        p22       0.00      0.00      0.00         3
        p23       0.00      0.00      0.00         2
        p24       0.00      0.00      0.00         3
        p25       0.00      0.00      0.00         3
        p26       0.00      0.00      0.00         2
        p27       0.00      0.00      0.00         2
        p28       0.00      0.00      0.00         0
        p29       0.00      0.00      0.00         2
        p30       0.00      0.00      0.00         2
        p31       0.00      0.00      0.00         3
        p32       0.00      0.00      0.00         2
        p33       0.00      0.00      0.00         2
        p34       0.00      0.00      0.00         0
        p35       0.00      0.00      0.00         2
        p36       0.00      0.00      0.00         3
        p37       0.00      0.00      0.00         1
        p38       0.00      0.00      0.00         2
        p39       0.00      0.00      0.00         2

avg / total       0.00      0.00      0.00        80

/Users/hadoop/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/Users/hadoop/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py:1137: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.
  'recall', 'true', average, warn_for)

As expected, the result is indeed very poor.
The reason is that every pixel is treated as an input feature, so the input is extremely noisy and the model simply cannot fit the training data. There are 4096 feature dimensions, but the dataset contains only 400 samples, far fewer than the number of features, and 20% of those are held out as the test set. Under these conditions accurate training and prediction is essentially impossible.
Processing the dataset with PCA

Choosing k. There are two ways to address the problem above: collect more samples (not realistic here), or use PCA to reduce the dimensionality of the data, keeping only the k most important features. Here we determine k from the distortion computed by PCA. In sklearn, the explained_variance_ratio_ attribute of a fitted PCA model gives the data retention ratio after PCA: it is an array, and summing its elements tells you the retention ratio for the chosen k. As k grows, the sum approaches 1. Using this property, we sample k from 10 up to 300 in steps of 30. For this problem, a distortion below 5% is good enough.
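Before running this on the faces, the behaviour of explained_variance_ratio_ can be sketched on a small synthetic dataset (the data below is random, purely for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 20)  # 100 samples, 20 features

# Retention ratio for increasing k: the sums grow monotonically toward 1.
ratios = []
for k in [2, 5, 10, 20]:
    pca = PCA(n_components=k).fit(X)
    ratios.append(pca.explained_variance_ratio_.sum())

print(ratios)  # increasing; with k == 20 (all features) the sum reaches 1.0
```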
from sklearn.decomposition import PCA

print('Exploring explained variance ratio for dataset ...')
candidate_components = range(10, 300, 30)
explained_ratios = []
t = time()
for c in candidate_components:
    pca = PCA(n_components=c)
    X_pca = pca.fit_transform(X)
    explained_ratios.append(np.sum(pca.explained_variance_ratio_))
print('Done in {0:.2f}s'.format(time() - t))

Exploring explained variance ratio for dataset ...
Done in 2.17s

plt.figure(figsize=(8, 5), dpi=100)
plt.grid()
plt.plot(candidate_components, explained_ratios)
plt.xlabel('Number of PCA Components')
plt.ylabel('Explained Variance Ratio')
plt.title('Explained variance ratio for PCA')
plt.yticks(np.arange(0.5, 1.05, .05))
plt.xticks(np.arange(0, 300, 20));

The resulting plot shows that to keep a 95% data retention ratio, choosing k = 120 is enough. To get a more intuitive feel for the difference between k values, let's draw the reconstructions:

def title_prefix(prefix, title):
    return '{}: {}'.format(prefix, title)

n_row = 1
n_col = 5
sample_images = sample_images[0:5]
sample_titles = sample_titles[0:5]
plotting_images = sample_images
plotting_titles = [title_prefix('orig', t) for t in sample_titles]
candidate_components = [120, 75, 37, 19, 8]

for c in candidate_components:
    print('Fitting and projecting on PCA(n_components={}) ...'.format(c))
    t = time()
    pca = PCA(n_components=c)
    pca.fit(X)
    X_sample_pca = pca.transform(sample_images)
    X_sample_inv = pca.inverse_transform(X_sample_pca)
    plotting_images = np.concatenate((plotting_images, X_sample_inv), axis=0)
    sample_title_pca = [title_prefix('{}'.format(c), t) for t in sample_titles]
    plotting_titles = np.concatenate((plotting_titles, sample_title_pca), axis=0)
    print('Done in {0:.2f}s'.format(time() - t))

print('Plotting sample image with different number of PCA components ...')
plot_gallery(plotting_images, plotting_titles, h, w,
             n_row * (len(candidate_components) + 1), n_col)

Fitting and projecting on PCA(n_components=120) ...
Done in 0.18s
Fitting and projecting on PCA(n_components=75) ...
Done in 0.14s
Fitting and projecting on PCA(n_components=37) ...
Done in 0.11s
Fitting and projecting on PCA(n_components=19) ...
Done in 0.07s
Fitting and projecting on PCA(n_components=8) ...
Done in 0.06s
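The visual degradation seen in the gallery can also be measured: the reconstruction error from inverse_transform shrinks as n_components grows. A small sketch on random data (invented for illustration, not the face dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 50)

# Mean squared reconstruction error for a few component counts.
errors = []
for c in [5, 20, 50]:
    pca = PCA(n_components=c).fit(X)
    X_inv = pca.inverse_transform(pca.transform(X))
    errors.append(((X - X_inv) ** 2).mean())

print(errors)  # decreasing; with c == 50 (all features) reconstruction is exact
```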
Plotting sample image with different number of PCA components ...

Selecting the best parameters with GridSearchCV
Next, take k = 120 as the PCA parameter, extract features from the training and test sets, and then call GridSearchCV to find the best SVC parameters:
n_components = 120

print('Fitting PCA by using training data ...')
t = time()
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(X_train)
print('Done in {0:.2f}s'.format(time() - t))

print('Projecting input data for PCA ...')
t = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print('Done in {0:.2f}s'.format(time() - t))

Fitting PCA by using training data ...
Done in 0.16s
Projecting input data for PCA ...
Done in 0.01s

from sklearn.model_selection import GridSearchCV

print('Searching the best parameters for SVC ...')
param_grid = {'C': [1, 5, 10, 50],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01]}
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid, verbose=2, n_jobs=4)  # n_jobs=4 launches 4 worker processes
clf = clf.fit(X_train_pca, y_train)
print('Best parameters found by grid search:')
print(clf.best_params_)
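GridSearchCV simply cross-validates every combination in param_grid and keeps the best one. A minimal, self-contained sketch on a toy dataset (the digits data and the small grid below are stand-ins for illustration, not the face pipeline):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

param_grid = {'C': [1, 10], 'gamma': [0.0001, 0.001]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3)
search.fit(X, y)

# best_params_ is one of the 4 grid combinations;
# best_estimator_ has been refit on the full data with those parameters.
print(search.best_params_)
print(search.best_score_ > 0.5)  # True
```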
Searching the best parameters for SVC ...
Fitting 3 folds for each of 20 candidates, totalling 60 fits
Best parameters found by grid search:
{'C': 10, 'gamma': 0.0005}
[Parallel(n_jobs=4)]: Done  60 out of  60 | elapsed: 1.9s finished

Evaluating model accuracy
Now use this model to predict on the test set, and check the results with confusion_matrix and classification_report respectively:
t = time()
y_pred = clf.best_estimator_.predict(X_test_pca)
cm = confusion_matrix(y_test, y_pred, labels=range(n_targets))
print('Done in {0:.2f}.\n'.format(time() - t))
print('confusion matrix:')
np.set_printoptions(threshold=np.nan)  # on newer NumPy versions use threshold=sys.maxsize instead
print(cm[:10])
Done in 0.01.

confusion matrix:
[[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred, target_names=target_names))  # again, do not swap y_test and y_pred

             precision    recall  f1-score   support

         p0       1.00      1.00      1.00         1
         p1       1.00      1.00      1.00         3
         p2       1.00      0.50      0.67         2
         p3       1.00      1.00      1.00         1
         p4       1.00      1.00      1.00         1
         p5       1.00      1.00      1.00         1
         p6       1.00      0.75      0.86         4
         p7       1.00      1.00      1.00         2
         p8       1.00      1.00      1.00         4
         p9       1.00      1.00      1.00         2
        p10       1.00      1.00      1.00         1
        p11       1.00      1.00      1.00         4
        p12       1.00      1.00      1.00         4
        p13       1.00      1.00      1.00         1
        p14       1.00      1.00      1.00         1
        p15       0.75      1.00      0.86         3
        p16       1.00      1.00      1.00         2
        p17       1.00      1.00      1.00         2
        p18       1.00      1.00      1.00         2
        p19       1.00      1.00      1.00         1
        p20       1.00      1.00      1.00         2
        p21       1.00      1.00      1.00         3
        p22       1.00      1.00      1.00         2
        p23       1.00      1.00      1.00         3
        p24       0.75      1.00      0.86         3
        p25       1.00      1.00      1.00         2
        p26       1.00      1.00      1.00         2
        p27       1.00      1.00      1.00         2
        p28       1.00      1.00      1.00         2
        p29       1.00      1.00      1.00         3
        p30       1.00      1.00      1.00         2
        p31       1.00      1.00      1.00         2
        p32       1.00      1.00      1.00         2
        p33       1.00      1.00      1.00         3
        p34       1.00      1.00      1.00         1
        p35       1.00      1.00      1.00         2
        p36       1.00      1.00      1.00         2

avg / total       0.98      0.97      0.97        80

/Users/hadoop/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py:1428: UserWarning: labels size, 37, does not match size of target_names, 40
  .format(len(labels), len(target_names))

A remaining question
The results look very promising. One question remains, though: how do we determine which person each of p0 to p37 actually corresponds to?
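The label strings were generated from the numeric targets earlier (target_names = np.array(['p%d' % t for t in targets])), so a numeric prediction can always be mapped back to its label string by indexing; attaching a real identity would require a separate lookup table maintained by hand. A sketch with an invented, purely hypothetical mapping:

```python
import numpy as np

targets = np.arange(40)
target_names = np.array(['p%d' % t for t in targets])

# Hypothetical lookup from label string to a real identity (made up here).
known_people = {'p7': 'Alice (example)', 'p12': 'Bob (example)'}

y_pred = np.array([7, 12, 3])            # pretend predictions
labels = target_names[y_pred]            # numeric class -> label string
print(list(labels))                      # ['p7', 'p12', 'p3']
print([known_people.get(l, 'unknown') for l in labels])
```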