Level 1: Importing a dataset with scikit-learn
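Task for this level
The task is to use scikit-learn's datasets module to load the iris dataset and print the first 5 rows of the raw data, the first 5 labels, and the shape of the raw data array. Concretely, implement the getIrisData() function in step1/importData.py: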
from sklearn import datasets


def getIrisData():
    """
    Load the Iris dataset.

    Returns:
        X - first 5 rows of the feature data
        y - first 5 class labels
        X_shape - shape of the full 2-D feature array
    """
    # Initialize
    X = []
    y = []
    X_shape = ()

    # Add your implementation here
    # ********** Begin ********** #
    iris = datasets.load_iris()
    X = iris.data[:5]
    y = iris.target[:5]
    X_shape = iris.data.shape
    # ********** End ********** #

    return X, y, X_shape
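As a quick check of what load_iris() returns, the sketch below inspects the loaded object outside the exercise function. The variable names (iris, n_classes) are illustrative and not part of the exercise template.

```python
from sklearn import datasets

# load_iris() returns a Bunch: a dict-like object with array and metadata attributes.
iris = datasets.load_iris()

print(iris.data.shape)      # (150, 4): 150 samples, 4 features each
print(iris.feature_names)   # names of the 4 feature columns
print(iris.target_names)    # names of the classes

# Number of distinct class labels in the target vector.
n_classes = len(set(iris.target))
print(n_classes)
```

Slicing iris.data[:5] and iris.target[:5], as the exercise does, simply takes the first five rows of these arrays.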
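Level 2: Data preprocessing (standardization)
Task for this level
In the previous level we learned how to load data with sklearn. Raw data, however, is usually messy and irregular, and feeding it straight into a model hurts prediction quality. In this level we learn to preprocess the loaded data with sklearn.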
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import scale
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing

"""
Data description:
The data contains 20,640 observations on 9 variables.

This dataset contains the average house value as target variable
and the following input variables (features): average income,
housing average age, average rooms, average bedrooms, population,
average occupation, latitude, and longitude in that order.

dataset : dict-like object with the following attributes:
    dataset.data : ndarray, shape [20640, 8]
        Each row corresponding to the 8 feature values in order.
    dataset.target : numpy array of shape (20640,)
        Each value corresponds to the average house value in units of 100,000.
    dataset.feature_names : array of length 8
        Array of ordered feature names used in the dataset.
    dataset.DESCR : string
        Description of the California housing dataset.
"""
# data_home is passed by keyword; recent scikit-learn versions do not accept it positionally
dataset = fetch_california_housing(data_home="./step4/")
X_full, y = dataset.data, dataset.target

# Pick two of the features (average income and average occupation)
X = X_full[:, [0, 5]]


def getMinMaxScalerValue():
    """
    Apply MinMaxScaler to the feature data X and return the first 5 transformed rows.

    Returns:
        X_first5 - the first 5 rows of the scaled data
    """
    X_first5 = []
    # Add your implementation here
    # ********** Begin ********** #
    X_first5 = MinMaxScaler().fit_transform(X)
    X_first5 = X_first5[:5]
    # ********** End ********** #
    return X_first5


def getScaleValue():
    """
    Apply the simple scale() standardization to the target y and return the first 5 transformed values.

    Returns:
        y_first5 - the first 5 scaled target values
    """
    y_first5 = []
    # Add your implementation here
    # ********** Begin ********** #
    y_first5 = scale(y)
    y_first5 = y_first5[:5]
    # ********** End ********** #
    return y_first5


def getStandardScalerValue():
    """
    Fit a StandardScaler on the feature data X and return the learned mean and scale.

    Returns:
        X_mean - per-feature mean
        X_scale - per-feature scaling factor
    """
    X_mean = None
    X_scale = None
    # Add your implementation here
    # ********** Begin ********** #
    # Named scaler so the imported scale() function is not shadowed
    scaler = StandardScaler().fit(X)
    X_mean = scaler.mean_
    X_scale = scaler.scale_
    # ********** End ********** #
    return X_mean, X_scale
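To make the difference between the two scalers concrete, here is a minimal sketch on a hand-made 3x2 array instead of the California housing data. The array toy and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: two features on very different scales.
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 60.0]])

# MinMaxScaler maps each column linearly onto [0, 1]:
# (x - column_min) / (column_max - column_min)
minmax = MinMaxScaler().fit_transform(toy)

# StandardScaler subtracts the column mean and divides by the column
# standard deviation, so each column ends up with mean 0 and variance 1.
scaler = StandardScaler().fit(toy)
standardized = scaler.transform(toy)

print(minmax)
print(scaler.mean_, scaler.scale_)
print(standardized)
```

For the first column, min-max scaling gives 0, 0.5 and 1, and the fitted mean_ is 2 for the first column and 30 for the second. The fitted scaler can later be applied to new data with transform(), which is the reason to keep the fitted object around rather than calling fit_transform() each time.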
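Level 3: Feature extraction from text data
Task for this level
In the previous level we learned how to standardize a dataset; standardization mainly applies to numeric data. Text cannot be used directly as training data: it first has to be converted into feature vectors through feature extraction. This level covers the basic operations for extracting features from text.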
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

categories = [
    "alt.atheism",
    "talk.religion.misc",
]

# Load the news data for these categories (857 documents)
# data_home is passed by keyword; recent scikit-learn versions do not accept it positionally
data = fetch_20newsgroups(data_home="./step5/", subset="train", categories=categories)
X = data.data


def transfer2CountVector():
    """
    Extract features with CountVectorizer and return the vocabulary size
    together with the tokens of a test string.

    Returns:
        vocab_len - scalar, size of the vocabulary
        tokenizer_list - list, the result of tokenizing test_str
    """
    vocab_len = 0
    test_str = "what's your favorite programming language?"
    tokenizer_list = []

    # Add your implementation here
    # ********** Begin ********** #
    vectorizer = CountVectorizer()
    vectorizer.fit(X)
    vocab_len = len(vectorizer.vocabulary_)
    # build_analyzer() returns the preprocessing + tokenizing callable used by the vectorizer
    analyze = vectorizer.build_analyzer()
    tokenizer_list = analyze(test_str)
    # ********** End ********** #

    return vocab_len, tokenizer_list


def transfer2TfidfVector():
    """
    Extract features with TfidfVectorizer and apply the fitted vectorizer to new test data.

    TfidfVectorizer() parameters: min_df=2, stop_words="english"
    test_data - the raw data to transform

    Returns:
        transfer_test_data - 2-D ndarray
    """
    test_data = [
        "Once again, to not believe in God is different than saying\n"
        "I BELIEVE that God does not exist. I still maintain the position, even\n"
        "after reading the FAQs, that strong atheism requires faith.\n\n \n"
        "No it in the way it is usually used. In my view, you are saying here that\n"
        "driving a car requires faith that the car drives.\n \n"
        "For me it is a conclusion, and I have no more faith in it than I have in the\n"
        "premises and the argument used.\n \n \n"
        "But first let me say the following.\n"
        "We might have a language problem here - in regards to faith and\n"
        "existence. I, as a Christian, maintain that God does not exist.\n"
        "To exist means to have being in space and time. God does not HAVE\n"
        "being - God IS Being. Kierkegaard once said that God does not\n"
        "exist, He is eternal. With this said, I feel it's rather pointless\n"
        "to debate the so called existence of God - and that is not what\n"
        "I'm doing here. I believe that God is the source and ground of\n"
        "being. When you say that god does not exist, I also accept this\n"
        "statement - but we obviously mean two different things by it. However,\n"
        "in what follows I will use the phrase the existence of God in it's\n"
        "'usual sense' - and this is the sense that I think you are using it.\n"
        "I would like a clarification upon what you mean by the existence of\n"
        "God.\n\n \n"
        "No, that's a word game. The term god is used in a different way usually.\n"
        "When you use a different definition it is your thing, but until it is\n"
        "commonly accepted you would have to say the way I define god is ... and\n"
        "that does not exist, it is existence itself, so I say it does not exist.\n \n"
        "Interestingly, there are those who say that existence exists is one of\n"
        "the indubitable statements possible.\n \n"
        "Further, saying god is existence is either a waste of time, existence is\n"
        "already used and there is no need to replace it by god, or you are implying\n"
        "more with it, in which case your definition and your argument so far\n"
        "are incomplete, making it a fallacy.\n \n \n"
        "(Deletion)\n"
        "One can never prove that God does or does not exist. When you say\n"
        "that you believe God does not exist, and that this is an opinion\n"
        "based upon observation, I will have to ask what observtions are\n"
        "you refering to? There are NO observations - pro or con - that\n"
        "are valid here in establishing a POSITIVE belief.\n"
        "(Deletion)\n \n"
        "Where does that follow? Aren't observations based on the assumption\n"
        "that something exists?\n \n"
        "And wouldn't you say there is a level of definition that the assumption\n"
        "god is is meaningful. If not, I would reject that concept anyway.\n \n"
        "So, where is your evidence for that god is is meaningful at some level?\n"
        " Benedikt\n"
    ]
    transfer_test_data = None

    # Add your implementation here
    # ********** Begin ********** #
    tfidf_vector = TfidfVectorizer(min_df=2, stop_words="english")
    tfidf_vector.fit(X)
    transfer_test_data = tfidf_vector.transform(test_data).toarray()
    # ********** End ********** #

    return transfer_test_data
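The behaviour of the two vectorizers is easier to see on a tiny corpus than on the newsgroup data. The following is a small sketch under that assumption; the three sentences in docs and all variable names are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical three-document corpus.
docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs"]

# CountVectorizer learns one column per distinct token.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)      # sparse matrix, one row per document
vocab = sorted(count_vec.vocabulary_)

# The analyzer lowercases the text and keeps tokens of two or more
# word characters, so punctuation and the lone "s" of "What's" are dropped.
tokens = count_vec.build_analyzer()("What's your favorite programming language?")

# min_df=2 keeps only terms that occur in at least two documents.
tfidf = TfidfVectorizer(min_df=2).fit(docs)
kept_terms = sorted(tfidf.vocabulary_)
vec = tfidf.transform(["the dog sat"]).toarray()

print(counts.shape, vocab)
print(tokens)
print(kept_terms, vec)
```

Words in the new text that are not in the fitted vocabulary (here "dog") are ignored by transform(), and each non-empty TF-IDF row is L2-normalized by default.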
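Level 4: Classifying the digits data with scikit-learn's SVM classifier
Task for this level
This level asks you to train a model with scikit-learn's svm module that classifies the digits dataset. The training set is the first half of the digits dataset and the test set is the second half.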
import matplotlib.pyplot as plt

# Import the dataset, classifier and metrics packages
from sklearn import datasets, svm, metrics

# Load the digits dataset
digits = datasets.load_digits()
n_samples = len(digits.data)
data = digits.data

# Use the first half of the dataset for training and the second half for testing
train_data, train_target = data[:n_samples // 2], digits.target[:n_samples // 2]
test_data, test_target = data[n_samples // 2:], digits.target[n_samples // 2:]


def createModelandPredict():
    """
    Create a classification model and predict on the test data.

    Returns:
        predicted - predicted classes for the test data
    """
    predicted = None
    # Add your implementation here
    # ********** Begin ********** #
    classifier = svm.SVC()
    classifier.fit(train_data, train_target)
    predicted = classifier.predict(test_data)
    # ********** End ********** #

    return predicted
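The exercise only returns the predictions. As a hedged follow-up, the sketch below repeats the same half-and-half split and compares the predictions with the true labels; the variable names (half, clf, accuracy) are illustrative and not from the exercise template.

```python
from sklearn import datasets, metrics, svm

digits = datasets.load_digits()
half = len(digits.data) // 2

# Train on the first half, predict the second half (same split as above).
clf = svm.SVC()
clf.fit(digits.data[:half], digits.target[:half])
predicted = clf.predict(digits.data[half:])

# Fraction of test samples whose predicted digit matches the true digit.
accuracy = metrics.accuracy_score(digits.target[half:], predicted)
print(f"accuracy on the second half: {accuracy:.3f}")
```

The exact accuracy depends on the scikit-learn version and the default SVC hyperparameters, so no specific value is quoted here.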
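Level 5: Model persistence
Required reading for this level
When the dataset is large, training a model takes a lot of time, and retraining it for every prediction is redundant and unnecessary. We can instead store the trained model and simply load it whenever new data needs to be predicted. Persisting a trained model uses Python's built-in pickle module: pickle converts a Python object into a byte stream that can be stored on disk, and can reverse the operation to restore the byte stream on disk to a Python object.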
# Import the dataset, classifier and metrics packages
from sklearn import datasets, svm, metrics
import pickle

# Load the digits dataset
digits = datasets.load_digits()
n_samples = len(digits.data)
data = digits.data

# Use the first half of the dataset for training and the second half for testing
train_data, train_target = data[:n_samples // 2], digits.target[:n_samples // 2]
test_data, test_target = data[n_samples // 2:], digits.target[n_samples // 2:]


def createModel():
    classifier = svm.SVC()
    classifier.fit(train_data, train_target)
    return classifier


local_file = "dumpfile"


def dumpModel():
    """
    Store the classification model.
    """
    clf = createModel()
    # Add your implementation here
    # ********** Begin ********** #
    # The with block closes the file even if pickling fails
    with open(local_file, "wb") as f_model:
        pickle.dump(clf, f_model)
    # ********** End ********** #


def loadModel():
    """
    Load the model and use it to predict on the test data.

    Returns:
        predicted - the model's predictions
    """
    predicted = None
    # Add your implementation here
    # ********** Begin ********** #
    with open(local_file, "rb") as f_model:
        classifier = pickle.load(f_model)
    predicted = classifier.predict(test_data)
    # ********** End ********** #

    return predicted
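The round trip described above can also be checked without touching the disk. This is a minimal sketch that uses pickle.dumps()/pickle.loads() on an in-memory byte string instead of the exercise's dumpfile; the variable names (payload, restored, same) are illustrative.

```python
import pickle

import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()
half = len(digits.data) // 2
clf = svm.SVC().fit(digits.data[:half], digits.target[:half])

# Serialize the fitted model to bytes, then rebuild it from those bytes.
payload = pickle.dumps(clf)
restored = pickle.loads(payload)

# The restored model should give exactly the same predictions as the original.
same = np.array_equal(clf.predict(digits.data[half:]),
                      restored.predict(digits.data[half:]))
print(len(payload), same)
```

Two practical caveats: unpickling can execute arbitrary code, so only load pickle files from sources you trust, and a model pickled with one scikit-learn version is not guaranteed to load correctly under another.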
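Level 6: Model evaluation (quantifying prediction quality)
Task for this level
In the previous levels we learned how to train a classification model with sklearn. How, then, do we evaluate how well the model classifies? This level uses sklearn's metric functions to quantify the prediction results.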
from sklearn.metrics import accuracy_score, precision_score, f1_score, precision_recall_fscore_support
from sklearn.svm import LinearSVC, SVC


def bin_evaluation(X_train, y_train, X_test, y_test):
    """
    Evaluate a binary classification model.
    :param X_train: training data
    :param y_train: training labels
    :param X_test: test data
    :param y_test: true labels of the test data
    :return:
        correct_num - number of correctly classified samples
        prec - precision of the positive class
        recall - recall of the positive class
        f_score - F-score of the positive class
    """
    classifier = LinearSVC()
    correct_num, prec, recall, fscore = None, None, None, None
    # Add your implementation here
    # ********** Begin ********** #
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    # normalize=False returns the count of correct predictions instead of a fraction
    correct_num = accuracy_score(y_test, y_pred, normalize=False)
    prec, recall, fscore, support = precision_recall_fscore_support(
        y_test, y_pred, average="binary", pos_label=1)
    return correct_num, prec, recall, fscore
    # ********** End ********** #


def multi_evaluation(X_train, y_train, X_test, y_test):
    """
    Evaluate a multi-class classification model.
    :param X_train: training data
    :param y_train: training labels
    :param X_test: test data
    :param y_test: true labels of the test data
    :return:
        acc - accuracy of the model
        prec - precision (macro-averaged)
        f_score - F-score (macro-averaged)
    """
    # Initialize
    acc, prec, f_score = None, None, None
    classifier = SVC(kernel="linear")
    # Add your implementation here
    # ********** Begin ********** #
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    # average="macro" takes the unweighted mean of the per-class scores
    prec, recall, f_score, support = precision_recall_fscore_support(
        y_test, y_pred, average="macro")
    return acc, prec, f_score
    # ********** End ********** #
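To see what these metric calls compute, it helps to run them on labels small enough to check by hand. The six labels below are invented for illustration and are not from the exercise data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true and predicted labels for six samples (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# normalize=False gives the number of correct predictions (4 of 6 here).
correct = accuracy_score(y_true, y_pred, normalize=False)

# Positive class: 3 true positives, 1 false positive, 1 false negative,
# so precision = 3/4 and recall = 3/4.
prec, recall, f_score, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1)

# Macro averaging: class 0 has precision 1/2, class 1 has 3/4,
# and the macro precision is their unweighted mean, 0.625.
macro_prec, macro_recall, macro_f, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")

print(correct, prec, recall, f_score)
print(macro_prec, macro_recall, macro_f)
```

When an averaging mode is given, the fourth return value (support) is None, which is why it is discarded here.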