**Contents**

1. Binary classification with logistic regression
2. Spam filtering
   2.1 Performance metrics
   2.2 Accuracy
   2.3 Precision and recall
   2.4 F1 score
   2.5 ROC and AUC
3. Hyperparameter tuning with grid search
4. Multi-class classification
5. Multi-label classification
   5.1 Multi-label performance metrics

These are study notes for *scikit-learn Machine Learning (2nd edition)*. Logistic regression is commonly used for classification tasks.

## 1. Binary classification with logistic regression

From *Statistical Learning Methods* (《统计学习方法》), the logistic regression (LR) model is based on the logistic distribution. Let $X$ be a continuous random variable; $X$ follows a logistic distribution if it has the distribution function and density function

$$F(x) = P(X \leq x) = \frac{1}{1+e^{-(x-\mu)/\gamma}}$$

$$f(x) = F'(x) = \frac{e^{-(x-\mu)/\gamma}}{\gamma\,\left(1+e^{-(x-\mu)/\gamma}\right)^2}$$

In logistic regression, when the predicted probability is greater than the threshold, the instance is predicted as the positive class; otherwise it is predicted as the negative class.

## 2. Spam filtering

Extract TF-IDF features from the messages and classify them with logistic regression.

```python
import pandas as pd

data = pd.read_csv('SMSSpamCollection', delimiter='\t', header=None)
data
data[data[0] == 'ham'][0].count()   # 4825 normal (ham) messages
data[data[0] == 'spam'][0].count()  # 747 spam messages

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X = data[1].values
y = data[0].values
from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
y = lb.fit_transform(y)

X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, random_state=520)

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_raw)
X_test = vectorizer.transform(X_test_raw)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

pred = classifier.predict(X_test)
for i, pred_i in enumerate(pred[:5]):
    print('预测为%s, 信息为%s,真实为%s' % (pred_i, X_test_raw[i], y_test[i]))
```

```
预测为0, 信息为Aww thats the first time u said u missed me without asking if I missed u first. You DO love me! :),真实为[0]
预测为0, 信息为Poor girl cant go one day lmao,真实为[0]
预测为0, 信息为Also remember the beads dont come off. Ever.,真实为[0]
预测为0, 信息为I see the letter B on my car,真实为[0]
预测为0, 信息为My love ! How come it took you so long to leave for Zahers? I got your words on ym and was happy to see them but was sad you had left. I miss you,真实为[0]
```

### 2.1 Performance metrics

Confusion matrix:

```python
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

confusion_matrix = confusion_matrix(y_test, pred)
plt.matshow(confusion_matrix)
plt.rcParams['font.sans-serif'] = 'SimHei'  # render the Chinese labels correctly
plt.title('混淆矩阵')
plt.ylabel('真实')
plt.xlabel('预测')
plt.colorbar()
```

### 2.2 Accuracy

```python
scores = cross_val_score(classifier, X_train, y_train, cv=5)
print('Accuracies: %s' % scores)
print('Mean accuracy: %s' % np.mean(scores))
```

```
Accuracies: [0.94976077 0.95933014 0.96650718 0.95215311 0.95688623]
Mean accuracy: 0.9569274847434318
```

Accuracy is not a very suitable metric here: it cannot tell whether the errors come from positives predicted as negatives or from negatives predicted as positives.

### 2.3 Precision and recall

See also [Hands On ML] 3. Classification (MNIST handwritten digit prediction). Looking at precision or recall in isolation is not meaningful.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

precisions = precision_score(y_test, pred)
print('Precision: %s' % precisions)
recalls = recall_score(y_test, pred)
print('Recall: %s' % recalls)
```

```
Precision: 0.9852941176470589
Recall: 0.6979166666666666
```

The messages predicted as spam are almost all truly spam, but about 30% of the spam messages are predicted as non-spam.
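`LogisticRegression.predict` classifies at a fixed probability cutoff of 0.5. If missing spam is the bigger concern, one option is to classify from `predict_proba` with a lower cutoff and accept some loss of precision. A minimal sketch, reusing the `classifier`, `X_test`, and `y_test` from above; the 0.3 cutoff is an arbitrary illustrative value, not taken from the book:

```python
# Sketch: trade precision for recall by lowering the decision threshold.
# predict() uses 0.5 by default; 0.3 here is purely illustrative.
proba = classifier.predict_proba(X_test)[:, 1]   # estimated P(spam) per message
pred_low = (proba >= 0.3).astype(int)

print('Precision: %s' % precision_score(y_test, pred_low))
print('Recall: %s' % recall_score(y_test, pred_low))
```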
### 2.4 F1 score

The F1 score balances precision and recall:

```python
f1s = f1_score(y_test, pred)
print('F1 score: %s' % f1s)  # F1 score: 0.8170731707317074
```

### 2.5 ROC and AUC

The closer a classifier's AUC is to 1, the better; a random classifier has an AUC of 0.5.

```python
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

false_positive_rate, recall, thresholds = roc_curve(y_test, pred)
roc_auc_score = roc_auc_score(y_test, pred)

plt.title('受试者工作特性')
plt.plot(false_positive_rate, recall, 'b', label='AUC = %0.2f' % roc_auc_score)
plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.ylabel('Recall')
plt.xlabel('Fall-out')
plt.show()
```
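The curve above is computed from hard 0/1 predictions, which gives `roc_curve` only a single operating point. It is usually fed a continuous score (for example the spam probability from `predict_proba`) so the threshold can sweep the whole range. A sketch of that variant, reusing the fitted `classifier`; the variable names are mine:

```python
# Sketch: ROC curve from predicted probabilities rather than hard labels,
# so the fall-out / recall trade-off is traced over many thresholds.
# Re-import roc_auc_score because the variable above shadowed the function.
from sklearn.metrics import roc_curve, roc_auc_score

spam_scores = classifier.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, spam_scores)
auc = roc_auc_score(y_test, spam_scores)

plt.plot(fpr, tpr, 'b', label='AUC = %0.2f' % auc)
plt.plot([0, 1], [0, 1], 'r--')
plt.xlabel('Fall-out')
plt.ylabel('Recall')
plt.legend(loc='lower right')
plt.show()
```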
## 3. Hyperparameter tuning with grid search

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])
parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),  # <step name>__<parameter name>
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10),
}

if __name__ == '__main__':
    df = pd.read_csv('./SMSSpamCollection', delimiter='\t', header=None)
    X = df[1].values
    y = df[0].values
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(y)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1,
                               scoring='accuracy', cv=3)
    grid_search.fit(X_train, y_train)

    print('Best score: %0.3f' % grid_search.best_score_)
    print('Best parameters set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print('\t%s: %r' % (param_name, best_parameters[param_name]))

    predictions = grid_search.predict(X_test)
    print('Accuracy: %s' % accuracy_score(y_test, predictions))
    print('Precision: %s' % precision_score(y_test, predictions))
    print('Recall: %s' % recall_score(y_test, predictions))
```

```
Best score: 0.985
Best parameters set:
	clf__C: 10
	clf__penalty: 'l2'
	vect__max_df: 0.5
	vect__max_features: 5000
	vect__ngram_range: (1, 2)
	vect__stop_words: None
	vect__use_idf: True
Accuracy: 0.9791816223977028
Precision: 1.0
Recall: 0.8605769230769231
```

Tuning the hyperparameters improved recall.

## 4. Multi-class classification

Movie review sentiment prediction:

```python
data = pd.read_csv('./chapter5_movie_train.csv', header=0, delimiter='\t')
data
data['Sentiment'].describe()
```

```
count    156060.000000
mean          2.063578
std           0.893832
min           0.000000
25%           2.000000
50%           2.000000
75%           3.000000
max           4.000000
Name: Sentiment, dtype: float64
```

On average the sentiment is fairly neutral.

```python
data['Sentiment'].value_counts() / data['Sentiment'].count()
```

```
2    0.509945
3    0.210989
1    0.174760
4    0.058990
0    0.045316
Name: Sentiment, dtype: float64
```

About 50% of the examples have a neutral sentiment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('./chapter5_movie_train.csv', header=0, delimiter='\t')
X, y = df['Phrase'], df['Sentiment'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5)

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])
parameters = {
    'vect__max_df': (0.25, 0.5),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'clf__C': (0.1, 1, 10),
}

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy')
grid_search.fit(X_train, y_train)

print('Best score: %0.3f' % grid_search.best_score_)
print('Best parameters set:')
best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
    print('\t%s: %r' % (param_name, best_parameters[param_name]))
```

```
Best score: 0.619
Best parameters set:
	clf__C: 10
	vect__max_df: 0.25
	vect__ngram_range: (1, 2)
	vect__use_idf: False
```

Performance metrics:

```python
predictions = grid_search.predict(X_test)

print('Accuracy: %s' % accuracy_score(y_test, predictions))
print('Confusion Matrix:')
print(confusion_matrix(y_test, predictions))
print('Classification Report:')
print(classification_report(y_test, predictions))
```

```
Accuracy: 0.6292323465333846
Confusion Matrix:
[[ 1013  1742   682   106    11]
 [  794  5914  6275   637    49]
 [  196  3207 32397  3686   222]
 [   28   488  6513  8131  1299]
 [    1    59   548  2388  1644]]
Classification Report:
              precision    recall  f1-score   support

           0       0.50      0.29      0.36      3554
           1       0.52      0.43      0.47     13669
           2       0.70      0.82      0.75     39708
           3       0.54      0.49      0.52     16459
           4       0.51      0.35      0.42      4640

    accuracy                           0.63     78030
   macro avg       0.55      0.48      0.50     78030
weighted avg       0.61      0.63      0.62     78030
```

## 5. Multi-label classification

In multi-label classification, an instance can carry several labels at once. Two common strategies:

- Problem transformation: turn each instance's label set (say L1 and L2) into a single combined label "L1 and L2", and so on. Drawback: this creates a large number of label combinations, the model can only learn the combinations present in the training data, and many combinations may never be covered.
- Train one binary classifier per label ("does this instance have L1? does it have L2?"). Drawback: this ignores the relationships between labels. A sketch of this strategy follows the metrics below.

### 5.1 Multi-label performance metrics

- Hamming loss: the average fraction of incorrectly predicted labels; 0 is best.
- Jaccard similarity: the size of the intersection of the predicted and true label sets divided by the size of their union; 1 is best.

```python
from sklearn.metrics import hamming_loss, jaccard_score
# help(jaccard_score)

print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[0.0, 1.0], [1.0, 1.0]])))
print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [1.0, 1.0]])))
print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [0.0, 1.0]])))

print(jaccard_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[0.0, 1.0], [1.0, 1.0]]), average=None))
print(jaccard_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [1.0, 1.0]]), average=None))
print(jaccard_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [0.0, 1.0]]), average=None))
```

```
0.0
0.25
0.5
[1. 1.]
[0.5 1. ]
[0. 1.]
```
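To make the one-classifier-per-label strategy concrete, here is a minimal sketch using scikit-learn's `OneVsRestClassifier` on top of a binarized label matrix. The tiny corpus and label sets below are made-up illustration data, not from the book:

```python
# Sketch: binary relevance for multi-label classification — one independent
# logistic regression per label via OneVsRestClassifier. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import hamming_loss, jaccard_score

docs = ['cheap pills buy now', 'meeting at noon',
        'buy cheap tickets now', 'lunch meeting tomorrow']
labels = [['spam', 'ads'], ['work'], ['ads'], ['work']]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)        # 0/1 indicator matrix, one column per label

X = TfidfVectorizer().fit_transform(docs)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

Y_pred = clf.predict(X)
print(mlb.classes_)
print('Hamming loss: %s' % hamming_loss(Y, Y_pred))
print('Jaccard: %s' % jaccard_score(Y, Y_pred, average='samples'))
```

`MultiLabelBinarizer` turns the label sets into an indicator matrix and `OneVsRestClassifier` fits one logistic regression per column, which is the binary-relevance scheme described above and shares its limitation of ignoring label correlations.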