Here, using a dataset from a Kaggle competition, I record the steps for finding the best parameters with Bayesian global optimization and Gaussian processes.

1. Install the Bayesian global optimization library

Install the latest version from pip:

pip install bayesian-optimization

2. Load the dataset

import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold
from scipy.stats import rankdata
from sklearn import metrics
import lightgbm as lgb
import warnings
import gc

pd.set_option('display.max_columns', 200)

train_df = pd.read_csv('../input/train.csv')
test_df = pd.read_csv('../input/test.csv')

Check the distribution of the target variable:

target = 'target'
predictors = train_df.columns.values.tolist()[2:]
train_df.target.value_counts()

The classes are imbalanced. Here I use a 50% stratified split as a hold-out, so the validation set can be used to find the best parameters; 5-fold cross-validation will be used later for the final model fit.

bayesian_tr_index, bayesian_val_index = list(
    StratifiedKFold(n_splits=2, shuffle=True, random_state=1).split(train_df, train_df.target.values)
)[0]

The bayesian_tr_index and bayesian_val_index indices will be used as the training and validation dataset indices during Bayesian optimization.

3. Black-box function to optimize (LightGBM)

With the data loaded, create the black-box function for LightGBM whose parameters we want to find:

def LGB_bayesian(
        num_leaves,               # int
        min_data_in_leaf,         # int
        learning_rate,
        min_sum_hessian_in_leaf,
        feature_fraction,
        lambda_l1,
        lambda_l2,
        min_gain_to_split,
        max_depth):               # int

    # LightGBM expects the next three parameters to be integers, so we cast them
    num_leaves = int(num_leaves)
    min_data_in_leaf = int(min_data_in_leaf)
    max_depth = int(max_depth)

    assert type(num_leaves) == int
    assert type(min_data_in_leaf) == int
    assert type(max_depth) == int

    param = {
        'num_leaves': num_leaves,
        'max_bin': 63,
        'min_data_in_leaf': min_data_in_leaf,
        'learning_rate': learning_rate,
        'min_sum_hessian_in_leaf': min_sum_hessian_in_leaf,
        'bagging_fraction': 1.0,
        'bagging_freq': 5,
        'feature_fraction': feature_fraction,
        'lambda_l1': lambda_l1,
        'lambda_l2': lambda_l2,
        'min_gain_to_split': min_gain_to_split,
        'max_depth': max_depth,
        'save_binary': True,
        'seed': 1337,
        'feature_fraction_seed': 1337,
        'bagging_seed': 1337,
        'drop_seed': 1337,
        'data_random_seed': 1337,
        'objective': 'binary',
        'boosting_type': 'gbdt',
        'verbose': 1,
        'metric': 'auc',
        'is_unbalance': True,
        'boost_from_average': False,
    }

    xg_train = lgb.Dataset(train_df.iloc[bayesian_tr_index][predictors].values,
                           label=train_df.iloc[bayesian_tr_index][target].values,
                           feature_name=predictors,
                           free_raw_data=False)
    xg_valid = lgb.Dataset(train_df.iloc[bayesian_val_index][predictors].values,
                           label=train_df.iloc[bayesian_val_index][target].values,
                           feature_name=predictors,
                           free_raw_data=False)

    num_round = 5000
    clf = lgb.train(param, xg_train, num_round, valid_sets=[xg_valid],
                    verbose_eval=250, early_stopping_rounds=50)
    predictions = clf.predict(train_df.iloc[bayesian_val_index][predictors].values,
                              num_iteration=clf.best_iteration)
    score = metrics.roc_auc_score(train_df.iloc[bayesian_val_index][target].values, predictions)
    return score

The LGB_bayesian function above will serve as the black-box function for Bayesian optimization. The training and validation datasets for LightGBM are defined inside it.

LGB_bayesian receives values for num_leaves, min_data_in_leaf, learning_rate, min_sum_hessian_in_leaf, feature_fraction, lambda_l1, lambda_l2, min_gain_to_split and max_depth from the Bayesian optimization framework. Remember that for LightGBM, num_leaves, min_data_in_leaf and max_depth must be integers, while Bayesian optimization proposes continuous values, so I cast them to integers inside the function. I only search for the best values of these parameters; you can increase or decrease the number of parameters to optimize.

Now we need to provide bounds for these parameters, so that Bayesian optimization searches only inside them:

bounds_LGB = {
    'num_leaves': (5, 20),
    'min_data_in_leaf': (5, 20),
    'learning_rate': (0.01, 0.3),
    'min_sum_hessian_in_leaf': (0.00001, 0.01),
    'feature_fraction': (0.05, 0.5),
    'lambda_l1': (0, 5.0),
    'lambda_l2': (0, 5.0),
    'min_gain_to_split': (0, 1.0),
    'max_depth': (3, 15),
}

Let's put it all into a BayesianOptimization object:

from bayes_opt import BayesianOptimization

LGB_BO = BayesianOptimization(LGB_bayesian, bounds_LGB, random_state=13)
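Before letting it loose on LightGBM, it can help to see the same bayes_opt workflow end to end on something cheap. The sketch below uses a made-up toy objective and bounds (toy_black_box, x and y are invented for illustration and are not part of the competition code), but the object API is the same one LGB_BO uses:

from bayes_opt import BayesianOptimization

def toy_black_box(x, y):
    # Hypothetical objective with its maximum at x=2, y=-1; stands in for LGB_bayesian
    return -(x - 2) ** 2 - (y + 1) ** 2

toy_bo = BayesianOptimization(toy_black_box, {'x': (0, 4), 'y': (-3, 3)}, random_state=1)
toy_bo.maximize(init_points=2, n_iter=3)  # 2 random probes, then 3 GP-guided steps
print(toy_bo.max)                         # {'target': ..., 'params': {'x': ..., 'y': ...}}

LGB_BO behaves the same way, except that every evaluation trains a LightGBM model instead of computing a cheap formula.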
Now let's look at the key space (parameters) we will optimize:

print(LGB_BO.space.keys)

I have created the BayesianOptimization object (LGB_BO); nothing happens until maximize is called. Before calling it, let me explain the two arguments of maximize that we pass to it:

init_points: the number of initial runs of random exploration to perform. LGB_bayesian is first evaluated at init_points random points inside the bounds.
n_iter: the number of Bayesian optimization runs to perform after the init_points runs.

Now it is time to call the maximization function from the Bayesian optimization framework. I let the LGB_BO object run 5 init_points and 5 n_iter:

init_points = 5
n_iter = 5

print('-' * 130)
with warnings.catch_warnings():
    warnings.filterwarnings('ignore')
    LGB_BO.maximize(init_points=init_points, n_iter=n_iter, acq='ucb', xi=0.0, alpha=1e-6)

Once the optimization has finished, let's see the maximum value we obtained:

LGB_BO.max['target']

The validation AUC for these parameters is 0.89. Let's look at the parameters:

LGB_BO.max['params']

Now we can use these parameters for our final model!

There is another cool option in the BayesianOptimization library: you can probe the LGB_bayesian function if you already have an idea of the best parameters, or if you take parameters from other kernels. I will copy and paste parameters from another kernel here. You can probe like this:

LGB_BO.probe(
    params={'feature_fraction': 0.1403,
            'lambda_l1': 4.218,
            'lambda_l2': 1.734,
            'learning_rate': 0.07,
            'max_depth': 14,
            'min_data_in_leaf': 17,
            'min_gain_to_split': 0.1501,
            'min_sum_hessian_in_leaf': 0.000446,
            'num_leaves': 6},
    lazy=True,
)

By default these points are explored lazily (lazy=True), which means they are only evaluated the next time you call maximize. Let's make that maximize call on the LGB_BO object:

LGB_BO.maximize(init_points=0, n_iter=0)  # remember: no init_points or n_iter

Finally, the list of all probed parameters and their corresponding target values is available through the attribute LGB_BO.res:

for i, res in enumerate(LGB_BO.res):
    print('Iteration {}: \n\t{}'.format(i, res))

The probed parameters gave us a better validation score! As before, I only ran LGB_BO 10 times here; in practice I would increase it to around 100.

LGB_BO.max['target']
LGB_BO.max['params']

Let's build a model with these parameters.

4. Train the LightGBM model

param_lgb = {
    'num_leaves': int(LGB_BO.max['params']['num_leaves']),              # remember to int here
    'max_bin': 63,
    'min_data_in_leaf': int(LGB_BO.max['params']['min_data_in_leaf']),  # remember to int here
    'learning_rate': LGB_BO.max['params']['learning_rate'],
    'min_sum_hessian_in_leaf': LGB_BO.max['params']['min_sum_hessian_in_leaf'],
    'bagging_fraction': 1.0,
    'bagging_freq': 5,
    'feature_fraction': LGB_BO.max['params']['feature_fraction'],
    'lambda_l1': LGB_BO.max['params']['lambda_l1'],
    'lambda_l2': LGB_BO.max['params']['lambda_l2'],
    'min_gain_to_split': LGB_BO.max['params']['min_gain_to_split'],
    'max_depth': int(LGB_BO.max['params']['max_depth']),                # remember to int here
    'save_binary': True,
    'seed': 1337,
    'feature_fraction_seed': 1337,
    'bagging_seed': 1337,
    'drop_seed': 1337,
    'data_random_seed': 1337,
    'objective': 'binary',
    'boosting_type': 'gbdt',
    'verbose': 1,
    'metric': 'auc',
    'is_unbalance': True,
    'boost_from_average': False,
}

As you can see, I saved the best parameters from LGB_BO into the param_lgb dictionary; they will be used to train the models over 5 folds.

Number of K-folds:

nfold = 5

gc.collect()

skf = StratifiedKFold(n_splits=nfold, shuffle=True, random_state=2019)

oof = np.zeros(len(train_df))
predictions = np.zeros((len(test_df), nfold))

i = 1
for train_index, valid_index in skf.split(train_df, train_df.target.values):
    print('\nfold {}'.format(i))
    xg_train = lgb.Dataset(train_df.iloc[train_index][predictors].values,
                           label=train_df.iloc[train_index][target].values,
                           feature_name=predictors,
                           free_raw_data=False)
    xg_valid = lgb.Dataset(train_df.iloc[valid_index][predictors].values,
                           label=train_df.iloc[valid_index][target].values,
                           feature_name=predictors,
                           free_raw_data=False)

    clf = lgb.train(param_lgb, xg_train, 5000, valid_sets=[xg_valid],
                    verbose_eval=250, early_stopping_rounds=50)
    oof[valid_index] = clf.predict(train_df.iloc[valid_index][predictors].values,
                                   num_iteration=clf.best_iteration)
    predictions[:, i - 1] = clf.predict(test_df[predictors], num_iteration=clf.best_iteration)
    i = i + 1

print('\n\nCV AUC: {:0.2f}'.format(metrics.roc_auc_score(train_df.target.values, oof)))

So we got 0.90 AUC with 5-fold cross-validation. Let's rank-average the 5-fold predictions.

5. Rank averaging

print('Rank averaging on', nfold, 'fold predictions')
rank_predictions = np.zeros((predictions.shape[0], 1))
for i in range(nfold):
    rank_predictions[:, 0] = np.add(rank_predictions[:, 0],
                                    rankdata(predictions[:, i].reshape(-1, 1)) / rank_predictions.shape[0])

rank_predictions /= nfold
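The loop above relies on scipy.stats.rankdata, which replaces each fold's raw probabilities with their ranks before averaging, so folds whose outputs are scaled differently still contribute equally. A quick sanity check with made-up numbers (not taken from the competition data) could look like this:

import numpy as np
from scipy.stats import rankdata

fold_preds = np.array([0.10, 0.40, 0.35])      # hypothetical probabilities from one fold
print(rankdata(fold_preds))                    # [1. 3. 2.]: the smallest probability gets rank 1
print(rankdata(fold_preds) / len(fold_preds))  # normalized ranks, as used in the loop above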
6. Submission

sub_df = pd.DataFrame({'ID_code': test_df.ID_code.values})
sub_df['target'] = rank_predictions
sub_df.to_csv('Customer_Transaction_rank_predictions.csv', index=False)
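As an optional last step that is not part of the original kernel, the written file can be read back to confirm it has one row per test ID and the two expected columns (ID_code, target) before uploading:

check = pd.read_csv('Customer_Transaction_rank_predictions.csv')
print(check.shape)   # expect (len(test_df), 2)
print(check.head())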