Using the CAME Optimizer (Confidence-guided Adaptive Memory Efficient Optimization) in PyTorch

1. Introduction

CAME introduces a confidence-guided strategy for reducing the instability of existing memory-efficient optimizers. Building on this strategy, the paper proposes CAME, which achieves two goals at once: the fast convergence of traditional adaptive methods and the low memory usage of memory-efficient methods. Extensive experiments demonstrate CAME's training stability and strong performance on a variety of NLP tasks, such as BERT and GPT-2 training.
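As a rough sketch, the confidence-guided update can be written as follows. The notation is adapted and simplified from the paper: the factored row/column statistics (which provide the memory savings), the update clipping, and the small epsilon constants are all omitted here.

```latex
% Simplified sketch of CAME's confidence-guided update
% (factored statistics, clipping, and epsilon constants omitted):
\begin{aligned}
  u_t      &= g_t / \sqrt{\hat{v}_t}                   && \text{Adafactor-style normalized update} \\
  m_t      &= \beta_1 m_{t-1} + (1 - \beta_1)\, u_t    && \text{momentum of the update} \\
  U_t      &= (u_t - m_t)^2                            && \text{instability of the current update} \\
  S_t      &= \beta_3 S_{t-1} + (1 - \beta_3)\, U_t    && \text{running average of the instability} \\
  \theta_t &= \theta_{t-1} - \alpha\, m_t / \sqrt{S_t} && \text{confidence-scaled parameter step}
\end{aligned}
```

When the current update u_t agrees with its running average m_t, the instability U_t is small and the effective step is large (high confidence); when they disagree, the step is damped. In the implementation below, `exp_avg` corresponds to m_t, `res` to U_t, and `exp_avg_res_row`/`exp_avg_res_col` hold the factored approximation of S_t.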
2. Calling the optimizer in PyTorch

(1) Defining CAME

```python
import math

import torch
import torch.optim


class CAME(torch.optim.Optimizer):
    """Implements the CAME algorithm.

    This implementation is based on:
    `CAME: Confidence-guided Adaptive Memory Efficient Optimization`

    Args:
        params (iterable): iterable of parameters to optimize or dicts defining
            parameter groups
        lr (float, optional): external learning rate (default: None)
        eps (tuple[float, float]): regularization constants for the squared
            gradient and the instability, respectively (default: (1e-30, 1e-16))
        clip_threshold (float): threshold on the root-mean-square of the
            final gradient update (default: 1.0)
        betas (tuple[float, float, float]): coefficients used for computing
            running averages of the update, squared gradient, and instability
            (default: (0.9, 0.999, 0.9999))
        weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
    """

    def __init__(
        self,
        params,
        lr=None,
        eps=(1e-30, 1e-16),
        clip_threshold=1.0,
        betas=(0.9, 0.999, 0.9999),
        weight_decay=0.0,
    ):
        assert lr > 0.
        assert all([0. <= beta <= 1. for beta in betas])

        defaults = dict(
            lr=lr,
            eps=eps,
            clip_threshold=clip_threshold,
            betas=betas,
            weight_decay=weight_decay,
        )
        super(CAME, self).__init__(params, defaults)

    @property
    def supports_memory_efficient_fp16(self):
        return True

    @property
    def supports_flat_params(self):
        return False

    def _get_options(self, param_shape):
        # Matrices (and higher-rank tensors) use the factored statistics.
        factored = len(param_shape) >= 2
        return factored

    def _rms(self, tensor):
        return tensor.norm(2) / (tensor.numel() ** 0.5)

    def _approx_sq_grad(self, exp_avg_sq_row, exp_avg_sq_col):
        # Rank-1 reconstruction of 1/sqrt(second moment) from row/column stats.
        r_factor = (
            (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1, keepdim=True))
            .rsqrt_()
            .unsqueeze(-1)
        )
        c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
        return torch.mul(r_factor, c_factor)

    def step(self, closure=None):
        """Performs a single optimization step.

        Args:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad.data
                if grad.dtype in {torch.float16, torch.bfloat16}:
                    grad = grad.float()
                if grad.is_sparse:
                    raise RuntimeError("CAME does not support sparse gradients.")

                state = self.state[p]
                grad_shape = grad.shape

                factored = self._get_options(grad_shape)
                # State initialization
                if len(state) == 0:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(grad)
                    if factored:
                        state["exp_avg_sq_row"] = torch.zeros(grad_shape[:-1]).type_as(grad)
                        state["exp_avg_sq_col"] = torch.zeros(
                            grad_shape[:-2] + grad_shape[-1:]
                        ).type_as(grad)

                        state["exp_avg_res_row"] = torch.zeros(grad_shape[:-1]).type_as(grad)
                        state["exp_avg_res_col"] = torch.zeros(
                            grad_shape[:-2] + grad_shape[-1:]
                        ).type_as(grad)
                    else:
                        state["exp_avg_sq"] = torch.zeros_like(grad)

                    state["RMS"] = 0

                state["step"] += 1
                state["RMS"] = self._rms(p.data)

                update = (grad**2) + group["eps"][0]
                if factored:
                    exp_avg_sq_row = state["exp_avg_sq_row"]
                    exp_avg_sq_col = state["exp_avg_sq_col"]

                    exp_avg_sq_row.mul_(group["betas"][1]).add_(
                        update.mean(dim=-1), alpha=1.0 - group["betas"][1]
                    )
                    exp_avg_sq_col.mul_(group["betas"][1]).add_(
                        update.mean(dim=-2), alpha=1.0 - group["betas"][1]
                    )

                    # Approximation of the exponential moving average of the squared gradient
                    update = self._approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col)
                    update.mul_(grad)
                else:
                    exp_avg_sq = state["exp_avg_sq"]

                    exp_avg_sq.mul_(group["betas"][1]).add_(
                        update, alpha=1.0 - group["betas"][1]
                    )
                    update = exp_avg_sq.rsqrt().mul_(grad)

                update.div_(
                    (self._rms(update) / group["clip_threshold"]).clamp_(min=1.0)
                )

                exp_avg = state["exp_avg"]
                exp_avg.mul_(group["betas"][0]).add_(update, alpha=1 - group["betas"][0])

                # Confidence-guided strategy:
                # calculation of the instability
                res = (update - exp_avg) ** 2 + group["eps"][1]

                if factored:
                    exp_avg_res_row = state["exp_avg_res_row"]
                    exp_avg_res_col = state["exp_avg_res_col"]

                    exp_avg_res_row.mul_(group["betas"][2]).add_(
                        res.mean(dim=-1), alpha=1.0 - group["betas"][2]
                    )
                    exp_avg_res_col.mul_(group["betas"][2]).add_(
                        res.mean(dim=-2), alpha=1.0 - group["betas"][2]
                    )

                    # Approximation of the exponential moving average of the instability
                    res_approx = self._approx_sq_grad(exp_avg_res_row, exp_avg_res_col)
                    update = res_approx.mul_(exp_avg)
                else:
                    update = exp_avg

                if group["weight_decay"] != 0:
                    p.data.add_(p.data, alpha=-group["weight_decay"] * group["lr"])

                update.mul_(group["lr"])
                p.data.add_(-update)

        return loss
```

(2) Using the CAME optimizer in a deep-learning model

As an example, the code below classifies the Iris dataset with an LSTM, adding early stopping and 10-fold cross-validation.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler


# Define the LSTM model
class LSTMClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(LSTMClassifier, self).__init__()
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        _, (hn, _) = self.lstm(x)
        out = self.fc(hn[-1])  # classify from the last LSTM hidden state
        return out


# Early stopping
class EarlyStopping:
    def __init__(self, patience=5, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.counter = 0
        self.early_stop = False

    def step(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True


# Load the data
iris = load_iris()
X = iris.data
y = iris.target

# Standardize the features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Convert to PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# Model hyperparameters
input_size = X.shape[1]  # number of features
hidden_size = 32
num_classes = 3
batch_size = 16
num_epochs = 100
learning_rate = 0.001
patience = 5

# 10-fold cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_idx = 0

for train_index, val_index in kf.split(X, y):
    fold_idx += 1
    print(f"Fold {fold_idx}")

    # Split into training and validation sets
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]

    # Build the model and the optimizer
    model = LSTMClassifier(input_size, hidden_size, num_classes)
    optimizer = CAME(
        model.parameters(),
        lr=2e-4,
        weight_decay=1e-2,
        betas=(0.9, 0.999, 0.9999),
        eps=(1e-30, 1e-16),
    )
    # optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss()

    # Early-stopping setup
    early_stopping = EarlyStopping(patience=patience)

    # Train the model
    for epoch in range(num_epochs):
        # Training phase; unsqueeze(1) turns (N, 4) into (N, 1, 4), a length-1 sequence
        model.train()
        optimizer.zero_grad()
        outputs = model(X_train.unsqueeze(1))
        loss = criterion(outputs, y_train)
        loss.backward()
        optimizer.step()

        # Validation phase
        model.eval()
        with torch.no_grad():
            val_outputs = model(X_val.unsqueeze(1))
            val_loss = criterion(val_outputs, y_val)

        # Print the losses for each epoch
        print(f"Epoch {epoch + 1}: Train Loss = {loss.item():.4f}, Val Loss = {val_loss.item():.4f}")

        # Early-stopping check
        early_stopping.step(val_loss.item())
        if early_stopping.early_stop:
            print(f"Early stopping at epoch {epoch + 1}")
            break

    # Evaluate the model
    model.eval()
    with torch.no_grad():
        val_outputs = model(X_val.unsqueeze(1))
        _, predicted = torch.max(val_outputs, 1)
        accuracy = accuracy_score(y_val, predicted)
        print(f"Fold {fold_idx} Validation Accuracy: {accuracy:.4f}\n")
```

Because CAME is aimed mainly at NLP datasets, it does not perform particularly well on Iris; this post only demonstrates how to use CAME, not how to improve accuracy or reduce training epochs.
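To show CAME in the kind of setting it was designed for, here is a minimal sketch (not part of the original example) of plugging it into a Hugging Face `transformers` fine-tuning step. It assumes `transformers` is installed and that the `CAME` class defined above is in scope; the checkpoint, texts, and labels are illustrative placeholders.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative checkpoint; any sequence-classification model would do.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Same hyperparameters as in the Iris example above.
optimizer = CAME(
    model.parameters(),
    lr=2e-4,
    weight_decay=1e-2,
    betas=(0.9, 0.999, 0.9999),
    eps=(1e-30, 1e-16),
)

# One illustrative training step on a toy batch.
batch = tokenizer(["a great movie", "a dull movie"], return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

model.train()
outputs = model(**batch, labels=labels)  # the output object carries a .loss field
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```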
Reference: Luo, Yang, et al. "CAME: Confidence-guided Adaptive Memory Efficient Optimization." arXiv preprint arXiv:2307.02047 (2023).