Paper Summary
On improving model speed: academic research today tends to focus on reducing FLOPs or parameter counts, whereas the authors argue that the focus should be on reducing the number of branches and choosing efficient network structures.
Overview
MobileOne (≈ MobileNetV1 + RepVGG + training tricks) is an ultra-lightweight architecture proposed by Apple and optimized on the iPhone 12; it reaches 75.9% Top-1 accuracy on ImageNet at an inference latency within 1 ms.
Efficient networks are highly valuable, but academic research usually targets lower FLOPs or fewer parameters, and neither metric correlates strictly with inference efficiency. FLOPs, for instance, ignore memory access cost and the degree of parallelism: parameter-free operations (such as the Add and Concat introduced by skip connections) incur significant memory access cost and noticeably lengthen inference time.
(Figure: latency vs. parameter count and vs. FLOPs.) As the figure above shows, latency correlates only weakly with a model's parameter count or FLOPs, and the correlation is even weaker on CPU.
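Because of this, it is better to measure wall-clock latency on the target device than to infer speed from proxy metrics. A rough sketch of both measurements (the timing loop is illustrative; fvcore is the library the paper later cites for FLOP counting):

import time
import torch

def measure_latency_ms(model, x, warmup=10, iters=100):
    # wall-clock latency per forward pass, in milliseconds
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
    return (time.perf_counter() - t0) / iters * 1000

# FLOPs, by contrast, come from static analysis and ignore memory traffic:
# from fvcore.nn import FlopCountAnalysis
# flops = FlopCountAnalysis(model, (x,)).total()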
Key Bottlenecks
The paper analyzes two bottlenecks that affect model latency and proposes corresponding remedies.
Activation functions
As the table below shows, networks with identical architectures but different activation functions differ drastically in latency. The paper therefore defaults to ReLU.

Structural blocks
As the table below shows, a model runs faster when it adopts a single-branch structure. To improve accuracy at a modest cost, the paper uses SE blocks only sparingly, in the larger model configurations.

MobileOne Block
The core MobileOne block is based on MobileNetV1's depthwise-separable design and absorbs the re-parameterization idea, giving the structure shown below.

Figure 3. The MobileOne block has different structures at train time and at inference time. Left: train-time MobileOne block with re-parameterizable branches. Right: MobileOne block at inference, with the branches re-parameterized away. ReLU or SE-ReLU is used as the activation. The trivial over-parameterization factor k is a hyperparameter tuned per variant.
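Re-parameterization works because conv and BN are both linear at inference: each conv + BN branch first folds into a single conv, and the parallel depthwise branches (k 3x3 kernels, one 1x1 kernel padded to 3x3, plus the BN identity) then sum into one kernel. A minimal sketch of the two folding steps, assuming 3x3/1x1 depthwise branches (helper names are illustrative, not the paper's code):

import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    # BN(conv(x)) == conv'(x): rescale the kernel and absorb the shift into a bias
    std = (bn.running_var + bn.eps).sqrt()
    w = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
    b = bn.bias - bn.running_mean * bn.weight / std
    return w, b

def merge_branches(w3, b3, w1, b1):
    # pad the fused 1x1 kernel to 3x3, then sum the parallel branches
    return w3 + F.pad(w1, [1, 1, 1, 1]), b3 + b1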
Structure
Recent work scales model dimensions such as width, depth, and resolution to improve performance [22,54]. MobileOne uses depth scaling similar to MobileNetV2: the early stages, where the input resolution is large, are kept shallower, since those layers are much slower than the later stages that operate at smaller resolutions. Five width scales are introduced, as shown in Table 7.
In the table, α controls the channel widths and k is the number of re-parameterizable branches.
The variants differ mainly in width; S0 additionally uses more branches (k = 4), while S1-S4 use k = 1.

Table 8. Ablation of MobileOne-S2 over various training settings, reporting Top-1 accuracy on ImageNet.

Figure 4. Train and validation loss curves for the MobileOne-S0 model. Going from no branches to re-parameterizable branches with k = 1 lowers train loss by 3.4%. Adding more branches (k = 4) lowers train loss by a further ~1%. From no branches to the re-parameterizable variant with k = 4, validation loss improves by 3.1%.
Experiments

For S1 and S2 we use standard augmentation: random resized crop and horizontal flip. All MobileOne variants are trained with EMA (exponential moving average) weight averaging with a decay constant of 0.9995. At test time, all MobileOne models are evaluated on 224 × 224 images. Table 9 compares against recent efficient models, all evaluated at 224 × 224, with parameter counts under 20M, and trained without distillation (unlike prior work such as [5,45]). FLOP counts are reported using the fvcore [17] library.
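The EMA used here keeps a shadow copy of the weights, updated each step as w_ema ← d · w_ema + (1 − d) · w with d = 0.9995; evaluation then uses the shadow weights. A minimal sketch (illustrative, not the paper's training code):

import copy
import torch

class ModelEMA:
    def __init__(self, model, decay=0.9995):
        self.ema = copy.deepcopy(model).eval()  # shadow weights, used for evaluation
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # call once per optimizer step
        msd = model.state_dict()
        for k, v in self.ema.state_dict().items():
            if v.dtype.is_floating_point:
                v.mul_(self.decay).add_(msd[k].detach(), alpha=1 - self.decay)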
We show that even the smallest transformer-architecture variants have latencies above 4 ms on mobile. The current state-of-the-art MobileFormer [5] reaches 79.3% top-1 accuracy at 70.76 ms latency, while MobileOne-S4 reaches 79.4% at 1.86 ms, which is 38× faster on the device. MobileOne-S3 has 1% higher top-1 accuracy than EfficientNet-B0 while being 11% faster on mobile. Compared with competing methods, our models also have lower latency on desktop CPU and GPU.
Discussion
We proposed an efficient, general-purpose backbone for mobile devices, suitable for general tasks such as image classification, object detection, and semantic segmentation. We showed that in the efficient regime, latency may correlate poorly with proxy metrics such as parameter count and FLOPs. We further analyzed the efficiency bottlenecks of the architectural components used in modern efficient CNNs by measuring their latency directly on a mobile device, and demonstrated empirically that re-parameterizable structures alleviate these bottlenecks. Our model-scaling strategy with re-parameterizable structures attains state-of-the-art performance while remaining efficient on both mobile devices and desktop CPUs.
Core Code
Source implementation
import torch
import torch.nn as nn


def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
    # conv followed by BN; folds into a single conv at deploy time
    result = nn.Sequential()
    result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                        kernel_size=kernel_size, stride=stride, padding=padding,
                                        groups=groups, bias=False))
    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
    return result


class DepthWiseConv(nn.Module):
    def __init__(self, inc, kernel_size, stride=1):
        super().__init__()
        padding = 1
        if kernel_size == 1:
            padding = 0
        self.conv = conv_bn(inc, inc, kernel_size, stride, padding, inc)

    def forward(self, x):
        return self.conv(x)


class PointWiseConv(nn.Module):
    def __init__(self, inc, outc):
        super().__init__()
        self.conv = conv_bn(inc, outc, 1, 1, 0)

    def forward(self, x):
        return self.conv(x)


class MobileOneBlock(nn.Module):
    def __init__(self, in_channels, out_channels, k,
                 stride=1, dilation=1, padding_mode='zeros', deploy=False, use_se=False):
        super(MobileOneBlock, self).__init__()
        self.deploy = deploy
        self.in_channels = in_channels
        self.out_channels = out_channels
        kernel_size = 3
        padding = 1
        assert kernel_size == 3
        assert padding == 1
        self.k = k
        self.nonlinearity = nn.ReLU()

        if use_se:
            # self.se = SEBlock(out_channels, internal_neurons=out_channels // 16)
            raise NotImplementedError('SEBlock is not included in this port')
        else:
            self.se = nn.Identity()

        if deploy:
            # inference-time structure: one depthwise conv + one pointwise conv
            self.dw_reparam = nn.Conv2d(in_channels=in_channels, out_channels=in_channels,
                                        kernel_size=kernel_size, stride=stride, padding=padding,
                                        dilation=dilation, groups=in_channels, bias=True,
                                        padding_mode=padding_mode)
            self.pw_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                        kernel_size=1, stride=1, bias=True)
        else:
            # train-time structure: k 3x3 depthwise branches + a 1x1 branch + BN identity
            self.dw_bn_layer = nn.BatchNorm2d(in_channels) if out_channels == in_channels and stride == 1 else None
            for k_idx in range(k):
                setattr(self, f'dw_3x3_{k_idx}', DepthWiseConv(in_channels, 3, stride=stride))
            self.dw_1x1 = DepthWiseConv(in_channels, 1, stride=stride)

            self.pw_bn_layer = nn.BatchNorm2d(in_channels) if out_channels == in_channels and stride == 1 else None
            for k_idx in range(k):
                setattr(self, f'pw_1x1_{k_idx}', PointWiseConv(in_channels, out_channels))

    def forward(self, inputs):
        if self.deploy:
            x = self.dw_reparam(inputs)
            x = self.nonlinearity(x)
            x = self.pw_reparam(x)
            x = self.nonlinearity(x)
            return x

        # depthwise stage: BN identity + 1x1 branch + k 3x3 branches
        if self.dw_bn_layer is None:
            id_out = 0
        else:
            id_out = self.dw_bn_layer(inputs)
        x_conv_3x3 = []
        for k_idx in range(self.k):
            x = getattr(self, f'dw_3x3_{k_idx}')(inputs)
            x_conv_3x3.append(x)
        x_conv_1x1 = self.dw_1x1(inputs)
        x = id_out + x_conv_1x1 + sum(x_conv_3x3)
        x = self.nonlinearity(self.se(x))

        # pointwise stage: BN identity + k 1x1 branches
        if self.pw_bn_layer is None:
            id_out = 0
        else:
            id_out = self.pw_bn_layer(x)
        x_conv_1x1 = []
        for k_idx in range(self.k):
            x_conv_1x1.append(getattr(self, f'pw_1x1_{k_idx}')(x))
        x = id_out + sum(x_conv_1x1)
        x = self.nonlinearity(x)
        return x


class MobileOne(nn.Module):
    def __init__(self, in_channels, out_channels, n, k,
                 stride=1, dilation=1, padding_mode='zeros', deploy=False, use_se=False):
        super().__init__()
        # chain n blocks; only the first block changes the channel count
        self.m = nn.Sequential(*[
            MobileOneBlock(in_channels if i == 0 else out_channels, out_channels, k, stride, deploy=deploy)
            for i in range(n)])

    def forward(self, x):
        return self.m(x)
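A quick smoke test for the code above (illustrative); note that this port only defines the deploy-time layers and does not include the weight-fusion step sketched earlier:

x = torch.randn(1, 64, 56, 56)
block = MobileOneBlock(64, 64, 4)     # train-time: 4 branches + 1x1 branch + BN identity
print(block(x).shape)                 # torch.Size([1, 64, 56, 56])

stage = MobileOne(64, 128, 2, 4)      # two chained blocks, 64 -> 128 channels
print(stage(x).shape)                 # torch.Size([1, 128, 56, 56])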
common.py
# This file contains modules common to various models
import math

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

from utils.general import non_max_suppression
# ---------- MobileOne ----------
# conv_bn, DepthWiseConv, PointWiseConv, MobileOneBlock and MobileOne:
# identical to the "Source implementation" block above; paste it here.
# ---------- ConvNeXt block ----------

class LayerNorm_s(nn.Module):
    def __init__(self, normalized_shape, eps=1e-6, data_format='channels_last'):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.bias = nn.Parameter(torch.zeros(normalized_shape))
        self.eps = eps
        self.data_format = data_format
        if self.data_format not in ['channels_last', 'channels_first']:
            raise NotImplementedError
        self.normalized_shape = (normalized_shape,)

    def forward(self, x):
        if self.data_format == 'channels_last':
            return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        elif self.data_format == 'channels_first':
            u = x.mean(1, keepdim=True)
            s = (x - u).pow(2).mean(1, keepdim=True)
            x = (x - u) / torch.sqrt(s + self.eps)
            x = self.weight[:, None, None] * x + self.bias[:, None, None]
            return x


class ConvNextBlock(nn.Module):
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm_s(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones(dim),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        input = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma * x
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = input + self.drop_path(x)
        return x


class DropPath(nn.Module):
    # Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks)
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path_f(x, self.drop_prob, self.training)


def drop_path_f(x, drop_prob: float = 0., training: bool = False):
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output


class CNeB(nn.Module):
    # CSP ConvNextBlock with 3 convolutions by iscyy/yoloair
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        self.m = nn.Sequential(*(ConvNextBlock(c_) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
# ---------- ACmix ----------

def position(H, W, is_cuda=True):
    if is_cuda:
        loc_w = torch.linspace(-1.0, 1.0, W).cuda().unsqueeze(0).repeat(H, 1)
        loc_h = torch.linspace(-1.0, 1.0, H).cuda().unsqueeze(1).repeat(1, W)
    else:
        loc_w = torch.linspace(-1.0, 1.0, W).unsqueeze(0).repeat(H, 1)
        loc_h = torch.linspace(-1.0, 1.0, H).unsqueeze(1).repeat(1, W)
    loc = torch.cat([loc_w.unsqueeze(0), loc_h.unsqueeze(0)], 0).unsqueeze(0)
    return loc


def stride(x, stride):
    b, c, h, w = x.shape
    return x[:, :, ::stride, ::stride]


def init_rate_half(tensor):
    if tensor is not None:
        tensor.data.fill_(0.5)


def init_rate_0(tensor):
    if tensor is not None:
        tensor.data.fill_(0.)


class ACmix(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_att=7, head=4, kernel_conv=3, stride=1, dilation=1):
        super(ACmix, self).__init__()
        self.in_planes = in_planes
        self.out_planes = out_planes
        self.head = head
        self.kernel_att = kernel_att
        self.kernel_conv = kernel_conv
        self.stride = stride
        self.dilation = dilation
        self.rate1 = torch.nn.Parameter(torch.Tensor(1))
        self.rate2 = torch.nn.Parameter(torch.Tensor(1))
        self.head_dim = self.out_planes // self.head

        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv2 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv3 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv_p = nn.Conv2d(2, self.head_dim, kernel_size=1)

        self.padding_att = (self.dilation * (self.kernel_att - 1) + 1) // 2
        self.pad_att = torch.nn.ReflectionPad2d(self.padding_att)
        self.unfold = nn.Unfold(kernel_size=self.kernel_att, padding=0, stride=self.stride)
        self.softmax = torch.nn.Softmax(dim=1)

        self.fc = nn.Conv2d(3 * self.head, self.kernel_conv * self.kernel_conv, kernel_size=1, bias=False)
        self.dep_conv = nn.Conv2d(self.kernel_conv * self.kernel_conv * self.head_dim, out_planes,
                                  kernel_size=self.kernel_conv, bias=True, groups=self.head_dim, padding=1,
                                  stride=stride)

        self.reset_parameters()

    def reset_parameters(self):
        init_rate_half(self.rate1)
        init_rate_half(self.rate2)
        kernel = torch.zeros(self.kernel_conv * self.kernel_conv, self.kernel_conv, self.kernel_conv)
        for i in range(self.kernel_conv * self.kernel_conv):
            kernel[i, i // self.kernel_conv, i % self.kernel_conv] = 1.
        kernel = kernel.squeeze(0).repeat(self.out_planes, 1, 1, 1)
        self.dep_conv.weight = nn.Parameter(data=kernel, requires_grad=True)
        self.dep_conv.bias = init_rate_0(self.dep_conv.bias)

    def forward(self, x):
        q, k, v = self.conv1(x), self.conv2(x), self.conv3(x)
        scaling = float(self.head_dim) ** -0.5
        b, c, h, w = q.shape
        h_out, w_out = h // self.stride, w // self.stride

        # attention path: positional encoding
        pe = self.conv_p(position(h, w, x.is_cuda))

        q_att = q.view(b * self.head, self.head_dim, h, w) * scaling
        k_att = k.view(b * self.head, self.head_dim, h, w)
        v_att = v.view(b * self.head, self.head_dim, h, w)

        if self.stride > 1:
            q_att = stride(q_att, self.stride)
            q_pe = stride(pe, self.stride)
        else:
            q_pe = pe

        unfold_k = self.unfold(self.pad_att(k_att)).view(b * self.head, self.head_dim,
                                                         self.kernel_att * self.kernel_att, h_out,
                                                         w_out)  # b*head, head_dim, k_att^2, h_out, w_out
        unfold_rpe = self.unfold(self.pad_att(pe)).view(1, self.head_dim, self.kernel_att * self.kernel_att,
                                                        h_out, w_out)  # 1, head_dim, k_att^2, h_out, w_out

        att = (q_att.unsqueeze(2) * (unfold_k + q_pe.unsqueeze(2) - unfold_rpe)).sum(1)  # b*head, k_att^2, h_out, w_out
        att = self.softmax(att)

        out_att = self.unfold(self.pad_att(v_att)).view(b * self.head, self.head_dim,
                                                        self.kernel_att * self.kernel_att, h_out, w_out)
        out_att = (att.unsqueeze(1) * out_att).sum(2).view(b, self.out_planes, h_out, w_out)

        # convolution path
        f_all = self.fc(torch.cat([q.view(b, self.head, self.head_dim, h * w),
                                   k.view(b, self.head, self.head_dim, h * w),
                                   v.view(b, self.head, self.head_dim, h * w)], 1))
        f_conv = f_all.permute(0, 2, 1, 3).reshape(x.shape[0], -1, x.shape[-2], x.shape[-1])
        out_conv = self.dep_conv(f_conv)
        return self.rate1 * out_att + self.rate2 * out_conv


def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


def DWConv(c1, c2, k=1, s=1, act=True):
    # Depthwise convolution
    return Conv(c1, c2, k, s, g=math.gcd(c1, c2), act=act)


class h_sigmoid(nn.Module):
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


class h_swish(nn.Module):
    def __init__(self, inplace=True):
        super(h_swish, self).__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)


class CA(nn.Module):
    # Coordinate Attention for Efficient Mobile Network Design:
    # embeds positional information into channel attention by factorizing it
    # into two 1D feature encodings, one along H and one along W.
    def __init__(self, inp, oup, reduction=32):
        super(CA, self).__init__()
        mip = max(8, inp // reduction)
        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(mip)
        self.act = h_swish()
        self.conv_h = nn.Conv2d(mip, oup, kernel_size=1, stride=1, padding=0)
        self.conv_w = nn.Conv2d(mip, oup, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        identity = x
        n, c, h, w = x.size()
        pool_h = nn.AdaptiveAvgPool2d((h, 1))
        pool_w = nn.AdaptiveAvgPool2d((1, w))
        x_h = pool_h(x)
        x_w = pool_w(x).permute(0, 1, 3, 2)
        y = torch.cat([x_h, x_w], dim=2)
        y = self.conv1(y)
        y = self.bn1(y)
        y = self.act(y)
        x_h, x_w = torch.split(y, [h, w], dim=2)
        x_w = x_w.permute(0, 1, 3, 2)
        a_h = self.conv_h(x_h).sigmoid()
        a_w = self.conv_w(x_w).sigmoid()
        out = identity * a_w * a_h
        return out


class space_to_depth(nn.Module):
    # Changing the dimension of the Tensor
    def __init__(self, dimension=1):
        super().__init__()
        self.d = dimension

    def forward(self, x):
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)


class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.Hardswish() if act else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))


class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        super(Bottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class BottleneckCSP(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(BottleneckCSP, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)
        self.act = nn.LeakyReLU(0.1, inplace=True)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1))))


class SPP(nn.Module):
    # Spatial pyramid pooling layer used in YOLOv3-SPP
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super(SPP, self).__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))


class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))


class Concat(nn.Module):
    # Concatenate a list of tensors along dimension
    def __init__(self, dimension=1):
        super(Concat, self).__init__()
        self.d = dimension

    def forward(self, x):
        return torch.cat(x, self.d)


class NMS(nn.Module):
    # Non-Maximum Suppression (NMS) module
    conf = 0.3  # confidence threshold
    iou = 0.6  # IoU threshold
    classes = None  # (optional list) filter by class

    def __init__(self, dimension=1):
        super(NMS, self).__init__()

    def forward(self, x):
        return non_max_suppression(x[0], conf_thres=self.conf, iou_thres=self.iou, classes=self.classes)


class Flatten(nn.Module):
    # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions
    @staticmethod
    def forward(x):
        return x.view(x.size(0), -1)


class Classify(nn.Module):
    # Classification head, i.e. x(b,c1,20,20) to x(b,c2)
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Classify, self).__init__()
        self.aap = nn.AdaptiveAvgPool2d(1)  # to x(b,c1,1,1)
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)  # to x(b,c2,1,1)
        self.flat = Flatten()

    def forward(self, x):
        z = torch.cat([self.aap(y) for y in (x if isinstance(x, list) else [x])], 1)  # cat if list
        return self.flat(self.conv(z))  # flatten to x(b,c2)

yolo.py
import argparse
import logging
import math
import sys
from copy import deepcopy
from pathlib import Path

sys.path.append('./')  # to run '$ python *.py' files in subdirectories
logger = logging.getLogger(__name__)

import torch
import torch.nn as nn

from models.common import (Conv, Bottleneck, SPP, DWConv, Focus, BottleneckCSP, Concat, NMS,
                           space_to_depth, CA, ACmix, CNeB, MobileOne)
from models.experimental import MixConv2d, CrossConv, C3
from models.gc import CB2D
from utils.general import check_anchor_order, make_divisible, check_file, set_logging
from utils.torch_utils import (time_synchronized, fuse_conv_and_bn, model_info, scale_img,
                               initialize_weights, select_device)


class Detect(nn.Module):
    stride = None  # strides computed during build
    export = False  # onnx export

    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer
        super(Detect, self).__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        a = torch.tensor(anchors).float().view(self.nl, -1, 2)
        self.register_buffer('anchors', a)  # shape(nl,na,2)
        self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv

    def forward(self, x):
        z = []  # inference output
        self.training |= self.export
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)

                y = x[i].sigmoid()
                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device)) * self.stride[i]  # xy
                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1), x)

    @staticmethod
    def _make_grid(nx=20, ny=20):
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()


class Model(nn.Module):
    def __init__(self, cfg='yolov5s.yaml', ch=3, nc=None):  # model, input channels, number of classes
        super(Model, self).__init__()
        if isinstance(cfg, dict):
            self.yaml = cfg  # model dict
        else:  # is *.yaml
            import yaml  # for torch hub
            self.yaml_file = Path(cfg).name
            with open(cfg) as f:
                self.yaml = yaml.load(f, Loader=yaml.FullLoader)  # model dict

        # Define model
        if nc and nc != self.yaml['nc']:
            print('Overriding model.yaml nc=%g with nc=%g' % (self.yaml['nc'], nc))
            self.yaml['nc'] = nc  # override yaml value
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])  # model, savelist, ch_out

        # Build strides, anchors
        m = self.model[-1]  # Detect()
        if isinstance(m, Detect):
            s = 128  # 2x min stride
            m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward
            m.anchors /= m.stride.view(-1, 1, 1)
            check_anchor_order(m)
            self.stride = m.stride
            self._initialize_biases()  # only run once

        # Init weights, biases
        initialize_weights(self)
        self.info()
        print('')

    def forward(self, x, augment=False, profile=False):
        if augment:
            img_size = x.shape[-2:]  # height, width
            s = [1, 0.83, 0.67]  # scales
            f = [None, 3, None]  # flips (2-ud, 3-lr)
            y = []  # outputs
            for si, fi in zip(s, f):
                xi = scale_img(x.flip(fi) if fi else x, si)
                yi = self.forward_once(xi)[0]  # forward
                yi[..., :4] /= si  # de-scale
                if fi == 2:
                    yi[..., 1] = img_size[0] - yi[..., 1]  # de-flip ud
                elif fi == 3:
                    yi[..., 0] = img_size[1] - yi[..., 0]  # de-flip lr
                y.append(yi)
            return torch.cat(y, 1), None  # augmented inference, train
        else:
            return self.forward_once(x, profile)  # single-scale inference, train

    def forward_once(self, x, profile=False):
        y, dt = [], []  # outputs
        i = 1
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers

            if profile:
                try:
                    import thop
                    o = thop.profile(m, inputs=(x,), verbose=False)[0] / 1E9 * 2  # FLOPS
                except:
                    o = 0
                t = time_synchronized()
                for _ in range(10):
                    _ = m(x)
                dt.append((time_synchronized() - t) * 100)
                print('%10.1f%10.0f%10.1fms %-40s' % (o, m.np, dt[-1], m.type))

            x = m(x)  # run
            # print('layer', i, 'feature map size', x.shape)
            i += 1
            y.append(x if m.i in self.save else None)  # save output

        if profile:
            print('%.1fms total' % sum(dt))
        return x

    def _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency
        # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
        m = self.model[-1]  # Detect() module
        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            with torch.no_grad():
                b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
                b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

    def _print_biases(self):
        m = self.model[-1]  # Detect() module
        for mi in m.m:  # from
            b = mi.bias.detach().view(m.na, -1).T  # conv.bias(255) to (3,85)
            print(('%6g Conv2d.bias:' + '%10.3g' * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))

    def fuse(self):  # fuse model Conv2d() + BatchNorm2d() layers
        print('Fusing layers... ')
        for m in self.model.modules():
            if type(m) is Conv and hasattr(m, 'bn'):
                m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
                m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
                delattr(m, 'bn')  # remove batchnorm
                m.forward = m.fuseforward  # update forward
        self.info()
        return self

    def add_nms(self):  # add NMS module
        if type(self.model[-1]) is not NMS:  # if missing NMS
            print('Adding NMS module... ')
            m = NMS()  # module
            m.f = -1  # from
            m.i = self.model[-1].i + 1  # index
            self.model.add_module(name='%s' % m.i, module=m)  # add
        return self

    def info(self, verbose=False):  # print model information
        model_info(self, verbose)


def parse_model(d, ch):  # model_dict, input_channels(3)
    logger.info('\n%3s%18s%3s%10s  %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)

    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            try:
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings
            except:
                pass

        n = max(round(n * gd), 1) if n > 1 else n  # depth gain
        if m in [Conv, Bottleneck, SPP, DWConv, MixConv2d, Focus, CrossConv, BottleneckCSP, C3]:
            c1, c2 = ch[f], args[0]
            c2 = make_divisible(c2 * gw, 8) if c2 != no else c2
            args = [c1, c2, *args[1:]]
            if m in [BottleneckCSP, C3]:
                args.insert(2, n)
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum([ch[-1 if x == -1 else x + 1] for x in f])
        elif m is Detect:
            args.append([ch[x + 1] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
        elif m is space_to_depth:
            c2 = 4 * ch[f]
        elif m in [CA]:
            c1, c2 = ch[f], args[0]
            if c2 != no:  # if not output
                c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
        elif m in [CB2D]:
            c1, c2 = ch[f], args[0]
            if c2 != no:  # if not output
                c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
            if m in [CB2D]:
                args.insert(2, n)  # number of repeats
                n = 1
        elif m in [ACmix]:
            c1, c2 = ch[f], args[0]
            if c2 != no:  # if not output
                c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
        elif m is CNeB:
            c1, c2 = ch[f], args[0]
            if c2 != no:
                c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
            if m is CNeB:
                args.insert(2, n)
                n = 1
        elif m is MobileOne:
            c1, c2 = ch[f], args[0]
            c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, n, *args[1:]]
            n = 1  # repeats are handled inside MobileOne itself
        else:
            c2 = ch[f]

        m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
        np = sum([x.numel() for x in m_.parameters()])  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params
        logger.info('%3s%18s%3s%10.0f  %-40s%-30s' % (i, f, n, np, t, args))  # print
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        ch.append(c2)
    return nn.Sequential(*layers), sorted(save)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--cfg', type=str, default='yolov5s.yaml', help='model.yaml')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    opt = parser.parse_args()
    opt.cfg = check_file(opt.cfg)  # check file
    set_logging()
    device = select_device(opt.device)

    # Create model
    model = Model(opt.cfg).to(device)
    model.train()

    # Profile
    # img = torch.rand(8 if torch.cuda.is_available() else 1, 3, 640, 640).to(device)
    # y = model(img, profile=True)

    # ONNX export
    # model.model[-1].export = True
    # torch.onnx.export(model, img, opt.cfg.replace('.yaml', '.onnx'), verbose=True, opset_version=11)

yolov5s_mobileone.yaml
# parameters
nc: 2 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],     # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, MobileOne, [128, 4, 1]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, MobileOne, [256, 4, 1]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, MobileOne, [512, 4, 1]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, MobileOne, [1024, 4, 1]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)
   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
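For reference, this is how one backbone entry expands inside parse_model under the multiples above, plus a quick build check (a worked example, values computed from the yaml):

# [-1, 3, MobileOne, [128, 4, 1]] with depth_multiple=0.33, width_multiple=0.50:
#   n  = max(round(3 * 0.33), 1) = 1          # chained MobileOne blocks
#   c2 = make_divisible(128 * 0.50, 8) = 64   # output channels
#   => MobileOne(c1, 64, 1, 4, 1)             # k = 4 branches, stride 1

# sanity-check that the model builds:
#   python models/yolo.py --cfg yolov5s_mobileone.yaml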
References

Paper:
https://arxiv.org/pdf/2206.04040.pdf

Code:
GitHub - apple/ml-mobileone: official implementation of the paper "MobileOne: An Improved One millisecond Mobile Backbone" (https://github.com/apple/ml-mobileone)