news 2025/12/21 0:31:42

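Contents

today
The torch.meshgrid() function

today

Today we walk through the AnchorsGenerator code, which lives in the rpn_function file in the network_files folder. Starting from the forward() function of the RegionProposalNetwork() class, we first enter the head, i.e. RPNHead (the part enclosed by the smaller dashed box in the network diagram). The feature matrix produced by the backbone is passed into RPNHead. Let's go straight to the code:

```python
# Imports needed by the snippets in this post
from typing import Dict, List, Optional, Tuple

import torch
from torch import nn, Tensor
import torch.nn.functional as F


class RPNHead(nn.Module):
    def __init__(self, in_channels, num_anchors):
        super(RPNHead, self).__init__()
        # 3x3 sliding window
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
        # predicted objectness score ("object" here only means foreground or background)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        # predicted bbox regression parameters
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1, stride=1)

        for layer in self.children():
            if isinstance(layer, nn.Conv2d):
                torch.nn.init.normal_(layer.weight, std=0.01)
                torch.nn.init.constant_(layer.bias, 0)

    def forward(self, x):
        # type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        logits = []
        bbox_reg = []
        for i, feature in enumerate(x):
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg
```

First, a 3x3 sliding window (a convolution) is initialized. Its input channel count is the backbone's output channel count (1280), and its output channel count stays the same (1280); the kernel size, stride and padding need no further comment. Next come a 1x1 classification conv layer and a 1x1 box-prediction conv layer, both of which take the previous layer's output channels as input. Note that the classification layer outputs as many channels as there are anchors per location: binary cross-entropy loss is used, so a single score per anchor is enough, and the "object" being scored only means foreground or background. The box-prediction layer outputs 4 times the number of anchors, i.e. 4 bbox regression parameters per anchor. These are later used to adjust the anchor's box, which is described by its top-left corner (two coordinates) and bottom-right corner (two coordinates). After that, the code iterates over children() and, for every Conv2d layer, initializes the weights from a normal distribution with mean 0 and standard deviation 0.01 and sets the bias to 0.

Finally, look at the forward pass. Two lists are initialized to hold the predicted objectness scores and the predicted bbox regression parameters, and the code loops over the feature maps produced by the backbone. As mentioned earlier, MobileNetV2 is used for feature extraction, so there is only one feature map and the loop runs exactly once. Debugging shows that logits and bbox_reg each contain a single element; the shapes I got were (8, 15, 25, 38) and (8, 60, 25, 38). Here 8 is the batch size (8 images), 15 is the number of anchors predicted at each location, 60 is those 15 anchors times 4 regression parameters, and 25x38 is the size of the feature map after the convolutions. The last two dimensions may differ when you debug this yourself: the first batch picked by the dataloader is random, so the image size can change, and the size of the backbone's output features changes with it.

```python
class AnchorsGenerator(nn.Module):
    # Annotation dict: declares the element types of the two attributes below
    __annotations__ = {
        "cell_anchors": Optional[List[torch.Tensor]],
        "_cache": Dict[str, List[torch.Tensor]]
    }

    """
    anchors generator
    Arguments:
        sizes (Tuple[Tuple[int]]):
        aspect_ratios (Tuple[Tuple[float]]):
    """

    def __init__(self, sizes=(128, 256, 512), aspect_ratios=(0.5, 1.0, 2.0)):
        super(AnchorsGenerator, self).__init__()

        if not isinstance(sizes[0], (list, tuple)):
            # TODO change this
            sizes = tuple((s,) for s in sizes)
        if not isinstance(aspect_ratios[0], (list, tuple)):
            aspect_ratios = (aspect_ratios,) * len(sizes)

        assert len(sizes) == len(aspect_ratios)

        self.sizes = sizes
        self.aspect_ratios = aspect_ratios
        self.cell_anchors = None
        self._cache = {}
```

As before, start with the arguments of the AnchorsGenerator class. sizes is the scale from the original paper; the value passed here is ((32, 64, 128, 256, 512),). aspect_ratios is the three ratios from the paper (1:2, 1:1, 2:1); the value passed here is ((0.5, 1.0, 2.0),). Both are tuples of tuples, but passing flat values does not raise an error: when the first element is neither a list nor a tuple, the two if statements convert the arguments into tuple-of-tuples form. (aspect_ratios,) * len(sizes) simply repeats (aspect_ratios,) len(sizes) times. With the default arguments, for example, sizes becomes ((128,), (256,), (512,)) and aspect_ratios becomes ((0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0)). The assert then checks that sizes and aspect_ratios have the same length, because every group of scales needs its own group of ratios. The rest just initializes attributes, so we skip it.

```python
    def forward(self, image_list, feature_maps):
        # type: (ImageList, List[Tensor]) -> List[Tensor]
        # size (height, width) of every predicted feature map
        grid_sizes = list([feature_map.shape[-2:] for feature_map in feature_maps])

        # height and width of the input images
        image_size = image_list.tensors.shape[-2:]

        # dtype and device
        dtype, device = feature_maps[0].dtype, feature_maps[0].device

        # one step in feature map equate n pixel stride in origin image
        # i.e. how many pixels in the original image one step on the feature map covers
        strides = [[torch.tensor(image_size[0] // g[0], dtype=torch.int64, device=device),
                    torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device)] for g in grid_sizes]

        # build the anchor templates from the given sizes and aspect_ratios
        self.set_cell_anchors(dtype, device)

        # compute (or read from the cache) the coordinates of all anchors;
        # these are anchors mapped onto the original image, not the anchor templates.
        # The result is a list with one entry per predicted feature map
        anchors_over_all_feature_maps = self.cached_grid_anchors(grid_sizes, strides)

        anchors = torch.jit.annotate(List[List[torch.Tensor]], [])
        # loop over every image in the batch
        for i, (image_height, image_width) in enumerate(image_list.image_sizes):
            anchors_in_image = []
            # loop over the anchors of every predicted feature map
            for anchors_per_feature_map in anchors_over_all_feature_maps:
                anchors_in_image.append(anchors_per_feature_map)
            anchors.append(anchors_in_image)
        # concatenate the anchors of all predicted feature maps for each image;
        # anchors is a list in which each element holds all anchors of one image
        anchors = [torch.cat(anchors_per_image) for anchors_per_image in anchors]
        # Clear the cache in case that memory leaks.
        self._cache.clear()
        return anchors
```

As usual, go straight to the forward pass. The image_list argument is an ImageList which, as discussed before, stores the images after they have been packed into a batch together with the size of each image after aspect-preserving resizing. feature_maps is the feature map obtained from the backbone.

grid_sizes collects the height and width of every feature map (25x38 when debugging).
image_size is the height and width of the batched images (800x1216 when debugging).
strides is how many pixels of the original image one step on the feature map corresponds to, i.e. the scaling factor for height and width (32 when debugging, since 800 // 25 = 32 and 1216 // 38 = 32); the feature map is 32 times smaller than the image.

Next we enter the method set_cell_anchors():

```python
    def set_cell_anchors(self, dtype, device):
        # type: (torch.dtype, torch.device) -> None
        if self.cell_anchors is not None:
            cell_anchors = self.cell_anchors
            assert cell_anchors is not None
            # suppose that all anchors have the same device
            # which is a valid assumption in the current state of the codebase
            if cell_anchors[0].device == device:
                return

        # build the anchor templates from the given sizes and aspect_ratios;
        # every template is an anchor centered at (0, 0)
        cell_anchors = [
            self.generate_anchors(sizes, aspect_ratios, dtype, device)
            for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios)
        ]
        self.cell_anchors = cell_anchors
```

set_cell_anchors() builds the anchor templates. We do not have cell_anchors yet, so execution goes into generate_anchors(). Its arguments are the scales and ratios of the anchors to generate, plus the dtype and device.

```python
    def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device=torch.device("cpu")):
        # type: (List[int], List[float], torch.dtype, torch.device) -> Tensor
        """
        compute anchor sizes
        Arguments:
            scales: sqrt(anchor_area)
            aspect_ratios: h/w ratios
            dtype: float32
            device: cpu/gpu
        """
        scales = torch.as_tensor(scales, dtype=dtype, device=device)
        aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
        h_ratios = torch.sqrt(aspect_ratios)
        w_ratios = 1.0 / h_ratios

        # [r1, r2, r3] * [s1, s2, s3]
        # number of elements is len(ratios)*len(scales)
        ws = (w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h_ratios[:, None] * scales[None, :]).view(-1)

        # left-top, right-bottom coordinate relative to anchor center(0, 0)
        # all templates are centered at (0, 0), shape [len(ratios)*len(scales), 4]
        base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2

        return base_anchors.round()  # round to the nearest integer
```

First the scales and ratios are converted to tensors.

```python
h_ratios = torch.sqrt(aspect_ratios)
w_ratios = 1.0 / h_ratios
```

Why this step? We pass in three ratios (1:2, 1:1, 2:1), defined as h/w. The height factor is the square root of the ratio and the width factor is its reciprocal, so the two factors always multiply to 1 and the anchor area is preserved. Taking the first ratio as an example:

$$
2\times 1 = 1\times\frac{\sqrt{2}}{2}\times 2\times\sqrt{2}
$$

With these two factors we get anchors of three ratios at every scale. w_ratios[:, None] adds a dimension, changing the shape from [3] to [3, 1], and scales[None, :] changes the shape from [5] to [1, 5]. Multiplying them (broadcasting, which here equals the matrix product of a column by a row) gives a [3, 5] matrix, and view(-1) flattens it into a 1-D vector. Each column of the 3x5 matrix holds the values of the three ratios for one scale.

$$
\begin{pmatrix} \sqrt{2}\\ 1\\ \frac{\sqrt{2}}{2} \end{pmatrix}\times\begin{pmatrix} 32 & 64 & 128 & 256 & 512 \end{pmatrix}
$$

These are the generated widths (w_ratios times scales).

$$
\begin{pmatrix} \frac{\sqrt{2}}{2}\\ 1\\ \sqrt{2} \end{pmatrix}\times\begin{pmatrix} 32 & 64 & 128 & 256 & 512 \end{pmatrix}
$$

These are the generated heights (h_ratios times scales).

With 15 widths and 15 heights in hand, we need to combine them into top-left / bottom-right coordinates in the image coordinate system, where the origin is at the top-left corner, x grows to the right and y grows downward. Why is every value divided by 2 after stacking? All templates are centered at (0, 0), so when a template of width ws and height hs is placed on the axes, its top-left and bottom-right corners are

$$
\left[\frac{-ws}{2},\frac{-hs}{2},\frac{ws}{2},\frac{hs}{2}\right]
$$

dim=1 is used because dimension 0 indexes the 15 anchors: each of the four stacked vectors has shape [15], and stacking along dimension 1 gives a [15, 4] tensor with one row per anchor (stacking along dimension 0 would give [4, 15]). After rounding we get the final anchor templates, i.e. coordinates relative to the origin (0, 0). In the debug output, every five rows correspond to one ratio, for exactly three ratios. That covers set_cell_anchors(), which should be easy to follow. Next is the method cached_grid_anchors():

```python
    def cached_grid_anchors(self, grid_sizes, strides):
        # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor]
        """cache all computed anchors"""
        key = str(grid_sizes) + str(strides)
        # self._cache is a dict
        if key in self._cache:
            return self._cache[key]
        anchors = self.grid_anchors(grid_sizes, strides)
        self._cache[key] = anchors
        return anchors
```

grid_sizes is the height and width of the feature map after feature extraction (25x38). strides is the scaling factor between the feature map and the original image described above, one value each for height and width, so it is [32, 32]. self._cache starts out as an empty dict and stores the anchor coordinates on the original image (the batched image with fixed height and width). Go straight into the method grid_anchors():

```python
    def grid_anchors(self, grid_sizes, strides):
        # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor]
        """
        anchors position in grid coordinate axis map into origin image
        Compute the coordinates on the original image of all anchors of the predicted feature maps
        Args:
            grid_sizes: height and width of the predicted feature maps
            strides: stride on the original image of one step on the predicted feature map
        """
        anchors = []
        cell_anchors = self.cell_anchors
        assert cell_anchors is not None

        # loop over grid_sizes, strides and cell_anchors of every predicted feature map
        for size, stride, base_anchors in zip(grid_sizes, strides, cell_anchors):
            grid_height, grid_width = size
            stride_height, stride_width = stride
            device = base_anchors.device

            # For output anchor, compute [x_center, y_center, x_center, y_center]
            # shape: [grid_width], x coordinates (columns) on the original image
            shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
            # shape: [grid_height], y coordinates (rows) on the original image
            shifts_y = torch.arange(0, grid_height, dtype=torch.float32, device=device) * stride_height

            # coordinates on the original image of every feature-map cell
            # (the offsets applied to the anchor templates).
            # torch.meshgrid takes the row and column coordinates and returns
            # the grid of row coordinates and the grid of column coordinates
            # shape: [grid_height, grid_width]
            shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
            shift_x = shift_x.reshape(-1)
            shift_y = shift_y.reshape(-1)

            # offsets of (xmin, ymin, xmax, ymax) on the original image
            # shape: [grid_width*grid_height, 4]
            # dim=1 gives [grid_width*grid_height, 4] (dimension 0 indexes the grid cells);
            # dim=0 would give [4, grid_width*grid_height]
            shifts = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1)

            # For every (base anchor, output anchor) pair,
            # offset each zero-centered base anchor by the center of the output anchor.
            # Adding templates and offsets gives all anchors on the original image
            # (broadcasting handles the differing shapes)
            shifts_anchor = shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)
            anchors.append(shifts_anchor.reshape(-1, 4))

        return anchors  # List[Tensor(all_num_anchors, 4)]
```

The loop runs over the height and width of every feature map (25x38). MobileNetV2 yields a single feature map, so there is only the 25x38 one. The stride is [32, 32], and cell_anchors is the anchor template stored earlier.

```python
shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
# shape: [grid_height], y coordinates (rows) on the original image
shifts_y = torch.arange(0, grid_height, dtype=torch.float32, device=device) * stride_height
```

These two lines generate the x and y coordinates on the original image. Let's take a look.

The torch.meshgrid() function

Let's simulate the code above:

```python
import numpy as np
import matplotlib.pyplot as plt

size = [25, 38]
stride = [32, 32]
grid_height, grid_width = size
stride_height, stride_width = stride
x = np.arange(0, grid_width) * stride_width
y = np.arange(0, grid_height) * stride_height
# indexing="ij" matches the default behaviour of torch.meshgrid
# (np.meshgrid defaults to "xy", which orders the points differently)
y, x = np.meshgrid(y, x, indexing="ij")
y = y.reshape(-1)
x = x.reshape(-1)
plt.figure()
plt.plot(x, y,
         color='limegreen',  # set the colour to limegreen
         marker='.',         # draw each point as a dot
         linestyle='')       # empty line style, i.e. no line between points
plt.grid(True)
plt.show()
print(x)
print(y)
```

Try it yourself and look at the printed x and y. The plot shows a regular grid of points. shifts = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1) produces rows of the form [x, y, x, y]. The top-left and bottom-right coordinates are both (x, y), so each row is really just a point. We treat every such point (the green dots in the plot) as an origin and place the anchor templates on it, 15 per point, which generates a large number of anchors.

```python
shifts_anchor = shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)
anchors.append(shifts_anchor.reshape(-1, 4))
```

This step adds the anchor templates to the offsets on the original image to get the coordinates of all anchors on the original image (broadcasting handles the differing shapes). In the original figure, three colours stand for three scales; only three are drawn, but every point on the grid gets 5 scales x 3 ratios = 15 anchors, so the 25x38 feature map yields 25 x 38 x 15 = 14250 anchors. All anchor coordinates are stored in a list and returned.

Introduction to torch.jit.annotate()

The remaining part is simple. Since every image in the batch has the same size, the anchors just computed for one image are reused for each image in the batch (my batch size is 8). For each image, the anchors of all feature maps are concatenated, and the result is a list with one anchor tensor per image. That gives the anchor coordinates of every image in the batch.

That's it for this source-code walkthrough. It covered how the anchor templates are generated, how the templates are placed onto the original image, and how the anchors of all images in a batch are collected in a list. Thanks for reading to the end. It may feel like a lot of scattered detail, but going through it slowly gives you the overall picture. If anything is unclear, leave a comment. See you in the next post.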
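As a recap of the generate_anchors() step, here is a minimal sketch in plain Python (no PyTorch) that rebuilds the 15 template widths and heights and checks that the area stays at scale squared. The helper name anchor_templates is hypothetical and not part of the repository. Unlike the original method, it does not round the result.

```python
import math


def anchor_templates(scales, aspect_ratios):
    """Return (x_min, y_min, x_max, y_max) templates centered at (0, 0).

    Hypothetical helper mirroring generate_anchors(); aspect_ratios are h/w.
    """
    templates = []
    for ratio in aspect_ratios:
        h_ratio = math.sqrt(ratio)   # height factor
        w_ratio = 1.0 / h_ratio      # width factor, so w_ratio * h_ratio == 1
        for scale in scales:
            w, h = w_ratio * scale, h_ratio * scale
            templates.append((-w / 2, -h / 2, w / 2, h / 2))
    return templates


templates = anchor_templates((32, 64, 128, 256, 512), (0.5, 1.0, 2.0))
for x_min, y_min, x_max, y_max in templates:
    w, h = x_max - x_min, y_max - y_min
    print(f"w={w:7.2f}  h={h:7.2f}  h/w={h / w:.2f}  sqrt(area)={math.sqrt(w * h):.1f}")
```

The loop order (ratios outside, scales inside) matches the row-major flattening done by view(-1) on the [3, 5] matrix, so every five printed rows share one ratio.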
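To make the broadcasting step in grid_anchors() easier to see, the sketch below repeats it with NumPy on a deliberately tiny grid. The 2x3 grid size and the two base anchors are made-up values chosen for readability; they are not the ones used in the post.

```python
import numpy as np

# Made-up toy setup: a 2x3 feature map, stride 32, two hand-written templates.
grid_height, grid_width = 2, 3
stride = 32
base_anchors = np.array([[-16, -16, 16, 16],
                         [-32, -16, 32, 16]], dtype=np.float32)

shifts_x = np.arange(grid_width, dtype=np.float32) * stride
shifts_y = np.arange(grid_height, dtype=np.float32) * stride
# indexing="ij" mirrors the default of torch.meshgrid
shift_y, shift_x = np.meshgrid(shifts_y, shifts_x, indexing="ij")
shift_x = shift_x.reshape(-1)
shift_y = shift_y.reshape(-1)

# One [x, y, x, y] row per grid cell: shape [6, 4]
shifts = np.stack([shift_x, shift_y, shift_x, shift_y], axis=1)

# [6, 1, 4] + [1, 2, 4] broadcasts to [6, 2, 4], then flattens to [12, 4]
anchors = (shifts.reshape(-1, 1, 4) + base_anchors.reshape(1, -1, 4)).reshape(-1, 4)

print(shifts.shape, anchors.shape)
print(anchors[:4])
```

Each grid cell contributes one copy of every template, so consecutive rows of anchors belong to the same cell.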
