SSD (SSD: Single Shot MultiBox Detector) is a method that performs object detection and recognition with a single deep neural network. As Figure 0-1 shows, it combines the anchor boxes of Faster R-CNN with YOLO's idea of detecting with a single network (YOLOv2 takes a similar approach; see the article "YOLO upgraded: an analysis of YOLOv2 and YOLO9000"). It therefore offers the accuracy of Faster R-CNN together with the detection speed of YOLO, which makes high-accuracy real-time detection possible. At 300*300 resolution, SSD reaches 74.3% mAP at 59 FPS on the VOC2007 dataset; at 512*512 resolution, SSD surpasses Fast R-CNN with 80% mAP at 19 fps, as shown in Figure 0-2.

The key points of SSD fall into two groups: model structure and training method. Model structure covers the multi-scale feature map detection network and anchor box generation; the training method covers ground truth preprocessing and the loss function. This article walks through the TensorFlow implementation of SSD; the source code comes from balancap/SSD-Tensorflow. The article is organized as follows:

1 Multi-scale feature map detection network
2 Anchor box generation
3 Ground truth preprocessing
4 Objective function
5 Summary

Figure 0-1: Principle of SSD compared with MultiBox, Faster R-CNN and YOLO. Source: the authors' ECCV 2016 slides.

Figure 0-2: SSD detection speed and accuracy. Source: the authors' ECCV 2016 slides.

1 Multi-scale feature map detection network

The SSD network model is shown in Figure 1-1.

Figure 1-1: SSD model structure. Source: the original paper.

The model is built in ssd_vgg_300.py. Multi-scale feature map detection is illustrated in Figure 1-2. The feature maps selected by the model are 38×38 (block4), 19×19 (block7), 10×10 (block8), 5×5 (block9), 3×3 (block10) and 1×1 (block11). On each feature map, a 3×3 convolution produces the four offsets of every default box and the confidences of the 21 classes. For example, block7 has 6 default boxes (def boxes) per location, and each default box carries 4 offsets and 21 class confidences (4+21). The final output of block7 is therefore (19*19)*6*(4+21).

Figure 1-2: Multi-scale feature sampling. Source: a Zhihu column.

The initialization parameters are as follows:

    """Implementation of the SSD VGG-based 300 network.

    The default features layers with 300x300 image input are:
      conv4 ==> 38 x 38
      conv7 ==> 19 x 19
      conv8 ==> 10 x 10
      conv9 ==> 5 x 5
      conv10 ==> 3 x 3
      conv11 ==> 1 x 1
    The default image size used to train this network is 300x300.
    """
    default_params = SSDParams(
        img_shape=(300, 300),   # input size
        num_classes=21,         # number of classes: 20 + 1 = 21 (20 classes plus background)
        # feature map layers used for detection
        feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],
        feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
        anchor_size_bounds=[0.15, 0.90],
        # anchor box sizes, in pixels of the 300x300 input
        anchor_sizes=[(21., 45.),
                      (45., 99.),
                      (99., 153.),
                      (153., 207.),
                      (207., 261.),
                      (261., 315.)],
        # anchor box aspect ratios
        anchor_ratios=[[2, .5],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5],
                       [2, .5]],
        anchor_steps=[8, 16, 32, 64, 100, 300],   # anchor step of each layer, in input pixels
        anchor_offset=0.5,                        # offset of the anchor center inside a cell
        # whether to normalize the feature layer: > 0 yes, < 0 no
        normalizations=[20, -1, -1, -1, -1, -1],
        prior_scaling=[0.1, 0.1, 0.2, 0.2]
        )

The network itself is built by the function ssd_net. The author uses TensorFlow-Slim, a high-level library similar to Keras, to define the model; see the TensorFlow-Slim page for details.
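As a quick check of the output sizes implied by these parameters, the following is a minimal sketch in plain Python. It assumes that each feature-map cell gets len(sizes) + len(ratios) default boxes, which is the rule ssd_anchor_one_layer uses; the helper names boxes_per_cell and layer_output_size are hypothetical and do not exist in the repository.

```python
# Values copied from default_params.
feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_sizes = [(21., 45.), (45., 99.), (99., 153.),
                (153., 207.), (207., 261.), (261., 315.)]
anchor_ratios = [[2, .5], [2, .5, 3, 1. / 3], [2, .5, 3, 1. / 3],
                 [2, .5, 3, 1. / 3], [2, .5], [2, .5]]
num_classes = 21


def boxes_per_cell(sizes, ratios):
    """Default boxes per feature-map cell (hypothetical helper)."""
    return len(sizes) + len(ratios)


def layer_output_size(feat_shape, sizes, ratios, num_classes):
    """Predicted values for one layer: cells * boxes * (4 offsets + classes)."""
    k = boxes_per_cell(sizes, ratios)
    return feat_shape[0] * feat_shape[1] * k * (4 + num_classes)


per_cell = [boxes_per_cell(s, r) for s, r in zip(anchor_sizes, anchor_ratios)]
total_boxes = sum(h * w * k for (h, w), k in zip(feat_shapes, per_cell))
block7_outputs = layer_output_size(feat_shapes[1], anchor_sizes[1],
                                   anchor_ratios[1], num_classes)

print(per_cell)        # boxes per cell for block4 ... block11
print(total_boxes)     # default boxes over all six layers
print(block7_outputs)  # (19*19)*6*(4+21)
```

With these parameters, block4, block10 and block11 use 4 default boxes per cell and the other layers use 6, so the per-layer output of block7 matches the (19*19)*6*(4+21) figure quoted in the text.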
    import tensorflow as tf  # TensorFlow 1.x

    slim = tf.contrib.slim


    def ssd_net(inputs,
                num_classes=21,
                feat_layers=SSDNet.default_params.feat_layers,
                anchor_sizes=SSDNet.default_params.anchor_sizes,
                anchor_ratios=SSDNet.default_params.anchor_ratios,
                normalizations=SSDNet.default_params.normalizations,
                is_training=True,
                dropout_keep_prob=0.5,
                prediction_fn=slim.softmax,
                reuse=None,
                scope='ssd_300_vgg'):
        """SSD net definition.
        """
        # End_points collect relevant activations for external use
        # (the output of every block is stored here).
        end_points = {}
        # Build the VGG network with slim; the layout follows the structure
        # diagram of the paper.
        with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
            # Original VGG-16 blocks.
            net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
            end_points['block1'] = net
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
            # Block 2.
            net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
            end_points['block2'] = net
            net = slim.max_pool2d(net, [2, 2], scope='pool2')
            # Block 3.
            net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
            end_points['block3'] = net
            net = slim.max_pool2d(net, [2, 2], scope='pool3')
            # Block 4.
            net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
            end_points['block4'] = net
            net = slim.max_pool2d(net, [2, 2], scope='pool4')
            # Block 5.
            net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
            end_points['block5'] = net
            net = slim.max_pool2d(net, [3, 3], 1, scope='pool5')  # 3x3 max pool, stride 1

            # Additional SSD blocks.
            # Block 6: dilated 3x3 convolution (rate 6); output shape 19x19x1024.
            net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
            end_points['block6'] = net
            # Block 7: 1x1 convolution.
            net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
            end_points['block7'] = net

            # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
            end_point = 'block8'
            with tf.variable_scope(end_point):
                net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
                net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3')
            end_points[end_point] = net
            end_point = 'block9'
            with tf.variable_scope(end_point):
                net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
                net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3')
            end_points[end_point] = net
            end_point = 'block10'
            with tf.variable_scope(end_point):
                net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
                net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
            end_points[end_point] = net
            end_point = 'block11'
            with tf.variable_scope(end_point):
                net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
                net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
            end_points[end_point] = net

            # Prediction and localisations layers.
            predictions = []
            logits = []
            localisations = []
            for i, layer in enumerate(feat_layers):
                with tf.variable_scope(layer + '_box'):
                    # Take the output of the feature layer and produce the
                    # class and location predictions.
                    p, l = ssd_multibox_layer(end_points[layer],
                                              num_classes,
                                              anchor_sizes[i],
                                              anchor_ratios[i],
                                              normalizations[i])
                # Collect the predictions of every layer.
                predictions.append(prediction_fn(p))  # class probabilities (softmax)
                logits.append(p)                      # raw class scores
                localisations.append(l)               # predicted box offsets

            return predictions, localisations, logits, end_points

2 Anchor box generation

For each feature map, k default boxes are generated from different sizes (scale) and aspect ratios (ratio). The principle is shown in Figure 2-1 (in this figure the number of default boxes is k = 6 and the 5×5 grid of red points represents the feature map, so there are 5*5*6 = 150 boxes).

The size of the default boxes of the k-th feature map is computed as

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where $m$ is the number of feature maps, $s_{min}$ is the default box size of the lowest feature map (0.2 in the paper, 0.15 in the code) and $s_{max}$ is the default box size of the highest feature map (0.9 in the paper, 0.9 in the code).

The aspect ratio of each default box is taken from a set of ratio values; in the paper the ratios are $a_r \in \{1, 2, 3, 1/2, 1/3\}$, so each default box has width $w_k^a = s_k\sqrt{a_r}$ and height $h_k^a = s_k/\sqrt{a_r}$. For ratio 1, an extra default box with scale $s'_k = \sqrt{s_k s_{k+1}}$ is added. In total, every point of a feature map generates 6 default boxes (4 for the layers that the code configures with only the ratios 2 and 1/2). The center of each default box is set to $\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right)$, where $|f_k|$ is the size of the k-th feature map.

Figure 2-1: Anchor box generation. Source: a Zhihu column.

In the source code, the default boxes of one layer are generated by the function ssd_anchor_one_layer().
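To make the scale rule concrete, here is a minimal sketch of the paper's formula s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) and of the width and height rule w = s_k * sqrt(a), h = s_k / sqrt(a). The function names default_box_scales and box_shapes are hypothetical, and the paper values s_min = 0.2, s_max = 0.9, m = 6 are used as inputs. The anchor_sizes hard-coded in the repository follow the Caffe implementation and are not reproduced by this sketch.

```python
import math


def default_box_scales(m, s_min, s_max):
    """Scales s_1 .. s_m, linearly spaced between s_min and s_max."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1)
            for k in range(1, m + 1)]


def box_shapes(s_k, s_next, ratios=(2.0, 0.5, 3.0, 1.0 / 3.0)):
    """(width, height) of the default boxes at one feature-map location.

    Two boxes have ratio 1 (scales s_k and sqrt(s_k * s_next)); every other
    ratio a gives width s_k * sqrt(a) and height s_k / sqrt(a).
    """
    s_prime = math.sqrt(s_k * s_next)
    shapes = [(s_k, s_k), (s_prime, s_prime)]
    for a in ratios:
        shapes.append((s_k * math.sqrt(a), s_k / math.sqrt(a)))
    return shapes


scales = default_box_scales(m=6, s_min=0.2, s_max=0.9)
shapes = box_shapes(scales[0], scales[1])
for w, h in shapes:
    print("w=%.3f h=%.3f" % (w, h))
```

Every box built from a ratio keeps the area s_k squared, so the ratios change the shape of a box without changing how much of the image it covers.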
    import math

    import numpy as np


    def ssd_anchor_one_layer(img_shape,   # shape of the original image
                             feat_shape,  # shape of the feature map
                             sizes,       # preset box sizes
                             ratios,      # aspect ratios
                             step,        # anchor step of this layer, in input pixels
                             offset=0.5,
                             dtype=np.float32):
        """Compute SSD default anchor boxes for one feature layer.

        Determine the relative position grid of the centers, and the relative
        width and height.

        Arguments:
          feat_shape: Feature shape, used for computing relative position grids;
          size: Absolute reference sizes;
          ratios: Ratios to use on these features;
          img_shape: Image shape, used for computing height, width relatively to the
            former;
          offset: Grid offset.

        Return:
          y, x, h, w: Relative x and y grids, and height and width.
        """
        # Compute the position grid: simple way.
        # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
        # y = (y.astype(dtype) + offset) / feat_shape[0]
        # x = (x.astype(dtype) + offset) / feat_shape[1]
        # Weird SSD-Caffe computation using steps values...

        # Values used in the article's test run (kept as comments so that they
        # do not override the function arguments):
        #   feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
        #   anchor_sizes = [(21., 45.), (45., 99.), (99., 153.),
        #                   (153., 207.), (207., 261.), (261., 315.)]
        #   anchor_ratios = [[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3],
        #                    [2, .5, 3, 1./3], [2, .5], [2, .5]]
        #   anchor_steps = [8, 16, 32, 64, 100, 300]
        #   offset = 0.5
        #   dtype = np.float32
        #   feat_shape = feat_shapes[0]
        #   step = anchor_steps[0]

        # In the test, y and x both have shape (38, 38), and y is
        # array([[ 0,  0,  0, ...,  0,  0,  0],
        #        [ 1,  1,  1, ...,  1,  1,  1],
        #        [ 2,  2,  2, ...,  2,  2,  2],
        #        ...,
        #        [35, 35, 35, ..., 35, 35, 35],
        #        [36, 36, 36, ..., 36, 36, 36],
        #        [37, 37, 37, ..., 37, 37, 37]])
        y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
        # In the test: y = (y + 0.5) * 8 / 300, x = (x + 0.5) * 8 / 300.
        y = (y.astype(dtype) + offset) * step / img_shape[0]
        x = (x.astype(dtype) + offset) * step / img_shape[1]

        # Expand dims to support easy broadcasting; the shape becomes (38, 38, 1).
        y = np.expand_dims(y, axis=-1)
        x = np.expand_dims(x, axis=-1)

        # Compute relative height and width.
        # Tries to follow the original implementation of SSD for the order.
        # In the test the value is 2 + 2 = 4.
        num_anchors = len(sizes) + len(ratios)
        # Shape (4,).
        h = np.zeros((num_anchors, ), dtype=dtype)
        w = np.zeros((num_anchors, ), dtype=dtype)
        # Add first anchor boxes with ratio=1.
        # In the test: h[0] = 21 / 300, w[0] = 21 / 300.
        h[0] = sizes[0] / img_shape[0]
        w[0] = sizes[0] / img_shape[1]
        di = 1
        if len(sizes) > 1:
            # In the test: h[1] = sqrt(21 * 45) / 300.
            h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
            w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
            di += 1
        for i, r in enumerate(ratios):
            h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
            w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
        # In the test, y and x have shape (38, 38, 1); h and w have shape (4,).
        return y, x, h, w

3 Ground truth preprocessing

During training, the label information (ground truth box, ground truth category) must first be preprocessed and assigned to the corresponding default boxes. The matching default boxes are found from the jaccard overlap between each default box and the ground truth box. The paper treats default boxes whose jaccard overlap exceeds 0.5 as positive samples and all others as negative samples.

The ground truth preprocessing code is in ssd_common.py. The key code is the label and bbox encoding function tf_ssd_bboxes_encode_layer.
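The matching rule can be illustrated without TensorFlow. The following is a minimal sketch of the jaccard overlap (intersection over union) and of the "overlap above 0.5 is positive" rule, for boxes given as (ymin, xmin, ymax, xmax) in relative coordinates. The functions jaccard and match_anchors and the example boxes are hypothetical and only mirror the logic described in the text.

```python
def jaccard(box_a, box_b):
    """Jaccard overlap of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.0)
    inter_w = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.0)
    inter = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def match_anchors(gt_box, anchors, threshold=0.5):
    """Indices of the anchors whose overlap with gt_box exceeds the threshold."""
    return [i for i, anchor in enumerate(anchors)
            if jaccard(gt_box, anchor) > threshold]


gt_box = (0.0, 0.0, 0.5, 0.5)
anchors = [
    (0.0, 0.0, 0.5, 0.5),   # identical to the ground truth box
    (0.1, 0.1, 0.5, 0.5),   # mostly inside the ground truth box
    (0.5, 0.5, 1.0, 1.0),   # touches the ground truth box at one corner only
]
positives = match_anchors(gt_box, anchors)
print(positives)
```

The TensorFlow code in the repository computes the same quantity for all anchors of a layer at once by broadcasting, instead of looping over anchors one at a time.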
    import tensorflow as tf  # TensorFlow 1.x


    def tf_ssd_bboxes_encode_layer(labels,         # ground truth labels, 1D tensor
                                   bboxes,         # Nx4 Tensor (float)
                                   anchors_layer,  # anchors of one layer
                                   matching_threshold=0.5,  # matching threshold
                                   prior_scaling=[0.1, 0.1, 0.2, 0.2],  # scaling
                                   dtype=tf.float32):
        """Encode groundtruth labels and bounding boxes using SSD anchors from
        one layer.

        Arguments:
          labels: 1D Tensor(int64) containing groundtruth labels;
          bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
          anchors_layer: Numpy array with layer anchors;
          matching_threshold: Threshold for positive match with groundtruth bboxes;
          prior_scaling: Scaling of encoded coordinates.

        Return:
          (target_labels, target_localizations, target_scores): Target Tensors.
        """
        # Anchors coordinates and volume.
        # Unpack the anchors of this layer.
        yref, xref, href, wref = anchors_layer
        ymin = yref - href / 2.
        xmin = xref - wref / 2.
        ymax = yref + href / 2.
        xmax = xref + wref / 2.
        # For the first layer, anchors_layer holds arrays of shapes
        # ((38, 38, 1), (38, 38, 1), (4,), (4,)), so xmax has shape (38, 38, 4).
        # Area of every anchor (called "volume" in the source).
        vol_anchors = (xmax - xmin) * (ymax - ymin)

        # Initialize tensors...
        shape = (yref.shape[0], yref.shape[1], href.size)
        feat_labels = tf.zeros(shape, dtype=tf.int64)
        feat_scores = tf.zeros(shape, dtype=dtype)
        # Shape (38, 38, 4) for the first layer.
        feat_ymin = tf.zeros(shape, dtype=dtype)
        feat_xmin = tf.zeros(shape, dtype=dtype)
        feat_ymax = tf.ones(shape, dtype=dtype)
        feat_xmax = tf.ones(shape, dtype=dtype)

        # Compute the jaccard overlap.
        def jaccard_with_anchors(bbox):
            """Compute jaccard score a box and the anchors.
            """
            # Intersection bbox and volume.
            int_ymin = tf.maximum(ymin, bbox[0])
            int_xmin = tf.maximum(xmin, bbox[1])
            int_ymax = tf.minimum(ymax, bbox[2])
            int_xmax = tf.minimum(xmax, bbox[3])
            h = tf.maximum(int_ymax - int_ymin, 0.)
            w = tf.maximum(int_xmax - int_xmin, 0.)
            # Volumes.
            inter_vol = h * w
            union_vol = vol_anchors - inter_vol \
                + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
            jaccard = tf.div(inter_vol, union_vol)
            return jaccard

        # Loop condition.
        def condition(i, feat_labels, feat_scores,
                      feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Condition: check label index.
            """
            # tf.less returns the truth value of (x < y) element-wise.
            r = tf.less(i, tf.shape(labels))
            return r[0]

        # Loop body.
        def body(i, feat_labels, feat_scores,
                 feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Body: update feature labels, scores and bboxes.
            Follow the original SSD paper for that purpose:
              - assign values when jaccard > 0.5;
              - only update if beat the score of other bboxes.
            """
            # Jaccard score.
            label = labels[i]
            bbox = bboxes[i]
            scores = jaccard_with_anchors(bbox)  # jaccard overlap with every anchor
            # Boolean mask.
            # tf.greater returns the element-wise truth value of (x > y).
            mask = tf.logical_and(tf.greater(scores, matching_threshold),
                                  tf.greater(scores, feat_scores))
            imask = tf.cast(mask, tf.int64)
            fmask = tf.cast(mask, dtype)
            # Update values using mask.
            feat_labels = imask * label + (1 - imask) * feat_labels
            # tf.where replaces tf.select, which was removed in TensorFlow 1.0.
            feat_scores = tf.where(mask, scores, feat_scores)
            feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
            feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
            feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
            feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
            return [i+1, feat_labels, feat_scores,
                    feat_ymin, feat_xmin, feat_ymax, feat_xmax]

        # Main loop definition.
        i = 0
        [i, feat_labels, feat_scores,
         feat_ymin, feat_xmin,
         feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                               [i, feat_labels, feat_scores,
                                                feat_ymin, feat_xmin,
                                                feat_ymax, feat_xmax])
        # Transform to center / size.
        feat_cy = (feat_ymax + feat_ymin) / 2.
        feat_cx = (feat_xmax + feat_xmin) / 2.
        feat_h = feat_ymax - feat_ymin
        feat_w = feat_xmax - feat_xmin
        # Encode features.
        feat_cy = (feat_cy - yref) / href / prior_scaling[0]
        feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
        feat_h = tf.log(feat_h / href) / prior_scaling[2]
        feat_w = tf.log(feat_w / wref) / prior_scaling[3]
        # Use SSD ordering: x / y / w / h instead of ours.
        feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
        return feat_labels, feat_localizations, feat_scores

The ground truth encoding function tf_ssd_bboxes_encode applies this per-layer encoding to every feature layer.
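The last step of tf_ssd_bboxes_encode_layer, the "Encode features" part, can be sketched for a single box in plain Python. This sketch assumes the same conventions as the listing: the ground truth box is (ymin, xmin, ymax, xmax), the anchor is (cy, cx, h, w), and prior_scaling is [0.1, 0.1, 0.2, 0.2]. The function names encode_box and decode_box are hypothetical; decode_box is simply the algebraic inverse and is included only to check the round trip.

```python
import math

PRIOR_SCALING = (0.1, 0.1, 0.2, 0.2)


def encode_box(gt, anchor, prior_scaling=PRIOR_SCALING):
    """Encode a ground truth box relative to one anchor.

    gt is (ymin, xmin, ymax, xmax); anchor is (cy, cx, h, w).
    Returns offsets in SSD ordering: (cx, cy, w, h).
    """
    ymin, xmin, ymax, xmax = gt
    yref, xref, href, wref = anchor
    # Transform to center / size.
    cy = (ymax + ymin) / 2.0
    cx = (xmax + xmin) / 2.0
    h = ymax - ymin
    w = xmax - xmin
    # Offsets relative to the anchor, divided by the prior scaling.
    t_cy = (cy - yref) / href / prior_scaling[0]
    t_cx = (cx - xref) / wref / prior_scaling[1]
    t_h = math.log(h / href) / prior_scaling[2]
    t_w = math.log(w / wref) / prior_scaling[3]
    return (t_cx, t_cy, t_w, t_h)


def decode_box(offsets, anchor, prior_scaling=PRIOR_SCALING):
    """Inverse of encode_box: recover (ymin, xmin, ymax, xmax)."""
    t_cx, t_cy, t_w, t_h = offsets
    yref, xref, href, wref = anchor
    cy = t_cy * prior_scaling[0] * href + yref
    cx = t_cx * prior_scaling[1] * wref + xref
    h = href * math.exp(t_h * prior_scaling[2])
    w = wref * math.exp(t_w * prior_scaling[3])
    return (cy - h / 2.0, cx - w / 2.0, cy + h / 2.0, cx + w / 2.0)


anchor = (0.4, 0.5, 0.4, 0.4)
gt = (0.25, 0.3, 0.75, 0.8)
offsets = encode_box(gt, anchor)
restored = decode_box(offsets, anchor)
print(offsets)
print(restored)
```

A ground truth box that coincides with its anchor encodes to offsets near zero, and dividing by the small prior_scaling values stretches the targets to a range that is easier to regress.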
    def tf_ssd_bboxes_encode(labels,   # ground truth labels, 1D tensor
                             bboxes,   # Nx4 Tensor (float)
                             anchors,  # list of anchors, one entry per layer
                             matching_threshold=0.5,  # matching threshold
                             prior_scaling=[0.1, 0.1, 0.2, 0.2],  # scaling
                             dtype=tf.float32,
                             scope='ssd_bboxes_encode'):
        """Encode groundtruth labels and bounding boxes using SSD net anchors.
        Encoding boxes for all feature layers.

        Arguments:
          labels: 1D Tensor(int64) containing groundtruth labels;
          bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
          anchors: List of Numpy array with layer anchors;
          matching_threshold: Threshold for positive match with groundtruth bboxes;
          prior_scaling: Scaling of encoded coordinates.

        Return:
          (target_labels, target_localizations, target_scores):
            Each element is a list of target Tensors.
        """
        with tf.name_scope(scope):
            target_labels = []
            target_localizations = []
            target_scores = []
            for i, anchors_layer in enumerate(anchors):
                with tf.name_scope('bboxes_encode_block_%i' % i):
                    # Encode the labels and bboxes for this layer.
                    t_labels, t_loc, t_scores = \
                        tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                                   matching_threshold, prior_scaling, dtype)
                    target_labels.append(t_labels)
                    target_localizations.append(t_loc)
                    target_scores.append(t_scores)
            return target_labels, target_localizations, target_scores


    # Method of SSDNet: encode the ground truth labels and bboxes.
    def bboxes_encode(self, labels, bboxes, anchors,
                      scope='ssd_bboxes_encode'):
        """Encode labels and bounding boxes.
        """
        return ssd_common.tf_ssd_bboxes_encode(
            labels, bboxes, anchors,
            matching_threshold=0.5,
            prior_scaling=self.params.prior_scaling,
            scope=scope)

4 Objective function

The SSD objective function has two parts: the localization loss (loc) and the class confidence loss (conf) of the matched default boxes. Let $x_{ij}^p \in \{1, 0\}$ indicate whether the i-th default box is matched to the j-th ground truth box of category p. The objective function is defined as

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where $N$ is the number of matched default boxes; if $N = 0$, the loss is set to zero. $L_{loc}$ is the Smooth L1 loss between the predicted box $l$ and the ground truth box $g$, and $\alpha$ is set to 1 by cross validation.

$L_{loc}$ is defined as follows:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^k \, \mathrm{smooth}_{L1}\left(l_i^m - \hat{g}_j^m\right)$$

with

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^w}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^h}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

where $l$ is the predicted box and $g$ is the ground truth box; $(cx, cy)$ is the center of the default box $d$ after regressing to offsets, and $(w, h)$ are the width and height of the default box.

$L_{conf}$ is defined as the multi-class softmax loss:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^p \log\left(\hat{c}_i^p\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^0\right), \quad \hat{c}_i^p = \frac{\exp(c_i^p)}{\sum_p \exp(c_i^p)}$$

The annotated source of the SSD loss function, ssd_losses, is in ssd_vgg_300.py.
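The pieces of the objective can be sketched in plain Python. This is a minimal illustration of the Smooth L1 function (0.5 x^2 when |x| < 1, |x| - 0.5 otherwise) and of the combination L = (L_conf + alpha * L_loc) / N, with the loss defined as zero when N = 0. The function names smooth_l1, localization_loss and ssd_loss are hypothetical, and the confidence loss is passed in as an already computed number rather than derived from logits.

```python
def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    if ax < 1.0:
        return 0.5 * x * x
    return ax - 0.5


def localization_loss(pred, target):
    """Sum of Smooth L1 over the (cx, cy, w, h) offsets of one matched box."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


def ssd_loss(conf_loss_sum, loc_loss_sum, num_matched, alpha=1.0):
    """Combine the summed losses: (L_conf + alpha * L_loc) / N, or 0 if N == 0."""
    if num_matched == 0:
        return 0.0
    return (conf_loss_sum + alpha * loc_loss_sum) / num_matched


# One matched box: predicted versus encoded ground truth offsets.
loc = localization_loss((0.5, 2.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
total = ssd_loss(conf_loss_sum=1.0, loc_loss_sum=2.0, num_matched=3)
print(loc, total)
```

The quadratic region keeps gradients small for nearly correct offsets, while the linear region limits the influence of badly mismatched boxes.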
    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x

    # custom_layers is a module of the SSD-Tensorflow repository.


    def ssd_losses(logits,          # predicted classes
                   localisations,   # predicted locations
                   gclasses,        # ground truth classes
                   glocalisations,  # ground truth locations
                   gscores,         # ground truth scores
                   match_threshold=0.5,
                   negative_ratio=3.,
                   alpha=1.,
                   label_smoothing=0.,
                   scope='ssd_losses'):
        """Loss functions for training the SSD 300 VGG network.

        This function defines the different loss components of the SSD, and
        adds them to the TF loss collection.

        Arguments:
          logits: (list of) predictions logits Tensors;
          localisations: (list of) localisations Tensors;
          gclasses: (list of) groundtruth labels Tensors;
          glocalisations: (list of) groundtruth localisations Tensors;
          gscores: (list of) groundtruth score Tensors;
        """
        # Some debugging...
        # for i in range(len(gclasses)):
        #     print(localisations[i].get_shape())
        #     print(logits[i].get_shape())
        #     print(gclasses[i].get_shape())
        #     print(glocalisations[i].get_shape())
        #     print()
        with tf.name_scope(scope):
            l_cross = []
            l_loc = []
            for i in range(len(logits)):
                with tf.name_scope('block_%i' % i):
                    # Determine weights Tensor.
                    pmask = tf.cast(gclasses[i] > 0, logits[i].dtype)
                    n_positives = tf.reduce_sum(pmask)  # number of positive samples
                    # np.prod returns the product of array elements over a given axis.
                    n_entries = np.prod(gclasses[i].get_shape().as_list())
                    # r_positive = n_positives / n_entries
                    # Select some random negative entries.
                    # r_negative is the fraction of negative entries to keep.
                    r_negative = negative_ratio * n_positives / (n_entries - n_positives)
                    nmask = tf.random_uniform(gclasses[i].get_shape(),
                                              dtype=logits[i].dtype)
                    nmask = nmask * (1. - pmask)
                    nmask = tf.cast(nmask > 1. - r_negative, logits[i].dtype)

                    # Add cross-entropy loss.
                    with tf.name_scope('cross_entropy'):
                        # Weights Tensor: positive mask + random negative.
                        weights = pmask + nmask
                        # Named arguments are required from TensorFlow 1.0 on.
                        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
                            logits=logits[i], labels=gclasses[i])
                        loss = tf.contrib.losses.compute_weighted_loss(loss, weights)
                        l_cross.append(loss)

                    # Add localization loss: smooth L1, L2, ...
                    with tf.name_scope('localization'):
                        # Weights Tensor: positive mask only.
                        weights = alpha * pmask
                        loss = custom_layers.abs_smooth(localisations[i] - glocalisations[i])
                        loss = tf.contrib.losses.compute_weighted_loss(loss, weights)
                        l_loc.append(loss)

            # Total losses in summaries...
            with tf.name_scope('total'):
                tf.summary.scalar('cross_entropy', tf.add_n(l_cross))
                tf.summary.scalar('localization', tf.add_n(l_loc))

5 Summary

This article walked through the key source code of a TensorFlow implementation of SSD: Single Shot MultiBox Detector. The source code comes from balancap/SSD-Tensorflow. Its author wrote it in great detail and it covers a lot of ground, including image preprocessing and multi-GPU parallel training, so only the key code was selected for analysis here. Read together with the paper, the key code makes the structure clear. The key points of the SSD implementation are: 1 the multi-scale feature map detection network, 2 anchor box generation, 3 ground truth preprocessing, and 4 the objective function. Like YOLOv2, SSD achieves real-time object detection at high accuracy, and it is a detection method well worth studying and improving.