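Object detection is an important task in computer vision. This article covers YOLO, named after the paper "You Only Look Once: Unified, Real-Time Object Detection", which is one of the leading end-to-end object detectors. From YOLOv1 to YOLOv5, both the data processing and the network structure have been optimized repeatedly, and YOLOv5 achieves a smaller model size with better accuracy. This article walks through, from scratch, how to build, train and deploy YOLOv5 with TensorFlow. The source code for this example is available at: https://github.com/Yunying-CN/Yolov5-TF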



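To speed up training and shorten training time, it is best to train on a local or cloud server equipped with a GPU. This example uses Python 3.8 on 64-bit Linux; download the installation package that matches your system. Open a terminal in the directory where the package was saved and run the commands below to install TensorFlow. The build installed here is the GPU version of TensorFlow 2.3, paired with CUDA 10.1 and the matching cuDNN. You can also install it directly with pip; if the download is slow, point pip at a mirror index as shown below.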

$sudo apt-get install python3-pip python3-dev

$pip3 install --upgrade pip

$pip3 install tensorflow-gpu==2.3.1 -i https://pypi.tuna.tsinghua.edu.cn/simple



>>> import tensorflow as tf







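This example uses the open Pascal VOC2012 dataset. Pascal VOC2012 is one of the standard benchmarks and is widely used for comparing object detection and image segmentation networks and for evaluating model quality. It provides labeled data for supervised learning on vision tasks and covers 20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train and tv/monitor. The training split has 5717 images with 13609 objects, and the combined train+val split has 11540 images with 27450 objects. The dataset can be downloaded from the official site (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/), or fetched and unpacked from a terminal with the commands below.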
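The conversion script later in this article reads a class list from ./data/voc2012.names and turns it into a name-to-index map. The following is a minimal sketch of that mapping. It assumes the names file holds one class per line, and it spells the classes the way the VOC annotation XML files do (for example `diningtable` rather than "dining table"). The helper names `build_class_map` and `names_file_text` are illustrative and not part of the original project.

```python
# Class names as they are spelled inside the VOC2012 annotation XML files.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def build_class_map(names):
    """Map each class name to its integer label, following list order."""
    return {name: idx for idx, name in enumerate(names)}


def names_file_text(names):
    """Return the text of a names file (assumed layout: one class per line)."""
    return "\n".join(names) + "\n"


class_map = build_class_map(VOC_CLASSES)
print(len(class_map), class_map["aeroplane"], class_map["tvmonitor"])
```

Keeping the order fixed matters, because the integer labels written into the training records are only meaningful together with the same names file at inference time.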

$mkdir -p ./data/voc

$wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar -O ./data/voc2012.tar

$tar -xf ./data/voc2012.tar -C ./data/voc

$ls ./data/voc

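The dataset contains five folders: Annotations, ImageSets, JPEGImages, SegmentationClass and SegmentationObject. Annotations holds the label files for object detection in .xml format, and each label file name matches the name of its image. Each .xml file records the image size together with the class and exact position of every object in the image. ImageSets contains three subfolders, Layout, Main and Segmentation; Main holds the split files for classification and detection. JPEGImages holds the images in .jpg format, SegmentationClass holds the images segmented by class, and SegmentationObject holds the images segmented by object.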
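To make the annotation layout concrete, here is a minimal sketch that reads the image size and the bounding boxes from a VOC-style XML string and normalizes the coordinates to the 0..1 range, as the conversion script below does. The sample XML is a hypothetical, trimmed annotation (real files carry more fields), and `read_boxes` is an illustrative helper that uses the standard library parser rather than lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, trimmed to the fields needed for detection.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>50</xmin><ymin>75</ymin><xmax>250</xmax><ymax>300</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>100</xmin><ymin>0</ymin><xmax>400</xmax><ymax>375</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return (filename, boxes) with box corners divided by the image size."""
    root = ET.fromstring(xml_text.strip())
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    boxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        boxes.append({
            "name": obj.find("name").text,
            "xmin": float(box.find("xmin").text) / width,
            "ymin": float(box.find("ymin").text) / height,
            "xmax": float(box.find("xmax").text) / width,
            "ymax": float(box.find("ymax").text) / height,
        })
    return root.find("filename").text, boxes


filename, boxes = read_boxes(SAMPLE_XML)
print(filename, [b["name"] for b in boxes])
```

Normalizing by width and height keeps the labels valid after the image is resized to the network input size.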





import os
from absl import app, flags, logging
from absl.flags import FLAGS
import tensorflow as tf
import lxml.etree
import tqdm
flags.DEFINE_string('data_dir', './data/voc/VOCdevkit/VOC2012/',
                    'path to PASCAL VOC dataset')
# the default split matches the default output file name below
flags.DEFINE_enum('split', 'train', [
                  'train', 'val'], 'specify train or val split')
flags.DEFINE_string('output_file', './data/voc2012_train.tfrecord', 'output dataset')
flags.DEFINE_string('classes', './data/voc2012.names', 'classes file')


def build_example(annotation, class_map):
    img_path = os.path.join(
        FLAGS.data_dir, 'JPEGImages', annotation['filename'])
    img_raw = open(img_path, 'rb').read()
    width = int(annotation['size']['width'])
    height = int(annotation['size']['height'])
    xmin = []
    ymin = []
    xmax = []
    ymax = []
    classes = []
    classes_text = []
    if 'object' in annotation:
        for obj in annotation['object']:
            xmin.append(float(obj['bndbox']['xmin']) / width)
            ymin.append(float(obj['bndbox']['ymin']) / height)
            xmax.append(float(obj['bndbox']['xmax']) / width)
            ymax.append(float(obj['bndbox']['ymax']) / height)
            classes_text.append(obj['name'].encode('utf8'))
            classes.append(class_map[obj['name']])
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
        'image/object/bbox/xmin':           tf.train.Feature(float_list=tf.train.FloatList(value=xmin)),
        'image/object/bbox/xmax':           tf.train.Feature(float_list=tf.train.FloatList(value=xmax)),
        'image/object/bbox/ymin':           tf.train.Feature(float_list=tf.train.FloatList(value=ymin)),
        'image/object/bbox/ymax':           tf.train.Feature(float_list=tf.train.FloatList(value=ymax)),
        'image/object/class/text':          tf.train.Feature(bytes_list=tf.train.BytesList(value=classes_text)),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=classes)),
    }))
    return example


def parse_xml(xml):
    if not len(xml):
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml(child)
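        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            # an image may contain several <object> entries, so collect them in a list
            if child.tag not in result:
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}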

def main(_argv):
    # map each class name (one per line in the names file) to an integer label
    class_map = {name: idx for idx, name in enumerate(
        open(FLAGS.classes).read().splitlines())}
    writer = tf.io.TFRecordWriter(FLAGS.output_file)
    image_list = open(os.path.join(
        FLAGS.data_dir, 'ImageSets', 'Main', '%s.txt' % FLAGS.split)).read().splitlines()
    logging.info("Image list loaded: %d", len(image_list))
    for name in tqdm.tqdm(image_list):
        annotation_xml = os.path.join(
            FLAGS.data_dir, 'Annotations', name + '.xml')
        annotation_xml = lxml.etree.fromstring(open(annotation_xml).read())
        annotation = parse_xml(annotation_xml)['annotation']
        tf_example = build_example(annotation, class_map)
        # serialize the example and append it to the TFRecord file
        writer.write(tf_example.SerializeToString())
    writer.close()


if __name__ == '__main__':
    app.run(main)



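YOLOv5 comes in four variants: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. Each model is configured through a .yaml file that defines the following parameters. nc is the number of target classes. depth_multiple controls the network depth: it is the layer scaling factor, and the number field of every repeated module such as BottleneckCSP is multiplied by it to obtain the final layer count. width_multiple controls the network width: it is the channel scaling factor applied to the channel settings of the backbone and head. Together these two parameters produce models of different size and complexity, which is how the four YOLOv5 variants differ. anchors lists the preset anchor boxes, nine sizes defined for a 640×640 input image. The file also defines the backbone network and the general detection layers in head. The head performs the final detection: it applies the anchor boxes to the feature maps and produces the output vectors containing class probabilities, objectness scores and bounding boxes. YOLOv5s.yaml is shown below as an example.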
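The effect of the two multipliers is easy to check by hand. The sketch below reproduces the two scaling formulas that `parse_model` uses later in this article and applies them to the multiplier values from the listing that follows. The function names `scale_depth`, `scale_width` and `detect_outputs` are illustrative and do not exist in the project.

```python
import math


def scale_depth(number, depth_multiple):
    """Scale a module's repeat count; modules that appear once stay at 1."""
    return max(round(number * depth_multiple), 1) if number > 1 else number


def scale_width(channels, width_multiple, divisor=8):
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(channels * width_multiple / divisor) * divisor


def detect_outputs(num_anchors, num_classes):
    """Channels per detection head: (x, y, w, h, objectness + classes) per anchor."""
    return num_anchors * (num_classes + 5)


depth_multiple, width_multiple = 0.67, 0.75
for number, channels in [(3, 128), (9, 256), (9, 512), (3, 1024)]:
    print(number, "->", scale_depth(number, depth_multiple),
          "|", channels, "->", scale_width(channels, width_multiple))
print("detect outputs:", detect_outputs(3, 20))
```

Rounding channels up to a multiple of 8 keeps tensor shapes friendly to hardware, while the `max(..., 1)` guard ensures no module is scaled away entirely.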

# Parameters
nc: 20  # number of classes
depth_multiple: 0.67  # model depth multiple
width_multiple: 0.75  # layer channel multiple
# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [-1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [-1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [-1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [-1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]




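2.4.1.        CBL module

CBL is the convolution block. In the YOLOv5 backbone it takes the form Convolution + Batch Normalization + Activation: the input goes through a convolution, then batch normalization, then an activation function. The activation used here is LeakyReLU, which adds non-linearity to the network and speeds up convergence.

2.4.2.        Focus module

As the .yaml configuration shows, the backbone contains a Focus module. Focus slices the image: by taking every second pixel it produces four sub-images, so the height and width are halved and the number of channels becomes four times the original. The operation resembles 2x downsampling but loses no image information. Taking YOLOv5s as an example, a 640 × 640 × 3 input image passes through Focus and comes out as a 320 × 320 × 12 feature map.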
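The slicing step can be illustrated without any framework. The sketch below applies the same four strided slices as the `Focus` layer in this article to a tiny 4 × 4 × 3 image stored as nested lists, and concatenates them along the channel axis. `focus_slice` is an illustrative helper; it leaves out the convolution that follows the slicing in the real layer.

```python
def focus_slice(image):
    """Space-to-depth with block size 2 on an H x W x C nested list."""
    height, width = len(image), len(image[0])
    assert height % 2 == 0 and width % 2 == 0, "height and width must be even"
    # (row offset, column offset), in the same order as the tf.concat call in Focus
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    output = []
    for row in range(0, height, 2):
        out_row = []
        for col in range(0, width, 2):
            pixel = []
            for d_row, d_col in offsets:
                pixel.extend(image[row + d_row][col + d_col])
            out_row.append(pixel)
        output.append(out_row)
    return output


# 4 x 4 x 3 toy image; each value encodes its own (row, column, channel)
image = [[[100 * r + 10 * c + ch for ch in range(3)] for c in range(4)]
         for r in range(4)]
sliced = focus_slice(image)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))
```

Every input value appears exactly once in the output, which is why the operation halves the resolution without discarding information.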

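2.4.3.        Bottleneck module

The Bottleneck module changes the number of channels through convolutions. Bottleneck layers come in several forms; the standard form applies a 1×1 and a 3×3 convolution and then adds the input back through a shortcut connection. BottleneckCSP is a stack of several standard bottlenecks. The C3 module in YOLOv5 is similar to BottleneckCSP, except that in C3 the convolutions are followed by a BN layer and an activation function.

2.4.4.        SPP module


2.4.5.        Upsample module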


import tensorflow as tf
from tensorflow.keras.layers import Layer, Conv2D, BatchNormalization, MaxPool2D
from tensorflow import keras
import math
import numpy as np

class Conv2d(keras.layers.Layer):
    def __init__(self, c1, c2, k, s=1, g=1, bias=True, w=None):
        super(Conv2d, self).__init__()
        assert g == 1, "TF v2.2 Conv2D does not support 'groups' argument"
        self.conv = keras.layers.Conv2D(
            c2, k, s, 'VALID', use_bias=bias,
            kernel_initializer=keras.initializers.Constant(w.weight.permute(2, 3, 1, 0).numpy()),
            bias_initializer=keras.initializers.Constant(w.bias.numpy()) if bias else None )
    def call(self, inputs):
        return self.conv(inputs)
class LeakyRelu(object):
    def __call__(self, x):
        return tf.nn.leaky_relu(x)

class Conv(Layer):
    def __init__(self, filters, kernel_size, strides, padding='SAME', groups=1):
        super(Conv, self).__init__()
        self.conv = Conv2D(filters, kernel_size, strides, padding, groups=groups,
                           use_bias=False)
        self.bn = BatchNormalization()
        self.activation = LeakyRelu()
    def call(self, x):
        return self.activation(self.bn(self.conv(x)))

class Focus(Layer):
    def __init__(self, filters, kernel_size, strides=1, padding='SAME'):
        super(Focus, self).__init__()
        self.conv = Conv(filters, kernel_size, strides, padding)
    def call(self, x):
        return self.conv(tf.concat([x[..., ::2, ::2, :],
                                    x[..., 1::2, ::2, :],
                                    x[..., ::2, 1::2, :],
                                    x[..., 1::2, 1::2, :]],
                                   axis=-1))

class Bottleneck(Layer):
    def __init__(self, units, shortcut=True, expansion=0.5):
        super(Bottleneck, self).__init__()
        self.conv1 = Conv(int(units * expansion), 1, 1)
        self.conv2 = Conv(units, 3, 1)
        self.shortcut = shortcut
    def call(self, x):
        if self.shortcut:
            return x + self.conv2(self.conv1(x))
        return self.conv2(self.conv1(x))

class BottleneckCSP(Layer):
    def __init__(self, units, n_layer=1, shortcut=True, expansion=0.5):
        super(BottleneckCSP, self).__init__()
        units_e = int(units * expansion)
        self.conv1 = Conv(units_e, 1, 1)
        self.conv2 = Conv2D(units_e, 1, 1, use_bias=False,
                            kernel_initializer=tf.random_normal_initializer(stddev=0.01))
        self.conv3 = Conv2D(units_e, 1, 1, use_bias=False,
                            kernel_initializer=tf.random_normal_initializer(stddev=0.01))
        self.conv4 = Conv(units, 1, 1)
        self.bn = BatchNormalization(momentum=0.03)
        self.activation = LeakyRelu()
        self.modules = tf.keras.Sequential([Bottleneck(units_e, shortcut, expansion=1.0) for _ in range(n_layer)])

    def call(self, x):
        y1 = self.conv3(self.modules(self.conv1(x)))
        y2 = self.conv2(x)
        return self.conv4(self.activation(self.bn(tf.concat([y1, y2], axis=-1))))


class SPP(Layer):
    def __init__(self, units, kernels=(5, 9, 13)):
        super(SPP, self).__init__()
        units_e = units // 2  # Todo:
        self.conv1 = Conv(units_e, 1, 1)
        self.conv2 = Conv(units, 1, 1)
        self.modules = [MaxPool2D(pool_size=x, strides=1, padding='SAME') for x in kernels]
    def call(self, x):
        x = self.conv1(x)
        return self.conv2(tf.concat([x] + [module(x) for module in self.modules], axis=-1))
class SPPCSP(Layer):
    # Cross Stage Partial Networks
    def __init__(self, units, n=1, shortcut=False, expansion=0.5, kernels=(5, 9, 13)):
        super(SPPCSP, self).__init__()
        units_e = int(2 * units * expansion)
        self.conv1 = Conv(units_e, 1, 1)
        self.conv2 = Conv2D(units_e, 1, 1, use_bias=False,
                            kernel_initializer=tf.random_normal_initializer(stddev=0.01))
        self.conv3 = Conv(units_e, 3, 1)
        self.conv4 = Conv(units_e, 1, 1)
        self.modules = [MaxPool2D(pool_size=x, strides=1, padding='same') for x in kernels]
        self.conv5 = Conv(units_e, 1, 1)
        self.conv6 = Conv(units_e, 3, 1)
        self.bn = BatchNormalization()
        self.act = LeakyRelu()
        self.conv7 = Conv(units, 1, 1)
    def call(self, x):
        x1 = self.conv4(self.conv3(self.conv1(x)))
        y1 = self.conv6(self.conv5(tf.concat([x1] + [module(x1) for module in self.modules], axis=-1)))
        y2 = self.conv2(x)
        return self.conv7(self.act(self.bn(tf.concat([y1, y2], axis=-1))))
class Upsample(Layer):
    def __init__(self, i=None, ratio=2, method='bilinear'):
        super(Upsample, self).__init__()
        self.ratio = ratio
        self.method = method

    def call(self, x):
        return tf.image.resize(x, (tf.shape(x)[1] * self.ratio, tf.shape(x)[2] * self.ratio), method=self.method)




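In the loss computation, both the classification term and the confidence term use binary cross-entropy, reweighted in the style of Focal Loss (with its gamma and alpha parameters), while the bounding-box term uses a GIoU-based loss.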
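As a small illustration of the reweighting idea, the sketch below computes binary cross-entropy for a single prediction and scales it by the squared difference between target and prediction, which is the factor the `Loss` class in this article calls `conf_focal`. This is a scalar sketch for intuition only: the function names are illustrative, and it deliberately has no separate alpha term because the code in this article does not use one.

```python
import math


def binary_cross_entropy(target, prob, eps=1e-7):
    """Binary cross-entropy for one probability, clipped for numerical safety."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def focal_weighted_bce(target, prob):
    """Scale the cross-entropy by (target - prob)^2 so easy examples count less."""
    return (target - prob) ** 2 * binary_cross_entropy(target, prob)


easy = focal_weighted_bce(1.0, 0.9)  # confident and correct
hard = focal_weighted_bce(1.0, 0.3)  # mostly wrong
print(easy, hard)
```

The weighting shrinks the contribution of predictions that are already close to the target, so the many easy background cells do not dominate the confidence loss.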

def parse_model(yaml_dict):  # model_dict, input_channels(3)
    anchors, nc = yaml_dict['anchors'], yaml_dict['nc']
    depth_multiple, width_multiple = yaml_dict['depth_multiple'], yaml_dict['width_multiple']
    num_anchors = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors
    output_dims = num_anchors * (nc + 5)
    layers = []
    # from, number, module, args
    for i, (f, number, module, args) in enumerate(yaml_dict['backbone'] + yaml_dict['head']):
        # every component is a class: instantiate it here, call it in Model.forward
        module = eval(module) if isinstance(module, str) else module
        for j, arg in enumerate(args):
            try:
                # resolve names such as nc or anchors; plain strings stay unchanged
                args[j] = eval(arg) if isinstance(arg, str) else arg
            except Exception:
                pass
        # depth_multiple scales how many times a module is repeated
        number = max(round(number * depth_multiple), 1) if number > 1 else number

        if module in [Conv2D, Conv, Bottleneck, SPP, Focus, BottleneckCSP, C3]:
            # width_multiple scales the output channels, rounded up to a multiple of 8
            c2 = args[0]
            c2 = math.ceil(c2 * width_multiple / 8) * 8 if c2 != output_dims else c2
            args = [c2, *args[1:]]
            if module in [BottleneckCSP, C3, SPPCSP]:
                args.insert(1, number)
                number = 1
        # Sequential expects a list of layers
        modules = tf.keras.Sequential([module(*args) for _ in range(number)]) if number > 1 else module(*args)
        modules.i, modules.f = i, f
        layers.append(modules)
    return layers


class Model(object):    # model, channels, classes
    def __init__(self, cfg='yolov5s.yaml', ch=3, nc=20, model=None, imgsz=(640, 640)):
        super(Model, self).__init__()
        if isinstance(cfg, dict):
            self.yaml = cfg  # model dict
        else:  # is *.yaml
            import yaml
            from pathlib import Path
            self.yaml_file = Path(cfg).name
            with open(cfg) as f:
                self.yaml = yaml.load(f, Loader=yaml.FullLoader)  # model dict
        self.imgsz = imgsz
        # Define model
        if nc and nc != self.yaml['nc']:
            print('Overriding %s nc=%g with nc=%g' % (cfg, self.yaml['nc'], nc))
            self.yaml['nc'] = nc  # override yaml value
        self.model = parse_model(self.yaml)
        module = self.model[-1]
        if isinstance(module, Detect):
            # transfer the anchors to grid coordinates, 3 * 3 * 2
            module.anchors /= tf.reshape(module.stride, [-1, 1, 1])

    def __call__(self, img_size, name='yolo'):
        x = tf.keras.Input([img_size, img_size, 3])
        output = self.forward(x)
        return tf.keras.Model(inputs=x, outputs=output, name=name)

    def forward(self, inputs, tf_nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25):
        y = []  # outputs of every layer, indexed by layer number
        x = inputs
        for i, m in enumerate(self.model):
            if m.f != -1:  # input does not come from the previous layer
                if isinstance(m.f, int):
                    x = y[m.f]
                else:
                    x = [x if j == -1 else y[j] for j in m.f]
            x = m(x)  # run
            y.append(x)
        return x

class Loss(object):
    def __init__(self, anchors, iou_thres, num_classes=20, img_size=640, label_smoothing=0):
        self.anchors = anchors
        self.strides = [8, 16, 32]
        self.iou_thres = iou_thres
        self.num_classes = num_classes
        self.img_size = img_size
        self.bce_conf = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
        self.bce_class = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE,
                                                            label_smoothing=label_smoothing)

    def __call__(self, y_true, y_pred):
        iou_loss_all = obj_loss_all = class_loss_all = tf.zeros(1)
        balance = [4.0, 1.0, 0.4] if len(y_pred) == 3 else [4.0, 1.0, 0.25, 0.06]
        for i, (pred, true) in enumerate(zip(y_pred, y_true)):
            true_box, true_obj, true_class = tf.split(true, (4, 1, -1), axis=-1)
            pred_box, pred_obj, pred_class = tf.split(pred, (4, 1, -1), axis=-1)
            if tf.shape(true_class)[-1] == 1 and self.num_classes > 1:
                true_class = tf.squeeze(tf.one_hot(tf.cast(true_class, tf.dtypes.int32),
                                                   depth=self.num_classes, axis=-1), -2)
            # smaller boxes get a larger weight
            box_scale = 2 - 1.0 * true_box[..., 2] * true_box[..., 3] / (self.img_size ** 2)
            obj_mask = tf.squeeze(true_obj, -1)  # obj or noobj
            background_mask = 1.0 - obj_mask
            conf_focal = tf.squeeze(tf.math.pow(true_obj - pred_obj, 2), -1)
            # giou loss
            iou = bbox_iou(pred_box, true_box, xyxy=False, giou=True)
            iou_loss = (1 - iou) * obj_mask * box_scale  # batch_size * grid * grid * 3
            # confidence loss
            conf_loss = self.bce_conf(true_obj, pred_obj)
            conf_loss = conf_focal * (obj_mask * conf_loss + background_mask * conf_loss)
            # class loss
            class_loss = obj_mask * self.bce_class(true_class, pred_class)
            iou_loss = tf.reduce_mean(tf.reduce_sum(iou_loss, axis=[1, 2, 3]))
            conf_loss = tf.reduce_mean(tf.reduce_sum(conf_loss, axis=[1, 2, 3]))
            class_loss = tf.reduce_mean(tf.reduce_sum(class_loss, axis=[1, 2, 3]))

            iou_loss_all += iou_loss * balance[i]
            obj_loss_all += conf_loss * balance[i]
            class_loss_all += class_loss * self.num_classes * balance[i]  # to balance the 3 loss

        return (iou_loss_all, obj_loss_all, class_loss_all)

def bbox_iou(bbox1, bbox2, xyxy=False, giou=False, diou=False, ciou=False, epsilon=1e-9):
    assert bbox1.shape == bbox2.shape
    # giou loss: https://arxiv.org/abs/1902.09630
    if xyxy:
        b1x1, b1y1, b1x2, b1y2 = bbox1[..., 0], bbox1[..., 1], bbox1[..., 2], bbox1[..., 3]
        b2x1, b2y1, b2x2, b2y2 = bbox2[..., 0], bbox2[..., 1], bbox2[..., 2], bbox2[..., 3]
    else:  # xywh -> xyxy
        b1x1, b1x2 = bbox1[..., 0] - bbox1[..., 2] / 2, bbox1[..., 0] + bbox1[..., 2] / 2
        b1y1, b1y2 = bbox1[..., 1] - bbox1[..., 3] / 2, bbox1[..., 1] + bbox1[..., 3] / 2
        b2x1, b2x2 = bbox2[..., 0] - bbox2[..., 2] / 2, bbox2[..., 0] + bbox2[..., 2] / 2
        b2y1, b2y2 = bbox2[..., 1] - bbox2[..., 3] / 2, bbox2[..., 1] + bbox2[..., 3] / 2

    # intersection area
    inter = tf.maximum(tf.minimum(b1x2, b2x2) - tf.maximum(b1x1, b2x1), 0) * \
            tf.maximum(tf.minimum(b1y2, b2y2) - tf.maximum(b1y1, b2y1), 0)

    # union area
    w1, h1 = b1x2 - b1x1 + epsilon, b1y2 - b1y1 + epsilon
    w2, h2 = b2x2 - b2x1+ epsilon, b2y2 - b2y1 + epsilon
    union = w1 * h1 + w2 * h2 - inter + epsilon

    # Giou
    iou = inter / union

    cw = tf.maximum(b1x2, b2x2) - tf.minimum(b1x1, b2x1)
    ch = tf.maximum(b1y2, b2y2) - tf.minimum(b1y1, b2y1)
    enclose_area = cw * ch + epsilon
    giou = iou - 1.0 * (enclose_area - union) / enclose_area
    return tf.clip_by_value(giou, -1, 1)
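
The formula in `bbox_iou` can be checked on single boxes. The sketch below is a scalar version of the same computation for boxes given as (center x, center y, width, height). `giou_xywh` is an illustrative helper, not part of the project; it omits the final clipping to the range -1..1.

```python
def giou_xywh(box1, box2, eps=1e-9):
    """GIoU of two boxes in (center_x, center_y, width, height) format."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # convert centers and sizes to corner coordinates
    b1x1, b1x2, b1y1, b1y2 = x1 - w1 / 2, x1 + w1 / 2, y1 - h1 / 2, y1 + h1 / 2
    b2x1, b2x2, b2y1, b2y2 = x2 - w2 / 2, x2 + w2 / 2, y2 - h2 / 2, y2 + h2 / 2
    # intersection and union areas
    inter = (max(min(b1x2, b2x2) - max(b1x1, b2x1), 0.0) *
             max(min(b1y2, b2y2) - max(b1y1, b2y1), 0.0))
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # smallest box that encloses both inputs
    enclose = ((max(b1x2, b2x2) - min(b1x1, b2x1)) *
               (max(b1y2, b2y2) - min(b1y1, b2y1)) + eps)
    return iou - (enclose - union) / enclose


print(giou_xywh((5, 5, 2, 2), (5, 5, 2, 2)))  # identical boxes
print(giou_xywh((1, 1, 2, 2), (5, 1, 2, 2)))  # disjoint boxes
```

Unlike plain IoU, which is 0 for every pair of disjoint boxes, GIoU becomes more negative as the boxes move apart, so the loss 1 - GIoU still provides a useful signal when the prediction does not overlap the target.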



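Once the network is built, the training data is read from the TFRecord file generated above. Set the number of classes, feed the data into the network in batches of the chosen batch size, and configure the number of training epochs, the optimizer and the learning rate. The trained model is saved as a .pb or .pbtxt file.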

from absl import app, flags, logging
from absl.flags import FLAGS
import tensorflow as tf
import numpy as np
import cv2
import time
from models.yolo import *
from data.dataset import *

flags.DEFINE_string('dataset', './data/voc2012_train.tfrecord', 'path to dataset')
flags.DEFINE_string('val_dataset', './data/voc2012_val.tfrecord', 'path to validation dataset')
flags.DEFINE_string('yaml_dir', './models/yolov5s.yaml', 'path to yaml file')
flags.DEFINE_string('classes', './data/voc2012.names', 'path to classes file')
flags.DEFINE_integer('epochs', 20, 'number of epochs')
flags.DEFINE_integer('batch_size', 8, 'batch size')
flags.DEFINE_integer('img_size', 640, 'image size')
flags.DEFINE_float('learning_rate', 1e-3, 'learning rate')
flags.DEFINE_integer('num_classes', 20, 'number of classes in the model')
flags.DEFINE_boolean('multi_gpu', False, 'Use if wishing to train with more than 1 GPU.')
flags.DEFINE_float('label_smoothing', 0.02, 'label smoothing')
flags.DEFINE_integer('yolo_max_boxes', 100, 'yolo max boxes')

def main(_argv):
    train_dataset = load_tfrecord_dataset(FLAGS.batch_size,
        FLAGS.dataset, FLAGS.classes, FLAGS.img_size)
    Yolo = Model(cfg=FLAGS.yaml_dir)
    anchors = Yolo.model[-1].anchors
    stride = Yolo.model[-1].stride
    num_classes = FLAGS.num_classes
    # any further AnchorLabeler arguments are cut off in the source listing
    anchor_label = AnchorLabeler(anchors,
                                 grids=FLAGS.img_size / stride)

    def transform(image, label):
        # encode the ground-truth boxes against the anchors
        label_encoder = anchor_label.encode(label)
        return image, label_encoder

    train_dataset = train_dataset.map(transform, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    train_dataset = train_dataset.batch(FLAGS.batch_size).prefetch(tf.data.experimental.AUTOTUNE)
    Yolo_loss = Loss(anchors, iou_thres=0.3, num_classes=num_classes,
                     img_size=FLAGS.img_size, label_smoothing=FLAGS.label_smoothing)
    optimizer = tf.keras.optimizers.Adam(learning_rate=FLAGS.learning_rate)
    Yolo = Yolo(FLAGS.img_size)
    for epoch in range(0, FLAGS.epochs):
        for step, (image, target) in enumerate(train_dataset):
            with tf.GradientTape() as tape:
                output = Yolo(image)
                iou_loss, conf_loss, prob_loss = Yolo_loss(target, output)
                pred_loss = iou_loss + conf_loss + prob_loss
                total_loss = tf.reduce_sum(pred_loss)
            grads = tape.gradient(total_loss, Yolo.trainable_variables)
            optimizer.apply_gradients(zip(grads, Yolo.trainable_variables))
            logging.info("{}_train_{}, {}, {}".format(epoch, step, total_loss.numpy(),
                list(map(lambda x: np.sum(x.numpy()), [iou_loss, conf_loss, prob_loss]))))
        # export the model in SavedModel format at the end of each epoch
        tf.saved_model.save(Yolo, '/data/Yolov5/weights/')


if __name__ == '__main__':
    app.run(main)









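Choose the download that matches the operating system and version you will deploy on, then install it. Everything in this article is based on the 2021.4.1 LTS release on Windows.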


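Converting to the OpenVINO toolkit IR format

$python mo_tf.py --saved_model_dir <path to the .pb folder> --input_shape [1,640,640,3] --output_dir <path to the output folder> --data_type FP32

After a successful run, the output folder contains an .xml file and a .bin file. These two files are how the OpenVINO™ toolkit stores a model, and the deployment that follows is based on them. The result is shown below.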



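This example deploys inference in C++. Deployment covers engine initialization, data preparation, inference and result processing. Engine initialization reads the converted model files and obtains the input and output information of the network. Data preparation resizes the input image to 640×640 and converts the channel order to RGB. The input is then copied into a blob and inference is run. This yields three detection heads, corresponding to grid sizes of 80, 40 and 20, and each result is parsed in turn. Finally, NMS removes the redundant candidate boxes.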
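The last step, NMS, is easy to sketch in a few lines. The version below is a plain greedy implementation for boxes given as (x, y, width, height), the same layout as an OpenCV `Rect`. It is an illustration of the idea under those assumptions, not OpenCV's implementation, and the helper names and sample boxes are made up for the example. It also shows where the grid sizes 80, 40 and 20 come from: 640 divided by the strides 8, 16 and 32.

```python
def grid_sizes(img_size=640, strides=(8, 16, 32)):
    """Grid size of each detection head for a square input image."""
    return [img_size // stride for stride in strides]


def iou_xywh(a, b):
    """IoU of two boxes in (left, top, width, height) format."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold, iou_threshold):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much, repeat."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_xywh(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(100, 100, 200, 200), (110, 105, 200, 200), (400, 300, 100, 120), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.05]
print(grid_sizes())
print(nms(boxes, scores, score_threshold=0.1, iou_threshold=0.5))
```

In this sample the second box overlaps the first heavily and is suppressed, the third box is far away and survives, and the fourth is dropped by the score threshold before any overlap test.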

// Headers
#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <cmath>
#include <string>
#include <vector>

using namespace InferenceEngine;
using namespace std;
using namespace cv;

// Helper functions, defined after main()
bool parse_yolov5(const Blob::Ptr &blob, int net_grid, float cof_threshold,
                  vector<Rect>& o_rect, vector<float>& o_rect_cof);
double sigmoid(double x);
vector<int> get_anchors(int net_grid);

int main() {
     // Initialize the inference engine
     Core ie;
     // Read the converted .xml and .bin files
     CNNNetwork network = ie.ReadNetwork("./openvino/yolov5s.xml", "./openvino/yolov5s.bin");

     // Get the input format information from the model
     InputsDataMap inputsInfo = network.getInputsInfo();
     InputInfo::Ptr& input = inputsInfo.begin()->second;
     string inputs_name = inputsInfo.begin()->first;
     ICNNNetwork::InputShapes inputShapes = network.getInputShapes();

     // Get the output format information from the model and collect the output names
     OutputsDataMap outputsInfo = network.getOutputsInfo();
     vector<string> OutputsBlobs_names;
     for (auto& item_out : outputsInfo) {
         OutputsBlobs_names.push_back(item_out.first);
     }

     // Load the network; "CPU" is the inference device, "GPU" is also possible
     ExecutableNetwork executable_network = ie.LoadNetwork(network, "CPU");
     // Create the inference request
     InferRequest infer_request = executable_network.CreateInferRequest();

     // Data preparation: resize to 640x640, convert BGR to RGB, fill the blob in NCHW order
     Mat src = cv::imread("./img/test.jpg");
     Blob::Ptr lrInputBlob = infer_request.GetBlob(inputs_name);
     size_t h = lrInputBlob->getTensorDesc().getDims()[2];
     size_t w = lrInputBlob->getTensorDesc().getDims()[3];
     size_t image_size = h * w;
     Mat inframe = src.clone();
     cv::resize(src, src, Size(640, 640));
     cv::cvtColor(src, src, COLOR_BGR2RGB);
     InferenceEngine::LockedMemory<void> blobMapped = InferenceEngine::as<MemoryBlob>(lrInputBlob)->wmap();
     float* blob_data = blobMapped.as<float*>();
     for (size_t row = 0; row < h; row++) {
         for (size_t col = 0; col < w; col++) {
              for (size_t ch = 0; ch < 3; ch++) {
                   blob_data[image_size*ch + row * w + col] = float(src.at<Vec3b>(row, col)[ch]) / 255.0f;
              }
         }
     }

     // Run inference
     infer_request.Infer();

     // Parse the three detection heads
     float _cof_threshold = 0.1;
     float _nms_area_threshold = 0.5;
     vector<Rect> origin_rect;
     vector<float> origin_rect_cof;
     int s[3] = { 80,40,20 };
     int i = 0;
     for (auto OutputsBlob_name : OutputsBlobs_names) {
         Blob::Ptr OutputBlob = infer_request.GetBlob(OutputsBlob_name);
         parse_yolov5(OutputBlob, s[i], _cof_threshold, origin_rect, origin_rect_cof);
         ++i;
     }

     // Remove redundant candidate boxes with NMS
     vector<int> final_id;
     dnn::NMSBoxes(origin_rect, origin_rect_cof, _cof_threshold, _nms_area_threshold, final_id);

     // Boxes are in 640x640 network coordinates; scale them back to the original image
     double sx = inframe.cols / 640.0, sy = inframe.rows / 640.0;
     for (size_t i = 0; i < final_id.size(); ++i) {
         int idx = final_id[i];
         Rect r = origin_rect[idx];
         Rect box(cvRound(r.x * sx), cvRound(r.y * sy), cvRound(r.width * sx), cvRound(r.height * sy));
         cv::rectangle(inframe, box, Scalar(140, 199, 0), 1, 8, 0);
     }
     cv::imwrite("./img/output.jpg", inframe);
     return 0;
}

bool parse_yolov5(const Blob::Ptr &blob, int net_grid, float cof_threshold,
                  vector<Rect>& o_rect, vector<float>& o_rect_cof) {
    vector<int> anchors = get_anchors(net_grid);
    LockedMemory<const void> blobMapped = as<MemoryBlob>(blob)->rmap();
    const float *output_blob = blobMapped.as<float *>();
    int item_size = 25;  // 20 classes + 4 box values + 1 objectness score
    size_t anchor_n = 3;
    for (int n = 0; n < anchor_n; ++n)
        for (int i = 0; i < net_grid; ++i)
            for (int j = 0; j < net_grid; ++j) {
                int base = n*net_grid*net_grid*item_size + i*net_grid*item_size + j*item_size;
                double box_prob = sigmoid(output_blob[base + 4]);
                // skip cells whose objectness is already below the threshold
                if (box_prob < cof_threshold)
                    continue;
                double x = output_blob[base + 0];
                double y = output_blob[base + 1];
                double w = output_blob[base + 2];
                double h = output_blob[base + 3];
                double max_prob = 0;
                int idx = 0;
                for (int t = 5; t < 25; ++t) {
                    double tp = sigmoid(output_blob[base + t]);
                    if (tp > max_prob) {
                        max_prob = tp;
                        idx = t;
                    }
                }
                float cof = box_prob * max_prob;
                if (cof < cof_threshold)
                    continue;
                // decode the box center and size into 640x640 pixel coordinates
                x = (sigmoid(x)*2 - 0.5 + j)*640.0f/net_grid;
                y = (sigmoid(y)*2 - 0.5 + i)*640.0f/net_grid;
                w = pow(sigmoid(w)*2,2) * anchors[n*2];
                h = pow(sigmoid(h)*2,2) * anchors[n*2 + 1];
                double r_x = x - w/2;
                double r_y = y - h/2;
                Rect rect = Rect(round(r_x),round(r_y),round(w),round(h));
                o_rect.push_back(rect);
                o_rect_cof.push_back(cof);
            }
    if(o_rect.size() == 0) return false;
    else return true;
}

double sigmoid(double x){
    return (1 / (1 + exp(-x)));
}

vector<int> get_anchors(int net_grid){
    vector<int> anchors;
    int anchor_80[6] = {10,13, 16,30, 33,23};
    int anchor_40[6] = {30,61, 62,45, 59,119};
    int anchor_20[6] = {116,90, 156,198, 373,326};
    if(net_grid == 80){ anchors.assign(anchor_80, anchor_80 + 6); }
    else if(net_grid == 40){ anchors.assign(anchor_40, anchor_40 + 6); }
    else if(net_grid == 20){ anchors.assign(anchor_20, anchor_20 + 6); }
    return anchors;
}
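
The box decoding used in `parse_yolov5` can be checked by hand on one grid cell. The sketch below repeats the same arithmetic in Python for a single prediction. `decode_cell` and its parameter names are illustrative, and the sample values are chosen so that the result is easy to verify.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(tx, ty, tw, th, col, row, net_grid, anchor_w, anchor_h, img_size=640):
    """Turn raw outputs of one cell into (left, top, width, height) in pixels."""
    stride = img_size / net_grid
    # the center may move up to half a cell outside its own cell
    center_x = (sigmoid(tx) * 2 - 0.5 + col) * stride
    center_y = (sigmoid(ty) * 2 - 0.5 + row) * stride
    # width and height range from 0 to 4 times the anchor size
    width = (sigmoid(tw) * 2) ** 2 * anchor_w
    height = (sigmoid(th) * 2) ** 2 * anchor_h
    return center_x - width / 2, center_y - height / 2, width, height


# all-zero raw outputs in cell (col=10, row=20) of the 80x80 head, anchor 10x13
print(decode_cell(0, 0, 0, 0, col=10, row=20, net_grid=80, anchor_w=10, anchor_h=13))
```

With all raw outputs at zero, every sigmoid is 0.5, so the box is centered in its cell and has exactly the anchor size.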





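Intel® DevCloud for the Edge lets you prototype and experiment with AI workloads for computer vision on Intel hardware. You can test model performance with the OpenVINO™ toolkit on combinations of CPU, GPU, VPU and FPGA. Intel® DevCloud uses Jupyter* Notebook to run code directly in the web browser and shows the visualized results immediately. The converted .xml and .bin files are tested on different edge nodes to analyze performance; for the detailed steps see https://bizwebcast.intel.cn/dev/articleDetails.html?id=95. The test results are listed in Table 4-1.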

