本教程转载于：https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/, 在原教程上加入了自己的理解，我的理解将用 这样的格式写出. （原博客中有错误的地方在本文中也进行了修正）

这是从头开始实现 YOLO v3检测器的教程的第5部分。在上部分中，我们实现了一个将网络输出转换为检测预测的函数。有了一个可以工作的检测器，剩下的就是创建输入和输出管道了。

在这一部分中，我们将构建检测器的输入和输出管道。这包括从磁盘上读取图像，进行预测，使用预测结果在图像上绘制边框，然后将它们保存到下来。我们还将介绍如何让detector在视频流中实时工作。我们将介绍一些命令行标志，以允许对网络的各种超级参数进行一些调整实验。那么让我们开始吧。

创建一个文件叫做detector.py, 添加必要的导入在它的顶部。

from __future__ import division
import time
import torch 
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import cv2 
from util import *
import argparse
import os 
import os.path as osp
from darknet import Darknet
import pickle as pkl
import pandas as pd
import random

创建命令行参数

由于 detector.py 是我们要执行来运行检测器的文件，所以我们可以将命令行参数传递给它。我们使用 python 的 ArgParse 模块来实现这一点。

def arg_parse():
    """
    解析arg参数
    """

    parser = argparse.ArgumentParser(description='YOLO v3 Detection Module')

    parser.add_argument("--images", dest='images', help=
    "Image / Directory containing images to perform detection upon",
                        default="imgs", type=str)
    parser.add_argument("--det", dest='det', help=
    "Image / Directory to store detections to",
                        default="det", type=str)
    parser.add_argument("--bs", dest="bs", help="Batch size", default=1)
    parser.add_argument("--confidence", dest="confidence", help="Object Confidence to filter predictions", default=0.5)
    parser.add_argument("--nms_thresh", dest="nms_thresh", help="NMS Threshhold", default=0.4)
    parser.add_argument("--cfg", dest='cfgfile', help="Config file",
                        default="cfg/yolov3.cfg", type=str)
    parser.add_argument("--weights", dest='weightsfile', help=
    "weightsfile",
                        default="yolov3.weights", type=str)
    parser.add_argument("--reso", dest='reso', help=
    "Input resolution of the network. Increase to increase accuracy. Decrease to increase speed",
                        default="416", type=str)

    return parser.parse_args()
  
  
args = arg_parse()
images = args.images
batch_size = int(args.bs)
confidence = float(args.confidence)
nms_thesh = float(args.nms_thresh)
start = 0
CUDA = torch.cuda.is_available()

其中，重要的标志包括images(用于指定输入图像或图像目录)、 det (用于保存检测的目录)、 reso (输入图像的分辨率，可用于速度精度折衷)、 cfg (可选配置文件)和权重文件。

加载网络

从这里下载 coco.names 文件，该文件包含 COCO 数据集中对象的名称。在检测器目录中创建一个文件夹数据。同样地，如果你在 linux 上，你可以输入。

1
2
3

mkdir data
cd data
wget https://raw.githubusercontent.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch/master/data/coco.names

然后，在程序中加载类文件。

1 2	num_classes = 80 #For COCO classes = load_classes("data/coco.names")

load_classes 是在 util.py 中定义的一个函数，它返回一个字典，该字典将每个类的索引映射到它名称的字符串。

def load_classes(namesfile):
    fp = open(namesfile, "r")
    names = fp.read().split("\n")[:-1]
    return names

初始化网络和加载权重。

# Set up the neural network
print("Loading network.....")
model = Darknet(args.cfgfile)
model.load_weights(args.weightsfile)
print("Network successfully loaded")

model.net_info["height"] = args.reso
inp_dim = int(model.net_info["height"])
assert inp_dim % 32 == 0
assert inp_dim > 32

# If there's a GPU availible, put the model on GPU
if CUDA:
    model.cuda()

# Set the model in evaluation mode
model.eval()

加载输入图片

从磁盘读取中读取图片，或者从文件夹中读取图像。图像的路径存储在一个名为 imlist 的列表中。

read_dir = time.time()
#Detection phase
try:
    imlist = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)]
except NotADirectoryError:
    imlist = []
    imlist.append(osp.join(osp.realpath('.'), images))
except FileNotFoundError:
    print ("No file or directory with the name {}".format(images))
    exit()

read_dir 是一个用于度量时间的检查点(我们将会遇到其中的几个)

接下来判断det文件夹是否存在，如果不存在，我们就创建一个：

1 2	if not os.path.exists(args.det): os.makedirs(args.det)

我们使用opencv去读取图片：

1 2	load_batch = time.time() loaded_ims = [cv2.imread(x) for x in imlist]

OpenCV 以 numpy 数组的形式加载图像，以 BGR 作为颜色通道顺序。PyTorch 的图像输入格式为(batch x 通道 x 高度 x 宽度) ，通道顺序为 RGB。因此，我们将函数 prep_image 写入 util.py 中，将 numpy 数组转换为 PyTorch 的输入格式。

在编写这个函数之前，我们必须编写一个函数来调整图像的大小，保持长宽比的一致性，并填充剩下的区域。

def letterbox_image(img, inp_dim):
    '''resize image with unchanged aspect ratio using padding'''
    img_w, img_h = img.shape[1], img.shape[0]
    w, h = inp_dim
    new_w = int(img_w * min(w/img_w, h/img_h))
    new_h = int(img_h * min(w/img_w, h/img_h))
    resized_image = cv2.resize(img, (new_w,new_h), interpolation = cv2.INTER_CUBIC)
    
    canvas = np.full((inp_dim[1], inp_dim[0], 3), 128)

    canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w,  :] = resized_image
    
    return canvas

现在，我们编写一个接受 OpenCV 图像的函数，并将其转换为我们网络的输入。

def prep_image(img, inp_dim):
    """
    Prepare image for inputting to the neural network. 
    
    Returns a Variable 
    """

    img = cv2.resize(img, (inp_dim, inp_dim))
    img = img[:,:,::-1].transpose((2,0,1)).copy()
    img = torch.from_numpy(img).float().div(255.0).unsqueeze(0)
    return img

除了转换图像，我们还保持了一个原始图像的列表，以及一个包含原始图像维度的im_dim_list

#PyTorch Variables for images
im_batches = list(map(prep_image, loaded_ims, [inp_dim for x in range(len(imlist))]))

#List containing dimensions of original images
im_dim_list = [(x.shape[1], x.shape[0]) for x in loaded_ims]
im_dim_list = torch.FloatTensor(im_dim_list).repeat(1,2)

if CUDA:
    im_dim_list = im_dim_list.cuda()

批处理

leftover = 0
if len(im_dim_list) % batch_size:
    leftover = 1

if batch_size != 1:
    num_batches = len(imlist) // batch_size + leftover
    im_batches = [torch.cat((im_batches[i * batch_size: min((i + 1) * batch_size,
                                                            len(im_batches))])) for i in range(num_batches)]

检测

我们遍历每个batch的图片，生成预测，并将所有图像的预测张量连接起来。

对于每个batch，我们测量用于检测的时间，即从获取输入到生成 write _ results 函数的输出之间所花费的时间。在 write _ prediction 返回的输出中，其中一个属性是批处理图像的索引。我们以这样的方式转换这个特定属性，它现在表示 imlist 中图像的索引，即包含所有图像地址的列表。

在此之后，我们打印每次检测所用的时间以及在每个图像中检测到的对象。

如果批处理的 write_results 函数的输出是int(0) ，这意味着没有检测，我们使用 continue 跳过循环。

write = 0
start_det_loop = time.time()
for i, batch in enumerate(im_batches):
    #load the image 
    start = time.time()
    if CUDA:
        batch = batch.cuda()
	
    prediction = model(Variable(batch, volatile = True), CUDA)

    prediction = write_results(prediction, confidence, num_classes, nms_conf = nms_thesh)

    end = time.time()

    if type(prediction) == int:

        for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
            im_id = i*batch_size + im_num
            print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
            print("{0:20s} {1:s}".format("Objects Detected:", ""))
            print("----------------------------------------------------------")
        continue

    prediction[:,0] += i*batch_size    #transform the atribute from index in batch to index in imlist 

    if not write:                      #If we have't initialised output
        output = prediction  
        write = 1
    else:
        output = torch.cat((output,prediction))

    for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
        im_id = i*batch_size + im_num
        objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
        print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
        print("{0:20s} {1:s}".format("Objects Detected:", " ".join(objs)))
        print("----------------------------------------------------------")

    if CUDA:
        torch.cuda.synchronize()

现在，我们有所有图像的检测在我们的张量输出。让我们绘制的图像边框。

绘制检测框

我们使用一个 try-catch 块来检查是否进行了检测到了结果。如果不是这样的话，退出程序。

try:
    output
except NameError:
    print ("No detections were made")
    exit()

在我们绘制边界框之前，我们的输出张量中包含的预测符合网络的输入大小，而不是图像的原始大小。因此，在我们绘制box之前，让我们将每个box的属性转换为图像的原始维度。

im_dim_list = torch.index_select(im_dim_list, 0, output[:,0].long())

scaling_factor = torch.min(inp_dim/im_dim_list,1)[0].view(-1,1)


output[:,[1,3]] -= (inp_dim - scaling_factor*im_dim_list[:,0].view(-1,1))/2
output[:,[2,4]] -= (inp_dim - scaling_factor*im_dim_list[:,1].view(-1,1))/2

现在，我们的坐标符合我们的图像在填充区域的尺寸。然而，在函数letterbox_image中，我们用缩放因子调整了图像的尺寸(请记住，两个维度都用一个公共因子进行了划分，以保持长宽比)。现在，获取原始图像上边界框的坐标。

1	output[:,1:5] /= scaling_factor

现在，让我们去除掉超出图像边界的边界框。

1
2
3

for i in range(output.shape[0]):
    output[i, [1,3]] = torch.clamp(output[i, [1,3]], 0.0, im_dim_list[i,0])
    output[i, [2,4]] = torch.clamp(output[i, [2,4]], 0.0, im_dim_list[i,1])

如果图像中有太多的box，用一种颜色把它们都画出来可能不是一个好主意。将此文件下载到您的文件下。这是一个 pickle 文件，其中包含许多可随机选择的颜色。

1 2	class_load = time.time() colors = pkl.load(open("pallete", "rb"))

现在让我们写一个函数来绘制box。

draw = time.time()

def write(x, results, color):
    c1 = tuple(x[1:3].int())
    c2 = tuple(x[3:5].int())
    img = results[int(x[0])]
    cls = int(x[-1])
    label = "{0}".format(classes[cls])
    cv2.rectangle(img, c1, c2,color, 1)
    t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0]
    c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
    cv2.rectangle(img, c1, c2,color, -1)
    cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1);
    return img

上面的函数从颜色中随机选择一种颜色绘制一个矩形。它还在边框的左上角创建一个实心矩形，并写上检测的结果。我们使用 cv2.rectangle 函数创建一个实心矩形。

我们在本地定义write函数，这样它就可以访问颜色列表。我们也可以把颜色作为一个参数，但这样我们就只能对每张图片使用一种颜色。

在定义了这个函数之后，让我们现在在图像上绘制边框

1	list(map(lambda x: write(x, loaded_ims), output))

通过在图像名称前加上“ det _”来保存每个图像。我们创建了一个列表，将检测图像保存到该列表中。

1	det_names = pd.Series(imlist).apply(lambda x: "{}/det_{}".format(args.det,x.split("/")[-1]))

最后，将预测出来的图片保存到相应地址当中：

1 2	list(map(cv2.imwrite, det_names, loaded_ims)) end = time.time()

打印时间信息

在程序的末尾，我们将打印一些信息，其中包含执行代码的哪一部分所花费的时间。当我们必须比较不同的超参数如何影响探测器的速度时，这是信息是很有用的。在命令行上执行脚本 detection.py 时，可以设置超参数，如batch size、置信度阈值和 NMS 阈值。

print("SUMMARY")
print("----------------------------------------------------------")
print("{:25s}: {}".format("Task", "Time Taken (in seconds)"))
print()
print("{:25s}: {:2.3f}".format("Reading addresses", load_batch - read_dir))
print("{:25s}: {:2.3f}".format("Loading batch", start_det_loop - load_batch))
print("{:25s}: {:2.3f}".format("Detection (" + str(len(imlist)) +  " images)", output_recast - start_det_loop))
print("{:25s}: {:2.3f}".format("Output Processing", class_load - output_recast))
print("{:25s}: {:2.3f}".format("Drawing Boxes", end - draw))
print("{:25s}: {:2.3f}".format("Average time_per_img", (end - load_batch)/len(imlist)))
print("----------------------------------------------------------")


torch.cuda.empty_cache()

测试对象检测器

例如，在终端上运行:

1	python detect.py --images dog-cycle-car.png --det det

Loading network.....
Network successfully loaded
dog-cycle-car.png    predicted in  2.456 seconds
Objects Detected:    bicycle truck dog
----------------------------------------------------------
SUMMARY
----------------------------------------------------------
Task                     : Time Taken (in seconds)

Reading addresses        : 0.002
Loading batch            : 0.120
Detection (1 images)     : 2.457
Output Processing        : 0.002
Drawing Boxes            : 0.076
Average time_per_img     : 2.657
----------------------------------------------------------