Study Notes: Reading the Face Recognition Source Code

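These are my study notes on the Face Recognition library. The library is built on dlib, whose face models were trained with deep learning. According to the author, the model reaches 99.38% accuracy on the Labeled Faces in the Wild (LFW) benchmark.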

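Face Recognition is feature-rich: it can detect and locate faces, extract facial features, compare faces and even apply digital makeup, which makes it quite attractive. In particular, it supports GPU acceleration: face detection can use a HOG model on the CPU or a CNN model on the GPU.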

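hog: Histogram of Oriented Gradients (HOG), a hand-crafted feature descriptor;
cnn: a convolutional neural network; dlib's CNN face detector (mmod_human_face_detector) is trained with the max-margin object detection (MMOD) method.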

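In my tests, when an image contains few faces and little clutter, the GPU-accelerated CNN model showed no clear advantage for 1:1 face comparison and was sometimes even slower. Under difficult lighting and in complex scenes, however, the CNN model was clearly better: it located faces more accurately and more quickly, even when they were somewhat distorted, whereas the HOG model missed some faces in complex scenes.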

Below I go through the main functions the library uses and try to understand how each one works.

Face detection (face_locations)

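A face detection algorithm slides windows of different sizes and positions across the image and decides, for each window, whether it contains a face.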
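The sliding-window idea can be sketched in plain NumPy. This is a minimal illustration, not dlib's implementation: `sliding_windows`, `detect` and the `score` callback are hypothetical names, the classifier is a placeholder you pass in, and non-maximum suppression (merging overlapping hits) is omitted. Practical detectors such as dlib's usually keep the window size fixed and rescale the image (an image pyramid) instead of enlarging the window.

```python
from typing import Callable, Iterator, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (top, right, bottom, left)


def sliding_windows(height: int, width: int, window: int, stride: int) -> Iterator[Box]:
    """Yield every square window of side `window`, moved by `stride` pixels."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield (top, left + window, top + window, left)


def detect(image: np.ndarray, window_sizes: List[int], stride: int,
           score: Callable[[np.ndarray], float], threshold: float) -> List[Box]:
    """Return boxes whose (hypothetical) classifier score exceeds `threshold`."""
    h, w = image.shape[:2]
    hits: List[Box] = []
    for size in window_sizes:
        for top, right, bottom, left in sliding_windows(h, w, size, stride):
            patch = image[top:bottom, left:right]
            if score(patch) > threshold:
                hits.append((top, right, bottom, left))
    return hits
```

Passing several sizes in `window_sizes` corresponds to the "windows of different sizes" described above; the cost grows with the number of sizes and with a smaller stride.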

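Before deep learning, the mainstream approach was feature extraction plus an ensemble-learning classifier, for example the once-popular Haar features with an AdaBoost cascade classifier; this is the face detector implemented in OpenCV. In my experiments, however, it performed poorly: it often produced false detections or missed real faces.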
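Haar features were practical because of the integral image (summed-area table): once it is built, the sum over any rectangle costs four lookups. The sketch below shows only that feature computation, assuming a grayscale NumPy array; the AdaBoost training and the cascade itself are not shown, and the function names are illustrative rather than OpenCV's API.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table, padded with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, top: int, left: int, height: int, width: int) -> int:
    """Sum of img[top:top+height, left:left+width] using four table lookups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])


def haar_two_rect_horizontal(ii: np.ndarray, top: int, left: int,
                             height: int, width: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half (width must be even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

A cascade evaluates thousands of such features per window, which is only affordable because each one is a handful of additions on the precomputed table.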

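dlib's default frontal face detector instead combines HOG features with a linear classifier, applied in a sliding-window scheme over an image pyramid, and detection with dlib's pre-trained model works much better. (Regression trees do appear in dlib, but in the landmark shape predictor, which uses an ensemble of regression trees, not in this detector.)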
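A simplified HOG descriptor can be sketched as follows, assuming a grayscale NumPy image: gradients from centred differences, unsigned orientations in [0, 180) degrees split into 9 bins, and magnitude-weighted votes per 8x8 cell. Block normalisation and bin interpolation from the full HOG method are left out, and `hog_cell_histograms` is an illustrative name, not a dlib function. In dlib's detector, features of this kind are scored by a linear classifier at every window position.

```python
import numpy as np


def hog_cell_histograms(gray: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Per-cell histograms of unsigned gradient orientation.

    Returns an array of shape (rows_of_cells, cols_of_cells, bins).
    """
    g = gray.astype(np.float64)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]  # centred difference [-1, 0, 1]
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    n_rows, n_cols = g.shape[0] // cell, g.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, bins))
    for r in range(n_rows):
        for c in range(n_cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            # each pixel votes for its orientation bin, weighted by gradient magnitude
            hist[r, c] = np.bincount(bin_idx[sl].ravel(),
                                     weights=magnitude[sl].ravel(),
                                     minlength=bins)
    return hist
```

On an image containing a single vertical edge, all of the gradient energy lands in the 0-degree bin, which is the kind of pattern a linear classifier's weights can respond to.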

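dlib also provides a convolutional neural network face detector, which is more accurate than the HOG-based detector. It does need GPU acceleration, though: without it, processing a single image can take several seconds or even tens of seconds.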


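import dlib
import face_recognition_models  # ships the pre-trained .dat model files

# HOG + linear classifier frontal face detector (runs on the CPU)
face_detector = dlib.get_frontal_face_detector()
# CNN (MMOD) face detector; practical speeds need a CUDA-enabled dlib build
cnn_face_detection_model = face_recognition_models.cnn_face_detector_model_location()
cnn_face_detector = dlib.cnn_face_detection_model_v1(cnn_face_detection_model)


def _raw_face_locations(img, number_of_times_to_upsample=1, model="hog"):
    # upsampling enlarges the image before detection so that smaller faces can be found
    if model == "cnn":
        # returns mmod_rectangles; each item has .rect and .confidence
        return cnn_face_detector(img, number_of_times_to_upsample)
    else:
        # returns plain dlib rectangles
        return face_detector(img, number_of_times_to_upsample)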

Here the detector performs the actual face detection; either the HOG model or the CNN model can be used to find face locations.
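dlib returns rectangles (for the CNN detector, `mmod_rectangle` objects whose `.rect` holds the box and `.confidence` the score), while the public `face_recognition.face_locations` returns tuples in (top, right, bottom, left) order. The conversion can be sketched as below. `Rect` is a hypothetical stand-in for `dlib.rectangle` so the example runs without dlib, and the helper names only loosely mirror `_css_to_rect` used in the library code; clipping the box to the image is an assumption about what a caller would want, not a quote of the library.

```python
from dataclasses import dataclass
from typing import Tuple

Css = Tuple[int, int, int, int]  # (top, right, bottom, left)


@dataclass
class Rect:
    """Hypothetical stand-in for dlib.rectangle."""
    left: int
    top: int
    right: int
    bottom: int


def rect_to_css(rect: Rect) -> Css:
    """Reorder a rectangle into (top, right, bottom, left)."""
    return (rect.top, rect.right, rect.bottom, rect.left)


def css_to_rect(css: Css) -> Rect:
    """Inverse of rect_to_css."""
    top, right, bottom, left = css
    return Rect(left=left, top=top, right=right, bottom=bottom)


def trim_css_to_bounds(css: Css, image_shape: Tuple[int, ...]) -> Css:
    """Clip a box so it lies inside an image of shape (height, width, ...)."""
    top, right, bottom, left = css
    return (max(top, 0), min(right, image_shape[1]),
            min(bottom, image_shape[0]), max(left, 0))
```

Detections near the border can extend past the image edge, so clipping before cropping keeps the later slicing `image[top:bottom, left:right]` well defined.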

Feature extraction (face_encodings)

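This function computes a feature vector for every face in an image. Each face is converted into a 128-dimensional vector; the 128 values are learned features (an embedding) that together describe the face, rather than individually hand-defined measurements.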


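# Part of the face_recognition_models package: paths to the bundled pre-trained models
from pkg_resources import resource_filename


def pose_predictor_model_location():
    return resource_filename(__name__, "models/shape_predictor_68_face_landmarks.dat")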

def pose_predictor_five_point_model_location():
    return resource_filename(__name__, "models/shape_predictor_5_face_landmarks.dat")

def face_recognition_model_location():
    return resource_filename(__name__, "models/dlib_face_recognition_resnet_model_v1.dat")

def cnn_face_detector_model_location():
    return resource_filename(__name__, "models/mmod_human_face_detector.dat")
    
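# Part of the face_recognition package: every model is loaded once, at import time
import dlib
import numpy as np
import face_recognition_models

face_detector = dlib.get_frontal_face_detector()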

predictor_68_point_model = face_recognition_models.pose_predictor_model_location()
pose_predictor_68_point = dlib.shape_predictor(predictor_68_point_model)

predictor_5_point_model = face_recognition_models.pose_predictor_five_point_model_location()
pose_predictor_5_point = dlib.shape_predictor(predictor_5_point_model)

cnn_face_detection_model = face_recognition_models.cnn_face_detector_model_location()
cnn_face_detector = dlib.cnn_face_detection_model_v1(cnn_face_detection_model)

face_recognition_model = face_recognition_models.face_recognition_model_location()
face_encoder = dlib.face_recognition_model_v1(face_recognition_model)

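def _raw_face_landmarks(face_image, face_locations=None, model="large"):
    # _raw_face_locations is shown above; _css_to_rect is a library helper that turns
    # a (top, right, bottom, left) tuple into a dlib rectangle
    if face_locations is None:
        face_locations = _raw_face_locations(face_image)
    else:
        face_locations = [_css_to_rect(face_location) for face_location in face_locations]

    # 68-point model by default; model="small" selects the 5-point model
    pose_predictor = pose_predictor_68_point

    if model == "small":
        pose_predictor = pose_predictor_5_point

    # one set of landmark points per face
    return [pose_predictor(face_image, face_location) for face_location in face_locations]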

def face_encodings(face_image, known_face_locations=None, num_jitters=1):
    """
    Given an image, return the 128-dimension face encoding for each face in the image.

    :param face_image: The image that contains one or more faces
    :param known_face_locations: Optional - the bounding boxes of each face if you already know them.
    :param num_jitters: How many times to re-sample the face when calculating encoding. Higher is more accurate, but slower (i.e. 100 is 100x slower)
    :return: A list of 128-dimensional face encodings (one for each face in the image)
    """
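    # face_encodings always uses the 5-point landmarks to align each face
    raw_landmarks = _raw_face_landmarks(face_image, known_face_locations, model="small")
    # dlib's ResNet maps each aligned face to a 128-D descriptor, wrapped as a NumPy array
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters))
            for raw_landmark_set in raw_landmarks]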
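The `num_jitters` parameter is passed straight to dlib's `compute_face_descriptor`, whose documentation describes values above 1 as randomly jittering the face that many times, computing a descriptor for each copy and averaging the results. The averaging logic can be sketched with NumPy; `embed` and `jitter` are hypothetical callables standing in for the ResNet and for dlib's internal jittering, so this shows only why accuracy improves and cost grows linearly, not dlib's actual perturbations.

```python
from typing import Callable

import numpy as np


def jittered_descriptor(face: np.ndarray,
                        embed: Callable[[np.ndarray], np.ndarray],
                        jitter: Callable[[np.ndarray, np.random.Generator], np.ndarray],
                        num_jitters: int = 1,
                        seed: int = 0) -> np.ndarray:
    """Average the embeddings of `num_jitters` randomly perturbed copies of `face`.

    One call to `embed` per copy, so the cost grows linearly with num_jitters.
    """
    if num_jitters <= 1:
        # no jittering: a single forward pass on the original face
        return embed(face)
    rng = np.random.default_rng(seed)
    copies = [embed(jitter(face, rng)) for _ in range(num_jitters)]
    return np.mean(copies, axis=0)


# Toy stand-ins (hypothetical): a 2-value "embedding" and a random horizontal shift.
def toy_embed(img: np.ndarray) -> np.ndarray:
    return np.array([img.mean(), img.std()])


def toy_jitter(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    return np.roll(img, int(rng.integers(-2, 3)), axis=1)
```

Because the toy shift does not change the mean or standard deviation, the averaged toy descriptor equals the unjittered one; with a real network the copies differ slightly and averaging smooths out that noise.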

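All of the *.dat files above are models pre-trained by dlib, used to locate faces or facial features in an image.
detector: the face detection operator;
pose_predictor: the shape predictor, which locates landmark points of the facial features inside each detected face region. face_encodings uses the 5-point landmark model (the two corners of each eye and the bottom of the nose). The 68-point model also covers the chin, left and right eyebrows, nose bridge, nose tip, left and right eyes, and upper and lower lips; in my tests it made no noticeable difference for face comparison, but its points are useful for face tracking, virtual makeup in video and similar features;
finally, the landmarks are used to align the face, and the aligned face image is passed through a ResNet (deep residual network) model to compute a 128-dimensional feature vector. The model implements the ResNet architecture proposed by Kaiming He (winner of the CVPR best paper award in 2009 and 2016). The trained model is provided, so there is no need to train it yourself.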
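The 68 landmark points follow the iBUG 300-W index convention, so the facial parts listed above are fixed index ranges. The grouping below is a sketch under that assumption: `group_landmarks` and the region names are illustrative, "left/right" refers to the sides of the image rather than of the subject, and the mouth is split into outer and inner lip contours instead of the upper/lower lips the library reports.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]

# iBUG 300-W 68-point markup: 0-based, end-exclusive index ranges
REGIONS_68: Dict[str, range] = {
    "chin": range(0, 17),
    "left_eyebrow": range(17, 22),
    "right_eyebrow": range(22, 27),
    "nose_bridge": range(27, 31),
    "nose_tip": range(31, 36),
    "left_eye": range(36, 42),
    "right_eye": range(42, 48),
    "outer_lips": range(48, 60),
    "inner_lips": range(60, 68),
}


def group_landmarks(points: Sequence[Point]) -> Dict[str, List[Point]]:
    """Split a flat list of 68 (x, y) points into named facial regions."""
    if len(points) != 68:
        raise ValueError(f"expected 68 points, got {len(points)}")
    return {name: [points[i] for i in idx] for name, idx in REGIONS_68.items()}
```

Grouped this way, a tracking or makeup feature can work on one region at a time, for example filling the polygon formed by `outer_lips`.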

These 128-dimensional encodings are what the library uses to compare faces.

(Figure: landmark detection result)

Face distance (face_distance)

This function computes the distance between each face encoding in a list and a given face encoding.


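import numpy as np


def face_distance(face_encodings, face_to_compare):
    # no known encodings: return an empty array rather than raising
    if len(face_encodings) == 0:
        return np.empty((0,))

    # row-wise Euclidean (L2) distance between each known encoding and face_to_compare
    return np.linalg.norm(face_encodings - face_to_compare, axis=1)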

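Here np.linalg.norm computes a norm. When an integer axis is given, it treats the array as a stack of vectors and computes the L2 norm by default; axis=1 means the norm is taken along each row, giving one value per row vector.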

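The L2 norm of a vector is the square root of the sum of the squares of its elements. Subtracting the two encodings and taking the L2 norm of the difference therefore gives the square root of the sum of squared element-wise differences, i.e. the Euclidean distance between the two vectors.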
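A quick numerical check that `np.linalg.norm(..., axis=1)` is exactly the row-wise Euclidean distance d(a, b) = sqrt(sum_i (a_i - b_i)^2). Two-dimensional toy vectors stand in for the 128-dimensional encodings; the arithmetic is the same.

```python
import numpy as np

# three "known" encodings (one per row) and one encoding to compare against
known = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
probe = np.array([0.0, 0.0])

# broadcasting subtracts probe from every row; norm(axis=1) gives one L2 norm per row
via_norm = np.linalg.norm(known - probe, axis=1)

# the same quantity written out: square the differences, sum each row, take the root
by_hand = np.sqrt(((known - probe) ** 2).sum(axis=1))

print(via_norm)  # distances 0, 5 and sqrt(2), about 1.414
```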

Face comparison (compare_faces)

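This function checks whether each face in a list of known face encodings matches a given face. A tolerance can be chosen: the higher the tolerance, the more lenient the match. The default is 0.6; 0.5 is a stricter choice.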

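The logic is simple: compute the distance between the face to check and each known face, and return True wherever the distance is less than or equal to the tolerance.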


def compare_faces(known_face_encodings, face_encoding_to_check, tolerance=0.6):
    # True where the distance is within the tolerance (note: <=, not <)
    return list(face_distance(known_face_encodings, face_encoding_to_check) <= tolerance)
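The two functions can be exercised together on toy data to see the effect of the tolerance. The block restates `face_distance` and `compare_faces` so it runs on its own (with `np.asarray` added so plain lists are accepted). `best_match`, which picks the closest known face and rejects it if it lies beyond the tolerance, is a hypothetical helper for a common usage pattern, not part of the library. The 2-D vectors stand in for real 128-dimensional encodings.

```python
import numpy as np


def face_distance(face_encodings, face_to_compare):
    if len(face_encodings) == 0:
        return np.empty((0,))
    return np.linalg.norm(np.asarray(face_encodings) - face_to_compare, axis=1)


def compare_faces(known_face_encodings, face_encoding_to_check, tolerance=0.6):
    return list(face_distance(known_face_encodings, face_encoding_to_check) <= tolerance)


def best_match(names, known_face_encodings, face_encoding_to_check, tolerance=0.6):
    """Hypothetical helper: name of the closest known face, or None if none is close enough."""
    distances = face_distance(known_face_encodings, face_encoding_to_check)
    if len(distances) == 0:
        return None
    i = int(np.argmin(distances))
    return names[i] if distances[i] <= tolerance else None


known = np.array([[0.0, 0.0], [0.55, 0.0]])
probe = np.array([0.0, 0.0])
print(compare_faces(known, probe))                 # both distances (0 and 0.55) pass at 0.6
print(compare_faces(known, probe, tolerance=0.5))  # the 0.55 match is rejected at 0.5
```

Tightening the tolerance from 0.6 to 0.5 turns the second match into a rejection, which is the trade-off between false accepts and false rejects described above.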

Other notes

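The Python code looks simple and intuitive: a few lines achieve impressive results. The hard part, though, is the thousands of lines of C++ behind it. This project essentially wraps the C++ dlib library in a convenient Python interface; the real strength lies in dlib's algorithms and its high-quality pre-trained models.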

Histogram of Oriented Gradients (HOG): https://blog.csdn.net/u012679707/article/details/80657020

Convolutional neural networks (CNN): https://www.zhihu.com/question/34681168

Deep residual networks (ResNet): https://www.jianshu.com/p/073378298dd5

Last edited: 2019/09/26 08:55
