Published: 2019-05-11

by Vagdevi Kommineni

How to build a convolutional neural network that recognizes sign language gestures

Sign language has been a major boon for people who are hearing- and speech-impaired. But it can serve its purpose only when the other person can understand sign language. Thus it would be really nice to have a system which could convert the hand gesture image to the corresponding English letter. And so the aim of this post is to build such an American Sign Language Recognition System.

Wikipedia defines ASL as follows:

American Sign Language (ASL) is a natural language that serves as the predominant sign language of Deaf communities in the United States and most of Anglophone Canada.

First, the data. The images of each class should be diverse with respect to influential factors such as lighting and zoom, and the Kaggle ASL dataset has all such variants. Training on such data gives our model a good picture of each class. So, let's work with the Kaggle data.

The dataset consists of images of hand gestures for each letter of the English alphabet. The images of a single class come in different variants, that is, zoomed versions, dim and bright lighting conditions, and so on. For each class, there are as many as 3000 images. For simplicity, let us classify only the “A”, “B”, and “C” images in this work. The full preprocessing and training scripts are listed later in this post.

We are going to build an AlexNet to achieve this classification task. Since we are training a CNN, make sure computational resources such as a GPU are available.

We start by importing the necessary modules.

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import os
import cv2
import random
import numpy as np
import keras
from random import shuffle
from keras.utils import np_utils
from shutil import unpack_archive

print("Imported Modules...")

Download the data zip file from Kaggle. Now, let us select the gesture images for A, B, and C and split the obtained data into training data, validation data, and test data.

# data folder path
data_folder_path = "asl_data/new"
files = os.listdir(data_folder_path)

# shuffling the images in the folder
for i in range(10):
    shuffle(files)
print("Shuffled Data Files")

# dictionary to maintain numerical labels
class_dic = {"A": 0, "B": 1, "C": 2}
# dictionary to maintain counts
class_count = {"A": 0, "B": 0, "C": 0}

# training lists
X = []
Y = []
# validation lists
X_val = []
Y_val = []
# testing lists
X_test = []
Y_test = []

for file_name in files:
    # the first character of the file name is the class letter
    label = file_name[0]
    if label in class_dic:
        path = data_folder_path + '/' + file_name
        image = cv2.imread(path)
        resized_image = cv2.resize(image, (224, 224))
        if class_count[label] < 2000:
            # first 2000 images of the class: training set
            class_count[label] += 1
            X.append(resized_image)
            Y.append(class_dic[label])
        elif class_count[label] >= 2000 and class_count[label] < 2750:
            # next 750 images: validation set
            class_count[label] += 1
            X_val.append(resized_image)
            Y_val.append(class_dic[label])
        else:
            # everything else: test set
            X_test.append(resized_image)
            Y_test.append(class_dic[label])

Each image in the dataset follows a naming convention: the 34th image of class A is named “A_34.jpg”. Hence, we look only at the first character of the file name and check whether it is one of the desired classes.
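The naming convention can also be parsed a little more defensively than by taking the first character. The following is a minimal sketch, not the author's code: `label_from_filename` is a hypothetical helper, and the assumption is that every file follows the `<class>_<index>.jpg` pattern described above.

```python
import os

# Hypothetical helper: extracts the class name from file names such as "A_34.jpg".
def label_from_filename(file_name, wanted=("A", "B", "C")):
    # Drop any directory part and the extension: "asl_data/new/A_34.jpg" -> "A_34"
    stem = os.path.splitext(os.path.basename(file_name))[0]
    # Everything before the first underscore is treated as the class name
    label = stem.split("_")[0]
    # Return None for classes we are not interested in
    return label if label in wanted else None

print(label_from_filename("A_34.jpg"))              # A
print(label_from_filename("asl_data/new/C_7.jpg"))  # C
print(label_from_filename("del_12.jpg"))            # None (made-up name, not a wanted class)
```

Splitting on the underscore means a class with a multi-character name is never mistaken for a single-letter class that happens to share its first character.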

We also split the images by count and store them in the X and Y lists: X for the images, and Y for the corresponding classes. Here, the counts are the number of images per class that go into the training, validation, and test sets respectively. Out of 3000 images for each class, I have put 2000 in the training set, 750 in the validation set, and the remaining 250 in the test set.

Some people prefer to split the dataset as a whole (not per class as we did here), but that does not guarantee that every class is equally represented in each set, so some classes may be learned less well. The images are read and stored in the lists as NumPy arrays.
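To make the per-class counting concrete, here is a small sketch that applies the same rule to file names only, without reading any images. The function name `split_per_class` and the toy file list are assumptions made for illustration; the thresholds default to the 2000/750 counts used in this post.

```python
from collections import defaultdict

# Hypothetical helper: assigns file names to train/val/test by per-class counts.
def split_per_class(file_names, n_train=2000, n_val=750):
    counts = defaultdict(int)          # images seen so far, per class
    splits = {"train": [], "val": [], "test": []}
    for name in file_names:
        label = name[0]                # class letter, as in the post
        seen = counts[label]
        if seen < n_train:
            splits["train"].append(name)
        elif seen < n_train + n_val:
            splits["val"].append(name)
        else:
            splits["test"].append(name)
        counts[label] += 1
    return splits

# Toy data: 6 files per class, split 3 / 2 / 1 instead of 2000 / 750 / 250
files = [f"{c}_{i}.jpg" for i in range(6) for c in "ABC"]
toy = split_per_class(files, n_train=3, n_val=2)
print({k: len(v) for k, v in toy.items()})  # {'train': 9, 'val': 6, 'test': 3}
```

Because the counters are kept per class, each class contributes the same number of images to each set, whatever order the shuffled files arrive in.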

Now the label lists (the Y's) are encoded as one-hot vectors. This is done with np_utils.to_categorical.

# one-hot encodings of the classes
Y = np_utils.to_categorical(Y)
Y_val = np_utils.to_categorical(Y_val)
Y_test = np_utils.to_categorical(Y_test)
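For readers who have not met one-hot encoding before, the following plain-Python sketch shows what the encoding looks like for the three classes. It is an illustration only: `one_hot` is a hypothetical stand-in, and the real np_utils.to_categorical returns a NumPy array rather than nested lists.

```python
# Hypothetical stand-in for the encoding step, written without NumPy or Keras.
def one_hot(labels, num_classes):
    # One row per label, with a 1.0 in the column of that label
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]

# Labels follow class_dic: A -> 0, B -> 1, C -> 2
print(one_hot([0, 2, 1], 3))
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

When no class count is given, to_categorical infers the number of columns from the largest label it sees, so passing num_classes=3 explicitly is a safe habit if a set might not contain every class.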

Now, let us store these images in the form of .npy files. Basically, we create separate .npy files to store the images belonging to each set.

# npy_data_path is assumed to be the folder created here
npy_data_path = 'Numpy_folder'
if not os.path.exists(npy_data_path):
    os.makedirs(npy_data_path)

np.save(npy_data_path + '/train_set.npy', X)
np.save(npy_data_path + '/train_classes.npy', Y)

np.save(npy_data_path + '/validation_set.npy', X_val)
np.save(npy_data_path + '/validation_classes.npy', Y_val)

np.save(npy_data_path + '/test_set.npy', X_test)
np.save(npy_data_path + '/test_classes.npy', Y_test)

print("Data pre-processing Success!")

Now that we have completed the data preprocessing part, let us take a look at the full data preprocessing code here:

# preprocess.py
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import os
import cv2
import random
import numpy as np
import keras
from random import shuffle
from keras.utils import np_utils
from shutil import unpack_archive

print("Imported Modules...")

# data folder path
data_folder_path = "asl_data/new"
files = os.listdir(data_folder_path)

# shuffling the images in the folder
for i in range(10):
    shuffle(files)
print("Shuffled Data Files")

# dictionary to maintain numerical labels
class_dic = {"A": 0, "B": 1, "C": 2}
# dictionary to maintain counts
class_count = {"A": 0, "B": 0, "C": 0}

# training lists
X = []
Y = []
# validation lists
X_val = []
Y_val = []
# testing lists
X_test = []
Y_test = []

for file_name in files:
    # the first character of the file name is the class letter
    label = file_name[0]
    if label in class_dic:
        path = data_folder_path + '/' + file_name
        image = cv2.imread(path)
        resized_image = cv2.resize(image, (224, 224))
        if class_count[label] < 2000:
            # first 2000 images of the class: training set
            class_count[label] += 1
            X.append(resized_image)
            Y.append(class_dic[label])
        elif class_count[label] >= 2000 and class_count[label] < 2750:
            # next 750 images: validation set
            class_count[label] += 1
            X_val.append(resized_image)
            Y_val.append(class_dic[label])
        else:
            # everything else: test set
            X_test.append(resized_image)
            Y_test.append(class_dic[label])

# one-hot encodings of the classes
Y = np_utils.to_categorical(Y)
Y_val = np_utils.to_categorical(Y_val)
Y_test = np_utils.to_categorical(Y_test)

# npy_data_path is assumed to be the folder created here
npy_data_path = 'Numpy_folder'
if not os.path.exists(npy_data_path):
    os.makedirs(npy_data_path)

np.save(npy_data_path + '/train_set.npy', X)
np.save(npy_data_path + '/train_classes.npy', Y)

np.save(npy_data_path + '/validation_set.npy', X_val)
np.save(npy_data_path + '/validation_classes.npy', Y_val)

np.save(npy_data_path + '/test_set.npy', X_test)
np.save(npy_data_path + '/test_classes.npy', Y_test)

print("Data pre-processing Success!")

Now comes the training part! Let us start by importing the essential modules so we can construct and train the CNN AlexNet. Here it is primarily done using Keras.

# importing
import numpy as np
import keras
from keras.optimizers import SGD
from keras.models import Sequential
from keras.preprocessing import image
from keras.layers.normalization import BatchNormalization
from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D

print("Imported Network Essentials")

Next, we load the images stored in .npy form:

# assumed to be the same folder that the preprocessing step wrote to
npy_data_path = "Numpy_folder"

X_train = np.load(npy_data_path + "/train_set.npy")
Y_train = np.load(npy_data_path + "/train_classes.npy")

X_valid = np.load(npy_data_path + "/validation_set.npy")
Y_valid = np.load(npy_data_path + "/validation_classes.npy")

X_test = np.load(npy_data_path + "/test_set.npy")
Y_test = np.load(npy_data_path + "/test_classes.npy")

We then define the structure of our CNN. Assuming prior knowledge of the AlexNet architecture, here is the Keras code for it.

model = Sequential()

# 1st Convolutional Layer
model.add(Conv2D(filters=96, input_shape=(224,224,3), kernel_size=(11,11), strides=(4,4), padding='valid'))
model.add(Activation('relu'))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation before passing it to the next layer
model.add(BatchNormalization())

# 2nd Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(11,11), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

# 3rd Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

# Passing it to a dense layer
model.add(Flatten())

# 1st Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

# 2nd Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.6))
# Batch Normalisation
model.add(BatchNormalization())

# 3rd Dense Layer
model.add(Dense(1000))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.5))
# Batch Normalisation
model.add(BatchNormalization())

# Output Layer: one unit per class (A, B, C), matching the one-hot labels
model.add(Dense(3))
model.add(Activation('softmax'))

model.summary()

The Sequential model is a linear stack of layers. Each of the five convolutional blocks applies a convolutional layer (the filters), an activation layer (for non-linearity), and a batch normalization layer (to standardize the values passed to the next layer); the first, second, and fifth blocks also include a max-pooling layer (which shrinks the feature maps and so reduces computation).
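It can help to trace how the spatial size of the feature maps shrinks through these layers. The sketch below is a hand calculation, not part of the training script: it uses the standard output-size rule for 'valid' padding, floor((size - kernel) / stride) + 1, with the kernel sizes and strides taken from the model definition above. The names `valid_out`, `LAYERS`, and `trace_sizes` are made up for this example.

```python
# Output size of a convolution or pooling layer with padding='valid'
def valid_out(size, kernel, stride):
    return (size - kernel) // stride + 1

# (name, kernel size, stride) for every layer that changes the spatial size
LAYERS = [
    ("conv1", 11, 4), ("pool1", 2, 2),
    ("conv2", 11, 1), ("pool2", 2, 2),
    ("conv3", 3, 1),
    ("conv4", 3, 1),
    ("conv5", 3, 1), ("pool5", 2, 2),
]

def trace_sizes(size=224):
    sizes = []
    for name, kernel, stride in LAYERS:
        size = valid_out(size, kernel, stride)
        sizes.append((name, size))
    return sizes

for name, size in trace_sizes():
    print(f"{name}: {size} x {size}")
# conv1: 54, pool1: 27, conv2: 17, pool2: 8, conv3: 6, conv4: 4, conv5: 2, pool5: 1
```

With a 224 x 224 input, the last pooling layer leaves a 1 x 1 map with 256 channels, so Flatten hands 256 features to the first dense layer. The same numbers should appear in the output of model.summary().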

Batch normalization was introduced by Ioffe and Szegedy in 2015. It standardizes the output of the previous layer, which helps against vanishing gradients, speeds up training by reducing the number of iterations required, and makes it practical to train deeper neural networks.
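As a rough illustration of what "standardizing" means here, the following sketch normalizes a small batch of activations for a single feature. It is a simplified, training-time view written in plain Python: `batch_norm` is a hypothetical function, gamma and beta stand for the learned scale and shift, and the running statistics that a real layer keeps for inference are left out.

```python
import math

# Simplified batch normalization for one feature over a batch of values
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Subtract the batch mean, divide by the batch standard deviation,
    # then apply the learned scale (gamma) and shift (beta)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(batch)
print([round(v, 3) for v in normalized])
```

After this step the batch has a mean of about 0 and a variance of about 1, whatever scale the incoming activations had.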

Finally, three fully connected dense layers, each followed by dropout (to avoid over-fitting), are added before the softmax output layer.
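To show what a dropout layer does during training, here is a minimal sketch of inverted dropout in plain Python. It is an illustration under stated assumptions rather than the Keras implementation: `dropout` is a hypothetical function, and the random generator is passed in so the example is repeatable.

```python
import random

# Inverted dropout: zero each activation with probability `rate`,
# and scale the survivors so the expected value stays the same
def dropout(activations, rate, rng):
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, rate=0.5, rng=rng))
```

With rate=0.5, every output is either 0.0 or twice the input. At test time dropout is switched off and the activations pass through unchanged.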

To get the summarized description of the model, use model.summary().

The following is the code for the compilation part of the model. We choose SGD (stochastic gradient descent) as the optimization method and set its parameters.

# Compile
sgd = SGD(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

checkpoint = keras.callbacks.ModelCheckpoint(
    "Checkpoint/weights.{epoch:02d}-{val_loss:.2f}.hdf5",
    monitor='val_loss', verbose=0,
    save_best_only=False, save_weights_only=False,
    mode='auto', period=1)

lr in SGD is the learning rate. Since this is a multi-class classification with one-hot labels, we use categorical_crossentropy as the loss function in model.compile. We set the optimizer to sgd, the SGD object we defined, and the evaluation metric to accuracy.
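The two ingredients named here can be written out by hand for a single sample. The sketch below is illustrative only: `categorical_crossentropy` and `sgd_step` are hypothetical plain-Python functions, the prediction and gradient values are invented, and Keras computes the same quantities on whole batches.

```python
import math

# Cross-entropy between a one-hot label and a predicted probability vector
def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Only the probability assigned to the true class contributes
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# One plain SGD update: move each weight against its gradient
def sgd_step(weights, grads, lr=0.001):
    return [w - lr * g for w, g in zip(weights, grads)]

# True class is B; the model gives it probability 0.8 (invented numbers)
loss = categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1])
print(round(loss, 4))  # 0.2231

print(sgd_step([1.0, -2.0], [10.0, -10.0], lr=0.001))
```

The loss is simply -log of the probability given to the correct class, so it drops towards 0 as the model becomes more confident in the right answer. A learning rate of 0.001 means each step moves a weight by one thousandth of its gradient.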

Training on a GPU can sometimes be interrupted. Checkpoints are a convenient way to keep the weights learned up to the point of interruption so that we can use them later. The first parameter sets where to store them: here, as weights.{epoch:02d}-{val_loss:.2f}.hdf5 in the Checkpoint folder.
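The checkpoint file name is an ordinary Python format string, so it is easy to see what names it produces. In this sketch, `checkpoint_path` is a hypothetical helper and the epoch and loss values are examples; the callback fills in the real values after each epoch.

```python
# The file name template passed to ModelCheckpoint
TEMPLATE = "Checkpoint/weights.{epoch:02d}-{val_loss:.2f}.hdf5"

# Hypothetical helper: shows which file name a given epoch would produce
def checkpoint_path(epoch, val_loss):
    return TEMPLATE.format(epoch=epoch, val_loss=val_loss)

print(checkpoint_path(1, 0.0069))    # Checkpoint/weights.01-0.01.hdf5
print(checkpoint_path(250, 0.0012))  # Checkpoint/weights.250-0.00.hdf5
```

The epoch number is zero-padded to two digits and the validation loss is rounded to two decimals. This is why the weights file loaded in the test script further down is called weights.250-0.00.hdf5.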

Finally, we save the model in the json format and weights in .h5 format. These are thus saved locally in the specified folders.

# serialize model to JSON
model_json = model.to_json()
with open("Weights_Full/model.json", "w") as json_file:
    json_file.write(model_json)

# serialize weights to HDF5
model.save_weights("Weights_Full/model_weights.h5")
print("Saved model to disk")

Let’s look at the whole code of defining and training the network. Consider this as a separate file ‘training.py’.

# training.py
import numpy as np
import keras
from keras.optimizers import SGD
from keras.models import Sequential
from keras.preprocessing import image
from keras.layers.normalization import BatchNormalization
from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D

print("Imported Network Essentials")

# loading .npy dataset
# npy_data_path is assumed to be the folder that preprocess.py wrote to
npy_data_path = "Numpy_folder"

X_train = np.load(npy_data_path + "/train_set.npy")
Y_train = np.load(npy_data_path + "/train_classes.npy")

X_valid = np.load(npy_data_path + "/validation_set.npy")
Y_valid = np.load(npy_data_path + "/validation_classes.npy")

X_test = np.load(npy_data_path + "/test_set.npy")
Y_test = np.load(npy_data_path + "/test_classes.npy")

print(X_test.shape)

model = Sequential()

# 1st Convolutional Layer
model.add(Conv2D(filters=96, input_shape=(224,224,3), kernel_size=(11,11), strides=(4,4), padding='valid'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation before passing it to the next layer
model.add(BatchNormalization())

# 2nd Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(11,11), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

# 3rd Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

# Passing it to a dense layer
model.add(Flatten())

# 1st Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

# 2nd Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.6))
# Batch Normalisation
model.add(BatchNormalization())

# 3rd Dense Layer
model.add(Dense(1000))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.5))
# Batch Normalisation
model.add(BatchNormalization())

# Output Layer: one unit per class (A, B, C), matching the one-hot labels
model.add(Dense(3))
model.add(Activation('softmax'))

model.summary()

# (4) Compile
sgd = SGD(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
checkpoint = keras.callbacks.ModelCheckpoint(
    "Checkpoint/weights.{epoch:02d}-{val_loss:.2f}.hdf5",
    monitor='val_loss', verbose=0,
    save_best_only=False, save_weights_only=False,
    mode='auto', period=1)

# (5) Train
# pixel values are scaled to [0, 1]; the one-hot labels are passed unchanged
model.fit(X_train/255.0, Y_train, batch_size=32, epochs=50, verbose=1,
          validation_data=(X_valid/255.0, Y_valid), shuffle=True,
          callbacks=[checkpoint])

# serialize model to JSON
model_json = model.to_json()
with open("Weights_Full/model.json", "w") as json_file:
    json_file.write(model_json)

# serialize weights to HDF5
model.save_weights("Weights_Full/model_weights.h5")
print("Saved model to disk")

When we run the training.py file, Keras prints a progress line for each epoch.

For example, consider the first of 12 epochs (Epoch 1/12):

  • it took 1852 s to complete that epoch
  • the training loss was 0.2441
  • the training accuracy was 0.9098
  • the validation loss was 0.0069, and
  • the validation accuracy was 0.9969.

Based on these values, we can tell which epochs' weights perform better, where to stop training, and how to tune the hyperparameter values.
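One simple way to act on these numbers is to pick the epoch with the lowest validation loss and use the checkpoint saved for it. The sketch below assumes the per-epoch validation losses have been collected in a list (the object returned by model.fit exposes them in its history dictionary); `best_epoch` is a hypothetical helper, and apart from the first value, which is the one quoted above, the numbers are invented.

```python
# Returns the 1-based epoch number with the lowest validation loss
def best_epoch(val_losses):
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

# First value is from the log above; the rest are made up for illustration
val_losses = [0.0069, 0.0041, 0.0055, 0.0048]
print(best_epoch(val_losses))  # 2
```

If the validation loss stops improving for several epochs while the training loss keeps falling, that is usually the point to stop training or to adjust hyperparameters such as the learning rate or the dropout rates.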

Now it’s time for testing!

# test.py
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

from keras.preprocessing import image
import numpy as np
from keras.models import model_from_json
from sklearn.metrics import accuracy_score

# dimensions of our images
image_size = 224

# load the model in json format
with open('Model/model.json', 'r') as f:
    model = model_from_json(f.read())
    model.summary()

# load the final weights, then replace them with the chosen checkpoint
model.load_weights('Model/model_weights.h5')
model.load_weights('Weights/weights.250-0.00.hdf5')

X_test = np.load("Numpy/test_set.npy")
Y_test = np.load("Numpy/test_classes.npy")

# scale the pixels the same way as during training
Y_predict = model.predict(X_test / 255.0)
# turn probability vectors and one-hot labels back into class indices
Y_predict = [np.argmax(r) for r in Y_predict]
Y_test = [np.argmax(r) for r in Y_test]

print("##################")
acc_score = accuracy_score(Y_test, Y_predict)
print("Accuracy: " + str(acc_score))
print("##################")

In the above code, we load the saved model architecture and the best weights. We also load the .npy files (the NumPy form of the test set) and predict the classes of the test images. In short, we just load the saved model architecture and assign it the learned weights.

Now the approximator function along with the learned coefficients (weights) is ready. We just need to test it by feeding the model the test set images and evaluating its performance on them. One of the most widely used evaluation metrics is accuracy, computed here with accuracy_score from sklearn.metrics.
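The evaluation step boils down to two small operations: take the arg-max of each probability vector, then count how often it matches the true class. The following plain-Python sketch mirrors that logic on invented data; `argmax` and `accuracy` are hypothetical helpers, whereas the test script itself uses np.argmax and accuracy_score.

```python
# Index of the largest value in a row
def argmax(row):
    return max(range(len(row)), key=row.__getitem__)

# Fraction of samples whose predicted class matches the true class
def accuracy(y_true_one_hot, y_prob):
    y_true = [argmax(r) for r in y_true_one_hot]
    y_pred = [argmax(r) for r in y_prob]
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Invented example: four test images, three classes (A, B, C)
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.3, 0.6, 0.1]]
print(accuracy(labels, probs))  # 0.75
```

Three of the four predictions match, so the accuracy is 0.75. Accuracy is a reasonable summary here because the three classes have the same number of test images.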

Thank you for reading! Happy learning! :)

Reposted from: http://verwd.baihongyu.com/
