Real-Time Face Mask Detection

1Assistant Professor, Dept. Of CSE, SCSVMV (Deemed to be University), Kanchipuram, TamilNadu, India 2-5Student, Dept. Of CSE, SCSVMV (Deemed to be University), Kanchipuram, TamilNadu, India ---------------------------------------------------------------------***---------------------------------------------------------------------Abstract – For more than a year the entire world has been affected by coronavirus (COVID-19) disease which is caused by a newly discovered coronavirus. All the workplaces, educational institutions, the different government bodies, and also basic necessity providers have shut down their business. It has been a very tough phase for every individual and 80% of them were affected and lost their lives. India has developed various channels for progressing the work in software industries, educational institutions, local body government etc. But there are few professions that requires the physical presence of the employee. Apart from ensuring proper sanity, it is important to maintain physical distance from person to person in such cases. Our working model is one such model that assures the safety of the person. This face mask detector identifies multiple people at a time with the live capturing video. The key feature of the model is the alarm system which does not allow any person without a mask. This model works on the concepts of the Machine Learning (ML) platform. MobileNet V2 architecture plays an important role in producing maximum accuracy. Here we considered various types of CNN networks that can act as an important factor for the model considering the economical constraints possessed.


INTRODUCTION
In the year 2020, we have seen a drastic change due to the pandemic arrived has shown mankind series of events amongst which the COVID-19 pandemic is the most lifethreatening event which shook the world since the year began. It brought a drastic effect on the health of many people. COVID-19 has also brought many strict measures to be followed to prevent the spread of disease. From the very basic hygiene standards to the treatments in the hospitals, people are doing all they can for their safety, face masks are one of the personal protective and also essential equipment. People should wear face masks once they step out of their homes and authorities strictly ensured that people are wearing face masks while they are in groups and public places. To monitor people's basic safety, a strategy should be developed. A face mask detector system can be implemented to make this possible. Face mask recognition means is to check whether a person is wearing a mask or not. The first step is to recognize the face, which makes the invention idea divided into two parts: to detect faces and to detect masks on those faces. Firstly, face detection is one of the applications of object detection and can be used in many areas like security, biometrics, law enforcement, and more. There are many detection systems developed around the world and being implemented. However, all this science needs to be improved and provide a better because the world might face many more deadly diseases like a corona.

Objective
The main objective of this project is to take care of human health and control the spread of deadly viruses. The sudden attack of COVID-19 has taught us many things in which it includes wearing a mask and sanitizing our hands each now and then. In order to avoid its spread mask is the most essential object. So, to prevent people from skipping wearing masks this project has come to the light. In order to avoid manual detection and for better accuracy this project has been introduced. This enables the people to remind them of wearing their masks and also allows only when they wear their masks.
To summarise,  The objective of the project is to protect people from contagious diseases.
 Reduce the time and cost taken for manual detection.
 To avoid the spread of disease in public places like airports, bus stops, etc.

Scope of the Project
This project can be developed even more by adding the database to the model such that, it can be used in academic institutions on a large scale. Airports, traffic signals, and secured open places can be installed with such a working model. Since these systems are not introduced in India and also most of the work in industries are at stake, they can customize this basic and simply built model according to the requirements by deploying the software model using RaspberryPi.
To summarize,  Different position capturing of the face.
 Cognitive face expressions.
 Hardware model using RaspberryPi.

Data Pre-Processing
Data Preprocessing is the crucial steps to be carried for building the model. It is important to clean the dataset and make it is suitable for model building. In the case of images, we are converting the images into an array. First, the dataset has to be collected from the directory with categories (with mask and without mask). Keras is a neural network library. It provides a high-level API for building and training the models at ease. The conversion of the image into an array is done using img_to_array. Img_to_array comes from the keras.preprocessing.image.
After converting the image data into arrays, it is in the form of numerical data. Data Preprocessing is a method where all the data is generalized such that no redundancy will be present.
So now, the rest of the data should be in the same format to avoid redundancy. The rest of the categorical values has to be converted into 0's and 1's. This conversion can be done by using Label Binarizer.
These labels will be now converted into arrays using NumPy. Train-Test split will be carried out with 80% of the images to the training set and 20% of them to the testing set.

Training of the Dataset
Training the dataset is the important step for making the model. Here generally we use convolutional networks, but our idea is to neglect CNN we use for image processing and introduce MobileNet essentially what we do is usually after the image is processed as an array, we will send that into MobileNet and then do the maximum pooling.
After the maxpooling, we will connect it and flatten it, which later results in the required output. The added advantage of using MobileNet is that it is faster compared to CNN and uses fewer parameters. Initially learning rate (INIT_LR) is assigned less because the lesser the learning rate higher the accuracy (INIT_LR (1e-4)) 20 Epochs Batch size (BS) =32 By using MobileNet we are generating two types of models i) MobileNet model (which is going to be passed into the normal model that is first developed). To explain clearly, the first generated model is the output that we are passing to the normal model we are going to develop.
They can commonly be called as the head model and base model.
Image Data generator is also used which is used to generate many images to single images by adding properties like flipping, rotating the image, etc. These properties are beneficial for generating more images in the dataset.

Modelling
As MobileNet V2 generated a Base model, for further development we are using weights= ImageNet which has pre-trained models for images, so those weights will be initialized for a better model.
Input Tensor is the shape of the image that is going through which includes the size of the image and the number of channels.
Input Tensor =Input (Shape (224, 224, 3)) The number of channels in the images we are using is RED, GREEN, and BLUE (RGB).
Once the base model is built, we have to build the fully connected layer which is the head model. Already as mentioned, the Base model is passed through for Pooling, flattening, etc. to build the head model. We also used the goto activation function "Relu" for non-linear use cases and generally used for images.
We use dropout to avoid overfitting of models.
And our output is two layers which are with mask and without the mask and our activation function used here is SoftMax.
Our head model will be the output and the base model is the input which are the two parameters for the main model. The important measure taken here is to freeze the layers of the base model because they are just a replacement for CNN.
For the loss calculation, we are using binary_crossentropy and the optimizer we used here is Adam optimizer which is similar to Relu and always a go-to optimizer for image processing. For further training, we are using ImageDatagenerator which is used to generate more images. And we also used TestX and TestY for validation and also EPochs.

Classification Report
The below is the sample of the training report during the model building.

Fig: Training Report
Training report is about the loss, val_loss, accuracy and val_accuracy. Depending on the activation function these parameters may have a chance of variation. For example, when we use SoftMax activation function, val_loss and val_accuracy will start increasing simultaneously.
The validation set is used to evaluate the trained model. So, if the accuracy is increasing and val_accuracy is falling along with the increase in epoch, it means that the model is overfitting.
In the same way, the validation_loss and loss are the losses of the validation data and the training data.

Fig: Classification Report
Accuracy: It is the nearest value of the measured and true value.
Precision: Accuracy of positive predictions.
Precision= total positive/ (total positive + false positive) Recall: Fraction of positives that were correctly identified.
Recall= total positive/ (total positive + false negative) F1 Score: F1 scores are less than accurate measures as they are included in precision and recall values. Support: Support is the total occurrences in the specified dataset.

Model Usage in Real-Time Camera
We have developed a model for mask detection and now we need to have a model for face detection. So, the deep learning model will detect the mask and we use Open Cv for performing the camera operation. The two types of parameters Face net and Mask net.
Facenet contains the files that detect the face and Masknet has the datasets with mask and without the mask. Deep Neural Networks (DNN) has been developed in is the module developed by Cv2 which contains many methods in them. In order to load the camera, we use src=0, which is the source i.e., the main camera which is inbuilt into the system. After the face is detected, a frame will be loaded in the video with a width of 400. Then we detect the face and mask together as a single function which later returns the location and predictions.
Location is x and y coordinates of the rectangle surrounding the face and predictions are nothing but the accuracy of the model.

Visualization
The below are the results of the training data.

Conclusion
The goal of the project is to ensure that the working of a few industries and professions progress well without any economic loss. Since man-power is mandatory and safety should be ensured in these pandemic COVID-19 conditions, it is important and would be efficient to install such models at workplaces. Apart from this, public transportation like airports can also be made available with these systems for safety and sanity.