A practical approach to using face recognition in software applications
Face recognition has become a very important tool, and many interesting use cases for it have emerged. One is that the Chinese government is using face recognition to catch criminals.
Another use case I came across on Twitter: face recognition was used to display departure gate details to travelers at Chengdu Shuangliu International Airport in China. It seems a pretty amazing feat of technology.
The question is how to use face recognition in application software. The approach can depend on the number of customers (or faces) among which you need to perform face recognition. Some software applications hold millions of records (faces). Some important questions to look at:
- Is it practical to perform face recognition across millions of records?
- Will face recognition work reliably on a new face and on an existing face?
- Will face recognition give a quick response to the user?
Here I restrict my discussion to the newer set of techniques used for face recognition.
Algorithms based on deep learning
One of the most popular techniques is to use deep learning for face recognition. Deep learning does not require any features to be defined upfront; the network learns all the features on its own.
Davis King's dlib library has adopted the deep learning approach and provides a convolutional neural network for face detection and recognition (a modified and improved version of an existing architecture; more about that later).
- The dlib library takes a photo as input.
- It runs a face detector on it.
- Once a face is detected, it passes the face through the underlying pre-trained CNN (no training happens at this point). The output is a 128-dimensional vector, in other words a fingerprint of the face.
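The three steps above can be sketched with dlib's Python bindings. This is a minimal sketch, not the author's code: the function name and its parameters are hypothetical, and the two model paths are assumed to point at the pre-trained files that dlib distributes for its own examples (a face landmark predictor and the ResNet-based recognition model).

```python
def compute_face_descriptor(image_path, landmark_model_path, recognition_model_path):
    """Return the 128D descriptor of the first detected face, or None if no face is found.

    All three arguments are file paths supplied by the caller (assumptions of this sketch).
    """
    # Imported here so the module still loads on machines without dlib installed.
    import dlib

    detector = dlib.get_frontal_face_detector()
    shape_predictor = dlib.shape_predictor(landmark_model_path)
    face_encoder = dlib.face_recognition_model_v1(recognition_model_path)

    image = dlib.load_rgb_image(image_path)

    # Step 2: run the face detector (1 = upsample the image once to catch smaller faces).
    faces = detector(image, 1)
    if len(faces) == 0:
        return None

    # Step 3: locate landmarks, then let the pre-trained CNN produce the 128D vector.
    landmarks = shape_predictor(image, faces[0])
    descriptor = face_encoder.compute_face_descriptor(image, landmarks)
    return list(descriptor)
```

Loading the models is slow, so a real service would probably load them once at startup rather than on every call as this sketch does.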
Face recognition is a similarity check
Face recognition does not perform an equality check.
Let us take an example. For any image you feed to the face recognition algorithm, it will analyse the image and give a vector back. Say you take an image of Virat Kohli (an Indian cricketer) and feed it into the algorithm. You might get the vector below.
Vector of first image: v = [0.05, 0.06, 0.07, 0.08]
In reality, the dlib library emits a vector of 128 numbers (called a 128D vector), but for simplicity the vector v above is 4D.
Now take a second image of Virat Kohli and feed it into the face recognition algorithm. Please note that the second image is not the same as the first image.
In the second photograph Virat Kohli is wearing a cap. Feeding this image to the neural network also outputs a vector, although this time the vector is different because the image is different. It would be as follows.
Vector of second image: [0.06, 0.06, 0.07, 0.07]
You can observe that the two vectors are not the same. If you feed another image to the neural network, say that of another popular cricketer (Rohit Sharma), the network will emit yet another vector.
Say the vector emitted is [0.78, 0.80, 0.90, 0.96]. If we want to check which photos are similar to each other, we simply need to check which vectors are closer to each other. Put another way, we need to find the distance between the vectors. One of the simplest distance formulas is the Euclidean distance.
Euclidean distance is the straight-line distance between two points. Let's take a simple example. The distance between the two points (2, -1) and (-2, 2) would be as follows.
dist((2, -1), (-2, 2))
= √((2 - (-2))² + ((-1) - 2)²)
= √((2 + 2)² + (-1 - 2)²)
= √((4)² + (-3)²) = √(16 + 9) = √25 = 5
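Euclidean distance between the two photos of Virat Kohli
= √((0.05 - 0.06)² + (0.06 - 0.06)² + (0.07 - 0.07)² + (0.08 - 0.07)²) ≈ 0.014
Euclidean distance between the photo of Virat Kohli and the photo of Rohit Sharma
= √((0.05 - 0.78)² + (0.06 - 0.80)² + (0.07 - 0.90)² + (0.08 - 0.96)²) ≈ 1.595
The two photos of Virat Kohli are far closer to each other than either is to the photo of Rohit Sharma, which is exactly what a similarity check relies on.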
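These distances are easy to check numerically. The snippet below is a small sketch using only the Python standard library; the helper name `euclidean_distance` and the variable names are made up for illustration, and the vectors are the simplified 4D examples from this article rather than real dlib output.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Simplified 4D example vectors from the text (real dlib vectors are 128D).
kohli_first = [0.05, 0.06, 0.07, 0.08]
kohli_second = [0.06, 0.06, 0.07, 0.07]
sharma = [0.78, 0.80, 0.90, 0.96]

same_person = euclidean_distance(kohli_first, kohli_second)
different_person = euclidean_distance(kohli_first, sharma)

# Prints roughly 0.014 and 1.595: the two Kohli photos are much closer.
print(round(same_person, 3), round(different_person, 3))
```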
Airport face recognition, or face recognition anywhere
Put simply, these are the steps for face recognition:
- Step 1: For each photo in your system, use a pre-trained model (such as the one the dlib library provides, or build your own network first) to get the vector.
- Store the vector against each photo (customer record).
- Step 2: When a customer walks in, take his/her image and calculate the vector of that image.
- Step 3: The last and most important step is to find the best-matching photo, which means comparing that vector against every customer's vector.
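Step 3 can be sketched as a plain linear scan over the stored vectors. This is only an illustration under stated assumptions: `find_best_match`, the dictionary layout, and the record ids are hypothetical, and the 0.6 cutoff follows the value suggested in dlib's own face recognition example for its 128D vectors, so treat it as a starting point to tune rather than a fixed rule.

```python
import math


def find_best_match(query_vector, stored_vectors, max_distance=0.6):
    """Return (record_id, distance) of the closest stored vector.

    stored_vectors maps a record id to its vector (hypothetical layout).
    If even the closest vector is farther than max_distance, the face is
    treated as unknown and (None, distance) is returned.
    """
    best_id, best_distance = None, float("inf")
    for record_id, vector in stored_vectors.items():
        distance = math.sqrt(sum((q - v) ** 2 for q, v in zip(query_vector, vector)))
        if distance < best_distance:
            best_id, best_distance = record_id, distance
    if best_distance > max_distance:
        return None, best_distance
    return best_id, best_distance


# Toy 4D vectors from earlier in the article, keyed by made-up record ids.
stored = {
    "kohli": [0.05, 0.06, 0.07, 0.08],
    "sharma": [0.78, 0.80, 0.90, 0.96],
}
match_id, match_distance = find_best_match([0.06, 0.06, 0.07, 0.07], stored)
```

The scan touches every stored vector once, so its cost grows linearly with the number of records.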
Step 3, however, amounts to comparing against all the records in your database. Let's try to estimate this (roughly) for the airport example.
The airport in the example is Chengdu Shuangliu International Airport. Shuangliu Airport handled 42.2 million passengers in 2015 and was among the world's 30 busiest airports that year, according to Wikipedia. Let's round the yearly number of travelers up to 50 million. That is roughly 4.2 million a month; call it 4.5 million to stay on the safe side.
- In a day, that is approximately 150,000 unique visitors.
- Most travelers arrive at most 3 to 4 hours before an international flight. Let us assume a window of around 8 hours, a third of the day. So the number of photos to be scanned for face recognition would be around 50,000.
- A comparison between two vectors (Euclidean distance) takes approximately 10 milliseconds (at least on my laptop, for 128D vectors). Done serially, 50,000 comparisons would take 500 seconds, or about 8 minutes. That is far too poor a user experience. But assume you can parallelise the vector comparisons: comparisons between pairs of photos have no dependency on each other and can be done simultaneously. If you have cloud computing power and spin up 100 instances of your application, the time would ideally be reduced by a factor of 100 (ignoring coordination overhead). The time for matching faces comes to 500/100 = 5 seconds.
- Assume the cloud infrastructure has GPUs installed (although GPUs mostly speed up the training time of a neural network). If we assume a 10x performance improvement over regular instances, the time for matching faces can be 0.5 seconds. That is a good user experience.
- When would you calculate the vector for the photo for the first time (Step 1 above)? Maybe when you come to the check-in counter and your passport gets scanned. This could take a few seconds, but the operation can be done asynchronously.
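The back-of-the-envelope numbers above can be reproduced in a few lines. The function and parameter names below are invented for this sketch; the default values are the rough assumptions made in this article (150,000 passengers a day, an 8-hour window, 10 ms per comparison, 100 instances, a 10x GPU speed-up), not measurements of a real system.

```python
def estimate_match_time(passengers_per_day=150_000, window_hours=8,
                        ms_per_comparison=10, instances=100, gpu_speedup=10):
    """Rough matching-time estimate, assuming perfectly parallel comparisons."""
    # Only passengers expected within the time window need to be compared.
    candidates = passengers_per_day * window_hours // 24
    serial_seconds = candidates * ms_per_comparison / 1000
    parallel_seconds = serial_seconds / instances
    gpu_seconds = parallel_seconds / gpu_speedup
    return {
        "candidates": candidates,
        "serial_seconds": serial_seconds,
        "parallel_seconds": parallel_seconds,
        "gpu_seconds": gpu_seconds,
    }


estimate = estimate_match_time()
for name, value in estimate.items():
    print(name, value)
```

Changing any single assumption, for example the time per comparison, shows quickly how sensitive the final response time is to it.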
If you have limited computing power, it makes sense to limit the number of vector comparisons to a small number, maybe a few thousand. This keeps the experience acceptable for the online user. How to limit this number depends on the context of the specific use case.
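One way to limit the comparisons, in the airport setting, is to shortlist only passengers whose flights leave soon and compare against that shortlist. The sketch below is purely illustrative: the record layout with `name`, `departure_hour`, and `vector` fields is hypothetical, hours are treated as simple same-day numbers, and other use cases would need a different filter.

```python
import math


def candidates_in_window(records, current_hour, window_hours=8):
    """Keep records whose departure falls within the next window_hours."""
    return [
        record for record in records
        if 0 <= record["departure_hour"] - current_hour <= window_hours
    ]


def match_within_window(query_vector, records, current_hour, window_hours=8):
    """Return the closest record from the shortlist, or None if it is empty."""
    shortlist = candidates_in_window(records, current_hour, window_hours)
    if not shortlist:
        return None
    return min(
        shortlist,
        key=lambda record: math.sqrt(
            sum((q - v) ** 2 for q, v in zip(query_vector, record["vector"]))
        ),
    )


# Made-up records using the toy 4D vectors from earlier in the article.
records = [
    {"name": "kohli", "departure_hour": 14, "vector": [0.05, 0.06, 0.07, 0.08]},
    {"name": "sharma", "departure_hour": 15, "vector": [0.78, 0.80, 0.90, 0.96]},
    {"name": "late_flight", "departure_hour": 23, "vector": [0.06, 0.06, 0.07, 0.07]},
]
best = match_within_window([0.06, 0.06, 0.07, 0.07], records, current_hour=10)
```

Note the trade-off: a passenger outside the window is never considered, even when their stored vector would have been the closest match.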
Information on the model
The dlib library internally uses a pre-trained model. This model is an altered version of the famous ResNet-34 architecture. Please refer to the blog post by the library's author for more details.
If you want to use the model in Node.js applications, please refer to the Node.js wrapper over the dlib library written by Vincent Mühler. Although he has since moved on to authoring a different library with different models (the face-api.js library and the SSD Mobilenet model), I still prefer the dlib implementation for server-side face recognition.
- Face recognition is a similarity match.
- Face recognition requires vector comparison over large sets of vectors, and the time it takes depends on the available computing power (or cloud infrastructure and the availability of GPUs).
- If you have limited computing power, you need to limit the number of vector comparisons to get quick results. That may mean searching within a subset of the records.