samedi 8 septembre 2012

Finger print matching with Locality Sensitive Hashing

Have you ever wondered how finger print is saved and matched against you at a later time?

For some very important official documents we dirty our fingers with black jelly and place it on a scanner and later with that finger print we can be fished out in any circumstance. In most crime scenes for example you see police and related trying to get hold of as many finger prints as they can. Of course we know they go search the finger prints against a huge data storage. But how does it work? I give you hint about one of the methods used in finger print matching

It is a technique called Locality Sensitive Hashing (LSH)(an Object Search technique). First of all how is a finger print structured that it can even be stored in the first place and how is it stored? Finger prints consist of what is called is minutiae, which I can't claim to really understand (it's a technical term in the finger print domain). Minutiae consists of elements like cross over, core, bifurcation, ridge ending, island, delta, pore,.. etc. You might want to refer to a biometrics book for more details. When you place your finger on a scan, a 'grid' object is created and the cells of this grid can contain minutiae or not.

The minutia of your finger print is stored in the cells of the grid. Many types of data structures can be used to store grid elements as 'objects'. As an example a cell can be marked 1 if it contains minutae or 0 if it doesn't. With this representation of your finger print in a grid whose cells contain your minutiae, matching a print simply means performing an object search on some huge data storage. Now how is this done with Locality Sensitive Hashing?

Let's 'define' (a real analysis student with eat me for this) a function f(x) = 1 if print x has minutiae match in some k cells of the grid. (print x can be a sample of your finger print)

Suppose p is the probability that a print x has minutiae at a particular position (cell), then

P(f(x) = 1) = pk

i.e the probability that the print x has minutiae in all k positions. For example if p = 0.3 and k = 4 then the probability that the print has minutiae in all 4 positions is 0.0081.

Note that the function f depends on the k cells you choose from the grid. If you choose other cells in the grid, you might have to change the function f.

Now suppose you left your print in a crime scene :P call it print y and the police took the print to the lab. Let q be the probability that print y will have minutiae in some k cells IF x does, then the probability that the function f gives a match for both prints is

P(f(x) = f(y) = 1) = (pq)^k.

If q = 0.9 (quite high) then (pq)k = 0.0053 (quite low) - might not be you :-D. You can't conclude a match at this point. The probability is quite low.

ut what if we took b of such functions (say b = 1024 - that's a lot of functions to define). Each function chosen correspond to particular k cells in the grid.

Now what is the chance that at least one of such functions out of the thousands actually produce a match in both prints? i.e what is the probability that a function will satisfy f(x) = f(y) almost surely for every x, y?

  • (pq)k is the probability that f(x) and f(y) are both 1 for some particular k cells.
  • 1 - (pq)k is the chance that f(x) and f(y) are not 1 for these particular k cells
  • (1 - (pq)k)b is the chance that none of the b functions selected actually match for each print.
  • 1 - (1 - (pq)k)b is the chance that at least one of the b functions matches which is 0.996 for p = 0.3, k = 4 and q = 0.9

So by using many of such functions, we get a high chance that at least one of them will give us a match IF the prints are from the same person. (Think of how model designers come up with such functions and how many are actually implemented).

What if prints x and y DON’T belong to the same person? We have to figure out the probability that 2 prints, gotten at random, give a match with one of the b functions f. Prints can have a match at random if both of them have minutiae in same k positions. So what is the probability that this can happen?

  • the probability that both of them have minutiae in k positions at random is pk x pk = p2k (one for x and one for y)
  • the chance that they don't have minutiae in these k positions is (1 - p2k)
  • the chance that both prints don't match for any of the b functions is (1 - p2k)b
  • and finally the chance that at least one of the b functions gives a match is 1 - (1 - p2k)b, which is 0.063 with our values (not bad!)

So what we see is that if the 2 prints are from the same person, LSH procedure, maps them onto the same region with a very high probability but if they are NOT from the same person then the chance of a wrong match is only about 6%. We can reduce this probability by increasing b but so long as it is not 0% this method is still not PERFECT! and so is the world :P.

Aucun commentaire:

Enregistrer un commentaire