Impression Evidence: Strengthening the Disciplines of Fingerprints, Firearms, Footwear, and Other Pattern and Impression Sciences Through Research

June 1, 2010

Speaking: Gerry LaPorte, National Institute of Justice; and Tom Busey, Professor of Cognitive Science, Department of Psychological and Brain Sciences, Indiana University, Bloomington

Speaking: Lynn Abbott, Associate Professor, Department of Electrical and Computer Engineering, Virginia Tech

Sargur Srihari, SUNY Distinguished Professor, Department of Computer Science and Engineering, University at Buffalo, The State University of New York


Speakers

Gerry LaPorte, National Institute of Justice; Tom Busey, Professor of Cognitive Science, Indiana University; Lynn Abbott, Associate Professor, Bradley Department of Electrical and Computer Engineering; Sargur Srihari, SUNY Distinguished Professor, University at Buffalo, State University of New York

Forensic examinations involving specific forensic science disciplines are typically dependent upon qualitative analyses and expert interpretation of observed patterns based on a scientific foundation, rather than quantitative results. These disciplines include latent fingerprints, questioned documents, footwear, and other forms of impression and pattern evidence. This panel will highlight current fundamental research needs in the areas of impression evidence examination and how NIJ is addressing those needs through its forensic research and development portfolio within the Office of Investigative and Forensic Sciences.

Gerry LaPorte: First of all, it's my pleasure to be moderating this panel. We have three highly esteemed experts in various fields of the … as what I'm … the terminology that I'm starting to hear quite a bit now is the pattern and impression evidence area.

So our first speaker today is Dr. Busey. Tom Busey is a professor of cognitive science at Indiana University in Bloomington. He has addressed the psychological aspects of latent print identification for the past five years with support from NIJ. Much of his current work uses eye-tracking methodologies to determine the features that experts use when individualizing or excluding latent prints. Dr. Busey received his doctorate from the University of Washington.

Tom Busey: Thank you, and I guess this is the last of your four sessions today, somewhat of a relief.

So, despite the fact that I live in a flyover state in Indiana here, we actually have a fairly strong quantitative group and a very strong school of informatics in computer science, and my colleague, Chen Yu, is a computer scientist, and we work with John Vanderkolk, who is with the Indiana State Police up in Fort Wayne.

And even though I'm in the Department of Psychology, I like to tell people I'm not the kind of psychologist who cares how you feel. I mean, if you tell me how you feel, I'll look interested, but, deep inside, I'm not caring. Instead, I'm the kind of —

[Laughter.]

Busey: I'm the kind of psychologist who studies how experts go through and match what can be very partial and degraded latent prints to inked prints collected from a known source. Now, this is a very difficult problem, and computers have made a go at this, but despite what you see on CSI, computers mainly play a role in triage, in finding candidate matches. And almost all of the evidence that's presented in court is done on the basis of a human expert comparing these two and rendering their opinion.

Now, there's good reason for this expertise, this superiority of the human visual system. In fact, you've probably proven this expertise yourself over machines many times through these CAPTCHAs. You're familiar with these where you have to fill them out to prove that you're actually not a computer, and, in fact, this was the basis for a recent comic that uses robots as their chief characters. One of the robots has a CAPTCHA for a tattoo, which, of course, he can't read. So you guys are in on the joke because you're not a computer. You can go through and read this and figure out what it says, and even though this is something that computers are continually getting better at — in fact, my colleague, Hari Srihari, here could probably develop algorithms that would read these kinds of CAPTCHAs.

By and large, humans have had a remarkable ability to succeed where computers have failed up until now, and the idea behind my research is to try to understand how the human visual system works, understand its elements of expertise, and then try to apply them to machine systems.

So human experts outperform current automated systems in many domains, not just in fingerprints, but there are lots of advantages to machines. They have known algorithms, except in the case of AFIS, where they're apparently still proprietary. They provide for standardization. They can be used to provide probabilities of random correspondence, and, of course, they can be incredibly fast. And so these are things, these are advantages that can be used. Once you develop expertise from humans and try to understand it, you can bring those into the machine world.

So, to do this, we're proposing to reverse-engineer the human visual system, study it and try to glean its secrets, and then apply those using the kind of language that machines understand.

Well, this is not inconceivable because, in some sense, the human brain can be thought of as a computing device. In fact, your retina, the back of your eye that actually does the light capture and the first steps of seeing, is actually made of the same stuff that your brain is. It's a little piece of brain tissue that was segregated out early on in development and brought out as part of your eye. So your eye itself is a little computing machine.

So we want to know what makes your human visual system and your eyes so good. Well, one point, one starting point might just say, “What do you use, experts; how do you go about this task?” It turns out to be a good starting point, but language is a very poor representation of what can be a very rich perceptual experience.

So imagine doing a latent-ink comparison over the phone here. So you have this print right here. Your colleague on the other end of the telephone has this print. Imagine trying to describe this in such a way that they can decide whether this matches or not. It's a very difficult process to imagine doing, and it illustrates how poor language is as a means of describing perceptual information, at least complex perceptual information here.

Another problem is that some processes of perception simply reside below the level of consciousness. We can't change some of our perceptions, even though we know that reality differs from our perceptions.

So this is a very famous illusion here. I don't know if it's going to work at this distance. What you may find, if you stare in the middle here, is that these outer ones start moving on you. It can be very disquieting to see this, and you can't not see it happen. You can't tell yourself just because you know it's static that the motion isn't there.

So, usually, our percepts are trustworthy. There are some situations where they're not, which is another reason to study human performance.

In addition, experts sometimes report having an “a-ha” moment where they just look at these two prints and they know they match, and then they have to go through and document it, of course, but the process of this immediate spark of recognition is way too fast to come from a serial, sort of point-by-point comparison.

So, in order to understand how experts are doing this, we are using eye-tracking technology. Here is an example of our eye tracker here. There is one camera on the front here, which looks at the eye, and there's another camera up here on the top of the glasses that looks out on the scene. And we end up with two images right here. You have an image of the eye, and we can capture the pupil here and what's known as the “cornea reflection,” and then we have another camera that tells us where the scene is relative to the head, and there's a very straightforward calibration procedure that allows us to look and see where the eye is relative to the world. So we can see what people are looking at.

We can estimate the pupil and the cornea reflection with our software, which tells us where the eye is gazing, and all of this is wrapped up in a software package that's free and open source called Expert Eyes that allows you to go through and collect data and analyze it. And we have data for over a hundred subjects now that we've analyzed, and I'll talk about a subset of that data today.

Now, we want controlled studies that allow experts to reveal those regions that they consider most diagnostic, and, therefore, we fix the viewing duration of each pair of our prints to about 20 seconds to give the experts just enough time to tell us what they consider to be the most relevant or the most diagnostic.

So here's an example movie of our eye tracker at work. Here's the eye moving up here, and here are the crosses here as our estimate of where they're looking in this gaze. When it jumps there, that's the blink, and that's, of course, not relevant data for us. So you can really get a sense of what features they're relying on when they're going through and doing this particular task. So that was one trial worth of data.

Here's another visualization here. The blue is a trace of the overall eye data within the trial, and then the red plus signs that are filling in, those are the actual moment-by-moment fixations. It's kind of actually hard to watch that in real time because you're essentially watching somebody else's eyes, and it's tough to keep up with other people's eyes.

So here's another visualization here. The green circles are the fixations, and then the red is the current fixation. And the size of that tells you how long they spend at that location. Even with this visualization, I had to slow it down to half speed to enable you to keep up. So this gives you an idea of how they're going through each location and determining what features they think are most relevant.

So our system is fairly accurate. The fovea is about the size of two thumbnails' width at arm's length. So, if you think about how wide your thumbnails are, that's about where you're acquiring information. That's the region of fairly high acuity of the visual system. The resolution falls off rapidly after that, out in the periphery.

And our eye-tracking system can resolve eye gaze locations down to half a thumbnail or about 0.5 degrees of visual angle. So we're down to the point where we're in the region and below the region of where people are acquiring their information.

And this, in general, from most of our experiments, corresponds to about one to four ridge widths, depending on the size of the ridges that we display.
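The thumbnail figures above can be related to sizes on a display with the standard visual-angle formula, size = 2 × distance × tan(angle / 2). The following is a minimal sketch only: the 60 cm viewing distance is an assumed value, not one given in the talk, and the 2-degree figure for the fovea is inferred from the "two thumbnails" and "half a thumbnail or about 0.5 degrees" statements.

```python
import math

def visual_angle_to_size(angle_deg, distance):
    """Linear size subtended by a visual angle at a given viewing distance."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)

def size_to_visual_angle(size, distance):
    """Visual angle (degrees) subtended by an object of a given size."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))

# Assumed viewing distance; the talk does not state one.
VIEWING_DISTANCE_CM = 60.0

# 0.5 degrees is the tracker resolution quoted in the talk.
tracker_resolution_cm = visual_angle_to_size(0.5, VIEWING_DISTANCE_CM)

# About two thumbnail widths, taken here as roughly 2 degrees.
fovea_cm = visual_angle_to_size(2.0, VIEWING_DISTANCE_CM)

print(f"0.5 deg at {VIEWING_DISTANCE_CM} cm covers {tracker_resolution_cm:.2f} cm")
print(f"2.0 deg at {VIEWING_DISTANCE_CM} cm covers {fovea_cm:.2f} cm")
```

Dividing the first figure by the displayed ridge spacing would give the resolution in ridge widths, which is how a statement like "one to four ridge widths" can depend on display size.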

Now, the first step here, if you're going to try to take data from experts and use it to improve machine systems, is first verify that your experts have some kind of expertise. There hasn't been a lot of work done in eye-tracking work with experts, and so this was our sort of first step to make sure these experts really do have this special expertise.

So we did our first experiment with 12 experts and 12 novices. Each one did about 35 pairs of images, and it's useful to talk about the behavioral accuracy first because we want to make sure that they're actually doing better than our novices. What we find is that we have three categories of responses. So they can either say “yes, there's a match”; “no, it's not a match”; or we gave them an option of “too soon to tell.” We found that when you pushed our experts after only 20 seconds to make a decision, they really were uncomfortable with that because for an expert, if you make a wrong decision, it's a career-ending move.

So we added this “too soon to tell” option, and we found that many of our experts are fairly conservative. They're not likely after just 20 seconds to move into this “yes” category here. So a lot of them are saying “too soon to tell.” They make very few errors when it comes to misses relative to our novices, and most importantly, they make no erroneous identifications. None of our 12 subjects on any of our 35 images made erroneous identifications; whereas the novices, 25 percent of the time, it was a true nonmatch. It should have been an exclusion. They're saying, “yes, it's a match.” This is a huge difference, and that really drives the performance accuracy difference between the two groups. So the experts are doing much better than the novices.

The second question we asked is whether experts as a group are consistent with each other. We'd like to think that there's some kind of implicit standard that the experts are relying on, a set of features that they all agree that are relevant for the task. We could ask do they tend to visit the same regions or locations.

And I'm going to quantify that just briefly for you by imagining that your eyes shoot out laser beams, and when you're kids, I'm sure you imagine this. You're shooting out laser beams, and where the gaze hits, the print turns dark red here, yellow and then red. So this is a way of visualizing where the eye gaze falls, and we can compare the patches from one expert to another expert and one novice to another novice to ask are the experts more consistent as a group or are the novices more consistent as a group.

And it turns out that the experts as a group tend to be much more consistent in terms of the regions that they visit, and at least in this constrained viewing experiment, experts seem to have an implicit set of features that they tend to seek out and tend to all look at.

Here's an example of why this is happening. This is kind of a dim slide from this projector, but most of the experts in green here are falling in this region here and also in the analogous region on this side, whereas the novices in red are all over the place.

Here's another way of viewing what features the experts are relying on. We can use automated software to identify the locations of minutia, which are marked in green pluses here, and if we look at regions surrounding those of some arbitrary circle size and ask within each fixation how many of these minutia fall inside of it, do the experts tend to look at more minutia than novices? A lot of the AFIS systems are relying primarily on the locations of minutia. So you could ask, “When experts are doing these latent-ink comparisons, do they rely on minutia?”

Well, when you count the number of minutia inside a circle of fixed size centered on each fixation (and the size of the circle doesn't matter, it turns out) and ask whether experts or novices have more nearby minutia, it turns out that in these latent prints, there's no difference between the experts and novices in terms of the number of minutia that they visit. And this suggests that for latent-ink comparisons, the minutia may represent a relatively small part of the available information. Experts might be relying on ridge flow or curvature or level three detail or something else when they have a relatively small patch of latent impressions.
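The counting analysis just described can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual code: the coordinates are made up, and the function names are hypothetical. It counts minutia within a fixed radius of each fixation and repeats the comparison at several radii, since the talk notes that the circle size did not change the outcome.

```python
import math

def count_nearby_minutiae(fixations, minutiae, radius):
    """For each fixation, count minutiae within `radius` (same units as the coordinates)."""
    counts = []
    for fx, fy in fixations:
        nearby = sum(
            1 for mx, my in minutiae
            if math.hypot(mx - fx, my - fy) <= radius
        )
        counts.append(nearby)
    return counts

def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

# Hypothetical pixel coordinates, for illustration only.
minutiae = [(10, 10), (12, 11), (40, 40), (41, 38), (80, 15)]
expert_fixations = [(11, 10), (40, 39)]
novice_fixations = [(60, 60), (80, 14)]

# Repeat at several circle sizes to check that the conclusion is not radius-dependent.
for radius in (3, 5, 10):
    expert_mean = mean(count_nearby_minutiae(expert_fixations, minutiae, radius))
    novice_mean = mean(count_nearby_minutiae(novice_fixations, minutiae, radius))
    print(radius, expert_mean, novice_mean)
```

With real data, the two group means would be compared statistically rather than printed side by side.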

So this highlights the advantages that humans have over existing computer models. They have the ability when they don't have a lot of minutia to move to other levels of detail, and to do that fairly easily and quickly; that's something that the computer system is going to have to — if it can process something like level three detail — it has to decide whether it should represent that or minutia detail in its decision and how to weight those.

OK. What about for clean prints? Well, we used a second set of clean prints, and we found that experts now are much more likely to move their eyes to locations closer to minutia. So, if you give them a large wealth of information, they're likely to move their eyes more to minutia.

And here it's obvious why. With these clean prints, the experts tend to move their eyes to regions near the core, leading from the core and also the delta. So the experts are in red here, the novices are in blue and the minutia are in green. So the novices tend to be sort of all over the place, and the experts tend to be down in these regions here.

So a third analysis that we did was to look to see whether experts move their eyes more quickly to matching locations. So we asked one of our experts to simply place locations of correspondence on these two prints, and that's what's shown connected by lines here. And then we can imagine that there is a trace starting right here and ending in a fixation right here, and it's the last fixation before they move with their eyes to the other side, so when the eyes moved over here looking for a matching corresponding location, which happens to be this green dot right here, and then they sort of wander off, having found it as close as they're going to get.

So we can ask how close do subjects get to this matching location or over time how close are they getting at each point in time to this matching location.

Well, it turns out that this graph here illustrates the time since they moved their eyes from the left side to the right side. That's time zero. And the Y axis here is the distance from what is the true matching location, as determined by where they started on the left-hand side, what's the right-hand side matching location, and we find that the experts almost immediately get closer and stay closer to the matching location. Ultimately, they are much closer to the matching location than the novices are.

OK. So experts are better than novices. They are more consistent as a group in terms of the features selected, and they have a tendency to rely on more minutia when looking at inked prints.

But the final question I want to address in the last few minutes is what features they rely on, because minutia may be only part of the story. At least for latent prints, it doesn't look like experts are using it more than novices.

So this is a more complicated analysis here, but basically what we do is we start with the fixations, which are shown in red here, and we'll crop out regions of pixels surrounding each fixation. So here's a little crop of pixels. Here's another little crop of pixels. We can get lots of these. In fact, we get 40,000 total over all of our database here, and then we can construct … do what's called “dimensionality reduction,” which basically illustrates the fundamental building blocks of perception. These are sort of the alphabet in perceptual language of fingerprints, and the amazing thing is that you can take relatively few basis functions here and reconstruct with fairly high accuracy the original image patches.

So here's an original image patch here, and here is the reconstruction of that here. It's a little bit blurry, but it's remarkable, the fact that you can take 40,000 image patches and with just 200 basis sets reconstruct any one with fairly high accuracy. It's actually not that unfamiliar or unrelated to something like JPEG compression, where you can see enormous redundancy reductions.
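To make the idea of reconstructing patches from a small set of basis functions concrete, here is a minimal sketch that uses principal component analysis as the dimensionality reduction. The talk does not say which method the project uses, so PCA is an assumption, as are the toy 16-pixel patches and all function names. The basis is found by power iteration so the example needs only the standard library.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(rows):
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]

def top_components(centered, k, iterations=100, seed=0):
    """Leading k principal directions via power iteration with deflation."""
    rng = random.Random(seed)
    dim = len(centered[0])
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(iterations):
            # Multiply by the (unnormalized) covariance matrix: X^T (X v).
            scores = [dot(row, v) for row in centered]
            w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
            # Remove directions already found so the next one is orthogonal.
            for c in components:
                proj = dot(w, c)
                w = [wj - proj * cj for wj, cj in zip(w, c)]
            norm = dot(w, w) ** 0.5
            if norm < 1e-12:
                return components  # no further variance to explain
            v = [wj / norm for wj in w]
        components.append(v)
    return components

def reconstruct(patch, mean, components):
    """Approximate a patch as the mean plus a weighted sum of basis vectors."""
    centered = [p - m for p, m in zip(patch, mean)]
    result = list(mean)
    for c in components:
        weight = dot(centered, c)
        result = [r + weight * cj for r, cj in zip(result, c)]
    return result

def rms_error(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

# Toy 4x4 "patches" (flattened): mixtures of two stripe patterns plus slight noise.
rng = random.Random(1)
horizontal = [1, 1, 1, 1, 0, 0, 0, 0] * 2
vertical = [1, 0, 1, 0] * 4
patches = []
for _ in range(50):
    a, b = rng.random(), rng.random()
    patches.append([a * h + b * v + rng.uniform(-0.01, 0.01)
                    for h, v in zip(horizontal, vertical)])

mean = mean_vector(patches)
centered = [[p - m for p, m in zip(patch, mean)] for patch in patches]
basis = top_components(centered, k=2)

error_one = rms_error(patches[0], reconstruct(patches[0], mean, basis[:1]))
error_two = rms_error(patches[0], reconstruct(patches[0], mean, basis))
print(error_one, error_two)
```

In the toy data two basis vectors are enough because only two patterns were mixed; the 200 basis functions mentioned in the talk play the same role for 40,000 real image patches.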

So what we'd like to do with this analysis is to figure out where … what features or combinations of these basis sets are used by experts as they go through and look for correspondences between prints, and this is really the heart of our grant project over the next couple years is to identify what combinations of these features are used by experts.

You can see that they're already starting to see some complex features like minutia coming out of these, but you also see lots of categories where it's just ridge flow, and so, by looking at the combinations of these features, we can really see which of these distinguish between experts and novices and ultimately (and I'm going to skip all of this in the interest of time) try to figure out what our features will look like that experts are using. And this is really going to lead to a situation where you can essentially look through the eyes of the experts and use this to filter images, so that we can highlight regions in new prints that we think experts might find most diagnostic.

So implications for practice here, just to finish up: machine learning results are directly applicable to computer-based systems like AFIS because we use this same language that they're using. We use machine learning approaches, which are the same kinds of tools that are used in systems like AFIS.

Prints filtered by this basis set that I just showed you can assist the identification process, much like we have computer-enhanced mammography examinations now in hospitals. The feature set that I just showed you provides a quantification of factors, such as feature rarity, something I think Hari will talk about as well, and it can be combined with likelihood models to provide probabilistic statements, much like we have in DNA evidence now.

But, first, we're going to need to collect the relevant data from experts and infer an appropriate feature set, and that's really what we're doing in this project.

Thank you.

[Applause.]

Gerry LaPorte: Our next speaker is Dr. Lynn Abbott, and Dr. Abbott is an associate professor at the Bradley Department of Electrical and Computer Engineering at Virginia Polytechnic Institute and State University, a.k.a. Virginia Tech. Dr. Abbott has more than 20 years of experience in the area of image analysis, and he's the co-principal investigator on an NIJ-funded project titled “Establishing the Quantitative Basis for Sufficiency Thresholds and Metrics for Friction Ridge Pattern Detail Quality and the Foundation for a Standard.” This was a grant that was … it's a 2009 grant, so they're not quite … I think we just got everything, all the logistics of the grant, completed probably this past February or March, so we're early into the process.

Lynn Abbott: Thank you very much. Yes, primarily I'll be describing a project that we just got under way. We had our first meeting with students last January. So we're a few months under way, and so my plan is to talk about our approach, some of the philosophy we have behind our approach.

And it was very interesting for me to hear the work of Tom a few minutes ago because he talked about human visualization. I come from what we call “machine vision,” computer vision, much of which is motivated by this wonderful existence proof, which is what human visual systems can do, biological vision systems in general, and some things machine vision systems can do much better. They're much more patient, for example, but there are many mysteries left in terms of what human and other biological systems are capable of doing.

And so the high-level objective of our project is to establish some sort of quantitative basis that can be used to develop some sort of sufficiency analysis for fingerprint image quality, friction ridge image quality. The motivation, well, the Daubert case provided the fundamental motivation, but, in essence, what we are trying to do long term is provide some sort of scientific measure of confidence in an examiner's decision with an emphasis on latent prints because, after all, this is an NIJ-funded project. We are interested in and motivated by much of the AFIS work, but our goal on this particular project is to somehow come up with a way to provide that confidence measure, somewhat semiautomatically but not necessarily completely automatically.

A little bit about the project team: We consider our ringleader to be the fellow in the lower left, Randy Murch, a fellow who's named and is also on the left of that picture. In that photograph, he was demonstrating to us how we could go about collecting some prints for use in our own experiments, and working with him are a couple of students and Ed Fox, who is a professor of computer science. Also in our project, Michael Hsiao, who is a well-known expert in the area of digital circuit design and testing.

Let's see. Bruce Budowle is here, but he's not in this room. He's manning a station outside with a poster presentation that he has. Both Bruce and Randy have several decades of experience each in the FBI world, and so they represent … they bring to our project a lot of expertise on the law enforcement side. The rest of us are really computer guys who are learning from them but bring, hopefully, some expertise into image analysis and matching and so forth.

When I sat down to make up the talk, I said, “Why don't I compare what I know about the AFIS world, Automated Fingerprint Identification Systems, versus the latent print world?” When we talk about so-called “clean prints,” often called “plain-to-rolled” or 10 print images, they often are high quality, at least relatively high quality compared to many of the latent cases. And the reason is latent prints are inadvertent almost by definition. So they tend to be blurred or partial, and there has not been very much work, to the extent we can determine, in analyzing such poor-quality images, at least from a point of view of analysis of fingerprint quality.

The AFIS systems, to the extent we can learn about them, emphasize minutia, ridge endings, bifurcations; whereas the experts, such as places like SWG, SWGFAST and other sources like that, constantly emphasize multiple levels, especially level one and eventually level three as well.

Also, many of the commercial systems, we're told — they don't tell us details — typically use binarized images. So every pixel is either on or off, as shown by the middle image; whereas we would much prefer in my world to work with grayscale images. Each pixel is at least eight bits or so, typically, of information.

One thing that also came up in our discussion with experts is that the level two features alone are insufficient, and here is a contrived example which we are told mimics reality. It's possible to find a lot of level two details which match perfectly, but when you fill in the level one details, the ridges, there's an obvious mismatch in some cases, and so we've been told there are examples like this. And so reliance on level two alone is not good if we're talking about quality in some sort of courtroom environment.

And so one thing we have done in the last few months is to investigate what can be done using level two minutia, and so we are not trying to reinvent the wheel here. So we very happily adopted the NIST software, the NBIS package, which contains this procedure called MINDTCT or minutia detection, and you see the results of that on the left, if you can make them out. They're little colored squares indicating where the minutia points sit.

And then, because we are interested in grayscale approaches, we found this paper by Maio and Maltoni, which talks about analyzing the actual ridges from a grayscale perspective in an effort to find the minutia points.

One of our students implemented this work and came up with the dots that you see there. We ignore those false matches on the far left because there's extra work needed to identify the edges of the useful image.

If we zoom in to these yellow windows and take a very careful look, there are, in fact, quite a few small discrepancies, and I don't know if my pointer works here, but here is one that sort of stands out at me. This is from the NIST software. Supposedly, it's a ridge ending. It's out in the middle of nowhere. It's between two ridges, and that's more common than you might think.

There's another case … well, in the interest of … our grayscale method did find a much better localization in that.

Also, there's a lot of discussion in our group concerning what are called “lakes.” This is a lake area here. It's between two closely spaced minutia bifurcations, and if you carefully try to take the placement there, there might be some … you might argue that there's some better placement possible. We think our placement is a little better here. So there certainly are improvements that could be made, and no doubt, the commercial world has made such improvements, but they won't give us their details.

Now, here is another experiment. We sent one of our students on to an interesting experiment where we said, “Take this NIST database that's called Special Database 27,” and we're very happy to have it. It contains ground truth, as indicated by several human examiners, for approximately two or three hundred cases, and so we picked one particular case.

The human examiner, according to the database, identified 16 minutia in the left image, 98 in the right, and from the 16 on the left came up … was able to make … identify 14 matches between those two images.

So we said to ourselves, “Well, what could the automatic software do?” We simply provided the left image. As messy as it is, we provided it to … sent it into the minutia detection software, and it popped out with what it estimated as 500 or so minutia. The clean image on the right, it came up with about half that number, and in this 500 on the left, this included the 14 found by the human examiner, so we said, “Well, let's see how good those particular 14 matches seem to be, and how can we do that quantitatively?”

Well, the idea that we got after looking much more closely than we ever especially wanted to at the NIST software, we found step five, which is called “remove false minutia.” Then we asked ourselves, “Well, what would cause the NIST software to remove some minutia?” and we found that in order to make its decision, it employs nine different independent software filters, subroutines, and based on the results of those subroutines, it decides whether or not to delete one of those minutia points from further consideration.

So we said to ourselves, “If it were ever to decide to remove a point, well, it must not have been a very good point to begin with, and so why don't we investigate that a little further?” Well, the student liked the idea of a straightforward task, and so the task we gave him was to consider every possible combination of those nine independent filters. If you do the math, 2 to the power 9, that's 512 different on/off settings for these filters; luckily, he's good at automating these things. He wrote a quick script, set the software running for a few days, and the underlying idea was that higher quality minutia ought to survive processing even under all these different cases of filter combinations.

And these 14 matched points, as determined by the human examiner, were assessed based on whether those particular 14 points survived all these different filter on/off conditions, and it turns out that eight of those 14 were not modified in any way by the processing, even though the NIST NBIS score associated with those points was low, and so that's the top row on that, the table at the bottom.
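The experiment just described, trying every on/off combination of nine filters and checking which points survive, can be sketched as follows. This is an illustration under assumptions: `toy_detect` is a hypothetical stand-in for a real detector run, and the candidate points and the filters that remove them are made up. It does not call or reproduce the NIST software.

```python
from itertools import product

NUM_FILTERS = 9

def all_filter_settings(num_filters=NUM_FILTERS):
    """Every on/off combination of the filters: 2 ** num_filters tuples of booleans."""
    return list(product((False, True), repeat=num_filters))

def survival_fraction(point, detect, settings):
    """Fraction of filter settings under which `point` is still reported."""
    survived = sum(1 for setting in settings if point in detect(setting))
    return survived / len(settings)

# Hypothetical stand-in for one detector run: each enabled filter
# removes a fixed set of candidate points.
CANDIDATES = {"p1", "p2", "p3"}
REMOVED_BY_FILTER = {0: {"p2"}, 4: {"p2", "p3"}}  # unlisted filters remove nothing

def toy_detect(setting):
    kept = set(CANDIDATES)
    for index, enabled in enumerate(setting):
        if enabled:
            kept -= REMOVED_BY_FILTER.get(index, set())
    return kept

settings = all_filter_settings()
scores = {p: survival_fraction(p, toy_detect, settings) for p in sorted(CANDIDATES)}
print(len(settings), scores)
```

A point that survives every setting gets a score of 1.0, which corresponds to the points described above as not modified in any way by the processing.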

So, just our intuitive idea of how these points should behave in the presence of different filtering in this case would say that we should give these points high scores, these particular eight points high scores, whereas, in one case, the NBIS said no, it's a low score based on image contrast and proximity of other points and so forth. And so, at least in a couple of cases, we found where our initial thoughts differed from what the NBIS automatic score produced, so that was one study.

Partly based on these ideas and also based on our interviews with some human examiners, we wanted to study ridge detection more thoroughly, and so we went back to that paper by Maio and Maltoni. And, as I said, my student implemented that approach and detected all the colorful ridges that you see on the right from the software he implemented. So taking the image on the left, he can automatically detect the ridges on the right, and if we want, we can overlay them to see how well they match the original image.

And based on that, he has developed some early results. He's implemented a matching approach based on complete ridges, and so this is fairly hot off the press. Just this week — well, late last week — he said, “Here's what I got so far.” So, out of 20 ridges … out of 26 ridges he automatically detected in the left image, his software fairly quickly detected 20 correct matches on the right image, and this is using a fairly straightforward cross-correlation approach. Because of time, I won't go into great detail on that.
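Since the details of the ridge matching are not given, the following is only a rough sketch of what a cross-correlation approach could look like. It assumes each ridge has been reduced to a one-dimensional sequence of samples (for example, positions along the ridge), which is an assumption made here for illustration; the function names and the 0.9 threshold are also hypothetical.

```python
def normalized_correlation(a, b):
    """Pearson-style normalized cross-correlation of two equal-length sequences."""
    count = len(a)
    mean_a = sum(a) / count
    mean_b = sum(b) / count
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    if denom == 0.0:
        return 0.0  # a flat sequence carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom

def best_shifted_correlation(template, signal):
    """Slide `template` along `signal`; return (best score, offset)."""
    best = (-1.0, 0)
    for offset in range(len(signal) - len(template) + 1):
        window = signal[offset:offset + len(template)]
        score = normalized_correlation(template, window)
        if score > best[0]:
            best = (score, offset)
    return best

def match_ridges(left, right, threshold=0.9):
    """Pair each left ridge with its best-correlated right ridge, if good enough."""
    matches = {}
    for name, template in left.items():
        scored = [
            (best_shifted_correlation(template, signal)[0], other)
            for other, signal in right.items()
            if len(signal) >= len(template)
        ]
        if scored:
            score, other = max(scored)
            if score >= threshold:
                matches[name] = other
    return matches

# Hypothetical ridge profiles, for illustration only.
left = {"L1": [0, 1, 2, 3, 2, 1], "L2": [5, 5, 4, 2, 1, 1]}
right = {"R1": [9, 5, 5, 4, 2, 1, 1, 3], "R2": [0, 0, 1, 2, 3, 2, 1, 0]}
print(match_ridges(left, right))
```

Normalizing by the mean and energy of each window makes the score insensitive to overall brightness or offset, which is one common reason to prefer it over a raw correlation sum.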

Another area of interest deals with blurs, smears — you can use other names for this — and we're intrigued by the fact that some of this can be automated. We can synthetically do some blurring, and as soon as we can do it synthetically, we can generate lots of test cases and see how well some of the standard matching techniques will work.

So here, actually, is an ink print created by one of us just to show the direction we want to go in. So you can see progressively worse blurring from right to left in this case.

Synthetically, the same kind of blurring is fairly easy to generate, and we can argue later on about how closely this blurring matches the physical ink blurring, but with a fairly simple linear filter, we can do 5-, 10-, 15-pixel blurring on the right. And this is as if we've just pulled the finger downward on the page. That's all it's supposed to resemble.
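A simple linear filter of this kind might look like the sketch below. The exact kernel used in the study is not stated, so the uniform vertical averaging here is an assumption, as is the function name.

```python
import numpy as np


def vertical_motion_blur(image, length):
    """Smear a grayscale image downward by averaging `length` pixels per column.

    Each output pixel is the mean of itself and the `length - 1` pixels above
    it. The top edge is padded by repeating the first row.
    """
    image = np.asarray(image, dtype=float)
    padded = np.pad(image, ((length - 1, 0), (0, 0)), mode="edge")
    kernel = np.ones(length) / length
    return np.apply_along_axis(
        lambda column: np.convolve(column, kernel, mode="valid"), 0, padded
    )


# Test cases at increasing blur lengths, as in the 5-, 10-, 15-pixel example.
print_image = np.zeros((40, 20))
print_image[10, 5] = 1.0
blurred = {n: vertical_motion_blur(print_image, n) for n in (5, 10, 15)}
```

Because the blur is synthetic, any number of test images can be generated at any blur length and fed to a matcher.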

And based on this, one of our students took the blurred images, fed them into the NIST software to see how well does it produce … how well does it detect these minutiae points and then how well are matches performed. And in this particular study, in this one case, he found that up to, oh, about seven-pixel blurs or so, we were still getting very good matches, and so we found that to be a promising step. With some small amount of blur, we were getting just as good results as if there was no blur performed.

And that led us into other discussions of just what could we do not just with blur but with partial prints. What you see here is an upside-down plastic serving tray from Kmart or Walmart or somewhere, and it's got lots of ridges on it that lead to missing points, missing areas in the ink prints you see on the right. And so one thing we are debating among ourselves now is just what could we do to handle cases like this where there's lots of … where the prints are only partial in a very dramatic way.

And just the initial idea we have so far is to subdivide the image. Here, we happen to show a three-by-three subdivision of the image, as shown on the left, and then you see some of those subdivisions joined together going on the right. And the idea is to see, well, how many of these need to be joined in order to get a reasonable number of points that can be matched.
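One way to explore the subdivision idea is sketched below. It is a rough illustration with hypothetical function names: it counts minutiae per grid cell and asks how many cells would have to be joined to reach a target count. For simplicity it ignores whether the joined cells are adjacent, which a real study would need to consider.

```python
def cell_counts(points, width, height, grid=3):
    """Count (x, y) minutiae falling in each cell of a grid-by-grid subdivision."""
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        counts[row][col] += 1
    return counts


def cells_needed(counts, min_points):
    """Fewest cells, taken most populated first, that reach `min_points`.

    Returns None when the whole image has too few points.
    """
    flat = sorted((c for row in counts for c in row), reverse=True)
    total = 0
    for used, count in enumerate(flat, start=1):
        total += count
        if total >= min_points:
            return used
    return None
```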

And so there's some graphs here I'll skip because Gerry is tugging at my elbow, and so let me lead.

We were also doing some work modeling the shape of the finger, some database work.

Let me close with a couple of observations. We thrive on databases. To the extent people can provide or organizations can provide databases with some sort of ground truth, that is absolutely invaluable to this kind of work, and so we are very happy to have the NIST database SD27 available for purchase to us. So we've been making heavy use of that.

On the other hand, vendors have seemed fairly reluctant to give us any information at all. We decided we'd like to buy a live scan system. So I spent 10 minutes on Google to find out, well, what are some of the major vendors. I called them up … well, I had my student; I tried to delegate … had him call them up, and one person said, “Well, just send me your questions in email, and I'll get back to you.” Well, we're still waiting for her to get back to me, even after I sent email. And a vice president of the company called me and said, “Well, we'll give you that information really soon,” which we're still waiting for.

The other person we got did give us information, but I can see why he was a little hesitant because there are all kinds of unusual charges. They wanted to charge us $500 just to install software on our laptop, and even though I could … if I buy a very complicated software system from anywhere else, I could click the mouse a couple of times, and it installs somehow. I don't know why they need to do this.

They claim that one of the main reasons that they need to install the software is that there are so many different standards among different enforcement agencies, and so he said, “Well, there's 20 different standards.” I'm not sure what the number is, but he gave as his example, agency number one wants the name as John, space, Doe; the other person wants the name as Doe, comma, John. And all these slight inconsistencies among the way data are reported give them a great excuse to charge more for software. That's the way I see it.

And so there's a bunch of references here that I relied on in doing work so far. I'll be happy to talk more offline.

[Applause.]

Gerry LaPorte: Our next speaker is Dr. Sargur Srihari, and he is a distinguished professor in the Department of Computer Science and Engineering at the University at Buffalo, The State University of New York. Dr. Srihari is the founding director of the Center of Excellence for Document Analysis and Recognition. His work led to the first automated system for reading handwritten postal addresses in the world. An author of more than 300 papers, three books and seven U.S. patents, he served on the Committee on Identifying the Needs of the Forensic Science Community with the National Academy of Sciences. So he's one of those NAS members.

Sargur Srihari: Yeah. I was one of the quiet guys there.

[Laughter.]

LaPorte: Dr. Srihari is a fellow of the Institute of Electrical and Electronics Engineers and of the International Association for Pattern Recognition. And, unfortunately, me being a University of Michigan fan, Dr. Srihari is a distinguished alumnus of the Ohio State — I'm sorry — “The” Ohio State University College of Engineering.

[Applause.]

Srihari: Thank you, Gerry. I hate to sit down and talk, but I guess I'll try it.

So I will try and speak on different types of impression evidence than the previous talks on fingerprints. Of course, fingerprints, I think, probably are the most important of the impression evidences, but there are also these other areas, handwriting and footwear evidence, and I'll speak about the kind of research we are doing, particularly for the characterization of uncertainty. And, specifically, I'll talk about how does one go about characterizing the rarity, which is the kind of thing that the DNA evidence does. You know, people are able to say that the chance that this is matched is one in such a large number that it's got to be this person, that kind of thing, so can this be done with things like handwriting and with footwear, and so where do we begin?

My first slide here is just forensic modalities. We talk about impression evidence. I think for the forensic community, this is all quite well known. You know, this is just a tree describing impression evidence. It includes latent prints or the kind of forensic evidence we saw earlier on. It also includes handwriting and questioned documents, printers, footwear, tire tread, firearms, tool marks, et cetera. So many of these issues of characterizing uncertainty, I think, are common to all of these fields. But my talk is going to be largely about the handwriting and the shoeprint area now.

Now, why do we need uncertainty models at all? Why do we need to say how sure are we? Courts have long allowed testimony of certain individualization or exclusion or inconclusive. So it's possible for testimony to say that this is individualized; this is this person and no other, or exclusion as well. But, of course, as we have seen over the last many years, several exonerations have taken place and misidentifications and so on, so that caused a lot of concern, and among other things, that also led to the formation of the NAS committee, and the NAS committee recommendations include a study of these kinds of issues. So that is some of the background as to why academics should do some research on this issue of how to characterize uncertainty.

On the other hand, one can say there's nothing new about uncertainty. We heard today the lunch speaker talked about Benjamin Franklin. Here is another quote of Benjamin Franklin: nothing is certain but death and taxes. All right. So, also, we can say in forensic testimony, also, it is not certain; nothing is certain about individualization or exclusion. There's got to be some level of uncertainty about it, and it's better to be able to say that in some quantitative manner.

So this issue of expressing uncertainty itself has some uncertainty in it. How do you go about doing it? One can take measurements from evidence of the kind we talked about in fingerprints. I'll now talk after this slide about handwriting. One can take the measurements from that, and one can compute from that something called “rarity,” which is a joint probability of those exact features.

Let me see if I can use this. Yeah, rarity.

So this will be based on … you make some measurements as how unusual is this thing, how unusual is this piece of handwriting or how unusual is this particular structure of the fingerprint. So you can call that as a rarity.

Another thing is when you have a known and the evidence, one can compare the two together and say how similar are they, how sure do you think these two are one and the same, came from the same source, and we can call that “similarity uncertainty.”

Of course, one can say, “Well, you know, this could also be combined together and expressed as one level of uncertainty,” but there really are two different things here, the rarity of the basic structure itself and then what is the uncertainty in the comparison. So there are probability models for each of these things.

And similarity usually depends on a particular similarity measure. How do you measure that the two things are similar? What is the distance, in Euclidean space or whatever space? What is your measurement of similarity? With respect to the NIST software that was talked about earlier on, that is what is called a [inaudible] measure. So it gives you a particular score that says how similar are these two things.

And, of course, once you get into similarity, one can talk about likelihood ratios. What is the distribution of the similarity score when the two came from the same source, and what is it when the two came from different sources? One can use those two probability distributions to compute what's called a “likelihood ratio,” expressed as a ratio of the probability under the prosecution hypothesis to the probability under the defense hypothesis, and say that is the strength of the conviction here. Of course, one can model it in many statistical ways using what are called frequentist or Bayesian models and so on. All right.
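Written out, the likelihood ratio for an observed similarity score takes the form below. The notation is an assumption introduced here for illustration: s is the similarity score, H_p is the prosecution hypothesis (same source), and H_d is the defense hypothesis (different sources).

```latex
\mathrm{LR}(s) = \frac{p(s \mid H_p)}{p(s \mid H_d)}
```

A value above 1 means the score is more probable under the same-source distribution, and a value below 1 means it is more probable under the different-source distribution.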

So let me now get into how these kinds of ideas can be useful in an area such as handwriting, in handwriting comparison. Here, we have the classic handwriting comparison case. We have a known piece of handwriting. It comes from a database we created of handwriting samples, and then we also have here a questioned piece of handwriting. All right.

So, when comparing these two, we have letter shapes. We have shapes of pairs of letters we call “bigrams.” We have shapes of words and things like that. One could compare all these things to see how similar is the writing. One could also look at what are called “macro features” or pictorial features, looking at the spacing between the lines and the words and so on. So lots and lots of things a questioned document examiner uses to determine whether these two samples were written by the same person or not.

Well, we took on this issue. Ideas of expressing uncertainty in handwriting have been around, and the questioned document examiners have thought about this, but they really did not have the computational tools available. It's a massive task to be able to compute all the underlying probabilities to be able to say what is the probability of these two structures being one and the same.

So we first have to begin with what are the features of handwriting. So here is an example of a “bigram,” I call it. It's the letter pair "t h". So document examiners have thought about this — how do you describe this "t h"? So here is a table on the right-hand side.

So, you know, how do they characterize the shape of a t h? Height relationship of t to h: t is shorter, t is even, t is taller, or no set pattern. Shape of loop of h: retraced staff; curved right, straight left; curved left, straight right; both sides curved; no fixed pattern. Shape of arch of h, I guess it's this one here: rounded arch, pointed arch, no set pattern. Height of cross on t: in upper half, in lower half, cross about, no fixed pattern. Baseline of h: slanting upward, slanting downward, even, no set pattern. Shape of t: single stroke, looped, closed, mixture of shapes. This is the kind of thing a document examiner would look at when he or she looks at this "t h" and says that is the basic description.

And we chose here "t h" because "t h" happens to be the most commonly occurring pair of letters in the English language. Second most is "e r". Third is "o n," et cetera, et cetera. So "t h" is the most common one that occurs in a lot of writing, so we said let's look at t h. So it's necessary to compute these kinds of things.

It's very complicated to compute these things by just looking at this image. So we computed a bunch of things here, and we said, “Well, we can compute four different things: height relationship of t to h, presence of loops, t and h connected, slant of t and h.” And then we said, “Well, 47 percent of the time, the t is shorter; 22 percent of the time, t is even with h; and 29 percent of the time, t is taller than h.” Loops, 10 percent loop only in t, 11 percent of the time loop only in h, 6 percent of the time loop in both, 71 percent of the time no loops like that. This is the kind of thing.

How do you do this? By looking at a world of t h's extracted from a lot of writing and say, “How often do these things happen?” So, once you get that, you can calculate what's called as a joint probability. So you observe this particular t h, which consists of the four features, X1, X2, X3, X4. X1 could be height relationship; second, the presence of loops, et cetera.

Now, this is called a joint probability in probability theory, and once you have a joint probability, it can be expressed as a product here, which consists of what are called “conditional probabilities”: X1 given X2, X3, X4; X2 given X3, X4; X3 given X4; et cetera. It's pretty … you know, a long expression like that which is called a “joint probability.” So that's what needs to be computed.
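For the four features, the product described here is the standard chain rule of probability, which can be written as:

```latex
P(X_1, X_2, X_3, X_4)
  = P(X_1 \mid X_2, X_3, X_4)\,
    P(X_2 \mid X_3, X_4)\,
    P(X_3 \mid X_4)\,
    P(X_4)
```

Each factor is a conditional probability table, and the tables grow quickly with the number of features and the number of values each feature can take.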

Now, computing something like that is not easy. You're going to have to have lots of examples. Even for t h, you know, there are thousands of exemplars. Every one of them has to be analyzed, and you compute them. And then you need to express these kinds of things. It gets too complicated, and so, although the idea has been around, nobody has been able to compute these things before.

So, well, one can look at what are called as “independences.” This is basic probability theory. Are two features independent? If two features are — if X1, X2, X3, X4 are all independent, that probability calculation is very simple. Simply multiply all the four probabilities, and you have it. But that would be a very inaccurate calculation because they are dependent on each other, and you've got to determine which of these features are dependent on others. So we have to figure out all these conditional independences, and if you figure out that some of them are conditionally independent, you get a simple expression. So one could possibly compute that.
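The difference between the two calculations can be seen in a small sketch. The sample data below are invented purely for illustration and are not the frequencies from the study; the function names are also hypothetical.

```python
from collections import Counter


def joint_probability(samples, observation):
    """Relative frequency of the exact feature combination in the samples."""
    return Counter(samples)[observation] / len(samples)


def independent_product(samples, observation):
    """Product of per-feature frequencies, valid only if features are independent."""
    n = len(samples)
    p = 1.0
    for i, value in enumerate(observation):
        p *= sum(1 for s in samples if s[i] == value) / n
    return p


# Hypothetical (height relationship, loop) observations for ten t h's.
samples = (
    [("shorter", "no_loop")] * 5
    + [("taller", "loop")] * 3
    + [("shorter", "loop")]
    + [("taller", "no_loop")]
)

obs = ("taller", "loop")
print(joint_probability(samples, obs))    # frequency of the pair observed together
print(independent_product(samples, obs))  # what naive multiplication would give
```

When the two features tend to occur together, as in this made-up data, multiplying the separate frequencies understates how often the combination really occurs, which is why the dependences have to be worked out.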

OK. So the idea here would be we encounter a particular t h in a handwriting, and the forensic examiner can testify saying, “That is an extremely rare t h, and that is the particular probability that it occurs at all,” so one can make such a statement analogous to, say, DNA and so on.

So we went further here, conditional probability tables. We got to compute all these things, and so here, there are … one is a height relationship between t and h, presence of loops, presence of loops given the other variables and so on, so an extremely large number of tables have to be computed in order to have all those conditional probabilities to be able to figure out the final probability.

So, actually, we did this. We did this for the letter t h. We went about looking through 1,500 people's handwriting and extracted t h's, computed underlying probabilities of all of these things that a document examiner wishes that they knew, so that they could calculate it. That way, we are able to say here are several t h's.

So this particular t h here, as I said, has probability .0004, something like that. This particular one has t taller than h. Neither has a loop, t and h are connected, and the slant of both t and h is positive. This one, t is shorter than h, and so on. So we get different probabilities for each of these things.

So the starting point here, you can say, “How unusual is this particular t h?” And, of course, if we have much more handwriting available, we can now look at all of them together and say, “What is the particular probability associated with that?” So this style of calculation is called “rarity,” evaluating the rarity of this type of handwriting.

Now, to do this is an enormous task. You got to be able to take thousands of samples of t h, and that's not the only combination in the English language. You're going to have all the letters of the alphabet, all combinations and so on. So you'll have to look at all of these things. So it's infeasible to do it manually. So we've been developing computer programs to do these kinds of things. First of all, when you have handwriting samples, how do you associate the truth with it, are we looking at a t h here or an e r here and so on. This is called … this is an interface where you can type in the ground truth here. It matches it with the handwriting and associates the handwriting with the ground truth, so that you have all these pairs, and you can extract them all out. Here is the world of t h's now, and to that, we are to apply the feature extraction algorithms.

And the features are extremely different for everything. We saw what it was for t h. For an e r, it will be totally different set of features, so on, a very large number of such features. They all have been computed. So it's a fairly laborious computer programming task to be able to extract all these features so that you can apply it to a large database to extract the particular frequencies.

So that's about rarity computation. So the usefulness of this … it's simpler to say how unusual is something. The next one is how do you now characterize uncertainty. If I give you two letters e and then I ask you, “How similar are they and what is the strength of the opinion,” you got to capture their similarity in feature space.

We have some simple method here that we're using, and once you have the measurement and you have now two distributions that are corresponding to the similarity measure, when they came from the same, called as the “prosecution hypothesis,” and when they came from two different writers, it's called as the “defense hypothesis.” The prosecution and defense hypotheses are two probability distributions.

This is for the letter e. This is for slant in a writing, and those are two different distributions. So, once you have these distributions, one can … given any pair, saying I've got an e here, I've got an e here, what can you say about it? We can measure the similarity, and we can read off the two probabilities from these distributions and express that as a likelihood ratio.

So the computer program that actually does this kind of thing — here is a k; here is another k — and it computes not just the likelihood ratio but what's called as a “log likelihood ratio.” It takes the likelihood ratio; it takes the logarithm of it. If it's positive, then it says it sounds like it's the same writer. It's 2.145 here; it means similar case.

This k and this k … this one has a loop here. This one doesn't have a loop. This is minus 0.2894 saying these two k's probably were written by two different people. These two, more similar. These two is somewhat not similar, that kind of thing. So one can have a characterization of the likelihood ratios or the log likelihood ratios for all pairs of k's.

Now, one can accumulate all these types of information for every letter, every pairs of letters in the two documents, the known and the question, and come up with an overall log likelihood ratio. Of course, if you consider them to be independent, not independent, some subtle issues come about.
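A minimal sketch of this calculation follows. It assumes, only for illustration, that the similarity scores under each hypothesis are normally distributed and that the natural logarithm is used; the parameter values are hypothetical and are not taken from the talk.

```python
import math


def normal_log_pdf(x, mean, std):
    """Log density of a normal distribution at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def log_likelihood_ratio(score, same_writer, different_writer):
    """Natural-log likelihood ratio of one similarity score.

    Each distribution is a (mean, std) pair; normality is an assumption.
    """
    return normal_log_pdf(score, *same_writer) - normal_log_pdf(score, *different_writer)


def combined_llr(scores, same_writer, different_writer):
    """Sum per-character LLRs, which treats the comparisons as independent."""
    return sum(log_likelihood_ratio(s, same_writer, different_writer) for s in scores)


# Hypothetical (mean, std) of similarity scores under each hypothesis.
SAME = (0.8, 0.1)
DIFF = (0.4, 0.15)

print(log_likelihood_ratio(0.8, SAME, DIFF))  # positive: leans toward same writer
print(log_likelihood_ratio(0.4, SAME, DIFF))  # negative: leans toward different writers
```

Adding the log likelihood ratios is the same as multiplying the likelihood ratios, so the sum is only justified when the individual comparisons can be treated as independent, which is the subtle issue mentioned above.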

So, for example, here is one sample of handwriting. Here is another sample of handwriting, and the log likelihood ratio comes out here. That's 41.52. It's a large positive number, which gets mapped into the scale identified as same, highly probable, same. This is a nine-point scale recommended by the SWGDOC. The technical group suggests a nine-point scale for questioned document examiners in the U.S. It's a five-point scale in Europe and so on. So one could map these kinds of numbers into this kind of an opinion as to how sure one is about the writing.

This is again a computer program interface. Here are two handwriting samples displayed here, and these are the kinds of features that are being used, and this is another pop-up screen that says these two handwriting samples are being assigned to the category highly probable, they were not the same writer for this particular case.

OK. That's about questioned documents, and let me just wrap up the last five minutes with footwear comparison now. It's the same kind of thing in all of impression evidence. So you have a question. The terminology is a little bit different when you go from one type of impression evidence to the next. In handwriting, we call it “questioned documents.” In fingerprint, we call them “latent prints.” In the case of footwear, we call them “crime scene impressions.” Right?

This is the crime scene impression, and here are several known prints over here. So the task again is how rare is this particular pattern, and we haven't really looked at that much at this point. We look at the similarity here, like can you compute the similarity between these two and give us some kind of a strength of the match.

So we have to do a lot of preprocessing, image processing and so on. So we took on that shoeprints are manmade artifacts, objects. They're not natural things like fingerprints. They are made of geometrical primitives. We looked at lots of them. They seem to have lots of circles on them, ellipses on them, lines on them, et cetera. There seems to be all these primitives. So we said, “Let's detect all these primitives”; these become our minutiae or whatever for shoeprints.

So we detect circles, ellipses and lines, and then we — now what are we after here? So, once you got all these primitives, you got to have a measure. How do you say these two prints are from same, or how similar are these two prints? So, for that, we compute what's called as an “attribute relational graph,” which represents the whole shoeprint now as a graph of some form, and we have given two graphs now. One can have a graph distance measure. So that's the kind of thing we do, and we have defined what's called a footwear print distance.

So, given two footprints, I can say how similar are these two things. Given any two pieces of footwear, it's highly mathematical in nature, but, anyway, that's a little bit of formula to indicate to you. It's a complicated formula that's out.

So here, we have footprint distance values. Here is the crime scene print. Here are known prints, and we take that and say, “Well, what is the distance based on this?” For each of these, we calculate this value.

And why is this useful? You have a database. There are literally thousands of possible shoeprints in our shoeprint database. So, when you have a crime scene print, one could match against every one of them. It becomes a problem of image retrieval. So you can say, well, I find the closest match to be … let's say it's Nike here or an Air Force, you know, One or something like that. We can make it pull that up.
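The retrieval step can be sketched as a ranking by distance. The footwear print distance from the talk is not reproduced here; the example uses plain Euclidean distance on made-up feature vectors as a stand-in, and the model names are hypothetical.

```python
import math


def retrieve(query, database, distance, top_k=3):
    """Return the names of the `top_k` database prints closest to the query.

    `distance` is any function of two prints; smaller means more similar.
    """
    ranked = sorted(database.items(), key=lambda item: distance(query, item[1]))
    return [name for name, _ in ranked[:top_k]]


# Hypothetical feature vectors standing in for known shoeprints.
database = {
    "model_a": (0.0, 0.0),
    "model_b": (1.0, 1.0),
    "model_c": (5.0, 5.0),
}
crime_scene_print = (0.9, 1.2)

print(retrieve(crime_scene_print, database, math.dist, top_k=2))
```

With thousands of prints, comparing against every entry becomes slow, which is one reason for grouping the database into clusters first.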

And we can do things like cluster all these things. That's the kind of thing we have done. We have taken a large shoeprint database of thousands of prints, and we have clustered them into groups so that very quickly we can go and say, well, this particular crime scene print seems to belong to this particular cluster. And how do you cluster things? You need a measure of similarity between prints, and that mathematical formula that I showed earlier on does that.

And once you have this kind of thing, one can also get into the similarity distributions within class, between class. When you have crime scene prints and the knowns associated with it, one can have distributions, and one can compute things like probabilities under the prosecution hypothesis, probability under the defense hypothesis and so on.

Here is a crime scene print, and here are the known prints. For one of these, for the closest matching print, the shoeprint distance is .0605. The probability of the prosecution hypothesis that it was done by the same is .2950. So we now have a concrete probability associated with could it be the same.

All right. Summarizing. Nothing is certain, including forensic matching. Features and similarity measures are needed for computing uncertainty. So we begin with what are features and then how do you measure similarity between two objects, and then we now map into … we work in feature space to talk about rarity, how unusual is this particular combination of features, and then we work in similarity space. We look at similarities of pairs of things in the world, and then we say, “How similar … what is the uncertainty associated by means of a likelihood ratio for similarity?”

And then, once you have a likelihood ratio like that, you can then map it into just an understandable opinion. Instead of numbers, you would simply say it is likely that it was done by the same. Right?

And, in order to do all of this, there's an enormous amount of underlying probability extraction, how often do these features occur, and in order to do that, particularly in the case of handwriting, you need computer tools to be able to analyze large quantities of handwriting, to extract the relative frequencies of all of these things. So one can automatically say, “I see a particular structure,” and then it comes out and says, “Well, the probability of that structure is this.” So that's the kind of work we're trying to do.

Thank you very much.

[Applause.]

Date Published: June 1, 2010
