
Best Practices for Testing a Resume Parser


If you’re in the market for a resume parser, chances are you’re contacting a few vendors and testing their software. Here are some helpful tips on how to get the most out of your testing so you can choose the best option for your needs.

One of the most important tips we can give you is: Don’t use fake resumes or disguised data.

Sovren’s parser is designed to reject fake and disguised data such as “Employer 1,” “Anytown, USA” or phone numbers like “555-555-5555.” Our parser recognizes that this type of information is not real and won’t include it in your results. Other parsers may report “555-555-5555” as a valid phone number because it is correctly formatted. This could lead you to conclude, incorrectly, that our parser (or another) doesn’t recognize phone numbers, when in fact it rejects fake ones.

Avoid using a translator such as translate.google.com to translate a resume into another language for testing.

Think about contact information, for example. A UK address will not “translate” to a Belgian address and postal code. The result will be nonsense data. The same mistranslations and gibberish will affect many other important details on a resume.

Don’t accept vendor-supplied resumes for testing; use your own resumes.

Vendor-supplied resumes have been handpicked to parse accurately by the vendor’s own engine and poorly by other engines. If you want accurate results, use your own resumes to authentically test software.

However, don’t test your own resume or resumes of people you know very well.

If you are too close to a resume, you may not recognize that some of its flaws could be misinterpreted by others (or a parser). And even if a parser is 99% accurate, you may focus on the 1% because you are overly familiar with the person’s skills and experience. Use a pool of randomly selected applicants, not people you know, for the best results.
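One way to keep personal familiarity out of the test pool is to let a random number generator choose it. The snippet below is a minimal sketch, assuming you already have a list of applicant resume file paths. The function name `pick_test_pool`, the default pool size of 40, and the seed value are illustrative choices, not part of any vendor’s tooling.

```python
import random


def pick_test_pool(resume_paths, pool_size=40, seed=2024):
    """Return a reproducible random sample of resume paths.

    A fixed seed means every vendor is tested on the same pool.
    """
    rng = random.Random(seed)
    paths = list(resume_paths)
    # Never ask for more resumes than are available.
    pool_size = min(pool_size, len(paths))
    return rng.sample(paths, pool_size)


# Hypothetical file names, for illustration only.
all_applicants = [f"applicant_{i:04d}.pdf" for i in range(500)]
test_pool = pick_test_pool(all_applicants)
print(len(test_pool))
```

Fixing the seed is a design choice: it lets you rerun the same selection later, or hand the identical pool to each vendor you are comparing.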

Don’t test the odd, weird resumes.

Most software is designed for the rule, not the exception. It’s better for a parser to interpret the 99% of resumes that are standard accurately than to handle the rare exceptions. For instance, if you parse a graphically designed resume that is largely made up of images, most software will not parse those images. However, most candidates do not submit resumes in that format. Keep in mind that if you submit odd data, you will get odd results.

Test about 30-50 resumes per language or locale.

Testing only a handful of resumes is not statistically valid. You’ll want to test at least 30 for more representative results.

On the other hand, trying to test too many resumes (hundreds or thousands) is overkill. See the next tip for why you would want to limit your tests to a manageable number.
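A rough calculation shows why a handful of resumes tells you little, and why returns diminish beyond a few dozen. The sketch below uses the standard normal approximation for the margin of error of a measured proportion. The 90% accuracy figure is purely an illustrative assumption, and the approximation is crude for very small samples, so treat the numbers as indicative only.

```python
import math


def margin_of_error(accuracy, n, z=1.96):
    """Approximate 95% margin of error for an accuracy measured on n resumes.

    Uses the normal approximation to the binomial distribution.
    """
    return z * math.sqrt(accuracy * (1 - accuracy) / n)


# Assumed accuracy of 0.90, for illustration only.
for n in (5, 30, 50, 1000):
    print(n, round(margin_of_error(0.90, n), 3))
```

Under these assumptions the margin is roughly 26 percentage points at 5 resumes, 11 at 30, and 8 at 50. Going to 1,000 resumes narrows it to about 2 points, at the cost of far more manual checking.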

You should evaluate results individually, comparing parsed results to the actual resumes.

For instance, let’s say that Product A reports more phone numbers than Product B. It’s not accurate to immediately assume Product A is better. Unless you look at the resumes themselves, you won’t know whether Product B missed a valid candidate phone number, or whether Product A wrongly reported a National ID as a phone number, or a reference’s phone number as the candidate’s. When you are testing software, you must compare each result to the sample resume it came from to verify that it is accurate.
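The phone number scenario can be made concrete by scoring each product against values you have verified by reading the resumes yourself. The sketch below is a minimal illustration, not any vendor’s scoring method. The function `score_field`, the resume IDs, and the placeholder values are all hypothetical.

```python
def score_field(parsed, truth):
    """Compare parser output to hand-verified values for one field.

    parsed, truth: dicts mapping a resume id to a set of values.
    Returns (precision, recall).
    """
    tp = fp = fn = 0
    for resume_id in truth.keys() | parsed.keys():
        got = parsed.get(resume_id, set())
        want = truth.get(resume_id, set())
        tp += len(got & want)   # correctly reported values
        fp += len(got - want)   # reported, but not actually correct
        fn += len(want - got)   # present on the resume, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hand-verified candidate phone numbers (placeholders, not real data).
truth = {"r1": {"phone-A"}, "r2": {"phone-B"}}

# Product A reports more values, but two of them are wrong.
product_a = {"r1": {"phone-A", "national-id"}, "r2": {"phone-B", "reference-phone"}}
# Product B reports fewer values and misses one.
product_b = {"r1": {"phone-A"}, "r2": set()}

print(score_field(product_a, truth))  # (0.5, 1.0)
print(score_field(product_b, truth))  # (1.0, 0.5)
```

Product A “found” four phone numbers to Product B’s one, yet half of A’s values are wrong. A raw count would have hidden both A’s false positives and B’s miss.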

Don’t test resumes from just one source, industry or job type. Instead, test resumes from many sources that span many industries and classifications.

For instance, you may want to test how accurate a parser is with a list of skills. You will get a better analysis if you score skills across industries for comparison purposes. A particular vendor may specialize in one industry or area and you may not get the same results when testing a variety of industries or locales.

Don’t test only accuracy; also test completeness.

You may not need certain types of data today, but going forward, which is more likely: that you will need the same amount of data, or more? We’ve been in business since 1996, and the reason our parser reports much more data than any other product is that, over the years, our customers have identified a need for it.

Test scalability and robustness, too.

Accuracy is irrelevant if the product can’t scale, and scalability and cost are irrelevant if the product is not accurate. Don’t test just one resume at a time. How many resumes can the parser handle? Try processing 10 documents simultaneously. What happens? Does the system bog down or time out?
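A simple harness can answer the “what happens?” question with numbers. The sketch below submits documents simultaneously from a thread pool and records failures and latency. It assumes you supply your own `parse_fn`, a hypothetical callable that sends one document to the vendor’s parser and raises an exception on error or timeout. The 30-second slowness threshold is an arbitrary illustrative value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(parse_fn, documents, workers=10, slow_threshold_s=30.0):
    """Parse documents in parallel and summarize failures and latency."""

    def timed(doc):
        start = time.perf_counter()
        try:
            parse_fn(doc)
            ok = True
        except Exception:
            # Treat any error (including a client-side timeout) as a failure.
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(timed, documents))

    latencies = [elapsed for ok, elapsed in results if ok]
    return {
        "submitted": len(documents),
        "failed": sum(1 for ok, _ in results if not ok),
        "slow": sum(1 for elapsed in latencies if elapsed > slow_threshold_s),
        "max_latency_s": max(latencies, default=0.0),
    }


# Stand-in parser for demonstration; replace with a real call to the vendor.
def fake_parse(doc):
    if doc == "corrupt.pdf":
        raise ValueError("could not parse")
    time.sleep(0.01)


docs = [f"resume_{i}.pdf" for i in range(9)] + ["corrupt.pdf"]
report = run_concurrent(fake_parse, docs)
print(report)
```

Running the same harness with `workers=1` gives a baseline. If the maximum latency climbs sharply or failures appear once `workers` reaches 10, the system is struggling under concurrent load.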

Test configurability.

Assume that each resume must be parsed using a different set of configuration options and using a different skills taxonomy. How can that be accomplished using each vendor’s software? Can the software be configured on the fly to use a new configuration and taxonomy for each transaction with no additional setup or resource overhead? Or does the software require a separate, persistent server application instance to be configured in advance for each different scenario, or even, for each resume language?

Verify vendor claims.

All vendors claim to be the most accurate, and that’s clearly impossible, so ignore the claims and do your own testing. Some vendors hammer away at their scientific concepts, but are you buying academic technologies or real-world performance? Your customers are not shopping for algorithms; they just want the best performing product. Vendor emphasis on unverifiable concepts is always a smokescreen for real-world shortcomings.

Check with each vendor to ensure that you are setting up and running their software correctly.

While it may seem obvious, you’ll want to ensure that your results are not impacted by user error or software that has been implemented incorrectly.

Still having problems? Use Sovren's Resume Analyzer.

If you're still having issues with a specific resume, you can try our Resume Analyzer. You may have a corrupt file or other issue. Our Resume Analyzer helps both job seekers and recruiters optimize a resume for digital recruiting systems (or understand what’s missing, broken, or wrong with a resume, and correct it).

No one has been doing resume parsing longer than Sovren’s employees (who go back to the founding of the company in 1996), and there is no substitute for the experience we’ve gained over the decades. We’ve learned that no single approach to parsing (or searching, or matching) works best across all documents in all languages and all locales and all cultures. That’s why our products use more than 60 different parsing paradigms/strategies, and yet require no pre-training. Benefit from our years of experience by using our tips for testing resume parsing software, and you’ll find the best parser for your needs.
