Monday, November 14, 2011

AccessViolationException when using the tesseractdotnet .NET wrapper

I recently came across a very annoying issue while writing a web application that utilizes the TesseractDotNet wrapper for Tesseract. I was attempting to have the web application process OCR tasks in an automated task engine. Whenever I called the TesseractEngine.recognise(imgObject) method, I would get the following:

AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

The code that performs all task engine management and OCR functionality lives in a separate DLL. After several hours of frustration, I tried to narrow the problem down by creating a test console application and calling the DLL project's functionality from there (instead of from the web project).

Amazingly, it worked with no problems.

I then tried the same thing with a forms application: again, it worked without a problem.

It turns out the problem was that the training data file was never properly loaded. The TesseractEngine.init(string trainingDataPath, string language, int mode) method returns "true" if it successfully loads the training data and "false" otherwise. I had neglected to capture this value and verify that it was true. As it turns out, it wasn't.
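Here is a minimal sketch of the check I had skipped. The OcrSetup.InitOrThrow helper is hypothetical and written just for this post. It takes the init call as a delegate so that it does not depend on the wrapper's actual types. The only thing borrowed from the wrapper is the shape of the call described above (path, language, mode, returning a bool).

```csharp
using System;

public static class OcrSetup
{
    // Runs the supplied init call and throws right away if it reports failure,
    // so the problem surfaces here instead of as an AccessViolationException
    // during recognition later on.
    public static void InitOrThrow(
        Func<string, string, int, bool> init,
        string trainingDataPath,
        string language,
        int mode)
    {
        if (init == null) throw new ArgumentNullException("init");

        bool loaded = init(trainingDataPath, language, mode);
        if (!loaded)
        {
            throw new InvalidOperationException(
                "OCR engine failed to load training data from '" + trainingDataPath +
                "' for language '" + language + "'.");
        }
    }
}
```

Assuming an engine object whose init method has the signature above, the call would look something like OcrSetup.InitOrThrow(engine.init, path, "eng", mode). The exception message then tells you exactly which path failed to load.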

I was using a relative path in code for the location of the training data file. This relative path was accurate with respect to the console and WinForms applications' executables, but the web application's binaries live in a completely different place! In that case the relative path was wrong, and since I was not verifying the return value of the init() method, I was never notified of the problem. When I then called the recognize() method, it tried to work without having loaded the training files and failed.
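If you really must start from a relative path, one option is to resolve it against an explicit base directory and fail loudly when nothing is there. This is only a sketch: the TrainingDataLocator class and its method names are made up for illustration, and I am assuming the training data may be either a file or a folder.

```csharp
using System;
using System.IO;

public static class TrainingDataLocator
{
    // Resolves relativePath against an explicit base directory rather than
    // the process's current working directory, and verifies that the result
    // exists on disk (as either a file or a directory).
    public static string Resolve(string baseDirectory, string relativePath)
    {
        string fullPath = Path.GetFullPath(Path.Combine(baseDirectory, relativePath));
        if (!Directory.Exists(fullPath) && !File.Exists(fullPath))
        {
            throw new FileNotFoundException(
                "Training data not found at: " + fullPath, fullPath);
        }
        return fullPath;
    }

    // Convenience overload: resolve against the application's base directory.
    public static string ResolveFromAppBase(string relativePath)
    {
        return Resolve(AppDomain.CurrentDomain.BaseDirectory, relativePath);
    }
}
```

Keep in mind that the application base directory differs between host types (for a web application it is typically the site root rather than the folder holding the binaries), which is exactly why the existence check matters.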

The morals of this story are:
1) Don't use relative paths. You never know exactly where your web application will be deployed. Instead, add a key/value pair for the path to your training data in your web.config/app.config file and read the path from there.
2) Always, ALWAYS check the return values of your method calls. If a method doesn't return void, its return value should be checked for correctness. It's hell to debug something when you have assumed it is working "properly".
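
Both morals can be combined into one small helper that reads the path from configuration and refuses to continue if it is missing or relative. This is a sketch under assumptions: the OcrSettings class and the "TessDataPath" key name are hypothetical, so use whatever key you actually add to your appSettings section.

```csharp
using System;
using System.Collections.Specialized;
using System.IO;

public static class OcrSettings
{
    // Hypothetical key name; match it to the key in your <appSettings>.
    public const string TrainingDataKey = "TessDataPath";

    // Reads the training data path from the given settings collection and
    // rejects missing, empty, or relative values up front.
    public static string GetTrainingDataPath(NameValueCollection settings)
    {
        if (settings == null) throw new ArgumentNullException("settings");

        string path = settings[TrainingDataKey];
        if (string.IsNullOrWhiteSpace(path))
        {
            throw new InvalidOperationException(
                "Missing appSettings key '" + TrainingDataKey + "'.");
        }
        if (!Path.IsPathRooted(path))
        {
            throw new InvalidOperationException(
                "'" + TrainingDataKey + "' must be an absolute path, got: " + path);
        }
        return path;
    }
}
```

In a project that references System.Configuration, this would be called as OcrSettings.GetTrainingDataPath(ConfigurationManager.AppSettings), since AppSettings is a NameValueCollection. Taking the collection as a parameter also makes the helper easy to exercise from a test console application.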