MT-Recognition - recognition of mathematical equations. Desktop version
Desktop version of MTRecognition
The Desktop version recognizes ONLY equations. This is an advantage rather than a limitation: equations are recognized much better, and you choose which equations become MathType equations and which are left to be recognized as text by another recognition system, such as FineReader.
Let's briefly consider the sequence of steps for recognizing a PDF document with equations.
- Create a folder where all your work files will be located.
- Copy MTRecognition.exe and the PDF file with the text (with equations) into this folder. You can download MTRecognition.exe for free on this page - see below.
- Inside this folder, create a subfolder for the page images, for example Images.
- Open the PDF in Adobe Acrobat Pro and export its pages as JPEG images (File -> Export to -> Images -> JPEG). In the dialog box that opens, you can click the SETTINGS button to set a satisfactory image quality.
- Run MTRecognition.exe, click the Settings button, and enter your access key and email. A free access key for recognizing 100 equations can be obtained by contacting email@example.com. If you need to recognize more equations, you will need to buy a pack of 1000 equations (see the PRICING tab).
- Click the Open Image button and select an image from the Images folder. The image will open in the program.
- Use the slider at the top of the program window to adjust the scale of the image (or use the mouse wheel).
- Press and hold the left mouse button, drag around the equation, and release the button. The equation will be outlined with a dotted line.
- Press R or CTRL+R, or click the RECOGNIZE button. The equation will be recognized, and a tag such as #115# will appear on the image in place of the equation (115 is the number of the equation).
- If you select something in the image in the same way, for example, a picture, and then press CTRL+DEL, the circled part of the image will be cleared. You can remove unnecessary drawings, parts of the text, or dirty spots this way.
- The font size of the #115# tag should be about the same as that of the other characters on the page. To adjust it, click the SETTINGS button and increase or decrease the value in the FSIZE field. ONLY digits are allowed.
- Save the image: click the SAVE IMG button or press S or CTRL+S.
- The number of equations you can still recognize with your key is shown in large green digits after the EXIT button.
- After each recognition, an information window opens. It reports whether the equation was recognized, shows error messages, and tells you how many equations remain available for recognition. You can show or hide this window by clicking the SETTINGS button and checking or unchecking the SHOW ALERT WINDOW box.
- Process all the images exported from the PDF in the same way.
- Now navigate to the folder that you created for this work. Notice that a new folder, img_tmp, has appeared. It contains image files of the equations (for example, img_115.jpg). These are the same images that were sent for recognition. It also contains files with the HST extension (for example, img_115.hst). They are plain text files that contain the recognized equations in TeX format. In addition, the MTRec_.log file appears in the folder next to MTRecognition.exe at runtime. It contains the recognition history.
- Open the Images folder. A processed_images folder has appeared there. It contains the processed JPG files with tags such as #115# in place of the equations. Note that the files with the BAK extension are the same image files, saved automatically by the program. In other words, JPG files are written when you save the image yourself (the SAVE IMG step above), and BAK files are written automatically after each operation.
- Generate a PDF from the JPG files in the processed_images folder. To do so, open Adobe Acrobat Pro, choose File -> Create -> Combine files into a single PDF document, add all the JPGs, and click the COMBINE button (upper right). Save the resulting PDF. Recognize it in (for example) FineReader. Save the text in MS Word format.
- Open the recognized text in MS Word. Now each tag such as #115# must be replaced with the text from the corresponding file, img_115.hst.
- To do this, install the script (written in Visual Basic) for MS Word.
Let's look at two options for installing the script.
First option. Open Word and click View in the menu. In the panel that opens, click Macros and select Record Macro. A dialog box will open. The name of the macro does not matter, so leave the default name. The macro must be stored in Normal.dotm. Click OK to start recording. Immediately afterward, click Macros again and choose Stop Recording. Now click Macros once more and choose View Macros from the drop-down menu. Select the macro you just recorded and click Edit. The Microsoft Visual Basic window will open and show code like this: Sub Macro1() ... End Sub. Select everything from Sub to End Sub and paste the text of the WTRec_Replace macro in its place (see below).
Second option. Open Word and click the Developer tab in the menu. If you don't have one, follow the instructions on the Microsoft website on how to show the Developer tab in Word. Click Visual Basic. In the window on the left, find the Normal item, open it, open the Modules item, and click NewMacros. In the main window on the right, enter the text of the macro. Now, if you close the Microsoft Visual Basic window and open the Macros dialog (Alt+F8), you should see WTRec_Replace in the list of macros.
- Select all or some part of the text in Word where you want to make the required replacement and run the script in Word. To do this, press Alt+F8 and click WTRec_Replace.
- A dialog box for selecting the .hst files will open. Locate the img_tmp folder, select ALL the .hst files, and click OK to run the script.
VERY IMPORTANT! BE SURE your text doesn't contain any $ character. If it does, remove it or replace it with another character or combination of characters; otherwise it will not be possible to convert TeX to MathType properly.
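The check for $ characters can also be done outside Word. Below is a minimal sketch, assuming you have saved a copy of the recognized text from Word as a plain text file. The function name lines_with_dollar and the sample text are hypothetical and are not part of MTRecognition.

```python
def lines_with_dollar(text):
    """Return (line_number, line) pairs for every line that contains '$'."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if "$" in line
    ]


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the text file exported from Word.
    sample = "Price: $5\nNo issue here\nSee equation #115#"
    for number, line in lines_with_dollar(sample):
        print(number, line)
```

Every reported line has to be fixed in the Word document itself before you run WTRec_Replace.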
- Check once more that all tags such as #115# have been replaced, that is, search the text for the # character. If any tag is left, figure out why. Perhaps only the first # was recognized and the second was not (then add it), or the digits were not recognized correctly (then type them in). After correcting the tags, repeat the two preceding steps: run WTRec_Replace and select the .hst files.
- Now we have a recognized text with tables and with TeX code in place of the equations. Select all the text or, if it contains many equations, a small part of it, and run the command from the MS Word menu: MathType -> Toggle TeX. If the selection contains too many equations, nothing will happen. In that case, select less text and run Toggle TeX again. This way you will get recognized text with equations.
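The search for leftover tags in the steps above can be sketched in code. This is only an illustration, assuming a plain text copy of the document saved after running the macro and before Toggle TeX. The names TAG, FRAGMENT and leftover_tags are hypothetical, and the 8-character limit for a damaged fragment is an arbitrary choice.

```python
import re

TAG = re.compile(r"#(\d+)#")        # complete tag, for example #115#
FRAGMENT = re.compile(r"#\S{0,8}")  # a '#' plus up to 8 following non-space characters


def leftover_tags(text):
    """Return (complete, damaged).

    complete: numbers of well-formed tags that are still in the text
    damaged:  '#' fragments that are not well-formed tags, such as '#11' or '#l17#'
    """
    complete = TAG.findall(text)
    remainder = TAG.sub(" ", text)  # remove well-formed tags before looking for fragments
    damaged = FRAGMENT.findall(remainder)
    return complete, damaged
```

A complete tag that is still present usually means its .hst file was not selected. A damaged fragment has to be corrected by hand in Word before the macro is run again.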
You can download MTRecognition.exe by clicking on
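Sub WTRec_Replace()
    ' Replaces every #N# tag in the selected text with the contents of img_N.hst.
    ' DataObject needs a reference to the Microsoft Forms 2.0 Object Library
    ' (Tools -> References in the Visual Basic editor).
    Const ForReading = 1
    Dim iPath, iNumber, iContent, toFind
    Dim ClbData As DataObject
    Dim startPos As Long, endPos As Long, Length As Long
    iPath = "C:" 'You can change this path to your own, for example Disk:\path\img_tmp\
    Dim fd As FileDialog
    Dim fs, f
    Set fd = Application.FileDialog(msoFileDialogFilePicker)
    Set fs = CreateObject("Scripting.FileSystemObject")
    Dim vrtSelectedItem As Variant
    With fd
        .AllowMultiSelect = True
        .InitialFileName = iPath
        .Filters.Add "Text", "*.hst", 1
        If .Show = -1 Then 'the user confirmed the file selection
            For Each vrtSelectedItem In .SelectedItems
                ' Extract the equation number from a path ending in img_115.hst
                startPos = InStr(1, vrtSelectedItem, "img_")
                endPos = InStr(1, vrtSelectedItem, ".hst")
                Length = endPos - startPos - 4
                ' If the first "img_" belongs to the img_tmp\ folder name, skip past it
                If (endPos - startPos > 10) Then
                    Length = endPos - startPos - 12
                    startPos = startPos + 8
                End If
                iNumber = Mid(vrtSelectedItem, startPos + 4, Length)
                toFind = "#" & iNumber & "#"
                ' Read the recognized TeX from the file
                Set f = fs.OpenTextFile(vrtSelectedItem, ForReading, False)
                iContent = f.ReadAll
                f.Close
                ' Drop the first three characters and the last one, then pad with spaces
                iContent = Mid(iContent, 4, Len(iContent) - 4)
                iContent = " " + iContent + " "
                ' Put the TeX on the clipboard so that "^c" can insert it
                Set ClbData = New DataObject
                ClbData.SetText iContent
                ClbData.PutInClipboard
                ' Replace the tag within the text selected in Word
                With Selection.Find
                    .Text = toFind
                    .Replacement.Text = "^c"
                    .Execute Replace:=wdReplaceAll, Forward:=True, Wrap:=wdFindContinue
                End With
            Next vrtSelectedItem
        End If
    End With
    Set f = Nothing
    Set fd = Nothing
    Set fs = Nothing
End Sub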
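To make the logic of the macro easier to follow, here is a rough Python sketch of the same steps outside Word. It is an illustration under stated assumptions and does not replace the macro. The function names tag_for, replacement_text and replace_tags are hypothetical. The file name is matched with a regular expression instead of the position arithmetic used in the macro. The file is read byte for byte as latin-1, which only approximates how the macro reads it. The text is a plain string, not a Word document.

```python
import re
from pathlib import Path


def tag_for(hst_path):
    """Return the tag (for example '#115#') for a path ending in img_115.hst, else None."""
    match = re.search(r"img_(\d+)\.hst$", str(hst_path))
    return "#" + match.group(1) + "#" if match else None


def replacement_text(raw):
    """Mirror the macro: drop the first three characters and the last one, pad with spaces."""
    return " " + raw[3:-1] + " "


def replace_tags(text, hst_folder):
    """Replace every tag in text that has a matching .hst file in hst_folder."""
    for path in sorted(Path(hst_folder).glob("*.hst")):
        tag = tag_for(path)
        if tag is None:
            continue  # not an img_N.hst file
        # Assumption: one byte per character, similar to the macro's default text mode.
        raw = path.read_bytes().decode("latin-1")
        text = text.replace(tag, replacement_text(raw))
    return text
```

Matching the number at the end of the path avoids the special case that the macro needs for the "img_" inside the img_tmp folder name. Tags without a matching file are left untouched, so they can still be found by searching for #.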