Scenario – screen scrape in seconds

Screen scrape in seconds with the Textract Scenario utility! Textract Scenario is an extension to the basic Textract API. Technically, Scenario consists of Scenario Editor and the TextrextScenarioExec() function. Scenario Editor is a GUI program that you use to create prescriptions, called Scenarios and Scenario Groups, for the TextrextScenarioExec() function.

The basic Textract text capture library is exposed through the Textract API (Textract.DLL and the TxtrOCX.dll OCX control). It can screen scrape by window handle (HWND) or by absolute coordinates. As a result, any change in window position, such as moving or resizing the window, or reopening the application, can prevent Textract from capturing the proper area.

Capture: A capture is a description of a window search method that is not based on the HWND window handle or on a rectangular position on the screen. A rectangular capture is specified visually, with the mouse, from within Scenario Editor while the application you want to screen scrape is running. The application window can then be moved around and the capture tracks it. A capture uses various methods to screen scrape windows independently of their position. It is also unaffected by application reopening, window maximizing, and child windows.

The capture search function is based on analysis of the hierarchical window tree. A window’s title and class name can be compared strictly, or a regular expression can be used as a template for them. The search function may also skip some windows in the hierarchical tree, or ignore a window’s title or class name during the search.
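The search rules above can be illustrated with a minimal Python sketch. It does not use the Textract API: the `Window` and `Step` classes and the `find_windows()` function are hypothetical names invented for this example, and the window tree is mocked in memory. It only shows the idea of walking a window hierarchy with strict or regular-expression matching, ignored attributes, and skipped levels.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Window:
    """Mock window node (hypothetical, not a Textract type)."""
    title: str
    class_name: str
    children: List["Window"] = field(default_factory=list)


@dataclass
class Step:
    """One level of a search path. None means 'ignore this attribute'."""
    title: Optional[str] = None
    class_name: Optional[str] = None
    use_regex: bool = False   # treat title/class_name as regular expressions
    any_depth: bool = False   # allow intermediate windows to be skipped


def _attr_matches(pattern: Optional[str], value: str, use_regex: bool) -> bool:
    if pattern is None:
        return True
    if use_regex:
        return re.fullmatch(pattern, value) is not None
    return pattern == value


def _candidates(parent: Window, any_depth: bool) -> Iterator[Window]:
    # Direct children only, or every descendant when levels may be skipped.
    for child in parent.children:
        yield child
        if any_depth:
            yield from _candidates(child, True)


def find_windows(root: Window, path: List[Step]) -> List[Window]:
    """Return the windows reached by following `path` down from `root`."""
    current = [root]
    for step in path:
        current = [
            cand
            for win in current
            for cand in _candidates(win, step.any_depth)
            if _attr_matches(step.title, cand.title, step.use_regex)
            and _attr_matches(step.class_name, cand.class_name, step.use_regex)
        ]
    return current


desktop = Window("", "Desktop", [
    Window("Invoice 1042 - Billing", "BillingMain", [
        Window("", "Panel", [Window("ACME Corp", "Edit")]),
    ]),
    Window("Calculator", "CalcFrame", [Window("0", "Edit")]),
])

path = [
    Step(title=r"Invoice \d+ - Billing", use_regex=True),  # class name ignored
    Step(class_name="Edit", any_depth=True),               # skips the Panel level
]
found = find_windows(desktop, path)
print([w.title for w in found])
```

Because the path describes the window by its place in the tree rather than by a handle or screen rectangle, the same description still applies after the mock application is moved or reopened.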

Anchor: Another problem that Scenario can solve is false capture. Several places on the screen may meet the capture specification, some of them incorrectly. The anchor is introduced to prevent such incorrect screen scrapes. An anchor is a visual screen element (bitmap) that must be present on the screen for the scenario to be executed. It can be a toolbar button, an icon, or some other window element.
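As a rough illustration of the anchor idea, the Python sketch below checks whether a small bitmap occurs inside a larger one before a capture is allowed to proceed. How Textract actually compares bitmaps is not described here, so exact pixel equality is an assumption, and `find_anchor()` and `may_execute()` are hypothetical names. Pixels are modeled as plain integers in nested lists.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # rows of pixel values (simplified model)


def find_anchor(screen: Bitmap, anchor: Bitmap) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first exact occurrence of anchor, or None."""
    screen_h, screen_w = len(screen), len(screen[0])
    anchor_h, anchor_w = len(anchor), len(anchor[0])
    for r in range(screen_h - anchor_h + 1):
        for c in range(screen_w - anchor_w + 1):
            # Compare the anchor row by row against the screen region at (r, c).
            if all(screen[r + i][c:c + anchor_w] == anchor[i]
                   for i in range(anchor_h)):
                return (r, c)
    return None


def may_execute(screen: Bitmap, anchor: Bitmap) -> bool:
    """A scenario runs only when its anchor is present on the screen."""
    return find_anchor(screen, anchor) is not None


screen = [
    [0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0],
    [0, 9, 7, 0, 0],
    [0, 0, 0, 0, 0],
]
toolbar_icon = [[7, 8], [9, 7]]

print(find_anchor(screen, toolbar_icon))
print(may_execute(screen, [[5]]))
```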

Scenario: An application that uses Textract may need several strings of captured text from one screen scrape, and the scenario is introduced for this purpose. A scenario includes one or more captures specified for various places on the screen. The scenario is the integral unit of screen scraping used in your application. The main Textract Scenario function, TextrextScenarioExec(), receives one or more scenarios and returns an array of named strings.

Terminal windows and Regexes: In some cases an application needs only part of a large captured text, and it may not be possible to know in advance where the required substrings are located in terms of rectangles inside windows. Screen scraping a terminal window is a typical example. The most appropriate solution in this case is to capture the whole content of the terminal window and then extract the required substrings. Regular expressions can be used to do that. They are widely known among programmers and are a good solution in many cases. Other approaches are text analysis with string functions (strstr(), etc.) or Yacc-like grammar tools. Any of these approaches can be implemented inside your application, but regular expressions are supported directly by Textract Scenario. To use another text-parsing tool, you capture all the text and then analyze it inside your application. It is recommended to try the regular expressions provided by Scenario first, since they are often the most straightforward solution.
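The "capture everything, then extract" approach can be sketched in Python with the standard `re` module. The terminal text, the field names, and the patterns below are made up for illustration. In a real setup the patterns would be configured in a scenario, or applied by your application to the captured text.

```python
import re
from typing import Dict

# Invented sample of text captured from a whole terminal window.
captured = """\
CUSTOMER INQUIRY                      PAGE 1
NAME: JANE DOE            ID: 004217
BALANCE:      1,250.75 USD
"""

# One pattern per named string; group 1 holds the value to extract.
PATTERNS = {
    "name": re.compile(r"NAME:\s*(.+?)\s{2,}"),
    "client_id": re.compile(r"ID:\s*(\d+)"),
    "balance": re.compile(r"BALANCE:\s*([\d,]+\.\d{2})"),
}


def extract_fields(text: str) -> Dict[str, str]:
    """Return the named substrings found in the captured text."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[key] = match.group(1)
    return result


fields = extract_fields(captured)
print(fields)
```

The patterns rely on labels such as `NAME:` rather than on column positions, so they keep working if the terminal layout shifts by a few characters.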

Scenario Type: Let’s consider a situation where your application has to screen scrape several third-party applications. For example, one of these applications may be a terminal window, another a dialog window, and a third an MDI application. All you may need from them are several strings: for example, client name, client ID, and some related fields (address, amount, diagnosis, comment…). The scenario type is introduced for this purpose: it defines the same set of results to be extracted from various source applications. A scenario of a given scenario type guarantees that TextrextScenarioExec() returns the strings defined for that scenario type. For example, all scenarios of type “ClientAddress” return client name and address strings, and your application does not need to care which specific scenario is in use. The only requirement is that the scenario is of the specified scenario type.
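The contract that a scenario type provides can be modeled with a short Python sketch. This is a conceptual illustration only: `ScenarioType`, `Scenario`, and `execute()` are hypothetical names, and each capture is replaced by a stub function that returns fixed strings. The point is that the calling code depends on the type's field names, not on the source application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class ScenarioType:
    """Names of the strings every scenario of this type must return."""
    name: str
    required_fields: FrozenSet[str]


@dataclass
class Scenario:
    name: str
    scenario_type: ScenarioType
    capture: Callable[[], Dict[str, str]]  # stub standing in for a real capture


def execute(scenario: Scenario) -> Dict[str, str]:
    """Run a scenario and enforce the contract of its scenario type."""
    result = scenario.capture()
    required = scenario.scenario_type.required_fields
    missing = required - set(result)
    if missing:
        raise ValueError(f"{scenario.name}: missing {sorted(missing)}")
    # Return only the fields promised by the type.
    return {key: result[key] for key in required}


CLIENT_ADDRESS = ScenarioType("ClientAddress", frozenset({"name", "address"}))

# Two different text sources, one scenario type.
terminal = Scenario(
    "LegacyTerminal", CLIENT_ADDRESS,
    lambda: {"name": "Jane Doe", "address": "12 Elm St", "extra": "ignored"},
)
dialog = Scenario(
    "CrmDialog", CLIENT_ADDRESS,
    lambda: {"name": "John Roe", "address": "7 Oak Ave"},
)


def show_client(scenario: Scenario) -> str:
    # The application only knows about the ClientAddress fields.
    client = execute(scenario)
    return f"{client['name']}, {client['address']}"


print(show_client(terminal))
print(show_client(dialog))
```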

New scenarios can be created with Scenario Editor after your application is finished and even deployed. A new scenario can then be assigned to your application without recompiling it.

The Role of Technician. Besides the end user and the programmer, one more role may be involved: the technician. The idea is that a technician, who is not a programmer, can create a new scenario in order to screen scrape text from applications that are not known in advance. We call this role technician, but various people may perform it: somebody who goes to the client site for a system installation, an advanced user, a salesperson, or a programmer. This way your application is decoupled from the text source applications and can cover new usage cases without extra programming effort. The technician uses Scenario Editor as a tool to bind your application to the new text source.

Scenario Groups can be used to automatically recognize the situation on the screen and select the appropriate scenario type. Let us assume that the user activates your application with a hot key or a system tray icon (this functionality is provided by the HotIcon class from the Textrext extension) when they decide that the situation on the screen is appropriate. For example, a teller can start cash dispensing or the workday totalling, both activated by the same hot key. The task of your application is to determine the action from the screen content. Textract Scenario provides the capability to screen scrape and to OCR the text. Several scenarios of different scenario types are combined into a scenario group. Your application passes the scenario group to TextrextScenarioExec(), and the appropriate scenario is selected and executed. The type of the executed scenario is returned along with the captured strings, so that your application can decide on the action based on the captured text.
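The teller example can be sketched in Python as follows. This is not the Textract API: `GroupScenario`, `exec_group()`, and `on_hot_key()` are hypothetical names. The screen is modeled as plain text, and each anchor is a substring standing in for a bitmap. The sketch tries the scenarios of a group in order and returns the type of the first one that applies, together with its named strings.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class GroupScenario:
    type_name: str        # scenario type reported back to the application
    anchor: str           # substring standing in for an anchor bitmap
    pattern: re.Pattern   # named groups become the returned strings


def exec_group(group: List[GroupScenario],
               screen_text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (scenario type, named strings) for the first matching scenario."""
    for scenario in group:
        if scenario.anchor not in screen_text:
            continue  # anchor absent: this scenario must not run
        match = scenario.pattern.search(screen_text)
        if match:
            return scenario.type_name, match.groupdict()
    return None


TELLER_GROUP = [
    GroupScenario("CashDispense", "WITHDRAWAL",
                  re.compile(r"AMOUNT:\s*(?P<amount>[\d.]+)")),
    GroupScenario("DayTotal", "END OF DAY",
                  re.compile(r"TOTAL:\s*(?P<total>[\d.]+)")),
]


def on_hot_key(screen_text: str) -> str:
    """Same hot key, different action depending on what is on the screen."""
    outcome = exec_group(TELLER_GROUP, screen_text)
    if outcome is None:
        return "nothing to do"
    type_name, fields = outcome
    if type_name == "CashDispense":
        return f"dispense {fields['amount']}"
    return f"close day with total {fields['total']}"


print(on_hot_key("WITHDRAWAL\nAMOUNT: 200.00"))
print(on_hot_key("END OF DAY\nTOTAL: 15300.50"))
```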