Automaton is a function used for roman character-Kana conversion by the IIIMF language engine Wnn8LE (hereafter called just Wnn8LE) and similar managers. Automaton converts according to the contents set in a table (called a conversion table) to enable versatile conversion. This system provides an Automaton library with the same type of functions (the functions beginning with romkan_ in the Japanese language input library, libwnn) to enable a wide range of conversion programs.
Automaton performs three conversions in series (in order: preprocessing, main processing, and postprocessing) and outputs the final results. Each of the three conversions is handled according to its own conversion tables. Automaton also has a mode function. The mode can be changed to dynamically change the combinations of tables used in the three processing stages. Setting the mode and the switchover codes is performed using conversion tables.
Because the conversion tables are text files, they can be edited easily and you can easily switch to a different conversion table. Furthermore, after a conversion has been completed and until the next conversion is completed, BS (backspace) can be used to return to the previous status.
Although roman character-Kana conversion using Wnn8LE converts only from uppercase English characters into Hiragana, preprocessing and postprocessing can be used to handle various types of inputs and outputs. For example, preprocessing can be used to convert from lowercase English characters to uppercase English characters. Postprocessing can be used to convert from Hiragana to Katakana or from Hiragana to half-width Katakana.
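The three stages described above can be illustrated with a minimal Python sketch. It is not the Automaton implementation: the table contents, the function names, and the katakana flag are assumptions made for illustration, and the main stage uses a simple greedy lookup rather than the buffered matching described later.

```python
# Toy main-processing table: uppercase romaji to Hiragana (illustrative subset).
MAIN_TABLE = {"KA": "か", "KI": "き", "KU": "く", "A": "あ"}

def preprocess(text):
    """Preprocessing: lowercase English characters become uppercase."""
    return "".join(c.upper() if "a" <= c <= "z" else c for c in text)

def main_process(text):
    """Main processing: greedy lookup, trying two characters before one."""
    out, i = [], 0
    while i < len(text):
        for size in (2, 1):
            chunk = text[i:i + size]
            if chunk in MAIN_TABLE:
                out.append(MAIN_TABLE[chunk])
                i += len(chunk)
                break
        else:
            # No entry matched: the character is output unaltered.
            out.append(text[i])
            i += 1
    return "".join(out)

def postprocess(text):
    """Postprocessing: Hiragana (U+3041..U+3096) shifted to Katakana."""
    return "".join(
        chr(ord(c) + 0x60) if 0x3041 <= ord(c) <= 0x3096 else c for c in text
    )

def convert(text, katakana=False):
    """Run the stages in series; the flag stands in for a mode setting."""
    result = main_process(preprocess(text))
    return postprocess(result) if katakana else result

print(convert("kakiku"))
print(convert("kakiku", katakana=True))
```

The point of the sketch is the division of labor: the main table only ever sees uppercase input, and the choice between Hiragana and Katakana output is made entirely outside it.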
Automaton proceeds with the operation as follows.
The following conversion tables are used by Automaton.
Declares the mode and the correspondence tables to use. The file name is mode.
The correspondence table used for preprocessing. The file name begins with "1".
The correspondence table used for main processing. The file name begins with "2".
The correspondence table used for postprocessing. The file name begins with "3".
The mode definition table contains the mode declaration, the correspondence tables to be used for each mode, and table usage rules for them.
The correspondence tables contain lists of corresponding input codes and output codes. The correspondence tables are separated into those for preprocessing, main processing and postprocessing and any number of correspondence tables can be used for each of these.
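As a small illustration of the naming rule above, the following Python sketch groups correspondence table names by the stage their first character indicates. The function name, the stage labels, and the use of the base name when a path is given are assumptions for this example.

```python
import os

# First character of the file name -> processing stage.
STAGES = {"1": "preprocessing", "2": "main processing", "3": "postprocessing"}

def classify(names):
    """Group correspondence table names by stage, keeping their order."""
    groups = {stage: [] for stage in STAGES.values()}
    for name in names:
        base = os.path.basename(name)   # assumption: the prefix is on the file name
        stage = STAGES.get(base[:1])
        if stage is None:
            raise ValueError("not a correspondence table name: " + name)
        groups[stage].append(name)
    return groups

print(classify(["2A_CTRL", "1B_TOUPPER", "2B_ROMKANA", "3B_KATAKANA"]))
```

Any number of tables may fall into each group, which is why each stage maps to a list.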
Wnn8LE searches for the mode definition table in the following order.
The following table entries can be used.
The mode definition table contains the mode declaration, the correspondence tables to be used for each mode, the determination standards for them, and the mode display text strings.
The mode definition table consists of the following items (1), (2), (3), and (4). The remainder of a line is treated as a comment if a semicolon (;) appears at the beginning of the line or following spaces (including tabs) unless the semicolon is escaped.
1. Special Characters
2. Mode Declaration
defmode mode_name [initial_status]
The mode declaration is made before the mode is used.
3. Search Specifications for Correspondence Tables
search directory ......
path directory ......
4. Specifications for Correspondence Tables and Mode Display Text Strings.
File names for correspondence tables must begin with '1','2',or '3'. Path names can also be specified. Mode display text strings are text strings placed in quotation marks used to display the current mode.
Indicates the mode display text string when conversion is ON.
Indicates the mode display text string when conversion is ON.
Indicates the mode display text string when conversion is OFF.
Indicates that the same mode display text string as was used before the mode was changed is to be used when conversion is ON.
Indicates that the same mode display text string as was used before the mode was changed is to be used when conversion is OFF.
This text string is used by Wnn8LE to display the mode.
The Automaton library can read this text string using romkan_dispmode(), but only the last entry is valid if multiple mode display text strings are given for the mode in the mode definition table.
(2), (3) are used to change the correspondence table depending on specified conditions. If the condition in the if statement in (2) is true, then the specification in the if statement is referenced and the specification following the if statement is not referenced. If the condition is false, the if statement is exited and the specification following the if statement is referenced.
If the condition in the when statement in (3) is true, the specification in the statement is referenced. The specification following the when statement, however, is referenced regardless of whether or not the condition is true or false.
(2) or (3) can be used recursively to specify correspondence tables.
Any one of the following can be used for the conditional statement.
For example, when (defmode kana) and (defmode romajikana) are both in the mode definition, (and kana romajikana) is true when both modes are ON.
Also, (here conditional statements are represented by <1>, <2>, and <3>, and conversion table names are represented by A, B, and C) assume the following statement.
(when <1> A (if <2> B ) C ) (if <3> D ) E
Also assume that conditional statements <1>, <2>, and <3> have been met. Examine the statement from the beginning. First comes (when <1> A (if <2> B) C). Because <1> has been met, "A (if <2> B) C" is examined and table A is selected.
Next comes (if <2> B ) and <2> has been met so table B is selected. Because this is an if statement and the conditional statements have been met, the rest of the current series "A(if <2> B)C" need not be examined. Although this ends examination of "A(if <2> B)C," this series is contained in a when statement, so the remainder of "(when <1> A (if <2> B )C) (if <3> D) E" is examined.
The next portion is (if <3> D). Table D is selected because the conditional statement <3> has been met. Because this is an if statement, the rest of "(when <1> A (if <2> B ) C ) (if <3> D ) E" is not examined. The final result is the selection of tables A, B, and D.
Next, we'll use the mode definition table used by Wnn8LE as an example.
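The walk-through above can be reproduced with a short Python sketch. The nested-tuple representation and the names c1, c2, and c3 (standing for conditions <1>, <2>, and <3>) are assumptions for illustration. Only the rules stated in the text are modeled: a true if ends examination of the series that contains it, and a true when does not.

```python
def select(series, true_conditions, chosen):
    """Examine a series of items in order, appending selected tables."""
    for item in series:
        if isinstance(item, str):
            chosen.append(item)               # a plain table name is selected
            continue
        kind, condition, *body = item         # ("if" | "when", condition, items...)
        if condition in true_conditions:
            select(body, true_conditions, chosen)
            if kind == "if":
                return                        # rest of this series is skipped

# (when <1> A (if <2> B) C) (if <3> D) E
statement = [
    ("when", "c1", "A", ("if", "c2", "B"), "C"),
    ("if", "c3", "D"),
    "E",
]

chosen = []
select(statement, {"c1", "c2", "c3"}, chosen)
print(chosen)   # ['A', 'B', 'D']
```

With no condition met, the same statement selects only E, since neither the when nor the if is entered.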
Three modes are defined in the mode definition table. There are specifications for correspondence table and mode display text string to be used from 2A_CTRL to the end. This table is referenced each time the mode changes and the tables to be used are selected as described above.
(defmode romkan)
(defmode katakana)
(defmode zenkaku)
2A_CTRL
(if romkan 1B_TOUPPER 2B_ROMKANA 2B_JIS
    (if (not katakana) "[Ar]")
    (if zenkaku 3B_KATAKANA "[Ar]")
    3B_HANKATA "[AIr]") ; "A" and "I" are half-width Katakana.
2B_DAKUTEN
(if (not katakana) 1B_ZENHIRA
    (if zenkaku 3B_ZENKAKU "[A ]")
    "[AA]")
(if zenkaku 1B_ZENKATA 3B_ZENKAKU "[A ]")
"[AIA]" ; "A" and "I" are half-width Katakana.
Initially romkan, katakana, and zenkaku are all OFF. 2A_CTRL is selected as the table at this point. Because romkan is OFF, the following if statement is not referenced, and 2B_DAKUTEN is selected. The conditional statement for the next if statement, (not katakana), is true because katakana is OFF. The inside of the if statement is referenced and 1B_ZENHIRA is selected. Next the if statement inside the if statement is referenced. Because zenkaku is OFF, the conditional statement is false. The inner if statement is thus not referenced.
Next the mode display text string "[A[hiragana-A]]" is selected and the rest of the conversion table series is not examined.
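To check this trace mechanically, the sketch below parses the text form of the Wnn8LE mode definition table and evaluates it for a given set of modes that are ON. The tokenizer, the helper names, and the restriction of conditions to mode names, not, and and are assumptions of this example. Display strings are kept in the plain ASCII form used in the listing, so the Hiragana mode string appears as "[AA]".

```python
import re

TABLE = '''
(defmode romkan) (defmode katakana) (defmode zenkaku)
2A_CTRL
(if romkan 1B_TOUPPER 2B_ROMKANA 2B_JIS
    (if (not katakana) "[Ar]")
    (if zenkaku 3B_KATAKANA "[Ar]")
    3B_HANKATA "[AIr]") ; half-width Katakana
2B_DAKUTEN
(if (not katakana) 1B_ZENHIRA
    (if zenkaku 3B_ZENKAKU "[A ]")
    "[AA]")
(if zenkaku 1B_ZENKATA 3B_ZENKAKU "[A ]")
"[AIA]" ; half-width Katakana
'''

# Quoted strings, comments (discarded later), parentheses, bare names.
TOKEN = re.compile(r'"[^"]*"|;[^\n]*|[()]|[^\s()";]+')

def parse(text):
    tokens = [t for t in TOKEN.findall(text) if not t.startswith(";")]
    pos = 0
    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1                      # skip the closing parenthesis
        return items
    series = []
    while pos < len(tokens):
        series.append(read())
    return series

def holds(cond, on):
    if isinstance(cond, str):
        return cond in on
    if cond[0] == "not":
        return not holds(cond[1], on)
    if cond[0] == "and":
        return all(holds(c, on) for c in cond[1:])
    raise ValueError("condition not handled in this sketch: " + str(cond))

def select(series, on, chosen):
    for item in series:
        if isinstance(item, str):
            chosen.append(item)
        elif item[0] in ("if", "when") and holds(item[1], on):
            select(item[2:], on, chosen)
            if item[0] == "if":
                return                # a true if ends the current series

def select_tables(text, on):
    chosen = []
    select(parse(text), on, chosen)   # defmode entries select nothing
    return chosen

print(select_tables(TABLE, set()))
print(select_tables(TABLE, {"romkan"}))
```

Calling select_tables again after a mode changes mirrors the statement that the table is referenced each time the mode changes.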
The correspondence tables contain the conversion data (input codes and corresponding output codes) for preprocessing, main processing, and postprocessing.
Preprocessing and postprocessing serve supplemental roles to main processing. The following restrictions thus apply to preprocessing and postprocessing correspondence tables.
All lines in the correspondence table must contain one of the following items (1) to (3) or must be empty. Lines of this form are repeated to form the correspondence table.
Each entry must occupy no more than one line. The remainder of a line is treated as a comment if a semicolon (;) appears at the beginning of the line or following spaces (including tabs) unless the semicolon is escaped.
The output code or buffer remainder will be treated as a null string if omitted. Input codes, output codes, and buffer remainders must contain strings of the following without intervening spaces: forms that evaluate to characters and forms that evaluate to text strings.
Forms are considered to evaluate to characters or text strings if the form is replaced by the character or text string.
The following types of forms evaluate to characters.
(1) Character Notation
(2) (Function Name with Form that Evaluates to a Character)
(3) (Function Name with Two Forms that Evaluate to Characters)
(4) (Variable Names)
The following types of forms evaluate to text strings.
(1) Character Notation
"" The null string is indicated by omitting the character notation as follows:
(2) (Function Name with Form that Evaluates to a Character)
(3) (Function Name with Mode Name)
if and unless can only be entered for input codes. on, off, and switch can only be entered for output codes in the main processing table.
(4) (Function Names Only)
(defvar variable_notation (list character_notation......))
(defvar variable_notation (all))
Variables that can be used as forms that evaluate to characters and the range of the variables is defined. Declarations are made as shown above. Variable notations are given as variable names or as (variable_name......). Character notations are the same as for forms that evaluate to characters.
(toupper (tolower Y))
The following line, however, cannot be used because a form that evaluates to a text string is used as the argument to another function:
(toupper (tohankata [hiragana-KA]))
Variables can be used effectively when the same patterns appear many times in conversions, such as in the following example.
(defvar a1 (list K S T H Y R W G Z D B P))
(a1)(a1) [small tsu] (a1)
The above two lines achieve the same conversions as the following lines. Both show methods of handling assimilated sounds (Sokuon) in roman character-Kana conversion.
Variables are equivalent to the characters given as the range of the variable in the variable declaration (defvar).
(defvar a1 (list A B))
(a1)(tolower (a1)) 3
The text strings "Aa" and "Bb" will be converted to "3" in the above example and not to "Ab" and "Ba".
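One way to picture this behavior is to expand a rule containing a variable into one concrete rule per value in the variable's range, with every occurrence of the variable in the rule taking the same value. The Python sketch below does that. The tuple representation of forms and the helper names are assumptions for illustration, and the small tsu is written as the actual character.

```python
from itertools import product

FUNCS = {"toupper": str.upper, "tolower": str.lower}

def names_in(form):
    """Variable names used in a form: literal, ("var", n) or ("fn", f, form)."""
    if isinstance(form, str):
        return []
    if form[0] == "var":
        return [form[1]]
    return names_in(form[2])

def evaluate(form, binding):
    if isinstance(form, str):
        return form
    if form[0] == "var":
        return binding[form[1]]
    return FUNCS[form[1]](evaluate(form[2], binding))

def expand(rule, ranges):
    """rule is (input forms, output forms, buffer remainder forms)."""
    names = []
    for part in rule:
        for form in part:
            for name in names_in(form):
                if name not in names:
                    names.append(name)
    rows = []
    for combo in product(*(ranges[n] for n in names)):
        binding = dict(zip(names, combo))   # one value per variable per rule
        rows.append(tuple(
            "".join(evaluate(f, binding) for f in part) for part in rule
        ))
    return rows

A1 = ("var", "a1")
print(expand(([A1, ("fn", "tolower", A1)], ["3"], []), {"a1": ["A", "B"]}))
print(expand(([A1, A1], ["っ"], [A1]), {"a1": list("KST")}))
```

The first call yields rules for "Aa" and "Bb" only, and the second yields the Sokuon rules KK, SS, and TT, each leaving one consonant in the buffer.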
(defvar a1 (list a b))
(toupper (a1))(a1) 3
Here, it appears that the input "Aa" will be converted to "3", but first a match is attempted between "A" and (toupper (a1)). This is not possible because the argument of toupper is the variable a1, which does not yet have a value. A check is made for this type of setting when tables are read into the system.
(defvar a1 (list K S))
(defvar a2 (list a))
(a1)(a1) (a2) (a1)
The above programming is not correct because the variable a2 is not matched to an input code, but appears for an output code.
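The two restrictions illustrated above lend themselves to a load-time check. The following Python sketch is one possible form of such a check, not the check Automaton actually performs. The form representation and the error messages are assumptions.

```python
def variables(form, inside_fn=False):
    """Yield (name, inside_fn) for each variable used in a form."""
    if isinstance(form, str):
        return
    if form[0] == "var":
        yield form[1], inside_fn
    else:                                   # ("fn", function_name, argument)
        yield from variables(form[2], True)

def check_rule(inp, out=(), rem=()):
    """Return a list of problems found in one table entry."""
    bound, errors = set(), []
    for form in inp:                        # input code, left to right
        for name, nested in variables(form):
            if not nested:
                bound.add(name)             # a bare variable receives a value
            elif name not in bound:
                errors.append(name + " is a function argument before it has a value")
    for form in list(out) + list(rem):
        for name, _ in variables(form):
            if name not in bound:
                errors.append(name + " is not matched in the input code")
    return errors

A1, A2 = ("var", "a1"), ("var", "a2")
print(check_rule([A1, ("fn", "tolower", A1)], ["3"]))   # accepted: []
print(check_rule([("fn", "toupper", A1), A1], ["3"]))   # a1 has no value yet
print(check_rule([A1, A1], [A2], [A1]))                 # a2 only in the output
```

Running the check when tables are read, rather than during conversion, matches the statement that such settings are checked at load time.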
First, the code that is input is grouped into character units (characters with 2-byte codes are also treated as one character). This is called the input code. The input code goes through preprocessing, main processing, and postprocessing before the final output is produced. In preprocessing, each input code corresponds to one output code. The output code from preprocessing becomes the input code for main processing.
Input codes in the preprocessing table that is currently being used are examined in order from the beginning. When a match is found for the input code, the corresponding output code (i.e., the output code written on the same line as the input code) is output. If there is more than one table specified in the mode definition table, they are examined in the same order as they are listed in the mode definition table. If no matching input code is found in a table (including when no table is specified), the input code is output unaltered. This is the same for main processing and postprocessing as well.
In main processing, input codes are added to the buffer as long as a longer match might still be found among the input codes in the table (i.e., when the characters in the buffer so far match the beginning of some entry in the table). Each time another input code is added to the buffer, comparisons are made again in order from the beginning of the input codes listed in the main processing table. While the buffer could still match a longer entry in the table, the conversion is not finalized and more input is awaited.
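The lookup order for preprocessing can be sketched as follows in Python. The table contents and the function name are hypothetical. Each table is a list of (input code, output code) pairs in file order.

```python
def preprocess_code(tables, code):
    """Return the output code for one input code.

    Tables are examined in the order given, entries in order from the
    beginning, and the first match wins.
    """
    for table in tables:
        for src, dst in table:
            if src == code:
                return dst
    return code   # no match, or no table specified: output unaltered

to_upper = [("a", "A"), ("b", "B")]
extras = [("a", "X"), ("c", "C")]

print(preprocess_code([to_upper, extras], "a"))   # 'A', the earlier table wins
print(preprocess_code([to_upper, extras], "c"))   # 'C'
print(preprocess_code([to_upper, extras], "z"))   # 'z', passed through
```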
The code in the buffer is, however, output as nonfinalized characters to enable displaying and other processing. Codes for input errors and mode changes are also output.
These codes are differentiated from normal output codes and do not undergo postprocessing.
When the contents of the buffer match the longest possible input code in the table (if more than one match is made, then the first one in the table is used), the corresponding output code is output and if no buffer remainder has been specified, the part of the buffer that was matched is deleted from the buffer. If a buffer remainder was specified, it replaces the portion in the buffer that was matched and the above operation is repeated.
If no possibility of a match is found in the table, the first character in the buffer is output unaltered. If the output code for a matched input code is a function that changes the mode (on, off, switch, etc.), the correspondence table is changed according to the specifications in the mode definition table. The functions that change the mode should be placed in the tables where they are required regardless of the status of the modes. If a match is made for the input code corresponding to the function (restart), the mode definition table will be reread. However, the same file as the one for the previous mode definition table will be used. This function can be used to change to an edited version of the conversion tables (including the mode definition table) while the Automaton is running without having to stop the Automaton.
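The buffering behavior of main processing can be sketched in Python as below. This is a simplified reading of the description, not the Automaton source: rules are plain (input code, output code, buffer remainder) triples with hypothetical contents, a match means the whole buffer equals an input code, and mode-changing functions and nonfinalized output are left out.

```python
def feed(rules, text):
    """Feed input codes one at a time; return (output, pending buffer)."""
    out, buf = [], ""
    for ch in text:
        buf += ch
        while buf:
            # A longer entry could still match: wait for more input.
            if any(src != buf and src.startswith(buf) for src, _, _ in rules):
                break
            # First entry in table order whose input code equals the buffer.
            hit = next((r for r in rules if r[0] == buf), None)
            if hit is not None:
                out.append(hit[1])
                buf = hit[2]          # remainder (possibly empty) stays buffered
            else:
                out.append(buf[0])    # no possible match: pass one character on
                buf = buf[1:]
    return "".join(out), buf

RULES = [
    ("KK", "っ", "K"),   # Sokuon: output small tsu, keep one K in the buffer
    ("KA", "か", ""),
    ("A", "あ", ""),
]

print(feed(RULES, "KKA"))
print(feed(RULES, "XA"))
print(feed(RULES, "K"))
```

For the input KKA, the first two characters match the Sokuon rule, its remainder K goes back into the buffer, and the following A completes KA. A lone K stays in the buffer because a longer match is still possible.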
In postprocessing, more than one output code can be output for one input code as the final output. In all other ways, postprocessing is the same as preprocessing.
In the following example "ls -la carriage_return" is output when "Ls" or "LS" is input.
Main processing table