
fsexam

Section: User Commands (1)
Updated: 16 Apr 2007
 

NAME

fsexam - examine encoding of file name or content and convert to UTF-8  

SYNOPSIS

fsexamc [-a] [-b] [-d dry-run-result-file] [-E module-name] [-e encoding-list] [-F] [-f 'expression'] [-g history-length] [-H] [-k] [-L log-file] [-l] [-n] [-P] [-p] [-R] [-r] [-S] [-s] [-t] [-w]

fsexamc [-V]

fsexamc [-?]

fsexam [-a] [-b] [-E module-name] [-e encoding-list] [-F] [-f 'expression'] [-g history-length] [-H] [-k] [-L log-file] [-l] [-n] [-P] [-p] [-R] [-r] [-S] [-s] [-t] [-w]

fsexam [-V]

fsexam [-?]  

DESCRIPTION

The fsexam graphical user interface utility examines file names or file contents and tries to convert them from legacy encodings to UTF-8 using a given encoding list, the system default encoding list, or both.

fsexamc is invoked in the same way as fsexam, except that it is a command line interface utility.

When converting file names, fsexam processes the names of regular files, directories, and symbolic links by default. When converting file contents, it handles only regular plain text files by default. Use "-E module-name" to enable special file handling.

fsexam ignores most non-plain-text files, such as binary files, office document files, and image files. Forcing conversion of such files with the -F option might produce unexpected results. Internally, fsexam uses the file(1) utility to determine whether a file is a plain text file.

By default, fsexam converts file names. To convert file contents instead, specify the -t option.

To help find the best encoding, fsexam has encoding lists for supported languages. Each list includes the most popular codesets or encodings of the corresponding language. For example, fsexam specifies GB18030, BIG5, EUC-TW, and so on for Simplified Chinese. The list is used to generate conversion candidates. Use the "-e encoding-list" option to supply encodings in addition to the system pre-defined ones. If the -a option is specified, encodings suggested by the encoding auto-detection library are added to the encoding list for possible use. Encodings specified with the -e option have higher priority than automatically detected encodings.

OPTIONS

The following options are supported:

-a
--auto-detect
  

Enable encoding auto-detection. fsexam can guess the encodings of file names or file contents with the help of the encoding auto-detection library interfaces. Use this option when you do not know the encodings of files. Note that in file name conversions, statistics-based auto-detection may not be reliable because file names contain only a small number of characters.

-b
--batch
  

Batch mode, also known as non-interactive mode. In this mode, fsexam does not display candidates or wait for the user's selection or confirmation.

Make sure your terminal can display UTF-8 characters when using this option. Otherwise, illegible characters may be displayed.

-d dry-run-result-file
--dry-run-result dry-run-result-file
  

Specifies the dry run result file. When used with the -n option, the dry run result is stored in the file. When used without the -n option, fsexam converts according to the scenario in the supplied dry run result file.

The dry run result file is created if it does not exist. If it exists as a regular file, it is truncated to zero length and overwritten.

After fsexam creates a dry run result file, you can edit it and then feed it back to fsexam to perform conversions based on the edited content. Edit carefully and preserve the file format. The recommended edit is to delete any wrong or inappropriate candidates and make the right one the first candidate. For more information, refer to fsexam(4).

If the edited file does not conform to the file format described in fsexam(4), fsexam prints a warning message and quits without doing anything.

-E module-name
--enable-module module-name
  

Enable special file handling. Currently the only valid module name is "COMPRESS". "ALL" can be used to enable all available modules.

The COMPRESS module supports several popular compressed and archive file formats. Currently, the module supports the .tar, .tar.gz, .tar.bz2, .zip, and .tar.Z file formats. With the -t option, fsexam converts the contents of files that are archived, compressed, or both. Without -t, fsexam converts file names.

Note that the COMPRESS module ignores symbolic links inside archived or compressed files. It also ignores the -n option. The COMPRESS module handles archived or compressed files only if the -R option is specified. If the system has no suitable ISO8859-1 codeset locale, this option is not supported, as described in the NOTES section.

-e encoding-list
--encoding-list encoding-list
  

Specifies one or more colon or comma separated encodings to be used during conversion.

If neither this option nor -a is specified, fsexam uses the system pre-defined encoding list for the current locale.

If specified without the -a, -p, or -P options, the list of encodings supplied with the -e option replaces the system pre-defined encoding list for this session.

Use -p to prepend the list to the system pre-defined encoding list. Use -P to append it to the pre-defined encoding list. To make the encoding-list permanent, instead of only for the current session, use the -S option.

When used with -a option, fsexam will merge the supplied encoding list and auto-detected encoding list. Note that the supplied encoding-list here has higher priority than the auto-detected encodings.
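The replace, prepend, and append rules above can be sketched as follows. This is only an illustration in Python, not fsexam's implementation: the function name, the `mode` values, and the case-insensitive duplicate removal are hypothetical, and placing auto-detected encodings after the system pre-defined list is an assumption (the text above only states that the supplied list ranks higher than auto-detected encodings).

```python
def effective_encoding_list(system, user=None, mode="replace", detected=None):
    """Compose an ordered list of encodings to try (hypothetical helper).

    mode: "replace" (plain -e), "prepend" (-e with -p), "append" (-e with -P).
    """
    if mode not in ("replace", "prepend", "append"):
        raise ValueError("unknown mode: %r" % mode)
    system = list(system)
    user = list(user or [])
    if not user:
        merged = system            # no -e: system pre-defined list only
    elif mode == "prepend":
        merged = user + system     # -p: user list goes first
    elif mode == "append":
        merged = system + user     # -P: user list goes last
    else:
        merged = user              # plain -e: replaces the system list
    # Assumption: auto-detected encodings (-a) are tried last.
    merged += list(detected or [])
    # Drop duplicates, keeping the first (highest priority) occurrence.
    seen = set()
    result = []
    for enc in merged:
        key = enc.upper()
        if key not in seen:
            seen.add(key)
            result.append(enc)
    return result


print(effective_encoding_list(["GB18030", "GBK"], ["BIG5"], mode="prepend"))
```

Keeping the order explicit matters because, in non-interactive mode, the position of an encoding in the list decides which candidate wins.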

In non-interactive mode, the first encoding that successfully converts the file name or file content to UTF-8 is used. In interactive mode, fsexam displays all candidates that were successfully converted to UTF-8 from the encodings in the list. Encodings for which the conversion fails are not displayed in the list of candidates.
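The difference between the two modes can be illustrated with a minimal Python sketch. It is not fsexam's code: the function names are hypothetical, and the encoding names are Python codec names, which may differ from the names fsexam accepts.

```python
def conversion_candidates(raw, encodings):
    """Return (encoding, text) for every encoding that decodes raw cleanly."""
    candidates = []
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Conversion failed or the encoding is unknown: not a candidate.
            continue
        candidates.append((enc, text))
    return candidates


def batch_choice(raw, encodings):
    """Non-interactive behavior: take the first successful candidate."""
    candidates = conversion_candidates(raw, encodings)
    return candidates[0] if candidates else None


# A legacy-encoded name: two Chinese characters stored as GB18030 bytes.
raw_name = "\u6587\u4ef6".encode("gb18030")
print(batch_choice(raw_name, ["ascii", "gb18030"]))
```

An interactive front end would present the whole list returned by `conversion_candidates` and let the user pick, whereas batch mode silently takes the first entry.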

-F
--force-convert
 

Forcible conversion mode. By default, fsexam checks whether the file name or file content is already in UTF-8 and, if so, does not convert it. However, fsexam has no completely accurate way to determine whether a string is in UTF-8, so a byte sequence in a legacy encoding is sometimes treated as a valid UTF-8 string. For example, three Simplified Chinese characters in GB2312 (two bytes per character) could be treated as two valid UTF-8 characters (three bytes per character). Use this option to bypass the verification step and force the conversion.

Use this option with caution, and avoid combining it with -R whenever possible. It may convert file names or file contents that really are UTF-8 into unintended characters.
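The ambiguity described above is easy to reproduce. The following Python sketch uses a six-byte sequence chosen for illustration (it is not taken from fsexam) and Python's built-in "gb2312" and "utf-8" codecs to show that the same bytes decode without error under both encodings.

```python
# Six bytes: E4 B8 B0 E4 B8 B0
raw = "\u4e30\u4e30".encode("utf-8")

# Read as UTF-8: two characters of three bytes each.
as_utf8 = raw.decode("utf-8")

# Read as GB2312: three characters of two bytes each.
as_gb2312 = raw.decode("gb2312")

print(len(raw), len(as_utf8), len(as_gb2312))
```

Because both decodings succeed, a validity check alone cannot tell which interpretation the file's author intended, which is the situation -F is meant for.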

-f 'expression'
--find-expression 'expression'
  

Search for files according to 'expression'. The 'expression' here is a subset of the 'expression' used in find(1). Unlike find(1), however, the 'expression' must begin with the path name of the starting point in the directory hierarchy to search. After the path name, the following options and their combinations are valid: -name, -amin, -atime, -cmin, -ctime, -group, -mmin, -mtime, -user. Refer to find(1) for more information. Internally, fsexam uses find(1) to perform the search.

Enclose the whole expression in single quotes, because the shell may expand special characters in it if you use double quotes.

When this option is used, any other operands are ignored.

-g history-length
--set-history-length history-length
  

Set the history length. fsexam saves information about what it has done and uses that information to handle restore operations.

By default, fsexam saves history information for 100 fsexam executions as long as disk space permits. A single batch conversion counts as one. Use this option to change the default value.

If you change the length from a higher value to a lower value, the older history information is purged.

When the number of history entries reaches the limit, fsexam discards the oldest history information to make room for new history information.
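The purge and discard rules above behave like a bounded queue. The sketch below models them in Python with collections.deque. The History class is hypothetical and says nothing about how fsexam actually stores its history on disk.

```python
from collections import deque


class History:
    """Bounded history that drops the oldest entries first (illustrative)."""

    def __init__(self, length=100):
        self._entries = deque(maxlen=length)

    def record(self, entry):
        # When the deque is full, appending discards the oldest entry.
        self._entries.append(entry)

    def set_length(self, length):
        # Shrinking keeps only the most recent entries; older ones are purged.
        self._entries = deque(self._entries, maxlen=length)

    def entries(self):
        return list(self._entries)


history = History(length=3)
for run in ["run1", "run2", "run3", "run4", "run5"]:
    history.record(run)
print(history.entries())

history.set_length(2)
print(history.entries())
```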

-H
--hidden
  

Handles hidden files. Unless the option is specified, hidden files with names starting with a dot (.) will be ignored by default.

-k
--no-check-symlink-content
  

By default, during file name conversions, if both a symbolic link and its source belong to the user-supplied list of files or lie under a starting point of a directory hierarchy given as an operand, fsexam tries to keep them consistent. In other words, if a source name is converted, then not only the symbolic link itself (when applicable) but also the content of the symbolic link is converted. If a given source name is not converted for some reason, the corresponding symbolic link content is also not converted and a warning message is issued. If either one is not covered by the operands, fsexam may break the symbolic link.

This default symbolic link processing needs more resources and computation time, so the -k option is recommended to bypass it if you have no symbolic links.

During content conversions and dry run conversions, fsexam does not check symbolic link contents.

-l
--list-avail-encoding
  

List all available encodings supported by fsexam.

-L log-file
--log-file log-file
  

If specified, fsexam writes a log to log-file. By default, no log file is written.

The basic log file format is:


        (category) fullpath: message

The "category" values possible are "ERROR", "WARNING", and "INFO". The "fullpath" is the full path of file that is handled. The "message" briefly describes the operation result.

If the "fullpath" or the "message" contain non-UTF-8 characters, fsexam writes their hexadecimal byte values prefixed with "\x" such as "\xAE\x89" into the file.
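A log line in this format, including the "\x" escaping, can be approximated in Python as shown below. This is a sketch under assumptions: the function and error-handler names are hypothetical, and the parentheses around the category are assumed to be literal characters as in the format line above.

```python
import codecs


def _hex_escape(err):
    """Replace undecodable bytes with uppercase \\xHH escapes."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join("\\x%02X" % b for b in bad), err.end


# Register the handler under a made-up name for use with bytes.decode().
codecs.register_error("hex_upper", _hex_escape)


def format_log_line(category, fullpath, message):
    """Build '(category) fullpath: message' from raw byte strings."""
    path_text = fullpath.decode("utf-8", errors="hex_upper")
    message_text = message.decode("utf-8", errors="hex_upper")
    return "(%s) %s: %s" % (category, path_text, message_text)


print(format_log_line("WARNING", b"/tmp/\xae\x89.txt", b"not converted"))
```

Escaping at the byte level keeps the log file itself valid UTF-8 even when the paths it reports are not.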

-n
--dry-run
  

Dry run mode. With this mode, fsexam writes conversion information into the dry-run-result-file specified with -d option instead of actually performing the conversion on the file names or contents.

If used with -a option, the dry-run-result-file may get more candidates.

Note that compressed or archived files are not supported in this mode, and consistency between symbolic links and their sources is not maintained.

-P
--append-encoding-list
  

When used with -e option, fsexam appends the encoding-list to the system pre-defined encoding list. Otherwise, it has no effect.

-p
--prepend-encoding-list
  

When used with -e option, fsexam prepends the encoding-list to the system pre-defined encoding list. Otherwise, it has no effect.

-R
--recursive
  

Recursive mode. In this mode, fsexam recursively converts all applicable files and subdirectories under the directories specified as operands.

-r
--remote
  

With this option, fsexam handles files on NFS and other remote file systems. Without this option, fsexam handles files on local disks only.

While fsexam is running, mounting or unmounting file systems within a directory hierarchy that is being examined is not recommended.

-S
--save-encoding-list
  

By default, the encoding-list option argument of the -e option is used only for the current session. If this option is specified, fsexam makes the encoding-list permanent. This option may override the default, system pre-defined encoding list. To avoid that, use it with -p or -P to prepend or append, respectively.

-s
--restore
  

Restores file names to their original names. To restore file contents, specify with -t option.

This option is useful when you want to restore files to their last states in case wrong conversions have been made.

When this option is used on a file, fsexam restores its name or content. When used on a directory together with -R option, fsexam restores all files and subdirectories under the directory including the directory to their original names or contents.

-t
--conv-content
  

Converts file contents rather than file names. fsexam mainly handles plain text files only.

Internally, fsexam uses file(1) to determine whether a file is a plain text file or not.

If any files or directories contain multi-byte characters in their file names, convert the file names before converting contents. Otherwise, you may get illegible characters in your log-file or dry-run-result-file.

-w
--follow
  

If specified with -R, fsexam follows symbolic links to directories as if they were regular directories. If -R is not specified, fsexam tries to convert only the symbolic link and its source. If the source is itself a symbolic link, fsexam continues with the source's source, and so on. By default, fsexam does not follow symbolic links.

-V
--version
  

Print the version number of fsexam and halt.

-?
--help
  

Print usage information and halt.

 

OPERANDS

The following operand is supported:

pathname The pathname of a file or a directory to be converted. All arguments after "--" are treated as operands, even if they begin with a '-' character. If fsexam encounters '-' as an operand, or no operand at all, it reads pathnames from the standard input.

 

EXAMPLES

Example 1: Convert the name of a file

The following will convert the name of file "myfile" using the system pre-defined encoding list:

example% fsexam myfile

If there is no pre-defined encoding for the current locale, fsexam will exit with error messages.

Example 2: Recursively convert the names of files and subdirectories under the directory "mydir" with the given encoding list

example% fsexam -e GB18030:BIG5:EUC-TW --recursive mydir

Example 3: Dry run fsexam with auto-detected encoding

The following will scan the directory "mydir" and try to convert file and directory names under the directory with the system pre-defined plus auto-detected encodings to UTF-8 and store the result into the file, "mydryrunresult" without actually changing the names:

example% fsexam --auto-detect --dry-run -d mydryrunresult \
 --recursive mydir

Example 4: Perform scenario based conversions using a dry run result file

The following performs scenario-based conversions using the file "mydryrunresult". The first candidate for each file name is used. If there is no candidate, no action is taken on the file:

example% fsexam -d mydryrunresult

Example 5: Forcibly convert a file name

The following converts the name of the file "myfile" using the system pre-defined encodings even if fsexam determines that it is already in UTF-8. Use this option with caution, as it may corrupt file names and contents that are already UTF-8:

example% fsexam --force-convert myfile

Example 6: Convert files generated by other utility

The following two examples have the same effect: they convert the files listed by the find(1) command using the system pre-defined and auto-detected encodings:

example% /usr/bin/find . -name "*.txt" | fsexam --auto-detect

example% fsexam --auto-detect `/usr/bin/find . -name "*.txt"`

The following is similar to the two examples above, except that it uses the system pre-defined encodings only and the files are listed by the ls(1) utility:

example% /usr/bin/ls *.txt | fsexam

The following searches for all files ending with '.txt' under the current directory and converts them using the system pre-defined encoding list:

example% fsexam -f '. -name "*.txt"'

Example 7: Batch mode conversion

The following will use GB18030 and BIG5 to recursively convert file names under the directory "mydir" and use the first candidate to convert the file names.

example% fsexam --batch -e GB18030:BIG5 --recursive mydir

Example 8: Follow symbolic links and handle hidden files

The following will follow all symbolic links in the directory "mydir" and symbolic links in the symbolic link source's directory. Hidden files under the directory will be converted also:

example% fsexam --follow --hidden --recursive mydir

Example 9: Convert file contents recursively using specified encoding list

The following recursively scans files under the directory "mydir". For each plain text file, it automatically detects possible encodings, combines them with GB18030 and BIG5, and tries the resulting encodings one by one. Once a conversion succeeds, fsexam is done with the file and the remaining encodings are not tried. If a file is compressed or archived, fsexam first uncompresses and unarchives it into a temporary directory, performs the above operation, compresses and archives the result again, and replaces the original file:

example% fsexam --conv-content --recursive -e GB18030:BIG5 \
--auto-detect --enable-module COMPRESS mydir

Example 10: Restore a file name or a file content

The following restores the file "myfile" to its original name:

example% fsexam --restore myfile

The following restores the content of "myfile" to its original content:

example% fsexam --conv-content --restore myfile

EXIT STATUS

The following exit values are returned:

0 File names or contents are converted successfully or corresponding information is written to a dry run result file successfully.

>0 An error occurred. More information can be retrieved from a log file if "-L log-file" option and option argument are supplied.

 

ATTRIBUTES

See attributes(5) for descriptions of the following attributes:

ATTRIBUTE TYPE         ATTRIBUTE VALUE
Availability           SUNWfsexam
Interface stability    Committed

 

SEE ALSO

file(1), find(1), locale(1), tar(1), libauto_ef(3LIB), fsexam(4)  

NOTES

When you want to convert the names of many files, do not convert them one by one in a loop. Instead, construct a list of files and give the list to fsexam for conversion. For example, the following is not recommended:


    for file in *
    do
       fsexamc -b $file
    done

It is highly recommended to run this utility in a UTF-8 locale. Otherwise, you may see illegible or garbled characters. Because fsexam has system pre-defined lists of the most popular encodings for every language, and for the best multiscript capability, it works most smoothly in a UTF-8 locale environment of your language.

As shown in the NOTES section of the tar(1) man page, if an archive contains files whose names were created by processes running in multiple or different locales, a locale that uses a full 8-bit coding space (0x0 to 0xff), such as en_US.ISO8859-1, should be used both to create the archive and to extract files from it. Because of that, when you specify the COMPRESS module with the -E option, fsexam tries to use the en_US.ISO8859-1, fr_FR.ISO8859-1, de_DE.ISO8859-1, es_ES.ISO8859-1, it_IT.ISO8859-1, or sv_SE.ISO8859-1 locale. If no such locale exists on the current system, the -E option is ignored and a warning message is issued.


 

