volintro - Introduction to Logical Storage Manager (LSM) utilities
/sbin/volassist, /sbin/vold, /sbin/voldctl, /sbin/voldg, /sbin/voldisk, /sbin/voledit, /usr/sbin/volinfo, /sbin/voliod, /sbin/volmake, /usr/sbin/volmend, /sbin/volplex, /usr/sbin/volprint, /sbin/volrecover, /sbin/volsd, /usr/sbin/volstat, /usr/sbin/voltrace, /sbin/volume
The
Logical Storage Manager utilities provide a shell-level interface used by
the system administrator and higher-level applications and scripts to query
and manipulate objects that are managed through the Logical Storage Manager
(LSM).
Some of the terms and objects that are used with the Logical Storage
Manager are:
Volume
A virtual disk device that looks to applications and file
systems like a regular disk partition device. Volumes present block and raw
device interfaces that are compatible in their use with disk partition devices.
However, a volume is a virtual device that can be mirrored, spanned across
disk drives, moved to use different storage, and striped using administrative
commands. The configuration of a volume can be changed, using the Logical
Storage Manager utilities, without causing disruption to applications or file
systems that are using the volume.
Plex
A copy of a volume's logical data address space, also sometimes
known as a mirror. A volume can have up to eight
plexes associated with it. Each plex is, at least conceptually, a copy of
the volume that is maintained consistently in the presence of volume I/O and
reconfigurations. Plexes represent the primary means of configuring storage
for a volume. Plexes can have a striped or concatenated organization (layout).
Disk
Disks exist as two entities. One is the physical disk on
which all data is ultimately stored and which exhibits all the behaviors of
the underlying technology. The other is the Logical Storage Manager presentation
of a disk which, while mapping one-to-one with the physical disk, is just
a unit from which allocations of storage are made. For example,
a physical disk presents the image of a device with a definable geometry (a
definable number of cylinders, heads, and so on), whereas a Logical Storage Manager
disk is simply a unit of allocation with a name and a size.
Subdisk
A region of storage allocated on a disk for use with a volume.
Subdisks are associated with volumes through plexes. One or more subdisks are
laid out to form plexes according to the plex layout: striped or concatenated.
Subdisks are defined relative to disk media records.
Disk media record
A reference to a physical disk, or possibly a disk partition.
This record can be thought of as a physical disk identifier for the disk
or partition. Disk media records are configuration records that provide a
name (known as the disk media name or DM name) that an administrator can use
to reference a particular disk independent of its location on the system's
various disk controllers. Disk media records reference particular physical
disks through a disk ID, which is a unique identifier that is assigned to
a disk when it is initialized for use with the Logical Storage Manager.
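The following Python sketch illustrates how an offset within a plex might be mapped onto its subdisks under the two layouts. The `Subdisk` class, the `stripe_unit` parameter, the subdisk names, and the simple round-robin striping scheme are assumptions made for illustration, not the Logical Storage Manager's internal data structures or algorithm; all lengths are in sectors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subdisk:
    name: str
    length: int  # length in sectors


def map_concat(subdisks: List[Subdisk], offset: int) -> Tuple[str, int]:
    """Concatenated layout: subdisks follow one another in plex order."""
    for sd in subdisks:
        if offset < sd.length:
            return sd.name, offset
        offset -= sd.length
    raise ValueError("offset beyond end of plex")


def map_striped(subdisks: List[Subdisk], offset: int,
                stripe_unit: int) -> Tuple[str, int]:
    """Striped layout (assumed scheme): fixed-size units rotate across subdisks."""
    unit, within = divmod(offset, stripe_unit)
    column = unit % len(subdisks)   # which subdisk holds this stripe unit
    row = unit // len(subdisks)     # number of complete stripes before it
    sd = subdisks[column]
    sd_offset = row * stripe_unit + within
    if sd_offset >= sd.length:
        raise ValueError("offset beyond end of plex")
    return sd.name, sd_offset


sds = [Subdisk("disk01-01", 1000), Subdisk("disk02-01", 1000)]
print(map_concat(sds, 1500))        # ('disk02-01', 500)
print(map_striped(sds, 1500, 128))  # ('disk02-01', 732)
```

In the concatenated case the second subdisk starts where the first ends; in the striped case consecutive stripe units alternate between the two subdisks.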
A number of conventions are used throughout much of the Logical Storage
Manager to provide a degree of similarity between the various operations.
The following is a list of such conventions:
Most utilities in the Logical Storage Manager provide more than one operation, with operations grouped into utilities primarily by object type. Utilities that provide multiple operations are typically invoked with the following form:
"utility [ options ] keyword [ operands ]"
Here, utility is the name of the utility and keyword is a name that identifies the specific operation to perform. Any options that are introduced in the standard -letter form precede the operation keyword.
To aid in normal use, each utility provides an extended usage message
that lists all the options and operation keywords supported by the utility.
The extended usage message for a utility can be displayed using a keyword
of help. The extended usage messages are intended to serve
as reminders, and not as replacements for user documentation.
Many basic properties of objects that are managed by the Logical Storage
Manager require specification of lengths, either as a pure object length or
as an offset relative to some other object. The Logical Storage Manager supports
volume lengths up to 2,147,483,647 disk sectors (one terabyte on most systems).
Typing such large numbers, or even much smaller numbers, can be annoying.
The Logical Storage Manager provides a uniform syntax for representing such
numbers, which uses suffixes to provide convenient multipliers. Numbers can
be specified in decimal, octal or hexadecimal. Also, numbers can be specified
as a sum of several numbers, as a convenience to avoid using a calculator.
A hexadecimal (base 16) number is introduced using a prefix of 0x. For example, 0xfff is the same as decimal 4095. An octal (base 8) number is introduced using a prefix of 0. For example, 0177777 is the same as decimal 65535.
A number can be followed by a suffix character to indicate a multiplier for the number. A length number with no suffix character represents a count of standard disk sectors. The length of a standard disk sector can vary between systems; it is commonly 512 bytes. On systems where disks can have different sector sizes, one of the sector sizes will be chosen as the "standard" size. Supported suffix characters are:
b   multiply the length by 512 bytes (blocks)
s   multiply the length by the standard sector size (default)
k   multiply the length by 1024 bytes (kilobytes)
m   multiply the length by 1,048,576 (1024K) bytes (megabytes)
g   multiply the length by 1,073,741,824 (1024M) bytes (gigabytes)
Numbers are represented internally as an integer number of sectors. As a result, if the standard disk sector size is larger than 512 bytes, numbers can be specified that will need to be rounded to a whole number of sectors. Rounding is always done to the next lower, not the nearest, multiple of the sector size.
Because the letter b is a valid hexadecimal character, there is a special case for the b suffix where a single blank character can separate a number from the b suffix character. Use of a blank within a number, when invoking commands from the shell, usually requires quoting the number. For example: /sbin/volassist make vol01 "0x1000 b"
Numbers can be added or subtracted by separating two or more numbers by a plus or minus sign, respectively. A plus sign is optional. As an example, the largest allowed number that can be represented on a system with a 512 byte sector size can be entered as: 1023g+1023m+1023k+1
Note that 1024g-1 cannot be used, because the implementation cannot handle the intermediate representation of 1024g (which is greater than the largest number that can be represented) internally.
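The arithmetic behind that example: with 512-byte sectors, the g, m, and k suffixes are worth 2^21, 2^11, and 2 sectors respectively, so the sum telescopes to exactly the largest representable value, while 1024g alone is already 2^31 sectors, one more than the limit:

$$
\begin{aligned}
1023\text{g} + 1023\text{m} + 1023\text{k} + 1
&= (2^{31} - 2^{21}) + (2^{21} - 2^{11}) + (2^{11} - 2) + 1 \\
&= 2^{31} - 1 = 2{,}147{,}483{,}647 \text{ sectors}, \\
1024\text{g} &= 2^{31} \text{ sectors} > 2^{31} - 1.
\end{aligned}
$$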
The Logical Storage Manager always reports length numbers as a simple count of sectors, with no suffix character.
Case is not important in length numbers. Hexadecimal numbers and suffix
characters can be specified using any reasonable combination of upper- and
lower-case letters.
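A minimal Python sketch of these length-number rules follows, assuming a 512-byte standard sector. It is not the Logical Storage Manager's parser: `parse_length` and its constants are hypothetical, it requires an explicit + or - between terms (the optional-plus form is not modeled), and it rounds each term down to whole sectors separately.

```python
import re

SECTOR_BYTES = 512          # assumed standard sector size
MAX_SECTORS = 2**31 - 1     # largest length LSM supports, per the text above

# Multiplier for each suffix, in bytes; "s" is one standard sector (the default).
SUFFIX_BYTES = {"b": 512, "s": SECTOR_BYTES, "k": 1024,
                "m": 1024**2, "g": 1024**3}

# One term: optional sign, a hex/octal/decimal number, then an optional suffix.
# Only "b" may be separated from its number by a single blank, because a "b"
# written directly after a hex number is read as a hex digit.
TERM = re.compile(r"([+-]?)(0x[0-9a-f]+|[0-9]+)(?: ?(b)|([skmg]))?")


def parse_length(text: str) -> int:
    """Convert an LSM-style length number to a count of sectors (sketch)."""
    s = text.strip().lower()          # case is not significant
    if not s:
        raise ValueError("empty length")
    pos, total = 0, 0
    while pos < len(s):
        m = TERM.match(s, pos)
        if m is None:
            raise ValueError(f"bad length syntax at {s[pos:]!r}")
        sign, digits, b_suffix, other_suffix = m.groups()
        if pos > 0 and not sign:
            raise ValueError("terms after the first need a + or - sign")
        if digits.startswith("0x"):
            value = int(digits, 16)
        elif len(digits) > 1 and digits.startswith("0"):
            value = int(digits, 8)    # leading 0 means octal
        else:
            value = int(digits)
        suffix = b_suffix or other_suffix or "s"
        # Round each term down (never to nearest) to a whole number of sectors.
        sectors = value * SUFFIX_BYTES[suffix] // SECTOR_BYTES
        if sectors > MAX_SECTORS:
            raise OverflowError(f"term {m.group(0)!r} exceeds the maximum length")
        total += -sectors if sign == "-" else sectors
        pos = m.end()
    if not 0 <= total <= MAX_SECTORS:
        raise OverflowError("length out of range")
    return total
```

For example, `parse_length("0x1000 b")` treats the blank-separated b as a suffix, whereas `parse_length("0x1000b")` reads the b as a hex digit. Checking each term against the maximum reproduces the documented refusal of 1024g-1.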
Most commands operate on only one disk group per invocation. Each disk group has a configuration that is separate from that of every other disk group, so two disk groups can contain objects that have the same name. This can happen, in particular, if a disk group is moved from one system to another. Even without such moves, most utilities make no attempt to ensure that names are unique across disk groups, so name collisions can occur anyway.
System administrators who take care to avoid name collisions should be able to use most of the utilities without specifying disk groups, except when creating objects. A single command cannot reference objects in more than one disk group, but the disk group is selected automatically, based on the objects specified in the command.
The standard rules that most commands use for selecting the disk group for a command are as follows: Given the set of object names specified on the command line, look up the disk group of each object. If all objects are in the same disk group, use that disk group. If any named object is not unique across all disk groups, and that object is not in the rootdg disk group, then fail. To force use of a particular disk group, use -g diskgroup to indicate the group. Non-unique names do not cause errors when a disk group is specified explicitly. The diskgroup specification is either a disk group ID or a disk group name. A special case is provided for the rootdg disk group: any set of objects in the rootdg disk group can be specified without -g rootdg, even if the names are also used in another disk group.
If a set of object names is given on the command line, and if some are
unique but some are not unique, then the command will still fail according
to the rules listed above. Just because a combination of objects could be
used to disambiguate the disk group does not mean that a utility will do so.
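To make the selection rules concrete, here is a Python sketch of one reading of them. `select_disk_group` and the `index` mapping (object name to the set of disk groups containing it) are hypothetical; a real utility consults the configuration through the volume daemon, and disk group IDs given with -g are not modeled.

```python
from typing import Dict, Iterable, Optional, Set


def select_disk_group(names: Iterable[str],
                      index: Dict[str, Set[str]],
                      explicit: Optional[str] = None) -> str:
    """Choose a disk group for a command (one reading of the rules above).

    index maps each object name to the set of disk groups containing it;
    explicit stands in for "-g diskgroup" (group names only, no IDs).
    """
    if explicit is not None:
        return explicit                   # non-unique names are fine with -g
    groups = [index.get(n, set()) for n in names]
    if not groups or any(not g for g in groups):
        raise LookupError("no disk group could be identified")
    if all("rootdg" in g for g in groups):
        return "rootdg"                   # rootdg objects never need -g rootdg
    if any(len(g) > 1 for g in groups):
        # Fails even if the other names would disambiguate the group.
        raise LookupError("a name is not unique across disk groups; use -g")
    candidates = set().union(*groups)
    if len(candidates) != 1:
        raise LookupError("objects are in different disk groups")
    return candidates.pop()
```

Note that a non-unique name fails even when the remaining names would point to a single disk group, matching the caution in the preceding paragraph.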
Disk group configurations contain six types of records: volume records,
plex records, subdisk records, disk media records, disk group records, and
disk access records. Disk access records are stored only in the root disk group
configuration, and only because there is no other convenient place to store
them; logically, they are separate from all disk groups. Because they are
specific and meaningful only to the local system, the logical place to store
them is the rootdg disk group, since that is the only disk group guaranteed
to exist on the system.
Disk group records define several different types of names for a disk
group. The different types of names are:
Real name
This is the name of the disk group, as the name is defined
on disk. This name is stored in the disk group configuration, and is also
stored in the disk headers of all disks in the disk group.
Alias name
This is the standard name that the system uses when referencing
the disk group. References to the disk group name usually mean the alias
name. Volume and plex device directories are structured into subdirectories
based on the disk group alias name. Typically, the disk group's alias name
and real name are identical. A local alias can be useful for gaining access
to a disk group with a name that conflicts with other disk groups in the system,
or that conflicts with records in the rootdg disk group.
Disk group ID
A 64-byte identifier that represents the unique ID of the
disk group. All disk groups on all systems should have a different disk group
ID, even if they have the same real name. This identifier is stored in the
disk headers of all disks in the disk group. It is used to ensure that the
Logical Storage Manager does not confuse two disk groups which were created
with the same name.
Volume records define the characteristics of particular volume devices.
The name of a volume record defines the node name used for files in the /dev/vol and /dev/rvol directories. The block
device for a particular volume (which can be used as an argument to the mount command; see
mount(8)) has the path:
/dev/vol/groupname/volume
In this path, groupname is the name assigned by the administrator to the disk group containing the volume. The raw device for a volume, typically used for application I/O and for issuing I/O control operations (see ioctl(2)), has the path:
/dev/rvol/groupname/volume
For convenience, volumes assigned to the root disk group are accessible both under the rootdg subdirectories of /dev/vol and /dev/rvol and directly as /dev/vol/volume and /dev/rvol/volume.
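A small Python sketch of the device naming scheme for volumes; `volume_device_paths` is a hypothetical helper, and `group` is taken to be the disk group's alias name, since the device directories are organized by alias name.

```python
from typing import List


def volume_device_paths(group: str, volume: str) -> List[str]:
    """Return the block and raw device paths for a volume (hypothetical helper).

    group is the disk group's alias name, which names the subdirectories.
    """
    paths = [f"/dev/vol/{group}/{volume}", f"/dev/rvol/{group}/{volume}"]
    if group == "rootdg":
        # rootdg volumes are also reachable directly under /dev/vol and /dev/rvol.
        paths += [f"/dev/vol/{volume}", f"/dev/rvol/{volume}"]
    return paths


print(volume_device_paths("dg1", "vol02"))
# ['/dev/vol/dg1/vol02', '/dev/rvol/dg1/vol02']
```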
Reads to a volume device are directed to one of the read-write or read-only plexes associated with the volume. Writes to the volume are directed to all of the enabled read-write and write-only plexes associated with the volume.
During a write operation, two plexes of a volume may briefly be out of sync with each other, because writes directed to two disks can complete at different times. This is not normally a problem. However, if the system crashes or loses power during a write operation, the two plexes could be left with different contents.
Most applications and file systems are not written to expect that two separate reads of a device can return different contents without an intervening write operation. Because plexes with different contents could cause two reads of the same block to return different data, the Logical Storage Manager expends considerable effort to prevent this situation.
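The routing of reads and writes to plexes can be modeled as follows. This is a toy Python sketch, not LSM code: the `Plex` and `VolumeRouter` names and the "RW"/"RO"/"WO" mode labels are assumptions, and reads use a simple round-robin choice standing in for whatever read policy the volume is configured with.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class Plex:
    name: str
    mode: str             # "RW", "RO" or "WO" (labels assumed for this sketch)
    enabled: bool = True


class VolumeRouter:
    """Toy model of how volume I/O is directed to plexes (not LSM code)."""

    def __init__(self, plexes: List[Plex]):
        self.plexes = plexes
        readable = [p for p in plexes if p.enabled and p.mode in ("RW", "RO")]
        if not readable:
            raise ValueError("volume has no enabled readable plex")
        # Assumed round-robin policy; the readable set is fixed at construction.
        self._readers = itertools.cycle(readable)

    def read_target(self) -> Plex:
        """A read goes to exactly one readable plex."""
        return next(self._readers)

    def write_targets(self) -> List[Plex]:
        """A write goes to every enabled read-write or write-only plex."""
        return [p for p in self.plexes if p.enabled and p.mode in ("RW", "WO")]
```

Because a write fans out to every writable plex while a read touches only one, an interrupted write opens exactly the window described above: successive reads routed to different plexes could disagree.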
Volumes have the following fundamental attributes:
Each volume has a usage type, which defines a particular class
of rules for operating on the volume, typically based on the expected content
of the volume. Several utilities apply extensions or limitations to volumes
with a particular usage type. Four usage types are included
with the base release of the Logical Storage Manager: fsgen,
for use with volumes that contain file systems; gen, for
use with volumes that are used as swap devices or for other applications that
do not use file systems; and the special root and swap usage types, which are
specifically for use with the root file system volume and the primary swap device.
Each volume has a length, which defines the limiting offset
of read and write operations. The length is assigned by the administrator,
and may or may not match the lengths of the associated plexes.
Each volume is either enabled, disabled, or detached. When
enabled, normal read and write operations are allowed on the volume, and any
file system residing on the volume can be mounted, or used in the usual way.
When disabled, no access to the volume or any of its associated plexes is
allowed. When detached, the plex devices for the volume can be accessed,
and some ioctls can be used by utilities to operate on the volume.
Usage types maintain a private state field for the volume that
reflects operations that have been performed on the volume, or
failure conditions that have been encountered. This state field contains
a string of up to 14 characters.
Each volume has between zero and eight associated plexes.
A configurable policy for switching between plexes for volume
reads. When a volume has more than one enabled associated plex, the Logical
Storage Manager can spread reads across the plexes to balance the
I/O load and thus increase the total read bandwidth available through the volume.
Plex records define the characteristics of a particular mirror of a
volume. A plex can be in either an associated state or a dissociated state.
In the dissociated state, the plex is not a part of a volume. A dissociated
plex cannot be accessed in any way. An associated plex can be accessed through
the volume and, in a limited fashion, through a plex device. The name of
the plex defines the node name used for files in the /dev/plex
directory. The device for a particular plex has the path:
/dev/plex/groupname/plex
In this path, groupname is the name assigned by the administrator to the disk group containing the plex. For convenience, plexes assigned to the root disk group are accessible both under the rootdg subdirectory of /dev/plex and directly as /dev/plex/plex.
Plexes have the following fundamental attributes:
Each plex is either enabled, disabled, or detached. When
enabled, normal read and write operations from the volume can be directed
to the plex. When disabled, no I/O operations will be applied to the plex.
When detached, normal volume I/O will not be directed to the plex. When
detached, the plex device can be accessed for either read or write access
using the special plex device nodes. If a plex is enabled, however, then the
plex device can be read but not written.
Subdisk records define a region of disk, allocated from a disk's public
region. Subdisks have very little state associated with them, other than
the configuration state that defines which region of disk the subdisk occupies.
Subdisks cannot overlap each other, either in their associations with plexes,
or in their arrangement on disk public regions.
Subdisks have the following fundamental attributes:
The name of the disk media record that the subdisk is defined
on.
The offset, from the beginning of the disk's public region,
to the start of the subdisk.
For associated subdisks, this is the offset (from the beginning
of the plex) of the subdisk association. For subdisks associated with striped
plexes, the plex offset defines relative ordering of subdisks in the plex,
rather than actual offsets within the plex address space.
The length of the subdisk.
An administrator-assigned string of up to 40 characters that
can be set and changed using the voledit utility. The
Logical Storage Manager does not interpret the comment field. The comment
cannot contain newline characters.
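The rule that subdisks cannot overlap on a disk's public region can be checked with a sort-and-sweep over each disk's subdisks. This Python sketch uses a hypothetical `SubdiskRec` type built from the attributes above; it does not model overlaps among plex associations.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class SubdiskRec(NamedTuple):
    name: str
    dm_name: str       # disk media record the subdisk is defined on
    disk_offset: int   # sectors from the start of the disk's public region
    length: int        # length in sectors


def find_overlaps(subdisks: List[SubdiskRec]) -> List[Tuple[str, str]]:
    """Pair each overlapping subdisk with the earlier one on its disk that reaches furthest."""
    by_disk: Dict[str, List[SubdiskRec]] = defaultdict(list)
    for sd in subdisks:
        by_disk[sd.dm_name].append(sd)
    clashes = []
    for sds in by_disk.values():
        furthest = None  # earlier subdisk whose end lies furthest into the disk
        for sd in sorted(sds, key=lambda s: s.disk_offset):
            if furthest is not None and sd.disk_offset < furthest.disk_offset + furthest.length:
                clashes.append((furthest.name, sd.name))
            if furthest is None or (sd.disk_offset + sd.length
                                    > furthest.disk_offset + furthest.length):
                furthest = sd
    return clashes
```

Tracking the furthest-reaching earlier subdisk, rather than only the immediately preceding one, catches a short subdisk nested inside a long one.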
Disk media records define a specific disk within a disk group. The name of a disk media record is assigned when a disk is first added to a disk group (using the /sbin/voldg adddisk operation). Disk media records can be assigned to specific physical disks by associating the media record with the current disk access record for the physical disk.
Disk media records have the following fundamental attributes:
A 64-byte unique identifier representing the physical disk
to which the media record is associated. This can be cleared to indicate
that the disk is considered in the removed state.
A removed disk has no current association with any physical disk.
The disk access name that is currently used to access the
physical disk referenced by the disk ID. If the disk ID is defined, but no
physical disk with that ID could be found, the disk access name will be empty.
A disk where the physical disk could not be found is considered to be in
the NODAREC, or inaccessible,
state. A disk can become inaccessible either because the indicated disk is
not currently attached to the system, or because I/O failures on the physical
disk prevented the Logical Storage Manager from identifying or using the physical
disk.
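The two conditions above can be summarized in a few lines of Python. `disk_media_status` is a hypothetical helper, and "accessible" is an illustrative label for the normal case, not an LSM state name.

```python
from typing import Optional


def disk_media_status(disk_id: Optional[str], da_name: Optional[str]) -> str:
    """Classify a disk media record by its disk ID and disk access name."""
    if not disk_id:
        return "removed"     # cleared disk ID: no association with a physical disk
    if not da_name:
        return "NODAREC"     # disk ID set, but no disk with that ID was found
    return "accessible"      # hypothetical label for the normal case
```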
Disk access records define an address, or access path, that can be used to access a disk. The list of all disk access records defines the list of all disk addresses that the Logical Storage Manager can use to locate physical disks. Disk access records do not define specific physical disks, since physical disks can be moved on a system. When a physical disk is moved, a different disk access record may be necessary to locate it.
Disk access records are stored in the rootdg disk group configuration. Unlike all other record types, the names of disk access records can conflict with the names of other records. For example, a specialty disk (such as a RAM disk) can use the same name for both the disk access record and the disk media record that points to it. It is typically advisable to use different names for the access and media records, to avoid additional confusion if disks are moved.
Disk access records can be defined explicitly. Some (sometimes all) disk access records may be configured automatically by the Logical Storage Manager, based on available information in the operating system. Such automatically-configured disks are not stored persistently in the on-disk root disk group configuration, but are instead regenerated every time the Logical Storage Manager starts up.
Disk access records have the following fundamental attributes:
The name of the disk access record is typically a disk address of some kind. Disk names are usually of the form ranp or rznp, where ra is the device mnemonic for MSCP disks, rz is the device mnemonic for SCSI disks, n is the unit number of the device, and p is the partition identifier (in the range a to h).
Each disk access record has a type, which identifies certain key characteristics of the Logical Storage Manager's interaction with the disk. Currently available types are: sliced, simple, and nopriv. See voldisk(8) for more information on disk types. Typically, most or all of the disks will be of type sliced. It may be desirable to create specialty disks (such as RAM disks) with type nopriv.
If the physical disk represented by the disk access record is currently associated with a disk media record, then the following fields are defined:
The name of the disk group containing the disk media record.
The name of the media record that points to the physical disk.
Additional attributes can be added, arbitrarily, by disk types. See
voldisk(8)
for a list of additional attributes defined by the standard disk types.
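A Python sketch that splits a disk access name of the usual ranp or rznp form into its parts. The `parse_da_name` function and `DiskAddress` type are hypothetical; names that do not follow this form, such as those of specialty disks, simply return None.

```python
import re
from typing import NamedTuple, Optional

# ra = MSCP, rz = SCSI; then the unit number and a partition letter a-h.
DA_NAME = re.compile(r"(ra|rz)([0-9]+)([a-h])")


class DiskAddress(NamedTuple):
    bus: str        # "MSCP" or "SCSI"
    unit: int
    partition: str


def parse_da_name(name: str) -> Optional[DiskAddress]:
    m = DA_NAME.fullmatch(name)
    if m is None:
        return None   # not of the usual ranp / rznp form
    mnemonic, unit, part = m.groups()
    return DiskAddress("MSCP" if mnemonic == "ra" else "SCSI", int(unit), part)
```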
The usage type of a volume represents a class of rules for operating
on a volume. Each usage type is defined by a set of executables under the
directory /etc/vol/type/usage_type,
where usage_type is the name given to the usage
type. The required executables are: volinfo, volmake, volmend, volplex, volsd, and volume. These executables are invoked
by the Logical Storage Manager administrative utilities with the same names.
The executables under /etc/vol/type should not normally
be executed directly.
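As an illustration of this directory layout, the following Python sketch computes where a usage-type executable would live and checks that a usage type supplies all the required ones. The helper names are hypothetical, and the sketch does not attempt to reproduce how the administrative utilities actually hand off to these executables.

```python
import os
from typing import List

USAGE_TYPE_ROOT = "/etc/vol/type"
# Executables every usage type must provide, per the list above.
REQUIRED = ("volinfo", "volmake", "volmend", "volplex", "volsd", "volume")


def usage_type_executable(usage_type: str, utility: str,
                          root: str = USAGE_TYPE_ROOT) -> str:
    """Path of the usage-type executable for a given utility."""
    if utility not in REQUIRED:
        raise ValueError(f"{utility} is not a usage-type-dependent utility")
    return os.path.join(root, usage_type, utility)


def missing_executables(usage_type: str, root: str = USAGE_TYPE_ROOT) -> List[str]:
    """Required executables that are absent or not executable for a usage type."""
    return [u for u in REQUIRED
            if not os.access(usage_type_executable(usage_type, u, root), os.X_OK)]
```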
Four usage types are provided with the Logical Storage Manager: gen, fsgen, root, and swap. It is likely that new usage types will be added in future releases. It is also possible for third-party products to install usage types.
The usage types currently provided with the Logical Storage Manager store state information in the volume and plex usage-type state fields. The state fields defined for volumes are:
EMPTY
The volume is not yet initialized. This is the initial state for volumes created by volmake.
CLEAN
The volume has been stopped and the contents of all plexes are consistent.
ACTIVE
The volume has been started and is running normally, or was running normally when the system was stopped. If the system crashes in this state, then the volume may require plex consistency recovery.
NEEDSYNC
The volume requires recovery. This is typically set after a system failure to indicate that the plexes in the volume may be inconsistent, so that they require recovery [see the resync operation in volume(8)].
SYNC
Plex consistency recovery is currently being done on the volume. /sbin/volume resync sets this state when it starts to recover plex consistency on a volume that was in the NEEDSYNC state.
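The volume state descriptions imply a rough life cycle for a volume's state field. The Python sketch below encodes one such reading as a transition table; the event names are illustrative labels rather than LSM operations, and the EMPTY-to-CLEAN and SYNC-to-ACTIVE steps are assumptions not stated in the text.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical transition table inferred from the volume state descriptions.
# Event names are illustrative labels, not LSM commands or keywords.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("EMPTY", "init"): "CLEAN",         # assumption: initialization leaves plexes consistent
    ("CLEAN", "start"): "ACTIVE",
    ("ACTIVE", "stop"): "CLEAN",
    ("ACTIVE", "crash"): "NEEDSYNC",    # plexes may now be inconsistent
    ("NEEDSYNC", "resync"): "SYNC",     # volume resync begins recovery
    ("SYNC", "resync_done"): "ACTIVE",  # assumption: recovery ends with a running volume
}


def next_state(state: str, event: str) -> str:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event!r}") from None


def replay(events: Iterable[str], state: str = "EMPTY") -> str:
    """Apply a sequence of events, starting from the state volmake assigns."""
    for event in events:
        state = next_state(state, event)
    return state
```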
The state fields defined for plexes are:
EMPTY
The plex is not yet initialized. This state is set when the
volume state is also EMPTY.
CLEAN
The plex was running normally when the volume was stopped.
The plex will be enabled without requiring recovery when the volume is started.
ACTIVE
The plex is running normally on a started volume. The plex
condition flags (NODAREC, REMOVED, RECOVER, and IOFAIL) may apply if the system
is rebooted and the volume restarted.
STALE
The plex was detached, either by /sbin/volplex det or by an I/O failure. /sbin/volume start will
change the state for a plex to STALE if any of the plex
condition flags are set. STALE plexes will be reattached
automatically, when starting a volume, by calling /sbin/volplex att.
OFFLINE
The plex was disabled by the /usr/sbin/volmend off operation. See
volmend(8) for more information.
SNAPATT
This is a snapshot plex that is being attached by the /sbin/volassist snapstart operation. When the attach is complete,
the state for the plex will be changed to SNAPDONE. If
the system fails before the attach completes, the plex and all of its subdisks
will be removed.
SNAPDONE
This is a snapshot plex created by /sbin/volassist
snapstart that is fully attached. A plex in this state can be turned
into a snapshot volume with /sbin/volassist snapshot.
See
volassist(8) for more information. If the system fails before the attach
completes, the plex and all of its subdisks will be removed.
SNAPTMP
This is a snapshot plex being attached by the /sbin/volplex
snapstart operation. When the attach is complete, the state for
the plex will be changed to SNAPDIS. If the system fails
before the attach completes, the plex will be dissociated from the volume.
SNAPDIS
This is a snapshot plex created by /sbin/volplex
snapstart that is fully attached. A plex in this state can be turned
into a snapshot volume with /sbin/volplex snapshot. See
volplex(8)
for more information. If the system fails before the attach completes, the
plex will be dissociated from the volume.
TEMP
This is a plex that is being associated and attached to a
volume with /sbin/volplex att. If the system fails before
the attach completes, the plex will be dissociated from the volume.
TEMPRM
This is a plex that is being associated and attached to a
volume with /sbin/volplex att. If the system fails before
the attach completes, the plex will be dissociated from the volume and removed.
Any subdisks in the plex will be kept.
TEMPRMSD
This is a plex that is being associated and attached to a
volume with /sbin/volplex att. If the system fails before
the attach completes, the plex and its subdisks will be dissociated from the
volume and removed.
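The failure behavior of the plex states whose attach is still in progress can be collected into a table, as in this Python sketch. `PlexRec`, `crash_cleanup_steps`, and the step strings are hypothetical; the table covers only SNAPATT, SNAPTMP, TEMP, TEMPRM, and TEMPRMSD, following their descriptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# What happens to a plex if the system fails before its attach completes:
# (dissociate from volume, remove plex, remove its subdisks).
CRASH_CLEANUP = {
    "SNAPATT":  (True, True, True),
    "SNAPTMP":  (True, False, False),
    "TEMP":     (True, False, False),
    "TEMPRM":   (True, True, False),
    "TEMPRMSD": (True, True, True),
}


@dataclass
class PlexRec:
    name: str
    state: str
    subdisks: List[str] = field(default_factory=list)
    volume: Optional[str] = None


def crash_cleanup_steps(plex: PlexRec) -> List[str]:
    """Describe the cleanup implied for a plex left in a transient state."""
    action = CRASH_CLEANUP.get(plex.state)
    if action is None:
        return []                     # not a transient attach state
    dissociate, remove_plex, remove_sds = action
    steps = []
    if dissociate and plex.volume:
        steps.append(f"dissociate {plex.name} from {plex.volume}")
    if remove_plex:
        steps.append(f"remove plex {plex.name}")
    if remove_sds:
        steps.extend(f"remove subdisk {sd}" for sd in plex.subdisks)
    return steps
```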
The majority of the Logical Storage Manager utilities use a common set
of exit codes, which can be used by shell scripts or other types of programs
to react to specific problems detected by the utilities. For C programmers,
these exit status codes are defined in the include file volclient.h. The number and macro name for each distinct exit code are described
below. Shell script writers must compare directly against the numbers specified.
The utility is not reporting any error through the exit code.
Some command line arguments to the utility were invalid.
A syntax error occurred in a command or description, or a
specified record name is too long or contains invalid characters. This code
is returned only by utilities that implement a command or description language.
This code may also be returned for errors in search patterns.
The volume daemon does not appear to be running.
An unexpected error was encountered while communicating with
the volume daemon.
An unexpected error was returned by a system call or by the
C library. This can also indicate that the utility ran out of memory.
The status for a commit was lost because the volume daemon
was killed and restarted during the commit of a transaction, but after restart
the volume daemon did not know whether the commit succeeded or failed.
The utility encountered an error that it should not have encountered.
This generally implies a condition that the utility should have tested for
but did not, or a condition that results from the volume daemon returning
a value that did not make sense.
The time required to complete a transaction exceeded 60 seconds,
causing the transaction locks to be lost. As most utilities will reattempt
the transaction at least once if a timeout occurs, this usually implies that
a transaction timed out two or more times.
No disk group could be identified for an operation. This
results either from naming a disk group that does not exist, or from supplying
names on a command line that are in different disk groups or that exist in
more than one disk group.
A change made to the database by another process caused the
utility to stop. This code is also returned by a usage-type-dependent utility
if it is given a record that is associated with a different usage-type. If
this situation occurs when the usage-type-dependent utility is called from
a switchout utility, then the database was changed after the switchout utility
determined the proper usage-type to invoke.
A requested subdisk, plex, or volume record was not found
in the configuration database. This may also mean that a record was of an
inappropriate type.
A name used to create a new configuration record matches the
name of an existing record.
A subdisk, plex, or volume is locked against concurrent access.
This code is used for inter-transaction locks associated with usage-type
utilities. The code is also used for the dissociated plex or subdisk lock
convention, which writes a non-blank string to the tutil[0]
field in a plex or subdisk structure to indicate that the record is being
used.
No usage-type could be determined for a utility that requires
a usage type.
An unknown or invalid usage-type was specified.
A plex or subdisk is associated, but the operation requires
a dissociated record.
A plex or subdisk is dissociated, but the operation requires
an associated record. This code can also be used to indicate that a subdisk
or plex is not associated with a specific plex or volume.
A plex or subdisk was not dissociated because it was the last
record associated with a volume or plex.
Association of a plex or subdisk would exceed the maximum
number that can be associated with a volume or plex.
A specified operation is invalid within the parameters specified.
For example, this code is returned when an attempt is made to split a subdisk
on a striped plex, or to use a split size that is greater than the size of
the plex.
An I/O error was encountered that caused the utility to abort
an operation.
A volume involved in an operation did not have any associated
plexes, although at least one was required.
A plex involved in an operation did not have any associated
subdisks, although at least one was required.
A volume could not be started by the /sbin/volume
start operation, because the configuration of the volume and its
plexes prevented the operation.
A specified volume was already started.
A specified volume was not started. For example, this code
is returned by the /sbin/volume stop operation if the operation
is given a volume that is not started.
A volume or plex involved in an operation is in the detached
state, thus preventing a successful operation.
A volume or plex involved in an operation is in the disabled
state, thus preventing a successful operation.
A volume or plex involved in an operation is in the enabled
state, thus preventing a successful operation.
An unknown error was encountered. This code may be used,
for example, when the volume daemon returns an unrecognized error number.
An operation failed because a volume or plex device was open
or mounted, or because a subdisk was associated with an open or mounted volume
or plex.
Exit codes greater than 32 are reserved for use by usage-types. Codes
greater than 64 can be reserved for use by specific utilities.
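Because the individual code numbers come from volclient.h, a script can at least branch on the reserved ranges. A minimal Python sketch, with a hypothetical `exit_code_class` helper and the assumption, following the usual UNIX convention, that the no-error code is 0:

```python
def exit_code_class(code: int) -> str:
    """Classify an LSM utility exit status by its reserved range.

    Assumes 0 means "no error"; the numbers of the individual common
    codes are defined in volclient.h and are not reproduced here.
    """
    if not 0 <= code <= 255:
        raise ValueError("not a valid exit status")
    if code == 0:
        return "success"
    if code <= 32:
        return "common"            # codes shared by most LSM utilities
    if code <= 64:
        return "usage-type"        # reserved for usage-types
    return "utility-specific"      # may be reserved by specific utilities
```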
volassist(8), vold(8), voldctl(8), voldg(8), voldisk(8), voldiskadm(8), voledit(8), volinfo(8), voliod(8), volmake(8), volmend(8), volnotify(8), volplex(8), volprint(8), volrecover(8), volsd(8), volstat(8), voltrace(8), volume(8), volwatch(8), plexrec(4), sdrec(4), volrec(4), volmake(4)