na_fcstat - Fibre Channel stats functions
fcstat link_stats [ channel_name ]
fcstat fcal_stats [ channel_name ]
fcstat device_map [ channel_name ]
Use the fcstat command to show (a) link statistics maintained
for all drives on a Fibre Channel loop, (b) internal
statistics kept by the Fibre Channel driver, and (c) a
tenancy and relative physical position map of drives on a
Fibre Channel loop.
All disk drives maintain counts of useful link events.
The link_stats sub-command displays these counts, which
can be useful in isolating problems on the loop. Refer to
the event descriptions and the example below for more
information.
link failure count
The drive notes a link failure event if it cannot
synchronize its receiver PLL for longer than R_T_TOV,
which is usually on the order of milliseconds. A link
failure is therefore a loss of sync that lasted long
enough for the drive to initiate a Loop Initialization
Primitive (LIP). Refer to loss of sync count below.
underrun count
Underruns are detected by the Host Adapter (HA)
during a read request. The disk sends data to the
HA through the loop; if any frames are corrupted in
transit, the HA discards them and therefore receives
less data than expected. The driver reports the
underrun condition and retries the read. The cause
of the underrun lies downstream of the disk being
read, between that disk and the HA.
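The underrun rule above can be expressed as a walk over the physical loop order reported by the device_map sub-command. The following Python sketch is illustrative only: the function name underrun_suspects is hypothetical, loop positions are assumed to be plain integer loop IDs with the host adapter port first, and only devices are listed, not the cables and I/O modules that join them.

```python
def underrun_suspects(loop_order: list[int], disk: int) -> list[int]:
    """Return the devices that read data from `disk` passes through
    before it reaches the host adapter (HA).

    loop_order is the physical order from `fcstat device_map`, starting
    with the HA port. Read data leaves `disk`, travels downstream through
    every later position and then returns to the HA, so an underrun on a
    read from `disk` points at those later positions (or at the cables
    and I/O modules between them, which this sketch does not model).
    """
    if disk not in loop_order:
        raise ValueError(f"loop ID {disk} is not on this loop")
    return loop_order[loop_order.index(disk) + 1:]
```

For example, with a loop order of 7 29 28 27 (HA port 7 first), an underrun on a read from drive 28 points at drive 27 and the link from drive 27 back to the HA.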
loss of sync count
The drive notes a loss of sync event if it loses
PLL synchronization for less than R_T_TOV and then
manages to resynchronize. This event is generally
caused by a component upstream of the disk, up to
and including the previous active component on the
loop. Disks at shelf borders tend to see higher
loss of sync counts than disks that are not at a
border.
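The difference between a link failure and a loss of sync comes down to how long synchronization is lost relative to R_T_TOV. The Python sketch below encodes that rule; the function name and counter keys are hypothetical, R_T_TOV is passed in explicitly because its value depends on the configuration, and treating a duration exactly equal to R_T_TOV as loss of sync is an arbitrary choice, since the descriptions above do not cover that case.

```python
from collections import Counter


def record_sync_loss(counters: Counter, lost_ms: float, r_t_tov_ms: float) -> str:
    """Classify one loss of receiver PLL synchronization and count it.

    lost_ms is how long the receiver was out of sync.
    - longer than R_T_TOV: a link failure (the drive initiates a LIP)
    - otherwise: a loss of sync from which the drive recovered
    A duration exactly equal to R_T_TOV is counted as loss of sync here,
    an arbitrary choice for this sketch.
    """
    kind = "link_failure" if lost_ms > r_t_tov_ms else "loss_of_sync"
    counters[kind] += 1
    return kind
```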
invalid CRC count
Every frame received by a drive contains a checksum
that covers all data in the frame. If the checksum
does not match when the frame is received, the
invalid CRC counter is incremented and the frame is
"dropped". Generally, the disk that reports the CRC
error is not at fault; rather, a component between
the Host Adapter (which originated the write
request) and the reporting drive corrupted the
frame.
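A drive's receive-side check can be modelled in a few lines of Python. This is a sketch, not drive firmware: Frame, make_frame and DriveReceiver are hypothetical names, and zlib.crc32 is used only as a convenient 32-bit checksum; the sketch does not reproduce how Fibre Channel computes the CRC over a real frame.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Frame:
    payload: bytes
    crc: int  # checksum the sender computed over the payload


def make_frame(payload: bytes) -> Frame:
    """Sender side: attach a checksum covering all data in the frame."""
    return Frame(payload, zlib.crc32(payload))


@dataclass
class DriveReceiver:
    invalid_crc_count: int = 0
    accepted: list = field(default_factory=list)

    def receive(self, frame: Frame) -> bool:
        """Receiver side: recompute the checksum and drop the frame on mismatch."""
        if zlib.crc32(frame.payload) != frame.crc:
            self.invalid_crc_count += 1
            return False  # frame is dropped; only the counter records it
        self.accepted.append(frame.payload)
        return True
```

The receiver can only tell that a frame was corrupted somewhere upstream, not where, which is why the reporting drive is usually not the faulty component.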
frame in count / frame out count
These counts represent the total number of frames
received and transmitted by a device on the loop.
The number of frames received by the Host Adapter
is equal to the sum of all of the frames transmitted
from all of the disks. Similarly, the number
of frames transmitted by the Host Adapter is equal
to the sum of all frames received by all of the
disks.
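The relationship between the host adapter's frame counts and the drives' frame counts can be checked with a short calculation. In this Python sketch the function name and the (frame_in, frame_out) tuple layout are assumptions; it simply reports how far each side is from the sums described above.

```python
def frame_balance(ha_in: int, ha_out: int,
                  disks: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Compare host adapter frame counts against the drives' counts.

    disks maps a loop ID such as "4.29" to (frame_in, frame_out).
    Returns (ha_in - sum of disk frames out, ha_out - sum of disk frames in);
    both values are zero when every frame is accounted for.
    """
    disks_in = sum(frame_in for frame_in, _ in disks.values())
    disks_out = sum(frame_out for _, frame_out in disks.values())
    return ha_in - disks_out, ha_out - disks_in
```

Drive counters persist across filer reboots while host adapter counters are reset, so on a running filer the two sides will usually differ unless the drives were power-cycled when the host adapter counters were last reset.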
The occurrence of any of these error events may disrupt
the loop. A link failure is considered the most serious,
since it may indicate a transmitter problem that is
affecting loop signal integrity upstream of the drive.
These events typically cause frames to be dropped and
may result in data underruns or SCSI command timeouts.
Note that loop disruptions of this type do not result
in data corruption, even though they may cause data
underruns and/or SCSI command timeouts: the host adapter
driver detects such events and retries the associated
commands. The worst-case effect is a negligible drop in
performance.
All drive counters are persistent across filer reboots and
drive resets and can only be cleared by power-cycling the
drives. Host adapter counters, e.g. underruns, are reset
with each reboot.
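Because drive counters accumulate until the drives are power-cycled, a new problem is easier to spot by comparing two snapshots of link_stats output taken some time apart than by reading absolute values. The Python sketch below assumes the output has already been parsed into nested dictionaries; the function name and counter keys are hypothetical. A counter that went down must have been reset in between (for example a host adapter counter after a reboot), so its delta is reported as None rather than as a negative number.

```python
from typing import Optional


def counter_deltas(before: dict[str, dict[str, int]],
                   after: dict[str, dict[str, int]]) -> dict[str, dict[str, Optional[int]]]:
    """Per-device change in each counter between two snapshots.

    Only devices and counters present in both snapshots are compared.
    A decrease means the counter was reset between snapshots, so the
    delta is unknown and reported as None. (A reset followed by enough
    new events to exceed the old value cannot be detected this way.)
    """
    deltas: dict[str, dict[str, Optional[int]]] = {}
    for device in sorted(before.keys() & after.keys()):
        row: dict[str, Optional[int]] = {}
        for name, old in before[device].items():
            if name in after[device]:
                diff = after[device][name] - old
                row[name] = diff if diff >= 0 else None
        deltas[device] = row
    return deltas
```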
The fcal_stats sub-command displays statistics that the
Fibre Channel host adapter driver maintains on various
error conditions, exception conditions, and handler code
paths executed. In general, interpreting the fields
requires an understanding of the internal workings of the
driver. However, some of the per-drive counts (e.g.
device_underrun_cnt, device_over_run_cnt,
device_timeout_cnt) may help identify potentially
problematic drives. These counts are not persistent
across filer reboots.
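The per-drive counters named above lend themselves to a simple ranking. The output format of fcal_stats is not shown on this page, so this Python sketch assumes the counts have already been extracted into a dictionary keyed by drive; the function name is hypothetical, and weighting all three counters equally is an arbitrary choice.

```python
# Counter names taken from the description above; their extraction from
# fcal_stats output is assumed, not shown.
PROBLEM_COUNTERS = ("device_underrun_cnt", "device_over_run_cnt", "device_timeout_cnt")


def rank_drives(per_drive: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Sort drives by the total of the per-drive error counters, highest
    first (ties broken by drive name). Drives with a total of zero are omitted."""
    totals = []
    for drive, counts in per_drive.items():
        total = sum(counts.get(name, 0) for name in PROBLEM_COUNTERS)
        if total:
            totals.append((drive, total))
    totals.sort(key=lambda item: (-item[1], item[0]))
    return totals
```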
A Fibre Channel loop, as the name implies, is a logically
closed loop from a frame transmission perspective. Consequently,
signal integrity problems caused by a component
upstream will be seen as problem symptoms by components
downstream.
The relative physical position of drives on a loop is not
necessarily directly related to their loop IDs (which are
in turn determined by the drive shelf IDs). The device_map
sub-command is therefore helpful in determining the relative
physical position of drives on the loop.
Two pieces of information are displayed: (a) the relative
physical position of each device on the loop, as if the loop
were one flat space, and (b) the mapping of devices to
shelves, to aid quick correlation of disk ID with shelf
tenancy.
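If device_map output is collected for later analysis, it can be turned into data with a small parser. The Python sketch below is written against the output format shown in the examples on this page and handles a single channel; the function name and return layout are assumptions, and other releases may format the output differently.

```python
from typing import Optional


def parse_device_map(text: str) -> tuple[list[int], dict[int, list[Optional[int]]]]:
    """Parse the device_map output for a single channel.

    Returns (positions, shelves):
    - positions: loop IDs in physical loop order (the "Translated Map")
    - shelves: shelf number -> slot contents, with None for `XXX' (empty)
      and `BYP' (bypassed) slots
    """
    positions: list[int] = []
    shelves: dict[int, list[Optional[int]]] = {}
    expected = None
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Translated Map:"):
            section = "map"
            expected = int(line.split()[-1])  # "... Port Count 37"
        elif line.startswith("Shelf mapping:"):
            section = "shelves"
        elif line.startswith("Initiators on this loop:"):
            section = None  # initiator list is not parsed here
        elif section == "map" and line:
            positions.extend(int(tok) for tok in line.split())
        elif section == "shelves" and line.startswith("Shelf "):
            label, _, slots = line.partition(":")
            shelves[int(label.split()[1])] = [
                None if tok in ("XXX", "BYP") else int(tok) for tok in slots.split()
            ]
    # Sanity check against the port count printed by fcstat.
    if expected is not None and len(positions) != expected:
        raise ValueError(f"expected {expected} ports, parsed {len(positions)}")
    return positions, shelves
```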
Diagnosing a possible problem using fcstat
Suppose a running filer shows symptoms of loop signal
integrity problems. For example, the syslog shows SCSI
commands being aborted (and retried) due to frame
parity/CRC errors.
To isolate the faulty component on this loop, we collect
the output of link_stats and device_map.
toaster> fcstat link_stats 4
Loop   Link     Underrun  Loss of  Invalid  Frame In   Frame Out
ID     Failure  count     sync     CRC      count      count
       count              count    count
4.29   0        0         180      0        787        2277
4.28   0        0         26       0        787        2277
4.27   0        0         3        0        787        2277
4.26   0        0         13       0        788        2274
4.25   0        0         27       0        779        2269
4.24   0        0         2        0        787        2277
4.23   0        0         11       0        786        2274
4.22   0        0         83       0        786        2274
4.21   0        0         3        0        786        2274
4.20   0        0         11       0        786        2274
4.19   0        0         14       0        779        2277
4.18   0        0         26       0        786        2274
4.17   0        0         10       0        787        2274
4.16   0        0         90       0        779        2269
4.45   0        0         12       0        183015     179886
4.44   0        0         16       0        1830107    17990797
4.43   0        0         7        11       1829974    17988806
4.42   0        0         13       33       1968944    18123526
4.41   0        0         14       23       1843636    17989836
4.40   0        0         13       11       1828782    17990036
4.39   0        0         14       138      4740596    18459648
4.38   0        0         11       27       1832428    17133866
4.37   0        0         43       22       1839572    17994200
4.36   0        0         13       130      4740446    18468932
4.35   0        0         11       23       1844301    17994200
4.34   0        0         14       25       1832428    17133866
4.33   0        0         26       29       1839572    17894220
4.32   0        0         110      31       1740446    18268912
4.61   0        0         50       23       1844301    17994200
4.60   0        0         12       21       1830150    18188148
4.59   0        0         16       19       1830107    17990997
4.58   0        0         7        27       1829974    17988904
4.57   0        0         13       25       1968944    18123526
4.50   0        0         14       19       1843636    17889830
4.49   0        0         13       22       1828782    18090042
4.48   0        0         114      130      4740596    18459648
4.ha   0        0         1        0        396255820  51468458
toaster> fcstat device_map 4
Loop Map for channel 4:
Translated Map: Port Count 37
7 29 28 27 26 25 24 23 22 21 20 19 18 17 16 45
44 43 42 41 40 39 38 37 36 35 34 33 32 61 60 59
58 57 50 49 48
Shelf mapping:
Shelf 1: 29 28 27 26 25 24 23 22 21 20 19 18 17 16
Shelf 2: 45 44 43 42 41 40 39 38 37 36 35 34 33 32
Shelf 3: 61 60 59 58 57 XXX XXX XXX XXX XXX XXX 50 49 48
From the output of device_map we see the following:
Drive 29 is the first component on the loop, immediately
downstream from the host adapter. (Note that the host
adapter port (7) always appears first on the position
map.)
Shelf 3 has 6 empty slots, represented by `XXX'. A slot
shown as `BYP' is bypassed by an embedded switched hub
(ESH).
Shelf 1 is connected to shelf 2 between drives 16 and 45.
Shelf 2 is connected to shelf 3 between drives 32 and 61.
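The shelf connection points listed above can also be derived mechanically: walk the translated map in order and note each place where consecutive drives belong to different shelves. In this Python sketch the function name is hypothetical, and the inputs are assumed to be the loop order and the shelf-to-slot mapping from device_map, with None marking empty or bypassed slots.

```python
from typing import Optional


def shelf_boundaries(positions: list[int],
                     shelves: dict[int, list[Optional[int]]]) -> list[tuple[int, int]]:
    """Return (last drive in one shelf, first drive in the next shelf) pairs
    in loop order. Loop IDs that belong to no shelf, such as the host
    adapter port, are ignored."""
    owner = {dev: shelf
             for shelf, slots in shelves.items()
             for dev in slots if dev is not None}
    drives = [dev for dev in positions if dev in owner]
    return [(a, b) for a, b in zip(drives, drives[1:]) if owner[a] != owner[b]]
```

Applied to the channel 4 map above, this yields the pairs (16, 45) and (32, 61).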
From the output of link_stats we can see the following:
Drive 4.29, the drive connected to the host adapter, has
the highest loss of sync count on the loop (180). Since
every filer reboot involves reinitialization of the host
adapters, we expect the first drive on the loop to see a
higher loss of sync count.
Disks 4.16 through 4.29 are probably spares as they have
relatively small frame counts.
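The "relatively small frame counts" test can be written as a rough heuristic. In this Python sketch the function name and the 1% cutoff are arbitrary assumptions; a low count only shows that a drive has carried little traffic since its counters were last cleared, so the result should be confirmed against the filer's own list of spare disks.

```python
def likely_spares(frame_in: dict[str, int], ratio: float = 0.01) -> list[str]:
    """Return drives whose frame-in count is below `ratio` times the
    busiest drive's count. The 1% default is an arbitrary cutoff."""
    if not frame_in:
        return []
    busiest = max(frame_in.values())
    return sorted(dev for dev, count in frame_in.items() if count < ratio * busiest)
```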
CRC errors are first reported by drive 4.43. Assuming
that a single cause is responsible for all the CRC errors,
the failing component is located between the Host Adapter
and drive 4.43.
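The same reasoning, scanning drives in the physical order reported by device_map, can be written as a short Python sketch. The function name and input layout are hypothetical; loop IDs are assumed to have the channel prefix (for example "4.") stripped so that they match the device_map numbers.

```python
from typing import Optional


def first_crc_reporter(positions: list[int],
                       invalid_crc: dict[int, int]) -> Optional[tuple[int, list[int]]]:
    """Find the first drive, in physical loop order, that reports CRC errors.

    positions: loop order from device_map, host adapter port first.
    invalid_crc: loop ID -> invalid CRC count from link_stats.
    Returns (reporting drive, positions from the HA up to the drive before it),
    or None if no drive reports CRC errors. Assuming a single cause, the
    fault lies in that segment or in the links joining it to the drive.
    """
    for index, dev in enumerate(positions):
        if invalid_crc.get(dev, 0) > 0:
            return dev, positions[:index]
    return None
```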
Since drive 4.43 is in shelf 2, it is possible that the
errors are being caused by faulty components connecting
the shelves. To isolate the problem, we want to see
whether it is related to any of the shelf connection
points. We can do this by running a disk write test on
the first shelf of disks with the following command. (This
command is only available in maintenance mode, so it will
be necessary to reboot.)
*> disktest -W -s 4:1
where:
-W      write workload, since CRC errors only occur on writes
-s 4:1  test only shelf 1 on adapter 4
If errors are seen when testing shelf 1, the faulty
component is most likely either the cable or the I/O module
between the host adapter and the first drive. If no errors
are seen when testing shelf 1, the test should be run on
shelf 2. If errors are seen when testing shelf 2, the faulty
component could be the connection between shelves 1 and 2.
A plan of action would involve (a) replacing the cables
between shelves 1 and 2, or between the HA and shelf 1,
and (b) replacing the I/O modules at the faulty connection
point.
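The decision procedure above amounts to testing shelves one at a time in loop order and stopping at the first shelf that shows errors. The Python sketch below encodes it under the same single-fault assumption; the function name and the (shelf ID, errors seen) input are hypothetical, and the results themselves would come from running disktest on each shelf in maintenance mode.

```python
from typing import Optional


def locate_fault(results: list[tuple[int, bool]]) -> Optional[str]:
    """results: (shelf ID, errors seen during disktest) in physical loop order.

    Returns a description of the suspect connection leading into the first
    shelf that showed errors, or None if no shelf showed errors.
    """
    previous = None
    for shelf, had_errors in results:
        if had_errors:
            if previous is None:
                return f"cable or I/O module between host adapter and shelf {shelf}"
            return f"connection between shelf {previous} and shelf {shelf}"
        previous = shelf
    return None
```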
Example of link statistics for Shared Storage configurations
The following link_stats output shows a Shared Storage configuration:
ferris> fcstat link_stats
Targets on channel 4a:
Loop            Link     Underrun  Loss of  Invalid  Frame In   Frame Out
ID              Failure  count     sync     CRC      count      count
                count              count    count
4a.80           1        0         9        0        0          0
4a.81           1        0         3        0        0          0
4a.82           1        0         13       0        0          0
4a.83           1        0         3        0        0          0
4a.84           1        0         3        0        0          0
4a.86           1        0         3        0        0          0
4a.87           1        0         3        0        0          0
4a.88           1        0         3        0        0          0
4a.89           1        0         3        0        0          0
4a.91           1        0         10       0        0          0
4a.92           1        0         3        0        0          0
4a.93           1        0         264      0        0          0
Initiators on channel 4a:
Loop            Link     Underrun  Loss of  Invalid  Frame In   Frame Out
ID              Failure  count     sync     CRC      count      count
                count              count    count
4a.0 (self)     0        0         0        0        0          0
4a.7 (toaster)  0        0         0        0        0          0
From the output of link_stats we see the following:
The local filer has a loop ID of 0 on this loop, and the
filer named toaster has a loop ID of 7.
Example of a device map for Shared Storage configurations
The following device map shows a Shared Storage configuration:
ferris> fcstat device_map
Loop Map for channel 4a:
Translated Map: Port Count 14
0 80 81 82 83 84 86 87 88 89 91 92 93 7
Shelf mapping:
Shelf 5: 93 92 91 XXX 89 88 87 86 XXX 84 83 82 81 80
Initiators on this loop:
0 (self) 7 (toaster)
From the output of device_map we see the following:
Channel 4a is attached to shelf 5, in which two slots
(where loop IDs 90 and 85 would appear) are empty, as
shown by `XXX'. Two filers are connected to this loop:
the local filer (self) has loop ID 0 and appears first
on the position map, and filer toaster has loop ID 7
and appears last.
Example of a device map for switch attached drives
The following device map shows a configuration where a set
of shelves is connected via a switch:
toaster> fcstat device_map
Loop Map for channel 9:
Translated Map: Port Count 43
7 32 33 34 35 36 37 38 39 40 41 42 43 44 45 16
17 18 19 20 21 22 23 24 25 26 27 28 29 64 65 66
67 68 69 70 71 72 73 74 75 76 77
Shelf mapping:
Shelf 1: 29 28 27 26 25 24 23 22 21 20 19 18 17 16
Shelf 2: 45 44 43 42 41 40 39 38 37 36 35 34 33 32
Shelf 4: 77 76 75 74 73 72 71 70 69 68 67 66 65 64
Loop Map for channel sw2:0:
Translated Map: Port Count 15
126 93 92 89 91 90 88 87 86 85 84 83 80 82 81
Shelf mapping:
Shelf 5: 93 92 91 90 89 88 87 86 85 84 83 82 81 80
From the output of device_map we see the following:
The first set of shelves is connected to a host adapter in
slot 9.
The disks of shelf 5 are connected to port 0 of switch
`sw2'. The switch port has loop ID 126 and appears first in
the translated map.
Statistics are maintained symmetrically for primary and
partner loops.