The in.mpathd daemon performs Network Interface Card (NIC) failure and repair detection. In the event of a NIC failure, it causes IP network access from the failed NIC to fail over to a standby NIC, if available, or to any other operational NIC that has been configured as part of the same network multipathing group. Once the failed NIC is repaired, all network access is restored to the repaired NIC.
The in.mpathd daemon can detect NIC failure and repair through two methods: by monitoring the IFF_RUNNING flag for each NIC (link-based failure detection), and by sending and receiving ICMP echo requests and replies on each NIC (probe-based failure detection). Link-based failure detection requires no explicit configuration and thus is always enabled (provided the NIC driver supports the feature); probe-based failure detection must be enabled through the configuration of one or more test addresses (described below), but has the benefit of testing the entire NIC send and receive path.
If only link-based failure detection is enabled, then the health of the interface is determined solely from the state of the IFF_RUNNING flag. Otherwise, the interface is considered failed if either of the two methods indicates a failure, and repaired once both methods indicate the failure has been corrected. Not all interfaces in a group need to be configured with the same failure detection methods.
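The decision rule above can be sketched as a small function. This is only an illustration of the stated logic, not in.mpathd's implementation; the function name and its parameters are hypothetical.

```python
from typing import Optional


def interface_failed(link_running: bool, probe_ok: Optional[bool]) -> bool:
    """Return True if the interface should be treated as failed.

    link_running: state of the IFF_RUNNING flag (True means link up).
    probe_ok:     result of probe-based detection, or None when no test
                  address is configured (probe-based detection disabled).
    """
    link_failed = not link_running
    if probe_ok is None:
        # Only link-based detection is enabled: IFF_RUNNING decides alone.
        return link_failed
    # Both methods enabled: failed if either reports a failure, which also
    # means repaired only when both report the interface as healthy.
    return link_failed or not probe_ok
```

For instance, an interface whose link is up but whose probes go unanswered is still treated as failed under this rule.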
As mentioned above, in order to perform probe-based failure detection, in.mpathd needs a special test address on each NIC for the purpose of sending and receiving probes on the NIC. Use the -failover option of the ifconfig command to configure these test addresses. See ifconfig(1M). The test address must belong to a subnet that is known to the hosts and routers on the link.
The in.mpathd daemon can detect NIC failure and repair by two methods: by sending and receiving ICMP echo requests and replies on each NIC, and by monitoring the IFF_RUNNING flag for each NIC. The link state on some models of NIC is indicated by the IFF_RUNNING flag, allowing for faster failure detection when the link goes down. The in.mpathd daemon considers a NIC to have failed if either of the above two methods indicates failure. A NIC is considered to be repaired only if both methods indicate the NIC is repaired.
The in.mpathd daemon sends the ICMP echo request probes to on-link routers. If no routers are available, it sends the probes to neighboring hosts. Thus, for network failure detection and repair, there must be at least one neighbor on each link that responds to ICMP echo request probes.
in.mpathd works on both IPv4 and IPv6. If IPv4 is plumbed on a NIC, an IPv4 test address is configured on the NIC, and the NIC is configured as part of a network multipathing group, then in.mpathd will start sending ICMP probes on the NIC using IPv4.
In the case of IPv6, the link-local address must be configured as the test address. The in.mpathd daemon will not accept a non-link-local address as a test address. If the NIC is part of a multipathing group, and the test address has been configured, then in.mpathd will probe the NIC for failures using IPv6.
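As a rough illustration of the link-local requirement, a candidate IPv6 test address could be checked before it is configured. The sketch below uses Python's standard ipaddress module; the helper name is hypothetical and the check is not the one in.mpathd itself performs.

```python
import ipaddress


def is_valid_ipv6_test_address(addr: str) -> bool:
    """Return True if addr is an IPv6 link-local address (fe80::/10)."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 6 and ip.is_link_local
```

A global address such as 2001:db8::1 fails this check, which matches the rule that a non-link-local address is not accepted as an IPv6 test address.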
Even if both the IPv4 and IPv6 protocol streams are plumbed, it is sufficient to configure only one of the two, that is, either an IPv4 test address or an IPv6 test address on a NIC. If only an IPv4 test address is configured, in.mpathd probes using only ICMPv4. If only an IPv6 test address is configured, it probes using only ICMPv6. If both types of test address are configured, it probes using both ICMPv4 and ICMPv6.
The in.mpathd daemon accesses three variable values in /etc/default/mpathd: FAILURE_DETECTION_TIME, FAILBACK and TRACK_INTERFACES_ONLY_WITH_GROUPS.
The FAILURE_DETECTION_TIME variable specifies the NIC failure detection time for the ICMP echo request probe method of detecting NIC failure. The value is given in milliseconds. The shorter the failure detection time, the greater the volume of probe traffic. The default value of FAILURE_DETECTION_TIME is 10000 milliseconds (10 seconds). This means that NIC failure will be detected by in.mpathd within 10 seconds. NIC failures detected by the IFF_RUNNING flag being cleared are acted on as soon as the in.mpathd daemon notices the change in the flag. The NIC repair detection time cannot be configured; it is defined as double the value of FAILURE_DETECTION_TIME.
By default, in.mpathd does failure detection only on NICs that are configured as part of a multipathing group. You can set TRACK_INTERFACES_ONLY_WITH_GROUPS to no to enable failure detection by in.mpathd on all NICs, even if they are not part of a multipathing group. However, in.mpathd cannot do failover from a failed NIC if it is not part of a multipathing group.
The in.mpathd daemon restores network traffic to the previously failed NIC after it has detected a NIC repair. To disable this, set the value of FAILBACK to no in /etc/default/mpathd.
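The handling of the three variables can be sketched as a small parser. This is an illustration only: the NAME=value syntax with # comments, the function names, and the choice to keep the default when a boolean value is invalid are assumptions. The default of 10000, the minimum of 100 milliseconds, and the yes/no values come from this page and its diagnostics.

```python
DEFAULT_FDT_MS = 10000  # default failure detection time, in milliseconds
MIN_FDT_MS = 100        # minimum accepted failure detection time


def parse_mpathd_defaults(text: str) -> dict:
    """Parse NAME=value lines (assumed syntax) into effective settings."""
    settings = {
        "FAILURE_DETECTION_TIME": DEFAULT_FDT_MS,
        "FAILBACK": True,
        "TRACK_INTERFACES_ONLY_WITH_GROUPS": True,
    }
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "FAILURE_DETECTION_TIME":
            try:
                ms = int(value)
            except ValueError:
                ms = DEFAULT_FDT_MS  # invalid value: assume the default
            settings[key] = max(ms, MIN_FDT_MS)  # too small: assume the minimum
        elif key in ("FAILBACK", "TRACK_INTERFACES_ONLY_WITH_GROUPS"):
            if value in ("yes", "no"):
                settings[key] = value == "yes"
            # assumption: any other value leaves the default in place
    return settings


def repair_detection_time_ms(settings: dict) -> int:
    """Repair detection time is double the failure detection time."""
    return 2 * settings["FAILURE_DETECTION_TIME"]
```

With the default settings, the repair detection time computed this way is 20000 milliseconds.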
/etc/default/mpathd Contains default values used by the in.mpathd daemon.
See attributes(5) for descriptions of the following attributes:
|ATTRIBUTE TYPE||ATTRIBUTE VALUE|
ifconfig(1M), attributes(5), icmp(7P), icmp6(7P),
System Administration Guide: IP Services
Test address address is not unique; disabling probe based failure detection on interface_name
For in.mpathd to perform probe-based failure detection, each test address in the group must be unique. Since the IPv6 test address is a link-local address derived from the MAC address, each IP interface in the group must have a unique MAC address.
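To see why duplicate MAC addresses lead to duplicate IPv6 test addresses, the sketch below derives a link-local address from a MAC address. It assumes the common modified EUI-64 scheme (as described in RFC 4291); this page says only that the address is derived from the MAC address, so the exact derivation is an assumption, and the function name is hypothetical.

```python
import ipaddress


def link_local_from_mac(mac: str) -> str:
    """Build an fe80::/64 address from a colon-separated 48-bit MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("expected a 48-bit MAC address such as 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    packed = bytes([0xFE, 0x80] + [0] * 6 + interface_id)
    return str(ipaddress.IPv6Address(packed))
```

Under this assumption, two interfaces with the same MAC address always yield the same link-local address, which is the non-unique test address condition the diagnostic reports.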
NIC interface_name of group group_name is not plumbed for IPv[4|6] and may affect failover capability
All NICs in a multipathing group must be homogeneously plumbed. For example, if a NIC is plumbed for IPv4, then all NICs in the group must be plumbed for IPv4. The streams modules pushed on all NICs must be identical.
No test address configured on interface interface_name disabling probe-based failure detection on it
In order for in.mpathd to perform probe-based failure detection on a NIC, it must be configured with a test address: IPv4, IPv6, or both.
The link has come up on interface_name more than 2 times in the last minute; disabling failback until it stabilizes.
In order to prevent interfaces with intermittent hardware faults, such as a bad cable, from causing repeated failovers and failbacks, in.mpathd does not fail back to interfaces with frequently fluctuating link states.
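The threshold in this message (more than 2 link-up events in the last minute) can be pictured as a sliding-window counter. The class below is a hypothetical sketch of that idea, not the daemon's actual bookkeeping; the class and method names are illustrative.

```python
from collections import deque


class LinkFlapTracker:
    """Track link-up events and decide whether failback should be allowed."""

    def __init__(self, max_ups: int = 2, window_s: float = 60.0) -> None:
        self.max_ups = max_ups
        self.window_s = window_s
        self._ups = deque()  # timestamps (seconds) of recent link-up events

    def link_came_up(self, now: float) -> None:
        """Record a link-up event observed at time now."""
        self._ups.append(now)

    def failback_allowed(self, now: float) -> bool:
        """Return False while the link has come up too often in the window."""
        # Discard link-up events that are older than the window.
        while self._ups and now - self._ups[0] > self.window_s:
            self._ups.popleft()
        return len(self._ups) <= self.max_ups
```

In this sketch, a third link-up event within 60 seconds suppresses failback, and failback becomes possible again once enough of the earlier events age out of the window.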
Invalid failure detection time assuming default 10000
An invalid value was encountered for FAILURE_DETECTION_TIME in the /etc/default/mpathd file.
Too small failure detection time of time assuming minimum 100
The minimum value that can be specified for FAILURE_DETECTION_TIME is currently 100 milliseconds.
Invalid value for FAILBACK value
Valid values for the boolean variable FAILBACK are yes or no.
Invalid value for TRACK_INTERFACES_ONLY_WITH_GROUPS value
Valid values for the boolean variable TRACK_INTERFACES_ONLY_WITH_GROUPS are yes or no.
Cannot meet requested failure detection time of time ms on (inet interface_name) new failure detection time for group group_name is time ms
The round trip time for ICMP probes is higher than necessary to maintain the current failure detection time. The network is probably congested or the probe targets are loaded. in.mpathd automatically increases the failure detection time to whatever it can achieve under these conditions.
Improved failure detection time time ms on (inet interface_name) for group group_name
The round trip time for ICMP probes has now decreased and in.mpathd has lowered the failure detection time correspondingly.
NIC failure detected on interface_name
in.mpathd has detected NIC failure on interface_name, and has set the IFF_FAILED flag on NIC interface_name.
Successfully failed over from NIC interface_name1 to NIC interface_name2
in.mpathd has caused the network traffic to fail over from NIC interface_name1 to NIC interface_name2, which is part of the multipathing group.
NIC repair detected on interface_name
in.mpathd has detected that NIC interface_name is repaired and operational. If the IFF_FAILED flag on the NIC was previously set, it will be reset.
Successfully failed back to NIC interface_name
in.mpathd has restored network traffic back to NIC interface_name, which is now repaired and operational.
The link has gone down on interface_name
in.mpathd has detected that the IFF_RUNNING flag for NIC interface_name has been cleared, indicating the link has gone down.
The link has come up on interface_name
in.mpathd has detected that the IFF_RUNNING flag for NIC interface_name has been set, indicating the link has come up.