Intermittent missing symlinks with sas_mpath_snic_alias; possible timing issue? #29

Open
OsmiumBalloon opened this issue Jun 23, 2024 · 1 comment

Comments

@OsmiumBalloon

Summary

  • Using sas_mpath_snic_alias script in udev rules
  • Intermittently missing a handful of symlinks after udev runs
  • Adding delay to sas_mpath_snic_alias seems to alleviate the problem
  • Possibly the enclosure controllers are overwhelmed with too many requests at once?
  • In this report, some WWNs have been redacted to protect the guilty

Environment

  • Hardware
    • 2 x Broadcom HBA 9500-16e
    • 84 x Seagate ST20000NM002D disks
    • 2 x Supermicro CSE-847E2C-R1K23JBOD enclosures (w/ redundant expanders)
  • Software
    • Debian 12.5 "bookworm"
    • Kernel 6.1.0-21-amd64 / 6.1.90-1 (2024-05-03)
    • Python 3.11.2
    • multipath-tools 0.9.4-3+deb12u1
    • sasutils 0.5.0
  • SAS topology
    • Single SFF-8644 cable from each HBA, to an expander in each enclosure
    • Thus: Two SFF-8644 cables from host to each enclosure
    • The second host-facing SFF-8644 port on each expander is not used
    • Downstream daisy-chain ports on enclosures are not used

Configuration

  • /etc/multipath.conf says in part:
    • user_friendly_names no
    • find_multipaths yes
    • path_grouping_policy multibus
  • /etc/udev/rules.d/sasutils.rules says:
    • KERNEL=="dm-[0-9]*", PROGRAM="/usr/local/bin/sas_mpath_snic_alias_delayed %k", SYMLINK+="mapper/%c"
  • sg_ses has been used to assign nicknames to the enclosures, such as:
    • SHLF_1_FRNT_PRI (disk shelf 1, front backplane, primary expander)
    • SHLF_1_FRNT_SEC (disk shelf 1, front backplane, secondary expander)
    • SHLF_1_REAR_PRI (disk shelf 1, rear backplane, primary expander)
    • SHLF_2_FRNT_PRI (disk shelf 2, front backplane, primary expander)
  • The disks are not partitioned

Symptoms

  • I am expecting symlinks like /dev/mapper/SHLF_1_FRNT-bay00 to appear for every physical disk
  • Intermittently, a handful of these will be missing

Investigation

Good behaviors

  • Each disk appears twice at the SAS block layer (/dev/sd*)
  • All /dev/mapper/35000000000000000 symlinks always appear for all disks
  • I/O seems completely reliable; it is just the udev aliases that have trouble
  • sas_devices -v has always reported all devices and enclosures, with proper slots
  • lsscsi has always reported all devices
  • multipath -l has always reported all devices, with two paths per map
  • When the links are there, I/O works fine; almost 1 petabyte written

Problem behaviors

  • Not always the same nodes missing
  • Persists through multiple reboots and kernel updates
  • Persists through a full shutdown, power-off, and power-source-disconnect
  • Typically only 2 to 4 nodes missing, but once saw as many as 14
  • Lower numbered devices seem slightly more likely to be missing
    • For example, /dev/mapper/SHLF_1_FRNT-bay00 missing several times
    • However, connecting just one shelf does not make the problem go away
  • I have tried running udevadm trigger a few times; it has always caused the missing nodes to appear
  • Does not appear to be specific to multipath
    • Tried a single SAS cable per enclosure and using just sas_sd_snic_alias
    • Most disks then appeared as SHLF_1_FRNT_PRI-bay00
    • Still randomly and intermittently missing some nodes
    • A few links showed up with names like /dev/disk/by-bay/naa.5000000000000000-bay09
  • I set udev_log to debug in /etc/udev/udev.conf but the results have not been particularly illuminating
    • All I have seen is the occasional /etc/udev/rules.d/sasutils.rules:11 Command "/usr/local/bin/sas_mpath_snic_alias_delayed dm-0" returned 1 (error)
    • Nothing more informative
    • Not all dm- devices even appear in the log, even when everything is working (???)
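
A minimal sketch for spotting which maps lost their alias after udev settles. It groups the entries of /dev/mapper by the dm node they resolve to and reports any WWID-named link that has no alias link beside it. It assumes the entries are symlinks to ../dm-N, and both regular expressions are assumptions modelled on the names in this report, so adjust them for your site:

```python
import os
import re
from collections import defaultdict

# Assumptions: WWID-style map names are "3" followed by hex digits, and
# aliases look like "SHLF_1_FRNT-bay00". Both patterns are site-specific.
WWID_RE = re.compile(r"^3[0-9a-f]{16,32}$")
ALIAS_RE = re.compile(r"^SHLF_\d+_(FRNT|REAR)-bay\d+$")


def maps_missing_alias(mapper_dir="/dev/mapper"):
    """Return the WWID link names whose dm node has no alias symlink."""
    names_by_node = defaultdict(list)
    for name in os.listdir(mapper_dir):
        path = os.path.join(mapper_dir, name)
        if os.path.islink(path):  # skips non-link entries such as "control"
            names_by_node[os.path.realpath(path)].append(name)

    missing = []
    for names in names_by_node.values():
        wwids = [n for n in names if WWID_RE.match(n)]
        has_alias = any(ALIAS_RE.match(n) for n in names)
        if wwids and not has_alias:
            missing.extend(wwids)
    return sorted(missing)
```

Calling maps_missing_alias() after boot returns the WWIDs that still need an alias. In this report, running udevadm trigger made the missing links appear.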

Workaround

  • Introducing a delay in sas_mpath_snic_alias seems to have alleviated the problem
  • Theory is
    • udev events are firing for all 84 disks at once
    • Possibly udev events fire for each path, so 168 invocations
    • Plus invocations for all 84 multipath maps
    • Each invocation would then query the enclosure controllers independently
    • Hundreds of inquiries hitting the enclosure controller at once might have been too much for its little brain to handle
  • Delay is proportional to the multipath map number, so it should scale with more/fewer disks
  • An extremely large number of disks might still lead to udev timeouts
  • My implementation is kludgey and fragile

Details

  • Modified sas_mpath_snic_alias file itself
  • Added from time import sleep near top
  • Added delay proportional to dm-NN number passed as first argument, after sys.argv processing, but before load_entry_point, as follows:
sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
delay = sys.argv[1]     # assuming a single argument, I hope that's right
delay = delay[3::]      # extract number out of an argument like "dm-37"
delay = int(delay)      # make sure it is an integer
delay = delay * 0.04    # add 40 millisecond delay for each additional map
delay = delay + 0.25    # minimum 250 millisecond delay
sleep(delay)
sys.exit(load_entry_point('sasutils==0.5.0', 'console_scripts', 'sas_mpath_snic_alias')())

A proper solution would likely be in the main part of the code library, but I had neither the time nor the skill to delve that deeply.
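
As a hedged sketch only, the same delay calculation can be written as a small function that tolerates unexpected arguments, where the patch above would raise an exception in int() and make the udev PROGRAM fail. The function names and the fallback behavior are my assumptions, not part of sasutils; the constants come from the patch above:

```python
import re

BASE_DELAY = 0.25     # seconds; minimum delay used in the report
PER_MAP_DELAY = 0.04  # seconds added per dm map number, as in the report


def startup_delay(kernel_name, base=BASE_DELAY, per_map=PER_MAP_DELAY):
    """Return the delay in seconds for a udev kernel name such as "dm-37"."""
    match = re.fullmatch(r"dm-(\d+)", kernel_name)
    if match is None:
        # Unexpected argument: fall back to the minimum delay (an assumption;
        # the original patch would fail here instead).
        return base
    return base + per_map * int(match.group(1))


def highest_map_within(timeout, base=BASE_DELAY, per_map=PER_MAP_DELAY):
    """Largest dm number whose delay still fits inside `timeout` seconds."""
    return int((timeout - base) / per_map)
```

With these constants, dm-83 waits about 3.57 seconds. The second helper estimates where the scheme stops scaling: with a hypothetical 180-second budget per udev event (check the actual timeout on your system), map numbers above roughly 4493 would exceed it.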

@thiell
Member

thiell commented Jun 24, 2024

Thanks for your detailed report @OsmiumBalloon. I have also noticed such issues on very large systems. I am not sure whether the root cause lies with udev, the mpt3sas driver, the enclosure SES controller (its "little brain", indeed), or the kernel itself. Like you, I have found that the udev logs are not really helpful...

I work around this by wrapping sas_mpath_snic_alias in a script that calls it once and retries up to 3 more times, with a random delay, whenever the returned alias is not what we expect. The udev rule then calls this wrapper, for example:

KERNEL=="dm-[0-9]*", PROGRAM="/usr/bin/oak_udev_sas_mpath_snic_alias %k", SYMLINK+="mapper/%c"

The wrapper script /usr/bin/oak_udev_sas_mpath_snic_alias is:

#!/bin/bash

DBGFILE=/tmp/udev_sas_mpath_snic_alias.log
DEV=$1

# Try up to 4 times (1 initial call + 3 retries) until the alias looks valid.
for i in {1..4}
do
    alias=$(/usr/bin/sas_mpath_snic_alias "$DEV" 2>>"$DBGFILE")
    if [[ $alias =~ ^io[0-9]+-jbod[1-8]-bay[0-9]+$ ]]; then           # <<< change alias regex here
        echo "$DEV: alias \"$alias\" accepted" >>"$DBGFILE"
        break
    else
        echo "$DEV: alias \"$alias\" not valid" >>"$DBGFILE"
        usleep $(( 10 + RANDOM * 50 ))    # random pause, 10 µs to about 1.6 s
    fi
done

echo "$alias"
With this wrapper, I have been able to reliably get all aliases set up a few minutes after boot time. However, it seems to be a little bit too specific to be integrated into sasutils, but let me know what you think.
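
For sites that would rather keep the wrapper in Python, here is a rough equivalent of the retry loop above. It is a sketch under assumptions, not part of sasutils: the alias pattern is modelled on the SHLF_… names from the original report, the command path is a guess, and the function names are hypothetical. Unlike the bash version, it does not sleep after the final failed attempt.

```python
import random
import re
import subprocess
import time

# Assumptions: site-specific alias pattern and install path; change both.
ALIAS_RE = re.compile(r"^SHLF_\d+_(FRNT|REAR)-bay\d+$")
ALIAS_CMD = "/usr/bin/sas_mpath_snic_alias"


def query_alias(dev):
    """Run the alias tool once and return its stripped standard output."""
    result = subprocess.run([ALIAS_CMD, dev], capture_output=True, text=True)
    return result.stdout.strip()


def alias_with_retry(dev, query=query_alias, attempts=4, max_sleep=1.6,
                     sleep=time.sleep):
    """Return the first alias matching ALIAS_RE, else the last one seen."""
    alias = ""
    for attempt in range(attempts):
        alias = query(dev)
        if ALIAS_RE.match(alias):
            break
        if attempt < attempts - 1:
            # Random back-off so retries from many events do not line up.
            sleep(random.uniform(0.01, max_sleep))
    return alias
```

A script used as the udev PROGRAM would print alias_with_retry(sys.argv[1]) and exit. The query and sleep parameters exist only so the loop can be exercised without real hardware.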
