Bug 2140017 - need kernel parameter to keep disk order consistent
Summary: need kernel parameter to keep disk order consistent
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 9.3
Assignee: Ewan D. Milne
QA Contact: ChanghuiZhong
URL:
Whiteboard:
Duplicates: 2161940
Depends On:
Blocks: 2094166 2094167 2133507 2249671
Reported: 2022-11-04 07:15 UTC by tbsky
Modified: 2024-01-25 20:49 UTC
CC List: 13 users

Fixed In Version: kernel-5.14.0-344.el9
Doc Type: If docs needed, set a value
Doc Text:
Feature: New kernel parameter to keep disk order consistent.

Reason: Modern Linux kernels, including RHEL 9, use asynchronous device probing to speed up boot time. This can result in different device number assignments for SCSI devices (sda, sdb, etc.) on successive boots. Red Hat documentation recommends the use of persistent device names (/dev/disk/by-id links) to ensure that the correct device is used. This is particularly important for SAN-attached devices, which may not all be present at boot time. However, the variability in device numbering can now more commonly occur even in systems with only local disks present. To improve consistency in SCSI disk device numbering, a new kernel option "sd_mod.probe=sync" has been added to use synchronous device probing instead of asynchronous device probing for SCSI devices.

Result: With the "sd_mod.probe=sync" option, SCSI device enumeration is performed synchronously, which reduces the variability in device numbering on successive boots.

NOTE: Even with synchronous SCSI disk probing, it is still possible for the device numbering (and the sda, sdb, etc. device names) to change on successive boots. For example, a disk may fail to respond, or a RAID controller configuration may have changed. For this reason, Red Hat continues to strongly recommend the use of persistent device names (/dev/disk/by-id links). This module option is primarily being provided to assist customers migrating from earlier versions of RHEL to RHEL 9 and may be removed in a future major release.
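
As a rough illustration of the option described in the Doc Text, the following Python sketch works out which probe mode a kernel command line asks for. The function names are hypothetical. The "async" default and the exact-match test on "sync" mirror the patch in comment 6, the sysfs path is the one read in comment 28, and the "last occurrence wins" behavior is an assumption of this sketch.

```python
from pathlib import Path
from typing import Optional

# Path read by QA in comment 28; it exists only on kernels that carry the RHEL patch.
SYSFS_PARAM = Path("/sys/module/sd_mod/parameters/probe")


def effective_probe_mode(cmdline: str) -> str:
    """Return "sync" or "async" for a kernel command line string.

    Mirrors the patch in comment 6: only the exact value "sync" switches
    the sd driver to synchronous probing; anything else stays "async".
    Assumption of this sketch: if the option appears more than once,
    the last occurrence wins.
    """
    value = "async"
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "sd_mod.probe":
            value = val
    return "sync" if value == "sync" else "async"


def active_probe_mode() -> Optional[str]:
    """Read the live module parameter, or return None if the file is absent."""
    try:
        return SYSFS_PARAM.read_text().strip()
    except OSError:
        return None
```

On a running system, `effective_probe_mode(Path("/proc/cmdline").read_text())` can be compared with `active_probe_mode()` to confirm that the boot option took effect.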
Clone Of:
Clones: 2249671
Environment:
Last Closed: 2023-11-07 08:39:40 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
dmesg when last disk become sdb (107.75 KB, text/plain)
2023-12-19 17:23 UTC, tbsky
dmesg when last disk become sdc (107.75 KB, text/plain)
2023-12-19 17:25 UTC, tbsky
dmesg when last disk become sdd (107.12 KB, text/plain)
2023-12-19 17:25 UTC, tbsky


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/centos-stream/src/kernel centos-stream-9 merge_requests 2819 | 0 | None | opened | scsi: sd: Add "probe_type" module parameter to allow synchronous probing | 2023-07-17 17:32:06 UTC
Red Hat Issue Tracker | RHELPLAN-138306 | 0 | None | None | None | 2022-11-04 07:26:24 UTC
Red Hat Knowledge Base (Solution) | 2975361 | 0 | None | None | None | 2024-01-25 20:49:02 UTC
Red Hat Product Errata | RHSA-2023:6583 | 0 | None | None | None | 2023-11-07 08:40:19 UTC

Description tbsky 2022-11-04 07:15:52 UTC
Hi:
   Since kernel 5.3 the driver behavior has changed, so disk order may now be unpredictable. https://lore.kernel.org/lkml/59eedd28-25d4-7899-7c3c-89fe7fdd4b43@acm.org/t/#m6d134a012823377bb2ce91ea2350e8be9200ff91

   SUSE added a parameter for this: https://www.suse.com/support/kb/doc/?id=000018449

   But the SUSE parameter does not work under RHEL 9. I tried a similar parameter, "scsi_mod.scan=sync". It achieves about 90% stability on one 3-disk system, but on another 16-disk system that parameter is totally useless.

   We need a consistent disk order to apply commands like "smartctl -l scterc,70,70 /dev/sda". It is also convenient when a disk fails and needs to be hot-swapped. Otherwise we need to check /dev/disk/by-path to confirm the disk location.

Comment 1 Ewan D. Milne 2023-01-12 20:59:02 UTC
Are you able to use the persistent names in /dev/disk/by-id/xxxx or /dev/disk/by-path/xxx
for your application?

Generally speaking, we do not guarantee (and never have guaranteed) disk probe ordering, i.e. sda, sdb, etc.
The reason is that it may work for some simple environments, e.g. for local disks where the
drivers are probed in the same order, but it does not always work, e.g. for SAN-attached devices.

Can you provide log files of the 16 disk system showing (A) the desired order and (B) when
a different order resulted when "scsi_mod.scan=sync" was used?

Comment 2 tbsky 2023-01-13 01:27:38 UTC
Hi:
   /dev/disk/by-id and /dev/disk/by-path are not good for daily use, as in the software RAID output below. It would be a terrible list without simple disk names. Even the Anaconda installer needs these simple disk names.

[root@love-2 by-path]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sdf1[1] sdg1[3] sdj1[6] sdc1[0] sdk1[8] sdh1[4] sdp1[5] sdi1[7]
      27348205248 blocks super 1.2 level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
      bitmap: 0/30 pages [0KB], 65536KB chunk

md3 : active raid5 sdl1[0] sdn1[2] sdq1[5] sdm1[1] sdo1[3]
      31255572480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 3/59 pages [12KB], 65536KB chunk

md1 : active raid6 sda3[4] sdb3[1] sdd3[2] sde3[3]
      7813154432 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 4/30 pages [16KB], 65536KB chunk

md0 : active raid1 sda2[4] sdd2[2] sdb2[1] sde2[3]
      308160 blocks super 1.0 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

   Below is the disk order with the parameter "scsi_mod.scan=sync". It is a 32-bay disk enclosure that has two SAS expanders, front and back, with a single SAS I/O card. It has 16 disks installed now. Most systems get consistent names if the disk names (sda, sdb) follow the physical names (e.g., disk by-path name order). That was the logic under RHEL 7/8.

[root@love-2 by-path]# ls -l
total 0
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy12-lun-0 -> ../../sda
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy12-lun-0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy12-lun-0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy12-lun-0-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy13-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy13-lun-0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy13-lun-0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy13-lun-0-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy12-lun-0 -> ../../sdd
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy12-lun-0-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy12-lun-0-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy12-lun-0-part3 -> ../../sdd3
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy13-lun-0 -> ../../sde
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy13-lun-0-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy13-lun-0-part2 -> ../../sde2
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy13-lun-0-part3 -> ../../sde3
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy14-lun-0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy14-lun-0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy15-lun-0 -> ../../sdf
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy15-lun-0-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy16-lun-0 -> ../../sdg
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy16-lun-0-part1 -> ../../sdg1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy17-lun-0 -> ../../sdh
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy17-lun-0-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy18-lun-0 -> ../../sdi
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy18-lun-0-part1 -> ../../sdi1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy19-lun-0 -> ../../sdj
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy19-lun-0-part1 -> ../../sdj1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy20-lun-0 -> ../../sdp
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy20-lun-0-part1 -> ../../sdp1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy21-lun-0 -> ../../sdk
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy21-lun-0-part1 -> ../../sdk1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy22-lun-0 -> ../../sdl
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy22-lun-0-part1 -> ../../sdl1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy23-lun-0 -> ../../sdm
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy23-lun-0-part1 -> ../../sdm1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy24-lun-0 -> ../../sdn
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy24-lun-0-part1 -> ../../sdn1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy25-lun-0 -> ../../sdo
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy25-lun-0-part1 -> ../../sdo1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy26-lun-0 -> ../../sdq
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy26-lun-0-part1 -> ../../sdq1
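
A listing like the one above can be checked mechanically. The sketch below is a minimal Python example, assuming the text comes from `ls -l` in /dev/disk/by-path (already sorted by link name) and that partition links contain "-part". The helper names are hypothetical. It reports whole-disk links whose kernel name differs from the name they would get if names simply followed the listing order.

```python
import re
from typing import List, Tuple

# Matches the "<link name> -> ../../<kernel name>" tail of an `ls -l` line.
LINK_RE = re.compile(r"(\S+) -> \.\./\.\./(\S+)$")


def whole_disk_links(ls_output: str) -> List[Tuple[str, str]]:
    """Extract (by-path name, kernel name) pairs for whole disks only."""
    pairs = []
    for line in ls_output.splitlines():
        match = LINK_RE.search(line.strip())
        if match and "-part" not in match.group(1):
            pairs.append((match.group(1), match.group(2)))
    return pairs


def kernel_name_key(name: str) -> Tuple[int, str]:
    """Sort sda..sdz before sdaa..sdzz."""
    return (len(name), name)


def out_of_order(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (path, actual, expected) where names do not follow listing order."""
    expected = sorted((name for _, name in pairs), key=kernel_name_key)
    return [
        (path, name, want)
        for (path, name), want in zip(pairs, expected)
        if name != want
    ]
```

Applied to the listing above, this would flag, among others, the phy14 slot that received sdc and the phy20 slot that received sdp.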

Comment 3 tbsky 2023-02-14 05:49:16 UTC
Hi:
   Will the new kernel parameter "<modulename>.async_probe" help in this situation? I saw this post: https://lore.kernel.org/lkml/Yr9yCMsB1HJ1NEuF.org/T/

   I don't know whether "sd.async_probe=0" would make the disk order consistent again, if a newer kernel supports the parameter.

Comment 4 Ewan D. Milne 2023-02-14 19:56:10 UTC
Unfortunately I believe that parameter can only be used to force async probing for
modules that did not specify it in the driver template.  If the driver (like sd)
specifies async probing, the parameter will not override that.

I'll check to make sure, but from code inspection that appears to be the case.
The intent of the kernel developer community was to eventually make everything async.

I'm looking into whether I can get away with adding an sd_mod parameter, but this
would likely not be accepted upstream, which means I would need to justify a RHEL-only
change which we would have to carry forward in future versions.

Comment 5 Ewan D. Milne 2023-02-14 21:23:30 UTC
static bool driver_allows_async_probing(struct device_driver *drv)
{
        switch (drv->probe_type) {
        case PROBE_PREFER_ASYNCHRONOUS:                    <== if sd_template.probe_type = PROBE_PREFER_ASYNCHRONOUS
                return true;                                   then the code does not even check the module parameter

        case PROBE_FORCE_SYNCHRONOUS:
                return false;

        default:
                if (cmdline_requested_async_probing(drv->name))
                        return true;

                if (module_requested_async_probing(drv->owner))
                        return true;

                return false;
        }
}

static struct scsi_driver sd_template = {
        .gendrv = {
                .name           = "sd",
                .owner          = THIS_MODULE,
                .probe          = sd_probe,
                .probe_type     = PROBE_PREFER_ASYNCHRONOUS,          <===
                .remove         = sd_remove,
                .shutdown       = sd_shutdown,
                .pm             = &sd_pm_ops,
        },
        .rescan                 = sd_rescan,
        .init_command           = sd_init_command,
        .uninit_command         = sd_uninit_command,
        .done                   = sd_done,
        .eh_action              = sd_eh_action,
        .eh_reset               = sd_eh_reset,
};
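
The decision logic quoted above can be modeled in a few lines of Python to show why an async_probe parameter cannot turn asynchronous probing off for sd. This is only a model of the quoted switch statement, not kernel code: the two boolean arguments stand in for cmdline_requested_async_probing() and module_requested_async_probing(), and the DEFAULT_STRATEGY member name does not appear in the quoted snippet.

```python
from enum import Enum


class ProbeType(Enum):
    DEFAULT_STRATEGY = 0      # stands for the "default:" branch above
    PREFER_ASYNCHRONOUS = 1
    FORCE_SYNCHRONOUS = 2


def driver_allows_async_probing(probe_type, cmdline_async=False, module_async=False):
    """Model of the quoted C function, with plain booleans for the two checks."""
    if probe_type is ProbeType.PREFER_ASYNCHRONOUS:
        # The driver asked for async probing: the parameters are never consulted.
        return True
    if probe_type is ProbeType.FORCE_SYNCHRONOUS:
        return False
    # Only drivers with no stated preference are influenced by the parameters,
    # and the parameters can only turn async probing on, never off.
    return cmdline_async or module_async
```

Because sd_template sets PREFER_ASYNCHRONOUS, the first branch always returns True for sd, whatever the parameters say.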

Comment 6 Ewan D. Milne 2023-02-14 21:28:04 UTC
Something like this seems to work.  It makes the entire sd probe path
synchronous though, not just the first portion with the minor # allocation.
Unlike earlier kernels there would be no overlap of all the INQUIRY,
READ CAPACITY, etc commands that have to be issued for each device.
Might be OK for a small number of local devices though.


diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 441e73c7265c..b78ab120903d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -130,6 +130,15 @@ static const char *sd_cache_types[] = {
        "write back, no read (daft)"
 };
 
+static const char *sd_probe_types[] = { "async", "sync" };
+
+static char sd_probe_type[6] = "async";
+module_param_string(probe, sd_probe_type, sizeof(sd_probe_type),
+                   S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(probe, "async or sync. Setting to 'sync' disables asynchronous "
+                "device number assignments (sda, sdb, ...).");
+
 static void sd_set_flush_flag(struct scsi_disk *sdkp)
 {
        bool wc = false, fua = false;
@@ -3842,6 +3850,8 @@ static int __init init_sd(void)
                goto err_out_cache;
        }
 
+       if (!strcmp(sd_probe_type, "sync"))
+               sd_template.gendrv.probe_type = PROBE_FORCE_SYNCHRONOUS;
        err = scsi_register_driver(&sd_template.gendrv);
        if (err)
                goto err_out_driver;
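
To see why the probe type affects naming, here is a deliberately simplified Python toy model, not a description of the real kernel scheduler. It assumes each device reaches the point where its name is allocated after some per-device delay: in "async" mode names are handed out in order of arrival, while in "sync" mode they follow discovery order. The function names and the delay values are hypothetical.

```python
import string


def disk_names(count):
    """Return the first `count` names: sda, sdb, ... (single-letter suffixes only)."""
    if count > len(string.ascii_lowercase):
        raise ValueError("this sketch only models sda..sdz")
    return ["sd" + letter for letter in string.ascii_lowercase[:count]]


def assign_names(devices, mode):
    """Map device labels to disk names.

    devices: list of (label, delay) tuples in discovery order.
    mode: "sync" keeps discovery order; "async" orders by delay.
    """
    if mode == "sync":
        order = [label for label, _ in devices]
    elif mode == "async":
        # sorted() is stable, so devices with equal delays keep discovery order.
        order = [label for label, _ in sorted(devices, key=lambda dev: dev[1])]
    else:
        raise ValueError("mode must be 'sync' or 'async'")
    return dict(zip(order, disk_names(len(order))))
```

In this model a slow first disk keeps the name sda only in "sync" mode, which is the trade-off described in comment 6: predictable order at the cost of no overlap between per-device probes.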

Comment 7 tbsky 2023-02-15 08:04:49 UTC
Thanks a lot for your effort!
I wonder if people don't need simple device names anymore. I can live with "eth0" becoming "eno1" or even "enp1s0",
but "sda" becoming "pci-0000:07:00.0-sas-phy0-lun-0" seems too much. We still need a simple, consistent name when doing things like software RAID or SMART error monitoring.

Comment 8 Simon Matter 2023-03-02 08:11:05 UTC
I'm asking to increase the priority here because this behavior results in a very dangerous situation, IMHO.

Please see what happens in my case and I'm quite sure I'm not alone
https://lists.centos.org/pipermail/centos/2023-March/896737.html

With HPE Smart Array controlled disks, the resulting device links look something like this, but change with every reboot:
/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sda
/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdb1
/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdb2

Imagine what happens if you want to wipe the two partitions on /dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000...

That's really a critical thing which should not be possible.
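
A mismatch like the one shown above can be detected before running anything destructive. The following Python sketch is a minimal example under two assumptions: the links have already been read into a dictionary (for instance with os.readlink), and partitions follow the sdX plus number naming used by SCSI disks, which does not hold for NVMe names such as nvme0n1p1. The function name is hypothetical.

```python
import re


def mismatched_partitions(links):
    """Find partition links that point at a different disk than their parent link.

    links: dict mapping a by-id link name to the kernel device name it resolves to.
    Returns a list of (link, actual_target, expected_target) tuples.
    """
    problems = []
    for link, target in sorted(links.items()):
        match = re.fullmatch(r"(.+)-part(\d+)", link)
        if not match:
            continue  # whole-disk link, nothing to compare
        base, number = match.groups()
        disk = links.get(base)
        if disk is not None and target != disk + number:
            problems.append((link, target, disk + number))
    return problems


# The three links quoted in this comment.
example = {
    "scsi-0HP_LOGICAL_VOLUME_00000000": "sda",
    "scsi-0HP_LOGICAL_VOLUME_00000000-part1": "sdb1",
    "scsi-0HP_LOGICAL_VOLUME_00000000-part2": "sdb2",
}
```

For the example data, both partition links are reported, since they resolve to sdb1 and sdb2 while the whole-disk link resolves to sda.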


I'm still not exactly sure where the issue comes from. It may be that also 'sg3_utils' and 'dracut' are involved here. At least I found these bugs in them:

sg3_utils:
This file /usr/lib/udev/rules.d/65-scsi-cciss_id.rules calls 'cciss_id' but the program called is not shipped in the RPM. The spec file patch below should fix this:

--- sg3_utils.spec.orig	2022-06-15 14:03:29.000000000 +0200
+++ sg3_utils.spec	2023-03-01 10:38:54.691384321 +0100
@@ -102,6 +102,7 @@
 # need to run after 62-multipath.rules
 install -p -m 644 scripts/58-scsi-sg3_symlink.rules $RPM_BUILD_ROOT%{_udevrulesdir}/63-scsi-sg3_symlink.rules
 install -p -m 644 scripts/59-scsi-cciss_id.rules $RPM_BUILD_ROOT%{_udevrulesdir}/65-scsi-cciss_id.rules
+install -p -m 755 scripts/cciss_id $RPM_BUILD_ROOT%{_udevlibdir}
 install -p -m 644 scripts/59-fc-wwpn-id.rules $RPM_BUILD_ROOT%{_udevrulesdir}/63-fc-wwpn-id.rules
 install -p -m 755 scripts/fc_wwpn_id $RPM_BUILD_ROOT%{_udevlibdir}
 
@@ -113,6 +114,7 @@
 %{_udevrulesdir}/63-scsi-sg3_symlink.rules
 %{_udevrulesdir}/63-fc-wwpn-id.rules
 %{_udevrulesdir}/65-scsi-cciss_id.rules
+%{_udevlibdir}/cciss_id
 %{_udevrulesdir}/40-usb-blacklist.rules
 %{_udevlibdir}/fc_wwpn_id
 
dracut:
The file /usr/lib/dracut/modules.d/95udev-rules/module-setup.sh makes use of the following udev rules

55-scsi-sg3_id.rules
58-scsi-sg3_symlink.rules

But these files are renamed in EL9 to

61-scsi-sg3_id.rules
63-scsi-sg3_symlink.rules


I hope some of my input is helpful to fix the issue. Unfortunately, the server I used to test things won't be available for more tests soon.

Regards,
Simon

Comment 9 Ewan D. Milne 2023-05-16 20:47:05 UTC
I discussed this issue with the other upstream Linux SCSI maintainers
during the Linux Foundation LSF/MM conference last week.  James and Martin
will not accept a kernel patch to allow the sd device probing to return
to its prior synchronous behavior.  James' suggestion was, as I expected,
to use udev to provide some naming consistency (his example was to do what
the network devices do, and make the naming persist after first boot).
I pointed out that this did not solve the issue of newly added devices
e.g. with scsi_add_device() calls from mpt3sas but there was no agreement
that the kernel should be changed.

I am pursuing a RHEL-only change for RHEL 9 now.  However, that may not be
accepted either, since the upstream kernel maintainers do not agree and
the RHEL kernel team is trying to minimize upstream deviations.

Comment 10 tbsky 2023-05-17 05:08:08 UTC
Thanks again!

I hope RHEL can make this work as SUSE did.
I saw Arch and Debian users also complaining about this behavior in discussions, but they don't have a solution to overcome it.

Comment 15 Ewan D. Milne 2023-06-28 20:21:35 UTC
Still awaiting internal kernel team review/acceptance of RHEL-only patch.

Comment 20 Ewan D. Milne 2023-07-17 17:56:56 UTC
There are several other reports of this issue, and we are in the process of
merging the module parameter described in comment # 6 above into RHEL 9.
I would like to open this BZ up to all other interested parties.
Is that acceptable?

Comment 22 tbsky 2023-07-18 01:25:15 UTC
Hi:
   Thanks a lot for the effort! Please share the information as you like.
I was afraid that the patch wouldn't be accepted and that I would need to set up the disk names manually.
Thanks for the great news.

Comment 28 ChanghuiZhong 2023-07-21 05:17:47 UTC
I rebooted and tested 10 times, and the disk order stayed consistent every time.

[root@storageqe-102 ~]# uname -r
5.14.0-340.2819_935944297.el9.x86_64
[root@storageqe-102 ~]# cat  /sys/module/sd_mod/parameters/probe
sync
[root@storageqe-102 ~]# (cd /dev/disk/by-path && ls -l | grep /s)
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1.0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1.0-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-2.0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-3 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-3.0 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-4 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-4.0 -> ../../sdd
[root@storageqe-102 ~]#

Comment 34 ChanghuiZhong 2023-07-31 03:26:51 UTC
Rebooted and tested 10 times; the disk order stayed consistent every time.

[root@storageqe-103 ~]# uname -r
5.14.0-344.el9.x86_64
[root@storageqe-103 ~]# 
[root@storageqe-103 ~]# cat  /sys/module/sd_mod/parameters/probe
sync
[root@storageqe-103 ~]# 
[root@storageqe-103 ~]# (cd /dev/disk/by-path && ls -l | grep /s)
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-5 -> ../../sda
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-5.0 -> ../../sda
lrwxrwxrwx 1 root root 10 Jul 30 23:12 pci-0000:00:17.0-ata-5.0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 30 23:12 pci-0000:00:17.0-ata-5.0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jul 30 23:12 pci-0000:00:17.0-ata-5-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 30 23:12 pci-0000:00:17.0-ata-5-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-6 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-6.0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-7 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-7.0 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-8 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-8.0 -> ../../sdd
[root@storageqe-103 ~]#

Comment 37 John Meneghini 2023-09-07 15:23:46 UTC
*** Bug 2161940 has been marked as a duplicate of this bug. ***

Comment 39 errata-xmlrpc 2023-11-07 08:39:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6583

Comment 41 tbsky 2023-12-18 14:24:15 UTC
Hi:
   I upgraded several systems to RHEL 9.3 and noticed two things:

1. As John Meneghini said in the upstream merge request, I also hope the "nvme" module will get the same parameter: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2819

2. When there are unused SATA ports between hard disks, the disk order is strange and doesn't follow the PCI device ID sequence (but the order is still consistent on every boot). Below is an example from "/dev/disk/by-path" under RHEL 8 and RHEL 9 on the same server:

RHEL8:
lrwxrwxrwx 1 root root   9 Dec 16 07:31 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx 1 root root   9 Dec 16 07:31 pci-0000:00:17.0-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-2-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-2-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-2-part3 -> ../../sdb3
lrwxrwxrwx 1 root root   9 Dec 16 07:31 pci-0000:00:17.0-ata-3 -> ../../sdc
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-3-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-3-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  10 Dec 16 07:31 pci-0000:00:17.0-ata-3-part3 -> ../../sdc3
lrwxrwxrwx 1 root root   9 Dec 16 07:31 pci-0000:00:17.0-ata-8 -> ../../sdd
lrwxrwxrwx 1 root root  13 Dec 16 07:31 pci-0000:4b:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root  15 Dec 16 07:31 pci-0000:4b:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  13 Dec 16 07:31 pci-0000:4c:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root  15 Dec 16 07:31 pci-0000:4c:00.0-nvme-1-part1 -> ../../nvme1n1p1

RHEL9:
lrwxrwxrwx 1 root root   9 Dec 16 15:36 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root   9 Dec 16 15:36 pci-0000:00:17.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-1.0-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-1.0-part3 -> ../../sda3
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx 1 root root   9 Dec 16 15:36 pci-0000:00:17.0-ata-2 -> ../../sdc
lrwxrwxrwx 1 root root   9 Dec 16 15:36 pci-0000:00:17.0-ata-2.0 -> ../../sdc
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-2.0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-2.0-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-2.0-part3 -> ../../sdc3
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-2-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-2-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-2-part3 -> ../../sdc3
lrwxrwxrwx 1 root root   9 Dec 16 15:36 pci-0000:00:17.0-ata-3 -> ../../sdd
lrwxrwxrwx 1 root root   9 Dec 16 15:36 pci-0000:00:17.0-ata-3.0 -> ../../sdd
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-3.0-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-3.0-part2 -> ../../sdd2
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-3.0-part3 -> ../../sdd3
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-3-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-3-part2 -> ../../sdd2
lrwxrwxrwx 1 root root  10 Dec 16 15:36 pci-0000:00:17.0-ata-3-part3 -> ../../sdd3
lrwxrwxrwx 1 root root   9 Dec 16 15:36 pci-0000:00:17.0-ata-8 -> ../../sdb
lrwxrwxrwx 1 root root   9 Dec 16 15:36 pci-0000:00:17.0-ata-8.0 -> ../../sdb
lrwxrwxrwx 1 root root  13 Dec 16 15:36 pci-0000:4b:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root  15 Dec 16 15:36 pci-0000:4b:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  13 Dec 16 15:36 pci-0000:4c:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root  15 Dec 16 15:36 pci-0000:4c:00.0-nvme-1-part1 -> ../../nvme1n1p1
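
The difference between the two listings can be summarized with a small Python sketch. The function name is hypothetical, and the two dictionaries hold only the whole-disk SATA entries copied from the RHEL 8 and RHEL 9 listings above.

```python
def changed_names(before, after):
    """Return {path: (old_name, new_name)} for paths whose kernel name differs."""
    return {
        path: (before[path], after[path])
        for path in sorted(before.keys() & after.keys())
        if before[path] != after[path]
    }


# Whole-disk SATA links from the RHEL 8 listing.
rhel8 = {
    "pci-0000:00:17.0-ata-1": "sda",
    "pci-0000:00:17.0-ata-2": "sdb",
    "pci-0000:00:17.0-ata-3": "sdc",
    "pci-0000:00:17.0-ata-8": "sdd",
}
# The same links from the RHEL 9 listing.
rhel9 = {
    "pci-0000:00:17.0-ata-1": "sda",
    "pci-0000:00:17.0-ata-2": "sdc",
    "pci-0000:00:17.0-ata-3": "sdd",
    "pci-0000:00:17.0-ata-8": "sdb",
}

for path, (old, new) in changed_names(rhel8, rhel9).items():
    print(f"{path}: {old} -> {new}")
```

Only ata-1 keeps its name; the disk on ata-8 moves from sdd to sdb and pushes the disks on ata-2 and ata-3 down by one letter.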

Comment 42 Ewan D. Milne 2023-12-18 17:27:49 UTC
Thanks for trying this out and reporting your results.

NVMe has never had any consistency in minor device numbering (or in any numbering
of kernel objects).  For example, if you have multiple NVMe controller instances in a
fabric environment, you will notice that the numbering is basically never the same.
So we do not intend to have a similar module option for NVMe.

If you need to positively identify devices, /dev/disk/by-xxx is the "official" way
to do it.  Upstream won't accept changes to force synchronous probing, and the
mechanism in the driver core that this RHEL-specific module option for the
sd driver uses may go away, so we may end up having to deprecate the option
(e.g. it may get removed in a future major release).  We were really
only able to justify this RHEL-specific option to help ease the migration from RHEL 8
to RHEL 9 for people like yourself.

There was a recent discussion thread upstream when someone proposed changing the
driver core:

https://www.spinics.net/lists/linux-scsi/msg191542.html

See Greg K-H's response:

https://www.spinics.net/lists/linux-scsi/msg191602.html

If that's Greg's position, we're not likely to get any movement on an upstream change.

--

Note, the actual module option syntax is "sd_mod.probe=sync", I take it you used that?

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4e88f8acf4f9..e54e59553256 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -131,6 +131,12 @@ static const char *sd_cache_types[] = {
        "write back, no read (daft)"
 };
 
+static char sd_probe_type[6] = "async";
+module_param_string(probe, sd_probe_type, sizeof(sd_probe_type),
+                   S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(probe, "async or sync. Setting to 'sync' disables asynchronous "
+                "device number assignments (sda, sdb, ...).");
+
 static void sd_set_flush_flag(struct scsi_disk *sdkp)
 {
        bool wc = false, fua = false;
@@ -3871,6 +3877,8 @@ static int __init init_sd(void)
                goto err_out_ppool;
        }
 
+       if (!strcmp(sd_probe_type, "sync"))
+               sd_template.gendrv.probe_type = PROBE_FORCE_SYNCHRONOUS;
        err = scsi_register_driver(&sd_template.gendrv);
        if (err)
                goto err_out_driver;


If so, your case of the device ordering with the synchronous option being different
on RHEL 9.3 than on RHEL 8 is curious.  Could you attach a full dmesg of the boot on 9.3?
There could be something in the ATA code that explains this.

Comment 43 tbsky 2023-12-19 17:23:17 UTC
Created attachment 2005032
dmesg when last disk become sdb

Comment 44 tbsky 2023-12-19 17:25:04 UTC
Created attachment 2005033
dmesg when last disk become sdc

Comment 45 tbsky 2023-12-19 17:25:34 UTC
Created attachment 2005034
dmesg when last disk become sdd

Comment 46 tbsky 2023-12-19 17:43:28 UTC
Hi:
   When I checked the disk order again, I found it had changed to what I want (by-path order), so I reinstalled and rechecked to find out what caused that. Finally I found that the boot parameter "console=ttyS1,115200" triggers the correct by-path order. It's strange, so I attached the full dmesg logs.

   Without the parameter, the last hard disk becomes "sdb" about 9 times and "sdc" about 1 time in every 10 reboots.
   With the parameter, the last hard disk always becomes "sdd".
   Of course, all of these are under "sd_mod.probe=sync", as you can see in the dmesg logs.

   I think what administrators need is a short, persistent name like network device naming. Maybe I need to create udev rules, at least for NVMe disks, now. Is it possible for Red Hat to provide a naming suggestion like the one for network devices? It would be good to have a rule to follow so that results on every host look similar.

   Thanks again for your patience and information.
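
One possible shape for such udev rules is sketched below in Python, which only generates the rule text. Everything here is an assumption rather than a Red Hat recommendation: the function name, the "disk/by-alias" directory, and the alias names are hypothetical, and the sketch assumes that the udev property ID_PATH carries the same string as the /dev/disk/by-path link name and that the rules file is ordered after the stock persistent-storage rules. It has not been tested on RHEL 9.

```python
import re


def udev_alias_rules(aliases):
    """Build udev rule text that adds a short symlink for each whole disk.

    aliases: dict mapping an ID_PATH value (as seen in /dev/disk/by-path)
             to a short alias made of letters, digits, "-" or "_".
    """
    lines = []
    for id_path, alias in sorted(aliases.items()):
        if '"' in id_path:
            raise ValueError(f"unsupported character in ID_PATH: {id_path!r}")
        if not re.fullmatch(r"[A-Za-z0-9_-]+", alias):
            raise ValueError(f"unsupported alias: {alias!r}")
        lines.append(
            'SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", '
            f'ENV{{ID_PATH}}=="{id_path}", SYMLINK+="disk/by-alias/{alias}"'
        )
    return "\n".join(lines) + "\n"


# Hypothetical aliases for the two NVMe devices listed in comment 41.
print(udev_alias_rules({
    "pci-0000:4b:00.0-nvme-1": "nvme-a",
    "pci-0000:4c:00.0-nvme-1": "nvme-b",
}), end="")
```

The generated text would go into a file under /etc/udev/rules.d/, and the aliases would then appear under /dev/disk/by-alias/ if the assumptions above hold.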

