Bug 2140017
Summary: | need kernel parameter to keep disk order consistent | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | tbsky <tbskyd> | ||||||||
Component: | kernel | Assignee: | Ewan D. Milne <emilne> | ||||||||
kernel sub component: | Storage Drivers | QA Contact: | ChanghuiZhong <czhong> | ||||||||
Status: | CLOSED ERRATA | Docs Contact: | |||||||||
Severity: | medium | ||||||||||
Priority: | medium | CC: | cwei, czhong, emilne, jmagrini, jpittman, nweddle, orion, sadas, saydas, sgardner, simon.matter, sumit.saxena, wpinheir | ||||||||
Version: | 9.0 | Keywords: | Triaged, ZStream | ||||||||
Target Milestone: | beta | ||||||||||
Target Release: | 9.3 | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | kernel-5.14.0-344.el9 | Doc Type: | If docs needed, set a value | ||||||||
Doc Text: |
Feature: New kernel parameter to keep disk order consistent
Reason:
Modern Linux kernels, including RHEL 9, utilize asynchronous
device probing in order to speed up boot time. This
can result in different device number assignments for SCSI
devices (sda, sdb, etc.) on successive boot iterations.
Red Hat documentation recommends the use of persistent device
names (/dev/disk/by-id links) in order to ensure that the correct
device is used. This is particularly important for SAN-attached
devices which may not all be present at boot time.
However, the variability in device numbering can now more
commonly occur even in systems with only local disks present.
To improve consistency in SCSI disk device numbering, a new
kernel option "sd_mod.probe=sync" has been added to use
synchronous device probing instead of asynchronous device
probing for SCSI devices.
Result:
With the "sd_mod.probe=sync" option, SCSI device enumeration is now performed synchronously, which reduces the
variability in device numbering on successive boot iterations.
NOTE: Even with synchronous SCSI disk probing, it is still possible
for the device numbering (and the sda, sdb, etc. device names) to
change on successive boot iterations. For example, a disk may
fail to respond, or a RAID controller configuration may have
changed. For this reason, Red Hat continues to strongly
recommend the use of persistent device names (/dev/disk/by-id links).
This module option is primarily being provided to
assist customers migrating from earlier versions of RHEL to
RHEL 9 and may be removed in a future major release.
|
Story Points: | --- | ||||||||
Clone Of: | |||||||||||
: | 2249671 (view as bug list) | Environment: | |||||||||
Last Closed: | 2023-11-07 08:39:40 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 2094166, 2094167, 2133507, 2249671 | ||||||||||
Attachments: |
|
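The Doc Text above names the option "sd_mod.probe=sync", and the verification comments in this report read the active value from /sys/module/sd_mod/parameters/probe. As a minimal sketch (not part of the fix), the Python below shows one way to compare what was requested on the kernel command line with what the loaded module reports. The function names are hypothetical, and treating the last "sd_mod.probe=" token as the effective one is an assumption made for illustration.

```python
from pathlib import Path

# sysfs path taken from the verification comments in this report.
SYSFS_PROBE = Path("/sys/module/sd_mod/parameters/probe")


def requested_probe_mode(cmdline):
    """Return the value of the last sd_mod.probe=... token, or None if absent.

    Assumption for this sketch: if the option appears more than once,
    the last occurrence is the one that counts.
    """
    mode = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "sd_mod.probe":
            mode = value
    return mode


def active_probe_mode(sysfs_path=SYSFS_PROBE):
    """Return the value the loaded sd_mod module reports, or None if unreadable."""
    try:
        return sysfs_path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("requested on command line:", requested_probe_mode(cmdline))
    print("reported by sd_mod:", active_probe_mode())
```

On a kernel without this RHEL-specific option the sysfs file is expected to be missing, in which case active_probe_mode() returns None rather than raising.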
Description
tbsky
2022-11-04 07:15:52 UTC
Are you able to use the persistent names in /dev/disk/by-id/xxxx or /dev/disk/by-path/xxx for your application? Generally speaking we do not guarantee (and never have guaranteed) disk probe ordering, i.e. sda, sdb, etc. The reason is, it may work for some simple environments, e.g. for local disks where the drivers are probed in the same order, but it does not always work, e.g. for SAN-attached devices. Can you provide log files of the 16 disk system showing (A) the desired order and (B) when a different order resulted when "scsi_mod.scan=sync" was used?

Hi: /dev/disk/by-id and /dev/disk/by-path are not good for daily use, as with the software RAID below. It would be a terrible list without simple disk names. Even the anaconda installation needs these simple disk names.

[root@love-2 by-path]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sdf1[1] sdg1[3] sdj1[6] sdc1[0] sdk1[8] sdh1[4] sdp1[5] sdi1[7]
      27348205248 blocks super 1.2 level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
      bitmap: 0/30 pages [0KB], 65536KB chunk

md3 : active raid5 sdl1[0] sdn1[2] sdq1[5] sdm1[1] sdo1[3]
      31255572480 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 3/59 pages [12KB], 65536KB chunk

md1 : active raid6 sda3[4] sdb3[1] sdd3[2] sde3[3]
      7813154432 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 4/30 pages [16KB], 65536KB chunk

md0 : active raid1 sda2[4] sdd2[2] sdb2[1] sde2[3]
      308160 blocks super 1.0 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Below is the disk order with the parameter "scsi_mod.scan=sync". It is a 32 disk bay enclosure which has two SAS expanders, in front and back, with a single SAS I/O card. It has 16 disks installed now. Most systems can have consistent names if the disk names (sda, sdb) follow the physical names (e.g. the disk-by-path name order). That was the logic under RHEL7/8.

[root@love-2 by-path]# ls -l
total 0
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy12-lun-0 -> ../../sda
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy12-lun-0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy12-lun-0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy12-lun-0-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy13-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy13-lun-0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy13-lun-0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e4f53f-phy13-lun-0-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy12-lun-0 -> ../../sdd
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy12-lun-0-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy12-lun-0-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy12-lun-0-part3 -> ../../sdd3
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy13-lun-0 -> ../../sde
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy13-lun-0-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy13-lun-0-part2 -> ../../sde2
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy13-lun-0-part3 -> ../../sde3
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy14-lun-0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy14-lun-0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy15-lun-0 -> ../../sdf
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy15-lun-0-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy16-lun-0 -> ../../sdg
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy16-lun-0-part1 -> ../../sdg1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy17-lun-0 -> ../../sdh
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy17-lun-0-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy18-lun-0 -> ../../sdi
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy18-lun-0-part1 -> ../../sdi1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy19-lun-0 -> ../../sdj
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy19-lun-0-part1 -> ../../sdj1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy20-lun-0 -> ../../sdp
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy20-lun-0-part1 -> ../../sdp1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy21-lun-0 -> ../../sdk
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy21-lun-0-part1 -> ../../sdk1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy22-lun-0 -> ../../sdl
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy22-lun-0-part1 -> ../../sdl1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy23-lun-0 -> ../../sdm
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy23-lun-0-part1 -> ../../sdm1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy24-lun-0 -> ../../sdn
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy24-lun-0-part1 -> ../../sdn1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy25-lun-0 -> ../../sdo
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy25-lun-0-part1 -> ../../sdo1
lrwxrwxrwx 1 root root  9 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy26-lun-0 -> ../../sdq
lrwxrwxrwx 1 root root 10 Jan 13 00:30 pci-0000:03:00.0-sas-exp0x5003048017e9c33f-phy26-lun-0-part1 -> ../../sdq1

Hi: will the new kernel parameter "<modulename>.async_probe" help in this situation? I saw this post https://lore.kernel.org /lkml/Yr9yCMsB1HJ1NEuF.org/T/ I don't know if "sd.async_probe=0" would make the disk order consistent again if a newer kernel supports the parameter?

Unfortunately I believe that parameter can only be used to force async probing for modules that did not specify it in the driver template. If the driver (like sd) specifies async probing, the parameter will not override that. I'll check to make sure, but from code inspection that appears to be the case. The intent of the kernel developer community was to eventually make everything async. I'm looking into whether I can get away with adding an sd_mod parameter, but this would likely not be accepted upstream, which means I would need to justify a RHEL-only change which we would have to carry forward in future versions.

static bool driver_allows_async_probing(struct device_driver *drv)
{
	switch (drv->probe_type) {
	case PROBE_PREFER_ASYNCHRONOUS:    <== if sd_template.probe_type = PROBE_PREFER_ASYNCHRONOUS
		return true;                   then the code does not even check the module parameter

	case PROBE_FORCE_SYNCHRONOUS:
		return false;

	default:
		if (cmdline_requested_async_probing(drv->name))
			return true;

		if (module_requested_async_probing(drv->owner))
			return true;

		return false;
	}
}

static struct scsi_driver sd_template = {
	.gendrv = {
		.name		= "sd",
		.owner		= THIS_MODULE,
		.probe		= sd_probe,
		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,    <===
		.remove		= sd_remove,
		.shutdown	= sd_shutdown,
		.pm		= &sd_pm_ops,
	},
	.rescan			= sd_rescan,
	.init_command		= sd_init_command,
	.uninit_command		= sd_uninit_command,
	.done			= sd_done,
	.eh_action		= sd_eh_action,
	.eh_reset		= sd_eh_reset,
};

Something like this seems to work. It makes the entire sd probe path synchronous though, not just the first portion with the minor # allocation. Unlike earlier kernels there would be no overlap of all the INQUIRY, READ CAPACITY, etc. commands that have to be issued for each device. Might be OK for a small number of local devices though.

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 441e73c7265c..b78ab120903d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -130,6 +130,15 @@ static const char *sd_cache_types[] = {
 	"write back, no read (daft)"
 };
 
+static const char *sd_probe_types[] = { "async", "sync" };
+
+static char sd_probe_type[6] = "async";
+module_param_string(probe, sd_probe_type, sizeof(sd_probe_type),
+		    S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(probe, "async or sync. Setting to 'sync' disables asynchronous "
+		 "device number assignments (sda, sdb, ...).");
+
 static void sd_set_flush_flag(struct scsi_disk *sdkp)
 {
 	bool wc = false, fua = false;
@@ -3842,6 +3850,8 @@ static int __init init_sd(void)
 		goto err_out_cache;
 	}
 
+	if (!strcmp(sd_probe_type, "sync"))
+		sd_template.gendrv.probe_type = PROBE_FORCE_SYNCHRONOUS;
 	err = scsi_register_driver(&sd_template.gendrv);
 	if (err)
 		goto err_out_driver;

Thanks a lot for your effort! I wonder if people don't need simple device names anymore. I can live with "eth0" becoming "eno1" or even "enp1s0", but "sda" becoming "pci-0000:07:00.0-sas-phy0-lun-0" seems too much. We still need a simple consistent name when doing things like software RAID or SMART error monitoring.

I'm asking to increase the priority here because this behavior results in a very dangerous situation IMHO. Please see what happens in my case, and I'm quite sure I'm not alone:
https://lists.centos.org/pipermail/centos/2023-March/896737.html

The resulting devices in the case of HPE Smart Array controlled disks look something like this, but change with every reboot:

/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sda
/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdb1
/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdb2

Imagine what happens if you want to wipe the two partitions on /dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000... That's really a critical thing which should not be possible.

I'm still not exactly sure where the issue comes from. It may be that 'sg3_utils' and 'dracut' are also involved here. At least I found these bugs in them:

sg3_utils:
The file /usr/lib/udev/rules.d/65-scsi-cciss_id.rules calls 'cciss_id', but the program called is not shipped in the RPM.
The spec file patch below should fix this:

--- sg3_utils.spec.orig 2022-06-15 14:03:29.000000000 +0200
+++ sg3_utils.spec 2023-03-01 10:38:54.691384321 +0100
@@ -102,6 +102,7 @@
 # need to run after 62-multipath.rules
 install -p -m 644 scripts/58-scsi-sg3_symlink.rules $RPM_BUILD_ROOT%{_udevrulesdir}/63-scsi-sg3_symlink.rules
 install -p -m 644 scripts/59-scsi-cciss_id.rules $RPM_BUILD_ROOT%{_udevrulesdir}/65-scsi-cciss_id.rules
+install -p -m 755 scripts/cciss_id $RPM_BUILD_ROOT%{_udevlibdir}
 install -p -m 644 scripts/59-fc-wwpn-id.rules $RPM_BUILD_ROOT%{_udevrulesdir}/63-fc-wwpn-id.rules
 install -p -m 755 scripts/fc_wwpn_id $RPM_BUILD_ROOT%{_udevlibdir}
 
@@ -113,6 +114,7 @@
 %{_udevrulesdir}/63-scsi-sg3_symlink.rules
 %{_udevrulesdir}/63-fc-wwpn-id.rules
 %{_udevrulesdir}/65-scsi-cciss_id.rules
+%{_udevlibdir}/cciss_id
 %{_udevrulesdir}/40-usb-blacklist.rules
 %{_udevlibdir}/fc_wwpn_id

dracut:
The file /usr/lib/dracut/modules.d/95udev-rules/module-setup.sh makes use of the following udev rules:
55-scsi-sg3_id.rules
58-scsi-sg3_symlink.rules
But these files are renamed in EL9 to:
61-scsi-sg3_id.rules
63-scsi-sg3_symlink.rules

I hope some of my input is helpful to fix the issue. Unfortunately the server I've used to test things will not be available for more tests soon.

Regards,
Simon

I discussed this issue with the other upstream Linux SCSI maintainers during the Linux Foundation LSF/MM conference last week. James and Martin will not accept a kernel patch to allow the sd device probing to return to its prior synchronous behavior. James' suggestion was, as I expected, to use udev to provide some naming consistency (his example was to do what the network devices do, and make the naming persist after first boot). I pointed out that this did not solve the issue of newly added devices, e.g. with scsi_add_device() calls from mpt3sas, but there was no agreement that the kernel should be changed. I am pursuing a RHEL-only change for RHEL 9 now. However, that may not be accepted either, since the upstream kernel maintainers do not agree and the RHEL kernel team is trying to minimize upstream deviations.

Thanks again! I hope RHEL can get this done like SUSE. I saw discussions where Arch and Debian users also complain about the behavior, but they don't have solutions to overcome it.

Still awaiting internal kernel team review/acceptance of RHEL-only patch.

There are several other reports of this issue. We are in the process of merging the module parameter described in comment # 6 above into RHEL 9. What I would like to do is open this BZ up to all other interested parties; is that acceptable?

Hi: Thanks a lot for the effort! Please share the information as you like. I was afraid that the patch would not be accepted and that I would need to set up the disk names manually. Thanks for the great news.

I rebooted and tested 10 times, and the disk order stayed consistent every time.

[root@storageqe-102 ~]# uname -r
5.14.0-340.2819_935944297.el9.x86_64
[root@storageqe-102 ~]# cat /sys/module/sd_mod/parameters/probe
sync
[root@storageqe-102 ~]# (cd /dev/disk/by-path && ls -l | grep /s)
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1.0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1.0-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jul 21 00:32 pci-0000:00:17.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-2.0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-3 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-3.0 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-4 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jul 21 00:32 pci-0000:00:17.0-ata-4.0 -> ../../sdd
[root@storageqe-102 ~]#

Rebooted and tested 10 times; the disk order stayed consistent every time.

[root@storageqe-103 ~]# uname -r
5.14.0-344.el9.x86_64
[root@storageqe-103 ~]#
[root@storageqe-103 ~]# cat /sys/module/sd_mod/parameters/probe
sync
[root@storageqe-103 ~]#
[root@storageqe-103 ~]# (cd /dev/disk/by-path && ls -l | grep /s)
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-5 -> ../../sda
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-5.0 -> ../../sda
lrwxrwxrwx 1 root root 10 Jul 30 23:12 pci-0000:00:17.0-ata-5.0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 30 23:12 pci-0000:00:17.0-ata-5.0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jul 30 23:12 pci-0000:00:17.0-ata-5-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jul 30 23:12 pci-0000:00:17.0-ata-5-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-6 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-6.0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-7 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-7.0 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-8 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jul 30 23:12 pci-0000:00:17.0-ata-8.0 -> ../../sdd
[root@storageqe-103 ~]#

*** Bug 2161940 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2023:6583

Hi: I upgraded several systems to RHEL 9.3 and I noticed two things:

1. As John Meneghini said in the upstream merge request, I also hope the "nvme" module will get the same parameter.
https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2819

2. When there are unused SATA ports between hard disks, the disk order is strange and does not follow the PCI device ID sequence (but the order is still consistent on every boot). Below is an example for RHEL8/RHEL9 at "/dev/disk/by-path" for the same server:

RHEL8:
lrwxrwxrwx 1 root root  9 Dec 16 07:31 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Dec 16 07:31 pci-0000:00:17.0-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-2-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-2-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-2-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  9 Dec 16 07:31 pci-0000:00:17.0-ata-3 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-3-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-3-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Dec 16 07:31 pci-0000:00:17.0-ata-3-part3 -> ../../sdc3
lrwxrwxrwx 1 root root  9 Dec 16 07:31 pci-0000:00:17.0-ata-8 -> ../../sdd
lrwxrwxrwx 1 root root 13 Dec 16 07:31 pci-0000:4b:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Dec 16 07:31 pci-0000:4b:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 13 Dec 16 07:31 pci-0000:4c:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Dec 16 07:31 pci-0000:4c:00.0-nvme-1-part1 -> ../../nvme1n1p1

RHEL9:
lrwxrwxrwx 1 root root  9 Dec 16 15:36 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root  9 Dec 16 15:36 pci-0000:00:17.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-1.0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-1.0-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Dec 16 15:36 pci-0000:00:17.0-ata-2 -> ../../sdc
lrwxrwxrwx 1 root root  9 Dec 16 15:36 pci-0000:00:17.0-ata-2.0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-2.0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-2.0-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-2.0-part3 -> ../../sdc3
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-2-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-2-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-2-part3 -> ../../sdc3
lrwxrwxrwx 1 root root  9 Dec 16 15:36 pci-0000:00:17.0-ata-3 -> ../../sdd
lrwxrwxrwx 1 root root  9 Dec 16 15:36 pci-0000:00:17.0-ata-3.0 -> ../../sdd
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-3.0-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-3.0-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-3.0-part3 -> ../../sdd3
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-3-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-3-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Dec 16 15:36 pci-0000:00:17.0-ata-3-part3 -> ../../sdd3
lrwxrwxrwx 1 root root  9 Dec 16 15:36 pci-0000:00:17.0-ata-8 -> ../../sdb
lrwxrwxrwx 1 root root  9 Dec 16 15:36 pci-0000:00:17.0-ata-8.0 -> ../../sdb
lrwxrwxrwx 1 root root 13 Dec 16 15:36 pci-0000:4b:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Dec 16 15:36 pci-0000:4b:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 13 Dec 16 15:36 pci-0000:4c:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Dec 16 15:36 pci-0000:4c:00.0-nvme-1-part1 -> ../../nvme1n1p1

Thanks for trying this out and reporting your results.

NVMe has never had any consistency with minor device numbering (or any numbering of kernel objects, e.g. if you have multiple NVMe controller instances in a fabric environment you will notice that the numbering is basically never the same). So we do not intend to have a similar module option for NVMe. If you need to positively identify devices, /dev/disk/by-xxx is the "official" way to do it.

Upstream won't accept changes to force the synchronous probing, and the mechanism in the driver core that this RHEL-specific module option for the sd driver is using may go away, so we may end up having to deprecate it (e.g. it may get removed in a future major release). We were really only able to justify this RHEL-specific option to help ease the migration from RHEL 8 to RHEL 9 for people like yourself.

There was a recent discussion thread upstream when someone proposed changing the driver core:
https://www.spinics.net/lists/linux-scsi/msg191542.html
See Greg K-H's response:
https://www.spinics.net/lists/linux-scsi/msg191602.html
If that's Greg's position, we're not likely to get any movement on an upstream change.

--

Note, the actual module option syntax is "sd_mod.probe=sync", I take it you used that?

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4e88f8acf4f9..e54e59553256 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -131,6 +131,12 @@ static const char *sd_cache_types[] = {
 	"write back, no read (daft)"
 };
 
+static char sd_probe_type[6] = "async";
+module_param_string(probe, sd_probe_type, sizeof(sd_probe_type),
+		    S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(probe, "async or sync. Setting to 'sync' disables asynchronous "
+		 "device number assignments (sda, sdb, ...).");
+
 static void sd_set_flush_flag(struct scsi_disk *sdkp)
 {
 	bool wc = false, fua = false;
@@ -3871,6 +3877,8 @@ static int __init init_sd(void)
 		goto err_out_ppool;
 	}
 
+	if (!strcmp(sd_probe_type, "sync"))
+		sd_template.gendrv.probe_type = PROBE_FORCE_SYNCHRONOUS;
 	err = scsi_register_driver(&sd_template.gendrv);
 	if (err)
 		goto err_out_driver;

If so, your case of the device ordering with the synchronous option on being different on RHEL 9.3 than on RHEL 8 is curious. Could you attach a full dmesg of the boot on 9.3? There could be something in the ATA code that explains this.

Created attachment 2005032 [details]
dmesg when last disk becomes sdb
Created attachment 2005033 [details]
dmesg when last disk becomes sdc
Created attachment 2005034 [details]
dmesg when last disk becomes sdd
Hi: when I checked the disk order again, I found it had changed to what I want (by-path order), so I reinstalled and rechecked to find out what caused that. Finally I found that the boot parameter "console=ttyS1,115200" triggers the correct by-path order. It's strange, so I attached the full dmesg logs.

Without the parameter, the last hard disk becomes "sdb" about 9 times and "sdc" about 1 time in every 10 reboots. With the parameter, the last hard disk always becomes "sdd". Of course all of these are under "sd_mod.probe=sync", as you can see in the dmesg logs.

I think what an administrator needs is a short and persistent name, like network device naming. Maybe I need to create udev rules, at least for nvme disks, now. Is it possible for Red Hat to provide some naming suggestion like network device naming? It would be good if there were a rule to follow so that the results for every host would look similar. Thanks again for your patience and information. |
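Several comments in this report compare the sdX names against the by-path link order by eye. The Python sketch below shows one way that check could be automated; it is an illustration under stated assumptions, not a tool shipped with RHEL. The function names are hypothetical, "follows by-path order" is taken to mean that sorting the by-path link names naturally yields sda, sdb, sdc, ... in sequence, and the sample mappings are whole-disk entries copied from the RHEL 8 and RHEL 9 listings above.

```python
import os
import re


def natural_key(name):
    """Sort key that compares digit runs numerically ('ata-10' after 'ata-8')."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def disk_key(dev):
    """Order sd names as sda..sdz, then sdaa.. (shorter names first)."""
    return (len(dev), dev)


def by_path_order(links):
    """Return whole-disk sdX names in the order of their by-path link names.

    `links` maps a by-path link name to its target, for example
    {"pci-0000:00:17.0-ata-1": "../../sda"}. Partitions and non-sd
    devices (such as NVMe namespaces) are ignored.
    """
    order = []
    for name in sorted(links, key=natural_key):
        dev = os.path.basename(links[name])
        if re.fullmatch(r"sd[a-z]+", dev) and dev not in order:
            order.append(dev)
    return order


def follows_by_path(links):
    """True if the sdX names increase in the same order as the by-path names."""
    order = by_path_order(links)
    return order == sorted(order, key=disk_key)


def read_links(directory="/dev/disk/by-path"):
    """Read the symlinks in `directory` into a {name: target} mapping."""
    links = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            links[name] = os.readlink(path)
    return links


# Whole-disk entries copied from the RHEL 8 and RHEL 9 listings in this report.
RHEL8 = {
    "pci-0000:00:17.0-ata-1": "../../sda",
    "pci-0000:00:17.0-ata-2": "../../sdb",
    "pci-0000:00:17.0-ata-3": "../../sdc",
    "pci-0000:00:17.0-ata-8": "../../sdd",
    "pci-0000:4b:00.0-nvme-1": "../../nvme0n1",
}
RHEL9 = {
    "pci-0000:00:17.0-ata-1": "../../sda",
    "pci-0000:00:17.0-ata-1.0-part1": "../../sda1",
    "pci-0000:00:17.0-ata-2": "../../sdc",
    "pci-0000:00:17.0-ata-3": "../../sdd",
    "pci-0000:00:17.0-ata-8": "../../sdb",
}

print(follows_by_path(RHEL8))  # True
print(follows_by_path(RHEL9))  # False
print(by_path_order(RHEL9))    # ['sda', 'sdc', 'sdd', 'sdb']
```

On a live system, follows_by_path(read_links()) would apply the same check to the current boot. It only reports whether the two orders agree; it does not make the names persistent.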