Description of problem: from bug 505422: "working with ovirt node based on 5.4 "Red Hat Enterprise Virtualization Hypervisor release 5.4-1.beta4.el5rhev" ... i'm using iscsi storage"

After a few hours of normal behavior, multipath starts to identify a device twice:

[root@yanive-ovirt-node vdsm]# multipath -ll
3600144f04a30bbe300003048344a6b00 dm-1 SUN,SOLARIS
[size=80G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 36:0:0:0 sdb 8:16 [active][ready]
3600144f04a2f73f600003048344a6b00 dm-7 SUN,SOLARIS
[size=80G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 36:0:0:0 sdb 8:16 [active][ready]

Note that restarting the multipath service, and even removing the dm devices (dmsetup remove_all), didn't help. A reboot did clear the issue. I couldn't see any dmesg errors. Note that the above confuses LVM and the VDSM (RHEV agent) above it. The target is a SOLARIS zfs target. Note that I saw a similar case on a similar system (5.4 ovirt node) that was connected to a fedora iscsi target.
Since RHEL 5.1 there has been a race between multipath and multipathd to create a multipath device when a new block device is added. Usually it doesn't make a difference which one wins the race. However, it seems like it may be possible to tie, where both see that no device is created and create one at more or less the same time.

To remove this race, you can comment out the following line from /etc/udev/rules.d/40-multipath.rules:

KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"

This keeps multipath from being run when a new block device is added, so multipath devices for new block devices will always be created by multipathd. This is how things work in fedora.

If you could recreate this issue, and then try this workaround to see if it helps, that would be great.
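For illustration, the rules file would look something like this after applying the workaround (a sketch of the relevant excerpt only; the rest of the file is unchanged):

```
# /etc/udev/rules.d/40-multipath.rules (excerpt, after the workaround)
# Disabled so that maps for new block devices are created only by multipathd:
# KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"
```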
OK, we will try it the next time we encounter this problem. Thanks.
(In reply to comment #3)
> Since RHEL 5.1 there has been a race between multipath and multipathd to create
> a multipath device when a new block device is added. Usually it doesn't make a
> difference which one wins the race. However, it seems like it may be possible
> to tie, where both see that no device is created and create one at more or less
> the same time.
>
> To remove this race, you can comment out the following line from
> /etc/udev/rules.d/40-multipath.rules
>
> KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod |
> /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"
>
> This keeps multipath from being run when a new block device is added, so
> multipath devices for new block devices will always be created by multipathd.
> This is how things work in fedora.
>
> If you could recreate this issue, and then try this workaround to see if it
> helps, that would be great.

I just looked at this issue, and it appears to me that it lacks some information. However, I can speculate that the sequence of events was as follows. A device (GUID 3600144f04a2f73f600003048344a6b00) appeared and was enumerated by the kernel as sdb. It was picked up by multipath, and the device /dev/mpath/3600144f04a2f73f600003048344a6b00 was created. At some later time the device disappeared (probably in the course of QA testing) and sdb was vacated. However, multipath didn't remove its mapping, waiting for *sdb* to come back. At some later point in time a *new* device (GUID 3600144f04a30bbe300003048344a6b00) appeared. It was enumerated by the kernel as sdb (since that name had been vacated earlier!). At this point multipath got itself into a problem: on the one hand, sdb came back and 3600144f04a2f73f600003048344a6b00 could be reinstated; on the other hand, inquiring sdb brought a new identification, and a *new* device was created (3600144f04a30bbe300003048344a6b00). Does that sound valid? If so, is it correct behavior for multipath to rely on kernel enumeration?
It seems pretty challenging for the kernel to enumerate the device in any other way if it appears at the same target/LU, with the same size, same everything except the GUID.
Does this setup have /var or /var/lib on a different device than the root filesystem? If so, then bindings_file "/etc/multipath_bindings" needs to be in the defaults section of /etc/multipath.conf. Otherwise, multipath will create device bindings in /var/lib/multipath/bindings on boot before /var or /var/lib is mounted on top of it, and then the bindings will change out from under it.
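A minimal sketch of what that entry would look like in the defaults section of /etc/multipath.conf:

```
defaults {
        # Keep the WWID bindings on the root filesystem, so they are
        # readable before /var (or /var/lib) is mounted on top of it.
        bindings_file "/etc/multipath_bindings"
}
```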
(In reply to comment #13)
> Does this setup have /var or /var/lib on a different device than the root
> filesystem? If so, then

Doesn't appear that way (p.s. please do not put sfrank on needinfo as he no longer works at RH, thanks)

[root@red-vdsa ~]# mount
/dev/mapper/live-rw on / type ext2 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/mapper/SServeRA_disk1_F4A5A7AFp1 on /boot type ext3 (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/mapper/HostVG-Logging on /var/log type ext3 (rw)
/dev/mapper/HostVG-Data on /data type ext3 (rw)
/data/images on /var/lib/libvirt/images type bind (rw,bind)
/data/core on /var/log/core type bind (rw,bind)
none on /var/vdsm/pools type tmpfs (rw,size=1048576)
10.35.16.2:/export/images/qa/paikovdomain on /rhev/data-center/mnt/10.35.16.2:_export_images_qa_paikovdomain type nfs (rw,soft,timeo=600,retrans=2,nosharecache,addr=10.35.16.2)
10.35.16.2:/export/images/qa/paikoviso on /rhev/data-center/mnt/10.35.16.2:_export_images_qa_paikoviso type nfs (rw,soft,timeo=600,retrans=2,nosharecache,addr=10.35.16.2)
/dev/mapper/655ec061--b843--4d86--aa94--5c1c9135c8ae-master on /rhev/data-center/mnt/blockSD/655ec061-b843-4d86-aa94-5c1c9135c8ae/master type ext3 (rw)
[root@red-vdsa ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 5.5-2.2 (0.16.1)
I have a question that goes back to comment #9. I'm not sure, but that scenario may be possible; if it is, there would be one requirement: the sdb device would have to never be removed from the system. Usually when a device is removed, the kernel sends a remove uevent, and multipathd removes the device from its mapping. When it comes back, multipathd treats it as an entirely new device, and it is included in a multipath mapping based on its WWID. However, if it is possible to switch what device is being pointed to by sdb on the storage in a way that makes it return a different WWID, without the node that's running multipathd ever removing sdb, then this might be able to happen, although I'd have to dig through the code to verify that. Is there any way that you can see for the sdb device to be changing WWID like this in your setup? The only other possibility that I can think of is that there might be some problem in the multipath code related to removing the last path from a device. I'll take a look for that.
What would trigger the remove uevent? In their setups, QE reuse and recreate devices a lot. Could this be a race? Or perhaps multipath missed the event somehow? IIRC remove events show up in /var/log/messages, right? So we should be able to correlate the times of occurrence with the "duplicate" messages in the vdsm log (next time we see this).
Yes, you should be able to see the remove uevent in the logs:

<devname>: remove path (uevent)

It would be great if you could check for this the next time you see the error, and post it and any nearby multipathd messages (or just attach /var/log/messages, with a pointer to where the remove event is in it).
The thing I'd really like to know about this issue is whether or not the path device is really changing its WWID. Looking at the initial comment, the two different multipath devices have different WWIDs. If sdb originally had the WWID of one, but later switched WWIDs, that would be really helpful to find out. Also, I'd like to know if the number of multipath devices increases when the duplicate appears, or whether another multipath device is getting reloaded with duplicate paths.
O.k. I don't know why things are getting mixed up, but I've got a pretty good handle on how and when it is happening. This issue is due to something weird in early boot. When this happens, you should notice the dual devices as soon as the machine starts up. I've rebooted the machine and verified that this is the case.

For some reason, multipath gets the scsi_id for the device sda as 1ATA_WDC_WD800JD-75MSA3_WD-WMAM9CKD9477 when it's run in the initrd. And once it's out of the initrd, it gets SATA_WDC_WD800JD-75M_WD-WMAM9CKD9477. Usually, when multipath sees that a map has changed like this, it simply removes the old map. However, by that time the original multipath device is part of HostVG, and so it can't be removed. This means that multipath now has two maps: the one created by the initrd, and the new one. No matter what you do, multipath will keep thinking that the new map is how it's supposed to be set up, but will keep being unable to delete the old map.

I have no idea why the device comes up with the scsi_id of 1ATA_WDC_WD800JD-75MSA3_WD-WMAM9CKD9477p2 when it's run during the initrd. Looking at the dmesg output from immediately after boot, I see

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: WDC WD800JD-75MSA3, 10.01E04, max UDMA/133
ata1.00: 156250000 sectors, multi 8: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata4: SATA link down (SStatus 4 SControl 300)
  Vendor: ATA       Model: WDC WD800JD-75MS  Rev: 10.0
  Type:   Direct-Access                      ANSI SCSI revision: 05

which does match pretty well with the initrd value (it has 75MSA3, instead of 75M). So it appears that this isn't simply multipath garbling the name. I tried copying the scsi_id from the initrd to the system, and running that, to see if it gave me a different answer than the one at /sbin/scsi_id. It didn't.
(For reference, this problem may be the same as https://bugzilla.redhat.com/show_bug.cgi?id=569356#c6. That part of the BZ was never resolved.)

man scsi_id indicates that "In order to generate unique values for either page 0x80 or page 0x83, the serial numbers or world wide names are prefixed as follows. Identifiers based on page 0x80 are prefixed by the character 'S', the SCSI vendor, the SCSI product (model) and then the serial number returned by page 0x80.... Identifiers based on page 0x83 are prefixed by the identifier type followed by the page 0x83 identifier. For example, a device with a NAA (Name Address Authority) type of 3..."

So it would appear that 1ATA_WDC_WD800JD-75MSA3_WD-WMAM9CKD9477 comes from page 0x83, with an identifier type = 1 (T10 vendor ID based DESIGNATOR field format), and SATA_WDC_WD800JD-75M_WD-WMAM9CKD9477 comes from page 0x80. (The product ID on the two pages is apparently slightly different, although that would not matter, because scsi_id uses a different prefix anyway.) This should be easy enough to verify with scsi_id -p.

The problem is that scsi_id's "default behaviour is to query the available VPD pages, and use page 0x83 if found, else page 0x80 if found, else nothing". It appears as though in some cases the attempt to read page 0x83 fails, so scsi_id falls back to page 0x80 and (by design) gets a different answer.

It may be that the only backward-compatible solution is to retry the read of page 0x83 several times. Possibly look at the low-level failure code (if you can get it) to differentiate "the Inquiry I/O failed" from "the I/O succeeded, but there is no page 0x83 data". Retry the first case many times. Don't retry the second case (this case should be quite unusual with modern hardware; they should all have page 0x83). Maybe there is another solution? (We might want to get this in RHEL 6 as well.)

Tom
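The proposed retry could be sketched roughly as follows. This is only an illustration of the idea, not real scsi_id code: query_page is a hypothetical stand-in for the actual VPD inquiry, and the retry count and delay are arbitrary.

```shell
#!/bin/sh
# Hedged sketch of the proposed retry: try page 0x83 a few times before
# falling back to page 0x80. "query_page PAGE DEV" is a hypothetical
# stand-in for the real scsi_id inquiry; it is assumed to print an
# identifier on success and return non-zero on failure.
get_wwid() {
    dev=$1
    for attempt in 1 2 3; do
        if id=$(query_page 0x83 "$dev"); then
            echo "$id"      # page 0x83 answered; use it
            return 0
        fi
        sleep 1             # assume a transient inquiry failure; retry
    done
    # Only after repeated 0x83 failures fall back to page 0x80,
    # which by design yields a differently-prefixed identifier.
    query_page 0x80 "$dev"
}
```

Note this sketch retries all 0x83 failures; the comment's finer distinction (retry only failed I/O, not "no page 0x83 data") would need the low-level failure code.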
I don't currently have access to those machines, but I thought I did run

# scsi_id -g -u -s -p 0x83 /block/<devname>

and got the same thing. However, maybe not. I was suspecting that in early boot it was using the 0x80 page, and in normal operation it was using the 0x83 page, so maybe I only tried -p 0x80; but from the scsi_id man page snippet you posted, it appears to be the opposite. However, I ran the command a number of times and could never reproduce the 1ATA_WDC_WD800JD-75MSA3_WD-WMAM9CKD9477 answer, which is what we should be seeing. Possibly we can edit the multipath.conf in the initrd to force it to use the 0x80 page. However, that would only be to prove that this is what's happening. As a general solution, that would break more systems than it fixes.
The problem is that /etc/scsi_id.config contains the following line:

vendor="ATA",options=-p 0x80

So after boot the ATA device is always queried through page 0x80 (even if -p 0x83 is used). But when udev creates the device, the above does not take effect, so it uses page 0x83, which returns (after commenting out the above line):

[root@demo-vdsc device]# /lib/udev/scsi_id -g -u -p 0x83 -s /block/sda -d /dev/sda
1ATA_SAMSUNG_HD080HJ_S08EJ1HP300616_

The problem is this configuration is there because not all ATA devices support 0x83 (or so the comment says).
(In reply to comment #26)
> The problem is that /etc/scsi_id.config contains the following line:
> vendor="ATA",options=-p 0x80

Good catch! The solution, then, is for RHEV-H to include /etc/scsi_id.config in the initrd; then we should get a consistent id in early boot and later.

So that explains SATA on RHEV-H, but comment 0 and bug 569356 comment 6 are not vendor=ATA. Also, was this ever reproduced on plain RHEL?
(In reply to comment #27)
> (In reply to comment #26)
> > The problem is that /etc/scsi_id.config contains the following line:
> > vendor="ATA",options=-p 0x80
>
> Good catch! Solution is then for RHEV-H to include /etc/scsi_id.config into
> initrd, then we should get consistent id in early boot and later.

Ah, that would indeed explain why this is seen on RHEV-H but not on RHEL!!! Great.

> So that explains SATA on RHEV-H, but comment 0 and bug 569356 comment 6 are not
> vendor=ATA. Also, was this ever reproduced on plain RHEL?

Comment 0 is pretty irrelevant, I think. That was probably a bug in vdsm, and we changed that area so many times that I doubt it still happens. Let's move this bug to RHEV-H and do the above fix. If this resurfaces elsewhere, we will reopen the bug.
(In reply to comment #26)
> The problem is that /etc/scsi_id.config contains the following line:
> vendor="ATA",options=-p page
...
> The problem is this configuration is there because not all ATA devices support
> 0x83 (or so the comment says).

It seems unfortunate that they put this in the config file, rather than just allowing scsi_id to try page 0x83, fail, and then fall back to page 0x80. Maybe the page 0x83 attempt did not fail cleanly on some ATA devices, so they had to do it this way. Anyway, we can't change that now.

> So after boot the ata device is always queried through page 0x80
> (even if -p 0x83 is used).

That also seems unfortunate. Doesn't a command-line option usually override settings in the configuration file?

(In reply to comment #27)
> Also, was this ever reproduced on plain RHEL?

I have never heard of this on plain RHEL.

(In reply to comment #28)
> Let's move this bug to RHEV-H and do the above fix. If this resurfaces
> elsewhere we will reopen the bug.

That's okay with me.

Ben (or Mike Christie?), do you think it is overly paranoid to put a retry in multipath: "if scsi_id returns an id that begins with 'S', or it returns nothing, then try again, just in case the initial page 0x83 read failed for some reason on the first attempt"? It seems unnecessary, since as far as we know this has never been a problem, but on the other hand, we don't want WWIDs to change on the fly.
I'd rather not second guess the scsi_id returns. Also, what should this bug be assigned to now?
Moved to RHEV-H, to be fixed for 5.5-2.2.z.

BTW, /etc/scsi_id.config is not included in RHEL-6 or Fedora udev; instead there's a udev rule which specifies -p 0x80 for bus=ata. Other rules don't specify -p, in which case it tries 0x83 then 0x80, as documented in the man page (there isn't a fallback if 0x83 or 0x80 is specified, by either the -p option or the config file).

Similarly in the multipath.conf getuid_callout defaults: --page is specified for only 3 specific vendor/models: pre-spc3-83 for EMC/SYMMATRIX and 0x80 for PIVOT3/RAIGE VOLUME and EUROLOGC/FC2502.

What about having explicit --page=0x83 in multipath defaults for all others, to avoid any possibility for a fallback?

(In reply to comment #29)
> > So after boot the ata device is always queried through page 0x80
> > (even if -p 0x83 is used).
>
> That also seems unfortunate. Doesn't a command-line option usually override
> settings in the configuration file?

Yeah, definitely unusual. Looking at scsi_id.c (udev-147-2.20.el6), the idea seems to be that the command-line option sets the default, which the config file can override *per device* (vendor/model), which is opposite to the usual order of precedence (config, env, options).
(In reply to comment #31) > What about having explicit --page=0x83 in multipath defaults for all others, to > avoid any possibility for a fallback? That would mean that non-configured devices that do not reply on 0x83 would not be registered with any guid/serial_id.
(In reply to comment #32) > (In reply to comment #31) > > What about having explicit --page=0x83 in multipath defaults for all others, to > > avoid any possibility for a fallback? > > That would mean that non-configured devices that do not reply on 0x83 would not > be registered with any guid/serial_id. Yea, page 0x80 was defined in the standard first, and I believe it was mandatory for all devices. Then they added page 0x83 as optional. Then they eventually made 0x83 mandatory and page 0x80 optional. So older devices (and ATA devices that emulate SCSI) will have no page 0x83. We'd have to be incredibly wary of changing anything in this area, since it can lead to corruption, either because of device WWID changes, or because multiple paths to a device are not detected as such. We should leave it as-is and keep an eye on this area to avoid any changes that we will regret in the future.
(In reply to comment #33)
> (In reply to comment #32)
> > (In reply to comment #31)

I meant to change it only in a specific device section where the device can be verified to support 0x83. The defaults section would keep:

getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
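That would look something like the following sketch of /etc/multipath.conf. The vendor/product values are placeholders for a model verified to support page 0x83; the defaults section is unchanged.

```
devices {
        device {
                vendor  "SUN"
                product "SOLARIS"
                # Force page 0x83 only for this verified model; the
                # defaults keep the standard callout with its fallback.
                getuid_callout "/lib/udev/scsi_id --whitelisted --page=0x83 --device=/dev/%n"
        }
}
```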
RHEV-H 5.5-2.2 Update 1 will add /etc/scsi_id.config to the livecd initrd.
*** Bug 554311 has been marked as a duplicate of this bug. ***
*** Bug 555694 has been marked as a duplicate of this bug. ***
Reproduced the duplicated scsi_id issue on a local SATA disk with build 5.5-2.2.4.2, which is the last build before the fix. After installing rhevh 5.5-2.2-4.2 to a local sata disk, there is a duplicated device in /dev/mapper, and there is also a duplicated-device error in ovirt.log, just as expected. Verified the fix with the latest rhevh 5.6-2.2-5.1: after installing it to a local sata disk, there is no such error any more. Manually decompressed and checked the "initrd0.img" file, and also found that rhevh version 5.5-2.2-4.2 does not have "etc/scsi_id.config" in the initrd, while initrd0.img in all later rhevh versions (including 5.6-2.2-5.1) has that config file. So we can verify the fix is working.
Changed the title of this bug, as it is not related to a multipath issue; we can ignore comment #0.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When Red Hat Enterprise Virtualization Hypervisor was installed on a SATA device, Device-Mapper Multipath could incorrectly identify this device twice. This was caused by the scsi_id tool returning different values in early boot and later. With this update, scsi_id always returns the same value, and each SATA device is identified only once.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0148.html