Description of problem: from bug 505422: "working with ovirt node based on 5.4 "Red Hat Enterprise Virtualization Hypervisor release 5.4-1.beta4.el5rhev" ... i'm using iscsi storage"

After a few hours of normal behavior, multipath starts to identify a device twice:

[root@yanive-ovirt-node vdsm]# multipath -ll
3600144f04a30bbe300003048344a6b00 dm-1 SUN,SOLARIS
[size=80G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 36:0:0:0 sdb 8:16 [active][ready]
3600144f04a2f73f600003048344a6b00 dm-7 SUN,SOLARIS
[size=80G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 36:0:0:0 sdb 8:16 [active][ready]

Note that restarting the multipath service, and even removing the dm devices (dmsetup remove_all), didn't help. A reboot did clear the issue. I couldn't see any dmesg errors. Note that the above confuses LVM and the VDSM (RHEV agent) above it. The target is a SOLARIS zfs target. Note that I saw a similar case on a similar system (5.4 ovirt node) that was connected to a fedora iscsi target.
Since RHEL 5.1 there has been a race between multipath and multipathd to create a multipath device when a new block device is added. Usually it doesn't make a difference which one wins the race. However, it seems like it may be possible to tie, where both see that no device is created and create one at more or less the same time.

To remove this race, you can comment out the following line from /etc/udev/rules.d/40-multipath.rules:

KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"

This keeps multipath from being run when a new block device is added, so multipath devices for new block devices will always be created by multipathd. This is how things work in fedora.

If you could recreate this issue, and then try this workaround to see if it helps, that would be great.
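For illustration, the rules file would look something like this after applying the workaround (a sketch of the relevant excerpt only; the rest of the file is unchanged):

```
# /etc/udev/rules.d/40-multipath.rules (excerpt, after the workaround)
# Disabled so that maps for new block devices are created only by multipathd:
# KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"
```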
OK, we will try it the next time we encounter this problem. Thanks.
(In reply to comment #3)
> Since RHEL 5.1 there has been a race between multipath and multipathd to create
> a multipath device when a new block device is added. Usually it doesn't make a
> difference which one wins the race. However, it seems like it may be possible
> to tie, where both see that no device is created and create one at more or less
> the same time.
>
> To remove this race, you can comment out the following line from
> /etc/udev/rules.d/40-multipath.rules
>
> KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod |
> /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"
>
> This keeps multipath from being run when a new block device is added, so
> multipath devices for new block devices will always be created by multipathd.
> This is how things work in fedora.
>
> If you could recreate this issue, and then try this workaround to see if it
> helps, that would be great.

I just looked at this issue, and it appears to me that it lacks some information. However, I can speculate that the sequence of events was as follows. A device (GUID 3600144f04a2f73f600003048344a6b00) appeared and was enumerated by the kernel as sdb. It was picked up by multipath, and the device /dev/mpath/3600144f04a2f73f600003048344a6b00 was created. At some later time the device disappeared (probably in the course of QA testing) and sdb was vacated. However, multipath didn't remove its mapping, waiting for *sdb* to come back. At some later point in time a *new* device (GUID 3600144f04a30bbe300003048344a6b00) appeared. It was enumerated by the kernel as sdb (since that name had been vacated earlier!). At this point multipath got itself into a problem: on the one hand, sdb came back and 3600144f04a2f73f600003048344a6b00 could be reinstated; on the other hand, inquiring sdb brought a new identification, and a *new* device was created (3600144f04a30bbe300003048344a6b00). Does that sound valid? If so, is it correct behavior for multipath to rely on kernel enumeration?
It seems pretty challenging for the kernel to enumerate the device in any other way if it appears at the same target/LU, with the same size, same everything except the GUID.
Does this setup have /var or /var/lib on a different device than the root filesystem? If so, then bindings_file "/etc/multipath_bindings" needs to be in the defaults section of /etc/multipath.conf. Otherwise, multipath will create device bindings in /var/lib/multipath/bindings on boot before /var or /var/lib is mounted on top of it, and then the bindings will change out from under it.
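A minimal sketch of what that entry would look like in the defaults section of /etc/multipath.conf:

```
defaults {
        # Keep the WWID bindings on the root filesystem, so they are
        # readable before /var (or /var/lib) is mounted on top of it.
        bindings_file "/etc/multipath_bindings"
}
```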
(In reply to comment #13)
> Does this setup have /var or /var/lib on a different device than the root
> filesystem? If so, then

Doesn't appear that way (p.s. please do not put sfrank on needinfo as he no longer works at RH, thanks)

[root@red-vdsa ~]# mount
/dev/mapper/live-rw on / type ext2 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/mapper/SServeRA_disk1_F4A5A7AFp1 on /boot type ext3 (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/mapper/HostVG-Logging on /var/log type ext3 (rw)
/dev/mapper/HostVG-Data on /data type ext3 (rw)
/data/images on /var/lib/libvirt/images type bind (rw,bind)
/data/core on /var/log/core type bind (rw,bind)
none on /var/vdsm/pools type tmpfs (rw,size=1048576)
10.35.16.2:/export/images/qa/paikovdomain on /rhev/data-center/mnt/10.35.16.2:_export_images_qa_paikovdomain type nfs (rw,soft,timeo=600,retrans=2,nosharecache,addr=10.35.16.2)
10.35.16.2:/export/images/qa/paikoviso on /rhev/data-center/mnt/10.35.16.2:_export_images_qa_paikoviso type nfs (rw,soft,timeo=600,retrans=2,nosharecache,addr=10.35.16.2)
/dev/mapper/655ec061--b843--4d86--aa94--5c1c9135c8ae-master on /rhev/data-center/mnt/blockSD/655ec061-b843-4d86-aa94-5c1c9135c8ae/master type ext3 (rw)
[root@red-vdsa ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 5.5-2.2 (0.16.1)
I have a question that goes back to comment #9. I'm not sure, but that scenario may be possible; if it is, there would be one requirement: the sdb device would have to never be removed from the system. Usually when a device is removed, the kernel sends a remove uevent, and multipathd removes the device from its mapping. When it comes back, multipathd treats it as an entirely new device, and it is included in a multipath mapping based on its WWID. However, if it is possible to switch what device is being pointed to by sdb on the storage in a way that makes it return a different WWID, without the node that's running multipathd ever removing sdb, then this might be able to happen, although I'd have to dig through the code to verify that. Is there any way that you can see for the sdb device to be changing WWID like this in your setup? The only other possibility that I can think of is that there might be some problem in the multipath code related to removing the last path from a device. I'll take a look for that.
What would trigger the remove uevent? In their setups, QE reuse and recreate devices a lot. Could this be a race? Or perhaps multipath missed the event somehow? IIRC remove events show up in /var/log/messages, right? So we should be able to correlate the times of occurrence with the "duplicate" messages in the vdsm log (next time we see this).
Yes, you should be able to see the remove uevent in the logs:

<devname>: remove path (uevent)

It would be great if you could check for this the next time you see the error, and post it and any nearby multipathd messages (or just attach /var/log/messages, with a pointer to where the remove event is in it).
The thing I'd really like to know about this issue is whether or not the path device is really changing its WWID. Looking at the initial comment, the two different multipath devices have different WWIDs. If sdb originally had the WWID of one, but later switched WWIDs, that would be really helpful to find out. Also, I'd like to know if the number of multipath devices increases when the duplicate appears, or whether another multipath device is getting reloaded with duplicate paths.
O.k. I don't know why things are getting mixed up, but I've got a pretty good handle on how and when it is happening. This issue is due to something weird in early boot. When this happens, you should notice the dual devices as soon as the machine starts up. I've rebooted the machine and verified that this is the case.

For some reason, multipath gets the scsi_id for the device sda as 1ATA_WDC_WD800JD-75MSA3_WD-WMAM9CKD9477 when it's run in the initrd. And once it's out of the initrd, it gets SATA_WDC_WD800JD-75M_WD-WMAM9CKD9477. Usually, when multipath sees that a map has changed like this, it simply removes the old map. However, by that time the original multipath device is part of HostVG, and so it can't be removed. This means that multipath now has two maps: the one created by the initrd, and the new one. No matter what you do, multipath will keep thinking that the new map is how it's supposed to be set up, but will keep being unable to delete the old map.

I have no idea why the device comes up with the scsi_id of 1ATA_WDC_WD800JD-75MSA3_WD-WMAM9CKD9477p2 when it's run during the initrd. Looking at the dmesg output from immediately after boot, I see

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: WDC WD800JD-75MSA3, 10.01E04, max UDMA/133
ata1.00: 156250000 sectors, multi 8: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata4: SATA link down (SStatus 4 SControl 300)
  Vendor: ATA       Model: WDC WD800JD-75MS  Rev: 10.0
  Type:   Direct-Access                      ANSI SCSI revision: 05

which does match pretty well with the initrd value (it has 75MSA3, instead of 75M). So it appears that this isn't simply multipath garbling the name. I tried copying the scsi_id from the initrd to the system, and running that, to see if it gave me a different answer than the one at /sbin/scsi_id. It didn't.
(For reference, this problem may be the same as https://bugzilla.redhat.com/show_bug.cgi?id=569356#c6. That part of the BZ was never resolved.)

man scsi_id indicates that "In order to generate unique values for either page 0x80 or page 0x83, the serial numbers or world wide names are prefixed as follows. Identifiers based on page 0x80 are prefixed by the character 'S', the SCSI vendor, the SCSI product (model) and then the serial number returned by page 0x80.... Identifiers based on page 0x83 are prefixed by the identifier type followed by the page 0x83 identifier. For example, a device with a NAA (Name Address Authority) type of 3..."

So it would appear that 1ATA_WDC_WD800JD-75MSA3_WD-WMAM9CKD9477 comes from page 0x83, with an identifier type = 1 (T10 vendor ID based DESIGNATOR field format), and SATA_WDC_WD800JD-75M_WD-WMAM9CKD9477 comes from page 0x80. (The product ID on the two pages is apparently slightly different, although that would not matter, because scsi_id uses a different prefix anyway.) This should be easy enough to verify with scsi_id -p.

The problem is that scsi_id's "default behaviour is to query the available VPD pages, and use page 0x83 if found, else page 0x80 if found, else nothing". It appears as though in some cases the attempt to read page 0x83 fails, so scsi_id falls back to page 0x80 and (by design) gets a different answer.

It may be that the only backward-compatible solution is to retry the read of page 0x83 several times. Possibly look at the low-level failure code (if you can get it) to differentiate "the Inquiry I/O failed" from "the I/O succeeded, but there is no page 0x83 data". Retry the first case many times. Don't retry the second case (this case should be quite unusual with modern hardware; they should all have page 0x83). Maybe there is another solution? (We might want to get this in RHEL 6 as well.)

Tom
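The proposed retry could be sketched roughly as follows. This is only an illustration of the idea, not real scsi_id code: query_page is a hypothetical stand-in for the actual VPD inquiry, and the retry count and delay are arbitrary.

```shell
#!/bin/sh
# Hedged sketch of the proposed retry: try page 0x83 a few times before
# falling back to page 0x80. "query_page PAGE DEV" is a hypothetical
# stand-in for the real scsi_id inquiry; it is assumed to print an
# identifier on success and return non-zero on failure.
get_wwid() {
    dev=$1
    for attempt in 1 2 3; do
        if id=$(query_page 0x83 "$dev"); then
            echo "$id"      # page 0x83 answered; use it
            return 0
        fi
        sleep 1             # assume a transient inquiry failure; retry
    done
    # Only after repeated 0x83 failures fall back to page 0x80,
    # which by design yields a differently-prefixed identifier.
    query_page 0x80 "$dev"
}
```

Note this sketch retries all 0x83 failures; the comment's finer distinction (retry only failed I/O, not "no page 0x83 data") would need the low-level failure code.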
I don't currently have access to those machines, but I thought I did run

# scsi_id -g -u -s -p 0x83 /block/<devname>

and got the same thing. However, maybe not. I was suspecting that in early boot it was using the 0x80 page, and in normal operation it was using the 0x83 page, so maybe I only tried -p 0x80; but from the scsi_id man page snippet you posted, it appears to be the opposite. However, I ran the command a number of times and could never reproduce the 1ATA_WDC_WD800JD-75MSA3_WD-WMAM9CKD9477 answer, which is what we should be seeing. Possibly we can edit the multipath.conf in the initrd to force it to use the 0x80 page. However, that would only be to prove that this is what's happening. As a general solution, that would break more systems than it fixes.
The problem is that /etc/scsi_id.config contains the following line:

vendor="ATA",options=-p 0x80

So after boot the ATA device is always queried through page 0x80 (even if -p 0x83 is used). But when udev creates the device, the above does not take effect, so it uses page 0x83, which returns (after commenting out the above line):

[root@demo-vdsc device]# /lib/udev/scsi_id -g -u -p 0x83 -s /block/sda -d /dev/sda
1ATA_SAMSUNG_HD080HJ_S08EJ1HP300616_

The problem is this configuration is there because not all ATA devices support 0x83 (or so the comment says).
(In reply to comment #26)
> The problem is that /etc/scsi_id.config contains the following line:
> vendor="ATA",options=-p 0x80

Good catch! The solution, then, is for RHEV-H to include /etc/scsi_id.config in the initrd; then we should get a consistent id in early boot and later.

So that explains SATA on RHEV-H, but comment 0 and bug 569356 comment 6 are not vendor=ATA. Also, was this ever reproduced on plain RHEL?
(In reply to comment #27)
> (In reply to comment #26)
> > The problem is that /etc/scsi_id.config contains the following line:
> > vendor="ATA",options=-p 0x80
>
> Good catch! Solution is then for RHEV-H to include /etc/scsi_id.config into
> initrd, then we should get consistent id in early boot and later.

Ah, that would indeed explain why this is seen on RHEV-H but not on RHEL!!! Great.

> So that explains SATA on RHEV-H, but comment 0 and bug 569356 comment 6 are not
> vendor=ATA. Also, was this ever reproduced on plain RHEL?

Comment 0 is pretty irrelevant, I think. That was probably a bug in vdsm, and we changed that area so many times that I doubt it still happens. Let's move this bug to RHEV-H and do the above fix. If this resurfaces elsewhere, we will reopen the bug.
(In reply to comment #26)
> The problem is that /etc/scsi_id.config contains the following line:
> vendor="ATA",options=-p page
...
> The problem is this configuration is there because not all ATA devices support
> 0x83 (or so the comment says).

It seems unfortunate that they put this in the config file, rather than just allowing scsi_id to try page 0x83, fail, and then fall back to page 0x80. Maybe the page 0x83 attempt did not fail cleanly on some ATA devices, so they had to do it this way. Anyway, we can't change that now.

> So after boot the ata device is always queried through page 0x80
> (even if -p 0x83 is used).

That also seems unfortunate. Doesn't a command-line option usually override settings in the configuration file?

(In reply to comment #27)
> Also, was this ever reproduced on plain RHEL?

I have never heard of this on plain RHEL.

(In reply to comment #28)
> Let's move this bug to RHEV-H and do the above fix. If this resurfaces
> elsewhere we will reopen the bug.

That's okay with me.

Ben (or Mike Christie?), do you think it is overly paranoid to put a retry in multipath: "if scsi_id returns an id that begins with 'S', or it returns nothing, then try again, just in case the initial page 0x83 read failed for some reason on the first attempt"? It seems unnecessary, since as far as we know this has never been a problem, but on the other hand, we don't want WWIDs to change on the fly.
I'd rather not second guess the scsi_id returns. Also, what should this bug be assigned to now?
Moved to RHEV-H, to be fixed for 5.5-2.2.z.

BTW, /etc/scsi_id.config is not included in RHEL-6 or Fedora udev; instead there's a udev rule which specifies -p 0x80 for bus=ata. Other rules don't specify -p, in which case it tries 0x83 then 0x80, as documented in the man page (there isn't a fallback if 0x83 or 0x80 is specified, by either the -p option or the config file).

Similarly in the multipath.conf getuid_callout defaults: --page is specified for only 3 specific vendor/models: pre-spc3-83 for EMC/SYMMATRIX and 0x80 for PIVOT3/RAIGE VOLUME and EUROLOGC/FC2502.

What about having explicit --page=0x83 in multipath defaults for all others, to avoid any possibility for a fallback?

(In reply to comment #29)
> > So after boot the ata device is always queried through page 0x80
> > (even if -p 0x83 is used).
>
> That also seems unfortunate. Doesn't a command-line option usually override
> settings in the configuration file?

Yeah, definitely unusual. Looking at scsi_id.c (udev-147-2.20.el6), the idea seems to be that the command-line option sets the default, which the config file can override *per device* (vendor/model), which is opposite to the usual order of precedence (config, env, options).
(In reply to comment #31) > What about having explicit --page=0x83 in multipath defaults for all others, to > avoid any possibility for a fallback? That would mean that non-configured devices that do not reply on 0x83 would not be registered with any guid/serial_id.
(In reply to comment #32) > (In reply to comment #31) > > What about having explicit --page=0x83 in multipath defaults for all others, to > > avoid any possibility for a fallback? > > That would mean that non-configured devices that do not reply on 0x83 would not > be registered with any guid/serial_id. Yea, page 0x80 was defined in the standard first, and I believe it was mandatory for all devices. Then they added page 0x83 as optional. Then they eventually made 0x83 mandatory and page 0x80 optional. So older devices (and ATA devices that emulate SCSI) will have no page 0x83. We'd have to be incredibly wary of changing anything in this area, since it can lead to corruption, either because of device WWID changes, or because multiple paths to a device are not detected as such. We should leave it as-is and keep an eye on this area to avoid any changes that we will regret in the future.
(In reply to comment #33)
> (In reply to comment #32)
> > (In reply to comment #31)

I meant to change it only in a specific device section where the device can be verified to support 0x83. The defaults section would keep:

getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
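That would look something like the following sketch of /etc/multipath.conf. The vendor/product values are placeholders for a model verified to support page 0x83; the defaults section is unchanged.

```
devices {
        device {
                vendor  "SUN"
                product "SOLARIS"
                # Force page 0x83 only for this verified model; the
                # defaults keep the standard callout with its fallback.
                getuid_callout "/lib/udev/scsi_id --whitelisted --page=0x83 --device=/dev/%n"
        }
}
```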
RHEV-H 5.5-2.2 Update 1 will add /etc/scsi_id.config to the livecd initrd.
*** Bug 554311 has been marked as a duplicate of this bug. ***
*** Bug 555694 has been marked as a duplicate of this bug. ***
Reproduced the duplicated scsi_id issue on a local SATA disk with build 5.5-2.2.4.2, which is the last build before the fix. After installing rhevh 5.5-2.2-4.2 to a local sata disk, there is a duplicated device in /dev/mapper, and there is also a duplicated-device error in ovirt.log, just as expected. Verified the fix with the latest rhevh 5.6-2.2-5.1: after installing it to a local sata disk, there is no such error any more. Manually decompressed and checked the "initrd0.img" file, and also found that rhevh version 5.5-2.2-4.2 does not have "etc/scsi_id.config" in the initrd, while initrd0.img in all later rhevh versions (including 5.6-2.2-5.1) has that config file. So we can verify the fix is working.
Changed the title of this bug, as it is not related to a multipath issue; we can ignore comment #0.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When Red Hat Enterprise Virtualization Hypervisor was installed on a SATA device, Device-Mapper Multipath could incorrectly identify this device twice. This was caused by the scsi_id tool returning different values in early boot and later. With this update, scsi_id always returns the same value, and each SATA device is identified only once.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0148.html