Bug 740289 - Failed to create storage domains in rhev-h
Summary: Failed to create storage domains in rhev-h
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ovirt-node
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Mike Burns
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-21 14:47 UTC by Mike Burns
Modified: 2011-12-06 19:28 UTC

Fixed In Version: ovirt-node-2.0.2-0.10.gitee3b50c.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 19:28:18 UTC


Attachments
vdsm.log (84.16 KB, text/plain)
2011-09-21 14:50 UTC, Mike Burns
Patch (2.55 KB, patch)
2011-09-30 21:38 UTC, Mike Burns
Follow-up Patch (4.34 KB, patch)
2011-10-02 14:35 UTC, Mike Burns


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2011:1783 | normal | SHIPPED_LIVE | rhev-hypervisor6 bug fix and enhancement update | 2011-12-06 15:10:54 UTC

Description Mike Burns 2011-09-21 14:47:03 UTC
Description of problem:
When creating storage domains, something goes wrong on rhev-h, causing the storage domain creation to fail.

I've seen this with FC and iSCSI.

In the UI, FC reports Error code: 351, with this stack trace in the log:

Thread-94::DEBUG::2011-09-21 14:26:36,849::lvm::509::OperationMutex::(_reloadlvs) Operation 'lvm reload operation' released the operation mutex
Thread-94::ERROR::2011-09-21 14:26:36,849::task::865::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/storage/hsm.py", line 1193, in public_createStorageDomain
  File "/usr/share/vdsm/storage/sdf.py", line 60, in create
  File "/usr/share/vdsm/storage/blockSD.py", line 310, in create
  File "/usr/share/vdsm/storage/blockSD.py", line 431, in getMetaDataMapping
ValueError: list.remove(x): x not in list
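
For readers unfamiliar with the final frame: Python's list.remove(x) raises ValueError whenever x is not in the list. The sketch below only reproduces that language behaviour with made-up device names. It is not the vdsm code from blockSD.py, and remove_if_present is a hypothetical helper added for illustration.

```python
def remove_if_present(items, value):
    """Remove value from items if it is there; return True when something was removed."""
    try:
        items.remove(value)
    except ValueError:
        # list.remove raises ValueError when value is not in the list.
        return False
    return True


# Made-up names for illustration only; not taken from vdsm.log.
devices = ["mpath_a", "mpath_b"]

try:
    devices.remove("mpath_c")  # not in the list, so this raises
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Whether the caller should tolerate a missing entry or treat it as a real error depends on why the entry is missing, which this sketch does not try to answer.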

Version-Release number of selected component (if applicable):
Any 6.2 rhev-h

How reproducible:
Always

Steps to Reproduce:
1. install rhev-h
2. register to rhev-m
3. add FC storage domain
  
Actual results:
Failed to add SD

Expected results:


Additional info:

This happens only on full 6.2 RHEV-H ISOs.  It does not happen with hybrid builds.

Comment 1 Mike Burns 2011-09-21 14:50:39 UTC
Created attachment 524214 [details]
vdsm.log

Comment 2 Mike Burns 2011-09-21 15:24:50 UTC
A couple of possibly helpful, but possibly not, comments:

This occurred with vdsm builds from -96.1 to latest git (something around -104)

It only seems to be a problem with creating storage domains.  Adding a rhev-h to an existing datacenter with a storage domain already running works correctly.

Comment 4 Mike Burns 2011-09-21 22:54:28 UTC
I've played around with this some more and found a few things:  

- Need to make /var/db writable (bug 740406)
- re-mounting / as rw (mount -o remount,rw /) then restarting vdsmd allows storage domain creation to succeed.


Next test:
clean iscsi storage
fresh boot of rhevh
add iscsi storage domain

Failed with same error
Cleanup storage (vgremove, pvremove)
add again

failed with different error (can't find vg that i removed above)


Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 876, in _run
  File "/usr/share/vdsm/storage/hsm.py", line 1199, in public_createStorageDomain
  File "/usr/share/vdsm/storage/sdf.py", line 60, in create
  File "/usr/share/vdsm/storage/blockSD.py", line 282, in create
  File "/usr/share/vdsm/storage/lvm.py", line 829, in getVGbyUUID
  File "/usr/share/vdsm/storage/lvm.py", line 154, in __getattr__
AttributeError: Failed reload: d2bd6b3e-cc04-489a-a42c-26dbcb8929b7


Cleanup again
restart vdsmd
add storage domain again -- Success

Comment 5 Guohua Ouyang 2011-09-22 07:26:04 UTC
I did not see this issue when testing rhev-h-6.2-0.17.2. I re-ran the test today and still did not see it. Testing steps:
1. install rhevh
2. configure network.
3. drop to shell, check multipath, vgs, pvs. Make sure the LUN on which rhevh is not installed is not partitioned.
4. register to rhevm
5. approve and add FC storage.
6. Adding FC storage succeeds.

Comment 6 Alan Pevec 2011-09-22 08:54:22 UTC
(In reply to comment #5)
> I did not see this issue during rhev-h-6.2-0.17.2

This issue affects "pure" 6.2 RHEV-H builds; -0.17.2 is a "hybrid" one (6.1.z + only libvirt/kvm from 6.2)

Comment 7 Igor Lvovsky 2011-09-22 14:29:32 UTC
The real cause is incorrect LVM behaviour, as described in bug 740575

*** This bug has been marked as a duplicate of bug 740575 ***

Comment 8 Mike Burns 2011-09-29 13:26:43 UTC
We need a workaround in ovirt-node to make this work.  The suggestion is to wrap scsi_id so that it applies s/ +/_/ to its output.

Comment 9 Alan Pevec 2011-09-30 20:02:56 UTC
Actually, the latest workaround attempt in rhevh is to put this in multipath.conf:
defaults {
    getuid_callout "/lib/udev/scsi_id --replace-whitespace --whitelisted --device=/dev/%n"
}

--replace-whitespace does s/ +/_/
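
As a rough illustration of that substitution, here is a minimal Python sketch. It assumes s/ +/_/ is meant globally, so every run of spaces becomes a single underscore. The function name and the sample identifier are made up, and the real scsi_id may treat leading or trailing whitespace differently.

```python
import re


def replace_whitespace(identifier):
    """Replace each run of spaces with one underscore (s/ +/_/ applied globally)."""
    return re.sub(r" +", "_", identifier)


# Made-up identifier, used only to show the effect of the substitution.
sample = "VENDOR  MODEL 0001"
print(replace_whitespace(sample))  # prints VENDOR_MODEL_0001
```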

Comment 10 Mike Burns 2011-09-30 21:36:37 UTC
Patch will do 3 things:

1.  put getuid_callout workaround in multipath.conf
2.  drop the lvm change that sets verify_udev_operations = 1 (it's 0 by default)
3.  remove this workaround:  sed -i -e '/^ENV{DM_UDEV_DISABLE_DM_RULES_FLAG}/d' /lib/udev/rules.d/10-dm.rules

A node built with these changes can successfully autoinstall and create storage domains.  It did uncover a couple bugs in the TUI however.  Disk selection was listing /dev/sda instead of /dev/mapper/<wwid>.  A patch for this issue is WIP.

Comment 11 Mike Burns 2011-09-30 21:38:08 UTC
Created attachment 525841 [details]
Patch

Patch for multipath.conf and removing the workarounds

Comment 12 Mike Burns 2011-10-02 14:35:01 UTC
Created attachment 525938 [details]
Follow-up Patch

Patch to clean up previously mentioned TUI issues

(Collaborated on by Joey Boggs)

Comment 13 Mike Burns 2011-10-02 14:48:38 UTC
Testing:  

Install RHEV-H -- 

verify TUI has the right devices (should show multipath where appropriate)
verify booted to multipath device
There should be no "falling back to direct device creation" errors (or similar) in boot log
lvm.conf should not have verify_udev_operations set
ensure rule ENV{DM_UDEV_DISABLE_DM_RULES_FLAG} exists in 10-dm.rules
create various storage domains using RHEV-H as the host to create them
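
The three file-based checks above could be scripted roughly as follows. This is only a sketch: check_node and its regular expressions are hypothetical, the sample file contents are made up, and the caller is assumed to read lvm.conf, 10-dm.rules and the boot log from wherever they live on the node. The TUI, boot device and storage domain checks still need to be done by hand.

```python
import re

FALLBACK_TEXT = "falling back to direct device creation"
# An uncommented "verify_udev_operations = ..." assignment.
VERIFY_UDEV = re.compile(r"^[ \t]*verify_udev_operations[ \t]*=", re.MULTILINE)
# A rule line starting with the flag that the old sed workaround used to delete.
DM_RULE = re.compile(r"^ENV\{DM_UDEV_DISABLE_DM_RULES_FLAG\}", re.MULTILINE)


def check_node(lvm_conf_text, dm_rules_text, boot_log_text):
    """Return a list of problems found; an empty list means all three checks passed."""
    problems = []
    if VERIFY_UDEV.search(lvm_conf_text):
        problems.append("lvm.conf sets verify_udev_operations")
    if not DM_RULE.search(dm_rules_text):
        problems.append("10-dm.rules lacks the ENV{DM_UDEV_DISABLE_DM_RULES_FLAG} rule")
    if FALLBACK_TEXT in boot_log_text.lower():
        problems.append("boot log reports falling back to direct device creation")
    return problems


# Made-up file contents standing in for a freshly installed node.
good = check_node(
    lvm_conf_text="# verify_udev_operations = 0\n",
    dm_rules_text='ENV{DM_UDEV_DISABLE_DM_RULES_FLAG}=="1", GOTO="dm_end"\n',
    boot_log_text="Starting udev: [  OK  ]\n",
)
print(good)  # prints []
```

Taking the file contents as strings keeps the checks easy to try against saved copies of the files from a test machine.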

Comment 15 Ying Cui 2011-10-14 08:40:37 UTC
Verified this bug on RHEV-H 6.2-20111010.2.el6.
Creating an FC storage domain and a soft iSCSI domain succeeded. No such error in vdsm.log.
We cannot check this bug on a hard iSCSI machine, because it is blocked by bug #742433.

So I am changing the status to Verified. If the issue reproduces on a hard iSCSI machine, I will reopen it.

Comment 16 errata-xmlrpc 2011-12-06 19:28:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1783.html

