Bug 912348

Summary: Sanlock lockspace add failure while creating NFS storage pool on RHEV-H node.
Product: Red Hat Enterprise Linux 6
Reporter: Leonid Natapov <lnatapov>
Component: ovirt-node
Assignee: Mike Burns <mburns>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.4
CC: acathrow, bsarathy, chchen, chetan, cluster-maint, cpelland, cshao, dyasny, gouyang, hadong, hateya, huiwa, jboggs, leiwang, lyarwood, mburns, ovirt-maint, ycui
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The sanlock user was missing in some upgrade circumstances, so the "sanlock lockspace add" command could fail when creating new NFS storage pools on the hypervisor. Now, the sanlock user is checked on upgrade, and is created if it does not exist. The lockspace errors no longer occur.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-10 23:03:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 913267, 916183    
Attachments:
Description Flags
vdsm log none

Description Leonid Natapov 2013-02-18 13:43:14 UTC
Description of problem:

A SanlockException is raised while creating an NFS storage pool on a RHEV-H node.
It happens only with certain storage domains. It looks like a problem between sanlock and the root squash permission on the storage domain.
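
As background for the root squash suspicion: with root_squash, an NFS server maps requests from uid 0 to an anonymous uid, so a process running as root on the client does not get root's access on the export. Below is a minimal sketch of that mapping, assuming the default anonymous uid of 65534 described in exports(5). The function name and the example uids are hypothetical, and this is not code from vdsm or sanlock.

```python
# Default anonymous uid used by NFS squashing (anonuid in exports(5)).
NFS_ANON_UID = 65534


def effective_uid(client_uid, root_squash=True, all_squash=False,
                  anon_uid=NFS_ANON_UID):
    """Return the uid the NFS server would act as for a client request.

    Hypothetical helper for illustration only: it models the uid
    mapping of the root_squash and all_squash export options.
    """
    if all_squash:
        return anon_uid
    if root_squash and client_uid == 0:
        return anon_uid
    return client_uid


# A root process on the client is squashed; a non-root uid is not.
print(effective_uid(0))
print(effective_uid(36))
```

Under this model, a daemon running as root on the client would reach the export as the anonymous user and could be denied access to lease files owned by another user.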


Thread-1039::ERROR::2013-02-18 12:44:15,759::task::833::TaskManager.Task::(_setError) Task=`6c012945-be3f-4d12-834d-769e2d3185c3`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
  File "/usr/share/vdsm/storage/hsm.py", line 893, in createStoragePool
  File "/usr/share/vdsm/storage/sp.py", line 568, in create
  File "/usr/share/vdsm/storage/sp.py", line 510, in _acquireTemporaryClusterLock
  File "/usr/share/vdsm/storage/sd.py", line 436, in acquireHostId
  File "/usr/share/vdsm/storage/clusterlock.py", line 187, in acquireHostId
AcquireHostIdFailure: Cannot acquire host id: ('c6c5d903-e05f-4b1d-89c5-2fdfd0b1b255', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
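
The first element of the SanlockException above (19) appears to be an errno value. On Linux, 19 is ENODEV, whose message matches the 'No such device' string in the traceback. A minimal sketch for decoding such a value with the Python standard library follows; the helper name is hypothetical.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 19 is the code reported in the traceback; the message text is
# platform dependent.
print(describe_errno(19))
```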


Version-Release number of selected component (if applicable):


How reproducible:
Reproduces 100% with certain storage domain.

Steps to Reproduce:
1. Create a storage pool on the master domain.
  
Actual results:

Pool creation failed.

Expected results:
Pool creation succeeds.

[root@purple-vds1 vdsm]# rpm -q sanlock
sanlock-2.6-2.el6.x86_64
[root@purple-vds1 vdsm]# rpm -q vdsm
vdsm-4.10.2-1.4.el6.x86_64
[root@purple-vds1 vdsm]# rpm -q libvirt
libvirt-0.10.2-18.el6.x86_64


vdsm log attached

Comment 1 Leonid Natapov 2013-02-18 13:44:44 UTC
Created attachment 698893 [details]
vdsm log

Comment 3 Yaniv Kaul 2013-02-18 14:07:00 UTC
Leonid - isn't it a RHEVH issue?

Comment 5 Mike Burns 2013-02-18 15:05:56 UTC
What version of rhev-h?  Can this be reproduced with RHEL+vdsm+sanlock?

Comment 6 Mike Burns 2013-02-18 15:13:27 UTC
Dave/Dan,

Any hints on what the issue is here?

Comment 7 Leonid Natapov 2013-02-18 15:29:59 UTC
1. RHEV-H release 6.4 (20130213.1.el6)
2. It works on RHEL. We ran the automation tests on RHEL 6.4 (the same setup) and the failure did not happen.

Comment 8 Mike Burns 2013-02-18 20:14:48 UTC
This is marked regression.  Can you tell me when it worked?  there is no logic in rhev-h or ovirt-node for using sanlock at all.  That logic is all held in vdsm.

Comment 9 Haim 2013-02-18 22:03:38 UTC
(In reply to comment #8)
> This is marked regression.  Can you tell me when it worked?  there is no
> logic in rhev-h or ovirt-node for using sanlock at all.  That logic is all
> held in vdsm.

It worked with RHEV-H 6.3.
From what I understand, sanlock on RHEV-H runs in the root context instead of the sanlock user's context. The problem starts with shares that are exported with root_squash, which prevents sanlock from acquiring the lockspace on the lease.

Also, the sanlock user is missing from /etc/passwd.

Please make the proper changes in the RHEV-H rpm.
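
The missing-user observation above can be checked with a short script. This is a minimal sketch, assuming the standard colon-separated /etc/passwd format; the function names and the sample entries are hypothetical, and this is not the check that ovirt-node performs on upgrade.

```python
def find_user(passwd_text, username):
    """Return the passwd fields for `username`, or None if absent."""
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if fields[0] == username:
            return fields
    return None


def user_exists(username, passwd_path="/etc/passwd"):
    """Check a passwd-format file on disk for `username`."""
    with open(passwd_path) as f:
        return find_user(f.read(), username) is not None


# Illustrative sample data only, not taken from the affected host.
sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "exampleuser:x:1001:1001::/home/exampleuser:/bin/bash\n"
)
print(find_user(sample, "sanlock") is None)
```

On a real host, `user_exists("sanlock")` would read /etc/passwd directly. Python's standard `pwd.getpwnam` is an alternative that also consults other configured name services.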

Comment 10 Mike Burns 2013-02-18 22:44:45 UTC
Was this on an upgrade?  A fresh install has sanlock user and group

Comment 11 Leonid Natapov 2013-02-19 06:19:01 UTC
(In reply to comment #10)
> Was this on an upgrade?  A fresh install has sanlock user and group

This was an upgrade from 6.3.

Comment 12 Leonid Natapov 2013-02-19 06:21:32 UTC
(In reply to comment #10)
> Was this on an upgrade?  A fresh install has sanlock user and group

This was an upgrade from 6.3.

Comment 29 Mike Burns 2013-06-10 23:03:28 UTC
This is not valid for 6.5 anymore.  sanlock inclusion is now part of the RHEV plugin.