Bug 1331051

Summary: cannot attach iso domain to engine 3.6.6.
Product: [oVirt] ovirt-engine Reporter: Sven Kieske <s.kieske>
Component: BLL.StorageAssignee: Nir Soffer <nsoffer>
Status: CLOSED NOTABUG QA Contact: Aharon Canan <acanan>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.6.5.1CC: amureini, bugs, nsoffer, s.kieske
Target Milestone: ovirt-3.6.6Keywords: Regression
Target Release: ---Flags: amureini: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-29 07:03:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
vdsm.log
none
engine.log
none
/etc/sudoers from affected host running vdsm
none
50_vdsm sudoer file, actually altered by me none

Description Sven Kieske 2016-04-27 14:44:09 UTC
Created attachment 1151424 [details]
vdsm.log

Description of problem:

I can successfully create an ISO-Domain on my remote NFS Server.
However I can not attach it to a local storage DC (See Log excerpt below).

Version-Release number of selected component (if applicable):
I'm already running a snapshot 3.6 release because of BZ 1329317

Here are all versions, engine:

rpm -qa | grep -e "vdsm\|ovirt"
ebay-cors-filter-1.0.1-0.1.ovirt.el7.noarch
ovirt-log-collector-3.6.0-1.el7.centos.noarch
vdsm-jsonrpc-java-1.1.9-1.el7.centos.noarch
ovirt-engine-lib-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.0.8-0.0.master.20160415165902.gitca9be7e.el7.noarch
ovirt-engine-websocket-proxy-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-host-deploy-java-1.4.2-0.0.master.20151122153544.gitfc808fc.el7.noarch
ovirt-engine-tools-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-vmconsole-proxy-1.0.0-1.el7.centos.noarch
ovirt-release36-002-2.noarch
ovirt-engine-wildfly-8.2.1-1.el7.x86_64
ovirt-engine-extension-aaa-ldap-setup-1.1.2-1.el7.centos.noarch
ovirt-engine-wildfly-overlay-8.0.5-1.el7.noarch
ovirt-release36-snapshot-007-1.noarch
ovirt-engine-setup-base-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-setup-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-extensions-api-impl-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-iso-uploader-3.6.1-0.0.master.20160414111648.gitd2aea1a.el7.noarch
ovirt-engine-vmconsole-proxy-helper-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-cli-3.6.2.1-0.1.20160111.git696d8ea.el7.centos.noarch
ovirt-host-deploy-1.4.2-0.0.master.20151122153544.gitfc808fc.el7.noarch
ovirt-engine-restapi-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-webadmin-portal-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-userportal-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-backend-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-engine-extension-aaa-ldap-1.1.2-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-tools-backup-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-image-uploader-3.6.1-0.0.master.20151006154111.git95ce637.el7.centos.noarch
ovirt-engine-dbscripts-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch


vdsm host:

rpm -qa | grep -e "vdsm\|ovirt"
vdsm-4.17.23-0.el7.centos.noarch
vdsm-infra-4.17.23-0.el7.centos.noarch
vdsm-xmlrpc-4.17.23-0.el7.centos.noarch
vdsm-yajsonrpc-4.17.23-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.23-0.el7.centos.noarch
vdsm-cli-4.17.23-0.el7.centos.noarch
vdsm-python-4.17.23-0.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
vdsm-jsonrpc-4.17.23-0.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch


How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:
can't attach iso domain

Expected results:

can mount iso domain

Additional info:


this is no configuration error on my side (imho).

I can mount the NFS-Share fine manually on the vdsm host:

mount 10.210.1.6:/home/iso1 /tmp/testmount/
l /tmp/testmount/
total 16
drwxr-xr-x  4 nobody nobody 4096 2016-04-27 16:28 ./
drwxrwxrwt. 9 root   root   4096 2016-04-27 16:30 ../
drwxr-xr-x  4 nobody nobody 4096 2016-04-27 16:28 9ba22bfe-d760-48c9-b4a5-10149cb4ecb9/
-rwxr-xr-x  1 nobody nobody    0 2016-04-27 16:31 __DIRECT_IO_TEST__*
drwx------  2 root   root   4096 2015-01-20 14:58 lost+found/


Excerpt from engine.log:

2016-04-27 16:31:14,812 INFO  [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (default task-10) [5aa023a7] Lock Acquired to object 'EngineLock:{exclusiveLocks='[9ba22bfe-d760-48c9-b4a5-10149
cb4ecb9=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-04-27 16:31:14,943 INFO  [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] Running command: AttachStorageDomainToPoolCommand internal: false
. Entities affected :  ID: 9ba22bfe-d760-48c9-b4a5-10149cb4ecb9 Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN,  ID: 76cd7003-cf29-461a-b4d0-ff278b680cb5 Type: StoragePoolAction group M
ANIPULATE_STORAGE_DOMAIN with role type ADMIN
2016-04-27 16:31:14,996 INFO  [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-8-thread-44) [110400ae] Running command: ConnectStorageToVdsCommand internal: true. Entities af
fected :  ID: aaa00000-0000-0000-0000-123456789aaa Type: SystemAction group CREATE_STORAGE_DOMAIN with role type ADMIN
2016-04-27 16:31:14,999 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-8-thread-44) [110400ae] START, ConnectStorageServerVDSCommand(HostName = vrnode0015
, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='0dab6a20-e772-4cb4-8b7b-53d2d4757c1b', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='NFS', connectionList='[Storage
ServerConnections:{id='661332a4-a897-4e04-b6bc-91c88d71b3cf', connection='10.210.1.6:/home/iso1', iqn='null', vfsType='null', mountOptions='null', nfsVersion='V4', nfsRetrans='null', nfsTimeo='null', iface='null
', netIfaceName='null'}]'}), log id: 183c6dbd
2016-04-27 16:31:15,016 INFO  [org.ovirt.engine.core.bll.aaa.LoginUserCommand] (default task-12) [] Running command: LoginUserCommand internal: false.
2016-04-27 16:31:15,020 INFO  [org.ovirt.engine.core.bll.aaa.LogoutSessionCommand] (default task-12) [5e3d9501] Running command: LogoutSessionCommand internal: false.
2016-04-27 16:31:15,031 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-8-thread-44) [110400ae] FINISH, ConnectStorageServerVDSCommand, return: {661332a4-a
897-4e04-b6bc-91c88d71b3cf=0}, log id: 183c6dbd
2016-04-27 16:31:15,032 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] START, AttachStorageDomainVDSCommand( AttachStorageDomainVDS
CommandParameters:{runAsync='true', storagePoolId='76cd7003-cf29-461a-b4d0-ff278b680cb5', ignoreFailoverLimit='false', storageDomainId='9ba22bfe-d760-48c9-b4a5-10149cb4ecb9'}), log id: 48a0aa2e
2016-04-27 16:31:15,065 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [5e3d9501] Correlation ID: 5e3d9501, Call Stack: null, Custom Event ID: -1, Message: User ad
min@internal logged out.
2016-04-27 16:31:15,251 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] Failed in 'AttachStorageDomainVDS' method
2016-04-27 16:31:15,278 INFO  [org.ovirt.engine.core.bll.aaa.LoginUserCommand] (default task-4) [] Running command: LoginUserCommand internal: false.
2016-04-27 16:31:15,288 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Me
ssage: VDSM command failed: Cannot obtain lock: u"id=9ba22bfe-d760-48c9-b4a5-10149cb4ecb9, rc=1, out=[], err=['sudo: a password is required']"
2016-04-27 16:31:15,288 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] Command 'AttachStorageDomainVDSCommand( AttachStorageDomainV
DSCommandParameters:{runAsync='true', storagePoolId='76cd7003-cf29-461a-b4d0-ff278b680cb5', ignoreFailoverLimit='false', storageDomainId='9ba22bfe-d760-48c9-b4a5-10149cb4ecb9'})' execution failed: IRSGenericExce
ption: IRSErrorException: Failed to AttachStorageDomainVDS, error = Cannot obtain lock: u"id=9ba22bfe-d760-48c9-b4a5-10149cb4ecb9, rc=1, out=[], err=['sudo: a password is required']", code = 651
2016-04-27 16:31:15,288 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] FINISH, AttachStorageDomainVDSCommand, log id: 48a0aa2e
2016-04-27 16:31:15,288 ERROR [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] Command 'org.ovirt.engine.core.bll.storage.AttachStorageDomainToP
oolCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Failed to AttachStorageDomainVDS, error = Cannot obtain lock: u"id=9ba22bfe-d760-48c9-b4a5-10149cb4ecb9, rc=1, out=[], err=['sudo: a password is required']", code = 651 (Failed with error AcquireLockFailure and code 651)
2016-04-27 16:31:15,289 INFO  [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] Command [id=da717d90-6f46-4f7c-8711-c65d0abea948]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: StoragePoolIsoMapId:{storagePoolId='76cd7003-cf29-461a-b4d0-ff278b680cb5', storageId='9ba22bfe-d760-48c9-b4a5-10149cb4ecb9'}.
2016-04-27 16:31:15,318 INFO  [org.ovirt.engine.core.bll.aaa.LogoutSessionCommand] (default task-4) [14a048ad] Running command: LogoutSessionCommand internal: false.
2016-04-27 16:31:15,329 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] Correlation ID: 5aa023a7, Job ID: f2c8a361-23ef-4750-b919-acac205d59e7, Call Stack: null, Custom Event ID: -1, Message: Failed to attach Storage Domain iso1 to Data Center vrd0015. (User: admin@internal)
2016-04-27 16:31:15,354 INFO  [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-8-thread-31) [5aa023a7] Lock freed to object 'EngineLock:{exclusiveLocks='[9ba22bfe-d760-48c9-b4a5-10149cb4ecb9=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'


I have attached both engine and vdsm log.

Comment 1 Sven Kieske 2016-04-27 14:44:55 UTC
Created attachment 1151425 [details]
engine.log

Comment 2 Sven Kieske 2016-04-27 14:58:02 UTC
I also did run the nfs-check.py from vdsm/contrib repo described here:
https://www.ovirt.org/documentation/how-to/troubleshooting/troubleshooting-nfs-storage-issues/

result:

python ./nfs-check.py 10.210.1.6:/home/iso1
Current hostname: vrnode0015.vrootdev - IP addr 10.210.2.20
Trying to /bin/mount -t nfs 10.210.1.6:/home/iso1...
Executing NFS tests..
Removing vdsmTest file..
Status of tests [OK]
Disconnecting from NFS Server..
Done!


So I really don't know what's the problem.

Maybe this is related to BZ 1320128 ?

There seem to be some network issues in between Connections from engine to host (in engine and/or vdsm code, nothing related to the infrastructure).

kind regards

Sven

Comment 3 Allon Mureinik 2016-04-27 16:12:24 UTC
Relevant error from VDSM:

jsonrpc.Executor/2::DEBUG::2016-04-27 16:28:55,378::clusterlock::140::Storage.Misc.excCmd::(acquire) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /usr/bin/setsid /usr/bin/ionice -c 1 -n 0 /usr/bin/su vdsm -s /bin/sh -c '/usr/libexec/vdsm/spmprotect.sh start 9ba22bfe-d760-48c9-b4a5-10149cb4ecb9 1 5 /rhev/data-center/mnt/10.210.1.6:_home_iso1/9ba22bfe-d760-48c9-b4a5-10149cb4ecb9/dom_md/leases 60000 10000 3 17433' (cwd /usr/libexec/vdsm)
jsonrpc.Executor/2::DEBUG::2016-04-27 16:28:55,387::clusterlock::140::Storage.Misc.excCmd::(acquire) FAILED: <err> = 'sudo: a password is required\n'; <rc> = 1
jsonrpc.Executor/2::ERROR::2016-04-27 16:28:55,387::task::866::Storage.TaskManager.Task::(_setError) Task=`9cf18f8d-c6a8-4e4a-ac8e-d9c4d786d3ae`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1210, in attachStorageDomain
    pool.attachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 943, in attachSD
    dom.acquireClusterLock(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 565, in acquireClusterLock
    self._clusterLock.acquire(hostID)
  File "/usr/share/vdsm/storage/clusterlock.py", line 142, in acquire
    raise se.AcquireLockFailure(self._sdUUID, rc, out, err)
AcquireLockFailure: Cannot obtain lock: u"id=9ba22bfe-d760-48c9-b4a5-10149cb4ecb9, rc=1, out=[], err=['sudo: a password is required']"

Can you please share your /etc/sudoers.d/50_vdsm file?

Comment 4 Allon Mureinik 2016-04-27 16:15:07 UTC
And the /etc/sudoers file too please, actually.

Comment 5 Sven Kieske 2016-04-28 06:05:49 UTC
Created attachment 1151722 [details]
/etc/sudoers from affected host running vdsm

This is the stock centos 7.2 sudoers file, I did not alter it

Comment 6 Sven Kieske 2016-04-28 06:08:51 UTC
Created attachment 1151735 [details]
50_vdsm sudoer file, actually altered by me

I added the section Cmnd_Alias VDSM_NETWORK because I run a hook which uses "tc" for advanced traffic control of vms.

I hope this isn't problematic, this works with older vdsm and ovirt-engine versions.

Comment 7 Allon Mureinik 2016-04-28 07:13:18 UTC
Seems to be in order [unfortunately :-(].
Nir - can you take a look?

Comment 8 Red Hat Bugzilla Rules Engine 2016-04-28 07:13:24 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 9 Nir Soffer 2016-04-28 14:23:41 UTC
Sven, 

In your vdsm log, we see that vdsm ask to run this command with sudo:

    /usr/bin/setsid /usr/bin/ionice -c 1 -n 0 /usr/bin/su vdsm -s /bin/sh -c '/usr/libexec/vdsm/spmprotect.sh ...

But your /etc/sudoers.d/50_vdsm will allow only:

    /usr/bin/setsid /usr/bin/ionice -c ? -n ? /bin/su vdsm -s /bin/sh -c /usr/libexec/vdsm/spmprotect.sh

Note "/bin/su" vs "/usr/bin/su"

Can you share the output of:
    
    grep 'EXT_SU\b' /usr/lib/python2.7/site-packages/vdsm/constants.py

Did you modify the 50_vdsm file installed by vdsm, or copied from another machine?

I don't think modifying files installed by vdsm is a good idea. Your file
will be overwritten once you upgrade vdsm.

I think you want to put your private sudoers configuration in another file like:

    /etc/sudoers.d/51_vdsm_hook_name

The contents would some somthing like this:

    vdsm  ALL=(ALL) NOPASSWD: /sbin/tc *

Comment 10 Sven Kieske 2016-04-29 07:03:21 UTC
(In reply to Nir Soffer from comment #9)
> Sven, 
> 
> In your vdsm log, we see that vdsm ask to run this command with sudo:
> 
>     /usr/bin/setsid /usr/bin/ionice -c 1 -n 0 /usr/bin/su vdsm -s /bin/sh -c
> '/usr/libexec/vdsm/spmprotect.sh ...
> 
> But your /etc/sudoers.d/50_vdsm will allow only:
> 
>     /usr/bin/setsid /usr/bin/ionice -c ? -n ? /bin/su vdsm -s /bin/sh -c
> /usr/libexec/vdsm/spmprotect.sh
> 
> Note "/bin/su" vs "/usr/bin/su"
> 
> Can you share the output of:
>     
>     grep 'EXT_SU\b' /usr/lib/python2.7/site-packages/vdsm/constants.py
> 
> Did you modify the 50_vdsm file installed by vdsm, or copied from another
> machine?

In fact I just checked the config management (I do basic setup with cobbler and later configuration via salt):

It seems my colleagues found it a good idea to overwrite the whole vdsm config with an old one from vdsm-4.13.4-0, from EL6 machines hence the error.

> I don't think modifying files installed by vdsm is a good idea. Your file
> will be overwritten once you upgrade vdsm.
> 
> I think you want to put your private sudoers configuration in another file
> like:
> 
>     /etc/sudoers.d/51_vdsm_hook_name
> 
> The contents would some somthing like this:
> 
>     vdsm  ALL=(ALL) NOPASSWD: /sbin/tc *

I will adapt your suggestion and put my own config in my own file and not fiddle with vdsm's sudoers file anymore.

In fact on EL7 su is located under /usr/bin/su and tc is located under /usr/sbin/tc and the stock 50_vdsm sudoers file has the correct path to "su" in it, I just checked.

I just reinstalled the host with your suggested changes to my sudoers files (taking stock 50_vdsm sudoers and my changes in a separate file) and I can confirm that iso domain attaching works again.

Thanks for your great help and support, it's really appreciated! :)

And sorry for all the noise, as this was apparently a configuration mistake on my side from the very beginning.

kind regards

Sven