Bug 950596

Summary: engine: can't activate a host that was moved to non-operational because engine is trying to connect to a domain that was moved to inactive
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Liron Aravot <laravot>
Status: CLOSED CURRENTRELEASE QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: acathrow, amureini, bazulay, dyasny, hateya, iheim, lnatapov, lpeer, mkublin, Rhev-m-bugs, scohen, yeylon, ykaul
Target Milestone: --- Keywords: Regression
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: sf16 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 948448    
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-04-10 13:21:14 UTC
Created attachment 733695 [details]
logs

Description of problem:

I created a setup with two hosts on iSCSI storage with two domains.
The master domain has two LUNs, one from serverX and one from serverY; the second domain is from serverY.
I blocked connectivity to serverX from both hosts.
The SPM became non-operational and the domain became inactive.
Once the domain became inactive, I tried activating the non-operational host. Because the engine keeps sending connectStorageServer for the inactive domain and failing, the host remains non-operational and cannot be activated.
(Please note that newly added hosts cannot be activated either.)


Version-Release number of selected component (if applicable):

sf13
vdsm-4.10.2-14.0.el6ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, create a domain from serverX
2. Create a second domain from serverY
3. Extend the first domain with a LUN from serverY
4. Block connectivity to serverX from both hosts

Actual results:

The SPM host becomes non-operational.
The domain with storage from serverX becomes inactive.
The host cannot be activated because the engine keeps sending connectStorageServer for the inactive domain and failing.

Expected results:

we should be able to recover the host
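
One way to picture the expected behaviour: when the connection list for a host is built, connections that belong only to inactive domains are left out, so an unreachable target cannot keep the host non-operational. The sketch below is illustrative Python only, not the engine's code (the engine is written in Java) and not necessarily how the fix was implemented; the `Domain` type, its fields, and `connections_to_attempt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    # Hypothetical model of a storage domain; not an engine class.
    name: str
    status: str              # "active" or "inactive"
    connections: List[str]   # storage servers backing the domain's LUNs


def connections_to_attempt(domains: List[Domain]) -> List[str]:
    """Return the servers used by at least one active domain, in first-seen order."""
    wanted: List[str] = []
    for domain in domains:
        if domain.status != "active":
            continue  # do not try to connect on behalf of an inactive domain
        for server in domain.connections:
            if server not in wanted:
                wanted.append(server)
    return wanted


domains = [
    Domain("master", "inactive", ["serverX", "serverY"]),
    Domain("second", "active", ["serverY"]),
]
print(connections_to_attempt(domains))  # ['serverY']
```

With this filtering, serverY is still connected because the active second domain needs it, while serverX is never attempted.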


Additional info: logs



2013-04-10 14:15:02,590 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-94) [5e6f0bbf] START, ConnectStorageServerVDSCommand(HostName = cougar01, HostId = 42248b17-62ab-4f1f-88e0-5f24b445d00b, storagePoolId = 2d223405-2f36-4d9a-983c-e651938bd0ed, storageType = ISCSI, connectionList = [{ id: 97fd9f42-2d16-4018-9ed9-31ef6865443f, connection: 10.35.64.10, iqn: Dafna-31-03, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 5938bb8d-0e2c-4214-a3be-e8a646168753, connection: 10.35.64.106, iqn: Dafna-extend-2, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 9075b048-2cec-4f87-9fb7-85fcd286d38b, connection: 10.35.64.10, iqn: Dafna-31-02, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 37fa68ca
2013-04-10 14:15:06,894 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-2) Running command: LoginUserCommand internal: false.


2013-04-10 14:17:04,293 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-94) [5e6f0bbf] FINISH, ConnectStorageServerVDSCommand, return: {97fd9f42-2d16-4018-9ed9-31ef6865443f=0, 5938bb8d-0e2c-4214-a3be-e8a646168753=465, 9075b048-2cec-4f87-9fb7-85fcd286d38b=0}, log id: 37fa68ca
2013-04-10 14:17:04,310 INFO  [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [5e6f0bbf] The lun with id oweNvL-LXOf-gdYu-WhaP-3OSU-Hw9X-TW2g4N was reported as problematic !
2013-04-10 14:17:04,311 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (QuartzScheduler_Worker-94) Unable to get value of property: glusterVolume for class org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase
2013-04-10 14:17:04,311 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (QuartzScheduler_Worker-94) Unable to get value of property: vds for class org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase
2013-04-10 14:17:04,314 ERROR [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [5e6f0bbf] The connection with details 10.35.64.106 Dafna-extend-2 (LUN 1Dafna-extend-21365522) failed because of error code 465 and error message is: failed to setup iscsi subsystem
2013-04-10 14:17:04,314 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-94) [5e6f0bbf] Host cougar01 storage connection was failed 
2013-04-10 14:17:04,330 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-94) [5e6f0bbf] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 42248b17-62ab-4f1f-88e0-5f24b445d00b Type: VDS
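
The FINISH line above returns one status per connection id, and only 5938bb8d-... (10.35.64.106, Dafna-extend-2) is non-zero. When triaging similar logs, a small script can pull the failing ids out of that map. This is a minimal sketch that assumes the `return: {id=status, ...}` layout seen in this excerpt; `failed_connections` is a hypothetical helper, not part of the engine or vdsm.

```python
import re
from typing import Dict


def failed_connections(finish_line: str) -> Dict[str, int]:
    """Extract {connection id: status} for non-zero statuses from a FINISH log line."""
    match = re.search(r"return: \{([^}]*)\}", finish_line)
    if match is None:
        return {}
    failed: Dict[str, int] = {}
    for pair in match.group(1).split(","):
        conn_id, _, status = pair.strip().partition("=")
        if status and int(status) != 0:
            failed[conn_id] = int(status)
    return failed


line = (
    "FINISH, ConnectStorageServerVDSCommand, return: "
    "{97fd9f42-2d16-4018-9ed9-31ef6865443f=0, "
    "5938bb8d-0e2c-4214-a3be-e8a646168753=465, "
    "9075b048-2cec-4f87-9fb7-85fcd286d38b=0}, log id: 37fa68ca"
)
print(failed_connections(line))
# {'5938bb8d-0e2c-4214-a3be-e8a646168753': 465}
```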



storageRefresh::DEBUG::2013-04-10 14:08:39,064::misc::1063::SamplingMethod::(__call__) Returning last result
storageRefresh::WARNING::2013-04-10 14:08:39,064::fileUtils::184::fileUtils::(createdir) Dir /rhev/data-center/hsm-tasks already exists
Thread-16::DEBUG::2013-04-10 14:10:37,128::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = 'iscsiadm: Could not login to [iface: default, target: Dafna-extend-2, portal: 10.35.64.106,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'; <rc> = 8
Thread-16::DEBUG::2013-04-10 14:10:37,129::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m iface' (cwd None)
Thread-16::DEBUG::2013-04-10 14:10:37,163::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
Thread-16::DEBUG::2013-04-10 14:10:37,164::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m node -T Dafna-extend-2 -I default -p 10.35.64.106:3260 -u' (cwd None)
Thread-16::DEBUG::2013-04-10 14:10:37,180::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = 'iscsiadm: No matching sessions found\n'; <rc> = 21
Thread-16::DEBUG::2013-04-10 14:10:37,181::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m iface' (cwd None)
Thread-16::DEBUG::2013-04-10 14:10:37,197::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
Thread-16::DEBUG::2013-04-10 14:10:37,198::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m node -T Dafna-extend-2 -I default -p 10.35.64.106:3260 --op=delete' (cwd None)
Thread-16::DEBUG::2013-04-10 14:10:37,216::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
Thread-16::ERROR::2013-04-10 14:10:37,217::hsm::2252::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2248, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 341, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 135, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, targetName)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 292, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: default, target: Dafna-extend-2, portal: 10.35.64.106,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: Dafna-extend-2, portal: 10.35.64.106,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
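
On the vdsm side, the failing commands show up as `FAILED: <err> = '...'; <rc> = N` entries from Storage.Misc.excCmd (rc 8 for the login timeout, rc 21 for the logout that found no session). A minimal sketch for extracting those from a vdsm log, assuming the line layout shown in this excerpt; `parse_failed` and `FAILED_RE` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Matches the excCmd failure suffix as it appears in the excerpt above.
FAILED_RE = re.compile(r"FAILED: <err> = '(?P<err>.*)'; <rc> = (?P<rc>\d+)")


def parse_failed(log_line: str) -> Optional[Tuple[int, str]]:
    """Return (rc, err text) for an excCmd FAILED line, or None for any other line."""
    match = FAILED_RE.search(log_line)
    if match is None:
        return None
    return int(match.group("rc")), match.group("err")


sample = (
    "Thread-16::DEBUG::2013-04-10 14:10:37,180::misc::83::"
    "Storage.Misc.excCmd::(<lambda>) "
    "FAILED: <err> = 'iscsiadm: No matching sessions found\\n'; <rc> = 21"
)
result = parse_failed(sample)
print(result[0])  # 21
```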

Comment 2 Dafna Ron 2013-05-13 15:20:24 UTC
verified on sf16
we move the domain to inactive and all of the hosts stay in up state

Comment 3 Itamar Heim 2013-06-11 08:23:15 UTC
3.2 has been released

Comment 4 Itamar Heim 2013-06-11 08:25:05 UTC
3.2 has been released