Created attachment 733695[details]
logs
Description of problem:
I created a setup with two hosts on iSCSI storage with two domains.
The master domain has two LUNs, one from serverX and one from serverY; the second domain is from serverY.
I blocked connectivity to serverX from both hosts.
The SPM became non-operational and the domain became inactive.
Once the domain became inactive, I tried activating the non-operational host. Because the engine keeps sending connectStorageServer for the inactive domain and that call fails, the host remains non-operational and cannot be activated.
(Please note that newly added hosts cannot be activated either.)
Version-Release number of selected component (if applicable):
sf13
vdsm-4.10.2-14.0.el6ev.x86_64
How reproducible:
100%
Steps to Reproduce:
1. In a two-host cluster, create a domain from serverX
2. Create a domain from serverY
3. Extend the first domain with a LUN from serverY
4. Block connectivity to serverX from both hosts
Actual results:
The SPM host becomes non-operational.
The domain with storage from serverX becomes inactive.
The host cannot be activated because the engine keeps sending connectStorageServer for the inactive domain and failing.
Expected results:
We should be able to recover the host.
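As a rough sketch of the expected behavior (not the engine's actual logic, which is Java): a failed storage connection could be tolerated during host activation when every domain that uses it is already inactive. The `Connection` class, `host_can_activate`, and the domain names below are hypothetical and only illustrate the policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical model of one storage server connection."""
    conn_id: str
    domain_ids: frozenset  # ids of the storage domains that use this connection


def host_can_activate(connections, results, inactive_domains):
    """Return True if no failed connection is needed by an active domain.

    results maps connection id -> status code (0 means success), in the
    same spirit as the connectStorageServer return value in the logs.
    A connection with no entry in results is treated as not attempted.
    """
    inactive = set(inactive_domains)
    for conn in connections:
        status = results.get(conn.conn_id, 0)
        if status == 0:
            continue
        # Tolerate the failure only if the connection serves at least one
        # domain and all of those domains are already inactive.
        if not (conn.domain_ids and conn.domain_ids <= inactive):
            return False
    return True
```

With this policy, a host whose only failing connection belongs to the inactive domain would be allowed to come up, while a failure on a connection used by any active domain would still keep it non-operational.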
Additional info: logs
2013-04-10 14:15:02,590 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-94) [5e6f0bbf] START, ConnectStorageServerVDSCommand(HostName = cougar01, HostId = 42248b17-62ab-4f1f-88e0-5f24b445d00b, storagePoolId = 2d223405-2f36-4d9a-983c-e651938bd0ed, storageType = ISCSI, connectionList = [{ id: 97fd9f42-2d16-4018-9ed9-31ef6865443f, connection: 10.35.64.10, iqn: Dafna-31-03, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 5938bb8d-0e2c-4214-a3be-e8a646168753, connection: 10.35.64.106, iqn: Dafna-extend-2, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 9075b048-2cec-4f87-9fb7-85fcd286d38b, connection: 10.35.64.10, iqn: Dafna-31-02, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 37fa68ca
2013-04-10 14:15:06,894 INFO [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-2) Running command: LoginUserCommand internal: false.
2013-04-10 14:17:04,293 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-94) [5e6f0bbf] FINISH, ConnectStorageServerVDSCommand, return: {97fd9f42-2d16-4018-9ed9-31ef6865443f=0, 5938bb8d-0e2c-4214-a3be-e8a646168753=465, 9075b048-2cec-4f87-9fb7-85fcd286d38b=0}, log id: 37fa68ca
2013-04-10 14:17:04,310 INFO [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [5e6f0bbf] The lun with id oweNvL-LXOf-gdYu-WhaP-3OSU-Hw9X-TW2g4N was reported as problematic !
2013-04-10 14:17:04,311 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (QuartzScheduler_Worker-94) Unable to get value of property: glusterVolume for class org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase
2013-04-10 14:17:04,311 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (QuartzScheduler_Worker-94) Unable to get value of property: vds for class org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase
2013-04-10 14:17:04,314 ERROR [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [5e6f0bbf] The connection with details 10.35.64.106 Dafna-extend-2 (LUN 1Dafna-extend-21365522) failed because of error code 465 and error message is: failed to setup iscsi subsystem
2013-04-10 14:17:04,314 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-94) [5e6f0bbf] Host cougar01 storage connection was failed
2013-04-10 14:17:04,330 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-94) [5e6f0bbf] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 42248b17-62ab-4f1f-88e0-5f24b445d00b Type: VDS
torageRefresh::DEBUG::2013-04-10 14:08:39,064::misc::1063::SamplingMethod::(__call__) Returning last result
storageRefresh::WARNING::2013-04-10 14:08:39,064::fileUtils::184::fileUtils::(createdir) Dir /rhev/data-center/hsm-tasks already exists
Thread-16::DEBUG::2013-04-10 14:10:37,128::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = 'iscsiadm: Could not login to [iface: default, target: Dafna-extend-2, portal: 10.35.64.106,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'; <rc> = 8
Thread-16::DEBUG::2013-04-10 14:10:37,129::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m iface' (cwd None)
Thread-16::DEBUG::2013-04-10 14:10:37,163::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
Thread-16::DEBUG::2013-04-10 14:10:37,164::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m node -T Dafna-extend-2 -I default -p 10.35.64.106:3260 -u' (cwd None)
Thread-16::DEBUG::2013-04-10 14:10:37,180::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = 'iscsiadm: No matching sessions found\n'; <rc> = 21
Thread-16::DEBUG::2013-04-10 14:10:37,181::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m iface' (cwd None)
Thread-16::DEBUG::2013-04-10 14:10:37,197::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
Thread-16::DEBUG::2013-04-10 14:10:37,198::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m node -T Dafna-extend-2 -I default -p 10.35.64.106:3260 --op=delete' (cwd None)
Thread-16::DEBUG::2013-04-10 14:10:37,216::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
Thread-16::ERROR::2013-04-10 14:10:37,217::hsm::2252::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2248, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 341, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 135, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, targetName)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 292, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: default, target: Dafna-extend-2, portal: 10.35.64.106,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: Dafna-extend-2, portal: 10.35.64.106,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
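For anyone triaging similar engine logs, here is a small helper sketch that pulls the failing connection ids out of a ConnectStorageServerVDSCommand FINISH line. The function name `failed_connections` is made up for this example, and the regular expression assumes the `return: {id=code, ...}` layout seen in the log above.

```python
import re


def failed_connections(finish_line):
    """Return {connection_id: status_code} for non-zero codes in a FINISH line."""
    match = re.search(r"return: \{([^}]*)\}", finish_line)
    if match is None:
        return {}
    failed = {}
    for pair in match.group(1).split(","):
        conn_id, _, code = pair.strip().partition("=")
        # Skip empty fragments (for example an empty map) and successful entries.
        if code and int(code) != 0:
            failed[conn_id] = int(code)
    return failed


line = (
    "FINISH, ConnectStorageServerVDSCommand, return: "
    "{97fd9f42-2d16-4018-9ed9-31ef6865443f=0, "
    "5938bb8d-0e2c-4214-a3be-e8a646168753=465, "
    "9075b048-2cec-4f87-9fb7-85fcd286d38b=0}, log id: 37fa68ca"
)
print(failed_connections(line))
```

Applied to the FINISH line in this report, it yields only connection 5938bb8d-0e2c-4214-a3be-e8a646168753 with code 465, which matches the ISCSIStorageHelper error for 10.35.64.106.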