Bug 1566083
Summary: | NetApp - Duplicate iSCSI storage connections causing (probably) poor cleanup of iSCSI connections after a host goes to maintenance | ||||||
---|---|---|---|---|---|---|---|
Product: | [oVirt] ovirt-engine | Reporter: | Avihai <aefrat> | ||||
Component: | BLL.Storage | Assignee: | Nobody <nobody> | ||||
Status: | CLOSED DEFERRED | QA Contact: | Avihai <aefrat> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 4.2.2.6 | CC: | aefrat, bugs, eshenitz | ||||
Target Milestone: | --- | Keywords: | Automation | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2021-09-29 11:33:06 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
This is not a regression, but it is related to the infra team's move to a new NetApp storage server (on which we run our TierX tests). It is an automation blocker, as it blocks many tests in the storage connections test plan.

We had the same issue since 3.1. I don't understand how this can be an automation blocker now.

(In reply to Allon Mureinik from comment #2)
> We had the same issue since 3.1.

I do not claim this issue is a regression, only an automation blocker.

> I don't understand how this can be an automation blocker now.

We worked with a different NetApp iSCSI storage provider until the recent lab move and did not see this issue there (why, I have no idea). Now we do see it, and it blocks automation tests; hence it is an automation blocker.

This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

This bug/RFE is more than 2 years old and has not received enough attention so far; it is now flagged as pending close. Please review whether it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.

This bug has not received any attention in a long time, and it is not planned for the foreseeable future. The oVirt development team has no plans to work on it. Please feel free to reopen if you have a plan for how to contribute this feature/bug fix.
Created attachment 1420326 [details]
engine and vdsm logs, screenshot of the storage domain window

Description of problem:
This occurs on the NetApp storage server netapp-cluster3.mgmt.lab.eng.tlv2.redhat.com - it DOES NOT OCCUR on XtremIO storage. One and only one host (the only host on the DC/cluster/SD).

First, after discovery, login, and storage domain creation using host1, we see duplicate iSCSI connections: before the host goes to maintenance, 4 sessions appear that look like duplicates of each other. Note also that 10.46.16.9 is the IP for the FQDN "mantis-iscsi-lif2.lab.eng.tlv2.redhat.com", so all 4 sessions are actually the same storage connection.

iscsiadm -m session, before the host goes to maintenance:

tcp: [16] 10.46.16.9:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
tcp: [17] mantis-iscsi-lif2.lab.eng.tlv2.redhat.com:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
tcp: [18] mantis-iscsi-lif2.lab.eng.tlv2.redhat.com:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
tcp: [19] 10.46.16.9:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)

Then, after moving the host with the iSCSI storage connection to maintenance, not all connections are cleared as expected.
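The four sessions above differ only in how the portal is spelled (IP vs. FQDN). A minimal sketch of how the `iscsiadm -m session` output can be parsed and the portals normalized to show they all point at the same target (this is illustration code, not VDSM code; the `resolve` mapping stands in for a real DNS lookup such as `socket.gethostbyname`):

```python
import re

# One line of `iscsiadm -m session` output, e.g.:
# tcp: [16] 10.46.16.9:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
SESSION_RE = re.compile(
    r"tcp: \[(?P<sid>\d+)\] (?P<portal>\S+):(?P<port>\d+),(?P<tpgt>\d+) (?P<iqn>\S+)"
)

def parse_sessions(output):
    """Parse `iscsiadm -m session` output into a list of session dicts."""
    return [m.groupdict() for m in SESSION_RE.finditer(output)]

def unique_targets(sessions, resolve):
    """Group sessions by (resolved portal IP, port, tpgt, iqn).

    `resolve` maps hostnames to IPs; in real code this would be a DNS
    lookup.  Sessions that land in the same group are duplicates of the
    same storage connection.
    """
    targets = {}
    for s in sessions:
        key = (resolve.get(s["portal"], s["portal"]),
               s["port"], s["tpgt"], s["iqn"])
        targets.setdefault(key, []).append(s["sid"])
    return targets
```

With the four sessions from the report and `resolve = {"mantis-iscsi-lif2.lab.eng.tlv2.redhat.com": "10.46.16.9"}`, all four session IDs collapse into a single target key, which is what "all 4 sessions are actually the same storage connection" means above.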
After the host goes to maintenance, 2 sessions still remain:

[root@infra4-nested-3 ~]# iscsiadm -m session
tcp: [17] mantis-iscsi-lif2.lab.eng.tlv2.redhat.com:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
tcp: [18] mantis-iscsi-lif2.lab.eng.tlv2.redhat.com:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)

In the VDSM log we can see that only 2 connections were disconnected:

conList=[
  {'id': '5a3be19c-1fb9-4d49-9962-812061db97f1', 'connection': 'mantis-iscsi-lif2.lab.eng.tlv2.redhat.com', 'iqn': 'iqn.1992-08.com.netapp:vserver-rhv-qe', 'user': u'', 'tpgt': '1033', 'password': '********', 'port': '3260'},
  {'id': 'b71e88ab-8e66-48d2-bffa-ae69d58bf7cc', 'connection': '10.46.16.9', 'iqn': 'iqn.1992-08.com.netapp:vserver-rhv-qe', 'user': u'', 'tpgt': '1033', 'password': '********', 'port': '3260'}
]

2018-04-11 15:46:53,149+0300 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call StoragePool.disconnect succeeded in 2.94 seconds (__init__:573)
2018-04-11 15:46:53,177+0300 DEBUG (jsonrpc/0) [storage.TaskManager.Task] (Task='63d6df2a-e4bd-4f95-b06d-ae53c2bec80e') moving from state init -> state preparing (task:602)
2018-04-11 15:46:53,178+0300 INFO (jsonrpc/0) [vdsm.api] START disconnectStorageServer(domType=3, spUUID='3a173838-741d-440d-9047-ae74bbad5307', conList=[{'id': '5a3be19c-1fb9-4d49-9962-812061db97f1', 'connection': 'mantis-iscsi-lif2.lab.eng.tlv2.redhat.com', 'iqn': 'iqn.1992-08.com.netapp:vserver-rhv-qe', 'user': u'', 'tpgt': '1033', 'password': '********', 'port': '3260'}, {'id': 'b71e88ab-8e66-48d2-bffa-ae69d58bf7cc', 'connection': '10.46.16.9', 'iqn': 'iqn.1992-08.com.netapp:vserver-rhv-qe', 'user': u'', 'tpgt': '1033', 'password': '********', 'port': '3260'}], options=None) from=::ffff:10.46.16.68,56952, flow_id=3122df0a, task_id=63d6df2a-e4bd-4f95-b06d-ae53c2bec80e (api:46)
2018-04-11 15:46:53,179+0300 INFO (jsonrpc/0) [storage.Server.ISCSI] disconnecting (storageServer:551)
2018-04-11 15:46:53,215+0300 DEBUG (jsonrpc/0) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 /usr/bin/sudo -n /sbin/iscsiadm -m session -r 16 -u (cwd None) (commands:65)
2018-04-11 15:46:53,364+0300 DEBUG (jsonrpc/0) [storage.Misc.excCmd] SUCCESS: <err> = ''; <rc> = 0 (commands:86)
2018-04-11 15:46:53,365+0300 DEBUG (jsonrpc/0) [storage.ISCSI] iface.net_ifacename not provided, skipping rp filter setup (iscsi:572)
2018-04-11 15:46:53,365+0300 INFO (jsonrpc/0) [storage.Server.ISCSI] disconnecting (storageServer:551)
2018-04-11 15:46:54,523+0300 INFO (jsonrpc/0) [vdsm.api] FINISH disconnectStorageServer return={'statuslist': [{'status': 0, 'id': '5a3be19c-1fb9-4d49-9962-812061db97f1'}, {'status': 0, 'id': 'b71e88ab-8e66-48d2-bffa-ae69d58bf7cc'}]} from=::ffff:10.46.16.68,56952, flow_id=3122df0a, task_id=63d6df2a-e4bd-4f95-b06d-ae53c2bec80e (api:52)
2018-04-11 15:46:54,524+0300 DEBUG (jsonrpc/0) [storage.TaskManager.Task] (Task='63d6df2a-e4bd-4f95-b06d-ae53c2bec80e') finished: {'statuslist': [{'status': 0, 'id': '5a3be19c-1fb9-4d49-9962-812061db97f1'}, {'status': 0, 'id': 'b71e88ab-8e66-48d2-bffa-ae69d58bf7cc'}]} (task:1201)
2018-04-11 15:46:54,524+0300 DEBUG (jsonrpc/0) [storage.TaskManager.Task] (Task='63d6df2a-e4bd-4f95-b06d-ae53c2bec80e') moving from state preparing -> state finished (task:602)
2018-04-11 15:46:54,524+0300 DEBUG (jsonrpc/0) [storage.ResourceManager.Owner] Owner.releaseAll requests {} resources {} (resourceManager:910)
2018-04-11 15:46:54,525+0300 DEBUG (jsonrpc/0) [storage.ResourceManager.Owner] Owner.cancelAll requests {} (resourceManager:947)
2018-04-11 15:46:54,525+0300 DEBUG (jsonrpc/0) [storage.TaskManager.Task] (Task='63d6df2a-e4bd-4f95-b06d-ae53c2bec80e') ref 0 aborting False (task:1002)
2018-04-11 15:46:54,526+0300 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call StoragePool.disconnectStorageServer succeeded in 1.35 seconds (__init__:573)

Engine log - the relevant time shows no issues:

2018-04-11 15:46:48,857+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-9) [0d9cf46d-5fb0-4696-8219-245ea5139f74] EVENT_ID: USER_VDS_MAINTENANCE_WITHOUT_REASON(620), Host host_mixed_3 was switched to Maintenance mode by admin@internal-authz.
2018-04-11 15:46:50,190+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-35) [] Updated host status from 'Preparing for Maintenance' to 'Maintenance' in database, host 'host_mixed_3'(676884d5-d403-4280-a7c8-fc295f5f768b)
2018-04-11 15:46:50,199+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-198334) [] Clearing cache of pool: '3a173838-741d-440d-9047-ae74bbad5307' for problematic entities of VDS: 'host_mixed_3'.
2018-04-11 15:46:50,199+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-198334) [] Removing vds '[676884d5-d403-4280-a7c8-fc295f5f768b]' from the domain in maintenance cache
2018-04-11 15:46:50,199+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-198334) [] Removing host(s) '[676884d5-d403-4280-a7c8-fc295f5f768b]' from hosts unseen domain report cache
2018-04-11 15:46:50,201+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-35) [] START, DisconnectStoragePoolVDSCommand(HostName = host_mixed_3, DisconnectStoragePoolVDSCommandParameters:{hostId='676884d5-d403-4280-a7c8-fc295f5f768b', storagePoolId='3a173838-741d-440d-9047-ae74bbad5307', vds_spm_id='1'}), log id: 77c65443
2018-04-11 15:46:51,239+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [] Storage Pool '3a173838-741d-440d-9047-ae74bbad5307' - Updating Storage Domain 'd290a93f-42ce-4988-8aaf-8d1d1b30775c' status from 'Active' to 'Unknown', reason: no reporting hosts
2018-04-11 15:46:51,244+03 INFO [org.ovirt.engine.core.bll.storage.pool.SetStoragePoolStatusCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [33d26b39] Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: 3a173838-741d-440d-9047-ae74bbad5307 Type: StoragePool
2018-04-11 15:46:51,249+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [33d26b39] EVENT_ID: SYSTEM_CHANGE_STORAGE_POOL_STATUS_PROBLEMATIC(980), Invalid status on Data Center dc1. Setting status to Non Responsive.
2018-04-11 15:46:53,148+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-35) [] FINISH, DisconnectStoragePoolVDSCommand, log id: 77c65443
2018-04-11 15:46:53,151+03 INFO [org.ovirt.engine.core.bll.storage.pool.DisconnectHostFromStoragePoolServersCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-35) [3122df0a] Running command: DisconnectHostFromStoragePoolServersCommand internal: true. Entities affected : ID: 3a173838-741d-440d-9047-ae74bbad5307 Type: StoragePool

At discovery time we see some ERRORs in the engine log:

2018-04-11 15:37:28,194+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [7401dcb3-460d-4a60-840b-d1c63543981b] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command DiscoverSendTargetsVDS failed: Failed discovery of iSCSI targets: "portal=infra4-nested-3.lab.eng.tlv2.redhat.com:3260, err=(21, [], ['iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: connection login retries (reopen_max) 5 exceeded', 'iscsiadm: No portals found'])"
2018-04-11 15:37:28,194+03 ERROR [org.ovirt.engine.core.bll.storage.connection.DiscoverSendTargetsQuery] (default task-2) [7401dcb3-460d-4a60-840b-d1c63543981b] Query 'DiscoverSendTargetsQuery' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed in vdscommand to DiscoverSendTargetsVDS, error = Failed discovery of iSCSI targets: "portal=infra4-nested-3.lab.eng.tlv2.redhat.com:3260, err=(21, [], ['iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: connection login retries (reopen_max) 5 exceeded', 'iscsiadm: No portals found'])" (Failed with error iSCSIDiscoveryError and code 475)
2018-04-11 15:37:28,194+03 ERROR [org.ovirt.engine.core.bll.storage.connection.DiscoverSendTargetsQuery] (default task-2) [7401dcb3-460d-4a60-840b-d1c63543981b] Exception: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed in vdscommand to DiscoverSendTargetsVDS, error = Failed discovery of iSCSI targets: "portal=infra4-nested-3.lab.eng.tlv2.redhat.com:3260, err=(21, [], ['iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: cannot make connection to 10.46.16.97: Connection refused', 'iscsiadm: connection login retries (reopen_max) 5 exceeded', 'iscsiadm: No portals found'])" (Failed with error iSCSIDiscoveryError and code 475)

Version-Release number of selected component (if applicable):
engine 4.2.2.6-0.1
vdsm-4.20.23-1

How reproducible:
100%

Steps to Reproduce:
1) Create a new DC & cluster c1.
2) Move a host to the new cluster (c1) & activate it.
3) On the host, check that no other iSCSI sessions exist ("iscsiadm -m session") and disconnect from all existing connections.
4) Create a new iSCSI storage domain: discover the NetApp storage targets via IP "10.46.16.9" & add a LUN (see the attached screenshot).
5) Check the iSCSI sessions. Before the host goes to maintenance, 4 sessions appear that look like duplicates of each other. Note also that 10.46.16.9 is the IP for the FQDN "mantis-iscsi-lif2.lab.eng.tlv2.redhat.com", so all 4 sessions are actually the same storage connection.

iscsiadm -m session:
tcp: [16] 10.46.16.9:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
tcp: [17] mantis-iscsi-lif2.lab.eng.tlv2.redhat.com:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
tcp: [18] mantis-iscsi-lif2.lab.eng.tlv2.redhat.com:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
tcp: [19] 10.46.16.9:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)

6) Move the host to maintenance.

Actual results:
After the host goes to maintenance, 2 sessions still remain:

[root@infra4-nested-3 ~]# iscsiadm -m session
tcp: [17] mantis-iscsi-lif2.lab.eng.tlv2.redhat.com:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)
tcp: [18] mantis-iscsi-lif2.lab.eng.tlv2.redhat.com:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash)

Expected results:
After the host goes to maintenance, no storage connections should remain.

Additional info:
This occurs on the NetApp storage server netapp-cluster3.mgmt.lab.eng.tlv2.redhat.com - it DOES NOT OCCUR on XtremIO storage.
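The VDSM log shows only two stored connection records (one per distinct connection string, FQDN and IP) being disconnected, while four live sessions exist. A hypothetical sketch of that failure mode - a disconnect that logs out one session per stored record leaves the duplicate sessions behind (this models the behavior suggested by the logs, not VDSM's actual implementation; exactly which sessions survive would depend on match order):

```python
def disconnect_storage_server(con_list, sessions):
    """Log out one session per stored connection record.

    Hypothetical model: for each record in `con_list`, find the first
    live session whose portal string matches the record's 'connection'
    field and log it out (iscsiadm -m session -r <sid> -u).  Duplicate
    sessions for the same portal are never touched.
    """
    remaining = list(sessions)
    commands = []
    for con in con_list:
        for s in remaining:
            if s["portal"] == con["connection"]:
                commands.append("iscsiadm -m session -r %s -u" % s["sid"])
                remaining.remove(s)
                break  # only the first matching session is logged out
    return commands, remaining
```

Fed the four sessions and the two conList records from this report, the model issues exactly two logout commands and leaves two sessions behind, matching the "Actual results" above: cleanup that keys on distinct connection strings rather than on live sessions cannot clean up duplicates.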