Description of problem:
If logging into a node fails with "session exists" (error 15):

    iscsiadm: initiator reported error (15 - session exists)

we remove the node, which disconnects it and deletes it. This is not new behavior, but it seems that in 4.4 this cleanup was not effective when logging in to the same connection more than once. Now it reliably disconnects the first node and leaves the host without any nodes, which makes it non-operational.

Version-Release number of selected component (if applicable):
RHV-4.5.0

How reproducible:
Always

Steps to Reproduce:
1. Have several duplicate iSCSI connections in the engine DB and an iSCSI SD.
2. Deactivate and then activate the host.

Actual results:
If logging into a node fails with "session exists" (error 15), the node is removed, which disconnects it and deletes it.

Expected results:
If you try to connect to the same target more than once, the host should end up with one connected target.

Additional info:
This one is killing most QE automation runs: hosts go to the non-operational state, because we (RHV QE) use the ansible role storages/tasks/main.yml (https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/infra/roles/storages/tasks/main.yml) for all our environment setups.

This occurs only in RHV4.4SP1/RHEL8.6, thus it is a regression. I am not sure why connecting to an existing iSCSI connection drops the connection; it does not make sense. According to @Nir's patch this was also the behavior on RHV4.4/RHEL<=8.5, BUT in RHV4.4 the connection was NOT dropped, so this issue of a non-operational host (due to dropped iSCSI connections) was not seen.

You can hit this in several ways:

1) Using ovirt-ansible-collection[1]:
This will set up RHV with several iSCSI storage connections (A, B, C). The storage update task will overwrite the connections in the RHV engine with one of the connections (A), creating multiple duplicate iSCSI connections (bug 2079896), and the engine will allow it (bug 2079903).
This leads to the following: when the host is deactivated and activated, or rebooted, VDSM will try to reconnect to the same duplicate iSCSI connection (A) -> this results in DROPPING the A (dup) iSCSI connection from the host -> the host goes to the non-operational state for several minutes(!) as it does not see the storage backend.

2) Using API/UI:
The customer can now override storage connections and create this issue, as the engine does not block it (bug 2079903). The same as in 1) will occur.

The bottom line: this should be fixed either in this bug or in one of the other mentioned bugs (bug 2079903 or bug 2079896) to avoid this nasty regression in RHV4.4SP1. We hope not many customers use this script or add duplicate connections manually or via a script, but it is still better to fix it.

[1] https://github.com/oVirt/ovirt-ansible-collection/pull/493
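To check whether a setup is affected, one could look for duplicate entries in the list of storage connections. The following is only a minimal sketch: the `Connection` type and its fields (`address`, `port`, `iqn`) are hypothetical and do not reflect the real engine DB schema or API.

```python
from collections import Counter
from typing import NamedTuple


class Connection(NamedTuple):
    # Hypothetical fields, chosen for illustration only.
    address: str
    port: int
    iqn: str


def find_duplicates(connections):
    """Return a dict mapping each repeated connection to its count."""
    counts = Counter(connections)
    return {conn: n for conn, n in counts.items() if n > 1}


if __name__ == "__main__":
    a = Connection("10.0.0.1", 3260, "iqn.2022-01.example:a")
    b = Connection("10.0.0.1", 3260, "iqn.2022-01.example:b")
    for conn, n in find_duplicates([a, b, a, a]).items():
        print(f"{conn.iqn} appears {n} times")
```

Any connection reported here would be one that the host tries to log in to more than once on activation.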
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
We can propose a workaround (WA) for QE if needed.
I compared the current code with:

- commit 1f8eee87972a0e5c (virt: CPU hotplugging support for dedicated CPUs) from Tue Dec 14 17:45:32 2021. This was the last commit before the changes related to bug 1787192.
- commit 3d0583b3a474bb41 (New release: 4.40.100.2), the last 4.4 commit.

In both older commits we did:

    for con in connections:
        add iscsi node
        login to node
        if login failed, remove node

After the change we do:

    for con in connections:
        add iscsi node

    run login concurrently with all connections

    when login does:
        login to node
        if login fail, remove the node

So theoretically, removing the node on login failure is not new, and we should see the same behavior in 4.4 when the engine has duplicate connections. It is possible that when logging in concurrently we are more likely to fail with "session exists", but it does not make sense.
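The concurrent flow above can be modelled with a small simulation. This is a sketch under stated assumptions, not vdsm code: `FakeHost`, `LoginError`, `login_or_cleanup` and `connect_all` are hypothetical names, and the `keep_existing` flag only mirrors the idea in the patch title "iscsi: Keep existing session on "session exists"". The model assumes that removing a node also drops its session, as described in this bug, and it does not explain why 4.4 behaved differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

SESSION_EXISTS = 15  # error code from the iscsiadm message in this bug


class LoginError(Exception):
    def __init__(self, code):
        super().__init__(f"login failed with error {code}")
        self.code = code


class FakeHost:
    """Hypothetical in-memory stand-in for the host's iSCSI state."""

    def __init__(self):
        self.nodes = set()
        self.sessions = set()
        self._lock = threading.Lock()

    def add_node(self, target):
        with self._lock:
            self.nodes.add(target)

    def login(self, target):
        with self._lock:
            if target in self.sessions:
                raise LoginError(SESSION_EXISTS)
            self.sessions.add(target)

    def remove_node(self, target):
        # Assumption: removing a node also disconnects its session.
        with self._lock:
            self.nodes.discard(target)
            self.sessions.discard(target)


def login_or_cleanup(host, target, keep_existing):
    try:
        host.login(target)
    except LoginError as e:
        if keep_existing and e.code == SESSION_EXISTS:
            return  # a session already exists, so leave it alone
        host.remove_node(target)


def connect_all(targets, keep_existing):
    host = FakeHost()
    for target in targets:
        host.add_node(target)
    # Log in to all connections concurrently, as in the new flow.
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(
            lambda t: login_or_cleanup(host, t, keep_existing), targets))
    return set(host.sessions)


if __name__ == "__main__":
    print(connect_all(["A", "A", "B"], keep_existing=False))
    print(connect_all(["A", "A", "B"], keep_existing=True))
```

With a duplicate "A" connection and `keep_existing=False`, the second login fails with error 15 and the cleanup drops the only "A" session. With `keep_existing=True`, the existing session survives and the host ends up with one session per target.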
I backported the fix to 4.5.0 for testing in QE environment: https://github.com/oVirt/vdsm/pull/168
(In reply to Nir Soffer from comment #5)
> I backported the fix to 4.5.0 for testing in QE environment:
> https://github.com/oVirt/vdsm/pull/168

There's no need for a backport, we won't be releasing this in 4.5.0. Nightly builds are 4.5.1 already, so if the fix/change is already there then all is good...
(In reply to Michal Skrivanek from comment #6)
> (In reply to Nir Soffer from comment #5)
> > I backported the fix to 4.5.0 for testing in QE environment:
> > https://github.com/oVirt/vdsm/pull/168
>
> there's no need for a backport, we won't be releasing this in 4.5.0. Nightly
> builds are 4.5.1 already, so if the fix/change is already there then all is
> good...

Sure, it's just for testing.
Verified.

The issue from "iscsi: Keep existing session on "session exists"" does not occur when deactivating and activating the host. Moreover, we don't have several duplicate iSCSI connections in the engine DB.

vdsm-4.50.1.2-1.el8ev.x86_64
ovirt-engine-4.5.1.1-0.14.el8ev.noarch
This bugzilla is included in oVirt 4.5.1 release, published on June 22nd 2022. Since the problem described in this bug report should be resolved in oVirt 4.5.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.