Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2054745

Summary: Setting SD to maintenance fails and turns the SD to inactive mode as a result
Product: [oVirt] vdsm
Reporter: sshmulev
Component: Core
Assignee: Vojtech Juranek <vjuranek>
Status: CLOSED CURRENTRELEASE
QA Contact: sshmulev
Severity: urgent
Priority: unspecified
Version: 4.50
CC: aefrat, ahadas, bugs, michal.skrivanek, nsoffer, vjuranek
Target Milestone: ovirt-4.5.0
Keywords: Automation, AutomationBlocker, Regression, TestBlocker, ZStream
Target Release: 4.50.0.7
Flags: pm-rhel: ovirt-4.5?, pm-rhel: blocker?
Hardware: Unspecified
OS: Unspecified
Fixed In Version: vdsm-4.50.0.7
Doc Type: No Doc Update
Last Closed: 2022-04-20 06:33:59 UTC
Type: Bug
oVirt Team: Storage

Description sshmulev 2022-02-15 15:55:32 UTC
Description of problem:
Setting an SD to maintenance fails and turns the SD to inactive mode.
Reproduced on iSCSI and FCP.
Does not reproduce on NFS or Gluster.
After the SD turns to inactive mode, it is possible to activate it again.


Version-Release number of selected component (if applicable):
ovirt-engine-4.5.0-582.gd548206.185.el8ev.noarch
vdsm-4.50.0.5-1.el8ev.x86_64

How reproducible:
100%

Steps to Reproduce:
Use the web admin UI -> go to "Compute" -> "Data Centers" -> choose an iSCSI or FCP SD -> set it to maintenance mode

Actual results:
After a few seconds, a message like this appears:
"VDSM host_mixed_1 command DisconnectStorageServerVDS failed: Error storage server disconnection: ('domType=3, spUUID=a370dd63-8234-47db-a373-3040ea63f4e1, conList=[]',)
VDSM host_mixed_3 command DisconnectStorageServerVDS failed: Error storage server disconnection: ('domType=3, spUUID=a370dd63-8234-47db-a373-3040ea63f4e1, conList=[]',)
VDSM host_mixed_2 command DisconnectStorageServerVDS failed: Error storage server disconnection: ('domType=3, spUUID=a370dd63-8234-47db-a373-3040ea63f4e1, conList=[]',)
Failed to deactivate Storage Domain iscsi_0 (Data Center golden_env_mixed)."

and on the SPM host we get a Traceback:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1190, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-119>", line 2, in disconnectStorageServer
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2232, in disconnectStorageServer
    results = storageServer.disconnect(domType, conList)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 932, in disconnect
    con_class, connections = _prepare_connections(dom_type, con_defs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 942, in _prepare_connections
    con_class = ConnectionFactory.registeredConnectionTypes[con_info.type]
UnboundLocalError: local variable 'con_info' referenced before assignment


Expected results:
Setting the SD to maintenance should succeed without any failures.

Additional info:
This is a regression that does not reproduce in the latest 4.4.10 build.
The fix for this bug seems to have influenced this action: https://bugzilla.redhat.com/1787192

Comment 2 Vojtech Juranek 2022-02-15 17:54:41 UTC
This is caused by the engine not sending connection info to vdsm:

    2022-02-15 15:48:25,654+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-37827) [storagedomains_syncAction_bb8aad06-f] START, DisconnectStorageServerVDSCommand(HostName = host_mixed_3, StorageServerConnectionManagementVDSParameters:{hostId='1d0c3077-9101-4685-bfbb-4efadc7c8899', storagePoolId='a370dd63-8234-47db-a373-3040ea63f4e1', storageType='ISCSI', connectionList='[]', sendNetworkEventOnFailure='true'}), log id: 5499f753

connectionList is an empty list, which causes vdsm to crash.

That said, the vdsm code can be improved to handle this situation more gracefully.

Comment 3 Michal Skrivanek 2022-02-16 10:43:04 UTC
A workaround exists ("After the SD turns to inactive mode, it is possible to activate it") -> high severity.

Comment 4 RHEL Program Management 2022-02-16 10:43:12 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Vojtech Juranek 2022-02-16 17:58:28 UTC
This happens only when multiple SDs are connected to the same iSCSI target; in that case we cannot disconnect the target, i.e. this is expected engine behaviour. It should be fixed in vdsm to do nothing when the connection list is empty.
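
The fix described here could look roughly like the following sketch (hypothetical code and status convention, not the actual vdsm patch):

```python
def disconnect(dom_type, con_defs):
    # Hypothetical sketch of the suggested fix, not the actual patch:
    # treat an empty connection list as a no-op instead of crashing,
    # since the engine legitimately sends [] when the iSCSI target is
    # still in use by another storage domain.
    if not con_defs:
        return {}
    results = {}
    for con_info in con_defs:
        # Real code would tear down the connection here; 0 stands in
        # for a success status code.
        results[con_info["id"]] = 0
    return results

print(disconnect(3, []))  # empty list is now a harmless no-op
```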

Comment 6 Nir Soffer 2022-02-16 23:52:22 UTC
We need an engine bug: the engine should never send a disconnect request without connections to disconnect. That is an invalid request. Unfortunately we must support broken engines, so we cannot fail the request.

The vdsm schema should be updated to require a non-empty connection list for both connectStorageServer and disconnectStorageServer.
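
Until the schema itself is tightened, the same rule can be sketched as a runtime guard at the API boundary (hypothetical helper, not part of vdsm):

```python
def validate_connection_list(con_list):
    # Hypothetical guard mirroring the proposed schema rule: both
    # connectStorageServer and disconnectStorageServer should reject
    # an empty connection list as an invalid request.
    if not isinstance(con_list, list) or not con_list:
        raise ValueError("conList must be a non-empty list")
    return con_list
```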

Comment 7 Avihai 2022-02-22 07:29:03 UTC
(In reply to Michal Skrivanek from comment #3)
> workaround exists: "After the SD turns to inactive mode, it is possible to
> activate it." -> high severity.

Yes, there is a workaround (activation), but the fact remains that the SD cannot be moved to maintenance, so customers cannot detach the SD and move it to a different DC or engine, which is a huge blocker IMO -> changing to urgent.

Comment 8 sshmulev 2022-03-01 12:57:59 UTC
Verified successfully.

Versions:
ovirt-engine-4.5.0-743c0a787472.211.el8ev.noarch
vdsm-4.50.0.7-1.el8ev

Verified steps:
1) Set SDs (FCP, iSCSI, NFS, Gluster) to maintenance on an HE environment and on a regular one.
2) Activate all the SDs that were set to maintenance mode.

Expected results:
All steps should pass successfully without error logs.

Actual results: 
As expected.

Comment 10 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bug is included in the oVirt 4.5.0 release, published on April 20th, 2022.

Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.