Bug 2054745 - Setting SD to maintenance fails and turns the SD to inactive mode as a result
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.50
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.5.0
Target Release: 4.50.0.7
Assignee: Vojtech Juranek
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-15 15:55 UTC by sshmulev
Modified: 2022-04-20 06:33 UTC
CC List: 6 users

Fixed In Version: vdsm-4.50.0.7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-20 06:33:59 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?
pm-rhel: blocker?




Links:
- Github oVirt vdsm pull 77 (status: open, summary: Bz2054745), last updated 2022-02-16 21:44:38 UTC
- Red Hat Issue Tracker RHV-44704, last updated 2022-02-15 16:07:49 UTC

Description sshmulev 2022-02-15 15:55:32 UTC
Description of problem:
Setting an SD to maintenance fails and turns the SD to inactive mode.
Reproduced on iSCSI and FCP.
Does not reproduce on NFS or Gluster.
After the SD turns to inactive mode, it is possible to activate it again.


Version-Release number of selected component (if applicable):
ovirt-engine-4.5.0-582.gd548206.185.el8ev.noarch
vdsm-4.50.0.5-1.el8ev.x86_64

How reproducible:
100%

Steps to Reproduce:
In the web admin UI, go to "Compute" -> "Data Centers" -> choose an iSCSI or FCP SD -> set it to maintenance mode.

Actual results:
After a few seconds, an error message like the following appears:
"VDSM host_mixed_1 command DisconnectStorageServerVDS failed: Error storage server disconnection: ('domType=3, spUUID=a370dd63-8234-47db-a373-3040ea63f4e1, conList=[]',)
VDSM host_mixed_3 command DisconnectStorageServerVDS failed: Error storage server disconnection: ('domType=3, spUUID=a370dd63-8234-47db-a373-3040ea63f4e1, conList=[]',)
VDSM host_mixed_2 command DisconnectStorageServerVDS failed: Error storage server disconnection: ('domType=3, spUUID=a370dd63-8234-47db-a373-3040ea63f4e1, conList=[]',)
Failed to deactivate Storage Domain iscsi_0 (Data Center golden_env_mixed)."

and on the SPM host we get a Traceback:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1190, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-119>", line 2, in disconnectStorageServer
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2232, in disconnectStorageServer
    results = storageServer.disconnect(domType, conList)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 932, in disconnect
    con_class, connections = _prepare_connections(dom_type, con_defs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 942, in _prepare_connections
    con_class = ConnectionFactory.registeredConnectionTypes[con_info.type]
UnboundLocalError: local variable 'con_info' referenced before assignment
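
The UnboundLocalError happens because, with an empty connection list, the loop that assigns con_info never runs. A minimal standalone sketch of the failure pattern (simplified, not the actual vdsm code):

    # Simplified sketch of the failure pattern; not the actual vdsm code.
    registered_connection_types = {"iscsi": "IscsiConnection"}

    def prepare_connections(con_defs):
        for con_info in con_defs:
            pass  # with an empty list, the loop body never runs...
        # ...so con_info is never assigned and the lookup below raises
        # UnboundLocalError when con_defs is empty.
        return registered_connection_types[con_info.type]

    prepare_connections([])  # UnboundLocalError: local variable 'con_info' ...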


Expected results:
Setting the SD to maintenance should succeed without any failures.

Additional info:
This is a regression that does not reproduce in the latest 4.4.10 build.
It seems that the fix for bug 1787192 (https://bugzilla.redhat.com/1787192) affected this flow.

Comment 2 Vojtech Juranek 2022-02-15 17:54:41 UTC
This is caused by the engine not sending the connection info to vdsm:

    2022-02-15 15:48:25,654+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-37827) [storagedomains_syncAction_bb8aad06-f] START, DisconnectStorageServerVDSCommand(HostName = host_mixed_3, StorageServerConnectionManagementVDSParameters:{hostId='1d0c3077-9101-4685-bfbb-4efadc7c8899', storagePoolId='a370dd63-8234-47db-a373-3040ea63f4e1', storageType='ISCSI', connectionList='[]', sendNetworkEventOnFailure='true'}), log id: 5499f753

connectionList is an empty list, which causes vdsm to crash.

That said, the vdsm code can be improved to handle this situation more gracefully.
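
One possible shape for such a guard, as a sketch only (the actual fix is in the linked vdsm pull request and may differ; _prepare_connections is the real vdsm helper from the traceback above):

    # Sketch of a graceful guard; illustrative only, not the actual patch.
    def disconnect(dom_type, con_defs):
        if not con_defs:
            # The engine sent an empty connection list; there is nothing
            # to disconnect, so return an empty result instead of crashing.
            return {}
        con_class, connections = _prepare_connections(dom_type, con_defs)
        # ... continue with the normal disconnect flow ...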

Comment 3 Michal Skrivanek 2022-02-16 10:43:04 UTC
A workaround exists: "After the SD turns to inactive mode, it is possible to activate it." -> high severity.

Comment 4 RHEL Program Management 2022-02-16 10:43:12 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Vojtech Juranek 2022-02-16 17:58:28 UTC
This happens only when multiple SDs are connected to the same iSCSI target; in such a case we cannot disconnect the target. In other words, this is expected engine behaviour, and vdsm should be fixed to do nothing when the connection list is empty.

Comment 6 Nir Soffer 2022-02-16 23:52:22 UTC
We need an engine bug: the engine should never send a disconnect request without
connections to disconnect. This is an invalid request. Unfortunately we must
support broken engines, so we cannot fail the request.

The vdsm schema should be updated to require non-empty connection list for
both connectStorageServer and disconnectStorageServer.
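
As a hypothetical illustration of such a check (not vdsm's actual schema machinery; validate_connection_list is an invented name, and, per the above, a real implementation would likely log and ignore rather than fail, for compatibility with older engines):

    # Hypothetical validation sketch; not vdsm's actual schema code.
    import logging

    log = logging.getLogger(__name__)

    def validate_connection_list(con_list, strict=False):
        if not con_list:
            if strict:
                raise ValueError(
                    "connectStorageServer/disconnectStorageServer "
                    "require a non-empty connection list")
            # Broken engines must still be supported, so by default we
            # only warn and let the caller skip the request.
            log.warning("Received empty connection list; ignoring request")
            return False
        return True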

Comment 7 Avihai 2022-02-22 07:29:03 UTC
(In reply to Michal Skrivanek from comment #3)
> workaround exists: "After the SD turns to inactive mode, it is possible to
> activate it." -> high severity.

Yes, there is a workaround (reactivating the SD), but the fact remains that the SD cannot be moved to maintenance, so customers cannot detach the SD and move it to a different DC or engine, which is a huge blocker IMO -> changing to urgent.

Comment 8 sshmulev 2022-03-01 12:57:59 UTC
Verified successfully.

Versions:
ovirt-engine-4.5.0-743c0a787472.211.el8ev.noarch
vdsm-4.50.0.7-1.el8ev

Verified steps:
1) Set SDs (FCP, iSCSI, NFS, Gluster) to maintenance on an HE environment and on a regular one.
2) Activate all the SDs that were set to maintenance mode.

Expected results:
All steps should pass successfully without error logs.

Actual results: 
As expected.

Comment 10 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bugzilla is included in the oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

