Created attachment 1045043 [details]
sosreport

Description of problem:
Deployment of hosted engine failed on iSCSI storage, with a vdsm exception:
RuntimeError: Broken communication with supervdsm. Failed call to readSessionInfo

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.0-0.0.master.20150623153111.git68138d4.el7.noarch
vdsm-4.17.0-1054.git562e711.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run hosted-engine --deploy
2. Choose iSCSI storage and enter all the necessary details
3.

Actual results:
After entering all the iSCSI storage details, deployment fails with an exception:
RuntimeError: Error block device action: ()

Expected results:
Deployment continues and completes without any errors.

Additional info:
See all logs in the attached sosreport.
In SuperVdsm, the call succeeds:

MainProcess|Thread-24::DEBUG::2015-06-30 17:03:28,242::supervdsmServer::114::SuperVdsm.ServerCallback::(wrapper) return getScsiSerial with SXtremIO_XtremApp_PSNT_Not_Set
MainProcess|Thread-24::DEBUG::2015-06-30 17:03:28,244::supervdsmServer::107::SuperVdsm.ServerCallback::(wrapper) call readSessionInfo with (1,) {}
MainProcess|Thread-24::DEBUG::2015-06-30 17:03:28,244::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /sbin/iscsiadm -m iface -I default (cwd None)
MainProcess|Thread-24::DEBUG::2015-06-30 17:03:28,251::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-24::DEBUG::2015-06-30 17:03:28,244::supervdsmServer::114::SuperVdsm.ServerCallback::(wrapper) return readSessionInfo with IscsiSession(id=1, iface=<IscsiInterface name='default' transport='tcp' netIfaceName='None'>, target=IscsiTarget(portal=IscsiPortal(hostname='10.35.146.129', port=3260), tpgt=1, iqn='iqn.2008-05.com.xtremio:001e675b8ee0'), credentials=<storage.iscsi.ChapCredentials object at 0x7f49b0138d50>)

But VDSM is unable to get the result from it:

Thread-24::DEBUG::2015-06-30 17:03:28,252::supervdsm::76::SuperVdsmProxy::(_connect) Trying to connect to Super Vdsm
Thread-24::ERROR::2015-06-30 17:03:28,258::task::866::Storage.TaskManager.Task::(_setError) Task=`a5426845-cbba-4e7c-90e3-427cc2736e9b`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1984, in getDeviceList
    devices = self._getDeviceList(storageType=storageType, guids=guids)
  File "/usr/share/vdsm/storage/hsm.py", line 2014, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/share/vdsm/storage/multipath.py", line 304, in pathListIter
    sess = iscsi.getSessionInfo(sessionID)
  File "/usr/share/vdsm/storage/iscsi.py", line 87, in getSessionInfo
    return supervdsm.getProxy().readSessionInfo(sessionID)
  File "/usr/share/vdsm/supervdsm.py", line 55, in __call__
    % self._funcName)
RuntimeError: Broken communication with supervdsm. Failed call to readSessionInfo

Yaniv - we need infra's help here please.
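For readers unfamiliar with the call path: the traceback above ends in vdsm's supervdsm proxy, which forwards privileged calls (here readSessionInfo) to supervdsmd over an IPC channel and turns any local failure into the generic "Broken communication" error. The following is only a minimal sketch of that pattern, not the actual vdsm source; the names _SupervdsmCall and call_remote are hypothetical.

    class _SupervdsmCall(object):
        def __init__(self, proxy, func_name):
            self._proxy = proxy
            self._funcName = func_name

        def __call__(self, *args, **kwargs):
            try:
                # Forward the call over the IPC channel to supervdsmd.
                return self._proxy.call_remote(self._funcName, args, kwargs)
            except Exception:
                # The remote side may have answered correctly (as the
                # supervdsm log shows), yet the local read of the reply can
                # still fail, e.g. if the channel broke or the read was
                # interrupted, producing the error seen in the traceback.
                raise RuntimeError("Broken communication with supervdsm. "
                                   "Failed call to %s" % self._funcName)

The point of the sketch is that the error message only tells us the reply never arrived on the vdsm side; it does not say why supervdsmd went away.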
I see in supervdsm.log many failures after:

MainThread::DEBUG::2015-06-30 17:02:24,434::__init__::47::blivet::(register_device_format) registered device format class BIOSBoot as biosboot
MainThread::DEBUG::2015-06-30 17:02:24,474::storage_log::69::blivet::(log_exception_info) IGNORED: Caught exception, continuing.
MainThread::DEBUG::2015-06-30 17:02:24,475::storage_log::72::blivet::(log_exception_info) IGNORED: Problem description: failed to get initiator name from iscsi firmware
MainThread::DEBUG::2015-06-30 17:02:24,475::storage_log::73::blivet::(log_exception_info) IGNORED: Begin exception details.
MainThread::DEBUG::2015-06-30 17:02:24,475::storage_log::76::blivet::(log_exception_info) IGNORED: Traceback (most recent call last):
MainThread::DEBUG::2015-06-30 17:02:24,475::storage_log::76::blivet::(log_exception_info) IGNORED:   File "/usr/lib/python2.7/site-packages/blivet/iscsi.py", line 87, in __init__
MainThread::DEBUG::2015-06-30 17:02:24,475::storage_log::76::blivet::(log_exception_info) IGNORED:     initiatorname = libiscsi.get_firmware_initiator_name()
MainThread::DEBUG::2015-06-30 17:02:24,475::storage_log::76::blivet::(log_exception_info) IGNORED: IOError: Unknown error
MainThread::DEBUG::2015-06-30 17:02:24,475::storage_log::77::blivet::(log_exception_info) IGNORED: End exception details.
MainThread::DEBUG::2015-06-30 17:02:24,482::supervdsmServer::486::SuperVdsm.Server::(main) Making sure I'm root - SuperVdsm

After this call to libiscsi.get_firmware_initiator_name, supervdsmd is restarted. Please figure out why it kills the process. It is not related to the communication between vdsm and supervdsm; the broken communication happens once after each supervdsm crash, but the crash is the bug here.
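Note that, according to the log lines above, blivet itself already treats this lookup as non-fatal: it catches the IOError and logs it as "IGNORED: Caught exception, continuing." The sketch below only illustrates that log-and-continue pattern as an assumption drawn from the log; it is not a copy of blivet or supervdsm code, and the import path for libiscsi is assumed.

    import logging

    import libiscsi  # python bindings; import path assumed for illustration

    log = logging.getLogger("blivet")

    try:
        # The call that appears in the ignored traceback above.
        initiatorname = libiscsi.get_firmware_initiator_name()
    except IOError:
        # blivet logs the failure and continues, so this exception alone
        # should not bring supervdsmd down; something else kills the process.
        log.debug("IGNORED: failed to get initiator name from iscsi firmware",
                  exc_info=True)
        initiatorname = None

So the open question in this comment is why supervdsmd still dies right after this point even though the exception is swallowed.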
Moving to VDSM according to comment #1 and comment #2
This is not a storage issue; this is an error in supervdsm, probably related to a multiprocessing call failing after receiving a signal.

These issues started when we added zombiereaper to supervdsm. Each time a process ends, supervdsm gets a SIGCHLD signal. If a multiprocessing call is interrupted by the signal, our code typically fails in the wrong way, because we do not check and handle EINTR in such calls.

This may also be a duplicate of https://bugzilla.redhat.com/1259310.

Greg, do you have some insight on this?

I think this should move to infra.
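To make the EINTR hypothesis concrete, here is a minimal sketch of the retry pattern that such calls would need; this is an assumption about the kind of fix implied above, not vdsm's actual code. On Python 2 (before PEP 475), a SIGCHLD delivered while a read on the multiprocessing connection is blocked surfaces as IOError/OSError with errno EINTR, and a caller that does not retry sees it as a broken channel.

    import errno


    def eintr_safe(func, *args, **kwargs):
        """Retry func until it completes without being interrupted by a signal."""
        while True:
            try:
                return func(*args, **kwargs)
            except (IOError, OSError) as e:
                if e.errno != errno.EINTR:
                    raise
                # Interrupted by a signal (e.g. SIGCHLD from zombiereaper);
                # the call did not fail, so simply retry it.


    # Hypothetical usage with a multiprocessing connection object `conn`:
    # result = eintr_safe(conn.recv)

Without a wrapper like this, every child-process exit reaped by zombiereaper can make an otherwise healthy proxy call look like "Broken communication with supervdsm".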
(In reply to Nir Soffer from comment #5)
> This is not a storage issue; this is an error in supervdsm, probably
> related to a multiprocessing call failing after receiving a signal.
>
> These issues started when we added zombiereaper to supervdsm. Each time
> a process ends, supervdsm gets a SIGCHLD signal. If a multiprocessing call
> is interrupted by the signal, our code typically fails in the wrong way,
> because we do not check and handle EINTR in such calls.
>
> This may also be a duplicate of https://bugzilla.redhat.com/1259310.
>
> Greg, do you have some insight on this?

It seems very similar to the bug you mention, which was hosted-engine setup failing with file storage. Given that this started with zombiereaper and also involves supervdsm's _runcmd(), I think what you're proposing is likely. It is probably worth checking whether it still happens now that the supervdsm fix for bug 1259310 is merged.
Why isn't this marked as a duplicate of bug 1259310?
Verified on ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch.
Deployment succeeded without any errors, so I believe Elad can move it to ON_QA.
I'm not sure why this bug is ON_QA if there is no patch that fixes the issue reported here. Should we verify this bug or close it as a DUP of the other one (1259310)?
Moving to VERIFIED as this has the TestOnly keyword.
Tested hosted-engine deployment over iSCSI and it succeeded.

Verified using:
ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.1-1.el7ev.noarch
vdsm-4.17.10.1-0.el7ev.noarch
RHEV 3.6.0 has been released, setting status to CLOSED CURRENTRELEASE