Created attachment 1080246 [details]
messages and hosted-engine-setup logs

Description of problem:
After a fresh install of RHEV-H from a USB stick, configuring a VLAN and running Hosted Engine deploy, the installation failed because libvirt could not be started.

Version:
Red Hat Enterprise Virtualization Hypervisor release 7.1 (20150917.0.el7ev)

Steps to Reproduce:
1. Install RHEV-H from a USB stick
2. Run the TUI and configure a VLAN on it
3. Run Hosted Engine deploy

Actual results:
Installation fails because libvirt could not be started

Expected results:
Installation should continue

Additional info:
The issue seems to be here:

2015-10-06 11:25:06 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv plugin.execute:932 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stdout:
Checking configuration status...
libvirt is already configured for vdsm
Running configure...
checking certs..
Reconfiguration of libvirt is done.

2015-10-06 11:25:06 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv plugin.execute:937 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stderr:
Error: ServiceOperationError: _systemctlStart failed
Job for libvirtd.service failed. See 'systemctl status libvirtd.service' and 'journalctl -xn' for details.

2015-10-06 11:25:06 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
  File "/usr/share/ovirt-hosted-engine-setup/plugins/ovirt-hosted-engine-setup/system/vdsmenv.py", line 148, in _late_setup
RuntimeError: Failed to reconfigure libvirt for VDSM

2015-10-06 11:25:06 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Environment setup': Failed to reconfigure libvirt for VDSM
Genadi, please provide all the logs from a failed run (including vdsm log)
Dan, is this a regression on vdsm-tool?
Created attachment 1080542 [details] vdsm logs
All the hosted-engine jobs on 7.2 are failing for the same reason:

http://jenkins-ci.eng.lab.tlv.redhat.com/job/hosted_engine_3.6_el7_dynamicip_iscsi_install_on_latest/28/
http://jenkins-ci.eng.lab.tlv.redhat.com/job/hosted_engine_3.6_el7_staticip_nfs3_install_on_latest/85/
http://jenkins-ci.eng.lab.tlv.redhat.com/job/hosted_engine_3.6_el7_dynamicip_nfs4_install_on_latest/36/
http://jenkins-ci.eng.lab.tlv.redhat.com/job/hosted_engine_3.6_el7_staticip_nfs4_install_on_latest/38/
http://jenkins-ci.eng.lab.tlv.redhat.com/job/hosted_engine_3.6_el7_staticip_iscsi_install_on_latest/28/
http://jenkins-ci.eng.lab.tlv.redhat.com/job/hosted_engine_3.6_el7_dynamicip_nfs3_install_on_latest/82/

The same jobs are working on 7.1.
This is blocking our RHEV-H automation and the Engine appliance testing
(In reply to Fabian Deutsch from comment #6)
> This is blocking our RHEV-H automation and the Engine appliance testing

Does it reproduce 100% of the time?
The bug is reproduced also on RHEL (7.2). Therefore, changing the product to vdsm.
vdsm-4.17.8-1.el7ev.noarch
vdsm-xmlrpc-4.17.8-1.el7ev.noarch
vdsm-infra-4.17.8-1.el7ev.noarch
vdsm-jsonrpc-4.17.8-1.el7ev.noarch
vdsm-cli-4.17.8-1.el7ev.noarch
vdsm-yajsonrpc-4.17.8-1.el7ev.noarch
vdsm-python-4.17.8-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.0-1.el7ev.noarch
I'm not sure... What is the command-line output of `vdsm-tool configure --force`?

2015-10-06 11:25:06 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv plugin.execute:937 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stderr:
Error: ServiceOperationError: _systemctlStart failed
(In reply to Oved Ourfali from comment #8)
> (In reply to Fabian Deutsch from comment #6)
> > This is blocking our RHEV-H automation and the Engine appliance testing
>
> Does it reproduce 100% of the time?

This morning on the HE CI jobs it reproduced:
- 5 times over 5 jobs on 7.2 (100%).
- 0 times over 5 jobs on 7.1 (0%).
Looks like https://gerrit.ovirt.org/#/c/46772/

HE should use JSON-RPC... this is slowly becoming an issue, as XML-RPC is getting less and less testing.
(In reply to Michal Skrivanek from comment #13)
> looks like https://gerrit.ovirt.org/#/c/46772/
> HE should use JSON-RPC...as it slowly become an issue as XML-RPC is getting
> less and less testing

Sandro - why is HE using xmlrpc? I suggest opening another bug to move to jsonrpc, and moving this one to virt.

Michal - what do you say?
Sounds good to me. The actual bug is already fixed in vdsm.
(In reply to Oved Ourfali from comment #14)
> (In reply to Michal Skrivanek from comment #13)
> > looks like https://gerrit.ovirt.org/#/c/46772/
> > HE should use JSON-RPC...as it slowly become an issue as XML-RPC is getting
> > less and less testing
>
> Sandro - Why is HE using xmlrpc? I suggest to open another bug to move to
> jsonrpc, and move this one to virt.

Oved, whoever introduced the JSON-RPC feature forgot about hosted engine. We discovered in a gerrit comment that XML-RPC will disappear in 4.0, so we already opened Bug #1228641 - [RFE] Switch from XML-RPC to JSON-RPC API.

> Michal - what do you say?
Even with the patch you refer to missing, I still see an issue with vdsm startup in this environment - it takes almost a minute for it to become responsive. It has been a long time since I saw such an issue; back then it concerned the VM recovery part, now it seems related to network initialization. Piotr is looking into it right now.
(In reply to Sandro Bonazzola from comment #16)
> Oved whoever introduced JSON-RPC feature forgot about hosted engine. We
> discovered XMLRPC will disappear in 4.0 in a gerrit comment, so we already
> opened
> Bug #1228641 - [RFE] Switch from XML-RPC to JSON-RPC API

By the way, according to the documentation, JsonRpc is still under development, far from ready for use by hosted engine:
http://www.ovirt.org/Features/Design/JsonRpc
Revision as of 06:07, 10 June 2015 by Ybronhei

Still in the design stage.
(In reply to Sandro Bonazzola from comment #18)
> (In reply to Sandro Bonazzola from comment #16)
> > Oved whoever introduced JSON-RPC feature forgot about hosted engine. We
> > discovered XMLRPC will disappear in 4.0 in a gerrit comment, so we already
> > opened
> > Bug #1228641 - [RFE] Switch from XML-RPC to JSON-RPC API

Perhaps. But we're changing all 3.6 cluster-level hosts to work with jsonrpc, on the way to deprecating xmlrpc.

> By the way, according to documentation, JsonRpc is still on development, far
> away to be ready for use by hosted engine.
> http://www.ovirt.org/Features/Design/JsonRpc
> Revision as of 06:07, 10 June 2015 by Ybronhei
>
> Still in design stage.

The jsonrpc feature consists of many phases, so it is still in the works, also in 4.0. But the protocol itself is ready, and it is jsonrpc.
On my setup the bug reproduces 100% of the time using RHEL 7.2 while deploying a 3.6 hosted engine (even after a reboot). On Genadi's setup, with a 3.5 hosted engine on 7.1 RHEV-H, it didn't reproduce 100%; after a reboot it worked fine.

Yaniv, can you please provide the `vdsm-tool configure` output Dan requested? (I'm not sure if it still has the original code.)
Vdsm replies with "Recovering from crash or Initializing" for around 40 seconds. Checking the logs, I can see that the time from vdsm starting:

MainThread::INFO::2015-10-07 16:13:06,121::vdsm::157::vds::(run) (PID: 17709) I am the actual vdsm 4.17.8-1.el7ev green-vdsb.qa.lab.tlv.redhat.com (3.10.0-320.el7.x86_64)

to HSM being ready:

storageRefresh::DEBUG::2015-10-07 16:13:37,989::hsm::373::Storage.HSM::(storageRefresh) HSM is ready

is very similar. It seems not to be related to infra; it is actually a storage issue.
(In reply to Piotr Kliczewski from comment #21)
> Vdsm replies with "Recovering from crash or Initializing" for around 40
> seconds. Checking the logs I can see that from vdsm starting:
>
> MainThread::INFO::2015-10-07 16:13:06,121::vdsm::157::vds::(run) (PID:
> 17709) I am the actual vdsm 4.17.8-1.el7ev green-vdsb.qa.lab.tlv.redhat.com
> (3.10.0-320.el7.x86_64)
>
> to HSM being ready:
>
> storageRefresh::DEBUG::2015-10-07
> 16:13:37,989::hsm::373::Storage.HSM::(storageRefresh) HSM is ready
>
> the time is very similar. It seems to be not related to infra but it is
> actually storage issue.

Vdsm's behavior is expected; a caller cannot assume it will be ready within 30 seconds and needs to handle such timeouts. Usually HSM is ready in a few seconds, but if a SCSI/FC scan or an LVM operation gets stuck, vdsm may remain in this state for several minutes. This should be improved when bug 880738 is resolved.
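As an aside, the "be ready to handle such timeouts" advice boils down to polling with a deadline instead of a single fixed wait. Below is a minimal sketch of such a loop; `wait_until_ready` and the `is_ready` callback are hypothetical helpers standing in for whatever readiness probe the caller (e.g. hosted-engine-setup) uses, not vdsm API:

```python
import time

def wait_until_ready(is_ready, timeout=300, interval=2,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns True if the service became ready, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if is_ready():
            return True
        sleep(interval)
    return False

# Example: a probe that only succeeds on the third poll.
attempts = []
ready = wait_until_ready(
    lambda: attempts.append(1) or len(attempts) >= 3,
    timeout=60, interval=0, sleep=lambda s: None)
```

The point of the deadline is that a caller tolerates a slow storage refresh (minutes, per the comment above) without hanging forever.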
Looking at the vdsm log on the host:

1. Vdsm starts:

MainThread::INFO::2015-10-07 16:18:09,389::vdsm::157::vds::(run) (PID: 18099) I am the actual vdsm 4.17.8-1.el7ev green-vdsb.qa.lab.tlv.redhat.com (3.10.0-320.el7.x86_64)

2. Starting SCSI scan (can take up to 30 seconds):

storageRefresh::DEBUG::2015-10-07 16:18:09,477::iscsi::431::Storage.ISCSI::(rescan) Performing SCSI scan, this will take up to 30 seconds

3. Starting FC scan (can take up to 30 seconds):

storageRefresh::DEBUG::2015-10-07 16:18:09,557::hba::56::Storage.HBA::(rescan) Starting scan

4. Trying to connect to supervdsm (fc-scan is run as root):

storageRefresh::DEBUG::2015-10-07 16:18:09,567::supervdsm::76::SuperVdsmProxy::(_connect) Trying to connect to Super Vdsm

5. FC scan timed out (we will look at /var/log/messages to understand why):

storageRefresh::ERROR::2015-10-07 16:18:39,669::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=18151)

6. HSM is ready.

So the issue was a timeout scanning FC devices. There is no bug in the vdsm storage init phase; the caller should expect these timeouts.

Having said that, we should improve the startup behavior, since the SCSI and FC scans and the LVM operations should not be needed while vdsm is not yet connected to storage. We should delay any such access until the point vdsm is asked to connect to a storage server (should be a 4.0 RFE).
Looking at the vdsm log:

# grep Storage.HBA /var/log/vdsm/vdsm.log
storageRefresh::DEBUG::2015-10-07 14:21:37,745::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::ERROR::2015-10-07 14:22:07,861::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=14101)
storageRefresh::DEBUG::2015-10-07 15:44:36,718::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::DEBUG::2015-10-07 15:45:09,391::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::DEBUG::2015-10-07 15:45:27,163::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::ERROR::2015-10-07 15:45:57,268::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=16297)
storageRefresh::DEBUG::2015-10-07 16:07:33,312::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::ERROR::2015-10-07 16:08:03,433::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=17158)
storageRefresh::DEBUG::2015-10-07 16:11:48,454::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::ERROR::2015-10-07 16:12:18,577::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=17498)
storageRefresh::DEBUG::2015-10-07 16:13:06,274::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::ERROR::2015-10-07 16:13:36,387::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=17760)
storageRefresh::DEBUG::2015-10-07 16:18:09,557::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::ERROR::2015-10-07 16:18:39,669::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=18151)
storageRefresh::DEBUG::2015-10-07 16:20:37,725::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::ERROR::2015-10-07 16:21:07,842::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=18444)
storageRefresh::DEBUG::2015-10-07 16:21:55,537::hba::56::Storage.HBA::(rescan) Starting scan
storageRefresh::ERROR::2015-10-07 16:22:25,633::hba::60::Storage.HBA::(rescan) Scan failed: Timeout scanning (pid=18713)

It is clear that something in the FC connection is bad on this host.

Running fc-scan manually shows:

[root@green-vdsb ~]# /usr/libexec/vdsm/fc-scan -v
fc-scan: Scanning /sys/class/scsi_host/host0/scan
fc-scan: Scanning /sys/class/scsi_host/host4/scan
fc-scan: Scanned /sys/class/scsi_host/host0/scan: 0.00 seconds
fc-scan: Scanned /sys/class/scsi_host/host4/scan: 82.01 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-0/target4:0:0/4:0:0:0/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-0/target4:0:0/4:0:0:0/rescan: 0.01 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-1/target4:0:1/4:0:1:0/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-1/target4:0:1/4:0:1:0/rescan: 0.00 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:0/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:0/rescan: 0.00 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:1/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:1/rescan: 0.00 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:2/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:2/rescan: 0.00 seconds

/sys/class/fc_host/host4 is not responding as expected.

Aharon, can you check for errors on the server side explaining this timeout?
This seems like a server issue, not a vdsm bug. If hosted-engine setup fails because of a 30-second timeout, that is a hosted-engine-setup bug; it should be ready for storage timeouts.
It's failing also for the CI jobs testing the deployment of hosted-engine over NFS on RHEL 7.2:

http://jenkins-ci.eng.lab.tlv.redhat.com/job/hosted_engine_3.6_el7_dynamicip_nfs3_install_on_latest/82/

Those machines are not using iSCSI and don't have an FC HBA. The same code is working on RHEL 7.1.

Maybe there is also an additional storage issue on that host, but the bug is systematically reproducible elsewhere.
Based on comment #24, do we have any other problem? (comment #17)

Does it actually fail or just take a longer time, Yaniv? If it works, this bug should remain in MODIFIED; if there are HE deployment timeouts failing the installation, then this needs to be moved to integration and addressed for 3.6 (I suppose).
Checking /var/log/messages while performing fc-scan:

# logger "*** starting fc-scan ***"; /usr/libexec/vdsm/fc-scan -v; logger "*** fc-scan finished ***"
fc-scan: Scanning /sys/class/scsi_host/host0/scan
fc-scan: Scanning /sys/class/scsi_host/host4/scan
fc-scan: Scanned /sys/class/scsi_host/host0/scan: 0.00 seconds
fc-scan: Scanned /sys/class/scsi_host/host4/scan: 82.01 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-0/target4:0:0/4:0:0:0/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-0/target4:0:0/4:0:0:0/rescan: 0.00 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-1/target4:0:1/4:0:1:0/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-1/target4:0:1/4:0:1:0/rescan: 0.00 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:0/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:0/rescan: 0.00 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:1/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:1/rescan: 0.00 seconds
fc-scan: Rescanning device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:2/rescan
fc-scan: Rescanned device /sys/devices/pci0000:00/0000:00:04.0/0000:07:00.1/host4/rport-4:0-2/target4:0:2/4:0:2:2/rescan: 0.01 seconds

Oct  7 18:30:46 green-vdsb root: *** starting fc-scan ***
Oct  7 18:32:08 green-vdsb kernel: scsi 4:0:3:0: timing out command, waited 82s
Oct  7 18:32:08 green-vdsb kernel: sd 4:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct  7 18:32:08 green-vdsb kernel: sd 4:0:0:0: [sdb] Sense Key : Illegal Request [current]
Oct  7 18:32:08 green-vdsb kernel: sd 4:0:0:0: [sdb] Add. Sense: Logical unit not supported
Oct  7 18:32:08 green-vdsb kernel: sd 4:0:1:0: [sdc] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct  7 18:32:08 green-vdsb kernel: sd 4:0:1:0: [sdc] Sense Key : Illegal Request [current]
Oct  7 18:32:08 green-vdsb kernel: sd 4:0:1:0: [sdc] Add. Sense: Logical unit not supported
Oct  7 18:32:08 green-vdsb root: *** fc-scan finished ***

Chris, what can cause this timeout (kernel: scsi 4:0:3:0: timing out command, waited 82s), and what is the meaning of the errors above?
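Spotting the slow host in the verbose fc-scan output can also be done mechanically. Here is a small sketch (the regex is written against the log format shown above; `slow_scans` is a hypothetical helper, not part of vdsm) that flags scans or rescans slower than a threshold:

```python
import re

# Matches lines like:
#   fc-scan: Scanned /sys/class/scsi_host/host4/scan: 82.01 seconds
#   fc-scan: Rescanned device /sys/devices/.../rescan: 0.01 seconds
SCAN_RE = re.compile(
    r"fc-scan: (?:Scanned|Rescanned device) (?P<path>\S+): "
    r"(?P<secs>[\d.]+) seconds")

def slow_scans(log_lines, threshold=30.0):
    """Return (path, seconds) pairs for scans slower than `threshold`."""
    slow = []
    for line in log_lines:
        m = SCAN_RE.search(line)
        if m and float(m.group("secs")) > threshold:
            slow.append((m.group("path"), float(m.group("secs"))))
    return slow

log = [
    "fc-scan: Scanned /sys/class/scsi_host/host0/scan: 0.00 seconds",
    "fc-scan: Scanned /sys/class/scsi_host/host4/scan: 82.01 seconds",
]
# Only host4 exceeds the default 30-second threshold.
print(slow_scans(log))
```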
Maybe we have to split this into two distinct bugs. The original issue, from which the bug title 'Hosted engine deploy fails due to Libvirt cannot be started error' comes, is fully reproducible on RHEL 7.2, and all of our HE CI jobs are failing there.

Took this one on NFS3 as a reference (just to exclude any iSCSI or FC interference):
http://jenkins-ci.eng.lab.tlv.redhat.com/job/hosted_engine_3.6_el7_dynamicip_nfs3_install_on_latest/82/

hosted-engine-setup calls '/bin/vdsm-tool configure --force' in blocking mode without any timeout, but on RHEL 7.2 it fails:

2015-10-07 06:08:51 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv plugin.execute:936 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stdout:
Checking configuration status...
multipath requires configuration

2015-10-07 06:08:51 DEBUG otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv plugin.execute:941 execute-output: ('/bin/vdsm-tool', 'configure', '--force') stderr:
Traceback (most recent call last):
  File "/bin/vdsm-tool", line 219, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38, in wrapper
    func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 137, in configure
    if _should_configure(c, args.force)]
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 327, in _should_configure
    configured = _isconfigured(c)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 113, in _isconfigured
    return getattr(module, 'isconfigured', lambda: configurators.NO)()
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/passwd.py", line 43, in isconfigured
    _, out, _ = utils.execCmd(script)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 679, in execCmd
    deathSignal=deathSignal, childUmask=childUmask)
  File "/usr/lib64/python2.7/site-packages/cpopen/__init__.py", line 51, in __init__
    stderr=PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 712, in __init__
    restore_sigpipe=restore_sigpipe)
TypeError: _execute_child_v276() got an unexpected keyword argument 'restore_sigpipe'

Under /var/log/vdsm/upgrade.log we found the corresponding error:

MainThread::ERROR::2015-10-07 06:08:47,330::upgrade::94::upgrade::(apply_upgrade) Failed to run upgrade-unified-persistence
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/tool/upgrade.py", line 92, in apply_upgrade
    upgrade.run(ns, args)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/unified_persistence.py", line 193, in run
    run()
  File "/usr/lib/python2.7/site-packages/vdsm/tool/unified_persistence.py", line 44, in run
    networks, bondings = _getNetInfo()
  File "/usr/lib/python2.7/site-packages/vdsm/tool/unified_persistence.py", line 131, in _getNetInfo
    networks = _processNetworks(netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/unified_persistence.py", line 53, in _processNetworks
    defaultGateway = getDefaultGateway()
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 319, in getDefaultGateway
    output = routeShowGateways('main')
  File "/usr/lib/python2.7/site-packages/vdsm/ipwrapper.py", line 537, in routeShowGateways
    return _execCmd(command)
  File "/usr/lib/python2.7/site-packages/vdsm/ipwrapper.py", line 526, in _execCmd
    returnCode, output, error = execCmd(command)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 679, in execCmd
    deathSignal=deathSignal, childUmask=childUmask)
  File "/usr/lib64/python2.7/site-packages/cpopen/__init__.py", line 51, in __init__
    stderr=PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 712, in __init__
    restore_sigpipe=restore_sigpipe)
TypeError: _execute_child_v276() got an unexpected keyword argument 'restore_sigpipe'

The same code is working on RHEL 7.1, so in my opinion this is a regression in vdsm-tool. It's systematically happening with vdsm-4.17.8-12.gitbabda51.el7.noarch, so if it has been recently fixed, the patch is still not there.
It's probably just a duplicate of this one: https://bugzilla.redhat.com/show_bug.cgi?id=1264667
The CI jobs are running with python-cpopen-1.3-4.el7.x86_64, which comes from EPEL:
https://bodhi.fedoraproject.org/updates/?packages=python-cpopen

So it's probably time to release an updated version with that fix.
(In reply to Simone Tiraboschi from comment #28)
> TypeError: _execute_child_v276() got an unexpected keyword argument
> 'restore_sigpipe'

You are using an old cpopen - Python on 7.2 modified an internal API, which broke cpopen. Please update cpopen to 1.4.
According to https://bugzilla.redhat.com/show_bug.cgi?id=1264667#c9 you can also just update to the latest python version in the CI env.
(In reply to Oved Ourfali from comment #32)
> According to https://bugzilla.redhat.com/show_bug.cgi?id=1264667#c9 you can
> also just update to the latest python version in the CI env.

Nice, but that CI job already updates everything with what is available.
(In reply to Nir Soffer from comment #27)
> Checking /var/log/messages while performing fc-scan:
>
> # logger "*** starting fc-scan ***"; /usr/libexec/vdsm/fc-scan -v; logger "*** fc-scan finished ***"
> [fc-scan output trimmed; see comment #27]
>
> Oct  7 18:30:46 green-vdsb root: *** starting fc-scan ***
> Oct  7 18:32:08 green-vdsb kernel: scsi 4:0:3:0: timing out command, waited 82s
> Oct  7 18:32:08 green-vdsb kernel: sd 4:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Oct  7 18:32:08 green-vdsb kernel: sd 4:0:0:0: [sdb] Sense Key : Illegal Request [current]
> Oct  7 18:32:08 green-vdsb kernel: sd 4:0:0:0: [sdb] Add. Sense: Logical unit not supported
> Oct  7 18:32:08 green-vdsb kernel: sd 4:0:1:0: [sdc] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Oct  7 18:32:08 green-vdsb kernel: sd 4:0:1:0: [sdc] Sense Key : Illegal Request [current]
> Oct  7 18:32:08 green-vdsb kernel: sd 4:0:1:0: [sdc] Add. Sense: Logical unit not supported
> Oct  7 18:32:08 green-vdsb root: *** fc-scan finished ***
>
> Chris, what can cause this timeout (kernel: scsi 4:0:3:0: timing out
> command, waited 82s), and what is the meaning of errors above?

Was the target-side storage reconfigured? The "Logical unit not supported" response indicates that the LUs for sdb and sdc no longer exist on the target (or targets, I guess, as it looks like there are multiple targets with one LU each). So targets 0 and 1 are responding but each failing Read Capacity(10) on LU 0, and target 3 failed to respond at all.
(In reply to Nir Soffer from comment #23)
> Aharon, can you check for errors on the server side explaining this timeout?

No FC errors on the storage server side; also, the devices are available to the host:

[root@green-vdsb scsi_host]# cat /proc/scsi/scsi
.
.
Host: scsi4 Channel: 00 Id: 02 Lun: 00
  Vendor: XtremIO  Model: XtremApp  Rev: 4010
  Type:   RAID                      ANSI SCSI revision: 06
Host: scsi4 Channel: 00 Id: 02 Lun: 01
  Vendor: XtremIO  Model: XtremApp  Rev: 4010
  Type:   Direct-Access             ANSI SCSI revision: 06
Host: scsi4 Channel: 00 Id: 02 Lun: 02
  Vendor: XtremIO  Model: XtremApp  Rev: 4010
  Type:   Direct-Access             ANSI SCSI revision: 06
.
.

Also, the bug is 100% reproducible on hosts that are not connected to FC, as stated in comment #25.
Re-setting the need-info flags for Yaniv
(In reply to Elad from comment #36)
> Also, the bug is 100% reproduced on hosts that are not connected to FC as
> stated on comment #25

Which bug? Again, please separate the discussion about possible storage scan-related issues from the original reported problem and from cpopen. The latter two are already addressed.
(In reply to Michal Skrivanek from comment #38)
> (In reply to Elad from comment #36)
> > Also, the bug is 100% reproduced on hosts that are not connected to FC as
> > stated on comment #25
>
> which bug? Again, please separate the discussion about possible storage
> scan-related issues and the original reported problem, and cpopen. The
> latter two are already addressed

I'm talking about the hosted-engine failure.
(In reply to Elad from comment #36)
> (In reply to Nir Soffer from comment #23)
> > Aharon, can you check for errors on the server side explaining this timeout?
>
> No FC errors on storage server side, also, the devices are available for the
> host:
> [/proc/scsi/scsi output trimmed; see comment #36]

Elad, please open a separate kernel bug for this issue. Clearly there is something wrong here; a scan timing out after 82 seconds and the errors mentioned in comment 35 are not OK.

Please use the command seen in comment 27 on hosts running 7.2 and 7.1 connected to the same storage server, and attach the command output and the relevant errors seen in /var/log/messages while the command was running.
This is not a storage bug, and the unrelated storage/kernel related issue will be tracked in another bug.
Jeff, please check comment 42.
> Elad, please open a separate kernel bug for this issue. Clearly there
> is something wrong here, scanning timeout after 82 seconds and errors
> mentioned in comment 35 are not ok.
>
> Please use the command seen in comment 27 on host running 7.2 and 7.1
> connected to the same storage server and attach the command output and
> the relevant errors seen in /var/log/messages while the command was
> running.

Nir, on both 7.1 and 7.2 it behaves the same: the FC scan takes ~80 seconds using the XtremIO storage server.

Tried with another storage server (EMC VNX) and the scan takes less than a second.
(In reply to Elad from comment #48)
> Nir, on both 7.1 and 7.2 it behaves the same, the fc scan takes ~80 seconds
> using XtremIO storage server.
>
> Tried with another storage server (EMC-VNX) and the scan takes less than a
> second.

So this is an issue with the storage server, not with 7.2 or 7.1.
Moving out to 3.6.1, since it's dependent on the storage server.
Moving back since this is MODIFIED.
note this bug doesn't track the storage issue mentioned above
Bug tickets that are moved to testing must have a target release set to make sure the tester knows what to test. Please set the correct target release before moving to ON_QA.
Can you please check whether the dependent bugs are really needed to resolve this issue? Can you move them to "See Also" if not?
They are not. This bug tracks only the regression in xmlrpc needed for hosted engine. That is fixed. Everything else is unrelated.
On a clean installation of Red Hat Enterprise Virtualization Hypervisor release 7.2 (20151025.0.el7ev) I ran the deployment as shown below, and it failed with this error:

[ INFO ] Stage: Initializing
[ INFO ] Generating a temporary VNC password.
[ INFO ] Stage: Environment setup
         Continuing will configure this host for serving as hypervisor and create a VM where you have to install oVirt Engine afterwards.
         Are you sure you want to continue? (Yes, No)[Yes]:
         Configuration files: ['/tmp/tmpAM2ccM']
         Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20151026152224-k4x1eo.log
         Version: otopi-1.4.0 (otopi-1.4.0-1.el7ev)
[ INFO ] Hardware supports virtualization
[ INFO ] Stage: Environment packages setup
[ INFO ] Stage: Programs detection
[ INFO ] Stage: Environment setup
[ ERROR ] Failed to execute stage 'Environment setup': Failed to reconfigure libvirt for VDSM
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20151026152232.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
         Something went wrong setting up hosted engine, or the setup process was cancelled.
         Press any key to continue...

If I then run:

[root@black-vdsb ~]# vdsm-tool configure --force

Checking configuration status...

Current revision of multipath.conf detected, preserving
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts

Running configure...
checking certs..
Reconfiguration of libvirt is done.

Done configuring modules to VDSM.

I'm getting the same result:

[ INFO ] Stage: Initializing
[ INFO ] Generating a temporary VNC password.
[ INFO ] Stage: Environment setup
         Continuing will configure this host for serving as hypervisor and create a VM where you have to install oVirt Engine afterwards.
         Are you sure you want to continue? (Yes, No)[Yes]:
         Configuration files: ['/tmp/tmpRaZH19']
         Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20151026153500-e9t177.log
         Version: otopi-1.4.0 (otopi-1.4.0-1.el7ev)
[ INFO ] Hardware supports virtualization
[ INFO ] Stage: Environment packages setup
[ INFO ] Stage: Programs detection
[ INFO ] Stage: Environment setup
[ ERROR ] Failed to execute stage 'Environment setup': Failed to start service 'vdsmd'
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20151026153512.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
         Something went wrong setting up hosted engine, or the setup process was cancelled.
         Press any key to continue...

Please see the logs I attached.
Created attachment 1086545 [details] logs from the host (black)
Target release should be set once a package build is known to fix an issue. Since this bug is not MODIFIED, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
I've created a separate bug 1275571 so as not to mix the two issues, and have set this bug as depending on 1275571. This is due to the inability to verify, as deployment fails on a network bridge creation issue.
Works for me; failed to reproduce these errors on Red Hat Enterprise Virtualization Hypervisor (Beta) release 7.2 (20151113.123.el7ev) during both iSCSI and NFS deployments:

mom-0.5.1-1.el7ev.noarch
sanlock-3.2.4-1.el7.x86_64
libvirt-1.2.17-13.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64
vdsm-4.17.10.1-0.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-node-branding-rhev-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-node-lib-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-node-plugin-snmp-logic-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
ovirt-node-plugin-vdsm-0.6.1-3.el7ev.noarch
ovirt-setup-lib-1.0.0-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-node-lib-config-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-node-selinux-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-node-plugin-cim-logic-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-hosted-engine-ha-1.3.2.1-1.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.3.0-3.el7ev.noarch
ovirt-node-plugin-snmp-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-node-plugin-rhn-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-node-lib-legacy-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-host-deploy-offline-1.4.0-1.el7ev.x86_64
ovirt-node-plugin-cim-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
ovirt-host-deploy-1.4.1-0.0.master.el7ev.noarch
ovirt-hosted-engine-ha-1.3.2.1-1.el7ev.noarch
Since oVirt 3.6.0 has been released, moving from verified to closed current release.