Created attachment 872383 [details]
RHEVM Engine log (gzipped)

Description of problem:

It seems that we're receiving the following error during the "host-reinstall" tests (which are supposed to reinstall the already-installed RHEVH host with the ISO image from RHEVM):

20:01:42 2014-03-06 20:01:42,456 - MainThread - hosts - ERROR - Response code is not valid, expected is: [200, 201], actual is: 400
20:01:42 2014-03-06 20:01:42,630 - MainThread - plmanagement.error_fetcher - ERROR - Errors fetched from VDC(jenkins-automation-rpm-vm17.eng.lab.tlv.redhat.com): 2014-03-06 20:01:42,356 ERROR [org.ovirt.engine.core.bll.UpdateVdsCommand] (ajp-/127.0.0.1:8702-5) [107] Installation/upgrade of Host a59f1b46-5da0-4c27-94c1-4a41f898e923,cinteg26.ci.lab.tlv.redhat.com failed due to: Cannot upgrade Host. Host version is not compatible with selected ISO version. Please select an ISO with major version 6.x.

I verified that the RHEVM has the correct RHEVH RPM installed:

rpm -q rhev-hypervisor6
rhev-hypervisor6-6.5-20140305.0.el6ev.noarch

And the RHEVH also has this latest version installed:

cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.5 (20140305.0.el6ev)
I don't see any obvious version mismatch between the RPM and the ISO.
Could you share the result of the command below?

# rpm -ql rhev-hypervisor6
(In reply to Amador Pahim from comment #2)
> Could you share the result of the command below?
>
> # rpm -ql rhev-hypervisor6

/usr/share/rhev-hypervisor
/usr/share/rhev-hypervisor/rhevh-6.5-20140305.0.el6ev.iso
/usr/share/rhev-hypervisor/vdsm-compatibility-6.5-20140305.0.el6ev.txt
/usr/share/rhev-hypervisor/version-6.5-20140305.0.el6ev.txt
Just in case...

# cat /usr/share/rhev-hypervisor/vdsm-compatibility-6.5-20140305.0.el6ev.txt
3.4,3.3,3.2,3.1,3.0

# cat /usr/share/rhev-hypervisor/version-6.5-20140305.0.el6ev.txt
6.5,20140305.0.el6ev
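For reference, a minimal illustrative sketch (in Python; the actual check lives in the ovirt-engine Java code, and the helper names here are hypothetical) of the kind of major-version comparison implied by the "Please select an ISO with major version 6.x" error, using the metadata file listed above:

    # Illustrative only: compares the ISO's declared major version (from the
    # version-*.txt file shipped in the rhev-hypervisor6 RPM) with the host's
    # reported OS major version.

    def iso_major_version(version_file):
        # file content looks like "6.5,20140305.0.el6ev"
        with open(version_file) as f:
            version, _build = f.read().strip().split(",", 1)
        return version.split(".")[0]          # -> "6"

    def host_major_version(host_os_version):
        # e.g. "6.5" as reported by the host
        return host_os_version.split(".")[0]  # -> "6"

    iso = "/usr/share/rhev-hypervisor/version-6.5-20140305.0.el6ev.txt"
    if iso_major_version(iso) != host_major_version("6.5"):
        print("Cannot upgrade Host. Host version is not compatible "
              "with selected ISO version.")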
Lev,

could you please check, when the previous RHEV-H release is installed on a machine, whether
a) RHEVM suggests the upgrade, and
b) the upgrade succeeds?
(In reply to Fabian Deutsch from comment #5)
> Lev,
>
> could you please check, when the previous RHEV-H release is installed on a
> machine, whether
> a) RHEVM suggests the upgrade, and
> b) the upgrade succeeds?

The upgrade from the previous version (Red Hat Enterprise Virtualization Hypervisor release 6.5 (20140217.0.el6ev)) fails as well.
I am not able to reproduce this issue with rhevm-3.4.0-0.3.master.el6ev.noarch. I tried to upgrade from rhev-hypervisor6-6.5-20140217.0.el6ev.noarch and then to reinstall with rhev-hypervisor6-6.5-20140305.0.el6ev.noarch. No issues like this.
Tested with rhevm-3.4.0-0.4.master.el6ev.noarch and rhev-hypervisor6-6.5-20140311.0.el6ev.noarch. Still not reproduced.

Instead, another error comes into play. During the reinstall, the engine hits an exception:

2014-03-12 10:37:41,079 ERROR [org.ovirt.engine.core.bll.OVirtNodeUpgrade] (OVirtNodeUpgrade) Error during upgrade: java.io.IOException: Pipe closed

After the error, the hypervisor is restarted. When it comes back, the system has been reinstalled, but vdsm is down:

[root@rhevh /]# service vdsmd status
VDS daemon is not running

No success starting it:

[root@rhevh /]# service vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: not running [FAILED]
vdsm: Running run_final_hooks
vdsm stop [ OK ]
initctl: Job is already running: libvirtd
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running run_init_hooks
vdsm: Running gencerts
vdsm: Running check_is_configured
libvirt is not configured for vdsm yet
sanlock service requires restart
Modules libvirt,sanlock are not configured
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 265, in isconfigured
RuntimeError: One of the modules is not configured to work with VDSM.
To configure the module use the following:
'vdsm-tool configure [module_name]'.
If all modules are not configured try to use:
'vdsm-tool configure --force'
(The force flag will stop the module's service and start it
afterwards automatically to load the new configuration.)
vdsm: stopped during execute check_is_configured task (task returned with error code 1).
vdsm start [FAILED]

VDSM only works again after a "configure --force":

[root@rhevh /]# vdsm-tool configure --force
Checking configuration status...
Running configure...
checking certs..
File already persisted: /etc/libvirt/libvirtd.conf
File already persisted: /etc/libvirt/qemu.conf
File already persisted: /etc/sysconfig/libvirtd
File already persisted: /etc/logrotate.d/libvirtd
Reconfiguration of libvirt is done.
/bin/sed: cannot rename /etc/logrotate.d/sedOFU2sH: Device or resource busy
/bin/mv: inter-device move failed: `/tmp/tmp.tVIZ4UY9xN' to `/etc/logrotate.d/libvirtd'; unable to remove target: Device or resource busy
Done configuring modules to VDSM.

[root@rhevh /]# service vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: not running [FAILED]
vdsm: Running run_final_hooks
vdsm stop [ OK ]
initctl: Job is already running: libvirtd
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running run_init_hooks
vdsm: Running gencerts
vdsm: Running check_is_configured
libvirt is already configured for vdsm
sanlock service is already configured
vdsm: Running validate_configuration
SUCCESS: ssl configured to true. No conflicts
vdsm: Running prepare_transient_repository
vdsm: Running syslog_available
vdsm: Running nwfilter
vdsm: Running dummybr
vdsm: Running load_needed_modules
vdsm: Running tune_system
vdsm: Running test_space
vdsm: Running test_lo
vdsm: Running restore_nets
vdsm: Running unified_network_persistence_upgrade
vdsm: Running upgrade_300_nets
Starting up vdsm daemon:
vdsm start [ OK ]
[root@rhevh /]#
From a RHEV-H perspective this could be considered a blocker, because, according to comment 6, upgrades are also affected.
(In reply to Fabian Deutsch from comment #22)
> From a RHEV-H perspective this could be considered a blocker, because,
> according to comment 6, upgrades are also affected.

Douglas found a problem with the vdsm build. I am going to rebuild vdsm tomorrow, then rebuild rhevh, and we will see.

- Kiril
Update about "Pipe closed".
========================

The upgrade happens on the node via the vdsm-upgrade command, but after the upgrade and before the reboot, the communication between ovirt-node and ovirt-engine unexpectedly closes, generating an IOException ("Pipe closed").

Logs from ovirt-engine UI events:
=========================================
Host localhost.localdomain installation failed. Pipe closed.
Installing Host localhost.localdomain. Step: RHEV_INSTALL.
Installing Host localhost.localdomain. Step: umount; Details: umount Succeeded .
Installing Host localhost.localdomain. Step: doUpgrade; Details: Upgrade Succeeded. Rebooting .
Installing Host localhost.localdomain. Step: setMountPoint; Details: Mount succeeded. .
Installing Host localhost.localdomain. Step: RHEL_INSTALL; Details: vdsm daemon stopped for upgrade process! .
Installing Host localhost.localdomain. Executing /usr/share/vdsm-reg/vdsm-upgrade.
Installing Host localhost.localdomain. Sending file /usr/share/ovirt-node-iso/ovirt-node-iso-3.0.4-1.999.201403191804.el6.iso to /data/updates/ovirt-node-image.iso.
Installing Host localhost.localdomain. Connected to host 192.168.100.133 with SSH key fingerprint: a9:8b:21:a8:0d:e4:16:7d:4b:79:38:f3:e0:f6:92:e0.
2014-Mar-20, 22:43

/var/log/secure during the pipe closed
=========================================
Mar 22 04:40:56 localhost sshd[23184]: Accepted publickey for root from 192.168.100.228 port 33905 ssh2
Mar 22 04:40:56 localhost sshd[23184]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 22 04:41:38 localhost sshd[23184]: channel_by_id: 0: bad id: channel free
Mar 22 04:41:38 localhost sshd[23184]: Disconnecting: Received ieof for nonexistent channel 0.
Mar 22 04:41:38 localhost sshd[23184]: pam_unix(sshd:session): session closed for user root
Mar 22 04:42:14 localhost sshd[25420]: Connection closed by 192.168.100.1
Mar 22 04:42:20 localhost sshd[12927]: Received signal 15; terminating.

Alon, can this be related to BZ#1051035?

In addition to this "Pipe closed" event, I have seen some connection timeouts from ovirt-engine to ovirt-node during the process of sending the ISO, as reported in comment#20; that worked again after a retry.

On the oVirt Node side, executing /usr/share/vdsm-reg/vdsm-upgrade manually works without error:

# /usr/share/vdsm-reg/vdsm-upgrade
<BSTRAP component='RHEL_INSTALL' status='WARN' message='vdsm daemon is already down before we stop it for upgrade.'/>
<BSTRAP component='setMountPoint' status='OK' message='Mount succeeded.'/>
<BSTRAP component='doUpgrade' status='OK' message='Upgrade Succeeded. Rebooting'/>
<BSTRAP component='umount' status='OK' message='umount Succeeded'/>
<BSTRAP component='RHEV_INSTALL' status='OK'/>
(In reply to Douglas Schilling Landgraf from comment #26)
> Alon, can this be related to BZ#1051035?

Yes, but this is already fixed in master, 3.4 and 3.3.z.

It is easy to verify, though... if the engine receives the RHEV_INSTALL status OK, then it is unrelated, as the entire data was received.

I am also unsure that discussing two different issues in one bug is wise; this bug was about the inability to upgrade the node, as no ISO was shown on the engine side.

What you need to do is:
1. Open a bug per issue.
2. Figure out whether the sshd process is terminated or not when the vdsm-upgrade script ends; just hard-code a sleep of 600 seconds or so on the node side.
3. Once the sshd process is terminated, see if the TCP session is terminated.
4. If the TCP session is terminated, check the engine behavior at this point.
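For step 2, a minimal sketch of the kind of temporary debug change meant here (vdsm-upgrade is a Python script on the node; the helper name, the log path and the exact insertion point are assumptions, not part of the actual script):

    # Temporary debug aid only: keep the vdsm-upgrade process alive after it
    # has printed its final <BSTRAP .../> status, so we can check whether the
    # sshd session and the TCP connection survive while it is still running.
    import os
    import time

    def debug_hold(seconds=600, interval=30):
        for _ in range(seconds // interval):
            # getppid() returns 1 once the parent (the sshd session that
            # spawned us) has gone away and init has adopted the process.
            with open("/tmp/vdsm-upgrade-debug.log", "a") as log:
                log.write("ppid=%d at %s\n" % (os.getppid(), time.ctime()))
            time.sleep(interval)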
(In reply to Alon Bar-Lev from comment #27)
> (In reply to Douglas Schilling Landgraf from comment #26)
> > Alon, can this be related to BZ#1051035?
>
> Yes, but this is already fixed in master, 3.4 and 3.3.z.

I saw the patch but still see the "Pipe closed" error on the engine's 3.4 branch upstream during the upgrade.

> It is easy to verify, though... if the engine receives the RHEV_INSTALL
> status OK, then it is unrelated, as the entire data was received.

Agreed.

> I am also unsure that discussing two different issues in one bug is wise;
> this bug was about the inability to upgrade the node, as no ISO was shown
> on the engine side.

Agreed. Lev, about your original report: neither Amador nor I am able to reproduce it. Are you still seeing it with the latest bits? (It looks like you are not, based on your comment#19.) If not, let's close this bug and, as Alon suggested, open a bug per issue.

> What you need to do is:
> 1. Open a bug per issue.
> 2. Figure out whether the sshd process is terminated or not when the
>    vdsm-upgrade script ends; just hard-code a sleep of 600 seconds or so
>    on the node side.
> 3. Once the sshd process is terminated, see if the TCP session is terminated.
> 4. If the TCP session is terminated, check the engine behavior at this point.

Sure. I will double check.
(In reply to Douglas Schilling Landgraf from comment #29)
> (In reply to Alon Bar-Lev from comment #27)
> > (In reply to Douglas Schilling Landgraf from comment #26)
> > > Alon, can this be related to BZ#1051035?
> >
> > Yes, but this is already fixed in master, 3.4 and 3.3.z.
>
> I saw the patch but still see the "Pipe closed" error on the engine's 3.4
> branch upstream during the upgrade.
>
> > It is easy to verify, though... if the engine receives the RHEV_INSTALL
> > status OK, then it is unrelated, as the entire data was received.
>
> Agreed.
>
> > I am also unsure that discussing two different issues in one bug is wise;
> > this bug was about the inability to upgrade the node, as no ISO was shown
> > on the engine side.
>
> Agreed. Lev, about your original report: neither Amador nor I am able to
> reproduce it. Are you still seeing it with the latest bits? (It looks like
> you are not, based on your comment#19.) If not, let's close this bug and,
> as Alon suggested, open a bug per issue.
>
> > What you need to do is:
> > 1. Open a bug per issue.
> > 2. Figure out whether the sshd process is terminated or not when the
> >    vdsm-upgrade script ends; just hard-code a sleep of 600 seconds or so
> >    on the node side.
> > 3. Once the sshd process is terminated, see if the TCP session is terminated.
> > 4. If the TCP session is terminated, check the engine behavior at this point.
>
> Sure. I will double check.

With the last version of RHEVH that I checked, I only got as far as the closed SSH connection issue. Thus I am not sure whether the original issue still exists, as this issue may come before the original one in the flow.
(In reply to Alon Bar-Lev from comment #27)
> What you need to do is:
> 1. Open a bug per issue.
> 2. Figure out whether the sshd process is terminated or not when the
>    vdsm-upgrade script ends; just hard-code a sleep of 600 seconds or so
>    on the node side.
> 3. Once the sshd process is terminated, see if the TCP session is terminated.
> 4. If the TCP session is terminated, check the engine behavior at this point.

For testing only, I have changed vdsm-upgrade from:

    if install.ovirt_boot_setup(reboot="Y")

to:

    if install.ovirt_boot_setup(reboot="N")

and included an os.system("reboot") in vdsm-upgrade only when the script finishes, and I don't see the "Pipe closed" error anymore. This seems to be a sync issue. Fabian, any suggestion, besides http://gerrit.ovirt.org/#/c/25967/ ? Is it time to open a bug on the ovirt-node side?

Also, I double checked that the ovirt-node ISO is copied correctly to /data/updates, as I already shared previously.
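To make the test-only change above easier to read, a minimal sketch; everything except the two quoted ovirt_boot_setup() calls and the os.system("reboot") line is an assumption about the surrounding vdsm-upgrade code, shown only for illustration:

    # Sketch of the test-only change described in the comment above.
    import os

    def do_upgrade(install):
        # Original behaviour: ovirt_boot_setup() itself triggers the reboot,
        # which may tear down the SSH channel before the final <BSTRAP .../>
        # status lines reach the engine ("Pipe closed").
        #
        #   if install.ovirt_boot_setup(reboot="Y"):
        #       ...
        #
        # Test change: let ovirt_boot_setup() finish without rebooting...
        if install.ovirt_boot_setup(reboot="N"):
            print("<BSTRAP component='doUpgrade' status='OK' "
                  "message='Upgrade Succeeded. Rebooting'/>")
        # ...and only reboot once the script has emitted all of its output.
        os.system("reboot")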
Hey Douglas,

thanks for investigating this so much.

I can not reproduce this with plain ssh. Provision a Node with Node for 3.4rc2, then from another host:

$ scp ovirt-node-iso-3.0.4-TestDay.vdsm.el6.iso root.122.211:/data/updates/ovirt-node-image.iso
$ ssh root.122.211 "/usr/share/vdsm-reg/vdsm-upgrade"
<BSTRAP component='RHEL_INSTALL' status='WARN' message='vdsm daemon is already down before we stop it for upgrade.'/>
<BSTRAP component='setMountPoint' status='OK' message='Mount succeeded.'/>
<BSTRAP component='doUpgrade' status='OK' message='Upgrade Succeeded. Rebooting'/>
<BSTRAP component='umount' status='OK' message='umount Succeeded'/>
<BSTRAP component='RHEV_INSTALL' status='OK'/>
$

No "Pipe closed" error when triggering vdsm-upgrade manually via ssh. There are also no unusual entries in /var/log/secure.

Can someone confirm this?

Alon, do you maybe know if there is something special about the Engine's ssh client?
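Since the engine presumably uses a programmatic SSH client rather than the openssh command-line binary, a closer reproduction from a test box could exec the script over an SSH library and read the output until EOF. A minimal sketch, assuming Python with paramiko installed and key-based root access to the node; the host address and key path are placeholders, not values from this bug:

    # Run vdsm-upgrade over a programmatic SSH client and check whether all
    # <BSTRAP .../> status lines arrive before the channel is closed.
    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("192.0.2.10", username="root",
                   key_filename="/root/.ssh/id_rsa")

    _, stdout, _ = client.exec_command("/usr/share/vdsm-reg/vdsm-upgrade")
    output = stdout.read().decode()   # blocks until the channel reaches EOF
    print(output)

    # If the channel were torn down early, the final status line would be missing.
    got_final = "component='RHEV_INSTALL' status='OK'" in output
    print("got final RHEV_INSTALL OK:", got_final)
    client.close()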
Lev, can you tell if upstream 3.4 is also affected by this?
(In reply to Lev Veyde from comment #30)
...
> With the last version of RHEVH that I checked, I only got as far as the
> closed SSH connection issue. Thus I am not sure whether the original issue
> still exists, as this issue may come before the original one in the flow.

Can you also tell here which versions of RHEV-H and RHEV-M you checked?
(In reply to Fabian Deutsch from comment #32)
>
> No "Pipe closed" error when triggering vdsm-upgrade manually via ssh. There
> are also no unusual entries in /var/log/secure.
>
> Can someone confirm this?
>
> Alon, do you maybe know if there is something special about the Engine's
> ssh client?

There is nothing special: when the process ends, the session terminates and the client is happy. It has worked so far; there is no reason it won't keep working.

Please open a separate bug, as this bug is resolved and is being abused for a different issue.
Alon, where do you see that this bug is solved?
(In reply to Fabian Deutsch from comment #36)
> Alon, where do you see that this bug is solved?

This bug is all about:

"""
a59f1b46-5da0-4c27-94c1-4a41f898e923,cinteg26.ci.lab.tlv.redhat.com failed due to: Cannot upgrade Host. Host version is not compatible with selected ISO version. Please select an ISO with major version 6.x.
"""

This was resolved.
Based on a dialog on IRC:

The original bug is solved, because the upgrade could be triggered in later trials. And the fact that the upgrade was run indicates that the error mentioned in the description is gone.

The remaining comments were abusing this bug, because they are about a different issue.

Lev, could you please verify that the original bug as described in the description is really gone?
(In reply to Fabian Deutsch from comment #33)
> Lev, can you tell if upstream 3.4 is also affected by this?

I don't know - I only tested downstream RHEVH.

(In reply to Fabian Deutsch from comment #38)
> Based on a dialog on IRC:
>
> The original bug is solved, because the upgrade could be triggered in later
> trials. And the fact that the upgrade was run indicates that the error
> mentioned in the description is gone.
>
> The remaining comments were abusing this bug, because they are about a
> different issue.
>
> Lev, could you please verify that the original bug as described in the
> description is really gone?

I no longer see it in manual tests, but it still appears in the automated ones (I am checking that).
Tareq, can you verify whether this fails in RHEV-H testing in QE or not?
(In reply to Lev Veyde from comment #39)
> (In reply to Fabian Deutsch from comment #33)
> > Lev, can you tell if upstream 3.4 is also affected by this?
>
> I don't know - I only tested downstream RHEVH.
>
> (In reply to Fabian Deutsch from comment #38)
> > Based on a dialog on IRC:
> >
> > The original bug is solved, because the upgrade could be triggered in
> > later trials. And the fact that the upgrade was run indicates that the
> > error mentioned in the description is gone.
> >
> > The remaining comments were abusing this bug, because they are about a
> > different issue.
> >
> > Lev, could you please verify that the original bug as described in the
> > description is really gone?
>
> I no longer see it in manual tests, but it still appears in the automated
> ones (I am checking that).

Ok, I am moving this to QA for now for a double check. Let us know in case you find something in your automatic tests.

(In reply to Eyal Edri from comment #40)
> Tareq, can you verify whether this fails in RHEV-H testing in QE or not?

Hi Tareq,

In case you see the message below during your tests:

2014-03-12 10:37:41,079 ERROR [org.ovirt.engine.core.bll.OVirtNodeUpgrade] (OVirtNodeUpgrade) Error during upgrade: java.io.IOException: Pipe closed

please go to bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1080594 - it is in POST and needs to be backported to 3.4 downstream.

Thanks
I have a running engine, rhevm-3.4.0-0.12.beta2.el6ev.noarch, with an up-and-running RHEV-H host (Red Hat Enterprise Virtualization Hypervisor release 6.5 (20140313.1.el6ev)). I tried to upgrade it to Red Hat Enterprise Virtualization Hypervisor release 6.5 (20140326.0.el6ev).

Result: the installation failed. However, the RHEV-H version is now Red Hat Enterprise Virtualization Hypervisor release 6.5 (20140326.0.el6ev), and when I tried to activate the host it stayed in an unresponsive state.

Logs attached.
Created attachment 881325 [details]
deployment log
Created attachment 881326 [details]
engine log
If you were able to see the ISO image in the upgrade dialog, this bug is resolved. You may be experiencing bug#1080594 or a different issue.
Moving to VERIFIED, since the new ISO image is installed.
I still see the issue: http://jenkins-ci.eng.lab.tlv.redhat.com/job/rhevm_3.4_automation_coretools_rhevh_restapi_hosts_nfs_rest_factory/126/testReport/Hosts/019-Reinstall%20host/Reinstall_host/
Closing the bug, as what we see is potentially due to another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1082612
Chris,

Does this one need a release note?

Thanks in advance.
Zac
Closing as part of 3.4.0