Bug 1176048 - [6.6-3.5]Failed to upgrade hypervisor via RHEVM 3.5
Summary: [6.6-3.5]Failed to upgrade hypervisor via RHEVM 3.5
Keywords:
Status: CLOSED DUPLICATE of bug 1177216
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Douglas Schilling Landgraf
QA Contact: Virtualization Bugs
URL:
Whiteboard: node
Depends On:
Blocks: rhev35rcblocker rhev35gablocker 1177216
Reported: 2014-12-19 09:32 UTC by cshao
Modified: 2016-02-10 20:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1177216 (view as bug list)
Environment:
Last Closed: 2015-01-07 17:44:35 UTC
oVirt Team: Node


Attachments
upgrade.tar.gz (251.05 KB, application/x-gzip)
2014-12-19 09:32 UTC, cshao
sosreport_rhevh7.0_1218 (8.40 MB, application/x-xz)
2014-12-22 16:16 UTC, Ying Cui
engine_log_rhevh7.0_1218_comment 5 (4.04 MB, text/plain)
2014-12-22 16:22 UTC, Ying Cui
/var/log for comment 5 (1.90 MB, application/x-gzip)
2014-12-22 17:18 UTC, Ying Cui


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 36300 master MERGED spec: Drop custom libvirt service files Never
oVirt gerrit 36560 ovirt-3.5 MERGED spec: Drop custom libvirt service files Never
Red Hat Bugzilla 1176058 None None None Never

Internal Links: 1176058

Description cshao 2014-12-19 09:32:19 UTC
Created attachment 971058 [details]
upgrade.tar.gz

Description of problem:
[6.6-3.5]Failed to upgrade hypervisor via RHEVM 3.5

RHEV-H UI:
1. Networking status shows as unknown.
2. NIC status shows as unconfigured.

RHEVM UI:
The following message pops up in the RHEVM UI:
Host dell-pet105-02.qe.lab.eng.nay.redhat.com is not responding. It will stay in Connecting state for a grace period of 60 seconds and after that an attempt to fence the host will be issued.


Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.6-20141218.0.el6ev
ovirt-node-3.1.0-0.37.20141218gitcf277e1.el6.noarch
vdsm-4.16.8.1-4.el6ev.x86_64
RHEVM vt13.4
rhevm-3.5.0-0.26.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor6-6.6-20141218.0
2. Register hypervisor to RHEVM.
3. Approve it.
4. Put the host into maintenance.
5. Click the Upgrade button and upgrade the host to the same build it is already running.

Actual results:
[6.6-3.5]Failed to upgrade hypervisor via RHEVM(VT)

Expected results:
The hypervisor can be upgraded successfully via RHEVM.

Additional info:
This is not specific to pet105; I also encountered this bug on a USB disk.
I tested upgrading 7.0 (1218) to itself and did not see this issue.

Comment 2 Fabian Deutsch 2014-12-19 11:47:45 UTC
A first investigation shows that libvirtd and vdsmd are not coming up correctly.

Comment 3 Ying Cui 2014-12-22 11:38:11 UTC
I can reproduce this issue on rhev-hypervisor7-7.0-20141218.0.el7ev
ovirt-node-3.1.0-0.37.20141218gitcf277e1.el7.noarch

Comment 4 Fabian Deutsch 2014-12-22 13:56:55 UTC
(In reply to Ying Cui from comment #3)
> I can reproduce this issue on rhev-hypervisor7-7.0-20141218.0.el7ev
> ovirt-node-3.1.0-0.37.20141218gitcf277e1.el7.noarch

Are you sure it is that one?

The problem on 6.6 was that vdsmd and libvirtd were not coming up again.

What is the problem you are seeing on 7.0?

Comment 5 Ying Cui 2014-12-22 16:14:11 UTC
(In reply to Fabian Deutsch from comment #4)
> (In reply to Ying Cui from comment #3)
> > I can reproduce this issue on rhev-hypervisor7-7.0-20141218.0.el7ev
> > ovirt-node-3.1.0-0.37.20141218gitcf277e1.el7.noarch
> 
> Are you sure it is that one?
> 
> The problem on 6.6 was that vdsmd and libvirtd were not coming up again.
> 
> What is the problem you are seeing on 7.0?

On rhevh 7.0, upgrading to the same build via the rhevm portal failed. libvirtd was running, but vdsmd did not come up.
The rhevh host did not reboot automatically after the upgrade, and the rhevm portal displayed: Host dell-per515-02.qe.lab.eng.nay.redhat.com is not responding. It will stay in Connecting state for a grace period of 60 seconds and after that an attempt to fence the host will be issued.

My test steps:
1. Install rhevh 7.0 1218 successfully.
2. Register rhevh 7.0 to rhevm successfully.
3. In the rhevm portal, approve the host so that it comes up.
4. Put the host into maintenance.
5. Click the Upgrade button.
6. Select rhevh 7.0 1218.iso, then click OK.
7. Check /data/updates/ on the rhevh host: the ISO has been transferred.

Actual results:
1. rhevh did not reboot automatically after the upgrade.
2. The rhevm portal shows the host as non-responsive.
3. Upgrading rhevh 7.0 to the same build via RHEVM 3.5 failed.

[root@dell-per515-02 admin]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: inactive (dead) since Mon 2014-12-22 15:50:15 UTC; 3min 51s ago
  Process: 53242 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 53024 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=exited, status=0/SUCCESS)
  Process: 52836 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 53024 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/vdsmd.service

Dec 22 15:48:24 dell-per515-02.qe.lab.eng.nay.redhat.com python[53024]: DIGEST-MD5 client step 2
Dec 22 15:48:24 dell-per515-02.qe.lab.eng.nay.redhat.com python[53024]: DIGEST-MD5 ask_user_info()
Dec 22 15:48:24 dell-per515-02.qe.lab.eng.nay.redhat.com python[53024]: DIGEST-MD5 make_client_response()
Dec 22 15:48:24 dell-per515-02.qe.lab.eng.nay.redhat.com python[53024]: DIGEST-MD5 client step 3
Dec 22 15:50:11 dell-per515-02.qe.lab.eng.nay.redhat.com systemd[1]: Stopping Virtual Desktop Server Manager...
Dec 22 15:50:15 dell-per515-02.qe.lab.eng.nay.redhat.com python[53024]: DIGEST-MD5 client mech dispose
Dec 22 15:50:15 dell-per515-02.qe.lab.eng.nay.redhat.com python[53024]: DIGEST-MD5 common mech dispose
Dec 22 15:50:15 dell-per515-02.qe.lab.eng.nay.redhat.com vdsmd_init_common.sh[53242]: vdsm: Running run_final_hooks
Dec 22 15:50:15 dell-per515-02.qe.lab.eng.nay.redhat.com systemd[1]: Stopped Virtual Desktop Server Manager.
Dec 22 15:50:17 dell-per515-02.qe.lab.eng.nay.redhat.com systemd[1]: Stopped Virtual Desktop Server Manager.

[root@dell-per515-02 admin]# systemctl status libvirtd
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Mon 2014-12-22 15:48:12 UTC; 12min ago
 Main PID: 52835 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           └─52835 /usr/sbin/libvirtd --listen

Dec 22 15:52:02 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Cannot find 'pm-is-supported' in path: No such file or directory
Dec 22 15:52:02 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Failed to get host power management capabilities
Dec 22 15:52:02 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Cannot find 'pm-is-supported' in path: No such file or directory
Dec 22 15:52:02 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Failed to get host power management capabilities
Dec 22 15:52:02 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Cannot find 'pm-is-supported' in path: No such file or directory
Dec 22 15:52:02 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Failed to get host power management capabilities
Dec 22 15:52:03 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Cannot find 'pm-is-supported' in path: No such file or directory
Dec 22 15:52:03 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Failed to get host power management capabilities
Dec 22 15:52:03 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Cannot find 'pm-is-supported' in path: No such file or directory
Dec 22 15:52:03 dell-per515-02.qe.lab.eng.nay.redhat.com libvirtd[52835]: Failed to get host power management capabilities
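
When comparing several hosts, it can help to pull the service state out of saved `systemctl status` output instead of reading it by eye. The following is only a minimal sketch: the function name `parse_active_state` is hypothetical, and it assumes the `Active: <state> (<sub-state>)` line format seen in the output above.

```python
import re

# Assumption: the status text contains a line shaped like
# "Active: inactive (dead) since ...", as in the output pasted above.
_ACTIVE_LINE = re.compile(r'^\s*Active:\s+(\S+)\s+\(([^)]*)\)', re.MULTILINE)


def parse_active_state(status_text):
    """Return (state, sub_state) from `systemctl status` text, or None."""
    match = _ACTIVE_LINE.search(status_text)
    if match is None:
        return None
    return (match.group(1), match.group(2))


# Lines copied from the vdsmd and libvirtd status output in this comment.
vdsmd_status = (
    "vdsmd.service - Virtual Desktop Server Manager\n"
    "   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)\n"
    "   Active: inactive (dead) since Mon 2014-12-22 15:50:15 UTC; 3min 51s ago\n"
)
libvirtd_status = (
    "libvirtd.service - Virtualization daemon\n"
    "   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)\n"
    "   Active: active (running) since Mon 2014-12-22 15:48:12 UTC; 12min ago\n"
)

print(parse_active_state(vdsmd_status))     # ('inactive', 'dead')
print(parse_active_state(libvirtd_status))  # ('active', 'running')
```

This only interprets text that was already collected; it does not query systemd itself.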

Comment 6 Ying Cui 2014-12-22 16:16:26 UTC
Created attachment 972079 [details]
sosreport_rhevh7.0_1218

Comment 7 Ying Cui 2014-12-22 16:20:10 UTC
# cd /tmp
# ll
-rw-r--r--.  1 root root   0 Dec 22 15:50 ovirt.log
-rw-r--r--.  1 root root   0 Dec 22 15:50 ovirt_upgraded

Comment 8 Ying Cui 2014-12-22 16:22:03 UTC
Created attachment 972080 [details]
engine_log_rhevh7.0_1218_comment 5

Comment 11 Ying Cui 2014-12-22 16:39:19 UTC
On the surface, the rhevh 6 and rhevh 7 self-upgrade failures look similar, but if your investigation shows that they do not share the same root cause, we can split this into two bugs. Thanks.

Comment 12 Ying Cui 2014-12-22 17:18:12 UTC
Created attachment 972088 [details]
/var/log for comment 5

Some logs are not in the sosreport, so I attached all of /var/log/.

Comment 13 Douglas Schilling Landgraf 2014-12-23 11:12:37 UTC
(In reply to Ying Cui from comment #12)
> Created attachment 972088 [details]
> /var/log for comment 5
> 
> some logs are not in sosreport, so I pasted all /var/log/

Hi Ying,

We have two different issues here: the original one that you reported, and the one shown in the log below, which should be fixed by ovirt-node-plugin-vdsm-0.2.0-17. The second one happened because augeas failed when /etc/default/ovirt contained unneeded double quotes in the MANAGED_BY key.

A temporary workaround, until the next test ISO is available (to test the original report), is:

- Make sure the host is Up in RHEV-M.
- In the TUI, press F2, edit /etc/default/ovirt, and remove the doubled quotes ("") in MANAGED_BY.
  Example: change ""RHEV-M https://IP:443"" to "RHEV-M https://IP:443"
- Put the host into maintenance in RHEV-M and run the upgrade.
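
The quote cleanup in the workaround above can also be sketched in Python. This is only an illustration, not the fix shipped in ovirt-node-plugin-vdsm: the function name `fix_managed_by` is hypothetical, and it assumes the entry is a single `MANAGED_BY=...` line whose value is wrapped in two pairs of double quotes.

```python
import re

# Assumption: the broken entry looks like
#   MANAGED_BY=""RHEV-M https://IP:443""
# i.e. the whole value is wrapped in doubled double quotes.
_DOUBLED_QUOTES = re.compile(r'^(\s*MANAGED_BY\s*=\s*)""(.*)""\s*$')


def fix_managed_by(text):
    """Collapse doubled double quotes around the MANAGED_BY value.

    All other lines, and already-correct MANAGED_BY lines, are left as-is.
    """
    fixed_lines = []
    for line in text.splitlines():
        match = _DOUBLED_QUOTES.match(line)
        if match:
            line = '{0}"{1}"'.format(match.group(1), match.group(2))
        fixed_lines.append(line)
    result = "\n".join(fixed_lines)
    # Preserve a trailing newline if the input had one.
    if text.endswith("\n"):
        result += "\n"
    return result


broken = 'MANAGED_BY=""RHEV-M https://IP:443""\n'
print(fix_managed_by(broken), end="")  # MANAGED_BY="RHEV-M https://IP:443"
```

The function works on a string so it can be tried on a copy of the file contents first; writing the result back to /etc/default/ovirt on the node is deliberately left out of the sketch.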

ovirt-node-upgrade.log
=========================
2014-12-22 15:50:23,706 - ERROR    - ovirt-node-upgrade - Error: Upgrade Failed: Unable to save to file!
Traceback (most recent call last):
  File "/usr/sbin/ovirt-node-upgrade", line 364, in run
    self._run_upgrade()
  File "/usr/sbin/ovirt-node-upgrade", line 255, in _run_upgrade
    if not upgrade.ovirt_boot_setup():
  File "/usr/lib/python2.7/site-packages/ovirtnode/install.py", line 687, in ovirt_boot_setup
  File "/usr/lib/python2.7/site-packages/ovirtnode/ovirtfunctions.py", line 371, in disable_firstboot
  File "/usr/lib/python2.7/site-packages/augeas.py", line 385, in save
IOError: Unable to save to file!

Comment 14 Ying Cui 2014-12-24 09:31:43 UTC
> We have two differente issues here, the original one that you reported and
> the other listed below which should be fixed by:
> ovirt-node-plugin-vdsm-0.2.0-17. This happened because augeas failed when we
> had unneeded double quotes into /etc/default/ovirt for MANAGED_BY key.

Thanks, Douglas, for the detailed explanation.
Let me try this workaround.

I also need to double-check with you: do I need to split this bug into two? See my comment 11. Thanks.

Comment 15 Douglas Schilling Landgraf 2014-12-24 17:13:53 UTC
(In reply to Ying Cui from comment #14)
> > We have two differente issues here, the original one that you reported and
> > the other listed below which should be fixed by:
> > ovirt-node-plugin-vdsm-0.2.0-17. This happened because augeas failed when we
> > had unneeded double quotes into /etc/default/ovirt for MANAGED_BY key.
> 
> Thanks Douglas for detail explanation.
> Let me try this workaround.
> 
> And I need to double confirm with you: do I need to split this bug into two?
> see my comment 11. Thanks.

Hi Ying, it's up to you; if you want to have a record of it, then yes. On the other hand, we already have the package which should fix it (ovirt-node-plugin-vdsm-0.2.0-17).

Comment 16 Ying Cui 2014-12-25 07:32:15 UTC
> > Thanks Douglas for detail explanation.
> > Let me try this workaround.

I tested the workaround from comment 13 on the rhevh 7.0 build and the rhevh 6.0 build, and it works well. Both RHEVH 7.0 and RHEVH 6.0 can be upgraded to the same build via RHEVM, and the hosts come Up in rhevm automatically after upgrading.
rhev-hypervisor7-7.0-20141218.0.el7ev
ovirt-node-3.1.0-0.37.20141218gitcf277e1.el7.noarch
rhev-hypervisor6-6.6-20141218.0.el6ev
ovirt-node-3.1.0-0.37.20141218gitcf277e1.el6.noarch

> > 
> > And I need to double confirm with you: do I need to split this bug into two?
> > see my comment 11. Thanks.
> 
> Hi Ying, it's up to you, if you want to make a record about it, yes. On the
> other hand, we already have the package wich should fix it
> (ovirt-node-plugin-vdsm-0.2.0-17).

Yes, according to comment 13 there are two different issues here, and the other issue should be fixed in the ovirt-node-plugin-vdsm component. So we had better open another bug against the ovirt-node-plugin-vdsm component to record this change. Thanks.

Comment 17 Ying Cui 2014-12-25 07:51:44 UTC
Opened new bug 1177216 against the ovirt-node-plugin-vdsm component to track the upgrade failure caused by unneeded double quotes in the MANAGED_BY key in /etc/default/ovirt.

Comment 18 Fabian Deutsch 2015-01-07 14:21:00 UTC
It looks like the "remaining" parts of this bug have the same cause as the one described in bug 1179068.

Douglas, what do you think?

Comment 19 Douglas Schilling Landgraf 2015-01-07 17:44:35 UTC
(In reply to Fabian Deutsch from comment #18)
> It looks like the "remaining" parts of this bug are the same cause which is
> described in bug 1179068.
> 
> Douglas, what do you think?

Hi Fabian,

I checked the original report description and reproduced the issue locally again. Basically, I would say that the reported failure was caused mainly by bz#1177216. For now, I will close this as a duplicate of bz#1177216; if other issues around this topic turn up, we can re-open this one.

shaochen/Ying, if you have any questions, please let me know.

*** This bug has been marked as a duplicate of bug 1177216 ***

