Bug 1254196 - RHEV-H with HE upgrade failed on Step: RHEV_INSTALL via Hosted-Engine, vdsm service is not stopped
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 3.5.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ovirt-3.5.6
Target Release: 3.5.6
Assigned To: Martin Sivák
QA Contact: Artyom
Whiteboard: sla
Keywords: Triaged, ZStream
Duplicates: 1271707
Depends On:
Blocks: 1275527
Reported: 2015-08-17 08:15 EDT by Ying Cui
Modified: 2016-02-10 14:19 EST (History)

See Also:
Fixed In Version: ovirt-hosted-engine-ha-1.2.8-1
Doc Type: Bug Fix
Doc Text:
Previously, if VDSM was not able to be restarted during an upgrade of the self-hosted engine, the hosted-engine deployment script would fail. Now, local maintenance mode is used during the upgrade and VDSM state does not interfere with upgrade procedure.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-01 14:55:35 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ykaplan: needinfo+


Attachments
rhevh_var_log (1.36 MB, application/x-bzip), 2015-08-17 08:33 EDT, Ying Cui
sosreport_rhevh (6.87 MB, application/x-xz), 2015-08-17 08:34 EDT, Ying Cui
engine.log (2.80 MB, text/plain), 2015-08-17 08:37 EDT, Ying Cui
hosted-deploy log (270.53 KB, text/plain), 2015-08-17 08:38 EDT, Ying Cui
screenshot.png (184.90 KB, image/png), 2015-08-17 08:40 EDT, Ying Cui


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 47335 ovirt-hosted-engine-ha-1.2 MERGED Do not start any services while in maintenance mode Never

Description Ying Cui 2015-08-17 08:15:20 EDT
Description of problem:
Set up two RHEV-H hosts with Hosted Engine successfully.
Tried to _upgrade_ one of the two RHEV-H hosts via Hosted Engine, but it failed on Step: RHEV_INSTALL.

Version-Release number of selected component (if applicable):
# rpm -qa ovirt-node ovirt-hosted-engine-setup ovirt-hosted-engine-ha vdsm kernel ovirt-node-plugin-hosted-engine
ovirt-hosted-engine-setup-1.2.5.3-1.el7ev.noarch
vdsm-4.16.24-2.el7ev.x86_64
ovirt-hosted-engine-ha-1.2.6-2.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.2.0-18.0.el7ev.noarch
kernel-3.10.0-229.11.1.el7.x86_64
ovirt-node-3.2.3-18.el7.noarch
# cat /etc/rhev-hypervisor-release 
Red Hat Enterprise Virtualization Hypervisor release 7.1 (20150813.0.el7ev)

How reproducible:
100%


Steps to Reproduce:
1. Set up HE on the first RHEV-H successfully.
    - nfs storage
    - em1
2. Set up an additional HE host on the second RHEV-H successfully.
3. Both RHEV-H hosts are UP in Hosted Engine.
4. Download RHEV-H 7.1 20150813.0.el7ev into RHEV-M so that the RHEV-H 7.1 iso is listed on the Install page of the upgrade dialog.
5. Put the second RHEV-H into maintenance.
6. Click 'Upgrade' in the rhevm portal (Host sheet).
7. Select the rhevh iso you want to upgrade to.

Actual results:
Upgrade failed. 

Expected results:
Upgrading RHEV-H via Hosted Engine succeeds.


Additional info:

1. [root@dhcp-11-107 updates]# ll
total 244736
-rw-r--r--. 1 root root 250609664 Aug 17 11:51 ovirt-node-image.iso

2. 
<snip>
2015-08-17 07:51:09,812 INFO  [org.ovirt.engine.core.bll.InstallerMessages] (org.ovirt.thread.pool-7-thread-15) [55a55548] Installation dhcp-11-107.nay.redhat.com: Sending file /usr/share/rhev-hypervisor/rhevh-7.1-20150813.0.el7ev.iso to /data/updates/ovirt-node-image.iso
2015-08-17 07:51:10,210 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-15) [55a55548] Correlation ID: 55a55548, Call Stack: null, Custom Event ID: -1, Message: Installing Host hosted_engine_2. Sending file /usr/share/rhev-hypervisor/rhevh-7.1-20150813.0.el7ev.iso to /data/updates/ovirt-node-image.iso.
2015-08-17 07:51:10,211 INFO  [org.ovirt.engine.core.uutils.ssh.SSHDialog] (org.ovirt.thread.pool-7-thread-15) SSH execute root@dhcp-11-107.nay.redhat.com 'mkdir -p '/data/updates''
2015-08-17 07:51:23,340 INFO  [org.ovirt.engine.core.bll.InstallerMessages] (org.ovirt.thread.pool-7-thread-15) [55a55548] Installation dhcp-11-107.nay.redhat.com: Executing /usr/share/vdsm-reg/vdsm-upgrade
2015-08-17 07:51:23,444 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-15) [55a55548] Correlation ID: 55a55548, Call Stack: null, Custom Event ID: -1, Message: Installing Host hosted_engine_2. Executing /usr/share/vdsm-reg/vdsm-upgrade.
2015-08-17 07:51:23,444 INFO  [org.ovirt.engine.core.uutils.ssh.SSHDialog] (org.ovirt.thread.pool-7-thread-15) SSH execute root@dhcp-11-107.nay.redhat.com '/usr/share/vdsm-reg/vdsm-upgrade'
2015-08-17 07:51:23,574 INFO  [org.ovirt.engine.core.bll.OVirtNodeUpgrade] (OVirtNodeUpgrade) update from host dhcp-11-107.nay.redhat.com: <BSTRAP component="ovirt-node-upgrade" status="OK" message="ovirt-node-upgrade.UpgradeTool: INFO     Temporary Directory is: /data/tmpHVhJJ6&#10;"/>
2015-08-17 07:51:23,575 INFO  [org.ovirt.engine.core.bll.InstallerMessages] (OVirtNodeUpgrade) Installation dhcp-11-107.nay.redhat.com: Step: ovirt-node-upgrade; Details: ovirt-node-upgrade.UpgradeTool: INFO     Temporary Directory is: /data/tmpHVhJJ6
 
2015-08-17 07:51:23,581 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (org.ovirt.thread.pool-7-thread-15) SSH error running command root@dhcp-11-107.nay.redhat.com:'/usr/share/vdsm-reg/vdsm-upgrade': java.io.IOException: Command returned failure code 1 during SSH session 'root@dhcp-11-107.nay.redhat.com'
	at org.ovirt.engine.core.uutils.ssh.SSHClient.executeCommand(SSHClient.java:527) [uutils.jar:]
	at org.ovirt.engine.core.uutils.ssh.SSHDialog.executeCommand(SSHDialog.java:318) [uutils.jar:]
	at org.ovirt.engine.core.bll.OVirtNodeUpgrade.execute(OVirtNodeUpgrade.java:215) [bll.jar:]
...
2015-08-17 07:51:26,049 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (OVirtNodeUpgrade) Correlation ID: 55a55548, Call Stack: null, Custom Event ID: -1, Message: Installing Host hosted_engine_2. Step: ovirt-node-upgrade; Details: RuntimeError: Previous upgrade completed, you must reboot
 .
2015-08-17 07:51:26,049 INFO  [org.ovirt.engine.core.bll.OVirtNodeUpgrade] (OVirtNodeUpgrade) update from host dhcp-11-107.nay.redhat.com: <BSTRAP component="ovirt-node-upgrade" status="FAIL" message="Upgraded Failed"/>

</snip>
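
For anyone triaging a similar engine.log, the per-step result is carried in the `<BSTRAP .../>` elements shown above. The following is a minimal sketch, assuming each element sits at the end of its log line exactly as in this excerpt; `parse_bstrap` and `upgrade_failed` are hypothetical helper names, not part of ovirt-engine.

```python
import xml.etree.ElementTree as ET


def parse_bstrap(line):
    """Return (component, status, message) from a log line ending in a <BSTRAP .../> element."""
    start = line.find("<BSTRAP")
    if start == -1:
        raise ValueError("no BSTRAP element in line")
    # Assumption: the element runs to the end of the line, as in the excerpt above.
    element = ET.fromstring(line[start:])
    return (
        element.get("component"),
        element.get("status"),
        element.get("message"),
    )


def upgrade_failed(lines):
    """True if any BSTRAP element in the given log lines reports status FAIL."""
    return any(
        parse_bstrap(line)[1] == "FAIL" for line in lines if "<BSTRAP" in line
    )
```

Feeding it the lines of the excerpt would flag the final `status="FAIL"` element, while the earlier `status="OK"` element alone would not.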
Comment 1 Ying Cui 2015-08-17 08:33:27 EDT
Created attachment 1063789 [details]
rhevh_var_log
Comment 2 Ying Cui 2015-08-17 08:34:36 EDT
Created attachment 1063791 [details]
sosreport_rhevh
Comment 3 Ying Cui 2015-08-17 08:37:33 EDT
Created attachment 1063804 [details]
engine.log
Comment 4 Ying Cui 2015-08-17 08:38:25 EDT
Created attachment 1063806 [details]
hosted-deploy log
Comment 5 Ying Cui 2015-08-17 08:40:04 EDT
Created attachment 1063807 [details]
screenshot.png
Comment 6 Douglas Schilling Landgraf 2015-08-17 08:41:11 EDT
From the vdsm-upgrade log I see that vdsm could not be stopped, which caused the upgrade failure.

2015-08-17 10:27:10,658 - INFO     - ovirt-node-upgrade - Running pre-upgrade hooks
2015-08-17 10:27:10,658 - INFO     - ovirt-node-upgrade - Running: 01-vdsm
2015-08-17 10:27:10,658 - DEBUG    - ovirt-node-upgrade - ('/usr/libexec/ovirt-node/hooks/pre-upgrade/01-vdsm',)
2015-08-17 10:27:16,551 - DEBUG    - ovirt-node-upgrade - [u'/usr/libexec/ovirt-node/hooks/pre-upgrade/01-vdsm: Stopping vdsmd to upgrade']
2015-08-17 10:27:16,551 - DEBUG    - ovirt-node-upgrade - Failed to stop vdsdm: Error:  ServiceOperationError: _systemctlStop failed
Job for vdsmd.service canceled.



2015-08-17 10:27:16,551 - ERROR    - ovirt-node-upgrade - Error: Upgrade Failed: Command Failed: '('/usr/libexec/ovirt-node/hooks/pre-upgrade/01-vdsm',)' [u'/usr/libexec/ovirt-node/hooks/pre-upgrade/01-vdsm: Stopping vdsmd to upgrade']
Traceback (most recent call last):
  File "/usr/sbin/ovirt-node-upgrade", line 365, in run
    self._run_hooks("pre-upgrade")
  File "/usr/sbin/ovirt-node-upgrade", line 197, in _run_hooks
    self._system(hook)
  File "/usr/sbin/ovirt-node-upgrade", line 145, in _system
    raise RuntimeError("Command Failed: '%s' %s" % (command, output))
RuntimeError: Command Failed: '('/usr/libexec/ovirt-node/hooks/pre-upgrade/01-vdsm',)' [u'/usr/libexec/ovirt-node/hooks/pre-upgrade/01-vdsm: Stopping vdsmd to upgrade']
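
The traceback shows the failure pattern: a pre-upgrade hook exits non-zero and the upgrade tool turns that into a `RuntimeError`, aborting the whole upgrade. A minimal sketch of that pattern follows. It is not the actual `/usr/sbin/ovirt-node-upgrade` code; `run_hook` is a hypothetical name, and only the error message format is taken from the traceback.

```python
import subprocess


def run_hook(command):
    """Run one hook; raise RuntimeError carrying its output if it exits non-zero."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        universal_newlines=True,   # decode output as text
    )
    output = proc.stdout.splitlines()
    if proc.returncode != 0:
        # Same message shape as the RuntimeError in the traceback above.
        raise RuntimeError("Command Failed: '%s' %s" % (command, output))
    return output
```

With this shape, any hook that cannot complete (here, `01-vdsm` failing to stop vdsmd) stops the upgrade before the image is touched.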
Comment 9 Ying Cui 2015-08-17 08:49:36 EDT
Additional: To reproduce this issue, you must set up hosted-engine on RHEV-H, then upgrade this RHEV-H via Hosted Engine.
Comment 11 Douglas Schilling Landgraf 2015-08-17 10:33:11 EDT
Tried to stop vdsm manually: 

#1 - checking vdsm status:
------------------------------
# /bin/systemctl status  vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Mon 2015-08-17 10:30:12 UTC; 3h 59min ago

# 2 - Stopping vdsm manually
------------------------------
# /bin/systemctl stop  vdsmd.service
Redirecting to /bin/systemctl stop  vdsmd.service
Job for vdsmd.service canceled.

# 3 - Checking status
------------------------------
# /bin/systemctl status  -l vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: deactivating (stop-sigterm) since Mon 2015-08-17 14:29:32 UTC; 5s ago
  Process: 8775 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 8887 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─ 8887 /usr/bin/python /usr/share/vdsm/vdsm
           ├─29047 /usr/libexec/ioprocess --read-pipe-fd 29 --write-pipe-fd 20 --max-threads 10 --max-queued-requests 10
           ├─29053 /usr/libexec/ioprocess --read-pipe-fd 54 --write-pipe-fd 50 --max-threads 10 --max-queued-requests 10
           ├─29056 /usr/libexec/ioprocess --read-pipe-fd 40 --write-pipe-fd 34 --max-threads 10 --max-queued-requests 10
           ├─29067 /usr/libexec/ioprocess --read-pipe-fd 50 --write-pipe-fd 47 --max-threads 10 --max-queued-requests 10
           └─29069 /usr/libexec/ioprocess --read-pipe-fd 64 --write-pipe-fd 60 --max-threads 10 --max-queued-requests 10

Aug 17 10:30:12 dhcp-11-107.nay.redhat.com python[8887]: DIGEST-MD5 ask_user_info()
Aug 17 10:30:12 dhcp-11-107.nay.redhat.com python[8887]: DIGEST-MD5 make_client_response()
Aug 17 10:30:12 dhcp-11-107.nay.redhat.com python[8887]: DIGEST-MD5 client step 3
Aug 17 14:29:32 dhcp-11-107.nay.redhat.com systemd[1]: Stopping Virtual Desktop Server Manager...
Aug 17 14:29:32 dhcp-11-107.nay.redhat.com vdsm[8887]: vdsm IOProcessClient ERROR IOProcess failure
                                                       Traceback (most recent call last):
                                                         File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
                                                       Exception: FD closed
Aug 17 14:29:32 dhcp-11-107.nay.redhat.com vdsm[8887]: vdsm IOProcessClient ERROR IOProcess failure
                                                       Traceback (most recent call last):
                                                         File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
                                                       Exception: FD closed
Aug 17 14:29:32 dhcp-11-107.nay.redhat.com vdsm[8887]: vdsm IOProcessClient ERROR IOProcess failure
                                                       Traceback (most recent call last):
                                                         File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
                                                       Exception: FD closed
Aug 17 14:29:32 dhcp-11-107.nay.redhat.com vdsm[8887]: vdsm IOProcessClient ERROR IOProcess failure
                                                       Traceback (most recent call last):
                                                         File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
                                                       Exception: FD closed
Aug 17 14:29:32 dhcp-11-107.nay.redhat.com vdsm[8887]: vdsm IOProcessClient ERROR IOProcess failure
                                                       Traceback (most recent call last):
                                                         File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
                                                       Exception: FD closed
Aug 17 14:29:33 dhcp-11-107.nay.redhat.com systemd[1]: Starting Virtual Desktop Server Manager...


Dan, could you please review this one? Looks like an error in ioprocess and vdsm.

Thanks!
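
To check for this condition in a script rather than by eye, the `Active:` line of the status output can be parsed. This is a minimal sketch, assuming the output layout shown in the steps above; `active_state` and `stop_completed` are hypothetical helper names.

```python
import re

# Matches e.g. "   Active: deactivating (stop-sigterm) since ..."
ACTIVE_RE = re.compile(r"^\s*Active:\s+(\S+)(?:\s+\(([^)]*)\))?", re.MULTILINE)


def active_state(status_text):
    """Return (state, sub_state) from the 'Active:' line of `systemctl status` output."""
    match = ACTIVE_RE.search(status_text)
    if match is None:
        raise ValueError("no 'Active:' line found")
    return match.group(1), match.group(2)


def stop_completed(status_text):
    """Treat the stop as complete only if the unit ended up inactive."""
    state, _ = active_state(status_text)
    return state == "inactive"
```

For the output in step 3 this returns `deactivating` with sub-state `stop-sigterm`, so `stop_completed` is false, which matches the canceled stop job.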
Comment 12 Ying Cui 2015-08-18 04:46:46 EDT
Lowering the priority: we can reproduce this issue on the second rhevh host with HE, but the third and the fourth rhevh hosts with HE can be upgraded via hosted-engine successfully.

I am sure I did the same steps on these rhevh hosts.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Installation_Guide/Upgrading_the_Self-Hosted_Engine.html

How reproducible:
30%
Comment 13 Yeela Kaplan 2015-08-18 06:57:42 EDT
The ioprocess traceback you see is related to BZ#1189200. It is only a redundant log message that we need to remove, and it is not related to any problem with stopping the vdsm service.
Comment 14 wanghui 2015-08-18 07:03:02 EDT
Still encountering this issue in rhev-hypervisor6-6.7-20150813.0.
Comment 15 Yeela Kaplan 2015-08-18 09:18:01 EDT
Douglas,
Can you provide the vdsm log for comment 11?
Comment 17 Yeela Kaplan 2015-08-18 11:15:00 EDT
I have provided a patch solving BZ#1189200: 
https://gerrit.ovirt.org/#/c/45038/

You can try and reproduce the issue with it.
But I strongly suggest you keep investigating the issue, and not assume that BZ#1189200 is a blocker for this bug.
As there is nothing in the issue over there that should prevent vdsm from stopping as a service.
Comment 18 Douglas Schilling Landgraf 2015-08-18 11:22:16 EDT
(In reply to Yeela Kaplan from comment #17)
> I have provided a patch solving BZ#1189200: 
> https://gerrit.ovirt.org/#/c/45038/
> 
> You can try and reproduce the issue with it.
> But I strongly suggest you keep investigating the issue, and not assume that
> BZ#1189200 is a blocker for this bug.
> As there is nothing in the issue over there that should prevent vdsm from
> stopping as a service.

The logs are pointing to it. If another vdsm issue is hiding behind it, your fix will uncover it. If you can provide a VDSM downstream build with your fix, we can include it in a scratch build of RHEV-H and re-test.
Comment 19 Yeela Kaplan 2015-08-18 11:52:05 EDT
the ioprocess traceback you see in the vdsm log is just noise.
Please try to reproduce the issue so we can find the real problem hiding behind it.
Comment 20 Douglas Schilling Landgraf 2015-08-18 15:22:45 EDT
(In reply to Yeela Kaplan from comment #19)
> the ioprocess traceback you see in the vdsm log is just noise.
> Please try to reproduce the issue so we can find the real problem hiding
> behind it.

Ying, would you mind providing a reproducer machine again so the VDSM folks can investigate it?
Comment 21 Barak Korren 2015-08-19 04:03:34 EDT
Should this issue be resolved by the following commit?
https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=c6f3e6eba71c1166281f4ef7e9be1ad7000f4e77
Please properly assign and ack the bug.
Comment 24 Yeela Kaplan 2015-08-20 03:53:55 EDT
Douglas, 
After investigating further and connecting to the hosts Ying provided, 
the vdsm is killed only on host RHEV-H-2.
Meaning the job is cancelled and vdsm got a kill signal.

We tried a different approach:
Stopping the ha services (ovirt-ha-broker, ovirt-ha-agent).
It fixed the problem on this machine. 
Meaning vdsm stopped normally with the signal TERM.


We can still see the IOProcess error in the log, meaning it is still there when vdsm stops normally now.
This log is unrelated and is just noise: the IOProcesses get the same signals as the main vdsm process, which causes them to close their fds and the python bindings to raise an FD closed exception. That is expected behavior.


I guess that we should ask the ha guys why it is keeping vdsm from closing...
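
The manual workaround described here can be sketched as an ordered stop sequence. This is an illustration only, not a supported procedure: the service names come from this comment, while stopping the agent before the broker is an assumption, and `stop_commands` / `stop_all` are hypothetical names.

```python
import subprocess

# HA services first, then vdsmd. The agent-before-broker order is an assumption.
STOP_ORDER = ("ovirt-ha-agent", "ovirt-ha-broker", "vdsmd")


def stop_commands(services=STOP_ORDER):
    """Build one `systemctl stop` command per service, in order."""
    return [["systemctl", "stop", "%s.service" % name] for name in services]


def stop_all(run=subprocess.check_call, services=STOP_ORDER):
    """Run the stop commands in order; check_call aborts on the first failure."""
    for command in stop_commands(services):
        run(command)
```

Passing a different `run` callable (for example one that only records the commands) allows a dry run before touching a real host.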
Comment 25 Yaniv Lavi (Dary) 2015-08-20 08:16:17 EDT
Will this bug hit us if we fix this for the next images and not in the current one?
Is there a workaround? if there is, what is it?
Comment 26 Yeela Kaplan 2015-08-20 08:21:08 EDT
continuing comment 24, removing BZ#1189200 dependency.
Comment 27 Douglas Schilling Landgraf 2015-08-20 10:22:52 EDT
(In reply to Yeela Kaplan from comment #26)
> continuing comment 24, removing BZ#1189200 dependency.

Yeela, if the ioprocess patch only suppresses the error messages from vdsm and does not impact anything else, it would be nice to include it in 3.5.4, as we are going to rebuild the rhev-h anyway. These error messages are pretty bad/confusing IMO.

Yeela/Yaniv, what do you think?
Comment 28 Douglas Schilling Landgraf 2015-08-20 10:27:10 EDT
(In reply to Yaniv Dary from comment #25)
> Will this bug hit us if we fix this for the next images and not in the
> current one?

Yes, because the current version of node requires vdsm to be stopped before it can proceed with the upgrade.

> Is there a workaround? if there is, what is it?

Probably asking users to go to the unsupported shell (F2 key) and stop vdsm manually before upgrading the iso via oVirt Web admin.

Ying, does it also happen during cdrom/usb upgrade? Maybe we could have it as a workaround?
Comment 29 Yeela Kaplan 2015-08-20 10:31:06 EDT
Fix will only be available in el7.2.
Not even on vdsm master.
As it depends on an el7 bug.
Comment 30 Douglas Schilling Landgraf 2015-08-20 10:46:15 EDT
(In reply to Douglas Schilling Landgraf from comment #28)
> (In reply to Yaniv Dary from comment #25)
> > Will this bug hit us if we fix this for the next images and not in the
> > current one?
> 
> Yes because we require to stop the vdsm in the current version of node to
> proceed with the upgrade.
> 
> > Is there a workaround? if there is, what is it?
> 
> Probably requesting users to go to the unsupported shell via (F2 key) and
> stopping manually vdsm before upgrading the iso via oVirt Web admin.
> 
> Ying, does it also happens during cdrom/usb upgrade? Maybe we could have it
> as workaround?

Ying, answering myself after talking with Yaniv. Doing cdrom/usb upgrade will be bad for remote/servers, we can ignore this approach for now.
Comment 31 Yaniv Lavi (Dary) 2015-08-20 10:55:48 EDT
(In reply to Douglas Schilling Landgraf from comment #30)
> (In reply to Douglas Schilling Landgraf from comment #28)
> > (In reply to Yaniv Dary from comment #25)
> > > Will this bug hit us if we fix this for the next images and not in the
> > > current one?
> > 
> > Yes because we require to stop the vdsm in the current version of node to
> > proceed with the upgrade.
> > 
> > > Is there a workaround? if there is, what is it?
> > 
> > Probably requesting users to go to the unsupported shell via (F2 key) and
> > stopping manually vdsm before upgrading the iso via oVirt Web admin.
> > 
> > Ying, does it also happens during cdrom/usb upgrade? Maybe we could have it
> > as workaround?
> 
> Ying, answering myself after talking with Yaniv. Doing cdrom/usb upgrade
> will be bad for remote/servers, we can ignore this approach for now.

It should not be ignored: it is a workaround option. But without manual steps for remote hosts, it's a blocker.
Comment 33 Douglas Schilling Landgraf 2015-08-21 15:42:45 EDT
Hi,

Just to mention that I couldn't reproduce this report; my steps are below.

Phase 1:
----------
#1 Installed rhev-hypervisor6-6.7-20150813.0.iso
#2 Configured Hostname/Network via TUI (eth0)
#3 Configured Hosted Engine via TUI:
    - provided RHEL 6.7 as ISO for installation of 
      RHEVM-3.5 (3.5.4.2-1.3.el6ev)
    - nfs storage
    - Installed RHEVM 3.5 in the VM
    - Finished the configuration of Hosted Engine and everything is working.

#4 Installed the RPM rhev-hypervisor6-6.7-20150813.0.el6ev.noarch.rpm into RHEV-M

Phase 2:
------------
#5 After the first hosted engine is UP, installed a second machine with 
   rhev-hypervisor6-6.7-20150813.0
#6 Configured Hostname/Network so that all nodes and the engine can communicate
#7 In the Hosted Engine tab of the TUI, selected "Start Additional host setup" and, during the process, provided the same NFS storage to include this RHEV-H in the existing Hosted Engine instance.
 
After hosted-engine setup, everything should be working and the two hosts 
should be UP in RHEV-M.

Phase 3:
---------
#8 With the two hosts UP in RHEV-M, selected the one that was added last and put it into maintenance.
#9 Right-clicked the host, selected upgrade and the ISO to upgrade to
#10 The upgrade happened without any issue; the host rebooted and later became UP.
Comment 36 Martin Sivák 2015-08-24 11:18:14 EDT
Douglas: The failing test (#11) uses systemctl (systemd) and the test where you can't reproduce uses sysV (RHEL 6 based image, #33).

Could it be that systemd refused to stop vdsm? If it is so, then we need to know why. Hosted engine depends on VDSM, but systemd should be smart enough to kill it too.
Comment 37 Douglas Schilling Landgraf 2015-08-24 11:54:25 EDT
(In reply to Martin Sivák from comment #36)
> Douglas: The failing test (#11) uses systemctl (systemd) and the test where
> you can't reproduce uses sysV (RHEL 6 based image, #33).

Good catch Martin. I have tried both systems el6 and el7. The comment#11 was in the bogus machine QE provided.

> 
> Could it be that systemd refused to stop vdsm? If it is so, then we need to
> know why. Hosted engine depends on VDSM, but systemd should be smart enough
> to kill it too.

Agreed. This report is on my radar, I will give it another try today.

Thanks !
Comment 41 Yaniv Lavi (Dary) 2015-09-03 04:21:06 EDT
Any updates on this issue?
Comment 44 Doron Fediuck 2015-09-21 09:11:30 EDT
Will this bz be ready for 3.5.5?
Comment 45 Fabian Deutsch 2015-10-05 10:45:03 EDT
No, it is targeted for 3.5.6
Comment 46 Ryan Barry 2015-10-05 11:41:49 EDT
Martin/Martin -

I'm not sure who's responsible for some interactions in this component -- when a host is put into maintenance, does it also put that host into maintenance for hosted engine? 

It seems like this may not be happening, and it's something that we can do from ovirt-node-upgrade on the node side, but that it should already be done.

What's the expected flow?
Comment 47 Martin Sivák 2015-10-06 04:09:43 EDT
Hi Ryan, yes, putting a host into maintenance should also put hosted engine into maintenance (local mode). This bug might also be related to https://gerrit.ovirt.org/#/c/45842/ so it might be already fixed.

Do we have any recent test results with new enough hosted engine (3.6)?
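
As an illustration of the expected flow, entering hosted-engine local maintenance before an upgrade can be sketched with the `hosted-engine --set-maintenance --mode=...` command. This is a sketch under assumptions, not the merged fix: `maintenance_command` and `upgrade_in_local_maintenance` are hypothetical names, and `upgrade` stands for whatever callable performs the actual upgrade.

```python
import subprocess

VALID_MODES = ("global", "local", "none")


def maintenance_command(mode):
    """Build the hosted-engine CLI call that sets the given maintenance mode."""
    if mode not in VALID_MODES:
        raise ValueError("unknown maintenance mode: %r" % (mode,))
    return ["hosted-engine", "--set-maintenance", "--mode=%s" % mode]


def upgrade_in_local_maintenance(upgrade, run=subprocess.check_call):
    """Enter local maintenance first, then call the (hypothetical) upgrade step."""
    run(maintenance_command("local"))
    upgrade()
```

Leaving maintenance afterwards (`--mode=none`) is deliberately not included here, since an upgraded host reboots and when to reactivate it is an operator decision.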
Comment 48 Ryan Barry 2015-10-08 12:22:05 EDT
We do have 3.6 builds on 7.2 available. Do you know what the NVR of the package with the fix would be?

Ying: has this been tested on 3.6/7.2?
Comment 49 Ying Cui 2015-10-09 06:59:59 EDT
(In reply to Ryan Barry from comment #48)
> We do have 3.6 builds on 7.2 available. Do you know what the NVR of the
> package with the fix would be?
> 
> Ying: has this been tested on 3.6/7.2?

Since the first rhevh 7.2 for 3.6.0 build, we have been blocked by critical bugs for a long time.
A series of bugs: bug 1260470, bug 1267437, bug 1260548, bug 1260551, bug 1260559, bug 1270203 and bug 1267470 ...

We CAN NOT test this bug now due to these critical bugs on our node.
Comment 50 Fabian Deutsch 2015-10-14 10:30:59 EDT
Raising priority because it's blocking testing.

We are possibly also seeing this in 3.5.z in bug 1271707
Comment 52 Anatoly Litovsky 2015-10-15 08:19:35 EDT
*** Bug 1271707 has been marked as a duplicate of this bug. ***
Comment 54 Fabian Deutsch 2015-10-22 10:27:44 EDT
Yeela, what is the bug you are referencing in comment 29?
Comment 55 Yeela Kaplan 2015-10-22 10:34:53 EDT
I am referring to BZ#1189200 in comment 29, but
the traceback seen in vdsm log on vdsm restart is unrelated to this bug... 
It is just noise log that will be removed. not a real bug.
Comment 56 Fabian Deutsch 2015-10-22 14:24:14 EDT
So, to me the problem described in this bug (IIUIC) is not Node specific.

IIUIC the problem is that the node is not brought into the right maintenance mode on el7 hosts (comment 36), and using a manual workaround (comment 38) fixes the issue.

Simone, what is the expected flow on RHEL-H hosts? Is there any user intervention needed?
And with that in mind, what does this mean for Node? Can the whole process be automated?
Comment 57 Ying Cui 2015-10-23 03:41:05 EDT
(In reply to Fabian Deutsch from comment #56)
> IIUIC the problem is that the node is not brought into the right maintenance
> mode on el7 hosts (comment 36), and using a manual workaround (comment 38)
> fixes the issue.

Note that this issue also occurred on rhev-h 6.7.el6; see comment 14.
Comment 59 Sandro Bonazzola 2015-10-27 06:30:23 EDT
This is 3.5.z only.
Comment 60 Sandro Bonazzola 2015-10-29 08:04:52 EDT
Martin, can you fill in the doc text?
Comment 62 Artyom 2015-11-05 04:36:20 EST
To verify this bug I need to start from RHEV-H 3.5.6 and upgrade to a later version.
Comment 63 Gil Klein 2015-11-11 02:17:00 EST
(In reply to Artyom from comment #62)
> To verify this bug I need start from RHEV-H 3.5.6 and upgrade to greater
> version.
Fabian, Is there a way you can think of that might help us simulate this ?
Comment 64 Fabian Deutsch 2015-11-11 05:19:25 EST
The bug is in upgrading a host which is involved in HE.
I don't see how we can shortcut this.
Comment 65 Artyom 2015-11-12 06:49:03 EST
Upgrade succeeded via engine from:
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151028.0.el6ev)
ovirt-hosted-engine-ha-1.2.8-1.el6ev.noarch
to
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151029.0.el6ev)
ovirt-hosted-engine-ha-1.2.8-1.el6ev.noarch
Comment 67 errata-xmlrpc 2015-12-01 14:55:35 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2529.html
