Bug 1427849 - [atomic] The engine starts HA VM if the VM powered off from the guest OS
Summary: [atomic] The engine starts HA VM if the VM powered off from the guest OS
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-guest-agent
Version: 4.1.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.1.8
Target Release: ---
Assignee: Tomáš Golembiovský
QA Contact: Jiri Belka
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2017-03-01 12:03 UTC by Artyom
Modified: 2019-05-16 12:54 UTC

Previously, during Atomic host shutdown, the container was killed before the Guest Agent had a chance to send the 'session-shutdown' message to the VDSM host. This is now fixed.
Clone Of:
Last Closed: 2018-01-05 16:12:52 UTC


Attachments
vdsm and engine logs (1.31 MB, application/zip)
2017-08-28 12:52 UTC, Artyom
logs (1.48 MB, application/zip)
2017-11-27 13:34 UTC, Artyom


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:0049 normal SHIPPED_LIVE Important: ovirt-guest-agent-docker security and bug fix update 2018-01-05 20:50:02 UTC

Description Artyom 2017-03-01 12:03:24 UTC
Description of problem:
The engine restarts an HA VM if the VM is powered off from the guest OS

Version-Release number of selected component (if applicable):
rhevm-4.0.7.3-0.1.el7ev.noarch

Atomic guest agent image:
# atomic images list
   REPOSITORY                          TAG                                                IMAGE ID       CREATED            VIRTUAL SIZE   TYPE      
>  vfeenstr/rhevm-guest-agent-docker   rhevm-4.0-rhel-7-docker-candidate-20170224020507   88039b982959   2017-02-24 07:14   473.43 MB      Docker    

[root@test ~]# atomic images info 88039b982959
Image Name: vfeenstr/rhevm-guest-agent-docker:rhevm-4.0-rhel-7-docker-candidate-20170224020507
io.k8s.description: This is the RHEVM management agent running inside the guest. The agent interfaces with the RHEV manager, supplying heart-beat info as well as run-time data from within the guest itself. The agent also accepts control commands to be run executed within the OS (like: shutdown and restart).
STOP: docker kill --signal=TERM ${NAME}
Version: 1.0.12
INSTALL: docker run --rm --privileged --pid=host -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /usr/local/bin/ovirt-guest-agent-install.sh
vendor: Red Hat, Inc.
description: The Red Hat Enterprise Linux Base image is designed to be a fully supported foundation for your containerized applications.  This base image provides your operations and application teams with the packages, language runtimes and tools necessary to run, maintain, and troubleshoot all of your applications. This image is maintained by Red Hat and updated regularly. It is designed and engineered to be the base layer for all of your containerized applications, middleware and utilites. When used as the source for all of your containers, only one copy will ever be downloaded and cached in your production environment. Use this image just like you would a regular Red Hat Enterprise Linux distribution. Tools like yum, gzip, and bash are provided by default. For further information on how this image was built look at the /root/anacanda-ks.cfg file.
authoritative-source-url: registry.access.redhat.com
io.k8s.display-name: RHEVM Guest Agent
version: 1.0.12
vcs-ref: 25865513b0890f8e962b87893acdf93f8079e3c0
com.redhat.component: rhevm-guest-agent-docker
distribution-scope: public
run: docker run --privileged --pid=host --net=host -v /:/host -e HOST=/host -v /proc:/hostproc -v /dev/virtio-ports/com.redhat.rhevm.vdsm:/dev/virtio-ports/com.redhat.rhevm.vdsm --env container=docker --restart=always -e IMAGE=IMAGE -e NAME=NAME IMAGE
Name: rhev4/rhevm-guest-agent
vcs-type: git
com.redhat.build-host: ip-10-29-120-149.ec2.internal
Release: 10
BZComponent: rhevm-guest-agent-docker
build-date: 2017-02-24T02:06:40.898691
UNINSTALL: docker run --rm --privileged --pid=host -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /usr/local/bin/ovirt-guest-agent-uninstall.sh
RUN: docker run --privileged --pid=host --net=host -v /:/host -e HOST=/host -v /proc:/hostproc -v /dev/virtio-ports/com.redhat.rhevm.vdsm:/dev/virtio-ports/com.redhat.rhevm.vdsm --env container=docker --restart=always -e IMAGE=IMAGE -e NAME=NAME IMAGE
name: rhev4/rhevm-guest-agent
license: ASL 2.0
summary: The RHEVM Guest Agent
architecture: x86_64
install: docker run --rm --privileged --pid=host -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /usr/local/bin/ovirt-guest-agent-install.sh
release: 10
io.openshift.tags: base rhel7
uninstall: docker run --rm --privileged --pid=host -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /usr/local/bin/ovirt-guest-agent-uninstall.sh


How reproducible:
Always

Steps to Reproduce:
1. Create atomic HA VM
2. Start the VM
3. Load the relevant docker image: # docker load -i docker-image.tar.gz
4. Install the relevant image: # atomic install IMAGE_ID
5. Run the relevant image: # atomic run IMAGE_ID
6. Power off the VM from the guest OS: # poweroff

Actual results:
The engine restarts the VM

Expected results:
The engine leaves the VM in the Down state

Additional info:
Check the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1406033#c27

Comment 1 Michal Skrivanek 2017-08-22 08:09:20 UTC
Please check whether your verification of bug 1341106 is satisfactory. If not, please retest.

Comment 3 Michal Skrivanek 2017-08-28 12:49:13 UTC
Alright, can you please attach vdsm.log?

Comment 4 Artyom 2017-08-28 12:52 UTC
Created attachment 1319070
vdsm and engine logs

You can start looking from:

2017-08-28 15:37:30,453+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-13) [] VM '06fefa09-a213-410c-ae12-149b0de90f42'(atomic-vm) moved from 'Up' --> 'Down'
2017-08-28 15:37:30,521+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-13) [] EVENT_ID: VM_DOWN_ERROR(119), VM atomic-vm is down with error. Exit message: VM has been terminated on the host.

Comment 5 Michal Skrivanek 2017-11-24 11:11:12 UTC
Hi Artyom, we still need to see the guest-side logs from ovirt-guest-agent and/or the vdsm logs at debug level.
The agent needs to shut down cleanly and send the session-shutdown message within the last 240 s before the actual qemu termination so that the guest-initiated shutdown is detected correctly on Atomic.
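
The timing rule described above can be illustrated with a small Python sketch. This is not VDSM's actual code: only the 240-second figure comes from this comment, while the function name, its parameters, and the decision to treat a missing message as "not guest-initiated" are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# The 240 s window is quoted from this comment; everything else is illustrative.
SESSION_SHUTDOWN_WINDOW = timedelta(seconds=240)


def is_guest_initiated_shutdown(last_session_shutdown, qemu_terminated_at):
    """Return True if a session-shutdown message was seen within the
    window preceding qemu termination (hypothetical helper)."""
    if last_session_shutdown is None:
        # No message was ever received from the guest agent.
        return False
    elapsed = qemu_terminated_at - last_session_shutdown
    # The message must precede termination and fall inside the window.
    return timedelta(0) <= elapsed <= SESSION_SHUTDOWN_WINDOW
```

Under this sketch, a container that is killed before the agent sends its message leaves `last_session_shutdown` unset, so the power-off would not be classified as guest-initiated.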

Comment 6 Artyom 2017-11-27 13:34 UTC
Created attachment 1359462
logs

Checked again on:
# atomic images info 867512d0966f
Image Name: 867512d0966f
architecture: x86_64
atomic.type: system
authoritative-source-url: registry.access.redhat.com
build-date: 2017-11-22T18:15:26.566179
com.redhat.build-host: rcm-img-docker02.build.eng.bos.redhat.com
com.redhat.component: ovirt-guest-agent-docker
description: The ovirt-guest-agent is providing information about the virtual machine and allows to restart / shutdown the machine via the RHV Portal. This image is intended to be used with virtual machines running RHEL 7 Atomic Host.
distribution-scope: public
io.k8s.description: The ovirt-guest-agent is providing information about the virtual machine and allows to restart / shutdown the machine via the RHV Portal. This image is intended to be used with virtual machines running RHEL 7 Atomic Host.
io.k8s.display-name: oVirt Guest Agent
io.openshift.tags: base rhel7
license: ASL 2.0
maintainer: Tomas Golembiovsky <tgolembi@redhat.com>
name: rhev4/ovirt-guest-agent
release: 40
summary: The oVirt Guest Agent
url: https://access.redhat.com/containers/#/registry.access.redhat.com/rhev4/ovirt-guest-agent/images/1.0.13-40
vcs-ref: 4cc91717604b2ed1e495c2001dcefe9a73309388
vcs-type: git
vendor: Red Hat, Inc.
version: 1.0.13

and vdsm-4.20.8-1.el7ev.x86_64

For some reason, ovirt-guest-agent does not write to its log file (I believe this is a container issue), so I am providing a snapshot from journalctl -u ovirt-guest-agent.service instead.

You can start looking from:
2017-11-27 15:24:58,154+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [] EVENT_ID: VM_DOWN_ERROR(119), VM atomic-vm is down with error. Exit message: VM has been terminated on the host.

I can also provide the environment, so please just ping me.

Comment 8 Tomáš Golembiovský 2017-11-28 18:48:08 UTC
It seems we kill the container before the agent can send the session-shutdown message.

Comment 9 Michal Skrivanek 2017-11-28 19:20:06 UTC
That would mean the Atomic OS is not set up correctly for shutdown and does not wait for ovirt-ga-docker to terminate cleanly. Is it possible to configure that somehow?

Comment 10 Tomáš Golembiovský 2017-11-29 12:41:33 UTC
It's a bug in our container, not in Atomic. Easy to fix though.
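
The actual fix is not shown in this report. As a rough illustration of the behaviour the agent needs (notify the host when asked to terminate, before the process goes away), here is a minimal Python sketch. The class name, the `send` callable, and the use of a SIGTERM handler are all assumptions; the only detail taken from this report is the 'session-shutdown' message name.

```python
import signal


class ShutdownNotifier:
    """Hypothetical helper: emit 'session-shutdown' once on SIGTERM."""

    def __init__(self, send):
        # `send` is an assumed callable that writes one message to the
        # host channel; the real agent's transport is not shown here.
        self._send = send
        self.stopping = False

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self.handle_term)

    def handle_term(self, signum, frame):
        # Guard against a repeated signal sending the message twice.
        if not self.stopping:
            self.stopping = True
            self._send("session-shutdown")
```

A handler like this only helps if the container runtime delivers the termination signal to the agent process and allows it time to react before killing the container, which is the part that comments 8 and 10 identify as broken.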

Comment 13 Jiri Belka 2017-12-19 15:44:04 UTC
looks ok

# atomic containers list --no-trunc
   CONTAINER ID                                                             IMAGE                                                                                                                               NAME                                                                     COMMAND                                                           CREATED          STATE      BACKEND    RUNTIME   
   ovirt-guest-agent-rhevm-4.1-rhel-7-docker-candidate-59820-20171218213722 brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhev4/ovirt-guest-agent:rhevm-4.1-rhel-7-docker-candidate-59820-20171218213722 ovirt-guest-agent-rhevm-4.1-rhel-7-docker-candidate-59820-20171218213722 /usr/bin/python /usr/share/ovirt-guest-agent/ovirt-guest-agent.py 2017-12-19 15:27 running    ostree     runc 

no tuned inside container.

Comment 16 Jiri Belka 2018-01-03 11:54:48 UTC
ok, ovirt-guest-agent-docker-1.0.14-3

2018-01-03 12:40:51,380+01 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler2) [55e0befd] VM '027ef96e-544c-4ced-a267-7ea89cc9464a'(jbelka-atomic-02) moved from 'PoweringUp' --> 'Up'
2018-01-03 12:40:51,407+01 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [55e0befd] EVENT_ID: USER_RUN_VM(32), Correlation ID: f57ac60c-bbb8-4f07-802d-39076a2f57a5, Job ID: 75ca4d30-4960-47c1-94cd-be69dcd1924b, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM jbelka-atomic-02 started on Host slot-7c
2018-01-03 12:42:54,926+01 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [50c96e8b] VM '027ef96e-544c-4ced-a267-7ea89cc9464a'(jbelka-atomic-02) moved from 'Up' --> 'Down'
2018-01-03 12:42:55,118+01 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [50c96e8b] EVENT_ID: VM_DOWN(61), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM jbelka-atomic-02 is down. Exit message: User shut down from within the guest

from brew task id 635712

# runc list
ID                                                                         PID         STATUS      BUNDLE                                                                                                  CREATED                          OWNER
ovirt-guest-agent-rhevm-4.1-rhel-7-docker-candidate-96798-20180102144148   831         running     /var/lib/containers/atomic/ovirt-guest-agent-rhevm-4.1-rhel-7-docker-candidate-96798-20180102144148.0   2018-01-03T11:52:12.875767314Z   root
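
To tell the failing runs (comments 4 and 6) from the fixed run (comment 16) when scanning engine.log, the AuditLogDirector lines can be reduced to event name, event code, and exit message. The Python sketch below is only an illustration: the regular expressions are inferred from the log lines quoted in this report, not from any documented engine log format, and the function name is hypothetical.

```python
import re

# Patterns inferred from the lines quoted in this bug only.
EVENT_RE = re.compile(r"EVENT_ID: (?P<name>\w+)\((?P<code>\d+)\)")
EXIT_RE = re.compile(r"Exit message: (?P<msg>.+?)\s*$")


def parse_audit_line(line):
    """Return event name, code, and exit message from one log line,
    or None if the line carries no EVENT_ID (hypothetical helper)."""
    event = EVENT_RE.search(line)
    if event is None:
        return None
    exit_msg = EXIT_RE.search(line)
    return {
        "event": event.group("name"),
        "code": int(event.group("code")),
        "exit_message": exit_msg.group("msg") if exit_msg else None,
    }
```

Applied to the quoted lines, the broken runs report VM_DOWN_ERROR(119) with "VM has been terminated on the host.", while the verified run reports VM_DOWN(61) with "User shut down from within the guest".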

Comment 19 errata-xmlrpc 2018-01-05 16:12:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0049

Comment 20 Franta Kust 2019-05-16 12:54:46 UTC
BZ<2>Jira re-sync

