Description of problem:
The engine starts an HA VM if the VM was powered off from the guest OS.

Version-Release number of selected component (if applicable):
rhevm-4.0.7.3-0.1.el7ev.noarch

Atomic guest agent image:
# atomic images list
  REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE TYPE
> vfeenstr/rhevm-guest-agent-docker rhevm-4.0-rhel-7-docker-candidate-20170224020507 88039b982959 2017-02-24 07:14 473.43 MB Docker

[root@test ~]# atomic images info 88039b982959
Image Name: vfeenstr/rhevm-guest-agent-docker:rhevm-4.0-rhel-7-docker-candidate-20170224020507
io.k8s.description: This is the RHEVM management agent running inside the guest. The agent interfaces with the RHEV manager, supplying heart-beat info as well as run-time data from within the guest itself. The agent also accepts control commands to be run executed within the OS (like: shutdown and restart).
STOP: docker kill --signal=TERM ${NAME}
Version: 1.0.12
INSTALL: docker run --rm --privileged --pid=host -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /usr/local/bin/ovirt-guest-agent-install.sh
vendor: Red Hat, Inc.
description: The Red Hat Enterprise Linux Base image is designed to be a fully supported foundation for your containerized applications. This base image provides your operations and application teams with the packages, language runtimes and tools necessary to run, maintain, and troubleshoot all of your applications. This image is maintained by Red Hat and updated regularly. It is designed and engineered to be the base layer for all of your containerized applications, middleware and utilites. When used as the source for all of your containers, only one copy will ever be downloaded and cached in your production environment. Use this image just like you would a regular Red Hat Enterprise Linux distribution. Tools like yum, gzip, and bash are provided by default. For further information on how this image was built look at the /root/anacanda-ks.cfg file.
authoritative-source-url: registry.access.redhat.com
io.k8s.display-name: RHEVM Guest Agent
version: 1.0.12
vcs-ref: 25865513b0890f8e962b87893acdf93f8079e3c0
com.redhat.component: rhevm-guest-agent-docker
distribution-scope: public
run: docker run --privileged --pid=host --net=host -v /:/host -e HOST=/host -v /proc:/hostproc -v /dev/virtio-ports/com.redhat.rhevm.vdsm:/dev/virtio-ports/com.redhat.rhevm.vdsm --env container=docker --restart=always -e IMAGE=IMAGE -e NAME=NAME IMAGE
Name: rhev4/rhevm-guest-agent
vcs-type: git
com.redhat.build-host: ip-10-29-120-149.ec2.internal
Release: 10
BZComponent: rhevm-guest-agent-docker
build-date: 2017-02-24T02:06:40.898691
UNINSTALL: docker run --rm --privileged --pid=host -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /usr/local/bin/ovirt-guest-agent-uninstall.sh
RUN: docker run --privileged --pid=host --net=host -v /:/host -e HOST=/host -v /proc:/hostproc -v /dev/virtio-ports/com.redhat.rhevm.vdsm:/dev/virtio-ports/com.redhat.rhevm.vdsm --env container=docker --restart=always -e IMAGE=IMAGE -e NAME=NAME IMAGE
name: rhev4/rhevm-guest-agent
license: ASL 2.0
summary: The RHEVM Guest Agent
architecture: x86_64
install: docker run --rm --privileged --pid=host -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /usr/local/bin/ovirt-guest-agent-install.sh
release: 10
io.openshift.tags: base rhel7
uninstall: docker run --rm --privileged --pid=host -v /:/host -e HOST=/host -e IMAGE=IMAGE -e NAME=NAME IMAGE /usr/local/bin/ovirt-guest-agent-uninstall.sh

How reproducible:
Always

Steps to Reproduce:
1. Create an Atomic HA VM.
2. Start the VM.
3. Load the relevant docker image:
   # docker load -i docker-image.tar.gz
4. Install the relevant image:
   # atomic install IMAGE_ID
5. Run the relevant image:
   # atomic run IMAGE_ID
6. Power off the VM from the guest OS:
   # poweroff

Actual results:
The engine restarts the VM.

Expected results:
The engine leaves the VM in the Down state.

Additional info:
See https://bugzilla.redhat.com/show_bug.cgi?id=1406033#c27
Please check whether your verification of bug 1341106 is satisfactory. If not, please retest.
alright, can you please attach vdsm.log?
Created attachment 1319070
vdsm and engine logs

You can start looking from:
2017-08-28 15:37:30,453+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-13) [] VM '06fefa09-a213-410c-ae12-149b0de90f42'(atomic-vm) moved from 'Up' --> 'Down'
2017-08-28 15:37:30,521+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-13) [] EVENT_ID: VM_DOWN_ERROR(119), VM atomic-vm is down with error. Exit message: VM has been terminated on the host.
Hi Artyom, we still need to see the guest-side logs from ovirt-guest-agent and/or the vdsm logs at debug level. The agent needs to shut down cleanly and send the session-shutdown message within the last 240 s before the actual qemu termination; otherwise the shutdown is not detected correctly on Atomic.
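To illustrate the timing rule described in this comment, here is a minimal Python sketch of how a monitor could decide whether a VM exit counts as a shutdown from within the guest. The function name, the use of epoch seconds, and the inclusive window boundary are assumptions made for illustration; only the 240 s window comes from the comment, and this is not the actual vdsm or engine code.

```python
# Hypothetical helper; only the 240 s value is taken from the comment above.
SESSION_SHUTDOWN_WINDOW_S = 240


def was_guest_initiated(last_session_shutdown, qemu_exit,
                        window=SESSION_SHUTDOWN_WINDOW_S):
    """Return True if a session-shutdown message was seen within
    `window` seconds before qemu terminated.

    Both timestamps are epoch seconds; `last_session_shutdown` is None
    when the agent never reported a session-shutdown.
    """
    if last_session_shutdown is None:
        return False
    delta = qemu_exit - last_session_shutdown
    # A message that arrives after the exit, or too long before it,
    # does not count (the inclusive boundary is an assumption).
    return 0 <= delta <= window
```

Under this sketch, a container that is killed before the agent reports session-shutdown leaves `last_session_shutdown` unset, so the exit would not be classified as guest-initiated.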
Created attachment 1359462
logs

Checked again on:
# atomic images info 867512d0966f
Image Name: 867512d0966f
architecture: x86_64
atomic.type: system
authoritative-source-url: registry.access.redhat.com
build-date: 2017-11-22T18:15:26.566179
com.redhat.build-host: rcm-img-docker02.build.eng.bos.redhat.com
com.redhat.component: ovirt-guest-agent-docker
description: The ovirt-guest-agent is providing information about the virtual machine and allows to restart / shutdown the machine via the RHV Portal. This image is intended to be used with virtual machines running RHEL 7 Atomic Host.
distribution-scope: public
io.k8s.description: The ovirt-guest-agent is providing information about the virtual machine and allows to restart / shutdown the machine via the RHV Portal. This image is intended to be used with virtual machines running RHEL 7 Atomic Host.
io.k8s.display-name: oVirt Guest Agent
io.openshift.tags: base rhel7
license: ASL 2.0
maintainer: Tomas Golembiovsky <tgolembi>
name: rhev4/ovirt-guest-agent
release: 40
summary: The oVirt Guest Agent
url: https://access.redhat.com/containers/#/registry.access.redhat.com/rhev4/ovirt-guest-agent/images/1.0.13-40
vcs-ref: 4cc91717604b2ed1e495c2001dcefe9a73309388
vcs-type: git
vendor: Red Hat, Inc.
version: 1.0.13

and vdsm-4.20.8-1.el7ev.x86_64

For some reason, ovirt-guest-agent does not write to its log (I believe this is a container issue), so I am only providing a snapshot from journalctl -u ovirt-guest-agent.service.

You can start looking from:
2017-11-27 15:24:58,154+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [] EVENT_ID: VM_DOWN_ERROR(119), VM atomic-vm is down with error. Exit message: VM has been terminated on the host.

I can also provide the environment; just ping me.
It seems we kill the container before the agent can send the session-shutdown message.
That would mean the Atomic OS is not set up correctly for shutdown and does not wait for ovirt-ga-docker to terminate cleanly. Is it possible to configure that somehow?
It's a bug in our container not in atomic. Easy to fix though.
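For illustration only, the behaviour the agent needs on shutdown can be sketched in Python as follows. The sketch assumes the container is stopped with SIGTERM (as the STOP label `docker kill --signal=TERM ${NAME}` in the image info suggests). The `send_message` callback, the `heartbeat` message name, and all function names are hypothetical; this is neither the real ovirt-guest-agent code nor the actual fix.

```python
import signal
import threading

# Set by the signal handler; the main loop polls it between messages.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request instead of letting the process die at once."""
    stop_requested.set()


def install_handlers():
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_agent(send_message, poll_interval=1.0):
    """Send periodic messages until a stop is requested, then report
    session-shutdown as the last message before returning."""
    while not stop_requested.wait(poll_interval):
        send_message("heartbeat")  # hypothetical periodic message
    # Sent before the process exits, so the host side can see it
    # ahead of the qemu termination.
    send_message("session-shutdown")
```

A real entry point would call `install_handlers()` once and then `run_agent(...)` with a function that writes to the virtio channel. The point of the sketch is only the ordering: the stop signal is caught, and session-shutdown goes out before the process ends.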
looks ok

# atomic containers list --no-trunc
CONTAINER ID IMAGE NAME COMMAND CREATED STATE BACKEND RUNTIME
ovirt-guest-agent-rhevm-4.1-rhel-7-docker-candidate-59820-20171218213722 brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhev4/ovirt-guest-agent:rhevm-4.1-rhel-7-docker-candidate-59820-20171218213722 ovirt-guest-agent-rhevm-4.1-rhel-7-docker-candidate-59820-20171218213722 /usr/bin/python /usr/share/ovirt-guest-agent/ovirt-guest-agent.py 2017-12-19 15:27 running ostree runc

no tuned inside container.
ok, ovirt-guest-agent-docker-1.0.14-3

2018-01-03 12:40:51,380+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler2) [55e0befd] VM '027ef96e-544c-4ced-a267-7ea89cc9464a'(jbelka-atomic-02) moved from 'PoweringUp' --> 'Up'
2018-01-03 12:40:51,407+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [55e0befd] EVENT_ID: USER_RUN_VM(32), Correlation ID: f57ac60c-bbb8-4f07-802d-39076a2f57a5, Job ID: 75ca4d30-4960-47c1-94cd-be69dcd1924b, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM jbelka-atomic-02 started on Host slot-7c
2018-01-03 12:42:54,926+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [50c96e8b] VM '027ef96e-544c-4ced-a267-7ea89cc9464a'(jbelka-atomic-02) moved from 'Up' --> 'Down'
2018-01-03 12:42:55,118+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [50c96e8b] EVENT_ID: VM_DOWN(61), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM jbelka-atomic-02 is down. Exit message: User shut down from within the guest

from brew task id 635712

# runc list
ID PID STATUS BUNDLE CREATED OWNER
ovirt-guest-agent-rhevm-4.1-rhel-7-docker-candidate-96798-20180102144148 831 running /var/lib/containers/atomic/ovirt-guest-agent-rhevm-4.1-rhel-7-docker-candidate-96798-20180102144148.0 2018-01-03T11:52:12.875767314Z root
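When checking engine.log for this bug, the useful distinction is between VM_DOWN_ERROR(119) with "VM has been terminated on the host" (the failing case) and VM_DOWN(61) with "User shut down from within the guest" (the fixed case). The following Python sketch pulls the event name, code, and exit message out of an AuditLogDirector line. The regular expressions are inferred only from the log lines quoted in this report, and the function name is hypothetical, so the sketch may not cover other engine log formats.

```python
import re

# Patterns inferred from the log lines quoted in this report (assumption).
EVENT_RE = re.compile(r"EVENT_ID: (?P<name>[A-Z_]+)\((?P<code>\d+)\)")
EXIT_RE = re.compile(r"Exit message: (?P<msg>.+)$")


def parse_down_event(line):
    """Return event name, code and exit message for VM_DOWN* events,
    or None for any other log line."""
    event = EVENT_RE.search(line)
    if event is None or not event.group("name").startswith("VM_DOWN"):
        return None
    exit_msg = EXIT_RE.search(line)
    return {
        "event": event.group("name"),
        "code": int(event.group("code")),
        "exit_message": exit_msg.group("msg").strip() if exit_msg else None,
    }
```

Applied to the two audit log lines quoted in this report, the first yields event `VM_DOWN_ERROR` with code 119, and the second yields `VM_DOWN` with code 61, which is the outcome the reporter expected.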
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0049