Bug 1225425
Summary: [HC] Hosted Engine VM accessed w/ libgfapi destroyed while adding the host: Failed to find the libvirt domain (code=9)

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [oVirt] vdsm | Reporter | Sandro Bonazzola <sbonazzo> |
| Component | General | Assignee | Nobody <nobody> |
| Status | CLOSED DEFERRED | QA Contact | SATHEESARAN <sasundar> |
| Severity | high | Docs Contact | |
| Priority | medium | | |
| Version | --- | CC | acanan, bugs, danken, gianluca.cecchi, giuseppe.ragusa, godas, knarra, lsurette, mgoldboi, michal.skrivanek, sabose, sasundar, srevivo, ykaul |
| Target Milestone | --- | | |
| Target Release | --- | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | | Doc Type | Bug Fix |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2019-03-18 07:03:52 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | Gluster | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | 1022961, 1322852 | | |
| Bug Blocks | | | |
| Attachments | | | |
Description

Sandro Bonazzola, 2015-05-27 10:46:33 UTC
Created attachment 1030473 [details]
sos full report
---

Michal Skrivanek (comment 3):

Thanks Sandro. I investigated the logs and identified that there was another process (not vdsm) which killed the VM while vdsm was being restarted during the upgrade; vdsm therefore lost track of that VM, and on startup its recovery flow failed. I believe this needs to be investigated from the integration point of view, as it may be a conflict between services during the HE deployment.

---

Sandro Bonazzola (comment 4):

(In reply to Michal Skrivanek from comment #3)
> thanks sandro, I investigated the logs and identified there was another
> process (not vdsm) which killed the VM during the time when vdsm was
> restarted during upgrade, therefore it lost track of that VM and on vdsm
> startup in recovery flow it failed

Thanks Michal. Dan, has something changed in the way vdsm handles child process termination? It looks like a vdsm restart now kills qemu, as if it were in the same cgroup and killed by something that may be systemd.

---

If you can provide a workaround (once you have one), that would be great.

---

Reproduced today. In /var/log/messages I see:

    Jun 8 13:12:21 minidell journal: vdsm vds WARNING Found 1 VMs from recovery files not reported by libvirt. This should not happen! Will try to recover them.

A few lines before that there is:

    Jun 8 13:12:18 minidell systemd: Starting Virtualization daemon...
    Jun 8 13:12:18 minidell systemd: Failed to reset devices.list on /machine.slice: Invalid argument
    Jun 8 13:12:18 minidell journal: libvirt version: 1.2.8, package: 16.el7_1.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-04-02-12:11:36, x86-024.build.eng.bos.redhat.com)
    Jun 8 13:12:18 minidell journal: Module /usr/lib64/libvirt/connection-driver/libvirt_driver_lxc.so not accessible

And also:

    Jun 8 13:12:08 minidell yum[23495]: Installed: 10:qemu-kvm-tools-ev-2.1.2-23.el7_1.3.1.x86_64
    Jun 8 13:12:07 minidell systemd: Time has been changed

The qemu log says:

    2015-06-08 11:12:20.134+0000: shutting down
    qemu: terminating on signal 15 from pid 24264

and the process involved is:

    root 24264 0.1 0.1 1123260 22124 ? Ssl 13:12 0:00 /usr/sbin/libvirtd --listen

I'm not sure why libvirt and /var/log/messages display logs in different timezones. So, moving back to virt. Not sure why libvirt sent the TERM signal.

Created attachment 1036316 [details]
sos report
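The timezone confusion above (the qemu/libvirt log is in UTC, while /var/log/messages uses the host's local time) can be reconciled by normalizing timestamps. A minimal Python sketch; the "Europe/Rome" zone is an assumption for illustration, chosen only because it matches the two-hour offset visible in the logs:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs system tzdata

# Assumed host timezone (matches the +2h offset between the two logs).
LOCAL_TZ = ZoneInfo("Europe/Rome")

# The "shutting down" entry from the qemu log: 2015-06-08 11:12:20 UTC.
qemu_utc = datetime(2015, 6, 8, 11, 12, 20, tzinfo=timezone.utc)

# Convert to host-local time so it lines up with /var/log/messages entries.
qemu_local = qemu_utc.astimezone(LOCAL_TZ)
print(qemu_local)  # 2015-06-08 13:12:20+02:00
```

With the offset normalized, the SIGTERM at 11:12:20 UTC and the 13:12:2x entries in /var/log/messages describe the same event window.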
---

Also note that vdsm was stopped at 13:12:02:

    VM Channels Listener::DEBUG::2015-06-08 13:12:02,452::vmchannels::187::vds::(run) VM channels listener thread has ended

and started again at 13:12:21:

    MainThread::INFO::2015-06-08 13:12:21,793::vdsm::153::vds::(run) (PID: 24464) I am the actual vdsm 4.17.0-914.gitdb9c4de.el7 minidell.home (3.10.0-229.el7.x86_64)

one second after libvirt killed qemu, in the middle of supervdsmd initialization.

---

Dan Kenigsberg:

(In reply to Sandro Bonazzola from comment #4)
> Dan is something changed in the way vdsm handle child process
> termination?

No. Vdsm never kills qemu processes directly. It issues a destroy() call to libvirt if it finds a vdsm-looking VM that is not expected (e.g. supposed to be Down). A big "loose qemu process with id: %s found, killing it." message should appear in vdsm.log in this case.

BTW, libvirtd.log uses UTC, and vdsm.log uses the system timezone. This can be confusing, I admit.

---

Michal Skrivanek (comment 10):

There is a bunch of stuff being restarted, and libvirt somehow decides to kill the VM. In principle I'm not that surprised... it's not a good idea to upgrade system packages (even though here it was only a harmless qemu-kvm-tools-ev) while KVM is in use. AFAICT there are no warnings in the libvirt log other than:

    Jun 8 13:12:18 minidell journal: libvirt version: 1.2.8, package: 16.el7_1.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-04-02-12:11:36, x86-024.build.eng.bos.redhat.com)
    Jun 8 13:12:18 minidell journal: Module /usr/lib64/libvirt/connection-driver/libvirt_driver_lxc.so not accessible
    Jun 8 13:12:19 minidell journal: failed to initialize gluster connection to server: 'minidell.home': Errore di input/output

I'm not sure if that gluster error is related or not; you may have better luck with libvirt debug logs.

Moving back to integration, it seems to be related to the actual upgrade/install procedure.

---

Sandro Bonazzola (comment 11):

(In reply to Michal Skrivanek from comment #10)
> it's not a good idea to upgrade
> system packages (even though here it was only a harmless qemu-kvm-tools-ev)
> while the KVM is in use.

May I ask where you saw a package upgrade while the VM was running?

> I'm not sure if that gluster error is related or not, you may have better
> luck with libvirt debug logs

Can you please specify how to enable libvirt debug logs?

> Moving back to integration, it seems to be related to the actual
> upgrade/install procedure.

Again, I don't see any upgrade happening.

---

Michal Skrivanek:

(In reply to Sandro Bonazzola from comment #11)
> May I ask where did you see a package upgrade while the VM was running?

Again, I'm only guessing based on what I see in the logs. I don't know what exactly it was doing and why:

    Jun 08 13:12:01 minidell.home systemd[1]: Stopping Virtual Desktop Server Manager...
    Jun 08 13:12:05 minidell.home systemd[1]: Stopped Virtual Desktop Server Manager.
    Jun 08 13:12:05 minidell.home systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.

^^ stopped vdsm

    Jun 08 13:12:08 minidell.home yum[23495]: Installed: 10:qemu-kvm-tools-ev-2.1.2-23.el7_1.3.1.x86_64
    Jun 08 13:12:07 minidell.home systemd[1]: Time has been changed
    Jun 08 13:12:08 minidell.home systemd[1]: Started Dynamic System Tuning Daemon.
    Jun 08 13:12:11 minidell.home systemd[1]: Reloading.

^^ yum install/update and a complete reload of config, e.g.:

    Jun 08 13:12:11 minidell.home systemd[1]: Stopping IPv4 firewall with iptables...
    Jun 08 13:12:11 minidell.home systemd[1]: Stopped GlusterFS, a clustered file-system server.
    Jun 08 13:12:11 minidell.home systemd[1]: Starting GlusterFS, a clustered file-system server...
    Jun 08 13:12:14 minidell.home systemd[1]: Stopped Virtualization daemon.
    Jun 08 13:12:14 minidell.home systemd[1]: Starting Virtualization daemon.

^^ libvirt is restarted too; there are a few errors on start, e.g.:

    Jun 08 13:12:19 minidell.home libvirtd[24264]: failed to initialize gluster connection to server: 'minidell.home': Errore di input/output

which I don't know whether it is relevant or not, but within a few seconds the HE VM is gracefully terminated by libvirt. Other services are in the process of restarting during that time (vdsm as well; however, I don't see any indication that the VM shutdown is related, as by the time of vdsm recovery the VM was already gone).

> Can you please specify how to enable libvirt debug logs?

See log_outputs in /etc/libvirt/libvirtd.conf.
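For reference, debug logging on a libvirt of this vintage can be enabled in /etc/libvirt/libvirtd.conf roughly as follows. This is a sketch; the exact filter categories vary by libvirt version, so check the comments shipped in libvirtd.conf itself:

```ini
# /etc/libvirt/libvirtd.conf -- enable verbose logging (restart libvirtd afterwards)
# Levels: 1=debug, 2=info, 3=warning, 4=error; filters override the global level per category
log_filters="1:qemu 1:libvirt 3:object 3:event"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```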
> > Moving back to integration, it seems to be related to the actual
> > upgrade/install procedure.
>
> Again, I don't see any upgrade happening.

The system activity as documented by systemd's journal indicates otherwise.

---

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

---

Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

---

oVirt 4.0 beta has been released, moving to RC milestone.

---

oVirt 4.0 beta has been released, moving to RC milestone.

---

This bug is dependent on 1022961. Moving this to 4.2 as libgfapi access is not available in 4.1.

---

Maybe this should be moved to target milestone ovirt-4.1.3 in light of the related recent change on BZ # 1022961.

---

Gianluca Cecchi (comment 22):

What about the considerations in comment #21? Can we move the target milestone to any 4.1.x? Are there any other limitations not to target 4.1?

Thanks,
Gianluca

---

(In reply to Gianluca Cecchi from comment #22)
> What about comment#21 considerations? Can we move target milestone to any
> 4.1.x?
> Are there any other limitations not to target 4.1?

We'll need to add more code (see https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:libgfapi) before we can see whether we can backport to 4.1.x.

---

This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

---

Hosted engine deployment with gfapi support is now a lower priority. Closing this now; please re-open with justification and we can revisit priority.