Description of problem:

In an environment with large compute nodes (3 TB of RAM) and high instance add/delete churn, the current wait timeouts for instance destroy may not be sufficient. The destroy timeouts should be user-configurable to support such environments. This environment sees the following instance destroy failures daily:

2018-09-25 14:50:25.251 250438 WARNING nova.virt.libvirt.driver [req-9c05c3c1-ab08-4a14-b036-ad10b987b8e5 ad64ce5e9890b9596163edd10c8a4da2bca62c4f84f720b59cf30d20903c60ab b297db4812004ed9938eae3c776467ad - - -] [instance: ee0cd616-c3da-41fa-9af4-4c3173f85764] Error from libvirt during destroy. Code=38 Error=Failed to terminate process 203650 with SIGKILL: Device or resource busy; attempt 3 of 3

The instances eventually terminate, and allowing a longer timeout would prevent this error.

References:
https://bugzilla.redhat.com/show_bug.cgi?id=1205647
https://github.com/libvirt/libvirt/blob/9a4e4b942df0474503e7524ea427351a46c0eabe/src/util/virprocess.c#L349
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L843

Version-Release number of selected component (if applicable):
OSP 10
openstack-nova-compute-14.1.0-22.el7ost.noarch
libvirt-daemon-3.9.0-14.el7_5.6.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.3.x86_64

How reproducible:
Daily in this specific environment

Additional info:
I'll provide additional environment details and logs.
Matt's random thoughts:

* Do we have any way to ask libvirt if the shutdown is still in progress and expected to complete eventually?
* Are there circumstances in which a shutdown will never complete, but also not fail? I.e. can we just remove the timeout and just handle failure?
(In reply to Matthew Booth from comment #2)
> Matt's random thoughts:
>
> * do we have any way to ask libvirt if the shutdown is still in progress and
> expected to complete eventually?

In theory there is a "Shutting down" state, but I don't think we use that in the QEMU driver in libvirt. It just remains "running" until it goes to "shutoff".

A few weeks ago, though, we did majorly increase the time we wait for shutdown to complete in libvirt. Originally we sent SIGTERM, waited 10 seconds, then sent SIGKILL and waited another 5 seconds. With the new code we wait 30 seconds for SIGKILL to work instead of 5. We also add an even longer wait if there are PCI devices assigned, as some of those slow things down a lot.

commit 9a4e4b942df0474503e7524ea427351a46c0eabe
Author: Christian Ehrhardt <christian.ehrhardt>
Date:   Mon Aug 6 12:10:38 2018 +0200

    process: wait longer 5->30s on hard shutdown

    In cases where virProcessKillPainfully already reailizes that
    SIGTERM wasn't enough we are partially on a bad path already.
    Maybe the system is overloaded or having serious trouble to free and
    reap resources in time.

    In those case give the SIGKILL that was sent after 10 seconds some more
    time to take effect if force was set (only then we are falling back to
    SIGKILL anyway).

    Signed-off-by: Christian Ehrhardt <christian.ehrhardt>
    Reviewed-by: Daniel P. Berrangé <berrange>

commit be2ca0444728edd12a000653d3693d68a5c9102f
Author: Christian Ehrhardt <christian.ehrhardt>
Date:   Thu Aug 2 09:05:18 2018 +0200

    process: wait longer on kill per assigned Hostdev

    It was found that in cases with host devices virProcessKillPainfully
    might be able to send signal zero to the target PID for quite a while
    with the process already being gone from /proc/<PID>.

    That is due to cleanup and reset of devices which might include a
    secondary bus reset that on top of the actions taken has a 1s delay
    to let the bus settle. Due to that guests with plenty of Host devices
    could easily exceed the default timeouts.

    To solve that, this adds an extra delay of 2s per hostdev that is
    associated to a VM.

    Reviewed-by: Daniel P. Berrangé <berrange>
    Signed-off-by: Christian Ehrhardt <christian.ehrhardt>

> * are there circumstances in which a shutdown will never complete, but also
> not fail?
> * i.e. Can we just remove the timeout and just handle failure?

I'm not sure what you mean by "not fail"? If the process does not die after we send it SIGKILL, then we'll return an error from virDomainDestroy after the timeout (5 secs, now 30 secs). If the system was merely busy, the QEMU might still die after that. If the QEMU was stuck in kernel space, e.g. due to a dead storage path, it might be stuck forever (until host reboot). We can't easily distinguish which of these two scenarios applies, but you can't reuse resources for another VM until the original QEMU has gone completely.
Can't say I'm a huge fan of this kind of tuning knob, but based on comment 4 I can't think of a better solution. I'll bring it up in the team meeting.
Hello. Could we please have an update on this one? BR, Alex.
(In reply to Daniel Berrange from comment #4)
[...]
> A few weeks ago, though, we did majorly increase the time we wait for
> shutdown to complete in libvirt. Originally we sent SIGTERM, waited 10
> seconds, then sent SIGKILL and waited another 5 seconds.
>
> With the new code we wait 30 seconds for SIGKILL to work instead of 5. We
> also add an even longer wait if there are PCI devices assigned, as some of
> those slow things down a lot.
>
> commit 9a4e4b942df0474503e7524ea427351a46c0eabe
>     process: wait longer 5->30s on hard shutdown
[...]
> commit be2ca0444728edd12a000653d3693d68a5c9102f
>     process: wait longer on kill per assigned Hostdev
[...]

I wonder if it's reasonable to request a backport of the above two commits (available from libvirt v4.10.0 onwards) to RHEL 7.6. FWIW, the `diffstat` looks small, and doesn't look very risky to my eyes.

The customer is using RHEL 7.5. And _assuming_ the new timeouts are sufficient, maybe they are willing to update to RHEL 7.6.
(In reply to Kashyap Chamarthy from comment #10)
> I wonder if it's reasonable to request a backport of the above two
> commits (available from libvirt v4.10.0 onwards) to RHEL 7.6.

I mixed up the libvirt version from which the said two libvirt patches are available: "v4.10.0" --> "v4.7.0"

[...]

Based on discussion with DanPB and Matt on IRC, we are leaning towards the following solution: increase the counter for calling the destroy() API (when EBUSY hits) in Nova from 3 to 6, so that it matches what libvirt upstream does. We don't want to add yet another config attribute; we already have far too many.
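The proposed retry bump could look roughly like the sketch below. This is not Nova's actual code: `destroy_with_retries` and `destroy_fn` are hypothetical names, and a plain OSError/EBUSY stands in for the libvirtError inspection the real driver does.

```python
import errno
import time

MAX_DESTROY_ATTEMPTS = 6  # bumped from 3 to match libvirt's longer waits


def destroy_with_retries(destroy_fn, max_attempts=MAX_DESTROY_ATTEMPTS,
                         sleep=time.sleep):
    """Call destroy_fn(), retrying on EBUSY-style failures.

    Returns the attempt number that succeeded; re-raises the error once
    the attempts are exhausted or on any non-EBUSY failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            destroy_fn()
            return attempt
        except OSError as err:
            if err.errno != errno.EBUSY or attempt == max_attempts:
                raise
            # Guest teardown can be slow on huge (3 TB) hosts; back off
            # briefly before retrying the destroy.
            sleep(1)
```

The point of the change is that no new config option is needed: the extra attempts simply give libvirt's own (now longer) SIGKILL budget time to take effect.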
The upstream patch has merged: https://opendev.org/openstack/nova/commit/10d50ca4e2 — "libvirt: Rework 'EBUSY' (SIGKILL) error handling code path" (https://review.opendev.org/#/c/639091/)
*** Bug 1489980 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4299