Bug 236595

Summary: Guest Reboot Fails, 30 Second Shutdown Timeout
Product: Red Hat Enterprise Linux 5
Reporter: Devan Goodwin <dgoodwin>
Component: xen
Assignee: Daniel Berrangé <berrange>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 5.0
CC: chuck.morrison, xen-maint
Hardware: i386
OS: Linux
Fixed In Version: RHEA-2007-0635
Doc Type: Bug Fix
Last Closed: 2007-11-07 17:10:18 UTC
Attachments:
- Don't destroy guests on reboot timeout (flags: none)

Description Devan Goodwin 2007-04-16 17:41:19 UTC
When a guest is rebooted on sufficiently slow hardware, it shuts down but does
not come back up.

Ticket filed with Xen: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=967

It was suggested that we create this ticket as a blocker for an RHN ticket, so
that we don't lose track of the issue.

Details from the Xen ticket:

I encountered a problem where attempting to reboot guests with xm or virsh
resulted in the guest being destroyed instead of coming back up. I found the
following xend.log entries:

[2007-04-16 11:07:15 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:940) XendDomainInfo.handleShutdownWatch
[2007-04-16 11:07:15 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:940) XendDomainInfo.handleShutdownWatch
[2007-04-16 11:07:45 xend.XendDomainInfo 2129] INFO (XendDomainInfo:930) Domain shutdown timeout expired: name=sanjose id=5
[2007-04-16 11:07:45 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:1463) XendDomainInfo.destroy: domid=5
[2007-04-16 11:07:45 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:1471) XendDomainInfo.destroyDomain(5)

The shutdown timeout expires exactly 30 seconds after the first call to
handleShutdownWatch, and watching the guest console, it appears the guest needs
just slightly more than 30 seconds to shut down on the hardware in question.

I suspect a hard-coded 30-second timeout, which is likely too short.
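The 30-second gap can be read straight off the xend.log entries quoted above. A minimal sketch, assuming the `[YYYY-MM-DD HH:MM:SS ...]` timestamp prefix shown in that excerpt; the helper names `timestamp` and `seconds_between` are hypothetical and not part of xend.

```python
from datetime import datetime

# Entries copied from the xend.log excerpt in this report (one entry per line).
LOG = [
    "[2007-04-16 11:07:15 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:940) XendDomainInfo.handleShutdownWatch",
    "[2007-04-16 11:07:15 xend.XendDomainInfo 2129] DEBUG (XendDomainInfo:940) XendDomainInfo.handleShutdownWatch",
    "[2007-04-16 11:07:45 xend.XendDomainInfo 2129] INFO (XendDomainInfo:930) Domain shutdown timeout expired: name=sanjose id=5",
]


def timestamp(line):
    """Parse the 'YYYY-MM-DD HH:MM:SS' text that follows the opening bracket."""
    return datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")


def seconds_between(log, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the first containing end_marker."""
    start = next(timestamp(line) for line in log if start_marker in line)
    end = next(timestamp(line) for line in log if end_marker in line)
    return (end - start).total_seconds()


gap = seconds_between(LOG, "handleShutdownWatch", "shutdown timeout expired")
print(gap)  # 30.0
```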


How reproducible:

Depends on hardware; the system in question was rlx-0-04.rhndev.redhat.com.

Comment 1 Daniel Berrangé 2007-04-17 13:02:14 UTC
I searched for the 'shutdown timeout expired' message and found it in

./python/xen/xend/XendDomainInfo.py

It checks whether the domain has been shutting down for longer than
SHUTDOWN_TIMEOUT seconds and, if so, destroys it.

                if self.shutdownStartTime:
                    timeout = (SHUTDOWN_TIMEOUT - time.time() +
                               self.shutdownStartTime)
                    if timeout < 0:
                        log.info(
                            "Domain shutdown timeout expired: name=%s id=%s",
                            self.info['name'], self.domid)
                        self.destroy()


SHUTDOWN_TIMEOUT is set to '30' at the top of the file. I reckon we need to bump
this up to 60 seconds at least.
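A standalone sketch of the same deadline arithmetic, with the guest state pulled out into plain arguments so it runs outside xend. The function name `shutdown_timed_out` and its parameters are hypothetical; only the `SHUTDOWN_TIMEOUT - time.time() + shutdownStartTime < 0` comparison comes from the quoted code. It shows why a guest needing slightly more than 30 seconds is destroyed at the current setting but would be left alone at 60.

```python
import time

SHUTDOWN_TIMEOUT = 30  # seconds; the value set at the top of XendDomainInfo.py


def shutdown_timed_out(shutdown_start_time, now=None, limit=SHUTDOWN_TIMEOUT):
    """Mirror the quoted check: remaining = limit - now + start; expired when negative."""
    if not shutdown_start_time:
        # No shutdown in progress, so nothing can have timed out.
        return False
    if now is None:
        now = time.time()
    remaining = limit - now + shutdown_start_time
    return remaining < 0


# A guest that needs about 31 s to shut down, as on the slow hardware in this report:
start = 1000.0
print(shutdown_timed_out(start, now=start + 31))            # True  -> xend would destroy it
print(shutdown_timed_out(start, now=start + 31, limit=60))  # False -> allowed to finish
```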


Comment 2 Clifford Perry 2007-04-18 12:39:32 UTC
Flagging the bug as proposed for RHEL 5.1. Seems like an easy modification.

Comment 3 RHEL Program Management 2007-04-18 12:45:13 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Daniel Berrangé 2007-07-19 18:52:38 UTC
*** Bug 248942 has been marked as a duplicate of this bug. ***

Comment 5 Daniel Berrangé 2007-07-19 19:37:42 UTC
Upstream Xen has removed the shutdown timer completely, allowing the admin to
deal with non-responsive guests as they see fit. They can run a 'destroy'
manually if desired, or take other action.

changeset:   15179:152dc0d812b2
user:        kfraser
date:        Wed May 30 10:06:23 2007 +0100
summary:     xend: Don't destroy domains on shutdown timeout.
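With the timer gone, any grace period becomes the administrator's choice. A minimal sketch of such a policy, assuming a caller-supplied `is_running()` check and an `on_timeout()` action (for example, one that runs `xm destroy <name>`); `wait_then_act` and both callbacks are hypothetical and are not part of xend or the upstream patch. The clock and sleep functions are injectable so the behavior can be simulated without a real guest.

```python
import time


def wait_then_act(is_running, on_timeout, grace_seconds, poll_seconds=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll is_running() until it returns False or grace_seconds elapse.

    Returns True if the guest stopped on its own; otherwise calls on_timeout()
    (hypothetically, something that runs `xm destroy <name>`) and returns False.
    """
    deadline = clock() + grace_seconds
    while clock() < deadline:
        if not is_running():
            return True
        sleep(poll_seconds)
    if not is_running():
        return True
    on_timeout()
    return False


class FakeClock:
    """Simulated time: sleep() advances the clock instead of blocking."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


# Simulated guest that takes 31 s to stop, checked with a 60 s grace period.
clock = FakeClock()
destroyed = []
stopped = wait_then_act(
    is_running=lambda: clock.now() < 31,
    on_timeout=lambda: destroyed.append("sanjose"),
    grace_seconds=60,
    clock=clock.now,
    sleep=clock.sleep,
)
print(stopped, destroyed)  # True []
```

Running the same simulation with `grace_seconds=30` reproduces the original failure: the deadline passes while the guest is still running, so `on_timeout()` fires.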


Comment 6 Daniel Berrangé 2007-07-19 19:41:14 UTC
Created attachment 159606
Don't destroy guests on reboot timeout

This patch is a copy of the upstream change, ported to the RHEL-5 tree.

Comment 8 Daniel Berrangé 2007-08-27 22:52:57 UTC
Patch applied in:

* Mon Aug 27 2007 Daniel P. Berrange <berrange> - 3.0.3-37.el5
- Don't destroy guest after shutdown timeout (rhbz #236595)


Comment 11 errata-xmlrpc 2007-11-07 17:10:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0635.html