Bug 624959
| Field | Value |
|---|---|
| Summary | xend should prevent restarting loops when guest crashes at boot time and dump-core is enabled |
| Product | Red Hat Enterprise Linux 5 |
| Reporter | Yufang Zhang <yuzhang> |
| Component | xen |
| Assignee | Miroslav Rezanina <mrezanin> |
| Status | CLOSED ERRATA |
| QA Contact | Virtualization Bugs <virt-bugs> |
| Severity | medium |
| Priority | low |
| Version | 5.6 |
| CC | gyang, leiwang, mrezanin, mshao, xen-maint |
| Target Milestone | rc |
| Hardware | All |
| OS | Linux |
| Fixed In Version | xen-3.0.3-120.el5 |
| Doc Type | Bug Fix |
| Last Closed | 2011-01-13 22:23:42 UTC |
| Bug Blocks | 514500 |
Description
Yufang Zhang
2010-08-18 09:00:26 UTC
Created attachment 439328
patch to prevent loops
I have written a patch to solve this problem. It records the time at which the guest crashes and uses that time to compute the elapsed time in the restart() function. I have tested the scenario with the patch: xend now destroys the guest when it crashes at boot time.
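A minimal Python sketch of the timing problem, assuming a simplified model of xend's restart guard. `MINIMUM_RESTART_TIME`'s value, `should_restart`, `simulate`, and all timings below are hypothetical stand-ins, not xend's actual code. The guard refuses a restart when too little time has passed since the previous one; if the elapsed time is measured when restart() runs (after the core dump has finished), a long dump keeps the guard from ever tripping, whereas measuring at the moment of the crash excludes the dump.

```python
# Hypothetical threshold in seconds; a stand-in for xend's MINIMUM_RESTART_TIME.
MINIMUM_RESTART_TIME = 20.0


def should_restart(previous_restart_time, reference_time):
    """Return True if the loop guard allows another restart.

    previous_restart_time: when the guest was last (re)started, or None.
    reference_time: the moment used to measure how long the guest ran.
    """
    if previous_restart_time is None:
        # Nothing recorded yet (first crash): there is nothing to compare against.
        return True
    return reference_time - previous_restart_time >= MINIMUM_RESTART_TIME


def simulate(use_crash_time, boot_to_crash=2.0, dump_duration=60.0, max_cycles=5):
    """Count restarts for a guest that crashes during boot (hypothetical model).

    use_crash_time=False measures elapsed time when restart() runs, i.e. after
    the core dump; True measures it at the moment of the crash.
    The result is capped at max_cycles, which stands for an endless loop.
    """
    now = 0.0
    previous_restart_time = None
    restarts = 0
    while restarts < max_cycles:
        crash_time = now + boot_to_crash            # guest crashes shortly after booting
        restart_time = crash_time + dump_duration   # restart() runs once the dump is done
        reference = crash_time if use_crash_time else restart_time
        if not should_restart(previous_restart_time, reference):
            break  # guard trips: xend would destroy the domain instead
        previous_restart_time = restart_time
        restarts += 1
        now = restart_time
    return restarts
```

With the defaults (2 s from boot to crash, a 60 s dump), `simulate(use_crash_time=False)` keeps restarting until the cap of 5, while `simulate(use_crash_time=True)` stops after the single restart allowed by the empty `previous_restart_time`. With `dump_duration=0.0` (dump-core disabled), both variants stop after one restart.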
Thanks for the patch, Yufang. Can you please test this scenario with xen-3.0.3-114.el5? Is there any relevant difference in behavior?

Created attachment 439572
xend.log
Hi Miroslav,
I have tested this scenario with xen-3.0.3-114.el5. I hit the same problem, but with a small difference in behaviour: xend restarts the guest after its first boot-and-crash because 'xend/previous_restart_time' is None. After the guest reboots and crashes again, the timeout value (the elapsed time since the previous restart) is also inflated by the core dump, so the guest drops into a loop. More detailed information is in the attached xend.log.
With xen-116 and kernel-xen-233, I can reproduce this bug using the steps from comment #1 with a RHEL5.5-64bit-pv-guest whose elapsed time is larger than MINIMUM_RESTART_TIME. With xen-119 and kernel-xen-233, however, the bug still occurs if I run "xm list" continually during the process. Details:

Version-Release number of selected component (if applicable):
xen-3.0.3-119.el5
kernel-xen-2.6.18-233.el5

How reproducible:
Always

Steps to Reproduce:
1. Edit the guest's grub configuration file so that it crashes at boot time.
2. Set 'enable-dump' to 'yes' in the xend configuration and 'on_crash' to 'restart' for the guest.
3. Start the guest.
4. At the same time as step 3, open another console and run "xm list" continually.

Actual results:
1. The guest drops into a restart-crash-dumpcore loop, and Domain0 fills up with core dump files.
2. Once "xm li" is stopped, the guest works well afterwards.

Created attachment 462276
xend.log of Xen-119
I see the problem in the log. I do not know why, but in your case refreshShutdown is called more than once, and each call overwrites the recorded crash time. This is probably because xm list blocks xend from handling crashDump immediately. I will rewrite the patch to handle this. Without xm list interfering, was this reproducible on 119?

(In reply to comment #10)
> Without xm list interfering, was this reproducible on 119?

Without xm list, I cannot reproduce it.

Fix built into xen-3.0.3-120.el5.

Version-Release number of selected component (if applicable):
xen-3.0.3-120.el5
kernel-xen-2.6.18-233.el5
host: RHEL5.5-x86_64
guest: RHEL5.5-x86_64-PV

Steps to Reproduce: same as comment #8.

Actual results: the guest works well, so I am changing the status to VERIFIED.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html
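The refreshShutdown problem described in comment #10 can be modeled with another small Python sketch. The report does not show the final patch in xen-3.0.3-120.el5, so this only illustrates one plausible approach: keep the first recorded crash time instead of overwriting it on every call. `CrashTimeRecorder`, `on_shutdown_noticed`, `consume`, `restart_allowed`, and the threshold value are all hypothetical.

```python
# Hypothetical threshold in seconds; a stand-in for xend's MINIMUM_RESTART_TIME.
MINIMUM_RESTART_TIME = 20.0


class CrashTimeRecorder:
    """Hypothetical per-domain store for the time of the most recent crash."""

    def __init__(self, overwrite):
        # overwrite=True reproduces the problem: every call replaces the value.
        self.overwrite = overwrite
        self.crash_time = None

    def on_shutdown_noticed(self, now):
        # Invoked each time a refreshShutdown-like check sees the crashed domain.
        if self.overwrite or self.crash_time is None:
            self.crash_time = now

    def consume(self):
        # restart() reads the stored value once and clears it for the next crash.
        value, self.crash_time = self.crash_time, None
        return value


def restart_allowed(recorder, previous_restart_time, notice_times):
    """Replay repeated shutdown notifications, then apply the loop guard."""
    for t in notice_times:
        recorder.on_shutdown_noticed(t)
    crash_time = recorder.consume()
    return crash_time - previous_restart_time >= MINIMUM_RESTART_TIME
```

Replaying notifications at 64, 90 and 120 s after a restart at 62 s, the overwriting recorder sees 58 s of elapsed time and allows another restart, while the record-once recorder sees 2 s and trips the guard.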