Bug 666158
Summary: | domain suspension followed by two conflicting events
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Dan Kenigsberg <danken>
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar>
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | high | Docs Contact: |
Priority: | high
Version: | 6.0 | CC: | abaron, berrange, dallan, dyuan, eblake, mjenner, nzhang, vbian, xen-maint, yoyzhang
Target Milestone: | rc
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | libvirt-0.8.7-3.el6 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2011-05-19 13:25:21 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Bug Depends On: | 624252, 672492, 672725, 678524
Bug Blocks: | 682015
Attachments: |
Description
Dan Kenigsberg
2010-12-29 08:51:09 UTC
Created attachment 471045 [details]
some context to the said event
The first SAVED event is emitted in qemudDomainSaveFlag(). I think that while qemudDomainSaveFlag() is running and holds the lock on the domain object, a monitor EOF event arrives, which then causes qemuHandleMonitorEOF() to run and emit a FAILED event. qemuHandleMonitorEOF() probably needs to do an if (virDomainIsActive()) check before emitting the event.

Dan, you don't sound entirely convinced of your analysis. How do you want to proceed?

(In reply to comment #3)
> Dan, you don't sound entirely convinced of your analysis. How do you want to
> proceed?
Just to clarify, that's Dan B. I'm asking.

This is simply a hypothesis based on looking at the code & DanK's logfile. It would of course need investigation & testing to see if it's correct.

Dan K, is it something that happens at a constant rate; i.e., if you suspend and resume a domain in a loop do you expect to see the incorrect event periodically? I'm trying to figure out how we can test Dan B's hypothesis.

(In reply to comment #6)
> Dan K, is it something that happens at a constant rate; i.e., if you suspend
> and resume a domain in a loop do you expect to see the incorrect event
> periodically? I'm trying to figure out how we can test Dan B's hypothesis.
Yes, at least in the vdsm environment, Igor experienced it quite often, running
http://git.engineering.redhat.com/?p=users/dkenigsb/vdsm.git;a=blob;f=vdsm/storage/ut/multiVmTests.py

I think the hypothesis in comment 2 is right. By putting sleep() inside qemudShutdownVMDaemon(), I was able to get two STOPPED events in a row for a single domain.

The patch that fixes this was sent upstream for review:
https://www.redhat.com/archives/libvir-list/2011-January/msg00818.html

The patch is now upstream and its backport has been sent for internal review.

Hi Jiri,
I failed to find a way to reproduce this bug with the old version of libvirt. Would you please give me some suggestions on how to reproduce it?
Thanks,
Vivian

Vivian, please see comment #7 for a reproducer script.
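The hypothesis discussed above (the save path emits its event while holding the domain lock, then the monitor EOF handler runs and emits a second, conflicting event) can be illustrated with a small model. This is only a sketch under stated assumptions: `Domain`, `save_domain`, `handle_monitor_eof` and `simulate` are hypothetical names, the event tuples are illustrative labels, and the code models the proposed "is the domain still active?" check rather than the actual upstream patch.

```python
import threading


class Domain:
    """Hypothetical stand-in for a domain object; not libvirt's real structure."""

    def __init__(self):
        self.lock = threading.Lock()
        self.active = True
        self.events = []


def save_domain(dom, lock_held, eof_seen):
    # Models the save path: it holds the domain lock for the whole operation.
    with dom.lock:
        lock_held.set()
        eof_seen.wait(timeout=5)  # the monitor EOF arrives while the lock is held
        dom.active = False
        dom.events.append(("STOPPED", "SAVED"))


def handle_monitor_eof(dom, lock_held, eof_seen, check_active):
    lock_held.wait(timeout=5)  # fire only once the save path owns the lock
    eof_seen.set()
    with dom.lock:  # blocks until the save path releases the lock
        if check_active and not dom.active:
            return  # domain already stopped: emit nothing
        dom.active = False
        dom.events.append(("STOPPED", "FAILED"))


def simulate(check_active):
    """Run both paths concurrently and return the events that were emitted."""
    dom = Domain()
    lock_held, eof_seen = threading.Event(), threading.Event()
    threads = [
        threading.Thread(target=save_domain, args=(dom, lock_held, eof_seen)),
        threading.Thread(
            target=handle_monitor_eof,
            args=(dom, lock_held, eof_seen, check_active),
        ),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dom.events
```

Calling `simulate(False)` yields two stop events for one domain, while `simulate(True)` yields only the one from the save path, which is the behaviour the check is meant to restore.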
You need to read the full transcript before asking for help reproducing a bug. Often the answer is there.

Hi Dan,
For the following reasons, would you please help verify this bug:
1. We don't have the RUTH environment in China.
2. I failed to reproduce this bug without the RUTH environment.
3. The reproducer script in comment #7 is from RUTH. I managed to make the script run without error output, but didn't get any output from the script either.
Thanks,
Vivian

(In reply to comment #13)
Please run the attached script. It registers to libvirt and prints received events.
python libvirtev.py | tee /tmp/log
In another console run save/restore in a tight loop. Verify that "save" is followed by only one Stop event.

Created attachment 475358 [details]
slightly modified libvirtev.py to print all events
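
Checking the captured log by eye is error-prone over hundreds of iterations, so the verification step above could be automated roughly as follows. This is a sketch, not part of the attached script: `find_repeated_stops` is a hypothetical helper, and it assumes the log lines have the form `myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4` (as in the output quoted elsewhere in this report) and that domain names contain no whitespace.

```python
import re

# Assumed line format: "... EVENT: Domain <name>(<id>) <event> <detail>"
EVENT_RE = re.compile(r"EVENT: Domain (\S+)\((-?\d+)\) (\w+) (\d+)")


def find_repeated_stops(lines):
    """Return (line_number, domain_name) for every Stopped event that
    directly follows another Stopped event for the same domain."""
    last_event = {}  # domain name -> most recent event type seen
    repeats = []
    for lineno, line in enumerate(lines, start=1):
        match = EVENT_RE.search(line)
        if match is None:
            continue  # skip lines that are not event reports
        name, _dom_id, event, _detail = match.groups()
        if event == "Stopped" and last_event.get(name) == "Stopped":
            repeats.append((lineno, name))
        last_event[name] = event
    return repeats
```

It could be applied to the captured log with `find_repeated_stops(open("/tmp/log"))`; an empty list means every stop was followed by a start before the next stop.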
Can't reproduce this bug with the suggestions in comment 14 and comment 15, but hit bug 672725 (the blocker) instead. Will retest this bug after the blocker bug gets fixed.

Also tried the script in comment 7; could not reproduce this bug in the RUTH environment with the old version. Is any special profile required for the reproducer machine? Dan, would you please give me more suggestions on this bug? Thanks!

Adding blocker bugs 624252 and 678524. If 678524 is not fixed, blocker bug 672725 cannot be fully cleared of error reports like:
error: Failed to restore domain from /opt/test.save
error: operation failed: failed to read qemu header
If 624252 is not fixed, this bug can't be verified on the vdsm RUTH system.

Cannot reproduce this bug with either the older libvirt-0.8.1-27.el6.x86_64.rpm or the new libvirt-0.8.7-12.el6.x86_64. In a plain libvirt environment, ran the reproducer from comment 14 and saved/restored guests 200 times. No duplicate save event was found.
# python libvirtev.py
Using uri:qemu:///system
.....
myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4
myDomainEventCallback2 EVENT: Domain new(44) Started 2
myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4
myDomainEventCallback2 EVENT: Domain new(45) Started 2
myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4
myDomainEventCallback2 EVENT: Domain new(46) Started 2
myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4
.....

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-0596.html