Bug 666158 - domain suspension followed by two conflicting events
Summary: domain suspension followed by two conflicting events
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 624252 672492 672725 678524
Blocks: 682015
 
Reported: 2010-12-29 08:51 UTC by Dan Kenigsberg
Modified: 2011-07-25 15:13 UTC (History)
10 users

Fixed In Version: libvirt-0.8.7-3.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 13:25:21 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
some context to the said event (4.47 KB, application/gzipped-tar)
2010-12-29 08:53 UTC, Dan Kenigsberg
no flags Details
slightly modified libvirtev.py to print all events (16.88 KB, text/plain)
2011-01-26 09:58 UTC, Dan Kenigsberg
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0596 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2011-05-18 17:56:36 UTC

Description Dan Kenigsberg 2010-12-29 08:51:09 UTC
Description of problem:
Once in a while, suspending a domain ends with 
VIR_DOMAIN_EVENT_STOPPED + VIR_DOMAIN_EVENT_STOPPED_SAVED
followed by
VIR_DOMAIN_EVENT_STOPPED + VIR_DOMAIN_EVENT_STOPPED_FAILED

Version-Release number of selected component (if applicable):


How reproducible:
seldom

Additional info:
10:42:49.999: debug : remoteRelayDomainEventLifecycle:118 : Relaying domain lifecycle event 5 4
10:42:49.999: debug : virDomainFree:2218 : domain=0x7f3448048850
10:42:49.999: debug : remoteRelayDomainEventLifecycle:118 : Relaying domain lifecycle event 5 5
10:42:49.999: debug : virDomainFree:2218 : domain=0x7f3448048850

Comment 1 Dan Kenigsberg 2010-12-29 08:53:56 UTC
Created attachment 471045 [details]
some context to the said event

Comment 2 Daniel Berrangé 2011-01-05 10:59:00 UTC
The first SAVED event is emitted in qemudDomainSaveFlag().  I think that while qemudDomainSaveFlag() is running and holds the lock on the domain object, a monitor EOF event arrives, which then causes qemuHandleMonitorEOF() to run, emitting a FAILED event. qemuHandleMonitorEOF() probably needs to do a virDomainIsActive() check before emitting the event.

Comment 3 Dave Allan 2011-01-05 21:29:35 UTC
Dan, you don't sound entirely convinced of your analysis.  How do you want to proceed?

Comment 4 Dave Allan 2011-01-05 21:30:27 UTC
(In reply to comment #3)
> Dan, you don't sound entirely convinced of your analysis.  How do you want to
> proceed?

Just to clarify, that's Dan B. I'm asking.

Comment 5 Daniel Berrangé 2011-01-06 10:48:29 UTC
This is simply a hypothesis based on looking at the code & DanK's logfile. It would of course need investigation & testing to see if it's correct.

Comment 6 Dave Allan 2011-01-06 19:39:49 UTC
Dan K, is it something that happens at a constant rate; i.e., if you suspend and resume a domain in a loop do you expect to see the incorrect event periodically?  I'm trying to figure out how we can test Dan B's hypothesis.

Comment 7 Dan Kenigsberg 2011-01-06 21:10:43 UTC
(In reply to comment #6)
> Dan K, is it something that happens at a constant rate; i.e., if you suspend
> and resume a domain in a loop do you expect to see the incorrect event
> periodically?  I'm trying to figure out how we can test Dan B's hypothesis.

Yes, at least in vdsm environment, Igor experienced it quite often, running http://git.engineering.redhat.com/?p=users/dkenigsb/vdsm.git;a=blob;f=vdsm/storage/ut/multiVmTests.py

Comment 8 Jiri Denemark 2011-01-19 19:31:20 UTC
I think the hypothesis in comment 2 is right. By putting a sleep() inside qemudShutdownVMDaemon(), I was able to get two STOPPED events in a row for a single domain. The patch that fixes this was sent upstream for review: https://www.redhat.com/archives/libvir-list/2011-January/msg00818.html
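The race described in comment 2, and the effect of the fix, can be illustrated with a toy model (hypothetical names, not the actual libvirt C code): save_domain() plays the role of qemudDomainSaveFlag() emitting STOPPED/SAVED, and handle_monitor_eof() plays qemuHandleMonitorEOF(), which without an is-active check emits a second, conflicting STOPPED/FAILED event after the save has already completed:

```python
import threading

class Domain:
    """Toy stand-in for libvirt's locked domain object (illustrative only)."""
    def __init__(self):
        self.lock = threading.Lock()
        self.active = True
        self.events = []

def save_domain(dom):
    # Analogue of qemudDomainSaveFlag(): takes the domain lock, shuts the
    # domain down, and emits STOPPED/SAVED.
    with dom.lock:
        dom.active = False
        dom.events.append(("STOPPED", "SAVED"))

def handle_monitor_eof(dom, check_active):
    # Analogue of qemuHandleMonitorEOF(): runs when the QEMU monitor closes,
    # which also happens as a side effect of a clean save. With the fix, the
    # handler first checks whether the domain is still active.
    with dom.lock:
        if check_active and not dom.active:
            return  # already stopped cleanly; suppress the bogus event
        dom.events.append(("STOPPED", "FAILED"))

# Unpatched behaviour: the queued EOF fires after the save completes and a
# second, conflicting event is emitted.
buggy = Domain()
save_domain(buggy)
handle_monitor_eof(buggy, check_active=False)
print(buggy.events)  # [('STOPPED', 'SAVED'), ('STOPPED', 'FAILED')]

# Patched behaviour: the is-active check suppresses the duplicate.
fixed = Domain()
save_domain(fixed)
handle_monitor_eof(fixed, check_active=True)
print(fixed.events)  # [('STOPPED', 'SAVED')]
```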

Comment 9 Jiri Denemark 2011-01-19 19:57:32 UTC
The patch is now upstream and its backport has been sent for internal review.

Comment 11 Vivian Bian 2011-01-21 12:25:40 UTC
Hi Jiri,
I failed to find a way to reproduce this bug with the old version of libvirt. Could you please give me some suggestions on how to reproduce it?

Thanks
Vivian

Comment 12 Dave Allan 2011-01-21 18:22:25 UTC
Vivian, please see comment #7 for a reproducer script.  You need to read the full transcript before asking for help reproducing a bug.  Often the answer is there.

Comment 13 Vivian Bian 2011-01-25 08:50:17 UTC
Hi Dan,
For the following reasons, would you please help verify this bug:
1. We don't have the RUTH environment in China.
2. We failed to reproduce this bug without the RUTH environment.
3. The reproducer script in comment #7 is from RUTH. I managed to make the script run without error output, but didn't get any output from the script either.

Thanks 
Vivian

Comment 14 Dan Kenigsberg 2011-01-26 09:56:45 UTC
(In reply to comment #13)

Please run the attached script. It registers to libvirt and prints received events.

python libvirtev.py | tee /tmp/log

In another console run save/restore in a tight loop. Verify that "save" is followed by only one Stop event.
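The verification step above can be automated by scanning the event log that libvirtev.py produces for back-to-back Stopped events; a minimal sketch (the line format assumed here matches the sample output in comment 19):

```python
def duplicate_stops(log_lines):
    """Return indices of 'Stopped' events that immediately follow another
    'Stopped' event with no intervening 'Started' event -- the signature of
    this bug (STOPPED/SAVED followed by STOPPED/FAILED)."""
    dups = []
    prev_stopped = False
    for i, line in enumerate(log_lines):
        if "Stopped" in line:
            if prev_stopped:
                dups.append(i)
            prev_stopped = True
        elif "Started" in line:
            prev_stopped = False
    return dups

# A clean run alternates Stopped/Started; a buggy run shows two Stopped
# events in a row for the same save.
good = ["... Stopped 4", "... Started 2", "... Stopped 4", "... Started 2"]
bad  = ["... Stopped 4", "... Stopped 5", "... Started 2"]
print(duplicate_stops(good))  # []
print(duplicate_stops(bad))   # [1]
```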

Comment 15 Dan Kenigsberg 2011-01-26 09:58:04 UTC
Created attachment 475358 [details]
slightly modified libvirtev.py to print all events

Comment 16 Vivian Bian 2011-02-15 08:05:37 UTC
Can't reproduce this bug with the suggestions in comment 14 and comment 15, but hit bug 672725 (the blocker) instead. Will retest this bug after the blocker bug gets fixed.

Also tried the script from comment 7; could not reproduce this bug in the RUTH environment with the old version. Is there any special profile required for the reproducer machine?

Dan, would you please give me more suggestions on this bug? Thanks!

Comment 18 Vivian Bian 2011-03-10 10:53:41 UTC
Adding blocker bugs 624252 and 678524: if we don't get 678524 fixed, we can't completely clear the blocker bug 672725 of error reports like
error: Failed to restore domain from /opt/test.save
error: operation failed: failed to read qemu header
and if we don't have 624252 fixed, this bug can't be verified on the vdsm RUTH system.

Comment 19 zhanghaiyan 2011-03-16 08:09:05 UTC
Cannot reproduce this bug with either the older libvirt-0.8.1-27.el6.x86_64.rpm or the new libvirt-0.8.7-12.el6.x86_64.

In a plain libvirt environment, I executed the reproducer in comment 14 and saved/restored guests 200 times. No duplicate Stopped event was found.

# python libvirtev.py
Using uri:qemu:///system
.....
myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4
myDomainEventCallback2 EVENT: Domain new(44) Started 2
myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4
myDomainEventCallback2 EVENT: Domain new(45) Started 2
myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4
myDomainEventCallback2 EVENT: Domain new(46) Started 2
myDomainEventCallback2 EVENT: Domain new(-1) Stopped 4
.....

Comment 25 errata-xmlrpc 2011-05-19 13:25:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html

