Bug 439182 - [RHEL5.2]: Running "xm destroy" on a domain with PVFB causes a xenstore leak
Summary: [RHEL5.2]: Running "xm destroy" on a domain with PVFB causes a xenstore leak
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Michal Novotny
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 438439 478868
Depends On:
Blocks: 448899 492190
Reported: 2008-03-27 13:56 UTC by Chris Lalancette
Modified: 2018-10-19 18:48 UTC (History)

Fixed In Version: xen-3.0.3-85.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 10:11:00 UTC
Target Upstream Version:
Embargoed:


Attachments
Configuration file for a RHEL-5 PV guest (377 bytes, text/plain)
2008-03-27 13:57 UTC, Chris Lalancette
Output of "xenstore-ls" right after dom0 boot (713 bytes, text/plain)
2008-03-27 13:57 UTC, Chris Lalancette
Output of "xenstore-ls" right after "xm create" of PV domain (4.52 KB, text/plain)
2008-03-27 13:58 UTC, Chris Lalancette
Output of "xenstore-ls" right after "xm destroy" of PV domain (1.49 KB, text/plain)
2008-03-27 13:58 UTC, Chris Lalancette
PVFB device detach fix (1.01 KB, patch)
2009-04-08 15:42 UTC, Michal Novotny


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:1328 | 0 | normal | SHIPPED_LIVE | xen bug fix and enhancement update | 2009-09-01 10:32:30 UTC

Description Chris Lalancette 2008-03-27 13:56:02 UTC
Description of problem:
I was poking around xenstore last night, and I ran into what looks like a leak
in the PVFB xenstore entries.  If I create a PV domain with this in the
configuration file:

vfb = [ "type=vnc,vncunused=1" ]

Then it (properly) creates a vfb and a vkbd xenstore entry for those devices in
xenstore.  However, if I then "xm destroy" that same domain, the entries never
go away.  While it's not an immediate problem, it will probably cause xenstore
to slow down over time, and probably leak memory.  I'll attach my domain
configuration, and the output of xenstore-ls right after boot, right after
booting a single PV domain, and right after running "xm destroy" on that same
domain.
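
The comparison of the "xenstore-ls" outputs described above can be sketched in Python roughly as follows. This is only an illustration: it assumes the dump nests keys by one space of indentation per level with lines of the form `key = "value"`, the function names are hypothetical, and the sample dumps are invented (they are not the attached outputs).

```python
def parse_xenstore_ls(text):
    """Return the set of full paths found in an indentation-nested dump.

    Assumption: one leading space per nesting level, lines like key = "value".
    """
    paths = set()
    stack = []  # keys of the ancestors of the line being processed
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = len(line) - len(line.lstrip(" "))
        key = line.strip().split(" = ", 1)[0]
        del stack[depth:]  # forget keys at this depth or deeper
        stack.append(key)
        paths.add("/" + "/".join(stack))
    return paths


def leaked_pvfb_paths(at_boot, after_destroy):
    """Paths that exist after destroy but not at boot and mention vfb/vkbd."""
    leftover = parse_xenstore_ls(after_destroy) - parse_xenstore_ls(at_boot)
    return sorted(p for p in leftover if "/vfb" in p or "/vkbd" in p)


# Invented sample dumps (NOT the attached outputs), for illustration only.
AT_BOOT = 'local = ""\n domain = ""\n  0 = ""\n   name = "Domain-0"\n'
AFTER_DESTROY = AT_BOOT + (
    '   backend = ""\n'
    '    vfb = ""\n'
    '     1 = ""\n'
    '    vkbd = ""\n'
    '     1 = ""\n'
)
```

Calling `leaked_pvfb_paths(AT_BOOT, AFTER_DESTROY)` on the sample data lists the vfb and vkbd paths that were left behind; on a real system the two inputs would be the saved "xenstore-ls" outputs.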

Comment 1 Chris Lalancette 2008-03-27 13:57:16 UTC
Created attachment 299317 [details]
Configuration file for a RHEL-5 PV guest

Comment 2 Chris Lalancette 2008-03-27 13:57:44 UTC
Created attachment 299318 [details]
Output of "xenstore-ls" right after dom0 boot

Comment 3 Chris Lalancette 2008-03-27 13:58:12 UTC
Created attachment 299319 [details]
Output of "xenstore-ls" right after "xm create" of PV domain

Comment 4 Chris Lalancette 2008-03-27 13:58:35 UTC
Created attachment 299320 [details]
Output of "xenstore-ls" right after "xm destroy" of PV domain

Comment 5 RHEL Program Management 2008-06-10 10:05:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Michal Novotny 2009-04-08 15:00:04 UTC
Well, the problem is that something prevents the detach at the time the device is being detached. There are 2 entries for each device class (vkbd and vfb): one is located in /local/domain/0/backend/{deviceClass}/{domid} and the second one is in /vm/{uuid}/device/{deviceClass}.

I did some research and found that deleting /vm/{uuid}/device/{deviceClass} did not return an error as described in BZ #478868, but the vfb-* and vkbd-* entries still exist in /sys/bus/xen-backend/devices/ as described in BZ #438439. I think the problem is that something prevents the device detach at that time. I have tried adding some PVFB cleanup code there and it works fine. I was unable to find any reference to an upstream changeset, but according to information from kraxel in http://lists.xensource.com/archives/html/xen-devel/2009-03/msg00307.html, upstream doesn't have this issue.
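
For reference, the two xenstore locations per device class mentioned above can be written out with a small Python helper. This is a sketch only: the function and constant names are hypothetical and are not taken from xend, and the path templates are simply the ones quoted in this comment.

```python
# Device classes that make up a PVFB, as named in this bug.
PVFB_DEVICE_CLASSES = ("vfb", "vkbd")


def pvfb_xenstore_paths(domid, uuid):
    """Return the backend and /vm paths for each PVFB device class.

    Hypothetical helper; the templates follow the paths quoted in the comment.
    """
    paths = []
    for device_class in PVFB_DEVICE_CLASSES:
        paths.append("/local/domain/0/backend/%s/%d" % (device_class, domid))
        paths.append("/vm/%s/device/%s" % (uuid, device_class))
    return paths
```

For a domain with a given domid and UUID this yields four paths, which are the ones to look for in "xenstore-ls" after the domain has been destroyed.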

Comment 8 Michal Novotny 2009-04-08 15:42:53 UTC
Created attachment 338736 [details]
PVFB device detach fix

Well, this is the patch for it that works for me. Please try it.

Comment 9 Chris Lalancette 2009-04-10 09:44:53 UTC
*** Bug 438439 has been marked as a duplicate of this bug. ***

Comment 10 Chris Lalancette 2009-04-10 09:46:55 UTC
Cool, thanks for the patch, I'll try it out next week.  The thing is, it would be good to understand *why* upstream doesn't have the problem.  Maybe we can leverage the upstream patch, but if not, at least we can understand what the difference is between RHEL-5 and upstream.

Chris Lalancette

Comment 11 Michal Novotny 2009-04-14 07:53:45 UTC
Yeah Chris, I completely agree with you, and that's why I am trying to recompile the upstream version of Xen on my Fedora 10 box (the one running on my laptop; it should work, right?), but no luck yet. I need to investigate this on a working upstream version of Xen, but there are some issues booting the required Xen kernel, so I need to investigate further.

Comment 12 Michal Novotny 2009-04-23 08:54:27 UTC
Well Chris, I've finally tested this on my desktop box (I was unable to get it running on Fedora 10 because my laptop's hardware seems to be too new for the kernel required by upstream Xen). Upstream doesn't have this issue, and the thing is that it has a different codebase. I don't know why exactly, but in upstream's codebase the device classes are defined in XendDevices.py, whereas we have them defined in XendDomainInfo.py. Something prevents the detaching in our codebase, because when I tested it some time ago the 'vkbd' and 'vfb' entries were there in the 'for' loop as well. Maybe the reason the upstream code for that was rewritten is that it caused errors in their codebase as well. But the strange thing is that we have a function for releasing devices that's not working, so this was just a workaround/patch to make it work.
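
The general idea of the workaround (explicitly removing whatever PVFB entries are still present once the domain is gone) can be sketched as below. This is not the attached patch, whose contents are not shown in this report; `FakeStore` is an in-memory stand-in for xenstore, and all names here are hypothetical.

```python
class FakeStore(object):
    """In-memory stand-in for xenstore; holds a set of path strings."""

    def __init__(self, paths):
        self.paths = set(paths)

    def rm(self, path):
        """Remove a path and everything below it; missing paths are ignored."""
        prefix = path + "/"
        self.paths = set(
            p for p in self.paths if p != path and not p.startswith(prefix)
        )


def cleanup_pvfb(store, domid, uuid, device_classes=("vfb", "vkbd")):
    """Remove leftover backend and /vm entries for each PVFB device class."""
    for device_class in device_classes:
        store.rm("/local/domain/0/backend/%s/%d" % (device_class, domid))
        store.rm("/vm/%s/device/%s" % (uuid, device_class))
```

Entries for other device classes, such as a vbd under the same domain, are left alone because only the classes listed in `device_classes` are visited.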

Comment 13 Jiri Denemark 2009-05-11 13:40:26 UTC
Fix built into xen-3.0.3-85.el5

Comment 14 Jiri Denemark 2009-05-21 11:47:42 UTC
*** Bug 478868 has been marked as a duplicate of this bug. ***

Comment 16 Issue Tracker 2009-06-09 14:17:37 UTC
Event posted on 06-09-2009 08:01am EDT by Glen Johnson

------- Comment From santwana.samantray.com 2009-06-09 07:53 EDT-------
Hi,

I tried to reproduce this issue in the RHEL5.4-pre Alpha release
(k.v-2.6.18-151.el5xen). It's very hard to reproduce this issue in
general. After many iterations, I was able to reproduce it, i.e. the
guest failed to restart after rebooting.
Below is a snippet from /var/log/xen/xend.log:

<snip>
[2009-06-09 21:31:34 xend.XendDomainInfo 5670] ERROR (XendDomainInfo:2555) Failed to restart domain 22.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2539, in restart
    for x in prev_vm_xend[0][1]:
TypeError: iteration over non-sequence
</snip>

List of Xen rpm versions:
xen-3.0.3-86.el5
xen-libs-3.0.3-86.el5
kernel-xen-devel-2.6.18-151.el5
kernel-xen-2.6.18-151.el5

Thanks,
Santwana


This event sent from IssueTracker by jkachuck 
 issue 252463

Comment 17 Jiri Denemark 2009-06-09 15:36:07 UTC
Hmm, this error is most likely a different issue. Could you please add the following line:
log.debug("/vm/UUID/xend: %s", prev_vm_xend)

above line 2539 in /usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py, restart xend, and try to reproduce it again? And please attach the whole xend.log next time.
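
To show what the extra log line is meant to reveal, here is a small standalone Python sketch. The real structure of prev_vm_xend is not shown in this report; the nested-list shape is only inferred from the `prev_vm_xend[0][1]` indexing in the traceback, and the function name is hypothetical.

```python
import logging

log = logging.getLogger("xend.sketch")


def iter_prev_entries(prev_vm_xend):
    """Log the value, then return the entries at [0][1] if they are a sequence.

    Returns an empty list when the value has an unexpected shape, which is
    the situation that would make a bare 'for x in ...' loop raise TypeError.
    """
    log.debug("/vm/UUID/xend: %s", prev_vm_xend)
    try:
        entries = prev_vm_xend[0][1]
    except (TypeError, IndexError):
        return []
    if not isinstance(entries, (list, tuple)):
        return []
    return list(entries)
```

With debug logging enabled, the logged value shows directly whether the whole structure or only the element at `[0][1]` is missing when the restart fails.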

Thanks

Comment 18 Michal Novotny 2009-06-10 08:28:33 UTC
Well, my patch was about removing the remaining PVFB backend after a domain has been shut down, rebooted, or destroyed, so it has nothing to do with prev_vm_xend or anything similar, i.e. this is definitely a different issue.

Also, I have created a package with some logging applied, and the investigation showed me that only reboot calls this function. You can download and test my new RPMs at:

http://people.redhat.com/minovotn/xen/

Thanks,
Michal

Comment 19 Chris Ward 2009-07-03 18:01:33 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 24 zhanghaiyan 2009-07-29 10:29:48 UTC
Could you please provide the test steps for this bug? Thanks a lot.

Comment 25 Chris Lalancette 2009-07-29 11:00:14 UTC
(In reply to comment #24)
> Could you please provide the test steps for this bug? Thanks a lot.

I think that Gurhan proved it in Comment #23, so we should be good to go.

Chris Lalancette

Comment 27 zhanghaiyan 2009-07-30 01:50:44 UTC
Thanks a lot for your guidance!

Comment 31 errata-xmlrpc 2009-09-02 10:11:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1328.html

