Description of problem: I was poking around xenstore last night and ran into what looks like a leak in the PVFB xenstore entries. If I create a PV domain with this in the configuration file:

vfb = [ "type=vnc,vncunused=1" ]

then it (properly) creates vfb and vkbd entries for those devices in xenstore. However, if I then "xm destroy" that same domain, the entries never go away. While this isn't an immediate problem, it will probably slow xenstore down over time and probably leak memory. I'll attach my domain configuration and the output of xenstore-ls right after dom0 boot, right after booting a single PV domain, and right after running "xm destroy" on that same domain.
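To compare the xenstore-ls snapshots mechanically, a minimal Python sketch like the one below can flatten the indented tree into full paths and report what is still present after "xm destroy" that was not there at boot. It assumes xenstore-ls indents each tree level by one space and prints nodes as 'name = "value"'; the function names are hypothetical and not part of the Xen tools.

```python
def xenstore_ls_to_paths(text):
    """Convert indented xenstore-ls output into a set of full node paths.

    Assumption: one space of indentation per tree level, and every line
    has the form 'name = "value"'.
    """
    paths = set()
    stack = []  # node names from the root down to the current depth
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = len(line) - len(line.lstrip(" "))
        name = line.strip().split(" = ", 1)[0]
        del stack[depth:]  # drop siblings/children of the previous node
        stack.append(name)
        paths.add("/" + "/".join(stack))
    return paths


def leftover_paths(before_create, after_destroy):
    """Return nodes present after 'xm destroy' that were absent at dom0 boot."""
    before = xenstore_ls_to_paths(before_create)
    after = xenstore_ls_to_paths(after_destroy)
    return sorted(after - before)
```

Fed the "after dom0 boot" and "after xm destroy" outputs, a leak of the kind described here would be expected to show up as remaining /local/domain/0/backend/vfb/... and /local/domain/0/backend/vkbd/... subtrees.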
Created attachment 299317: Configuration file for a RHEL-5 PV guest
Created attachment 299318: Output of "xenstore-ls" right after dom0 boot
Created attachment 299319: Output of "xenstore-ls" right after "xm create" of PV domain
Created attachment 299320: Output of "xenstore-ls" right after "xm destroy" of PV domain
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Well, the problem is that something prevents the devices from being detached at the time the domain goes away. There are two entries for each device class (vkbd and vfb): one is located in /local/domain/0/backend/{deviceClass}/{domid} and the other in /vm/{uuid}/device/{deviceClass}. I did some research and found that deleting /vm/{uuid}/device/{deviceClass} didn't return the error described in BZ #478868, but the vfb-* and vkbd-* entries still exist in /sys/bus/xen-backend/devices/, as described in BZ #438439. I think something prevents the device detach at that point. I tried adding some PVFB cleanup code there and it works fine. I was unable to find any reference to an upstream changeset, but according to kraxel in http://lists.xensource.com/archives/html/xen-devel/2009-03/msg00307.html upstream doesn't have this issue.
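The two xenstore locations per device class, plus the leftover sysfs entries, can be spelled out as a small check. This is a hypothetical helper, not xend code and not the attached patch: it takes a list of xenstore paths (however they were collected) and the destroyed domain's domid and uuid, and reports which vfb/vkbd locations are still present. It also filters vfb-*/vkbd-* names out of a listing of /sys/bus/xen-backend/devices/.

```python
DEVICE_CLASSES = ("vfb", "vkbd")


def pvfb_paths(domid, uuid, device_classes=DEVICE_CLASSES):
    """List both xenstore locations for each PVFB device class of one domain."""
    paths = []
    for dev_class in device_classes:
        # Backend side, owned by dom0.
        paths.append("/local/domain/0/backend/%s/%d" % (dev_class, domid))
        # Per-VM record keyed by the domain's uuid.
        paths.append("/vm/%s/device/%s" % (uuid, dev_class))
    return paths


def stale_pvfb_entries(existing_paths, domid, uuid):
    """Return the PVFB locations that still exist after the domain is gone."""
    existing = set(existing_paths)
    return [p for p in pvfb_paths(domid, uuid) if p in existing]


def stale_backend_devices(sysfs_names):
    """Filter a /sys/bus/xen-backend/devices/ listing down to vfb-*/vkbd-* entries."""
    return [n for n in sysfs_names if n.startswith(("vfb-", "vkbd-"))]
```

After a clean "xm destroy", both stale_pvfb_entries() and stale_backend_devices() would be expected to return empty lists.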
Created attachment 338736: PVFB device detach fix. This is the patch that works for me; please try it.
*** Bug 438439 has been marked as a duplicate of this bug. ***
Cool, thanks for the patch, I'll try it out next week. The thing is, it would be good to understand *why* upstream doesn't have the problem. Maybe we can leverage the upstream patch, but if not, at least we can understand what the difference is between RHEL-5 and upstream. Chris Lalancette
Yeah Chris, I completely agree, and that's why I'm trying to recompile the upstream version of Xen on my Fedora 10 box (the one I run on my laptop; it should work, right?), but no luck yet. I need to investigate this on a working upstream Xen, but there are some issues booting the required Xen kernel, so I need to dig further.
Well Chris, I've finally tested this on my desktop box (I was unable to get it running on Fedora 10 because my laptop's hardware seems to be too new for the kernel required by upstream Xen). Upstream doesn't have this issue, but it has a different codebase. I don't know exactly why, but in upstream's codebase the device classes are defined in XendDevices.py, whereas ours are defined in XendDomainInfo.py. Something prevents the detach in our codebase, because when I tested it some time ago the 'vkbd' and 'vfb' entries were present in the 'for' loop as well. Maybe upstream rewrote that code because it caused errors in their codebase too. The strange thing is that we have a function for releasing devices that's not working, so this patch is just a workaround to make it work.
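For readers unfamiliar with the release path, here is a simplified model of a per-class release loop of the kind discussed above. It is not the RHEL-5 or upstream xend code: the class list, the `controllers` mapping, and the `devices()`/`destroy()` methods are all hypothetical. It illustrates one defensive design choice, logging and collecting per-device failures instead of silently leaving backends behind, so a stuck vfb or vkbd detach becomes visible.

```python
import logging

log = logging.getLogger("xend-sketch")

# Hypothetical, illustrative list; the real one lives in XendDomainInfo.py
# (RHEL-5) or XendDevices.py (upstream).
DEVICE_CLASSES = ["vif", "vbd", "vtpm", "vfb", "vkbd"]


def release_all_devices(controllers, device_classes=DEVICE_CLASSES):
    """Destroy every device of every class, continuing past failures.

    `controllers` maps a device class to an object with hypothetical
    `devices()` and `destroy(devid)` methods.
    Returns the (class, devid) pairs that could not be released.
    """
    failed = []
    for dev_class in device_classes:
        ctrl = controllers.get(dev_class)
        if ctrl is None:
            continue  # no devices of this class were ever created
        for devid in list(ctrl.devices()):
            try:
                ctrl.destroy(devid)
            except Exception:
                # Record the failure instead of aborting the whole loop.
                log.exception("failed to release %s device %s", dev_class, devid)
                failed.append((dev_class, devid))
    return failed
```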
Fix built into xen-3.0.3-85.el5
*** Bug 478868 has been marked as a duplicate of this bug. ***
Event posted on 06-09-2009 08:01am EDT by Glen Johnson
------- Comment From santwana.samantray.com 2009-06-09 07:53 EDT-------
Hi,
I tried to reproduce this issue in the RHEL5.4-pre Alpha release (kernel 2.6.18-151.el5xen). It's very hard to reproduce in general. After many iterations I was able to reproduce it, i.e. the guest failed to restart after rebooting. Below is a snippet from /var/log/xen/xend.log:

<snip>
[2009-06-09 21:31:34 xend.XendDomainInfo 5670] ERROR (XendDomainInfo:2555) Failed to restart domain 22.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2539, in restart
    for x in prev_vm_xend[0][1]:
TypeError: iteration over non-sequence
</snip>

List of Xen rpm versions:
xen-3.0.3-86.el5
xen-libs-3.0.3-86.el5
kernel-xen-devel-2.6.18-151.el5
kernel-xen-2.6.18-151.el5

Thanks,
Santwana

This event sent from IssueTracker by jkachuck
issue 252463
Hmm, this error is most likely a different issue. Could you please add the following line: log.debug("/vm/UUID/xend: %s", prev_vm_xend) above line 2539 in /usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py, restart xend, and try to reproduce it again? Also, please attach the whole xend.log next time. Thanks
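The TypeError means that prev_vm_xend[0][1] was not iterable (for example None) when restart() reached the loop. The diagnostic sketch below logs the value as requested and shows which shapes would fail. The structure of prev_vm_xend is not shown in this report, so the accepted shapes are assumptions; the helper name is hypothetical, and this is not a proposed fix.

```python
import logging

log = logging.getLogger("xend-sketch")


def iter_prev_vm_xend(prev_vm_xend):
    """Return the entries the restart loop would iterate over, or [] if unusable.

    Mirrors the failing expression prev_vm_xend[0][1]; the checks below are
    assumptions about its shape, not xend's actual data model.
    """
    log.debug("/vm/UUID/xend: %s", prev_vm_xend)
    try:
        entries = prev_vm_xend[0][1]
    except (IndexError, KeyError, TypeError):
        # prev_vm_xend itself is None, empty, or not indexable.
        return []
    if not isinstance(entries, (list, tuple)):
        # None would raise the TypeError seen in xend.log; a plain string
        # would silently iterate over characters, so reject both.
        return []
    return list(entries)
```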
Well, my patch was about removing the remaining PVFB backend after the domain has been shut down, rebooted, or destroyed, so it has nothing to do with prev_vm_xend or anything similar; i.e. this is definitely a different issue. Also, I built a package with some logging applied, and the investigation showed that only a reboot calls this function. You can download and test my new RPMs at: http://people.redhat.com/minovotn/xen/ Thanks, Michal
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
Could you please provide test steps for this bug? Thanks a lot.
(In reply to comment #24)
> Could you please help give out test steps this bug ? thanks a lot

I think Gurhan verified it in comment #23, so we should be good to go.

Chris Lalancette
Thanks a lot for your guidance!
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1328.html