Bug 589123 - Event channel not giving the required data after PV guest resume
Event channel not giving the required data after PV guest resume
Status: CLOSED DUPLICATE of bug 497080
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
5.7
All Linux
Priority: low Severity: medium
: rc
: ---
Assigned To: Xen Maintenance List
Virtualization Bugs
:
Depends On:
Blocks: 514490
Reported: 2010-05-05 08:52 EDT by Michal Novotny
Modified: 2014-02-02 17:37 EST (History)
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-06-15 08:41:14 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None
Description Michal Novotny 2010-05-05 08:52:25 EDT
Description of problem:
When a PV guest is resumed, it can no longer be shut down or suspended, because the event channel does not deliver the data required for domain cleanup.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-194.el5
xen-3.0.3-107.el5

How reproducible:
Always

Steps to Reproduce:
1. define a PV guest without PVFB devices (for a guest with them you would also need the patch for bug 513335) - I'll refer to its name as $VM
2. start up the guest
3. try to save the guest to /mnt/small (a mount too small for the save image, so the save fails and the guest resumes)
4. the guest is working again, so log in to the guest console
5. shut it down using poweroff
  
Actual results:
$VM is renamed to Zombie-$VM and the domain is shown in the paused, suspended, and dying states by `xm list`

Expected results:
Neither $VM nor Zombie-$VM should be present; `xm list` shouldn't show this domain at all

Additional info:
I've been investigating libxc, xenstore, and the user-space tools, and found that libxc requests the list of domains via a hypercall, and that hypercall reports this domain in those states instead of omitting it entirely. My investigation continued in the xenstore code with a debug patch added to it: the data that should reach the user-space xenstore daemon were not arriving from the event channel, so the domain cleanup was never performed.

-----------------------------------

I am copying some comments from bug 513335 that reference this issue:

> Well, I've been investigating this further and the problem *may* be coming
> from the hypervisor. I've created a little logging function to log some of
> the data from libxc/domain.c, giving me basic logging of the
> xc_domain_getinfo() and xc_domain_destroy() function calls. I don't know
> how this interacts with the user-space XenD restart, but the logging I
> added recorded the data returned directly from the
> XEN_DOMCTL_getdomaininfo hypercall.
>
> The data returned were the domain IDs for which the hypercall passed,
> i.e. it didn't return any negative value to break the for loop there:
>...
>getinfo got domain id: 0
>getinfo got domain id: 19
>getinfo got domain id: 0
>getinfo got domain id: 19
>...
>
> Also I've been thinking that it may be a problem of the domain not being
> destroyed by a hypercall, so I tried adding logging to the
> xc_domain_destroy() function as well, and I was getting:
>
>...
>getinfo got domain id: 0
>getinfo got domain id: 19
>getinfo got domain id: 0
>getinfo got domain id: 19
>destroying domain: 19
>getinfo got domain id: 0
>getinfo got domain id: 19
>getinfo got domain id: 0
>getinfo got domain id: 19
>...
>
> After I restarted XenD it was all OK and the log contained these new lines:
>...
>getinfo got domain id: 0
>getinfo got domain id: 0
>...

> Therefore it seems that restarting the Xen daemon somehow interacts with the
> hypervisor itself in a way that this domain entry gets removed when the XenD
> restart is issued, i.e. when the connection to the hypervisor is established.
>
> Isn't it possible it's a bug in the hypervisor, since I am getting this data
> directly from the hypervisor using the XEN_DOMCTL_getdomaininfo hypercall?
>
> Chris, could you please have a look and tell me? Or do you have any ideas
> what may be going on? I've also been checking xenstore, and no entries
> relevant to this domain were present, so I saw nothing suspicious.
>
>Thanks,
>Michal

--- Additional comment from minovotn@redhat.com on 2010-04-14 07:57:18 EDT ---

Created an attachment (id=406487)
Libxc debug log

I've been studying it a little further and I added a logging function to libxc (in the file libxc_debug.patch), available at:
http://git.engineering.redhat.com/?p=users/drjones/virt-cookbook.git;a=commit;h=7ce8456ad2a45b700c56b3740aaaf72a762a43a1

What I've discovered is that the hypercall returns the domain in the dying/paused/shutdown state every time I trigger a shutdown, both when the xen domain is shown as dying and when the domain is not shown at all. It looks like this:

[xc_domain.c:216] Func xc_domain_getinfo: XEN_DOMCTL_getdomaininfo returned next domain id (domid=4)
[xc_domain.c:227] Func xc_domain_getinfo: Dying=1
[xc_domain.c:228] Func xc_domain_getinfo: Shutdown=1
[xc_domain.c:229] Func xc_domain_getinfo: Paused=1
[xc_domain.c:230] Func xc_domain_getinfo: Blocked=0
[xc_domain.c:231] Func xc_domain_getinfo: Running=0
[xc_domain.c:232] Func xc_domain_getinfo: HVM=0
[xc_domain.c:238] Func xc_domain_getinfo: Shutdown_reason=0
[xc_domain.c:249] Func xc_domain_getinfo: NrPages=474
[xc_domain.c:251] Func xc_domain_getinfo: MaxMemKB=524288
[xc_domain.c:255] Func xc_domain_getinfo: OnVCPUs=2
[xc_domain.c:257] Func xc_domain_getinfo: MaxVCPUId=1
[xc_domain.c:266] Func xc_domain_getinfo: nr_doms=2

Now I think the deletion of this domain is done by XenD itself, and since a restart makes it work fine (the dying domain is no longer shown), it's likely that restarting XenD performs one extra step that makes it work. I need to investigate this further, and I'll look at XenD again, since something appears to be set that prevents the domain from being destroyed the right way. A log of 3 getdomaininfo calls is attached.

Michal

--- Additional comment from minovotn@redhat.com on 2010-04-16 09:45:18 EDT ---

(In reply to comment #24)
> Created an attachment (id=406487) [details]
> Libxc debug log
> [...full quote of comment #24 trimmed...]

This one is still under investigation; there are many parts of the component I am looking at right now, but basically I don't know why it behaves this way. When I restart XenD it works fine, but I suspect the bug is in the xenstore daemon: there is a function handle_event() that reads data from the event channel and checks whether the port read equals virq_port; if it does, it calls domain_cleanup(), which appears to do the actual cleanup. The problem is that when a save fails and the domain is resumed, the event channel may be blocked somehow, since it does not return virq_port, and therefore domain_cleanup() is never called from handle_event() in xenstore itself. But when you restart XenD it is called immediately on the restart event. I have tried the various methods from the XenD start functionality, but it's not working. It now seems to be connected to xenstore: when I restart XenD, the data on virq_port are received from the event channel and the domain shutdown is performed, so the domain is no longer in the dying state but disappears from the list entirely; the cleanup then completes, but not before that. I'm just writing down what I've found; I need to continue the investigation.

Michal

--- Additional comment from minovotn@redhat.com on 2010-04-16 15:44:26 EDT ---

I was thinking about this one in my spare time while travelling after my shift was over, and I think it explains the test results: when a second save returns errors, it cannot suspend the domain, since suspend is basically a shutdown(reason='suspend') call and the domain cannot shut down successfully after a failed save. As described above, the reason seems to be that the event channel is not delivering port == virq_port to trigger the domain cleanup by calling domain_cleanup() in xenstored.

Thinking it through, this *may* make sense: when you shut down the domain for the first time, e.g. with the suspend reason to save it, you get port == virq_port at the end of the events coming from the event channel (/dev/xen/evtchn). But when you try a second save, or try to shut down the domain using poweroff, this event does not come from the event channel at all until you restart the Xen daemon. I guess the restart interacts with the hypervisor and/or xenstored in a way that restores the capability to generate the virq_port event: until the daemon is restarted you do not get the port == virq_port event at all, so the domain stays in the dying state and the proper cleanup is not done (according to the xenstore daemon debug logging I created to help me investigate this).

Isn't it possible that when you resume the guest, the hypervisor still thinks that the virq_port has already been used, so it cannot be used any more, and this is reset only by some side effect of restarting XenD? Isn't that possible?

Thanks,
Michal

--- Additional comment from minovotn@redhat.com on 2010-04-16 15:49:02 EDT ---

(In reply to comment #26)
> I was thinking about this one in my spare time when travelling after my shift
> [...full quote of comment #26 trimmed...]

Oops, sorry for typos. I'm really tired today thinking about this almost all the week but I really needed to share this information since I can't sleep well because of this and soon I may be having nightmares from this :(

Michal

--- Additional comment from minovotn@redhat.com on 2010-04-19 09:30:45 EDT ---

Ok, I've been studying the event channel calls made from the xc library (libxc) by the Xen daemon. According to the logging I have added to tools/libxc/xc_evtchn.c so far, a plain save works fine, with no commands sent to the event channel. The problem arises on resume, because two EVTCHNOP_* hypercalls are issued directly to the event channel: EVTCHNOP_alloc_unbound (code 6) and EVTCHNOP_reset (code 10). Once the reset is done and the ports are rebound, it does not work until we restart the Xen daemon; at that point there are no new messages in the log from do_evtchn_op(), the function that issues those EVTCHNOP hypercalls to the hypervisor, which forwards them to the event channel. When I try to remove the reset and alloc_unbound code from the Xen daemon, it returns an error that the domain is already running. This may be closely connected to new domains getting new IDs: when a new ID is assigned the problem does not occur, and both EVTCHNOP_alloc_unbound and EVTCHNOP_reset take the domain ID as one of their parameters. The reset is basically a call to close all the domain's event channels, according to the event channel code in the hypervisor (common/event_channel.c):

...
    if ( (d = rcu_lock_domain_by_id(dom)) == NULL )
        return -ESRCH;

    for ( i = 0; port_is_valid(d, i); i++ )
        (void)__evtchn_close(d, i);

    rcu_unlock_domain(d);
...

It appears that the event channel cannot deliver a port equal to virq_port, which is bound in domain_init() in the xenstore user-space code (tools/xenstore/xenstored_domain.c) using the following statement:

...
        if ((rc = xc_evtchn_bind_virq(xce_handle, VIRQ_DOM_EXC)) == -1)
                barf_perror("Failed to bind to domain exception virq port");
        virq_port = rc;
...

According to the handle_event() function in the same file, there is code to clean up the domain when port (the pending event read from the channel) is equal to virq_port:

...
        if ((port = xc_evtchn_pending(xce_handle)) == -1)
                barf_perror("Failed to read from event fd");

        if (port == virq_port)
                domain_cleanup();
...

xc_evtchn_pending() reads the data from the event channel file descriptor, the /dev/xen/evtchn device. xc_evtchn_bind_virq() binds a free virtual port, obtained via get_free_port(), for the domain. This is done in the domain_init() call, before the events are read and handled by xenstore. There is code that unbinds the port when do_introduce() is called, and this covers the resume case (at least it is called in this case according to my testing) to recreate the event channel. But even after the recreation, the event channel does not deliver the value of virq_port when it should, so the guest cannot be shut down/destroyed and removed from the domain list. The code is as follows:

...
        } else if ((domain->mfn == mfn) && (domain->conn != conn)) {
                /* Use XS_INTRODUCE for recreating the xenbus event-channel. */
                if (domain->port)
                        xc_evtchn_unbind(xce_handle, domain->port);
                rc = xc_evtchn_bind_interdomain(xce_handle, domid, port);
                domain->port = (rc == -1) ? 0 : rc;
                domain->remote_port = port;
        } else {
...

I think the event channel *may* be limited to delivering the virq_port value just once per domain ID, which would mean the event that triggers domain shutdown is issued just once in the domain's lifetime; when resume is called, it does not clear the flag recording that this notification (port == virq_port appears to act as a domain_cleanup() trigger) has already been used. I think that flag should be cleared when resume is called, to allow proper domain destruction/cleanup.

As I stated several comments above, it would make sense that the domain cannot be saved a second time (if the first save fails): the save errors out with "cannot be suspended" because there is no signal to trigger the domain cleanup, and since suspend is basically shutdown(reason='suspend'), the shutdown cannot complete without that signal. This appears to be connected to the implementation of the resume functionality, but I can't make a patch available for this yet, since it could leave the domain in a working but non-destroyable state, which is not good.

Michal

--- Additional comment from clalance@redhat.com on 2010-04-19 09:59:53 EDT ---

I'm not quite sure where you are with this one, but I'm no longer the right person to ask about this.  I'll move the needinfo over to drjones, since he might have a better idea about any possible issues in the hypervisor.

Chris Lalancette

--- Additional comment from minovotn@redhat.com on 2010-04-19 10:14:01 EDT ---

(In reply to comment #29)
> I'm not quite sure where you are with this one, but I'm no longer the right
> person to ask about this.  I'll move the needinfo over to drjones, since he
> might have a better idea about any possible issues in the hypervisor.
> 
> Chris Lalancette    

Ok Chris, sorry for that. It would be better to ask drjones, thanks for changing needinfo.

Michal

--- Additional comment from drjones@redhat.com on 2010-04-21 04:34:24 EDT ---

Michal,

When setting needinfo please provide a clear, _concise_, and precise question. I've skimmed through all the comments, but don't know what you need from me. The huge comment with event channel analysis doesn't contain any question marks.  I'm clearing needinfo for now. If you prepare some question for me, then feel free to set it again.

Andrew

--- Additional comment from minovotn@redhat.com on 2010-04-21 04:48:02 EDT ---

(In reply to comment #31)
> Michal,
> 
> When setting needinfo please provide a clear, _concise_, and precise question.
> I've skimmed through all the comments, but don't know what you need from me.
> The huge comment with event channel analysis doesn't contain any question
> marks.  I'm clearing needinfo for now. If you prepare some question for me,
> then feel free to set it again.
> 
> Andrew    

Right. Simply put, there is an issue with the event channel not delivering a port value equal to the one bound to virq_port, which xenstore requires to trigger domain_cleanup(). The domain is suspended the first time a save is triggered, but when the save fails, the domain is resumed. Suspend is basically shutdown(reason='suspend'), so at that point the event channel *is* delivering port == virq_port, which calls domain_cleanup(). The domain then resumes successfully, but when you try to shut down the guest (poweroff) or save it again, the event channel no longer delivers port == virq_port, so the domain cannot be shut down/suspended and therefore cannot be cleaned up properly. So basically my question is:

Isn't it possible that the event channel sets some flag *not to* deliver the virq_port value any longer after a save is done (no matter what the result is)? The event channel should deliver port == virq_port to trigger domain_cleanup(), but it doesn't after a resume. Isn't it possible some flag is set that should be cleared on resume but isn't? Or do you have any other idea what's wrong with the event channel, since it is not delivering what it should on the domain shutdown event?

Note:
virq_port is the value that was bound using EVTCHNOP_bind_virq.
port is the value currently read from the event channel, /dev/xen/evtchn.

Michal

--- Additional comment from drjones@redhat.com on 2010-04-22 11:17:58 EDT ---


First, why are all the comments in this bug private?
Second, I'm still not really sure what's needed from me at this point, but I'll give a response a try.

Here's what I understand as your current assessment, with some of my own assumptions based on the comments in this bug (because without really knowing what I need to look at, I didn't look at anything).

1) we attempt to suspend a guest that has a vfb device, but it fails due to limited disk space
2) since it failed to suspend, it should resume, but instead it stays in a half-way suspended state
3) from your analysis it fails to resume because we never call domain_cleanup, which I presume we need to do in order to replace the old domain state with the new resumed state.
4) you believe we never call domain_cleanup, because the HV call that checks for pending events on the event channel doesn't return the expected port (the virq port), and that it's not returning it on purpose (due to some flag or whatever).
5) your question to me is whether (4) is a correct assessment or not

If I have all that correct, then I will say that (4) is most likely not correct. If you're expecting an event on a particular port but don't get it, then I believe the two more likely possibilities are: a) you're expecting it on the wrong port (i.e. the virq port was rebound, but the saved port number wasn't updated), or b) you're getting a different event on a different port which possibly needs to be dealt with first, and then afterwards you should check the event channel again for the event you're concerned with.

Anyway, I suggest taking a big step back from this event channel stuff and to approach the problem by tracing as much as possible for the working case "no vfb in the config" and then again with the vfb in the config.  Then see what pops out as differences between the traces. If it looks event channel related, then increase your tracing of event channel related code, and repeat the process of comparing differences.

--- Additional comment from minovotn@redhat.com on 2010-04-22 11:39:18 EDT ---

(In reply to comment #33)
> First, why are all the comments in this bug private?
> Second, I'm still not really sure what's needed from me at this point, but I'll
> give a response a try.
> 
> Here's what I understand as your current assessment, with some of my own
> assumptions based on the comments in this bug (because without really knowing
> what I need to look at, I didn't look at anything).
> 
> 1) we attempt to suspend a guest that has a vfb device, but it fails due to
> limited disk space


Well, it's not about the VFB device or similar. Maybe I should have filed a new bug for this, but this is not connected to VFB devices at all. The point is that when I try to save the guest a second time, or power off a guest that has been resumed, the event channel does not deliver the value to user space the way it normally does on the shutdown event, i.e. the way it did *before* the resume operation was called. Therefore user space can't do a proper cleanup, because xenstore is not getting the proper information from the event channel.


> 2) since it failed to suspend, it should resume, but instead it stays in a
> half-way suspended state


Well, it resumes correctly, but it cannot shut down or be saved again, since the event channel is not delivering the signal that triggers domain_cleanup().


> 3) from your analysis it fails to resume because we never call domain_cleanup,
> which I presume we need to do in order to replace the old domain state with 
> the new resumed state.

That's correct, and domain_cleanup() is a user-space function in xenstored which is triggered if port (the value read from the event channel) is equal to the one that was previously bound to virq_port using the EVTCHNOP_bind_virq hypercall.

> 4) you believe we never call domain_cleanup, because the HV call that checks
> for pending events on the event channel doesn't return the expected port (the
> virq port), and that it's not returning it on purpose (due to some flag or
> whatever).


There's no evidence of a domain_cleanup() call in my debug log. There *is* evidence *before* resume is called, but once the save fails and the resume happens, there is no evidence of port == virq_port, which is what calls domain_cleanup(), so the guest cannot shut down/save/suspend.


> 5) your question to me is whether (4) is a correct assessment or not
> 

Basically my question is whether you can investigate this on the event channel side, since the one value the event channel *should* deliver on the shutdown event (the one bound using the EVTCHNOP_bind_virq hypercall) is no longer delivered after a resume operation. This value comes from the event channel correctly only if resume was *not* called.

> If I have all the correct, then I will say that (4) is most likely not correct.
> If you're expecting an event on a particular port, but don't get it, then I
> believe the two more likely possibilies are a) you're expecting it to be on the
> wrong port (i.e. virq port was rebound, but the saved portnum wasn't updated)
>

I already thought of this, and the virq_port is not rebound according to the code. There is a call to reset (close) and then reconnect the event channel ports when resume is done, but this is necessary and does not change the virq_port value; as I investigated it I saw no problem there. It only concerns store_mfn and console_mfn. Those ports are also rebound in the xenstore daemon when introduce_domain is called, but it does not work without that.

> or b) you're getting a different event on a different port which possibly needs
> to be dealt with first, and then afterwards you should check the event channel
> again for the event you're concerned with.
> 
> Anyway, I suggest taking a big step back from this event channel stuff and to
> approach the problem by tracing as much as possible for the working case "no
> vfb in the config" and then again with the vfb in the config.  Then see what
> pops out as differences between the traces. If it looks event channel related,
> then increase your tracing of event channel related code, and repeat the
> process of comparing differences.    

Well, this is not connected to the VFB stuff. The real point is that, according to my testing, this happened even without PVFB devices once the resume was done.

Michal

--- Additional comment from minovotn@redhat.com on 2010-04-27 09:13:52 EDT ---

(In reply to comment #34)
> (In reply to comment #33)
> > First, why are all the comments in this bug private?
> > Second, I'm still not really sure what's needed from me at this point, but I'll
> > give a response a try.
> > 

Well, I've been looking into this one again, and there is no rebinding code for virq_port. It is bound in the domain_init() function, which is called from xenstored_core.c on xenstore daemon startup, meaning domain_init() is basically the initialization for dom0. There is no port rebinding or anything similar. The event channel simply stops delivering the virq_port value the way it did before the resume. The xenstore daemon is not restarted when XenD is, so something strange is being set; what XenD does on restart is the initialization of all domains, including dom0. Since the reinitialization of dom0 seems to help, isn't it possible that the event channel remembers some flag (something like virq_port_used) that prevents the current port (basically the data read from the event channel via read() on the /dev/xen/evtchn device) from being set to the value of virq_port?

I'd study it myself, but unfortunately I don't know how the event channel (or anything in the kernel) provides data through a character device, i.e. which function provides the data that user space reads with read() on the /dev/xen/evtchn device.

Michal

--- Additional comment from minovotn@redhat.com on 2010-05-05 03:44:50 EDT ---

(In reply to comment #35)
> [...full quote of comment #35 trimmed...]
Comment 1 Andrew Jones 2011-01-21 08:52:06 EST
This problem may go away after doing bug 497080.
Comment 3 Laszlo Ersek 2011-06-15 08:41:14 EDT

*** This bug has been marked as a duplicate of bug 497080 ***
