Bug 1638894
| Summary: | domain life cycle events are not consistently fired during some situations. | | |
|---|---|---|---|
| Product: | [Community] Virtualization Tools | Reporter: | David Vossel <dvossel> |
| Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | unspecified | CC: | berrange, dvossel, fdeutsch, jdenemar, libvirt-maint, pkrempa, tburke |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-16 09:51:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
David Vossel
2018-10-12 17:49:58 UTC
Could you please elaborate what's meant by "domain api being invoked"?

Also could you please elaborate how do you receive the events? Do you run a separate thread with an event loop? It is mandatory for delivering events.

The event loop runs in a separate thread. In this case it's actually a goroutine since we're using the libvirt-go bindings.

For the "domain api being invoked" question, I see the way I worded that is unclear. What I mean is that I've observed a correlation that indicates lifecycle events being fired depends on whether or not other seemingly unrelated functions are called.

For example. If I'm listening on a target node for a migration to arrive, I never receive the lifecycle event indicating the domain started on the target... However if I call a function that lists all the domains and retrieves the found domain's xml in a background thread while waiting for the migration to arrive in another thread, then I do get the lifecycle event.

"Also could you please elaborate how do you receive the events?"

We're registering a callback using the libvirt-go bindings.

Here's the binding.
https://github.com/libvirt/libvirt-go/blob/master/domain_events.go#L961

Here's the libvirt function that binding actually invokes.
https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectDomainEventRegisterAny

The logic on our side is essentially just this.

-------------------
libvirt.EventRegisterDefaultImpl()
go func() {
    for {
        if res := libvirt.EventRunDefaultImpl(); res != nil {
            // failed listening to events, retry
            time.Sleep(time.Second)
        }
    }
}()

entrypointCallback := func(c *libvirt.Connect, d *libvirt.Domain, event *libvirt.DomainEventLifecycle) {
    fmt.Printf("yay got an event %v", event)
}
domainConn.DomainEventLifecycleRegister(entrypointCallback)
---------------------

We never see the "yay got an event" log message for a migrated domain.
We do however receive the log message if I add another goroutine that sits in a loop listing all known domains and retrieving their domain xml. So, using that same domainConn object, if I add something like this in the background then the lifecycle events work.

--------------------------
go func() {
    for {
        doms, _ := domainConn.ListAllDomains(libvirt.CONNECT_LIST_DOMAINS_ACTIVE | libvirt.CONNECT_LIST_DOMAINS_INACTIVE)
        for _, dom := range doms {
            dom.GetXMLDesc(libvirt.DOMAIN_XML_MIGRATABLE)
            dom.Free()
        }
        time.Sleep(time.Second*5)
    }
}()
--------------------------

Yes, I know that sounds crazy.

(In reply to David Vossel from comment #3)

[...]

> The logic on our side is essentially just this.
>
> -------------------
> libvirt.EventRegisterDefaultImpl()
> go func() {
>     for {
>         if res := libvirt.EventRunDefaultImpl(); res != nil {
>             // failed listening to events, retry
>             time.Sleep(time.Second)

I presume the timeout is 1 second here. Libvirt's eventloop is meant to be run without a timeout since it blocks until events to process arrive.

>         }
>     }
> }()
>

[...]

> --------------------------
> domainConn.ListAllDomains(libvirt.CONNECT_LIST_DOMAINS_ACTIVE |
> libvirt.CONNECT_LIST_DOMAINS_INACTIVE)
>         for _, dom := range doms {
>             dom.GetXMLDesc(libvirt.DOMAIN_XML_MIGRATABLE)

Which would explain why this fixes it, since an API call processes all pending requests _without_ a timeout until the response for the API is received.

>             dom.Free()
>         }
>         time.Sleep(time.Second*5)
>     }
> }()
> --------------------------
>
> Yes, I know that sounds crazy.

Well, I think the sleep in the eventloop causes it to be saturated by keepalive requests, so it can't get to process your events until you invoke an API, which processes all incoming data.
If removing the timeout does not help please try the following:

Could you please also retry your scenario while waiting for events using virsh:

virsh event --loop --all --timestamp

(In reply to Peter Krempa from comment #4)
> (In reply to David Vossel from comment #3)
>
> [...]
>
> > The logic on our side is essentially just this.
> >
> > -------------------
> > libvirt.EventRegisterDefaultImpl()
> > go func() {
> >     for {
> >         if res := libvirt.EventRunDefaultImpl(); res != nil {
> >             // failed listening to events, retry
> >             time.Sleep(time.Second)
>
> I presume the timeout is 1 second here. Libvirt's eventloop is meant to be
> run without a timeout since it blocks until events to process arrive.

Sorry I've misread the code. Well it indeed should run without timeout here ... please try the virsh event listener to see whether that's a go-specific problem:

> Could you please also retry your scenario while waiting for events using
> virsh:
>
> virsh event --loop --all --timestamp

(In reply to David Vossel from comment #3)
> The event loop runs in a separate thread. In this case it's actually a
> goroutine since we're using the libvirt-go bindings.
>
> For the "domain api being invoked" question, I see the way I worded that is
> unclear. What I mean is that I've observed a correlation that indicates
> lifecycle events being fired depends on whether or not other seemingly
> unrelated functions are called.
>
> For example. If I'm listening on a target node for a migration to arrive, I
> never receive the lifecycle event indicating the domain started on the
> target... However if I call a function that lists all the domains and
> retrieves the found domain's xml in a background thread while waiting for
> the migration to arrive in another thread, then I do get the lifecycle event.
>
> "Also could you please elaborate how do you receive the events?"
>
> We're registering a callback using the libvirt-go bindings.
>
> Here's the binding.
> https://github.com/libvirt/libvirt-go/blob/master/domain_events.go#L961
>
> Here's the libvirt function that binding actually invokes.
> https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectDomainEventRegisterAny
>
>
> The logic on our side is essentially just this.
>
> -------------------
> libvirt.EventRegisterDefaultImpl()
> go func() {
>     for {
>         if res := libvirt.EventRunDefaultImpl(); res != nil {
>             // failed listening to events, retry
>             time.Sleep(time.Second)
>         }
>     }
> }()

IIUC, that is from this code:

https://github.com/kubevirt/kubevirt/blob/master/cmd/virt-launcher/virt-launcher.go#L125

which is called from

https://github.com/kubevirt/kubevirt/blob/master/cmd/virt-launcher/virt-launcher.go#L400

This, however, is *after* you have already opened the libvirt connection

https://github.com/kubevirt/kubevirt/blob/master/cmd/virt-launcher/virt-launcher.go#L366

The event impl *must* be registered before any connection is opened:

https://libvirt.org/html/libvirt-libvirt-event.html#virEventRegisterImpl

As a result the event subsystem won't be available when the remote connection is opened, and thus it will not be able to register a callback to receive async events out of band. As a result events will only be delivered at the next synchronous API call you make. This is why you only see the events when you run an API like listing domains.

Daniel,

It sounds like you nailed it. I'll give that a shot.
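
For anyone landing here with the same symptom, the ordering problem described above can be illustrated with a toy model. This is only a sketch and does not use libvirt or libvirt-go at all: `eventImpl`, `toyConn`, `open`, `receive`, `syncCall` and `runScenario` are hypothetical names made up for illustration. It assumes, as the explanation above states, that a connection decides at open time whether it can deliver events out of band, and that otherwise pending events are only dispatched during the next synchronous API call.

```go
package main

import "fmt"

// eventImpl is a hypothetical stand-in for the process-wide event
// implementation (what virEventRegisterImpl sets up in real libvirt).
type eventImpl struct{ registered bool }

// toyConn is a hypothetical connection, not the libvirt-go Connect type.
type toyConn struct {
	async    bool     // fixed at open time: was an event impl available?
	pending  []string // events received but not yet dispatched
	callback func(string)
}

// open records whether the event impl was registered at the moment of opening.
func open(impl *eventImpl) *toyConn {
	return &toyConn{async: impl.registered}
}

// receive models an event arriving from the daemon.
func (c *toyConn) receive(ev string) {
	c.pending = append(c.pending, ev)
	if c.async {
		c.dispatch() // out-of-band delivery through the event loop
	}
}

// dispatch hands every pending event to the callback and clears the queue.
func (c *toyConn) dispatch() {
	for _, ev := range c.pending {
		if c.callback != nil {
			c.callback(ev)
		}
	}
	c.pending = nil
}

// syncCall models any synchronous API call (for example listing domains):
// as a side effect it processes all pending incoming data.
func (c *toyConn) syncCall() { c.dispatch() }

// runScenario returns the events the callback has seen before and after
// one synchronous call, for either registration order.
func runScenario(registerBeforeOpen bool) (before, after []string) {
	impl := &eventImpl{}
	if registerBeforeOpen {
		impl.registered = true
	}
	c := open(impl)
	impl.registered = true // registering after open is too late for c

	var seen []string
	c.callback = func(ev string) { seen = append(seen, ev) }

	c.receive("domain started")
	before = append([]string(nil), seen...)
	c.syncCall()
	after = append([]string(nil), seen...)
	return before, after
}

func main() {
	for _, first := range []bool{false, true} {
		before, after := runScenario(first)
		fmt.Printf("register before open=%v: before sync call=%d event(s), after=%d event(s)\n",
			first, len(before), len(after))
	}
}
```

In this model, registering after the connection is opened leaves the event queued until `syncCall` runs, which mirrors the "events only show up when I list domains" behaviour reported in this bug. Registering first delivers it immediately. In the real code the equivalent change would be to call `libvirt.EventRegisterDefaultImpl()` before opening the connection.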