Bug 1458901
| Summary: | Error: vdsm.virt.sampling.VMBulkstatsMonitor object at 0x3155490> operation failed (KeyError: 'balloon.current') | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Yaniv Kaul <ykaul> | ||||
| Component: | Core | Assignee: | Francesco Romani <fromani> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Shira Maximov <mshira> | ||||
| Severity: | low | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 4.19.20 | CC: | bugs, fromani, michal.skrivanek, mzamazal, vyerys, ykaul | ||||
| Target Milestone: | ovirt-4.2.0 | Flags: | rule-engine:
                ovirt-4.2+ | ||||
| Target Release: | 4.20.9.1 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-09-12 09:52:05 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | 
 | ||||||
| 
        
          Description
        
        
          Yaniv Kaul
        
        
        
        
        
          2017-06-05 19:03:25 UTC
        
       the VM was paused just before the err due to outgoing migration. I think this case is not covered, many of those stats are not present when the VM is paused. Perhaps just extend the _skip_if_missing_stats check? Created attachment 1285707 [details]
Another balloon.current KeyError
Please also look at the attached log snippet, when the error occurs on VM creation. It's from my development Vdsm version, so I can't 100% guarantee it's not induced by something else, but the VM has been happily running from its start to successful migration. So I think it's worth checking whether a contingent fix covers this case as well.
From the provided snippets, it seems to me another benign instance of stats not present (aka, we do much noise in the logs for something not important). Now, the problem with a generic solution is that we cannot distinguish from the outside if the event of missing information is benign or not. It is benign in well known circumstances, like VM starting up, shutting down or migrating away; at steady state, however, we expect this data to be present, so it is strange and suspicious if the data isn't there. This is the reason why our handling of missing data is inconsistent and cumbersome. However, I believe so far we framed the issue in a wrong way. Lack of data reported from libvirt is never actionable. We lack data, we cannot do anything about that, so there no point in bubbling up the error. The best thing we could do is concisely complain in debug mode, and silently pass through in each and other case. Setting it for 4.2, we can backport if needed, I guess (unsure we need to) patches merged on master, and we didn't branch out 4.2 branch, so MODIFIED. Hey, I tried to reproduce the bug on : Software version:4.2.0-0.0.master.20170901193740.git7900511.el7.centos I couldn't reproduce this bug, since the error was caused by race condition. Steps: 1. Configured cluster and VM with memory balloon 2. Migrate the VM back and forth several times 3. Changed the log mode of VDSM to DEBUG to catch the error. Yaniv, I couldn't reproduce the bug, can you reproduce the error on your environment, in DEBUG mode? (In reply to Shira Maximov from comment #6) > Hey, > > I tried to reproduce the bug on : > Software version:4.2.0-0.0.master.20170901193740.git7900511.el7.centos > > I couldn't reproduce this bug, since the error was caused by race condition. > > Steps: > 1. Configured cluster and VM with memory balloon > 2. Migrate the VM back and forth several times > 3. Changed the log mode of VDSM to DEBUG to catch the error. > > > Yaniv, I couldn't reproduce the bug, can you reproduce the error on your > environment, in DEBUG mode? It's oVirt system tests, not my environment. I can check if I see it. Doesn't seem to happen anymore - closing as CURRENTRELEASE. *** Bug 1467696 has been marked as a duplicate of this bug. *** |