Bug 1449967
| Summary: | VMs intermittently go into not responding status | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Mahesh <mahesh.vidhyadharan> |
| Component: | Core | Assignee: | Dan Kenigsberg <danken> |
| Status: | CLOSED DUPLICATE | QA Contact: | meital avital <mavital> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.20.0 | CC: | bugs, mahesh.vidhyadharan, michal.skrivanek, tjelinek |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-18 07:12:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Mahesh
2017-05-11 09:25:40 UTC
Typically this indicates an overloaded host. Please add some performance indicators: CPU usage, load, sar, iotop, etc.

Created attachment 1279016 [details]
sar logs for 3 ovirt hosts
some sar logs.
Hi, thanks for getting back to me. As I mentioned in my post, the host doesn't seem to be loaded at all. oVirt shows about 45% memory usage per host, CPU usually at 1%, and network at 0%. These VMs are just powered on. No special services are running, just idling Windows 7 VMs, so there's no reason for the host to be overloaded. I've attached some sar logs for the day for each host anyway.

Hey, OK, in that case we need some more logs. Can you please provide the full VDSM, libvirt, qemu and engine logs here?

Created attachment 1279243 [details]
vdsm log node01
Created attachment 1279245 [details]
vdsm log node02
Created attachment 1279246 [details]
vdsm log node03
Created attachment 1279249 [details]
ovirt engine logs
I noticed there aren't any libvirt logs in /var/log/vdsm/ on any of the hosts. The last time the host went to not responding was 02:52:02 AM, and there have been intermittent events until 10:46:07 according to the engine events.
The libvirtd service status shows the following:
libvirtd.service - Virtualization daemon
Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/libvirtd.service.d
└─unlimited-core.conf
Active: active (running) since Fri 2017-05-05 15:14:39 BST; 1 weeks 3 days ago
Docs: man:libvirtd(8)
http://libvirt.org
Main PID: 3996 (libvirtd)
CGroup: /system.slice/libvirtd.service
└─3996 /usr/sbin/libvirtd --listen
May 16 05:28:10 node03.loc libvirtd[3996]: Cannot start job (query, none) for domain Win7_x64test14; current job is (query, none) owned by (46498 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (45s, 0s)
May 16 05:28:10 node03.loc libvirtd[3996]: Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
May 16 05:28:41 node03.loc libvirtd[3996]: Cannot start job (query, none) for domain Win7_x64test14; current job is (query, none) owned by (46498 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (76s, 0s)
May 16 05:28:41 node03.loc libvirtd[3996]: Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
May 16 05:30:25 node03.loc libvirtd[3996]: Cannot start job (query, none) for domain Win7_x64test19; current job is (query, none) owned by (4001 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
May 16 05:30:25 node03.loc libvirtd[3996]: Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
May 16 05:32:35 node03.loc libvirtd[3996]: Cannot start job (query, none) for domain Win7_x64test16; current job is (query, none) owned by (4002 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
May 16 05:32:35 node03.loc libvirtd[3996]: Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
May 16 05:33:05 node03.loc libvirtd[3996]: Cannot start job (query, none) for domain Win7_x64test16; current job is (query, none) owned by (4002 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (60s, 0s)
May 16 05:33:05 node03.loc libvirtd[3996]: Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
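
For anyone triaging similar output, here is a minimal sketch that summarises the "Cannot start job" lines above: for each domain it reports the longest time the job was held and which libvirt call owned it. The regular expression is inferred only from the lines pasted in this comment, and `longest_holds` and `SAMPLE` are hypothetical names for illustration, so treat it as a starting point rather than a general libvirt log parser.

```python
import re

# Pattern inferred from the "Cannot start job" lines in this report only;
# other libvirt versions may format these messages differently.
JOB_RE = re.compile(
    r"Cannot start job \(\w+, \w+\) for domain (?P<domain>[^;\s]+); "
    r"current job is \(\w+, \w+\) "
    r"owned by \((?P<tid>\d+) (?P<owner>\w+), .*?\) "
    r"for \((?P<held>\d+)s, \d+s\)"
)


def longest_holds(lines):
    """Return {domain: (max seconds the job was held, owning API call)}."""
    worst = {}
    for line in lines:
        match = JOB_RE.search(line)
        if match is None:
            continue  # not a "Cannot start job" line
        domain = match.group("domain")
        held = int(match.group("held"))
        if domain not in worst or held > worst[domain][0]:
            worst[domain] = (held, match.group("owner"))
    return worst


# Three lines copied from the libvirtd status output in this report.
SAMPLE = [
    "May 16 05:28:10 node03.loc libvirtd[3996]: Cannot start job (query, none) for domain Win7_x64test14; current job is (query, none) owned by (46498 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (45s, 0s)",
    "May 16 05:28:41 node03.loc libvirtd[3996]: Cannot start job (query, none) for domain Win7_x64test14; current job is (query, none) owned by (46498 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (76s, 0s)",
    "May 16 05:30:25 node03.loc libvirtd[3996]: Cannot start job (query, none) for domain Win7_x64test19; current job is (query, none) owned by (4001 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)",
]

if __name__ == "__main__":
    for domain, (held, owner) in sorted(longest_holds(SAMPLE).items()):
        print(f"{domain}: job held up to {held}s by {owner}")
```

Feeding it the full journal output for libvirtd instead of `SAMPLE` should show whether the same call keeps holding the job across many VMs, which is the pattern visible in the excerpt above.
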
This looks like the same issue as bug 1419856, which will be fixed in 4.1.2. I'm closing this bug as a duplicate of that one. If you still face the same issue after updating to 4.1.2, please reopen this bug and we can keep investigating.

*** This bug has been marked as a duplicate of bug 1419856 ***