Description of problem:
-----------------------------------------
This appears to be similar to the upstream nova bug attached to this BZ: queries against a domain that is being worked on asynchronously can result in hung clients and the inability to do anything further with that domain.

Version-Release number of selected component (if applicable):
-----------------------------------------
libvirt-1.2.8-16.el7_1.3.x86_64
libvirt-client-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-config-network-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-config-nwfilter-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-interface-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-lxc-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-network-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-nodedev-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-nwfilter-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-secret-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-driver-storage-1.2.8-16.el7_1.3.x86_64
libvirt-daemon-kvm-1.2.8-16.el7_1.3.x86_64
libvirt-python-1.2.8-7.el7_1.1.x86_64
openstack-nova-common-2014.1.5-31.el7ost.noarch
openstack-nova-compute-2014.1.5-31.el7ost.noarch

How reproducible:
-----------------------------------------
Appears to occur every few days under heavy query load. Logging appears to indicate that two separate threads made libvirt API calls against the same instance. Based on https://bugs.launchpad.net/nova/+bug/1254872, one of the calls has either hung completely or is taking a very long time to respond, causing the second API call to report this error message: "libvirtError: Timed out during operation: cannot acquire state change lock".

Actual results:
-----------------------------------------
Queries against a domain which is being worked on asynchronously can result in hung clients and the inability to do anything further with that domain.

Expected results:
-----------------------------------------
Queries against a domain which is being worked on asynchronously do not result in hung clients or the inability to do anything further with that domain.

Additional info:
-----------------------------------------
We've requested that the customer enable verbose libvirt logging to help identify the specific API calls that were last reported as successful, since it appears this can be caused by many factors.
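Once logs are available, a quick way to see when the failure occurred is to search them for the error text quoted above. The following is only a minimal sketch: the helper name find_lock_errors is hypothetical, and it assumes nothing about the log format beyond the error string appearing on a single line.

```python
# Error text quoted in this report; everything else here is illustrative.
LOCK_ERROR = "cannot acquire state change lock"


def find_lock_errors(lines):
    """Return (line_number, line) pairs for lines containing the lock error.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if LOCK_ERROR in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

The function can be fed an open file object, for example the nova-compute log or the libvirtd log, and the returned line numbers give a starting point for reading the surrounding entries.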
Assuming they are about to provide debug logs, here are the log filters to use:

1. In /etc/libvirt/libvirtd.conf, set these two config attributes:

   . . .
   log_filters="1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util"
   log_outputs="1:file:/var/log/libvirt/libvirtd.log"
   . . .

2. Restart libvirtd:

   $ systemctl restart libvirtd

3. Repeat the test.
4. Capture the libvirt logs and attach them as plain text to the bug.
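Before restarting libvirtd, it may help to confirm that both attributes are actually set and not commented out. The sketch below is an illustration only: the helper names parse_libvirtd_conf and missing_log_settings are hypothetical, and it assumes the simple key = "value" line format shown in step 1.

```python
# The two attributes named in step 1 of the instructions.
REQUIRED_KEYS = ("log_filters", "log_outputs")


def parse_libvirtd_conf(text):
    """Parse simple `key = "value"` lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_log_settings(text):
    """Return the required logging keys that are unset or empty."""
    settings = parse_libvirtd_conf(text)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]
```

To use it, read /etc/libvirt/libvirtd.conf into a string and pass it to missing_log_settings; an empty list means both attributes are present and uncommented.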