Bug 1307160

Summary: VirtualDomain can log unnecessary error when probing nonexistent domain
Product: Red Hat Enterprise Linux 7
Reporter: Ken Gaillot <kgaillot>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: low
Docs Contact:
Priority: unspecified
Version: 7.2
CC: agk, cluster-maint, fdinitto, mnovacek
Target Milestone: rc
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version: resource-agents-3.9.5-69.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 00:01:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ken Gaillot 2016-02-12 21:14:21 UTC
Description of problem: Pacemaker startup probes of VirtualDomain resources can result in unnecessary and potentially confusing "Unable to determine emulator" log messages.

Version-Release number of selected component (if applicable): 3.9.5-54.el7_2.1


How reproducible: Reliably


Steps to Reproduce:
1. Call VirtualDomain's monitor action for a domain that does not exist and has not been seen before (i.e. no /run/resource-agents/VirtualDomain-$DOMAIN-emu.state file exists).
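
To confirm the precondition in step 1, the absence of the state file can be checked before running the monitor action. The following is a minimal sketch, not part of the agent: the function names emu_state_file and domain_is_unseen are hypothetical, and the optional directory argument exists only so the check can be tried against a scratch directory.

```shell
# Build the path of the emulator state file for a domain.
# $1 = domain name; $2 = directory (hypothetical override, defaults to
# the /run/resource-agents location named in step 1).
emu_state_file() {
    printf '%s/VirtualDomain-%s-emu.state\n' "${2:-/run/resource-agents}" "$1"
}

# Succeed only if the domain has not been seen before,
# i.e. no emulator state file exists for it.
domain_is_unseen() {
    [ ! -e "$(emu_state_file "$1" "${2:-}")" ]
}
```

For example, "domain_is_unseen lxc1 && echo ready" would print "ready" only when no state file exists for a domain named lxc1 (a name used here purely for illustration).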

Actual results: The agent logs "ERROR: Unable to determine emulator for $DOMAIN".

Expected results: No errors logged for successful "not running" monitor of nonexistent domain.


Additional info: The VirtualDomain resource agent's get_emulator function can be called by update_emulator_cache or pid_status. While the above error message may be useful with the pid_status call, it is unnecessary with update_emulator_cache. Perhaps get_emulator could just return a status, and the caller could print the message or not.
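
The suggestion above can be sketched roughly as follows. This is an illustration of the idea only, not the agent's actual code and not the eventual patch: the function bodies are stand-ins, log_err stands in for the agent's logging, and KNOWN_EMULATOR is a hypothetical variable used to simulate whether the emulator can be determined.

```shell
# Stand-in for the agent's error logging (hypothetical helper).
log_err() { echo "ERROR: $*" >&2; }

# Stand-in lookup: prints the emulator and returns 0 on success;
# prints nothing and returns 1 on failure. It never logs by itself.
get_emulator() {
    [ -n "${KNOWN_EMULATOR:-}" ] || return 1
    echo "$KNOWN_EMULATOR"
}

# Probe path: failing to find an emulator is expected for a
# nonexistent domain, so the failure is ignored silently.
update_emulator_cache() {
    emulator=$(get_emulator) || return 0
    echo "cached: $emulator"
}

# Status path: here the failure is worth reporting, so the caller logs it.
pid_status() {
    emulator=$(get_emulator) || {
        log_err "Unable to determine emulator for ${DOMAIN_NAME:-}"
        return 1
    }
    echo "checking pid for $emulator"
}
```

With KNOWN_EMULATOR empty, update_emulator_cache stays quiet while pid_status still reports the error, which is the split in behavior the suggestion asks for.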

Comment 3 Ken Gaillot 2016-03-18 20:36:35 UTC
I was unable to reproduce this with a KVM guest, but it does affect at least LXC. I was hoping to spare you the steps for setting up an LXC guest node :) but here is the reproducer:

1. Set up a pacemaker cluster with at least two cluster nodes (the cluster nodes may be VMs).

2. Make sure you have these prerequisites on all nodes:
* yum install rsync libvirt-daemon libvirt-daemon-driver-lxc libvirt-daemon-lxc libvirt-login-shell pacemaker-remote
* SELinux must be enabled (permissive or enforcing)
* libvirtd must be enabled and running
* root must be able to ssh without a password between all nodes

3. Download http://people.redhat.com/kgaillot/bz1307160/lxc_autogen.sh (newer version than the one supplied with RHEL 7.2 pacemaker-cts package)

4. Run "./lxc_autogen.sh -v" on each node to verify the local environment (should print the PID of libvirtd and no errors).

5. Create an LXC guest node: "./lxc_autogen.sh -g -a -m -s -c 1 && crm_resource --wait"

6. At this point, the guest node will have been probed on all nodes. Any nodes that aren't running the container will have logs like this in their /var/log/messages:
Mar 18 15:10:35 rhel7-2 VirtualDomain(container1)[4736]: ERROR: Unable to determine emulator for lxc1

7. To clean up afterwards: "./lxc_autogen.sh -R -s"
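
The log check in step 6 can be scripted. The sketch below counts matching lines in a log file; the function name count_emulator_errors is hypothetical, and the default path is the /var/log/messages location mentioned in step 6.

```shell
# Count "Unable to determine emulator" errors logged by VirtualDomain.
# $1 = log file to scan (defaults to /var/log/messages).
# Note: grep -c prints 0 and exits non-zero when nothing matches.
count_emulator_errors() {
    grep -c 'VirtualDomain.*ERROR: Unable to determine emulator' "${1:-/var/log/messages}"
}
```

On a node with the fix applied, running this after the probe would be expected to print 0.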

Comment 4 Oyvind Albrigtsen 2016-04-05 10:30:25 UTC
Working patch:
https://github.com/ClusterLabs/resource-agents/pull/787/files

Comment 6 michal novacek 2016-09-12 10:11:51 UTC
I'm unable to verify this bug with qemu or lxc in resource-agents-3.9.5-81, but the patch is simple enough.

I have verified that the patch is present as 

bz1307160-virtualdomain-fix-unnecessary-error-when-probing-nonexistent-domain.patch 

and that resource-agents can be built correctly.

Comment 8 errata-xmlrpc 2016-11-04 00:01:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html