Bug 1390658

Summary: false-positive monitoring operation result of ocf::pacemaker:ping resource
Product: Red Hat Enterprise Linux 7 Reporter: Josef Zimek <pzimek>
Component: resource-agents Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED NOTABUG QA Contact: cluster-qe <cluster-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: agk, cluster-maint, fdinitto
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-02-08 13:43:37 UTC Type: Bug

Description Josef Zimek 2016-11-01 15:40:11 UTC
Description of problem:

According to the resource description:
"Every time the monitor action is run, this resource agent records (in the CIB) the current number of nodes the host can connect to using the system fping (preferred) or ping tool."


However, the resource does not ping the configured host. It starts even with a non-reachable IP address:


# pcs resource create pingGW ping host_list="1.1.1.2" op monitor interval=5 timeout=5


pcs status shows the resource started successfully:

pingGW	(ocf::pacemaker:ping):	Started virt-026



# tcpdump -n host 1.1.1.2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes


^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel



Now let's update the ping resource to use an IP address that actually exists:

# ping 10.34.70.4
PING 10.34.70.4 (10.34.70.4) 56(84) bytes of data.
64 bytes from 10.34.70.4: icmp_seq=1 ttl=255 time=1.87 ms
64 bytes from 10.34.70.4: icmp_seq=2 ttl=255 time=0.709 ms
^C
--- 10.34.70.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.709/1.290/1.871/0.581 ms



# tcpdump -n host 10.34.70.4

15:57:15.811220 IP 10.34.70.145 > 10.34.70.4: ICMP echo request, id 21202, seq 1, length 64
15:57:15.812744 ARP, Request who-has 10.34.70.145 tell 10.34.70.4, length 46
15:57:15.812807 ARP, Reply 10.34.70.145 is-at 1a:00:00:00:00:12, length 28
15:57:15.813069 IP 10.34.70.4 > 10.34.70.145: ICMP echo reply, id 21202, seq 1, length 64
15:57:16.813502 IP 10.34.70.145 > 10.34.70.4: ICMP echo request, id 21202, seq 2, length 64
15:57:16.814074 IP 10.34.70.4 > 10.34.70.145: ICMP echo reply, id 21202, seq 2, length 64
15:57:21.815798 ARP, Request who-has 10.34.70.4 tell 10.34.70.145, length 28
15:57:21.816159 ARP, Reply 10.34.70.4 is-at 10:60:4b:a1:0d:d6, length 46




# pcs resource update pingGW host_list=10.34.70.4


Resource is still started:

pingGW	(ocf::pacemaker:ping):	Started virt-026


but no pings are actually sent:

# tcpdump -n host 10.34.70.4
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes


^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel



Now let's block this IP address via iptables:

# iptables -A INPUT -s 10.34.70.4 -j DROP

# ping -w1 -c5 10.34.70.4
PING 10.34.70.4 (10.34.70.4) 56(84) bytes of data.

--- 10.34.70.4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms


Resource is still started:

 pingGW	(ocf::pacemaker:ping):	Started virt-026




Version-Release number of selected component (if applicable):

resource-agents-3.9.5-54.el7_2.10.x86_64



Actual results:

The monitoring operation does not appear to take place; otherwise the resource would fail.


Expected results:

1. The resource does not start if it is created with a non-reachable IP address
2. Inability to ping the configured IP address (monitoring failure) causes the monitoring operation to fail, followed by the on-fail action

Comment 2 Oyvind Albrigtsen 2016-11-11 14:15:51 UTC
You need to set failure_score for the resource to fail when the host is unreachable:
# pcs resource describe ping
...
  failure_score: Resource is failed if the score is less than failure_score.
                 Default never fails.
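
As a rough illustration of that rule, here is a minimal shell sketch; it is not the agent's actual code. The helper name ping_would_fail is hypothetical, and the sketch assumes that the score is the number of reachable hosts multiplied by the multiplier parameter and that the default failure_score is 0. The pcs command in the trailing comment would set failure_score=1 on the pingGW resource from this report, so the monitor should fail once no host in host_list answers.

```shell
# Hypothetical helper, not part of the ping agent: succeeds (returns 0)
# when the documented rule "score < failure_score" says the resource
# would be marked failed.
# Assumption: score = reachable hosts * multiplier.
ping_would_fail() {
    reachable=$1             # hosts from host_list that answered
    failure_score=${2:-0}    # assumed default 0, i.e. never fails
    multiplier=${3:-1}       # assumed default multiplier of 1
    score=$((reachable * multiplier))
    [ "$score" -lt "$failure_score" ]
}

# No host reachable, failure_score left unset: not treated as failed.
ping_would_fail 0 || echo "default: resource stays started"

# No host reachable, failure_score=1: treated as failed.
ping_would_fail 0 1 && echo "failure_score=1: monitor fails"

# To apply this to the resource from this report (needs a running cluster):
#   pcs resource update pingGW failure_score=1
```

With a single entry in host_list and a multiplier of 1, failure_score=1 is the smallest value that makes an unreachable host count as a failure; with several hosts, a higher value would require more of them to be reachable.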