Bug 1120826 - haproxy will not run, PCSD will not go online
Summary: haproxy will not run, PCSD will not go online
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Assignee: Chris Feist
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-17 19:30 UTC by John H Terpstra
Modified: 2015-04-13 22:26 UTC (History)
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-13 22:26:23 UTC


Attachments (Terms of Use)
Haproxy config file (446 bytes, text/plain)
2014-07-17 19:30 UTC, John H Terpstra
Earlier crm_report file (253.35 KB, application/octet-stream)
2014-07-17 20:20 UTC, John H Terpstra
crm_report - current - part A (needs to be concatenated with part B) (15.00 MB, application/octet-stream)
2014-07-17 20:21 UTC, John H Terpstra
crm_report - current - part B (needs to be concatenated with Part A) (7.74 MB, application/octet-stream)
2014-07-17 20:23 UTC, John H Terpstra
SOS Report (split part A) (18.00 MB, application/octet-stream)
2014-07-17 20:24 UTC, John H Terpstra
sosreport (split part B) (18.00 MB, application/octet-stream)
2014-07-17 20:25 UTC, John H Terpstra
SOS Report (split part C - final) (8.70 MB, application/octet-stream)
2014-07-17 20:25 UTC, John H Terpstra

Description John H Terpstra 2014-07-17 19:30:56 UTC
Created attachment 918802 [details]
Haproxy config file

Description of problem:
Haproxy is configured to enable RabbitMQ to operate in HA mode.  Haproxy can be started using systemctl, but fails when run via pacemaker. When pacemaker is started, lb-haproxy starts and is then shut down shortly after.  Disabling and then re-enabling the resource does not restart haproxy.

Version-Release number of selected component (if applicable):
pacemaker-1.1.10-21.el7_0.x86_64
haproxy-2.5-0.3.dev22.el7.x86_64
rabbitmq-server-3.1.5-6.3.el7ost.noarch

How reproducible:
haproxy.conf file is included. 

Steps to Reproduce:
1. Install haproxy.conf file in /etc/haproxy directory
2. Execute the following:
a) pcs resource create lb-haproxy systemd:haproxy op monitor start-delay=10s --clone
b) pcs resource create rabbit-vip IPaddr2 ip=$BROKER_VIP (set elsewhere)
c) pcs resource create msg-rabbit systemd:rabbitmq-server --clone
3. Run pcs status to monitor progress of cluster services
4. Attempt to restart lb-haproxy; it fails (the following steps demonstrate the issue):
a) pcs resource disable lb-haproxy-clone
b) pcs resource enable lb-haproxy-clone
c) pcs status

Actual results:
The clone set lb-haproxy-clone stops and cannot be restarted via the disable/enable sequence.

Expected results:
The disable/enable sequence is expected to restart lb-haproxy.

Additional info:
The crm_report and log files have not shed light on the potential cause(s).
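For reference, a clone set stuck in the Stopped state can be spotted mechanically in "pcs status" output. A minimal sketch; the status fragment below is illustrative, not captured from this cluster:

```shell
# Sample `pcs status` fragment (illustrative, not from this cluster).
cat > /tmp/pcs-status.txt <<'EOF'
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Stopped: [ rh7cntl1 rh7cntl2 rh7cntl3 ]
 Clone Set: msg-rabbit-clone [msg-rabbit]
     Started: [ rh7cntl1 rh7cntl2 rh7cntl3 ]
EOF

# Print the name of each clone set whose following member line says Stopped.
awk '/Clone Set:/ { name = $3 }
     /Stopped:/   { print name }' /tmp/pcs-status.txt
# prints: lb-haproxy-clone
```

In a live cluster the input would come from "pcs status" directly rather than a saved file.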

Comment 2 Chris Feist 2014-07-17 20:05:32 UTC
Does haproxy start on one node if you start with a fresh cluster and use this command:

pcs resource create lb-haproxy systemd:haproxy op monitor start-delay=10s

Comment 3 John H Terpstra 2014-07-17 20:07:37 UTC
Yes.

I also have the  crm_report and the sos report from the first node, but both files are too large to attach to this bug report.

Comment 4 John H Terpstra 2014-07-17 20:20:07 UTC
Created attachment 918820 [details]
Earlier crm_report file

This is an older file. An additional crm_report and current sosreport will also be attached.

Comment 5 John H Terpstra 2014-07-17 20:21:11 UTC
Created attachment 918821 [details]
crm_report - current - part A (needs to be concatenated with part B)

Comment 6 John H Terpstra 2014-07-17 20:23:06 UTC
Created attachment 918822 [details]
crm_report - current - part B (needs to be concatenated with Part A)

To reconstitute:
cat RHEL7HACluster.tar.bz2.a* > RHEL7HACluster.tar.bz2
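The split/concatenate round-trip can be demonstrated end to end; the file names below are hypothetical stand-ins for the actual attachment parts:

```shell
# Create a sample archive-sized file (stand-in for the real tarball).
dd if=/dev/urandom of=/tmp/sample.tar.bz2 bs=1024 count=5120 2>/dev/null

# Split into 2 MB pieces with an 'a*' suffix, as the attachments were.
split -b 2M /tmp/sample.tar.bz2 /tmp/sample.tar.bz2.a

# Reconstitute: shell glob expansion orders the parts lexically.
cat /tmp/sample.tar.bz2.a* > /tmp/rejoined.tar.bz2

# Verify the round-trip is byte-identical.
cmp /tmp/sample.tar.bz2 /tmp/rejoined.tar.bz2 && echo OK
```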

Comment 7 John H Terpstra 2014-07-17 20:24:09 UTC
Created attachment 918823 [details]
SOS Report (split part A)

Comment 8 John H Terpstra 2014-07-17 20:25:13 UTC
Created attachment 918824 [details]
sosreport (split part B)

Comment 9 John H Terpstra 2014-07-17 20:25:53 UTC
Created attachment 918825 [details]
SOS Report (split part C - final)

Comment 10 Chris Feist 2014-07-17 22:53:14 UTC
John,

Can you send me the output of 'service haproxy status' on all 3 nodes?

Also, I did see this potential issue:

haproxy is set to bind to 192.168.123.39 (for the rabbitmq frontend), but that IP (resource: rabbit-vip) will only be running on one node.  And I'm pretty sure you only want to run haproxy on one node as well.

So I would change the configuration to something like this (this will put rabbit-vip & lb-haproxy on the same node, and require that rabbit-vip start before the haproxy service starts):

a) pcs resource create rabbit-vip IPaddr2 ip=$BROKER_VIP --group haproxy_group
b) pcs resource create lb-haproxy systemd:haproxy op monitor start-delay=10s --group haproxy_group
c) pcs resource create msg-rabbit systemd:rabbitmq-server --clone

If this still doesn't work, can you post the output of 'pcs status' (on one node) and 'service haproxy status' (on all 3 nodes)?

Comment 11 Steve Reichard 2014-07-18 14:45:10 UTC
Currently only one service is under HAProxy and cluster control.


This experiment is attempting to add them one by one.  When they are all added, each service will have a separate VIP, so HAProxy will be partially used on each cluster system, and usage will shift as the VIPs relocate.

This is the recommended practice from Fabio's OSP how-to recipes.

I currently have a complete cluster configured that was deployed using Puppet, if you would like to see its values.

spr

Comment 12 Andrew Beekhof 2014-07-22 07:09:25 UTC
I see this in one of the configs:

        <meta_attributes id="lb-haproxy-clone-meta_attributes">
          <nvpair id="lb-haproxy-clone-meta_attributes-target-role" name="target-role" value="Stopped"/>
        </meta_attributes>

That would certainly prevent haproxy from being started (and cause pacemaker to stop haproxy if it found it running).

It also looks like rabbitmq is suffering from the systemd-start-returns-too-early issue that we now have a z-stream for.  I'd recommend an upgrade.

The symptom of that is a tight loop of:

Jul 16 22:45:26 [19028] rh7cntl1.rcbd.lab       lrmd:     info: log_execute: 	executing - rsc:msg-rabbit action:start call_id:110098
Jul 16 22:45:26 [19028] rh7cntl1.rcbd.lab       lrmd:     info: systemd_async_dispatch: 	Call to start passed: /org/freedesktop/systemd1/job/2894604
Jul 16 22:45:26 [19028] rh7cntl1.rcbd.lab       lrmd:     info: log_finished: 	finished - rsc:msg-rabbit action:start call_id:110098  exit-code:0 exec-time:348ms queue-time:0ms
Jul 16 22:45:26 [19028] rh7cntl1.rcbd.lab       lrmd:     info: pcmk_dbus_get_property: 	Calling: GetAll on org.freedesktop.systemd1
Jul 16 22:45:26 [19028] rh7cntl1.rcbd.lab       lrmd:     info: cancel_recurring_action: 	Cancelling operation msg-rabbit_status_60000
Jul 16 22:45:26 [19028] rh7cntl1.rcbd.lab       lrmd:     info: log_execute: 	executing - rsc:msg-rabbit action:stop call_id:110101
Jul 16 22:45:26 [19028] rh7cntl1.rcbd.lab       lrmd:     info: systemd_async_dispatch: 	Call to stop passed: /org/freedesktop/systemd1/job/2894708
Jul 16 22:45:26 [19028] rh7cntl1.rcbd.lab       lrmd:     info: log_finished: 	finished - rsc:msg-rabbit action:stop call_id:110101  exit-code:0 exec-time:86ms queue-time:0ms

Comment 13 Andrew Beekhof 2014-07-22 07:15:29 UTC
Specifically, it doesn't appear that it was ever set otherwise:

# for f in `ls -1 rh7cntl3/pengine/pe-input-*.bz2`; do bzcat $f | grep lb-haproxy-clone-meta_attributes-target-role | grep -v Stopped ; done | wc -l
       0
# for f in `ls -1 rh7cntl3/pengine/pe-input-*.bz2`; do bzcat $f | grep lb-haproxy-clone-meta_attributes-target-role | grep Stopped ; done | wc -l
    4000
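The same check can be run against the CIB XML directly rather than a grep loop. A sketch using an inline fragment; the real pe-input files are bzip2-compressed, so substitute "bzcat rh7cntl3/pengine/pe-input-NNN.bz2" for the saved snippet when checking an actual report:

```shell
# Minimal CIB fragment containing the offending meta attribute
# (illustrative copy of the config quoted in comment 12).
cat > /tmp/cib-snippet.xml <<'EOF'
<clone id="lb-haproxy-clone">
  <meta_attributes id="lb-haproxy-clone-meta_attributes">
    <nvpair id="lb-haproxy-clone-meta_attributes-target-role"
            name="target-role" value="Stopped"/>
  </meta_attributes>
</clone>
EOF

# Count occurrences of target-role=Stopped in the fragment.
grep -c 'target-role.*Stopped' /tmp/cib-snippet.xml
# prints: 1
```

Clearing the attribute should correspond to what 'pcs resource enable lb-haproxy-clone' does, since enable resets the target-role meta attribute.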

