Bug 1234276

Summary: When running the ocf:heartbeat:pgsql resource in enforcing mode, the systemd-logind process is unable to send a D-Bus message to a cluster service
Product: Red Hat Enterprise Linux 7
Reporter: Naoya Hashimoto <nhashimo>
Component: selinux-policy
Assignee: Miroslav Grepl <mgrepl>
Status: CLOSED ERRATA
QA Contact: Milos Malik <mmalik>
Severity: high
Priority: unspecified
Version: 7.1
CC: agk, cluster-maint, ksrot, lvrabec, mailinglists, mgrepl, mmalik, nhashimo, plautrba, pvrabec, ssekidde
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: selinux-policy-3.13.1-48.el7
Doc Type: Bug Fix
Clones: 1249430 (view as bug list)
Last Closed: 2015-11-19 10:37:39 UTC
Type: Bug

Attachments:
patch for ocf:heartbeat:pgsql

Description Naoya Hashimoto 2015-06-22 09:18:24 UTC
Description of problem:
Running a PostgreSQL (master/slave) replicated cluster built on the Red Hat Enterprise Linux High Availability Add-On with the ocf:heartbeat:pgsql resource agent fails in SELinux enforcing mode.

Version-Release number of selected component (if applicable):

 - OS: RHEL-7.1 (x86_64)
 - Kernel: 3.10.0-229.el7.x86_64
 - HA:
    corosync-2.3.4-4.el7_1.1.x86_64
    pacemaker-1.1.12-22.el7_1.2.x86_64
    pcs-0.9.137-13.el7_1.2.x86_64
    resource-agents-3.9.5-40.el7
 - DB: postgresql-server-9.2.10-2.el7_1.x86_64

How reproducible:
100% when running in SELinux enforcing mode.


Steps to Reproduce:
1. Install the RHEL High Availability Add-On and create a cluster.
Cf. <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Overview/>
 
2. Create a multi-state resource (ocf:heartbeat:pgsql) and a resource group (ocf:heartbeat:IPaddr2)

# configure property, resource defaults policy
pcs property set no-quorum-policy="ignore"
pcs resource defaults resource-stickiness="INFINITY"
pcs resource defaults migration-threshold="1"

# create resource (vip)
cib_file=cluster_pgsql.xml
pcs -f ${cib_file} resource create vip-master IPaddr2 \
   ip="192.168.100.100" \
   nic="eth0" \
   cidr_netmask="24"

pcs -f ${cib_file} resource create vip-rep IPaddr2 \
   ip="192.168.102.100" \
   nic="eth2" \
   cidr_netmask="24" \
   meta migration-threshold="0"

# create resource (postgresql)
pcs -f ${cib_file} resource create pgsql ocf:heartbeat:pgsql \
   rep_mode="async" \
   node_list="db01 db02" \
   restore_command="cp %p /var/lib/pgsql/pg_archive/%f" \
   primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
   master_ip="192.168.102.100" \
   restart_on_promote='true' \
   op start   timeout="60s" interval="0s"  on-fail="restart" \
   op monitor timeout="60s" interval="4s" on-fail="restart" \
   op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" \
   op promote timeout="60s" interval="0s"  on-fail="restart" \
   op demote  timeout="60s" interval="0s"  on-fail="stop" \
   op stop    timeout="60s" interval="0s"  on-fail="block" \
   op notify  timeout="60s" interval="0s"

# configure master resource
pcs -f ${cib_file} resource master msPostgresql pgsql \
   master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

# create group of ip address resources
pcs -f ${cib_file} resource group add master-group vip-master vip-rep

# configure colocation/ordering constraints
pcs -f ${cib_file} constraint colocation add master-group with master msPostgresql score=INFINITY
pcs -f ${cib_file} constraint order promote msPostgresql then start master-group symmetrical=false score=INFINITY
pcs -f ${cib_file} constraint order demote msPostgresql then stop master-group symmetrical=false score=-INFINITY
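
# optionally, review the staged CIB before pushing it (a sanity check;
# both commands read the staged file via -f)
pcs -f ${cib_file} resource show
pcs -f ${cib_file} constraint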

# update from raw xml
pcs cluster cib-push ${cib_file}

3. Verify that the resources are running:

pcs status


Actual results:

[root@db01 pcs]# pcs status
Cluster name: pgha
Last updated: Sat Jun 20 10:53:09 2015
Last change: Sat Jun 20 10:47:54 2015
Stack: corosync
Current DC: db02 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured


Online: [ db01 db02 ]

Full list of resources:

 vm01_fence        (stonith:fence_xvm):        Started db02
 vm02_fence        (stonith:fence_xvm):        Started db02
 Master/Slave Set: msPostgresql [pgsql]
     Stopped: [ db01 db02 ]
 Resource Group: master-group
     vip-master        (ocf::heartbeat:IPaddr2):        Stopped
     vip-rep        (ocf::heartbeat:IPaddr2):        Stopped

Failed actions:
    pgsql_start_0 on db01 'unknown error' (1): call=682, status=Timed Out, exit-reason='none', last-rc-change='Sat Jun 20 10:47:57 2015', queued=0ms, exec=60007ms
    pgsql_start_0 on db02 'unknown error' (1): call=1527, status=complete, exit-reason='My data may be inconsistent. You have to remove /var/lib/pgsql/tmp/PGSQL.lock file to force start.', last-rc-change='Sat Jun 20 10:47:56 2015', queued=0ms, exec=50279ms


PCSD Status:
  db01: Online
  db02: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
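
Note: after a failed start, the agent leaves behind the lock file named in the exit-reason above. Recovery sketch, per the agent's own message (pcs resource cleanup clears the recorded failures so the start is retried):

# rm -f /var/lib/pgsql/tmp/PGSQL.lock
# pcs resource cleanup pgsql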



Expected results:

The following resources should start successfully:

 - Multi-state master resource called msPostgresql (ocf:heartbeat:pgsql)
 - Resource group called master-group containing vip-master and vip-rep (ocf:heartbeat:IPaddr2)

The output below shows the expected state when the resources start successfully.

[root@db01 pcs]# pcs status
Cluster name: pgha
Last updated: Sat Jun 20 10:46:33 2015
Last change: Sat Jun 20 10:40:05 2015
Stack: corosync
Current DC: db02 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured


Online: [ db01 db02 ]

Full list of resources:

 vm01_fence        (stonith:fence_xvm):        Started db02
 vm02_fence        (stonith:fence_xvm):        Started db02
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ db02 ]
     Slaves: [ db01 ]
 Resource Group: master-group
     vip-master        (ocf::heartbeat:IPaddr2):        Started db02
     vip-rep        (ocf::heartbeat:IPaddr2):        Started db02

PCSD Status:
  db01: Online
  db02: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Additional info:

I checked the following configurations to determine whether the problem is specific to the ocf:heartbeat:pgsql resource agent and whether each can run in SELinux enforcing mode.

 - Running PostgreSQL Daemon: OK (systemctl start postgresql)
 - Running Streaming Replication: OK (manually configured)
 - Running systemd:postgresql: OK (via resource agent; see the note after this list)
 - Running ocf:heartbeat:pgsql: FAILED (via resource agent)
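
For reference, the systemd-based check above can be reproduced with a systemd-class Pacemaker resource (hypothetical resource name; any name works):

# pcs resource create pgsql-systemd systemd:postgresql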

The output below shows the SELinux denials collected while running the same resource in enforcing mode, and the absence of denials in permissive mode.

* Enforcing mode

[root@db01 pcs]# getenforce
Enforcing

[root@db01 pcs]# ausearch -m avc -m user_avc -m selinux_err -i
----
type=USER_AVC msg=audit(06/20/2015 10:47:57.209:1806512) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='avc:  received setenforce notice (enforcing=1)  exe=/usr/lib/systemd/systemd sauid=root hostname=? addr=? terminal=?'
----
type=USER_AVC msg=audit(06/20/2015 10:47:57.219:1806513) : pid=597 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.601582 spid=28847 tpid=3376 scontext=system_u:system_r:systemd_logind_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=dbus  exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?'
----
type=USER_AVC msg=audit(06/20/2015 10:48:22.324:1806520) : pid=597 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.601583 spid=28847 tpid=3430 scontext=system_u:system_r:systemd_logind_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=dbus  exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?'
----
type=USER_AVC msg=audit(06/20/2015 10:48:47.381:1806527) : pid=597 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.601584 spid=28847 tpid=3448 scontext=system_u:system_r:systemd_logind_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=dbus  exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?'
----
type=USER_AVC msg=audit(06/20/2015 10:48:57.264:1806531) : pid=597 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.601585 spid=28847 tpid=3542 scontext=system_u:system_r:systemd_logind_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=dbus  exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?'
----
type=USER_AVC msg=audit(06/20/2015 10:49:22.401:1806538) : pid=597 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.601586 spid=28847 tpid=3617 scontext=system_u:system_r:systemd_logind_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=dbus  exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?'
----
type=USER_AVC msg=audit(06/20/2015 10:49:47.533:1806545) : pid=597 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.601587 spid=28847 tpid=3704 scontext=system_u:system_r:systemd_logind_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=dbus  exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?'

* permissive mode

[root@db01 pcs]# getenforce
Permissive

[root@db01 pcs]# pcs status
Cluster name: pgha
Last updated: Sat Jun 20 10:46:33 2015
Last change: Sat Jun 20 10:40:05 2015
Stack: corosync
Current DC: db02 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured


Online: [ db01 db02 ]

Full list of resources:

 vm01_fence        (stonith:fence_xvm):        Started db02
 vm02_fence        (stonith:fence_xvm):        Started db02
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ db02 ]
     Slaves: [ db01 ]
 Resource Group: master-group
     vip-master        (ocf::heartbeat:IPaddr2):        Started db02
     vip-rep        (ocf::heartbeat:IPaddr2):        Started db02

PCSD Status:
  db01: Online
  db02: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

[root@db01 pcs]# ausearch -m avc -m user_avc -m selinux_err -i
<no matches>

Comment 2 Naoya Hashimoto 2015-06-23 01:36:16 UTC
I added mmalik to Cc: because he helped me process and analyze the SELinux denial messages.
As a temporary workaround, he suggested creating and loading a local SELinux policy module as follows; with it in place, I succeeded in running the multi-state ocf:heartbeat:pgsql resource.


* create a local policy module

# cat additional-rhha.te
policy_module(additional-rhha,1.0)

require {
  type systemd_logind_t;
  type cluster_t;
  class dbus { send_msg };
}

allow systemd_logind_t cluster_t : dbus { send_msg };
allow cluster_t systemd_logind_t : dbus { send_msg };

* install the packages needed to compile the module into binary form

# yum -y install selinux-policy-devel policycoreutils-devel

* Compile the local policy module

# make -f /usr/share/selinux/devel/Makefile 

* load the compiled policy module to activate its rules (the module persists across reboots)

# semodule -i additional-rhha.pp
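
To confirm the module is loaded:

# semodule -l | grep additional-rhha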

If you need to unload the module and deactivate its rules, remove it with the -r option:
# semodule -r additional-rhha
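
As an alternative to writing the .te file by hand, an equivalent module can be generated straight from the audit log with the standard audit2allow tooling (a sketch; the module name is arbitrary):

# ausearch -m avc -m user_avc | audit2allow -M additional-rhha
# semodule -i additional-rhha.pp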

Comment 3 Milos Malik 2015-06-23 07:29:29 UTC
The insufficiency is in selinux-policy rather than in resource-agents.

Comment 4 Naoya Hashimoto 2015-06-25 03:15:13 UTC
Created attachment 1042922 [details]
patch for ocf:heartbeat:pgsql

Comment 5 Naoya Hashimoto 2015-06-26 03:59:37 UTC
(In reply to Naoya Hashimoto from comment #4)
> Created attachment 1042922 [details]
> patch for ocf:heartbeat:pgsql

After applying the attached patch, I succeeded in running the multi-state resource in SELinux enforcing mode, following the same instructions as before.
The patch makes the ocf:heartbeat:pgsql agent run commands through /sbin/runuser instead of su; a sketch of the change is below, and the details are in attachment 1042922.
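
The substitution has roughly this shape (a hypothetical sketch, not the actual hunk; $pgdba stands in for the agent's database-user variable). su goes through a PAM stack that includes pam_systemd, which performs the D-Bus exchange with systemd-logind that SELinux denies, while runuser's default PAM configuration does not register a login session:

-    su $pgdba -c "$command"       # triggers pam_systemd -> D-Bus chat with logind
+    runuser $pgdba -c "$command"  # no logind session registration, no denial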

I believe applying the patch is another option for running the multi-state ocf:heartbeat:pgsql resource, instead of configuring a local policy module.

* show selinux mode

[root@db01 ~]# getenforce 
Enforcing

* apply patch
cd /usr/lib/ocf/resource.d/heartbeat/
patch < pgsql.save

* create a multi-state resource (ocf:heartbeat:pgsql) and a resource group (ocf:heartbeat:IPaddr2)

* verify pcs status

[root@db01 ~]# pcs status
Cluster name: pgha
Last updated: Fri Jun 26 12:56:14 2015
Last change: Thu Jun 25 11:49:31 2015
Stack: corosync
Current DC: db01 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured

Online: [ db01 db02 ]

Full list of resources:

 vm01_fence	(stonith:fence_xvm):	Started db01 
 vm02_fence	(stonith:fence_xvm):	Started db02 
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ db01 ]
     Slaves: [ db02 ]
 Resource Group: master-group
     vip-master	(ocf::heartbeat:IPaddr2):	Started db01 
     vip-rep	(ocf::heartbeat:IPaddr2):	Started db01 

PCSD Status:
  db01: Online
  db02: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

* verify the state of postgresql process, ip, streaming replication

[root@db01 ~]# ps awux | grep [p]ostgres
postgres 27110  0.0  0.9 232144  9320 ?        S    Jun25   0:22 /usr/bin/postgres -D /var/lib/pgsql/data -c config_file=/var/lib/pgsql/data/postgresql.conf
postgres 27148  0.0  0.1 189760  1524 ?        Ss   Jun25   0:00 postgres: logger process   
postgres 27150  0.0  0.1 232144  1672 ?        Ss   Jun25   0:00 postgres: checkpointer process   
postgres 27151  0.0  0.1 232144  1944 ?        Ss   Jun25   0:00 postgres: writer process   
postgres 27152  0.0  0.1 232144  1440 ?        Ss   Jun25   0:00 postgres: wal writer process   
postgres 27153  0.0  0.2 233008  2920 ?        Ss   Jun25   0:01 postgres: autovacuum launcher process   
postgres 27154  0.0  0.1 191856  1336 ?        Ss   Jun25   0:00 postgres: archiver process   
postgres 27155  0.0  0.1 191996  1728 ?        Ss   Jun25   0:04 postgres: stats collector process   
postgres 27621  0.0  0.2 232992  2876 ?        Ss   Jun25   0:10 postgres: wal sender process postgres 192.168.102.102(56684) streaming 0/30000E0

[root@db01 ~]# ip a s eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:59:09:ed brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.101/32 brd 192.168.100.101 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.100.100/24 scope global eth0
       valid_lft forever preferred_lft forever

[root@db01 ~]# ip a s eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:5e:fb:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.102.101/24 brd 192.168.102.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet 192.168.102.100/24 brd 192.168.102.255 scope global secondary eth2
       valid_lft forever preferred_lft forever

[root@db01 ~]# su -l postgres -c "psql -x -c 'select * from pg_stat_replication'"
-[ RECORD 1 ]----+------------------------------
pid              | 27621
usesysid         | 10
usename          | postgres
application_name | db02
client_addr      | 192.168.102.102
client_hostname  | 
client_port      | 56684
backend_start    | 2015-06-25 11:49:29.330045+09
state            | streaming
sent_location    | 0/30000E0
write_location   | 0/30000E0
flush_location   | 0/30000E0
replay_location  | 0/30000E0
sync_priority    | 0
sync_state       | async

Comment 6 Naoya Hashimoto 2015-07-30 01:42:43 UTC
The patch I attached to fix the bug has been merged upstream.
Cf. <https://github.com/ClusterLabs/resource-agents/commit/13c3f5a741fb6fe3307ceb9f29e6e5aced8c3511>
Should I open a new ticket to request a backport of the patch?

Comment 7 Miroslav Grepl 2015-08-04 15:02:50 UTC
It looks like we already have this in 7.2:

systemd_dbus_chat_logind(cluster_t)
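
For reference, that interface should expand to roughly the same rules as the local module in comment 2 (a sketch based on the usual bidirectional dbus chat pattern; the exact definition is in the distribution policy sources):

allow cluster_t systemd_logind_t:dbus send_msg;
allow systemd_logind_t cluster_t:dbus send_msg;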

Comment 8 Miroslav Grepl 2015-08-07 10:13:46 UTC
Could you test it with the latest RHEL-7.2 builds?

Comment 9 Lukas Vrabec 2015-08-18 10:33:23 UTC
Any update here?

Comment 12 Sam McLeod 2015-09-04 05:27:42 UTC
Any update on this bug?

Comment 14 Miroslav Grepl 2015-09-04 07:42:04 UTC
(In reply to Sam McLeod from comment #12)
> Any update on this bug?

The fix will be a part of RHEL-7.2.

Comment 15 Lukas Vrabec 2015-09-04 08:57:13 UTC
commit 822257ce3898071f9589fd80b57f209f9de845d2
Author: Miroslav Grepl <mgrepl>
Date:   Wed Jan 28 08:43:53 2015 +0100

    Allow cluster domain to dbus chat with systemd-logind.

Comment 19 errata-xmlrpc 2015-11-19 10:37:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2300.html