Bug 2003732 - pacemaker resources are missing the "z" flag for most of the rw bind-mounts in containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z1
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: OSP Team
QA Contact: dabarzil
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-13 14:26 UTC by Cédric Jeanneret
Modified: 2021-12-09 20:41 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.5.1-2.20210603174824.el8ost.10
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-09 20:41:22 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Launchpad 1943459 0 None None None 2021-09-13 14:28:03 UTC
OpenStack gerrit 808964 0 None NEW [TRAIN-ONLY] Ensure OVN directory content is podman-compatible 2021-09-14 14:53:59 UTC
Red Hat Issue Tracker OSP-9549 0 None None None 2021-11-17 09:54:22 UTC
Red Hat Product Errata RHBA-2021:5067 0 None None None 2021-12-09 20:41:46 UTC

Description Cédric Jeanneret 2021-09-13 14:26:22 UTC
While investigating #1997351, we found that most of the pacemaker-managed containers lack the "z" flag on their read-write bind mounts.

Although this mostly works, thanks to the labelling work done within tripleo-heat-templates and host_prep_tasks and to other containers mounting the same locations with the correct options, the situation is dangerous:

- depending on the start order, a bundle might lock the data for itself ("z" allows the same location to be mounted in multiple containers)
- depending on external actions, a location might see its label changed, which would prevent the bundle from starting properly

We must therefore ensure all the proper options are set without breaking the system (so no "z" flag for /dev, /sys and other sensitive locations).
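
As a rough illustration of the check described above, the following Python sketch flags read-write bind mounts that lack the "z" option while skipping sensitive host paths. The function name, the SENSITIVE_PREFIXES tuple and the sample volume strings are assumptions made for this example; they are not taken from tripleo-heat-templates.

```python
# Hypothetical helper, not part of tripleo-heat-templates.
# Assumption: volumes are given as "source:destination[:options]" strings,
# and only /dev and /sys are treated as sensitive here (extend as needed).
SENSITIVE_PREFIXES = ("/dev", "/sys")


def missing_z_flag(volumes):
    """Return the rw bind mounts that do not carry the "z" option."""
    flagged = []
    for spec in volumes:
        parts = spec.split(":")
        source = parts[0]
        options = parts[2].split(",") if len(parts) > 2 else []
        if "ro" in options:
            continue  # read-only mounts are out of scope
        if any(source == p or source.startswith(p + "/")
               for p in SENSITIVE_PREFIXES):
            continue  # sensitive host locations must not be relabelled
        if "z" not in options:
            flagged.append(spec)
    return flagged
```

Running it over the volume list of each bundle would give a quick list of mounts to review; whether a given path is safe to relabel still needs a human decision.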

This is the "Red Hat" version of https://bugs.launchpad.net/tripleo/+bug/1943459

We must ensure this patch lands in 16.2.1.

Comment 2 Cédric Jeanneret 2021-09-13 14:49:36 UTC
Steps to verify:
- get a 16.2 OVN deploy env
- *during* controller deploy, connect to that controller
- check for the existence of /var/lib/openvswitch/ovn directory, as well as its label (ls -lZd /var/lib/openvswitch/ovn)
- if directory exists, change its type: chcon -t openvswitch_var_lib_t /var/lib/openvswitch/ovn
- follow /var/log/audit/audit.log, and ensure you don't see any AVC related to the OVN service, especially when ovn-dbs-bundle is started
- you can also double-check that there are no errors in /var/log/containers/stdouts/ovn-dbs-bundle.log
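
The audit log check in the steps above can be sketched in Python as follows. This is only an illustration that mirrors a case-insensitive search for "avc" followed by "ovn" on the same line; the function name is hypothetical.

```python
import re

# Case-insensitive match for "avc" followed later on the same line by "ovn".
AVC_OVN = re.compile(r"avc.*ovn", re.IGNORECASE)


def ovn_avc_lines(lines):
    """Return the audit log lines that look like OVN-related AVC records."""
    return [line.rstrip("\n") for line in lines if AVC_OVN.search(line)]


# Usage on a controller (needs read access to the audit log):
#   with open("/var/log/audit/audit.log") as handle:
#       hits = ovn_avc_lines(handle)
#   print(hits or "no OVN-related AVC found")
```

An empty result is the expected outcome once the fix is in place.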

Comment 3 Cédric Jeanneret 2021-09-14 14:54:00 UTC
After some discussions, here's the plan:

Adding the "z" flag will create an outage during day-2 operations, and we don't want that.

So we target https://review.opendev.org/c/openstack/puppet-tripleo/+/808774 for wallaby/osp-17 only, and use a train-only patch that changes the SELinux context during upgrade_tasks (the patch linked to this bug).

This same train-only patch will also be added to 16.1.x (see #1997351 for more details).

Comment 4 Cédric Jeanneret 2021-09-16 15:54:43 UTC
Note that rhos-16.2-patches has a squashed patch, merging those two:
- https://review.opendev.org/c/openstack/tripleo-heat-templates/+/808964
- https://review.opendev.org/c/openstack/tripleo-heat-templates/+/809427

Comment 9 dabarzil 2021-10-06 10:19:23 UTC
No AVC related to the OVN service:

[stack@undercloud-0 ~]$ rpm -qa|grep openstack-tripleo-heat-templates-11.5.1-2
openstack-tripleo-heat-templates-11.5.1-2.20210603174824.el8ost.10.noarch

[root@controller-0 ~]# grep -i "avc.*ovn" /var/log/audit/audit.log
[root@controller-0 ~]# 


[root@controller-0 ~]# cat /var/log/containers/stdouts/ovn-dbs-bundle.log
2021-10-06T08:09:24.962629445+00:00 stderr F + sudo -E kolla_set_configs
2021-10-06T08:09:24.975549467+00:00 stderr F sudo: unable to send audit message: Operation not permitted
2021-10-06T08:09:25.055352956+00:00 stderr F INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
2021-10-06T08:09:25.055726004+00:00 stderr F INFO:__main__:Validating config file
2021-10-06T08:09:25.055842771+00:00 stderr F INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
2021-10-06T08:09:25.055885495+00:00 stderr F INFO:__main__:Copying service configuration files
2021-10-06T08:09:25.056204214+00:00 stderr F INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets
2021-10-06T08:09:25.057176396+00:00 stderr F INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets
2021-10-06T08:09:25.057514130+00:00 stderr F INFO:__main__:Writing out command to execute
2021-10-06T08:09:25.070459613+00:00 stderr F ++ cat /run_command
2021-10-06T08:09:25.073326411+00:00 stdout F Running command: '/usr/sbin/pacemaker_remoted'
2021-10-06T08:09:25.073349176+00:00 stderr F + CMD=/usr/sbin/pacemaker_remoted
2021-10-06T08:09:25.073349176+00:00 stderr F + ARGS=
2021-10-06T08:09:25.073349176+00:00 stderr F + [[ ! -n '' ]]
2021-10-06T08:09:25.073349176+00:00 stderr F + . kolla_extend_start
2021-10-06T08:09:25.073349176+00:00 stderr F + echo 'Running command: '\''/usr/sbin/pacemaker_remoted'\'''
2021-10-06T08:09:25.073349176+00:00 stderr F + exec /usr/sbin/pacemaker_remoted
2021-10-06T08:09:25.089784613+00:00 stderr F (crm_add_logfile) 	notice: Additional logging available in /var/log/pacemaker/pacemaker.log
2021-10-06T08:09:25.089784613+00:00 stderr F (crm_log_init) 	info: Changed active directory to /var/lib/pacemaker/cores
2021-10-06T08:09:25.089784613+00:00 stderr F (main) 	notice: Starting Pacemaker remote executor
2021-10-06T08:09:25.089784613+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: lrmd
2021-10-06T08:09:25.089784613+00:00 stderr F (pcmk__init_tls_dh) 	info: Generating Diffie-Hellman parameters with 2048-bit prime for TLS
2021-10-06T08:09:25.967490182+00:00 stderr F (lrmd_tls_set_key) 	error: No valid Pacemaker Remote key found at /etc/pacemaker/authkey
2021-10-06T08:09:25.967535096+00:00 stderr F (lrmd_init_remote_tls_server) 	warning: A cluster connection will not be possible until the key is available
2021-10-06T08:09:25.968481642+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: cib_ro
2021-10-06T08:09:25.968853968+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: cib_rw
2021-10-06T08:09:25.969089989+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: cib_shm
2021-10-06T08:09:25.969251766+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: attrd
2021-10-06T08:09:25.969399830+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: stonith-ng
2021-10-06T08:09:25.969527314+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: pacemakerd
2021-10-06T08:09:25.969676346+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: crmd
2021-10-06T08:09:25.969848078+00:00 stderr F (main) 	notice: Pacemaker remote executor successfully started and accepting connections
2021-10-06T08:09:32.681314674+00:00 stderr F (crm_signal_dispatch) 	notice: Caught 'Terminated' signal | 15 (invoking handler)
2021-10-06T08:09:32.681314674+00:00 stderr F (lrmd_exit) 	info: Terminating with 0 clients
2021-10-06T08:09:32.681314674+00:00 stderr F (qb_ipcs_us_withdraw) 	info: withdrawing server sockets
2021-10-06T08:09:32.681713412+00:00 stderr F (qb_ipcs_us_withdraw) 	info: withdrawing server sockets
2021-10-06T08:09:32.681794686+00:00 stderr F (qb_ipcs_us_withdraw) 	info: withdrawing server sockets
2021-10-06T08:09:32.681852520+00:00 stderr F (qb_ipcs_us_withdraw) 	info: withdrawing server sockets
2021-10-06T08:09:32.681890523+00:00 stderr F (qb_ipcs_us_withdraw) 	info: withdrawing server sockets
2021-10-06T08:09:32.681970622+00:00 stderr F (qb_ipcs_us_withdraw) 	info: withdrawing server sockets
2021-10-06T08:09:32.682042284+00:00 stderr F (qb_ipcs_us_withdraw) 	info: withdrawing server sockets
2021-10-06T08:09:32.682103157+00:00 stderr F (qb_ipcs_us_withdraw) 	info: withdrawing server sockets
2021-10-06T08:09:32.682213791+00:00 stderr F (crm_xml_cleanup) 	info: Cleaning up memory from libxml2
2021-10-06T08:09:32.682256305+00:00 stderr F (crm_exit) 	info: Exiting pacemaker-remoted | with status 0
2021-10-06T08:09:34.265518796+00:00 stderr F + sudo -E kolla_set_configs
2021-10-06T08:09:34.276518932+00:00 stderr F sudo: unable to send audit message: Operation not permitted
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Validating config file
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Copying service configuration files
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Writing out command to execute
2021-10-06T08:09:34.358385541+00:00 stderr F ++ cat /run_command
2021-10-06T08:09:34.360749708+00:00 stderr F + CMD=/usr/sbin/pacemaker_remoted
2021-10-06T08:09:34.360799385+00:00 stderr F + ARGS=
2021-10-06T08:09:34.360811853+00:00 stderr F + [[ ! -n '' ]]
2021-10-06T08:09:34.360822982+00:00 stderr F + . kolla_extend_start
2021-10-06T08:09:34.360961379+00:00 stdout F Running command: '/usr/sbin/pacemaker_remoted'
2021-10-06T08:09:34.360974830+00:00 stderr F + echo 'Running command: '\''/usr/sbin/pacemaker_remoted'\'''
2021-10-06T08:09:34.360974830+00:00 stderr F + exec /usr/sbin/pacemaker_remoted
2021-10-06T08:09:34.369991449+00:00 stderr F (crm_add_logfile) 	error: Directory '/var/log/pacemaker' does not exist: logging to '/var/log/pacemaker/pacemaker.log' is disabled
2021-10-06T08:09:34.370380654+00:00 stderr F (crm_log_init) 	info: Changed active directory to /var/lib/pacemaker/cores
2021-10-06T08:09:34.370435049+00:00 stderr F (main) 	notice: Starting Pacemaker remote executor
2021-10-06T08:09:34.370454773+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: lrmd
2021-10-06T08:09:34.370828626+00:00 stderr F (pcmk__init_tls_dh) 	info: Generating Diffie-Hellman parameters with 2048-bit prime for TLS
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: cib_ro
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: cib_rw
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: cib_shm
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: attrd
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: stonith-ng
2021-10-06T08:09:34.542622194+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: pacemakerd
2021-10-06T08:09:34.542883893+00:00 stderr F (qb_ipcs_us_publish) 	info: server name: crmd
2021-10-06T08:09:34.542883893+00:00 stderr F (main) 	notice: Pacemaker remote executor successfully started and accepting connections
2021-10-06T08:09:35.246321783+00:00 stderr F (pcmk__accept_remote_connection) 	info: Accepted new remote client connection from ::ffff:172.17.1.101
2021-10-06T08:09:35.246545466+00:00 stderr F (lrmd_remote_listen) 	info: Remote client pending authentication | 0x560383f8d1c0 id: 1e434cdb-6e95-4c62-9aa5-9d7d65bbda56
2021-10-06T08:09:35.793265984+00:00 stderr F (remoted__read_handshake_data) 	notice: Remote client connection accepted
2021-10-06T08:09:37.624055549+00:00 stderr F (process_lrmd_get_rsc_info) 	info: Agent information for 'ovndb_servers' not in cache
2021-10-06T08:09:37.664983661+00:00 stderr F (process_lrmd_get_rsc_info) 	info: Agent information for 'ovndb_servers:0' not in cache
2021-10-06T08:09:37.748068551+00:00 stderr F (process_lrmd_rsc_register) 	info: Cached agent information for 'ovndb_servers'
2021-10-06T08:09:38.455413267+00:00 stderr F (log_execute) 	info: executing - rsc:ovndb_servers action:start call_id:8
2021-10-06T08:09:39.193911686+00:00 stderr F (log_op_output) 	notice: ovndb_servers_start_0[55] error output [ ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"} ]
2021-10-06T08:09:39.193911686+00:00 stderr F (log_op_output) 	notice: ovndb_servers_start_0[55] error output [ ovn-sbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"} ]
2021-10-06T08:09:39.193911686+00:00 stderr F (log_finished) 	info: ovndb_servers start (call 8, PID 55) exited with status 0 (execution time 739ms, queue time 0ms)
2021-10-06T08:09:39.393202935+00:00 stderr F (log_execute) 	info: executing - rsc:ovndb_servers action:notify call_id:17
2021-10-06T08:09:39.409380124+00:00 stderr F (log_finished) 	info: ovndb_servers notify (call 17, PID 209) exited with status 0 (execution time 16ms, queue time 0ms)
2021-10-06T08:09:40.384310926+00:00 stderr F (log_execute) 	info: executing - rsc:ovndb_servers action:notify call_id:18
2021-10-06T08:09:40.400723921+00:00 stderr F (log_finished) 	info: ovndb_servers notify (call 18, PID 213) exited with status 0 (execution time 16ms, queue time 0ms)
2021-10-06T08:09:41.272456336+00:00 stderr F (log_execute) 	info: executing - rsc:ovndb_servers action:notify call_id:19
2021-10-06T08:09:41.292393179+00:00 stderr F (log_finished) 	info: ovndb_servers notify (call 19, PID 217) exited with status 0 (execution time 19ms, queue time 0ms)
2021-10-06T08:09:41.532400514+00:00 stderr F (log_execute) 	info: executing - rsc:ovndb_servers action:notify call_id:20
2021-10-06T08:09:41.549856540+00:00 stderr F (log_finished) 	info: ovndb_servers notify (call 20, PID 221) exited with status 0 (execution time 17ms, queue time 0ms)
2021-10-06T08:09:41.643332183+00:00 stderr F (log_execute) 	info: executing - rsc:ovndb_servers action:promote call_id:21
2021-10-06T08:09:42.435567158+00:00 stderr F (log_op_output) 	notice: ovndb_servers_promote_0[225] error output [ nice: cannot set niceness: Permission denied ]
2021-10-06T08:09:42.435567158+00:00 stderr F (log_finished) 	info: ovndb_servers promote (call 21, PID 225) exited with status 0 (execution time 792ms, queue time 0ms)
2021-10-06T08:09:42.515383072+00:00 stderr F (log_execute) 	info: executing - rsc:ovndb_servers action:notify call_id:31
2021-10-06T08:09:43.026195246+00:00 stderr F (log_finished) 	info: ovndb_servers notify (call 31, PID 351) exited with status 0 (execution time 511ms, queue time 0ms)
[root@controller-0 ~]#
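
For a long log such as the one above, a small filter can help spot pacemaker lines logged at error level. The Python sketch below assumes the line layout seen in this log (timestamp, stream, "F", function name in parentheses, level, message); the function name is hypothetical.

```python
import re

# Layout assumed from the log above:
#   <timestamp> <stdout|stderr> F (<function>) <whitespace><level>: <message>
PCMK_LINE = re.compile(r"^(\S+) (?:stdout|stderr) F \((\w+)\)\s+(\w+): (.*)$")


def pacemaker_errors(lines):
    """Return (timestamp, function, message) for lines logged at error level."""
    found = []
    for line in lines:
        match = PCMK_LINE.match(line.rstrip("\n"))
        if match and match.group(3) == "error":
            found.append((match.group(1), match.group(2), match.group(4)))
    return found
```

Note that this only looks at the log level: resource agent output reported at notice level (for example the "error output" lines) is deliberately not matched and still needs to be read by hand.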

Comment 10 Jose Luis Franco 2021-10-13 15:20:57 UTC
We observed that https://code.engineering.redhat.com/gerrit/273889 isn't merged in the rhos-16.2-patches branch. This will cause a regression, because it fixes a typo introduced in https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-heat-templates/+/273733/, which is merged in rhos-16.2-patches.

Comment 11 Cédric Jeanneret 2021-10-18 06:54:21 UTC
Hello Jose,

It is actually merged - I squashed it with the "big one" to avoid having 2 patches to review. We hit the typo while the patch was still open in -patches, so it was easier to amend it:
https://code.engineering.redhat.com/gerrit/gitweb?p=openstack-tripleo-heat-templates.git;a=commit;h=e5be3f63b30dadea53d09720fab754e43e13f394

See the "Note" at the bottom.
It's the same for 16.2, actually:
https://code.engineering.redhat.com/gerrit/gitweb?p=openstack-tripleo-heat-templates.git;a=commit;h=f7aa3068f1d8eb41971f64bb6121128b5842a38e (same "Note" - one is a cherry-pick of the other)

LADA doesn't see this kind of thing - sorry!

Cheers,

C.

Comment 20 errata-xmlrpc 2021-12-09 20:41:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.1 (Train)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5067

