While investigating #1997351, we found that most of the pacemaker-managed containers lack the "z" flag on their read-write bind mounts. This mostly works anyway, thanks to the labeling work done in tripleo-heat-templates and host_prep_tasks and to other containers having the same bind mounts with the correct options. The situation is still really dangerous:
- depending on the start order, a bundle might lock the data for itself ("z" allows mounting the same location in multiple containers)
- depending on external actions, a location might see its label changed, which would prevent the bundle from starting properly

We must therefore make sure all the proper options are set, without breaking the system (so, no "z" flag for /dev, /sys and other sensitive locations).

This is the "Red Hat" version of https://bugs.launchpad.net/tripleo/+bug/1943459

We must make sure this patch lands in 16.2.1.
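To illustrate the rule above (add "z" to read-write bind mounts, but never to /dev, /sys and similar locations), here is a minimal POSIX shell sketch. The function names and the exact list of sensitive prefixes (/dev, /sys, /proc) are assumptions for illustration only; the actual change lives in puppet-tripleo and tripleo-heat-templates and is not reproduced here.

```shell
# Hypothetical helper: is this host path a sensitive location that must
# never be relabeled? The prefix list is an assumption, not exhaustive.
is_sensitive() {
  case "$1" in
    /dev|/dev/*|/sys|/sys/*|/proc|/proc/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print the bind-mount options for <host-path> <ro|rw>.
# Only read-write mounts of non-sensitive paths get "z", which relabels the
# content with a shared label so several containers can use it.
bind_mount_opts() {
  if [ "$2" = rw ] && ! is_sensitive "$1"; then
    echo "rw,z"
  else
    echo "$2"
  fi
}
```

For example, `bind_mount_opts /var/lib/openvswitch/ovn rw` prints `rw,z`, while `bind_mount_opts /dev/log rw` prints `rw`.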
Steps to verify:
- get a 16.2 OVN deployment environment
- *during* the controller deploy, connect to that controller
- check whether the /var/lib/openvswitch/ovn directory exists, and check its label (ls -lZd /var/lib/openvswitch/ovn)
- if the directory exists, change its type: chcon -t openvswitch_var_lib_t /var/lib/openvswitch/ovn
- follow /var/log/audit/audit.log and make sure no AVC related to the OVN service shows up, especially when ovn-dbs-bundle is started
- you can also double-check that /var/log/containers/stdouts/ovn-dbs-bundle.log contains no errors
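The label and audit-log checks from the steps above can be wrapped in small POSIX shell helpers. This is only a sketch: the function names are hypothetical, `stat -c %C` (GNU coreutils) is used as an alternative to reading the `ls -lZd` output, and the `chcon` step is left manual.

```shell
# Extract the SELinux type (third field) from a context string such as
# system_u:object_r:container_file_t:s0
label_type() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Print the SELinux type of a path on the host (GNU stat, %C = context).
dir_label_type() {
  stat -c %C "$1" | cut -d: -f3
}

# Succeed only if the given audit log has no AVC mentioning OVN,
# using the same pattern as the manual grep in this report.
check_no_ovn_avc() {
  grep -qi 'avc.*ovn' "$1"
  case $? in
    0) echo "OVN-related AVC found in $1" >&2; return 1 ;;
    1) return 0 ;;
    *) return 2 ;; # grep error, e.g. unreadable log file
  esac
}
```

On a controller, `dir_label_type /var/lib/openvswitch/ovn` shows the type before and after the `chcon`, and `check_no_ovn_avc /var/log/audit/audit.log` can be run once ovn-dbs-bundle is up.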
After some discussion, here's the plan: adding the "z" flag would cause an outage during day-2 operations, and we want to avoid that. So we target https://review.opendev.org/c/openstack/puppet-tripleo/+/808774 for wallaby/osp-17 only, and use a train-only patch that instead changes the context during upgrade_tasks (linked patch here). This same train-only patch will also be added to 16.1.x (see #1997351 for more details).
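For the train-only route, the context is fixed once during upgrade_tasks rather than by adding the flag. Below is a hedged POSIX shell sketch of such a relabel step. It assumes the target is the shared label that podman's "z" option applies (type container_file_t, level s0); the function name, the DRY_RUN switch and the directories passed to it are hypothetical and not taken from the linked patch.

```shell
# Hypothetical: relabel existing host directories to the shared container
# label (container_file_t, level s0), i.e. what the "z" flag would apply.
# Set DRY_RUN=1 to print the commands instead of running them.
relabel_shared() {
  for dir in "$@"; do
    # Skip directories that do not exist on this node.
    [ -d "$dir" ] || continue
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "chcon -R -t container_file_t -l s0 $dir"
    else
      chcon -R -t container_file_t -l s0 "$dir"
    fi
  done
}
```

For instance, `DRY_RUN=1 relabel_shared /var/lib/openvswitch/ovn` prints the `chcon` command it would run, if that directory exists. Note that `chcon` changes do not survive a full `restorecon` relabel; a persistent variant would go through `semanage fcontext`.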
Note that rhos-16.2-patches has a squashed patch, merging those two:
- https://review.opendev.org/c/openstack/tripleo-heat-templates/+/808964
- https://review.opendev.org/c/openstack/tripleo-heat-templates/+/809427
No avc related to ovn service:
[stack@undercloud-0 ~]$ rpm -qa|grep openstack-tripleo-heat-templates-11.5.1-2
openstack-tripleo-heat-templates-11.5.1-2.20210603174824.el8ost.10.noarch

[root@controller-0 ~]# grep -i "avc.*ovn" /var/log/audit/audit.log
[root@controller-0 ~]#
[root@controller-0 ~]# cat /var/log/containers/stdouts/ovn-dbs-bundle.log
2021-10-06T08:09:24.962629445+00:00 stderr F + sudo -E kolla_set_configs
2021-10-06T08:09:24.975549467+00:00 stderr F sudo: unable to send audit message: Operation not permitted
2021-10-06T08:09:25.055352956+00:00 stderr F INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
2021-10-06T08:09:25.055726004+00:00 stderr F INFO:__main__:Validating config file
2021-10-06T08:09:25.055842771+00:00 stderr F INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
2021-10-06T08:09:25.055885495+00:00 stderr F INFO:__main__:Copying service configuration files
2021-10-06T08:09:25.056204214+00:00 stderr F INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets
2021-10-06T08:09:25.057176396+00:00 stderr F INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets
2021-10-06T08:09:25.057514130+00:00 stderr F INFO:__main__:Writing out command to execute
2021-10-06T08:09:25.070459613+00:00 stderr F ++ cat /run_command
2021-10-06T08:09:25.073326411+00:00 stdout F Running command: '/usr/sbin/pacemaker_remoted'
2021-10-06T08:09:25.073349176+00:00 stderr F + CMD=/usr/sbin/pacemaker_remoted
2021-10-06T08:09:25.073349176+00:00 stderr F + ARGS=
2021-10-06T08:09:25.073349176+00:00 stderr F + [[ ! -n '' ]]
2021-10-06T08:09:25.073349176+00:00 stderr F + . kolla_extend_start
2021-10-06T08:09:25.073349176+00:00 stderr F + echo 'Running command: '\''/usr/sbin/pacemaker_remoted'\'''
2021-10-06T08:09:25.073349176+00:00 stderr F + exec /usr/sbin/pacemaker_remoted
2021-10-06T08:09:25.089784613+00:00 stderr F (crm_add_logfile) notice: Additional logging available in /var/log/pacemaker/pacemaker.log
2021-10-06T08:09:25.089784613+00:00 stderr F (crm_log_init) info: Changed active directory to /var/lib/pacemaker/cores
2021-10-06T08:09:25.089784613+00:00 stderr F (main) notice: Starting Pacemaker remote executor
2021-10-06T08:09:25.089784613+00:00 stderr F (qb_ipcs_us_publish) info: server name: lrmd
2021-10-06T08:09:25.089784613+00:00 stderr F (pcmk__init_tls_dh) info: Generating Diffie-Hellman parameters with 2048-bit prime for TLS
2021-10-06T08:09:25.967490182+00:00 stderr F (lrmd_tls_set_key) error: No valid Pacemaker Remote key found at /etc/pacemaker/authkey
2021-10-06T08:09:25.967535096+00:00 stderr F (lrmd_init_remote_tls_server) warning: A cluster connection will not be possible until the key is available
2021-10-06T08:09:25.968481642+00:00 stderr F (qb_ipcs_us_publish) info: server name: cib_ro
2021-10-06T08:09:25.968853968+00:00 stderr F (qb_ipcs_us_publish) info: server name: cib_rw
2021-10-06T08:09:25.969089989+00:00 stderr F (qb_ipcs_us_publish) info: server name: cib_shm
2021-10-06T08:09:25.969251766+00:00 stderr F (qb_ipcs_us_publish) info: server name: attrd
2021-10-06T08:09:25.969399830+00:00 stderr F (qb_ipcs_us_publish) info: server name: stonith-ng
2021-10-06T08:09:25.969527314+00:00 stderr F (qb_ipcs_us_publish) info: server name: pacemakerd
2021-10-06T08:09:25.969676346+00:00 stderr F (qb_ipcs_us_publish) info: server name: crmd
2021-10-06T08:09:25.969848078+00:00 stderr F (main) notice: Pacemaker remote executor successfully started and accepting connections
2021-10-06T08:09:32.681314674+00:00 stderr F (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler)
2021-10-06T08:09:32.681314674+00:00 stderr F (lrmd_exit) info: Terminating with 0 clients
2021-10-06T08:09:32.681314674+00:00 stderr F (qb_ipcs_us_withdraw) info: withdrawing server sockets
2021-10-06T08:09:32.681713412+00:00 stderr F (qb_ipcs_us_withdraw) info: withdrawing server sockets
2021-10-06T08:09:32.681794686+00:00 stderr F (qb_ipcs_us_withdraw) info: withdrawing server sockets
2021-10-06T08:09:32.681852520+00:00 stderr F (qb_ipcs_us_withdraw) info: withdrawing server sockets
2021-10-06T08:09:32.681890523+00:00 stderr F (qb_ipcs_us_withdraw) info: withdrawing server sockets
2021-10-06T08:09:32.681970622+00:00 stderr F (qb_ipcs_us_withdraw) info: withdrawing server sockets
2021-10-06T08:09:32.682042284+00:00 stderr F (qb_ipcs_us_withdraw) info: withdrawing server sockets
2021-10-06T08:09:32.682103157+00:00 stderr F (qb_ipcs_us_withdraw) info: withdrawing server sockets
2021-10-06T08:09:32.682213791+00:00 stderr F (crm_xml_cleanup) info: Cleaning up memory from libxml2
2021-10-06T08:09:32.682256305+00:00 stderr F (crm_exit) info: Exiting pacemaker-remoted | with status 0
2021-10-06T08:09:34.265518796+00:00 stderr F + sudo -E kolla_set_configs
2021-10-06T08:09:34.276518932+00:00 stderr F sudo: unable to send audit message: Operation not permitted
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Validating config file
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Copying service configuration files
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets
2021-10-06T08:09:34.350515433+00:00 stderr F INFO:__main__:Writing out command to execute
2021-10-06T08:09:34.358385541+00:00 stderr F ++ cat /run_command
2021-10-06T08:09:34.360749708+00:00 stderr F + CMD=/usr/sbin/pacemaker_remoted
2021-10-06T08:09:34.360799385+00:00 stderr F + ARGS=
2021-10-06T08:09:34.360811853+00:00 stderr F + [[ ! -n '' ]]
2021-10-06T08:09:34.360822982+00:00 stderr F + . kolla_extend_start
2021-10-06T08:09:34.360961379+00:00 stdout F Running command: '/usr/sbin/pacemaker_remoted'
2021-10-06T08:09:34.360974830+00:00 stderr F + echo 'Running command: '\''/usr/sbin/pacemaker_remoted'\'''
2021-10-06T08:09:34.360974830+00:00 stderr F + exec /usr/sbin/pacemaker_remoted
2021-10-06T08:09:34.369991449+00:00 stderr F (crm_add_logfile) error: Directory '/var/log/pacemaker' does not exist: logging to '/var/log/pacemaker/pacemaker.log' is disabled
2021-10-06T08:09:34.370380654+00:00 stderr F (crm_log_init) info: Changed active directory to /var/lib/pacemaker/cores
2021-10-06T08:09:34.370435049+00:00 stderr F (main) notice: Starting Pacemaker remote executor
2021-10-06T08:09:34.370454773+00:00 stderr F (qb_ipcs_us_publish) info: server name: lrmd
2021-10-06T08:09:34.370828626+00:00 stderr F (pcmk__init_tls_dh) info: Generating Diffie-Hellman parameters with 2048-bit prime for TLS
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) info: server name: cib_ro
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) info: server name: cib_rw
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) info: server name: cib_shm
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) info: server name: attrd
2021-10-06T08:09:34.542575947+00:00 stderr F (qb_ipcs_us_publish) info: server name: stonith-ng
2021-10-06T08:09:34.542622194+00:00 stderr F (qb_ipcs_us_publish) info: server name: pacemakerd
2021-10-06T08:09:34.542883893+00:00 stderr F (qb_ipcs_us_publish) info: server name: crmd
2021-10-06T08:09:34.542883893+00:00 stderr F (main) notice: Pacemaker remote executor successfully started and accepting connections
2021-10-06T08:09:35.246321783+00:00 stderr F (pcmk__accept_remote_connection) info: Accepted new remote client connection from ::ffff:172.17.1.101
2021-10-06T08:09:35.246545466+00:00 stderr F (lrmd_remote_listen) info: Remote client pending authentication | 0x560383f8d1c0 id: 1e434cdb-6e95-4c62-9aa5-9d7d65bbda56
2021-10-06T08:09:35.793265984+00:00 stderr F (remoted__read_handshake_data) notice: Remote client connection accepted
2021-10-06T08:09:37.624055549+00:00 stderr F (process_lrmd_get_rsc_info) info: Agent information for 'ovndb_servers' not in cache
2021-10-06T08:09:37.664983661+00:00 stderr F (process_lrmd_get_rsc_info) info: Agent information for 'ovndb_servers:0' not in cache
2021-10-06T08:09:37.748068551+00:00 stderr F (process_lrmd_rsc_register) info: Cached agent information for 'ovndb_servers'
2021-10-06T08:09:38.455413267+00:00 stderr F (log_execute) info: executing - rsc:ovndb_servers action:start call_id:8
2021-10-06T08:09:39.193911686+00:00 stderr F (log_op_output) notice: ovndb_servers_start_0[55] error output [ ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"} ]
2021-10-06T08:09:39.193911686+00:00 stderr F (log_op_output) notice: ovndb_servers_start_0[55] error output [ ovn-sbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"} ]
2021-10-06T08:09:39.193911686+00:00 stderr F (log_finished) info: ovndb_servers start (call 8, PID 55) exited with status 0 (execution time 739ms, queue time 0ms)
2021-10-06T08:09:39.393202935+00:00 stderr F (log_execute) info: executing - rsc:ovndb_servers action:notify call_id:17
2021-10-06T08:09:39.409380124+00:00 stderr F (log_finished) info: ovndb_servers notify (call 17, PID 209) exited with status 0 (execution time 16ms, queue time 0ms)
2021-10-06T08:09:40.384310926+00:00 stderr F (log_execute) info: executing - rsc:ovndb_servers action:notify call_id:18
2021-10-06T08:09:40.400723921+00:00 stderr F (log_finished) info: ovndb_servers notify (call 18, PID 213) exited with status 0 (execution time 16ms, queue time 0ms)
2021-10-06T08:09:41.272456336+00:00 stderr F (log_execute) info: executing - rsc:ovndb_servers action:notify call_id:19
2021-10-06T08:09:41.292393179+00:00 stderr F (log_finished) info: ovndb_servers notify (call 19, PID 217) exited with status 0 (execution time 19ms, queue time 0ms)
2021-10-06T08:09:41.532400514+00:00 stderr F (log_execute) info: executing - rsc:ovndb_servers action:notify call_id:20
2021-10-06T08:09:41.549856540+00:00 stderr F (log_finished) info: ovndb_servers notify (call 20, PID 221) exited with status 0 (execution time 17ms, queue time 0ms)
2021-10-06T08:09:41.643332183+00:00 stderr F (log_execute) info: executing - rsc:ovndb_servers action:promote call_id:21
2021-10-06T08:09:42.435567158+00:00 stderr F (log_op_output) notice: ovndb_servers_promote_0[225] error output [ nice: cannot set niceness: Permission denied ]
2021-10-06T08:09:42.435567158+00:00 stderr F (log_finished) info: ovndb_servers promote (call 21, PID 225) exited with status 0 (execution time 792ms, queue time 0ms)
2021-10-06T08:09:42.515383072+00:00 stderr F (log_execute) info: executing - rsc:ovndb_servers action:notify call_id:31
2021-10-06T08:09:43.026195246+00:00 stderr F (log_finished) info: ovndb_servers notify (call 31, PID 351) exited with status 0 (execution time 511ms, queue time 0ms)
[root@controller-0 ~]#
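The log above contains lines tagged `error` (missing authkey at first start, read-only database inserts, `nice` permission denied) even though every `log_finished` entry reports status 0, so a plain grep for "error" does not by itself tell whether the resource actions succeeded. One hedged complement to the last verification step is to check the exit status of each pacemaker action. The function name below is hypothetical, and the parsing assumes the `(log_finished) ... exited with status N` format shown in this log.

```shell
# Hypothetical helper: print every "log_finished" entry whose exit status
# is not 0, and return 1 if there was at least one such entry.
nonzero_finished_actions() {
  awk '
    /\(log_finished\)/ && /exited with status/ {
      st = ""
      # The exit status is the field right after the word "status".
      for (i = 1; i < NF; i++) if ($i == "status") { st = $(i + 1); break }
      if (st != "0") { print; bad = 1 }
    }
    END { exit bad }
  ' "$1"
}
```

Against the log above, `nonzero_finished_actions /var/log/containers/stdouts/ovn-dbs-bundle.log` would print nothing and return 0, since all start, notify and promote actions exited with status 0.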
It appears that https://code.engineering.redhat.com/gerrit/273889 is not merged in the rhos-16.2-patches branch. This would cause a regression, since it fixes a typo introduced in https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-heat-templates/+/273733/, which is merged in rhos-16.2-patches.
Hello Jose,

It is actually merged: I squashed it into the "big one" to avoid having 2 patches to review. We hit the typo while the patch was still open in -patches, so it was easier to amend it:
https://code.engineering.redhat.com/gerrit/gitweb?p=openstack-tripleo-heat-templates.git;a=commit;h=e5be3f63b30dadea53d09720fab754e43e13f394
See the "Note" at the bottom.

It's actually the same for 16.2:
https://code.engineering.redhat.com/gerrit/gitweb?p=openstack-tripleo-heat-templates.git;a=commit;h=f7aa3068f1d8eb41971f64bb6121128b5842a38e
(same "Note"; one is a cherry-pick of the other)

LADA doesn't detect this kind of thing, sorry!

Cheers,
C.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.1 (Train)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:5067