Bug 1913224
| Summary: | [Cinderlib] - Although "Configure local CinderLib database" is set to True in global maintenance mode, the execution of optional-components setup fails on a hosted-engine environment |
|---|---|
| Product: | Red Hat Enterprise Linux 8 |
| Reporter: | Shir Fishbain <sfishbai> |
| Component: | libsemanage |
| Assignee: | Petr Lautrbach <plautrba> |
| Status: | CLOSED ERRATA |
| QA Contact: | Milos Malik <mmalik> |
| Severity: | high |
| Docs Contact: | Khushbu Borole <kborole> |
| Priority: | high |
| Version: | 8.0 |
| CC: | asharov, bugs, didi, dwalsh, lvrabec, mjahoda, mmalik, plautrba, ssekidde, vmojzis |
| Target Milestone: | rc |
| Keywords: | Triaged |
| Target Release: | 8.0 |
| Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified |
| OS: | Linux |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2021-05-18 15:05:26 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |

Doc Text:

.Rebuilds of the SELinux policy store are now more resistant to power failures
Previously, SELinux-policy rebuilds were not resistant to power failures due to write caching. Consequently, the SELinux policy store could become corrupted after a power failure during a policy rebuild. With this update, the `libsemanage` library writes all pending modifications to metadata and cached file data to the file system that contains the policy store before using it. As a result, the policy store is now more resistant to power failures and other interruptions.

Attachments:
Created attachment 1744844 [details]
setup_logs
Created attachment 1744845 [details]
engine_log
Created attachment 1744847 [details]
setup_reconfiguration_cinderlib_flow
I looked at the setup logs, and also logged into the machine, and this is what I see:

1. The failure was in running the command:

   semodule -i /usr/share/ovirt-engine/selinux/ansible-runner-service.cil

   meaning, it's not really related to cinderlib (or even to oVirt/RHV, likely).

2. stderr of the command was:

   libsemanage.semanage_direct_get_module_info: Unable to read ansible-runner-service module lang ext file.
   libsemanage.semanage_direct_get_module_info: Unable to read openvswitch-custom module lang ext file. (No such file or directory).
   libsemanage.semanage_direct_get_module_info: Unable to read openvswitch-custom module lang ext file. (No such file or directory).
   /usr/sbin/semodule: Failed on /usr/share/ovirt-engine/selinux/ansible-runner-service.cil!

   I searched for (parts of) it, and found (also) bug 1756835.

3. Based on the discussion in that bug, I checked the contents of /var/lib/selinux/targeted. Indeed, most (all?) of the subdirs there were semi-empty - meaning, e.g.:

   # ls -la --full-time /var/lib/selinux/targeted/active/modules/100/avahi
   total 16
   drwx------.   2 root root    44 2020-12-25 00:35:50.857712594 +0200 .
   drwx------. 423 root root 12288 2020-12-25 00:35:50.922712961 +0200 ..
   -rw-------.   1 root root     0 2020-12-25 00:35:50.857712594 +0200 cil
   -rw-------.   1 root root     0 2020-12-25 00:35:50.857712594 +0200 hll
   -rw-------.   1 root root     0 2020-12-25 00:35:50.857712594 +0200 lang_ext

   IOW, they had empty cil/hll/lang_ext files.

4. All of them had timestamps between '00:35:50.854712577' and '00:35:50.917712933', so I checked what might have happened at that time, and found, in the first (successful) engine-setup log:

==========================================================================
2020-12-25 00:35:49,025+0200 DEBUG otopi.plugins.ovirt_engine_common.base.system.selinux plugin.executeRaw:813 execute: ('/usr/sbin/restorecon', '-r', '/usr/share/ovirt-engine/ansible-runner-service-project'), executable='None', cwd='None', env=None
2020-12-25 00:35:50,294+0200 DEBUG otopi.plugins.ovirt_engine_common.base.system.selinux plugin.executeRaw:863 execute-result: ('/usr/sbin/restorecon', '-r', '/usr/share/ovirt-engine/ansible-runner-service-project'), rc=0
2020-12-25 00:35:50,294+0200 DEBUG otopi.plugins.ovirt_engine_common.base.system.selinux plugin.execute:921 execute-output: ('/usr/sbin/restorecon', '-r', '/usr/share/ovirt-engine/ansible-runner-service-project') stdout:
2020-12-25 00:35:50,294+0200 DEBUG otopi.plugins.ovirt_engine_common.base.system.selinux plugin.execute:926 execute-output: ('/usr/sbin/restorecon', '-r', '/usr/share/ovirt-engine/ansible-runner-service-project') stderr:
2020-12-25 00:35:50,295+0200 DEBUG otopi.plugins.ovirt_engine_common.base.system.selinux plugin.executeRaw:813 execute: ('/usr/sbin/semanage', 'boolean', '--modify', '--on', 'httpd_can_network_connect'), executable='None', cwd='None', env=None
2020-12-25 00:35:52,690+0200 DEBUG otopi.plugins.ovirt_engine_common.base.system.selinux plugin.executeRaw:863 execute-result: ('/usr/sbin/semanage', 'boolean', '--modify', '--on', 'httpd_can_network_connect'), rc=0
2020-12-25 00:35:52,691+0200 DEBUG otopi.plugins.ovirt_engine_common.base.system.selinux plugin.execute:921 execute-output: ('/usr/sbin/semanage', 'boolean', '--modify', '--on', 'httpd_can_network_connect') stdout:
2020-12-25 00:35:52,691+0200 DEBUG otopi.plugins.ovirt_engine_common.base.system.selinux plugin.execute:926 execute-output: ('/usr/sbin/semanage', 'boolean', '--modify', '--on', 'httpd_can_network_connect') stderr:
==========================================================================

   Meaning, we ran restorecon (on /usr/share/ovirt-engine/ansible-runner. Not sure why we need to, but anyway), this finished, and then: 'semanage boolean --modify --on httpd_can_network_connect'. This started at 00:35:50,295 and finished at 00:35:52,691 - meaning, all the timestamps of the mentioned files are in this range, as if this command somehow trashed them.

5. Based on other comments in bug 1756835, I tried gradually moving stuff away from /var/lib/selinux/targeted and running the semodule command (with different but similar error messages), and eventually, after it still failed when the directory was completely empty, did 'dnf reinstall selinux-policy-targeted' and then the semodule command succeeded. So I suspect this is some issue with selinux/semodule/semanage.

Versions:
==========================================================================
# rpm -qfi /usr/sbin/semanage /usr/sbin/semodule
Name        : policycoreutils-python-utils
Version     : 2.9
Release     : 9.el8
Architecture: noarch
Install Date: Tue 17 Nov 2020 11:54:26 AM IST
Group       : Unspecified
Size        : 140042
License     : GPLv2
Signature   : RSA/SHA256, Mon 20 Jan 2020 07:39:30 PM IST, Key ID 199e2f91fd431d51
Source RPM  : policycoreutils-2.9-9.el8.src.rpm
Build Date  : Fri 17 Jan 2020 08:01:05 PM IST
Build Host  : arm64-033.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/SELinuxProject/selinux
Summary     : SELinux policy core python utilities
Description :
The policycoreutils-python-utils package contains the management tools use to manage an SELinux environment.

Name        : policycoreutils
Version     : 2.9
Release     : 9.el8
Architecture: x86_64
Install Date: Tue 17 Nov 2020 11:53:55 AM IST
Group       : Unspecified
Size        : 673367
License     : GPLv2
Signature   : RSA/SHA256, Mon 20 Jan 2020 07:39:28 PM IST, Key ID 199e2f91fd431d51
Source RPM  : policycoreutils-2.9-9.el8.src.rpm
Build Date  : Fri 17 Jan 2020 07:58:30 PM IST
Build Host  : x86-vm-08.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/SELinuxProject/selinux
Summary     : SELinux policy core utilities
Description :
Security-enhanced Linux is a feature of the Linux® kernel and a number of utilities with enhanced security functionality designed to add mandatory access controls to Linux. The Security-enhanced Linux kernel contains new architectural components originally developed to improve the security of the Flask operating system. These architectural components provide general support for the enforcement of many kinds of mandatory access control policies, including those based on the concepts of Type Enforcement®, Role-based Access Control, and Multi-level Security.

policycoreutils contains the policy core utilities that are required for basic operation of a SELinux system. These utilities include load_policy to load policies, setfiles to label filesystems, newrole to switch roles.

# rpm -qi selinux-policy-targeted
Name        : selinux-policy-targeted
Version     : 3.14.3
Release     : 54.el8_3.2
Architecture: noarch
Install Date: Wed 06 Jan 2021 12:45:17 PM IST
Group       : Unspecified
Size        : 52520436
License     : GPLv2+
Signature   : RSA/SHA256, Thu 10 Dec 2020 07:13:41 PM IST, Key ID 199e2f91fd431d51
Source RPM  : selinux-policy-3.14.3-54.el8_3.2.src.rpm
Build Date  : Tue 08 Dec 2020 05:15:44 PM IST
Build Host  : s390-071.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/fedora-selinux/selinux-policy
Summary     : SELinux targeted base policy
Description :
SELinux Reference policy targeted base module.
==========================================================================

So, how to continue?

1. Shir, please keep the machine as-is for now, for further investigation, and if possible, please try to reproduce on another machine.
2. For the selinux issue, setting needinfo on Daniel Walsh. Dan, please have a look. Thanks!

> So, how to continue?
>
> 1. Shir, please keep the machine as-is for now, for further investigation,
> and if possible, please try to reproduce on another machine.

On which environment do you recommend to check? In standalone environments it works fine; I found the bug in the HE environment only.
Keeping the HE env for your investigation, ping me in G-chat for more details.

> 2. For the selinux issue, setting needinfo on Daniel Walsh. Dan, please have
> a look. Thanks!

I'm investigating it. Given that it's not the first report like this, it seems to be some kind of a race condition during libsemanage commit. But I guess it's un-reproducible.

(In reply to Petr Lautrbach from comment #6)
> I'm investigating it. Given that it's not the first report like this, it seems
> to be some kind of a race condition during libsemanage commit. But I guess
> it's un-reproducible.

Do you want/need access to Shir's systems? If not, she can use them for other things, and/or try to verify the current bug again - and if it fails again similarly, you get another reproduction.

It would be helpful to have system logs from 2020-12-25 00:35:50 - at least a few minutes before. Other than that I don't need the system.

Is it ext4, xfs, ...? It looks like a problem with the filesystem, as if write() reported success while no data was actually written to the file. Maybe a missing sync.

Not sure if it's related, but I found the following lines in the logs:

Dec 25 01:35:34 oncilla04 vdsm[26680]: WARN unhandled close event
Dec 25 01:35:45 oncilla04 vdsm[26680]: WARN unhandled write event
Dec 25 01:35:47 oncilla04 vdsm[26680]: WARN unhandled close event
Dec 25 01:35:57 oncilla04 vdsm[26680]: WARN unhandled write event
Dec 25 01:35:59 oncilla04 vdsm[26680]: WARN unhandled close event
Dec 25 01:36:09 oncilla04 vdsm[26680]: WARN unhandled write event

Anyway, it looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1838762, which is supposed to be fixed in libsemanage-2.9-3.el8 shipped by RHEL-8.3. According to the logs, libsemanage-2.9-3.el8 was already installed. But according to comment 9 - https://bugzilla.redhat.com/show_bug.cgi?id=1838762#c9 - libsemanage-2.9-3.el8 fixed only the problem with the broken boot, while it left the broken selinux store for the user to fix by reinstalling selinux-policy-targeted. It looks like we have a semi-reliable reproducer in bug 1838762, so we'll try to fix the store problem as well.
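To make the "maybe a missing sync" hypothesis above concrete, here is a minimal sketch of the general write-then-fsync pattern that keeps a freshly written store file from ending up zero-length after a power failure. This is illustrative only - the helper name is hypothetical and it is not the actual libsemanage change:

```python
import os

def durable_write(path, data: bytes):
    """Illustrative sketch only: write a file so its contents survive a crash.

    File data is flushed with fsync(), and the containing directory is
    fsync()ed as well so the directory entry itself is durable. Without
    these syncs, a power failure or interruption can leave a zero-length
    file behind, matching the empty cil/hll/lang_ext files seen here.
    """
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # push file data and metadata to disk
    os.rename(tmp, path)              # atomically replace the target
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)              # make the rename itself durable
    finally:
        os.close(dir_fd)
```

The key point, matching the Doc Text above, is that pending data is forced out to the file system before the rebuilt store is relied upon.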
This patch - https://lore.kernel.org/selinux/20210128104231.102470-1-plautrba@redhat.com/T/#u - should prevent some issues with empty files in the module store after a policy rebuild. It is waiting for upstream review now.

*** Bug 1917507 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (libsemanage bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1672
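For anyone who wants to check whether their own policy store ended up in the state described here (zero-length cil/hll/lang_ext module files), a minimal read-only diagnostic sketch follows; the store path is the default targeted store observed above, and the script itself is not part of the original report:

```python
#!/usr/bin/env python3
# Diagnostic sketch: walk the SELinux module store and report zero-length
# module files (cil/hll/lang_ext), the symptom described in this bug.
# Assumes the default targeted store path; run as root, read-only.
import os

STORE = "/var/lib/selinux/targeted/active/modules"

def find_empty_module_files(store=STORE):
    empty = []
    for root, _dirs, files in os.walk(store):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getsize(path) == 0:
                    empty.append(path)
            except OSError:
                # Module files are root-only (mode 0600); skip unreadable ones.
                pass
    return empty

if __name__ == "__main__":
    for path in find_empty_module_files():
        print(path)
```

If this prints a long list of module files, the store is in the corrupted state described above; the workaround used in this bug was 'dnf reinstall selinux-policy-targeted'.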
Description of problem:

After configuring the Cinderlib DB as true:
[1] Configure Cinderlib integration (Currently in tech preview) (Yes, No) [No]: Yes
[2] Configured the database location as Local or Remote

While running the "engine-setup --reconfigure-optional-components" command, the execution of setup failed (setup run attached).

In addition to running the CinderLib DB configuration (when the engine was in global maintenance mode), I had already succeeded in running the following regular Cinderlib integration steps on the engine and got the option (just in the UI) to add a new MBD:

CinderLib integration steps:
1) Change "openstack-cinderlib-rpms" to enabled=1 under rhv-4.4.4-dependencies.repo.
2) Run dnf install python3-cinder
3) Run engine-config -s ManagedBlockDomainSupported=true -> 4.5
4) systemctl restart ovirt-engine

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.4.5-0.10.el8ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run the regular Cinderlib integration steps as described above - Succeeded
2. Set global maintenance mode
3. Run "engine-setup --reconfigure-optional-components" manually on the hosted-engine environment - Failed
4. Run the POST request for creating an MBD via the REST API - Failed

POST https://<hosted_engine_environment_name>/ovirt-engine/api/storagedomains

Body:
<storage_domain>
  <name>ceph-cinderlib</name>
  <type>managed_block_storage</type>
  <storage>
    <type>managed_block_storage</type>
    <driver_options>
      <property>
        <name>rbd_ceph_conf</name>
        <value>/etc/ceph/ceph.conf</value>
        <name>rbd_keyring_conf</name>
        <value>/etc/ceph/ceph.client.admin.keyring</value>
      </property>
      <property>
        <name>rbd_pool</name>
        <value>volumes</value>
      </property>
      <property>
        <name>rbd_user</name>
        <value>admin</value>
      </property>
      <property>
        <name>use_multipath_for_image_xfer</name>
        <value>true</value>
      </property>
      <property>
        <name>volume_driver</name>
        <value>cinder.volume.drivers.rbd.RBDDriver</value>
      </property>
    </driver_options>
  </storage>
  <host>
    <name>host_mixed_1</name>
  </host>
</storage_domain>

Actual results:

1) The POST request for creating a new MBD failed because the Cinderlib database is required:

<fault>
  <detail>[Cannot add Storage. Cinderlib database is required., Cannot add Storage. Cinderlib database is required.]</detail>
  <reason>Operation Failed</reason>
</fault>

2) The engine setup reconfiguration of optional components failed with the following errors. From the setup run:

--== CONFIGURATION PREVIEW ==--

Default SAN wipe after delete           : False
Host FQDN                               : hosted-engine-11.lab.eng.tlv2.redhat.com
Firewall manager                        : firewalld
Update Firewall                         : True
CinderLib database host                 : localhost
CinderLib database port                 : 5432
CinderLib database secured connection   : False
CinderLib database host name validation : False
CinderLib database name                 : ovirt_cinderlib
CinderLib database user name            : ovirt_cinderlib
Set up Cinderlib integration            : True
Configure local CinderLib database      : True
Engine database host                    : localhost
Engine database port                    : 5432
Engine database secured connection      : False
Engine database host name validation    : False
Engine database name                    : engine
Engine database user name               : engine
Engine installation                     : True
PKI organization                        : lab.eng.tlv2.redhat.com
Set up ovirt-provider-ovn               : True
Grafana integration                     : False
Configure WebSocket Proxy               : True
DWH installation                        : True
DWH database host                       : localhost
DWH database port                       : 5432
DWH database secured connection         : False
DWH database host name validation       : False
DWH database name                       : ovirt_engine_history
DWH database user name                  : ovirt_engine_history
Backup DWH database                     : True
Configure VMConsole Proxy               : True

Please confirm installation settings (OK, Cancel) [OK]:
[ INFO ] Cleaning async tasks and compensations
[ INFO ] Unlocking existing entities
[ INFO ] Checking the Engine database consistency
[ INFO ] Stage: Transaction setup
[ INFO ] Stopping engine service
[ INFO ] Stopping ovirt-fence-kdump-listener service
[ INFO ] Stopping dwh service
[ INFO ] Stopping vmconsole-proxy service
[ INFO ] Stopping websocket-proxy service
[ INFO ] Stage: Misc configuration (early)
[ INFO ] Stage: Package installation
[ INFO ] Stage: Misc configuration
[ INFO ] Creating PostgreSQL 'ovirt_cinderlib' database
[ INFO ] Configuring PostgreSQL
[ INFO ] Upgrading CA
[ INFO ] Updating OVN SSL configuration
[ INFO ] Updating OVN timeout configuration
[ INFO ] Backing up database localhost:ovirt_engine_history to '/var/lib/ovirt-engine-dwh/backups/dwh-20210105160339.mgjszrw_.dump'.
[ INFO ] Creating/refreshing DWH database schema
[ INFO ] Configuring WebSocket Proxy
[ INFO ] Backing up database localhost:engine to '/var/lib/ovirt-engine/backups/engine-20210105160343.2iruju3t.dump'.
[ INFO ] Creating/refreshing Engine database schema
[ INFO ] Creating/refreshing Engine 'internal' domain database schema
         Unregistering existing client registration info.
[ INFO ] Install selinux module /usr/share/ovirt-engine/selinux/ansible-runner-service.cil
[ ERROR ] Failed to execute stage 'Misc configuration': Command '/usr/sbin/semodule' failed to execute
[ INFO ] DNF Performing DNF transaction rollback
[WARNING] Rollback of DWH database postponed to Stage "Clean up"
[ INFO ] Rolling back database schema
[ INFO ] Clearing Engine database engine
[ INFO ] Restoring Engine database engine
[ INFO ] Restoring file '/var/lib/ovirt-engine/backups/engine-20210105160343.2iruju3t.dump' to database localhost:engine.
[ INFO ] Stage: Clean up
         Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20210105160204-tpq803.log
[ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20210105160419-setup.conf'
[WARNING] Rollback of DWH database started
         This might be a long process, but it should be safe to start the engine service before it finishes, if needed.
[ INFO ] Clearing DWH database ovirt_engine_history
[ INFO ] Restoring DWH database ovirt_engine_history
[ INFO ] Restoring file '/var/lib/ovirt-engine-dwh/backups/dwh-20210105160339.mgjszrw_.dump' to database localhost:ovirt_engine_history.
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Execution of setup failed

From setup.log:

2021-01-05 15:56:59,421+0200 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Environment customization': SIG2
2021-01-05 15:56:59,493+0200 ERROR otopi.plugins.ovirt_engine_common.base.core.misc misc._terminate:153 Execution of setup failed
2021-01-05 15:58:15,076+0200 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Misc configuration': Command '/usr/sbin/semodule' failed to execute
2021-01-05 15:58:34,561+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.restore:904 db restore rc 1 stderr ['pg_restore: while PROCESSING TOC:', 'pg_restore: from TOC entry 7272; 0 0 COMMENT EXTENSION "uuid-ossp" ', 'pg_restore: error: could not execute query: ERROR: must be owner of extension uuid-ossp', 'Command was: COMMENT ON EXTENSION "uuid-ossp" IS \'generate universally unique identifiers (UUIDs)\';', '', '', 'pg_restore: warning: errors ignored on restore: 1']
2021-01-05 15:58:43,555+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.restore:904 db restore rc 1 stderr ['pg_restore: while PROCESSING TOC:', 'pg_restore: from TOC entry 5166; 0 0 COMMENT EXTENSION "uuid-ossp" ', 'pg_restore: error: could not execute query: ERROR: must be owner of extension uuid-ossp', 'Command was: COMMENT ON EXTENSION "uuid-ossp" IS \'generate universally unique identifiers (UUIDs)\';', '', '', 'pg_restore: warning: errors ignored on restore: 1']
2021-01-05 15:58:43,606+0200 ERROR otopi.plugins.ovirt_engine_common.base.core.misc misc._terminate:153 Execution of setup failed
2021-01-05 16:04:00,128+0200 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Misc configuration': Command '/usr/sbin/semodule' failed to execute
2021-01-05 16:04:19,605+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.restore:904 db restore rc 1 stderr ['pg_restore: while PROCESSING TOC:', 'pg_restore: from TOC entry 7274; 0 0 COMMENT EXTENSION "uuid-ossp" ', 'pg_restore: error: could not execute query: ERROR: must be owner of extension uuid-ossp', 'Command was: COMMENT ON EXTENSION "uuid-ossp" IS \'generate universally unique identifiers (UUIDs)\';', '', '', 'pg_restore: warning: errors ignored on restore: 1']
2021-01-05 16:04:28,409+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.restore:904 db restore rc 1 stderr ['pg_restore: while PROCESSING TOC:', 'pg_restore: from TOC entry 5166; 0 0 COMMENT EXTENSION "uuid-ossp" ', 'pg_restore: error: could not execute query: ERROR: must be owner of extension uuid-ossp', 'Command was: COMMENT ON EXTENSION "uuid-ossp" IS \'generate universally unique identifiers (UUIDs)\';', '', '', 'pg_restore: warning: errors ignored on restore: 1']
2021-01-05 16:04:28,461+0200 ERROR otopi.plugins.ovirt_engine_common.base.core.misc misc._terminate:153 Execution of setup failed

Expected results:
The operation of creating a new MBD should succeed.