Bug 1518253 - engine-setup upgrade to postgresql 9.5 sometimes fails due to missing selinux policy
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Setup.Engine
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ovirt-4.2.1
: 4.2.1
Assignee: Yedidyah Bar David
QA Contact: Lucie Leistnerova
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-28 13:51 UTC by Yedidyah Bar David
Modified: 2018-06-27 06:43 UTC
7 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Under certain conditions, a change in the SELinux policy, combined with the script that converts SELinux policy from the old format to the new format, causes the PostgreSQL 9.5 upgrade performed by engine-setup to fail with the error "[ERROR] Failed to execute stage 'Misc configuration': Failed to start service 'rh-postgresql95-postgresql'". /var/log/messages then shows 'postgresql-ctl: postgres cannot access the server configuration file "/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf": Permission denied'. To work around this, reinstall the rh-postgresql95-runtime package by running 'yum reinstall rh-postgresql95-runtime', then run engine-setup again.
Clone Of:
: 1518599
Environment:
Last Closed: 2018-02-12 11:48:32 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+


Attachments
sosreport-edwardh-host-1-part1 (19.00 MB, application/x-xz)
2017-11-28 14:11 UTC, Edward Haas
no flags
sosreport-edwardh-host-1-part2 (17.80 MB, application/octet-stream)
2017-11-28 14:14 UTC, Edward Haas
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1518599 0 unspecified CLOSED engine-setup upgrade to postgresql 9.5 sometimes fails due to missing selinux policy 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1594615 0 high CLOSED Unable to perform upgrade from 4.1 to 4.2 due to selinux related errors. 2021-02-22 00:41:40 UTC

Internal Links: 1518599 1594615

Description Yedidyah Bar David 2017-11-28 13:51:13 UTC
Description of problem:

engine-setup fails with:

[ERROR] Failed to execute stage 'Misc configuration': Failed to start service 'rh-postgresql95-postgresql'

audit log has:

type=AVC msg=audit(1511854191.913:1057): avc:  denied  { getattr } for  pid=14752 comm="postgres" path="/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf" dev="dm-0" ino=396983 scontext=system_u:system_r:postgresql_t:s0 tcontext=unconfined_u:object_r:var_t:s0 tclass=file
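
The interesting parts of this AVC record are the denied permission, the source type (from scontext) and the target type (from tcontext). Here the file carries the generic var_t type. A minimal Python sketch for pulling those fields out of a raw audit line follows. It is not part of engine-setup or of any SELinux tool; parse_avc and context_type are hypothetical helper names, and the field regex is a simplification of the audit record format.

```python
import re

# One key=value field; the value is either a double-quoted string or a bare token.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
# The permission list between braces, e.g. "{ getattr }".
PERMS_RE = re.compile(r"\{([^}]*)\}")


def parse_avc(line):
    """Return the key=value fields of a raw AVC line, plus the denied permissions."""
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(line)}
    match = PERMS_RE.search(line)
    fields["denied"] = match.group(1).split() if match else []
    return fields


def context_type(context):
    """Return the type part of a context such as user:role:type:level."""
    return context.split(":")[2]


line = ('type=AVC msg=audit(1511854191.913:1057): avc:  denied  { getattr } for  '
        'pid=14752 comm="postgres" '
        'path="/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf" '
        'dev="dm-0" ino=396983 scontext=system_u:system_r:postgresql_t:s0 '
        'tcontext=unconfined_u:object_r:var_t:s0 tclass=file')
avc = parse_avc(line)
# prints: postgres ['getattr'] postgresql_t var_t
print(avc["comm"], avc["denied"], context_type(avc["scontext"]), context_type(avc["tcontext"]))
```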

We run this upgrade on CI, and it works there, so there is obviously something specific to the machine this happened on. Not sure what exactly.

Comparing with a machine where it does work, I noticed the following difference in the output of 'semanage fcontext --list'. On the good machine, I have:

==========================================
SELinux Local fcontext Equivalence 

/opt/rh/rh-postgresql95/root = /
/var/opt/rh/rh-postgresql95 = /var
/etc/opt/rh/rh-postgresql95 = /etc
==========================================

This is missing on the bad machine.
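
An equivalence rule such as "/var/opt/rh/rh-postgresql95 = /var" means that paths under the first directory are labeled as if they were under the second. A minimal Python sketch of that prefix substitution follows. It is an illustration, not libselinux code: apply_equivalence is a hypothetical name, and it assumes that the first matching rule wins and that an alias only matches at a whole path-component boundary.

```python
from typing import Sequence, Tuple

# Equivalence rules as listed by 'semanage fcontext --list' on the good machine.
RULES = [
    ("/opt/rh/rh-postgresql95/root", "/"),
    ("/var/opt/rh/rh-postgresql95", "/var"),
    ("/etc/opt/rh/rh-postgresql95", "/etc"),
]


def apply_equivalence(path: str, rules: Sequence[Tuple[str, str]]) -> str:
    """Rewrite path with the first rule whose alias is a whole-component prefix."""
    for alias, target in rules:
        if path == alias:
            return target
        if path.startswith(alias + "/"):
            rest = path[len(alias):]  # still starts with "/"
            # A target of "/" would otherwise produce a double slash.
            return rest if target == "/" else target + rest
    return path  # no rule matched: the path is looked up as-is


conf = "/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf"
print(apply_equivalence(conf, RULES))  # /var/lib/pgsql/data/postgresql.conf
print(apply_equivalence(conf, []))     # unchanged, as with no equivalence rules
```

With the rules present, postgresql.conf is looked up as if it lived under /var/lib/pgsql/data. With no rules, as on the bad machine, the path stays under /var/opt, where presumably no PostgreSQL-specific labeling rule applies.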

This seems to be handled by the contents of the file:

selinux/targeted/contexts/files/file_contexts.subs

which is empty on the bad machine, and on the good one has:
==========================================
/opt/rh/rh-postgresql95/root /
/var/opt/rh/rh-postgresql95 /var
/etc/opt/rh/rh-postgresql95 /etc
==========================================
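
To check a machine for this condition, one could compare the contents of file_contexts.subs against the three entries seen on the good machine. A minimal Python sketch follows. It takes the file contents as a string, so it assumes nothing about where the file lives. parse_subs and missing_rules are hypothetical names, and skipping blank lines and lines starting with '#' is an assumption about the file format.

```python
from typing import List, Tuple

# The entries the good machine has in file_contexts.subs.
EXPECTED = [
    ("/opt/rh/rh-postgresql95/root", "/"),
    ("/var/opt/rh/rh-postgresql95", "/var"),
    ("/etc/opt/rh/rh-postgresql95", "/etc"),
]


def parse_subs(text: str) -> List[Tuple[str, str]]:
    """Parse 'alias target' lines; skip blank lines, # comments and malformed lines."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            rules.append((parts[0], parts[1]))
    return rules


def missing_rules(text: str, expected=EXPECTED) -> List[Tuple[str, str]]:
    """Return the expected entries that do not appear in the file contents."""
    present = set(parse_subs(text))
    return [rule for rule in expected if rule not in present]


good = """/opt/rh/rh-postgresql95/root /
/var/opt/rh/rh-postgresql95 /var
/etc/opt/rh/rh-postgresql95 /etc
"""
print(missing_rules(good))     # []
print(len(missing_rules("")))  # 3: an empty file, as on the bad machine
```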

It seems to be created by the postinstall scriptlet of the rpm package rh-postgresql95-runtime. Based on this, I verified the following workaround:

yum reinstall rh-postgresql95-runtime

Based on the logs of the bad machine and the timestamp of the above file, it seems to have been broken by the systemd one-time service selinux-policy-migrate-local-changes. But that service did not output anything to the log except for Starting/Started, so I can't be sure.

Version-Release number of selected component (if applicable):

Current master

How reproducible:
Not sure

Steps to Reproduce:
1. Have a setup from before the move to PostgreSQL 9.5. The affected one was running an older master (4.2); not sure whether 4.1 is also affected
2. Upgrade as usual: yum update the setup packages, then run engine-setup

Actual results:
Fails as above

Expected results:
Succeeds

Additional info:

Workaround:

yum reinstall rh-postgresql95-runtime

and try again.

Opening this bug for reference, for now. Once we get more reports and/or find the root cause, we might close it, move it to selinux, apply some workaround during engine-setup, or do something else.

Comment 1 Edward Haas 2017-11-28 14:11:22 UTC
Created attachment 1359890
sosreport-edwardh-host-1-part1

Split by dd into two parts.

Comment 2 Edward Haas 2017-11-28 14:14:13 UTC
Created attachment 1359891
sosreport-edwardh-host-1-part2

Split by dd into two parts.

Comment 3 Sandro Bonazzola 2017-11-28 14:16:01 UTC
Didi, please get the rh-postgresql95 maintainer involved. Maybe there is a race condition in the order in which packages are installed, due to a missing dependency within the rh-postgresql95 dependency tree.

Comment 4 Yedidyah Bar David 2017-11-28 14:27:28 UTC
Adding selinux-policy and rh-postgresql95 maintainers.

Pavel and Lukas, does any of this make sense?

Comment 5 Pavel Raiskup 2017-11-28 14:39:55 UTC
(In reply to Yedidyah Bar David from comment #0)
> type=AVC msg=audit(1511854191.913:1057): avc:  denied  { getattr } for 
> pid=14752 comm="postgres"
> path="/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf" dev="dm-0"
> ino=396983 scontext=system_u:system_r:postgresql_t:s0
> tcontext=unconfined_u:object_r:var_t:s0 tclass=file

The postgresql.conf file should be of type postgresql_db_t.  If it is not
we need to find the reason why it is not.

> We run this upgrade on CI, and it works there, so there is obviously
> something specific to the machine this happened on. Not sure what exactly.

Do you have SELinux enabled in the CI?

> [...]
> It seems to be created by the postinstall scriptlet of the rpm package
> rh-postgresql95-runtime. Based on this, I verified the following workaround:
> 
> yum reinstall rh-postgresql95-runtime

I would guess the context is OK after postgresql installation,
and db initialization;  and that engine-setup changes the context later.
That script should be analyzed.

Comment 6 Yedidyah Bar David 2017-11-28 15:22:51 UTC
(In reply to Pavel Raiskup from comment #5)
> (In reply to Yedidyah Bar David from comment #0)
> > type=AVC msg=audit(1511854191.913:1057): avc:  denied  { getattr } for 
> > pid=14752 comm="postgres"
> > path="/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf" dev="dm-0"
> > ino=396983 scontext=system_u:system_r:postgresql_t:s0
> > tcontext=unconfined_u:object_r:var_t:s0 tclass=file
> 
> The postgresql.conf file should be of type postgresql_db_t.  If it is not
> we need to find the reason why it is not.
> 
> > We run this upgrade on CI, and it works there, so there is obviously
> > something specific to the machine this happened on. Not sure what exactly.
> 
> Do you have SELinux enabled in the CI?

Yes

> 
> > [...]
> > It seems to be created by the postinstall scriptlet of the rpm package
> > rh-postgresql95-runtime. Based on this, I verified the following workaround:
> > 
> > yum reinstall rh-postgresql95-runtime
> 
> I would guess the context is OK after postgresql installation,
> and db initialization;  and that engine-setup changes the context later.
> That script should be analyzed.

As I wrote above, and discussed in private, engine-setup does not change the label on that file. It copies it from /var/lib/pgsql/data to the /var/opt location using Python's shutil.copy2.

Also adding needinfo on Petr, who appears to be the author of the script run by selinux-policy-migrate-local-changes, selinux-policy-migrate-local-changes.sh. Petr, any idea?

Comment 7 Petr Lautrbach 2017-11-28 20:27:39 UTC
The selinux-policy-migrate-local-changes.sh script should run only once, after a system is upgraded to RHEL 7.3, or from SELinux userspace release < 2.4 to 2.5.

The only place where it touches file_contexts.subs is when it runs:

/usr/sbin/semanage export | /usr/sbin/semanage import

This generally should not fail.

At the same time there is the following line in selinux-policy.spec file which should prevent overwriting this file during selinux-policy update:

%config(noreplace) %{_sysconfdir}/selinux/%1/contexts/files/file_contexts.subs

So it looks like something else removed/erased file_contexts.subs. 

Are you able to reproduce it, starting from a working setup and doing something that results in an empty file_contexts.subs?

Comment 8 Yedidyah Bar David 2017-11-29 06:45:02 UTC
(In reply to Petr Lautrbach from comment #7)
> selinux-policy-migrate-local-changes.sh script should be run only once after
> a system is upgraded to rhel-7.3 or from SELinux userspace release < 2.4 to
> 2.5.
> 
> The only place where it touches file_contexts.subs is when it runs:
> 
> /usr/sbin/semanage export | /usr/sbin/semanage import
> 
> This generally should not fail.
> 
> At the same time there is the following line in selinux-policy.spec file
> which should prevent overwriting this file during selinux-policy update:
> 
> %config(noreplace)
> %{_sysconfdir}/selinux/%1/contexts/files/file_contexts.subs
> 
> So it looks like something else removed/erased file_contexts.subs. 

Perhaps, but I really have no idea what. Also, the timestamp of that file is about 1 second earlier than the time selinux-policy-migrate-local-changes was reported as "Started" (i.e., finished).

> 
> Are you able to reproduce it in a way that you have working setup and do
> something which results in empty file_contexts.subs ?

I haven't tried yet.
I have a VM, snapshotted and cloned from a machine, in a state where this file is empty. I can give you access to it for further analysis. The attached sosreport is from the original machine.

Comment 9 Petr Lautrbach 2017-11-29 09:44:01 UTC
I cloned this bug to selinux-policy.

It seems to be a corner case that occurs under particular conditions when selinux-policy-3.13.1-102.el7.noarch is updated to 3.13.1-166.el7_4.7.noarch.

Until it's fixed please use the workaround from

Comment 10 Yedidyah Bar David 2017-11-29 10:51:56 UTC
Changing to a Known Issue for now. We might clone and implement a workaround in the code if we get more reports.

Comment 11 Yedidyah Bar David 2017-12-11 06:58:16 UTC
Edited the doc text a bit. Also, perhaps consider adding 'Under certain conditions' somewhere, because we have had very few reports like this and do not fully understand the details. Thanks.

Comment 12 Megan Lewis 2017-12-11 23:52:51 UTC
Thanks for the review, Didi. I've added 'Under certain conditions' and verified the changes.

Comment 13 RHV bug bot 2018-01-05 16:57:47 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No external trackers attached]

For more info please contact: infra

Comment 14 RHV bug bot 2018-01-12 14:39:15 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No external trackers attached]

For more info please contact: infra

Comment 15 Lucie Leistnerova 2018-01-24 15:54:10 UTC
The 4.1 -> 4.2 upgrade was successful. I tried it several times on different environments (with selinux-policy-targeted versions 3.13.1-102.el7, 3.13.1-166.el7_4.7, and 3.13.1-183.el7).
Feel free to reopen the bug if it happens again.

Verified in ovirt-engine-4.2.1.2-0.1.el7.noarch

Comment 16 Sandro Bonazzola 2018-02-12 11:48:32 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 17 Ondra Machacek 2018-06-27 06:43:02 UTC
Another case where it happened:

https://bugzilla.redhat.com/show_bug.cgi?id=1594615

