Bug 2104730

Summary: PostgreSQL permissions are broken after LEAPP upgrade when PostgreSQL is on dedicated partition
Product: Red Hat Enterprise Linux 7
Component: leapp-repository
Version: 7.9
Hardware: x86_64
OS: Linux
Type: Bug
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Brenden Wood <bwood>
Assignee: Evgeni Golov <egolov>
QA Contact: Lukas Pramuk <lpramuk>
CC: ahumbe, bwood, egolov, kkinge, mhecko, saydas, sraut
Fixed In Version: leapp-repository-0.16.0-7.el7_9
Last Closed: 2022-08-23 20:14:19 UTC

Attachments:
LEAPP upgrade log (flags: none)

Description Brenden Wood 2022-07-07 03:51:45 UTC

Created attachment 1895063
LEAPP upgrade log

Description of problem:

PostgreSQL file and directory permissions are broken after a LEAPP upgrade on a Satellite 6.11 server. The satellite-installer run at the FirstBoot step of the LEAPP upgrade fails because PostgreSQL can't start. Manually changing the permissions of /var/lib/pgsql/data/ allows PostgreSQL to start, and re-running satellite-installer works from there.

Prior to the LEAPP upgrade I had to apply the `rpm -e ansible ansible-test --nodeps` and `subscription-manager repo-override --repo=satellite-6.11-for-rhel-8-x86_64-rpms --add=module_hotfixes:1` fixes that are listed in the release notes known issues.

I reverted to a snapshot of the system on RHEL 7 with Satellite 6.10 and went through the same upgrade and LEAPP steps again. The issue was reproducible.

An sos report of the system on RHEL 8 with the broken Satellite install is here:
https://drive.google.com/file/d/1y_dFFOUK2MGSNtiVTTNvbtX4aJvmqyIm/view?usp=sharing ( Red Hat Internal )
or
https://drive.google.com/file/d/193PbnxQePDlhmRPrLjyOHKcTBDmaqF18/view?usp=sharing ( public link )


leapp-upgrade.log is attached to this BZ.

PostgreSQL permissions are: 

[root@woody-lab-satellite ~]# ls -lahZ /var/lib/pgsql/data/
total 60K
drwx------. 20 postgres postgres system_u:object_r:postgresql_db_t:s0     4.0K Jul  7 12:22 .
drwx------.  4 postgres postgres system_u:object_r:postgresql_db_t:s0       54 Jul  7 12:37 ..
drwx------.  9 root     root     unconfined_u:object_r:postgresql_db_t:s0   97 Jan 26 16:02 base
-rw-------.  1 root     root     system_u:object_r:postgresql_db_t:s0       30 Jul  7 11:36 current_logfiles
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0 4.0K Jul  7 11:37 global
drwx------.  2 root     root     system_u:object_r:postgresql_log_t:s0     136 Mar  4 09:53 log
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0    6 Jan 25 16:37 pg_commit_ts
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0    6 Jan 25 16:37 pg_dynshmem
-rw-r-----.  1 postgres postgres system_u:object_r:postgresql_db_t:s0      703 Jan 25 16:46 pg_hba.conf
-rw-r-----.  1 postgres postgres system_u:object_r:postgresql_db_t:s0       47 Jan 25 16:46 pg_ident.conf
drwx------.  4 root     root     unconfined_u:object_r:postgresql_db_t:s0   68 Jul  7 12:22 pg_logical
drwx------.  4 root     root     unconfined_u:object_r:postgresql_db_t:s0   36 Jan 25 16:37 pg_multixact
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0   18 Jul  7 11:36 pg_notify
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0    6 Jan 25 16:37 pg_replslot
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0    6 Jan 25 16:37 pg_serial
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0    6 Jan 25 16:37 pg_snapshots
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0  126 Jul  7 12:22 pg_stat
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0    6 Jul  7 12:22 pg_stat_tmp
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0   18 Jun 29 16:43 pg_subtrans
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0    6 Jan 25 16:37 pg_tblspc
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0    6 Jan 25 16:37 pg_twophase
-rw-------.  1 root     root     unconfined_u:object_r:postgresql_db_t:s0    3 Jan 25 16:37 PG_VERSION
drwx------.  3 root     root     unconfined_u:object_r:postgresql_db_t:s0  188 Jul  7 10:22 pg_wal
drwx------.  2 root     root     unconfined_u:object_r:postgresql_db_t:s0   42 Jun 28 01:37 pg_xact
-rw-------.  1 root     root     unconfined_u:object_r:postgresql_db_t:s0   88 Jan 25 16:37 postgresql.auto.conf
-rw-------.  1 postgres postgres system_u:object_r:postgresql_db_t:s0      27K Jul  7 12:45 postgresql.conf
-rw-------.  1 root     root     system_u:object_r:postgresql_db_t:s0       96 Jul  7 11:36 postmaster.opts

[root@woody-lab-satellite ~]# ls -alhZ /var/opt/rh/rh-postgresql12/lib/pgsql/
total 0
drwx------. 2 postgres postgres system_u:object_r:var_t:s0  6 Jul  7 12:39 .
drwxr-xr-x. 3 root     root     system_u:object_r:var_t:s0 19 Jul  7 12:34 ..
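
A quick way to spot the affected entries, and the manual workaround mentioned in the description, could be sketched as below. The helper names `list_foreign_owned` and `fix_ownership` are hypothetical, and treating postgres:postgres as the intended owner of everything under the data directory is an assumption based on the listing above, where the directory itself and the config files are owned by postgres while most other entries are owned by root.

```shell
#!/bin/bash
# Hypothetical helper: print every entry under DIR that is not owned by OWNER.
# Assumption: the whole data directory is expected to belong to the postgres user.
list_foreign_owned() {
    local dir="$1" owner="${2:-postgres}"
    find "$dir" ! -user "$owner" -print
}

# Hypothetical helper: recursively hand DIR to OWNER:GROUP.
# Assumption: run as root with PostgreSQL stopped; only ownership is changed,
# file modes and SELinux labels are left as they are.
fix_ownership() {
    local dir="$1" owner="${2:-postgres}" group="${3:-postgres}"
    chown -R "$owner:$group" "$dir"
}
```

For example, `list_foreign_owned /var/lib/pgsql/data` would be expected to print the root-owned entries shown in the listing, and `fix_ownership /var/lib/pgsql/data` is one way to perform the manual change before re-running satellite-installer.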


Version-Release number of selected component (if applicable):
Satellite 6.11.0 

How reproducible:
Upgrade Satellite from 6.10 to 6.11, then perform a LEAPP upgrade from RHEL 7 to RHEL 8 with the fixes from the release notes known issues mentioned above

Steps to Reproduce:
1. Upgrade Satellite from 6.10 to 6.11
2. Perform a LEAPP upgrade from RHEL 7 to RHEL 8 with the fixes from the release notes known issues mentioned above

Actual results:
The satellite-installer run at the FirstBoot step of the LEAPP upgrade fails because PostgreSQL can't start.

Expected results:
PostgreSQL should start successfully and satellite-installer should run at the FirstBoot step of the LEAPP upgrade

Additional info:

Assigning to egolov after chatting internally

Comment 1 Evgeni Golov 2022-07-11 07:56:59 UTC

I can reproduce the issue, and it seems to stem from the fact that the old PostgreSQL data was on its own mountpoint and had to be moved over.
However, I am sure we had this scenario tested and it worked. Looking into the details.

Comment 2 Brad Buckingham 2022-07-11 15:26:50 UTC

The upstream PR proposed to fix the problem:
   https://github.com/oamg/leapp-repository/pull/916

Comment 9 Lukas Pramuk 2022-07-28 12:49:15 UTC

VERIFIED.

@Satellite 6.11.1
leapp-0.14.0-1.el7_9.noarch
leapp-upgrade-el7toel8-0.16.0-7.el7_9.noarch

by the following manual reproducer:

1) Move PostgreSQL onto a dedicated partition mounted at /var/opt/rh/rh-postgresql12/lib/pgsql
# eval $(blkid -o export /dev/vdb1)
# echo "UUID=$UUID /var/opt/rh/rh-postgresql12/lib/pgsql xfs defaults 0 2" >> /etc/fstab
# satellite-maintain service stop
# mv /var/opt/rh/rh-postgresql12/lib/pgsql{,-orig}
# mkdir /var/opt/rh/rh-postgresql12/lib/pgsql
# mount -a
# mv /var/opt/rh/rh-postgresql12/lib/pgsql-orig/* /var/opt/rh/rh-postgresql12/lib/pgsql
# chown postgres:postgres /var/opt/rh/rh-postgresql12/lib/pgsql
# chmod 700 /var/opt/rh/rh-postgresql12/lib/pgsql
# restorecon -v /var/opt/rh/rh-postgresql12/lib/pgsql
# satellite-maintain service start

2) Run LEAPP upgrade of Satellite 6.11 to RHEL8
# leapp upgrade --reboot

3) After the LEAPP upgrade finishes, check Satellite health

FIX:
# hammer ping
database:         
    Status:          ok
    Server Response: Duration: 0ms
candlepin:        
    Status:          ok
    Server Response: Duration: 41ms
candlepin_auth:   
    Status:          ok
    Server Response: Duration: 40ms
candlepin_events: 
    Status:          ok
    message:         0 Processed, 0 Failed
    Server Response: Duration: 0ms
katello_events:   
    Status:          ok
    message:         0 Processed, 0 Failed
    Server Response: Duration: 0ms
pulp3:            
    Status:          ok
    Server Response: Duration: 183ms
pulp3_content:    
    Status:          ok
    Server Response: Duration: 77ms
foreman_tasks:    
    Status:          ok
    Server Response: Duration: 3ms


vs. 

REPRO:
# hammer ping
Error: Failed to open TCP connection to sat.local:443 (Connection refused - connect(2) for "sat.local" port 443)

Comment 10 Lukas Pramuk 2022-07-28 13:01:45 UTC

The result depends on the mountpoint!

If PostgreSQL is on one of these mountpoints, you are safe and the fix resolves the issue:
/var
/var/opt
/var/opt/rh
/var/opt/rh/rh-postgresql12
/var/opt/rh/rh-postgresql12/lib
/var/opt/rh/rh-postgresql12/lib/pgsql
All these possible mountpoints are fixed by this BZ -> VERIFIED

However, if the mountpoint is the deepest possible one:
/var/opt/rh/rh-postgresql12/lib/pgsql/data

Then you shouldn't proceed!
The LEAPP upgrade hangs forever, and after a forced reboot the system is unusable -> tracked by BZ 2111835
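
The distinction above could be checked before starting the upgrade with a small sketch like the following. The helper names `classify_mountpoint` and `check_pgsql_mountpoint` are hypothetical and not part of leapp. The sketch assumes `findmnt` from util-linux is available and that the data directory exists on the RHEL 7 system. An "ok" result only means the mountpoint is not the one known-bad path listed above.

```shell
#!/bin/bash
# Mountpoint from this comment that is NOT covered by the fix (tracked by BZ 2111835).
UNSAFE_MOUNTPOINT="/var/opt/rh/rh-postgresql12/lib/pgsql/data"

# Hypothetical helper: print "unsafe" for the known-bad mountpoint, "ok" otherwise.
classify_mountpoint() {
    local mnt="${1%/}"    # drop a single trailing slash, if present
    if [ "$mnt" = "$UNSAFE_MOUNTPOINT" ]; then
        echo "unsafe"
    else
        echo "ok"
    fi
}

# Hypothetical helper: look up the mountpoint that holds the data directory
# and classify it. Returns non-zero if findmnt cannot resolve the path.
check_pgsql_mountpoint() {
    local mnt
    mnt=$(findmnt -n -o TARGET --target "$UNSAFE_MOUNTPOINT") || return 1
    classify_mountpoint "$mnt"
}
```

Running `check_pgsql_mountpoint` on the RHEL 7 system before `leapp upgrade` would print "unsafe" when the data directory itself is the mountpoint, which is the case where the upgrade should not be started.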

Comment 13 errata-xmlrpc 2022-08-23 20:14:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (leapp-repository bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6141