Bug 2210237

Summary: systemd-239-31 prevents some units from stopping
Product: Red Hat Enterprise Linux 9 Reporter: Michal Sekletar <msekleta>
Component: systemdAssignee: Michal Sekletar <msekleta>
Status: ASSIGNED --- QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 9.2CC: jamacku, systemd-maint-list
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michal Sekletar 2023-05-26 08:51:26 UTC
This bug was initially created as a copy of Bug #1875056

I am copying this bug because: 



Description of problem:
postgresql unit fails to stop when running in container based on ubi8-init with systemd-239-31.

Version-Release number of selected component (if applicable):
RHEL 8.2
systemd-239-31
postgresql 12.1

After installing, enabling and starting of the postgresql

sh-4.4# systemctl status postgresql
● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-01 11:43:08 UTC; 39s ago
  Process: 41 ExecStartPre=/usr/libexec/postgresql-check-db-dir postgresql (code=exited, status=0/SUCCESS)
 Main PID: 62 (postmaster)
    Tasks: 8 (limit: 101454)
   Memory: 25.1M
   CGroup: /docker/c816f955e1014d6b6bce7eecf51b5267b3bab5679b256d4dbdd45fb203e12b56/system.slice/postgresql.service
           ├─62 /usr/bin/postmaster -D /var/lib/pgsql/data
           ├─63 postgres: logger   
           ├─65 postgres: checkpointer   
           ├─66 postgres: background writer   
           ├─67 postgres: walwriter   
           ├─68 postgres: autovacuum launcher   
           ├─69 postgres: stats collector   
           └─70 postgres: logical replication launcher   

Sep 01 11:43:07 rhv-flow-node.lab.eng.tlv2.redhat.com systemd[1]: Starting PostgreSQL database server...
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.442 UTC LOG:  starting PostgreSQL 12.1 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.443 UTC LOG:  listening on IPv4 address "0.0.0.0", port 5432
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.443 UTC LOG:  listening on IPv6 address "::", port 5432
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.449 UTC LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.459 UTC LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.521 UTC LOG:  redirecting log output to logging collector process
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.521 UTC HINT:  Future log output will appear in directory "log".
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com systemd[1]: Started PostgreSQL database server.

all the processes are in place:

sh-4.4# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.5  0.0  94504  6968 ?        Ss   11:43   0:00 /sbin/init
root          31  0.8  0.0  93880  7828 ?        Ss   11:43   0:00 /usr/lib/systemd/systemd-journald
root          43  0.5  0.0  12172  3804 ?        Ss   11:43   0:00 /usr/sbin/haveged -w 1024 -v 1 --Foreground
root          44  0.0  0.0 106000  1636 ?        Ssl  11:43   0:00 /usr/sbin/gssproxy -D
root          45  1.0  0.0 225768  3900 ?        Ssl  11:43   0:00 /usr/sbin/rsyslogd -n
dbus          51  0.0  0.0  53952  2504 ?        Ss   11:43   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
chrony        53  0.0  0.0  97100  1504 ?        S    11:43   0:00 /usr/sbin/chronyd
postgres      62  0.2  0.0 295468  7568 ?        Ss   11:43   0:00 /usr/bin/postmaster -D /var/lib/pgsql/data
postgres      63  0.0  0.0 143828  1844 ?        Ss   11:43   0:00 postgres: logger   
postgres      65  0.0  0.0 295468  1880 ?        Ss   11:43   0:00 postgres: checkpointer   
postgres      66  0.0  0.0 295468  2092 ?        Ss   11:43   0:00 postgres: background writer   
postgres      67  0.0  0.0 295468  2084 ?        Ss   11:43   0:00 postgres: walwriter   
postgres      68  0.0  0.0 296016  2864 ?        Ss   11:43   0:00 postgres: autovacuum launcher   
postgres      69  0.0  0.0 145948  2068 ?        Ss   11:43   0:00 postgres: stats collector   
postgres      70  0.0  0.0 296036  2600 ?        Ss   11:43   0:00 postgres: logical replication launcher   
root          72  0.0  0.0  12008  1980 pts/1    Ss   11:43   0:00 sh
root          81  0.0  0.0  44584  1780 pts/1    R+   11:44   0:00 ps aux

when trying to stop the postgresql

sh-4.4# systemctl stop postgresql

the systemctl gets stuck. When connecting from other shell the postgresql is active (not stopped) although the main PID exited succesfully:

sh-4.4# systemctl status postgresql
● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: deactivating (final-sigkill) (thawing) since Tue 2020-09-01 11:46:07 UTC; 5min ago
  Process: 62 ExecStart=/usr/bin/postmaster -D ${PGDATA} (code=exited, status=0/SUCCESS)
  Process: 41 ExecStartPre=/usr/libexec/postgresql-check-db-dir postgresql (code=exited, status=0/SUCCESS)
 Main PID: 62 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 101454)
   Memory: 22.5M
   CGroup: /docker/c816f955e1014d6b6bce7eecf51b5267b3bab5679b256d4dbdd45fb203e12b56/system.slice/postgresql.service

Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.442 UTC LOG:  starting PostgreSQL 12.1 on x86_64-redhat-linux-gnu, compiled by gcc (GC>
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.443 UTC LOG:  listening on IPv4 address "0.0.0.0", port 5432
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.443 UTC LOG:  listening on IPv6 address "::", port 5432
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.449 UTC LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.459 UTC LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.521 UTC LOG:  redirecting log output to logging collector process
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com postmaster[62]: 2020-09-01 11:43:08.521 UTC HINT:  Future log output will appear in directory "log".
Sep 01 11:43:08 rhv-flow-node.lab.eng.tlv2.redhat.com systemd[1]: Started PostgreSQL database server.
Sep 01 11:46:07 rhv-flow-node.lab.eng.tlv2.redhat.com systemd[1]: Stopping PostgreSQL database server...
Sep 01 11:46:07 rhv-flow-node.lab.eng.tlv2.redhat.com systemd[1]: postgresql.service: Killing process 63 (postmaster) with signal SIGKILL.

All the corresponding postgres processes got terminated as expected:

sh-4.4# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.1  0.0  94504  6968 ?        Ss   11:43   0:00 /sbin/init
root          31  0.2  0.0  93880  7828 ?        Ss   11:43   0:00 /usr/lib/systemd/systemd-journald
root          43  0.2  0.0  12172  3804 ?        Ss   11:43   0:00 /usr/sbin/haveged -w 1024 -v 1 --Foreground
root          44  0.0  0.0 106000  1636 ?        Ssl  11:43   0:00 /usr/sbin/gssproxy -D
root          45  0.3  0.0 225768  3900 ?        Ssl  11:43   0:00 /usr/sbin/rsyslogd -n
dbus          51  0.0  0.0  53952  2504 ?        Ss   11:43   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
chrony        53  0.0  0.0  97100  1504 ?        S    11:43   0:00 /usr/sbin/chronyd
root          72  0.0  0.0  12008  1988 pts/1    Ss   11:43   0:00 sh
root          86  0.0  0.0  83632  3860 pts/1    S+   11:46   0:00 systemctl stop postgresql
root          87  0.0  0.0  83452  3340 pts/1    S+   11:46   0:00 /usr/bin/systemd-tty-ask-password-agent --watch
root          88  0.1  0.0  12008  1952 pts/2    Ss   11:46   0:00 sh
root          94  0.0  0.0  44584  1780 pts/2    R+   11:46   0:00 ps aux

Kind of weird is that systemd tries to kill postmaster as process 63 but postmaster was process 62.

This bug is regression - in systemd-239-18 is not present.

The same issue was identified in Centos 8.2 (https://bugs.centos.org/view.php?id=17461)


How reproducible:
100%


Steps to Reproduce:
1. Run container based on ubi8-init
2. Install postgresql-server
3. Start the postgres via systemctl start
4. Try to stop the postgres via systemctl stop

Actual results:
systemctl stop postgresql 
gets stuck and the unit is not stopped

Expected results:
systemctl stop postgresql
ends with success and unit is stopped


Additional info:

Comment 1 Plumber Bot 2023-07-13 19:08:48 UTC
fix merged to github main branch -> https://github.com/redhat-plumbers/systemd-rhel9/pull/171