Bug 1809548

Summary: Mount option hidepid=2 on /proc may cause MariaDB to not start
Product: Red Hat Enterprise Linux 7
Reporter: Lennart Schedin <lennart.schedin>
Component: systemd
Assignee: systemd-maint
Status: CLOSED WONTFIX
QA Contact: Frantisek Sumsal <fsumsal>
Severity: low
Priority: unspecified
Version: 7.7
CC: fkrska, lennart.schedin, msekleta, systemd-maint-list
Target Milestone: rc
Hardware: x86_64
OS: Linux
Last Closed: 2020-11-11 21:38:56 UTC
Type: Bug
Attachments:
- Dummy service for easy reproduction
- Dummy service ExecStartPost script for easy reproduction
- Dummy service restarter loop script for easy reproduction

Description Lennart Schedin 2020-03-03 11:50:27 UTC
How to reproduce (the hard MariaDB way):
1. Use RHEL 7 or CentOS 7.
2. Install mariadb-server.
3. Set hidepid=2 on the /proc file system.
4. Reboot the machine (or run systemctl restart mariadb.service).
5. Login and run systemctl status mariadb.service.
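
To verify that the option from step 3 actually took effect, the current mount options of /proc can be checked; something along these lines (a quick sanity check, exact output may vary):

findmnt -no OPTIONS /proc
# should list hidepid=2 among the options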

Actual Result (the hard MariaDB way):
This is how it looks from the time mariadb.service is started until TimeoutSec=300 expires:
[root@mas ~]# systemctl status mariadb.service
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: deactivating (stop-sigterm) (Result: exit-code)
 Main PID: 1999 (mysqld_safe)
   CGroup: /system.slice/mariadb.service
           ├─1999 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─2465 /usr/libexec/mysqld --basedir=/usr --datadir=/home/mysql/ --plugin-dir=/usr/lib64/mysql/plugin --log-error=...

Feb 19 14:46:29 mas mariadb-prepare-db-dir[1930]: Database MariaDB is probably initialized in /home/mysql/ already, noth...done.
Feb 19 14:46:29 mas mariadb-prepare-db-dir[1930]: If this is not the case, make sure the /home/mysql/ is empty before ru...-dir.
Feb 19 14:46:29 mas systemd[1]: mariadb.service: control process exited, code=exited status=1
Feb 19 14:46:29 mas mysqld_safe[1999]: 200219 14:46:29 mysqld_safe Logging to '/home/mysql//mas.err'.
Feb 19 14:46:29 mas mysqld_safe[1999]: 200219 14:46:29 mysqld_safe Starting mysqld daemon with databases from /home/mysql/

This is how it looks after TimeoutSec=300 has expired:
[root@mas xsense]# systemctl status mariadb.service
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2020-02-21 11:56:01 UTC; 6min ago
  Process: 2003 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)
  Process: 2002 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=killed, signal=TERM)
  Process: 1944 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)
 Main PID: 2002 (code=killed, signal=TERM)

Feb 21 11:56:01 mas mariadb-prepare-db-dir[1944]: Database MariaDB is probably initialized in /home/mysql/ already, nothing is done.
Feb 21 11:56:01 mas mariadb-prepare-db-dir[1944]: If this is not the case, make sure the /home/mysql/ is empty before running mariadb-prepare-db-dir.
Feb 21 11:56:01 mas systemd[1]: mariadb.service: control process exited, code=exited status=1
Feb 21 11:56:01 mas systemd[1]: Failed to start MariaDB database server.
Feb 21 11:56:01 mas systemd[1]: Unit mariadb.service entered failed state.
Feb 21 11:56:01 mas systemd[1]: mariadb.service failed.


Expected Result (the hard MariaDB way):
mariadb.service should have started successfully.

Reproduction rate (the hard MariaDB way):
About 1 in 30 full reboots of the VM (which makes it very hard to reproduce this way).


----------------------------------------------------------------



How to reproduce (the easy dummy.service way):
1. Set up a VM (or use hardware).
I tried both a RHEL 7.7 and a CentOS 7.7 VM on my Intel 8700 (6-core CPU with hyperthreading) on Windows 10 using VirtualBox. I could reproduce it when allocating the VM 2 or 12 vCPUs, but not when allocating only 1 vCPU.

I could also reproduce this on VMware ESXi 6.7.0 on an HPE ProLiant DL360 Gen9 with Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz. I used 8 vCPUs for the VM.

I could also reproduce this on bare hardware, on an HPE ProLiant DL360 Gen10 with dual Intel(R) Xeon(R) Silver 4110 CPUs @ 2.10GHz.

2. Set the hidepid=2 parameter on the /proc mount. This can be done via /etc/fstab (and a reboot) or by executing this command to set it dynamically (it will not survive a reboot):
sudo mount -o remount,rw,hidepid=2 /proc
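
For the /etc/fstab route, the entry would look roughly like this (a sketch; the exact option list may differ per system):

proc    /proc    proc    defaults,hidepid=2    0 0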

3. Create the following 3 files in /tmp:
----------- /tmp/dummy.service -----------
[Unit]
Description=Dummy service

[Service]
Type=simple
User=nobody
ExecStart=/bin/sleep 100000
ExecStartPost=/tmp/dummy_service_exec_start_post.sh $MAINPID

[Install]
WantedBy=multi-user.target


----------- /tmp/dummy_service_exec_start_post.sh -----------
#!/bin/sh

MAINPID=$1
scriptname=`basename "$0"`
user=`whoami`
echo "$scriptname running as $user. MAINPID=$MAINPID"

if ! [ -d "/proc/$MAINPID" ] ; then
    echo "Directory /proc/$MAINPID did not exist!"
    exit 42 # exit code "meaning of life" that gets picked up by systemd
fi
echo "Found MAINPID: $MAINPID"

exit 0
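
The script can be sanity-checked outside systemd by pointing it at the current shell's PID, e.g. (assuming the executable bit from step 4 below is already set):

/tmp/dummy_service_exec_start_post.sh $$
# prints "Found MAINPID: <pid>" and exits 0 as long as /proc/<pid> is visible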



----------- /tmp/dummy_service_restarter.sh -----------
#!/bin/sh

# Command used to enable the service
# systemctl enable /tmp/dummy.service

# Will cause the script to run forever:
# sudo mount -o remount,rw,hidepid=1 /proc

# Will cause the script to fail after some time:
# sudo mount -o remount,rw,hidepid=2 /proc

scriptname=`basename "$0"`

while true; do
    systemctl reset-failed dummy.service # Make it possible to restart with high frequency
    systemctl restart dummy.service
    exitcode=$?
    if [ $exitcode -ne 0 ]; then
        echo "$scriptname: Failed to restart! systemctl exited with status $exit    code"
        exit 1
    fi
done


4. Set the executable bit on the shell scripts (chmod 755 /tmp/*.sh).

5. Enable the service:
systemctl enable /tmp/dummy.service

6. Run /tmp/dummy_service_restarter.sh, wait about 0 to 5 minutes, and then run systemctl status dummy.service.

7. If nothing happens you need to stress the CPU. I needed to do this on hardware and on the ESXi server to get reliable reproduction results. On my own desktop in VirtualBox I did not need it. I executed this command to stress 8 cores (adapt the number to your setup):
for i in 1 2 3 4 5 6 7 8; do while : ; do : ; done & done
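
The busy loops from that command keep running in the background; once done, they can be cleaned up from the same shell with something like:

jobs -p | xargs kill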


Actual Result (the easy dummy.service way):
[root@localhost tmp]# time /tmp/dummy_service_restarter.sh
Job for dummy.service failed because the control process exited with error code. See "systemctl status dummy.service" and "journalctl -xe" for details.
dummy_service_restarter.sh: Failed to restart! systemctl exited with status 1

real    0m7.280s
user    0m1.966s
sys     0m1.974s

[root@localhost tmp]# systemctl status dummy.service
● dummy.service - Dummy service
   Loaded: loaded (/tmp/dummy.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2020-02-21 12:14:35 UTC; 3min 17s ago
  Process: 10865 ExecStartPost=/tmp/dummy_service_exec_start_post.sh $MAINPID (code=exited, status=42)
  Process: 10864 ExecStart=/bin/sleep 100000 (code=killed, signal=TERM)
 Main PID: 10864 (code=killed, signal=TERM)

Feb 21 12:14:35 localhost.localdomain systemd[1]: Starting Dummy service...
Feb 21 12:14:35 localhost.localdomain systemd[1]: dummy.service: control process exited, code=exited status=42
Feb 21 12:14:35 localhost.localdomain systemd[1]: Failed to start Dummy service.
Feb 21 12:14:35 localhost.localdomain systemd[1]: Unit dummy.service entered failed state.
Feb 21 12:14:35 localhost.localdomain systemd[1]: dummy.service failed.


Sometimes I had to wait up to 5 minutes; the typical wait time is 30 seconds. Reproduction rate (using >= 2 vCPUs, setting hidepid=2, and stressing the CPU): 100%. I reproduced it about 30 times in different environments.



----------------------------------------------------------------



Analysis of what is happening:
Systemd executes the code in ExecStart= and ExecStartPost= almost at the same time. If User= is set to a non-root user, both commands run as that user (I used "nobody"). If the code in ExecStartPost= checks for the /proc/$MAINPID directory, there is a small time gap during which the OS/kernel/filesystem (?) reports that the pid directory does not exist (because of the hidepid=2 mount option). If ExecStartPost= then exits with a non-zero return code, systemd kills the main process and marks the service as failed. In my script you can see that systemd picks up my 42 ("meaning of life") return code.
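
For reference, the steady-state effect of hidepid=2 is that /proc entries of other users' processes are reported as missing outright (not merely access-restricted as with hidepid=1); the race above produces the same "No such file or directory" answer transiently, even for a PID owned by the same user. Roughly:

su -s /bin/sh nobody -c 'ls /proc/1'
# ls: cannot access /proc/1: No such file or directory   (PID 1 belongs to root)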


Workarounds:
Workaround 1) Don’t use hidepid=2 on /proc. I verified that hidepid=1 and hidepid=0 are both fine.

Workaround 2) Create a small sleep between ExecStart= and ExecStartPost=. For example, by creating this drop-in file (and its directory):
----------- /etc/systemd/system/dummy.service.d/workaround.conf -----------
[Service]
ExecStartPost=
ExecStartPost=/bin/sleep 0.2
ExecStartPost=/tmp/dummy_service_exec_start_post.sh $MAINPID
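
Note that the drop-in only takes effect after systemd re-reads its unit files; something like:

mkdir -p /etc/systemd/system/dummy.service.d
# ...write workaround.conf as above, then:
systemctl daemon-reload
systemctl restart dummy.service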

Just 0.2 seconds of sleep was enough to make the problem go away.
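
Workaround 3 (an untested sketch along the same lines): instead of a fixed sleep, make the ExecStartPost script itself tolerant of the transient gap by retrying briefly before giving up:

#!/bin/sh
# Retry variant of the ExecStartPost check: poll /proc/$MAINPID for up
# to ~2 seconds before concluding that the main process is really gone.
MAINPID=$1
for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ -d "/proc/$MAINPID" ]; then
        echo "Found MAINPID: $MAINPID"
        exit 0
    fi
    sleep 0.2
done
echo "Directory /proc/$MAINPID did not exist after retries!"
exit 42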

Comment 2 Lennart Schedin 2020-03-03 11:50:57 UTC
Created attachment 1667177 [details]
Dummy service for easy reproduction

Comment 3 Lennart Schedin 2020-03-03 11:52:02 UTC
Created attachment 1667178 [details]
Dummy service ExecStartPost script for easy reproduction

Comment 4 Lennart Schedin 2020-03-03 11:52:37 UTC
Created attachment 1667179 [details]
Dummy service restarter loop script for easy reproduction

Comment 8 Chris Williams 2020-11-11 21:38:56 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, and they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7,Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7