OS: Fedora ALL
Version: httpd ALL
Mechanism: crond -> dnf update -y
Pattern found: yes

Tonight's security update of httpd did not restart the running httpd processes on all servers in our cluster. 2 servers were unable to restart httpd during the update.

Root Cause: wrong order of commands: START -> STOP, instead of STOP -> waitforallPIDstoStop() -> START

Caused by: --no-block option in the posttrans script:

test -f /etc/sysconfig/httpd-disable-posttrans || \
  /bin/systemctl try-restart --no-block httpd.service htcacheclean.service >/dev/null 2>&1 || :

Explanation: httpd processes under load sometimes need a lot of time to finish. As a result, stopping all the units is delayed, which does not work well with the --no-block option -> "Do not synchronously wait for the requested operation to finish." The asynchronously started httpds then get killed by the systemd hard kill once the timeout for the stop job expires.

Result: Disruption of Production Servers

Fix: hard restart of httpd -> systemctl stop httpd; systemctl start httpd

ATTN: This has been reported in the past, without any reaction.

NOTE: dnf does not log the time with the correct TZ, so we have a +2 h difference between the dnf output and the httpd error log.

[root@serverA ~]# dnf history info 137
Transaction ID : 137
Begin time     : 2025-10-16 04:01:07
Begin rpmdb    : 279ae4a64ab1fb3ac8b57d6d873f60111e2825da692a449cc2cb8ee1836a67b3
End time       : 2025-10-16 04:01:09
End rpmdb      : 4338cd8ff7a2a0879fa178135dbe8134b22fca8fefff51d0029f5e129d9cf961
User           : 0 root <root>
Status         : Ok
Releasever     : 41
Description    : dnf -y update
Comment        :
Packages altered:
  Action    Package                                  Reason           Repository
  Upgrade   httpd-0:2.4.64-1.fc41.x86_64             User             updates
  Upgrade   httpd-core-0:2.4.64-1.fc41.x86_64        Dependency       updates
  Upgrade   httpd-filesystem-0:2.4.64-1.fc41.noarch  Dependency       updates
  Upgrade   httpd-tools-0:2.4.64-1.fc41.x86_64       Dependency       updates
  Upgrade   mod_ssl-1:2.4.64-1.fc41.x86_64           User             updates
  Upgrade   mod_lua-0:2.4.64-1.fc41.x86_64           Weak Dependency  updates
  Replaced  httpd-0:2.4.63-1.fc41.x86_64             User             @System
  Replaced  httpd-core-0:2.4.63-1.fc41.x86_64        Dependency       @System
  Replaced  httpd-filesystem-0:2.4.63-1.fc41.noarch  Dependency       @System
  Replaced  httpd-tools-0:2.4.63-1.fc41.x86_64       Dependency       @System
  Replaced  mod_lua-0:2.4.63-1.fc41.x86_64           Weak Dependency  @System
  Replaced  mod_ssl-1:2.4.63-1.fc41.x86_64           User             @System

O== Server A error.log:

[Thu Oct 16 05:59:29.399720 2025] [core:error] [pid 2821499:tid 2821535] [client 114.119.132.42:20071] AH00124: Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
[Thu Oct 16 06:01:08.271089 2025] [mpm_event:notice] [pid 1523711:tid 1523711] AH00491: caught SIGTERM, shutting down
[Thu Oct 16 06:01:08.388350 2025] [suexec:notice] [pid 2925681:tid 2925681] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Oct 16 06:01:08.582556 2025] [so:warn] [pid 2925681:tid 2925681] AH01574: module http2_module is already loaded, skipping
[Thu Oct 16 06:01:08.948539 2025] [mpm_event:notice] [pid 2925732:tid 2925732] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 06:01:08.948570 2025] [core:notice] [pid 2925732:tid 2925732] AH00094: Command line: '/usr/sbin/httpd'
[Thu Oct 16 06:01:53.525161 2025] [mpm_event:notice] [pid 2925732:tid 2925732] AH00491: caught SIGTERM, shutting down
[Thu Oct 16 08:59:21.730999 2025] [suexec:notice] [pid 2975425:tid 2975425] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Oct 16 08:59:21.847368 2025] [so:warn] [pid 2975425:tid 2975425] AH01574: module http2_module is already loaded, skipping
[Thu Oct 16 08:59:21.995257 2025] [mpm_event:notice] [pid 2975426:tid 2975426] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 08:59:21.995283 2025] [core:notice] [pid 2975426:tid 2975426] AH00094: Command line: '/usr/sbin/httpd'

The most important part is this: START -> STOP

[Thu Oct 16 06:01:08.948539 2025] [mpm_event:notice] [pid 2925732:tid 2925732] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 06:01:08.948570 2025] [core:notice] [pid 2925732:tid 2925732] AH00094: Command line: '/usr/sbin/httpd'
[Thu Oct 16 06:01:53.525161 2025] [mpm_event:notice] [pid 2925732:tid 2925732] AH00491: caught SIGTERM, shutting down

If I'm not mistaken, the normal order of restarts is: STOP -> START, not START -> STOP.

O== SERVER B error.log

[Thu Oct 16 08:00:50.414068 2025] [access_compat:error] [pid 2569289:tid 2569333] [client 83.246.80.131:37186] AH01797: client denied by server configuration: ...

... normal operation until ...

[Thu Oct 16 08:01:06.425285 2025] [mpm_event:notice] [pid 1454:tid 1454] AH00491: caught SIGTERM, shutting down
[Thu Oct 16 08:01:06.529325 2025] [suexec:notice] [pid 2775308:tid 2775308] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Oct 16 08:01:06.672450 2025] [so:warn] [pid 2775308:tid 2775308] AH01574: module http2_module is already loaded, skipping
[Thu Oct 16 08:01:06.774310 2025] [mpm_event:notice] [pid 2775317:tid 2775317] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 08:01:06.774328 2025] [core:notice] [pid 2775317:tid 2775317] AH00094: Command line: '/usr/sbin/httpd'

... log messages from still running httpds ...

[Thu Oct 16 08:01:34.743160 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64: install_driver(mysql) failed: Can't load '/usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql.so' for module DBD::mysql: libmysqlclient.so.21: cannot open shared object file: No such file or directory at /usr/lib64/perl5/DynaLoader.pm line 206.
[Thu Oct 16 08:01:34.743183 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64:
[Thu Oct 16 08:01:34.743201 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64: Compilation failed in require at (eval 5) line 3.
[Thu Oct 16 08:01:34.743216 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64: Perhaps a required shared library or dll isn't installed where expected
[Thu Oct 16 08:01:34.743231 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64: at /home/bboah-hardwarede/bboah_old_cgi/cgi-bin/db_common.pl line 34.

... and here is the SIGTERM stopping it again ...

[Thu Oct 16 08:01:51.556992 2025] [mpm_event:notice] [pid 2775317:tid 2775317] AH00491: caught SIGTERM, shutting down

... manual restart by admins ...

[Thu Oct 16 08:59:08.273029 2025] [suexec:notice] [pid 2793036:tid 2793036] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Oct 16 08:59:08.356051 2025] [so:warn] [pid 2793036:tid 2793036] AH01574: module http2_module is already loaded, skipping
[Thu Oct 16 08:59:08.452154 2025] [mpm_event:notice] [pid 2793037:tid 2793037] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 08:59:08.452174 2025] [core:notice] [pid 2793037:tid 2793037] AH00094: Command line: '/usr/sbin/httpd'
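
The STOP -> wait for all PIDs -> START order described in the report could be sketched in shell roughly as follows. This is only an illustration and not what the httpd package does: `wait_for_exit` is a hypothetical helper name, the timeouts are arbitrary assumptions, and a stand-in `sleep` process is used in place of httpd so the sketch can be run anywhere.

```shell
#!/bin/sh
# Hypothetical helper: block until the given PID has exited, or until the
# timeout in seconds (assumed default: 30) has elapsed.
# Returns 0 if the process is gone, 1 on timeout.
wait_for_exit() {
    pid=$1
    timeout=${2:-30}
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demonstration with a stand-in process instead of httpd.
# On a real host the PID would be the old httpd main PID and the
# commands around the wait would be "systemctl stop" and "systemctl start".
sleep 2 &
demo_pid=$!
if wait_for_exit "$demo_pid" 10; then
    echo "old process gone, safe to start the new one"
else
    echo "old process still running after timeout"
fi
```

The point of the helper is that the start step only runs once the old process has actually disappeared, which is the ordering the report asks for.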
Here we go again:

[Mon Nov 03 04:01:10.349422 2025] [mpm_event:notice] [pid 2483928:tid 2483928] AH00491: caught SIGTERM, shutting down
[Mon Nov 03 04:01:10.630929 2025] [suexec:notice] [pid 1333351:tid 1333351] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon Nov 03 04:01:10.888680 2025] [so:warn] [pid 1333351:tid 1333351] AH01574: module http2_module is already loaded, skipping
[Mon Nov 03 04:01:11.239634 2025] [mpm_event:notice] [pid 1333360:tid 1333360] AH00489: Apache/2.4.65 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Mon Nov 03 04:01:11.239694 2025] [core:notice] [pid 1333360:tid 1333360] AH00094: Command line: '/usr/sbin/httpd'
[Mon Nov 03 04:01:55.619624 2025] [mpm_event:notice] [pid 1333360:tid 1333360] AH00491: caught SIGTERM, shutting down
[Mon Nov 03 10:13:39.656671 2025] [suexec:notice] [pid 1464477:tid 1464477] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon Nov 03 10:13:39.780258 2025] [so:warn] [pid 1464477:tid 1464477] AH01574: module http2_module is already loaded, skipping
[Mon Nov 03 10:13:39.945162 2025] [mpm_event:notice] [pid 1464478:tid 1464478] AH00489: Apache/2.4.65 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Mon Nov 03 10:13:39.945187 2025] [core:notice] [pid 1464478:tid 1464478] AH00094: Command line: '/usr/sbin/httpd'

  Upgrade   httpd-0:2.4.65-1.fc41.x86_64             User             updates
  Upgrade   httpd-core-0:2.4.65-1.fc41.x86_64        Dependency       updates
  Upgrade   httpd-filesystem-0:2.4.65-1.fc41.noarch  Dependency       updates
  Upgrade   httpd-tools-0:2.4.65-1.fc41.x86_64       Dependency       updates
  Upgrade   mod_ssl-1:2.4.65-1.fc41.x86_64           User             updates
  Upgrade   mod_lua-0:2.4.65-1.fc41.x86_64           Weak Dependency  updates
Next server down due to httpd update.. Please, fix this.
You're also using a modified httpd.service here (I can tell from the SIGTERM not SIGWINCH), please can you provide the "systemctl show httpd.service" output? There is a trade-off between uptime and reliability here. The 100% reliable way to update is "systemctl stop httpd / dnf update / systemctl start httpd", which sacrifices uptime for reliability. The %posttrans is a best effort which is generally reliable (evidence: you are the only user who I've seen reporting this issue) without sacrificing uptime, and it is configurable exactly because it won't be the behaviour that's desirable for all users. If you're seeing this frequently I'd suggest you disable it.
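
The two options mentioned here (disabling the %posttrans restart and doing a stop / update / start by hand) might look like this in a maintenance script. The flag file path is taken from the scriptlet quoted in the report; the `run` wrapper, the `DRY_RUN` variable and the function name `update_httpd_offline` are hypothetical additions so the sequence can be previewed without touching a production host.

```shell
#!/bin/sh
# Hypothetical wrapper: only print the commands unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_httpd_offline() {
    # Opt out of the package's best-effort restart
    # (flag file path as quoted from the scriptlet in the report).
    run touch /etc/sysconfig/httpd-disable-posttrans
    # Reliable but disruptive order: stop, update, start.
    run systemctl stop httpd.service
    run dnf -y update
    run systemctl start httpd.service
}

update_httpd_offline
```

Run as is, it only prints the four commands; setting `DRY_RUN=0` as root would execute them, at the cost of httpd being down for the duration of the update.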
I checked the show output; it's not really helpful because of the sheer volume of output.

When we started with the Apache cluster 15 years ago we needed some changes, so we created this unit, now with the Wants=/After= changes from the other ticket to compensate for the IPv6 issue from NetworkManager 1.52. We needed "LimitNOFILE=1000000", which did not work otherwise. So the Exec* lines may be a bit older than usual.

------------------------------------------------
[Unit]
Description=The Apache HTTP Server (prefork MPM)
After=syslog.target network.target remote-fs.target nss-lookup.target network-online.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/var/run/httpd/httpd.pid
LimitNOFILE=1000000
#EnvironmentFile=/etc/sysconfig/httpd
ExecStart=/usr/sbin/httpd $OPTIONS -k start
ExecReload=/usr/sbin/httpd $OPTIONS -t
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/usr/sbin/httpd $OPTIONS -k stop
PrivateTmp=true

[Install]
WantedBy=multi-user.target
------------------------------------------------

In the F42 service file, the mechanics are a bit different.

------------------------------------------------
[Service]
Type=notify
Environment=LANG=C

ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
# Send SIGWINCH for graceful stop
KillSignal=SIGWINCH
KillMode=mixed
------------------------------------------------

From the systemctl manpage we read this:

--no-block
    Do not synchronously wait for the requested operation to finish. If this is not specified, the job will be verified, enqueued and systemctl will wait until the unit's start-up is completed. By passing this argument, it is only verified and enqueued. This option may not be combined with --wait.

Because it does not wait for the service to finish, my guess is that systemd sees running processes from the old httpd instance and tries to kill them, like "killall -9 httpd", but does not limit this to the old PIDs.

If --no-block is removed from the rpm script, it should wait for all httpds to finish and then start them anew. That will sometimes take a while, but tbh it's seconds we are talking about. There is no need to worry about this delay unless an httpd is stuck in an endless loop. I have not noticed a single hanging httpd in years now and, as said, we have a big cluster of httpds. I opened the bug report because this is happening more often and has become a noticeable issue.

The question is whether "Restart=always" in the service file would solve the issue; that would be the easiest way, as we already ship our own service file.
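
Regarding the volume of the `systemctl show` output: it can be narrowed down with `-p`, which restricts the output to the named properties. A small sketch, assuming the properties below are the ones of interest for this ticket (the selection is a guess, not something the maintainers asked for specifically):

```shell
#!/bin/sh
# Properties that describe how the unit is started, stopped and killed.
# The selection is an assumption about what is relevant to this ticket.
props="Type KillMode KillSignal TimeoutStopUSec ExecStart ExecStop FragmentPath DropInPaths LimitNOFILE"

# Build the "-p Name" arguments for systemctl show.
args=""
for p in $props; do
    args="$args -p $p"
done

if command -v systemctl >/dev/null 2>&1; then
    # $args is intentionally unquoted so it splits into separate arguments.
    systemctl show httpd.service $args || true
else
    # No systemd on this machine: just print the command that would be run.
    echo "systemctl show httpd.service$args"
fi
```

FragmentPath and DropInPaths show which unit file and which drop-ins are actually in effect, which helps to confirm whether the stock unit or a local copy is being used.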
Please revert to the Fedora stock httpd.service and use drop-ins for any Limit* configuration you want. That service file you presented has race conditions and we stopped shipping a service like that in Fedora for *over a decade* exactly because it has race conditions.
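
A drop-in keeps the stock unit and overrides only the limit. A minimal sketch, assuming the `LimitNOFILE=1000000` value from the custom unit quoted earlier in this ticket; the file name `limits.conf` and the `DROPIN_DIR` variable are illustrative, and the default points at a scratch directory so nothing on the system changes until the path is set explicitly.

```shell
#!/bin/sh
# Illustrative: write to a scratch directory by default. On a real host set
# DROPIN_DIR=/etc/systemd/system/httpd.service.d and run as root.
DROPIN_DIR=${DROPIN_DIR:-$(mktemp -d)/httpd.service.d}
mkdir -p "$DROPIN_DIR"

# Only the limit is overridden; everything else comes from the stock unit.
cat > "$DROPIN_DIR/limits.conf" <<'EOF'
[Service]
LimitNOFILE=1000000
EOF

cat "$DROPIN_DIR/limits.conf"

# On a real host, follow up with:
#   systemctl daemon-reload
#   systemctl restart httpd.service
#   systemctl cat httpd.service    # verify stock unit plus drop-in
```

With the limit moved into a drop-in, the locally maintained copy of httpd.service is no longer needed and package updates to the stock unit (Type=notify, KillMode=mixed) take effect.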
Yup. We switched to KillMode=mixed in September 2014. https://src.fedoraproject.org/rpms/httpd/c/36930381bc186af121a2439f92b8fe2c2c6f3acc