Bug 2404361

Summary:	rpm update does not restart httpd reliably - wrong order of operations
Product:	[Fedora] Fedora	Reporter:	customercare
Component:	httpd	Assignee:	Luboš Uhliarik <luhliari>
Status:	CLOSED NOTABUG	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	rawhide	CC:	anon.amish, jorton, luhliari, mturk
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Last Closed:	2025-12-05 16:53:29 UTC	Type:	Bug

Description customercare 2025-10-16 07:48:29 UTC

OS: Fedora ALL
Version: httpd ALL
Mechanism: crond -> dnf update -y
Pattern found: yes

Tonight's httpd security update did not restart the running httpd processes on all servers in our cluster. Two servers failed to restart httpd during the update.

Root Cause: wrong order of commands: START -> STOP, instead of STOP -> waitforallPIDstoStop() -> START

Caused by: --no-block option in the posttransaction script (%posttrans):

test -f /etc/sysconfig/httpd-disable-posttrans || \
  /bin/systemctl try-restart --no-block httpd.service htcacheclean.service >/dev/null 2>&1 || :

Explanation:

httpd processes under load sometimes need a long time to finish. This delays the stop of the units, which does not work well with the --no-block option ("Do not synchronously wait for the requested operation to finish."). As a result, the asynchronously started httpd processes get killed by systemd's hard kill once the timeout for the stop job expires.

Result: Disruption of Production Servers 

Fix: hard restart of httpd -> systemctl stop httpd; systemctl start httpd
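
For illustration only, the ordering requested here (stop, wait for the stop to finish, then start) could be sketched as a small shell function. This is not the packaged scriptlet. The function name blocking_restart and the SYSTEMCTL override variable are made up for this sketch, and it relies on systemctl's documented default of waiting for a job to finish when --no-block is not given.

```shell
# Hypothetical helper, not part of the httpd package.
# SYSTEMCTL can be overridden so the sketch can be dry-run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

blocking_restart() {
  unit=$1
  # Like try-restart, only act on units that are currently running.
  "$SYSTEMCTL" is-active --quiet "$unit" || return 0
  # Without --no-block, systemctl returns only after the stop job has finished.
  "$SYSTEMCTL" stop "$unit" || return 1
  "$SYSTEMCTL" start "$unit"
}

# Example call (commented out so that sourcing this file changes nothing):
# blocking_restart httpd.service
```

The price of this ordering is that the service is down for as long as the stop takes.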


ATTN: This has been reported in the past, without any reaction. 

NOTE: dnf does not log the time in the correct TZ, so the httpd error log is 2 h ahead of the dnf output.

[root@serverA ~]# dnf history info 137
Transaction ID : 137
Begin time     : 2025-10-16 04:01:07
Begin rpmdb    : 279ae4a64ab1fb3ac8b57d6d873f60111e2825da692a449cc2cb8ee1836a67b3
End time       : 2025-10-16 04:01:09
End rpmdb      : 4338cd8ff7a2a0879fa178135dbe8134b22fca8fefff51d0029f5e129d9cf961
User           : 0 root <root>
Status         : Ok
Releasever     : 41
Description    : dnf -y update
Comment        : 
Packages altered:
  Action   Package                                 Reason          Repository
  Upgrade  httpd-0:2.4.64-1.fc41.x86_64            User            updates
  Upgrade  httpd-core-0:2.4.64-1.fc41.x86_64       Dependency      updates
  Upgrade  httpd-filesystem-0:2.4.64-1.fc41.noarch Dependency      updates
  Upgrade  httpd-tools-0:2.4.64-1.fc41.x86_64      Dependency      updates
  Upgrade  mod_ssl-1:2.4.64-1.fc41.x86_64          User            updates
  Upgrade  mod_lua-0:2.4.64-1.fc41.x86_64          Weak Dependency updates
  Replaced httpd-0:2.4.63-1.fc41.x86_64            User            @System
  Replaced httpd-core-0:2.4.63-1.fc41.x86_64       Dependency      @System
  Replaced httpd-filesystem-0:2.4.63-1.fc41.noarch Dependency      @System
  Replaced httpd-tools-0:2.4.63-1.fc41.x86_64      Dependency      @System
  Replaced mod_lua-0:2.4.63-1.fc41.x86_64          Weak Dependency @System
  Replaced mod_ssl-1:2.4.63-1.fc41.x86_64          User            @System
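
To line the dnf times up with the httpd error log, the Begin time above can be converted by hand. The sketch below assumes that dnf printed UTC and that the servers run on Central European time; neither is stated in this report, only the 2 h offset is. The helper name to_local is made up, and GNU date is required for the -d option.

```shell
# Assumptions: dnf history prints UTC; the servers use Central European time.
# The POSIX TZ rule spells out CET/CEST so no tzdata zone name is needed.
to_local() {
  TZ='CET-1CEST,M3.5.0,M10.5.0/3' date -d "$1 UTC" '+%Y-%m-%d %H:%M:%S'
}

to_local '2025-10-16 04:01:07'   # prints 2025-10-16 06:01:07
```

The converted time falls within the same second as the first SIGTERM in the Server A log.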


O==  Server A error.log:


[Thu Oct 16 05:59:29.399720 2025] [core:error] [pid 2821499:tid 2821535] [client 114.119.132.42:20071] AH00124: Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
[Thu Oct 16 06:01:08.271089 2025] [mpm_event:notice] [pid 1523711:tid 1523711] AH00491: caught SIGTERM, shutting down
[Thu Oct 16 06:01:08.388350 2025] [suexec:notice] [pid 2925681:tid 2925681] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Oct 16 06:01:08.582556 2025] [so:warn] [pid 2925681:tid 2925681] AH01574: module http2_module is already loaded, skipping
[Thu Oct 16 06:01:08.948539 2025] [mpm_event:notice] [pid 2925732:tid 2925732] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 06:01:08.948570 2025] [core:notice] [pid 2925732:tid 2925732] AH00094: Command line: '/usr/sbin/httpd'
[Thu Oct 16 06:01:53.525161 2025] [mpm_event:notice] [pid 2925732:tid 2925732] AH00491: caught SIGTERM, shutting down
[Thu Oct 16 08:59:21.730999 2025] [suexec:notice] [pid 2975425:tid 2975425] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Oct 16 08:59:21.847368 2025] [so:warn] [pid 2975425:tid 2975425] AH01574: module http2_module is already loaded, skipping
[Thu Oct 16 08:59:21.995257 2025] [mpm_event:notice] [pid 2975426:tid 2975426] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 08:59:21.995283 2025] [core:notice] [pid 2975426:tid 2975426] AH00094: Command line: '/usr/sbin/httpd'


The most important part is this: START -> STOP


[Thu Oct 16 06:01:08.948539 2025] [mpm_event:notice] [pid 2925732:tid 2925732] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 06:01:08.948570 2025] [core:notice] [pid 2925732:tid 2925732] AH00094: Command line: '/usr/sbin/httpd'
[Thu Oct 16 06:01:53.525161 2025] [mpm_event:notice] [pid 2925732:tid 2925732] AH00491: caught SIGTERM, shutting down

If I'm not mistaken, the normal order of restarts is: 

STOP -> START, not START -> STOP.
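
To check other servers for the same pattern, a rough sketch like the following pairs each "resuming normal operations" line (AH00489) with a later "caught SIGTERM" line (AH00491) from the same parent PID. The function name find_start_then_sigterm is made up. The sketch only prints the two timestamps and leaves it to the reader to judge whether the gap is suspiciously short.

```shell
# Hypothetical helper: reads httpd error logs (files or stdin) and reports
# parent PIDs that logged AH00489 and afterwards AH00491.
find_start_then_sigterm() {
  awk '
    match($0, /\[pid [0-9]+:/) {
      pid = substr($0, RSTART + 5, RLENGTH - 6)   # digits between "[pid " and ":"
      ts = $4                                      # HH:MM:SS.microseconds
      if ($0 ~ /AH00489/) {
        started[pid] = ts
      } else if ($0 ~ /AH00491/ && (pid in started)) {
        print "pid " pid ": started " started[pid] ", SIGTERM " ts
        delete started[pid]
      }
    }
  ' "$@"
}

# Example call, assuming the default Fedora log location:
# find_start_then_sigterm /var/log/httpd/error_log
```

For the Server A excerpt this reports pid 2925732, started 06:01:08.948539 and terminated 06:01:53.525161. A server that ran for days before a regular stop would be listed as well, so the timestamps still need a look.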



O==  SERVER B error.log


[Thu Oct 16 08:00:50.414068 2025] [access_compat:error] [pid 2569289:tid 2569333] [client 83.246.80.131:37186] AH01797: client denied by server configuration: ... 

... normal operation until ... 

[Thu Oct 16 08:01:06.425285 2025] [mpm_event:notice] [pid 1454:tid 1454] AH00491: caught SIGTERM, shutting down
[Thu Oct 16 08:01:06.529325 2025] [suexec:notice] [pid 2775308:tid 2775308] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Oct 16 08:01:06.672450 2025] [so:warn] [pid 2775308:tid 2775308] AH01574: module http2_module is already loaded, skipping
[Thu Oct 16 08:01:06.774310 2025] [mpm_event:notice] [pid 2775317:tid 2775317] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 08:01:06.774328 2025] [core:notice] [pid 2775317:tid 2775317] AH00094: Command line: '/usr/sbin/httpd'

... log messages from still running httpds ...

[Thu Oct 16 08:01:34.743160 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64: install_driver(mysql) failed: Can't load '/usr/lib64/perl5/vendor_perl/auto/DBD/mysql/mysql.so' for module DBD::mysql: libmysqlclient.so.21: cannot open shared object file: No such file or directory at /usr/lib64/perl5/DynaLoader.pm line 206.
[Thu Oct 16 08:01:34.743183 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64: 
[Thu Oct 16 08:01:34.743201 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64: Compilation failed in require at (eval 5) line 3.
[Thu Oct 16 08:01:34.743216 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64: Perhaps a required shared library or dll isn't installed where expected
[Thu Oct 16 08:01:34.743231 2025] [cgid:error] [pid 2775610:tid 2775645] [client 216.244.66.226:41858] AH01215: stderr from /etc/httpd/bin/cgiwrap64:  at /home/bboah-hardwarede/bboah_old_cgi/cgi-bin/db_common.pl line 34.

... and here is the SIGTERM stopping it again ...

[Thu Oct 16 08:01:51.556992 2025] [mpm_event:notice] [pid 2775317:tid 2775317] AH00491: caught SIGTERM, shutting down

... manual restart by admins ...

[Thu Oct 16 08:59:08.273029 2025] [suexec:notice] [pid 2793036:tid 2793036] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Oct 16 08:59:08.356051 2025] [so:warn] [pid 2793036:tid 2793036] AH01574: module http2_module is already loaded, skipping
[Thu Oct 16 08:59:08.452154 2025] [mpm_event:notice] [pid 2793037:tid 2793037] AH00489: Apache/2.4.64 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Thu Oct 16 08:59:08.452174 2025] [core:notice] [pid 2793037:tid 2793037] AH00094: Command line: '/usr/sbin/httpd'

Comment 1 customercare 2025-11-03 09:19:59 UTC

Here we go again:

[Mon Nov 03 04:01:10.349422 2025] [mpm_event:notice] [pid 2483928:tid 2483928] AH00491: caught SIGTERM, shutting down
[Mon Nov 03 04:01:10.630929 2025] [suexec:notice] [pid 1333351:tid 1333351] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon Nov 03 04:01:10.888680 2025] [so:warn] [pid 1333351:tid 1333351] AH01574: module http2_module is already loaded, skipping
[Mon Nov 03 04:01:11.239634 2025] [mpm_event:notice] [pid 1333360:tid 1333360] AH00489: Apache/2.4.65 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Mon Nov 03 04:01:11.239694 2025] [core:notice] [pid 1333360:tid 1333360] AH00094: Command line: '/usr/sbin/httpd'
[Mon Nov 03 04:01:55.619624 2025] [mpm_event:notice] [pid 1333360:tid 1333360] AH00491: caught SIGTERM, shutting down

[Mon Nov 03 10:13:39.656671 2025] [suexec:notice] [pid 1464477:tid 1464477] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon Nov 03 10:13:39.780258 2025] [so:warn] [pid 1464477:tid 1464477] AH01574: module http2_module is already loaded, skipping
[Mon Nov 03 10:13:39.945162 2025] [mpm_event:notice] [pid 1464478:tid 1464478] AH00489: Apache/2.4.65 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Mon Nov 03 10:13:39.945187 2025] [core:notice] [pid 1464478:tid 1464478] AH00094: Command line: '/usr/sbin/httpd'

  Upgrade  httpd-0:2.4.65-1.fc41.x86_64            User            updates
  Upgrade  httpd-core-0:2.4.65-1.fc41.x86_64       Dependency      updates
  Upgrade  httpd-filesystem-0:2.4.65-1.fc41.noarch Dependency      updates
  Upgrade  httpd-tools-0:2.4.65-1.fc41.x86_64      Dependency      updates
  Upgrade  mod_ssl-1:2.4.65-1.fc41.x86_64          User            updates
  Upgrade  mod_lua-0:2.4.65-1.fc41.x86_64          Weak Dependency updates

Comment 2 customercare 2025-12-02 21:55:38 UTC

Another server went down due to an httpd update.

Please, fix this.

Comment 3 Joe Orton 2025-12-03 15:22:39 UTC

You're also using a modified httpd.service here (I can tell from the SIGTERM instead of SIGWINCH). Can you please provide the "systemctl show httpd.service" output?

There is a trade-off between uptime and reliability here. The 100% reliable way to update is "systemctl stop httpd / dnf update / systemctl start httpd", which sacrifices uptime for reliability. The %posttrans restart is a best effort which is generally reliable (evidence: you are the only user I've seen reporting this issue) without sacrificing uptime, and it is configurable exactly because it won't be the desirable behaviour for all users. If you're seeing this frequently I'd suggest you disable it.
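
A minimal sketch of that reliable path, combined with disabling the %posttrans restart through the marker file the scriptlet tests for (/etc/sysconfig/httpd-disable-posttrans, as quoted in the description). The function name and the SYSTEMCTL, DNF and MARKER override variables are made up for illustration; the commands themselves are the ones named above.

```shell
# Hypothetical wrapper; the override variables exist only to allow a dry run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}
DNF=${DNF:-dnf}
MARKER=${MARKER:-/etc/sysconfig/httpd-disable-posttrans}

update_httpd_offline() {
  # The scriptlet skips its try-restart when this file exists.
  touch "$MARKER" || return 1
  "$SYSTEMCTL" stop httpd.service || return 1
  "$DNF" -y update
  rc=$?
  # Start httpd again even if the update failed, then report dnf's status.
  "$SYSTEMCTL" start httpd.service || return 1
  return "$rc"
}

# Example call (commented out so that sourcing this file changes nothing):
# update_httpd_offline
```

httpd stays down for the whole transaction, which is the uptime cost described above.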

Comment 4 customercare 2025-12-03 16:11:42 UTC

I checked the show output, but it's not really helpful because it is so long.
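
One way to cut the show output down is to ask only for the properties that govern how the service is started and stopped, using systemctl's -p option. The selection of properties below is a guess at what matters for this bug, and the function name and the SYSTEMCTL override are made up for this sketch.

```shell
# Hypothetical helper; SYSTEMCTL can be overridden for a dry run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

show_stop_settings() {
  # FragmentPath and DropInPaths reveal which unit file and drop-ins are in effect.
  "$SYSTEMCTL" show \
    -p Type -p KillMode -p KillSignal -p TimeoutStopUSec \
    -p ExecStart -p ExecStop -p FragmentPath -p DropInPaths \
    "${1:-httpd.service}"
}

# Example call (commented out so that sourcing this file changes nothing):
# show_stop_settings httpd.service
```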

When we started the Apache cluster 15 years ago we needed some changes, so we created the unit file below. It now also carries the Wants=/After= changes from the other ticket, which compensate for the IPv6 issue in NetworkManager 1.52.

We needed "LimitNOFILE=1000000", which did not work otherwise. So the Exec* lines may be a bit older than usual.

------------------------------------------------
[Unit]
Description=The Apache HTTP Server (prefork MPM)
After=syslog.target network.target remote-fs.target nss-lookup.target network-online.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/var/run/httpd/httpd.pid
LimitNOFILE=1000000
#EnvironmentFile=/etc/sysconfig/httpd
ExecStart=/usr/sbin/httpd $OPTIONS -k start
ExecReload=/usr/sbin/httpd $OPTIONS -t
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/usr/sbin/httpd $OPTIONS -k stop
PrivateTmp=true

[Install]
WantedBy=multi-user.target
------------------------------------------------

In the F42 service file, the mechanics are a bit different.

------------------------------------------------
[Service]
Type=notify
Environment=LANG=C

ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
# Send SIGWINCH for graceful stop
KillSignal=SIGWINCH
KillMode=mixed
------------------------------------------------

From the systemctl manpage we read this:

--no-block
Do not synchronously wait for the requested operation to finish. If this is not specified, the job will be verified, enqueued and systemctl will wait until the unit's start-up is completed. By passing this argument, it is only verified and enqueued. This option may not be combined with --wait.

Because nothing waits for the service to finish, my guess is that systemd still sees running processes from the old httpd instance and tries to kill them, like "killall -9 httpd", but does not limit this to the old PIDs.

If --no-block is removed from the rpm script, it should wait for all httpd processes to finish and then start them anew. That will sometimes take a while, but to be honest, we are talking about seconds. There is no need to worry about this delay unless an httpd process is stuck in an endless loop.

I have not noticed a single hanging httpd in years, and as I said, we have a big cluster of httpd servers. I opened this bug report because the problem is happening more often and has become a noticeable issue.

The question is whether "Restart=always" in the service file would solve the issue. That would be the easiest way, as we already ship our own service file.

Comment 5 Joe Orton 2025-12-05 16:53:29 UTC

Please revert to the Fedora stock httpd.service and use drop-ins for any Limit* configuration you want. The service file you presented has race conditions, and we stopped shipping a service like that in Fedora *over a decade* ago exactly because of those race conditions.
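
As a sketch of what such a drop-in could look like, the following writes the LimitNOFILE value from comment 4 into a drop-in directory for httpd.service. The file name limits.conf, the function name and the DROPIN_DIR override variable are made up; the directory is the standard systemd location for administrator drop-ins.

```shell
# Hypothetical helper; DROPIN_DIR can be overridden to try this in a scratch directory.
DROPIN_DIR=${DROPIN_DIR:-/etc/systemd/system/httpd.service.d}

write_limit_dropin() {
  mkdir -p "$DROPIN_DIR" || return 1
  # Only the changed setting goes here; everything else comes from the stock unit.
  cat > "$DROPIN_DIR/limits.conf" <<'EOF'
[Service]
LimitNOFILE=1000000
EOF
}

# Example (commented out): write the drop-in, then make systemd pick it up.
# write_limit_dropin && systemctl daemon-reload && systemctl restart httpd.service
```

A copy of the whole unit in /etc/systemd/system/httpd.service would override the stock file and would need to be removed for the stock unit to take effect.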

Comment 6 Joe Orton 2025-12-05 16:56:44 UTC

Yup. We switched to KillMode=mixed in September 2014.

https://src.fedoraproject.org/rpms/httpd/c/36930381bc186af121a2439f92b8fe2c2c6f3acc