Bug 2404361
| Summary: | rpm update does not restart httpd reliable - wrong order of operations | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | customercare |
| Component: | httpd | Assignee: | Luboš Uhliarik <luhliari> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | anon.amish, jorton, luhliari, mturk |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | --- |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2025-12-05 16:53:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
customercare
2025-10-16 07:48:29 UTC
Here we go again:

[Mon Nov 03 04:01:10.349422 2025] [mpm_event:notice] [pid 2483928:tid 2483928] AH00491: caught SIGTERM, shutting down
[Mon Nov 03 04:01:10.630929 2025] [suexec:notice] [pid 1333351:tid 1333351] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon Nov 03 04:01:10.888680 2025] [so:warn] [pid 1333351:tid 1333351] AH01574: module http2_module is already loaded, skipping
[Mon Nov 03 04:01:11.239634 2025] [mpm_event:notice] [pid 1333360:tid 1333360] AH00489: Apache/2.4.65 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Mon Nov 03 04:01:11.239694 2025] [core:notice] [pid 1333360:tid 1333360] AH00094: Command line: '/usr/sbin/httpd'
[Mon Nov 03 04:01:55.619624 2025] [mpm_event:notice] [pid 1333360:tid 1333360] AH00491: caught SIGTERM, shutting down
[Mon Nov 03 10:13:39.656671 2025] [suexec:notice] [pid 1464477:tid 1464477] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon Nov 03 10:13:39.780258 2025] [so:warn] [pid 1464477:tid 1464477] AH01574: module http2_module is already loaded, skipping
[Mon Nov 03 10:13:39.945162 2025] [mpm_event:notice] [pid 1464478:tid 1464478] AH00489: Apache/2.4.65 (Fedora Linux) OpenSSL/3.2.6 configured -- resuming normal operations
[Mon Nov 03 10:13:39.945187 2025] [core:notice] [pid 1464478:tid 1464478] AH00094: Command line: '/usr/sbin/httpd'

Upgrade httpd-0:2.4.65-1.fc41.x86_64 User updates
Upgrade httpd-core-0:2.4.65-1.fc41.x86_64 Dependency updates
Upgrade httpd-filesystem-0:2.4.65-1.fc41.noarch Dependency updates
Upgrade httpd-tools-0:2.4.65-1.fc41.x86_64 Dependency updates
Upgrade mod_ssl-1:2.4.65-1.fc41.x86_64 User updates
Upgrade mod_lua-0:2.4.65-1.fc41.x86_64 Weak Dependency updates

Next server down due to an httpd update. Please fix this.

You're also using a modified httpd.service here (I can tell from the SIGTERM rather than SIGWINCH). Can you please provide the "systemctl show httpd.service" output?

There is a trade-off between uptime and reliability here. The 100% reliable way to update is "systemctl stop httpd / dnf update / systemctl start httpd", which sacrifices uptime for reliability. The %posttrans is a best effort which is generally reliable (evidence: you are the only user I've seen reporting this issue) without sacrificing uptime, and it is configurable exactly because it won't be the behaviour that's desirable for all users. If you're seeing this frequently I'd suggest you disable it.

I checked the show output; it's not really helpful due to the massive output:
When we started with the Apache cluster 15 years ago we needed some changes, so we created this service file. It now also carries the Wants= and After= changes from the other ticket, to compensate for the IPv6 issue from NetworkManager 1.52.
We needed "LimitNOFILE=1000000", which did not work otherwise. So the Exec* lines may be a bit older than usual.
------------------------------------------------
[Unit]
Description=The Apache HTTP Server (prefork MPM)
After=syslog.target network.target remote-fs.target nss-lookup.target network-online.target
Wants=network-online.target
[Service]
Type=forking
PIDFile=/var/run/httpd/httpd.pid
LimitNOFILE=1000000
#EnvironmentFile=/etc/sysconfig/httpd
ExecStart=/usr/sbin/httpd $OPTIONS -k start
ExecReload=/usr/sbin/httpd $OPTIONS -t
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/usr/sbin/httpd $OPTIONS -k stop
PrivateTmp=true
[Install]
WantedBy=multi-user.target
------------------------------------------------
In the F42 service file, the mechanics are a bit different.
------------------------------------------------
[Service]
Type=notify
Environment=LANG=C
ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
# Send SIGWINCH for graceful stop
KillSignal=SIGWINCH
KillMode=mixed
------------------------------------------------
From the systemctl manpage we read this:
--no-block
Do not synchronously wait for the requested operation to finish. If this is not specified, the job will be verified, enqueued and systemctl will wait until the unit's start-up is completed. By passing this argument, it is only verified and enqueued. This option may not be combined with --wait.
Because systemctl does not wait for the service to finish, my guess is that systemd sees running processes from the old httpd instance and tries to kill them like "killall -9 httpd", but does limit it to old pids.
If --no-block is removed from the rpm script, it should wait for all httpd processes to finish and then start them anew. That will sometimes take a while, but to be honest, it's seconds we are talking about. There is no need to worry about this delay, unless an httpd is stuck in an endless loop.
I have not noticed a single hanging httpd in years now, and as said, we have a big cluster of httpds. That's the reason why I opened the bug report: it's happening more often and has become a noticeable issue.
The question is whether "Restart=always" in the service file would solve the issue. That would be the easiest way, as we already ship our own service file.
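For reference, the stop / update / start sequence quoted earlier in this thread can be sketched as a small script. This is only an illustration under assumptions: the `run` helper and the `DRY_RUN` switch are hypothetical names added here so the sketch prints the commands by default instead of touching a live host, and `dnf update` is used without arguments exactly as quoted.

```shell
#!/bin/sh
# Sketch: update with httpd stopped, trading uptime for reliability.
# DRY_RUN is a hypothetical switch for this example; the default (1)
# only prints the commands. Set DRY_RUN=0 to really execute them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reliable_update() {
    # Do not update if the service could not be stopped.
    run systemctl stop httpd.service || return 1
    run dnf update
    status=$?
    # Start httpd again even if the update failed, then report the
    # update's exit status to the caller.
    run systemctl start httpd.service || return 1
    return "$status"
}

reliable_update
```

The service is started again even when the update step fails, so a failed transaction does not leave the host without a web server.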
Please revert to the Fedora stock httpd.service and use drop-ins for any Limit* configuration you want. The service file you presented has race conditions, and we stopped shipping a service like that in Fedora *over a decade* ago exactly because it has race conditions.

Yup. We switched to KillMode=mixed in September 2014. https://src.fedoraproject.org/rpms/httpd/c/36930381bc186af121a2439f92b8fe2c2c6f3acc
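The drop-in approach recommended above could look roughly like the following sketch. The `LimitNOFILE=1000000` value comes from the reporter's unit file. The function name `write_limit_dropin`, the file name `limits.conf` and the `DROPIN_DIR` variable are illustrative choices, and the script writes to a scratch directory by default so it can be tried without root.

```shell
#!/bin/sh
# Sketch: keep the stock httpd.service and raise the file-descriptor
# limit through a drop-in instead of a full replacement unit.

# write_limit_dropin DIR: create DIR/limits.conf
# (the file name is arbitrary; systemd reads any *.conf in the directory).
write_limit_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/limits.conf" <<'EOF'
[Service]
LimitNOFILE=1000000
EOF
}

# Default to a scratch directory (assumption made for this example).
# On a real host the directory would be
# /etc/systemd/system/httpd.service.d, followed by
# "systemctl daemon-reload" and a restart of httpd.
DROPIN_DIR=${DROPIN_DIR:-$(mktemp -d)}
write_limit_dropin "$DROPIN_DIR"
cat "$DROPIN_DIR/limits.conf"
```

A drop-in overrides only the settings it names, so the stock Type=notify, KillSignal=SIGWINCH and KillMode=mixed lines stay in effect.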