Bug 2100282

Summary: Stopping httpd normally during ipa.service stop fails, causes 90 second delay before httpd is killed
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: python3.11Assignee: Tomáš Hrnčiar <thrnciar>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: rawhideCC: abokovoy, cstratak, ftrivino, ipa-maint, jcholast, jhrozek, mhjacks, mhroncok, pvoborni, python-maint, python-sig, rcritten, ssorce, thrnciar, torsava, twoerner, vstinner
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: openqa
Fixed In Version: python3.11-3.11.0~b3-6.fc37 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-06-24 13:11:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Adam Williamson 2022-06-23 00:47:28 UTC
Filing this against freeipa as I suspect it's ultimately a freeipa issue not an apache one, though it's hard to be sure.

In current Rawhide - since composes started working again after the Python 3.11 merge - stopping ipa.service takes much longer than normal. The logs show that this is because stopping httpd normally does not work, and systemd waits 90 seconds before giving up and killing it:

Jun 22 09:09:52 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopping ipa.service - Identity, Policy, Audit...
Jun 22 09:09:54 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopping ipa-dnskeysyncd.service - IPA key daemon...
Jun 22 09:09:54 ipa001.test.openqa.fedoraproject.org ipa-dnskeysyncd[7785]: ipa-dnskeysyncd: INFO     Signal 15 received: Shutting down!
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: ipa-dnskeysyncd.service: Deactivated successfully.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopped ipa-dnskeysyncd.service - IPA key daemon.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ipa-dnskeysyncd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: ipa-dnskeysyncd.service: Consumed 4.879s CPU time.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: ipa-otpd.socket: Deactivated successfully.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: Closed ipa-otpd.socket - ipa-otpd socket.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopped target pki-tomcatd.target - PKI Tomcat Server.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopping pki-tomcatd - PKI Tomcat Server pki-tomcat...
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopping ipa-custodia.service - IPA Custodia Service...
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: ipa-custodia.service: Deactivated successfully.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopped ipa-custodia.service - IPA Custodia Service.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org server[7946]: Java virtual machine used: /usr/lib/jvm/jre-17-openjdk/bin/java
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: ipa-custodia.service: Consumed 1.112s CPU time.
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ipa-custodia comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org server[7946]: classpath used: /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org server[7946]: main class used: org.apache.catalina.startup.Bootstrap
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org server[7946]: flags used: -Dcom.redhat.fips=false
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org server[7946]: options used: -Dcatalina.base=/var/lib/pki/pki-tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/lib/pki/pki-tomcat/temp -Djava.util.logging.config.file=/var/lib/pki/pki-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org server[7946]: arguments used: stop
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org server[7946]: NOTE: Picked up JDK_JAVA_OPTIONS:  --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopping httpd.service - The Apache HTTP Server...
Jun 22 09:09:55 ipa001.test.openqa.fedoraproject.org server[6068]: WARNING: The web application [acme] appears to have started a thread named [ACMEEngineConfigFileSource] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
Jun 22 09:09:56 ipa001.test.openqa.fedoraproject.org systemd[1]: pki-tomcatd: Deactivated successfully.
Jun 22 09:09:56 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopped pki-tomcatd - PKI Tomcat Server pki-tomcat.
Jun 22 09:09:56 ipa001.test.openqa.fedoraproject.org audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pki-tomcatd@pki-tomcat comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 22 09:09:56 ipa001.test.openqa.fedoraproject.org systemd[1]: pki-tomcatd: Consumed 45.767s CPU time.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: State 'stop-sigterm' timed out. Killing.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7147 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7149 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7151 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7152 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7153 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7155 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7157 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7159 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7160 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7180 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7555 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7370 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7390 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7391 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7392 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7393 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7394 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7395 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7396 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7397 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7398 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7399 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7400 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7401 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7402 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7486 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7487 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7488 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7489 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7490 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7491 (n/a) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7493 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7496 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Main process exited, code=killed, status=9/KILL
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7180 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7555 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7370 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7390 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7487 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7490 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7493 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 7496 (httpd) with signal SIGKILL.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Failed with result 'timeout'.
Jun 22 09:11:25 ipa001.test.openqa.fedoraproject.org systemd[1]: Stopped httpd.service - The Apache HTTP Server.

The Apache logs don't show anything useful, so I'm not sure why this is happening yet. I can poke into this some more tomorrow if nobody else gets to it first, wanted to file it for now.

Comment 1 Alexander Bokovoy 2022-06-23 12:45:20 UTC
I think this might be another sign of https://github.com/GrahamDumpleton/mod_wsgi/issues/730

Comment 2 Adam Williamson 2022-06-23 19:35:19 UTC
THAT was it! I knew we had an existing issue similar to this, but I just couldn't find it. Thanks for the link.

So...did something change in Python 3.11, or did something change in mod_wsgi? I think it must be Python 3.11, as there's no change to mod_wsgi that fits the timeframe where this bug re-appeared, which is between Fedora-Rawhide-20220619.n.0 and Fedora-Rawhide-20220621.n.3 .

Comment 3 Adam Williamson 2022-06-23 19:52:21 UTC
Ah, OK, so yeah, I think it is that again. What we did back in F34 / Python 3.9 timeframe was to revert the upstream Python commit that we identified as fixing the problem, in 00371-revert-bpo-1596321-fix-threading-_shutdown-for-the-main-thread-gh-28549-gh-28589.patch . That patch is still on our current python3.9 and python3.10 specs, but it was dropped from python3.11 when it was first created:

https://src.fedoraproject.org/rpms/python3.11/c/cf1cfe95f840cfade40df7c32513237a00c876ae?branch=rawhide

So it looks like this has never been fixed and has been lurking all along...

Comment 4 Adam Williamson 2022-06-23 19:59:12 UTC
Kicking to Python 3.11 for now. Python maintainers, can we have the reversion on the 3.11 branch too? The proximate cause of this bug showing up again is the reversion being dropped between our Python 3.10 and Python 3.11 packages.

Comment 5 Miro Hrončok 2022-06-23 20:15:12 UTC
Yeah, I decided to drop this hoping it would be fixed in the meantime somehow better. We will add that back.

Comment 7 Fedora Update System 2022-06-24 13:08:02 UTC
FEDORA-2022-ca859a1deb has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-ca859a1deb

Comment 8 Fedora Update System 2022-06-24 13:11:25 UTC
FEDORA-2022-ca859a1deb has been pushed to the Fedora 37 stable repository.
If problem still persists, please make note of it in this bug report.