I'm observing a situation where curl requests hang on subsequent REST API requests. I'm running a couple of trivial containers (bash) and a single empty pod, all created via the podman CLI for the purposes of testing new PCP code.

Version-Release number of selected component (if applicable):
Fedora 34, podman-3.2.0-0.1.rc1.fc34.x86_64

How reproducible:
Every time.

Steps to Reproduce:

$ sudo curl -s --unix-socket /run/podman/podman.sock http://d/v3.0.0/libpod/containers/json
[{"AutoRemove":false,"Command":["bash"],"Created":"2021-05-07T15:18:58.11280636+10:00","CreatedAt":"","Exited":false,"ExitedAt":-62135596800,"ExitCode":0,"Id":"d747ee1448150d051b34a8d675c9c8de49a9db78406da1b1fcad4c88523a1459","Image":"registry.fedoraproject.org/fedora:latest","ImageID":"eb7134a03cd6bd8a3de99c16cf174d66ad2d93724bac3307795efcd8aaf914c5","IsInfra":false,"Labels":{"license":"MIT","name":"fedora","vendor":"Fedora Project","version":"32"},"Mounts":[],"Names":["eloquent_euler"],"Namespaces":{},"Networks":["podman"],"Pid":3201686,"Pod":"","PodName":"","Ports":null,"Size":null,"StartedAt":1620364738,"State":"running","Status":""}]

$ sudo curl -s --unix-socket /run/podman/podman.sock http://d/v3.0.0/libpod/pods/json
[ hangs ]
^C

$ sudo curl -s --unix-socket /run/podman/podman.sock http://d/v3.0.0/libpod/containers/json
[ also hangs now ]
^C

However, accesses to a regular user's podman socket (i.e. non-root) continue just fine for both cases (pods/json and containers/json).
Can you verify whether it only hangs after the Pods call happens, or whether the second call always hangs regardless of order?
(In reply to Matthew Heon from comment #1)
> Can you verify that it only hangs after the Pods call happens, or that
> regardless of order, the second call always hangs?

Thanks Matt - yes, using the pods API definitely seems to be the trigger for the hang. With a freshly started REST service I can make any number of successful containers/json queries, but as soon as I introduce a pods query, it hangs.

One other thing I neglected to mention, which may help reproduce - I am often running curl in a way that replicates what the PCP client code will be doing, i.e. reusing a single socket connection. So any number of containers requests, chained from one client or not, succeeds:

$ sudo curl --unix-socket /run/podman/podman.sock http://d/v3.0.0/libpod/containers/json http://d/v3.0.0/libpod/containers/json http://d/v3.0.0/libpod/containers/json
[3 responses, curl completes]
$

but introduce pods/json and the wheels fall off...

$ sudo curl --unix-socket /run/podman/podman.sock http://d/v3.0.0/libpod/containers/json http://d/v3.0.0/libpod/pods/json
[hangs immediately - either no response at all, or just the containers response if the service was restarted just prior to running curl]
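(To confirm the single-connection reuse is actually happening, curl's verbose output should show it - hedging on the exact message wording across curl versions, something like:

$ sudo curl -sv --unix-socket /run/podman/podman.sock http://d/v3.0.0/libpod/containers/json http://d/v3.0.0/libpod/containers/json 2>&1 | grep -i 're-us'

should print a "Re-using existing connection" line for the second request.)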
@mheon FYI, using a build from master (8dcd5b893feb8b3b34386e6688d20b7ef098228b), I was not able to reproduce:

$ bin/podman system reset
$ curl --unix-socket /run/user/1000/podman/podman.sock http://d/v3.0.0/libpod/pods/json
[]
$ bin/podman pod create --name pod001
$ curl --unix-socket /run/user/1000/podman/podman.sock http://d/v3.0.0/libpod/pods/json
[{"Cgroup":"user.slice","Containers":[{"Id":"527c6d322dd68e261630a3f46b3a7db3b1b79ceb0c09d8aeb36012fa8efa28b2","Names":"b9ab3ec058fc-infra","Status":"configured"}],"Created":"2021-05-11T15:26:25.325078296-07:00","Id":"b9ab3ec058fc50417bd68e79a40b0ff75d67c074ca37cba1c73a012c7bb40953","InfraId":"527c6d322dd68e261630a3f46b3a7db3b1b79ceb0c09d8aeb36012fa8efa28b2","Name":"pod001","Namespace":"","Networks":null,"Status":"Created","Labels":{}}]
$ bin/podman pull fedora
...
$ bin/podman run -it fedora date
...
$ curl --unix-socket /run/user/1000/podman/podman.sock http://d/v3.0.0/libpod/pods/json
[{"Cgroup":"user.slice","Containers":[{"Id":"527c6d322dd68e261630a3f46b3a7db3b1b79ceb0c09d8aeb36012fa8efa28b2","Names":"b9ab3ec058fc-infra","Status":"configured"}],"Created":"2021-05-11T15:26:25.325078296-07:00","Id":"b9ab3ec058fc50417bd68e79a40b0ff75d67c074ca37cba1c73a012c7bb40953","InfraId":"527c6d322dd68e261630a3f46b3a7db3b1b79ceb0c09d8aeb36012fa8efa28b2","Name":"pod001","Namespace":"","Networks":null,"Status":"Created","Labels":{}}]
Jhon, try the curl invocations from my second post? Having multiple URLs on the curl command line seemed to make things more likely to fail for me (internally this results in a single socket connection over which multiple HTTP requests and responses are processed) - mixing pod and container requests that way may be a critical part of the puzzle too.
Concur with Jhon - this doesn't reproduce on current upstream. Possibly it was a bug in the 3.2.0 RC that has since been fixed? Can you try building the latest code from our main branch and verify whether you can still see this, Nathan?
I've upgraded to the latest podman in Fedora 34 - the REST API still hangs very quickly with the podman version there:

$ rpm -qi podman
Name        : podman
Epoch       : 3
Version     : 3.1.2
Release     : 3.fc34
Architecture: x86_64

which appears to be the latest stable release. BTW, there's a broken link here: https://podman.io/new/2021/04/02/new.html

I'll attempt to build and run unreleased upstream podman next (or is there a safe copr/CI/... build I could use?)
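For the record, the build steps I'm planning to use - assuming the standard upstream Makefile targets still apply - are roughly:

$ git clone https://github.com/containers/podman.git
$ cd podman
$ make binaries
$ sudo ./bin/podman system service --time=0 unix:///run/podman/podman.sock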
OK, I've managed to build with no problems from the podman github repo. The problem still exists there in the main branch, still trivially reproducible for me. I managed to get gdb to attach to the hung process and this is where it seems to be - looks like a futex deadlock perhaps?

$ podman --version
podman version 3.2.0-dev

$ sudo gdb -p 35574
GNU gdb (GDB) Fedora 10.1-14.fc34
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 35574
[New LWP 35575]
[New LWP 35576]
[New LWP 35577]
[New LWP 35578]
[New LWP 35579]
[New LWP 35580]
[New LWP 35581]
[New LWP 35582]
[New LWP 35583]
[New LWP 35584]
[New LWP 35585]
[New LWP 35587]
[New LWP 35595]
warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.
warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/lib/golang/src/runtime/runtime-gdb.py".
Missing separate debuginfos, use: dnf debuginfo-install podman-3.1.2-3.fc34.x86_64
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
Loading Go Runtime support.
(gdb)
runtime.futex () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:580
580             MOVL    AX, ret+40(FP)
(gdb) bt
#0  runtime.futex () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:580
#1  0x000000000043e9c6 in runtime.futexsleep (addr=0x2896a30 <runtime.m0+336>, val=0, ns=-1) at /usr/lib/golang/src/runtime/os_linux.go:44
#2  0x0000000000417b3f in runtime.notesleep (n=0x2896a30 <runtime.m0+336>) at /usr/lib/golang/src/runtime/lock_futex.go:159
#3  0x0000000000447eb9 in runtime.mPark () at /usr/lib/golang/src/runtime/proc.go:1340
#4  0x0000000000449cce in runtime.stoplockedm () at /usr/lib/golang/src/runtime/proc.go:2451
#5  0x000000000044beec in runtime.schedule () at /usr/lib/golang/src/runtime/proc.go:3059
#6  0x000000000044c2bd in runtime.park_m (gp=0xc000602480) at /usr/lib/golang/src/runtime/proc.go:3274
#7  0x0000000000478d7b in runtime.mcall () at /usr/lib/golang/src/runtime/asm_amd64.s:327
#8  0x0000000000478c74 in runtime.rt0_go () at /usr/lib/golang/src/runtime/asm_amd64.s:226
#9  0x0000000000000000 in ?? ()
(gdb)

This is a completely up-to-date Fedora 34 system as of this morning, with just podman built from main being the only upstream non-release bits (well, other than PCP - but that's not in play here).
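(Since this is a Go binary, the per-thread futex stack above mostly shows the runtime scheduler parking an idle thread; the goroutine stacks are usually more telling. If it would help, I believe the Go runtime dumps all goroutine stacks to stderr on SIGQUIT - note this terminates the process - assuming the service's stderr is visible, e.g. when started in the foreground or captured by journald:

$ sudo kill -QUIT 35574
$ sudo journalctl -u podman.service -e     # if running under the systemd unit

Happy to capture that output if someone confirms it would be useful.)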
$ sudo curl --unix-socket /run/podman/podman.sock http://d/v3.0.0/libpod/pods/json http://d/v3.0.0/libpod/pods/json 'http://d/v3.0.0/libpod/containers/stats?stream=false' 'http://d/v3.0.0/libpod/containers/json'
(hangs)
^C

$ podman pod list
POD ID        NAME            STATUS   CREATED      INFRA ID      # OF CONTAINERS
a67e64258be1  jovial_lehmann  Created  2 weeks ago  8ecc96186124  1

$ sudo podman pod list
(hangs)
^C

$ podman container list
CONTAINER ID  IMAGE                                      COMMAND  CREATED        STATUS            PORTS  NAMES
936eecade00f  registry.fedoraproject.org/fedora:latest   bash     9 minutes ago  Up 9 minutes ago         nostalgic_germain
fb89274a964a  registry.fedoraproject.org/fedora:latest   bash     7 minutes ago  Up 7 minutes ago         silly_kalam

$ sudo podman container list
(hangs)

Looking at the gdb stack trace for the two hanging podman list commands, it looks the same as the REST API case (or maybe I'm misreading the stack trace - I don't have experience debugging golang code).
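If a Go-savvy developer wants more context, I can also capture all threads from gdb - and, since the Go runtime support script loaded, possibly the goroutine listing too (assuming runtime-gdb.py works against this binary):

(gdb) thread apply all bt
(gdb) info goroutines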
Are you still able to reproduce this? No one else has managed to regenerate the issue.
Yes, still fails reliably for me - same as always (numerous f34 updates & reboots since the last post). Let me know if there's any other debugging information I can provide. strace still suggests it's hanging in a futex call, as before...

$ sudo strace -p 1447472
strace: Process 1447472 attached
futex(0x56332c856ef0, FUTEX_WAIT_PRIVATE, 0, NULL^Cstrace: Process 1447472 detached
 <detached ...>

I see also that podman on the command line hangs too (i.e. not only via the REST API?) - same sort of strace...

read(9, "# The containers configuration f"..., 16427) = 16426
read(9, "", 1)                          = 0
close(9)                                = 0
geteuid()                               = 0
newfstatat(AT_FDCWD, "/etc/containers/storage.conf", {st_mode=S_IFREG|0644, st_size=7834, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/containers/storage.conf", {st_mode=S_IFREG|0644, st_size=7834, ...}, 0) = 0
futex(0xc000580150, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1 RT_2], NULL, 8) = 0
futex(0xc000580150, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1 RT_2], NULL, 8) = 0
futex(0xc000628150, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55d95fe76ef0, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xc000068d50, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1 RT_2], NULL, 8) = 0
futex(0xc000180150, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55d95fe76ef0, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x55d95fe76ef0, FUTEX_WAIT_PRIVATE, 0, NULL
[hangs here]

cheers.
Are there any podman debugging diagnostics I can enable to dig deeper? If not, could someone on the podman team provide a debug build (or even a patch) I could run to assist further? Thanks!
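For instance - and this is just a guess at the most useful knobs - I could stop the socket-activated systemd units and run the API service in the foreground with debug logging:

$ sudo systemctl stop podman.socket podman.service
$ sudo podman --log-level=debug system service --time=0 unix:///run/podman/podman.sock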
Quick update - nothing has improved, and the situation seems to have become significantly worse with F35. In case it wasn't made clear earlier, these bugs prevent monitoring of podman containers with tools like PCP and atop. To make it even clearer: everything worked just fine for us when podman used libvarlink, but nothing has worked properly since. :(

With F35 I now see previously-not-hanging API requests hanging every time, e.g.

# curl --output /tmp/containers.json --unix-socket /run/podman/podman.sock -v 'http://d/v3.0.0/libpod/containers/json'
*   Trying /run/podman/podman.sock:0...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to d (/run/podman/podman.sock) port 80 (#0)
> GET /v3.0.0/libpod/containers/json HTTP/1.1
> Host: d
> User-Agent: curl/7.79.1
> Accept: */*
>
  0     0    0     0    0     0      0      0 --:--:--  0:00:38 --:--:--     0

(that final counter - at 38 seconds so far there - just keeps going up, and the request never completes)

The example request (/libpod/images) used on the first page of the podman REST API docs works fine, albeit very slowly (2 seconds to respond to a single request - that's quite poor latency).

# curl --output /tmp/images.json --unix-socket /run/podman/podman.sock -v 'http://d/v3.0.0/libpod/images/json'
*   Trying /run/podman/podman.sock:0...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to d (/run/podman/podman.sock) port 80 (#0)
> GET /v3.0.0/libpod/images/json HTTP/1.1
> Host: d
> User-Agent: curl/7.79.1
> Accept: */*
>
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.40
< Content-Type: application/json
< Libpod-Api-Version: 3.4.1
< Server: Libpod/3.4.1 (linux)
< X-Reference-Id: 0xc002db8000
< Date: Mon, 08 Nov 2021 22:46:59 GMT
< Transfer-Encoding: chunked
<
{ [7970 bytes data]
100 74332    0 74332    0     0  32735      0 --:--:--  0:00:02 --:--:-- 32745
* Connection #0 to host d left intact

(i.e. it completes successfully, but takes ~2 seconds - it is visibly slow to complete on the command line)
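(For measuring that latency more precisely than eyeballing the progress meter, curl's write-out variables work well, e.g.:

# curl -s -o /dev/null -w '%{time_total}\n' --unix-socket /run/podman/podman.sock 'http://d/v3.0.0/libpod/images/json'

which should report the ~2 seconds seen above.)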
Note that strace has a -k option (print a stack trace after each syscall). Maybe it can show what the futex wait is blocked on.
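i.e., assuming your strace build has stack-unwinding support, something like:

$ sudo strace -f -k -e trace=futex -p <PID>

which prints a userspace stack trace after each futex call, restricted to the futex syscalls of interest.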
This message is a reminder that Fedora Linux 35 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 35 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field.

If you are unable to reopen this bug, please file a new report against an active release.

Thank you for reporting this bug and we are sorry it could not be fixed.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days