Created attachment 1899683 [details]
backtrace from virtqemud

Description of problem:
Cloning an issue found in #2075736

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Enable the modular virt daemons.
2. Configure the modular daemons to not time out via:
   # grep ARGS /etc/sysconfig/virt*
   /etc/sysconfig/virtnetworkd:VIRTNETWORKD_ARGS=""
   /etc/sysconfig/virtqemud:VIRTQEMUD_ARGS=""
   /etc/sysconfig/virtstoraged:VIRTSTORAGED_ARGS=""
3. Load up libvirt with lots of creating, starting, stopping and deleting.

Actual results:
virt-install, for instance, hangs (for hours).

Additional info:
Workaround 1 is:
   sudo gstack $(pgrep virtqemud) > /dev/null
which will unblock virtqemud (presumably it causes a lock blocked in a syscall to retry).
Workaround 2 is to stick with the monolithic daemon?
Debuginfo backtrace of rpc-libvirtd thread stuck in futex:

(gdb) thread 22
bt
[Switching to thread 22 (Thread 0x7f2239ffb640 (LWP 56946))]
#0  futex_wait (private=0, expected=2, futex_word=0x7f229c35c108 <pw_mc_ctx+8>) at ../sysdeps/nptl/futex-internal.h:146
Downloading 0.00 MB source file /usr/src/debug/glibc-2.35-15.fc36.x86_64/nptl/../sysdeps/nptl/futex-internal.h
146       int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0x7f229c35c108 <pw_mc_ctx+8>) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7f229c35c108 <pw_mc_ctx+8>, private=0) at lowlevellock.c:49
#2  0x00007f22c78901e2 in lll_mutex_lock_optimized (mutex=0x7f229c35c108 <pw_mc_ctx+8>) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x7f229c35c108 <pw_mc_ctx+8>) at pthread_mutex_lock.c:93
#4  0x00007f229c353736 in sss_mt_lock (ctx=0x7f229c35c100 <pw_mc_ctx>) at src/sss_client/nss_mc_common.c:61
#5  sss_nss_mc_init_ctx (ctx=0x7f229c35c100 <pw_mc_ctx>, name=0x7f229c35903b "passwd") at src/sss_client/nss_mc_common.c:155
#6  sss_nss_mc_get_ctx (name=name@entry=0x7f229c35903b "passwd", ctx=ctx@entry=0x7f229c35c100 <pw_mc_ctx>) at src/sss_client/nss_mc_common.c:226
#7  0x00007f229c353aa7 in sss_nss_mc_getpwnam (name=name@entry=0x7f22241191b0 "1000", name_len=name_len@entry=4, result=result@entry=0x7f2239ffa400, buffer=buffer@entry=0x7f22240b5b00 "nginx", buflen=buflen@entry=1024) at src/sss_client/nss_mc_passwd.c:110
#8  0x00007f229c355d89 in _nss_sss_getpwnam_r (name=0x7f22241191b0 "1000", result=0x7f2239ffa400, buffer=0x7f22240b5b00 "nginx", buflen=1024, errnop=0x7f2239ffb5b8) at src/sss_client/nss_passwd.c:153
#9  0x00007f22c78dc3a3 in __getpwnam_r (name=name@entry=0x7f22241191b0 "1000", resbuf=resbuf@entry=0x7f2239ffa400, buffer=<optimized out>, buflen=1024, result=result@entry=0x7f2239ffa3f0) at ../nss/getXXbyYY_r.c:273
#10 0x00007f22c7f5d74a in virGetUserIDByName (name=0x7f22241191b0 "1000", uid=0x7f2239ffa4b4, missing_ok=missing_ok@entry=false) at ../src/util/virutil.c:714
#11 0x00007f22c7f5e3ef in virGetUserID (user=user@entry=0x7f22241191b0 "1000", uid=uid@entry=0x7f2239ffa4b4) at ../src/util/virutil.c:755
#12 0x00007f22c7f5e575 in virParseOwnershipIds (label=<optimized out>, uidPtr=uidPtr@entry=0x7f2239ffa544, gidPtr=gidPtr@entry=0x7f2239ffa540) at ../src/util/virutil.c:1320
#13 0x00007f229e4b4166 in qemuDomainGetImageIds (cfg=cfg@entry=0x7f225c20a010, vm=vm@entry=0x7f225c5521f0, src=src@entry=0x7f22a40be1b0, parentSrc=parentSrc@entry=0x7f22a40be1b0, uid=uid@entry=0x7f2239ffa544, gid=gid@entry=0x7f2239ffa540) at ../src/qemu/qemu_domain.c:7374
#14 0x00007f229e4bf02d in qemuDomainDetermineDiskChain (driver=0x7f225c205d80, vm=0x7f225c5521f0, disk=0x7f22a80e3aa0, disksrc=0x7f22a40be1b0, report_broken=true) at ../src/qemu/qemu_domain.c:7594
#15 0x00007f229e5558db in qemuProcessPrepareHostStorage (flags=17, vm=0x7f225c5521f0, driver=0x7f225c205d80) at ../src/qemu/qemu_process.c:6691
#16 qemuProcessPrepareHost (driver=0x7f225c205d80, vm=<optimized out>, flags=17) at ../src/qemu/qemu_process.c:7066
#17 0x00007f229e55b811 in qemuProcessStart (conn=0x7f22a0003b30, driver=0x7f225c205d80, vm=0x7f225c5521f0, updatedCPU=0x0, asyncJob=QEMU_ASYNC_JOB_START, migrateFrom=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=<optimized out>) at ../src/qemu/qemu_process.c:7845
#18 0x00007f229e578726 in qemuDomainObjStart.constprop.0 (conn=0x7f22a0003b30, driver=driver@entry=0x7f225c205d80, vm=0x7f225c5521f0, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at ../src/qemu/qemu_driver.c:6372
#19 0x00007f229e4dac6d in qemuDomainCreateWithFlags (dom=0x7f22b0022b80, flags=0) at ../src/qemu/qemu_driver.c:6424
#20 0x00007f22c812a816 in virDomainCreate (domain=domain@entry=0x7f22b0022b80) at ../src/libvirt-domain.c:6740
#21 0x000056079b57f33e in remoteDispatchDomainCreate (server=0x56079bd5c880, msg=0x56079f5d9100, args=<optimized out>, rerr=0x7f2239ffa970, client=0x56079d0f6200) at src/remote/remote_daemon_dispatch_stubs.h:4947
#22 remoteDispatchDomainCreateHelper (server=0x56079bd5c880, client=0x56079d0f6200, msg=0x56079f5d9100, rerr=0x7f2239ffa970, args=<optimized out>, ret=0x0) at src/remote/remote_daemon_dispatch_stubs.h:4926
#23 0x00007f22c801ae1f in virNetServerProgramDispatchCall (msg=0x56079f5d9100, client=0x56079d0f6200, server=0x56079bd5c880, prog=0x56079bdcb010) at ../src/rpc/virnetserverprogram.c:428
#24 virNetServerProgramDispatch (prog=0x56079bdcb010, server=0x56079bd5c880, client=0x56079d0f6200, msg=0x56079f5d9100) at ../src/rpc/virnetserverprogram.c:302
#25 0x00007f22c801b332 in virNetServerProcessMsg (srv=srv@entry=0x56079bd5c880, client=0x56079d0f6200, prog=<optimized out>, msg=0x56079f5d9100) at ../src/rpc/virnetserver.c:140
#26 0x00007f22c801b441 in virNetServerHandleJob (jobOpaque=0x56079d1edf00, opaque=0x56079bd5c880) at ../src/rpc/virnetserver.c:160
#27 0x00007f22c7f5827d in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:164
#28 0x00007f22c7f5778b in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:256
#29 0x00007f22c788ce2d in start_thread (arg=<optimized out>) at pthread_create.c:442
#30 0x00007f22c79121b0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) print *ctx
$1 = {initialized = UNINITIALIZED, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 4294967295, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 12 times>, "\377\377\377\377", '\000' <repeats 23 times>, __align = 0}, fd = -1, seed = 0, mmap_base = 0x0, mmap_size = 0, data_table = 0x0, dt_size = 0, hash_table = 0x0, ht_size = 0, active_threads = 1}
(gdb)
#10 0x00007f22c7f5d74a in virGetUserIDByName (name=0x7f22241191b0 "1000", uid=0x7f2239ffa4b4, missing_ok=missing_ok@entry=false) at ../src/util/virutil.c:714
714         while ((rc = getpwnam_r(name, &pwbuf, strbuf, strbuflen, &pw)) == ERANGE) {

Shouldn't name == cagney? Ah, probably because the VM has:

<seclabel type='static' model='dac' relabel='yes'>
  <label>1000:107</label>
  <imagelabel>1000:107</imagelabel>
</seclabel>
It's very similar to bug 2109451. Could you please help check the sssd version?
$ rpm -q sssd
sssd-2.7.3-1.fc36.x86_64
Probably the same; marking this as blocked by that. This bug locks up in getpwnam*(); bug 2109451 locks up in getpwuid*(), so fixing the latter should fix this.
(In reply to cagney from comment #4)
> Probably the same

It is. Fix already posted for review upstream: https://github.com/SSSD/sssd/pull/6289
This was extracted from Bug 2075736
Pushed PR: https://github.com/SSSD/sssd/pull/6289

* `master`
    * 0f3a761ed9d654a61f8caed8eae3863c518b9911 - CLIENT:MC: store context mutex outside of context as it should survive context destruction / re-initialization
* `sssd-2-7`
    * 03142f8de42faf4f75465d24d3be9a49c2dd86f7 - CLIENT:MC: store context mutex outside of context as it should survive context destruction / re-initialization
*** Bug 2116164 has been marked as a duplicate of this bug. ***
Did this fix get pushed to Fedora?
(In reply to cagney from comment #9)
> Did this fix get pushed to Fedora?

Not yet.
Moreover, an additional patch is required: https://github.com/SSSD/sssd/pull/6305
Maybe it would be faster to temporarily revert the related changes introduced in 2.7.3:
* https://sssd.io/release-notes/sssd-2.7.3.html

It would be nice to unblock Fedora users as fast as possible. This BZ has been open for more than 3 weeks.
https://github.com/SSSD/sssd/pull/6305 was accepted upstream and tested by QE in bz 2116389.
Pushed PR: https://github.com/SSSD/sssd/pull/6305

* `master`
    * 4ac93d9c5df59cdb7f397b4467f1c1c4822ff757 - CLIENT:MC: pointer to the context mutex shouldn't be touched
    * 579cc0b266d5f8954bc71cfcd3fe68002d681a5f - CLIENT:MC: -1 is more appropriate initial value for fd
* `sssd-2-7`
    * d386e94ef49d95d7305a3e6578e41a2cf61dfc5c - CLIENT:MC: pointer to the context mutex shouldn't be touched
    * 0eae0862069e4bbbdd87b809193fc873f3003cff - CLIENT:MC: -1 is more appropriate initial value for fd
*** Bug 2120680 has been marked as a duplicate of this bug. ***
(In reply to Alexey Tikhonov from comment #14)
> Pushed PR: https://github.com/SSSD/sssd/pull/6305
>
> * `master`
>     * 4ac93d9c5df59cdb7f397b4467f1c1c4822ff757 - CLIENT:MC: pointer to the context mutex shouldn't be touched
>     * 579cc0b266d5f8954bc71cfcd3fe68002d681a5f - CLIENT:MC: -1 is more appropriate initial value for fd
> * `sssd-2-7`
>     * d386e94ef49d95d7305a3e6578e41a2cf61dfc5c - CLIENT:MC: pointer to the context mutex shouldn't be touched
>     * 0eae0862069e4bbbdd87b809193fc873f3003cff - CLIENT:MC: -1 is more appropriate initial value for fd

It is great news that this is already fixed upstream. But it looks like another week has passed and Fedora users are still affected. Is there any ETA for when the patches will be backported to Fedora? Or could the related changes introduced in 2.7.3 at least be temporarily reverted? We (Fedora users) would appreciate progress here :-)
It seems to help. I'm back to virsh operations taking 1s (they should take no time) rather than hanging. I'll follow that up in bug 2075736.
This message is a reminder that Fedora Linux 36 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 36 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.