Bug 2033016
Summary: systemd/1 - possible circular locking dependency detected on aarch64

Product: [Fedora] Fedora
Component: kernel
Version: rawhide
Hardware: aarch64
OS: Unspecified
Status: CLOSED RAWHIDE
Severity: unspecified
Priority: unspecified
Reporter: Jakub Čajka <jcajka>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: acaringi, adscvr, airlied, alciregi, bskeggs, dustymabe, fedoraproject, filbranden, flepied, hdegoede, jarodwilson, jeremy, jeremy.linton, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, lnykryn, masami256, mchehab, msekleta, pbrobinson, ptalbert, ryncsn, ssahani, s, steved, systemd-maint, yuwatana, zbyszek
Target Milestone: ---
Target Release: ---
Type: Bug
Regression: ---
Bug Blocks: 245418
Last Closed: 2022-02-21 16:56:17 UTC
Description (Jakub Čajka, 2021-12-15 17:31:23 UTC)
Given that this is happening while css_set_lock is held, and css_set_lock is used exclusively by cgroups, I'm guessing COSA is running things with cgroups, probably in containers?

This is running inside a Fedora CoreOS VM run via qemu; `cosa` is the tool that sets up and launches the VM. All this is to say that we should be able to reproduce without `cosa` in the chain. Jakub was also able to reproduce on an AWS instance and on bare metal. The failing test means that we're not uploading any of the images (i.e. I don't have a link for you to download), but if that would be useful I can do a build and post it up somewhere; just let me know which platform you prefer.

For reference: the failing test (the one that triggers the warning) is in this test file [1] and is the equivalent of running:

```
sudo podman info --format json
sudo podman run --net=none --rm --memory=128m --memory-swap=128m echo echo 1
sudo podman run --net=none --rm --memory-reservation=10m echo echo 1
sudo podman run --net=none --rm --cpu-shares=100 echo echo 1
sudo podman run --net=none --rm --cpu-period=1000 echo echo 1
sudo podman run --net=none --rm --cpuset-cpus=0 echo echo 1
sudo podman run --net=none --rm --cpuset-mems=0 echo echo 1
sudo podman run --net=none --rm --cpu-quota=1000 echo echo 1
sudo podman run --net=none --rm --blkio-weight=10 echo echo 1
sudo podman run --net=none --rm --memory=128m echo echo 1
sudo podman run --net=none --rm --shm-size=1m echo echo 1
```

[1] https://github.com/coreos/coreos-assembler/blob/1183b990f3cdfd223651c49c4bd93fbd26a178e4/mantle/kola/tests/podman/podman.go#L119-L122

But the locking is a kernel thing, no?

Yes. I assume it's an issue with the RC kernel.

Created attachment 1846632 [details]
reduced reproducer
I have managed to reproduce this in an aarch64 rawhide (5.16.0-0.rc5.20211214git5472f14a3742.36.fc36.aarch64) VM. A script based on FCOS's podman.base test case is attached (a hypothetical sketch of such a loop is below). It might take more than one run to trigger, and it seems to happen only once per boot. Note that nothing actually fails; the issue is only logged.
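The attached script itself isn't reproduced here. As a purely hypothetical sketch of what such a reduced loop might look like, built from the podman commands quoted in the description (the image name and iteration count are placeholders, not taken from the attachment):

```
#!/bin/bash
# Hypothetical reduced reproducer, modeled on the podman commands quoted in
# the description; the real attached script may differ. IMAGE and the loop
# count are placeholders. Requires a lockdep-enabled (CONFIG_PROVE_LOCKING)
# kernel, e.g. a rawhide debug build.
IMAGE=registry.fedoraproject.org/fedora-minimal

for i in $(seq 1 20); do
    sudo podman run --net=none --rm --memory=128m --memory-swap=128m "$IMAGE" echo 1
    sudo podman run --net=none --rm --cpu-shares=100 "$IMAGE" echo 1
    sudo podman run --net=none --rm --cpuset-cpus=0 "$IMAGE" echo 1
done

# Nothing fails visibly; the splat shows up only in the kernel log, and at
# most once per boot:
sudo journalctl -k | grep -A 20 "possible circular locking dependency"
```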
As a note, this doesn't at first glance appear to be aarch64 specific, although TBD. Obviously, this isn't going to be reproducible on any of the release kernels, as they won't have CONFIG_PROVE_LOCKING enabled. If it were causing wider problems I would expect them to appear as soft-lockup messages with similar call stacks. I'm still trying to duplicate it on the honeycomb.

I don't think it is aarch64 specific either; that just happened to be where we were seeing it in our CI (see https://github.com/coreos/fedora-coreos-tracker/issues/1049). I commented later in that issue that I saw a similar lock, this time on x86_64: https://github.com/coreos/fedora-coreos-tracker/issues/1049#issuecomment-998252332

So, I duplicated this, and it is probably legitimate, but it requires a cgroup migration with multiple threads to be running while an async signal is being sent to a task being migrated. In other words, I don't think the indicated tests can actually deadlock; I've got a bit of code which should, but doesn't yet. It's a bit tricky to intuit a solution, but it's likely a matter of ensuring that obj_cgroup_release() (see commit bf4f059954dcb221384b2f784677e19a13cd4bdb) is delayed. There is a frozen-task check in get_signal() which could be expanded (because AFAIK it doesn't avoid this lock dependency) to cover cgroup task migration as well as already-frozen tasks, but that isn't obvious or clean. I'm going to see if I can force a deadlock some more, and then post what I have to LKML.

Just a note: we are still able to reproduce this in our CI environment: https://github.com/coreos/fedora-coreos-tracker/issues/1049#issuecomment-1018540617

Well, I slimmed down the test program but never managed to create a standalone deadlocking program. OTOH, I think I have a fairly simple patch (~3 lines) that fixes it in a fairly clean way: basically I'm just replacing the spinlock/list_del/unlock sequence with a list_del_rcu. I've been running the podman test in a loop for a few hours now and it hasn't popped, although it would also go away for a while in the past.

Jeremy, since we're able to reproduce the issue pretty easily (at least we were last I checked), if you can provide me a kernel scratch build I can run it through FCOS CI and see if it resolves the issue.

Sure, which kernel version do you prefer? Although, huh, I'm still playing with different ways to fix it, and I've got some other issues I'm juggling, so this is a bit slow too. I should have most of tomorrow to start closing this out (and post a suggested fix to LKML as I've been promising). Let me do that, and roll you a scratch build (although I'm not 100% confident I can get the config right for rawhide in an official build; let's see).

The latest kernel version in rawhide (which I was able to reproduce this issue on yesterday about 50% of the time) is kernel-5.17.0-0.rc0.20220112gitdaadb3bd0e8d.63.fc36. You can apply a patch on top of the rawhide branch and do a scratch build (you can limit the build to x86_64/aarch64 unless you want to wait a long time for the armv7/s390x builds to complete); a sketch of that workflow is below. You've probably applied a patch to the kernel before and done a scratch build, but in case it's helpful, here's a recent example where I did: https://src.fedoraproject.org/rpms/kernel/pull-request/50

Yeah, I will see if the scratch build picks up the right config... The public posting/patch is here: https://lore.kernel.org/lkml/20220201205623.1325649-1-jeremy.linton@arm.com/T/#u. Pretty sure it fixed the problem here, but I'm not sure it doesn't break something else.
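For reference, a minimal sketch of the scratch-build workflow described above, assuming fedpkg is installed and configured with Fedora packager credentials (the patch-application step is elided; the pull request linked above shows a worked example):

```
# Sketch of a Fedora kernel scratch build, assuming a standard fedpkg setup.
fedpkg clone kernel && cd kernel
fedpkg switch-branch rawhide

# ...apply the candidate patch and hook it into the spec here (see the pull
# request linked above for a real example)...

# Limit the build to the architectures of interest so you don't wait on the
# slower builders.
fedpkg scratch-build --srpm --arches x86_64 aarch64
```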
So there is a scratch build here: https://koji.fedoraproject.org/koji/taskinfo?taskID=82256731

That scratch build seems to have solved the problem for me.

A modified version has been merged to -mm, and it seems like it's going to be merged to -stable soon as well.

Cool. Let me know when it hits a tag and we'll see if things make it down into a Fedora kernel so we can test.

This is in 5.17-rc4.

Is it safe to say it's in kernel-5.17.0-0.rc4.96.fc36 (https://bodhi.fedoraproject.org/updates/FEDORA-2022-e27e6736b8)?

Since the kernels in branched and rawhide are both rc4+, I'll mark this as closed; we'll follow up if we still see the issue.
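For anyone checking their own machines: since the fix landed in 5.17-rc4, comparing the running kernel version is enough (a trivial sketch; package name per standard Fedora):

```
# Any branched/rawhide kernel at rc4 or later should carry the fix; the rc
# number is embedded in the Fedora version string.
uname -r            # e.g. 5.17.0-0.rc4.96.fc36.aarch64
rpm -q kernel-core  # installed kernel packages
```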