Bug 1852298
Summary: | Install VM fails and RT host get Call Trace "WARNING: CPU: 18 PID: 2947 at fs/eventfd.c:68 eventfd_signal+0x7e/0x90" | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Pei Zhang <pezhang> | ||||
Component: | kernel-rt | Assignee: | Juri Lelli <jlelli> | ||||
kernel-rt sub component: | File Systems - Other | QA Contact: | Pei Zhang <pezhang> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | high | ||||||
Priority: | high | CC: | ailan, bhu, chayang, dhowells, jinzhao, jlelli, jmoyer, juzhang, lcapitulino, lgoncalv, mszeredi, mtosatti, nilal, peterx, swhiteho, virt-maint | ||||
Version: | 8.3 | Keywords: | Regression, TestBlocker | ||||
Target Milestone: | rc | ||||||
Target Release: | 8.3 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | kernel-rt-4.18.0-229.rt7.41.el8 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2020-11-04 02:26:35 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1817732, 1823810 | ||||||
Attachments: |
|
Description
Pei Zhang
2020-06-30 05:23:44 UTC
Created attachment 1699236 [details]
VM installation hang here
This is real time only issue. We cannot reproduce with standard kernel. 4.18.0-221.rt7.33.el8.x86_64 fail 4.18.0-221.el8.x86_64 works well Hi, I had a look and commit 68848557caf6 ("[fs] eventfd: track eventfd_signal() recursion depth") (kernel-4.18.0-218.el8) introduced the check that generates the WARNING reported. Now, it is not immediately clear to me if the erroneous condition pointed out by the warning and the fact that VM install hangs are correlated. I'm auditing recent changes (more info below), but as a first try I'd kindly ask Pei to try to replicate with kernel-rt-4.18.0-218.rt7.30.el8 (so that we restrict the number of potential problematic changes). As a second tentative (don't have high hopes on it, but still think it's worth a shot) I'd try using a more recent compose to start the VM install, as I seem to remember I had troubles with testing using some compose from last week (don't remember the specific one, sorry). Could you also please try to replicate with, say, http://download.eng.pek2.redhat.com/nightly/RHEL-8.3.0-20200702.n.0/compose/BaseOS/x86_64/os/ ? Asking this because I noticed dracut failing to download artifacts from the install log you shared. Anyway, by looking at code, I noticed that eventfd_signal_count() is not called from anywhere, which looks suspicious as eventfd_signal() comment states: "Deadlock or stack overflow issues can happen if we recurse here through waitqueue wakeup handlers. If the caller users potentially nested waitqueues with custom wakeup handlers, then it should check eventfd_signal_count() before calling this function. If it returns true, the eventfd_signal() call should be deferred to a safe context." In upstream RT (v5.6.17-rt10) eventfd_signal_count() is called from two locations: - fs/io_uring.c <<io_poll_wake>> - fs/aio.c <<aio_poll_wake>> Both locations are different in 8.3 (more recent code got pulled from upstream with the major fs update that got into 218). So, I'll try to understand if a problem might reside somewhere around here. Adding Jeff in Cc, as he backported the fs changes and might have suggestions on how to tackle this. (In reply to Juri Lelli from comment #3) > Hi, > > I had a look and commit 68848557caf6 ("[fs] eventfd: track eventfd_signal() > recursion depth") (kernel-4.18.0-218.el8) introduced the check that generates > the WARNING reported. > > Now, it is not immediately clear to me if the erroneous condition pointed > out by the warning and the fact that VM install hangs are correlated. > > I'm auditing recent changes (more info below), but as a first try I'd kindly > ask Pei to try to replicate with kernel-rt-4.18.0-218.rt7.30.el8 (so that we > restrict the number of potential problematic changes). As a second tentative Hi Juri, 4.18.0-218.rt7.30.el8.x86_64 hit this issue. So 4.18.0-218.rt7.30.el8.x86_64 is the first version which hit this problem. > (don't have high hopes on it, but still think it's worth a shot) I'd try > using > a more recent compose to start the VM install, as I seem to remember I had > troubles with testing using some compose from last week (don't remember the > specific one, sorry). Could you also please try to replicate with, say, > http://download.eng.pek2.redhat.com/nightly/RHEL-8.3.0-20200702.n.0/compose/ > BaseOS/x86_64/os/ > ? Asking this because I noticed dracut failing to download artifacts from the > install log you shared. Yes, I'm testing with the latest repo with 4.18.0-218.rt7.30.el8.x86_64. Reproduced this issue. Thank you. Best regards, Pei Another info: When using more vcpus (--vcpus=10,cpuset=16,18,1,3,5,7,9,11,13,15) to install, everything looks good, didn't hit this issue. virt-install \ --graphics type=spice,listen=0.0.0.0 \ --name=rhel8.3_rt_8vcpu \ --machine q35 \ --memory=4096,hugepages=yes \ --memorybacking hugepages=yes,size=1,unit=G,locked=yes,access.mode=shared \ --cpu host-passthrough,cache.mode=passthrough,numa.cell0.memory=4194304,numa.cell0.cpus=0-9 \ --numatune=1 \ --vcpus=10,cpuset=16,18,1,3,5,7,9,11,13,15 \ --controller type=scsi,model=virtio-scsi,driver.iommu=on \ --controller type=virtio-serial,driver.iommu=on \ --rng type=/dev/random,driver.iommu=on \ --memballoon virtio,driver.iommu=on \ --disk path=/home/images_nfv-virt-rt-kvm/rhel8.3_rt_8vcpu.qcow2,size=40,target.bus=scsi \ -l http://download.eng.pek2.redhat.com/nightly/RHEL-8.3.0-20200701.n.0/compose/BaseOS/x86_64/os/ \ --noautoconsole \ --noreboot \ -x ks=http://10.73.72.41/kickstart-rhel8.cfg \ --network bridge=switch,model=virtio,mac=38:66:77:5f:dd:95,driver.iommu=on I did some further investigation on this issue, which I'll try to summarize below. Root cause of the hang seems really to be commit 68848557caf6; I could verify that cherry picking this single commit on -217.rt brings the issue in the last known good kernel and that by applying the following latest RT kernel works (I could install and start a VM). So all the other changes that came with -218 should be OK. --- fs/eventfd.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index ce1d1711fbbaf..cf9df57fa3719 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -65,8 +65,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) * it returns true, the eventfd_signal() call should be deferred to a * safe context. */ - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count))) - return 0; + WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)); spin_lock_irqsave(&ctx->wqh.lock, flags); this_cpu_inc(eventfd_wake_count); This is of course not fixing the potential issue, but no user of eventfd_signal() is currently checking its return value, so it looks like everybody is expecting this to always work (and the fact that it doesn't in this case makes the install process to hang, guess waiting on a wakeup event that never happens). Upstream RT (v5.6.17-rt10) has the same problem. So, I'll soon report it. Apart from the issue above, I think RT also has a problem if the comment above eventfd_signal() is correct in saying that the function is expected to be working in atomic ctx: spin_lock_irqsave is converted to a mutex in RT and that of course should be allowed to sleep. I don't think we witness the related warning though, still a potential issue is present. Biggest problem in my understanding is that issue is blocking kvm-rt testing and it might take a while before we get a proper fix upstream. Do we want a stop gap "solution" of the kind above in the meanwhile? In the end the actual issue has always been present (without causing crashes or observable latencies) and we just uncovered it. (In reply to Juri Lelli from comment #6) > I did some further investigation on this issue, which I'll try to summarize > below. > > Root cause of the hang seems really to be commit 68848557caf6; I could > verify that cherry picking this single commit on -217.rt brings the issue > in the last known good kernel and that by applying the following latest > RT kernel works (I could install and start a VM). So all the other changes > that came with -218 should be OK. > > --- > fs/eventfd.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/fs/eventfd.c b/fs/eventfd.c > index ce1d1711fbbaf..cf9df57fa3719 100644 > --- a/fs/eventfd.c > +++ b/fs/eventfd.c > @@ -65,8 +65,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) > * it returns true, the eventfd_signal() call should be deferred to a > * safe context. > */ > - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count))) > - return 0; > + WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)); > > spin_lock_irqsave(&ctx->wqh.lock, flags); > this_cpu_inc(eventfd_wake_count); > > This is of course not fixing the potential issue, but no user of > eventfd_signal() > is currently checking its return value, so it looks like everybody is > expecting > this to always work (and the fact that it doesn't in this case makes the > install > process to hang, guess waiting on a wakeup event that never happens). > > Upstream RT (v5.6.17-rt10) has the same problem. So, I'll soon report it. > Apart > from the issue above, I think RT also has a problem if the comment above > eventfd_signal() is correct in saying that the function is expected It is expected to be called from interrupt context. > to be > working in atomic ctx: spin_lock_irqsave is converted to a mutex in RT and > that of course should be allowed to sleep. I don't think we witness the > related warning though, still a potential issue is present. > > Biggest problem in my understanding is that issue is blocking kvm-rt testing > and it might take a while before we get a proper fix upstream. Do we want > a stop gap "solution" of the kind above in the meanwhile? In the end the > actual issue has always been present (without causing crashes or observable > latencies) and we just uncovered it. (In reply to Juri Lelli from comment #6) > I did some further investigation on this issue, which I'll try to summarize > below. > > Root cause of the hang seems really to be commit 68848557caf6; I could > verify that cherry picking this single commit on -217.rt brings the issue > in the last known good kernel and that by applying the following latest > RT kernel works (I could install and start a VM). So all the other changes > that came with -218 should be OK. > > --- > fs/eventfd.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/fs/eventfd.c b/fs/eventfd.c > index ce1d1711fbbaf..cf9df57fa3719 100644 > --- a/fs/eventfd.c > +++ b/fs/eventfd.c > @@ -65,8 +65,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) > * it returns true, the eventfd_signal() call should be deferred to a > * safe context. > */ > - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count))) > - return 0; > + WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)); > > spin_lock_irqsave(&ctx->wqh.lock, flags); > this_cpu_inc(eventfd_wake_count); > > This is of course not fixing the potential issue, but no user of > eventfd_signal() > is currently checking its return value, so it looks like everybody is > expecting > this to always work (and the fact that it doesn't in this case makes the > install > process to hang, guess waiting on a wakeup event that never happens). > > Upstream RT (v5.6.17-rt10) has the same problem. So, I'll soon report it. Not sure if you have seen the following upstream patches from He Zhe: v1: https://lore.kernel.org/lkml/1586257192-58369-1-git-send-email-zhe.he@windriver.com/ v2: https://lore.kernel.org/lkml/20200410114720.24838-1-zhe.he@windriver.com/ Maybe we start discussing the initial ideas in the same thread along with the other issue that you have mentioned. > Apart > from the issue above, I think RT also has a problem if the comment above > eventfd_signal() is correct in saying that the function is expected to be > working in atomic ctx: spin_lock_irqsave is converted to a mutex in RT and > that of course should be allowed to sleep. I don't think we witness the > related warning though, still a potential issue is present. > > Biggest problem in my understanding is that issue is blocking kvm-rt testing > and it might take a while before we get a proper fix upstream. Do we want > a stop gap "solution" of the kind above in the meanwhile? In the end the > actual issue has always been present (without causing crashes or observable > latencies) and we just uncovered it. (In reply to Nitesh Narayan Lal from comment #8) > (In reply to Juri Lelli from comment #6) > > I did some further investigation on this issue, which I'll try to summarize > > below. > > > > Root cause of the hang seems really to be commit 68848557caf6; I could > > verify that cherry picking this single commit on -217.rt brings the issue > > in the last known good kernel and that by applying the following latest > > RT kernel works (I could install and start a VM). So all the other changes > > that came with -218 should be OK. > > > > --- > > fs/eventfd.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/fs/eventfd.c b/fs/eventfd.c > > index ce1d1711fbbaf..cf9df57fa3719 100644 > > --- a/fs/eventfd.c > > +++ b/fs/eventfd.c > > @@ -65,8 +65,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) > > * it returns true, the eventfd_signal() call should be deferred to a > > * safe context. > > */ > > - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count))) > > - return 0; > > + WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)); > > > > spin_lock_irqsave(&ctx->wqh.lock, flags); > > this_cpu_inc(eventfd_wake_count); > > > > This is of course not fixing the potential issue, but no user of > > eventfd_signal() > > is currently checking its return value, so it looks like everybody is > > expecting > > this to always work (and the fact that it doesn't in this case makes the > > install > > process to hang, guess waiting on a wakeup event that never happens). > > > > Upstream RT (v5.6.17-rt10) has the same problem. So, I'll soon report it. > > Not sure if you have seen the following upstream patches from He Zhe: > v1: > https://lore.kernel.org/lkml/1586257192-58369-1-git-send-email-zhe. > he/ > v2: https://lore.kernel.org/lkml/20200410114720.24838-1-zhe.he@windriver.com/ I didn't, thanks for the pointers! v2 seems to work, but I wonder if that "raising the bar" approach isn't too fragile. > > Maybe we start discussing the initial ideas in the same thread along with > the other issue that you have mentioned. Anyway, I'll follow the discussion and report the potential problem with upstream RT. Verification: This issue cannot be reproduced with kernel-rt-4.18.0-230.rt7.42.el8.x86_64 any more. Versions: kernel-rt-4.18.0-230.rt7.42.el8.x86_64 qemu-kvm-4.2.0-31.module+el8.3.0+7437+4bb96e0d.x86_64 tuned-2.14.0-3.el8.noarch libvirt-6.5.0-1.scrmod+el8.3.0+7493+a5e196a4.x86_64 python3-libvirt-6.0.0-1.module+el8.3.0+6423+e4cb6418.x86_64 microcode_ctl-20200609-2.el8.x86_64 rt-tests-1.8-8.el8.x86_64 12h cyclictest results: (1)Single VM with 1 rt vCPU: # Min Latencies: 00007 # Avg Latencies: 00009 # Max Latencies: 00022 (2)Single VM with 8 rt vCPUs: # Min Latencies: 00008 00012 00012 00012 00012 00012 00012 00012 # Avg Latencies: 00009 00012 00012 00012 00012 00012 00012 00012 # Max Latencies: 00020 00022 00022 00022 00022 00022 00020 00022 (3)Multiple VMs each with 1 rt vCPU: - VM1 # Min Latencies: 00007 # Avg Latencies: 00009 # Max Latencies: 00019 - VM2 # Min Latencies: 00007 # Avg Latencies: 00009 # Max Latencies: 00019 - VM3 # Min Latencies: 00007 # Avg Latencies: 00009 # Max Latencies: 00022 - VM4 # Min Latencies: 00007 # Avg Latencies: 00009 # Max Latencies: 00023 Beaker job: https://beaker.engineering.redhat.com/jobs/4457653 So this bug has been fixed very well. Move to 'VERIFIED'. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: kernel-rt security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4609 |