Bug 1712781
Summary: | KVM-RT guest fails boot with emulatorsched | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Pei Zhang <pezhang>
Component: | kernel-rt | Assignee: | Juri Lelli <juri.lelli>
kernel-rt sub component: | KVM | QA Contact: | Pei Zhang <pezhang>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | high | |
Priority: | high | CC: | bhu, chayang, chuhu, fiezzi, jinzhao, jlelli, juri.lelli, juzhang, lcapitulino, mkletzan, mtosatti, pauld, virt-maint, williams
Version: | 8.1 | |
Target Milestone: | rc | |
Target Release: | 8.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | kernel-rt-4.18.0-176.rt13.33.el8 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-04-28 15:25:29 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1774652 | |
Bug Blocks: | 1640832, 1722609 | |
Attachments: | | |
Description
Pei Zhang
2019-05-22 08:59:53 UTC
I'll copy some more info from the previous BZ to keep everything in one place.

What I found out is that if I set the scheduler for the emulator thread before resuming the VM (the current state), I get EINVAL when trying to write the vcpu TID to /tasks in the cpu,cpuacct controller (probably because that is the first controller that is tried). I cannot reproduce this with anything else and I don't know why it is happening; the only explanation that would make sense to me is a kernel bug. I would love it if someone could find out.

However, when I boot without the <emulatorsched/> setting, I can then set the scheduler and even reboot, and everything works. So I am going to need to change the sequence in which libvirt does this (which is very inconvenient in the current state of things), but it will work. Hence I'm switching this to ASSIGNED, as this needs yet another fix. In the meantime, if someone can figure out why this is happening, that would help me not to get a headache again =)

I am also attaching libvirtd logs for two starts of the domain: one with <emulatorsched/>, which fails, and one without it, which works. I hope that helps someone who is trying to figure out why this is happening. Some useful things to search for in those logs are "virCgroupSetValue", "error", or any QMP commands like "query-cpus".

Created attachment 1571935
excerpt from libvirtd.log when starting a domain with <emulatorsched/> fails

Created attachment 1571936
excerpt from libvirtd.log when starting a domain without <emulatorsched/> works

Hi,

I think the problem you were facing is related to
https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c#L6525

I spent some time understanding how libvirtd sets up emulator and vcpu(s) properties, and I believe a simpler reproducer of the very same problem is the following:

    # mkdir /sys/fs/cgroup/cpu,cpuacct/kvm
    # mkdir /sys/fs/cgroup/cpu,cpuacct/kvm/emulator
    # echo $$ > /sys/fs/cgroup/cpu,cpuacct/kvm/tasks
    # chrt -fp 10 $$
    # echo $$ > /sys/fs/cgroup/cpu,cpuacct/kvm/emulator/tasks
    bash: echo: write error: Invalid argument

This is the EINVAL libvirtd gets if it first sets up the emulator's scheduling properties (setting it to FIFO) and then tries to move it into the emulator group. As you noticed, doing it the other way around (first moving it into the group and then setting up the scheduling properties) works OK.

This does not sound correct to me, so I'll try to see what upstream folks think about it (and whether there is indeed a plausible explanation for this seemingly odd behavior).

Issue seems fixed with the following kernel:
http://brew-task-repos.usersys.redhat.com/repos/scratch/jlelli/kernel-rt/4.18.0/100.rt16.40.el8.cpuctrl/

The related patch is currently under discussion upstream (positive feedback from the cgroups maintainer so far):
https://lore.kernel.org/lkml/20190605114935.7683-1-juri.lelli@redhat.com/

libvirt master is probably fine as it is today, though.

(In reply to Juri Lelli from comment #6)
Thanks a lot, I'm glad this made sense, even though I could not reproduce it with just shell (no idea what I was doing differently). This must have taken you an awful amount of time, and that makes me appreciate it even more. The workaround might actually make more sense for us anyway, but I'm really glad it got sorted out, even if it is not upstream yet. Thank you again.

The kernel fix has been in mainline for a while now and I think it would be good if we could have this in RHEL as well.
It doesn't affect RHEL (where RT_GROUP_SCHED is enabled), but it might create problems (like the one this BZ is about) for RHEL-RT. Since the fix is upstream, I think the correct procedure is to bring it through RHEL. Phil, do you see any problems with it, and would you be up for backporting it? If yes, please take the BZ and change the component to kernel. Thanks a lot in any case!

Upstream fix is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.4-rc8&id=a07db5c0865799ebed1f88be0df50c581fb65029

Sure. I had that on my list of non-critical fixes anyway. I should be able to pull it into RHEL8.2, with sanity-only testing, I think.

Thanks!

Juri,
Maybe it would be better to clone it to rhel8, then this one can get tested in RT for real. I think that makes the process better and keeps the RT part of this from getting lost. What do you think?

(In reply to Phil Auld from comment #11)
> Juri,
> Maybe it would be better to clone it to rhel8, then this one can get
> tested in RT for real. I think that makes the process better and keeps the
> RT part of this from getting lost. What do you think?

Yes. Makes sense to me. Please feel free to do so. Thanks!

I posted the other one for rhel8.2 (bz1774652).

Move to VERIFIED as Comment 15.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:1567
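
For anyone stuck on a kernel without the fix, the ordering that the thread reports as working (move the task into the nested group first, then switch it to SCHED_FIFO) can be sketched as below. This is a minimal sketch, assuming a cgroup v1 hierarchy with the same paths as the reproducer earlier in this thread. The `run` and `move_then_set_fifo` helpers and the `CGROOT` and `DRY_RUN` variables are illustrative names, not part of libvirt or of the original report. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Illustrative knobs (assumptions, not from the report):
#   CGROOT  - mount point of the cpu,cpuacct cgroup v1 hierarchy
#   DRY_RUN - 1 (default) prints commands only; 0 executes them (needs root)
CGROOT="${CGROOT:-/sys/fs/cgroup/cpu,cpuacct}"
DRY_RUN="${DRY_RUN:-1}"

# Print one command line, and execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$1"
    fi
}

# Working order from the thread: join the cgroups first, then set FIFO.
# Swapping the last two steps gives the failing order from the reproducer.
move_then_set_fifo() {
    pid="$1"
    prio="$2"
    run "mkdir -p $CGROOT/kvm/emulator" &&
    run "echo $pid > $CGROOT/kvm/tasks" &&
    run "echo $pid > $CGROOT/kvm/emulator/tasks" &&
    run "chrt -f -p $prio $pid"
}

# Dry-run demonstration against the current shell's PID.
move_then_set_fifo "$$" 10
```

Running it with `DRY_RUN=0` as root executes the steps for real. On a kernel that carries the upstream fix, either order is expected to work, so the sketch is only relevant to unfixed kernels built without RT_GROUP_SCHED.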