Bug 1700390
| Summary: | KVM-RT guest with 10 vCPUs hangs on reboot | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jaroslav Suchanek <jsuchane> |
| Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
| Status: | CLOSED NOTABUG | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 15.0 (Stein) | CC: | chayang, dasmith, egallen, eglynn, fiezzi, jdenemar, jhakimra, jsuchane, juzhang, kchamart, knoel, lcapitulino, lyarwood, mkletzan, mtosatti, nilal, pezhang, sbauza, sgordon, smooney, stephenfin, toneata, virt-bugs, vromanso |
| Target Milestone: | --- | Keywords: | Reopened, Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1580229 | Environment: | |
| Last Closed: | 2021-06-01 13:31:32 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1580229 | ||
| Bug Blocks: | 1932086 | ||
Description
Jaroslav Suchanek
2019-04-16 12:50:37 UTC
Nova currently doesn't allow CPU overcommit, so it is not a problem today to expose the 'emulatorsched' option.

So closing the bug with this rationale. (In future, if we decide we need this, we can always reopen the bug.)

(In reply to Kashyap Chamarthy from comment #2)
> Nova currently doesn't allow CPU overcommit, so it is not a problem today to
> expose 'emulatorsched' option.
>
> So closing the bug with this rationale. (In future, if we decide we need
> this, we can always reopen the bug.)

Kashyap,

Whether CPU overcommit is supported or not is not relevant to this option.

See comment #39 of https://bugzilla.redhat.com/show_bug.cgi?id=1580229

(In reply to Marcelo Tosatti from comment #3)
> (In reply to Kashyap Chamarthy from comment #2)
> > Nova currently doesn't allow CPU overcommit, so it is not a problem today to
> > expose 'emulatorsched' option.
> >
> > So closing the bug with this rationale. (In future, if we decide we need
> > this, we can always reopen the bug.)
>
> Kashyap,
>
> Whether CPU overcommit is supported or not is not relevant to this option.

Thanks for correcting, Marcelo. I closed it based on a bug triage discussion; I was reminded by a colleague, Sean Mooney, that the reasons are more granular. There are two things here:

(1) Nova can expose 'emulatorsched' unconditionally whenever 'vcpusched' is exposed.
(2) However, Nova should _not_ expose this to the "tenant" users (who are not admins).

> See comment #39 of
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1580229

Given that this sounds important for real-time workloads, we can enable 'emulatorsched' whenever 'vcpusched' is exposed (i.e. the first point noted earlier). That is, whenever the property `hw:cpu_realtime=yes` is set on a "flavor" (which defines the compute, memory and storage capacity for guests), enable both 'emulatorsched' and 'vcpusched'.

Both can take their value from the Nova config attribute 'realtime_scheduler_priority' [1], which is defined as follows:

"In a realtime host context vCPUs for guest will run in that scheduling priority. Priority depends on the host kernel (usually 1-99)"

(Sean, please correct me if I misparsed you.)

[1] https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.realtime_scheduler_priority

Another point that Sean suggested is that, to _not_ hit this bug in OpenStack Nova, you can do the following.
If you set the flavor property `hw:cpu_realtime`, make sure to also set:
- `hw:emulator_threads_policy=isolate` or `hw:emulator_threads_policy=share`; and
- `cpu_shared_set` in `nova.conf` (this defines which physical CPUs will be used for best-effort guest vCPU resources)
The above is not enforced in the code, though. So this feature request can protect those users who don't manually set the above.
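Since the check above is not enforced by Nova, an operator could run something like the following before creating real-time flavors. This is only a minimal sketch, not Nova code: the function name `check_realtime_flavor`, its arguments and the set of accepted "true" strings are assumptions made for illustration. The extra spec is spelled `hw:emulator_threads_policy` (with an "s") in the Nova documentation, so that is the key the sketch looks up.

```python
# Hypothetical pre-flight check for a real-time flavor; this is NOT Nova code.
# It only encodes the two recommendations given in this thread.

# Assumed set of strings that count as "true" for hw:cpu_realtime.
TRUE_STRINGS = {"1", "true", "yes", "on"}


def check_realtime_flavor(extra_specs, cpu_shared_set=None):
    """Return a list of human-readable problems (empty if none were found).

    extra_specs:    dict of flavor extra specs, e.g. {"hw:cpu_realtime": "yes"}
    cpu_shared_set: value of cpu_shared_set from nova.conf, or None if unset
    """
    problems = []

    realtime = str(extra_specs.get("hw:cpu_realtime", "")).strip().lower()
    if realtime not in TRUE_STRINGS:
        return problems  # not a real-time flavor, nothing to check

    policy = extra_specs.get("hw:emulator_threads_policy")
    if policy not in ("isolate", "share"):
        problems.append(
            "hw:emulator_threads_policy must be 'isolate' or 'share'")

    if not cpu_shared_set:
        problems.append("cpu_shared_set is not set in nova.conf")

    return problems


# Example: a flavor that only sets hw:cpu_realtime is reported as incomplete.
print(check_realtime_flavor({"hw:cpu_realtime": "yes"}))
```

The sketch returns a list of problems rather than raising, so a caller can decide whether to reject the flavor or only log a warning.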
What's the status of this BZ?

Sorry for the late response. I chatted with my colleague Sean Mooney and we both agree that this is a user error, i.e. the user must ensure to configure "hw:emulator_threads_policy" for a real-time guest.

See comment #6 for details on "hw:emulator_threads_policy".

On the above basis, I'm closing this bug.

Since this was fresh in our minds, I brought this up in our upstream team meeting today. We still believe the statement above is correct: as implemented today, it is a user error to use a realtime instance without hw:emulator_threads_policy.

With that said, it also occurred to me that there may be a better default for this specific case, when the flavor/image combination is misconfigured.

hw:cpu_realtime_mask is also a required parameter when using realtime CPUs in OSP 15 and OSP 16. In OSP 17 that is relaxed by https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/releasenotes/notes/bug-1884231-16acf297d88b122e.yaml but only when hw:emulator_threads_policy is used.

What we might be able to do is further relax the requirements. In the event that neither emulator_threads_policy nor cpu_realtime_mask is defined, we can return an error. But when cpu_realtime_mask is defined and emulator_threads_policy is not, we can reduce the priority of the non-realtime vCPUs and then confine the emulator thread to float over the non-realtime vCPU host cores with the same elevated priority as the realtime vCPUs.

What this would mean for a 2-core VM where guest CPU 0 is non-realtime and guest CPU 1 is realtime is that we would generate the XML as follows, e.g.

hw:cpu_policy=dedicated
hw:cpu_realtime=True
hw:cpu_realtime_mask=^0

<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="1"/>
<vcpusched vcpus='1' scheduler='fifo' priority='1'/>
<emulatorpin cpuset="1"/>
<emulatorsched scheduler='fifo' priority='1'/>

vs. today

<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="1"/>
<vcpusched vcpus='1' scheduler='fifo' priority='1'/>
<emulatorpin cpuset="1,2"/>

That should ensure that you can't get into a situation where the emulator thread is starved by the guest CPUs. This, however, would still not be our recommended configuration, as we would advise using an isolated emulator thread or, preferably, an emulator thread from the cpu_shared_set instead.

I will capture this in an upstream bug report, but to set expectations, this is a low-priority wishlist enhancement. It is not clear when we will have time to implement this change, but we may be able to include it in other work.

(In reply to Kashyap Chamarthy from comment #10)
> Sorry for the late response. I chatted with my colleague Sean Mooney and we
> both agree that this is a user-error. I.e. the user must ensure to
> configure "hw:emulator_threads_policy" for a real-time guest.
>
> See comment#6 for details on "hw:emulator_threads_policy".
>
> On the above basis, I'm closing this bug.

Hello Kashyap,

It seems "hw:emulator_threads_policy=share" is for setting <emulatorpin xx> for RT VMs; I got this info from Bug 1849469.

And it seems this BZ is a new request to support <emulatorsched scheduler='fifo' priority='1'/> in OSP (RHEL has supported it since the fix for Bug 1580229). But currently RT VMs work well without <emulatorsched scheduler='fifo' priority='1'/> in both the RHEL layer and the OSP layer.

From a functional perspective, I agree this bug can be closed now.

Thanks.

Best regards,
Pei
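The fallback proposed earlier in this thread, for a flavor that defines hw:cpu_realtime_mask but no emulator threads policy, can be sketched roughly as below. This is an illustration under stated assumptions, not Nova's implementation: `build_cputune` and its parameters are hypothetical names, the mask is assumed to be already parsed into a set of non-realtime guest vCPUs, and lowering the priority of the non-realtime vCPUs is not modelled. The sketch follows the prose description, so the emulator thread is pinned to the host cores backing the non-realtime vCPUs (host core 0 in the two-vCPU example).

```python
import xml.etree.ElementTree as ET


def build_cputune(pinning, non_realtime_vcpus, priority=1):
    """Build a libvirt <cputune> element for the proposed fallback.

    Hypothetical helper; it models only the case where no emulator
    threads policy is set on the flavor.

    pinning:            dict mapping guest vCPU -> host CPU
    non_realtime_vcpus: guest vCPUs excluded by hw:cpu_realtime_mask
                        (e.g. {0} for the mask "^0"); empty if no mask
    priority:           FIFO priority for realtime vCPUs and emulator thread
    """
    if not non_realtime_vcpus:
        # Neither a policy nor a mask is defined: reject the request.
        raise ValueError(
            "set hw:emulator_threads_policy or hw:cpu_realtime_mask")

    cputune = ET.Element("cputune")

    # Pin every guest vCPU to its dedicated host CPU.
    for vcpu, host_cpu in sorted(pinning.items()):
        ET.SubElement(cputune, "vcpupin",
                      vcpu=str(vcpu), cpuset=str(host_cpu))

    # Realtime vCPUs are all pinned vCPUs that the mask does not exclude.
    realtime = sorted(set(pinning) - set(non_realtime_vcpus))
    ET.SubElement(cputune, "vcpusched",
                  vcpus=",".join(str(v) for v in realtime),
                  scheduler="fifo", priority=str(priority))

    # Confine the emulator thread to the non-realtime vCPUs' host cores and
    # give it the same elevated priority as the realtime vCPUs.
    emulator_cores = sorted(pinning[v] for v in non_realtime_vcpus)
    ET.SubElement(cputune, "emulatorpin",
                  cpuset=",".join(str(c) for c in emulator_cores))
    ET.SubElement(cputune, "emulatorsched",
                  scheduler="fifo", priority=str(priority))
    return cputune


# Two-vCPU guest from the example: vCPU 0 is non-realtime, vCPU 1 is realtime.
print(ET.tostring(build_cputune({0: 0, 1: 1}, {0}), encoding="unicode"))
```

Raising an error when no mask is given mirrors the "return an error" branch of the proposal; a real implementation would also have to handle the cases where an emulator threads policy is set.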