Bug 1472286 - RHEL7.4: libvirtError: internal error: unable to execute QEMU command 'device_del': Bus 'pci.0' does not support hotplugging
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: Core
Version: ---
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Francesco Romani
QA Contact: Michael Burman
Keywords: Regression
Depends On:
Blocks: 1412074
Reported: 2017-07-18 07:17 EDT by Michael Burman
Modified: 2017-12-20 06:46 EST
CC List: 12 users

Doc Type: No Doc Update
Clone Of: 1471667
Last Closed: 2017-12-20 06:46:39 EST
Type: Bug
oVirt Team: Virt
rule-engine: ovirt‑4.2+
rule-engine: blocker+


External Trackers
Tracker      | ID    | Priority | Status | Summary                                           | Last Updated
oVirt gerrit | 79560 | master   | MERGED | tests: virt: add test for Vm.acpi_enabled()       | 2017-07-19 10:40 EDT
oVirt gerrit | 79561 | master   | MERGED | virt: libvirtxml: don't do appendMetadata in init | 2017-07-20 08:49 EDT
oVirt gerrit | 79562 | master   | MERGED | virt: add libvirtxml.make_minimal_domain helper   | 2017-07-20 10:45 EDT

Comment 1 Dan Kenigsberg 2017-07-18 07:28:01 EDT
Asking for a blocker, since not setting <acpi> may have a serious effect on some guests.
Comment 2 Dan Kenigsberg 2017-07-18 07:30:02 EDT
Francesco, I don't see the patch you've mentioned on 4.1

$ git log --grep I40859f2d57a19d60f4fa1e1faf442d072fa0993c ovirt/ovirt-4.1
Comment 3 Francesco Romani 2017-07-18 07:37:47 EDT
(In reply to Dan Kenigsberg from comment #2)
> Francesco, I don't see the patch you've mentioned on 4.1
> 
> $ git log --grep I40859f2d57a19d60f4fa1e1faf442d072fa0993c ovirt/ovirt-4.1

Because we didn't backport it; it is a master-only patch:

 $ git branch --contains a877434796eeb5af51368f6acdf8ed7c8bf33906 | grep -i ovirt
 $

So no released version is affected. Lowering priority because of this.
Comment 4 Francesco Romani 2017-07-18 07:38:52 EDT
(In reply to Dan Kenigsberg from comment #1)
> Asking for a blocker, since not setting <acpi> may have a serious effect on
> some guests.

No need for a blocker, see https://bugzilla.redhat.com/show_bug.cgi?id=1472286#c3
Comment 5 Dan Kenigsberg 2017-07-19 01:36:33 EDT
Oh, I thought Burman had seen it in ovirt-4.1.
Comment 6 Michael Burman 2017-07-19 01:53:53 EDT
(In reply to Dan Kenigsberg from comment #5)
> Oh, I thought Burman had seen it in ovirt-4.1.

Only on master, indeed.
Comment 7 Francesco Romani 2017-07-20 10:45:23 EDT
patches merged to master -> MODIFIED
Comment 8 Francesco Romani 2017-07-24 03:52:11 EDT
This bug doesn't need doc_text. A mistake slipped into a pre-alpha release. Hot(un)plugging should just work as it always did.
Comment 9 Michael Burman 2017-07-26 02:48:56 EDT
Hi,

Testing this on the latest master - 4.2.0-0.0.master.20170725202023.git561151b.el7.centos and vdsm-4.20.1-241.giteb37c05.el7.centos.x86_64 - with the same results on some guests.
Moving back to ASSIGNED based on this.

2017-07-26 09:44:31,136+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler8) [a419ccb] FINISH, FullListVDSCommand, return: [{acpiEnable=false
Comment 10 Francesco Romani 2017-07-26 03:08:12 EDT
(In reply to Michael Burman from comment #9)
> Hi,

Hi,
 
> Testing this on the latest master -
> 4.2.0-0.0.master.20170725202023.git561151b.el7.centos and
> vdsm-4.20.1-241.giteb37c05.el7.centos.x86_64 - with the same results on some
> guests.

"Some guests" sounds a bit too vague :)
When does the fix work, and when doesn't it?

It should go like this:
1. NEW VM, with ACPI enabled (default in Engine)
   * bugged Vdsm -> ACPI disabled -> no hot(un)plug
   * patched Vdsm -> ACPI enabled -> hot(un)plug works

2. OLD VM, with ACPI enabled (default in Engine), created BEFORE the bug was introduced:
   * ACPI enabled -> hot(un)plug works

3. OLD VM, with ACPI enabled (default in Engine), **created with buggy Vdsm**
   * ACPI state was reported by Vdsm as disabled, and I'm quite sure this ended up in the Engine DB. So, if the VM starts again, it will get acpiEnable=false, and Vdsm will dutifully comply

TL;DR: please provide one example -with Vdsm logs!- that Vdsm received acpiEnable=true in VM.create and reports acpiEnable=false in FullListVDSCommand
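
One way to collect the Engine-side half of that evidence is to scan the Engine log for the acpiEnable value reported in FullListVDSCommand results. The following is only a minimal sketch in Python: it assumes the log line format seen in the single sample from comment 9, and the function name `reported_acpi_values` is made up for illustration; it is not part of Vdsm or Engine.

```python
import re

# Matches the acpiEnable field as it appears in the FullListVDSCommand
# log line quoted in comment 9 (format assumed from that single sample).
ACPI_RE = re.compile(r"acpiEnable=(true|false)")


def reported_acpi_values(lines):
    """Return the acpiEnable values found in FullListVDSCommand FINISH lines."""
    values = []
    for line in lines:
        # Only look at lines that report the result of FullListVDSCommand.
        if "FullListVDSCommand" not in line or "FINISH" not in line:
            continue
        values.extend(match == "true" for match in ACPI_RE.findall(line))
    return values


# The (truncated) log line from comment 9, used here as sample input.
sample = (
    "2017-07-26 09:44:31,136+03 INFO  "
    "[org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] "
    "(DefaultQuartzScheduler8) [a419ccb] FINISH, FullListVDSCommand, "
    "return: [{acpiEnable=false"
)
print(reported_acpi_values([sample]))
```

The value sent in VM.create still has to be checked separately in the Vdsm log; this sketch makes no assumption about that log's format.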
Comment 11 Michael Burman 2017-07-26 03:22:27 EDT
(In reply to Francesco Romani from comment #10)

Just to make it clear, are you saying that I need to create a new VM?
If it had acpiEnable=false before the fix, then it will always start as false?
Comment 12 Michael Burman 2017-07-26 03:25:41 EDT
It actually makes no sense.
I have 5 VMs created together; 1 VM always gets false, all the others get true.
Please contact me and I will provide you the setup. It will be faster. Thanks,
Comment 13 Francesco Romani 2017-07-26 04:42:03 EDT
(In reply to Michael Burman from comment #11)
> Just to make it clear, are you saying that I need to create a new VM?
> If it had acpiEnable=false before the fix, then it will always start as
> false?

To make sure the fix works (or doesn't), yes, we need to double-check by creating a new VM.

The problem is the following:
1. Engine starts a VM with acpiEnable = True (default)
2. Vdsm, because of the bug, fails to set the flag, and reports acpiEnable = False
3. Engine reads back the value reported by Vdsm, and sets acpiEnable = False in the DB (<- this could be an Engine bug)
4. From now on, Engine will read the ACPI state from the DB, and always send acpiEnable = false, regardless of the bug or the fix.
5. So even a correctly working Vdsm will get acpiEnable = false, and this will prevent hot(un)plug from working.

I agree this is highly impractical, but we must also acknowledge that this happened on a pre-alpha snapshot, so no customer could be affected (once the bug is fixed, of course).
The users affected by this bug (testers and developers, I'd assume) can fix this issue by running the following query (with Engine stopped):

  update vm_dynamic set acpi_enable=true where vm_guid in (...)

One needs to fill in the list of vm_guid values that need to be fixed.
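
The feedback loop in steps 1-5 can be illustrated with a toy model. This is only a sketch in Python: the `Engine` and `Vdsm` classes, the dict standing in for the vm_dynamic table, and the VM names are all hypothetical and do not reflect the real Engine or Vdsm code.

```python
class Vdsm:
    """Toy stand-in for Vdsm; 'buggy' models the pre-fix behaviour."""

    def __init__(self, buggy):
        self.buggy = buggy

    def create(self, acpi_enable):
        # Step 2: the buggy version fails to set the flag and reports False.
        # The patched version honours whatever value it receives.
        return False if self.buggy else acpi_enable


class Engine:
    """Toy stand-in for Engine; self.db plays the role of vm_dynamic."""

    def __init__(self):
        self.db = {}

    def start_vm(self, vm_id, vdsm):
        # Steps 1 and 4: send the stored value, or True (default) for a new VM.
        requested = self.db.get(vm_id, True)
        reported = vdsm.create(requested)
        # Step 3: the value reported by Vdsm overwrites the stored one.
        self.db[vm_id] = reported
        return reported


engine = Engine()
first_run = engine.start_vm("old-vm", Vdsm(buggy=True))     # bug hits
after_fix = engine.start_vm("old-vm", Vdsm(buggy=False))    # step 5: still False
new_vm = engine.start_vm("new-vm", Vdsm(buggy=False))       # new VM is fine
engine.db["old-vm"] = True                                  # manual DB correction
after_db_fix = engine.start_vm("old-vm", Vdsm(buggy=False))
print(first_run, after_fix, new_vm, after_db_fix)
```

In this model the old VM keeps starting with ACPI disabled even on a patched Vdsm, until the stored value is corrected by hand, which is what the query above does for the real DB.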
Comment 14 Michal Skrivanek 2017-07-26 05:03:47 EDT
Moving back; please test according to the suggestions above.
Comment 15 Michael Burman 2017-07-26 06:35:48 EDT
Based on comment 13^^ moving to VERIFIED

New VM - working as expected. acpiEnable = true and hot unplug is working.
Old VM with the bug - acpiEnable = false; hot unplug doesn't work and will never work for this VM unless one of the following is done:
- Option 1: fix the DB
- Option 2: create a new VM

Verified on -  4.2.0-0.0.master.20170725202023.git561151b.el7.centos
and vdsm-4.20.1-241.giteb37c05.el7.centos.x86_64
Comment 16 Sandro Bonazzola 2017-12-20 06:46:39 EST
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
