Bug 1729817

Summary: memfd migration failed - qemu-kvm: Unknown ramblock "/objects/ram-node0", cannot accept migration
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Jing Qi <jinqi>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: Live Migration
QA Contact: Jing Qi <jinqi>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
CC: fjin, jdenemar, jen, jsuchane, marcandre.lureau, mprivozn, mrezanin, pkrempa, virt-maint, xuzhang
Version: 8.0
Keywords: Triaged
Target Milestone: rc
Target Release: 8.0
Hardware: x86_64
OS: Linux
Last Closed: 2020-03-18 02:54:58 UTC
Type: Bug

Description Jing Qi 2019-07-15 02:46:54 UTC
Description of problem:

With a memoryBacking source of type='memfd', migrating the domain fails with the following error:
 
qemu-kvm: Unknown ramblock "/objects/ram-node0", cannot accept migration

Version-Release number of selected component (if applicable):

libvirt-5.0.0-11.module+el8.0.1+3459+e357ef2f.x86_64
qemu-kvm-3.1.0-27.module+el8.0.1+3253+c5371cb3.x86_64

How reproducible:
always

Steps to Reproduce:
1. Start a domain with the following XML:
  <maxMemory slots='8' unit='KiB'>8938496</maxMemory>
  <memory unit='KiB'>512000</memory>
  <currentMemory unit='KiB'>512000</currentMemory>
  <memoryBacking>
    <source type='memfd'/>
  </memoryBacking>
  ....
  <cpu>
    <numa>
      <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

2. virsh migrate avocado-vt-vm-440 --live qemu+ssh://10.73.224.48/system
root.224.48's password: 
error: internal error: qemu unexpectedly closed the monitor: 2019-07-15T02:45:29.335167Z qemu-kvm: Unknown ramblock "/objects/ram-node0", cannot accept migration
2019-07-15T02:45:29.335231Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2019-07-15T02:45:29.336059Z qemu-kvm: load of migration failed: Invalid argument


Actual results:
Migration failed.

Expected results:
Migration succeeds.

Additional info:

Comment 1 Jing Qi 2019-07-18 05:26:04 UTC
In the bug description, the target machine runs 8.1.0 AV, with:

libvirt-5.5.0-1.virtcov.el8.x86_64 and qemu-kvm-4.0.0-5.module+el8.1.0+3622+5812d9bf.x86_64.

Migrating the domain to a target machine running 8.0.1 AV succeeds.

Comment 2 Michal Privoznik 2020-03-10 14:32:00 UTC
Marc-Andre, is this even supported on the qemu side?

Comment 3 Marc-Andre Lureau 2020-03-10 16:05:59 UTC
Yes, it should work, but there was a complication related to ramblock naming; see upstream commit fa0cb34d2210cc749b9a70db99bb41c56ad20831 ("hostmem: use object id for memory region name with >= 4.0").

What is the machine version being used?

    <type arch='x86_64' machine='??'>hvm</type>

Are the qemu versions different on the source and destination?

Have you tried save & restore before migrating?

Comment 4 Marc-Andre Lureau 2020-03-10 16:08:50 UTC
(In reply to Jing Qi from comment #1)
> In the bug description, the target machine is  8.1.0 AV -
>  
> version : libvirt-5.5.0-1.virtcov.el8.x86_64  &
> qemu-kvm-4.0.0-5.module+el8.1.0+3622+5812d9bf.x86_64.
> 
> And tried to migrate the domain to target machine with 8.0.1 AV and the
> migration can succeed.

OK, so:

 qemu-kvm-3.1.0 -> qemu-kvm-3.1.0: OK
 qemu-kvm-3.1.0 -> qemu-kvm-4.0.0: fails

Let's make sure the machine version is correct. I suppose we might be missing compat properties for the RHEL machine types.

Comment 5 Jing Qi 2020-03-11 09:42:22 UTC
I tried both machine types, "pc-q35-rhel8.0.0" and "pc-i440fx-rhel7.6.0".

Comment 7 Jing Qi 2020-03-11 09:50:15 UTC
I tried a newer rhel8.1.0 AV qemu-kvm version on the target machine:
qemu-kvm-4.1.0-23.module+el8.1.1+5938+f5e53076.2.x86_64
 
The source qemu-kvm version:
qemu-kvm-3.1.0-30.module+el8.0.1+4607+7ea9baa9.2.x86_64

It reports a different error message:
virsh migrate avocado-vm1 qemu+ssh://10.73.224.176/system --live --verbose   
root.224.176's password: 
root.224.176's password: 
error: internal error: process exited while connecting to monitor: 2020-03-11T09:25:40.349373Z qemu-kvm: -object memory-backend-memfd,id=ram-node0,size=1073741824: invalid object type: memory-backend-memfd

Comment 9 Jing Qi 2020-03-11 12:16:21 UTC
Marc-Andre,
I have sent an email including the machines' info to you. 

For the second issue, do we need a separate bug?

Thanks,

Jing Qi

Comment 10 Marc-Andre Lureau 2020-03-11 14:06:27 UTC
(In reply to Jing Qi from comment #9)
> For the second issue, shall we need a bug for it?

Yes, please

Comment 11 Marc-Andre Lureau 2020-03-11 14:16:55 UTC
Hi Miroslav,

Since rhel-7.6 corresponds roughly to qemu-3.0, why doesn't pc_machine_rhel760_options() chain up to pc_i440fx_3_1_machine_options()? (There is no rhel8 i440fx machine type.)

FWIW, pc_q35_machine_rhel760_options() chains up to pc_q35_machine_rhel800_options(), which seems correct.

Comment 12 Marc-Andre Lureau 2020-03-11 14:24:46 UTC
In qemu-kvm-4.2 we have:

static void pc_machine_rhel760_options(MachineClass *m)
{
    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
    pc_machine_rhel7_options(m);
    m->desc = "RHEL 7.6.0 PC (i440FX + PIIX, 1996)";
    m->async_pf_vmexit_disable = true;
    m->smbus_no_migration_support = true;
    pcmc->pvh_enabled = false;
    pcmc->default_cpu_version = CPU_VERSION_LEGACY;
    compat_props_add(m->compat_props, hw_compat_rhel_8_1, hw_compat_rhel_8_1_len);
    compat_props_add(m->compat_props, pc_rhel_8_1_compat, pc_rhel_8_1_compat_len);
    compat_props_add(m->compat_props, hw_compat_rhel_8_0, hw_compat_rhel_8_0_len);
    compat_props_add(m->compat_props, pc_rhel_8_0_compat, pc_rhel_8_0_compat_len);
    compat_props_add(m->compat_props, hw_compat_rhel_7_6, hw_compat_rhel_7_6_len);
    compat_props_add(m->compat_props, pc_rhel_7_6_compat, pc_rhel_7_6_compat_len);
}

In qemu-kvm-4.1:

static void pc_machine_rhel760_options(MachineClass *m)
{
    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
    pc_machine_rhel7_options(m);
    m->desc = "RHEL 7.6.0 PC (i440FX + PIIX, 1996)";
    m->async_pf_vmexit_disable = true;
    m->smbus_no_migration_support = true;
    pcmc->pvh_enabled = false;
    pcmc->default_cpu_version = CPU_VERSION_LEGACY;
    compat_props_add(m->compat_props, hw_compat_rhel_8_0, hw_compat_rhel_8_0_len);
    compat_props_add(m->compat_props, pc_rhel_8_0_compat, pc_rhel_8_0_compat_len);
    compat_props_add(m->compat_props, hw_compat_rhel_7_6, hw_compat_rhel_7_6_len);
    compat_props_add(m->compat_props, pc_rhel_7_6_compat, pc_rhel_7_6_compat_len);
}

The rhel8 lines were added by commit 0784125ba3ccd72a590d210cf3f52d80e96b4263 ("x86 machine types: add pc-q35-rhel8.1.0").

I think qemu-kvm-4.0 needs similar lines.

Comment 13 Marc-Andre Lureau 2020-03-11 15:42:27 UTC
Even with:
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 37907fe76a..348228b329 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -1029,6 +1029,7 @@ static void pc_machine_rhel760_options(MachineClass *m)
     pc_machine_rhel7_options(m);
     m->desc = "RHEL 7.6.0 PC (i440FX + PIIX, 1996)";
     m->async_pf_vmexit_disable = true;
+    compat_props_add(m->compat_props, hw_compat_rhel_8_0, hw_compat_rhel_8_0_len);
     compat_props_add(m->compat_props, hw_compat_rhel_7_6, hw_compat_rhel_7_6_len);
     compat_props_add(m->compat_props, pc_rhel_7_6_compat, pc_rhel_7_6_compat_len);
 }

I get: "Missing section footer for 0000:00:01.3/piix4_pm"

Comment 14 Eduardo Habkost 2020-03-11 19:19:44 UTC
(In reply to Marc-Andre Lureau from comment #12)
> I think we need similar lines for qemu-kvm-4.0

I don't think we ever released a qemu-kvm-4.0 package to customers.

Comment 15 Peter Krempa 2020-03-12 07:03:33 UTC
So is this a qemu bug, then? Should we move it to qemu, or does it also require a libvirt clone?

Comment 16 Marc-Andre Lureau 2020-03-12 10:59:03 UTC
It's a qemu bug. I sent a patch to rhvirt: "[RHEL-AV-8.1.0 qemu-kvm PATCH] piix: chain up to rhel8.0 compat properties".

But apparently that version of qemu, rhel-av-8.1.0/master-4.0.0, isn't maintained.

Comment 17 Jing Qi 2020-03-16 09:03:49 UTC
Since qemu-4.1.0 is the released version for rhel-av-8.1.0 and the bug occurred with qemu-4.0.0, no extra patch is needed, right? Also, it was eventually decided that the memfd type is not supported in rhel-av-8.1.0 / qemu-4.1.0-23.

Comment 19 Jeff Nelson 2020-03-22 18:51:25 UTC
Clearing stale needinfo in comment 11.