Bug 728877

Summary: Cannot live migrate a 64-bit RHEL6.1 PV guest with 128G memory
Product: Red Hat Enterprise Linux 5
Component: xen
Version: 5.8
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WORKSFORME
Severity: low
Priority: low
Target Milestone: rc
Target Release: ---
Reporter: Yuyu Zhou <yuzhou>
Assignee: Xen Maintainance List <xen-maint>
QA Contact: Virtualization Bugs <virt-bugs>
Docs Contact:
CC: leiwang, lersek, qwan, xen-maint
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When live migrating a paravirtualized Xen guest with 128 GB of memory, it may be necessary to increase the Xen heap size on both source and target hypervisors at host boot time. On x86_64 platforms, the default Xen heap size is 16 MB; in order to triple it, pass the "xenheap_megabytes=48" option on the xen.gz command line in grub.conf.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-08-12 13:01:34 UTC
Type: ---
Attachments:
  xend log (source host)
  xend log (receive host)
  new src_xend.log

Description Yuyu Zhou 2011-08-08 09:11:44 UTC
Description of problem:
An error message is shown when live migrating a 64-bit RHEL6.1 PV guest with 128G memory on an Intel x7550 host.
Host memory size is 500G. The guest kernel already includes the patch from bug 716539, so 128G of guest memory is supported.

Version-Release number of selected component (if applicable):
Host: kernel-2.6.18-274.el5xen, xen-3.0.3-132.el5
Guest: kernel-2.6.32-131.12.1.el6

How reproducible:
100%

Steps to Reproduce:
1. Live migrate a 64-bit RHEL6.1 PV guest with 128G memory, i.e. memory=131072 in the guest configuration file (a minimal sketch of such a file follows).
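
For reference, a minimal sketch of what such a PV guest configuration file might look like (only the memory= line matters for this bug; the guest name and vcpus match the "xm list" output in comment 8, while the bootloader, disk and vif entries are illustrative placeholders):

    name       = "RHEL6.1-64-PV"
    memory     = 131072                   # 128G expressed in MiB
    vcpus      = 1
    bootloader = "/usr/bin/pygrub"                                    # illustrative
    disk       = [ "file:/var/lib/xen/images/rhel61-pv.img,xvda,w" ]  # illustrative path
    vif        = [ "bridge=xenbr0" ]                                  # illustrative bridge
    on_reboot  = "restart"
    on_crash   = "restart"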
  
Actual results:
 [root@intel-x7550-512-2 boot]# xm migrate 96 --live 10.66.83.210
 Error: /usr/lib64/xen/bin/xc_save 15 96 0 0 1 failed
 Usage: xm migrate <Domain> <Host>
 Migrate a domain to another machine.
 Options:
 -h, --help           Print this help.
 -l, --live           Use live migration.
 -p=portnum, --port=portnum
                      Use specified port for migration.
 -r=MBIT, --resource=MBIT
                      Set level of resource usage for migration.

Expected results:
1. live migration finished successfully

Additional info:
1. Live migration of a 64-bit RHEL6 PV guest with 120G memory: PASS.
2. Migration (without --live) of a 64-bit RHEL6 PV guest with 128G memory: PASS.
3. No similar test has been run on an AMD machine or on other Intel machines so far, so it is not yet known whether this issue exists on other platforms.

Comment 1 Yuyu Zhou 2011-08-08 09:12:28 UTC
Created attachment 517143 [details]
xend log(source host)

Comment 2 Yuyu Zhou 2011-08-08 09:13:00 UTC
Created attachment 517144 [details]
xend log(receive host)

Comment 3 Yuyu Zhou 2011-08-08 13:04:51 UTC
Similar phenomenon also exist in RHEL5.7 64bit PV guest.
1. live migrate 64bit RHEL5.7 PV guest with 120G memory PASS
2. live migrate 64bit RHEL6.7 PV guest with 128G memory FAIL

Comment 4 Yuyu Zhou 2011-08-08 13:05:50 UTC
(In reply to comment #3)
> Similar phenomenon also exist in RHEL5.7 64bit PV guest.
> 1. live migrate 64bit RHEL5.7 PV guest with 120G memory PASS
> 2. live migrate 64bit RHEL6.7 PV guest with 128G memory FAIL
Sorry for the typo; item 2 should read:

2. live migrate 64bit RHEL5.7 PV guest with 128G memory FAIL

Comment 5 Laszlo Ersek 2011-08-08 13:40:14 UTC
(In reply to comment #0)

> 1. Live migrate a 64bit RHEL6.1 PV guest with 128G memory, that is
> memory=131072 in configure file.

Can you please try with memory=131071 (= 128G - 1M) and memory=130048 (= 127G)? Thanks!

Comment 6 Yuyu Zhou 2011-08-09 03:16:26 UTC
(In reply to comment #5)
> (In reply to comment #0)
> 
> > 1. Live migrate a 64bit RHEL6.1 PV guest with 128G memory, that is
> > memory=131072 in configure file.
> 
> Can you please try with memory=131071 (= 128G - 1M) and memory=130048 (= 127G)?
> Thanks!

Hi, Laszlo

1. live migrate 64bit RHEL6.1 PV guest with memory=131071(=128G-1M) FAIL
2. live migrate 64bit RHEL6.1 PV guest with memory=130048(=127G) PASS

Yuyu

Comment 7 Laszlo Ersek 2011-08-09 09:18:54 UTC
(1) -- In reply to comment #0

>  [root@intel-x7550-512-2 boot]# xm migrate 96 --live 10.66.83.210
>  Error: /usr/lib64/xen/bin/xc_save 15 96 0 0 1 failed

I checked the two xend logs (attachment 517143 [details] and attachment 517144 [details]), but I can't see anything related to domain 96.

Of course on the target host the domain ID would be different; can you please check the highest domid on the target before starting the migration, so that the xend messages can be identified?

BTW the meaning of parameters is:

io_fd = 15 -- some comm channel I guess to xend, or the other migration party,
              or something else
domid = 96 -- domain to migrate
maxit =  0 -- defaults to DEF_MAX_ITERS (--> 1 + 29), max number of loops
              around the guest, chasing the mutating frames
max_f =  0 -- defaults to DEF_MAX_FACTOR (--> 3); data to be sent seems to
              be limited by guest RAM size times this factor
flags =  1 -- live migration

I think we can exclude these parameters as problem sources.

(2) Is the error immediately printed, or more like near the end of the migration?

(3) Please try the 128GB live migration with the xen package from here:

https://brewweb.devel.redhat.com/taskinfo?taskID=3543609

This sets the DEBUG bit in the "flags" argument mentioned above, and so the migration should produce many debug entries in xend.log.

Thanks.

Comment 8 Yuyu Zhou 2011-08-09 10:10:59 UTC
(In reply to comment #7)
> (1) -- In reply to comment #0
> 
> >  [root@intel-x7550-512-2 boot]# xm migrate 96 --live 10.66.83.210
> >  Error: /usr/lib64/xen/bin/xc_save 15 96 0 0 1 failed
> 
> I checked the two xend logs (attachment 517143 [details] and attachment 517144 [details]), but I
> can't see anything related to domain 96.
> 
> Of course on the target host the domain ID would be different; can you please
> check the highest domid on the target before starting the migration, so that
> the xend messages can be identified?
> 
> BTW the meaning of parameters is:
> 
> io_fd = 15 -- some comm channel I guess to xend, or the other migration party,
>               or something else
> domid = 96 -- domain to migrate
> maxit =  0 -- defaults to DEF_MAX_ITERS (--> 1 + 29), max number of loops
>               around the guest, chasing the mutating frames
> max_f =  0 -- defaults to DEF_MAX_FACTOR (--> 3); data to be sent seems to
>               be limited by guest RAM size times this factor
> flags =  1 -- live migration
> 
> I think we can exclude these parameters as problem sources.
> 
I didn't record the log for domain 96; I repeated the experiment again later, which is why you don't see anything related to domain 96.
This time I used the following command to record the log during the experiment, but nothing was recorded on the target host. The source host xend log is attached.
#tail -n0 -f /var/log/xen/xend.log | tee log 

[root@intel-x7550-512-2 boot]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0    32768    32 r-----    193.2
RHEL6.1-64-PV                              1   131072     1 r-----      2.3
[root@intel-x7550-512-2 boot]# xm migrate 1 10.66.82.210 --live
Error: can't connect: No route to host
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-r=MBIT, --resource=MBIT
                     Set level of resource usage for migration.



> (2) Is the error immediately printed, or more like near the end of the
> migration?
> 
Immediately printed.

> (3) Please try the 128GB live migration with the xen package from here:
> 
> https://brewweb.devel.redhat.com/taskinfo?taskID=3543609
> 
> This sets the DEBUG bit in the "flags" argument mentioned above, and so the
> migration should produce many debug entries in xend.log.
> 
> Thanks.

I used this package this time.
BTW, if you need to check, you can use intel-x7550-512-1.englab.nay.redhat.com and intel-x7550-512-2.englab.nay.redhat.com.
I have reserved these two machines until the end of this Wednesday.
You can create the guest from /data/exp/boot/pv.cfg on intel-x7550-512-2.

Comment 9 Yuyu Zhou 2011-08-09 10:11:29 UTC
Created attachment 517375 [details]
new src_xend.log

Comment 10 Laszlo Ersek 2011-08-09 14:07:40 UTC
I reproduced the problem. Attachment 517143 [details] actually contains the userspace error message:

        Internal error: Couldn't enable shadow mode

Also, the hypervisor logs this entry twice:

        (XEN) Cannot handle page request order 11!

The userspace code printing the message is xc_domain_save() [tools/libxc/xc_domain_save.c]:

    /* Domain is still running at this point */
    if ( live )
    {
        /* Live suspend. Enable log-dirty mode. */
        if ( xc_shadow_control(xc_handle, dom,
                               XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
                               NULL, 0, NULL, 0, NULL) < 0 )
        {
            /* log-dirty already enabled? There's no test op,
               so attempt to disable then reenable it */
            if ( !(xc_shadow_control(xc_handle, dom, XEN_DOMCTL_SHADOW_OP_OFF,
                                     NULL, 0, NULL, 0, NULL) >= 0 &&
                   xc_shadow_control(xc_handle, dom,
                                     XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
                                     NULL, 0, NULL, 0, NULL) >= 0) )
            {
                ERROR("Couldn't enable shadow mode");
                goto out;
            }
        }

We can see two (failed) XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY requests, and I think those correspond to the two hypervisor log entries.

In the hypervisor:

paging_log_dirty_enable() [arch/x86/mm/paging.c]
-> paging_alloc_log_dirty_bitmap() 
  -> xmalloc_array() [include/xen/xmalloc.h]
    -> _xmalloc_array()
      -> _xmalloc() [common/xmalloc.c]
        -> xmalloc_whole_pages() 
          -> alloc_xenheap_pages() [common/page_alloc.c]
            -> alloc_heap_pages() -- fails
            -> prints "Cannot handle page request order 11!"

paging_alloc_log_dirty_bitmap() tries to allocate a bitmap in which a single bit records whether the corresponding guest frame has been dirtied since it was last copied during the migration. The bitmap is composed of longs. The number of guest frames is rounded up to the next multiple of 64 (64 bits/long), and then divided by 64 to get the number of longs in the array.

guest memory          :      128 GB = 2**37 bytes
frame size            :        4 KB = 2**12 bytes
number of guest frames: ~32 million = 2**25
bits per long         :          64 = 2** 6
needed longs          :               2**19
bytes per long        :           8 = 2** 3
needed bytes          :        4 MB = 2**22 (_xmalloc_array())
needed xenheap pages  :               2**10 (xmalloc_whole_pages(),
                                             get_order_from_bytes())

The "order" in this case is "10".

However, when we specify "memory=131072" in the vm config file, there's probably some overhead, and the number of guest frames is a tiny bit higher than what 4 MB in the xen heap could cover. See Mirek's testing in

http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-July/msg00904.html

which seems to be consistent with comment 6 -- for the 128 GB guest, we have 256 MB overhead.

get_order_from_bytes() works in whole powers of two, so what we actually try to allocate for a 128 GB (+ 256 MB, see above) guest's live migration is 8 MB of contiguous xenheap.
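
To make the arithmetic above concrete, here is a small standalone sketch (plain C, not Xen code; get_order() below merely mimics what get_order_from_bytes() does, i.e. round the request up to a power-of-two number of 4 KiB pages, and the 256 MB overhead figure is the one from Mirek's testing above):

    #include <stdio.h>

    #define PAGE_SHIFT    12UL
    #define PAGE_SIZE     (1UL << PAGE_SHIFT)
    #define BITS_PER_LONG 64UL

    /* smallest order such that (PAGE_SIZE << order) covers "bytes" */
    static unsigned int get_order(unsigned long bytes)
    {
        unsigned long pages = (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
        unsigned int order = 0;

        while ((1UL << order) < pages)
            order++;
        return order;
    }

    static void report(const char *label, unsigned long guest_mib)
    {
        unsigned long frames = guest_mib << (20 - PAGE_SHIFT);   /* 4 KiB guest frames */
        unsigned long longs  = (frames + BITS_PER_LONG - 1) / BITS_PER_LONG;
        unsigned long bytes  = longs * sizeof(unsigned long);    /* dirty bitmap size */

        printf("%-28s frames=%lu bitmap=%lu bytes order=%u\n",
               label, frames, bytes, get_order(bytes));
    }

    int main(void)
    {
        report("128 GB exactly:", 131072);            /* 4 MB bitmap -> order 10 */
        report("128 GB + 256 MB overhead:", 131328);  /* just over 4 MB -> order 11,
                                                         i.e. an 8 MB xenheap request */
        return 0;
    }

The second case produces the order-11 request that the hypervisor refuses.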

I'd have to look at the xen boot log on this machine (not yet available in "xm dmesg"), but "opt_xenheap_megabytes" seems to default to XENHEAP_DEFAULT_MB -- 16 megabytes on x86_64. I can imagine that the default is not enough, or just too fragmented, to allocate 8 MB for the dirty bitmap.

I suggest retrying the test with "xenheap_megabytes=48" on the Xen command line (triple the default); a sketch of the corresponding grub.conf entry follows.
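
For reference, on a RHEL-5 host this means appending the option to the xen.gz line of the hypervisor boot entry in /boot/grub/grub.conf, roughly as sketched below (the file names match the -274.el5 kernel from comment 0; the root device is an illustrative placeholder, and any hypervisor options already present on the kernel line should simply be kept):

    title Red Hat Enterprise Linux Server (2.6.18-274.el5xen)
            root (hd0,0)
            kernel /xen.gz-2.6.18-274.el5 xenheap_megabytes=48
            module /vmlinuz-2.6.18-274.el5xen ro root=/dev/VolGroup00/LogVol00
            module /initrd-2.6.18-274.el5xen.img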

Comment 11 Laszlo Ersek 2011-08-11 12:51:33 UTC
When passing "xenheap_megabytes=48" to both hypervisors, the migration starts, but it never finishes.

Source xend.log is full of messages like these:

[2011-08-11 08:46:02 xend 17699] INFO (XendCheckpoint:498) 6 pfn= 00ae95fa
                                 mfn= 07128d8b 0  [mfn]= 5555555555555555
[2011-08-11 08:46:02 xend 17699] INFO (XendCheckpoint:498) 6 pfn= 00b351fa
                                 mfn= 070dd18b 0  [mfn]= ffffffffffffffff

Source hypervisor log (though this is for a different domain, dom2 -- the domain being migrated currently is dom5):

(XEN) mm.c:654:d0 Error getting mfn 5dbe0f1 (pfn 1454294) from L1 entry 8000005dbe0f1425 for dom2


Target xend.log:

[2011-08-11 10:16:15 xend 8688] DEBUG (XendCheckpoint:317) restore:shadow=0x0,
                               _static_max=0x20000, _static_min=0x20000, 
[2011-08-11 10:16:15 xend 8688] DEBUG (balloon:146) Balloon: 460306172 KiB
                                free; need 134217728; done.
[2011-08-11 10:16:15 xend 8688] DEBUG (XendCheckpoint:342) [xc_restore]:
                                /usr/lib64/xen/bin/xc_restore 23 3 1 2 0 0 0
[2011-08-11 10:16:15 xend 8688] INFO (XendCheckpoint:498) xc_domain_restore
                                start: p2m_size = 2000000
[2011-08-11 10:16:16 xend 8688] INFO (XendCheckpoint:498) Reloading memory
                                pages:   0%
[2011-08-11 10:39:40 xend 8688] INFO (XendCheckpoint:498) ERROR Internal error:
                                Error when reading batch size
[2011-08-11 10:39:41 xend 8688] INFO (XendCheckpoint:498) Restore exit with
                                rc=1

Comment 12 Laszlo Ersek 2011-08-12 13:01:34 UTC
I reinstalled the official -132 xen packages on both machines, plus the debuginfo packages. I've been watching both xc_save and xc_restore progress in gdb, and they completed successfully. The full live migration completed successfully.

(In reply to comment #11)
> When passing "xenheap_megabytes=48" to both hypervisors

"xenheap_megabytes=48" was again passed to both hypervisors.

> the migration starts, but it never finishes.

I realized that enabling "debug mode" (see comment 7) actually changes the migration logic. It's not only about debug logging.

> Source xend.log is full with such messages:
> 
> [2011-08-11 08:46:02 xend 17699] INFO (XendCheckpoint:498) 6 pfn= 00ae95fa
>                                  mfn= 07128d8b 0  [mfn]= 5555555555555555
> [2011-08-11 08:46:02 xend 17699] INFO (XendCheckpoint:498) 6 pfn= 00b351fa
>                                  mfn= 070dd18b 0  [mfn]= ffffffffffffffff

The code formatting the debug message is broken.

                    if ( !hvm && is_mapped(pfn_to_mfn(n)) )
                        DPRINTF("  [mfn]= %08lx",
                                mfn_to_pfn(pfn_to_mfn(n)&0xFFFFF));

The mask doesn't make much sense to me.
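
(0xFFFFF keeps only the low 20 bits of the MFN, i.e. restricts it to the first 2^20 frames = 4 GB of host memory, so on this ~512 GB host the M2P lookup is done for an unrelated machine frame -- which would explain the garbage-looking "[mfn]=" values above. My guess is that the mask was meant to be applied after the translation, as in "mfn_to_pfn(pfn_to_mfn(n)) & 0xFFFFF", but that is speculation.)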

> Target xend.log:
> 
> [2011-08-11 10:39:40 xend 8688] INFO (XendCheckpoint:498) ERROR Internal
>                                 error: Error when reading batch size

This message points to a networking problem.

----o----

Finally, I live-migrated the guest back to the original host successfully (didn't attach gdb this time to the utilities).

Adding technical note and closing as WORKSFORME. Please retest and reopen if necessary.

Comment 13 Laszlo Ersek 2011-08-12 13:01:34 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When live migrating a paravirtualized Xen guest with 128 GB of memory, it may be necessary to increase the Xen heap size on both source and target hypervisors at host boot time. On x86_64 platforms, the default Xen heap size is 16 MB; in order to triple it, pass the "xenheap_megabytes=48" option on the xen.gz command line in grub.conf.