Bug 618317

Summary: RFE: RHEL5 Xen: support online dynamic resize of guest virtual disks
Product: Red Hat Enterprise Linux 5
Component: kernel-xen
Version: 5.6
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Target Milestone: rc
Keywords: FutureFeature
Reporter: Pasi Karkkainen <pasik>
Assignee: Laszlo Ersek <lersek>
QA Contact: Virtualization Bugs <virt-bugs>
CC: bmr, drjones, jentrena, jzheng, leiwang, lersek, mrezanin, pbonzini, qwan, xen-maint, yufang521247, yuzhang, yuzhou
Fixed In Version: kernel-2.6.18-284.el5
Doc Type: Enhancement
Last Closed: 2012-02-21 03:28:15 UTC
Bug Blocks: 514490, 648851    
Attachments (all flags: none):
  move definition of struct backend_info around
  vbd resizing
  revalidate virtual disk in blkfront after size change
  fix xenbus xact start deadlock by removing double xact end
  vbd resizing + revalidate vbd + xenbus xact deadlock removed
  more informative dmesg entries after resizing
  online dynamic resize of guest virtual disks

Description Pasi Karkkainen 2010-07-26 16:27:07 UTC
Description of problem:
RHEL5 Xen doesn't support online resizing of guest virtual disks (VBDs).
Upstream xen.org linux-2.6.18-xen contains patches to enable this support.

Version-Release number of selected component (if applicable):
RHEL5.5 kernel-xen

How reproducible:
Always.

Steps to Reproduce:
1. Create a new RHEL5 Xen PV domU using LVM-backed virtual disks.
2. Online resize the LVM volume from dom0 while domU is running.

Actual results:
The xvd/VBD device doesn't get resized in the domU; the domU doesn't see the device size change.

Expected results:
The domU should see the new, changed disk size.

Additional info:
Upstream patch here: http://xenbits.xen.org/linux-2.6.18-xen.hg?rev/f7f420bd7b7a
The Xen developers are also pushing the blkfront resizing patches to the upstream Linux kernel at the moment.

RHEL5 kernels currently do support SCSI online resizing, for example with software iSCSI.

Comment 1 Pasi Karkkainen 2010-11-19 09:08:15 UTC
Support for online dynamic resize of Xen PV domU disks was added to the xen-blkfront driver in upstream kernel.org Linux 2.6.36.

Related RHEL6 RFE: https://bugzilla.redhat.com/show_bug.cgi?id=654982

Comment 2 Paolo Bonzini 2011-01-18 12:46:57 UTC
Simple patch, backport should be possible.

Comment 3 Laszlo Ersek 2011-05-11 13:49:26 UTC
Reproducing the problem (saving it here also in order to help QE):

dom0 running x86_64 -259

Create new LV (PE size is 32 MB in VolGroup0):

  lvcreate --extents=1 --name=bz618317 VolGroup0

Attach to guest (running x86_64 -260):

  xm block-attach rhel56-64bit-pv phy:/dev/mapper/VolGroup0-bz618317 xvdb w

Guest:

  fdisk -ul /dev/xvdb

    255 heads, 63 sectors/track, 4 cylinders, total 65536 sectors

Resize LV in dom0:

  lvresize --extents=+1 /dev/VolGroup0/bz618317

    Extending logical volume bz618317 to 64.00 MB

Repeat fdisk check in guest:

  fdisk -ul /dev/xvdb

    255 heads, 63 sectors/track, 4 cylinders, total 65536 sectors

(In reply to comment #0)
> Upstream patch here:

(Upstream changed their URL scheme.) linux-2.6.18-xen.hg c/s 1005:f7f420bd7b7a:
http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/f7f420bd7b7a

Comment 4 Laszlo Ersek 2011-05-11 14:02:12 UTC
Also backport 1006:13e25228ce40 (http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/13e25228ce40), which moves the definition of struct backend_info from xenbus.c to common.h -- the new vbd_resize() function in vbd.c needs the complete type in order to dereference (blkif->be).

Comment 5 Laszlo Ersek 2011-05-11 14:32:38 UTC
Also backport some include changes from xen-unstable c/s 12333:4eaadb2ae198: now that common.h defines struct backend_info, client code needs to know struct xenbus_watch as well, so move the inclusion of xenbus.h from the individual C files into common.h.
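
For context, a minimal sketch of the resulting layout (member names are approximate, taken from blkback's backend code; an illustration, not the exact patch):

    /* common.h -- sketch: include xenbus.h here so that both
     * struct xenbus_watch and the complete struct backend_info
     * are visible to every blkback C file, letting vbd.c
     * dereference blkif->be. */
    #include <xen/xenbus.h>

    struct backend_info
    {
        struct xenbus_device *dev;
        blkif_t *blkif;
        struct xenbus_watch backend_watch;
        unsigned major;
        unsigned minor;
        char *mode;
    };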

Comment 6 Laszlo Ersek 2011-05-11 16:48:33 UTC
Created attachment 498339 [details]
move definition of struct backend_info around

Backport of linux-2.6.18-xen.hg c/s 1006:13e25228ce40.
Plus some #include massaging according to xen-unstable c/s 12333:4eaadb2ae198.

Comment 7 Laszlo Ersek 2011-05-11 16:49:30 UTC
Created attachment 498340 [details]
vbd resizing

Backport of linux-2.6.18-xen.hg c/s 1005:f7f420bd7b7a.

Comment 8 Laszlo Ersek 2011-05-11 18:11:28 UTC
Brew build: https://brewweb.devel.redhat.com/taskinfo?taskID=3316497

I installed the kernel-xen package in both dom0 and domU, then increased the LV size twice, and checked with fdisk in the domU each time (see comment 3).

The first check went okay, dom0 logged "VBD Resize: new size 131072", fdisk displayed the increased size, and the domU logged "Setting capacity to 131072".

However, after the second lvresize command returned in dom0 successfully (dmesg: "VBD Resize: new size 196608") and I issued "fdisk -ul /dev/xvdb" in the domU, the fdisk command hung.

dom0 message:

INFO: task blkback.1.xvdb:4875 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blkback.1.xvd D ffff88019bed37a0     0  4875     19                4671 (L-TLB)
 ffff88019bec3d90  0000000000000246  ffffffff880756b7  ffffffff880bcc4a 
 000000000000000a  ffff88019bed37a0  ffff8801da7590c0  0000000000007e8e 
 ffff88019bed3988  0000000000000015 
Call Trace:
 [<ffffffff880756b7>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff880bcc4a>] :libata:ata_scsi_rw_xlat+0x0/0x188
 [<ffffffff8028df64>] printk+0x52/0xc6
 [<ffffffff802634a3>] __down_read+0x82/0x9a
 [<ffffffff803bd22e>] xenbus_transaction_start+0x15/0x62
 [<ffffffff888f4e5b>] :blkbk:vbd_resize+0x7e/0x12f
 [<ffffffff888f3a46>] :blkbk:blkif_schedule+0x6f/0x4c9
 [<ffffffff888f39d7>] :blkbk:blkif_schedule+0x0/0x4c9
 [<ffffffff8029d046>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802339a3>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff8029d046>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802338a5>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

domU message:

INFO: task fdisk:2467 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fdisk         D 000000a7d1bc1dc1     0  2467   2421                     (NOTLB)
 ffff88002ec99c08  0000000000000282  ffff88002f3ca040  0000000000000000 
 0000000000000008  ffff88003f389080  ffffffff804feb80  0000000000007675 
 ffff88003f389268  ffffffff80263909 
Call Trace:
 [<ffffffff80263909>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8022947b>] sync_page+0x0/0x43
 [<ffffffff8022947b>] sync_page+0x0/0x43
 [<ffffffff802625e5>] io_schedule+0x3f/0x67
 [<ffffffff802294b9>] sync_page+0x3e/0x43
 [<ffffffff80262729>] __wait_on_bit_lock+0x36/0x66
 [<ffffffff8024164e>] __lock_page+0x5e/0x64
 [<ffffffff8029d28c>] wake_bit_function+0x0/0x23
 [<ffffffff8020ca5c>] do_generic_mapping_read+0x1de/0x391
 [<ffffffff8020d8fd>] file_read_actor+0x0/0x101
 [<ffffffff8020cd5b>] __generic_file_aio_read+0x14c/0x198
 [<ffffffff802c02ad>] generic_file_read+0xac/0xc5
 [<ffffffff8031e886>] inode_has_perm+0x56/0x63
 [<ffffffff8029d25e>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80243d20>] do_ioctl+0x21/0x6b
 [<ffffffff8032140b>] selinux_file_permission+0x9f/0xb4
 [<ffffffff8020bd4f>] vfs_read+0xcb/0x171
 [<ffffffff802126ef>] sys_read+0x45/0x6e
 [<ffffffff8025f2f9>] tracesys+0xab/0xb6

The domU side waits for the dom0 side. The dom0 side is blocked on a semaphore operation in xenbus_transaction_start(). This looks like a deadlock. (The printk() frame shown within __down_read() appears to be a garbled stack entry.)

The upstream version's context has try_to_freeze(). Related changesets:

http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/c09686d2bbff
http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/cb50d25a9468

I have no idea if they would fix the deadlock. I assume not; according to Documentation/power/kernel_threads.txt, this only matters when the system (dom0) is suspended.

... Okay, after some digging, this is my understanding:

The original patch was submitted by Ky Srinivasan on 09 Mar 2010 [1]. On 16 Mar 2010, Joost Roeleveld reported the same problem as described above [2]; he also hit it on the second resizing attempt. On 18 Mar 2010, Ky resubmitted the corrected patch, first as a full patch [3] and then as a delta [4]. The difference is that blkfront now also calls revalidate_disk() on the resized disk.

Later in the thread it was discussed that revalidate_disk() is not present in linux-2.6.18-xen.hg. As Pasi pointed out [5] [6], RHEL-5 luckily does have revalidate_disk(). (See commits 39bf2c01 and 56d76e5a, and bug 444964.)

The patch fixed the problem for Joost [7].

So it seems we can't rely purely on upstream here, because they still need to port the RHEL-5 revalidate_disk() into their tree (as of 1080:c896d26c6b7c). I will add the revalidate_disk() call to blkfront and retest.

[1] http://lists.xensource.com/archives/html/xen-devel/2010-03/msg00467.html
[2] http://lists.xensource.com/archives/html/xen-devel/2010-03/msg00930.html
[3] http://lists.xensource.com/archives/html/xen-devel/2010-03/msg01047.html
[4] http://lists.xensource.com/archives/html/xen-devel/2010-03/msg01049.html
[5] http://lists.xensource.com/archives/html/xen-devel/2010-03/msg01159.html
[6] http://lists.xensource.com/archives/html/xen-devel/2010-03/msg01201.html
[7] http://lists.xensource.com/archives/html/xen-devel/2010-04/msg00097.html

Comment 9 Laszlo Ersek 2011-05-11 18:16:59 UTC
(In reply to comment #8)
> Joost Roeleveld reported the same problem as described above [2]. He also
> faced the problem on the second resizing attempt.

Lack of the revalidate_disk() call probably leaves the system in such a state that further resize attempts won't work.

Comment 10 Laszlo Ersek 2011-05-11 18:31:53 UTC
Created attachment 498371 [details]
revalidate virtual disk in blkfront after size change

Taken from
http://lists.xensource.com/archives/html/xen-devel/2010-03/msg01049.html
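
The essence of the frontend change, as a sketch (simplified from the posting above; error handling and surrounding context omitted):

    /* inside connect() in drivers/xen/blkfront/blkfront.c -- sketch:
     * the backend announces the new size in the xenstore node
     * "sectors"; if we are already connected, this is a resize. */
    unsigned long long sectors;
    int err;

    err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
                       "sectors", "%Lu", &sectors);
    if (XENBUS_EXIST_ERR(err))
        return;

    if (info->connected == BLKIF_STATE_CONNECTED) {
        printk(KERN_INFO "Setting capacity to %Lu\n", sectors);
        set_capacity(info->gd, sectors);  /* resize the gendisk */
        revalidate_disk(info->gd);        /* the added call: lets the
                                             block device layer notice */
        return;
    }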

Comment 11 Laszlo Ersek 2011-05-12 08:35:16 UTC
Brew build: https://brewweb.devel.redhat.com/taskinfo?taskID=3317051

The 2nd resize attempt hung with the same symptoms as in comment 8.

I notice that if vbd_resize() reaches the call to xenbus_transaction_end(), then for any return value except -EAGAIN it will call xenbus_transaction_end() twice in a row. If the first call was successful (the commit succeeded), then this is definitely wrong.

xenbus_transaction_end() calls

    up_read(&xs_state.suspend_mutex);

unconditionally, which is the other half of

    down_read(&xs_state.suspend_mutex);

called by xenbus_transaction_start(). vbd_resize() messes up this mutex by calling xenbus_transaction_end() twice in a row, and that may be why blkback hangs in xenbus_transaction_start() / __down_read() on the second try.

I'll try to fix this and if it works, we'll have to submit it to upstream.
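
For illustration, the balanced pattern looks like this (sketch; the "sectors" write stands in for whatever the real vbd_resize() updates):

    struct xenbus_transaction xbt;
    int err;

    again:
    err = xenbus_transaction_start(&xbt);  /* down_read(&xs_state.suspend_mutex) */
    if (err)
        return;

    err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", new_size);
    if (err) {
        xenbus_transaction_end(xbt, 1);    /* abort: one matching up_read() */
        return;
    }

    err = xenbus_transaction_end(xbt, 0);  /* commit: one matching up_read() */
    if (err == -EAGAIN)
        goto again;
    /* No further xenbus_transaction_end() here: a second call would
     * up_read() the suspend_mutex once too often and wedge the next
     * xenbus_transaction_start(). */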

Comment 12 Laszlo Ersek 2011-05-12 09:13:02 UTC
Created attachment 498486 [details]
fix xenbus xact start deadlock by removing double xact end

Comment 13 Laszlo Ersek 2011-05-12 10:59:24 UTC
With this patch, everything worked fine across multiple increases and decreases.

host & guest:

  2.6.18-260.el5.vbd_resize_bz618317_4.local.xen

(Brew build completed in the meantime:
https://brewweb.devel.redhat.com/taskinfo?taskID=3318970)

host dmesg:

  VBD Resize: new size 131072
  VBD Resize: new size 196608
  VBD Resize: new size 262144
  VBD Resize: new size 196608
  VBD Resize: new size 131072
  VBD Resize: new size 65536

guest fdisk:

  255 heads, 63 sectors/track, 4 cylinders, total 65536 sectors
  255 heads, 63 sectors/track, 8 cylinders, total 131072 sectors
  255 heads, 63 sectors/track, 12 cylinders, total 196608 sectors
  255 heads, 63 sectors/track, 16 cylinders, total 262144 sectors
  255 heads, 63 sectors/track, 12 cylinders, total 196608 sectors
  255 heads, 63 sectors/track, 8 cylinders, total 131072 sectors
  255 heads, 63 sectors/track, 4 cylinders, total 65536 sectors

guest dmesg:

  Setting capacity to 131072
  Setting capacity to 196608
  Setting capacity to 262144

  end_request: I/O error, dev xvdb, sector 262136
  Buffer I/O error on device xvdb, logical block 32767
  Setting capacity to 196608
  end_request: I/O error, dev xvdb, sector 262136
  Buffer I/O error on device xvdb, logical block 32767
  end_request: I/O error, dev xvdb, sector 196600
  Buffer I/O error on device xvdb, logical block 24575
  Setting capacity to 131072
  end_request: I/O error, dev xvdb, sector 196600
  Buffer I/O error on device xvdb, logical block 24575
  end_request: I/O error, dev xvdb, sector 131064
  Buffer I/O error on device xvdb, logical block 16383
  Setting capacity to 65536
  end_request: I/O error, dev xvdb, sector 131064
  Buffer I/O error on device xvdb, logical block 16383

The guest dmesg errors are the consequence of the decrease steps. Upstream knows about them and they seem to be harmless:

http://lists.xensource.com/archives/html/xen-devel/2010-03/msg00752.html

Mailed patch in attachment 498486 [details] to upstream:
http://lists.xensource.com/archives/html/xen-devel/2011-05/msg00670.html

Comment 14 Pasi Karkkainen 2011-05-12 11:44:07 UTC
Great, thanks!

Could you please add more debugging information to the log entries, i.e. which VBD was resized (on the host), and for which VM?

And in the guest: "Setting capacity to 131072" - what unit is that? Sectors? And which block device (xvdX) was it?

I *think* there was an earlier patch that adds more logging; see the xen-devel mailing list archives. I think it was sent by Ky Srinivasan.

Comment 16 Laszlo Ersek 2011-05-12 13:38:23 UTC
(In reply to comment #14)

> Could you please add more debugging information to the log entries, i.e.
> which VBD was resized (on the host), and for which VM?
> 
> And in the guest: "Setting capacity to 131072" - what unit is that? Sectors?
> And which block device (xvdX) was it?
> 
> I *think* there was an earlier patch that adds more logging; see the
> xen-devel mailing list archives. I think it was sent by Ky Srinivasan.

http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01588.html

Comment 17 Laszlo Ersek 2011-05-12 15:01:03 UTC
Created attachment 498564 [details]
vbd resizing + revalidate vbd + xenbus xact deadlock removed

Comment 18 Laszlo Ersek 2011-05-12 15:01:51 UTC
Created attachment 498565 [details]
more informative dmesg entries after resizing

Comment 19 Laszlo Ersek 2011-05-12 15:12:25 UTC
both host & guest:

2.6.18-260.el5.vbd_resize_bz618317_5.local.xen

host dmesg:

  VBD Resize: Domid: 2, Device: (253, 4), New Size: 131072 sectors
  VBD Resize: Domid: 2, Device: (253, 4), New Size: 196608 sectors
  VBD Resize: Domid: 2, Device: (253, 4), New Size: 262144 sectors
  VBD Resize: Domid: 2, Device: (253, 4), New Size: 196608 sectors
  VBD Resize: Domid: 2, Device: (253, 4), New Size: 131072 sectors
  VBD Resize: Domid: 2, Device: (253, 4), New Size: 65536 sectors

[root@lacos-workstation ~]# ls -l /dev/mapper/VolGroup0-bz618317 
brw-rw---- 1 root disk 253, 4 May 12 16:42 /dev/mapper/VolGroup0-bz618317

guest fdisk:

  255 heads, 63 sectors/track, 8 cylinders, total 131072 sectors
  255 heads, 63 sectors/track, 12 cylinders, total 196608 sectors
  255 heads, 63 sectors/track, 16 cylinders, total 262144 sectors
  255 heads, 63 sectors/track, 12 cylinders, total 196608 sectors
  255 heads, 63 sectors/track, 8 cylinders, total 131072 sectors
  255 heads, 63 sectors/track, 4 cylinders, total 65536 sectors

guest dmesg:

  Changing capacity of (202, 16) to 131072 sectors
  Changing capacity of (202, 16) to 196608 sectors
  Changing capacity of (202, 16) to 262144 sectors
  end_request: I/O error, dev xvdb, sector 262136
  Buffer I/O error on device xvdb, logical block 32767
  Changing capacity of (202, 16) to 196608 sectors
  end_request: I/O error, dev xvdb, sector 262136
  Buffer I/O error on device xvdb, logical block 32767
  end_request: I/O error, dev xvdb, sector 196600
  Buffer I/O error on device xvdb, logical block 24575
  Changing capacity of (202, 16) to 131072 sectors
  end_request: I/O error, dev xvdb, sector 196600
  Buffer I/O error on device xvdb, logical block 24575
  end_request: I/O error, dev xvdb, sector 131064
  Buffer I/O error on device xvdb, logical block 16383
  Changing capacity of (202, 16) to 65536 sectors
  end_request: I/O error, dev xvdb, sector 131064
  Buffer I/O error on device xvdb, logical block 16383

without errors:

  Changing capacity of (202, 16) to 131072 sectors
  Changing capacity of (202, 16) to 196608 sectors
  Changing capacity of (202, 16) to 262144 sectors
  Changing capacity of (202, 16) to 196608 sectors
  Changing capacity of (202, 16) to 131072 sectors
  Changing capacity of (202, 16) to 65536 sectors

[root@localhost ~]# ls -l /dev/xvdb
brw-r----- 1 root disk 202, 16 May 12 10:53 /dev/xvdb

Comment 23 Laszlo Ersek 2011-05-25 12:08:52 UTC
Created attachment 500795 [details]
online dynamic resize of guest virtual disks

(4) blkback: don't call vbd_size() if bd_disk is NULL
    http://lists.xensource.com/archives/html/xen-devel/2011-05/msg01710.html
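
A sketch of item (4), the NULL guard (condition and member names approximate; see the posting above for the real patch):

    /* in blkif_schedule() in blkback -- sketch: after a hot-unplug the
     * backing bdev can lose its gendisk, so don't probe the size then. */
    if (blkif->vbd.bdev->bd_disk != NULL &&
        vbd_size(&blkif->vbd) != blkif->vbd.size)
        vbd_resize(blkif);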

Comment 24 Pasi Karkkainen 2011-05-25 13:49:41 UTC
Laszlo: Are you planning to add the online resize patches also to the RHEL6 kernel? It's tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=654982 .

Comment 25 Laszlo Ersek 2011-05-25 15:52:01 UTC
(In reply to comment #24)
> Laszlo: Are you planning to add the online resize patches also to the RHEL6
> kernel?

Yes, at some point one of us will do it.

Comment 28 Jarod Wilson 2011-09-02 15:38:00 UTC
Patch(es) available in kernel-2.6.18-284.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 30 Qixiang Wan 2011-12-07 08:44:25 UTC
Verified with 
Host: kernel-xen-2.6.18-300.el5
Guests: RHEL5.7 PV guests with kernel-xen-2.6.18-300.el5 (both i386 and x86_64)

test steps (following the reproducer in comment 3):

in host (PE Size = 4096K):

# xm block-attach rhel5x32pv phy:/dev/VolGroup0/testlv xvdb w
# lvresize --extents=+1 /dev/VolGroup0/testlv 
# lvresize --extents=+1 /dev/VolGroup0/testlv 
# lvresize --extents=+1 /dev/VolGroup0/testlv 
# lvresize --extents=+1024 /dev/VolGroup0/testlv 
# lvresize --extents=-512 /dev/VolGroup0/testlv 

host dmesg:

VBD Resize: Domid: 4, Device: (253, 0), New Size: 16384 sectors
VBD Resize: Domid: 4, Device: (253, 0), New Size: 24576 sectors
VBD Resize: Domid: 4, Device: (253, 0), New Size: 32768 sectors
VBD Resize: Domid: 4, Device: (253, 0), New Size: 8421376 sectors
VBD Resize: Domid: 4, Device: (253, 0), New Size: 4227072 sectors


in guest:

$ fdisk -ul /dev/xvdb

255 heads, 63 sectors/track, 0 cylinders, total 8192 sectors
255 heads, 63 sectors/track, 1 cylinders, total 16384 sectors
255 heads, 63 sectors/track, 1 cylinders, total 24576 sectors
255 heads, 63 sectors/track, 2 cylinders, total 32768 sectors
255 heads, 63 sectors/track, 524 cylinders, total 8421376 sectors
255 heads, 63 sectors/track, 263 cylinders, total 4227072 sectors

# dmesg
Changing capacity of (202, 16) to 16384 sectors
Changing capacity of (202, 16) to 24576 sectors
Changing capacity of (202, 16) to 32768 sectors
Changing capacity of (202, 16) to 8421376 sectors
end_request: I/O error, dev xvdb, sector 8421368
Buffer I/O error on device xvdb, logical block 1052671
Changing capacity of (202, 16) to 4227072 sectors
end_request: I/O error, dev xvdb, sector 8421368
Buffer I/O error on device xvdb, logical block 1052671

The I/O error messages printed while shrinking the LV are harmless, according to comment 13.

Comment 31 Julio Entrena Perez 2012-01-13 12:25:09 UTC
Is this supposed to work with the boot VBD /dev/xvda?

[root@guest ~]# dmesg | tail -1
Changing capacity of (202, 0) to 33554432 sectors
[root@guest ~]# fdisk -ul /dev/xvda | grep sectors$
255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
[root@guest ~]# /sbin/blockdev --rereadpt /dev/xvda
BLKRRPART: Device or resource busy

Comment 32 Laszlo Ersek 2012-01-13 12:36:50 UTC
(In reply to comment #31)
> Is this supposed to work with the boot VBD /dev/xvda?
> 
> [root@guest ~]# dmesg | tail -1
> Changing capacity of (202, 0) to 33554432 sectors
> [root@guest ~]# fdisk -ul /dev/xvda | grep sectors$
> 255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
> [root@guest ~]# /sbin/blockdev --rereadpt /dev/xvda
> BLKRRPART: Device or resource busy

I have no idea. Can you trace why the ioctl returns -EBUSY?

Comment 33 Laszlo Ersek 2012-01-13 12:40:33 UTC
(In reply to comment #32)
> (In reply to comment #31)

> > Is this supposed to work with the boot VBD /dev/xvda?

> I have no idea.

By that I mean I can't recall any argument for or against such support in the upstream discussion.

Looking at blkdev_reread_part(), it can immediately return -EBUSY if:

	if (!mutex_trylock(&bdev->bd_mutex))
		return -EBUSY;

Can you reread the partition table (inside the guest) without resizing the LV first in the host?
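
For reference, a sketch of the two -EBUSY paths in that area, based on 2.6.18-era block/ioctl.c and fs/partitions/check.c (abbreviated):

    static int blkdev_reread_part(struct block_device *bdev)
    {
        struct gendisk *disk = bdev->bd_disk;
        int res;

        if (disk->minors == 1 || bdev != bdev->bd_contains)
            return -EINVAL;
        if (!capable(CAP_SYS_ADMIN))
            return -EACCES;
        if (!mutex_trylock(&bdev->bd_mutex))  /* 1st -EBUSY path */
            return -EBUSY;
        res = rescan_partitions(disk, bdev);
        mutex_unlock(&bdev->bd_mutex);
        return res;
    }

    int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
    {
        /* 2nd -EBUSY path: any partition still held open (e.g. the
         * mounted root filesystem on xvda) blocks the rescan. */
        if (bdev->bd_part_count)
            return -EBUSY;
        /* ... drop old partitions, re-read the partition table ... */
        return 0;
    }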

Comment 34 Laszlo Ersek 2012-01-13 12:54:57 UTC
The patch in attachment 500795 [details] adds a call to revalidate_disk().

connect() [drivers/xen/blkfront/blkfront.c]
-> revalidate_disk() [fs/block_dev.c]

  -> no blkfront specific revalidation, see "xlvbd_block_fops" in
     "drivers/xen/blkfront/vbd.c"

  -> mutex_lock(&bdev->bd_mutex)
  -> check_disk_size_change(disk, bdev)
  -> mutex_unlock(&bdev->bd_mutex)

That's the same mutex (though most probably not the only locking site involved).
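
As a sketch, the chain above looks roughly like this (based on the RHEL-5 backport of revalidate_disk(); abbreviated):

    int revalidate_disk(struct gendisk *disk)
    {
        struct block_device *bdev;
        int ret = 0;

        if (disk->fops->revalidate_disk)
            ret = disk->fops->revalidate_disk(disk);  /* none in blkfront */

        bdev = bdget_disk(disk, 0);
        if (!bdev)
            return ret;

        mutex_lock(&bdev->bd_mutex);
        check_disk_size_change(disk, bdev);
        mutex_unlock(&bdev->bd_mutex);
        bdput(bdev);
        return ret;
    }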

I recall from my testing that you have to read a file or do something similar on the vbd first to have the guest kernel notice the size change. Can you please try that?

1. LV resizing in the host,
2. in the guest, cat /etc/motd
3. reread the partition table

Thanks.

Comment 36 Julio Entrena Perez 2012-01-13 14:23:26 UTC
> Can you reread the partition table (inside the guest) without resizing the LV
> first in the host?

Indeed, it still returns -EBUSY.

Comment 38 Bryn M. Reeves 2012-01-13 17:35:38 UTC
I think in Julio's case we are not even getting that far. Although, I did just check, and BLKPG ioctls also fail to modify non-busy partitions:

# partx -a /dev/xvda
BLKPG: Device or resource busy
error adding partition 1
BLKPG: Device or resource busy
error adding partition 2
BLKPG: Device or resource busy
error adding partition 3
BLKPG: Device or resource busy
error adding partition 4

Does xvda do any of its own partition logic? It seems odd to have xvda3 and xvda4 (not currently defined in the MBR) as start/size-0 devices that return ENXIO on access and don't permit changes.

I took a quick look at the fdisk side and it seems that BLKGETSIZE/BLKGETSIZE64 are still returning the old size value even though proc & sys report the new size:

# grep xvda /proc/partitions 
 202     0   16777216 xvda
 202     1     104391 xvda1
 202     2    8281507 xvda2

# cat /sys/block/xvda/size 
33554432

[ both report correct size, 17179869184 bytes ]

# blockdev --getsz /dev/xvda
16777216
# fdisk -l /dev/xvda

Disk /dev/xvda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System
/dev/xvda1   *           1          13      104391   83  Linux
/dev/xvda2              14        1044     8281507+  8e  Linux LVM

[ both report the wrong (old) size as reported by BLKGETSIZE/BLKGETSIZE64 ]

dmesg also reports the resize:
Changing capacity of (202, 0) to 33554432 sectors

Comment 39 Laszlo Ersek 2012-01-13 18:55:32 UTC
(In reply to comment #38)
> I think in Julio's case we are not even getting that far. Although, I did
> just check, and BLKPG ioctls also fail to modify non-busy partitions:
> 
> # partx -a /dev/xvda
> BLKPG: Device or resource busy
> error adding partition 1
> BLKPG: Device or resource busy
> error adding partition 2
> BLKPG: Device or resource busy
> error adding partition 3
> BLKPG: Device or resource busy
> error adding partition 4

In a -274 guest on a -303 host, the same is printed right after booting into the guest (no resizing whatsoever, so whether the host side has the patch should not matter). There's a difference in the output between the first and any further attempts. First attempt:

  [root@dhcp-1-154 ~]# partx -a /dev/xvda
  BLKPG: Device or resource busy
  error adding partition 1
  BLKPG: Device or resource busy
  error adding partition 2

Second and further attempts:

  [root@dhcp-1-154 ~]# partx -a /dev/xvda
  BLKPG: Device or resource busy
  error adding partition 1
  BLKPG: Device or resource busy
  error adding partition 2
  BLKPG: Device or resource busy
  error adding partition 3
  BLKPG: Device or resource busy
  error adding partition 4

However, /dev/xvdb works both with "partx -a" and "blockdev --rereadpt".


> Does xvda do any of its own partition logic?

What do you mean? Blkfront has the following fops [drivers/xen/blkfront/vbd.c]:

static struct block_device_operations xlvbd_block_fops =
{
	.owner = THIS_MODULE,
	.open = blkif_open,
	.release = blkif_release,
	.ioctl  = blkif_ioctl,
	.getgeo = blkif_getgeo
};

In the guest, the xvda block device should be partitioned like any other block device, I think. The partition table of a boot disk cannot be reread on a physical machine with a physical disk either.


> It seems odd to have xvda3 and xvda4 (not currently defined in the MBR) as
> start/size-0 devices that return ENXIO on access and don't permit changes.

It may be odd, but they were not introduced by the patches for this bug. Please always compare results with RHEL-5.7 guests.


> I took a quick look at the fdisk side and it seems that
> BLKGETSIZE/BLKGETSIZE64 are still returning the old size value even though
> proc & sys report the new size:

I think it's due to the same mutex being held. The BLKGETSIZE64 ioctl is served by

    i_size_read(bdev->bd_inode)

That quantity is set by:

connect() [drivers/xen/blkfront/blkfront.c]
-> revalidate_disk() [fs/block_dev.c]
  -> bdget_disk()
  -> mutex_lock(&bdev->bd_mutex)
  -> check_disk_size_change(disk, bdev)
    -> i_size_write(bdev->bd_inode, disk_size)
  -> mutex_unlock(&bdev->bd_mutex)

If bdget_disk() fails (or the mutex is held), the new value is not written. The latter case would also cause a thread to hang -- therefore I assume it's bdget_disk() that fails for the boot disk. But we should summon block layer experts for that.

BTW, check_disk_size_change() prints "detected capacity change" at KERN_INFO level, and the above logs don't contain that. Please repeat the same resizing test (and compare the dmesgs) with a -304 PV guest on a -304 host, for both xvda and xvdb.

check_disk_size_change() compares the gendisk size with the blockdev size, and if they differ, the blockdev size is adapted to the gendisk size. This happens under the blockdev mutex. The vbd resizing happens on the gendisk side first (see the set_capacity() call in attachment 500795 [details]); then revalidate_disk() triggers check_disk_size_change(), provided bdget_disk() succeeds and the mutex can be acquired.

The "Changing capacity of (202, 0) to 33554432 sectors" message corresponds to the gendisk level change, printed by blkfront (see the patch). The missing "... detected capacity change from ..." message would be printed by the blockdev layer, and it's not (for xvda).

Comment 40 Laszlo Ersek 2012-01-13 19:06:35 UTC
So, it's not supposed to work with the boot vbd, for the same reason the partition table of a physical boot disk can't be re-read without rebooting. In the guest, the boot gendisk is in fact resized, but the kernel prevents the blockdev from following suit.

I'd greatly appreciate it if you could run this by a block layer expert.

Comment 41 errata-xmlrpc 2012-02-21 03:28:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0150.html