Bug 2139728
| Field | Value |
|---|---|
| Summary | [Azure][RHEL8] Live resize of disk does not trigger a rescan of the device capacity |
| Product | Red Hat Enterprise Linux 8 |
| Component | kernel |
| Kernel sub component | Hyper-V |
| Version | 8.6 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | high |
| Keywords | Triaged, ZStream |
| Reporter | Klaas Demter <klaas> |
| Assignee | Red Hat Kernel Manager <kernel-mgr> |
| QA Contact | xuli <xuli> |
| CC | cavery, cwarfiel, eterrell, haiyangz, jmagrini, kkashanjat, kys, litian, longli, mikelley, minlei, revers, sgeorgejohn, sreber, vkuznets, xuli, xxiong, yacao, yuxisun |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | kernel-4.18.0-445.el8 |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2023-05-16 08:55:50 UTC |
| Bug Blocks | 2192344, 2192345 (view as bug list) |
Description
Klaas Demter 2022-11-03 10:13:54 UTC

Can someone please remove the private flag again? Nothing in this bug is private information.

---

Klaas Demter (comment #2):

@xuli it seems the private flag gets re-added by @bugzilla automatically. Shall I open a bug against bugzilla for that, or is it some setting I don't know about? :) Also, I am guessing we should add the Microsoft people to this, right? This should be missing in the Hyper-V Linux kernel parts? @haiyangz @kys

---

Hi Klaas,

Thank you so much for raising this bug. Based on my tests, the same behavior exists with RHEL 7/8/9 VMs on Hyper-V, with an Ubuntu VM (5.15.0-52-generic) on Hyper-V, and also with a RHEL VM on the ESXi platform. After expanding the disk, `echo 1 > /sys/block/XX/device/rescan` has to be executed before the guest sees the expanded disk size. Could Long Li and Cathy please help check and explain whether this is working as designed? Thank you so much.

More detailed steps:

1. Start the VM and add a new 1 GiB hard disk (SCSI Controller -> Hard Drive -> New -> Dynamically expanding):

```
# fdisk -l
Disk /dev/sdb: 1 GiB, 1073741824 bytes, 2097152 sectors
Disk model: Virtual Disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

# dmesg | grep -i capacity
[  157.693295] sd 0:0:0:1: Capacity data has changed
```

2. In Hyper-V Manager, select the disk, choose Edit Disk, and expand it to 2 GiB.

3. Check the disk size in the VM:

```
# fdisk -l
Disk /dev/sdb: 1 GiB, 1073741824 bytes, 2097152 sectors
Disk model: Virtual Disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

# echo 1 > /sys/block/sdb/device/rescan

# fdisk -l
Disk /dev/sdb: 2 GiB, 2147483648 bytes, 4194304 sectors
Disk model: Virtual Disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
```

---

(In reply to Klaas Demter from comment #2)
> @xuli it seems the private flag gets readded by @bugzilla automatically.
> Shall I open a bug against bugzilla for that -- or is it some setting I
> don't know about :)

From the bugzilla team's information, defaulting bugs to private is a per-component setting. The setting is controlled by the team that owns a component, e.g. the Kernel component for this bug. If we disagree with the setting for a specific component, we need to contact the owners of that component and discuss it with them. Thank you.

---

Hi Xuli,

What is the behavior/result when doing a similar test on a virtio-scsi/KVM guest?

Thanks,
Ming

---

Klaas Demter (comment #7):

@minlei I can answer that one.

Outside of Azure I only use qemu-kvm via oVirt/Red Hat Virtualization, and there the disk resize is immediately recognized by the OS. This is the output from a VM running a current CentOS Stream 8 kernel:

```
[186622.645817] sd 0:0:0:0: Capacity data has changed
[186622.651773] sd 0:0:0:0: [sda] 106954752 512-byte logical blocks: (54.8 GB/51.0 GiB)
[186622.655150] sda: detected capacity change from 53687091200 to 54760833024
```

The example here is with virtio-scsi. The first part happens with Hyper-V as well: sd notices that the capacity data has changed (see the initial description, "[620251.940766] sd 1:0:0:0: Capacity data has changed"), but it does not continue.
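[Editorial aside] The manual workaround shown in the steps above can be wrapped in a small helper; this is a sketch, not code from the thread, assuming the standard sysfs layout for SCSI disks. The `SYSFS` variable is an added parameter (defaulting to `/sys`) so the helper can be exercised against a mock tree.

```shell
#!/bin/sh
# Sketch: trigger a SCSI capacity rescan for one disk and report the
# size the kernel sees afterwards. Device name (e.g. sdb) is whatever
# disk you resized on the host; SYSFS defaults to /sys.
SYSFS="${SYSFS:-/sys}"

rescan_disk() {
    dev="$1"
    rescan="$SYSFS/block/$dev/device/rescan"
    if [ ! -e "$rescan" ]; then
        echo "no rescan node for $dev under $SYSFS" >&2
        return 1
    fi
    # Same action as the manual step: ask the SCSI layer to re-read
    # the device capacity.
    echo 1 > "$rescan"
    # Size in 512-byte sectors as now reported by the kernel.
    cat "$SYSFS/block/$dev/size"
}
```

On a live system this would be used as `rescan_disk sdb`, followed by `fdisk -l /dev/sdb` to confirm the new capacity.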
I had a brief talk with the qemu people a couple of weeks ago; they said the host is kicking the VM to initiate a rescan. I believe they pointed to https://github.com/torvalds/linux/blob/f141df371335645ce29a87d9683a3f79fba7fd67/drivers/block/virtio_blk.c#L586, but I did not save the conversation :)

Greetings
Klaas

---

(In reply to Klaas Demter from comment #7)
> So the first part happens with hyperV as well, sd notices that the capacity
> data has changed (see initial description "[620251.940766] sd 1:0:0:0:
> Capacity data has changed"). But it does not continue.

Probably the difference is caused by the following code:

```
sd_open():
	...
	if (sd_need_revalidate(bdev, sdkp))
		sd_revalidate_disk(bdev->bd_disk);
	...
```

For virtio-scsi, the sd_revalidate_disk() above is called and the size-change message is dumped, but it looks like sd_need_revalidate() returns false for storvsc.

Thanks,

---

Michael Kelley (comment #10):

I just posted a fix to LKML: https://lore.kernel.org/linux-hyperv/1668019722-1983-1-git-send-email-mikelley@microsoft.com/T/#u

---

(In reply to Michael Kelley from comment #10)
> I just posted a fix to LKML:
> https://lore.kernel.org/linux-hyperv/1668019722-1983-1-git-send-email-mikelley@microsoft.com/T/#u
@mikelley thank you for the update. Can you ping this BZ once your patch is merged so we can get it backported? Thanks!

---

Patch is now merged:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/storvsc_drv.c?id=b8a5376c321b4669f7ffabc708fd30c3970f3084

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2951

---

I have verified this on a RHEL 8.8 VM; it works as expected. Thanks to everyone involved in solving this!

---

Klaas Demter (comment #30):

This seems to have come back, though it does not happen reliably, on 4.18.0-553.27.1.el8_10.x86_64. I am unsure whether this is triggered by a change in RHEL or within Azure/Hyper-V.

---

(In reply to Klaas Demter from comment #30)
> This seems to have come back, it does not happen reliably,
> 4.18.0-553.27.1.el8_10.x86_64 - unsure if this is triggered by a change in
> rhel or within azure/hyperv.

I'm genuinely surprised if the 4.18.0-553.27.1.el8_10 build brings the change; there's nothing storvsc- or even storage-related there:

```
$ git log --no-decorate --no-merges kernel-4.18.0-553.26.1.el8_10..kernel-4.18.0-553.27.1.el8_10 --oneline
b3b28473264f [redhat] kernel-4.18.0-553.27.1.el8_10
4392329c5e11 lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()
95bcd0ef46a5 xfrm: set dst dev to blackhole_netdev instead of loopback_dev in ifdown
71dbc0ae3aab ELF: fix kernel.randomize_va_space double read
9ac343b70f16 loopback: fix lockdep splat
2b2e1b9e8905 blackhole_netdev: use blackhole_netdev to invalidate dst entries
ae956806371c loopback: create blackhole net device similar to loopack.
```
```
b1f43fbcdee8 tty: tty_io: update timestamps on all device nodes
91718de50c00 tty: use 64-bit timstamp
aa0b335659f5 bpf: Fix overrunning reservations in ringbuf
bedd99db2bee bonding: fix xfrm real_dev null pointer dereference
8e208c48f54d bonding: fix null pointer deref in bond_ipsec_offload_ok
41f7c421e69a xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
```

(And in fact it has been a while since the last change to storvsc in RHEL 8: kernel-4.18.0-511.el8, in RHEL 8.9.0.) I would suggest re-testing 4.18.0-553.26.1.el8_10 (or whatever was previously running on the system) to rule out RHEL kernel changes as the culprit.
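[Editorial aside] The history check described in the last comment can be generalized into a one-line helper; this is a sketch, not from the thread, assuming the current directory is a git clone of the RHEL kernel tree with both build tags present. The tag names in the usage example are the ones quoted above.

```shell
#!/bin/sh
# Sketch: list commits touching the storvsc driver between two kernel
# builds, to rule the driver in or out as the source of a regression.
# An empty output means storvsc did not change in that range.
storvsc_changes() {
    from="$1" to="$2"
    git log --oneline --no-merges "$from..$to" -- drivers/scsi/storvsc_drv.c
}
```

Usage: `storvsc_changes kernel-4.18.0-553.26.1.el8_10 kernel-4.18.0-553.27.1.el8_10` — per the comment above, this range would print nothing, supporting the conclusion that the RHEL kernel change is not the culprit.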