Hide Forgot
Customer Contact Name: Masayoshi Mizuma Description of Problem: The open() system call accidentally cancels block-size change performed by prior ioctl(, BLKBSZSET,) system call. The behavior of open() is a kernel bug, so please fix it. The detail is as follows. According to the man page of blockdev(8), we can set the block size to a block device by "--setbsz" option: --------------------------------------------------------------------- --setbsz N Set blocksize to N bytes. --------------------------------------------------------------------- "--setbsz" option issues ioctl(, BLKBSZSET, ) system call. We tested the option as follows but it did not work. 1. Get the current block size # blockdev --getbsz /dev/sda7 4096 2. Change the block size from 4096 to 1024 # blockdev --setbsz 1024 /dev/sda7 3. Check the block size # blockdev --getbsz /dev/sda7 4096 We investigated it and found the root cause. The block size is wrongly reset as a value in __blkdev_get() via open() system call when no one used it. The code is as follows: --------------------------------------------------------------------- static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part) { ... if (!bdev->bd_openers) { ... bd_set_size(bdev,(loff_t)get_capacity(disk)<<9); ... bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9); void bd_set_size(struct block_device *bdev, loff_t size) { ... bdev->bd_block_size = bsize; --------------------------------------------------------------------- Therefore, the block size is reset on step 3. even after the block size was modified on step 2. We believe this behavior is a kernel bug because __blkdev_get() should not be the place to initialize the block size. It has been already initialized in bdget() before __blkdev_get() is called. --------------------------------------------------------------------- struct block_device *bdget(dev_t dev) { if (inode->i_state & I_NEW) { ... bdev->bd_block_size = (1 << inode->i_blkbits); --------------------------------------------------------------------- Hence, the resetting in __blkdev_get() is not even initialization of the block size. In addition, the resetting in __blkdev_get() is pointless and meaningless because one of upper layers of block device, for example, filesystem, swap device, or raw device, will set the block size using set_blocksize() or sb_set_blocksize() after __blkdev_get() is called. Therefore, we believe there is no reason to reset the block size in __blkdev_get(). If the behavior is not a kernel bug for some reason we do not know yet, please do following three actions. o explain why the behavior is not a kernel bug. o fix the man page of blockdev(8). Changing the block size is reset when open() system call is issued by someone, for example, "blockdev --getbsz" command, so we consider "blockdev --setbsz" command does not work. For example, remove the description of "--setbsz" option. o fix the man page of guestfish(1) because it includes same description as blockdev(8) almost without any changes. For example, remove the description of "blockdev-setbsz" option. --------------------------------------------------------------------- blockdev-setbsz blockdev-setbsz device blocksize This sets the block size of a device. (Note this is different from both size in blocks and filesystem block size). This uses the blockdev(8) command. --------------------------------------------------------------------- Version-Release number of selected component: Red Hat Enterprise Linux Version Number: RHEL6 Release Number: 6.4 Architecture: x86_64 Kernel Version: 2.6.32-358.el6.x86_64 Related Package Version: util-linux-ng-2.17.2-12.9.el6.x86_64.rpm libguestfs-tools-c-1.16.34-2.el6.x86_64.rpm Related Middleware/Application: None. Drivers or hardware or architecture dependency: None. How reproducible: Always. Step to Reproduce: 1. Get the current block size # blockdev --getbsz /dev/sda7 4096 2. Change the block size from 4096 to 1024 # blockdev --setbsz 1024 /dev/sda7 3. Check the block size # blockdev --getbsz /dev/sda7 4096 Actual Results: The block size is still 4096 on step 3. Expected Results: The block size is correctly 1024 on step 3. Summary of actions taken to resolve issue: None. Location of diagnostic data: None. Hardware configuration: Model: PRIMERGY RX200 S5 CPU Info: Intel(R) Xeon(R) CPU L5506 @ 2.13GHz Memory Info: 3GB Hardware Component Information: Configuration Info: Guest Configuration Info: Business Impact: None. Target Release: 6.5GA Errata Request: None. Additional Info: We think the resetting of block size via open() was introduced on the following commit: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=a9e3cad6d153a0802946d0d9cc3b8f66eade3373 --- commit a9e3cad6d153a0802946d0d9cc3b8f66eade3373 Author: Alexander Viro <viro.edu> Date: Tue Apr 30 19:58:03 2002 -0700 [PATCH] (4/6) blksize_size[] removal ... + if (!bdev->bd_openers) { + unsigned bsize = bdev_hardsect_size(bdev); + while (bsize < PAGE_CACHE_SIZE) { + if (bdev->bd_inode->i_size & bsize) + break; + bsize <<= 1; + } + bdev->bd_block_size = bsize; + bdev->bd_inode->i_blkbits = blksize_bits(bsize); + } --- Later, the initialization of block size was introduced on the following commit: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fe178e800e8ac0f3f13f0ce4b2c170599669713b --- commit fe178e800e8ac0f3f13f0ce4b2c170599669713b Author: Alexander Viro <viro.org.uk> Date: Sat Aug 30 22:51:22 2003 -0700 [PATCH] dev_t handling cleanups (9/12) ... + if (inode->i_state & I_NEW) { + bdev->bd_contains = NULL; + bdev->bd_inode = inode; + bdev->bd_block_size = (1 << inode->i_blkbits); --- So, we think the resetting in __blkdev_get() should have been removed on fe178e800e.
Jeff, Tejun, is this something one of you two can look at? Thanks!
I believe the last close destroys the block device, so there is no way to retain that information. If you instead tested with another application holding the block device open, I think you would get the behaviour you desire. I don't see this as a bug. If an application wishes to set a particular block size for a device, then shouldn't it also be using the device? Can you please explain what it is you are trying to accomplish?
Hi Jeffrey, > I believe the last close destroys the block device, so there is no way to retain that information. We believe the block device is not destroyed at the last close, so the block_device.bd_block_size remains after the last close. blkdev_close => blkdev_put => __blkdev_put # don't destroy block device. > If you instead tested with another application holding the block device open, I think you would get the behaviour you desire. > > I don't see this as a bug. If an application wishes to set a particular block size for a device, then shouldn't it also be using the device? We have a different opinion. We believe an application which wishes to set a particular block size for a device does not have to always keep open()ing the device. Therefore, the block size needs to keep the value even after the next open() call. blockdev command and guestfish command seem to be implemented based on such understanding. In addition, we can not understand why the behavior of resetting depends on whether another application is holding the block device or not. We believe the block size should not be reset regardless of whether another application is using or not. Regards, Masayoshi Mizuma
Hello, While I agree that the current behavior is a bit weird. I have some basic questions. * Why does this matter? More fundamentally, why do we still have block_size at all? At this point, it's just an arbitrary power-of-two number between logical block size and PAGE_SIZE. Nobody really cares what this value is set to. At this point, it's just a backward compatibility fluff, which, BTW, makes perfect sense. Block size is inherent property of the underlying device, it doesn't really make any sense to have that configurable at block layer. * While block_device may not go away each time it's released. It *is* transient. When nobody is holding onto it, its inode can be reclaimed and as such block layer should fully reinitialize on each fresh open. Also, device characteristics may change across ->open. e.g. block size may change across ->open if the device supports removable media. If we decide to keep the configured blocksize, it isn't clear how such situations should be handled. Should it be reset or should it be kept at the configured value as long as it is inside allowed range? If the configured value isn't in the newly allowed range, what should be done about it? It's all doable but the main question here is why. What's the use case here? I can't think of anything useful to do with blocksize and we'd just be wandering aimlessly if we don't know what we're trying to achieve. If it's just "the doc says so", updating the doc and getting rid of blocksize would be a far more productive way to go. Thanks.
Hi Tejun, > Block size is inherent property of the underlying device, it doesn't > really make any sense to have that configurable at block layer. We believe it is not always true because filesystems, swap device and raw device need to configure the block size to manage the I/O unit. > * While block_device may not go away each time it's released. It *is* transient. > When nobody is holding onto it, its inode can be reclaimed and as such block > layer should fully reinitialize on each fresh open. Block device and inode will be released only when the device is removed e.g. removing hot pluggable devices or deleting partitions. Therefore, we believe the block size has been kept until such events happen. > Also, device characteristics may change across ->open. e.g. block size may change > across ->open if the device supports removable media. If we decide to keep the > configured blocksize, it isn't clear how such situations should be handled. In such case or deleting and recreating disk partitions, the block device and the inode will be released and the block size will be reinitialized at bdget(), so block size configuration will not be kept. > Should it be reset or should it be kept at the configured value as long as > it is inside allowed range? If the configured value isn't in the newly > allowed range, what should be done about it? > It's all doable but the main question here is why. What's the use case here? > I can't think of anything useful to do with blocksize and we'd just be > wandering aimlessly if we don't know what we're trying to achieve. If it's > just "the doc says so", updating the doc and getting rid of blocksize > would be a far more productive way to go. Unfortunately, we do not also know the use case to set the block size by using blockdev command and guestfish command... Regards, Masayoshi Mizuma
There are file system block sizes, VM page sizes and physical storage sector/block sizes. I think that you will need to make a very compelling argument of why this control is useful, not just to Red Hat, but also to the upstream list. Best regards, Ric
Hello, (In reply to Fujitsu kernel engineers from comment #6) > > Block size is inherent property of the underlying device, it doesn't > > really make any sense to have that configurable at block layer. > > We believe it is not always true because filesystems, swap device and > raw device need to configure the block size to manage the I/O unit. Yeah, sure, upper layers are free to do whatever they want to do but block layer and anything underneath don't need to know about that. It simply doesn't make any difference to them and the existing code base already doesn't really care about the configured number. Block layer is already just keeping the number for backward compatibility at this point. > > * While block_device may not go away each time it's released. It *is* transient. > > When nobody is holding onto it, its inode can be reclaimed and as such block > > layer should fully reinitialize on each fresh open. > > Block device and inode will be released only when the device is removed > e.g. removing hot pluggable devices or deleting partitions. Therefore, > we believe the block size has been kept until such events happen. Hmmm? Are you sure? bdget/bdput() just gets and puts the inode. I don't see anything pinning it unless it's in use and it looks like it should be reclaimable. Am I missing something here? > In such case or deleting and recreating disk partitions, the block > device and the inode will be released and the block size will be > reinitialized at bdget(), so block size configuration will not be kept. Hmmm? Media change doesn't kill the inode or bdev. It just invalidates it. I suppose it'd be the correct behavior to reset it tho. > Unfortunately, we do not also know the use case to set the block size by > using > blockdev command and guestfish command... So, it's not like there's anyone who needs it, right? I actually can't see how anyone would make any use of it. It's just an arbitrary number which doesn't do anything at this point. I vote for just updating the documentation and possibly just remove the whole thing in the long term. It doesn't do anything other than confusing people. Thanks.
The blockdev "--setbsz" command could also be considered a "historical" interface; sure, you can set & query block size, but it doesn't say anything about persistence (or usefulness). Removing the docs from the blockdev(8) manpage is probably fine, although I'm not sure it's really worth doing in RHEL6. I'd be more curious about why guestfish has this ability - Richard, any input here? Was this added for a specific purpose or just for blockdev(8) parity? Thanks, -Eric
Hi guys, We understand Tejun and Eric's opinion and agree with your suggestion because nobody seems to expect the functionality of changing block size in those commands. Therefore, please updates the man pages. > Removing the docs from the blockdev(8) manpage is probably fine, although I'm not sure it's really worth doing in RHEL6 We consider the removing is worth doing in RHEL6, of course also RHEL7 because it prevents users from confusion and questions about the reset behavior. Regards, Masayoshi Mizuma
I suggest we deprecate these calls in libguestfs too (bug 624334). I didn't know that the block size was just a historical artifact.
Hi Oguma-san, How about the current status of fixing the man page of blockdev(8) and guestfish(1)? It seems that no one is assigned to that yet, so we afraid there is no progress about that. Could you assign an engineer and fix the man pages by RHEL6.5GA release? Regards, Masayoshi Mizuma
(In reply to Fujitsu kernel engineers from comment #12) Hi Mizuma-san, > How about the current status of fixing the man page of blockdev(8) and > guestfish(1)? > It seems that no one is assigned to that yet, so we afraid there is no > progress about that. > Could you assign an engineer and fix the man pages by RHEL6.5GA release? We are sorry that response becomes slow. We cannot promise whether we can fix this in RHEL6.5GA, however I will confirm this to engineering, and update as soon as possible. Best Regards, Yoko Oguma
I don't think there is time to fix this for RHEL 6.5. However 6.6 is possible. I have filed bugs against RHEL 6.6 and RHEL 7.0 for the libguestfs issues: bug 624335, bug 1016465.
And for blockdev (ie. util-linux): bug 1016470, bug 1016471
Can we make this bug public? It would help with discussion of the blockdev patch.
Dear Fujitsu, (In reply to Richard W.M. Jones from comment #16) > Can we make this bug public? It would help with discussion > of the blockdev patch. Could you please answer the above question? Thank you. Best Regards, Yoko Oguma
Hi Richard and Oguma-san, > Can we make this bug public? Yes, please. Regards, Masayoshi Mizuma
Uncheck group " Red Hat Quality Assurance (internal)" according to #C18 (Requested by: Richard W.M. Jones)
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
Hi Oguma-san, (In reply to Richard W.M. Jones from comment #14) > I have filed bugs against RHEL 6.6 and RHEL 7.0 for the > libguestfs issues: bug 624335, bug 1016465. Please add a permission to see Bug 1016465 for us. Regards, Masayoshi Mizuma
(In reply to Fujitsu kernel engineers from comment #22) Hi Mizuma-san, > (In reply to Richard W.M. Jones from comment #14) > > I have filed bugs against RHEL 6.6 and RHEL 7.0 for the > > libguestfs issues: bug 624335, bug 1016465. > > Please add a permission to see Bug 1016465 for us. My apologies. I added a permission to Bug 1016465 so that you can check the contents now. Thank you. Best Regards, Yoko Oguma
Hi Oguma-san, (In reply to Yoko Oguma from comment #23) > My apologies. I added a permission to Bug 1016465 so that you can check the > contents now. Thank you. We can access Bug 1016465, thank you. Regards, Masayoshi Mizuma
I've actually just made the bug completely public.
As this was determined to be a documentation bug for userspace utilities, I am closing this bug.