Bug 1002825 - [FJ6.4 Bug] Changing block size is accidentally canceled by following open() system call.
[FJ6.4 Bug] Changing block size is accidentally canceled by following open() ...
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel (Show other bugs)
6.4
Unspecified Linux
medium Severity medium
: rc
: ---
Assigned To: Jeff Moyer
Red Hat Kernel QE team
QJ130523-022
: OtherQA
Depends On: 1016470 1016471
Blocks: 1011600 839484
  Show dependency treegraph
 
Reported: 2013-08-30 01:35 EDT by Fujitsu kernel engineers
Modified: 2014-09-30 11:32 EDT (History)
11 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1016470 1016471 (view as bug list)
Environment:
Last Closed: 2014-06-26 13:41:14 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Fujitsu kernel engineers 2013-08-30 01:35:03 EDT
Customer Contact Name:
Masayoshi Mizuma

Description of Problem:
The open() system call accidentally cancels block-size change performed by
prior ioctl(, BLKBSZSET,) system call.
The behavior of open() is a kernel bug, so please fix it.
The detail is as follows.

According to the man page of blockdev(8), we can set the block size 
to a block device by "--setbsz" option:

---------------------------------------------------------------------
       --setbsz N
              Set blocksize to N bytes.
---------------------------------------------------------------------
"--setbsz" option issues ioctl(, BLKBSZSET, ) system call.
We tested the option as follows but it did not work.

1. Get the current block size 
# blockdev --getbsz /dev/sda7
4096

2. Change the block size from 4096 to 1024
# blockdev --setbsz 1024 /dev/sda7 

3. Check the block size
# blockdev --getbsz /dev/sda7
4096

We investigated it and found the root cause.

The block size is wrongly reset as a value in __blkdev_get() via open()
system call when no one used it. The code is as follows:
---------------------------------------------------------------------
static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
{
...
             if (!bdev->bd_openers) {
...
                               bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
...
                             bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);

void bd_set_size(struct block_device *bdev, loff_t size)
{
...
        bdev->bd_block_size = bsize;
---------------------------------------------------------------------
Therefore, the block size is reset on step 3. even after the block size was modified on step 2.

We believe this behavior is a kernel bug because __blkdev_get() should
not be the place to initialize the block size. It has been already
initialized in bdget() before __blkdev_get() is called.
---------------------------------------------------------------------
 struct block_device *bdget(dev_t dev)
 {
           if (inode->i_state & I_NEW) {
...
                  bdev->bd_block_size = (1 << inode->i_blkbits);
---------------------------------------------------------------------
Hence, the resetting in __blkdev_get() is not even initialization
of the block size.
In addition, the resetting in __blkdev_get() is pointless and meaningless because
one of upper layers of block device, for example, filesystem, swap device,
or raw device, will set the block size using set_blocksize() or
sb_set_blocksize() after __blkdev_get() is called.
Therefore, we believe there is no reason to reset the block size
in __blkdev_get().

If the behavior is not a kernel bug for some reason we do not know yet, 
please do following three actions.

o explain why the behavior is not a kernel bug.

o fix the man page of blockdev(8). Changing the block size is reset when
  open() system call is issued by someone, for example, "blockdev --getbsz" command,
  so we consider "blockdev --setbsz" command does not work. 
  For example, remove the description of "--setbsz" option. 

o fix the man page of guestfish(1) because it includes same description as blockdev(8)
  almost without any changes.
  For example, remove the description of "blockdev-setbsz" option.
---------------------------------------------------------------------
   blockdev-setbsz
        blockdev-setbsz device blocksize

       This sets the block size of a device.

       (Note this is different from both size in blocks and filesystem block size).

       This uses the blockdev(8) command.
---------------------------------------------------------------------

Version-Release number of selected component:
  Red Hat Enterprise Linux Version Number: RHEL6
  Release Number: 6.4
  Architecture: x86_64
  Kernel Version: 2.6.32-358.el6.x86_64
  Related Package Version: util-linux-ng-2.17.2-12.9.el6.x86_64.rpm
                           libguestfs-tools-c-1.16.34-2.el6.x86_64.rpm
  Related Middleware/Application: None.


Drivers or hardware or architecture dependency:
None.

How reproducible:
Always.

Step to Reproduce:
1. Get the current block size 
# blockdev --getbsz /dev/sda7
4096

2. Change the block size from 4096 to 1024
# blockdev --setbsz 1024 /dev/sda7 

3. Check the block size
# blockdev --getbsz /dev/sda7
4096

Actual Results:
The block size is still 4096 on step 3.

Expected Results:
The block size is correctly 1024 on step 3.

Summary of actions taken to resolve issue:
None.

Location of diagnostic data:
None.

Hardware configuration:

  Model:        PRIMERGY RX200 S5
  CPU Info:     Intel(R) Xeon(R) CPU L5506  @ 2.13GHz
  Memory Info:  3GB
  Hardware Component Information:
  Configuration Info:
  Guest Configuration Info:

Business Impact:
None.

  Target Release: 6.5GA

  Errata Request: None.

Additional Info:
We think the resetting of block size via open() was introduced on the following commit:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=a9e3cad6d153a0802946d0d9cc3b8f66eade3373
---
commit a9e3cad6d153a0802946d0d9cc3b8f66eade3373
Author: Alexander Viro <viro@math.psu.edu>
Date:   Tue Apr 30 19:58:03 2002 -0700

    [PATCH] (4/6) blksize_size[] removal
...
+       if (!bdev->bd_openers) {
+               unsigned bsize = bdev_hardsect_size(bdev);
+               while (bsize < PAGE_CACHE_SIZE) {
+                       if (bdev->bd_inode->i_size & bsize)
+                               break;
+                       bsize <<= 1;
+               }
+               bdev->bd_block_size = bsize;
+               bdev->bd_inode->i_blkbits = blksize_bits(bsize);
+       }
---

Later, the initialization of block size was introduced on the following commit:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fe178e800e8ac0f3f13f0ce4b2c170599669713b
---
commit fe178e800e8ac0f3f13f0ce4b2c170599669713b
Author: Alexander Viro <viro@www.linux.org.uk>
Date:   Sat Aug 30 22:51:22 2003 -0700

    [PATCH] dev_t handling cleanups (9/12)
...
+       if (inode->i_state & I_NEW) {
+               bdev->bd_contains = NULL;
+               bdev->bd_inode = inode;
+               bdev->bd_block_size = (1 << inode->i_blkbits);
---
So, we think the resetting in __blkdev_get() should have been removed on fe178e800e.
Comment 2 Ric Wheeler 2013-08-31 12:03:46 EDT
Jeff, Tejun, is this something one of you two can look at?

Thanks!
Comment 3 Jeff Moyer 2013-09-04 14:17:16 EDT
I believe the last close destroys the block device, so there is no way to retain that information.  If you instead tested with another application holding the block device open, I think you would get the behaviour you desire.

I don't see this as a bug.  If an application wishes to set a particular block size for a device, then shouldn't it also be using the device?

Can you please explain what it is you are trying to accomplish?
Comment 4 Fujitsu kernel engineers 2013-09-10 07:53:14 EDT
Hi Jeffrey,

> I believe the last close destroys the block device, so there is no way to retain that information.

We believe the block device is not destroyed at the last close, so
the block_device.bd_block_size remains after the last close.

blkdev_close
 => blkdev_put
    => __blkdev_put
       # don't destroy block device.

> If you instead tested with another application holding the block device open, I think you would get the behaviour you desire.
> 
> I don't see this as a bug.  If an application wishes to set a particular block size for a device, then shouldn't it also be using the device?

We have a different opinion.
We believe an application which wishes to set a particular block size
for a device does not have to always keep open()ing the device. Therefore, the
block size needs to keep the value even after the next open() call. blockdev
command and guestfish command seem to be implemented based on such understanding.

In addition, we can not understand why the behavior of resetting
depends on whether another application is holding the block device or not.
We believe the block size should not be reset regardless of whether another
application is using or not.

Regards,
Masayoshi Mizuma
Comment 5 Tejun Heo 2013-09-10 11:54:01 EDT
Hello,

While I agree that the current behavior is a bit weird. I have some basic questions.

* Why does this matter? More fundamentally, why do we still have block_size at all? At this point, it's just an arbitrary power-of-two number between logical block size and PAGE_SIZE. Nobody really cares what this value is set to. At this point, it's just a backward compatibility fluff, which, BTW, makes perfect sense. Block size is inherent property of the underlying device, it doesn't really make any sense to have that configurable at block layer.

* While block_device may not go away each time it's released. It *is* transient. When nobody is holding onto it, its inode can be reclaimed and as such block layer should fully reinitialize on each fresh open. Also, device characteristics may change across ->open. e.g. block size may change across ->open if the device supports removable media. If we decide to keep the configured blocksize, it isn't clear how such situations should be handled. Should it be reset or should it be kept at the configured value as long as it is inside allowed range? If the configured value isn't in the newly allowed range, what should be done about it?

It's all doable but the main question here is why. What's the use case here? I can't think of anything useful to do with blocksize and we'd just be wandering aimlessly if we don't know what we're trying to achieve. If it's just "the doc says so", updating the doc and getting rid of blocksize would be a far more productive way to go.

Thanks.
Comment 6 Fujitsu kernel engineers 2013-09-11 04:04:05 EDT
Hi Tejun,

> Block size is inherent property of the underlying device, it doesn't
> really make any sense to have that configurable at block layer.

We believe it is not always true because filesystems, swap device and
raw device need to configure the block size to manage the I/O unit.

> * While block_device may not go away each time it's released. It *is* transient.
> When nobody is holding onto it, its inode can be reclaimed and as such block
> layer should fully reinitialize on each fresh open.

Block device and inode will be released only when the device is removed
e.g. removing hot pluggable devices or deleting partitions. Therefore,
we believe the block size has been kept until such events happen.

> Also, device characteristics may change across ->open. e.g. block size may change
> across ->open if the device supports removable media. If we decide to keep the
> configured blocksize, it isn't clear how such situations should be handled.

In such case or deleting and recreating disk partitions, the block
device and the inode will be released and the block size will be
reinitialized at bdget(), so block size configuration will not be kept.

> Should it be reset or should it be kept at the configured value as long as
> it is inside allowed range? If the configured value isn't in the newly
> allowed range, what should be done about it?
> It's all doable but the main question here is why. What's the use case here?
> I can't think of anything useful to do with blocksize and we'd just be
> wandering aimlessly if we don't know what we're trying to achieve. If it's
> just "the doc says so", updating the doc and getting rid of blocksize
> would be a far more productive way to go.

Unfortunately, we do not also know the use case to set the block size by using
blockdev command and guestfish command...

Regards,
Masayoshi Mizuma
Comment 7 Ric Wheeler 2013-09-11 09:36:33 EDT
There are file system block sizes, VM page sizes and physical storage sector/block sizes.

I think that you will need to make a very compelling argument of why this control is useful, not just to Red Hat, but also to the upstream list.

Best regards,

Ric
Comment 8 Tejun Heo 2013-09-11 12:18:02 EDT
Hello,

(In reply to Fujitsu kernel engineers from comment #6)
> > Block size is inherent property of the underlying device, it doesn't
> > really make any sense to have that configurable at block layer.
> 
> We believe it is not always true because filesystems, swap device and
> raw device need to configure the block size to manage the I/O unit.

Yeah, sure, upper layers are free to do whatever they want to do but block layer and anything underneath don't need to know about that. It simply doesn't make any difference to them and the existing code base already doesn't really care about the configured number. Block layer is already just keeping the number for backward compatibility at this point.

> > * While block_device may not go away each time it's released. It *is* transient.
> > When nobody is holding onto it, its inode can be reclaimed and as such block
> > layer should fully reinitialize on each fresh open.
> 
> Block device and inode will be released only when the device is removed
> e.g. removing hot pluggable devices or deleting partitions. Therefore,
> we believe the block size has been kept until such events happen.

Hmmm? Are you sure? bdget/bdput() just gets and puts the inode. I don't see anything pinning it unless it's in use and it looks like it should be reclaimable. Am I missing something here?

> In such case or deleting and recreating disk partitions, the block
> device and the inode will be released and the block size will be
> reinitialized at bdget(), so block size configuration will not be kept.

Hmmm? Media change doesn't kill the inode or bdev. It just invalidates it. I suppose it'd be the correct behavior to reset it tho.

> Unfortunately, we do not also know the use case to set the block size by
> using
> blockdev command and guestfish command...

So, it's not like there's anyone who needs it, right? I actually can't see how anyone would make any use of it. It's just an arbitrary number which doesn't do anything at this point. I vote for just updating the documentation and possibly just remove the whole thing in the long term. It doesn't do anything other than confusing people.

Thanks.
Comment 9 Eric Sandeen 2013-09-16 12:55:08 EDT
The blockdev "--setbsz" command could also be considered a "historical" interface; sure, you can set & query block size, but it doesn't say anything about persistence (or usefulness).   Removing the docs from the blockdev(8) manpage is probably fine, although I'm not sure it's really worth doing in RHEL6.

I'd be more curious about why guestfish has this ability - Richard, any input here?  Was this added for a specific purpose or just for blockdev(8) parity?

Thanks,
-Eric
Comment 10 Fujitsu kernel engineers 2013-09-20 02:40:18 EDT
Hi guys,

We understand Tejun and Eric's opinion and agree with your suggestion
because nobody seems to expect the functionality of changing block size
in those commands.
Therefore, please updates the man pages.

> Removing the docs from the blockdev(8) manpage is probably fine, although I'm not sure it's really worth doing in RHEL6

We consider the removing is worth doing in RHEL6, of course also RHEL7 because it
prevents users from confusion and questions about the reset behavior.

Regards,
Masayoshi Mizuma
Comment 11 Richard W.M. Jones 2013-09-20 04:52:03 EDT
I suggest we deprecate these calls in libguestfs too (bug 624334).
I didn't know that the block size was just a historical artifact.
Comment 12 Fujitsu kernel engineers 2013-10-08 04:19:14 EDT
Hi Oguma-san,

How about the current status of fixing the man page of blockdev(8) and guestfish(1)?
It seems that no one is assigned to that yet, so we afraid there is no progress about that.
Could you assign an engineer and fix the man pages by RHEL6.5GA release?

Regards,
Masayoshi Mizuma
Comment 13 Yoko Oguma 2013-10-08 04:38:51 EDT
(In reply to Fujitsu kernel engineers from comment #12)
Hi Mizuma-san,

> How about the current status of fixing the man page of blockdev(8) and
> guestfish(1)?
> It seems that no one is assigned to that yet, so we afraid there is no
> progress about that.
> Could you assign an engineer and fix the man pages by RHEL6.5GA release?

We are sorry that response becomes slow.
We cannot promise whether we can fix this in RHEL6.5GA, however I will confirm this to engineering, and update as soon as possible.

Best Regards,
Yoko Oguma
Comment 14 Richard W.M. Jones 2013-10-08 04:41:33 EDT
I don't think there is time to fix this for RHEL 6.5.  However 6.6 is
possible.

I have filed bugs against RHEL 6.6 and RHEL 7.0 for the
libguestfs issues: bug 624335, bug 1016465.
Comment 15 Richard W.M. Jones 2013-10-08 04:46:40 EDT
And for blockdev (ie. util-linux): bug 1016470, bug 1016471
Comment 16 Richard W.M. Jones 2013-10-08 10:10:09 EDT
Can we make this bug public?  It would help with discussion
of the blockdev patch.
Comment 17 Yoko Oguma 2013-10-08 20:51:12 EDT
Dear Fujitsu,

(In reply to Richard W.M. Jones from comment #16)
> Can we make this bug public?  It would help with discussion
> of the blockdev patch.

Could you please answer the above question? Thank you.

Best Regards,
Yoko Oguma
Comment 18 Fujitsu kernel engineers 2013-10-09 02:17:26 EDT
Hi Richard and Oguma-san,

> Can we make this bug public?

Yes, please.

Regards,
Masayoshi Mizuma
Comment 19 bfan 2013-10-09 06:08:06 EDT
Uncheck group " Red Hat Quality Assurance (internal)"  according to #C18 

(Requested by: Richard W.M. Jones)
Comment 20 RHEL Product and Program Management 2013-10-13 22:29:11 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 22 Fujitsu kernel engineers 2013-11-06 01:20:01 EST
Hi Oguma-san,

(In reply to Richard W.M. Jones from comment #14)
> I have filed bugs against RHEL 6.6 and RHEL 7.0 for the
> libguestfs issues: bug 624335, bug 1016465.

Please add a permission to see Bug 1016465 for us.

Regards,
Masayoshi Mizuma
Comment 23 Yoko Oguma 2013-11-06 01:57:03 EST
(In reply to Fujitsu kernel engineers from comment #22)
Hi Mizuma-san,

> (In reply to Richard W.M. Jones from comment #14)
> > I have filed bugs against RHEL 6.6 and RHEL 7.0 for the
> > libguestfs issues: bug 624335, bug 1016465.
> 
> Please add a permission to see Bug 1016465 for us.

My apologies. I added a permission to Bug 1016465 so that you can check the contents now. Thank you.

Best Regards,
Yoko Oguma
Comment 24 Fujitsu kernel engineers 2013-11-06 03:09:54 EST
Hi Oguma-san,

(In reply to Yoko Oguma from comment #23)
> My apologies. I added a permission to Bug 1016465 so that you can check the
> contents now. Thank you.

We can access Bug 1016465, thank you.

Regards,
Masayoshi Mizuma
Comment 25 Richard W.M. Jones 2013-11-06 06:25:01 EST
I've actually just made the bug completely public.
Comment 26 Jeff Moyer 2014-06-26 13:41:14 EDT
As this was determined to be a documentation bug for userspace utilities, I am closing this bug.

Note You need to log in before you can comment on or make changes to this bug.