Bug 454872
Summary: | [NetApp 4.8 bug] online resize of filesystem does not work | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Tanvi <tanvi> |
Component: | kernel | Assignee: | Jeff Moyer <jmoyer> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4.8 | CC: | ahecox, andriusb, bmarzins, coughlan, marting, mchristi, naveenr, rlerch, tanvi, tao, xdl-redhat-bugzilla |
Target Milestone: | rc | Keywords: | OtherQA |
Target Release: | 4.8 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Red Hat Enterprise Linux 4.8 can detect online growing or shrinking of an underlying block device. However, there is no method to automatically detect that a device has changed size, so manual steps are required to recognize this and resize any file systems which reside on the given device(s). When a resized block device is detected, a message like the following will appear in the system logs:
VFS: busy inodes on changed media or resized disk sdi
If the block device was grown, then this message can be safely ignored. However, if the block device was shrunk without shrinking any data set on the block device first, the data residing on the device may be corrupted.
It is only possible to do an online resize of a filesystem that was created on the entire LUN (or block device). If there is a partition table on the block device, then the file system will have to be unmounted to update the partition table.
|
Last Closed: | 2009-05-18 19:31:51 UTC | Type: | --- |
Bug Depends On: | 444964, 480338 | ||
Bug Blocks: | 450897, 458123, 458752, 461297, 479684 | ||
Attachments: |
Description
Tanvi
2008-07-10 14:02:31 UTC
This is highly dependent on RHEL 5.3 inclusion, so we'll follow its lead on this. Furthermore, this could be something we may not have capacity for in 4.8, since it will be a small release.

Updating PM score.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Created attachment 317386
wrapper for lower level revalidate_disk routines

Created attachment 317387
adjust block device size after an online resize of a disk

Created attachment 317388
check for device resize when rescanning partitions

Created attachment 317389
scsi sd driver calls revalidate_disk wrapper

Created attachment 317390
add flush_disk to factor out common buffer cache flushing code

Created attachment 317391
call flush_disk after detecting an online resize
The above patches are backports of the upstream patch set from Andrew Patterson. They have not yet been through any sort of testing. I'll update the bug when I have testing results.

Committed in 78.26.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Can we get customer testing on this kernel, please? Thanks!

I tested it with 78.28.EL kernel. Things are not working. I had to unmount and then remount the SCSI device before ext2online could resize the filesystem.

(In reply to comment #14)
> I tested it with 78.28.EL kernel. Things are not working. I had to unmount and
> then remount the SCSI device before ext2online could resize the filesystem.

Thanks for the quick testing turn-around, Tanvi! I'll look into this immediately.

(In reply to comment #14)
> I tested it with 78.28.EL kernel. Things are not working. I had to unmount and
> then remount the SCSI device before ext2online could resize the filesystem.

Hi, Tanvi,

I just tried this with the 78.29.EL kernel, and it works for me. Could you provide more information on your test procedure so that I can try to reproduce the problem? These are the steps I took:

    service iscsi start
    mkfs -t ext3 /dev/sdi
    mount /dev/sdi /mnt/equallogic/
    cd /mnt/equallogic/
    touch x
    touch y
    dd if=/dev/zero of=foo bs=1M count=100
    sync
    sync
    # login to iscsi target and resize the lun
    iscsi-rescan
    ext2online /dev/sdi
    df -h .

Thanks!

Hi Jeffrey,

I did follow the same steps. I retested with 78.29.EL kernel and was able to resize an online SCSI device. Thank you.

But to resize a file system which was created on top of a multipathed device, I had to follow these steps:

1. unmount the device
2. flush the map
3. create the map again
4. remount it
5. ext2online

Is it expected to be fixed in RHEL4.8?

Hi, Tanvi,

Sorry I didn't test with a multipath device! I just went ahead and did so, and I got it to work, but it's even worse than RHEL 5! For the most part, the procedure is the same as the RHEL 5 procedure.
I'll spell it out here, though:

    service iscsi start
    service multipathd start
    mkfs -t ext3 /dev/mpath/mpathX
    mount /dev/mpath/mpathX /mnt/equallogic/
    cd /mnt/equallogic/
    touch x
    touch y
    dd if=/dev/zero of=foo bs=1M count=100
    # resize the iscsi target
    iscsi-rescan
    dmsetup table mpathX > /root/newtab
    # modify newtab to have the new end sector of the device in column 2
    dmsetup suspend /dev/mpath/mpathX
    dmsetup reload /dev/mpath/mpathX /root/newtab
    dmsetup resume /dev/mpath/mpathX
    # and now the stupid part. ext2online sees that /dev/mpath/mpathX is a
    # symbolic link to /dev/dm-X, and so looks in /etc/mtab for /dev/dm-X.
    # Of course, that doesn't exist, so it fails. So, I modified /etc/mtab to
    # put the dm-X device in place of the mpath device, and:
    ext2online /dev/mpath/mpath9

And that worked for me. Strangely enough, I couldn't use --force, nor could I just pass in /dev/dm-X. I'm quite puzzled by ext2online's reticence to actually do what you want. I'll file a bug on that.

Now, I know a bug was filed to update the user-space tools to allow this online resizing to be less painful using the multipath utilities in RHEL 5. Has the same bug been filed for RHEL 4? If not, we should dup the RHEL 5 bug to RHEL 4.

I'll get to work on the ext2online bug. Thanks again for your patience and your testing, Tanvi. It is much appreciated!

I filed bug 480338 to track the e2fsprogs (ext2online) issue.

Yes, the user-space bug for multipath utilities has been cloned for RHEL 4.8. It is tracked at bugzilla #479684.

Did you try just running

    # multipath

after the underlying block device has been resized? In RHEL4, it should already do the same thing as the manual method in Comment #18 did.
Strangely, when I try this, ext2online fails for me:

    [root@ask-06 mnt]# ext2online /dev/mapper/mpath7
    ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
    error: Input/output error: read -1 of 16384 bytes at 4096

However, it fails just the same using the method in Comment #18, so I'm not sure if this is a completely unrelated problem.

(In reply to comment #21)
> Did you try just running
> # multipath
> After the underlying block device has been resized. In RHEL4, it should
> already do the same thing as the manual method in Comment #18 did.

Hi, Ben. I'll assume your question is addressed to me. No, I didn't try that, and yes, it does work.

> Strangely, when I try this, ext2online fails for me
>
> [root@ask-06 mnt]# ext2online /dev/mapper/mpath7
> ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
> error: Input/output error: read -1 of 16384 bytes at 4096
>
> However, it fails just the same using the method in Comment #18, so I'm not
> sure if this is a completely unrelated problem.

I've never seen that. You'll need to provide a whole lot more information if we're to debug it, though.

I'm using an X86_64 box with the RHEL 4.8 1/4/09 nightly build installed. It's connected to a Winchestor storage array via FC using the qla2400 driver. I'm running the 2.6.9-78.30.ELsmp kernel. The multipath device I'm using looks like:

    mpath7 (3600d0230000000000e13955cc3757806)
    [size=48 GB][features="0"][hwhandler="0"]
    \_ round-robin 0 [prio=1][active]
     \_ 5:0:0:6 sdh 8:112 [active][ready]

The commands I'm running are:

    # multipath
    # mkfs -t ext3 /dev/mapper/mpath7
    # mount /dev/mapper/mpath7 /mnt/test
    # echo 1 > /sys/block/sdh/device/rescan
    # multipath
    # ext2online

After more testing, I've found that it works just fine if I start with 51200000 block device, and resize to a 307200000 block device. However, it fails when I try to go from a 51200000 block device to a 614400000 block device. I'm not sure exactly at what size it starts failing, but it does seem to be size dependent.
Some more information. I was wrong. It seems to happen randomly. It just looked like it was size dependent for a couple of runs. Also, when this happens, all IO to the device seems to fail. If you try to do a dd from the multipath device, it will fail as well. However, unmounting the filesystem fixes this.

I found this in your system logs:

    Jan 22 09:46:47 ask-06 kernel: kjournald starting. Commit interval 5 seconds
    Jan 22 09:46:47 ask-06 kernel: EXT3 FS on dm-2, internal journal
    Jan 22 09:46:47 ask-06 kernel: EXT3-fs: mounted filesystem with ordered data mode.
    Jan 22 09:47:17 ask-06 kernel: end_request: I/O error, dev sdh, sector 8208
    Jan 22 09:47:17 ask-06 kernel: device-mapper: dm-multipath: Failing path 8:112.
    Jan 22 09:47:17 ask-06 kernel: Buffer I/O error on device dm-2, logical block 1027
    Jan 22 09:47:17 ask-06 kernel: lost page write due to I/O error on dm-2
    Jan 22 09:47:17 ask-06 kernel: end_request: I/O error, dev sdh, sector 12312
    Jan 22 09:47:17 ask-06 kernel: Buffer I/O error on device dm-2, logical block 1539
    Jan 22 09:47:17 ask-06 kernel: lost page write due to I/O error on dm-2
    Jan 22 09:47:17 ask-06 kernel: end_request: I/O error, dev sdh, sector 8
    Jan 22 09:47:17 ask-06 kernel: Buffer I/O error on device dm-2, logical block 1
    Jan 22 09:47:17 ask-06 kernel: lost page write due to I/O error on dm-2
    Jan 22 09:47:17 ask-06 kernel: Buffer I/O error on device dm-2, logical block 1026
    Jan 22 09:47:17 ask-06 kernel: lost page write due to I/O error on dm-2
    Jan 22 09:48:51 ask-06 kernel: SCSI device sdh: 1228800000 512-byte hdwr sectors (629146 MB)
    Jan 22 09:48:51 ask-06 kernel: SCSI device sdh: drive cache: write back
    Jan 22 09:48:51 ask-06 kernel: sdh: detected capacity change from 52428800000 to 629145600000

/dev/mpath/mpath7 is a symbolic link to /dev/dm-2. It looks like those I/O errors were present before any resizing was done. Is that right?
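The figures in the kernel log above can be cross-checked with a little shell arithmetic. The sketch below is illustrative only: the helper names `sectors_to_bytes` and `sector_to_block` are made up for this example, and the 4096-byte block size is an assumption (the log does not state it, although it is consistent with the logged sector and block numbers).

```shell
#!/bin/sh
# Hypothetical helpers for reading the log above; not part of any tool.

# Convert a count of 512-byte sectors to bytes.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Map a 512-byte sector number on sdh to a logical block number on dm-2.
# ASSUMPTION: 4096-byte blocks unless a second argument says otherwise.
sector_to_block() {
    echo $(( $1 * 512 / ${2:-4096} ))
}

sectors_to_bytes 1228800000   # prints 629145600000, the new capacity in the log
sector_to_block 8208          # prints 1026
sector_to_block 12312         # prints 1539
sector_to_block 8             # prints 1
```

Under that assumption, the three failing sectors reported for sdh correspond to logical blocks 1026, 1539 and 1 on dm-2, so the dm-2 buffer errors appear to be the same failed writes seen through the multipath map rather than a separate problem.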
You can read the first 4k of the multipath disk just fine:

    [root@ask-06 mnt]# dd if=/dev/mapper/mpath7 of=/dev/null bs=4k count=1
    1+0 records in
    1+0 records out

But try to read the next 4k and you get a failure:

    [root@ask-06 mnt]# dd if=/dev/mapper/mpath7 of=/dev/null bs=4k count=2
    dd: reading `/dev/mapper/mpath7': Input/output error
    1+0 records in
    1+0 records out

I/O to the underlying sd device works just fine:

    [root@ask-06 mnt]# dd if=/dev/sdh of=/dev/null bs=4k count=2
    2+0 records in
    2+0 records out

I'd like to know what caused the I/O errors in the first place. Ben mentioned that it might have happened during the online resize, as he has to unmap the LUN before growing it. At any rate, the problem is that, once you set the PG_error bit for the page cache page, a regular read on the device file will always and forever see an error. I proposed this patch upstream:

http://lkml.org/lkml/2009/1/23/288

A simple way around the problem is to mmap the device and read from the locations that are giving I/O errors (but that's hardly acceptable!).

So, if the I/O errors are indeed from taking the LUN offline, then we will have to update our documentation to perhaps suggest suspending the device mapper device before doing the resize of the storage device.

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Red Hat Enterprise Linux 4.8 can detect online growing or shrinking of an underlying block device. However, there is no method to automatically detect that a device has changed size, so manual steps are required to recognize this and resize any file systems which reside on the given device(s).
When a resized block device is detected, a message like the following will appear in the system logs:
VFS: busy inodes on changed media or resized disk sdi
If the block device was grown, then this message can be safely ignored. However, if the block device was shrunk without shrinking any data set on the block device first, the data residing on the device may be corrupted.
It is only possible to do an online resize of a filesystem that was created on the entire LUN (or block device). If there is a partition table on the block device, then the file system will have to be unmounted to update the partition table.

~~ Attention Partners! ~~

RHEL 4.8 Partner Alpha has been released on partners.redhat.com. There should be a fix present in the Beta, which addresses this bug. If you have already completed testing your other URGENT priority bugs, and you still haven't had a chance yet to test this bug, please do so at your earliest convenience, to ensure that only the highest possible quality bits are shipped in the upcoming public Beta drop.

If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. Further questions can be directed to your Red Hat Partner Manager.

Thanks, more information about Beta testing to come.

- Red Hat QE Partner Management

Verified it in RHEL4.8 successfully.

Steps followed - (Taken from Comment 18)

1. Map a LUN from a NetApp Controller (Fibre Channel target)
2. Discover it on the host
3. mkfs -t ext3 /dev/mapper/mpath5
4. mount /dev/mapper/mpath5 mnt1/
5. touch x
   touch y
   dd if=/dev/zero of=foo bs=1M count=100
6. Resize the LUN and then did a rescan on the host
7. dmsetup table mpath5 > /root/newtab
8. Modified newtab to have new end sector of the device.
9. dmsetup suspend /dev/mapper/mpath5
   dmsetup reload /dev/mapper/mpath5 /root/newtab
   dmsetup resume /dev/mapper/mpath5
   ext2online /dev/mapper/mpath5

Online resize happened successfully.
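Step 8 above (editing newtab by hand) could be scripted. The following is a minimal sketch, not the procedure that was actually used for verification: `resize_table_line` is a hypothetical helper, and the sample table line is illustrative rather than captured from the test system. It relies only on the device-mapper table layout, in which the first two fields of each line are the start sector and the length in 512-byte sectors.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the length field (column 2) of one
# device-mapper table line, leaving every other field untouched.
#   $1 = a table line as printed by `dmsetup table <name>`
#   $2 = new length in 512-byte sectors
resize_table_line() {
    printf '%s\n' "$1" | awk -v len="$2" '{ $2 = len; print }'
}

# Illustrative input only (ASSUMPTION: a single-path multipath map of 48 GB).
old='0 100663296 multipath 0 0 1 1 round-robin 0 1 1 8:112 1000'

resize_table_line "$old" 1228800000
# prints: 0 1228800000 multipath 0 0 1 1 round-robin 0 1 1 8:112 1000
```

The new length would have to come from the resized path device after the rescan, for example from /sys/block/<device>/size, which reports the size in 512-byte sectors. The rewritten line would then be saved as /root/newtab before running the dmsetup suspend, reload, and resume commands from step 9.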
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html |