Bug 455929
| Field | Value |
|---|---|
| Summary | [RHEL-5] partprobe fails on s390x on DASD with mounted file systems |
| Product | Red Hat Enterprise Linux 5 |
| Reporter | Brad Hinson <bhinson> |
| Component | parted |
| Assignee | David Cantrell <dcantrell> |
| Status | CLOSED NEXTRELEASE |
| QA Contact | Release Test Team <release-test-team-automation> |
| Severity | high |
| Docs Contact | |
| Priority | high |
| Version | 5.2 |
| CC | atodorov, ddumas, hdegoede, jkachuck, jlaska, rlerch, syeghiay, tao |
| Target Milestone | rc |
| Keywords | Patch, Reopened |
| Target Release | --- |
| Hardware | s390x |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Bug Fix |
| Doc Text | Performing a System z installation when the install.img is located on direct access storage device (DASD) disk, will cause the installer to crash, returning a backtrace. anaconda is attempting to re-write (commit) all disk labels when partitioning is complete, but is failing because the partition is busy. To work around this issue, a non-DASD source should be used for install.img. |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2010-06-10 15:50:17 UTC |
| Type | --- |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 499522, 502912, 540752 |
| Attachments | |
Description
Brad Hinson
2008-07-18 20:33:54 UTC
Created attachment 312169 [details]
patch to libparted/arch/linux.c
Open fd with O_DIRECT if defined. Previous test for defined(__s390__) and
defined(__s390x__) was incorrect.
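
A minimal sketch of the idea in this patch description, not the attached patch itself: add O_DIRECT to the open flags whenever the platform headers define it, instead of keying on architecture macros. The helper name `device_open_flags` is hypothetical. Later comments in this bug report that the architecture check turned out to matter for DASD devices (bug 463917), so treat this purely as an illustration of the proposal.

```c
#define _GNU_SOURCE     /* on glibc, O_DIRECT is usually only visible with this */
#include <assert.h>
#include <fcntl.h>

/* Hypothetical helper (not a libparted function): given an access mode
 * such as O_RDONLY or O_RDWR, return the flags to pass to open(). */
int device_open_flags(int access_mode)
{
    /* Only plain access modes are expected as input. */
    assert((access_mode & ~O_ACCMODE) == 0);

#ifdef O_DIRECT
    /* Feature test on the flag itself, regardless of architecture. */
    return access_mode | O_DIRECT;
#else
    return access_mode;
#endif
}
```

A caller would pass the result straight to open(), for example `open(path, device_open_flags(O_RDWR))`. Keep in mind that I/O on an O_DIRECT descriptor generally needs suitably aligned buffers.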
This patch is already available in the parted that is going in RHEL 5.3, please test and reopen if buggy behaviour persists.

Which version of parted contains this patch? I don't see it as of parted-1.8.1-19.el5.

Yep, you are totally right. I got my cables crossed. We had already patched parted upstream, not in RHEL5. This should be a good candidate for rhel5.4. The relevant commit upstream is 0faa0c9bd2b170a9ca87a981d32282d1f16c3341. Thx for the bug report.

Created attachment 312764 [details]
patch for dasd.

The patch that was applied upstream is a bit different from the one contained in comment #1. I'm going to go with whatever upstream has.

This should be available in parted-1_8_1-20_el5. I applied the patch in comment #7 (but only the libparted/arch/linux.c part; the other part was already there).

With the patch from comment 7, the problem still exists. The patch in comment 1 does allow partprobe to succeed with mounted file systems. Do you know the history of why O_DIRECT wasn't allowed for s390/s390x? Maybe it wasn't supported in the past, but it definitely is now. I.e. if O_DIRECT is defined (on any arch), we should use it.

I'll commit the changes from comment #1 instead of #7.

Reproducer is to execute `partprobe` on an s390x machine that has dasda storage. Be sure to have the s390x package and not the s390 one.

This will be in parted-1.8.1-21.el5.

I'm taking this out of ON_QA because of https://bugzilla.redhat.com/show_bug.cgi?id=463917. The O_DIRECT check *is* necessary for the correct functioning of parted with dasda devs. The issue with partprobe may be another one.

Created attachment 319189 [details]
Call ioctl() on temporary read-only fd

A different approach. Tested successfully on s390x and x86_64, but needs testing on related bug 463917.

This is a good workaround, and it actually *does* work. But it doesn't address the real problem. In the patch you open a file descriptor with RD_MODE.
Thing is, it doesn't really matter if it's RD_MODE or RW_MODE (it works with this one as well). What matters is that it avoids the LINUX_SPECIFIC macro that does something funky with dasda/s390. Thx for the patch.

Ok, I think the posted patch has some strangeness to it that makes it misbehave. It's funny how I didn't see this before, but there are a couple of "{}" missing from the section where it checks to see if the tmpfd has opened correctly. The posted patch simply opens a file descriptor, then closes it immediately and returns 0. It never gets to the ioctl call.

Moreover, this is not an O_DIRECT issue (not directly anyway). This is a "device is busy" issue. If you go and try an ioctl(fd, BLKRRPART) on another arch it will consistently return a "device busy" error. Now, there is a different path followed on s390x while handling dasdx devices. The path followed when *not* on dasd uses a different ioctl call and other logic. This is the reason why it does not blow up on other arches.

The question remains: why was a partition mounted/used while the installer was executing? More specifically, can the exact way that rhel was installed be specified, so we can get a reproducer and I can see exactly where this issue needs to be addressed?

Finally, if there is any bug present in parted, it has to do with the fact that it can't modify partitions while they are being used (on s390 with dasd devices). I'll further explore the issue to see if we can make parted forcefully modify the partitions.

Because this is getting too complicated too close to the end of 5.3, I'm moving it to 5.4 to get sorted out.

Denise Mike is out of the office right now, but I have a question on this. I assume that RHEL 5.4 is targeted for 6 months after 5.3 is released. Since 5.3 is tentatively scheduled for Jan 09, I'm guessing that this means we are looking at about Jul 09 for RHEL 5.4?
Is getting a fix to the customer on this issue in July 09 the best we can do when they opened the bug in July 08? That is a 1 year time-to-fix.

This event sent from IssueTracker by jwest
issue 191497

If we had a reasonably-validated fix at this point I'd try to get it in to 5.3 even though code freeze was two weeks ago. But we don't. The original 'fix' completely blocked all S390 testing, and thus our ability to ship Beta, before we backed it out. We can't afford to derail 5.3 again for this one problem, since there are so many other business reasons to keep 5.3 on schedule. Sorry.

Created attachment 321085 [details]
Call ioctl() on temporary read-only fd ( fix for missing {} )

In response to comment 17: oops. Corrected patch attached. Tested and works on busy (mounted) DASD.

In response to comment 18: The partition is in use because installation was performed from an ISO image on disk, i.e. a disk install, not network. The ISO image(s) are placed on an existing ext2 file system, say /dev/dasda1 for example, then RHEL is installed to /dev/dasda2, /dev/dasdb1, etc.

As for other arches, please correct me if I'm wrong, but based on partprobe.c, this should be the stack trace for everyone (most recent call last):

    main() (partprobe.c)
    process_dev() (partprobe.c)
    ped_disk_commit_to_os() (libparted/disk.c)
    ped_architecture->disk_ops->disk_commit() (libparted/disk.c)
    linux_disk_commit() (arch/linux.c)
    _kernel_reread_part_table (arch/linux.c)

I think this could ultimately be a difference between how s390 handles ioctl(BLKRRPART) versus other arches. I modeled this patch (ioctl called on a read-only file descriptor) on /sbin/fdasd, part of s390utils available on z. This patch should satisfy all arches.

Joel, is the partprobe /dev/$disk reproducer in comment #0 still valid?
With a recent 5.4/s390x tree I get:

    [root@z202 ~]# mount
    /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
    proc on /proc type proc (rw)
    sysfs on /sys type sysfs (rw)
    devpts on /dev/pts type devpts (rw,gid=5,mode=620)
    /dev/dasda1 on /boot type ext3 (rw)
    tmpfs on /dev/shm type tmpfs (rw)
    none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
    sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
    [root@z202 ~]# rpm -q parted
    parted-1.8.1-24.el5
    parted-1.8.1-24.el5
    [root@z202 ~]# partprobe /dev/dasda
    Warning: The kernel was unable to re-read the partition table on /dev/dasda (Device or resource busy). This means Linux won't know anything about the modifications you made until you reboot. You should reboot your computer before doing anything with /dev/dasda.

/dev/dasda has 2 partitions: one is mounted on /boot and the other is a PV that is part of VolGroup00. This sounds like fails_qa if the reproducer is still valid.

I put this patch into rhel5 parted in the hope that the tests from Brad were sufficient. I now realize that we have a misunderstanding.

Brad: Can you confirm that the current parted for RHEL5 (the one that has your patch) fails the partprobe test? If it fails, what was the difference in your test environment, the one that led you to believe what was said in comment #22? There is clearly something that you had on that test machine that is not present in a normal rhel install. I think if we identify that aspect, we can move forward on this issue (though I still strongly believe that one should not be able to partition a mounted disk). If your test succeeds, then there is something wrong with the rhts tests and we need to find out what that difference is and move forward from there.

Alex: The reproducer is still valid. I'm fine with this going to fails_qa so we can have a better look at it.

Moving back to assigned based on comment #28.

I've lost my original environment, and now unfortunately I'm getting the same error when running partprobe.
The patch in comment 22 did fix the issue for me back then, but now it doesn't appear to be working. Looking deeper into this now.

Comment 23 redux - If we had a reasonably-validated fix at this point I'd try to get it in to 5.4 even though code freeze was two weeks ago. But we don't. And once again this 'fix' completely blocked all S390 testing, and thus our ability to ship Beta. Sorry, moving this one to 5.5. We'll do one more pass at this, but I'm going to ask that this be validated independently.

I'm rebuilding parted for rhel5 and reverting the change. This way parted for rhel5.4 will not have the patch and we can explore this further for rhel5.5.

Scratch that, I can't do a rebuild as the bug state disallows me from committing. I told rel-eng to tag the previous version.

Brad: Parted does not allow modifying a mounted partition. This, IMO, is the way that it should be. To modify the partition table on something that is mounted is just crazy. I think we should try to solve this issue from anaconda's perspective. What was the procedure, from anaconda's point of view, to reproduce this issue? What I want to do is to make sure that whatever we are doing with parted is done to stuff that is not mounted.

I was under the impression this was possible. On any x86* machine, partprobe returns success even on a mounted filesystem:

    # uname -a
    Linux xxx 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
    # mount | grep sda
    /dev/sda2 on / type ext3 (rw)
    /dev/sda1 on /boot type ext3 (rw)
    # partprobe /dev/sda
    # echo $?
    0

If I try this same test on s390, partprobe fails with the same error anaconda fails with (traceback in comment 1). So I assumed it was the same issue. To reproduce in anaconda, all that's required is to attempt a harddrive install to DASD. When anaconda goes to format the disks, you get the traceback in comment 1.

Brad: Your test looks good, but what is happening is not a "full partition probe".
What is happening with devices that are _not_ DASD is a "best effort probe". The probe code tells the kernel to delete all the partitions from its tables and then, immediately after, it adds all the partitions back (adding new or deleting old partitions in the process). Now, when it receives an error on the BLKPG_DEL_PARTITION ioctl call, it ignores that partition. It assumes that the ioctl call failed because the partition was mounted. Since this behavior is expected, no error is thrown.

With DASD devices, the path is different. Instead of returning after the process I just described, it does a BLKRRPART, which is not expected to fail, hence the error. Since the non-DASD devices are handled with partition granularity, we can handle the error after the BLKPG_DEL_PARTITION call. But since the DASD devices are handled with just one ioctl call (BLKRRPART), we cannot be sure whether it failed because the partition was mounted or for some other reason. So an error seems reasonable to me there. The solution might be to have a special code path for dasd devices in disk commit.

Does this behavior occur in RHEL6?

Reassigning to anaconda to understand why we are trying to modify a mounted partition. Once we pinpoint that, we should stop doing it.

The problem is that we loop over all disks and do a commit() of their partition table, independent of whether that table was changed or not. For non-dasd disks we get away with this because of the behaviour of parted, where it silently ignores commit errors, assuming busy partitions are unchanged (it does not check this!). This is changed in RHEL-6, where parted will error out on any busy partition. In RHEL-6 we also no longer commit the table for unchanged disks.

For RHEL-5 this is probably best fixed with a release note, as this only impacts installation using a DASD harddisk source, which is an uncommon scenario, and actually fixing this requires disruptive changes to anaconda.
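
The difference between the two code paths described above can be modelled roughly as follows. This is a simplified sketch, not libparted code: the function names are hypothetical, and the ioctl results are passed in as plain errno values rather than obtained from real BLKPG_DEL_PARTITION or BLKRRPART calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the non-DASD "best effort" path.
 * part_errno[i] is the errno from the per-partition delete request
 * (0 on success). Failures are assumed to mean "partition is busy" and
 * are skipped, so the overall operation never reports an error.
 * Returns the number of partitions that were skipped. */
size_t best_effort_skipped(const int *part_errno, size_t n)
{
    assert(part_errno != NULL || n == 0);

    size_t skipped = 0;
    for (size_t i = 0; i < n; i++) {
        if (part_errno[i] != 0) {
            skipped++;      /* ignored, no error surfaces to the caller */
        }
    }
    return skipped;
}

/* Hypothetical model of the DASD path: one whole-disk re-read request.
 * There is no per-partition information, so any failure is reported.
 * Returns 0 on success or a negative errno value. */
int whole_disk_result(int reread_errno)
{
    return reread_errno == 0 ? 0 : -reread_errno;
}
```

With one busy partition out of three, the first function reports a single skipped partition and no failure, while the second turns the same EBUSY into an error for the whole disk. That matches the behaviour difference seen between x86_64 and s390x in this bug.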
As stated in comment #38, we should note this problem in the release notes for 5.5. The problem has been corrected upstream, but really isn't worth backporting to RHEL-5. Candidate text:

"When installing on the s390x platform and using a DASD volume for the installation source, the /sbin/partprobe command cannot be used from a shell during installation. Running the command will not cause any harm to the system, it just does not return any useful information. This problem has been corrected upstream and will not be present in RHEL 6.0."

Event posted on 01-15-2010 10:46am EST by Glen Johnson

------- Comment From gmuelas.com 2010-01-15 10:42 EDT-------
(In reply to comment #45)
> Hello,
> From the last update. The question appears to be:
> Is it supported to use a local drive as install media, then remove the media.
> This should be supported.
> I am unsure of the meaning of this question:
> Is this type of installation supported or is it restricted?
> What do you mean by restricted?
> Thank You
> Joe Kachuck
> Status set to: Waiting on Client

Sorry, but that is not the question. The question is: Is this type of installation (the type described in the first comment of this bugzilla, see steps to reproduce) supported, or is it restricted (= not supported and documented not to be supported)?

So, is this issue (described in the first comment) fixed (ask QA)?
- If Yes, no need for a Release Note.
- If No, should it be supported?
  - If Yes, then please continue to work on fixing it, and document in the Release Notes and Installation Guide that in that release it is not working, based on the steps described in the first comment.
  - If No, then as long as this feature is available in the installer for the customer, please document in the Release Notes and Installation Guide what is not supported, based on the steps described in the first comment.
And in the long term, if this feature is to remain unsupported, please remove it from the installation path/package so that customers do not try to use it. Thank you!

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by jkachuck
issue 212814

Hans,

Is this something that can be fixed in libparted now? If so, is it something we can fix in the RHEL-5 parted?

Ok, so the problem is that to get the kernel to update its view of the partition table, for dasd we make a regular reload table call, which fails in the install-from-dasd case, as one of the partitions is busy. On non-dasd disks parted uses blkpg calls instead, but those do not work on dasd disks. Note that the blkpg approach actually has various issues, and that in RHEL-6 it has been completely dropped and parted uses the reload table call for all disk types.

The install from harddisk case has been fixed in RHEL-6 by copying install.img (stage2) to the tmpfs that stage1 is running from, thus freeing the partition so we can partition the disk.

IOW I see no feasible way of fixing this and I believe that a release note for this is the best solution.

Regards,
Hans

Event posted on 02-09-2010 04:11am EST by Glen Johnson

------- Comment From mgrf.com 2010-02-09 04:07 EDT-------
(In reply to comment #49)
> These changes made by hdegoede.
>
> The install from harddisk case has been fixed in RHEL-6 by copying install.img
> (stage2) to the tmpfs stage1 is running from, thus freeing the partition so we
> can partition the disk.
>
> IOW I see no feasible way of fixing this and I believe that a release note for
> this is the best solution.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=455929

Hello Red Hat, Hello Hans,
Thx for your investigations. Based on that, I agree to a snippet in the release notes for the R5 stream. Would you please post your proposal for the release notes?
Great thanks in advance

This event sent from IssueTracker by jkachuck
issue 212814

Proposed technical note:

###
Anaconda will back-trace when doing an installation with a dasd source

When doing an s390 installation with stage2 (install.img) located on a dasd disk anaconda will exit with a back-trace. This is caused by anaconda re-writing (comitting) all disk labels when completing partitioning and the commit on the dasd disk holding stage2.img fails. This commit fails because the partition is busy and thus the re-reading of the partition table by the kernel fails.

This issue can be worked around by using another stage2 source then a dasd disk.
###

David, I'm moving this over to you for proof reading (as this is an s390 bug) and for corrections where necessary.

Hans' technical note for this issue looks fine.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Anaconda will back-trace when doing an installation with a dasd source

When doing an s390 installation with stage2 (install.img) located on a dasd disk anaconda will exit with a back-trace. This is caused by anaconda re-writing (comitting) all disk labels when completing partitioning and the commit on the dasd disk holding stage2.img fails. This commit fails because the partition is busy and thus the re-reading of the partition table by the kernel fails.

This issue can be worked around by using another stage2 source then a dasd disk.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,6 +1 @@
-Anaconda will back-trace when doing an installation with a dasd source
+Performing a System z installation when the install.img is located on direct access storage device (DASD) disk, will cause the installer to crash, returning a backtrace. anaconda is attempting to re-write (commit) all disk labels when partitioning is complete, but is failing because the partition is busy. To work around this issue, a non-DASD source should be used for install.img.
-
-When doing an s390 installation with stage2 (install.img) located on a dasd disk anaconda will exit with a back-trace. This is caused by anaconda re-writing
-(comitting) all disk labels when completing partitioning and the commit on the dasd disk holding stage2.img fails. This commit fails because the partition is busy and thus the re-reading of the partition table by the kernel fails.
-
-This issue can be worked around by using another stage2 source then a dasd disk.

This issue has been addressed in RHEL 6.0 and will not be addressed in RHEL-5.
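
For reference, the read-only file descriptor approach from attachments 319189 and 321085 might look roughly like the following. This is a sketch under assumptions, not the attached patch: the function name `reread_partition_table` is hypothetical, and the only kernel interface used is the standard Linux `BLKRRPART` ioctl from `<linux/fs.h>`. The braces around the error path are the detail the first version of the patch was reported to be missing. Per the later comments, the call can still fail with "Device or resource busy" when a partition on the disk is mounted.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* BLKRRPART */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to re-read the partition table of
 * the block device at `path` through a temporary read-only descriptor.
 * Returns 0 on success or a negative errno value. */
int reread_partition_table(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        /* Braces matter: bail out only when open() actually failed. */
        return -errno;
    }

    int rc = 0;
    if (ioctl(fd, BLKRRPART) < 0) {
        rc = -errno;    /* e.g. -EBUSY when a partition is in use */
    }

    close(fd);
    return rc;
}
```

A caller such as a partprobe-like tool would check the return value and report the errno text to the user. The sketch does not attempt to force a re-read on a busy disk.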