Bug 455929

Summary: [RHEL-5] partprobe fails on s390x on DASD with mounted file systems
Product: Red Hat Enterprise Linux 5 Reporter: Brad Hinson <bhinson>
Component: parted Assignee: David Cantrell <dcantrell>
Status: CLOSED NEXTRELEASE QA Contact: Release Test Team <release-test-team-automation>
Severity: high Docs Contact:
Priority: high    
Version: 5.2 CC: atodorov, ddumas, hdegoede, jkachuck, jlaska, rlerch, syeghiay, tao
Target Milestone: rc Keywords: Patch, Reopened
Target Release: ---   
Hardware: s390x   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Performing a System z installation when install.img is located on a direct access storage device (DASD) disk causes the installer to crash and return a backtrace. anaconda attempts to re-write (commit) all disk labels when partitioning is complete, but fails because the partition is busy. To work around this issue, use a non-DASD source for install.img.
Last Closed: 2010-06-10 15:50:17 UTC Type: ---
Bug Blocks: 499522, 502912, 540752    
Attachments:
Description | Flags
patch to libparted/arch/linux.c | none
patch for dasd. | none
Call ioctl() on temporary read-only fd | none
Call ioctl() on temporary read-only fd ( fix for missing {} ) | none

Description Brad Hinson 2008-07-18 20:33:54 UTC
Description of problem:
Problem originally reported through Anaconda, attempting install from disk
partition on s390.  Just before formatting, error is printed:

Traceback (most recent call first):
File "/usr/lib/anaconda/partedUtils.py", line 895, in savePartitions
  disk.commit()
File "/usr/lib/anaconda/packages.py", line 147, in turnOnFilesystems
  anaconda.id.diskset.savePartitions ()
File "/usr/lib/anaconda/dispatch.py", line 201, in moveStep
  rc = stepFunc(self.anaconda)
File "/usr/lib/anaconda/dispatch.py", line 124, in gotoNext
  self.moveStep()
File "/usr/lib/anaconda/text.py", line 588, in run
  anaconda.dispatch.gotoNext()
File "/usr/bin/anaconda", line 982, in ?
  anaconda.intf.run(anaconda)
error: Warning: The kernel was unable to re-read the partition table on
/dev/dasdc (Device or resource busy).  This means Linux won't know anything
about the modifications you made until you reboot.  You should reboot your
computer before doing anything with /dev/dasdc.


Problem can be reproduced by running partprobe on /dev/dasda (assuming mounted
file system on dasda).  Same error is printed.  On x86, partprobe returns
successfully, regardless of mounted file systems.

Version-Release number of selected component (if applicable):
parted-1.8.1-17.el5

How reproducible:
100%

Steps to Reproduce:
- partprobe /dev/dasda gives error on s390, while partprobe /dev/sda succeeds on
x86.

Comment 1 Brad Hinson 2008-07-18 20:36:10 UTC
Created attachment 312169
patch to libparted/arch/linux.c

Open fd with O_DIRECT if defined.  Previous test for defined(__s390__) and
defined(__s390x__) was incorrect.

Comment 2 Joel Andres Granados 2008-07-20 19:14:25 UTC
This patch is already available in the parted that is going into RHEL 5.3.  Please
test and reopen if the buggy behaviour persists.

Comment 3 Brad Hinson 2008-07-21 15:24:52 UTC
Which version of parted contains this patch?  I don't see it as of
parted-1.8.1-19.el5.

Comment 4 Joel Andres Granados 2008-07-21 16:17:32 UTC
Yep, you are totally right.  I got my cables crossed.  We had already patched
parted upstream, not in RHEL5.  This should be a good candidate for rhel5.4.

The relevant upstream commit is 0faa0c9bd2b170a9ca87a981d32282d1f16c3341.

Thx for the bug report.

Comment 7 Joel Andres Granados 2008-07-28 11:40:34 UTC
Created attachment 312764
patch for dasd.

The patch that was applied upstream is a bit different from the one
contained in comment #1.  I'm going to go with whatever upstream has.

Comment 8 Joel Andres Granados 2008-07-28 12:10:35 UTC
This should be available in parted-1_8_1-20_el5.  I applied the patch in comment
#7 (but only the libparted/arch/linux.c part; the other part was already there).

Comment 10 Brad Hinson 2008-07-28 14:17:13 UTC
With the patch from comment 7, the problem still exists.  Patch in comment 1
does allow partprobe to succeed with mounted file systems.

Do you know the history of why O_DIRECT wasn't allowed for s390/s390x?  Maybe it
wasn't supported in the past, but it definitely is now.  i.e. if O_DIRECT is
defined (on any arch), we should use it.

Comment 11 Joel Andres Granados 2008-07-28 16:31:04 UTC
I'll commit the changes from comment #1 instead of #7.
The reproducer is to execute `partprobe` on an s390x machine that has dasda
storage.
Be sure to have the s390x package and not the s390 one.  This will be in
parted-1.8.1-21.el5.

Comment 13 Joel Andres Granados 2008-10-01 19:35:57 UTC
I'm taking this out of ON_QA because of https://bugzilla.redhat.com/show_bug.cgi?id=463917.  The O_DIRECT check *is* necessary for the correct functioning of parted with DASD devices.
The issue with partprobe may be a different one.

Comment 15 Brad Hinson 2008-10-02 03:53:46 UTC
Created attachment 319189
Call ioctl() on temporary read-only fd

A different approach.  Tested successfully on s390x and x86_64, but needs testing on related bug 463917.

Comment 16 Joel Andres Granados 2008-10-02 15:52:25 UTC
This is a good workaround, and it actually *does* work.  But it doesn't address the real problem.  In the patch you open a file descriptor with RD_MODE.  The thing is, it doesn't really matter whether it's RD_MODE or RW_MODE (it works with that one as well).  What matters is that it avoids the LINUX_SPECIFIC macro, which does something funky with dasda/s390.
Thx for the patch.

Comment 17 Joel Andres Granados 2008-10-03 11:22:20 UTC
OK, I think the posted patch has some strangeness to it that makes it misbehave.  It's funny how I didn't see this before, but there are a couple of "{}" missing from the section where it checks whether the tmpfd has opened correctly.  The posted patch simply opens a file descriptor, then closes it immediately and returns 0.  It never gets to the ioctl call.

Comment 18 Joel Andres Granados 2008-10-03 16:03:18 UTC
Moreover, this is not an O_DIRECT issue (not directly, anyway).  This is a "device is busy" issue.  If you try an ioctl(fd, BLKRRPART) on another arch, it will consistently return a "device busy" error.

Now, a different path is followed on s390x when handling DASD devices.  The path followed for non-DASD devices uses a different ioctl call and other logic.  This is why it does not blow up on other arches.

The question remains: why was a partition mounted/in use while the installer was executing?  More specifically, can the exact way that RHEL was installed be specified, so that we can get a reproducer and I can see exactly where this issue needs to be addressed?

Finally, if there is any bug present in parted, it has to do with the fact that it can't modify partitions while they are being used (on s390 with DASD devices).  I'll further explore the issue to see if we can make parted forcefully modify the partitions.

Comment 19 Denise Dumas 2008-10-03 16:34:29 UTC
Because this is getting too complicated too close to the end of 5.3, I'm moving it to 5.4 to get sorted out.
Denise

Comment 20 Issue Tracker 2008-10-03 20:25:33 UTC
Mike is out of the office right now, but I have a question on this.  

I assume that RHEL 5.4 is targeted for 6 months after 5.3 is released. 
Since 5.3 is tentatively scheduled for Jan 09, I'm guessing that this
means we are looking at about Jul 09 for RHEL 5.4?  Is getting a fix to
the customer on this issue in July 09 the best we can do when they opened
the bug in July 08?  That is a 1 year time-to-fix.


This event sent from IssueTracker by jwest 
 issue 191497

Comment 21 Denise Dumas 2008-10-03 20:38:35 UTC
If we had a reasonably-validated fix at this point I'd try to get it in to 5.3 even though code freeze was two weeks ago. But we don't. The original 'fix' completely blocked all S390 testing, thus our ability to ship Beta, before we backed it out. We can't afford to derail 5.3 again for this one problem, since there are so many other business reasons to keep 5.3 on schedule. Sorry.

Comment 22 Brad Hinson 2008-10-21 20:12:20 UTC
Created attachment 321085
Call ioctl() on temporary read-only fd ( fix for missing {} )

In response to comment 17:
oops.  Corrected patch attached.  Tested and works on busy (mounted) DASD.

In response to comment 18:
The partition is in use because installation was performed from an ISO image on disk, i.e. a disk install, not network.  The ISO image(s) are placed on an existing ext2 file system, say /dev/dasda1 for example, then RHEL is installed to /dev/dasda2, /dev/dasdb1, etc.

As for other arches, please correct me if I'm wrong, but based on partprobe.c, this should be the stack trace for everyone (most recent call last):

main() (partprobe.c)
process_dev() (partprobe.c)
ped_disk_commit_to_os() (libparted/disk.c)
ped_architecture->disk_ops->disk_commit() (libparted/disk.c)
linux_disk_commit() (arch/linux.c)
_kernel_reread_part_table (arch/linux.c)

I think this could ultimately be a difference between how s390 handles ioctl(BLKRRPART) versus other arches.  I modeled this patch (ioctl called on a read-only file descriptor) on /sbin/fdasd, part of s390utils available on z.

This patch should satisfy all arches.
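
The approach described in this comment (open a temporary read-only descriptor, call ioctl(BLKRRPART) on it, close it) can be sketched roughly as below. This is a Python illustration, not the attached C patch; the function name reread_partition_table is hypothetical, and the BLKRRPART value is written out by hand from <linux/fs.h> because Python's fcntl module does not export it.

```python
import fcntl
import os

# BLKRRPART is _IO(0x12, 95) in <linux/fs.h>; spelled out here because
# Python's fcntl module does not provide the constant.
BLKRRPART = 0x125F


def reread_partition_table(device_path):
    """Ask the kernel to re-read the partition table of device_path.

    Opens a temporary read-only descriptor, issues BLKRRPART on it and
    closes it again.  Returns 0 on success or the errno value on failure
    (for example errno.EBUSY when a partition on the device is in use).
    """
    try:
        fd = os.open(device_path, os.O_RDONLY)
    except OSError as exc:
        return exc.errno
    try:
        fcntl.ioctl(fd, BLKRRPART)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        # The descriptor is only needed for the one ioctl call.
        os.close(fd)
```

Calling reread_partition_table("/dev/dasda") on a real system needs root privileges; a non-zero return value corresponds to the "kernel was unable to re-read the partition table" warning quoted in this bug.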

Comment 27 Alexander Todorov 2009-05-27 10:58:30 UTC
Joel,
is the partprobe /dev/$disk reproducer in comment #0 still valid? 

With a recent 5.4/s390x tree I get:

[root@z202 ~]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/dasda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

[root@z202 ~]# rpm -q parted
parted-1.8.1-24.el5
parted-1.8.1-24.el5

[root@z202 ~]# partprobe /dev/dasda
Warning: The kernel was unable to re-read the partition table on /dev/dasda (Device or resource busy).  This means Linux won't know anything about the modifications you made until you reboot.  You should reboot your computer before doing anything with /dev/dasda.

/dev/dasda has 2 partitions, one is mounted on /boot and the other is a PV part of VolGroup00. This sounds like fails_qa if the reproducer is still valid.

Comment 28 Joel Andres Granados 2009-05-27 12:29:22 UTC
I put this patch into the RHEL 5 parted in the hope that the tests from Brad were sufficient.  I now realize that we have a misunderstanding.

Brad:
Can you confirm that the current parted for RHEL5 (the one that has your patch) fails the partprobe test?

If it fails, what was different in your test environment, the one that led you to believe what was said in comment #22?  There is clearly something that you had on that test machine that is not present in a normal RHEL install.  I think if we identify that aspect, we can move forward on this issue (though I still strongly believe that one should not be able to partition a mounted disk).

If your test succeeds, then there is something wrong with the rhts tests and we need to find out what that difference is and move forward from there.

Alex: The reproducer is still valid.  I'm fine with this going to fails_qa so we can have a better look at it.

Comment 29 Alexander Todorov 2009-05-27 12:57:23 UTC
moving back to assigned based on comment #28

Comment 30 Brad Hinson 2009-05-27 20:54:06 UTC
I've lost my original environment, and now unfortunately I'm getting the same error when running partprobe.  The patch in comment 22 did fix the issue for me back then, but now it doesn't appear to be working.

Looking deeper into this now.

Comment 31 Denise Dumas 2009-06-02 19:16:40 UTC
Comment 23 redux - If we had a reasonably-validated fix at this point I'd try to get it in to 5.4 even though code freeze was two weeks ago. But we don't. And once again this 'fix' completely blocked all S390 testing, thus our ability to ship Beta.

Sorry, moving this one to 5.5.  We'll do one more pass at this, but I'm going to ask that this be validated independently.

Comment 32 Joel Andres Granados 2009-06-03 09:01:13 UTC
I'm rebuilding parted for rhel5.  Reverting the change.  In this way parted for rhel5.4 will not have the patch and we can explore this further for rhel5.5.

Comment 33 Joel Andres Granados 2009-06-03 09:24:41 UTC
scratch that, can't do a rebuild as the bug state disallows me from committing.  I told rel-eng to tag the previous version.

Comment 34 Joel Andres Granados 2009-08-14 11:07:07 UTC
Brad:
Parted does not allow modifying a mounted partition.  This, IMO, is the way it should be.  Modifying the partition table on something that is mounted is just crazy.  I think we should try to solve this issue from anaconda's perspective.  What was the procedure, from anaconda's point of view, to reproduce this issue?

What I want to do is to make sure that whatever we are doing with parted, is done to stuff that is not mounted.

Comment 35 Brad Hinson 2009-08-14 17:39:01 UTC
I was under the impression this was possible.  On any x86* machine, partprobe returns success even on a mounted file system:

# uname -a
Linux xxx 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
# mount | grep sda
/dev/sda2 on / type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
# partprobe /dev/sda
# echo $?
0

If I try this same test on s390, partprobe fails with the same error anaconda fails with (traceback in comment 1).  So I assumed it was the same issue.

To reproduce in anaconda, all that's required is to attempt a hard drive install to DASD.  When anaconda goes to format the disks, you get the traceback in comment 1.

Comment 36 Joel Andres Granados 2009-08-17 09:21:34 UTC
Brad:
Your test looks good,  but what is happening is not a "full partition probe".  What is happening with devices that are _not_ DASD is a "best effort probe".
The probe code tells the kernel to delete all the partitions from its tables and then, immediately after, it adds all the partitions back (adding new or deleting old partitions in the process).  Now, when it receives an error on the BLKPG_DEL_PARTITION ioctl call, it ignores that partition.  It assumes that the ioctl call failed because the partition was mounted.  Since this behavior is expected, no error is thrown.

With DASD devices, the path is different.  Instead of returning after the process I just described, it does a BLKRRPART, which is not expected to fail.  Hence the error.

Since non-DASD devices are handled at partition granularity, we can handle the error after the BLKPG_DEL_PARTITION call.  But since DASD devices are handled with just one ioctl call (BLKRRPART), we cannot be sure whether it failed because the partition was mounted or for some other reason.  So an error seems reasonable to me there.

The solution might be to have a special code path for dasd devices in disk commit.  Does this behavior occur in RHEL6?
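
The difference between the two paths described in this comment can be illustrated with a toy model. This is only a sketch: FakeKernel and both probe functions are hypothetical stand-ins written for this example, not libparted code, and the real ioctl calls are replaced by plain Python methods.

```python
import errno


class FakeKernel:
    """Toy stand-in for the kernel's view of one disk (not a real API)."""

    def __init__(self, partitions, busy):
        self.partitions = set(partitions)  # partition numbers the kernel knows
        self.busy = set(busy)              # partitions that are mounted/in use

    def del_partition(self, number):
        # Models BLKPG_DEL_PARTITION: refuses to drop a busy partition.
        if number in self.busy:
            raise OSError(errno.EBUSY, "Device or resource busy")
        self.partitions.discard(number)

    def add_partition(self, number):
        # Models BLKPG_ADD_PARTITION.
        self.partitions.add(number)

    def reread_table(self, on_disk):
        # Models BLKRRPART: all-or-nothing, fails if anything is busy.
        if self.busy & self.partitions:
            raise OSError(errno.EBUSY, "Device or resource busy")
        self.partitions = set(on_disk)


def best_effort_probe(kernel, on_disk):
    """Non-DASD path: per-partition; failed deletions are silently skipped."""
    skipped = set()
    for number in sorted(kernel.partitions):
        try:
            kernel.del_partition(number)
        except OSError:
            # Assumed to be mounted (and unchanged), so no error is reported.
            skipped.add(number)
    for number in on_disk:
        if number not in skipped:
            kernel.add_partition(number)
    return skipped


def full_probe(kernel, on_disk):
    """DASD path: one whole-device call; any busy partition is an error."""
    kernel.reread_table(on_disk)
```

With one busy partition, best_effort_probe returns normally and reports the skipped partition, while full_probe raises OSError with EBUSY, which matches the behaviour difference seen between x86 and s390 in this bug.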

Comment 37 Denise Dumas 2009-09-10 13:12:08 UTC
reassigning to anaconda to understand why we are trying to modify a mounted partition.  Once we pinpoint that, we should stop doing it.

Comment 38 Hans de Goede 2009-11-24 10:06:12 UTC
The problem is that we loop over all disks and do a commit() of their partition table independent of whether that table was changed or not.

For non-DASD disks we get away with this because of the behaviour of parted, which
silently ignores commit errors, assuming busy partitions are unchanged
(it does not check this!).  This is changed in RHEL-6, where parted will error out
on any busy partition.  In RHEL-6 we also no longer commit the table for unchanged
disks.

For RHEL-5 this is probably best fixed with a release note, as this only impacts installation using a DASD harddisk source, which is an uncommon scenario and actually fixing this requires disruptive changes to anaconda.
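
The difference between committing every disk and committing only changed disks can be sketched as follows. The Disk class, its dirty flag, and the two save functions are hypothetical names made up for this illustration; they are not the anaconda or pyparted API.

```python
class Disk:
    """Hypothetical stand-in for a parted disk object."""

    def __init__(self, path, busy=False):
        self.path = path
        self.busy = busy        # a partition on this disk is mounted
        self.dirty = False      # partition table was modified in memory
        self.committed = False

    def commit(self):
        # Models a DASD commit: the whole-device re-read fails when the
        # disk has a busy partition.
        if self.busy:
            raise RuntimeError("kernel was unable to re-read %s" % self.path)
        self.committed = True
        self.dirty = False


def save_all(disks):
    """Commit every disk, whether or not its table changed."""
    for disk in disks:
        disk.commit()


def save_changed(disks):
    """Commit only the disks whose partition table actually changed."""
    for disk in disks:
        if disk.dirty:
            disk.commit()
```

In this model, a busy but unmodified install-source disk makes save_all fail before the target disk is written, whereas save_changed never touches the source disk.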

Comment 39 David Cantrell 2009-12-21 01:24:02 UTC
As stated in comment #38, we should note this problem in the release notes for 5.5.  The problem has been corrected upstream, but really isn't worth backporting to RHEL-5.

Candidate text:

"When installing on the s390x platform and using a DASD volume for the installation source, the /sbin/partprobe command cannot be used from a shell during installation.  Running the command will not cause any harm to the system, it just does not return any useful information.  This problem has been corrected upstream and will not be present in RHEL 6.0."

Comment 42 Issue Tracker 2010-01-15 16:00:05 UTC
Event posted on 01-15-2010 10:46am EST by Glen Johnson

------- Comment From gmuelas.com 2010-01-15 10:42 EDT-------
(In reply to comment #45)
> Hello,
> From the last update. The question appears to be:
> Is it supported to use a local drive as install media, then remove the
> media.
> This should be supported.
> I am unsure of the meaning of this question:
> Is this type of installation supported or is it restricted?
> What do you mean by restricted?
> Thank You
> Joe Kachuck
> Status set to: Waiting on Client

Sorry, but that is not the question. The question is:
Is this type (the type described in the first comment of this bugzilla,
see steps to reproduce) of installation supported or is it restricted
(=not supported and documented not to be supported)?

So, is this issue (described in the first comment) fixed (ask QA)?
- If Yes, no need for a Release Notes.
- If No, should it be supported?
If Yes. Then, please continue to work on fixing it and documented in
Release Notes and Installation Guide that with that release is not working
based on the steps described in the first comment.
If No. Then as long as this feature is available in the installer for the
customer, please document in Release Notes and Installation Guide what is
not supported based on the steps described in the first comment. And in
the long term, if this feature should remain to be not supported, please
remove it from the installation path/package so that the customers do not
try to use it.

Thank you!

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by jkachuck 
 issue 212814

Comment 43 David Cantrell 2010-01-20 18:31:47 UTC
Hans,

Is this something that can be fixed in libparted now?  If so, is it something we can fix in the RHEL-5 parted?

Comment 44 Hans de Goede 2010-01-20 19:58:56 UTC
Ok,

so the problem is that for getting the kernel to update its view of the
partition table, for dasd we make a regular reload table call, which fails in the install from dasd case, as 1 of the partitions is busy.

On non dasd disks parted uses blkpg calls instead, but those do not work
on dasd disks. Note that the blkpg approach actually has various issues and that
in RHEL-6 it has been completely dropped and parted uses the reload table call
for all disk types.

The install from harddisk case has been fixed in RHEL-6 by copying install.img (stage2) to the tmpfs stage1 is running from, thus freeing the partition so we
can partition the disk.

IOW I see no feasible way of fixing this and I believe that a release note for this is the best solution.

Regards,

Hans

Comment 45 Issue Tracker 2010-02-09 15:43:53 UTC
Event posted on 02-09-2010 04:11am EST by Glen Johnson

------- Comment From mgrf.com 2010-02-09 04:07 EDT-------
(In reply to comment #49)
> These changes made by hdegoede.
>
> The install from harddisk case has been fixed in RHEL-6 by copying install.img
> (stage2) to the tmpfs stage1 is running from, thus freeing the partition so we
> can partition the disk.
>
> IOW I see no feasible way of fixing this and I believe that a release note for
> this is the best solution.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=455929

Hello Red Hat,
Hello Hans,

Thx for your investigations. Based on that, I agree to a snippet in the release
notes for the R5 stream.
Would you please post your proposal for release notes?

Great thanks in advance


This event sent from IssueTracker by jkachuck 
 issue 212814

Comment 47 Hans de Goede 2010-03-15 09:16:54 UTC
Proposed technical note:

###

Anaconda will back-trace when doing an installation with a dasd source

When doing an s390 installation with stage2 (install.img) located on a dasd
disk, anaconda will exit with a back-trace. This is caused by anaconda re-writing
(committing) all disk labels when completing partitioning; the commit on the
dasd disk holding stage2.img fails. This commit fails because the partition
is busy and thus the re-reading of the partition table by the kernel fails.

This issue can be worked around by using a stage2 source other than a dasd disk.

###

David, I'm moving this over to you for proofreading (as this is an s390 bug) and
for corrections where necessary.

Comment 48 David Cantrell 2010-03-18 13:45:18 UTC
Hans' technical note for this issue looks fine.

Comment 49 Hans de Goede 2010-03-18 14:54:21 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Anaconda will back-trace when doing an installation with a dasd source

When doing an s390 installation with stage2 (install.img) located on a dasd disk, anaconda will exit with a back-trace. This is caused by anaconda re-writing
(committing) all disk labels when completing partitioning; the commit on the dasd disk holding stage2.img fails. This commit fails because the partition is busy and thus the re-reading of the partition table by the kernel fails.

This issue can be worked around by using a stage2 source other than a dasd disk.

Comment 50 Ryan Lerch 2010-03-19 03:17:05 UTC
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,6 +1 @@
-Anaconda will back-trace when doing an installation with a dasd source
+Performing a System z installation when the install.img  is located on direct access storage device (DASD) disk, will cause the installer to crash, returning a backtrace. anaconda is attempting to re-write (commit) all disk labels when partitioning is complete, but is failing because the partition is busy. To work around this issue, a non-DASD source should be used for install.img.
-
-When doing an s390 installation with stage2 (install.img) located on a dasd disk anaconda will exit with a back-trace. This is caused by anaconda re-writing
-(comitting) all disk labels when completing partitioning and the commit on the dasd disk holding stage2.img fails. This commit fails because the partition is busy and thus the re-reading of the partition table by the kernel fails.
-
-This issue can be worked around by using another stage2 source then a dasd disk.

Comment 53 David Cantrell 2010-06-10 15:50:17 UTC
This issue has been addressed in RHEL 6.0 and will not be addressed in RHEL-5.