Bug 533492

Summary:

[LTC 6.0 FEAT] 201085:zFCP portion of original BZ

Product:

Red Hat Enterprise Linux 6

Reporter:

Denise Dumas <ddumas>

Component:

anaconda

Assignee:

David Cantrell <dcantrell>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Release Test Team <release-test-team-automation>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

6.0

CC:

bhinson, borgan, brueckner, bugproxy, dhorak, diehl, ejratl, gmuelas, jjarvis, jkachuck, jstodola, maier, pknirsch, rlerch, rpacheco, snagar, syeghiay, tao

Target Milestone:

Keywords:

FutureFeature, Reopened

Target Release:

6.0

Hardware:

s390x

OS:

All

Whiteboard:

Fixed In Version:

anaconda-13.21.47-1

Doc Type:

Enhancement

Doc Text:

Story Points:

---

Clone Of:

463544

Environment:

Last Closed:

2010-07-06 19:08:41 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

463544, 576015, 589278

Bug Blocks:

555224, 563347, 582286

Attachments:

Description	Flags
linux-2.6.32-s390-zfcp-unit-remove.patch	none

Comment 1 Denise Dumas 2009-11-06 21:24:30 UTC

This BZ is to address the following portion of the original BZ: 


>    Unmask and wait for appearance of devices needed (interactive and
> kickstart)[linuxrc.s390 already has support; backport for zfcp in anaconda]

As stated, linuxrc.s390 does this.  If zfcp support is missing from this
facility, file another bug and detail just that issue.

Comment 2 Steffen Maier 2009-11-07 14:24:36 UTC

Support for cio_ignore with zfcp in anaconda is upstream (and in rhel6-branch) since commit f2de4e76d7f8b8e7f21de371e42427096909a361.

Comment 3 Hans de Goede 2009-11-17 13:23:42 UTC

(In reply to comment #2)
> Support for cio_ignore with zfcp in anaconda is upstream (and in rhel6-branch)
> since commit f2de4e76d7f8b8e7f21de371e42427096909a361.  

Hmm, so I guess this can be closed then, David ?

Comment 4 David Cantrell 2009-11-18 00:45:31 UTC

Looks like it.  This fix has been present since anaconda-12.16-1.  Moving to MODIFIED.  I imagine we should at least get IBM to verify the functionality they want is there (even though the patch came from Steffen at IBM...).

Comment 5 releng-rhel@redhat.com 2009-11-19 13:35:52 UTC

Fixed in 'anaconda-12.16-1'. 'anaconda-12.38.5-1.el6' included in compose 'RHEL6.0-20091118.1'.
Moving to ON_QA.

Comment 9 Steffen Maier 2009-12-15 22:26:33 UTC

I'm sorry, I only realized this now, but anaconda's zfcp support only has support to unmask but NOT to wait for the appearance of devices that have just been unmasked. Back when the unmasking was implemented in those places, we were not aware that writing to /proc/cio_ignore was asynchronous and would not block. Not waiting for the device appearance might lead to strange error situations and zFCP disks not becoming available.
See also https://bugzilla.redhat.com/show_bug.cgi?id=463544#c15.
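
The missing wait step can be sketched roughly as follows. This is a minimal illustration, not anaconda's code. It assumes the device is unmasked by writing "free <device>" to /proc/cio_ignore and that the device then shows up as an entry under /sys/bus/ccw/devices/. The helper names wait_for_path and unmask_and_wait, the timeout, and the polling interval are hypothetical.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists; return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def unmask_and_wait(devno, cio_ignore="/proc/cio_ignore",
                    sysfs_root="/sys/bus/ccw/devices", timeout=10.0):
    """Unmask a CCW device and block until its sysfs entry appears.

    Assumption: writing "free <devno>" removes the device from the
    cio_ignore blacklist, and the write returns before the device is
    actually visible, which is why the polling step is needed.
    """
    with open(cio_ignore, "w") as f:
        f.write("free %s\n" % devno)
    return wait_for_path(os.path.join(sysfs_root, devno), timeout=timeout)
```

The paths are parameters so the sketch can be tried against a scratch directory on a non-s390 machine.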

Comment 11 Jan Stodola 2010-03-16 15:20:24 UTC

Hello, as mentioned by Steffen in comment 9, anaconda doesn't wait for the appearance of zfcp devices. A zfcp device specified in the CMS config file is not available in the anaconda GUI. When adding a zfcp device in the GUI, the first attempt fails and the second attempt succeeds. Moving back to ASSIGNED.

Comment 12 David Cantrell 2010-03-18 14:45:01 UTC

The storage/zfcp.py code needed updating to block on the cio_ignore free operation.  I've created a patch that has the code use /sbin/zfcp_cio_free from the s390utils package rather than writing our own thing.  I also changed the linuxrc.s390 script to just write out /etc/zfcp.conf rather than writing /tmp/fcpconfig.  The zfcp_cio_free script can just read /etc/zfcp.conf and go from there.
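
The approach described above could look roughly like this sketch. Assumptions: each /etc/zfcp.conf line holds a device number, WWPN, and LUN separated by spaces, and /sbin/zfcp_cio_free is run without arguments so that it reads that file. The function names are hypothetical and this is not the actual storage/zfcp.py code.

```python
import subprocess


def format_zfcp_conf(entries):
    """Render (devnum, wwpn, lun) tuples as zfcp.conf text, one per line."""
    return "".join("%s %s %s\n" % (devnum, wwpn, lun)
                   for devnum, wwpn, lun in entries)


def write_conf_and_free(entries, conf_path="/etc/zfcp.conf",
                        cio_free="/sbin/zfcp_cio_free", run=subprocess.call):
    """Write the config file, then run the helper that unmasks the devices.

    `run` is injectable so the sketch can be exercised without the real
    binary; its return value (the exit status for subprocess.call) is
    passed back to the caller.
    """
    with open(conf_path, "w") as f:
        f.write(format_zfcp_conf(entries))
    return run([cio_free])
```

Delegating to the existing helper keeps the blocking behaviour in one place instead of duplicating it in the installer.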

Comment 13 David Cantrell 2010-03-23 01:53:17 UTC

In order to fix this, I need the /sbin/zfcp_cio_free command to support specifying zFCP devices at the command line.  I filed bug #576015 requesting this and set that bug to block this one.

As for whether this bug is a beta 1 blocker, I personally don't think so.  The problem here is for people booting where zFCP devices are blacklisted.  They can boot up and have those devices excluded from the blacklist to complete an install of beta 1.  We can release note that if we want to.

Comment 16 Phil Knirsch 2010-03-25 17:49:32 UTC

Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
* The cio ignore implementation has not been completed for zFCP devices. In order to avoid installation problems on those devices, the images/generic.prm file needs to have the following entry instead:
    root=/dev/ram0 ro ip=off ramdisk_size=40000

Comment 17 Ryan Lerch 2010-04-15 02:13:10 UTC

added to the Beta1 Release notes as a known issue

Comment 18 David Cantrell 2010-05-01 02:01:03 UTC

*** Bug 587364 has been marked as a duplicate of this bug. ***

Comment 19 David Cantrell 2010-05-11 17:11:24 UTC

*** Bug 589278 has been marked as a duplicate of this bug. ***

Comment 20 Issue Tracker 2010-05-11 18:00:27 UTC

Event posted on 05-11-2010 01:54pm EDT by Glen Johnson

------- Comment From MAIER.com 2010-05-11 13:48 EDT-------
(In reply to comment #29)
> When DASD and ZFCP both are attached to the system and if only ZFCP lun is
> selected for installation then I see the error sometimes. However the error is
> not consistent and I am able to proceed with the install other times with ZFCP
> lun.

This nondeterministic part is indeed a duplicate of RH bug 533492.

> Also if I press back button and go to devices screen again then sometimes ZFCP
> lun is not listed. If I try to attach same lun again, it says the device is
> already attached.

However, this is not. Instead, this is what was already pointed out in
https://bugzilla.redhat.com/show_bug.cgi?id=587364#c15.

David, could you reopen and use this bug here as the "separate problem"
as you named it in
https://bugzilla.redhat.com/show_bug.cgi?id=587364#c16
?


This event sent from IssueTracker by jkachuck 
 issue 840543

Comment 24 David Cantrell 2010-05-25 20:50:50 UTC

Deleted Technical Notes Contents.

Old Contents:
* The cio ignore implementation has not been completed for zFCP devices. In order to avoid installation problems on those devices, the images/generic.prm file needs to have the following entry instead:
    root=/dev/ram0 ro ip=off ramdisk_size=40000

Comment 27 Jan Stodola 2010-05-31 13:35:16 UTC

Performed a number of installations in graphical and text mode, with zFCP LUN(s) defined in the CMS config file and also added manually via the "Add Advanced Target" button. Tested with 1 - 7 zFCP LUNs. Also tested in rescue mode.

anaconda-13.21.48-1.el6, build RHEL6.0-20100527.2

Moving to VERIFIED.

Comment 28 Issue Tracker 2010-06-08 16:35:06 UTC

Event posted on 06-08-2010 12:51am EDT by Glen Johnson

File uploaded: zfcpAddException

This event sent from IssueTracker by jkachuck 
 issue 840543
it_file 745773

Comment 29 Issue Tracker 2010-06-08 16:35:10 UTC

Event posted on 06-08-2010 07:31am EDT by Glen Johnson

------- Comment From MAIER.com 2010-06-08 07:25 EDT-------
(In reply to comment #45)
> Created an attachment (id=54385) [details]
> zfcp Add Execption
>
> This problem is seen in RHEL6.0 Snap5 when tried to add zfcp lun

This is already known as
https://bugzilla.redhat.com/show_bug.cgi?id=595290
and fixed by
http://git.fedorahosted.org/git/?p=anaconda.git;a=commit;h=cbd823f0bf1d74d9c09281cae9b6a4dac9c96eed
in anaconda-13.21.46-1. Therefore, it should be fixed in snap6
(RHEL6.0-20100527.2) which contains anaconda-13.21.48-1.

However, with that fix, you're going to run into what's already known as
https://bugzilla.redhat.com/show_bug.cgi?id=597223
for which currently no fix exists yet.
anaconda 13.21.48 exception report
Traceback (most recent call first):
  File "/usr/lib/anaconda/iw/filter_gui.py", line 69, in __contains__
    return item["name"] in iter(self)
  File "/usr/lib/anaconda/iw/filter_gui.py", line 438, in <lambda>
    mpaths = filter(lambda d: d not in self._cachedMPaths, new_mpaths)
  File "/usr/lib/anaconda/iw/filter_gui.py", line 438, in _add_advanced_clicked
    mpaths = filter(lambda d: d not in self._cachedMPaths, new_mpaths)
TypeError: list indices must be integers, not str

That said, the attached exception is not at all related to
this bug (LTC bug 62837 / RIT 840543 / RH bug 589278) here.



Comment 30 Issue Tracker 2010-06-15 16:11:02 UTC

Event posted on 06-15-2010 06:33am EDT by Glen Johnson

------- Comment From holger.dengler.com 2010-06-15 06:27 EDT-------
Installation failed after activating a ZFCP/SCSI device manually.
A DASD device and a SCSI device were selected as install targets. For more
details, see the logs.



Comment 31 Issue Tracker 2010-06-15 16:11:08 UTC

Event posted on 06-15-2010 06:33am EDT by Glen Johnson

File uploaded: logs.tgz

This event sent from IssueTracker by jkachuck 
 issue 840543
it_file 769603

Comment 32 Issue Tracker 2010-06-16 17:11:58 UTC

Event posted on 06-16-2010 06:36am EDT by Glen Johnson

------- Comment From htengshe.com 2010-06-16 06:30 EDT-------
Hi,
With further testing on RHEL6.0 prebeta I see 2 problems.
1)
In the basic devices screen, if I add a zFCP LUN and then proceed to the
partitioning screen, I see the SCSI LUN. But if I press the back button and go
to the devices screen again, the zFCP LUN is not listed. If I try to attach the
same LUN again, it says the device is already attached.
I need to check if the problem is persistent.

2) In the second case I had some DASDs and I added a zFCP LUN. It was listed in
the "Other SAN Devices" tab. However, when I selected only the zFCP LUN for
installation and pressed Next, it again threw the error "No usable disk
found".
Then the zFCP LUN vanished from the "Other SAN Devices" tab and it was not
present in any other tab either. At this point, if I try to add the same
LUN, it says "the lun is already attached to the system".
However, the LUN is not listed on the screen.

Uploading images for the same.



Comment 33 Issue Tracker 2010-06-16 17:12:01 UTC

Event posted on 06-16-2010 06:36am EDT by Glen Johnson

File uploaded: ZFCPError.JPG

This event sent from IssueTracker by jkachuck 
 issue 840543
it_file 773743

Comment 34 Issue Tracker 2010-06-16 17:12:03 UTC

Event posted on 06-16-2010 06:36am EDT by Glen Johnson

File uploaded: ZFCPError1.JPG

This event sent from IssueTracker by jkachuck 
 issue 840543
it_file 773753

Comment 35 Issue Tracker 2010-06-16 17:12:04 UTC

Event posted on 06-16-2010 06:37am EDT by Glen Johnson

File uploaded: ZFCPError2.JPG

This event sent from IssueTracker by jkachuck 
 issue 840543
it_file 773763

Comment 36 John Jarvis 2010-06-16 17:28:55 UTC

Open separate Bugzillas to report these bugs.

Comment 37 Issue Tracker 2010-06-17 14:47:58 UTC

Event posted on 06-17-2010 04:23am EDT by Glen Johnson

------- Comment From MAIER.com 2010-06-17 04:20 EDT-------
I'm pretty sure, this is all caused by the kernel race between scsi
delete and fcp unit_remove and therefore the same bug.
The last comments were just a verification attempt that failed.



Comment 38 Issue Tracker 2010-06-30 14:27:29 UTC

Event posted on 06-30-2010 06:13am EDT by Glen Johnson

------- Comment From brueckner.ibm.com 2010-06-30 06:05 EDT-------
The patch has been tested and fixes the problem. The patch will be sent
upstream for 2.6.36.

With best regards,
Hendrik



Comment 39 Issue Tracker 2010-06-30 14:27:33 UTC

Event posted on 06-30-2010 06:13am EDT by Glen Johnson

File uploaded: linux-2.6.32-s390-zfcp-unit-remove.patch

This event sent from IssueTracker by jkachuck 
 issue 840543
it_file 818133

Comment 40 Issue Tracker 2010-06-30 14:27:36 UTC

Event posted on 06-30-2010 06:13am EDT by Glen Johnson

<cde:attachment>
Comment on attachment: linux-2.6.32-s390-zfcp-unit-remove.patch

------- Comment on attachment From brueckner.ibm.com 2010-06-30 06:01 EDT-------


Description: zfcp: Remove SCSI device during unit_remove
Symptom:     When issuing the commands to delete a SCSI device and
             then to remove the zfcp unit from a script, the zfcp unit
             remove can fail.
Problem:     The unit_remove will fail when the reference count of the
             unit is not zero. When the SCSI device exists, it holds a
             reference to the unit. The upstream commit
             d9a9cdfb078d755e648d53ec25b7370f84ee5729 changed the
             deletion of a SCSI device to be run asynchronously from a workqueue.
             With this change, the actual removal of the SCSI device
             can run after the unit_remove in zfcp and the
             unit_remove will fail.
Solution:    Get a reference to the SCSI device from the unit_remove
             function and remove the SCSI device from this function.
             If the SCSI device has already been deleted earlier,
             unit_remove cannot get the reference and does nothing.
             If the removal of the SCSI device is running on two threads,
             this is protected by the scan_mutex and the second one will
             exit early.
</cde:attachment>
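
The failure mode and the fix described in this patch can be illustrated with a toy model. This is a deliberately simplified, single-threaded Python sketch, not kernel code: the classes Unit and ScsiDevice and the functions unit_remove_old and unit_remove_fixed are hypothetical names, and the scan_mutex protection is reduced to an idempotent delete().

```python
class Unit:
    """Toy stand-in for a zfcp unit with a reference count."""

    def __init__(self):
        self.refcount = 0
        self.sdev = None       # attached SCSI device, if any
        self.removed = False


class ScsiDevice:
    """Toy SCSI device that holds one reference on its unit while it exists."""

    def __init__(self, unit):
        self.unit = unit
        self.deleted = False
        unit.refcount += 1
        unit.sdev = self

    def delete(self):
        # Idempotent: a second caller finds the device gone and exits early.
        if self.deleted:
            return
        self.deleted = True
        self.unit.refcount -= 1
        self.unit.sdev = None


def unit_remove_old(unit):
    """Fails if the SCSI device (and its reference) still exists."""
    if unit.refcount != 0:
        return False
    unit.removed = True
    return True


def unit_remove_fixed(unit):
    """Removes any remaining SCSI device itself before checking the count."""
    sdev = unit.sdev
    if sdev is not None:
        sdev.delete()
    return unit_remove_old(unit)


# The queued deletion has not run yet when unit_remove is called.
unit = Unit()
sdev = ScsiDevice(unit)
print(unit_remove_old(unit))    # False: the device still holds a reference
print(unit_remove_fixed(unit))  # True: the device is removed first
sdev.delete()                   # the late asynchronous deletion is a no-op
```

The point of the model is only the ordering: once unit removal takes care of the SCSI device itself, it no longer matters whether the queued deletion has already run.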



Comment 41 John Jarvis 2010-06-30 14:51:23 UTC

I'm puzzled why I'm seeing a patch being posted to a BZ that is in VERIFIED state when I requested on June 16 that separate bugzillas be opened to track bug reports.  Why is this?

Comment 43 Thorsten Diehl 2010-06-30 15:50:11 UTC

John, you can find the reason why no new bugzillas were opened in comment 37 from MAIER.com 2010-06-17 04:20 EDT:
> I'm pretty sure, this is all caused by the kernel race between
> scsi delete and fcp unit_remove and therefore the same bug.
> The last comments were just a verification attempt that failed.

But, you're right, this bug should have been reopened.

Comment 44 Thorsten Diehl 2010-06-30 16:18:44 UTC

OK, Hendrik clarified this in comment 42 which is invisible for me; forget my comment 43.

Comment 45 releng-rhel@redhat.com 2010-07-02 20:41:55 UTC

Red Hat Enterprise Linux Beta 2 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

Comment 46 Issue Tracker 2010-07-06 16:45:57 UTC

Event posted on 07-06-2010 04:33am EDT by Glen Johnson

------- Comment From htengshe.com 2010-07-06 03:54 EDT-------
The problem still exists in RHEL6.0 beta2.
I had added a DASD and a zFCP LUN in the parmfile using the following parameters:
DASD=exxx
FCP_1="0.0.3xxx 0x500507630303c562 0x4014402200000000"

During installation I selected only the zFCP LUN. It gave the error "No usable
disks have been found".
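
For reference, an FCP_n entry like the one above can be split into its three fields as in this sketch. It assumes the value is a quoted, space-separated triple of device number, WWPN, and LUN. The function name parse_fcp_entry is hypothetical, and this is not how linuxrc.s390 itself reads the file.

```python
def parse_fcp_entry(line):
    """Parse 'FCP_1="devno wwpn lun"' into (key, devno, wwpn, lun)."""
    key, sep, value = line.strip().partition("=")
    if not sep or not key.startswith("FCP_"):
        raise ValueError("not an FCP_n entry: %r" % line)
    # Drop the surrounding quotes, then split on whitespace.
    fields = value.strip().strip("\"'").split()
    if len(fields) != 3:
        raise ValueError("expected 3 fields, got %d" % len(fields))
    devno, wwpn, lun = fields
    return key, devno, wwpn, lun
```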

------- Comment From htengshe.com 2010-07-06 03:56 EDT-------
Reopening Bug



Comment 47 Issue Tracker 2010-07-06 16:46:05 UTC

Event posted on 07-06-2010 04:33am EDT by Glen Johnson

File uploaded: ZFCPInstallError.JPG

This event sent from IssueTracker by jkachuck 
 issue 840543
it_file 832833

Comment 48 Issue Tracker 2010-07-06 16:46:07 UTC

Event posted on 07-06-2010 04:34am EDT by Glen Johnson

<cde:attachment>
Comment on attachment: ZFCPInstallError.JPG

------- Comment (attachment only) From htengshe.com 2010-07-06 03:55 EDT-------
</cde:attachment>



Comment 49 David Cantrell 2010-07-06 16:56:01 UTC

Please attach the /tmp/anaconda* files so we can see what is going on.

Comment 50 Joseph Kachuck 2010-07-06 18:51:03 UTC

Created attachment 429855 [details]
linux-2.6.32-s390-zfcp-unit-remove.patch

Comment 51 Joseph Kachuck 2010-07-06 18:57:17 UTC

Hello,
This issue can most likely be closed again. It will be worked through BZ 589278.

Thank You
Joe Kachuck

Comment 52 David Cantrell 2010-07-06 19:08:41 UTC

Thanks, Joe.  I've set this bug to depend on bug #589278 and I have moved it back to CLOSED CURRENTRELEASE which was the state at the time of comment #45.