Bug 510524

Summary: fix zFCP in anaconda and also make it work with changed sysfs interface of device driver
Product: [Fedora] Fedora Reporter: Steffen Maier <maier>
Component: anaconda Assignee: David Cantrell <dcantrell>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhide CC: hpicht, karsten, rmaximo, vanmeeuwen+fedora
Hardware: s390x   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2009-08-25 05:33:48 EDT Type: ---
Bug Blocks: 467765    
Attachments:
Description Flags
[PATCH 1/5] correctly activate zFCP LUN on s390
none
[PATCH 2/5] correctly delete a SCSI device provided by a zFCP LUN on s390
none
[PATCH 3/5] correctly deactivate zFCP LUN on s390
none
[PATCH 4/5] error messages of zFCP on s390: log or pass to the UI
none
[PATCH 5/5] prevent getting started up or shutdown again while already in such state
none

Description Steffen Maier 2009-07-09 12:06:00 EDT
Description of problem:
zFCP LUNs (more specifically, disks in this context) cannot be activated in anaconda, neither by specifying FCP_* options in the parmfile or conffile nor by using "add zFCP" in the advanced storage configuration of the GUI.

On going back through the wizard screens in anaconda, the storage subsystem gets shut down at roughly the first screen. If zFCP LUNs were active before, they are shut down incorrectly, causing all kinds of kernel error messages, e.g. SCSI devices can no longer access their LUN.

Version-Release number of selected component (if applicable):
anaconda-11.5.0.51-1.fc11.s390x

How reproducible:
In parm file, conf file, or the anaconda UI, try to add zFCP LUNs.
With activated zFCP LUNs, go backwards in the anaconda UI up to the first screen.

Actual results:
zFCP LUNs (disks) cannot be activated and also do not get deactivated correctly.

Expected results:
zFCP LUNs can be activated and deactivated by all means provided by anaconda.

Additional info:
Patch with fix will follow.

More details can be found in a somewhat related bug against RHEL 5.3:
Bug 494033 - upgrade on FCP disks impossible (possibly also on iSCSI)

The essential parts of #494033 which apply here:

> 13:57:12 INFO    : moving (1) to step partitionobjinit
> 13:57:12 DEBUG   : echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_add
> 13:57:12 DEBUG   : echo 0x401040ea00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_add
> 13:57:12 DEBUG   : echo 1 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
> 13:57:12 DEBUG   : echo 0x401040eb00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_add
> 13:57:12 DEBUG   : echo 1 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
> 13:57:12 DEBUG   : echo 0x401040ea00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_add
> 13:57:12 WARNING : error bringing zfcp device 0.0.3c1b online: [Errno 22] Invalid argument
> 13:57:12 DEBUG   : echo 0x401040eb00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_add
> 13:57:12 WARNING : error bringing zfcp device 0.0.3c1b online: [Errno 22] Invalid argument
> 13:57:12 DEBUG   : starting mpaths

The steps are executed in the wrong order. Also, anaconda seems to try
adding each LUN twice. Since the drives appear, it does not seem to
matter. However, I strongly suggest getting it right in order not to
provoke any other issues in the future. This would be the correct
order with a simplified scheme (not taking into account which LUNs are
on the same WWPN, i.e. whether that WWPN has already been added before
the first of its LUNs):

forall defined FCP disks do
1) set FCP adapter device online
2) add WWPN to adapter
3) add LUN to WWPN
done

I.e. the above log should come out as follows:

> echo 1 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
> echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_add
> echo 0x401040ea00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_add
> echo 1 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
> echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_add
> echo 0x401040eb00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_add

The absolutely correct version would be (but it probably requires too
much code for checking dependencies, and the above also works):

> echo 1 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
> echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_add
> echo 0x401040ea00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_add
> echo 0x401040eb00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_add
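
The deduplicated order above can be sketched in Python as a function that only computes the sequence of sysfs writes and leaves the actual writing to the caller. This is a minimal sketch, not anaconda's code: the function name `zfcp_activation_writes` and the `(adapter, wwpn, lun)` tuple layout are assumptions made for illustration. The sysfs paths are the ones quoted in the log above; since the bug summary mentions a changed sysfs interface of the device driver, they may not apply to every kernel.

```python
ZFCP_SYSFS = "/sys/bus/ccw/drivers/zfcp"  # path as quoted in the log above


def zfcp_activation_writes(disks):
    """Return ordered (value, path) pairs for activating zFCP LUNs.

    `disks` is an iterable of (adapter, wwpn, lun) string tuples; this
    layout is a hypothetical choice for the sketch. Each adapter is set
    online once and each WWPN is added once, before any of its LUNs.
    """
    writes = []
    online = set()  # adapters already set online
    ports = set()   # (adapter, wwpn) pairs already added
    for adapter, wwpn, lun in disks:
        base = "%s/%s" % (ZFCP_SYSFS, adapter)
        if adapter not in online:
            # 1) set FCP adapter device online
            writes.append(("1", base + "/online"))
            online.add(adapter)
        if (adapter, wwpn) not in ports:
            # 2) add WWPN to adapter
            writes.append((wwpn, base + "/port_add"))
            ports.add((adapter, wwpn))
        # 3) add LUN to WWPN
        writes.append((lun, "%s/%s/unit_add" % (base, wwpn)))
    return writes
```

For the two LUNs from the log, this yields the four writes of the deduplicated listing. Keeping the ordering logic separate from the file I/O makes it possible to check the order without s390 hardware.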

> 13:57:13 DEBUG   : done starting mpaths.  Drivelist: ['sda', 'sdb', 'dasda']
> 13:57:13 INFO    : pv is /dev/dasda2 in vg VolGroup00, size is 6943
> 13:57:13 INFO    : vg VolGroup00, size is 6912, pesize is 32768
> 13:57:13 DEBUG   : VolumeGroupRequestSpec('VolGroup00').preexist_size is 6912.0
> 13:57:13 INFO    : lv is VolGroup00/LogVol00, size of 4896
> 13:57:13 INFO    : lv is VolGroup00/LogVol01, size of 2016
> 13:57:13 DEBUG   : /dev/VolGroup00/LogVol00 not probed as ext4dev
> 13:57:13 DEBUG   : /dev/VolGroup00/LogVol00 not probed as ext4
> 13:57:13 INFO    : moving (1) to step parttype

OK, the SCSI disks are there now. To my surprise the VolGroup01 on sda
does not appear in the log.

> 13:57:38 INFO    : moving (-1) to step partitionobjinit
> 13:57:38 DEBUG   : removing drive dasda from disk lists
> 13:57:38 DEBUG   : removing drive sda from disk lists
> 13:57:38 DEBUG   : removing drive sdb from disk lists
> 13:57:38 DEBUG   : echo 1 > /sys/bus/scsi/devices/0:0:0:1/delete
> 13:57:38 DEBUG   : echo 0 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
> 13:57:38 DEBUG   : echo 0x401040ea00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_remove
> 13:57:38 DEBUG   : echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_remove
> 13:57:38 WARNING : error bringing zfcp device 0.0.3c1b offline: [Errno 6] No such device or address
> 13:57:38 DEBUG   : echo 1 > /sys/bus/scsi/devices/0:0:0:2/delete
> 13:57:38 DEBUG   : echo 0 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
> 13:57:38 DEBUG   : echo 0x401040eb00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_remove
> 13:57:38 DEBUG   : echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_remove
> 13:57:38 INFO    : moving (-1) to step findinstall

Even worse, when I step back, hoping that the SCSI LUNs are now
activated and that my installation on VolGroup01 on sda will appear in
the upgrade systems dropdown box, anaconda unconfigures all FCP stuff
again.

Additionally, this happens in the wrong order again, and this time the
wrong order does matter: it even generates ugly error messages from the
zfcp device driver on the console:

*** result of wrong deletion of zfcp scsi disks on the console: ***

> zfcp: unit erp failed on unit 0x401040ea00000000 on port 0x500507630300c562  on adapter 0.0.3c1b
> zfcp: unit erp failed on unit 0x401040eb00000000 on port 0x500507630300c562  on adapter 0.0.3c1b
>  rport-0:0-0: blocked FC remote port time out: saving binding

In order to prevent this, the following order must be used for
unconfiguring FCP SCSI devices (again simplified as above):

forall SCSI devices that are FCP attached: remove SCSI device
forall defined LUNs remove unit from corresponding WWPN
forall defined WWPNs remove port from corresponding adapter
forall defined FCP adapters set adapter offline

I.e. the above log should come out as follows:

> echo 1 > /sys/bus/scsi/devices/0:0:0:1/delete
> echo 1 > /sys/bus/scsi/devices/0:0:0:2/delete
> echo 0x401040ea00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_remove
> echo 0x401040eb00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_remove
> echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_remove
> echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_remove
> echo 0 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
> echo 0 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online

The absolutely correct version would be (but it probably requires too
much code for checking dependencies, and the above also works):

> echo 1 > /sys/bus/scsi/devices/0:0:0:1/delete
> echo 1 > /sys/bus/scsi/devices/0:0:0:2/delete
> echo 0x401040ea00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_remove
> echo 0x401040eb00000000 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/0x500507630300c562/unit_remove
> echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/port_remove
> echo 0 > /sys/bus/ccw/drivers/zfcp/0.0.3c1b/online
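
The four-phase teardown order can be sketched in Python the same way, again only computing the writes rather than performing them. This is an illustrative sketch under assumptions: the function name `zfcp_deactivation_writes` and the `(adapter, wwpn, lun, scsi_id)` tuple layout are hypothetical, and the caller is assumed to already know which SCSI device (e.g. `0:0:0:1`) belongs to which LUN. The paths are the ones quoted in the log above.

```python
ZFCP_SYSFS = "/sys/bus/ccw/drivers/zfcp"  # paths as quoted in the log above
SCSI_SYSFS = "/sys/bus/scsi/devices"


def zfcp_deactivation_writes(disks):
    """Return ordered (value, path) pairs for deactivating zFCP LUNs.

    `disks` is an iterable of (adapter, wwpn, lun, scsi_id) string
    tuples (hypothetical layout; the LUN-to-SCSI-device mapping is
    assumed to be supplied by the caller).
    """
    disks = list(disks)
    writes = []
    # Phase 1: remove every FCP-attached SCSI device from the SCSI stack.
    for _adapter, _wwpn, _lun, scsi_id in disks:
        writes.append(("1", "%s/%s/delete" % (SCSI_SYSFS, scsi_id)))
    # Phase 2: remove each unit (LUN) from its WWPN.
    for adapter, wwpn, lun, _scsi_id in disks:
        path = "%s/%s/%s/unit_remove" % (ZFCP_SYSFS, adapter, wwpn)
        writes.append((lun, path))
    # Phase 3: remove each port once, after all of its units are gone.
    seen_ports = []
    for adapter, wwpn, _lun, _scsi_id in disks:
        if (adapter, wwpn) not in seen_ports:
            seen_ports.append((adapter, wwpn))
            writes.append((wwpn, "%s/%s/port_remove" % (ZFCP_SYSFS, adapter)))
    # Phase 4: set each adapter offline once, after its ports are gone.
    seen_adapters = []
    for adapter, _wwpn, _lun, _scsi_id in disks:
        if adapter not in seen_adapters:
            seen_adapters.append(adapter)
            writes.append(("0", "%s/%s/online" % (ZFCP_SYSFS, adapter)))
    return writes
```

For the two disks from the log, this produces the six writes of the deduplicated listing. Processing all disks phase by phase, instead of tearing down one disk completely before the next, is what keeps a shared port or adapter alive until nothing depends on it anymore.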

*** most important constraints on configuring zFCP: ***

http://www.ibm.com/developerworks/linux/linux390/documentation_dev.html
Device Drivers, Features, and Commands - SC33-8411-02
May 2009, Linux Kernel 2.6 - Development stream
http://download.boulder.ibm.com/ibmdl/pub/software/dw/linux390/docu/l26ddd02.pdf

Chapter 6. SCSI-over-Fibre Channel device driver
Working with the zfcp device driver

Setting an FCP channel online or offline
By default, FCP channels are offline. Set an FCP channel online before
you perform any other tasks.

Configuring and removing ports
Before you start: The FCP channel must be online.
...
You cannot remove a port while SCSI devices are configured for it (see
"Configuring SCSI devices" on page 72) or if the port is in use, for
example, by error recovery.

Configuring SCSI devices
To configure a SCSI device for a target port write the device's LUN to
the port's unit_add attribute.
...
Adding a SCSI device also registers the device with the SCSI stack and
creates a sysfs entry in the SCSI branch (see "Mapping the
representations of a SCSI device in sysfs").

Removing SCSI devices
To remove a SCSI device from a target port you need to first
unregister the device from the SCSI stack and then remove it from the
target port.
Comment 1 Steffen Maier 2009-07-10 16:27:16 EDT
Created attachment 351297 [details]
[PATCH 1/5] correctly activate zFCP LUN on s390
Comment 2 Steffen Maier 2009-07-10 16:27:43 EDT
Created attachment 351298 [details]
[PATCH 2/5] correctly delete a SCSI device provided by a zFCP LUN on s390
Comment 3 Steffen Maier 2009-07-10 16:28:09 EDT
Created attachment 351299 [details]
[PATCH 3/5] correctly deactivate zFCP LUN on s390
Comment 4 Steffen Maier 2009-07-10 16:28:36 EDT
Created attachment 351300 [details]
[PATCH 4/5] error messages of zFCP on s390: log or pass to the UI
Comment 5 Steffen Maier 2009-07-10 16:28:52 EDT
Created attachment 351301 [details]
[PATCH 5/5] prevent getting started up or shutdown again while already in such state
Comment 6 Steffen Maier 2009-07-10 16:40:05 EDT
The patches are tested as far as is currently possible: anaconda was executed in a running F11, getting multiple LUNs over different paths, partly using /tmp/fcpconfig and also the "add zFCP" GUI, plus going back and forth through the wizard screens to start up and shut down at will.
Comment 7 David Cantrell 2009-08-24 22:45:29 EDT
Steffen,

Didn't I already apply these patches to the git repo?  I remember going through a lot of zFCP patches on the mailing list.

FYI, you don't need to open bugs *and* post to the list.  The list is sufficient.

If these are already in the git repo, let's close this bug.