Bug 459044 - Hipersocket real devices attach/detach in z/VM doesn't recover with write and data device
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: initscripts
Version: 5.2.z
Hardware: s390x All
Priority: high Severity: high
Target Milestone: rc
Assigned To: initscripts Maintenance Team
QA Contact: BaseOS QE
 
Reported: 2008-08-13 17:28 EDT by IBM Bug Proxy
Modified: 2011-01-24 18:15 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-01-20 17:15:40 EST


Attachments
initscripts-8.45.21-s390-subchannel.patch (560 bytes, text/plain)
2008-11-06 05:20 EST, IBM Bug Proxy
initscripts-8.45.21-s390-specfile.patch (1.02 KB, text/plain)
2008-11-06 05:20 EST, IBM Bug Proxy
Description IBM Bug Proxy 2008-08-13 17:28:27 EDT
=Comment: #0=================================================
Anupama B. Nagaraj <anupama.reddy@in.ibm.com> - 2008-08-05 08:18 EDT
Problem description:
After a detach/attach of a HiperSockets device in layer2 mode, the device is
not regrouped after the attach (the hotplug event is not generated).



[root@anupama ~]# vmcp det 8001 \*
OSA 8001 DETACHED BY H0530010
[root@anupama ~]# vmcp att 8001 \*
OSA 8001 ATTACHED TO H0530010 8001
[root@h0530010 ~]#
Message from syslogd@ at Wed Mar 19 19:20:24 2008 ...
h0530010 kernel: unregister_netdevice: waiting for hsi0 to become free. Usage count = 1
[root@h0530010 ~]# ifconfig hsi0
hsi0: error fetching interface information: Device not found


Here, device 8001 is the write device in the group.

#dmesg

qeth: sense data available on channel 0.0.8000.
qeth:  cstat 0x0
 dstat 0xE
qdio : received check condition on activate queues on device 0.0.8002 (cs=x0, ds=xe).
qeth: irb: 01 c2 40 17  3b 27 a0 38  0e 00 10 00  00 80 00 00
qeth: irb: 01 20 00 00  00 00 00 00  00 00 00 00  00 00 00 00
qeth: sense data: 00 00 af fe  00 00 00 00  00 00 00 00  00 00 00 00
qeth: sense data: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
qeth: Recovery of device 0.0.8000 started ...
crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=1, erc=4, rsid=17
qeth: Device 0.0.8000 could not be recovered!
KERNEL: assertion (!timer_pending(&dev->watchdog_timer)) failed at net/sched/sch_generic.c (605)
crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=1, erc=4, rsid=17
unregister_netdevice: waiting for hsi0 to become free. Usage count = 1
unregister_netdevice: waiting for hsi0 to become free. Usage count = 1
unregister_netdevice: waiting for hsi0 to become free. Usage count = 1
unregister_netdevice: waiting for hsi0 to become free. Usage count = 1
unregister_netdevice: waiting for hsi0 to become free. Usage count = 1
unregister_netdevice: waiting for hsi0 to become free. Usage count = 1





Output from "uname -a":

   Linux h0530011 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:58 EDT 2008 s390x s390x s390x GNU/Linux


Additional information:
The following information was provided by the developer:

/lib/udev/ccw_init calls get_config_by_subchannel in
/etc/sysconfig/network-scripts/network-functions.

 

get_config_by_subchannel ()
{
    LANG=C grep -il "^[[:space:]]*SUBCHANNELS=${1}\([[:space:]#]\|$\|,\)" \
        /etc/sysconfig/network-scripts/ifcfg-* \
        | LC_ALL=C sed -e "$__sed_discard_ignored_files"
}



This function can only find the config file that corresponds to a subchannel if
the first subchannel (the read device) within the ccwgroup is passed as the argument.
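
The limitation can be reproduced outside of udev. The following is a minimal sketch, not part of initscripts: the scratch directory and the ifcfg-hsi0 file in it are made up for the test, and the sed filter from the real function is left out.

```shell
#!/bin/sh
# Hypothetical test bed: a scratch directory stands in for
# /etc/sysconfig/network-scripts.
cfgdir=$(mktemp -d)
printf 'NETTYPE=qeth\nSUBCHANNELS=0.0.8000,0.0.8001,0.0.8002\n' > "$cfgdir/ifcfg-hsi0"

# Same grep expression as in get_config_by_subchannel.
# "|| :" hides grep's non-zero exit status when nothing matches.
find_config_original() {
    LANG=C grep -il "^[[:space:]]*SUBCHANNELS=${1}\([[:space:]#]\|$\|,\)" \
        "$cfgdir"/ifcfg-* || :
}

read_hit=$(find_config_original 0.0.8000)    # read device, first in the list
write_hit=$(find_config_original 0.0.8001)   # write device, second in the list
data_hit=$(find_config_original 0.0.8002)    # data device, third in the list

echo "read:  ${read_hit:-<no match>}"
echo "write: ${write_hit:-<no match>}"
echo "data:  ${data_hit:-<no match>}"

rm -rf "$cfgdir"
```

Only the lookup for the read device prints a file name. The expression expects the argument directly after "SUBCHANNELS=", so the second and third entries of the list can never match.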



Thus for the 2nd and 3rd subchannel of a qeth-device: Even though the hotplug
event is generated, the scripts are not able to determine the corresponding
ifcfg-file. The group-operation in /lib/udev/ccw_init

	echo "$SUBCHANNELS" > $DIR/group

fails, because $NETTYPE is empty and thus $DIR is not set up correctly.
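
To illustrate the last point, here is a small sketch of how an empty $NETTYPE breaks the path. It assumes that ccw_init builds $DIR from $NETTYPE below /sys/bus/ccwgroup/drivers; the helper name ccwgroup_dir is hypothetical and the real script may be structured differently. Nothing is written to sysfs.

```shell
#!/bin/sh
# Assumption for this sketch: DIR is the ccwgroup driver directory in sysfs,
# selected by NETTYPE. The helper only builds the path, it touches no files.
ccwgroup_dir() {
    # $1: NETTYPE as sourced from the ifcfg file, empty if no file was found
    printf '/sys/bus/ccwgroup/drivers/%s\n' "$1"
}

with_nettype=$(ccwgroup_dir qeth)
without_nettype=$(ccwgroup_dir "")

echo "NETTYPE=qeth : $with_nettype/group"
echo "NETTYPE empty: $without_nettype/group"
```

With NETTYPE unset the driver name is missing from the path, so the write to "group" does not reach the qeth driver.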
Comment 1 IBM Bug Proxy 2008-10-22 11:11:17 EDT
This is proposed for 5.4 as per today's bug call.
Comment 2 IBM Bug Proxy 2008-10-29 08:30:44 EDT
Hello Red Hat:
RIT 232889 has now been created and connected to this LTC BZ.

Please link RIT 232889 to RH BZ 459044.

> https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=232889
Comment 3 IBM Bug Proxy 2008-10-29 13:11:12 EDT
Hello Red Hat,
The fix in udev is for a System z specific event, so it would have no impact on other architectures, and it will benefit customers working with virtualized environments, which are common on System z.

Please evaluate inclusion into RHEL 5.3.

As asked for in today's call (for adaptation in the RIT):
This bug has severity "high" and Priority 2 on the IBM side.

If you have any concerns about accepting this fix, please let us know whether they relate to:
business impact
technical issues
or quality

Thanks in advance
Comment 5 IBM Bug Proxy 2008-11-06 05:20:45 EST
Created attachment 322692
initscripts-8.45.21-s390-subchannel.patch
Comment 6 IBM Bug Proxy 2008-11-06 05:20:50 EST
Created attachment 322693
initscripts-8.45.21-s390-specfile.patch
Comment 7 IBM Bug Proxy 2008-11-06 05:30:38 EST
Hello Red Hat,

Attached to this BZ you'll find two patches against the initscripts package. This is *not* a kernel issue.

The patches have been tested and fix the problem. Posting this patch upstream is not applicable, since it only changes Red Hat specific distribution code.

With best regards,

--Hans
Comment 13 Harald Hoyer 2008-11-10 12:13:05 EST
Patch is broken..

# t="60" grep -i "^[[:space:]]*SUBCHANNELS=.*${t}.*$" /tmp/sub-test.txt 
SUBCHANNELS=0.600.0
SUBCHANNELS=0.0.600
SUBCHANNELS=600.0.0
Comment 15 IBM Bug Proxy 2008-11-11 13:10:44 EST
(In reply to comment #20)
> ------- Comment From harald@redhat.com 2008-11-10 12:13:05 EDT-------
> Patch is broken..
>
> # t="60" grep -i "^[[:space:]]*SUBCHANNELS=.*${t}.*$" /tmp/sub-test.txt
> SUBCHANNELS=0.600.0
> SUBCHANNELS=0.0.600
> SUBCHANNELS=600.0.0
>

Hi Harald,

I think your testcase is broken. In this test $t should look like this "0.0.0060", since this
is the format which is passed to get_config_by_subchannel ().

With best regards,

--Hans
Comment 16 Harald Hoyer 2008-11-12 04:47:57 EST
CHANNEL=${DEVPATH##*/}
CONFIG=$(get_config_by_subchannel $CHANNEL)

and IIRC CHANNEL is only one channel
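
For reference, a short sketch of what that expansion yields. The DEVPATH value is a hypothetical example of a ccw device path, not taken from the affected system.

```shell
#!/bin/sh
# Hypothetical DEVPATH as udev might export it for a ccw device.
DEVPATH=/devices/css0/0.0.0011/0.0.8001

# "##*/" strips the longest prefix that ends in "/", leaving the last
# path component, i.e. the bus id of the single device that triggered
# the event.
CHANNEL=${DEVPATH##*/}
echo "$CHANNEL"
```

So the lookup gets exactly one bus id in the "0.0.xxxx" form, and that id may be the read, write or data device of the group.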
Comment 17 Harald Hoyer 2008-11-12 05:17:54 EST
I stand corrected.. your patch would work
Comment 18 Harald Hoyer 2008-11-12 05:27:06 EST
$  t=0.0.0100 grep -i "^[[:space:]]*SUBCHANNELS=.*${t}.*$" /tmp/sub-test.txt 
SUBCHANNELS=0.0.0100,0.0.0200,0.0.0300
SUBCHANNELS=10.0.0100,0.0.0200,0.0.0300
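
The over-matching shown above can be reproduced with a scratch file. This is a sketch only: the file contents are made up, since the real /tmp/sub-test.txt is not attached to this bug.

```shell
#!/bin/sh
# Hypothetical sample data, one SUBCHANNELS line per imagined ifcfg file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
SUBCHANNELS=0.0.0100,0.0.0200,0.0.0300
SUBCHANNELS=10.0.0100,0.0.0200,0.0.0300
SUBCHANNELS=0.0.0400,0.0.0500,0.0.0600
EOF

t=0.0.0100
# Loose expression from the test above: the id may appear anywhere on the
# line, so "10.0.0100" is accepted as well.
loose_hits=$(grep -ic "^[[:space:]]*SUBCHANNELS=.*${t}.*$" "$sample")
echo "lines matched by the loose pattern: $loose_hits"

rm -f "$sample"
```

The first two lines match and the third does not. A pattern that anchors the id at a list boundary would avoid the hit on "10.0.0100".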
Comment 19 IBM Bug Proxy 2008-11-12 11:22:14 EST
As per John Jarvis, this should be in snapshot 3 because the fix is already checked in.
Comment 21 IBM Bug Proxy 2008-11-25 11:41:21 EST
Hi Hans,

RHEL 5.3 ships initscripts-8.45.25-1.el5, which contains the following expression; it differs from the patch above:

LANG=C egrep -i -l  "^[[:space:]]*SUBCHANNELS=([0-9]\.[0-9]\.[a-f0-9]+,){0,2}${1}(,[0-9]\.[0-9]\.[a-f0-9]+){0,2}([[:space:]]+#|[[:space:]]*$)" /etc/sysconfig/network-scripts/ifcfg-* \
| LC_ALL=C sed -e "$__sed_discard_ignored_files"

The problem is not fixed: the write and data devices do not recover after detach/attach.

thanks
Anupama
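
The file lookup with the expression quoted above can be checked in isolation. The sketch below makes several assumptions: a scratch directory replaces /etc/sysconfig/network-scripts, both ifcfg files are made up, "grep -E" is used in place of "egrep", and the sed filter is left out. It only tests the lookup and says nothing about whether the interface recovers.

```shell
#!/bin/sh
# Hypothetical test bed with two made-up config files.
cfgdir=$(mktemp -d)
printf 'NETTYPE=qeth\nSUBCHANNELS=0.0.8003,0.0.8004,0.0.8005\n' > "$cfgdir/ifcfg-hsi0"
printf 'NETTYPE=qeth\nSUBCHANNELS=0.0.9003,0.0.9004,0.0.9005\n' > "$cfgdir/ifcfg-eth9"

# Up to two ids may precede and up to two may follow the requested one.
# "|| :" hides grep's non-zero exit status when nothing matches.
find_config() {
    LANG=C grep -E -i -l "^[[:space:]]*SUBCHANNELS=([0-9]\.[0-9]\.[a-f0-9]+,){0,2}${1}(,[0-9]\.[0-9]\.[a-f0-9]+){0,2}([[:space:]]+#|[[:space:]]*$)" \
        "$cfgdir"/ifcfg-* || :
}

read_cfg=$(find_config 0.0.8003)     # first entry
write_cfg=$(find_config 0.0.8004)    # second entry
data_cfg=$(find_config 0.0.8005)     # third entry
partial_cfg=$(find_config 0.0.800)   # truncated id, must not match

echo "read:    ${read_cfg:-<no match>}"
echo "write:   ${write_cfg:-<no match>}"
echo "data:    ${data_cfg:-<no match>}"
echo "partial: ${partial_cfg:-<no match>}"

rm -rf "$cfgdir"
```

All three positions resolve to ifcfg-hsi0 in this setup, which suggests that a failure after detach/attach lies somewhere other than in the config file lookup.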
Comment 22 Phil Knirsch 2008-11-25 11:58:33 EST
Could you attach your ifcfg files to this bugzilla please? That way we can reproduce the failing restart.

Thanks & regards, Phil
Comment 24 Phil Knirsch 2008-12-02 09:50:07 EST
Ping IBM, please attach the ifcfg files for a reproducer.

Thanks & regards, Phil
Comment 25 IBM Bug Proxy 2008-12-03 05:10:57 EST
Hello Red Hat,

Here is the ifcfg file for the HiperSockets interface:

# IBM QETH
DEVICE=hsi0
BOOTPROTO=static
IPADDR=10.40.48.35
NETMASK=255.255.0.0
NETTYPE=qeth
ONBOOT=yes
PORTNAME=OSAPORT
OPTIONS="layer2=1"
SUBCHANNELS=0.0.8003,0.0.8004,0.0.8005

[root@h4215035 ~]# cat /etc/modprobe.conf
alias eth0 qeth
alias hsi0 qeth
Comment 26 Harald Hoyer 2008-12-03 10:19:32 EST
with:
# cat ifcfg-eth1
TYPE=Ethernet
DEVICE=eth1
BOOTPROTO=none
ONBOOT=no
USERCTL=no
IPV6INIT=no
PEERDNS=yes
SUBCHANNELS=0.0.0700,0.0.0701,0.0.0702
NETTYPE=qeth
NETMASK=255.255.255.0
IPADDR=192.168.1.1

I was able to do:
# vmcp detach 701 \*
# vmcp attach 701 \*

and ccw_init found the correct ifcfg file, but:

# ifconfig eth1
eth1: error fetching interface information: Device not found

I had to detach everything first to make it work afterwards.

# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:09:6B:1A:B3:A9  
          BROADCAST MULTICAST  MTU:1492  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

#  vmcp detach 700 \*
OSA 0700 DETACHED BY DEVEL3
#  vmcp detach 701 \*
OSA 0701 DETACHED BY DEVEL3
#  vmcp detach 702 \*
OSA 0702 DETACHED BY DEVEL3
#  vmcp attach 700 \*
OSA 0700 ATTACHED TO DEVEL3 0700
# ifconfig eth1
eth1: error fetching interface information: Device not found
#  vmcp attach 701 \*
OSA 0701 ATTACHED TO DEVEL3 0701
# ifconfig eth1
eth1: error fetching interface information: Device not found
#  vmcp attach 702 \*
OSA 0702 ATTACHED TO DEVEL3 0702
[root@devel3 network-scripts]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:09:6B:1A:B3:A9  
          BROADCAST MULTICAST  MTU:1492  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)


Is there anything to be done additionally in ccw_init, if only one channel is reattached?
Comment 27 Harald Hoyer 2008-12-03 10:24:16 EST
So to summarize, the original analysis was:

Thus for the 2nd and 3rd subchannel of a qeth-device: Even though the hotplug
event is generated, the scripts are not able to determine the corresponding
ifcfg-file. The group-operation in /lib/udev/ccw_init

 echo "$SUBCHANNELS" > $DIR/group

fails, because $NETTYPE is empty and thus $DIR is not setup correctly.


Now, the ifcfg-file is found, but the:

 echo "$SUBCHANNELS" > $DIR/group

fails nonetheless.
Comment 28 Harald Hoyer 2008-12-03 10:49:15 EST
Both initscripts-8.45.25-1.el5 and the patch from comment #5 let ccw_init find the correct ifcfg file, but both versions fail to recover the interface on my machine!
Comment 29 Hans-Joachim Picht 2008-12-04 11:51:11 EST
After a telephone conference with Phil Knirsch, we came to the conclusion that the best idea is to close this issue (since the correct ifcfg configuration file is found) and to open a new bug against the corresponding kernel device driver for 5.4.

With best regards,

       --Hans
Comment 32 IBM Bug Proxy 2008-12-05 10:40:54 EST
(In reply to comment #39)
> After a telephone conference with Phil Knirsch we came to the conclusion that
> the best idea is to close this issue (since the correct
> ifcfg-configuration-file is found) and to open a new bug against the
> corresponding kernel device driver for 5.4

Hi Hans,
Thanks for the update. Have you already opened a new bug for RHEL 5.4? Please go ahead and close this bug (Resolution= WILLNOTFIX ?) when you open the new bug.
Thanks!
Comment 34 IBM Bug Proxy 2008-12-08 05:01:11 EST
Closing this bug @IBM.

For the follow up discussion we use the following bugzilla entry:

https://bugzilla.linux.ibm.com/show_bug.cgi?id=50500

We don't have a RH BZ id yet.

With best regards,

--Hans
Comment 36 errata-xmlrpc 2009-01-20 17:15:40 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0245.html
