Bug 230251 - Installation crashes when configuring lcs network adapter
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: s390x
OS: Linux
Priority: medium
Severity: medium
Assignee: Jan Glauber
QA Contact: Brian Brock
Reported: 2007-02-27 19:34 UTC by Ludvik Kos
Modified: 2009-12-08 16:10 UTC (History)

Doc Type: Bug Fix
Last Closed: 2009-12-08 16:10:57 UTC


Attachments
Test kernel images (13.89 MB, application/octet-stream)
2007-02-28 15:50 UTC, Brad Hinson
Two different crash behaviors (2.78 KB, text/plain)
2007-03-01 07:40 UTC, Ludvik Kos
Test kernel images based on beta 2 (13.89 MB, application/octet-stream)
2007-03-01 19:06 UTC, Brad Hinson
Test kernel images based on beta 2 (13.86 MB, application/x-tar)
2007-03-02 19:22 UTC, Brad Hinson
Test results from the last kernel. (2.04 KB, text/plain)
2007-03-05 14:50 UTC, Ludvik Kos
initrd-lcs-carrier (11.68 MB, application/octet-stream)
2007-03-06 17:49 UTC, Brad Hinson
initrd-lcs-recovery (11.68 MB, application/octet-stream)
2007-03-06 17:50 UTC, Brad Hinson
initrd-netdev-sched (11.68 MB, application/octet-stream)
2007-03-06 17:51 UTC, Brad Hinson
initrd-lcs-carrier (11.68 MB, application/octet-stream)
2007-03-08 17:17 UTC, Brad Hinson
initrd-lcs-recovery (11.68 MB, application/octet-stream)
2007-03-08 17:18 UTC, Brad Hinson
initrd-netdev-sched (11.68 MB, application/octet-stream)
2007-03-08 17:19 UTC, Brad Hinson
13 experiments with "lcs-recovery" kernel&initrd (17.36 KB, audio/x-pn-realaudio)
2007-03-10 13:07 UTC, Ludvik Kos
Test kernel/initrd (12.98 MB, application/x-bzip2)
2007-04-23 13:38 UTC, Brad Hinson
Output of the last test (1.25 KB, text/plain)
2007-04-23 15:03 UTC, Ludvik Kos

Description Ludvik Kos 2007-02-27 19:34:47 UTC
Description of problem:
During network configuration the following output is displayed and installation crashes:

lcs: Query IPAssist failed. Assuming unsupported!
lcs: check on device 0.0.02e4, dstat=0xE, cstat=0x0
lcs: Recovery of device 0.0.02e4 started...
lcs: LCS device eth0 without IPv6 support
lcs: LCS device eth0 without Multicast support
lcs: Device eth0 successfully recovered!
Badness in linkwatch_run_queue at net/core/link_watch.c:79
000000001e43fde0 000000001e43fd58 0700000000000ae4 0000000100000000
       0000000000107874 000000001e43fc98 000000001e43fc98 000000000010475a
       0000000000000000 0000000000000000 0000000000000000 0000000000383224
       000000001e43fde0 0000000000000008 000000000000000e 000000001e43fd78
       0000000000450a50 000000000010475a 000000001e43fd00 000000001e43fd40
Call Trace:
([<000000000010477a>] dump_stack+0x2a2/0x35c)
 [<00000000003833e4>] linkwatch_event+0x1c0/0x294
 [<0000000000165488>] worker_thread+0x1e8/0x280
 [<0000000000170da0>] kthread+0x118/0x14c
 [<0000000000107462>] kernel_thread_starter+0x6/0xc
 [<000000000010745c>] kernel_thread_starter+0x0/0xc

Version-Release number of selected component (if applicable):
kernel 2.6.18-1.2747.el5

How reproducible:
Always


Steps to Reproduce:
1. Start RHEL5 Beta 2 installation on zSeries system (z/VM installation type). Network is of LCS type.
2. Select LCS network type and enter appropriate information.


Actual Results:
Installation crashes with output shown in 'Description'. CP prompt appears (z/VM).

Expected Results:
Network adapter should become operational. Installation should continue.

Additional info:

Comment 1 Ludvik Kos 2007-02-27 20:03:39 UTC
This is the original dump on RHEL5. In Description I've put SLES10 dump by
mistake. It is however the same problem (installation failure due to lcs network
setup crash).

Enter the relative port number of your LCS device
(required for OSA-Express ATM cards only):
0
lcs: Loading LCS driver
lcs: Query IPAssist failed. Assuming unsupported!
lcs: check on device 0.0.02e4, dstat=0xE, cstat=0x0
lcs: Recovery of device 0.0.02e4 started...
specification exception: 0006 [#1]
CPU:    0    Not tainted

Process lcs_recover (pid: 166, task: 0000000001224148, ksp: 0000000000c6fc78)

Krnl PSW : 0004000180000000 0000000000000152 (0x152)

Krnl GPRS: 0000000000000000 0000000000000000 0000000000000000 0000000000000140
0000000000000000 0000000000000000 0000000000000004 0000000001166df0
0000000000000000 0000000000000000 0000000000000000 0000000001166840
0000000001166800 000000000024c8c0 0000000000787f98 00000000005cbda0

Krnl Code: 00 01 80 00 00 00 00 00 00 00 00 00 01 52 00 00 00 00 00 00
Call Trace:
([<0000000000000080>] 0x80)
[<00000000001a9fa8>] net_tx_action+0x138/0x180
[<000000000003d88c>] __do_softirq+0x78/0x110
[<000000000001ed5c>] do_softirq+0x98/0xb0
[<000000000001f5b0>] io_return+0x0/0x10
[<00000000000f1ee0>] sysfs_get_name+0x58/0xac
([<0000000000c6fc78>] 0xc6fc78)
[<00000000000f36bc>] sysfs_dirent_exist+0x44/0x94
[<00000000000f2996>] sysfs_add_file+0x56/0xa0
[<00000000000f5414>] sysfs_create_group+0x104/0x164
[<000000000016aed4>] class_device_add+0x2fc/0x504
[<00000000001a92f4>] register_netdevice+0x2a4/0x3d0
[<00000000001a94a4>] register_netdev+0x84/0xa0
[<00000000208937fe>] lcs_new_device+0x9ea/0xb08 [lcs]
[<0000000020895a12>] lcs_recovery+0x126/0x170 [lcs]
[<00000000000184ce>] kernel_thread_starter+0x6/0xc
[<00000000000184c8>] kernel_thread_starter+0x0/0xc

<0>Kernel panic - not syncing: Fatal exception in interrupt
HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 00015E24

Comment 2 Brad Hinson 2007-02-27 20:07:35 UTC
From email thread <LINUX-390.EDU>:

From: 	Brad Hinson <bhinson>
To: 	Linux on 390 Port <LINUX-390.EDU>
Subject: 	Re: Installation problems (zSeries & lcs network)
Date: 	Tue, 27 Feb 2007 12:41:13 -0500

Here are a few theories as to why this is happening.  If there are
kernel folks reading feel free to correct or chime in:

1.) In RHEL 5 (kernel 2.6.18), register_netdevice() calls might_sleep().
This is a change from RHEL 4 (2.6.9) where might_sleep() was not called.
This may be related because the top of the stack (most recently called
function) is an interrupt routine.

2.) In kernel 2.6.18, drivers/s390/net/lcs.c, lcs_new_device():2160, the
function netif_carrier_on() is called.  This was not called in the 2.6.9
lcs code.  I can't find the bug report that necessitated this change,
but perhaps this introduced a regression.

3.) In drivers/s390/net/lcs.c, in lcs_recovery(), there is:
[snip]
rc = __lcs_shutdown_device(gdev, 1);
rc = lcs_new_device(gdev);
[..]

This makes me wonder if there is a possible race condition since the
device is destroyed and recreated right after each other, and your crash
is in lcs_new_device() after ultimately attempting to check if the sysfs
group exists (maybe it's still lingering from __lcs_shutdown_device?).
Note the 2nd argument to __lcs_shutdown_device() is 1.  Looking at
__lcs_shutdown_device(), it does lcs_wait_for_threads() only when the
2nd argument is 0.  Looking through the qeth code, it appears
qeth_recover() does something similar but does call
qeth_wait_for_threads().  Perhaps lcs_recovery() should do the same
(i.e. call __lcs_shutdown_device with 0).

Comment 3 Brad Hinson 2007-02-28 15:50:34 UTC
Created attachment 148933 [details]
Test kernel images

This contains:

images/orig-rc/kernel.img:
Post-beta2 kernel (for reference)

images/netdev-sched/kernel.img:
Remove might_sleep() in register_netdevice()

images/lcs-netif-carrier/kernel.img:
Remove netif_carrier_on() in lcs_new_device()

images/lcs-recovery/kernel.img:
Call __lcs_shutdown_device() with 0 instead of 1 in lcs_recovery()

Can you test these?

Thanks

Comment 4 Ludvik Kos 2007-02-28 17:39:42 UTC
Unfortunately all kernels (including one for reference) failed with the same
output. I've used your kernels with initrd.img and redhat.parm from Beta 2 CD.

--- OUTPUT ---
...
...
Enter the relative port number of your LCS device
(required for OSA-Express ATM cards only):
0
Could not detect LCS interface, aborting...
Kernel panic - not syncing: Attempted to kill init!
HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 0003AD16
--- END OF OUTPUT ---

Comment 5 Brad Hinson 2007-02-28 17:59:09 UTC
The output in comment 4 is different from comment 2.  In my tests I get the
"Could not detect LCS interface, aborting..." because I don't have an LCS chpid
defined.  Is there anything different about your process that could lead to the
LCS subchannels not being detected correctly?

Comment 6 Ludvik Kos 2007-02-28 18:18:28 UTC
In response to Comment #5:
I believe that nothing is different now since I get almost the same output with
kernel from Beta 2 CD.

--- OUTPUT ---
0
 lcs: Loading LCS driver
 lcs: Query IPAssist failed. Assuming unsupported!
 lcs: check on device 0.0.02e4, dstat=0xE, cstat=0x0
 lcs: Recovery of device 0.0.02e4 started...
 lcs: LCS device eth0 without IPv6 support
 lcs: LCS device eth0 without Multicast support
 lcs: Device eth0 successfully recovered!
BUG: warning at net/core/link_watch.c:117/linkwatch_run_queue() (Not tainted)
0000000000000000 000000001eeb3cb8 0000000000000002 0000000000000000
       000000001eeb3d58 000000001eeb3cd0 000000001eeb3cd0 000000000003748e
       0000000000380218 000000000037f9a4 0000000000000000 000000000000000b
       0000000000000008 0000000000000000 000000001eeb3cb8 000000001eeb3d30
       000000000024c380 00000000000164b2 000000001eeb3cb8 000000001eeb3d08
Call Trace:
([<000000000001640a>] show_trace+0x11e/0x130)
 [<00000000001b3d70>] linkwatch_run_queue+0x1c0/0x22c
 [<00000000001b3e42>] linkwatch_event+0x66/0x74
 [<000000000004ca5a>] run_workqueue+0xea/0x150
 [<000000000004d8b0>] worker_thread+0x114/0x154
 [<0000000000051990>] kthread+0x118/0x14c
 [<00000000000184ce>] kernel_thread_starter+0x6/0xc
 [<00000000000184c8>] kernel_thread_starter+0x0/0xc
--- END OF OUTPUT ---

Network settings are correct (work for SLES9 installation program).

Comment 7 Brad Hinson 2007-02-28 18:29:36 UTC
The output in comment 6 looks like the original output from SLES.  Are you now
getting that output from RHEL 5?  Are you able to repeatedly reproduce the
output in comment 1, or is the output constantly changing?

Comment 8 Ludvik Kos 2007-02-28 19:44:11 UTC
Today I was able to reproduce the output (from Comment #7) twice on two separate
z/VMs. I'll repeat those tests tomorrow to see if this is now constant behavior.
Do you have any idea why there is a difference between the reference image and
the image from Beta 2 CD?

Comment 9 Brad Hinson 2007-02-28 20:01:51 UTC
The reference image is a post-beta2 kernel.  The output in comment 6 does not
indicate a kernel panic.  Does the install continue much further?

Comment 10 Ludvik Kos 2007-02-28 20:04:22 UTC
There is no kernel panic, it just hangs. Like in the case of SLES10.

Comment 11 Ludvik Kos 2007-03-01 07:40:41 UTC
Created attachment 148998 [details]
Two different crash behaviors

Comment 12 Ludvik Kos 2007-03-01 07:41:52 UTC
Today I repeated the test with the kernel from the Beta 2 CD and got two different
behaviors. See the attachment:
https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=148998

I also have an additional question: is it OK to use just the modified kernel while the old
lcs.ko is still in place (in initrd.img from the Beta 2 CD)? Is it possible that your
reference kernel failed because of a clash between the new kernel and the old
modules in the initrd?

Comment 13 Brad Hinson 2007-03-01 17:19:10 UTC
Good point about the initrd and lcs.ko.  Instead of attaching a new initrd
(which won't match your installation tree), I'll get started on rebuilding the
test kernels based on beta 2.

Comment 14 Brad Hinson 2007-03-01 19:06:48 UTC
Created attachment 149042 [details]
Test kernel images based on beta 2

Comment 15 Ludvik Kos 2007-03-02 07:59:17 UTC
(In reply to comment #14)
> Created an attachment (id=149042) [edit]
> Test kernel images based on beta 2
> 

Can you please check the new attachment? It has the same checksum as the old one
(sent on 2007-02-28)...

Comment 16 Brad Hinson 2007-03-02 19:22:43 UTC
Created attachment 149139 [details]
Test kernel images based on beta 2

Oops, these are the right ones.

Comment 17 Ludvik Kos 2007-03-05 14:50:27 UTC
Created attachment 149263 [details]
Test results from the last kernel.

The result is the same, except that the reference kernel now crashes like the one
from the Beta 2 CD. This makes me wonder whether it is enough just to provide me with a
new kernel. I believe that there should also be a new initrd with fixed
drivers.

Comment 18 Brad Hinson 2007-03-06 17:49:38 UTC
Created attachment 149362 [details]
initrd-lcs-carrier

You are correct.  Here are updated initrd images for testing.

Comment 19 Brad Hinson 2007-03-06 17:51:04 UTC
Created attachment 149363 [details]
initrd-lcs-recovery

Comment 20 Brad Hinson 2007-03-06 17:51:58 UTC
Created attachment 149364 [details]
initrd-netdev-sched

Comment 21 Ludvik Kos 2007-03-07 15:36:50 UTC
Unfortunately I get a message:
....
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
No filesystem could mount root, tried:  ext2 iso9660
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 00358E10

My assumption is that something can be wrong with the packaging. Earlier I see this:
...
cpu 0 phys_idx=0 vers=FF ident=00018F machine=1247 unused=0000
Brought up 1 CPUs
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 11961k freed
...


Comment 22 Brad Hinson 2007-03-08 17:17:55 UTC
Created attachment 149591 [details]
initrd-lcs-carrier

Apologies, used an incorrect flag when creating the initrd.  I've tested these,
they work correctly.

Comment 23 Brad Hinson 2007-03-08 17:18:44 UTC
Created attachment 149592 [details]
initrd-lcs-recovery

Comment 24 Brad Hinson 2007-03-08 17:19:27 UTC
Created attachment 149593 [details]
initrd-netdev-sched

Comment 25 Ludvik Kos 2007-03-09 12:50:20 UTC
This time I've got something different:
Kernels and initrds netdev-sched and lcs-recovery produced more or less standard
output:

Enter the relative port number of your LCS device
(required for OSA-Express ATM cards only):
0
 lcs: Loading LCS driver
 lcs: Query IPAssist failed. Assuming unsupported!
 lcs: check on device 0.0.02e4, dstat=0xE, cstat=0x0
 lcs: Recovery of device 0.0.02e4 started...
 lcs: LCS device eth0 without IPv6 support
 lcs: LCS device eth0 without Multicast support
 lcs: Device eth0 successfully recovered!
BUG: warning at net/core/link_watch.c:117/linkwatch_run_queue() (Not tainted)
0000000000000000 0000000001303cb8 0000000000000002 0000000000000000
       0000000001303d58 0000000001303cd0 0000000001303cd0 000000000003748e
       0000000000380218 000000000037f9a4 0000000000000000 000000000000000b
       0000000000000008 0000000000000000 0000000001303cb8 0000000001303d30
       000000000024c378 00000000000164b2 0000000001303cb8 0000000001303d08
Call Trace:
([<000000000001640a>] show_trace+0x11e/0x130)
 [<00000000001b3d98>] linkwatch_run_queue+0x1c0/0x22c
 [<00000000001b3e6a>] linkwatch_event+0x66/0x74
 [<000000000004ca5a>] run_workqueue+0xea/0x150
 [<000000000004d8b0>] worker_thread+0x114/0x154
 [<0000000000051990>] kthread+0x118/0x14c
 [<00000000000184ce>] kernel_thread_starter+0x6/0xc
 [<00000000000184c8>] kernel_thread_starter+0x0/0xc

~~~

However, there has been no crash with lcs-netif-carrier:
Enter the relative port number of your LCS device
(required for OSA-Express ATM cards only):
0
 lcs: Loading LCS driver
 lcs: Query IPAssist failed. Assuming unsupported!
 lcs: check on device 0.0.02e4, dstat=0xE, cstat=0x0
 lcs: Recovery of device 0.0.02e4 started...
 lcs: LCS device eth0 without IPv6 support
 lcs: LCS device eth0 without Multicast support
 lcs: Device eth0 successfully recovered!

Unfortunately nothing happened after that. I even left the system running
overnight, but nothing changed.
All this suggests to me that perhaps the lcs driver is not the (main) culprit. What
should happen in a normal installation after loading the network driver and setting up
the network?



Comment 26 Brad Hinson 2007-03-09 16:45:54 UTC
The output from comment 25 looks very similar to the SLES 10 output (BUG at
net/core/link_watch.c, linkwatch_run_queue).  Can you confirm that this output is
from RHEL 5?  If so, this is significantly different from the originally
reported RHEL 5 output, and I just want to make sure everything else has stayed
consistent except the new kernel and initrd.

After loading the LCS driver, the installer should continue asking for more
info, like subchannels, IP address, etc.

Comment 27 Ludvik Kos 2007-03-10 13:07:40 UTC
Created attachment 149769 [details]
13 experiments with "lcs-recovery" kernel&initrd

Yes I do confirm that output has been obtained from RHEL.

I've performed 13 additional experiments in a row with the "lcs-recovery"
kernel&initrd (see the attachment). In 5 cases there was a crash that returned
to the CP prompt, and in 8 cases there was not. My guess is that there are two
separate problems. I also believe that I have no direct control over which
one occurs. All experiments were performed on the same z/VM with
the same z/VM user.

You did mention that after loading the LCS driver, the installer should
continue asking for more info, like subchannels, IP address. This is actually
done *before* lcs loading. What happens *after*? Is there something critical?

One additional question: if you log out from a particular z/VM user and then log in
again, does it mean that the environment has been cleaned? So there will be nothing
"remembered" from the previous RHEL boot?

Can you advise me what else to try, or what additional information
I should provide from a particular crash?

Comment 28 Brad Hinson 2007-03-12 18:23:13 UTC
After entering the lcs information, the module is loaded and the interface
brought online.  Also, logging out of z/VM does clear all relevant information,
so nothing is remembered that is not written to disk.

I will attempt to get an lcs chpid defined here to try and reproduce, just to
make sure it's not Flex-ES related.

Comment 29 Brad Hinson 2007-04-23 13:38:42 UTC
Created attachment 153281 [details]
Test kernel/initrd

Please test this kernel and patched initrd.img.  This solves the issue for me
here.  It is based on a newer tree than GA, so you won't be able to perform a
full install yet, but I'd like to know if you get past the original error.

Thanks

Comment 30 Ludvik Kos 2007-04-23 15:03:43 UTC
Created attachment 153285 [details]
Output of the last test

Unfortunately it didn't help.

I was running: "Linux version 2.6.18-8.1.3.el5
(brewbuilder.redhat.com) (gcc version 4.1.1 20070105 (Red Hat
4.1.1-52)) #1 SMP Mon Apr 16 15:55:25 EDT 2007".

Output is in the output.txt attachment. Installation then hangs.

Regards,
Ludvik

Comment 31 Ludvik Kos 2007-04-23 15:08:45 UTC
See comment #30.

Comment 32 Brad Hinson 2007-04-23 15:41:45 UTC
Looks like there's a problem with your particular LCS device.  It's failing here
(from lcs.c):

        rc = lcs_get_problem(cdev, irb);
        if (rc || (dstat & DEV_STAT_UNIT_EXCEP)) {
                PRINT_WARN("check on device %s, dstat=0x%X, cstat=0x%X \n",
                            cdev->dev.bus_id, dstat, cstat);
                if (rc) {
                        lcs_schedule_recovery(card);
                        wake_up(&card->wait_q);
                        return;
                }
        }

The dstat= and cstat= indicate the error code.  cstat is subchannel status (0x0)
and dstat is device status (0xE).

What details do you have for these devices?  Can you access them in a different
LPAR?  Since I have mine working here, I can compare our IOCDS data with yours
if you'd like to post the config.

Comment 33 Brad Hinson 2007-04-23 16:36:33 UTC
Here's more info:

dstat=0xE corresponds to these three flags being set:
DEV_STAT_DEV_END
DEV_STAT_CHN_END
DEV_STAT_UNIT_CHECK

..which corresponds to state CH_STATE_STOPPED for the device.  Most likely the
device is offline or not defined for this LPAR.

Comment 34 Ludvik Kos 2007-04-25 14:48:01 UTC
I would like to stress that on the same VM with the same LCS address it is
possible to set up networking with an older kernel (SLES9, 31-bit). So I believe
that the device is not offline.
Can you please advise me how to perform more diagnostics? Since I am not very
experienced with z/VM: how do I provide IOCDS data for LCS devices? I believe that
#cp q ctca is not enough? Please also bear in mind that all this is running on
FLEX-ES.

Regards,
Ludvik

p.s.
#cp q ctca
CTCA 02E0 ON DEV   02E0 SUBCHANNEL = 0008
CTCA 02E1 ON DEV   02E1 SUBCHANNEL = 0009



Comment 35 Brad Hinson 2007-04-25 15:13:59 UTC
The fact that it doesn't work in either distro (both based on newer 2.6 kernels)
leads me to believe Flex-ES needs to get involved.  The fact that this is not
reproducible on the z9 supports this point.  The kernel is seeing the
subchannels as stopped.  What is their response to the status of the flags above
on the LCS subchannels?

Also, disregard my comment about IOCDS, which is specific to System z hardware
configuration.  Since it's an emulator, Flex-ES may use another method.

Comment 36 Ludvik Kos 2007-05-07 11:26:20 UTC
While I am waiting for a response regarding FLEX-ES, I wonder whether it is realistic
that the issue lies in the emulation. Please note that it is possible to install
and run the previous version of RHEL (and SLES) in the very same environment. I
cannot help but ask whether something has been changed in the kernel
(lcs.c, specifically) that prevents RHEL5 (and SLES10 too) from installing on
zSeries.
It is possible that the kernel sees the channel as stopped because it failed to bring
the device online.
You mentioned that "this is not reproducible on the z9". Does that mean that you
were not able to reproduce the problem on your system using an LCS type network?

Comment 37 Brad Hinson 2007-05-08 05:01:35 UTC
That's correct, I was unable to reproduce this on a z9 using subchannels on an
LCS type network.  There are not many changes to lcs.c between kernels 2.6.9 and
2.6.18; these are the changes we tested early in this bug report.  I admit I
don't know enough about how Flex-ES emulates the subchannels to narrow down the
problem, but the fact that it works on z9 points to an emulation problem.

Just to throw the question out there, is there any way around lcs on Flex-ES, or
is this still a hard requirement for you?

Comment 38 Ludvik Kos 2007-05-19 08:53:43 UTC
Hi!

I am still waiting for a response regarding the FLEX-ES emulator...
Unfortunately, lcs is a hard requirement because we have other working systems
on the same physical machine that would be affected by a network adapter change.
At this point I would like to do some experiments myself. Can you please advise
me how to set up a cross-compiling environment to build a kernel from vanilla
sources myself? Or can you please direct me to the appropriate documentation?

Regards,
Ludvik

Comment 39 Brad Hinson 2007-05-22 16:04:06 UTC
If you have gcc and binutils, you should be able to compile.  Grab the kernel
from http://www.kernel.org/ .  The general order is that you untar, run 'make
menuconfig', 'make bzImage', 'make modules', then 'make modules_install'.  There
are other steps, like copying out the vmlinuz and generating the initrd, but
these are the main ones.

If you'd like to rebuild a Red Hat kernel, the process is a lot easier.  Just
run rpmbuild --rebuild <kernel.src.rpm>

Comment 40 Brad Hinson 2007-08-09 15:23:34 UTC
Hi,

Any response from the Flex-ES team?

Comment 41 Jan Glauber 2009-12-08 16:10:57 UTC
I think this can be closed. I'm not sure about the state of FLEX-ES anyway.

