100920 – rhgb hangs system with pcmcia network card

Summary: rhgb hangs system with pcmcia network card

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	pcmcia-cs
Sub Component:
Version:	1
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	101724
Depends On:
Blocks:	CambridgeBlocker

Reported:	2003-07-27 04:21 UTC by Alexandre Oliva
Modified:	2007-11-30 22:10 UTC (History)
CC List:	9 users (show)
Fixed In Version:	I've verified that this problem is fixed in kernel-2.6.3-2.1.253.2.1
Clone Of:
Environment:
Last Closed:	2004-03-19 16:28:40 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
strace output for various programs during a graphical boot (1.54 KB, application/x-bzip2) 2003-07-27 04:22 UTC, Alexandre Oliva
patch that works around the hang (442 bytes, patch) 2003-10-25 19:24 UTC, Alexandre Oliva

Description Alexandre Oliva 2003-07-27 04:21:04 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703

Description of problem:
If I boot a Toshiba Tecra 8100 with Severn, with acpi=off nogui, it works
perfectly well.  If I drop nogui so that rhgb runs, it hangs on (almost?) every boot, unless I
switch to VT1 before the graphical boot is completed.  The last thing I see if I
run ps from an ssh session is xprefdm running rhgb-client --quit.  It turns out
that, if I ssh into the box while rhgb is in progress and run rhgb-client --quit
myself, the box hangs immediately as well, just after the screen switches to
VT1.  Ditto if I press Ctrl-Alt-Backspace.  Even if I switch to VT1 after some
point in the graphical boot, it hangs.  If I switch to VT1 while Kudzu is
probing the hardware, it doesn't hang, but it sometimes won't make progress any
more.  Even moving the mouse seems to do it at times.  If I switch to VT1 after
it starts cups or apmd, it doesn't hang, but it does hang when I switch back to
VT8.  This is looking more and more like a bug in X, but I never get these
problems with X in VT7.
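
The boot options named in this description (acpi=off, nogui) are passed on the kernel command line, which Linux exposes at /proc/cmdline. A minimal sketch of splitting such a line into flags and key=value pairs follows, so the options used for a given boot can be recorded with a report. The helper name parse_cmdline and the sample line are illustrative assumptions, not something taken from this machine.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags such as "nogui" map to None; "acpi=off" maps to "off".
    Only the first "=" separates name from value, so "root=LABEL=/"
    keeps "LABEL=/" as its value.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Hypothetical command line modelled on the options named in the report.
example = "ro root=LABEL=/ acpi=off nogui"
params = parse_cmdline(example)
print(params["acpi"])      # off
print("nogui" in params)   # True
```

On a live system the same function could be fed the contents of /proc/cmdline instead of the sample string.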

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Boot this machine without disabling rhgb
2. While the machine is in VT8, run rhgb-client --quit (or wait for it to be run
by xprefdm)

Actual Results:  The machine hangs

Expected Results:  It shouldn't

Additional info:

Comment 1 Alexandre Oliva 2003-07-27 04:22:06 UTC

Created attachment 93175 [details]
strace output for various programs during a graphical boot

Comment 2 Alexandre Oliva 2003-07-31 20:54:34 UTC

FWIW, the hang does not occur when using a vanilla 2.4.21 kernel.  There's no
acpi support, of course, and the pcmcia network card doesn't work (why?  the
needed modules are there!), but rhgb completes and gdm starts successfully.

Comment 3 Alexandre Oliva 2003-08-01 07:28:49 UTC

Ok, I figured out why pcmcia wouldn't load: because I hadn't deleted
/lib/modules/<release>/pcmcia.  As soon as I did, the network card would be
enabled on boot again, but then rhgb would hang when exiting.  We seem to have
some conflict between rhgb and pcmcia :-(

This is with acpi=off, btw.  I'm changing this to kernel, since it now seems to
be another symptom of problems in the pcmcia modules in the kernel, like the
instant hang I get when acpi is enabled (bug 100528), just this one hangs only
at the end of rhgb.

Comment 4 Alexandre Oliva 2003-08-01 07:38:16 UTC

Eeek.  And the latest kernel erratum for Shrike (2.4.20-19.9), installed on a
Severn tree, hangs just the same when rhgb is enabled on this machine.

Comment 5 Alexandre Oliva 2003-09-03 23:14:06 UTC

Still broken in kernel-2.4.22-20.1.2024.2.36.nptl, in case it matters (it
probably does)

Comment 6 Jonathan Blandford 2003-10-08 17:30:39 UTC

Are you seeing this with both the latest rhgb and the mount /dev/pts line?

Comment 7 Alexandre Oliva 2003-10-08 20:45:06 UTC

It still hangs after installing today's updates, including rhgb-0.10.2-1.

Comment 8 Bill Nottingham 2003-10-08 20:46:58 UTC

Grab initscripts-7.36-2 and kudzu-1.1.32-1 (or later)

Comment 9 Alexandre Oliva 2003-10-09 05:47:52 UTC

Woohoo!  That fixed it!

Comment 10 Alexandre Oliva 2003-10-09 05:57:05 UTC

Argh!  I spoke too soon.  It actually made it all the way to opening the GDM
login screen, but then the system froze as before.  Second time, it froze even
before X for GDM started.  I.e., no change :-(

Comment 11 Bill Nottingham 2003-10-09 17:08:07 UTC

This sounds like more of a kernel or X issue at this point.

Comment 12 Mark Heslep 2003-10-09 22:15:11 UTC

Same problem occurs for me attempting to start X (startx) from run level 3. 
Machine hangs, num lock won't work.  I believe it started just _after_ I grabbed
the latest kudzu & initscripts from freshrpms.net this afternoon

Comment 13 Tom Diehl 2003-10-10 06:34:12 UTC

I am seeing this on an athlon machine with a via chipset. It hangs when X for
gdm starts. The numlock will not respond, the display is garbled. One thing I
did find out is that if I move the mouse, normal operation is restored. acpi on
or off makes no difference.

Comment 14 Jonathan Blandford 2003-10-13 18:23:56 UTC

*** Bug 101724 has been marked as a duplicate of this bug. ***

Comment 15 Michael K. Johnson 2003-10-21 20:17:21 UTC

Looks like X to me...

Comment 16 Mike A. Harris 2003-10-21 21:54:18 UTC

Realistically for this kind of hardware specific problem, I'm not sure if
or how myself or anyone else here at Red Hat will be able to debug and fix
this problem without having the hardware physically in front of them and
running things through a debugger, etc.

I don't have this hardware available to do that, so someone who does will
have to either:

- Narrow the problem down and prove it is X, preferably with specific details
  of where it is in X that is causing the problem.

or

- Send me this hardware (Ontario, Canada) to use for debugging purposes for
  an indeterminate amount of time.

Who all out there can reproduce this, and can you please narrow it down, and
report back what the specific problem is?  Personally I don't see any 100%
proof present that this is an XFree86 bug, but it's certainly a possibility.

Awaiting feedback...

Comment 17 Mike A. Harris 2003-10-21 21:59:01 UTC

Also, let me assume it is X for a minute...

Try disabling 2D acceleration with:

    Option "noaccel"

If that works around the problem, try the XaaNo options one at a time from
the XF86Config manpage after commenting out the noaccel option.  Try to find
which if any solve the problem.

If either of these handles the issue it would be a video driver bug, which
is workaroundable in the driver, but only if someone who has the actual
hardware can test this now and provide details as to what works and what
doesn't.

HTH, TIA

Comment 18 Mark Heslep 2003-10-21 22:41:03 UTC

Well, scratch me off (Comment #12).  I have stopped having X hangups w/ the
radeon driver (Fire GL R300).  I'm afraid I lost track of exactly what upgrade
solved the problem but it disappeared 1-2 weeks ago and I sync to Rawhide every
2-3 days.  (Still no joy w/ dual head radeon)

Comment 19 Alexandre Oliva 2003-10-25 18:53:58 UTC

The problem is not in X, but in the kernel.  I found out that if I switch to vt1
before loading $PCIC in /etc/init.d/pcmcia (that's yenta_socket on this box),
and switch back to vt8 right after it, the machine no longer hangs.  This may
very well be related with the other problems on this machine, that have required
noacpi or pci=noacpi in the past (bug 100528).
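
The manual sequence described in this comment (switch to a text VT, load the PCIC module, switch back) can be sketched as a short command plan. This is only an illustration of those steps, not the contents of the patch attached later in this bug; the function names are made up for the example, and chvt and modprobe need root, so the sketch defaults to a dry run that only prints the commands.

```python
import subprocess


def pcic_load_plan(pcic_module="yenta_socket", text_vt=1, graphical_vt=8):
    """Return the commands for the workaround, in order.

    Defaults follow the comment: yenta_socket is the $PCIC module on the
    reporter's box, VT1 is the text console and VT8 is where rhgb runs.
    """
    return [
        ["chvt", str(text_vt)],        # leave graphical mode first
        ["modprobe", pcic_module],     # load the socket driver in text mode
        ["chvt", str(graphical_vt)],   # return to the graphical boot screen
    ]


def run_plan(plan, dry_run=True):
    """Print the plan, or execute it when dry_run is False (needs root)."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run_plan(pcic_load_plan())
```

Keeping the plan as data makes it easy to check the ordering before anything is executed on a machine that may hang.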

Comment 20 Alexandre Oliva 2003-10-25 19:22:18 UTC

Some more info.  The problem only occurs when a 3Com Megahertz 3CXFE574BT card
is inserted in a PCMCIA socket.  In fact, I found out that, when the machine
hangs, removing the card brings it back to life.  But then, inserting it back
will freeze the machine again upon the next text-to-graphical-mode switch. 
Unloading the 3c574_cs module is not enough to fix the problem.  It's necessary
to unload the yenta_socket module, and have it run again while in text mode to
fix it, and then the fix is permanent (well, until the next reboot).  Even if I
stop pcmcia (such that even yenta_socket is unloaded) and load it again while X
is active and visible in VT7, the problem no longer occurs.  Maybe it has to do
with the fact that, when I load yenta_socket in text mode it prints messages
such as:

Yenta IRQ list 06b0, PCI irq11
Socket status: 30000007
Yenta IRQ list 06b0, PCI irq11
Socket status: 30000011

I wonder if the problems could be related with the fact that the video card
seems to use system memory, and Something Bad (TM) happens when the module
attempts to write the messages above in text mode while we're in graphical mode.
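
For readers comparing their own yenta_socket output with the lines quoted above, here is a small sketch that decodes the "IRQ list" value. It assumes the four hex digits are a bitmask in which bit N set means IRQ N is usable by the socket; that reading is an assumption about the driver's message, and the function names are invented for the example.

```python
import re


def irqs_from_mask(mask: int) -> list:
    """List the IRQ numbers (0-15) whose bit is set in the mask."""
    return [irq for irq in range(16) if mask & (1 << irq)]


def parse_yenta_line(line: str):
    """Return (irq_list, pci_irq) for a 'Yenta IRQ list' line, else None."""
    match = re.search(r"Yenta IRQ list ([0-9a-fA-F]{4}), PCI irq(\d+)", line)
    if match is None:
        return None
    return irqs_from_mask(int(match.group(1), 16)), int(match.group(2))


print(parse_yenta_line("Yenta IRQ list 06b0, PCI irq11"))
# ([4, 5, 7, 9, 10], 11)
```

Under that assumption, both sockets on this machine report the same candidate IRQs and share PCI irq 11.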

Comment 21 Alexandre Oliva 2003-10-25 19:24:50 UTC

Created attachment 95480 [details]
patch that works around the hang

Comment 22 John McBride 2003-11-14 05:14:41 UTC

[comment added at the request of Alexandre]
My networking does not initialize properly on FC1 after a clean
install. This notebook has a Netgear FA511 10/100 pcmcia card and
onboard ATI video. This did not happen under RH9. Until I applied the
patch I had to either reinsert/hotplug the network card (after the
boot completed) or set GRAPHICAL=no in /etc/sysconfig/init (and
reboot) to get the ether up. Here are the lines from dmesg and XFree86
log after the patch:

eth0: ADMtek Comet rev 17 at 0xc88e7000, 00:10:7A:6B:19:21, IRQ 10.
 
(--) PCI:*(1:0:0) ATI Technologies Inc 3D Rage LT Pro AGP-133 rev 220,
Mem @ 0xd8000000/24, 0xd9000000/12, I/O @ 0x8000/8, BIOS @ 0x000c0000/17

Comment 23 Alexandre Oliva 2003-11-14 12:38:35 UTC

This pretty much rules XFree86 out of the picture, since we use
completely different video cards.  The network cards are also totally
unrelated, so yenta_socket (can you confirm you're using this module)
is my prime suspect.

Comment 24 Robert Brimhall 2003-11-23 00:24:26 UTC

I get this same problem with rhgb acpi=on and my Orinoco card plugged
in. Remove the card and the machine unfreezes. I have yenta socket as
well. By the way, where do you apply the patch?

Comment 25 Miloš Komarčević 2003-11-25 00:39:54 UTC

I guess this covers my bug 106838 as well.

Just tried Alexandre's workaround and got a functional system on boot
for the first time with rhgb, 3c59x and cs4232 working ok as before
Severn.

Comment 26 David Morse 2003-12-18 05:06:53 UTC

FC1, fully up-to-date (as of 12/17/03), Dell Latitude CPxH w/ Linksys 
PCM200 (tulip).

I had the exact same symptoms as John McBride reported here:
http://www.redhat.com/archives/fedora-list/2003-November/msg01073.html
(apparently the Linksys PCM200 and Netgear FA511 use the same chip, 
as mine also reports ADMtek Comet rev 17)

Symptom: with rhgb, my PCM200 NIC fails with these messages repeatedly:
eth0: Transmit timed out, status fc67c057, CSR12 00000000, resetting...

Either disabling graphical boot or applying Alexandre's patch fixes
this issue.

Comment 27 Alexandre Oliva 2003-12-18 05:34:37 UTC

FWIW, the same problem was present with
kernel-2.6.0-0.test11.1.13.i686.rpm

Comment 28 J. Erik Hemdal 2004-01-17 16:37:14 UTC

I see this issue using a Dell TrueMobile 1050 PCMCIA card on an 
Inspiron 1100.  It uses the Intersil driver.

My system also uses a portion of system RAM for the (awful) Intel 
video subsystem.  I get the hang just after starting X.  Removing the 
card allows the boot to complete; then plugging it back in brings 
back eth1.

I am using yenta-socket.

Comment 29 J. Erik Hemdal 2004-01-25 21:05:21 UTC

I also get a hang on shutdown when I try to shutdown ntpd while using
the wireless card.  If I disable the card before shutting down, then I
get shutdown failures of processes like ntpd, but I do get a clean
shutdown.

Comment 30 Persio Barros 2004-02-04 11:59:03 UTC

Same problem here. Network hangs after boot with card inserted.
My system is:
KDS Notebook, FC1, Kernel 2.4.22-1.2149.nptl, Video Trident CyberBlade Ai1
I tried 3 different net cards:
3Com 3CXFE575CT
D-Link DFE650
D-Link DWL650+ (Wireless)
Only the second one (DFE650) worked all right. The two other cards
needed Oliva's patch, or to insert them only after the boot to get
them working.
The main difference between the DFE650 card and the others is that it
is a 16bit pcmcia, while the others are 32bit pcCards. Maybe this
information helps to find out what is the cause of the bug.
One more piece of information: I noticed another misbehavior in cardctl. It should
emit a beep when the card is inserted and recognized, and another beep
when the proper modules are loaded and the initscripts executed. This
is also supposed to happen in the boot process when the pcmcia service is
started. When the card is not recognized, cardctl is supposed to emit a
lower-pitched beep.
With the DFE650 card the two beeps are emitted as expected, but with
the two other cards there are no beeps, even when the card is inserted
after the boot.
Hope this helps.
Persio

Comment 31 Alexandre Oliva 2004-02-26 03:44:33 UTC

It's getting better in current rawhide (kernel 2.6.3-1.106). 
Preloading yenta-socket in initrd, with acpi disabled, it boots
perfectly well and the network card works.  Working around the
incompatibility of the current /etc/init.d/pcmcia with kernel 2.6's
/proc, as suggested in bug 116205, however, it will sometimes, but not
always, work.  At least once I noticed that cardmgr had detected only
1 socket, while the notebook has two, and the network card was in the
other.  Restarting cardmgr was enough to get the network card to work.

Comment 32 Alexandre Oliva 2004-03-25 17:29:56 UTC

Pre-loading yenta-socket is no longer needed, but pci=noacpi still is
on this particular laptop, that is known to have buggy acpi tables.  I
suppose we can leave this closed, even though I closed it by mistake
forgetting I still had the pci=noacpi flag in the boot command line.
