Bug 57476 - init q can hang the system requiring a hard reboot
Summary: init q can hang the system requiring a hard reboot
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-12-13 16:23 UTC by Don Knott
Modified: 2007-04-18 16:38 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-12-17 15:46:22 UTC
Embargoed:



Description Don Knott 2001-12-13 16:23:05 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)

Description of problem:
After editing inittab and running init q to reinitialize, the system may lock 
up solid.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Edit inittab
2. init q to reread inittab
3. System may hang needing to be power cycled
	

Additional info:

System has a Digi RAS4 4bri digital modem card that uses mgetty and pppd 
to provide isdn/analog access.

The last syslog entries are:

Dec 13 09:03:17 temphost pppd[2664]: Terminating on signal 15.
Dec 13 09:03:17 temphost pppd[2689]: Terminating on signal 15.
Dec 13 09:03:17 temphost mgetty[1401]: failed dev=ttyG0_03, pid=1401, got signal 15, exiting
Dec 13 09:03:17 temphost mgetty[2148]: failed dev=ttyG0_02, pid=2148, got signal 15, exiting
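
For anyone triaging similar logs, here is a minimal sketch of pulling out which processes were signalled just before a hang. It is not part of the original report. It assumes classic syslog lines with each entry on a single line, and the names SYSLOG_RE, SIGNAL_RE and signalled_processes are made up for illustration.

```python
import re

# Classic syslog layout: "Mon DD HH:MM:SS host prog[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)
SIGNAL_RE = re.compile(r"signal (\d+)")


def signalled_processes(log_text):
    """Return (program, pid, signal) tuples for entries that mention a signal."""
    results = []
    for line in log_text.splitlines():
        match = SYSLOG_RE.match(line.strip())
        if not match:
            continue  # skip blank or non-syslog lines
        sig = SIGNAL_RE.search(match.group("msg"))
        if sig:
            results.append(
                (match.group("prog"), int(match.group("pid")), int(sig.group(1)))
            )
    return results


# The four entries quoted in this report, one per line.
SAMPLE = """\
Dec 13 09:03:17 temphost pppd[2664]: Terminating on signal 15.
Dec 13 09:03:17 temphost pppd[2689]: Terminating on signal 15.
Dec 13 09:03:17 temphost mgetty[1401]: failed dev=ttyG0_03, pid=1401, got signal 15, exiting
Dec 13 09:03:17 temphost mgetty[2148]: failed dev=ttyG0_02, pid=2148, got signal 15, exiting
"""

for prog, pid, sig in signalled_processes(SAMPLE):
    print(prog, pid, sig)
```

Signal 15 is SIGTERM, so every entry here shows a process being asked to terminate rather than crashing on its own.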

Comment 1 Bill Nottingham 2001-12-13 17:37:37 UTC
If the system hangs hard, that most likely indicates a kernel driver problem
of some sort.

Comment 2 Arjan van de Ven 2001-12-13 17:39:49 UTC
Well.... unless you kill init....

Comment 3 Don Knott 2001-12-13 17:51:24 UTC
The Digi Datafire RAS4 BRI card has a kernel module called dgdm that makes the 
ports /dev/ttyG_0x available. I think the crash occurs when there is an active 
ppp session and I try to do an 'init q' to reread changes to the inittab.

I've also found that 'service dgdm stop' doesn't remove the dgdm module and I 
have to rmmod dgdm before I can do 'service dgdm start'.
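
A rough sketch of how one might confirm that a module is still resident after the service script has run, so that the manual rmmod is only issued when needed. It assumes the /proc/modules format in which the module name is the first whitespace-separated field. The function names are hypothetical, and the sizes and use counts in the sample text are invented.

```python
def loaded_modules(modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first field is the module name
    return names


def still_loaded(module, modules_text):
    """True if `module` appears in the given /proc/modules-style text."""
    return module in loaded_modules(modules_text)


# Invented sample; on a real system read the text from /proc/modules.
SAMPLE_MODULES = """\
dgdm 100000 4
ppp_generic 20000 1
"""

if still_loaded("dgdm", SAMPLE_MODULES):
    print("dgdm is still loaded; 'rmmod dgdm' is needed before restarting")
```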

It's feeling like this may be the dgdm module causing problems. If it only 
happens on an 'init q' during active ppp sessions I can work around that. Once 
the box is stable and in production I won't be doing 'init q' all the time. 
I'll also contact Digi to get their thoughts on their module.
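
The workaround described above (avoiding 'init q' while a ppp session is active) could be sketched as a pre-check like the one below. This is an illustration only. It assumes a Linux /proc in which /proc/<pid>/stat holds the command name in parentheses, and the helper names comm_from_stat and pids_running are hypothetical.

```python
import os


def comm_from_stat(stat_text):
    """Extract the command name from the contents of /proc/<pid>/stat."""
    start = stat_text.find("(")
    end = stat_text.rfind(")")  # rfind, since the name itself may contain ')'
    if start == -1 or end <= start:
        return None
    return stat_text[start + 1:end]


def pids_running(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                if comm_from_stat(fh.read()) == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return sorted(pids)
```

Calling pids_running("pppd") before 'init q' and postponing the reload whenever the list is non-empty would implement the workaround. As Comment 4 goes on to show, it would not have prevented the lock-ups that occurred without any 'init q'.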

Comment 4 Don Knott 2001-12-13 22:14:42 UTC
System locked up again and I was not making any changes to inittab and did not 
use 'init q'.

I've opened up a case with Digi support to see if they can shed any light on 
the problem.

Comment 5 Don Knott 2001-12-14 15:56:26 UTC
Digi support recommends installing and using the older gcc compiler.

compat-egcs-6.2-1.1.2.16
compat-glibc-6.2-2.1.3.2

They say they have had reports of issues when gcc-2.96 is used.

I've done this and will see if the system is stable for the next few days.




Comment 6 Arjan van de Ven 2001-12-14 16:01:43 UTC
Ehm that compiler is known to not compile 2.4 kernels well; also using a
different compiler for the kernel and modules is a really bad idea ;(

Do you have a pointer to the source of the module? I'll have a look.

Comment 7 Don Knott 2001-12-14 16:17:29 UTC
http://support.digi.com/support/indexes/linux-dfrasb4.html

I had no trouble rebuilding the src rpm with either compiler. I noticed that 
when I built the rpm with gcc-2.96 that the driver wouldn't unload itself and I 
had to do an rmmod to remove it. I haven't seen that problem with the older 
compiler.

Comment 8 Don Knott 2001-12-17 15:46:16 UTC
http://support.digi.com/support/techsupport/unix/linux/rh7xfaq.html

The link above is to the RH 7.x-specific FAQ. It states that RH 7.1 is not 
supported and that the shipped version of gcc 2.96 is broken.

The FAQ also recommends the latest RH 7.2 kernel and source which I have.

I've attempted to implement all of Digi's recommended fixes.
The system is still unstable and locked up over the weekend.

Comment 9 Don Knott 2001-12-31 16:30:35 UTC
Based upon additional information from Digi, this problem has been found not to 
be a Red Hat bug; it is a problem with Digi's drivers not being SMP-safe.

Mail from Digi:

I just received more info from Digi:

"Unfortunately, there are reports of system lock-ups on SMP hardware (not
just related to SMP kernel).  We have been advised that Engineering will
be looking into this matter in January.  If you do not have a single
processor machine or if you need this resolved in a more timely manner, we
will be happy to make arrangements for a product refund."

