Bug 57476

Summary: init q can hang the system requiring a hard reboot
Product: [Retired] Red Hat Linux Reporter: Don Knott <dknott>
Component: kernel Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG QA Contact: David Lawrence <dkl>
Severity: high Docs Contact:
Priority: medium    
Version: 7.2   
Hardware: i386   
OS: Linux   
Last Closed: 2001-12-17 15:46:22 UTC Type: ---

Description Don Knott 2001-12-13 16:23:05 UTC

Description of problem:
After editing inittab and running 'init q' to reinitialize, the system may 
lock up solid.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Edit inittab
2. init q to reread inittab
3. System may hang and need to be power cycled
	

Additional info:

System has a Digi RAS4 4bri digital modem card that uses mgetty and pppd 
to provide isdn/analog access.

The last syslog entries are:

Dec 13 09:03:17 temphost pppd[2664]: Terminating on signal 15.
Dec 13 09:03:17 temphost pppd[2689]: Terminating on signal 15.
Dec 13 09:03:17 temphost mgetty[1401]: failed dev=ttyG0_03, pid=1401, got signal 15, exiting
Dec 13 09:03:17 temphost mgetty[2148]: failed dev=ttyG0_02, pid=2148, got signal 15, exiting
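
For triage, the entries above can be summarized mechanically. What follows is
a minimal Python sketch, not part of the original report, that pulls the
program name, PID and signal number out of each line. The regular expression
and the function name signalled_processes are illustrative assumptions, and
the wrapped log lines have been rejoined. Signal 15 is SIGTERM.

```python
import re

# Sample lines copied from the syslog excerpt in this report
# (lines that were wrapped in the report are rejoined here).
LOG = """\
Dec 13 09:03:17 temphost pppd[2664]: Terminating on signal 15.
Dec 13 09:03:17 temphost pppd[2689]: Terminating on signal 15.
Dec 13 09:03:17 temphost mgetty[1401]: failed dev=ttyG0_03, pid=1401, got signal 15, exiting
Dec 13 09:03:17 temphost mgetty[2148]: failed dev=ttyG0_02, pid=2148, got signal 15, exiting
"""

# Matches "<program>[<pid>]: ... signal <n>" in a classic syslog line.
ENTRY = re.compile(r"\s(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]:.*\bsignal (?P<sig>\d+)")


def signalled_processes(text):
    """Return (program, pid, signal) for every line that mentions a signal."""
    found = []
    for line in text.splitlines():
        m = ENTRY.search(line)
        if m:
            found.append((m.group("prog"), int(m.group("pid")), int(m.group("sig"))))
    return found


if __name__ == "__main__":
    for prog, pid, sig in signalled_processes(LOG):
        print(f"{prog} pid={pid} signal={sig}")
```

Lines that do not mention a signal are skipped, so the same function can be
pointed at a larger slice of the log.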

Comment 1 Bill Nottingham 2001-12-13 17:37:37 UTC
If the system hangs hard, that most likely indicates a kernel driver problem
of some sort.

Comment 2 Arjan van de Ven 2001-12-13 17:39:49 UTC
Well.... unless you kill init....

Comment 3 Don Knott 2001-12-13 17:51:24 UTC
The Digi Datafire RAS4 BRI card has a kernel module called dgdm that makes the 
ports /dev/ttyG0_0x available. I think the crash occurs when there is an 
active ppp session and I try to do an 'init q' to reread changes to the inittab.

I've also found that 'service dgdm stop' doesn't remove the dgdm module and I 
have to rmmod dgdm before I can do 'service dgdm start'.

It's feeling like this may be the dgdm module causing problems. If it only 
happens on an 'init q' during active ppp sessions I can work around that. Once 
the box is stable and in production I won't be doing 'init q' all the time. 
I'll also contact Digi to get their thoughts on their module.
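
The workaround described above (avoid 'init q' while a ppp session is active,
and check whether dgdm is still loaded after 'service dgdm stop') could be
scripted roughly as follows. This is a hedged Python sketch, not something
from the thread: the helper names and the "no pppd running" policy are
assumptions, it only reads /proc, and it does nothing to fix the driver itself.

```python
import os


def module_loaded(name, proc_modules_text):
    """True if `name` is the first field of any line of /proc/modules content."""
    return any(line.split()[0] == name
               for line in proc_modules_text.splitlines() if line.strip())


def running_process_names(proc_dir="/proc"):
    """Collect command names of running processes from <proc_dir>/<pid>/stat."""
    names = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_dir, entry, "stat")) as f:
                stat = f.read()
            # The command name is the parenthesized second field.
            names.append(stat[stat.index("(") + 1:stat.rindex(")")])
        except (OSError, ValueError):
            continue  # process exited mid-scan, or unexpected format
    return names


def safe_to_reload_inittab(process_names):
    """Assumed policy: only re-read inittab when no pppd is running."""
    return "pppd" not in process_names


if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            print("dgdm loaded:", module_loaded("dgdm", f.read()))
        print("safe to run 'init q':",
              safe_to_reload_inittab(running_process_names()))
```

Keeping the parsing separate from the /proc reads means the checks can be
exercised against saved text on a machine without the card.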

Comment 4 Don Knott 2001-12-13 22:14:42 UTC
System locked up again and I was not making any changes to inittab and did not 
use 'init q'.

I've opened up a case with Digi support to see if they can shed any light on 
the problem.

Comment 5 Don Knott 2001-12-14 15:56:26 UTC
Digi support recommends installing and using the older gcc compiler.

compat-egcs-6.2-1.1.2.16
compat-glibc-6.2-2.1.3.2

They say they have had reports of issues when gcc-2.96 is used.

I've done this and will see if the system is stable for the next few days.




Comment 6 Arjan van de Ven 2001-12-14 16:01:43 UTC
Ehm, that compiler is known to not compile 2.4 kernels well; also, using a
different compiler for the kernel and modules is a really bad idea ;(

Do you have a pointer to the source of the module? I'll have a look.
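
One way to spot a kernel/module compiler mismatch is to compare the compiler
string the kernel records in /proc/version with the compiler used for the
module build. The Python sketch below is illustrative only: it assumes the
older "gcc version X" wording in /proc/version, the sample string is made up
rather than taken from the reporter's machine, and the function names are
hypothetical.

```python
import re


def kernel_gcc_version(proc_version_text):
    """Extract the compiler version from /proc/version-style text, or None."""
    m = re.search(r"gcc version (\S+)", proc_version_text)
    return m.group(1) if m else None


def same_compiler(kernel_text, module_gcc_version):
    """True only when the kernel's recorded compiler version matches exactly."""
    found = kernel_gcc_version(kernel_text)
    return found is not None and found == module_gcc_version


if __name__ == "__main__":
    # Illustrative text only, not taken from the reporter's machine.
    sample = "Linux version 2.4.x (builder@example) (gcc version 2.96 20000731) #1 SMP"
    print("kernel built with:", kernel_gcc_version(sample))
    print("module built with 2.96 matches:", same_compiler(sample, "2.96"))
    print("module built with egcs-2.91.66 matches:",
          same_compiler(sample, "egcs-2.91.66"))
```

On a real system the text would come from reading /proc/version, and the
module-side version from whichever compiler the rpm build actually invoked.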

Comment 7 Don Knott 2001-12-14 16:17:29 UTC
http://support.digi.com/support/indexes/linux-dfrasb4.html

I had no trouble rebuilding the src rpm with either compiler. I noticed that 
when I built the rpm with gcc-2.96, the driver wouldn't unload itself and I 
had to do an rmmod to remove it. I haven't seen that problem with the older 
compiler.

Comment 8 Don Knott 2001-12-17 15:46:16 UTC
http://support.digi.com/support/techsupport/unix/linux/rh7xfaq.html

The link above is to the RH7x-specific FAQ. It states that RH7.1 is not 
supported and that the shipped version of gcc 2.96 is broken.

The FAQ also recommends the latest RH 7.2 kernel and source, which I have.

I've attempted to implement all of Digi's recommended fixes.
The system is still unstable and locked up over the weekend.

Comment 9 Don Knott 2001-12-31 16:30:35 UTC
Based upon additional information from Digi, this problem has been found not to 
be a Red Hat bug; it is a problem with Digi's drivers not being SMP-safe.

Mail from Digi:

I just received more info from Digi:

"Unfortunately, there are reports of system lock-ups on SMP hardware (not
just related to SMP kernel).  We have been advised that Engineering will
be looking into this matter in January.  If you do not have a single
processor machine or if you need this resolved in a more timely manner, we
will be happy to make arrangements for a product refund."