Bug 37936 - Kudzu hangs the node during boot procedure if the kernel was built with CONFIG_SCSI_MULTI_LUN=y
Summary: Kudzu hangs the node during boot procedure if the kernel was built with CONFI...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kudzu
Version: 7.1
Hardware: i586
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Bill Nottingham
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-04-26 22:26 UTC by IBM Bug Proxy
Modified: 2014-03-17 02:20 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-05-07 02:22:38 UTC
Embargoed:



Description IBM Bug Proxy 2001-04-26 22:26:25 UTC
I rebuilt the kernel, including the modules and the initrd, after turning on
the CONFIG_SCSI_MULTI_LUN flag (refer to defect #81 for more info).

Upon reboot, the node successfully went through POST and the initial setup
of the SCSI/Fibre Channel adapters. When the node entered runlevel 3 during
the boot procedure, it hung with the following message on the console:
Updating /etc/fstab.

I then rebooted the node into runlevel 1, edited /etc/sysconfig/kudzu to
change SAFE=no to SAFE=yes, and proceeded with the boot into runlevel 3.
This time the node booted correctly without hanging.

What is not clear is what caused kudzu to hang after turning on the flag
CONFIG_SCSI_MULTI_LUN.

There are two workarounds for this one:
1. Edit /etc/sysconfig/kudzu and change SAFE=no to SAFE=yes.
2. Uninstall kudzu if there is no apparent need for it.

Comment 1 Bill Nottingham 2001-04-27 01:18:22 UTC
That's really odd.  That just turns off the serial probe.

If you just run 'kudzu' from the command line, does it still hang?
If you strace it, where is it hanging?

Comment 2 IBM Bug Proxy 2001-04-27 14:42:42 UTC
LTC#82
One more thing from my end.
It appears that editing /etc/sysconfig/kudzu and changing SAFE=no to SAFE=yes
does not work every time. It works on one node, but on two other nodes the
change did not help, and I ended up renaming /etc/rc.d/init.d/kudzu to
/etc/rc.d/init.d/kudzu.bak so that the script would not be executed at boot
time. There is probably a better or more correct way to do this, but I was
just trying to verify whether disabling kudzu during boot lets the machine
boot all the way.
I also did not feel comfortable simply uninstalling the kudzu RPM, since other
system utilities such as Xconfigurator and the cdrom support (look in
/etc/fstab) use kudzu under the covers.
I request that the Red Hat team build a kernel with CONFIG_SCSI_MULTI_LUN=y
(refer to defect 81) and then look at where kudzu is causing the hang. From my
end I can't tell more than that it is kudzu. Upon investigation, the Red Hat
team may be able to suggest a better fix or workaround for it.
Thanks much for your help.

Comment 3 Bill Nottingham 2001-04-27 15:48:47 UTC
The problem is that it's most likely hardware specific.

On one of the machines where it hangs if you run it at boot, does it
also hang if you run it after the system is booted?  If so, what
does 'strace -f /usr/sbin/kudzu' say?

Comment 4 James Washer 2001-05-07 02:22:34 UTC
The problem with this is NOT kudzu itself, but rather updfstab.
The bug is in kudzu:scsi.c:scsiProbe(). The code has a 16k-ish buffer:
       char buf[16XXX];
It then opens /proc/scsi/scsi and does a single read of sizeof(buf) bytes:
       fd = open("/proc/scsi/scsi"...);
       read(fd, buf, sizeof(buf));
Then comes the unfortunate loop control, which depends heavily on finding a
terminating newline:
       while (*chptr != '\n') chptr++;


Now, if (heaven forbid) the output of /proc/scsi/scsi should happen to be
larger than 16k, then there is every likelihood that the buffer will NOT end
with a newline, so the while loop (above) will go spinning off to na-na land.

The cheap and sleazy way to fix this is just to make the buffer exceptionally
large, say 1M. As this is a short-lived boot-time program, I don't think
allocating a 1M buffer that will hardly ever be page-faulted into existence
anyway will be any great risk. Also, one should probably check for the
terminating null in the while loop as well, as in:
while (*chptr != '\n') { if (!*chptr) { /* print dire warning and die */ } chptr++; }
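
The terminator check suggested above can also be written as a bounded scan, so
the loop stops at the end of the bytes actually read instead of relying on a
sentinel being present. What follows is a minimal sketch in C, not kudzu's
actual code; find_newline and count_complete_lines are hypothetical names
introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from kudzu): return a pointer to the next '\n'
 * in the half-open range [p, end), or NULL if the data ends first. */
static const char *find_newline(const char *p, const char *end)
{
    assert(p != NULL && end != NULL && p <= end);
    return memchr(p, '\n', (size_t)(end - p));
}

/* Count the complete (newline-terminated) lines in the first len bytes of
 * buf. The size of any trailing partial line is stored in *partial, so the
 * caller can tell that the data was cut off mid-line. */
static size_t count_complete_lines(const char *buf, size_t len, size_t *partial)
{
    const char *p = buf;
    const char *end = buf + len;
    const char *nl;
    size_t lines = 0;

    while ((nl = find_newline(p, end)) != NULL) {
        lines++;
        p = nl + 1;
    }
    *partial = (size_t)(end - p);
    return lines;
}
```

A caller would pass the byte count returned by read() as len; a nonzero
*partial then signals the truncated-output case described in this comment,
without ever reading past the buffer.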

A more robust fix would involve processing whatever could be read in a pass,
then copying the remainder to the beginning of the buffer and issuing another
read... but I think the 1M (or just make it 5M or so) should do the trick just
fine.
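
The more robust approach described above (process what was read, move the
remainder to the front, read again) might look roughly like the following
sketch. It is an illustration under stated assumptions, not the kudzu source:
for_each_line, line_fn, count_line, struct line_stats and CHUNK are all
hypothetical names, the chunk size is deliberately small, and the sketch gives
up on any single line longer than the buffer and does not retry on EINTR.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 64  /* hypothetical size, kept small for illustration */

typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Example callback: tally the lines seen and their total length. */
struct line_stats { size_t lines; size_t bytes; };

static void count_line(const char *line, size_t len, void *ctx)
{
    struct line_stats *s = ctx;
    (void)line;
    s->lines++;
    s->bytes += len;
}

/* Read fd in fixed-size chunks and call fn once per line (newline excluded).
 * Returns 0 on success, -1 on read error or if a line exceeds CHUNK bytes. */
static int for_each_line(int fd, line_fn fn, void *ctx)
{
    char buf[CHUNK];
    size_t used = 0;   /* bytes of an unfinished line carried over */
    ssize_t n;

    assert(fn != NULL);
    while ((n = read(fd, buf + used, sizeof(buf) - used)) > 0) {
        char *start = buf;
        char *end = buf + used + (size_t)n;
        char *nl;

        /* Hand every complete line in the buffer to the callback. */
        while ((nl = memchr(start, '\n', (size_t)(end - start))) != NULL) {
            fn(start, (size_t)(nl - start), ctx);
            start = nl + 1;
        }

        /* Copy the unfinished remainder to the beginning of the buffer. */
        used = (size_t)(end - start);
        if (used == sizeof(buf))
            return -1;  /* no newline in a full buffer: line too long */
        memmove(buf, start, used);
    }
    if (n < 0)
        return -1;
    if (used > 0)
        fn(buf, used, ctx);  /* final line without a trailing newline */
    return 0;
}
```

Because the scan is bounded by the bytes actually read, the size of
/proc/scsi/scsi no longer matters; only the length of the longest single line
does.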



Comment 5 Bill Nottingham 2001-05-07 16:11:18 UTC
This will be fixed in kudzu-0.99.1-1; we'll just dynamically allocate the buffer
as we go along.
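
Growing the buffer on demand, as this comment describes, could be sketched as
follows. This is an assumption about the general technique, not the code that
shipped in kudzu-0.99.1-1; read_all is a hypothetical helper name, and the
initial capacity and doubling policy are arbitrary choices for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read everything from fd into a heap buffer that
 * doubles whenever it fills up. Returns a NUL-terminated buffer that the
 * caller must free(), with the byte count stored in *len_out, or NULL on
 * allocation or read failure. */
static char *read_all(int fd, size_t *len_out)
{
    size_t cap = 4096;
    size_t len = 0;
    char *buf = malloc(cap);
    ssize_t n;

    assert(len_out != NULL);
    if (buf == NULL)
        return NULL;

    /* Always keep one byte free for the terminating NUL. */
    while ((n = read(fd, buf + len, cap - len - 1)) > 0) {
        len += (size_t)n;
        if (len == cap - 1) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (n < 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```

With the whole file in memory and a guaranteed NUL terminator, a scan for
'\n' that also stops at '\0' cannot run off the end of the buffer, whatever
the size of /proc/scsi/scsi.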

Comment 6 IBM Bug Proxy 2001-05-10 20:09:31 UTC
When will kudzu-0.99.1-1 be available?

Comment 7 Bill Nottingham 2001-05-10 20:13:16 UTC
It will be available in the next public rawhide push; sometime this week, or
early next week.

Comment 8 Need Real Name 2001-06-15 22:20:10 UTC
Since this has been resolved in the new level of kudzu (0.99.1-1), my question
is this: is there any way of determining when it will get rolled into the
normal Red Hat 7.1 distribution, so that customers who download 7.1 will
receive the new level of kudzu? Thank you. (khake.com)

Comment 9 Bill Nottingham 2001-06-16 02:45:06 UTC
Yes, it should be.

