Bug 485546
Summary: mount command freezes system with removable SATA drives
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.2
Hardware: i386
OS: Linux
Status: CLOSED NOTABUG
Severity: urgent
Priority: low
Reporter: Todd <ToddAndMargo>
Assignee: David Milburn <dmilburn>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: ajb, jfeeney
Doc Type: Bug Fix
Last Closed: 2013-10-30 22:12:06 UTC

Description
Todd
2009-02-14 03:07:59 UTC
Created attachment 332164
lshw of one of my servers
While searching for a workaround to test whether my removable drive was actually in the sleeve before calling the "mount" command, a respondent said that the output of my "lshw" command would be helpful. Please note that two of the three servers have only one partition on /dev/sdb. This output shows the third server, which has two. In my write-up, I refer only to /dev/sdb1, as that is the layout on most of my systems.
-T
Would you please boot with the "log_buf_len=1000000" kernel parameter and then, after booting, run

# echo 509 > /proc/sys/dev/scsi/logging_level

And then please capture and attach "dmesg -c" after step #4 (unmount), and also after step #5 and step #6?

Created attachment 332836
dmesg-c drive mounted
This is dmesg -c with the drive mounted (step 4)
Created attachment 332837
dmesg-c drive dismounted
This is dmesg -c after the drive gets dismounted from step 4
Created attachment 332838
dmesg -c with drive removed
This is dmesg -c with the drive removed. Step 5
Created attachment 332839
dmesg -c trying to mount the removed drive
This is dmesg -c trying to mount the drive when the drive is removed (you have two minutes to run the command before the system freezes). Step 6
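
The capture sequence requested above could be scripted roughly as follows. This is only a sketch: the helper names set_scsi_logging and capture_dmesg and the dmesg-<label>.txt output names are made up for illustration, both helpers need root when used against the real /proc file and kernel log, and the "log_buf_len=1000000" parameter still has to be added to the kernel line by hand at boot.

```shell
#!/bin/sh
# Sketch only: helper names and output file names are hypothetical.

# Write a SCSI logging level (the request above uses 509).
# $1 = level, $2 = target file (defaults to the path given in the request).
set_scsi_logging() {
    echo "$1" > "${2:-/proc/sys/dev/scsi/logging_level}"
}

# Save the kernel ring buffer to dmesg-<label>.txt and clear it ("dmesg -c"),
# so each capture contains only the messages since the previous one.
# $1 = label, e.g. step4, step5, step6
capture_dmesg() {
    dmesg -c > "dmesg-$1.txt"
}

# Intended use, as root, one call after each step:
#   set_scsi_logging 509
#   ...unmount the drive...          capture_dmesg step4
#   ...remove the drive...           capture_dmesg step5
#   ...attempt the mount again...    capture_dmesg step6
```

Because "dmesg -c" clears the buffer after printing, the step 6 capture has to be started before the system freezes.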
Hi All,

I do not know if this is relevant, but Robert Hancock over at kernel.org just wrote me this:

"If the device is in AHCI mode then the ata_piix driver won't load for it - at least it won't in current kernels, I can't say for sure that it won't in the CentOS 5 version. You do need to get it using the ahci driver instead of ata_piix or hotplug definitely won't work. You can try changing the boot initrd to try to load the AHCI driver instead by changing the scsi_hostadapter entry in /etc/modprobe.conf to be ahci instead of ata_piix, then rebuilding the initrd or reinstalling the kernel RPM. However, if the BIOS isn't set up properly for AHCI mode to work, you'll have to either boot up in rescue mode and fix it, or boot up from a different kernel entry in grub."

I am afraid rebuilding initrd is over my head. But maybe someone else will understand something in what he wrote.

-T

My /etc/modprobe.conf:

    alias scsi_hostadapter sata_sil
    alias eth1 e1000e
    alias scsi_hostadapter1 megaraid_mbox
    alias scsi_hostadapter2 ata_piix
    alias snd-card-0 snd-intel8x0
    options snd-card-0 index=0
    options snd-intel8x0 index=0
    remove snd-intel8x0 { /usr/sbin/alsactl store 0 >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-intel8x0

Hi All,

I have been doing a bunch of research and have been corresponding with the folks over at kernel.org. Here is the scoop.

1) My motherboard's BIOS is corked. Supermicro has since added all kinds of AHCI support to the BIOS. (I will be ordering new BIOS chips on Monday.)

2) The AHCI drivers are only loaded "automatically" at installation time. There is no auto-detect after the fact. (But you can do it manually.)

3) From my /etc/modprobe.conf:

    alias scsi_hostadapter sata_sil
    alias scsi_hostadapter1 megaraid_mbox
    alias scsi_hostadapter2 ata_piix

The "ata_piix" driver is the WRONG driver for "AHCI Hot Pluggable" devices.
The driver I need, but did not get at install time because of my crappy BIOS, should be

    alias scsi_hostadapter2 ahci

which won't work properly until I get my new BIOS chips. When working correctly, I should be able to comment out the "ata_piix" driver completely. (And the AHCI driver should work fine with my SATA DVD/CD writer.)

So, the "new" description of the symptom: "mount" will hang for approximately two minutes and then the entire system will freeze when attempting to execute a "mount" on a physically detached Hot Pluggable device while accidentally using the wrong (ata_piix) driver.

New steps to reproduce:

0) It goes without saying, but back up your stuff!

1) Verify that your /etc/modprobe.conf is misconfigured. To reproduce this bug, you should be using only the "ata_piix" driver (alias scsi_hostadapter2 ata_piix) and not the (correct) "ahci" driver (alias scsi_hostadapter2 ahci).

2) Power off your server. Removable SATA drives are only detected on hot plug with the AHCI driver, so the server has to be powered off for the drive to register in /dev.

3) Install a removable (not eSATA) ext3-formatted drive as a second drive (not your root). Make sure it is powered on (the switch on the front of the sleeve is on and locked).

4) Power back on. Make sure the new drive shows up in /dev. Make the appropriate entry in fstab, for instance:

    /dev/sdb1 /lin-bak ext3 defaults,noauto 0 0

Create a /lin-bak directory.

5) Make sure you can mount the drive with "mount /lin-bak". Then unmount the drive ("umount /lin-bak"). Make sure the drive is dismounted by checking mtab ("cat /etc/mtab").

6) Power off the removable SATA drive and/or remove it from the sleeve.

7) Attempt to remount /lin-bak with "mount /lin-bak".

"mount" will hang for about one to two minutes, then your system will hard freeze.

Note: if you are quick enough and repower the device, you can save yourself the freeze-up.

It will take me two to three weeks to upgrade my three servers' BIOS.
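
The checks in steps 1 and 5 above can be sketched as small shell helpers. The function names (hostadapter_drivers, uses_piix_without_ahci, is_mounted) are hypothetical and not part of any Red Hat tool. The helpers only read the files they are given, and they assume the "alias scsi_hostadapterN driver" layout shown in the modprobe.conf listing in this thread.

```shell
#!/bin/sh
# Sketch only: all function names here are made up for illustration.

# Print the driver named by each scsi_hostadapter alias, one per line.
# $1 = modprobe.conf path (defaults to /etc/modprobe.conf)
hostadapter_drivers() {
    awk '$1 == "alias" && $2 ~ /^scsi_hostadapter[0-9]*$/ { print $3 }' \
        "${1:-/etc/modprobe.conf}"
}

# Succeed when ata_piix is listed and ahci is not, which is the
# configuration step 1 asks for in order to reproduce the freeze.
uses_piix_without_ahci() {
    drivers=$(hostadapter_drivers "$1") || return 2
    echo "$drivers" | grep -qx 'ata_piix' && ! echo "$drivers" | grep -qx 'ahci'
}

# Succeed when the mount point appears as the second field of mtab (step 5).
# $1 = mount point, $2 = mtab path (defaults to /etc/mtab)
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/etc/mtab}"
}
```

For example, `uses_piix_without_ahci && echo "ata_piix only: hot plug will not work"` reports the misconfiguration, and `is_mounted /lin-bak || echo "not mounted"` confirms the unmount in step 5. Neither helper can tell whether the drive is physically present.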
I can delay the update on one of the servers (mine) if requested by this forum. Let me know.

Many thanks,
-T

Hi All,

A comment and an update.

A comment. My motherboard's controller is a "hot plugging" controller. Whether or not the BIOS is configured for AHCI, comparing removing a hard drive from my controller to "rip[ing] out a memory chip or your VGA", as has been suggested in other quarters, is a great misunderstanding of the technology. As far as I can tell, AHCI is not the default BIOS setting on any motherboard I have checked. I had no idea I was misconfigured until I took down an accounting firm's server right in the middle of tax season. I had no idea there was any problem hot plugging a device into or out of my motherboard's hot plugging controller without the correct driver being loaded into Linux. From my posting around the web on this issue, very, very few individuals know either.

So, the object of this bug is to give the user an error message, instead of crashing the server, when he tries to mount a missing drive while using the wrong driver on what he thinks is a properly configured system. My uninformed opinion would be to look at the two-minute timeout and find out why it crashes instead of returning safely.

An update. I got my first server's BIOS chip changed. Updating initrd was way simpler than I had feared. (My boot device is a RAID controller, which does not require ahci or ata_piix to operate: I can still boot if I goof ahci or ata_piix. Fortunately, it never came to that.) My removable SATA drive works perfectly under my new BIOS's AHCI function, just like its USB and Firewire hot plugging cousins (and not at all like ripping my memory or VGA out of my system).

I will be updating my second server tomorrow. I will wait on updating my third server (my office's server) until I hear back from you guys.
-T

This Bugzilla has been reviewed by Red Hat and is not planned on being addressed in Red Hat Enterprise Linux 5, and therefore is being closed. If this bug is critical to production systems, please contact your Red Hat support representative and provide a sufficient business justification in order to re-open it.
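
For readers who land on this report with the same misconfiguration, the driver change that Robert Hancock describes earlier in the thread could look roughly like the sketch below. The helper name switch_to_ahci is hypothetical, and the mkinitrd invocation in the closing comment is the usual RHEL 5 form but should be treated as an assumption to check against your own system. As the thread notes, none of this helps unless the BIOS is actually set up for AHCI mode.

```shell
#!/bin/sh
# Sketch only: switch_to_ahci is a made-up helper name.

# Rewrite "ata_piix" to "ahci" on scsi_hostadapter alias lines only,
# keeping a backup of the original file as <file>.bak.
# $1 = path to modprobe.conf (pass a copy first to preview the result)
switch_to_ahci() {
    conf="$1"
    cp -p "$conf" "$conf.bak" || return 1
    awk '$1 == "alias" && $2 ~ /^scsi_hostadapter[0-9]*$/ && $3 == "ata_piix" { $3 = "ahci" }
         { print }' "$conf.bak" > "$conf"
}

# After editing the real /etc/modprobe.conf, the initrd has to be rebuilt
# (or the kernel RPM reinstalled). Assumed RHEL 5 form, run as root:
#   mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
```

Keeping a working kernel entry in grub, or having rescue media at hand, gives a way back if the system no longer boots with the new driver.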