| Summary: | I/O errors on boot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Joe Pope <pope_svr4> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.1 | CC: | agk, bmarzins, dwysocha, heinzm, prajnoha, prockai, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-09-30 14:25:04 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |

Description
Joe Pope
2011-10-05 13:41:35 UTC
This looks like it's the same issue as Bug 690523. If that's the case, there are two sources of your errors. The first is that the rdac scsi device handler module was getting loaded after the qlogic driver, which kept the devices from getting set up correctly when they were initially discovered and caused a lot of IO errors. The second issue is likely that multipathd is getting started before all of the devices have been discovered. When this happens, if multipath sees a passive path first, it will activate that path. If there is IO going to the formerly active path, this will be failed back.

To make sure that this is what you are seeing, you could try disabling multipath and verifying that the errors stop. The easiest way to do this is to run

# chkconfig multipathd off

and remove /etc/multipath.conf and then remake the initramfs. You should make a backup of /etc/multipath.conf and your current initramfs. With this done, multipath will be disabled. If you reboot and still see errors, they are being caused by something other than multipath.

When I remake the initramfs, should I leave out the --preload of scsi_dh_rdac? What about the wwids file in /etc/multipath, should that be removed?

Turned multipathd off, removed multipath.conf, rebuilt initramfs. On reboot the server never finished booting. It continuously scrolled I/O error messages as follows for all of the devices:

end_request: I/O error, dev sdcz, sector 0
end_request: I/O error, dev sdcz, sector 15619784576

Then udev started and the errors changed but continued as follows for all of the devices:

Buffer I/O error on device sdev, logical block 244059105
ERROR: pdc: reading /dev/sdev [Input/output error]

I finally had to force a restart after 20+ minutes. To actually get the server to boot, I had to modify the kernel line in grub.conf and add the following:

rdloaddriver=scsi_dh_rdac

Once I added that line the server still scrolled some I/O errors, but not nearly as many, and it booted.
I also received the following errors during this boot:

udevd-work[562]: rename(/dev/disk/by-id/scsi-360080e500017f0e60000a5894baafeeb.udev-tmp, /dev/disk/by-id/scsi-360080e500017f0e60000a5894baafeeb) failed: No such file or directory
udevd-work[562]: rename(/dev/disk/by-id/wwn-360080e500017f0e60000a5894baafeeb.udev-tmp, /dev/disk/by-id/wwn-360080e500017f0e60000a5894baafeeb) failed: No such file or directory

(In reply to comment #3)
> When I remake the initramfs should I leave out the --preload of scsi_dh_rdac?
> What about the wwids file in /etc/multipath, should that be removed?

You should include "--preload scsi_dh_rdac", otherwise you will hit those errors. You could also include "--nompath". But removing /etc/multipath.conf should be enough. You don't need to remove the /etc/multipath directory. Dracut and the init scripts only check if the config file is there to determine if multipathing should be started.

So, did you see more IO errors with multipath enabled (and the scsi device handler preloaded), or was it the same? Once you've booted up, can you verify that multipathd really isn't running, and that

# multipath -l

shows no devices. If multipath really was disabled, then I'm not sure where those errors are coming from. It could be that without multipath there, LVM is probing the passive paths. Or it could be something else completely.

The --nompath is not a valid option to mkinitrd. I am running the tests from comment #6 now and will post results shortly.

And yes, multipath was disabled when I ran the tests in comment #4. Just verified.

(In reply to comment #7)
> The --nompath is not a valid option to mkinitrd.

Sorry. That should be "-o multipath", but it shouldn't be necessary if you removed /etc/multipath.conf before you remade the initramfs.

I removed the multipath.conf, turned off multipathd and rebuilt the initramfs with the "--preload scsi_dh_rdac" option. I got the same results as in comment #4.
However, if I rebuild the initramfs with the "--preload scsi_dh_rdac" option and enable multipath, the server boots in normal time. It still shows I/O errors and some of the udev errors, but considerably fewer I/O errors. I will attach some docs this evening that have excerpts from dmesg to see if anything can be gleaned from that.

Interestingly enough... When I had multipathd off and the "--preload scsi_dh_rdac" option built into the initramfs, the system would not boot, as stated in comment #10. The only way to get it to boot was to modify the kernel line with "rdloaddriver=scsi_dh_rdac". Question: If I built the initramfs to preload the scsi_dh_rdac module, shouldn't it have already been loaded? Why did I have to add the rdloaddriver option as well to get it to boot?

Created attachment 526654 [details]
config files
Page 1 of config files and command output
Created attachment 526655 [details]
command output
Page 2 of command output
Created attachment 526657 [details]
command output
Page 3 of command output
Created attachment 526658 [details]
command output
Page 4 of command output
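
For anyone triaging a similar boot log, the two kernel message shapes quoted earlier in this report ("end_request: I/O error, dev ..." and "Buffer I/O error on device ...") can be tallied per device to see which devices are the noisy ones. The following is only a minimal sketch: the function name `count_io_errors` is made up for illustration, and the patterns are assumed to cover just the two message formats quoted in this thread.

```python
import re
from collections import Counter

# Patterns derived from the two kernel messages quoted in this report.
# Other kernels or drivers may word these lines differently (assumption).
_ERROR_PATTERNS = (
    re.compile(r"end_request: I/O error, dev ([^,\s]+), sector \d+"),
    re.compile(r"Buffer I/O error on device ([^,\s]+), logical block \d+"),
)


def count_io_errors(log_text):
    """Return a Counter mapping device name to number of I/O error lines."""
    counts = Counter()
    for line in log_text.splitlines():
        for pattern in _ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # one line is counted at most once
    return counts


if __name__ == "__main__":
    # The three lines below are the ones quoted in this report.
    sample = (
        "end_request: I/O error, dev sdcz, sector 0\n"
        "end_request: I/O error, dev sdcz, sector 15619784576\n"
        "Buffer I/O error on device sdev, logical block 244059105\n"
    )
    for device, n in sorted(count_io_errors(sample).items()):
        print(device, n)
```

Feeding it saved `dmesg` output instead of the sample string gives a per-device count that can be compared between boots with and without the scsi_dh_rdac preload.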
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.
Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
(In reply to comment #16)
> Since RHEL 6.2 External Beta has begun, and this bug remains
> unresolved, it has been rejected as it is not proposed as
> exception or blocker.
>
> Red Hat invites you to ask your support representative to
> propose this request, if appropriate and relevant, in the
> next release of Red Hat Enterprise Linux.

So does this mean there will be no forthcoming help with this issue? The problem still exists. What can I do for support on this bug?

As far as having the default initramfs make sure that the scsi device handlers have been loaded, that is being dealt with in Bug 690523. Since there is a workaround (manually running dracut with the --preload option) and the required code is not upstream yet, this was pushed back to 6.3.

I'm not sure what the difference is between rdloaddriver and the --preload dracut option. I can take a look.

As for the remaining error messages, they may still be caused by multipath as I speculated in Comment 2, but I noticed something odd while looking through those four pages of output. All of the error messages on those pages were from scsi devices that weren't listed as multipath paths. When you boot up with multipath running, can you capture the error messages? Once you are booted up, run

# cat /proc/partitions
# multipath -ll

I'm interested in seeing which devices are actually causing the errors on bootup.

I added the following option to the kernel line in grub.conf: "rdloaddriver=scsi_dh_rdac". I am using the original initramfs generated when the OS was installed. The initramfs does NOT have the "--preload=scsi_dh_rdac" option built in. The server boots in normal time and displays fewer I/O and udev errors, but they are still there.

I will be attaching the output of the "multipath -ll" and "cat /proc/partitions" commands. I will also be attaching a large portion of the "dmesg" output.

Created attachment 527559 [details]
multipath -ll output
Output: multipath -ll
Created attachment 527560 [details]
cat /proc/partitions output
cat /proc/partitions output
Created attachment 527568 [details]
dmesg output zip 1
dmesg output zip 1
Created attachment 527569 [details]
dmesg output zip 2
Created attachment 527570 [details]
dmesg output zip 3
Created attachment 527571 [details]
dmesg output zip 4
Created attachment 527572 [details]
dmesg output zip 5
Created attachment 527573 [details]
dmesg output zip 6
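
The comparison requested above (which devices produce boot errors versus which devices multipath lists as paths) can be done by eye, or roughly automated. The sketch below is an illustration only, not something used in this bug. It assumes that path lines in `multipath -ll` output contain an H:C:T:L address followed by an sdX name and a major:minor pair, and the function names are hypothetical. The path line and its numbers in the usage example are invented; only the two error lines come from this report.

```python
import re

# Assumed shape of a path line in `multipath -ll` output:
#   "<host>:<channel>:<target>:<lun> <sdX> <major>:<minor> ..."
_PATH_LINE = re.compile(r"\b\d+:\d+:\d+:\d+\s+(sd[a-z]+)\s+\d+:\d+\b")

# Matches both error formats quoted in this report.
_ERROR_DEV = re.compile(r"I/O error(?:, dev| on device) ([^,\s]+),")


def multipath_paths(multipath_ll_text):
    """Return the set of sdX names that appear as multipath paths."""
    return set(_PATH_LINE.findall(multipath_ll_text))


def error_devices(log_text):
    """Return the set of device names that appear in I/O error lines."""
    return set(_ERROR_DEV.findall(log_text))


def errors_outside_multipath(log_text, multipath_ll_text):
    """Devices with I/O errors that are not listed as multipath paths."""
    return sorted(error_devices(log_text) - multipath_paths(multipath_ll_text))


if __name__ == "__main__":
    log = (
        "end_request: I/O error, dev sdcz, sector 0\n"
        "Buffer I/O error on device sdev, logical block 244059105\n"
    )
    # Invented path line, for illustration only.
    mp = "| `- 3:0:0:1 sdcz 70:112 active ghost running\n"
    print(errors_outside_multipath(log, mp))  # prints ['sdev']
```

Note that errors reported against a partition (for example a name with a trailing digit) would not match a whole-disk path name; the sketch does not try to normalize those.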
I'm digging through this information right now, but is there any way to repost this as text files instead of PDFs? I would be able to sift through this much faster that way, and I'm not able to convert the PDFs back to text files accurately enough to be helpful.

The servers are on a closed network and that is the only way I have to send the files.

So, looking at the output, all of your errors are happening before multipath is started (it doesn't get started in the initramfs), and the I/O errors are happening on the passive paths when LVM is scanning the devices. Could you try rebuilding the initramfs with multipath included? First, make sure /etc/multipath exists, then run

# dracut -a multipath

Then LVM should notice that the devices are multipathed and talk to the multipath devices instead of the scsi devices. The errors like:

"unable to read partition table"
"unable to read RDB block 0"
"unknown partition table"

that happen when a passive path is initially discovered are probably unavoidable.

What is the difference between mkinitrd and dracut? They both create an initramfs file. Does dracut do things differently? I will test the dracut command tomorrow and let you know.

mkinitrd is now just a bash script wrapper around dracut.

I ran "dracut -a multipath" and removed the "rdloaddriver=scsi_dh_rdac" parameter from grub.conf. On reboot I get a bunch of "end_request: I/O error" messages and I see a bunch of "/dev/sd##: read failed after 0 of 4096 at 0: Input/output error" messages. Once multipathd starts there are no more error messages. The server boots in normal time as well. If this is normal behaviour I am fine with that, but I just wanted to make sure. I will use the dracut command for all my servers. On shutdown I also see the "...read failed..." messages, but the server reboots just fine.

Are the results I mentioned in comment #33 normal? If you need more info or logs, please let me know.
Before multipath starts, there's no way for any programs to know that they aren't supposed to use the passive paths, so that's probably normal.

Works for me. I will just rebuild my initramfs files on each server using the dracut command you sent and call it a day. Thanks for the help.
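
Since the outcome here was to rebuild the initramfs with the multipath module on each server, the steps can be written down as a small helper that only prints the commands for review. This is a sketch under stated assumptions, not the procedure actually run in this bug: the thread gives only `dracut -a multipath` and the advice to back up the current initramfs. The `-f` flag (overwrite an existing image), the explicit image path, the RHEL 6 style `/boot/initramfs-<kernel version>.img` naming, and the `.bak` suffix are all added here for illustration, as is the function name.

```python
import platform
import shlex


def rebuild_commands(kernel_version, boot_dir="/boot"):
    """Return the argv lists for backing up and rebuilding the initramfs.

    Nothing is executed; the caller decides whether to run the commands.
    """
    # Assumed RHEL 6 style image name; adjust if your layout differs.
    image = f"{boot_dir}/initramfs-{kernel_version}.img"
    return [
        # Keep a copy of the current image, as advised earlier in the thread.
        ["cp", "-p", image, image + ".bak"],
        # "-a multipath" is from the thread; "-f" and the explicit image
        # and kernel version arguments are assumptions of this sketch.
        ["dracut", "-f", "-a", "multipath", image, kernel_version],
    ]


if __name__ == "__main__":
    # Print the commands for the running kernel so they can be reviewed
    # before being run as root.
    for argv in rebuild_commands(platform.release()):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing keeps the sketch safe to try on any machine; on a real server the output would be checked against the actual image name in /boot first.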