Description of problem:
Got these messages when trying to install:
Running anaconda 13.21.134, the Red Hat Enterprise Linux system installer - please wait.
Anaconda died after receiving signal 9.
install exited abnormally [1/1]
The system will be rebooted when you press Ctrl-C or Ctrl-Alt-Delete.
udevd: worker  failed while handling '/devices/pci0000:00/0000:00:07.0/0000:0d:00.1/host4/rport-4:0-4/target4:0:2/4:0:2:15/block/sdiw/sdiw1'
udevd: worker  unexpectedly returned with status 0x0100
I investigated this (but forget how). This is what I can recall:
1. I found many lvm pvs processes, about 400, each scanning 1 disk or 1 partition.
2. These disks should be assembled by multipath, so no scan is needed on the disks themselves.
These lvm processes might cause an out-of-memory condition, so the anaconda process got killed by the OOM killer.
I know nothing about the anaconda storage part, so please forgive my wild guess.
After I reduced the LUN count to 10 (80 disks) by unmapping LUNs on the storage array, installation works well.
Version-Release number of selected component (if applicable):
RHEL 6.1 GA also has the same problem.
Steps to Reproduce:
1. Mask/map 20+ LUNs to a host via 8 paths.
2. Install OS
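As a rough illustration of the scale involved, the number of /dev/sdX nodes the installer sees is the LUN count multiplied by the path count. The sketch below only does that arithmetic; the function name is made up for this example and it is not part of anaconda.

```python
def sd_path_count(luns: int, paths: int) -> int:
    """Return how many sd block devices appear when every LUN is
    visible through every path (hypothetical helper, not anaconda code)."""
    return luns * paths


# 10 LUNs via 8 paths: the configuration where installation worked.
print(sd_path_count(10, 8))
# 20 LUNs via 8 paths: the lower bound from the reproduction steps.
print(sd_path_count(20, 8))
```

Partitions on those disks add further device nodes on top of this count, which may be how the number of lvm pvs processes reached about 400.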
Please attach /tmp/*log.
I tried the sshd kernel option, but still had no chance to get a shell.
Will the syslog=<IP> option provide sufficient information?
Please switch to tty2, cd into /tmp and then copy out all the *log files there (using scp for instance).
There is no tty2 on the beaker console.
I will try KVM and provide the info to you.
ok thanks. I'll keep this in needinfo until then.
Created attachment 521561 [details]
Anaconda log for installation failure on storageqe-03
Storageqe-04 is busy with HBA driver auto-testing.
This is the log for storageqe-03, which only has 50 LUNs via 4 paths (200 disks).
Not sure why we got so many I/O errors (maybe LVM tries to access the passive link).
I found 3 issues:
1. There are 380+ lvm pvs processes, each scanning a partition.
2. The storage log shows that you try to scan for multipath on /dev/sda1, which might be incorrect: multipath is not based on partitions. For sda1, kpartx will handle it after the mpath device is created.
3. The syslog and other logs show that the installer was suddenly killed, as the last log line is only partial.
My wild guess is that it's out of memory.
The syslog is full of errors like this:
22:14:12,133 ERR kernel:end_request: I/O error, dev sdcr, sector 4063216
22:14:12,133 INFO kernel:sd 4:0:1:46: [sdcr] Unhandled error code
22:14:12,133 INFO kernel:sd 4:0:1:46: [sdcr] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
22:14:12,133 INFO kernel:sd 4:0:1:46: [sdcr] CDB: Read(10): 28 00 00 00 08 08 00 00 08 00
I think it is either a hardware error or some kernel problem.
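To see whether the I/O errors cluster on particular devices (for example, only on paths to a passive controller), the syslog can be summarized per device. This is a minimal sketch that assumes the line format shown in the excerpt above; the function name and the sample list are illustrative only.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   22:14:12,133 ERR kernel:end_request: I/O error, dev sdcr, sector 4063216
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Count I/O error lines per block device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input taken from the syslog excerpt quoted above.
sample = [
    "22:14:12,133 ERR kernel:end_request: I/O error, dev sdcr, sector 4063216",
    "22:14:12,133 INFO kernel:sd 4:0:1:46: [sdcr] Unhandled error code",
]
for dev, n in count_io_errors(sample).most_common():
    print(dev, n)
```

Running it over the full attached syslog instead of the sample would show which sd devices produce the errors.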
I don't think that is a storage array hardware issue.
storageqe-03 and -04 are using different storage arrays. It is unlikely that both have a problem at the same time.
HBA driver testing shows both storage arrays work well.
I will investigate these I/O errors, but I think you might need to find a good way to do the LVM PVS scan. I think kicking off a large number of lvm pvs processes is what this bug is supposed to fix.
LVM should not touch any /dev/sdX before multipath has started, because accessing a passive link will cause the storage array to perform a controller transition. (That might be why the I/O errors come.) Customers will get angry if their storage array keeps bouncing because you access all links (passive and active).
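One way to picture the idea: group the sd devices by WWID and leave alone every device that shares its WWID with another path, since those are multipath members. This is only a sketch under assumptions. The device-to-WWID mapping is made-up sample data, the function names are hypothetical, and this is not how anaconda or LVM actually decide what to scan.

```python
from collections import defaultdict


def group_paths_by_wwid(path_wwids):
    """Map each WWID to the sorted list of sd devices reporting it."""
    groups = defaultdict(list)
    for dev, wwid in path_wwids.items():
        groups[wwid].append(dev)
    return {wwid: sorted(devs) for wwid, devs in groups.items()}


def devices_to_skip(path_wwids):
    """Return sd devices that share a WWID with another device.

    These are assumed to be paths to the same LUN, so a PV scan would
    wait for the multipath device instead of touching them directly.
    """
    skip = set()
    for devs in group_paths_by_wwid(path_wwids).values():
        if len(devs) > 1:
            skip.update(devs)
    return skip


# Hypothetical sample: sda and sdb are two paths to one LUN, sdc is local.
paths = {"sda": "wwid-A", "sdb": "wwid-A", "sdc": "wwid-local"}
print(sorted(devices_to_skip(paths)))
```

In this sample only sdc would be scanned directly; sda and sdb would be left for multipath to assemble first.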
This is an anaconda or initramfs issue. Changing back to the anaconda component.
Clearing the needinfo.
This one has been skipped through several update releases, moving it over to rawhide so we can address it upstream and consider backports to RHEL releases after that.
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.
(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
More information and reason for this action is here:
Is anyone able to test this with F18 or F19-Beta-TC3? There has been a considerable amount of change in the storage since F13.