Bug 104243
Summary: | raid1 install hangs installing kernel
---|---
Product: | Red Hat Enterprise Linux 2.1
Reporter: | James Laska <jlaska>
Component: | kernel
Assignee: | Jason Baron <jbaron>
Status: | CLOSED ERRATA
QA Contact: | Brian Brock <bbrock>
Severity: | high
Priority: | medium
Version: | 2.1
CC: | bruno.verkist, coughlan, dale_kaisner, damorep, gary_lerhaupt, jakub, jturner, katzj, knoel, ltroan, mgalgoci, nate, tao, yngve.svendsen
Hardware: | All
OS: | Linux
Whiteboard: | Dell requests fix for U3.
Doc Type: | Bug Fix
Last Closed: | 2004-02-12 20:55:38 UTC
Bug Blocks: | 107565

Description
James Laska
2003-09-11 18:07:17 UTC
This isn't mkinitrd's fault... it's just running tar to a pipe, and the tar process is then getting SIGPIPEs in an endless loop.

*** Bug 107728 has been marked as a duplicate of this bug. ***

An update from SUN -

Original Problem: We are attempting to install Red Hat AS 2.1 QU2 on a Sun Fire V60x machine (dual Xeon 2.8GHz, 1GB RAM, 2 e1000 network interfaces) over the network (Kickstart). Everything works up to the point where the kernel RPM is unpacked, about 1m 10s after starting to install RPMs. The RPM is unpacked (the progress bar goes to 100%), then the installation hangs. We believe this is related to the dual network interfaces. We have encountered this on other systems, and the workaround has always been to remove the extra network interfaces (up to 4 extra interfaces on some machines), which has always let the installation run smoothly. However, the workaround is not possible here, since the built-in interfaces cannot be disabled individually. Anaconda is anaconda-7.2-68_ELAS.

New Update: By dropping this new kernel RPM (2.4.9-e.27.6) into the network RPM repository and regenerating the hdlist file, I am now able to install without any problems. As an added bonus, we can now skip the postinstall kernel update. However, the original problem remains, and as far as I can see, plain-vanilla non-hacked RHAS 2.1 QU2 can't be installed by Kickstart on a V60x.

*** Bug 73414 has been marked as a duplicate of this bug. ***

*** Bug 109047 has been marked as a duplicate of this bug. ***

Bug 109047 was opened against Dell Issue Tracker 28914, which is a sev 1, and was DUP'd to this bug. Dell requests this be fixed for U3, though it came in after the MUSTFIX deadline and so is not marked as a Blocker bug. Updating this bug's severity to HIGH to reflect sev 1.

I can reproduce this issue 100% of the time on RHEL2.1 U3 beta on a Dell PowerEdge 6650 with a QLogic 2342 card inserted and the ks.cfg file (linux ks=floppy install) that I will attach.

Created attachment 96456
dell ks.cfg
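As background to the opening comment about tar writing into a pipe: the sketch below shows what a writer sees once the reading end of a pipe has gone away. It is only an illustration of the general mechanism, not a reproduction of what mkinitrd or tar do. The function name and the 6144-byte buffer size are illustrative assumptions. A program that treated this error as retryable and kept writing would spin forever, which is one way a hang like the one described could arise.

```python
import errno
import os


def write_to_closed_pipe():
    """Write to a pipe whose read end is closed; return the errno name."""
    # The Python interpreter ignores SIGPIPE at startup, so the failed
    # write surfaces as an OSError (EPIPE) instead of killing the process.
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away before any data is written
    try:
        os.write(write_fd, b"x" * 6144)  # buffer size is illustrative
    except OSError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(write_to_closed_pipe())
```

With SIGPIPE left at its default disposition instead, the same write would terminate the writer rather than return an error.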
Gary, we don't have any 2342 cards in either Centennial or Westford (at least none that we can find), but I've tried an install with a 2312 card and am not seeing any problems. Can you provide some more details about how the card is configured (point-to-point, loop)? Also, I'm assuming that you're putting all of the partitions on the connected array?

FROM BUGZILLA 109047 (marked as DUP of this Bug).

Additional Comment #5 From Gary Lerhaupt on 2003-11-04 16:39
-------
A couple of housekeeping questions. First, where it says "Resolved" above, I assume this means that this specific bugzilla is resolved as a duplicate, not that the underlying issue is resolved (since the underlying issue is still in the new state). Secondly, is 104243 marked as a MUSTFIX for Q3?

------- Additional Comment #6 From Gary Lerhaupt on 2003-11-06 12:37
-------
Updating the severity. Any feedback on my questions above?

------- Additional Comment #7 From Larry Troan on 2003-12-15 10:03
-------
"Resolved" does mean DUP of 104243 -- not underlying problem resolved. Bug 104243 is not a MUSTFIX for U3. Neither was 109047, as it came in beyond the Engineering cutoff... Sue Denham is discussing with Dale at Dell the criticality of getting this resolved and into Update 1. This problem is being tracked by Bug 104243.

There are no cables attached to the 2342 during installation. All partitions are on the local disk.

Apparently this has been replicated on U3 beta 2. What change was made in U3 beta 2 that was thought to address this?

I just found this in /mnt/sysimage/tmp/install.log

    Installing kernel
    tar: error while loading shared libraries: libredhat-kernel.so.1: cannot open shared object file: No such file or directory

---
On an installed system, libredhat-kernel.so.1 is a symlink to libredhat-kernel.so.1.0.1. Neither of these appears to exist in /mnt/sysimage/lib. Thoughts?

Gary,
Can you take a look at /var/log/messages after Anaconda gets up and running? Is there any sign of a problem when the QLA2342 driver is loaded? Are there differences when the QLA2342 is not present?
Tom

/var/state/xkb/syslog seems normal. It reports two lips and two disconnected cables, as I would expect, and continues on its way. What are your thoughts on the tar error above?

Please see this bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=67213
I wonder if the lack of libredhat-kernel, which Adrian suggests might break up2date, is possibly breaking mkinitrd/tar during install.

Adding in a piece of trivia gleaned from a conversation w/ Dale: they had no storage whatsoever connected to the QLogic card, and they weren't installing to it. Rather, they were installing to the baseboard disks, which weren't on the QLA2342.

Received the card this morning, threw it in a 6650, and booted up the install (using the kickstart provided by Dell, with minor tweaks to work in our environment), and I'm not seeing any hangs. We're going to keep poking around here, as well as try to find some of the machines which IS was reporting problems on and see if we can replicate.

Created attachment 96569
syslog from stalled install with sysrq info
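The manual check described above (is libredhat-kernel.so.1 present under /mnt/sysimage/lib, and does its symlink resolve?) can be sketched as a small helper. This is a hypothetical illustration: the function name, its parameters, and the temporary directory used in the demo are assumptions, and it is not part of anaconda or mkinitrd.

```python
import os
import tempfile


def check_shared_lib(root, name="libredhat-kernel.so.1", libdir="lib"):
    """Report whether a shared library is present under an install root."""
    path = os.path.join(root, libdir, name)
    if os.path.islink(path):
        target = os.readlink(path)
        if os.path.isabs(target):
            # Absolute targets are interpreted relative to the install root.
            resolved = os.path.join(root, target.lstrip("/"))
        else:
            resolved = os.path.join(os.path.dirname(path), target)
        if os.path.exists(resolved):
            return "ok: %s -> %s" % (name, target)
        return "dangling symlink: %s -> %s" % (name, target)
    if os.path.exists(path):
        return "ok: %s is a regular file" % name
    return "missing: %s" % path


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for /mnt/sysimage.
    with tempfile.TemporaryDirectory() as root:
        libdir = os.path.join(root, "lib")
        os.makedirs(libdir)
        print(check_shared_lib(root))  # library not installed yet
        real = os.path.join(libdir, "libredhat-kernel.so.1.0.1")
        open(real, "wb").close()
        os.symlink("libredhat-kernel.so.1.0.1",
                   os.path.join(libdir, "libredhat-kernel.so.1"))
        print(check_shared_lib(root))  # symlink now resolves
```

On a real install one would call `check_shared_lib("/mnt/sysimage")` from the installer shell, assuming a Python interpreter is available there.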
Gary,
Can you give us the same syslog information from a system that does not have the QLA2342 card installed? Then we can see what is different in a system that works.
Thanks.
Tom

Hmm. At what point during the install do you want me to run the sysrq on this system without the 2342 installed? This seems arbitrary to me.

In answer to Sue's question to Dale: this issue first showed up in RHEL2.1 Update 2.

I think Tom was looking for the syslog from the install, not the sysrq data.

Regarding the tar error from comment 18: it is something that certainly needs to be fixed, but I'm not yet convinced it is the root of the installer hang. I tried installing kernels on a running system with the libredhat symlink removed, and I get the same errors from tar, but the installation finishes... The initrd is hosed and needs to be re-made, but I don't see any sort of hang.

Can you get us "top" and "vmstat 1" output when the system is hung up in tar, so we can verify the system is looping in the kernel? Also, please get us several "AltSysrq P" outputs so we can see where in the kernel it is looping.
Thanks, Larry

Created attachment 96586
Syslog from system with no QLogic 2342
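As a reading aid for the sysrq-p register dumps in the next comment, the sketch below pulls two things out of one dump: whether the sample was taken in user or kernel mode, and whether EAX holds a negative errno. The helper names are illustrative. The code segment selectors 0x10 (kernel) and 0x23 (user) are the standard i386 Linux 2.4 values. Treating a small negative EAX as a syscall error return is an interpretive assumption, not something the thread states.

```python
import errno
import re

KERNEL_CS = 0x10  # i386 Linux 2.4 kernel code segment selector
USER_CS = 0x23    # i386 Linux 2.4 user code segment selector


def to_signed32(value):
    """Reinterpret an unsigned 32-bit register value as signed."""
    return value - (1 << 32) if value & 0x80000000 else value


def describe_sample(dump):
    """Return (mode, errno_name_or_None) for one 'Show Regs' dump."""
    cs, _eip = re.search(
        r"EIP:\s*([0-9a-f]{4}):\[<([0-9a-f]{8})>\]", dump).groups()
    eax = int(re.search(r"EAX:\s*([0-9a-f]{8})", dump).group(1), 16)
    mode = {KERNEL_CS: "kernel", USER_CS: "user"}.get(int(cs, 16), "unknown")
    signed = to_signed32(eax)
    # Assumption: values in [-4095, -1] look like syscall error returns.
    err = errno.errorcode.get(-signed) if -4095 <= signed < 0 else None
    return mode, err


# One user-mode and one kernel-mode sample copied from the dumps.
USER_SAMPLE = (
    "<4>EIP: 0023:[<4013f284>] CPU: 0EIP is at <4> ESP: 002b:bffeb778 "
    "EFLAGS: 00000282 Not tainted <4>EAX: ffffffe0 EBX: 00000001 "
    "ECX: 080720e0 EDX: 00001800")
KERNEL_SAMPLE = (
    "<4>EIP: 0010:[<c011e1e6>] CPU: 0EIP is at <4> EFLAGS: 00000286 "
    "Not tainted <4>EAX: 00000001 EBX: dcc3e000 ECX: 0000000d "
    "EDX: 00000000")

if __name__ == "__main__":
    print(describe_sample(USER_SAMPLE))
    print(describe_sample(KERNEL_SAMPLE))
```

Under these assumptions the user-mode samples decode to EPIPE, which would fit the earlier description of tar repeatedly writing to a broken pipe. The dumps alone do not prove that reading.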
Below are a couple of sysrq-p's. There is no top or vmstat available during install.

    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0023:[<4013f285>] CPU: 0EIP is at
    <4> ESP: 002b:bffeb77c EFLAGS: 00000282 Not tainted
    <4>EAX: ffffffe0 EBX: 40029ff8 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0023:[<4013f284>] CPU: 0EIP is at
    <4> ESP: 002b:bffeb778 EFLAGS: 00000282 Not tainted
    <4>EAX: ffffffe0 EBX: 00000001 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0010:[<c011e1e6>] CPU: 0EIP is at
    <4> EFLAGS: 00000286 Not tainted
    <4>EAX: 00000001 EBX: dcc3e000 ECX: 0000000d EDX: 00000000
    <4>ESI: 0000000d EDI: 00000286 EBP: 00000000 DS: 0018 ES: 0018
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace: [<c013df1d>] (0xdcc3ff50)
    <4>[<c0135b99>] (0xdcc3ff7c)
    <4>[<c0108458>] (0xdcc3ff94)
    <4>[<c010a954>] (0xdcc3ffa8)
    <4>[<c0107003>] (0xdcc3ffc0)
    <4>

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0010:[<c0106fd2>] CPU: 0EIP is at
    <4> EFLAGS: 00000282 Not tainted
    <4>EAX: 00000004 EBX: 00000001 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0010:[<c0106fd0>] CPU: 0EIP is at
    <4> EFLAGS: 00000282 Not tainted
    <4>EAX: 00000004 EBX: 00000001 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0010:[<c0106fd0>] CPU: 0EIP is at
    <4> EFLAGS: 00000282 Not tainted
    <4>EAX: 00000004 EBX: 00000001 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0023:[<4013f284>] CPU: 0EIP is at
    <4> ESP: 002b:bffeb778 EFLAGS: 00000282 Not tainted
    <4>EAX: ffffffe0 EBX: 00000001 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0010:[<c011e1e6>] CPU: 0EIP is at
    <4> EFLAGS: 00000286 Not tainted
    <4>EAX: 00000001 EBX: dcc3e000 ECX: 0000000d EDX: 00000000
    <4>ESI: 0000000d EDI: 00000286 EBP: 00000000 DS: 0018 ES: 0018
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace: [<c013df1d>] (0xdcc3ff50)
    <4>[<c0135b99>] (0xdcc3ff7c)
    <4>[<c011a24b>] (0xdcc3ff94)
    <4>[<c0108458>] (0xdcc3ffac)
    <4>[<c0107003>] (0xdcc3ffc0)
    <4>

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0023:[<4013f284>] CPU: 0EIP is at
    <4> ESP: 002b:bffeb778 EFLAGS: 00000282 Not tainted
    <4>EAX: ffffffe0 EBX: 00000001 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0010:[<c011e1e6>] CPU: 0EIP is at
    <4> EFLAGS: 00000286 Not tainted
    <4>EAX: 00000001 EBX: dcc3e000 ECX: 0000000d EDX: 00000000
    <4>ESI: 0000000d EDI: 00000286 EBP: 00000000 DS: 0018 ES: 0018
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace: [<c013df1d>] (0xdcc3ff50)
    <4>[<c0135b99>] (0xdcc3ff7c)
    <4>[<c0108458>] (0xdcc3ffa0)
    <4>[<c0107003>] (0xdcc3ffc0)
    <4>

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0023:[<4013f284>] CPU: 0EIP is at
    <4> ESP: 002b:bffeb778 EFLAGS: 00000282 Not tainted
    <4>EAX: ffffffe0 EBX: 00000001 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0023:[<4013f284>] CPU: 0EIP is at
    <4> ESP: 002b:bffeb778 EFLAGS: 00000282 Not tainted
    <4>EAX: ffffffe0 EBX: 00000001 ECX: 080720e0 EDX: 00001800
    <4>ESI: 00001800 EDI: 080720e0 EBP: bffeb7a8 DS: 002b ES: 002b
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace:

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0010:[<c011e1e6>] CPU: 0EIP is at
    <4> EFLAGS: 00000286 Not tainted
    <4>EAX: 00000001 EBX: dcc3e000 ECX: 0000000d EDX: 00000000
    <4>ESI: 0000000d EDI: 00000286 EBP: 00000000 DS: 0018 ES: 0018
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace: [<c013df1d>] (0xdcc3ff50)
    <4>[<c0135b99>] (0xdcc3ff7c)
    <4>[<c0108458>] (0xdcc3ffa0)
    <4>[<c0107003>] (0xdcc3ffc0)
    <4>

    <6>SysRq : Show Regs
    <4>
    <4>Process: 767, { tar}
    <4>Kernel 2.4.9-e.33BOOT
    <4>EIP: 0010:[<c011e1e8>] CPU: 0EIP is at
    <4> EFLAGS: 00000286 Not tainted
    <4>EAX: 00000000 EBX: dcc3e000 ECX: 0000000d EDX: 00000000
    <4>ESI: 0000000d EDI: 00000286 EBP: 00000000 DS: 0018 ES: 0018
    <4>CR0: 8005003b CR2: 080cf00c CR3: 1cbe1000 CR4: 000006d0
    <4>Call Trace: [<c013df1d>] (0xdcc3ff50)
    <4>[<c0135b99>] (0xdcc3ff7c)
    <4>[<c011a24b>] (0xdcc3ff94)
    <4>[<c0108458>] (0xdcc3ffac)
    <4>[<c0107003>] (0xdcc3ffc0)
    <4>

I've put an updates.img up at http://people.redhat.com/~katzj/tarhang.img. It contains a workaround that I think should solve the hang you're seeing.
Instructions for use:
1) Download and copy to a floppy disk
2) Boot with 'linux updates'
3) Provide the floppy when prompted
4) See what happens :)

Feedback would be much appreciated.

Using the above updates disk along with my kickstart floppy, the issue seems to have been cleared up. This appears to be a suitable workaround for now, but I need to understand the impact of setting LD_ASSUME_KERNEL=2.2.5 and do more testing to ensure this is proper.

FROM ISSUE TRACKER
Event posted 12-29-2003 12:28pm by glerhaupt with duration of 0.00
I cannot reproduce the issue in RHEL 2.1 U3 RC 1. It appears fixed. Can you comment on what the exact fix was? By the way, it appears that in addition to needing a QLA2342 to reproduce this issue, you also need a PERC3 in conjunction with it.

Basically, what was happening was that the kernel was being installed before libredhat-kernel. The librt in the i686 glibc depends on the existence of the libredhat-kernel stub for some AIO functions. When we originally shipped RHEL2.1, the included version of tar did not depend on librt. Later, an errata version of tar began linking against librt to provide sub-second resolution on timestamps. Setting LD_ASSUME_KERNEL makes the i386 glibc get used for any scriptlet processing, and thus not the librt that depends on libredhat-kernel's functionality. This is only a workaround for U3. In the future, we're going to go back to a fixed version of tar which doesn't have this requirement on librt.

This appears fixed to me at this point. I'm closing