Bug 112953
Summary: | LTC5443-Network install of ES 2.1 QU2 fails on xSeries BladeCenter | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 2.1 | Reporter: | IBM Bug Proxy <bugproxy> |
Component: | anaconda | Assignee: | Jeremy Katz <katzj> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike McLean <mikem> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 2.1 | CC: | tao |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-06-09 20:36:08 UTC | Type: | --- |
Description
IBM Bug Proxy
2004-01-06 16:46:53 UTC
This is worked around in update 3.

----- Additional Comments From ruddk.com 2004-01-12 18:48 -------
This has not been fixed in update 3. The PE group was still able to replicate an install hang using the latest U3 bits. It has the exact same fingerprint (although it is dying in the 2.4.9-e.34 kernel RPM this time):

    ...
    Installing kernel.
    tar: error while loading shared libraries: libredhat-kernel.so.1: cannot open shared object file: No such file or directory

This is really looking like some sort of intermittent library path bug. What I am seeing is that the i686 libraries are being picked up very intermittently. When things work, this is what an ldd on tar returns:

    libpthread.so.0 => /lib/libpthread.so.0 (0x40018000)
    librt.so.1 => /lib/librt.so.1 (0x4004b000)
    libc.so.6 => /lib/libc.so.6 (0x4005d000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

When it doesn't work, ldd returns:

    libpthread.so.0 => /lib/i686/libpthread.so.0 (0x40018000)
    librt.so.1 => /lib/i686/librt.so.1 (0x40049000)
    libc.so.6 => /lib/i686/libc.so.6 (0x4005c000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
    libredhat-kernel.so.1 => not found

It is /lib/i686/librt.so.1 that has the reference to libredhat-kernel.so.1. As the libaio RPM has not been installed yet, any tar instance that picks up /lib/i686/librt.so.1 will fail.

As a test, I chroot'ed to /mnt/sysimage and ran "ldd /bin/tar" in a loop for a while. Out of 2059 loops, /lib/i686/librt.so.1 was referenced 42 times.

Wild hunch: It is possible that the heavy USB activity found in a blade environment is a catalyst for this problem. It appeared that I was able to generate more of these incorrect library references when I was changing the remote console between blades.

----- Additional Comments From lepore.com 2004-01-13 17:05 -------
Glen,
In light of the new information gathered, could you re-open the Red Hat bug (it's currently closed)? Thanks.
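For anyone comparing the two ldd outputs quoted above, the difference can be picked out mechanically. The following is only a sketch: classify_ldd is a hypothetical helper name (not part of anaconda or any Red Hat tool), and it assumes ldd prints lines in the "name => path (address)" and "name => not found" forms shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: reads ldd output on stdin and prints one line per finding.
#   "i686 <name>"    for a library resolved from under /lib/i686/
#   "missing <name>" for a library that ldd reports as "not found"
classify_ldd() {
    awk '
        $2 == "=>" && $3 == "not" && $4 == "found" { print "missing " $1 }
        $2 == "=>" && $3 ~ /^\/lib\/i686\//        { print "i686 " $1 }
    '
}

# Example use inside the chroot (assumes ldd and /bin/tar exist there):
#   ldd /bin/tar | classify_ldd
```

On the failing output quoted above, this would flag the three libraries resolved from /lib/i686 and the unresolved libredhat-kernel.so.1; on the working output it would print nothing.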
Mike IBM - this is for Issue tracker - shipping product - I don't go hunting for bugs; that's why you have a TAM and issue tracker.

----- Additional Comments From lepore.com 2004-01-14 10:30 -------
Glen will be moving this to Issue Tracker. We will set up a meeting to discuss this with Red Hat once this is open in Issue Tracker.

----- Additional Comments From gjlynx.com(prefers email via gjohnson.com) 2004-01-14 12:44 -------

----- Additional Comments From khoa.com 2004-01-14 16:26 -------
Latest update from Kevin Rudd:

This does not appear to be something that is limited to the blade servers. I have been successful in replicating the inconsistent library behavior on an x440 system that I have in the lab. This was done on both the U2 and U3 releases of rhes21. I'm about to load up rhas21 to confirm that this is really not an issue in that environment. My earlier thoughts about USB being a factor can be ignored.

Thanks,
-Kevin

----- Additional Comments From ruddk.com 2004-01-14 17:26 -------
Ignore the thoughts about USB being a factor. The inconsistent nature of the problem can be misleading at times. I have been able to replicate the library behavior on a non-blade system (an x440 system). In addition, I have replicated this with both AS2.1 and ES2.1.

For my test, I modified the kernel RPM so that a long sleep was added to its %post processing. This pauses the install process at the same point where it was found in the previous hangs. Once at that point, I am able to switch over to the shell virtual console (F2) and run ldd loop tests. My test is just a simple loop:

    chroot /mnt/sysimage /bin/bash
    i=0
    while true
    do
        if ldd /bin/tar | grep i686
        then
            echo "i686 path picked up after $i loops"
            break
        fi
        i=$((i+1))
    done

I have seen /lib/i686/librt.so.1 referenced after as few as 14 loops and as many as 8067 loops.

----- Additional Comments From lepore.com 2004-01-15 11:06 -------
The comment above indicates this was fixed in U3, so it seems there may already be some understanding of the issue and/or its root cause. This issue is top priority for IBM right now because several hundred machines will not be shipped until this is resolved. Hopefully Kevin's information to reproduce the failure in U3, combined with any knowledge Red Hat already has on this, could be enough for us to find a fix. Red Hat's help on this would be very valuable and much appreciated. Thanks.

https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=31562&gid=43

added issue tracker and reopening here.

Even better fix went into U4 (fixed tar package)
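The loop test quoted in this report stops at the first hit. A bounded variant that counts hits over a fixed number of runs gives a rough frequency instead, comparable to the 42-out-of-2059 figure reported earlier. This is a sketch only: count_matches is a hypothetical helper name, and the command under test is passed in as arguments so the helper can be tried outside the installer environment.

```shell
#!/bin/sh
# Hypothetical helper: run a command LOOPS times and count how many runs
# produce output matching PATTERN.
# usage: count_matches PATTERN LOOPS CMD [ARGS...]
count_matches() {
    pattern=$1
    loops=$2
    shift 2
    hits=0
    i=0
    while [ "$i" -lt "$loops" ]; do
        # "$@" is the command under test, e.g. ldd /bin/tar
        if "$@" 2>/dev/null | grep -q -- "$pattern"; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
    done
    echo "$hits"
}

# Example use inside the chroot (assumes ldd and /bin/tar exist there):
#   count_matches '/lib/i686/' 2059 ldd /bin/tar
```

Because the failure is intermittent, a single short run proves little; the counts reported here ranged from a hit within 14 loops to none until 8067.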