Bug 463275
Summary: | Failure of Compute Node to Install on RHEL 5.2 HPC combination | | |
---|---|---|---|
Product: | Red Hat HPC Solution | Reporter: | Tom Lehmann <tsrlehmann> |
Component: | initrd-templates | Assignee: | OCS Support <ocs2> |
Status: | NEW | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 5.2 | | |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Description
Tom Lehmann, 2008-09-22 19:48:09 UTC
Is the hardware you are using certified for Red Hat Enterprise Linux? If you are using hardware that requires drivers newer than what has been enabled in Red Hat Enterprise Linux so far, it obviously can't be expected to work. Red Hat has a way to release drivers asynchronously, but before that happens, the hardware in question can't be certified for RHEL. There also might not yet be support for these asynchronously added drivers in the HPC Solution. This is a question to Platform.

I have the same problem. Where can I find the list of certified hardware for Red Hat Enterprise Linux? The gigabit chipset that I have is the 82575EB, and as far as I know the igb driver supports this chipset, but I don't know if it is certified for Red Hat.

The HW Cert list can be found at https://hardware.redhat.com/ If it doesn't work, I guess it is not yet certified. You can request a new driver through the regular Intel / Red Hat process.

The system I am using (Intel OEM SR1250ML) contains two of the X38ML motherboards that are on the RHEL certified list. The system can be loaded from DVD media without any issue. It's only when you try to load using PXE that the problem occurs, due to the incorrect driver in the first stage.

Please test the igb driver from Bug 436040 and post your results here and in that BZ.

I do not seem to have access to Bug 436040. The driver I used was the latest posted at the Intel support web site. What should I do? Install one of the kernel rpm packages available at http://people.redhat.com/agospoda/#rhel5? kernel-2.6.18-116.el5.gtest.57.x86_64.rpm

Tom, please go to the URL referenced in Comment #7 to access rpms with the drivers we're considering for RHEL 5.3.

Ronald, sorry, no joy. The new kernel RPM apparently doesn't correct the situation. I loaded the head node from my original media (DVD). After updating the system and registering it with RHN, I downloaded and installed the new kernel RPM you suggested above. As far as I can tell, the update was successful.
I then registered the system for the HPC feature on RHN. I ran `yum install ocs` and then `ocs-setup` to get the head node ready to install compute nodes. All of the intermediate tests passed, so I started addhost on the head node. I booted the compute node from the network. The PXE boot went correctly and the head node noted compute-00-00 at the proper address. On the compute node the first stage started running and tried to find the boot files on the head node. It failed in apparently the same way as the unmodified version of the HPC package.

Question: the HPC installation asks the system to copy all of the install media to the system. Is there a chance that the installation procedure overwrites some of the files modified in the RPM? Remember, I fixed the initrd just before I ran addhost. If the ocs installation had overwritten any files, my modification would have overwritten what had just been done.

Tom Lehmann

The new driver and kernel seem to work fine, at least for the provisioning. Mark helped me get it working. The node group must be updated to use the new kernel. These are the steps I followed:

1) Copy the kernel into /depot/kits/rhel/5/x86_64/Server:

    $> cp kernel-2.6.18-116.el5.gtest.57.x86_64.rpm /depot/kits/rhel/5/x86_64/Server/

2) Run repoman:

    $> repoman -u -r rhel5_x86_64

3) Edit the kusudb driverpacks table and change the "dpname" of the kernel rpm:

    $> sqlrunner -q "update driverpacks SET dpname=\"kernel-2.6.18-116.el5.gtest.57.x86_64.rpm\" where dpid=1"

4) Run driverpatch:

    $> driverpatch nodegroup name=compute-rhel

What about using the "pci=nomsi" kernel parameter? It's mentioned in bug 460349. You can set the kernel parameter for a nodegroup using ngedit. We used it with RHEL 5.2 on a small Melstone cluster that was exhibiting the issue, and it seemed to help.
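For convenience, the four steps above can be collected into a single script. This is only a sketch, not part of the original report: the `run` wrapper, the `update_node_kernel` function, and the `DRY_RUN` switch are my additions for safe previewing, while the RPM name, depot path, and nodegroup name are taken verbatim from the comment and should be adjusted for your cluster.

```shell
#!/bin/sh
# Sketch of the four-step nodegroup kernel update described above.
# Set DRY_RUN=1 to print the commands instead of executing them.

KERNEL_RPM=kernel-2.6.18-116.el5.gtest.57.x86_64.rpm
DEPOT=/depot/kits/rhel/5/x86_64/Server
NODEGROUP=compute-rhel

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

update_node_kernel() {
    # 1) Stage the test kernel in the depot
    run cp "$KERNEL_RPM" "$DEPOT/"
    # 2) Refresh the repository metadata
    run repoman -u -r rhel5_x86_64
    # 3) Point the kusudb driverpacks table at the new kernel RPM
    run sqlrunner -q "update driverpacks SET dpname=\"$KERNEL_RPM\" where dpid=1"
    # 4) Regenerate the driver pack for the compute nodegroup
    run driverpatch nodegroup name="$NODEGROUP"
}
```

Running `DRY_RUN=1` first and inspecting the printed commands before a real run is a cheap way to confirm the RPM name matches what repoman and driverpatch will see.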