From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Description of problem:
Installing a 64-way pSeries lpar machine with FC5Test2. I successfully netbooted the lpar from a NIM server using the FC5Test2 images. The IBM SCSI driver (I cannot recall the driver's name) loaded just fine. Then the message "Loading DAC960 driver" was displayed, and the machine stays there forever unless I reboot.
Version-Release number of selected component (if applicable):
anaconda
How reproducible:
Always
Steps to Reproduce:
1. Netboot using ppc/images/netboot/ppc64.img
Actual Results:
Machine hangs forever with the message "Loading DAC960 driver..."
Expected Results:
I was expecting the next screen, which asks me "to choose a Language", to come up so that the installation process could continue.
Additional info:
I am using vnc. At my open firmware prompt, I do "setenv boot-file vnc" so I can use vnc for installing the lpar.
I think we should just be using ipr. If you netboot yaboot, boot with "linux nostorage", and then select the ipr driver, does it work? What does lspci look like on the system?
I am not loading these drivers myself, the installer is. The disk I am installing is clean in that it never had an operating system on it so I can't do an lspci.
If you pass nostorage as a boot argument you will be able to select and should be able to complete an install.
We did as suggested and passed nostorage as a boot argument so we could select devices. Just selecting ipr did not help. However, when we tried sym53c8xx it worked and our install completed. An lspci on the machine shows:
d0:01.0 SCSI storage controller: LSI Logic/Symbios Logic 53c1010 66MHz Ultra3 SCSI Adapter (rev 01)
d8:01.0 SCSI storage controller: Mylex Corporation AccelRAID 600/500/400/Sapphire support Device (rev 04)
I recall that when installing, sym53c8xx always loaded first and successfully, and then the load of DAC960 that followed would hang. By not loading DAC960, everything went ok. Perhaps a hw probe saw the Mylex device and so attempted to load DAC960, thus the hang.
If you increase the kernel log level and try loading DAC960 on the running system, do you get a hang? You might want to enable Sysrq so you can drop into xmon to get more details.
Connecting to IBM Ltc bug 23422... Thanks.
Does rawhide also hang?
ahh, I would like to find out, but I need help. Please point me to where the 'ppc64.img' netboot image is for rawhide. I find the FC5 copy from mid March here: http://download.fedora.redhat.com/pub/fedora/linux/core/5/ppc/os/images/netboot/ but http://download.fedora.redhat.com/pub/fedora/linux/core/development/ppc64/images/ appears empty to me. I assume 'rawhide' is synonymous with these daily builds in the "development" branch. true? http://download.fedora.redhat.com/pub/fedora/linux/core/development/tree-ppc/ppc64/ has a yaboot.conf. But the only way I know to start this on ppc64 is to copy into an already bootable system. Is this my best option? or is there a ppc64.img in the rawhide tree?
I may have found it. I think my mistake was going into development/ppc64. Instead I find a netboot image here: http://download.fedora.redhat.com/pub/fedora/linux/core/development/ppc/images/netboot/ Is this the "rawhide" image? (sorry for the newbie questions)
Yes, the ppc tree is what you want. It's a biarch tree (32 and 64 bit) and is what we base the trees off; the ppc64 tree is just a side effect of how we build. You might want to check out: http://fedoraproject.org/wiki/Testing I'd tend to use the kernel/initrd with yaboot: http://download.fedora.redhat.com/pub/fedora/linux/core/development/ppc/ppc/ppc64/ Perhaps it'd be a good idea to work on a ppc-specific testing wiki page.
> Does rawhide also hang? appears fixed in the May 1 snapshot of rawhide. on 5 victims, both hmc-attached lpars and standalone, they all recreated on FC5 from mid-March, but did not recreate on rawhide.
Thanks for testing. I'm going to close this out, as there is a known workaround and it's fixed moving forward.
changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CLOSED                      |REOPENED
         Resolution|FIX_BY_DISTRO               |
------- Additional Comments From marksmit.com 2006-05-10 21:18 EDT -------
This was not recreating on the May1 snapshot of rawhide. On the May10 snapshot I captured this morning, the problem again recreates.
Do you happen to know the exact kernel version it was working on with the May 1 snapshot? Was it on exactly the same hardware? Had anything been run on the hardware in between?
----- Additional Comments From marksmit.com 2006-05-10 23:16 EDT ------- hostname: hermeslp1 OpenPower710 was previously installed/running my "May 1"snapshot of rawhide: Linux version 2.6.16-1.2181_FC6 (bhcompile.redhat.com) (gcc version 4.1.0 20060425 (Red Hat 4.1.0-11)) #1 SMP Sun Apr 30 23:03:19 EDT 2006 I then shutdown and net booted ppc64.img from today. so same exact hw config. I then rebooted after recreate, net booted the May1 version of ppc64.img and re-installed the May1 images.
----- Additional Comments From marksmit.com 2006-05-19 16:21 EDT ------- This problem continued to recreate as I sampled ppc64.img netboots from rawhide, until the May18 snapshot. It recreates on kernel-2.6.16-1.2202_FC6.src.rpm (May 15th snapshot) and no longer on kernel-2.6.16-1.2206_FC6.src.rpm (May 18th snapshot).
Manoj, can we try to get some more details on what may be causing this? You might want to check the interaction between dac960 and the other devices on the system (pci id reuse, etc). Some sort of trace would be useful. You may have to add some debugging to loader to make this work and create a new initrd.img with your debugging loader.
changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|FC5                         |Other
------- Additional Comments From marksmit.com 2006-05-26 22:31 EDT -------
I am changing the version on the LTC Bugzilla from FC5 to "other" to reflect that this is found on Fedora-devel (rawhide). The status remains the same. It did not recreate on the May1 version of fedora-devel. It recreated on the May10 and May15 versions of fedora-devel. It does not recreate on the May18, May25 and today's May26 versions of fedora-devel.
changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|RH178229-FC5 hangs loading  |RH178229-Fedora-devel hangs
                   |DAC960 prior to install     |loading DAC960 prior to
                   |                            |install
            Version|Other                       |devel
------- Additional Comments From marksmit.com 2006-06-05 23:30 EDT -------
I am changing the version to 'devel' to properly reflect that this recreates on rawhide. The status of this bug has changed. It now recreates on the June5 version of rawhide: kernel-2.6.16-1.2245_FC6.ppc64.rpm
----- Additional Comments From marksmit.com 2006-06-08 01:50 EDT ------- And now it is gone again with June6 snapshot: kernel-2.6.16-1.2252_FC6.ppc64.rpm
----- Additional Comments From marksmit.com 2006-06-08 02:28 EDT ------- oops. the kernel levels are correct, but calling it a June6 snapshot is wrong. It is the June7 snapshot with that kernel that no longer recreates.
----- Additional Comments From marksmit.com 2006-06-08 19:51 EDT ------- And now it is back again with the June8 snapshot of rawhide: kernel-2.6.16-1.2255_FC6.ppc64.rpm
This just sounds like the dac960 is exporting that it supports a PCI id which it really doesn't. That, or your virtual scsi is saying that it's a dac960 when it's really not. Can you provide the output of lspci and lspci -n?
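The reason for asking for lspci -n is that it prints the numeric vendor:device pair a driver's id table is matched against. A minimal sketch of picking those pairs out of such output follows. The helper name find_devices and the sample lines in the comment are hypothetical (the attached output is not reproduced in this thread), and the pattern assumes the common "slot class: vendor:device" layout, with or without a leading "Class" keyword.

```python
import re

# Assumed layout of an `lspci -n` line, for example:
#   d8:01.0 0104: 1069:b166 (rev 04)
#   d8:01.0 Class 0104: 1069:b166 (rev 04)   (older pciutils)
LSPCI_N_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?:Class\s+)?(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
)


def find_devices(lspci_n_text, vendor, device):
    """Return the slots whose numeric vendor:device pair matches."""
    slots = []
    for line in lspci_n_text.splitlines():
        m = LSPCI_N_LINE.match(line.strip())
        if m is None:
            continue  # not a device line in the assumed layout
        if int(m["vendor"], 16) == vendor and int(m["device"], 16) == device:
            slots.append(m["slot"])
    return slots
```

Called as find_devices(text, 0x1069, 0xB166), it lists every slot that a driver matching only on that vendor:device pair would try to claim.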
Created attachment 131208 [details] lspci-n.txt
----- Additional Comments From marksmit.com 2006-06-20 13:11 EDT ------- lspci and lspci -n output for two different ppc64 recreates.
----- Additional Comments From marksmit.com 2006-06-20 13:19 EDT ------- Customers are also encountering this bug: http://www-128.ibm.com/developerworks/forums/dw_thread.jsp?message=13821904&cat=5&thread=119552&treeDisplayType=threadmode1&forum=375#13821904 I posted the workaround.
changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gcwilson.com
------- Additional Comments From gcwilson.com 2006-06-24 16:41 EDT -------
I am encountering this bug installing FC5 on the same 64-way Squadrons H that Joy Latten reported seeing it on back in January. I can't speak for rawhide because I need FC5 + updates + selected rawhide packages for LSPP testing. The same workaround still applies. However, on boot after the install completes, start_udev hangs forever. I booted it with init=/bin/bash and ran start_udev manually in the background. It turns out start_udev is attempting to modprobe DAC960. modprobe hangs using 100% CPU and start_udev never completes. It generated a stack dump, which unfortunately got wiped from my terminal history. The workaround is to remove DAC960.ko from the modules tree. It appears not to recreate when I move the module back into place and run start_udev now. Maybe it has something to do with the preexisting device nodes?
----- Additional Comments From marksmit.com 2006-07-05 12:55 EDT ------- This recreated using the FC6_test1 network boot image.
----- Additional Comments From marksmit.com 2006-07-13 13:28 EDT ------- This may be related to the IBM Power RAID IPR driver. July 10 rawhide did not recreate. July 11 rawhide does. So I took a SF4 (p5-550) hmc-attached system and started removing devices from the lpar profile (power-off, then activate to try new). The non-raid scsi adapter that does _not_ use IPR boots past the point of hang. Putting the integrated IPR (T14) device or either of the slot 2 or 4 raid scsi adapters (IBM f/c 5703) will cause the dac970 hang problem to occur on netboot. I will get the hmc info on those adapters and append shortly.
----- Additional Comments From marksmit.com 2006-07-13 14:00 EDT -------
Device that recreates:
  feature code      5703
  vendor id         1069
  device id         B166
  subsyst vend id   1014
  subsyst dev id    0278
  class code        0104
  revision id       04
This is in slot C2 of a 9123-720 (OpenPower 720), ninagal. The integrated T14 IPR and the raid adapter in slot C4 (all recreate alike) have identical f/c and properties to these.
hmc access: sqh14lte.upt.austin.ibm.com hscroot abc123
I am also getting info from the p5-550, paytonlp1 (on the same hmc for ease).
F/c and properties of the older non-raid adapter in slot C08 of the external reliance io drawer (this does not recreate, but loads the sym53C8xx driver instead of ipr):
  PCI 160MB Ultra3 SCSI LVD
  f/c               6203
  vendor id         1000
  device id         0021
  subsyst vend id   1000
  subsyst dev id    1010
  class code        0100
  revision id       01
----- Additional Comments From marksmit.com 2006-07-13 14:42 EDT -------
I have more resources in paytonlp1. Both f/c 5702 and 5703 adapters recreate. These adapters also cause the hang on 'DAC960' (my mistake calling out 'DAC970' two comments back in this bug; that was a typo).
Storage Controller:
  f/c               5702
  vendor id         1069
  device id         B166
  subsyst vend id   1014
  subsyst dev id    0266
  class code        0100
  revision id       04
These lpfc Fibre Channel Serial Bus adapters do not recreate:
  f/c               <none>
  vendor id         10DF
  device id         FA00
  subsyst vend id   10DF
  subsyst dev id    FA00
  class code        0C04
  revision id       01
----- Additional Comments From bjking1.com(prefers email via brking.com) 2006-07-13 15:12 EDT ------- This is a DAC960 bug. The pci id table in DAC960.c indicates it supports all devices with PCI vendor id 1069 and PCI device id B166, which it does not. There are several ipr adapters that use this same chip. The DAC960 driver needs to be fixed to specify which PCI subsystem ids it supports so it does not try to initialize ipr adapters. Can we simply not build DAC960 for ppc64pseries and ppc64iseries as a short term solution?
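The collision described above can be sketched in a few lines. This is a simplified model, not the kernel's matching code: it compares only the four id fields and treats an unset subsystem field as a wildcard, which is how an id-table entry ends up claiming every board built on the same chip. The vendor, device and subsystem ids for the ipr adapters are the ones reported earlier in this bug. The restricted table is hypothetical: which subsystem ids DAC960 really supports is not stated in this thread, so the subsystem vendor 0x1069 and the sample Mylex board are assumptions made only for illustration.

```python
from typing import NamedTuple, Optional

PCI_ANY_ID = None  # stand-in for the kernel's wildcard value


class PciId(NamedTuple):
    vendor: int
    device: int
    subvendor: Optional[int] = PCI_ANY_ID
    subdevice: Optional[int] = PCI_ANY_ID


def field_matches(entry_value, dev_value):
    """A table field matches if it is a wildcard or equals the device's value."""
    return entry_value is PCI_ANY_ID or entry_value == dev_value


def entry_matches(entry, dev):
    """Compare vendor, device, subvendor and subdevice in order."""
    return all(field_matches(e, d) for e, d in zip(entry, dev))


def claims(table, dev):
    """True if any entry in the driver's id table matches the device."""
    return any(entry_matches(entry, dev) for entry in table)


# ipr adapters as reported in this bug (f/c 5703 and f/c 5702).
IPR_5703 = PciId(0x1069, 0xB166, 0x1014, 0x0278)
IPR_5702 = PciId(0x1069, 0xB166, 0x1014, 0x0266)

# Table as described above: subsystem ids left as wildcards.
DAC960_WILDCARD = [PciId(0x1069, 0xB166)]

# Hypothetical restricted table: the subsystem vendor is an assumption.
DAC960_RESTRICTED = [PciId(0x1069, 0xB166, subvendor=0x1069)]
```

With these tables, claims(DAC960_WILDCARD, IPR_5703) is true while claims(DAC960_RESTRICTED, IPR_5703) is false, which is the behaviour change the comment asks for.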
changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Owner|csiddali.com                |bjking1.com
------- Additional Comments From bjking1.com(prefers email via brking.com) 2006-07-13 16:09 EDT -------
I sent a couple notes out to try to track down a contact for the DAC960 driver/hardware...
Created attachment 132464 [details] dac960_id_table_fixup.patch
----- Additional Comments From bjking1.com(prefers email via brking.com) 2006-07-14 17:16 EDT ------- Proposed fix to the DAC960 driver The proposed fix should prevent the DAC960 driver from talking to ipr adapters.
----- Additional Comments From bjking1.com(prefers email via brking.com) 2006-07-14 17:17 EDT ------- Can someone who has this hardware try out the patch and verify it fixes the problem?
Manoj have you tried this patch out?
I was able to reproduce this problem on a squadron using the RHEL5 7/11 boot.iso. I will build a kernel with Brian's patch and see if that fixes the issue.
----- Additional Comments From bjking1.com(prefers email via brking.com) 2006-07-26 18:27 EDT ------- Has anyone been able to try out this patch?
Tested the patch with FC5 Gold kernel, the DAC960 does not hang, and makes progress. I need to do more testing and do a full install with FC development / RHEL tree.
I was able to boot the install disk with Rawhide kernel plus this patch and did not hang on DAC960.
----- Additional Comments From bjking1.com (prefers email at brking.com) 2006-08-07 09:41 EDT ------- Code has been submitted upstream: http://marc.theaimsgroup.com/?l=linux-scsi&m=115463141705264&w=2
Brian, did you receive any feedback on this patch? Is it in any upstream trees? Could someone from the kernel team possibly review this patch?
*** Bug 207140 has been marked as a duplicate of this bug. ***
----- Additional Comments From bjking1.com (prefers email at brking.com) 2006-09-20 09:40 EDT ------- It's currently in James Bottomley's scsi-misc tree, so it should get pushed with his scsi update for 2.6.19. http://www.kernel.org/git/gitweb.cgi?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=fddafd3d21953d5ea740f7b2f27149f7dd493194
----- Additional Comments From rschelle.com 2006-09-21 10:35 EDT ------- Will Red Hat be backporting the DAC960 driver patch to the RHEL5 kernel? If so, when should we expect a RHEL5 beta build that integrates this patch?
rschelle: I looked at the Beta2 kernels (http://people.redhat.com/dzickus/el5/) and the linux-2.6-ppc-dac960-ipr-clash.patch has the patch. If you can, pls verify the kernel.
----- Additional Comments From rschelle.com 2006-10-03 11:15 EDT ------- The 20060927 code drop for rhel5 beta1 (2.6.18-1.2702.el5) no longer runs into the DAC960 driver hang on install. The ipr driver is correctly probed and loaded without user assistance.
----- Additional Comments From bjking1.com (prefers email at brking.com) 2006-10-05 11:48 EDT ------- Mark, Can you verify this is fixed in fc6-test3?
----- Additional Comments From marksmit.com 2006-10-05 12:11 EDT ------- Hi Brian, I can certainly confirm that this bug does not recreate on FC6-test3. I've been using those ISOs for many ppc64 installs now. But here's the tricky part. When I tested the daily builds of rawhide (June, July), this recreate came and went frequently. I could never figure out whether it was something random in the build process, and to this day I cannot tell in advance whether a build (without your fix) will recreate the hang or not.
----- Additional Comments From bjking1.com (prefers email at brking.com) 2006-10-05 13:21 EDT ------- According to the kernel rpm changelog in fc6-test3, the dac960/ipr collision patch was included on 08/03.
changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ACCEPTED                    |CLOSED
------- Additional Comments From bjking1.com (prefers email at brking.com) 2006-10-05 13:22 EDT -------
Closing, as the problem is fixed in FC6-test3.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
Although it is not possible to test the recreate on Version: 2.6.18-1.2200.fc5 without either a netboot image or an install kernel, I am able to observe using lsmod that the Mylex DAC960 module is loaded by default on this ppc64 machine (with only ipr devices, not dac960 devices) when booting from the stock FC5 kernel (2.6.15) after a fresh install. And yes, you do have to apply the nostorage workaround to do the fresh install, else this recreates. By comparison, after doing yum update and rebooting with the Version: 2.6.18-1.2200.fc5 kernel, the DAC960 module is no longer loaded by default. Based on this, I would guess that if a netboot image of this Version: 2.6.18-1.2200.fc5 kernel were made available, then this would not recreate. I have opened a new bug, 211383, for a regression seen while testing.
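The lsmod check described above can be scripted for comparing the two kernels. lsmod formats the contents of /proc/modules, where the module name is the first field on each line. The sketch below works on text in that layout; the function names and both sample strings are hypothetical and are not captured from the machines in this bug.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from text in /proc/modules layout."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name, proc_modules_text):
    """Check for a module by name, treating '-' and '_' as equivalent."""
    def norm(s):
        return s.replace("-", "_")

    return norm(name) in {norm(m) for m in loaded_modules(proc_modules_text)}


# Hypothetical samples, shaped like /proc/modules lines.
SAMPLE_WITH_DAC960 = (
    "DAC960 94728 0 - Live 0xd000000000200000\n"
    "ipr 95120 2 - Live 0xd000000000100000\n"
)
SAMPLE_WITHOUT_DAC960 = "ipr 95120 2 - Live 0xd000000000100000\n"

# On a running system one might call:
#   is_module_loaded("DAC960", open("/proc/modules").read())
```

Run once under each kernel, this gives a quick yes/no on whether DAC960 was pulled in on a machine that only has ipr devices.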