Description of problem:
During installation:
stage1: the installer can find and use the AD144 (AD385) to install rhel5.6.
stage2: the AD144 (AD385) is missing when configuring all network cards.
After logging in to the system, the 10Gb card shows up in lspci but no driver binds to it.

Version-Release number of selected component (if applicable):
rx2660
AD144A: in slot (1) (efi_2.0.4.6)
AD337A: in slot (2)
AD338A: in slot (3)

[rx2660-12] MP:CM> sysrev

SYSREV

Current firmware revisions

MP FW     : F.02.25
BMC FW    : 05.26
EFI FW    : ROM A 07.14, ROM B 07.14
System FW : ROM A 04.15, ROM B 04.11, Boot ROM A
PDH FW    : 50.07
UCIO FW   : 03.0b
PRS FW    : 00.08
UpSeqRev: 02, DownSeqRev: 01

How reproducible:

Steps to Reproduce:
1. Install AD144A or AD385A on rx2660 (all Integrity servers)
2. Install rhel5.6 b1 or s1
3. Log in to the system:
[root@mincm ~]# cd /etc/sysconfig/network-scripts/
[root@mincm network-scripts]# ls ifcfg-*

Actual results:
AD144A or AD385A is missing in stage2 and cannot be driven in the installed system.

Expected results:
AD144A or AD385A works normally.

Additional info:
login system
[root@max ~]# sutl nics
eth0 (AD337A) 00:1a:4b:f3:05:cc <p0> e1000e [1000Mb/s]
eth1 (AD337A) 00:1a:4b:f3:05:cd <p1> e1000e [1000Mb/s]
eth2 (AD338A) 00:1a:4b:f3:58:9a <p0> e1000e [Unknown!]
eth3 (AD338A) 00:1a:4b:f3:58:9b <p1> e1000e [Unknown!]
eth4 (rx2660) 00:17:a4:99:1d:0f <p0> tg3 [1000Mb/s]
eth5 (rx2660) 00:17:a4:99:1d:0e <p1> tg3 [1000Mb/s]

[root@max ~]# sutl cards
Unrecognized PCI Devices (3):
Unknown location:
103c:403b-0000:0000
PCI bridge: HP PCIe Root Port (pcieport-driver)
Recognized PCI Devices (16):
Unknown location:
MP [HP Management Processor]
RUSA-Serial [Ruby/Sapphire Unified Core I/O board] (serial)
RUSA-USB [Ruby/Sapphire Unified Core I/O board] (ohci_hcd)
RUSA-USB2 [Ruby/Sapphire Unified Core I/O board] (ehci_hcd)
Merlion-VGA [Embedded I/O subsystems for Merlion]
Merlion-Sputnik [Embedded I/O subsystems for Merlion] (mptsas)
rx2660 [Embedded I/O for rx2660] (tg3)
AD337A [HP PCIe 2-port 1000Base-T Card] (e1000e)
AD397A/AD348A#008 (Spawn 1/2) [Smart Array P400 SAS RAID PCI-e] (cciss)
AD144A [S2io Xframe 10Gig-E PCI-X]
AD338A [HP PCIe 2-port 1000Base-SX Card ] (e1000e)

[root@max ~]# modprobe -r s2io
[root@max ~]# modprobe s2io
alloc_dev: Private data too big.

dmesg shows these messages:
s2io: s2io_init_nic: Using 64bit DMA
alloc_dev: Private data too big.
s2io: Device allocation failed
Did this work on RHEL5.5? It looks like a change to the size of the s2io_nic structure might be causing this, so I'll have to look at what changed in RHEL5.4 and RHEL5.5 as it doesn't look like much changed in RHEL5.5 that would cause this.
(In reply to comment #1)
> Did this work on RHEL5.5?
> It looks like a change to the size of the s2io_nic structure might be causing
> this, so I'll have to look at what changed in RHEL5.4 and RHEL5.5 as it doesn't
> look like much changed in RHEL5.5 that would cause this.

Hi Andy,
I have installed RHEL5.5 on the rx2660:
rx2660
AD144A: in slot (1) (efi_2.0.4.6)
AE311A: in slot (2)
AD338A: in slot (3)

The AD144A is driven and works normally.
[root@max ~]# sutl nics
eth0 (AD144A) 00:0c:fc:00:2f:4a <p0> S2IO [10000Mb/s]
eth1 (AD337A) 00:1a:4b:f3:05:cc <p0> e1000e [Unknown!]
eth2 (AD337A) 00:1a:4b:f3:05:cd <p1> e1000e [Unknown!]
eth3 (rx2660) 00:17:a4:99:1d:0f <p0> tg3 [1000Mb/s]
eth4 (rx2660) 00:17:a4:99:1d:0e <p1> tg3 [1000Mb/s]
Hi,
This is Jiayin of the China QA team, which maintains the Red Hat defect list. I added myself to this issue's cc list, but after I updated it I saw the information below (my email is in the Excluding list), and I still do not receive updates for this defect. Could anyone tell me why?

Changes submitted for bug 654948
Email sent to:
bugsfx, agospoda, shengliang.lv, xue-wen.du,
adam.vinsh, joseph.szczypek, arozansk, dag,
shawn.pagan, bugbot.org, li.zhang6,
shi.ze
Excluding:
submit.redhat.com, kernel-qe, jiayin.shao

Jiayin
(In reply to comment #4)
> hi,
> This is Jiayin of China QA team who maintain Redhat defect list. I added me to
> this issue's cc list, but after I updated it, I found below information (my
> email is in Excluding list), and I still can't receive the updating for this
> defect. I wonder why it is. Anyone could tell me?
>
> Changes submitted for bug 654948
> Email sent to:
> bugsfx, agospoda, shengliang.lv, xue-wen.du,
> adam.vinsh, joseph.szczypek, arozansk, dag,
> shawn.pagan, bugbot.org, li.zhang6,
> shi.ze
> Excluding:
> submit.redhat.com, kernel-qe, jiayin.shao
>
>
> Jiayin

If you make the update to the bugzilla, you will not get an email about the update to the bugzilla and will be on the 'excluding' list.
(In reply to comment #5)
> If you make the update to the bugzilla, you will not get an email about the
> update to the bugzilla and will be on the 'excluding' list.

Got it. Thanks!
Created attachment 463650 [details]
patch for fixing the s2io initialize

During s2io initialization (s2io_init_nic), I found that sizeof(struct s2io_nic) = 73344, which is larger than NETDEV_PRIV_LEN_MAX, 0x0000FFFF (64K), the limit checked in alloc_netdev, so "Private data too big" is reported and device allocation fails.

NETDEV_PRIV_LEN_MAX and the comparison were added by the patch:
linux-2.6-net-qla3xxx-fix-oops-on-too-long-netdev-priv-structure.patch

As a workaround I removed the related code, and with that change s2io works.
The file s2io_fix_init.patch is attached; I hope it helps us fix this issue.

By the way, we need to check whether the change will cause other issues.

Thanks,
Dawei
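For readers following along, the size test described above can be sketched in user-space C. This is a minimal illustration, not the kernel code: only the limit 0x0000FFFF, the size 73344, and the message text come from this report. The function names priv_len_ok and priv_len_excess are hypothetical, and the exact comparison in the RHEL5 alloc_netdev source may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Limit quoted in this report for the netdev private area. */
#define NETDEV_PRIV_LEN_MAX 0x0000FFFFu

/* sizeof(struct s2io_nic) as measured in this report. */
#define S2IO_NIC_SIZE 73344u

/* The reported size is over the limit, which is why allocation fails. */
static_assert(S2IO_NIC_SIZE > NETDEV_PRIV_LEN_MAX,
              "reported s2io_nic size exceeds the private-data limit");

/*
 * Hypothetical stand-in for the size test in alloc_netdev():
 * returns 1 if the private area fits, 0 (after logging) if it does not.
 */
int priv_len_ok(size_t sizeof_priv)
{
    if (sizeof_priv > NETDEV_PRIV_LEN_MAX) {
        fprintf(stderr, "alloc_dev: Private data too big.\n");
        return 0;
    }
    return 1;
}

/* Number of bytes by which a private area exceeds the limit (0 if it fits). */
size_t priv_len_excess(size_t sizeof_priv)
{
    if (sizeof_priv > NETDEV_PRIV_LEN_MAX)
        return sizeof_priv - NETDEV_PRIV_LEN_MAX;
    return 0;
}
```

With these definitions, priv_len_ok(S2IO_NIC_SIZE) returns 0 and logs the message, and priv_len_excess(S2IO_NIC_SIZE) shows how many bytes would have to be moved out of the structure for it to fit.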
(In reply to comment #7)
> Created attachment 463650 [details]
> patch for fixing the s2io initialize
>
> When the s2io initialize(s2io_init_nic), I found the sizeof(struct s2io_nic) =
> 73344, it is larger than NETDEV_PRIV_LEN_MAX 0X0000FFFF(64K) which compared in
> the alloc_netdev, so "Private data too big" is reported and device allocation
> failed.
>
> NETDEV_PRIV_LEN_MAX and compared section are added by patch:
> linux-2.6-net-qla3xxx-fix-oops-on-too-long-netdev-priv-structure.patch
>
> I remove some related codes for workaround, the s2io can work.
> There is the file s2io_fix_init.patch in the attachment, hope it can help us
> fix this issue.
>
> By the way we need take care if the changes will cause another issue.
>
> Thanks,
> Dawei

The attached patch will have other side effects, so we cannot use it. What needs to happen is to put the s2io_nic structure on a diet and convert some of the data stored in the structure to pointers to allocated memory. The best approach will be to load a system with crash and look at which elements take up the most space and can be moved around.

Reassigning to Bob as he should be able to quickly knock this out.
Created attachment 464238 [details]
sizes of structs on x86_64 (pahole s2io.o)

pahole is a nice tool to explore sizes of structures. Attached is the full output of "pahole s2io.o" on x86_64. The biggest members are:

struct s2io_nic {
        ...
        struct mac_info mac_control; /* size: 65920 */
        ...
        /* size: 73344 */
};

struct mac_info {
        ...
        struct ring_info rings[8]; /* size: 64512 */
        ...
        /* size: 65920 */
};

struct ring_info {
        ...
        struct lro lro0_n[32]; /* size: 4096 */
        ...
        struct rx_block_info rx_blocks[150]; /* size: 3600 */
        ...
        /* size: 8064 */
};
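As an illustration of the "diet" idea applied to these numbers, here is a minimal user-space sketch, assuming the rings[8] array (8 x 8064 = 64512 bytes) is the member moved behind a pointer. All struct and function names below are hypothetical stand-ins sized to match the pahole output; the real fields are omitted, the actual fix may move different members, and a kernel driver would allocate with the kernel allocator rather than calloc.

```c
#include <assert.h>
#include <stdlib.h>

#define NETDEV_PRIV_LEN_MAX 0x0000FFFFu
#define MAX_RX_RINGS 8 /* rings[8] in the pahole output */

/* Opaque stand-in with the size pahole reports for struct ring_info. */
struct ring_info_mock { unsigned char bytes[8064]; };

/* Before: the rings are embedded in the private area. */
struct mac_info_embedded {
    struct ring_info_mock rings[MAX_RX_RINGS]; /* 64512 bytes */
    unsigned char rest[65920 - 64512];         /* remaining members */
};
struct nic_embedded {
    struct mac_info_embedded mac_control;      /* 65920 bytes */
    unsigned char rest[73344 - 65920];         /* remaining members */
};

/* After: only a pointer stays in the private area. */
struct mac_info_slim {
    struct ring_info_mock *rings;              /* allocated separately */
    unsigned char rest[65920 - 64512];
};
struct nic_slim {
    struct mac_info_slim mac_control;
    unsigned char rest[73344 - 65920];
};

static_assert(sizeof(struct nic_embedded) == 73344, "matches pahole total");
static_assert(sizeof(struct nic_embedded) > NETDEV_PRIV_LEN_MAX, "too big");
static_assert(sizeof(struct nic_slim) <= NETDEV_PRIV_LEN_MAX, "now fits");

/* Allocate the rings out of line; returns 0 on success, -1 on failure. */
int nic_slim_init(struct nic_slim *nic)
{
    nic->mac_control.rings = calloc(MAX_RX_RINGS, sizeof(struct ring_info_mock));
    return nic->mac_control.rings ? 0 : -1;
}

/* Release the out-of-line rings and clear the pointer. */
void nic_slim_cleanup(struct nic_slim *nic)
{
    free(nic->mac_control.rings);
    nic->mac_control.rings = NULL;
}
```

The trade-off in this sketch is one extra allocation and pointer dereference per device in exchange for a private area that fits under the limit; code that indexed the embedded array can keep the same rings[i] syntax through the pointer.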
Please test a kernel rpm at: http://people.redhat.com/~bpicco/.bz654948/
We've been unable to find working local hardware.
thanx,
bob
Adding Shawn Pagan of hp in the CC list. Shawn, does your group have hardware that can test this kernel?
I downloaded the kernel from
http://people.redhat.com/~bpicco/.bz654948/kernel-2.6.18-235.el5.s2iov3.ia64.rpm
installed it on RHEL5.6S3 (IA64, rx2660) and rebooted. The Ethernet port of the AD144 or AD385 is found and pings successfully.

By the way, which snapshot is planned to include this fix? Once it is available, I will run some stress tests on this driver.

The following is some information cut from dmesg:
-----------------------cut from dmesg--------------------
GSI 52 (level, low) -> CPU 2 (0x0200) vector 69
ACPI: PCI Interrupt 0000:06:01.0[A] -> GSI 52 (level, low) -> IRQ 69
s2io: s2io_init_nic: Using 64bit DMA
s2io: eth%d: Ring Mem PHY: 0x100ec220000
s2io: s2io_reset: Resetting XFrame card eth%d
PM: Writing back config space on device 0000:06:01.0 at offset 1 (was 2300142, writing 2300146)
s2io: Copyright(c) 2002-2007 Neterion Inc.
s2io: eth2: Neterion HP PCI-X 266MHz 10GbE SR Fiber Adapter (rev 2)
s2io: eth2: Driver version 2.0.26.25
s2io: eth2: MAC Address: 00:0c:fc:00:58:23
s2io: Serial number: SXT0808109
s2io: eth2: Device is on 64 bit 133MHz PCIX(M1) bus
s2io: eth2: 1-Buffer receive mode enabled
s2io: eth2: NAPI enabled
s2io: eth2: Using 1 Tx fifo(s)
s2io: eth2: Using 1 Rx ring(s)
s2io: eth2: Interrupt type INTA
s2io: eth2: Multiqueue support disabled
s2io: eth2: No steering enabled for transmit
s2io: Fifo partition at: 0xc000080680101108 is: 0xfff00000000
s2io: eth2: Next block at: e0000100ec848000
s2io: eth2: Next block at: e0000100ec84c000
s2io: eth2: Next block at: e0000100ecf48000
s2io: eth2: Next block at: e0000100ecf4c000
s2io: eth2: Next block at: e0000100ebf80000
s2io: eth2: Next block at: e0000100ebf84000
s2io: eth2: Next block at: e0000100ec348000
s2io: eth2: Next block at: e0000100ec34c000
s2io: Buf in ring:0 is 3810:
s2io: eth2: Link Up
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100eb8dc000
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ec848000
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ec84c000
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ecf48000
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ecf4c000
s2io: eth2: Next block at: e0000100ebf80000
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ebf84000
s2io: eth2: Next block at: e0000100ec348000
------------------------End----------------------------

Thanks,
Dawei
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-237.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
The s2io driver in kernel-2.6.18-237.el5.ia64.rpm works. The Ethernet port of the AD144 or AD385 is found and pings successfully.

I will run the 24-hour network stress test on this kernel and report the result tomorrow.

The following is some information cut from dmesg:
-----------------------cut from dmesg after insmod s2io.ko---------------------
GSI 38 (level, low) -> CPU 0 (0x0000) vector 64
ACPI: PCI Interrupt 0000:0a:01.0[A] -> GSI 38 (level, low) -> IRQ 64
PM: Writing back config space on device 0000:0a:01.0 at offset c (was 0, writing a0000000)
PM: Writing back config space on device 0000:0a:01.0 at offset 7 (was 0, writing 802)
PM: Writing back config space on device 0000:0a:01.0 at offset 6 (was c, writing 8000000c)
PM: Writing back config space on device 0000:0a:01.0 at offset 5 (was 0, writing 802)
PM: Writing back config space on device 0000:0a:01.0 at offset 4 (was c, writing 8010000c)
PM: Writing back config space on device 0000:0a:01.0 at offset 3 (was 4000, writing 4020)
PM: Writing back config space on device 0000:0a:01.0 at offset 1 (was 2300000, writing 2300146)
PM: Writing back config space on device 0000:0a:01.0 at offset c (was 0, writing a0000000)
PM: Writing back config space on device 0000:0a:01.0 at offset 7 (was 0, writing 802)
PM: Writing back config space on device 0000:0a:01.0 at offset 6 (was c, writing 8000000c)
PM: Writing back config space on device 0000:0a:01.0 at offset 5 (was 0, writing 802)
PM: Writing back config space on device 0000:0a:01.0 at offset 4 (was c, writing 8010000c)
PM: Writing back config space on device 0000:0a:01.0 at offset 3 (was 4000, writing 4020)
PM: Writing back config space on device 0000:0a:01.0 at offset 1 (was 2300000, writing 2300146)
s2io: Copyright(c) 2002-2007 Neterion Inc.
s2io: eth2: Neterion HP PCI-X 133MHz 10GbE SR Fiber Adapter (rev 4)
s2io: eth2: Driver version 2.0.26.25
s2io: eth2: MAC Address: 00:0c:fc:00:2f:4a
s2io: Serial number: SXT0710158
s2io: eth2: 1-Buffer receive mode enabled
s2io: eth2: NAPI enabled
s2io: eth2: Using 1 Tx fifo(s)
s2io: eth2: Using 1 Rx ring(s)
s2io: eth2: Interrupt type INTA
s2io: eth2: Multiqueue support disabled
s2io: eth2: No steering enabled for transmit
GSI 67 (level, low) -> CPU 1 (0x0200) vector 65
ACPI: PCI Interrupt 0000:4a:01.0[A] -> GSI 67 (level, low) -> IRQ 65
PM: Writing back config space on device 0000:4a:01.0 at offset 1 (was 2300142, writing 2300146)
s2io: eth2: Link Up
s2io: Copyright(c) 2002-2007 Neterion Inc.
s2io: eth0: Neterion HP PCI-X 266MHz 10GbE SR Fiber Adapter (rev 2)
s2io: eth0: Driver version 2.0.26.25
s2io: eth0: MAC Address: 00:0c:fc:00:4d:ca
s2io: Serial number: SXT0740103
s2io: eth0: Device is on 64 bit 133MHz PCIX(M1) bus
s2io: eth0: 1-Buffer receive mode enabled
s2io: eth0: NAPI enabled
s2io: eth0: Using 1 Tx fifo(s)
s2io: eth0: Using 1 Rx ring(s)
s2io: eth0: Interrupt type INTA
s2io: eth0: Multiqueue support disabled
s2io: eth0: No steering enabled for transmit
----------------------------end-------------------------------------------------
(In reply to comment #18)
> This request was evaluated by Red Hat Product Management for inclusion in a Red
> Hat Enterprise Linux maintenance release. Product Management has requested
> further review of this request by Red Hat Engineering, for potential
> inclusion in a Red Hat Enterprise Linux Update release for currently deployed
> products. This request is not yet committed for inclusion in an Update
> release.

This is a critical problem for HP and may affect rhel5.6's LR. Since snapshot5 is the last snapshot, I would like to know in which maintenance release you plan to resolve it.

Jiayin
I ran the 24-hour network stress test with kernel-2.6.18-237.el5.ia64.rpm; it passed.
HP strongly hopes you can resolve this issue before the RC release.

Jiayin
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html