Bug 654948
Summary: | RHEL5.6: 10Gb network cards (AD144 & AD385) are missing during installation and cannot be driven by the system | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | duxuewen <xue-wen.du> | ||||||
Component: | kernel | Assignee: | bob picco <bpicco> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Network QE <network-qe> | ||||||
Severity: | urgent | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 5.6 | CC: | adam.vinsh, arozansk, dawei.pang, hjia, jiayin.shao, joseph.szczypek, joshua.powers, kzhang, li.zhang6, mschmidt, myamazak, ohudlick, shawn.pagan, shengliang.lv, shi.ze, tcamuso | ||||||
Target Milestone: | rc | Keywords: | OtherQA, Regression | ||||||
Target Release: | --- | ||||||||
Hardware: | ia64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2011-01-13 22:01:45 UTC | Type: | --- | ||||||
Bug Blocks: | 502912 | ||||||||
Attachments: |
|
Description
duxuewen
2010-11-19 06:17:18 UTC
Did this work on RHEL5.5?

It looks like a change to the size of the s2io_nic structure might be causing this, so I'll have to look at what changed in RHEL5.4 and RHEL5.5, as it doesn't look like much changed in RHEL5.5 that would cause this.

(In reply to comment #1)
> Did this work on RHEL5.5?
> It looks like a change to the size of the s2io_nic structure might be causing
> this, so I'll have to look at what changed in RHEL5.4 and RHEL5.5 as it doesn't
> look like much changed in RHEL5.5 that would cause this.

Hi Andy,

I have installed RHEL5.5 on an rx2660:

rx2660
AD144A: in slot (1) (efi_2.0.4.6)
AE311A: in slot (2)
AD338A: in slot (3)

The AD144A is driven correctly and works normally:

[root@max ~]# sutl nics
eth0 (AD144A) 00:0c:fc:00:2f:4a <p0> S2IO [10000Mb/s]
eth1 (AD337A) 00:1a:4b:f3:05:cc <p0> e1000e [Unknown!]
eth2 (AD337A) 00:1a:4b:f3:05:cd <p1> e1000e [Unknown!]
eth3 (rx2660) 00:17:a4:99:1d:0f <p0> tg3 [1000Mb/s]
eth4 (rx2660) 00:17:a4:99:1d:0e <p1> tg3 [1000Mb/s]

Hi,

This is Jiayin from the China QA team, which maintains the Red Hat defect list. I added myself to this issue's CC list, but after I updated it, I saw the information below (my email is in the Excluding list), and I still do not receive updates for this defect. I wonder why that is. Could anyone tell me?

Changes submitted for bug 654948
Email sent to:
bugsfx, agospoda, shengliang.lv, xue-wen.du,
adam.vinsh, joseph.szczypek, arozansk, dag,
shawn.pagan, bugbot.org, li.zhang6,
shi.ze
Excluding:
submit.redhat.com, kernel-qe, jiayin.shao

Jiayin

(In reply to comment #4)
> hi,
> This is Jiayin of China QA team who maintain Redhat defect list. I added me to
> this issue's cc list, but after I updated it, I found below information (my
> email is in Excluding list), and I still can't receive the updating for this
> defect. I wonder why it is. Anyone could tell me?
>
> Changes submitted for bug 654948
> Email sent to:
> bugsfx, agospoda, shengliang.lv, xue-wen.du,
> adam.vinsh, joseph.szczypek, arozansk, dag,
> shawn.pagan, bugbot.org, li.zhang6,
> shi.ze
> Excluding:
> submit.redhat.com, kernel-qe, jiayin.shao
>
>
> Jiayin

If you make the update to the bugzilla yourself, you will not get an email about that update and will be on the 'excluding' list.

(In reply to comment #5)
> If you make the update to the bugzilla, you will not get an email about the
> update to the bugzilla and will be on the 'excluding' list.

Got it. Thanks!

Created attachment 463650 [details]
patch for fixing the s2io initialize
During s2io initialization (s2io_init_nic), I found that sizeof(struct s2io_nic) = 73344, which is larger than NETDEV_PRIV_LEN_MAX, 0x0000FFFF (64K), the limit it is compared against in alloc_netdev. As a result, "Private data too big" is reported and device allocation fails.
NETDEV_PRIV_LEN_MAX and the comparison were added by the patch: linux-2.6-net-qla3xxx-fix-oops-on-too-long-netdev-priv-structure.patch
As a workaround, I removed some of the related code, and with that change s2io works.
The file s2io_fix_init.patch is in the attachment; I hope it can help us fix this issue.
By the way, we need to take care that the changes do not cause another issue.
Thanks,
Dawei
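
The failure described in this comment can be illustrated with a small user-space sketch. It is not the kernel's code: check_priv_size and priv_overshoot are hypothetical names, and the output format is invented for illustration. Only the limit (0x0000FFFF), the reported structure size (73344), and the "Private data too big" text come from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Limit quoted in the comment above: 0x0000FFFF (64K - 1). */
#define NETDEV_PRIV_LEN_MAX 0x0000FFFFu

/* sizeof(struct s2io_nic) as reported in the comment above. */
#define S2IO_NIC_SIZE 73344u

/*
 * Hypothetical stand-in for the size comparison the comment attributes
 * to alloc_netdev. Returns 0 if the private area fits under the limit,
 * -1 if it is too big.
 */
static int check_priv_size(const char *name, size_t sizeof_priv)
{
    assert(name != NULL);

    if (sizeof_priv > NETDEV_PRIV_LEN_MAX) {
        fprintf(stderr, "%s: Private data too big (%zu > %u)\n",
                name, sizeof_priv, NETDEV_PRIV_LEN_MAX);
        return -1;
    }
    return 0;
}

/* Number of bytes by which a private area exceeds the limit (0 if it fits). */
static size_t priv_overshoot(size_t sizeof_priv)
{
    return sizeof_priv > NETDEV_PRIV_LEN_MAX
               ? sizeof_priv - NETDEV_PRIV_LEN_MAX
               : 0;
}
```

Calling check_priv_size("s2io", S2IO_NIC_SIZE) returns -1, and priv_overshoot(S2IO_NIC_SIZE) shows the structure is 7809 bytes over the limit, so the structure has to shrink by at least that much for allocation to succeed with the check left in place.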
(In reply to comment #7)
> Created attachment 463650 [details]
> patch for fixing the s2io initialize
>
> When the s2io initialize(s2io_init_nic), I found the sizeof(struct s2io_nic) =
> 73344, it is larger than NETDEV_PRIV_LEN_MAX 0X0000FFFF(64K) which compared in
> the alloc_netdev, so "Private data too big" is reported and device allocation
> failed.
>
> NETDEV_PRIV_LEN_MAX and compared section are added by patch:
> linux-2.6-net-qla3xxx-fix-oops-on-too-long-netdev-priv-structure.patch
>
> I remove some related codes for workaround, the s2io can work.
> There is the file s2io_fix_init.patch in the attachment, hope it can help us
> fix this issue.
>
> By the way we need take care if the changes will cause another issue.
>
> Thanks,
> Dawei

The attached patch will have other side effects, so we cannot use it. What needs to happen is to put the s2io_nic structure on a diet and convert some of the data stored in the structure to pointers to allocated memory. The best approach will be to load a system with crash and look at which elements take up the most space and can be moved around.

Reassigning to Bob, as he should be able to quickly knock this out.

Created attachment 464238 [details]
sizes of structs on x86_64 (pahole s2io.o)
pahole is a nice tool to explore sizes of structures. Attached is the full output of "pahole s2io.o" on x86_64.
The biggest members are:
struct s2io_nic {
...
struct mac_info mac_control; /* size: 65920 */
...
/* size: 73344 */
};
struct mac_info {
...
struct ring_info rings[8]; /* size: 64512 */
...
/* size: 65920 */
};
struct ring_info {
...
struct lro lro0_n[32]; /* size: 4096 */
...
struct rx_block_info rx_blocks[150]; /* size: 3600 */
...
/* size: 8064 */
};
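
The "diet" suggested earlier in this thread (turning embedded data into pointers to allocated memory) can be sketched in user space using the sizes pahole reports. This is only an illustration under assumptions: ring_info_demo, mac_info_embedded, mac_info_slim, and the two helper functions are hypothetical names, the ring structure is modelled as an opaque 8064-byte blob, and the thread does not say which members the shipped fix actually moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_RX_RINGS   8     /* rings[8] in the pahole output */
#define RING_INFO_SIZE 8064  /* sizeof(struct ring_info) per pahole */

/* Hypothetical stand-in: an opaque blob with the size pahole reports. */
struct ring_info_demo {
    unsigned char bytes[RING_INFO_SIZE];
};

/* Embedded layout: the array itself lives inside the parent structure. */
struct mac_info_embedded {
    struct ring_info_demo rings[MAX_RX_RINGS]; /* 8 * 8064 = 64512 bytes */
};

/* Slimmed layout: the parent holds only a pointer to the array. */
struct mac_info_slim {
    struct ring_info_demo *rings; /* one pointer instead of 64512 bytes */
};

/* Allocate the ring array separately. Returns 0 on success, -1 on failure. */
static int mac_info_slim_init(struct mac_info_slim *mac)
{
    assert(mac != NULL);

    mac->rings = calloc(MAX_RX_RINGS, sizeof(*mac->rings));
    return mac->rings ? 0 : -1;
}

/* Release the separately allocated ring array. */
static void mac_info_slim_free(struct mac_info_slim *mac)
{
    assert(mac != NULL);

    free(mac->rings);
    mac->rings = NULL;
}
```

Going by the pahole figures alone, moving rings out of line would take roughly 64512 bytes out of the 73344-byte struct s2io_nic, leaving it well under the 0x0000FFFF limit. The price is an extra allocation that must be checked at init time and freed on teardown.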
Please test a kernel rpm at: http://people.redhat.com/~bpicco/.bz654948/ . We've been unable to find working local hardware.

thanx,
bob

Adding Shawn Pagan of HP to the CC list. Shawn, does your group have hardware that can test this kernel?

I downloaded the kernel file from http://people.redhat.com/~bpicco/.bz654948/kernel-2.6.18-235.el5.s2iov3.ia64.rpm, installed it on RHEL5.6S3 (IA64, rx2660), and rebooted. The Ethernet port of the AD144 or AD385 is found and pings successfully.

By the way, which snapshot is planned to include this fix? At that time, I will run some stress tests on this driver.

The following is some information cut from dmesg:

-----------------------cut from dmesg--------------------
GSI 52 (level, low) -> CPU 2 (0x0200) vector 69
ACPI: PCI Interrupt 0000:06:01.0[A] -> GSI 52 (level, low) -> IRQ 69
s2io: s2io_init_nic: Using 64bit DMA
s2io: eth%d: Ring Mem PHY: 0x100ec220000
s2io: s2io_reset: Resetting XFrame card eth%d
PM: Writing back config space on device 0000:06:01.0 at offset 1 (was 2300142, writing 2300146)
s2io: Copyright(c) 2002-2007 Neterion Inc.
s2io: eth2: Neterion HP PCI-X 266MHz 10GbE SR Fiber Adapter (rev 2)
s2io: eth2: Driver version 2.0.26.25
s2io: eth2: MAC Address: 00:0c:fc:00:58:23
s2io: Serial number: SXT0808109
s2io: eth2: Device is on 64 bit 133MHz PCIX(M1) bus
s2io: eth2: 1-Buffer receive mode enabled
s2io: eth2: NAPI enabled
s2io: eth2: Using 1 Tx fifo(s)
s2io: eth2: Using 1 Rx ring(s)
s2io: eth2: Interrupt type INTA
s2io: eth2: Multiqueue support disabled
s2io: eth2: No steering enabled for transmit
s2io: Fifo partition at: 0xc000080680101108 is: 0xfff00000000
s2io: eth2: Next block at: e0000100ec848000
s2io: eth2: Next block at: e0000100ec84c000
s2io: eth2: Next block at: e0000100ecf48000
s2io: eth2: Next block at: e0000100ecf4c000
s2io: eth2: Next block at: e0000100ebf80000
s2io: eth2: Next block at: e0000100ebf84000
s2io: eth2: Next block at: e0000100ec348000
s2io: eth2: Next block at: e0000100ec34c000
s2io: Buf in ring:0 is 3810:
s2io: eth2: Link Up
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100eb8dc000
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ec848000
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ec84c000
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ecf48000
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ecf4c000
s2io: eth2: Next block at: e0000100ebf80000
s2io: eth2: In Neterion Tx routine
s2io: eth2: Next block at: e0000100ebf84000
s2io: eth2: Next block at: e0000100ec348000
------------------------End----------------------------

Thanks,
Dawei

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release.
Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

in kernel-2.6.18-237.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

The s2io driver in kernel-2.6.18-237.el5.ia64.rpm works. The Ethernet port of the AD144 or AD385 is found and pings successfully.
I will run the 24-hour network stress test on this kernel and report the result tomorrow.
The following is some information cut from dmesg:

-----------------------cut from dmesg after insmod s2io.ko---------------------
GSI 38 (level, low) -> CPU 0 (0x0000) vector 64
ACPI: PCI Interrupt 0000:0a:01.0[A] -> GSI 38 (level, low) -> IRQ 64
PM: Writing back config space on device 0000:0a:01.0 at offset c (was 0, writing a0000000)
PM: Writing back config space on device 0000:0a:01.0 at offset 7 (was 0, writing 802)
PM: Writing back config space on device 0000:0a:01.0 at offset 6 (was c, writing 8000000c)
PM: Writing back config space on device 0000:0a:01.0 at offset 5 (was 0, writing 802)
PM: Writing back config space on device 0000:0a:01.0 at offset 4 (was c, writing 8010000c)
PM: Writing back config space on device 0000:0a:01.0 at offset 3 (was 4000, writing 4020)
PM: Writing back config space on device 0000:0a:01.0 at offset 1 (was 2300000, writing 2300146)
PM: Writing back config space on device 0000:0a:01.0 at offset c (was 0, writing a0000000)
PM: Writing back config space on device 0000:0a:01.0 at offset 7 (was 0, writing 802)
PM: Writing back config space on device 0000:0a:01.0 at offset 6 (was c, writing 8000000c)
PM: Writing back config space on device 0000:0a:01.0 at offset 5 (was 0, writing 802)
PM: Writing back config space on device 0000:0a:01.0 at offset 4 (was c, writing 8010000c)
PM: Writing back config space on device 0000:0a:01.0 at offset 3 (was 4000, writing 4020)
PM: Writing back config space on device 0000:0a:01.0 at offset 1 (was 2300000, writing 2300146)
s2io: Copyright(c) 2002-2007 Neterion Inc.
s2io: eth2: Neterion HP PCI-X 133MHz 10GbE SR Fiber Adapter (rev 4)
s2io: eth2: Driver version 2.0.26.25
s2io: eth2: MAC Address: 00:0c:fc:00:2f:4a
s2io: Serial number: SXT0710158
s2io: eth2: 1-Buffer receive mode enabled
s2io: eth2: NAPI enabled
s2io: eth2: Using 1 Tx fifo(s)
s2io: eth2: Using 1 Rx ring(s)
s2io: eth2: Interrupt type INTA
s2io: eth2: Multiqueue support disabled
s2io: eth2: No steering enabled for transmit
GSI 67 (level, low) -> CPU 1 (0x0200) vector 65
ACPI: PCI Interrupt 0000:4a:01.0[A] -> GSI 67 (level, low) -> IRQ 65
PM: Writing back config space on device 0000:4a:01.0 at offset 1 (was 2300142, writing 2300146)
s2io: eth2: Link Up
s2io: Copyright(c) 2002-2007 Neterion Inc.
s2io: eth0: Neterion HP PCI-X 266MHz 10GbE SR Fiber Adapter (rev 2)
s2io: eth0: Driver version 2.0.26.25
s2io: eth0: MAC Address: 00:0c:fc:00:4d:ca
s2io: Serial number: SXT0740103
s2io: eth0: Device is on 64 bit 133MHz PCIX(M1) bus
s2io: eth0: 1-Buffer receive mode enabled
s2io: eth0: NAPI enabled
s2io: eth0: Using 1 Tx fifo(s)
s2io: eth0: Using 1 Rx ring(s)
s2io: eth0: Interrupt type INTA
s2io: eth0: Multiqueue support disabled
s2io: eth0: No steering enabled for transmit
----------------------------end-------------------------------------------------

(In reply to comment #18)
> This request was evaluated by Red Hat Product Management for inclusion in a Red
> Hat Enterprise Linux maintenance release. Product Management has requested
> further review of this request by Red Hat Engineering, for potential
> inclusion in a Red Hat Enterprise Linux Update release for currently deployed
> products. This request is not yet committed for inclusion in an Update
> release.

This issue is a critical problem for HP and may affect RHEL5.6's LR. Since snapshot5 is the last snapshot version, I wonder in which maintenance release you are going to resolve it.
Jiayin

I ran the 24-hour network stress test with kernel-2.6.18-237.el5.ia64.rpm, and it passed.

HP strongly hopes you can resolve this issue before the RC release.

Jiayin

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html