Bug 223505
| Summary: | LSPP: tcpdump crashes kernel and system goes into debugger. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Joy Latten <latten> | ||||||
| Component: | kernel | Assignee: | Herbert Xu <herbert.xu> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 5.0 | CC: | davem, dzickus, fnovak, herbert.xu, iboverma, jgirouar, jmorris, krisw, linda.knippers, mlichvar, sgrubb | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | ppc64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | 5.0.0 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2007-02-13 17:01:27 UTC | Type: | --- | ||||||
| Bug Blocks: | 224041 | ||||||||
Can someone verify that tcpdump works on other Ethernet adapters? Also, which networking driver/adapter is eth0 attached to?

This occurs on an LPAR that uses the ibmveth driver; that is, it is a virtual Ethernet.

Just so we understand this correctly: is the original problem description stating that this works fine on stock RHEL5 RC, but fails on the LSPP-specific kernel? The last ibmveth change went in on 1.2789.el5 for RHEL5. Did tcpdump work on prior kernels, i.e. the beta2 kernel, etc.?

tcpdump -i eth0 caused a panic on a Cell architecture blade after receiving about 8 packets. This was running 2.6.18-4.el5. Will attempt to switch to the kernel mentioned in comment #5 and look for a difference.

2.6.18-1.2767.el5 appears to work correctly and without issue.

2.6.18-1.2789.el5 also worked fine. Still working to isolate the problematic patch.

The panic was introduced somewhere between 1.3002.el5 and 1.3014.el5.

Even better, it appears to work fine on 1.3013.el, so the problem must be between 3013 and 3014.

I'm going to go back and reverify my work that this patch is the problem, but the differences between 3013 and 3014 seem to be a result of:
Related: rhbz#219681 - xen dhcp patch has a new fix for a missing prototype, round 2.
Adding Herbert to the CC since I believe it is his patch.

This appears to work just fine on x86/x86_64; however, on ppc64 it goes boom.

Created attachment 146320 [details]
diff between 3013 and 3014
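The narrowing described in the comments above (known-good 3002, known-bad 3014, then good at 3013) is a bisection over build numbers. A minimal sketch of that search follows, assuming that every build after the first bad one is also bad. The function names are hypothetical, and `panics_under_tcpdump` is only a stand-in that encodes the result reported in this bug; in practice it would mean booting the build and running tcpdump by hand.

```python
def first_bad_build(good, bad, is_bad):
    """Return the lowest build number in (good, bad] for which is_bad() is true.

    Assumes `good` is known to work, `bad` is known to fail, and that
    every build after the first failing one also fails.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid       # failure is at mid or earlier
        else:
            good = mid      # failure is after mid
    return bad


def panics_under_tcpdump(build):
    # Stand-in for a manual test (boot the kernel, run "tcpdump -i eth0").
    # It hard-codes the outcome reported in this bug: 3013 good, 3014 bad.
    return build >= 3014


if __name__ == "__main__":
    print(first_bad_build(3002, 3014, panics_under_tcpdump))
```

With twelve builds between the two endpoints, this takes at most four test boots instead of walking through every build in order.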
When you're in the debugger, can you get a backtrace of the crash?

No. Below is what I get. You can easily access the machine I'm doing this on internally.
[root@ibm-cell-01 ~]# tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
15:33:13.867488 arp who-has frodo.lab.boston.redhat.com tell
i386-5as.lab.boston.redhat.com
15:33:13.870286 arp who-has frodo.lab.boston.redhat.com tell
ibm-cell-01.lab.boston.redhat.com
15:33:13.872051 IP squad5-lp1.lab.boston.redhat.com >
ibm-cell-01.lab.boston.redhat.com: ICMP echo request, id 27000, seq 44259, length 64
15:33:13.872080 IP ibm-cell-01.lab.boston.redhat.com >
squad5-lp1.lab.boston.redhat.com: ICMP echo reply, id 27000, seq 44259, length 64
15:33:13.882778 arp reply frodo.lab.boston.redhat.com is-at 00:08:02:46:ea:e9
(oui Unknown)
15:33:13.882792 IP ibm-cell-01.lab.boston.redhat.com.cap >
frodo.lab.boston.redhat.com.domain: 22026+ PTR? 10.76.168.192.in-a
cpu 0x1: Vector: 700 (Program Check) at [c00000001b023b10]
pc: c000000000940004
lr: c000000000940000
sp: c00000001b023d90
msr: 9000000000089032
current = 0xc000000001f5cb40
paca = 0xc000000000464500
pid = 2625, comm = tcpdump
enter ? for help
1:mon> t
[c00000001b023d90] c000000000940000 (unreliable)
1:mon>
Created attachment 146377 [details]
[PACKET]: Fix skb->cb clobbering between aux and sockaddr
Both aux data and sockaddr tries to use the same buffer which
obviously doesn't work. We just happen to have 4 bytes free in
the skb->cb if you take away the maximum length of sockaddr_ll.
That's just enough to store the one piece of info from aux data
that we can't generate at recvmsg(2) time.
This is what the following patch does.
Signed-off-by: Herbert Xu <herbert.org.au>
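The "4 bytes free" arithmetic in the patch description can be checked with a small sketch. It rests on two assumptions that are not stated in this bug: that skb->cb is 48 bytes in this kernel, and that the kernel's maximum hardware address length (MAX_ADDR_LEN) is 32. The `SockaddrLL` class mirrors the userspace struct sockaddr_ll field order with ctypes; it only illustrates the layout and is not the kernel code from the attachment.

```python
import ctypes

SKB_CB_SIZE = 48    # assumption: size of skb->cb in this kernel
MAX_ADDR_LEN = 32   # assumption: kernel maximum hardware address length


class SockaddrLL(ctypes.Structure):
    # Field order of struct sockaddr_ll as seen from userspace.
    _fields_ = [
        ("sll_family", ctypes.c_ushort),
        ("sll_protocol", ctypes.c_ushort),
        ("sll_ifindex", ctypes.c_int),
        ("sll_hatype", ctypes.c_ushort),
        ("sll_pkttype", ctypes.c_ubyte),
        ("sll_halen", ctypes.c_ubyte),
        ("sll_addr", ctypes.c_ubyte * 8),
    ]


def max_sockaddr_ll_len():
    # Fixed header up to sll_addr, plus the longest address the
    # kernel may need to store there.
    return SockaddrLL.sll_addr.offset + MAX_ADDR_LEN


def free_cb_bytes():
    # Bytes left in skb->cb once the largest sockaddr_ll is reserved.
    return SKB_CB_SIZE - max_sockaddr_ll_len()


if __name__ == "__main__":
    print("max sockaddr_ll length:", max_sockaddr_ll_len())
    print("bytes left in cb:", free_cb_bytes())
```

Under those assumptions the leftover is exactly one 32-bit field, which matches the description: the single piece of aux data that cannot be regenerated at recvmsg(2) time gets its own slot, and the sockaddr no longer shares bytes with it.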
QE ack for RHEL5.

in 2.6.18-6.el5

Closing out.
Description of problem:
Just issuing "tcpdump" or "tcpdump -i eth0" in the lspp 63 kernel causes the kernel to crash, and the system goes into the debugger.

Version-Release number of selected component (if applicable):
tcpdump-3.9.4-8.1

How reproducible:
Happens every time.

Steps to Reproduce:
1. tcpdump -i eth0 OR tcpdump

Actual results:
tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
(a few packets are picked up)
...
...
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x002d1694
cpu 0x0: Vector: 400 (Instruction Access) at [c00000000277bb10]
pc: 00000000002d1694
lr: 00000000002d1694
sp: c00000000277bd90
msr: 8000000040009032
current = 0xc0000000029122f0
paca = 0xc000000000464300
pid = 1701, comm = tcpdump
enter ? for help
0:mon>t
0:mon> r
R00 = 00000000002d1694   R16 = 0000000000000000
R01 = c00000000277bd90   R17 = 0000000000000000
R02 = c000000000579640   R18 = 00000000ffffffff
R03 = 00000000000000c8   R19 = 0000000000000000
R04 = c00000000277bcd8   R20 = 000000001008dd64
R05 = 0000000000000004   R21 = 00000000100b0000
R06 = 0000000000000000   R22 = 00000000100b0000
R07 = 0000000000000001   R23 = 00000000fd31fe3b
R08 = 000000c80000000e   R24 = c00000000fd50688
R09 = c000000002778000   R25 = c00000000fd50750
R10 = 0000000000000000   R26 = c00000000fd50880
R11 = 0000000000000000   R27 = 0000000000000000
R12 = 000c00010000a8c0   R28 = 0000000000000000
R13 = c000000000464300   R29 = 0000000000000000
R14 = 0000000000000000   R30 = c00000000050bed0
R15 = 0000000000000000   R31 = c00000000f6fbb80
pc = 00000000002d1694
lr = 00000000002d1694
msr = 8000000040009032   cr = 24022482
ctr = 0000000000000000   xer = 0000000000000000   trap = 400
0:mon>

Expected results:
Don't expect to see kernel debugger. :-)

Additional info:
uname -a
Linux XXXXXXXX 2.6.18-1.3015.2.1.el5.lspp.63 #1 SMP Mon Jan 15 16:51:12 EST 2007 ppc64 ppc64 ppc64 GNU/Linux

I think this may be a kernel issue. The same machine also has the 2.6.18-1.3002.el5 kernel installed, and tcpdump works fine when using that kernel.