Description of problem:
Just issuing "tcpdump" or "tcpdump -i eth0" on the lspp 63 kernel causes the kernel to crash, and the system drops into the debugger.

Version-Release number of selected component (if applicable):
tcpdump-3.9.4-8.1

How reproducible:
Happens every time.

Steps to Reproduce:
1. tcpdump -i eth0 OR tcpdump

Actual results:
tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
(a few packets are picked up)
...
...
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x002d1694
cpu 0x0: Vector: 400 (Instruction Access) at [c00000000277bb10]
pc: 00000000002d1694
lr: 00000000002d1694
sp: c00000000277bd90
msr: 8000000040009032
current = 0xc0000000029122f0
paca = 0xc000000000464300
pid = 1701, comm = tcpdump
enter ? for help
0:mon>t
0:mon> r
R00 = 00000000002d1694 R16 = 0000000000000000
R01 = c00000000277bd90 R17 = 0000000000000000
R02 = c000000000579640 R18 = 00000000ffffffff
R03 = 00000000000000c8 R19 = 0000000000000000
R04 = c00000000277bcd8 R20 = 000000001008dd64
R05 = 0000000000000004 R21 = 00000000100b0000
R06 = 0000000000000000 R22 = 00000000100b0000
R07 = 0000000000000001 R23 = 00000000fd31fe3b
R08 = 000000c80000000e R24 = c00000000fd50688
R09 = c000000002778000 R25 = c00000000fd50750
R10 = 0000000000000000 R26 = c00000000fd50880
R11 = 0000000000000000 R27 = 0000000000000000
R12 = 000c00010000a8c0 R28 = 0000000000000000
R13 = c000000000464300 R29 = 0000000000000000
R14 = 0000000000000000 R30 = c00000000050bed0
R15 = 0000000000000000 R31 = c00000000f6fbb80
pc = 00000000002d1694
lr = 00000000002d1694
msr = 8000000040009032 cr = 24022482
ctr = 0000000000000000 xer = 0000000000000000 trap = 400
0:mon>

Expected results:
Don't expect to see kernel debugger. :-)

Additional info:
uname -a
Linux XXXXXXXX 2.6.18-1.3015.2.1.el5.lspp.63 #1 SMP Mon Jan 15 16:51:12 EST 2007 ppc64 ppc64 ppc64 GNU/Linux

I think this may be a kernel issue. The same machine also has the 2.6.18-1.3002.el5 kernel installed, and tcpdump works fine under that kernel.
Can someone verify that tcpdump works on another Ethernet adapter? Also, what networking driver/adapter is eth0 attached to?
This occurs on an LPAR that uses the ibmveth driver; that is, eth0 is a virtual Ethernet adapter.
Just so we understand this correctly: is the original problem description stating that this works fine on stock RHEL5RC but fails on the LSPP-specific kernel?
The last ibmveth change went into 1.2789.el5 for RHEL5. Did tcpdump work on prior kernels, i.e. the beta2 kernel, etc.?
tcpdump -i eth0 caused a panic on a Cell architecture blade after receiving about 8 packets. This was running 2.6.18-4.el5. I will attempt to switch to the kernel mentioned in comment #5 and look for a difference.
2.6.18-1.2767.el5 appears to work correctly and without issue
2.6.18-1.2789.el5 also worked fine. Still working to isolate the problematic patch.
The panic was introduced somewhere between 1.3002.el5 and 1.3014.el5.
Even better, it appears to work fine on 1.3013.el5, so the problem must be between 3013 and 3014.
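The narrowing described in the comments above amounts to a bisection over an ordered list of kernel builds. A minimal sketch of that idea follows, assuming every good build precedes every bad one. The function names and the `KNOWN_BAD` set are hypothetical stand-ins for actually booting each kernel and running tcpdump; only the build names come from this report.

```python
def first_bad(builds, is_bad):
    """Return the first build for which is_bad() is true, or None.

    Assumes `builds` is ordered oldest to newest and that all good
    builds come before all bad ones.
    """
    lo, hi = 0, len(builds)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid          # first bad build is at mid or earlier
        else:
            lo = mid + 1      # first bad build is after mid
    return builds[lo] if lo < len(builds) else None


# Builds named in this report, oldest first.
BUILDS = ["1.2767.el5", "1.2789.el5", "1.3002.el5", "1.3013.el5", "1.3014.el5"]

# Hypothetical stand-in for "boot this kernel, run tcpdump -i eth0, see if it panics".
KNOWN_BAD = {"1.3014.el5"}


def crashes_under_tcpdump(build):
    return build in KNOWN_BAD


if __name__ == "__main__":
    print("first bad build:", first_bad(BUILDS, crashes_under_tcpdump))
```

With only a handful of candidate builds a linear walk works just as well; bisection pays off when many intermediate builds are available to test.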
I'm going to go back and reverify that this patch is the problem, but the differences between 3013 and 3014 seem to be a result of:
Related: rhbz#219681 - xen dhcp patch has a new fix for a missing prototype, round 2.
Adding Herbert to the CC since I believe it is his patch. This appears to work just fine on x86/x86_64; however, on ppc64 it goes boom.
Created attachment 146320 [details] diff between 3013 and 3014
When you're in the debugger, can you get a backtrace of the crash?
No. Below is what I get. You can easily access the machine I'm doing this on internally.

[root@ibm-cell-01 ~]# tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
15:33:13.867488 arp who-has frodo.lab.boston.redhat.com tell i386-5as.lab.boston.redhat.com
15:33:13.870286 arp who-has frodo.lab.boston.redhat.com tell ibm-cell-01.lab.boston.redhat.com
15:33:13.872051 IP squad5-lp1.lab.boston.redhat.com > ibm-cell-01.lab.boston.redhat.com: ICMP echo request, id 27000, seq 44259, length 64
15:33:13.872080 IP ibm-cell-01.lab.boston.redhat.com > squad5-lp1.lab.boston.redhat.com: ICMP echo reply, id 27000, seq 44259, length 64
15:33:13.882778 arp reply frodo.lab.boston.redhat.com is-at 00:08:02:46:ea:e9 (oui Unknown)
15:33:13.882792 IP ibm-cell-01.lab.boston.redhat.com.cap > frodo.lab.boston.redhat.com.domain: 22026+ PTR? 10.76.168.192.in-a
cpu 0x1: Vector: 700 (Program Check) at [c00000001b023b10]
pc: c000000000940004
lr: c000000000940000
sp: c00000001b023d90
msr: 9000000000089032
current = 0xc000000001f5cb40
paca = 0xc000000000464500
pid = 2625, comm = tcpdump
enter ? for help
1:mon> t
[c00000001b023d90] c000000000940000 (unreliable)
1:mon>
Created attachment 146377 [details]
[PACKET]: Fix skb->cb clobbering between aux and sockaddr

Both aux data and sockaddr try to use the same buffer, which obviously doesn't work. We just happen to have 4 bytes free in the skb->cb if you take away the maximum length of sockaddr_ll. That's just enough to store the one piece of info from aux data that we can't generate at recvmsg(2) time. This is what the following patch does.

Signed-off-by: Herbert Xu <herbert.org.au>
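To make the "4 bytes free" arithmetic concrete, here is a rough user-space sketch in Python that mirrors the sockaddr_ll layout with ctypes. It is not the patch itself. The 48-byte skb->cb size, the 32-byte maximum hardware address length, the 16-byte aux record used in the overlap check, and all helper names are assumptions for illustration, not values stated in this report.

```python
import ctypes

# Assumed constants for a 2.6.18-era kernel; neither is stated in this report.
SKB_CB_SIZE = 48    # assumed size of the skb->cb scratch area, in bytes
MAX_ADDR_LEN = 32   # assumed maximum hardware address length, in bytes


class SockaddrLL(ctypes.Structure):
    """User-space mirror of struct sockaddr_ll from <linux/if_packet.h>."""
    _fields_ = [
        ("sll_family", ctypes.c_ushort),
        ("sll_protocol", ctypes.c_ushort),
        ("sll_ifindex", ctypes.c_int),
        ("sll_hatype", ctypes.c_ushort),
        ("sll_pkttype", ctypes.c_ubyte),
        ("sll_halen", ctypes.c_ubyte),
        ("sll_addr", ctypes.c_ubyte * 8),
    ]


def max_sockaddr_ll_len():
    """Header fields plus the longest hardware address the kernel may store."""
    return SockaddrLL.sll_addr.offset + MAX_ADDR_LEN


def free_cb_bytes():
    """Bytes left in the cb area once a maximum-length sockaddr_ll is stored."""
    return SKB_CB_SIZE - max_sockaddr_ll_len()


def overlaps(a_start, a_len, b_start, b_len):
    """True if byte ranges [a_start, a_start+a_len) and [b_start, b_start+b_len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


if __name__ == "__main__":
    sa_len = max_sockaddr_ll_len()
    print("max sockaddr_ll length:", sa_len)
    print("bytes left in cb:", free_cb_bytes())
    # Both records at offset 0 (16 is a hypothetical aux record size):
    # the ranges intersect, so writing one clobbers the other.
    print("shared offset overlaps:", overlaps(0, 16, 0, sa_len))
    # A 4-byte field first with the sockaddr placed after it: disjoint ranges.
    print("split layout overlaps:", overlaps(0, 4, 4, sa_len))
```

Under these assumptions the leftover space matches the 4 bytes mentioned in the patch description, and a layout that places a 4-byte field ahead of the sockaddr keeps the two regions disjoint while still fitting in the buffer.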
QE ack for RHEL5.
in 2.6.18-6.el5
Closing out.