496686 – kernel panics with GPF and exception RIP: pskb_copy+307

Summary: kernel panics with GPF and exception RIP: pskb_copy+307

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jiri Pirko
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	525215 533192

Reported:	2009-04-20 18:13 UTC by Marc Milgram
Modified:	2018-12-02 15:36 UTC (History)
CC List:	11 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-03-04 20:36:07 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Believe we can reproduce this consistently (131.03 KB, image/jpeg) 2009-09-14 20:08 UTC, Mary Edie Meredith	no flags	Details

Description Marc Milgram 2009-04-20 18:13:27 UTC

Description of problem:
kernel panics with GPF and exception RIP: pskb_copy+307
Found that skb_shinfo(n)->frags seems to be corrupt.
It is being dereferenced, but currently is 0x408.

PID: 0      TASK: ffff810137b1c100  CPU: 5   COMMAND: "swapper"
 #0 [ffff810137b3fc40] crash_kexec at ffffffff800aaa19
 #1 [ffff810137b3fd00] __die at ffffffff8006520f
 #2 [ffff810137b3fd40] die at ffffffff8006bc17
 #3 [ffff810137b3fd70] do_general_protection at ffffffff80065657
 #4 [ffff810137b3fdb0] error_exit at ffffffff8005dde9
    [exception RIP: pskb_copy+307]
    RIP: ffffffff8021955d  RSP: ffff810137b3fe60  RFLAGS: 00010282
    RAX: ffff810d7024f120  RBX: ffff810fda0e12c0  RCX: ffff810fdae83d30
    RDX: dfc143c21b0e4080  RSI: ffff810d7024f130  RDI: 0000000000000002
    RBP: ffff810e03586b40   R8: 000000000b88e56c   R9: 0000000000000000
    R10: ffff810fda0e12c0  R11: 00000000000000c8  R12: 0000000000000220
    R13: ffff810e03586b40  R14: 0000000000000005  R15: ffffffff803d6300
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff810137b3fe78] tcp_transmit_skb at ffffffff80021377
 #6 [ffff810137b3fec8] tcp_retransmit_skb at ffffffff80243d6f
 #7 [ffff810137b3ff08] tcp_write_timer at ffffffff8024570f
 #8 [ffff810137b3ff28] run_timer_softirq at ffffffff80094dbb
 #9 [ffff810137b3ff58] __do_softirq at ffffffff80011fbc
#10 [ffff810137b3ff88] call_softirq at ffffffff8005e2fc
#11 [ffff810137b3ffa0] do_softirq at ffffffff8006cada
#12 [ffff810137b3ffb0] apic_timer_interrupt at ffffffff8005dc8e
--- <IRQ stack> ---
#13 [ffff810137b3bdf8] apic_timer_interrupt at ffffffff8005dc8e
    [exception RIP: acpi_processor_idle+436]
    RIP: ffffffff8018d0f1  RSP: ffff810137b3bea8  RFLAGS: 00000246
    RAX: ffff810137b3bfd8  RBX: ffff810fdfdd5900  RCX: 0000000000b57183
    RDX: 0000000000000408  RSI: 0000000000b57f63  RDI: 0000000000000000
    RBP: ffff810137b3bee8   R8: ffff810137b3a000   R9: 000000000000003b
    R10: ffff810f9b02c100  R11: ffff810ba2a4fda8  R12: ffff810f9b02c100
    R13: 0000000000402000  R14: 0000000000000000  R15: ffff810f9b02c100
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#14 [ffff810137b3bea0] acpi_processor_idle at ffffffff8018d050
#15 [ffff810137b3bef0] cpu_idle at ffffffff80048d19

The other CPUs are currently idle.

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.1.1.el5


How reproducible:
Panic occurred twice.
Currently no way to reproduce.

Steps to Reproduce:
Unknown
  
Actual results:
Panic

Expected results:
No panic

Additional info:

Comment 5 Dave Anderson 2009-06-25 18:16:36 UTC

These are my notes re: /cores/20090407192354/work/1908902-vmcore-1st.
The other vmcore is gzip'd, root-owned, and even when I tried to copy it
to another machine, the scp stalled, so I couldn't look at that one.

The incoming sk_buff at ffff810e03586b40 looks OK:
  
  crash> kmem -s ffff810e03586b40
  CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
  ffff810fdf939500 skbuff_fclone_cache      512         64       308     44     4k
  SLAB              MEMORY            TOTAL  ALLOCATED  FREE
  ffff810e035860c0  ffff810e03586140      7          4     3
  FREE / [ALLOCATED]
    [ffff810e03586b40]
  crash>

And to my untrained eye, I guess most of its contents look OK:

  crash> sk_buff ffff810e03586b40
  struct sk_buff {
    next = 0xffff810fb780cb00, 
    prev = 0xffff810d513eee70, 
    sk = 0x0, 
    tstamp = {
      off_sec = 0x0, 
      off_usec = 0x0
    }, 
    dev = 0x0, 
    input_dev = 0x0, 
    h = {
      th = 0x0, 
      uh = 0x0, 
      icmph = 0x0, 
      igmph = 0x0, 
      ipiph = 0x0, 
      ipv6h = 0x0, 
      raw = 0x0
    }, 
    nh = {
      iph = 0x0, 
      ipv6h = 0x0, 
      arph = 0x0, 
      raw = 0x0
    }, 
    mac = {
      raw = 0x0
    }, 
    dst = 0x0, 
    sp = 0x0, 
    cb = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000?<A3>!S<F3><A8>!S<FE><U+07BF>\
  005\020\004\000\000\000\000\000\000\000\000\000", 
    len = 0x5b4, 
    data_len = 0x5b4, 
    mac_len = 0x0, 
    csum = 0x0, 
    priority = 0x0, 
    local_df = 0x0, 
    cloned = 0x1, 
    ip_summed = 0x1, 
    nohdr = 0x1, 
    nfctinfo = 0x0, 
    pkt_type = 0x0, 
    fclone = 0x1, 
    ipvs_property = 0x0, 
    protocol = 0x0, 
    destructor = 0, 
    nfct = 0x0, 
    nf_bridge = 0x0, 
    nfmark = 0x0, 
    tc_index = 0x0, 
    tc_verd = 0x0, 
    dma_cookie = 0x0, 
    secmark = 0x0, 
    truesize = 0x7a4, 
    users = {
      counter = 0x1
    }, 
    head = 0xffff810fdae83c00 "\001\200\1777>4\017\002i\005\037<DE>\020\002\201\2054\a\016\033_<FA>\"<C1><DF>\017", 
    data = 0xffff810fdae83d00 "", 
    tail = 0xffff810fdae83d00 "", 
    end = 0xffff810fdae83d00 ""
  }
  
tcp_transmit_skb() calls the fatal pskb_copy() function here:

        if (likely(clone_it)) {
                if (unlikely(skb_cloned(skb)))
                        skb = pskb_copy(skb, gfp_mask);
                else
                        skb = skb_clone(skb, gfp_mask);
                if (unlikely(!skb))
                        return -ENOBUFS;
        }
  
skb_cloned() makes two checks:

  static inline int skb_cloned(const struct sk_buff *skb)
  {
          return skb->cloned &&
                 (atomic_read(&skb_shinfo(skb)->dataref) & SKB_DATAREF_MASK) != 1;
  }
  
where skb->cloned is 1 above, but the skb->end (0xffff810fdae83d00) seems to be 
pointing to a bogus skb_shared_info structure:
  
  #define skb_shinfo(SKB)         ((struct skb_shared_info *)((SKB)->end)) 
  
  crash> skb_shared_info
  struct skb_shared_info {
      atomic_t dataref;
      short unsigned int nr_frags;
      short unsigned int gso_size;
      short unsigned int gso_segs;
      short unsigned int gso_type;
      unsigned int ip6_frag_id;
      struct sk_buff *frag_list;
      skb_frag_t frags[18];
  }
  SIZE: 0x138
  crash> skb_frag_t
  No struct type named skb_frag_t.
  struct skb_frag_struct {
      struct page *page;
      __u16 page_offset;
      __u16 size;
  }
  SIZE: 0x10
  crash>
  
Only 4 of the 18 struct page pointers are legitimate page pointer values.
The "nr_frags" value of 0xbf05 is bogus (and ultimately leads to the crash)
because it can never be larger than the number of frags[] entries, or 18.

FWIW, the "gso_type" is clearly invalid, and the questionable dataref "counter"
value makes the counter half of the skb_cloned() check come out true, because
(0xa080100 & 0xffff) is 0x100, which is not equal to 1:
  
  crash> skb_shared_info 0xffff810fdae83d00
  struct skb_shared_info {
    dataref = {
      counter = 0xa080100  <- probably bogus
    }, 
    nr_frags = 0xbf05,   <- bogus -- cannot exceed 18
    gso_size = 0x46d3, 
    gso_segs = 0x150c, 
    gso_type = 0x34f8,   <- definitely bogus
    ip6_frag_id = 0x0, 
    frag_list = 0x0, 
    frags = {{
        page = 0xffff8101282872d0,  <- valid page address 
        page_offset = 0x199, 
        size = 0x88
      }, {
        page = 0xffff810127b370f0,  <- valid page address
        page_offset = 0x0, 
        size = 0x52c
      }, {
        page = 0xdfc143c21b0e4080,  <- bogus address causing panic/GPF 
        page_offset = 0x2e,            (repeats below) 
        size = 0x2e34
      }, {
        page = 0x1fa2e002314080, 
        page_offset = 0x3136, 
        size = 0xc602
      }, {
        page = 0x18bcc1022e343f35, 
        page_offset = 0x8901, 
        size = 0x340b
      }, {
        page = 0x2e34002edfc143c2, 
        page_offset = 0x3703, 
        size = 0x3441
      }, {
        page = 0xc6023136001fa2e0, 
        page_offset = 0xa220, 
        size = 0x1f
      }, {
        page = 0x340c890118bcc102, 
        page_offset = 0x4080, 
        size = 0x1b0e
      }, {
        page = 0x344137032e34002e, 
        page_offset = 0x4080, 
        size = 0x231
      }, {
        page = 0x1fa220c6023136, 
        page_offset = 0x3f35, 
        size = 0x2e34
      }, {
        page = 0x1b0e4080340d8901, 
        page_offset = 0x43c2, 
        size = 0xdfc1
      }, {
        page = 0x231408034413703, 
        page_offset = 0xa2e0, 
        size = 0x1f
      }, {
        page = 0x2e343f35001fa220, 
        page_offset = 0xc102, 
        size = 0x18bc
      }, {
        page = 0xdfc143c21b0e4080,  <- same bogus address as the one above
        page_offset = 0x2e,            that caused the GPF
        size = 0x2e34
      }, {
        page = 0x1fa2e002314080, 
        page_offset = 0x3136, 
        size = 0xc602
      }, {
        page = 0x10002, 
        page_offset = 0x0, 
        size = 0x0
      }, {
        page = 0xffff8101267cfd90,  <- valid page address
        page_offset = 0xf68, 
        size = 0x98
      }, {
        page = 0xffff8101267cfdc8,  <- valid page address
        page_offset = 0x0, 
        size = 0xab4
      }}
  }

Also, the skb_shared_info structure at 0xffff810fdae83d00 is presently
allocated from the size-1024 slab cache, inside an object that starts 256
bytes below that at ffff810fdae83c00:
  
  crash> kmem -s 0xffff810fdae83d00
  CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
  ffff8101379562c0 size-1024               1024       1337      1600    400     4k
  SLAB              MEMORY            TOTAL  ALLOCATED  FREE
  ffff810fde1a8440  ffff810fdae83000      4          4     0
  FREE / [ALLOCATED]
    [ffff810fdae83c00]
  crash> skb_shared_info
  struct skb_shared_info {
      atomic_t dataref;
      short unsigned int nr_frags;
      short unsigned int gso_size;
      short unsigned int gso_segs;
      short unsigned int gso_type;
      unsigned int ip6_frag_id;
      struct sk_buff *frag_list;
      skb_frag_t frags[18];
  }
  SIZE: 312
  crash>

Given that the skb_shared_info is 312 bytes in size, if it were
kmalloc'd on its own, it would have come out of the size-512
slab cache.  But I have no idea how it comes to be here; perhaps it's
purposely encapsulated inside some other larger entity?
    
So, anyway, we get into pskb_copy() with the same sk_buff pointing
to the suspect skb_shared_info.  It sees the bogus nr_frags value
of 0xbf05, and tries to walk through that many skb_frag_t structures
of which there are only 18.  Interestingly enough, the first two of
them in the array do in fact have legitimate page structure
pointers, but the third one causes the crash:
    
        if (skb_shinfo(skb)->nr_frags) {
                int i;

                for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                        skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
GPF on page deref ====> get_page(skb_shinfo(n)->frags[i].page);
                }
                skb_shinfo(n)->nr_frags = i;
        }

  crash> skb_shared_info 0xffff810fdae83d00
  struct skb_shared_info {
    dataref = {
      counter = 0xa080100  <- probably bogus
    },
    nr_frags = 0xbf05,   <- bogus -- cannot exceed 18
    gso_size = 0x46d3,
    gso_segs = 0x150c,
    gso_type = 0x34f8,   <- definitely bogus
    ip6_frag_id = 0x0,
    frag_list = 0x0,
    frags = {{
        page = 0xffff8101282872d0,  <- valid page address
        page_offset = 0x199,
        size = 0x88
      }, {
        page = 0xffff810127b370f0,  <- valid page address
        page_offset = 0x0,
        size = 0x52c
      }, {
        page = 0xdfc143c21b0e4080,  <- bogus address causing panic/GPF
        page_offset = 0x2e,            (repeats below)
        size = 0x2e34

So the question is: what's the deal with the skb_shared_info structure
being used?

I don't have any background/experience/understanding of networking
code, so this will have to be looked at by somebody who does.

Comment 6 Dave Anderson 2009-06-25 18:30:46 UTC

> Also, the skb_shared_info structure at 0xffff810fdae83d00 is presently
> allocated from the size-1024 slab cache, from a slab that starts 256
> below that at ffff810fdae83c00:
>   
>   crash> kmem -s 0xffff810fdae83d00
>   CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
>   ffff8101379562c0 size-1024               1024       1337      1600    400     4k
>   SLAB              MEMORY            TOTAL  ALLOCATED  FREE
>   ffff810fde1a8440  ffff810fdae83000      4          4     0
>   FREE / [ALLOCATED]
>     [ffff810fdae83c00]
>   crash> skb_shared_info
>   struct skb_shared_info {
>       atomic_t dataref;
>       short unsigned int nr_frags;
>       short unsigned int gso_size;
>       short unsigned int gso_segs;
>       short unsigned int gso_type;
>       unsigned int ip6_frag_id;
>       struct sk_buff *frag_list;
>       skb_frag_t frags[18];
>   }
>   SIZE: 312
>   crash>
> 
> Given that the sk_shared_info is 312 bytes in size, if it were
> kmalloc'd anonymously (?), it would have come out of the size-512
> slab cache.  But I have no idea how it comes to be, perhaps it's
> purposely encapsulated inside some other larger entity?

Sorry about the blather above -- clearly the block of memory was
allocated at ffff810fdae83c00, and the skb_shared_info sits
256 bytes into it:

  crash> sk_buff ffff810e03586b40
    struct sk_buff {
      next = 0xffff810fb780cb00, 
      prev = 0xffff810d513eee70, 
  
      ... [ snip ] ...
  
      head = 0xffff810fdae83c00     <--- chunk of kmalloc'd memory
      data = 0xffff810fdae83d00 "", 
      tail = 0xffff810fdae83d00 "", 
      end = 0xffff810fdae83d00 ""   <-- containing this skb_shared_info
    }
  
But obviously it's got bogus data in it...

Comment 7 Dave Anderson 2009-07-01 19:45:57 UTC

One other point -- the slab subsystem shows no corruption.

So the bottom line is that tcp_transmit_skb() received an sk_buff
at ffff810e03586b40, which references an skb_shared_info structure
at 0xffff810fdae83d00, which is at least partially corrupt.

This needs to be looked at by a networking guru...

Comment 9 Mary Edie Meredith 2009-09-14 20:08:21 UTC

Created attachment 360994 [details]
Believe we can reproduce this consistently

Enclosed please find a screen shot of a RHEL5.3 kernel panic we experience.  Based on the bugzilla discussion and RIP message, we may have the same issue as reported here, and we can repeat this at will.   This panic is holding up a project.   What can we do to make progress?

Comment 10 Jiri Pirko 2009-10-01 10:34:16 UTC

(In reply to comment #9)
> Created an attachment (id=360994) [details]
> Believe we can reproduce this consistently

Can you please provide steps to reproduce this issue?

Thanks.

> 
> Enclosed please find a screen shot of a RHEL5.3 kernel panic we experience. 
> Based on the bugzilla discussion and RIP message, we may have the same issue as
> reported here, and we can repeat this at will.   This panic is holding up a
> project.   What can we do to make progress?

Comment 12 Rich Rauenzahn 2009-10-20 17:46:52 UTC

I'm working with Mary on this --

Mary's case is reproduced by using system-config-netboot to create an initrd and also by recompiling a kernel with networking enabled.  This custom kernel is configured to enable networking/dhcp config at kernel load time.  (per the instructions we've seen for diskless/nfs root)

To trigger the panic, we cycle the NFS server.  When the server comes back up, it sends a TCP reset and the TCP connection is reestablished.  The panic occurs after the connection has been reestablished.

I'm wondering if this isn't due to a root nfs config, but perhaps some kind of networking corruption in the kernel from enabling kernel network config at boot time.

We have not been able to reproduce with ordinary nfs mounts with stock kernels that config networking normally (post kernel load).

Comment 15 Neil Horman 2010-03-04 15:35:45 UTC

I just noticed that e1000e is in use here.  We just found a data corruptor with that driver.  Jiri, could you try building a test kernel for this with this patch please:
http://post-office.corp.redhat.com/archives/rhkernel-list/2010-February/msg01293.html

Comment 16 Jiri Pirko 2010-03-04 16:09:24 UTC

Hmm, that could be it. Actually the patch you are referring to is present in 2.6.18-191.el5

http://people.redhat.com/jwilson/el5/191.el5/

Rich, Mary, would you please test this kernel?

Thanks.

Comment 17 Rich Rauenzahn 2010-03-04 16:26:35 UTC

Sorry, the project was canceled due to this bug and the environment re-purposed.

Comment 18 Jiri Pirko 2010-03-04 17:25:33 UTC

Marc, do you think it would be possible to test the kernel mentioned in comment #15, to see whether it solves the problem in this BZ?

Comment 19 Marc Milgram 2010-03-04 20:27:55 UTC

Jiri,

I attempted to reproduce the issue, but was unable to do so.
The customer no longer has the configuration.

As far as I am concerned, you can go ahead and close this BZ.

Comment 20 Jiri Pirko 2010-03-04 20:36:50 UTC

Closing with insufficient data. Feel free to reopen.
