Bug 622390

Summary: RHEL4.8 kernel BUG at include/asm/spinlock.h:133 caused by slab corruption
Product: Red Hat Enterprise Linux 4
Reporter: Mark Wu <dwu>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.8
CC: james.brown, jwest, syeghiay, tao
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-14 19:54:51 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mark Wu 2010-08-09 07:49:25 UTC
Description of problem:
The system panicked in the spinlock code because the spinlock magic value was corrupted. Before the panic, the system was running under low-memory conditions, and many "oom" events appeared in the log.

------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:133!
invalid operand: 0000 [#1]
SMP 
Modules linked in: netconsole netdump autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc md5 ipv6 vmmemctl(U) cpufreq_powersave dm_mirror dm_multipath dm_mod button battery ac pcnet32 vmxnet(U) mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
CPU:    2
EIP:    0060:[<c02dc7e7>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-89.0.11.ELsmp) 
EIP is at _spin_lock+0x1c/0x34
eax: c02f0760   ebx: f28667ec   ecx: f699dea8   edx: c029b506
esi: d98fbd80   edi: e262b540   ebp: 000000d0   esp: f699deac
ds: 007b   es: 007b   ss: 0068
Process csnmpd (pid: 5022, threadinfo=f699d000 task=f7b647f0)
Stack: e262b540 c029b506 000000d0 d2f71a00 f28667ec f28667c0 00000006 e262b540 
       f28667ec f28667c0 f689ca80 c029b75e f7fe1c00 f7fd3dc0 d2f71a00 c0358910 
       00000300 00000006 c029030a c02900c6 c02901d6 00000000 00000001 00000014 
Call Trace:
 [<c029b506>] netlink_dump+0x46/0x18b
 [<c029b75e>] netlink_dump_start+0x113/0x11c
 [<c029030a>] rtnetlink_rcv+0x131/0x327
 [<c02900c6>] rtnetlink_dump_all+0x0/0x8f
 [<c02901d6>] rtnetlink_done+0x0/0x3
 [<c029b3d3>] netlink_data_ready+0x14/0x44
 [<c028a006>] dev_ioctl+0x76/0x4ab
 [<c015cc6b>] fd_install+0x21/0x40
 [<c02bdaff>] udp_ioctl+0x0/0xc5
 [<c02c3ca6>] inet_ioctl+0xa0/0xa5
 [<c02808b3>] sock_ioctl+0x28c/0x2b4
 [<c016dc0a>] sys_ioctl+0x227/0x269
 [<c0126fc5>] sys_gettimeofday+0x53/0xac
 [<c02ddcdb>] syscall_call+0x7/0xb
Code: 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 53 89 c3 81 78 04 ad 4e ad de 74 18 ff 74 24 04 68 60 07 2f c0 e8 3d 6d e4 ff 58 5a <0f> 0b 85 00 68 f7 2e c0 f0 fe 0b 79 09 f3 90 80 3b 00 7e f9 eb 


Version-Release number of selected component (if applicable):
2.6.9-89.0.11.ELsmp

How reproducible:
Only once so far.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
kernel panic.

Expected results:
No panic

Additional info:
vmcore is available at core-i386.gsslab.rdu.redhat.com:/cores/20100802103059/work

Comment 1 Mark Wu 2010-08-09 08:29:27 UTC
crash> bt
PID: 5022   TASK: f7b647f0  CPU: 2   COMMAND: "csnmpd"
#0 [f699dd68] netpoll_start_netdump at f8960570
#1 [f699dd88] die at c0106049
#2 [f699ddbc] do_invalid_op at c0106424
#3 [f699de6c] error_code (via invalid_op) at c02de7d1
   EAX: c02f0760  EBX: f28667ec  ECX: f699dea8  EDX: c029b506  EBP: 000000d0
   DS:  007b      ESI: d98fbd80  ES:  007b      EDI: e262b540
   CS:  0060      EIP: c02dc7e7  ERR: ffffffff  EFLAGS: 00010246
#4 [f699dea8] _spin_lock at c02dc7e7
#5 [f699deb0] netlink_dump at c029b501
#6 [f699ded8] netlink_dump_start at c029b759
#7 [f699def4] rtnetlink_rcv at c0290305
#8 [f699df28] netlink_data_ready at c029b3d1
#9 [f699df30] dev_ioctl at c028a000
#10 [f699df68] inet_ioctl at c02c3ca1
#11 [f699df7c] sock_ioctl at c02808b0
#12 [f699df94] sys_ioctl at c016dc07
#13 [f699dfc0] system_call at c02ddcd4
   EAX: 00000036  EBX: 00000009  ECX: 00008912  EDX: bfebdf58
   DS:  007b      ESI: bfebdf58  ES:  007b      EDI: 00000008
   SS:  007b      ESP: bfebde94  EBP: bfebded8
   CS:  0073      EIP: 004dc7a2  ERR: 00000036  EFLAGS: 00000206

Pick up the pointer to the netlink_opt structure from the stack:
crash> netlink_opt f28667c0
struct netlink_opt {
 pid = 0x0,
 groups = 0x1,
 dst_pid = 0x0,
 dst_groups = 0x0,
 state = 0xf5616e00,
 handler = 0xf5616e00,
 wait = {
   lock = {
     lock = 0x2b2f2507,
     magic = 0xf6eee580
   },
   task_list = {
     next = 0x7,
     prev = 0xdead4ead
   }
 },
 cb = 0xf28667e8,
 cb_lock = {
   lock = 0xf28667e8,
    magic = 0xc033b524   ---> The magic should be 0xdead4ead, but it was already corrupted, so the system panicked when acquiring the spinlock
 },
 data_ready = 0
}
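
For context, the "kernel BUG at include/asm/spinlock.h:133" comes from the spinlock debug check in this kernel: every initialized spinlock carries the magic value 0xdead4ead, and the lock path BUG()s when that value does not match. A minimal paraphrased sketch of that debug path, reconstructed from memory of the 2.6.9-era i386 header (not quoted from the RHEL4 tree, so details may differ):

<snip>
#define SPINLOCK_MAGIC  0xdead4ead

typedef struct {
        volatile unsigned int lock;
#ifdef CONFIG_DEBUG_SPINLOCK
        unsigned magic;         /* 0xdead4ead while the lock is healthy */
#endif
} spinlock_t;

static inline void _raw_spin_lock(spinlock_t *lock)
{
#ifdef CONFIG_DEBUG_SPINLOCK
        if (unlikely(lock->magic != SPINLOCK_MAGIC)) {
                printk("eip: %p\n", __builtin_return_address(0));
                BUG();  /* the "kernel BUG at include/asm/spinlock.h:133" seen above */
        }
#endif
        /* ...actual atomic spin loop... */
}
</snip>

That matches the dump: the healthy locks in the sock structure below still carry magic = 0xdead4ead, while cb_lock's magic reads 0xc033b524, so _spin_lock hit the BUG().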

Get the parameter (a pointer to struct sock) passed to netlink_dump from the EDI register. It looks intact, not corrupted, so f28667c0 should be a valid pointer.
crash> sock e262b540   
struct sock {
 ...
 sk_lock = {
   slock = {
     lock = 0x1,
     magic = 0xdead4ead
   },
   owner = 0x0,
   wq = {
     lock = {
       lock = 0x1,
       magic = 0xdead4ead
     },
     task_list = {
       next = 0xe262b574,
       prev = 0xe262b574
     }
   }
 },
 sk_rcvbuf = 0x100000,
 sk_sleep = 0x0,
 sk_dst_cache = 0x0,
 sk_dst_lock = {
   lock = 0x1000000,
   magic = 0xdeaf1eed
 },
 sk_policy = {0x0, 0x0},
 sk_rmem_alloc = {
   counter = 0x1030
 },
 sk_receive_queue = {
   next = 0xe262b59c,
   prev = 0xe262b59c,
   qlen = 0x0,
   lock = {
     lock = 0x1,
     magic = 0xdead4ead
   }
 },
 sk_wmem_alloc = {
   counter = 0x0
 },
 sk_write_queue = {
   next = 0xe262b5b4,
   prev = 0xe262b5b4,
   qlen = 0x0,
   lock = {
     lock = 0x1,
     magic = 0xdead4ead
   }
 },
 sk_omem_alloc = {
   counter = 0x0
 },
 sk_wmem_queued = 0x0,
 sk_forward_alloc = 0x0,
 sk_allocation = 0xd0,
 sk_sndbuf = 0x100000,
 sk_flags = 0x1,
 sk_no_check = 0x0,
 sk_debug = 0x0,
 sk_rcvtstamp = 0x0,
 sk_no_largesend = 0x0,
 sk_route_caps = 0x0,
 sk_lingertime = 0x0,
 sk_hashent = 0x0,
 sk_backlog = {
   head = 0x0,
   tail = 0x0
 },
 sk_callback_lock = {
   lock = 0x1000000,
   magic = 0xdeaf1eed
 },
 sk_error_queue = {
   next = 0xe262b600,
   prev = 0xe262b600,
   qlen = 0x0,
   lock = {
     lock = 0x1,
     magic = 0xdead4ead
   }
 },
 sk_prot = 0x0,
 sk_err = 0x0,
 sk_err_soft = 0x0,
 sk_ack_backlog = 0x0,
 sk_max_ack_backlog = 0x0,
 sk_priority = 0x0,
 sk_type = 0x3,
 sk_localroute = 0x0,
 sk_protocol = 0x0,
 sk_peercred = {
   pid = 0x0,
   uid = 0xffffffff,
   gid = 0xffffffff
 },
 sk_rcvlowat = 0x1,
 sk_rcvtimeo = 0x7fffffff,
 sk_sndtimeo = 0x7fffffff,
 sk_filter = 0x0,
 sk_protinfo = 0xf28667c0,   ---> the pointer to the netlink_opt structure shown above
 sk_slab = 0xf7f06740,
 ...
}


Have a look at the source code of netlink_dump_start:
<snip>
int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
                       struct nlmsghdr *nlh,
                       int (*dump)(struct sk_buff *skb, struct netlink_callback*),
                       int (*done)(struct netlink_callback*))
{
        ...

        /* A dump is in progress... */
        spin_lock(&nlk->cb_lock);
        if (nlk->cb) {
                spin_unlock(&nlk->cb_lock);
                netlink_destroy_callback(cb);
                sock_put(sk);
                return -EBUSY;
        }
        nlk->cb = cb;
        spin_unlock(&nlk->cb_lock);

        netlink_dump(sk);
        return 0;
}
</snip>
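
For reference, the spin_lock() that actually triggered the BUG is the one at the top of netlink_dump itself, which takes the same cb_lock again. A rough paraphrase of how netlink_dump begins in this kernel series (a sketch from memory, not quoted from the RHEL4 source):

<snip>
static int netlink_dump(struct sock *sk)
{
        struct netlink_opt *nlk = nlk_sk(sk);
        struct netlink_callback *cb;
        ...

        spin_lock(&nlk->cb_lock);       /* <-- BUG() fired here: cb_lock.magic was corrupted */
        cb = nlk->cb;
        ...
}
</snip>

This is consistent with the backtrace: _spin_lock is entered from netlink_dump with EBX = f28667ec, which is the address of cb_lock inside the netlink_opt object at f28667c0.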

Before calling netlink_dump, netlink_dump_start had already acquired and released "cb_lock" without any panic, so the lock must have been corrupted after that point. Check how long ago the process "csnmpd" last ran:
crash> px per_cpu__runqueues
PER-CPU DATA TYPE:
 struct runqueue per_cpu__runqueues;
PER-CPU ADDRESSES:
 [0]: c5055d80
 [1]: c505dd80
 [2]: c5065d80
 [3]: c506dd80

crash> runqueue c5065d80|grep last
 timestamp_last_tick = 0x11d5301eebd680,
crash> task_struct.last_ran f7b647f0
 last_ran = 0x11d5301dd92e00,
crash> px (5019477258000000-5019477240000000)/1000000
$4 = 0x12        --> 0x12 = 18ms
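
In other words, assuming timestamp_last_tick and last_ran are sched_clock() values in nanoseconds on this kernel, the gap works out as below (a small standalone sketch of the arithmetic, values copied from the crash output above):

<snip>
#include <stdio.h>

int main(void)
{
        /* values taken from the crash output above */
        unsigned long long timestamp_last_tick = 0x11d5301eebd680ULL;  /* cpu2 runqueue */
        unsigned long long last_ran            = 0x11d5301dd92e00ULL;  /* task f7b647f0 (csnmpd) */

        /* difference is 18,000,000 ns: csnmpd last ran ~18ms before the crash-time tick */
        printf("%llu ms\n", (timestamp_last_tick - last_ran) / 1000000ULL);
        return 0;
}
</snip>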

Search for references to f28667c0:

crash> search -k f28667c0
c5290108: f28667c0 --> in cpu1's array_cache of slab size-64
c5290168: f28667c0 --> the same as above
e262b648: f28667c0 --> the member sk_protinfo of sock e262b540
f5616dfc: f28667c0 --> used by a nfs_inode
f699dec0: f28667c0 --> in the stack of panic process
f699ded0: f28667c0 --> the same as above

crash> kmem -s|grep size-64
kmem: "size-1620" cache: object e1070300 on both cpu 2 and shared lists
kmem: "size-1620" cache: object e1070980 on both cpu 2 and shared lists
kmem: "size-1620" cache: object e112e2c0 on both cpu 2 and shared lists
kmem: "size-1620" cache: object e112e940 on both cpu 2 and shared lists
kmem: "size-1620" cache: object e112efc0 on both cpu 2 and shared lists
kmem: "size-1620" cache: object e112f640 on both cpu 2 and shared lists
kmem: "size-1620" cache: object ece7c280 on both cpu 2 and shared lists
kmem: "size-1620" cache: object ece7d600 on both cpu 2 and shared lists
kmem: "size-1620" cache: object d4ff55c0 on both cpu 2 and shared lists
kmem: "size-1620" cache: object cfca0f00 on both cpu 2 and shared lists
kmem: "size-1620" cache: object f3ec4840 on both cpu 2 and shared lists
kmem: "size-1620" cache: object dc0b0180 on both cpu 2 and shared lists
kmem: "size-1620" cache: object dc0b1500 on both cpu 2 and shared lists
kmem: "size-1620" cache: object f5bd0300 on both cpu 2 and shared lists
kmem: "size-1620" cache: object f541d040 on both cpu 2 and shared lists
f7fff340 size-64(DMA)              64          0         0      0     4k
f7fff240 size-64                   64       4766     18971    311     4k
crash> kmem_cache_s.array f7fff240
 array = {0xc523f000, 0xc5290000, 0xc52c8000, 0xf7e02000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
crash> array_cache 0xc5290000
struct array_cache {
  avail = 0x26, 
  limit = 0x78, 
  batchcount = 0x3c, 
  touched = 0x1
}
crash> rd  0xc5290000 100
c5290000:  00000026 00000078 0000003c 00000001   &...x...<.......
...
c52900e0:  f7ca7c40 f3314c40 f7ca7600 f7ca7a40   @|..@L1..v..@z..
c52900f0:  f7ca78c0 f7ca7d00 f7ca7b40 f689c4c0   .x...}..@{......
c5290100:  f2866e80 f2866200 f28667c0 f7c87300   .n...b...g...s..
                              ^^^^^^^
c5290110:  e4617cc0 e7595c80 f7c89280 f6899400   .|a..\Y.........
c5290120:  f6899d00 e44f3e40 e44f3fc0 e44f3a80   ....@>O..?O..:O.
c5290130:  e44f37c0 f2866740 f6e12540 e44f3600   .7O.@g..@%...6O.
c5290140:  e44f3d80 f7b9bb40 f7b9be40 f7b9b500   .=O.@...@.......
c5290150:  f3314c80 f2866700 f2866340 f2866f00   .L1..g..@c...o..
c5290160:  f2866e80 f2866200 f28667c0 f7c87300   .n...b...g...s..
                              ^^^^^^
c5290170:  e4617cc0 e7595c80 f7c89280 e7594400   .|a..\Y......DY.
c5290180:  f7b359c0 eca01600 eca01600 eca01600   .Y..............

f28667c0 appears twice in the entries beyond the "avail" cursor (0x26) of cpu1's per-CPU array_cache for the "size-64" slab. I think this is a problem.
Another question: how could the process "csnmpd", which was running on cpu2, end up using a slab object from cpu1's array_cache?
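
A duplicated entry is exactly what a double free of the object would leave behind. As a simplified illustration of the per-CPU array_cache fast path (a sketch of the mechanism, not the literal RHEL4 mm/slab.c code): freeing pushes the object pointer at entry[avail++] and allocating pops it at entry[--avail], so once the same pointer sits in the array twice, two later allocations hand the same object to two different users.

<snip>
/* Simplified sketch of the per-CPU slab fast path (illustrative only,
 * not the literal RHEL4 mm/slab.c code).  The object pointers live
 * immediately after struct array_cache in memory. */
struct array_cache {
        unsigned int avail;
        unsigned int limit;
        unsigned int batchcount;
        unsigned int touched;
        /* void *entry[] follows */
};

#define ac_entry(ac)    ((void **)((ac) + 1))

/* free fast path: push the freed pointer */
static void fastpath_free(struct array_cache *ac, void *objp)
{
        ac_entry(ac)[ac->avail++] = objp;       /* a double free stores objp twice */
}

/* alloc fast path: pop the most recently freed pointer */
static void *fastpath_alloc(struct array_cache *ac)
{
        return ac_entry(ac)[--ac->avail];       /* two allocs then return the same objp */
}
</snip>

If that is what happened here, one user could still legitimately treat f28667c0 as a netlink_opt while a second allocation reuses it as something else, which fits the evidence below.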

crash> nfs_inode.access_cache,access_cache_entry_lru 0xf5616d44
 access_cache = {
   rb_node = 0xf28667c0
 },
 access_cache_entry_lru = {
   next = 0xf28667d0,
   prev = 0xf28667d0
 },
crash> nfs_access_entry f28667c0
struct nfs_access_entry {
 rb_node = {
   rb_parent = 0x0,
   rb_color = 0x1,
   rb_right = 0x0,
   rb_left = 0x0
 },
 lru = {
   next = 0xf5616e00,
   prev = 0xf5616e00
 },
 jiffies = 0x2b2f2507,
 cred = 0xf6eee580,
 mask = 0x7
}
crash> rpc_cred.cr_magic 0xf6eee580
 cr_magic = 0xf4aa4f0, -----> the same magic value as defined in the source code, so 0xf6eee580 really is a valid rpc_cred.

crash> px jiffies
jiffies = $6 = 0x2b2fe8fd
crash> px 0x2b2fe8fd-0x2b2f2507
$7 = 0xc3f6            -----> the access entry was touched 0xc3f6 jiffies ago, i.e. before the last run time of process csnmpd

So it seems that the size-64 object f28667c0 was first used as a "netlink_opt" and, while still in use, was erroneously handed out again as an "nfs_access_entry". When "csnmpd" was scheduled again, the system panicked because the "netlink_opt" structure had been overwritten.
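
To make that overlap concrete, here is how the nfs_access_entry fields line up with the netlink_opt fields inside the same 64-byte object, based purely on the two structure dumps above (offsets assume 32-bit i386; this is an annotated sketch, not kernel source):

<snip>
/* Overlay of object f28667c0, reconstructed from the crash output above.
 * Offsets assume 32-bit i386. */

/* offset  netlink_opt view        nfs_access_entry view               observed value
 * 0x00    pid                     rb_node.rb_parent                   0x0
 * 0x04    groups                  rb_node.rb_color                    0x1
 * 0x08    dst_pid                 rb_node.rb_right                    0x0
 * 0x0c    dst_groups              rb_node.rb_left                     0x0
 * 0x10    state                   lru.next                            0xf5616e00
 * 0x14    handler                 lru.prev                            0xf5616e00
 * 0x18    wait.lock.lock          jiffies                             0x2b2f2507
 * 0x1c    wait.lock.magic         cred                                0xf6eee580
 * 0x20    wait.task_list.next     mask                                0x7
 * ...
 * 0x2c    cb_lock.lock            (past the end of nfs_access_entry)  0xf28667e8
 * 0x30    cb_lock.magic           (past the end of nfs_access_entry)  0xc033b524
 */
</snip>

Every field the nfs_access_entry code writes (the rbtree node, lru links, jiffies, cred and mask) lands exactly on the netlink_opt fields that show up as garbage above, and the lock address passed to _spin_lock (EBX = f28667ec) is offset 0x2c in this object, i.e. cb_lock. How cb_lock itself got clobbered is not visible from this overlay alone, but the aliasing of the first 0x24 bytes already explains the corrupted netlink_opt.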