Bug 167532

Summary:	Bad: kernel crashed
Product:	[Retired] Red Hat Cluster Suite	Reporter:	Janos Haar <djani22>
Component:	gnbd	Assignee:	Ben Marzinski <bmarzins>
Status:	CLOSED WORKSFORME	QA Contact:	Cluster QE <mspqa-list>
Severity:	high	Docs Contact:
Priority:	medium
Version:	4	CC:	cluster-maint
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
URL:	http://download.netcenter.hu/gnbd-bug/1-8.log
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2006-02-02 21:39:53 UTC	Type:	---

Description Janos Haar 2005-09-04 12:13:58 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)

Description of problem:
When I use this program, I often get messages like the following, and then the system crashes:

1:

 Unable to handle kernel paging request at virtual address a014d7a5 
  printing eip: 
 c0118cee 
 *pde = f7bedd02 
 Oops: 0000 [#1] 
 SMP  
 Modules linked in: netconsole gnbd 
 CPU:    0 
 EIP:    0060:[<c0118cee>]    Not tainted VLI 
 EFLAGS: 00010296   (2.6.13-rc6)  
 EIP is at kmap+0x1e/0x54 
 eax: 00000246   ebx: a014d7a5   ecx: c11ef260   edx: cabbc400 
 esi: 00008000   edi: 00000001   ebp: f6c7fe00   esp: f6c7fdf4 
 ds: 007b   es: 007b   ss: 0068 
 Process md3_raid1 (pid: 2769, threadinfo=f6c7e000 task=f7eef020) 
 Stack: c0577800 00000006 f5f93cfc f6c7fe54 f895a9cc a014d7a5 00000001 cf793000  
        00001000 00004000 d3fc3180 f73e9bf0 f895e718 cabbc400 007ea037 01000000  
        d4175a4c f895e6f0 65000000 00f03d8d 00100000 d4175a4c f895e6f0 f895e700  
 Call Trace: 
  [<c0103ca2>] show_stack+0x9a/0xd0 
  [<c0103e6d>] show_registers+0x175/0x209 
  [<c010408c>] die+0xfa/0x17c 
  [<c0117b68>] do_page_fault+0x269/0x7bd 
  [<c01038d7>] error_code+0x4f/0x54 
  [<f895a9cc>] __gnbd_send_req+0x196/0x28d [gnbd] 
  [<f895af12>] do_gnbd_request+0xe5/0x198 [gnbd] 
  [<c0383a0d>] __generic_unplug_device+0x28/0x2e 
  [<c038150f>] __elv_add_request+0xaa/0xac 
  [<c0384e5b>] __make_request+0x20d/0x512 
  [<c0385490>] generic_make_request+0xb2/0x27a 
  [<c04748a2>] raid1d+0xbf/0x2cb 
  [<c04825c9>] md_thread+0x134/0x16f 
  [<c01010d5>] kernel_thread_helper+0x5/0xb 
 Code: 89 c1 81 e1 ff ff 0f 00 eb b0 90 90 90 55 89 e5 53 83 ec 08 8b 5d 08 c7 44 24 04 06 00 00 00 c7 04 24 00 78 57 c0 e8 72 47 00 00 <8b> 03 c1 e8 1e 8b 14 85 14 db 73 c0 8b 82 0c 04 00 00 05 00 09  
  <0>Fatal exception: panic in 5 seconds 

Messages 2 through 8 are at this URL:

http://download.netcenter.hu/gnbd-bug/1-8.log
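
When comparing several oopses like the one above, it can help to pull out the same few fields from each. The following is a minimal sketch, not an official tool: the function name `summarize_oops` and the regular expressions are illustrative assumptions that match the i386 oops layout quoted in this report, and the sample text is an abbreviated copy of message 1.

```python
import re

# Abbreviated copy of message 1 from this report (only the lines the parser uses).
OOPS = """\
Unable to handle kernel paging request at virtual address a014d7a5
EIP is at kmap+0x1e/0x54
eax: 00000246   ebx: a014d7a5   ecx: c11ef260   edx: cabbc400
esi: 00008000   edi: 00000001   ebp: f6c7fe00   esp: f6c7fdf4
Call Trace:
 [<c0103ca2>] show_stack+0x9a/0xd0
 [<c01038d7>] error_code+0x4f/0x54
 [<f895a9cc>] __gnbd_send_req+0x196/0x28d [gnbd]
 [<f895af12>] do_gnbd_request+0xe5/0x198 [gnbd]
 [<c0383a0d>] __generic_unplug_device+0x28/0x2e
"""

# Hypothetical patterns written for this log layout; other kernels may differ.
FAULT_RE = re.compile(r"virtual address ([0-9a-f]{8})")
EIP_RE = re.compile(r"EIP is at (\S+?)\+0x")
REG_RE = re.compile(r"\b(e[a-z]{2}):\s+([0-9a-f]{8})")
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{8}>\][ \t]+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def summarize_oops(text):
    """Extract the fault address, faulting function, and module frames."""
    fault = FAULT_RE.search(text)
    eip = EIP_RE.search(text)
    fault_addr = fault.group(1) if fault else None
    regs = dict(REG_RE.findall(text))
    # Frames without a trailing [module] tag belong to the core kernel.
    frames = [(sym, mod or "kernel") for sym, mod in FRAME_RE.findall(text)]
    return {
        "fault_address": fault_addr,
        "eip_function": eip.group(1) if eip else None,
        # Registers whose value equals the faulting address.
        "registers_holding_fault_address": sorted(
            name for name, value in regs.items() if value == fault_addr
        ),
        "module_frames": [f for f in frames if f[1] != "kernel"],
    }


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(key, "=", value)
```

For message 1 this reports that the faulting address is the value in `ebx` and that the only module frames in the trace are `__gnbd_send_req` and `do_gnbd_request` from `gnbd`.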


Version-Release number of selected component (if applicable):
gnbd-kernel-2.6.11.2-20050420.133124.FC4.39

How reproducible:
Sometimes

Steps to Reproduce:
1. Run the gnbd module under sustained heavy I/O load.

Additional info:

I use this module under very high load: continuously 200-350 Mbit/s of reads and 25-200 Mbit/s of writes (the write average is about 30-40 Mbit/s).
The problem occurs about once every one to two days, depending on traffic.
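
For anyone trying to approximate this kind of mixed load, here is a minimal sketch of a concurrent read/write generator. It is an assumption-laden illustration, not the reporter's actual workload: the function name `hammer`, the thread counts, and the block size are all hypothetical, and it has not been shown to reproduce the crash. For scale, 350 Mbit/s is about 43.75 MB/s.

```python
import os
import threading
import time


def hammer(path, duration_s=1.0, block_size=1 << 20, span_blocks=16,
           readers=4, writers=1):
    """Run concurrent readers and writers on `path`; return bytes moved."""
    totals = {"read": 0, "write": 0}
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker(kind, seed):
        moved = 0
        block = os.urandom(block_size)
        fd = os.open(path, os.O_RDWR)
        try:
            i = seed
            while True:
                # Cycle over a fixed span of blocks at the start of the file.
                offset = (i % span_blocks) * block_size
                if kind == "write":
                    moved += os.pwrite(fd, block, offset)
                else:
                    moved += len(os.pread(fd, block_size, offset))
                i += 1
                if time.monotonic() >= deadline:
                    break
        finally:
            os.close(fd)
        with lock:
            totals[kind] += moved

    threads = [threading.Thread(target=worker, args=("read", n))
               for n in range(readers)]
    threads += [threading.Thread(target=worker, args=("write", n))
                for n in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

To use it, `path` would point at an existing, preallocated file on the filesystem under test, and `duration_s` would be hours rather than seconds. Note that these reads and writes go through the page cache, so the traffic that reaches the block device may be lower than the totals returned.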

The system:
HW:
dual xeon 2x3G 800FSB
4GB ECC DDR2/400
2x e1000 ethernet.

SW:
RH 9.0
GCC: 4.0.0
kernel: 2.6.13 and 2.6.13-rc6

summary:
1x webserver (client, xeon 8TB = 4x2TB)
4x GNBD server (4x 2.0TB)

On the GNBD client, I build one large (8TB) raid0 device from the 4 nodes.
I run XFS on this raid0 device. (no cluster, only gnbd)
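
To illustrate how a striped layout like this spreads I/O across the four GNBD servers, here is a minimal sketch of a raid0 offset mapping. It assumes four equal-sized members and a 64 KiB chunk size; the chunk size is not stated in this report, and the function name `raid0_locate` is hypothetical.

```python
def raid0_locate(offset, n_members=4, chunk_size=64 * 1024):
    """Map a byte offset on the striped device to (member index, member offset).

    Assumes equal-sized members, so the whole array is a single stripe zone.
    """
    chunk_index, within_chunk = divmod(offset, chunk_size)
    stripe, member = divmod(chunk_index, n_members)
    return member, stripe * chunk_size + within_chunk


if __name__ == "__main__":
    # Consecutive chunks land on consecutive members, then wrap around.
    for chunk in range(6):
        print(chunk, raid0_locate(chunk * 64 * 1024))
```

Under these assumptions, any sequential transfer larger than one chunk touches more than one server, so all four GNBD connections carry traffic at the same time.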

The problem started when I switched from 2.6.13-rc3 to 2.6.13-rc6.
I reported it to the kernel developers, but they referred me here.

Comment 1 Ben Marzinski 2006-02-02 21:39:53 UTC

I have not seen this issue using either the STABLE or HEAD gnbd branches with a
kernel.org kernel.  If someone sees this issue again, please reopen the bug.