620504 – during longevity test run hit paging request BUG assertion


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Larry Woodman
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-08-02 17:46 UTC by Mike Gahagan
Modified:	2011-04-04 14:07 UTC
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-04-04 14:07:32 UTC
Target Upstream Version:
Embargoed:

Attachments
Complete console log (45.34 KB, text/plain) 2010-08-02 17:48 UTC, Mike Gahagan

Description Mike Gahagan 2010-08-02 17:46:38 UTC

Description of problem:
BUG: unable to handle kernel paging request at ffffeba400000000
IP: [<ffffffff8116bd7e>] free_block+0x9e/0x230
PGD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu63/cache/index2/shared_cpu_map
CPU 8 
Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs nls_koi8_u cryptd aes_x86_64 aes_generic autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci dm_mod [last unloaded: rmd128]

Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs nls_koi8_u cryptd aes_x86_64 aes_generic autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci dm_mod [last unloaded: rmd128]
Pid: 28, comm: ksoftirqd/8 Not tainted 2.6.32-54.el6.x86_64.debug #1 Sunrise Ridge
RIP: 0010:[<ffffffff8116bd7e>]  [<ffffffff8116bd7e>] free_block+0x9e/0x230
RSP: 0018:ffff88002fa03da8  EFLAGS: 00010086
RAX: ffffeba400000000 RBX: ffff88057b0f0100 RCX: 0000000000000008
RDX: ffffea0000000000 RSI: ffff8801453d20c0 RDI: 0000000000000000
RBP: ffff88002fa03df8 R08: ffff8802777a3740 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: ffff880236994000
R13: ffff880276a40760 R14: 0000000000000006 R15: 000000000000101a
FS:  0000000000000000(0000) GS:ffff88002fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffeba400000000 CR3: 000000047343f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/8 (pid: 28, threadinfo ffff880276a84000, task ffff880276a80740)
Stack:
 ffff8802777a3798 0000001000000000 0000000000000000 ffff8802777a3740
<0> ffff88002fa03df8 0000000000000010 ffff880276a406e0 ffff8802777a3740
<0> ffff88057b0f0100 ffff880276a40730 ffff88002fa03e58 ffffffff8116c205
Call Trace:
 <IRQ> 
 [<ffffffff8116c205>] cache_flusharray+0x95/0x180
 [<ffffffff8116bb06>] kmem_cache_free+0x256/0x2b0
 [<ffffffff8118746d>] file_free_rcu+0x4d/0x70
 [<ffffffff810effcd>] __rcu_process_callbacks+0x12d/0x3e0
 [<ffffffff810f02ab>] rcu_process_callbacks+0x2b/0x50
 [<ffffffff81077135>] __do_softirq+0xd5/0x220
 [<ffffffff810143cc>] call_softirq+0x1c/0x30
 <EOI> 
 [<ffffffff810160cd>] ? do_softirq+0xad/0xe0
 [<ffffffff81076a70>] ksoftirqd+0x80/0x120
 [<ffffffff810769f0>] ? ksoftirqd+0x0/0x120
 [<ffffffff81096646>] kthread+0x96/0xa0
 [<ffffffff810142ca>] child_rip+0xa/0x20
 [<ffffffff81013c10>] ? restore_args+0x0/0x30
 [<ffffffff810965b0>] ? kthread+0x0/0xa0
 [<ffffffff810142c0>] ? child_rip+0x0/0x20
Code: 89 c7 48 89 45 c0 e8 a2 ee ed ff 48 c1 e8 0c 48 8d 14 c5 00 00 00 00 48 c1 e0 06 48 29 d0 48 ba 00 00 00 00 00 ea ff ff 48 01 d0 <48> 8b 10 66 85 d2 0f 88 23 01 00 00 84 d2 0f 89 70 01 00 00 4c 
RIP  [<ffffffff8116bd7e>] free_block+0x9e/0x230
 RSP <ffff88002fa03da8>
CR2: ffffeba400000000
---[ end trace 0b9a0d246f57ca69 ]---

See the attached log for the rest of the kernel trace.
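
As a cross-check on the oops above, the faulting address can be reproduced from the register dump and the Code: bytes. The Python sketch below is an interpretation, not something stated in the report. It assumes the usual x86_64 direct-map base for 2.6.32-era kernels (0xffff880000000000), takes the vmemmap base from RDX (ffffea0000000000), and reads the instructions just before the fault as a right shift by 12 followed by a multiply by 56 (shift left by 6, minus 8 times the value), i.e. a virt_to_page()-style lookup with a 56-byte struct page. The helper names are hypothetical.

```python
# Assumed layout constants; none of these are stated explicitly in the report.
PAGE_OFFSET = 0xFFFF880000000000   # assumed direct-map base (x86_64, 2.6.32 era)
VMEMMAP_BASE = 0xFFFFEA0000000000  # matches RDX in the register dump
PAGE_SHIFT = 12                    # "48 c1 e8 0c" read as: shr $0xc, %rax
STRUCT_PAGE_SIZE = (1 << 6) - 8    # shl $6 minus lea (,%rax,8) gives 56 bytes
MASK64 = (1 << 64) - 1             # emulate 64-bit register wraparound


def virt_to_page(vaddr):
    """Map a direct-map virtual address to its struct page address."""
    phys = (vaddr - PAGE_OFFSET) & MASK64
    pfn = phys >> PAGE_SHIFT
    return (VMEMMAP_BASE + pfn * STRUCT_PAGE_SIZE) & MASK64


def page_to_virt(page_addr):
    """Inverse of virt_to_page(); raises if page_addr is not a page slot."""
    offset = (page_addr - VMEMMAP_BASE) & MASK64
    pfn, remainder = divmod(offset, STRUCT_PAGE_SIZE)
    if remainder:
        raise ValueError("address is not aligned to a struct page slot")
    return ((pfn << PAGE_SHIFT) + PAGE_OFFSET) & MASK64


if __name__ == "__main__":
    cr2 = 0xFFFFEBA400000000  # faulting address from the oops
    print(hex(page_to_virt(cr2)))  # object pointer that would yield CR2
    print(hex(virt_to_page(0)))    # struct page address for a NULL pointer
```

Under these assumptions a NULL object pointer maps exactly to CR2 (ffffeba400000000), which would be consistent with free_block() being handed a zeroed entry. A crash dump would still be needed to confirm this.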

Version-Release number of selected component (if applicable):
0722.0 tree running the -54.x86_64.debug kernel.

How reproducible:
So far only once

Steps to Reproduce:
1. Build and install LTP, then run ltpstress.sh as follows:
./ltpstress.sh  -m 22000 -t 96  # use 22 GB of RAM, run for 96 hours
  
Actual results:

See above and the attached file for the complete log.

Expected results:

The test completes with no panic.

Additional info:

The failure seems to have occurred sometime after the first 24 hours of operation.

Comment 1 Mike Gahagan 2010-08-02 17:48:00 UTC

Created attachment 436079 [details]
Complete console log

Comment 2 Mike Gahagan 2010-08-02 19:33:17 UTC

Starting another run with the non-debug -54 kernel.

Comment 5 Larry Woodman 2010-08-05 20:09:10 UTC

If possible, we need to get a crash dump when this happens.  Evidently there is corruption in the slab cache, because we are crashing in free_block() while dereferencing a kmem_list.


Larry

Comment 6 Mike Gahagan 2010-08-05 20:45:41 UTC

I'll try with the debug kernel again and hopefully get a crash dump this time.

By the way, the run I started on Monday with the non-debug kernel is nearly finished. It should finish tonight/early tomorrow morning. So far I haven't seen any issues.

Comment 7 Mike Gahagan 2010-08-06 19:09:52 UTC

I'm running into some issues getting a crash dump out of the debug kernel that look like bz 612244, so it may be hard to get one unless I can find a workaround. I've sent mail to Jason B to follow up.

Comment 9 RHEL Program Management 2011-01-07 04:08:26 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 10 Suzanne Logcher 2011-01-07 16:17:04 UTC

This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 11 Larry Woodman 2011-01-13 15:47:41 UTC

Does this problem still happen in the latest 6.1 kernel?  We removed some buggy code from the slab debug code that looks like it was in this area.

Either way, I cannot reproduce this problem and we never got a dump, so I can't make any progress on this BZ until we can get more data.

Larry

Comment 12 Mike Gahagan 2011-01-13 16:05:40 UTC

I don't recall ever hitting this on any recent RHEL 6.0 kernel. I think it only occurred one time. We have not yet done a longevity test run with any of the 6.1 kernels; usually we wait until close to the end of the testing phase, but in light of this bug we'll run it a bit earlier.

Comment 14 RHEL Program Management 2011-02-01 05:41:16 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 15 RHEL Program Management 2011-02-01 18:31:59 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 16 RHEL Program Management 2011-04-04 02:21:49 UTC

Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 17 Mike Gahagan 2011-04-04 14:07:32 UTC

I've run the longevity test on x86_64 with the beta kernel and it finished without issues. I also ran it on s390x and observed one panic (filed as a separate bz) that had to do with NFS, so I think this one can be closed.
