Bug 450155 - /sbin/dump causes a kernel panic in map_bio()
Summary: /sbin/dump causes a kernel panic in map_bio()
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: i686
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Anton Arapov
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-06-05 16:02 UTC by Alf Clement
Modified: 2014-06-18 08:01 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-12-18 11:25:09 UTC
Target Upstream Version:
Embargoed:



Description Alf Clement 2008-06-05 16:02:37 UTC
Description of problem:

This applies to kernels 2.6.18-92.el5 and 2.6.18-53.
Assign to the kernel MD component.

I installed a Linux server and ran a backup using dump.
The effect was that the kernel panicked in map_bio() with 2.6.18-53.1.4.el5.
After I upgraded to 2.6.18-92 the system seemed to hang instead.
It always happens after some time, e.g. when 17% of the dump is done, sometimes earlier.

I do a level 0 dump (-b64) from a RAID 1 array mounted on /data to a
file in the root fs in /backup. The array device is /dev/mapper/nvidia_bcacfdbgp1.
The problem occurs in normal multi-user mode, but also in single-user mode.

Steps to reproduce:
/sbin/dump -0u -b64 -f /backup/data.0 /data

Run it either in single- or multi-user mode.

Maybe it has something to do with the -b option? I had created a few backups
before I introduced the -b64 option to improve the speed.
-b64 seems to panic every time. Right now I am starting over to test with -b16,
but this is slow...
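
A systematic way to narrow this down is to step through the block sizes in
order; a minimal sketch (the output file names are just examples, and each
run assumes /backup has room for another level 0 image):

# try increasing block sizes until one fails; a panic will of course
# stop the loop the hard way
for bs in 16 32 64; do
    /sbin/dump -0u -b$bs -f /backup/data.b$bs.0 /data || break
done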

EIP was pointing to map_bio().
Call stack, as far as I could read it off the console:
add_to_page_cache
__do_page_cache_readahead
dm_any_congested
blockable_page_cache_readahead
make_ahead_window
page_cache_readahead
do_generic_mapping_read
__generic_file_aio_read
file_read_actor
generic_file_read
autoremove_wake_function
mutex_lock
block_llseek
vfs_read
sys_read
syscall_call

Comment 1 Anton Arapov 2008-06-10 09:26:59 UTC
Please try to reproduce this problem and attach the whole debug message you
got. That will provide us with detailed information to work with.
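
If the machine panics before the trace reaches the logs, a serial console or
netconsole can capture the full oops. A minimal netconsole sketch (the IP and
MAC addresses are placeholders for your own LAN):

# on the crashing box: forward console messages to 192.168.1.10 via eth0
modprobe netconsole netconsole=6665@192.168.1.20/eth0,6666@192.168.1.10/00:11:22:33:44:55
# on the receiving box: log everything arriving on UDP port 6666
nc -u -l -p 6666 | tee panic.log

Alternatively, booting with console=ttyS0,115200 console=tty0 on the kernel
command line sends the same output to the first serial port.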

Comment 2 Alf Clement 2008-06-10 09:39:41 UTC
I had reconfigured the machine in order to get it running stably, so I cannot
reproduce it anymore. The output on the console was the same as I wrote above.

It's an HP ProLiant ML115. I had / on a 160GB disk and two 500GB disks for the
RAID array mounted on /data. All filesystems are ext3.
So dump was reading from /data and storing the data in /backup.
The used space in /data was about 40GB, and / had enough space to store the
backup. I played with the block sizes, but also got problems at 32. 16 or lower
takes too much time to do the backup.

Hope you can reproduce it.


Comment 3 Anton Arapov 2008-06-10 10:22:44 UTC
It would be _very_ helpful to have the mentioned backtrace with the
addresses/offsets...

I'm trying to reproduce it, but no luck so far...

Comment 4 Anton Arapov 2008-06-10 12:06:42 UTC
I tried to reproduce it in many ways, even on the same configuration: raid1 (using
dm), ... and played with the block size. Every backup was successful.
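
For reference, such a setup can be scripted entirely with loopback devices and
a dm mirror target; a rough sketch (sizes, paths and region size are arbitrary,
and a plain dm-mirror only approximates the reporter's nvidia fakeraid set):

# two 2GB loopback devices as mirror legs
dd if=/dev/zero of=/tmp/leg0.img bs=1M count=2048
dd if=/dev/zero of=/tmp/leg1.img bs=1M count=2048
losetup /dev/loop0 /tmp/leg0.img
losetup /dev/loop1 /tmp/leg1.img
# build a dm raid1 over the two legs (in-core log, region size 1024
# sectors, skip the initial resync)
SECTORS=$(blockdev --getsz /dev/loop0)
echo "0 $SECTORS mirror core 2 1024 nosync 2 /dev/loop0 0 /dev/loop1 0" | \
    dmsetup create testmirror
mkfs.ext3 /dev/mapper/testmirror
mkdir -p /mnt/data && mount /dev/mapper/testmirror /mnt/data
cp -a /usr /mnt/data    # populate with some real file data
/sbin/dump -0u -b64 -f /tmp/test.0 /dev/mapper/testmirror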

No similar complaints were found on LKML or elsewhere on the internet. I need a
detailed backtrace for further investigation.

Therefore I am putting the bug into the NEEDINFO state.

