58940 – (IDE)gpart unable to complete job without crashing the kernel


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.1
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:	http://www.stud.uni-hannover.de/user/...
Whiteboard:
Depends On:
Blocks:

Reported:	2002-01-28 12:57 UTC by Francois-Xavier 'FiX' KOWALSKI
Modified:	2008-08-01 16:22 UTC
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:39:21 UTC
Embargoed:


Description Francois-Xavier 'FiX' KOWALSKI 2002-01-28 12:57:49 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4)
Gecko/20011019 Netscape6/6.2

Description of problem:
Using gpart to recover a deleted partition table makes kernel 2.4.9-12 (from
ftp://updates.redhat.com) do one of the following:
- kill gpart with SIGSEGV, after a bottom half failure in the IDE subsystem
- oops the kernel (same location, but not the same stack trace)
- panic the kernel
In every case, gpart is unable to complete its job.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Download gpart.linux from
<http://www.stud.uni-hannover.de/user/76201/gpart/>, or recompile the binary
from the source (the final result is the same).
2. Have an IDE drive with an extended partition carrying several logical
partitions.
3. Delete the first logical partition.
4. Re-create the logical partition as Linux swap.
5. Initialize the swap partition (mkswap).
NOTE: steps 4 & 5 are what I can remember from a mistaken manipulation of my
personal partition layout.
6. Delete the new swap partition.
7. Run gpart.bin, searching forcefully for partition boundaries (-f), with
various search sizes: heads (-n h), cylinders (-n c) or sectors (-n s).

NOTE: the test is always run in single-user mode
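
Step 7 can be scripted roughly as follows. This is a minimal sketch that only prints the three command lines and does not run them. The binary name and the -k value are copied from the captured session in this report. The captured command line shows no device argument, so appending /dev/hdb (the disk named in the gpart output) is an assumption.

```python
import shlex

# Values copied from the captured session in this report.
GPART = "./gpart-0.1h-2.4.9-12-i586"  # binary name as typed at the shell prompt
SKIP_SECTORS = 8562644                 # value passed to -k in the capture

# ASSUMPTION: the capture shows no device argument; /dev/hdb is taken
# from the "dev(/dev/hdb)" line printed by gpart.
DEVICE = "/dev/hdb"


def scan_commands(increments=("h", "c", "s")):
    """Return one argv list per search increment: head, cylinder, sector."""
    return [
        [GPART, "-k", str(SKIP_SECTORS), "-vvfn", inc, DEVICE]
        for inc in increments
    ]


if __name__ == "__main__":
    # Print the commands only; running them is what triggers the crash.
    for argv in scan_commands():
        print(shlex.join(argv))
```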

Actual Results:  The kernel encounters an internal error & kills gpart.  The
error stack trace is shown below (captured using a serial link).  It is followed
by a dump of various information obtained with the "Magic SysRq" keys.

# ./gpart-0.1h-2.4.9-12-i586 -k 8562644 -vvfn h


dev(/dev/hdb) mss(512) chs(1245/255/63)(LBA) #s(20000925) size(9766mb)
Primary partition(1)
   type: 131(0x83)(Linux ext2 filesystem)
   size: 23mb #s(48132) s(63-48194)
   chs:  (0/1/1)-(2/254/63)d (0/1/1)-(2/254/63)r
   hex:  00 01 01 00 83 FE 3F 02 3F 00 00 00 04 BC 00 00

Primary partition(2)
   type: 004(0x04)(Primary DOS with 16 bit FAT (<= 32MB))
   size: 23mb #s(48195) s(48195-96389)
   chs:  (3/0/1)-(5/254/63)d (3/0/1)-(5/254/63)r
   hex:  00 00 01 03 04 FE 3F 05 43 BC 00 00 43 BC 00 00

Primary partition(3)
   type: 131(0x83)(Linux ext2 filesystem)
   size: 4000mb #s(8193150) s(96390-8289539)
   chs:  (6/0/1)-(515/254/63)d (6/0/1)-(515/254/63)r
   hex:  00 00 01 06 83 FE BF 03 86 78 01 00 7E 04 7D 00

Primary partition(4)
   type: 005(0x05)(Extended DOS)
   size: 5718mb #s(11711385) s(8289540-20000924)
   chs:  (516/0/1)-(1023/254/63)d (516/0/1)-(1244/254/63)r
   hex:  00 00 81 04 05 FE FF FF 04 7D 7E 00 99 B3 B2 00

   Logical partition
      type: 130(0x82)(Linux swap or Solaris/x86)
      size: 133mb #s(273042) s(8289603-8562644)
      chs:  (516/1/1)-(532/254/63)d (516/1/1)-(532/254/63)r
      hex:  00 01 81 04 82 FE BF 14 3F 00 00 00 92 2A 04 00


Begin scan...
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012c1ad>]    Not tainted
EFLAGS: 00010086
eax: 00000020   ebx: c10d9518   ecx: 00000001   edx: 000013aa
esi: c02d56d4   edi: c10d9518   ebp: 00000000   esp: c269bc78
ds: 0018   es: 0018   ss: 0018
Process gpart-0.1h-2.4. (pid: 229, stackpage=c269b000)
Stack: c021a29e 000000e3 c035c5fc 00000216 c02d5564 00000000 c02d5530 c02d5530 
       c02d56d4 00000000 00000070 c012c4d6 00000001 c02d56d0 00000400 00000000 
       00000340 00612d39 c01353f8 00000000 0000000c 00000340 c0133491 00000400 
Call Trace: [<c021a29e>] error_table [kernel] 0x5e36 
[<c012c4d6>] __alloc_pages [kernel] 0x5e 
[<c01353f8>] grow_buffers [kernel] 0x3c 
[<c0133491>] refill_freelist [kernel] 0x9 
[<c0133875>] getblk [kernel] 0x115 
[<c02083cc>] __generic_copy_to_user [kernel] 0x4c 
[<c013728f>] block_read [kernel] 0x26b 
[<c019180d>] ide_dmaproc [kernel] 0x185 
[<c0190f14>] ide_dma_intr [kernel] 0x0 
[<c019a535>] do_rw_disk [kernel] 0xf1 
[<c018834e>] ide_wait_stat [kernel] 0xbe 
[<c0188630>] start_request [kernel] 0x198 
[<c018897c>] ide_do_request [kernel] 0x290 
[<c0188a02>] do_ide_request [kernel] 0xe 
[<c0112412>] schedule [kernel] 0x25e 
[<c0124ef4>] __lock_page [kernel] 0x94 
[<c0124d9a>] read_cluster_nonblocking [kernel] 0xba 
[<c0154a3c>] ext2_get_block [kernel] 0x0 
[<c0125ce5>] filemap_nopage [kernel] 0xb5 
[<c0166c7e>] batch_entropy_process [kernel] 0xa6 
[<c0166c7e>] batch_entropy_process [kernel] 0xa6 
[<c0118d81>] __run_task_queue [kernel] 0x49 
[<c011c267>] run_all_timers [kernel] 0x17 
[<c0118ccf>] bh_action [kernel] 0x1b 
[<c0118c02>] tasklet_hi_action [kernel] 0x52 
[<c0118a1f>] do_softirq [kernel] 0x47 
[<c0108110>] do_IRQ [kernel] 0x90 
[<c021016c>] call_do_IRQ [kernel] 0x5 
[<c0131c75>] sys_read [kernel] 0x95 
[<c0106e03>] system_call [kernel] 0x33 


Code: 0f 0b 59 5e 8b 57 04 8b 07 89 50 04 89 02 c7 47 04 00 00 00 
 Segmentation fault
sh-2.04# rm core
rm: cannot remove `core': No such file or directory
sh-2.04# cat /prc/meminfo
cat: /prc/meminfo: No such file or directory
sh-2.04# cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  63242240 60571648  2670592        0 51228672  2600960
Swap: 275996672   827392 275169280
MemTotal:        61760 kB
MemFree:          2608 kB
MemShared:           0 kB
Buffers:         50028 kB
Cached:           1732 kB
SwapCached:        808 kB
Active:          33404 kB
Inact_dirty:     19164 kB
Inact_clean:         0 kB
Inact_target:    16384 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61760 kB
LowFree:          2608 kB
SwapTotal:      269528 kB
SwapFree:       268720 kB
sh-2.04# <6>SysRq : Show State

                         free                        sibling
  task             PC      stack   pid father child younger older
init          S 00000000    3128     1      0   199       4       (NOTLB)
Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c013f0cf>] do_select [kernel] 0x1df 
[<c013f4bc>] sys_select [kernel] 0x398 
[<c0106e03>] system_call [kernel] 0x33 

Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c013f0cf>] do_select [kernel] 0x1df 
[<c013f4bc>] sys_select [kernel] 0x398 
[<c0106e03>] system_call [kernel] 0x33 

keventd       S 00000000    6564     2      1             3       (L-TLB)
Call Trace: [<c012057b>] context_thread [kernel] 0xdf 
[<c012049c>] context_thread [kernel] 0x0 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c012049c>] context_thread [kernel] 0x0 

Call Trace: [<c012057b>] context_thread [kernel] 0xdf 
[<c012049c>] context_thread [kernel] 0x0 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c012049c>] context_thread [kernel] 0x0 

kapm-idled    S 00000064    6144     3      1             9     2 (L-TLB)
Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c0110a6e>] apm_mainloop [kernel] 0x66 
[<c0111363>] apm [kernel] 0x26f 
[<c0105000>] stext [kernel] 0x0 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c01110f4>] apm [kernel] 0x0 

Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c0110a6e>] apm_mainloop [kernel] 0x66 
[<c0111363>] apm [kernel] 0x26f 
[<c0105000>] stext [kernel] 0x0 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c01110f4>] apm [kernel] 0x0 

ksoftirqd_CPU S C0324560    6404     4      0             5     1 (L-TLB)
Call Trace: [<c0105000>] stext [kernel] 0x0 
[<c0118dfa>] ksoftirqd [kernel] 0x6e 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c0118d8c>] ksoftirqd [kernel] 0x0 

Call Trace: [<c0105000>] stext [kernel] 0x0 
[<c0118dfa>] ksoftirqd [kernel] 0x6e 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c0118d8c>] ksoftirqd [kernel] 0x0 

kswapd        S C02D5564    6128     5      0             6     4 (L-TLB)
Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c0112861>] interruptible_sleep_on_timeout [kernel] 0x41 
[<c012b9bc>] kswapd [kernel] 0xc8 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c012b8f4>] kswapd [kernel] 0x0 

Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c0112861>] interruptible_sleep_on_timeout [kernel] 0x41 
[<c012b9bc>] kswapd [kernel] 0xc8 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c012b8f4>] kswapd [kernel] 0x0 

kreclaimd     S C02D2800    6640     6      0             7     5 (L-TLB)
Call Trace: [<c01127fc>] interruptible_sleep_on [kernel] 0x3c 
[<c0219f1b>] error_table [kernel] 0x5ab3 
[<c012ba7e>] kreclaimd [kernel] 0x4a 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c012ba34>] kreclaimd [kernel] 0x0 

Call Trace: [<c01127fc>] interruptible_sleep_on [kernel] 0x3c 
[<c0219f1b>] error_table [kernel] 0x5ab3 
[<c012ba7e>] kreclaimd [kernel] 0x4a 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c012ba34>] kreclaimd [kernel] 0x0 

bdflush       S C02D2800    6624     7      0             8     6 (L-TLB)
Call Trace: [<c01127fc>] interruptible_sleep_on [kernel] 0x3c 
[<c013595c>] bdflush [kernel] 0xac 
[<c0105000>] stext [kernel] 0x0 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c01358b0>] bdflush [kernel] 0x0 

Call Trace: [<c01127fc>] interruptible_sleep_on [kernel] 0x3c 
[<c013595c>] bdflush [kernel] 0xac 
[<c0105000>] stext [kernel] 0x0 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c01358b0>] bdflush [kernel] 0x0 

kupdated      S 00000286    6184     8      0                   7 (L-TLB)
Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c021be77>] Unused_offset [kernel] 0x16c8 
[<c01359e5>] kupdate [kernel] 0x81 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c0135964>] kupdate [kernel] 0x0 

Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c021be77>] Unused_offset [kernel] 0x16c8 
[<c01359e5>] kupdate [kernel] 0x81 
[<c0105000>] stext [kernel] 0x0 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c0135964>] kupdate [kernel] 0x0 

mdrecoveryd   S C02D2800    6592     9      1            76     3 (L-TLB)
Call Trace: [<c01b1a79>] md_thread [kernel] 0xb1 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c01b19c8>] md_thread [kernel] 0x0 

Call Trace: [<c01b1a79>] md_thread [kernel] 0xb1 
[<c010566e>] kernel_thread [kernel] 0x26 
[<c01b19c8>] md_thread [kernel] 0x0 

minilogd      S C122DD50       0    76      1           199     9 (NOTLB)
Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c013f75f>] do_pollfd [kernel] 0x10b 
[<c013f795>] do_pollfd [kernel] 0x141 
[<c013f9e4>] sys_poll [kernel] 0x22c 
[<c0106e03>] system_call [kernel] 0x33 

Call Trace: [<c0112188>] schedule_timeout [kernel] 0x80 
[<c01120c0>] process_timeout [kernel] 0x0 
[<c013f75f>] do_pollfd [kernel] 0x10b 
[<c013f795>] do_pollfd [kernel] 0x141 
[<c013f9e4>] sys_poll [kernel] 0x22c 
[<c0106e03>] system_call [kernel] 0x33 

init          S C3EE9090      96   199      1   200            76 (NOTLB)
Call Trace: [<c0117b27>] sys_wait4 [kernel] 0x37f 
[<c0106e03>] system_call [kernel] 0x33 

Call Trace: [<c0117b27>] sys_wait4 [kernel] 0x37f 
[<c0106e03>] system_call [kernel] 0x33 

sh            S 00000000    5100   200    199                     (NOTLB)
Call Trace: [<c011211f>] schedule_timeout [kernel] 0x17 
[<c0122d59>] handle_mm_fault [kernel] 0x9d 
[<c01644de>] read_chan [kernel] 0x376 
[<c0164a17>] write_chan [kernel] 0x1cf 
[<c0160a06>] tty_read [kernel] 0xb6 
[<c0131c75>] sys_read [kernel] 0x95 
[<c0106e03>] system_call [kernel] 0x33 

Call Trace: [<c011211f>] schedule_timeout [kernel] 0x17 
[<c0122d59>] handle_mm_fault [kernel] 0x9d 
[<c01644de>] read_chan [kernel] 0x376 
[<c0164a17>] write_chan [kernel] 0x1cf 
[<c0160a06>] tty_read [kernel] 0xb6 
[<c0131c75>] sys_read [kernel] 0x95 
[<c0106e03>] system_call [kernel] 0x33 


sh-2.04# 
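
The "hex:" lines in the gpart listing above are raw 16-byte MBR partition entries. The sketch below decodes one of them so that the type, CHS and sector figures printed by gpart can be cross-checked by hand. The function names are hypothetical and are not part of gpart.

```python
import struct


def decode_chs(head, sec_byte, cyl_byte):
    """Unpack a 3-byte CHS field.

    The top two bits of the sector byte are bits 8-9 of the cylinder.
    """
    cylinder = ((sec_byte & 0xC0) << 2) | cyl_byte
    sector = sec_byte & 0x3F
    return (cylinder, head, sector)


def decode_entry(hex_line):
    """Decode one MBR partition entry given as space-separated hex bytes."""
    raw = bytes.fromhex(hex_line)
    if len(raw) != 16:
        raise ValueError("an MBR partition entry is exactly 16 bytes")
    # Layout: status, CHS start (3 bytes), type, CHS end (3 bytes),
    # then two little-endian 32-bit values: start sector and sector count.
    status, h1, s1, c1, ptype, h2, s2, c2, start, count = struct.unpack(
        "<8B2I", raw
    )
    return {
        "bootable": status == 0x80,
        "type": ptype,
        "chs_start": decode_chs(h1, s1, c1),
        "chs_end": decode_chs(h2, s2, c2),
        "start_sector": start,
        "sector_count": count,
    }


if __name__ == "__main__":
    # Primary partition(1) from the gpart listing in this report.
    print(decode_entry("00 01 01 00 83 FE 3F 02 3F 00 00 00 04 BC 00 00"))
```

For the logical swap partition the decoded start sector is 63, which is relative to the start of the extended partition rather than to the start of the disk: 8289540 + 63 = 8289603, the first sector gpart reports for it.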


Expected Results:  gpart.bin should complete its job (i.e. attempt to guess the
partition table, succeeding or not) without being killed by the kernel due to an
internal error.

Additional info:

- The test is always run in single-user mode.
- Activated swap size is 256 MiB, whereas the physical memory is 64 MiB.
- The / & /usr partitions are located on another disk (/dev/hda), whereas gpart
looks for the partition table on the /dev/hdb disk.
- gpart is run skipping (-k) the first sectors of the disk, in the area where
partitions are already known.
- Upgrading to a stock kernel.org kernel changes the results: head & cylinder
seeks still fail with the same kernel backtrace, whereas sector seek sometimes
succeeds.
- Data on / & /usr has to be recovered at each reboot.
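
The -k value used in the captured session (8562644) matches the last sector of the last known logical partition. The sketch below shows the standard CHS-to-LBA arithmetic behind that figure, using the geometry gpart printed for /dev/hdb, chs(1245/255/63). The function name is hypothetical.

```python
# Geometry taken from "chs(1245/255/63)" in the gpart output.
CYLINDERS = 1245
HEADS = 255
SECTORS_PER_TRACK = 63


def chs_to_lba(cylinder, head, sector):
    """Convert a CHS address to a 0-based sector number.

    CHS sectors are numbered from 1, hence the (sector - 1).
    """
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)


if __name__ == "__main__":
    # Start and end of the logical swap partition, per the gpart listing.
    print(chs_to_lba(516, 1, 1))      # first sector of the partition
    print(chs_to_lba(532, 254, 63))   # last sector, the value given to -k
    # Total sectors implied by the geometry.
    print(CYLINDERS * HEADS * SECTORS_PER_TRACK)
```

With these inputs the two conversions give 8289603 and 8562644, and the geometry product gives 20000925, all of which agree with the s(8289603-8562644) and #s(20000925) figures in the gpart output.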

Comment 1 Bugzilla owner 2004-09-30 15:39:21 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
