Bug 190636

Summary: dom0 kernel crashes on large scp
Product: [Fedora] Fedora Reporter: Matt Wlazlo <mwlazlo>
Component: xenAssignee: Xen Maintainance List <xen-maint>
Status: CLOSED WONTFIX QA Contact: Martin Jenner <mjenner>
Severity: medium Docs Contact:
Priority: medium    
Version: 5CC: bstein, katzj
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-09-24 19:27:23 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Attachments:
Description Flags
console log from startup to crash none

Description Matt Wlazlo 2006-05-04 02:26:24 EDT
Description of problem:
Copying a development environment from one machine to dom0 causes a crash. Crash
only occurs with USB enabled from the BIOS, and the uhci_hcd driver loaded. See
below for more details.


Version-Release number of selected component (if applicable):
xen-3.0.1-4
kernel-xenU-2.6.15-1.2054_FC5
kernel-xen0-2.6.15-1.2054_FC5


How reproducible:
Always

Steps to Reproduce:
1. scp -r xx.xx.xx.xx:/work .
  
Actual results:
scp copies data for roughly 10 seconds, then displays -stalled-. The OS does not
respond to keypress, changing VTs etc. Strangely, scp continues to update it's
status line - MB/s slowly decreases to 0.

Expected results:
scp should complete the copy operation.

Additional info:

Hardware is a Dell Poweredge 2850 with 2x3Ghz Xeons and ~3Gb RAM.

The first time I reproduced with serial logging I got the following output:

-- START LOG --
BUG: soft lockup detected on CPU#0!

Pid: 461, comm:            kjournald
EIP: 0061:[<f492443a>] CPU: 0
EIP is at uhci_irq+0x16/0x13c [uhci_hcd]
 EFLAGS: 00010206    Not tainted  (2.6.15-1.2054_FC5xen0 #1)
EAX: f4920000 EBX: c08bd000 ECX: 00000000 EDX: 0000ace2
ESI: 00000000 EDI: c08bd0d0 EBP: 00000010 DS: 007b ES: 007b
CR0: 8005003b CR2: b7f18000 CR3: 3221d000 CR4: 00000640
 [<c026f406>] usb_hcd_irq+0x26/0x55
 [<c013c689>] handle_IRQ_event+0x42/0x85
 [<c013c759>] __do_IRQ+0x8d/0xdc
 [<c01062e1>] do_IRQ+0x1d/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<c026007b>] cdrom_open+0x467/0x88c
 [<c0231804>] force_evtchn_callback+0xa/0xc
 [<c013c67c>] handle_IRQ_event+0x35/0x85
 [<c013c759>] __do_IRQ+0x8d/0xdc
 [<c01062e1>] do_IRQ+0x1d/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<f48dbdf8>] e1000_intr+0x1f/0x1ae [e1000]
 [<c013c689>] handle_IRQ_event+0x42/0x85
 [<c013c759>] __do_IRQ+0x8d/0xdc
 [<c01062e1>] do_IRQ+0x1d/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<f486545b>] megaraid_ack_sequence+0x18c/0x194 [megaraid_mbox]
 [<f4865470>] megaraid_isr+0xd/0x28 [megaraid_mbox]
 [<c013c689>] handle_IRQ_event+0x42/0x85
 [<c013c759>] __do_IRQ+0x8d/0xdc
 [<c01062e1>] do_IRQ+0x1d/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<c01c0337>] __end_that_request_first+0xa8/0x424
 [<c0232149>] end_pirq+0x47/0x4c
 [<c013c79b>] __do_IRQ+0xcf/0xdc
 [<f4847abc>] scsi_end_request+0x1f/0xb7 [scsi_mod]
 [<f4847da3>] scsi_io_completion+0x24f/0x3fe [scsi_mod]
 [<c02f575d>] _spin_unlock_irqrestore+0x9/0x31
 [<c01250ea>] del_timer+0x45/0x4b
 [<f48225bc>] sd_rw_intr+0x2a4/0x2c8 [sd_mod]
 [<c02f575d>] _spin_unlock_irqrestore+0x9/0x31
 [<f484339c>] scsi_finish_command+0x13/0xc2 [scsi_mod]
 [<c01c1e05>] blk_done_softirq+0x88/0x96
 [<c012106e>] __do_softirq+0x70/0xef
 [<c012112d>] do_softirq+0x40/0x67
 [<c01062e6>] do_IRQ+0x22/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<c015b404>] ll_rw_block+0x33/0xa5
 [<f4883434>] journal_commit_transaction+0x3a9/0xf3d [jbd]
 [<c02f578b>] _spin_unlock_irq+0x6/0x29
 [<c02f4138>] schedule+0x604/0x66f
 [<c02f565f>] _spin_lock_irqsave+0x22/0x27
 [<c02f575d>] _spin_unlock_irqrestore+0x9/0x31
 [<c012455c>] try_to_del_timer_sync+0x48/0x4e
 [<f488717a>] kjournald+0xbf/0x215 [jbd]
 [<c0118537>] schedule_tail+0x36/0x94
 [<f4886a63>] commit_timeout+0x0/0x9 [jbd]
 [<c012e03b>] autoremove_wake_function+0x0/0x3a
 [<f48870bb>] kjournald+0x0/0x215 [jbd]
 [<c01029ad>] kernel_thread_helper+0x5/0xb
-- END LOG --

Just to make sure, I reproduced the bug again with slightly different results:

-- START LOG --
BUG: soft lockup detected on CPU#3!

Pid: 2288, comm:                  scp
EIP: 0061:[<c0107115>] CPU: 3
EIP is at delay_tsc+0xd/0x15
 EFLAGS: 00200246    Not tainted  (2.6.15-1.2054_FC5xen0 #1)
EAX: fe37cc00 EBX: 00000001 ECX: fe37cb78 EDX: 00000098
ESI: 00000000 EDI: c0853b2c EBP: 00000001 DS: 007b ES: 007b
CR0: 8005003b CR2: 08ff82f8 CR3: 3263a000 CR4: 00000640
 [<c01cf777>] _raw_spin_lock+0x6d/0xd9
 [<c02f5613>] _spin_lock+0x6/0x8
 [<f4882925>] journal_dirty_data+0x53/0x1d4 [jbd]
 [<f48fc1bb>] ext3_journal_dirty_data+0x12/0x35 [ext3]
 [<f48face6>] walk_page_buffers+0x4c/0x67 [ext3]
 [<f48fd1d2>] ext3_ordered_commit_write+0x4b/0xc8 [ext3]
 [<f48fc1a9>] ext3_journal_dirty_data+0x0/0x35 [ext3]
 [<c013ee80>] generic_file_buffered_write+0x3af/0x5c7
 [<c0120cd0>] current_fs_time+0x44/0x50
 [<c013f476>] __generic_file_aio_write_nolock+0x3de/0x42b
 [<c02f578b>] _spin_unlock_irq+0x6/0x29
 [<c02f4138>] schedule+0x604/0x66f
 [<c013f8dc>] generic_file_aio_write+0x64/0xb6
 [<f48f9c32>] ext3_file_write+0x26/0x92 [ext3]
 [<c015894c>] do_sync_write+0xc0/0xf3
 [<c012e03b>] autoremove_wake_function+0x0/0x3a
 [<c02f5613>] _spin_lock+0x6/0x8
 [<c0159362>] vfs_read+0xf6/0x136
 [<c01591d8>] vfs_write+0xa2/0x136
 [<c01597c0>] sys_write+0x3b/0x64
 [<c0104b3d>] syscall_call+0x7/0xb
BUG: soft lockup detected on CPU#0!

Pid: 461, comm:            kjournald
EIP: 0061:[<f492e43a>] CPU: 0
EIP is at uhci_irq+0x16/0x13c [uhci_hcd]
 EFLAGS: 00010206    Not tainted  (2.6.15-1.2054_FC5xen0 #1)
EAX: f4920000 EBX: f2b92c00 ECX: 00000000 EDX: 0000ace2
ESI: 00000000 EDI: f2b92cd0 EBP: 00000010 DS: 007b ES: 007b
CR0: 8005003b CR2: b7fc6000 CR3: 32ea6000 CR4: 00000640
 [<c026f406>] usb_hcd_irq+0x26/0x55
 [<c013c689>] handle_IRQ_event+0x42/0x85
 [<c013c759>] __do_IRQ+0x8d/0xdc
 [<c01062e1>] do_IRQ+0x1d/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<c026007b>] cdrom_open+0x467/0x88c
 [<c0231804>] force_evtchn_callback+0xa/0xc
 [<c013c67c>] handle_IRQ_event+0x35/0x85
 [<c013c759>] __do_IRQ+0x8d/0xdc
 [<c01062e1>] do_IRQ+0x1d/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<c0231804>] force_evtchn_callback+0xa/0xc
 [<f4865407>] megaraid_ack_sequence+0x138/0x194 [megaraid_mbox]
 [<f4865470>] megaraid_isr+0xd/0x28 [megaraid_mbox]
 [<c013c689>] handle_IRQ_event+0x42/0x85
 [<c013c759>] __do_IRQ+0x8d/0xdc
 [<c01062e1>] do_IRQ+0x1d/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<c01c007b>] blk_dump_rq_flags+0xc9/0xca
 [<f48dbdf8>] e1000_intr+0x1f/0x1ae [e1000]
 [<c013c689>] handle_IRQ_event+0x42/0x85
 [<c013c759>] __do_IRQ+0x8d/0xdc
 [<c01062e1>] do_IRQ+0x1d/0x28
 [<c0231cda>] evtchn_do_upcall+0x63/0x9d
 [<c0104d14>] hypervisor_callback+0x2c/0x34
 [<f4883478>] journal_commit_transaction+0x3ed/0xf3d [jbd]
 [<c02f578b>] _spin_unlock_irq+0x6/0x29
 [<c02f4138>] schedule+0x604/0x66f
 [<c02f565f>] _spin_lock_irqsave+0x22/0x27
 [<c02f575d>] _spin_unlock_irqrestore+0x9/0x31
 [<c012455c>] try_to_del_timer_sync+0x48/0x4e
 [<f488717a>] kjournald+0xbf/0x215 [jbd]
 [<c0118537>] schedule_tail+0x36/0x94
 [<f4886a63>] commit_timeout+0x0/0x9 [jbd]
 [<c012e03b>] autoremove_wake_function+0x0/0x3a
 [<f48870bb>] kjournald+0x0/0x215 [jbd]
 [<c01029ad>] kernel_thread_helper+0x5/0xb
-- END LOG --

Then I tried several different combinations of settings. Note that the DELL has
3 settings for USB: Off, On with BIOS support, and Off with BIOS support.

usb enabled | uhci_hcd | uhci_hcd |
in hardware | loaded   | unloaded |
------------+----------+----------+
off         |   ok     |    ok    |
on w bios   |  fail    |    ok    |
on w/o bios |  fail    |    ok    |

I have dmesg logs and stack traces for these failures if it will help.
Comment 1 Matt Wlazlo 2006-05-04 02:31:45 EDT
Created attachment 128584 [details]
console log from startup to crash
Comment 2 Red Hat Bugzilla 2007-07-24 19:54:50 EDT
change QA contact
Comment 3 Daniel Berrange 2007-09-24 19:27:23 EDT
Fedora Core 5 is now End-of-life and will not be receiving any further kernel
updates. If this problem still occurrs on Fedora 6 or later,  please feel free
to re-open this bug and change the version to the appropriate newer release.