Bug 520720

Summary: Kernel panic throughout file transfer to gfs2 filesystem partition
Product: Red Hat Enterprise Linux 5 Reporter: Cleber Paiva de Souza <cleberps>
Component: kernel Assignee: Steve Whitehouse <swhiteho>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: low    
Version: 5.3 CC: adas, bmarzins, cleberps, djansa, rpeterso, swhiteho
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-03-12 10:04:26 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 526947, 533192    
Attachments:
Description                Flags
Proposed fix (upstream)    none
Proposed fix (RHEL)        none

Description Cleber Paiva de Souza 2009-09-01 23:29:56 UTC
Description of problem:
Kernel panic during file transfer to gfs2 filesystem.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-128.7.1.el5
gfs2-utils-0.1.53-1.el5_3.3

How reproducible:
Always, when starting a transfer from a Windows system or over NFS, where the files come from a Windows environment and the file and path names contain special punctuation and long file paths.

Steps to Reproduce:
1. Start an SSH connection from Windows
2. Transfer files from the user's profile to the remote gfs2 partition
3. The kernel panic occurs in the middle of the operation (random, not file specific)
  
Actual results:
Server reboots

Expected results:
Files transferred.

Additional info:
dmesg reports:

Unable to handle kernel NULL pointer dereference at 0000000000000078 RIP: 
 [<ffffffff886562af>] :gfs2:revoke_lo_add+0x1a/0x32
PGD 0 
Oops: 0002 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 0 
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 263, comm: kswapd0 Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff886562af>]  [<ffffffff886562af>] :gfs2:revoke_lo_add+0x1a/0x32
RSP: e02b:ffff88006d2fdae8  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8800379bb490 RCX: ffff88000c03c1e0
RDX: ffff88000c03c450 RSI: ffff88005cd797c8 RDI: ffff88005cd79000
RBP: ffff88000c03c430 R08: ffffffff804eab80 R09: ffff88006d2fdb20
R10: ffff880001ec7970 R11: ffffffff88656295 R12: ffff88005cd79000
R13: 0000000000000000 R14: ffff8800379bb490 R15: ffff88005cd79000
FS:  00002ba9f851eee0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process kswapd0 (pid: 263, threadinfo ffff88006d2fc000, task ffff88006d255860)
Stack:  ffffffff8865772b  ffff88006d2fde50  ffff8800379bb490  ffff880002cd93d8 
 0000000000000000  0000000000000000  ffffffff88658a5b  000000000000000e 
 ffff880002cd93d8  00000000000000b0 
Call Trace:
 [<ffffffff8865772b>] :gfs2:gfs2_remove_from_journal+0x11a/0x12c
 [<ffffffff88658a5b>] :gfs2:gfs2_invalidatepage+0xea/0x151
 [<ffffffff88658799>] :gfs2:gfs2_writepage_common+0x95/0xb1
 [<ffffffff88658ccd>] :gfs2:gfs2_jdata_writepage+0x56/0x98
 [<ffffffff802bf7c2>] shrink_inactive_list+0x401/0x809
 [<ffffffff80249667>] __pagevec_release+0x19/0x22
 [<ffffffff802bf329>] shrink_active_list+0x42a/0x43a
 [<ffffffff80213611>] shrink_zone+0xf6/0x11c
 [<ffffffff80259a5c>] kswapd+0x324/0x447
 [<ffffffff8026defe>] monotonic_clock+0x35/0x7b
 [<ffffffff80299fea>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80299dd2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80259738>] kswapd+0x0/0x447
 [<ffffffff80299dd2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233575>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233477>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12


Code: ff 40 78 c7 40 50 01 00 00 00 ff 87 9c 07 00 00 48 89 d7 e9 
RIP  [<ffffffff886562af>] :gfs2:revoke_lo_add+0x1a/0x32
 RSP <ffff88006d2fdae8>
CR2: 0000000000000078
 <3>BUG: soft lockup - CPU#2 stuck for 10s! [nfsd:11632]
CPU 2:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 11632, comm: nfsd Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b4d>]  [<ffffffff80263b4d>] .text.lock.spinlock+0x2/0x30
RSP: e02b:ffff880058899ac8  EFLAGS: 00000282
RAX: ffff8800325e7aa0 RBX: ffff88005cd79000 RCX: 0000000000000006
RDX: ffff8800325e7aa0 RSI: 0000000000001000 RDI: ffff88005cd79780
RBP: 0000000000000000 R08: ffffffff804eab80 R09: ffff8800588999b8
R10: ffff880058899a20 R11: 0000000000000060 R12: ffff88005cd79810
R13: ffff88005cd79780 R14: 000000000000000a R15: 0000000000000001
FS:  00002ac62cd676e0(0000) GS:ffffffff805ba100(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88655c08>] :gfs2:gfs2_log_reserve+0xb4/0x18f
 [<ffffffff8866709d>] :gfs2:gfs2_do_trans_begin+0x102/0x144
 [<ffffffff88653be3>] :gfs2:gfs2_createi+0x690/0xd28
 [<ffffffff8864adf7>] :gfs2:gfs2_dirent_find+0x0/0x4e
 [<ffffffff88650e94>] :gfs2:gfs2_glock_dq+0x1e/0x132
 [<ffffffff8022d353>] wake_up_bit+0x11/0x22
 [<ffffffff8865ea1f>] :gfs2:gfs2_create+0x65/0x143
 [<ffffffff886535b6>] :gfs2:gfs2_createi+0x63/0xd28
 [<ffffffff88651a26>] :gfs2:gfs2_glock_nq_num+0x3b/0x68
 [<ffffffff8023b78c>] vfs_create+0xe6/0x158
 [<ffffffff887ada0e>] :nfsd:nfsd_create_v3+0x2c9/0x412
 [<ffffffff887b333e>] :nfsd:nfsd3_proc_create+0x12f/0x140
 [<ffffffff887a81db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
 [<ffffffff88739440>] :sunrpc:svc_process+0x42b/0x6f2
 [<ffffffff80263540>] __down_read+0x35/0x9a
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff887a8746>] :nfsd:nfsd+0x1a5/0x2cb
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: soft lockup - CPU#3 stuck for 10s! [sftp-server:22524]
CPU 3:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 22524, comm: sftp-server Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b4d>]  [<ffffffff80263b4d>] .text.lock.spinlock+0x2/0x30
RSP: e02b:ffff880041763b00  EFLAGS: 00000282
RAX: 0000000000000000 RBX: ffff88006b33b3d0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88005e04a0f0 RDI: ffff88005cd79780
RBP: ffff88003aab8540 R08: ffffffff804eab02 R09: ffff8800634ba440
R10: dace97d441ee1b29 R11: ffffffff88656d99 R12: ffff88005cd79000
R13: ffff88005e04a0f0 R14: ffff88005e04a0d0 R15: ffff8800634ba330
FS:  00002aeb6df266c0(0000) GS:ffffffff805ba180(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88656df5>] :gfs2:databuf_lo_add+0x5c/0x101
 [<ffffffff88658302>] :gfs2:gfs2_page_add_databufs+0x70/0x95
 [<ffffffff88659a94>] :gfs2:gfs2_write_end+0x543/0x55a
 [<ffffffff886594b6>] :gfs2:gfs2_write_begin+0x2cf/0x36a
 [<ffffffff8865ad8a>] :gfs2:gfs2_file_buffered_write+0x1b6/0x2e5
 [<ffffffff8865b155>] :gfs2:__gfs2_file_aio_write_nolock+0x29c/0x2d4
 [<ffffffff80408591>] sock_aio_read+0x4f/0x5e
 [<ffffffff8865b2f8>] :gfs2:gfs2_file_write_nolock+0xaa/0x10f
 [<ffffffff80299fea>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80299fea>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80263bce>] lock_kernel+0x1b/0x32
 [<ffffffff8865b448>] :gfs2:gfs2_file_write+0x49/0xa7
 [<ffffffff80216d8b>] vfs_write+0xce/0x174
 [<ffffffff802175d8>] sys_write+0x45/0x6e
 [<ffffffff8025f2f9>] tracesys+0xab/0xb6

BUG: soft lockup - CPU#0 stuck for 10s! [gfs2_logd:21050]
CPU 0:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 21050, comm: gfs2_logd Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b50>]  [<ffffffff80263b50>] .text.lock.spinlock+0x5/0x30
RSP: e02b:ffff8800556f7e48  EFLAGS: 00000282
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff804e0a80
RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffff88005cd79780
RBP: ffff88005cd79000 R08: ffff8800556f6000 R09: 0000000000000000
R10: ffffffff804e0d60 R11: ffff880057747080 R12: ffff88005cd79000
R13: 000000000000003c R14: 0000000000000100 R15: ffffffff80299dd2
FS:  00002ba9f851eee0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88654a21>] :gfs2:gfs2_ail1_empty+0x1a/0x95
 [<ffffffff8864a2d4>] :gfs2:gfs2_logd+0x48/0x15c
 [<ffffffff8864a28c>] :gfs2:gfs2_logd+0x0/0x15c
 [<ffffffff80233575>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233477>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: soft lockup - CPU#2 stuck for 10s! [nfsd:11632]
CPU 2:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 11632, comm: nfsd Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b50>]  [<ffffffff80263b50>] .text.lock.spinlock+0x5/0x30
RSP: e02b:ffff880058899ac8  EFLAGS: 00000282
RAX: ffff8800325e7aa0 RBX: ffff88005cd79000 RCX: 0000000000000006
RDX: ffff8800325e7aa0 RSI: 0000000000001000 RDI: ffff88005cd79780
RBP: 0000000000000000 R08: ffffffff804eab80 R09: ffff8800588999b8
R10: ffff880058899a20 R11: 0000000000000060 R12: ffff88005cd79810
R13: ffff88005cd79780 R14: 000000000000000a R15: 0000000000000001
FS:  00002ac62cd676e0(0000) GS:ffffffff805ba100(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88655c08>] :gfs2:gfs2_log_reserve+0xb4/0x18f
 [<ffffffff8866709d>] :gfs2:gfs2_do_trans_begin+0x102/0x144
 [<ffffffff88653be3>] :gfs2:gfs2_createi+0x690/0xd28
 [<ffffffff8864adf7>] :gfs2:gfs2_dirent_find+0x0/0x4e
 [<ffffffff88650e94>] :gfs2:gfs2_glock_dq+0x1e/0x132
 [<ffffffff8022d353>] wake_up_bit+0x11/0x22
 [<ffffffff8865ea1f>] :gfs2:gfs2_create+0x65/0x143
 [<ffffffff886535b6>] :gfs2:gfs2_createi+0x63/0xd28
 [<ffffffff88651a26>] :gfs2:gfs2_glock_nq_num+0x3b/0x68
 [<ffffffff8023b78c>] vfs_create+0xe6/0x158
 [<ffffffff887ada0e>] :nfsd:nfsd_create_v3+0x2c9/0x412
 [<ffffffff887b333e>] :nfsd:nfsd3_proc_create+0x12f/0x140
 [<ffffffff887a81db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
 [<ffffffff88739440>] :sunrpc:svc_process+0x42b/0x6f2
 [<ffffffff80263540>] __down_read+0x35/0x9a
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff887a8746>] :nfsd:nfsd+0x1a5/0x2cb
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: soft lockup - CPU#3 stuck for 10s! [sftp-server:22524]
CPU 3:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 22524, comm: sftp-server Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b4d>]  [<ffffffff80263b4d>] .text.lock.spinlock+0x2/0x30
RSP: e02b:ffff880041763b00  EFLAGS: 00000282
RAX: 0000000000000000 RBX: ffff88006b33b3d0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88005e04a0f0 RDI: ffff88005cd79780
RBP: ffff88003aab8540 R08: ffffffff804eab02 R09: ffff8800634ba440
R10: dace97d441ee1b29 R11: ffffffff88656d99 R12: ffff88005cd79000
R13: ffff88005e04a0f0 R14: ffff88005e04a0d0 R15: ffff8800634ba330
FS:  00002aeb6df266c0(0000) GS:ffffffff805ba180(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88656df5>] :gfs2:databuf_lo_add+0x5c/0x101
 [<ffffffff88658302>] :gfs2:gfs2_page_add_databufs+0x70/0x95
 [<ffffffff88659a94>] :gfs2:gfs2_write_end+0x543/0x55a
 [<ffffffff886594b6>] :gfs2:gfs2_write_begin+0x2cf/0x36a
 [<ffffffff8865ad8a>] :gfs2:gfs2_file_buffered_write+0x1b6/0x2e5
 [<ffffffff8865b155>] :gfs2:__gfs2_file_aio_write_nolock+0x29c/0x2d4
 [<ffffffff80408591>] sock_aio_read+0x4f/0x5e
 [<ffffffff8865b2f8>] :gfs2:gfs2_file_write_nolock+0xaa/0x10f
 [<ffffffff80299fea>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80299fea>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80263bce>] lock_kernel+0x1b/0x32
 [<ffffffff8865b448>] :gfs2:gfs2_file_write+0x49/0xa7
 [<ffffffff80216d8b>] vfs_write+0xce/0x174
 [<ffffffff802175d8>] sys_write+0x45/0x6e
 [<ffffffff8025f2f9>] tracesys+0xab/0xb6

BUG: soft lockup - CPU#0 stuck for 10s! [gfs2_logd:21050]
CPU 0:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 21050, comm: gfs2_logd Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b4b>]  [<ffffffff80263b4b>] .text.lock.spinlock+0x0/0x30
RSP: e02b:ffff8800556f7e48  EFLAGS: 00000282
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff804e0a80
RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffff88005cd79780
RBP: ffff88005cd79000 R08: ffff8800556f6000 R09: 0000000000000000
R10: ffffffff804e0d60 R11: ffff880057747080 R12: ffff88005cd79000
R13: 000000000000003c R14: 0000000000000100 R15: ffffffff80299dd2
FS:  00002ba9f851eee0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88654a21>] :gfs2:gfs2_ail1_empty+0x1a/0x95
 [<ffffffff8864a2d4>] :gfs2:gfs2_logd+0x48/0x15c
 [<ffffffff8864a28c>] :gfs2:gfs2_logd+0x0/0x15c
 [<ffffffff80233575>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233477>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: soft lockup - CPU#2 stuck for 10s! [nfsd:11632]
CPU 2:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 11632, comm: nfsd Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b4d>]  [<ffffffff80263b4d>] .text.lock.spinlock+0x2/0x30
RSP: e02b:ffff880058899ac8  EFLAGS: 00000282
RAX: ffff8800325e7aa0 RBX: ffff88005cd79000 RCX: 0000000000000006
RDX: ffff8800325e7aa0 RSI: 0000000000001000 RDI: ffff88005cd79780
RBP: 0000000000000000 R08: ffffffff804eab80 R09: ffff8800588999b8
R10: ffff880058899a20 R11: 0000000000000060 R12: ffff88005cd79810
R13: ffff88005cd79780 R14: 000000000000000a R15: 0000000000000001
FS:  00002ac62cd676e0(0000) GS:ffffffff805ba100(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88655c08>] :gfs2:gfs2_log_reserve+0xb4/0x18f
 [<ffffffff8866709d>] :gfs2:gfs2_do_trans_begin+0x102/0x144
 [<ffffffff88653be3>] :gfs2:gfs2_createi+0x690/0xd28
 [<ffffffff8864adf7>] :gfs2:gfs2_dirent_find+0x0/0x4e
 [<ffffffff88650e94>] :gfs2:gfs2_glock_dq+0x1e/0x132
 [<ffffffff8022d353>] wake_up_bit+0x11/0x22
 [<ffffffff8865ea1f>] :gfs2:gfs2_create+0x65/0x143
 [<ffffffff886535b6>] :gfs2:gfs2_createi+0x63/0xd28
 [<ffffffff88651a26>] :gfs2:gfs2_glock_nq_num+0x3b/0x68
 [<ffffffff8023b78c>] vfs_create+0xe6/0x158
 [<ffffffff887ada0e>] :nfsd:nfsd_create_v3+0x2c9/0x412
 [<ffffffff887b333e>] :nfsd:nfsd3_proc_create+0x12f/0x140
 [<ffffffff887a81db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
 [<ffffffff88739440>] :sunrpc:svc_process+0x42b/0x6f2
 [<ffffffff80263540>] __down_read+0x35/0x9a
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff887a8746>] :nfsd:nfsd+0x1a5/0x2cb
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff887a85a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: soft lockup - CPU#3 stuck for 10s! [sftp-server:22524]
CPU 3:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 22524, comm: sftp-server Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b4d>]  [<ffffffff80263b4d>] .text.lock.spinlock+0x2/0x30
RSP: e02b:ffff880041763b00  EFLAGS: 00000282
RAX: 0000000000000000 RBX: ffff88006b33b3d0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88005e04a0f0 RDI: ffff88005cd79780
RBP: ffff88003aab8540 R08: ffffffff804eab02 R09: ffff8800634ba440
R10: dace97d441ee1b29 R11: ffffffff88656d99 R12: ffff88005cd79000
R13: ffff88005e04a0f0 R14: ffff88005e04a0d0 R15: ffff8800634ba330
FS:  00002aeb6df266c0(0000) GS:ffffffff805ba180(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88656df5>] :gfs2:databuf_lo_add+0x5c/0x101
 [<ffffffff88658302>] :gfs2:gfs2_page_add_databufs+0x70/0x95
 [<ffffffff88659a94>] :gfs2:gfs2_write_end+0x543/0x55a
 [<ffffffff886594b6>] :gfs2:gfs2_write_begin+0x2cf/0x36a
 [<ffffffff8865ad8a>] :gfs2:gfs2_file_buffered_write+0x1b6/0x2e5
 [<ffffffff8865b155>] :gfs2:__gfs2_file_aio_write_nolock+0x29c/0x2d4
 [<ffffffff80408591>] sock_aio_read+0x4f/0x5e
 [<ffffffff8865b2f8>] :gfs2:gfs2_file_write_nolock+0xaa/0x10f
 [<ffffffff80299fea>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80299fea>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80263bce>] lock_kernel+0x1b/0x32
 [<ffffffff8865b448>] :gfs2:gfs2_file_write+0x49/0xa7
 [<ffffffff80216d8b>] vfs_write+0xce/0x174
 [<ffffffff802175d8>] sys_write+0x45/0x6e
 [<ffffffff8025f2f9>] tracesys+0xab/0xb6

BUG: soft lockup - CPU#0 stuck for 10s! [gfs2_logd:21050]
CPU 0:
Modules linked in: blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl auth_rpcgss sunrpc autofs4 hidp l2cap bluetooth lock_dlm gfs2 dlm configfs bridge netloop netbk ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin scsi_dh_rdac dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport st joydev sr_mod cdrom sg i5000_edac edac_mc e1000e pcspkr qla2xxx pl2303 i2c_i801 serio_raw i2c_core scsi_transport_fc mptspi usbserial scsi_transport_spi serial_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 21050, comm: gfs2_logd Not tainted 2.6.18-128.7.1.el5xen #1
RIP: e030:[<ffffffff80263b4d>]  [<ffffffff80263b4d>] .text.lock.spinlock+0x2/0x30
RSP: e02b:ffff8800556f7e48  EFLAGS: 00000282
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff804e0a80
RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffff88005cd79780
RBP: ffff88005cd79000 R08: ffff8800556f6000 R09: 0000000000000000
R10: ffffffff804e0d60 R11: ffff880057747080 R12: ffff88005cd79000
R13: 000000000000003c R14: 0000000000000100 R15: ffffffff80299dd2
FS:  00002ba9f851eee0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff88654a21>] :gfs2:gfs2_ail1_empty+0x1a/0x95
 [<ffffffff8864a2d4>] :gfs2:gfs2_logd+0x48/0x15c
 [<ffffffff8864a28c>] :gfs2:gfs2_logd+0x0/0x15c
 [<ffffffff80233575>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233477>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12
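
The soft lockup reports above repeat for the same three tasks. As a rough aid to reading them, the following Python sketch groups the stuck tasks by their RDI value. It is only an illustration: the sample text is a short excerpt of the log above, the function name is made up for this example, and reading RDI as the address of the lock being spun on is an assumption rather than something the log states.

```python
import re
from collections import defaultdict

# Short excerpt of the log above: one header and one RDI line per stuck task.
SAMPLE = """\
BUG: soft lockup - CPU#2 stuck for 10s! [nfsd:11632]
RDX: ffff8800325e7aa0 RSI: 0000000000001000 RDI: ffff88005cd79780
BUG: soft lockup - CPU#3 stuck for 10s! [sftp-server:22524]
RDX: 0000000000000000 RSI: ffff88005e04a0f0 RDI: ffff88005cd79780
BUG: soft lockup - CPU#0 stuck for 10s! [gfs2_logd:21050]
RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffff88005cd79780
"""

LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for \d+s! \[([^:\]]+):(\d+)\]"
)
RDI_RE = re.compile(r"\bRDI: ([0-9a-f]{16})")


def tasks_by_rdi(log_text):
    """Map each RDI value to the set of (comm, pid) tasks reported stuck with it."""
    groups = defaultdict(set)
    current = None  # task named by the most recent soft lockup header
    for line in log_text.splitlines():
        header = LOCKUP_RE.search(line)
        if header:
            current = (header.group(2), int(header.group(3)))
            continue
        rdi = RDI_RE.search(line)
        # Register dumps that do not follow a lockup header (for example the
        # initial oops) are skipped because no task is pending.
        if rdi and current is not None:
            groups[rdi.group(1)].add(current)
            current = None
    return dict(groups)


summary = tasks_by_rdi(SAMPLE)
for rdi, tasks in summary.items():
    print(rdi, sorted(tasks))
```

On this excerpt all three tasks share RDI ffff88005cd79780, which lies 0x780 bytes past the ffff88005cd79000 value held in RDI, R12 and R15 in the initial oops. That would fit all three spinning on one lock inside the same structure, but the log alone does not confirm it.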

Comment 1 Steve Whitehouse 2009-09-02 08:00:07 UTC
You appear to be using journaled data mode. Please confirm the mount arguments which you used.

Comment 2 Steve Whitehouse 2009-09-02 10:33:27 UTC
The issue is that somehow we have tried to invalidate a page whilst not in a transaction. Everything else that you've seen has just followed on from that one issue.

It looks like there is a page which has been left dirty and then truncated such that when writepage has been passed the page, its only option is to remove it from the journal as it is no longer required for writing back to disk. During this process it has tried to remove the page from the journal and hit this bug.

I suspect that if you turn off journaled data mode then that will work around the issue in the short term while we try and come up with a fix for the bug.
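
As a cross-check of the oops in the description, the Python sketch below decodes the page-fault error code ("Oops: 0002") and the first three bytes of the "Code:" line. It assumes that the "Code:" bytes start at the faulting RIP, and it handles only the one instruction form needed here. The helper names are invented for this illustration.

```python
REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # False: page not present
        "write": bool(code & 0x2),                 # False: read access
        "user_mode": bool(code & 0x4),             # False: kernel mode
    }


def decode_inc_disp8(code_bytes):
    """Decode 'ff /0' with mod=01, i.e. incl disp8(%reg).

    Returns (register name, signed displacement). Any other encoding
    is rejected, since this is not a general disassembler.
    """
    opcode, modrm, disp = code_bytes[0], code_bytes[1], code_bytes[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0xFF or mod != 0b01 or reg != 0 or rm == 0b100:
        raise ValueError("not the simple 'incl disp8(%reg)' form")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG_NAMES[rm], disp


# Values copied from the oops in the description.
code = bytes.fromhex("ff 40 78 c7 40 50 01 00 00 00")
registers = {"rax": 0x0}
cr2 = 0x78

fault_bits = decode_page_fault_error(0x0002)
reg, disp = decode_inc_disp8(code)
fault_address = registers[reg] + disp

print(fault_bits)
print("incl 0x%x(%%%s) -> address 0x%x" % (disp, reg, fault_address))
```

Under those assumptions the faulting instruction is an increment of a 32-bit field at offset 0x78 from RAX. With RAX equal to zero this gives address 0x78, which matches CR2, and the error code describes a kernel-mode write to a page that is not present. That is consistent with a NULL structure pointer being dereferenced in revoke_lo_add, as would happen if no transaction were active.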

Comment 3 Steve Whitehouse 2009-09-02 10:54:08 UTC
Created attachment 359507 [details]
Proposed fix (upstream)

I'll try and get a RHEL version sorted out ready for testing.

Comment 4 Steve Whitehouse 2009-09-02 13:16:31 UTC
Created attachment 359518 [details]
Proposed fix (RHEL)

Needs testing, but I suspect that this will do the trick.

Comment 5 Cleber Paiva de Souza 2009-09-02 14:41:20 UTC
(In reply to comment #1)
> You appear to be using journaled data mode. Please confirm the mount arguments
> which you used.  

I used the acl and quota=account mount options.

Comment 6 Steve Whitehouse 2009-09-02 14:48:55 UTC
Did you do a chattr +j on any files/directories?

Comment 7 Cleber Paiva de Souza 2009-09-02 14:57:16 UTC
(In reply to comment #6)
> Did you do a chattr +j on any files/directories?  

No, I only mounted the filesystem and transferred the files. I changed no settings with chattr or setfacl.
On the original filesystem (ext3), from which the files were transferred, I used ACLs and have directories which were 'setfacl'ed.

Now I'm just testing with gfs version 1, and there has been no problem so far with the same files: almost 200 GB of data transferred. With gfs2 the system breaks after at most 10 GB of data transferred, sometimes sooner.

The next test will be disabling data journaling for gfs2.

Comment 8 Cleber Paiva de Souza 2009-09-02 15:22:23 UTC
(In reply to comment #2)
> The issue is that somehow we have tried to invalidate a page whilst not in a
> transaction. Everything else that you've seen has just followed on from that
> one issue.
> 
> It looks like there is a page which has been left dirty and then truncated such
> that when writepage has been passed the page, its only option is to remove it
> from the journal as it is no longer required for writing back to disk. During
> this process it has tried to remove the page from the journal and hit this bug.
> 
> I suspect that if you turn off journaled data mode then that will work around
> the issue in the short term while we try and come up with a fix for the bug.  

The partition was already mounted as data=ordered, since this is the default and I did not specify anything for data= when mounting.

Comment 9 RHEL Program Management 2009-09-25 17:43:12 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Steve Whitehouse 2009-12-02 15:03:11 UTC
I haven't been able to figure out what is going on here yet. If you upgrade to the latest 5.4 kernel does the issue go away?

Also, if you upgraded from 5.2, please check that the gfs2 kmod isn't still around. It is known to be broken, and unfortunately the upgrade process doesn't remove it, so it will load in preference to the 5.3 gfs2 module unless it is removed by hand.
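
One way to check which gfs2 module would be loaded is to look at the path that `modinfo -n gfs2` reports. The Python sketch below classifies such a path. It assumes the usual /lib/modules/<release>/ layout, in which modules shipped with the kernel live under kernel/ and separately packaged ones live elsewhere (for example extra/ or weak-updates/). The function names and the example paths are illustrative and are not taken from this report.

```python
import subprocess


def looks_like_leftover_kmod(module_path):
    """Return True if the module file sits outside the kernel's own tree.

    Assumed layout: /lib/modules/<release>/<subdir>/... where <subdir> is
    'kernel' for modules shipped with the kernel package itself.
    """
    parts = module_path.strip().split("/")
    try:
        subdir = parts[parts.index("modules") + 2]
    except (ValueError, IndexError):
        raise ValueError("unexpected module path: %r" % module_path) from None
    return subdir != "kernel"


def gfs2_module_path():
    """Ask modinfo which gfs2.ko file would be used (not called here)."""
    return subprocess.check_output(["modinfo", "-n", "gfs2"], text=True).strip()


# Hypothetical example paths, for illustration only.
in_tree = "/lib/modules/2.6.18-128.7.1.el5xen/kernel/fs/gfs2/gfs2.ko"
leftover = "/lib/modules/2.6.18-128.7.1.el5xen/weak-updates/gfs2/gfs2.ko"

print(looks_like_leftover_kmod(in_tree))
print(looks_like_leftover_kmod(leftover))
```

On the affected machine, `looks_like_leftover_kmod(gfs2_module_path())` returning True would suggest that an older, separately packaged module is shadowing the 5.3 one and should be removed by hand.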

Comment 14 Steve Whitehouse 2010-03-12 10:04:26 UTC
This issue has been in needinfo for several months now. I greatly suspect that it was caused by a left-over gfs2 kmod. We've had one report via the mailing lists of a very similar result which appeared to have been caused by exactly the same thing (left over kmod).

Since we've heard nothing more from the reporters of either issue since the suggestion to check for the kmod, I assume that must have been the cause. We are therefore closing this issue and if that is incorrect, please reopen the bug.