Bug 1120928 - Storage LUNs lost randomly when using volumes as CIFS
Summary: Storage LUNs lost randomly when using volumes as CIFS
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: cifs-maint
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-07-18 02:16 UTC by Do Hakyong
Modified: 2016-07-01 10:15 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-01 10:15:31 UTC


Attachments
message.log (4.15 MB, text/plain)
2014-07-18 02:16 UTC, Do Hakyong

Description Do Hakyong 2014-07-18 02:16:58 UTC
Created attachment 918931 [details]
message.log

Description of problem:
We set up a CIFS server for use at a hospital. The environment is an HP ProLiant DL380p Gen8 server (certified) + RHEL 5.8 (x86_64) + Veritas file system + Hitachi storage.
The problem is that while the Samba server is running, we randomly lose LUNs, with the messages below:

----------------------------------------------------------------------------

Jul 10 11:44:54 IV03 kernel: qla2xxx 0000:07:00.1: Mailbox command timeout occurred, cmd=0x54 mb[0]=0x54. Issuing ISP abort.
Jul 10 11:44:54 IV03 kernel: qla2xxx 0000:07:00.1: Performing ISP error recovery - ha= ffff81083ae584f8.
Jul 10 11:44:55 IV03 kernel: qla2xxx 0000:07:00.1: LIP reset occured (f700).
Jul 10 11:44:55 IV03 kernel: qla2xxx 0000:07:00.1: LOOP UP detected (4 Gbps).
Jul 10 11:44:56 IV03 kernel: qla2xxx 0000:07:00.1: scsi(4:0:13): Abort command issued -- 0 410191 2002.
Jul 10 11:45:36 IV03 kernel: qla2xxx 0000:07:00.1: Mailbox command timeout occurred, cmd=0x54 mb[0]=0x54. Issuing ISP abort.
Jul 10 11:45:36 IV03 kernel: qla2xxx 0000:07:00.1: Performing ISP error recovery - ha= ffff81083ae584f8.
Jul 10 11:45:37 IV03 kernel: qla2xxx 0000:07:00.1: LIP reset occured (f700).
Jul 10 11:45:37 IV03 kernel: qla2xxx 0000:07:00.1: LOOP UP detected (4 Gbps).
Jul 10 11:45:37 IV03 kernel: qla2xxx 0000:07:00.1: scsi(4:0:13): Abort command issued -- 0 410191 2002.
Jul 10 11:46:17 IV03 kernel: qla2xxx 0000:07:00.1: Mailbox command timeout occurred, cmd=0x54 mb[0]=0x54. Issuing ISP abort.
Jul 10 11:46:17 IV03 kernel: qla2xxx 0000:07:00.1: Performing ISP error recovery - ha= ffff81083ae584f8.
Jul 10 11:46:19 IV03 kernel: qla2xxx 0000:07:00.1: LIP reset occured (f700).
Jul 10 11:46:19 IV03 kernel: qla2xxx 0000:07:00.1: LOOP UP detected (4 Gbps).
Jul 10 11:46:19 IV03 kernel: qla2xxx 0000:07:00.1: scsi(4:0:22): Abort command issued -- 0 410192 2002.
Jul 10 11:46:48 IV03 kernel: INFO: task vx_worklist_thr:9957 blocked for more than 120 seconds.
Jul 10 11:46:48 IV03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 10 11:46:48 IV03 kernel: vx_worklist_t D ffffffff801568f1     0  9957      1          9958  9956 (L-TLB)
Jul 10 11:46:48 IV03 kernel:  ffff8108282f5ae0 0000000000000046 ffffffffffffff9c ffff810840002c00
Jul 10 11:46:48 IV03 kernel:  0000000000011200 000000000000000a ffff810834013820 ffff81103fe4e0c0
Jul 10 11:46:48 IV03 kernel:  0003e5334062270b 00000000002ac803 ffff810834013a08 0000000a00031200
Jul 10 11:46:48 IV03 kernel: Call Trace:
Jul 10 11:46:48 IV03 kernel:  [<ffffffff8006ece7>] do_gettimeofday+0x40/0x90
Jul 10 11:46:48 IV03 kernel:  [<ffffffff80028c7d>] sync_page+0x0/0x43
Jul 10 11:46:48 IV03 kernel:  [<ffffffff800637de>] io_schedule+0x3f/0x67
Jul 10 11:46:48 IV03 kernel:  [<ffffffff80028cbb>] sync_page+0x3e/0x43
Jul 10 11:46:48 IV03 kernel:  [<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
Jul 10 11:46:48 IV03 kernel:  [<ffffffff800350fb>] wait_on_page_bit+0x6c/0x72
Jul 10 11:46:48 IV03 kernel:  [<ffffffff800a34d9>] wake_bit_function+0x0/0x23
Jul 10 11:46:48 IV03 kernel:  [<ffffffff886c7941>] :vxfs:vx_pvn_wait_writeback+0x64/0xca
Jul 10 11:46:48 IV03 kernel:  [<ffffffff886cc91e>] :vxfs:vx_pvn_range_dirty+0xad5/0xb63
Jul 10 11:46:48 IV03 kernel:  [<ffffffff886c78cc>] :vxfs:vx_pvn_lookup_dirty_tag+0x0/0x7
Jul 10 11:46:48 IV03 kernel:  [<ffffffff886ccba8>] :vxfs:vx_putpage_dirty_wbc+0xdf/0xec
Jul 10 11:46:48 IV03 kernel:  [<ffffffff80044e51>] mempool_free_slab+0x0/0xe
Jul 10 11:46:48 IV03 kernel:  [<ffffffff886cccd2>] :vxfs:vx_putpage_dirty+0x29/0x2e
Jul 10 11:46:48 IV03 kernel:  [<ffffffff886ac9f2>] :vxfs:vx_do_putpage+0xc2/0x147
Jul 10 11:46:48 IV03 kernel:  [<ffffffff886371af>] :vxfs:vx_idelxwri_flush+0x11e/0x211
Jul 10 11:46:48 IV03 kernel:  [<ffffffff8864baea>] :vxfs:vx_idalloc_off+0x303/0x413
Jul 10 11:46:48 IV03 kernel:  [<ffffffff88639a11>] :vxfs:vx_dalloc_flush+0x186/0x21d
Jul 10 11:46:48 IV03 kernel:  [<ffffffff88635d20>] :vxfs:vx_workitem_process+0x2d/0x3d
Jul 10 11:46:48 IV03 kernel:  [<ffffffff88635ef7>] :vxfs:vx_worklist_process+0x1c7/0x2a8
Jul 10 11:46:48 IV03 kernel:  [<ffffffff88639207>] :vxfs:vx_worklist_thread+0x0/0x98
Jul 10 11:46:48 IV03 kernel:  [<ffffffff88639261>] :vxfs:vx_worklist_thread+0x5a/0x98
Jul 10 11:46:48 IV03 kernel:  [<ffffffff8868f830>] :vxfs:vx_kthread_init+0x57/0x5e
Jul 10 11:46:48 IV03 kernel:  [<ffffffff88639207>] :vxfs:vx_worklist_thread+0x0/0x98
Jul 10 11:46:48 IV03 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jul 10 11:46:48 IV03 kernel:  [<ffffffff88639207>] :vxfs:vx_worklist_thread+0x0/0x98
Jul 10 11:46:48 IV03 kernel:  [<ffffffff8868f7d9>] :vxfs:vx_kthread_init+0x0/0x5e
Jul 10 11:46:48 IV03 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jul 10 11:46:48 IV03 kernel: 
Jul 10 11:46:48 IV03 kernel: INFO: task vx_worklist_thr:9960 blocked for more than 120 seconds.
Jul 10 11:46:48 IV03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 10 11:46:48 IV03 kernel: vx_worklist_t D ffffffff801568f1     0  9960      1          9961  9959 (L-TLB)
Jul 10 11:46:48 IV03 kernel:  ffff8108282fba30 0000000000000046 ffff810839102860 ffff810840002c00
Jul 10 11:46:48 IV03 kernel:  0000000000011200 000000000000000a ffff810839102860 ffff8108400dc080
Jul 10 11:46:48 IV03 kernel:  0003e542d2905269 00000000000022b7 ffff810839102a48 000000053a5079b8

-------------------------------------------------------------------------------

I've also attached message.log; please refer to the attachment.
The LUNs are formatted with the Veritas file system (vxfs), and MPIO is configured and handled by the Veritas file system.
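
As a reading aid for the log excerpt in this report, here is a minimal Python sketch that tallies the qla2xxx events per HBA PCI function and per SCSI address. The function name `summarize_qla_events`, the regular expressions, and the `SAMPLE` list are illustrative assumptions based only on the line format shown in the excerpt; they are not part of any Red Hat or QLogic tool. In the excerpt, every qla2xxx event comes from the same PCI function, 0000:07:00.1.

```python
import re
from collections import Counter

# Matches syslog lines of the form seen in the excerpt, e.g.
# "Jul 10 11:44:54 IV03 kernel: qla2xxx 0000:07:00.1: <message>"
QLA_LINE = re.compile(
    r"kernel: qla2xxx "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)$"
)
# Pulls the "4:0:13" style address out of an abort message.
ABORT_TARGET = re.compile(r"scsi\((\d+:\d+:\d+)\): Abort command issued")


def summarize_qla_events(lines):
    """Count mailbox timeouts per PCI function and aborts per SCSI address."""
    timeouts = Counter()
    aborts = Counter()
    for line in lines:
        match = QLA_LINE.search(line.rstrip("\n"))
        if match is None:
            continue  # not a qla2xxx driver message
        msg = match.group("msg")
        if msg.startswith("Mailbox command timeout occurred"):
            timeouts[match.group("pci")] += 1
        target = ABORT_TARGET.search(msg)
        if target:
            aborts[target.group(1)] += 1
    return timeouts, aborts


# Three lines copied from the excerpt in this report.
SAMPLE = [
    "Jul 10 11:44:54 IV03 kernel: qla2xxx 0000:07:00.1: Mailbox command "
    "timeout occurred, cmd=0x54 mb[0]=0x54. Issuing ISP abort.",
    "Jul 10 11:44:56 IV03 kernel: qla2xxx 0000:07:00.1: scsi(4:0:13): "
    "Abort command issued -- 0 410191 2002.",
    "Jul 10 11:46:19 IV03 kernel: qla2xxx 0000:07:00.1: scsi(4:0:22): "
    "Abort command issued -- 0 410192 2002.",
]

timeouts, aborts = summarize_qla_events(SAMPLE)
print(dict(timeouts))
print(dict(aborts))
```

To run the same tally over the attached log, the open file object can be passed in place of `SAMPLE`, for example `summarize_qla_events(open("message.log"))`, assuming the attachment has been saved under that name.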

In my opinion, even if we lose some of the paths, CIFS should keep working, because we have 4 paths from the SAN switch (2 HBA cards are installed). Even if one or two ports die, the service should not be affected.
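
One way to check which HBA ports the kernel itself reports as up is to read the Fibre Channel transport attributes in sysfs. The following is a minimal sketch, assuming the standard `/sys/class/fc_host/hostN/port_state` files are present on this kernel. The helper names `fc_port_states` and `offline_ports` are hypothetical. This shows the link state per HBA port only, not the per-path state tracked by the Veritas multipathing layer.

```python
import os


def fc_port_states(sysfs_root="/sys/class/fc_host"):
    """Return {host_name: port_state} for each FC host found under sysfs_root."""
    states = {}
    if not os.path.isdir(sysfs_root):
        return states  # no FC transport class on this system
    for host in sorted(os.listdir(sysfs_root)):
        state_file = os.path.join(sysfs_root, host, "port_state")
        try:
            with open(state_file) as handle:
                states[host] = handle.read().strip()
        except OSError:
            # Attribute missing or unreadable: record that instead of failing.
            states[host] = "unknown"
    return states


def offline_ports(states):
    """List the hosts whose reported port state is anything other than Online."""
    return [host for host, state in sorted(states.items()) if state != "Online"]


if __name__ == "__main__":
    current = fc_port_states()
    print(current)
    print("not online:", offline_ports(current))
```

If all ports report Online while LUNs still disappear, that would point away from a simple link-down condition, though it does not by itself identify the failing component.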

We tried changing the HBA card (QLogic to Emulex), but the same issue still occurs, so it is not an HBA card problem.

The Veritas engineer said that it is not a Veritas problem and simply pushed the problem onto the OS, the storage, or the HBA card.

I've been suffering with this problem for over 10 days... please let me go home... (cry)

Your valuable advice would be a great help to me.
THANKS!


Version-Release number of selected component (if applicable):
RHEL 5.8(x86_64)


How reproducible:


Steps to Reproduce:
1. Configure a CIFS server with a storage volume (the volume is formatted with the Veritas file system).
2. After starting CIFS, a few hours or a few days later the problem occurs with the log messages above.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Sachin Prabhu 2016-07-01 10:15:31 UTC
Hello,

This issue seems to have slipped through the cracks. Please report such problems to Red Hat support to ensure adequate attention is given to issues.

I am closing this issue since it was reported a couple of years ago and hasn't had any updates yet. RHEL 5 is currently in maintenance phase. Please re-open the case with GSS if you would like to continue debugging the problem.

Sachin Prabhu

