Bug 764840 (GLUSTER-3108)

Summary: iozone fails with Bad file descriptor - stripe-replicate
Product: [Community] GlusterFS Reporter: Lakshmipathi G <lakshmipathi>
Component: nfsAssignee: Vivek Agarwal <vagarwal>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.2.1CC: amarts, gluster-bugs, jdarcy, nuaa_liuben, rme, sankarshan
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 852571 (view as bug list) Environment:
Last Closed: 2013-07-24 17:29:37 UTC Type: ---
Regression: RTP Mount Type: nfs
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 852571    
Attachments:
Description Flags
nfs log file none

Description Lakshmipathi G 2011-06-29 12:35:43 UTC
While running iozone on nfs with following setup it fails -
# gluster volume info
 
Volume Name: dsr
Type: Striped-Replicate (RAID 01)
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.1.12.172:/export/dsr1
Brick2: 10.1.12.172:/export/dsr2
Brick3: 10.1.12.170:/export/dsr1
Brick4: 10.1.12.170:/export/dsr2
Brick5: 10.1.12.173:/export/dsr1
Brick6: 10.1.12.173:/export/dsr2
Options Reconfigured:
diagnostics.client-log-level: TRACE
diagnostics.brick-log-level: TRACE


iozone 
-----
            128      16 2608629 2511022    79846  6406138 4759253 4104338 4596273 4267461 5847904  2140585  2326073   76785  5847904
             128      32 2784517 3657016    66086  6727225 3759450 4135958   41077 4267461 5545860  3359523  3359523   78092  6045455
             128      64 2770150 3199360    66834  7082197 4557257 3359523 5784891 3560017 5847904  1406136  4596273   77956  6406138
             128     128 2166499 2558895    89750  7176872 6406138 4104338   87483 3124872 4557257  3468030  3277486   78183  3982553
             256       4 2170027 2665657
Error reading block 32 2b0341500000
read: Bad file descriptor


nfslog-

[2011-06-29 05:22:04.58028] T [rpc-clnt.c:641:rpc_clnt_reply_init] 0-dsr-client-4: received rpc message (RPC XID: 0x1415x Program: GlusterFS 3.1, ProgVers: 310, Proc: 1) from rpc-transport (dsr-client-4)
[2011-06-29 05:22:04.58400] T [rpc-clnt.c:641:rpc_clnt_reply_init] 0-dsr-client-4: received rpc message (RPC XID: 0x1416x Program: GlusterFS 3.1, ProgVers: 310, Proc: 1) from rpc-transport (dsr-client-4)
[2011-06-29 05:22:04.58423] T [rpc-clnt.c:641:rpc_clnt_reply_init] 0-dsr-client-0: received rpc message (RPC XID: 0x3139x Program: GlusterFS 3.1, ProgVers: 310, Proc: 1) from rpc-transport (dsr-client-0)
[2011-06-29 05:22:04.58507] D [nfs3-helpers.c:2424:nfs3_log_common_res] 0-nfs-nfsv3: XID: 7716401c, ACCESS: NFS: 0(Call completed successfully.), POSIX: 0(Success)
[2011-06-29 05:22:04.58530] T [rpcsvc.c:1675:nfs_rpcsvc_submit_generic] 0-nfsrpc: Tx message: 96
[2011-06-29 05:22:04.58546] T [rpcsvc.c:1456:nfs_rpcsvc_record_build_header] 0-nfsrpc: Reply fraglen 120, payload: 96, rpc hdr: 24
[2011-06-29 05:22:04.58554] T [rpcsvc.c:1493:nfs_rpcsvc_conn_submit] 0-nfsrpc: Tx Header: 28, payload: 96
[2011-06-29 05:22:04.58582] T [rpcsvc.c:2571:__nfs_rpcsvc_conn_data_poll_out] 0-nfsrpc: First Tx Buf
[2011-06-29 05:22:04.58613] T [rpcsvc.c:2582:__nfs_rpcsvc_conn_data_poll_out] 0-nfsrpc: conn: 0x5dff000, Tx request: 28, Tx sent: 28
[2011-06-29 05:22:04.58647] T [rpcsvc.c:2578:__nfs_rpcsvc_conn_data_poll_out] 0-nfsrpc: Last Tx Buf
[2011-06-29 05:22:04.58595] T [rpc-clnt.c:641:rpc_clnt_reply_init] 0-dsr-client-0: received rpc message (RPC XID: 0x3140x Program: GlusterFS 3.1, ProgVers: 310, Proc: 1) from rpc-transport (dsr-client-0)
[2011-06-29 05:22:04.58673] T [rpcsvc.c:2582:__nfs_rpcsvc_conn_data_poll_out] 0-nfsrpc: conn: 0x5dff000, Tx request: 96, Tx sent: 96
[2011-06-29 05:22:04.58709] D [nfs3-helpers.c:2424:nfs3_log_common_res] 0-nfs-nfsv3: XID: 7816401c, ACCESS: NFS: 0(Call completed successfully.), POSIX: 0(Success)
[2011-06-29 05:22:04.58742] T [rpcsvc.c:1675:nfs_rpcsvc_submit_generic] 0-nfsrpc: Tx message: 96
[2011-06-29 05:22:04.58751] T [rpcsvc.c:1456:nfs_rpcsvc_record_build_header] 0-nfsrpc: Reply fraglen 120, payload: 96, rpc hdr: 24
[2011-06-29 05:22:04.58758] T [rpcsvc.c:1493:nfs_rpcsvc_conn_submit] 0-nfsrpc: Tx Header: 28, payload: 96
[2011-06-29 05:22:04.58779] T [rpcsvc.c:2571:__nfs_rpcsvc_conn_data_poll_out] 0-nfsrpc: First Tx Buf
[2011-06-29 05:22:04.58793] T [rpcsvc.c:2582:__nfs_rpcsvc_conn_data_poll_out] 0-nfsrpc: conn: 0x5dff000, Tx request: 28, Tx sent: 28
[2011-06-29 05:22:04.58802] T [rpcsvc.c:2578:__nfs_rpcsvc_conn_data_poll_out] 0-nfsrpc: Last Tx Buf
[2011-06-29 05:22:04.58820] T [rpcsvc.c:2582:__nfs_rpcsvc_conn_data_poll_out] 0-nfsrpc: conn: 0x5dff000, Tx request: 96, Tx sent: 96
[2011-06-29 05:22:16.220841] D [client-handshake.c:179:client_start_ping] 0-dsr-client-3: returning as transport is already disconnected OR there are no frames (0 || 0)
[2011-06-29 05:22:16.220885] D [client-handshake.c:179:client_start_ping] 0-dsr-client-1: returning as transport is already disconnected OR there are no frames (0 || 0)
[2011-06-29 05:22:16.220894] D [client-handshake.c:179:client_start_ping] 0-dsr-client-2: returning as transport is already disconnected OR there are no frames (0 || 0)
[2011-06-29 05:22:16.220902] D [client-handshake.c:179:client_start_ping] 0-dsr-client-5: returning as transport is already disconnected OR there are no frames (0 || 0)
[2011-06-29 05:22:24.233837] D [client-handshake.c:179:client_start_ping] 0-dsr-client-4: returning as transport is already disconnected OR there are no frames (0 || 0)
[2011-06-29 05:22:24.233891] D [client-handshake.c:179:client_start_ping] 0-dsr-client-0: returning as transport is already disconnected OR there are no frames (0 || 0)



glfs-log-
========

[2011-06-29 05:21:43.939025] T [rpcsvc.c:443:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 1010
[2011-06-29 05:21:43.939044] T [rpcsvc-auth.c:302:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS
[2011-06-29 05:21:43.939052] T [rpcsvc.c:356:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0xc42, Ver: 2, Program: 1298437, ProgVers: 310, Proc: 12) from rpc-transport (tcp.dsr-server)
[2011-06-29 05:21:43.939060] T [auth-glusterfs.c:191:auth_glusterfs_authenticate] 0-rpc-service: Auth Info: pid: 1, uid: 0, gid: 0, owner: 1357
[2011-06-29 05:21:43.939067] T [rpcsvc.c:187:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS-3.1.0 - READ
[2011-06-29 05:21:43.939121] T [rpcsvc.c:1003:rpcsvc_submit_generic] 0-rpc-service: Tx message: 16496
[2011-06-29 05:21:43.939138] T [rpcsvc.c:640:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 16520, payload: 16496, rpc hdr: 24
[2011-06-29 05:21:43.939164] T [rpcsvc.c:1040:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x3138x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 12) to rpc-transport (tcp.dsr-server)
[2011-06-29 05:22:04.58065] T [rpcsvc.c:443:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 1010
[2011-06-29 05:22:04.58102] T [rpcsvc-auth.c:302:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS
[2011-06-29 05:22:04.58113] T [rpcsvc.c:356:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0xc43, Ver: 2, Program: 1298437, ProgVers: 310, Proc: 1) from rpc-transport (tcp.dsr-server)
[2011-06-29 05:22:04.58128] T [auth-glusterfs.c:191:auth_glusterfs_authenticate] 0-rpc-service: Auth Info: pid: 1, uid: 0, gid: 0, owner: 1358
[2011-06-29 05:22:04.58142] T [rpcsvc.c:187:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS-3.1.0 - STAT
[2011-06-29 05:22:04.58163] T [server-resolve.c:127:resolve_loc_touchup] 0-dsr-server: return value inode_path 1
[2011-06-29 05:22:04.58200] T [rpcsvc.c:443:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 1010
[2011-06-29 05:22:04.58211] T [rpcsvc-auth.c:302:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS
[2011-06-29 05:22:04.58220] T [rpcsvc.c:356:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0xc44, Ver: 2, Program: 1298437, ProgVers: 310, Proc: 1) from rpc-transport (tcp.dsr-server)
[2011-06-29 05:22:04.58231] T [auth-glusterfs.c:191:auth_glusterfs_authenticate] 0-rpc-service: Auth Info: pid: 1, uid: 0, gid: 0, owner: 1359
[2011-06-29 05:22:04.58247] T [rpcsvc.c:187:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS-3.1.0 - STAT
[2011-06-29 05:22:04.58234] T [rpcsvc.c:1003:rpcsvc_submit_generic] 0-rpc-service: Tx message: 108
[2011-06-29 05:22:04.58279] T [rpcsvc.c:640:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 132, payload: 108, rpc hdr: 24
[2011-06-29 05:22:04.58260] T [server-resolve.c:127:resolve_loc_touchup] 0-dsr-server: return value inode_path 1
[2011-06-29 05:22:04.58298] T [rpcsvc.c:1040:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x3139x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 1) to rpc-transport (tcp.dsr-server)
[2011-06-29 05:22:04.58322] T [rpcsvc.c:1003:rpcsvc_submit_generic] 0-rpc-service: Tx message: 108
[2011-06-29 05:22:04.58352] T [rpcsvc.c:640:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 132, payload: 108, rpc hdr: 24
[2011-06-29 05:22:04.58370] T [rpcsvc.c:1040:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x3140x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 1) to rpc-transport (tcp.dsr-server)

Comment 1 Shehjar Tikoo 2011-06-30 01:19:21 UTC
Lakshmi, do we have the complete log?

Comment 2 Lakshmipathi G 2011-07-01 03:14:51 UTC
Created attachment 538 [details]
LSMOD

Comment 3 Amar Tumballi 2011-09-22 03:27:34 UTC
Please check if it is happening now, and take a resolution.

Comment 4 Lakshmipathi G 2011-09-23 06:40:07 UTC
(In reply to comment #3)
> Please check if it is happening now, and take a resolution.

Yes,iozone still fails with stripe-replicated (tested with 3.3.qa10)

------
        Command line used: /opt/qa/tools/iozone -a -s 1m
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
            1024       4 1389594 1999700
Error reading block 224 2b07cd500000
read: Bad file descriptor
------

Comment 5 Lakshmipathi G 2011-09-28 05:47:51 UTC
note: seems like it starts to fail if size >= 132KB 
---------------
        File size set to 132 KB
        Command line used: /opt/qa/tools/iozone -a -s 132K
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
             132       4 1166051 2757133
Error reading block 32 2b8195900000
read: Bad file descriptor
----------
with size 131KB
=======
       File size set to 131 KB
        Command line used: /opt/qa/tools/iozone -a -s 131K
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
             131       4 1336145 2558895  3124872  3982553 2558895 2420455 2673584  2843510  2843510  2066432  1997245 3468030  3895854
             131       8 2969325 3277486  4934216  5122535 4135958 3124872 3759450  3982553  5122535  2558895  2409592 4717434  4934216
             131      16 2969325 3380677  5603747  5545860 4934216 2511022 4267461  3982553  5325799  2660335  3297617   37439  5122535
             131      32 3124872 3536566  2175277  6406138 4135958 2558895   42967  4267461  5847904  1967960  3657016   56436  5603747
             131      64 3199360 2843510  5847904  4557257 3867787 3759450 4934216  2905056  6114306  3277486  5122535 5545860  6114306
             131     128 3053774 3759450  6114306  6406138 6045455 3759450 3106789  3895854  5545860  3199360  3657016 5847904  6114306

iozone test complete.

Comment 6 Krishna Srinivas 2012-11-02 07:43:16 UTC
It works in the latest code base:

[root@escape nfs]# iozone -a -s 1m
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.414 $
		Compiled for 64 bit mode.
		Build: linux 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
	             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
	             Vangel Bojaxhi, Ben England.

	Run began: Fri Nov  2 13:08:47 2012

	Auto Mode
	File size set to 1024 KB
	Command line used: iozone -a -s 1m
	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
            1024       4  612446 1689730   281936  4361233 5120336 2869422    8944  1554004  2611230  1590254  1244266  378551  4114719
            1024       8 2343383 3436508   352398  5479632 8750850 4232305   19586  2129589  1765437   930894   830457  314308  4805258
            1024      16  722717 2444759   554060  6208343 9485232 4339202   54662  2019445  4762630  3229770  1013967  368198  5593820
            1024      32  739898 3567824   505182  4570058 5690162 2993413 6569179  4550690  3281592  1610528  1288307  375079  4470172
            1024      64  760197 3735385   392492  3894582 5885083 2646630  184074  6918376 11645609  1357961  1095429  450498  5853003
            1024     128  727368 1014925   642785  3472628 5018624 2391665 4949226  4162573 11132462  1957770  2645000  330641  4095102
            1024     256  736725  998874   643170  4949226 6441107 3191372  297851  3200886   699414  2908281  2497356  465689  5096034
            1024     512  758452 3403826  2432298  8199542 9743447 4720752  395382  2403712   507151  1662264  5444898  311053  4700088
            1024    1024  890553 3358576  5363307 10662628 9464330 4433259  259439  4488860  4282950  4700088  3003881 7537599  8914314

iozone test complete.
[root@escape nfs]#

Comment 7 Krishna Srinivas 2012-11-02 09:11:10 UTC
Putting it on_qa to verify this in the latest code base as it works fine for me.