Bug 976835
Summary: NFS: Bonnie++ fails with Can't write block.: Input/output error

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ben Turner <bturner> |
| Component: | glusterd | Assignee: | Santosh Pradhan <spradhan> |
| Status: | CLOSED ERRATA | QA Contact: | Ben Turner <bturner> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.1 | CC: | rhs-bugs, shaines, vagarwal, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-devel-3.4.0.12rhs-1 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-09-23 22:39:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Ben Turner
2013-06-21 15:18:29 UTC
More errors from the server being mounted:

[2013-06-21 14:42:29.010032] E [rpc-clnt.c:207:call_bail] 0-testvol-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x137608x sent = 2013-06-21 14:12:28.982549. timeout = 1800
[2013-06-21 14:42:29.486573] E [rpc-clnt.c:207:call_bail] 0-testvol-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x137607x sent = 2013-06-21 14:12:28.975136. timeout = 1800
[2013-06-21 14:42:29.488010] C [rpc-clnt.c:460:rpc_clnt_fill_request_info] 0-testvol-client-0: cannot lookup the saved frame corresponding to xid (137339)
[2013-06-21 14:42:29.488132] E [rpc-clnt.c:207:call_bail] 0-testvol-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x137606x sent = 2013-06-21 14:12:28.967714. timeout = 1800
[2013-06-21 14:42:29.488328] E [rpc-clnt.c:207:call_bail] 0-testvol-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x137605x sent = 2013-06-21 14:12:28.960302. timeout = 1800
[2013-06-21 14:42:29.488404] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f9daeb6def8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f9daeb6db23] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f9daeb6d23e]))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2013-06-21 14:12:29.057388 (xid=0x137609x)
[2013-06-21 14:42:29.493496] E [rpc-clnt.c:207:call_bail] 0-testvol-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x137604x sent = 2013-06-21 14:12:28.952882. timeout = 1800

Created attachment 763893 [details]
sos server mounted NFS
Created attachment 763894 [details]
sos server
Created attachment 763895 [details]
sos client
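One sanity check on the call_bail messages above: the bailed frames were sent at 14:12:28–29 and bailed out at 14:42:29, which should match the logged frame timeout of 1800 seconds. A minimal sketch of that arithmetic, assuming GNU date and using timestamps copied from the log above:

```shell
# Timestamps taken from the call_bail lines in the client log (seconds precision)
sent="2013-06-21 14:12:29"
bailed="2013-06-21 14:42:29"
# The difference should equal the logged "timeout = 1800"
echo $(( $(date -u -d "$bailed" +%s) - $(date -u -d "$sent" +%s) ))
```

This prints 1800, i.e. the frames were bailed exactly at the RPC frame timeout, consistent with the bricks never replying rather than replying slowly.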
In the NFS server log, I see the bricks are not connected. Log snippet:

============
0-testvol-client-0
------------------
[2013-06-21 14:42:29.559517] W [client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-testvol-client-0: remote operation failed: Transport endpoint is not connected
[2013-06-21 14:42:29.559565] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f9daeb6def8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f9daeb6db23] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f9daeb6d23e]))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2013-06-21 14:12:36.611217 (xid=0x138333x)
[2013-06-21 14:42:29.559572] W [client-rpc-fops.c:866:client3_3_writev_cbk] 0-testvol-client-0: remote operation failed: Transport endpoint is not connected
[2013-06-21 14:42:29.559591] W [rpc-clnt.c:1487:rpc_clnt_submit] 0-testvol-client-0: failed to submit rpc-request (XID: 0x156467x Program: GlusterFS 3.3, ProgVers: 330, Proc: 34) to rpc-transport (testvol-client-0)

0-testvol-client-1
------------------
[2013-06-21 14:43:15.323237] W [client-rpc-fops.c:1811:client3_3_fxattrop_cbk] 0-testvol-client-1: remote operation failed: Transport endpoint is not connected
[2013-06-21 14:43:15.323252] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f9daeb6def8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f9daeb6db23] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f9daeb6d23e]))) 0-testvol-client-1: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2013-06-21 14:43:13.538009 (xid=0x155907x)

Could you check why they were down?

Thanks,
Santosh

I tried with the latest build, i.e. glusterfs-devel-3.4.0.12rhs-1. My bonnie++ test PASSED with the following results on a 1x2 distributed-replicated volume.

===========Test result start=============
# /opt/qa/tools/bonnie++ -u root -d /mnt/nfsdir
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
santosh-1.locald 4G  1527  98 23227   3  3847   1  3155  99 288897  31  1325  98
Latency             12597us    5126ms    6014ms    9277us    6590us    2987ms
Version  1.96       ------Sequential Create------ --------Random Create--------
santosh-1.localdoma -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   170   2   553   8   491   6   253   3  1225   9   772   6
Latency             51831us    1127ms   74608us   52319us    8247us   64872us
1.96,1.96,santosh-1.localdomain,1,1372295002,4G,,1527,98,23227,3,3847,1,3155,99,288897,31,1325,98,16,,,,,170,2,553,8,491,6,253,3,1225,9,772,6,12597us,5126ms,6014ms,9277us,6590us,2987ms,51831us,1127ms,74608us,52319us,8247us,64872us
===============Test result End====================

Could you retry the test case? I am moving it to ON_QA for the time being; please feel free to reopen if you hit the issue again.

Thanks,
Santosh

I saw this problem in the glusterfs-3.4.0.10rhs-2.el6rhs.x86_64 and glusterfs-3.4.0.11rhs-2.el6rhs.x86_64 builds, but I haven't hit it in daily runs since the glusterfs-devel-3.4.0.12rhs-1 build dropped. Moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html