Bug 1411219
| Summary: | [Ganesha] : I/O Error post failover/failback | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
| Component: | nfs-ganesha | Assignee: | Soumya Koduri <skoduri> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Ambarish <asoman> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | amukherj, asoman, bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-21 12:44:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description by Ambarish, 2017-01-09 07:02:19 UTC
I could reproduce this error after running 4 instances of Bonnie++ as done by Ambarish:

```
Changing to the specified mountpoint
/mnt/nfs/dir1/run7163
executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...Bonnie: drastic I/O error (re-write read): Stale file handle
Total 0 tests were successful
Switching over to the previous working directory
Removing /mnt/nfs/dir2/run7164/
rmdir: failed to remove '/mnt/nfs/dir2/run7164/': Stale file handle
rmdir failed: Directory not empty
[root@dhcp9 ~]#
...
```

The mount point itself has become stale, and hence all 4 Bonnie++ instances also failed with STALE_FILE_HANDLE. This error was returned after failback (i.e., when nfs-ganesha was restarted):

```
[root@dhcp46-111 ~]# showmount -e
Export list for dhcp46-111.lab.eng.blr.redhat.com:
[root@dhcp46-111 ~]#
```

```
12/01/2017 15:17:26 : epoch c2810000 : dhcp46-111.lab.eng.blr.redhat.com : ganesha.nfsd-11504[main] glusterfs_create_export :FSAL :EVENT :Volume vol_ec exported at : '/'
12/01/2017 15:17:30 : epoch c2810000 : dhcp46-111.lab.eng.blr.redhat.com : ganesha.nfsd-11504[main] glusterfs_create_export :FSAL :CRIT :Unable to initialize volume. Export: /vol_ec
12/01/2017 15:17:31 : epoch c2810000 : dhcp46-111.lab.eng.blr.redhat.com : ganesha.nfsd-11504[main] fsal_cfg_commit :CONFIG :CRIT :Could not create export for (/vol_ec) to (/vol_ec)
12/01/2017 15:17:32 : epoch c2810000 : dhcp46-111.lab.eng.blr.redhat.com : ganesha.nfsd-11504[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
```
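The failed failback leaves a CRIT line from `glusterfs_create_export` in the ganesha log while `showmount -e` returns an empty export list. As a hedged sketch (the log location and the exact message pattern are assumptions based only on the lines quoted in this report), one could check for that failure signature after a restart:

```shell
#!/bin/sh
# Sketch: detect the "Unable to initialize volume" export failure seen in
# the ganesha.nfsd log after failback. The match string comes from the
# CRIT line quoted above; real deployments may log slightly differently.
export_failed() {
    # Returns 0 (success) if the given log file shows a failed export.
    grep -qF 'glusterfs_create_export :FSAL :CRIT :Unable to initialize volume' "$1"
}

# Demo against a temporary file seeded with log lines from this report.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
ganesha.nfsd-11504[main] glusterfs_create_export :FSAL :EVENT :Volume vol_ec exported at : '/'
ganesha.nfsd-11504[main] glusterfs_create_export :FSAL :CRIT :Unable to initialize volume. Export: /vol_ec
EOF

if export_failed "$logfile"; then
    echo "export failure detected"
fi
rm -f "$logfile"
```

In practice the same check could be pointed at the live ganesha log right after a failback, before trusting the mount again.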
```
[2017-01-12 09:47:26.703717] E [socket.c:2309:socket_connect_finish] 0-gfapi: connection to ::1:24007 failed (Connection refused)
[2017-01-12 09:47:26.703930] E [MSGID: 104024] [glfs-mgmt.c:735:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected) [Transport endpoint is not connected]
[2017-01-12 09:47:30.672554] E [MSGID: 104007] [glfs-mgmt.c:633:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch volume file (key:vol_ec) [Invalid argument]
[2017-01-12 09:47:30.689123] E [MSGID: 104024] [glfs-mgmt.c:735:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: localhost (No space left on device) [No space left on device]
```

Volume initialization failed either because of network connection failures or because there was no space left on the machine. In either case, the result was that the nfs-ganesha server did not export the volume, and hence the mount point became stale. I shall give the test one more run and see if we hit the same issue.

Since the volume did not get unexported, I did not hit this issue while trying to reproduce it. I hit some other network issue even before the EIO was hit (at the time of reporting). So far I have not got a clean run with Bonnie++ on Ganesha mounts with continuous failover/failback. I would say defer this and bring it back if and when QE gets a reproducer.

Reopen if seen again.
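The two suspected root causes (glusterd unreachable vs. disk full) surface as distinct error strings in the gfapi log lines above. As a hedged sketch, triage could be scripted by pattern-matching those strings; the patterns below are taken verbatim from the log excerpt in this report and are not an exhaustive list:

```shell
#!/bin/sh
# Sketch: classify a gfapi mgmt-connection error line into a probable
# root cause. Patterns are assumptions drawn from the log lines quoted
# in this report; other failure modes will fall through to "other".
classify_glfs_error() {
    case "$1" in
        *"Connection refused"*|*"Transport endpoint is not connected"*)
            echo network ;;
        *"No space left on device"*)
            echo disk ;;
        *)
            echo other ;;
    esac
}

classify_glfs_error "0-gfapi: connection to ::1:24007 failed (Connection refused)"
classify_glfs_error "0-glfs-mgmt: failed to connect with remote-host: localhost (No space left on device)"
```

Distinguishing the two matters here: a "network" result points at glusterd not being up yet when ganesha restarted, while "disk" points at the node itself, and only the former is tied to the failover/failback sequencing.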