| Summary: | [quota] All brick processes got killed, while removing the directory from fuse mount on which quota is set and recreating the same directory, followed by 'quota list' | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
| Component: | glusterd | Assignee: | Raghavendra Bhat <rabhat> |
| Status: | CLOSED ERRATA | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 2.1 | CC: | asriram, kparthas, rhs-bugs, saujain, shaines, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.4.0.33rhs | Doc Type: | Bug Fix |
| Doc Text: | Previously, deleting a directory on which quota limits were set from the fuse mount and recreating it at the same absolute path caused the brick processes to crash and the 'quota list' command to fail. With this update, the issue is fixed. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-11-27 15:35:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Created attachment 791760 [details]
tar-ed sosreports
Additional Info
===============
1. Volume is fuse mounted on 10.70.36.32 ( RHEL 6.4 )
2. Mount point - /mnt/distvol
3. All commands are executed from RHS Node - 10.70.37.174
4. sosreports are attached

Observation
===============
I could see the following in the brick logs on 10.70.37.174 ():

<snip>
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-08-29 09:42:14
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.20rhsquota5
/lib64/libc.so.6[0x3abc432920]
/lib64/libc.so.6[0x3abc481321]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/storage/posix.so(posix_make_ancestryfromgfid+0x239)[0x7fd492a97629]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/storage/posix.so(posix_get_ancestry_directory+0xdd)[0x7fd492a913dd]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/storage/posix.so(+0x1b216)[0x7fd492a94216]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x45)[0x31de014025]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/storage/posix.so(posix_lookup_xattr_fill+0x85)[0x7fd492a938f5]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/storage/posix.so(posix_lookup+0x871)[0x7fd492a90401]
/usr/lib64/libglusterfs.so.0(default_lookup+0x6d)[0x31de01befd]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/features/access-control.so(posix_acl_lookup+0x1a2)[0x7fd4924638c2]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/features/locks.so(pl_lookup+0x222)[0x7fd49224b892]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/performance/io-threads.so(iot_lookup_wrapper+0x12c)[0x7fd492037ebc]
/usr/lib64/libglusterfs.so.0(call_resume+0x122)[0x31de030172]
/usr/lib64/glusterfs/3.4.0.20rhsquota5/xlator/performance/io-threads.so(iot_worker+0x158)[0x7fd49203c9f8]
/lib64/libpthread.so.0[0x3abcc07851]
/lib64/libc.so.6(clone+0x6d)[0x3abc4e890d]
---------
</snip>

https://code.engineering.redhat.com/gerrit/#/c/12036/ fixes the issue. When readlink was done on the gfid handle for a directory (while building the ancestry upon getting a nameless lookup on the gfid), the failure of the readlink call was not handled: its return value was collected in an unsigned variable (readlink returns -1 upon failure) and was never checked. The patch mentioned above fixes the issue.

Verified with RHS 2.1 containing glusterfs-3.4.0.33rhs-1.el6rhs

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html
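To illustrate the failure mode described in the fix comment above, here is a minimal, hypothetical C sketch of the unchecked-readlink pattern. The function and variable names are illustrative only, not the actual posix_make_ancestryfromgfid() code: storing readlink()'s return value in an unsigned variable turns the -1 error return into a huge length, so a failed readlink on a stale gfid handle (for example, after the directory is deleted and recreated) produces an out-of-bounds access and a SIGSEGV in the brick process.

```c
/* readlink_check.c - hypothetical sketch of the crash pattern described in
 * this bug: an unchecked readlink() whose result is stored in an unsigned
 * variable. Build: gcc -Wall -o readlink_check readlink_check.c */
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* Buggy pattern: readlink() returns -1 when the gfid handle is stale
 * (e.g. the directory was deleted and recreated), but the result is kept
 * in a size_t and never checked, so -1 wraps to SIZE_MAX and the write
 * below lands far outside the buffer -> SIGSEGV, as in the brick crash. */
void resolve_handle_buggy(const char *handle_path)
{
    char   target[PATH_MAX];
    size_t len = readlink(handle_path, target, sizeof(target) - 1);

    target[len] = '\0';   /* out-of-bounds write when readlink() failed */
    printf("handle points to: %s\n", target);
}

/* Fixed pattern: keep the return value signed and bail out on error,
 * which is the kind of check the patch is described as adding. */
int resolve_handle_fixed(const char *handle_path)
{
    char    target[PATH_MAX];
    ssize_t len = readlink(handle_path, target, sizeof(target) - 1);

    if (len < 0) {
        perror("readlink");
        return -1;        /* report the failure instead of crashing */
    }
    target[len] = '\0';
    printf("handle points to: %s\n", target);
    return 0;
}

int main(void)
{
    /* "/no/such/gfid/handle" stands in for a .glusterfs gfid symlink whose
     * directory has been removed; readlink() on it fails with ENOENT.
     * resolve_handle_buggy() is deliberately not called - it would crash. */
    return resolve_handle_fixed("/no/such/gfid/handle") == 0 ? 0 : 1;
}
```

The fixed variant only adds a signed return type and an error check before using the length, which is the essence of the fix as described above; it is a sketch of the pattern, not the GlusterFS implementation.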
Description of problem:
=======================
Deleting the directory from the fuse mount, on which quota is set, and recreating the directory, followed by 'gluster volume quota <vol-name> list', killed all the bricks, resulting in 'Transport endpoint is not connected' from the fuse mount.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.20rhsquota5 built on Aug 26 2013 02:56:39

How reproducible:
=================
Tried 3 times and hit it every time (3/3)

Steps to Reproduce:
===================
1. On a RHS cluster of 4 nodes, create a distribute volume with 2 bricks
   (i.e) gluster volume create <vol-name> <brick1> <brick2>
2. Start the volume
   (i.e) gluster volume start <vol-name>
3. Enable quota on the volume
   (i.e) gluster volume quota <vol-name> enable
4. Set a quota limit on a non-existent directory
   (i.e) gluster volume quota <vol-name> limit-usage <non-existent-dir> 2GB
   NOTE: This step fails with an error message
5. Fuse mount the volume and create the directory that was tried in step 4
6. Repeat step 4 [setting the quota limit]
   NOTE: the quota will be set on that directory
7. List the quota on the volume
   (i.e) gluster volume quota <vol-name> list
8. Delete the directory from the fuse mount
9. Repeat step 7 [listing the quota]
   NOTE: no quota entries are listed
10. Recreate the directory on the fuse mount [create a directory with the same name as the one deleted in step 8]
11. List the quota on the volume
    (i.e) gluster volume quota <vol-name> list

Actual results:
===============
All brick processes are killed

Expected results:
================
Not sure about the ideal/expected behaviour, but the brick processes should not get killed

Additional info:
================

Console logs
============
[Thu Aug 29 09:31:39 UTC 2013 root.37.174:~ ] # gluster volume create dogvol 10.70.37.174:/rhs/brick1/dogdir1 10.70.37.185:/rhs/brick1/dogdir1
volume create: dogvol: success: please start the volume to access data

[Thu Aug 29 09:37:29 UTC 2013 root.37.174:~ ] # gluster volume start dogvol
volume start: dogvol: success

[Thu Aug 29 09:37:39 UTC 2013 root.37.174:~ ] # gluster volume status
Status of volume: dogvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.174:/rhs/brick1/dogdir1          49160   Y       27596
Brick 10.70.37.185:/rhs/brick1/dogdir1          49160   Y       14854
NFS Server on localhost                         2049    Y       27608
NFS Server on 10.70.37.118                      2049    Y       10212
NFS Server on 10.70.37.185                      2049    Y       14868
NFS Server on 10.70.37.95                       2049    Y       10163

There are no active volume tasks

[Thu Aug 29 09:38:32 UTC 2013 root.37.174:~ ] # gluster v info
Volume Name: dogvol
Type: Distribute
Volume ID: 0350f1f9-75bd-4e1d-ac88-4eb00378740f
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.174:/rhs/brick1/dogdir1
Brick2: 10.70.37.185:/rhs/brick1/dogdir1

[Thu Aug 29 09:38:37 UTC 2013 root.37.174:~ ] # gluster v status
Status of volume: dogvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.174:/rhs/brick1/dogdir1          49160   Y       27596
Brick 10.70.37.185:/rhs/brick1/dogdir1          49160   Y       14854
NFS Server on localhost                         2049    Y       27666
NFS Server on 10.70.37.95                       2049    Y       10206
NFS Server on 10.70.37.118                      2049    Y       10248
NFS Server on 10.70.37.185                      2049    Y       14911

There are no active volume tasks

[Thu Aug 29 09:38:40 UTC 2013 root.37.174:~ ] # ps aux | grep quotad
root 27714 0.0 0.0 103244 804 pts/0 R+ 15:08 0:00 grep quotad
[Thu Aug 29 09:39:06 UTC 2013 root.37.174:~ ] # gluster volume quota dogvol enable
volume quota : success

[Thu Aug 29 09:39:13 UTC 2013 root.37.174:~ ] # ps aux | grep quotad
root 27758 0.4 0.8 187988 18028 ? Ssl 15:09 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/7e63030677df5afe2fa7a9f790189502.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 27771 0.0 0.0 103244 812 pts/0 S+ 15:09 0:00 grep quotad

[Thu Aug 29 09:39:17 UTC 2013 root.37.174:~ ] # gluster volume quota dogvol list

[Thu Aug 29 09:39:23 UTC 2013 root.37.174:~ ] # gluster volume quota dogvol limit-usage /master 2GB
quota command failed : Failed to get trusted.gfid attribute on path /master. Reason : No such file or directory

<CREATED THE DIRECTORY FROM FUSE MOUNT>

[Thu Aug 29 09:39:40 UTC 2013 root.37.174:~ ] # gluster volume quota dogvol limit-usage /master 2GB
volume quota : success

[Thu Aug 29 09:41:03 UTC 2013 root.37.174:~ ] # gluster volume quota dogvol list
Path        Hard-limit  Soft-limit           Used    Available
--------------------------------------------------------------------------------
/master     2.0GB       9130191673159152629  0Bytes  2.0GB

<REMOVE THE DIRECTORY FROM FUSE MOUNT>

[Thu Aug 29 09:41:05 UTC 2013 root.37.174:~ ] # gluster volume quota dogvol list
Path        Hard-limit  Soft-limit           Used    Available
--------------------------------------------------------------------------------

[Thu Aug 29 09:41:21 UTC 2013 root.37.174:~ ] # gluster volume status
Status of volume: dogvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.174:/rhs/brick1/dogdir1          49160   Y       27596
Brick 10.70.37.185:/rhs/brick1/dogdir1          49160   Y       14854
NFS Server on localhost                         2049    Y       27666
Quota Daemon on localhost                       N/A     Y       27758
NFS Server on 10.70.37.185                      2049    Y       14911
Quota Daemon on 10.70.37.185                    N/A     Y       14948
NFS Server on 10.70.37.118                      2049    Y       10248
Quota Daemon on 10.70.37.118                    N/A     Y       10281
NFS Server on 10.70.37.95                       2049    Y       10206
Quota Daemon on 10.70.37.95                     N/A     Y       10239

There are no active volume tasks

<REMOVE THE DIRECTORY, FROM FUSE MOUNT AFTER SETTING QUOTA ON IT>

[Thu Aug 29 09:41:27 UTC 2013 root.37.174:~ ] # gluster volume quota dogvol limit-usage /master 2GB
quota command failed : Failed to get trusted.gfid attribute on path /master. Reason : No such file or directory

[Thu Aug 29 09:41:40 UTC 2013 root.37.174:~ ] # gluster volume status
Status of volume: dogvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.174:/rhs/brick1/dogdir1          49160   Y       27596
Brick 10.70.37.185:/rhs/brick1/dogdir1          49160   Y       14854
NFS Server on localhost                         2049    Y       27666
Quota Daemon on localhost                       N/A     Y       27758
NFS Server on 10.70.37.118                      2049    Y       10248
Quota Daemon on 10.70.37.118                    N/A     Y       10281
NFS Server on 10.70.37.185                      2049    Y       14911
Quota Daemon on 10.70.37.185                    N/A     Y       14948
NFS Server on 10.70.37.95                       2049    Y       10206
Quota Daemon on 10.70.37.95                     N/A     Y       10239

There are no active volume tasks

[Thu Aug 29 09:41:42 UTC 2013 root.37.174:~ ] # gluster volume status
Status of volume: dogvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.174:/rhs/brick1/dogdir1          49160   Y       27596
Brick 10.70.37.185:/rhs/brick1/dogdir1          49160   Y       14854
NFS Server on localhost                         2049    Y       27666
Quota Daemon on localhost                       N/A     Y       27758
NFS Server on 10.70.37.185                      2049    Y       14911
Quota Daemon on 10.70.37.185                    N/A     Y       14948
NFS Server on 10.70.37.118                      2049    Y       10248
Quota Daemon on 10.70.37.118                    N/A     Y       10281
NFS Server on 10.70.37.95                       2049    Y       10206
Quota Daemon on 10.70.37.95                     N/A     Y       10239

There are no active volume tasks

<RECREATING THE SAME DIRECTORY FROM FUSE MOUNT>

[Thu Aug 29 09:42:08 UTC 2013 root.37.174:~ ] # gluster volume quota dogvol list
Path        Hard-limit  Soft-limit           Used    Available
--------------------------------------------------------------------------------

[Thu Aug 29 09:42:14 UTC 2013 root.37.174:~ ] # gluster volume status
Status of volume: dogvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.174:/rhs/brick1/dogdir1          N/A     N       27596
Brick 10.70.37.185:/rhs/brick1/dogdir1          N/A     N       14854
NFS Server on localhost                         2049    Y       27666
Quota Daemon on localhost                       N/A     Y       27758
NFS Server on 10.70.37.185                      2049    Y       14911
Quota Daemon on 10.70.37.185                    N/A     Y       14948
NFS Server on 10.70.37.95                       2049    Y       10206
Quota Daemon on 10.70.37.95                     N/A     Y       10239
NFS Server on 10.70.37.118                      2049    Y       10248
Quota Daemon on 10.70.37.118                    N/A     Y       10281

There are no active volume tasks