Bug 1312833
Summary: | [RHEL-7] gluster-swift: Fix exceptions raised when there are mutliple instances | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Prashanth Pai <ppai> | |
Component: | gluster-swift | Assignee: | Prashanth Pai <ppai> | |
Status: | CLOSED ERRATA | QA Contact: | surabhi <sbhaloth> | |
Severity: | medium | Docs Contact: | ||
Priority: | medium | |||
Version: | rhgs-3.1 | CC: | nlevinki, rcyriac, rhinduja, rhs-bugs, thiago | |
Target Milestone: | --- | Keywords: | ZStream | |
Target Release: | RHGS 3.1.3 | |||
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | swiftonfile-2.3.0-3 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1328445 (view as bug list) | Environment: | ||
Last Closed: | 2016-06-23 05:31:19 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1299184, 1328445 |
Description
Prashanth Pai
2016-02-29 10:50:38 UTC
RCA: The behaviour of glusterfs for renames has changed in recent versions. rename() used to be a blocking operation and now it isn't. More details can be found here: https://www.spinics.net/lists/gluster-devel/msg18250.html W.r.t rename(), the scope of fixing this BZ is limited to making sure gluster-swift code takes cognizance of this change and handles rename() as non-blocking operation. gluster-swift does not expect rename() to throw a EBUSY/EAGAIN/ESTALE today. We need to handle that. Ideally, we want the old behavior in glusterfs i.e rename() be blocking but fixing that is beyond the scope of this BZ. This BZ also tracks further effort (if any) and work required to make gluster-swift code compatible with glusterfs behaviour when multiple FUSE mounts are used. rename() is one such case that I've seen in one of the customer issues. There could be others. The blocking rename() issue has been tracked in this upstream glusterfs BZ here: https://bugzilla.redhat.com/show_bug.cgi?id=1304966 The non-blocking rename issue seems to be fixed in GlusterFS. In a multi-node test with haproxy as load balancer and 3 other node with PACO server running, following exceptions were seen in logs which are related to situation when space in volume is full. Apr 14 12:55:03 rhs312el7-2 object-server[18055]: ERROR __call__ error with PUT /test/0/AUTH_test/c1/37/ad/123a178f-2e9b-4759-9ac7-6629c62733a9/26b555302e5fee68c2a9ca1aa11cbb45.dat : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 904, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/server.py", line 83, in PUT#012 return server.ObjectController.PUT(self, request)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 533, in PUT#012 with disk_file.create(size=fsize) as writer:#012 File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__#012 return self.gen.next()#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/diskfile.py", line 896, in create#012 self._create_dir_object(self._obj_path)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/diskfile.py", line 798, in _create_dir_object#012 ret, newmd = make_directory(cur_path, self._uid, self._gid, md)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/diskfile.py", line 156, in make_directory#012 " path %s (%s)" % (full_path, str(err)))#012DiskFileError: make_directory: mkdir failed on path /mnt/gluster-object/test/c1/37/ad/123a178f-2e9b-4759-9ac7-6629c62733a9 ([Errno 28] No space left on device: '/mnt/gluster-object/test/c1/37/ad/123a178f-2e9b-4759-9ac7-6629c62733a9') (txn: tx77d90959c13e45f79dba4-00570f45c9) Apr 14 12:54:59 rhs312el7-2 object-server[18054]: ERROR __call__ error with PUT /test/0/AUTH_test/c1/a7/02/e5f557c5-87aa-450d-b852-2fc2b3866f0e/9ee80cd8afeeabb65fe38eeba3ab40b7.dat : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 904, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/server.py", line 83, in PUT#012 return server.ObjectController.PUT(self, request)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 533, in PUT#012 with disk_file.create(size=fsize) as writer:#012 File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__#012 return self.gen.next()#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/diskfile.py", line 909, in create#012 dw.close()#012AttributeError: 'NoneType' object has no attribute 'close' (txn: txcca05d5ffdf04632a042f-00570f45ca) Apr 14 12:55:04 rhs312el7-2 object-server[18055]: ERROR __call__ error with PUT /test/0/AUTH_test/c1/eb/e8/bf21d1e5-2001-4ea6-a8c9-2926d8cace34/77e14470a18907d356869f96c8925932.dat : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 904, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/server.py", line 83, in PUT#012 return server.ObjectController.PUT(self, request)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 533, in PUT#012 with disk_file.create(size=fsize) as writer:#012 File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__#012 return self.gen.next()#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/diskfile.py", line 841, in create#012 os.O_WRONLY | os.O_CREAT | os.O_EXCL | O_CLOEXEC)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/common/fs_utils.py", line 245, in do_open#012 err.strerror, path, flags, kwargs))#012GlusterFileSystemOSError: [Errno 116] Stale file handle, os.open("/mnt/gluster-object/test/c1/eb/e8/bf21d1e5-2001-4ea6-a8c9-2926d8cace34/.77e14470a18907d356869f96c8925932.dat.17e98fafc90344448ff719a79a9a3ae4", 800c1, {}) (txn: tx2a8c3ac889ce49eca0412-00570f45cd) Apr 14 12:55:00 rhs312el7-4 object-server[18035]: ERROR __call__ error with PUT /test/0/AUTH_test/c1/89/d5/7964ad1a-5ff9-4b8a-bde5-001f11367511/e838c0a8a7ae0077755cea490e2a7eba.dat : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 904, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/server.py", line 83, in PUT#012 return server.ObjectController.PUT(self, request)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 533, in PUT#012 with disk_file.create(size=fsize) as writer:#012 File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__#012 return self.gen.next()#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/diskfile.py", line 841, in create#012 os.O_WRONLY | os.O_CREAT | os.O_EXCL | O_CLOEXEC)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/common/fs_utils.py", line 245, in do_open#012 err.strerror, path, flags, kwargs))#012GlusterFileSystemOSError: [Errno 116] Stale file handle, os.open("/mnt/gluster-object/test/c1/89/d5/7964ad1a-5ff9-4b8a-bde5-001f11367511/.e838c0a8a7ae0077755cea490e2a7eba.dat.10e5a0ff2e344f4598c531bc34e24cbd", 800c1, {}) (txn: txb3d41d5925d54565816eb-00570f45c9) Apr 14 12:55:00 rhs312el7-4 object-server[18035]: ERROR __call__ error with PUT /test/0/AUTH_test/c1/af/e2/786e3c0f-7048-4707-8536-d50a295f1784/51a49aaf9944d941581c2561181f802b.dat : [Errno 28] No space left on device, os.rename("/mnt/gluster-object/test/c1/af/e2/786e3c0f-7048-4707-8536-d50a295f1784/.51a49aaf9944d941581c2561181f802b.dat.612bdc5e472441be9db253a8b4410717", "/mnt/gluster-object/test/c1/af/e2/786e3c0f-7048-4707-8536-d50a295f1784/51a49aaf9944d941581c2561181f802b.dat"), rename('/mnt/gluster-object/test/c1/af/e2/786e3c0f-7048-4707-8536-d50a295f1784/.51a49aaf9944d941581c2561181f802b.dat.612bdc5e472441be9db253a8b4410717', '/mnt/gluster-object/test/c1/af/e2/786e3c0f-7048-4707-8536-d50a295f1784/51a49aaf9944d941581c2561181f802b.dat') (txn: txbd0f65b39dbf4f479b344-00570f45c9) On an unrelated note, cleaning the volume from GlusterFS FUSE mount fails as follows. [root@rhs312el7-3 ~]# rm -rf /mnt/gluster-object/test/* rm: cannot remove ‘/mnt/gluster-object/test/c1/8a/33/859c81e9-24f3-465a-bea5-c4cdd383c53f’: Directory not empty rm: cannot remove ‘/mnt/gluster-object/test/c1/a7/02/e5f557c5-87aa-450d-b852-2fc2b3866f0e’: Directory not empty rm: cannot remove ‘/mnt/gluster-object/test/c1/6e/bc’: Directory not empty rm: cannot remove ‘/mnt/gluster-object/test/c1/d5/27/f18587f0-79d7-49da-9f08-4be1c1a418d4’: Directory not empty rm: cannot remove ‘/mnt/gluster-object/test/c1/3b/17/da648fb9-b829-45bc-a5ae-d276112884f9’: Directory not empty I'm being told that this is a known issue in DHT and the fix is slated to get into RHGS 3.1.3 swift-object-server seems to be hung to these files: [root@rhs312el7-2 ~]# ls -l /proc/18054/fd total 0 lrwx------. 1 root root 64 Apr 14 15:13 0 -> /dev/null lrwx------. 1 root root 64 Apr 14 15:13 1 -> /dev/null lrwx------. 1 root root 64 Apr 14 15:13 12 -> socket:[273343] lrwx------. 1 root root 64 Apr 14 15:13 13 -> socket:[273344] lrwx------. 1 root root 64 Apr 14 15:13 2 -> /dev/null l-wx------. 1 root root 64 Apr 14 15:13 20 -> /mnt/gluster-object/test/c1/a7/02/e5f557c5-87aa-450d-b852-2fc2b3866f0e/.9ee80cd8afeeabb65fe38eeba3ab40b7.dat.0c316d00e9884d07b75e158daf17f2d0 lrwx------. 1 root root 64 Apr 14 15:13 3 -> socket:[260495] lrwx------. 1 root root 64 Apr 14 15:13 4 -> socket:[260496] lrwx------. 1 root root 64 Apr 14 15:13 7 -> socket:[260505] lr-x------. 1 root root 64 Apr 14 15:13 8 -> /dev/urandom lr-x------. 1 root root 64 Apr 14 15:13 9 -> /dev/urandom The above fd leak arises from error in following exception: Apr 14 12:54:59 rhs312el7-2 object-server[18054]: ERROR __call__ error with PUT /test/0/AUTH_test/c1/a7/02/e5f557c5-87aa-450d-b852-2fc2b3866f0e/9ee80cd8afeeabb65fe38eeba3ab40b7.dat : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 904, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/server.py", line 83, in PUT#012 return server.ObjectController.PUT(self, request)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2648, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1205, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 533, in PUT#012 with disk_file.create(size=fsize) as writer:#012 File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__#012 return self.gen.next()#012 File "/usr/lib/python2.7/site-packages/gluster/swift/obj/diskfile.py", line 909, in create#012 dw.close()#012AttributeError: 'NoneType' object has no attribute 'close' (txn: txcca05d5ffdf04632a042f-00570f45ca) Upstream change: http://review.gluster.org/#/c/14118 Changes reviewed and merged. Upstream: http://review.gluster.org/#/c/14118 Downstream: https://code.engineering.redhat.com/gerrit/#/c/73583/ Verified on latest build as follows: 1.setup 4 node swift setup. Start swift services on all nodes. 2. Configure haproxy node 3. start large number(3000 from node1)(3000 from node 2)(2000 from node 3) of PUT from multiple nodes and verify from ha proxy log that the requests have been forwarded to all the nodes which have swift services running and are configured in ha proxy .cfg. Also verified that there are no exceptions raised. Marking the BZ verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1289 |