Created attachment 624638
SOS report of the server

Description of problem:
In a distributed-replicate volume, we see xattr failures in the brick logs.

Version-Release number of selected component (if applicable):
# rpm -qa | grep glus
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-devel-3.3.0rhsvirt1-7.el6rhs.x86_64

How reproducible:
We saw it just once.

Steps to Reproduce:
1. Create (2x2) distributed-replicate volume
2. VMs were hosted with gluster volume as storage domain in RHEV-M

Actual results:
setting xattrs failed

Expected results:
Shouldn't fail.

Additional info:

Volume Name: dist-replica
Type: Distributed-Replicate
Volume ID: 39e0c10c-12d8-4484-b21d-a3be0cd0b7aa
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client36.lab.eng.blr.redhat.com:/dist-replica1
Brick2: rhs-client37.lab.eng.blr.redhat.com:/dist-replica1
Brick3: rhs-client43.lab.eng.blr.redhat.com:/dist-replica1
Brick4: rhs-client44.lab.eng.blr.redhat.com:/dist-replica1
Options Reconfigured:
cluster.eager-lock: enable
storage.linux-aio: disable
performance.read-ahead: disable
performance.stat-prefetch: disable
performance.io-cache: disable
performance.quick-read: disable

Brick log:

[2012-10-09 16:39:08.163031] I [server-handshake.c:571:server_setvolume] 0-dist-replica-server: accepted client from rhs-client36-2984-2012/09/28-16:38:02:179578-dist-replica-client-0-3 (version: 3.3.0rhsvirt1)
[2012-10-09 16:39:08.167223] I [server-handshake.c:571:server_setvolume] 0-dist-replica-server: accepted client from rhs-client36-2984-2012/09/28-16:38:02:179578-dist-replica-client-0-2 (version: 3.3.0rhsvirt1)
[2012-10-09 16:39:08.192640] I [server-handshake.c:571:server_setvolume] 0-dist-replica-server: accepted client from rhs-client44.lab.eng.blr.redhat.com-2892-2012/09/28-16:16:11:723950-dist-replica-client-0-2 (version: 3.3.0rhsvirt1)
[2012-10-09 16:39:08.197965] I [server-handshake.c:571:server_setvolume] 0-dist-replica-server: accepted client from rhs-client44.lab.eng.blr.redhat.com-2892-2012/09/28-16:16:11:723950-dist-replica-client-0-1 (version: 3.3.0rhsvirt1)
[2012-10-09 16:39:08.203243] I [server-handshake.c:571:server_setvolume] 0-dist-replica-server: accepted client from rhs-client44.lab.eng.blr.redhat.com-2892-2012/09/28-16:16:11:723950-dist-replica-client-0-3 (version: 3.3.0rhsvirt1)
[2012-10-09 16:39:08.259597] I [server-handshake.c:571:server_setvolume] 0-dist-replica-server: accepted client from rhs-gp-srv1.lab.eng.blr.redhat.com-6478-2012/09/28-16:56:14:880857-dist-replica-client-0-2 (version: 3.3.0rhsvirt1)
[2012-10-09 19:13:09.495736] E [posix-helpers.c:701:posix_handle_pair] 0-dist-replica-posix: /dist-replica1/7746e77b-7475-4fb8-ab7f-fd85773c5762/images/ac076c0c-22b1-4dd3-be3c-ac8befb67c58/e7f10d0b-0f7f-44cd-b6ed-02109bea1113: key:trusted.glusterfs.dht.linkto error:File exists
[2012-10-09 19:13:09.495779] E [posix.c:859:posix_mknod] 0-dist-replica-posix: setting xattrs on /dist-replica1/7746e77b-7475-4fb8-ab7f-fd85773c5762/images/ac076c0c-22b1-4dd3-be3c-ac8befb67c58/e7f10d0b-0f7f-44cd-b6ed-02109bea1113 failed (File exists)
[2012-10-09 19:16:15.179306] E [posix-helpers.c:701:posix_handle_pair] 0-dist-replica-posix: /dist-replica1/7746e77b-7475-4fb8-ab7f-fd85773c5762/images/0b547063-a616-459d-9465-f9a3d3b2aa8c/e7f10d0b-0f7f-44cd-b6ed-02109bea1113: key:trusted.glusterfs.dht.linkto error:File exists
[2012-10-09 19:16:15.179362] E [posix.c:859:posix_mknod] 0-dist-replica-posix: setting xattrs on /dist-replica1/7746e77b-7475-4fb8-ab7f-fd85773c5762/images/0b547063-a616-459d-9465-f9a3d3b2aa8c/e7f10d0b-0f7f-44cd-b6ed-02109bea1113 failed (File exists)
This can happen if the link file already exists. Shishir, can you check this? We may also want to skip logging EEXIST errors.
Possible race between 'create()' and 'mknod()' when creating the dht-linkfile. This is possible when fix-layout is running from one node, which results in files being created with a linkfile, while another node is trying to migrate data, for which the first step is to create the linkfile if it doesn't already exist.

The following patch should most likely fix the issue:

---------------------------------------------------
amar@unused:~/work/glusterfs$ git diff
diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c
index cf4e086..8160115 100644
--- a/xlators/storage/posix/src/posix.c
+++ b/xlators/storage/posix/src/posix.c
@@ -1718,6 +1718,9 @@ posix_create (call_frame_t *frame, xlator_t *this,
                 goto out;
         }
 
+        if (was_present)
+                goto fill_stat;
+
         op_ret = posix_gfid_set (this, real_path, loc, xdata);
         if (op_ret) {
                 gf_log (this->name, GF_LOG_ERROR,
@@ -1748,6 +1751,7 @@ posix_create (call_frame_t *frame, xlator_t *this,
                                 strerror (errno));
         }
 
+fill_stat:
         op_ret = posix_fdstat (this, _fd, &stbuf);
         if (op_ret == -1) {
                 op_errno = errno;
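For readers who want to see the idea of the patch outside the brick code, here is a minimal standalone sketch. It is not the glusterfs implementation: `create_once`, `set_initial_metadata` and `init_calls` are hypothetical names made up for illustration, and the sketch detects an existing file with `O_CREAT | O_EXCL` rather than the way posix_create computes `was_present`. It only shows the pattern of skipping the one-time setup (gfid/xattrs in the real code) when another creator has already made the file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the one-time setup (gfid/xattrs in the real
 * brick code). It only counts calls so that the skip can be observed. */
static int init_calls = 0;

static int set_initial_metadata(int fd)
{
    (void)fd;
    init_calls++;
    return 0;
}

/* Open 'path', creating it if needed. The one-time setup runs only when
 * this call actually created the file. If the file already exists
 * (EEXIST), it is opened as-is and the setup is skipped.
 * Returns an open fd, or -1 with errno set. */
static int create_once(const char *path, mode_t mode, int *was_present)
{
    assert(path != NULL && was_present != NULL);

    *was_present = 0;
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, mode);
    if (fd == -1 && errno == EEXIST) {
        /* Someone else created the file first: reuse it. */
        *was_present = 1;
        fd = open(path, O_RDWR);
    }
    if (fd == -1)
        return -1;

    /* Same shape as the patch: no metadata setup for a pre-existing file. */
    if (!*was_present && set_initial_metadata(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Calling `create_once` twice on the same path runs the setup only on the first call; the second call reports `was_present = 1` and leaves the existing metadata untouched, so it never attempts to set something that is already there.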
CHANGE: http://review.gluster.org/4265 (storage/posix: if create returns EXIST, donot set gfid/xattrs) merged in master by Anand Avati (avati)
Verified with RHS 2.0+
Shishir, this bug has been added to the Update 4 errata. Could you provide your input in the Doc Text field so that I can update the errata? Thanks, Divya
I think I hit this running on the 3.3.0.6 RHEL 5 client RPMs. In the logs I see:

[2013-03-04 01:57:01.351079] E [posix-helpers.c:721:posix_handle_pair] 0-DISTRIBUTED-REPLICATED-posix: /brick1/run10228/p7/d6/fc: key:trusted.glusterfs.dht.linkto error:File exists
[2013-03-04 01:57:01.351096] E [posix.c:860:posix_mknod] 0-DISTRIBUTED-REPLICATED-posix: setting xattrs on /brick1/run10228/p7/d6/fc failed (File exists)

Was the patch that resolves this BZ included in the 3.3.0.6 el5 packages?
The glusterfs-3.3.0.5rhs-41 and glusterfs-3.3.0.5rhs-43 rpms for RHEL 5 have the fix.
Per 03/12 Anshi status call, targeting for Big Bend.
The fix, CHANGE: http://review.gluster.org/4265 (storage/posix: if create returns EXIST, donot set gfid/xattrs), merged in master by Anand Avati (avati), is available in the release for Big Bend. Moving the bug to ON_QA for verification.
Verified on glusterfs-3.4.0.14rhs-1.el6rhs.x86_64.rpm.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html