Bug 838197 - dht_aggregate 'user.swift.metadata' mismatch
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: shishir gowda
QA Contact: Saurabh
Depends On:
Blocks: 840812
Reported: 2012-07-06 23:10 EDT by Harshavardhana
Modified: 2016-01-19 01:10 EST (History)
9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 840812 (view as bug list)
Environment:
Last Closed: 2012-09-11 10:23:14 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
GlusterFS logs (1.94 MB, application/x-gzip)
2012-07-06 23:13 EDT, Harshavardhana
kernel_trace.tar.gz (3.60 KB, application/x-gzip)
2012-07-09 18:36 EDT, Justin Bautista

Description Harshavardhana 2012-07-06 23:10:54 EDT
8mg0ga9rp0wr2rca_t150.jpg on subvolume m3vol0-client-2 => -1 (No such file or directory)
[2012-07-05 23:44:13.468896] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background  meta-data self-heal completed on /9MC5eo41aEutXMIXn4U75Q
[2012-07-05 23:44:13.516336] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background  meta-data entry self-heal completed on /x23ZfwYxTNUi-LXckObJdo/photo
[2012-07-05 23:44:13.641162] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background  meta-data self-heal triggered. path: /9TvGJXLpGEKCxjDDbPw23h, reason: lookup detected pending operations
[2012-07-05 23:44:13.643814] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background  meta-data self-heal completed on /9TvGJXLpGEKCxjDDbPw23h
[2012-07-05 23:44:13.877739] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background  meta-data self-heal triggered. path: /9c6m47SLqU27Vi9kqMXQGg, reason: lookup detected pending operations
[2012-07-05 23:44:13.893210] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background  meta-data self-heal completed on /9c6m47SLqU27Vi9kqMXQGg
[2012-07-05 23:44:13.904431] W [dht-common.c:73:dht_aggregate] 0-dht: xattr mismatch for user.swift.metadata
[2012-07-05 23:44:13.905929] W [dht-common.c:73:dht_aggregate] 0-dht: xattr mismatch for user.swift.metadata
[2012-07-05 23:44:14.008516] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background  meta-data self-heal triggered. path: /9hA2vgWiX0WmJWI3DfYPXQ, reason: lookup detected pending operations
[2012-07-05 23:44:14.024805] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background  meta-data self-heal completed on /9hA2vgWiX0WmJWI3DfYPXQ
[2012-07-05 23:44:14.620186] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background  meta-data self-heal triggered. path: /A4-OEKzbWk-ep4LnB3lrTw, reason: lookup detected pending operations
[2012-07-05 23:44:14.966200] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background  meta-data self-heal completed on /A4-OEKzbWk-ep4LnB3lrTw
[2012-07-05 23:44:15.012553] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background  meta-data self-heal triggered. path: /A6X6TO1bEkWA615LJHSqwg, reason: lookup detected pending operations
[2012-07-05 23:44:15.139516] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background  meta-data self-heal completed on /A6X6TO1bEkWA615LJHSqwg
[2012-07-05 23:44:15.442715] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background  meta-data self-heal triggered. path: /AITzSEc-ok2kdUmVs4V1xQ, reason: lookup detected pending operations
[2012-07-05 23:44:15.450344] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background  meta-data self-heal completed on /AITzSEc-ok2kdUmVs4V1xQ
[2012-07-05 23:44:15.608096] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background  meta-data s
Comment 1 Harshavardhana 2012-07-06 23:13:19 EDT
Created attachment 596713 [details]
GlusterFS logs
Comment 3 shishir gowda 2012-07-09 05:14:04 EDT
This can occur only if one of the bricks went offline during a subsequent update of the xattr. Since DHT cannot determine which value is the most recent, we do not self-heal xattrs. Hence the log message.
Comment 4 Justin Bautista 2012-07-09 18:36:57 EDT
Created attachment 597165 [details]
kernel_trace.tar.gz

Uploading kernel trace for 'D' state process
Comment 5 Junaid 2012-07-10 04:56:59 EDT
DHT sends the setxattr fop to all of its children and therefore expects the same value to be present on all the bricks when it performs a getxattr. But the setxattr is not done within locks, so it is bound to be racy. When two glusterfs client processes receive setxattr calls on the same directory at the same time, they propagate the setxattr calls to all the servers. Say there are two bricks: the two bricks receive setxattrs with different values but the same key. If there is a race between the setxattrs from the two machines, the bricks will end up with two different xattr values for the same key. When DHT later performs a getxattr, it will see that the bricks have different values, and hence the glusterfs log message.
Comment 6 Harshavardhana 2012-07-10 12:47:38 EDT
What are the eventual problems of this 'race'? Does it result in inconsistency? Is there an xattr self-heal to figure out the correct xattr value and heal the neighboring directories?

By the way, isn't it a bug that setxattr is not done within locks? Was that avoided for performance reasons? If so, can this be racy across other modules as well?
Comment 7 Harshavardhana 2012-07-10 18:14:54 EDT
Also, if getxattr fails with that message, what is the expected experience from the Object Storage side? Will it observe a 404 as well?
Comment 8 Junaid 2012-07-11 01:04:37 EDT
For now there is no xattr self-heal in place. On an xattr mismatch, DHT logs to the file but getxattr does not fail; DHT returns one of the xattr values. This may cause some inconsistency in the case of a HEAD, but it will be healed when the user does a GET.

The 404 errors are mostly because of timeouts. When the storage servers (account, container, and object) fail to respond within the stipulated timeout period, the proxy server responds to the client with the error.
Comment 9 Vijay Bellur 2012-07-25 11:20:44 EDT
CHANGE: http://review.gluster.com/3722 (cluster/distribute: Suppress user xattr mismatch log message) merged in master by Vijay Bellur (vbellur@redhat.com)
Comment 10 Vijay Bellur 2012-07-25 12:28:53 EDT
CHANGE: http://review.gluster.com/3723 (cluster/distribute: Suppress user xattr mismatch log message) merged in release-3.3 by Vijay Bellur (vbellur@redhat.com)
Comment 12 errata-xmlrpc 2012-09-11 10:23:14 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-1253.html
