8mg0ga9rp0wr2rca_t150.jpg on subvolume m3vol0-client-2 => -1 (No such file or directory)
[2012-07-05 23:44:13.468896] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background meta-data self-heal completed on /9MC5eo41aEutXMIXn4U75Q
[2012-07-05 23:44:13.516336] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background meta-data entry self-heal completed on /x23ZfwYxTNUi-LXckObJdo/photo
[2012-07-05 23:44:13.641162] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background meta-data self-heal triggered. path: /9TvGJXLpGEKCxjDDbPw23h, reason: lookup detected pending operations
[2012-07-05 23:44:13.643814] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background meta-data self-heal completed on /9TvGJXLpGEKCxjDDbPw23h
[2012-07-05 23:44:13.877739] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background meta-data self-heal triggered. path: /9c6m47SLqU27Vi9kqMXQGg, reason: lookup detected pending operations
[2012-07-05 23:44:13.893210] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background meta-data self-heal completed on /9c6m47SLqU27Vi9kqMXQGg
[2012-07-05 23:44:13.904431] W [dht-common.c:73:dht_aggregate] 0-dht: xattr mismatch for user.swift.metadata
[2012-07-05 23:44:13.905929] W [dht-common.c:73:dht_aggregate] 0-dht: xattr mismatch for user.swift.metadata
[2012-07-05 23:44:14.008516] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background meta-data self-heal triggered. path: /9hA2vgWiX0WmJWI3DfYPXQ, reason: lookup detected pending operations
[2012-07-05 23:44:14.024805] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background meta-data self-heal completed on /9hA2vgWiX0WmJWI3DfYPXQ
[2012-07-05 23:44:14.620186] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background meta-data self-heal triggered. path: /A4-OEKzbWk-ep4LnB3lrTw, reason: lookup detected pending operations
[2012-07-05 23:44:14.966200] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background meta-data self-heal completed on /A4-OEKzbWk-ep4LnB3lrTw
[2012-07-05 23:44:15.012553] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background meta-data self-heal triggered. path: /A6X6TO1bEkWA615LJHSqwg, reason: lookup detected pending operations
[2012-07-05 23:44:15.139516] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background meta-data self-heal completed on /A6X6TO1bEkWA615LJHSqwg
[2012-07-05 23:44:15.442715] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background meta-data self-heal triggered. path: /AITzSEc-ok2kdUmVs4V1xQ, reason: lookup detected pending operations
[2012-07-05 23:44:15.450344] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-m3vol0-replicate-1: background meta-data self-heal completed on /AITzSEc-ok2kdUmVs4V1xQ
[2012-07-05 23:44:15.608096] I [afr-common.c:1340:afr_launch_self_heal] 0-m3vol0-replicate-1: background meta-data s
Created attachment 596713 [details] GlusterFS logs
This can occur only if any of the bricks went offline during a subsequent update of the xattr. Since DHT cannot take a call on the most recent update, we do not self-heal xattrs. Hence the log.
Created attachment 597165 [details] kernel_trace.tar.gz
Uploading kernel trace for 'D' state process
DHT sends the setxattr fop to all of its children and therefore expects the same value to be present on all the bricks when it later performs getxattr. But the setxattr is not done within locks, so this is bound to be racy. When two glusterfs client processes receive setxattr calls on the same directory at the same time, they each propagate the setxattr to all the servers. Say there are two bricks; both bricks then receive setxattrs with different values but the same key. If the setxattrs from the two machines race, the bricks can end up holding two different xattr values for the same key. When DHT subsequently performs getxattr, it sees that the bricks have different values, hence the glusterfs log message.
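To make the interleaving concrete, here is a minimal standalone sketch (plain C with pthreads, not GlusterFS code; the brick array, locks and value strings are invented for illustration). Two "clients" each write their own value for the same key to two "bricks"; since nothing serializes the fan-out across bricks, the bricks can end up disagreeing, which is exactly the state dht_aggregate warns about.

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NBRICKS 2

/* simulated per-brick store for a single xattr key */
static char brick_xattr[NBRICKS][64];
static pthread_mutex_t brick_lock[NBRICKS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* one "client": propagate its value to every brick, one brick at a time;
 * no lock is held across the whole fan-out, which is the race */
static void *client_setxattr(void *arg)
{
    const char *value = arg;
    int i;

    for (i = 0; i < NBRICKS; i++) {
        pthread_mutex_lock(&brick_lock[i]);
        strncpy(brick_xattr[i], value, sizeof(brick_xattr[i]) - 1);
        pthread_mutex_unlock(&brick_lock[i]);
        usleep(1000); /* widen the window between the per-brick writes */
    }
    return NULL;
}

int main(void)
{
    pthread_t c1, c2;

    pthread_create(&c1, NULL, client_setxattr, "value-from-client-1");
    pthread_create(&c2, NULL, client_setxattr, "value-from-client-2");
    pthread_join(c1, NULL);
    pthread_join(c2, NULL);

    printf("brick-0: %s\nbrick-1: %s\n", brick_xattr[0], brick_xattr[1]);
    if (strcmp(brick_xattr[0], brick_xattr[1]) != 0)
        printf("mismatch: the state that dht_aggregate logs about\n");
    return 0;
}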
What are the eventual problems of this 'race'? Does it result in inconsistency? Is there an xattr self-heal to figure out the correct xattr value and heal the neighboring directories? Btw, isn't it a bug that setxattr is not done within locks? Was that avoided for performance reasons? If that is the case, can this be racy across other modules as well?
Also, if getxattr fails with that message, what is the expected experience on the Object Storage side? Will it observe a 404 as well?
For now there is no xattr self-heal in place. On an xattr mismatch DHT logs to the file, but getxattr does not fail; DHT returns one of the xattr values. This may cause some inconsistency in the case of a HEAD, but it will be healed when the user does a GET. The 404 errors are mostly because of timeouts: when the storage servers (account, container and object) fail to respond within the stipulated timeout period, the proxy server responds to the client with the error.
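A rough sketch of that aggregation behaviour (purely illustrative, not the actual dht_aggregate code; the function name and values are made up): when the same key carries different values on different bricks, keep the value seen first and only emit a warning, so the getxattr itself still succeeds.

#include <stdio.h>
#include <string.h>

/* keep the first value seen for the key; if a later brick reports a
 * different value, keep the existing one and only emit a warning */
static void aggregate_xattr(const char *key, char *kept, size_t kept_len,
                            const char *from_brick)
{
    if (kept[0] == '\0') {
        strncpy(kept, from_brick, kept_len - 1);
        kept[kept_len - 1] = '\0';
    } else if (strcmp(kept, from_brick) != 0) {
        fprintf(stderr, "W [dht] xattr mismatch for %s\n", key);
    }
}

int main(void)
{
    char value[64] = "";

    aggregate_xattr("user.swift.metadata", value, sizeof(value), "meta-from-brick-0");
    aggregate_xattr("user.swift.metadata", value, sizeof(value), "meta-from-brick-1");

    /* the "getxattr" still succeeds: one of the two values is returned */
    printf("returned value: %s\n", value);
    return 0;
}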
CHANGE: http://review.gluster.com/3722 (cluster/distribute: Suppress user xattr mismatch log message) merged in master by Vijay Bellur (vbellur)
CHANGE: http://review.gluster.com/3723 (cluster/distribute: Suppress user xattr mismatch log message) merged in release-3.3 by Vijay Bellur (vbellur)
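As the change title suggests, the merged fix suppresses the mismatch warning for user-defined xattrs. An illustrative sketch of that kind of check (not the merged patch itself; the function and the second key are hypothetical):

#include <stdio.h>
#include <string.h>

/* skip the warning for keys in the "user." namespace, since those are
 * application-owned values that DHT cannot arbitrate anyway */
static void log_xattr_mismatch(const char *key)
{
    if (strncmp(key, "user.", 5) == 0)
        return;
    fprintf(stderr, "W [dht] xattr mismatch for %s\n", key);
}

int main(void)
{
    log_xattr_mismatch("user.swift.metadata"); /* suppressed */
    log_xattr_mismatch("trusted.some.key");    /* hypothetical non-user key, still logged */
    return 0;
}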
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-1253.html