Bug 865619 - Adding "large" amounts of metadata to a container or object, then deleting it, can strand metadata keys causing unnecessary reads of those keys
Summary: Adding "large" amounts of metadata to a container or object, then deleting it...
Keywords:
Status: MODIFIED
Alias: None
Product: Gluster-Swift
Classification: Community
Component: utils
Version: 1.8.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 903396 978061
 
Reported: 2012-10-11 22:23 UTC by Peter Portante
Modified: 2023-01-31 23:39 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:


Attachments
Creates three containers, adds metadata to two, removes from the third, third one left with multiple xattr keys (3.44 KB, application/x-shellscript)
2012-10-11 22:28 UTC, Peter Portante

Description Peter Portante 2012-10-11 22:23:57 UTC
Description of problem:

It would appear, though I have not traced every possible code path, that if the size of the stored metadata associated with an object grows to use multiple metadata keys and then shrinks to use fewer keys, the higher-numbered keys are left stranded, so every metadata read still fetches them even though they are no longer used.

As of the latest code at the time of this writing, plugins/utils.py:write_metadata() pickles the metadata dictionary into a string, and then writes that string in key-value pairs as follows (depending on the length):

  user.swift.metadata: string[0-253]
  user.swift.metadata1: string[254-507]
  user.swift.metadata2: string[508-761]

So if the pickled size of the metadata is initially 400 bytes long, then it will be stored using two keys, user.swift.metadata and user.swift.metadata1. Should the pickled size of the metadata for that object shrink to a length less than 255 bytes, it will now be stored using only one key.
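The key layout above can be modeled in a few lines. This is a minimal sketch, not the plugin's code: CHUNK_SIZE, key_name() and chunk_layout() are hypothetical names, and the 254-byte chunk size is taken from the byte ranges listed above.

```python
CHUNK_SIZE = 254  # bytes per xattr value, per the ranges listed above
METADATA_KEY = "user.swift.metadata"


def key_name(index):
    """Key 0 carries no suffix; later keys get 1, 2, ..."""
    return METADATA_KEY + (str(index) if index else "")


def chunk_layout(metastr):
    """Return the (key, value) pairs a chunked writer would store."""
    return [
        (key_name(offset // CHUNK_SIZE), metastr[offset:offset + CHUNK_SIZE])
        for offset in range(0, len(metastr), CHUNK_SIZE)
    ]


# A 400-byte pickle needs two keys; 762 bytes still fit in three.
print([key for key, _ in chunk_layout(b"x" * 400)])
print(len(chunk_layout(b"x" * 762)), len(chunk_layout(b"x" * 763)))
```

Note the boundary: three keys cover bytes 0 through 761, so a 762-byte pickle uses exactly three keys and a 763-byte pickle needs a fourth.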

However, the second key, user.swift.metadata1, is not removed.
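The stranding can be reproduced without a file system by letting a plain dict stand in for a file's extended attributes. This is a hypothetical model of the writer described above (the names write_metadata and xattrs here are illustrative, not the plugin's API): it overwrites keys 0 through n-1 but never deletes keys left over from an earlier, longer write.

```python
import pickle

CHUNK_SIZE = 254  # taken from the byte ranges listed above
METADATA_KEY = "user.swift.metadata"


def write_metadata(xattrs, metadata):
    """Model of the buggy writer: xattrs is a dict standing in for xattrs.

    Chunks are written to user.swift.metadata, ...metadata1, and so on,
    but no existing higher-numbered key is ever removed.
    """
    metastr = pickle.dumps(metadata)
    index = 0
    while metastr:
        xattrs[METADATA_KEY + (str(index) if index else "")] = metastr[:CHUNK_SIZE]
        metastr = metastr[CHUNK_SIZE:]
        index += 1


xattrs = {}
write_metadata(xattrs, {"big": "a" * 400})  # pickles to between 255 and 507 bytes
write_metadata(xattrs, {"small": "b"})      # pickles to under 255 bytes
print(sorted(xattrs))  # key ...metadata1 survives from the first write
```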

Currently, plugins/utils.py:read_metadata() keeps looking for keys, incrementing the trailing number each time and appending each value to the pickled string, until it fails to find a key. It then unpickles the resulting string.
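The read loop just described can be sketched as follows, again with a dict standing in for the file's extended attributes. This is a hypothetical model, not the plugin's code; it also counts lookups, since each one stands for at least one getxattr call on a real file.

```python
import pickle

METADATA_KEY = "user.swift.metadata"


def read_metadata(xattrs):
    """Model of the chunked reader: returns (metadata, lookups).

    Keys are tried in order (no suffix, then 1, 2, ...) until one is
    missing; the collected values are concatenated and unpickled.
    """
    metastr = b""
    index = 0
    lookups = 0
    while True:
        key = METADATA_KEY + (str(index) if index else "")
        lookups += 1
        if key not in xattrs:
            break  # the first missing key ends the scan
        metastr += xattrs[key]
        index += 1
    return (pickle.loads(metastr) if metastr else {}), lookups


# Metadata whose pickle was split across two keys.
blob = pickle.dumps({"big": "a" * 400})
xattrs = {METADATA_KEY: blob[:254], METADATA_KEY + "1": blob[254:]}
metadata, lookups = read_metadata(xattrs)
print(len(metadata["big"]), lookups)  # 400 3
```

Two keys cost three lookups here: one per key, plus the failed lookup of user.swift.metadata2 that ends the scan.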

The pickle code ignores any trailing data that is not part of the pickle, so the metadata is still read correctly.
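A quick check of that behavior with the standard pickle module: unpickling stops at the pickle's STOP opcode, so bytes appended after it, such as the value of a stranded key, are ignored. The stale bytes below are made up for illustration.

```python
import pickle

current = pickle.dumps({"small": "b"})
# Illustrative bytes that a stranded user.swift.metadata1 key might still hold.
stale = b"leftover tail of an older, longer pickle"
# pickle.loads stops at the STOP opcode; everything after it is ignored.
print(pickle.loads(current + stale))  # {'small': 'b'}
```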

With the second key stranded, its value is appended to the value of the first key before every unpickling. Until that key is removed, every metadata read requires extra system calls to fetch it.


Version-Release number of selected component (if applicable):

 RHS 2.0

How reproducible:

 Every time

Steps to Reproduce:

See the attached reproducer, test-metadata.sh, which creates three containers: the first has minimal metadata that fits in one xattr key/value pair; the second has enough metadata to require three xattr key/value pairs; and the third adds and then removes enough metadata that it once required three xattr key/value pairs but no longer does. As a result, the xattrs on the file system for the third container show all three keys, although only one is required.
  
Actual results:

 Three keys are created; two are stranded.

Expected results:

 Only one key exists.

Additional info:

This code was originally copied from the OpenStack Swift project, swift/obj/server.py, where metadata on a file is never rewritten, so Swift never encounters this problem. Swift writes a temporary '*.meta' file and renames it to replace the existing one when it updates metadata. We encounter the problem because we rewrite the metadata on the existing file.
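Since we rewrite in place, one possible way to reach the expected result is to delete leftover keys after each write. This is a minimal sketch under stated assumptions, not necessarily how the project fixed it: write_metadata_and_prune is a hypothetical name, and a dict stands in for the file's extended attributes (on a real file the deletes would be removexattr calls).

```python
import pickle

CHUNK_SIZE = 254  # taken from the byte ranges listed earlier
METADATA_KEY = "user.swift.metadata"


def key_name(index):
    return METADATA_KEY + (str(index) if index else "")


def write_metadata_and_prune(xattrs, metadata):
    """Hypothetical fix: write the chunks, then delete stale higher keys."""
    metastr = pickle.dumps(metadata)
    index = 0
    while metastr:
        xattrs[key_name(index)] = metastr[:CHUNK_SIZE]
        metastr = metastr[CHUNK_SIZE:]
        index += 1
    # Remove keys left by an earlier, longer write. Stopping at the first
    # missing key mirrors the reader, which never looks past that point.
    while key_name(index) in xattrs:
        del xattrs[key_name(index)]
        index += 1


xattrs = {}
write_metadata_and_prune(xattrs, {"big": "a" * 600})  # three keys
write_metadata_and_prune(xattrs, {"small": "b"})      # back to one key
print(sorted(xattrs))  # ['user.swift.metadata']
```

Using the reader's stop rule for pruning means any key the prune loop cannot reach is also invisible to reads, so it cannot corrupt the metadata.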

Note that this code relies on pyxattr, which performs two getxattr system calls to read each key: one to find the size of the value stored for that key, the second to fetch the actual value once a suitably sized buffer to hold it has been allocated.
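The cost of stranded keys can be estimated from that description. This sketch assumes two getxattr calls per existing key and one failing size query for the first missing key that ends the scan; getxattr_calls is a hypothetical helper, and the single-call cost of the missing key is an assumption, not a measurement.

```python
def getxattr_calls(keys_present):
    """Estimated getxattr calls for one chunked metadata read.

    Assumes two calls per existing key (size query, then fetch) plus one
    failing size query for the first missing key.
    """
    return 2 * keys_present + 1


# A container needing one key, versus the same container with two stranded keys.
print(getxattr_calls(1), getxattr_calls(3))  # 3 7
```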

Note that XFS limits each xattr value to 64 KB.

Also note that Swift's HTTP REST protocol has a default metadata limit (key lengths plus values) of 4096 bytes. The default limit on the length of an individual metadata value is 256 bytes, with at most 90 such values, for a total of 23,040 bytes storable by default. Since we use pickle to store these bytes, pickling adds some overhead on top of that.
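These limits translate into key counts as follows. The sketch assumes the 254-byte chunks listed earlier and ignores pickle overhead, which can only raise the count; keys_needed is a hypothetical helper.

```python
CHUNK_SIZE = 254  # bytes per xattr value, from the layout listed earlier


def keys_needed(length):
    """Number of chunked keys needed for `length` bytes (ceiling division)."""
    return (length + CHUNK_SIZE - 1) // CHUNK_SIZE


values_total = 90 * 256
print(values_total)               # 23040
print(keys_needed(4096))          # 17
print(keys_needed(values_total))  # 91
```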

Comment 1 Peter Portante 2012-10-11 22:28:00 UTC
Created attachment 625717 [details]
Creates three containers, adds metadata to two, removes from the third, third one left with multiple xattr keys

The script does not inspect the gluster file system directly; the user has to look at the keys on each container, for example:

$ python
Python 2.6.6 (r266:84292, May  1 2012, 13:52:17) 
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xattr
>>> xattr.list("/mnt/gluster-object/AUTH_ufo0/tcontainer01")
['user.swift.metadata', 'security.selinux']
>>> xattr.list("/mnt/gluster-object/AUTH_ufo0/tcontainer02")
['user.swift.metadata2', 'user.swift.metadata1', 'user.swift.metadata', 'security.selinux']
>>> xattr.list("/mnt/gluster-object/AUTH_ufo0/tcontainer03")
['user.swift.metadata2', 'user.swift.metadata1', 'user.swift.metadata', 'security.selinux']

Here, tcontainer03 has only enough metadata to require one xattr key (see the GET output for container 3 in the tarball generated by the test).
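Spotting the stranded keys in a listing like the one above can be automated. This is a minimal sketch: stranded_keys is a hypothetical helper that takes the list of xattr names (the output of xattr.list() above) and the number of keys the current metadata actually uses, and returns the metadata keys numbered at or beyond that count.

```python
import re

METADATA_KEY = "user.swift.metadata"
_PATTERN = re.compile(re.escape(METADATA_KEY) + r"(\d*)$")


def stranded_keys(names, keys_needed):
    """Return metadata keys whose index is >= keys_needed, sorted."""
    stranded = []
    for name in names:
        match = _PATTERN.match(name)
        if match:
            index = int(match.group(1) or 0)  # no suffix means key 0
            if index >= keys_needed:
                stranded.append(name)
    return sorted(stranded)


names = ['user.swift.metadata2', 'user.swift.metadata1',
         'user.swift.metadata', 'security.selinux']
print(stranded_keys(names, 1))  # ['user.swift.metadata1', 'user.swift.metadata2']
```

On Linux with Python 3.3 or later, os.listxattr(path) returns the same kind of list of names, so it could be passed in place of the hard-coded list.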

Comment 2 Peter Portante 2012-10-19 05:16:48 UTC
I'll take this one, as I am currently working on a set of changes to address this.

Comment 3 Vijay Bellur 2012-10-25 22:03:20 UTC
CHANGE: http://review.gluster.org/4109 (object-storage: reduce the number of getxattr system calls by one) merged in master by Anand Avati (avati)

