Bug 918437
| Summary: | timestamps updated after self-heal following primary brick rebuild | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Todd Stansell <todd+rhbugs> |
| Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.3.1 | CC: | gluster-bugs, jdarcy, joe |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 955469 (view as bug list) | Environment: | |
| Last Closed: | 2013-07-24 17:50:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 918917, 955469 | | |
| Attachments: | data.log TRACE output; self-heal trace log | | |
Description
Todd Stansell
2013-03-06 08:59:12 UTC
Created attachment 706248 [details]
data.log TRACE output
This is TRACE log output (data.log) while issuing a self-heal by running:
US-CA1 host13 (root) /var/log/glusterfs:$ ls -l /data/lib/Rcs.pm
-r--r--r-- 1 root root 42400 Mar 5 17:27 /data/lib/Rcs.pm
shortly thereafter the timestamp is updated:
US-CA1 host13 (root) /var/log/glusterfs:$ ls -la /gfs0/lib/Rcs.pm
-r--r--r-- 2 root root 42400 Mar 6 20:38 /gfs0/lib/Rcs.pm
US-CA1 host13 (root) /var/log/glusterfs:$ ls -l /data/lib/Rcs.pm
-r--r--r-- 1 root root 42400 Mar 6 20:38 /data/lib/Rcs.pm
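A quick way to watch for this while reproducing (illustrative only; the paths follow the listings above, with /data as the fuse mount and /gfs0 as the brick):

stat -c '%n mtime=%y' /data/lib/Rcs.pm /gfs0/lib/Rcs.pm

Run it before and after triggering the heal; a correct heal should leave the mtime untouched on both paths.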
If there are other logs that you think would be useful, let me know.
I'm sitting in the #gluster IRC channel with the nick tjstansell if folks would like to chat in real time.
Todd
JoeJulian built 3.3.0 for me so I could test that: http://koji.fedoraproject.org/koji/taskinfo?taskID=5086588
I used the rpms produced at http://kojipkgs.fedoraproject.org/work/tasks/6588/5086588/ and uninstalled my 3.3.1 packages, installed the 3.3.0 ones, updated the UUID so they didn't change on the servers, and started things back up. I did a self-heal on another directory and the timestamps there also got updated.
Todd

I have simplified things down to the following test case to replicate this problem:
1. stop all gluster processes on node A
2. nuke both links to a file on the brick:
US-CA1 host13 (root)$ getfattr -d -e hex -m "" /gfs0/brick1/bin/sync-data
getfattr: Removing leading '/' from absolute path names
# file: gfs0/brick1/bin/sync-data
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.gfid=0xabe895ece50241d28502952de49693e1
US-CA1 host13 (root)$ rm -f /gfs0/brick1/bin/sync-data /gfs0/brick1/.glusterfs/ab/e8/abe895ec-e502-41d2-8502-952de49693e1
3. start glusterd back up
4. self-heal the file
When I do this from host13 (the second brick in the list) it works as expected. When I do this from host14 (the primary brick), the file is healed, but the timestamp gets updated. At this point I've verified the same behavior with 3.3.0, 3.3.1, and 3.4.0alpha.
Todd

Here's the set of commands I ran to reproduce this on a clean install:
1. install glusterfs
host14$ mkdir /gfs0/brick13 /gfs0/brick14
host14$ yum install glusterfs glusterfs-server glusterfs-fuse
host14$ service glusterd start
host13$ mkdir /gfs0/brick13 /gfs0/brick14
host13$ yum install glusterfs glusterfs-server glusterfs-fuse
host13$ service glusterd start
2. build two identical gluster volumes, except for the ordering of bricks:
host14$ gluster peer probe host13
host14$ gluster peer status
host14$ gluster volume create gv13 replica 2 transport tcp host13:/gfs0/brick13 host14:/gfs0/brick13
host14$ gluster volume create gv14 replica 2 transport tcp host14:/gfs0/brick14 host13:/gfs0/brick14
host14$ gluster volume set gv13 nfs.disable on
host14$ gluster volume set gv14 nfs.disable on
host14$ gluster volume start gv13
host14$ gluster volume start gv14
3. mount the volumes (via /etc/fstab entries) and copy data into them
host13:/gv13 /gv13 glusterfs defaults 0 0
host13:/gv14 /gv14 glusterfs defaults 0 0
host13$ mkdir /gv13 /gv14
host13$ mount /gv13
host13$ mount /gv14
host13$ cp -p /etc/init.d/gluster* /gv13
host13$ cp -p /etc/init.d/gluster* /gv14
4. simulate disk failure
host13$ umount /gv13
host14$ umount /gv14
host13$ service glusterd stop
host13$ service glusterfsd stop
host13$ ps -ef | grep gluster
host13$ pkill gluster
host14$ service glusterd stop
host14$ service glusterfsd stop
host14$ ps -ef | grep gluster
host14$ pkill gluster
host14$ getfattr -d -e hex -m "" /gfs0/brick13/glusterfs-heal
host14$ rm -f /gfs0/brick13/glusterfs-heal /gfs0/brick13/.glusterfs/fc/bc/fcbc9c6b-d87b-4b4d-921d-ea6a8fa25d3f
host14$ getfattr -d -e hex -m "" /gfs0/brick14/glusterfs-heal
host14$ rm -f /gfs0/brick14/glusterfs-heal /gfs0/brick14/.glusterfs/66/0d/660d3734-04d0-438f-9746-a1d719ce6b00
host14$ service glusterd start
5. trigger self-heal and watch timestamps get updated, but only on /gv14:
US-CA1 host13 (root) /:$ ls -la /gv13 /gv14
/gv13:
total 10
drwxr-xr-x 3 root root 98 Mar 7 00:38 .
dr-xr-xr-x. 26 root root 4096 Mar 7 00:37 ..
-rwxr-xr-x 1 root root 1979 Feb 8 13:32 glusterd
-rwxr-xr-x 1 root root 468 Mar 4 23:55 glusterfs-heal
-rwxr-xr-x 1 root root 572 Mar 4 23:55 glusterfs-mount
-rwxr-xr-x 1 root root 2019 Feb 8 13:32 glusterfsd
/gv14:
total 10
drwxr-xr-x 3 root root 98 Mar 7 00:38 .
dr-xr-xr-x. 26 root root 4096 Mar 7 00:37 ..
-rwxr-xr-x 1 root root 1979 Feb 8 13:32 glusterd
-rwxr-xr-x 1 root root 468 Mar 4 23:55 glusterfs-heal
-rwxr-xr-x 1 root root 572 Mar 4 23:55 glusterfs-mount
-rwxr-xr-x 1 root root 2019 Feb 8 13:32 glusterfsd
US-CA1 host13 (root) /:$ ls -la /gv13 /gv14
/gv13:
total 10
drwxr-xr-x 3 root root 98 Mar 7 00:38 .
dr-xr-xr-x. 26 root root 4096 Mar 7 00:37 ..
-rwxr-xr-x 1 root root 1979 Feb 8 13:32 glusterd
-rwxr-xr-x 1 root root 468 Mar 4 23:55 glusterfs-heal
-rwxr-xr-x 1 root root 572 Mar 4 23:55 glusterfs-mount
-rwxr-xr-x 1 root root 2019 Feb 8 13:32 glusterfsd
/gv14:
total 10
drwxr-xr-x 3 root root 98 Mar 7 00:38 .
dr-xr-xr-x. 26 root root 4096 Mar 7 00:37 ..
-rwxr-xr-x 1 root root 1979 Feb 8 13:32 glusterd
-rwxr-xr-x 1 root root 468 Mar 7 00:41 glusterfs-heal
-rwxr-xr-x 1 root root 572 Mar 4 23:55 glusterfs-mount
-rwxr-xr-x 1 root root 2019 Feb 8 13:32 glusterfsd

And then doing the same thing, but from the other node, it affects only /gv13 (since its primary brick is on host13):
host13$ service glusterd stop
host13$ service glusterfsd stop
host13$ pkill gluster
host13$ rm -f /gfs0/brick13/glusterfs-heal /gfs0/brick13/.glusterfs/fc/bc/fcbc9c6b-d87b-4b4d-921d-ea6a8fa25d3f
host13$ rm -f /gfs0/brick14/glusterfs-heal /gfs0/brick14/.glusterfs/66/0d/660d3734-04d0-438f-9746-a1d719ce6b00
host13$ service glusterd start
see only /gv13 affected:
US-CA1 host13 (root) /:$ date; ls -l /gv*/glusterfs-heal
Thu Mar 7 01:18:27 UTC 2013
-rwxr-xr-x 1 root root 468 Mar 4 23:55 /gv13/glusterfs-heal
-rwxr-xr-x 1 root root 468 Mar 7 00:41 /gv14/glusterfs-heal
US-CA1 host13 (root) /:$ date; ls -l /gv*/glusterfs-heal
Thu Mar 7 01:18:33 UTC 2013
-rwxr-xr-x 1 root root 468 Mar 7 01:18 /gv13/glusterfs-heal
-rwxr-xr-x 1 root root 468 Mar 7 00:41 /gv14/glusterfs-heal

I've done some more testing using a 3-brick replicate cluster locally on one host. It will always use the timestamp from brick1 when repairing the other bricks. In the case of removing a file from brick2 and touching it on brick3 (so it was different from brick1), all files will get healed to brick1's timestamp. When removing the file from brick1, it will always update all 3 bricks to the timestamp of when it restored the data to brick1.
It looks to me like the metadata heal always happens after the data heal, and the metadata heal *always* uses the first brick as the authoritative source for metadata.
Interestingly, if I chown the file in brick1 and touch the file in brick2 and brick3 so they all have separate metadata, it seems to randomly choose a brick to use as the authoritative source, which is probably fine. It's a split-brain scenario for metadata, so it just chooses one.
before:
US-CA1 host13 (root) /:$ ls -la --full-time /gfs0/brick?/g*
-rwxr-xr-x 2 bstansell root 468 2013-03-07 15:33:10.225361000 +0000 /gfs0/brick2/glusterfs-heal
-rwxr-xr-x 2 dstuit root 468 2013-03-07 15:33:10.225361533 +0000 /gfs0/brick3/glusterfs-heal
after:
US-CA1 host13 (root) /:$ ls -la --full-time /gfs0/brick?/g*
-rwxr-xr-x 2 bstansell root 468 2013-03-07 15:41:15.611249000 +0000 /gfs0/brick1/glusterfs-heal
-rwxr-xr-x 2 bstansell root 468 2013-03-07 15:41:15.611249000 +0000 /gfs0/brick2/glusterfs-heal
-rwxr-xr-x 2 bstansell root 468 2013-03-07 15:41:15.611249000 +0000 /gfs0/brick3/glusterfs-heal
Any time it's restoring the file to brick1, it updates the timestamp.

Here's a fun one... simulating brick1 going offline briefly.
1. kill the glusterfs process for brick1
2. update the file in /gv3
US-CA1 host13 (root) /:$ cp -p /etc/init.d/glusterfsd /gv3/glusterfs-heal
3. see the discrepancy:
US-CA1 host13 (root) /:$ ls -la --full-time /gfs0/brick?/g*
-rwxr-xr-x 2 root root 468 2013-03-07 15:49:19.845462000 +0000 /gfs0/brick1/glusterfs-heal
-rwxr-xr-x 2 root root 2019 2013-02-08 13:32:41.000000000 +0000 /gfs0/brick2/glusterfs-heal
-rwxr-xr-x 2 root root 2019 2013-02-08 13:32:41.000000000 +0000 /gfs0/brick3/glusterfs-heal
4. restart glusterd to bring brick1 back online
US-CA1 host13 (root) /:$ service glusterd stop
Stopping glusterd: [ OK ]
US-CA1 host13 (root) /:$ service glusterfsd stop
Stopping glusterfsd: [ OK ]
US-CA1 host13 (root) /:$ service glusterd start
Starting glusterd: [ OK ]
5. check out self-heal results:
US-CA1 host13 (root) /:$ ls -la --full-time /gfs0/brick?/g*
-rwxr-xr-x 2 root root 2019 2013-03-07 15:53:39.102271000 +0000 /gfs0/brick1/glusterfs-heal
-rwxr-xr-x 2 root root 2019 2013-03-07 15:53:39.102271000 +0000 /gfs0/brick2/glusterfs-heal
-rwxr-xr-x 2 root root 2019 2013-03-07 15:53:39.102271000 +0000 /gfs0/brick3/glusterfs-heal
It used the newly updated data, but updated all of the timestamps.
Todd

I'm marking this as urgent, since it involves corruption of metadata.
Todd

I've been digging through trace logs and found the following. Not sure if this is the issue or not.
The logs show metadata+data self-healing twice:
[2013-03-07 17:00:46.972953] D [afr-common.c:1211:afr_lookup_set_self_heal_params_by_xattr] 0-gv3-replicate-0: metadata self-heal is pending for <gfid:78a027b5-dae3-4851-904c-926034438fb4>.
[2013-03-07 17:00:46.972958] D [afr-common.c:1225:afr_lookup_set_self_heal_params_by_xattr] 0-gv3-replicate-0: data self-heal is pending for <gfid:78a027b5-dae3-4851-904c-926034438fb4>.
[2013-03-07 17:00:46.972963] D [afr-common.c:1211:afr_lookup_set_self_heal_params_by_xattr] 0-gv3-replicate-0: metadata self-heal is pending for <gfid:78a027b5-dae3-4851-904c-926034438fb4>.
[2013-03-07 17:00:46.972969] D [afr-common.c:1225:afr_lookup_set_self_heal_params_by_xattr] 0-gv3-replicate-0: data self-heal is pending for <gfid:78a027b5-dae3-4851-904c-926034438fb4>.
which is fine ... it then proceeds to do the first metadata check ...
[2013-03-07 17:00:46.973038] T [afr-self-heal-common.c:2357:afr_self_heal] 0-gv3-replicate-0: proceeding to metadata check on <gfid:78a027b5-dae3-4851-904c-926034438fb4>
[2013-03-07 17:00:46.973070] T [afr-self-heal-common.c:1027:afr_sh_missing_entries_done] 0-gv3-replicate-0: proceeding to metadata check on <gfid:78a027b5-dae3-4851-904c-926034438fb4>
but fails to attain non-blocking locks ...
[2013-03-07 17:00:46.973924] T [afr-lk-common.c:1472:afr_nonblocking_inodelk_cbk] 0-gv3-replicate-0: Last inode locking reply received
[2013-03-07 17:00:46.973930] T [afr-lk-common.c:1486:afr_nonblocking_inodelk_cbk] 0-gv3-replicate-0: 0 servers locked. Trying again with blocking calls
[2013-03-07 17:00:46.973937] D [afr-lk-common.c:456:transaction_lk_op] 0-gv3-replicate-0: lk op is for a self heal
[2013-03-07 17:00:46.973944] T [afr-lk-common.c:716:afr_unlock_inodelk] 0-gv3-replicate-0: No internal locks unlocked
[2013-03-07 17:00:46.973950] D [afr-self-heal-metadata.c:478:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-gv3-replicate-0: Non Blocking metadata inodelks failed for <gfid:78a027b5-dae3-4851-904c-926034438fb4>.
[2013-03-07 17:00:46.973956] D [afr-self-heal-metadata.c:480:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-gv3-replicate-0: Metadata self-heal failed for <gfid:78a027b5-dae3-4851-904c-926034438fb4>.
but instead of retrying with blocking locks, like it says, it simply moves on to the data heal:
[2013-03-07 17:00:46.973961] D [afr-self-heal-metadata.c:61:afr_sh_metadata_done] 0-gv3-replicate-0: proceeding to data check on <gfid:78a027b5-dae3-4851-904c-926034438fb4>
though later on it does try to sync metadata again:
[2013-03-07 17:00:46.974540] T [afr-self-heal-metadata.c:355:afr_sh_metadata_sync_prepare] 0-gv3-replicate-0: syncing metadata of <gfid:78a027b5-dae3-4851-904c-926034438fb4> from subvolume gv3-client-1 to 1 active sinks
...
[2013-03-07 17:00:46.975364] D [afr-self-heal-metadata.c:267:afr_sh_metadata_sync] 0-gv3-replicate-0: self-healing metadata of <gfid:78a027b5-dae3-4851-904c-926034438fb4> from gv3-client-1 to gv3-client-0
but then see the same failure to get non-blocking locks:
[2013-03-07 17:00:46.975849] D [afr-self-heal-data.c:1134:afr_sh_data_post_nonblocking_inodelk_cbk] 0-gv3-replicate-0: Non Blocking data inodelks failed for <gfid:78a027b5-dae3-4851-904c-926034438fb4>. by bc60dc7ce87f0000
Interestingly, the pending_matrix shows -2 for the 2 bricks where the data was being synced from (not that I understand the meaning of this):
[2013-03-07 17:00:46.976090] D [afr-self-heal-common.c:2510:afr_sh_erase_pending] 0-gv3-replicate-0: Delta matrix for: 7089dc7ce87f0000
[2013-03-07 17:00:46.976107] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gv3-replicate-0: pending_matrix: [ 0 0 0 ]
[2013-03-07 17:00:46.976121] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gv3-replicate-0: pending_matrix: [ -2 0 0 ]
[2013-03-07 17:00:46.976134] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gv3-replicate-0: pending_matrix: [ -2 0 0 ]
and then proceeds to data check ...
[2013-03-07 17:00:46.977685] D [afr-self-heal-metadata.c:61:afr_sh_metadata_done] 0-gv3-replicate-0: proceeding to data check on <gfid:78a027b5-dae3-4851-904c-926034438fb4>
I'll attach the full log in a bit.
Todd

Created attachment 706770 [details]
self-heal trace log
Here's the self-heal log with trace enabled that I referenced in the previous comment.
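(For anyone wanting to reproduce logs like these: trace-level logging can normally be enabled per volume with the diagnostics log-level options, for example as shown below. This is a hint added for context, not something stated in the original comments.)

gluster volume set gv3 diagnostics.client-log-level TRACE
gluster volume set gv3 diagnostics.brick-log-level TRACE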
Todd
Todd,
I tried to re-create the issue locally on my machine, but I am not able to :-(. Here are the two things I tried using the test framework we have. Please let me know what I am missing.
$CLI is gluster --mode=script
$V0 is name of the volume
$H0 is ip/hostname of the machine
$M0 is the mount point
$B0 is the directory in which bricks reside in the script.
TEST fails if the command fails
EXPECT <expected-value> <command-which-is-supposed-to-give-value>
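For readers unfamiliar with the framework, the wrappers above behave roughly like the following simplified sketch (the real definitions live in the framework's include.rc; this is an approximation, not the actual implementation):

function TEST {
    # eval so that "TEST var=value" assignments stick in the calling shell
    eval "$@" > /dev/null 2>&1 || echo "FAIL: $*"
}
function EXPECT {
    local expected=$1; shift
    local got=$(eval "$@")
    [ "$got" = "$expected" ] || echo "FAIL: expected '$expected', got '$got'"
}
function EXPECT_WITHIN {
    # retry the command once a second until it returns the expected value or we time out
    local timeout=$1 expected=$2; shift 2
    local i
    for ((i = 0; i < timeout; i++)); do
        [ "$(eval "$@")" = "$expected" ] && return 0
        sleep 1
    done
    echo "FAIL: expected '$expected' within ${timeout}s"
}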
Test 1:
#!/bin/bash
. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc
cleanup;
TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}0 $H0:$B0/${V0}1
TEST $CLI volume set $V0 cluster.self-heal-daemon off
TEST $CLI volume start $V0
TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 --attribute-timeout=0 --entry-timeout=0
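# (comment added for clarity; not part of the original script) attribute/entry
# timeouts are set to 0 so every lookup/stat from the mount reaches the bricks
# instead of being served from the FUSE cache, which is what lets the
# "find | xargs stat" below trigger lookup-based self-heal.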
#stat gives the modify/access times; we extract the timestamp for the requested type
function get_timestamp {
local f=$1
local type=$2
stat $f | grep $type | grep -v Uid | awk '{print $3}' | cut -f1 -d'.'
}
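# Note (added for clarity; not part of the original script): coreutils stat prints
# e.g. "Modify: 2013-03-07 15:41:15.611249000 +0000", so the pipeline above takes
# field 3 and strips the fractional part, returning just the time of day, e.g.
#   get_timestamp $B0/${V0}0/a Modify  ->  15:41:15
# The "grep -v Uid" matters for the Access case: it skips the
# "Access: (0644/-rw-r--r--)  Uid: ..." mode line so only the access-time line remains.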
#kill brick1
TEST kill_brick $V0 $H0 $B0/${V0}1
#file is created only in brick0; healing will happen from brick0 to brick1
echo "abc" > $M0/a
TEST $CLI volume start $V0 force
#wait till brick1 comes up
EXPECT_WITHIN 30 "1" afr_child_up_status $V0 1
#kill brick0
TEST kill_brick $V0 $H0 $B0/${V0}0
#file is created only in brick1; healing will happen from brick1 to brick0
echo "abc" > $M0/b
TEST $CLI volume start $V0 force
#wait till brick0 comes up
EXPECT_WITHIN 30 "1" afr_child_up_status $V0 0
#get time stamps before self-heal
TEST access_tstampa_0=$(get_timestamp $B0/${V0}0/a Access)
TEST modify_tstampa_0=$(get_timestamp $B0/${V0}0/a Modify)
TEST access_tstampb_1=$(get_timestamp $B0/${V0}1/b Access)
TEST modify_tstampb_1=$(get_timestamp $B0/${V0}1/b Modify)
#wait 2 seconds so we can tell whether the timestamps differ after healing
sleep 2
#trigger self-heals
find $M0 | xargs stat 1>/dev/null
#get time stamps after self-heal on the bricks they are healed to
TEST access_tstampa_1=$(get_timestamp $B0/${V0}1/a Access)
TEST modify_tstampa_1=$(get_timestamp $B0/${V0}1/a Modify)
TEST access_tstampb_0=$(get_timestamp $B0/${V0}0/b Access)
TEST modify_tstampb_0=$(get_timestamp $B0/${V0}0/b Modify)
#check if the timestamps match or not.
EXPECT $access_tstampa_0 echo $access_tstampa_1
EXPECT $modify_tstampa_0 echo $modify_tstampa_1
EXPECT $access_tstampb_0 echo $access_tstampb_1
EXPECT $modify_tstampb_0 echo $modify_tstampb_1
cleanup;
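(Not part of the comment: test scripts like these normally live under tests/ in the glusterfs source tree and are run with prove; the filename below is hypothetical.)

prove -vf tests/bugs/bug-918437.t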
Test-2:
#!/bin/bash
. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc
cleanup;
#Simulates a drive replacement by deleting a file and its gfid link, and then
#checks if the timestamps match after self-heal
TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}0 $H0:$B0/${V0}1
TEST $CLI volume set $V0 cluster.self-heal-daemon off
TEST $CLI volume start $V0
TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 --attribute-timeout=0 --entry-timeout=0
function get_timestamp {
local f=$1
local type=$2
stat $f | grep $type | grep -v Uid | awk '{print $3}' | cut -f1 -d'.'
}
echo "abc" > $M0/a
echo "abc" > $M0/b
TEST kill_brick $V0 $H0 $B0/${V0}0
#deletes both the file and its gfid link from .glusterfs (a rough sketch of this helper follows after the script)
TEST gf_rm_file_and_gfid_link $B0/${V0}0/a
TEST gf_rm_file_and_gfid_link $B0/${V0}0/b
TEST access_tstampa_1=$(get_timestamp $B0/${V0}1/a Access)
TEST modify_tstampa_1=$(get_timestamp $B0/${V0}1/a Modify)
TEST access_tstampb_1=$(get_timestamp $B0/${V0}1/b Access)
TEST modify_tstampb_1=$(get_timestamp $B0/${V0}1/b Modify)
TEST $CLI volume start $V0 force
EXPECT_WITHIN 30 "1" afr_child_up_status $V0 0
sleep 2
find $M0 | xargs stat 1>/dev/null
TEST access_tstampa_0=$(get_timestamp $B0/${V0}0/a Access)
TEST modify_tstampa_0=$(get_timestamp $B0/${V0}0/a Modify)
TEST access_tstampb_0=$(get_timestamp $B0/${V0}0/b Access)
TEST modify_tstampb_0=$(get_timestamp $B0/${V0}0/b Modify)
EXPECT $access_tstampa_0 echo $access_tstampa_1
EXPECT $modify_tstampa_0 echo $modify_tstampa_1
EXPECT $access_tstampb_0 echo $access_tstampb_1
EXPECT $modify_tstampb_0 echo $modify_tstampb_1
cleanup;
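For context, gf_rm_file_and_gfid_link used in Test 2 above comes from the framework's volume.rc and removes a file from a brick together with its hard link under .glusterfs, which is the same thing Todd did by hand earlier. A manual equivalent looks roughly like the sketch below, assuming the standard .glusterfs/<xx>/<yy>/<gfid> layout and example paths taken from the earlier comments; it is not the framework's exact code:

brick=/gfs0/brick1            # brick root (example)
f=$brick/bin/sync-data        # file to wipe from this brick
# read the gfid xattr, e.g. trusted.gfid=0xabe895ece50241d28502952de49693e1
gfid_hex=$(getfattr -n trusted.gfid -e hex "$f" 2>/dev/null | grep trusted.gfid | cut -d= -f2 | cut -c3-)
# turn the 32 hex chars into the dashed UUID form used for the .glusterfs link name
gfid=$(echo "$gfid_hex" | sed -r 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/')
rm -f "$f" "$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"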
Here's the 2-brick test case:
US-CA1 host13 (root) /var/log/glusterfs:$ mkdir /gfs0/brick0{1,2}
US-CA1 host13 (root) /var/log/glusterfs:$ gluster volume create gv01 replica 2 transport tcp host13:/gfs0/brick01 host13:/gfs0/brick02
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume? (y/n) y
volume create: gv01: success: please start the volume to access data
US-CA1 host13 (root) /var/log/glusterfs:$ gluster volume set gv01 nfs.disable on
volume set: success
US-CA1 host13 (root) /var/log/glusterfs:$ gluster volume start gv01
volume start: gv01: success
US-CA1 host13 (root) /var/log/glusterfs:$ mount -t glusterfs host13:/gv01 /gv01
US-CA1 host13 (root) /var/log/glusterfs:$ cp -p /etc/init.d/glusterd /gv01
US-CA1 host13 (root) /var/log/glusterfs:$ ls -la --full-time /gfs0/brick0?/g*
-rwxr-xr-x 2 root root 1979 2013-02-08 13:32:41.000000000 +0000 /gfs0/brick01/glusterd
-rwxr-xr-x 2 root root 1979 2013-02-08 13:32:41.000000000 +0000 /gfs0/brick02/glusterd
US-CA1 host13 (root) /var/log/glusterfs:$ ps -ef | grep brick01
root 4926 1 0 04:48 ? 00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id gv01.host13.gfs0-brick01 -p /var/lib/glusterd/vols/gv01/run/host13-gfs0-brick01.pid -S /var/run/a7c26a6374c851d7baf4c925dfd87d0f.socket --brick-name /gfs0/brick01 -l /var/log/glusterfs/bricks/gfs0-brick01.log --xlator-option *-posix.glusterd-uuid=860e1ecd-78c7-4a99-a642-e8eac0919a56 --brick-port 49159 --xlator-option gv01-server.listen-port=49159
root 5012 7522 0 04:48 pts/0 00:00:00 grep brick01
US-CA1 host13 (root) /var/log/glusterfs:$ kill 4926
US-CA1 host13 (root) /var/log/glusterfs:$ rm -f /gfs0/brick01/glusterd /gfs0/brick01/.glusterfs/
00/ b5/ indices/ landfill/
US-CA1 host13 (root) /var/log/glusterfs:$ rm -f /gfs0/brick01/glusterd /gfs0/brick01/.glusterfs/b5/5b/b55b1a9e-0ebd-4d8a-8837-3f5d1c9977be
US-CA1 host13 (root) /var/log/glusterfs:$ ls -la --full-time /gfs0/brick0?/g*
-rwxr-xr-x 2 root root 1979 2013-02-08 13:32:41.000000000 +0000 /gfs0/brick02/glusterd
US-CA1 host13 (root) /var/log/glusterfs:$ gluster volume start gv01 force
volume start: gv01: success
US-CA1 host13 (root) /var/log/glusterfs:$ ls -la --full-time /gfs0/brick0?/g*
-rwxr-xr-x 2 root root 1979 2013-02-08 13:32:41.000000000 +0000 /gfs0/brick02/glusterd
US-CA1 host13 (root) /var/log/glusterfs:$ gluster volume heal gv01 full
Launching Heal operation on volume gv01 has been successful
Use heal info commands to check status
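(For completeness, not part of Todd's transcript: that status check would be)
gluster volume heal gv01 info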
US-CA1 host13 (root) /var/log/glusterfs:$ ls -la --full-time /gfs0/brick0?/g*
-rwxr-xr-x 2 root root 1979 2013-03-08 04:49:49.645296000 +0000 /gfs0/brick01/glusterd
-rwxr-xr-x 2 root root 1979 2013-03-08 04:49:49.645296000 +0000 /gfs0/brick02/glusterd
Todd
CHANGE: http://review.gluster.org/4658 (cluster/afr: Preserve mtime in self-heal) merged in master by Vijay Bellur (vbellur)

Will this fix be backported to 3.3.1, or will it only be available in some future 3.4 build? I'm looking for the most stable way to be able to use this soon on CentOS 6.
Todd

Given that this is a fix for data corruption, I'm officially asking for this to get backported to at least the last stable release, 3.3.1. I'm not sure how to officially request this, but it doesn't make sense to force people to upgrade to 3.4 to get the fix.
Todd

+1. Please backport bugfixes to 3.3.

(In reply to comment #15)
> Given that this is a fix for data corruption, I'm officially asking for this
> to get backported to at least the last stable release, 3.3.1. I'm not sure
> how to officially request this, but it doesn't make sense to force people to
> upgrade to 3.4 to get the fix.
>
> Todd
Right now requesting backports is ad hoc, but we'll document a process for it. Keep an eye on gluster-users.

http://review.gluster.com/4663
http://review.gluster.com/4664
Those are the backport patches for 3.4 and 3.3.
Pranith

Thanks for the backports. Any idea how long it might take to get reviews for these? We're trying to use 3.3.1, but without this patch we can't feel confident, and this piece of our project is on hold. So I'm just curious how long I should expect the backport review to take so I can prepare.
I know I could build it myself, but I'd much rather have a package that's built using kkeithley's standard build options/environment. Therefore, I'm waiting for this to go in.
Thanks for the great turnaround on this bugfix, by the way!
Todd

COMMIT: http://review.gluster.org/4664 committed in release-3.3 by Vijay Bellur (vbellur)
------
commit a216f5f44675bfe189c318171dbc50e1c19bfc26
Author: Anand Avati <avati>
Date: Tue Apr 9 17:22:01 2013 -0700

tests: fix further issues with bug-874498.t

The failure of bug-874498.t seems to be a "bug" in glustershd. The situation seems to be when both subvolumes of a replica are "local" to glustershd, and in such cases glustershd is sensitive to the order in which the subvols come up.

The core of the issue itself is that, without the patch (#4784), self-heal daemon completes the processing of index and no entries are left inside the xattrop index after a few seconds of volume start force. However with the patch, the stale "backing file" (against which index performs link()) is left. The likely reason is that an "INDEX" based crawl is not happening against the subvol when this patch is applied.

Before #4784 patch, the order in which subvols came up was:

[2013-04-09 22:55:35.117679] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-0: Connected to 10.3.129.13:49156, attached to remote volume '/d/backends/brick1'.
...
[2013-04-09 22:55:35.118399] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-1: Connected to 10.3.129.13:49157, attached to remote volume '/d/backends/brick2'.

However, with the patch, the order is reversed:

[2013-04-09 22:53:34.945370] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-1: Connected to 10.3.129.13:49153, attached to remote volume '/d/backends/brick2'.
...
[2013-04-09 22:53:34.950966] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-0: Connected to 10.3.129.13:49152, attached to remote volume '/d/backends/brick1'.

The index in brick2 has the list of files/gfid to heal.
It appears to be the case that when brick1 is the first subvol to be detected as coming up, somehow an INDEX based crawl is clearing all the index entries in brick2, but if brick2 comes up as the first subvol, then the backing file is left stale.

Also, doing a "gluster volume heal full" seems to leave out stale backing files too. As the crawl is performed on the namespace and the backing file is never encountered there to get cleared out.

So the interim (possibly permanent) fix is to have the script issue a regular self-heal command (and not a "full" one).

The failure of the script itself is non-critical. The data files are all healed, and it is just the backing file which is left behind. The stale backing file too gets cleared in the next index based healing, either triggered manually or after 10mins.

Change-Id: I5deb79652ef449b7e88684311e804a8a2aa4725d
BUG: 874498
Signed-off-by: Anand Avati <avati>
Reviewed-on: http://review.gluster.org/4798
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Jeff Darcy <jdarcy>

COMMIT: http://review.gluster.org/4663 committed in release-3.4 by Anand Avati (avati)
------
commit 77a02c4dd8467e2a78b3ab3cdef95178ef4b1898
Author: Kaleb S. KEITHLEY <kkeithle>
Date: Thu Apr 11 09:36:35 2013 -0400

object-storage: rebase Swift to 1.8.0 (grizzly)

Two minor tweaks found while packaging 3.4.0-0.1.alpha2 for Fedora 19

BUG: 948039
Change-Id: I97175636164702cf4042bc4a18ffead76ad386cb
Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
Reviewed-on: http://review.gluster.org/4807
Reviewed-by: Jeff Darcy <jdarcy>
Tested-by: Gluster Build System <jenkins.com>