Bug 1642638 - Log-file rotation on a Disperse Volume with a failed brick results in files that cannot be healed.
Summary: Log-file rotation on a Disperse Volume with a failed brick results in files that cannot be healed.
Keywords:
Status: CLOSED DUPLICATE of bug 1359681
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.12
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-24 21:04 UTC by Jeff Byers
Modified: 2018-10-31 12:49 UTC
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-10-31 12:49:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jeff Byers 2018-10-24 21:04:28 UTC
Log-file rotation on a Disperse Volume with a failed brick results in files that cannot be healed.

Simple log-file rotation on a Disperse Volume while there is a
failed brick results in files that cannot later be healed when
the brick is restored.

The rotation here simply closes and renames the old file and
creates a new file with the old name in the same directory.
What appears to happen is that on the failed and later
restored brick, the new file and the renamed old file end up
hard-linked together.
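
In shell terms, the rotation amounts to a rename followed by
recreating the old name on the volume mount. A minimal sketch
(the file names here are illustrative; the test plan below
does the same thing with the 'messages' file):

##### Illustrative rotation sequence (example file names).
# mv /mnt/disperse-vol/app.log /mnt/disperse-vol/app.log.1
# : > /mnt/disperse-vol/app.log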

One way to manually get the file to heal is to copy the file
to a temporary name, delete the original, and move the
temporary file back to the original name. However, this is
not an acceptable procedure when the application has the file
open.
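
For example, using the 'messages' file from the test plan
below (a sketch only; this cannot be done safely while the
application still has the file open):

##### Copy to a temporary name, delete the original, move the copy back.
# cp /mnt/disperse-vol/messages /mnt/disperse-vol/messages.tmp
# rm /mnt/disperse-vol/messages
# mv /mnt/disperse-vol/messages.tmp /mnt/disperse-vol/messages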

Another manual procedure that seems to work is to delete both
the old-name and new-name files from the previously failed
brick and then run a heal.
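
For example, on the previously failed brick3 from the test
plan below, where the old-name and new-name files have become
hard links of a single inode, one pass over those hard links
removes them all before re-running the heal (this is
essentially what the last step of the test plan does):

##### Remove every hard link of the bad file on the failed brick, then heal.
# find /exports/nas-segment-0002/brick3 -samefile /exports/nas-segment-0002/brick3/messages -delete
# gluster volume heal disperse-vol full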

This problem is likely related to:

  Bug 1359681 – [disperse] Data gain while brick is down and rename a file
  https://bugzilla.redhat.com/show_bug.cgi?id=1359681
  
with the added issue that the heal fails persistently and
manual intervention is required.

Test plan below.

# gluster --version
glusterfs 3.12.14
##### Create the disperse volume.
# mkdir brick1 brick2 brick3
# gluster volume create disperse-vol disperse-data 2 redundancy 1 transport tcp \
    10.10.60.181:/exports/nas-segment-0002/brick1 \
    10.10.60.181:/exports/nas-segment-0002/brick2 \
    10.10.60.181:/exports/nas-segment-0002/brick3 force
volume create: disperse-vol: success: please start the volume to access data
# gluster volume start disperse-vol
volume start: disperse-vol: success
# mkdir /mnt/disperse-vol
# mount -t glusterfs 10.10.60.181:/disperse-vol /mnt/disperse-vol/

##### Place a test file, verify the contents.
# cp -i /var/log/messages /mnt/disperse-vol/
# sum /mnt/disperse-vol/messages
36906  4082
# gluster volume status disperse-vol
Status of volume: disperse-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.60.181:/exports/nas-segment-000
2/brick1                                    62001     0          Y       24636
Brick 10.10.60.181:/exports/nas-segment-000
2/brick2                                    62008     0          Y       24656
Brick 10.10.60.181:/exports/nas-segment-000
2/brick3                                    62009     0          Y       24676
Self-heal Daemon on localhost               N/A       N/A        Y       24717
Task Status of Volume disperse-vol
------------------------------------------------------------------------------
There are no active volume tasks

##### Simulate a brick failure by killing the brick process for brick3.
# kill 24676
##### Verify the contents.
# sum /mnt/disperse-vol/messages
36906  4082
##### Manually simulate a logfile rotation.
# mv /mnt/disperse-vol/messages /mnt/disperse-vol/messages-1
# cp -i /var/log/boot.log /mnt/disperse-vol/messages
##### Verify the contents.
# sum /mnt/disperse-vol/messages
58391     5
# sum /mnt/disperse-vol/messages-1
36906  4082

##### Fix the brick failure by force starting disperse-vol.
# gluster volume start disperse-vol force
volume start: disperse-vol: success

##### Force a heal, and note that the new logfile cannot be healed.
# gluster volume heal disperse-vol full
# gluster volume heal disperse-vol info
Brick 10.10.60.181:/exports/nas-segment-0002/brick1
/
/messages
Status: Connected
Number of entries: 2
Brick 10.10.60.181:/exports/nas-segment-0002/brick2
/
/messages
Status: Connected
Number of entries: 2
Brick 10.10.60.181:/exports/nas-segment-0002/brick3
Status: Connected
Number of entries: 0

##### 'Not able to heal' messages in the GlusterFS self-heal daemon log file.
[2018-10-24 20:19:25.611393] I [MSGID: 122059] [ec-heald.c:407:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-0
[2018-10-24 20:19:25.611824] I [MSGID: 122059] [ec-heald.c:407:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-1
[2018-10-24 20:19:25.612487] I [MSGID: 122059] [ec-heald.c:407:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-2
[2018-10-24 20:19:25.619637] I [MSGID: 122002] [ec-heal.c:1343:__ec_heal_name] 0-disperse-vol-disperse-0: 00000000-0000-0000-0000-000000000001/messages: Not able to heal
[2018-10-24 20:19:25.628517] I [MSGID: 122059] [ec-heald.c:416:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full sweep on subvol disperse-vol-client-0
[2018-10-24 20:19:25.629298] I [MSGID: 122002] [ec-heal.c:1343:__ec_heal_name] 0-disperse-vol-disperse-0: 00000000-0000-0000-0000-000000000001/messages: Not able to heal
[2018-10-24 20:19:25.633801] I [MSGID: 122059] [ec-heald.c:416:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full sweep on subvol disperse-vol-client-2
[2018-10-24 20:19:25.636178] I [MSGID: 122002] [ec-heal.c:1343:__ec_heal_name] 0-disperse-vol-disperse-0: 00000000-0000-0000-0000-000000000001/messages: Not able to heal
[2018-10-24 20:19:25.637871] I [MSGID: 122059] [ec-heald.c:416:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full sweep on subvol disperse-vol-client-1

##### Verify the contents.
# sum /mnt/disperse-vol/messages
58391     5
# sum /mnt/disperse-vol/messages-1
36906  4082

##### Dump the 'messages' file attributes. Note that the previously failed brick has 2 gfid2paths.
# getfattr --absolute-names -m '.*' -d -e hex /exports/nas-segment-0002/brick*/messages
# file: /exports/nas-segment-0002/brick1/messages
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x000000000000128e
trusted.ec.version=0x00000000000000010000000000000001
trusted.gfid=0x4d041ae05fd649e0a9ee6fc502899c73
trusted.gfid2path.f07d8a47ec444a18=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages" (in text)

# file: /exports/nas-segment-0002/brick2/messages
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x000000000000128e
trusted.ec.version=0x00000000000000010000000000000001
trusted.gfid=0x4d041ae05fd649e0a9ee6fc502899c73
trusted.gfid2path.f07d8a47ec444a18=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages" (in text)

# file: /exports/nas-segment-0002/brick3/messages
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000003fc5d1
trusted.ec.version=0x00000000000000200000000000000020
trusted.gfid=0x1d288eab6fca4561a162f0a6b13a6d3a
trusted.gfid2path.084a6f56cf401310=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d657373616765732d31
trusted.gfid2path.f07d8a47ec444a18=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573
trusted.gfid2path.084a6f56cf401310="00000000-0000-0000-0000-000000000001/messages-1" (in text)
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages" (in text)

##### Dump the 'messages-1' file attributes. Note that the previously failed brick has 2 gfid2paths.
# getfattr --absolute-names -m '.*' -d -e hex /exports/nas-segment-0002/brick*/messages-1
# file: /exports/nas-segment-0002/brick1/messages-1
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000003fc5d1
trusted.ec.version=0x00000000000000200000000000000020
trusted.gfid=0x1d288eab6fca4561a162f0a6b13a6d3a
trusted.gfid2path.084a6f56cf401310=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d657373616765732d31
trusted.gfid2path.084a6f56cf401310="00000000-0000-0000-0000-000000000001/messages-1" (in text)

# file: /exports/nas-segment-0002/brick2/messages-1
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000003fc5d1
trusted.ec.version=0x00000000000000200000000000000020
trusted.gfid=0x1d288eab6fca4561a162f0a6b13a6d3a
trusted.gfid2path.084a6f56cf401310=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d657373616765732d31
trusted.gfid2path.084a6f56cf401310="00000000-0000-0000-0000-000000000001/messages-1" (in text)

# file: /exports/nas-segment-0002/brick3/messages-1
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000000003fc5d1
trusted.ec.version=0x00000000000000200000000000000020
trusted.gfid=0x1d288eab6fca4561a162f0a6b13a6d3a
trusted.gfid2path.084a6f56cf401310=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d657373616765732d31
trusted.gfid2path.f07d8a47ec444a18=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6d65737361676573
trusted.gfid2path.084a6f56cf401310="00000000-0000-0000-0000-000000000001/messages-1" (in text)
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages" (in text)
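
The "(in text)" lines above are the hex gfid2path values
decoded to ASCII; they can be reproduced by re-reading the
xattr with a text encoding (or by piping the hex through
'xxd -r -p'), for example:

# getfattr --absolute-names -e text -n trusted.gfid2path.f07d8a47ec444a18 \
    /exports/nas-segment-0002/brick3/messages
# file: /exports/nas-segment-0002/brick3/messages
trusted.gfid2path.f07d8a47ec444a18="00000000-0000-0000-0000-000000000001/messages"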

##### File 'messages' brick fragment stats.
# stat /exports/nas-segment-0002/brick*/messages
  File: `/exports/nas-segment-0002/brick1/messages'
  Size: 2560            Blocks: 8          IO Block: 4096   regular file
Device: 880h/2176d      Inode: 92          Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:18:37.935361600 -0700
Modify: 2018-10-24 13:18:37.953361595 -0700
Change: 2018-10-24 13:18:37.954361598 -0700
  File: `/exports/nas-segment-0002/brick2/messages'
  Size: 2560            Blocks: 8          IO Block: 4096   regular file
Device: 880h/2176d      Inode: 268435548   Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:18:37.935361600 -0700
Modify: 2018-10-24 13:18:37.953361595 -0700
Change: 2018-10-24 13:18:37.954361598 -0700
  File: `/exports/nas-segment-0002/brick3/messages'
  Size: 2089984         Blocks: 4088       IO Block: 4096   regular file
Device: 880h/2176d      Inode: 537347155   Links: 3
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:17:16.301361665 -0700
Modify: 2018-10-24 13:17:16.438361664 -0700
Change: 2018-10-24 13:18:59.853361581 -0700

##### File 'messages-1' brick fragment stats.
# stat /exports/nas-segment-0002/brick*/messages-1
  File: `/exports/nas-segment-0002/brick1/messages-1'
  Size: 2089984         Blocks: 4088       IO Block: 4096   regular file
Device: 880h/2176d      Inode: 89          Links: 2
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:17:16.301361665 -0700
Modify: 2018-10-24 13:17:16.438361664 -0700
Change: 2018-10-24 13:18:59.872361580 -0700
  File: `/exports/nas-segment-0002/brick2/messages-1'
  Size: 2089984         Blocks: 4088       IO Block: 4096   regular file
Device: 880h/2176d      Inode: 268435544   Links: 2
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:17:16.301361665 -0700
Modify: 2018-10-24 13:17:16.438361664 -0700
Change: 2018-10-24 13:18:59.872361580 -0700
  File: `/exports/nas-segment-0002/brick3/messages-1'
  Size: 2089984         Blocks: 4088       IO Block: 4096   regular file
Device: 880h/2176d      Inode: 537347155   Links: 3
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-10-24 13:17:16.301361665 -0700
Modify: 2018-10-24 13:17:16.438361664 -0700
Change: 2018-10-24 13:18:59.853361581 -0700

##### Find the 'messages' file fragments on the previously failed brick3.
# find /exports/nas-segment-0002/brick3 -samefile /exports/nas-segment-0002/brick3/messages -ls
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/.glusterfs/1d/28/1d288eab-6fca-4561-a162-f0a6b13a6d3a
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/messages
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/messages-1
##### Find the 'messages-1' file fragments on the previously failed brick3.
# find /exports/nas-segment-0002/brick3 -samefile /exports/nas-segment-0002/brick3/messages-1 -ls
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/.glusterfs/1d/28/1d288eab-6fca-4561-a162-f0a6b13a6d3a
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/messages
537347155 2044 -rw-------   3 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick3/messages-1

##### Find the 'messages' file fragments on the always-good brick1.
# find /exports/nas-segment-0002/brick1 -samefile /exports/nas-segment-0002/brick1/messages -ls
    92    4 -rw-r--r--   2 root     root         2560 Oct 24 13:18 /exports/nas-segment-0002/brick1/.glusterfs/4d/04/4d041ae0-5fd6-49e0-a9ee-6fc502899c73
    92    4 -rw-r--r--   2 root     root         2560 Oct 24 13:18 /exports/nas-segment-0002/brick1/messages
##### Find the 'messages-1' file fragments on the always-good brick1.
# find /exports/nas-segment-0002/brick1 -samefile /exports/nas-segment-0002/brick1/messages-1 -ls
    89 2044 -rw-------   2 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick1/.glusterfs/1d/28/1d288eab-6fca-4561-a162-f0a6b13a6d3a
    89 2044 -rw-------   2 root     root      2089984 Oct 24 13:17 /exports/nas-segment-0002/brick1/messages-1

##### "Fix" the healing by deleting all of the unhealable file 'messages' from the previously failed brick3.
# find /exports/nas-segment-0002/brick3 -samefile /exports/nas-segment-0002/brick3/messages -delete

##### Disperse Volume healing now works on the 'messages' file.
# gluster volume heal disperse-vol full
# gluster volume heal disperse-vol info
Brick 10.10.60.181:/exports/nas-segment-0002/brick1
Status: Connected
Number of entries: 0
Brick 10.10.60.181:/exports/nas-segment-0002/brick2
Status: Connected
Number of entries: 0
Brick 10.10.60.181:/exports/nas-segment-0002/brick3
Status: Connected
Number of entries: 0

# sum /mnt/disperse-vol/messages
58391     5
# sum /mnt/disperse-vol/messages-1
36906  4082

Comment 1 Xavi Hernandez 2018-10-31 12:49:30 UTC
Thanks for all the information.

This bug is the same as bug #1359681. The only difference is that here you are recreating a file with the same original name, which causes a self-heal failure instead of data gain, but the underlying problem is the same: failure to correctly detect renamed entries before healing the original name.

Since this also happens on the mainline version, I'll keep the older bug open and close this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1359681 ***

