+++ This bug was initially created as a clone of Bug #1159484 +++

Steps to Reproduce:
1. gluster vol create test disperse 3 redundancy 1 10.10.21.20:/sdb 10.10.21.21:/sdb 10.10.21.22:/sdb force
2. Start the volume and mount it on /cluster2/test
3. cd /cluster2/test
4. mkdir a b c
5. touch a/1 b/2 c/3
6. gluster vol replace-brick test 10.10.21.22:/sdb 10.10.21.23:/sdb commit force
7. On 10.10.21.20, execute 'ls -alR /cluster2/test'

Actual results:
All directories are healed, but none of the files are healed. Trying read/write/getattr on these files shows that only a write heals them.

--- Additional comment from Xavier Hernandez on 2014-11-04 13:28:24 CET ---

Are you using 3.6.0beta3, the official 3.6.0 or a compiled one (which commit)?

--- Additional comment from lidi on 2014-11-05 03:05:38 CET ---

3.6.0-beta3 and the official 3.6.0 have the same problem.
REVIEW: http://review.gluster.org/9072 (ec: Fix self-healing issues.) posted (#1) for review on master by Xavier Hernandez (xhernandez)
REVIEW: http://review.gluster.org/9072 (ec: Fix self-healing issues.) posted (#2) for review on master by Xavier Hernandez (xhernandez)
COMMIT: http://review.gluster.org/9072 committed in master by Vijay Bellur (vbellur)

------

commit bc91dd4de39ffd481a52b837f322f6782c14e9f1
Author: Xavier Hernandez <xhernandez>
Date:   Fri Nov 7 12:12:19 2014 +0100

    ec: Fix self-healing issues.

    Three problems have been detected:

    1. Self-heal is executed in the background, allowing the fop that
       detected the problem to continue without blocking or delay. While
       this is quite interesting to avoid unnecessary delays, it can cause
       spurious self-heal failures: a self-heal may try to recover a file
       inside a directory that a previous self-heal has not recovered yet,
       causing the file self-heal to fail.

    2. When a partial self-heal is being executed on a directory, an
       attempted full self-heal won't be executed because another
       self-heal is already in progress, so the directory won't be fully
       repaired.

    3. The information contained in the loc's of some fops is not enough
       to do a complete self-heal.

    To solve these problems, the following changes have been made:

    * Improved ec_loc_from_loc() to add all available information to a
      loc.

    * Before healing an entry, its parent is checked and partially healed
      if necessary to avoid failures.

    * All heal requests received for the same inode while another
      self-heal is being processed are queued. When the first heal
      completes, all pending requests are answered using the results of
      the first heal (without full execution), unless the first heal was
      a partial heal. In that case all partial heals are answered, and
      the first full heal is processed normally.

    * A special virtual xattr (not physically stored on bricks) named
      'trusted.ec.heal' has been created to allow synchronous self-heal
      of files. The recommended way to heal an entire volume is now:

          find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;

    Some minor changes:

    * ec_loc_prepare() has been renamed to ec_loc_update().
    * All loc management functions return 0 on success and -1 on error.
    * Do not delay fop unlocks if heal is needed.
    * Added basic ec xattrs initially on create, mkdir and mknod fops.
    * Some coding style changes.

    Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985
    BUG: 1161588
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/9072
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>
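The per-inode queueing rule in the commit message (queue heal requests that arrive while a heal is in flight; reuse the first heal's result for all of them, unless the first heal was partial, in which case only partial requests reuse it and the first queued full heal runs for real) can be sketched as a small event-driven model. This is an illustrative Python model only; the actual implementation is C code inside the ec translator, and every name below is hypothetical:

```python
class InodeHealQueue:
    """Model of the per-inode self-heal queue described in the commit.

    `start_heal(partial)` is a caller-supplied hook that kicks off the
    actual heal; the caller reports completion via `heal_done(result)`.
    """

    def __init__(self, start_heal):
        self.start_heal = start_heal
        self.in_flight = None   # (partial, callback) of the running heal
        self.pending = []       # requests queued while a heal is running

    def request(self, partial, callback):
        if self.in_flight is None:
            # No heal running for this inode: start one immediately.
            self.in_flight = (partial, callback)
            self.start_heal(partial)
        else:
            # Another self-heal is in progress: queue the request.
            self.pending.append((partial, callback))

    def heal_done(self, result):
        partial, callback = self.in_flight
        self.in_flight = None
        callback(result)
        if not partial:
            # First heal was full: answer every queued request with its
            # result, without executing them.
            for _, cb in self.pending:
                cb(result)
            self.pending.clear()
        else:
            # First heal was partial: answer queued partial requests with
            # its result; re-dispatch full requests, the first of which
            # starts a new (full) heal.
            queued, self.pending = self.pending, []
            for p, cb in queued:
                if p:
                    cb(result)
                else:
                    self.request(p, cb)


# Demo: a partial heal with two requests queued behind it.
started, results = [], []
q = InodeHealQueue(lambda partial: started.append(partial))
q.request(partial=True, callback=lambda r: results.append(("first", r)))
q.request(partial=True, callback=lambda r: results.append(("p2", r)))
q.request(partial=False, callback=lambda r: results.append(("f1", r)))
q.heal_done("partial-ok")   # answers both partial requests, starts full heal
q.heal_done("full-ok")      # answers the full request
```

In the demo, the two partial requests are both satisfied by the single "partial-ok" result, while the full request triggers a second, real heal, matching the behavior the commit describes.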
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user