Bug 1203089 - Disperse volume: misleading unsuccessful message with heal and heal full
Summary: Disperse volume: misleading unsuccessful message with heal and heal full
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ashish Pandey
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: qe_tracker_everglades 1224137 1232612
 
Reported: 2015-03-18 05:44 UTC by Bhaskarakiran
Modified: 2016-11-23 23:13 UTC (History)
5 users

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1224137 1232612 (view as bug list)
Environment:
Last Closed: 2016-06-16 12:43:55 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments

Description Bhaskarakiran 2015-03-18 05:44:14 UTC
Description of problem:
======================

Although heal actually works on a disperse volume when triggered from the server side, the command reports an unsuccessful message, which is misleading, as shown below:

[root@vertigo bricks]# gluster v heal testvol full
Launching heal operation to perform full self heal on volume testvol has been unsuccessful

If the same command is run on the peer, a commit-failed message is returned.

[root@ninja ~]# gluster v heal testvol full
Commit failed on 10.70.34.56. Please check log file for details.

[root@ninja ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.34.56
Uuid: 5656b0cb-9f99-4e7b-9125-95ea80b0c9a1
State: Peer in Cluster (Connected)
[root@ninja ~]# 


Version-Release number of selected component (if applicable):
=============================================================
[root@vertigo bricks]# gluster --version
glusterfs 3.7dev built on Mar 12 2015 01:40:59
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@vertigo bricks]# 

How reproducible:
=================
100%

Steps to Reproduce:
1. Create a disperse volume and create files and directories from client
2. Bring down 2 of the bricks and let the IO continue
3. Bring the bricks back up after some time and trigger heal with "gluster v heal <volname> full"
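The steps above can be sketched as shell commands. Host names, brick paths, and the disperse geometry below are illustrative (the report used a 1 x (8 + 4) layout across two nodes); adjust to your environment:

```shell
# Create a small disperse volume (4 data + 2 redundancy) and start it.
gluster volume create testvol disperse 6 redundancy 2 \
    server1:/rhs/brick1/b1 server2:/rhs/brick1/b1 \
    server1:/rhs/brick2/b2 server2:/rhs/brick2/b2 \
    server1:/rhs/brick3/b3 server2:/rhs/brick3/b3 --force
gluster volume start testvol

# Create files and directories from a client mount, then bring down two
# bricks (find the brick PIDs via "gluster v status"):
kill -15 <brick-pid-1> <brick-pid-2>

# Let client I/O continue for a while, then restart the downed bricks:
gluster volume start testvol force

# Trigger a full heal; this is where the misleading message appears:
gluster volume heal testvol full
```

Running the heal command requires a live trusted storage pool, so the snippet is a reproduction recipe rather than a standalone script.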

Actual results:
===============
The heal is actually launched, but the CLI reports "Launching heal operation to perform full self heal on volume testvol has been unsuccessful".

Expected results:
================
The CLI should report that the heal operation was launched successfully.


Additional info:
================
Sosreport of the node will be attached.

Gluster volume options :
========================

[root@ninja ~]# gluster v get testvol all
Option                                  Value                                   
------                                  -----                                   
cluster.lookup-unhashed                 on                                      
cluster.min-free-disk                   10%                                     
cluster.min-free-inodes                 5%                                      
cluster.rebalance-stats                 off                                     
cluster.subvols-per-directory           (null)                                  
cluster.readdir-optimize                off                                     
cluster.rsync-hash-regex                (null)                                  
cluster.extra-hash-regex                (null)                                  
cluster.dht-xattr-name                  trusted.glusterfs.dht                   
cluster.randomize-hash-range-by-gfid    off                                     
cluster.local-volume-name               (null)                                  
cluster.weighted-rebalance              on                                      
cluster.switch-pattern                  (null)                                  
cluster.entry-change-log                on                                      
cluster.read-subvolume                  (null)                                  
cluster.read-subvolume-index            -1                                      
cluster.read-hash-mode                  1                                       
cluster.background-self-heal-count      16                                      
cluster.metadata-self-heal              on                                      
cluster.data-self-heal                  on                                      
cluster.entry-self-heal                 on                                      
cluster.self-heal-daemon                on                                      
cluster.heal-timeout                    600                                     
cluster.self-heal-window-size           1                                       
cluster.data-change-log                 on                                      
cluster.metadata-change-log             on                                      
cluster.data-self-heal-algorithm        (null)                                  
cluster.eager-lock                      on                                      
cluster.quorum-type                     none                                    
cluster.quorum-count                    (null)                                  
cluster.choose-local                    true                                    
cluster.self-heal-readdir-size          1KB                                     
cluster.post-op-delay-secs              1                                       
cluster.ensure-durability               on                                      
cluster.stripe-block-size               128KB                                   
cluster.stripe-coalesce                 true                                    
diagnostics.latency-measurement         off                                     
diagnostics.dump-fd-stats               off                                     
diagnostics.count-fop-hits              off                                     
diagnostics.brick-log-level             INFO                                    
diagnostics.client-log-level            INFO                                    
diagnostics.brick-sys-log-level         CRITICAL                                
diagnostics.client-sys-log-level        CRITICAL                                
diagnostics.brick-logger                (null)                                  
diagnostics.client-logger               (null)                                  
diagnostics.brick-log-format            (null)                                  
diagnostics.client-log-format           (null)                                  
diagnostics.brick-log-buf-size          5                                       
diagnostics.client-log-buf-size         5                                       
diagnostics.brick-log-flush-timeout     120                                     
diagnostics.client-log-flush-timeout    120                                     
performance.cache-max-file-size         0                                       
performance.cache-min-file-size         0                                       
performance.cache-refresh-timeout       1                                       
performance.cache-priority                                                      
performance.cache-size                  32MB                                    
performance.io-thread-count             16                                      
performance.high-prio-threads           16                                      
performance.normal-prio-threads         16                                      
performance.low-prio-threads            16                                      
performance.least-prio-threads          1                                       
performance.enable-least-priority       on                                      
performance.least-rate-limit            0                                       
performance.cache-size                  128MB                                   
performance.flush-behind                on                                      
performance.nfs.flush-behind            on                                      
performance.write-behind-window-size    1MB                                     
performance.nfs.write-behind-window-size 1MB                                    
performance.strict-o-direct             off                                     
performance.nfs.strict-o-direct         off                                     
performance.strict-write-ordering       off                                     
performance.nfs.strict-write-ordering   off                                     
performance.lazy-open                   yes                                     
performance.read-after-open             no                                      
performance.read-ahead-page-count       4                                       
performance.md-cache-timeout            1                                       
features.encryption                     off                                     
encryption.master-key                   (null)                                  
encryption.data-key-size                256                                     
encryption.block-size                   4096                                    
network.frame-timeout                   1800                                    
network.ping-timeout                    42                                      
network.tcp-window-size                 (null)                                  
features.lock-heal                      off                                     
features.grace-timeout                  10                                      
network.remote-dio                      disable                                 
client.event-threads                    4                                       
network.tcp-window-size                 (null)                                  
network.inode-lru-limit                 16384                                   
auth.allow                              *                                       
auth.reject                             (null)                                  
transport.keepalive                     (null)                                  
server.allow-insecure                   (null)                                  
server.root-squash                      off                                     
server.anonuid                          65534                                   
server.anongid                          65534                                   
server.statedump-path                   /var/run/gluster                        
server.outstanding-rpc-limit            64                                      
features.lock-heal                      off                                     
features.grace-timeout                  (null)                                  
server.ssl                              (null)                                  
auth.ssl-allow                          *                                       
server.manage-gids                      off                                     
client.send-gids                        on                                      
server.gid-timeout                      2                                       
server.own-thread                       (null)                                  
server.event-threads                    4                                       
performance.write-behind                on                                      
performance.read-ahead                  on                                      
performance.readdir-ahead               off                                     
performance.io-cache                    on                                      
performance.quick-read                  on                                      
performance.open-behind                 on                                      
performance.stat-prefetch               on                                      
performance.client-io-threads           off                                     
performance.nfs.write-behind            on                                      
performance.nfs.read-ahead              off                                     
performance.nfs.io-cache                off                                     
performance.nfs.quick-read              off                                     
performance.nfs.stat-prefetch           off                                     
performance.nfs.io-threads              off                                     
performance.force-readdirp              true                                    
features.file-snapshot                  off                                     
features.uss                            on                                      
features.snapshot-directory             .snaps                                  
features.show-snapshot-directory        off                                     
network.compression                     off                                     
network.compression.window-size         -15                                     
network.compression.mem-level           8                                       
network.compression.min-size            0                                       
network.compression.compression-level   -1                                      
network.compression.debug               false                                   
features.limit-usage                    (null)                                  
features.quota-timeout                  0                                       
features.default-soft-limit             80%                                     
features.soft-timeout                   60                                      
features.hard-timeout                   5                                       
features.alert-time                     86400                                   
features.quota-deem-statfs              on                                      
geo-replication.indexing                off                                     
geo-replication.indexing                off                                     
geo-replication.ignore-pid-check        off                                     
geo-replication.ignore-pid-check        off                                     
features.quota                          on                                      
debug.trace                             off                                     
debug.log-history                       no                                      
debug.log-file                          no                                      
debug.exclude-ops                       (null)                                  
debug.include-ops                       (null)                                  
debug.error-gen                         off                                     
debug.error-failure                     (null)                                  
debug.error-number                      (null)                                  
debug.random-failure                    off                                     
debug.error-fops                        (null)                                  
nfs.enable-ino32                        no                                      
nfs.mem-factor                          15                                      
nfs.export-dirs                         on                                      
nfs.export-volumes                      on                                      
nfs.addr-namelookup                     off                                     
nfs.dynamic-volumes                     off                                     
nfs.register-with-portmap               on                                      
nfs.outstanding-rpc-limit               16                                      
nfs.port                                2049                                    
nfs.rpc-auth-unix                       on                                      
nfs.rpc-auth-null                       on                                      
nfs.rpc-auth-allow                      all                                     
nfs.rpc-auth-reject                     none                                    
nfs.ports-insecure                      off                                     
nfs.trusted-sync                        off                                     
nfs.trusted-write                       off                                     
nfs.volume-access                       read-write                              
nfs.export-dir                                                                  
nfs.disable                             false                                   
nfs.nlm                                 on                                      
nfs.acl                                 on                                      
nfs.mount-udp                           off                                     
nfs.mount-rmtab                         /var/lib/glusterd/nfs/rmtab             
nfs.rpc-statd                           /sbin/rpc.statd                         
nfs.server-aux-gids                     off                                     
nfs.drc                                 off                                     
nfs.drc-size                            0x20000                                 
nfs.read-size                           (1 * 1048576ULL)                        
nfs.write-size                          (1 * 1048576ULL)                        
nfs.readdir-size                        (1 * 1048576ULL)                        
features.read-only                      off                                     
features.worm                           off                                     
storage.linux-aio                       off                                     
storage.batch-fsync-mode                reverse-fsync                           
storage.batch-fsync-delay-usec          0                                       
storage.owner-uid                       -1                                      
storage.owner-gid                       -1                                      
storage.node-uuid-pathinfo              off                                     
storage.health-check-interval           30                                      
storage.build-pgfid                     off                                     
storage.bd-aio                          off                                     
cluster.server-quorum-type              off                                     
cluster.server-quorum-ratio             0                                       
changelog.changelog                     off                                     
changelog.changelog-dir                 (null)                                  
changelog.encoding                      ascii                                   
changelog.rollover-time                 15                                      
changelog.fsync-interval                5                                       
changelog.changelog-barrier-timeout     120                                     
features.barrier                        disable                                 
features.barrier-timeout                120                                     
locks.trace                             (null)                                  
cluster.disperse-self-heal-daemon       enable                                  
cluster.quorum-reads                    no                                      
client.bind-insecure                    (null)                                  
[root@ninja ~]# 

Gluster volume status & info :
==============================

[root@ninja ~]# gluster v status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick vertigo:/rhs/brick1/b1                49152     0          Y       4237 
Brick ninja:/rhs/brick1/b1                  49152     0          Y       4161 
Brick vertigo:/rhs/brick2/b2                49153     0          Y       4255 
Brick ninja:/rhs/brick2/b2                  49153     0          Y       4176 
Brick vertigo:/rhs/brick3/b3                49154     0          Y       2748 
Brick ninja:/rhs/brick3/b3                  49154     0          Y       2524 
Brick vertigo:/rhs/brick4/b4                49155     0          Y       2761 
Brick ninja:/rhs/brick4/b4                  49155     0          Y       2537 
Brick vertigo:/rhs/brick1/b1-1              49156     0          Y       2203 
Brick ninja:/rhs/brick1/b1-1                49156     0          Y       2550 
Brick vertigo:/rhs/brick2/b2-1              49157     0          Y       2218 
Brick ninja:/rhs/brick2/b2-1                49157     0          Y       2563 
Snapshot Daemon on localhost                49158     0          Y       2577 
NFS Server on localhost                     2049      0          Y       4192 
Quota Daemon on localhost                   N/A       N/A        Y       4210 
Snapshot Daemon on 10.70.34.56              49158     0          Y       2801 
NFS Server on 10.70.34.56                   2049      0          Y       682  
Quota Daemon on 10.70.34.56                 N/A       N/A        Y       701  
 
Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@ninja ~]# 

=========================================================================

[root@ninja ~]# gluster v info
 
Volume Name: testvol
Type: Disperse
Volume ID: 7393260c-51d1-4dca-8fc8-e1f5ad6fee14
Status: Started
Number of Bricks: 1 x (8 + 4) = 12
Transport-type: tcp
Bricks:
Brick1: vertigo:/rhs/brick1/b1
Brick2: ninja:/rhs/brick1/b1
Brick3: vertigo:/rhs/brick2/b2
Brick4: ninja:/rhs/brick2/b2
Brick5: vertigo:/rhs/brick3/b3
Brick6: ninja:/rhs/brick3/b3
Brick7: vertigo:/rhs/brick4/b4
Brick8: ninja:/rhs/brick4/b4
Brick9: vertigo:/rhs/brick1/b1-1
Brick10: ninja:/rhs/brick1/b1-1
Brick11: vertigo:/rhs/brick2/b2-1
Brick12: ninja:/rhs/brick2/b2-1
Options Reconfigured:
features.quota-deem-statfs: on
cluster.disperse-self-heal-daemon: enable
features.uss: on
client.event-threads: 4
server.event-threads: 4
features.quota: on
[root@ninja ~]#

Comment 2 Anand Avati 2015-06-17 07:21:15 UTC
REVIEW: http://review.gluster.org/11267 ( ec: Display correct message after successful heal start) posted (#1) for review on master by Ashish Pandey (aspandey@redhat.com)

Comment 3 Mike McCune 2016-03-28 22:16:42 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune@redhat.com with any questions

Comment 4 Niels de Vos 2016-06-16 12:43:55 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

