1193022 – Disperse volume: Delete operation failed on some of the bricks

Bug 1193022 - Disperse volume: Delete operation failed on some of the bricks

Summary: Disperse volume: Delete operation failed on some of the bricks

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	disperse
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Xavi Hernandez
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	qe_tracker_everglades 1224122
TreeView+	depends on / blocked

Reported:	2015-02-16 12:06 UTC by Bhaskarakiran
Modified:	2016-11-23 23:11 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Clone Of:
Clones:	1224122 (view as bug list)
Environment:
Last Closed:	2015-12-04 07:23:43 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
sosreport of client (7.48 MB, application/x-xz) 2015-02-17 09:24 UTC, Bhaskarakiran	no flags	Details
sosreport of Node1 (8.57 MB, application/x-xz) 2015-02-17 09:26 UTC, Bhaskarakiran	no flags	Details
sosreport of Node2 (8.40 MB, application/x-xz) 2015-02-17 09:27 UTC, Bhaskarakiran	no flags	Details
View All

Description Bhaskarakiran 2015-02-16 12:06:32 UTC

Description of problem:
=======================

Untarred a linux tarball into multiple directories ~10 (linux.1 to linux.10) and tried to delete 2 of them (linux.1 and linux.2 directories). The delete operation succeded from the client but the entries are still listed in some of the bricks. 2 of the bricks were brought down/up at random 3-4 times while the deletes were happening.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.7dev built on Feb 14 2015 01:05:51


Gluster volume options:
=======================

[root@vertigo ~]# gluster volume get testvol all 
Option                                  Value                                   
------                                  -----                                   
cluster.lookup-unhashed                 on                                      
cluster.min-free-disk                   10%                                     
cluster.min-free-inodes                 5%                                      
cluster.rebalance-stats                 off                                     
cluster.subvols-per-directory           (null)                                  
cluster.readdir-optimize                off                                     
cluster.rsync-hash-regex                (null)                                  
cluster.extra-hash-regex                (null)                                  
cluster.dht-xattr-name                  trusted.glusterfs.dht                   
cluster.randomize-hash-range-by-gfid    off                                     
cluster.local-volume-name               (null)                                  
cluster.weighted-rebalance              on                                      
cluster.switch-pattern                  (null)                                  
cluster.entry-change-log                on                                      
cluster.read-subvolume                  (null)                                  
cluster.read-subvolume-index            -1                                      
cluster.read-hash-mode                  1                                       
cluster.background-self-heal-count      16                                      
cluster.metadata-self-heal              on                                      
cluster.data-self-heal                  on                                      
cluster.entry-self-heal                 on                                      
cluster.self-heal-daemon                on                                      
cluster.heal-timeout                    600                                     
cluster.self-heal-window-size           1                                       
cluster.data-change-log                 on                                      
cluster.metadata-change-log             on                                      
cluster.data-self-heal-algorithm        (null)                                  
cluster.eager-lock                      on                                      
cluster.quorum-type                     none                                    
cluster.quorum-count                    (null)                                  
cluster.choose-local                    true                                    
cluster.self-heal-readdir-size          1KB                                     
cluster.post-op-delay-secs              1                                       
cluster.ensure-durability               on                                      
cluster.stripe-block-size               128KB                                   
cluster.stripe-coalesce                 true                                    
diagnostics.latency-measurement         off                                     
diagnostics.dump-fd-stats               off                                     
diagnostics.count-fop-hits              off                                     
diagnostics.brick-log-level             INFO                                    
diagnostics.client-log-level            INFO                                    
diagnostics.brick-sys-log-level         CRITICAL                                
diagnostics.client-sys-log-level        CRITICAL                                
diagnostics.brick-logger                (null)                                  
diagnostics.client-logger               (null)                                  
diagnostics.brick-log-format            (null)                                  
diagnostics.client-log-format           (null)                                  
diagnostics.brick-log-buf-size          5                                       
diagnostics.client-log-buf-size         5                                       
diagnostics.brick-log-flush-timeout     120                                     
diagnostics.client-log-flush-timeout    120                                     
performance.cache-max-file-size         0                                       
performance.cache-min-file-size         0                                       
performance.cache-refresh-timeout       1                                       
performance.cache-priority                                                      
performance.cache-size                  32MB                                    
performance.io-thread-count             16                                      
performance.high-prio-threads           16                                      
performance.normal-prio-threads         16                                      
performance.low-prio-threads            16                                      
performance.least-prio-threads          1                                       
performance.enable-least-priority       on                                      
performance.least-rate-limit            0                                       
performance.cache-size                  128MB                                   
performance.flush-behind                on                                      
performance.nfs.flush-behind            on                                      
performance.write-behind-window-size    1MB                                     
performance.nfs.write-behind-window-size1MB                                     
performance.strict-o-direct             off                                     
performance.nfs.strict-o-direct         off                                     
performance.strict-write-ordering       off                                     
performance.nfs.strict-write-ordering   off                                     
performance.lazy-open                   yes                                     
performance.read-after-open             no                                      
performance.read-ahead-page-count       4                                       
performance.md-cache-timeout            1                                       
features.encryption                     off                                     
encryption.master-key                   (null)                                  
encryption.data-key-size                256                                     
encryption.block-size                   4096                                    
network.frame-timeout                   1800                                    
network.ping-timeout                    42                                      
network.tcp-window-size                 (null)                                  
features.lock-heal                      off                                     
features.grace-timeout                  10                                      
network.remote-dio                      disable                                 
client.event-threads                    2                                       
network.tcp-window-size                 (null)                                  
network.inode-lru-limit                 16384                                   
auth.allow                              *                                       
auth.reject                             (null)                                  
transport.keepalive                     (null)                                  
server.allow-insecure                   (null)                                  
server.root-squash                      off                                     
server.anonuid                          65534                                   
server.anongid                          65534                                   
server.statedump-path                   /var/run/gluster                        
server.outstanding-rpc-limit            64                                      
features.lock-heal                      off                                     
features.grace-timeout                  (null)                                  
server.ssl                              (null)                                  
auth.ssl-allow                          *                                       
server.manage-gids                      off                                     
client.send-gids                        on                                      
server.gid-timeout                      2                                       
server.own-thread                       (null)                                  
server.event-threads                    2                                       
performance.write-behind                on                                      
performance.read-ahead                  on                                      
performance.readdir-ahead               off                                     
performance.io-cache                    on                                      
performance.quick-read                  on                                      
performance.open-behind                 on                                      
performance.stat-prefetch               on                                      
performance.client-io-threads           off                                     
performance.nfs.write-behind            on                                      
performance.nfs.read-ahead              off                                     
performance.nfs.io-cache                off                                     
performance.nfs.quick-read              off                                     
performance.nfs.stat-prefetch           off                                     
performance.nfs.io-threads              off                                     
performance.force-readdirp              true                                    
features.file-snapshot                  off                                     
features.uss                            off                                     
features.snapshot-directory             .snaps                                  
features.show-snapshot-directory        off                                     
network.compression                     off                                     
network.compression.window-size         -15                                     
network.compression.mem-level           8                                       
network.compression.min-size            0                                       
network.compression.compression-level   -1                                      
network.compression.debug               false                                   
features.limit-usage                    (null)                                  
features.quota-timeout                  0                                       
features.default-soft-limit             80%                                     
features.soft-timeout                   60                                      
features.hard-timeout                   5                                       
features.alert-time                     86400                                   
features.quota-deem-statfs              off                                     
geo-replication.indexing                off                                     
geo-replication.indexing                off                                     
geo-replication.ignore-pid-check        off                                     
geo-replication.ignore-pid-check        off                                     
features.quota                          on                                      
debug.trace                             off                                     
debug.log-history                       no                                      
debug.log-file                          no                                      
debug.exclude-ops                       (null)                                  
debug.include-ops                       (null)                                  
debug.error-gen                         off                                     
debug.error-failure                     (null)                                  
debug.error-number                      (null)                                  
debug.random-failure                    off                                     
debug.error-fops                        (null)                                  
nfs.enable-ino32                        no                                      
nfs.mem-factor                          15                                      
nfs.export-dirs                         on                                      
nfs.export-volumes                      on                                      
nfs.addr-namelookup                     off                                     
nfs.dynamic-volumes                     off                                     
nfs.register-with-portmap               on                                      
nfs.outstanding-rpc-limit               16                                      
nfs.port                                2049                                    
nfs.rpc-auth-unix                       on                                      
nfs.rpc-auth-null                       on                                      
nfs.rpc-auth-allow                      all                                     
nfs.rpc-auth-reject                     none                                    
nfs.ports-insecure                      off                                     
nfs.trusted-sync                        off                                     
nfs.trusted-write                       off                                     
nfs.volume-access                       read-write                              
nfs.export-dir                                                                  
nfs.disable                             false                                   
nfs.nlm                                 on                                      
nfs.acl                                 on                                      
nfs.mount-udp                           off                                     
nfs.mount-rmtab                         /var/lib/glusterd/nfs/rmtab             
nfs.rpc-statd                           /sbin/rpc.statd                         
nfs.server-aux-gids                     off                                     
nfs.drc                                 off                                     
nfs.drc-size                            0x20000                                 
nfs.read-size                           (1 * 1048576ULL)                        
nfs.write-size                          (1 * 1048576ULL)                        
nfs.readdir-size                        (1 * 1048576ULL)                        
features.read-only                      off                                     
features.worm                           off                                     
storage.linux-aio                       off                                     
storage.batch-fsync-mode                reverse-fsync                           
storage.batch-fsync-delay-usec          0                                       
storage.owner-uid                       -1                                      
storage.owner-gid                       -1                                      
storage.node-uuid-pathinfo              off                                     
storage.health-check-interval           30                                      
storage.build-pgfid                     off                                     
storage.bd-aio                          off                                     
cluster.server-quorum-type              off                                     
cluster.server-quorum-ratio             0                                       
changelog.changelog                     off                                     
changelog.changelog-dir                 (null)                                  
changelog.encoding                      ascii                                   
changelog.rollover-time                 15                                      
changelog.fsync-interval                5                                       
changelog.changelog-barrier-timeout     120                                     
features.barrier                        disable                                 
features.barrier-timeout                120                                     
locks.trace                             disable                                 
cluster.disperse-self-heal-daemon       enable                                  
[root@vertigo ~]# 
[root@vertigo ~]# 
[root@vertigo ~]# 

Gluster volume status:
======================

[root@vertigo ~]# gluster v status
Status of volume: testvol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick ninja:/rhs/brick1/b1				49152	Y	19369
Brick vertigo:/rhs/brick1/b1				49152	Y	30191
Brick ninja:/rhs/brick2/b2				49153	Y	18934
Brick vertigo:/rhs/brick2/b2				49153	Y	28690
Brick ninja:/rhs/brick3/b3				49154	Y	17499
Brick vertigo:/rhs/brick3/b3				49158	Y	28705
NFS Server on localhost					2049	Y	30205
Quota Daemon on localhost				N/A	Y	30222
NFS Server on 10.70.34.68				2049	Y	19383
Quota Daemon on 10.70.34.68				N/A	Y	19400
 
Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@vertigo ~]# 
[root@vertigo ~]# 

Gluster volume info:
====================

[root@vertigo ~]# gluster v info
 
Volume Name: testvol
Type: Disperse
Volume ID: 21ed8908-3458-4834-b93d-161b694c3e37
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: ninja:/rhs/brick1/b1
Brick2: vertigo:/rhs/brick1/b1
Brick3: ninja:/rhs/brick2/b2
Brick4: vertigo:/rhs/brick2/b2
Brick5: ninja:/rhs/brick3/b3
Brick6: vertigo:/rhs/brick3/b3
Options Reconfigured:
client.event-threads: 2
server.event-threads: 2
features.barrier: disable
cluster.disperse-self-heal-daemon: enable
features.quota: on
[root@vertigo ~]# 
[root@vertigo ~]# 
[root@vertigo ~]# 



How reproducible:
=================
Often


Steps to Reproduce:
1. Create a 1x(4+2) disperse volume
2. Untar a linux tarball into multiple directories.
3. Delete 1/2 directories. Check the bricks and try to list the them for server and client.

Actual results:
===============
stale entries listed on the client in multiple listing of the directories and some of the bricks list the directories which are deleted from the client


Expected results:
=================
No stale entries to be seen and deletes should be successful.


Additional info:

Comment 1 Bhaskarakiran 2015-02-16 12:10:11 UTC

sosreports of the client and server will be attached shortly as there's some problem with uploading.

Comment 2 Bhaskarakiran 2015-02-17 09:24:51 UTC

Created attachment 992568 [details]
sosreport of client

Comment 3 Bhaskarakiran 2015-02-17 09:26:18 UTC

Created attachment 992569 [details]
sosreport of Node1

Comment 4 Bhaskarakiran 2015-02-17 09:27:35 UTC

Created attachment 992570 [details]
sosreport of Node2

Comment 5 Pranith Kumar K 2015-05-28 14:35:16 UTC

Ashish tested that this case is working fine after http://review.gluster.org/10852 is merged.

Comment 6 Xavi Hernandez 2015-12-04 07:23:43 UTC

This bug is being closed as ERRATA since it seems to be solved by another patch already merged in master.

If the problem still happens, please reopen the bug or open a new one.

Note You need to log in before you can comment on or make changes to this bug.