Bug 1311881

Summary: VM paused with permission denied and Invalid argument errors in fuse mount logs
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Sahina Bose <sabose>
Component: replicate Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED WORKSFORME QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: kdhananj, knarra, pkarampu, rhs-bugs, storage-qa-internal
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-17 05:10:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1258386    

Description Sahina Bose 2016-02-25 09:26:28 UTC
Description of problem:

After running replace-brick on a replica 3 volume, the VM running on the gluster volume goes into a paused state.

Details:

rhsdev9, rhsdev-docker1, rhsdev-docker2 - running volume engine (replica 3)

Volume Name: engine
Type: Replicate
Volume ID: 7830b6f2-4cdf-4ec5-bb2b-c8804196a554
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsdev9.xxx:/rhgs/engine/brick1
Brick2: rhsdev-docker1.xxx:/rhgs/engine/brick1
Brick3: rhsdev-docker2.xxx:/rhgs/engine/brick1
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on


1. Peer probed a new node (rhsdev14) into this cluster
2. gluster volume replace-brick engine rhsdev-docker1.lab.eng.blr.redhat.com:/rhgs/engine/brick1 rhsdev14.lab.eng.blr.redhat.com:/rhgs/engine/brick1 commit force
3. gluster volume heal engine info - reported unsynced entries, and I waited for the heal to complete
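
The wait in step 3 could be scripted roughly as follows. This is a minimal Python sketch, assuming that `gluster volume heal <volname> info` prints one `Number of entries: N` line per brick. The function names, polling interval and retry limit are illustrative and were not used when this bug was reported.

```python
import re
import subprocess
import time

# Matches the per-brick summary line printed by "gluster volume heal <vol> info".
# Lines without a numeric count (for example, for an unreachable brick) are ignored.
ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$", re.MULTILINE)


def pending_heal_entries(heal_info_output: str) -> int:
    """Sum the numeric 'Number of entries' counts across all bricks."""
    return sum(int(n) for n in ENTRY_RE.findall(heal_info_output))


def wait_for_heal(volume: str, poll_seconds: int = 30, max_polls: int = 120) -> bool:
    """Poll heal info until no entries are pending; return False on timeout.

    poll_seconds and max_polls are arbitrary defaults chosen for illustration.
    """
    for _ in range(max_polls):
        result = subprocess.run(
            ["gluster", "volume", "heal", volume, "info"],
            check=True,
            capture_output=True,
            text=True,
        )
        if pending_heal_entries(result.stdout) == 0:
            return True
        time.sleep(poll_seconds)
    return False
```

Calling `wait_for_heal("engine")` on one of the cluster nodes would block until heal info reports zero pending entries or the retry limit is reached. A zero count only means that no entries are pending on the bricks that reported a number, so brick status should still be checked separately.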

The qemu VM that was running on rhsdev-docker1 goes to paused state with the EOTHER error code.

Mount log on rhsdev-docker1:

[2016-02-23 11:33:44.759843] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 3-engine-client-1: remote operation failed. Path: <gfid:5947ba69-bd6b-48f9-9bee-8cc60f8da2fc> (5947ba69-bd6b-48f9-9bee-8cc60f8da2fc) [Permission denied]
[2016-02-23 11:33:44.779746] I [MSGID: 108026] [afr-self-heal-common.c:662:afr_log_selfheal] 3-engine-replicate-0: Completed data selfheal on 5947ba69-bd6b-48f9-9bee-8cc60f8da2fc. source=0 sinks=1
[2016-02-23 11:33:44.784343] W [MSGID: 114031] [client-rpc-fops.c:2325:client3_3_setattr_cbk] 3-engine-client-1: remote operation failed [Operation not permitted]
[2016-02-23 11:33:44.784716] W [MSGID: 114031] [client-rpc-fops.c:1164:client3_3_getxattr_cbk] 3-engine-client-1: remote operation failed. Path: <gfid:5947ba69-bd6b-48f9-9bee-8cc60f8da2fc> (5947ba69-bd6b-48f9-9bee-8cc60f8da2fc). Key: (null) [Permission denied]
[2016-02-23 11:33:44.785114] W [MSGID: 114031] [client-rpc-fops.c:1088:client3_3_setxattr_cbk] 3-engine-client-1: remote operation failed [Permission denied]
[2016-02-23 11:33:44.788631] I [MSGID: 108026] [afr-self-heal-common.c:662:afr_log_selfheal] 3-engine-replicate-0: Completed metadata selfheal on 5947ba69-bd6b-48f9-9bee-8cc60f8da2fc. source=0 sinks=
[2016-02-23 11:33:47.561145] I [MSGID: 108026] [afr-self-heal-metadata.c:56:__afr_selfheal_metadata_do] 3-engine-replicate-0: performing metadata selfheal on 7ebdb94b-26fa-40b5-92d1-581c6f15063a
[2016-02-23 11:33:47.563518] W [MSGID: 114031] [client-rpc-fops.c:2325:client3_3_setattr_cbk] 3-engine-client-1: remote operation failed [Operation not permitted]
[2016-02-23 11:33:47.563999] W [MSGID: 114031] [client-rpc-fops.c:1164:client3_3_getxattr_cbk] 3-engine-client-1: remote operation failed. Path: /5f3618cd-cf2f-4f11-9ee4-670b2841616f/images/cc4c6f3d-f71e-4380-850f-7ee0cab016a5/ffcd824c-f473-43a7-97e8-2bd08053e6b8 (7ebdb94b-26fa-40b5-92d1-581c6f15063a). Key: (null) [Permission denied]
[2016-02-23 11:33:47.564546] W [MSGID: 114031] [client-rpc-fops.c:1088:client3_3_setxattr_cbk] 3-engine-client-1: remote operation failed [Permission denied]
[2016-02-23 11:33:47.569011] I [MSGID: 108026] [afr-self-heal-common.c:662:afr_log_selfheal] 3-engine-replicate-0: Completed metadata selfheal on 7ebdb94b-26fa-40b5-92d1-581c6f15063a. source=0 sinks=
[2016-02-23 11:33:44.782783] I [MSGID: 108026] [afr-self-heal-metadata.c:56:__afr_selfheal_metadata_do] 3-engine-replicate-0: performing metadata selfheal on 5947ba69-bd6b-48f9-9bee-8cc60f8da2fc
[2016-02-23 11:45:14.475350] W [fuse-bridge.c:1284:fuse_err_cbk] 0-glusterfs-fuse: 64097409: FSYNC() ERR => -1 (Invalid argument) 
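
To see at a glance which client translator and which operations are failing, the mount log can be summarized with a short script. The following is a minimal Python sketch. The regular expression is an assumption derived only from the "remote operation failed" lines quoted above, and the function name is illustrative.

```python
import re
from collections import Counter

# Assumed line shape, based on the excerpt above:
#   [timestamp] LEVEL [MSGID: n] [file.c:line:function] xlator: remote operation failed ... [error]
FAILED_RE = re.compile(
    r"\[(?P<src>[^\]:]+):\d+:(?P<func>\w+)\]\s+"
    r"(?P<xlator>\S+):\s+remote operation failed.*"
    r"\[(?P<err>[^\]]+)\]\s*$"
)


def summarize_failures(lines):
    """Count failed remote operations per (translator, callback, error string)."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[(match["xlator"], match["func"], match["err"])] += 1
    return counts
```

Applied to the excerpt above, every counted failure is keyed on `3-engine-client-1`, the only translator that appears in those lines. Lines in other formats, such as the final `fuse_err_cbk` FSYNC entry, are not counted.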

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-18.32.giteb76d88.el7rhgs.x86_64


How reproducible:
Not always

Steps to Reproduce:
As above


Additional info:
Logs from the 3 nodes will be attached

Comment 2 Sahina Bose 2016-02-25 10:03:30 UTC
Log files are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1311881/

Comment 5 Sahina Bose 2016-05-17 05:10:49 UTC
Could not reproduce the permission denied errors when retrying with the latest glusterfs 3.7.9-4 (rhgs). However, a new issue was encountered and logged as Bug 1336295.

Will re-open this if encountered again.