+++ This bug was initially created as a clone of Bug #1379935 +++

Description of problem:
*******************************
While running test cases related to AFR self heal that create multiple files and directories, the brick logs show errors related to gf_uuid_is_null in upcall_cache_invalidate. The errors occur wherever there is a file create/write operation.

[2016-09-27 06:35:20.639063] E [upcall-internal.c:512:upcall_cache_invalidate] (-->/usr/lib64/glusterfs/3.8.3/xlator/features/access-control.so(+0xad09) [0x7fd90aca4d09] -->/usr/lib64/glusterfs/3.8.3/xlator/features/locks.so(+0xd4f2) [0x7fd90aa834f2] -->/usr/lib64/glusterfs/3.8.3/xlator/features/upcall.so(+0x6cae) [0x7fd90a238cae] ) 0-upcall_cache_invalidate: invalid argument: !(gf_uuid_is_null (up_inode_ctx->gfid)) [Invalid argument]
[2016-09-27 06:35:22.717196] E [upcall-internal.c:512:upcall_cache_invalidate] (-->/usr/lib64/glusterfs/3.8.3/xlator/features/access-control.so(+0xad09) [0x7fd90aca4d09] -->/usr/lib64/glusterfs/3.8.3/xlator/features/locks.so(+0xd4f2) [0x7fd90aa834f2] -->/usr/lib64/glusterfs/3.8.3/xlator/features/upcall.so(+0x6cae) [0x7fd90a238cae] ) 0-upcall_cache_invalidate: invalid argument: !(gf_uuid_is_null (up_inode_ctx->gfid)) [Invalid argument]
[2016-09-27 06:36:20.976194] E [upcall-internal.c:512:upcall_cache_invalidate] (-->/usr/lib64/glusterfs/3.8.3/xlator/features/access-control.so(+0xad09) [0x7fd90aca4d09] -->/usr/lib64/glusterfs/3.8.3/xlator/features/locks.so(+0xd4f2) [0x7fd90aa834f2] -->/usr/lib64/glusterfs/3.8.3/xlator/features/upcall.so(+0x6cae) [0x7fd90a238cae] ) 0-upcall_cache_invalidate: invalid argument: !(gf_uuid_is_null (up_inode_ctx->gfid)) [Invalid argument]

Volume Name: vol1
Type: Distributed-Replicate
Volume ID: afb432f1-55d4-463c-bd50-ba19a63561e3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.64:/mnt/brick/vol1/b1
Brick2: 10.70.47.66:/mnt/brick/vol1/b2
Brick3: 10.70.47.64:/mnt/brick/vol1/b3
Brick4: 10.70.47.66:/mnt/brick/vol1/b4
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.batch-fsync-delay-usec: 0
server.allow-insecure: on
performance.stat-prefetch: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
cluster.metadata-self-heal: off
cluster.data-self-heal: off
cluster.entry-self-heal: off
cluster.self-heal-daemon: on
cluster.data-self-heal-algorithm: full

Version-Release number of selected component (if applicable):
glusterfs-3.8.3-0.39.git97d1dde.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 volume, set the md-cache volume options, and mount it over CIFS.
2. Start creating directories and files and run the AFR cases with a brick down.
3. Observe the logs for any errors.

Actual results:
Lots of error messages related to upcall_cache_invalidate on create/write.

Expected results:
These error messages shouldn't be there.

Additional info:

--- Additional comment from Prasad Desala on 2016-10-14 09:34:05 EDT ---

Same issue is seen with private glusterfs build: 3.8.4-2.26.git0a405a4.el7rhgs.x86_64

This issue is specific to the md-cache upcall. With the setup in the same state, disabling the md-cache options below (see the gluster v info output for more info) made the ERROR messages disappear from the brick logs. Enabling md-cache again started spamming the brick logs with the error messages.

Steps that were performed:
1. Create a distributed replica volume and start it.
2. Enabled the md-cache supported options on the volume. Please see the gluster v info output below for more details on the md-cache options enabled.
3. Mounted the volume on multiple clients. Simultaneously, from one client touch 10000 files and from another client create 10000 hardlinks to the same file.
4. Add a few bricks and start rebalance.
5. Once the rebalance is completed, remove all the files on the mount point using "rm -rf".
6. Check the brick logs for the invalid argument messages.

--- Additional comment from Prasad Desala on 2016-10-14 09:34:58 EDT ---

[root@dhcp42-185 ~]# gluster v status
Status of volume: distrep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.185:/bricks/brick0/b0        49152     0          Y       16587
Brick 10.70.43.152:/bricks/brick0/b0        49152     0          Y       19074
Brick 10.70.42.39:/bricks/brick0/b0         49152     0          Y       19263
Brick 10.70.42.84:/bricks/brick0/b0         49152     0          Y       19630
Brick 10.70.42.185:/bricks/brick1/b1        49153     0          Y       16607
Brick 10.70.43.152:/bricks/brick1/b1        49153     0          Y       19094
Brick 10.70.42.39:/bricks/brick1/b1         49153     0          Y       19283
Brick 10.70.42.84:/bricks/brick1/b1         49153     0          Y       19650
Brick 10.70.42.185:/bricks/brick2/b2        49154     0          Y       16627
Brick 10.70.43.152:/bricks/brick2/b2        49154     0          Y       19114
Brick 10.70.42.39:/bricks/brick2/b2         49154     0          Y       19303
Brick 10.70.42.84:/bricks/brick2/b2         49154     0          Y       19670
Brick 10.70.42.185:/bricks/brick3/b3        49155     0          Y       19472
Brick 10.70.43.152:/bricks/brick3/b3        49155     0          Y       19380
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       19493
NFS Server on 10.70.42.39                   N/A       N/A        N       N/A
Self-heal Daemon on 10.70.42.39             N/A       N/A        Y       19588
NFS Server on 10.70.42.84                   N/A       N/A        N       N/A
Self-heal Daemon on 10.70.42.84             N/A       N/A        Y       19979
NFS Server on 10.70.43.152                  N/A       N/A        N       N/A
Self-heal Daemon on 10.70.43.152            N/A       N/A        Y       19401

Task Status of Volume distrep
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 19b1127e-246e-4afd-b59b-9690b9569122
Status               : completed

[root@dhcp42-185 ~]# gluster v info

Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 4ad479e4-fa01-4d91-8743-4e1510ba2c13
Status: Started
Snapshot Count: 0
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.42.185:/bricks/brick0/b0
Brick2: 10.70.43.152:/bricks/brick0/b0
Brick3: 10.70.42.39:/bricks/brick0/b0
Brick4: 10.70.42.84:/bricks/brick0/b0
Brick5: 10.70.42.185:/bricks/brick1/b1
Brick6: 10.70.43.152:/bricks/brick1/b1
Brick7: 10.70.42.39:/bricks/brick1/b1
Brick8: 10.70.42.84:/bricks/brick1/b1
Brick9: 10.70.42.185:/bricks/brick2/b2
Brick10: 10.70.43.152:/bricks/brick2/b2
Brick11: 10.70.42.39:/bricks/brick2/b2
Brick12: 10.70.42.84:/bricks/brick2/b2
Brick13: 10.70.42.185:/bricks/brick3/b3
Brick14: 10.70.43.152:/bricks/brick3/b3
Options Reconfigured:
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on

--- Additional comment from Atin Mukherjee on 2016-10-15 06:35:05 EDT ---

(In reply to Prasad Desala from comment #1)
> Same issue is seen with private glusterfs build:
> 3.8.4-2.26.git0a405a4.el7rhgs.x86_64

Prasad - this is an upstream bug and no reference to a downstream build should be mentioned here. If required, please file a downstream bug to track the issue.
> This issue is specific to the md-cache upcall. With the setup in the same
> state, disabling the md-cache options below (see the gluster v info output
> for more info) made the ERROR messages disappear from the brick logs.
> Enabling md-cache again started spamming the brick logs with the error
> messages.
>
> Steps that were performed:
>
> 1. Create a distributed replica volume and start it.
> 2. Enabled the md-cache supported options on the volume. Please see the
> gluster v info output below for more details on the md-cache options enabled.
> 3. Mounted the volume on multiple clients. Simultaneously, from one client
> touch 10000 files and from another client create 10000 hardlinks to the same file.
> 4. Add a few bricks and start rebalance.
> 5. Once the rebalance is completed, remove all the files on the mount point
> using "rm -rf".
> 6. Check the brick logs for the invalid argument messages.

--- Additional comment from Poornima G on 2016-10-19 01:27:30 EDT ---

The fix for this would be to reduce the log level from Error to Debug; this bug was not introduced as part of the md-cache changes.
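For illustration only, a minimal sketch of the kind of change described above (log the null-gfid case at DEBUG instead of letting a validation macro log it at ERROR). This is not the content of the actual patch; the enclosing function body, the variable names (this, up_inode_ctx) and the debug message text are assumptions inferred from the log lines quoted in this report.

/* Before: the gfid check goes through GF_VALIDATE_OR_GOTO, which logs at
   ERROR level and produces the "invalid argument:
   !(gf_uuid_is_null (up_inode_ctx->gfid))" lines seen in the brick logs. */
GF_VALIDATE_OR_GOTO (this->name,
                     !(gf_uuid_is_null (up_inode_ctx->gfid)), out);

/* After: perform the same check explicitly and log it at DEBUG, treating a
   still-null gfid in the upcall inode context as an expected transient
   state (e.g. a freshly created inode) rather than an error. */
if (gf_uuid_is_null (up_inode_ctx->gfid)) {
        gf_msg_debug (this->name, 0,
                      "gfid is NULL for this inode, skipping cache "
                      "invalidation");
        goto out;
}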
Patch posted upstream:
REVIEW: http://review.gluster.org/15777 (upcall: Fix a log level) posted (#1) for review on master
Poornima - can this be devel_acked for 3.2.0?
I haven't posted the patch downstream yet, hence moving back to assigned
Seen this on my systemic setup too:

/rhs/brick1/drvol/.trashcan/internal_op failed [File exists]
[2016-11-07 11:25:27.202460] E [upcall-internal.c:570:upcall_cache_invalidate] (-->/usr/lib64/glusterfs/3.8.4/xlator/features/access-control.so(+0xad49) [0x7fe23211ad49] -->/usr/lib64/glusterfs/3.8.4/xlator/features/locks.so(+0xd4f2) [0x7fe231ef94f2] -->/usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so(+0x5b3e) [0x7fe2316adb3e] ) 0-upcall_cache_invalidate: invalid argument: !(gf_uuid_is_null (up_inode_ctx->gfid)) [Invalid argument]
Master: http://review.gluster.org/#/c/15777/
3.9: http://review.gluster.org/#/c/15827/
3.8: http://review.gluster.org/#/c/15828/
3.7: http://review.gluster.org/#/c/15830/
Downstream: https://code.engineering.redhat.com/gerrit/#/c/90547/
Created a Distributed-Replicate volume with md-cache enabled and ran IOs, including IOs with add-brick and remove-brick. Followed the steps to reproduce and did not find any upcall error messages in the brick logs.

Version
--------
samba-client-libs-4.4.6-4.el7rhgs.x86_64
glusterfs-3.8.4-11.el7rhgs.x86_64

Marking this as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html