+++ This bug was initially created as a clone of Bug #1379935 +++

Description of problem:
*******************************
While running test cases related to AFR self heal that create multiple files and directories, the brick logs show errors related to gf_uuid_is_null in upcall_cache_invalidate. The errors occur wherever there is a file create/write operation.

[2016-09-27 06:35:20.639063] E [upcall-internal.c:512:upcall_cache_invalidate] (-->/usr/lib64/glusterfs/3.8.3/xlator/features/access-control.so(+0xad09) [0x7fd90aca4d09] -->/usr/lib64/glusterfs/3.8.3/xlator/features/locks.so(+0xd4f2) [0x7fd90aa834f2] -->/usr/lib64/glusterfs/3.8.3/xlator/features/upcall.so(+0x6cae) [0x7fd90a238cae] ) 0-upcall_cache_invalidate: invalid argument: !(gf_uuid_is_null (up_inode_ctx->gfid)) [Invalid argument]
[2016-09-27 06:35:22.717196] E [upcall-internal.c:512:upcall_cache_invalidate] (-->/usr/lib64/glusterfs/3.8.3/xlator/features/access-control.so(+0xad09) [0x7fd90aca4d09] -->/usr/lib64/glusterfs/3.8.3/xlator/features/locks.so(+0xd4f2) [0x7fd90aa834f2] -->/usr/lib64/glusterfs/3.8.3/xlator/features/upcall.so(+0x6cae) [0x7fd90a238cae] ) 0-upcall_cache_invalidate: invalid argument: !(gf_uuid_is_null (up_inode_ctx->gfid)) [Invalid argument]
[2016-09-27 06:36:20.976194] E [upcall-internal.c:512:upcall_cache_invalidate] (-->/usr/lib64/glusterfs/3.8.3/xlator/features/access-control.so(+0xad09) [0x7fd90aca4d09] -->/usr/lib64/glusterfs/3.8.3/xlator/features/locks.so(+0xd4f2) [0x7fd90aa834f2] -->/usr/lib64/glusterfs/3.8.3/xlator/features/upcall.so(+0x6cae) [0x7fd90a238cae] ) 0-upcall_cache_invalidate: invalid argument: !(gf_uuid_is_null (up_inode_ctx->gfid)) [Invalid argument]

Volume Name: vol1
Type: Distributed-Replicate
Volume ID: afb432f1-55d4-463c-bd50-ba19a63561e3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.64:/mnt/brick/vol1/b1
Brick2: 10.70.47.66:/mnt/brick/vol1/b2
Brick3: 10.70.47.64:/mnt/brick/vol1/b3
Brick4: 10.70.47.66:/mnt/brick/vol1/b4
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.batch-fsync-delay-usec: 0
server.allow-insecure: on
performance.stat-prefetch: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
cluster.metadata-self-heal: off
cluster.data-self-heal: off
cluster.entry-self-heal: off
cluster.self-heal-daemon: on
cluster.data-self-heal-algorithm: full

Version-Release number of selected component (if applicable):
glusterfs-3.8.3-0.39.git97d1dde.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 volume, set the md-cache volume options, and mount it over CIFS.
2. Start creating directories and files and run the AFR cases with a brick down.
3. Observe the logs for any errors.

Actual results:
Lots of error messages related to upcall_cache_invalidate on create/write.

Expected results:
These error messages shouldn't be there.

Additional info:

--- Additional comment from Prasad Desala on 2016-10-14 09:34:05 EDT ---

Same issue is seen with private glusterfs build: 3.8.4-2.26.git0a405a4.el7rhgs.x86_64

This issue is specific to the md-cache upcall. With the setup in the same state, disabling the md-cache options below (see the gluster v info output for more info) made the ERROR messages disappear from the brick logs. Enabling md-cache again started spamming the brick logs with the error messages.

Steps that were performed:
1. Create a distributed replica volume and start it.
2. Enabled the md-cache supported options on the volume. Please see the gluster v info output below for more details on the md-cache options enabled.
3. Mounted the volume on multiple clients. Simultaneously, from one client touch 10000 files and from another client create 10000 hardlinks to the same file.
4. Add a few bricks and start rebalance.
5. Once the rebalance is completed, remove all the files on the mount point using "rm -rf".
6. Check the brick logs for the invalid argument messages.

--- Additional comment from Prasad Desala on 2016-10-14 09:34:58 EDT ---

[root@dhcp42-185 ~]# gluster v status
Status of volume: distrep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.185:/bricks/brick0/b0        49152     0          Y       16587
Brick 10.70.43.152:/bricks/brick0/b0        49152     0          Y       19074
Brick 10.70.42.39:/bricks/brick0/b0         49152     0          Y       19263
Brick 10.70.42.84:/bricks/brick0/b0         49152     0          Y       19630
Brick 10.70.42.185:/bricks/brick1/b1        49153     0          Y       16607
Brick 10.70.43.152:/bricks/brick1/b1        49153     0          Y       19094
Brick 10.70.42.39:/bricks/brick1/b1         49153     0          Y       19283
Brick 10.70.42.84:/bricks/brick1/b1         49153     0          Y       19650
Brick 10.70.42.185:/bricks/brick2/b2        49154     0          Y       16627
Brick 10.70.43.152:/bricks/brick2/b2        49154     0          Y       19114
Brick 10.70.42.39:/bricks/brick2/b2         49154     0          Y       19303
Brick 10.70.42.84:/bricks/brick2/b2         49154     0          Y       19670
Brick 10.70.42.185:/bricks/brick3/b3        49155     0          Y       19472
Brick 10.70.43.152:/bricks/brick3/b3        49155     0          Y       19380
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       19493
NFS Server on 10.70.42.39                   N/A       N/A        N       N/A
Self-heal Daemon on 10.70.42.39             N/A       N/A        Y       19588
NFS Server on 10.70.42.84                   N/A       N/A        N       N/A
Self-heal Daemon on 10.70.42.84             N/A       N/A        Y       19979
NFS Server on 10.70.43.152                  N/A       N/A        N       N/A
Self-heal Daemon on 10.70.43.152            N/A       N/A        Y       19401

Task Status of Volume distrep
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 19b1127e-246e-4afd-b59b-9690b9569122
Status               : completed

[root@dhcp42-185 ~]# gluster v info

Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 4ad479e4-fa01-4d91-8743-4e1510ba2c13
Status: Started
Snapshot Count: 0
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.42.185:/bricks/brick0/b0
Brick2: 10.70.43.152:/bricks/brick0/b0
Brick3: 10.70.42.39:/bricks/brick0/b0
Brick4: 10.70.42.84:/bricks/brick0/b0
Brick5: 10.70.42.185:/bricks/brick1/b1
Brick6: 10.70.43.152:/bricks/brick1/b1
Brick7: 10.70.42.39:/bricks/brick1/b1
Brick8: 10.70.42.84:/bricks/brick1/b1
Brick9: 10.70.42.185:/bricks/brick2/b2
Brick10: 10.70.43.152:/bricks/brick2/b2
Brick11: 10.70.42.39:/bricks/brick2/b2
Brick12: 10.70.42.84:/bricks/brick2/b2
Brick13: 10.70.42.185:/bricks/brick3/b3
Brick14: 10.70.43.152:/bricks/brick3/b3
Options Reconfigured:
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on

--- Additional comment from Atin Mukherjee on 2016-10-15 06:35:05 EDT ---

(In reply to Prasad Desala from comment #1)
> Same issue is seen with private glusterfs build:
> 3.8.4-2.26.git0a405a4.el7rhgs.x86_64

Prasad - this is an upstream bug and no reference to a downstream build should be mentioned here. If required, please file a downstream bug to track the issue.
> This issue is specific to the md-cache upcall. With the setup in the same
> state, disabling the md-cache options below (see the gluster v info output
> for more info) made the ERROR messages disappear from the brick logs.
> Enabling md-cache again started spamming the brick logs with the error
> messages.
>
> Steps that were performed:
>
> 1. Create a distributed replica volume and start it.
> 2. Enabled the md-cache supported options on the volume. Please see the
> gluster v info output below for more details on the md-cache options enabled.
> 3. Mounted the volume on multiple clients. Simultaneously, from one client
> touch 10000 files and from another client create 10000 hardlinks to the same file.
> 4. Add a few bricks and start rebalance.
> 5. Once the rebalance is completed, remove all the files on the mount point
> using "rm -rf".
> 6. Check the brick logs for the invalid argument messages.

--- Additional comment from Poornima G on 2016-10-19 01:27:30 EDT ---

The fix for this would be to reduce the log level from Error to Debug; this bug was not introduced as part of the md-cache changes.
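For illustration only, a minimal sketch of the kind of change described above (log the null-gfid case at DEBUG instead of letting a validation macro log it at ERROR). This is not the content of the actual patch; the enclosing function body, the variable names (this, up_inode_ctx) and the debug message text are assumptions inferred from the log lines quoted in this report.

/* Before: the gfid check goes through GF_VALIDATE_OR_GOTO, which logs at
   ERROR level and produces the "invalid argument:
   !(gf_uuid_is_null (up_inode_ctx->gfid))" lines seen in the brick logs. */
GF_VALIDATE_OR_GOTO (this->name,
                     !(gf_uuid_is_null (up_inode_ctx->gfid)), out);

/* After: perform the same check explicitly and log it at DEBUG, treating a
   still-null gfid in the upcall inode context as an expected transient
   state (e.g. a freshly created inode) rather than an error. */
if (gf_uuid_is_null (up_inode_ctx->gfid)) {
        gf_msg_debug (this->name, 0,
                      "gfid is NULL for this inode, skipping cache "
                      "invalidation");
        goto out;
}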
Patch posted upstream:
REVIEW: http://review.gluster.org/15777 (upcall: Fix a log level) posted (#1) for review on master
Poornima - can this be devel_acked for 3.2.0?
I haven't posted the patch downstream yet, hence moving back to assigned
Seen this on my systemic setup too:

/rhs/brick1/drvol/.trashcan/internal_op failed [File exists]
[2016-11-07 11:25:27.202460] E [upcall-internal.c:570:upcall_cache_invalidate] (-->/usr/lib64/glusterfs/3.8.4/xlator/features/access-control.so(+0xad49) [0x7fe23211ad49] -->/usr/lib64/glusterfs/3.8.4/xlator/features/locks.so(+0xd4f2) [0x7fe231ef94f2] -->/usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so(+0x5b3e) [0x7fe2316adb3e] ) 0-upcall_cache_invalidate: invalid argument: !(gf_uuid_is_null (up_inode_ctx->gfid)) [Invalid argument]
Master: http://review.gluster.org/#/c/15777/
3.9: http://review.gluster.org/#/c/15827/
3.8: http://review.gluster.org/#/c/15828/
3.7: http://review.gluster.org/#/c/15830/
Downstream: https://code.engineering.redhat.com/gerrit/#/c/90547/
Created a Distributed-Replicate volume with md-cache enabled and ran IOs, including IOs with add-brick and remove-brick. Followed the steps to reproduce and did not find any upcall error messages in the brick logs.

Version
--------
samba-client-libs-4.4.6-4.el7rhgs.x86_64
glusterfs-3.8.4-11.el7rhgs.x86_64

Marking this as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html