Red Hat Bugzilla – Bug 1334664
Excessive error messages are seen in hot tier's brick logs in a tiered volume
Last modified: 2017-03-23 01:30:27 EDT
Description of problem:
A lot of error messages related to posix_lookup are seen in the hot tier's brick logs. No functionality is broken, but the excessive error messages are a concern. We should look at why we log these errors.

[2016-05-10 08:29:54.009247] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-superman-posix: null gfid for path (null)
[2016-05-10 08:29:54.009276] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-superman-posix: lstat on null failed [Invalid argument]

Volume Name: superman
Type: Tier
Volume ID: afa1866e-2d0b-424f-8e22-a782f9068b25
Status: Started
Number of Bricks: 20
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: 10.70.35.133:/bricks/brick7/sm1
Brick2: 10.70.35.10:/bricks/brick7/sm1
Brick3: 10.70.35.11:/bricks/brick7/sm1
Brick4: 10.70.35.225:/bricks/brick7/sm1
Brick5: 10.70.35.239:/bricks/brick7/sm1
Brick6: 10.70.37.60:/bricks/brick7/sm1
Brick7: 10.70.37.120:/bricks/brick7/sm1
Brick8: 10.70.37.101:/bricks/brick7/sm1
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick9: 10.70.37.101:/bricks/brick0/l1
Brick10: 10.70.37.120:/bricks/brick0/l1
Brick11: 10.70.37.60:/bricks/brick0/l1
Brick12: 10.70.35.239:/bricks/brick0/l1
Brick13: 10.70.35.225:/bricks/brick0/l1
Brick14: 10.70.35.11:/bricks/brick0/l1
Brick15: 10.70.35.10:/bricks/brick0/l1
Brick16: 10.70.35.133:/bricks/brick0/l1
Brick17: 10.70.37.101:/bricks/brick1/l1
Brick18: 10.70.37.120:/bricks/brick1/l1
Brick19: 10.70.37.60:/bricks/brick1/l1
Brick20: 10.70.35.239:/bricks/brick1/l1
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: disable

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-4.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a disperse volume.
2. From a FUSE or NFS mount, create files and directories and untar a kernel package.
3. While the above operations are in progress, attach a tier.
4. Enable quota.
5. Monitor the hot tier's brick logs.
(A rough CLI sketch of these steps is given at the end of this comment.)

Actual results:
The error messages posted above are logged every 10 minutes in the brick logs.

Expected results:
No error messages should be seen; we need to RCA whether these errors are a concern.

Additional info:
sosreports shall be attached shortly.
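For convenience, a rough CLI sketch of the reproduction steps above, assuming placeholder hostnames (node1..node8) and brick paths, and only one 4+2 disperse subvolume; the attach-tier syntax may differ slightly between glusterfs releases:

# 1. Create and start a dispersed (4+2) volume -- hosts and brick paths are placeholders.
gluster volume create superman disperse 6 redundancy 2 node{1..6}:/bricks/brick0/l1 force
gluster volume start superman

# 2. Mount via FUSE and generate load (files, directories, kernel untar).
mount -t glusterfs node1:/superman /mnt/superman
cd /mnt/superman && tar xf /root/linux-kernel.tar.xz &

# 3. While the I/O is running, attach a replica-2 hot tier (4 x 2 = 8 bricks).
gluster volume attach-tier superman replica 2 node{1..8}:/bricks/brick7/sm1

# 4. Enable quota.
gluster volume quota superman enable

# 5. Watch the hot tier's brick logs (default brick-log location assumed).
tail -f /var/log/glusterfs/bricks/*brick7*.log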
Seeing this on a non-tiered volume as well: I am seeing it on my stress setup continuously.

[2016-05-16 09:42:48.004818] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-stressvol-posix: lstat on null failed [Invalid argument]
[2016-05-16 09:42:48.009542] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-stressvol-posix: null gfid for path (null)
[2016-05-16 09:42:48.009563] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-stressvol-posix: lstat on null failed [Invalid argument]
[2016-05-16 09:42:48.009687] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-stressvol-posix: null gfid for path (null)
[2016-05-16 09:42:48.009710] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-stressvol-posix: lstat on null failed [Invalid argument]

It is seen in all brick logs.

[root@dhcp35-135 ~]# rpm -qa | grep gluster
glusterfs-api-3.7.9-4.el7rhgs.x86_64
glusterfs-3.7.9-4.el7rhgs.x86_64
glusterfs-libs-3.7.9-4.el7rhgs.x86_64
glusterfs-fuse-3.7.9-4.el7rhgs.x86_64
glusterfs-cli-3.7.9-4.el7rhgs.x86_64
python-gluster-3.7.9-4.el7rhgs.noarch
glusterfs-client-xlators-3.7.9-4.el7rhgs.x86_64
glusterfs-server-3.7.9-4.el7rhgs.x86_64

[root@dhcp35-135 ~]# gluster v info
Volume Name: stressvol
Type: Distributed-Replicate
Volume ID: a485542e-3389-446c-bb80-c43464e92f48
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.135:/rhs/brick1/stressvol
Brick2: 10.70.35.196:/rhs/brick1/stressvol
Brick3: 10.70.35.99:/rhs/brick1/stressvol
Brick4: 10.70.35.14:/rhs/brick1/stressvol
Brick5: 10.70.35.135:/rhs/brick2/stressvol
Brick6: 10.70.35.196:/rhs/brick2/stressvol
Brick7: 10.70.35.99:/rhs/brick2/stressvol
Brick8: 10.70.35.14:/rhs/brick2/stressvol
Options Reconfigured:
performance.readdir-ahead: on

[root@dhcp35-135 ~]# gluster v status
Status of volume: stressvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.135:/rhs/brick1/stressvol    49152     0          Y       14311
Brick 10.70.35.196:/rhs/brick1/stressvol    49152     0          Y       14243
Brick 10.70.35.99:/rhs/brick1/stressvol     49152     0          Y       14241
Brick 10.70.35.14:/rhs/brick1/stressvol     49152     0          Y       14224
Brick 10.70.35.135:/rhs/brick2/stressvol    49153     0          Y       14330
Brick 10.70.35.196:/rhs/brick2/stressvol    49153     0          Y       14262
Brick 10.70.35.99:/rhs/brick2/stressvol     49153     0          Y       14260
Brick 10.70.35.14:/rhs/brick2/stressvol     49153     0          Y       14243
NFS Server on localhost                     2049      0          Y       14355
Self-heal Daemon on localhost               N/A       N/A        Y       14352
NFS Server on 10.70.35.14                   2049      0          Y       14265
Self-heal Daemon on 10.70.35.14             N/A       N/A        Y       14270
NFS Server on 10.70.35.196                  2049      0          Y       14284
Self-heal Daemon on 10.70.35.196            N/A       N/A        Y       14289
NFS Server on 10.70.35.99                   2049      0          Y       14282
Self-heal Daemon on 10.70.35.99             N/A       N/A        Y       14287

Task Status of Volume stressvol
------------------------------------------------------------------------------
There are no active volume tasks

http://etherpad.corp.redhat.com/3-1-3-systemic-testing
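To gauge how frequent and widespread the messages are, a quick per-brick count can be run on each node; this is a sketch assuming the default brick-log location, with the MSGIDs taken from the entries above:

# Count the two posix_lookup error messages in each brick log.
for f in /var/log/glusterfs/bricks/*.log; do
    printf '%s: ' "$f"
    grep -c -E 'MSGID: (113091|113018)' "$f"
done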
Upstream mainline: http://review.gluster.org/14589
Upstream 3.8: http://review.gluster.org/14596

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
The error messages reported in the bug are no longer seen in build glusterfs-server-3.8.4-2. The log messages are not seen with any tier operations. Moving the bug to verified.
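A minimal verification sketch along those lines, assuming the default brick-log location: confirm the fixed build is installed, exercise the tier operations, and check that neither MSGID reappears in the brick logs:

# Confirm the build under test.
rpm -q glusterfs-server        # expect glusterfs-server-3.8.4-2 or later

# After running the tier workload, the two error MSGIDs should not appear in any brick log.
grep -E 'MSGID: (113091|113018)' /var/log/glusterfs/bricks/*.log || echo "no posix_lookup errors found"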
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html