Bug 1334664 - Excessive error messages are seen in the hot tier's brick logs in a tiered volume
Summary: Excessive error messages are seen in the hot tier's brick logs in a tiered volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: index
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Pranith Kumar K
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On: 1336630 1341477 1341482 1368608
Blocks: 1351522
 
Reported: 2016-05-10 09:22 UTC by krishnaram Karthick
Modified: 2017-03-23 05:30 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 05:30:27 UTC


Attachments


Links:
Red Hat Product Errata RHSA-2017:0486 (normal, SHIPPED_LIVE): Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update (last updated 2017-03-23 09:18:45 UTC)

Description krishnaram Karthick 2016-05-10 09:22:33 UTC
Description of problem:
A lot of error messages related to posix_lookup are seen in the hot tier's brick logs. No functionality is broken; however, the excessive error messages are a concern. We should look into why these errors are logged.

[2016-05-10 08:29:54.009247] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-superman-posix: null gfid for path (null)
[2016-05-10 08:29:54.009276] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-superman-posix: lstat on null failed [Invalid argument]
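
For anyone trying to gauge how noisy these messages are, a rough way is to count the two MSGIDs per brick log. This is only a sketch; it assumes the default brick log directory /var/log/glusterfs/bricks/, so adjust the glob to match the actual setup.

# Count the two posix_lookup error messages in each brick log.
# Assumes the default brick log directory; adjust the glob if logs live elsewhere.
for log in /var/log/glusterfs/bricks/*.log; do
    echo "== ${log}"
    grep -c 'MSGID: 113091' "${log}"    # "null gfid for path (null)"
    grep -c 'MSGID: 113018' "${log}"    # "lstat on null failed [Invalid argument]"
done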

Volume Name: superman
Type: Tier
Volume ID: afa1866e-2d0b-424f-8e22-a782f9068b25
Status: Started
Number of Bricks: 20
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: 10.70.35.133:/bricks/brick7/sm1
Brick2: 10.70.35.10:/bricks/brick7/sm1
Brick3: 10.70.35.11:/bricks/brick7/sm1
Brick4: 10.70.35.225:/bricks/brick7/sm1
Brick5: 10.70.35.239:/bricks/brick7/sm1
Brick6: 10.70.37.60:/bricks/brick7/sm1
Brick7: 10.70.37.120:/bricks/brick7/sm1
Brick8: 10.70.37.101:/bricks/brick7/sm1
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick9: 10.70.37.101:/bricks/brick0/l1
Brick10: 10.70.37.120:/bricks/brick0/l1
Brick11: 10.70.37.60:/bricks/brick0/l1
Brick12: 10.70.35.239:/bricks/brick0/l1
Brick13: 10.70.35.225:/bricks/brick0/l1
Brick14: 10.70.35.11:/bricks/brick0/l1
Brick15: 10.70.35.10:/bricks/brick0/l1
Brick16: 10.70.35.133:/bricks/brick0/l1
Brick17: 10.70.37.101:/bricks/brick1/l1
Brick18: 10.70.37.120:/bricks/brick1/l1
Brick19: 10.70.37.60:/bricks/brick1/l1
Brick20: 10.70.35.239:/bricks/brick1/l1
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: disable


Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-4.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. create a disperse volume
2. From fuse or nfs mount - create files, directories untar kernel package
3. While the above operations are in progress, attach tier
4. enable quota
5. monitor hot tier's brick logs
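
A rough shell transcription of the steps above, for reference. The hostnames, brick paths, and tarball location are placeholders rather than the actual test setup; the tier-attach and quota commands use the glusterfs 3.7-era CLI.

# 1. Create and start a disperse volume (hostnames and brick paths are illustrative).
gluster volume create testvol disperse 6 redundancy 2 \
    server1:/bricks/brick0/l1 server2:/bricks/brick0/l1 server3:/bricks/brick0/l1 \
    server4:/bricks/brick0/l1 server5:/bricks/brick0/l1 server6:/bricks/brick0/l1
gluster volume start testvol

# 2. From a fuse mount, create files and directories, e.g. untar a kernel package.
mkdir -p /mnt/testvol
mount -t glusterfs server1:/testvol /mnt/testvol
tar -xf /root/linux-4.x.tar.xz -C /mnt/testvol &

# 3. While the untar is in progress, attach a hot tier.
gluster volume tier testvol attach replica 2 \
    server1:/bricks/brick7/sm1 server2:/bricks/brick7/sm1 \
    server3:/bricks/brick7/sm1 server4:/bricks/brick7/sm1

# 4. Enable quota.
gluster volume quota testvol enable

# 5. Monitor the hot tier's brick logs for the posix_lookup errors.
tail -f /var/log/glusterfs/bricks/bricks-brick7-sm1.log | grep -E 'MSGID: (113091|113018)'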

Actual results:
The error messages posted above are logged every 10 minutes in the brick logs.

Expected results:
No error messages should be seen; we need to RCA whether these errors are a concern.

Additional info:
sosreports shall be attached shortly.

Comment 3 nchilaka 2016-05-16 09:54:51 UTC
Seeing this on a non-tiered volume as well:
I am seeing this continuously on my stress setup:
[2016-05-16 09:42:48.004818] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-stressvol-posix: lstat on null failed [Invalid argument]
[2016-05-16 09:42:48.009542] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-stressvol-posix: null gfid for path (null)
[2016-05-16 09:42:48.009563] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-stressvol-posix: lstat on null failed [Invalid argument]
[2016-05-16 09:42:48.009687] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-stressvol-posix: null gfid for path (null)
[2016-05-16 09:42:48.009710] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-stressvol-posix: lstat on null failed [Invalid argument]


This is seen in all brick logs.

[root@dhcp35-135 ~]# rpm -qa|grep gluster
glusterfs-api-3.7.9-4.el7rhgs.x86_64
glusterfs-3.7.9-4.el7rhgs.x86_64
glusterfs-libs-3.7.9-4.el7rhgs.x86_64
glusterfs-fuse-3.7.9-4.el7rhgs.x86_64
glusterfs-cli-3.7.9-4.el7rhgs.x86_64
python-gluster-3.7.9-4.el7rhgs.noarch
glusterfs-client-xlators-3.7.9-4.el7rhgs.x86_64
glusterfs-server-3.7.9-4.el7rhgs.x86_64
[root@dhcp35-135 ~]# gluster v info
 
Volume Name: stressvol
Type: Distributed-Replicate
Volume ID: a485542e-3389-446c-bb80-c43464e92f48
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.135:/rhs/brick1/stressvol
Brick2: 10.70.35.196:/rhs/brick1/stressvol
Brick3: 10.70.35.99:/rhs/brick1/stressvol
Brick4: 10.70.35.14:/rhs/brick1/stressvol
Brick5: 10.70.35.135:/rhs/brick2/stressvol
Brick6: 10.70.35.196:/rhs/brick2/stressvol
Brick7: 10.70.35.99:/rhs/brick2/stressvol
Brick8: 10.70.35.14:/rhs/brick2/stressvol
Options Reconfigured:
performance.readdir-ahead: on
[root@dhcp35-135 ~]# gluster v status
Status of volume: stressvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.135:/rhs/brick1/stressvol    49152     0          Y       14311
Brick 10.70.35.196:/rhs/brick1/stressvol    49152     0          Y       14243
Brick 10.70.35.99:/rhs/brick1/stressvol     49152     0          Y       14241
Brick 10.70.35.14:/rhs/brick1/stressvol     49152     0          Y       14224
Brick 10.70.35.135:/rhs/brick2/stressvol    49153     0          Y       14330
Brick 10.70.35.196:/rhs/brick2/stressvol    49153     0          Y       14262
Brick 10.70.35.99:/rhs/brick2/stressvol     49153     0          Y       14260
Brick 10.70.35.14:/rhs/brick2/stressvol     49153     0          Y       14243
NFS Server on localhost                     2049      0          Y       14355
Self-heal Daemon on localhost               N/A       N/A        Y       14352
NFS Server on 10.70.35.14                   2049      0          Y       14265
Self-heal Daemon on 10.70.35.14             N/A       N/A        Y       14270
NFS Server on 10.70.35.196                  2049      0          Y       14284
Self-heal Daemon on 10.70.35.196            N/A       N/A        Y       14289
NFS Server on 10.70.35.99                   2049      0          Y       14282
Self-heal Daemon on 10.70.35.99             N/A       N/A        Y       14287
 
Task Status of Volume stressvol
------------------------------------------------------------------------------
There are no active volume tasks


http://etherpad.corp.redhat.com/3-1-3-systemic-testing

Comment 10 Atin Mukherjee 2016-09-17 14:39:40 UTC
Upstream mainline : http://review.gluster.org/14589
Upstream 3.8 : http://review.gluster.org/14596

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Comment 13 krishnaram Karthick 2016-10-11 12:03:19 UTC
The error messages reported in the bug are no longer seen in build glusterfs-server-3.8.4-2.

The log messages are not seen with any tier operations. Moving the bug to verified.
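
For reference, a quick check on the verification build is to scan all brick logs for the two MSGIDs; no file names in the output (only the fallback message) means the errors are gone. The log path assumes the default location and is illustrative.

# List any brick log that still contains the posix_lookup error messages.
grep -lE 'MSGID: (113091|113018)' /var/log/glusterfs/bricks/*.log || echo "no matching error messages found"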

Comment 15 errata-xmlrpc 2017-03-23 05:30:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

