| Summary: | Tiering related core observed with "uuid_is_null () message". | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Shashank Raj <sraj> |
| Component: | tier | Assignee: | Nithya Balachandran <nbalacha> |
| Status: | CLOSED ERRATA | QA Contact: | Sweta Anandpara <sanandpa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.1 | CC: | amukherj, dlambrig, jthottan, kkeithle, mzywusko, nbalacha, ndevos, rcyriac, rhinduja, rhs-bugs, rkavunga, sankarshan, skoduri |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-1 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1358196 (view as bug list) | Environment: | |
| Last Closed: | 2017-03-23 05:29:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1351522, 1358196, 1360122, 1360125 | | |
Description
Shashank Raj
2016-05-03 12:43:58 UTC
As we don't have exact steps to reproduce, we don't know the cause of a NULL uuid or GFID. But handling the exception properly is important, and a patch with the necessary exception handling in the ctr code will be sent. Is there any chance of getting access to the brick logs?

Since I was not sure of the exact scenario that generated it, and unfortunately a lot of tests have been run on the same setup since the core was generated, I don't think I will be able to provide the exact brick logs. I will keep a check on this issue and update the bugzilla with details if I hit it again.

Since the issue was seen on an nfs-ganesha + tiering setup, and we are no longer testing or supporting tiering with ganesha for 3.1.3, I guess we can move it out. Targeting this BZ for 3.2.0.

Upstream mainline: http://review.gluster.org/14964
Upstream 3.8: http://review.gluster.org/15009
The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Tried to reproduce the issue on 3.1.3 by creating a scenario of continuous directory renames and triggering a graph switch by changing one of the performance options of the (tiered) volume (as advised by Nithya). That did not result in any trace.
Followed the same steps on build 3.8.4-14 in a ganesha + tiered volume setup and, again, did not hit any trace. I do, however, see plenty of the errors pasted below in 3.1.3 as well as 3.2:
[2017-02-28 09:59:28.840923] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:28.913263] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:28.913836] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:28.991517] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:28.991900] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:29.071659] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:29.072078] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:29.145699] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
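The comments above describe the fix only at a high level: handle a NULL uuid/GFID in the ctr (changetimerecorder) code gracefully instead of letting it trigger an abort. The sketch below shows that general pattern only; it is not the actual patch from http://review.gluster.org/14964. The helper name ctr_record_rename, its arguments, and its log messages are hypothetical, and only libuuid's uuid_is_null(), uuid_unparse(), and uuid_generate() are real APIs.

```c
/*
 * Minimal illustrative sketch (NOT the actual glusterfs/ctr patch):
 * guard against an all-zero GFID before acting on it, instead of
 * letting an assertion abort the brick process.
 * Build with: gcc guard.c -luuid
 */
#include <stdio.h>
#include <string.h>
#include <uuid/uuid.h>

/* Hypothetical stand-in for recording a rename in the ctr database. */
static int
ctr_record_rename (const uuid_t gfid, const char *oldname, const char *newname)
{
        char gfid_str[37];

        /* Detect the NULL uuid/GFID case and bail out with an error
         * instead of asserting. */
        if (uuid_is_null (gfid)) {
                fprintf (stderr,
                         "skipping rename record %s -> %s: NULL gfid\n",
                         oldname, newname);
                return -1;
        }

        uuid_unparse (gfid, gfid_str);
        printf ("recording rename %s -> %s for gfid %s\n",
                oldname, newname, gfid_str);
        return 0;
}

int
main (void)
{
        uuid_t good, null_gfid;

        uuid_generate (good);
        memset (null_gfid, 0, sizeof (uuid_t));   /* all-zero (NULL) uuid */

        ctr_record_rename (good, "dir1/dir1", "dir1/newdir1");      /* recorded */
        ctr_record_rename (null_gfid, "dir1/dir2", "dir1/newdir2"); /* skipped  */
        return 0;
}
```

The point of the sketch is only the shape of the guard: detect the all-zero GFID, log it, and return an error so the operation fails cleanly rather than crashing the brick process.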
I have been unsuccessful in reproducing this issue, and hence cannot confidently confirm the fix that this BZ is addressing. Having said that, repeated testing of the above scenario has not resulted in any crash, so I am moving this to verified for now. I will reopen this BZ if QE hits this crash again. Setup details below:
[root@dhcp46-111 ~]# gluster peer status
Number of Peers: 5
Hostname: dhcp46-115.lab.eng.blr.redhat.com
Uuid: 61964c73-d65d-45f5-8de6-2dfa1db76db7
State: Peer in Cluster (Connected)
Hostname: dhcp46-139.lab.eng.blr.redhat.com
Uuid: b0714c63-8dba-4922-9019-ac1ef9702076
State: Peer in Cluster (Connected)
Hostname: dhcp46-124.lab.eng.blr.redhat.com
Uuid: ffde978a-bb28-44ed-9c73-886d29d7fa19
State: Peer in Cluster (Connected)
Hostname: dhcp46-131.lab.eng.blr.redhat.com
Uuid: a0e14dcd-67ce-4b36-adeb-9b1be8e65b7f
State: Peer in Cluster (Connected)
Hostname: dhcp46-152.lab.eng.blr.redhat.com
Uuid: ce2bd89e-f047-4cd8-bd73-ec0c5a6d974c
State: Peer in Cluster (Connected)
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]# gluster v info vol_tier
Volume Name: vol_tier
Type: Tier
Volume ID: d544486f-c47e-420d-9b17-daad43058231
Status: Started
Snapshot Count: 0
Number of Bricks: 18
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick1: 10.70.46.152:/bricks/brick2/vol_tier_hot5
Brick2: 10.70.46.131:/bricks/brick2/vol_tier_hot4
Brick3: 10.70.46.124:/bricks/brick2/vol_tier_hot3
Brick4: 10.70.46.139:/bricks/brick2/vol_tier_hot2
Brick5: 10.70.46.115:/bricks/brick2/vol_tier_hot1
Brick6: 10.70.46.111:/bricks/brick2/vol_tier_hot0
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick7: 10.70.46.111:/bricks/brick0/vol_tier0
Brick8: 10.70.46.115:/bricks/brick0/vol_tier1
Brick9: 10.70.46.139:/bricks/brick0/vol_tier2
Brick10: 10.70.46.124:/bricks/brick0/vol_tier3
Brick11: 10.70.46.131:/bricks/brick0/vol_tier4
Brick12: 10.70.46.152:/bricks/brick0/vol_tier5
Brick13: 10.70.46.111:/bricks/brick1/vol_tier6
Brick14: 10.70.46.115:/bricks/brick1/vol_tier7
Brick15: 10.70.46.139:/bricks/brick1/vol_tier8
Brick16: 10.70.46.124:/bricks/brick1/vol_tier9
Brick17: 10.70.46.131:/bricks/brick1/vol_tier10
Brick18: 10.70.46.152:/bricks/brick1/vol_tier11
Options Reconfigured:
performance.client-io-threads: on
performance.stat-prefetch: on
ganesha.enable: on
features.cache-invalidation: on
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]# gluster v status vol_tier
Status of volume: vol_tier
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.46.152:/bricks/brick2/vol_tier_hot5 49156 0 Y 29550
Brick 10.70.46.131:/bricks/brick2/vol_tier_hot4 49156 0 Y 26907
Brick 10.70.46.124:/bricks/brick2/vol_tier_hot3 49157 0 Y 21410
Brick 10.70.46.139:/bricks/brick2/vol_tier_hot2 49158 0 Y 12868
Brick 10.70.46.115:/bricks/brick2/vol_tier_hot1 49158 0 Y 18627
Brick 10.70.46.111:/bricks/brick2/vol_tier_hot0 49160 0 Y 20041
Cold Bricks:
Brick 10.70.46.111:/bricks/brick0/vol_tier0 49156 0 Y 18443
Brick 10.70.46.115:/bricks/brick0/vol_tier1 49156 0 Y 17060
Brick 10.70.46.139:/bricks/brick0/vol_tier2 49156 0 Y 11256
Brick 10.70.46.124:/bricks/brick0/vol_tier3 49155 0 Y 19670
Brick 10.70.46.131:/bricks/brick0/vol_tier4 49154 0 Y 26794
Brick 10.70.46.152:/bricks/brick0/vol_tier5 49154 0 Y 29438
Brick 10.70.46.111:/bricks/brick1/vol_tier6 49159 0 Y 18462
Brick 10.70.46.115:/bricks/brick1/vol_tier7 49157 0 Y 17079
Brick 10.70.46.139:/bricks/brick1/vol_tier8 49157 0 Y 11275
Brick 10.70.46.124:/bricks/brick1/vol_tier9 49156 0 Y 19720
Brick 10.70.46.131:/bricks/brick1/vol_tier10 49155 0 Y 26813
Brick 10.70.46.152:/bricks/brick1/vol_tier11 49155 0 Y 29457
Self-heal Daemon on localhost N/A N/A Y 20103
Self-heal Daemon on dhcp46-115.lab.eng.blr.redhat.com N/A N/A Y 18730
Self-heal Daemon on dhcp46-139.lab.eng.blr.redhat.com N/A N/A Y 12899
Self-heal Daemon on dhcp46-124.lab.eng.blr.redhat.com N/A N/A Y 21430
Self-heal Daemon on dhcp46-131.lab.eng.blr.redhat.com N/A N/A Y 26927
Self-heal Daemon on dhcp46-152.lab.eng.blr.redhat.com N/A N/A Y 29570
Task Status of Volume vol_tier
------------------------------------------------------------------------------
Task : Tier migration
ID : 3be315db-1eca-4cdd-ae81-ad54442e69fc
Status : in progress
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]# rpm -qa | grep gluster
glusterfs-3.8.4-14.el7rhgs.x86_64
glusterfs-cli-3.8.4-14.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-7.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-14.el7rhgs.x86_64
glusterfs-libs-3.8.4-14.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-14.el7rhgs.x86_64
glusterfs-fuse-3.8.4-14.el7rhgs.x86_64
glusterfs-server-3.8.4-14.el7rhgs.x86_64
python-gluster-3.8.4-14.el7rhgs.noarch
glusterfs-ganesha-3.8.4-14.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-14.el7rhgs.x86_64
glusterfs-api-3.8.4-14.el7rhgs.x86_64
glusterfs-devel-3.8.4-14.el7rhgs.x86_64
glusterfs-events-3.8.4-14.el7rhgs.x86_64
glusterfs-rdma-3.8.4-14.el7rhgs.x86_64
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]#
[root@dhcp46-111 ~]#
CLIENT LOGS
============
[root@dhcp35-153 mnt]#
[root@dhcp35-153 mnt]# mount -t nfs -o vers=4 10.70.44.92:vol_tier /mnt/test
[root@dhcp35-153 mnt]# cd /mnt/test
[root@dhcp35-153 test]# ls -a
. .. .trashcan
[root@dhcp35-153 test]#
[root@dhcp35-153 test]#
[root@dhcp35-153 test]#
[root@dhcp35-153 test]# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
10.70.44.92:vol_tier 114554880 719872 113835008 1% /mnt/test
[root@dhcp35-153 test]#
[root@dhcp35-153 test]#
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mkdir -p dir$i/dir$j;done; done
[root@dhcp35-153 test]# ls -a
. .. dir1 dir10 dir2 dir3 dir4 dir5 dir6 dir7 dir8 dir9 .trashcan
[root@dhcp35-153 test]# ls dir1/dir
Display all 100 possibilities? (y or n)
[root@dhcp35-153 test]# ls dir1/dir
ls: cannot access dir1/dir: No such file or directory
[root@dhcp35-153 test]#
[root@dhcp35-153 test]#
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mv dir$i/dir$j dir$i/newdir$j;done; done
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mv dir$i/newdir$j dir$i/olddir$j;done; done
[root@dhcp35-153 test]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2017-0486.html