Description of problem:
Tiering-related core observed, with uuid_is_null () in the backtrace.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-2

How reproducible:
Once

Steps to Reproduce:
Not sure of the exact steps which generated this core, but it was observed while tiering-related cases were being executed.

bt as below:

#0  0x00007fa21e1543fc in uuid_is_null () from /lib64/libuuid.so.1
#1  0x00007fa21070fc09 in ctr_delete_hard_link_from_db.isra.1.constprop.4 () from /usr/lib64/glusterfs/3.7.9/xlator/features/changetimerecorder.so
#2  0x00007fa210716bd1 in ctr_rename_cbk () from /usr/lib64/glusterfs/3.7.9/xlator/features/changetimerecorder.so
#3  0x00007fa21092cccd in trash_common_rename_cbk () from /usr/lib64/glusterfs/3.7.9/xlator/features/trash.so
#4  0x00007fa211169024 in posix_rename () from /usr/lib64/glusterfs/3.7.9/xlator/storage/posix.so
#5  0x00007fa210934c27 in trash_rename () from /usr/lib64/glusterfs/3.7.9/xlator/features/trash.so
#6  0x00007fa21071219d in ctr_rename () from /usr/lib64/glusterfs/3.7.9/xlator/features/changetimerecorder.so
#7  0x00007fa21003449e in changelog_rename () from /usr/lib64/glusterfs/3.7.9/xlator/features/changelog.so
#8  0x00007fa21e9cbb1a in default_rename () from /lib64/libglusterfs.so.0
#9  0x00007fa20b9cd4fa in ?? () from /usr/lib64/glusterfs/3.7.9/xlator/features/access-control.so
#10 0x00007fa21ea3ef90 in graphyylex_destroy () from /lib64/libglusterfs.so.0
#11 0x00007fa21c4d7a54 in ?? ()
#12 0x00007fa21e9fa009 in __gf_calloc () from /lib64/libglusterfs.so.0
#13 0x00007fa20c011bd0 in ?? ()
#14 0x00007fa21c4d7a54 in ?? ()
#15 0x00007fa21e9cbb1a in default_rename () from /lib64/libglusterfs.so.0
#16 0x00007fa20b5a0dd8 in up_rename () from /usr/lib64/glusterfs/3.7.9/xlator/features/upcall.so
#17 0x00007fa21e9d8542 in default_rename_resume () from /lib64/libglusterfs.so.0
#18 0x00007fa21e9f73cd in call_resume () from /lib64/libglusterfs.so.0
#19 0x00007fa20c054c70 in ?? ()
#20 0x00007fa20c054c98 in ?? ()
#21 0x00007fa1ea4eae70 in ?? ()
#22 0x00007fa20c054c98 in ?? ()
#23 0x00007fa1ea4eae70 in ?? ()
#24 0x00007fa20b393363 in iot_worker () from /usr/lib64/glusterfs/3.7.9/xlator/performance/io-threads.so
#25 0x00007fa21d82fdc5 in start_thread () from /lib64/libpthread.so.0
#26 0x00007fa21d1761cd in clone () from /lib64/libc.so.6

Actual results:
Tiering-related core generated, with uuid_is_null () in the backtrace.

Expected results:
There should not be any core generated.

Additional info:
Don't have the exact steps to reproduce; however, filing this bug so that we don't miss this issue.
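Frame #0 suggests a NULL gfid pointer reached libuuid's uuid_is_null(), which reads all 16 bytes of its argument and therefore faults on a NULL pointer. A minimal standalone sketch of that failure mode (this is not GlusterFS code; the variable names are illustrative only):

/* uuid_null_demo.c - standalone sketch of the crash seen in frame #0.
 * uuid_is_null() from libuuid reads all 16 bytes of its argument, so a
 * NULL pointer passed in place of a gfid dereferences NULL and segfaults.
 * Build with: gcc uuid_null_demo.c -luuid
 */
#include <stdio.h>
#include <uuid/uuid.h>

int main(void)
{
    uuid_t zero_gfid = {0};                 /* all-zero ("null") uuid: handled fine */
    unsigned char *missing_gfid = NULL;     /* illustrative stand-in for a NULL gfid pointer */

    printf("all-zero uuid -> uuid_is_null() = %d\n", uuid_is_null(zero_gfid));
    printf("passing a NULL pointer next; expect SIGSEGV, as in the core:\n");
    printf("%d\n", uuid_is_null(missing_gfid));   /* crashes here */
    return 0;
}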
As we don't have exact steps to reproduce, we don't know the cause of a NULL uuid or GFID. That said, handling this case properly is important; will send a patch with the necessary exception handling in the ctr code. Is there any chance of getting access to the brick logs?
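For illustration only, a hedged sketch of the kind of guard such a patch could add before touching the database. The helper name, signature, and record fields below are hypothetical; they are not the GlusterFS ctr API or the eventual patch:

/* ctr_guard_sketch.c - hypothetical sketch: refuse to delete a hard-link
 * record when either GFID is a NULL pointer or all-zero, instead of letting
 * the NULL reach uuid_is_null() or the database layer.
 * Build with: gcc ctr_guard_sketch.c -luuid
 */
#include <errno.h>
#include <stdio.h>
#include <uuid/uuid.h>

static int
delete_hard_link_record(const unsigned char *gfid,
                        const unsigned char *pargfid,
                        const char *basename)
{
    /* Guard the NULL-pointer case implied by the backtrace ... */
    if (!gfid || !pargfid || !basename)
        return -EINVAL;

    /* ... and the all-zero ("null") GFID case: log and skip, don't crash. */
    if (uuid_is_null(gfid) || uuid_is_null(pargfid))
        return -EINVAL;

    /* Real code would issue the DELETE against the ctr database here. */
    return 0;
}

int main(void)
{
    uuid_t gfid, pargfid;
    uuid_generate(gfid);
    uuid_generate(pargfid);

    printf("NULL gfid   -> %d\n", delete_hard_link_record(NULL, pargfid, "file"));
    printf("valid gfids -> %d\n", delete_hard_link_record(gfid, pargfid, "file"));
    return 0;
}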
Since I am not sure of the exact scenario which generated it, and unfortunately many tests were run on the same setup after the core was generated, I don't think I will be able to provide the exact brick logs. I will keep a watch on this issue and update the Bugzilla with details if I hit it again.
Since the issue was seen on an nfs-ganesha + tiering setup, and as we are no longer testing/supporting tiering with ganesha for 3.1.3, I guess we can move it out.
Targeting this BZ for 3.2.0.
Upstream mainline: http://review.gluster.org/14964
Upstream 3.8: http://review.gluster.org/15009

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
Tried to reproduce the issue on 3.1.3 by creating a scenario of continuous directory renames and triggering a graph switch by changing one of the performance options of the (tiered) volume, as advised by Nithya. That did not result in any crash. Followed the same steps on build 3.8.4-14 in a ganesha + tiered volume setup, and again did not hit any crash. I do, however, see plenty of the errors pasted below in 3.1.3 as well as 3.2:

[2017-02-28 09:59:28.840923] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:28.913263] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:28.913836] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:28.991517] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:28.991900] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:29.071659] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context
[2017-02-28 09:59:29.072078] E [MSGID: 121023] [changetimerecorder.c:842:ctr_rename_cbk] 0-vol_tier-changetimerecorder: Failed to getting GF_RESPONSE_LINK_COUNT_XDATA
[2017-02-28 09:59:29.145699] E [MSGID: 121022] [changetimerecorder.c:950:ctr_rename] 0-vol_tier-changetimerecorder: Failed updating hard link in ctr inode context

I have been unsuccessful in reproducing this issue, and hence cannot confidently confirm the fix that this BZ is addressing. Having said that, repeated testing of the above scenario has not resulted in any crash, so moving this to verified for now. Will reopen this BZ if QE ends up hitting this crash again.
Setup details below:

[root@dhcp46-111 ~]# gluster peer status
Number of Peers: 5

Hostname: dhcp46-115.lab.eng.blr.redhat.com
Uuid: 61964c73-d65d-45f5-8de6-2dfa1db76db7
State: Peer in Cluster (Connected)

Hostname: dhcp46-139.lab.eng.blr.redhat.com
Uuid: b0714c63-8dba-4922-9019-ac1ef9702076
State: Peer in Cluster (Connected)

Hostname: dhcp46-124.lab.eng.blr.redhat.com
Uuid: ffde978a-bb28-44ed-9c73-886d29d7fa19
State: Peer in Cluster (Connected)

Hostname: dhcp46-131.lab.eng.blr.redhat.com
Uuid: a0e14dcd-67ce-4b36-adeb-9b1be8e65b7f
State: Peer in Cluster (Connected)

Hostname: dhcp46-152.lab.eng.blr.redhat.com
Uuid: ce2bd89e-f047-4cd8-bd73-ec0c5a6d974c
State: Peer in Cluster (Connected)

[root@dhcp46-111 ~]# gluster v info vol_tier

Volume Name: vol_tier
Type: Tier
Volume ID: d544486f-c47e-420d-9b17-daad43058231
Status: Started
Snapshot Count: 0
Number of Bricks: 18
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick1: 10.70.46.152:/bricks/brick2/vol_tier_hot5
Brick2: 10.70.46.131:/bricks/brick2/vol_tier_hot4
Brick3: 10.70.46.124:/bricks/brick2/vol_tier_hot3
Brick4: 10.70.46.139:/bricks/brick2/vol_tier_hot2
Brick5: 10.70.46.115:/bricks/brick2/vol_tier_hot1
Brick6: 10.70.46.111:/bricks/brick2/vol_tier_hot0
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick7: 10.70.46.111:/bricks/brick0/vol_tier0
Brick8: 10.70.46.115:/bricks/brick0/vol_tier1
Brick9: 10.70.46.139:/bricks/brick0/vol_tier2
Brick10: 10.70.46.124:/bricks/brick0/vol_tier3
Brick11: 10.70.46.131:/bricks/brick0/vol_tier4
Brick12: 10.70.46.152:/bricks/brick0/vol_tier5
Brick13: 10.70.46.111:/bricks/brick1/vol_tier6
Brick14: 10.70.46.115:/bricks/brick1/vol_tier7
Brick15: 10.70.46.139:/bricks/brick1/vol_tier8
Brick16: 10.70.46.124:/bricks/brick1/vol_tier9
Brick17: 10.70.46.131:/bricks/brick1/vol_tier10
Brick18: 10.70.46.152:/bricks/brick1/vol_tier11
Options Reconfigured:
performance.client-io-threads: on
performance.stat-prefetch: on
ganesha.enable: on
features.cache-invalidation: on
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

[root@dhcp46-111 ~]# gluster v status vol_tier
Status of volume: vol_tier
Gluster process                                        TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.46.152:/bricks/brick2/vol_tier_hot5        49156     0          Y       29550
Brick 10.70.46.131:/bricks/brick2/vol_tier_hot4        49156     0          Y       26907
Brick 10.70.46.124:/bricks/brick2/vol_tier_hot3        49157     0          Y       21410
Brick 10.70.46.139:/bricks/brick2/vol_tier_hot2        49158     0          Y       12868
Brick 10.70.46.115:/bricks/brick2/vol_tier_hot1        49158     0          Y       18627
Brick 10.70.46.111:/bricks/brick2/vol_tier_hot0        49160     0          Y       20041
Cold Bricks:
Brick 10.70.46.111:/bricks/brick0/vol_tier0            49156     0          Y       18443
Brick 10.70.46.115:/bricks/brick0/vol_tier1            49156     0          Y       17060
Brick 10.70.46.139:/bricks/brick0/vol_tier2            49156     0          Y       11256
Brick 10.70.46.124:/bricks/brick0/vol_tier3            49155     0          Y       19670
Brick 10.70.46.131:/bricks/brick0/vol_tier4            49154     0          Y       26794
Brick 10.70.46.152:/bricks/brick0/vol_tier5            49154     0          Y       29438
Brick 10.70.46.111:/bricks/brick1/vol_tier6            49159     0          Y       18462
Brick 10.70.46.115:/bricks/brick1/vol_tier7            49157     0          Y       17079
Brick 10.70.46.139:/bricks/brick1/vol_tier8            49157     0          Y       11275
Brick 10.70.46.124:/bricks/brick1/vol_tier9            49156     0          Y       19720
Brick 10.70.46.131:/bricks/brick1/vol_tier10           49155     0          Y       26813
Brick 10.70.46.152:/bricks/brick1/vol_tier11           49155     0          Y       29457
Self-heal Daemon on localhost                          N/A       N/A        Y       20103
Self-heal Daemon on dhcp46-115.lab.eng.blr.redhat.com  N/A       N/A        Y       18730
Self-heal Daemon on dhcp46-139.lab.eng.blr.redhat.com  N/A       N/A        Y       12899
Self-heal Daemon on dhcp46-124.lab.eng.blr.redhat.com  N/A       N/A        Y       21430
Self-heal Daemon on dhcp46-131.lab.eng.blr.redhat.com  N/A       N/A        Y       26927
Self-heal Daemon on dhcp46-152.lab.eng.blr.redhat.com  N/A       N/A        Y       29570

Task Status of Volume vol_tier
------------------------------------------------------------------------------
Task                 : Tier migration
ID                   : 3be315db-1eca-4cdd-ae81-ad54442e69fc
Status               : in progress

[root@dhcp46-111 ~]# rpm -qa | grep gluster
glusterfs-3.8.4-14.el7rhgs.x86_64
glusterfs-cli-3.8.4-14.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-7.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-14.el7rhgs.x86_64
glusterfs-libs-3.8.4-14.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-14.el7rhgs.x86_64
glusterfs-fuse-3.8.4-14.el7rhgs.x86_64
glusterfs-server-3.8.4-14.el7rhgs.x86_64
python-gluster-3.8.4-14.el7rhgs.noarch
glusterfs-ganesha-3.8.4-14.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-14.el7rhgs.x86_64
glusterfs-api-3.8.4-14.el7rhgs.x86_64
glusterfs-devel-3.8.4-14.el7rhgs.x86_64
glusterfs-events-3.8.4-14.el7rhgs.x86_64
glusterfs-rdma-3.8.4-14.el7rhgs.x86_64

CLIENT LOGS
============
[root@dhcp35-153 mnt]# mount -t nfs -o vers=4 10.70.44.92:vol_tier /mnt/test
[root@dhcp35-153 mnt]# cd /mnt/test
[root@dhcp35-153 test]# ls -a
.  ..  .trashcan
[root@dhcp35-153 test]# df -k .
Filesystem            1K-blocks   Used Available Use% Mounted on
10.70.44.92:vol_tier  114554880 719872 113835008   1% /mnt/test
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mkdir -p dir$i/dir$j;done; done
[root@dhcp35-153 test]# ls -a
.  ..  dir1  dir10  dir2  dir3  dir4  dir5  dir6  dir7  dir8  dir9  .trashcan
[root@dhcp35-153 test]# ls dir1/dir
Display all 100 possibilities? (y or n)
[root@dhcp35-153 test]# ls dir1/dir
ls: cannot access dir1/dir: No such file or directory
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mv dir$i/dir$j dir$i/newdir$j;done; done
[root@dhcp35-153 test]# for i in {1..10}; do for j in {1..100}; do mv dir$i/newdir$j dir$i/olddir$j;done; done
[root@dhcp35-153 test]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html