Bug 1266079 - tiering: rename of file causes brick process crash
Summary: tiering: rename of file causes brick process crash
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Joseph Elwin Fernandes
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-24 12:27 UTC by Saravanakumar
Modified: 2016-06-20 00:01 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-02 12:52:58 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
core file generated (632.66 KB, application/x-bzip)
2015-09-24 13:00 UTC, Saravanakumar
core file during rename (606.08 KB, application/octet-stream)
2015-09-29 09:03 UTC, Saravanakumar
commands used while rename (12.68 KB, text/plain)
2015-10-14 09:54 UTC, Saravanakumar

Description Saravanakumar 2015-09-24 12:27:41 UTC
Description of problem:

Renaming a file that has been demoted causes glusterfsd (the brick process) to crash.

Version-Release number of selected component (if applicable):

git commit id 6838a5b342b40099d09ccdce6af8c6f769cccf39 in master.

How reproducible:
Ensure a file is demoted, then rename that file.
glusterfsd crashes.


Steps to Reproduce:
1. Set up a tier volume in a distributed-replicate configuration (cold tier: 3x2, hot tier: 2x2).
2. Demote a file (set a low value for cluster.tier-demote-frequency so that demotion happens quickly).
3. Rename the demoted file at the mount point.

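Actual results:

glusterfsd crashes with SIGSEGV in ctr_setxattr (changetimerecorder.c:1043); see the gdb backtrace under Additional info.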

Expected results:


Additional info:

gdb output from the generated core file:

[root@gfvm3 tierd_volume]# gdb /usr/local/sbin/glusterfsd  /core.26534
GNU gdb (GDB) Fedora 7.8.1-30.fc21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/glusterfsd...done.
[New LWP 29401]
[New LWP 26542]
[New LWP 26545]
[New LWP 26717]
[New LWP 26550]
[New LWP 27084]
[New LWP 26544]
[New LWP 26534]
[New LWP 26541]
[New LWP 26543]
[New LWP 27149]
[New LWP 26540]
[New LWP 26538]
[New LWP 26539]
[New LWP 27141]
[New LWP 26537]
[New LWP 26546]
[New LWP 26547]
[New LWP 26536]
[New LWP 26549]
[New LWP 26535]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/sbin/glusterfsd -s gfvm3 --volfile-id tiervol.gfvm3.opt-volume_test-'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fe6412718d7 in gf_print_trace (signum=11, ctx=0x1566010) at common-utils.c:582

warning: Source file is more recent than executable.
582	                        if (stack->type == GF_OP_TYPE_FOP)
Missing separate debuginfos, use: debuginfo-install glibc-2.20-8.fc21.x86_64 keyutils-libs-1.5.9-4.fc21.x86_64 krb5-libs-1.12.2-9.fc21.x86_64 libacl-2.2.52-7.fc21.x86_64 libaio-0.3.110-4.fc21.x86_64 libattr-2.4.47-9.fc21.x86_64 libcom_err-1.42.11-4.fc21.x86_64 libgcc-4.9.2-1.fc21.x86_64 libselinux-2.3-5.fc21.x86_64 libuuid-2.25.2-2.fc21.x86_64 openssl-libs-1.0.1k-1.fc21.x86_64 pcre-8.35-8.fc21.x86_64 sqlite-3.8.8.3-1.fc21.x86_64 sssd-client-1.12.2-2.fc21.x86_64 xz-libs-5.1.2-14alpha.fc21.x86_64 zlib-1.2.8-7.fc21.x86_64
(gdb) bt
#0  0x00007fe6412718d7 in gf_print_trace (signum=11, ctx=0x1566010) at common-utils.c:582
#1  0x00000000004095e5 in glusterfsd_print_trace (signum=11) at glusterfsd.c:2033
#2  <signal handler called>
#3  0x00007fe62f384d7e in ctr_setxattr (frame=0x7fe5f4000cfc, this=0x7fe63000bb60, loc=0x7fe60c007aec, xattr=0x7fe60c00447c, flags=0, xdata=0x0)
    at changetimerecorder.c:1043
#4  0x00007fe62ec8d54f in changelog_setxattr (frame=0x7fe5f4000bfc, this=0x7fe63000f210, loc=0x7fe60c007aec, dict=0x7fe60c00447c, flags=0, 
    xdata=0x0) at changelog.c:1491
#5  0x00007fe62ea73450 in br_stub_setxattr (frame=0x7fe5f4000bfc, this=0x7fe630010e10, loc=0x7fe60c007aec, dict=0x7fe60c00447c, flags=0, 
    xdata=0x0) at bit-rot-stub.c:1191
#6  0x00007fe62e8657cf in posix_acl_setxattr (frame=0x7fe5f400093c, this=0x7fe6300123f0, loc=0x7fe60c007aec, xattr=0x7fe60c00447c, flags=0, 
    xdata=0x0) at posix-acl.c:2026
#7  0x00007fe64126a959 in default_setxattr (frame=0x7fe5f400093c, this=0x7fe630013950, loc=0x7fe60c007aec, dict=0x7fe60c00447c, flags=0, 
    xdata=0x0) at defaults.c:1772
#8  0x00007fe64126a959 in default_setxattr (frame=0x7fe5f400093c, this=0x7fe630014d70, loc=0x7fe60c007aec, dict=0x7fe60c00447c, flags=0, 
    xdata=0x0) at defaults.c:1772
#9  0x00007fe6412653b5 in default_setxattr_resume (frame=0x7fe60c0022cc, this=0x7fe630016370, loc=0x7fe60c007aec, dict=0x7fe60c00447c, flags=0, 
    xdata=0x0) at defaults.c:1329
#10 0x00007fe641288779 in call_resume_wind (stub=0x7fe60c007aac) at call-stub.c:2139
#11 0x00007fe641290b26 in call_resume (stub=0x7fe60c007aac) at call-stub.c:2571
#12 0x00007fe62e21d4a0 in iot_worker (data=0x7fe630043030) at io-threads.c:210
#13 0x00007fe6400a852a in start_thread () from /lib64/libpthread.so.0
#14 0x00007fe63f9f722d in clone () from /lib64/libc.so.6
(gdb) 

# gluster volume info
 
Volume Name: tiervol
Type: Tier
Volume ID: 366c3435-a48e-4e60-919e-0f04e2efc322
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: gfvm3:/opt/volume_test/tier_vol/b5_2
Brick2: gfvm3:/opt/volume_test/tier_vol/b5_1
Brick3: gfvm3:/opt/volume_test/tier_vol/b4_2
Brick4: gfvm3:/opt/volume_test/tier_vol/b4_1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick5: gfvm3:/opt/volume_test/tier_vol/b1_1
Brick6: gfvm3:/opt/volume_test/tier_vol/b1_2
Brick7: gfvm3:/opt/volume_test/tier_vol/b2_1
Brick8: gfvm3:/opt/volume_test/tier_vol/b2_2
Brick9: gfvm3:/opt/volume_test/tier_vol/b3_1
Brick10: gfvm3:/opt/volume_test/tier_vol/b3_2
Options Reconfigured:
cluster.tier-promote-frequency: 1000
cluster.tier-demote-frequency: 10
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.ctr-enabled: on
performance.readdir-ahead: on
 
Volume Name: tv2
Type: Distribute
Volume ID: 6d7b2e3c-de7b-4b8e-9086-1661e2835c23
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfvm3:/opt/volume_test/tv_2/b1
Brick2: gfvm3:/opt/volume_test/tv_2/b2
Options Reconfigured:
performance.readdir-ahead: on

[root@gfvm3 tierd_volume]# gluster volume status 
Status of volume: tiervol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick gfvm3:/opt/volume_test/tier_vol/b5_2  N/A       N/A        N       N/A  
Brick gfvm3:/opt/volume_test/tier_vol/b5_1  N/A       N/A        N       N/A  
Brick gfvm3:/opt/volume_test/tier_vol/b4_2  49169     0          Y       26516
Brick gfvm3:/opt/volume_test/tier_vol/b4_1  49168     0          Y       26498
Cold Bricks:
Brick gfvm3:/opt/volume_test/tier_vol/b1_1  49162     0          Y       26345
Brick gfvm3:/opt/volume_test/tier_vol/b1_2  49163     0          Y       26363
Brick gfvm3:/opt/volume_test/tier_vol/b2_1  49164     0          Y       26381
Brick gfvm3:/opt/volume_test/tier_vol/b2_2  49165     0          Y       26399
Brick gfvm3:/opt/volume_test/tier_vol/b3_1  49166     0          Y       26417
Brick gfvm3:/opt/volume_test/tier_vol/b3_2  49167     0          Y       26435
NFS Server on localhost                     N/A       N/A        N       N/A  
 
Task Status of Volume tiervol
------------------------------------------------------------------------------
Task                 : Tier migration      
ID                   : 487cb989-ee28-48d5-8cbe-314ea501c1a6
Status               : in progress         
 
Status of volume: tv2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfvm3:/opt/volume_test/tv_2/b1        49172     0          Y       26606
Brick gfvm3:/opt/volume_test/tv_2/b2        49173     0          Y       26624
NFS Server on localhost                     N/A       N/A        N       N/A  
 
Task Status of Volume tv2
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@gfvm3 tierd_volume]# 

Also, geo-replication is set up between tiervol (master) and tv2 (slave).

Comment 1 Saravanakumar 2015-09-24 13:00:37 UTC
Created attachment 1076540
core file generated

Core file created during the rename.

Comment 2 Dan Lambright 2015-09-28 19:40:54 UTC
I was unable to reproduce this on my own machine using the same configuration.

Comment 3 Saravanakumar 2015-09-29 09:01:48 UTC
Hi Dan,
I observed another instance of the crash, this time when a *promoted* file was renamed.

I am attaching the core file as well.
According to the core, loc->inode is NULL.
Please check.


[root@gfvm3 glusterfs]# gdb /usr/local/sbin/glusterfsd /core.17252 
GNU gdb (GDB) Fedora 7.8.1-30.fc21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/glusterfsd...done.
[New LWP 18096]
[New LWP 17270]
[New LWP 17269]
[New LWP 17268]
[New LWP 17259]
[New LWP 17267]
[New LWP 17252]
[New LWP 17258]
[New LWP 17261]
[New LWP 17265]
[New LWP 19028]
[New LWP 17260]
[New LWP 17256]
[New LWP 17255]
[New LWP 17254]
[New LWP 17264]
[New LWP 17253]
[New LWP 17262]
[New LWP 17257]
[New LWP 17266]
[New LWP 17263]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/sbin/glusterfsd -s gfvm3 --volfile-id tiervol.gfvm3.opt-volume_test-'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fcbc2f5415e in ctr_setxattr (frame=0x7fcb9400131c, 
    this=0x7fcbbc00bb70, loc=0x7fcbac008a8c, xattr=0x7fcbac00509c, flags=0, 
    xdata=0x0) at changetimerecorder.c:1043
1043	        FILL_CTR_INODE_CONTEXT(_inode_cx, loc->inode->ia_type,
Missing separate debuginfos, use: debuginfo-install glibc-2.20-8.fc21.x86_64 keyutils-libs-1.5.9-4.fc21.x86_64 krb5-libs-1.12.2-9.fc21.x86_64 libacl-2.2.52-7.fc21.x86_64 libaio-0.3.110-4.fc21.x86_64 libattr-2.4.47-9.fc21.x86_64 libcom_err-1.42.11-4.fc21.x86_64 libgcc-4.9.2-1.fc21.x86_64 libselinux-2.3-5.fc21.x86_64 libuuid-2.25.2-2.fc21.x86_64 openssl-libs-1.0.1k-1.fc21.x86_64 pcre-8.35-8.fc21.x86_64 sqlite-3.8.8.3-1.fc21.x86_64 sssd-client-1.12.2-2.fc21.x86_64 xz-libs-5.1.2-14alpha.fc21.x86_64 zlib-1.2.8-7.fc21.x86_64
(gdb) bt
#0  0x00007fcbc2f5415e in ctr_setxattr (frame=0x7fcb9400131c, 
    this=0x7fcbbc00bb70, loc=0x7fcbac008a8c, xattr=0x7fcbac00509c, flags=0, 
    xdata=0x0) at changetimerecorder.c:1043
#1  0x00007fcbc285d9f8 in changelog_setxattr (frame=0x7fcb94000fdc, 
    this=0x7fcbbc00f220, loc=0x7fcbac008a8c, dict=0x7fcbac00509c, flags=0, 
    xdata=0x0) at changelog.c:1491
#2  0x00007fcbc2645700 in br_stub_setxattr (frame=0x7fcb94000fdc, 
    this=0x7fcbbc011010, loc=0x7fcbac008a8c, dict=0x7fcbac00509c, flags=0, 
    xdata=0x0) at bit-rot-stub.c:1191
#3  0x00007fcbc2438f0f in posix_acl_setxattr (frame=0x7fcb94001aec, 
    this=0x7fcbbc0125f0, loc=0x7fcbac008a8c, xattr=0x7fcbac00509c, flags=0, 
    xdata=0x0) at posix-acl.c:2026
#4  0x00007fcbd0ca1959 in default_setxattr (frame=0x7fcb94001aec, 
    this=0x7fcbbc013b50, loc=0x7fcbac008a8c, dict=0x7fcbac00509c, flags=0, 
    xdata=0x0) at defaults.c:1772
#5  0x00007fcbd0ca1959 in default_setxattr (frame=0x7fcb94001aec, 
    this=0x7fcbbc014f70, loc=0x7fcbac008a8c, dict=0x7fcbac00509c, flags=0, 
    xdata=0x0) at defaults.c:1772
#6  0x00007fcbd0c9c3b5 in default_setxattr_resume (frame=0x7fcbac00165c, 
    this=0x7fcbbc016570, loc=0x7fcbac008a8c, dict=0x7fcbac00509c, flags=0, 
    xdata=0x0) at defaults.c:1329
#7  0x00007fcbd0cbf779 in call_resume_wind (stub=0x7fcbac008a4c)
    at call-stub.c:2139
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) p loc
$1 = (loc_t *) 0x7fcbac008a8c
(gdb) p *loc
$2 = {path = 0x7fcbac007450 "/file999", name = 0x7fcbac007451 "file999", 
  inode = 0x0, parent = 0x7fcbbc07ef8c, 
  gfid = "\030\365\355[\202\aJ⻥\355\217\360Q\327w", 
  pargfid = '\000' <repeats 15 times>, "\001"}
(gdb) p _inode_ctx
No symbol "_inode_ctx" in current context.
(gdb) p _inode_cx
$3 = (gf_ctr_inode_context_t *) 0x7fcbaa7ebb30
(gdb) p (loc->inode)
$4 = (inode_t *) 0x0

Comment 4 Saravanakumar 2015-09-29 09:03:54 UTC
Created attachment 1078261
core file during rename

As mentioned in comment 3.

Comment 5 Saravanakumar 2015-10-14 09:54:54 UTC
Created attachment 1082763
commands used while rename

I have observed the glusterfsd crash again while carrying out a rename, producing the same core as the one already attached to this bug.

I am attaching the entire command log.
Please check whether it helps.

Comment 6 Nag Pavan Chilakam 2015-11-02 12:52:58 UTC
I have not seen this crash on rename in either the cold or the hot tier on the downstream build
glusterfs-server-3.7.5-5.el7rhgs.x86_64.

Hence closing this bug for now.

