+++ This bug was initially created as a clone of Bug #1451280 +++

Description of problem:
=======================
Had a 6-node cluster with the 3.8.4-23 build. Created a 1 x (4+2) EC volume and mounted it via FUSE. Created two files, 'test1' and 'test2', and corrupted both. The scrubber detected both files as corrupted. Updated the build to 3.8.4-25, restarted glusterd, and followed the file-recovery steps from the admin guide: 'test2' recovered successfully, but 'test1' failed with 'Input/output error' on the mountpoint. Volume status showed 2 brick processes down.

Version-Release number of selected component (if applicable):
=============================================================

How reproducible:
=================
1:1

Additional info:
================
[root@dhcp47-121 ~]# gluster peer status
Number of Peers: 5

Hostname: dhcp47-113.lab.eng.blr.redhat.com
Uuid: a0557927-4e5e-4ff7-8dce-94873f867707
State: Peer in Cluster (Connected)

Hostname: dhcp47-114.lab.eng.blr.redhat.com
Uuid: c0dac197-5a4d-4db7-b709-dbf8b8eb0896
State: Peer in Cluster (Connected)

Hostname: dhcp47-115.lab.eng.blr.redhat.com
Uuid: f828fdfa-e08f-4d12-85d8-2121cafcf9d0
State: Peer in Cluster (Connected)

Hostname: dhcp47-116.lab.eng.blr.redhat.com
Uuid: a96e0244-b5ce-4518-895c-8eb453c71ded
State: Peer in Cluster (Connected)

Hostname: dhcp47-117.lab.eng.blr.redhat.com
Uuid: 17eb3cef-17e7-4249-954b-fc19ec608304
State: Peer in Cluster (Connected)

[root@dhcp47-121 ~]# gluster v status disp2
Status of volume: disp2
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.121:/bricks/brick8/disp2_0               49154     0          Y       5552
Brick 10.70.47.113:/bricks/brick8/disp2_1               N/A       N/A        N       N/A
Brick 10.70.47.114:/bricks/brick8/disp2_2               49154     0          Y       30916
Brick 10.70.47.115:/bricks/brick8/disp2_3               49154     0          Y       23469
Brick 10.70.47.116:/bricks/brick8/disp2_4               49153     0          Y       27754
Brick 10.70.47.117:/bricks/brick8/disp2_5               N/A       N/A        N       N/A
Self-heal Daemon on localhost                           N/A       N/A        Y       5497
Bitrot Daemon on localhost                              N/A       N/A        Y       5515
Scrubber Daemon on localhost                            N/A       N/A        Y       5525
Self-heal Daemon on dhcp47-113.lab.eng.blr.redhat.com   N/A       N/A        Y       5893
Bitrot Daemon on dhcp47-113.lab.eng.blr.redhat.com      N/A       N/A        Y       5911
Scrubber Daemon on dhcp47-113.lab.eng.blr.redhat.com    N/A       N/A        Y       5921
Self-heal Daemon on dhcp47-114.lab.eng.blr.redhat.com   N/A       N/A        Y       30858
Bitrot Daemon on dhcp47-114.lab.eng.blr.redhat.com      N/A       N/A        Y       30876
Scrubber Daemon on dhcp47-114.lab.eng.blr.redhat.com    N/A       N/A        Y       30886
Self-heal Daemon on dhcp47-116.lab.eng.blr.redhat.com   N/A       N/A        Y       27708
Bitrot Daemon on dhcp47-116.lab.eng.blr.redhat.com      N/A       N/A        Y       27726
Scrubber Daemon on dhcp47-116.lab.eng.blr.redhat.com    N/A       N/A        Y       27736
Self-heal Daemon on dhcp47-117.lab.eng.blr.redhat.com   N/A       N/A        Y       9684
Bitrot Daemon on dhcp47-117.lab.eng.blr.redhat.com      N/A       N/A        Y       9702
Scrubber Daemon on dhcp47-117.lab.eng.blr.redhat.com    N/A       N/A        Y       9712
Self-heal Daemon on dhcp47-115.lab.eng.blr.redhat.com   N/A       N/A        Y       23411
Bitrot Daemon on dhcp47-115.lab.eng.blr.redhat.com      N/A       N/A        Y       23429
Scrubber Daemon on dhcp47-115.lab.eng.blr.redhat.com    N/A       N/A        Y       23439

Task Status of Volume disp2
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp47-121 ~]# gluster v info disp2

Volume Name: disp2
Type: Disperse
Volume ID: d7b0d170-f0e0-4e26-9369-f0a52dc92d38
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.47.121:/bricks/brick8/disp2_0
Brick2: 10.70.47.113:/bricks/brick8/disp2_1
Brick3: 10.70.47.114:/bricks/brick8/disp2_2
Brick4: 10.70.47.115:/bricks/brick8/disp2_3
Brick5: 10.70.47.116:/bricks/brick8/disp2_4
Brick6: 10.70.47.117:/bricks/brick8/disp2_5
Options Reconfigured:
performance.stat-prefetch: off
nfs.disable: on
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-freq: hourly
cluster.brick-multiplex: disable

[root@dhcp47-121 ~]# gluster v bitrot disp2 scrub status

Volume name : disp2
State of scrub: Active (In Progress)
Scrub impact: lazy
Scrub frequency: hourly
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log

=========================================================
Node: localhost
Number of Scrubbed files: 2
Number of Skipped files: 0
Last completed scrub time: 2017-05-16 09:35:12
Duration of last scrub (D:M:H:M:S): 0:0:0:7
Error count: 0

=========================================================
Node: dhcp47-114.lab.eng.blr.redhat.com
Number of Scrubbed files: 1
Number of Skipped files: 0
Last completed scrub time: 2017-05-16 09:35:12
Duration of last scrub (D:M:H:M:S): 0:0:0:7
Error count: 0

=========================================================
Node: dhcp47-116.lab.eng.blr.redhat.com
Number of Scrubbed files: 2
Number of Skipped files: 0
Last completed scrub time: 2017-05-16 09:35:14
Duration of last scrub (D:M:H:M:S): 0:0:0:7
Error count: 0

=========================================================
Node: dhcp47-113.lab.eng.blr.redhat.com
Number of Scrubbed files: 0
Number of Skipped files: 0
Last completed scrub time: 2017-05-16 08:35:24
Duration of last scrub (D:M:H:M:S): 0:0:0:7
Error count: 0

=========================================================
Node: dhcp47-115.lab.eng.blr.redhat.com
Number of Scrubbed files: 2
Number of Skipped files: 0
Last completed scrub time: 2017-05-16 09:35:11
Duration of last scrub (D:M:H:M:S): 0:0:0:7
Error count: 0

=========================================================
Node: dhcp47-117.lab.eng.blr.redhat.com
Number of Scrubbed files: 0
Number of Skipped files: 0
Last completed scrub time: 2017-05-16 08:35:23
Duration of last scrub (D:M:H:M:S): 0:0:0:7
Error count: 0
=========================================================

[root@dhcp47-121 ~]# gluster v heal disp2 info
Brick 10.70.47.121:/bricks/brick8/disp2_0
/d1/d2/d3/d4/test2
Status: Connected
Number of entries: 1

Brick 10.70.47.113:/bricks/brick8/disp2_1
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.47.114:/bricks/brick8/disp2_2
/d1/d2/d3/d4/test2
Status: Connected
Number of entries: 1

Brick 10.70.47.115:/bricks/brick8/disp2_3
/d1/d2/d3/d4/test2
Status: Connected
Number of entries: 1

Brick 10.70.47.116:/bricks/brick8/disp2_4
/d1/d2/d3/d4/test2
Status: Connected
Number of entries: 1

Brick 10.70.47.117:/bricks/brick8/disp2_5
Status: Transport endpoint is not connected
Number of entries: -
[root@dhcp47-121 ~]#

Brick log at the time of the failed open, followed by the crash dump:

[2017-05-16 08:54:10.160132] E [MSGID: 115070] [server-rpc-fops.c:1474:server_open_cbk] 0-disp2-server: 4619: OPEN /d1/d2/d3/d4/test2 (3673eecb-e5b5-4014-9bc6-a2fc007f08cb) ==> (Input/output error) [Input/output error]

pending frames:
frame : type(0) op(29)
frame : type(0) op(11)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-05-16 08:55:01
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f0e805201b2]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f0e80529bd4]
/lib64/libc.so.6(+0x35250)[0x7f0e7ec02250]
/usr/lib64/glusterfs/3.8.4/xlator/features/bitrot-stub.so(+0xadf4)[0x7f0e7174cdf4]
/usr/lib64/glusterfs/3.8.4/xlator/features/bitrot-stub.so(+0xde56)[0x7f0e7174fe56]
/usr/lib64/glusterfs/3.8.4/xlator/features/access-control.so(+0x5815)[0x7f0e71535815]
/usr/lib64/glusterfs/3.8.4/xlator/features/locks.so(+0x6dc8)[0x7f0e71312dc8]
/usr/lib64/glusterfs/3.8.4/xlator/features/worm.so(+0x7e59)[0x7f0e71106e59]
/usr/lib64/glusterfs/3.8.4/xlator/features/read-only.so(+0x4478)[0x7f0e70efb478]
/usr/lib64/glusterfs/3.8.4/xlator/features/leases.so(+0x50b4)[0x7f0e70ce70b4]
/usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so(+0xf143)[0x7f0e70ad7143]
/lib64/libglusterfs.so.0(default_open_resume+0x1c9)[0x7f0e805b1269]
/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f0e80542b25]
/usr/lib64/glusterfs/3.8.4/xlator/performance/io-threads.so(+0x4957)[0x7f0e708c1957]
/lib64/libpthread.so.0(+0x7dc5)[0x7f0e7f37fdc5]
/lib64/libc.so.6(clone+0x6d)[0x7f0e7ecc473d]

BT:
Program terminated with signal 11, Segmentation fault.
#0  list_add_tail (head=0x7f0e28001908, new=0x18) at ../../../../../libglusterfs/src/list.h:40
40              new->next = head;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libacl-2.2.51-12.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7_3.2.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 sqlite-3.7.17-8.el7.x86_64 sssd-client-1.14.0-43.el7_3.14.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  list_add_tail (head=0x7f0e28001908, new=0x18) at ../../../../../libglusterfs/src/list.h:40
#1  br_stub_add_fd_to_inode (this=this@entry=0x7f0e6c012440, fd=fd@entry=0x7f0e6c0a5050, ctx=ctx@entry=0x0) at bit-rot-stub.c:2398
#2  0x00007f0e7174fe56 in br_stub_open (frame=0x7f0e28000ca0, this=0x7f0e6c012440, loc=0x7f0e6c0ccf90, flags=2, fd=0x7f0e6c0a5050, xdata=0x0) at bit-rot-stub.c:2352
#3  0x00007f0e71535815 in posix_acl_open (frame=0x7f0e280014b0, this=0x7f0e6c013d70, loc=0x7f0e6c0ccf90, flags=2, fd=0x7f0e6c0a5050, xdata=0x0) at posix-acl.c:1129
#4  0x00007f0e71312dc8 in pl_open (frame=frame@entry=0x7f0e28000ac0, this=this@entry=0x7f0e6c015320, loc=loc@entry=0x7f0e6c0ccf90, flags=flags@entry=2, fd=fd@entry=0x7f0e6c0a5050, xdata=xdata@entry=0x0) at posix.c:1698
#5  0x00007f0e71106e59 in worm_open (frame=0x7f0e28000ac0, this=<optimized out>, loc=0x7f0e6c0ccf90, flags=2, fd=0x7f0e6c0a5050, xdata=0x0) at worm.c:43
#6  0x00007f0e70efb478 in ro_open (frame=0x7f0e28001740, this=0x7f0e6c018130, loc=0x7f0e6c0ccf90, flags=2, fd=0x7f0e6c0a5050, xdata=0x0) at read-only-common.c:341
#7  0x00007f0e70ce70b4 in leases_open (frame=0x7f0e28001b50, this=0x7f0e6c019880, loc=0x7f0e6c0ccf90, flags=2, fd=0x7f0e6c0a5050, xdata=0x0) at leases.c:75
#8  0x00007f0e70ad7143 in up_open (frame=0x7f0e28002250, this=0x7f0e6c01af20, loc=0x7f0e6c0ccf90, flags=2, fd=0x7f0e6c0a5050, xdata=0x0) at upcall.c:75
#9  0x00007f0e805b1269 in default_open_resume (frame=0x7f0e6c002020, this=0x7f0e6c01c690, loc=0x7f0e6c0ccf90, flags=2, fd=0x7f0e6c0a5050, xdata=0x0) at defaults.c:1726
#10 0x00007f0e80542b25 in call_resume (stub=0x7f0e6c0ccf40) at call-stub.c:2508
#11 0x00007f0e708c1957 in iot_worker (data=0x7f0e6c0550e0) at io-threads.c:220
#12 0x00007f0e7f37fdc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f0e7ecc473d in clone () from /lib64/libc.so.6
(gdb)
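The backtrace shows br_stub_open() passing ctx == 0x0 into br_stub_add_fd_to_inode(), so the list pointer handed to list_add_tail() is just a member offset inside a NULL context ("new=0x18"). Below is a minimal standalone C sketch of that failure mode; the struct layout and names are illustrative assumptions, not the actual bit-rot-stub definitions:

#include <stddef.h>
#include <stdio.h>

struct list_head {
        struct list_head *next;
        struct list_head *prev;
};

/* Hypothetical stand-in for the bit-rot stub inode context; the three
 * leading fields exist only to place fd_list at offset 0x18, matching
 * "new=0x18" in frame #0. */
struct br_stub_inode_ctx_demo {
        unsigned long bad_object;     /* offset 0x00 */
        unsigned long info_sign;      /* offset 0x08 */
        unsigned long version;        /* offset 0x10 */
        struct list_head fd_list;     /* offset 0x18 */
};

/* Insertion logic equivalent to list_add_tail() in libglusterfs/src/list.h */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
        new->next = head;             /* faults when new == (void *)0x18 */
        new->prev = head->prev;
        head->prev->next = new;
        head->prev = new;
}

int main(void)
{
        /* Context lookup "missed" and the fresh context was never assigned
         * back, so the local stays NULL -- the bug fixed by the patch below. */
        struct br_stub_inode_ctx_demo *ctx = NULL;
        struct list_head fd_entry = { &fd_entry, &fd_entry };

        printf("offsetof(fd_list) = %#zx\n",
               offsetof(struct br_stub_inode_ctx_demo, fd_list));

        list_add_tail(&ctx->fd_list, &fd_entry);   /* SIGSEGV, as in the BT */
        return 0;
}

Compiled and run, this segfaults on the first store in list_add_tail(), which is the same shape as frame #0 above.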
REVIEW: https://review.gluster.org/17357 (features/bitrot: Fix glusterfsd crash) posted (#1) for review on master by Kotresh HR (khiremat)
COMMIT: https://review.gluster.org/17357 committed in master by Atin Mukherjee (amukherj)

------

commit 6908e962f6293d38f0ee65c088247a66f2832e4a
Author: Kotresh HR <khiremat>
Date:   Mon May 22 08:47:07 2017 -0400

    features/bitrot: Fix glusterfsd crash

    Since object versioning is optional, the bitrot stub inode context
    is not always set. When the context is not found, it is initialized,
    but the newly initialized context was not assigned to the local
    variable used later in the function, leading to a brick (glusterfsd)
    crash. Fixed the same.

    Change-Id: I0dab6435cdfe16a8c7f6a31ffec1a370822597a8
    BUG: 1454317
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: https://review.gluster.org/17357
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
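In other words, this is a classic lookup-or-create bug: the freshly created context never made it into the local variable the rest of the function dereferences. A hypothetical, self-contained sketch of that pattern and its fix follows; the demo_* and *_ctx names are assumptions for illustration, not the real bit-rot-stub API:

#include <stdio.h>
#include <stdlib.h>

struct demo_ctx { int fd_count; };

/* Versioning is optional, so an inode may not have a context yet. */
struct demo_inode { struct demo_ctx *ctx; };

static struct demo_ctx *lookup_ctx(struct demo_inode *in) { return in->ctx; }

static struct demo_ctx *init_ctx(struct demo_inode *in)
{
        in->ctx = calloc(1, sizeof(*in->ctx));
        return in->ctx;
}

static int add_fd_to_inode(struct demo_inode *in)
{
        struct demo_ctx *ctx = lookup_ctx(in);

        if (!ctx) {
                /* Buggy shape: calling init_ctx(in) and discarding the
                 * result leaves ctx NULL for the dereference below.
                 * Fixed shape: assign the new context to the local. */
                ctx = init_ctx(in);
                if (!ctx)
                        return -1;
        }

        ctx->fd_count++;   /* safe only because ctx was assigned above */
        return 0;
}

int main(void)
{
        struct demo_inode in = { NULL };   /* context not yet set */
        if (add_fd_to_inode(&in) == 0)
                printf("fd_count = %d\n", in.ctx->fd_count);
        free(in.ctx);
        return 0;
}

With the buggy variant, ctx stays NULL and ctx->fd_count++ dereferences NULL, matching the glusterfsd crash captured in the backtrace above.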
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/