Bug 1452936

Summary: tcmu-runner crashes when we create 10 blocks and delete them in a loop
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Pranith Kumar K <pkarampu>
Component: tcmu-runner
Assignee: Prasanna Kumar Kalever <prasanna.kalever>
Status: CLOSED ERRATA
QA Contact: Sweta Anandpara <sanandpa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: amukherj, rhs-bugs, sanandpa, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tcmu-runner-1.2.0-3.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-21 04:19:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1417151

Description Pranith Kumar K 2017-05-20 16:27:29 UTC
Description of problem:
[root@localhost tcmu-runner]# ./tcmu-runner 
=================================================================
==1257==ERROR: AddressSanitizer: heap-use-after-free on address 0x60400000b6d0 at pc 0x7f71ec9f7025 bp 0x7ffd11b4e250 sp 0x7ffd11b4e240
WRITE of size 48 at 0x60400000b6d0 thread T0
    #0 0x7f71ec9f7024 in gluster_cache_add /root/tcmu-runner/glfs.c:187
    #1 0x7f71ec9f8686 in tcmu_create_glfs_object /root/tcmu-runner/glfs.c:376
    #2 0x7f71ec9f8b72 in glfs_check_config /root/tcmu-runner/glfs.c:451
    #3 0x411aa9 in on_check_config /root/tcmu-runner/main.c:198
    #4 0x7f71f1667c57 in ffi_call_unix64 (/lib64/libffi.so.6+0x5c57)
    #5 0x7f71f16676b9 in ffi_call (/lib64/libffi.so.6+0x56b9)
    #6 0x7f71f3461c1d in g_cclosure_marshal_generic (/lib64/libgobject-2.0.so.0+0xfc1d)
    #7 0x7f71f34613e4 in g_closure_invoke (/lib64/libgobject-2.0.so.0+0xf3e4)
    #8 0x7f71f3473431  (/lib64/libgobject-2.0.so.0+0x21431)
    #9 0x7f71f347b240 in g_signal_emitv (/lib64/libgobject-2.0.so.0+0x29240)
    #10 0x416844 in _tcmuservice1_skeleton_handle_method_call /root/tcmu-runner/tcmuhandler-generated.c:1029
    #11 0x7f71f3775e86  (/lib64/libgio-2.0.so.0+0xd1e86)
    #12 0x7f71f375e28b  (/lib64/libgio-2.0.so.0+0xba28b)
    #13 0x7f71f31858e6  (/lib64/libglib-2.0.so.0+0x468e6)
    #14 0x7f71f3188e51 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x49e51)
    #15 0x7f71f31891cf  (/lib64/libglib-2.0.so.0+0x4a1cf)
    #16 0x7f71f31894f1 in g_main_loop_run (/lib64/libglib-2.0.so.0+0x4a4f1)
    #17 0x414019 in main /root/tcmu-runner/main.c:848
    #18 0x7f71f233a400 in __libc_start_main (/lib64/libc.so.6+0x20400)
    #19 0x4079f9 in _start (/root/tcmu-runner/tcmu-runner+0x4079f9)

0x60400000b6d0 is located 0 bytes inside of 48-byte region [0x60400000b6d0,0x60400000b700)
freed by thread T21 here:
    #0 0x7f71f3d34b00 in free (/lib64/libasan.so.3+0xc6b00)
    #1 0x7f71ec9f78dc in gluster_cache_refresh /root/tcmu-runner/glfs.c:256
    #2 0x7f71ec9f93e9 in tcmu_glfs_close /root/tcmu-runner/glfs.c:564
    #3 0x412f27 in cmdproc_thread_cleanup /root/tcmu-runner/main.c:531
    #4 0x41305e in tcmur_cmdproc_thread /root/tcmu-runner/main.c:541
    #5 0x7f71f2f286c9 in start_thread (/lib64/libpthread.so.0+0x76c9)

previously allocated by thread T0 here:
    #0 0x7f71f3d35210 in realloc (/lib64/libasan.so.3+0xc7210)
    #1 0x7f71ec9f6f94 in gluster_cache_add /root/tcmu-runner/glfs.c:187
    #2 0x7f71ec9f8686 in tcmu_create_glfs_object /root/tcmu-runner/glfs.c:376
    #3 0x7f71ec9f8b72 in glfs_check_config /root/tcmu-runner/glfs.c:451
    #4 0x411aa9 in on_check_config /root/tcmu-runner/main.c:198
    #5 0x7f71f1667c57 in ffi_call_unix64 (/lib64/libffi.so.6+0x5c57)
    #6 0x7ffd11b4f66f  (<unknown module>)

Thread T21 created by T0 here:
    #0 0x7f71f3c9f488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
    #1 0x7f71f3a4f86a in tcmulib_start_cmdproc_thread /root/tcmu-runner/libtcmu.c:732
    #2 0x4134dd in dev_added /root/tcmu-runner/main.c:631
    #3 0x7f71f3a4c628 in add_device /root/tcmu-runner/libtcmu.c:298
    #4 0x7f71f3a4b545 in handle_netlink /root/tcmu-runner/libtcmu.c:69
    #5 0x7f71f26e3634  (/lib64/libnl-genl-3.so.200+0x3634)
    #6 0x7f71f28f7a7b in nl_recvmsgs_report (/lib64/libnl-3.so.200+0x11a7b)

SUMMARY: AddressSanitizer: heap-use-after-free /root/tcmu-runner/glfs.c:187 in gluster_cache_add
Shadow bytes around the buggy address:
  0x0c087fff9680: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 06
  0x0c087fff9690: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fd
  0x0c087fff96a0: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 07
  0x0c087fff96b0: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 01
  0x0c087fff96c0: fa fa 00 00 00 00 00 04 fa fa 00 00 00 00 00 00
=>0x0c087fff96d0: fa fa 00 00 00 00 00 00 fa fa[fd]fd fd fd fd fd
  0x0c087fff96e0: fa fa 00 00 00 00 00 00 fa fa fd fd fd fd fd fa
  0x0c087fff96f0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c087fff9700: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c087fff9710: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c087fff9720: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

Created a replica 3 volume:

Create 10 blocks
for i in {1..10}; do gluster-block create r3/$i ha 3 192.168.122.61,192.168.122.123,192.168.122.113 1GiB ; done

Delete them:
for i in {1..10}; do gluster-block delete r3/$i ; done

Create them again (same create loop as above):
tcmu-runner crashed with the AddressSanitizer trace above.
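
Reading the trace, the write at glfs.c:187 (gluster_cache_add, main thread) lands in memory that gluster_cache_refresh freed from the cmdproc cleanup thread (glfs.c:256), i.e. the shared gluster cache is mutated from two threads without serialization. The following is a minimal, hypothetical C sketch of that pattern; the names and layout are illustrative only and are not the actual glfs.c code.

/* Hypothetical sketch of the reported race, NOT the actual glfs.c code:
 * the main thread grows a shared cache and writes the new slot, while a
 * cmdproc cleanup thread frees the same cache; if the free interleaves
 * between the realloc and the write, the write is a heap-use-after-free. */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct cache_entry {
    char volname[32];                /* stands in for the cached volume data */
};

static struct cache_entry *cache;    /* shared, no lock */
static size_t cache_len;

/* main thread (check_config path): append an entry to the cache */
static void cache_add(const char *volname)
{
    cache = realloc(cache, (cache_len + 1) * sizeof(*cache));
    struct cache_entry *slot = &cache[cache_len];
    /* <-- if the cleanup thread frees `cache` at this point, the writes
     *     below touch freed memory, which is what ASan flags at glfs.c:187 */
    strncpy(slot->volname, volname, sizeof(slot->volname) - 1);
    slot->volname[sizeof(slot->volname) - 1] = '\0';
    cache_len++;
}

/* cmdproc cleanup thread (device close path): drop the whole cache */
static void *cache_refresh(void *arg)
{
    (void)arg;
    free(cache);                     /* frees memory the main thread may still be writing */
    cache = NULL;
    cache_len = 0;
    return NULL;
}

int main(void)
{
    pthread_t t;
    cache_add("block1");                           /* create path  */
    pthread_create(&t, NULL, cache_refresh, NULL); /* delete/close path */
    pthread_join(&t, NULL);
    cache_add("block2");                           /* re-create path */
    return 0;
}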


Comment 2 Prasanna Kumar Kalever 2017-05-21 10:28:28 UTC
Related patch:
https://github.com/open-iscsi/tcmu-runner/pull/158
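
For context, a common way to close this class of race is to serialize every mutation of the shared cache behind one mutex, so an add on the main thread can never interleave with a free on a cmdproc cleanup thread. The sketch below is only a generic illustration of that idea with hypothetical names; refer to the pull request above for the actual change.

/* Generic illustration only (hypothetical names, not the actual patch):
 * guard the shared cache with a single mutex so add and free/refresh
 * cannot run concurrently. */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct cache_entry { char volname[32]; };

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cache_entry *cache;
static size_t cache_len;

void cache_add_locked(const char *volname)
{
    pthread_mutex_lock(&cache_lock);
    struct cache_entry *tmp = realloc(cache, (cache_len + 1) * sizeof(*cache));
    if (tmp) {
        cache = tmp;
        strncpy(cache[cache_len].volname, volname,
                sizeof(cache[cache_len].volname) - 1);
        cache[cache_len].volname[sizeof(cache[cache_len].volname) - 1] = '\0';
        cache_len++;
    }
    pthread_mutex_unlock(&cache_lock);
}

void cache_refresh_locked(void)
{
    pthread_mutex_lock(&cache_lock);
    free(cache);
    cache = NULL;
    cache_len = 0;
    pthread_mutex_unlock(&cache_lock);
}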

Comment 6 Sweta Anandpara 2017-07-17 10:48:39 UTC
Tested and verified this on the build glusterfs-3.8.4-33 and gluster-block-0.2.1-6. 

Created more than 30 blocks in loops and deleted them, both one by one and while another create was in progress. The blocks in question were created and deleted successfully, with no errors or crashes seen in the logs.

Moving this bug to verified in 3.3.

Comment 7 Sweta Anandpara 2017-07-17 10:52:13 UTC

[root@dhcp47-116 ~]# for i in {14..50}; do gluster-block create nash/nb$i ha 3 auth enable 10.70.47.115,10.70.47.116,10.70.47.117 1M; done 
IQN: iqn.2016-12.org.gluster-block:97bc7144-8060-4b75-8a18-5f10161d2036
USERNAME: 97bc7144-8060-4b75-8a18-5f10161d2036
PASSWORD: 052a6474-7d39-4034-b817-bc1178bf356d
PORTAL(S):  10.70.47.115:3260 10.70.47.116:3260 10.70.47.117:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:114cedc6-bdae-4974-b6a3-5d1a501be321
USERNAME: 114cedc6-bdae-4974-b6a3-5d1a501be321
PASSWORD: 99ef60e1-d6c9-4880-87c5-7bb5c96f9a15
PORTAL(S):  10.70.47.115:3260 10.70.47.116:3260 10.70.47.117:3260
RESULT: SUCCESS
...
...
...
IQN: iqn.2016-12.org.gluster-block:e9b59264-9129-4ee6-a4cb-eb845cdd0231
USERNAME: e9b59264-9129-4ee6-a4cb-eb845cdd0231
PASSWORD: b2f30d1e-78de-4a28-9e44-5e224ef8d51b
PORTAL(S):  10.70.47.115:3260 10.70.47.116:3260 10.70.47.117:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:72dd48a2-98d9-47f5-b822-265fb2c05100
USERNAME: 72dd48a2-98d9-47f5-b822-265fb2c05100
PASSWORD: 877000bd-5162-4eae-9881-223e0806e2fa
PORTAL(S):  10.70.47.115:3260 10.70.47.116:3260 10.70.47.117:3260
RESULT: SUCCESS
[root@dhcp47-116 ~]# 



[root@dhcp47-115 ~]# for i in {1..20}; do gluster-block delete nash/nb$i; done
SUCCESSFUL ON:   10.70.47.115
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.116 10.70.47.117 10.70.47.115
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.115
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.115
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.116 10.70.47.117
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.116 10.70.47.117
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.116 10.70.47.117
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.117 10.70.47.116
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.117 10.70.47.116
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.115 10.70.47.117 10.70.47.116
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.116 10.70.47.117 10.70.47.115
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.116 10.70.47.117
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.115
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.117 10.70.47.115 10.70.47.116
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.117 10.70.47.115 10.70.47.116
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.115 10.70.47.116 10.70.47.117
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.117 10.70.47.115 10.70.47.116
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.115 10.70.47.117 10.70.47.116
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.115 10.70.47.117 10.70.47.116
RESULT: SUCCESS
SUCCESSFUL ON:   10.70.47.116 10.70.47.115 10.70.47.117
RESULT: SUCCESS
[root@dhcp47-115 ~]# 


ENVIRONMENT
===========

[root@dhcp47-116 ~]# 
[root@dhcp47-116 ~]# gluster pool list
UUID					Hostname                         	State
49610061-1788-4cbc-9205-0e59fe91d842	dhcp47-121.lab.eng.blr.redhat.com	Connected 
a0557927-4e5e-4ff7-8dce-94873f867707	dhcp47-113.lab.eng.blr.redhat.com	Connected 
c0dac197-5a4d-4db7-b709-dbf8b8eb0896	dhcp47-114.lab.eng.blr.redhat.com	Connected 
f828fdfa-e08f-4d12-85d8-2121cafcf9d0	dhcp47-115.lab.eng.blr.redhat.com	Connected 
17eb3cef-17e7-4249-954b-fc19ec608304	dhcp47-117.lab.eng.blr.redhat.com	Connected 
a96e0244-b5ce-4518-895c-8eb453c71ded	localhost                        	Connected 
[root@dhcp47-116 ~]# 
[root@dhcp47-116 ~]# rpm -qa | grep gluster
glusterfs-cli-3.8.4-33.el7rhgs.x86_64
glusterfs-rdma-3.8.4-33.el7rhgs.x86_64
python-gluster-3.8.4-33.el7rhgs.noarch
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-33.el7rhgs.x86_64
glusterfs-fuse-3.8.4-33.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-events-3.8.4-33.el7rhgs.x86_64
gluster-block-0.2.1-6.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-3.el7rhgs.x86_64
glusterfs-3.8.4-33.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-26.el7rhgs.x86_64
glusterfs-api-3.8.4-33.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-33.el7rhgs.x86_64
glusterfs-libs-3.8.4-33.el7rhgs.x86_64
glusterfs-server-3.8.4-33.el7rhgs.x86_64
[root@dhcp47-116 ~]# 
[root@dhcp47-116 ~]# gluster v info nash
 
Volume Name: nash
Type: Replicate
Volume ID: f1ea3d3e-c536-4f36-b61f-cb9761b8a0a6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.115:/bricks/brick4/nash0
Brick2: 10.70.47.116:/bricks/brick4/nash1
Brick3: 10.70.47.117:/bricks/brick4/nash2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.open-behind: off
performance.readdir-ahead: off
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
server.allow-insecure: on
cluster.brick-multiplex: disable
cluster.enable-shared-storage: enable
[root@dhcp47-116 ~]# 
[root@dhcp47-116 ~]# gluster-block list nash
nb21
nb22
nb23
nb24
nb25
nb26
nb27
nb28
nb29
nb30
nb31
nb32
nb33
nb34
nb35
nb36
nb37
nb38
nb39
nb40
nb41
nb42
nb43
nb44
nb45
nb46
nb47
nb48
nb49
nb50
[root@dhcp47-116 ~]# 


Comment 10 errata-xmlrpc 2017-09-21 04:19:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2773