Description of problem:
**********************
To verify the service dependencies in a block storage setup, I found that starting gluster-blockd also starts tcmu-runner and rpcbind, which is as expected. However, stopping gluster-blockd does not stop tcmu-runner and rpcbind, so I stopped them manually. After stopping all of them, starting gluster-blockd fails because it is not able to start tcmu-runner, and there are multiple crashes.

I could figure out that stopping rpcbind stops glusterd as well, so gluster-blockd fails to start because gluster is down, but that should not crash tcmu-runner. The same could happen when the volume is stopped and we try to start gluster-blockd.

Snippet from log:

May 4 11:37:18 dhcp46-65 journal: tcmu-runner#012: tcmu_create_glfs_object:377 : glfs_init failed: Transport endpoint is not connected
May 4 11:37:19 dhcp46-65 journal: tcmu-runner#012: glfs_check_config:426 : tcmu_create_glfs_object failed
May 4 11:37:19 dhcp46-65 tcmu-runner: *** Error in `/usr/bin/tcmu-runner': double free or corruption (out): 0x00000000021c2b10 ***
May 4 11:37:19 dhcp46-65 tcmu-runner: ======= Backtrace: =========
May 4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libc.so.6(+0x7c503)[0x7f29b2029503]
May 4 11:37:19 dhcp46-65 tcmu-runner: /usr/lib64/tcmu-runner/handler_glfs.so(+0x1361)[0x7f29b09a8361]
May 4 11:37:19 dhcp46-65 tcmu-runner: /usr/lib64/tcmu-runner/handler_glfs.so(+0x1e93)[0x7f29b09a8e93]
May 4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libtcmu.so.1(+0x9475)[0x7f29b36db475]
May 4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libtcmu.so.1(+0x9a8d)[0x7f29b36dba8d]
May 4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libtcmu.so.1(tcmulib_initialize+0x1f8)[0x7f29b36dc058]
May 4 11:37:19 dhcp46-65 tcmu-runner: /usr/bin/tcmu-runner[0x4060b0]
May 4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f29b1fceb35]
May 4 11:37:19 dhcp46-65 tcmu-runner: /usr/bin/tcmu-runner[0x40621c]
May 4 11:37:19 dhcp46-65 tcmu-runner: ======= Memory map: ========
May 4 11:37:19 dhcp46-65 tcmu-runner: 00400000-0040f000 r-xp 00000000 fd:00 784294 /usr/bin/tcmu-runner
May 4 11:37:19 dhcp46-65 tcmu-runner: 0060e000-0060f000 r--p 0000e000 fd:00 784294 /usr/bin/tcmu-runner
May 4 11:37:19 dhcp46-65 tcmu-runner: 0060f000-00610000 rw-p 0000f000 fd:00 784294 /usr/bin/tcmu-runner
May 4 11:37:19 dhcp46-65 tcmu-runner: 021b1000-0228c000 rw-p 00000000 00:00 0 [heap]
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f2994000000-7f2994021000 rw-p 00000000 00:00 0
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f2994021000-7f2998000000 ---p 00000000 00:00 0
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f299c000000-7f299c021000 rw-p 00000000 00:00 0
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f299c021000-7f29a0000000 ---p 00000000 00:00 0
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a17f8000-7f29a17f9000 ---p 00000000 00:00 0
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a17f9000-7f29a27fa000 rw-p 00000000 00:00 0 [stack:23957]
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a27fa000-7f29a2802000 r-xp 00000000 fd:00 34037463 /usr/lib64/libnss_sss.so.2
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2802000-7f29a2a01000 ---p 00008000 fd:00 34037463 /usr/lib64/libnss_sss.so.2
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2a01000-7f29a2a02000 r--p 00007000 fd:00 34037463 /usr/lib64/libnss_sss.so.2
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2a02000-7f29a2a03000 rw-p 00008000 fd:00 34037463 /usr/lib64/libnss_sss.so.2
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2a03000-7f29a2a0f000 r-xp 00000000 fd:00 33569228 /usr/lib64/libnss_files-2.17.so
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2a0f000-7f29a2c0e000 ---p 0000c000 fd:00 33569228 /usr/lib64/libnss_files-2.17.so
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2c0e000-7f29a2c0f000 r--p 0000b000 fd:00 33569228 /usr/lib64/libnss_files-2.17.so
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2c0f000-7f29a2c10000 rw-p 0000c000 fd:00 33569228 /usr/lib64/libnss_files-2.17.so
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2c10000-7f29a2c16000 rw-p 00000000 00:00 0
May 4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2c16000-7f29a2c17000 ---p 00000000 00:00 0

Version-Release number of selected component (if applicable):
rpm -qa | grep gluster-block
gluster-block-0.1.1-1.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Start gluster-blockd and verify that tcmu-runner and rpcbind get started.
2. Stop gluster-blockd and verify whether tcmu-runner and rpcbind are still running.
3. Stop rpcbind and tcmu-runner.
4. Start gluster-blockd.

Actual results:
***************
gluster-blockd fails to start and tcmu-runner crashes multiple times.

Expected results:
*****************
tcmu-runner should not crash.

Additional info:
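The reproduction steps above could be scripted roughly as follows. This is only a sketch: the unit names gluster-blockd, tcmu-runner and rpcbind come from this report, but driving them through systemctl is an assumption, and the helper names reproduction_steps and run_steps are made up for illustration.

```python
import subprocess

# Unit names are taken from the report; using systemctl to manage them is an assumption.
DEPENDENCIES = ("tcmu-runner", "rpcbind")


def reproduction_steps():
    """Return the reproduction steps as (description, argv) pairs."""
    check = ["systemctl", "is-active", *DEPENDENCIES]
    return [
        ("start gluster-blockd", ["systemctl", "start", "gluster-blockd"]),
        ("check that the dependencies came up", check),
        ("stop gluster-blockd", ["systemctl", "stop", "gluster-blockd"]),
        ("check whether the dependencies are still running", check),
        ("stop rpcbind and tcmu-runner", ["systemctl", "stop", "rpcbind", "tcmu-runner"]),
        ("start gluster-blockd again", ["systemctl", "start", "gluster-blockd"]),
    ]


def run_steps(runner=subprocess.run):
    """Run every step in order and collect (description, exit code, stdout).

    A non-zero exit code is recorded rather than raised, so the final
    'start gluster-blockd again' step can fail without aborting the run.
    """
    results = []
    for description, argv in reproduction_steps():
        completed = runner(argv, capture_output=True, text=True)
        results.append((description, completed.returncode, completed.stdout.strip()))
    return results
```

Calling run_steps() as root on a test node and printing the collected tuples shows where the sequence fails. For the two check steps, the captured output lists one state per unit and is more useful than the exit code, because is-active exits with 0 as soon as one of the queried units is active.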
Here is the patch: https://github.com/open-iscsi/tcmu-runner/pull/144/commits/cc42f9ad5aa25b0efaa1f19c2d397d23358967b5
With the latest builds, I still see this crash after following the exact steps provided in the description.

May 30 09:36:02 dhcp46-151 : tcmu_create_glfs_object:405 : glfs_init failed: Success
May 30 09:36:02 dhcp46-151 tcmu-runner: 2017-05-30 09:36:02.917 16886 [ERROR] tcmu_create_glfs_object:405 : glfs_init failed: Success
May 30 09:36:03 dhcp46-151 tcmu-runner: *** Error in `/usr/bin/tcmu-runner': double free or corruption (out): 0x0000000002360590 ***
May 30 09:36:03 dhcp46-151 tcmu-runner: ======= Backtrace: =========
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libc.so.6(+0x7c503)[0x7f15d9fac503]
May 30 09:36:03 dhcp46-151 tcmu-runner: /usr/lib64/tcmu-runner/handler_glfs.so(+0x1754)[0x7f15d8b41754]
May 30 09:36:03 dhcp46-151 tcmu-runner: /usr/lib64/tcmu-runner/handler_glfs.so(+0x23a7)[0x7f15d8b423a7]
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libtcmu.so.1(+0x988d)[0x7f15db87488d]
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libtcmu.so.1(+0x9ddf)[0x7f15db874ddf]
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libtcmu.so.1(tcmulib_initialize+0x1f8)[0x7f15db875358]
May 30 09:36:03 dhcp46-151 tcmu-runner: /usr/bin/tcmu-runner[0x407780]
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f15d9f51b35]
May 30 09:36:03 dhcp46-151 tcmu-runner: /usr/bin/tcmu-runner[0x407927]
May 30 09:36:03 dhcp46-151 tcmu-runner: ======= Memory map: ========
May 30 09:36:03 dhcp46-151 tcmu-runner: 00400000-00415000 r-xp 00000000 fd:00 780917 /usr/bin/tcmu-runner
May 30 09:36:03 dhcp46-151 tcmu-runner: 00615000-00616000 r--p 00015000 fd:00 780917 /usr/bin/tcmu-runner
May 30 09:36:03 dhcp46-151 tcmu-runner: 00616000-00617000 rw-p 00016000 fd:00 780917 /usr/bin/tcmu-runner
May 30 09:36:03 dhcp46-151 tcmu-runner: 02351000-0242b000 rw-p 00000000 00:00 0 [heap]
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15bc000000-7f15bc021000 rw-p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15bc021000-7f15c0000000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c0000000-7f15c0021000 rw-p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c0021000-7f15c4000000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c4000000-7f15c4021000 rw-p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c4021000-7f15c8000000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c9991000-7f15c9992000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c9992000-7f15ca993000 rw-p 00000000 00:00 0 [stack:16895]
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15ca993000-7f15ca994000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15ca994000-7f15cb194000 rw-p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15cb194000-7f15cb19c000 r-xp 00000000 fd:00 34106719 /usr/lib64/libnss_sss.so.2

Please have a look.
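To compare a backtrace like this one with the one in the description, the frames can be pulled out of the syslog lines with a small parser. The following is a rough sketch, assuming the log lines have the shape shown in this report; the names FRAME_RE, ABORT_MARKER and parse_crash are hypothetical helpers, not part of tcmu-runner or gluster-block.

```python
import re

# Matches glibc backtrace frames such as
#   /lib64/libtcmu.so.1(tcmulib_initialize+0x1f8)[0x7f15db875358]
# The "(symbol+offset)" part is optional, as in "/usr/bin/tcmu-runner[0x407780]".
FRAME_RE = re.compile(
    r"(?P<module>/\S+?)(?:\((?P<symbol>[^)]*)\))?\[(?P<address>0x[0-9a-f]+)\]"
)
ABORT_MARKER = "double free or corruption"


def parse_crash(lines):
    """Return (aborted, frames) for an iterable of syslog lines.

    aborted is True if the glibc abort message was seen; frames is a list of
    (module, symbol_or_offset, address) tuples taken from the lines between
    the 'Backtrace' and 'Memory map' banners.
    """
    aborted = False
    in_backtrace = False
    frames = []
    for line in lines:
        if ABORT_MARKER in line:
            aborted = True
        elif "======= Backtrace:" in line:
            in_backtrace = True
        elif "======= Memory map:" in line:
            in_backtrace = False
        elif in_backtrace:
            match = FRAME_RE.search(line)
            if match:
                frames.append(
                    (match["module"], match["symbol"] or "", match["address"])
                )
    return aborted, frames
```

Feeding both log excerpts through parse_crash and comparing only the module column shows the same call chain in both crashes: libc, two frames in handler_glfs.so, then libtcmu.so.1 up to tcmulib_initialize. The offsets differ because the builds differ.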
Surabhi, can you please paste the version numbers of gluster-block and tcmu-runner that you used to verify this bug?
rpm -qa | grep gluster-block
gluster-block-debuginfo-0.2-3.el7rhgs.x86_64
gluster-block-0.2-3.el7rhgs.x86_64

rpm -qa | grep tcmu-runner
tcmu-runner-1.2.0-3.el7rhgs.x86_64
tcmu-runner-debuginfo-1.2.0-3.el7rhgs.x86_64
After upgrading to the latest build and following the steps above, no tcmu-runner crash is seen. The only errors in the log messages are about glfs_init failing, which happens because glusterd is not running once rpcbind is stopped. Works as expected. Marking the BZ verified.

gluster-block-0.2.1-1.el7rhgs.x86_64
tcmu-runner-1.2.0-4.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2773