Bug 1448433 - [gluster-block]:tcmu-runner crashes if glusterd is stopped and try to start tcmu-runner
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tcmu-runner
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Prasanna Kumar Kalever
QA Contact: surabhi
URL:
Whiteboard:
Depends On:
Blocks: 1417151
Reported: 2017-05-05 12:14 UTC by surabhi
Modified: 2017-09-21 04:17 UTC
CC List: 4 users

Fixed In Version: tcmu-runner-1.2.0-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:17:54 UTC
Embargoed:



Links
System: Red Hat Product Errata | ID: RHEA-2017:2773 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: new packages: gluster-block | Last Updated: 2017-09-21 08:16:22 UTC

Description surabhi 2017-05-05 12:14:13 UTC
Description of problem:

**********************
While verifying the service dependencies in a block storage setup, I found that starting gluster-blockd also starts tcmu-runner and rpcbind, which is expected.

However, stopping gluster-blockd does not stop tcmu-runner and rpcbind, so I stopped both of them manually.

After stopping all three services, starting gluster-blockd fails because tcmu-runner cannot start, and tcmu-runner crashes multiple times.


I found that stopping rpcbind also stops glusterd, so gluster-blockd fails to start because glusterd is down. That failure is understandable, but it should not crash tcmu-runner.

The same crash could also happen if the volume is stopped and we try to start gluster-blockd.

Snippet from the log:


May  4 11:37:18 dhcp46-65 journal: tcmu-runner#012: tcmu_create_glfs_object:377 : glfs_init failed: Transport endpoint is not connected
May  4 11:37:19 dhcp46-65 journal: tcmu-runner#012: glfs_check_config:426 : tcmu_create_glfs_object failed
May  4 11:37:19 dhcp46-65 tcmu-runner: *** Error in `/usr/bin/tcmu-runner': double free or corruption (out): 0x00000000021c2b10 ***
May  4 11:37:19 dhcp46-65 tcmu-runner: ======= Backtrace: =========
May  4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libc.so.6(+0x7c503)[0x7f29b2029503]
May  4 11:37:19 dhcp46-65 tcmu-runner: /usr/lib64/tcmu-runner/handler_glfs.so(+0x1361)[0x7f29b09a8361]
May  4 11:37:19 dhcp46-65 tcmu-runner: /usr/lib64/tcmu-runner/handler_glfs.so(+0x1e93)[0x7f29b09a8e93]
May  4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libtcmu.so.1(+0x9475)[0x7f29b36db475]
May  4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libtcmu.so.1(+0x9a8d)[0x7f29b36dba8d]
May  4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libtcmu.so.1(tcmulib_initialize+0x1f8)[0x7f29b36dc058]
May  4 11:37:19 dhcp46-65 tcmu-runner: /usr/bin/tcmu-runner[0x4060b0]
May  4 11:37:19 dhcp46-65 tcmu-runner: /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f29b1fceb35]
May  4 11:37:19 dhcp46-65 tcmu-runner: /usr/bin/tcmu-runner[0x40621c]
May  4 11:37:19 dhcp46-65 tcmu-runner: ======= Memory map: ========
May  4 11:37:19 dhcp46-65 tcmu-runner: 00400000-0040f000 r-xp 00000000 fd:00 784294                             /usr/bin/tcmu-runner
May  4 11:37:19 dhcp46-65 tcmu-runner: 0060e000-0060f000 r--p 0000e000 fd:00 784294                             /usr/bin/tcmu-runner
May  4 11:37:19 dhcp46-65 tcmu-runner: 0060f000-00610000 rw-p 0000f000 fd:00 784294                             /usr/bin/tcmu-runner
May  4 11:37:19 dhcp46-65 tcmu-runner: 021b1000-0228c000 rw-p 00000000 00:00 0                                  [heap]
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f2994000000-7f2994021000 rw-p 00000000 00:00 0
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f2994021000-7f2998000000 ---p 00000000 00:00 0
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f299c000000-7f299c021000 rw-p 00000000 00:00 0
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f299c021000-7f29a0000000 ---p 00000000 00:00 0
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a17f8000-7f29a17f9000 ---p 00000000 00:00 0
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a17f9000-7f29a27fa000 rw-p 00000000 00:00 0                          [stack:23957]
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a27fa000-7f29a2802000 r-xp 00000000 fd:00 34037463                   /usr/lib64/libnss_sss.so.2
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2802000-7f29a2a01000 ---p 00008000 fd:00 34037463                   /usr/lib64/libnss_sss.so.2
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2a01000-7f29a2a02000 r--p 00007000 fd:00 34037463                   /usr/lib64/libnss_sss.so.2
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2a02000-7f29a2a03000 rw-p 00008000 fd:00 34037463                   /usr/lib64/libnss_sss.so.2
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2a03000-7f29a2a0f000 r-xp 00000000 fd:00 33569228                   /usr/lib64/libnss_files-2.17.so
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2a0f000-7f29a2c0e000 ---p 0000c000 fd:00 33569228                   /usr/lib64/libnss_files-2.17.so
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2c0e000-7f29a2c0f000 r--p 0000b000 fd:00 33569228                   /usr/lib64/libnss_files-2.17.so
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2c0f000-7f29a2c10000 rw-p 0000c000 fd:00 33569228                   /usr/lib64/libnss_files-2.17.so
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2c10000-7f29a2c16000 rw-p 00000000 00:00 0
May  4 11:37:19 dhcp46-65 tcmu-runner: 7f29a2c16000-7f29a2c17000 ---p 00000000 00:00 0
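The log shows the failure chain: glfs_init fails with "Transport endpoint is not connected", glfs_check_config reports that tcmu_create_glfs_object failed, and glibc then aborts with "double free or corruption" from inside handler_glfs.so during tcmulib_initialize. The report does not include the patch, so the following is only a minimal C sketch of the general pattern that prevents this class of bug: a constructor that releases its own allocations exactly once on failure and returns NULL, and a caller that never frees again. All names here (fake_fs, fake_fs_init, create_fs_object, check_config) are hypothetical stand-ins, not tcmu-runner or libgfapi APIs, and the server_up flag only simulates whether glusterd is reachable.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Gluster connection handle (NOT libgfapi's glfs_t). */
struct fake_fs {
    char *volname;
};

/* Hypothetical stand-in for glfs_init(): fails when the server is unreachable,
 * as glfs_init() did in the log above while glusterd was down. */
static int fake_fs_init(struct fake_fs *fs, bool server_up)
{
    (void)fs;
    return server_up ? 0 : -1;
}

/* The single place that releases a handle; safe to call with NULL. */
static void fake_fs_fini(struct fake_fs *fs)
{
    if (!fs)
        return;
    free(fs->volname); /* free(NULL) is a no-op if strdup() failed */
    free(fs);
}

/* Returns a ready handle, or NULL on failure. On failure, everything allocated
 * here has already been released exactly once, so the caller must not free it. */
static struct fake_fs *create_fs_object(const char *volname, bool server_up)
{
    assert(volname != NULL);

    struct fake_fs *fs = calloc(1, sizeof(*fs));
    if (!fs)
        return NULL;

    fs->volname = strdup(volname);
    if (!fs->volname)
        goto fail;

    if (fake_fs_init(fs, server_up) < 0) {
        fprintf(stderr, "init failed for volume %s\n", volname);
        goto fail;
    }
    return fs;

fail:
    fake_fs_fini(fs); /* the only free of fs and fs->volname on this path */
    return NULL;
}

/* Caller: on failure it only reports the error; it does not free anything again. */
static int check_config(const char *volname, bool server_up)
{
    struct fake_fs *fs = create_fs_object(volname, server_up);
    if (!fs)
        return -1;
    fake_fs_fini(fs);
    return 0;
}
```

With server_up set to false, check_config prints the init error and returns -1. Because each allocation has one owner and one cleanup path, a failed connection to glusterd becomes an ordinary error instead of a crash. Running such code under AddressSanitizer or Valgrind is a practical way to confirm that no double free remains.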


Version-Release number of selected component (if applicable):
rpm -qa | grep gluster-block
gluster-block-0.1.1-1.x86_64


How reproducible:
Always

Steps to Reproduce:
1.Start gluster-blockd and verify if tcmu-runner and rpcbind get started
2.stop gluster-blockd and verify if tcmu-runner and rpcbind is running
3. stop rpcbind and tcmu-runner
4. start gluster-blockd

Actual results:
***************
gluster-blockd fails to start and tcmu-runner crashes multiple times.


Expected results:
*****************
tcmu-runner should not crash.


Additional info:

Comment 6 surabhi 2017-05-30 13:39:06 UTC
With the latest builds, I still see this crash when following the exact steps in the description.

May 30 09:36:02 dhcp46-151 : tcmu_create_glfs_object:405 : glfs_init failed: Success
May 30 09:36:02 dhcp46-151 tcmu-runner: 2017-05-30 09:36:02.917 16886 [ERROR] tcmu_create_glfs_object:405 : glfs_init failed: Success
May 30 09:36:03 dhcp46-151 tcmu-runner: *** Error in `/usr/bin/tcmu-runner': double free or corruption (out): 0x0000000002360590 ***
May 30 09:36:03 dhcp46-151 tcmu-runner: ======= Backtrace: =========
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libc.so.6(+0x7c503)[0x7f15d9fac503]
May 30 09:36:03 dhcp46-151 tcmu-runner: /usr/lib64/tcmu-runner/handler_glfs.so(+0x1754)[0x7f15d8b41754]
May 30 09:36:03 dhcp46-151 tcmu-runner: /usr/lib64/tcmu-runner/handler_glfs.so(+0x23a7)[0x7f15d8b423a7]
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libtcmu.so.1(+0x988d)[0x7f15db87488d]
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libtcmu.so.1(+0x9ddf)[0x7f15db874ddf]
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libtcmu.so.1(tcmulib_initialize+0x1f8)[0x7f15db875358]
May 30 09:36:03 dhcp46-151 tcmu-runner: /usr/bin/tcmu-runner[0x407780]
May 30 09:36:03 dhcp46-151 tcmu-runner: /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f15d9f51b35]
May 30 09:36:03 dhcp46-151 tcmu-runner: /usr/bin/tcmu-runner[0x407927]
May 30 09:36:03 dhcp46-151 tcmu-runner: ======= Memory map: ========
May 30 09:36:03 dhcp46-151 tcmu-runner: 00400000-00415000 r-xp 00000000 fd:00 780917                             /usr/bin/tcmu-runner
May 30 09:36:03 dhcp46-151 tcmu-runner: 00615000-00616000 r--p 00015000 fd:00 780917                             /usr/bin/tcmu-runner
May 30 09:36:03 dhcp46-151 tcmu-runner: 00616000-00617000 rw-p 00016000 fd:00 780917                             /usr/bin/tcmu-runner
May 30 09:36:03 dhcp46-151 tcmu-runner: 02351000-0242b000 rw-p 00000000 00:00 0                                  [heap]
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15bc000000-7f15bc021000 rw-p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15bc021000-7f15c0000000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c0000000-7f15c0021000 rw-p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c0021000-7f15c4000000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c4000000-7f15c4021000 rw-p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c4021000-7f15c8000000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c9991000-7f15c9992000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15c9992000-7f15ca993000 rw-p 00000000 00:00 0                          [stack:16895]
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15ca993000-7f15ca994000 ---p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15ca994000-7f15cb194000 rw-p 00000000 00:00 0
May 30 09:36:03 dhcp46-151 tcmu-runner: 7f15cb194000-7f15cb19c000 r-xp 00000000 fd:00 34106719                   /usr/lib64/libnss_sss.so.2


Please have a look.
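In this second log the message reads "glfs_init failed: Success". That pattern usually means the log line appended strerror(errno) (or glibc's printf %m, which does the same) while errno was 0, i.e. glfs_init reported failure without setting errno. This is an inference from the log text only; the actual tcmu-runner logging code is not shown in this report. The minimal C sketch below uses a hypothetical helper, format_failure, that avoids printing a misleading "Success" when no error code is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<what> failed: <reason>" into buf.
 * err should be the errno value captured immediately after the failing call.
 * When err is 0 the call gave no reason, and strerror(0) would print
 * "Success" on glibc, so a neutral message is used instead. */
static void format_failure(char *buf, size_t len, const char *what, int err)
{
    assert(buf != NULL && len > 0 && what != NULL);

    if (err == 0)
        snprintf(buf, len, "%s failed: unknown error (errno not set)", what);
    else
        snprintf(buf, len, "%s failed: %s", what, strerror(err));
}
```

Capture errno into a local variable right after the failing call (for example `int err = errno;`), because later library calls, including the logging call itself, may overwrite it.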

Comment 7 Prasanna Kumar Kalever 2017-05-30 14:25:17 UTC
Surabhi, can you please paste the version numbers of gluster-block and tcmu-runner that you used to verify this bug?

Comment 8 surabhi 2017-06-01 07:44:50 UTC
rpm -qa | grep gluster-block
gluster-block-debuginfo-0.2-3.el7rhgs.x86_64
gluster-block-0.2-3.el7rhgs.x86_64

rpm -qa | grep tcmu-runner
tcmu-runner-1.2.0-3.el7rhgs.x86_64
tcmu-runner-debuginfo-1.2.0-3.el7rhgs.x86_64

Comment 9 surabhi 2017-06-13 08:57:29 UTC
After upgrading to the latest build and following the steps above, tcmu-runner no longer crashes. The only remaining messages are "glfs_init failed" errors in the log, which are expected because glusterd is not running once rpcbind is stopped.

Works as expected. Marking the BZ verified.

gluster-block-0.2.1-1.el7rhgs.x86_64
tcmu-runner-1.2.0-4.el7rhgs.x86_64

Comment 12 errata-xmlrpc 2017-09-21 04:17:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2773

