Bug 1600790 - Segmentation fault while using gfapi to get volume utilization
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rpc
Version: 3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Mohit Agrawal
QA Contact: Upasana
Depends On: 1607783
Blocks: 1503137 1600092
Reported: 2018-07-12 23:58 EDT by Shubhendu Tripathi
Modified: 2018-09-18 06:24 EDT
CC: 12 users

See Also:
Fixed In Version: glusterfs-3.12.2-15
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1607783
Environment:
Last Closed: 2018-09-04 02:50:20 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
gfapi-segfault.txt (6.22 KB, text/plain)
2018-07-13 00:00 EDT, Shubhendu Tripathi


External Trackers
Tracker: Red Hat Product Errata
Tracker ID: RHSA-2018:2607
Priority: None
Status: None
Summary: None
Last Updated: 2018-09-04 02:51 EDT

Description Shubhendu Tripathi 2018-07-12 23:58:49 EDT
Description of problem:
We have a 24-node gluster cluster with a distribute-disperse volume with bricks from all the nodes (48 bricks). While using gfapi to get the volume utilization, it throws a segmentation fault.

The volume info for the volume in question is as below:

# gluster v info volume_gama_disperse_4_plus_2x2
 
Volume Name: volume_gama_disperse_4_plus_2x2
Type: Distributed-Disperse
Volume ID: b7947c8d-c0e6-458a-a3d5-47221a5a0e63
Status: Stopped
Snapshot Count: 0
Number of Bricks: 8 x (4 + 2) = 48
Transport-type: tcp
Bricks:
Brick1: dahorak-usm3-gl01.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick2: dahorak-usm3-gl02.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick3: dahorak-usm3-gl03.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick4: dahorak-usm3-gl04.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick5: dahorak-usm3-gl05.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick6: dahorak-usm3-gl06.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick7: dahorak-usm3-gl07.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick8: dahorak-usm3-gl08.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick9: dahorak-usm3-gl09.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick10: dahorak-usm3-gl10.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick11: dahorak-usm3-gl11.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick12: dahorak-usm3-gl12.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick13: dahorak-usm3-gl13.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick14: dahorak-usm3-gl14.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick15: dahorak-usm3-gl15.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick16: dahorak-usm3-gl16.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick17: dahorak-usm3-gl17.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick18: dahorak-usm3-gl18.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick19: dahorak-usm3-gl19.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick20: dahorak-usm3-gl20.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick21: dahorak-usm3-gl21.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick22: dahorak-usm3-gl22.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick23: dahorak-usm3-gl23.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick24: dahorak-usm3-gl24.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_1/1
Brick25: dahorak-usm3-gl01.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick26: dahorak-usm3-gl02.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick27: dahorak-usm3-gl03.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick28: dahorak-usm3-gl04.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick29: dahorak-usm3-gl05.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick30: dahorak-usm3-gl06.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick31: dahorak-usm3-gl07.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick32: dahorak-usm3-gl08.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick33: dahorak-usm3-gl09.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick34: dahorak-usm3-gl10.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick35: dahorak-usm3-gl11.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick36: dahorak-usm3-gl12.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick37: dahorak-usm3-gl13.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick38: dahorak-usm3-gl14.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick39: dahorak-usm3-gl15.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick40: dahorak-usm3-gl16.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick41: dahorak-usm3-gl17.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick42: dahorak-usm3-gl18.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick43: dahorak-usm3-gl19.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick44: dahorak-usm3-gl20.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick45: dahorak-usm3-gl21.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick46: dahorak-usm3-gl22.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick47: dahorak-usm3-gl23.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Brick48: dahorak-usm3-gl24.usmqe.lab.eng.blr.redhat.com:/mnt/brick_gama_disperse_2/2
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on


Version-Release number of selected component (if applicable):

glusterfs-fuse-3.12.2-13.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
tendrl-gluster-integration-1.6.4-7.fc23.noarch
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-13.el7rhgs.x86_64
glusterfs-cli-3.12.2-13.el7rhgs.x86_64
python2-gluster-3.12.2-13.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
glusterfs-geo-replication-3.12.2-13.el7rhgs.x86_64
glusterfs-libs-3.12.2-13.el7rhgs.x86_64
glusterfs-3.12.2-13.el7rhgs.x86_64
glusterfs-events-3.12.2-13.el7rhgs.x86_64
glusterfs-server-3.12.2-13.el7rhgs.x86_64
glusterfs-api-3.12.2-13.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-13.el7rhgs.x86_64

The WA code that invokes volume utilization using gfapi is at https://github.com/Tendrl/gluster-integration/blob/master/tendrl/gluster_integration/gfapi.py
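For reference, a minimal libgfapi client that exercises the same call path (connect to the volume, then call glfs_statvfs) might look roughly like the sketch below. This is illustrative only, not the Tendrl code; the host name, log path, log level, and build command are assumptions.

/* minimal sketch of a gfapi-based utilization check (illustrative only)
 * build (assumed): gcc vol-util.c -o vol-util -lgfapi */
#include <stdio.h>
#include <sys/statvfs.h>
#include <glusterfs/api/glfs.h>   /* from glusterfs-api-devel */

int main(int argc, char *argv[])
{
    const char *volname = (argc > 1) ? argv[1] : "volume_gama_disperse_4_plus_2x2";
    const char *host    = (argc > 2) ? argv[2] : "localhost";   /* placeholder */
    struct statvfs svfs;
    int ret = 1;

    glfs_t *fs = glfs_new(volname);
    if (!fs)
        return 1;

    /* fetch the volfile from any server in the trusted pool */
    glfs_set_volfile_server(fs, "tcp", host, 24007);
    glfs_set_logging(fs, "/tmp/gfapi-vol-util.log", 7);

    if (glfs_init(fs) != 0) {
        fprintf(stderr, "glfs_init failed for volume %s\n", volname);
        goto out;
    }

    if (glfs_statvfs(fs, "/", &svfs) != 0) {
        fprintf(stderr, "glfs_statvfs failed\n");
        goto out;
    }

    printf("block size: %lu, total: %lu, free: %lu, inodes used: %lu/%lu\n",
           (unsigned long)svfs.f_frsize, (unsigned long)svfs.f_blocks,
           (unsigned long)svfs.f_bfree,
           (unsigned long)(svfs.f_files - svfs.f_ffree),
           (unsigned long)svfs.f_files);
    ret = 0;
out:
    /* rpc connection teardown happens during cleanup; the crash described
     * in comment 12 is a race in that rpc client cleanup/destroy path */
    glfs_fini(fs);
    return ret;
}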

How reproducible:
Always

Steps to Reproduce:
1. Create a 24-node gluster cluster
2. Create an 8 x (4+2) distribute-disperse volume with bricks from all 24 nodes
3. Run the volume utilization utility

Actual results:
It throws a segmentation fault. For other volumes of type distribute-replicate, it shows the volume utilization as expected.

Expected results:
It should show the volume utilization details for all the volumes.

Additional info:
The trace from the segmentation fault is attached for reference.
Comment 2 Shubhendu Tripathi 2018-07-13 00:00 EDT
Created attachment 1458625
gfapi-segfault.txt
Comment 3 Shubhendu Tripathi 2018-07-13 00:19:03 EDT
If I create a distribute-disperse volume with a smaller number of bricks from fewer nodes, the volume utilization details are shown properly, as shown below:

# gluster v info test-disp
 
Volume Name: test-disp
Type: Distributed-Disperse
Volume ID: b2d2d004-34be-4448-9320-6a952b562447
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: dahorak-usm3-gl01.usmqe.lab.eng.blr.redhat.com:/root/gluster_bricks/test-disp_b1
Brick2: dahorak-usm3-gl01.usmqe.lab.eng.blr.redhat.com:/root/gluster_bricks/test-disp_b2
Brick3: dahorak-usm3-gl01.usmqe.lab.eng.blr.redhat.com:/root/gluster_bricks/test-disp_b3
Brick4: dahorak-usm3-gl01.usmqe.lab.eng.blr.redhat.com:/root/gluster_bricks/test-disp_b4
Brick5: dahorak-usm3-gl01.usmqe.lab.eng.blr.redhat.com:/root/gluster_bricks/test-disp_b5
Brick6: dahorak-usm3-gl01.usmqe.lab.eng.blr.redhat.com:/root/gluster_bricks/test-disp_b6
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

# tendrl-gluster-vol-utilization test-disp
{"test-disp": {"pcnt_used": 16.85206375567077, "used": 2130144.0, "used_inode": 22788, "free": 10510112.0, "pcnt_inode_used": 0.7207355373730537, "total_inode": 3161770, "total": 12640256.0}}
Comment 4 Poornima G 2018-07-20 02:19:30 EDT
Can you provide the core dump to debug this further? Without the core, it is not possible to analyse what caused the crash. Also, installing debuginfo and pasting the backtrace from the core would be more helpful.
Comment 12 Mohit Agrawal 2018-07-24 05:05:20 EDT
Hi,

RCA: The gfapi client program crashes in rpc_clnt_connection_cleanup while destroying the saved frames on the connection, because the saved frames have already been destroyed by rpc_clnt_destroy. To avoid this race, set saved_frames to NULL inside the critical section in rpc_clnt_destroy.
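A self-contained sketch of that pattern follows (clear the shared pointer inside the critical section so that only one teardown path frees the frames). The types and names below are illustrative stand-ins, not the actual rpc-clnt code.

/* self-contained sketch of the fix pattern (illustrative names only) */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct saved_frames { int count; };        /* stand-in for the real structure */

struct connection {
    pthread_mutex_t lock;
    struct saved_frames *saved_frames;     /* shared between the two teardown paths */
};

/* detach the frames under the lock; whichever path gets here first owns them */
static struct saved_frames *take_saved_frames(struct connection *conn)
{
    struct saved_frames *frames;

    pthread_mutex_lock(&conn->lock);
    frames = conn->saved_frames;
    conn->saved_frames = NULL;             /* the other path now sees NULL */
    pthread_mutex_unlock(&conn->lock);

    return frames;
}

/* both teardown paths (connection cleanup and client destroy) call this;
 * the frames are freed exactly once, which is the race avoidance described above */
static void destroy_saved_frames(struct connection *conn)
{
    struct saved_frames *frames = take_saved_frames(conn);
    if (frames) {
        printf("destroying %d saved frames\n", frames->count);
        free(frames);
    }
}

int main(void)
{
    struct connection conn;

    pthread_mutex_init(&conn.lock, NULL);
    conn.saved_frames = calloc(1, sizeof(*conn.saved_frames));
    if (!conn.saved_frames)
        return 1;
    conn.saved_frames->count = 3;

    destroy_saved_frames(&conn);   /* e.g. connection cleanup */
    destroy_saved_frames(&conn);   /* e.g. client destroy: no double free */

    pthread_mutex_destroy(&conn.lock);
    return 0;
}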

I ran the client program under valgrind and it reported an access to already-freed memory ("0 bytes inside a block ... free'd") while destroying a frame, as shown below:

==9735==  Address 0x18abbe70 is 0 bytes inside a block of size 272 free'd
==9735==    at 0x4C2ACBD: free (vg_replace_malloc.c:530)
==9735==    by 0x5645B9D: rpc_clnt_destroy (rpc-clnt.c:1777)
==9735==    by 0x5645B9D: rpc_clnt_notify (rpc-clnt.c:950)
==9735==    by 0x56419AB: rpc_transport_unref (rpc-transport.c:517)
==9735==    by 0x5644A38: rpc_clnt_trigger_destroy (rpc-clnt.c:1766)
==9735==    by 0x5644A38: rpc_clnt_unref (rpc-clnt.c:1803)
==9735==    by 0x5644E3F: call_bail (rpc-clnt.c:197)
==9735==    by 0x5AA6981: gf_timer_proc (timer.c:165)
==9735==    by 0x689DDD4: start_thread (pthread_create.c:308)
==9735==    by 0x515DB3C: clone (clone.S:113)


Regards
Mohit Agrawal
Comment 23 errata-xmlrpc 2018-09-04 02:50:20 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
