Bug 1292705

Summary: gluster cli crashed while performing 'gluster vol bitrot <vol_name> scrub status'
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: RamaKasturi <knarra>
Component: bitrot
Assignee: Gaurav Kumar Garg <ggarg>
Status: CLOSED ERRATA
QA Contact: RamaKasturi <knarra>
Severity: urgent
Docs Contact:
Priority: urgent
Version: rhgs-3.1
CC: amukherj, asrivast, byarlaga, ggarg, knarra, rhs-bugs, sankarshan, smohan, storage-qa-internal, vshankar
Target Milestone: ---
Keywords: Reopened, ZStream
Target Release: RHGS 3.1.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.5-14
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1293558 (view as bug list)
Environment:
Last Closed: 2016-03-01 06:04:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1293558

Description RamaKasturi 2015-12-18 07:10:23 UTC
Description of problem:
gluster cli crashed when the command 'gluster vol bitrot <vol_name> scrub status' is run.
Below is the output when the command is run:
=============================================
[root@rhs-client2 ~]# gluster vol bitrot vol_ec scrub status

Volume name : vol_ec

State of scrub: Active

Scrub impact: lazy

Scrub frequency: hourly

Bitrot error log location: /var/log/glusterfs/bitd.log

Scrubber error log location: /var/log/glusterfs/scrub.log


=========================================================

Node name: localhost

Number of Scrubbed files: 0

Number of Unsigned files: 0

Last completed scrub time: Scrubber pending to complete.

Duration of last scrub: 0:0:0:0

Error count: 4

Corrupted object's:

a6644ddd-7ab3-498f-9e85-a140b6cee08e

07bc67a2-5fcd-4ea4-9664-fb2a74dee275

c6c53e9e-c5df-4954-b6d7-235875326f88

8fdfa706-35e2-4980-a9ea-99392aca172c


=========================================================

Node name: 10.70.36.62

Number of Scrubbed files: 0

Number of Unsigned files: 0

Segmentation fault (core dumped)

Backtrace from gdb:
========================================================
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `gluster vol bitrot vol_ec scrub status'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007faaf631d69f in gf_cli_print_bitrot_scrub_status (dict=0x7faaf764430c) at cli-rpc-ops.c:10770
10770	                cli_out ("%s: %s\n", "Last completed scrub time",
Missing separate debuginfos, use: debuginfo-install glibc-2.17-106.el7_2.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libuuid-2.23.2-26.el7.x86_64 libxml2-2.9.1-6.el7_2.2.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 openssl-libs-1.0.1e-51.el7_2.1.x86_64 pcre-8.32-15.el7.x86_64 readline-6.2-9.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt
#0  0x00007faaf631d69f in gf_cli_print_bitrot_scrub_status (dict=0x7faaf764430c) at cli-rpc-ops.c:10770
#1  0x00007faaf631de82 in gf_cli_bitrot_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7faaf762d0ac)
    at cli-rpc-ops.c:10870
#2  0x00007faaf5340b20 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7faaf765e960, pollin=pollin@entry=0x7faae4000e00) at rpc-clnt.c:766
#3  0x00007faaf5340ddf in rpc_clnt_notify (trans=<optimized out>, mydata=0x7faaf765e990, event=<optimized out>, data=0x7faae4000e00)
    at rpc-clnt.c:907
#4  0x00007faaf533c913 in rpc_transport_notify (this=this@entry=0x7faaf7661aa0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, 
    data=data@entry=0x7faae4000e00) at rpc-transport.c:545
#5  0x00007faaec67e4b6 in socket_event_poll_in (this=this@entry=0x7faaf7661aa0) at socket.c:2236
#6  0x00007faaec6813a4 in socket_event_handler (fd=fd@entry=6, idx=idx@entry=0, data=0x7faaf7661aa0, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:2349
#7  0x00007faaf5e828ca in event_dispatch_epoll_handler (event=0x7faae9feae80, event_pool=0x7faaf7629ce0) at event-epoll.c:575
#8  event_dispatch_epoll_worker (data=0x7faaf7697260) at event-epoll.c:678
#9  0x00007faaf4289dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007faaf3bd021d in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x00007faaf631d69f in gf_cli_print_bitrot_scrub_status (dict=0x7faaf764430c) at cli-rpc-ops.c:10770
10770	                cli_out ("%s: %s\n", "Last completed scrub time",
(gdb) l
10765	                          scrub_files);
10766	
10767	                cli_out ("%s: %"PRIu64 "\n", "Number of Unsigned files",
10768	                          unsigned_files);
10769	
10770	                cli_out ("%s: %s\n", "Last completed scrub time",
10771	                          (*last_scrub) ? last_scrub : "Scrubber pending to "
10772	                           "complete.");
10773	
10774	                /* Printing last scrub duration time in human readable form*/
(gdb) p last_scrub
$1 = 0x0
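
For clarity on the fault: the ternary reads *last_scrub before ever checking whether the pointer itself is NULL, and gdb shows last_scrub is 0x0. Below is a minimal standalone sketch of that pattern (hypothetical code, not the glusterfs source; printf stands in for cli_out) which hits the same segfault when compiled and run:

#include <stdio.h>

/* Minimal illustration only: the ternary dereferences last_scrub
 * before checking whether the pointer itself is NULL, so a NULL
 * pointer crashes here, matching the backtrace above. */
int main (void)
{
        const char *last_scrub = NULL;   /* value observed in gdb: 0x0 */

        printf ("%s: %s\n", "Last completed scrub time",
                (*last_scrub) ? last_scrub : "Scrubber pending to complete.");

        return 0;
}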

 

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-12.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install the latest build on RHEL7.2
2. Create a volume and enable bitrot on it.
3. Run the command 'gluster vol bitrot <vol_name> scrub status'.

Actual results:
gluster cli crashed.

Expected results:
gluster cli should not crash

Additional info:

Comment 2 Gaurav Kumar Garg 2015-12-18 07:37:36 UTC
Hi Rama Kasturi,

I think one node is running the latest build while another node is still running an old build. Please make sure you are using the latest build on both nodes.

IMO this issue will go away once you use the updated build on both nodes. Could you confirm, and close this bug if it is not reproducible after both nodes have the latest build?

Comment 3 RamaKasturi 2015-12-18 07:41:49 UTC
Closing this bug as not reproducible.

Comment 4 Venky Shankar 2015-12-19 08:35:14 UTC
> 10770	                cli_out ("%s: %s\n", "Last completed scrub time",
> 10771	                          (*last_scrub) ? last_scrub : "Scrubber

Hold on. That's a bug. The check should be:
    (last_scrub) ? last_scrub : "Scrubber pending to complete"

> pending to "
> 10772	                           "complete.");
> 10773	
> 10774	                /* Printing last scrub duration time in human readable
> form*/
> (gdb) p last_scrub
> $1 = 0x0

Comment 5 Gaurav Kumar Garg 2015-12-21 11:34:17 UTC
last_scrub is a char pointer, so even if the character it points to is NUL (*last_scrub == '\0', i.e. an empty string), last_scrub itself is not NULL; it still points to some address. So with only the (last_scrub) check, the "Scrubber pending to complete" case would always print nothing. You can modify this in the code and check it.
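
Putting comments 4 and 5 together, here is a minimal standalone sketch (not the actual upstream patch) of a check that handles both a NULL pointer (the crash above) and an empty string (scrub still pending); printf stands in for the glusterfs cli_out() seen in the backtrace, and the helper name is hypothetical:

#include <stdio.h>

/* Sketch only: print the last scrub time, falling back to the
 * "pending" message when the string is missing (NULL) or empty. */
static void print_last_scrub_time (const char *last_scrub)
{
        const char *msg;

        if (last_scrub && *last_scrub)
                msg = last_scrub;
        else
                msg = "Scrubber pending to complete.";

        printf ("%s: %s\n", "Last completed scrub time", msg);
}

int main (void)
{
        print_last_scrub_time (NULL);           /* key missing in dict (crash case) */
        print_last_scrub_time ("");             /* scrub not yet completed */
        print_last_scrub_time ("2015-12-18");   /* normal case */
        return 0;
}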

Comment 11 Venky Shankar 2016-01-05 04:36:14 UTC
Gaurav,

I understand that this crash was encountered in a nightly build and not in QA/release packages. So, if it's OK, could you move the BZ state as per the patch state?

Comment 15 RamaKasturi 2016-01-07 13:35:54 UTC
I had an observation when I performed the process mentioned above in comment 14.

Once the nodes are updated, even though there are some corrupted files in the system, scrub status does not show them in its output. Will the bad files be shown only when the scrubber runs the next time? Is this the expected behavior? Can you please clarify?

Comment 16 Gaurav Kumar Garg 2016-02-01 11:40:52 UTC
No, it should display bad file information as soon as it detects a bad file.

Comment 18 errata-xmlrpc 2016-03-01 06:04:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html