Bug 1143835 - dht crashed on running regression with floating point exception
Summary: dht crashed on running regression with floating point exception
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-09-18 06:03 UTC by Pranith Kumar K
Modified: 2015-05-14 17:43 UTC
CC List: 1 user

Fixed In Version: glusterfs-3.7.0
Clone Of:
Environment:
Last Closed: 2015-05-14 17:27:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2014-09-18 06:03:34 UTC
Description of problem:
The crash occurred during the following regression run:
http://build.gluster.org/job/rackspace-regression-2GB-triggered/1559/consoleFull

(gdb) bt
#0  0x00007fb74f7fc418 in dht_selfheal_layout_new_directory (frame=0x7fb75a34a1a4, loc=0x7fb74c5dc898, layout=0x144f960)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-selfheal.c:1068
#1  0x00007fb74f7fc7d0 in dht_selfheal_dir_getafix (frame=0x7fb75a34a1a4, loc=0x7fb74c5dc898, layout=0x144f960)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-selfheal.c:1133
#2  0x00007fb74f7fcbe4 in dht_selfheal_directory (frame=0x7fb75a34a1a4, dir_cbk=0x7fb74f808a19 <dht_lookup_selfheal_cbk>, loc=0x7fb74c5dc898, layout=0x144f960)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-selfheal.c:1243
#3  0x00007fb74f80acb0 in dht_lookup_dir_cbk (frame=0x7fb75a34a1a4, cookie=0x7fb75a34a2fc, this=0x1455580, op_ret=0, op_errno=22, inode=0x7fb74e2e204c, stbuf=0x7fff8f7b51e0, xattr=0x7fb759d45650, 
    postparent=0x7fff8f7b5170) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:578
#4  0x00007fb74fa7871d in client3_3_lookup_cbk (req=0x7fb74c5924e4, iov=0x7fb74c592524, count=1, myframe=0x7fb75a34a2fc)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/client/src/client-rpc-fops.c:2769
#5  0x00007fb75c0d9c49 in rpc_clnt_handle_reply (clnt=0x14901e0, pollin=0x144f4b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:766
#6  0x00007fb75c0da06a in rpc_clnt_notify (trans=0x14c61f0, mydata=0x1490210, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x144f4b0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:894
#7  0x00007fb75c0d65c0 in rpc_transport_notify (this=0x14c61f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x144f4b0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-transport.c:516
#8  0x00007fb7512beec8 in socket_event_poll_in (this=0x14c61f0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2136
#9  0x00007fb7512bf383 in socket_event_handler (fd=8, idx=7, data=0x14c61f0, poll_in=1, poll_out=0, poll_err=0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2249
#10 0x00007fb75c379b2b in event_dispatch_epoll_handler (event_pool=0x142f3c0, events=0x144dd60, i=0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:384
#11 0x00007fb75c379d25 in event_dispatch_epoll (event_pool=0x142f3c0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:445
#12 0x00007fb75c346c53 in event_dispatch (event_pool=0x142f3c0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event.c:113
#13 0x0000000000409750 in main (argc=11, argv=0x7fff8f7b6908) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:2043

One of the following divisions is most likely the cause; an integer division by zero is delivered as SIGFPE, which is what gets reported as a floating point exception:

        if (weight_by_size) {
                /* We know total_size is not zero. */
                chunk = ((unsigned long) 0xffffffff) / total_size;
                gf_log (this->name, GF_LOG_INFO,
                        "chunk size = 0xffffffff / %u = 0x%x",
                        total_size, chunk);
        }
        else {
                chunk = ((unsigned long) 0xffffffff) / bricks_used;
        }
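
For context, a minimal, hypothetical standalone program (not GlusterFS code) showing why an integer division by zero surfaces as a "floating point exception": on Linux the CPU's divide-error trap is delivered as SIGFPE even though no floating point arithmetic is involved.

/* Hypothetical demonstration only -- not GlusterFS code.
 * An unsigned integer division by zero traps in hardware; the kernel
 * delivers SIGFPE (FPE_INTDIV), which the shell then reports as
 * "Floating point exception". */
#include <stdio.h>

int
main (void)
{
        /* volatile keeps the compiler from folding the division away */
        volatile unsigned int divisor = 0;   /* mirrors bricks_used == 0 */
        unsigned int chunk;

        chunk = ((unsigned long) 0xffffffff) / divisor;  /* SIGFPE raised here */

        printf ("chunk = 0x%x\n", chunk);    /* never reached */
        return 0;
}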
           

Version-Release number of selected component (if applicable):


How reproducible:
Not sure

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Pranith Kumar K 2014-09-18 06:35:13 UTC
More information:
(gdb) p bricks_used
$1 = 0
(gdb) p weight_by_size
$2 = _gf_false
(gdb) p layout->cnt
$3 = 1
(gdb) p layout->list[0].err
$4 = -1
(gdb)

Comment 2 Anand Avati 2014-09-22 08:10:52 UTC
REVIEW: http://review.gluster.org/8792 (cluster/dht: Modified the calculation of brick_count) posted (#1) for review on master by venkatesh somyajulu (vsomyaju)

Comment 3 Anand Avati 2014-09-23 05:50:32 UTC
COMMIT: http://review.gluster.org/8792 committed in master by Vijay Bellur (vbellur) 
------
commit f14d9bdd52b428466e7863d06c89b4684be3da07
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Mon Sep 22 13:29:13 2014 +0530

    cluster/dht: Modified the calculation of brick_count
    
    Whenever a new layout is calculated for a directory, we count
    the number of dht children that will receive an actual
    (non-zero) layout range, assign ranges only to those
    subvolumes, and give the rest 0 as their layout->start and
    layout->stop values.
    
    This count is based on either
    a) weight_by_size, or
    b) the number of bricks that will be assigned a non-zero range.
    
    So when the layout is not assigned based on weight_by_size,
    we should use "bricks_to_use" instead of "bricks_used".
    
    In the regression test we found that priv->du_stat[0].chunks
    was zero. In that case the "bricks_used" variable is zero,
    which makes the calculation
    
    chunk = ((unsigned long) 0xffffffff) / bricks_used;
    
    crash with a division by zero.
    
    Change-Id: I6f1b21eff972a80d9eb22771087c1e2f53e7e724
    BUG: 1143835
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>
    Reviewed-on: http://review.gluster.org/8792
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>
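
A rough sketch of the change, assuming only the variable names mentioned in the commit message above (this is not the literal patch; see the review link for the actual change):

        if (weight_by_size) {
                /* total_size is known to be non-zero in this branch */
                chunk = ((unsigned long) 0xffffffff) / total_size;
        } else {
                /* Divide by the number of bricks intended to receive a
                 * non-zero range (bricks_to_use), which cannot be zero
                 * here, instead of bricks_used, which was zero in this
                 * crash. */
                chunk = ((unsigned long) 0xffffffff) / bricks_to_use;
        }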

Comment 4 Niels de Vos 2015-05-14 17:27:44 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

