Bug 800884 - nfs server crashed when "no more space was left" on volume during write operation
Summary: nfs server crashed when "no more space was left" on volume during write opera...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-03-07 12:57 UTC by Shwetha Panduranga
Modified: 2015-12-01 16:45 UTC
CC List: 2 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:15:26 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Shwetha Panduranga 2012-03-07 12:57:39 UTC
Description of problem:
Core was generated by `/usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/'.
Program terminated with signal 6, Aborted.
#0  0x0000003af1a32905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x0000003af1a32905 in raise () from /lib64/libc.so.6
#1  0x0000003af1a340e5 in abort () from /lib64/libc.so.6
#2  0x0000003af1a2b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003af1a2ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f2622fee42e in afr_get_call_child (this=0x13aa4d0, child_up=0x14864f0 "\001\001\r", <incomplete sequence \360\255\272>, read_child=-1, 
    fresh_children=0x148ef70, call_child=0x7fff03b2738c, last_index=0x7f261d796b6c) at afr-common.c:670
#5  0x00007f2622fa7ded in afr_stat (frame=0x7f262559ed64, this=0x13aa4d0, loc=0x7f262052ffcc) at afr-inode-read.c:257
#6  0x00007f2622d746bd in dht_stat (frame=0x7f26255a0034, this=0x13ab190, loc=0x7f262052ffcc) at dht-inode-read.c:302
#7  0x00007f2622b1fff6 in wb_stat (frame=0x7f26255a4364, this=0x13ac470, loc=0x7f262052ffcc) at write-behind.c:753
#8  0x00007f2626775e74 in default_stat (frame=0x7f26255ab4f0, this=0x13ad790, loc=0x7f262052ffcc) at defaults.c:1174
#9  0x00007f2626775e74 in default_stat (frame=0x7f262559ab8c, this=0x13ae990, loc=0x7f262052ffcc) at defaults.c:1174
#10 0x00007f2626775e74 in default_stat (frame=0x7f26255af36c, this=0x13afbd0, loc=0x7f262052ffcc) at defaults.c:1174
#11 0x00007f26222d1ca3 in io_stats_stat (frame=0x7f26255a1100, this=0x13b0eb0, loc=0x7f262052ffcc) at io-stats.c:1869
#12 0x00007f2622078d48 in nfs_fop_stat (nfsx=0x13b2410, xl=0x13b0eb0, nfu=0x7fff03b27a20, loc=0x7f262052ffcc, cbk=0x7f2622089c35 <nfs3svc_getattr_stat_cbk>, 
    local=0x7f262052fb94) at nfs-fops.c:432
#13 0x00007f26220821bb in nfs_stat (nfsx=0x13b2410, xl=0x13b0eb0, nfu=0x7fff03b27a20, pathloc=0x7f262052ffcc, cbk=0x7f2622089c35 <nfs3svc_getattr_stat_cbk>, 
    local=0x7f262052fb94) at nfs-generics.c:72
#14 0x00007f2622089f27 in nfs3_getattr_resume (carg=0x7f262052fb94) at nfs3.c:760
#15 0x00007f26220a5474 in nfs3_fh_resolve_inode_done (cs=0x7f262052fb94, inode=0x7f26210700e0) at nfs3-helpers.c:3545
#16 0x00007f26220a6951 in nfs3_fh_resolve_inode (cs=0x7f262052fb94) at nfs3-helpers.c:3971
#17 0x00007f26220a69e5 in nfs3_fh_resolve_resume (cs=0x7f262052fb94) at nfs3-helpers.c:4003
#18 0x00007f26220a6c10 in nfs3_fh_resolve_root (cs=0x7f262052fb94) at nfs3-helpers.c:4057
#19 0x00007f26220a6e50 in nfs3_fh_resolve_and_resume (cs=0x7f262052fb94, fh=0x7fff03b285a0, entry=0x0, resum_fn=0x7f2622089d46 <nfs3_getattr_resume>)
    at nfs3-helpers.c:4104
#20 0x00007f262208a46b in nfs3_getattr (req=0x7f2621d27214, fh=0x7fff03b285a0) at nfs3.c:801
#21 0x00007f262208a5c8 in nfs3svc_getattr (req=0x7f2621d27214) at nfs3.c:835
#22 0x00007f26265360a9 in rpcsvc_handle_rpc_call (svc=0x13b4b30, trans=0x1485040, msg=0x1489cb0) at rpcsvc.c:514
#23 0x00007f262653644c in rpcsvc_notify (trans=0x1485040, mydata=0x13b4b30, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1489cb0) at rpcsvc.c:610
#24 0x00007f262653bda8 in rpc_transport_notify (this=0x1485040, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1489cb0) at rpc-transport.c:498
#25 0x00007f26205f7270 in socket_event_poll_in (this=0x1485040) at socket.c:1686
#26 0x00007f26205f77f4 in socket_event_handler (fd=17, idx=8, data=0x1485040, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#27 0x00007f2626796030 in event_dispatch_epoll_handler (event_pool=0x139b370, events=0x147e7d0, i=1) at event.c:794
#28 0x00007f2626796253 in event_dispatch_epoll (event_pool=0x139b370) at event.c:856
#29 0x00007f26267965de in event_dispatch (event_pool=0x139b370) at event.c:956
#30 0x0000000000407dbd in main (argc=7, argv=0x7fff03b28b18) at glusterfsd.c:1611
(gdb) bt full
#0  0x0000003af1a32905 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003af1a340e5 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000003af1a2b9be in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000003af1a2ba80 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f2622fee42e in afr_get_call_child (this=0x13aa4d0, child_up=0x14864f0 "\001\001\r", <incomplete sequence \360\255\272>, read_child=-1, 
    fresh_children=0x148ef70, call_child=0x7fff03b2738c, last_index=0x7f261d796b6c) at afr-common.c:670
        ret = 0
        priv = 0x0
        i = 0
        __PRETTY_FUNCTION__ = "afr_get_call_child"
        __FUNCTION__ = "afr_get_call_child"
#5  0x00007f2622fa7ded in afr_stat (frame=0x7f262559ed64, this=0x13aa4d0, loc=0x7f262052ffcc) at afr-inode-read.c:257
        priv = 0x13e77e0
        local = 0x7f261d795b9c
        children = 0x13e7c80
        call_child = 0
        op_errno = 0
        read_child = -1
        ret = 0
        __FUNCTION__ = "afr_stat"
#6  0x00007f2622d746bd in dht_stat (frame=0x7f26255a0034, this=0x13ab190, loc=0x7f262052ffcc) at dht-inode-read.c:302
        _new = 0x7f262559ed64
        old_THIS = 0x13ab190
        tmp_cbk = 0x7f2622d72fac <dht_file_attr_cbk>
        subvol = 0x13aa4d0
        op_errno = -1
        local = 0x7f261dc407ac
        layout = 0x13e2900
        i = 0
        call_cnt = 0
        __FUNCTION__ = "dht_stat"
#7  0x00007f2622b1fff6 in wb_stat (frame=0x7f26255a4364, this=0x13ac470, loc=0x7f262052ffcc) at write-behind.c:753
        _new = 0x7f26255a0034
        old_THIS = 0x13ac470
        tmp_cbk = 0x7f2622b1f495 <wb_stat_cbk>
        file = 0x0
        iter_fd = 0x0
        local = 0x13e1278
        tmp_file = 0
        stub = 0x0
        request = 0x0
        ret = -1
        op_errno = 22
        __PRETTY_FUNCTION__ = "wb_stat"
        __FUNCTION__ = "wb_stat"
#8  0x00007f2626775e74 in default_stat (frame=0x7f26255ab4f0, this=0x13ad790, loc=0x7f262052ffcc) at defaults.c:1174
        _new = 0x7f26255a4364
        old_THIS = 0x13ad790
        tmp_cbk = 0x7f262676674b <default_stat_cbk>
        __FUNCTION__ = "default_stat"
#9  0x00007f2626775e74 in default_stat (frame=0x7f262559ab8c, this=0x13ae990, loc=0x7f262052ffcc) at defaults.c:1174
        _new = 0x7f26255ab4f0
        old_THIS = 0x13ae990
        tmp_cbk = 0x7f262676674b <default_stat_cbk>
        __FUNCTION__ = "default_stat"
#10 0x00007f2626775e74 in default_stat (frame=0x7f26255af36c, this=0x13afbd0, loc=0x7f262052ffcc) at defaults.c:1174
        _new = 0x7f262559ab8c
        old_THIS = 0x13afbd0
        tmp_cbk = 0x7f262676674b <default_stat_cbk>
        __FUNCTION__ = "default_stat"
#11 0x00007f26222d1ca3 in io_stats_stat (frame=0x7f26255a1100, this=0x13b0eb0, loc=0x7f262052ffcc) at io-stats.c:1869
        _new = 0x7f26255af36c
        old_THIS = 0x13b0eb0
        tmp_cbk = 0x7f26222ca9bd <io_stats_stat_cbk>
        __FUNCTION__ = "io_stats_stat"
#12 0x00007f2622078d48 in nfs_fop_stat (nfsx=0x13b2410, xl=0x13b0eb0, nfu=0x7fff03b27a20, loc=0x7f262052ffcc, cbk=0x7f2622089c35 <nfs3svc_getattr_stat_cbk>, 
    local=0x7f262052fb94) at nfs-fops.c:432
        _new = 0x7f26255a1100
        old_THIS = 0x13b2410
        tmp_cbk = 0x7f2622078800 <nfs_fop_stat_cbk>
        frame = 0x7f26253a1da4
        ret = -14
        nfl = 0x7f2622041c90
        __FUNCTION__ = "nfs_fop_stat"
#13 0x00007f26220821bb in nfs_stat (nfsx=0x13b2410, xl=0x13b0eb0, nfu=0x7fff03b27a20, pathloc=0x7f262052ffcc, cbk=0x7f2622089c35 <nfs3svc_getattr_stat_cbk>, 
    local=0x7f262052fb94) at nfs-generics.c:72
        ret = -14
#14 0x00007f2622089f27 in nfs3_getattr_resume (carg=0x7f262052fb94) at nfs3.c:760
        stat = NFS3ERR_SERVERFAULT
        ret = -14
        nfu = {uid = 0, gids = {0, 0, 1, 2, 3, 4, 6, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0}, ngrps = 8, lk_owner = {len = 0, data = '\000' <repeats 1023 times>}}
        cs = 0x7f262052fb94
        __FUNCTION__ = "nfs3_getattr_resume"
#15 0x00007f26220a5474 in nfs3_fh_resolve_inode_done (cs=0x7f262052fb94, inode=0x7f26210700e0) at nfs3-helpers.c:3545
        ret = 0
        __FUNCTION__ = "nfs3_fh_resolve_inode_done"
#16 0x00007f26220a6951 in nfs3_fh_resolve_inode (cs=0x7f262052fb94) at nfs3-helpers.c:3971
        inode = 0x7f26210700e0
        ret = -14
        __FUNCTION__ = "nfs3_fh_resolve_inode"
#17 0x00007f26220a69e5 in nfs3_fh_resolve_resume (cs=0x7f262052fb94) at nfs3-helpers.c:4003
        ret = -14
#18 0x00007f26220a6c10 in nfs3_fh_resolve_root (cs=0x7f262052fb94) at nfs3-helpers.c:4057
        ret = -14
        nfu = {uid = 0, gids = {0 <repeats 17 times>}, ngrps = 0, lk_owner = {len = 0, data = '\000' <repeats 1023 times>}}
        __FUNCTION__ = "nfs3_fh_resolve_root"
#19 0x00007f26220a6e50 in nfs3_fh_resolve_and_resume (cs=0x7f262052fb94, fh=0x7fff03b285a0, entry=0x0, resum_fn=0x7f2622089d46 <nfs3_getattr_resume>)
    at nfs3-helpers.c:4104
        ret = -14
#20 0x00007f262208a46b in nfs3_getattr (req=0x7f2621d27214, fh=0x7fff03b285a0) at nfs3.c:801
        vol = 0x13b0eb0
        stat = NFS3ERR_SERVERFAULT
        ret = -14
        nfs3 = 0x13d0910
        cstate = 0x7f262052fb94
        __FUNCTION__ = "nfs3_getattr"
#21 0x00007f262208a5c8 in nfs3svc_getattr (req=0x7f2621d27214) at nfs3.c:835
        fh = {ident = ":O", exportid = "\027G\205\377r\273@\241\246\207\277X\321aK&", gfid = "\235,g#\353\366@a\247\324\340\215\000ƥ", <incomplete sequence \310>, 
          hashcount = 1, entryhash = {257, 0 <repeats 13 times>}}
        args = {object = {data = {data_len = 38, 
              data_val = 0x7fff03b285a0 ":O\027G\205\377r\273@\241\246\207\277X\321aK&\235,g#\353\366@a\247\324", <incomplete sequence \340\215>}}}
        ret = -1
        __FUNCTION__ = "nfs3svc_getattr"
#22 0x00007f26265360a9 in rpcsvc_handle_rpc_call (svc=0x13b4b30, trans=0x1485040, msg=0x1489cb0) at rpcsvc.c:514
        actor = 0x7f26222bee68
        req = 0x7f2621d27214
        ret = -1
        port = 795
        is_unix = _gf_false
        unprivileged = _gf_false
        __FUNCTION__ = "rpcsvc_handle_rpc_call"
#23 0x00007f262653644c in rpcsvc_notify (trans=0x1485040, mydata=0x13b4b30, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1489cb0) at rpcsvc.c:610
        ret = -1
        msg = 0x1489cb0
        new_trans = 0x0
        svc = 0x13b4b30
        listener = 0x0
        __FUNCTION__ = "rpcsvc_notify"
#24 0x00007f262653bda8 in rpc_transport_notify (this=0x1485040, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1489cb0) at rpc-transport.c:498
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#25 0x00007f26205f7270 in socket_event_poll_in (this=0x1485040) at socket.c:1686
        ret = 0
        pollin = 0x1489cb0
#26 0x00007f26205f77f4 in socket_event_handler (fd=17, idx=8, data=0x1485040, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
        this = 0x1485040
        priv = 0x1480920
        ret = 0
        __FUNCTION__ = "socket_event_handler"
#27 0x00007f2626796030 in event_dispatch_epoll_handler (event_pool=0x139b370, events=0x147e7d0, i=1) at event.c:794
        event_data = 0x147e7e0
        handler = 0x7f26205f75d7 <socket_event_handler>
        data = 0x1485040
        idx = 8
        ret = -1
        __FUNCTION__ = "event_dispatch_epoll_handler"
#28 0x00007f2626796253 in event_dispatch_epoll (event_pool=0x139b370) at event.c:856
        events = 0x147e7d0
        size = 2
        i = 1
        ret = 0
        __FUNCTION__ = "event_dispatch_epoll"
#29 0x00007f26267965de in event_dispatch (event_pool=0x139b370) at event.c:956
        ret = -1
        __FUNCTION__ = "event_dispatch"
#30 0x0000000000407dbd in main (argc=7, argv=0x7fff03b28b18) at glusterfsd.c:1611
        ctx = 0x1383010
        ret = 0
        __FUNCTION__ = "main"
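Frames #0 to #3 show the standard failure path of assert(): __assert_fail() calls abort(), which raises SIGABRT, the "signal 6, Aborted" reported by gdb. Frames #4 and #5 show read_child=-1 being passed into afr_get_call_child(). The sketch below reproduces only that mechanism in isolation. check_read_child() and termination_signal() are hypothetical helpers, and the asserted condition is an assumption modeled on read_child=-1, because the actual expression checked at afr-common.c:670 is not quoted in this report.

```c
#define _POSIX_C_SOURCE 200809L
#undef NDEBUG /* keep assert() active for this demonstration */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check modeled on frame #4 (read_child=-1); the real
 * expression at afr-common.c:670 is not shown in the report. */
static void check_read_child(int read_child)
{
    assert(read_child >= 0);
}

/* Runs check_read_child() in a forked child process and returns the
 * signal that terminated it, 0 if it exited normally, or -1 if fork()
 * or waitpid() failed. */
static int termination_signal(int read_child)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* On failure, assert() prints a diagnostic and calls abort(),
         * which raises SIGABRT in this child. */
        check_read_child(read_child);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With these helpers, termination_signal(-1) reports SIGABRT (signal 6 on Linux) and termination_signal(0) reports 0, mirroring how an invalid read_child turned into a process abort rather than an error reply.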


Version-Release number of selected component (if applicable):
mainline

How reproducible:


Steps to Reproduce:
1. Create a distributed-replicate volume and start it.
2. Create GlusterFS and NFS mounts of the volume from client1.
3. Run "dd if=/dev/zero of=gfsf1 bs=1M count=102400" from mount1.
4. Run "dd if=/dev/zero of=nfsf1 bs=1M count=102400" from mount2.
5. Run "dd if=/dev/urandom of=gfsf2 bs=1M count=102400" from mount3.
6. Run "dd if=/dev/urandom of=nfsf2 bs=1M count=102400" from mount4.
7. The combined size of the files (100 GiB each) should exceed the free space on the device.
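Each dd command in steps 3 to 6 writes bs x count = 1 MiB x 102400 = 107374182400 bytes (100 GiB), since GNU dd reads the "1M" suffix as 1048576 bytes. The four files therefore need 400 GiB in total. The helpers below are a hypothetical sketch for checking whether such a set of writes exceeds the available space. They assume every block is written in full, and free_bytes is an assumed input, for example f_bavail * f_frsize from statvfs(3) on the brick's filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes written by "dd bs=<bs_bytes> count=<count>", assuming every
 * block is written in full. */
static uint64_t dd_bytes(uint64_t bs_bytes, uint64_t count)
{
    return bs_bytes * count;
}

/* Returns true if the combined sizes do not fit in free_bytes.
 * free_bytes is a hypothetical input supplied by the caller. */
static bool writes_exceed_space(const uint64_t *sizes, size_t n,
                                uint64_t free_bytes)
{
    uint64_t total = 0;

    assert(n == 0 || sizes != NULL);
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return total > free_bytes;
}
```

For this reproduction, pass four copies of dd_bytes(1048576, 102400) together with the free space of the target device.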
  
Actual results:
The NFS server crashed when one brick of a replica pair ran out of space.

Expected results:


Additional info:

Comment 1 Anand Avati 2012-05-16 07:42:49 UTC
CHANGE: http://review.gluster.com/3332 (cluster/afr: Return EIO if read-child < 0 in inode-read fops) merged in master by Anand Avati (avati)

Comment 2 Anand Avati 2012-05-18 06:02:32 UTC
CHANGE: http://review.gluster.com/3351 (cluster/afr: Return EIO if read-child < 0 in inode-read fops) merged in release-3.3 by Vijay Bellur (vijay)
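Both changes are titled "Return EIO if read-child < 0 in inode-read fops". The sketch below illustrates that defensive pattern in isolation. It validates the read child before use and fails the fop with op_ret = -1 and op_errno = EIO, instead of reaching an assertion. struct replica_state, struct fop_reply, pick_read_child() and sketch_stat() are hypothetical names and do not reproduce the merged patch. Only the op_ret/op_errno reply convention is borrowed from GlusterFS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-inode replica state (not GlusterFS's AFR types). */
struct replica_state {
    int read_child;  /* index of the child to read from; -1 if none known */
    int child_count; /* number of replica children */
};

/* Hypothetical reply modeled on the op_ret/op_errno convention. */
struct fop_reply {
    int op_ret;   /* 0 on success, -1 on failure */
    int op_errno; /* errno value when op_ret == -1 */
};

/* Returns a usable child index, or -EIO when read_child is invalid,
 * instead of asserting that it is non-negative. */
static int pick_read_child(const struct replica_state *st)
{
    if (st->read_child < 0 || st->read_child >= st->child_count)
        return -EIO;
    return st->read_child;
}

/* Sketch of an inode-read fop (stat) that fails cleanly. */
static struct fop_reply sketch_stat(const struct replica_state *st)
{
    struct fop_reply reply = { 0, 0 };
    int child;

    /* A NULL state is a caller bug, not a runtime condition. */
    assert(st != NULL);

    child = pick_read_child(st);
    if (child < 0) {
        reply.op_ret = -1;
        reply.op_errno = -child; /* EIO */
        return reply;
    }

    /* A real translator would send the stat to `child` here and unwind
     * with that child's result; this sketch just reports success. */
    return reply;
}
```

With read_child = -1, the state visible in frames #4 and #5, sketch_stat() returns op_ret = -1 and op_errno = EIO. A caller such as the NFS GETATTR path would then receive an error instead of the process aborting.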

Comment 3 Shwetha Panduranga 2012-05-21 11:28:34 UTC
Bug is fixed. Verified on 3.3.0qa42.

