Bug 800884 - nfs server crashed when "no more space was left" on volume during write operation
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Assigned To: Pranith Kumar K
Depends On:
Blocks: 817967
Reported: 2012-03-07 07:57 EST by Shwetha Panduranga
Modified: 2015-12-01 11:45 EST (History)

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:15:26 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Shwetha Panduranga 2012-03-07 07:57:39 EST
Description of problem:
Core was generated by `/usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/'.
Program terminated with signal 6, Aborted.
#0  0x0000003af1a32905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x0000003af1a32905 in raise () from /lib64/libc.so.6
#1  0x0000003af1a340e5 in abort () from /lib64/libc.so.6
#2  0x0000003af1a2b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003af1a2ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f2622fee42e in afr_get_call_child (this=0x13aa4d0, child_up=0x14864f0 "\001\001\r", <incomplete sequence \360\255\272>, read_child=-1, 
    fresh_children=0x148ef70, call_child=0x7fff03b2738c, last_index=0x7f261d796b6c) at afr-common.c:670
#5  0x00007f2622fa7ded in afr_stat (frame=0x7f262559ed64, this=0x13aa4d0, loc=0x7f262052ffcc) at afr-inode-read.c:257
#6  0x00007f2622d746bd in dht_stat (frame=0x7f26255a0034, this=0x13ab190, loc=0x7f262052ffcc) at dht-inode-read.c:302
#7  0x00007f2622b1fff6 in wb_stat (frame=0x7f26255a4364, this=0x13ac470, loc=0x7f262052ffcc) at write-behind.c:753
#8  0x00007f2626775e74 in default_stat (frame=0x7f26255ab4f0, this=0x13ad790, loc=0x7f262052ffcc) at defaults.c:1174
#9  0x00007f2626775e74 in default_stat (frame=0x7f262559ab8c, this=0x13ae990, loc=0x7f262052ffcc) at defaults.c:1174
#10 0x00007f2626775e74 in default_stat (frame=0x7f26255af36c, this=0x13afbd0, loc=0x7f262052ffcc) at defaults.c:1174
#11 0x00007f26222d1ca3 in io_stats_stat (frame=0x7f26255a1100, this=0x13b0eb0, loc=0x7f262052ffcc) at io-stats.c:1869
#12 0x00007f2622078d48 in nfs_fop_stat (nfsx=0x13b2410, xl=0x13b0eb0, nfu=0x7fff03b27a20, loc=0x7f262052ffcc, cbk=0x7f2622089c35 <nfs3svc_getattr_stat_cbk>, 
    local=0x7f262052fb94) at nfs-fops.c:432
#13 0x00007f26220821bb in nfs_stat (nfsx=0x13b2410, xl=0x13b0eb0, nfu=0x7fff03b27a20, pathloc=0x7f262052ffcc, cbk=0x7f2622089c35 <nfs3svc_getattr_stat_cbk>, 
    local=0x7f262052fb94) at nfs-generics.c:72
#14 0x00007f2622089f27 in nfs3_getattr_resume (carg=0x7f262052fb94) at nfs3.c:760
#15 0x00007f26220a5474 in nfs3_fh_resolve_inode_done (cs=0x7f262052fb94, inode=0x7f26210700e0) at nfs3-helpers.c:3545
#16 0x00007f26220a6951 in nfs3_fh_resolve_inode (cs=0x7f262052fb94) at nfs3-helpers.c:3971
#17 0x00007f26220a69e5 in nfs3_fh_resolve_resume (cs=0x7f262052fb94) at nfs3-helpers.c:4003
#18 0x00007f26220a6c10 in nfs3_fh_resolve_root (cs=0x7f262052fb94) at nfs3-helpers.c:4057
#19 0x00007f26220a6e50 in nfs3_fh_resolve_and_resume (cs=0x7f262052fb94, fh=0x7fff03b285a0, entry=0x0, resum_fn=0x7f2622089d46 <nfs3_getattr_resume>)
    at nfs3-helpers.c:4104
#20 0x00007f262208a46b in nfs3_getattr (req=0x7f2621d27214, fh=0x7fff03b285a0) at nfs3.c:801
#21 0x00007f262208a5c8 in nfs3svc_getattr (req=0x7f2621d27214) at nfs3.c:835
#22 0x00007f26265360a9 in rpcsvc_handle_rpc_call (svc=0x13b4b30, trans=0x1485040, msg=0x1489cb0) at rpcsvc.c:514
#23 0x00007f262653644c in rpcsvc_notify (trans=0x1485040, mydata=0x13b4b30, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1489cb0) at rpcsvc.c:610
#24 0x00007f262653bda8 in rpc_transport_notify (this=0x1485040, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1489cb0) at rpc-transport.c:498
#25 0x00007f26205f7270 in socket_event_poll_in (this=0x1485040) at socket.c:1686
#26 0x00007f26205f77f4 in socket_event_handler (fd=17, idx=8, data=0x1485040, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#27 0x00007f2626796030 in event_dispatch_epoll_handler (event_pool=0x139b370, events=0x147e7d0, i=1) at event.c:794
#28 0x00007f2626796253 in event_dispatch_epoll (event_pool=0x139b370) at event.c:856
#29 0x00007f26267965de in event_dispatch (event_pool=0x139b370) at event.c:956
#30 0x0000000000407dbd in main (argc=7, argv=0x7fff03b28b18) at glusterfsd.c:1611
(gdb) bt full
#0  0x0000003af1a32905 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003af1a340e5 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000003af1a2b9be in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000003af1a2ba80 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f2622fee42e in afr_get_call_child (this=0x13aa4d0, child_up=0x14864f0 "\001\001\r", <incomplete sequence \360\255\272>, read_child=-1, 
    fresh_children=0x148ef70, call_child=0x7fff03b2738c, last_index=0x7f261d796b6c) at afr-common.c:670
        ret = 0
        priv = 0x0
        i = 0
        __PRETTY_FUNCTION__ = "afr_get_call_child"
        __FUNCTION__ = "afr_get_call_child"
#5  0x00007f2622fa7ded in afr_stat (frame=0x7f262559ed64, this=0x13aa4d0, loc=0x7f262052ffcc) at afr-inode-read.c:257
        priv = 0x13e77e0
        local = 0x7f261d795b9c
        children = 0x13e7c80
        call_child = 0
        op_errno = 0
        read_child = -1
        ret = 0
        __FUNCTION__ = "afr_stat"
#6  0x00007f2622d746bd in dht_stat (frame=0x7f26255a0034, this=0x13ab190, loc=0x7f262052ffcc) at dht-inode-read.c:302
        _new = 0x7f262559ed64
        old_THIS = 0x13ab190
        tmp_cbk = 0x7f2622d72fac <dht_file_attr_cbk>
        subvol = 0x13aa4d0
        op_errno = -1
        local = 0x7f261dc407ac
        layout = 0x13e2900
        i = 0
        call_cnt = 0
        __FUNCTION__ = "dht_stat"
#7  0x00007f2622b1fff6 in wb_stat (frame=0x7f26255a4364, this=0x13ac470, loc=0x7f262052ffcc) at write-behind.c:753
        _new = 0x7f26255a0034
        old_THIS = 0x13ac470
        tmp_cbk = 0x7f2622b1f495 <wb_stat_cbk>
        file = 0x0
        iter_fd = 0x0
        local = 0x13e1278
        tmp_file = 0
        stub = 0x0
        request = 0x0
        ret = -1
        op_errno = 22
        __PRETTY_FUNCTION__ = "wb_stat"
        __FUNCTION__ = "wb_stat"
#8  0x00007f2626775e74 in default_stat (frame=0x7f26255ab4f0, this=0x13ad790, loc=0x7f262052ffcc) at defaults.c:1174
        _new = 0x7f26255a4364
        old_THIS = 0x13ad790
        tmp_cbk = 0x7f262676674b <default_stat_cbk>
        __FUNCTION__ = "default_stat"
#9  0x00007f2626775e74 in default_stat (frame=0x7f262559ab8c, this=0x13ae990, loc=0x7f262052ffcc) at defaults.c:1174
        _new = 0x7f26255ab4f0
        old_THIS = 0x13ae990
        tmp_cbk = 0x7f262676674b <default_stat_cbk>
        __FUNCTION__ = "default_stat"
#10 0x00007f2626775e74 in default_stat (frame=0x7f26255af36c, this=0x13afbd0, loc=0x7f262052ffcc) at defaults.c:1174
        _new = 0x7f262559ab8c
        old_THIS = 0x13afbd0
        tmp_cbk = 0x7f262676674b <default_stat_cbk>
        __FUNCTION__ = "default_stat"
#11 0x00007f26222d1ca3 in io_stats_stat (frame=0x7f26255a1100, this=0x13b0eb0, loc=0x7f262052ffcc) at io-stats.c:1869
        _new = 0x7f26255af36c
        old_THIS = 0x13b0eb0
        tmp_cbk = 0x7f26222ca9bd <io_stats_stat_cbk>
        __FUNCTION__ = "io_stats_stat"
#12 0x00007f2622078d48 in nfs_fop_stat (nfsx=0x13b2410, xl=0x13b0eb0, nfu=0x7fff03b27a20, loc=0x7f262052ffcc, cbk=0x7f2622089c35 <nfs3svc_getattr_stat_cbk>, 
    local=0x7f262052fb94) at nfs-fops.c:432
        _new = 0x7f26255a1100
        old_THIS = 0x13b2410
        tmp_cbk = 0x7f2622078800 <nfs_fop_stat_cbk>
        frame = 0x7f26253a1da4
        ret = -14
        nfl = 0x7f2622041c90
        __FUNCTION__ = "nfs_fop_stat"
#13 0x00007f26220821bb in nfs_stat (nfsx=0x13b2410, xl=0x13b0eb0, nfu=0x7fff03b27a20, pathloc=0x7f262052ffcc, cbk=0x7f2622089c35 <nfs3svc_getattr_stat_cbk>, 
    local=0x7f262052fb94) at nfs-generics.c:72
        ret = -14
#14 0x00007f2622089f27 in nfs3_getattr_resume (carg=0x7f262052fb94) at nfs3.c:760
        stat = NFS3ERR_SERVERFAULT
        ret = -14
        nfu = {uid = 0, gids = {0, 0, 1, 2, 3, 4, 6, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0}, ngrps = 8, lk_owner = {len = 0, data = '\000' <repeats 1023 times>}}
        cs = 0x7f262052fb94
        __FUNCTION__ = "nfs3_getattr_resume"
#15 0x00007f26220a5474 in nfs3_fh_resolve_inode_done (cs=0x7f262052fb94, inode=0x7f26210700e0) at nfs3-helpers.c:3545
        ret = 0
        __FUNCTION__ = "nfs3_fh_resolve_inode_done"
#16 0x00007f26220a6951 in nfs3_fh_resolve_inode (cs=0x7f262052fb94) at nfs3-helpers.c:3971
        inode = 0x7f26210700e0
        ret = -14
        __FUNCTION__ = "nfs3_fh_resolve_inode"
#17 0x00007f26220a69e5 in nfs3_fh_resolve_resume (cs=0x7f262052fb94) at nfs3-helpers.c:4003
        ret = -14
#18 0x00007f26220a6c10 in nfs3_fh_resolve_root (cs=0x7f262052fb94) at nfs3-helpers.c:4057
        ret = -14
        nfu = {uid = 0, gids = {0 <repeats 17 times>}, ngrps = 0, lk_owner = {len = 0, data = '\000' <repeats 1023 times>}}
        __FUNCTION__ = "nfs3_fh_resolve_root"
#19 0x00007f26220a6e50 in nfs3_fh_resolve_and_resume (cs=0x7f262052fb94, fh=0x7fff03b285a0, entry=0x0, resum_fn=0x7f2622089d46 <nfs3_getattr_resume>)
    at nfs3-helpers.c:4104
        ret = -14
#20 0x00007f262208a46b in nfs3_getattr (req=0x7f2621d27214, fh=0x7fff03b285a0) at nfs3.c:801
        vol = 0x13b0eb0
        stat = NFS3ERR_SERVERFAULT
        ret = -14
        nfs3 = 0x13d0910
        cstate = 0x7f262052fb94
        __FUNCTION__ = "nfs3_getattr"
#21 0x00007f262208a5c8 in nfs3svc_getattr (req=0x7f2621d27214) at nfs3.c:835
        fh = {ident = ":O", exportid = "\027G\205\377r\273@\241\246\207\277X\321aK&", gfid = "\235,g#\353\366@a\247\324\340\215\000ƥ", <incomplete sequence \310>, 
          hashcount = 1, entryhash = {257, 0 <repeats 13 times>}}
        args = {object = {data = {data_len = 38, 
              data_val = 0x7fff03b285a0 ":O\027G\205\377r\273@\241\246\207\277X\321aK&\235,g#\353\366@a\247\324", <incomplete sequence \340\215>}}}
        ret = -1
        __FUNCTION__ = "nfs3svc_getattr"
#22 0x00007f26265360a9 in rpcsvc_handle_rpc_call (svc=0x13b4b30, trans=0x1485040, msg=0x1489cb0) at rpcsvc.c:514
        actor = 0x7f26222bee68
        req = 0x7f2621d27214
        ret = -1
        port = 795
        is_unix = _gf_false
        unprivileged = _gf_false
        __FUNCTION__ = "rpcsvc_handle_rpc_call"
#23 0x00007f262653644c in rpcsvc_notify (trans=0x1485040, mydata=0x13b4b30, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1489cb0) at rpcsvc.c:610
        ret = -1
        msg = 0x1489cb0
        new_trans = 0x0
        svc = 0x13b4b30
        listener = 0x0
        __FUNCTION__ = "rpcsvc_notify"
#24 0x00007f262653bda8 in rpc_transport_notify (this=0x1485040, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1489cb0) at rpc-transport.c:498
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#25 0x00007f26205f7270 in socket_event_poll_in (this=0x1485040) at socket.c:1686
        ret = 0
        pollin = 0x1489cb0
#26 0x00007f26205f77f4 in socket_event_handler (fd=17, idx=8, data=0x1485040, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
        this = 0x1485040
        priv = 0x1480920
        ret = 0
        __FUNCTION__ = "socket_event_handler"
#27 0x00007f2626796030 in event_dispatch_epoll_handler (event_pool=0x139b370, events=0x147e7d0, i=1) at event.c:794
        event_data = 0x147e7e0
        handler = 0x7f26205f75d7 <socket_event_handler>
        data = 0x1485040
        idx = 8
        ret = -1
        __FUNCTION__ = "event_dispatch_epoll_handler"
#28 0x00007f2626796253 in event_dispatch_epoll (event_pool=0x139b370) at event.c:856
        events = 0x147e7d0
        size = 2
        i = 1
        ret = 0
        __FUNCTION__ = "event_dispatch_epoll"
#29 0x00007f26267965de in event_dispatch (event_pool=0x139b370) at event.c:956
        ret = -1
        __FUNCTION__ = "event_dispatch"
#30 0x0000000000407dbd in main (argc=7, argv=0x7fff03b28b18) at glusterfsd.c:1611
        ctx = 0x1383010
        ret = 0
        __FUNCTION__ = "main"
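Frames #0 to #4 above show the usual path of a failed assert(): __assert_fail calls abort(), which raises signal 6 (SIGABRT) and ends the process. The following is a minimal C sketch of that mechanism only. It assumes the failed check was on read_child being non-negative; the actual condition at afr-common.c:670 is not shown in this report. The names pick_call_child and signal_from_check are hypothetical and are not GlusterFS functions.

```c
#undef NDEBUG            /* keep assert() active in this sketch */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the check that fired in frame #4.
 * Assumption: the real assertion rejects a negative read_child. */
static void pick_call_child(int read_child)
{
    assert(read_child >= 0);
}

/* Run the check in a forked child so the caller survives the abort.
 * Returns the signal that killed the child, 0 if the child exited
 * normally, or -1 if fork() or waitpid() failed. */
int signal_from_check(int read_child)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pick_call_child(read_child);  /* aborts when read_child < 0 */
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling signal_from_check(-1) mirrors the read_child=-1 argument seen in frame #4: the child is killed by SIGABRT, just as the NFS server process was. A non-negative value lets the child exit normally.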


Version-Release number of selected component (if applicable):
mainline

How reproducible:


Steps to Reproduce:
1. Create a distribute-replicate volume and start it.
2. Create gluster and NFS mounts from client1.
3. Run "dd if=/dev/zero of=gfsf1 bs=1M count=102400" from mount1.
4. Run "dd if=/dev/zero of=nfsf1 bs=1M count=102400" from mount2.
5. Run "dd if=/dev/urandom of=gfsf2 bs=1M count=102400" from mount3.
6. Run "dd if=/dev/urandom of=nfsf2 bs=1M count=102400" from mount4.
7. The total size of the files created should exceed the space available on the device.
  
Actual results:
The NFS server crashed when no space was left on one of the replicate pairs.

Expected results:


Additional info:
Comment 1 Anand Avati 2012-05-16 03:42:49 EDT
CHANGE: http://review.gluster.com/3332 (cluster/afr: Return EIO if read-child < 0 in inode-read fops) merged in master by Anand Avati (avati@redhat.com)
Comment 2 Anand Avati 2012-05-18 02:02:32 EDT
CHANGE: http://review.gluster.com/3351 (cluster/afr: Return EIO if read-child < 0 in inode-read fops) merged in release-3.3 by Vijay Bellur (vijay@gluster.com)
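The change title in Comments 1 and 2 ("Return EIO if read-child < 0 in inode-read fops") suggests a guard placed before the child selection, so that a negative read child yields an error instead of reaching the assertion. Below is a minimal C sketch of that idea, not the merged patch. The function stat_pick_child and its signature are hypothetical, and the real fops unwind the call frame with the error rather than returning an int.

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for the start of an inode-read fop
 * such as afr_stat. On success it stores the chosen child in *call_child
 * and returns 0. When there is no valid read child it leaves *call_child
 * untouched, sets *op_errno to EIO, and returns -1. */
int stat_pick_child(int read_child, int *call_child, int *op_errno)
{
    if (read_child < 0) {
        /* Fail the operation instead of asserting further down. */
        *op_errno = EIO;
        return -1;
    }

    *call_child = read_child;
    *op_errno = 0;
    return 0;
}
```

With read_child = -1, as recorded in frame #5 of the backtrace, this sketch reports EIO to the caller, which an NFS server can turn into an error reply for the client instead of aborting.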
Comment 3 Shwetha Panduranga 2012-05-21 07:28:34 EDT
Bug is fixed. Verified on 3.3.0qa42.
