Bug 811206

Summary: Unable to allocate memory
Product: [Community] GlusterFS Reporter: Nakai Kanako <nakai.kanako>
Component: cliAssignee: Amar Tumballi <amarts>
Status: CLOSED WORKSFORME QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 3.2.3CC: gluster-bugs, vraman, yasuya.ichikawa
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-09-18 08:45:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nakai Kanako 2012-04-10 12:24:14 UTC
Description of problem:

we executed following command every 5 min. (for analyze Glusterfs performance)

extend glusterfs_profile /usr/local/glusterfs/sbin/gluster volume profile {VOLUME_NAME} info

Got following error message.

1.(server A)
[2012-04-10 19:33:29.305426] W [rpc-transport.c:605:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2012-04-10 19:33:29.718660] I [cli-rpc-ops.c:407:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2012-04-10 19:33:29.718823] I [cli-rpc-ops.c:586:gf_cli3_1_get_volume_cbk] 0-: Returning: 0
[2012-04-10 19:33:29.718963] I [cli-rpc-ops.c:407:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2012-04-10 19:33:29.719112] I [cli-rpc-ops.c:586:gf_cli3_1_get_volume_cbk] 0-: Returning: 0
[2012-04-10 19:33:29.719203] I [cli-rpc-ops.c:407:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2012-04-10 19:33:29.719230] I [cli-rpc-ops.c:586:gf_cli3_1_get_volume_cbk] 0-: Returning: 0
[2012-04-10 19:33:29.719237] I [input.c:46:cli_batch] 0-: Exiting with: 0
[2012-04-10 19:35:05.34008] W [rpc-transport.c:605:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2012-04-10 19:35:05.105810] I [input.c:46:cli_batch] 0-: Exiting with: -1
[2012-04-10 19:35:10.43057] W [rpc-transport.c:605:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2012-04-10 19:35:10.115184] I [input.c:46:cli_batch] 0-: Exiting with: -1
[2012-04-10 19:35:24.737207] W [rpc-transport.c:605:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2012-04-10 19:35:24.818940] W [dict.c:2458:dict_unserialize] (-->/usr/local/glusterfs-3.2.3/lib/libgfrpc.so.0(rpc_clnt_notify+0x8d) [0x2abbd11d95ad] (-->/usr/local/glusterfs-3.2.3/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x2abbd11d93b2] (-->/usr/local/glusterfs/sbin/gluster(gf_cli3_1_profile_volume_cbk+0x138) [0x40f088]))) 0-dict: buf is null!
[2012-04-10 19:35:24.818976] E [cli-rpc-ops.c:3045:gf_cli3_1_profile_volume_cbk] 0-: Unable to allocate memory  <<<<<********here
[2012-04-10 19:35:24.819115] I [input.c:46:cli_batch] 0-: Exiting with: -1

2.(server B)
[2012-04-10 14:16:02.832940] E [glusterd-utils.c:235:glusterd_lock] 0-glusterd: Unable to get lock for uuid: d1e71e47-64f5-4fed-b6c5-0990e67cf15a, lock held by: 46d0d13c-459b-405a-b38d-3479bd60021e
[2012-04-10 14:16:02.832962] E [glusterd-handler.c:415:glusterd_op_txn_begin] 0-glusterd: Unable to acquire local lock, ret: -1

Version-Release number of selected component (if applicable):

version 3.2.3

How reproducible:

Steps to Reproduce:
1.Execute following command.

 extend glusterfs_profile /usr/local/glusterfs/sbin/gluster volume profile {VOLUME_NAME} info

2.  check your glusterfs log message

tail -f /usr/local/glusterfs/var/log/glusterfs/cli.log

1.(server A)
[2012-04-10 19:35:24.818976] E [cli-rpc-ops.c:3045:gf_cli3_1_profile_volume_cbk] 0-: Unable to allocate memory

2.(server B)
[2012-04-10 14:16:02.832940] E [glusterd-utils.c:235:glusterd_lock] 0-glusterd: Unable to get lock for uuid: d1e71e47-64f5-4fed-b6c5-0990e67cf15a, lock held by: 46d0d13c-459b-405a-b38d-3479bd60021e
[2012-04-10 14:16:02.832962] E [glusterd-handler.c:415:glusterd_op_txn_begin] 0-glusterd: Unable to acquire local lock, ret: -1
  
Actual results:
some error happend

Expected results:
show status

Additional info:
-We couldn't mount glusterfs from any clients.

Comment 1 Amar Tumballi 2012-04-11 02:46:34 UTC
can you paste the output of the 'gluster volume info' ?

Comment 2 Nakai Kanako 2012-04-11 04:43:23 UTC
# gluster volume info

Volume Name: xxxxxxx02
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: xxxxstorage102c.xx.xxxxxxx.co.jp:/brick02
Brick2: xxxxstorageb102c.xx.xxxxxxx.co.jp:/brick02
Options Reconfigured:
network.frame-timeout: 30
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.dump-fd-stats: off
performance.stat-prefetch: off
performance.io-thread-count: 64
performance.cache-size: 6GB
network.ping-timeout: 5
nfs.disable: on

Volume Name: xxxxxxx01
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: xxxxstorage101c.xx.xxxxxxx.co.jp:/brick01
Brick2: xxxxstorageb101c.xx.xxxxxxx.co.jp:/brick01
Options Reconfigured:
network.frame-timeout: 30
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.dump-fd-stats: off
performance.stat-prefetch: off
performance.io-thread-count: 64
performance.cache-size: 6GB
network.ping-timeout: 5
nfs.disable: on

Comment 3 yasuya.ichikawa 2012-04-17 00:59:14 UTC
I will send you a backtrace log.

(*) In case of normal processing (Server-A, Server-B)
(gdb) bt
#0  dict_unserialize (orig_buf=0x0, size=0, fill=0x7fffffffe118) at dict.c:2443
#1  0x000000000040f088 in gf_cli3_1_profile_volume_cbk (req=<value optimized out>, iov=0x2aaaabf5a064,
    count=<value optimized out>, myframe=<value optimized out>) at cli-rpc-ops.c:3039
#2  0x00002aaaaaf523b2 in rpc_clnt_handle_reply (clnt=0x622780, pollin=<value optimized out>) at rpc-clnt.c:736
#3  0x00002aaaaaf525ad in rpc_clnt_notify (trans=<value optimized out>, mydata=0x6227b0, event=<value optimized out>,
    data=0x36253529e8) at rpc-clnt.c:849
#4  0x00002aaaaaf4d287 in rpc_transport_notify (this=0x0, event=4294959384, data=0x36253529e8) at rpc-transport.c:918
#5  0x00002aaaac21feef in socket_event_poll_in (this=0x622940) at socket.c:1647
#6  0x00002aaaac220098 in socket_event_handler (fd=<value optimized out>, idx=0, data=0x622940, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:1762
#7  0x00002aaaaaae9391 in event_dispatch_epoll_handler (event_pool=0x621350) at event.c:794
#8  event_dispatch_epoll (event_pool=0x621350) at event.c:856
#9  0x0000000000407602 in main (argc=5, argv=0x7fffffffea28) at cli.c:648

(*) In case of normal processing (Server-C, Server-D)
(gdb) bt
#0  dict_unserialize (orig_buf=0x8069c48 "", size=2123, fill=0xbfffcc48) at dict.c:2443
#1  0x0805608e in gf_cli3_1_profile_volume_cbk (req=0xb720e014, iov=0xb720e034, count=1, myframe=0xb748f088) at cli-rpc-ops.c:3039
#2  0x00f4d132 in rpc_clnt_handle_reply (clnt=0x8066de0, pollin=0x8068948) at rpc-clnt.c:736
#3  0x00f4d352 in rpc_clnt_notify (trans=0x8066ed0, mydata=0x8066dfc, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x8068b40) at rpc-clnt.c:849
#4  0x00f47421 in rpc_transport_notify (this=0x8068b40, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x8068948) at rpc-transport.c:918
#5  0x0011bc40 in socket_event_poll_in (this=0x8066ed0) at socket.c:1647
#6  0x0011bdc2 in socket_event_handler (fd=7, idx=0, data=0x8066ed0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1762
#7  0x007acae7 in event_dispatch_epoll_handler (event_pool=0x80661e8) at event.c:794
#8  event_dispatch_epoll (event_pool=0x80661e8) at event.c:856
#9  0x007ab881 in event_dispatch (event_pool=0x8068b40) at event.c:956
#10 0x0804dd37 in main (argc=5, argv=0xbfffd494) at cli.c:648

(*) Comment
It seems orig_buf is null.
It seems not to get a fd at #6.
Please give us some advice for more investigation.

Comment 4 Amar Tumballi 2012-04-17 09:45:04 UTC
some of the memory tuning suggestions to check...

* reduce io-cache cache-size to 2GB or lesser (testing purpose)
* reduce the io-threads count to 8


The logs anyways suggests that the system's memory manager (malloc/calloc/free) itself returning ENOMEM (Failed to Allocate Memory) error.

Other good feedback would be to take 'statedump' of glusterfs/d (by sending signal SIGUSR1 to the process, which will be captured at /tmp/glusterdump.<pid>)

Comment 5 Nakai Kanako 2012-05-11 14:55:12 UTC
Hello.

Thank you for your advice.

This error is still continue.
We want to challenge reduce setting value but we can not be changed currently gluster's setting because of this error.
So we restarted OS and glusterd (server A) .
After that we could mount glusterfs but we still couldn't modify gluster's setting.

If we restart all server (at same time), we can solved this ?

Comment 6 Amar Tumballi 2012-09-18 08:45:47 UTC
Nakai, sorry for delay in getting back to you. Can you please check if this error happens now for you with 3.3.0 release?

I will be closing this bug as WORKSFORME, as in our testing in recent times, we never faced this issue. Re-open the bug if it still happens for you.

Comment 7 Nakai Kanako 2012-09-19 04:30:50 UTC
Hi,

We restarted all of the configuration node.
After that GFS didn't output same error messages.
Maybe resolved.
But we didn't retry execute profile command every 5 min,
Because I thought this profile command include some memory issue. 

Anyway thank you for your support.