Description of problem:

After upgrading GlusterFS from 3.5.2 to 3.6.1 on an environment where two volumes (one Distributed-Replicate and one Distribute) are configured as follows:

gluster volume info

Volume Name: ingest_vol
Type: Distributed-Replicate
Volume ID: acdd2208-5ed1-4729-9d27-923c42f22e2c
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: passtorage1:/mnt/ingest/brick
Brick2: passtorage2:/mnt/ingest/brick
Brick3: passtorage3:/mnt/ingest/brick
Brick4: passtorage4:/mnt/ingest/brick
Options Reconfigured:
user.cifs: disable
performance.force-readdirp: off
cluster.extra-hash-regex: "(.*)\\.tmp"
performance.lazy-open: off
performance.strict-o-direct: on
performance.flush-behind: on
performance.read-ahead: on
performance.write-behind: on
performance.stat-prefetch: on
nfs.disable: on

Volume Name: storage_vol01
Type: Distribute
Volume ID: 946a01dd-5546-4e1a-b1c1-fd02fb5d157a
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: passtorage1:/mnt/storage/brick
Brick2: passtorage2:/mnt/storage/brick
Brick3: passtorage3:/mnt/storage/brick
Brick4: passtorage4:/mnt/storage/brick
Brick5: passtorage5:/mnt/storage/brick
Options Reconfigured:
user.cifs: disable
nfs.disable: on

the glusterd process on the machine named passtorage1 tries to allocate all the memory on the OS and gets killed by the oom-killer. Gluster had been running for two and a half days under light load before glusterd crashed.

All machines have memory as follows:

cat /proc/meminfo
MemTotal:       99025408 kB

cat /proc/swaps
Filename    Type       Size     Used   Priority
/dev/dm-1   partition  8388604  32568  -1

The following glusterd logs were gathered from this incident:

passtorage1:
[2014-12-11 22:33:59.999976] E [glusterd-mgmt.c:127:gd_mgmt_v3_collate_errors] 0-management: Locking failed on passtorage4. Please check log file for details.

passtorage2:
[2014-12-11 22:38:34.095010] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer b15935ea-4e92-42d5-9828-fedb1877a83a, in Peer in Cluster state, has disconnected from glusterd.
[2014-12-11 22:38:34.095746] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f49103e0420] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f4905d49228] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f4905cbe1c2] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4905ca9980] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7f49101b5f11] ))))) 0-management: Lock for vol ingest_vol not held
[2014-12-11 22:38:34.096080] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f49103e0420] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f4905d49228] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f4905cbe1c2] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4905ca9980] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7f49101b5f11] ))))) 0-management: Lock for vol storage_vol01 not held
[2014-12-11 22:38:34.096134] E [glusterd-utils.c:148:glusterd_lock] 0-management: Unable to get lock for uuid: c20d61b6-0b70-4cae-a941-b4e5e5168548, lock held by: 765b1cb3-354d-4dd3-9ca5-59b6d0081e13

passtorage3:
[2014-12-11 22:38:34.094573] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f04a77ef420] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f049d158228] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f049d0cd1c2] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f049d0b8980] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7f04a75c4f11] ))))) 0-management: Lock for vol ingest_vol not held
[2014-12-11 22:38:34.094759] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f04a77ef420] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f049d158228] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f049d0cd1c2] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f049d0b8980] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7f04a75c4f11] ))))) 0-management: Lock for vol storage_vol01 not held
[2014-12-11 22:38:34.094785] E [glusterd-utils.c:148:glusterd_lock] 0-management: Unable to get lock for uuid: 765b1cb3-354d-4dd3-9ca5-59b6d0081e13, lock held by: 765b1cb3-354d-4dd3-9ca5-59b6d0081e13
[2014-12-11 22:38:34.094289] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer b15935ea-4e92-42d5-9828-fedb1877a83a, in Peer in Cluster state, has disconnected from glusterd.

passtorage4:
[2014-12-11 22:33:02.912414] W [glusterd-locks.c:550:glusterd_mgmt_v3_lock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fd1648fb420] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x1ca)[0x7fd15a264baa] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0x4eb9f)[0x7fd15a1e0b9f] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x1e5)[0x7fd15a1e4005] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0xeba44)[0x7fd15a27da44] ))))) 0-management: Lock for storage_vol01 held by b15935ea-4e92-42d5-9828-fedb1877a83a
[2014-12-11 22:33:02.912468] E [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to acquire lock for storage_vol01
[2014-12-11 22:33:02.912539] E [glusterd-op-sm.c:6584:glusterd_op_sm] 0-management: handler returned: -1

passtorage5:
[2014-12-11 22:23:02.894479] W [glusterd-locks.c:550:glusterd_mgmt_v3_lock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fd20e981420] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x1ca)[0x7fd204cebbaa] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0x4eb9f)[0x7fd204c67b9f] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x1e5)[0x7fd204c6b005] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0xeba44)[0x7fd204d04a44] ))))) 0-management: Lock for ingest_vol held by b15935ea-4e92-42d5-9828-fedb1877a83a
[2014-12-11 22:23:02.894524] E [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to acquire lock for ingest_vol
[2014-12-11 22:23:02.894594] E [glusterd-op-sm.c:6584:glusterd_op_sm] 0-management: handler returned: -1
[2014-12-11 22:38:34.094271] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer b15935ea-4e92-42d5-9828-fedb1877a83a, in Peer in Cluster state, has disconnected from glusterd.
[2014-12-11 22:38:34.094732] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fd20e981420] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7fd204ceb228] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7fd204c601c2] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fd204c4b980] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7fd20e756f11] ))))) 0-management: Lock for vol ingest_vol not held
[2014-12-11 22:38:34.095098] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fd20e981420] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7fd204ceb228] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7fd204c601c2] (--> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fd204c4b980] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7fd20e756f11] ))))) 0-management: Lock for vol storage_vol01 not held
[2014-12-11 22:38:34.095152] E [glusterd-utils.c:148:glusterd_lock] 0-management: Unable to get lock for uuid: b35df837-1761-41dc-8e27-8d99c75dbe79, lock held by: 765b1cb3-354d-4dd3-9ca5-59b6d0081e13

Version-Release number of selected component (if applicable):
glusterd 3.6.1

How reproducible:
Not sure how to reproduce; the system was running for three days and then one glusterd process crashed.

Steps to Reproduce:
1. Upgrade GlusterFS to the 3.6.1 release.
2. Run Gluster until one glusterd process gets killed.

Actual results:
The glusterd process gets killed by the oom-killer after Gluster has been running for some time.

Expected results:
glusterd does not try to allocate all the memory on the OS but runs with moderate memory consumption.

Additional info:
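Until a fix is in place, the leak's growth can be caught before the oom-killer fires by periodically sampling glusterd's resident set size. A minimal sketch follows; the `rss_kb` helper is hypothetical (not a Gluster tool) and simply reads `VmRSS` from `/proc/<pid>/status`:

```shell
# Sketch: sample glusterd's resident set size so leak growth is visible
# before the oom-killer fires. rss_kb is an illustrative helper, not part
# of Gluster; it reads the VmRSS field from /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

PID=$(pgrep -x glusterd 2>/dev/null | head -n 1 || true)
if [ -n "$PID" ]; then
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') glusterd pid=$PID VmRSS=$(rss_kb "$PID") kB"
else
    echo "glusterd is not running on this node"
fi
```

Run from cron every few minutes, the resulting log shows whether VmRSS climbs steadily, which would match the oom-killer outcome reported above.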
Could you please attach the core file of the glusterd instance that crashed? We have also identified an area of code in the locking/unlocking path that leads to memory leaks, which we are planning to fix in 3.6.2. The fix (http://review.gluster.org/#/c/9269/) is already available in the master branch.
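Even without a core file, a statedump from the running glusterd can show which allocations are growing: GlusterFS daemons write a statedump on SIGUSR1, and the dump includes memory accounting. A minimal sketch, assuming a live node (the dump directory is typically /var/run/gluster but may vary by build):

```shell
# Sketch: ask a running glusterd for a statedump via SIGUSR1. The dump
# includes mem-pool / allocation accounting, useful when no core file
# exists. The /var/run/gluster path may vary by build and distribution.
PID=$(pgrep -x glusterd 2>/dev/null | head -n 1 || true)
if [ -n "$PID" ]; then
    kill -USR1 "$PID"
    sleep 1
    ls -t /var/run/gluster/ 2>/dev/null | head -n 3   # newest dumps first
else
    echo "glusterd is not running on this node"
fi
```

Comparing two dumps taken some hours apart should make the leaking allocation type stand out.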
Hi, unfortunately a core file is not available from the crash; at that moment core files were disabled on the running system.
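For future incidents, core dumps can be re-enabled ahead of time. A minimal sketch, assuming a typical Linux setup (the /var/crash path and pattern are illustrative defaults, not Gluster-specific):

```shell
# Sketch: re-enable core dumps so a future glusterd crash leaves a core
# file behind. The /var/crash directory and naming pattern are examples.
ulimit -c unlimited                        # applies to this shell and its children
if [ "$(id -u)" -eq 0 ]; then
    mkdir -p /var/crash
    # %e = executable name, %p = pid, %t = unix timestamp
    echo '/var/crash/core.%e.%p.%t' > /proc/sys/kernel/core_pattern
fi
echo "core limit is now: $(ulimit -c)"
```

Note that glusterd itself only picks up the new limit when it is (re)started under it, e.g. from a shell with `ulimit -c unlimited` set or via the init system's limits configuration.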
This is not a security bug, and it is not going to be fixed in 3.6.x; see http://www.gluster.org/pipermail/gluster-users/2016-July/027682.html
If the issue persists in the latest releases, please feel free to clone this bug.