| Summary: | [glusterfs-3.3.0qa22]: glusterd entering an infinite loop and using 100% CPU | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Raghavendra Bhat <rabhat> |
| Component: | glusterd | Assignee: | krishnan parthasarathi <kparthas> |
| Status: | CLOSED WORKSFORME | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | mainline | CC: | gluster-bugs, nsathyan, vbellur |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-04-16 13:16:05 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Johnny, could you check if this issue continues to happen?
Description of problem:
A 4-peer cluster in which a 2x2 distributed-replicate volume is running. glusterd on one of the peers has entered an infinite loop while iterating over the list of peers (it never exits the loop), causing almost 100% CPU usage.

This is the backtrace when gdb was attached to the running glusterd process:

0x00000034c232b500 in __strncasecmp_l_sse42 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64
(gdb) bt
#0  0x00000034c232b500 in __strncasecmp_l_sse42 () from /lib64/libc.so.6
#1  0x00007f038a916f29 in glusterd_friend_find_by_hostname (hoststr=0x1ddd3b0 "10.1.11.130", peerinfo=0x7ffff003d180) at ../../../../../xlators/mgmt/glusterd/src/glusterd-utils.c:3608
#2  0x00007f038a9173b1 in glusterd_hostname_to_uuid (hostname=0x1ddd3b0 "10.1.11.130", uuid=0x1dde7c0 "\274+\370\270\250\343Fd\212\235\036JP<\257\351\311]") at ../../../../../xlators/mgmt/glusterd/src/glusterd-utils.c:3683
#3  0x00007f038a90cb29 in glusterd_resolve_brick (brickinfo=0x1ddd3b0) at ../../../../../xlators/mgmt/glusterd/src/glusterd-utils.c:665
#4  0x00007f038a9495bb in glusterd_op_stage_start_volume (dict=0x1df16b0, op_errstr=0x7ffff003dad0) at ../../../../../xlators/mgmt/glusterd/src/glusterd-volume-ops.c:790
#5  0x00007f038a907cd0 in glusterd_op_stage_validate (op=GD_OP_START_VOLUME, dict=0x1df16b0, op_errstr=0x7ffff003dad0, rsp_dict=0x1df17c0) at ../../../../../xlators/mgmt/glusterd/src/glusterd-op-sm.c:2305
#6  0x00007f038a907610 in glusterd_op_ac_stage_op (event=0x1df19f0, ctx=0x1dd8d00) at ../../../../../xlators/mgmt/glusterd/src/glusterd-op-sm.c:2144
#7  0x00007f038a90a872 in glusterd_op_sm () at ../../../../../xlators/mgmt/glusterd/src/glusterd-op-sm.c:3600
#8  0x00007f038a8f829f in glusterd_handle_stage_op (req=0x7f038a59904c) at ../../../../../xlators/mgmt/glusterd/src/glusterd-handler.c:601
#9  0x00007f038e29cf89 in rpcsvc_handle_rpc_call (svc=0x1dc8de0, trans=0x1de8a80, msg=0x1dd0f50) at ../../../../rpc/rpc-lib/src/rpcsvc.c:514
#10 0x00007f038e29d32c in rpcsvc_notify (trans=0x1de8a80, mydata=0x1dc8de0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1dd0f50) at ../../../../rpc/rpc-lib/src/rpcsvc.c:610
#11 0x00007f038e2a2c08 in rpc_transport_notify (this=0x1de8a80, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1dd0f50) at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#12 0x00007f038a38e23d in socket_event_poll_in (this=0x1de8a80) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675
#13 0x00007f038a38e7c1 in socket_event_handler (fd=11, idx=12, data=0x1de8a80, poll_in=1, poll_out=0, poll_err=0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790
#14 0x00007f038e4fbc20 in event_dispatch_epoll_handler (event_pool=0x1dc7b80, events=0x1dd4b70, i=0) at ../../../libglusterfs/src/event.c:794
#15 0x00007f038e4fbe43 in event_dispatch_epoll (event_pool=0x1dc7b80) at ../../../libglusterfs/src/event.c:856
#16 0x00007f038e4fc1ce in event_dispatch (event_pool=0x1dc7b80) at ../../../libglusterfs/src/event.c:956
#17 0x0000000000407c2e in main (argc=1, argv=0x7ffff003e118) at ../../../glusterfsd/src/glusterfsd.c:1601

(gdb) f 1
#1  0x00007f038a916f29 in glusterd_friend_find_by_hostname (hoststr=0x1ddd3b0 "10.1.11.130", peerinfo=0x7ffff003d180) at ../../../../../xlators/mgmt/glusterd/src/glusterd-utils.c:3608
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb) p entry->hostname
$1 = 0x1de58b0 "10.1.11.144"
(gdb) p hoststr
$2 = 0x1ddd3b0 "10.1.11.130"
(gdb) l
3603            priv = THIS->private;
3604
3605            GF_ASSERT (priv);
3606
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
3608                    if (!strncasecmp (entry->hostname, hoststr,
3609                                      1024)) {
3610
3611                            gf_log ("glusterd", GF_LOG_DEBUG,
3612                                    "Friend %s found.. state: %d", hoststr,
(gdb) p *entry
$3 = {uuid = "\355\037\333\001\222\006As\233\t\356\000\071\264\207\036", uuid_str = '\000' <repeats 49 times>, state = {state = GD_FRIEND_STATE_BEFRIENDED, transition_time = {tv_sec = 0, tv_usec = 0}}, hostname = 0x1de58b0 "10.1.11.144", port = 0, uuid_list = {next = 0x1de5a20, prev = 0x1de5a20}, op_peers_list = {next = 0x0, prev = 0x0}, rpc = 0x1de6690, mgmt = 0x7f038ab7a3a0, peer = 0x7f038ab7a420, connected = 1, shandle = 0x1de5860, sm_log = {transitions = 0x1de5ad0, current = 3, size = 50, count = 4, state_name_get = 0x7f038a8fed8e <glusterd_friend_sm_state_name_get>, event_name_get = 0x7f038a8fedc4 <glusterd_friend_sm_event_name_get>}}
(gdb) n
Single stepping until exit from function __strncasecmp_l_sse42, which has no line number information.
glusterd_friend_find_by_hostname (hoststr=0x1ddd3b0 "10.1.11.130", peerinfo=0x7ffff003d180) at ../../../../../xlators/mgmt/glusterd/src/glusterd-utils.c:3607
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb)
3608                    if (!strncasecmp (entry->hostname, hoststr,
(gdb)
3607            list_for_each_entry (entry, &priv->peers, uuid_list) {
(gdb) p priv->peers
$4 = {next = 0x1de2a00, prev = 0x1de5a20}
(gdb) p priv->peers->next
$5 = (struct list_head *) 0x1de2a00
(gdb) p priv->peers->next->next
$6 = (struct list_head *) 0x1de4170
(gdb) p priv->peers->next->next->next
$7 = (struct list_head *) 0x1de5a20
(gdb) p priv->peers->next->next->next->next
$8 = (struct list_head *) 0x1de5a20
(gdb) p priv->peers->next->next->next->next->next
$9 = (struct list_head *) 0x1de5a20
(gdb) p priv->peers->next->next->next->next->next->next
$10 = (struct list_head *) 0x1de5a20
(gdb) p priv->peers->next->next->next->next->next->next->next
$11 = (struct list_head *) 0x1de5a20
(gdb)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.
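The pointer chase above ($4 through $11) shows why the walk never terminates: following next from &priv->peers converges on 0x1de5a20 and stays there, i.e. that peerinfo's uuid_list.next points back to the node itself rather than back to &priv->peers, so list_for_each_entry never reaches the list head again and keeps comparing "10.1.11.144" against "10.1.11.130" forever. Below is a minimal, self-contained sketch of that failure shape; it is not glusterd source code, and struct peerinfo, the list macros, and the guard counter are simplified stand-ins introduced only for illustration.

/* corrupted_list_demo.c -- a kernel-style circular list whose last node's
 * "next" points to itself (the shape suggested by $7..$11 all printing
 * 0x1de5a20), so the traversal never gets back to the head and never ends. */

#include <stddef.h>
#include <stdio.h>

struct list_head {
        struct list_head *next, *prev;
};

/* Simplified stand-ins for the list helpers a kernel-style walk relies on. */
#define container_of(ptr, type, member) \
        ((type *) ((char *) (ptr) - offsetof (type, member)))

#define list_for_each_entry(pos, head, member)                           \
        for (pos = container_of ((head)->next, typeof (*pos), member);   \
             &pos->member != (head);                                     \
             pos = container_of (pos->member.next, typeof (*pos), member))

/* Hypothetical, stripped-down stand-in for glusterd's peerinfo structure. */
struct peerinfo {
        const char       *hostname;
        struct list_head  uuid_list;
};

int
main (void)
{
        struct list_head  peers = { &peers, &peers };
        struct peerinfo   a     = { "10.1.11.131", { NULL, NULL } };
        struct peerinfo   b     = { "10.1.11.144", { NULL, NULL } };
        struct peerinfo  *entry = NULL;
        int               steps = 0;

        /* Well-formed circular list: head -> a -> b -> head. */
        peers.next       = &a.uuid_list;  a.uuid_list.prev = &peers;
        a.uuid_list.next = &b.uuid_list;  b.uuid_list.prev = &a.uuid_list;
        b.uuid_list.next = &peers;        peers.prev       = &b.uuid_list;

        /* Corruption of the same shape seen in the gdb session: the last
         * node's links point to the node itself, so the traversal below can
         * never reach &peers again. */
        b.uuid_list.next = &b.uuid_list;
        b.uuid_list.prev = &b.uuid_list;

        list_for_each_entry (entry, &peers, uuid_list) {
                printf ("comparing against %s\n", entry->hostname);
                if (++steps > 10) {
                        /* Guard only so this demo exits; glusterd has no such
                         * guard and keeps spinning at ~100% CPU. */
                        printf ("stuck on %s -- the loop would never end\n",
                                entry->hostname);
                        break;
                }
        }
        return 0;
}

Compiled with gcc (the typeof extension is used, as in kernel-style list headers), the program prints "comparing against 10.1.11.131" once and then repeats "10.1.11.144" until the demo guard fires, matching the endless alternation between lines 3607 and 3608 in the single-stepping output above.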
Actual results:
glusterd enters an infinite loop and uses 100% CPU.

Expected results:
glusterd should not enter the infinite loop and thus should not use 100% CPU.

Additional info:

glusterd log:

[2012-02-16 00:58:00.724257] I [glusterd-utils.c:262:glusterd_lock] 0-glusterd: Cluster lock held by c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 00:58:00.724327] I [glusterd-handler.c:1247:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-02-16 00:58:00.725060] I [glusterd-handler.c:1291:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 00:58:00.725119] I [glusterd-handler.c:1267:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2012-02-16 01:03:33.805566] I [glusterd-handler.c:487:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:03:33.805625] I [glusterd-utils.c:262:glusterd_lock] 0-glusterd: Cluster lock held by c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:03:33.805675] I [glusterd-handler.c:1247:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-02-16 01:03:33.806401] I [glusterd-handler.c:1291:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:03:33.806453] I [glusterd-handler.c:1267:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2012-02-16 01:09:06.889849] I [glusterd-handler.c:487:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:09:06.889912] I [glusterd-utils.c:262:glusterd_lock] 0-glusterd: Cluster lock held by c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:09:06.889990] I [glusterd-handler.c:1247:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-02-16 01:09:06.890855] I [glusterd-handler.c:1291:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:09:06.890926] I [glusterd-handler.c:1267:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2012-02-16 01:14:39.988029] I [glusterd-handler.c:487:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:14:39.988120] I [glusterd-utils.c:262:glusterd_lock] 0-glusterd: Cluster lock held by c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:14:39.988189] I [glusterd-handler.c:1247:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-02-16 01:14:39.988862] I [glusterd-handler.c:1291:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:14:39.988914] I [glusterd-handler.c:1267:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2012-02-16 01:20:13.096929] I [glusterd-handler.c:487:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:20:13.097017] I [glusterd-utils.c:262:glusterd_lock] 0-glusterd: Cluster lock held by c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:20:13.097090] I [glusterd-handler.c:1247:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-02-16 01:20:13.097628] I [glusterd-handler.c:1291:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:20:13.097683] I [glusterd-handler.c:1267:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2012-02-16 01:25:46.253557] I [glusterd-handler.c:487:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:25:46.253680] I [glusterd-utils.c:262:glusterd_lock] 0-glusterd: Cluster lock held by c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:25:46.253743] I [glusterd-handler.c:1247:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-02-16 01:25:46.254283] I [glusterd-handler.c:1291:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:25:46.254333] I [glusterd-handler.c:1267:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2012-02-16 01:31:19.335760] I [glusterd-handler.c:487:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:31:19.335831] I [glusterd-utils.c:262:glusterd_lock] 0-glusterd: Cluster lock held by c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
[2012-02-16 01:31:19.335913] I [glusterd-handler.c:1247:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
:

gluster peer status on other peers of the cluster:

10.1.11.131:
gluster peer status
Number of Peers: 3

Hostname: 10.1.11.130
Uuid: bc2bf8b8-a8e3-4664-8a9d-1e4a503cafe9
State: Peer in Cluster (Connected)

Hostname: 10.1.11.145
Uuid: 1f3f3e03-3ba9-43c7-ac85-3a31a4f0bdbd
State: Peer in Cluster (Connected)

Hostname: 10.1.11.144
Uuid: ed1fdb01-9206-4173-9b09-ee0039b4871e
State: Peer in Cluster (Connected)

10.1.11.144:
gluster peer status
Number of Peers: 3

Hostname: 10.1.11.131
Uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
State: Peer in Cluster (Connected)

Hostname: 10.1.11.145
Uuid: 1f3f3e03-3ba9-43c7-ac85-3a31a4f0bdbd
State: Peer in Cluster (Connected)

Hostname: 10.1.11.130
Uuid: bc2bf8b8-a8e3-4664-8a9d-1e4a503cafe9
State: Peer in Cluster (Connected)

10.1.11.145:
gluster peer status
Number of Peers: 4

Hostname: 10.1.11.130
Uuid: bc2bf8b8-a8e3-4664-8a9d-1e4a503cafe9
State: Peer in Cluster (Connected)

Hostname: 10.1.11.131
Uuid: c8128c3e-efeb-4fc6-85c3-f9a8e3a4e52b
State: Peer in Cluster (Connected)

Hostname: 10.1.11.144
Uuid: ed1fdb01-9206-4173-9b09-ee0039b4871e
State: Peer in Cluster (Connected)

Hostname: 10.1.11.144
Uuid: ed1fdb01-9206-4173-9b09-ee0039b4871e
State: Peer in Cluster (Connected)
[root@RHEL6 ~]#

In the output above, 10.1.11.144 is listed twice.

top command on the peer where glusterd took 100% CPU:

top - 07:25:39 up 2 days, 21 min,  2 users,  load average: 1.39, 1.61, 1.77
Tasks: 128 total,   3 running, 125 sleeping,   0 stopped,   0 zombie
Cpu(s): 50.2%us,  0.0%sy,  0.0%ni, 49.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2054952k total,  1906528k used,   148424k free,    11592k buffers
Swap:  1048568k total,   111052k used,   937516k free,   535276k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22243 root      20   0  204m  66m 1120 R 100.1  3.3 256:24.38 glusterd
25293 root      20   0 15016 1228  940 R  0.3  0.1   0:00.02 top
    1 root      20   0 19324 1032  836 S  0.0  0.1   0:01.02 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:01.25 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.17 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    7 root      RT   0     0    0    0 S  0.0  0.0   0:01.87 migration/1
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    9 root      20   0     0    0    0 S  0.0  0.0