Description of problem:

pmproxy intermittently crashes with a segfault while a pre-defined grafana-pcp dashboard is being displayed. The following stack trace of pmproxy was captured when the pcp-bpftrace-flame-graphs grafana-pcp dashboard was shown; the crash is at uv_timer_stop() in libuv.

[root@yuokada01 coredump]# coredumpctl debug
           PID: 1009424 (pmproxy)
           UID: 973 (pcp)
           GID: 972 (pcp)
        Signal: 11 (SEGV)
     Timestamp: Tue 2020-01-07 18:10:49 JST (7min ago)
  Command Line: /usr/libexec/pcp/bin/pmproxy
    Executable: /usr/libexec/pcp/bin/pmproxy
 Control Group: /system.slice/pmproxy.service
          Unit: pmproxy.service
         Slice: system.slice
       Boot ID: bd85d82735134a01b352011df89a701c
    Machine ID: 0bd4a025da534f20b15cd30632040b10
      Hostname: yuokada01
       Storage: /var/lib/systemd/coredump/core.pmproxy.973.bd85d82735134a01b352011df89a701c.1009424.1578388249000000000000.lz4
       Message: Process 1009424 (pmproxy) of user 973 dumped core.

Stack trace of thread 1009455:
#0  0x00007fb548b681d0 uv_timer_stop (libuv.so.1)
#1  0x00007fb548b684d5 uv_timer_start (libuv.so.1)
#2  0x00007fb548f34a7b webgroup_use_context.isra.0 (libpcp_web.so.1)
#3  0x00007fb548f34c7d webgroup_lookup_context (libpcp_web.so.1)
#4  0x00007fb548f39168 pmWebGroupStore (libpcp_web.so.1)
#5  0x00007fb548b678fe worker (libuv.so.1)
#6  0x00007fb548b404e2 start_thread (libpthread.so.0)
#7  0x00007fb5489c0643 __clone (libc.so.6)

Stack trace of thread 1009424:
#0  0x00007fb5489c07a6 epoll_pwait (libc.so.6)
#1  0x00007fb548b7b839 uv__io_poll (libuv.so.1)
#2  0x00007fb548b6bbec uv_run (libuv.so.1)
#3  0x000055c3cf159212 main_loop (pmproxy)
#4  0x000055c3cf158869 main (pmproxy)
#5  0x00007fb5488e61a3 __libc_start_main (libc.so.6)
#6  0x000055c3cf158a8e _start (pmproxy)

Stack trace of thread 1009427:
#0  0x00007fb5489b5a1f __poll (libc.so.6)
#1  0x00007fb54865266e poll_func (libavahi-common.so.3)
#2  0x00007fb5486521e1 avahi_simple_poll_run (libavahi-common.so.3)
#3  0x00007fb5486523c0 avahi_simple_poll_iterate (libavahi-common.so.3)
#4  0x00007fb5486525f5 avahi_simple_poll_loop (libavahi-common.so.3)
#5  0x00007fb5486526cb thread (libavahi-common.so.3)
#6  0x00007fb548b404e2 start_thread (libpthread.so.0)
#7  0x00007fb5489c0643 __clone (libc.so.6)

Stack trace of thread 1009453:
#0  0x00007fb548b46d45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1  0x00007fb548b7870d uv_cond_wait (libuv.so.1)
#2  0x00007fb548b6799d worker (libuv.so.1)
#3  0x00007fb548b404e2 start_thread (libpthread.so.0)
#4  0x00007fb5489c0643 __clone (libc.so.6)

Stack trace of thread 1009454:
#0  0x00007fb548b46d45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1  0x00007fb548b7870d uv_cond_wait (libuv.so.1)
#2  0x00007fb548b6799d worker (libuv.so.1)
#3  0x00007fb548b404e2 start_thread (libpthread.so.0)
#4  0x00007fb5489c0643 __clone (libc.so.6)

Stack trace of thread 1009456:
#0  0x00007fb548b46d45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1  0x00007fb548b7870d uv_cond_wait (libuv.so.1)
#2  0x00007fb548b6799d worker (libuv.so.1)
#3  0x00007fb548b404e2 start_thread (libpthread.so.0)
#4  0x00007fb5489c0643 __clone (libc.so.6)

GNU gdb (GDB) Fedora 8.3.50.20190824-24.fc31
[...]
Reading symbols from /usr/libexec/pcp/bin/pmproxy...
Reading symbols from /usr/lib/debug/usr/libexec/pcp/bin/pmproxy-5.0.2-1.fc31.x86_64.debug...
[New LWP 1009455]
[New LWP 1009424]
[New LWP 1009427]
[New LWP 1009453]
[New LWP 1009454]
[New LWP 1009456]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/pcp/bin/pmproxy'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  heap_remove (less_than=0x7fb548b680f0 <timer_less_than>, node=0x7fb530026260, heap=0x7fb548b8a428 <default_loop_struct+520>) at src/heap-inl.h:173
173       if (path & 1)
[Current thread is 1 (Thread 0x7fb545526700 (LWP 1009455))]
(gdb) thread apply all bt

Thread 6 (Thread 0x7fb544d25700 (LWP 1009456)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7fb548b8a1e8 <cond+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:80
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fb548b8a180 <mutex>, cond=0x7fb548b8a1c0 <cond>) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7fb548b8a1c0 <cond>, mutex=mutex@entry=0x7fb548b8a180 <mutex>) at pthread_cond_wait.c:638
#3  0x00007fb548b7870d in uv_cond_wait (cond=cond@entry=0x7fb548b8a1c0 <cond>, mutex=mutex@entry=0x7fb548b8a180 <mutex>) at src/unix/thread.c:781
#4  0x00007fb548b6799d in worker (arg=0x0) at src/threadpool.c:76
#5  0x00007fb548b404e2 in start_thread (arg=<optimized out>) at pthread_create.c:479
#6  0x00007fb5489c0643 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fb545d27700 (LWP 1009454)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7fb548b8a1e8 <cond+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:80
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fb548b8a180 <mutex>, cond=0x7fb548b8a1c0 <cond>) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7fb548b8a1c0 <cond>, mutex=mutex@entry=0x7fb548b8a180 <mutex>) at pthread_cond_wait.c:638
#3  0x00007fb548b7870d in uv_cond_wait (cond=cond@entry=0x7fb548b8a1c0 <cond>, mutex=mutex@entry=0x7fb548b8a180 <mutex>) at src/unix/thread.c:781
#4  0x00007fb548b6799d in worker (arg=0x0) at src/threadpool.c:76
#5  0x00007fb548b404e2 in start_thread (arg=<optimized out>) at pthread_create.c:479
#6  0x00007fb5489c0643 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fb546528700 (LWP 1009453)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7fb548b8a1ec <cond+44>) at ../sysdeps/unix/sysv/linux/futex-internal.h:80
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fb548b8a180 <mutex>, cond=0x7fb548b8a1c0 <cond>) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7fb548b8a1c0 <cond>, mutex=mutex@entry=0x7fb548b8a180 <mutex>) at pthread_cond_wait.c:638
#3  0x00007fb548b7870d in uv_cond_wait (cond=cond@entry=0x7fb548b8a1c0 <cond>, mutex=mutex@entry=0x7fb548b8a180 <mutex>) at src/unix/thread.c:781
#4  0x00007fb548b6799d in worker (arg=0x0) at src/threadpool.c:76
#5  0x00007fb548b404e2 in start_thread (arg=<optimized out>) at pthread_create.c:479
#6  0x00007fb5489c0643 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fb547f8e700 (LWP 1009427)):
#0  0x00007fb5489b5a1f in __GI___poll (fds=fds@entry=0x7fb540000b60, nfds=nfds@entry=2, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fb54865266e in poll (__timeout=-1, __nfds=2, __fds=0x7fb540000b60) at /usr/include/bits/poll2.h:46
#2  poll_func (ufds=0x7fb540000b60, nfds=2, timeout=-1, userdata=0x55c3cf999330) at thread-watch.c:56
#3  0x00007fb5486521e1 in avahi_simple_poll_run (s=0x55c3cf999370) at simple-watch.c:527
#4  avahi_simple_poll_run (s=0x55c3cf999370) at simple-watch.c:518
#5  0x00007fb5486523c0 in avahi_simple_poll_iterate (s=0x55c3cf999370, timeout=<optimized out>) at simple-watch.c:602
#6  0x00007fb5486525f5 in avahi_simple_poll_loop (s=0x55c3cf999370) at simple-watch.c:646
#7  0x00007fb5486526cb in thread (userdata=0x55c3cf999320) at thread-watch.c:71
#8  0x00007fb548b404e2 in start_thread (arg=<optimized out>) at pthread_create.c:479
#9  0x00007fb5489c0643 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fb547f8f900 (LWP 1009424)):
#0  0x00007fb5489c07a6 in __GI_epoll_pwait (epfd=8, events=events@entry=0x7ffc2b74e6b0, maxevents=maxevents@entry=1024, timeout=timeout@entry=29943, set=set@entry=0x0) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1  0x00007fb548b7b839 in uv__io_poll (loop=loop@entry=0x7fb548b8a220 <default_loop_struct>, timeout=29943) at src/unix/linux-core.c:284
#2  0x00007fb548b6bbec in uv_run (loop=0x7fb548b8a220 <default_loop_struct>, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:373
#3  0x000055c3cf159212 in main_loop (arg=0x55c3cf996e20) at server.c:852
#4  0x000055c3cf158869 in main (argc=1, argv=0x7ffc2b751a78) at pmproxy.c:387

Thread 1 (Thread 0x7fb545526700 (LWP 1009455)):
#0  heap_remove (less_than=0x7fb548b680f0 <timer_less_than>, node=0x7fb530026260, heap=0x7fb548b8a428 <default_loop_struct+520>) at src/heap-inl.h:173
#1  uv_timer_stop (handle=0x7fb5300261f8) at src/timer.c:106
#2  0x00007fb548b684d5 in uv_timer_start (handle=handle@entry=0x7fb5300261f8, cb=cb@entry=0x7fb548f34060 <webgroup_timeout_context>, timeout=30000, repeat=repeat@entry=0) at src/timer.c:81
#3  0x00007fb548f34a7b in webgroup_use_context (cp=cp@entry=0x7fb530026170, status=status@entry=0x7fb545525bd4, message=message@entry=0x7fb545525be0, arg=0x55c3d0c4d0a0) at webgroup.c:227
#4  0x00007fb548f34c7d in webgroup_lookup_context (sp=sp@entry=0x55c3cf16d800 <pmwebapi_settings>, id=id@entry=0x7fb545525bc8, params=params@entry=0x7fb53001e9b0, status=status@entry=0x7fb545525bd4, message=message@entry=0x7fb545525be0, arg=arg@entry=0x55c3d0c4d0a0) at webgroup.c:288
#5  0x00007fb548f39168 in pmWebGroupStore (settings=0x55c3cf16d800 <pmwebapi_settings>, id=<optimized out>, params=0x7fb53001e9b0, arg=0x55c3d0c4d0a0) at webgroup.c:1948
#6  0x00007fb548b678fe in worker (arg=0x0) at src/threadpool.c:122
#7  0x00007fb548b404e2 in start_thread (arg=<optimized out>) at pthread_create.c:479
#8  0x00007fb5489c0643 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) bt full
#0  heap_remove (less_than=0x7fb548b680f0 <timer_less_than>, node=0x7fb530026260, heap=0x7fb548b8a428 <default_loop_struct+520>) at src/heap-inl.h:173
        k = 4
        smallest = <optimized out>
        max = 0x0
        child = <optimized out>
        path = 10
        n = <optimized out>
        smallest = <optimized out>
        max = <optimized out>
        child = <optimized out>
        path = <optimized out>
        k = <optimized out>
        n = <optimized out>
#1  uv_timer_stop (handle=0x7fb5300261f8) at src/timer.c:106
No locals.
#2  0x00007fb548b684d5 in uv_timer_start (handle=handle@entry=0x7fb5300261f8, cb=cb@entry=0x7fb548f34060 <webgroup_timeout_context>, timeout=30000, repeat=repeat@entry=0) at src/timer.c:81
        clamped_timeout = <optimized out>
#3  0x00007fb548f34a7b in webgroup_use_context (cp=cp@entry=0x7fb530026170, status=status@entry=0x7fb545525bd4, message=message@entry=0x7fb545525be0, arg=0x55c3d0c4d0a0) at webgroup.c:227
        errbuf = "\000\331\026\317\303U\000\000\200ZRE\265\177\000\000@ZRE\265\177\000\000d\346\363H\265\177\000\000\223\317\002<\265\177\000\000\205`\026\317\303U\000\000\000\000\000\000\000\000\000\000~\000\000\000\000\000\000\000\204`\026\317\303U\000\000\227)\361H\265\177\000\000\060[RE\265\177\000\000\065חH\265\177\000\000Ћ\002<\265\177\000\000\375\330\362H\265\177\000\000 [\005E\000\000\000\000\000\330\026\317\303U\000"
        sts = <optimized out>
#4  0x00007fb548f34c7d in webgroup_lookup_context (sp=sp@entry=0x55c3cf16d800 <pmwebapi_settings>, id=id@entry=0x7fb545525bc8, params=params@entry=0x7fb53001e9b0, status=status@entry=0x7fb545525bd4, message=message@entry=0x7fb545525be0, arg=arg@entry=0x55c3d0c4d0a0) at webgroup.c:288
        groups = <optimized out>
        cp = 0x7fb530026170
        key = 671023954
        access = {username = 0x0, password = 0x0, realm = 0x7fb530006da3 "pmapi/671023954"}
        endptr = 0x55c3d17b99ca ""
#5  0x00007fb548f39168 in pmWebGroupStore (settings=0x55c3cf16d800 <pmwebapi_settings>, id=<optimized out>, params=0x7fb53001e9b0, arg=0x55c3d0c4d0a0) at webgroup.c:1948
        cp = <optimized out>
        mp = <optimized out>
        length = <optimized out>
        err = "\374\b\000\000\000\000\000\000\001\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\200\\RE\265\177\000\000\371\021\000\000\000\000\000\000\235l\363H\265\177\000\000\300j\264H\265\177\000\000\200\\RE\265\177\000\000\060U\003<\265\177\000\000\000\000\000\000\000\000\000\000\371\021\000\000\000\000\000\000\300\241\270H\265\177\000\000\200\241\270H\265\177\000\000\000\000\000\000\265\177\000\000\060U\003<\265\177\000\000\000gn\364Y{", <incomplete sequence \305>
        metric = <optimized out>
        value = 0x7fb53800c633 "profile:hz:99 { @[kstack] = count(); }"
        pmid = <optimized out>
        instids = <optimized out>
        instnames = <optimized out>
        msg = 0x0
        names = 0x0
        ids = 0x0
        sts = 0
        numids = 0
        numnames = 0
#6  0x00007fb548b678fe in worker (arg=0x0) at src/threadpool.c:122
        w = 0x55c3d262d1c8
        q = 0x55c3d262d1e0
        is_slow_work = 0
#7  0x00007fb548b404e2 in start_thread (arg=<optimized out>) at pthread_create.c:479
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140416528836352, -1448831324071641545, 140721037566030, 140721037566031, 140721037566176, 140416528834048, 1481188616699919927, 1481164201094430263}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#8  0x00007fb5489c0643 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Version-Release number of selected component (if applicable):
pcp-5.0.2-1.fc31.x86_64
pcp-libs-5.0.2-1.fc31.x86_64
grafana-pcp-1.0.4-2.fc31.noarch
pcp-pmda-bpftrace-5.0.1-1.fc31.x86_64
pcp-pmda-bcc-5.0.1-1.fc31.x86_64
grafana-6.3.6-1.fc31.x86_64
libuv-1.32.0-1.fc31.x86_64

How reproducible:
Intermittently. I can't always reproduce this issue, but it occurs intermittently with the following steps.

Steps to Reproduce:
1. Install the pcp-zeroconf and grafana-pcp packages
2. Install the pcp-pmda-bpftrace and pcp-pmda-bcc packages and enable them via their ./Install scripts
3. Install the grafana package and start the grafana-server service
4. Start the pmproxy service
5. Access the grafana web page (http://localhost:3000), log in, and configure "PCP bpftrace" and "PCP Vector" as data sources
6. Display pre-defined dashboards such as "PCP bpftrace System Analysis", "PCP bpftrace Flame Graphs", or "PCP Vector Host Overview"
7. Just wait (or move to another browser tab and then back to the grafana tab)

Actual results:
pmproxy segfaults

Expected results:
pmproxy keeps working

Additional info:
- This issue occurred on at least the following 3 dashboards. The crashing functions differ slightly in each case, but all of them are reached after the webgroup_use_context function.
  - pcp-bpftrace-flame-graphs
  - pcp-bpftrace-system-analysis
  - pcp-vector-host-overview

[pcp-bpftrace-flame-graphs]
Jan 07 18:10:48 yuokada01 grafana-server[1459]: t=2020-01-07T18:10:48+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/3/pmapi/2144956960/store status=400 remote_addr=[::1] time_ms=13 size=156 referer="http://localhost:3000/d/z2p4HhbWz/pcp-bpftrace-flame-graphs?orgId=1&refresh=1s"
Jan 07 18:10:49 yuokada01 audit[1009424]: ANOM_ABEND auid=4294967295 uid=973 gid=972 ses=4294967295 subj=system_u:system_r:pcp_pmproxy_t:s0 pid=1009424 comm="pmproxy" exe="/usr/libexec/pcp/bin/pmproxy" sig=11 res=1
Jan 07 18:10:49 yuokada01 kernel: pmproxy[1009455]: segfault at 0 ip 00007fb548b681d0 sp 00007fb5455259d0 error 4 in libuv.so.1.0.0[7fb548b63000+1b000]
Jan 07 18:10:49 yuokada01 kernel: Code: 01 00 00 44 89 c1 31 d2 31 c0 89 cf 01 c0 d1 e9 83 c2 01 83 e7 01 09 f8 83 f9 01 75 ed 4c 89 d9 85 d2 74 1a 66 0f 1f 44 00 00 <48> 8b 39 a8 01 48 8d 4f 08 48 0f 44 cf d1 e8 83 ea 01 75 ec 4c 8b
Jan 07 18:10:49 yuokada01 systemd[1]: Started Process Core Dump (PID 1010225/UID 0).
Jan 07 18:10:49 yuokada01 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@4-1010225-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 18:10:49 yuokada01 grafana-server[1459]: 2020/01/07 18:10:49 http: proxy error: read tcp [::1]:32794->[::1]:44322: read: connection reset by peer
Jan 07 18:10:49 yuokada01 grafana-server[1459]: t=2020-01-07T18:10:49+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/3/pmapi/671023954/store status=502 remote_addr=[::1] time_ms=345 size=0 referer="http://localhost:3000/d/z2p4HhbWz/pcp-bpftrace-flame-graphs?orgId=1&refresh=1s"
Jan 07 18:10:49 yuokada01 grafana-server[1459]: 2020/01/07 18:10:49 http: proxy error: read tcp [::1]:32796->[::1]:44322: read: connection reset by peer
Jan 07 18:10:49 yuokada01 grafana-server[1459]: t=2020-01-07T18:10:49+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/3/pmapi/560199622/store status=502 remote_addr=[::1] time_ms=345 size=0 referer="http://localhost:3000/d/z2p4HhbWz/pcp-bpftrace-flame-graphs?orgId=1&refresh=1s"
Jan 07 18:10:49 yuokada01 systemd[1]: pmproxy.service: Main process exited, code=killed, status=11/SEGV
Jan 07 18:10:49 yuokada01 systemd[1]: pmproxy.service: Failed with result 'signal'.
Jan 07 18:10:49 yuokada01 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmproxy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Jan 07 18:10:49 yuokada01 systemd[1]: pmproxy.service: Consumed 1.938s CPU time.
Jan 07 18:10:49 yuokada01 systemd[1]: pmproxy.service: Service RestartSec=100ms expired, scheduling restart.
Jan 07 18:10:49 yuokada01 systemd[1]: pmproxy.service: Scheduled restart job, restart counter is at 35.
Jan 07 18:10:49 yuokada01 systemd[1]: Stopped Proxy for Performance Metrics Collector Daemon.
Jan 07 18:10:49 yuokada01 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmproxy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 18:10:49 yuokada01 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmproxy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 18:10:49 yuokada01 systemd[1]: pmproxy.service: Consumed 1.938s CPU time.
Jan 07 18:10:49 yuokada01 systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jan 07 18:10:49 yuokada01 systemd-coredump[1010226]: Process 1009424 (pmproxy) of user 973 dumped core.

[pcp-bpftrace-system-analysis]
Jan 07 19:00:49 yuokada01 grafana-server[1459]: t=2020-01-07T19:00:49+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/3/pmapi/341091755/store status=400 remote_addr=[::1] time_ms=20 size=155 referer="http://localhost:3000/d/bpt4N2bZk/pcp-bpftrace-system-analysis?orgId=1&refresh=1s"
Jan 07 19:00:49 yuokada01 grafana-server[1459]: t=2020-01-07T19:00:49+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/3/pmapi/2139318111/store status=400 remote_addr=[::1] time_ms=20 size=156 referer="http://localhost:3000/d/bpt4N2bZk/pcp-bpftrace-system-analysis?orgId=1&refresh=1s"
Jan 07 19:00:49 yuokada01 kernel: pmproxy[1017134]: segfault at 8 ip 00007f29e49d21d0 sp 00007f29e138fac0 error 4 in libuv.so.1.0.0[7f29e49cd000+1b000]
Jan 07 19:00:49 yuokada01 kernel: Code: 01 00 00 44 89 c1 31 d2 31 c0 89 cf 01 c0 d1 e9 83 c2 01 83 e7 01 09 f8 83 f9 01 75 ed 4c 89 d9 85 d2 74 1a 66 0f 1f 44 00 00 <48> 8b 39 a8 01 48 8d 4f 08 48 0f 44 cf d1 e8 83 ea 01 75 ec 4c 8b
Jan 07 19:00:49 yuokada01 audit[1017127]: ANOM_ABEND auid=4294967295 uid=973 gid=972 ses=4294967295 subj=system_u:system_r:pcp_pmproxy_t:s0 pid=1017127 comm="pmproxy" exe="/usr/libexec/pcp/bin/pmproxy" sig=11 res=1
Jan 07 19:00:49 yuokada01 systemd[1]: Started Process Core Dump (PID 1017267/UID 0).
Jan 07 19:00:49 yuokada01 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@14-1017267-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 19:00:50 yuokada01 grafana-server[1459]: 2020/01/07 19:00:50 http: proxy error: read tcp [::1]:33574->[::1]:44322: read: connection reset by peer
Jan 07 19:00:50 yuokada01 grafana-server[1459]: t=2020-01-07T19:00:50+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/3/pmapi/189540555/fetch status=502 remote_addr=[::1] time_ms=422 size=0 referer="http://localhost:3000/d/bpt4N2bZk/pcp-bpftrace-system-analysis?orgId=1&refresh=1s"
Jan 07 19:00:50 yuokada01 systemd[1]: pmproxy.service: Main process exited, code=killed, status=11/SEGV
Jan 07 19:00:50 yuokada01 systemd[1]: pmproxy.service: Failed with result 'signal'.
Jan 07 19:00:50 yuokada01 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmproxy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Jan 07 19:00:50 yuokada01 grafana-server[1459]: 2020/01/07 19:00:50 http: proxy error: dial tcp [::1]:44322: connect: connection refused
Jan 07 19:00:50 yuokada01 grafana-server[1459]: t=2020-01-07T19:00:50+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/3/pmapi/context status=502 remote_addr=[::1] time_ms=20 size=0 referer="http://localhost:3000/d/bpt4N2bZk/pcp-bpftrace-system-analysis?orgId=1&refresh=1s"
Jan 07 19:00:50 yuokada01 systemd-coredump[1017268]: Process 1017127 (pmproxy) of user 973 dumped core.
Stack trace of thread 1017134:
#0  0x00007f29e49d21d0 heap_remove (libuv.so.1)
#1  0x00007f29e49d24d5 uv_timer_start (libuv.so.1)
#2  0x00007f29e4d9ea7b webgroup_use_context (libpcp_web.so.1)
#3  0x00007f29e4d9ec7d webgroup_lookup_context (libpcp_web.so.1)
#4  0x00007f29e4da0c1e pmWebGroupFetch (libpcp_web.so.1)
#5  0x00007f29e49d18fe worker (libuv.so.1)
#6  0x00007f29e49aa4e2 start_thread (libpthread.so.0)
#7  0x00007f29e482a643 __clone (libc.so.6)

Stack trace of thread 1017130:
#0  0x00007f29e481fa1f __GI___poll (libc.so.6)
#1  0x00007f29e44bc66e poll (libavahi-common.so.3)
#2  0x00007f29e44bc1e1 avahi_simple_poll_run (libavahi-common.so.3)
#3  0x00007f29e44bc3c0 avahi_simple_poll_iterate (libavahi-common.so.3)
#4  0x00007f29e44bc5f5 avahi_simple_poll_loop (libavahi-common.so.3)
#5  0x00007f29e44bc6cb thread (libavahi-common.so.3)
#6  0x00007f29e49aa4e2 start_thread (libpthread.so.0)
#7  0x00007f29e482a643 __clone (libc.so.6)

Stack trace of thread 1017132:
#0  0x00007f29e49b0d45 futex_wait_cancelable (libpthread.so.0)
#1  0x00007f29e49e270d uv_cond_wait (libuv.so.1)
#2  0x00007f29e49d199d worker (libuv.so.1)
#3  0x00007f29e49aa4e2 start_thread (libpthread.so.0)
#4  0x00007f29e482a643 __clone (libc.so.6)

Stack trace of thread 1017127:
#0  0x00007f29e482a7a6 __GI_epoll_pwait (libc.so.6)
#1  0x00007f29e49e5839 uv__io_poll (libuv.so.1)
#2  0x00007f29e49d5bec uv_run (libuv.so.1)
#3  0x0000563fa0236212 main_loop (pmproxy)
#4  0x0000563fa0235869 main (pmproxy)
#5  0x00007f29e47501a3 __libc_start_main (libc.so.6)
#6  0x0000563fa0235a8e _start (pmproxy)

Stack trace of thread 1017135:
#0  0x00007f29e49b0d45 futex_wait_cancelable (libpthread.so.0)
#1  0x00007f29e49e270d uv_cond_wait (libuv.so.1)
#2  0x00007f29e49d199d worker (libuv.so.1)
#3  0x00007f29e49aa4e2 start_thread (libpthread.so.0)
#4  0x00007f29e482a643 __clone (libc.so.6)

Stack trace of thread 1017133:
#0  0x00007f29e49b0d45 futex_wait_cancelable (libpthread.so.0)
#1  0x00007f29e49e270d uv_cond_wait (libuv.so.1)
#2  0x00007f29e49d199d worker (libuv.so.1)
#3  0x00007f29e49aa4e2 start_thread (libpthread.so.0)
#4  0x00007f29e482a643 __clone (libc.so.6)

[pcp-vector-host-overview]
Jan 07 19:11:35 yuokada01 kernel: pmproxy[1017597]: segfault at 8 ip 00007fbb91f4c469 sp 00007fbb89e0ead0 error 6 in libuv.so.1.0.0[7fbb91f47000+1b000]
Jan 07 19:11:35 yuokada01 kernel: Code: 8b 80 08 02 00 00 85 c9 75 0a e9 a2 00 00 00 66 90 4c 8b 07 f6 c2 01 49 8d 78 08 49 0f 44 f8 d1 ea 83 e9 01 75 eb 4c 89 43 78 <4c> 89 17 89 b0 10 02 00 00 48 8b 73 78 48 85 f6 75 19 eb 23 0f 1f
Jan 07 19:11:35 yuokada01 systemd[1]: Started Process Core Dump (PID 1018542/UID 0).
Jan 07 19:11:35 yuokada01 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@16-1018542-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 19:11:36 yuokada01 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmproxy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Jan 07 19:11:36 yuokada01 systemd[1]: pmproxy.service: Main process exited, code=killed, status=11/SEGV
Jan 07 19:11:36 yuokada01 grafana-server[1459]: 2020/01/07 19:11:36 http: proxy error: read tcp [::1]:34698->[::1]:44322: read: connection reset by peer
Jan 07 19:11:36 yuokada01 grafana-server[1459]: 2020/01/07 19:11:36 http: proxy error: read tcp [::1]:34700->[::1]:44322: read: connection reset by peer
Jan 07 19:11:36 yuokada01 grafana-server[1459]: t=2020-01-07T19:11:36+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/2/pmapi/2067894822/fetch status=502 remote_addr=[::1] time_ms=455 size=0 referer="http://localhost:3000/d/z5pVH2xZk/pcp-vector-host-overview?orgId=1&refresh=1s"
Jan 07 19:11:36 yuokada01 grafana-server[1459]: t=2020-01-07T19:11:36+0900 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/3/pmapi/1578411949/fetch status=502 remote_addr=[::1] time_ms=447 size=0 referer="http://localhost:3000/d/z5pVH2xZk/pcp-vector-host-overview?orgId=1&refresh=1s"
Jan 07 19:11:36 yuokada01 systemd[1]: pmproxy.service: Failed with result 'signal'.
Jan 07 19:11:36 yuokada01 systemd[1]: pmproxy.service: Consumed 1.935s CPU time.
Jan 07 19:11:36 yuokada01 systemd[1]: pmproxy.service: Service RestartSec=100ms expired, scheduling restart.
Jan 07 19:11:36 yuokada01 systemd[1]: pmproxy.service: Scheduled restart job, restart counter is at 47.
Jan 07 19:11:36 yuokada01 systemd[1]: Stopped Proxy for Performance Metrics Collector Daemon.
Jan 07 19:11:36 yuokada01 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmproxy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 19:11:36 yuokada01 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmproxy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 19:11:36 yuokada01 systemd[1]: pmproxy.service: Consumed 1.935s CPU time.
Jan 07 19:11:36 yuokada01 systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jan 07 19:11:36 yuokada01 systemd-coredump[1018543]: Process 1017567 (pmproxy) of user 973 dumped core.

Stack trace of thread 1017597:
#0  0x00007fbb91f4c469 heap_insert (libuv.so.1)
#1  0x00007fbb92318a7b webgroup_use_context (libpcp_web.so.1)
#2  0x00007fbb92318c7d webgroup_lookup_context (libpcp_web.so.1)
#3  0x00007fbb9231ac1e pmWebGroupFetch (libpcp_web.so.1)
#4  0x00007fbb91f4b8fe worker (libuv.so.1)
#5  0x00007fbb91f244e2 start_thread (libpthread.so.0)
#6  0x00007fbb91da4643 __clone (libc.so.6)

Stack trace of thread 1017594:
#0  0x00007fbb91f2ad45 futex_wait_cancelable (libpthread.so.0)
#1  0x00007fbb91f5c70d uv_cond_wait (libuv.so.1)
#2  0x00007fbb91f4b99d worker (libuv.so.1)
#3  0x00007fbb91f244e2 start_thread (libpthread.so.0)
#4  0x00007fbb91da4643 __clone (libc.so.6)

Stack trace of thread 1017596:
#0  0x00007fbb91f2ec8c __libc_send (libpthread.so.0)
#1  0x00007fbb91edbaaf __pmSend (libpcp.so.3)
#2  0x00007fbb91e96f61 __pmXmitPDU (libpcp.so.3)
#3  0x00007fbb91e9239a __pmSendFetch (libpcp.so.3)
#4  0x00007fbb91e8a8cc pmFetch_ctx (libpcp.so.3)
#5  0x00007fbb923182bb webgroup_fetch (libpcp_web.so.1)
#6  0x00007fbb9231af82 webgroup_fetch_pmids (libpcp_web.so.1)
#7  0x00007fbb91f4b8fe worker (libuv.so.1)
#8  0x00007fbb91f244e2 start_thread (libpthread.so.0)
#9  0x00007fbb91da4643 __clone (libc.so.6)

Stack trace of thread 1017595:
#0  0x00007fbb91f2ad45 futex_wait_cancelable (libpthread.so.0)
#1  0x00007fbb91f5c70d uv_cond_wait (libuv.so.1)
#2  0x00007fbb91f4b99d worker (libuv.so.1)
#3  0x00007fbb91f244e2 start_thread (libpthread.so.0)
#4  0x00007fbb91da4643 __clone (libc.so.6)

Stack trace of thread 1017567:
#0  0x00007fbb91da47a6 __GI_epoll_pwait (libc.so.6)
#1  0x00007fbb91f5f839 uv__io_poll (libuv.so.1)
#2  0x00007fbb91f4fbec uv_run (libuv.so.1)
#3  0x000055c5f2934212 main_loop (pmproxy)
#4  0x000055c5f2933869 main (pmproxy)
#5  0x00007fbb91cca1a3 __libc_start_main (libc.so.6)
#6  0x000055c5f2933a8e _start (pmproxy)

Stack trace of thread 1017570:
#0  0x00007fbb91d99a1f __GI___poll (libc.so.6)
#1  0x00007fbb91a3666e poll (libavahi-common.so.3)
#2  0x00007fbb91a361e1 avahi_simple_poll_run (libavahi-common.so.3)
#3  0x00007fbb91a363c0 avahi_simple_poll_iterate (libavahi-common.so.3)
#4  0x00007fbb91a365f5 avahi_simple_poll_loop (libavahi-common.so.3)
#5  0x00007fbb91a366cb thread (libavahi-common.so.3)
#6  0x00007fbb91f244e2 start_thread (libpthread.so.0)
#7  0x00007fbb91da4643 __clone (libc.so.6)
I am observing the same issue as well (with the pcp-5.0.2-3 build). In addition to the stack traces above, the last message in pmproxy.log before the log rotates is "free(): invalid pointer".
I'm not 100% sure this is the fix for this issue (I'm not seeing it here in any reliable way), but if you can, please try this fix ...

commit 548cad8c5f1c7f9c355c2ea0163c1f37c4dfc20f
Author: Nathan Scott <nathans>
Date:   Tue Jun 2 16:05:00 2020 +1000

    libpcp_web: ensure context is freed only after timer is fully closed

    Resolve a race condition in the context destruction.  Found by code
    auditing only and unable to reproduce so far, but this is an initial
    attempt towards resolving Red Hat bugzilla #1789312
FEDORA-2020-ee90ea0b98 has been submitted as an update to Fedora 32.
https://bodhi.fedoraproject.org/updates/FEDORA-2020-ee90ea0b98
FEDORA-2020-dc0cc449b3 has been submitted as an update to Fedora 31.
https://bodhi.fedoraproject.org/updates/FEDORA-2020-dc0cc449b3
FEDORA-2020-dc0cc449b3 has been pushed to the Fedora 31 testing repository. In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-dc0cc449b3`
You can provide feedback for this update here:
https://bodhi.fedoraproject.org/updates/FEDORA-2020-dc0cc449b3
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-ee90ea0b98 has been pushed to the Fedora 32 testing repository. In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-ee90ea0b98`
You can provide feedback for this update here:
https://bodhi.fedoraproject.org/updates/FEDORA-2020-ee90ea0b98
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-dc0cc449b3 has been pushed to the Fedora 31 stable repository. If the problem still persists, please make a note of it in this bug report.
FEDORA-2020-ee90ea0b98 has been pushed to the Fedora 32 stable repository. If the problem still persists, please make a note of it in this bug report.