Bug 2106361 - dnsmasq high CPU usage in 4.11 spoke deployment or after 4.10.21 to 4.11.0-rc.1 upgrade on an SNO node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Igal Tsoiref
QA Contact: Chad Crum
Docs Contact: Derek
URL:
Whiteboard:
Duplicates: 2116549
Depends On: 1919894
Blocks: 2120357 2120711
 
Reported: 2022-07-12 13:05 UTC by Marius Cornea
Modified: 2023-04-01 23:31 UTC (History)
CC: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2120357 2120711
Environment:
Last Closed: 2022-09-21 12:54:05 UTC
Target Upstream Version:
Embargoed:
yliu1: needinfo-
ccrum: needinfo-


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github stolostron backlog issues 24200 0 None None None 2022-07-12 20:38:42 UTC
Red Hat Issue Tracker MGMTBUGSM-465 0 None None None 2022-07-12 13:58:28 UTC

Description Marius Cornea 2022-07-12 13:05:23 UTC
Description of problem:

After upgrading an SNO node that was deployed and configured via the Telco ZTP process, the dnsmasq process reports ~90% CPU usage:

top - 12:58:49 up 47 min,  1 user,  load average: 7.69, 10.34, 9.45
Tasks: 1462 total,   4 running, 1458 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.2 us,  1.0 sy,  0.0 ni, 95.7 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  95796.6 total,  37786.2 free,  45350.2 used,  12660.3 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  49854.9 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                 
   2920 dnsmasq   20   0   77608   4852   4376 R  92.6   0.0  40:18.11 dnsmasq   

Version-Release number of selected component (if applicable):
4.11.0-rc.1 

How reproducible:
Inconsistent but reproducible.

Steps to Reproduce:
1. Deploy and configure a 4.10.21 SNO node via the Telco ZTP process
2. Upgrade the node to 4.11.0-rc.1
3. After the upgrade is complete, SSH into the node and run top

Actual results:
dnsmasq constantly reports ~90% CPU usage

Expected results:
Lower CPU usage for the dnsmasq process

Additional info:

Attaching must-gather and system journal.

Comment 2 Marius Cornea 2022-07-12 13:17:03 UTC
After restarting dnsmasq via `systemctl restart dnsmasq` the CPU usage drops to normal value.

Comment 3 Marius Cornea 2022-07-12 13:19:45 UTC
Note that this issue may prevent other OpenShift components from functioning correctly and the upgrade from succeeding, since the node is configured for workload partitioning and the OCP components run on only a small number of CPUs.

Comment 4 Miciah Dashiel Butler Masters 2022-07-12 13:57:46 UTC
What version of RHACM are you using?

Comment 5 Marius Cornea 2022-07-12 13:59:03 UTC
(In reply to Miciah Dashiel Butler Masters from comment #4)
> What version of RHACM are you using?

advanced-cluster-management.v2.5.2

Comment 6 Marius Cornea 2022-07-12 14:00:12 UTC
(In reply to Marius Cornea from comment #5)
> (In reply to Miciah Dashiel Butler Masters from comment #4)
> > What version of RHACM are you using?
> 
> advanced-cluster-management.v2.5.2

Installed from 2.5.2-DOWNSTREAM-2022-07-08-16-02-46

Comment 7 Marius Cornea 2022-07-20 12:29:41 UTC
On subsequent runs I failed to reproduce the issue reported initially, so I am closing this BZ.

Comment 8 yliu1 2022-08-04 12:55:56 UTC
Reopening because this issue has been reproduced. It happens about 15-20% of the time in my environment.

# Error in agent:
   state: installing-pending-user-action
    stateInfo: Host failed to reboot within timeout, please boot the host from the
      the OpenShift installation disk (sdc, /dev/disk/by-id/wwn-0x64cd98f07a3f810025cf0e56e9b8d8fb).
      The installation will resume once the host has rebooted

# However, the server did boot from the correct disk:

[root@test-sno-1 core]# mount | grep  /sysroot
/dev/sdc4 on /sysroot type xfs (ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/sdc4 on /sysroot/ostree/deploy/rhcos/var type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)


# It turned out the node was not available.

[kni ~]$ oc get pods -A
NAMESPACE            NAME                                  READY   STATUS    RESTARTS   AGE
assisted-installer   assisted-installer-controller-r95r4   0/1     Pending   0          19h

[kni ~]$ oc get nodes
No resources found

[core@test-sno-1 ~]$ systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-mco-default-env.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf, 30-kubelet-interval-tuning.conf, 90-container-mount-namespace.conf
   Active: active (running) since Wed 2022-08-03 17:31:12 UTC; 19h ago
  Process: 5729 ExecStartPre=/usr/local/bin/extractExecStart kubelet.service //run/kubelet-execstart.env ORIG_EXECSTART (code=exited, status=0/SUCCESS)
  Process: 5726 ExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state (code=exited, status=0/SUCCESS)
  Process: 5723 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 5715 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 5750 (kubelet)
    Tasks: 62 (limit: 2461943)
   Memory: 184.2M
      CPU: 17min 57.725s
   CGroup: /system.slice/kubelet.service
           └─5750 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service ->

Aug 04 12:33:22 test-sno-1.lab.eng.rdu2.redhat.com bash[5750]: E0804 12:33:22.491285    5750 kubelet.go:2447] "Error getting node" err="node \"test-sno-1.lab.eng.rdu2.redhat.com\" not found"

# dnsmasq was using 100% CPU.
# After restarting dnsmasq, the node came up and pods started running.

Logs: https://drive.google.com/drive/folders/1RD_6-rseIHLbvtEUA64tvci63VVvNeCl?usp=sharing

Comment 9 yliu1 2022-08-04 12:57:24 UTC
Forgot to mention: in my case in comment #8, this happened on a fresh deployment.

Comment 10 yliu1 2022-08-04 15:15:11 UTC
I have just reproduced this issue again and will keep this env for now in case someone wants to take a look.
ACM version on hub cluster is 2.5.1.
Spoke cluster ocp version is 4.11.0-rc.7.

Comment 11 Ken Young 2022-08-08 17:51:23 UTC
Marius,

Could this bug be a product of the environment?  Could something be exercising dnsmasq?  Are we seeing this elsewhere?

/Ken

Comment 12 Marius Cornea 2022-08-09 13:00:53 UTC
(In reply to Ken Young from comment #11)
> Marius,
> 
> Could this bug be a product of the environment?  Could something be
> exercising dnsmasq?  Are we seeing this elsewhere?
> 
> /Ken

The issue was pretty inconsistent in my environment and I haven't been able to reproduce it lately. So far this issue has been observed in 2 environments (mine and Yang's), and it looks like it reproduces more consistently in Yang's environment (about 15-20%) according to https://bugzilla.redhat.com/show_bug.cgi?id=2106361#c8

Comment 13 Marius Cornea 2022-08-16 08:01:32 UTC
I started seeing the issue again on the ZT Systems machine. To reproduce the issue after the initial error I used:

while true; do
    date; echo restarting
    if [[ $(systemctl is-active dnsmasq) == 'failed' ]]; then
        systemctl restart dnsmasq
    fi
    systemctl restart NetworkManager
    sleep 15
    if [[ $(top -b -n1 | head -10 | grep dnsmasq) ]]; then
        echo "reproduced. done"
        break
    fi
done

Note that on the ZT system it reproduces pretty quickly; on the HPE e910 it takes some time (~2 hours).

Comment 14 Igal Tsoiref 2022-08-22 13:16:57 UTC
*** Bug 2116549 has been marked as a duplicate of this bug. ***

Comment 15 Petr Menšík 2022-08-22 17:09:15 UTC
Posting a backtrace from a machine where the issue reproduced:

(gdb) bt full
#0  forward_query (udpfd=4, udpaddr=udpaddr@entry=0x7ffc256dea00, dst_addr=dst_addr@entry=0x7ffc256de9e0, dst_iface=dst_iface@entry=1, 
    header=header@entry=0x55e29a3eab70, plen=plen@entry=37, now=1660925731, forward=0x55e29a3f1160, ad_reqd=0, do_bit=0) at forward.c:500
        fd = <optimized out>
        firstsentto = 0x55e29a3e5960
        oph = <optimized out>
        pheader = 0x55e298817345 <find_pseudoheader+405> "H\211\307H\205\300\017\205\265\376\377\377\017\037\200"
        subnet = 0
        forwarded = 0
        edns0_len = 37
        domain = 0x0
        type = 0
        norebind = 0
        addrp = 0x0
        flags = 0
        fwd_flags = <optimized out>
        start = 0x55e29a3ea4d0
        sd = 0x55e29a3e5220
        hash = <optimized out>
        do_dnssec = <optimized out>
        gotname = 128
        oph = <optimized out>
#1  0x000055e2987ed837 in receive_query (listen=<optimized out>, now=1660925731) at forward.c:1490
        ad_reqd = 0
        header = 0x55e29a3eab70
        source_addr = {sa = {sa_family = 2, sa_data = "\327\260\177\000\000\001\000\000\000\000\000\000\000"}, in = {sin_family = 2, 
            sin_port = 45271, sin_addr = {s_addr = 16777343}, sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, 
            sin6_port = 45271, sin6_flowinfo = 16777343, sin6_addr = {__in6_u = {
                __u6_addr8 = "\000\000\000\000\000\000\000\000`Y>\232\342U\000", __u6_addr16 = {0, 0, 0, 0, 22880, 39486, 21986, 0}, 
                __u6_addr32 = {0, 0, 2587777376, 21986}}}, sin6_scope_id = 0}}
        pheader = 0xcd6c5f43b02b9000 <error: Cannot access memory at address 0xcd6c5f43b02b9000>
        type = 1
        udp_size = <optimized out>
        dst_addr = {addr = {addr4 = {s_addr = 16777343}, addr6 = {__in6_u = {
                __u6_addr8 = "\177\000\000\001\000\000\000\000\205\032\177\230\342U\000", __u6_addr16 = {127, 256, 0, 0, 6789, 39039, 
                  21986, 0}, __u6_addr32 = {16777343, 0, 2558466693, 21986}}}, log = {keytag = 127, algo = 256, digest = 0}, dnssec = {
              class = 127, type = 256}}}
        netmask = {s_addr = 0}
        dst_addr_4 = <optimized out>
        m = <optimized out>
        n = 37
        if_index = 1
        auth_dns = 0
        do_bit = 0
        have_pseudoheader = <optimized out>
        local_auth = <optimized out>
        iov = {{iov_base = 0x55e29a3eab70, iov_len = 4096}}
        msg = {msg_name = 0x7ffc256dea00, msg_namelen = 16, msg_iov = 0x7ffc256de9f0, msg_iovlen = 1, msg_control = 0x7ffc256dea20, 
          msg_controllen = 32, msg_flags = 0}
        cmptr = <optimized out>
        control_u = {align = {cmsg_len = 28, cmsg_level = 0, cmsg_type = 8, __cmsg_data = 0x7ffc256dea30 "\001"}, 
          control6 = "\034", '\000' <repeats 11 times>, "\b\000\000\000\001\000\000\000\177\000\000\001\177\000\000\001\000\000\000\020 R>\232\342U\000", control = "\034", '\000' <repeats 11 times>, "\b\000\000\000\001\000\000\000\177\000\000\001\177\000\000\001\000\000\000\020"}
        check_dst = <optimized out>
#2  0x000055e2987f1fa9 in check_dns_listeners (now=1660925731) at dnsmasq.c:1619
        serverfdp = <optimized out>
        listener = 0x55e29a3edc40
        rfl = <optimized out>
        i = <optimized out>
#3  0x000055e2987d7314 in main (argc=<optimized out>, argv=<optimized out>) at dnsmasq.c:1070
        t = <optimized out>
        timeout = -1
        now = 1660925731
        sigact = {__sigaction_handler = {sa_handler = 0x1, sa_sigaction = 0x1}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, 
          sa_restorer = 0x0}
        if_tmp = <optimized out>
        piperead = 12
        pipefd = {12, 13}
        err_pipe = {-1707224336, -1}
        ent_pw = <optimized out>
        script_uid = 0
        script_gid = 0
        gp = <optimized out>
        i = 3
        max_fd = <optimized out>
        baduser = <optimized out>
        log_err = 0
        chown_warn = 0
        hdr = <optimized out>
        data = <optimized out>
        bound_device = 0x0
        did_bind = <optimized out>
        context = <optimized out>
        relay = <optimized out>
        tftp_prefix_missing = <optimized out>

(gdb) p $d->servers
$8 = (struct server *) 0x55e29a3eaa70
(gdb) p $d->servers->next
$9 = (struct server *) 0x55e29a3ea4d0
(gdb) p $d->servers->next->next
$10 = (struct server *) 0x55e29a3e9e00
(gdb) p $d->servers->next->next->next
$11 = (struct server *) 0x0
(gdb) p *$d->servers
$12 = {addr = {sa = {sa_family = 10, sa_data = "\000\065\000\000\000\000& \000R\000\000\001\342"}, in = {sin_family = 10, sin_port = 13568, 
      sin_addr = {s_addr = 0}, sin_zero = "& \000R\000\000\001\342"}, in6 = {sin6_family = 10, sin6_port = 13568, sin6_flowinfo = 0, 
      sin6_addr = {__in6_u = {__u6_addr8 = "& \000R\000\000\001\342\000\000\000\000\000\000\000\020", __u6_addr16 = {8230, 20992, 0, 57857, 
            0, 0, 0, 4096}, __u6_addr32 = {1375739942, 3791716352, 0, 268435456}}}, sin6_scope_id = 0}}, source_addr = {sa = {
      sa_family = 10, sa_data = '\000' <repeats 13 times>}, in = {sin_family = 10, sin_port = 0, sin_addr = {s_addr = 0}, 
      sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 10, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
          __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, 
      sin6_scope_id = 0}}, interface = '\000' <repeats 16 times>, ifindex = 0, sfd = 0x0, 
  domain = 0x55e29a3ea880 "api.kni-qe-12.lab.eng.rdu2.redhat.com", flags = 12, tcpfd = 0, edns_pktsz = 0, pktsz_reduced = 0, queries = 0, 
  failed_queries = 0, uid = 0, serv_domain = 0x55e29a3eab10, next = 0x55e29a3ea4d0}
(gdb) p *$d->servers->next
$13 = {addr = {sa = {sa_family = 10, sa_data = "\000\065\000\000\000\000& \000R\000\000\001\342"}, in = {sin_family = 10, sin_port = 13568, 
      sin_addr = {s_addr = 0}, sin_zero = "& \000R\000\000\001\342"}, in6 = {sin6_family = 10, sin6_port = 13568, sin6_flowinfo = 0, 
      sin6_addr = {__in6_u = {__u6_addr8 = "& \000R\000\000\001\342\000\000\000\000\000\000\000\020", __u6_addr16 = {8230, 20992, 0, 57857, 
            0, 0, 0, 4096}, __u6_addr32 = {1375739942, 3791716352, 0, 268435456}}}, sin6_scope_id = 0}}, source_addr = {sa = {
      sa_family = 10, sa_data = '\000' <repeats 13 times>}, in = {sin_family = 10, sin_port = 0, sin_addr = {s_addr = 0}, 
      sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 10, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
          __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, 
      sin6_scope_id = 0}}, interface = '\000' <repeats 16 times>, ifindex = 0, sfd = 0x0, 
  domain = 0x55e29a3ea490 "api-int.kni-qe-12.lab.eng.rdu2.redhat.com", flags = 12, tcpfd = 0, edns_pktsz = 0, pktsz_reduced = 0, 
  queries = 0, failed_queries = 0, uid = 0, serv_domain = 0x55e29a3ea570, next = 0x55e29a3e9e00}
(gdb) p *$d->servers->next->next
$14 = {addr = {sa = {sa_family = 10, sa_data = "\000\065\000\000\000\000& \000R\000\000\001\342"}, in = {sin_family = 10, sin_port = 13568, 
      sin_addr = {s_addr = 0}, sin_zero = "& \000R\000\000\001\342"}, in6 = {sin6_family = 10, sin6_port = 13568, sin6_flowinfo = 0, 
      sin6_addr = {__in6_u = {__u6_addr8 = "& \000R\000\000\001\342\000\000\000\000\000\000\000\020", __u6_addr16 = {8230, 20992, 0, 57857, 
            0, 0, 0, 4096}, __u6_addr32 = {1375739942, 3791716352, 0, 268435456}}}, sin6_scope_id = 0}}, source_addr = {sa = {
      sa_family = 10, sa_data = '\000' <repeats 13 times>}, in = {sin_family = 10, sin_port = 0, sin_addr = {s_addr = 0}, 
      sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 10, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
          __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, 
      sin6_scope_id = 0}}, interface = '\000' <repeats 16 times>, ifindex = 0, sfd = 0x0, 
  domain = 0x55e29a3e9760 "apps.kni-qe-12.lab.eng.rdu2.redhat.com", flags = 12, tcpfd = 0, edns_pktsz = 0, pktsz_reduced = 0, queries = 0, 
  failed_queries = 0, uid = 0, serv_domain = 0x55e29a3e9ea0, next = 0x0}

(gdb) p *firstsentto 
$2 = {addr = {sa = {sa_family = 22096, sa_data = ">\232\342U\000\000 \214,B\350\177\000"}, in = {sin_family = 22096, sin_port = 39486, 
      sin_addr = {s_addr = 21986}, sin_zero = " \214,B\350\177\000"}, in6 = {sin6_family = 22096, sin6_port = 39486, sin6_flowinfo = 21986, 
      sin6_addr = {__in6_u = {__u6_addr8 = " \214,B\350\177\000\000\000\000\000\000\336\255\276\357", __u6_addr16 = {35872, 16940, 32744, 
            0, 0, 0, 44510, 61374}, __u6_addr32 = {1110215712, 32744, 0, 4022250974}}}, sin6_scope_id = 0}}, source_addr = {sa = {
      sa_family = 10, sa_data = '\000' <repeats 13 times>}, in = {sin_family = 10, sin_port = 0, sin_addr = {s_addr = 0}, 
      sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 10, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
          __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, 
      sin6_scope_id = 0}}, interface = '\000' <repeats 16 times>, ifindex = 0, sfd = 0x0, domain = 0x0, flags = 257, tcpfd = 0, 
  edns_pktsz = 4096, pktsz_reduced = 0, queries = 2, failed_queries = 0, uid = 2152975652, serv_domain = 0x55e29a3e5220, next = 0x0}


When stepping through it in a debugger, it turned out to be a never-ending loop. It loops in src/forward.c, in the function forward_query: it was supposed to exit the loop started on line 495 via the condition on line 568. But by a mechanism I do not yet understand, firstsentto became invalid after a new set of servers was read from resolv.conf. firstsentto becomes an invalid pointer that is not in the dnsmasq_daemon->servers list, but this loop expects firstsentto to always be one of those servers. What is strange is that dnsmasq is single-threaded; I have not yet found how the variable assigned on line 453 was invalidated.

Unfortunately, dnsmasq on RHEL 8 does not produce a core dump when I send it SIGABRT, so the core dump itself cannot be attached.

Comment 16 juhsu 2022-08-22 17:50:10 UTC
Does this problem also occur on ACM 2.6?

Comment 17 Petr Menšík 2022-08-22 20:14:30 UTC
I think I may have found the reason. I haven't been able to reliably reproduce it, but it matches the current observations. In the change for bug #1919894 (release 21) I added new records, and I think it does not completely handle the case of the server set changing via /etc/resolv.conf (or D-Bus); that can leave an invalid non-NULL last_server pointer in a serv_domain record.

Under some conditions that pointer can propagate to the start pointer in the forward_query function, which may result in the invalid firstsentto pointer seen in the debug session.

Comment 23 Marius Cornea 2022-09-21 12:54:05 UTC
I wasn't able to reproduce the issue on OCP 4.11.5, which includes the fixed dnsmasq, so I am closing this BZ.

