Bug 1910876
| Summary: | RHOSP 16.1.2 redis-server (redis-5.0.3-2 ) uses 100% CPU on master controller node | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | XinhuaLi <xili> |
| Component: | redis | Assignee: | Matthias Runge <mrunge> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Leonid Natapov <lnatapov> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 16.1 (Train) | CC: | apevec, csibbitt, lhh, nathans, pleimer |
| Target Milestone: | z5 | Keywords: | Triaged, ZStream |
| Target Release: | 16.1 (Train on RHEL 8.2) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-10 10:52:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description of problem: Symptom: The Redis-server uses 100% CPU on Master controller node always -------------------------------------------------------------------------------------------- redis master runs on controller-1: * Container bundle set: redis-bundle [cluster.common.tag/openstack-redis:pcmklatest]: * redis-bundle-0 (ocf::heartbeat:redis): Slave controller-0 * redis-bundle-1 (ocf::heartbeat:redis): Master controller-1 * redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2 redis-server uses almost 100% always: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 874489 42460 20 0 793404 156320 4228 R 99.3 0.0 18136:51 redis-server [root@controller-1 ~]# ps -e -o pid,%cpu,cmd --sort=-%cpu | grep redis-server 874489 98.9 /usr/bin/redis-server 172.31.200.22:6379 Other nodes, such as controller-0/2,redis-server use less that <1% CPU: [root@controller-0 redis]# ps -e -o pid,%cpu,cmd --sort=-%cpu | grep redis 561619 0.4 /usr/bin/redis-server 172.31.200.21:6379 -------------------------------------------------------------------------------------------- The redis logs look normal -------------------------------------------------------------------------------------------- 89:M 21 Dec 2020 14:41:16.085 * 10000 changes in 60 seconds. Saving... 89:M 21 Dec 2020 14:41:16.088 * Background saving started by pid 918207 918207:C 21 Dec 2020 14:41:16.627 * DB saved on disk 918207:C 21 Dec 2020 14:41:16.630 * RDB: 45 MB of memory used by copy-on-write 89:M 21 Dec 2020 14:41:16.733 * Background saving terminated with success -------------------------------------------------------------------------------------------- redis-server consumes the CPU at user space -------------------------------------------------------------------------------------------- [root@controller-1 ~]# pidstat -p 874489 1 10 Linux 4.18.0-193.19.1.el8_2.x86_64 (controller-1) 12/23/2020 _x86_64_ (80 CPU) 10:31:59 AM UID PID %usr %system %guest %wait %CPU CPU Command 10:32:00 AM 42460 874489 94.00 5.00 0.00 0.00 99.00 73 redis-server 10:32:05 AM 42460 874489 100.00 0.00 0.00 0.00 100.00 33 redis-server -------------------------------------------------------------------------------------------- 3.3 we can see errno 32 from podman logs -------------------------------------------------------------------------------------------- [root@controller-1 ~]# podman logs -f redis-bundle-podman-1 (qb_ipcs_event_sendv) warning: new_event_notification (/dev/shm/qb-7-114143-15-PlloOw/qb): Broken pipe (32) (send_client_notify) info: Could not notify client proxy-cib_rw-114143-e2c6e977/e2c6e977-2e0d-498f-8f0e-8adf40fb168e: Disconnected | rc=-32 (qb_ipcs_event_sendv) warning: new_event_notification (/dev/shm/qb-7-142495-15-ZF1GiP/qb): Broken pipe (32) (send_client_notify) info: Could not notify client proxy-cib_rw-142495-ccdd8440/ccdd8440-e9a1-474a-a354-fa083543f2a4: Disconnected | rc=-32 -------------------------------------------------------------------------------------------- The stack file is empty -------------------------------------------------------------------------------------------- cat /proc/874489/stack -------------------------------------------------------------------------------------------- Version-Release number of selected component (if applicable): Detail as below: -------------------------------------------------------------------------------------------- cluster.common.tag/openstack-redis pcmklatest c5816debd336 2 months ago 529 MB director.ctlplane.localdomain:8787/osp16-rhel8/openstack-redis 16.1 c5816debd336 2 months ago 529 MB -------------------------------------------------------------------------------------------- RHEL, RHOSP, Kernel version -------------------------------------------------------------------------------------------- Red Hat Enterprise Linux release 8.2 (Ootpa) Red Hat OpenStack Platform release 16.1.2 GA (Train) Linux controller-1 4.18.0-193.19.1.el8_2.x86_64 #1 SMP Wed Aug 26 15:29:02 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux -------------------------------------------------------------------------------------------- /var/lib/config-data/puppet-generated/ceilometer/etc/ceilometer/ceilometer.conf:341:backend_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/ /var/lib/config-data/puppet-generated/aodh/etc/aodh/aodh.conf:283:backend_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/ /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg:283:listen redis /var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:176:driver=redis /var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:298:redis_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/ /var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:624:coordination_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/ -------------------------------------------------------------------------------------------- How reproducible: Check the CPU status and the symptom can be observed always Steps to Reproduce: Deploy RHOSP 16.1.2 with redis Actual results: redis-server consume almost 100 % CPU Expected results: redis-server should not consume 100 % CPU Additional info: 1 strace, lsof info, we can see that redis is writing to socket FD which connected to "Current DC: controller-2" -------------------------------------------------------------------------------------------- # grep write strace.txt |wc -l 112253 -------------------------------------------------------------------------------------------- # vim strace.txt and check write(146, "*3\r\n$36\r\n7de4010f-a272-4e7e-a936"..., 90) = 90 ...... write(119, "*3\r\n$36\r\n0e4e4fe6-aa79-4f0e-a734"..., 90) = 90 ...... write(88, "*3\r\n$36\r\n40cfea0d-64bf-4594-ad51"..., 828) = 828 -------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------- ...... redis-ser 874489 42460 88u IPv4 2409162774 0t0 TCP controller-1.localdomain:redis->controller-2.localdomain:43314 (ESTABLISHED) redis-ser 874489 42460 119u IPv4 2407031711 0t0 TCP controller-1.localdomain:redis->controller-2.localdomain:39778 (ESTABLISHED) redis-ser 874489 42460 146u IPv4 2407779370 0t0 TCP controller-1.localdomain:redis->controller-2.localdomain:60902 (ESTABLISHED) ...... -------------------------------------------------------------------------------------------- 2, ceilometer, aodh , gnocchi are configured to as redis. -------------------------------------------------------------------------------------------- /var/lib/config-data/puppet-generated/ceilometer/etc/ceilometer/ceilometer.conf:341:backend_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/ /var/lib/config-data/puppet-generated/aodh/etc/aodh/aodh.conf:283:backend_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/ /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg:283:listen redis /var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:176:driver=redis /var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:298:redis_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/ /var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:624:coordination_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/ -------------------------------------------------------------------------------------------- 3, perf -------------------------------------------------------------------------------------------- 13.53 │4a2a0: shr $0x3,%dl ▒ 1.73 │4a2a3: movzbl %dl,%edx ▒ │4a2a6: ↑ jmp 4a243 <addReply@@Base+0xc3> ▒ │4a2a8: nopl 0x0(%rax,%rax,1) ▒ │4a2b0: mov -0x11(%rsi),%rdx ▒ │4a2b4: ↑ jmp 4a243 <addReply@@Base+0xc3> ▒ │4a2b6: nopw %cs:0x0(%rax,%rax,1) ▒ │4a2c0: mov -0x9(%rsi),%edx ▒ │4a2c3: ↑ jmpq 4a21a <addReply@@Base+0x9a> ▒ │4a2c8: nopl 0x0(%rax,%rax,1) ▒ │4a2d0: movzwl -0x5(%rsi),%edx ▒ │4a2d4: ↑ jmpq 4a21a <addReply@@Base+0x9a> ▒ │4a2d9: nopl 0x0(%rax) ▒ 12.30 │4a2e0: movzbl -0x3(%rsi),%edx - 74.88% 0.00% 0 redis-server redis-server [.] aeMain ▒ - aeMain ▒ - 74.17% aeProcessEvents ▒ - 72.62% processInputBuffer ▒ - 72.55% processCommand ▒ - call ▒ - 72.52% evalGenericCommand ▒ - 71.81% lua_pcall ▒ 0x55836b51ce70 ▒ 0x55836b51c025 ▒ - 0x55836b51ccdc ▒ - 71.42% 0x55836b52620c ▒ - 0x55836b51c76c ▒ - 70.88% luaRedisGenericCommand ▒ + 40.91% redisProtocolToLuaType_MultiBulk ▒ - 28.80% call ▒ - 25.71% lrangeCommand ▒ + 13.31% addReplyBulkCBuffer ▒ + 8.44% addReply ▒ + 2.92% quicklistNext ▒ 0.85% listTypeNext ▒ + 2.75% keysCommand ▒ 0.67% sdscatlen -------------------------------------------------------------------------------------------- 4, i am assuming that there could be batch of data have to be written to socket and then report to "Current DC: controller-2"; so addReply keep working and hot cpu. May be we can switch master redis-server to "Current DC: controller-2"? not sure. or should we improve redis-server codes for multi-thread, asynchronous write etc ..? Regards, Sam