Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1910876

Summary: RHOSP 16.1.2 redis-server (redis-5.0.3-2 ) uses 100% CPU on master controller node
Product: Red Hat OpenStack Reporter: XinhuaLi <xili>
Component: redisAssignee: Matthias Runge <mrunge>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Leonid Natapov <lnatapov>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train)CC: apevec, csibbitt, lhh, nathans, pleimer
Target Milestone: z5Keywords: Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-10 10:52:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description XinhuaLi 2020-12-25 10:19:02 UTC
Description of problem:

Symptom: The Redis-server uses 100% CPU on Master controller node always 

--------------------------------------------------------------------------------------------
redis master runs on controller-1:
  * Container bundle set: redis-bundle [cluster.common.tag/openstack-redis:pcmklatest]:
    * redis-bundle-0    (ocf::heartbeat:redis): Slave controller-0
    * redis-bundle-1    (ocf::heartbeat:redis): Master controller-1
    * redis-bundle-2    (ocf::heartbeat:redis): Slave controller-2

redis-server uses almost 100% always:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 874489 42460     20   0  793404 156320   4228 R  99.3   0.0  18136:51 redis-server

[root@controller-1 ~]#  ps -e -o pid,%cpu,cmd --sort=-%cpu | grep redis-server
 874489 98.9 /usr/bin/redis-server 172.31.200.22:6379

Other nodes, such as controller-0/2,redis-server use less that <1% CPU:
[root@controller-0 redis]# ps -e -o pid,%cpu,cmd --sort=-%cpu | grep redis
 561619  0.4 /usr/bin/redis-server 172.31.200.21:6379
--------------------------------------------------------------------------------------------

The redis logs look normal 
--------------------------------------------------------------------------------------------
89:M 21 Dec 2020 14:41:16.085 * 10000 changes in 60 seconds. Saving...
89:M 21 Dec 2020 14:41:16.088 * Background saving started by pid 918207
918207:C 21 Dec 2020 14:41:16.627 * DB saved on disk
918207:C 21 Dec 2020 14:41:16.630 * RDB: 45 MB of memory used by copy-on-write
89:M 21 Dec 2020 14:41:16.733 * Background saving terminated with success
--------------------------------------------------------------------------------------------

redis-server consumes the CPU at user space 
--------------------------------------------------------------------------------------------
[root@controller-1 ~]# pidstat -p 874489 1 10
Linux 4.18.0-193.19.1.el8_2.x86_64 (controller-1)       12/23/2020      _x86_64_        (80 CPU)
10:31:59 AM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:32:00 AM 42460    874489   94.00    5.00    0.00    0.00   99.00    73  redis-server
10:32:05 AM 42460    874489  100.00    0.00    0.00    0.00  100.00    33  redis-server
--------------------------------------------------------------------------------------------

3.3 we can see errno 32 from podman logs
--------------------------------------------------------------------------------------------
[root@controller-1 ~]# podman logs -f  redis-bundle-podman-1
(qb_ipcs_event_sendv)   warning: new_event_notification (/dev/shm/qb-7-114143-15-PlloOw/qb): Broken pipe (32)
(send_client_notify)    info: Could not notify client proxy-cib_rw-114143-e2c6e977/e2c6e977-2e0d-498f-8f0e-8adf40fb168e: Disconnected | rc=-32
(qb_ipcs_event_sendv)   warning: new_event_notification (/dev/shm/qb-7-142495-15-ZF1GiP/qb): Broken pipe (32)
(send_client_notify)    info: Could not notify client proxy-cib_rw-142495-ccdd8440/ccdd8440-e9a1-474a-a354-fa083543f2a4: Disconnected | rc=-32
--------------------------------------------------------------------------------------------

The stack file is empty 
--------------------------------------------------------------------------------------------
cat /proc/874489/stack 
--------------------------------------------------------------------------------------------

Version-Release number of selected component (if applicable):
Detail as below:
--------------------------------------------------------------------------------------------
cluster.common.tag/openstack-redis                                                    pcmklatest   c5816debd336   2 months ago   529 MB
director.ctlplane.localdomain:8787/osp16-rhel8/openstack-redis        16.1             c5816debd336   2 months ago   529 MB
--------------------------------------------------------------------------------------------

RHEL, RHOSP, Kernel version 
--------------------------------------------------------------------------------------------
Red Hat Enterprise Linux release 8.2 (Ootpa)
Red Hat OpenStack Platform release 16.1.2 GA (Train)
Linux controller-1 4.18.0-193.19.1.el8_2.x86_64 #1 SMP Wed Aug 26 15:29:02 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
--------------------------------------------------------------------------------------------
/var/lib/config-data/puppet-generated/ceilometer/etc/ceilometer/ceilometer.conf:341:backend_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/
/var/lib/config-data/puppet-generated/aodh/etc/aodh/aodh.conf:283:backend_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/
/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg:283:listen redis
/var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:176:driver=redis
/var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:298:redis_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/
/var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:624:coordination_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/
--------------------------------------------------------------------------------------------

How reproducible:
Check the CPU status and the symptom can be observed always

Steps to Reproduce:
Deploy RHOSP 16.1.2 with redis 

Actual results:
redis-server consume almost 100 % CPU 

Expected results:
redis-server should not consume 100 % CPU

Additional info:
1 strace, lsof info, we can see that redis is writing to socket FD which connected to "Current DC: controller-2"
--------------------------------------------------------------------------------------------
# grep write strace.txt |wc -l
112253
--------------------------------------------------------------------------------------------
# vim strace.txt and check  

write(146, "*3\r\n$36\r\n7de4010f-a272-4e7e-a936"..., 90) = 90
......
write(119, "*3\r\n$36\r\n0e4e4fe6-aa79-4f0e-a734"..., 90) = 90
......
write(88, "*3\r\n$36\r\n40cfea0d-64bf-4594-ad51"..., 828) = 828
--------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------
......
redis-ser  874489    42460   88u     IPv4         2409162774        0t0        TCP controller-1.localdomain:redis->controller-2.localdomain:43314 (ESTABLISHED)
redis-ser  874489    42460  119u     IPv4         2407031711        0t0        TCP controller-1.localdomain:redis->controller-2.localdomain:39778 (ESTABLISHED)
redis-ser  874489    42460  146u     IPv4         2407779370        0t0        TCP controller-1.localdomain:redis->controller-2.localdomain:60902 (ESTABLISHED)
......
--------------------------------------------------------------------------------------------

2, ceilometer, aodh , gnocchi are configured to as redis.
--------------------------------------------------------------------------------------------
/var/lib/config-data/puppet-generated/ceilometer/etc/ceilometer/ceilometer.conf:341:backend_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/
/var/lib/config-data/puppet-generated/aodh/etc/aodh/aodh.conf:283:backend_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/
/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg:283:listen redis
/var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:176:driver=redis
/var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:298:redis_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/
/var/lib/config-data/puppet-generated/gnocchi/etc/gnocchi/gnocchi.conf:624:coordination_url=redis://:bEh4lLffaVEGgDvFJb8eJaAy1.200.99:6379/
--------------------------------------------------------------------------------------------

3, perf 
--------------------------------------------------------------------------------------------
 13.53 │4a2a0:   shr     $0x3,%dl                                                                                                                                                                               ▒
  1.73 │4a2a3:   movzbl  %dl,%edx                                                                                                                                                                               ▒
          │4a2a6: ↑ jmp     4a243 <addReply@@Base+0xc3>                                                                                                                                                            ▒
         │4a2a8:   nopl    0x0(%rax,%rax,1)                                                                                                                                                                       ▒
       │4a2b0:   mov     -0x11(%rsi),%rdx                                                                                                                                                                       ▒
       │4a2b4: ↑ jmp     4a243 <addReply@@Base+0xc3>                                                                                                                                                            ▒
       │4a2b6:   nopw    %cs:0x0(%rax,%rax,1)                                                                                                                                                                   ▒
       │4a2c0:   mov     -0x9(%rsi),%edx                                                                                                                                                                        ▒
       │4a2c3: ↑ jmpq    4a21a <addReply@@Base+0x9a>                                                                                                                                                            ▒
       │4a2c8:   nopl    0x0(%rax,%rax,1)                                                                                                                                                                       ▒
       │4a2d0:   movzwl  -0x5(%rsi),%edx                                                                                                                                                                        ▒
       │4a2d4: ↑ jmpq    4a21a <addReply@@Base+0x9a>                                                                                                                                                            ▒
       │4a2d9:   nopl    0x0(%rax)                                                                                                                                                                              ▒
 12.30 │4a2e0:   movzbl  -0x3(%rsi),%edx 

-   74.88%     0.00%             0  redis-server  redis-server        [.] aeMain                                                                                                                                ▒
   - aeMain                                                                                                                                                                                                     ▒
      - 74.17% aeProcessEvents                                                                                                                                                                                  ▒
         - 72.62% processInputBuffer                                                                                                                                                                            ▒
            - 72.55% processCommand                                                                                                                                                                             ▒
               - call                                                                                                                                                                                           ▒
                  - 72.52% evalGenericCommand                                                                                                                                                                   ▒
                     - 71.81% lua_pcall                                                                                                                                                                         ▒
                          0x55836b51ce70                                                                                                                                                                        ▒
                          0x55836b51c025                                                                                                                                                                        ▒
                        - 0x55836b51ccdc                                                                                                                                                                        ▒
                           - 71.42% 0x55836b52620c                                                                                                                                                              ▒
                              - 0x55836b51c76c                                                                                                                                                                  ▒
                                 - 70.88% luaRedisGenericCommand                                                                                                                                                ▒
                                    + 40.91% redisProtocolToLuaType_MultiBulk                                                                                                                                   ▒
                                    - 28.80% call                                                                                                                                                               ▒
                                       - 25.71% lrangeCommand                                                                                                                                                   ▒
                                          + 13.31% addReplyBulkCBuffer                                                                                                                                          ▒
                                          + 8.44% addReply                                                                                                                                                      ▒
                                          + 2.92% quicklistNext                                                                                                                                                 ▒
                                            0.85% listTypeNext                                                                                                                                                  ▒
                                       + 2.75% keysCommand                                                                                                                                                      ▒
                                      0.67% sdscatlen                                                       
--------------------------------------------------------------------------------------------
4, i am assuming that there could be batch of data have to be written to socket and then report to "Current DC: controller-2"; so addReply keep working and hot cpu.
May be we can switch master redis-server to "Current DC: controller-2"? not sure.  or should we improve redis-server codes for multi-thread, asynchronous write etc ..?

Regards,
Sam