Created attachment 1212850 [details]
dmesg

Description of problem:

A system running kernel-3.10.0-327.36.1.rt56.237.el7.x86_64 ran into problems after a few minutes of uptime. A running SSH session was still usable, but no other connections were accepted; 'Write failed: Broken pipe' occurred instead. In the running SSH session, I ran top and ps and found that there was an increasing number of zombie sshd and crond processes. Eventually, I noticed a traceback on the system console and also in dmesg. I'm attaching the relevant part of the dmesg output.

A reboot didn't help; the system ran into the same problems within a few minutes. I've now booted a non-RT kernel (3.10.0-327.36.2.el7.x86_64) and everything has been OK so far.
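For reference, a generic way to enumerate such zombies and their parents from a shell (not a command from the original report, just one way to see the same thing top showed):

$ ps -eo stat,ppid,pid,comm | awk '$1 ~ /^Z/'   # list zombie (Z-state) processes

The PPID column identifies the parent that is failing to reap its children.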
I might add that this happened on a test system with nearly every package in the distro installed. I've installed another system with an almost minimal package set, and it's been running smoothly so far. I'm going to keep it up and running during the weekend.
(In reply to Radek Bíba from comment #2)
> I might add that this happened on a test system with nearly every package in
> the distro installed. I've installed another system with an almost minimal
> package set, and it's been running smoothly so far. I'm going to keep it up
> and running during the weekend.

Is that other machine also a KVM guest?
(In reply to Jiri Kastner from comment #3)
> Is that other machine also a KVM guest?

Yes: https://beaker.engineering.redhat.com/view/bkr-hv03-guest26.dsal.lab.eng.bos.redhat.com#details. It's doing just fine, BTW.
Hi Radek,

I have some questions about this problem:

Is it reproducible? If it is, could you please try to reproduce the problem and then crash the system to generate a vmcore?

As far as I understood, it took place in a virtual machine; is that correct?

-- Daniel
Hi Daniel,

Yes, this problem has been reliably reproducible so far; the RT kernel "gets sick" after a while every time. I booted the kernel, crashed it (echo c > /proc/sysrq-trigger; I hope that's a good way to do it in this case), and got the crash data in /var/crash/127.0.0.1-2016-10-27-06\:53\:14/. The vmcore file is more than a hundred MB, and using "makedumpfile -c -d 31" didn't help much, so I created a zip archive from the full vmcore (deflated 49%). You can get the zip archive as well as the accompanying vmcore-dmesg.txt file from here: http://nest.test.redhat.com/mnt/qa/scratch/rbiba/kernel-rt-bz1387661/ (The host is only accessible inside the VPN.)

The answer to the third question is yes: the system is a virtual machine. I currently don't have access to the hypervisor, though.

HTH.
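For reference, this is roughly the sequence used above; the output filename in the last step is illustrative, and dump level 31 excludes zero, cache, user, and free pages:

# echo 1 > /proc/sys/kernel/sysrq         # make sure sysrq is enabled
# echo c > /proc/sysrq-trigger            # force a crash; kdump writes to /var/crash/...
# makedumpfile -c -d 31 vmcore vmcore.small   # recompress/filter an existing full vmcore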
vmcore analysis:

vmcore at:
  ssh YOURUSER.rdu2.redhat.com
  ...
  $ retrace-server-interact 483612848 crash

Analysis:

UN (uninterruptible/D state) tasks, ordered by the last time they ran:

crash> foreach UN ps -m
[0 00:00:00.002] [UN]  PID: 1      TASK: ffff8801166b0000  CPU: 0  COMMAND: "systemd"
[0 00:00:37.578] [UN]  PID: 847    TASK: ffff8800346de9c0  CPU: 0  COMMAND: "systemd-logind"
[0 00:02:42.728] [UN]  PID: 585    TASK: ffff8800345c5240  CPU: 0  COMMAND: "systemd-journal"
[0 00:02:42.784] [UN]  PID: 148    TASK: ffff880034570000  CPU: 0  COMMAND: "kworker/0:2"

The stack of the oldest UN task is:

PID: 148    TASK: ffff880034570000  CPU: 0   COMMAND: "kworker/0:2"
 #0 [ffff88010cbfbc28] __schedule at ffffffff81627d69
 #1 [ffff88010cbfbc90] schedule at ffffffff81628340
 #2 [ffff88010cbfbca8] __rt_mutex_slowlock at ffffffff81628ca5
 #3 [ffff88010cbfbd00] rt_mutex_slowlock at ffffffff81628f1c
 #4 [ffff88010cbfbdc8] rt_mutex_lock at ffffffff816290b1
 #5 [ffff88010cbfbdd8] _mutex_lock at ffffffff81629a2e
 #6 [ffff88010cbfbde8] cgroup_free_fn at ffffffff810e86b8
 #7 [ffff88010cbfbe20] process_one_work at ffffffff81091c36
 #8 [ffff88010cbfbe68] worker_thread at ffffffff81092992
 #9 [ffff88010cbfbec8] kthread at ffffffff8109a0b1
#10 [ffff88010cbfbf50] ret_from_fork at ffffffff816321d8

The task is waiting on the mutex at 0xffffffff81973a00, which is owned by:

crash> px ((struct rt_mutex *)0xffffffff81973a00)->owner
$2 = (struct task_struct *) 0xffff8801166b0001

The second oldest UN task is:

PID: 585    TASK: ffff8800345c5240  CPU: 0   COMMAND: "systemd-journal"
 #0 [ffff880033d4bc90] __schedule at ffffffff81627d69
 #1 [ffff880033d4bcf8] schedule at ffffffff81628340
 #2 [ffff880033d4bd10] __rt_mutex_slowlock at ffffffff81628ca5
 #3 [ffff880033d4bd68] rt_mutex_slowlock at ffffffff81628f1c
 #4 [ffff880033d4be30] rt_mutex_lock at ffffffff816290b1
 #5 [ffff880033d4be40] _mutex_lock at ffffffff81629a2e
 #6 [ffff880033d4be50] proc_cgroup_show at ffffffff810eb786
 #7 [ffff880033d4be98] seq_read at ffffffff811f073a
 #8 [ffff880033d4bf08] vfs_read at ffffffff811cb2cc
 #9 [ffff880033d4bf38] sys_read at ffffffff811cbe1f
#10 [ffff880033d4bf80] system_call_fastpath at ffffffff81632289
    RIP: 00007fead7b564a0  RSP: 00007fffa5f791c8  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffffffff81632289  RCX: 0000000000000003
    RDX: 0000000000000400  RSI: 00007fead8f5c000  RDI: 0000000000000011
    RBP: 000000000000000a   R8: 00000000ffffffff   R9: 0000000000000000
    R10: 0000000000000022  R11: 0000000000000246  R12: 00007feadada9410
    R13: 0000000000000000  R14: 0000000000000000  R15: 00007feadada9410
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

which is also waiting on the same mutex:

crash> px ((struct rt_mutex *)0xffffffff81973a00)->owner
$1 = (struct task_struct *) 0xffff8801166b0001

And the third...
PID: 847    TASK: ffff8800346de9c0  CPU: 0   COMMAND: "systemd-logind"
 #0 [ffff8800d97b7c90] __schedule at ffffffff81627d69
 #1 [ffff8800d97b7cf8] schedule at ffffffff81628340
 #2 [ffff8800d97b7d10] __rt_mutex_slowlock at ffffffff81628ca5
 #3 [ffff8800d97b7d68] rt_mutex_slowlock at ffffffff81628f1c
 #4 [ffff8800d97b7e30] rt_mutex_lock at ffffffff816290b1
 #5 [ffff8800d97b7e40] _mutex_lock at ffffffff81629a2e
 #6 [ffff8800d97b7e50] proc_cgroup_show at ffffffff810eb786
 #7 [ffff8800d97b7e98] seq_read at ffffffff811f073a
 #8 [ffff8800d97b7f08] vfs_read at ffffffff811cb2cc
 #9 [ffff8800d97b7f38] sys_read at ffffffff811cbe1f
#10 [ffff8800d97b7f80] system_call_fastpath at ffffffff81632289
    RIP: 00007f14fe0564a0  RSP: 00007ffe20a57998  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff81632289  RCX: ffffffffffffffff
    RDX: 0000000000000400  RSI: 00007f14fefb7000  RDI: 0000000000000014
    RBP: 000000000000000a   R8: 00000000ffffffff   R9: 0000000000000000
    R10: 0000000000000022  R11: 0000000000000246  R12: 00007f14ffd94dc0
    R13: 0000000000000000  R14: 0000000000000000  R15: 00007f14ffd94dc0
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

is also waiting on the same lock!

crash> px ((struct rt_mutex *)0xffffffff81973a00)->owner
$3 = (struct task_struct *) 0xffff8801166b0001

systemd-logind being blocked here is probably what is holding up logins. The owner of the mutex (after removing the trailing 1, which is the HAS_WAITERS flag) is:

crash> ps -m ffff8801166b0000
[0 00:00:00.002] [UN]  PID: 1      TASK: ffff8801166b0000  CPU: 0  COMMAND: "systemd"

which has the following stack:

crash> bt ffff8801166b0000
PID: 1      TASK: ffff8801166b0000  CPU: 0   COMMAND: "systemd"
 #0 [ffff8801166bbb68] __schedule at ffffffff81627d69
 #1 [ffff8801166bbbd0] schedule at ffffffff81628340
 #2 [ffff8801166bbbe8] schedule_timeout at ffffffff81626609
 #3 [ffff8801166bbc98] wait_for_completion at ffffffff81627244
 #4 [ffff8801166bbce8] wait_rcu_gp at ffffffff81096674
 #5 [ffff8801166bbd40] synchronize_rcu at ffffffff81114859
 #6 [ffff8801166bbd50] mem_cgroup_start_move at ffffffff811bb2ac
 #7 [ffff8801166bbd60] mem_cgroup_reparent_charges at ffffffff811be930
 #8 [ffff8801166bbdd8] mem_cgroup_css_offline at ffffffff811bedef
 #9 [ffff8801166bbe00] cgroup_destroy_locked at ffffffff810e94b5
#10 [ffff8801166bbe48] cgroup_rmdir at ffffffff810e9755
#11 [ffff8801166bbe60] vfs_rmdir at ffffffff811d74d8
#12 [ffff8801166bbe88] do_rmdir at ffffffff811dc525
#13 [ffff8801166bbf70] sys_rmdir at ffffffff811dd6d6
#14 [ffff8801166bbf80] system_call_fastpath at ffffffff81632289
    RIP: 00007f473e54cc37  RSP: 00007ffeb07803a8  RFLAGS: 00000202
    RAX: 0000000000000054  RBX: ffffffff81632289  RCX: ffffffffffffffff
    RDX: 00007f473e81e7b8  RSI: 0000000000000000  RDI: 00007f4740ff31e0
    RBP: 00007f473fc9d710   R8: 0000000000000001   R9: 00007f4741115918
    R10: 0000000000000100  R11: 0000000000000202  R12: ffffffff811dd6d6
    R13: ffff8801166bbf78  R14: 0000000000000001  R15: 00007f4740ff46e0
    ORIG_RAX: 0000000000000054  CS: 0033  SS: 002b

Systemd is doing an rmdir to destroy a cgroup. To destroy structs that contain RCU-protected data, it is waiting for an RCU grace period. But the grace period is either not happening or is being delayed.
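Side note: the HAS_WAITERS masking above can be double-checked with plain shell arithmetic on a 64-bit shell (pointer value taken from the dump):

$ printf '0x%x\n' $(( 0xffff8801166b0001 & ~1 ))   # clear bit 0, the HAS_WAITERS flag
0xffff8801166b0000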
By checking the runqueues we can see whether any RCU or timer softirq thread is being delayed, which could explain the delay of the RCU grace period:

crash> runq
CPU 0 RUNQUEUE: ffff88011fc18800
  CURRENT: PID: 7030   TASK: ffff8800d4970000  COMMAND: "bash"
  RT PRIO_ARRAY: ffff88011fc188e0
     [no tasks queued]
  CFS RB_ROOT: ffff88011fc188a0
     [120] PID: 7026   TASK: ffff8800d4975240  COMMAND: "sshd"

CPU 1 RUNQUEUE: ffff88011fd18800
  CURRENT: PID: 0      TASK: ffff880116772340  COMMAND: "swapper/1"
  RT PRIO_ARRAY: ffff88011fd188e0
     [no tasks queued]
  CFS RB_ROOT: ffff88011fd188a0
     [no tasks queued]

As there are no RT tasks queued, there is no starvation, hence we do not have an RCU stall caused by starvation.

Here is the ps -m of the runnable tasks:

crash> foreach RU ps -m
[0 00:00:00.000] [RU]  PID: 7030   TASK: ffff8800d4970000  CPU: 0  COMMAND: "bash"
[0 00:00:00.000] [RU]  PID: 7026   TASK: ffff8800d4975240  CPU: 0  COMMAND: "sshd"
[0 00:07:16.254] [RU]  PID: 0      TASK: ffffffff81936440  CPU: 0  COMMAND: "swapper/0"
[0 00:07:16.254] [RU]  PID: 0      TASK: ffff880116772340  CPU: 1  COMMAND: "swapper/1"

They had only just started to run, so the system was probably idle before that. The bash and sshd tasks are running because Radek ran echo c > /proc/sysrq-trigger.

This problem looks similar to BZ 1376184.
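A complementary check (not shown in the excerpts above) is the state of the RCU kthreads themselves; crash can pipe command output through shell utilities, so something like this lists them along with their scheduling state:

crash> ps | grep rcu

If the grace-period kthread were runnable but never getting the CPU, it would also have shown up in the runq output above, which it does not.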
Radek,

Could you please add the following parameters to the kernel command line:

"trace_buf_size=10M trace_event=rcu"

and then boot the reproducer and crash it like you did before? The trace data is important for understanding the problem.

Moreover, could you please try booting with the RHEL-RT-7.3 beta kernel to see if the same problem happens?

Best Regards,
Daniel
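For example, one way to add those parameters persistently on RHEL 7 is with grubby; the kernel path below is illustrative, so point it at the reproducer kernel you actually boot:

# grubby --update-kernel=/boot/vmlinuz-3.10.0-327.36.1.rt56.237.el7.x86_64 \
         --args="trace_buf_size=10M trace_event=rcu"    # path is illustrative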
Sure, no problem. I tried booting 3.10.0-514.rt56.420.el7.x86_64 (is that what you meant by the beta kernel?) and it was still fine after more than 15 minutes. That looks promising, doesn't it? Then I booted 3.10.0-514.rt56.420.el7 again, with the additional parameters, and it got sick soon, as usual. Here are the vmcore and vmcore-dmesg.txt files: http://nest.test.redhat.com/mnt/qa/scratch/rbiba/kernel-rt-bz138766.2/ . I'll now boot the beta kernel again and keep an eye on it. Thanks for looking into this, Daniel!
I let the system run overnight, with an SSH session and a connection to the console, and here's what I see this morning:

1 - The SSH session is still usable, but no new session can be created:

$ ssh root@HOSTNAME
ssh_exchange_identification: Connection closed by remote host

2 - The console is clear; no traceback appeared. But I can't log in:

Red Hat Enterprise Linux Server 7.2 (Maipo)
Kernel 3.10.0-514.rt56.420.el7.x86_64 on an x86_64

x86-64-7s-v1 login: root
Password:

login: timed out after 60 seconds

3 - `top' in the SSH session tells me:

top - 03:35:35 up 15:29,  3 users,  load average: 4.00, 4.01, 4.05
Tasks: 3326 total,   1 running, 795 sleeping,   0 stopped, 2530 zombie

Here are vmcore files from the system in this state in case you need them: http://nest.test.redhat.com/mnt/qa/scratch/rbiba/kernel-rt-bz138766.3/ . This kernel was booted without the extra parameters, though.

I need to make the system available again because it's a semi-production system, so I'm booting a non-RT kernel, but please let me know if you'd like me to boot the RT kernel again, keep it up a little longer, and check something.
Radek, can you reproduce the issue again, but this time add this to the kernel command line:

"trace_buf_size=10M trace_event=*:*"

This will enable all events. Try to crash it as soon as you notice that it becomes unresponsive. Thanks!
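If it's easier, the runtime equivalent of trace_event=*:* (assuming debugfs is mounted in the usual place) is to enable all events through the tracing directory:

# echo 1 > /sys/kernel/debug/tracing/events/enable   # enable every trace event
# head /sys/kernel/debug/tracing/trace               # quick check that events are flowing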
The system is now running the latest released RT kernel (released just the other day) and I can't reproduce the issue. It's been running smoothly for almost an hour.

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-514.2.2.rt56.424.el7.x86_64 root=UUID=d1e30e68-dc2c-45db-871c-4c7819f41332 ro crashkernel=auto console=ttyS0,115200 trace_buf_size=10M trace_event=*:* LANG=en_US.UTF-8

Were there any changes in that kernel that might have resolved the issue?
(In reply to Radek Bíba from comment #12)
> The system is now running the latest released RT kernel (released just the
> other day) and I can't reproduce the issue. It's been running smoothly for
> almost an hour.
>
> # cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-3.10.0-514.2.2.rt56.424.el7.x86_64
> root=UUID=d1e30e68-dc2c-45db-871c-4c7819f41332 ro crashkernel=auto
> console=ttyS0,115200 trace_buf_size=10M trace_event=*:* LANG=en_US.UTF-8
>
> Were there any changes in that kernel that might have resolved the issue?

Hard to pin down the changes, since the failing kernel is a 7.2.z kernel and the one you're running now is a 7.3.z kernel. Git shows over 18k commits of difference between them.
In talking with Clark and the RT team, we are unsure how to move forward without a way to reproduce and further debug this issue. Radek, are you okay with closing this BZ?
Yeah, let's close this bug.