Description of problem:
On a setup where a layered install of rhgs-server and rhgs-samba has been done on top of RHEL 7.2, running I/O on a CIFS mount along with several other Samba server-side test cases multiple times produced a systemd-journald crash.

****************************************
The backtrace is as follows:

Core was generated by `/usr/lib/systemd/systemd-journald'.
Program terminated with signal 6, Aborted.
#0  0x00007f4c50f0408d in __GI_readlinkat (fd=-100, path=0x7ffc36e52b70 "/proc/1/exe", buf=0x7f4c527739d0 "/usr/lib/systemd/systemd", len=99) at ../sysdeps/unix/sysv/linux/readlinkat.c:45
45        result = INLINE_SYSCALL (readlinkat, 4, fd, path, buf, len);
(gdb) bt
#0  0x00007f4c50f0408d in __GI_readlinkat (fd=-100, path=0x7ffc36e52b70 "/proc/1/exe", buf=0x7f4c527739d0 "/usr/lib/systemd/systemd", len=99) at ../sysdeps/unix/sysv/linux/readlinkat.c:45
#1  0x00007f4c52334257 in __readlinkat_alias () at /usr/include/bits/unistd.h:185
#2  readlinkat_malloc (p=p@entry=0x7ffc36e52b70 "/proc/1/exe", ret=ret@entry=0x7ffc36e52c78, fd=-100) at src/shared/util.c:1010
#3  0x00007f4c52334342 in readlink_malloc (p=<optimized out>, ret=<optimized out>) at src/shared/util.c:1029
#4  get_process_link_contents (name=0x7ffc36e52c78, proc_file=0x7ffc36e52b70 "/proc/1/exe") at src/shared/util.c:838
#5  get_process_exe (pid=<optimized out>, name=name@entry=0x7ffc36e52c78) at src/shared/util.c:853
#6  0x00007f4c52319688 in dispatch_message_real.4064 (s=s@entry=0x7ffc36e53520, iovec=iovec@entry=0x7f4c52777500, n=13, n@entry=9, m=m@entry=66, ucred=ucred@entry=0x7ffc36e532e0, tv=tv@entry=0x7ffc36e532c0, label=label@entry=0x7ffc36e53300 "system_u:system_r:init_t:s0", label_len=label_len@entry=28, unit_id=unit_id@entry=0x0, priority=priority@entry=27, object_pid=object_pid@entry=0) at src/journal/journald-server.c:597
#7  0x00007f4c5233ebcf in server_dispatch_message (s=s@entry=0x7ffc36e53520, iovec=0x7f4c52777500, n=n@entry=9, m=66, ucred=ucred@entry=0x7ffc36e532e0, tv=tv@entry=0x7ffc36e532c0, label=label@entry=0x7ffc36e53300 "system_u:system_r:init_t:s0", label_len=label_len@entry=28, unit_id=unit_id@entry=0x0, priority=priority@entry=27, object_pid=0) at src/journal/journald-server.c:917
#8  0x00007f4c5232d245 in server_process_native_message (s=s@entry=0x7ffc36e53520, buffer=<optimized out>, buffer_size=233, ucred=ucred@entry=0x7ffc36e532e0, tv=tv@entry=0x7ffc36e532c0, label=label@entry=0x7ffc36e53300 "system_u:system_r:init_t:s0", label_len=label_len@entry=28) at src/journal/journald-native.c:286
#9  0x00007f4c5232da0e in server_process_datagram (es=<optimized out>, fd=4, revents=<optimized out>, userdata=0x7ffc36e53520) at src/journal/journald-server.c:1211
#10 0x00007f4c5232ee40 in source_dispatch (s=s@entry=0x7f4c52765480) at src/libsystemd/sd-event/sd-event.c:2115
#11 0x00007f4c5232ffba in sd_event_dispatch (e=e@entry=0x7f4c52765190) at src/libsystemd/sd-event/sd-event.c:2472
#12 0x00007f4c52314fac in sd_event_run (timeout=18446744073709551615, e=0x7f4c52765190) at src/libsystemd/sd-event/sd-event.c:2501
#13 main (argc=<optimized out>, argv=<optimized out>) at src/journal/journald.c:109

Version-Release number of selected component (if applicable):
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)

How reproducible:
Seen once

Steps to Reproduce:
1. Do a layered install of rhgs-server and rhgs-samba on top of RHEL 7.2.
2. Start the automation suite on a CIFS mount, with test cases such as mkdir, creating files, dd to create a 1 GB file, renames, deletes on the mount, and creating/deleting volumes multiple times.
3. Check for failures, crashes, and errors in logs.

Actual results:
There is a systemd crash. There was also an OOM kill invoked against smbd, for which a separate BZ has been raised.

Expected results:
systemd should not crash.

Additional info:
Sosreports and the core dump will be uploaded soon.
It looks like that part of the disk was not accessible and readlinkat() hung. As a result, journald did not ping the watchdog, so systemd killed it and started it again. If there was some other issue with I/O, this is expected behavior.
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.