Bug 1591418
Summary: | find or df command hang indefinitely | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Davor <dodevski> |
Component: | systemd | Assignee: | systemd-maint |
Status: | CLOSED DUPLICATE | QA Contact: | qe-baseos-daemons |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | 7.3 | CC: | dodevski, dtardon, jsynacek, msekleta, peter.elsner, single_08, systemd-maint-list |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-07-03 09:59:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |

Description
Davor
2018-06-14 16:53:06 UTC
Here is the list of updated OS packages:

Installed:
kernel.x86_64 0:3.10.0-862.3.2.el7
kernel-debug.x86_64 0:3.10.0-862.3.2.el7
libwayland-client.x86_64 0:1.14.0-2.el7
libwayland-server.x86_64 0:1.14.0-2.el7
llvm-private.x86_64 0:5.0.0-3.el7
lz4.x86_64 0:1.7.5-2.el7
nss-pem.i686 0:1.0.3-4.el7
nss-pem.x86_64 0:1.0.3-4.el7

Updated:
apr.x86_64 0:1.4.8-3.el7_4.1
authconfig.x86_64 0:6.2.8-30.el7
bash.x86_64 0:4.2.46-30.el7
bind-libs.x86_64 32:9.9.4-61.el7
bind-libs-lite.x86_64 32:9.9.4-61.el7
bind-license.noarch 32:9.9.4-61.el7
bind-utils.x86_64 32:9.9.4-61.el7
cpp.x86_64 0:4.8.5-28.el7_5.1
dhclient.x86_64 12:4.2.5-68.el7.centos.1
dhcp-common.x86_64 12:4.2.5-68.el7.centos.1
dhcp-libs.x86_64 12:4.2.5-68.el7.centos.1
ghostscript.x86_64 0:9.07-28.el7_4.2
glibc.i686 0:2.17-222.el7
glibc.x86_64 0:2.17-222.el7
glibc-common.x86_64 0:2.17-222.el7
graphite2.x86_64 0:1.3.10-1.el7_3
gstreamer1.x86_64 0:1.10.4-2.el7
krb5-libs.x86_64 0:1.15.1-19.el7
libICE.i686 0:1.0.9-9.el7
libICE.x86_64 0:1.0.9-9.el7
libX11.i686 0:1.6.5-1.el7
libX11.x86_64 0:1.6.5-1.el7
libX11-common.noarch 0:1.6.5-1.el7
libXaw.x86_64 0:1.0.13-4.el7
libXcursor.x86_64 0:1.1.14-8.el7
libXdmcp.x86_64 0:1.1.2-6.el7
libXfixes.x86_64 0:5.0.3-1.el7
libXfont.x86_64 0:1.5.2-1.el7
libXi.i686 0:1.7.9-1.el7
libXi.x86_64 0:1.7.9-1.el7
libXpm.x86_64 0:3.5.12-1.el7
libXrandr.x86_64 0:1.5.1-2.el7
libXrender.i686 0:0.9.10-1.el7
libXrender.x86_64 0:0.9.10-1.el7
libXt.x86_64 0:1.1.5-3.el7
libXtst.i686 0:1.2.3-1.el7
libXtst.x86_64 0:1.2.3-1.el7
libXv.x86_64 0:1.0.11-1.el7
libXxf86vm.x86_64 0:1.1.4-1.el7
libdrm.x86_64 0:2.4.83-2.el7
libepoxy.x86_64 0:1.3.1-2.el7_5
libevdev.x86_64 0:1.5.6-1.el7
libfontenc.x86_64 0:1.1.3-3.el7
libgcc.i686 0:4.8.5-28.el7_5.1
libgcc.x86_64 0:4.8.5-28.el7_5.1
libgomp.x86_64 0:4.8.5-28.el7_5.1
libgudev1.x86_64 0:219-57.el7
libnl3.x86_64 0:3.2.28-4.el7
libpcap.x86_64 14:1.5.3-11.el7
libsoup.x86_64 0:2.56.0-6.el7
libstdc++.i686 0:4.8.5-28.el7_5.1
libstdc++.x86_64 0:4.8.5-28.el7_5.1
libstdc++-devel.x86_64 0:4.8.5-28.el7_5.1
libtirpc.x86_64 0:0.2.4-0.10.el7
libvorbis.x86_64 1:1.3.3-8.el7.1
libxcb.i686 0:1.12-1.el7
libxcb.x86_64 0:1.12-1.el7
libxkbfile.x86_64 0:1.0.9-3.el7
linux-firmware.noarch 0:20180220-62.1.git6d51311.el7_5
mesa-dri-drivers.x86_64 0:17.2.3-8.20171019.el7
mesa-filesystem.x86_64 0:17.2.3-8.20171019.el7
mesa-libEGL.x86_64 0:17.2.3-8.20171019.el7
mesa-libGL.x86_64 0:17.2.3-8.20171019.el7
mesa-libgbm.x86_64 0:17.2.3-8.20171019.el7
mesa-libglapi.x86_64 0:17.2.3-8.20171019.el7
mesa-libxatracker.x86_64 0:17.2.3-8.20171019.el7
mesa-private-llvm.x86_64 0:3.9.1-3.el7
nspr.i686 0:4.19.0-1.el7_5
nspr.x86_64 0:4.19.0-1.el7_5
nss.i686 0:3.36.0-5.el7_5
nss.x86_64 0:3.36.0-5.el7_5
nss-softokn.i686 0:3.36.0-5.el7_5
nss-softokn.x86_64 0:3.36.0-5.el7_5
nss-softokn-freebl.i686 0:3.36.0-5.el7_5
nss-softokn-freebl.x86_64 0:3.36.0-5.el7_5
nss-sysinit.x86_64 0:3.36.0-5.el7_5
nss-tools.x86_64 0:3.36.0-5.el7_5
nss-util.i686 0:3.36.0-1.el7_5
nss-util.x86_64 0:3.36.0-1.el7_5
openldap.x86_64 0:2.4.44-15.el7_5
orc.x86_64 0:0.4.26-1.el7
procps-ng.x86_64 0:3.3.10-17.el7_5.2
python.x86_64 0:2.7.5-68.el7
python-libs.x86_64 0:2.7.5-68.el7
rpcbind.x86_64 0:0.2.0-44.el7
ruby.x86_64 0:2.0.0.648-33.el7_4
ruby-irb.noarch 0:2.0.0.648-33.el7_4
ruby-libs.x86_64 0:2.0.0.648-33.el7_4
rubygem-bigdecimal.x86_64 0:1.2.0-33.el7_4
rubygem-io-console.x86_64 0:0.4.2-33.el7_4
rubygem-json.x86_64 0:1.7.7-33.el7_4
rubygem-psych.x86_64 0:2.0.0-33.el7_4
rubygem-rdoc.noarch 0:4.0.0-33.el7_4
rubygems.noarch 0:2.0.14.1-33.el7_4
systemd.x86_64 0:219-57.el7
systemd-libs.x86_64 0:219-57.el7
systemd-python.x86_64 0:219-57.el7
systemd-sysv.x86_64 0:219-57.el7
tcpdump.x86_64 14:4.9.2-3.el7
xkeyboard-config.noarch 0:2.20-1.el7

This sounds like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1498318

However, you seem to be running a systemd version that should be fixed already. What happens after find gets stuck? Do you see any indication that systemd crashed?
Can you invoke some systemctl status or start/restart some system service while find is stuck? What is the output of systemctl list-jobs during that time?

Also, before you run find, what is the output of the following command (if possible gather output on the freshly rebooted system),

systemctl status proc-sys-fs-binfmt_misc.{mount,automount}

Hello Michal,

Thanks for your reply.

This issue is reproducible on multiple systems. We're using CentOS based systems deployed as virtual machines (VMware) in our lab. I did find a similar issue and the information that it should be resolved already - so I created this defect just in case there is a regression.

On your questions:

After the command gets stuck, whole terminal session is blocked - i cannot cancel/stop/kill this process or invoke any command. The only way is to login through a new SSH session for example.

Maybe it is important that we *didn't reboot* the system after packages update - our shell script is using find right after the packages got updated.

My colleague had success when he invoked:
systemctl restart proc-sys-fs-binfmt_misc.mount, which caused some exception and then returned him to the shell.

In my case, this restart did not resolve any issue - multiple invoked find processes were still there (in S, D state, per 'ps aux')

Before running find, here is the output of command you requested:

[root@s3a ~]# systemctl status proc-sys-fs-binfmt_misc.mount
● proc-sys-fs-binfmt_misc.mount - Arbitrary Executable File Formats File System
   Loaded: loaded (/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.mount; static; vendor preset: disabled)
   Active: inactive (dead) since Fri 2018-06-15 08:42:48 GMT; 9min ago
    Where: /proc/sys/fs/binfmt_misc
     What: binfmt_misc
     Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

Jun 15 08:04:05 localhost.localdomain systemd[1]: Mounted Arbitrary Executable File Formats File System.
Jun 15 08:42:48 s3a.lab.pst systemd[1]: Unmounting Arbitrary Executable File Formats File System...
Jun 15 08:42:48 s3a.lab.pst systemd[1]: Unmounted Arbitrary Executable File Formats File System.

I suppose this *inactive* is causing problems.

In order to provide detailed timestamps, here is the output of our script that has done yum install/upgrade command for OS packages:

FINISHED with yum exit code 0 at 2018-06-15-08:44:40

We've been using same script/approach to update packages from CentOS 5.7 -> CentOS 6.3/4 -> CentOS 7.2/3/4 and this is the first time find command fails after the update.

The output of systemctl list-jobs, while find is hanging in another terminal session is as follows:

[root@s3a ~]# systemctl list-jobs
No jobs running.

Here is a quick update on the workaround we just applied:

After the packages update, we can run:
systemctl start proc-sys-fs-binfmt_misc.mount
and the find will work again.

That explains why it works after reboot. For the time being, we will try adding the explicit 'systemctl start' after the packages update.

(In reply to Davor from comment #4)
> After the command gets stuck, whole terminal session is blocked - i cannot
> cancel/stop/kill this process or invoke any command. The only way is to
> login through a new SSH session for example.

Can you please reproduce the issue and then save coredump (using "gcore 1" command) of systemd using the second session?

> Maybe it is important that we *didn't reboot* the system after packages
> update - our shell script is using find right after the packages got updated.

That could be related however we do re-execute systemd on package updates. What binary image is systemd running after update (ls -l /proc/1/exec)?

> My colleague had success when he invoked:
> systemctl restart proc-sys-fs-binfmt_misc.mount, which caused some exception
> and then returned him to the shell.

Do you know what is the exact error message that he got back?
>
> In my case, this restart did not resolve any issue - multiple invoked find
> processes were still there (in S, D state, per 'ps aux')
>
>
> Before running find, here is the output of command you requested:
>
> [root@s3a ~]# systemctl status proc-sys-fs-binfmt_misc.mount
> ● proc-sys-fs-binfmt_misc.mount - Arbitrary Executable File Formats File
> System
>    Loaded: loaded (/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.mount;
> static; vendor preset: disabled)
>    Active: inactive (dead) since Fri 2018-06-15 08:42:48 GMT; 9min ago
>     Where: /proc/sys/fs/binfmt_misc
>      What: binfmt_misc
>      Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
>            http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
>
> Jun 15 08:04:05 localhost.localdomain systemd[1]: Mounted Arbitrary
> Executable File Formats File System.
> Jun 15 08:42:48 s3a.lab.pst systemd[1]: Unmounting Arbitrary Executable File
> Formats File System...
> Jun 15 08:42:48 s3a.lab.pst systemd[1]: Unmounted Arbitrary Executable File
> Formats File System.
>
> I suppose this *inactive* is causing problems.

What about corresponding *automount* unit (systemctl status proc-sys-fs-binfmt_misc.automount)? Also, can you do "mount | grep binfmt" before and after you reproduce?

>>> Can you please reproduce the issue and then save coredump (using "gcore 1" command) of systemd using the second session?

[root@s3a ~]# gcore 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
0x00007fed39cfb183 in epoll_wait () from /lib64/libc.so.6
warning: target file /proc/1/cmdline contained unexpected null characters
Saved corefile core.1

>>> That could be related however we do re-execute systemd on package updates. What binary image is systemd running after update (ls -l /proc/1/exec)?

Did you mean 'exe' instead of 'exec'?
[root@s3a ~]# ls -la /proc/1/exec
ls: cannot access /proc/1/exec: No such file or directory

before repro:
[root@s3a ~]# ls -la /proc/1/exe
lrwxrwxrwx. 1 root root 0 Jun 15 11:28 /proc/1/exe -> /usr/lib/systemd/systemd

after repro:
[root@s3a ~]# ls -la /proc/1/exe
lrwxrwxrwx. 1 root root 0 Jun 15 11:28 /proc/1/exe -> /usr/lib/systemd/systemd

>>> Do you know what is the exact error message that he got back?

Unfortunately no. I just had a brief look at his screen yesterday. He is on PTO for the next two weeks. :/

>>> What about corresponding *automount* unit (systemctl status proc-sys-fs-binfmt_misc.automount)? Also, can you do "mount | grep binfmt" before and after you reproduce?

Before reproduce:

[root@s3a ~]# systemctl status proc-sys-fs-binfmt_misc.automount
● proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point
   Loaded: loaded (/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.automount; static; vendor preset: disabled)
   Active: active (running) since Fri 2018-06-15 11:13:36 GMT; 17min ago
    Where: /proc/sys/fs/binfmt_misc
     Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

[root@s3a ~]# mount | grep binfmt
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct,pipe_ino=8962)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

After reproduce:

systemctl status proc-sys-fs-binfmt_misc.automount
● proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point
   Loaded: loaded (/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.automount; static; vendor preset: disabled)
   Active: active (running) since Fri 2018-06-15 11:13:36 GMT; 30min ago
    Where: /proc/sys/fs/binfmt_misc
     Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

[root@s3a ~]# systemctl status proc-sys-fs-binfmt_misc.mount
● proc-sys-fs-binfmt_misc.mount - Arbitrary Executable File Formats File System
   Loaded: loaded (/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.mount; static; vendor preset: disabled)
   Active: inactive (dead) since Fri 2018-06-15 11:42:09 GMT; 2min 25s ago
    Where: /proc/sys/fs/binfmt_misc
     What: binfmt_misc
     Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

Jun 15 11:42:09 s3a.lab.pst systemd[1]: Unmounting Arbitrary Executable File Formats File System...
Jun 15 11:42:09 s3a.lab.pst systemd[1]: Unmounted Arbitrary Executable File Formats File System.

[root@s3a ~]# mount | grep binfmt
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct,pipe_ino=8962)
[root@s3a ~]#

Created attachment 1451893 [details]
coredump file

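The two `mount | grep binfmt` outputs in the comment above differ in one respect: before the hang, both the autofs trigger and the real binfmt_misc filesystem are mounted on /proc/sys/fs/binfmt_misc; afterwards only the autofs entry is left. A minimal Python sketch for reading that state out of `mount` output without touching the mount point itself follows. The function names are hypothetical and not part of systemd. An autofs-only entry is also what an idle automount point normally looks like, so the sketch only reports what is mounted; it does not predict whether access will hang.

```python
import re

MOUNT_POINT = "/proc/sys/fs/binfmt_misc"

# One line of `mount` output: "<source> on <target> type <fstype> (<options>)"
_MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def binfmt_mount_state(mount_output, mount_point=MOUNT_POINT):
    """Return the set of filesystem types mounted on mount_point."""
    fstypes = set()
    for line in mount_output.splitlines():
        match = _MOUNT_LINE.match(line.strip())
        if match and match.group(2) == mount_point:
            fstypes.add(match.group(3))
    return fstypes


def only_autofs_trigger(mount_output, mount_point=MOUNT_POINT):
    """True when only the autofs trigger is mounted, with no binfmt_misc on top."""
    return binfmt_mount_state(mount_output, mount_point) == {"autofs"}


# Sample data copied from the "Before reproduce" / "After reproduce" outputs.
AUTOFS_LINE = (
    "systemd-1 on /proc/sys/fs/binfmt_misc type autofs "
    "(rw,relatime,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,"
    "direct,pipe_ino=8962)"
)
BEFORE = AUTOFS_LINE + "\n" + (
    "binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)"
)
AFTER = AUTOFS_LINE

print(sorted(binfmt_mount_state(BEFORE)), only_autofs_trigger(BEFORE))
print(sorted(binfmt_mount_state(AFTER)), only_autofs_trigger(AFTER))
```

On a live system the input would be the text printed by `mount`, captured in a separate step.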
Sorry about the coredump, I have attached the dump seconds before the issue is reproduced.

Now, when the issue is reproduced, I'm getting gcore stuck also - 10 minutes already.

[root@s3a ~]# gcore 1
...

After 'gcore 1' is stuck, I'm not able to invoke systemctl commands:

[root@s3a ~]# systemctl start proc-sys-fs-binfmt_misc.mount
Failed to start proc-sys-fs-binfmt_misc.mount: Connection timed out
See system logs and 'systemctl status proc-sys-fs-binfmt_misc.mount' for details.

[root@s3a ~]# systemctl status proc-sys-fs-binfmt_misc.mount
Failed to get properties: Connection timed out

If I kill the gcore processes (gdb --nx --batch -ex set pagination off -ex set height 0 -ex set width 0 -ex attach 1 -ex gcore core.1 -ex detach -ex quit ) with 'kill -9' (which is possible, compared to 'find'), I'm again able to check the status with:

[root@s3a ~]# systemctl status proc-sys-fs-binfmt_misc.mount
● proc-sys-fs-binfmt_misc.mount - Arbitrary Executable File Formats File System
   Loaded: loaded (/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.mount; static; vendor preset: disabled)
   Active: inactive (dead) since Fri 2018-06-15 11:42:09 GMT; 21min ago
    Where: /proc/sys/fs/binfmt_misc
     What: binfmt_misc
     Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

Jun 15 11:42:09 s3a.lab.pst systemd[1]: Unmounting Arbitrary Executable File Formats File System...
Jun 15 11:42:09 s3a.lab.pst systemd[1]: Unmounted Arbitrary Executable File Formats File System.

*** This bug has been marked as a duplicate of bug 1596241 ***

Hello,

You mentioned that this is a duplicate of
https://bugzilla.redhat.com/show_bug.cgi?id=1596241

But I don't have permission to access that bug and need to know if it is resolved or not?

If not resolved, do you require more information?

I found that if I run:

sysctl -a

It locks up indefinitely as well. Strace shows it is locked up with similar error.

stat("/proc/sys/fs/aio-max-nr", {st_dev=makedev(0, 3), st_ino=436832142, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2018/07/03-11:38:51.043000000, st_mtime=2018/07/03-11:38:51.043000000, st_ctime=2018/07/03-11:38:51.043000000}) = 0
stat("/proc/sys/fs/aio-max-nr", {st_dev=makedev(0, 3), st_ino=436832142, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2018/07/03-11:38:51.043000000, st_mtime=2018/07/03-11:38:51.043000000, st_ctime=2018/07/03-11:38:51.043000000}) = 0
open("/proc/sys/fs/aio-max-nr", O_RDONLY) = 5
fstat(5, {st_dev=makedev(0, 3), st_ino=436832142, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2018/07/03-11:38:51.043000000, st_mtime=2018/07/03-11:38:51.043000000, st_ctime=2018/07/03-11:38:51.043000000}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb2174d5000
read(5, "65536\n", 1024) = 6
write(1, "fs.aio-max-nr = 65536\n", 22fs.aio-max-nr = 65536
) = 22
read(5, "", 1024) = 0
close(5) = 0
munmap(0x7fb2174d5000, 4096) = 0
stat("/proc/sys/fs/aio-nr", {st_dev=makedev(0, 3), st_ino=436832143, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2018/07/03-11:38:51.043000000, st_mtime=2018/07/03-11:38:51.043000000, st_ctime=2018/07/03-11:38:51.043000000}) = 0
stat("/proc/sys/fs/aio-nr", {st_dev=makedev(0, 3), st_ino=436832143, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2018/07/03-11:38:51.043000000, st_mtime=2018/07/03-11:38:51.043000000, st_ctime=2018/07/03-11:38:51.043000000}) = 0
open("/proc/sys/fs/aio-nr", O_RDONLY) = 5
fstat(5, {st_dev=makedev(0, 3), st_ino=436832143, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2018/07/03-11:38:51.043000000, st_mtime=2018/07/03-11:38:51.043000000, st_ctime=2018/07/03-11:38:51.043000000}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb2174d5000
read(5, "6772\n", 1024) = 5
write(1, "fs.aio-nr = 6772\n", 17fs.aio-nr = 6772
) = 17
read(5, "", 1024) = 0
close(5) = 0
munmap(0x7fb2174d5000, 4096) = 0
stat("/proc/sys/fs/binfmt_misc",

There it will sit indefinitely and it can't be ctrl-c'd or anything.

A reboot does not fix this either, run the same command after a reboot and it will lock up in the same place.

(In reply to Peter E. from comment #12)
> You mentioned that this is a duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=1596241
>
> But I don't have permission to access that bug and need to know if it is
> resolved or not?

Yes, it is.

(In reply to David Tardon from comment #13)
> (In reply to Peter E. from comment #12)
> > You mentioned that this is a duplicate of
> > https://bugzilla.redhat.com/show_bug.cgi?id=1596241
> >
> > But I don't have permission to access that bug and need to know if it is
> > resolved or not?
>
> Yes, it is.

Hello, how did you resolve this? I have the same issue. thanks.

(In reply to kevin from comment #14)
> (In reply to David Tardon from comment #13)
> > (In reply to Peter E. from comment #12)
> > > You mentioned that this is a duplicate of
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1596241
> > >
> > > But I don't have permission to access that bug and need to know if it is
> > > resolved or not?
> >
> > Yes, it is.
>
> Hello, how did you resolve this? I have the same issue. thanks.

It has been fixed in systemd-219-61.el7.

Actually, it turned out later that it's not fixed. We've got a fix now, but it's not in any released build yet. See bug 1651257.
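
Because the `stat("/proc/sys/fs/binfmt_misc", ...` call in the strace output never returns and cannot be interrupted, a script that has to touch that path could do so from a throwaway child process and give up after a timeout. The following is a minimal Python sketch under that assumption; `path_responds` and the 5-second default are illustrative choices, not something taken from this thread or from systemd.

```python
import subprocess
import sys

# Child program: stat the path passed as argv[1]; exits non-zero on error.
_CHILD = "import os, sys; os.stat(sys.argv[1])"


def path_responds(path, timeout=5.0):
    """Return True if a stat() of path finishes within timeout seconds.

    The stat runs in a separate process so that a hang cannot block the
    caller. A missing path still counts as a response, because stat
    returned (with an error) instead of blocking.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        child.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Deliberately no kill-and-wait here: a child stuck in
        # uninterruptible sleep may never exit, and waiting for it
        # would hang the caller as well.
        return False
```

A caller might check `path_responds("/proc/sys/fs/binfmt_misc")` before running find or sysctl -a and, if it returns False, fall back to the workaround reported earlier in this bug (`systemctl start proc-sys-fs-binfmt_misc.mount` after the package update). On an affected system the probe child itself stays blocked, so this detects the hang but does not clear it.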