Description of problem:
We have seen problems where 'pvscan' in the rhgs-server container got into some sort of 'hung' state. Running strace against the newly started '/usr/sbin/lvm pvscan' command shows a (seemingly) endless loop while reading from /run/dmeventd-client.

Version-Release number of selected component (if applicable):
rhgs-server-container:v3.11.1-15

How reproducible:
Occasionally

Steps to Reproduce:
1. Create many LVs, fill some up to 100%
2. Reboot the system
3. Have the rhgs-server container come up
4. Notice the container starting, but not getting 'ready'
5. Run 'ps ax' in the container, see 'pvscan' processes running

Actual results:
rhgs-server container does not become ready

Expected results:
The rhgs-server container should become ready within a few minutes. Mounting the LVs for all the bricks can still take some time; mounting is done after pvscan has finished.

Additional info:
These types of 'hangs' in pvscan can probably be prevented by not running dmeventd in the container. There is no (currently known) reason to have access to the dmeventd sockets from the service running on the host.
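For anyone debugging the same symptom, the stuck state can be confirmed from inside the container with standard tools. This is only a sketch: it assumes strace is available in the container image, and <PID> is a placeholder for whatever pvscan process 'ps ax' reports.

  sh-4.2# ps ax | grep '[p]vscan'               # find the lingering /usr/sbin/lvm pvscan process
  sh-4.2# ls -l /proc/<PID>/fd | grep dmeventd  # see whether it holds /run/dmeventd-client open
  sh-4.2# strace -f -p <PID>                    # attach and watch the repeated read() on that descriptor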
Peter, can you think of a reason why we would want to have dmeventd running inside the container (and likely on the host)? If there is no valid reason, we'll continue with disabling the service in the rhgs-server container.
dmeventd was never designed to be executed inside a container, so there are assumptions about there being only a single instance of dmeventd running on the whole host system. So currently I would not recommend running multiple instances of dmeventd across many containers.
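As an illustration of the point above, LVM commands run inside the container can be told not to register with dmeventd at all by switching off event monitoring. This is only a sketch of the general approach; the setting and flags are standard LVM options, but whether the rhgs-server image uses them is not confirmed here.

  # permanently, in the container's /etc/lvm/lvm.conf:
  #   activation {
  #       monitoring = 0
  #   }
  # or per invocation, without touching lvm.conf:
  sh-4.2# lvm pvscan --config 'activation { monitoring = 0 }'
  sh-4.2# vgchange -ay --ignoremonitoring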
Acking the bug for 3.11.3 release
Moving this bug to failed_qa as I see that the dm-event service is still running inside the container.

sh-4.2# systemctl status dm-event.service
● dm-event.service - Device-mapper event daemon
   Loaded: loaded (/usr/lib/systemd/system/dm-event.service; static; vendor preset: enabled)
   Active: active (running) since Tue 2019-05-07 05:47:59 UTC; 4h 12min ago
     Docs: man:dmeventd(8)
 Main PID: 45 (dmeventd)
   CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda0f5ecf2_708b_11e9_a71c_02a517c0cfee.slice/docker-41392d87adf9e0666622c3d0f8083b1fd9e69a432450780aaa28e705a0b60662.scope/system.slice/dm-event.service
           └─45 /usr/sbin/dmeventd -f

sh-4.2# ls -l /root/buildinfo/
total 12
-rw-r--r--. 1 root root 2798 Apr 16 15:34 Dockerfile-rhel7-7.6-252
-rw-r--r--. 1 root root 6582 Apr 24 11:51 Dockerfile-rhgs3-rhgs-server-rhel7-3.11.3-8

sh-4.2# rpm -qa | grep lvm
lvm2-libs-2.02.180-10.el7_6.7.x86_64
lvm2-2.02.180-10.el7_6.7.x86_64
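For reference, the 'masked (/dev/null; bad)' state seen in the verification below is what masking the units produces; a minimal sketch of how that can be done when preparing the image follows (it is an assumption that the actual fix in the 3.11.3-11 Dockerfile was implemented exactly this way).

  sh-4.2# systemctl mask dm-event.service dm-event.socket
  # equivalent to creating the symlinks directly, which also works during an image
  # build where systemd is not running:
  sh-4.2# ln -sf /dev/null /etc/systemd/system/dm-event.service
  sh-4.2# ln -sf /dev/null /etc/systemd/system/dm-event.socket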
Moving the bug to verified state as I do not see the dmeventd process running in the rhgs-server-container. Performed the tests below to confirm the same.

sh-4.2# systemctl status dm-event.service
● dm-event.service
   Loaded: masked (/dev/null; bad)
   Active: inactive (dead)

sh-4.2# systemctl status dm-event.socket
● dm-event.socket
   Loaded: masked (/dev/null; bad)
   Active: inactive (dead)

sh-4.2# ps aux
USER       PID %CPU %MEM       VSZ     RSS TTY   STAT START   TIME COMMAND
root         1  0.4  0.0     46828    6932 ?     Ss   11:19   0:20 /usr/sbin/init
dbus        48  0.0  0.0     58096    2112 ?     Ss   11:20   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root        71  0.0  0.0     91504    1176 ?     Ssl  11:20   0:00 /usr/sbin/gssproxy -D
root        78  0.0  0.0     22696    1540 ?     Ss   11:20   0:00 /usr/sbin/crond -n
root        97  0.0  0.0    112864    4316 ?     Ss   11:20   0:00 /usr/sbin/sshd -D
root      1574  2.8  0.5    594756  167608 ?     Ssl  11:25   1:55 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root      1776  0.0  0.0     11680    1464 ?     Ss   11:26   0:00 /bin/bash /usr/local/bin/check_diskspace.sh
root      1901  1.3  2.8  20568016  924124 ?     Ssl  11:27   0:55 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.
root      1911 12.5  3.2  28059724 1066352 ?     Ssl  11:27   8:20 /usr/sbin/glusterfsd -s 10.70.47.125 --volfile-id heketidbstorage.10.70.47.125.var-lib-heketi-mounts-vg_51f
root      5499  9.4  3.3  28283528 1113192 ?     Ssl  11:27   6:14 /usr/sbin/glusterfsd -s 10.70.47.125 --volfile-id vol_664a7ca9425557dde9cab390704e2921.10.70.47.125.var-lib
root      6704 18.5  0.2   2558404   73856 ?     Ssl  11:28  12:13 /usr/sbin/glusterfsd -s 10.70.47.125 --volfile-id vol_81d773e7ba758e0f4a8f17f88c0eba44.10.70.47.125.var-lib
root      9632  5.9  1.5  13552452  509172 ?     Ssl  11:28   3:53 /usr/sbin/glusterfsd -s 10.70.47.125 --volfile-id vol_d5514028de508428dd17508c142ba3a1.10.70.47.125.var-lib
root     11814  6.8  0.7 219741900  230756 ?     Ssl  11:29   4:24 /usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block
root     13204  0.0  0.0    276308    1872 ?     Ssl  11:30   0:00 /usr/sbin/gluster-blockd --glfs-lru-count 15 --log-level INFO
root     15543  0.0  0.0     21764     996 ?     Ss   12:01   0:00 /usr/sbin/anacron -s
root     17760  0.0  0.0      4360     352 ?     S    12:32   0:00 sleep 120
root     17761  0.0  0.0     11820    1760 pts/0 Ss   12:32   0:00 /bin/sh
root     17911  0.0  0.0     51744    1748 pts/0 R+   12:34   0:00 ps aux

sh-4.2# ps aux | grep dmeventd
root     17918  0.0  0.0      9092     676 pts/0 R+   12:34   0:00 grep dmeventd

sh-4.2# ls -l /root/buildinfo/
total 12
-rw-r--r--. 1 root root 2798 Apr 16 15:34 Dockerfile-rhel7-7.6-252
-rw-r--r--. 1 root root 6824 May 15 05:02 Dockerfile-rhgs3-rhgs-server-rhel7-3.11.3-11

Add / remove device works fine. Able to create gluster file and block volumes, but I saw that it took around 2 minutes for the volume to reach the bound state. Rebooted the server, but I see an issue while the server boots up; will raise a different bug for that. Rebooted the server and added the device again; device addition worked fine.
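A quick way to re-run the same check on any node; the commands below are generic and not tied to this setup, and the seg_monitor field only applies to LV types that support monitoring (thin pools, snapshots, mirrors):

  sh-4.2# systemctl is-enabled dm-event.service dm-event.socket  # both should report 'masked'
  sh-4.2# pgrep -l dmeventd || echo "dmeventd is not running"
  sh-4.2# lvs -o name,seg_monitor                                # monitored LV types should show 'not monitored'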
Have updated the doc text. Kindly review it for technical accuracy.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1406