Created attachment 1794030 [details]
vdsm.log

Description of problem:
Hosts become 'NonOperational' after the storage server's network is disconnected.

Version-Release number of selected component (if applicable):
4.4.7

How reproducible:
100%

Steps to Reproduce:
1. Create a VM from a template (any good template) on a hosted-engine environment - this VM will be the storage server.
2. Install the nfs-utils package in the VM (after you get an IP and SSH into the VM).
3. Create a folder (nfs) and mount it.
4. Define the exports file - nano /etc/exports with the NFS path and *(rw,sync,no_all_squash,root_squash).
5. Restart the NFS service.
6. Go to the storage domain screen in RHV - some storage domains are already running - and create a new storage domain, using the storage server VM you just created as the path.
7. Go to the storage VM and disconnect its network with: nmcli n off
8. Watch the hosts screen in the RHV UI.

Actual results:
Hosts become 'NonOperational' after the storage server's network is disconnected. The environment collapses and the user needs to reprovision it; the hosts keep trying to connect to the disconnected storage server.

Expected results:
The hosts should stay up and keep serving the other healthy storage domains after one NFS storage server is disconnected.

Additional info:
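One detail worth fixing in step 4: export options in /etc/exports must not contain spaces, otherwise exportfs parses each option as a separate entry. A minimal working line (the /nfs path is just an example) would be:

```
/nfs *(rw,sync,no_all_squash,root_squash)
```

After editing the file, reload the export table with `exportfs -ra` (or restart the nfs-server service) so the NFS server picks up the change.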
Michal, can you please describe the environment you are using? Is there only one host in it? If a host cannot see a storage domain while all the other hosts can see it, the host becomes 'NonOperational' because the Engine assumes there is a problem with that host. Please also provide the engine logs.
I have an environment with 3 hosts and 4 available storage domains. I can't provide engine logs because the environment collapsed after this scenario.
Creating a storage server on a VM which is part of the setup is not a real-world use case and we don't care about it. If you used this flow as an automated test, the test is bad and should be removed from the test suite. Please show how you reproduce this with the storage server running on another host. You can use a VM, but not a VM managed by RHV. Usually these tests are done by blocking access to the storage server using iptables. In the past I also tested shutting down the storage server VM.
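The iptables approach mentioned above can be sketched as below. The server address 192.0.2.10 is a placeholder for the real storage server, and the commands must run as root on the host, so treat this as a sketch rather than something to copy verbatim:

```
# Block all traffic to/from the NFS server (hypothetical address).
iptables -A OUTPUT -d 192.0.2.10 -j DROP
iptables -A INPUT  -s 192.0.2.10 -j DROP

# Later, restore access by deleting the same rules.
iptables -D OUTPUT -d 192.0.2.10 -j DROP
iptables -D INPUT  -s 192.0.2.10 -j DROP
```

This simulates an unreachable storage server without touching the server itself, which keeps the test independent of the RHV setup.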
I reproduced this without hosted engine, with the NFS server on a separate host.

Environment tested:
- 2 hosts running RHEL 8.5 nightly (updated last week)
- 2 iSCSI storage domains served by server "storage"
- 2 NFS storage domains served by server "storage"
- 1 NFS storage domain served by server "storage2"
- 1 oVirt VM with a disk on an iSCSI storage domain on server "storage"
- 1 oVirt VM with a disk on an NFS storage domain on server "storage2"
- all hosts are VMs on my laptop

1. Start top in batch mode on both hosts
2. Shut down the server "storage2"
3. After 1-2 minutes, both hosts become non-operational
4. Connect to the hosts using ssh; both show very high load (50-60)
5. Wait 10-15 minutes
6. Start up server "storage2"
7. After a few minutes the system goes back to its normal state

We expect that:
- the VM with a disk on server "storage2" will be paused
- the NFS storage domain on server "storage2" becomes inactive
- both hosts stay up
- the VM with a disk on server "storage" runs normally

In the top output on both hosts we see that the load average climbs to very high values while storage server "storage2" was down:

$ grep 'load average' host4-top.out
top - 19:14:45 up 1:00, 1 user, load average: 0.38, 14.22, 19.55
top - 19:14:55 up 1:00, 1 user, load average: 0.32, 13.75, 19.34
top - 19:15:05 up 1:01, 1 user, load average: 1.05, 13.47, 19.19
top - 19:15:15 up 1:01, 2 users, load average: 2.12, 13.29, 19.06
top - 19:15:25 up 1:01, 2 users, load average: 3.18, 13.14, 18.96
top - 19:15:35 up 1:01, 2 users, load average: 4.07, 13.01, 18.85
top - 19:16:02 up 1:02, 2 users, load average: 30.56, 18.24, 20.41
top - 19:16:25 up 1:02, 2 users, load average: 50.71, 23.72, 22.15
top - 19:16:35 up 1:02, 2 users, load average: 43.75, 23.12, 21.97
top - 19:16:59 up 1:03, 2 users, load average: 56.04, 27.45, 23.42
top - 19:17:25 up 1:03, 2 users, load average: 67.60, 32.21, 25.09
top - 19:17:35 up 1:03, 2 users, load average: 58.44, 31.41, 24.91
top - 19:17:45 up 1:03, 2 users, load average: 53.88, 31.31, 24.94
top - 19:17:55 up 1:03, 2 users, load average: 57.99, 32.93, 25.54
top - 19:18:25 up 1:04, 2 users, load average: 71.07, 38.24, 27.53
top - 19:18:35 up 1:04, 2 users, load average: 62.09, 37.39, 27.37
top - 19:18:45 up 1:04, 2 users, load average: 56.00, 36.90, 27.31
top - 19:18:55 up 1:04, 2 users, load average: 48.69, 35.97, 27.11
top - 19:19:05 up 1:05, 2 users, load average: 42.74, 35.11, 26.93
top - 19:19:15 up 1:05, 2 users, load average: 47.71, 36.43, 27.44
top - 19:19:25 up 1:05, 2 users, load average: 52.06, 37.73, 27.96
top - 19:19:35 up 1:05, 2 users, load average: 53.24, 38.46, 28.31
top - 19:19:45 up 1:05, 2 users, load average: 53.82, 39.07, 28.61
top - 19:19:55 up 1:05, 2 users, load average: 47.06, 38.11, 28.41
top - 19:20:05 up 1:06, 2 users, load average: 41.50, 37.22, 28.23
top - 19:20:30 up 1:06, 2 users, load average: 57.07, 41.13, 29.76
top - 19:20:52 up 1:06, 2 users, load average: 62.28, 43.18, 30.66
top - 19:21:02 up 1:07, 2 users, load average: 61.09, 43.59, 30.93
top - 19:21:12 up 1:07, 2 users, load average: 53.57, 42.55, 30.73
top - 19:21:22 up 1:07, 2 users, load average: 49.02, 41.94, 30.66
top - 19:21:32 up 1:07, 2 users, load average: 43.89, 41.08, 30.50
top - 19:21:42 up 1:07, 2 users, load average: 38.84, 40.09, 30.29
top - 19:21:52 up 1:07, 2 users, load average: 34.71, 39.16, 30.09
top - 19:22:02 up 1:08, 2 users, load average: 30.98, 38.22, 29.88
top - 19:22:12 up 1:08, 2 users, load average: 27.61, 37.26, 29.66
top - 19:22:22 up 1:08, 2 users, load average: 25.14, 36.41, 29.46
top - 19:22:32 up 1:08, 2 users, load average: 22.75, 35.52, 29.25
top - 19:22:43 up 1:08, 2 users, load average: 20.87, 34.70, 29.05
top - 19:22:53 up 1:08, 2 users, load average: 19.51, 33.95, 28.86
top - 19:23:03 up 1:09, 2 users, load average: 17.75, 33.10, 28.64
top - 19:23:13 up 1:09, 2 users, load average: 23.35, 33.79, 28.91
top - 19:23:23 up 1:09, 2 users, load average: 24.85, 33.79, 28.97
top - 19:23:33 up 1:09, 2 users, load average: 25.04, 33.53, 28.93
top - 19:23:43 up 1:09, 1 user, load average: 24.84, 33.22, 28.88
top - 19:23:53 up 1:09, 1 user, load average: 21.89, 32.30, 28.63
top - 19:24:03 up 1:10, 1 user, load average: 25.07, 32.63, 28.77
top - 19:24:13 up 1:10, 1 user, load average: 30.53, 33.55, 29.11
top - 19:24:23 up 1:10, 1 user, load average: 27.62, 32.82, 28.92
top - 19:24:33 up 1:10, 1 user, load average: 29.68, 33.08, 29.05
top - 19:24:43 up 1:10, 1 user, load average: 38.34, 34.82, 29.66
top - 19:24:53 up 1:10, 1 user, load average: 43.98, 36.16, 30.15
top - 19:25:03 up 1:11, 1 user, load average: 40.15, 35.59, 30.03
top - 19:25:13 up 1:11, 1 user, load average: 37.35, 35.20, 29.99
top - 19:25:23 up 1:11, 1 user, load average: 33.28, 34.40, 29.79
top - 19:25:33 up 1:11, 1 user, load average: 28.79, 33.40, 29.51
top - 19:25:43 up 1:11, 1 user, load average: 27.44, 32.96, 29.41
top - 19:25:53 up 1:11, 1 user, load average: 26.98, 32.68, 29.36
top - 19:26:03 up 1:12, 1 user, load average: 22.91, 31.62, 29.05
top - 19:26:13 up 1:12, 1 user, load average: 19.39, 30.58, 28.74
top - 19:26:23 up 1:12, 1 user, load average: 16.48, 29.59, 28.43
top - 19:26:33 up 1:12, 1 user, load average: 13.95, 28.61, 28.13
top - 19:26:43 up 1:12, 1 user, load average: 11.87, 27.69, 27.83
top - 19:26:53 up 1:12, 1 user, load average: 10.05, 26.77, 27.53
top - 19:27:03 up 1:13, 1 user, load average: 8.50, 25.89, 27.24
top - 19:27:13 up 1:13, 1 user, load average: 7.20, 25.04, 26.94
top - 19:27:23 up 1:13, 1 user, load average: 6.09, 24.21, 26.66
top - 19:27:33 up 1:13, 1 user, load average: 5.15, 23.42, 26.37
top - 19:27:43 up 1:13, 1 user, load average: 4.36, 22.64, 26.09
top - 19:27:53 up 1:13, 1 user, load average: 3.69, 21.90, 25.81
top - 19:28:03 up 1:14, 2 users, load average: 3.20, 21.19, 25.53
top - 19:28:13 up 1:14, 2 users, load average: 2.94, 20.54, 25.28
top - 19:28:23 up 1:14, 2 users, load average: 2.48, 19.87, 25.01
top - 19:28:33 up 1:14, 2 users, load average: 2.10, 19.21, 24.74
top - 19:28:43 up 1:14, 2 users, load average: 1.78, 18.58, 24.47
top - 19:28:53 up 1:14, 2 users, load average: 1.50, 17.97, 24.21
top - 19:29:03 up 1:15, 2 users, load average: 1.35, 17.39, 23.95

Looking at the output when the load was high:

top - 19:16:02 up 1:02, 2 users, load average: 30.56, 18.24, 20.41
Tasks: 254 total, 9 running, 245 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 75.7 sy, 0.0 ni, 22.5 id, 0.4 wa, 0.2 hi, 0.0 si, 0.4 st
MiB Mem : 3736.1 total, 803.0 free, 2232.4 used, 700.7 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1182.5 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2773 root 20 0 0 0 0 R 74.5 0.0 3:51.06 kworker/u8:8-rpciod
977 root 0 -20 0 0 0 R 74.4 0.0 0:19.78 rpciod+rpciod
10277 root 20 0 0 0 0 R 74.4 0.0 3:15.33 kworker/u8:2+rpciod
4805 root 20 0 0 0 0 R 73.1 0.0 2:19.86 kworker/u8:0+rpciod
12284 root 20 0 0 0 0 R 6.9 0.0 0:41.71 kworker/u8:4-events_unbound
2067 vdsm 0 -20 2996704 125020 30068 S 3.0 3.3 0:55.80 vdsmd
5998 vdsm 0 -20 772196 7420 3976 S 1.6 0.2 0:22.89 ioprocess
8737 qemu 20 0 2798860 955116 24436 S 0.2 25.0 0:28.93 qemu-kvm
12247 vdsm 20 0 635404 32828 10444 S 0.2 0.9 0:02.06 momd
1154 openvsw+ 10 -10 67416 5764 3960 S 0.1 0.2 0:05.47 ovsdb-server

top - 19:16:25 up 1:02, 2 users, load average: 50.71, 23.72, 22.15
Tasks: 251 total, 19 running, 232 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 87.6 sy, 0.0 ni, 9.2 id, 2.1 wa, 0.3 hi, 0.0 si, 0.2 st
MiB Mem : 3736.1 total, 783.8 free, 2251.5 used, 700.8 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1163.5 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10277 root 20 0 0 0 0 D 84.8 0.0 3:35.19 kworker/u8:2+iscsi_eh
12284 root 20 0 0 0 0 I 84.8 0.0 1:01.57 kworker/u8:4-rpciod
15473 root 20 0 0 0 0 R 84.6 0.0 0:19.82 kworker/u8:1+rpciod
4805 root 20 0 0 0 0 R 84.3 0.0 2:39.60 kworker/u8:0+rpciod
977 root 0 -20 0 0 0 I 6.7 0.0 0:21.34 rpciod
15503 vdsm 0 -20 133796 29484 10308 R 1.3 0.8 0:00.31 50_openstacknet
1604 root 15 -5 1897600 71888 24316 S 0.4 1.9 0:03.60 supervdsmd
2067 vdsm 0 -20 2996704 125120 30068 S 0.3 3.3 0:55.87 vdsmd
12247 vdsm 20 0 635404 32832 10444 R 0.2 0.9 0:02.10 momd
8737 qemu 20 0 2798860 955116 24436 S 0.1 25.0 0:28.96 qemu-kvm

top - 19:16:59 up 1:03, 2 users, load average: 56.04, 27.45, 23.42
Tasks: 257 total, 33 running, 224 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.2 us, 74.7 sy, 0.0 ni, 22.4 id, 0.1 wa, 0.4 hi, 0.1 si, 1.2 st
MiB Mem : 3736.1 total, 798.3 free, 2236.6 used, 701.3 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1178.3 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15473 root 20 0 0 0 0 R 79.5 0.0 0:39.46 kworker/u8:1+rpciod
7174 qemu 20 0 3714220 747732 24256 R 72.2 19.5 2:38.50 qemu-kvm
2773 root 20 0 0 0 0 R 71.9 0.0 4:08.82 kworker/u8:8+rpciod
10277 root 20 0 0 0 0 R 70.2 0.0 3:52.52 kworker/u8:2+rpciod
1 root 20 0 254512 15196 9620 S 0.8 0.4 0:10.35 systemd
2067 vdsm 0 -20 3004900 125648 30068 S 0.6 3.3 0:56.40 vdsmd
1006 dbus 20 0 65076 5980 4780 S 0.5 0.2 0:05.87 dbus-daemon
803 root 20 0 117532 28748 27072 R 0.2 0.8 0:02.39 systemd-journal
1069 root 20 0 105828 10372 8148 S 0.2 0.3 0:02.13 systemd-logind
12247 vdsm 20 0 635404 32836 10444 R 0.2 0.9 0:02.21 momd

top - 19:17:25 up 1:03, 2 users, load average: 67.60, 32.21, 25.09
Tasks: 259 total, 17 running, 242 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.2 us, 84.9 sy, 0.0 ni, 12.0 id, 0.1 wa, 0.4 hi, 0.0 si, 0.4 st
MiB Mem : 3736.1 total, 797.4 free, 2236.8 used, 701.9 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1177.8 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10277 root 20 0 0 0 0 R 87.2 0.0 4:14.56 kworker/u8:2+rpciod
15473 root 20 0 0 0 0 I 78.6 0.0 0:59.34 kworker/u8:1-iscsi_q_7
12284 root 20 0 0 0 0 R 78.5 0.0 1:21.41 kworker/u8:4+rpciod
5998 vdsm 0 -20 764000 7436 3976 S 78.4 0.2 0:42.71 ioprocess
2067 vdsm 0 -20 3045880 125716 30068 S 8.2 3.3 0:58.47 vdsmd
2773 root 20 0 0 0 0 R 7.2 0.0 4:10.64 kworker/u8:8-rpciod
7174 qemu 20 0 3706024 747708 24256 S 7.1 19.5 2:40.29 qemu-kvm
12247 vdsm 20 0 635404 32848 10444 S 0.3 0.9 0:02.28 momd
1 root 20 0 254512 15196 9620 S 0.2 0.4 0:10.40 systemd
1006 dbus 20 0 65076 5980 4780 S 0.2 0.2 0:05.91 dbus-daemon
15408 nsoffer 20 0 65572 5108 4192 R 0.2 0.1 0:00.19 top

top - 19:17:45 up 1:03, 2 users, load average: 53.88, 31.31, 24.94
Tasks: 260 total, 14 running, 246 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.1 us, 27.2 sy, 0.0 ni, 70.0 id, 0.2 wa, 0.2 hi, 0.2 si, 0.1 st
MiB Mem : 3736.1 total, 785.2 free, 2241.6 used, 709.4 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1165.7 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16131 vdsm 0 -20 485436 4220 3724 S 41.1 0.1 0:04.11 ioprocess
5998 vdsm 0 -20 764000 7436 3976 S 34.0 0.2 0:46.15 ioprocess
10277 root 20 0 0 0 0 R 25.8 0.0 4:17.14 kworker/u8:2+rpciod
2067 vdsm 0 -20 3045880 126204 30068 S 1.6 3.3 0:59.11 vdsmd
1 root 20 0 254512 15196 9620 S 1.2 0.4 0:10.94 systemd
1006 dbus 20 0 65076 5980 4780 S 0.7 0.2 0:06.28 dbus-daemon
8737 qemu 20 0 2798860 956288 24436 S 0.6 25.0 0:29.18 qemu-kvm
15408 nsoffer 20 0 65572 5252 4192 S 0.4 0.1 0:00.28 top
803 root 20 0 125724 31760 30060 S 0.3 0.8 0:02.52 systemd-journal
12247 vdsm 20 0 637548 33068 10656 S 0.3 0.9 0:02.37 momd

top - 19:17:55 up 1:03, 2 users, load average: 57.99, 32.93, 25.54
Tasks: 260 total, 20 running, 240 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 73.9 sy, 0.0 ni, 0.0 id, 24.1 wa, 0.5 hi, 0.1 si, 1.4 st
MiB Mem : 3736.1 total, 784.8 free, 2242.0 used, 709.4 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1165.3 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10277 root 20 0 0 0 0 R 99.4 0.0 4:27.10 kworker/u8:2+rpciod
5998 vdsm 0 -20 764000 7436 3976 S 99.3 0.2 0:56.10 ioprocess
16131 vdsm 0 -20 485436 4220 3724 S 99.2 0.1 0:14.05 ioprocess
8737 qemu 20 0 2798860 956288 24436 S 0.6 25.0 0:29.24 qemu-kvm
15308 nsoffer 20 0 65520 5040 4152 R 0.2 0.1 0:00.11 top
909 root 20 0 479584 21248 10588 S 0.1 0.6 0:00.68 multipathd
1030 sanlock 20 0 914876 63188 32040 S 0.1 1.7 0:01.10 sanlock
1604 root 15 -5 2045064 72708 24316 S 0.1 1.9 0:04.15 supervdsmd
15408 nsoffer 20 0 65572 5252 4192 S 0.1 0.1 0:00.29 top

top - 19:18:25 up 1:04, 2 users, load average: 71.07, 38.24, 27.53
Tasks: 260 total, 12 running, 248 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.8 us, 82.5 sy, 0.0 ni, 10.2 id, 5.7 wa, 0.4 hi, 0.1 si, 0.4 st
MiB Mem : 3736.1 total, 781.8 free, 2244.8 used, 709.6 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1162.5 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10277 root 20 0 0 0 0 R 65.1 0.0 4:33.64 kworker/u8:2+rpciod
5998 vdsm 0 -20 764000 7436 3976 S 63.2 0.2 1:02.45 ioprocess
2067 vdsm 0 -20 3045880 126212 30068 S 3.1 3.3 0:59.42 vdsmd
1 root 20 0 254512 15196 9620 S 1.4 0.4 0:11.08 systemd
1006 dbus 20 0 65076 5988 4780 S 1.1 0.2 0:06.39 dbus-daemon
8737 qemu 20 0 2798860 956288 24436 S 0.8 25.0 0:29.32 qemu-kvm
12247 vdsm 20 0 637548 33068 10656 S 0.8 0.9 0:02.45 momd
2773 root 20 0 0 0 0 R 0.6 0.0 4:10.70 kworker/u8:8+rpciod
1069 root 20 0 105828 10372 8148 S 0.4 0.3 0:02.31 systemd-logind
803 root 20 0 125724 32264 30548 S 0.3 0.8 0:02.55 systemd-journal

top - 19:18:35 up 1:04, 2 users, load average: 62.09, 37.39, 27.37
Tasks: 265 total, 7 running, 258 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.9 us, 23.8 sy, 0.0 ni, 45.5 id, 21.9 wa, 0.4 hi, 0.2 si, 0.3 st
MiB Mem : 3736.1 total, 733.1 free, 2290.7 used, 712.4 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1115.3 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10277 root 20 0 0 0 0 R 92.2 0.0 5:01.24 kworker/u8:2+rpciod
5998 vdsm 0 -20 764000 7436 3976 S 66.3 0.2 1:22.28 ioprocess
2773 root 20 0 0 0 0 I 66.2 0.0 4:30.52 kworker/u8:8-events_unbound
12284 root 20 0 0 0 0 I 66.1 0.0 1:41.22 kworker/u8:4-iscsi_q_7
2067 vdsm 0 -20 3046136 126440 30068 S 1.0 3.3 0:59.71 vdsmd
1604 root 15 -5 2045064 72816 24316 S 0.8 1.9 0:04.39 supervdsmd
1 root 20 0 254512 15196 9620 S 0.6 0.4 0:11.27 systemd
1006 dbus 20 0 65076 5988 4780 S 0.4 0.2 0:06.52 dbus-daemon
8737 qemu 20 0 2798860 956288 24436 S 0.3 25.0 0:29.40 qemu-kvm
17011 vdsm 0 -20 84260 17132 8400 S 0.3 0.4 0:00.08 ovirt_provider_
12247 vdsm 20 0 637548 33068 10656 S 0.2 0.9 0:02.52 momd

top - 19:18:45 up 1:04, 2 users, load average: 56.00, 36.90, 27.31
Tasks: 269 total, 9 running, 260 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 25.2 sy, 0.0 ni, 24.4 id, 49.0 wa, 0.5 hi, 0.1 si, 0.5 st
MiB Mem : 3736.1 total, 723.3 free, 2299.8 used, 713.0 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1105.9 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10277 root 20 0 0 0 0 R 99.6 0.0 5:11.22 kworker/u8:2+rpciod
2067 vdsm 0 -20 3046136 126444 30068 S 0.9 3.3 0:59.80 vdsmd
8737 qemu 20 0 2798860 956288 24436 S 0.8 25.0 0:29.48 qemu-kvm
12247 vdsm 20 0 637548 33072 10656 S 0.7 0.9 0:02.59 momd
15408 nsoffer 20 0 65572 5252 4192 S 0.4 0.1 0:00.41 top
1154 openvsw+ 10 -10 67416 5764 3960 S 0.3 0.2 0:05.66 ovsdb-server
1374 root 20 0 2091568 60032 39172 S 0.2 1.6 0:03.28 libvirtd
11 root 20 0 0 0 0 I 0.1 0.0 0:00.45 rcu_sched
1604 root 15 -5 2045064 72816 24316 S 0.1 1.9 0:04.40 supervdsmd

top - 19:19:15 up 1:05, 2 users, load average: 47.71, 36.43, 27.44
Tasks: 260 total, 16 running, 244 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 74.6 sy, 0.0 ni, 7.3 id, 16.9 wa, 0.4 hi, 0.1 si, 0.4 st
MiB Mem : 3736.1 total, 780.9 free, 2245.1 used, 710.2 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1162.2 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4805 root 20 0 0 0 0 R 99.5 0.0 2:49.69 kworker/u8:0+rpciod
12284 root 20 0 0 0 0 R 99.5 0.0 1:51.31 kworker/u8:4+rpciod
5998 vdsm 0 -20 764000 7440 3976 S 98.7 0.2 1:32.26 ioprocess
12247 vdsm 20 0 637548 33076 10656 S 0.6 0.9 0:02.69 momd
8737 qemu 20 0 2798860 956288 24436 S 0.4 25.0 0:29.61 qemu-kvm
15408 nsoffer 20 0 65704 5252 4192 S 0.4 0.1 0:00.55 top

top - 19:19:25 up 1:05, 2 users, load average: 52.06, 37.73, 27.96
Tasks: 264 total, 12 running, 252 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 74.5 sy, 0.0 ni, 0.0 id, 24.4 wa, 0.4 hi, 0.0 si, 0.3 st
MiB Mem : 3736.1 total, 747.2 free, 2278.7 used, 710.2 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1128.5 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12284 root 20 0 0 0 0 R 99.4 0.0 2:01.27 kworker/u8:4+rpciod
4805 root 20 0 0 0 0 I 98.1 0.0 2:59.52 kworker/u8:0-rpciod
5998 vdsm 0 -20 764000 7440 3976 S 98.0 0.2 1:42.08 ioprocess
2067 vdsm 0 -20 3037940 126488 30068 S 1.6 3.3 1:00.46 vdsmd
17027 vdsm 0 -20 11612 988 924 R 1.5 0.0 0:00.15 dd
8737 qemu 20 0 2798860 956288 24436 S 0.4 25.0 0:29.65 qemu-kvm
1154 openvsw+ 10 -10 67416 5764 3960 S 0.3 0.2 0:05.77 ovsdb-server

top - 19:19:35 up 1:05, 2 users, load average: 53.24, 38.46, 28.31
Tasks: 271 total, 10 running, 257 sleeping, 0 stopped, 4 zombie
%Cpu(s): 5.4 us, 58.6 sy, 0.0 ni, 23.1 id, 11.2 wa, 0.5 hi, 0.1 si, 1.2 st
MiB Mem : 3736.1 total, 740.1 free, 2285.1 used, 711.0 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1121.8 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17027 vdsm 0 -20 11612 988 924 D 78.9 0.0 0:08.06 dd
16744 vdsm 0 -20 559168 6444 3872 S 77.2 0.2 0:07.74 ioprocess
12284 root 20 0 0 0 0 R 75.0 0.0 2:08.79 kworker/u8:4+rpciod
2067 vdsm 0 -20 3054332 126488 30068 S 21.3 3.3 1:02.59 vdsmd
17542 root 20 0 100956 10540 8828 S 0.9 0.3 0:00.09 systemd
15408 nsoffer 20 0 65700 5260 4192 S 0.7 0.1 0:00.65 top
8737 qemu 20 0 2798860 956288 24436 S 0.5 25.0 0:29.70 qemu-kvm

top - 19:19:45 up 1:05, 2 users, load average: 53.82, 39.07, 28.61
Tasks: 271 total, 9 running, 258 sleeping, 0 stopped, 4 zombie
%Cpu(s): 0.2 us, 49.9 sy, 0.0 ni, 49.2 id, 0.0 wa, 0.4 hi, 0.1 si, 0.2 st
MiB Mem : 3736.1 total, 738.2 free, 2287.0 used, 711.0 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1119.9 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12284 root 20 0 0 0 0 R 99.5 0.0 2:18.76 kworker/u8:4+rpciod
16744 vdsm 0 -20 559168 6444 3872 S 99.5 0.2 0:17.71 ioprocess
12247 vdsm 20 0 637548 33076 10656 S 0.6 0.9 0:02.80 momd
15408 nsoffer 20 0 65700 5260 4192 S 0.4 0.1 0:00.69 top
8737 qemu 20 0 2798860 956288 24436 S 0.3 25.0 0:29.73 qemu-kvm
1285 root 20 0 394220 19136 16344 S 0.1 0.5 0:00.97 NetworkManager

top - 19:19:55 up 1:05, 2 users, load average: 47.06, 38.11, 28.41
Tasks: 260 total, 1 running, 259 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.0 us, 25.3 sy, 0.0 ni, 69.8 id, 0.2 wa, 0.3 hi, 0.2 si, 0.2 st
MiB Mem : 3736.1 total, 781.6 free, 2244.1 used, 710.5 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1163.1 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12284 root 20 0 0 0 0 I 69.2 0.0 2:25.69 kworker/u8:4-events_unbound
16744 vdsm 0 -20 559168 6444 3872 S 22.1 0.2 0:19.92 ioprocess
2067 vdsm 0 -20 3046136 126548 30068 S 2.7 3.3 1:02.86 vdsmd
1 root 20 0 254512 15196 9620 S 1.2 0.4 0:11.91 systemd
1006 dbus 20 0 65076 5988 4780 S 1.1 0.2 0:06.97 dbus-daemon

top - 19:20:30 up 1:06, 2 users, load average: 57.07, 41.13, 29.76
Tasks: 264 total, 22 running, 242 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.7 us, 79.9 sy, 0.0 ni, 11.9 id, 3.4 wa, 0.5 hi, 0.0 si, 0.6 st
MiB Mem : 3736.1 total, 767.2 free, 2258.0 used, 710.9 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1149.0 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16744 vdsm 0 -20 780364 8492 3872 S 159.4 0.2 0:59.47 ioprocess
4805 root 20 0 0 0 0 D 79.6 0.0 3:19.27 kworker/u8:0+iscsi_eh
12284 root 20 0 0 0 0 R 79.4 0.0 2:45.40 kworker/u8:4+rpciod
2067 vdsm 0 -20 3054332 126572 30068 S 16.1 3.3 1:07.27 vdsmd
8737 qemu 20 0 2798860 956288 24436 R 0.1 25.0 0:29.85 qemu-kvm
15408 nsoffer 20 0 65700 5260 4192 R 0.1 0.1 0:00.80 top

top - 19:20:52 up 1:06, 2 users, load average: 62.28, 43.18, 30.66
Tasks: 261 total, 24 running, 235 sleeping, 0 stopped, 2 zombie
%Cpu(s): 0.3 us, 71.6 sy, 0.0 ni, 22.1 id, 5.3 wa, 0.4 hi, 0.1 si, 0.2 st
MiB Mem : 3736.1 total, 767.6 free, 2257.5 used, 711.1 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1149.6 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17429 vdsm 0 -20 485436 6368 3808 S 153.7 0.2 0:34.71 ioprocess
4805 root 20 0 0 0 0 R 68.2 0.0 3:34.67 kworker/u8:0+rpciod
16744 vdsm 0 -20 780364 8492 3872 S 63.2 0.2 1:13.73 ioprocess
2067 vdsm 0 -20 3054332 126580 30068 S 0.8 3.3 1:07.45 vdsmd
8737 qemu 20 0 2798860 956288 24436 S 0.5 25.0 0:29.96 qemu-kvm
12284 root 20 0 0 0 0 I 0.4 0.0 2:45.48 kworker/u8:4-rpciod
12247 vdsm 20 0 637548 33076 10656 R 0.3 0.9 0:02.91 momd

top - 19:21:02 up 1:07, 2 users, load average: 61.09, 43.59, 30.93
Tasks: 258 total, 2 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 36.3 sy, 0.0 ni, 41.9 id, 20.7 wa, 0.2 hi, 0.1 si, 0.3 st
MiB Mem : 3736.1 total, 780.6 free, 2244.5 used, 711.0 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1162.6 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16744 vdsm 0 -20 780364 8508 3872 S 47.8 0.2 1:18.52 ioprocess
4805 root 20 0 0 0 0 I 45.2 0.0 3:39.20 kworker/u8:0-flush-253:0
2067 vdsm 0 -20 3037940 126540 30068 S 1.4 3.3 1:07.59 vdsmd
8737 qemu 20 0 2798860 956288 24436 S 0.7 25.0 0:30.03 qemu-kvm
315 root 0 -20 0 0 0 I 0.5 0.0 0:00.17 kworker/2:1H-kblockd
12247 vdsm 20 0 637548 33076 10656 S 0.4 0.9 0:02.95 momd
1154 openvsw+ 10 -10 67416 5764 3960 S 0.3 0.2 0:05.87 ovsdb-server

top - 19:21:12 up 1:07, 2 users, load average: 53.57, 42.55, 30.73
Tasks: 259 total, 11 running, 248 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.8 us, 28.9 sy, 0.0 ni, 69.3 id, 0.4 wa, 0.2 hi, 0.1 si, 0.3 st
MiB Mem : 3736.1 total, 757.0 free, 2267.3 used, 711.8 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1139.5 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16744 vdsm 0 -20 780364 8508 3872 S 112.9 0.2 1:29.83 ioprocess
2067 vdsm 0 -20 3037940 126576 30068 S 1.2 3.3 1:07.71 vdsmd
8737 qemu 20 0 2798860 956288 24436 S 0.5 25.0 0:30.08 qemu-kvm
1 root 20 0 254512 15196 9620 S 0.3 0.4 0:12.47 systemd
1154 openvsw+ 10 -10 67416 5764 3960 S 0.3 0.2 0:05.90 ovsdb-server
12247 vdsm 20 0 637548 33076 10656 S 0.3 0.9 0:02.98 momd

top - 19:21:22 up 1:07, 2 users, load average: 49.02, 41.94, 30.66
Tasks: 261 total, 13 running, 248 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 50.1 sy, 0.0 ni, 48.3 id, 0.0 wa, 0.4 hi, 0.1 si, 0.6 st
MiB Mem : 3736.1 total, 738.1 free, 2285.0 used, 713.0 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1121.2 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16744 vdsm 0 -20 780364 8508 3872 S 198.9 0.2 1:49.76 ioprocess
2067 vdsm 0 -20 3037940 126576 30068 S 1.8 3.3 1:07.89 vdsmd
8737 qemu 20 0 2798860 956288 24436 S 0.6 25.0 0:30.14 qemu-kvm
1 root 20 0 254512 15196 9620 S 0.2 0.4 0:12.49 systemd
1154 openvsw+ 10 -10 67416 5764 3960 S 0.2 0.2 0:05.92 ovsdb-server
15308 nsoffer 20 0 65520 5040 4152 R 0.2 0.1 0:00.27 top

top - 19:21:32 up 1:07, 2 users, load average: 43.89, 41.08, 30.50
Tasks: 257 total, 4 running, 253 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.9 us, 23.9 sy, 0.0 ni, 73.4 id, 0.0 wa, 0.3 hi, 0.2 si, 0.3 st
MiB Mem : 3736.1 total, 779.7 free, 2245.2 used, 711.2 buff/cache
MiB Swap: 2116.0 total, 2116.0 free, 0.0 used. 1161.8 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16744 vdsm 0 -20 780364 8508 3872 S 85.2 0.2 1:58.30 ioprocess
12284 root 20 0 0 0 0 R 2.4 0.0 2:45.72 kworker/u8:4+rpciod
2067 vdsm 0 -20 3029744 126560 30068 S 2.0 3.3 1:08.09 vdsmd
1 root 20 0 254512 15196 9620 S 0.9 0.4 0:12.58 systemd
1006 dbus 20 0 65076 5988 4780 S 0.7 0.2 0:07.43 dbus-daemon
8737 qemu 20 0 2798860 956288 24436 S 0.7 25.0 0:30.21 qemu-kvm

It looks like ioprocess is the cause of the high load, or maybe this is a kernel issue and ioprocess is just its victim. I'll test this again without oVirt; that will make it clear whether this is a kernel issue.

Avihai, do we have a bare metal setup I can use for testing? I test on VMs, and that is not a good environment for investigating such issues. We need:
- engine 4.4.7 (can be a VM)
- 2 bare metal hosts with latest RHEL 8.4 and RHV 4.4.7
- an NFS or iSCSI server for good storage (the NetApp used for tests?)
- a host for a temporary NFS server that will be blocked or shut down (can be a VM)
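As a side note, the load figures quoted above can be pulled out of a `top -b` capture mechanically. A small sketch (the file name and the embedded sample lines follow this report; the heredoc only exists to make the snippet self-contained):

```shell
# Create a small sample from lines quoted in this report; the real file
# came from running "top -b -d 10 > host4-top.out" on the host.
cat > host4-top.out <<'EOF'
top - 19:16:02 up 1:02, 2 users, load average: 30.56, 18.24, 20.41
top - 19:17:25 up 1:03, 2 users, load average: 67.60, 32.21, 25.09
top - 19:17:35 up 1:03, 2 users, load average: 58.44, 31.41, 24.91
EOF

# Report the peak 1-minute load average seen in the capture.
grep 'load average' host4-top.out |
  awk -F'load average: ' '{ split($2, a, ", "); if (a[1] + 0 > max + 0) max = a[1] }
      END { print "peak 1-min load:", max }'
# prints: peak 1-min load: 67.60
```

The same one-liner works on the full capture, which makes it easy to compare the load spike across hosts.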
Nir, can you please set this bug as dependent on the kernel bug that was found?
Mordechai, we reproduced this bug on very old (2013) machines with 4 cores. We need to understand how this bug affects real servers - whether this is a critical issue that may affect users, or a minor issue that affects only our old testing environment. Can we try to reproduce this in a real environment in the scale lab? To reproduce this I need 2 hosts: one will function as the NFS server and the other as the NFS client. The NFS client should be a strong server of the kind users are likely to run; the NFS server host can be anything, since we test the case when the NFS server is not accessible.
Nir, we don't have an environment that is currently up, as our lab went down earlier today. Adding needinfo on dagur - maybe other hosts in TLV can be used here.
Michal, an entire cluster going down because of one inaccessible storage server sounds urgent to me - a blocker, actually.
(In reply to Nir Soffer from comment #16)
> Michal, entire cluster goes down because of one inaccessible storage
> sounds urgent to me, a blocker actually.

It is serious of course, but:
- it's a negative scenario,
- the system comes back on its own once the connection is restored, and
- there's no data loss.

More importantly, we can't do anything about it on the Dev side (i.e. no urgent activity is required) until the kernel bug (which is Urgent) gets fixed. "blocker+" means we won't release a version with this bug, but here we are going to proceed with the release, since this is anyway already the current state, so blocking doesn't help anyone (and again, that consideration might be different for the RHEL bug).
Looks like there was confusion about the nature of this issue. The issue is on the NFS client side (RHV host kernel), not on the NFS server side. So the fixed kernel must be installed on all RHV hosts in the environment, not on the NFS server host.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
We are confident that the test kernel build 4.18.0-305.8.1.el8_4.bz1979070.test.x86_64 fixes the issue, but it was not officially delivered to RHV. We'll retest and verify with official RHV and RHEL kernel builds.
Can you verify this one? Please make sure kernel-4.18.0-305.11.1.el8_4 is present.
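For completeness, a quick way to check this on each RHV host (the NFS client side) before verifying; the version string is the one requested above, and the exact commands are a sketch:

```shell
# Running kernel - should be 4.18.0-305.11.1.el8_4 or newer on every
# RHV host before retesting the scenario.
uname -r

# On RHEL, the installed kernel packages can be listed with rpm
# (guarded so the snippet also runs where rpm is not available).
command -v rpm >/dev/null && rpm -q kernel || true
```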
rhv: 4.4.8.3-0.10el8ev

HOSTS:
OS version on hosts: RHEL 8.4-4.elev
kernel version: 4.18.0-305.12.1.el8_4.x86_64

Steps to Reproduce:
1. Create a VM from a template (any good template) on a hosted-engine environment - this VM will be the storage server.
2. Install the nfs-utils package in the VM (after you get an IP and SSH into the VM).
3. Create a folder (nfs) and mount it.
4. Define the exports file - nano /etc/exports with the NFS path and *(rw,sync,no_all_squash,root_squash).
5. Restart the NFS service.
6. Go to the storage domain screen in RHV - some storage domains are already running - and create a new storage domain, using the storage server VM you just created as the path.
7. Go to the storage VM and disconnect its network with: nmcli n off
8. Watch the hosts screen in the RHV UI.

Actual results:
Hosts continue to work as expected, all other storage domains keep running, and the environment continues to work. I got this warning in Events:

Storage Domain RHV_NFS (Data Center golden_env_mixed) was deactivated by system because it's not visible by any of the hosts.