Red Hat Bugzilla – Bug 1309300
[scale] - vdsm initialization timeout (on master machine with 144 cores and vm per core)
Last modified: 2016-06-28 17:56:45 EDT
Description of problem:
When restarting vdsm (vdsm start) on the master machine with 144 cores and one VM per core,
the vdsm pre-start / initialization stage fails because it exceeds the systemd TimeoutStartSec=90 limit.
When it fails, systemctl status and the journal print this:
Active: activating (start-pre) since Wed 2016-02-17 05:57:46 EST; 136ms ago
Process: 113343 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
Main PID: 96235 (code=exited, status=0/SUCCESS); : 127024 (vdsmd_init_comm)
├─127024 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
└─127030 /usr/bin/python /usr/share/vdsm/get-conf-item /etc/vdsm/vdsm.conf irs repository /rhev/
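The status output above shows the pre-start helper stuck reading a value from vdsm.conf. As a rough sketch of what such a lookup does (hypothetical stand-in using Python's configparser; the real /usr/share/vdsm/get-conf-item may behave differently, and the example value is invented):

```python
import configparser

def get_conf_item(path, section, option, default=None):
    # Hypothetical stand-in for /usr/share/vdsm/get-conf-item:
    # read one option from an ini-style config file, with a fallback.
    cp = configparser.ConfigParser()
    cp.read(path)  # a missing file is silently ignored by configparser
    if cp.has_option(section, option):
        return cp.get(section, option)
    return default

# Demonstrate with a throwaway config file (invented value):
import os, tempfile
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as f:
    f.write("[irs]\nrepository = /rhev/data-center\n")
    path = f.name
print(get_conf_item(path, "irs", "repository", "/rhev/"))
os.remove(path)
```

Each such helper invocation spawns a Python interpreter and parses the config from scratch, which is one plausible place the pre-start script loses time when run repeatedly at this scale.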
Once TimeoutStartSec was extended to 500 seconds, vdsm started correctly.
It seems the pre-start and initialization stages hit a performance limit at this scale.
Not sure whether vdsm supports profiling in this area, since it is the pre-start of vdsm itself.
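As a temporary workaround, the timeout can be raised with a systemd drop-in instead of editing the unit file (a sketch; the 500-second value is the one that worked here, and the drop-in path follows the standard systemd convention):

```
# /etc/systemd/system/vdsmd.service.d/timeout.conf
[Service]
TimeoutStartSec=500
```

Run `systemctl daemon-reload` afterwards for the override to take effect.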
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Use an extreme host with large RAM and more than 100 cores.
2. Run one VM per core.

Actual results:
vdsm fails to start due to TimeoutStartSec.

Expected results:
vdsm starts; the initialization stage is optimized.

Additional info:
Further investigation required - profiler results.
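Since the reporter asks about profiling the init stage, a minimal sketch of how a Python init-style function could be profiled with the stdlib cProfile (the workload below is a synthetic stand-in, not vdsm's actual pre-start code, which is a shell script calling Python helpers):

```python
import cProfile
import io
import pstats

def slow_init(n_vms):
    # Synthetic stand-in for per-VM initialization work.
    total = 0
    for _vm in range(n_vms):
        total += sum(i * i for i in range(1000))
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_init(144)  # one "VM" per core, matching the scale in this report
profiler.disable()

# Print the top entries sorted by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The same pattern (or `python -m cProfile script.py`) could be applied to the Python helpers invoked by vdsmd_init_common.sh to see where the time goes.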
There is no such data in the vdsm logs; since the init stage failed, other logs may have some useful lines.
Please attach /var/log/vdsm/* so we can tell what eats so much time during boot.
(and /var/log/messages as well)
Currently targeting to 4.0.
We will re-examine once we get more details.
Please re-open if still relevant, and give access to the environment and relevant logs.