Bug 1309300 - [scale] - vdsm initialization timeout (on master machine with 144 cores and vm per core)
Status: CLOSED INSUFFICIENT_DATA
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.17.20
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.0.0-alpha
Target Release: ---
Assigned To: Yaniv Bronhaim
QA Contact: Eldad Marciano
Depends On:
Blocks:
Reported: 2016-02-17 07:20 EST by Eldad Marciano
Modified: 2016-06-28 17:56 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-13 03:30:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
oourfali: ovirt‑4.0.0?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments: None
Description Eldad Marciano 2016-02-17 07:20:03 EST
Description of problem:
When vdsm is restarted (vdsm start) on a master machine with 144 cores and one VM per core,
the vdsm pre-start / initialization stage fails because it exceeds the systemd TimeoutStartSec=90 limit.

When it fails, systemctl status and the journal print the following:
Active: activating (start-pre) since Wed 2016-02-17 05:57:46 EST; 136ms ago
Process: 113343 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
Main PID: 96235 (code=exited, status=0/SUCCESS);         : 127024 (vdsmd_init_comm)
    CGroup: /system.slice/vdsmd.service
            └─control
              ├─127024 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
              └─127030 /usr/bin/python /usr/share/vdsm/get-conf-item /etc/vdsm/vdsm.conf irs repository /rhev/


Once TimeoutStartSec was extended to 500 seconds, vdsm started correctly.
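
For reference, a minimal sketch of how the longer timeout could be applied through a systemd drop-in instead of editing the packaged unit file. The helper name render_dropin and the drop-in file name mentioned below are assumptions for illustration, not something taken from this host; only the vdsmd.service unit name and the 500-second value come from this report.

```python
def render_dropin(timeout_sec):
    """Return the text of a systemd drop-in that overrides TimeoutStartSec.

    Hypothetical helper: it only builds the file content and writes nothing.
    """
    if timeout_sec <= 0:
        raise ValueError("timeout_sec must be positive")
    return "[Service]\nTimeoutStartSec={}\n".format(int(timeout_sec))


if __name__ == "__main__":
    # 500 is the value that allowed vdsm to start in this report.
    print(render_dropin(500), end="")
```

The output would typically be saved under /etc/systemd/system/vdsmd.service.d/ (for example as timeout.conf, an assumed name), followed by systemctl daemon-reload. This only works around the symptom; it does not explain why pre-start is slow.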

It seems that the pre-start and init stages hit a performance problem at this scale.

I am not sure whether vdsm supports profiling in this area, since this is the pre-start of vdsm itself.
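
In the absence of built-in profiling, one rough way to see where the time goes would be to time each pre-start command separately from outside vdsm. The sketch below is an assumption-laden illustration, not part of vdsm: time_command and time_steps are hypothetical helpers, it assumes Python 3.5 or later, and the steps in the example are stand-ins that would have to be replaced with the real commands run by vdsmd_init_common.sh --pre-start.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=None):
    """Run argv and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode, time.monotonic() - start


def time_steps(steps):
    """Time each (name, argv) pair and return results, slowest first."""
    results = []
    for name, argv in steps:
        code, elapsed = time_command(argv)
        results.append((name, code, elapsed))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Stand-in steps (assumption): replace with the actual pre-start commands.
    steps = [
        ("noop", [sys.executable, "-c", "pass"]),
        ("sleep", [sys.executable, "-c", "import time; time.sleep(0.2)"]),
    ]
    for name, code, elapsed in time_steps(steps):
        print("{:<10} exit={} {:.3f}s".format(name, code, elapsed))
```

Sorting by elapsed time puts the slowest step first, which is the one most likely to push the start past the systemd timeout.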

Version-Release number of selected component (if applicable):
vdsm-4.17.20-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Use an extreme host with large RAM and more than 100 cores.
2. Run one VM per core.
3. Restart vdsm.


Actual results:
vdsm fails to start because TimeoutStartSec is exceeded.

Expected results:
The initialization stage is optimized so that vdsm starts within the configured timeout.

Additional info:

Further investigation is required (profiler results).
There is no such data in the vdsm logs, since the init stage failed; other logs may contain some useful lines.
Comment 1 Dan Kenigsberg 2016-02-17 08:45:49 EST
Please attach /var/log/vdsm/* so we can tell what eats so much time during boot.
Comment 2 Dan Kenigsberg 2016-02-17 08:46:42 EST
(and /var/log/messages as well)
Comment 3 Oved Ourfali 2016-02-24 06:26:06 EST
Currently targeting 4.0.
We will re-examine once we get more details.
Comment 4 Oved Ourfali 2016-03-13 03:30:42 EDT
Please re-open if still relevant, and give access to the environment and relevant logs.
