Bug 1309300 - [scale] - vdsm initialization timeout (on master machine with 144 cores and vm per core)
Summary: [scale] - vdsm initialization timeout (on master machine with 144 cores and v...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.17.20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.0.0-alpha
Target Release: ---
Assignee: Yaniv Bronhaim
QA Contact: Eldad Marciano
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-17 12:20 UTC by Eldad Marciano
Modified: 2016-06-28 21:56 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-13 07:30:42 UTC
oVirt Team: Infra
oourfali: ovirt-4.0.0?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Description Eldad Marciano 2016-02-17 12:20:03 UTC
Description of problem:
When restarting vdsm (vdsm start) on a master machine with 144 cores and one VM per core,
the vdsm pre-start initialization fails because it exceeds the systemd TimeoutStartSec=90 limit.

When it fails, systemctl status and the journal print the following:
Active: activating (start-pre) since Wed 2016-02-17 05:57:46 EST; 136ms ago
Process: 113343 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
Main PID: 96235 (code=exited, status=0/SUCCESS);         : 127024 (vdsmd_init_comm)
    CGroup: /system.slice/vdsmd.service
            └─control
              ├─127024 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
              └─127030 /usr/bin/python /usr/share/vdsm/get-conf-item /etc/vdsm/vdsm.conf irs repository /rhev/


Once TimeoutStartSec was extended to 500 seconds, vdsm started correctly.
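
For reference, the timeout change above can be expressed as a systemd drop-in for vdsmd.service. The sketch below only renders the drop-in text and does not write anything. The drop-in file name `timeout.conf` is an assumption; the 500-second value is the one used in this report.

```python
def render_timeout_dropin(timeout_sec):
    """Return the text of a systemd drop-in that overrides TimeoutStartSec."""
    if timeout_sec <= 0:
        raise ValueError("timeout_sec must be positive")
    return "[Service]\nTimeoutStartSec={}\n".format(timeout_sec)


# Hypothetical drop-in location; the file name is an assumption.
DROPIN_PATH = "/etc/systemd/system/vdsmd.service.d/timeout.conf"

if __name__ == "__main__":
    # Print the content instead of writing it, so the sketch has no side effects.
    print("# would be written to", DROPIN_PATH)
    print(render_timeout_dropin(500), end="")
```

After such a file is put in place, systemd needs `systemctl daemon-reload` before the new timeout applies.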

It seems that the pre-start and init stages hit a performance problem at this scale.

I am not sure whether vdsm supports profiling in this area, since this is the pre-start stage of vdsm itself.
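
Rough timings can be collected from outside vdsm by timing the pre-start commands directly. The following is a minimal sketch, assuming Python 3 is available on the host. `time_command` is a hypothetical helper, not part of vdsm, and the script name in the usage comment is made up. The command in that comment is the one shown in the status output above; running it by hand may have side effects on the host.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (exit_code, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.monotonic() - start


if __name__ == "__main__":
    # Example (hypothetical script name):
    #   python3 time_cmd.py /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
    # Falls back to a trivial command when no arguments are given.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    code, elapsed = time_command(argv)
    print("exit={} elapsed={:.3f}s".format(code, elapsed))
```

Timing the whole pre-start script first, and then the individual commands it spawns, would narrow down which step dominates.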

Version-Release number of selected component (if applicable):
vdsm-4.17.20-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Use a high-end host with a large amount of RAM and more than 100 cores.
2. Run one VM per core.
3. Restart vdsm.


Actual results:
vdsm fails to start because TimeoutStartSec is exceeded.

Expected results:
The initialization stage should be optimized so that vdsm starts within the timeout.

Additional info:

Further investigation is required (profiler results).
There is no such data in the vdsm logs, since the init stage failed. Other logs may have some useful lines.
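
When going through the other logs, one quick way to spot where the time goes is to look for the largest gaps between consecutive timestamped lines. This is a sketch only. It assumes traditional syslog timestamps ("Mon DD HH:MM:SS") at the start of each line, as commonly found in /var/log/messages, and it assumes the year, since that format does not carry one. `largest_gaps` is a hypothetical helper.

```python
import sys
from datetime import datetime


def largest_gaps(lines, top=3, year=2016):
    """Return the `top` largest gaps between consecutive syslog-style lines.

    Each result is (gap_seconds, previous_line, next_line). Lines that do
    not start with a 'Mon DD HH:MM:SS' timestamp are skipped.
    """
    stamped = []
    for line in lines:
        try:
            # Syslog timestamps carry no year, so the assumed one is prepended.
            ts = datetime.strptime(
                "{} {}".format(year, line[:15]), "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue
        stamped.append((ts, line.rstrip("\n")))
    gaps = [
        ((cur[0] - prev[0]).total_seconds(), prev[1], cur[1])
        for prev, cur in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of a log file as the first argument.
    with open(sys.argv[1], errors="replace") as f:
        for gap, before, after in largest_gaps(f):
            print("{:.0f}s between:\n  {}\n  {}".format(gap, before, after))
```

A large gap during the start-pre window would point at the step that was running when the timeout hit.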

Comment 1 Dan Kenigsberg 2016-02-17 13:45:49 UTC
Please attach /var/log/vdsm/* so we can tell what eats so much time during boot.

Comment 2 Dan Kenigsberg 2016-02-17 13:46:42 UTC
(and /var/log/messages as well)

Comment 3 Oved Ourfali 2016-02-24 11:26:06 UTC
Currently targeting 4.0.
We will re-examine once we get more details.

Comment 4 Oved Ourfali 2016-03-13 07:30:42 UTC
Please re-open if still relevant, and give access to the environment and relevant logs.

