| Summary: | [scale] - vdsm initialization timeout (on master machine with 144 cores and vm per core) | ||
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Eldad Marciano <emarcian> |
| Component: | Core | Assignee: | Yaniv Bronhaim <ybronhei> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Eldad Marciano <emarcian> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.17.20 | CC: | bugs, emarcian, gklein, oourfali |
| Target Milestone: | ovirt-4.0.0-alpha | Flags: | oourfali: ovirt-4.0.0? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack? |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-03-13 07:30:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Please attach /var/log/vdsm/* (and /var/log/messages as well) so we can tell what takes so much time during boot.

Currently targeting 4.0. We will re-examine once we get more details.

Please re-open if still relevant, and give access to the environment and relevant logs.
Description of problem:
When vdsm is restarted (vdsm start) on a master machine with 144 cores and one VM per core, the vdsm pre-start / initialization fails because it exceeds the systemd TimeoutStartSec=90 limit.

When it fails, systemctl status and the journal print this:

Active: activating (start-pre) since Wed 2016-02-17 05:57:46 EST; 136ms ago
Process: 113343 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
Main PID: 96235 (code=exited, status=0/SUCCESS); : 127024 (vdsmd_init_comm)
CGroup: /system.slice/vdsmd.service
        └─control
          ├─127024 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
          └─127030 /usr/bin/python /usr/share/vdsm/get-conf-item /etc/vdsm/vdsm.conf irs repository /rhev/

Once TimeoutStartSec was extended to 500 seconds, vdsm started correctly. It seems the pre-start and init stages hit a performance problem at this scale. I am not sure whether vdsm supports profiling in this area, since it is the pre-start of vdsm itself.

Version-Release number of selected component (if applicable):
vdsm-4.17.20-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Use an extreme host with large RAM and more than 100 cores.
2. Run one VM per core.

Actual results:
vdsm fails to start because TimeoutStartSec is exceeded.

Expected results:
The initialization stage is optimized so that vdsm starts within the timeout.

Additional info:
Further investigation is required (profiler results). There is no such data in the vdsm logs, since the init stage failed; other logs may have some useful lines.
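
Since the report has no timing data for the pre-start stage, one way to get a first number is to time the pre-start command by hand and compare it with the 90-second limit. The following is only a minimal sketch: `time_command` is a hypothetical helper and not part of vdsm, the default script path is the one shown in the systemctl output above, and it is assumed that running the pre-start step manually (as root, on a test host) is acceptable in the environment.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv, discard its output, and return (exit_code, elapsed_seconds)."""
    with open(os.devnull, "w") as devnull:
        start = time.time()
        code = subprocess.call(argv, stdout=devnull, stderr=devnull)
        return code, time.time() - start


if __name__ == "__main__":
    # Default command is taken from the systemctl output in this report;
    # pass a different command on the command line to time something else.
    cmd = sys.argv[1:] or ["/usr/libexec/vdsm/vdsmd_init_common.sh", "--pre-start"]
    code, elapsed = time_command(cmd)
    print("exit=%d elapsed=%.1fs (systemd limit in this report: 90s)" % (code, elapsed))
```

Timing the individual commands the script spawns (for example the `get-conf-item` call visible in the CGroup listing) the same way could help narrow down which step dominates, though that is a guess about where the time goes rather than a finding.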