Bug 1333260

Summary: vdsmd and libvirtd fails to start with input/output errors
Product: [oVirt] vdsm Reporter: Bhaskarakiran <byarlaga>
Component: CoreAssignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Aharon Canan <acanan>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.17.26CC: bugs, byarlaga, cshao, huiwa, huzhao, mzywusko, oourfali, weiwang, yaniwang, ycui
Target Milestone: ---Flags: rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-26 12:06:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1258386    
Attachments:
Description Flags
/var/log/messages file of rhsqa4 none

Description Bhaskarakiran 2016-05-05 07:03:29 UTC
Description of problem:
-----------------------

Out of 3 nodes, one of the node went unresponsive. Could see that it vdsmd service failed to start couple of times. All the dependent services fail including libvirtd

[root@rhsqa4 .shard]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) (Result: timeout) since Thu 2016-05-05 12:05:32 IST; 17min ago
  Process: 14635 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 2954 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=killed, signal=KILL)
  Process: 14911 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=killed, signal=TERM)
 Main PID: 2954 (code=killed, signal=KILL)

May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
May 05 12:22:27 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:27 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
Hint: Some lines were ellipsized, use -l to show in full.
[root@rhsqa4 .shard]# 

Version-Release number of selected component (if applicable):
4.17.26-0.1

How reproducible:
Seen once

Steps to Reproduce:
-------------------
All that i was running was IO from the clients and this was hit. Not sure of the exact steps.

Actual results:


Expected results:


Additional info:
----------------
Sosreports of the nodes will be attached.

Comment 2 Bhaskarakiran 2016-05-05 07:12:42 UTC
[root@rhsqa4 ~]# systemctl status libvirtd -l
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: failed (Result: start-limit) since Thu 2016-05-05 12:42:00 IST; 16s ago
     Docs: man:libvirtd(8)
           http://libvirt.org
  Process: 24584 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=1/FAILURE)
 Main PID: 24584 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/libvirtd.service

May 05 12:41:59 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Unit libvirtd.service entered failed state.
May 05 12:41:59 rhsqa4.lab.eng.blr.redhat.com systemd[1]: libvirtd.service failed.
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: libvirtd.service holdoff time over, scheduling restart.
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: start request repeated too quickly for libvirtd.service
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Failed to start Virtualization daemon.
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Unit libvirtd.service entered failed state.
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: libvirtd.service failed.
[root@rhsqa4 ~]#

Comment 3 Bhaskarakiran 2016-05-05 09:05:52 UTC
Created attachment 1154147 [details]
/var/log/messages file of rhsqa4

Comment 6 Yaniv Bronhaim 2016-05-15 08:15:11 UTC
Hi Bhaskarakiran, in first look rhsqa4 is responsive now and both libvirtd and vdsmd are running. I can't see errors there right now. However, as you say, I can see in the sos report that libvirt failed to start with "error : virPidFileAcquirePath:422 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable"
This does sound like IO error which related specifically to your host. 

I don't see any other reason for that to happen:
2016-05-09 05:43:13.408+0000: 30026: info : libvirt version: 1.2.17, package: 13.el7_2.4 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-03-02-11:10:27, x86-034.build.eng.bos.redhat.com)
2016-05-09 05:43:13.408+0000: 30026: info : virObjectNew:202 : OBJECT_NEW: obj=0x7fcb90775070 classname=virAccessManagerClass
2016-05-09 05:43:13.408+0000: 30026: debug : virAccessManagerNewDriver:108 : Initialized with stack
2016-05-09 05:43:13.408+0000: 30026: info : virObjectNew:202 : OBJECT_NEW: obj=0x7fcb90769d60 classname=virAccessManagerClass
2016-05-09 05:43:13.408+0000: 30026: debug : virAccessManagerNewDriver:108 : Initialized with none
2016-05-09 05:43:13.408+0000: 30026: info : virObjectRef:296 : OBJECT_REF: obj=0x7fcb90775070
2016-05-09 05:43:13.408+0000: 30026: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fcb90775070
2016-05-09 05:43:13.408+0000: 30026: debug : main:1315 : Decided on pid file path '/var/run/libvirtd.pid'
2016-05-09 05:43:13.408+0000: 30026: debug : main:1328 : Decided on socket paths '/var/run/libvirt/libvirt-sock', '/var/run/libvirt/libvirt-sock-ro' and '/var/run/libvirt/libvirt-admin-sock'
2016-05-09 05:43:13.408+0000: 30026: debug : main:1364 : Ensuring run dir '/var/run/libvirt' exists
2016-05-09 05:43:13.408+0000: 30026: debug : virFileMakePathHelper:2660 : path=/var/run/libvirt mode=0777
2016-05-09 05:43:13.408+0000: 30026: error : virPidFileAcquirePath:422 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2016-05-09 05:43:13.408+0000: 30026: debug : virFileClose:102 : Closed fd 5
2016-05-09 05:43:13.408+0000: 30026: info : virNetlinkEventServiceStopAll:546 : stopping all netlink event services

this repeating log means that libvirtd couldn't touch the pid file under /var/run . and therefore vdsmd didn't start at all..

Except saying that, I can't analyze more if it doesn't happen again

Any additional info about the bug to share? maybe we already have reproducer..?

Comment 7 Red Hat Bugzilla Rules Engine 2016-05-15 08:15:18 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 8 Red Hat Bugzilla 2023-09-14 03:22:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days