Bug 1333260 - vdsmd and libvirtd fails to start with input/output errors
Summary: vdsmd and libvirtd fails to start with input/output errors
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.17.26
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: ---
Assignee: Yaniv Bronhaim
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks: Gluster-HC-1
TreeView+ depends on / blocked
 
Reported: 2016-05-05 07:03 UTC by Bhaskarakiran
Modified: 2023-09-14 03:22 UTC (History)
10 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-05-26 12:06:12 UTC
oVirt Team: Infra
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments (Terms of Use)
/var/log/messages file of rhsqa4 (2.40 MB, application/zip)
2016-05-05 09:05 UTC, Bhaskarakiran
no flags Details

Description Bhaskarakiran 2016-05-05 07:03:29 UTC
Description of problem:
-----------------------

Out of 3 nodes, one of the node went unresponsive. Could see that it vdsmd service failed to start couple of times. All the dependent services fail including libvirtd

[root@rhsqa4 .shard]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) (Result: timeout) since Thu 2016-05-05 12:05:32 IST; 17min ago
  Process: 14635 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 2954 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=killed, signal=KILL)
  Process: 14911 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=killed, signal=TERM)
 Main PID: 2954 (code=killed, signal=KILL)

May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:26 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
May 05 12:22:27 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Ser...r.
May 05 12:22:27 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resul...'.
Hint: Some lines were ellipsized, use -l to show in full.
[root@rhsqa4 .shard]# 

Version-Release number of selected component (if applicable):
4.17.26-0.1

How reproducible:
Seen once

Steps to Reproduce:
-------------------
All that i was running was IO from the clients and this was hit. Not sure of the exact steps.

Actual results:


Expected results:


Additional info:
----------------
Sosreports of the nodes will be attached.

Comment 2 Bhaskarakiran 2016-05-05 07:12:42 UTC
[root@rhsqa4 ~]# systemctl status libvirtd -l
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: failed (Result: start-limit) since Thu 2016-05-05 12:42:00 IST; 16s ago
     Docs: man:libvirtd(8)
           http://libvirt.org
  Process: 24584 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=1/FAILURE)
 Main PID: 24584 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/libvirtd.service

May 05 12:41:59 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Unit libvirtd.service entered failed state.
May 05 12:41:59 rhsqa4.lab.eng.blr.redhat.com systemd[1]: libvirtd.service failed.
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: libvirtd.service holdoff time over, scheduling restart.
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: start request repeated too quickly for libvirtd.service
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Failed to start Virtualization daemon.
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: Unit libvirtd.service entered failed state.
May 05 12:42:00 rhsqa4.lab.eng.blr.redhat.com systemd[1]: libvirtd.service failed.
[root@rhsqa4 ~]#

Comment 3 Bhaskarakiran 2016-05-05 09:05:52 UTC
Created attachment 1154147 [details]
/var/log/messages file of rhsqa4

Comment 6 Yaniv Bronhaim 2016-05-15 08:15:11 UTC
Hi Bhaskarakiran, in first look rhsqa4 is responsive now and both libvirtd and vdsmd are running. I can't see errors there right now. However, as you say, I can see in the sos report that libvirt failed to start with "error : virPidFileAcquirePath:422 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable"
This does sound like IO error which related specifically to your host. 

I don't see any other reason for that to happen:
2016-05-09 05:43:13.408+0000: 30026: info : libvirt version: 1.2.17, package: 13.el7_2.4 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-03-02-11:10:27, x86-034.build.eng.bos.redhat.com)
2016-05-09 05:43:13.408+0000: 30026: info : virObjectNew:202 : OBJECT_NEW: obj=0x7fcb90775070 classname=virAccessManagerClass
2016-05-09 05:43:13.408+0000: 30026: debug : virAccessManagerNewDriver:108 : Initialized with stack
2016-05-09 05:43:13.408+0000: 30026: info : virObjectNew:202 : OBJECT_NEW: obj=0x7fcb90769d60 classname=virAccessManagerClass
2016-05-09 05:43:13.408+0000: 30026: debug : virAccessManagerNewDriver:108 : Initialized with none
2016-05-09 05:43:13.408+0000: 30026: info : virObjectRef:296 : OBJECT_REF: obj=0x7fcb90775070
2016-05-09 05:43:13.408+0000: 30026: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fcb90775070
2016-05-09 05:43:13.408+0000: 30026: debug : main:1315 : Decided on pid file path '/var/run/libvirtd.pid'
2016-05-09 05:43:13.408+0000: 30026: debug : main:1328 : Decided on socket paths '/var/run/libvirt/libvirt-sock', '/var/run/libvirt/libvirt-sock-ro' and '/var/run/libvirt/libvirt-admin-sock'
2016-05-09 05:43:13.408+0000: 30026: debug : main:1364 : Ensuring run dir '/var/run/libvirt' exists
2016-05-09 05:43:13.408+0000: 30026: debug : virFileMakePathHelper:2660 : path=/var/run/libvirt mode=0777
2016-05-09 05:43:13.408+0000: 30026: error : virPidFileAcquirePath:422 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2016-05-09 05:43:13.408+0000: 30026: debug : virFileClose:102 : Closed fd 5
2016-05-09 05:43:13.408+0000: 30026: info : virNetlinkEventServiceStopAll:546 : stopping all netlink event services

this repeating log means that libvirtd couldn't touch the pid file under /var/run . and therefore vdsmd didn't start at all..

Except saying that, I can't analyze more if it doesn't happen again

Any additional info about the bug to share? maybe we already have reproducer..?

Comment 7 Red Hat Bugzilla Rules Engine 2016-05-15 08:15:18 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 8 Red Hat Bugzilla 2023-09-14 03:22:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days


Note You need to log in before you can comment on or make changes to this bug.