Bug 1474584

Summary: rhosp-director: Repeating errors in /var/log/messages on compute upon deployment: "virNetSocketReadWire:1808 : End of file while reading data: Input/output error"
Product: Red Hat OpenStack Reporter: Alexander Chuzhoy <sasha>
Component: openstack-tripleo-heat-templatesAssignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA QA Contact: Sasha Smolyak <ssmolyak>
Severity: high Docs Contact:
Priority: high    
Version: 12.0 (Pike)CC: dbecker, emacchi, mburns, mcornea, morazi, pkilambi, rhel-osp-director-maint, sasha, tvignaud
Target Milestone: rcKeywords: Triaged
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170805163048.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-13 21:44:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Alexander Chuzhoy 2017-07-25 01:54:03 UTC
rhosp-director: Repeating errors in /var/log/messages on compute upon deployment: "virNetSocketReadWire:1808 : End of file while reading data: Input/output error"

Environment:
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170718190543.el7ost.noarch
instack-undercloud-7.1.1-0.20170714211622.el7ost.noarch
openstack-neutron-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
python-nova-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-neutron-common-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-nova-scheduler-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
puppet-nova-11.2.0-0.20170718023405.a27c62c.el7ost.noarch
openstack-nova-common-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
python-novaclient-9.0.1-0.20170621200422.ddb386b.el7ost.noarch
python-neutron-lib-1.7.0-0.20170529134801.0ee4f4a.el7ost.noarch
openstack-neutron-ml2-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-nova-placement-api-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
puppet-neutron-11.2.0-0.20170714192026.2bb7c50.el7ost.noarch
python-neutron-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
openstack-neutron-openvswitch-11.0.0-0.20170719132730.1c94a80.el7ost.noarch
python-neutronclient-6.3.0-0.20170601203754.ba535c6.el7ost.noarch
openstack-nova-api-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-nova-conductor-16.0.0-0.20170719155122.7ae3753.el7ost.noarch
openstack-nova-compute-16.0.0-0.20170719155122.7ae3753.el7ost.noarch


Steps to reproduce:
Complete OC deployment.
Check /var/log/messages on compute node.

Result:

Repeating errors "virNetSocketReadWire:1808":

Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.153+0000: 22097: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.153+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.154+0000: 22100: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.154+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.155+0000: 22099: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.155+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.155+0000: 22098: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.155+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.156+0000: 22096: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.156+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.157+0000: 22095: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.158+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.158+0000: 22093: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.158+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.158+0000: 22099: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.159+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.159+0000: 22098: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.159+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.160+0000: 22096: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.160+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.161+0000: 22097: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.161+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.162+0000: 22093: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.162+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost ceilometer-polling: libvirt:  error : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.163+0000: 22099: error : virGetUserEnt:810 : Failed to find user record for uid '166'
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.163+0000: 22041: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Jul 24 21:45:56 localhost journal: 2017-07-25 01:45:56.163+0000: 22098: error : virGetUserEnt:810 : Failed to find user record for uid '166'

Comment 2 Pradeep Kilambi 2017-07-25 11:16:46 UTC
I wonder if this is because libvirt is not in the same group as ceilo? 166 is suppose to be the userid for ceilometer user. But yesterday i did a fresh upstream tripleo deploy and i dont see this error:

[root@overcloud-novacompute-0 ~]# docker logs ceilometer_agent_compute
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/ceilometer/ceilometer.conf
INFO:__main__:Copying
/var/lib/kolla/config_files/src/etc/ceilometer/ceilometer.conf to
/etc/ceilometer/ceilometer.conf
INFO:__main__:Writing out command to execute
Running command: '/usr/bin/ceilometer-polling --polling-namespaces compute'

Comment 3 Pradeep Kilambi 2017-07-25 11:42:53 UTC
ceilometer user should be created automatically when ceilometer-common is installed in %pre section of the spec. So if thats installed, the user should exist obviously. Can you check a few things for me:

- is ceilometer-common installed in the ceilometer containers

- does ceilometer user exist in your /etc/passwd file in both ceilometer_agent_compute and libvirtrd containers?

Comment 4 Alexander Chuzhoy 2017-07-25 18:53:12 UTC
On compute in nova_libvirt container:
ceilometer:x:42405:42405::/home/ceilometer:/usr/sbin/nologin

on controller:

ceilometer_agent_notification container:
openstack-ceilometer-common-8.1.0-0.20170719023739.082368a.el7ost.noarch

ceilometer_agent_central:
openstack-ceilometer-common-8.1.0-0.20170719023739.082368a.el7ost.noarch

Comment 5 Pradeep Kilambi 2017-07-25 21:52:53 UTC
(In reply to Alexander Chuzhoy from comment #4)
> On compute in nova_libvirt container:
> ceilometer:x:42405:42405::/home/ceilometer:/usr/sbin/nologin


Yea the id you have here is incorrect. This is due to the way kolla is building the images and creating user as part of the template. We need to fix the kolla dockerfile to somehow specify the uid/gid so it creates the user with right id instead of picking some random one.

Comment 6 Pradeep Kilambi 2017-07-26 15:44:49 UTC
(In reply to Pradeep Kilambi from comment #5)
> (In reply to Alexander Chuzhoy from comment #4)
> > On compute in nova_libvirt container:
> > ceilometer:x:42405:42405::/home/ceilometer:/usr/sbin/nologin
> 
> 
> Yea the id you have here is incorrect. This is due to the way kolla is
> building the images and creating user as part of the template. We need to
> fix the kolla dockerfile to somehow specify the uid/gid so it creates the
> user with right id instead of picking some random one.

So talking to jistr, we want to keep the uids this way so they dont conflict with host ids. In this case ceilometer doesn't care about what the uid is, but libvirt evidently does. But i dont know why.

 I deployed this upstream and i get the i/o error but dont see the uid issue. The I/O error could be due to an issue where redis password is defaulting to undef.

Can you provide couple more info:

- Whats the uid/gid for ceilometer in ceilometer_agent_compute containers

- do you see any errors in /var/log/ceilometer/ceilometyer-polling.log with in the ceilometer_agent_compute container.

Comment 7 Alexander Chuzhoy 2017-07-26 16:36:10 UTC

in both ceilometer_agent_central and ceilometer_agent_notification: same uid

ceilometer:x:42405:42405::/var/lib/ceilometer:/usr/sbin/nologin


no ceilometer_agent_compute

No errors in /var/log/ceilometer/ceilometer-polling.log on ceilometer_agent_central

Comment 8 Pradeep Kilambi 2017-07-26 19:32:42 UTC
with this patch: https://review.openstack.org/#/c/487480/ I can confirm that i dont see any i/o errors in /var/log/messages on the compute host anymoire. Since at this point compute is working as expected, lets use that fix downstream and see if it resolves the issue.



[root@overcloud-novacompute-0 ~]# tail /var/log/messages 
Jul 26 19:27:07 localhost journal: INFO:__main__:Copying service configuration files
Jul 26 19:27:07 localhost journal: INFO:__main__:Deleting /etc/neutron/plugins/ml2/openvswitch_agent.ini
Jul 26 19:27:07 localhost journal: INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/neutron/plugins/ml2/openvswitch_agent.ini to /etc/neutron/plugins/ml2/openvswitch_agent.ini
Jul 26 19:27:07 localhost journal: INFO:__main__:Deleting /etc/neutron/neutron.conf
Jul 26 19:27:07 localhost journal: INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/neutron/neutron.conf to /etc/neutron/neutron.conf
Jul 26 19:27:07 localhost journal: INFO:__main__:Writing out command to execute
Jul 26 19:27:07 localhost journal: INFO:__main__:Setting permission for /var/log/neutron
Jul 26 19:27:07 localhost journal: INFO:__main__:Setting permission for /var/log/neutron/neutron-openvswitch-agent.log
Jul 26 19:27:07 localhost journal: Running command: '/usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --config-dir /etc/neutron/conf.d/common'
Jul 26 19:27:08 localhost journal: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.


[root@overcloud-novacompute-0 ~]# docker ps
CONTAINER ID        IMAGE                                                                              COMMAND             CREATED             STATUS              PORTS               NAMES
47ea0b5fad12        192.168.24.1:8787/tripleoupstream/centos-binary-neutron-openvswitch-agent:latest   "kolla_start"       22 minutes ago      Up 6 seconds                            neutron_ovs_agent
464971e01e53        192.168.24.1:8787/tripleoupstream/centos-binary-nova-compute:latest                "kolla_start"       22 minutes ago      Up 22 minutes                           nova_migration_target
e175c6f34d6d        192.168.24.1:8787/tripleoupstream/centos-binary-ceilometer-compute:latest          "kolla_start"       22 minutes ago      Up 22 minutes                           ceilometer_agent_compute
6406874457a5        192.168.24.1:8787/tripleoupstream/centos-binary-nova-compute:latest                "kolla_start"       22 minutes ago      Up 22 minutes                           nova_compute
c9d55fc5c18f        192.168.24.1:8787/tripleoupstream/centos-binary-iscsid:latest                      "kolla_start"       31 minutes ago      Up 31 minutes                           iscsid
e8ad42c74dfa        192.168.24.1:8787/tripleoupstream/centos-binary-nova-libvirt:latest                "kolla_start"       31 minutes ago      Up 31 minutes                           nova_libvirt

[root@overcloud-novacompute-0 ~]# docker logs ceilometer_agent_compute
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/ceilometer/ceilometer.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/ceilometer/ceilometer.conf to /etc/ceilometer/ceilometer.conf
INFO:__main__:Writing out command to execute
Running command: '/usr/bin/ceilometer-polling --polling-namespaces compute'

Comment 9 Marius Cornea 2017-08-04 08:42:14 UTC
I am still seeing the issue reported in this ticket during upgrade with https://review.openstack.org/#/c/487480/ and https://review.openstack.org/#/c/486737/ applied.

Comment 22 errata-xmlrpc 2017-12-13 21:44:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462