Bug 1798617

Summary: collectd coredump on a ceph/compute node filling up the / filesystem
Product: Red Hat OpenStack Reporter: David Hill <dhill>
Component: collectd Assignee: Ryan McCabe <rmccabe>
Status: CLOSED NOTABUG QA Contact: Leonid Natapov <lnatapov>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens) CC: astafeye, broose, jbadiapa, lars, mmagr, mrunge, rmccabe
Target Milestone: z13 Keywords: Triaged, ZStream
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-11 13:28:57 UTC Type: Bug

Description David Hill 2020-02-05 16:55:57 UTC
Description of problem:
A collectd coredump on a ceph/compute node is filling up the / filesystem. We have the coredumps from when it happened, as well as a sosreport for that compute node.



                "Log": [
                    {
                        "Start": "2020-02-04T13:57:42.139076467-08:00",
                        "End": "2020-02-04T13:57:42.73061108-08:00",
                        "ExitCode": 1,
                        "Output": "ERROR: Failed to connect to daemon at unix:/var/run/collectd-socket: Connection refused.\n"
                    },
                    {
                        "Start": "2020-02-04T13:58:12.73168984-08:00",
                        "End": "2020-02-04T13:58:13.349369442-08:00",
                        "ExitCode": 1,
                        "Output": "ERROR: Failed to connect to daemon at unix:/var/run/collectd-socket: Connection refused.\n"
                    },
                    {
                        "Start": "2020-02-04T13:58:43.349623844-08:00",
                        "End": "2020-02-04T13:58:43.922973732-08:00",
                        "ExitCode": 1,
                        "Output": "ERROR: Failed to connect to daemon at unix:/var/run/collectd-socket: Connection refused.\n"
                    },
                    {
                        "Start": "2020-02-04T13:59:13.923192814-08:00",
                        "End": "2020-02-04T13:59:14.532747201-08:00",
                        "ExitCode": 1,
                        "Output": "ERROR: Failed to connect to daemon at unix:/var/run/collectd-socket: Connection refused.\n"
                    },
                    {
                        "Start": "2020-02-04T13:59:44.533008885-08:00",
                        "End": "2020-02-04T13:59:45.070755975-08:00",
                        "ExitCode": 1,
                        "Output": "ERROR: Failed to connect to daemon at unix:/var/run/collectd-socket: Connection refused.\n"
                    }
                ]

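The healthcheck output above reports "Connection refused" on the collectd unix socket, which usually means the socket file exists but no daemon is listening on it. A minimal sketch for telling that case apart from a missing socket file follows. It is not the /openstack/healthcheck script itself; the function name `probe_unix_socket` and the status strings are assumptions made for illustration.

```python
import socket


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a unix stream socket and return a short status string.

    Hypothetical helper, not part of collectd or the container healthcheck.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"          # something accepted the connection
    except FileNotFoundError:
        return "missing"     # no socket file at this path
    except ConnectionRefusedError:
        return "refused"     # socket file exists, nothing is listening
    except OSError as exc:
        return f"error: {exc}"
    finally:
        sock.close()
```

Calling `probe_unix_socket("/var/run/collectd-socket")` inside the container would be expected to return "refused" while the daemon is down, matching the log entries above.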


                "architecture": "x86_64",
                "authoritative-source-url": "registry.access.redhat.com",
                "batch": "20190224.1",
                "build-date": "2019-04-09T13:29:51.238587",
                "com.redhat.build-host": "cpt-0013.osbs.prod.upshift.rdu2.redhat.com",
                "com.redhat.component": "openstack-collectd-container",
                "com.redhat.license_terms": "https://www.redhat.com/licenses/eulas",
                "config_data": "{\"healthcheck\": {\"test\": \"/openstack/healthcheck\"}, \"image\": \"satpol01.mgmt:5000/t-mobile_magentabox-production-composite_openstack_13-osp13_containers-collectd:latest\", \"pid\": \"host\", \"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=17cbf15232999c8cd2ddfba815cea138\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/collectd.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/collectd/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/collectd:/var/log/collectd:rw\", \"/var/run/openvswitch:/var/run/openvswitch:ro\", \"/var/run/ceph:/var/run/ceph:ro\", \"/var/run/libvirt:/var/run/libvirt:ro\"], \"net\": \"host\", \"privileged\": true, \"restart\": \"always\"}",
                "config_id": "tripleo_step5",
                "container_name": "collectd",
                "description": "Red Hat OpenStack Platform 13.0 collectd",
                "distribution-scope": "public",
                "io.k8s.description": "Red Hat OpenStack Platform 13.0 collectd",
                "io.k8s.display-name": "Red Hat OpenStack Platform 13.0 collectd",
                "io.openshift.tags": "rhosp osp openstack osp-13.0",
                "managed_by": "paunch",
                "name": "rhosp13/openstack-collectd",
                "release": "61.1554788831",
                "summary": "Red Hat OpenStack Platform 13.0 collectd",
                "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-collectd/images/13.0-61.1554788831",
                "vcs-ref": "33ba785c229d2855db5cd8e83f2303fbf12c450f",
                "vcs-type": "git",
                "vendor": "Red Hat, Inc.",
                "version": "13.0"

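The config_data label above is a JSON document stored as an escaped string, which makes settings such as "pid": "host" and "privileged": true hard to spot. A small sketch for decoding it is shown here. The sample is a shortened, illustrative excerpt of the label, and the helper name `decode_config_data` is an assumption.

```python
import json


def decode_config_data(labels):
    """Decode the JSON string stored under the 'config_data' label."""
    return json.loads(labels["config_data"])


# Shortened, illustrative sample; the real label carries many more volumes.
sample_labels = {
    "config_data": "{\"healthcheck\": {\"test\": \"/openstack/healthcheck\"}, "
                   "\"pid\": \"host\", \"privileged\": true, \"restart\": \"always\", "
                   "\"volumes\": [\"/var/run/ceph:/var/run/ceph:ro\", "
                   "\"/var/log/containers/collectd:/var/log/collectd:rw\"]}"
}

config = decode_config_data(sample_labels)

# Volumes without a trailing ':ro' are mounted writable.
writable = [v for v in config["volumes"] if not v.endswith(":ro")]
```

In this excerpt the only writable mount is the collectd log directory.
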
Version-Release number of selected component (if applicable):


How reproducible:
This time

Steps to Reproduce:
1. It coredumped

Actual results:
It coredumped

Expected results:
Shouldn't coredump

Additional info:

Comment 5 Matthias Runge 2020-05-06 15:25:45 UTC
Is this still an issue or can we close this? It isn't really reproducible from our side.

Comment 6 David Hill 2020-05-11 12:11:00 UTC
The case is closed, so go ahead and close this BZ too.

Comment 7 Matthias Runge 2020-05-11 13:28:57 UTC
Thank you, will do.

Comment 8 Alexander Stafeyev 2020-11-26 08:13:37 UTC
(In reply to Matthias Runge from comment #5)
> Is this still an issue or can we close this? It isn't really reproducible
> from our side.

Hi Matthias, on which zstream did you try to reproduce this?
Thanks