Description of problem:
No logs are showing after a certain date. The issue occurred after an upgrade. Scaling the fluentd pods down and back up let some logs from after the upgrade through, but none after a certain point.

Version-Release number of selected component (if applicable):
logging-elasticsearch-v3.6.173.0.5-5
logging-fluentd-v3.6.173.0.21-17

How reproducible:
Unconfirmed

Actual results:
2017-09-21 22:06:00 +0200 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
2017-09-22 00:39:02 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2017-09-22 00:39:03 +0200 error_class="TypeError" error="no implicit conversion of nil into String" plugin_id="object:11b86f8"
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.5/lib/fluent/plugin/out_elasticsearch_dynamic.rb:240:in `sub!'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.5/lib/fluent/plugin/out_elasticsearch_dynamic.rb:240:in `expand_param'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.5/lib/fluent/plugin/out_elasticsearch_dynamic.rb:130:in `block (2 levels) in write_objects'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.5/lib/fluent/plugin/out_elasticsearch_dynamic.rb:125:in `each'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.5/lib/fluent/plugin/out_elasticsearch_dynamic.rb:125:in `each_with_index'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.5/lib/fluent/plugin/out_elasticsearch_dynamic.rb:125:in `block in write_objects'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/buffer.rb:123:in `each'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/buffer.rb:123:in `block in msgpack_each'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/plugin/buf_file.rb:71:in `open'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/buffer.rb:120:in `msgpack_each'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.5/lib/fluent/plugin/out_elasticsearch_dynamic.rb:121:in `write_objects'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/output.rb:490:in `write'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/buffer.rb:354:in `write_chunk'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/buffer.rb:333:in `pop'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/output.rb:342:in `try_flush'
2017-09-22 00:39:02 +0200 [warn]: /usr/share/gems/gems/fluentd-0.12.39/lib/fluent/output.rb:149:in `run'
2017-09-22 00:39:03 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2017-09-22 00:39:05 +0200 error_class="TypeError" error="no implicit conversion of nil into String" plugin_id="object:11b86f8"
2017-09-22 00:39:04 +0200 [warn]: suppressed same stacktrace
2017-09-20 19:52:13 +0200 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
2017-09-20 19:52:55 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2017-09-20 19:52:20 +0200 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:1c7ae4c"

Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Clustername: logging-es
Clusterstate: GREEN
Number of nodes: 3
Number of data nodes: 3

Additional info:
May be related to https://bugzilla.redhat.com/show_bug.cgi?id=1486493 -- but no projects/namespaces were deleted. The issue occurred after an upgrade.
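A quick way to see how widespread the TypeError is across the fluentd daemonset is to count its occurrences in each pod's log. This is only a hedged triage sketch; it assumes logging runs in the default "logging" project and that the fluentd pods carry the component=fluentd label:

for p in $(oc get pods -n logging -l component=fluentd -o jsonpath='{.items[*].metadata.name}'); do
  echo "== $p"
  # count how often the failing flush shows up in this collector's log
  oc logs -n logging $p | grep -c 'no implicit conversion of nil into String'
done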
I have a post about additional steps to get more information: https://richmegginson.livejournal.com/29741.html

Note: if you change the configmap to do this, be sure to change it back _immediately_ after reproducing the problem, as this will cause fluentd to spew a large volume of data.
I've run into the same issue, and as per comment 2, here's my output. I think the problem is in one of the two JSON objects, but they look fine to me: they both have a @timestamp, and the kubernetes one has both namespace_id and namespace_name.

2017-10-06 12:51:58 +0200 journal.system: {"systemd":{"t":{"MACHINE_ID":"00d1b79716a34614976af02f296f6e24","BOOT_ID":"f8d86ce7bcef4f1091852d9b49144f55","CAP_EFFECTIVE":"1fffffffff","CMDLINE":"/usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2","COMM":"openshift","EXE":"/usr/bin/openshift","GID":"0","HOSTNAME":"node02.domain.it","PID":"2466","SELINUX_CONTEXT":"system_u:system_r:init_t:s0","SYSTEMD_CGROUP":"/system.slice/atomic-openshift-node.service","SYSTEMD_SLICE":"system.slice","SYSTEMD_UNIT":"atomic-openshift-node.service","TRANSPORT":"stdout","UID":"0"},"u":{"SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"atomic-openshift-node"}},"hostname":"node02.domain.it","message":"I1006 11:16:17.251671 2466 operation_generator.go:609] MountVolume.SetUp succeeded for volume \"kubernetes.io/secret/fcd6174c-aa76-11e7-9286-005056ba13d4-deployer-token-6x17r\" (spec.Name: \"deployer-token-6x17r\") pod \"fcd6174c-aa76-11e7-9286-005056ba13d4\" (UID: \"fcd6174c-aa76-11e7-9286-005056ba13d4\").","pipeline_metadata":{"collector":{"ipaddr4":"10.128.5.100","ipaddr6":"fe80::858:aff:fe80:564","inputname":"fluent-plugin-systemd","name":"fluentd openshift","received_at":"2017-10-06T09:16:17.000000+00:00","version":"0.12.39 1.6.0"}},"level":"info","@timestamp":"2017-10-06T09:16:17.000000+00:00"}

2017-10-06 12:51:58 +0200 kubernetes.journal.container: {"docker":{"container_id":"aec5c57642c2547362fbe485cb57832d6ef4da957bcc3446c1d03f0beed9a834"},"kubernetes":{"container_name":"deployment","namespace_name":"imprese","pod_name":"imprese-2-deploy","namespace_id":"b93ecfd0-aa67-11e7-b8c1-005056ba5ce6"},"hostname":"node02.domain.it","message":"--> Scaling up imprese-2 from 0 to 1, scaling down imprese-1 from 1 to 0 (keep 1 pods available, don't exceed 2 pods)","level":"info","pipeline_metadata":{"collector":{"ipaddr4":"10.128.5.100","ipaddr6":"fe80::858:aff:fe80:564","inputname":"fluent-plugin-systemd","name":"fluentd openshift","received_at":"2017-10-06T09:16:17.846085+00:00","version":"0.12.39 1.6.0"}},"systemd":{"t":{"MACHINE_ID":"00d1b79716a34614976af02f296f6e24","BOOT_ID":"f8d86ce7bcef4f1091852d9b49144f55","CAP_EFFECTIVE":"1fffffffff","CMDLINE":"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --selinux-enabled --log-driver=journald --log-level=warn --ipv6=false --storage-driver overlay2 --mtu=1450 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com","COMM":"dockerd-current","EXE":"/usr/bin/dockerd-current","GID":"0","HOSTNAME":"node02.domain.it","PID":"1771","SELINUX_CONTEXT":"system_u:system_r:container_runtime_t:s0","SOURCE_REALTIME_TIMESTAMP":"1507281377846085","SYSTEMD_CGROUP":"/system.slice/docker.service","SYSTEMD_SLICE":"system.slice","SYSTEMD_UNIT":"docker.service","TRANSPORT":"journal","UID":"0"}},"@timestamp":"2017-10-06T09:16:17.846085+00:00"}

2017-10-06 12:51:59 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2017-10-06 12:51:59 +0200 error_class="TypeError" error="no implicit conversion of nil into String" plugin_id="object:19e6b04"
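If the debug output from comment 2 is captured to a file, a quick filter can surface container records that are missing namespace metadata, which is the condition the fix committed later in this bug addresses. A hedged sketch only; the capture file name is an assumption, and the tag and field names come from the records shown above:

# Save the verbose fluentd output, then look for container records without namespace_name
oc logs $FLUENTDPOD > fluentd-debug.log
grep 'kubernetes.journal.container' fluentd-debug.log | grep -v 'namespace_name'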
The following solves the issue. I saw a similar MUX-related BZ which fails like this.

Uninstall logging -> https://docs.openshift.com/container-platform/3.6/install_config/aggregate_logging.html#aggregate-logging-cleanup

Then put openshift_logging_image_version=v3.6.173.0.5 in the inventory.

Install logging with https://docs.openshift.com/container-platform/3.6/install_config/aggregate_logging.html#deploying-the-efk-stack
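For reference, a minimal sketch of that redeploy with the image version pinned. The inventory path and the 3.6 byo playbook path are assumptions and should be adjusted to the environment; passing -e on the command line is equivalent to setting the variable under [OSEv3:vars] in the inventory:

# Redeploy the EFK stack with the logging images pinned to the last known-good version
ansible-playbook -i /etc/ansible/hosts \
  /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml \
  -e openshift_logging_install_logging=true \
  -e openshift_logging_image_version=v3.6.173.0.5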
Hello,

I have the same case: logs from some projects are not transferred to Elasticsearch. However, it seems that it worked with the problematic version before, but now it doesn't work at all.

The current version marked as latest is v3.6.173.0.49-4 - is the new image OK?

Thank you
(In reply to Vladislav Walek from comment #6)
> Hello,
> 
> I have same case when I see that logs from some projects are not transferred
> to the elastic search.
> However, it seems that it worked with the problematic version before, but
> now it doesn't work at all.
> 
> The current version marked as latest is v3.6.173.0.49-4, is the new image ok?

If it isn't working for you, then it probably isn't ok.

> 
> Thank you
Hello,

I'm working on the case 01960527 that Vladislav commented on before. You said v3.6.173.0.49-4, marked as the latest, is not OK - could you tell us why you think so?

Since the customer is facing a product issue, we should fix it soon. If rolling back to an older version (for example v3.6.173.0.5, which Miheer tried) is a valid workaround, we can recommend it to the customer. However, we don't have a solid justification for that at this point.

Regards,
(In reply to Takayoshi Tanaka from comment #8)
> Hello,
> 
> I'm working on the case 01960527, commented by Vladislav before. As you said
> v3.6.173.0.49-4, marked as the latest, is not OK, could you tell us why you
> think so?

I don't know. But if it isn't working, then it isn't OK.

> 
> Since the customer is facing the product issue, we should fix it soon. If
> rolling back to the old version: for example v3.6.173.0.5 which Miheer
> tried, is a valid workaround, we can recommend to the customer. However, it
> has not a valid reason at this point.
> 
> Regards,

We have later versions too - the latest version is https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=617058 logging-fluentd-docker-v3.6.173.0.63-2 which may also fix the problem (I don't know because I don't know what the problem is)
Hello Rich,

the issue is that logs are not sent to Elasticsearch. The case I am working on shows:
- there are 2 app nodes, n01 and n02
- there are 2 projects running on them (2 pods each) - p01 and p02
- for p01, when the pod is running on n01 you can see logs, but only until the 30th of October
- for p01, when the pod is running on n02 you can't see the logs
- for p02, whether running on n01 or n02 you can't see anything
- the logs will be deleted after 7 days

We can see logs on the pod, but can't see the index in Elasticsearch:
- both p01 and p02 are generating logs from the 27th until the 31st of October
- in Elasticsearch, only part of the logs show up in Kibana

Version 49-4 doesn't fix the issue. We tried running that container, but no logs were sent to ES.

Any thoughts?
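One way to confirm which project indices actually exist in Elasticsearch is to query _cat/indices from an ES pod. A hedged sketch, assuming an ES pod name in $ESPOD and the admin client certificates at the usual secret mount path; both are assumptions to verify against the cluster:

oc exec $ESPOD -- curl -s \
  --cacert /etc/elasticsearch/secret/admin-ca \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  'https://localhost:9200/_cat/indices?v' | grep -E 'project\.(p01|p02)'

If the indices for a project are missing entirely, the records never reached Elasticsearch, which points back at the stuck fluentd buffers rather than at index retention.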
Hello,

also, the latest version available to the customer in the registry is 49-4. The one mentioned in Comment #9 is not available.

Thank you
(In reply to Miheer Salunke from comment #5)
> The following solves the issue. I saw something similar BZ related MUX which
> fails like this.
> 
> Uninstall logging ->
> https://docs.openshift.com/container-platform/3.6/install_config/
> aggregate_logging.html#aggregate-logging-cleanup
> 
> Then put openshift_logging_image_version=v3.6.173.0.5 in the inventory.
> 
> Install logging with
> https://docs.openshift.com/container-platform/3.6/install_config/
> aggregate_logging.html#deploying-the-efk-stack

I am trying to check with the customer whether the uninstall is necessary, as the issue could be with fluentd alone. The alternative is to change only the image tag in the daemonset from v3.6 (which points to latest) to v3.6.173.0.5 (which is an alias for tag 3.6.173.0.5-5); see the sketch below.
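A minimal sketch of that daemonset-only rollback. The daemonset name logging-fluentd, the container name fluentd-elasticsearch, the registry path, and the component=fluentd label are assumptions based on the default ansible install and should be checked against the cluster before use:

oc project logging
# Point the fluentd container at the pinned tag instead of v3.6
oc set image daemonset/logging-fluentd \
  fluentd-elasticsearch=registry.access.redhat.com/openshift3/logging-fluentd:v3.6.173.0.5
# Delete the running pods so the daemonset recreates them with the new image
oc delete pods -l component=fluentd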
(In reply to Vladislav Walek from comment #11)
> Hello,
> 
> also the latest version in registry available for customer is 49-4.
> The one mentioned in Comment #9 is not available.
> 
> Thank you

Right. I'm not sure when it will be going out - 3.6.3, etc.
1) If it is OK with you:

Uninstall logging -> https://docs.openshift.com/container-platform/3.6/install_config/aggregate_logging.html#aggregate-logging-cleanup

Then put openshift_logging_image_version=v3.6.173.0.5 in the inventory.

Install logging with https://docs.openshift.com/container-platform/3.6/install_config/aggregate_logging.html#deploying-the-efk-stack

OR

2) The fluentd version could be changed in the daemonset to the tag you want to pull - in this case v3.6.173.0.5 (remember that it must be identical to the tag in the registry [1]). Then, after just deleting the pod, it should be automatically redeployed with the specified image version.

The second option was tried, but it still fails.

Rich, any suggestions on how to move ahead with this issue?
I have 2 questions.

In the fluentd pod:

oc rsh $FLUENTDPOD

Do we have a filter-post-z-* config file in /etc/fluent/configs.d?
# ls /etc/fluent/configs.d/openshift/filter-post-z-*
/etc/fluent/configs.d/openshift/filter-post-z-retag-two.conf

Also, what does the fluentd configmap look like?

oc edit configmap $FLUENTDPOD

Does the configmap have <label @OUTPUT> as follows?
8<-----------------------------------------------------------------------------------------
<label @INGRESS>
## filters
@include configs.d/openshift/filter-pre-*.conf
@include configs.d/openshift/filter-retag-journal.conf
@include configs.d/openshift/filter-k8s-meta.conf
@include configs.d/openshift/filter-kibana-transform.conf
@include configs.d/openshift/filter-k8s-flatten-hash.conf
@include configs.d/openshift/filter-k8s-record-transform.conf
@include configs.d/openshift/filter-syslog-record-transform.conf
@include configs.d/openshift/filter-viaq-data-model.conf
@include configs.d/openshift/filter-post-*.conf
##
</label>

<label @OUTPUT>
## matches
@include configs.d/openshift/output-pre-*.conf
@include configs.d/openshift/output-operations.conf
@include configs.d/openshift/output-applications.conf
# no post - applications.conf matches everything left
##
</label>
8<-----------------------------------------------------------------------------------------

If there is no filter-post-z-* config file in /etc/fluent/configs.d/openshift, please remove </label> and <label @OUTPUT> as follows:
8<-----------------------------------------------------------------------------------------
<label @INGRESS>
## filters
@include configs.d/openshift/filter-pre-*.conf
@include configs.d/openshift/filter-retag-journal.conf
@include configs.d/openshift/filter-k8s-meta.conf
@include configs.d/openshift/filter-kibana-transform.conf
@include configs.d/openshift/filter-k8s-flatten-hash.conf
@include configs.d/openshift/filter-k8s-record-transform.conf
@include configs.d/openshift/filter-syslog-record-transform.conf
@include configs.d/openshift/filter-viaq-data-model.conf
@include configs.d/openshift/filter-post-*.conf
##

## matches
@include configs.d/openshift/output-pre-*.conf
@include configs.d/openshift/output-operations.conf
@include configs.d/openshift/output-applications.conf
# no post - applications.conf matches everything left
##
</label>
8<-----------------------------------------------------------------------------------------

If you have the filter-post-z-* config file in /etc/fluent/configs.d/openshift and do not have </label> and <label @OUTPUT>, please add them. (I don't think that's the case, since the fluentd run.sh does not install filter-post-z-* unless <label @OUTPUT> is found in the configmap.)

Thanks,
--noriko
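The same checks can also be done without an interactive shell. A hedged sketch; the component=fluentd label and the configmap name logging-fluentd are assumptions based on the default ansible install:

FLUENTDPOD=$(oc get pods -l component=fluentd -o jsonpath='{.items[0].metadata.name}')
# List the installed config snippets, looking for filter-post-z-*
oc exec $FLUENTDPOD -- ls /etc/fluent/configs.d/openshift/
# Check whether the configmap contains the <label @INGRESS>/<label @OUTPUT> sections
oc get configmap logging-fluentd -o yaml | grep -nE '<label @(INGRESS|OUTPUT)>|</label>'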
(In reply to Rich Megginson from comment #20)
> I have 2 questions.
> 
> In the fluentd pod:
> 
> oc rsh $FLUENTDPOD
> 
> Do we have a filter-post-z-* config file in /etc/fluent/configs.d?
> # ls /etc/fluent/configs.d/openshift/filter-post-z-*
> /etc/fluent/configs.d/openshift/filter-post-z-retag-two.conf
> 
> Also, how does the fluentd's configmap look like?
> oc edit configmap $FLUENTDPOD
> 
> Does the configmap have <label @OUTPUT> as follows?
> 8<-----------------------------------------------------------------------------------------
> <label @INGRESS>
> ## filters
> @include configs.d/openshift/filter-pre-*.conf
> @include configs.d/openshift/filter-retag-journal.conf
> @include configs.d/openshift/filter-k8s-meta.conf
> @include configs.d/openshift/filter-kibana-transform.conf
> @include configs.d/openshift/filter-k8s-flatten-hash.conf
> @include configs.d/openshift/filter-k8s-record-transform.conf
> @include configs.d/openshift/filter-syslog-record-transform.conf
> @include configs.d/openshift/filter-viaq-data-model.conf
> @include configs.d/openshift/filter-post-*.conf
> ##
> </label>
> 
> <label @OUTPUT>
> ## matches
> @include configs.d/openshift/output-pre-*.conf
> @include configs.d/openshift/output-operations.conf
> @include configs.d/openshift/output-applications.conf
> # no post - applications.conf matches everything left
> ##
> </label>
> 8<-----------------------------------------------------------------------------------------
> 
> If there is no filter-post-z-* config file in
> /etc/fluent/configs.d/openshift, please remove </label> and <label @OUTPUT>
> as follows:
> 8<-----------------------------------------------------------------------------------------
> <label @INGRESS>
> ## filters
> @include configs.d/openshift/filter-pre-*.conf
> @include configs.d/openshift/filter-retag-journal.conf
> @include configs.d/openshift/filter-k8s-meta.conf
> @include configs.d/openshift/filter-kibana-transform.conf
> @include configs.d/openshift/filter-k8s-flatten-hash.conf
> @include configs.d/openshift/filter-k8s-record-transform.conf
> @include configs.d/openshift/filter-syslog-record-transform.conf
> @include configs.d/openshift/filter-viaq-data-model.conf
> @include configs.d/openshift/filter-post-*.conf
> ##
> 
> ## matches
> @include configs.d/openshift/output-pre-*.conf
> @include configs.d/openshift/output-operations.conf
> @include configs.d/openshift/output-applications.conf
> # no post - applications.conf matches everything left
> ##
> </label>
> 8<-----------------------------------------------------------------------------------------
> 
> If you have the filter-post-z-* config file in
> /etc/fluent/configs.d/openshift and do not have </label> and <label
> @OUTPUT>, please add them. (I don't think that's the case since the fluentd
> run.sh does not install filter-post-z-* unless <label @OUTPUT> is found in
> the configmap.)
> 
> Thanks,
> --noriko

@Rich, they don't have such a filter, nor a <label @INGRESS> within the configmap.
(In reply to Nicolas Nosenzo from comment #24)
> @Rich, they don't such a filter, neither a <label @INGRESS> within the
> configmap

@Nicolas, how about <label @OUTPUT>?
(In reply to Noriko Hosoi from comment #25)
> (In reply to Nicolas Nosenzo from comment #24)
> > @Rich, they don't such a filter, neither a <label @INGRESS> within the
> > configmap
> 
> @Nicolas, how about <label @OUTPUT>?

@Noriko, I meant <label @OUTPUT>, sorry.
The following was done and this is what happened ->

The customer was told to proceed as per https://bugzilla.redhat.com/show_bug.cgi?id=1494612#c14

Still, one fluentd did not send logs to Elasticsearch:
error_class="TypeError" error="no implicit conversion of nil into String" plugin_id=""

So then we did the following ->

Stop fluentd:
oc label node $nodename logging-infra-fluentd-

Delete the stale buffer files found with ls -al /var/lib/fluentd

Restart fluentd:
oc label node $nodename logging-infra-fluentd=true

Then fluentd was up again.
(In reply to Miheer Salunke from comment #34)
> Following things were done and had happened ->
> 
> Customer was told to as per this ->
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1494612#c14
> 
> still one fluentd did not send logs to the elastic search.
> " error_class="TypeError" error="no implicit conversion of nil into String"
> plugin_id=""
> 
> So then we did the following ->
> Stop fluentd
> 
> oc label node $nodename logging-infra-fluentd-
> 
> Delete stale buffer files in ls -al /var/lib/fluentd

We were only able to do this because these were infra node logs that were older than the retention policy. This is not a general purpose solution. The customer saved the stale buffer files so that we can do further analysis.

> 
> oc label node $nodename logging-infra-fluentd=true
> 
> Then fluentd was again up.
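To keep that caution actionable, here is a hedged sketch of the safer variant: back the buffers up instead of deleting them. The /var/lib/fluentd path and the label commands come from the comments above; the backup directory is an assumption:

# On the affected node, after unlabeling it so fluentd stops:
oc label node $nodename logging-infra-fluentd-
mkdir -p /root/fluentd-buffer-backup
# Move (rather than delete) the stale buffer chunks so they stay available for analysis
mv /var/lib/fluentd/* /root/fluentd-buffer-backup/
# Relabel the node so the daemonset schedules fluentd again
oc label node $nodename logging-infra-fluentd=true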
For customers hitting this on 3.5, is there a confirmed 3.5 image to use as a workaround that does not exhibit the behavior? If not, is it safe to use 3.6.173.0.5 -- in particular, is it safe to use 3.6.173.0.5 when you only care about exporting logs to an external ELK stack?
The "working" version in 3.6 is part of this release: https://docs.openshift.com/container-platform/3.6/release_notes/ocp_3_6_release_notes.html#ocp-3-6-rhba-2017-1829 -- and it is later releases causing issue. The closest equivalent in 3.5 is here: https://docs.openshift.com/container-platform/3.5/release_notes/ocp_3_5_release_notes.html#ocp-3-5-rhba-2017-1828 The logging images associated with that release are: openshift3/logging-auth-proxy:3.5.0-28 openshift3/logging-elasticsearch:3.5.0-37 openshift3/logging-fluentd:3.5.0-26 openshift3/logging-kibana:3.5.0-30 Does this seem correct?
Updating priority as this is:
- part of the prio-list thread
- causing customers to need to use earlier images to avoid the bug, which leaves them exposed to bugs that are fixed in later images
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/3459ae8165c856539aa2fb10ce8c2649f2d1a395
bug 1494612. Orphan records missing namespace_name and/or namespace_id

https://github.com/openshift/origin-aggregated-logging/commit/064628dd44f5731d0947749bece9d27d9a45157d
Merge pull request #856 from jcantrill/1494612_orphan_namespaces

Automatic merge from submit-queue.

bug 1494612. Orphan records missing namespace_name and/or namespace_id
Jeff,

For test purposes, is there any good method to create orphan records missing namespace_name and/or namespace_id?
Verified - the orphaned records are indexed in the .orphaned project when using openshift3/logging-fluentd/images/v3.6.173.0.95-1:

green open .orphaned.2018.01.08 1 0 94 0 97.2kb 97.2kb
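That verification can be reproduced from an ES pod by listing the orphaned indices directly. A hedged sketch, assuming an ES pod name in $ESPOD and the admin certificates at the usual secret mount path, as in the earlier sketch:

oc exec $ESPOD -- curl -s \
  --cacert /etc/elasticsearch/secret/admin-ca \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  'https://localhost:9200/_cat/indices/.orphaned*?v'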
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0113