fixed in PR origin-aggregated-logging/pull/559
koji_builds:
  https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=587924

repositories:
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:rhaos-3.6-rhel-7-docker-candidate-23619-20170823203852
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:latest
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6.173.0.5
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6.173.0.5-10
Reassigning to @juzhao as he is the Trello card owner.
@Noriko I have two questions:

1. Do we still have the openshift_logging_use_mux_client parameter?
From PR https://github.com/openshift/openshift-ansible/pull/4554/ for https://bugzilla.redhat.com/show_bug.cgi?id=1464024#c0, there is openshift_logging_use_mux_client: False in roles/openshift_logging_fluentd/defaults/main.yml, but it cannot be found now.

ansible playbooks version: openshift-ansible-playbooks-3.6.173.0.21-2.git.0.44a4038.el7.noarch

I don't think it is necessary to test with mux; do you agree?

2. Since we can only use the file buffer now, I think the only way to verify this defect is:
1) stop the fluentd pods for a while and restart them later.
2) verify fluentd is still able to communicate with Elasticsearch.
3) verify the logs stored in the file/PV can be retrieved by Kibana, with no logs missing.

Do you have a better way to verify it?
(In reply to Junqi Zhao from comment #4)
> @Noriko
>
> I have two questions:
> 1. Do we still have the openshift_logging_use_mux_client parameter?
> From PR https://github.com/openshift/openshift-ansible/pull/4554/ for
> https://bugzilla.redhat.com/show_bug.cgi?id=1464024#c0, there is
> openshift_logging_use_mux_client: False in
> roles/openshift_logging_fluentd/defaults/main.yml,
> but it cannot be found now.

Right. In 3.6.1 we got rid of that setting because there are now multiple mux client modes:
https://github.com/openshift/openshift-ansible/tree/master/roles/openshift_logging#mux---secure_forward-listener-service

"openshift_logging_mux_client_mode: Values - minimal, maximal. Default is unset. Setting this value will cause the Fluentd node agent to send logs to mux rather than directly to Elasticsearch. The value maximal means that Fluentd will do as much processing as possible at the node before sending the records to mux. This is the current recommended way to use mux due to current scaling issues. The value minimal means that Fluentd will do no processing at all, and send the raw logs to mux for processing. We do not currently recommend using this mode, and ansible will warn you about this."

When testing mux, use `openshift_logging_mux_client_mode=maximal`.

> ansible playbooks version:
> openshift-ansible-playbooks-3.6.173.0.21-2.git.0.44a4038.el7.noarch
>
> I don't think it is necessary to test with mux; do you agree?

We would prefer testing with mux, as mux has no persistence at all and is the most vulnerable to data loss.

> 2. Since we can only use the file buffer now, I think the only way to verify
> this defect is:
> 1) stop the fluentd pods for a while and restart them later.
> 2) verify fluentd is still able to communicate with Elasticsearch.
> 3) verify the logs stored in the file/PV can be retrieved by Kibana, with no
> logs missing.
>
> Do you have a better way to verify it?
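For reference, a minimal inventory sketch for enabling mux with the recommended client mode (the [OSEv3:vars] group name is the usual openshift-ansible convention; adjust to your inventory layout):

    [OSEv3:vars]
    # deploy the mux secure_forward listener service
    openshift_logging_use_mux=true
    # node Fluentd does as much processing as possible before forwarding to mux
    openshift_logging_mux_client_mode=maximal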
Verification steps:
1. Use mux to test; set the following parameters in the inventory file:
   openshift_logging_use_mux=true
   openshift_logging_mux_client_mode=maximal
2. Create one project to populate logs.
3. Stop the fluentd pods and note down the last project logs shown in Kibana.
4. Wait for a while, then restart the fluentd pods.
5. Check the logs generated after step 3; no logs are missing.

Test env
# openshift version
openshift v3.6.173.0.21
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

Images:
logging-curator-v3.6.173.0.21-15
logging-elasticsearch-v3.6.173.0.21-15
logging-fluentd-v3.6.173.0.28-1
logging-kibana-v3.6.173.0.21-15
logging-auth-proxy-v3.6.173.0.21-15
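A sketch of steps 3 and 4 above using oc (this assumes the default logging-infra-fluentd node selector on the Fluentd daemonset; the node name is illustrative):

    # step 3: stop Fluentd on a node by removing it from the daemonset's node selector
    oc label node node1.example.com logging-infra-fluentd=false --overwrite

    # generate more logs in the test project while Fluentd is down, then
    # step 4: restart Fluentd on the node
    oc label node node1.example.com logging-infra-fluentd=true --overwrite

    # confirm the Fluentd pod is running again
    oc get pods -l component=fluentd -o wide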
(In reply to Junqi Zhao from comment #6)
> Verification steps:
> 1. Use mux to test; set the following parameters in the inventory file:
>    openshift_logging_use_mux=true
>    openshift_logging_mux_client_mode=maximal
>
> 2. Create one project to populate logs.
>
> 3. Stop the fluentd pods and note down the last project logs shown in Kibana.
>
> 4. Wait for a while, then restart the fluentd pods.
>
> 5. Check the logs generated after step 3; no logs are missing.

Add step 6:
6. Repeat steps 3 to 5 and make sure no logs are missing.

> Test env
> # openshift version
> openshift v3.6.173.0.21
> kubernetes v1.6.1+5115d708d7
> etcd 3.2.1
>
> Images:
> logging-curator-v3.6.173.0.21-15
> logging-elasticsearch-v3.6.173.0.21-15
> logging-fluentd-v3.6.173.0.28-1
> logging-kibana-v3.6.173.0.21-15
> logging-auth-proxy-v3.6.173.0.21-15
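One way to double-check step 5 at the Elasticsearch level rather than only in Kibana (a sketch; the pod name is illustrative, and the admin cert paths assume the defaults shipped in the logging-elasticsearch image):

    # count documents in the project indices before stopping and after restarting Fluentd
    oc exec logging-es-data-master-0 -- curl -s \
      --cacert /etc/elasticsearch/secret/admin-ca \
      --cert /etc/elasticsearch/secret/admin-cert \
      --key /etc/elasticsearch/secret/admin-key \
      "https://localhost:9200/project.*/_count"

The count taken after the restart should be at least the earlier count plus the number of log entries generated while Fluentd was down.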
@nhosoi one more question: in the fluentd pod log, the "'exclude1' parameter is deprecated" warning has already been reported in another defect. I want to ask whether the following warning is expected:

2017-09-01 03:28:29 -0400 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not

We didn't see this before.

# oc logs logging-fluentd-hnphl
umounts of dead containers will fail. Ignoring...
umount: /var/lib/docker/containers/*/shm: mountpoint not found
2017-09-01 03:28:26 -0400 [info]: reading config file path="/etc/fluent/fluent.conf"
2017-09-01 03:28:28 -0400 [warn]: 'exclude1' parameter is deprecated: Use <exclude> section
2017-09-01 03:28:28 -0400 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
2017-09-01 03:28:29 -0400 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
2017-09-01 03:28:29 -0400 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
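For context, Fluentd emits that warning when a buffered output sets buffer_queue_full_action to block. A minimal illustrative match block (not the shipped output-es.conf; values are examples only):

    <match **>
      @type elasticsearch
      buffer_type file
      buffer_path /var/lib/fluentd/buffer-output-es
      # 'block' stalls the input instead of dropping records when the buffer
      # queue is full -- this is what produces the [warn] message above
      buffer_queue_full_action block
    </match>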
Hi Junqi,

Regarding exclude1, pr#629 is submitted and reviewed.

Looking into the 'block' action one next.

Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3049