Bug 2040694 - Three upstream HTTPClientConfig struct fields missing in the operator
Summary: Three upstream HTTPClientConfig struct fields missing in the operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Philip Gough
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 2041459
 
Reported: 2022-01-14 14:28 UTC by Lucas López Montero
Modified: 2023-01-03 11:40 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned To: 2041459
Environment:
Last Closed: 2022-03-10 16:39:37 UTC
Target Upstream Version:
Embargoed:


Attachments:


Links
System ID                                     Last Updated
Red Hat Knowledge Base (Solution) 6665831     2022-01-25 08:38:54 UTC
Red Hat Product Errata RHSA-2022:0056         2022-03-10 16:39:54 UTC

Description Lucas López Montero 2022-01-14 14:28:54 UTC
Description of problem:

The upstream HTTPClientConfig struct [1] has 8 fields, whereas the operator's version of the struct [2] has only 5.


Component version: 

Prometheus Operator 4.9.


Actual results:

When the alertmanager-main secret is given a configuration that uses one of the fields missing from the operator's struct, an error like the following is returned:

prometheus-operator fails to parse this configuration due to an unknown field (follow_redirects):
level=error ts=2022-01-14T08:10:38.69780826Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/main\" failed: provision alertmanager configuration: base config from Secret could not be parsed: yaml: unmarshal errors:\n  line 4: field follow_redirects not found in type alertmanager.httpClientConfig\n  line 116: field follow_redirects not found in type alertmanager.httpClientConfig"
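
For illustration, the error above is the standard behaviour of strict YAML decoding into a Go struct that does not declare a field. The sketch below is hypothetical (the field subset and package are assumptions, not the operator's actual code), but it reproduces the same class of error:

package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

// Hypothetical subset of the operator-side struct; follow_redirects is
// deliberately absent, mirroring the mismatch described above.
type httpClientConfig struct {
	BearerToken     string `yaml:"bearer_token,omitempty"`
	BearerTokenFile string `yaml:"bearer_token_file,omitempty"`
	ProxyURL        string `yaml:"proxy_url,omitempty"`
}

func main() {
	var cfg httpClientConfig
	data := []byte("follow_redirects: false\n")
	// UnmarshalStrict rejects keys that the target struct does not declare,
	// producing an error of the form
	// "line 1: field follow_redirects not found in type main.httpClientConfig".
	if err := yaml.UnmarshalStrict(data, &cfg); err != nil {
		fmt.Println(err)
	}
}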


Expected results:

No error; both structs should have the same fields.


Additional info:

The reason there are two different versions of the same struct appears to be explained in [3][4][5].

The upstream version of prometheus/common used by the operator is 0.29.0 [6].



[1] https://github.com/prometheus/common/blob/v0.29.0/config/http_config.go#L158-L179
[2] https://github.com/openshift/prometheus-operator/blob/release-4.9/pkg/alertmanager/types.go#L178-L184
[3] https://github.com/prometheus/alertmanager/issues/1985
[4] https://github.com/prometheus/alertmanager/pull/1804#issuecomment-482038079
[5] https://github.com/prometheus/alertmanager/blob/4017d1a478e546909a2b4b4173b6aab3e7dd2dbe/config/config.go#L53-L58
[6] https://github.com/openshift/prometheus-operator/blob/release-4.9/go.mod#L25

Comment 2 Philip Gough 2022-01-17 11:51:25 UTC
This issue should be resolved in 4.10 via https://github.com/prometheus-operator/prometheus-operator/pull/4333/

Comment 4 Junqi Zhao 2022-01-18 13:19:04 UTC
Tested with 4.10.0-0.nightly-2022-01-17-182202, following the steps in
https://docs.openshift.com/container-platform/4.9/monitoring/managing-alerts.html#applying-custom-alertmanager-configuration_managing-alerts
1. Output the current Alertmanager configuration to the file alertmanager.yaml:
# oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml
2. Edit alertmanager.yaml to include http_config.follow_redirects:
****************************
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: "false"
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = critical
    target_matchers:
      - severity =~ warning|info
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = warning
    target_matchers:
      - severity = info
receivers:
  - name: Default
  - name: Watchdog
  - name: Critical
  - name: webhook
    webhook_configs:
      - send_resolved: "true"
        http_config:
          follow_redirects: "true"
        url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - matchers:
        - alertname = Watchdog
      receiver: Watchdog
    - matchers:
        - severity = critical
      receiver: Critical
    - receiver: webhook
      match:
        severity: critical
****************************
3. Apply the new configuration; no error is reported:
# oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n openshift-monitoring replace secret --filename=-
secret/alertmanager-main replaced

4. The new configuration file is loaded:
# oc -n openshift-monitoring exec -c alertmanager alertmanager-main-0 -- cat /etc/alertmanager/config/alertmanager.yaml
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: "false"
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = critical
    target_matchers:
      - severity =~ warning|info
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = warning
    target_matchers:
      - severity = info
receivers:
  - name: Default
  - name: Watchdog
  - name: Critical
  - name: webhook
    webhook_configs:
      - send_resolved: "true"
        http_config:
          follow_redirects: "true"
        url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - matchers:
        - alertname = Watchdog
      receiver: Watchdog
    - matchers:
        - severity = critical
      receiver: Critical
    - receiver: webhook
      match:
        severity: critical

The same result is returned by:
# oc -n openshift-monitoring get secret alertmanager-main -o jsonpath="{.data.alertmanager\.yaml}" | base64 -d

However, there are errors in the alertmanager container logs:
# oc -n openshift-monitoring logs -c alertmanager alertmanager-main-0 | tail
level=info ts=2022-01-18T13:16:46.567Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=error ts=2022-01-18T13:16:46.567Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n  line 4: cannot unmarshal !!str `false` into bool\n  line 26: cannot unmarshal !!str `true` into bool\n  line 28: cannot unmarshal !!str `true` into bool"
level=info ts=2022-01-18T13:16:51.567Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=error ts=2022-01-18T13:16:51.567Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n  line 4: cannot unmarshal !!str `false` into bool\n  line 26: cannot unmarshal !!str `true` into bool\n  line 28: cannot unmarshal !!str `true` into bool"
level=info ts=2022-01-18T13:16:56.567Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=error ts=2022-01-18T13:16:56.567Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n  line 4: cannot unmarshal !!str `false` into bool\n  line 26: cannot unmarshal !!str `true` into bool\n  line 28: cannot unmarshal !!str `true` into bool"
level=info ts=2022-01-18T13:17:01.568Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=error ts=2022-01-18T13:17:01.568Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n  line 4: cannot unmarshal !!str `false` into bool\n  line 26: cannot unmarshal !!str `true` into bool\n  line 28: cannot unmarshal !!str `true` into bool"
level=info ts=2022-01-18T13:17:06.567Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=error ts=2022-01-18T13:17:06.567Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n  line 4: cannot unmarshal !!str `false` into bool\n  line 26: cannot unmarshal !!str `true` into bool\n  line 28: cannot unmarshal !!str `true` into bool"

Comment 5 Junqi Zhao 2022-01-18 13:27:33 UTC
Continuing from Comment 4: the change does not appear on the ${alertmanager_route}/#/status page; it still shows the default configuration.

Comment 6 Philip Gough 2022-01-18 13:28:33 UTC
I think the issue here is that the boolean values are quoted and therefore evaluated as strings. I checked the following with amtool and it validates:

global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: false
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = critical
    target_matchers:
      - severity =~ warning|info
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = warning
    target_matchers:
      - severity = info
receivers:
  - name: Default
  - name: Watchdog
  - name: Critical
  - name: webhook
    webhook_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
        url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - matchers:
        - alertname = Watchdog
      receiver: Watchdog
    - matchers:
        - severity = critical
      receiver: Critical
    - receiver: webhook
      match:
        severity: critical
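
To make the distinction explicit: a quoted "false" is a YAML string, and the Go YAML decoder will not coerce it into a bool field, which is exactly the "cannot unmarshal !!str `false` into bool" error from Comment 4. A minimal sketch of the two cases (a stand-in struct, not the actual Alertmanager config types):

package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

// Stand-in for the real config type; only the relevant field is shown.
type httpConfig struct {
	FollowRedirects bool `yaml:"follow_redirects"`
}

func main() {
	var cfg httpConfig

	// Quoted value: a YAML string, so decoding into a bool fails with
	// "cannot unmarshal !!str `false` into bool".
	fmt.Println(yaml.Unmarshal([]byte(`follow_redirects: "false"`), &cfg))

	// Unquoted value: a YAML bool, decodes cleanly and sets the field.
	fmt.Println(yaml.Unmarshal([]byte("follow_redirects: true"), &cfg), cfg.FollowRedirects)
}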

Comment 7 Junqi Zhao 2022-01-18 13:37:19 UTC
Updated the configuration as in Comment 6: there is no error in the alertmanager container, and the configuration appears as loaded on the ${alertmanager_route}/#/status page.

Comment 10 errata-xmlrpc 2022-03-10 16:39:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

