Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2040694

Summary: Three upstream HTTPClientConfig struct fields missing in the operator
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.9
Target Release: 4.10.0
Reporter: Lucas López Montero <llopezmo>
Assignee: Philip Gough <pgough>
QA Contact: Junqi Zhao <juzhao>
CC: amuller, anpicker, aos-bugs, benjamin.alpert, erooth
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Target Milestone: ---
Last Closed: 2022-03-10 16:39:37 UTC
Bug Blocks: 2041459 (view as bug list)

Description Lucas López Montero 2022-01-14 14:28:54 UTC
Description of problem:

The upstream HTTPClientConfig struct [1] has 8 fields, whereas the operator's equivalent struct [2] has only 5.
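
For context, here is a paraphrased Go sketch of the mismatch. The upstream field names are taken from [1]; the downstream set is an approximation of [2], and the types are reduced to local placeholders so the sketch compiles (see the links for the authoritative definitions):

package sketch

// Placeholder types; the real definitions live in prometheus/common [1].
type (
	BasicAuth     struct{}
	Authorization struct{}
	OAuth2        struct{}
	TLSConfig     struct{}
)

// Upstream HTTPClientConfig (prometheus/common v0.29.0 [1]): 8 fields.
type HTTPClientConfig struct {
	BasicAuth       *BasicAuth     `yaml:"basic_auth,omitempty"`
	Authorization   *Authorization `yaml:"authorization,omitempty"` // not in the operator
	OAuth2          *OAuth2        `yaml:"oauth2,omitempty"`        // not in the operator
	BearerToken     string         `yaml:"bearer_token,omitempty"`
	BearerTokenFile string         `yaml:"bearer_token_file,omitempty"`
	TLSConfig       TLSConfig      `yaml:"tls_config,omitempty"`
	FollowRedirects bool           `yaml:"follow_redirects"`        // not in the operator
	ProxyURL        string         `yaml:"proxy_url,omitempty"`
}

// The operator's struct [2] carries only 5 of these fields, so strict
// YAML decoding rejects the other three as unknown.
type httpClientConfig struct {
	BasicAuth       *BasicAuth `yaml:"basic_auth,omitempty"`
	BearerToken     string     `yaml:"bearer_token,omitempty"`
	BearerTokenFile string     `yaml:"bearer_token_file,omitempty"`
	TLSConfig       TLSConfig  `yaml:"tls_config,omitempty"`
	ProxyURL        string     `yaml:"proxy_url,omitempty"`
}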


Component version: 

Prometheus Operator 4.9.


Actual results:

When a configuration that uses one of the fields missing from the operator's struct is applied to the alertmanager-main secret, an error like the following is received:

prometheus-operator fails to parse this configuration due to an unknown field (follow_redirects):
level=error ts=2022-01-14T08:10:38.69780826Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/main\" failed: provision alertmanager configuration: base config from Secret could not be parsed: yaml: unmarshal errors:\n  line 4: field follow_redirects not found in type alertmanager.httpClientConfig\n  line 116: field follow_redirects not found in type alertmanager.httpClientConfig"
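
For illustration, this rejection can be reproduced with gopkg.in/yaml.v2, whose UnmarshalStrict emits exactly this error format. A minimal sketch; the trimmed struct below is a hypothetical stand-in for the operator's type, not its actual definition:

package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

// A stand-in for the operator's httpClientConfig: it has no
// follow_redirects field, so strict decoding rejects that key.
type httpClientConfig struct {
	BearerTokenFile string `yaml:"bearer_token_file,omitempty"`
	ProxyURL        string `yaml:"proxy_url,omitempty"`
}

func main() {
	var c httpClientConfig
	err := yaml.UnmarshalStrict([]byte("follow_redirects: false\n"), &c)
	fmt.Println(err)
	// yaml: unmarshal errors:
	//   line 1: field follow_redirects not found in type main.httpClientConfig
}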


Expected results:

No error: the operator's struct should accept the same fields as the upstream one.


Additional info:

The reason there are two different versions of the same struct seems to be explained in [3][4][5].

The upstream version used by the operator is 0.29.0 [6].



[1] https://github.com/prometheus/common/blob/v0.29.0/config/http_config.go#L158-L179
[2] https://github.com/openshift/prometheus-operator/blob/release-4.9/pkg/alertmanager/types.go#L178-L184
[3] https://github.com/prometheus/alertmanager/issues/1985
[4] https://github.com/prometheus/alertmanager/pull/1804#issuecomment-482038079
[5] https://github.com/prometheus/alertmanager/blob/4017d1a478e546909a2b4b4173b6aab3e7dd2dbe/config/config.go#L53-L58
[6] https://github.com/openshift/prometheus-operator/blob/release-4.9/go.mod#L25

Comment 2 Philip Gough 2022-01-17 11:51:25 UTC
This issue should be resolved in 4.10 via https://github.com/prometheus-operator/prometheus-operator/pull/4333/

Comment 4 Junqi Zhao 2022-01-18 13:19:04 UTC
Tested with 4.10.0-0.nightly-2022-01-17-182202, following the steps in
https://docs.openshift.com/container-platform/4.9/monitoring/managing-alerts.html#applying-custom-alertmanager-configuration_managing-alerts
1. Output the current Alertmanager configuration to the file alertmanager.yaml:
# oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml
2. Edit alertmanager.yaml to include http_config.follow_redirects:
****************************
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: "false"
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = critical
    target_matchers:
      - severity =~ warning|info
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = warning
    target_matchers:
      - severity = info
receivers:
  - name: Default
  - name: Watchdog
  - name: Critical
  - name: webhook
    webhook_configs:
      - send_resolved: "true"
        http_config:
          follow_redirects: "true"
        url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - matchers:
        - alertname = Watchdog
      receiver: Watchdog
    - matchers:
        - severity = critical
      receiver: Critical
    - receiver: webhook
      match:
        severity: critical
****************************
3. Apply the new configuration; no error is reported:
# oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run=client -o=yaml |  oc -n openshift-monitoring replace secret --filename=-
secret/alertmanager-main replaced

4. The new configuration file is loaded:
# oc -n openshift-monitoring exec -c alertmanager alertmanager-main-0 -- cat /etc/alertmanager/config/alertmanager.yaml
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: "false"
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = critical
    target_matchers:
      - severity =~ warning|info
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = warning
    target_matchers:
      - severity = info
receivers:
  - name: Default
  - name: Watchdog
  - name: Critical
  - name: webhook
    webhook_configs:
      - send_resolved: "true"
        http_config:
          follow_redirects: "true"
        url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - matchers:
        - alertname = Watchdog
      receiver: Watchdog
    - matchers:
        - severity = critical
      receiver: Critical
    - receiver: webhook
      match:
        severity: critical
****************************
The same result is returned by:
# oc -n openshift-monitoring get secret alertmanager-main -o jsonpath="{.data.alertmanager\.yaml}" | base64 -d

However, there are errors in the Alertmanager container:
# oc -n openshift-monitoring logs -c alertmanager alertmanager-main-0 | tail
level=info ts=2022-01-18T13:16:46.567Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=error ts=2022-01-18T13:16:46.567Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n  line 4: cannot unmarshal !!str `false` into bool\n  line 26: cannot unmarshal !!str `true` into bool\n  line 28: cannot unmarshal !!str `true` into bool"
[the same info/error pair repeats every five seconds as Alertmanager keeps retrying the configuration load]
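
The unmarshal failure itself is a plain YAML typing issue, reproducible outside the cluster. A minimal sketch with gopkg.in/yaml.v2, assuming Alertmanager uses yaml.v2-style decoding (which the error format suggests):

package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

type httpConfig struct {
	FollowRedirects bool `yaml:"follow_redirects"`
}

func main() {
	var c httpConfig

	// Quoted value: a YAML string, which cannot decode into a Go bool.
	if err := yaml.Unmarshal([]byte(`follow_redirects: "false"`), &c); err != nil {
		fmt.Println(err) // ... cannot unmarshal !!str `false` into bool
	}

	// Unquoted value: a YAML boolean, which decodes cleanly.
	if err := yaml.Unmarshal([]byte("follow_redirects: false"), &c); err == nil {
		fmt.Println("ok, follow_redirects =", c.FollowRedirects)
	}
}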

Comment 5 Junqi Zhao 2022-01-18 13:27:33 UTC
Continuing from Comment 4:
the change is not visible in ${alertmanager_route}/#/status; it still shows the default configuration.

Comment 6 Philip Gough 2022-01-18 13:28:33 UTC
I think the issue here is that the boolean values are quoted as strings. I checked the following with amtool and it validates:

global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: false
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = critical
    target_matchers:
      - severity =~ warning|info
  - equal:
      - namespace
      - alertname
    source_matchers:
      - severity = warning
    target_matchers:
      - severity = info
receivers:
  - name: Default
  - name: Watchdog
  - name: Critical
  - name: webhook
    webhook_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
        url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - matchers:
        - alertname = Watchdog
      receiver: Watchdog
    - matchers:
        - severity = critical
      receiver: Critical
    - receiver: webhook
      match:
        severity: critical

Comment 7 Junqi Zhao 2022-01-18 13:37:19 UTC
Updated the configuration as in Comment 6: no errors in the Alertmanager container, and the loaded configuration is visible on the ${alertmanager_route}/#/status page.

Comment 10 errata-xmlrpc 2022-03-10 16:39:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056