Bug 1660598

Summary: HostAlreadyClaimed route issue on path based route
Product: OpenShift Container Platform Reporter: Tom Manor <tmanor>
Component: Networking Assignee: Dan Mace <dmace>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: akiyoshi.yonekura, alchan, aos-bugs, cewang, hongli, jack.ottofaro, jfiala, jinjli, jolee, jupierce, knakai, knakayam, openshift-bugs-escalate, scuppett, stwalter, weliang, zhiwang
Version: 3.11.0 Keywords: NeedsTestCase
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1669919 1670072 1698820 1748489 Environment:
Last Closed: 2019-02-20 14:11:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1669919, 1670072, 1698820, 1748489    
Attachments:
Description Flags
reproducer none

Description Tom Manor 2018-12-18 19:00:50 UTC
Created attachment 1515407
reproducer

Description of problem:
A route reports HostAlreadyClaimed after its path is updated. The error occurs on the 3rd update to the path. Restarting the router pod clears the error and allows the route to work.

Version-Release number of selected component (if applicable):
v3.11.43

How reproducible:
Very reproducible; a reproducer script is attached.

Steps to Reproduce:
1. Start with clean cluster

2. Isolate the setup to 1 router

3. Deploy simple project/app with route
    3a. From web console, Create Project (name=phphelloworld, displayname=phpHelloWorld)
    3b. From catalog create PHP app (application name=php-hello, git repo=https://github.com/RedHatTraining/DO081x-lab-php-helloworld)

4. Verify web page opens via browser

5. From master node, patch the route
    5a. oc project phphelloworld
    5b. oc get route
    5c. oc patch route php-hello -p '{"spec":{"path":"/hol"}}'
    5d. oc get route
    5e. oc patch route php-hello -p '{"spec":{"path":""}}'
    5f. oc get route
    5g. oc patch route php-hello -p '{"spec":{"path":"/hol"}}'  #Error occurs here
    5h. oc get route   #See HostAlreadyClaimed here
    5i. curl 'http://php-hello-phphelloworld.apps.domain.com' fails
    5j. oc patch route php-hello -p '{"spec":{"path":""}}'  #Route resets
    5k. oc get route
    5l. curl 'http://php-hello-phphelloworld.apps.domain.com' works

To fully clear the issue, the router pod must be restarted.
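
The patch sequence in steps 5c-5h could be scripted roughly as follows. This is only a sketch and is not the attached reproducer: the route name and path values come from the steps above, while the helper names (patch_payload, patch_route_path, reproduce) are made up for illustration. It assumes `oc project phphelloworld` has already been run.

```shell
# Sketch of steps 5c-5h. Helper names are hypothetical, not from the attachment.
ROUTE="${ROUTE:-php-hello}"

# Build the JSON patch that sets spec.path to the given value.
patch_payload() {
    printf '{"spec":{"path":"%s"}}' "$1"
}

# Patch the route path, then list the route as steps 5d/5f/5h do.
patch_route_path() {
    oc patch route "$ROUTE" -p "$(patch_payload "$1")"
    oc get route "$ROUTE"
}

# Apply the three patches from the report; per the report, the route shows
# HostAlreadyClaimed after the third one.
reproduce() {
    for p in "/hol" "" "/hol"; do
        patch_route_path "$p"
    done
}
```

Calling `reproduce` from the master node runs the three patches in order and lists the route after each one.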


Actual results:
After the 3rd 'oc patch route' command, the route reports HostAlreadyClaimed and stops working


Expected results:
The route should not encounter a HostAlreadyClaimed error

Additional info:

Comment 2 Weibin Liang 2019-01-18 19:35:40 UTC
Following the steps above, this bug can be easily reproduced in v3.11.43.

Comment 3 Hongan Li 2019-01-21 06:55:41 UTC
more testing results below:

1. create route with a path and update the host several times, cannot reproduce
   # oc expose svc service-unsecure --name=testpath --hostname=abc.test.example.com --path=/test
   # oc patch route testpath -p '{"spec":{"host":"abc.testpath.example.com"}}'
   # oc patch route testpath -p '{"spec":{"host":"abc.test.example.com"}}'
   # oc patch route testpath -p '{"spec":{"host":"abc.testpath.example.com"}}'
   # oc patch route testpath -p '{"spec":{"host":"abc.test.example.com"}}'
   .......

2. create route without path and update the host several times, cannot reproduce
   # oc expose svc service-unsecure --name=tt --hostname=abc.tt.example.com
   # oc patch route tt -p '{"spec":{"host":"abc.ttnopath.example.com"}}'
   # oc patch route tt -p '{"spec":{"host":"abc.tt.example.com"}}'
   # oc patch route tt -p '{"spec":{"host":"abc.ttnopath.example.com"}}'
   # oc patch route tt -p '{"spec":{"host":"abc.tt.example.com"}}'
   ......

3. create route with a path and update the path several times, the issue occurred at the 3rd update. 
   # oc expose svc service-unsecure --name=testpath --hostname=abc.test.example.com --path=/test
   # oc patch route testpath -p '{"spec":{"path":"/path"}}'
   # oc patch route testpath -p '{"spec":{"path":"/test"}}'
   # oc patch route testpath -p '{"spec":{"path":"/path"}}'            ### -->>> issue here

Version: v3.11.43
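
To check for the error after each patch without reading the `oc get route` output by eye, a helper along these lines might work. It is a sketch under one assumption: that the rejection reason is exposed in the route status at .status.ingress[*].conditions[*].reason. The function names are hypothetical.

```shell
# Print the condition reasons recorded in the route's status.
# Assumption: the reason lives at .status.ingress[*].conditions[*].reason.
route_reasons() {
    oc get route "$1" -o jsonpath='{.status.ingress[*].conditions[*].reason}'
}

# Return 0 if the named route currently reports HostAlreadyClaimed, else 1.
is_host_claimed() {
    case "$(route_reasons "$1")" in
        *HostAlreadyClaimed*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_host_claimed testpath && echo "issue reproduced"` could be run after each patch in test 3.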

Comment 4 Hongan Li 2019-01-21 07:55:38 UTC
One more test:

Create a route and update the path to a different value each time: no issue.
Update the path back to the beginning value (the one set when the route was created): no issue.
Update the path to any other value that has been used before: the issue occurs.

test steps:
# oc expose svc service-unsecure --name=tt --hostname=abc.tt.example.com         ### no path specified
# oc patch route tt -p '{"spec":{"path":"/one"}}'
# oc patch route tt -p '{"spec":{"path":"/two"}}'
# oc patch route tt -p '{"spec":{"path":"/three"}}'
# oc patch route tt -p '{"spec":{"path":"/four"}}'
# oc patch route tt -p '{"spec":{"path":"/five"}}'
# oc patch route tt -p '{"spec":{"path":""}}'           ### update to beginning value, no issue
# oc patch route tt -p '{"spec":{"path":"/six"}}'
# oc patch route tt -p '{"spec":{"path":"/two"}}'       ### issue occurred, show "HostAlreadyClaimed" when "oc get route"
# oc patch route tt -p '{"spec":{"path":""}}'           ### update to beginning value,  "HostAlreadyClaimed" is gone
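
The pattern observed above can be written down as a small predictor. This sketch encodes only the empirical rule from this comment (a patch triggers the issue when its path was used by an earlier patch and is not the route's beginning value); it says nothing about the router's internal logic, and the function name is hypothetical. It assumes paths do not contain `<` or `>`.

```shell
# Usage: first_bad_patch INITIAL_PATH PATCH_PATH...
# Prints the 1-based index of the first patch expected to show
# HostAlreadyClaimed under the observed rule, or 0 if none is expected to.
first_bad_patch() {
    local initial seen i p
    initial="$1"
    shift
    seen=""   # previously patched paths, each stored as <path>
    i=0
    for p in "$@"; do
        i=$((i + 1))
        if [ "$p" != "$initial" ]; then
            case "$seen" in
                *"<$p>"*) echo "$i"; return 0 ;;
            esac
        fi
        seen="$seen<$p>"
    done
    echo 0
}
```

With the sequence from the test steps, `first_bad_patch "" /one /two /three /four /five "" /six /two` prints 8, matching the patch where the issue occurred.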

Comment 5 Dan Mace 2019-01-21 19:25:30 UTC
Thanks for the reproducers. We'll investigate.

Comment 6 Dan Mace 2019-01-21 23:08:11 UTC
I've made some significant progress on a root cause diagnosis and will provide another update tomorrow (Jan. 22).

Comment 7 Stephen Cuppett 2019-01-22 14:45:25 UTC
*** Bug 1663423 has been marked as a duplicate of this bug. ***

Comment 8 Dan Mace 2019-01-22 22:06:39 UTC
Just a brief update:

We've made more progress today understanding the root cause of the problem. The bug is essentially due to an internal data consistency issue in the route controller process. Unfortunately, we don't yet have the level of clarity we need to:

1. Comprehensively describe the exact failure modes
2. Provide a reliable fix

We'll need more time to follow our current lines of investigation to their conclusions. Progress will pick up again tomorrow (EST).

Comment 10 Dan Mace 2019-01-23 16:42:36 UTC
Proposed fix: https://github.com/openshift/router/pull/9

Comment 18 Hongan Li 2019-02-11 05:40:53 UTC
Verified with atomic-openshift-3.11.82-1.git.0.08bc31b.el7; the issue has been fixed.

Comment 21 errata-xmlrpc 2019-02-20 14:11:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0326

Comment 22 Steven Walter 2019-09-26 21:39:10 UTC
*** Bug 1723527 has been marked as a duplicate of this bug. ***