Smarter routing in Spring Cloud Gateway

Another day, another Spring Cloud Gateway story. I've previously written about customizing its underlying Netty server and how the framework handles downstream HTTP requests. This time around I'll be taking a look at the heart of Spring Cloud Gateway: its routing system.

Use-case

At work I've been running a Spring Cloud Gateway app in production for over a year and a half. By now we are exposing most of our 500+ API endpoints through it. With so many endpoints we encountered some tricky situations. One of them is overlapping path definitions. Here's an example:

Imagine a Foo product implemented by a foo-service application. Since all foo-related endpoints are handled by the same app, all it needs is a single Cloud Gateway route:

builder.routes()
  .route(r -> r.path("/foo/**").uri(fooServiceUri))
  .build();

Spring Cloud Gateway uses Ant-style matching here, so ** matches zero or more path segments.

As time goes by, a new version of the product is designed, and a new-foo-service app is built to implement it. The new app exposes /foo/2/** endpoints. Finally, to enable independent scaling, the /foo/2/bars endpoint is extracted into a separate bars-service app.

Downstream app     Path pattern
foo-service        /foo/**
new-foo-service    /foo/2/**
bars-service       /foo/2/bars

In this case a request to GET /foo/2/bars matches all 3 routes.
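To make the overlap concrete, here is a small self-contained sketch. The matches helper is a deliberately simplified stand-in for the gateway's real path matching (it only understands literal paths and a trailing /**), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OverlapDemo {

    // Simplified stand-in for the gateway's path matching (an assumption, not the real matcher):
    // supports literal patterns and patterns ending in "/**" only.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/**")) {
            String prefix = pattern.substring(0, pattern.length() - "/**".length());
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        return pattern.equals(path);
    }

    // Returns the downstream apps whose pattern matches the given request path.
    static List<String> matchingApps(String path) {
        Map<String, String> patterns = new LinkedHashMap<>();
        patterns.put("foo-service", "/foo/**");
        patterns.put("new-foo-service", "/foo/2/**");
        patterns.put("bars-service", "/foo/2/bars");

        List<String> result = new ArrayList<>();
        patterns.forEach((app, pattern) -> {
            if (matches(pattern, path)) {
                result.add(app);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchingApps("/foo/2/bars"));  // all three apps
        System.out.println(matchingApps("/foo/1/items")); // only foo-service
    }
}
```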

The problem

This by itself is not a problem, as Cloud Gateway has a mechanism for handling such situations. The Route class implements the Ordered interface, and the gateway picks the matching route with the highest precedence to handle an incoming request. This seems flexible and is easy to configure using the route builder. However, it does not solve my problem.

In my production application I don't know all the routes in advance. They are discovered and updated at runtime, one downstream app at a time. This means that my gateway loads routes in batches and only updates the ones that have actually changed. This is a problem, because the order of each individual route needs to be defined at load time, yet it depends on all the other routes.

Imagine route loading happens like this:

  1. foo-service routes load first, and /foo/** route is given order 0.
  2. bars-service loads next. Since its path /foo/2/bars is more exact, its route is given order -1, which has higher precedence.
  3. new-foo-service loads last. At this point there's no order that could be given to its /foo/2/** route that would correctly position it between the two previously loaded routes.
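The dead end in step 3 can be shown with plain integers. This is a minimal sketch, assuming that a lower order value means higher precedence (as with Spring's Ordered) and that two routes with equal order have no guaranteed relative position. The orderBetween helper is hypothetical.

```java
import java.util.OptionalInt;

final class OrderGapDemo {

    // Looks for an order strictly between two already assigned orders.
    // Lower value = higher precedence, as with Spring's Ordered interface.
    static OptionalInt orderBetween(int higherPrecedence, int lowerPrecedence) {
        // long arithmetic avoids overflow for extreme order values
        long candidate = (long) higherPrecedence + 1;
        return candidate < lowerPrecedence
                ? OptionalInt.of((int) candidate)
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // bars-service got -1 and foo-service got 0: nothing fits in between
        System.out.println(orderBetween(-1, 0).isPresent());
        // spacing orders out leaves room, but only until the gap is used up
        System.out.println(orderBetween(-100, 0).isPresent());
    }
}
```

Spacing the orders out (0, -100, and so on) only postpones the moment the gap runs out, and it still requires knowing how every new route relates to all previously loaded ones.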

In this case a “solution” might be to load downstream apps in a particular order. However, it's easy to construct a counterexample in which an app has both a more exact route in one hierarchy and a more general one in another. This makes the app ordering solution insufficient.

How Cloud Gateway works

Cloud Gateway itself is built on top of Spring WebFlux. As such, there are two key classes that implement the handling of all incoming HTTP requests.

Of the two, RoutePredicateHandlerMapping is the one of interest here, more specifically its lookupRoute method. Here's what the method looks like with logging, error and empty state handling removed:

protected Mono<Route> lookupRoute(ServerWebExchange exchange) {
    // start with an ordered list of all registered routes:
    return this.routeLocator.getRoutes()
        .concatMap(route -> Mono.just(route)
                // check if a route matches the request:
                .filterWhen(r -> r.getPredicate().apply(exchange))
        )
        // take the first route that matches:
        .next()
        .map(route -> {
            // by default all routes are valid
            validateRoute(route, exchange);
            return route;
        });
}

The trick here is in the .next() operator. Because of it Cloud Gateway always picks the first route that matches the request. Also of note is that the lookupRoute method is protected!
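The effect of taking the first match can be reproduced without the gateway. The sketch below is an illustration only: it uses java.util.stream in place of Reactor, a hypothetical SimpleRoute record in place of Route, and a simplified matcher that handles literal paths and a trailing /**. The orders follow the loading scenario above, with new-foo-service arbitrarily given order 0.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class FirstMatchDemo {

    // Hypothetical, stripped-down route: an id, a path pattern and an order.
    record SimpleRoute(String id, String pattern, int order) {}

    // Simplified stand-in for the gateway's path matching: literal patterns and trailing "/**" only.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/**")) {
            String prefix = pattern.substring(0, pattern.length() - "/**".length());
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        return pattern.equals(path);
    }

    // Routes in the order they were loaded in the scenario above.
    static List<SimpleRoute> loadedRoutes() {
        return List.of(
                new SimpleRoute("foo-service", "/foo/**", 0),
                new SimpleRoute("bars-service", "/foo/2/bars", -1),
                new SimpleRoute("new-foo-service", "/foo/2/**", 0));
    }

    // Mirrors the shape of lookupRoute: routes sorted by order, first match wins.
    static Optional<String> lookup(List<SimpleRoute> routes, String path) {
        return routes.stream()
                .sorted(Comparator.comparingInt(SimpleRoute::order)) // stable: ties keep load order
                .filter(route -> matches(route.pattern(), path))
                .findFirst()
                .map(SimpleRoute::id);
    }

    public static void main(String[] args) {
        System.out.println(lookup(loadedRoutes(), "/foo/2/bars"));  // bars-service, as intended
        System.out.println(lookup(loadedRoutes(), "/foo/2/items")); // foo-service, not new-foo-service
    }
}
```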

Solution

This allowed me to implement my own handler mapping class by extending RoutePredicateHandlerMapping and overriding the lookupRoute method. The change was simple: I sorted the matching routes so that “more exact” ones come first. The rest of the implementation is the same:

protected Mono<Route> lookupRoute(ServerWebExchange exchange) {
    return this.routeLocator.getRoutes()
        .concatMap(/* filter routes */)
        // sort routes so that more exact ones come first:
        .sort(pathBasedRouteComparator)
        .next() // pick the first route, like before
        .map(/* validate picked route */);
}

What's left to determine is what precisely a “more exact” route means. For routes R1 with path /foo/** and R2 with path /foo/2/**, I defined R2 to be more exact because the pattern /foo/** matches the literal path /foo/2/**, while the opposite is not true. This is how I implemented it in a comparator:

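import java.util.Comparator;
import org.springframework.cloud.gateway.route.Route;
import org.springframework.stereotype.Component;

@Component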
public class PathBasedRouteComparator implements Comparator<Route> {
    @Override
    public int compare(Route r1, Route r2) {
        var info1 = (PathInfo) r1.getMetadata().get(PathInfo.KEY);
        var info2 = (PathInfo) r2.getMetadata().get(PathInfo.KEY);
        if (info1 == null || info2 == null) {
            // fall back to native route ordering:
            return compareOrders(r1, r2);
        }
        var r1MatchesR2 = info1.getPattern().matches(info2.getPath());
        var r2MatchesR1 = info2.getPattern().matches(info1.getPath());
        if (r1MatchesR2 && !r2MatchesR1) {
            return 1;
        }
        if (r2MatchesR1 && !r1MatchesR2) {
            return -1;
        }
        // fall back to native route ordering:
        return compareOrders(r1, r2);
    }
    private int compareOrders(Route r1, Route r2) {
        return Integer.compare(r1.getOrder(), r2.getOrder());
    }
}

For this comparator to work, routes need to have a PathInfo added to their metadata. This can be done at route loading time because, unlike the order, path info depends only on the individual route that's currently loading.
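To see the comparator and the metadata working together outside of Spring, here is a self-contained sketch. Everything in it is a stand-in: this PathInfo is a hypothetical record holding only the path string, SimpleRoute replaces Route, and the matcher only handles literal paths and a trailing /**. The comparison logic itself follows the comparator above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class PathInfoDemo {

    // Simplified stand-in for the gateway's path matching: literal patterns and trailing "/**" only.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/**")) {
            String prefix = pattern.substring(0, pattern.length() - "/**".length());
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        return pattern.equals(path);
    }

    // Hypothetical stand-in for the PathInfo stored in route metadata.
    record PathInfo(String path) {
        static final String KEY = "pathInfo";

        // True if this route's pattern matches the other route's path taken literally.
        boolean matchesPathOf(PathInfo other) {
            return PathInfoDemo.matches(path, other.path);
        }
    }

    // Hypothetical stand-in for Route: id, order and a metadata map.
    record SimpleRoute(String id, int order, Map<String, Object> metadata) {}

    // Path info depends only on the route being loaded, so it can be attached at load time.
    static SimpleRoute route(String id, String path) {
        return new SimpleRoute(id, 0, Map.of(PathInfo.KEY, new PathInfo(path)));
    }

    static final Comparator<SimpleRoute> MORE_EXACT_FIRST = (r1, r2) -> {
        var info1 = (PathInfo) r1.metadata().get(PathInfo.KEY);
        var info2 = (PathInfo) r2.metadata().get(PathInfo.KEY);
        if (info1 != null && info2 != null) {
            boolean r1MatchesR2 = info1.matchesPathOf(info2);
            boolean r2MatchesR1 = info2.matchesPathOf(info1);
            if (r1MatchesR2 && !r2MatchesR1) return 1;  // r2 is more exact, so it goes first
            if (r2MatchesR1 && !r1MatchesR2) return -1; // r1 is more exact, so it goes first
        }
        // fall back to native route ordering
        return Integer.compare(r1.order(), r2.order());
    };

    static List<String> sortedIds(List<SimpleRoute> routes) {
        var copy = new ArrayList<>(routes);
        copy.sort(MORE_EXACT_FIRST);
        return copy.stream().map(SimpleRoute::id).toList();
    }

    public static void main(String[] args) {
        // same load order as in the scenario above, all routes with the default order
        System.out.println(sortedIds(List.of(
                route("foo-service", "/foo/**"),
                route("bars-service", "/foo/2/bars"),
                route("new-foo-service", "/foo/2/**"))));
    }
}
```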

Alternative solution

This customization allowed the Cloud Gateway application to handle any convoluted path overriding that API developers could come up with. It did, however, come with a runtime performance cost. With the new routing logic, all loaded routes are matched against each incoming request. As expected, the lower latency percentiles were impacted by this. They represent situations in which a request matched one of the high-precedence routes, so a lot of matching was skipped in the original implementation.
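The difference in work per request can be illustrated by counting predicate evaluations. This is a rough sketch under simplifying assumptions: patterns are plain strings already listed in precedence order, the matcher is the same simplified stand-in (literal paths and a trailing /**), and the method names are made up. It says nothing about actual latency, only about how many routes get checked.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class MatchCostDemo {

    // Simplified stand-in for the gateway's path matching: literal patterns and trailing "/**" only.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/**")) {
            String prefix = pattern.substring(0, pattern.length() - "/**".length());
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        return pattern.equals(path);
    }

    // Original behaviour: stop as soon as one pattern matches.
    static int evaluationsFirstMatch(List<String> patterns, String path) {
        AtomicInteger evaluations = new AtomicInteger();
        patterns.stream()
                .filter(pattern -> {
                    evaluations.incrementAndGet();
                    return matches(pattern, path);
                })
                .findFirst(); // lazy: later patterns are never evaluated
        return evaluations.get();
    }

    // Customized behaviour: all matches must be collected before they can be sorted.
    static int evaluationsMatchAll(List<String> patterns, String path) {
        AtomicInteger evaluations = new AtomicInteger();
        patterns.stream()
                .filter(pattern -> {
                    evaluations.incrementAndGet();
                    return matches(pattern, path);
                })
                .toList(); // sorting and picking the first element would follow here
        return evaluations.get();
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("/foo/2/bars", "/foo/2/**", "/foo/**", "/baz/**", "/qux/**");
        System.out.println(evaluationsFirstMatch(patterns, "/foo/2/bars"));
        System.out.println(evaluationsMatchAll(patterns, "/foo/2/bars"));
    }
}
```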

For use-cases where routes are either known in advance or are updated all at once, I'd recommend using the original, order-based routing logic. In this case route order can be determined upfront at route loading time. The comparator logic can be reused to calculate the route order. This solution would have no impact on the runtime performance.
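Deriving orders upfront could look roughly like the sketch below. It is one possible approach under stated assumptions, not a drop-in implementation: patterns are plain strings, the matcher only handles literal paths and a trailing /**, and rather than sorting with the comparator it applies the same “more exact” test pairwise, counting for each route how many strictly more general routes exist and using the negated count as its order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UpfrontOrderDemo {

    // Simplified stand-in for the gateway's path matching: literal patterns and trailing "/**" only.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/**")) {
            String prefix = pattern.substring(0, pattern.length() - "/**".length());
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        return pattern.equals(path);
    }

    // Same test as in the comparator: "general" matches "exact" literally, but not the other way round.
    static boolean isMoreGeneral(String general, String exact) {
        return matches(general, exact) && !matches(exact, general);
    }

    // Assigns each pattern an order; lower value = higher precedence.
    // A route with more routes above it in the hierarchy gets a lower (earlier) order.
    static Map<String, Integer> assignOrders(List<String> patterns) {
        Map<String, Integer> orders = new LinkedHashMap<>();
        for (String candidate : patterns) {
            int moreGeneralCount = 0;
            for (String other : patterns) {
                if (isMoreGeneral(other, candidate)) {
                    moreGeneralCount++;
                }
            }
            orders.put(candidate, -moreGeneralCount);
        }
        return orders;
    }

    public static void main(String[] args) {
        System.out.println(assignOrders(List.of("/foo/**", "/bar/**", "/foo/2/bars", "/foo/2/**")));
    }
}
```

Counting is used here instead of a plain sort because unrelated patterns (say /foo/** and /bar/**) are not comparable by exactness at all, and the count still gives every route a well-defined order.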

Example

I've created a demo project on GitHub that illustrates the solution using the custom handler mapping. Feel free to check out the project and play around with it. If you find a better solution, please let me know about it!