Open Source GraphQL CDN / Edge Cache with Cloudflare, Fastly, and Fly.io

This blog post was written by Jens Neuse, Founder and CEO, and Dustin Deus, Co-Founder and Tech Lead at WunderGraph. WunderGraph is now hiring 🚀

We've recently announced that WunderGraph is now fully open source. Today, we'd like to explain how you can leverage our API Developer Framework to add Edge Caching to your GraphQL APIs without locking yourself into a specific vendor.

Caching GraphQL on the Edge should be vendor-agnostic.

Services like Akamai and GraphCDN / Stellate offer proprietary solutions to this problem. We'll compare the different approaches and their tradeoffs.

Why did we create proprietary GraphQL CDN solutions?

A good question to start with is why we created proprietary GraphQL CDN solutions in the first place.

Most GraphQL implementations ignore how the web works

The problem with most GraphQL implementations is that they don't really use the "platform" they are operating on. By platform, I mean the web, or more specifically HTTP and the REST constraints.

The web has a lot to offer, if you're using it in the right way. If read requests (Queries) were using the GET verb, combined with Cache-Control Headers and ETag, Browsers, CDNs, Cache Servers like Varnish, Nginx and many other tools could handle Caching out of the box. You wouldn't really need a service that understands GraphQL.

However, the reality is that most GraphQL APIs make you send Queries over HTTP POST. Cache-Control Headers and ETags don't make sense in this case, as all participants on the web think we're trying to "manipulate" something.

So, we've created Edge Caching solutions that rewrite HTTP POST to GET and are capable of invalidating in very smart ways. By analyzing Queries and Mutations they're able to build up a cache and invalidate Objects as mutations flow through the system.

Limitations and Problems of GraphQL CDNs / Edge Caches

A GraphQL CDN creates a secondary source of truth

As discussed in the Layered System constraint by Fielding, intermediaries can be used to implement a shared cache to improve the latency of requests.

The problem I see with "smart" GraphQL Edge Caches is that they look into the GraphQL Operations to figure out what to cache and what to invalidate. Some GraphQL CDN implementations even allow you to use their API to invalidate objects.

What sounds very smart creates a huge problem: you're building a second source of truth. Before using the GraphQL CDN, all you had to do was implement your resolvers. Now, you've got to think about how to invalidate the CDN.

Even worse, you're now programming against a single vendor and their implementation, coupling your application to a single service provider. You can't just switch easily from one GraphQL CDN provider to another. Contrary to REST/HTTP APIs, there's no standard on how to Cache GraphQL APIs, hence every implementation will be different.

This secondary source of truth also breaks down as soon as traffic bypasses the cache. Imagine we're putting a GraphQL CDN in front of the GraphQL API of GitHub to cache the Issues for a repository. If we're using "smart" Cache invalidation based on Mutations, we're not able to update the cache when a user bypasses our CDN and uses the GitHub API directly.

A GraphQL CDN really only works if 100% of the traffic flows through the system.

Your GraphQL Edge Cache won't work on localhost or in your CI/CD

Testing takes up a huge part of software development, and we definitely want to have a great Developer Experience when building our APIs. Using a proprietary, cloud-only, Caching solution means that we're not able to test it locally or from within our Continuous Integration systems.

You'll be developing on your local machine without the CDN, only to find out that something behaves weirdly when you're using the CDN in production.

If we were using standardized caching directives, we'd be able to use any Cache Server, like Nginx or Varnish within our CI/CD pipeline to test our APIs.

Proprietary GraphQL CDNs impose a vendor lock-in problem

As mentioned earlier, there's no specification for how GraphQL Caching should work. Every implementation is different, so you're always tied to a specific vendor. They might go bankrupt, they might get acquired, they might cancel their service, they might change their pricing model.

In any case, it's a risk that needs to be managed.

A GraphQL CDN can introduce cross-origin requests

Depending on the setup, adding a GraphQL CDN to your architecture could mean that your browser-based applications have to make cross-origin requests. That is, when your application is running on example.com and your GraphQL CDN runs on example.cdn.com, the browser will usually send an extra Preflight request first, because a GraphQL request with a JSON body doesn't qualify as a "simple" cross-origin request.

It's possible to cache the Preflight request, but this still means that we'd have to do at least one extra request on the initial page load.
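Caching the Preflight is done with the Access-Control-Max-Age response header. Here's a minimal sketch of the headers a server could return for the OPTIONS request. The helper name, the allowed origin, and the max-age value are assumptions for illustration, and browsers apply their own upper limit to the value.

```typescript
// Hypothetical helper: builds the response headers for a CORS preflight
// (OPTIONS) request. Origin, methods, and max-age are illustrative values.
function preflightHeaders(allowedOrigin: string, maxAgeSeconds: number): Record<string, string> {
  return {
    "Access-Control-Allow-Origin": allowedOrigin,
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
    // Lets the browser reuse the preflight result instead of repeating it
    // before every request. Browsers cap this value themselves.
    "Access-Control-Max-Age": String(maxAgeSeconds),
  };
}

const preflight = preflightHeaders("https://example.com", 7200);
console.log(preflight["Access-Control-Max-Age"]);
```

Even with this header in place, the very first cross-origin request of a session still pays for the Preflight.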

A GraphQL CDN might not work well for authenticated Requests with Server Side Rendering (SSR)

Let's say your application requires your users to be logged in, but you still want to be able to apply caching.

Aside from that, you'd also like to implement Server-Side Rendering (SSR). In the ideal scenario, you'd have your users log into your authentication server, which sets a cookie on your apex domain, so that you're logged in on all subdomains. If your CDN is running on a different domain, it's going to be impossible to server-side render a page as the browser will not send the user's cookie to the CDN domain.

Luckily, some providers offer custom domains, so cookie-based authentication might still work for you.
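To make the cookie problem concrete, here's a small sketch. The first function builds a session cookie scoped to the apex domain, the second is a simplified version of the domain matching a browser performs before attaching a cookie. The function names, cookie name, and attribute choices are assumptions, not taken from any specific provider.

```typescript
// Builds a Set-Cookie header value scoped to an apex domain so that all
// subdomains receive the cookie. Attribute choices are illustrative.
function sessionCookie(name: string, value: string, apexDomain: string): string {
  return `${name}=${encodeURIComponent(value)}; Domain=${apexDomain}; Path=/; HttpOnly; Secure; SameSite=Lax`;
}

// Simplified cookie domain matching: the request host must be the cookie
// domain itself or one of its subdomains.
function cookieIsSentTo(cookieDomain: string, requestHost: string): boolean {
  return requestHost === cookieDomain || requestHost.endsWith("." + cookieDomain);
}

console.log(sessionCookie("session", "abc123", "example.com"));
console.log(cookieIsSentTo("example.com", "api.example.com")); // subdomain: cookie is sent
console.log(cookieIsSentTo("example.com", "example.cdn.com")); // CDN domain: cookie is not sent
```

With a custom domain such as a subdomain of example.com pointing at the CDN, the second check passes and cookie-based authentication keeps working.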

Invalidating Deeply Nested GraphQL Operations might not work

Here's an example of a GraphQL Operation that invalidates easily.

mutation UpdateProfile {
    updateProfile(update: {id: 1, name: "Jannik"}) {
        id
        name
        friendsCount
    }
}

If you've previously Queried the user with id 1, you can invalidate all records with that id and invalidate the following Query:

query {
    profile(id: 1){
        id
        name
        friendsCount
    }
}

Let's make it a bit more complex and query for all your friends:

query {
    me {
        friends {
            id
            name
        }
    }
}

Now, you made some good friends at the last conference. Let's add them:

mutation AddFriend {
    addFriends(where: {user:{hasConnection: {conferences: {id: {eq: 7}}}}}){
        id
        name
    }
}

We've now added as friends all users who attended the conference with id 7. When we query for all your friends again, we don't get back the new friends, because the cache can't know that the addFriends mutation has invalidated the cached response of the friends query.

At this point, you've got to start adding response tagging, surrogate-keys, or analysing the GraphQL return types to make your invalidation strategy "smarter".
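Here's a rough sketch of why simple tagging falls short for the example above. It models a naive tag-based cache, roughly what a "smart" cache might keep internally. The class, the cache keys, and the user ids 2, 3, 8 and 9 are made up for illustration.

```typescript
// A deliberately naive tag-based response cache. All names are hypothetical.
class TaggedCache {
  private entries = new Map<string, { body: string; tags: Set<string> }>();

  set(key: string, body: string, tags: string[]): void {
    this.entries.set(key, { body, tags: new Set(tags) });
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  // Drops every cached response that was tagged with the given entity.
  invalidateTag(tag: string): void {
    this.entries.forEach((entry, key) => {
      if (entry.tags.has(tag)) this.entries.delete(key);
    });
  }
}

const responseCache = new TaggedCache();
// Responses are tagged with the entities that appeared in them.
responseCache.set("profile(id:1)", '{"id":1,"name":"Old Name"}', ["User:1"]);
responseCache.set("me.friends", '[{"id":2},{"id":3}]', ["User:2", "User:3"]);

// UpdateProfile returns the user with id 1, so that tag gets invalidated.
responseCache.invalidateTag("User:1");
console.log(responseCache.has("profile(id:1)")); // false: correctly evicted

// AddFriend returns the newly added friends (say ids 8 and 9). None of their
// tags are on the cached friends list, so the outdated list survives.
["User:8", "User:9"].forEach((tag) => responseCache.invalidateTag(tag));
console.log(responseCache.has("me.friends")); // true: still cached, now stale
```

Fixing this means teaching the cache that addFriends affects the friends field of the current user, which is exactly the kind of backend knowledge the cache doesn't have on its own.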

We'll pick up this topic again after the benefits.

Benefits of using a GraphQL CDN / Edge Cache

When there's a cost, there are also benefits!

Using a GraphQL CDN means you don't have to change much of your application. In an ideal scenario, you simply change the URL of the GraphQL Server and everything should work.

You also don't have to deploy and configure any tooling. You're buying a ready-to-use service that just works.

Despite the problems discussed earlier, a GraphQL CDN can probably improve the performance of many applications.

When not to use a GraphQL CDN

As discussed in the section on Cache Invalidation, building a smart CDN service that doesn't know anything about your GraphQL Backend is actually extremely hard. The problem lies in the fact that the Backend is the source of truth, but doesn't share all this information with the Cache.

In fact, if you're running into such issues where invalidation becomes hard, you might want to apply caching at a different level, inside the resolvers, or even one level below, at the entity level, using the DataLoader pattern.

Additionally, if your data is expected to change frequently, it might not make much sense for you to cache at the Edge.

Another big, underrated flaw is that the invalidation of a distributed cache is eventually consistent. This means you serve stale content for a short period of time after a mutation. If you don't account for that in your architecture, it has the potential to break the business logic of clients.

Client (US) -> Query all posts to fill the cache
Client (US) -> Run mutation to update the post with ID:1
System fires a web-hook to a third-party service in FRA
Client (FRA) -> Query all posts -> stale content
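The sequence above can be sketched as a toy model with two edge locations in front of one origin. Purges reach the remote location asynchronously, which is modelled here as an explicit queue. All names are hypothetical, and the sketch assumes the FRA location had already cached the post before the mutation.

```typescript
// Toy model of an edge location that caches reads from a shared backend.
class EdgeLocation {
  private cached = new Map<string, string>();
  private backend: Map<string, string>;

  constructor(backend: Map<string, string>) {
    this.backend = backend;
  }

  // Serves from the local cache and only goes to the backend on a miss.
  read(key: string): string | undefined {
    if (!this.cached.has(key)) {
      const fresh = this.backend.get(key);
      if (fresh !== undefined) this.cached.set(key, fresh);
    }
    return this.cached.get(key);
  }

  purge(key: string): void {
    this.cached.delete(key);
  }
}

const backend = new Map<string, string>([["post:1", "old title"]]);
const usEdge = new EdgeLocation(backend);
const fraEdge = new EdgeLocation(backend);
const pendingPurges: Array<() => void> = [];

usEdge.read("post:1");  // Client (US) fills the US cache
fraEdge.read("post:1"); // assumption: FRA has cached the post as well

// Client (US) runs the mutation: the backend changes, the local edge is
// purged right away, and the purge for FRA is still in flight.
backend.set("post:1", "new title");
usEdge.purge("post:1");
pendingPurges.push(() => fraEdge.purge("post:1"));

const seenInFra = fraEdge.read("post:1"); // the web-hook consumer reads too early
pendingPurges.forEach((run) => run());    // the purge finally arrives in FRA
const seenLater = fraEdge.read("post:1");

console.log(seenInFra); // "old title" (stale)
console.log(seenLater); // "new title"
```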

This is not GraphQL-specific, but with HTTP Caching the semantics are better understood. In summary, don't use a GraphQL CDN if you need read-after-write consistency.

If your requirement is to support read-after-write consistency, there are multiple solutions to the problem.

One is to not cache at the Edge, but rather at the application layer. In this case, you'd trade latency for consistency, which can be a good trade.

Another way of solving the problem is by distributing the state across the edge and sharding the data based on location. This model of sharding is not always possible, but if shared state is only used across groups of users in the same location, this solution could work very well.

One example of this is Cloudflare Workers + Durable Objects, which gives you a simple Key-Value store that is persisted in a specific location, meaning that all users close to that one location can have consistent state at low latency.

When a GraphQL Edge Cache makes the most sense

If you've got a single GraphQL API, and this API is the only API your frontend is talking to, you fully own this API and no traffic is bypassing your Cache, then such a CDN might actually make sense.

Otherwise, I doubt you'll get the results you're expecting, especially not with the extra costs discussed earlier.

WunderGraph - An Alternative Approach to GraphQL Edge Caching without vendor lock-in

We've discussed the pros and cons of GraphQL CDNs. Now I'd like to propose another approach that keeps you vendor-independent.

When it comes to solving problems, sometimes it's smart to be dumb. I think a Cache can be a lot smarter when it's dumb and playing by the rules of the web, and that's exactly what we're doing with WunderGraph.

Our solution is Open Source and can be deployed everywhere.

How does it work?

I've been working with GraphQL for many years now, and among all the deployed applications that use GraphQL APIs, I've never seen one change its GraphQL Operations at runtime.

What applications do change at runtime is the Variables, but the Operations usually stay static.

So, we've created a GraphQL to JSON-RPC compiler, which treats GraphQL Operations like "Prepared Statements", a term you've probably heard from using a database.

When first using a SQL statement, you send it to the Database and get back a handle to "execute" it later. Subsequent requests can now execute the Statement just by sending the handle and the variables. This makes the execution a lot faster.

Because we're not changing GraphQL Operations at runtime, we can actually do this "compilation" step during development.

At the same time, we're replacing HTTP POST for Queries with HTTP GET, while sending the Variables as a Query Parameter.
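As a rough sketch of the idea, a compiled Operation can be addressed by its name, with the Variables serialized into a Query Parameter. The URL layout (/operations/<name>) and the parameter name (variables) are assumptions for illustration and not necessarily the exact scheme WunderGraph generates. The keys are sorted before serializing so that the same Variables always produce the same URL, which is what a cache uses as its key.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes JSON with object keys sorted, so that logically equal variables
// always produce the same string.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(value[k] as Json)).join(",") + "}";
}

// Hypothetical URL layout: one endpoint per compiled Operation, with the
// Variables as a single Query Parameter.
function operationUrl(baseUrl: string, operationName: string, variables: Json): string {
  return `${baseUrl}/operations/${operationName}?variables=${encodeURIComponent(stableStringify(variables))}`;
}

const profileUrl = operationUrl("https://api.example.com", "Profile", { id: 1 });
console.log(profileUrl);

// Different key order, same URL, therefore the same cache entry.
const urlA = operationUrl("https://api.example.com", "Search", { first: 10, term: "graphql" });
const urlB = operationUrl("https://api.example.com", "Search", { term: "graphql", first: 10 });
console.log(urlA === urlB); // true
```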

By sending GraphQL Queries via JSON-RPC with HTTP GET, we're automatically able to use Cache-Control Headers and ETags.

And that's the whole magic of WunderGraph's Caching Story.

We don't build a "smart" Cache. We don't cache Objects, and we don't have to build complex invalidation logic. We simply cache the response of unique URLs.

For each Operation, you can define if it should be Cached and for how long. If a response might change frequently, you can also set the Cache time to 0, but configure "stale-while-revalidate" to a positive number. The client will then automatically revalidate with the origin by sending the ETag along, and the server will respond with either a refreshed response or 304 Not Modified.
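Here's a minimal sketch of that flow: a per-Operation cache config is turned into a Cache-Control header, and a conditional GET is answered with either 200 or 304. The type names, the "public" directive, and the hash used for the ETag are assumptions for illustration. A real server would likely use a cryptographic hash and handle weak validators and lists in If-None-Match.

```typescript
interface OperationCacheConfig {
  maxAgeSeconds: number;
  staleWhileRevalidateSeconds: number;
}

// Builds the Cache-Control header value from the per-Operation config.
function cacheControlHeader(config: OperationCacheConfig): string {
  return `public, max-age=${config.maxAgeSeconds}, stale-while-revalidate=${config.staleWhileRevalidateSeconds}`;
}

// FNV-1a is used only to keep the sketch dependency-free.
function etagFor(body: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < body.length; i++) {
    hash ^= body.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return `"${hash.toString(16)}"`;
}

interface SketchResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body?: string;
}

// Answers a GET, honouring the If-None-Match header of a revalidating client.
function respond(body: string, config: OperationCacheConfig, ifNoneMatch?: string): SketchResponse {
  const etag = etagFor(body);
  const headers: Record<string, string> = { "Cache-Control": cacheControlHeader(config), ETag: etag };
  if (ifNoneMatch === etag) return { status: 304, headers };
  return { status: 200, headers, body };
}

const cacheConfig: OperationCacheConfig = { maxAgeSeconds: 0, staleWhileRevalidateSeconds: 60 };
const first = respond('{"friendsCount":1}', cacheConfig);
const unchanged = respond('{"friendsCount":1}', cacheConfig, first.headers["ETag"]);
const changed = respond('{"friendsCount":2}', cacheConfig, first.headers["ETag"]);
console.log(first.status, unchanged.status, changed.status); // 200 304 200
```

The 304 carries no body, so revalidating an unchanged response costs a round trip but almost no bandwidth.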

It's a very dumb approach. If the cached value is expired or stale, we're asking the origin if they have an update. Depending on the configuration, that might create a few more requests to the origin, but we're not creating a second source of truth, and we don't have to deal with the complexity of managing cache invalidation tags for nested objects.

It's similar to using Change Data Capture (CDC) as a source for Subscriptions vs. simple Polling. CDC can get extremely complex to get right, while simple server-side Polling might work just fine most of the time. It's simple and robust.

There's really not much we've invented here. All of this is standard Caching behaviour, and any service like Cloudflare, Fastly, Varnish or Nginx supports it out of the box.

There's a standard for how all the participants of the web treat Cache-Control and ETag headers. We've simply implemented this standard.

By removing GraphQL from the equation, we've made it compatible with the web.

If you build tools for the web, you should respect how the web is built, otherwise you're creating more problems than you're solving.

Additional Benefits of the WunderGraph GraphQL Caching Approach

It's not just CDNs and Servers that understand the Cache-Control and ETag Headers. Browsers also automatically cache and invalidate your responses, without you adding a single line of code.

Additionally, because we've removed GraphQL from the runtime, we've automatically reduced the attack surface of our application. If our frontend doesn't change GraphQL Queries at runtime, why expose an API that allows you to do so?

Limitations of the JSON-RPC Caching Approach

One of the limitations of this approach is that we're no longer able to use regular GraphQL Clients, as they'd expect us to send GraphQL Operations over HTTP POST, and Subscriptions over WebSockets.

That's not an issue for new projects, but might be a blocker for existing applications. Btw. we've got an internal RFC to add a compatibility mode, allowing e.g. Apollo client / urql, etc. to work with WunderGraph through an Adapter. If you're interested, please let us know.

That said, using plain JSON-RPC would be very inconvenient. That's why we're not just compiling GraphQL to JSON-RPC, but also generating fully type-safe clients.

One such client-side integration is the Next.js package, making it super easy to use WunderGraph with Next.js, including Server-Side Rendering, Authentication, File Uploads, etc.

Another limitation, as of now, is that you have to self-host WunderGraph. We're working on a hosted Serverless solution, but as of now, you'd have to deploy and run it yourself.

While it might not be super convenient, this also comes with an advantage: WunderGraph is Apache 2.0 licensed, so you can run it anywhere.

How can you deploy WunderGraph?

Now that we've discussed the pros and cons of the WunderGraph approach to caching, let's see how we can deploy WunderGraph to achieve good Caching results.

First, you don't have to deploy WunderGraph globally. It's possible to run it close to your origin, e.g. the (micro-) services that you'd like to use. In this scenario, it's possible to run WunderGraph as an Ingress to your other services.

The architecture of this scenario looks like this:

Client -> WunderGraph Server -> Origin

Deploy WunderGraph with Nginx or Varnish as an additional Caching Layer

If you're deploying WunderGraph close to the origin, it will automatically add the required Cache-Control & ETag Headers. This setup might already be enough for your scenario.

However, in some scenarios you'd like to add another layer of Caching Servers. This could be e.g. a cluster of Nginx or Varnish servers, placed in front of your WunderGraph server.

Architecture updated with this scenario:

Client -> Cache Server -> WunderGraph Server -> Origin

Deploy WunderGraph with Cloudflare or fastly as an Edge Cache

Depending on where your users are, a centralised Caching Layer might not be sufficient for your use case.

In this scenario, you can use services like Cloudflare (Workers) or Fastly to add a globally distributed GraphQL CDN / Edge Cache.

What's important to note here is that you're not tied to a specific solution. As mentioned before, we're using standardized Cache-Control directives, supported by all Cache Server solutions, so you're not tying yourself to a specific vendor.

Updated Architecture:

Client -> CDN -> WunderGraph Server -> Origin

Deploy WunderGraph directly on the Edge, using fly.io

Another option is to deploy WunderGraph directly on the Edge, removing the need for an additional Caching Layer. Services like fly.io allow you to deploy Containers as close to your users as possible.

The architecture of this scenario looks like the following:

Client -> WunderGraph Server (on the Edge) -> Origin

Deploying a service on the Edge is not always beneficial

There's one important point I'd like to make that applies to all solutions mentioned in this post.

If we're deploying a workload on the Edge, this service will be very close to the users, which is generally a good thing.

At the same time, depending on where on the "Edge" we're processing a request, each roundtrip to your origin server(s) might add anywhere between roughly 0 and 300ms of latency. If we have to do multiple roundtrips to fetch the data for a single client request, this latency can add up.
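A quick back-of-the-envelope calculation shows how this adds up. All latency numbers below are made-up examples, and the model assumes the roundtrips to the origin happen sequentially.

```typescript
// Total time for one client request when the server needs several sequential
// roundtrips to the origin. All inputs are roundtrip times in milliseconds.
function requestLatencyMs(clientToServerMs: number, serverToOriginMs: number, originRoundTrips: number): number {
  return clientToServerMs + serverToOriginMs * originRoundTrips;
}

// Server on the Edge: close to the user, far from the origin.
const onEdge = requestLatencyMs(20, 150, 3);
// Server next to the origin: far from the user, close to the origin.
const nearOrigin = requestLatencyMs(170, 2, 3);

console.log(onEdge);     // 470
console.log(nearOrigin); // 176
```

With a single roundtrip to the origin, both setups end up at roughly the same total (170ms vs. 172ms). The difference only shows once the number of sequential roundtrips grows.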

As you can see, it's not always beneficial to the overall performance to have logic running on the edge.

There's one more thing! This solution works for GraphQL, REST, gRPC and other Protocols as well!

You heard that right. WunderGraph is not just about GraphQL origins. We support a wide array of upstream protocols, like REST and gRPC (coming soon!), as well as Databases like PostgreSQL, MySQL, PlanetScale, etc.

WunderGraph ingests all your services and creates a "Virtual Graph", a GraphQL Schema representing all your services.

The Caching layer we've described above does not just work for GraphQL origins, but all services that you've added to your Virtual Graph.

This means you're able to apply one unified layer of Authentication, Authorization, and of course Caching to all your APIs.

From our experience, it's rarely the case that you're connecting a frontend to a single GraphQL server. Instead, you're probably going to connect it to many services, which is what we're trying to simplify, and caching is one part of the story.

Conclusion

We've discussed various options to improve application performance with different kinds of Caching systems. There are ready-to-use solutions that come with simplicity but also lock you into specific vendors.

With WunderGraph, we're trying to offer an Open Source alternative that's based on standards, implemented by many tools and vendors, so you're able to choose the best of breed solution for your situation.

If you're looking into adding Caching to your stack, we'd love to talk! Join us on Discord or connect on Twitter.
