Tech Breakout Summit/2021: Under the Hood With Headless WordPress and the Google Cloud Platform
Performance and scalability are some of the leading drivers of headless adoption for large-scale organizations, but what do those perceived benefits look like close-up? Learn from the VPs of Engineering at Google and WP Engine as they discuss how WP Engine’s new solution for Headless WordPress Hosting, Atlas, leverages Google Cloud Platform to deliver the next generation of speed and flexibility to WordPress developers.
In this session, VP of Engineering at Google, Chen Goldberg, and VP of Engineering at WP Engine, Brandon DuRette, discuss:
- Rapid development with Google Cloud Platform’s (GCP) Kubernetes Engine, which makes it easy to deploy, manage, and scale applications.
- WP Engine’s new solution for Headless WordPress, Atlas, which leverages GCP to deliver the next generation of speed and flexibility for WordPress developers.
“When you deploy new code to Atlas, we tell Kubernetes, we want you to run version four instead of version three, and Kubernetes takes care of starting up version four, changing the load balancer configuration, and shutting down version three…it’s really powerful stuff.” Brandon DuRette, VP of Engineering at WP Engine
Full text transcript
BRANDON DURETTE: Hi, everybody. I’m Brandon DuRette, VP of engineering at WP Engine.
CHEN GOLDBERG: Hi, and I’m Chen Goldberg, VP Engineering at Google Cloud.
BRANDON DURETTE: Today, we’re going to talk to you about Atlas, WP Engine’s complete solution for Headless WordPress, and the underlying technologies that make it possible. We’re also going to talk about why we chose the technology stack that we did and why that’s important to you.
When we set out to build a headless platform, long before we gave it the name Atlas, we talked to a lot of customers and agency partners. Many of them might be in this audience. We wanted to understand why you were adopting headless or considering adopting headless because we wanted to ensure that the platform that we built met your needs or exceeded them.
But the most important drivers we heard for headless adoption were performance and scalability. Even on very powerful WordPress platforms like the WP Engine platform, scaling WordPress can be hard, especially for highly dynamic sites that aren’t as cacheable. Later, we commissioned a formal study on headless that backed up our early results. Performance and scalability were the number one and number three reasons that participants cited for adopting headless, with 41% and 33% of respondents saying those were the reasons that they were choosing headless.
Of course, this makes sense. Performance and scalability are critical to the success of any website. Faster sites have a lower bounce rate and a higher conversion rate. You need your sites online and available to your visitors, wherever they are, wherever they are coming from, and whenever they hit your site. No amount of developer flexibility, integrations, SDK features, or any of the myriad other benefits of headless mean anything if your site is down.
So we knew that above all else, we had to deliver on these. So here’s what we came up with. When we think of the Atlas platform, we think about two major focus areas. First, the development and build aspect, where you develop your code and deploy it onto our platform. And then there’s the serving aspect, where we handle your web traffic. It will become clear in a minute why the build system is so important.
But for now, let’s focus on the serving aspect. The first component of the Atlas platform stack that your traffic will encounter is our integrated CDN. The CDN accepts your traffic from the internet at points of presence all around the globe and serves any static content or cached content to your visitors from right around the corner from them.
The CDN is powered by Cloudflare and it is the same CDN we use to power a good portion of our WordPress hosting stack. Traffic that can’t be served by our CDN is forwarded to what we call Node Engine. This is where the dynamic code runs. You can run– you can have custom APIs or integrations that run there or just dynamically generate pages for your site.
Finally, we back all that up with our industry-leading WordPress platform, which distributes your content to the Node front end via REST or GraphQL and provides your content team the tools that they’re familiar with for creating and managing that site content.
Today at Summit we also announced another critical component of the stack– Content Engine. While I won’t be talking about many of the specifics about Content Engine today, it’s designed to accelerate the delivery of your WordPress content via GraphQL. Content Engine will further accelerate your dynamic experiences, making them better than ever.
There are several approaches to building headless sites, and Atlas is flexible enough to support them all. The differences in the approaches come down to how you load your data and when you render your pages. The first approach is a fully static site. All of the heavy lifting is done at build time. So all of the data is loaded from the back end during the build, and every page of the site is rendered and stored as a static file that can be served directly from the CDN immediately.
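To make the fully static approach concrete, here is a minimal sketch of a build step that renders every page once and writes it out as a static file. It is only an illustration, not how Atlas or any particular framework implements its build: the `POSTS` data, `render_page`, and `build_site` are hypothetical stand-ins for content loaded from the WordPress back end and for a real templating system.

```python
from html import escape
from pathlib import Path
import tempfile

# Hypothetical content, standing in for posts loaded from the back end at build time.
POSTS = [
    {"slug": "hello-world", "title": "Hello World", "body": "First post."},
    {"slug": "about-us", "title": "About Us", "body": "Who we are."},
]


def render_page(post: dict) -> str:
    """Render one post to a complete HTML document."""
    title = escape(post["title"])
    body = escape(post["body"])
    return (
        "<!doctype html>"
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><p>{body}</p></body></html>"
    )


def build_site(posts: list[dict], out_dir: Path) -> list[Path]:
    """Render every page once, at build time, into static files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in posts:
        path = out_dir / f"{post['slug']}.html"
        path.write_text(render_page(post), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for page in build_site(POSTS, Path(tmp)):
            print(page.name)
```

After the build, nothing needs to be fetched or rendered per request; the files in the output directory are what a CDN would serve.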
Fully static sites both minimize the network distance to your end users and eliminate the data fetching and rendering that take time and increase your Time To First Byte, making this approach the fastest way to deliver your content. If static is fastest, then why would you choose anything else? Well, not every site is fully static, at least not for every page.
For example, if you have a search feature on your site, you’re not going to want to be pre-rendering every possible search result. You’re not creative enough to know what all of your site visitors are going to search for on your site. If you want to personalize your site and offer visitors a unique experience based on how they interact with your site, what pages they visited, or who they are, you’re also not going to want to serve the same content to everyone.
Finally, if your site is changing rapidly, you’re updating content continuously, you’re– maybe you have a live feed, a live blog on your site, it may not make sense to run a build step with each change. Instead, you need to do something more dynamic and at least populate those aspects of your site at runtime.
At the other extreme is a fully dynamic site, where each and every request receives a fully rendered– freshly rendered response. That means all of the necessary content is loaded from WordPress or Content Engine with each request. You could choose to cache the data in Node Engine, of course, to save yourself some requests at the back-end. But all of the decisions about how to render the page are happening right there at runtime.
This approach has the most computation and network overhead. So it’s going to be slower than other approaches, and I would not recommend that you build your entire site dynamically in this way, unless your site is relatively low-trafficked and it really, really requires it. In the future with Content Engine, we’re focusing on driving down the cost of that content loading to the point where we think you will be able to do fully dynamic sites at very close to the same speed as static.
But you’ll be happy to know that there’s a solid middle ground between fully static and fully dynamic, and we call this incremental static. Incremental static is an idea popularized by the Next.js framework. And it works by dynamically generating pages the first time they’re requested, but then caching them for future visitors.
Most visitors receive the cached content, even from the CDN. And once the cached content expires, the page will be rerendered so that visitors get fresh content. With Next.js, there’s even a feature that allows you to generate fresh content for an expired page but not make the visitor wait for that fresh content. Instead, the first visitor is served the cached page, and the CDN is repopulated with the freshly rendered page.
The technical term for this is stale-while-revalidate, and it is a technique used by CDNs and hosting platforms. And there’s a small trade-off here in terms of freshness in exchange for increased performance. Finally, I’ll say that nothing requires you to make one and only one choice between these three different approaches.
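The behavior described here (render on first request, serve from cache, and refresh an expired page without making the visitor wait) can be sketched as a small in-memory cache. This is a toy model under stated assumptions, not Next.js or CDN code: the `IncrementalCache` class, its `ttl` parameter, and the injected `clock` are hypothetical, and a real server would re-render in the background rather than inline.

```python
import time
from typing import Callable


class IncrementalCache:
    """Toy page cache: render on first request, serve stale while refreshing."""

    def __init__(self, render: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.render = render  # hypothetical page renderer
        self.ttl = ttl        # seconds a cached page counts as fresh
        self.clock = clock    # injectable so the behavior can be tested
        self.pages: dict[str, tuple[str, float]] = {}

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self.pages.get(path)
        if entry is None:
            # First request for this page: render on demand, then cache it.
            html = self.render(path)
            self.pages[path] = (html, now)
            return html
        html, rendered_at = entry
        if now - rendered_at > self.ttl:
            # Expired: refresh the cache for the next visitor, but still
            # answer this request with the copy we already have.
            self.pages[path] = (self.render(path), now)
        return html
```

The visitor who arrives just after expiry still receives the older copy; only the following visitor sees the refreshed page, which is the freshness-for-speed trade-off.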
Many of the frameworks out there allow you to mix and match. So some pages can be fully static and others fully dynamic. If you have a page on your site that never changes (maybe it’s your About Us page, for some people it’s their Home page), you can make that fully static. Then take only the pages that change often, maybe there’s a blog feed or a shopping cart experience or something like that that has to be dynamic, and make those the dynamic pages on your site.
One of the nice things about headless is that you have the flexibility to take any and all of these approaches to meet the needs of your site. Regardless of whether you choose to make your site fully static, fully dynamic, incrementally static, or some hybrid of those, Atlas is a fast, reliable, and scalable Headless WordPress platform.
To ensure your sites are fast on Atlas, as I mentioned before, it comes with an out-of-the-box global CDN. We wanted this to be integrated and seamless, so you don’t have to think about it at all. We just do the right thing and ensure your static content is served as fast as possible right around the corner from your visitors. I bet you didn’t know that networks have corners.
But as we know, sometimes static isn’t good enough. Just like our WordPress platform, for Node Engine we leverage the fastest cloud compute available. This won’t make any network latency go away, and it won’t fix third-party APIs that you’re integrating with, but the code that you write, that you run on Atlas will run as fast as possible.
And finally, with Content Engine, we’re making all critical public WordPress content faster than ever– posts, pages, author data, everything you need available through an accelerated GraphQL API. We’re excited about the possibility for Content Engine to make your server or client rendered pages nearly as fast as static.
While reliability wasn’t specifically called out in the State of Headless report, I think it goes hand in hand with performance. A site that’s down is by definition worse than a site that’s slow. To ensure the site stays online, we use a technology called Kubernetes, which Chen will talk about in a minute.
Kubernetes allows us to seamlessly distribute our Node applications across multiple availability zones in the cloud. These availability zones are designed to minimize the chance that if something goes wrong in one zone that something is also going wrong in the other zone. So when something goes wrong in one zone, then your site stays online, because we have it running in both zones.
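The idea of spreading copies of an application across zones can be illustrated with a few lines of Python. This is only a sketch of the reasoning, not how the Kubernetes scheduler places workloads; the zone names and the `spread` and `surviving` helpers are hypothetical.

```python
def spread(replicas: int, zones: list[str]) -> list[str]:
    """Assign each replica to a zone round-robin so zones stay balanced."""
    return [zones[i % len(zones)] for i in range(replicas)]


def surviving(placement: list[str], failed_zone: str) -> int:
    """Count the replicas still running after one zone fails."""
    return sum(1 for zone in placement if zone != failed_zone)


if __name__ == "__main__":
    placement = spread(4, ["zone-a", "zone-b"])
    print(placement)
    print(surviving(placement, "zone-a"))
```

With four copies over two zones, losing either zone still leaves two copies serving traffic.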
With Kubernetes, we also have built-in automated recovery, from both hardware failures and application software failures. Because of this redundancy and automated recovery, Atlas is backed by our enhanced SLA with an availability target of 99.99%.
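To put a 99.99% availability target in perspective, the arithmetic below converts it into the downtime it would allow over a period. The talk does not say how the SLA is measured, so the 365-day year and 30-day month used here are assumptions for illustration only.

```python
def allowed_downtime_minutes(availability: float, period_days: float) -> float:
    """Minutes of downtime permitted by an availability target over a period."""
    return (1.0 - availability) * period_days * 24 * 60


if __name__ == "__main__":
    # Assumed measurement windows: a 365-day year and a 30-day month.
    print(round(allowed_downtime_minutes(0.9999, 365), 2))
    print(round(allowed_downtime_minutes(0.9999, 30), 2))
```

Under those assumptions the budget works out to roughly 52.56 minutes per year, or about 4.32 minutes per 30-day month.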
Kubernetes provides another benefit– auto-scaling. When your site is under the stress of high load, we automatically scale up to handle the traffic. This takes place across availability zones and can even provision more compute resources to handle the load if necessary. This is fully automated, so gone are the days of provisioning for peak traffic or preplanning for events.
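As a rough illustration of how a scale-up decision can be computed, the Kubernetes Horizontal Pod Autoscaler documentation describes the rule desired = ceil(current replicas × current metric / target metric). The sketch below applies that rule with clamping to a minimum and maximum. The talk does not say which autoscaler settings Atlas uses, so the metric values and bounds are assumptions, and the real autoscaler adds details (such as a tolerance band) that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to load, within fixed bounds."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical numbers: 3 copies at 90% CPU with a 60% target.
    print(desired_replicas(3, current_metric=90, target_metric=60))
    # Traffic drops: 5 copies at 20% CPU with the same target.
    print(desired_replicas(5, current_metric=20, target_metric=60))
```

The same rule covers both directions: load above the target adds copies, and load below it removes them.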
Let’s face it, you aren’t in control of all the ways that your site might get a burst of traffic. Sometimes you are, like when you launch a successful marketing campaign or have a flash sale. But on today’s internet, things go viral for random reasons. Not to worry though, regardless of where your traffic comes from, Atlas has your back. With that, I’m going to turn it over to Chen to share more about Kubernetes, the engine that powers the Atlas platform.
CHEN GOLDBERG: Thank you, Brandon. I’m excited to be here on this virtual stage with Brandon. When I first met Brandon and the team back in 2018, they started to reimagine the WP Engine platform. We had a good conversation about technology, the characteristics of cloud-native applications, what’s top of mind for their customers, and Google Kubernetes Engine.
Since then, Google Cloud has partnered with the team to make their vision a reality. Today, I want to show a little bit more about one of the underlying technologies that power Headless WordPress, Atlas. Google Kubernetes Engine is an enterprise container management service from Google. Our goal is to make it easy for our customers to run containerized applications at the enterprise level– fast, reliable, and scalable.
At the heart of GKE is a technology named Kubernetes. What is Kubernetes? Kubernetes is an open source platform for automating deployment, management, and scaling of containerized applications. Containers package applications and decouple the application and its dependencies from the operating system.
When you need to deploy, monitor, manage, and scale one application, it’s doable. But when you have many more, it becomes more complicated. Kubernetes is a technology that helps you orchestrate all these containers, making sure everything runs smoothly. It’s also open source. That means that it’s being built by a community of developers and companies, ensuring it works well on-premises and in the cloud.
This means that once your application runs on Kubernetes, your application is portable. It can run anywhere. Last, flexibility mattered a lot. So from day one, we invested in making Kubernetes extensible. This means that Kubernetes can be enhanced and easily integrated with.
But on top of all of these amazing things, Kubernetes has another superpower which Brandon also mentioned. Kubernetes is a powerful automation machine. Traditional automation machines work with an event trigger system—if this, then that. For example, I can program the air conditioning system to turn on every time it’s 6:30 and heat or cool the house depending on the season.
The downside of this approach is that I need to program for all cases, think about edge cases, and make sure that everything is predictable. An alternative way is to set up a thermostat. With a thermostat, I can program my desired temperature. For example, I would like our house temperature to be between 68 and 72 degrees Fahrenheit. The system will continuously monitor the current state, and when it detects it doesn’t meet the desired state, will automatically fix it by either warming or cooling the house.
This is how Kubernetes deals with automation at scale. Operators and developers don’t need to think about all the edge cases– what can go wrong or what can go well. Especially in cloud-native applications, the environment is dynamic and stuff does change a lot. Instead, they need to define what’s the desired state. For example, what is considered healthy performance for the application? Kubernetes will monitor the application and all its components, and if things change, it will take action to bring it back to the desired state.
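The thermostat analogy maps onto a small reconciliation function: declare the desired state once, then repeatedly compare it with what is observed and choose a corrective action. The sketch below is an illustration of that pattern only, not Kubernetes code; `DesiredState`, `reconcile`, and `run_loop` are hypothetical names, and the 68 to 72 degree range comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class DesiredState:
    """The declared target: keep the house within this range (Fahrenheit)."""
    low_f: float = 68.0
    high_f: float = 72.0


def reconcile(current_f: float, desired: DesiredState) -> str:
    """Compare the observed state with the desired state and pick an action."""
    if current_f < desired.low_f:
        return "heat"
    if current_f > desired.high_f:
        return "cool"
    return "idle"


def run_loop(readings: list[float], desired: DesiredState) -> list[str]:
    """Observe, compare, react, repeat, once per reading."""
    return [reconcile(reading, desired) for reading in readings]


if __name__ == "__main__":
    print(run_loop([65.0, 70.0, 75.0], DesiredState()))
```

Nothing in the loop enumerates the reasons the temperature might drift; it only needs the target and the current reading, which is the point of the desired-state approach.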
Kubernetes reached 1.0 about six years ago, and Google Kubernetes Engine has been [INAUDIBLE] at the same time. The reason Kubernetes is successful or maybe one of the reasons it has been adopted rapidly is because we’ve done all of this for a very long time internally– more than a decade. The modern container is a combination of Linux namespaces and cgroups. The latter was invented internally at Google around 2006.
Docker made Linux containers popular around 2013, but the concept was heavily used internally at Google much earlier. We understood that the real challenge is not in the container but in the orchestration of containers across multiple hosts. Internally, we have a system named Borg, and we developed it and surrounded it with a rich ecosystem to make Google developers’ and operators’ lives easier and achieve higher velocity.
Based on our learnings, we created Kubernetes and its ecosystem, partnering with others in the open source community. Today, Google’s thought leadership in the Kubernetes ecosystem is very clear, and we keep innovating and raising the bar. Kubernetes is a very powerful and complex product, and running it reliably at scale is not a trivial task.
Companies that run self-managed Kubernetes learned it the hard way and typically have a team of experts responsible for operating Kubernetes. Google Kubernetes Engine offloads that complexity from customers, allowing them to focus on their business workloads. The GKE control plane is fully operated by Google Site Reliability Engineers with managed availability, security patches, and upgrades.
The Google SRE team not only has deep operational knowledge of Kubernetes, they also have battle-tested best practices for managing highly reliable, scalable services. They even published multiple books on this topic. Our SRE team, distributed globally, constantly monitors and acts on problems on any GKE cluster, including those powering Atlas, ensuring we and WP Engine also meet their SLOs.
Without compromising the Kubernetes API, we integrate carefully with many Google services like Google Compute Engine, Google Global Load Balancer, monitoring, and logging. GKE provides comprehensive management for nodes, including auto-provisioning, patching, auto-upgrade, auto-repair, and scaling. Listed here are some of the unique capabilities of GKE.
Our team takes a lot of pride in creating a cluster in just a few minutes, our ability to run the largest clusters in the industry, and supporting the industry-first four-way auto-scaler– reliable, scalable, and fast, just like Headless WordPress. Back to you Brandon.
BRANDON DURETTE: Thanks, Chen. I really love the Kubernetes control loop: observe, compare, react, repeat. It’s a simple idea but a really powerful one. That loop is the heartbeat that drives auto-scaling, automated failure recovery, deployment automation, and more within Atlas.
For auto-scaling, if the control loop observes that a site is struggling under load, it simply deploys more copies of the site to take up the load. When that burst of traffic goes away, the same control loop shuts down the extra copies of the site so that it doesn’t consume any more resources than it needs. For failure recovery, Kubernetes can detect when a copy of your site becomes unhealthy or unavailable, whether that’s because of a software problem or a hardware problem, and it stops sending it traffic.
If necessary, it will automatically deploy a replacement so your site keeps on going. Even our deployment automation follows the same pattern. When you deploy new code to Atlas, we just tell Kubernetes, hey, we want you to be running version 4 instead of version 3, and Kubernetes takes care of starting up version 4, and changing the load balancer configuration, and shutting down version 3– switching all the traffic to the new version as soon as it’s ready to go. It’s really, really powerful stuff.
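The version 3 to version 4 switch described here can be pictured as a sequence of small steps in which a new copy starts taking traffic before an old one is shut down. The simulation below is a simplified sketch, not the Kubernetes rollout algorithm and not Atlas's deployment code: `rolling_update` is a hypothetical helper, readiness is treated as instantaneous, and it replaces one copy at a time.

```python
def rolling_update(replicas: int, old: str, new: str) -> list[list[str]]:
    """Return the versions receiving traffic after each step of the rollout."""
    serving = [old] * replicas
    steps = [list(serving)]
    for _ in range(replicas):
        # Start a copy of the new version and, once it is ready (assumed
        # instant here), add it to the load balancer...
        serving.append(new)
        # ...then take one old copy out of rotation and shut it down.
        serving.remove(old)
        steps.append(list(serving))
    return steps


if __name__ == "__main__":
    for step in rolling_update(3, "v3", "v4"):
        print(step)
```

At every step the number of copies serving traffic stays at the requested count, so the site keeps answering requests throughout the switch.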
Finally, I want to say thank you to Chen and her team, as well as the Kubernetes developers around the world who make Kubernetes and GKE the platforms that they are. It’s really a critical part of making Atlas the fast, reliable, scalable, headless platform. The full research report that I mentioned at the top of the talk is available on wpengine.com.
And if you’re interested in learning more about Atlas or Headless WordPress in general, visit developers.wpengine.com, our developer relations site where you’ll find documentation about Atlas, articles, and video tutorials about Headless WordPress, and even a podcast about everything interesting going on in this space.