Many developers are now enjoying huge productivity gains by leveraging AI-powered code editors such as Cursor and Windsurf. These powerful tools and the state-of-the-art large language models (LLMs) that power them can “understand” the codebase and use that knowledge to generate and modify code as instructed by the developer.
In this article, you’ll learn about some of the shortcomings AI code editors have and an approach you can use to provide them with the context they need to significantly improve the quality and correctness of the code they generate.
Where LLMs Excel
Imagine you’re working on a web project and you give your AI code editor a prompt such as this:
Loop over this array of blog post objects and render each one as an article tag with the blog post title as an h2, the excerpt as a p tag, and a “Read more” link that points to the blog post’s URI. Apply Tailwind classes to stack the list of blog posts vertically on mobile, then display them two-per-row at the md breakpoint and three-per-row at the lg breakpoint.
There is a high likelihood that your AI-powered code editor will do a phenomenal job achieving the desired outcome. That’s because the data it was trained on includes an enormous number of HTML and TailwindCSS code samples that are still relevant, and drawing on that vast corpus of relevant examples makes its output very accurate.
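For reference, the markup an editor typically produces for a prompt like that looks something like the snippet below. This is only a sketch, assuming a Svelte component that receives a posts array of objects with title, excerpt, and uri properties; your editor’s actual output will vary.
<!-- Sketch only: assumes a posts array of { title, excerpt, uri } objects -->
<div class="grid grid-cols-1 gap-6 md:grid-cols-2 lg:grid-cols-3">
	{#each posts as post}
		<article>
			<h2 class="text-2xl font-semibold">{post.title}</h2>
			<p class="text-gray-700">{post.excerpt}</p>
			<a href={post.uri} class="text-blue-600 hover:underline">Read more</a>
		</article>
	{/each}
</div>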
Where LLMs Fall Short
When new or updated software is released, AI tools often struggle to generate accurate code since they were trained on older documentation. This scenario causes two main problems:
- The LLM may not know about the new/updated software at all since it was released after the LLM’s training cut-off date.
- Even if the LLM does know about the new/updated software, it may mix old and new code syntax due to having been trained on many more examples of the old syntax.
To further complicate things, your AI-powered code editor isn’t always aware of the data available in the APIs you’re interfacing with. So when you ask it to construct queries, it may not do so correctly.
Real-world Example
Here’s an example to illustrate the problems mentioned:
I recently created a headless WordPress backend site and activated the WPGraphQL, Advanced Custom Fields (ACF), and WPGraphQL for ACF plugins. Using ACF, I created a new “Movie” custom post type, a “Movie Fields” field group, and added a “Poster” custom ACF field to store the movie poster image. I then created a new SvelteKit app and told my AI-powered code editor to do this:
Create a movies page. When you load data, send a fetch request to the WPGraphQL endpoint http://headless-wp.local:10009/graphql to get the most recent 10 movies. On the page, render an h1 that says “Movies”. Loop over the movies that were fetched to create a list. For each movie:
- Render the ACF poster field image as an img tag.
- Render the title as an h2.
- Render the excerpt as a p tag.
- Wrap the title in a link that points to the uri for the movie.
My code editor generated a src/routes/movies/+page.ts file with this load function to fetch the data:
import type { PageLoad } from './$types';

export const load: PageLoad = async ({ fetch }) => {
	const response = await fetch('http://headless-wp.local:10009/graphql', {
		method: 'POST',
		headers: {
			'Content-Type': 'application/json'
		},
		body: JSON.stringify({
			query: `
				query GetRecentMovies {
					movies(first: 10, where: { orderby: { field: DATE, order: DESC } }) {
						nodes {
							title
							excerpt
							uri
							featuredImage {
								node {
									sourceUrl
									altText
								}
							}
						}
					}
				}
			`
		})
	});

	const { data } = await response.json();

	return {
		movies: data?.movies?.nodes || []
	};
};
Some parts of this query are correct, but it tried to access the featured image for each movie even though I asked it for “the ACF poster field image”. This mistake is unsurprising since, at this point, the LLM has no way of knowing that on my WordPress backend, I have an ACF image field called poster in my Movie Fields ACF field group that is assigned to my movie custom post type.
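For comparison, a query that targets the ACF field instead would look roughly like the following. Treat the field names as assumptions: the exact shape (movieFields, poster, and whether the image is exposed as a node connection) depends on how the field group is configured and on the WPGraphQL for ACF version in use.
// Sketch only: field names depend on your ACF field group and WPGraphQL for ACF settings
const GET_RECENT_MOVIES = `
	query GetRecentMovies {
		movies(first: 10, where: { orderby: { field: DATE, order: DESC } }) {
			nodes {
				title
				excerpt
				uri
				movieFields {
					poster {
						node {
							sourceUrl
							altText
						}
					}
				}
			}
		}
	}
`;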
Now let’s see how the code editor did when trying to render the content. It generated a src/routes/movies/+page.svelte file containing this:
<script lang="ts">
	export let data;
</script>

<div class="container mx-auto px-4 py-8">
	<h1 class="text-3xl font-bold mb-8">Movies</h1>

	<div class="grid gap-8 md:grid-cols-2 lg:grid-cols-3">
		{#each data.movies as movie}
			<article class="border rounded-lg overflow-hidden shadow-md">
				{#if movie.featuredImage?.node}
					<img
						src={movie.featuredImage.node.sourceUrl}
						alt={movie.featuredImage.node.altText || movie.title}
						class="w-full h-64 object-cover"
					/>
				{/if}
				<div class="p-4">
					<h2 class="text-2xl font-semibold mb-2">
						<a href={movie.uri} class="hover:underline">{movie.title}</a>
					</h2>
					<div class="prose prose-lg">
						{@html movie.excerpt}
					</div>
				</div>
			</article>
		{/each}
	</div>
</div>
In the script tag at the top of the file, the code editor used the old export let data; syntax, which is wrong. As of Svelte 5 (which this project uses), the correct syntax for accessing props is let { data } = $props();. The LLM should have used that new syntax, but since the majority of the Svelte code examples it was trained on use the old syntax, the code it generates often falls back to it.
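In other words, the top of the generated file should have looked like this in a Svelte 5 project:
<script lang="ts">
	// Svelte 5 runes syntax for receiving props
	let { data } = $props();
</script>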
The Solution: RAG
One common solution to these kinds of issues in the world of LLMs is to employ Retrieval Augmented Generation (RAG). Essentially, that means starting with the base model (one of the LLMs from Anthropic, OpenAI, DeepSeek, etc.) and augmenting it with an external knowledge base that it can use to inform its responses.
There are several approaches to RAG:
- Take data in text format and create embeddings. This converts the text into collections of numbers (vectors) that capture its semantic meaning, stores them in a vector database, and makes that vector database available to the LLM.
- Make network requests to fetch relevant data via the web. Tools such as Cursor and Windsurf sometimes use this approach to look up information on the fly.
- Provide the LLM with a collection of text documents containing relevant information.
So, which approach is best for fixing the issues cited in this article? Let’s compare them.
- Approach #1 is great for huge amounts of documentation that wouldn’t fit entirely within an LLM’s context window. It’s labor-intensive, though, and the embeddings would have to be recreated each time new documentation was available. This approach is too heavy-handed for our use case.
- Approach #2 is too slow and requires internet access. We would need to tell our AI-powered code editor to look up online resources to augment its knowledge, but such lookups add latency, only work while online, and the information fetched may not be persisted over time.
- Approach #3 works best. We can put a collection of text documents locally in our code project and tell the AI-powered code editor to use the information within to inform its responses. This makes it easy to regenerate the text files whenever technical documentation changes, it doesn’t require a network connection, and it makes accessing that data quick and easy.
How to Implement the Solution
Now that we have a high-level understanding of the issues and which RAG approach is best for fixing them, let’s apply it.
As I mentioned above, my frontend app is built using SvelteKit and my WordPress backend uses WPGraphQL, Advanced Custom Fields (ACF), and WPGraphQL for ACF. Given that tech stack, let’s see how we can give our AI-powered code editor documentation for all of those tools.
In addition, we’ll run a GraphQL introspection query on our headless WordPress backend and put the results of that query in a JSON file so that the AI code editor has full knowledge of our WPGraphQL schema.
The example codebase to follow along with is available here:
https://github.com/kellenmace/headless-wp-with-ai-rag
There is an llm-docs folder in the root of the SvelteKit project where our documentation will live. Within that folder, there are introspection-query.graphql and generate-wpgraphql-acf-docs.js files that we’ll make use of in subsequent steps.
Svelte Docs
The Docs for LLMs page of the Svelte docs says:
We support the llms.txt convention for making documentation available to large language models and the applications that make use of them.
Because Svelte follows the llmstxt.org convention of providing text documents for LLMs to use, our job is easy! The command below has been added to the scripts section of our package.json file. Executing this command copies the complete documentation for Svelte, SvelteKit, and the Svelte CLI into our project as llm-docs/svelte-docs-full.txt.
"scripts": {
// ...other scripts
"generateSvelteDocs": "curl -o llm-docs/svelte-docs-full.txt https://svelte.dev/llms-full.txt",
},
WPGraphQL Docs
To get the WPGraphQL docs, I have added this script to package.json:
"scripts": {
// ...other scripts
"generateWPGraphQLDocs": "curl -L https://github.com/wp-graphql/wp-graphql/archive/refs/heads/master.zip -o wp-graphql-master.zip && unzip wp-graphql-master.zip 'wp-graphql-master/docs/*' && rm -rf llm-docs/wpgraphql-docs && mkdir -p llm-docs/wpgraphql-docs && mv wp-graphql-master/docs/* llm-docs/wpgraphql-docs/ && rm -rf wp-graphql-master wp-graphql-master.zip",
},
This command does the following:
- Download the master branch of the WPGraphQL repo as a zip file
- Extract the zip file into a folder named wp-graphql-master
- Move the contents of the wp-graphql-master/docs/ folder into a llm-docs/wpgraphql-docs folder in our project
- Delete the zip file and extracted folder from steps 1 & 2

This results in a /llm-docs/wpgraphql-docs folder in our project that contains all of the Markdown files that make up WPGraphQL’s documentation.
WPGraphQL for ACF Docs
The WPGraphQL for ACF docs are stored in a headless WordPress backend. We’ll copy them into our project with this package.json script:
"scripts": {
// ...other scripts
"generateWPGraphQLAcfDocs": "node llm-docs/generate-wpgraphql-acf-docs.js",
},
When this is run, the code inside of /llm-docs/generate-wpgraphql-acf-docs.js executes. It does the following:
- Fetches data for Pages and Field Types from the headless WordPress backend that the acf.wpgraphql.com site uses.
- Renders the markup for Pages and Field Types as HTML files within the /llm-docs/wpgraphql-for-acf directory.
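The file in the example repo is the source of truth for how this works, but the general pattern of such a doc-generation script is sketched below. The endpoint URL and query here are placeholders, not the ones the real script uses:
// Simplified sketch only; see llm-docs/generate-wpgraphql-acf-docs.js in the repo for the real logic
import { mkdir, writeFile } from 'node:fs/promises';

// Placeholder endpoint; the real script targets the backend used by acf.wpgraphql.com
const ENDPOINT = 'https://example.com/graphql';
const OUTPUT_DIR = 'llm-docs/wpgraphql-for-acf';

// Hypothetical query; the real script fetches Pages and Field Types
const query = `
	query GetDocPages {
		pages(first: 100) {
			nodes {
				slug
				title
				content
			}
		}
	}
`;

const response = await fetch(ENDPOINT, {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({ query })
});
const { data } = await response.json();

// Render each page as its own HTML file in the output directory
await mkdir(OUTPUT_DIR, { recursive: true });
for (const page of data.pages.nodes) {
	await writeFile(`${OUTPUT_DIR}/${page.slug}.html`, `<h1>${page.title}</h1>\n${page.content}`);
}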
Introspection Query
Without knowledge of the GraphQL schema, our AI code editor would be “flying blind” when constructing GraphQL queries, not really knowing what the schema of our headless WordPress backend looks like. We’ll fix that with this script:
"scripts": {
// ...other scripts
"generateIntrospectionResult": "bash -c \"curl -X POST -H 'Content-Type: application/json' --data @<(jq -Rs '{query: .}' < llm-docs/introspection-query.graphql) http://headless-wp.local:10009/graphql -o llm-docs/introspection-result.json\"",
},
When run, this command takes the GraphQL introspection query in /llm-docs/introspection-query.graphql and runs it against our WordPress backend’s /graphql endpoint. This query essentially asks the GraphQL API, “What are the structure and types for all the data in your schema?” The response that comes back is stored in the /llm-docs/introspection-result.json file.
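If bash and jq aren’t available on your machine (on Windows, for example), the same result can be achieved with a small Node script. This isn’t part of the example repo; it’s just a sketch that mirrors what the curl command above does:
// Sketch: POST the introspection query and save the result, mirroring the curl/jq command
import { readFile, writeFile } from 'node:fs/promises';

const query = await readFile('llm-docs/introspection-query.graphql', 'utf8');

const response = await fetch('http://headless-wp.local:10009/graphql', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({ query })
});

await writeFile('llm-docs/introspection-result.json', JSON.stringify(await response.json(), null, 2));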
One Script to Rule Them All
Having these as individual scripts is useful when you only want to run one of them to regenerate the docs for that specific thing. For convenience, we’ll add one last generateLLMDocs script that, when run, executes all of the LLM documentation-generating commands for you.
With this in place, our final set of scripts in the package.json file looks like this:
"scripts": {
// ...other scripts
"generateSvelteDocs": "curl -o llm-docs/svelte-docs-full.txt https://svelte.dev/llms-full.txt",
"generateWPGraphQLDocs": "curl -L https://github.com/wp-graphql/wp-graphql/archive/refs/heads/master.zip -o wp-graphql-master.zip && unzip wp-graphql-master.zip 'wp-graphql-master/docs/*' && rm -rf llm-docs/wpgraphql-docs && mkdir -p llm-docs/wpgraphql-docs && mv wp-graphql-master/docs/* llm-docs/wpgraphql-docs/ && rm -rf wp-graphql-master wp-graphql-master.zip",
"generateWPGraphQLAcfDocs": "node llm-docs/generate-wpgraphql-acf-docs.js",
"generateIntrospectionResult": "bash -c \"curl -X POST -H 'Content-Type: application/json' --data @<(jq -Rs '{query: .}' < llm-docs/introspection-query.graphql) http://headless-wp.local:10009/graphql -o llm-docs/introspection-result.json\"",
"generateLLMDocs": "npm run generateSvelteDocs && npm run generateWPGraphQLDocs && npm run generateWPGraphQLAcfDocs && npm run generateIntrospectionResult"
},
If you cloned down the project repo, you can run npm run generateLLMDocs yourself to generate all of the documentation files. You can also run any of the individual scripts at any time to regenerate a specific set of docs and keep it fresh.
Leverage Local Docs
Now that we have our documentation in /llm-docs, we need to tell our AI code editor to make use of it.
That can be done by creating a Cursor project rule or a Windsurf local rule. I have added both .cursorrules and .windsurfrules files to the example repo that include these instructions:
# Crucial Documentation
Documentation is stored in the `llm-docs` directory. Please review all the contents of that directory thoroughly and use the information within to inform your responses.
- The SvelteKit documentation is at `llm-docs/svelte-docs-full.txt`.
- The WPGraphQL documentation is in the `llm-docs/wpgraphql-docs` directory.
- The WPGraphQL for ACF documentation is in the `llm-docs/wpgraphql-for-acf` directory.
- I ran the GraphQL introspection query in `llm-docs/introspection-query.graphql` and saved the result in `llm-docs/introspection-result.json`. By reviewing this, you can understand the GraphQL schema of the headless WordPress site.
Now your AI code editor will follow your instructions in this rule file. It will review the important documentation within the llm-docs directory and use it to inform its responses.
Alternatively, you can paste that wording into a new prompt when working in your editor.
Test the Solution
After setting up this system to populate the /llm-docs directory with documentation and instruct my AI code editor to utilize it, I ran the prompt again:
Create a movies page. When you load data, send a fetch request to the WPGraphQL endpoint http://headless-wp.local:10009/graphql to get the most recent 10 movies. On the page, render an h1 that says “Movies”. Loop over the movies that were fetched to create a list. For each movie:
- Render the ACF poster field image as an img tag.
- Render the title as an h2.
- Render the excerpt as a p tag.
- Wrap the title in a link that points to the uri for the movie.
This time, I’m pleased to report that the issues I encountered were resolved. The code editor correctly queried the poster ACF field and used the correct Svelte 5 props syntax.
Give it a Try
Consider implementing this setup in your own projects, using this article as your guide. Just as I’ve demonstrated by creating scripts in the package.json file to fetch documentation and API responses and save them directly within the project’s directory, you can apply this approach to your own documentation and APIs. Once configured, you should notice a significant improvement in the quality of the code generated by your AI code editor.