API Management

Solving SEO problems with API design

June 22, 2017

Martin Nally

Software Developer and API designer, Apigee

Editor's note: This is the first of a two-post series on solving SEO problems with API design. The second post in the series can be found here.

Last year I visited the software development arm of a household-name retailer that was attempting to rebuild the user interface for its main website as a single-page application (SPA). The project had failed.

It did so for two main reasons: page load times were unacceptably long, and search engines were unable to effectively index the new site.

At the time of my visit, the retailer had abandoned the SPA design approach in favor of "old-fashioned" HTML construction on the server. This is a pity, because a properly-designed and -implemented SPA could have provided a superior experience for the company’s users, and superior productivity for its developers.

There’s a lot of advice on the web on how to optimize load times for single-page applications, but less on how to deal with the problem of search-engine indexing. This post explains how the search engine indexing problem can be solved through thoughtful design of APIs—perhaps not the place many people might look for a solution to a search engine problem.

SPAs enable end users to navigate between different entities without performing an HTML document load for each entity.

Note: HTML document loading is a concept fundamental to all web browsers—it’s precisely defined here and in other specifications. "Single-document application" would have been a better name than single-page application, if you value consistency with the terminology in the specifications.

Essentially, a JavaScript program is loaded into the browser as a result of an initial HTML document load, and that JavaScript program then uses a series of API calls to get the data and present a user interface for a succession of entities without having to load an HTML document (and corresponding JavaScript) for each one.
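The loop described above can be sketched in a few lines of JavaScript. This is only a minimal sketch under stated assumptions, not any particular framework's approach: the function names `renderEntity` and `showEntity`, the shape of the JSON, and the injected `fetchJson` and `display` callbacks are all hypothetical.

```javascript
// Hypothetical sketch: names and the JSON shape are assumptions for illustration.

// Turn an entity's JSON representation into a simple view model:
// a title plus one line of text per remaining property.
function renderEntity(entity) {
  const lines = Object.entries(entity)
    .filter(([key]) => key !== 'name')
    .map(([key, value]) => `${key}: ${value}`);
  return { title: entity.name, lines };
}

// Fetch one entity through the API and hand the result to a display callback.
// fetchJson and display are injected so the sketch also runs outside a browser.
async function showEntity(path, fetchJson, display) {
  const entity = await fetchJson(path); // an API call, not an HTML document load
  const view = renderEntity(entity);
  display(view);
  return view;
}
```

In a browser, `fetchJson` could wrap the standard `fetch` function and `display` could update the DOM. The point is that moving from one entity to the next costs one API call rather than a new document load.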

The SPA experience

There are many reasons for the popularity of SPAs: they are easy and fun to write; they can provide a superior user experience; and they help provide a clean separation between user interface code and business logic.

Another important advantage of SPAs is that their overall design is similar to that of mobile apps that display the same data and access the same functions. In fact, many people use the same HTML5 SPA implementation for both web and mobile.

Early SPAs often failed to integrate well with the browser. One of the most visible mistakes was a failure to update the browser history appropriately. As the user navigated between resources, the SPA failed to update the address bar and the browser history, which left the user with nothing to bookmark and left the browser's back, forward, and history controls not working.

A better understanding of how a SPA should be written, along with the adoption of frameworks like Angular that help developers write good SPAs, has resulted in more and more SPAs that integrate well with the browser. Yet there are still few SPAs that also work well with search engines.

HTML5 improvements aren’t enough

If you look in the web browser address bar of a typical SPA, you will see addresses that look like this:

https://acme.com/#/entity/xxxx

Note: technically these addresses are called URI references, not URLs—the URL terminates at the # character.

The only HTML document that exists in a design like this is the document at the URL https://acme.com/. This is the document that will load a JavaScript application that will then make API calls to retrieve JSON representations of entities with identifiers of the form /entity/xxxx. The JavaScript then creates a browser DOM whose rendering is what the user sees.

There is only one HTML document for search engines to retrieve—https://acme.com/—and it usually contains only code or references to code, which is not useful to a search bot. This completely defeats search engine indexing.
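To make the point concrete, the standard `URL` class (available in browsers and in Node.js) can split such an address into the part that is requested from the server and the fragment that stays in the client. The helper name `serverVisiblePart` and the second identifier `yyyy` are assumptions for illustration.

```javascript
// Hypothetical helper: returns the part of an address that an HTTP client
// actually requests from the server. The fragment (everything from '#')
// is never sent, so it is left out here.
function serverVisiblePart(address) {
  const url = new URL(address);
  return url.origin + url.pathname + url.search;
}

// Two different entities, as the user sees them in the address bar.
const first = serverVisiblePart('https://acme.com/#/entity/xxxx');
const second = serverVisiblePart('https://acme.com/#/entity/yyyy');

console.log(first);            // https://acme.com/
console.log(first === second); // true
```

Both addresses reduce to the same document, which is why a crawler that follows them finds only one page.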

With the ubiquity of the HTML5 history API in browsers, SPAs can now be written to use URLs like this one instead:

https://acme.com/entity/xxxx

This is an important improvement, because there is now a separate URL that is retrievable from the server for each entity that is exposed by the SPA. Although this is a step in the right direction, this does not by itself solve our SEO problem, because the resources at these URLs—like the single resource we had previously at https://acme.com/—typically only contain code or references to code, as illustrated in the following example. This is still useless to a search bot.

<!DOCTYPE html>
<html>
<head>
<script src="/mySPA.js"></script>
</head>
<body>
</body>
</html>

Assuming the JavaScript is written appropriately, this HTML will display the correct user interface for each one of the entities of the app, whose URLs might look like https://acme.com/entity/xxxx. The JavaScript code will construct a browser document object model (DOM) for the user interface even though there is no content in the body of the HTML document to do this. The JavaScript code must also look at the URL of the document that was loaded to determine the initial entity to present to the user.
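Here is a minimal sketch of those two responsibilities: reading the initial entity from the loaded URL, and updating the address bar with the HTML5 history API on later navigation. The function names are hypothetical, and the `/entity/<id>` path layout simply follows the example URLs in this post.

```javascript
// Hypothetical sketch: function names are assumptions for illustration.

// Extract the entity identifier from a path such as "/entity/xxxx".
// Returns null when the path does not name an entity.
function entityIdFromPath(pathname) {
  const match = /^\/entity\/([^/]+)\/?$/.exec(pathname);
  return match ? decodeURIComponent(match[1]) : null;
}

// Navigate to another entity without an HTML document load.
// historyApi is expected to behave like window.history;
// show is a callback that renders the entity with the given id.
function navigateTo(id, historyApi, show) {
  const path = `/entity/${encodeURIComponent(id)}`;
  historyApi.pushState({ id }, '', path); // updates the address bar and history
  show(id);
  return path;
}
```

In a browser, the SPA might call `entityIdFromPath(window.location.pathname)` at startup, call `navigateTo(id, window.history, show)` when the user follows a link, and listen for the `popstate` event so that the back and forward buttons redisplay the right entity.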

Add some meta

Consider the example of an application that displays information about Disney cartoon characters. In order to make the HTML useful for a search bot, we can simply add additional information, like this:

<!DOCTYPE html>
<html>
<head>
   <meta name="description" content="Personal information about Mickey Mouse" />
   <meta name="keywords" content="Mickey Mouse, Donald Duck, Minnie, Goofy, Disney" />
   <script src="/mySPA.js"></script>
</head>
<body>
   <div style="display: none;">
     <p>name: <span>Mickey Mouse</span></p>
     <p>girlfriend: <a href="https://acme.com/entity/Minnie">Minnie Mouse</a></p>
     <p>friends:</p>
     <ol>
       <li><a href="https://acme.com/entity/Donald">Donald Duck</a></li>
       <li><a href="https://acme.com/entity/Goofy">Goofy</a></li>
     </ol>
   </div>
</body>
</html>

All we did was add some <meta> elements to the head of the document (search engines take note of meta elements) and a simple <body> that will never be displayed to the user.

The essential idea—which might not be immediately obvious—is that the <body> element we included here has no influence on the user interface that a user will see—producing that user interface is the job of the JavaScript of the SPA.

The goal of the <body> element is only to provide useful information for search bots, and other non-browser clients that understand HTML. Users will see only the user interface created by the JavaScript of the SPA, and search engines will see only the body shown above.

Obviously, this is not a sophisticated example of what you want to include in your HTML for SEO. This is not an SEO tutorial, which I'm not qualified to write in any case. But it shows how you can include HTML that is visible to search engines for all the entities that are displayed in your SPAs.

In summary, there are two steps required to solve the SEO problem for SPAs. The first is defining and using separate URLs—ones without fragment identifiers introduced by the "#" character—for each entity shown in the user interface.

The second step is providing meaningful HTML bodies for each entity, even though that HTML will not be seen by human users of the SPA.
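One way to picture the second step is a small server-side function that turns an entity's data into the kind of document shown earlier. This is a sketch under stated assumptions rather than a recommended implementation: the entity shape (`name` plus a `links` array) and the function names `escapeHtml` and `renderEntityPage` are hypothetical.

```javascript
// Hypothetical sketch: the entity shape ({ name, links }) and the function
// names are assumptions for illustration, not part of any real API.

// Escape text so it is safe in HTML element content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the HTML document served at an entity's URL: a meta element and a
// hidden body for search bots, plus the script that draws the real UI.
function renderEntityPage(entity) {
  const name = escapeHtml(entity.name);
  const items = entity.links.map(
    (link) => `      <li><a href="${escapeHtml(link.href)}">${escapeHtml(link.name)}</a></li>`
  );
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head>',
    `  <meta name="description" content="Personal information about ${name}" />`,
    '  <script src="/mySPA.js"></script>',
    '</head>',
    '<body>',
    '  <div style="display: none;">',
    `    <p>name: <span>${name}</span></p>`,
    '    <ul>',
    ...items,
    '    </ul>',
    '  </div>',
    '</body>',
    '</html>',
  ].join('\n');
}
```

A server could return the resulting string for a request to a URL such as https://acme.com/entity/Mickey, so that every entity URL yields a document with its own description and links while the SPA's JavaScript still produces what human users see.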

This post has outlined a basic approach to make SPAs indexable by search engines, but we have not yet linked the story to API design. We do that in the second and final post in this series.