Solving SEO problems with API design, part 2

July 17, 2017

Martin Nally

Software Developer and API designer, Apigee

Editor's note: This is the second of a two-post series on solving SEO problems with API design. The first post in the series can be found here.

In a previous post, we discussed SEO problems faced by single-page applications, or SPAs, and outlined a basic approach to make SPAs indexable by search engines. Here, we’ll discuss how API design fits into the picture.

Consider the implementation of the SPA. Imagine the SPA has already displayed the user interface for the fictitious API resource I’ll call Mickey Mouse, and the user has clicked on the link for the resource I’ll call Minnie Mouse. The SPA must perform the following steps:

Fetch the data for Minnie; let's say it's at https://api.acme.com/entity/Minnie
Add a URL for Minnie (say, https://ui.acme.com/entity/Minnie) to the browser history
Construct the user interface document object model (DOM) for Minnie in the browser (or fill new data into the existing one)
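
The three steps above can be sketched in JavaScript roughly as follows. This is only an illustration under stated assumptions: `navigateTo` and its `env` parameter are hypothetical names invented for this example, and `env.render` stands in for whatever code builds or updates the DOM. In a browser, `env.fetch` and `env.history` would be the standard `fetch` function and `window.history`.

```javascript
// Hypothetical helper: performs the three navigation steps for one entity.
// `env` is an assumed seam that supplies fetch, history, and a render callback,
// so that the sketch can also run outside a browser.
async function navigateTo(apiUrl, uiUrl, env) {
  const { fetch, history, render } = env;

  // Step 1: fetch the data for the entity as JSON.
  const response = await fetch(apiUrl, { headers: { Accept: 'application/json' } });
  if (!response.ok) {
    throw new Error(`Request for ${apiUrl} failed with status ${response.status}`);
  }
  const entity = await response.json();

  // Step 2: add the user-facing URL to the browser history.
  // In a real browser, uiUrl must have the same origin as the current page.
  history.pushState({ apiUrl }, '', uiUrl);

  // Step 3: construct (or refill) the DOM for the entity.
  render(entity);
  return entity;
}
```

In a page, a call might look like `navigateTo(apiUrl, uiUrl, { fetch: window.fetch, history: window.history, render: showEntity })`, where `showEntity` is whatever rendering function the SPA already has.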

Having two different URLs for Minnie—https://ui.acme.com/entity/Minnie and https://api.acme.com/entity/Minnie—and selecting which to use depending on whether you want HTML (for browsers) or JSON (for programmatic clients) is very common, but has downsides.

One is that you must have a rule for converting from one to the other, which has to be learned and separately documented. Also, if I want to email a link to Minnie to you, which one should I send? Or if I want to link to Minnie from another resource, which URL should I include?

The answer is: “it depends on what the recipient wants to do with the link.” You’ll probably also have to implement two servers to support these URLs.

Converging on a single URL

An alternative to this approach is to define a single URL for Minnie and use content negotiation to decide whether to return HTML or JSON. This means that each time a client (regardless of whether it’s a browser or a programmatic client) uses the unique URL for Minnie, it includes a header in the HTTP request to say whether, on this occasion, it wants HTML or JSON.

The browser itself will ask for HTML, while the JavaScript of a SPA running in the same browser will ask for JSON, both using the same URL. The Ruby on Rails framework popularized the use of content negotiation, but it is a core feature of HTTP that can be used in any programming language.
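
To make the idea concrete, here is a minimal sketch of how a Node.js request handler might choose between HTML and JSON from the Accept header. The function names (`parseAccept`, `chooseFormat`, `respond`) and the `renderHTML` callback are assumptions made for this example, not part of Node.js or any framework. The parsing is deliberately simplified: it honors q-values but ignores wildcards such as `text/*`, and it assumes JSON is the default when the client states no usable preference.

```javascript
// Parse an Accept header into media ranges sorted by descending q-value.
function parseAccept(header) {
  return (header || '*/*')
    .split(',')
    .map((part) => {
      const [type, ...params] = part.trim().split(';').map((s) => s.trim());
      const qParam = params.find((p) => p.startsWith('q='));
      const q = qParam === undefined ? 1 : parseFloat(qParam.slice(2));
      return { type: type.toLowerCase(), q: Number.isNaN(q) ? 1 : q };
    })
    .filter((range) => range.type !== '' && range.q > 0)
    .sort((a, b) => b.q - a.q);
}

// Return 'html' or 'json' for the given Accept header value.
// Assumption: JSON is the fallback when neither format is requested explicitly.
function chooseFormat(acceptHeader) {
  for (const { type } of parseAccept(acceptHeader)) {
    if (type === 'text/html' || type === 'application/xhtml+xml') return 'html';
    if (type === 'application/json') return 'json';
  }
  return 'json';
}

// Write one entity to a Node.js-style response in the negotiated format.
// `renderHTML` is a caller-supplied function that turns the entity into HTML.
function respond(req, res, entity, renderHTML) {
  // Tell caches that the representation depends on the Accept header.
  res.setHeader('Vary', 'Accept');
  if (chooseFormat(req.headers.accept) === 'html') {
    res.setHeader('Content-Type', 'text/html; charset=utf-8');
    res.end(renderHTML(entity));
  } else {
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify(entity));
  }
}
```

On the client side, the SPA would request the same URL with something like `fetch(url, { headers: { Accept: 'application/json' } })`, while a normal page load sends an Accept header that lists `text/html` first.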

I like to use this approach because it defines a single URL for each entity that can be used for either human web browsing or programmatic API access. I appreciate the fact that it makes my API immediately browsable using a standard web browser without any special plugins, tools, or API metadata descriptions—my SPA is also my API browser.

I can even browse the API with JavaScript turned off to see a more basic rendering of the API data in HTML, and I’ll see a direct rendering of the same HTML that search bots will see.

Note: You can also use this approach to support a degraded UI for browser users who cannot or will not turn on JavaScript in their browsers. Those users see a rendering of the same HTML a search bot sees.

Creating HTML from programming objects

This approach is easy to implement, because the HTML can be created algorithmically from the same programming objects that are normally used to produce JSON, without requiring coding or knowledge specific to the API. This is especially easy in the case described above because the HTML is not expected to produce a user interface, just a meaningful description of the entity for non-browser clients.

Producing this sort of HTML is not quite as simple as serializing objects in JSON—for which all the popular programming languages have built-in libraries—but it isn’t hard. At the end of this post, there's an appendix with a few lines of JavaScript code that you could include in a Node.js server for this purpose. Note that you don't ever have to accept HTML upon input in your API—it's sufficient to produce it on output.

Creating HTML from programming objects is implemented much more simply if the data in your objects includes links, even for the JSON case. Including links in JSON is something you should do in your APIs anyway, but that topic is the subject of other blog posts and covered in the eBook, “Web API Design: The Missing Link.”

Dealing with SPA load speed

In the discussion above, we considered the case where the SPA was already loaded and the user was navigating from Mickey to Minnie. When the SPA was first loaded, the sequence of events went something like this:

The browser loaded the HTML document at https://acme.com/entity/Mickey
The JavaScript file at https://acme.com/mySPA.js was loaded and executed
The DOM created from the HTML body was discarded or simply ignored; the JavaScript code saw that the URL that was loaded was the URL for Mickey, went back to the server to ask for the data for Mickey in JSON format, and then constructed a new browser DOM that is displayed to the user

This illustrates one of the downsides of SPAs: they typically load more slowly than "old-fashioned" HTML, in part because they load and execute lots of JavaScript, and in part because they have to go back to the server again for data.

Fortunately, they also load much less often. In practice, the first reason is usually more significant than the second, but optimizing the load times of your JavaScript is outside of the scope of this post.

The second reason for SPA slowness can be entirely eliminated if the SPA is willing to read the resource data from the HTML DOM it already has, rather than going back to the server for a JSON version. For this to work, we have to enhance the HTML body slightly as follows:

<!DOCTYPE html>
<html>
<head>
   <meta name="description" content="Personal information about Mickey Mouse" />
   <meta name="keywords" content="Mickey Mouse Donald Duck Minnie Goofy Disney" />
   <script src="/mySPA.js"></script>
</head>
<body>
   <div style="display: none;" resource="https://acme.com/entity/Mickey">
     <p>name: <span property="name" datatype="string">Mickey Mouse</span></p>
     <p>girlfriend: <a property="girlfriend" href="https://acme.com/entity/Minnie">Minnie Mouse</a></p>
     <p>friends:
       <ol property="friends">
         <li><a href="https://acme.com/entity/Donald">Donald Duck</a></li>
         <li><a href="https://acme.com/entity/Goofy">Goofy</a></li>
       </ol>
     </p>
   </div>
</body>
</html>

All I did was add the property names and types using the standard "property" and "datatype" attributes. For completeness, I also used the "resource" attribute to include the URL of the entity whose HTML this is—this is usually necessary for nested objects, but is a good idea even at the outer level.

On first load, my SPA can now read the data for Mickey directly from the DOM that has already been created by the browser from the HTML body, instead of going back to the server. This saves a server round-trip.

The JavaScript code to read this DOM is simple and general. I have not included source code for this, but it is similar in size and complexity to the code shown below for generating the same HTML. It essentially performs the inverse of the code to generate HTML from programming objects (happily, the HTML has already been parsed into a DOM by the browser).
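
For illustration only, a reader along these lines might look like the following sketch. It is not the code from the SPA described in this post: `entityFromDOM` and `valueFromElement` are names made up for the example, and nested objects are left out for brevity. It looks up elements by their "property" attribute rather than by position, so it does not depend on how the browser nests the surrounding p elements.

```javascript
// Convert one value-bearing element (a, ol, or span) back into a JavaScript value.
function valueFromElement(el) {
  const tag = el.tagName.toLowerCase();
  if (tag === 'a') {
    // Links carry their value in the href attribute.
    return el.getAttribute('href');
  }
  if (tag === 'ol') {
    // Each list item is assumed to wrap exactly one value element.
    return Array.from(el.children).map((li) => valueFromElement(li.children[0]));
  }
  const text = el.textContent;
  switch (el.getAttribute('datatype')) {
    case 'number':
      return Number(text);
    case 'boolean':
      return text === 'true';
    default:
      return text;
  }
}

// Rebuild a flat entity object from the element that carries the "resource" attribute.
// Assumption: every property value is marked with a "property" attribute,
// and there are no nested objects inside `root`.
function entityFromDOM(root) {
  const entity = {};
  const self = root.getAttribute('resource');
  if (self !== null) {
    entity._self = self;
  }
  for (const el of root.querySelectorAll('[property]')) {
    entity[el.getAttribute('property')] = valueFromElement(el);
  }
  return entity;
}
```

In the page shown earlier, the SPA could call `entityFromDOM(document.querySelector('[resource]'))` on first load instead of issuing a request for JSON.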

Semantic search

Parsing our own HTML in the SPA to save a server round-trip is not the only motivation for adding these attributes. Google search also understands them. In addition to reducing the load times for our SPA, we are providing richer data to the Google search engine. See RDFa, schema.org, or microdata for more information on this.

The attribute names I used in the example are from RDFa, although my use of them is not ideal, because my property names do not define useful URLs. This isn't hard to correct, but the detail of doing so would have been distracting. You can use schema.org or microdata attributes instead if you prefer, but you should stick to something that Google understands rather than inventing your own.

It's quite straightforward to write a SPA that is compatible with search engine optimization. All you have to do is use “regular” URLs instead of URI references with fragment identifiers (that is, those that include the “#” character) and make sure each of these URLs returns a simple HTML format that is useful for search engines but is never seen in the user interface.

You can improve the quality of the data that you provide to Google search and reduce the load times of your SPA at the same time if you include microdata-style attributes in the HTML you produce on the server for search bots, using one of several standards that Google supports.

And finally, you can simplify your overall implementation and improve your API if you unify your API with your UI by using the same URLs for both.

Appendix: Generating HTML from JavaScript objects

The following code assumes that the "body" parameter is a JavaScript object that you would normally serialize to JSON to produce the API representation of a resource. In other words, instead of writing JSON.stringify(anObject) to create the API response, you would write toHTML(anObject) to produce an HTML response.

function toHTML(body) {
  const increment = 25; // pixels of extra left padding per nesting level

  // Escape text so that data values cannot inject markup into the page.
  function escapeHTML(text) {
    return String(text)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;');
  }

  // Emit a property="..." attribute only when the value belongs to a named property.
  function propertyAttr(name) {
    return name === undefined ? '' : ` property="${escapeHTML(name)}"`;
  }

  function valueToHTML(value, indent, name) {
    if (typeof value === 'string') {
      // Strings that look like URLs become links; everything else is plain text.
      if (value.startsWith('http') || value.startsWith('./') || value.startsWith('/')) {
        return `<a href="${escapeHTML(value)}"${propertyAttr(name)}>${escapeHTML(value)}</a>`;
      }
      return `<span${propertyAttr(name)} datatype="string">${escapeHTML(value)}</span>`;
    } else if (typeof value === 'number' || typeof value === 'boolean') {
      return `<span${propertyAttr(name)} datatype="${typeof value}">${value}</span>`;
    } else if (Array.isArray(value)) {
      const items = value.map(x => `<li>${valueToHTML(x, indent)}</li>`);
      return `<ol${propertyAttr(name)}>${items.join('')}</ol>`;
    } else if (typeof value === 'object' && value !== null) {
      // The _self property, if present, is taken to be the URL of the object.
      const props = Object.keys(value).map(key => propToHTML(key, value[key], indent + increment));
      const resource = value._self === undefined ? '' : ` resource="${escapeHTML(value._self)}"`;
      return `<div${resource} style="padding-left:${indent + increment}px">${props.join('')}</div>`;
    }
    return ''; // null, undefined, and functions have no HTML representation here
  }

  function propToHTML(name, value, indent) {
    return `<p>${escapeHTML(name)}: ${valueToHTML(value, indent, name)}</p>`;
  }

  // Start at -increment so that the outermost object is rendered with no padding.
  return `<!DOCTYPE html><html><head></head><body>${valueToHTML(body, -increment)}</body></html>`;
}