Why do I even care what version your API is? Versioning your API with HATEOAS
This is another in my unofficial series on stuff to do with REST. You can see the others by clicking here for API Anti-Patterns or here for Cookies and the RESTful API.
Notice what I did there? Think hard. I linked some other documents that you can GET. Click’em and you’ll:
GET: http://blog.mikepearce.net/2010/08/24/cookies-and-the-restful-api/
What I did there was provide you with two hypertext links, one to each of the other posts about REST that I’ve written. This is HATEOAS. You see, you don’t actually need to know those two links above to be able to find them, you don’t even really need to know this link to THIS blog post, you only need to know my endpoint, which is http://blog.mikepearce.net. Navigate to there, and then you can get to each and every other page on my blog without having to know the URLs.
Even if you DID know my URLs (and by ‘know’ I mean bookmark, only the most diehard Mike Pearce fans memorise all my blog post URLs, dontcha Jon?) I might, one day, decide to change the schema, so they wouldn’t work. However, you’d get a 404 and a list of possible URLs, or I might even 301 redirect you if I was feeling nice (and this wasn’t hosted on wordpress.com). The thing about this hypermedia is that it’s so easy to get anywhere you like, the very definition of surfing.
So, now you’ve had a short tutorial on what HATEOAS means. Hypermedia As The Engine Of Application State essentially means that your clients shouldn’t be building their own URIs. They should be requesting one of your endpoints and retrieving a list of URIs with which they can interact using one of the HTTP verbs (GET, POST, PUT, DELETE). The idea (which I illustrated above) is that, as an API provider, you can change your URIs when you like, should you need to, and your clients will still know how to access all the resources of your API (if they’ve written their client to support that kind of thing). You could get away with one endpoint, the root of your API URI scheme. But it might be nice to provide one or two more which aren’t ever going to change, perhaps:
http://api.yourcompany.com/users
http://api.yourcompany.com/links
Further, you can then provide your users with a list of properties that the API response will return at each of these endpoints, each of which links to a URI they need to access to interact with your service. So, if the user were to:
GET: http://api.yourcompany.com/users
The response might be:
{
    "userMutateActions":
    {
        "add": "/users/add",
        "update": "/users/updateUser",
        "delete": "/users/deleteThatSucka"
    },
    "userReadOnlyActions":
    {
        "get.Details": "/users/$username",
        "get.Email": "/users/$username/email",
        "get.Dob": "/users/$username/dob"
    }
}
Now you’ve a contract with your clients that means they can visit any of the three endpoints and be able to navigate their way around the rest of your API. If you were a nice API provider you could provide examples of code for your clients to integrate into the software that interrogated your API for these endpoint URI indexes…
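As a rough illustration of what such client code might look like, here is a minimal Python sketch. It assumes the index response shown above, embeds it as a string so it runs without a network connection, and treats `$username` as a simple placeholder to be filled in. The `resolve` helper and the `API_ROOT` constant are hypothetical names made up for this example, not part of any real API.

```python
import json
from string import Template
from urllib.parse import urljoin

# Index document as the /users endpoint in the post might return it
# (embedded here so the sketch runs without a network connection).
USERS_INDEX = """
{
  "userMutateActions": {
    "add": "/users/add",
    "update": "/users/updateUser",
    "delete": "/users/deleteThatSucka"
  },
  "userReadOnlyActions": {
    "get.Details": "/users/$username",
    "get.Email": "/users/$username/email",
    "get.Dob": "/users/$username/dob"
  }
}
"""

API_ROOT = "http://api.yourcompany.com"  # the well-known root from the post


def resolve(index, group, action, **params):
    """Look up an action by its documented name and return an absolute URI."""
    template = index[group][action]  # raises KeyError if the contract is broken
    path = Template(template).substitute(**params)  # fills in $username etc.
    return urljoin(API_ROOT, path)


index = json.loads(USERS_INDEX)
email_uri = resolve(index, "userReadOnlyActions", "get.Email", username="mike")
add_uri = resolve(index, "userMutateActions", "add")
print(email_uri)
print(add_uri)
```

The client only hardcodes the property names ("get.Email", "add") and the root; every path comes from the index at run time.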
You see, I think that here it is a little fuzzy. The idea that a client can and should be able to bookmark a URI it knows about, call that URI later and have your API still respond is one of the tenets of a REST API, but it’s at odds with HATEOAS.
Anyway, that’s for eggheads and philosophers. Use your best judgment to decide when and where to do this kind of thing, how many endpoints you have and what you provide as URIs to your clients.
I should really get to the point described in the title. But you’ve probably worked it out by now…
There are a couple of ways of versioning an API (check out slides 88 – 93 of my API AntiPatterns presentation here): on the URL, in the body of a request, or with primary and secondary URIs. The thing is, after I wrote that and while I spent a bit of time reading more about cookies, I decided that none of those methods of versioning were any good. The answer is…
… don’t actually bother with versioning. If you’re adhering to the HATEOAS constraint and you’ve created some good, solid endpoints and some documentation that creates a contract with your clients as to the nomenclature of your URI properties, then you don’t need to version. Just change the URIs and the values of the URI properties at each endpoint and the client won’t even know you’ve done it.
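To make that concrete, here is a small Python sketch under stated assumptions: two made-up snapshots of the same index, one before and one after the provider changes a URI. The "/people/create" path and the `add_user_uri` function are hypothetical, invented purely for illustration.

```python
# Two hypothetical snapshots of the same index. The property names
# (the documented contract) stay fixed; the provider changes only the URI.
INDEX_BEFORE = {"userMutateActions": {"add": "/users/add"}}
INDEX_AFTER = {"userMutateActions": {"add": "/people/create"}}


def add_user_uri(index):
    """Client code depends only on the documented property names."""
    return index["userMutateActions"]["add"]


before = add_user_uri(INDEX_BEFORE)
after = add_user_uri(INDEX_AFTER)
print(before, "->", after)
```

The same client function works against both snapshots, which is the whole argument: as long as the property names hold, the URIs behind them can move without a version bump.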
Shhh, it’ll be our little secret!
Cookies and the RESTful API
Right, after my presentation at PHPLondon this month, the most contentious issue was that of using cookies with your REST API. I said, in no uncertain terms, that you shouldn’t do it. There were a few cries from the audience which were akin to the flapping you hear in a parliamentary broadcast, Derick Rethans didn’t agree but had the grace not to publicly embarrass me* and one comment on the original post requesting a clarification of my statement.
So, to clarify!
One of the most important constraints of REST is that it should be stateless, that is, every request made to the API should contain everything the application needs in order to service the request. Now, at its most terse, that is my clarification, however, the quicker witted and cleverer among you will be proud to announce that a cookie is part of an HTTP request, and you’d be right, so, more clarification is needed.
In order to get to my decision that cookies shouldn’t be used with a REST API, you have to consider HATEOAS (Hypermedia As The Engine Of Application State). What this means is that a client only needs to know one, well published, end point to your entire API and from there, they can navigate the whole damn thing. Many people have blogged on this (including the big man himself), so I shan’t go into it right now, take a look at those links, but it’s safe to say that your clients shouldn’t be building their own URIs, they should be given the URIs from the API. Following this through to its logical conclusion means that the client shouldn’t be storing anything that isn’t the well published end point, because you, as the service provider, could change the requirement of whatever it is the client has stored as a cookie on a whim and then the client is banjaxed.
An example you say? OK. Let’s assume for a moment that we all agree that storing login information on the client is NOT GOOD. Right? (Consider XSS briefly and you’ll understand why – although less relevant nowadays, it’s still relevant (just)). OK, so the only other logical thing we might want to store in a cookie is an authentication token. That seems fairly harmless in its implementation, no? Yes it is and really, we could end the post here as, at a push, this is probably the one thing that IS OK to store in a cookie.
But I don’t sanction it.
Why? Because it doesn’t adhere to the constraints of HATEOAS and statelessness. Now the client has a cookie with an authentication token in it, you cannot change the way you authenticate, or the way the token is created without breaking the link between your API and each client that is using that token. If you publish a change to your REST API which handles the tokens differently, each and every cookie token will be useless.
Granted, this example doesn’t really offer anything that couldn’t be fixed with another login, but that’s not the point. The point is, it’s not RESTful.
To wit; this is a rule you can break and, probably, get away with it. However, you’ll feel dirty as it won’t be REST; try not to do it, make Roy happy.
*Although, he did heckle me on many other points.
10 Hadoop-able Problems (a summary)
So, the new company I work for, Affiliate Window, are pretty awesome. Technically, they’re not driven by what is cool, or what the latest buzzword is on The Twitter that one of the directors saw on the telebox. They do what is necessary to get the job done, using the best tools. If this requires some in house dev, then time is found. If there’s a cool bit of tech from outside which fits the problem, then they’ll try it.
They’re also not hemmed in by the corporate, big enterprise world of “it’s the way others do it, so we should too”. They’re also good at long-term investment in their team and their tools. Plus, I get to use Ubuntu as my desktop. Rock on.
Anyway, a meeting was arranged for today where we could watch a presentation on Cloudera’s Hadoop (which you can see here at GoMeeting, although only on Windows and only after registering (great, more vendor lock-in!)). It was called ’10 Common Hadoopable Problems’, given by Jeff Hammerbacher (their Chief Scientist no less!), and was basically things that you can do with Hadoop (that aren’t counting words…). I thought I would summarise them here, although I’d encourage every last one of you to watch it as it’s pretty interesting.
- Modelling True Risk – If you think about this in the context of banks or other financial institutions (which is, well, banks) this is a really useful way of burrowing deeper into your customers. You can suck in data about their spending habits, their credit, repayments, everything. Munge it all together and squeeze out an answer on whether to lend them more money.
- Customer Churn Analysis – Hadoop was used here to analyse how a telco retained customers. Again, data from many different sources, including social networks AND the calls themselves (recorded and then voice analysed, I guess) were used to work out how and why the company were losing or gaining customers.
- Recommendation engines – I don’t really need to explain this one do I? Thinking about this in terms of Google, this is like the ranking algorithm: sucking in a bunch of factors like popularity, link depth and buzz on Twitter, and then scoring links for display in score order later.
- Ad Targeting – Similar to the recommendation engine, but with the added dimension of the advertiser paying a premium for better ad-space
- Point Of Sale Transaction Analysis – On the face of it, this seems simple and straightforward; analysing the data that is provided by your P.O.S device (your till). However, this could also include other factors like weather and local news, which could influence how and why consumers spend money in your store.
- Analysing Network Data To Predict Failure – The example given here was that of an electricity company which used smart-somethings to measure the electricity flying around their network. They could pump in past failures and current fluctuations and then pass the whole lot into a modelling engine to predict where failures would occur. It turned out that seemingly unconnected, small anomalies on the system were connected after all. This data wouldn’t have been able to be mined any other way.
- Threat Analysis/Fraud Detection – Another one for the financial sector and very similar to Modelling True Risk. Hadoop can be used to analyse spending habits, earnings and all sorts of other key metrics to work out whether a transaction is fraudulent. Yahoo! use Hadoop with this pattern to ascertain whether a certain piece of mail heading into Yahoo! Mail is actually spam.
- Trade Surveillance – Similar to Threat Analysis and Fraud Detection, but this time pointed squarely at the markets, analysing gathered historical and current live data to see if there is Insider Trading or Money Laundering afoot!
- Search Quality – Similar to the recommendation engine. This will analyse search attempts and then try to offer alternatives, based on data gathered and pumped into Hadoop about the links and the things people search for.
- Data “Sandbox” – This is probably the most ambiguous, but the most useful Hadoop-able problem. A data sandbox is just somewhere to dump data that you previously thought was too big, or useless or disparate to get any meaningful data from. Instead of just chucking it away, throw it into Hadoop (which can easily handle it) then see if there IS data you can glean from it. It’s cheap to run Hadoop and anyone can attach a datasource and push data in. It allows you to make otherwise arbitrary queries about stuff to see if it’s any use!
As you can see, most of these boil down to “Aggregate Data, Score Data, Present Score As Rank”, which, at its simplest, is what Hadoop can do. But the introduction of the idea of a Data Sandbox and the ability, using Sqoop, to push the analysed data back into a relational database (for a data warehouse for example) means that you can run Hadoop independently and prove its worth in your business very cheaply.
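The “Aggregate Data, Score Data, Present Score As Rank” pattern can be sketched in a few lines of plain Python. This is not Hadoop code, just the same shape in miniature; the records, the per-source weights and the paths are all made up for the example.

```python
from collections import defaultdict

# Hypothetical click records: (link, source) pairs from different data sources.
records = [
    ("/posts/rest", "search"),
    ("/posts/hadoop", "twitter"),
    ("/posts/rest", "twitter"),
    ("/posts/rest", "email"),
    ("/posts/hadoop", "search"),
    ("/posts/cookies", "search"),
]

# Assumed weights per source; a real scoring model would be far richer.
WEIGHTS = {"search": 1.0, "twitter": 2.0, "email": 0.5}

# Aggregate: group records by key (roughly the map and shuffle steps).
grouped = defaultdict(list)
for link, source in records:
    grouped[link].append(source)

# Score: reduce each group to a single number.
scores = {link: sum(WEIGHTS[s] for s in sources) for link, sources in grouped.items()}

# Rank: present the keys in descending score order.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

On a real cluster the grouping and scoring would be spread across many machines, but the three steps stay the same.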
API Anti-Patterns (how NOT to write a RESTful API)
[Vimeo http://vimeo.com/13922981 w=640&h=385]
I had the honor of giving another talk at PHPLondon this month. Although I only had two weeks’ notice to research and write the thing, I think I managed to pull it off!
The talk was on API Anti-Patterns. I’d originally thought about doing a talk on How To Write a RESTful API, but the topic is enormous and sprawling and I only had 30 mins. So, I flipped the idea on its head and wrote about the things which we find with supposedly RESTful APIs which really aren’t RESTful. It’s shorter and, more importantly, funnier. So, below you’ll find the video and the slides from the night. Get in touch if you have any questions.



