Firejack Technologies: soap

When constructing APIs for complex systems, particularly web-based systems, REST is becoming the preferred protocol. While the trend is strong, many programmers are still used to designing and building standard Remote Procedure Call (RPC) style web services and SOAP and WSDL are still major players in integration. While REST is simple and straight-forward (once you get the hang of it), it seems at first blush to have difficulty with problems that standard function calls handle with ease.

Enter Pseudo Documents - conceptually sound business domain data critical to communication but not necessarily stored in your database.

Standard REST encourages exposure of the entire data model as an organized set of documents arranged in a logical folder structure. When considering directly exposing a database in this way, each table (or each major table) will have a path location on the web site, a simple defined document to represent its data, and support create, update, read list, read single item, and delete calls at that path using the POST, PUT, GET, GET with ID passed on the path, and DELETE HTTP methods respectively. Every major entity is arranged in a taxonomy by URL path as if each record were simply a document file and each table a directory on the web server. This structure makes it very easy to define the API for basic CRUD operations quickly with most of the thought going into organizing the taxonomy logically.

Where this simple strategy appears to fall down is around non-standard operations like searches, reporting, and batch operations. When operations don't map neatly to your data model, defining Pseudo Documents (you can think of them as data views or message) to represent specific actions allow the API to call out important concepts like report requests and search queries as first-order concepts. These special documents now declare data critical to specialty operations in the same way the data model entities themselves were before (actually these documents existed already in the procedural world - we just don't recognize them explicitly) and REST opens them up to being stored, cached, and even used in asynchronous services quite easily.

A great example of a Pseudo Document is a report request. Imagine the need to pull data from 3 joined, time-series data tables where the request takes in a date range (start and end dates). Let's assume the output data is also grouped by geographic region and includes total number of people, total spending, total revenue, and profit for each day and geographic area. In this case, the input is more complex than a simple identifier (although not much) and the output is not directly related to any one specific table. Moreover, the basic CRUD operations do not truly apply at all in the standard sense.

As the input of a report request becomes more complex, it often makes sense to think of the report request as a document in and of itself and to use the POST method as if you were creating a ticket to be fulfilled (i.e. CREATE Report Request). You can even merge the request and response into a single document so that requests may be created, queued, fulfilled, cached, and saved.

Here is an example request containing only the input data:

POST /data/report/geo/finance/day HTTP 1.0

content-type: application/json

{

    startDate: '10-10-2011',

    endDate: '11-12-2011'

}

In the simplest case the response comes back immediately along with the request data so that the document is still a single document as shown below.

content-type: application/json

{

    startDate: '10-10-2011',

    endDate: '11-12-2011',

    identifier: 12345678000,

    data: [

{

        region: 'southeast',

        date: '10-10-2011',

        people: 12332,

        spending: 1233.22,

        revenue: 2344.22,

        profit: 1111.00

      }, 
      {

        region: 'southeast',

        date: '10-10-2011',

        people: 12332,

        spending: 1233.22,

        revenue: 2344.22,

        profit: 1111.00

      }

...

]

}

Because we consider the document a record of a specific report request and it's result for the daily geographic financial report we can represent the entire report as a single document cleanly. We can also consider the POST a creation activity in line with the REST specification. This strategy also opens up the possibility of moving a long-running report to be fulfilled asynchronous by returning just an identifier and a status at creation time as shown below.

content-type: application/json

{

    startDate: '10-10-2011',

    endDate: '11-12-2011',

    identifier: 12345678001,

    status: 'pending'

}

Since the response indicates that the data is not ready and returns a unique identifier for the report request, the client can now poll for the results until they come back. We can use the GET method with this ID to poll for the results as shown here.

GET /data/report/geo/finance/day/12345678001 HTTP 1.0

content-type: application/json

While the report is being run, the response will for the GET method will continue to be identical to the pending status response above. Once the asynchronous report has completed however, the response can return with with the complete data for the report as shown in the longer response above.

Pseudo Documents like these allow you to leverage REST to accomplish very interesting things as easily as in any standard procedure call. The added benefit is that every critical message in the business service becomes a declared and defined part of the information taxonomy which often improves communication. This strategy also allows for algorithmic translation between REST and standard RPC services like SOAP or RMI to ensure parity across different types of interfaces.

Thursday, January 26, 2012

Pseudo Documents: Reconciling REST, Reporting, and RPC