Firejack Technologies: architecture


Sunday, May 6, 2012

The Ruby PHP Java Battle

Even though there are impressive solutions being built today on Ruby on Rails and PHP, and even though those languages have some clear advantages, when you have to build a serious computing solution you need a language that can handle the heavy lifting. Today's Internet projects all have different value propositions, and many are all about simple data and content (articles, blog entries, pictures, and videos) getting from source to repository and then out to the audience. Even Facebook fits mostly into this category, and it has proven you can build a very powerful and successful communication vehicle for millions of people using PHP. In contrast, Google builds almost all of its solutions in compiled languages like Java (GWT is an entire RIA framework that powers Google Calendar and Google Analytics, to name a few), and they are solving equally valuable but somewhat more complex problems.

The massive set of well-structured Java APIs and libraries for manipulating data and networks and managing machine-to-machine communication gives it a huge advantage over scripting solutions. Today's Java community allows you to make five servers talk to one another securely and smash data in memory to achieve distributed data-crunching solutions or cooperative web services in ways that Ruby and PHP cannot hope to compete with. Native binary communication between objects and other key operational advantages also put Java in a different weight class than these other solutions. Still, not every idea needs these advantages at the expense of time to market (or at least, not at first).

Most businesses are looking for instant gratification to quickly prove the viability of a business idea. A savvy entrepreneur will often look for an off-the-shelf product they can tailor to model their idea regardless of the technology. The drive to build tools that let an average person manipulate the Internet is a noble one, and one where scripting languages have had a leg up to date. In fact, with the ongoing expansion of cloud-based services like Google Checkout, PayPal, Google Maps, and the like, there are more and more things a basic script-based web site can do without really needing the horsepower to implement anything major from scratch. Successful businesses are cobbled together quickly every day using this strategy (often called a mash-up) on top of Ruby or PHP, sometimes using something like Drupal or WordPress as a baseline.

Once an idea or business grows to the point where real integration or large-scale problem solving is required, the immediate gratification advantages that scripting allows in earlier stages begin to disappear entirely. The more successful the idea, the worse this problem becomes, and soon the difference between deploying changes to a Ruby, PHP, or Java solution vanishes.

The Java community has been solving scaling and distributed computing problems for many years and has learned a great deal from the Ruby and PHP communities about time to market and flexibility. In fact, the entire Java community has been re-tooling to close the surface-level flexibility gap and is now very close to complete success. Companies like Firejack Technologies are pushing that envelope every day, hoping to save early stage companies the re-tooling effort that often becomes critical for survival in competitive industries.

The goal is to achieve completely re-configurable services and applications that don't require code changes to alter basic functionality. Business rules, security settings, page layouts, look and feel control, and user experience really don't belong in code at all, but the Java solutions for these things are still not accessible to average people. This is only a matter of time though, and soon the gap will close completely.

For many businesses, growth problems begin with operational issues that require workflow, security, auditing, complex accounting, and advanced business analytics. Cloud-based and off-the-shelf offerings can solve some of the basic problems, but the integration, scaling, and security concerns soon become far too complex for the small advantages of a scripting solution to matter. What is more, very few of the actual service providers themselves are using scripting languages to build their solutions, and for good reason.

In the end, companies or developers out to solve tomorrow's problems and provide services to other people the way Google, PayPal, and Facebook do would be better off using Java from the beginning. Keep in mind that Facebook was just a web site and came into billions of dollars before it started re-defining the Internet, so it could afford to sink as much time and energy as it liked into making its technology compete. In the near future, there will be tools that rival all of the advantages of scripting languages and still give Java programmers the power tools to make incredible innovations happen. For businesses that are all about information for consumers, scripting may still be your best friend, but for the rest, making the right technology choice up front can be the difference between raging success and squandered potential greatness.

Thursday, January 26, 2012

Pseudo Documents: Reconciling REST, Reporting, and RPC

When constructing APIs for complex systems, particularly web-based systems, REST is becoming the preferred style. While the trend is strong, many programmers are still used to designing and building standard Remote Procedure Call (RPC) style web services, and SOAP and WSDL are still major players in integration. While REST is simple and straightforward (once you get the hang of it), it seems at first blush to have difficulty with problems that standard function calls handle with ease.

Enter Pseudo Documents - conceptually sound business domain data critical to communication but not necessarily stored in your database.

Standard REST encourages exposure of the entire data model as an organized set of documents arranged in a logical folder structure. When considering directly exposing a database in this way, each table (or each major table) will have a path location on the web site, a simple defined document to represent its data, and support create, update, read list, read single item, and delete calls at that path using the POST, PUT, GET, GET with ID passed on the path, and DELETE HTTP methods respectively. Every major entity is arranged in a taxonomy by URL path as if each record were simply a document file and each table a directory on the web server. This structure makes it very easy to define the API for basic CRUD operations quickly with most of the thought going into organizing the taxonomy logically.
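The mapping from method and path to operation can be sketched in a few lines of Python. This is only an illustration of the convention described above: the function name `crud_operation` and the example path `/data/customer` are hypothetical, and the sketch assumes that PUT and DELETE address a single record by ID on the path, which is the common convention even though the text only spells it out for GET.

```python
def crud_operation(method: str, path: str, collection: str) -> str:
    """Map an HTTP method and path onto the CRUD operation it implies.

    `collection` is the path of the table-like "directory", for example
    "/data/customer". Anything after it is treated as a record ID
    (an assumption of this sketch).
    """
    method = method.upper()
    path = path.rstrip("/")
    collection = collection.rstrip("/")
    if path != collection and not path.startswith(collection + "/"):
        raise ValueError(f"{path} is not under {collection}")
    # Empty string when the request addresses the collection itself.
    record_id = path[len(collection):].lstrip("/")

    if method == "POST" and not record_id:
        return "create"
    if method == "GET":
        return "read item" if record_id else "read list"
    if method == "PUT" and record_id:
        return "update"
    if method == "DELETE" and record_id:
        return "delete"
    raise ValueError(f"{method} {path} has no CRUD meaning in this sketch")
```

For example, `crud_operation("GET", "/data/customer/42", "/data/customer")` yields `"read item"`, while the same call without the trailing ID yields `"read list"`.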

Where this simple strategy appears to fall down is around non-standard operations like searches, reporting, and batch operations. When operations don't map neatly to your data model, defining Pseudo Documents (you can think of them as data views or messages) to represent specific actions allows the API to call out important concepts like report requests and search queries as first-order concepts. These special documents now declare data critical to specialty operations in the same way the data model entities themselves did before (actually these documents existed already in the procedural world - we just didn't recognize them explicitly), and REST opens them up to being stored, cached, and even used in asynchronous services quite easily.

A great example of a Pseudo Document is a report request. Imagine the need to pull data from 3 joined, time-series data tables where the request takes in a date range (start and end dates). Let's assume the output data is also grouped by geographic region and includes total number of people, total spending, total revenue, and profit for each day and geographic area. In this case, the input is more complex than a simple identifier (although not much) and the output is not directly related to any one specific table. Moreover, the basic CRUD operations do not truly apply at all in the standard sense.

As the input of a report request becomes more complex, it often makes sense to think of the report request as a document in and of itself and to use the POST method as if you were creating a ticket to be fulfilled (i.e. CREATE Report Request). You can even merge the request and response into a single document so that requests may be created, queued, fulfilled, cached, and saved.

Here is an example request containing only the input data:

POST /data/report/geo/finance/day HTTP/1.0
Content-Type: application/json

{
    "startDate": "10-10-2011",
    "endDate": "11-12-2011"
}

In the simplest case the response comes back immediately along with the request data so that the document is still a single document as shown below.

Content-Type: application/json

{
    "startDate": "10-10-2011",
    "endDate": "11-12-2011",
    "identifier": 12345678000,
    "data": [
        {
            "region": "southeast",
            "date": "10-10-2011",
            "people": 12332,
            "spending": 1233.22,
            "revenue": 2344.22,
            "profit": 1111.00
        },
        {
            "region": "southeast",
            "date": "10-10-2011",
            "people": 12332,
            "spending": 1233.22,
            "revenue": 2344.22,
            "profit": 1111.00
        }
        ...
    ]
}

Because we consider the document a record of a specific report request and its result for the daily geographic financial report, we can represent the entire report cleanly as a single document. We can also consider the POST a creation activity in line with REST conventions. This strategy also opens up the possibility of fulfilling a long-running report asynchronously by returning just an identifier and a status at creation time as shown below.

Content-Type: application/json

{
    "startDate": "10-10-2011",
    "endDate": "11-12-2011",
    "identifier": 12345678001,
    "status": "pending"
}

Since the response indicates that the data is not ready and returns a unique identifier for the report request, the client can now poll for the results until they come back. We can use the GET method with this ID to poll for the results as shown here.

GET /data/report/geo/finance/day/12345678001 HTTP/1.0
Accept: application/json

While the report is being run, the response to the GET method will continue to be identical to the pending status response above. Once the asynchronous report has completed, however, the response can return with the complete data for the report as shown in the longer response above.
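The create-then-poll life cycle can be sketched with a small in-memory stand-in for the report resource. Everything named here is hypothetical (the `ReportRequests` class, its `post`, `get`, and `fulfill` methods, and the idea of a background worker calling `fulfill`); a real service would sit behind HTTP and persist its documents. The sketch also assumes that the finished document simply drops the `status` field so it matches the complete response shown earlier.

```python
import itertools


class ReportRequests:
    """In-memory stand-in for the resource at /data/report/geo/finance/day."""

    def __init__(self):
        self._ids = itertools.count(12345678001)  # arbitrary starting identifier
        self._documents = {}

    def post(self, request: dict) -> dict:
        """CREATE a report request; it is stored as 'pending' and returned."""
        identifier = next(self._ids)
        document = {**request, "identifier": identifier, "status": "pending"}
        self._documents[identifier] = document
        return dict(document)

    def fulfill(self, identifier: int, rows: list) -> None:
        """Called by a (hypothetical) background worker once the report has run."""
        document = self._documents[identifier]
        document.pop("status", None)  # assumption: finished documents carry no status
        document["data"] = rows

    def get(self, identifier: int) -> dict:
        """Poll by ID: the pending document until fulfilled, then the full one."""
        return dict(self._documents[identifier])


reports = ReportRequests()
ticket = reports.post({"startDate": "10-10-2011", "endDate": "11-12-2011"})
first_poll = reports.get(ticket["identifier"])   # still pending, no data yet
reports.fulfill(ticket["identifier"], [
    {"region": "southeast", "date": "10-10-2011", "people": 12332,
     "spending": 1233.22, "revenue": 2344.22, "profit": 1111.00},
])
second_poll = reports.get(ticket["identifier"])  # now carries the report data
```

Because the request and the result live in the same document, the same store could just as easily cache finished reports or replay old requests.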

Pseudo Documents like these allow you to leverage REST to accomplish very interesting things as easily as in any standard procedure call. The added benefit is that every critical message in the business service becomes a declared and defined part of the information taxonomy which often improves communication. This strategy also allows for algorithmic translation between REST and standard RPC services like SOAP or RMI to ensure parity across different types of interfaces.

Tuesday, January 17, 2012

Domain Driven: The Consumer is King

One of the key reasons great software is successful is that it does a very good job of using metaphors and representations of information that everyone understands. Consumers of software systems must dictate the language used to deal with their work, or intuition works against them at every turn. This Domain Dictionary of terms, concepts, and metaphors is already in use wherever we build software. Whether it is inherited from generally accepted concepts like finance or accounting, comes from an established industry like publishing or banking, or is part of what differentiates the way one specific organization operates from another, years of evolution have ingrained the ideas (if not the entire lexicon) of the Domain Language in the minds of your customers.

The Domain Driven community refers to the result of capturing the domain language as a Ubiquitous Language, stating that the terms of the language should pervade all communication and code. This concept is among the most powerful in Domain Driven Design and is incredibly valuable in ensuring that all members of a project team and the stakeholders communicate and share information properly. In the end, every API, class name, package name, URL pattern, and e-mail must reference the domain language (or ubiquitous language), and a good architect will ensure that these definitions are up to date and spread across the team in some easily accessible and referable form (wikis make a great repository for the glossary).

What is key about this approach is that, when done properly, it alters the typical ground-up (entity-first) construction of a system and focuses first on the consumer of the system and its services (top-down). This means leading with the language the consumers already use (visualizations that make sense help too) all the way down to the field level and making a commitment to every known view of information as they understand it. It also means abandoning the urge to force new metaphors or vocabulary on the consumers (your customers), at least as much as possible, and expecting the architect and development team to handle translation of the conceptual model or domain model to the underlying Entity Model (which deals more directly with data as it's stored in a database).

If necessary, this abstraction can carry down to the data model in the form of views or materialized views, particularly if your consumers feel strongly about reporting from the data model directly. Ideally, the chosen system architecture should support a remote Services API layer, a business domain services layer, and a separate data access layer where domain data objects (DTOs or VOs) can be translated into more granular or less business-friendly Entity Model representations (although you should keep this minimal if possible). Still, the code at the domain data layer should be almost understandable to anyone in the domain because it should be completely reliant on actions and domain information definitions taken directly from the Domain Language Lexicon (DLL) or Dictionary (DLD). Actions, Entities (or Objects), Processes, and Events will make up the backbone of the language, and those names should show up across the code base to ensure easy mapping between requirements, code, and sub-systems.
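A rough sketch of that translation step at the data access layer, in Python. The invoice domain, both class names, and every field name are invented for illustration; the point is only that the domain object speaks the consumer's language while the entity mirrors storage, and that the mapping lives in one place.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Invoice:
    """Domain object (DTO): named and shaped the way the consumer talks."""
    customer_name: str
    issued_on: date
    amount_due: float


@dataclass(frozen=True)
class InvoiceRow:
    """Entity Model: closer to how the data might be stored in a table."""
    cust_nm: str
    issue_dt: str        # ISO-8601 text, e.g. "2012-01-17"
    amt_due_cents: int   # money stored as whole cents


def to_entity(invoice: Invoice) -> InvoiceRow:
    """Translate the consumer-facing view into the storage-facing one."""
    return InvoiceRow(
        cust_nm=invoice.customer_name,
        issue_dt=invoice.issued_on.isoformat(),
        amt_due_cents=round(invoice.amount_due * 100),
    )


def to_domain(row: InvoiceRow) -> Invoice:
    """Translate a stored row back into the language of the domain."""
    return Invoice(
        customer_name=row.cust_nm,
        issued_on=date.fromisoformat(row.issue_dt),
        amount_due=row.amt_due_cents / 100,
    )
```

Services and APIs above this layer would only ever see `Invoice`, so the terse column-style names never leak into the language the consumers read.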

The more the underlying code base and user interface reinforce the DLD/DLL, the better the chance that the development team delivers a system that is intuitive and clear to its consumers. The same goes for basic web services APIs and other integration points. The name of the game is consistent semantics from top to bottom and a clean, clear flow from one operation to the next. The good news is that great architecture strategies like MVC, OOD, N-Tier, Event-Driven, and Process Driven are all compatible with Domain Driven Analysis where the consumer is king. Done right, everyone wins and innovation will be well received because it will speak to everyone when it's delivered.