Sunday, May 6, 2012

The Ruby PHP Java Battle

Even though there are impressive solutions being built today on Ruby on Rails and PHP and even though those languages have some clear advantages, when you have to build a serious computing solution you need a language that can handle the heavy lifting. Today's Internet projects all have different value propositions and many are all about simple data and content (articles, blog entries, pictures, and videos) getting from source to repository and then out to the audience. Even Facebook fits mostly into this category and it has proven you can build a very powerful and successful communication vehicle for millions of people using PHP. In contrast, Google builds almost all of its solutions in compiled languages like Java (GWT is an entire RIA framework that powers Google Calendar, and Google Analytics to name a few) and they are solving equally valuable but somewhat more complex problems.

Java's massive set of well-structured APIs and libraries for manipulating data and networks and managing machine-to-machine communication gives it a huge advantage over scripting solutions. Today's Java ecosystem allows you to make five servers talk to one another securely and smash data in memory to achieve distributed data-crunching solutions or cooperative web services in ways that Ruby and PHP cannot hope to compete with. Native binary communication between objects and other key operational advantages also put Java in a different weight class than these other solutions. Still, not every idea needs these advantages at the expense of time to market (or at least, not at first).

Most businesses are looking for instant gratification to quickly prove the viability of a business idea. A savvy entrepreneur will often look for an off-the-shelf product they can tailor to model their idea regardless of the technology. The drive to build tools that let an average person manipulate the Internet is a noble one, and one where scripting languages have had a leg up to date. In fact, with the ongoing expansion of cloud-based services like Google Checkout, PayPal, Google Maps, and the like, there are more and more things a basic script-based web site can do without really needing the horsepower to implement anything major from scratch. Successful businesses are cobbled together quickly every day using this strategy (often called a mash-up) on top of Ruby or PHP, sometimes using something like Drupal or WordPress as a baseline.

Once an idea or business grows to the point where real integration or large-scale problem solving is required, the immediate-gratification advantages that scripting allows in earlier stages begin to disappear entirely. The more successful the idea, the worse this problem becomes, and soon the difference between deploying changes to a Ruby, PHP, or Java solution vanishes.

The Java community has been solving scaling and distributed computing problems for many years and has learned a ton from the Ruby / PHP communities about time to market and flexibility. In fact, the entire Java community has been re-tooling to close the surface-level flexibility gap and is now very close to complete success. Companies like Firejack Technologies are pushing that envelope every day, hoping to save early stage companies the re-tooling effort that often becomes critical for survival in competitive industries.

The goal is to achieve completely re-configurable services and applications that don't require code changes to alter basic functionality. Business rules, security settings, page layouts, look and feel control, and user experience really don't belong in code at all, but the Java solutions for these things are still not accessible to average people. This is only a matter of time though, and soon the gap will close completely.

For many businesses, their growth problems begin with operational issues that require workflow, security, auditing, complex accounting, and advanced business analytics. Cloud based and off-the-shelf offerings can solve some of the basic problems, but the integration, scaling, and security concerns soon become far too complex for the small advantages that the scripting solution may have to matter. What is more, very few of the actual service providers themselves are using scripting languages to build their solutions, and for good reason.

In the end, companies or developers out to solve tomorrow's problems and provide services to other people the way Google, PayPal, and Facebook do would be better off using Java from the beginning. Keep in mind Facebook was just a web site and came into billions of dollars before it started re-defining the Internet, so it could afford to sink as much time and energy as it liked into making its technology compete. In the near future, there will be tools that rival all of the advantages of scripting languages and still give Java programmers the power tools to make incredible innovations happen. For businesses that are all about information for consumers, scripting may still be your best friend, but for the rest, making the right technology choice up front can be the difference between raging success and squandered potential greatness.

Thursday, January 26, 2012

Pseudo Documents: Reconciling REST, Reporting, and RPC

When constructing APIs for complex systems, particularly web-based systems, REST is becoming the preferred approach. While the trend is strong, many programmers are still used to designing and building standard Remote Procedure Call (RPC) style web services, and SOAP and WSDL are still major players in integration. While REST is simple and straightforward (once you get the hang of it), it seems at first blush to have difficulty with problems that standard function calls handle with ease.

Enter Pseudo Documents - conceptually sound business domain data critical to communication but not necessarily stored in your database.

Standard REST encourages exposure of the entire data model as an organized set of documents arranged in a logical folder structure. When considering directly exposing a database in this way, each table (or each major table) will have a path location on the web site, a simple defined document to represent its data, and support create, update, read list, read single item, and delete calls at that path using the POST, PUT, GET, GET with ID passed on the path, and DELETE HTTP methods respectively. Every major entity is arranged in a taxonomy by URL path as if each record were simply a document file and each table a directory on the web server. This structure makes it very easy to define the API for basic CRUD operations quickly with most of the thought going into organizing the taxonomy logically.
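As a rough sketch of the mapping described above, the five calls for one table can be written out as a small routing table. This is only an illustration in Python, not any framework's API: the `crud_routes` helper, the `customer` entity, and the `/data` prefix are all assumptions made up for the example, as is the choice to pass the ID on the path for update and delete.

```python
# Hypothetical sketch: map the five CRUD calls for one entity (table)
# to an HTTP method and a path. Names and prefix are illustrative only.

def crud_routes(entity, prefix="/data"):
    """Return the standard REST routes for a single entity."""
    collection = f"{prefix}/{entity}"   # the "directory" for the table
    item = f"{collection}/{{id}}"       # one "document" (record) in it
    return {
        "create":      ("POST",   collection),
        "update":      ("PUT",    item),        # assumption: ID on the path
        "read_list":   ("GET",    collection),
        "read_single": ("GET",    item),
        "delete":      ("DELETE", item),        # assumption: ID on the path
    }

routes = crud_routes("customer")
for operation, (method, path) in routes.items():
    print(f"{operation:12} {method:6} {path}")
```

Because every table follows the same pattern, most of the design effort goes into choosing entity names and where they sit in the path taxonomy.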

Where this simple strategy appears to fall down is around non-standard operations like searches, reporting, and batch operations. When operations don't map neatly to your data model, defining Pseudo Documents (you can think of them as data views or messages) to represent specific actions allows the API to call out important concepts like report requests and search queries as first-order concepts. These special documents now declare data critical to specialty operations in the same way the data model entities themselves did before (actually these documents existed already in the procedural world - we just didn't recognize them explicitly) and REST opens them up to being stored, cached, and even used in asynchronous services quite easily.

A great example of a Pseudo Document is a report request. Imagine the need to pull data from 3 joined, time-series data tables where the request takes in a date range (start and end dates). Let's assume the output data is also grouped by geographic region and includes total number of people, total spending, total revenue, and profit for each day and geographic area. In this case, the input is more complex than a simple identifier (although not much) and the output is not directly related to any one specific table. Moreover, the basic CRUD operations do not truly apply at all in the standard sense.

As the input of a report request becomes more complex, it often makes sense to think of the report request as a document in and of itself and to use the POST method as if you were creating a ticket to be fulfilled (i.e. CREATE Report Request). You can even merge the request and response into a single document so that requests may be created, queued, fulfilled, cached, and saved.

Here is an example request containing only the input data:

POST /data/report/geo/finance/day HTTP/1.0
content-type: application/json

{
    "startDate": "10-10-2011",
    "endDate": "11-12-2011"
}

In the simplest case the response comes back immediately along with the request data so that the document is still a single document as shown below.

content-type: application/json

{
    "startDate": "10-10-2011",
    "endDate": "11-12-2011",
    "identifier": 12345678000,
    "data": [
      {
        "region": "southeast",
        "date": "10-10-2011",
        "people": 12332,
        "spending": 1233.22,
        "revenue": 2344.22,
        "profit": 1111.00
      },
      {
        "region": "southeast",
        "date": "10-10-2011",
        "people": 12332,
        "spending": 1233.22,
        "revenue": 2344.22,
        "profit": 1111.00
      }
      ...
    ]
}

Because we consider the document a record of a specific report request and its result for the daily geographic financial report, we can represent the entire report cleanly as a single document. We can also consider the POST a creation activity in line with REST conventions. This strategy also opens up the possibility of fulfilling a long-running report asynchronously by returning just an identifier and a status at creation time as shown below.

content-type: application/json

{
    "startDate": "10-10-2011",
    "endDate": "11-12-2011",
    "identifier": 12345678001,
    "status": "pending"
}

Since the response indicates that the data is not ready and returns a unique identifier for the report request, the client can now poll for the results until they come back. We can use the GET method with this ID to poll for the results as shown here.

GET /data/report/geo/finance/day/12345678001 HTTP/1.0
accept: application/json

While the report is being run, the response for the GET method will continue to be identical to the pending status response above. Once the asynchronous report has completed, however, the response can return with the complete data for the report as shown in the longer response above.
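The create-then-poll flow described above can be sketched end to end with an in-memory stand-in for the server. Everything here is an assumption made for illustration: the `ReportService` class, the `poll` helper, and the `"complete"` status value (the article only shows `"pending"`). A real client would issue the POST and GET calls over HTTP and sleep between attempts.

```python
import itertools


class ReportService:
    """In-memory stand-in for the report endpoint (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(12345678001)
        self._documents = {}

    def post(self, request):
        """POST: create a report request document and return it as pending."""
        identifier = next(self._ids)
        document = dict(request, identifier=identifier, status="pending")
        self._documents[identifier] = document
        return dict(document)

    def fulfill(self, identifier, rows):
        """Simulate the background job finishing for this document."""
        document = self._documents[identifier]
        document["status"] = "complete"   # assumed status name
        document["data"] = rows

    def get(self, identifier):
        """GET: return the current state of the same document."""
        return dict(self._documents[identifier])


def poll(service, identifier, max_attempts=5, on_wait=lambda attempt: None):
    """GET the document until it is no longer pending or attempts run out."""
    document = service.get(identifier)
    for attempt in range(max_attempts):
        if document["status"] != "pending":
            break
        on_wait(attempt)                  # a real client would sleep here
        document = service.get(identifier)
    return document


service = ReportService()
created = service.post({"startDate": "10-10-2011", "endDate": "11-12-2011"})


def finish_on_second_wait(attempt):
    # Pretend the report completes while the client waits the second time.
    if attempt == 1:
        service.fulfill(created["identifier"],
                        [{"region": "southeast", "date": "10-10-2011"}])


result = poll(service, created["identifier"], on_wait=finish_on_second_wait)
print(created["status"], "->", result["status"])
```

The point of the sketch is that the request, its status, and its result are all the same document at different moments, which is what makes it easy to queue, cache, or save.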

Pseudo Documents like these allow you to leverage REST to accomplish very interesting things as easily as in any standard procedure call. The added benefit is that every critical message in the business service becomes a declared and defined part of the information taxonomy which often improves communication. This strategy also allows for algorithmic translation between REST and standard RPC services like SOAP or RMI to ensure parity across different types of interfaces.
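One way to picture the algorithmic translation mentioned above is a function that turns an HTTP method and document path into an RPC-style operation name. This is a sketch under assumptions of our own: the naming convention (verb plus capitalized path segments), the `rest_to_rpc` helper, and the rule that a trailing numeric segment is the document identifier are not part of SOAP, RMI, or any standard.

```python
# Hypothetical naming convention for deriving an RPC operation from a REST call.
VERBS = {"POST": "create", "PUT": "update", "GET": "get", "DELETE": "delete"}


def rest_to_rpc(method, path, prefix="/data"):
    """Translate (method, path) into (operation name, identifier or None)."""
    if path.startswith(prefix + "/"):
        path = path[len(prefix):]
    segments = [segment for segment in path.split("/") if segment]

    # Assumption: a trailing all-digit segment is the document identifier.
    identifier = None
    if segments and segments[-1].isdigit():
        identifier = int(segments.pop())

    verb = VERBS[method]
    if method == "GET" and identifier is None:
        verb = "list"                     # GET on the collection path

    name = verb + "".join(segment.capitalize() for segment in segments)
    return name, identifier


print(rest_to_rpc("POST", "/data/report/geo/finance/day"))
print(rest_to_rpc("GET", "/data/report/geo/finance/day/12345678001"))
```

Since the Pseudo Document already names the message, the RPC signature falls out of the path rather than being designed separately.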

Tuesday, January 17, 2012

Domain Driven: The Consumer is King

One of the key reasons great software is successful is that it does a very good job of using metaphors and representations of information that everyone understands. Consumers of software systems must dictate the language used to deal with work, or intuition works against them at every turn. This Domain Dictionary of terms, concepts, and metaphors is already in use wherever we build software. Whether it is inherited from generally accepted concepts like finance or accounting, comes from an established industry like publishing or banking, or is part of what differentiates the way one organization operates from another, years of evolution have ingrained the ideas (if not the entire lexicon) of the Domain Language in the minds of your customers.

The Domain Driven community refers to the result of capturing the domain language as a Ubiquitous Language, stating that the terms of the language should pervade all communication and code. This concept is among the most powerful in Domain Driven Design and incredibly valuable in ensuring all members of a project team and the stakeholders communicate and share information properly. In the end, every API, class name, package name, URL pattern, and e-mail must reference the domain language (or ubiquitous language), and a good architect will ensure that these definitions are up to date and spread across the team in some easily accessible and referable form (wikis make a great repository for the glossary).

What is key about this approach is that, when done properly, it alters the typical entity-first (ground-up) construction of a system and focuses on the consumer of the system and its services first (top-down). This means leading with the language the consumers already use (visualizations that make sense help too) all the way down to the field level and making a commitment to every known view of information as they understand it. It also means abandoning the urge to force new metaphors or vocabulary on the consumers (your customers), at least as much as possible, and expecting the architect and development team to handle translation of the conceptual model or domain model to the underlying Entity Model (which deals more directly with data as it's stored in a database).

If necessary, this abstraction can carry down to the data model in the form of views or materialized views, particularly if your consumers feel strongly about reporting from the data model directly. Ideally, the chosen system architecture should support a remote Services API layer, a business domain services layer, and a separate data access layer where domain data objects (DTOs or VOs) can be translated into more granular or less business-friendly Entity Model representations (although you should keep this minimal if possible). Still, the code at the domain data layer should be almost understandable to anyone in the domain because it should be completely reliant on actions and domain information definitions directly from the Domain Language Lexicon (DLL) or Dictionary (DLD). Actions, Entities (or Objects), Processes, and Events will make up the backbone of the language and those names should show up across the code base to ensure easy mapping between requirements, code, and sub-systems.
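A small sketch may help show where the translation between the domain model and the Entity Model lives. The `Invoice` domain object, the `InvoiceRow` entity, the column names, and the `InvoiceDataAccess` class are all hypothetical examples invented here; the only point is that the consumer-facing object keeps the domain language while the data access layer alone knows the storage shape.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """Domain object: named and shaped the way the consumer talks about it."""
    invoice_number: str
    customer_name: str
    total: Decimal


@dataclass(frozen=True)
class InvoiceRow:
    """Entity model: shaped the way the data is stored (hypothetical columns)."""
    inv_no: str
    cust_id: int
    amount_cents: int


class InvoiceDataAccess:
    """Data access layer: the only place that knows both representations."""

    def __init__(self, customer_ids):
        self._customer_ids = dict(customer_ids)   # customer name -> id
        self._customer_names = {v: k for k, v in self._customer_ids.items()}

    def to_entity(self, invoice):
        return InvoiceRow(
            inv_no=invoice.invoice_number,
            cust_id=self._customer_ids[invoice.customer_name],
            amount_cents=int(invoice.total * 100),
        )

    def to_domain(self, row):
        return Invoice(
            invoice_number=row.inv_no,
            customer_name=self._customer_names[row.cust_id],
            total=Decimal(row.amount_cents) / 100,
        )


access = InvoiceDataAccess({"Acme Publishing": 42})
invoice = Invoice("INV-1001", "Acme Publishing", Decimal("1233.22"))
row = access.to_entity(invoice)
print(row)
print(access.to_domain(row) == invoice)
```

Code in the services layer would only ever see `Invoice`, so it reads in the domain language even when the stored form is less friendly.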

The more the underlying code base and user interface reinforce the DLD/DLL, the better the chance of the development team delivering a system that is intuitive and clear to its consumers. The same goes for basic web services APIs and other integration points. The name of the game is consistent semantics from top to bottom and a clean, clear flow from one operation to the next. The great news is that great architecture strategies like MVC, OOD, N-Tier, Event-Driven, and Process Driven are all compatible with Domain Driven Analysis where the consumer is king. Done right, everyone wins and innovation will be well received because it will speak to everyone when it's delivered.


Sunday, January 15, 2012

Making Global Teams Agile (part 3): Great Agile Documentation

As a follow up to the continuing conversation about making globally distributed teams work in agile fashion, we focus this article on what it really takes to make requirements documentation work. The most important step in getting this right is to recognize "this isn't your father's requirements document anymore" (although that might be his car commercial tag line) and that "requirements" are more than just a few words in a story. What all great agile requirements documents have in common is that they are short, to the point, focus on the visual, incorporate functional and technical guidance in one place, and they are expected to be iterative.

In part 2 we discussed using agile tracking tools like JIRA, VersionOne, and Pivotal Tracker along with a wiki to capture requirements. Wikis are great for requirements from one perspective because they are easy to author, quick to edit, and track versions over time (which is invaluable in understanding change). Every day that goes by on an agile project, people have questions about how and what to develop to meet the needs of the customer. Capturing those answers as you go, and making sure there is clear guidance the team can always get to wherever the customer and technical leads expect consistency, means the difference between success and failure. It takes surprisingly little effort to start: just putting together a simple wiki page for each area of your project and keeping it up to date incrementally makes an enormous difference and saves tons of time and mistakes.

While wikis do a great job of tracking change over time and are often more consolidated than your stories (sometimes you'll see 20 or more stories for one functional area of your project), they have 2 major shortcomings: they still spread work out across many places and they cannot be used while disconnected. Sometimes, it is just plain easier to use a real document or tool to share details. CASE tools like VP-UML and Poseidon (I cite these for their platform independence) are great for sharing technical intent across a distributed team, and the ability to share documents like database diagrams, class diagrams, and sequence diagrams (these are some of the most useful visualizations for communicating technical direction) can save hours or even days of re-work. Throwing implementation guidelines, a handful of visualizations, and requirements for the shared standards and a couple of functional areas of the project into a single, portable Word document (as an example) can also be extremely helpful for allowing developers to work disconnected and stay on target. Whichever document types your team uses, they should be treated just like code and should be checked in and shared via your source code repository (probably Git or SVN these days) or at the very least uploaded to the wiki in an appropriate location. Again, these documents are works in progress and their content may get moved onto or pulled from the wiki as appropriate to the needs of the team.

Focusing on the visual in these documents is another important way you can set the team up to succeed. Basic wire-frames or designs of interfaces are the first place to start. From there, descriptions of data in the form of an annotated ERD diagram and a basic class and package diagram ensure that the team understands what is expected for delivery, what information is involved, and how the technical lead and team agree the basic structure should be implemented. For projects using scripting languages like Ruby and PHP, you can still use component diagrams to communicate this. What is great to realize is that a project with 100 functional features probably only requires 10 - 12 of these pictures in place for the team to infer how everything should work, and you can build these in just a couple of hours / week and save hours and hours of miscues and confusion in every sprint. If you're skeptical, try doing a search through your inbox for one of your story names and see how many e-mails got traded over the course of a single sprint and then try spending the time on docs and repeat the exercise.

Back in the days of Waterfall and the Rational Unified Process (RUP), the software architecture community pushed a very formal methodology for doing analysis and design that often led (and still does sometimes today) to 2 or 3 separate and specific documents for a project. These documents became rigid and required layers of change management to alter and a repeat of the entire waterfall process. Agile documents still benefit from structure, but it is far better to merge scope, functional requirements, and technical guidance into one iterative document (or at least one per area). The skills of a great analyst, architect, and development lead are still incredibly valuable and applicable to agile or SCRUM projects, but now they work as a team to produce the exact amount of guidance needed at any given time. The easier this information is to access and have at the developer's fingertips, the better, and the clearer the authoritative source of the latest ideas is, the less time the team wastes in confusion and re-work. It shouldn't be uncommon to see a 4 page document that shows 2 use cases, some functional requirements, a wireframe, a class diagram, and a database diagram mixed together. You'll find that much of this content and many of these concepts are re-usable from story to story and function to function, and if you commit to iterating you produce great guidance that keeps the team moving as one to get the job done with minimal week to week effort.

As we said, this is not your father's documentation, but a few hours / week spent doing this out ahead of the project team can easily double your team's velocity and effectiveness. The trick is right-sizing and simplifying the documentation to your team's needs, establishing templates and patterns that make producing and reading these documents easy, keeping the documents in some kind of version management system and easily accessible to everyone (online and offline), and accepting the iterative nature of the documents and the process. The goals are to save time and create shared vision to improve execution. In the end, team synergy and communication account for more than half of the team's effectiveness and a great documentation process can make it look easy.

Saturday, January 14, 2012

Making Global Teams Agile (Part 2): Tools and Documents

This is the second part of our series about what we've learned about Agile over the past 2 years working hard to get to a simple, standard process for making Agile and SCRUM work in almost any team dynamic. As we've said before, our teams focus on making resources that live and work all over the globe behave and feel like one highly effective team.

We know the keys to success are picking an agile methodology (in this case SCRUM) that allows for frequent, short communication (Meetings) and selecting tools (Tools) and documentation templates (Documentation) that ensure all team members and stakeholders communicate consistently and effectively. We'll talk more about commitment to stakeholder / client communication and ensuring a trusted presence in a follow-up article but in this entry we focus on what you need from your tools and the importance of Agile project documentation.

There are many Agile and SCRUM centered tools on the market today, both for use in the cloud and for installation in house. Many of these tools are adaptations of traditional ERP system modules from larger companies, others have grown up in the open source community, and still others are lean and focused on just SCRUM. VersionOne, Pivotal Tracker, and JIRA from Atlassian are among the most popular and widely used tools on the market and each represents a slightly different class and approach to dealing with Agile projects. Selecting the right tool to track stories, issues, and defects for your team is important, but at the end of the day, using the tool for just the right amount of process is what leads to truly effective and successful teams. Many teams couple one of these tools with a wiki of some kind and cross-link (using URLs) between the two systems. Wikis make a much more coherent and version-aware record of detailed decisions than a collection of stories or epics (item entries in the agile tracking tool) does, and accepting the need for a mix of wiki and other documentation is critical to making teams successful.

Priority number one in establishing a tool is to set up a simple process and establish common understanding of the language of the tool across the team. Processes and tools, just like meetings (see part 1), have a tendency to become heavy-weight over time and cut deep into the time available to do actual development work. Striking a balance between effective project tracking and reporting and protecting the 30 hour + work week is crucial no matter which tool you select.

Typically, Agile tools provide ways of doing the following things: describing requirements, prioritizing features and defects, estimating work (in points and sometimes hours), tracking item status and progress, calculating team efficiency over time (velocity), and planning and closing sprints and releases. In the end, all of these tools are just issue tracking tools with various features for grouping items (issues, epics, stories, bugs) into sprints and releases and then assigning them to team members. Every team member should have access to the tool and all work should go into the tool and get updated regularly as part of the process. Where project tracking requirements begin to overshadow work, project managers or SCRUM masters should try to shield the team from anything more than about an hour / week of effort by updating details as part of recurring status meetings or offline after these meetings.
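Velocity itself is simple arithmetic, and a quick sketch shows roughly what the tools calculate. The rolling three-sprint window, the function names, and the sample point totals below are assumptions for illustration; each tool defines its own variant.

```python
import math


def velocity(completed_points, window=3):
    """Average points completed per sprint over the most recent sprints."""
    recent = completed_points[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)


def sprints_remaining(backlog_points, completed_points, window=3):
    """Rough forecast: backlog size divided by velocity, rounded up."""
    current = velocity(completed_points, window)
    if current == 0:
        raise ValueError("no completed sprints to base a forecast on")
    return math.ceil(backlog_points / current)


history = [21, 18, 24, 27]   # made-up points completed in past sprints
print(velocity(history))                 # average of the last 3 sprints
print(sprints_remaining(120, history))   # sprints needed for 120 points
```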

While it is definitely good practice to keep all possible details of a story in the tool itself to facilitate work, often linking out to a wiki page or a document (or both) makes maintenance of stories like feature enhancements and bugs far simpler. It's also often much easier to edit and compose complex requirements in this way. The process for driving out details (technical and functional) and reconciling the impact of change is a key missing prescription in the standard SCRUM methodology (and most Agile methodologies), and doing it well means the difference between successful sprints and failed ones. If the team does not achieve the intent of a story in the customer's mind, arguing about the content of the story is useless. A story is only truly successful if the client or customer likes what he / she sees and is willing to accept it (at least temporarily), and it should be treated as such. Establishing standard templates for capturing and maintaining requirements incrementally as they evolve, and integrating these documents with the meeting strategy (meetings should result in documents getting approved or even updated) and tools, allows you to have "just the right amount" of documentation at all times. It also prevents repetition as functional requirements change and evolve (which should be expected).

Selecting the right tool, establishing a shared language, and formalizing some simple templates for integrating documents, wikis, and your tracking tool can radically improve execution and communication on an agile project. Mixed with periodic execution reviews and testing, these agile documents lower the work of maintaining details in the tool and make details more easily accessible and manageable over time. In the end, three or four document templates, a clear wiki page strategy, and some standards go a long way toward rounding out the agile process and making your team excellent - no matter where they live and work.

Making Global Teams Agile: Meeting Strategies

For the past 2 years, we've been working hard to get to a simple, standard process for making Agile and SCRUM work in almost any team dynamic. Specifically, our teams focus on making resources that live and work all over the globe behave and feel like one highly effective team without undue overhead.

The keys to success in building an effective team, in our experience, are picking an agile methodology (in this case SCRUM) that allows for frequent communication in short, structured ways (Meetings) and selecting tools (Tools) and templates (Documentation) that ensure all team members and stakeholders communicate consistently and effectively with minimal wasted time and effort. A real commitment to stakeholder / client communication and interpretation and ensuring a trusted presence (real or virtual) with the customer is often crucial as well. In this article we focus on what it takes to really establish a great meeting strategy.

A great meeting strategy is the first place to start in getting a project off the ground. Standard SCRUM calls for several types of standard meetings, including: Sprint Planning, Backlog Grooming, Sprint Demonstrations, and Sprint Retrospectives. These regular meetings allow the team to understand what the customer wants to see next, demonstrate progress, refine requirements and requests, and improve the general process. Mixed with the daily SCRUM (the daily status meeting that gets the team all talking and working together constantly), just establishing these meetings on a consistent schedule and setting up ground rules and agendas is possibly the most important property of an effective, reliable project team.

Of course, simply establishing some meetings and agendas is not enough to make a project or even a meeting strategy successful. Every meeting has to have a clear purpose, ground rules, and a known agenda that remains consistent over time. You also have to place meetings at the right time of day (beginning and end of day meetings work a lot better than mid-day meetings because they interrupt less work) and keep them as short and effective as possible with manageable goals. A Backlog Grooming meeting should happen once every sprint (we find 2 week sprints are the most effective) and should have the goal of refining and prioritizing just enough requirements to cover the next 2 sprints in about 60 - 90 minutes. Daily SCRUM meetings should last less than 30 minutes and should allow every team member to speak quickly just to identify progress on specific work and raise (but not discuss in detail) any concerns, problems, or open questions. The last few minutes of every meeting must be reserved to discuss follow-up actions and next steps to set expectations for the next meeting or for progressing work. The goal of any meeting strategy should be to reserve as much time as possible for the team to complete work and demonstrate it every sprint, while still leaving plenty of opportunity for frequent course correction from the customer, client, and project leaders between sprints, milestones, and releases.

Finally, establishing regular channels of communication within the project team for standard activities like testing, design reviews, code reviews, and commitments is just as important as the standard SCRUM meetings themselves. Sprints (particularly short ones) move too fast to assume critical activities will happen without this planning. If you are relying on a development lead or architect to ensure quality (we recommend this highly if you care about maintaining feature velocity and controlling defects), he or she should have 2 - 3 scheduled, structured times to intervene and review work every sprint, and teams should get used to presenting their work in this way. Once again, perfection is not the goal: scheduled intervention will not make the work perfect, but it is far better than chaos. Likewise, testing should commence and become the focus of each sprint at a standard, scheduled time, features should be frozen at that point, and some time for test review should be reserved as well.

In the end, an effective meeting strategy dictates how much effective working time is left for a team to really develop features and functions, and it establishes the culture and communication of the team. Done correctly, it allows a team to learn to work together and respect each other's contributions and ensures that everyone pulls together to get the job done. Balancing communication with efficiency is always a challenge, but if you can reserve at least 30 hours of a 40 hour week (ideally more) for actually getting work done, the project will almost certainly be on the best possible track. From here, making the project team great is more about other things that we'll talk about in a later article.
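The 30-of-40 target is easy to sanity check with a little arithmetic. In the sketch below, the daily SCRUM and Backlog Grooming figures use the upper limits given in this article; the durations for Sprint Planning, the Sprint Demonstration, the Retrospective, and the scheduled reviews are purely our assumptions for the example, so substitute your own numbers.

```python
def working_hours_per_week(meeting_minutes_per_sprint, sprint_weeks=2, week_hours=40):
    """Hours per person per week left for work after recurring meetings."""
    meeting_hours_per_sprint = sum(meeting_minutes_per_sprint.values()) / 60
    return week_hours - meeting_hours_per_sprint / sprint_weeks


meetings = {
    "daily_scrum": 10 * 30,    # 10 working days at up to 30 minutes each
    "backlog_grooming": 90,    # upper end of the 60 - 90 minute range
    "sprint_planning": 120,    # assumption
    "sprint_demo": 60,         # assumption
    "retrospective": 60,       # assumption
    "scheduled_reviews": 180,  # assumption: 3 structured reviews of 1 hour
}

remaining = working_hours_per_week(meetings)
print(f"{remaining:.2f} working hours per week")
print("meets the 30 hour target:", remaining >= 30)
```

Under these assumptions the team keeps a little over 33 hours a week, which leaves some headroom for the occasional unplanned conversation before the 30 hour floor is at risk.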