Non-Functional Requirements

All too often, software development projects focus solely on the functional requirements; after all, upon delivery, our clients simply want to see an attractive, functional solution. But what do our clients know of performance, scalability, capacity, security and availability? These non-functional requirements are typically implied or assumed, or the client is simply not aware that they need to be specifically catered for. Instead, they need to be driven from an IT perspective. But how do we (IT) go about putting measures on performance, capacity and availability, and how can we ensure we deliver on them?

Any measurement of an object needs a base unit of measurement, or at least a yardstick to measure against; without one, performance cannot be measured at all. So what do we consider performance? Is it simply the ability to cope with whatever levels of usage are thrown at our solution? Is it a total number of users, or user requests per second? For web applications we normally measure page response times, but for most applications it should be based on the response time from an event (a click, or a keystroke) to a final rest state. If we base our ideas of performance on previous experience, we need specific measures. For example, if an existing website took 2.5 seconds to load for a single request, but slowed to 7 seconds for 100 simultaneous requests, then what do we consider acceptable performance if our new website takes the same amount of time to load? We can consider factors such as file size, bandwidth, hardware (CPU and RAM), or the application server. Fundamentally, we might assume that matching existing performance is acceptable, but for how long?
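To make "response time from an event to a final rest state" concrete, here is a minimal sketch of that yardstick in Python. The `load_page` function is a hypothetical stand-in for whatever the real event handler or page load would be:

```python
import time

def measure_response_time(action):
    """Time a single user-visible operation, from event to rest state."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start

# Hypothetical stand-in for a real page load or click handler.
def load_page():
    time.sleep(0.05)  # simulate 50 ms of work

elapsed = measure_response_time(load_page)
print(f"response time: {elapsed:.3f}s")
```

The same wrapper works for any operation you decide to treat as your unit of performance, which is the point: pick the yardstick first, then measure everything against it.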

An effective tool for ascertaining current usage levels – peak usage times and days, and the number of users and requests throughout – is a Business Usage Model (BUM). The BUM defines the expected number of users now and in the future, and the expected peak periods of use; for business or financial systems, this typically shows peak usage between 9am and 5pm, Monday to Friday, usually weighted towards the start of the day. Over time, a trend can be determined to help predict future peaks and user volumes. Additionally, the business may have specific plans for how many customers or clients it is targeting to acquire over a given period. Tie these figures together and you should be able to predict your worst-case scenario: how many users making requests at the peak time on your peak day. Test your site with that volume and measure how long it takes to load (specifically, the pages that provide the most functionality). The figures you record will be judged against what you have decided is acceptable. For financial transaction sites such as banking and trading, anything more than 1 second is unacceptable, and for anything involving online bookings, you need scalability.
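Tying the BUM figures together is simple arithmetic. The sketch below shows one way to do it; the growth rate and peak fraction are illustrative assumptions you would replace with figures from your own Business Usage Model:

```python
def worst_case_concurrent_requests(current_users, annual_growth_rate,
                                   years, peak_fraction):
    """Project the user base forward, then take the share active at the peak.

    All inputs are illustrative assumptions from a hypothetical BUM,
    not industry constants.
    """
    projected_users = current_users * (1 + annual_growth_rate) ** years
    return int(projected_users * peak_fraction)

# e.g. 10,000 users today, growing 20% a year, with 5% online at the daily peak
peak = worst_case_concurrent_requests(10_000, 0.20, 2, 0.05)
print(peak)  # 720
```

That single number – 720 concurrent requests at the peak time on the peak day, two years out – is what you load-test against, not today's average.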

We’ve all seen those advertised discounted flights for one dollar. Imagine the load a sale like that would put on your system. That sort of promotion does not go out overnight; it takes months of planning and the right sort of system to support the peak load such sales endure. If a regular flight booking site has capacity for, say, 1,000 concurrent users on average, imagine how many users are going to jump onto that site when they first see the TV advertisement or read the newspaper. For that one day, or even the first hour, you are going to need a system that can manage at least 10–20 times that amount.

How do you design a system with that sort of scalability? There are several ways you can do this:

  • Scale Horizontally
  • Scale Vertically
  • Scale both Horizontally & Vertically
  • Scale Virtually

Scaling horizontally simply means adding more servers – typically simple, cheap “commodity” systems that can be easily purchased and replaced if one becomes faulty. If each server can handle x requests, then n servers can handle more (effectively n×x); however, the solution must ensure that bottlenecks are not simply moved elsewhere. Common bottlenecks include network routers, web services, connected third-party systems and, most commonly, databases. Session management and caching are other important areas for consideration.
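The n×x claim assumes requests are spread evenly, which is the load balancer's job. A minimal round-robin dispatcher, the simplest balancing strategy, can be sketched like this (server names are hypothetical):

```python
from itertools import cycle

def make_round_robin_dispatcher(servers):
    """Return a function that spreads requests evenly across servers."""
    ring = cycle(servers)

    def dispatch(request):
        server = next(ring)          # rotate to the next server in the ring
        return server, request

    return dispatch

dispatch = make_round_robin_dispatcher(["web1", "web2", "web3"])
assignments = [dispatch(f"req-{i}")[0] for i in range(6)]
print(assignments)  # ['web1', 'web2', 'web3', 'web1', 'web2', 'web3']
```

Real balancers (hardware or software) add health checks and weighting, but the principle is the same; note that round-robin alone does nothing for the shared database or session store behind the web tier, which is exactly where the bottleneck tends to move.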

Scaling vertically ensures that no one layer in a multi-tier architecture bears the bulk of the load by adding resources – typically CPU, RAM and storage – to the servers in that layer. This means we have fewer but more powerful servers, with multiple redundancies, to manage the load.

Scaling horizontally and vertically is, as the name suggests, a combination of the two, which means multiple powerful, expensive servers. This is an extremely expensive approach; however, it is quite common in many large-scale financial solutions.

Virtualisation has been around for quite a while. It permits a virtual server to be allocated a prescribed amount of disk space, RAM and CPU to represent a real server, allowing multiple virtual servers to occupy a single physical server or even span several. Scaling virtually is similar to scaling both horizontally and vertically, with the multiple virtual servers providing the horizontal part. Microsoft and VMware both offer virtualisation software that enables this sort of scaling.

One of the great benefits of enterprise virtualisation tools is the ability to increase the number of virtual servers based on load.

Let’s say we have 10 virtual servers running and each starts to reach 50% capacity. The virtualisation software can start creating new virtual servers and load-balancing across them to share the load with the existing ones. This is ideal in situations where an airline might advertise a one-dollar fare deal: their server hardware may run an average of 5–10 virtual servers to cater for typical daily loads, but when demand becomes intense, it can automatically spin up another 10–20 virtual servers to manage the load. In these cases you would typically pre-empt the load by starting additional resources in advance, but it’s nice to know that this sort of system has the flexibility to cater for load on demand. These systems sit on very large, expensive servers, clustered so that the virtual servers can run across 5–10 physical machines, with the virtual server images stored on a SAN for rapid access from any node in the cluster. The number of physical servers required depends on how many virtual servers can be loaded onto each physical server and the levels of RAM and CPU utilisation required.
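The scale-out decision the virtualisation software makes can be reduced to arithmetic: given each server's current utilisation and a target average, how many servers are needed in total? A sketch, with the 50% target and per-server figures as illustrative assumptions:

```python
import math

def servers_to_add(utilisations, target=0.5):
    """How many extra virtual servers to spin up so that the average
    utilisation drops back to `target`.

    `utilisations` are fractions (0.0-1.0) of each server's capacity in use;
    the 0.5 default target mirrors the 50% trigger in the example above.
    """
    current = len(utilisations)
    total_load = sum(utilisations)
    needed = math.ceil(total_load / target)   # servers required at target level
    return max(0, needed - current)

# 10 servers all at 80%: total load 8.0, so 16 servers are needed at 50%
print(servers_to_add([0.8] * 10))  # 6
```

Real autoscalers layer hysteresis, warm-up time and cost limits on top of this, but the core trigger is the same threshold comparison.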

As with functional requirements, non-functional requirements also require rigorous testing. Testing and tuning cycles are typically done at the end of the project – as Kent Beck said, “make it work, make it right, make it fast” – but performance still needs due consideration during design and development. Test scenarios are best based on real-world use cases that reflect how the site will be used in production. To ensure the results can be trusted, it is vital that the test system is as close as practically possible (or affordable) to the production system. So many times, things go wrong when new software products are released because the production and test environments differ and some unforeseen variable causes things to go downhill.

Testing profiles for each of the real-world use cases should consider a variety of loads:

  • Average or Typical user load
  • Peak or highest expected load
  • Stress or breakpoint load – the volume at which you find the threshold of your system, where it fails or performance is severely degraded. This is your breakpoint and must be avoided once in production.
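Those three profiles can be driven by one harness run at different concurrency levels. Here is a minimal threaded sketch; `fake_request` is a hypothetical stand-in for a real page request, and dedicated tools (JMeter, Gatling, Locust and the like) do the same thing at far greater scale:

```python
from concurrent.futures import ThreadPoolExecutor
import statistics
import time

def run_load_test(action, concurrent_users):
    """Fire `concurrent_users` simultaneous requests and collect response times."""
    def timed(_):
        start = time.perf_counter()
        action()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(timed, range(concurrent_users)))

def fake_request():
    time.sleep(0.01)  # hypothetical stand-in for a real page request

# Average, peak and stress profiles are the same test at different volumes.
for label, load in [("average", 10), ("peak", 50), ("stress", 200)]:
    times = run_load_test(fake_request, load)
    print(f"{label}: median={statistics.median(times):.3f}s")
```

Ramping the stress figure upwards until the median response time collapses, or requests start failing, is what locates the breakpoint.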

Knowing your breakpoint is a great place to start looking for your bottlenecks, using various server metrics and logs to identify the code responsible for the failure. When resolving performance issues, start with changes that provide the highest benefit for the least amount of effort. After each change, re-run your breakpoint load case to see if the bottleneck is removed. If it is, you have improved performance and can go about determining your new breakpoint. Keep repeating this process until you are happy with the breakpoint load and accepting of the point of failure. Another consideration is how you protect your application users from ever experiencing that breakpoint. For example, on websites, consider a customised page for the various HTTP error codes (e.g. 404 and 500).
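"Highest benefit for the least effort" is itself just a sort order. A small sketch, with entirely illustrative benefit/effort scores you would estimate per candidate fix:

```python
def prioritise_fixes(candidates):
    """Order candidate fixes by estimated benefit per unit of effort.

    Each candidate is (name, benefit, effort); the scores below are
    illustrative guesses, not measurements.
    """
    return sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)

fixes = [
    ("add DB index",       8, 1),   # big win, trivial change
    ("rewrite ORM layer", 10, 8),   # bigger win, but major effort
    ("enable page cache",  6, 2),
]

for name, benefit, effort in prioritise_fixes(fixes):
    print(f"{name} (ratio {benefit / effort:.2f})")
# add DB index first, then enable page cache, then rewrite ORM layer
```

The quick-win index goes in first even though the ORM rewrite promises more, because each cheap fix is followed by a re-run of the breakpoint test that may change the picture entirely.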

At the end of the day, next time your client asks “how much?”, consider the extra time and effort required to ensure the non-functional requirements are defined, tested and met.
