February 21, 2017

DevOps Digest 401: A Prelude to Continuous Testing


It’s time to move past the nuts and bolts of builds and Continuous Integration (CI) and focus on closing an easily overlooked gap in the age of Agile: the gap between how things should work and how they actually work.

In recent years, a movement has been brewing to do away with Quality Assurance (QA) altogether, the argument generally being that the Agile focus on unit tests guarantees that shipping software works as intended. This is problematic on multiple counts, not the least of which being that unit tests check only what developers think to test. Sometimes, it can take a fresh set of eyes and a different perspective to catch defects.

Further, unit tests can check many things, but are by nature designed to run in isolation and execute quickly, which means they often “mock” (i.e., simulate) database and other such connections. They’re great for validating individual bits of code, but are a lousy solution for other testing needs.
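To make the idea of mocking concrete, here is a minimal sketch using Python’s standard unittest.mock module. The function count_active_users, the query method, and the SQL text are all hypothetical names invented for illustration; the point is only that the test never touches a real database.

```python
from unittest.mock import Mock


def count_active_users(db):
    """Return how many active users the (hypothetical) database reports."""
    rows = db.query("SELECT id FROM users WHERE active = 1")
    return len(rows)


def test_count_active_users():
    # The mock stands in for a real connection, so no database is contacted.
    fake_db = Mock()
    fake_db.query.return_value = [(1,), (2,), (3,)]

    assert count_active_users(fake_db) == 3
    # The mock also records how it was called, which the test can verify.
    fake_db.query.assert_called_once_with("SELECT id FROM users WHERE active = 1")


test_count_active_users()
```

Notice what this test cannot tell you: whether the query is valid SQL, or whether a real database would return rows in this shape. That is exactly the gap the higher testing layers exist to cover.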

Continuous Testing (CT) is such a broad field, often varying widely from industry to industry — even project to project — that we’re not going to build a lot of specific machinery. Instead, in this chapter, you’ll select the strategies and tools best suited to your own circumstances.

High-level Testing Strategies

Let’s begin with the broadest outlines of high-level testing strategies. Your first line of defense is static testing, which takes its name from the fact that your code remains “at rest.”

Bringing a second pair of eyes to bear often catches things that the original developers didn’t. Techniques such as pair programming, mandatory code reviews, static analysis tools, and so forth can be useful in verifying that the code “looks right” for its intended purpose. You may know this by another name, as static testing is often considered verification.

But of course, any static testing is limited by the simple fact that it doesn’t involve executing the code. Even the best musicians and composers gain a new understanding of a piece when it’s performed, no matter how well their mind’s eye (ear?) can translate sheet music, and the same is true of software.

That’s why executing code, or dynamic testing, is also critical for success. And again, you may know this by another name, as dynamic testing is often considered validation. In the end, your users don’t care about the code, they want the software to work as expected.

Exploring Testing Types

Dynamic testing invariably leads to a variety of important questions, first among which is arguably from whose perspective do we test? The way you answer that question effectively tells you whether you’re embracing:

  • White-box testing: leverages detailed knowledge of internals
  • Black-box testing: focuses wholly on the “surface”, avoiding any knowledge of internal details
  • Gray-box testing: a mix of white- and black-box testing

It may seem like a subtle or even useless distinction, but you’ll easily find arguments for and against each choice.

White-box testing can be particularly useful because it proceeds in full possession of detailed internal knowledge, which allows you to construct test cases that exercise every logical code path. Black-box testing can be especially useful when dealing with software components, as in the end, what usually matters most is that they produce the proper outputs for a given set of inputs. And in some cases, gray-box testing may be necessary to ensure that a set of operations goes to completion correctly given some initial state (e.g., certain known or expected records in a database table or other storage).
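The difference between the first two perspectives can be sketched with a single small function. The pricing rule below is entirely hypothetical. The black-box test relies only on documented inputs and outputs, while the white-box test picks its inputs by reading the code, so that every branch and the boundary between branches gets exercised.

```python
def shipping_cost(weight_kg):
    """Hypothetical pricing rule, used only for illustration."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 2:
        return 5.0
    return 5.0 + 1.5 * (weight_kg - 2)


def test_black_box():
    # Only the published behavior matters: known inputs, expected outputs.
    assert shipping_cost(1) == 5.0
    assert shipping_cost(4) == 8.0


def test_white_box():
    # Inputs chosen from the source so each branch runs at least once.
    try:
        shipping_cost(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a non-positive weight")
    assert shipping_cost(2) == 5.0     # boundary of the flat-rate branch
    assert shipping_cost(2.5) == 5.75  # surcharge branch


test_black_box()
test_white_box()
```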

What your project does and how you want to verify that it’s doing it correctly should guide your choice when it comes to these options. The one “golden rule” we can easily recommend is (1) to pay careful attention to how much knowledge of the system’s operation is essential to constructing high-quality tests, and then (2) balance that against the expectations with which you intend to saddle your users. In a highly technical application it’s not at all improper to require much of your users, but a dirt-simple app intended for all mobile phone users is a completely different story.

Testing in Layers

Speaking of different stories, layers of testing are also relevant. It doesn’t make sense, for example, to test low-level driver code a user is never going to interact with the same way you test some custom-built user-interface element that users will “abuse” in all sorts of unexpected ways. In short, there are a number of different layers, levels, or scopes at which it can be relevant to test, enumerated here in order of increasing complexity:

  1. Unit testing
  2. Integration testing
  3. Interface testing
  4. System testing
  5. Acceptance testing

We’ll talk about each of these in turn, beginning with unit testing. As already mentioned, unit tests are intended to exercise and validate the operation of some small bit of code as quickly as possible. Developers working with object-oriented languages, for example, often create a suite (or suites) of tests to exercise each individual “class” they create. Unit tests are often white-box in nature because these are bits of code expected to be used only by developers, though black-box and gray-box testing can also prove helpful. The key point is that unit tests ensure a single thing works as intended.

In contrast, integration testing is inherently focused at a higher layer/level because it ensures that the classes, modules, components, etc. that comprise some larger logical unit of software work together. In effect, integration testing by definition seeks to ensure that multiple things work together as intended.
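As a rough sketch of what that looks like in practice, the test below wires two hypothetical pieces together (a UserStore class and a greeting_list function, both invented here) and runs them against a real, in-memory SQLite database from Python’s standard library rather than a mock. The names and schema are assumptions for illustration only.

```python
import sqlite3


class UserStore:
    """Hypothetical storage component backed by a real SQL connection."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (name TEXT, active INTEGER)"
        )

    def add(self, name, active=True):
        self.conn.execute("INSERT INTO users VALUES (?, ?)", (name, int(active)))

    def active_names(self):
        rows = self.conn.execute(
            "SELECT name FROM users WHERE active = 1 ORDER BY name"
        )
        return [row[0] for row in rows]


def greeting_list(store):
    """Hypothetical second component that depends on the store."""
    return [f"Hello, {name}!" for name in store.active_names()]


def test_store_and_greetings_work_together():
    conn = sqlite3.connect(":memory:")  # real SQL engine, no files on disk
    store = UserStore(conn)
    store.add("bob")
    store.add("alice")
    store.add("carol", active=False)

    assert greeting_list(store) == ["Hello, alice!", "Hello, bob!"]
    conn.close()


test_store_and_greetings_work_together()
```

Unlike a mocked unit test, this one fails if the SQL is malformed or if the two pieces disagree about the shape of the data.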

Interface testing is similar to integration testing in that it typically involves exercising multiple things, but with a particular focus on making sure that any communication between larger units of software occurs correctly. An ‘interface’, in the field of software anyway, is typically defined as some shared boundary that divides separate units of software (often called components).[1] The goal of interface testing is (unsurprisingly) to ensure that all data and operations shared between those larger units of software are correct.

The next step up the ladder of testing complexity is system testing, which can be thought of as integration testing writ large. In other words, instead of talking about bringing several lower-level bits of code (often components) together and making sure they work, system testing aims at validating the operation of some higher-level unit of software that can be thought of as a thing in its own right.

The example of client-server software illustrates this. In such an architecture, a server program can typically be considered a complete system in its own right. It accepts commands or other interactions from a client, perhaps involving authentication and/or authorization to make sure those operations are allowed, and then carries out tasks and/or responds with the proper outputs. Clients that “know” how to utilize the server can also be thought of as systems in their own right.

Acceptance testing begins when you bring them together to verify that everything works properly at the very highest level possible: user interactions with the product as a whole. While system testing shows you defects in a server, only acceptance testing shows you whether the overall process is good enough, from a user perspective, to ship.

It is rather common to leverage automated testing tools for the lowest three (unit, integration, and interface) testing levels. It’s often harder (and generally more expensive) to acquire good tools for system and acceptance testing. Although progress has been made in recent years, you’ll still have to get your hands dirty if you commit to higher-level testing.

Types of Testing

We’ll barely scratch the surface as we explore different types of testing. The following are ordered roughly by how common they are, from most to least, descending into testing obscurity as we go.

The most commonly known types of testing are arguably alpha testing and beta testing. By software tradition, an alpha release is a feature-incomplete version of a product or service that is nevertheless ready for at least some review. Alpha testing is usually conducted internally and typically open only to those with the kind of high level of understanding and patience necessary to navigate the rough waters of unfinished software. It can be helpful in making sure the project is on the right path before getting too far along the development calendar.

In contrast, a beta release is typically a feature-complete version of a product that is ready for at least some external review and perhaps even production use. Beta testing is usually conducted with a limited set of users, often important customers seeking special influence over product development. It provides a final gut-check of the planned feature set before a more general release of a given product or service.

Next is smoke testing, or sanity testing, which is almost the opposite of unit testing. Whereas unit tests often comprise highly specific suites that seek to exercise the individual bits that go into an application in a rigorous way, smoke testing often involves a limited set of high-level tests, usually implemented from the end-user’s standpoint, to ensure that nothing has gone horribly wrong. Smoke testing is usually the most basic hurdle for a build to vault.

Regression testing is another very common type, focusing on making sure that some new change hasn’t broken features that were already working. This type of testing typically occurs right before a new release or when some crucial fix is performed after a disastrous report from the field. A one-line code change can cause chaos, and regression testing usually catches unintended consequences before release.
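A regression suite is, in essence, a growing collection of small tests that pin down behavior you never want to lose again. The sketch below assumes a hypothetical helper, normalize_username, and an equally hypothetical past defect in which a trailing newline survived normalization.

```python
def normalize_username(raw):
    """Hypothetical helper: trim surrounding whitespace and lowercase."""
    return raw.strip().lower()


def test_trailing_newline_is_removed():
    # Pins the (hypothetical) field-reported defect so it cannot quietly return.
    assert normalize_username("Alice\n") == "alice"


def test_existing_behavior_is_unchanged():
    # Guards behavior that already worked before the fix went in.
    assert normalize_username("  Bob ") == "bob"


test_trailing_newline_is_removed()
test_existing_behavior_is_unchanged()
```

Run on every build, tests like these turn a one-line change that breaks something old into a failed build rather than a surprise in the field.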

Performance testing is also common, particularly with software designed for high-volume applications or real-time processing. This type of testing often includes sub-types such as load testing, to validate that software performs acceptably under some large amount of concurrent work in progress, or stress testing, to validate that the software’s functionality will degrade gracefully under unexpected conditions (e.g., memory or storage scarcity). For the most demanding applications, real-time testing validates that a given system (often a hardware/software hybrid device) can execute tasks within a strict time limit. This can be crucial for medical devices, for example, or other high-risk products that must always deliver timely results.
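The simplest form of performance check is a time budget. The sketch below uses only Python’s standard time module; the helper run_within_budget, the workload build_index, and the one-second budget are all assumptions chosen for illustration. Note that this measures wall-clock time on whatever machine runs it, which is a far cry from true real-time testing on dedicated hardware.

```python
import time


def run_within_budget(func, budget_seconds, repeats=5):
    """Run func several times; return (ok, best_elapsed_seconds).

    The best of several runs is used to dampen noise from other processes.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best <= budget_seconds, best


def build_index():
    """Hypothetical workload standing in for the code under test."""
    return {i: str(i) for i in range(10_000)}


ok, elapsed = run_within_budget(build_index, budget_seconds=1.0)
assert ok, f"build_index took {elapsed:.3f}s, over budget"
```

Load and stress testing build on the same idea, but vary the amount of concurrent work or the available resources rather than timing a single call.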

Especially popular today is A/B testing. The ubiquity of the web, and its ease of deployment, have made it possible for various web applications and services to offer multiple variants of a given page or function to users in a controlled environment. This allows organizations to collect data on usage, practicality, and utility of two different ways of doing something, easily retiring the less desirable once the data is gathered. This kind of testing can improve perceived user value and help avoid bad design and implementation decisions by drawing on feedback from real users.
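One practical detail of A/B testing is deciding which user sees which variant. A common approach, sketched here as one possibility rather than a prescribed method, is to hash a stable user identifier so that each user lands in the same variant on every visit without any stored state. The function name, the experiment label, and the ID format are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to one variant of an experiment."""
    # Including the experiment name keeps assignments independent
    # from one experiment to the next.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user always receives the same variant for a given experiment.
first = assign_variant("user-42", "new-checkout")
second = assign_variant("user-42", "new-checkout")
assert first == second
```

Because the assignment is a pure function of its inputs, it can be recomputed later when analyzing the collected usage data.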

And of course the list goes on. Install/uninstall testing verifies that a product can be cleanly installed on, and removed from, a computer or user account. Security testing comes in many shapes and forms, from checking for known exploits (e.g., SQL injection) to hammering randomly at various interfaces to see what breaks. Internationalization testing focuses on accurate linguistic translations as well as cultural norms and standards (e.g., currency formats, calendar representations, etc.).

The Reality of Testing

It’s simply not possible to test everything thoroughly. Testing is often only partially directed at validating proper function. Its more important role, particularly in our increasingly litigious society, is often reducing risk of legal exposure and subsequently mitigating legal actions. Whatever your core concerns, do your homework and choose the testing most useful for assuaging them.

A final caveat is relevant for those wondering how truly complete testing remains impossible, even in today’s age of increasing computer automation. It’s not difficult to see why once you consider how thoroughly insurmountable the problem really is.

Many lines of code involve potential logical branching; i.e., making a decision based on some data and choosing an alternate path of subsequent execution as a result. Even relatively trivial software can involve hundreds of thousands of lines of code, so the sheer number of total execution paths to test is enormous.
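A little arithmetic shows how quickly this gets out of hand. The sketch below makes a simplifying assumption that is ours, not a property of real code: the branches are independent two-way decisions executed one after another, so each one doubles the number of paths. Real programs have correlated branches (fewer feasible paths) and loops (potentially far more).

```python
def path_count(independent_branches):
    """Paths through a sequence of independent two-way branches."""
    return 2 ** independent_branches


for n in (10, 20, 64):
    print(n, path_count(n))
# prints:
# 10 1024
# 20 1048576
# 64 18446744073709551616
```

Even under this simplified model, a mere 64 decisions already yield more paths than any test suite could ever enumerate.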

And that’s just execution. Consider also that for any input field on a user interface, a user can enter whatever characters are allowed. That often means the entire alphabet as well as numbers, symbols, and perhaps even extended characters or entire other character sets in the era of Unicode. In short, there is often an effectively infinite set of possible input data to test.
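The same kind of counting applies to inputs. As a rough sketch, assume a single text field limited to the 95 printable ASCII characters and a short maximum length; both limits are assumptions picked for illustration, and Unicode only makes things worse.

```python
def input_count(alphabet_size, max_length):
    """Number of distinct strings of length 0 through max_length."""
    return sum(alphabet_size ** k for k in range(max_length + 1))


# One field, 95 printable ASCII characters, at most 20 characters long.
possibilities = input_count(95, 20)
print(possibilities)
```

The result for that one modest field is a 40-digit number, and the space is not truly infinite only because the field length is capped.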

And that doesn’t include the variety of interactions possible between applications today or the complexity of the environments in which they execute. Cross-platform software these days operates on Windows, macOS, Linux/Unix, and even various mobile operating systems such as Android and/or iOS.

Combinatorically speaking, you put all of this together and even the proverbial infinity of monkeys are not going to be able to test your software completely prior to the heat death of our universe. And besides, I’ve yet to meet a company that has the budget for infinite monkeys and billions of monkey-years of testing.

Testing is a bit like a software version of the famous Gordian knot. Unravelling all the threads is impossible, so your best bet is to trim off only the bits you care about with your own project-management version of the sword of Alexander. Choose what gives you the biggest-quality-value bang for your test-resources buck and expand from there as needed.

Thus ends our prelude to tackling CT. Next time, we’ll cover it a little more specifically, using our sample application as a guide, and discuss how managing testing environments is crucial for reliable, repeatable, high-quality results.

