Calling BS

We both recently read Calling Bullshit (by Carl T. Bergstrom and Jevin D. West) and Lee reviewed the book in a blog post. We’ve both talked together about the BS we see in and around testing for many years, and written blogs too (e.g. Lee’s critical review of the Capgemini “World Quality Report 2020-21”). We then spotted this tweet from Michael Bolton and it inspired us to pen this joint blog post in the spirit of “calling bullshit”:

Some of the attributes that we think make a great tester include critical thinking, shining a light into dark places, and being brave enough to call out problems when they’re spotted. Why then do testers seem willing to walk past problematic statements made in blogs, LinkedIn posts and vendor marketing and not call them out? How do we advance testing if we continue to allow mistruths to go unchallenged?

In On Bullshit, the philosopher Frankfurt (2005) defines bullshit as something that is designed to impress but that was constructed absent direct concern for the truth. This distinguishes bullshit from lying, which entails a deliberate manipulation and subversion of truth (as understood by the liar).

While there is no shortage of material when it comes to BS around testing, consider this recent marketing from testing tool vendor, Applitools – on their official website page for Applitools Eyes – as an example.


Stable, Fast, and Efficient

Applitools Eyes is powered by Visual AI, the only AI powered computer vision that replicates the human eyes and brain to quickly spot functional and visual regressions. Tests infused with Visual AI are created 5.8x faster, run 3.8 more stable, and catch 45% more bugs vs traditional functional testing. In addition, tests powered by Visual AI can take advantage of the ultrafast speed and stability of the next generation of cross browser testing, our Ultrafast Test Cloud.

Let’s consider some attributes of human eyes and brains. Eyes don’t, in reality, “see” anything. They capture lightwaves and send these, via neural networks, to our brain’s visual cortex for interpretation and correction. Notice how you have two eyes but only see one “smooth” picture – that’s some neat real-time processing right there. Not only that, your eyes are sending a constantly changing pipeline of varying lightwaves, but note how the picture stays smooth and controlled, enabling you to make decisions, almost at an unconscious level.

So, how do we “replicate” this with a computer? The answer is: we don’t.

“Vision is the functional aspect of the brain that we understand the best, in humans and other animals,” Tenenbaum says. “And computer vision is one of the most successful areas of AI at this point. We take for granted that machines can now look at pictures and recognize faces very well, and detect other kinds of objects.”

However, even these sophisticated artificial intelligence systems don’t come close to what the human visual system can do, Yildirim says. “Our brains don’t just detect that there’s an object over there, or recognize and put a label on something,” he says. “We see all of the shapes, the geometry, the surfaces, the textures. We see a very rich world.”

Neuroscience News

While it might sound cool to make such a grandiose statement, the regressions being spotted are simply comparisons – no more and no less. The computer is programmed to look for variances. It is probably reasonable to assume that the more sophisticated the software, the better it will be at spotting regressions. Comparing state “A” with state “B” is, when all is said and done, one of the things computers were built for and are good at.
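To make the "comparison" point concrete, here is a deliberately naive sketch of checking state "A" against state "B". It is our own illustration, not a description of how Applitools Eyes (or any real visual testing tool) works; the function name `diff_pixels` and the tiny grids of grey-scale values are hypothetical.

```python
def diff_pixels(baseline, current):
    """Return the (row, col) positions where two same-sized pixel grids differ."""
    same_shape = len(baseline) == len(current) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(baseline, current)
    )
    if not same_shape:
        raise ValueError("grids must have the same dimensions")
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(baseline, current))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


# Hypothetical 3x3 "screenshots": state A (baseline) and state B (current).
baseline = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
current = [[0, 0, 0], [0, 255, 0], [0, 0, 128]]

print(diff_pixels(baseline, current))  # [(2, 2)]
```

The check reports where the two states differ. It says nothing about whether the difference matters, which is the part that still needs a human.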

Question – why are we replicating the “human eyes and brain”? We know that humans really struggle when it comes to checking in detail and for any length of time. Is the answer then to ask a computer to replicate that struggle? Why would we use software that replicates human limitations and errors? Perhaps we are not “replicating” at all but creating an illusion that makes for a grand sales pitch.

It seems to us that if the software was “replicating human eyes and brain” then surely it could do more than just spot a difference. Is it exploring the reasons why the difference has been spotted? Is it creating a bug ticket with replication steps and a detailed analysis of why the difference is unacceptable rather than acceptable? Can it compare the current state with previous states and make decisions about attractiveness, usability, suitability? Can it consider any of this based on a variety of contexts? If not, what’s stopping this? A human brain, a real human brain, that’s what. Perhaps it would be enough for Applitools to talk about how accurate the software is, avoiding this “human eyes and brain” hyperbole altogether.

Let’s turn our attention to the idea of “Tests infused with visual AI”. From the Oxford dictionary we can interpret “infused”, in this context, as “the introduction of a new element or quality into something”. From reading across one of the linked pages, we see many references to “AI” and we also see attention-grabbing words and phrases like “complex” and “deep learning”, along with the claim that “Applitools Visual AI has achieved 99.9999% accuracy”.

We take a couple of things out of this:

  1. We have been given no better understanding of why “tests infused with AI” are helpful to our testing or quality goals. While it might make for a nice sales pitch, it really gives us nothing of substance to compare against anything we might already be using to pinpoint likely advantages or even half solid reasons for considering a switch.
  2. The claim that “Applitools Visual AI has achieved 99.9999% accuracy” requires some scrutiny. Notice that it does not claim to always achieve or provide you with “99.9999% accuracy” (is our sarcasm in overdrive suggesting the wording is deliberately designed to be misread this way?). “Six nines” is an amazing accuracy rate but the big (and unanswered) question is: under what conditions? High accuracy rates are achievable, and are far less impressive, when achieved under optimal conditions. If we started using this software tomorrow, what would our accuracy rate be? How long would it take to achieve 99.9999%? Is it even achievable in the real world outside of controlled conditions?
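A small worked example shows why the conditions matter so much for an accuracy figure. The numbers below are entirely made up for illustration and say nothing about how the vendor measured its claim: we assume one million comparisons of which exactly one contains a real difference, and a hypothetical "checker" that never flags anything.

```python
def accuracy(predictions, truths):
    """Fraction of predictions that match the ground truth."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


total = 1_000_000
truths = [False] * (total - 1) + [True]  # assumed: one real difference in a million
predictions = [False] * total            # a "checker" that never reports a difference

acc = accuracy(predictions, truths)
missed = sum(t and not p for p, t in zip(predictions, truths))

print(f"accuracy: {acc:.4%}, real differences missed: {missed}")
# accuracy: 99.9999%, real differences missed: 1
```

Under these assumed conditions, a checker that does nothing at all scores 99.9999% while missing the only difference that existed. The headline number alone tells us very little.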

Let’s move on to the next series of claims – “5.8x faster” and “3.8 [times] more stable” – and ask the question, compared to what? These are unsupported statistical claims. There’s no denying that they sound impressive, but there is nothing to substantiate the statistics.

Without knowing the source and context, a particular statistic is worth little. Yet numbers and statistics appear rigorous and reliable simply by virtue of being quantitative, and have a tendency to spread.

West, Jevin D.; Bergstrom, Carl T. Calling Bullshit (p. 101). Penguin Books Ltd. Kindle Edition.

We’re not going to simply accept the numbers. What is the benchmark? Under what conditions? Which hardware, software, load, etc.? If we run the software, will we see these (or similar) speed and stability improvements? We can’t possibly know because the claims have nothing to back them up. This seems exceptionally strange when you are selling into a market that should be questioning such claims.
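The "compared to what?" question can be shown with simple arithmetic. In this sketch the timings and suite names are hypothetical and chosen by us purely for illustration; the point is only that the same tool run yields very different "x faster" figures depending on the baseline picked.

```python
def speedup(baseline_seconds, tool_seconds):
    """How many times faster the tool is than the chosen baseline."""
    return baseline_seconds / tool_seconds


tool_seconds = 10.0  # assumed run time with the tool

# Two hypothetical baselines for the very same tool run.
baselines = {
    "slow, unmaintained suite": 58.0,
    "well-tuned suite": 12.0,
}

for name, seconds in baselines.items():
    print(f"vs {name}: {speedup(seconds, tool_seconds):.1f}x faster")
```

Without knowing which baseline was used, a figure such as "5.8x faster" cannot be interpreted, let alone reproduced.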

In all seriousness, what would a software tool pitch be without a claim to being superior to testing executed by humans? “…catch 45% more bugs vs traditional functional testing”. It’s nice to see that “manual” was cast aside in favour of “functional”, or was it? Is this a comparison to automated checks written without Applitools Eyes or is it a comparison to human testing? As the model is undefined, the “45% more bugs” claim is completely invalid.

The final claim made here relates to their cloud, which apparently offers “ultrafast speed and stability” when it comes to cross-browser testing. A LinkedIn post (from 5th January 2021) amped up this claim:

Again we see a very specific number, “20x faster”, with no explanation of what is being compared (e.g. overall execution time), and there can be no foundation to the claim that their solution performs so well compared to “any other”. They could provide data to show their tool’s performance compared against some other solutions, but simply cannot claim it performs so much better than all alternatives. It’s also worth noting that the speed of check execution is not an indicator of quality in the approach, and the human elements of genuine testing (over algorithmic checking) are not amenable to being sped up without reduction in their efficacy.

The amount of energy needed to refute bullshit is an order of magnitude bigger than [that needed] to produce it.

West, Jevin D.; Bergstrom, Carl T. Calling Bullshit (p. 11). Penguin Books Ltd. Kindle Edition.

We’ve focused on Applitools marketing in this blog post – note that we’re not attacking the company, its people or its product, rather we’re critiquing their messaging. We see similar examples crossing our feeds on a daily basis – and so we assume most other testers do too. It’s easy to look at this nonsense and scroll on by, but if we do nothing and allow the BS to proliferate then we’re complicit in its ability to spread, be accepted and maybe even become viewed as fact over time. 

So, please join us in calling out the BS you see around testing. Flex those critical thinking muscles and communicate your objections and let’s hope that by doing so, we can help reduce the prevalence of such poor messaging around testing.



We realized recently that we’d been working together on various projects for a few years, though we had maybe never consciously thought about this as “pairing”. When we decided to publish our shared thoughts on this new blog, though, the topic of pairing began to feel like a good one for us to cover.

Pairing – what and why?

Before we dig in further, let’s see if we can elaborate on why we might pair to complete a task. Perhaps the first thing to highlight is that pairing is a choice – it’s a choice to involve another person in helping to solve a problem. Pairing is not a law; you can problem solve on your own if you so choose. So why might we elect to pair?

Coming together is a great way to explore shared experiences and leverage the diversity of thought that comes from different backgrounds & knowledge. Each member of the pair brings a different mental model to the problem at hand, again resulting in more diverse avenues to explore together. Pairing also helps with negotiation skills and can enable a safer environment in which to resolve disagreements. And, of course, sometimes it’s just nice to have someone to work with rather than tackling a problem alone.

We’ve worked together to co-write for magazines and so on, but our most significant pairing exercise – both in terms of effort and outcomes – to date has been putting together and running the EPIC TestAbility Academy (ETA).

The EPIC TestAbility Academy

Paul says: “The TL;DR of the EPIC TestAbility Academy is that I decided to pursue an idea I had been pondering for some time. Why did I decide to pair rather than run with it on my own? Well here’s a small list. The work required to get such an undertaking in place is significant. I knew I didn’t have the required capacity. I had ideas but I wanted someone who could challenge them, propose things I would not have thought of so that the course became better than it could otherwise have been. I wanted somebody who had skills that I didn’t have. For all of the above I also just wanted to share the experience with somebody who believed that contributing to the test community is important.”

Lee says: “Paul came to me with the idea of working together in a community/volunteer capacity and the idea instantly appealed to me. It’s unlikely I would have come up with the idea myself so having Paul’s passion as the driving force helped to get a project off the ground with EPIC Assist.”

One of the things that stands out is that we have different, but complementary, styles. For example, we both love good slide sets when presenting or teaching others. We generally reach agreement fairly quickly on what we think are the key points. However, Paul prefers sparse slides (just the really key details and, whenever possible, no words, just a related diagram or picture), while Lee is somewhat more detail driven and prefers more words on the slide. This is where things get interesting as we then need to negotiate and iterate over the content.

We discovered we have different teaching styles. Lee leans more towards being driven by the topic or slide, while remaining open to exploring interesting thoughts and ideas. Perhaps due to his teaching background, Paul undergoes a slight personality change when put in front of a group of people, so he was more willing to pick up a thread and diverge from the immediate topic. More than once (OK, regularly, yeah, alright, every week) we had to modify the upcoming session content for our ETA classes. That was part of our shared learning. These differences worked really well as we had two different styles in action as we swapped in and out of “teacher” and “assistant” modes.

This is probably one of the keys to what is a very solid friendship. We can differ and negotiate without getting into a “win/lose” or a “my way or the highway” mindset. We observe, we consider, we raise differences of opinions respectfully. Some of our work is done over a shared bottle of red wine sitting in Lee’s house – this probably helps the process.

In the interests of raising awareness of the work we’d been doing with ETA, we submitted a proposal to speak at the inaugural TestBash Australia conference in Sydney in 2018. We were pleased to be accepted and had several months to prepare for this joint presentation. Our preparation mirrored our approach to building the ETA content in many ways – drafting slides individually and then reviewing them together (so that Paul could cut Lee’s wordy slides down to something more presentable!). Lee’s previous conference experience led to us adopting a fairly rigorous practice regime, especially important given the hard thirty-minute timeslot we’d been given. We rehearsed the talk over Skype meetings, timing each slide and reviewing where we needed to refine our messaging to fit the timeslot. After several iterations, we got to the point where we had a good set of slides that we could deliver in a consistent manner (within a couple of minutes either way). We felt very well prepared as we headed to Sydney for the conference. The delivery of the talk as a pair also worked really well and knowing we had each other’s back was a confidence booster.

Joint blogging

A more recent pairing effort has resulted in what you’re reading today, our joint blog. While we both blog in our own right (Paul at and Lee at TheRockerTester), we often discuss things – often over a coffee after work – that feel more appropriate to cover together, where we can share our potentially different thoughts on the same topic in one place.

Joint blogging has proved to be a very different experience to blogging individually. Both of us tend to write individual blog posts fairly quickly once we have an idea we think is worth writing about. When we decide to cover a topic via our joint blog, however, the posts take considerably longer to form and publish. The asynchronous nature of putting these joint posts together is a blessing and a curse – we take the opportunity to edit, refine and discuss content (sometimes over a number of weeks) which probably leads to more coherent posts, but the lag between initial idea and publishing can also be frustrating for both of us.

Our final thoughts

We pair because it works for us, it’s enjoyable, and it also gives us the chance to learn from our differences as well as our similarities.



The case against detailed test cases (part three)

We recently read an article on the QA Revolution website, titled 7 Great Reasons to Write Detailed Test Cases, which claims to give “valid justification to write detailed test cases” and goes as far as to “encourage you to write more detailed test cases in the future.” We strongly disagree with both the premise and the “great reasons” and we’ll argue our counter position in a series of blog posts.

Our first blog covered the claims around test planning and our second those on offshore testing teams.

In this third and final part of our series, our focus now turns to the points made in the article around training.

Training: I have found that it is extremely helpful to have detailed test cases in order to train new testing resources. I typically will have the new employees start understanding how things work by executing the functional test cases. This will help them come up to speed a lot faster than they would be able to otherwise.

Let’s go through the assertions made by these statements.

I have found that it is extremely helpful to have detailed test cases in order to train new testing resources.

We note that, again, our experiences suggest the exact opposite. While this probably seems like a sound idea and Lee also once advocated for such an approach, it soon became clear that there were significant downsides to driving learning via an existing test case library including:

  • Different people learn in different ways and following written instructions is not a learning style that works well for everyone.
  • Performing testing by following detailed test cases is boring. Some of the key drivers of learning – such as genuine engagement and curiosity – are dampened or obliterated by simply following instructions in this way.
  • The tacit knowledge held by the test case author during the creation of the test case results in big gaps and unclear instructions when it comes to being executed by another person (even one of similar experience).
  • When following detailed instructions, the ability to observe and memorise is severely compromised – you’ll have no doubt experienced this when driving to a destination using GPS as compared to when you follow signs and landmarks to reach the same destination. Mental capability is used up trying to follow a map rather than learning and navigating the terrain, and following written instructions is mentally tiring (perhaps partly due to suppressing the innate human desire to explore and learn rather than living to a script).

I typically will have the new employees start understanding how things work by executing the functional test cases. This will help them come up to speed a lot faster than they would be able to otherwise.

Paul has, for some years now, used exploratory models to train people new to the software being tested. This enables them to use their curiosity while learning how things work. Following other people’s directions (via detailed test scripts) is simply following a map, leading to the possibility of confusing the map for the terrain. Due to their detailed nature, such test cases quickly become out of synch with the product as it is developed and Lee has seen many instances of new testers to a team trying to use such test cases and becoming very confused due to the inevitable mismatches between the test case and the reality that is the product.

A further observation of ours is that when testers learn through exploration, they ask a lot of questions. As they get feedback on their questions, they are also getting constructive feedback on the quality and relevance of those questions. This helps new testers to practice framing important questions about the software, their approach to testing it, their current lack of knowledge and potential areas of system risk. These are all attributes that help to create an excellent tester.

We’d like to point out that following instructions and understanding are not the same things. Rote learning of the software produces a “one dimensional” view as you are following one way paths. In reality, software testing is often more like a freeway with multiple lanes, off ramps, on ramps, pot holes and barricades. You need all your senses available to you to understand the terrain, spot signs of potential trouble and get them repaired before your customers are troubled by them. Notice that while we have a focus on training and learning, we are doing this in the context of system testing and potentially uncovering new sources of risk. This more holistic approach to training is a much closer approximation to what we believe good testers do when testing.

We note that the article’s author suggests that the tester will “come up to speed a lot faster than they would be able to otherwise”, but there are no alternative ways of “coming up to speed” offered against which to compare. Our experience of trying to force learning via following existing test cases is that the resulting understanding is shallow and what might look like a good level of understanding of the software is later revealed to be quite poor when it comes to finding deeper, more important issues.

Summing up our views

In our opinion, while you could use detailed test cases as a training and learning tool, our experiences suggest that this is an approach that is neither engaging nor effective compared to allowing the tester to learn through exploration, support and questioning.

If you’ve read all three blogs on our case against test cases, you have probably come to the conclusion that we really do not agree with the assertions made by the article we’re responding to. Detailed test cases in our view provide very few advantages and a lot of disadvantages. It’s hard to support any approach that reduces a tester’s time interacting with the software and asks them to detail what they should test based on a specification that will change and render many test cases pointless. Testers are intelligent people (at least the ones we know well) with boundless curiosity and an appetite for exploring and asking questions. Asking them to suppress these talents in favour of following detailed test cases is a massive disservice to testers. If the context you are engaged in demands detailed test scripts, well that sucks, but at the end of the day you’re stuck with that. However there is no reason why you can’t actively advocate for better approaches and seek to run small experiments that slowly move your testing away from detailed test scripts.

Our suggestions for further reading:


The case against detailed test cases (part two)

We recently read an article on the QA Revolution website, titled 7 Great Reasons to Write Detailed Test Cases, which claims to give “valid justification to write detailed test cases” and goes as far as to “encourage you to write more detailed test cases in the future.” We strongly disagree with both the premise and the “great reasons” and we’ll argue our counter position in a series of blog posts.

We now turn our attention to the claims made about offshore testing in the article.

Offshore: If you have an offshore team, you know how challenging that can be. It is really important to write everything out in detail when you communicate offshore so that everyone understands. It is critical to write detailed test cases is no different. Without those details, the offshore team will really struggle to understand what needs to be tested. Getting clarifications on a test case can often take a few days of back and forth and that is extremely time consuming and frustrating.

Let’s start by breaking out statements and examining them.

It is really important to write everything out in detail when you communicate offshore so that everyone understands

This recommendation appears to be conflating additional detail with improved understanding. Our joint experience of working with offshore testing teams suggests that the exact opposite is true. Instead of focusing so much effort on detailed test cases, we’ve preferred to spend time providing support as required to these testers to help them understand. When working with offshore teams for whom English is not their first language, Lee found that detailed documents (including test cases) were very slow for the testers to consume and it was more efficient to use lighter weight documents (in terms of their text content) such as mind maps. Paul has had similar experiences and also finds lightweight documentation and support to be a far more effective method to generate understanding.

It is also interesting that the author claims that “everyone understands”. Think of yourself in your workplace, perhaps in a meeting, and you are discussing the system under test. All attendees have the same native language. Now ask yourself, when was the last time you issued directions and everybody understood? Can you remember an instance? We are struggling to recall an occasion where we were completely clear and not questioned. Now convert this from spoken word to written word. How often do you get questions about statements you have written? It’s pretty frequent, right? So we are more than a little curious about why detailed test cases would be immune to a lack of clarity.

Without those details, the offshore team will really struggle to understand what needs to be tested.

It sounds like the idea is for someone to write detailed test cases with the intent of a tester in the offshore team executing them. There are a couple of assumptions in play in this case. Firstly, it is implied that the test case is clear and without ambiguity – trying to write test cases that achieve this is a fool’s errand (since it’s impossible). Secondly, there is an assumption that what makes sense to the test case author makes sense to the executor of the test case. This implies that the author and the executor model information in the same way and they have the same background knowledge. Again, this is very unlikely to be true. There is plenty of room for differences in modeling and assumptions that could result in the testing by the executor being quite different from that imagined by the author. Having a detailed test case doesn’t necessarily mean it is followed to the letter during execution and, even if it was, the testing will most likely be different from that envisaged by the author.

We also find this statement disrespectful to the intelligence of the testers in the offshore team, who are most likely quite capable of thinking for themselves and coming up with excellent test ideas if given the chance outside the confines of simply being test case executors. We strongly recommend that testers learn about oracles and heuristics as ways to generate test ideas, rather than solely relying on documentation and detailed test cases. The testers – be they in offshore teams or otherwise – will likely find much deeper problems by broadening out their thinking using these tools than if they rely on prescriptions of exactly what to test.

The statement implies that the primary (or maybe only?) form of communication between the onshore and offshore team is the detailed written test case. This is a recipe for disaster. Collaboration through discussion is much more likely to aid with understanding than trying to communicate only in written form.

Getting clarifications on a test case can often take a few days of back and forth and that is extremely time consuming and frustrating.

We find it quite interesting that writing detailed test cases is time consuming (and for many testers, incredibly frustrating) but the article uses those very factors as a reason for writing detailed test cases. Amazingly, it is promoting the notion that quantity of detail provides superb clarity. 

We’ve yet to see a set of test cases that didn’t need some level of clarification. That we can be so precise as to leave no doubt is a fallacy that simply does not hold in day to day communication. Wiio’s Law tells us, if broadly summarised, that “Human communications usually fail except by accident”. While the law is intended to be humorous, it also reflects recognised “truths” drawn from observation. Jukka Kalervo Korpela commented on Wiio’s Law and noted, with regard to why human communication fails:

  • Language differences. On the Internet, for example, the lingua franca is badly written and poorly understood English. Some people use it as their native language; others learned some of it from various sources. In any case, whatever you say will be interpreted in a myriad of ways, whether you use idiomatic English or not.
  • Cultural differences. Whatever you assume about the recipients of your message, the wider the audience, the more of them will fail to meet your assumptions. On the Internet, this virtually guarantees you will be misunderstood. What you intend to say as a neutral matter of fact will be interpreted (by different people) as a detestable political opinion, a horrendous blasphemy, and a lovely piece of poetry.
  • Personal differences. Any assumption about the prior knowledge on the subject matter fails for any reasonably large audience. Whatever you try to explain about the genetics of colors will be incomprehensible to most people, since they have a very vague idea of what “genes” are (in written communication you might just manage to distinguish them from Jeans).

Summing up our views

While we both have experience in working with offshore testing teams, we have not seen the use of detailed test cases as being either mandatory or effective in reducing some of the inevitable communication challenges involved in this model. We have instead focused on collaboration and learning to help on both sides of the communication, finding that the use of exploratory testing has been incredibly valuable in effectively working in an offshore model.

Our suggestions for further reading:

Thanks to Brian Osman for his review of our blog post.


The case against detailed test cases (part one)

We recently read an article on the QA Revolution website, titled 7 Great Reasons to Write Detailed Test Cases, which claims to give “valid justification to write detailed test cases” and goes as far as to “encourage you to write more detailed test cases in the future.” We strongly disagree with both the premise and the “great reasons” and we’ll argue our counter position in a series of blog posts.

What is meant by detailed test cases?

This was not defined in the article (well there’s a link to “test cases” – see article extract below – but it leads to a page with no relevant content – was there a detailed test case for this?). As we have no working definition from the author, this article is assuming that detailed test cases are those that comprise predefined sections, typically describing input actions, data inputs and expected result. The input actions are typically broken into low level detail and could be thought of as forming a string of instructions such as “do this, now do this, now do this, now input this, now click this and check that the output is equal to the expected output that is documented”.

Let’s start at the very beginning

For the purposes of this article, the beginning is planning. The article makes the following supporting argument for detailed test cases:

It is important to write detailed test cases because it helps you to think through what needs to be tested. Writing detailed test cases takes planning. That planning will result in accelerating the testing timeline and identifying more defects. You need to be able to organize your testing in a way that is most optimal. Documenting all the different flows and combinations will help you identify potential areas that might otherwise be missed.

Let’s explore the assertions made by these statements.

We should start by pointing out that we agree that planning is important. But test planning can be accomplished in many different ways and the results of it documented in many different ways – as always, context matters!

Helps you to think through what needs to be tested

When thinking through what needs to be tested, you need to focus on a multitude of factors. Developing an understanding of what has changed and what this means for testing will lead to many different test ideas. We want to capture these for later reference but not in a detailed way. We see much greater value in keeping this as “light as possible”. We don’t want our creativity and critical thinking to be overwhelmed by details. We also don’t want to fall into a sunk cost fallacy trap by spending so much time documenting an idea that we then feel we can’t discard it later.

Planning becomes an even more valuable activity when it is also used to think of “what ifs” and to look for problems in understanding as the idea and code are developed. “Detailed test cases” (in the context of this article), by contrast, already suggest waterfall and the idea that testers do not contribute to building the right thing, right.

Another major problem with planning via the creation of detailed test cases is the implication that we already know what to test (a very common fallacy in our industry). In reality, we know what to confirm based on specifications. We are accepting, as correct, documentation that is often incorrect and will not reflect the end product. Approaching testing as a proving rather than disproving activity, or as confirming over questioning, plays to confirmation bias. Attempting to demonstrate that the specification is right and not considering ways it could be wrong does not lead us into deeper understanding and learning. This is a waste of tester time and skills.

That planning will result in accelerating the testing timeline and identifying more defects

We are a bit surprised to find a statement like this when there is no evidence provided to support the assertion. As testing has its foundations in evidence, it strikes us as a little strange to make this statement and expect it to be taken as fact. We wonder how the author has come up with both conclusions.

Does the author simply mean that by following scripted instructions testing is executed at greater speed? Is this an argument for efficiency over efficacy? We’d argue, based on our experiences, that detailed test cases are neither efficient nor effective. True story – many years ago Paul, working in a waterfall environment, decided to write detailed test cases that could be executed by anybody. At that point in test history this was “gold standard” thinking. Three weeks later, Paul was assigned to the testing. Having been assigned to other projects in the meantime, he came back to this assignment and found the extra detail completely useless. It had been written “for the moment”. With the “in the moment knowledge” missing, the cases were not clear and it required a lot of work to get back into testing the changes. If you’ve ever tried to work with somebody else’s detailed test cases, you know the problem we’re describing.

Also, writing detailed test cases, as a precursor to testing, naturally extends the testing timeline. The ability to test early and create rapid feedback loops is removed by spending time writing documentation rather than testing code.

Similarly, “identifying more defects” is a rather pointless observation sans supporting evidence. This smacks of bug counting as a measure of success over more valuable themes such as digging deeply into the system, and exploring and reporting in ways that provide evidence-based observations around risk. In saying “identifying more defects”, it would have been helpful to indicate which alternative approaches are being compared against.

Defects are an outcome of engaging in testing that is thoughtful and based on observation of system responses to inputs. Hanging on to scripted details, trying to decipher them and the required inputs, effectively blunts your ability to observe beyond the instruction set you are executing. Another Paul story – Paul had been testing for only a short while (maybe two years) but was getting a reputation for finding important bugs. In a conversation with a developer one day, Paul was asked why this was so. Paul couldn’t answer the question at the time. Later, however, it dawned on him that those bugs were “off script”. They were the result of observing unusual outcomes or thinking about things the specification didn’t cover.

You need to be able to organize your testing in a way that is most optimal.

This statement, while not completely clear to us in terms of its meaning, is problematic because it seems to assume there is an optimal order for testing. So then we need to ask: optimal for whom? Optimal for the tester, the development team, the Project Manager, the Release Manager, the C-level business strategy or the customer?

If we adopt a risk-based focus (and we should) then we can have a view about an order of execution, but until we start testing and actually see what the system is doing, we can’t know whether that order is the right one. Even in the space of a single test our whole view of “optimal” could change, so we need to remain flexible enough to change our direction (and re-plan) as we go.

Documenting all the different flows and combinations will help you identify potential areas that might otherwise be missed.

While it might seem like writing detailed test cases would help testers identify gaps, the reality is different. Diving into that level of detail, and potentially delaying your opportunity for hands-on testing, can actually help to obfuscate problem areas. Documenting the different flows and combinations is a good idea, and can form part of a good testing approach, but this should not be conflated with a reason for writing detailed test cases.

The statement implies, to us, that approaches other than detailed test cases will fail to detect issues. This is another statement that is made without any supporting evidence. It is also a statement that contradicts our experience. In simple terms, we posit that problems are found through discussion, collaboration and actual hands-on testing of the code. The more time we spend writing about tests we might execute, the less time we have to actually learn the system under test and discover new risks.

We also need to be careful to avoid the fallacy of completeness in saying “documenting all the different flows and combinations”. We all know that complete testing is impossible for any real-world piece of software and it’s important not to mislead our stakeholders by suggesting we can fully document in the way described here.

Summing up our views

Our experience suggests that visual options, such as mind maps, are lighter and communicate more easily to stakeholders than a library of detailed test cases. Visual presentations can be generated quickly and enable stakeholders to readily appreciate relationships and dependencies. Possible gaps in thinking or overlooked areas also tend to stand out when the approach is highly visual. Try doing that with a whole bunch of words spread across a table.

Our suggestions for further reading:

Thanks to Brian Osman for his review of our blog post.