16 October 2013

How many holes are in this shirt?


Saw this picture on Facebook and couldn't resist adding to the long stream of comments saying 6, 7, 8, ... holes:

First: Definition of a hole? You have separations between threads in the fabric for instance, do they count?

Second: Even with a definition it's impossible to say based only on that picture, since you can't see holes on the back of the shirt that are covered by the part of the front that is not ripped open. For instance, you can see right through so there must be holes on the back, but are there two different holes or one big hole?

For the spirit of the question: At minimum zero, assuming the yellow parts are only clever design and the whole back is gone so that the holes for arms, neck etc. are not really holes anymore... hard to call it a shirt in that case though.

... so what's the minimum requirement for it to be called a shirt?

You could of course continue and question things like whether that's a shirt or a drawing, or point out that a two-dimensional object in general would work poorly as a shirt if you don't live in Flatland, and so on.

I wanted to share this just as an example of how many assumptions we make when giving a simple answer even to a simple question. I leave it to you to make something out of it but a start could be to look critically at simple, especially "universal", answers you get or give. What's required for them to be true and are you sure that reflects reality?

13 October 2013

Arguing for Exploratory Testing, part 2, Reuse

Intro
You can read a bit more about this series in the first post:
Arguing for Exploratory Testing, part 1, Traceability

The topic for our second Transpection Tuesday on "Arguing for Exploratory Testing" was Reuse.

We finished with two open questions:
Can we ensure we actually repeat the exact same test a second time?
How do you actually achieve reuse in exploratory testing (when it is desired)?

Reasons to reuse tests
First we tried to state reasons someone would want to reuse test cases:
  • Save time during test design
  • Functionality is changed and we want to rerun the full/part of the test scope
  • We want to verify a (bug) fix
Preconceptions
Looking at reasons quickly led us to some preconceptions which became the topic for a big portion of the session:
  • Effort = Value
  • Equal execution = Equal value
  • Our scope is (almost) complete
  • Reuse = free testing
  • A monkey can run a test case
Preconception: Effort = Value
Since we've invested so much time (as well as money and prestige) in writing test cases they must be worth more than a single execution.
  • Even if presented with clear evidence we may reject it to defend our judgement
  • We may overestimate what a test case is useful for (we want to get the most out of our work)
  • It's my work, criticize it and you criticize me not the work! (common and unfortunate misconception)
It takes a lot of self-esteem to say: "Yeah I screwed up, could you help me?", especially in an environment where mistakes are not accepted. Notice that many "so how can we make the most of this mistake" discussions still communicate "so you made a mistake, now you'll have to suffer for it by telling us why you are a failure". It takes a lot of work to change this.

Preconception: Equal execution = Equal value
Let's say we execute the exact same steps in a scripted and an exploratory way, wouldn't that be two identical tests? We believe not.
  1. Your goal differs. With test cases your goal is to finish as many test cases as possible (progress). That's how you measure "how much testing you were able to perform". In exploratory testing you are judged on the information you provide, so you should be more inclined to spend a few extra minutes observing/following something up even when it's not "part of your test".
  2. Your focus differs. When you have a script you have to focus on following that script. In exploratory testing your goal is typically to find new leads to base the next test on. That means in one case your focus is on the product and in the other on an artifact. Think about the Invisible Gorilla experiment.
  3. Scripts more easily bias you not to observe. In a script you typically have verification steps e.g. "verify X=5". We believe this could bias you to not be as observant during the other steps: "this is just setup so nothing should happen that concerns me".
Preconception: Our scope is (almost) complete
We know a feature's boundaries (specifications, requirements) so when we set the scope for testing we can, and usually will, cover almost the entire feature.
  • We can't know the boundaries of a feature:
    • We will impact and use other components not "part of the feature" e.g. other code in the application, the operating system, surrounding applications, third party plugins, hardware, hardware states etc.
    • We interpret planning documents differently, add parts, discover things we couldn't have anticipated, correct mistakes or interpret something differently than intended by the author and/or interpreted by the tester.
  • We can (almost) always tweak a test a little bit (e.g. change input data or timing). But testing all combinations (we recognize) is way too expensive. Also there are usually so many ways an application can be misused (intentionally or unintentionally) that even with a ton of creativity we can't figure them all out (ask any security expert .)
So our scope is basically a few small dots on a big canvas rather than a well colored map. But those dots are (hopefully) carefully selected to protect us from the greatest risks we can anticipate. Still, they are only dots.
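To get a feel for why "all combinations" is out of reach, here is a minimal sketch in Python. The input dimensions, the number of values per dimension and the five minutes per test are all made-up assumptions for illustration, not figures from our session.

```python
from math import prod

# Hypothetical input dimensions and how many values we recognize for each.
dimensions = {
    "browser": 4,
    "operating system": 3,
    "language": 10,
    "user role": 5,
    "network condition": 3,
}

def combination_count(dims):
    """Number of tests needed to run every combination exactly once."""
    return prod(dims.values())

def hours_to_run_all(dims, minutes_per_test):
    """Total execution time in hours, assuming every test takes equally long."""
    return combination_count(dims) * minutes_per_test / 60
```

With these assumed numbers, five modest dimensions already give 1800 combinations, or 150 hours at five minutes each, and that is before timing, data variations and misuse are considered.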
As testers we easily support the preconception of full coverage by answering questions like "do we cover this feature now?", "is all testing done?" etc. with a simple "yes" or "almost". The more accurate answer would be "we cover the most important risks we identified limited by our current knowledge, time available and other constraints", but that answer is not very manager friendly which leads us to...

There is a general lack of knowledge and understanding of testing in most organizations. And we decided to stop there since that was way too big a question to tackle at the point we got there. But it's an important question so please take a moment and think about it for yourself: How can you improve understanding of and interest in testing in your organization?

A final note. Since we only cover a small part, reusing a test scope will not help us catch the bugs we missed the first time. How big of a problem that is differs but repeat a few times and it may scale up in a nasty way.

Preconception: Reuse = free testing
We've already written the test case so wherever it's applicable (which should be self-explanatory) we can just paste it into our test scope and voilà! Free coverage!

The big issue here is the "self-explanatory" part. The problem is that what fit well in one feature might not fit in another, even a similar one. Even when no tweaks are needed we still have to figure out what the test case actually does, so that we know what we have covered with it and what we still need to cover in other ways.

This process is expensive, really expensive, so sure we save time not having to figure out the test and how to practically run it all over again but consider the time it takes to find the test case, analyse what it covers, analyse what it doesn't cover, analyse how it interacts with existing test cases, analyse if something has changed that impacts the test case compared to last time and so forth.

Preconception: A monkey can run a test case
  • We all interpret things differently. Click can mean single click, double click, right click (already assuming the first two were left clicks), tab and use enter, middle button click, etc. Even a well written, simple test case can lead to different interpretations.
  • One thing we're looking for is unexpected behavior and it's in the nature of "unexpected" to be something we can't plan for. Thus to get much use out of a test case we need to handle the system well enough to investigate and identify when a behavior is unexpected or undesired.
  • We make a ton more observations than we consciously think of. These observations take practice, focus and skill. For example, when you boot your computer you would react to a lot more things than you would add in a "boot test". Examples: the screen is blinking, smoke is coming out, the lights in the room flicker, you hear strange mechanical sounds. All of these should catch your attention but are unlikely to be written down.

    More skill and/or focus can lead to more valuable observations: The login screen looks different, memory calculations are wrong, it's slower than usual/expected, BIOS version is incorrect, the operating system's starting mode is wrong etc.
  • When we don't fully understand something we tend to write it down in less detail (it sucks to look stupid by writing down something incorrect and we're too lazy to investigate every detail we don't understand; it's easier to investigate as we get there).
  • When we write a test case based on specifications, requirements and other "guesses" of how the end system will work, even a flawless instruction will sometimes not correspond to how the system is actually working (including when working as desired). This of course requires the person executing to be able to correct the test case, and thus to understand both the intention of the test case and how the system works.
  • If we don't understand the system we may lose a lot of time setting up fully or partly irrelevant variables to the values stated in the instructions. The immediate comment is, if we have stated irrelevant variables in the test case we've failed. Consider then that the variable might be irrelevant to the test but mandatory to the system (e.g. you have to set a valid time server). Leave that out and the person executing once again needs to understand the system.
When is reuse actually beneficial?
  • We have rewritten something from the ground up but want it to externally still work the same. Reuse could save time.
  • We have some critical paths through the system that can't break.
  • We need to quickly regression test a feature without having to dig in too deep in the feature itself.
But remember that the one executing still should understand the test and the system to ensure tweaks (using different input values, triggering different fault cases etc.) can be made and important observations are more likely to be made.

How can we achieve reuse in Exploratory Testing
Not covered much by this particular session but a few thoughts:
  • Charters
  • Debrief notes
  • Test ideas
  • Test plans
Try creating a report specifically used as a "feature summary" including valuable operational instructions, general test ideas, impacts, lessons, problems, tools, important details, testability etc. We did something like this at my former company, where we let the test plan continuously turn into an end report as we added lessons from our testing. This would not only help when retesting a similar feature but could also serve as educational material or test plan input, for instance. It's important though to stay concise; noise is a huge enemy! The number of readers of a document is inversely proportional to the number of pages in the document, you know .)

A few notes on test case storage
First off I love this post on having a big inventory of test cases by Kristoffer Nordström.

It's easy to think something you've already created is free, but there's no such thing. Having a large inventory of test cases costs in many different ways:
  • Storage
  • Noise (it's one more thing testers have to keep track of)
  • Another tool/part of tool for testers to stay updated with / learn / understand
  • For a test case to be fully reusable later it should be kept up to date. How many refactor all their old test cases as functionality is changed?
  • ... if you do, that sounds really expensive.
Summary
Reuse has its place but be careful!

Remember reuse means inheriting blind spots, has a cost and still requires the person "reusing" to know just as much about the feature, system and testing in general as if (s)he wasn't reusing old checks.

Take care, and I hope these Transpection Tuesday notes (even though somewhat messy) were helpful!

... and of course, thank you Helena!

03 October 2013

Arguing for Exploratory Testing, part 1, Traceability

Background
The topic for my and Helena Jeret Mäe's last Transpection Tuesday was Arguing for Exploratory Testing. What we basically wanted to achieve was to get better at explaining the pros (and cons) of exploratory testing, in a concise way, as well as identify common preconceptions about scripted versus exploratory testing.

Input
We had defined 15 subtopics such as time estimations, credibility and making sure the important testing is done. The first item on this list was traceability which turned out to be enough material to fill the whole 2 hour session.

What is Traceability
The first question was: What do we mean by traceability?

Our answer: Being able to track what has been tested, how, when and by whom.

Why do we want Traceability
The next question was why we want traceability. We quickly formed a list but reading it now makes me realize we mixed together traceability and claimed benefits of having a trunk of test cases. But anyway:
  • External demands
  • Ensure work has been performed
  • Base for further testing
  • Support handovers
  • Create a map
  • Reuse
General thoughts
One thing we got back to over and over again was: The best way (often related to level of detail) to achieve good enough traceability is highly context dependent! For example having a simple mind map with short comments is good enough for one company while another requires every session to be recorded with the recordings being stored and indexed together with session notes, debrief summaries and saved logs. It all depends!

Another recurring theme was: "But do we really achieve that kind of traceability with test cases?" I will not bring up those discussions much in this post but expect another one on "false assumptions about scripted and exploratory testing" soon.

Terms

Charter
Charter is basically an area to test, a way to break down a big testing mission. Notice though that as you test new charters might come up so it's by no means a definite plan. Read more >>

Test idea
Typically a one liner describing one or more tests you want to do. Read more >>

Session
A timeboxed, uninterrupted test sitting, typically 60-120 minutes. Read more >>

Debrief
Refers to an activity happening after a session where the tester explains what has been done to, for example, a test manager. This also includes clarifying questions, feedback and other kinds of dialog to help both parties learn from the session. Read more >>

Recording

We mainly refer to screen recording (video, either using a screen recording tool or an external video camera) but it could as well mean recording audio, saving logs/traces or other ways of saving what has been done. A good resource >>

External demands
This refers to regulated businesses (watch the excellent presentation What is good evidence by Griffin Jones), evidence in a potential lawsuit or customers demanding test data.

Possible solutions:
  • Record the sessions, preferably with configuration (device, version, settings etc.) explained if that matters. Adding commentary might improve the value as well (communicating purpose, observations etc.). This is also typically a scenario where logs/traces can be a required addition to a video recording. Once again, watch What is good evidence.
  • Store session notes
  • Store session summaries
  • Store charters
  • Store debrief summaries
  • Store test ideas (assuming they have been covered by your testing)
Creating support for finding old information (an index) seems key as well. For this, charters, time stamps and/or categories might be useful to tag your saved material with.
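As a rough illustration of such an index, here is a small Python sketch. The record fields, the example sessions, the file paths and the `find_sessions` helper are all hypothetical; a spreadsheet or a wiki table with the same columns would do the same job.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SessionRecord:
    charter: str                                    # what the session set out to test
    tester: str                                     # who did the testing
    started: datetime                               # when the session took place
    categories: set = field(default_factory=set)    # free-form tags
    artifacts: list = field(default_factory=list)   # notes, recordings, logs

def find_sessions(records, charter=None, category=None, since=None):
    """Return the records that match every filter that was given."""
    matches = []
    for record in records:
        if charter is not None and record.charter != charter:
            continue
        if category is not None and category not in record.categories:
            continue
        if since is not None and record.started < since:
            continue
        matches.append(record)
    return matches

# Made-up example data.
records = [
    SessionRecord("Login error handling", "tester A", datetime(2013, 9, 30, 9, 0),
                  {"login", "security"}, ["notes/login-1.txt", "video/login-1.mp4"]),
    SessionRecord("Login error handling", "tester B", datetime(2013, 10, 2, 13, 0),
                  {"login"}, ["notes/login-2.txt"]),
    SessionRecord("Export to CSV", "tester A", datetime(2013, 10, 3, 10, 0),
                  {"export"}, ["notes/export-1.txt"]),
]

security_sessions = find_sessions(records, category="security")
```

The point is only that each stored artifact can be traced back to what was tested, when and by whom; how much detail each record needs is, once again, context dependent.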

Ensure work has been performed
First question raised was: Is this really something we want to encourage? And our general answer is no; with the motivation that people in our experience tend to do things to look good rather than do what is needed/valuable when closely monitored. But being able to know that the testers actually do their job is closely connected to credibility and transparency so still a valid question.

Possible solutions:
  • Debriefs
  • Recordings
  • Notes
  • Bugs reported (a really bad metric for this but it can indicate something!)
Debriefs seemed to most often be the preferred approach. During a good debrief the person being debriefed asks follow-up questions that will require the person debriefing to explain the testing done. A byproduct of this process is that it shows whether the tester actually did a good job, or any job at all. But once again: if your focus is on monitoring, the people monitored (testers as well as non-testers) are likely to waste time proving the job has been done rather than actually working!

Base for further testing
Let's say we've finished the prepared scope or are suddenly given an extra week to test something. If we can't go back and use already executed tests as inspiration, how do we know where to continue?

Possible solutions:
  • Having a bulk of charters as inspiration
  • Make comments about testing you've left out in your finished charters/sessions
  • Review session notes
We also brought up whether there's any value in actually looking at what has been done. Often we found that the time it takes to analyse the work already done might not be worth it (the information being too detailed, making it hard to get an overview and learn from quickly). Simply exploring using knowledge we might not have had the first time, or having a different tester from when we first tested, is often more than enough to add value. After all, the time we spend analysing is time we cannot test (which might or might not be well invested).

Support handovers
One tester leaves (quits, parental leave, other tasks etc.) and another has to take over. How can we manage such a change when we don't have a set scope of test cases? First of all, the new tester does have to spend some time getting familiar with the feature in exploratory testing, but this is also true when using test cases since we, for instance, can't predict what problems we will run into and thus can't prepare instructions for those!

But we can make it easier:
  • Charters (with status)
  • Debrief
  • Documented test ideas with already investigated ideas being marked
  • Session notes or session summaries
  • Mind maps or other test planning with already tested parts commented
  • Documenting lessons learned (like operational instructions)
Debrief in this case refers to a general debrief of what has been done, what we know is left, problems seen, lessons learned, where information is stored, who to talk to etc. by the tester leaving. Of course if the switch happens very suddenly (e.g. sickness) performing this is not possible and in that case it's important testers are professional enough to document what has been done (mind maps, short plans, visualizations, debrief/session summaries, charters). This is once again true for both exploratory and scripted testing.

Create a map
A bulk of test cases combined with statuses can somewhat be used to draw a map of what has been covered and what is left to test. How can we visualize this without test cases?

Possible solutions:
  • Charters
  • A mind map describing what has been tested
  • A picture/model of our product with comments about testing/coverage
  • Other visualizations like diagrams
  • The Low Tech Dashboard
A few important notes:
  1. You sure have a map with test cases but is it actually anywhere near accurate? Say we have two equally complex functions. One takes 1 argument, one takes 10. We likely will have at least 10 times as many test cases to cover the second function. So if we execute all the test cases for the second function, have we really covered over 90% (with "covered" only considering these 2 functions)?
  2. Even if equally sized, that map would not cover what we didn't anticipate from the beginning so you still need to add an up to date judgement/evaluation (e.g. "wow that network protocol sure was more complex than we expected during the planning, we need more testing of it!").
  3. Scale is really important. Do we want to see Tartu, Estonia, Europe, the world or the Milky Way galaxy? We might need different visualizations to create all the maps we need (once again, think about value, how much time can we spare to keep these updated).
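The arithmetic behind the first note can be sketched like this in Python. The test case counts are assumptions taken from the example (1 test case for the one-argument function, 10 for the ten-argument one); the function names `A` and `B` are just labels.

```python
# Assumed numbers from the example: function A takes 1 argument, B takes 10.
test_cases = {"A": 1, "B": 10}
executed = {"A": 0, "B": 10}   # we ran everything for B and nothing for A

def coverage_by_test_count(total, done):
    """The usual 'map': executed test cases divided by all test cases."""
    return sum(done.values()) / sum(total.values())

def coverage_by_function(total, done):
    """Alternative view: every (equally complex) function weighs the same."""
    per_function = [done[name] / total[name] for name in total]
    return sum(per_function) / len(per_function)
```

With these numbers the test case count says roughly 91% is covered, while weighting the two equally complex functions the same says 50%. Neither number is "the truth"; the sketch only shows how much the map depends on what you choose to count.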
Reuse
Later a similar feature or a feature impacting the one we just tested is developed and we want to reuse the work previously done. How can we do this without test cases?

First of all, reuse is one of the places where test cases are powerful. However you have the minesweeper problem: If you walk the same lane in a mine field over and over, as new mines are constantly added, it's likely that the number of mines beside your narrow track starts to build up while few will happen to end up in your path. Meaning, running the same tests over and over is less likely to catch new bugs than creating new tests, so the value quickly diminishes (more tests executed is not equal to more valuable ground covered).
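The minesweeper problem can be played with in a toy simulation. This is only a sketch under heavy assumptions: the field is a row of numbered cells, exactly one new mine (bug) appears in a random cell each round, a test always finds a mine in a cell it visits, and found mines are removed. The names and sizes are made up.

```python
import random

def simulate(rounds, field_size, lane_size, vary_lane, seed=0):
    """Return (mines_found, cells_still_mined) after the given number of rounds."""
    rng = random.Random(seed)
    fixed_lane = set(range(lane_size))   # the same cells walked every round
    mines = set()                        # cells currently hiding a mine
    found = 0
    for _ in range(rounds):
        mines.add(rng.randrange(field_size))   # a new mine appears somewhere
        if vary_lane:
            lane = set(rng.sample(range(field_size), lane_size))
        else:
            lane = fixed_lane
        hit = mines & lane                      # mines we step on this round
        found += len(hit)
        mines -= hit
    return found, mines

found_fixed, left_fixed = simulate(200, 100, 10, vary_lane=False)
found_varied, left_varied = simulate(200, 100, 10, vary_lane=True)
```

In the fixed-lane run every mine that is left sits outside the lane and will never be found, no matter how many more rounds are executed. In the varied run old mines can still be stepped on later, which is the (simplified) argument for spending at least part of the effort on new tests.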

What we often would suggest is to use knowledge acquired the first time as foundation for new testing to speed it up. Think about the new risks introduced and what needs to be tested based on that (like with new functionality) rather than how old test cases might fit into your testing.

Possible solutions:
  • Reuse of charters
  • Reuse of test ideas
  • Look at old session notes / summaries
  • Use old recordings (the simpler the form of the recordings the better for this; watching several hours of screen recording is probably a waste)
  • Start a wiki page/document/similar for each feature and add lessons learned, where to find info, problems etc. as you test.
Summary
There are many ways of achieving traceability (and similar potential benefits of test case trunks) in exploratory testing. Session Based Test Management principles seem to be the most straightforward way, but keeping track of test ideas or using other approaches works as well. All have their own contexts where they seem to work best (e.g. SBTM might add too much overhead for a simple project).

All in all, if someone claims "You lose traceability with exploratory testing", ask what that person means more precisely (e.g. present testing data to customer) and explain the alternatives. Notice this is only based on our two hour discussion and there is a whole lot more to add so think for yourself as well! Also question whether you actually achieve the kind of traceability requested using a scripted approach and at what cost. Finally question if the requested traceability is actually worth its cost no matter if exploratory or scripted testing is used. Doing unnecessary work is wasteful no matter what approach you use.

Finally: There are still contexts where a highly scripted approach is likely the best option but the closer you get to a pure scripted approach the fewer and more extreme the contexts become.

Thank you for reading!

And thank you Helena, see you next week!