11 April 2016

Time estimates in testing, part 1

Why this blog post

Activity planning and time estimation in testing is complex to begin with; combine that with cross-functional teams (where you have to actively take others' planning into consideration) and I've seen a lot of confusion. In this blog post I hope to arm you with some useful ideas on how to deal with activity planning and time estimates in testing.

... and that turned out to be a much bigger task than I anticipated, so in this first part I will cover the overall concepts, and in the second part I'll provide more concrete explanations of each approach (see the graphic below).

Key attributes

When I started listing the various ways I've worked with estimates, I kept coming back to two variables that seemed central:
  1. To what degree do testers and developers share their planning?
  2. How much trust and effort are put into time estimates?
For this reason I made this little graphic:


Do note that these are still thoughts in progress, so I appreciate any feedback on whether this makes sense or not.

Separate planning vs Shared planning

I've been in teams where testers were expected to test only, or almost only, what had just been implemented (that is, testing the exact work described in the implementation stories, thus sharing the planning with developers), as well as teams where testers were expected to create their own planning based on the testing needs. Both extremes come with their own benefits and drawbacks:

Shared planning:
 + less administration (as in cheaper)
 + closer relationship between testers and developers
 + quick feedback loops in general
 + easier to get focus on testability improvements in the code

Separate planning:
 + more focus put on regression, system (e.g. consistency) and integration testing
 + easier to get work only affecting testers prioritized
 + planning can be optimized to fit testing

Two dangerous mind traps I've experienced:

Shared planning
Risk: Testing seen as something very simplistic where you just test the product's new features with no or little regard to the big picture.

Negative effect: General testing activities, such as overall performance testing, integration testing or regression testing, are deprioritized.

Implications: Bugs related to, e.g., integration, unforeseen legacy impacts or product inconsistency.
Coping: To deal with this, you need testers or other team members who are good at advocating for testing in general and can explain why certain test activities make sense even though no obviously affected code has changed.


Separate planning
Risk: Testers and developers drift apart and form two separate subteams within the team.

Negative effect: Hurts information sharing and understanding of each other's work (which in turn may reduce respect for one another's profession), and developers may lose their sense of responsibility for quality.

Implications: A longer feedback loop (code developed, bug reported, bug fixed, fix verified), lower quality of the code handed over from development, and testers lacking important information, which makes them less effective or leads them to make bad decisions.
Coping: A well-functioning team and good social management, testers who are good at communicating their work and at speaking with developers on the developers' terms (or vice versa), and developers taking a real interest in testers and what they do.

I'll elaborate a bit more on this in part 2.

Effort put into estimates

Let's talk about the two extremes first.

When we put a lot of effort into estimates, we may for instance split the work into small tasks to make them easier to estimate and try to give each one as exact a time estimate as possible, e.g. "these 6 tasks are estimated at 3, 4, 6, 3, 1 and 9 hours, so the time to test this feature should be around 26 hours".

Little effort is put into time estimates when we, for instance, accept an external time constraint, e.g. "get the testing ready for the release" or "finish your testing within this sprint", and instead tell whoever set the constraint roughly what we think the quality of the testing will be at that point.

Important: "Quality of the testing" in this case refers to how deeply/well we expect to be able to cover the various features. This is not the same as the "quality of the product". The latter is (typically) not what we talk about, because it is far too complex to estimate and often outside what we can actively influence. For instance, it's not our call whether a bug we've identified should actually be fixed; that's up to the stakeholder(s).

At this point the former might seem more reasonable, but it's not that simple...

Accuracy of estimates

There's no "done" in testing. This is sort of true for development as well (we can always continue to improve the code itself or polish the actual feature), but there we can at least observe the product and say "this is good enough", and, importantly, that state of "done" can be fairly well defined/described.

Even though some try to pretend otherwise, testing does not work like this at all.

Testing happens inside your head, and so does development (to a large degree). However, the product of testing also stays in your head and relies on your ability to communicate it, while the product of development can easily be observed by others. This complicates estimates, since we cannot anchor discussions around a fairly well-defined end product; instead we have to leave it to everyone's interpretation of the task and do our best to communicate our own interpretation, so that we at least roughly estimate the same thing. For this to work at all, testers first need to be conscious of what they actually do, need the communication skills to explain it to others, and the receivers must have enough understanding to interpret the message correctly. No human on earth (as far as I know) does even one of these things flawlessly (aka: communication is hard).

With no well-defined "done", everyone has to rely on their own interpretation of what needs to be done and of what the receiver is asking for. That in turn impacts the estimate's accuracy, but it's just part of the problem...

... the other part

On top of estimating a task we cannot clearly define we also have to estimate something that is inherently plagued with great levels of uncertainty:

The time it’ll take to test something depends on external factors such as the quality of the code, the complexity of the chosen design (which is often not set when estimates are made), the stability (up-time and robustness) of the necessary environments, the help we'll get to correct problems found or blocking issues, etc. To continue the reasoning, the quality of the code depends on things such as how hard something is to implement, the skill level of the developer(s) and how stressed they are (if a developer has a tight schedule, she may rush the design a bit to get it out the door in time, which in turn affects the quality, which in turn affects the effort required to reach the abstract "done state" when testing), and stress level depends on even more factors. I could go on, but let’s just say that the time it takes to test something to the “abstract, hard to reasonably accurately define, level we desire” depends on so many external factors that the uncertainty of the estimate is huge.

Bullshit!

I once told a project manager this and he replied: "Bullshit! My testers ace their estimates almost every time!" (in a product with pretty crappy quality and expensive testing, it should be added). And here comes a conflicting truth:

We can't predict when we'll be "done" but we can be "done" by Wednesday if you want.

In development we need to produce some kind of agreed-upon product based on, at least, our explicit requirements. Remember how testing does not have a defined "done"? Well, taken to its extreme, we can (almost) say we're done at any point in time, since it's unclear what done is anyhow: "These are the findings we've made so far (and are able to communicate). You told us to be "done" by today, so we are "done", but our professional judgement is that this needs more testing, as the following areas are still considered unstable or only superficially tested...". In a sense we always do this, estimates or not; without estimates we just don't spend time trying to guess, and instead do our best to reach a good enough state when the time is up.

... and before someone talks about bad attitude or lack of professionalism: this is of course not literally how it's done in practice. Some level of estimation is always performed, but rather than saying "We think this will take 10 hours, this 5 hours, this 2 hours and this 8 hours", we might say, for instance, "We think we're on track for the release, but the quality at this point is unusually low, so we're a bit worried this might change".

Planning based on time or effort

This topic is too big to cover fully in this post, but here's a quick rundown.

When planning our testing we can focus either on effort or on time. Effort means "we want to reach a state where we/someone else feels confident about this particular part before we leave it", while time means "we want to spend the fixed amount of time we have so that we've looked at everything at least briefly". With the former, we're doomed not to be "done" if something runs late, leaving project management little choice but to delay, e.g., a release if they don't want to take on considerable risk (leaving parts completely untested). On the other hand, it makes it easier to argue for more testing ("we haven't even looked at these four features"), and we spend less time revisiting areas and having to "relearn" them, since each area is left closer to a "done state".

With the latter, we will hopefully have found at least the most obvious serious problems in all areas of the product and can thus make a general assessment: "our testing is fairly shallow in most areas and we would like to spend more time with the product, but from what we've seen all of the new features seem to be okay at a basic level". The drawbacks are that it's harder to argue that we actually need more time to test if stakeholders aren't well informed about, and comfortable with, the concept of "uncertainty", and that there's a greater risk of needing to revisit areas for further testing.
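To make the difference between the two approaches a bit more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: it pretends each area has a known number of hours it "needs" before it feels done, which real testing never gives us (that's the whole point of this post), and the area names, hours, budget and the helpers effort_based and time_based are all hypothetical and purely illustrative.

```python
# Toy model: how a fixed time budget gets spent under the two planning approaches.
# ASSUMPTION: each area has a known number of hours "needed" to feel confident,
# which real testing never gives us; all names and numbers are hypothetical.
from typing import Dict


def effort_based(needs: Dict[str, int], budget: int) -> Dict[str, int]:
    """Depth-first: stay with each area until it reaches its 'done state'."""
    spent: Dict[str, int] = {}
    for area, hours_needed in needs.items():
        allocation = min(hours_needed, budget)  # only what is left once time runs short
        spent[area] = allocation
        budget -= allocation
    return spent


def time_based(needs: Dict[str, int], budget: int) -> Dict[str, int]:
    """Breadth-first: one hour per area per pass, so every area gets looked at early."""
    spent = {area: 0 for area in needs}
    while budget > 0 and any(spent[a] < needs[a] for a in needs):
        for area in needs:
            if budget == 0:
                break
            if spent[area] < needs[area]:
                spent[area] += 1
                budget -= 1
    return spent


if __name__ == "__main__":
    # Hypothetical areas: 14 hours "needed" in total, but only 8 hours available.
    needs = {"login": 4, "search": 3, "checkout": 5, "reports": 2}
    print("effort-based:", effort_based(needs, 8))
    print("time-based:", time_based(needs, 8))
```

With these made-up numbers, the effort-based plan gives {'login': 4, 'search': 3, 'checkout': 1, 'reports': 0}, so "reports" is never touched, while the time-based plan gives every area 2 hours: nothing is left completely untested, but nothing is tested deeply either.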

How does this relate to estimates? Well, in my experience the effort approach prevails when we have estimates and plan our work around the actual stories, since "we shouldn't" have too many open stories at once, and once they are closed they are closed, so we have to "complete them". Likewise, the time approach is more common when we skip estimates (again, my experience), at least in combination with separate test planning. If we have a set deadline (e.g. sprint end or release date), we can more naturally plan so that we've at least looked at everything once.

I mention this because most stakeholders I've been in contact with seem to prefer the time approach but still argue for the importance of estimates, and these two positions seem a bit contradictory, at least in the context of testing. One last note: the contradiction also applies to development (if we get a bit of everything ready, we will at least have some kind of product in the end), but since more has to be in place for a piece of software to make sense, and since, as described earlier, development estimates should be easier to get accurate enough, the contradiction is not as big a deal there (my biased opinion; feel free to challenge this).

Wrap-up: estimate or not

To wrap up:

Since time estimates for testing will always be plagued by great uncertainty, no matter how much effort goes into them, the question is whether detailed estimates really provide enough additional value to justify their cost (both the time spent and the risk of misleading stakeholders into believing we know when we'll be "done"). The "ideal" level of estimation is also highly context dependent, the context changes over time, and we can't really measure objectively whether our approach is good or bad, so we'll have to experiment and rely on our ability to observe and analyze.

... and finally, don't confuse time estimates with planning: planning is done in both cases, and the effort spent on planning has little or no correlation with the effort spent estimating (my experience).

Stay tuned for part 2...