Before we start, I’d like to remind you that the opinions expressed in this blog are my own and do not necessarily reflect those of my colleagues.
With that out of the way – I think punishment gets a bad rap. Wait, wait, it’s not what you think! We’re not going into that kind of territory on this blog . . .
Skinner, of Skinner-Box fame, has framed a lot of our thinking about how we train animals. Skinner used the term ‘operant conditioning’ because he believed that the internal motivations weren’t the only things that shaped behaviour – that we learned from our environment, specifically that our behaviours influence the environment and generate consequences, and that we learn from this.
Now, admittedly, Skinner gives internal motivations short shrift. It’s worth pointing out I’ve made a career of measuring the outcome of internal x external motivations and the influence this has on the probability of behaviour. Internal motivations are important, but that’s probably a post for another day. Let’s talk about Skinner and his box first.
A rat in a box. Two levers. One lever, when pressed, gives food. The other lever, when pressed, shocks the rat. Understandably, the rat learns to press lever one and avoid lever two. The environment ‘trains’ the animal to perform certain behaviours.
At this point I’m going to take a short diversion. One of the reasons I’m doing this blog post is to try and get my head around how to teach this in a more effective way, since it always causes student confusion.
Let’s forget about Skinner for a moment and just focus on two things.
The first is ‘reinforcement’. Whenever you ‘reinforce’ a behaviour, you’re increasing the likelihood of the animal performing the behaviour again. The second is ‘punishment’. Whenever you ‘punish’ a behaviour, you’re decreasing the likelihood of the animal performing the behaviour.
Going back to the rat in the box. It’s showing two behaviours: it’s pressing lever one a lot, so that behaviour must be being reinforced. It’s not pressing lever two at all, so that behaviour must be being punished.
The question now is how are these behaviours being either reinforced or punished?
We use the words positive and negative to talk about this, but not in a qualitative good/bad way. Instead I think students would find it easier to think of it as ‘additive’ and ‘subtractive’, the only problem with this being that then they wouldn’t be using the same terminology as the rest of the world.
Positive Reinforcement gives the animal something to encourage the animal to perform the behaviour again. For example, when a dog sits on command it receives a treat. The behaviour being reinforced is the ‘sit’, the treat is the positive addition.
Negative Reinforcement takes something away from the animal to encourage it to perform a behaviour again. The something that we subtract has to be unpleasant for the animal so that they are rewarded by its removal (hence encouraged to do the behaviour again). A common animal example of negative reinforcement is pushing a dog’s bottom to encourage it to sit. When the animal sits (the behaviour we want to reinforce), the aversive stimulus (pushing) is subtracted.
Positive Punishment gives the animal something to discourage the animal from performing the behaviour again. Similar to the above example, in order to discourage the animal the stimulus we are adding should be unpleasant. A common animal example would be jerking the leash of a dog that’s pulling. The pulling is the behaviour we want to punish (decrease), and the leash jerk is the aversive stimulus we add.
Negative Punishment takes away something from the animal to discourage the animal from performing the behaviour again. If you’ve been following along you’ve probably guessed we have to take away something that the animal would want or desire. A common animal example would be a dog that barks when it greets its owner. The owner ignores it (removes the desired attention) and the behaviour decreases.
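If it helps to see the four definitions above as a grid, here is a rough Python sketch. The names (`Change`, `Effect`, `quadrant`) are my own inventions for illustration, not standard terminology from any library; the only idea it encodes is that ‘positive/negative’ describes whether something is added or taken away, and ‘reinforcement/punishment’ describes whether the behaviour becomes more or less likely.

```python
from enum import Enum


class Change(Enum):
    """What happens to the stimulus (hypothetical names, for illustration only)."""
    ADDED = "positive"    # something is given to the animal
    REMOVED = "negative"  # something is taken away from the animal


class Effect(Enum):
    """What happens to the likelihood of the behaviour."""
    INCREASES = "reinforcement"
    DECREASES = "punishment"


def quadrant(change: Change, effect: Effect) -> str:
    """Name the operant conditioning quadrant for a change/effect pair."""
    return f"{change.value} {effect.value}"


# The four dog examples from the text, one per quadrant.
examples = {
    "treat given when the dog sits": (Change.ADDED, Effect.INCREASES),
    "pushing stops when the dog sits": (Change.REMOVED, Effect.INCREASES),
    "leash jerked when the dog pulls": (Change.ADDED, Effect.DECREASES),
    "attention withdrawn when the dog barks": (Change.REMOVED, Effect.DECREASES),
}

for description, (change, effect) in examples.items():
    print(f"{description}: {quadrant(change, effect)}")
```

Notice that nothing in the sketch asks whether the stimulus is pleasant or unpleasant: the label only depends on the direction of the change and the direction of the effect.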
To further confuse matters however, sometimes these are classed into ‘aversive training’, which would include negative reinforcement and positive punishment (because the stimulus in both these cases is aversive, or unpleasant), and ‘reward-based training’, which includes positive reinforcement and negative punishment (because the stimulus in both these cases is rewarding, or pleasant).
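The aversive versus reward-based grouping can be sketched the same way. This is only a minimal illustration of the classification described above; the function name `training_family` and its string arguments are hypothetical, chosen for this example.

```python
def training_family(sign: str, kind: str) -> str:
    """Group a quadrant into 'aversive' or 'reward-based' training.

    sign: 'positive' (stimulus added) or 'negative' (stimulus removed)
    kind: 'reinforcement' (behaviour increases) or 'punishment' (behaviour decreases)
    """
    if sign not in ("positive", "negative"):
        raise ValueError(f"unknown sign: {sign!r}")
    if kind not in ("reinforcement", "punishment"):
        raise ValueError(f"unknown kind: {kind!r}")

    # The stimulus is unpleasant when removing it is rewarding
    # (negative reinforcement) or when adding it is discouraging
    # (positive punishment).
    aversive_quadrants = {
        ("negative", "reinforcement"),
        ("positive", "punishment"),
    }
    return "aversive" if (sign, kind) in aversive_quadrants else "reward-based"


for sign in ("positive", "negative"):
    for kind in ("reinforcement", "punishment"):
        print(f"{sign} {kind}: {training_family(sign, kind)}")
```

The point of writing it out is that the grouping cuts across the reinforcement/punishment split: each family contains one reinforcement quadrant and one punishment quadrant.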
Where it gets really complicated, in my opinion, is where people start to believe that one type of conditioning, or one kind of training, is by far superior to the others. ‘Reward-based’ training is usually the one that most animal welfare people are keen on (for obvious reasons, I should hope!). They cite papers such as Herron et al. (2008), which show that confrontational training in dogs increases aggression. This has resulted in something odd where trainers will start saying things like “aggression should never be punished”. In training terms, where ‘punishing’ a behaviour simply means making it less likely, this would mean you never reduce the incidence of aggression being shown!
Positive punishment is the ‘worst’ of the aversive training methods by this thinking – but let me give you an example I’ve been using with Athena. When she arrived she had a terrible habit of chewing electrical cables. It was very worrying. I would scold her with an unpleasant voice (positive punishment!) and I would distract her with toys, but still she would do it. I ended up slathering chilli powder and vaseline over the most attractive cables so when she would start to mouth at the cables, she would receive an immediate aversive stimulus. This is positive punishment, an aversive stimulus used to decrease the occurrence of an undesirable behaviour.
So there is definitely a place for positive punishment – where it’s applied correctly. The chilli powder example works because the aversive stimulus is encountered the moment the undesirable behaviour begins, and stopping the behaviour quickly stops the stimulus presenting itself.
I also use negative punishment with Athena. Sometimes when we’re playing she will want to bite and scratch my hand. When this happens I let my hand go limp and stop playing with her. No matter how hard she bites, I don’t resume play. Play in this case is the reward, and my attention/play is removed when she starts displaying the undesirable behaviour. With this one, something else happens too. When she calms down and behaves gently again, play resumes. The good behaviour is reinforced by adding the desired stimulus (my attention/play) when it is performed. The combination of negative punishment and positive reinforcement here means that even though she’s getting bigger, her playing remains gentle and fun for both of us.
It’s impossible for any animal (humans included) to learn without encountering all four of these aspects. Aversive training is by definition unpleasant, but it can be appropriate to use. Take my positive punishment example. The consequences of Athena continuing the cable chewing behaviour were dire. The aversive stimulus added was relatively mild and came with warning: I think she only actually chewed a chilli cable once, and for the most part the smell was enough to make her decide otherwise. She also had a huge amount of choice about the situation: there were plenty of other things to play with (and she would be rewarded for playing with those other things), and the aversive stimulus was well defined (it was on the actual cable, so there was no real way of accidentally encountering it). Importantly, the punishment wasn’t perceived as coming from me, so our bond and her trust in me were also protected. Finally, I only needed to apply the paste once. Now that the behaviour has reduced, we can use an even milder positive punisher (me saying ‘no’ in a loud, stern voice) if she attempts it again.
I am sure no trainer would ever say that the ‘no in a loud, firm voice’ is inhumane, but it is a positive punishment. To say all punishment is bad is to further confuse the operant conditioning theory.
Your final exam, therefore, is to tell me – in the case of the rat with the two levers, how was it being trained? 😉
Edited to add – make sure you read Kathy’s comment below, very insightful!