Why You Shouldn’t Give a Treat for Every Successful “Sit!”: The Pros and Cons of Fixed Versus Variable Reinforcement

Many dog owners believe the best way to train their dog is to reward a desired behaviour every single time, but we’ve found this is not necessarily the case. There are two main schedules of reinforcement, and each should be used at a different point in training to get the best results and a happy dog and handler.

In case you’re not familiar with the term, a schedule of reinforcement describes how often we deliver a reward when the dog performs a behaviour. To keep things simple, we will explain two general types of schedule, fixed and variable (or random), and how to use each one for the best results.

On a fixed schedule, we always reward the dog after he/she completes the same number of repetitions of the behaviour. On a fixed schedule of one, we reward every single repetition. On a fixed schedule of three, we reward after every third repetition (for example, after the dog performs three sits in a row).

When you are establishing a new behaviour, a fixed schedule of one is ideal. However, if we stay on this schedule for too long, it can actually work against us. If we reward every repetition, the behaviour will fade (become “extinct”) very quickly once the rewards stop. To see why, think of a vending machine. A vending machine is always on a fixed schedule of one: every time you put money in, a can of soda drops out. One day you come across a broken machine. You put money in, and nothing happens. You are really thirsty, so you try again, and still nothing happens, so you walk away. Your behaviour has stopped because the reward (the can of soda) no longer appears.

Your dog will behave in much the same way. If you reward every time your dog performs a behaviour and then suddenly switch to rewarding only every third time, the dog may stop offering the behaviour because the reward it has learned to expect did not arrive. This is why, once a behaviour is established, we need to move to a variable (random) schedule of reinforcement. On this schedule, the reward might come after one repetition, after a few, or after many. Because the dog cannot predict which repetition will pay off, it keeps performing the behaviour whenever it is asked.

However, we must make the switch to a variable schedule very gradually, or we risk the behaviour becoming extinct. We start by asking for two repetitions of the behaviour for one reward. For example, we ask the dog to sit but don’t reward; instead, we take a step back, get him/her to walk towards us and ask for another sit, and this time we reward. Right after that, we ask for another sit and reward immediately. Then we might ask for three sits before rewarding, then just one sit, then two, and so on.

A variable schedule works much like a slot machine. The gambler never knows when he/she will be rewarded, but keeps playing because a payout can come on any pull of the handle. The reinforcement varies both in the amount of money paid out and in how often it is delivered. As a result, the player always wants to pull the handle again, since there is always a chance of a win.

Your job is to become your dog’s slot machine. This will make him/her want to engage in a behaviour time after time, simply because there is always a chance of reward.

Happy training!