Why does the variance of the importance sampling off-policy gradient go to infinity exponentially fast?

https://preview.redd.it/8eed729klj771.png?width=1058&format=png&auto=webp&s=584d2e744d436d4b75e2e40d69e58e6d14cbcd9a

The lecture here says, at 11:30, that because the importance sampling weight goes to zero exponentially fast, the variance of the gradient also goes to infinity exponentially fast. Why is that? I do not understand what causes this problem.
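One way to see how both statements can be true at once: the off-policy weight for a trajectory of length T is a product of per-step ratios, W_T = r_1 · r_2 · ... · r_T with r_t = pi(a_t|s_t) / mu(a_t|s_t). Under the simplifying assumption that the r_t are i.i.d. under the behaviour policy mu, E[W_T] = 1 for every T, but E[W_T^2] = c^T with c = E_mu[r^2] >= 1 (equality only when pi = mu), so Var[W_T] = c^T - 1 grows exponentially. At the same time E[log r] <= 0 by Jensen's inequality, so a *typical* weight behaves like exp(T * E[log r]) and shrinks to zero exponentially. Most trajectories therefore get almost no weight, and the mean of 1 is held up by a few enormous weights; since the gradient estimate is multiplied by W_T, it inherits that variance. The sketch below checks this on a hypothetical single-state, two-action toy problem (the policies `MU` and `PI` and all function names are made up for illustration, not taken from the lecture).

```python
import math
import random

# Hypothetical toy setup (assumption): one state, two actions, actions drawn i.i.d. each step.
MU = [0.5, 0.5]  # behaviour policy that generated the data
PI = [0.8, 0.2]  # target policy whose gradient we want


def per_step_moments(pi, mu):
    """Return E_mu[r], E_mu[r^2] and E_mu[log r] for the per-step ratio r = pi(a)/mu(a)."""
    m1 = sum(m * (p / m) for p, m in zip(pi, mu))
    m2 = sum(m * (p / m) ** 2 for p, m in zip(pi, mu))
    mlog = sum(m * math.log(p / m) for p, m in zip(pi, mu))
    return m1, m2, mlog


def exact_weight_stats(T, pi=PI, mu=MU):
    """Exact mean, variance and 'typical' (geometric-mean) size of W_T = prod_t r_t."""
    m1, m2, mlog = per_step_moments(pi, mu)
    mean = m1 ** T                # stays 1 for every T
    var = m2 ** T - mean ** 2     # c**T - 1 with c = E[r^2] > 1: exponential growth
    typical = math.exp(T * mlog)  # exp(T * E[log r]) -> 0 because E[log r] < 0
    return mean, var, typical


def sampled_weights(T, n, rng, pi=PI, mu=MU):
    """Monte Carlo: draw n length-T action sequences under mu and return their IS weights."""
    actions = list(range(len(mu)))
    weights = []
    for _ in range(n):
        w = 1.0
        for _ in range(T):
            a = rng.choices(actions, weights=mu)[0]
            w *= pi[a] / mu[a]
        weights.append(w)
    return weights


if __name__ == "__main__":
    for T in (1, 10, 50, 100):
        mean, var, typical = exact_weight_stats(T)
        print(f"T={T:3d}  E[W]={mean:.3f}  Var[W]={var:.3e}  typical W={typical:.3e}")
    ws = sorted(sampled_weights(50, 10_000, random.Random(0)))
    print("median sampled W at T=50:", ws[len(ws) // 2])
    print("share of total weight in the top 1% of samples:", sum(ws[-100:]) / sum(ws))
```

Here c = E[r^2] = 1.36 and E[log r] = log 0.8, so the variance grows like 1.36^T while the typical weight decays like 0.8^T. Real MDPs have state-dependent, non-i.i.d. ratios, but the same product structure is what drives the blow-up.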

Madison Howard

miladink