Why variance of Importance Sampling off-policy gradient goes to infinity exponentially fast?
It is said in the lectures here at 11:30 that because the importance sampling weight is going to zero exponentially fast then the variance of the gradient will also go to infinity exponentially fast. Why is that? I do not understand what causes this problem?