Offline Reinforcement Learning — CQL

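CQL (Conservative Q-Learning) penalizes Q(s, a) for out-of-distribution actions, i.e. actions that do not appear in the dataset, so that the learned Q-function does not overestimate their value.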
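
As a rough illustration of the penalty, the sketch below computes one common form of the CQL regularizer for a discrete action space: the log-sum-exp of Q over all actions minus Q at the action actually logged in the dataset, averaged over a batch. The function names (`logsumexp`, `cql_penalty`, `cql_loss`), the weight `alpha`, and the list-of-lists layout for Q-values are assumptions made for this example, not something specified in the notes.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values: Sequence[Sequence[float]],
                dataset_actions: Sequence[int]) -> float:
    """Batch mean of logsumexp_a Q(s, a) - Q(s, a_data).

    q_values[i][a] is Q(s_i, a) for every discrete action a (assumed layout);
    dataset_actions[i] is the action logged in the dataset for state s_i.
    """
    if len(q_values) != len(dataset_actions):
        raise ValueError("q_values and dataset_actions must have equal length")
    total = 0.0
    for q_row, action in zip(q_values, dataset_actions):
        # Push down Q on all actions (soft maximum), push up Q on the logged one.
        total += logsumexp(q_row) - q_row[action]
    return total / len(q_values)


def cql_loss(td_loss: float,
             q_values: Sequence[Sequence[float]],
             dataset_actions: Sequence[int],
             alpha: float = 1.0) -> float:
    """Standard TD loss plus the conservative penalty, weighted by alpha (hypothetical default)."""
    return td_loss + alpha * cql_penalty(q_values, dataset_actions)


if __name__ == "__main__":
    # The unseen action (index 1) has an inflated Q-value, so the penalty is large.
    inflated = cql_penalty([[0.0, 5.0]], [0])
    # Here the logged action already has the highest Q-value, so the penalty is small.
    conservative = cql_penalty([[5.0, 0.0]], [0])
    print(inflated > conservative)
```

The penalty is never negative, because the log-sum-exp is at least as large as any single Q-value; minimizing it lowers Q on actions the dataset does not support relative to those it does.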