Least-Squares Temporal Difference Learning, Boyan (ResearchIndex)

Document details from CiteSeerX (Isaac Councill, Lee Giles): TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance.

By Bagfields in Uncategorized
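
The incremental update described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's own code: it assumes a linear value function V(s) = θ·φ(s) and accumulating eligibility traces, which is one common way to write TD(λ). The names `td_lambda`, `dot`, and `one_hot`, and the two-state chain used as data, are hypothetical and chosen only for this example. Note that the stepsize `alpha` has to be supplied by the user, which is the tuning burden the abstract mentions.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(ui * vi for ui, vi in zip(u, v))


def td_lambda(episodes, features, n_features, alpha=0.1, gamma=1.0, lam=0.0):
    """Linear TD(lambda) policy evaluation with accumulating traces (a sketch).

    episodes: list of episodes; each episode is a list of
              (state, reward, next_state) transitions, with next_state set
              to None when the episode terminates.
    features: function mapping a state to a list of n_features floats.
    alpha:    fixed stepsize (assumed constant here; real use often needs
              a hand-tuned schedule).
    """
    theta = [0.0] * n_features
    for episode in episodes:
        z = [0.0] * n_features  # eligibility trace, reset at episode start
        for s, r, s_next in episode:
            phi = features(s)
            v = dot(theta, phi)
            # The value of a terminal state is taken to be zero.
            v_next = 0.0 if s_next is None else dot(theta, features(s_next))
            delta = r + gamma * v_next - v  # one-step TD error
            # Decay the trace, then add the current feature vector.
            z = [gamma * lam * zi + p for zi, p in zip(z, phi)]
            # Update the weights after this single observed transition.
            theta = [t + alpha * delta * zi for t, zi in zip(theta, z)]
    return theta


def one_hot(state, n=2):
    """Tabular (one-hot) features for states numbered 0..n-1."""
    return [1.0 if i == state else 0.0 for i in range(n)]


# Hypothetical two-state chain: 0 -> 1 (reward 0), 1 -> terminal (reward 1).
episode = [(0, 0.0, 1), (1, 1.0, None)]
theta = td_lambda([episode] * 500, one_hot, n_features=2, alpha=0.1, lam=0.0)
```

Each transition is used once and then discarded, so many passes over the same experience are needed before the estimates settle. That is the data inefficiency the abstract refers to.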