UCL logo

UCL Discovery

UCL home » Library Services » Electronic resources » UCL Discovery

Combining online and offline knowledge in UCT

Gelly, S; Silver, D; (2007) Combining online and offline knowledge in UCT. In: (pp. pp. 273-280).

Full text not available from this repository.


The UCT algorithm learns a value function online using sample-based search. The TD() algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 x 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 x 9 Go program. Each technique significantly improves MoGo's playing strength.

Type: Proceedings paper
Title: Combining online and offline knowledge in UCT
DOI: 10.1145/1273496.1273531
UCL classification: UCL > School of BEAMS
UCL > School of BEAMS > Faculty of Engineering Science
URI: http://discovery.ucl.ac.uk/id/eprint/1368113
Downloads since deposit
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item