<> <http://www.w3.org/2000/01/rdf-schema#comment> "The repository administrator has not yet configured an RDF license."^^<http://www.w3.org/2001/XMLSchema#string> . <> <http://xmlns.com/foaf/0.1/primaryTopic> <https://discovery.ucl.ac.uk/id/eprint/10115794> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/AcademicArticle> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Article> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/title> "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/ontology/bibo/abstract> "Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula-rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimisation problems, such as the travelling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of two-dimensional and three-dimensional bin packing problems show that it outperforms generic Monte Carlo tree search, heuristic algorithms and integer programming solvers. We also present an analysis of the ranked reward mechanism, in particular, the effects of problem instances with varying difficulty and different ranking thresholds."^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/date> "2018" . <https://discovery.ucl.ac.uk/id/document/1220504> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Document> . <https://discovery.ucl.ac.uk/id/publication/ext-90d9649c1b572ff65191b89a7924747a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Collection> . <https://discovery.ucl.ac.uk/id/publication/ext-90d9649c1b572ff65191b89a7924747a> <http://xmlns.com/foaf/0.1/name> "Advances in Neural Information Processing Systems 31 (NeurIPS 2018)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/isPartOf> <https://discovery.ucl.ac.uk/id/publication/ext-90d9649c1b572ff65191b89a7924747a> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/ontology/bibo/status> <http://purl.org/ontology/bibo/status/published> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-592c878c491fcc000b695da4fbcf6baa> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/ontology/bibo/authorList> <https://discovery.ucl.ac.uk/id/eprint/10115794#authors> . <https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <https://discovery.ucl.ac.uk/id/person/ext-592c878c491fcc000b695da4fbcf6baa> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-06a28ec7b2dbb947d1812db0b294b9eb> .
<https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> <https://discovery.ucl.ac.uk/id/person/ext-06a28ec7b2dbb947d1812db0b294b9eb> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-3e26e4384e4f855ddfbdcd56b6734dc4> . <https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3> <https://discovery.ucl.ac.uk/id/person/ext-3e26e4384e4f855ddfbdcd56b6734dc4> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-bace709a7a040496d812a1040f7c62a0> . <https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_4> <https://discovery.ucl.ac.uk/id/person/ext-bace709a7a040496d812a1040f7c62a0> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-665fc3ea518d408460ac5c2886dcbb73> . <https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_5> <https://discovery.ucl.ac.uk/id/person/ext-665fc3ea518d408460ac5c2886dcbb73> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-34a8c4508ae55445c0eab793a6190c11> .
<https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_6> <https://discovery.ucl.ac.uk/id/person/ext-34a8c4508ae55445c0eab793a6190c11> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-895cc71bc674d771c5e7d13162d80d18> . <https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_7> <https://discovery.ucl.ac.uk/id/person/ext-895cc71bc674d771c5e7d13162d80d18> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-e9d47130cf2775890399af5774543c9f> . <https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_8> <https://discovery.ucl.ac.uk/id/person/ext-e9d47130cf2775890399af5774543c9f> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-7e7b1379f7adb5d147851a2c1910e979> . <https://discovery.ucl.ac.uk/id/eprint/10115794#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_9> <https://discovery.ucl.ac.uk/id/person/ext-7e7b1379f7adb5d147851a2c1910e979> . <https://discovery.ucl.ac.uk/id/person/ext-06a28ec7b2dbb947d1812db0b294b9eb> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<https://discovery.ucl.ac.uk/id/person/ext-06a28ec7b2dbb947d1812db0b294b9eb> <http://xmlns.com/foaf/0.1/givenName> "Y"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-06a28ec7b2dbb947d1812db0b294b9eb> <http://xmlns.com/foaf/0.1/familyName> "Fu"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-06a28ec7b2dbb947d1812db0b294b9eb> <http://xmlns.com/foaf/0.1/name> "Y Fu"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-bace709a7a040496d812a1040f7c62a0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-bace709a7a040496d812a1040f7c62a0> <http://xmlns.com/foaf/0.1/givenName> "A-S"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-bace709a7a040496d812a1040f7c62a0> <http://xmlns.com/foaf/0.1/familyName> "Cohen"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-bace709a7a040496d812a1040f7c62a0> <http://xmlns.com/foaf/0.1/name> "A-S Cohen"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-665fc3ea518d408460ac5c2886dcbb73> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-665fc3ea518d408460ac5c2886dcbb73> <http://xmlns.com/foaf/0.1/givenName> "D"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-665fc3ea518d408460ac5c2886dcbb73> <http://xmlns.com/foaf/0.1/familyName> "Kas"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-665fc3ea518d408460ac5c2886dcbb73> <http://xmlns.com/foaf/0.1/name> "D Kas"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-34a8c4508ae55445c0eab793a6190c11> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . 
<https://discovery.ucl.ac.uk/id/person/ext-34a8c4508ae55445c0eab793a6190c11> <http://xmlns.com/foaf/0.1/givenName> "K"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-34a8c4508ae55445c0eab793a6190c11> <http://xmlns.com/foaf/0.1/familyName> "Hajjar"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-34a8c4508ae55445c0eab793a6190c11> <http://xmlns.com/foaf/0.1/name> "K Hajjar"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-e9d47130cf2775890399af5774543c9f> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-e9d47130cf2775890399af5774543c9f> <http://xmlns.com/foaf/0.1/givenName> "A"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-e9d47130cf2775890399af5774543c9f> <http://xmlns.com/foaf/0.1/familyName> "Kerkeni"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-e9d47130cf2775890399af5774543c9f> <http://xmlns.com/foaf/0.1/name> "A Kerkeni"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-7e7b1379f7adb5d147851a2c1910e979> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-7e7b1379f7adb5d147851a2c1910e979> <http://xmlns.com/foaf/0.1/givenName> "K"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-7e7b1379f7adb5d147851a2c1910e979> <http://xmlns.com/foaf/0.1/familyName> "Beguir"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-7e7b1379f7adb5d147851a2c1910e979> <http://xmlns.com/foaf/0.1/name> "K Beguir"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-895cc71bc674d771c5e7d13162d80d18> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . 
<https://discovery.ucl.ac.uk/id/person/ext-895cc71bc674d771c5e7d13162d80d18> <http://xmlns.com/foaf/0.1/givenName> "TS"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-895cc71bc674d771c5e7d13162d80d18> <http://xmlns.com/foaf/0.1/familyName> "Dahl"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-895cc71bc674d771c5e7d13162d80d18> <http://xmlns.com/foaf/0.1/name> "TS Dahl"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-592c878c491fcc000b695da4fbcf6baa> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-592c878c491fcc000b695da4fbcf6baa> <http://xmlns.com/foaf/0.1/givenName> "A"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-592c878c491fcc000b695da4fbcf6baa> <http://xmlns.com/foaf/0.1/familyName> "Laterre"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-592c878c491fcc000b695da4fbcf6baa> <http://xmlns.com/foaf/0.1/name> "A Laterre"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-3e26e4384e4f855ddfbdcd56b6734dc4> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-3e26e4384e4f855ddfbdcd56b6734dc4> <http://xmlns.com/foaf/0.1/givenName> "MK"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-3e26e4384e4f855ddfbdcd56b6734dc4> <http://xmlns.com/foaf/0.1/familyName> "Jabri"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-3e26e4384e4f855ddfbdcd56b6734dc4> <http://xmlns.com/foaf/0.1/name> "MK Jabri"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/EPrint> . 
<https://discovery.ucl.ac.uk/id/eprint/10115794> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/ArticleEPrint> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/terms/isPartOf> <https://discovery.ucl.ac.uk/id/repository> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220504> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/1220504> <http://www.w3.org/2000/01/rdf-schema#label> "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization (Text)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://purl.org/dc/elements/1.1/hasVersion> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://eprints.org/ontology/hasPublished> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220504> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10115794/1/1807.01672v3.pdf> . <https://discovery.ucl.ac.uk/id/document/1220504> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10115794/1/1807.01672v3.pdf> . <https://discovery.ucl.ac.uk/id/eprint/10115794/1/1807.01672v3.pdf> <http://www.w3.org/2000/01/rdf-schema#label> "1807.01672v3.pdf"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1220505> . <https://discovery.ucl.ac.uk/id/document/1220505> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . 
<https://discovery.ucl.ac.uk/id/document/1220505> <http://www.w3.org/2000/01/rdf-schema#label> "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/document/1220505> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220505> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220505> <http://eprints.org/relation/islightboxThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220505> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10115794/2/lightbox.jpg> . <https://discovery.ucl.ac.uk/id/document/1220505> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10115794/2/lightbox.jpg> . <https://discovery.ucl.ac.uk/id/eprint/10115794/2/lightbox.jpg> <http://www.w3.org/2000/01/rdf-schema#label> "lightbox.jpg"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1220506> . <https://discovery.ucl.ac.uk/id/document/1220506> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/1220506> <http://www.w3.org/2000/01/rdf-schema#label> "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/document/1220506> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220506> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . 
<https://discovery.ucl.ac.uk/id/document/1220506> <http://eprints.org/relation/ispreviewThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220506> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10115794/3/preview.jpg> . <https://discovery.ucl.ac.uk/id/document/1220506> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10115794/3/preview.jpg> . <https://discovery.ucl.ac.uk/id/eprint/10115794/3/preview.jpg> <http://www.w3.org/2000/01/rdf-schema#label> "preview.jpg"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1220507> . <https://discovery.ucl.ac.uk/id/document/1220507> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/1220507> <http://www.w3.org/2000/01/rdf-schema#label> "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/document/1220507> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220507> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220507> <http://eprints.org/relation/ismediumThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220507> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10115794/4/medium.jpg> . <https://discovery.ucl.ac.uk/id/document/1220507> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10115794/4/medium.jpg> . 
<https://discovery.ucl.ac.uk/id/eprint/10115794/4/medium.jpg> <http://www.w3.org/2000/01/rdf-schema#label> "medium.jpg"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1220508> . <https://discovery.ucl.ac.uk/id/document/1220508> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/1220508> <http://www.w3.org/2000/01/rdf-schema#label> "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/document/1220508> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220508> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220508> <http://eprints.org/relation/issmallThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220508> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10115794/5/small.jpg> . <https://discovery.ucl.ac.uk/id/document/1220508> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10115794/5/small.jpg> . <https://discovery.ucl.ac.uk/id/eprint/10115794/5/small.jpg> <http://www.w3.org/2000/01/rdf-schema#label> "small.jpg"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1220509> . <https://discovery.ucl.ac.uk/id/document/1220509> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . 
<https://discovery.ucl.ac.uk/id/document/1220509> <http://www.w3.org/2000/01/rdf-schema#label> "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/document/1220509> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220509> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220509> <http://eprints.org/relation/isIndexCodesVersionOf> <https://discovery.ucl.ac.uk/id/document/1220504> . <https://discovery.ucl.ac.uk/id/document/1220509> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10115794/6/indexcodes.txt> . <https://discovery.ucl.ac.uk/id/document/1220509> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10115794/6/indexcodes.txt> . <https://discovery.ucl.ac.uk/id/eprint/10115794/6/indexcodes.txt> <http://www.w3.org/2000/01/rdf-schema#label> "indexcodes.txt"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10115794> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <https://discovery.ucl.ac.uk/id/eprint/10115794/> . <https://discovery.ucl.ac.uk/id/eprint/10115794/> <http://purl.org/dc/elements/1.1/title> "HTML Summary of #10115794 \n\nRanked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization\n\n" . <https://discovery.ucl.ac.uk/id/eprint/10115794/> <http://purl.org/dc/elements/1.1/format> "text/html" . <https://discovery.ucl.ac.uk/id/eprint/10115794/> <http://xmlns.com/foaf/0.1/primaryTopic> <https://discovery.ucl.ac.uk/id/eprint/10115794> .