eprintid: 1433098
rev_number: 38
eprint_status: archive
userid: 608
dir: disk0/01/43/30/98
datestamp: 2014-07-08 11:32:13
lastmod: 2021-10-28 22:18:24
status_changed: 2014-07-08 11:32:13
type: conference_item
metadata_visibility: show
item_issues_count: 0
creators_name: Szabo, Z
creators_name: Gretton, A
creators_name: Póczos, B
creators_name: Sriperumbudur, B
title: Distribution Regression - the Set Kernel Heuristic is Consistent
ispublished: pub
divisions: UCL
divisions: B02
divisions: C08
divisions: D76
note: http://arxiv.org/abs/1402.1754
abstract: Bag-of-features (BoF) representations are omnipresent in machine learning; for example, an image can be described by a bag of visual features, a document might be considered as a bag of words, or a molecule can be handled as a bag of its different configurations. Set kernels (also called multi-instance or ensemble kernels; Gaertner 2002), which define the similarity of two bags as the average pairwise point similarity between the sets, are among the most widely applied tools for handling problems based on such BoF representations. Despite the wide applicability of set kernels, even the most fundamental theoretical questions, such as their consistency in specific learning tasks, remain unanswered. In my talk, I am going to focus on the distribution regression problem: regressing from a probability distribution to a real-valued response. By considering the mean embeddings of the distributions, this is a natural generalization of set kernels to the infinite sample limit: the bags can be seen as i.i.d. (independent identically distributed) samples from a distribution. We will propose an algorithmically simple ridge regression based solution for distribution regression and prove its consistency under fairly mild conditions (for probability distributions defined on locally compact Polish spaces). As a special case, we give a positive answer to a 12-year-old open question, the consistency of set kernels in regression. We demonstrate the efficiency of the studied ridge regression technique on (i) supervised entropy learning, and (ii) aerosol prediction based on satellite images.
date: 2014-05
official_url: http://www.csml.ucl.ac.uk/events/164
vfaculties: VFLS
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_source: Manually entered
elements_id: 956377
lyricists_name: Gretton, Arthur
lyricists_name: Szabo, Zoltan
lyricists_id: AGRET87
lyricists_id: ZSZAB96
full_text_status: public
pres_type: presentation
event_title: CSML Lunch Talk Series
event_location: London, UK
event_dates: 2014-05-02 - 2014-05-02
citation: Szabo, Z; Gretton, A; Póczos, B; Sriperumbudur, B; (2014) Distribution Regression - the Set Kernel Heuristic is Consistent. Presented at: CSML Lunch Talk Series, London, UK. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/1433098/1/Zoltan_Szabo_invited_talk_Distribution_Regression_the_Set_Kernel_Heuristic_is_Consistent_02_05_2014.pdf