LEARN Toolkit of Best Practice for Research Data Management

Research data is the new currency of the digital age. From sonnets to statistics, and genes 
to geodata, the amount of material being created and stored is growing exponentially. 
The LERU Roadmap for Research Data identifies a serious gap in the level of preparation 
amongst research performing organisations. This gulf is prominent in areas such as policy 
development, awareness of current issues, skills development, training, costs, community 
building, governance, disciplinary/legal/terminological and geographical differences. 
LEARN will help decision and policy makers identify sound solutions. 
Stakeholders can draw on the case studies in this LEARN Toolkit of Best Practice, 
all of which will help organisations to grapple with the data deluge.


Introduction
In April 2016, Jisc issued an information leaflet on Managing research data in your institution. 1 They concluded that 'data needs to be selected, curated, retained and stored, using appropriate metadata'. The call was timely and aimed at research performing institutions. Similarly, the Research Data Alliance (RDA) also makes an important offering in the research data space. 2 The RDA is an international organisation focused on developing the infrastructure and community activities needed to reduce barriers to data sharing and exchange, and to accelerate data-driven innovation worldwide. With over 4,500 members globally, the RDA comprises individuals, organisations and policy makers representing multiple industries and disciplines, all committed to building the social, organisational and technical infrastructure this agenda requires.
From 11-17 September 2016, more than 850 data professionals and researchers from all disciplines around the globe convened in Denver, Colorado, for the first edition of International Data Week (IDW). This landmark event, organised by CODATA, the Committee on Data of the International Council of Science (ICSU), the ICSU World Data System (WDS) and the Research Data Alliance (RDA), brought together data scientists, researchers, industry leaders, entrepreneurs, policymakers and data stewards from all disciplines to explore how best to exploit the data revolution in order to improve science and society through data-driven discovery and innovation. 3

In the UK, the Digital Curation Centre (DCC) provides access to a range of resources including How-to Guides, case studies and online services. Their training programmes aim to equip researchers and data custodians with the skills they need to manage and share data effectively. The DCC also provides consultancy and support for issues such as policy development and data management planning. 4 Clearly, research data management is a topic of wide interest. The Research Data Curation Bibliography by Charles W. Bailey lists over 620 selected English-language articles, books and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions. 5

What issues are current for those involved in RDM? For decision makers, the primary issue is probably that of the associated costs. The 4C project offers an overview of relevant cost models. 6 One of these is the LIFE model (Lifecycle Information For E-literature), for which one of the LEARN project partners (UCL) was a joint lead. 7 The LIFE costing model is:

L = C + Aq_T + I_T + M_T + BP_T + CP_T + Ac_T

where:
L = Complete lifecycle cost over time 0 to T
C = Creation
Aq = Acquisition
I = Ingest
M = Metadata Creation
BP = Bit-stream Preservation
CP = Content Preservation
Ac = Access
T = Period of time over which the identified activity lasts

(the subscript T denotes the total cost of that activity over the period 0 to T). However, there is an elephant in the room with regard to RDM costing. As 4C says, 'There is a sizeable canon of research into cost modelling for digital curation but the research is in many ways preliminary and there has been little uptake of the tools and methods that have been developed. For example, tools to manage and estimate costs have not been integrated into other digital curation processes or tools.' 8 This has made it extremely difficult for research performing institutions to take RDM forward locally when total costs are unclear, for decision makers do not write blank cheques. Even when costs are known, many institutions are unable or unwilling to reveal their costing activities.
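As a concrete illustration, the LIFE summation is simple arithmetic once each component has been costed for the retention period. The sketch below is illustrative only; the function name and the figures are assumptions made for this example, not part of the official LIFE tooling.

```python
# Illustrative sketch of the LIFE lifecycle costing model.
# Each argument is the total cost of that activity over the period 0..T,
# i.e. C, Aq_T, I_T, M_T, BP_T, CP_T and Ac_T in the model's notation.

def life_cost(creation, acquisition, ingest, metadata,
              bit_preservation, content_preservation, access):
    """Return L, the complete lifecycle cost over time 0 to T."""
    return (creation + acquisition + ingest + metadata
            + bit_preservation + content_preservation + access)

# Hypothetical figures (in GBP) for a ten-year retention period:
total = life_cost(creation=0, acquisition=1200, ingest=800, metadata=1500,
                  bit_preservation=2500, content_preservation=3000, access=900)
print(total)  # 9900
```

A spreadsheet would serve equally well; the point is simply that the model is additive, so the hard part is estimating each component, not combining them.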
Another issue which is setting the agenda for RDM in Europe is the recent publication of the Commission's High Level Expert Group Report on the European Open Science Cloud. 9 This Report has at its kernel the benefits which Open Data can bring to research communities. It bemoans the current fragmentation in the European research data landscape and states starkly, 'There is no dedicated and mandated effort or instrument to coordinate EOSC-type activities across Member States'. 10 At institutional level, a baseline was drawn by the LERU Roadmap for Research Data, 11 which was published in December 2013. This was the first document to look in detail at the opportunities and challenges which face European research performing organisations in the RDM space. LERU is the League of European Research Universities, comprising 23 members in 12 European countries. Two members of the LEARN project, UCL and the University of Barcelona, are also members of LERU. 12 Whilst the Roadmap was written on behalf of LERU members, the issues it analysed were in fact generic and can be said to apply to any research performing organisation anywhere in the world; they are by no means exclusive to research-intensive universities. The Roadmap 'presents a series of blueprints which LERU members, indeed any European university, could use to begin to tackle the challenges which research data poses. It also has a series of messages for researchers, research institutions, support services and policy makers'. 13

8. 4C: http://www.4cproject.eu/about-us/; last accessed 9 February 2017.
9. European Commission: http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud; last accessed 9 February 2017.
10. Ibid., p. 6.
11. LERU: http://www.leru.org/files/publications/AP14_LERU_Roadmap_for_Research_data_final.pdf; last accessed 9 February 2017.
12. LERU: http://www.leru.org/index.php/public/about-leru/members/; last accessed 9 February 2017.
13. LERU: http://www.leru.org/files/publications/AP14_LERU_Roadmap_for_Research_data_final.pdf, p. 3; last accessed 9 February 2017.

MAXIMISING THE VALUE OF RESEARCH DATA: A KEY PRIORITY FOR RESEARCH FUNDERS
There is a strong and growing consensus among research funders over the need to ensure that data outputs resulting from the research we support are managed and shared in ways that will deliver the greatest benefit to society. Over recent years, funders around the world have introduced policies requiring that their funded researchers make data available to others in a timely and responsible manner, and plan their approach for managing data as an integral part of planning their research.
The Wellcome Trust 1 is a global research foundation dedicated to improving health for everyone through enabling great ideas to thrive. This case study summarises our experience in implementing our data management and sharing policy over the last decade, drawing out some of the key lessons and remaining challenges.

WHY DOES DATA SHARING MATTER TO RESEARCH FUNDERS?
In common with other research funders, Wellcome's work to encourage data sharing is driven in large part by a recognition that much of the data currently generated by research represents a vast untapped resource. Enabling researchers and other users to access, combine and use data could open up new avenues for discovery and innovation that might never have been anticipated by the original data generators. In addition, access to the data underlying research findings is critical to ensure that these claims can be scrutinised and reproduced. Data sharing also holds the potential to help reduce avoidable duplication and waste -helping to enable research funds to be allocated effectively and enhancing the efficiency of the research enterprise.

DEVELOPMENT OF WELLCOME'S DATA MANAGEMENT AND SHARING POLICY
Wellcome's policy on data management and sharing 2 was published in January 2007, and was updated following a review in 2010. It followed two years after the introduction of our policy on open access to research publications 3 , and built on Wellcome's work over many years to develop key data resources for the benefit of the research community -notably in the genomics field where we took a lead role in ensuring the data generated in the Human Genome Project was made immediately available, with no restrictions on its use. Unlike our open access policy, where we were able to set out a clear mandate that all original research papers we fund be made open access within six months of publication, our data management and sharing policy allows for a case-by-case approach. Whilst a research article is a single type of output for which a consistent rule could be applied, the optimal approach for sharing data outputs will vary dramatically depending on the nature of the data and the research context. Furthermore, there are some types of data which cannot be shared openly and where limits on sharing are required -particularly data relating to human research participants.

KEY POLICY PROVISIONS
Our data management and sharing policy is very similar to those of other major funders, including the UK Research Councils and US National Institutes of Health. The key elements are as follows:
• We expect all of our funded researchers to maximise the availability of research data with as few restrictions as possible;
• We require applicants for funding to submit a data management and sharing plan in cases where their proposed research is likely to generate data outputs that will hold significant value as a resource for the wider research community;
• We commit to review data management and sharing plans, and any associated costs to deliver them, as an integral part of the funding decision;
• We expect all users of research data to acknowledge the sources of their data and to abide by the terms and conditions under which they accessed the original data.

OUR EXPERIENCE IN IMPLEMENTING THE POLICY
Over the ten years the policy has been in place, we have undertaken periodic reviews to take stock of the data management and sharing plans being submitted and to gauge the perspectives of researchers and reviewers. Following our review of the policy in 2010, we introduced more detailed guidance for applicants 4 on developing data management and sharing plans -structured around seven key questions that plans should address (Figure 1.1). The overall quality of plans has improved over time, but plans still vary significantly in terms of their levels of detail and specificity. Particularly in areas of research where data sharing resources and practices are less well-established, many researchers and reviewers still do not feel there is sufficient clarity on our expectations of researchers or, in many cases, how best to put data sharing requirements into practice.
At present, it is also not clear that the costs of implementing data plans are always being fully factored into funded applications. In 2016, we updated our guidance to give greater specificity on the costs that could be requested in terms of people and skills, data storage and computation, data access, data preservation and deposition.
Finally, our ability to monitor the extent to which researchers put their plans into practice is currently limited. While data sharing does form a key criterion in decisions over renewals for major resources and databases we support, we do not have a process, nor the in-house resources, systematically to track the delivery of plans across the bulk of research we support.

WIDER CHALLENGES IN DATA SHARING
Different research disciplines are at very different stages in developing the resources and practices required to support data sharing. Several major barriers persist which must be overcome if funder policies to maximise the value of research data are to be successful. Key amongst these are:
• Infrastructure and tools - building and sustaining the technical resources and tools needed to store, access and analyse vast and complex research datasets;
• Culture and incentives - fostering a cultural shift to ensure data sharing is valued and rewarded appropriately;
• Capacity and skills - developing the skills necessary in the research community to manage and analyse data effectively;
• Ethics and governance - establishing the policy frameworks to ensure data sharing is ethical and fair, and maintains public trust.

WORKING IN PARTNERSHIP
To take on these challenges, Wellcome's approach over recent years has been to focus on strategically important research areas where we believe there is the potential to work with the community to advance data sharing, and to forge partnerships with other funders to drive change. For example, we have:
• established the Expert Advisory Group on Data Access 5 in partnership with MRC, ESRC and Cancer Research UK to provide strategic advice to funders on data access for cohort and longitudinal studies across genetics, epidemiology and social sciences;
• joined with a consortium of pharmaceutical companies to support the ClinicalStudyDataRequest.com 6 platform to enable research access to clinical trials data;
• taken a lead role in working with funders and journals to drive the rapid sharing of research data related to public health emergencies. 7

EMERGING PRIORITIES
Wellcome is actively exploring how we can build on the work we have done to champion data sharing. In terms of our policy, we are likely to move towards a more holistic approach for the management of outputs. Rather than request a data management and sharing plan in isolation, we would ask researchers to outline a plan for managing and sharing any outputs of value (including software and research materials, as well as data) and to describe their approach where relevant to managing any associated intellectual property.
In parallel, we are also actively exploring how to strengthen implementation of our data management and sharing policy -including defining more clearly the roles of reviewers and staff in assessing plans and developing a clearer template for plans. We are particularly keen to explore the scope to work with other funders to implement machine-actionable data management plans that dynamically update as data outputs are generated.
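To illustrate the idea of a machine-actionable DMP, the sketch below represents a plan as structured data that software updates automatically as data outputs are generated. The schema and field names here are hypothetical assumptions for illustration only; they are not Wellcome's template or any funder's standard.

```python
import json
from datetime import date

# Hypothetical, minimal machine-actionable DMP: a structured record
# rather than a free-text document, so tools can update and query it.
dmp = {
    "title": "Example cohort study DMP",
    "last_updated": None,
    "datasets": [],
}

def register_output(plan, dataset_id, access_level):
    """Record a newly generated data output and refresh the plan's timestamp."""
    plan["datasets"].append({"id": dataset_id, "access": access_level})
    plan["last_updated"] = date.today().isoformat()
    return plan

# Invented identifiers, for illustration only:
register_output(dmp, "doi:10.xxxx/example-1", "open")
register_output(dmp, "doi:10.xxxx/example-2", "managed")
print(json.dumps(dmp, indent=2))
```

Because the plan is data rather than prose, a funder could in principle query it post-award (for example, listing outputs not yet openly available) rather than re-reading a narrative document.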
Wellcome is also actively developing new opportunities to advance the broader goals of open research and pilot innovative models to push the boundaries of openness -building on our work to establish the Wellcome Open Research 8 publishing platform and Open Science Prize 9 .

CONCLUSIONS
Over the last five to ten years, there has been a growing international recognition of the crucial importance of maximising access to research data. There is a strong policy alignment between major research funders, but significant challenges remain in implementing these policies in practice. Based on Wellcome's experience, key issues for funders to consider in developing and implementing data sharing policies include the need to:
• clarify expectations for researchers as far as possible, and develop guidance tailored to specific research fields and data types in terms of current best practice for data management and sharing, and the resources available;
• establish a clear process for reviewing and assessing data management plans and the associated costs, and a proportionate mechanism to track plans post-award;
• develop new mechanisms for funders to recognise and reward the contributions of researchers who generate and share high-quality datasets and initiate a broader cultural shift;
• consider how best to work in partnership at national and international level to:
  - build and sustain repositories, standards and tools to support data sharing;
  - develop the skills and capacity needed to manage, share and analyse data;
  - harmonise policies and practices wherever possible;
  - advocate and champion the ongoing transition to open science approaches.

This issue is best addressed by data management plans (DMPs). A DMP was described in all examples (in some more thoroughly than others) or even considered as a mandatory requirement; in several policies there is evidence that a template was used, or a DMP-guidance tool (such as that of the Digital Curation Centre).
The topic of "support and training" was universally treated as a necessary component of RDM and was mentioned in all policies. In contrast, the relevance of topics such as "educational data" and "cultural heritage" has not yet entered the consciousness of the research community.

WORK OF THE EXPERT GROUP E-INFRASTRUCTURES AUSTRIA
The Expert Group task force made use of previous data on the subject of RDM, including the results of the report entitled "Researchers and their Data: Results of an Austrian Survey", the results of the first LEARN workshop, held in London in January 2016, as well as the results of an online conference with universities in South America.
The Expert Group also formed a nine-person work sub-group, which met every two weeks and was charged with drafting a policy paper. Work began in English, as many of the existing policy examples were written in English. Over time, however, the drafts were adapted to Austrian needs, in both language and meaning, and the model policy became more concrete with each meeting. The project management of e-Infrastructures Austria ensured a continual flow of information between the work sub-group and the LEARN project, particularly as the breakout sessions during the second and third workshops became increasingly focused on policy development in varying European institutions. E-Infrastructures Austria also set a high standard with the organisation of the four-day "Training Seminar for Research Data Stewardship and e-Infrastructures", 7 which looked at operational measures in the field of RDM.
The following duties of the Expert Group are of particular importance:
• Regularly exchanging information regarding the development of a model RDM policy with LEARN project partners, particularly with representatives from South America, in order to compare and standardise terminology;
• Utilising the results of the breakout sessions of the LEARN Workshops;
• Keeping in view the goals and mission outlined in the LERU Roadmap;
• Upholding the "FAIR guiding principles for scientific data management and stewardship"; 8
• Gathering feedback from the Austrian research landscape, particularly with regard to rights and organisational guidelines and terminologies;
• Involving institutional computer centres (ICT);
• Cooperating with legal experts;
• Continually exchanging information with representatives from Austrian research funders and sponsors;
• Comparing the results of the work sub-group with the conclusions drawn from the examination of RDM policies across Europe (see also: Evaluation Grid for RDM Policies in Europe. Survey results, August 2016, in this Toolkit, pp. 139-66). 9

ESTABLISHING THE CASE
The UK's decision in the Referendum of 2016 1 to leave the EU has caused both delight and consternation. A fundamental driver for that result was the perception that the UK needed to achieve greater autonomy. In some quarters, this has led to loud calls for individual autonomy: 'London Mayor Sadiq Khan wants London to be given more autonomy from central government following the UK's vote … to leave the European Union, saying that the city needs to "take back control".' 2 Autonomy is a powerful and emotive word. It is important to note that autonomy is not the same as independence. As the Mayor has also said: 'I want to send a particular message to the almost one million Europeans living in London, who make a huge contribution to our city - working hard, paying taxes and contributing to our civic and cultural life. You are welcome here. We value the enormous contribution you make to our city and that will not change as a result of this referendum.' 3

Ronald Barnett has observed: 'We are now coming to have a sense that what it is to be a university in the 21st century necessarily includes a positive orientation to the world, in all of its aspects. The university - as an idea - is not only networked across the world, not only active in many countries, but takes up a positive stance towards the world. Indeed, it has a care for the world, wanting to play its part in helping to improve the world.' 5 That is a very helpful discussion and offers much in terms of understanding the possible consequences of Brexit.
Many commentators have reacted with fear and alarm to the Brexit vote. Immigration is seen by some as the major issue and as a driver for the 'Leave' vote in the Referendum. Others note the impact of Brexit on exchange rates, and the perceived damage were the UK to leave the Single Market. 6 For universities, there are enormous concerns over the possible loss of EU funding in Horizon 2020, and over the ability of UK universities to recruit overseas students and to retain their EU workforce. 7 Universities UK has highlighted a key concern: 'In terms of recruiting EU staff in the longer term, any changes will depend on the kind of relationship the UK negotiates with the EU. However, UUK is committed to highlighting the value of all EU staff, including researchers, scientists and academics, and is urging the UK government to guarantee that those currently working at UK universities can continue to do so after the UK exits the EU.'

1. Electoral Commission: http://www.electoralcommission.org.uk/find-information-by-subject/elections-and-referendums/past-elections-andreferendums/eu-referendum/electorate-and-count-information; last accessed 3/1/17.
2. Business Insider UK: http://uk.businessinsider.com/sadiq-khan-speech-on-london-independence-after-brexit-and-the-eu-referendum-2016-6; last accessed 3/1/17.
3. Independent: http://www.independent.co.uk/news/uk/sadiq-khans-brexit-eu-referendum-response-in-full-there-is-no-need-to-panic-a7100071.html; last accessed 3/1/17.
4. Financial Times: https://www.ft.com/content/d32b1a42-7a5b-11e6-ae24-f193b105145e; last accessed 3/1/17.
5. Barnett, R., 24 June 2016, EU referendum: will UK HE become less global, more parochial? THE blog: https://www.timeshighereducation.com/blog/eu-referendum-will-uk-he-become-less-global-more-parochial; last accessed 3/1/17.
6. BBC: http://www.bbc.co.uk/news/uk-politics-32810887 gives an overview of current issues at the end of 2016; last accessed 3/1/17.
7. UUK: http://www.universitiesuk.ac.uk/policy-and-analysis/brexit/Pages/brexit-faqs.aspx; last accessed 3/1/17.

Clearly, the current situation poses threats. However, the purpose of this article is to suggest that Brexit is not simply a threat, but also an opportunity. A recent article in Insights suggested that Brexit presented opportunities for commercial publishing: 8 '… where some publishers see adversity, others see possibility.
While there has been much hand-wringing about economic fallout, nearly half of all publishers see Brexit as an opportunity to make money on exports …'. The words of Sadiq Khan on the future of London are important here: 'it's what we are: open for talent, for business, for investment.' The emphasis is on the word 'open', and it is the argument of this article that Brexit presents not only challenges but also real opportunities for the UK and Open Access, not in terms of autonomy but of freedom - the freedom to innovate and to devise new models for the dissemination of scholarly outputs. These are core values of the Open Access movement, and 2017 presents the opportunity to invest time and effort to deliver on them.

DELIVERING THE GOODS
How has the UK contributed to this vision for an Open Access future? Is it an independent view or one shaped in collaboration with others? What challenges lie ahead for the UK in developing its Open Access position and presence? A study of four themes can help tease out answers to these questions: Open Access policies and mandates, EU copyright reform, new Open Access publishing models and Open Science.

Policies and mandates
Brexit means that the UK will leave the European Union, not that it will be leaving Europe. 'Brexit means Brexit', but the nature of the future relationship remains to be worked out. However, Open Access is a European - and indeed a global - agenda, not solely a matter for the EU. Europe is awash with Open Access infrastructure. As of 3 January 2017, OpenDOAR listed 3,285 Open Access repositories, 45.2% of which were based in Europe. In the breakdown of repositories by country worldwide, the UK does well, coming second overall and ahead of any other European nation.
Arguably, Open Access in the UK has been driven by UK funder mandates, by the Finch Review and by the recent HEFCE Open Access requirement for REF2020. Research-intensive universities are proactive in supporting their researchers in meeting the requirements of Open Access funder policies. UCL (University College London), for example, lists 39 funder policies on its website, 9 only 4 of which are linked directly to the European Union. It should be noted, however, that these European funders are significant funders of UK collaborative research - the European Research Council, the EU's FP7 programme, Horizon 2020 and Marie Curie. In February 2016, UCL noted, 'UCL has retained and strengthened its position as the top performing university in Europe in the major EU funding scheme Horizon 2020, securing more than €103 million so far. In another significant funding success, UCL researchers have recently been awarded nine highly prestigious European Research Council (ERC) Consolidator Grants, totalling around €15 million and placing the university as the second-placed higher education institution in Europe for the number of grant awards under this scheme. UCL has also been awarded 27 Marie Curie International Fellowships, worth around €6 million.' 10 Clearly, loss of EU research funding would have a major impact on the ability of research-intensive universities to undertake research and so to disseminate the results of that research activity as Open Access outputs. As Universities UK has stated: 'UUK will make the case to government of the importance and impact of our strong research collaboration with European partners, highlighting how EU programmes play a central role in supporting this.' 11 Funding is a serious issue, but in other areas the UK has made a significant contribution to the global OA debate. The Finch Report, 12 accepted by Government in July 2012, was key in determining a public policy position in the UK on Open Access.
On 16 July 2012, Research Councils UK announced that they were also introducing Open Access requirements. 13 As implemented, RCUK offers funding to research-intensive universities to disseminate their funded research outputs as Gold OA outputs. 14 In the first three years of activity, UCL (University College London) exceeded the targets which RCUK had set. The vast majority of papers made Open Access were Gold, supported by RCUK funding. Gold Open Access has even been described as a model 'to steer the whole of the EU towards during its … presidency'. 15 In fact, a 2016 study 16 found that five EU countries want to abandon the traditional subscription model and move to Gold Open Access dissemination: the Netherlands, Hungary, Romania, Sweden and the UK. Clearly, the UK has contributed to this debate, a contribution not solely shaped by the EU.
In the UK, the recent HEFCE mandate for Open Access to support the Research Excellence Framework (REF) 2020 is already being very influential in shaping attitudes to OA dissemination in universities. 17 The REF has enormous influence since the results determine the selective annual allocation of quality-related (QR) grant distribution from the Higher Education Funding Councils. There is every chance that REF OA compliance, rather than the Finch review or even the RCUK OA mandate, will be a game changer for the development of OA in the UK going forward.

European Copyright reform
The European Union is currently engaged in what we believe to be the final stages of copyright reform proposals. In Europe, a number of organisations are taking a leading role in supporting demands for academic-friendly copyright reform, bodies such as LIBER (Association of European Research Libraries) and LERU (League of European Research Universities). 18 For these organisations, the crux of the matter is the need to modernise copyright legislation for the digital age. Their case is focussed on the need for an Exception for Text and Data Mining (TDM) to be enshrined in the new legislation. 19 Text and data mining is the process of deriving information from machine-read material. It works by copying large quantities of material, extracting the data, and recombining it to identify patterns. Copyright legislation is involved in the discussion because of the act of copying. For a digital future, let alone an Open Access future, TDM is an essential tool. Researchers will want to mine content which is Open Access as well as material which is available from commercial suppliers, where copyright has typically been assigned to the publisher. LIBER and LERU assert that 'the right to read is the right to mine', and that all content to which researchers have legal access should be open for TDM. There are also legal barriers which restrict researchers' abilities to mine the open web. This legal uncertainty hampers research and discoveries, which would act as a foundation for innovation and income generation, creating new jobs for the European economy. It is vital that the draft copyright reform proposal 20 currently offered by the Commission embraces all these requirements.
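To make the 'copy, extract, recombine' description of TDM concrete, the toy sketch below mines a three-document corpus for term-frequency patterns. The corpus and helper function are invented for illustration; real TDM pipelines operate over vastly larger collections, which is why the copying step raises the copyright questions discussed above.

```python
import re
from collections import Counter

# Toy corpus standing in for the copied material a TDM pipeline works on.
corpus = [
    "Open access to research data accelerates discovery.",
    "Mining research data reveals patterns across disciplines.",
    "Research data sharing supports reproducible discovery.",
]

def extract_terms(text):
    """Lower-case a document and extract its word terms (the 'extract' step)."""
    return re.findall(r"[a-z]+", text.lower())

# Recombine: aggregate term counts across all copied documents to
# surface patterns no single document shows on its own.
counts = Counter(term for doc in corpus for term in extract_terms(doc))
print(counts.most_common(3))
```

Even this trivial aggregation shows why the exception matters: the analysis only works on a local copy of the full texts, so the act of copying, not the act of reading, is what the law must permit.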
When the UK leaves the EU, where will it stand in relation to the new Directive? There are two issues to consider. There are already copyright-friendly regimes in operation around the world: the USA, Asia, Canada and the UK, for example. In the UK, the Hargreaves review of UK copyright frameworks allows an Exception for TDM, but for non-commercial purposes only. 21

New publishing models
Open Access allows new approaches to scholarly publishing. In the UK, there is a growing amount of interest in the creation of Open Access publishing platforms, often linked to institutional university libraries.
One good example of this is UCL Press, the UK's first fully Open Access University Press. 22 Grounded in the Open Science/Open Scholarship agenda, UCL Press will seek to make its published outputs available to a global audience, irrespective of the ability to pay, because UCL believes that this is the best way to tackle global 'Grand Challenges' 23 such as poverty, disease and hunger. In May 2016, the EU's Competitiveness Council called for full Open Access to scientific publications by 2020. Full Open Access by 2020 is a very ambitious vision. As a member of the EU, the UK is committed to supporting this objective. After Brexit, depending on the nature of the future relationship between the EU and the UK, the United Kingdom will probably not be subject to this requirement going forward. In the UK itself, there is no current equivalent mandate for 100% OA compliance by 2020. The nearest directive is probably the HEFCE requirement for the Research Excellence Framework, also 2020. However, not all research produced in the UK is submitted to the REF. The EU ambition for OA is therefore more expansive than the public position in the UK. It has to be said, however, that the UK position on 2020 may be more realistic in terms of the ability to attain the stated objective.
One of the major early deliverables from the Open Science agenda is a bold vision for a European Open Science Cloud (EOSC) of research objects. The Commission has appointed a High Level Expert Group (HLEG) to advise on progress in the Cloud, which is a metaphor for an Internet of data, and the HLEG has recently released its Report. 26 I was honoured to be a member of the Group that compiled this document. One of the major observations it contains is that the majority of challenges to reach a functional EOSC are 'social rather than technical'. Another major finding is that there is an 'alarming shortage of data experts both globally and in the European Union'. The Report also determines that the technical components needed to create a first generation EOSC are largely in existence already, but that they are 'lost in fragmentation and spread over 28 member states and across different communities'. There is a real challenge facing the UK, and indeed Europe, if the UK is not a member of the EOSC going forward. Research is global; it does not stop at national boundaries. The UK will suffer if its research data is not visible as part of this European collaboration. Europe, and indeed research communities across the globe, will also be the poorer if they cannot seamlessly access UK research outputs alongside other European findings.

CONCLUSION
The argument of this paper is that, no matter what sort of relationship the UK develops with the European Union post-Brexit, Brexit poses not only challenges but also opportunities.

RESEARCH INTEGRITY
All well-managed research performing organisations should have codes of conduct for research integrity, developed at institutional and/or national level. 1 These codes provide frameworks for best practice in the conduct of research, establishing principles, guidelines and norms for the ethical, effective and legal conduct of research enquiry. By way of example, this Case Study looks at the framework for research integrity in place at UCL (University College London). 2 UCL has a Statement on Research Integrity 3 and an accompanying Code of Conduct for Research. 4 The Statement on Research Integrity makes clear: 'It is the view of UCL that everyone involved with research has a joint responsibility for ensuring high standards of integrity throughout the research process, from the creation of methodology and data collection through to publication and authorship.' The UCL Statement is itself grounded in UCL 2034, 5 the UCL institutional strategy, whose Principal Theme 1 is 'Academic leadership grounded in intellectual excellence'. In 2012, Universities UK published the Concordat to support research integrity; its five commitments set out the UK's determination to maintain high standards of rigour and integrity in its research. 6 'This concordat 7 seeks to provide a comprehensive national framework for good research conduct and its governance.
As signatories to and supporters of the concordat to support research integrity, we are committed to:
• maintaining the highest standards of rigour and integrity in all aspects of research
• ensuring that research is conducted according to appropriate ethical, legal and professional frameworks, obligations and standards
• supporting a research environment that is underpinned by a culture of integrity and based on good governance, best practice and support for the development of researchers
• using transparent, robust and fair processes to deal with allegations of research misconduct should they arise
• working together to strengthen the integrity of research and to reviewing progress regularly and openly'
UCL welcomed the 2012 Concordat to Support Research Integrity and agrees with the five commitments contained within it. The four elements of integrity within the concordat, which UCL sees as Principles of Integrity, reflect UCL's existing Code of Conduct for Research. All staff (including honorary staff), students, visitors and collaborators are expected to be aware of and adhere to both the Code of Conduct for Research and the Principles of Integrity as set out below (taken in their entirety from the concordat). 8
• Honesty in all aspects of research, including in the presentation of research goals, intentions and findings; in reporting on research methods and procedures; in gathering data; in using and acknowledging the work of other researchers; and in conveying valid interpretations and making justifiable claims based on research findings.
• Rigour, in line with prevailing disciplinary norms and standards: in performing research and using appropriate methods; in adhering to an agreed protocol where appropriate; in drawing interpretations and conclusions from the research; and in communicating the results.
• Transparency and open communication in declaring conflicts of interest; in the reporting of research data collection methods; in the analysis and interpretation of data; in making research findings widely available, which includes sharing negative results as appropriate; and in presenting the work to other researchers and to the general public.
• Care and respect for all participants in and subjects of research, including humans, animals, the environment and cultural objects. Those engaged with research must also show care and respect for the stewardship of research and scholarship for future generations.
These statements and principles set the framework for the performance of research at UCL. The third of the principles from the UUK Concordat is important for the topics of Open Science and Research Data Management. 9 How can this principle be delivered? It is to this subject that this Case Study is devoted. Among the barriers identified by researchers in the validation exercise were:
• Concerns about quality assurance: 53% fully agreed that this was a barrier; 35% partially agreed
• Lack of credit-giving for Science 2.0 [Open Science]: 50% fully agreed, 38% partially agreed

RESEARCH DATA MANAGEMENT AS A FEATURE OF OPEN SCIENCE
The Report then looked at how these barriers could be removed, and the types of intervention that would be needed to do this. The answers to the questions of interest to this Case Study are given in Figure 4.1 below. 12 Comparison of the figures is interesting. There was not much interest amongst researchers in intervention in the metrics space. Concerns about the lack of Open Access to research publications and research data scored highly; in fact, this was the most significant total in the validation exercise.
UCL has adopted a Research Data Policy, which states: 'The purpose of this Policy is to provide a framework to define the responsibilities of all UCL members and to guide researchers and students in how to manage the data, enabling research data to be maintained and preserved as a first class research object and made available to the widest possible audience for the highest possible impact.' 20 The position taken by the UCL policy on Open Data is that research data should be as open as possible, and as closed as necessary.
Supported by this policy, UCL is taking practical steps to deliver pan-university RDM systems and services.
Top-level activities are illustrated in Figure 4.3 below. As is the case in many other UK universities, these positions were created relatively recently: the first RDSO started in May 2015, and the second in November 2016. One of the main drivers for creating these posts was a change in research funders' requirements, 3 which prompted UK research-intensive universities to provide greater support for researchers in interpreting and complying with funders' policies.

RESEARCH DATA MANAGEMENT ADVOCACY AND SUPPORT
The Research Data Management Team's activities can be divided into three interwoven missions: advocacy, support and training.
i Advocacy
In terms of advocacy, the team promotes best practice in data management and sharing, and communicates about the services available within the university to support researchers throughout their research projects. The website and all research data-related services are mainly promoted via short presentations given upon invitation at faculty and research department meetings. Other advocacy activities have included participation in university-wide induction events for new staff members; presentations for staff members in other professional services; delivering 1- to 3-hour bespoke workshops on data management in research departments; and conducting a university-wide Research Data Management survey.
ii Support
One-to-one and research group support are other key areas of activity. Responding to email and phone enquiries is part of the day-to-day support offered to researchers and research students. Meetings are offered when the user needs advice on several topics, or when a discussion is required with the whole research group. In addition, the Research Data Management Team reviews Data Management Plans (DMPs) written as part of research grant applications or as project deliverables. The review provides feedback on the content and layout of the plan, as well as advice on where to find funders' requirements and guidance for writing the plan, and information on relevant university or external resources to improve it. Users are offered the opportunity to submit a final draft of their DMP for a last review.
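The DMP reviews described above check a draft plan against funders' requirements. A loose sketch of the completeness aspect of such a review is below; the required section headings are invented for illustration and do not correspond to any particular funder's template:

```python
# Illustrative section headings only; real funder templates vary.
REQUIRED_SECTIONS = [
    "data description",
    "metadata",
    "storage and backup",
    "legal and ethical issues",
    "data sharing",
    "preservation",
]

def review_dmp(dmp_text: str) -> list:
    """Return the required sections missing from a draft DMP."""
    text = dmp_text.lower()
    return [section for section in REQUIRED_SECTIONS if section not in text]

# A hypothetical draft, covering only two of the expected sections:
draft = ("Data description: anonymised interview transcripts. "
         "Storage and backup: nightly copies to the university drive.")
missing = review_dmp(draft)  # sections a reviewer would flag
```

A human reviewer does far more than this (advising on content, layout and funder guidance), but a checklist of this kind captures why the review has such an immediate impact on a grant application.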
iii Training
A structured training programme enables the dissemination of information about resources and best practices to both researchers and non-research staff in central services. A separate training programme introducing Subject Librarians to Research Data Management has run since 2015; four sessions, with thirty participants in each, have been convened to date. Since December 2016 a training programme for PhD students has been co-organised by the Research Data Management Team, the Research Integrity Team (in the Research Office) and the Doctoral School. Embedding RDM advocacy in communications about research support in general enables the Research Data Management Team to reach a wide range of researchers who are at early stages of their projects and careers. In addition to the structured programmes, tailored training sessions are delivered on demand.

WHAT WORKED: COLLABORATION, PRESENTATIONS, IMMEDIATE HELP AND INFORMATION GATEWAY
A Collaboration across central services
Several of the activities described above were put together through collaboration with other university central services. Joint work with the Information Services Division, the Research Office, the Ethics Committees and Legal Services has resulted in more efficient promotion of RDM services. Moreover, daily liaison both with Research IT Services and within Library Services has established a growing network of research support and research data experts. Given that the RDM advocacy programme across the university is still relatively novel, and that the Research Data Management Team is still quite small, this network is proving most valuable. In terms of communications, it enables all of the teams to make the most of each invitation to present at faculty or department staff meetings, and thereby to multiply the opportunities to speak about data management planning and the help available within the university.
The Library Working Group dedicated to RDM is another network on which the team can rely. The Group provides discipline-specific knowledge and essential support for short-term projects such as the building of the website, and designing and promoting the cross-university survey. This Group is formed of thirteen volunteers (Librarians, Records Manager, Digital Curation Manager) who work on a specific project each summer; not all Working Group members are required to participate in all projects. It is planned to offer more training to these members and other librarians so that in the future they can answer basic RDM enquiries and review Data Management Plans.

B Presentations at staff meetings and review of Data Management Plans
It has been possible to draw a direct link between presentations given at departmental and faculty meetings and a subsequent increase in email enquiries received. The average time allocated for such presentations is only 10 minutes, but this is enough to point to a range of university services often previously unknown to researchers, and to answer several questions. Regarding feedback on our service, it is the reviewing of DMPs which triggers the highest volume of positive comments. Although reviewing one DMP can take up to two hours, this assistance has an extensive and immediate impact on the user's grant application and future project. It also provides an opportunity to point to services, and to explain several aspects of data management to a researcher. These documents are, moreover, valuable source material for analysing the types of data being produced in the university.

C Gateway to other university services
Being seen as a first point of contact is one of the Research Data Management Team's objectives. Because of the size of the user community and the variety of its enquiries and needs, the Team aims to develop an excellent and up-to-date knowledge of who does what in the institution, rather than attempting to become knowledgeable in every aspect covered by data management. This strategy has so far been successful: in three cases it has enabled the Team to put researchers in touch with the relevant experts, and it has proved the right approach whenever several experts had to be brought around one table to help a research group. In a similar vein, the RDM website has been planned as a gateway to research support resources across the university.

THE CHALLENGES OF OPERATING IN A VERY DIVERSE RESEARCH INSTITUTION
While many of our approaches have worked well, working in such a large research-intensive institution does pose challenges. Because a small Research Data Management Team can develop only generic knowledge, advocacy has so far not extended to discipline-specific needs. This can be frustrating for service providers and users alike. To help address this, the RDM Team now recommends that faculty and department data experts act as the primary contact for subject-specific questions. The idea is that Research Data Management central services are available to complement the help offered by local research support staff (such as permanent data managers, research managers and IT officers), who are able to maintain subject disciplinary expertise. We also aim to foster and support a network of subject-specific data managers across all faculties.
The diversity of research contexts within the university also forces the RDM Team to prioritise its advocacy efforts. The strategy so far has been to help users who self-identify as needing support, whether because their funder has explicit requirements on research data or because they generate large amounts of digital data. As a result, the Team has been slower to reach out to potential users who do not appear to have data management issues: for example, internally-funded and student projects are not required to use DMPs, and there can be misunderstanding of what "research data" encompasses. Solutions found so far include targeting students via the new training programme, giving presentations in all faculties, and stressing in our communications that help is available for all researchers, whatever their discipline or type of data produced.

CONCLUSIONS: MEASURING SUCCESSES
Measuring the success of a service focussed on advocacy and awareness-raising is extremely difficult in terms of metrics. The service has existed for less than two years at the time of writing, so relatively little hard data is available. Furthermore, as this is an entirely new area of engagement for the institution, there is no baseline data from which to measure successes. Instead, we have focussed on qualitative data to promote and inform service development. An institution-wide survey has provided a baseline in key areas, including awareness and understanding of RDM, and we hope that this can be repeated periodically to measure the impact the RDM Team is having and to further inform service development. 5

THE OVERALL CHALLENGE
Raising awareness among relevant stakeholders is critical for the success of any Research Data Management (RDM) initiative, as their participation and collaboration will be needed for the development and implementation of related policies and programmes. The UN Economic Commission for Latin America and the Caribbean (ECLAC), in its role as a partner institution of the LEARN Project, had as one of its missions to raise awareness and engage RDM stakeholders within Latin America and the Caribbean (LAC).
However, the task constituted a significant challenge due to the geographical dimensions of the region and its socio-cultural diversity. For that reason, ECLAC developed a strategy involving several actions: gathering information about the current state of LAC with regard to RDM; identifying relevant stakeholders; liaising with them to understand their needs and expectations; and planning targeted activities that take into account the particularities of people and institutions within the region.

RDM IN LAC: STATE OF THE ART
The first step was gathering information about past and current developments in RDM in LAC. This would lead to the identification of institutions, people and projects related to research data, in terms of data creation, management, preservation, access, and policy development.
Due to the complexities in collecting information from such a large variety of countries -each one being a whole universe of people and organisations -six countries were selected as the starting point and main focus of research: Argentina, Brazil, Chile, Colombia, Mexico and Peru. Information was gathered using freely-available publications in several formats, mainly institutional websites, and complemented by interviews with stakeholders when necessary.
This initial approach allowed ECLAC to get a first overview of the RDM landscape in LAC. It established that, although often isolated or relatively unknown, there are several initiatives from scientific communities and organisations related to the management of research data.
One of the trends identified in the region is the promotion of research data management through national legal initiatives on access to scientific information. The most prominent case is Argentina, where the enactment of law n° 26899 1 in 2013 set new requirements for individuals and organisations whose research is publicly funded, and led to the creation of the National System of Repositories (2014 3 ), where national repositories are expected to gather research publications and data and make them available to the public.
Brazil is another country making progress in the field of RDM, where efforts are being made by organisations related to scientific development. Among them are FAPESP, the funding agency of the Sao Paulo State, currently in the process of developing agency-wide Research Data Management Plans, and the Brazilian Institute of Information in Science and Technology (IBICT), whose Rede Cariniana, a network of digital preservation services available to Brazilian universities, will make available research data generated by researchers in all fields of knowledge.
These are a few examples of initiatives found in the research phase, which also included discipline-specific repositories in a variety of fields such as social sciences, economics and biodiversity. Their identification served as a basis for the definition of the activities that were undertaken by ECLAC in the following months.

STAKEHOLDER IDENTIFICATION
The identification of stakeholders was undertaken alongside the gathering of general information about RDM in LAC. During the process, ECLAC sought to identify people and organisations against two criteria: representation from at least the six selected countries, and the presence of the professional sectors or roles most commonly involved in the management of data (such as researchers, librarians, IT professionals, policy makers and research funders). The initial list of stakeholders grew throughout the research, reaching over 400 people, and its quality improved mainly thanks to the collaboration of the stakeholders themselves, who provided useful references to people and projects within particular fields, sectors and/or countries, thus helping ECLAC to build a credible network of contacts within LAC.

UNDERSTANDING STAKEHOLDERS' NEEDS AND EXPECTATIONS
After the main group of stakeholders was defined, over 30 meetings were planned and held, either in person or virtually, with three main objectives in mind: to present the LEARN Project and its goals, to better understand the current state of development of RDM in each country and institution, and to identify the strengths and needs perceived by each stakeholder in this respect.
The meetings proved useful in fulfilling these objectives, although some challenging aspects of working with a diverse group of stakeholders over a large geographic area started to emerge. For example, it became apparent that there was no single, shared understanding of RDM-related terms, or of the scope and purpose of RDM itself. Indeed, one of the first findings was that 'Research Data Management' was not a commonly used term in LAC, meaning that the difference between RDM and related terms (such as Open Science, Open Data or Open Access) was not necessarily clear. This was identified as a potential barrier to effective communication with stakeholders. ECLAC was able to identify different levels of understanding of the implications of RDM, and to perceive that stakeholders had different interests and expectations regarding their collaboration with LEARN. However, they had something in common: they wanted to learn more about RDM, and they were interested in meeting other people and organisations with experience in this area, particularly within Latin America and the Caribbean. This prompted ECLAC to plan new activities to that end.

TARGETED ACTIVITIES
With the differences, needs and expectations of stakeholder groups in mind, ECLAC organised a series of online mini-workshops designed to serve two main purposes: first, to allow stakeholders to meet and learn about each other's experience in RDM and, second, to present and discuss issues in the management of research data, helping to establish a common understanding of RDM concepts as a theoretical ground to build upon in future activities.
The first mini-workshop, titled "Research Data Management (RDM): An overview", was held on 20 April, 2016. A second event, more specific in terms of content, was held on 30 June, 2016, and consisted of a discussion of the current state of development of RDM in one Latin American country, Peru. Both events were held using a virtual platform, and lasted one hour.
A third mini-workshop was held in Port of Spain, Trinidad and Tobago, on 24 November, 2016. This event was different from the first two mini-workshops, as it was an on-site full-day event, focused on the developments in and particular characteristics of the Caribbean context.

CREATING A FORUM FOR REGIONAL COLLABORATION
From the beginning, ECLAC's participation in the LEARN Project included the organisation of a regional event, which was held on 27 October 2016 at the UN ECLAC premises in Santiago, Chile. The event was titled "Implementation of policies and strategies in Latin America and the Caribbean". The programme and activities were strongly tied to the findings of previous activities, and the event gathered around 90 people representing the regional and professional diversity of stakeholders from Latin America and the Caribbean.
This event, and the three previous mini-workshops, allowed ECLAC to advance significantly in its mission of raising awareness of RDM-related issues and engaging stakeholders in Latin America and the Caribbean. They also provided a forum in which stakeholders met, learned about other people's and organisations' work, shared their experiences and started a discussion about strategic areas of development. It is hoped that these experiences will also contribute to the creation of alliances and joint projects to foster the development of RDM both across LAC and beyond.

LESSONS LEARNED
The experience of ECLAC in raising awareness on RDM throughout Latin America and the Caribbean provided several lessons. First, it proved how important it is to identify stakeholders and to understand their situation and needs prior to the planning of specific actions.
Expectations on each side must be known, as any action will have to take into account what each party can provide and what it expects to receive. In this respect, actions should be taken to make sure that appropriate communication channels in all directions are in place, and to identify potential barriers, such as language, preconceptions on a given topic, or different organisational cultures and procedures, among others.
A diverse pool of stakeholders requires a close examination of each one of them before looking at the big picture. This will help any organisation to deliver a clear message and to plan and execute targeted activities relevant and useful to all RDM stakeholders which will, in turn, encourage their engagement in the management of research data.

WHAT ARE THE ISSUES?
The University of the West Indies (UWI) is unique in that it is a multi-campus institution located in different countries across the English-speaking Caribbean. After 60 years of existence, it now has over 45,000 students. Researchers, both academic staff and postgraduate students, are actively engaged in many research initiatives across the various faculties, centres and units. However, the notion of research data management (RDM) is still in its infancy, and universities in the Caribbean have been outpaced by their counterparts in developed nations with regard to RDM. The key issues facing the UWI at this time are discussed below. In terms of placing the UWI in this matrix, the institution is at the earliest stage, i.e. policy and leadership.

AWARENESS
In September 2015, the UWI St. Augustine (STA) Campus Libraries, located in Trinidad and Tobago, participated in a two-day Annual Research Expo which highlighted the research conducted on the campus and the assistance provided for it. One of the objectives of the STA Campus Libraries on this occasion was to show the ways in which the Libraries provide valuable support throughout the entire research cycle: from the formulation of the idea, through preparation of the literature review and the actual study, gathering data, documentation and publication, to archiving data.

UWI St Augustine Campus Libraries and RDM efforts at the UWI, St Augustine Campus
The Campus Libraries were becoming increasingly concerned about how researchers managed data after it was collected and analysed, as well as its availability for further study. As a result, a survey was designed and administered to researchers, both faculty and postgraduate students, to determine their awareness of data management practices; the size of the data they generally managed; how they stored and archived their data; and how, if at all, they perceived the Libraries as having a role in assisting them to manage their data during and at the end of the research cycle.
Based on the survey results, it was clear that researchers were generally not fully aware of what RDM involved, or of its four key components:
• create data and plan for its use
• organise, structure and describe data
• store and preserve data
• search for and share it.
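The second component, organising, structuring and describing data, is often the least familiar to researchers. As a loose sketch, describing a dataset can be thought of as completing a set of required fields before deposit; the field names below are loosely modelled on common repository deposit forms, and the example record is invented:

```python
# Field names loosely modelled on common repository deposit forms
# (title, creator, description, licence); illustrative only.
REQUIRED_FIELDS = ("title", "creator", "description", "licence")

def missing_fields(record: dict) -> list:
    """Return required descriptive fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

# A hypothetical dataset description, not yet ready for deposit:
record = {
    "title": "Rainfall observations, St Augustine campus, 2015",
    "creator": "Example, R.",
    "description": "Daily rainfall totals collected on campus.",
    "keywords": ["rainfall", "Trinidad and Tobago"],
}
incomplete = missing_fields(record)  # no licence has been chosen yet
```

Emailing a spreadsheet to oneself, as some respondents did, preserves none of this descriptive structure, which is precisely what makes data findable and reusable later.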
Some of those interviewed felt that emailing their data to their personal email account was a form of archiving their research. Furthermore, when asked how they saw the Library helping them to manage data, they were unable to say. These responses indicated that the notion and elements of RDM were unfamiliar to the respondents at the UWI STA Campus. Also of interest was that most people handled small amounts of data (less than 50 MB) rather than the large datasets the Libraries had expected.

COORDINATION OF RDM EFFORTS
Another major issue is that there are multiple departments and initiatives at UWI STA that provide support to researchers, but there is little communication at the moment among them.
One project, the Research Information Management System (RIMS), is an online tool used to identify researchers at UWI with specific knowledge and skills. RIMS allocates each researcher a profile in the database where they can update personal information; learn about current research activities on the campus; access internal funding sources; and locate information on and apply for internal and external grants. Through RIMS, researchers can access training and assistance with the development of research proposals (UWI. ORDKT, 2016 2 ).
Another venture is the Trinidad and Tobago Research and Development Impact (RDI) Fund. This Fund, provided by the Trinidad and Tobago Government but managed by the Office of the Principal of the St. Augustine Campus, offers up to US$ 300,000 to researchers to develop projects in priority areas such as agriculture; crime, violence and citizen security; public health; climate change and related environmental issues; finance and entrepreneurship; technology and society; and economic diversification and sector competitiveness. Since the establishment of the Fund in 2012, eighty-five (85) concept notes have been received and thirty-one (31) grants totalling over US$ 2,000,000 have been approved and awarded. Despite these successes, RDM has not been an integral requirement for researchers accessing these funds.
Due to the structure of the UWI, coordination of RDM would be a challenge to implement. The UWI comprises three physical campuses, located at St. Augustine in Trinidad and Tobago, Cave Hill in Barbados, and Mona in Jamaica, and a fourth, the Open Campus, which is both a virtual campus and a network of seventeen (17) centres located on various islands throughout the English-speaking Caribbean. Each territory adheres to its own governmental policies and has a distinct cultural landscape. These geographical and administrative issues present major challenges to setting policies and implementing RDM across the UWI campuses. Furthermore, the question of which department will take the lead in developing the necessary infrastructure across the four campuses is a prime concern. Currently, the institutional repository (IR), UWISpace, is managed by the UWI STA Campus Libraries. Recently, staff at the Alma Jordan Library, based at the St Augustine Campus, visited Harvard University to gain insight into how Dataverse (software used for data management) functions, with a view to testing and deploying it across the UWI campuses.

TRAINING
For RDM to be successfully implemented, staff must be properly trained to support researchers throughout the RDM cycle. Academic libraries in North America have carved out a niche in this area. At present, the expertise among UWI librarians is not yet at the level needed to provide the necessary RDM support. Although the technical information technology (IT) expertise may exist, testing and implementing software is just one aspect of implementing RDM.

COST
At this time, all the UWI campuses are undergoing severe budget cuts due to economic setbacks experienced by the contributing UWI territories. In Trinidad and Tobago, the UWI STA Campus overall budget was cut by more than 14% over the last year and departments on the Campus have had to make sometimes drastic adjustments to cope with the diminished allocations. Implementing RDM across the campuses would involve considerable costs associated with the necessary storage and infrastructure and equipping staff with the appropriate skills.

CONCLUSION
For the UWI, the key issue is identifying those ready and willing to take charge of and drive RDM at the various Campuses. The STA Campus Libraries have shown initiative by deploying an RDM awareness pilot survey among researchers on the campus and by sending staff to acquire knowledge of Dataverse software. Libraries would have a critical role to play in the RDM implementation process. Nevertheless, at the UWI, RDM cannot be realised without the collaboration of all relevant departments.

BACKGROUND
The 4TU.Centre for Research Data began as a collaboration between three of the Dutch technical universities (Delft, Eindhoven and Twente) that form the wider 4TU Federation (which also includes Wageningen University). Since its inception in 2008, the Centre has used a variety of methods to convince and incentivise researchers to deposit their data. So far, over 7,000 datasets have been deposited, published with Digital Object Identifiers (DOIs) and made openly available for reuse.

i Roadshows
A series of lunchtime lectures has been organised for researchers within TU Delft. These have been organised and hosted at departmental level, so that staff from 4TU.ResearchData can tailor their presentations to different disciplinary requirements. The roadshows have been developed together with other library staff, which has allowed them to cover a variety of issues (Open Access, current research information system implementation, etc.). As a result, more researchers have attended, since each can find out about whichever issue is pertinent to them.
ii Financial Incentives
4TU.ResearchData are currently planning to release two sets of funding to incentivise researchers from the three technical universities to deposit data. The first is a 'Data Rescue' fund. This will provide funds to researchers to allow them to prepare data so that it is suitable for depositing in the 4TU archive. Data preparation can mean giving the research team time to anonymise data, add documentation, or convert the data into formats suitable for publication.
The second is a 'Data Publication' fund. This will give researchers the time and money to write data reviews of their data for a suitable Data Journal (e.g. the Geoscience Data Journal) and provide them with suitable Article Processing Charges, if required. The data will then also be published in the 4TU archive.
iii Working with ICT and Projects
Within TU Delft, working with other partners in the university also helps spread our message. Colleagues in the ICT department often provide advice to researchers on data storage and data processing during projects. We therefore organise regular meet-ups with the relevant ICT staff to inform each other about our work. This helps ensure that staff from the ICT department are also capable of passing information to researchers on why they should deposit data.
Similarly, we also work with the research funding department of TU Delft. Given the requirements for good data management from the EU in Horizon 2020 and also from the Dutch funding agency NWO, the research funding team began to understand the importance of good data management in successful project proposals. This, in turn, has increased the likelihood of more data deposits when such projects begin to produce data.

BUILDING INSTITUTION-WIDE DATA STEWARDSHIP
While the methods referred to above have been useful, it is still a rather piecemeal approach. Given the importance of good data management to the entire scientific research lifecycle, a holistic approach was required.
Therefore the next step has been to get the entire institution to consider good data management. In 2015, TU Delft began its Data Stewardship project, with the goal of introducing policies and best practices for data management in each of the university's eight faculties. This is being achieved with the support of the senior management of the university, who have introduced a broader Open Science programme, with the goal of promoting all types of openness in scholarly communication (e.g. open education, open access).
Being able to work with key stakeholders and persuade them why research data is important is essential. Faculty secretaries, who are the senior administrators within each of the faculties, have an important role to play here, at least in the context of TU Delft. They can help shape policy at a faculty level, but can also gauge and weigh the different responses to data management amongst a faculty's staff. This extra knowledge proves valuable in creating data management policies that can work in tandem with the researcher.
Continuing to find allies in the other support services is also important. For example, connecting with the Graduation School, which provides generic training for students, has allowed the Data Stewardship project to see how it can embed training on effective data management. This is a much longer-term piece of work to implement: it requires numerous stages, and will take a few years to complete. The four identified stages are:

a) initial fact finding within faculties
This has involved interviewing researchers and senior administrators on their attitudes and current behaviour in terms of managing their research data. Particular attention has been paid to the varying practices and methodologies within different disciplines, and the impact these have on data management.

b) development of a draft policy
Based on the above, a draft policy was written stating potential roles and responsibilities for stakeholders within the university. It also offered faculties specific options on how they would deal with the following three areas: training for PhD students, data management plans and training for researchers.

c) ongoing conversations about implementing such a policy within separate faculties
The draft policy is then used to continue discussion on the implementation of good data management, with a focus on the staff and infrastructure required at each stage of the research lifecycle.

d) implementation of processes and policies
It is envisioned that there will be funds made available to allow the faculties to put into practice the demands made in the university-wide policy, for example with regard to PhD training or the creation of individual Data Management Plans for each project. Most importantly, it is hoped that Data Stewards can be embedded in each faculty to provide tailor made help for the different disciplines within the university.
At the time of writing, the second phase of activity has just been completed.

CONCLUSION
The 4TU.Centre for Research Data has been in existence for nearly ten years. It has therefore had ample opportunity to explore various methods for incentivising researchers to share their data.
While they have limited reach, the early steps taken (roadshows, published case studies) are essential for making initial contact with stakeholders in the university. The most likely way of convincing researchers to deposit their data is if their disciplinary peers are already doing so. The local case studies are therefore important, offering personal testimony. Roadshows are also important, as the face-to-face contact helps build trust, and gets Library staff out of the Library and into the faculties and departments.
However, to advance all this to the next level, a wider institutional approach is required. TU Delft's Data Stewardship project identifies and works with key influencers throughout the university. Engaging the necessary stakeholders and implementing policies - as opposed to engaging individual researchers on a one-to-one basis - is essential for any university wishing to see Data Stewardship work at scale.

The survey was open to all UCL research staff and research students 2 and was available online over 5 weeks in January and February 2016. The 67 questions dealt with respondents' awareness of policies and UCL services, and with their practices of data management planning, data creation, storage and sharing. Finally, respondents were asked about their needs in terms of support and training. All questions addressed the respondents' most recent research project.

CONTEXT
306 fully completed surveys were received (out of 619 unique surveys sent in). 130 research departments, institutes, centres and units were represented among the responses (out of a total of 380 3 ) and were drawn from all UCL faculties.
The majority of responses came from research staff members, who are collaborating with other researchers on their project (either based within their department or external to UCL) and who have received external funding for it. The detailed responses from 3 of UCL's Schools and Faculties are given in the Appendix. These are the Faculties of Arts and Humanities, Laws, and Social and Historical Sciences.

Pan-UCL findings
Overall, the findings across UCL were as follows. A very positive 70% of respondents are aware of UCL's and of their funder's policy on research data, and 60% of respondents know about the UCL services related to Open Access. However, the level of awareness is problematic when it comes to internal research data-specific services: both the Research Data Management website (online since September 2015) and the Research Data Storage facility (available since 2012) are unknown to 60% of the participants.
2 In this report, "research staff" encompasses two categories of staff used by UCL Human Resources, "Academics" and "Researchers" (both full-time and part-time employees); it does not include "Teachers". "Research students" refers to full-time Graduate Research students.
3 As listed in the UCL Departments A to Z (http://www.ucl.ac.uk/departments/a-z/, accessed 4 August 2016).
The most common types of digital data created by respondents are spreadsheets, texts, databases and images. Remarkably perhaps, the answers also show that 30% of respondents produced non-digital data as part of their most recent projects and another 30% collected personal or sensitive data. Half of the respondents produced less than 100 GB of data over the lifetime of their project.
Data storage and archiving practices are also shown to be problematic. The most common method for storing research data was by using a personally-owned computer (45% of responses); the other favourite choices were a UCL computer, an external hard drive/USB stick or a cloud service. At the end of their project, half of respondents left their data on existing storage and, worryingly, 20% either could not recall exactly where they had archived their data, or had no plans for long-term preservation.
Among those who archived their data, 50% did so for their own re-use; for 20% of research staff it was because of funders' requirements. Half of the respondents have already shared their data with other researchers. Among them, only 25% had no concerns when sharing data. When concerns were expressed, they related to legal questions, misinterpretation and the time spent collecting the data.
A very large proportion of respondents (71%) said they thought about data management very early on in their projects, and a third indicated having someone in their team or department responsible for RDM. Yet, when asked what challenges they faced when managing their research data, the long list of problems enumerated by 217 participants is striking.
What is also surprising is that respondents mainly described challenges that are linked to handling data during their projects (storage, dealing with large volumes of data, good record keeping and backing-up procedures). This could indicate that they are not aware of where to find central information on these issues; or that the help available (whether at the central, faculty or department level) is not sufficiently adapted to assist with these essential measures.
Among the options proposed to them, respondents indicated that they would like help primarily in the following areas: storage and preservation of data; writing Data Management Plans; costing data management; data sharing and Open Access to publications. They would prefer to receive such assistance through online resources, training sessions in their department and regular drop-in sessions.

Faculty-specific findings
In terms of levels of awareness of policies and services, Arts, Humanities and Social Science researchers showed low levels of awareness of internal UCL RDM facilities and services. 66% of Arts and Humanities researchers did not know of the UCL RDM website. For Laws, half did not know. With regard to usage, only 10% of Arts and Humanities researchers and 8% of Social and Historical Sciences researchers had actually utilised it. This compares with, say, the Faculty of Engineering, where only 10% of researchers knew about the RDM website and 66% did not (Engineering had 67 respondents from 72 surveys).
As to the creation and analysis of data, an interesting picture emerges. For Arts and Humanities researchers, the most important type of data created was textual, with spreadsheets coming a close second. For Laws it was databases, with text coming a close second. In Social and Historical Sciences, the most popular forms were spreadsheets and photographs/digitised images, followed closely by databases and text. In Engineering, again the most popular formats were text, followed by spreadsheets and other images. Clearly, there is not the difference between the disciplines that might have been imagined.
Where there is a difference is in the size of the datasets created. For Arts and Humanities, only 3 respondents created datasets of 1-10 GB. The figures in Laws are too small to use to draw comparisons. In Social and Historical Sciences, the single biggest category for size of dataset creation was 1-10 GB. This contrasts with the Faculty of Engineering, where 11 of the 58 respondents were creating datasets of 1-10 GB, 12 datasets of 10-100 GB, and 11 datasets of 100 GB-1 TB.
When it came to storing and archiving research data, researchers in the Arts and Humanities commonly selected the hard drive of a personal PC or laptop, or an external hard drive/USB stick. For Laws, the most popular storage medium was an external hard drive/USB stick. In Social and Historical Sciences, the preferred media were the same as for Arts and Humanities.
For long term archiving, researchers in Arts and Humanities preferred subject repositories, or repositories external to UCL. In Laws and Social and Historical Sciences, researchers preferred existing storage. In the latter, many respondents admitted to having no archive plans. Engineers showed a pattern similar to Social and Historical Sciences. They preferred to use existing platforms for long-term storage.
Researchers were asked if they had any concerns about sharing their research data with others. Arts and Humanities on the whole had no concerns, and the same is true for Laws. Social and Historical Sciences did have concerns, however, and the most cited reasons were confidentiality issues/IPR or Data Protection.
Finally, researchers were asked what kinds of support they needed. In Arts and Humanities, researchers cited three main areas: storage and preservation of data, Open Access to publications and Data Management Plans. The same three preferences were cited in Social and Historical Sciences. In Laws, however, the most requested area for help was Open Access to publications, followed by a group who felt they needed no help. By way of illustration, in Engineering the most mentioned common areas for additional support were storage and preservation of data and Data Management Plans.

CONCLUSIONS AND RECOMMENDATIONS
A number of conclusions and recommendations can be drawn from the UCL survey, which illustrate the challenges and opportunities for research data management in the Arts, Humanities and Social Sciences:

Recommendation nº1
In all disciplines, research funders expect grant applicants and grant holders to explain how they will manage their data and to comply with their Data Management Plans. Being aware of these policies and services is a key element in writing successful funding applications. The earlier researchers receive assistance, the lower the eventual risks for their projects.
• Faculties and research departments are encouraged to promote data storage solutions which comply with institutional and funder policies.
• Establish a central information service to support research data management activities, such as http://www.ucl.ac.uk/library/research-support/researchdata, and ensure this is continually promoted across the institution.
• Where possible, Heads of Departments should periodically invite the institutional Research Data Management team to give brief presentations to staff and research students on what assistance is available to them, including on 1-to-1 support and review of Data Management Plans.
• PhD students should be urged to attend courses on research support as part of an institutional Doctoral Skills Development Programme. 4

Recommendation nº2
Training and support opportunities for both research staff and research students should not overlook the aspects around personal/sensitive data and databases as a large proportion of researchers use these as part of their projects.
Using personal computers and commercial cloud services to store research data represents a clear security risk for any data and a potential breach of security regulations if these are personal/sensitive data. An increasing number of funders currently expect that research data should be preserved for at least 10 years. Whether using an institutional facility or a discipline-specific repository, researchers should ensure that they know how to find reliable archiving facilities.

Recommendation n°3
The lack of clarity on where to find solutions to all of the challenges cited by research staff is a worrying observation. Academic faculties should be strongly encouraged to consider appointing/designating permanent staff members to assist researchers with data management in their subject disciplines.
Following these recommendations will help to avoid rushed and potentially costly short-term decisions; a lack of support when problems arise; and the outdating of skills and standards.

APPENDIX
Selected detailed responses to the UCL Questionnaire

Faculty of Arts and Humanities
16 completed surveys were received (out of 37 unique surveys transmitted) from researchers and research students in the Faculty. The overview below highlights responses to some of the key questions. It should be read after the Executive Summary and in conjunction with the whole report. The Research Data Management team is available to discuss the results.

Faculty of Social & Historical Sciences
17 completed surveys were received (out of 28 unique surveys transmitted) from researchers and research students in the Faculty. The overview below highlights responses to some of the key questions. It should be read after the Executive Summary and in conjunction with the whole report. The Research Data Management team is available to discuss the results.

Awareness: policies, UCL services & Data Management Plans
Creating & analysing data

INTRODUCTION
Living Symphonies 1 is a landscape sound installation by James Bulley and Daniel Jones 2 , which toured across four different forests 3 in the UK in the summer of 2014. The work portrays the thriving activity of the forest's wildlife, plants and atmospheric conditions, creating an ever-changing sound symphony heard from a network of 24 speakers hidden throughout the forest itself. Working with ecologists and wildlife experts across the UK, Jones/Bulley developed highly detailed maps of the flora and fauna that inhabited each forest site where the installation was to take place.
Each species in the surveyed area was depicted by a unique set of musical motifs that portrayed their changing behaviour over day and night, coming to life as the species awakened; moving, developing and interacting just as the organism would. Dozens of these motifs were heard at any moment when the piece was live, spatialised across the space of the forest and heard back through a three-dimensional speaker system. In total there were some 15,000 fragments of sound within the sound score, making up musical movements for over a hundred different organisms.

FUNDER REQUIREMENTS
The piece was commissioned and funded as a collaborative work by Sound and Music, the Arts Council England and the Forestry Commission England. All copyright in the work, including that of the datasets, remained with the artists and there was no requirement to make any such data publicly available. A required outcome was a toolkit for touring public artworks, produced and published by the Forestry Commission England. This toolkit is openly accessible and available here 4 .

SURVEY DATA
In order to undertake the piece, the artists collected a large array of datasets over a year-long period of in-depth research and development. This data was used both to create and to contextualize the artwork. A table of datasets captured during the project is shown in Figure 10.1.

BACKUP AND STORAGE
Working in remote forests across England was a challenge for capturing and storing data, as Internet/network access was extremely limited. As a result, the data was regularly backed up and duplicated onto hard drive storage before being synchronized to cloud storage at a later point. For immediate 'transfer' purposes, all data gathered was placed into Dropbox (for sharing with partners including press organisations, Sound and Music and Forestry Commission England) and then transferred to external hard drive storage (copies were synced and held both at the Jones/Bulley studio and in personal artist studios offsite). Dropbox was used for its ease of use, stability and simple sharing interface.
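The local-first workflow described above - duplicate onto hard drive storage while offline, then sync to the cloud later - can be sketched in a few lines. The function names and directory layout below are illustrative only, not part of the Living Symphonies toolchain; the sketch covers only the checksum-verified local mirroring step (a cloud sync of the verified mirror would follow once connectivity is available).

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def mirror_and_verify(source: Path, backup: Path) -> list:
    """Copy every file under `source` into `backup`, preserving the
    directory layout, and verify each copy by checksum.
    Returns the relative paths of any files that failed verification."""
    failures = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy file contents plus timestamps
        if sha256_of(src) != sha256_of(dst):
            failures.append(str(rel))
    return failures
```

Verifying each copy by checksum before deleting or overwriting anything on the capture device is the step that makes an offline backup trustworthy; the later cloud sync then only ever sees a known-good mirror.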

ANCILLARY DATA
During the live period as the installation toured, there were a number of additional datasets that were captured by the artists and the production team as part of the project.
A table of datasets captured during the project is included in Figure 10.3

SHARING OF DATA
The sharing of the data that underpins Living Symphonies has been a complex and near-impossible task. While the partner organisations did create a toolkit that explored the touring of the piece (a prerequisite of the Arts Council funding that the piece obtained), it has not been possible to make the vast majority of the above data available in any coherent way. It is clear that most of this data would be very useful to many other researchers and artists (as proven by the interest of numerous academics, musicians and ecologists). However, achieving this would require funding to be allocated to provide the time for adequate preparation of the datasets, with related material to explain and contextualise them. Some of the photography and video has been used to make short reference films and to provide visual context documenting the occurrence of the work. But it has not been possible for the artists to make the following datasets available, due to a lack of funding, time constraints surrounding their curation and contextualisation (i.e. ranges of data and editing of documentation material), and issues in hosting such large quantities of material. Bracketed after these datasets are the avenues through which the artists would hope and plan to make the material available if possible: •

CONCLUSION
Whilst much discussion has occurred in recent years surrounding research data management in the context of science-centred and text-based research outputs, very little of this has involved confronting the problems facing artist-researchers working outside these areas. As a result of fundamental differences in the commissioning and funding structures for art projects, there is insufficient funding and understanding on the part of the artists and institutions involved as to how or even why it is worth making this data available.
Living Symphonies provides a case study that highlights a large and wide-ranging array of datasets that would undoubtedly be useful for researchers across numerous disciplines. In this instance the artists/ researchers are comfortable with the vast majority of the data being made available under one of the more openly accessible of Creative Commons licenses -in this instance this would not affect any further income for the artists as the pieces in themselves are unrepeatable due to their site-specific nature. The artists believe this would be the right thing to do, given the publicly funded nature of the project. This data will remain unavailable unless there is adequate funding and planning from the outset for projects such as these.

THE DIGITAL REVOLUTION
At about the turn of the millennium, the global volume of data and information stored digitally overtook that stored in analogue systems on paper, tape and disc. The result has been a digital revolution, with the global data acquisition rate now 40 times greater (35×1000⁷ bytes, i.e. 35 zettabytes) than 10 years ago, still accelerating and driven in part by the massive reduction in the cost of digital storage. In 2003, the human genome was sequenced for the first time. It had taken 10 years and cost $4 billion. It now takes 3 days and costs $1,000.
The unprecedented rate that we are able to acquire, store, manipulate and instantaneously communicate vast amounts of digital data and information has profound implications for all fields of science and scholarly research as well as for economies and societies. It is crucial that these implications are explored to the maximum effect by the research and scholarly communities in all parts of the world. Part of the opportunity lies in exploiting "Big Data", where enormous fluxes of data stream into computational and storage devices, often from a great diversity of sensors and sources; in "Linked Data", where semantic linking between different datasets opens opportunities for eliciting much deeper meanings (of great potential relevance for many global challenges such as infectious disease, disaster risk reduction and migration); in the myriad opportunities that arise from blending the physical and digital realms through the "Internet of Things"; and in the powerful but problematic potential of machine learning. The fundamental benefits derived from these approaches are in elucidating patterns and relationships that have previously been beyond our capacity to resolve and both to characterize and to simulate the dynamics of complex systems.

SCIENCE 1 AS AN INHERENTLY OPEN ENTERPRISE
Openness has been the bedrock on which modern science has been built. The rules of the game were established in the late seventeenth century, when scientific ideas began to be published in open journals rather than hidden in the private correspondence of gentlemen. A further crucial step was the requirement by journal editors that truth claims must be accompanied by the evidence (the data) on which they were based. This permitted others to attempt replication of the observational or experimental evidence and to scrutinise the logic of the proposed relationship between evidence and concept. Failure on either count indicated error. It is a process termed "self-correction" by historians of science, tellingly characterised by Arthur Koestler in writing: "The progress of science is strewn, like an ancient desert trail, with the bleached skeletons of discarded theories that once seemed to possess eternal life". If there is a scientific method, this is it: the power of the negative. Albert Einstein characterised it as: "No amount of experimentation can prove me right. A single experiment can prove me wrong."

1 The word science is used here to mean the systematic organisation of knowledge that can be rationally explained and reliably applied. It is used, as in most languages other than English, to include all domains, including the humanities and social sciences as well as the STEM (science, technology, engineering, medicine) disciplines.

THE BRIGHT SIDE
Like all revolutions that have not yet run their course, it is often difficult to distinguish reality and potential from hype. But powerful, real discoveries have now emerged in the elucidation of previously unsuspected patterns and relationships. In genomics, rapid sequencing and advanced computing power permit systematic testing of relationships between genetic variations and specific traits and diseases, rather than using trial and error, with profound implications for medicine, agriculture, the production of biofuels and the process of drug discovery. The advent of the modern computer has long permitted simulation of the dynamics of highly coupled complex systems, their sensitivity to small variations in initial conditions and their capacity to produce "emergent behaviours" that were not evident from their individual components.
We can now add to this by the use of big, linked data to characterise complexity, and by iterating between characterisations and simulation, to follow and forecast the evolution of complex systems, as is now done in modern high-resolution weather forecasting. Only however if data is routinely made "intelligently open" (accessible, intelligible, assessable and re-usable), 2 can the full benefit of such approaches be realised.

THE DARK SIDE
However, the vast and complex data volumes that many scientists are now able to access also challenge the open approach required for self-correction. This arises from the difficulty of making such data sets open to scrutiny, together with the metadata, the computer code used in analysis, and the logic of any "learning machine" used in the process. It is hardly surprising that many of us fail this standard, or have succumbed to the temptation to keep our data under wraps so that it can be milked again for further publications. A current debate in the New England Journal of Medicine 3 about the rights and wrongs of openness in medical research epitomises this conflict; between the public interest in openness and the interests of scientists' careers in maintaining data ownership. Moreover, the recent attempts to replicate the results of highly regarded papers, in areas as diverse as pre-clinical oncology, social psychology and economics, with replication rates never exceeding 25%, illustrate the consequences of not rigorously presenting all the data and metadata. Without this, self-correction cannot work. If we are to maintain the credibility of the scientific process, we need to regard absence or inadequate presentation of data and metadata as scientific malpractice and to re-establish standards of reproducibility for a data-rich age. Without this we run the risk of the digital explosion overwhelming the processes that ultimately maintain scientific rigour.

ADAPTING TO CHANGE
Information and knowledge have always been essential drivers of human material and social progress, and the technologies by which knowledge is stored and communicated have been determinants of the efficiency of these processes. The digital revolution is a world historical event as significant as Gutenberg's invention of moveable type, and certainly more pervasive. A crucial question for the research and scholarly community is the extent to which our current habits of storing and communicating data, information and the knowledge derived from them are fundamental to creative knowledge production and its communication for use in society, irrespective of the supporting technologies, or whether many are merely adaptations to an increasingly outmoded paper/print technology. Do we any longer need expensive commercial publishers as intermediaries in the communication process? Do conventional means of recognising and rewarding research achievements militate against creative collaboration? Has pre-publication peer review ceased to have a useful function? These are non-trivial questions that need non-trivial responses.
Both individuals and institutions need to adapt. The recently published Accord on Open Data 4 sets out principles and responsibilities. It advocates a normative principle at the level of individuals: "Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and re-purposed." and an operational principle that: "The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations." A positive reaction to the Accord from the International Union of Crystallography 5 included an even stronger clarion call to action: "We urge the worldwide community of scientists, whether publicly or privately funded, always to have the starting goal to divulge fully all data collected or generated in experiments." Such statements from the global research community about the open ethos of scientific inquiry, and its relevance to the need of humanity to use ideas freely, should be echoed by universities as part of their traditional role in preserving, re-assessing and creating knowledge and communicating it, in questioning received wisdom rather than blandly regurgitating it. They are also important in combating a countervailing trend towards the privatisation of knowledge, of which some universities are part, by succumbing to injunctions to see themselves largely as instruments of national wealth creation, where intellectual output is marketable property rather than public good. 
In contrast, the technologies at our fingertips have a key enabling potential for "open science", in which publicly funded science is done openly, its data are open to scrutiny, its results are available freely or at minimal cost, and results and their implications are communicated more effectively to a wide range of stakeholders. Moreover, scientific knowledge 'producers' should cease to think of knowledge 'users' as passive information receivers, or at best as contributors of data to analyses framed by scientists, and instead regard them as respected allies in the co-framing of issues and the co-production of actionable knowledge. 6

INFRASTRUCTURES FOR OPEN DATA
Whilst universities must respond to these ethical challenges in their own ways, they must also respond to the need to manage their data in ways that they believe best reflect their mission. Several years ago, rigorous data management was seen by many universities merely as a cost, as an "unfunded mandate". Increasing numbers of universities now see open data as a necessary part of their future and plan to position themselves to exploit the opportunities that it offers. Some of the essential principles of good research data management have now been established as a result of hard-won experience, 7 8 many of which are shared in this volume. The "hard" infrastructure of high performance computing or cloud technologies and the software tools needed to acquire and manipulate data in these settings are only part of the problem. Much more problematic is the "soft" infrastructure of national policies, institutional relationships and practices, and the incentives and capacities of individuals. For although science is an international enterprise, it is done within national systems of priorities, institutional roles and cultural practices, such that university policies and practices need to accommodate to their national environment. The iceberg figure reflects this (figure 11.1). The easy part is the visible part, comprising the hardware and software tools required by a national open data system and any consents required for data use. Below the surface lie issues of process and organisation.
There are, however, important developments in support of open data beyond the confines of the university, with which universities can engage to their considerable benefit, if only to relieve themselves of the burden of being data management islands. "Open data platforms" are currently being developed where the needs of users are matched with hardware/software provision and data managerial skills, created within individual disciplines (e.g. the US National Institutes of Health, 9 the Elixir-europe programme 10 ) or multi-national geographic regions (e.g. the Open Science Platform for Africa; the Latin America and the Caribbean Platform; the European Open Science Cloud).

A DECADAL VISION
Nearly two decades ago, Tim Berners-Lee proposed that datasets that relate to the same or related phenomena could be semantically linked in ways that integrate different perspectives, 11 and thereby offer much deeper understanding than merely using the web as a means of retrieving documents. Such a semantic web for science has the potential to integrate data from many sources to gain insight into complex relationships. It could, for example, be a means of integrating data from the natural and social sciences that are highly relevant to many complex global challenges; or of integrating data from the "internet of things", where almost any device with its own power source is able to acquire non-trivial information about its environment. Such a development is impeded by two barriers: a failure of many disciplines to define their own vocabularies and ontologies, which impedes the efficiency with which they are able both to locate and use data relevant to their own discipline; and a failure to adhere to standards that enable interoperability between disciplines. A strategic initiative is currently being launched by the International Council for Science's Committee on Data for Science and Technology, together with international science unions and associations, in the form of a Commission on Data Standards for Science to tackle these two major issues. It has great potential not only to enhance scientific understanding, but also the way that science is able to engage with the wider public in a more truly open science. This will require a major, decadal effort from across the science community, and could prove to be a profound step that will fundamentally change the way that science is done in the 21st century, through an unprecedented capacity to integrate data from disparate disciplines in ways that profoundly increase the potential of science to address major global challenges.
The main finding was that there were many perspectives, experiences and small-scale initiatives in this area across the institution. There was a need to enable and invigorate a common clear ethos of "open" across faculties and campuses. The report indicated positive reactions to the philosophy of "open" with strong support in many (but not all) academic areas. There was an appreciation among stakeholders that there is not a single correct way of doing it. It was felt that open education at UCL can best be introduced by focusing on openness via a set of specific dimensions, such as content, technology, and pedagogies.
Through the SIG UCL is assessing how initiatives can be brought together and also how existing policies and projects are used to support a more coordinated and purposeful approach. UCL is already recognised as a European leader in its commitment to open access to research. UCL Discovery, UCL's open access repository for UCL research publications, is well established as a mainstream service. UCL Library Services and UCL Digital Education are in discussions about piloting an Open Education Resources (OER) repository using the same platform as UCL Discovery. Another development is that UCL has recently launched MediaCentral, a media repository that showcases and provides access to media-based teaching and other items. 2 Additionally, UCL Press 3 is the UK's first fully Open Access university press, launched in 2015. Since then 43 titles have either been published or are in press, including three textbooks. This is seen as a major opportunity in UCL to change the current commercial business model for textbook publishing. Open Access textbooks present the institution with an opportunity to make an offering in the Open space which will promote access to, and use of, textbooks by the end user. Students support OER, especially open textbooks, because OER in digital format are accessed at no cost and print copies are also available at relatively low cost.
The notion of linking some UCL Press titles to MOOCs is also being explored. UCL has already run three centrally-funded MOOCs on the Futurelearn 4 platform with three others in production for launch in 2017. UCL academics are also involved in several other MOOCs as well as open access courses on UCL eXtend, 5 UCL's externally-facing virtual learning environment.

COSTS
In addition to building on and connecting currently-funded open initiatives, UCL Library Services and UCL Digital Education are jointly seeking additional funding to begin piloting the OER service, with a projected implementation in 2017. In terms of OER content production, a major benefit will be cost-effectiveness because of the ability to share and re-use resources. However, while OER bring down total expenditures, they are not cost-free. New OER can be assembled or simply re-used/re-purposed from existing open resources, and RDM storage and hosting facilities can be re-used for OER. This is a primary strength of OER and, as such, can produce major cost savings. OER need not be created from scratch. On the other hand, it should be recognised that there are some costs in the assembly and adaptation process.

SERVICE PROVISION
The pilot OER service will be run by staff from across the UCL Library Services and UCL Digital Education teams. The teams will plan and organise the people, faculty, infrastructure, communication and material components of an OER service in order to guarantee its quality and the interaction between UCL Press as an RDM service provider and UCL faculty and students as users. The pilot will include a requirements analysis for the staff and end-user functionality of the repository (e.g. subject descriptor taxonomies, workflows to support quality assurance, branding). UCL will also identify and develop exemplar OERs to test the technical system and develop/document support processes. As indicated in the opening section, a major requirement is to raise the profile of OER and to provide input to the pilot through a programme of training and advocacy, workflows and case studies. From an educational impact perspective, the team will evaluate and make recommendations for turning the demonstrator into an established and sustainable service.

INDICATIVE COSTS
For indicative capital setup costs, the OER service will be connected to the UCL Research Data Service, which required in the region of £1 million to establish, with presumably little additional cost for the introduction of OER RDM. The pilot will clarify actual staff support requirements, but potential staff costs can be estimated as a two-year, half-time grade 8 post, at less than £60,000 including full economic costing. The existing Library, Digital Education and academic development teams will co-ordinate activity to encourage a culture of 'open' amongst existing professional and academic staff and students, including top-down strategic direction, energy and encouragement.

SCOPE OF THE SERVICE
The Digital Education Team will support academic colleagues, colleges, units and individual researchers in publishing educational media (podcasts, video, files, etc.) under an appropriate Creative Commons Licence. The team will also promote and support the use of OER in teaching at UCL via its user communities, websites, news publications, workshops and events. In partnership with Library Services, it will offer staff and students training in copyright licensing, how to prepare materials for open publication, and platforms for dissemination. Furthermore, UCL encourages members of the university to use, create and publish OERs to enhance the quality of the student experience, provided that the resources used have undergone quality assurance, are fit for purpose and relevant.

WHERE DO UCL OER SERVICES SIT?
The OER service will be a partnership between the UCL Digital Education Team, UCL Library Services and UCL Press. UCL Digital Education is based within the Information Services Division (ISD) and provides support, advice and training for all aspects of Learning Technology, e-learning and open and distance learning across the whole of UCL. The Division of UCL Library Services runs UCL Discovery, 6 UCL's open access research repository, and UCL Press is a department of UCL Library Services.

POLICY DEVELOPMENT
Currently there is no national OER policy in the UK to support institutions in OER adoption, and no OER institutional repository at UCL. There is a UCL Research Data policy, 7 and a LERU Working Group produced the LERU Roadmap for Research Data, 8 which contains a range of guidance and information on Open Data and which could be extended or adopted for OER. UCL's current policy and advocacy for RDM are fully in line with the LERU Roadmap. A potential future UCL policy on OER should articulate and expand upon UCL's position on OERs and provide guidelines for practice in learning and teaching and for RDM procedures. The University could encourage staff and students to use, create, and publish OERs via institutional and other repositories and help them track usage and impact.

CONCLUSIONS
An OER service, with its RDM underpinning, is not difficult to set up if other data management systems are already in place in a university setting. We contend that this is why an OER service should be embedded into UCL's Library Services and main Content Management System tools.

HABEAS DATA 1 VS OPEN DATA
In Colombia, there is a regulatory framework that protects personal data such as names, municipality, identification documents, phone numbers and addresses. In addition, any information regarding children's development and victims of conflict is also protected by this regulation. For this reason, the Centro de Datos (CEDE) at the University of the Andes has created a range of methods to publish research data and make it as open as possible, but without revealing the elements protected by Habeas Data. These methods include: confidentiality agreements, users with different levels of access, and a special data processing room located in the School of Economics at the University of the Andes.

TYPES OF USERS AND ACCESS POLICY
As a dissemination platform for its information, CEDE uses its web page https://datoscede.uniandes.edu.co/. Nonetheless, due to the sensitivity and the costs of the data, only users with the appropriate level of access clearance can access this information. To this end the platform has three different categories of users: professors from the Economics department, other University of the Andes members, and finally external users. The first group have free access to all of the data sets the web page offers, the main reason being that the information has been collected or requested by them. The second group has limited access to the information; an authorisation from a professor of the University, along with a brief summary of the research for which the data is needed, is necessary in order to access restricted information. Lastly, the third group can only access the public data available on the web page; no further information is available, due to the contracts under which the University obtains restricted information.
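The three-tier access model above can be sketched as a simple policy check. This is an illustrative sketch only, with hypothetical group and dataset-class names, not CEDE's actual access-control implementation:

```python
# Illustrative sketch (not CEDE's actual implementation) of the three-tier
# access policy described above. Group and dataset-class names are hypothetical.

def can_access(group: str, dataset_class: str,
               has_professor_authorisation: bool = False) -> bool:
    """Return True if a user in `group` may access a dataset of `dataset_class`.

    group is one of: "economics_professor", "uniandes_member", "external_user".
    dataset_class is one of: "public", "restricted".
    """
    if dataset_class == "public":
        return True          # public data is open to all three groups
    if group == "economics_professor":
        return True          # professors have free access to all datasets
    if group == "uniandes_member":
        # other university members need a professor's authorisation
        # (plus a summary of the research project, handled off-line)
        return has_professor_authorisation
    return False             # external users: public data only
```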

ANONYMISATION PROTOCOLS
In order to publish restricted data, CEDE created an anonymisation protocol. The algorithm used consists of:
• Generating a random number in order to identify the household. This number is generated based on the interviewer identification number, the order of the municipalities visited and the order of the interviews conducted in the house;
• Generating a consecutive number based on the interview order and on the order of the municipalities visited.
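As a hedged illustration of the two steps above, the sketch below derives a stable pseudonymous household identifier from the three fields the protocol names. CEDE's actual algorithm is not published; the keyed hash and field names here are illustrative stand-ins:

```python
import hmac
import hashlib

# Illustrative sketch, NOT CEDE's published algorithm: derive a stable
# pseudonymous household ID from the interviewer identification number,
# the order of the municipalities visited and the interview order.
# A keyed hash cannot be reversed without the secret key.

SECRET_KEY = b"replace-with-a-secret-held-only-by-the-data-centre"  # hypothetical

def household_id(interviewer_id: int, municipality_order: int,
                 interview_order: int) -> str:
    """Stable pseudonymous household identifier for publication."""
    message = f"{interviewer_id}:{municipality_order}:{interview_order}".encode()
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest[:12]  # short, stable, non-reversible identifier

def consecutive_id(municipality_order: int, interview_order: int) -> str:
    """Consecutive number based on municipality and interview order."""
    return f"{municipality_order:03d}-{interview_order:04d}"
```

The same inputs always yield the same identifier, so households can be linked across waves of a survey without revealing the protected fields.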
The handling of research data in the social sciences at University of the Andes -Data Centre (CEDE) -Colombia

RESTRICTED DATA AND CONFIDENTIAL AGREEMENTS
In line with the Open Data initiative, CEDE has classified its datasets into five different groups: open data, public use, licensed data, external repository, and data that is not available. Licensed data is mainly information from public institutions which hold material that is relevant for economic research, but where that information is not generally freely accessible. To access this type of data, the University signs confidentiality agreements or contracts. However, due to the sensitive nature of this data, its availability is subject to an authorisation process by which professors attest to the use of the data by a student, subject to a prior discussion as to its necessity within the specific research process. Data that contains information protected by Habeas Data requires an additional step of authorisation, in which the researcher has to sign a confidentiality clause agreeing not to reveal information about particular individuals, not to use the data for other purposes, and not to circulate the information among other people who lack the necessary permission to use it.

DATA PROCESSING ROOM
As a contingency measure to protect information subject to Habeas Data while still guaranteeing access to it, CEDE, jointly with the National Department of Statistics (DANE), has made available a data processing room. In this space researchers can access information with no anonymisation applied, in order to estimate more complete economic models. To access this room, researchers first have to request the information from DANE by way of an email in which they give details of the specific data set and a brief introduction to the research project they are working on. After this, DANE grants access by creating a designated user file into which the requested information is placed. Finally, researchers sign a confidentiality agreement and schedule an appointment to work on the computers available in the data processing room. These computers have no Internet connection or USB ports, in order to control the entry and extraction of data. When researchers want to bring in or take out information, they must first send an email to DANE specifying the material concerned and the output format, in order to obtain the results of the calculations made in the data processing room. Under no circumstances is the raw data made available outside this room.

CONCLUSIONS
For CEDE it is important to make the data available and as open as possible. However, in this process, it is necessary to respect the law related to personal data protection in Colombia. CEDE has identified and operates some methods that are useful to open the data without revealing the protected elements within it. For this reason, the University believes that it is important to find a way under the law to publish this data and make it as open as possible.

DATA STORAGE DURING THE 'ACTIVE' PHASE OF RESEARCH
The active phase of a research project comprises the generation or collection of data, its processing, and its analysis. If data is well managed during this phase then it can considerably simplify the job of preparing data for longer-term preservation and access after the end of a project, but it is not easy for institutions constructively to intervene in many elements of research, which are often highly specific to the requirements of a particular project. Data storage is the one element that almost all researchers depend upon, and where institutions can offer a generic central service. However, even in this realm there is a wealth of options available to most researchers, from laptop hard drives and memory sticks, to commercial cloud services such as Dropbox.
By providing researchers with a storage service that is both easy to use and includes helpful collaboration mechanisms, an institution can however gain some measure of control over how their data assets are managed, and facilitate the smooth path of data and associated metadata through the research data lifecycle.

THE RESEARCH DATA STORAGE SERVICE AT UCL
The development of the Research Data Storage (RDS) Service at UCL was motivated from the outset by the necessity of assisting researchers to comply with the requirements of research funders. UCL sought to develop a data storage service that had the 'resilience and disaster recovery to assure the safety of research data'; 'multiple and intuitive user interfaces to meet a broad set of user experiences', a 'service wrap to make the Service useful to more users', and the 'capacity to increase the user base across UCL'. A tender for physical storage to enable the objectives of a data storage service was issued in 2012, and the service opened to researchers in June 2013.
Use of the service has grown exponentially since that time. As of December 2016, the service hosts approximately 760 TB of research data before replication and redundancy, 1.791 PB in total. All faculties at UCL have at least one project that is using the service.
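For a rough sense of the replication and redundancy overhead implied by these figures (assuming decimal units, 1 PB = 1000 TB):

```python
# Storage overhead implied by the December 2016 figures,
# assuming decimal units (1 PB = 1000 TB).
raw_tb = 760             # research data before replication and redundancy
total_tb = 1.791 * 1000  # total footprint including replicas, in TB

overhead = total_tb / raw_tb
print(f"storage overhead factor: {overhead:.2f}x")  # → storage overhead factor: 2.36x
```

In other words, every terabyte of research data occupies roughly two and a third terabytes of physical capacity once resilience is accounted for.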
The service is offered to research projects, rather than individual researchers. This helps with the assignment of useful metadata, as projects can be cross-referenced with administrative information held in grants databases and other UCL information sources. In practice, the service does not prohibit the creation of unofficial projects, as that would effectively proscribe the use of the storage by 'unfunded' research, a mode of working which is common in the humanities and social sciences.
When signing up for an allocation of project storage space, the authorisation of a Principal Investigator is required. The PI must vouch that no personal data (as opposed to research data) is held in the system, and that they recognise their legal obligations under the UK Data Protection Act 1998 and otherwise. To be assigned a new project, the PI must also provide start and end dates for the project and some basic descriptive metadata. Projects can request between 1 and 5 TB of storage, or contact the service directly if they need more. The minimum allocation of 1 TB reflects the fact that the service was originally developed with large-scale data users in mind, as this community was least well served by alternative solutions, although the Storage Service is available to all UCL researchers, however much data they anticipate generating.
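The signup checks described above can be sketched as a simple validation routine. The field names and rules below are illustrative assumptions, not the RDS service's actual schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch of the checks applied when a project requests storage;
# the field names are illustrative, not the RDS service's actual schema.

@dataclass
class ProjectRequest:
    pi_name: str                        # Principal Investigator who authorises
    description: str                    # basic descriptive metadata
    start: date
    end: date
    quota_tb: int                       # requested allocation in terabytes
    pi_confirms_no_personal_data: bool  # DPA 1998 undertaking

def validate(req: ProjectRequest) -> list[str]:
    """Return a list of problems; an empty list means the request can proceed."""
    problems = []
    if not req.pi_confirms_no_personal_data:
        problems.append("PI must vouch that no personal data will be stored")
    if req.end <= req.start:
        problems.append("project end date must follow the start date")
    if not req.description.strip():
        problems.append("basic descriptive metadata is required")
    if not 1 <= req.quota_tb <= 5:
        problems.append("allocations outside 1-5 TB: contact the service directly")
    return problems
```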

UNDERLYING INFRASTRUCTURE
There are two different storage technologies under the bonnet of the RDS Storage Service: General Parallel File System (GPFS) block storage; and Web Object Storage (WOS). This was seen as a good combination, as the fast GPFS component can cater for users who require data to be staged to UCL's high-performance computing facilities, whilst the highly scalable object storage provides a cost-effective way of managing the bulk of UCL research data. The Integrated Rule-Oriented Data System (iRODS) is used as the management layer for data in the object store.
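The two-tier placement described above can be sketched as follows, under the stated assumption that HPC-staged data belongs on GPFS and bulk data on the object store. The function and catalogue here are hypothetical stand-ins for the iRODS management layer:

```python
# Illustrative sketch of the two-tier placement described above. Names are
# hypothetical; in the real service, placement into the object store is
# managed through the iRODS middleware layer.

CATALOG: dict[str, dict] = {}   # stand-in for the iRODS metadata catalogue

def place_dataset(name: str, needs_hpc_staging: bool) -> str:
    """Register a dataset and choose its back end.

    Data that must be staged to the HPC facilities goes to the fast GPFS
    tier; everything else goes to the cost-effective WOS object store.
    """
    tier = "gpfs" if needs_hpc_staging else "wos"
    CATALOG[name] = {"tier": tier, "needs_hpc_staging": needs_hpc_staging}
    return tier
```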

SUPPORT REQUIREMENTS
Besides the need to keep the infrastructure up to date and ensure that the service is running smoothly from a technical perspective, the RDS team works with UCL Library Services to assist researchers with interesting use-cases in making the most of the service by ensuring their workflows are rationalised. At present, some common administration processes, such as changing permissions in project groups, are also still semi-manual, although web interfaces are being developed to allow users to do more of this themselves.

COSTS AND PRICING
At the time of writing the Research Data Services team consists of 4 full-time employees (4 FTE), although not all of this staffing resource is dedicated to keeping the storage service ticking over. Monitoring, patching, bug-fixing, service communications, support and consultancy, and service management take about 2.5 FTE at present, with the rest of the time going towards future service development (including a UCL institutional repository), a re-architecting of the present service, and technology monitoring and assessment. An unusually high proportion of staff time over the last year has been spent dealing with issues affecting the object storage. Once the service is more mature, and more of its administrative processes automated, we would expect it to require less staff time to maintain. In addition to the core team, the service requires a small amount of resource from the UCL helpdesk team and the Data Centres team.
Hardware and support costs for a storage service will vary according to the specific deal arranged with the supplier(s). The current RDS capacity was achieved via two purchases: an initial purchase of just under a petabyte of GPFS storage and 240 TB of WOS storage, plus servers, support, and other small items of equipment, for around £740,000 in 2012; and an expansion of 2.88 PB of WOS for a little under £600,000 in February 2014. 1.2 PB of this was later converted to GPFS.
The service itself is currently offered free of charge to UCL researchers, although those with particularly large requirements (>10 TB) are asked to contribute to costs if they are able. As the service scales up, this model is unlikely to remain viable, so a new pricing model is currently under development to ensure long-term sustainability.
The new pricing model will almost certainly allow a storage allocation up to a certain point free of charge, with charges applying for quantities beyond this as yet unset level. This should enable small and unfunded projects to continue using central storage, with all its benefits to both researchers and the institution in terms of being able to manage data over the long term. More data-intensive projects, on the other hand, will be expected to include their required data storage capacity in their grant applications, passing their exceptional costs on to the research funder.
Although demand for the service is anticipated to continue to grow exponentially, the costs are expected to be offset in part by the falling price of storage. We are seeking to move to a purchasing strategy of buying storage on more of a just-in-time model in future, as there is little sense in owning constantly depreciating capacity that stands idle. It is possible that some sort of cloud capacity will be used as well, but it is recognised that the costs of cloud storage add an unpredictable and potentially expensive component to the service model.

FUTURE REQUIREMENTS
At present, the RDS Service is a push-in / pull-out service. However, many of our users want to be able to use their allocated storage space as though it were available as a mounted drive. This prospect is challenging given the large file sizes the service needs to cater for, but various technologies are being assessed for suitability.
Other improvements and functionality that users have requested include:
a. File versioning;
b. Dropbox-like sync and share functionality;
c. The ability to add non-UCL collaborators to projects (which is currently possible, but only by adding the collaborator as an honorary member of UCL, which is a bureaucratic process).
As of December 2016, the RDS is engaged in a major project to expand capacity and better address user requirements.

LESSONS LEARNED
Some things to consider when setting up a storage service for active research data:
• Ensure that your choice of underlying storage technology is mature and reliable: this is a situation where being an early adopter is not necessarily a good strategy;
• Have a clear policy as to what the service can and cannot offer;
• Ensure a daily back-up is in place;
• Run induction sessions to understand new users and their requirements;
• Communicate clearly the benefits of institutional storage over personal storage;
• Recommend a single graphical interface for less technical users, plus programmatic access for the more technically adept;
• Invest time in developing a clear reporting system that is independent from the underlying infrastructure;
• Understand how your institution's identity and group management systems work;
• Have a plan B for if something goes catastrophically wrong!

INTRODUCTION
In the 20th century, at the time when the State was the main driving force for the management of information and documentation resources and services, the creation of intermediary information systems in Brazil became a matter of strategic importance. As a result, it was necessary to construct a scientific and technical infrastructure, as well as to train qualified personnel in managing the production of, access to and preservation of information in science and technology. Brazil's government conferred upon the Brazilian Institute of Information in Science and Technology (IBICT) the responsibility of becoming the standard-bearer for core competencies in the treatment of, access to, and dissemination of information.
Cariniana is a distributed preservation network, funded by IBICT, committed to national and international cooperation, promoting the management and dissemination of digital preservation practices and developing a sustainable digital preservation programme to support the needs and requirements of Brazilian universities and research centres. In 2012 IBICT recognised the need to address digital preservation issues, and it adopted the LOCKSS (Lots of Copies Keep Stuff Safe 2 ) approach as suitable for the needs of the Cariniana network. Its main focus is open access publications in Brazil. The network preserves journals and doctoral theses, and it is just starting to cover scientific data to be deposited in a research data repository. The experimental phase, using the open source LOCKSS software, ran for a year in 2013, and was supervised by LOCKSS staff from Stanford University.
In 2015, the implementation of Cariniana's Dataverse 3 repository added significant new services to the digital preservation network, which will help specialised libraries' staff to deal with the demand from researchers for a trusted space for their datasets. IBICT is making available a research data repository that takes responsibility for long-term preservation and good archival practice, while researchers can share, keep control of, and receive recognition for their data. In addition, the repository supports the sharing of research data with persistent data citation and enables reproducible research.

WHAT MOTIVATED THE REPOSITORY OF SCIENTIFIC DATA AT THE CARINIANA NETWORK?
The Cariniana Network resulted from the need to create a digital preservation service of Brazilian electronic documents to ensure continuous access to these documents throughout time. The creation of the project for the preservation of research data was based on the idea that the more copies of a document that are stored in different places, the safer they will be. First, a centralized storage structure is used; then the content goes through distributed computer resources, with the participation of institutions that support electronic documents. Initially, the activities were carried out jointly with the University of Brasilia. In the first phase, the Dataverse network is being used for the addition and storage of research documents from individuals, institutional projects, and electronic journals. After that, the possibility of integration with the LOCKSS platform of the partner institutions will be used as the preservation repository of the stored material. Offering digital preservation services includes integrating the scientific data content of the connected institutions into a unified pattern; these mechanisms must facilitate the automation of processes of identification, storage, validation, and conversion of the content into new digital formats.
IBICT started a pilot project in 2015, and one of its objectives is to be a valuable contributor to the development of research data repositories in Brazil. The Cariniana Dataverse network is developing information products to promote the practice of digital curation at institutions with important collections in digital format. Coordinated by IBICT, the Dataverse repository is used by the Cariniana team to help network partners become proficient in the methods of insertion and storage of electronic documents in research data repositories. Dataverse itself is a large repository, open to data from all disciplines and hosted by the Institute for Quantitative Social Science at Harvard University. The Dataverse repository at IBICT provides a free-of-charge means to deposit, find, and access the specific datasets being archived by researchers from the participating organizations. It acts as a steward of digital content, is open for data deposits from the affiliated partner institutions, and shares content with all their researchers and librarians.
The Dataverse repository includes a relatively simple self-service ingest workflow for researchers; it also has the ability to share with trusted groups of researchers prior to publication, and it helps them fulfil Data Management Plan requirements. The Cariniana team was interested in Dataverse because it can be easily installed and maintained, and it can be brought online with a relatively small staff.
Nonetheless, the main reason Cariniana chose to make a Dataverse repository available to its partner institutions was the ability to integrate it with other systems; that is, LOCKSS and Archivematica 4 for distributed and local long-term preservation, OJS 5 for data publication, and DSpace 6 for interoperability.
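Dataverse's self-service ingest is backed by a native HTTP API, so partner institutions can also script deposits. The sketch below is illustrative only: the field names follow Dataverse's standard 'citation' metadata block, but the server URL, dataverse alias, and API token mentioned in the comments are placeholders, and the exact fields accepted should be checked against the target installation.

```python
import json

def dataset_payload(title, author_name, contact_email, description, subject):
    """Build the JSON body expected by a Dataverse native-API dataset deposit.

    Field names follow the standard 'citation' metadata block; verify them
    against your own installation before use.
    """
    def field(name, cls, value, multiple=False):
        return {"typeName": name, "typeClass": cls,
                "multiple": multiple, "value": value}

    fields = [
        field("title", "primitive", title),
        field("author", "compound",
              [{"authorName": field("authorName", "primitive", author_name)}],
              multiple=True),
        field("datasetContact", "compound",
              [{"datasetContactEmail":
                field("datasetContactEmail", "primitive", contact_email)}],
              multiple=True),
        field("dsDescription", "compound",
              [{"dsDescriptionValue":
                field("dsDescriptionValue", "primitive", description)}],
              multiple=True),
        field("subject", "controlledVocabulary", [subject], multiple=True),
    ]
    return {"datasetVersion":
            {"metadataBlocks":
             {"citation": {"displayName": "Citation Metadata",
                           "fields": fields}}}}

# The deposit itself would be an authenticated POST (hypothetical values):
#   POST {server}/api/dataverses/{alias}/datasets
#   with header X-Dataverse-key: {api-token}
payload = dataset_payload("Survey results", "Silva, Ana", "ana@example.org",
                          "Anonymized survey responses.", "Social Sciences")
print(json.dumps(payload, indent=2)[:80])
```

The payload structure, rather than the HTTP call, is the part most worth getting right: compound fields (author, contact, description) wrap their sub-fields in lists of dictionaries, which is easy to mistype by hand.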

WHAT SCIENTIFIC DATA IS ARCHIVED?
Thanks to the technical cooperation agreements established for OJS journals, the Dataverse repository has allowed initial collaboration and support on the implementation of a scientific data preservation service. The target of the service is content from institutional and individual projects. The IBICT Dataverse service provides an individual deposit space for archives or datasets, whether for a single researcher, a community of researchers, or an institution. All the data being archived is automatically identified, linked, and supported with access mechanisms.
The datasets in Dataverse are treated as a structured archive with standardized metadata to maximize their compatibility and retrieval. This helps researchers meet the requirements of funding institutions for verification of research project data. All the metadata records are made available for research, and the service allows uploading of datasets identified by the author or institutional owner. Information from the dataverses can be used by local libraries in giving users better-informed answers to their queries. Furthermore, Cariniana is collaborating with its partner institutions' libraries in their management decisions regarding researchers' data that needs to go through digital curation and archiving processes. Currently, the repository is being used by twenty-nine Cariniana collaborators, from six institutions, who have deposited data as members of the network. Researchers may store any type of digital data on any subject specialism. The Dataverse repository hosts 89 studies uploaded to 23 dataverses from institutional and individual research projects.

WHAT COMES NEXT?
The repository requirements and policies on access, privacy, and reuse need to be well defined. Cariniana staff are establishing a curator service that will help organize data for preservation. A partner institution is planning to fund the translation of the system's user interface into Portuguese, and the Cariniana team has produced a user manual. The migration of Dataverse to version 4.0 will be accomplished this year. This procedure will be combined with an institutional strategy at IBICT for secure backups and the assignment of a persistent URL.
IBICT's International Technical and Scientific Support Committee is establishing guidelines and recommendations for the repository. The Cariniana network is now discussing and planning further collaborative action and an integration of efforts. Its members consider Dataverse a trusted repository that allows long-term persistent access to scientific data, with mechanisms that incorporate full identification and the prevention of digital obsolescence.
IBICT considers it fundamental to establish efficient management of data for the development of scientific research, with quality in all aspects related to organization, documentation, archiving, and sharing of scientific information. As the project evolves, new challenges arise, revealing new approaches to the scientific information digital workflow. In Brazil there are still relatively few research data repositories, but the impact of the contributions from the field of Library and Information Science is growing.

THE EUROPEAN OPEN SCIENCE CLOUD
In 2016, the European Commission's High Level Expert Group (HLEG) on the European Open Science Cloud published their Report. 1 I was privileged to be a member of this HLEG. The Report is designed to establish a vision for the future of Research Data, particularly Open Data, in Europe. The main findings were: 2
• The majority of the challenges to reach a functional EOSC are social rather than technical.
• The major technical challenge is the complexity of the data and analytics procedures across disciplines rather than the size of the data per se.
• There is an alarming shortage of data experts both globally and in the European Union.
• This is partly based on an archaic reward and funding system for science and innovation, sustaining the article culture and preventing effective data publishing and re-use.
• The lack of core intermediary expertise has created a chasm between e-infrastructure providers and scientific domain specialists.
• Despite the success of the European Strategy Forum on Research Infrastructures (ESFRI), fragmentation across domains still produces repetitive and isolated solutions.
• The short and dispersed funding cycles of core research and e-infrastructures are not fit for the purpose of regulating and making effective use of global scientific data.
• Ever larger distributed data sets are increasingly immobile (e.g. for sheer size and privacy reasons) and centralised HPC alone is insufficient to support critically federated and distributed meta-analysis and learning.
• Notwithstanding the challenges, the components needed to create a first generation EOSC are largely there, but they are lost in fragmentation and spread over 28 Member States and across different communities.
• There is no dedicated and mandated effort or instrument to coordinate EOSC-type activities across Member States.

CHALLENGES: FUNDING AND FAIR RESEARCH DATA
Funding is an obvious challenge. The Report emphasises that the need is not so much to build new infrastructures, but rather to make what Europe already has less silo-based and more interoperable: 'Based on the consensus that most foundational building blocks of the Internet of FAIR data and Services are operational somewhere, but that they operate in silos per domain, geographical region and funding scheme, we recommend that early and strong action is taken to federate these gems. Optimal engagement is required of the e-infrastructure communities, the ESFRI communities and other disciplinary groups and institutes.' Whilst FAIR research data is essential to underpin the EOSC as an Internet commons of data available for sharing and re-use, it is clear that the academic community has some way to go to see Open Data as the default position for the research data it is creating and using, as the UCL survey shows.

CONCLUSION
The publication of the EOSC Report represents a watershed in the vision for the creation of a European, and in the long term a global, commons of FAIR research data. Johannes Gutenberg's invention of moveable type printing in the West in the mid-fifteenth century revolutionised the way ideas were recorded and disseminated. The Protestant Reformation, and the Counter Reformation, would not have been possible without the aid of the printing press. FAIR, Open Data and developments such as the European Open Science Cloud have the potential to have a similar impact in the 21st century. Studies such as the one undertaken by UCL, however, underline the challenges that need to be overcome to deliver the EOSC vision.
Researchers are engaged on a journey, and it is the mission of the LEARN Toolkit of Good Practice to help them arrive at their chosen destination.

RDM Roadmap
The gap analysis activity, led by a new, academic-led RDM Steering Group, resulted in an RDM Roadmap, designed for high level planning that would become a living document as goals were met and new ones added. Eventually the Roadmap covered the time period January 2012 through July 2016, 4 and covered four categories of service:
• RDM planning: support and services for planning activities typically performed before research data is collected / created;
• Active data infrastructure: facilities to store data actively used in current research activities and to provide access to that storage, and tools to assist in working with the data;
• Data stewardship: tools and services to aid in the description, deposit, and continuity of access to completed research data outputs;
• Data management support: awareness raising and advocacy, data management guidance and training.

Business case to fund the Roadmap
Capital and recurrent funds were secured from the university to cover the human and physical infrastructure needed to support the services. As stated in the RDM Roadmap, the business case submitted to the University IT Committee in June 2012 estimated a cost of £1M one-off, and £250K recurrent to implement the RDM Policy. In some cases, services already existed and just needed to be brought under the governance of the RDM steering group, such as Edinburgh DataShare, first set up as a demonstrator for managing research data in institutional repositories (DISC-UK DataShare project, 2007-09 5 ), but becoming through completion of Roadmap goals the University's institutional research data repository. 6 The recurrent funds enabled some new RDM-specific posts to be created, as an adjunct to existing support roles across Information Services.

Efficiencies
The University of Edinburgh provides a consolidated underlying infrastructure for a large number of services that require data storage and data management. This has two main advantages. First, it increases usability for the end user: all their data is accessible in 'one place' (although through numerous services), more closely corresponding to their mental model. Second, it provides efficiencies of scale in operating the services, by avoiding fragmentation and duplication of infrastructures.

Scale
The infrastructure at Edinburgh is composed of a primary base layer of a fast parallel storage file system (using GPFS), approximately 9 Petabytes in total. This allows large numbers of concurrent accesses without degrading the performance. This is then presented to a range of desktop services through a presentation layer of servers that export the correct protocols for Windows, Mac, Linux, etc. In addition, the infrastructure also serves the University compute cluster for large scale analysis tasks.

Infrastructure staffing
In order to provide this underlying converged infrastructure, a small operations team runs the infrastructure, and financial support for these posts is shared across specific services (and also specific research activities and projects). Two posts support the RDM services directly on DataStore: one Senior Systems Engineer and one Junior Systems Engineer. These provide office-hours support for the service and approximately 99% service availability.

DataStore
The large scale shared high performance storage infrastructure at the University of Edinburgh was initiated in 2005 under the umbrella of the ECDF (Edinburgh Compute and Data Facility). The DataStore service has grown from that starting point to provide the main storage, back-up and disaster recovery infrastructure for research data, group data and personal data. Storage on DataStore is currently charged internally to the University at £175/TeraByte/year.
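At that internal rate, projected charges scale linearly with volume and duration, which makes grant budgeting straightforward. A quick sketch, where the rate is the figure quoted above and the example volume and grant length are hypothetical:

```python
DATASTORE_RATE = 175  # GBP per terabyte per year, the internal charge quoted above

def storage_cost_gbp(terabytes, years=1):
    """Projected DataStore charge for a given volume held for a given duration."""
    return terabytes * years * DATASTORE_RATE

# Hypothetical example: a project holding 10 TB for a three-year grant.
print(storage_cost_gbp(10, years=3))  # → 5250
```

A figure of this kind is what would typically be entered as a cost-recovery line item in a research grant proposal, a practice discussed below under the service's funding model.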
FROM RDM PROGRAMME TO RESEARCH DATA SERVICE

Transition and new website
The transition from RDM Programme to Research Data Service has been completed and the final Roadmap has been signed off by the steering group, with acknowledgement of some minor missed targets that will be rolled into ongoing service improvements. The new service website 7 reflects all of the service components including DataStore and DataShare and is organised according to a vision which takes into account the full user experience of using the service in the context of doing their research, and in becoming a one-stop shop for any research data-related needs:
• User-friendly navigation and headings (instead of brand names, for example 'Active Data Storage' under the general category, 'Working with Data' instead of 'DataStore');
• Tools and support categorised according to a simplified data lifecycle corresponding to before, during, and after a research project;
• Generic and customised training and support available on demand.

Service team
It is difficult to draw an exact line around the RDM team because of the necessary and pre-existing contributions from staff across Information Services, namely: Library & University Collections, IT Infrastructure, EDINA and Data Library, Digital Curation Centre, and User Services Division. In service management framework language, the Research Data Service requires the following roles to be filled: Business Owner (representing the customer, currently filled by the Chair of the Steering Group), Service Owner, Service Operations Manager, and Virtual Team; the latter three are all staff members of Information Services.

Funded RDM posts
The Virtual Team is large, and includes IS staff who contribute to the service in any substantial way, and who were already providing data services of some sort before the RDM Programme began. However, the posts specifically funded by the service itself are as follows:

Staff Budget
Median staff costs are 42,000 GBP for senior staff and 34,000 GBP for junior staff. The RDM budget, including a small amount for operational costs (events, printing, minimal travel expenses), is 350,000 GBP for 2016-17, although due to normal staff changes and turnover this may vary somewhat from projected expenditure. The funding model builds on the original recurrent university funds and employs cost recovery where practicable, especially via line items in research grant proposals. Hardware costs are considered capital spend.

Ongoing work
Current project activity will lead to additional service components being incorporated: Data Vault and Data Safe Haven. (Data Vault is currently being offered by appointment only, but the aim is to move to a self-service workflow as soon as possible.) Data Safe Haven, an active data infrastructure for sensitive data, is due to be rolled out in August 2017. Models to sustain their operations are being developed as part of the project activity.

RESEARCH AND RESEARCH-BASED EDUCATION
Research, both blue sky and applied, is fundamental to the mission of research-intensive universities. As such, it is enunciated in the Mission Statements of such institutions. The University of Barcelona, for example, a research-intensive university in the Catalan region, states that 'The University of Barcelona is a public institution committed to the environment, whose mission is to provide a quality public service of higher education primarily through the study, teaching, research and effective management of knowledge transfer.' 1 This is its Mission. Research also features prominently in its Vision: 'Barcelona University must be a university that offers comprehensive training, ongoing and critical evaluation at the highest level, and research which is both advanced and efficient.' 2 The importance of research in a university is also captured in the Mission Statements of university organisations. LERU, the League of European Research Universities, advocates: 3
• education through an awareness of the frontiers of human understanding;
• the creation of new knowledge through basic research, which is the ultimate source of innovation in society;
• and the promotion of research across a broad front in partnership with industry and society at large.
Learning through research and enquiry is a fundamental feature of study in a research-intensive university. Universities, with a strong tradition of producing world-class research, wish to demonstrate that excellence not only in their research outputs but also in the learning experience of their students, both undergraduate and postgraduate.
University College London (UCL), a research-intensive university in the UK, has developed a model for research-based education via its Connected Curriculum initiative, 4 which is made up of six inter-connected strands of activity. Research data can also be seen as a learning object. In a digital environment, research outputs cannot be restricted to traditional written works such as journal articles or monographs. Nowadays, research outputs consist of a mixture of objects, amongst which can be found written works and data. One of the building blocks for these publications is research data. Via digital networks, it is possible to share both publications and the underlying data with anyone who can access them. The emergence of research data as a major source of information is now becoming apparent. To take advantage of this revolution researchers, especially early career researchers, need to be trained in best practice in research data management. This Case Study offers one example of how this can be done.

EARLY CAREER RESEARCHERS
In 2001, a US study sponsored by the Pew Charitable Trusts found that: 5 Students in 11 arts and sciences disciplines from 27 institutions and 1 cross-institutional program […] were surveyed. Responses were received from 4,114 students, a response rate of 42.3%. Results suggest that the training doctoral students receive is not what they want, nor does it prepare them for the jobs they take. Many students do not understand what doctoral study entails, how the process works, and how to navigate it effectively. There is a mismatch among the purpose of doctoral education, the aspirations of the students, and the realities of their careers within and outside academia.
In 2017, the situation is better. The UCL Doctoral School, in its Code of Practice, stresses: 6 UCL offers a programme for the development of generic research and personal transferable skills to help you develop the skills necessary not only for successful completion of your degree but also to equip you for later life and for the workplace … The specific menu of courses and other training opportunities should be discussed between you and your Supervisors using the skills self assessment section of UCL's Research Student Log. The self-assessment process is based on a national framework, the Researcher Development Framework.
It follows that the need for skills development has been identified and courses/materials put in place. One of those training needs concerns research data management.

THE LERU DOCTORAL SUMMER SCHOOL AS A MODEL OF BEST PRACTICE
LERU itself has produced a LERU Roadmap for Research Data. 7 Chapter 6 looks at Roles, Responsibilities and Skills, and identifies a need for training for early career researchers, for academics and for support staff. The training needs and routes for skills development are clearly identified in Figure 18.1 below. The separate categories are not mutually exclusive: all stakeholders (postgraduate/PhD students, Senior Researchers, Librarians and Data Scientists) need to work together to share knowledge and Best Practice. Nonetheless, the categorisation in Figure 18.1 does attempt to codify the learning needs of each stakeholder group and how these needs can realistically be met. It accepts that there is a graduated series of learning needs, starting with postgraduate/PhD students, which increase in complexity as early career researchers become Senior Researchers. In this model, Librarians have a new role to play in the research space. They need to acquire new skills and to impart that knowledge to the groups that they train. This partnership is crucial in embedding RDM skills into the research landscape. Finally, there is the emerging new career of Data Scientist, which is discussed more fully in the section below on the European Open Science Cloud. Having identified the training needs, how can those needs be met? The LERU Roadmap suggests that, for most categories of user, what is required are credited models and/or professional courses. LERU universities have taken this to the next stage by devising a format for a formal Summer School to train PhD students new to research and to RDM. The first meeting was held in Leiden in Summer 2016. 9 This is a taster for future activity, which is currently being discussed in the LERU network.
The Programme for the Summer School 10 had as its ambition the creation of the 'new generation of data scientists'. Each of the 21 LERU member universities 11 was invited to send one or more members of their doctoral programme to attend the week, the intention being that, having received training in Leiden, they would take that knowledge back to their home institutions. The format of the Programme was a mixture of keynote speakers on specific topics, speakers to lead in particular thematic areas, and student presentations/discussions. The Summer School highlighted a number of issues, which are likely to form the core of RDM training activity going forward. Some of the more prominent are listed here:
• The importance of research data being FAIR (Findable, Accessible, Interoperable and Reusable) 13
• The importance of data management plans in providing a framework for the creation, storage, and sharing of research data 14
• Licensing issues, and an explanation of the meaning of the Creative Commons suite of licences and its use for research data 15
• Big Science is Open Science 16
• The future infrastructure for Open Science 17

TOP-LEVEL ISSUES CONCERNING RESEARCH DATA FOR THE LERU SUMMER SCHOOL
FAIR data is one of the building blocks of the new information age. If research data is findable, accessible, interoperable and reusable, it increases in value as a tool for supporting innovation and new discoveries. Effective licensing of research data, when needed, increases their usefulness and makes it clear what the terms of re-use are. One of the drawbacks of the early development of Open Access is that many of the published research outputs tagged as Open Access outputs have no accompanying licence. This makes it difficult to understand exactly what the rules for reuse are in every case. Moreover, the lack of a licence has to be interpreted as all rights reserved in accordance with copyright law. Can an Open Access publication, with no accompanying licence, be re-used for commercial advantage?
Not all research data is big data. Many collections of data form part of a long tail of data creation, where research data has been created/collected to support the publication of a particular article, or a lecture to taught-course students. The term 'big data' is sometimes overused and brings with it legal issues such as privacy into the discussion. Nonetheless, the best future for research data, whether big or small, is that it is open where that is legally possible. Finally, to deliver and perform Open Science, infrastructure is needed -not simply technical platforms but also training and skills development programmes to create the 'new generation of data scientists'. All this and more was discussed in a focussed and intensive week in the LERU Doctoral Summer School.
A particularly important part of the model for the Summer School was the balance between formal presentations and the opportunity for students themselves to present case studies using their own research data, and to interact with speakers. 18 Participants' enthusiasm was evident from the recorded tweets. Others felt that the Summer School was a valuable mirror to reflect how science is done in the twenty-first century. The participants expressed real enjoyment at being able to participate in the event. In fact, they wished they had had more time to discuss the new information that they were learning each day. With feedback like this, the objective of the Summer School to provide solid training in data stewardship for the next generation of future leaders does not seem to have been unrealistic.

THE FUTURE PATTERN OF SKILLS DEVELOPMENT: THE EUROPEAN OPEN SCIENCE CLOUD?
In July 2016, the European Commission published the Report of its High Level Expert Group on the European Open Science Cloud: 19 … The European Open Science Cloud (EOSC) aims to accelerate and support the current transition to more effective Open Science and Open Innovation in the Digital Single Market. It should enable trusted access to services, systems and the re-use of shared scientific data across disciplinary, social and geographical borders. The term cloud is understood by the High Level Expert Group (HLEG) as a metaphor to help convey both seamlessness and the idea of a commons based on scientific data. This report approaches the EOSC as a federated environment for scientific data sharing and re-use, based on existing and emerging elements in the Member States, with lightweight international guidance and governance and a large degree of freedom regarding practical implementation. The EOSC is indeed a European infrastructure, but it should be globally interoperable and accessible. It includes the required human expertise, resources, standards, best practices as well as the underpinning technical infrastructures. An important aspect of the EOSC is systematic and professional data management and long-term stewardship of scientific data assets and services in Europe and globally. However, data stewardship is not a goal in itself and the final realm of the EOSC is the frontier of science and innovation in Europe. Important in this summary of activity, for present purposes, is the recognition of the importance of skills development. The LERU Roadmap itself identified the category of Data Scientist as the summation of skills development in terms of research stakeholders. This identification is further developed in the EOSC Report by emphasising the absolute importance of developing the role of data stewards to deliver the vision of a global commons of scientific data.
The Report suggests: A first cohort of core data experts should be trained immediately to translate the needs for data driven science into technical specifications to be discussed with hard-core data scientists and engineers. This new class of core data experts will also help translate back to the hardcore scientists the technical opportunities and limitations. 20 Elsewhere, the Report puts figures to the training requirement: The number of people with these skills needed to effectively operate the EOSC is, we estimate, likely exceeding half a million within a decade. As we further argue below, we believe that the implementation of the EOSC needs to include instruments to help train, retain and recognise this expertise, in order to support the 1.7 million scientists and over 70 million people working in innovation. The success of the EOSC depends upon it. 21 These are significant numbers. It will take a significant investment in European teaching infrastructures to develop the curricula, agree success criteria for measuring successful delivery and finance this huge training undertaking. Commissioner Moedas (Research, Science & Innovation), however, has highlighted the need for skills development and has said, 'Such recommendations deserve detailed consideration by the scientific community and other stakeholders.' 22 Research performing organisations need to start somewhere, as the LERU Roadmap makes clear. In this context the model of the LERU Doctoral Summer School seems a measured, successful and immediate response that such bodies need to make to manage the training needs implicit in the data deluge.

CONCLUSION
The purpose of this Case Study has been to look at the role of research performing organisations in skills development for early career researchers, and to set those needs in the context of the growing importance of research data and the emerging role of the data steward.

INTRODUCTION
University College London (UCL) ranks among the top twenty universities in the world and is one of the most successful British research institutions at attracting funding. Almost all academic disciplines are represented in its 380 research departments, units, institutes and centres. 1 UCL is home to 12,000 research staff and research students. 2
UCL Library Services run eighteen libraries which support UCL's teaching and research activities, including one in the award-winning School of Slavonic and East European Studies building and several that provide services to both UCL and the National Health Service. The combined staff in UCL Library Services totals 263 FTE (full-time equivalents). Amongst this number are around 30 subject liaison and site librarians who have responsibility for supporting the research and teaching of the institution. These librarians are the primary points of contact for academics, researchers, UCL staff and students. They provide subject-specific support and advice on resources and collections, offer training to staff and students, and promote and provide training on the various teaching and research support services that the Library offers, including open access services.
Two Research Data Support Officers work as part of the same team as well as in close collaboration with the UCL Information System Division (IT Services) and several other central services. These officers coordinate Research Data Management (RDM) advocacy and support across the institution. To ensure the long-term sustainability and scalability of the RDM support service, as well as sufficient subject discipline support, the RDM team aims to foster several support networks of subject-specific experts across the university. The subject liaison and site librarians form one of these networks.

INTRODUCING LIBRARIANS TO RDM THROUGH WORKSHOPS
The first UCL Research Data Policy (launched in August 2013) was accompanied by introductory presentations on RDM and related service developments for Library Services' staff. A programme of three day-long workshops was subsequently planned to inform and train library staff about current issues in Research Data Management, from key definitions up to the review of Data Management Plans. These workshops took place in 2015; they gathered between 30 and 35 participants each. The sessions were designed and delivered by Data Management experts from the Information School of the University of Sheffield. The outline for each workshop was as follows:
Session 1: This workshop provided an introduction to research data and its management in the context of UCL and the Library's role. Topics covered included the nature of data and research data services.
Session 2: This workshop started with presentations on what participants had learnt from their conversations with researchers. This was followed by a discussion about the survey approach to gathering further information about data management in the university. Librarians looked at identifying key choices in planning training and at the issues around selecting, describing and citing data. At the end of the day, participants were tasked with group exercises to prepare for the last workshop.
Session 3: In this workshop, participants heard from each group what their ideas and plans were for addressing the various aspects of RDM support identified in the first two sessions. Librarians presented group reports on practical RDM, requirements gathering (and in particular the Data Asset Framework method 3 ), data sharing, data sources and RDM websites.
The first two sessions comprised presentations by the workshop leaders introducing a new area, approach or issue coupled with group activities which led to the final project of creating a plan on how to respond to key aspects of RDM. This was done in small groups working together outside the workshops to prepare their response and also a brief presentation to be delivered at the final workshop.
Feedback from the workshops was positive, and librarians welcomed the thoroughness of the programme, which covered all the key aspects of RDM. They also welcomed the opportunity to work collaboratively with colleagues from across Library Services. The only negative feedback received was that the workshops had been spread over a six-month period, with two- to three-month intervals between sessions.
In 2016 a fourth workshop focused on central RDM services run by UCL Library Services and the UCL Information Systems Division jointly. The 30 participants discussed the roles and interaction of these different services, and how these are explained to researchers and research students across all disciplines. This event was also planned as a networking opportunity which gathered together for the first time librarians, Research IT staff and departmental data managers.
Feedback received from this event showed that it helped participants to meet colleagues across the university and to understand how the different services join up. Theoretical presentations were considered less useful, and several participants suggested that small group activities putting into practice what was said in the presentations would enhance their learning; examples include creating flow-charts to explain data storage processes within the university and drafting discipline-specific guidance for researchers and students.

INVOLVING LIBRARIANS IN RDM THROUGH PARTICIPATION IN A WORKING GROUP
In 2015, the Group concentrated first on building the new RDM website, and second on designing and promoting a cross-university RDM survey. For the first activity, all Group members were trained to improve their knowledge of the Library Content Management System and to write for the web. They worked in pairs to draft, edit and publish online the webpages that they had chosen to work on. The website 4 was completed in two months and launched at the start of the academic year 2015/2016. It featured nine how-to guides, a section about the university's and research funders' policies on research data, key definitions about RDM, a searchable list of Frequently Asked Questions, and a selection of resources and tools for learning more about RDM. The website has been regularly updated since its launch and new resources have since been added. The survey was designed, tested and promoted by five members of the RDM Working Group between summer 2015 and winter 2015-2016. The exercise was primarily aimed at finding information about awareness, practices and needs related to RDM across all faculties. Analysis of the results helped assess what RDM support researchers needed, and how this should be prioritised by UCL Library Services and other research support services across the university.
In summer 2016, the Group worked on creating discipline-specific resources to help researchers throughout their research projects; such resources include RDM guidance, metadata standards, data repositories and ethics guidelines. A second completed project was the design of a course template to introduce research students to RDM. The template consists of a series of presentation slides, a lesson plan and guidance for delivering the course. The course was tested in autumn and winter 2016 by three members of the Group with cohorts of Masters and PhD students. It serves as an essential basis for developing future courses on RDM at a more advanced level and aimed at further communities across the university.
RDM Working Group members cite their primary reason for volunteering to take part as being the opportunity to extend their knowledge of RDM both to fulfil personal interest, but also to provide extended research support to the departments with which they work. In the case of subject liaison librarians, a greater knowledge of RDM has been a means to establish new points of contact within the academic communities that they support.

CONCLUSIONS
RDM training will continue within UCL Library Services to ensure that subject liaison and site librarians' knowledge stays up to date. Current plans include a 'train the trainer' session to help them deliver introductory courses on RDM in research departments. A session on reviewing Data Management Plans (DMPs) is also being designed, as several librarians have expressed the need to be able to follow up on DMP enquiries once they have delivered the introductory course.

MOTIVATION AND BACKGROUND
Modern research requires new types of specialist capable of supporting all stages of the research data lifecycle, from data production and input to data processing, storage, and the publishing and dissemination of scientific results. These activities can jointly be defined as key components of the emerging Data Science profession.
To address this demand from research and industry, the Horizon 2020 Programme is funding the EDISON Project (Grant 675419, INFRASUPP-4-2015: CSA), 1 the goal of which is to build the Data Science profession for European research and industry. This includes the definition of Data Science and data handling-related professional profiles (or occupations), corresponding core competences and skills, the Data Science Body of Knowledge and a Model Curriculum that together comprise the EDISON Data Science Framework. This work is done with the involvement of the main stakeholders from the research community, industry, data preservation and handling community, universities and professional training organisations.
The University of Amsterdam is coordinator and a base organisation for the EDISON Project; other partners include the University of Stavanger (Norway), the University of Southampton (UK), Engineering (Italy), EGI.eu, FTK (Germany), and Inmark Europe (Spain). The project benefits from multiple Data Science-related initiatives and academic activities, and from effective cooperation between Computer Science and multi-disciplinary departments, the University Library and IT departments. It is also supported by external initiatives such as the Amsterdam Data Science Centre and the Amsterdam School of Data Science (ASDS). In turn, all project recommendations find their practical pilot implementation at the University of Amsterdam and in cooperating organisations. This includes four Data Science and Big Data programmes, Research Data Management (RDM) training (together with the University Library), training for researchers, programme and course catalogue services for universities and students, and advice for companies.

STAKEHOLDERS AND THEIR ROLE IN DATA SCIENCE EDUCATION
To create a foundation for the sustainable education and training of future Data Science professionals and Core Data Experts to support present and future data-driven research, the EDISON Project involves and cooperates with multiple stakeholders, relevant bodies and communities. These include, but are not limited to, the following:
• Academic departments, which host the Data Science programmes and tracks (see below for a description of the programmes). Course development, teaching and support are provided primarily by departmental staff, with some facility services maintained by ICT departments.
• The University Library, which is involved in two main activities: (i) it provides basic training for researchers and contributes to the more general academic education of students in RDM; (ii) it cooperates with the ICT department in developing and implementing university-wide RDM services, infrastructure and policy.
• The ICT department, which supports Data Science education by providing and maintaining HPC facilities and services, and which cooperates with the University Library in implementing RDM infrastructure and policy university-wide.

EDISON DATA SCIENCE FRAMEWORK
The EDISON Data Science Framework (EDSF) 2 is a core product of the EDISON Project that provides a basis for the definition of the whole ecosystem for education, training and professional development in core Data Science and Data Management-related competences and skills. An important component of the EDSF is the Data Science professional family, which provides a basis for defining customisable educational and training programmes for different target professional groups. Figure 20.1 below illustrates the main EDSF components:

• CF-DS -Data Science Competence Framework • DS-BoK -Data Science Body of Knowledge • MC-DS -Data Science Model Curriculum • DSP -Data Science Professional profiles and occupations taxonomy • Data Science Taxonomy and Scientific Disciplines Classification (including Vocabulary)
The proposed framework provides a basis for other components of the Data Science professional ecosystem:
• EDISON Online Education Environment (EOEE)
• Education and Training Marketplace and Directory
• Data Science Community Portal (CP), which also includes tools for individual competence benchmarking and personalised educational path building
• Certification Framework for core Data Science competences and professional profiles
The DSP profiles and the Data Science occupations taxonomy are defined based on, and as an extension to, the European Skills, Competences, Qualifications and Occupations (ESCO) framework. The DSP profiles definition provides an instrument to create effective organisational structures and corresponding roles to support the whole data management lifecycle. For example, in the area of professional data handling/management, the following taxonomy is proposed: Professional (data handling/management): Data Steward, Digital Data Curator, Digital Librarian, Data Archivist. DSP profiles can also be used for building individual career paths and for the corresponding transferability of competences and skills between organisations and sectors of the economy.
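As a reading aid, the "Professional (data handling/management)" branch of the taxonomy named above can be sketched as a simple lookup structure. This is an illustrative sketch only, not the official EDISON or ESCO data model; the profile names are taken from the text.

```python
# Illustrative sketch (not the official EDISON/ESCO data model): the
# "Professional (data handling/management)" branch of the DSP taxonomy
# represented as a mapping from occupation group to professional profiles.

DSP_TAXONOMY = {
    "Professional (data handling/management)": [
        "Data Steward",
        "Digital Data Curator",
        "Digital Librarian",
        "Data Archivist",
    ],
}

def profiles_in_group(group: str) -> list:
    """Return the professional profiles defined under a taxonomy group."""
    return DSP_TAXONOMY.get(group, [])

print(profiles_in_group("Professional (data handling/management)"))
```

In a fuller model, each profile would additionally carry its ESCO occupation code and associated competences, which is what makes career-path building and cross-sector transferability possible.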

Data Science Competence Framework (CF-DS)
The Data Science Competence Framework (CF-DS) 3 has been built on an extensive study of the demand and supply sides of the Data Science job market, of organisational structures and roles, and of existing practices and standards in the area of competence and skills management. Figure 20.2 below presents the identified competence groups.
Three competence groups were identified in the NIST document and confirmed by the analysis of collected data:
• Data Analytics, including statistical methods, Machine Learning and Business Analytics
• Engineering: software and infrastructure
• Subject/Scientific Domain competences and knowledge
Two newly identified competence groups are in high demand and are specific to Data Science: Data Management (curation and preservation) and Scientific/Research Methods. Knowledge of scientific research methods and techniques makes the Data Scientist profession different from all previous professions. For business-related professions, a similar role belongs to business process management in areas that need to be adapted to a new data-driven agile business model, in particular the adoption of continuous data-driven business process improvement.
Data management, curation and preservation are already included in existing (research) data-related professions such as data steward, data archivist, data manager, digital librarian, data curator, and others. Research data management is an important component of European Research Area policy. Companies also recognise the need for data management skills when they start using data-driven technologies.
The identified demand for general competences and knowledge of Data Management and Research Methods needs to be addressed in future Data Science education and training programmes, as well as being included in re-skilling training programmes. It is important to mention that knowledge of Research Methods does not mean that all Data Scientists must be talented scientists; however, they need to understand general research methods such as formulating a hypothesis, applying research methods, producing artefacts, and evaluating a hypothesis (the so-called four-step model). Research Methods training is already included in Masters programmes and programmes for graduate students.
The identified competence areas provide a basis for defining education and training programmes for Data Science-related jobs, re-skilling and professional certification.
Other commonly recognised skills are referred to as "soft skills" or "social/professional intelligence": interpersonal skills, teamwork, and the ability to cooperate. In many cases, an organisation expects the Data Scientist to provide a kind of literacy advice and guidance on related data analysis and management technologies.

Data Science Body of Knowledge (DS-BoK)
The DS-BoK contains Knowledge Area Groups (KAGs) defined after the CF-DS competence groups. Universities can use the DS-BoK as a reference to define the knowledge areas they need to cover in their programmes, depending on the primary demand groups in research or industry. Domain-specific knowledge can be acquired as part of academic education or as postgraduate professional training at the graduate's workplace. It is also commonly recognised that KAG6-DSDKX is essential for the practical work of a Data Scientist: Data Scientists need a sufficient understanding of specific subject domain-related concepts, models, organisation and corresponding data analysis methods in order to communicate effectively with domain specialists for data collection, insight and the presentation of results.

Data Science Model Curriculum (MC-DS)
The initial Data Science Model Curriculum provides two basic components for building customisable Data Science curricula: (1) the definition of learning outcomes (LOs) based on the CF-DS competences, including their differentiation for different proficiency levels, e.g. using Bloom's Taxonomy; (2) the definition of learning units (LUs) that map to the LOs for target professional groups, which need to be defined in accordance with existing academic discipline classifications such as the 2012 ACM Computing Classification System (CCS2012). 4
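The LO/LU relationship described above can be sketched as a small data model. This is a hypothetical illustration: the outcome codes, competence names and Bloom levels below are invented for the example, and the real definitions live in the EDISON documents.

```python
# Hypothetical sketch of the MC-DS idea: learning outcomes (LOs) derived
# from CF-DS competences carry a Bloom's-taxonomy proficiency level, and
# learning units (LUs) declare which LOs they cover. All names invented.

from dataclasses import dataclass, field

@dataclass
class LearningOutcome:
    code: str          # e.g. "LO-DA-01", derived from a CF-DS competence
    competence: str    # CF-DS competence group it comes from
    bloom_level: str   # e.g. "Remember", "Apply", "Evaluate"

@dataclass
class LearningUnit:
    title: str                     # would map to a classification such as CCS2012
    outcomes: list = field(default_factory=list)

lo1 = LearningOutcome("LO-DA-01", "Data Analytics", "Apply")
lo2 = LearningOutcome("LO-DM-01", "Data Management", "Understand")
lu = LearningUnit("Introduction to Machine Learning", [lo1])

def covers(unit: LearningUnit, competence: str) -> bool:
    """Check whether a learning unit addresses a given competence group."""
    return any(lo.competence == competence for lo in unit.outcomes)

print(covers(lu, "Data Analytics"))   # True
print(covers(lu, "Data Management"))  # False
```

A curriculum builder could then select, for a target professional group, the set of LUs whose outcomes jointly cover the required competences at the required proficiency levels.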

Data Science Professional Profiles Definition (DSPP)
The proposed Data Science Professional profiles (DSPP) 5 definition is based on the analysis of demand in research and industry for data-related professions, as well as on current company practices in defining new data-related organisational roles. The identified professional profiles are classified using the ESCO taxonomy. 6
The University of Amsterdam is starting four new Data Science programmes and tracks that are based in different departments and aimed at different industries and target groups, from Computer Science and Business Administration to multidisciplinary studies. They are primarily intended to meet the needs of the Dutch economy (i.e. industry, research and public services), which is to a large extent international. The programmes and tracks are developed by the departments independently, but all of them follow general EDISON recommendations.
i. Artificial Intelligence and Data Science (specialisation)
(http://gss.uva.nl/future-msc-students/information-sciencescontent26studyprogramme/profile-data-science.html)
Track - Master
At the core of Data Science are methods for the analysis of large volumes of data. Recently, much more data has become available in electronic form, and methods for the analysis and modelling of these data for prediction, classification and optimisation have become much more effective. Recent technical innovations, such as Deep Learning, provide increasingly powerful tools that make it possible to find complex patterns in very large datasets.
Much of the Master's Artificial Intelligence (AI) degree is about Data Science. The obligatory courses on Machine Learning address key technology and theory for modelling large amounts of data. The courses on Machine Learning, Natural Language Processing, Information Retrieval and Computational Intelligence all have a strong focus on data-driven methods. For the "AI courses" in the curriculum, students can choose advanced courses on these topics: Machine Learning 2, Computer Vision 2, Natural Language Processing 2, Information Retrieval 2, Deep Learning, Data Mining Techniques, Information Visualisation and Probabilistic Robotics. All these courses are about modelling data. These can be complemented by courses outside AI, for example on distributed computer systems, privacy and ethical questions, or on statistics.
Within programme: Artificial Intelligence
Organisation: UvA
Language: English
Duration: 5 months

ii. Big Data Engineering
(http://gss.uva.nl/future-msc-students/information-sciences/content28/computer-science.html)
Track - Master
In the Internet era, data is at the centre of the stage. We all continuously communicate via social networks, we expect all information to be accessible online continuously, and the world's economies thrive on data processing services where revenue is created by generating insights from raw data. These developments are enabled by a global data processing infrastructure, connecting everyone from small company computer clusters to data centres run by world-leading IT giants. In the Big Data Engineering track, you study the technology from which these infrastructures are built, allowing you to design and operate solutions for processing, analysing and managing large quantities of data. This track is part of the joint Masters in Computer Science, in which renowned researchers from both the Vrije Universiteit Amsterdam (VU) and the University of Amsterdam (UvA) contribute their varied expertise in one of the strongest Computer Science programmes available in Europe.
Within programme: Computer Science
Organisation: UvA + VU
Language: English
Duration: 2 years

iii. MBA Business Analytics & Data Science
(http://abs.uva.nl/programmes/mba/content2/mba-big-data.html)
Track - Master MBA
This MBA in Big Data and Business Analytics is intended for hands-on Big Data specialists, for people in leadership roles working with Big Data, and for entrepreneurs. The curriculum of this MBA is highly multidisciplinary, with courses from A (analytics), B (business) and C (computer science), and with projects to practise and implement the integration of these three aspects.
Furthermore, the curriculum is a mix of state-of-the art theory taught by renowned academic professors, and it includes practical applications of this knowledge taught by people with extensive industry experience. In the curriculum, much time will be devoted to the '21st century skills' -the skills required to become successful in this age: entrepreneurship / entrepreneurial attitude, flexibility, teamwork, communication skills and ethics.
Key features:
• Two-year part-time programme (2 evenings per week)
• Balanced curriculum consisting of Business courses (e.g. strategy, finance, marketing, HRM), Analytics courses (e.g. statistics, econometrics, system optimisation) and Computer Science courses (e.g. machine learning, data visualisation)
• All lecturers combine theory with practical applications
• Silicon Valley study trip and Big Data Thesis Project as part of the programme
• Degree: Master of Business Administration (MBA) granted by the University of Amsterdam

Organisation: Amsterdam Business School, UvA
Language: English
Duration: 2 years

iv. Data Science
(http://gss.uva.nl/future-msc-students/information-sciences/content/data-science.html)
Track - Master
In the one-year Data Science Master's track, you will acquire knowledge of the theories and tools used in data science. We will teach you how to use these tools for working with data in different domains, such as Healthcare, Media and Communication, Smart City, Life Sciences and Digital Humanities. Graduates have an integrated view of the possibilities and development of data science in society. Students will benefit from the strong collaboration with Amsterdam Data Science (ADS), which brings together leading researchers across the entire life cycle of data science, from expertise in machine learning and information retrieval to human-computer interaction and large-scale data management.

RESEARCH DATA MANAGEMENT EDUCATION AND TRAINING
Research Data Management training is recognised as essential for practising researchers of all scientific domains and important for academic Data Science education. It is typically covered by training programmes for postgraduates, PhD students and researchers; however it is rarely covered by existing or planned academic programmes and courses. It has been identified that to cover the wide needs of the research and academic community, the RDM curriculum and training materials must allow easy customisation and localisation to adjust to the trainees' background and local infrastructure resources, as well as to cater for the needs of specific scientific domains.
The EDISON Project has addressed RDM training and education as a priority issue in order to contribute to raising standards in general competences and skills related to working with research data and with the variety of modern data including social (network) data, environmental data and business data. The EDSF provides a basis for defining a general RDM training program that covers the major practical aspects of RDM; this can be also considered as an important component of more general data literacy training.

The proposed customisable RDM training program
The following RDM training program has been constructed on the basis of an extensive study of existing RDM training programmes and resources, in particular those collected in the Data Management Clearinghouse 7 and the RDA US directory of RDM resources. 8 It covers most topics found in currently available RDM training programmes and curricula, has a modular structure, and can be expanded into more specific data management topics that may be required by specific groups of practitioners.
A Research Data Management training or education programme should contain the following essential modules (allowing extension and adaptation to particular target communities):

B. Data Management elements (organisational and individual)
• Goals and motivation for managing your data
• Data formats
• Creating documentation and metadata; metadata for discovery
• Using data portals and metadata registries
• Tracking data usage
• Backing up your data
• Data security and integrity
• Data Management Plan (DMP) (also part of the hands-on session(s))
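The "metadata for discovery" element above can be made concrete with a minimal completeness check on a dataset's metadata record. The field names below are generic examples loosely modelled on common Dublin Core elements, not a specific repository's schema, and the record itself is invented.

```python
# A minimal, illustrative discovery-metadata check for a dataset record.
# Field names are generic examples, not a particular repository's schema.

REQUIRED_FIELDS = {"title", "creator", "date", "identifier", "licence", "format"}

def missing_metadata(record: dict) -> set:
    """Return which required discovery fields are absent from a record."""
    return REQUIRED_FIELDS - record.keys()

record = {
    "title": "River temperature measurements 2015-2016",
    "creator": "J. Doe",
    "date": "2016-11-30",
    "identifier": "doi:10.0000/example",  # placeholder identifier
    "format": "text/csv",
}

print(missing_metadata(record))  # the record lacks a licence statement
```

A check of this kind is useful both in training (making the abstract checklist tangible) and in practice, where a repository would reject or flag records with missing discovery fields.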

C. Responsible Data Use Section (Citation, Copyright, Data Restrictions)
• Handling sensitive data
• Ethical issues, obtaining consents

The proposed RDM training program has been taught at the Data Science workshop held since May 2016 at the Amsterdam Business School, University of Amsterdam, 9 organised by the EU Erasmus+ Eduworks 10 Project. The program contained two major parts: general RDM topics, and Data Management Plan (DMP) design, presented as a hands-on exercise. The training materials were developed by the EDISON Project at UvA in cooperation with the University Library and are available under a CC BY licence. Further development is expected in the framework of the proposed RDA Working Group on RDM literacy.

REQUIRED RESOURCES
A successful Data Science education programme depends on the availability of three key components: (1) teaching staff, (2) computing and lab facilities, and (3) a pool of experts/advisers and related topics for course and thesis projects. All three components create challenges and require advance planning. The following offerings are made available by the relevant departments:
1. Teaching staff: Core teaching staff are provided by the departments hosting the programme or track; associate teaching staff from industry provide specialised courses; local industry experts are invited to give selected lectures; leading domain researchers and experts are invited to give lectures, seminars and colloquia.
2. Computing and lab facilities: Computer classes are operated by departments and supported by ICT departments; high-performance computing facilities are provided by SURFsara, the Dutch research HPC facility; departments actively use research and educational grants from major cloud and Big Data providers such as Amazon Web Services, Microsoft Azure, and IBM Watson and BlueMix to give students the opportunity to learn about leading industry platforms and applications.
3. A pool of experts and project development topics: Departments maintain a network of external experts and collaborating research and technology organisations that advise on students' projects and host students' thesis projects.
A common problem and gap in developing consistent Data Science programmes is setting up a professional Data Management course that would cover both Research Data Management and industry data management and governance topics. The EDISON Project is cooperating with departments to develop core Data Management courses including Research Data Management courses and training for students and researchers.

COORDINATION OF RELEVANT ACTIVITIES INSIDE UVA
For coordination purposes and for the exchange of experience, UvA has created the Data Science Interest Group and a corresponding mailing list, which has become an important forum for coordinating activities between departments, projects and collaborating organisations. An important role also belongs to the Amsterdam Data Science Centre (ADS), 11 a joint initiative of more than ten companies and institutions in the Amsterdam area; the recently established Amsterdam School of Data Science (ASDS) 12 also has an important role to play.

CONCLUSIONS

INTRODUCTION
Quite often, when we talk about legal issues related to research data, we fall into discussions about privacy and personal data. This issue is fundamental when data are gathered from personal surveys or clinical trials, for instance. In these cases, researchers should follow the standard procedures established by their institutions through dedicated committees, for example an ethics or bioethics commission. In many of these cases, data cannot be shared openly: only some aggregated or anonymised data can be shared, following a strict procedure. 1
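The aggregation safeguard mentioned above can be sketched as follows. This is a minimal illustration, not a compliant anonymisation procedure: the minimum cell size of 5 is a common rule of thumb, not a legal standard, and real disclosure control involves much more than cell suppression.

```python
# Illustrative sketch of sharing only aggregated data: raw values are
# reduced to counts per category, and categories below a minimum cell
# size are suppressed so that small cells cannot identify individuals.

from collections import Counter

def safe_counts(values, min_cell_size=5):
    """Aggregate raw values into counts, suppressing small cells."""
    counts = Counter(values)
    return {k: v for k, v in counts.items() if v >= min_cell_size}

responses = ["A"] * 12 + ["B"] * 7 + ["C"] * 2  # raw survey answers
print(safe_counts(responses))  # {'A': 12, 'B': 7} -- 'C' is suppressed
```

In practice, such rules would form one step in the institution's strict sharing procedure, applied after ethical review and alongside proper de-identification of the underlying records.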
In this chapter, I would like to focus on the legal tools we have to make data open once we have overcome all the possible barriers to providing data gathered or created during research activities. For the purpose of this case study, I will use the term open as defined by the Open Definition: "Open data and content can be freely used, modified, and shared by anyone for any purpose". 2 First, I will look at how copyright deals with data, and afterwards I will review the different options we have to share data openly. It is important to know how researchers can share data because reusability is one of the FAIR principles that research data must fulfil. 3 As stated in the principles, data and metadata must be released with a clear and accessible data usage licence.

DATA AND COPYRIGHT
To analyse the different options for licensing data, we must first review which rights are involved. Data is a complex term in relation to copyright because, depending on the discipline, many formats of research output can be considered data: numbers, texts, or images, for instance. This variety of formats entails different treatment under copyright. It is clear that facts or dates cannot be copyrighted by anyone, and therefore they fall outside any protection. In those cases there is no need to use a licence, and the best practice is to state that data of this kind are in the public domain.
However, when data are texts or images, copyright has to be taken into account. Generally, where there is a degree of originality, exploitation rights appear, and a licence is needed to authorise wide reuse; otherwise the data should be considered as having all rights reserved. 4 Even where images lack originality, some legislation grants performers exploitation rights, shorter than those granted when images are considered works. 5 Current copyright laws do not require any procedure to obtain exploitation rights; therefore, in the absence of a copyright notice, the "all rights reserved" regime applies.
Moreover, data are not usually released individually, but rather as part of a compilation or a database. This way of presenting data can be protected by copyright in two different ways. Again, if the compilation or database has a degree of originality, it can be protected like any other creative work, as mentioned above in relation to data. The originality has to be found in the selection or arrangement of the data, and this protection is granted even if the compiled data are not themselves copyrightable. Furthermore, in the European Union and a few other countries, databases lacking originality in the selection or arrangement of data may have another layer of protection by means of the so-called sui generis (i.e. of its own kind) right.
5 For instance, in Spain the maker of a "mere photograph" has such a right for 25 years. More on the situation of non-original photographs: Thomas Margoni, "The digitisation of cultural heritage: originality, derived works and (non) original photographs", http://www.ivir.nl/publicaties/download/1507.pdf (last accessed 29/01/2017).
This right recognises the substantial investment made in compiling a database and grants its maker a period of protection of fifteen years. During this time, nobody can extract and reuse the whole content, or a substantial part, of the database without consent. Again, this protection is granted to any database, whether or not its content is itself protected by copyright. We must therefore take these different layers of protection into account in order to share data openly. In the next section I review some of the licences we can use.

LICENCES FOR DATA AND DATABASES
When we deal with licences for open content, the first set of legal texts that comes to mind is probably the one provided by Creative Commons (CC). 6 However, there are other options that fulfil the requirements to deal with all the possible layers of protection in a database.

Use of Creative Commons Licences for data and databases
Developed over almost 15 years, the suite of licences provided by Creative Commons (CC) offers a good solution for sharing any content that falls under the scope of copyright protection. Therefore, if we want to share data that may have some protection due to its originality or its format, we can consider using these licences, as we can if we want to share a database with originality in the selection or arrangement of its elements.
Currently, CC offers a standard set of six licences that provide for different degrees of reusability. Each of the six licences grants the right to reproduce, distribute and communicate to the public the licensed material for non-commercial purposes; depending on the licence, those exploitation rights may also be granted for commercial purposes. Four of the six licences also grant the transformation right, which permits the creation and dissemination of derived works. When the transformation right is granted, the licensor can require that derived works be disseminated under the same licence as the original work or an equivalent one. This requirement is inspired by the copyleft 7 clauses originally carried by free software licences.
It is important to note that CC also has a public domain mark 8 that can be used to identify public domain works. This tool has been used in some governmental material and in cultural and heritage institutions.
Until the current version 4.0, CC licences approached the sui generis database right in different ways. Initially, and due to its US copyright inspiration, there was no mention of this right because it is not recognised in US copyright law. In version 3.0, some of the ported versions developed by European CC affiliates introduced the issue into their local texts, mainly proposing to waive the sui generis right when licences were attached to databases.
6 For a detailed explanation of the types of licences, go to: https://creativecommons.org/share-your-work/licensing-types-examples/ (last accessed 29/01/2017).
7 An arrangement whereby software or artistic work may be used, modified and distributed freely, on condition that anything derived from it is bound by the same conditions.
In version 4.0, where in principle there will be no porting process other than translations, the sui generis database right has been included in a dedicated section of the legal code. The current version treats this right like any other exploitation right. This means that if a licence prohibits the work from being reused for a commercial purpose, the extraction and reuse of all or a substantial part of the elements in the database likewise cannot be for commercial exploitation.
Therefore, the four elements of the CC licences have the following implications when applied to the sui generis database right:
• Attribution: Any extraction and reuse of all or a substantial part of the elements from the licensed database requires proper acknowledgment of its creator and of any others designated to receive credit;
• Non-Commercial: Any extraction and reuse of all or a substantial part of the elements from the licensed database cannot be for a commercial purpose;
• No Derivatives: It is not allowed to build a new database with all the elements, or a substantial part of them, extracted from the original licensed database;
• Share Alike: It is allowed to build a new database with all the elements, or a substantial part of them, extracted from the original licensed database, but this new database has to be licensed under the same licence or an equivalent one.
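The four elements above can be read as a simple lookup from licence code to database-right implication. The following Python sketch is purely illustrative: the `ELEMENT_EFFECTS` table and the `database_reuse_conditions` helper are our own shorthand, not part of any CC legal code.

```python
# Illustrative mapping from CC licence elements to their effect on the
# extraction and reuse of a substantial part of a licensed database.
# The element abbreviations and wording are shorthand, not legal text.

ELEMENT_EFFECTS = {
    "BY": "extraction/reuse requires credit to the database maker",
    "NC": "extraction/reuse cannot serve a commercial purpose",
    "ND": "building a new database from a substantial part is not allowed",
    "SA": "a derived database must carry the same (or equivalent) licence",
}

def database_reuse_conditions(licence: str) -> list[str]:
    """Return the sui generis implications of a code such as 'CC BY-NC-SA'."""
    elements = licence.upper().replace("CC", "").strip().split("-")
    return [ELEMENT_EFFECTS[e] for e in elements if e in ELEMENT_EFFECTS]
```

For example, `database_reuse_conditions("CC BY-SA")` would list the attribution and share-alike conditions above, while a code with no recognised element yields no conditions.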

Licences created for data and databases
Before having the abovementioned sui generis right included in the six standard licences, CC created a legal tool aimed at scientific databases. This tool, called CC0, is both a waiver and a licence at the same time. 9 CC0 is sometimes seen as a pure public domain dedication, which raises concerns in those countries whose copyright law does not allow a work to be placed in the public domain before the protection term expires, or all copyright rights (especially moral rights) to be waived. In fact, CC0 is not a full waiver of rights. It works on two levels: first, the rights holder waives all rights over the work or content to the fullest extent permitted by law; 10 second, all the unwaivable rights are granted, to the fullest extent permitted by law, to the user, acting as a licence without any requirements. If some rights still cannot be waived or licensed under the applicable law, they remain with the corresponding rights holder.
Before the release of CC0, the Open Knowledge Foundation created the Open Data Commons project to provide legal solutions for open data. This initiative launched three licences designed for sharing data and databases openly: the Open Database Licence, the Attribution Licence, and the Public Domain Dedication and Licence. 11 The first is a pure copyleft licence allowing wide reuse, with the requirement to use the same licence for any derived database. The second only requires proper attribution; the third dedicates the data and database to the public domain. Finally, we can mention a couple of licences created to allow the reuse of public sector information: the Open Government Licence from the United Kingdom 12 and the French Open Licence/Licence Ouverte. 13 Both these licences grant a full reuse of the information attached to them, provided that the corresponding sources are acknowledged.
9 For a detailed explanation of CC0, go to Creative Commons: https://creativecommons.org/share-your-work/public-domain/cc0/ (last accessed 29/01/2017).
10 As is explained in the CC0 FAQ, no legal instrument can ever eliminate all copyright interests in a work in every jurisdiction.

CONCLUSION
Before starting to think about the most suitable licence to apply, it is important to check that the data can be legally released and that there are, for instance, no implications for privacy, security or confidentiality. It is important to use a licence that takes into account all the possible layers of protection applicable to data: authors' rights, neighbouring or related rights, and especially the sui generis database right. If we pursue wide reusability, we must avoid licences that restrict some uses, for instance commercial purposes or the creation of derived materials. Licences that only require acknowledgement of the source and of the creators of the data and/or databases fulfil the goal of providing complete reusability.

INTRODUCTION
This case study will describe the experience of the Centro Argentino de Información Científica y Tecnológica del Consejo Nacional de Investigaciones Científicas y Técnicas (CAICYT-CONICET) 1 in the research, development and implementation of a Research Data Management Plan for the Observatorio Nacional de la Degradación de Tierras y Desertificación (ONDTyD) 2 and for CONICET.

A RESEARCH DATA MANAGEMENT PLAN BY CAICYT-CONICET
Several international organisations related to the field of science and technology (national research agencies, funders, university consortia, etc.) have started to require that research project funding applications be accompanied by a Research Data Management Plan (DMP) prepared by the lead researcher and/or the group of researchers applying for funds.
On the one hand, the DMP helps researchers organise their research data; on the other, the information it contains supports diagnosis, characterisation and prediction, making it a valuable instrument for institutions managing Science and Technology. Furthermore, the DMP becomes a fundamental tool to assess and evaluate the potential impact (social, economic, cultural, etc.) implied in the development of research projects.
In Argentina, legislation and regulations exist that provide a framework for, and formalise the requirement of, Data Management Plans (DMPs).

WHAT IS A DATA MANAGEMENT PLAN (DMP)?
A research data management plan (DMP) is a document prepared by a researcher or a group of researchers, which defines:
• What data will be created and how;
• How data will be described, organised, stored and managed;
• Who will be responsible for each of these activities;
• How data will be shared, explaining any use restriction that could apply.
The data management plan (DMP) is a living document, which evolves until the end of the research and its subsequent publication. Usually, a DMP is required at the following points in time: (1) at the time of requesting funding, accompanying the research project proposal; (2) once the project has started; (3) halfway through the project; (4) at the end of the research project.

PROBLEMS WITH RESEARCH DATA
The National Observatory of Soil Degradation and Desertification (ONDTyD) is a national system for the evaluation and monitoring of soil degradation across different scales (national, regional and pilot sites), based on an integral, interdisciplinary and participatory approach. It is sustained by a network of science and technology, and political organisations that provide data and knowledge and, at the same time, are also users of that information. Interactive maps, publications and an online geospatial data repository are being developed for the visualisation of this information. The goal of ONDTyD is to identify the causes of desertification, to anticipate environmental risks and to collaborate in the restoration of affected ecosystems.
In the methodology developed, ONDTyD uses indicators of biophysical and socioeconomic vectors. However, the researchers had little awareness of the lifecycle of their data, of data management practices, or of documentation covering use, re-use, licences and long-term preservation. The result was multiple versions of data from various sources and a lack of standardisation.
ONDTyD invited CAICYT-CONICET to collaborate in the improvement of these areas of their ongoing research project, whose indicators have varying levels of progress in terms of data collection.

DEVELOPMENT OF THE RESEARCH DATA MANAGEMENT PLAN
The first task was to discover the level of awareness of the field of data management amongst the researchers and to identify the research practices, documentation generated and group workflows at ONDTyD. We established regular meetings with the group coordinators, with specific researchers, as well as other meetings of a more general nature with the whole group. These meetings allowed us to understand, determine and reach consensus among participants about research data lifecycles and workflows.
We continued with the identification, analysis and comparison of research data management plans required by the Digital Curation Centre (DCC, UK), Horizon 2020 (European Union), the National Science Foundation (NSF, USA) and the Australian Research Council (ARC, Australia), as specified in the Information Laboratory of CAICYT-CONICET's working paper "Analysis of Data Management Plans".
The following action was to develop a Research Data Management Plan for ONDTyD, incorporating a data dictionary which was also developed (the dictionary specifies what information is required and incorporates definitions and alternative answers to the questions of the DMP). Furthermore, a section on Best Practices was included, referring to: (a) Data formats, (b) Folders and files structure, (c) Version control, and (d) Metadata schemas.
The ONDTyD-DMP includes the sections: (a) Administrative data; (b) Data collection; (c) Documentation and metadata; (d) Storage and security copies; (e) Selection and preservation; and (f) Data re-use.
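The six sections of the ONDTyD-DMP lend themselves to a simple completeness check. In the minimal Python sketch below, the section names come from the case study, but the `missing_sections` helper and the draft example are hypothetical, not part of the actual data dictionary:

```python
# Sketch of the ONDTyD-DMP structure as a checkable skeleton.
# The six section names come from the case study; the helper and the
# sample draft are illustrative only.

DMP_SECTIONS = [
    "Administrative data",
    "Data collection",
    "Documentation and metadata",
    "Storage and security copies",
    "Selection and preservation",
    "Data re-use",
]

def missing_sections(dmp: dict) -> list[str]:
    """List required sections a draft DMP has not yet filled in."""
    return [s for s in DMP_SECTIONS if not dmp.get(s)]

# A draft with only one section completed:
draft = {"Administrative data": {"project": "ONDTyD"}}
```

A helper like this could back the support helpline described below, flagging incomplete plans before submission.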

PLATFORM FOR DMP MANAGEMENT, TRAINING AND SUPPORT
The next phase was to develop and implement a digital tool enabling the research group (located across different provinces and cities in Argentina) to load, edit, store and publish a Data Management Plan (ONDTyD-DMP) remotely.
We identified and compared different online platforms for the management of a DMP. For various reasons, the tool selected was DMPonline, 6 developed by the Digital Curation Centre (DCC, UK). Following acquisition, we undertook the customisation and translation of the platform for use by the ONDTyD.
To ensure the implementation and correct use by all members of the Observatory, the next step was to deal with training and support:
• Development of a workshop entitled "Scientific Data: quality, normalisation and visualisation";
• Development of a virtual course about the ONDTyD-DMP, which incorporated information on the required sections and best practices (mentioned above);
• Establishment of a support helpline, to answer questions emerging in the process of filling out the ONDTyD-DMP.

IMPACT
After meeting and exchanging information with ONDTyD, the combined workgroup deemed it necessary to reconsider some methodological decisions, resulting in the enhancement of data, their documentation and the management of research data created and to be generated in the future. In this way the group of researchers of ONDTyD improved their understanding and skills in the management of research data.
The Fundación Williams, 7 the project funder, made clear its interest in incorporating the DMP as an integral element in the process of receiving future research funding applications.
Based on this previous experience and the work carried out with ONDTyD, and at the request of the Gerencia de Desarrollo Científico 8 of the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET, Argentina), we developed a Research Data Management Plan for CONICET. The DMP is of a generic nature and has three levels of information detail for the presentation of a project.

CONCLUSIONS
It is fundamental to acquire an appreciation of the discipline and to know the research practices and workflows of specialised research groups in the thematic area. It is also important to allow for constant feedback from research groups and/or researchers in each thematic area, in order to reach consensus on the data lifecycle, data management plans, metadata, etc.
The DMP enables researchers to plan the creation and collection, as well as the organisation, of data. A good DMP will multiply the possibilities for data use, re-use, and the impact of research in the scientific community and in society at large.
The requirement of a DMP by institutions that manage and fund research in Science and Technology constitutes an important input for diagnosis and prediction, necessary for the development of infrastructure and for the evaluation and measurement of the potential and/or real impact (social, economic, cultural, etc.) that a piece of research and its funding imply.
A digital platform for ONDTyD to manage its DMPs was developed and implemented. Such a platform should be flexible, modular and interoperable with repositories of data, publications, etc.
Training and support of the researchers at ONDTyD have proved vital elements to success with the implementation and development of DMPs, the implementation of which will facilitate future use and re-use of data.

SURVEY: Is your institution ready for managing research data?
The LEARN project has compiled the following survey as a self-assessment tool to help institutions discover how ready they are for managing research data. The survey is based on the issues posed to institutions by the LERU Roadmap for Research Data published at the end of 2013, and available at: http://www.leru.org/files/publications/AP14_LERU_Roadmap_for_Research_data_final.pdf.
The survey has thirteen questions addressing the main elements to be taken into account in developing an institutional strategy for research data management. Each question has three possible answers, represented by green, yellow or red. The more 'green light' responses recorded, the readier an institution probably is for managing its research data. We encourage you to complete the online questionnaire, available at: http://learn-rdm.eu/en/rdm-readiness-survey/, with a link straight through to the questionnaire at http://goo.gl/forms/m6PGJ34tGr. The survey is available in both English and Spanish.
The Survey is iterative, in that it can (once taken) be re-taken at regular intervals. Changes in the scores will themselves illustrate the level of progress made in the intervening period.
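The traffic-light scheme described above can be sketched as a score. The Python sketch below assumes a simple numeric mapping (green = 2, yellow = 1, red = 0); the actual online survey may present and weight answers differently.

```python
# Sketch of the readiness survey's traffic-light scoring.
# The numeric weights (green=2, yellow=1, red=0) are an assumption,
# not taken from the LEARN survey itself.

SCORES = {"green": 2, "yellow": 1, "red": 0}

def readiness(answers: list[str]) -> float:
    """Fraction of the maximum score across the thirteen questions (0.0-1.0)."""
    if len(answers) != 13:
        raise ValueError("the survey has thirteen questions")
    return sum(SCORES[a] for a in answers) / (2 * len(answers))

def progress(before: list[str], after: list[str]) -> float:
    """Change in readiness between two iterations of the survey."""
    return readiness(after) - readiness(before)
```

Because the survey is iterative, comparing two runs with `progress` gives a rough measure of improvement in the intervening period.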

LEADERSHIP
• My institution has a steering committee on research data
• My institution is working on setting up a working group to develop services and policies on research data
• There is no dedicated group on research data at my institution

ROLES
• My institution has established new roles to steward the management of research data
• Some staff are shifting part of their work to involve the management of research data
• There is no one dedicated to research data

INFORMATION (SERVICES)
• My institution has an information point/helpdesk/webpages on research data management
• There is someone in the university library/research office who can give advice on research data management to researchers
• No service at my institution provides clear information on research data management

DISSEMINATION (AWARENESS)
• My institution has created some materials on the management of research data
• There are some links with information on research data on the library/research office website
• Researchers need to look outside my institution for information on the management of research data

INFRASTRUCTURE
• My institution provides an infrastructure to manage research data through the complete research cycle
• My institution provides some services for managing data, but not through the complete research cycle
• Researchers need to use external facilities to manage their data

COST MODEL
• My institution has established a list of free and paid-for services based upon an analysis of costs
• My institution offers some services for free while others must be paid for, but there is no public list of paid services
• There has not been any analysis of the cost of managing research data at my institution

LEGAL
• There is a protocol defining who owns the research data produced
• My institution has a policy on intellectual property rights (IPR), but there is no mention of research data
• My institution does not have a policy on IPR

SELECTION OF DATA
• There are protocols, laid down by bodies such as the university or the research funder, to define which data have to be kept, shared, archived, etc.
• My institution gives some advice about the preservation of research data
• My institution has not established any guidance about which research data should be kept

PUBLICATION AND SHARING
• There are protocols, laid down by bodies such as the university or the research funder, defining which data has to be published, where, and under which terms of use
• My institution allows researchers to publish research data in our institutional repository or in a disciplinary repository (outside the institution)
• My institution does not have a protocol or a place to publish research data

TRAINING
• My institution has scheduled regular training sessions on research data management addressed to researchers, students and staff
• My institution offers training sessions in research data management upon demand
• There are no training sessions about how to manage research data

REVISION AND UPDATES
• My institution has established a roadmap to review and, if needed, update its policy and services on research data
• My institution is developing services for managing research data, but there is no scheduled calendar for reviews
• My institution has yet to start the conversation to create a working group on research data

OPEN DATA
• My institution publishes research data openly by default and has established a set of exceptions to this policy
• My institution allows researchers to share data openly, but there is no formal policy established
• My institution does not publish any data openly

POLICY AND LEADERSHIP
The LERU Roadmap advocated that 'Every LERU member should develop and promulgate an institutional data policy'. 2 The LEARN Toolkit provides the tools to do this, with a model RDM policy and guidance in Part 2 of the compilation. Additionally, the Case Studies support the call for policy leadership and alignment. Case Study 1 from the Wellcome Trust argues that there is broad agreement on policy amongst research funders on the importance of RDM, whilst identifying key challenges which remain. The Executive Briefings in six languages in Part 3 are designed for senior decision makers, to support them in delivering sound solutions. Case Study 2 describes the process of developing a model RDM policy for Austria, based on the LEARN template, which acts as a framework and which can be customised at a local level. Case Study 3 looks at Brexit and its potential impact on Open Science, concluding that perhaps the greatest threat currently lies in a possible lack of engagement in the UK with the European Open Science Cloud. Case Study 4 looks at linking the practice of RDM with research integrity frameworks.

ADVOCACY
Many of the Case Studies are devoted to the theme of advocacy. The LERU Roadmap stressed that LERU members, researchers and research funders should 'Promote best practice in data management, citation and interoperability to increase the visibility of data'. 3 This is true of the Case Studies from both Latin America/the Caribbean and Europe. Some interesting themes around advocacy are identified. Case Study 5 makes the point that RDM advocacy to researchers is in its infancy. Accordingly, qualitative rather than quantitative measures and approaches currently predominate. However, the institution in this Case Study has undertaken a wide-ranging internal survey which will provide a baseline for future activity. Case Study 6 emphasises that what is needed is to identify RDM stakeholders, ensure good communication, and develop implementation plans. Case Study 7 links leadership and advocacy by asking the question 'Who has leadership for RDM at an institutional level in the University of the West Indies?'. Case Study 8 underlines the challenges involved in RDM advocacy. In this institution, after years of activity, the difficulties in changing institutional culture with regard to RDM remain.

SUBJECT APPROACHES
The Case Studies in the LEARN Toolkit look particularly at RDM issues in the Arts, Humanities and Social Sciences. Case Study 9 looks in detail at the challenges and opportunities at UCL (University College London). It identifies that many researchers in these disciplines do not use, or are unaware of, UCL-supported RDM solutions, and that there is a need for advocacy to these communities. Case Study 10 is from the Performing Arts. Discussion on RDM has centred on the sciences (in the English sense, excluding the Arts, Humanities and Social Sciences). Because of how Arts projects are funded and structured, there are special problems and challenges relating to RDM, which this Case Study identifies.

TOOL DEVELOPMENT
A number of chapters look at tool development to support RDM. Case Study 21 looks at legal requirements and shows how the use of licences can establish frameworks for sharing, re-use and compliance. Case Study 22 looks to Argentina and the development of Data Management planning, concluding that good Data Management Plans will deliver good research. Finally, chapter 23 looks at the LEARN Readiness Survey, which allows research performing institutions to assess their level of preparation for RDM by answering 13 questions. Using a traffic-light scheme of red, amber or green, the survey is marked, enabling those taking the test to see how prepared they are. The test can be taken iteratively, so that over a period an institution can measure its progress in RDM activity.

CONCLUSION
Research data is the new currency of the digital age. From sonnets to statistics, and genes to geodata, the amount of material being created and stored is growing exponentially. The LERU Roadmap identifies a serious gap in the level of preparation amongst research performing organisations. This gulf is prominent in areas such as policy development, awareness of current issues, skills development, training, costs, community building, governance, disciplinary/legal/terminological and geographical differences. The LEARN Toolkit is designed to identify sound solutions and proposals for these challenges and opportunities. By adopting recommended LEARN practices, templates and guidance, all those involved as stakeholders in RDM can introduce best practice into their institutions.
In compliance with intellectual property rights, and if no third-party rights, legal requirements or property laws prohibit it, research data should be assigned a licence for open use. 2 Adherence to citation norms and to requirements regarding publication and future research should be assured, sources of subsequently-used data should be explicitly traceable, and original sources acknowledged.
Research data and records are to be stored and made available according to intellectual property laws or the requirements of third-party funders, within the parameters of applicable legal or contractual requirements, e.g. EU restrictions on where identifiable personal data may be stored. Research data of future historical interest and the administrative records accompanying research projects should also be archived.
The minimum archive duration for research data and records is 10 years after either the assignment of a persistent identifier or publication of a related work following project completion, whichever is later.
In the event that research data and records are to be deleted or destroyed, either after expiration of the required archive duration or for legal or ethical reasons, such action will be carried out only after considering all legal and ethical perspectives. The interests and contractual stipulations of third-party funders and other stakeholders, employees and partner participants in particular, as well as the aspects of confidentiality and security, must be taken into consideration when decisions about retention and destruction are made. Any action taken must be documented and be accessible for possible future audit.

RESPONSIBILITIES, RIGHTS, DUTIES
The responsibility for research data management during and after a research project lies with [name of research institution] and its researchers and should be compliant with codes for the responsible conduct of research.

RESEARCHERS ARE RESPONSIBLE FOR:
a. Management of research data and data sets in adherence with the principles and requirements expressed in this policy;
b. Collection, documentation, archiving, access to and storage or proper destruction of research data and research-related records. This also includes the definition of protocols and responsibilities within a joint research project. Such information should be included in a Data Management Plan (DMP), or in protocols that explicitly define the collection, administration, integrity, confidentiality, storage, use and publication of the data that will be employed. Researchers will produce a DMP for every research project; 3
c. Compliance with the general requirements of the funders and the research institution; special requirements in specific projects should be described in the DMP;
d. Planning to enable, wherever possible, the continued use of data even after project completion. This includes defining post-project usage rights, with the assignation of appropriate licences, 2 as well as the clarification of data storage and archiving in the case of discontinued involvement at the [name of university/research institution];
e. Backup and compliance with all organisational, regulatory, institutional and other contractual and legal requirements, both with regard to research data and to the administration of research records (for example contextual or provenance information);
f. To ensure appropriate institutional support, it is required that new research projects are registered at the proposal stage at [name of research institution/central body].
2 Concrete recommendations for licensing should be listed and made available to the researchers.
3 A Data Management Plan (DMP) is a structured guideline (document or online tool) which depicts the entire lifecycle of data and can be updated if needed. Data management plans must assure that research data are traceable, available, authentic, citable and properly stored, and that they adhere to clearly defined legal parameters and appropriate safety measures governing subsequent use. Ideally, DMPs should be delivered in a machine-actionable format.

THE [NAME OF RESEARCH INSTITUTION] IS RESPONSIBLE FOR:
a. Empowerment of organisational units, providing appropriate means and resources for research support operations, the upkeep of services, organisational units, infrastructures, and employee education;
b. Support of established scientific practices from the beginning. This is possible through the drafting and provision of DMPs, monitoring, training, education and support, in compliance with regulations, third-party contracts for research grants, university/institutional statutes, codes of conduct, and other relevant guidelines;
c. Developing and providing mechanisms and services for the storage, safekeeping, registration and deposition of research data in support of current and future access to research data during and after the completion of research projects;
d. Providing access to services and infrastructures for the storage, safekeeping and archiving of research data and records, enabling researchers to exercise their responsibilities (as outlined above) and to comply with obligations to third-party funders or other legal entities.
> Research data is one part of the knowledge capital of research institutions. In data-driven science, good data management promotes discovery and efficiency, and increases reliability by ensuring consistent quality with a high level of comparability. The policy may be strongly connected to strategic alignments and strategic management. It could help in building the bridge from technical requirements to skills and competencies.

VALIDITY
> Research data management is considered as a whole in the policy (including research records, methods, software, code etc.).
> These principles will determine the organisation's behaviour.
> These principles also apply to the behaviour of individuals within the institution.
> The policy (with annexed documents) should contain definitions, indicating answers to these questions: • What is "research data"?
• Who is a "researcher"?
> The following should be clear: • Authorship of the policy. It should be clear who defines the policy ("the speaking entity") and why this entity (author of the policy) defines the policy. What is the role of "the speaking entity" (authorship)?
• Aim of the policy. Why does a research institution/institute have a policy? What is the goal of the policy? What does the institution want to achieve?
• Subject. According to the statutes of the institution and its published guidelines: What is the subject of the policy?

Preamble
Refers to Point 1 of the Model Policy
The preamble describes the context:
> It is an introductory statement or a description of an initial situation.
> It defines why there should be a policy and how to contextualize it within the institution. This part has to be localised by each institution and aligned with the prevailing philosophy and mission of the institution.
> Scientific disciplines and organizations produce and manage different types of materials which might have different guiding principles. It is essential that consistency is brought to the field in the form of research institution/institute-level policies.
> The fundamental truths or propositions that serve as the foundation for the chain of reasoning of the policy should be described.

Jurisdiction
Refers to Point 2 of the Model Policy
> The scope of the policy must be defined according to space and time.
> The relationship between the policy and research institution/institute and non-research institution/institute guidelines and statutes must be clarified in the policy.
> Compliance with legal and contractual provisions must be maintained.

Intellectual Property Rights

Refers to Point 3 of the Model Policy
According to the FAIR principles, the fundamental purpose of rights definition is to encourage re-use and collaboration.
> In this section, rights must be defined according to the questions: • Who owns research data?
• And who holds rights in such data?
This is a fundamental question. With regard to research data protected by law, this question can be answered by legal advisers.
> The following aspects must be considered:
• terms of use
• questions of licensing and subsequent use of data
• data protection aspects, including relevant legal requirements
• privacy rights, usage rights, exploitation rights and copyrights
> In cases where no law fittingly applies to a specific piece of research data, the policy will apply to intellectual property rights, etc.
> The policy must take into account all contracts made with funders, as well as contracts between researchers and their institutions, which have precedence.
You might include the following sentence: The research institution will make research data available under an open licence, unless legal obligations, third party rights, intellectual property rights or privacy rights preclude this. The licence is selected according to the type of data, in order to label the data and facilitate its utilisation. An example of a source-code licence is the GNU General Public Licence (GPL). For all other kinds of data, CC0 or CC-BY licences can be used. Data which are not subject to any copyright restrictions should be clearly marked as such, for instance with the Creative Commons Public Domain Mark. In some cases copyright belongs to the institution that employs the researcher, so there may be a question of who has the right to choose a licence.
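The licence guidance above can be sketched as a simple lookup. The following minimal Python example is illustrative only: the category names and the SPDX-style identifiers are assumptions, and the mapping (GPL for source code, CC0/CC-BY for other data, the Public Domain Mark for uncopyrighted material) merely encodes the suggestions in the text, not a prescribed scheme:

```python
# Hypothetical sketch of the licence guidance above. Category names and
# identifiers are illustrative assumptions, not a standard mapping.

DEFAULT_LICENCES = {
    "source_code": "GPL-3.0-only",   # e.g. the GNU General Public Licence
    "dataset": "CC-BY-4.0",          # attribution licence for most data
    "uncopyrightable": "CC0-1.0",    # waiver where no rights subsist
    "public_domain": "PDM-1.0",      # Creative Commons Public Domain Mark
}

def suggest_licence(data_type: str, third_party_rights: bool = False) -> str:
    """Suggest a default open licence unless legal obligations preclude it."""
    if third_party_rights:
        return "restricted"  # seek legal advice before licensing
    return DEFAULT_LICENCES.get(data_type, "CC-BY-4.0")
```

In practice the decision would also involve funder contracts and institutional IP rules, as the policy text notes; the sketch only captures the default choice by data type.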

Handling research data
Refers to Point 4 of the Model Policy
> This section refers to all processes for dealing with one's own and other people's data throughout and after the scientific discovery process.
> The policy refers to any research data generated within the institution, for instance in education, cultural heritage and institutional management.
> It is important to define how research data are to be changed, documented, used, secured, archived, publicized and the conditions under which data may subsequently be used. Thus, this section reflects the FAIR data principles, meaning that data are Findable, Accessible, Interoperable and Re-usable.
> It should be clear which exceptions exist in the policy and to what extent they apply. This may also concern the "right to be forgotten" (deletion of data).
> Concerning deletion of data: the policy should define which data can or must be deleted, and who decides and carries this out.
> Concerning retention of data: The minimum recommended period for retention of research data is 10 years. However, in some particular cases it should be considered that:
• for short-term research projects that are for assessment purposes only, such as research projects completed by students, retaining research data for 12 months after the completion of the project may be sufficient
• for some research projects, retaining research data for 15 years or more may be necessary (e.g. clinical trials)
• for other areas (e.g. gene therapy, seismological data), research data must be retained permanently
> The level(s) at which the policy applies should be defined:
• institutional
• faculty-wide (or other organizational units)
• discipline-wide
> The group(s) of people covered should be defined: such as research staff, research support staff, IT services, students
> The scope and coverage of the policy should be checked:
• Does the policy include all research data?
• Does the policy include/exclude a selection of the non-digital results of research processes?
> Regulations concerning the responsibilities, rights and duties of the following persons and institutions should be formulated with regard to research data:
• researchers and research data producers (e.g. PhD students)
• funders and funders' regulations (the policy should acknowledge that funders have rights and regulations, and show that these will be given precedence where appropriate)
• institutions
• research supporting entities (for example, libraries, IT services, research support centres, etc.)
> If necessary, there should be a recommendation for institutional research infrastructure.
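Retention rules such as those above lend themselves to a policy-as-code sketch. In the following Python example the category names are hypothetical; the periods simply encode the figures given in the text (12 months for student assessment work, 10 years as the recommended minimum, 15 years or more for clinical trials, permanent retention for areas such as gene therapy and seismology):

```python
# Policy-as-code sketch of the retention periods above.
# Category names are hypothetical; the figures come from the policy text.

PERMANENT = None  # sentinel: data must never be deleted

RETENTION_YEARS = {
    "student_assessment": 1,   # 12 months after project completion
    "clinical_trial": 15,      # 15 years or more may be necessary
    "gene_therapy": PERMANENT,
    "seismological": PERMANENT,
}

DEFAULT_RETENTION = 10  # minimum recommended period, in years

def retention_period(category: str):
    """Return the retention period in years, or None where data
    must be retained permanently."""
    return RETENTION_YEARS.get(category, DEFAULT_RETENTION)
```

Encoding the rules this way makes the default (10 years) explicit and keeps the exceptions in one reviewable place, which mirrors the policy's requirement to state exceptions and their extent.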
> Questions around the costs of RDM (including stewardship of data) as stated in a data management plan (DMP), as well as who bears those costs, should be well defined. This could also include costs that occur after a project has ended.
> It is important to define roles, responsibilities and competencies in order to assign objectives and define time frames. Relevant questions:
• Who is in charge of ensuring legal compliance?
• Who will provide legal advice?
• Who is in charge of the quality of the content?
• Who is in charge of defining acceptable formats?
• Who is in charge of maintaining the currency of formats over time?
• Who will provide technical support?
• Who will promote services?
• Who will provide training?
Approval of the policy, periodic review, validity and timeline
Refers to Point 6 of the Model Policy
> This pertains to the date of release of the policy and how long the current policy will remain valid. Review can take place on a regular basis, which may be externally defined, or be driven by need. The key dates must be included.
> The policy should be subjected to periodic review. The changes in each revision must be listed.
> The relevant questions here are: • How long are the terms of the policy valid?
• Who/which body is responsible for reviewing and updating the policy?
• What should be done after the end of the defined timeline or period?

Examples of cost provisions in institutional policies:
• "This is a one-off charge and guarantees secure data storage for ten years."
• University of Leeds: a guide for costing and infrastructure planning is available on the website; researchers should seek to recover the direct costs of managing research data generated by projects from the research funder.
• Aalto University: opening access to research data shall be implemented in a cost-effective manner.
• Radboud University: "Previous research suggests that a centralised service for data management at Radboud University would be more cost effective than management at an institutional level."
• Universität Göttingen: "Specific requirements have to be aligned among all stakeholders and may involve additional funding."

Examples of ownership provisions:
• Where research is carried out under a grant or contract: the terms of agreement will determine ownership.
• Where no external contract exists: the University normally has ownership of primary data generated in the course of research undertaken by researchers in its employment.
• The University does not automatically own student Intellectual Property (IP); suitable agreements for ownership should be established and agreed in writing by the parties concerned before a project starts.
• University of Glasgow: researchers have to:
• "Clearly state who owns the data that are being generated through the research activity. Where this is not clear, researchers will work with IPR specialists in Research Strategy and Innovation, the Library and College support teams to verify data ownership as early as possible in the research data lifecycle."
• "Ensure that, when leaving the University (for retirement or a position elsewhere), data of long-term value which were generated using University resources are deposited in the Institutional Data Repository for long-term storage and preservation."
University of Leeds: responsibilities of the responsible owners:
• "Work with IT Services and College IT teams to identify storage requirements that may exceed those currently offered by the institution."
• "Store their data during the course of their research in accordance with guidance from IT Services and funder requirements."
• "Deposit data in a reputable repository for long term preservation and sharing."
University Services have to "provide a dedicated institutional research data repository with appropriate security and backup."
University of Leeds: all relevant research data should be offered and assessed for deposit and preservation in an appropriate University, national or international data service or domain repository (see the University's guidance).
University of Oxford: planning for the ongoing custodianship (at the University or using third party services) of data after the completion of research, or in the event of departure or retirement from the University; agreement with the head of department/faculty as to where data will be located and how they will be stored.
University of Oxford: "(…) Where research is supported by a contract with or a grant to the University that includes specific provisions regarding ownership, retention of and access to data, the provisions of that agreement will take precedence."
University of Helsinki: "This policy does not cover the physical resources on which research data are based (e.g., paper materials) or the use of biological research material."
University of Turku: "The data policy does not apply to physical and biological materials and the University's practices related to them are presented in the research infrastructure policy of the University of Turku."
One policy (October 2015) also contains the commitment: "(…) Funders require that research data is preserved after the end of a project (typically for at least 10 years). There is a cost to the technical curation of data which cannot be built into project funding, therefore the University is committing to meeting these costs". The policy is only available on the website of the university (not as a pdf document).
A detailed policy addressing most of the identified main topics. The focus is on the responsibilities of the University, staff and students, e.g.: "The University is responsible for managing a dedicated website providing guidance for the University's academics in good data management practice." It also contains a collection of the RDM policies of major research funders in the UK.

Research Data Management Stewardship
It is important that researchers plan the collection, curation, description and dissemination of their research data at the start of their research. This information is best captured in a Research Data Management plan, which provides a framework for research data stewardship. 5

Infrastructure
To curate their research data, researchers and research performing organisations need access to the requisite digital eco-systems. These may be maintained locally; or they may be commercial services, subject-domain offerings or regional/national/international platforms. Different subject communities and individual countries will want to provide such facilities in different ways. Commonly, the platform(s) will need to offer the following services:
• Storage, for researchers who are actively collecting data;
• A publication platform, where research data and related software can be made available for sharing and re-use;
• Archive facilities, to allow research data to be curated for the long term, often in response to the requirements of research funders;
• A discovery service, which will allow researchers and citizens to search for research data deposits both locally and across the Internet.
The European Commission is promoting the European Open Science Cloud. 6 The EOSC is a metaphor to help convey both seamlessness and the idea of a commons based on scientific data. The EOSC will be a federated environment for the sharing and re-use of scientific data, based on existing and emerging elements in the Member States, with lightweight international guidance and governance and a large degree of freedom regarding practical implementation.

Training
The prevalence of research data requires all researchers, new and established, to equip themselves with the skills and tools to be confident in a data-driven environment. The lead needs to be taken by research performing organisations and, in many cases, by their institutional libraries.

Funding
Research data management comes with costs. There is no one method for assessing these costs, but a number of costing models exist to help, for example the 4C Project. 7
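One such model, the LIFE formula cited in the introduction (L = C + Aq_T + I_T + M_T + BP_T + CP_T + Ac_T), simply sums the component costs over the period 0 to T. A minimal sketch, in which all figures are invented purely for illustration:

```python
# Sketch of the LIFE lifecycle costing formula from the introduction:
#   L = C + Aq_T + I_T + M_T + BP_T + CP_T + Ac_T
# The complete lifecycle cost L over time 0..T is the sum of the
# component costs. All figures below are invented for illustration.

def life_cost(c, aq, i, m, bp, cp, ac):
    """Sum the LIFE cost components over the period 0..T."""
    return c + aq + i + m + bp + cp + ac

# Hypothetical per-dataset costs (in euros) over a ten-year horizon.
total = life_cost(c=2000, aq=500, i=300, m=800, bp=1200, cp=1500, ac=700)
print(total)  # 7000
```

The value of such a model lies less in the arithmetic than in forcing each cost component to be estimated explicitly, so that none is silently omitted from a data management plan.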

Conclusion
Research data can drive innovation and stimulate new discoveries, to the great benefit of Society. All stakeholders in the research workflow have a role to play. This Executive Briefing highlights what researchers and research performing organisations need to do to rise to this exciting challenge.

Research Data Stewardship
It is important that researchers plan the collection, curation, description and dissemination of their data at the start of their research activity. This information is best captured in a Data Management Plan, which provides a sound framework for research data stewardship. 5

Research Data Policy
Every research organisation should adopt a research data policy setting out the responsibilities that researchers take on when they receive funding. The LEARN project has created a model policy for data management in research organisations, accompanied by guidance on putting such a policy in place. 2 The proposed model policy can be adapted and adopted by individual organisations as well as by regional, national and/or international consortia. Research Data Management Plans: researchers are advised to plan the collection, processing, description and dissemination of their data from the start of their research. Writing a Data Management Plan brings these elements together and establishes a management programme for the duration of the research. 5

Conclusion
Research data can drive innovation and stimulate new discoveries, to the great benefit of society. All stakeholders in the research cycle have a role to play. This Executive Briefing highlights what researchers and research organisations must do to rise to these exciting challenges.