Summer began early for many of us who had travelled to Bergen to attend IASSIST 2016, held from 31 May to 3 June, and we were grateful for the wonderful hospitality provided by the Norwegian Centre for Research Data (NSD).
CESSDA colleagues were well represented at this year’s conference. The opportunity to meet Taina Jääskeläinen, Gry Henriksen and Irena Vipavc Brvar was particularly welcome.
Photo: Bjorn Henrichsen, Director of NSD, with Heidi Tvedt and Gry Henriksen. Source: John Shepherdson.
The IASSIST Blog reports on the conference and provides an excellent summary of a selection of presentations in the parallel sessions. CESSDA work not mentioned there includes a poster on a CESSDA Work Plan Task, presented by Anne Etheridge and her co-authors Wolfgang Zenk-Möltgen and Mari Kleemola, and John Shepherdson’s outline of the forthcoming CESSDA Research Infrastructure (CRI).
In the parallel sessions entitled ‘Big Data, Big Science’, Aidan Condron and I presented complementary papers. Aidan introduced the UK Data Service’s ‘big data’ architecture. The Service is designing an open data platform for social science, to be implemented through a data lake. When complete, it will enable social scientists to analyse resources ranging from large and complex datasets to combinations of data sources. Taking up a thread from Matthew Woollard’s IASSIST plenary address, in which he said that it is now most useful to talk about ‘new and novel’ forms of data, Aidan advocated taking an expansive view of how, when and where data can be collected, stored, linked and analysed. Using a case study from the Smarter Household Energy Data project, he demonstrated how exploratory data analysis methods could be employed to prepare the data for use within the social science research community.
My presentation, co-authored by Nathan Cunningham, focused on the requirements for a vocabulary service to augment this open data platform. We proposed that such a vocabulary service could benefit from a classification scheme that organises subject access for the purpose of exploratory data analysis. An application of the Universal Decimal Classification Scheme (UDC) had been trialled within the UK Data Archive as a tool to manage subject categories. Aida Slavic, the editor of UDC, has argued that ‘free text’ searching dampened interest in classification throughout the 1980s and 1990s (Slavic, 2008), but notes that the advent of subject gateways somewhat reversed this trend by using classification schemes to support mapping between different indexing systems (Slavic, 2006). These models inspired the Archive’s trial application of UDC.
The trial demonstrated that legacy classification is not a difficult task. For the same reasons that ‘free text’ searches succeed in retrieving important social science research concepts, a generic description of the research topic, given in the title and abstract, enables the subject content to be classified quickly. We reported on the trial and outlined how UDC could support a vocabulary service to augment the open data platform.
Data librarians were well represented at this year’s conference, and references were made to the forthcoming publication Databrarianship: The Academic Data Librarian in Theory and Practice, which promises to be of interest to all colleagues.