SERISS thesaurus evaluation: final results

Lorna Balkan

Project aims

In recent years, the European Language Social Science Thesaurus (ELSST) has benefited from two strands of funding. Development work on the thesaurus content and software was carried out under the ESRC-funded CESSDA-ELSST project, and continues under the EU-funded CESSDA Vocabulary Services Multilingual Content Management (CESSDA VOICE) project. Complementary work on assessing the translation quality of terms was carried out as part of the Synergies for Europe’s Research Infrastructures in the Social Sciences (SERISS) project, which is also funded by the EU. The SERISS work has now been completed.

The SERISS project investigated two methods for assessing the translation quality of ELSST terms. The first method was back-translation, which was performed on a subset of ELSST terms in two of the target languages: French and German. The work was very labour-intensive, but helped to identify cases where translations differed semantically and stylistically from the source terms. This was reported previously (see the First results of SERISS project).

The second quality assessment method, discussed here, involved comparing the sets of index terms assigned to the same resource in different languages. Two types of resource were chosen: whole studies and individual questions.

Whole-study indexing

For the whole-study indexing, a comparison was made of the ELSST terms used to index the same set of cross-national surveys in different languages. The indexing had been carried out previously by members of the Consortium of European Social Science Data Archives (CESSDA ERIC) or CESSDA-related archives, according to their own indexing procedures. These procedures differed widely, with some archives assigning index terms at more granular levels than others. Consequently, the results, discussed in First results of SERISS project, were difficult to compare. Moreover, whole-study indexing produces an alphabetical list of terms in which it is difficult to see which term relates to which part of a study.

Question indexing

Indexing individual questions is not currently practised by any of the CESSDA and CESSDA-related archives, but a small sample of questions was selected and indexed as part of the SERISS project. The questions were taken from three surveys (the European Social Survey (ESS), the European Values Study (EVS) and the Survey of Health, Ageing and Retirement in Europe (SHARE)), and indexed with ELSST terms in German, Greek and Romanian. As before, the results were analysed to compare how consistent the indexing was between indexers (and thereby uncover any potential problems with the terms or their translations). This time, they were also analysed to see how well the terms covered the semantic content expressed in the questions.

This time, it was easier to see how the terms related to the object indexed (i.e. the question text) and to identify problems such as ambiguity (where a term’s meaning was not clear in the source and/or target language) and redundancy (where there was too great an overlap of meaning between two or more terms in the same language). However, factors other than the properties of the terms themselves also influenced the indexers’ choice of terms. In some cases, differences in how questions were worded in the different languages affected the indexing terms chosen. In addition, although all indexers were using the same indexing instructions, each indexer interpreted them slightly differently. The experiment also revealed cases where the semantic content of the questions could not be adequately covered by ELSST terms, resulting in some new term suggestions. More details can be found in the Report on application of indexing terms in the data lifecycle.
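
To make the idea of comparing index terms across languages concrete, here is a minimal sketch of one way indexing consistency between indexers could be quantified. It is an illustration only: the post does not say which measure the SERISS analysis used, so the Jaccard overlap, the function names and the concept identifiers ("C1", "C2", ...) are all assumptions. The sketch also assumes that each translated term has already been mapped to a language-independent concept identifier, so that sets from German, Greek and Romanian indexers can be compared directly.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two term sets: shared terms / all terms used."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as fully consistent
    return len(a & b) / len(a | b)


def pairwise_consistency(assignments):
    """Jaccard overlap for every pair of indexers.

    `assignments` maps an indexer label (here a language code) to the set of
    concept identifiers that indexer assigned to one question.
    """
    return {
        (x, y): jaccard(assignments[x], assignments[y])
        for x, y in combinations(sorted(assignments), 2)
    }


def disputed_concepts(assignments):
    """Concepts chosen by at least one indexer but not by all of them."""
    sets = [set(v) for v in assignments.values()]
    return set.union(*sets) - set.intersection(*sets)


# Hypothetical data for a single question (not taken from the SERISS study).
example = {
    "de": {"C1", "C2", "C3"},
    "el": {"C1", "C2"},
    "ro": {"C1", "C4"},
}

scores = pairwise_consistency(example)
disputed = disputed_concepts(example)
```

A low score for a pair of indexers, or a concept that appears in the disputed set for many questions, would only flag a term for closer inspection. As the discussion above notes, disagreement can come from question wording or from how the instructions were interpreted, not just from the term or its translation.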

Project impact

Overall, the SERISS work proved valuable in highlighting issues with ELSST terms and their translations. These issues cover semantic as well as more formal/stylistic aspects of terms. The results have been used to produce Guidelines for the management of ELSST content, which in turn have been used to update the ELSST translation guidelines and training, and to inform ongoing thesaurus development and translation work.

Besides improving the translation quality of ELSST terms, the SERISS work will also be of interest to those investigating how to index data, i.e. what to index (whole studies, questions and/or variables) and where in the data lifecycle indexing should be carried out.

Results from the SERISS thesaurus evaluation
