Major user-generated content platforms (e.g., Facebook, Twitter, Wikipedia) are used by people across the world who contribute content in many different languages. Language is one of the largest barriers to sharing content among all users of these platforms. In general, users of each language contribute unique content that is not shared beyond that language. For example, about half of the articles in the German edition of Wikipedia, the second-largest edition, have no equivalent in the English edition, the largest.
Many platforms have added the ability to view machine translations of other-language content so that content can spread more widely across languages. However, the short and informal nature of much user-generated content results in poor-quality translations. Furthermore, machine translation is not available for speakers of many smaller languages who are coming online for the first time and who arguably need translation most, since little content is generally available online in their languages.
This project investigates the mechanics of successful human, crowd-sourced translation of user-generated content. It conducts online experiments to understand which translation rating and voting systems work best and how translated content is best displayed. The project is being carried out in cooperation with industry partners including Meedan, which is building tools to facilitate the translation of social media content.