Current Projects


Credibility Analysis in Health Communities:

There has been an increasing number of people asking for medical advice or health-related opinions online. In the context of healthcare and medicine, inaccurate information from untrustworthy members can cause serious harm and must be treated with caution. This project aims to analyse the credibility of user statements in health communities, considering features specific to online communities to generate useful insights. [More]


Trend Analysis and Prediction in Scholarly Documents:

With an increasing volume of scientific documents added to existing archives every day, it is difficult yet crucial to track the emergence of research topics and their impact. This project aims to address some of the issues in analysing large scholarly corpora and other contemporary signals to generate useful insights. [More]


Mobile App Recommendation:

This project aims to build recommendation systems for App Stores. The model is developed from a graph-based approach, and it utilises Twitter information which can precede formal user ratings in app stores, as well as version information which is specific to mobile apps. [More]


MOOC Wikification:

This project aims to build a system that identifies the resources mentioned and referenced in the discussion forums of MOOC platforms and automatically links them to their actual locations. It gives learners a more convenient way to bring all resources together. [More]


NER In Legal Domain:

This project is a collaboration with INTELLLEX (a tech start-up for law), and aims to increase the precision of existing Named Entity Recognition systems. Beyond the usual entity types such as people and locations, the project's scope has been extended to legal terms. [More]


A Web Based Dashboard for MOOC Instructors:

This project is proposed in support of the Instructor Intervention in MOOC Discussion Forums project [link], and aims to build a system which takes in generic forum threads (from Coursera, edX, etc.) and outputs the threads in order of importance, so that instructors are able to intervene in time. [More]


Verb Duration Discovery:

This project aims to discover the relationships between verbs and durations, so that given a verb and a situation, we can predict how long the action will last. For example, given the verb “eat” and the situation “I eat a sandwich”, we can predict that the action “eat” will last for a couple of minutes. [More]


Snippet Generation for MOOC Discussion Forums:

This project aims to identify the significant sentences in MOOC discussion forum threads and use them to generate a summary. [More]



NUS MOOC Corpus: Crowdsourcing annotations to study instructor intervention:

This project proposes to annotate a large corpus of instructor-intervened threads using Amazon Mechanical Turk (AMT), enabling supervised machine learning algorithms to automatically identify interventions that promote student learning. [More]



Investigating Instructor Intervention in MOOC Discussion Forums:

This project aims to design predictive models to identify important threads in MOOC discussion forums. Such models may enable dashboards that automatically prompt MOOC instructors on when and how to intervene in discussion forums, so that good pedagogical practices can be scaled in the MOOC context. [More]


Implicit Discourse Relation Recognition:

This project aims to leverage both traditional feature-based and deep learning approaches to improve the recognition of PDTB-style implicit discourse relations, so that it becomes viable for real-world applications. [More]


Coursera Crawler:

A crawler for the Coursera website to get the discussion forum data. This crawler depends on PhantomJS to simulate the login process and PycURL to get the target data via hidden APIs. It can be easily extended to get information dynamically displayed on the webpage apart from the discussion forums. [More]



Scholarly Paper Recommendation:

This project aims to propose methods for recommending scholarly papers relevant to a specific researcher’s interests and needs. The methods also provide serendipitous suggestions, so that researchers are able to reach out to other disciplines and areas. [More]




ParsCit:

The ParsCit project aims to build a system for two tasks: 1) reference string parsing (citation parsing, citation extraction), and 2) logical structure parsing of scientific documents. It is architected as a supervised machine learning procedure that uses Conditional Random Fields as its learning mechanism. [More]



Source Code Plagiarism Detection:

This project aims to build a system which improves the performance of code plagiarism detection models. Apart from detecting similar code pairwise, the system is able to identify clusters of similar code within a group of submissions. A Student Submissions Integrity Diagnosis (SSID) system has been developed as part of this work. [More]



Citation Analysis:

This project aims to build a system which helps to identify citing sentences in research papers by constructing a supervised learning classifier. With those sentences correctly classified, the system can support an assistive editor that indicates whether a specific piece of text needs to be backed up by a citation. [More]


Proposed Projects:

Below is a list of projects proposed by WING members. If you are interested, please contact the respective person in charge for more details.

Keyphrase Extraction:

Keyphrases are words that capture the main topics of a document. Extracting high-quality keyphrases can benefit various natural language processing (NLP) applications: in text summarization, keyphrases are useful as a form of semantic metadata indicating the significance of the sentences and paragraphs in which they appear; in both text categorization and document clustering, keyphrases offer a means of term dimensionality reduction, and have been shown to improve system efficiency and accuracy; and for search engines, keyphrases can supplement full-text indexing and assist users in formulating queries. [More]

For projects from before 2011, please refer here.