SOCA

The massive collection of electronic health records, clinical notes, and biomedical data creates a great opportunity to identify health events that may support early diagnosis and improve therapeutic interventions. Moreover, the Internet continuously accumulates health and well-being information that citizens share across many distinct social media networks.

In this work package, we propose to monitor a campus community by collecting personal well-being indicators and environmental measurements and relating them to particular health conditions or academic activities.

The work package will also handle scientific and technical dissemination, as relevant for the project.

The main tasks are the following:

Date	Milestones	Description
2017-10-16	Extraction of events from social media and community forums (M4-M21)	Despite continued developments in text mining, important computational and technical challenges still prevent science and health care from reaping its full benefits. The first task that this project will address is the extraction of events from forums and social media (e.g. mentions of well-being indicators, health conditions, sentiments, opinions). While current methods achieve good performance for simpler event types, more complex events still pose important challenges. Emoticons, abbreviations, negation and speculation are further challenges that have not yet been adequately solved. During this task, we will investigate and improve methods for extracting fine-grained information about health and well-being from social media networks. For this, we will build on the pre-processing techniques developed in Task 3.2. We will extend our previous work on tweet analysis [Prieto 2014] and on the detection of event trigger words [Campos 2014b] to identify the participants in each event or relation, and we will apply natural language processing (NLP) and lexico-syntactic rules to adapt the event enrichment step to specific categories of events. Another important challenge is the ambiguity that is highly prevalent in entity names and acronyms in the biomedical and clinical domains. While knowledge-based and graph-based approaches have been shown to achieve high levels of performance in this disambiguation task, they are usually slower and require more computational resources than machine-learning (ML) methods. ML methods, on the other hand, are limited by insufficient training data. To exploit the benefits of both, we will use a combined strategy that pairs knowledge-based approaches with ML trained under distant supervision. The extracted information will be stored in the infrastructure developed in Task 3.3.
2017-10-16	Semantic enrichment of events and opinions (M6-M27)	Semantic interoperability is an important aspect to be addressed in our big data collection. The Semantic Web has emerged as a ground-breaking paradigm for the intelligent integration of structured information. Supported by established standards (such as RDF, OWL, SPARQL and Linked Data), the Semantic Web promotes better strategies to express, infer and make knowledge interoperable. Recent advances in the area cover the research and development of new algorithms to further improve how we collect data, transform data into meaningful knowledge assertions, and publish connected knowledge. To address this challenge, we will take advantage of COEUS [Lopes 2012], a semantic web framework developed in our group and recently enhanced with RESTful services over a transactional triple store (https://github.com/bioinformatics-ua/scaleus). We will extend this work to create a semantic knowledge base (KB) that will be used to store the extracted data, integrate it with existing domain knowledge resources, and provide interoperable services and tools for using this data in external pipelines. In close collaboration with WP3, we will also evaluate the performance of multiple storage types (e.g. relational, NoSQL) as well as caching solutions, aiming to take advantage of cloud computing infrastructures. Structuring the extracted information in this semantic KB, supported by semantic web standards, will provide new ways of exploiting scientific discoveries to drive new findings and to support “campus health”.
2017-10-16	Context-aware recommender system (M6-M32)	The information collected through Task 4.1 and Task 4.2 will be processed to extract knowledge that can be used to create a context-aware recommender system. The final aim of this task is to develop context-sensitive recommendation models that, through specific data mining and machine learning techniques, provide recommendations to campus actors and enrich their daily experience. To achieve this goal, we will explore several neighbourhood-based methods as well as context dimensions such as textual, social, spatial and temporal. The overall service will be made available through the service interfaces developed in Task 3.3.
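
As an illustration of the neighbourhood-based, context-aware recommendation mentioned in the last task, the following minimal sketch shows one possible arrangement: contextual pre-filtering followed by user-based nearest-neighbour scoring with cosine similarity. It is not the project's implementation. The data, the user and item names, the single "time of day" context dimension, and the function names are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical feedback records: (user, item, context, rating).
# Every name and value here is illustrative, not project data.
RATINGS = [
    ("ana", "library", "morning", 5.0),
    ("ana", "gym", "evening", 4.0),
    ("ana", "canteen", "morning", 2.0),
    ("rui", "library", "morning", 4.0),
    ("rui", "gym", "evening", 5.0),
    ("rui", "garden_walk", "morning", 5.0),
    ("eva", "library", "morning", 1.0),
    ("eva", "canteen", "morning", 5.0),
    ("eva", "cafe", "morning", 4.0),
]


def filter_by_context(ratings, context):
    """Contextual pre-filtering: keep only ratings observed in `context`."""
    profiles = defaultdict(dict)
    for user, item, ctx, rating in ratings:
        if ctx == context:
            profiles[user][item] = rating
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse rating profiles (dicts)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, context, k=2):
    """Rank items the user has not rated in this context, using the k most
    similar users; each score is a similarity-weighted average rating."""
    profiles = filter_by_context(ratings, context)
    target = profiles.get(user, {})
    neighbours = sorted(
        ((cosine(target, profile), other)
         for other, profile in profiles.items() if other != user),
        reverse=True,
    )[:k]
    scores, weights = defaultdict(float), defaultdict(float)
    for similarity, other in neighbours:
        if similarity <= 0.0:
            continue  # ignore users with no overlap in this context
        for item, rating in profiles[other].items():
            if item not in target:
                scores[item] += similarity * rating
                weights[item] += similarity
    predictions = {item: scores[item] / weights[item] for item in scores}
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(recommend(RATINGS, "ana", "morning"))
```

Pre-filtering is only one way to bring context into a neighbourhood model. The social, spatial and textual dimensions named in the task description could instead be folded into the similarity function or applied as a post-filter on the ranked list.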

WP4 - Monitoring personal health and well-being