
Invited Speakers


Toon Calders (Honorary Chair) - Université Libre de Bruxelles, Belgium

Bio: Toon Calders obtained his PhD at the University of Antwerp, Belgium, in 2003. He joined the ULB in 2012, where he holds the Chair in Business Intelligence, after working for six years as an assistant professor in the department of mathematics and computer science at the Eindhoven University of Technology in the Netherlands. His main research interests are business intelligence, knowledge discovery in databases, machine learning and data mining. Toon Calders is an area editor of the Springer Data Mining journal, was one of the program chairs of the ECMLPKDD conference in September 2014, and will be one of the program chairs of the Discovery Science conference in 2016. He has published over 60 papers in the data mining area, including 17 journal papers, in high-level data mining venues (DMKD, KDD, ICDM, SDM, ECML/PKDD). His research interests include pattern mining, entity resolution, discrimination- and fairness-aware data mining, and data stream processing.

Title: Fairness-Aware Data Mining

Abstract: In data mining we often have to learn from biased data, because, for instance, the data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. Recently this topic has received considerable interest, both in the research community and more generally, as witnessed by several recent articles in popular news media such as the New York Times. In this talk I will introduce and motivate research in fairness-aware data mining. Different techniques in unsupervised and supervised data mining will be discussed, divided into three categories: algorithms in the first category adapt the input data so as to remove harmful biases; the second category adapts the learning algorithms themselves; and the third modifies the output models so that their predictions become unbiased. Furthermore, different ways to quantify unfairness, as well as indirect and conditional discrimination, will be discussed, each with its own pros and cons. With this talk I hope to convincingly argue for the validity and necessity of this often contested research area.
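The abstract mentions different ways to quantify unfairness. As a hedged illustration (the metric choice, group names and toy data below are invented, not taken from the talk), one of the simplest such measures, statistical parity difference, compares positive-outcome rates across groups:

```python
# Illustrative sketch: quantifying unfairness via statistical parity.
# Group names and data are invented for this example.

def statistical_parity_difference(labels, groups, protected="female"):
    """Difference in positive-outcome rates between the non-protected
    and the protected group. 0.0 means parity; larger means more bias."""
    protected_labels = [y for y, g in zip(labels, groups) if g == protected]
    other_labels = [y for y, g in zip(labels, groups) if g != protected]
    rate_protected = sum(protected_labels) / len(protected_labels)
    rate_other = sum(other_labels) / len(other_labels)
    return rate_other - rate_protected

# Toy data: 1 = positive decision (e.g. hired), 0 = negative.
labels = [1, 0, 0, 1, 1, 1, 0, 1]
groups = ["female", "female", "female", "female",
          "male", "male", "male", "male"]
print(statistical_parity_difference(labels, groups))  # 0.75 - 0.5 = 0.25
```

A preprocessing method in the first category would transform the data until this difference approaches zero; a postprocessing method would adjust the model's decisions instead.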


Marcin Detyniecki - AXA Data Innovation Lab, France

Bio: Marcin Detyniecki is a senior R&D officer at the Data Innovation Lab recently created by AXA. Additionally, he is a professor at the Polish Academy of Sciences (IBS PAN) and an associate researcher at the computer science laboratory LIP6 of the University Pierre and Marie Curie (UPMC). Currently, his research focuses on the emerging challenge popularly known as big data. In the past, he worked on the usage of new media, with challenges ranging from multimedia information retrieval to image understanding. Several of the applications he developed have not only been deployed in the market, but have also been singled out in international competitions such as TrecVid, ImageClef and MediaEval. This applied success is the result of a dialogue with more theoretical work on topics such as new challenges in approximate reasoning, information aggregation and fusion, and machine learning from a computational intelligence perspective. Marcin Detyniecki studied mathematics, physics and computer science at the University Pierre and Marie Curie (UPMC) in Paris. In 2000 he obtained his Ph.D. in artificial intelligence from the same university. Between 2001 and 2014, he was a research scientist at the French National Center for Scientific Research (CNRS). He has been a researcher at the University of California at Berkeley and at Carnegie Mellon University (CMU), and a visiting researcher at the University of Florence and at British Telecom research labs. Today he is a member of the research and academic council of UPMC, a member of the executive board of the SMART laboratory, an elected member of the LIP6 laboratory council, and a member of the editorial board of the International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS). He also founded and, until 2014, ran the UPMC – Sorbonne Universités Computer Science Colloquium. Dr. Detyniecki has over 90 publications in journals and conference proceedings, including 6 keynotes.

Title: Knowledge extraction and the new challenges of insurance

Abstract: The amount of data in our world has exploded, and the analysis of large data sets, known in industry as "Big Data", will become a key competitive asset, mainly through productivity growth and above all through greater innovation. Insurance is undergoing a revolution all of its own. The insurer, traditionally a manager of risk relying on long experience, which today would translate into systematic data capture, finds itself after the digital revolution partially excluded from digital channels. The scientific challenges are numerous, ranging from non-intrusive data capture to the visualization and management of extracted knowledge, by way of machine learning able to serve millions of users simultaneously. In this talk we will briefly cover each of these themes, with particular attention to the underlying scientific challenges. We will illustrate the discussion with a flagship example of this revolution: the family of insurance offers known as "pay as you drive", in which the customer generally obtains a discount or reduction depending on how he or she drives.


Sašo Džeroski - Jožef Stefan Institute, Ljubljana, Slovenia

Bio:  Sašo Džeroski is a scientific councillor at the Jozef Stefan Institute and the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, both in Ljubljana, Slovenia. He is also a full professor at the Jozef Stefan International Postgraduate School. His research is mainly in the area of machine learning and data mining (including structured output prediction and automated modeling of dynamic systems) and their applications (mainly in environmental sciences, incl. ecology, and life sciences, incl. systems biology). He is co-author/co-editor of more than ten books/volumes, including "Inductive Logic Programming", "Relational Data Mining", "Learning Language in Logic", "Computational Discovery of Scientific Knowledge" and "Inductive Databases and Constraint-Based Data Mining". He has participated in many international research projects (mostly EU-funded) and coordinated two of them in the past. He is currently the coordinator of the FET XTrack project MAESTRA (Learning from Massive, Incompletely annotated, and Structured Data) and one of the principal investigators in the FET Flagship Human Brain Project.

Title: Learning from Massive, Incompletely Annotated & Structured Data

Abstract: The MAESTRA project (http://maestra-project.eu/) addresses the ambitious task of predicting different types of structured outputs in several challenging settings, such as semi-supervised learning, mining data streams and mining network data. It develops machine learning methods that work in each of these settings, as well as in combinations thereof. The techniques developed are applied to problems from the areas of biology and bioinformatics, sensor data analysis, multimedia annotation and retrieval, and social network analysis. The talk will give an introduction to the project and the topics it addresses, an overview of the results of the project, and a detailed description of selected techniques and applications: semi-supervised learning for structured-output prediction (SOP) and SOP on data streams will be discussed for the task of multi-target regression (MTR), as well as applications of MTR to the annotation and retrieval of images.
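To make the MTR setting concrete, here is a minimal sketch of what multi-target regression means: a single model predicts a whole vector of targets at once. The 1-nearest-neighbour baseline and toy data are illustrative assumptions, not MAESTRA's actual methods:

```python
# Illustrative sketch of multi-target regression (MTR): one model
# predicts all targets jointly, here via a k-nearest-neighbour baseline.

def predict_multi_target(X_train, Y_train, x, k=1):
    """Predict the full target vector: average the target vectors of
    the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
    )[:k]
    n_targets = len(Y_train[0])
    return [sum(Y_train[i][t] for i in nearest) / k for t in range(n_targets)]

# Toy data: one feature, two targets predicted simultaneously.
X = [[0.0], [1.0], [2.0]]
Y = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]]
print(predict_multi_target(X, Y, [0.9]))  # nearest point is [1.0] -> [1.0, 11.0]
```

Methods developed in MAESTRA, such as predictive clustering trees, exploit dependencies between the targets instead of treating them independently; this baseline only illustrates the input/output shape of the task.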


Jérôme Euzenat - INRIA et Université Grenoble-Alpes, France

Bio: Jérôme Euzenat is a senior researcher (directeur de recherches) at INRIA. He leads the Exmo team of the Laboratoire d'Informatique de Grenoble, which is devoted in particular to semantic interoperability. In recent years he has contributed substantially to the development of ontology alignment: he is co-author of the reference book on the subject. With his colleagues, he is extending this work towards the interlinking of RDF data.

Title: Extracting link keys for data interlinking

Abstract: Large quantities of data are published in RDF on the web. Linking them consists of identifying the same resources in two datasets, which enables the joint exploitation of the published data. But link extraction is not an easy task. We have developed an approach that extracts link keys. Link keys extend the notion of key from relational algebra to several data sources. They are based on sets of pairs of properties that identify objects when they have the same values, or common values, for these properties. We will present a way of automatically extracting candidate link keys from data. This operation can be expressed as formal concept analysis. The quality of the candidate keys can be evaluated depending on whether a sample of links is available (supervised case) or not (unsupervised case). The relevance and robustness of such keys will be illustrated on a real example. Finally, we will discuss extensions of this work. Joint work with Manuel Atencia, Jérôme David and Amedeo Napoli.
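A minimal sketch of the candidate-extraction step may help. The datasets and property names below are invented; the talk casts the full operation as formal concept analysis, which this toy enumeration of agreement sets only hints at:

```python
# Hypothetical sketch: candidate link keys over two tiny datasets.
# A candidate link key is a set of (property, property) pairs on which
# some pairs of entities from the two datasets agree.

d1 = {
    "a1": {"nom": "Dupont", "ville": "Paris"},
    "a2": {"nom": "Martin", "ville": "Lyon"},
}
d2 = {
    "b1": {"name": "Dupont", "city": "Paris"},
    "b2": {"name": "Martin", "city": "Paris"},
}
props1, props2 = ["nom", "ville"], ["name", "city"]

# For every entity pair, collect the property pairs with equal values;
# each distinct agreement set is a candidate link key, and the entity
# pairs sharing it are the links it would generate.
agreement = {}
for e1, r1 in d1.items():
    for e2, r2 in d2.items():
        shared = frozenset(
            (p1, p2) for p1 in props1 for p2 in props2 if r1[p1] == r2[p2]
        )
        if shared:
            agreement.setdefault(shared, set()).add((e1, e2))

for key, links in sorted(agreement.items(), key=lambda kv: -len(kv[0])):
    print(sorted(key), "->", sorted(links))
```

Note how the candidate {(nom, name), (ville, city)} links only the correct pair (a1, b1), while {(ville, city)} alone would also link (a1, b2): this is precisely why the quality of candidate keys must be evaluated, with or without a sample of reference links.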


Tias Guns - KU Leuven, Belgium

Bio: Tias Guns is a post-doctoral fellow at the DTAI lab of the KU Leuven. His research lies on the border between data mining and constraint programming, and his main interest is in combining methods from both fields. As part of his PhD, he developed the CP4IM framework, which showed for the first time the potential of using constraint programming for pattern mining. His PhD was awarded both the constraint programming dissertation award and the ECCAI artificial intelligence dissertation award. He is an active member of the community and has organized a number of workshops and a special issue on the topic of combining constraint programming with machine learning and data mining.

Title: Towards generic and efficient constraint-based mining, a constraint programming approach

Abstract: In today's data-rich world, pattern mining techniques allow us to extract knowledge from data. However, such knowledge can take many forms and often depends on the application at hand. This calls for generic techniques that can be used in a wide range of settings. In recent years, constraint programming has been shown to offer a generic methodology that fits many pattern mining settings, including novel ones. Existing constraint programming solvers do not scale very well, though. In this talk, I will review different ways in which this limitation has been overcome, often through the principled integration of techniques and data structures from pattern mining into constraint solvers.
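The constraint-based view of pattern mining can be sketched on a toy example. The dataset and threshold below are invented, and this plain depth-first search only mimics how a CP model such as CP4IM prunes the search through propagation of the anti-monotone frequency constraint:

```python
# Minimal sketch of constraint-based frequent itemset mining:
# enumerate itemsets depth-first and prune any branch whose itemset
# already violates the minimum-frequency constraint.

transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
items = sorted({i for t in transactions for i in t})
min_support = 2  # constraint: itemset must occur in >= 2 transactions

def cover(itemset):
    """Transactions containing every item of the itemset."""
    return [t for t in transactions if itemset <= t]

def mine(prefix, remaining, out):
    for idx, item in enumerate(remaining):
        candidate = prefix | {item}
        # Frequency is anti-monotone: if candidate is infrequent, every
        # superset is too, so the whole branch can be pruned.
        if len(cover(candidate)) >= min_support:
            out.append(candidate)
            mine(candidate, remaining[idx + 1:], out)
    return out

frequent = mine(frozenset(), items, [])
print([sorted(s) for s in frequent])
# [['a'], ['a', 'b'], ['a', 'c'], ['b'], ['b', 'c'], ['c']]
```

The appeal of the CP formulation is that further constraints (on size, cost, closedness, and so on) can be added declaratively to the same model instead of writing a new specialized miner for each combination.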