Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, Simon Fischer, Designing KDD-Workflows via HTN-Planning for Intelligent Discovery Assistance, In: Planning to Learn 2012, Workshop at ECAI 2012, CEUR Workshop Proceedings, 2012-08-28. (Conference or Workshop Paper published in Proceedings)
 
Knowledge Discovery in Databases (KDD) has evolved considerably in recent years and has reached a mature stage, offering plenty of operators to solve complex data analysis tasks. However, user support for building workflows has not progressed accordingly. The large number of operators currently available in KDD systems makes it difficult for users to analyze data successfully. In addition, the correctness of workflows is not checked before execution. Hence, the execution of a workflow frequently stops with an error after several hours of runtime. This paper presents our tools, eProPlan and eIDA, which solve the above problems by supporting the whole life-cycle of (semi-)automatic workflow generation. Our modeling tool eProPlan allows users to describe operators and to build a task/method decomposition grammar that specifies the desired workflows. Additionally, our Intelligent Discovery Assistant, eIDA, allows users to place workflows into data mining (DM) tools or workflow engines for execution. |
|
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, Simon Fischer, Designing KDD-Workflows via HTN-Planning, In: European Conference on Artificial Intelligence, Systems Demos, IOS Press, 2012-08-27. (Conference or Workshop Paper)
 
Knowledge Discovery in Databases (KDD) has evolved considerably in recent years and has reached a mature stage, offering plenty of operators to solve complex data analysis tasks. However, user support for building workflows has not progressed accordingly. The large number of operators currently available in KDD systems makes it difficult for users to analyze data successfully. In addition, the correctness of workflows is not checked before execution. This demo presents our tools, eProPlan and eIDA, which solve the above problems by supporting the whole cycle of (semi-)automatic workflow generation. Our modeling tool eProPlan allows users to describe operators and to build a task/method decomposition grammar that specifies the desired workflows. Additionally, our Intelligent Discovery Assistant, eIDA, allows users to place workflows into data mining (DM) suites or workflow engines for execution. |
|
Alon Dolev, File synchronization with distributed version lists, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis)
 
Many modern computer users have multiple storage devices and they would like to keep the most up-to-date versions of their documents on all of them. In order to solve this problem, we require a mechanism to detect changes made to files and propagate the most preferable one: a file synchronizer. Many existing solutions need a central server, depend on constant network connectivity, can only synchronize in one way and bother the user with already-resolved version conflicts. We present a novel algorithm which allows for an optimistic, peer-to-peer, multi-way, asynchronous and optimal file synchronizer. It thus allows for changes in disconnected settings, does not require a central server, may synchronize any subset of the synchronization network at any time and it will not report false-positive conflicts. The algorithm improves on the well-known concept of version vectors presented by Parker et al. by allowing for conflict-resolution propagation. We do so by storing an additional bit of information for every version vector element. It is a more space-efficient solution to this propagation problem than the “vector time pairs” presented by Cox et al. and further, it is not restricted to one-way synchronization. We additionally present a novel user interface concept allowing for convenient handling of synchronization patterns. Based on these ideas we developed the file synchronizer McSync in order to show the feasibility of our approach. |
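The abstract's core idea, version vectors extended with one extra bit per element to propagate conflict resolutions, can be sketched roughly as follows. This is a minimal illustration based on Parker et al.'s classic version-vector comparison, not the McSync implementation; the data layout, the `resolved` flag semantics, and the dominance test are assumptions.

```python
# Minimal sketch of version-vector comparison (after Parker et al.), extended
# with one extra bit per element intended to mark already-resolved conflicts.
# Data layout and semantics are illustrative assumptions, not McSync's code.
from dataclasses import dataclass, field

@dataclass
class VersionEntry:
    counter: int = 0        # updates observed from this replica
    resolved: bool = False  # hypothetical flag: a conflict here was resolved by the user

@dataclass
class FileVersion:
    vv: dict = field(default_factory=dict)  # replica id -> VersionEntry

    def bump(self, replica_id: str) -> None:
        """Record a local modification made on the given replica."""
        self.vv.setdefault(replica_id, VersionEntry()).counter += 1

def compare(a: FileVersion, b: FileVersion) -> str:
    """Classify two versions as 'equal', 'a_newer', 'b_newer' or 'conflict'."""
    ids = set(a.vv) | set(b.vv)
    a_ahead = any(a.vv.get(i, VersionEntry()).counter > b.vv.get(i, VersionEntry()).counter for i in ids)
    b_ahead = any(b.vv.get(i, VersionEntry()).counter > a.vv.get(i, VersionEntry()).counter for i in ids)
    if a_ahead and b_ahead:
        # A real synchronizer would consult the resolved flags here and propagate
        # an earlier manual resolution instead of reporting the conflict again.
        return "conflict"
    return "a_newer" if a_ahead else "b_newer" if b_ahead else "equal"

# Concurrent edits on two replicas are flagged; identical histories are equal.
x, y = FileVersion(), FileVersion()
x.bump("laptop"); y.bump("desktop")
print(compare(x, y))                                          # conflict
print(compare(x, FileVersion({"laptop": VersionEntry(1)})))   # equal
```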
|
Marc Tobler, Natural language processing with signal/collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis)
 
Traditional Natural Language Processing (NLP) focuses on individual tasks, such as tokenizing, part-of-speech (POS) tagging or parsing. To obtain final results, one usually combines several of these steps in sequence, thereby creating a pipeline. In this thesis we suggest a new approach to NLP, using parallel combination instead.
We illustrate our proposal with a Word Sense Disambiguation (WSD) system and a Part of Speech (POS) tagger. We start by implementing the PageRank algorithm for WSD and the Viterbi algorithm as a POS tagger on Signal/Collect, a framework for parallel graph processing. We then combine the two tasks in a pipeline, using the information gathered from the POS tagger to increase the performance of WSD. Finally, we suggest a non-sequential combination of the algorithms, merging them into a single algorithm that handles POS tagging and WSD in parallel.
With this thesis we contribute two ideas. First, we show that graph theory provides a suitable model for solving selected NLP problems and that modeling such graphs in Signal/Collect is a promising approach, due to the framework's good scaling behavior and its potential for parallelization. Second, we suggest a different methodology for solving NLP tasks: a way to move from isolated studies of NLP problems and pipelining towards a broader, integrated approach.
We evaluate our algorithms on the Senseval 3 data, comparing the obtained results to a similar approach introduced by Agirre and Soroa in 2009. |
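For context, the POS-tagging half of the combination rests on the standard Viterbi recurrence. The sketch below shows that recurrence in plain Python rather than as a Signal/Collect graph; the toy transition and emission tables are invented for illustration and this is not the thesis's implementation.

```python
# Standard Viterbi recurrence for POS tagging (log-space), shown as plain
# Python instead of the thesis's Signal/Collect formulation. Toy model only.
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words`."""
    V = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(words[0], 1e-12)), None)
          for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            best_prev, best_score = max(
                ((p, V[-1][p][0] + math.log(trans_p[p][t])) for p in tags),
                key=lambda x: x[1])
            row[t] = (best_score + math.log(emit_p[t].get(w, 1e-12)), best_prev)
        V.append(row)
    # Backtrack from the best final state.
    tag = max(V[-1], key=lambda t: V[-1][t][0])
    path = [tag]
    for row in reversed(V[1:]):
        tag = row[tag][1]
        path.append(tag)
    return list(reversed(path))

tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {"DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
           "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
           "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1}}
emit_p = {"DET": {"the": 0.9}, "NOUN": {"dog": 0.5, "barks": 0.1},
          "VERB": {"barks": 0.7}}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# -> ['DET', 'NOUN', 'VERB']
```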
|
David Oggier, Tagging methods for linked media data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
In this thesis, a method is presented to tag the media metadata of a broadcasting company with Linked Data concepts. Specifically, a controlled vocabulary in the form of a thesaurus is used as an intermediary between broadcast metadata and Linked Data vocabularies. A method to link this metadata with appropriate thesaurus entries, as well as an algorithm to align the latter with Linked Data concepts are presented and evaluated. Furthermore, it is investigated whether a benefit is gained for user queries by applying faceted search on the resulting semantically enhanced data. |
|
Thomas Niederberger, Norbert Stoop, Markus Christen, Thomas Ott, Hebbian principal component clustering for information retrieval on a crowdsourcing platform, In: Nonlinear Dynamics of Electronic Systems, IEEE, 2012-07-11. (Conference or Workshop Paper published in Proceedings)
 
Crowdsourcing, a distributed process that involves outsourcing tasks to a network of people, is increasingly used by companies for generating solutions to problems of various kinds. In this way, thousands of people contribute a large amount of text data that needs to already be structured during the process of idea generation in order to avoid repetitions and to maximize the solution space. This is a hard information retrieval problem as the texts are very short and have little predefined structure. We present a solution that involves three steps: text data preprocessing, clustering, and visualization. In this contribution, we focus on clustering and visualization by presenting a Hebbian network approach that is able to learn the principal components of the data while the data set is continuously growing in size. We compare our approach to standard clustering applications and demonstrate its superiority with respect to classification reliability on a real-world example. |
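The kind of incremental principal-component learning described here can be illustrated with Sanger's generalized Hebbian algorithm, a textbook Hebbian rule whose weight rows converge to the leading principal components while data keep streaming in. The sketch below is a generic stand-in, not the authors' network; dimensions, learning rate, and the synthetic document vectors are assumptions.

```python
# Generic sketch of streaming principal-component learning via Sanger's
# generalized Hebbian algorithm. It stands in for "a Hebbian network that
# learns the principal components while the data set keeps growing"; the
# dimensions, learning rate, and synthetic data are arbitrary assumptions.
import numpy as np

def gha_update(W, x, lr=0.005):
    """One update step; rows of W (k, d) approach the top-k principal components."""
    y = W @ x                                        # project the new sample
    # Hebbian term minus deflation of earlier components (lower-triangular trick).
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

rng = np.random.default_rng(0)
d, k = 20, 3
W = rng.normal(scale=0.1, size=(k, d))
basis = rng.normal(scale=0.5, size=(3, d))           # hidden low-rank structure
for _ in range(5000):                                # stream of zero-mean samples
    x = rng.normal(size=3) @ basis + 0.05 * rng.normal(size=d)
    W = gha_update(W, x)
print(np.round(W @ W.T, 2))  # rows become roughly orthonormal as they converge
```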
|
Krishna Römpp, Ein natürlichsprachliches Dialogsystem für das Internet, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
Due to the ongoing digitization of everyday life, fast and direct interaction with web content is becoming increasingly important.
This study presents a prototype of a German-language dialogue system based on real internet data sources.
It consists of components for extraction and aggregation of web data, as well as modules for language processing and text generation from ontologies.
Using a variety of knowledge bases, this work creates an architecture to answer queries in real time.
The work shows problems that arise in developing such systems and illustrates a possible solution based on the given implementation.
Finally, an evaluation demonstrates the functionality and performance of the developed system. |
|
Patrick Minder, Abraham Bernstein, How to translate a book within an hour - Towards general purpose programmable human computers with CrowdLang, In: Web Science 2012, New York, NY, USA, 2012-06-22. (Conference or Workshop Paper published in Proceedings)
 
In this paper we present CrowdLang, a programming language and framework for engineering complex computation systems that incorporate large numbers of networked human and machine agents. We evaluate CrowdLang by developing a text translation program incorporating human and machine agents. The evaluation shows that we are able to explore a large design space of possible problem-solving programs by simply varying the abstractions used. Furthermore, an experiment involving 1918 different human actors shows that the developed mixed human-machine translation program significantly outperforms a pure machine translation in terms of adequacy and fluency whilst translating more than 30 pages per hour, and that the program approximates the professionally translated gold standard to 75% according to the automatic evaluation metric METEOR. Last but not least, our evaluation illustrates that our new human computation pattern, staged contest with pruning, outperforms all other refinements in the translation task. |
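One plausible reading of the pattern name "staged contest with pruning" is sketched below: candidate translations compete in rounds, low-scoring candidates are pruned, and survivors seed the next round. The control flow is inferred from the name alone; the actual CrowdLang operators, stage counts, and voting scheme are not given in the abstract, so everything here is an assumption with stubbed crowd calls.

```python
# Hypothetical sketch of a "staged contest with pruning" pattern with stubbed
# crowd calls; the real CrowdLang operators and parameters are not shown in
# the abstract, so this is an assumption about the pattern's general shape.
import random

def crowd_translate(segment, n_workers):
    """Stub: ask n_workers humans (or a machine) for candidate translations."""
    return [f"{segment} [candidate {i}]" for i in range(n_workers)]

def crowd_vote(candidates, n_voters):
    """Stub: let n_voters score each candidate; here scores are random."""
    return {c: sum(random.random() for _ in range(n_voters)) for c in candidates}

def staged_contest_with_pruning(segment, stages=3, n_workers=8, keep=3, n_voters=5):
    candidates = crowd_translate(segment, n_workers)
    for _ in range(stages):
        scores = crowd_vote(candidates, n_voters)
        # Prune: only the best `keep` candidates survive into the next stage,
        # where workers produce improved variants of the survivors.
        survivors = sorted(candidates, key=scores.get, reverse=True)[:keep]
        candidates = survivors + [c for s in survivors
                                  for c in crowd_translate(s, n_workers // keep)]
    final_scores = crowd_vote(candidates, n_voters)
    return max(candidates, key=final_scores.get)

print(staged_contest_with_pruning("Der Hund bellt."))
```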
|
Patrick Minder, Sven Seuken, Abraham Bernstein, Mengia Zollinger, CrowdManager - Combinatorial allocation and pricing of crowdsourcing tasks with time constraints, In: Workshop on Social Computing and User Generated Content in conjunction with ACM Conference on Electronic Commerce (ACM-EC 2012), Valencia, Spain, 2012-06-07. (Conference or Workshop Paper published in Proceedings)
 
Crowdsourcing markets like Amazon’s Mechanical Turk or Crowdflower are quickly growing in size and popularity. The allocation of workers and compensation approaches in these markets are, however, still very simple. In particular, given a set of tasks that need to be solved within a specific time constraint, no mechanism exists for the requestor to (a) find a suitable set of crowd workers that can solve all of the tasks within the time constraint, and (b) find the “right” price to pay these workers. In this paper, we provide a solution to this problem by introducing CrowdManager – a framework for the combinatorial allocation and pricing of crowdsourcing tasks under budget, completion time, and quality constraints. Our main contribution is a mechanism that allocates tasks to workers such that social welfare is maximized, while obeying the requestor’s time and quality constraints. Workers’ payments are computed using a VCG payment rule. Thus, the resulting mechanism is efficient, truthful, and individually rational. To support our approach we present simulation results that benchmark our mechanism against two baseline approaches employing fixed-price mechanisms. The simulation results illustrate that our mechanism (i) significantly reduces the requestor’s costs in the majority of settings and (ii) finds solutions in many cases where the baseline approaches either fail or significantly overpay. Furthermore, we show that the allocation as well as VCG payments can be computed in a few seconds, even with hundreds of workers and thousands of tasks. |
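The flavour of a VCG payment rule in such a procurement setting can be conveyed with a toy sketch: each worker reports a per-task cost and a capacity, the mechanism assigns identical tasks to the cheapest workers, and each selected worker is paid the externality they impose on everyone else. The identical-task simplification, the absence of quality constraints, and all names below are illustrative assumptions, not CrowdManager's actual model.

```python
# Toy VCG procurement sketch: `num_tasks` identical tasks, each worth
# `value_per_task` to the requestor before the deadline; every worker reports a
# per-task cost and a capacity (tasks they can finish in time). The allocation
# maximizes social welfare and payments follow the Clarke (VCG) rule. The
# identical-task setting and all names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    cost: float    # reported cost per task
    capacity: int  # tasks the worker can complete before the deadline

def efficient_allocation(workers, num_tasks, value_per_task):
    """Greedily assign tasks to the cheapest workers while value exceeds cost."""
    alloc, remaining = {}, num_tasks
    for w in sorted(workers, key=lambda w: w.cost):
        if remaining == 0 or w.cost >= value_per_task:
            break
        alloc[w.name] = min(w.capacity, remaining)
        remaining -= alloc[w.name]
    return alloc

def others_welfare(workers, alloc, excluded, value_per_task):
    """Welfare of the requestor plus all workers except `excluded`."""
    done = sum(alloc.values())
    cost = sum(w.cost * alloc.get(w.name, 0) for w in workers if w.name != excluded)
    return value_per_task * done - cost

def vcg(workers, num_tasks, value_per_task):
    alloc = efficient_allocation(workers, num_tasks, value_per_task)
    pay = {}
    for w in workers:
        if alloc.get(w.name, 0) == 0:
            continue  # unallocated workers are paid nothing
        rest = [v for v in workers if v.name != w.name]
        alloc_wo = efficient_allocation(rest, num_tasks, value_per_task)
        # Payment = externality: how much better off everyone else is thanks to w.
        pay[w.name] = (others_welfare(workers, alloc, w.name, value_per_task)
                       - others_welfare(rest, alloc_wo, w.name, value_per_task))
    return alloc, pay

workers = [Worker("ann", 1.0, 5), Worker("bob", 2.0, 5), Worker("eve", 4.0, 10)]
print(vcg(workers, num_tasks=8, value_per_task=5.0))
# ({'ann': 5, 'bob': 3}, {'ann': 16.0, 'bob': 12.0}) -- each payment exceeds the worker's cost
```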
|
Khadija Elbedweihy, Stuart N Wrigley, Fabio Ciravegna, Dorothee Reinhard, Abraham Bernstein, Evaluating semantic search systems to identify future directions of research, In: Second International Workshop on Evaluation of Semantic Technologies, 2012-05-28. (Conference or Workshop Paper published in Proceedings)
 
Recent work on searching the Semantic Web has yielded a wide range of approaches with respect to the style of input, the underlying search mechanisms and the manner in which results are presented. Each approach has an impact upon the quality of the information retrieved and the user’s experience of the search process. This highlights the need for formalised and consistent evaluation to benchmark the coverage, applicability and usability of existing tools and provide indications of future directions for advancement of the state-of-the-art. In this paper, we describe a comprehensive evaluation methodology which addresses both the underlying performance and the subjective usability of a tool. We present the key outcomes of a recently completed international evaluation campaign which adopted this approach and thus identify a number of new requirements for semantic search tools from both the perspective of the underlying technology as well as the user experience. |
|
Alexander Schäfer, Evaluation of methods for automatic data linking, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)

The Semantic Web defines a way to publish semantically linked data on the World Wide Web. The advantage is that computer programs can follow these links and assemble data on their own, without human intervention but with human initiation. In the domain of statistics, providing Linked Data would be a natural step towards open access to information. This thesis uses data from the Federal Statistics Office of Switzerland in a semi-automated process of semantically linking that data. In addition, four tools with different methods for automatic data matching were evaluated. It was found that, for automated data linking at a level acceptable for adoption, the raw data is not yet sufficiently prepared and the matching tools are not sufficiently mature.
|
|
Markus Christen, Florian Faller, Ulrich Götz, Cornelius Müller, Serious Moral Games: Erfassung und Vermittlung moralischer Werte durch Videospiele, Edition ZHdK, Zürich, 2012. (Book/Research Monograph)
 
Can video games convey moral values? This idea contradicts a public debate that often simply takes a negative influence of such games on players' morality for granted. This book aims to open up the usually abbreviated discussion and to extend it with new topics. Starting from the observation that modern video games also build ethical themes into their game design, the authors examine the possibilities and limits of constructing a «Serious Moral Game», that is, a video game with which the moral actions of the player can be captured and reflected upon. The book «Serious Moral Games» shows that video games hold a so far largely untapped potential that is of interest both for moral research and for players themselves: video games as instruments for learning more about oneself and one's own moral perceptions and values. |
|
Markus Christen, Rezension von: Stefan Huster (2011): Soziale Gesundheitsgerechtigkeit. Sparen, umverteilen, vorsorgen?, Bioethica Forum, Vol. 5 (4), 2012. (Journal Article)
 
|
|
Markus Christen, Rezension von: Oliver Müller/Giovanni Maio/Joachim Boldt/Josef Mackert (Hrsg.), Das Gehirn als Projekt. Wissenschaftler, Künstler und Schüler erkunden unsere neurotechnische Zukunft, Freiburg i. Br./Berlin (Rombach) 2011, Zeitschrift für medizinische Ethik, Vol. 58 (4), 2012. (Journal Article)
 
|
|
Markus Christen, Darcia Narvaez, Moral development in early childhood is key for moral enhancement, AJOB Neuroscience, Vol. 3 (4), 2012. (Journal Article)
 
|
|
Markus Christen, Marianne Regard, Der „unmoralische Patient“. Eine Analyse der Nutzung hirnverletzter Menschen in der Moralforschung, Nervenheilkunde, Vol. 31 (4), 2012. (Journal Article)
 
Empirical research on moral decision-making and behavior increasingly relies on patients with rarely occurring brain lesions in specific regions of the frontal lobe. This raises both the neuroethical question of what such findings mean for our understanding of morality and the medical-ethical question of how to deal with such patients in research and clinical contexts. Based on a review of the literature on the relationship between brain lesions and social behavior, as well as a good 40 years of our own experience in neuropsychological assessment, we identify two blind spots: First, these studies propagate a neurodeterminism of human moral behavior that is not sufficiently supported scientifically. Second, research interest has shifted away from a clinical focus towards basic neuropsychological research on the human capacity for morality. The latter point is significant because the clinical and everyday handling of such patients is difficult and because these patients expose the limits of applying classical principles of medical ethics such as autonomy and beneficence. |
|
Markus Christen, Merlin Bittlinger, Henrik Walter, Peter Brugger, Sabine Müller, Dealing with side effects of deep brain stimulation: Lessons learned from stimulating the STN, AJOB Neuroscience, Vol. 3 (1), 2012. (Journal Article)
 
Deep brain stimulation (DBS) is increasingly investigated as a therapy for psychiatric disorders. In the ethical evaluation of this novel approach, incidence and impact of side effects (SE) play a key role. In our contribution, we analyze the discussion on SE of DBS of the subthalamic nucleus (STN)—a standard therapy for movement disorders like Parkinson's disease (PD)—based on 66 case reports, 69 review papers, and 347 outcome studies from 1993 to 2009. We show how the DBS community increasingly acknowledged the complexity of STN-DBS side effects. Then we discuss the issue of study quality and the methods used to assess SE. We note that some side effects are the subject of conflicting evaluations by the different stakeholders involved. This complicates the ethical controversy inherent in any novel treatments for diseases that involve psychiatric aspects. We delineate how the lessons from STN-DBS could guide future DBS applications in the field of psychiatry. |
|
Markus Christen, Sabine Müller, Current status and future challenges of deep brain stimulation in Switzerland, Swiss Medical Weekly, Vol. 2012 (142), 2012. (Journal Article)
 
QUESTIONS UNDER STUDY: Deep brain stimulation (DBS) has become a standard therapy for some forms of severe movement disorders and is investigated for other neurological and psychiatric disorders, although many scientific, clinical and ethical issues are still open. We analyse how the Swiss DBS community addresses these problematic issues and future challenges.
METHODS: We have performed a survey among Swiss DBS centres and a Delphi study with representatives of all centres and further stakeholders related to the topic.
RESULTS: The current DBS infrastructure in Switzerland consists of seven facilities. About 850–1,050 patients have received a DBS system in Switzerland for various indications since its advent in 1976. Critical issues like patient selection and dealing with side effects are handled in accordance with international standards. There are indications of a conservative referral practice for DBS interventions in Switzerland, but the available data do not allow us to verify or refute this point.
CONCLUSIONS: Issues to investigate further are whether or not there is an unmet medical need with respect to DBS, the long-term medical and psychosocial sequelae of the intervention, the conditions for enhancing the (research) collaboration of Swiss DBS centres, and the effect of the recent decision to reduce the number of DBS centres to four (or possibly three) on the potential of this therapeutic approach. |
|
Jayalath Ekanayake, Improving reliability of defect prediction models: from temporal reasoning and machine learning perspective, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Dissertation)
 
Software quality is an important factor since software systems play a key role in today’s world. There are several perspectives within the field on software quality measurement. One frequently used measurement (or metric) is the number of defects that could result in crashes, catastrophic failures, or security breaches encountered in the software. Testing the software for such defects is essential to enhance quality. However, due to the rising complexity of software, manual testing has become an extremely time-consuming task and, consequently, many automatic supporting tools have been developed. One such class of supporting tools is defect prediction models. A large number of defect prediction models can be found in the literature, and most of them share a common development procedure. In general, this procedure implicitly assumes that the underlying data distribution of software systems is relatively stable over time. This assumption is not necessarily true and, consequently, the reliability of those models is doubtful at some points in time. In this thesis, we therefore present temporal (time-based) reasoning techniques that improve the reliability of prediction models. By exploring four open source software (OSS) projects and one cost estimation dataset, we first show that real-time-based data sampling improves prediction quality compared to random sampling. Also, temporal features are more appropriate than static features for defect prediction. Furthermore, we find that non-linear models are better than linear models for defect prediction, which implies that the relationship between project features and defects is not linear. Further investigations show that prediction quality varies significantly over time and, hence, testing a model on one or a few data samples is not sufficient to generalize the model. Specifically, we show that the project features influence a model’s prediction quality and that, therefore, the model’s prediction quality itself can be predicted. Finally, we turn these insights into a tool that estimates the prediction quality of models in advance. This tool supports developers in determining when to apply their models and when not to. Our temporal-reasoning techniques can easily be adapted to most existing prediction models to enhance their reliability. Generally, these techniques are easy to use, extensible, and show a high degree of flexibility in terms of customization to real applications. More importantly, we provide a tool that supports developers in making decisions about their prediction models in advance. |
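The methodological point about real-time versus random sampling can be made concrete with a small sketch: under concept drift, a random train/test split lets "future" revisions leak into training, while a time-ordered split trains only on the past. The synthetic data, features, and classifier below are placeholders, not the thesis's datasets or models.

```python
# Sketch of random vs. time-ordered evaluation splits under concept drift. The
# synthetic "revisions", features, and classifier are placeholders, not the
# thesis's datasets or prediction models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
t = np.sort(rng.uniform(0, 1, n))               # normalized commit time
drift = np.sin(4 * np.pi * t)                   # slowly changing project context
X = rng.normal(size=(n, 5)) + drift[:, None]    # per-revision code metrics
y = (X[:, 0] + drift + rng.normal(scale=0.5, size=n) > 0).astype(int)  # defect label

# (a) Random split: future revisions leak into training, often looks optimistic.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
auc_random = roc_auc_score(
    yte, RandomForestClassifier(random_state=0).fit(Xtr, ytr).predict_proba(Xte)[:, 1])

# (b) Real-time split: train on the earliest 70% of revisions, test on the rest.
cut = int(0.7 * n)
auc_temporal = roc_auc_score(
    y[cut:],
    RandomForestClassifier(random_state=0).fit(X[:cut], y[:cut]).predict_proba(X[cut:])[:, 1])

print(f"random split AUC:   {auc_random:.2f}")
print(f"temporal split AUC: {auc_temporal:.2f}")  # usually lower, i.e. a more honest estimate
```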
|
Ausgezeichnete Informatikdissertationen 2011, Edited by: Steffen Hölldobler, Abraham Bernstein, et al., Gesellschaft für Informatik, Bonn, 2012. (Edited Scientific Work)

|
|