Michael Imhof, Entwicklung eines RDF Parsers für transaktionsbasierte Daten, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Bachelor's Thesis)
 
The Java RDF Parser (JRP) is a program that reads files in RDF format and extracts
transactional data from them, which can then be stored in a database. This thesis
describes the development of JRP and gives an insight into the design of the code, the
database schema and connection, as well as an evaluation of Jena, the Java library
that is used to parse the input files. The program was tested with real data
and demonstrated the desired functionality. Nothing can yet be said about the
scalability of the parser, because no large datasets were available for
performance tests. |
|
Matthias Linherr, Data Mining auf Kundendaten, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
 
The aim of this thesis is to implement a platform that enables alumni associations to
analyse their member databases. Using statistical methods and data-mining algorithms,
the platform should allow the visualization and appraisal of member behaviour and
member structure. Four different alumni organisations use this platform in the form of a
web-based application to maintain their databases. These databases form the basis of the following
evaluations.
|
|
Katharina Reinecke, Abraham Bernstein, Culturally Adaptive Software: Moving Beyond Internationalization, In: Proceedings of the HCI International (HCII), Springer, Beijing, China, July 2007. (Conference or Workshop Paper)
 
So far, culture has played a minor role in the design of software. Our experience with imbuto, a program designed for Rwandan agricultural advisors, has shown that cultural adaptation increased efficiency, but was extremely time-consuming and, thus, prohibitively expensive. In order to bridge the gap between cost savings on the one hand and international usability on the other, this paper promotes the idea of culturally adaptive software. In contrast to manual localization, adaptive software is able to acquire details about an individual's cultural identity during use. Combining insights from the related fields of international usability, user modeling, and user interface adaptation, we show how research findings can be exploited for an integrated approach to automatically adapt software to the user's cultural frame. |
|
Sinja Helfenstein, Visualizing Labor Market Dynamics based on Social Security Records: A Combination of Temporal and Visual Data Mining, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
 
The goal of this thesis is the understanding of temporal patterns in the Austrian Social Security Database to derive labor market dynamics. As these structures are very complex, conventional data mining approaches turned out to be inadequate for interpretation and knowledge discovery. The main challenge is the intuitive representation of the time dimension. Therefore, we keep the time dimension by generating movies of concatenated probabilistic model visualizations. Using this combination of temporal and visual data mining allows us to identify various effects such as seasonal hiring cycles, gender and age-related employment dynamics, and demographic influences. |
|
Domenic Benz, Voraussage von Benutzerverhalten in dynamischen Umgebungen, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
 
The increasing proliferation of mobile phones has a significant influence on our daily lives. Although the increasing use of mobile devices has brought several advantages, it also has the negative effect of unwanted disturbances and interruptions. It is desirable for a mobile phone to be able to adapt to its current situation. For such an adaptation to become possible, the mobile phone needs information about its current context. To achieve this goal, software was implemented that gathers data from a variety of sensors on a mobile phone. This software was then used in a prototype experiment, in which we try to determine whether it is possible to predict a user's activity and location based on the collected data. The software implemented in this thesis and the results of the experiment help to prepare and conduct follow-up experiments in the field of context awareness and human interruptibility research. |
|
Christoph Kiefer, Imprecise SPARQL: Towards a Unified Framework for Similarity-Based Semantic Web Tasks, In: Proceedings of 2nd Knowledge Web PhD Symposium (KWEPSY) colocated with the 4th Annual European Semantic Web Conference (ESWC), June 2007. (Conference or Workshop Paper)
 
This proposal explores a unified framework to solve Semantic Web tasks that often require similarity measures, such as RDF retrieval, ontology alignment, and semantic service matchmaking. Our aim is to see how far it is possible to integrate user-defined similarity functions (UDSF) into SPARQL to achieve good results for these tasks. We present some research questions, summarize the experimental work conducted so far, and present our research plan, which focuses on the various challenges of similarity querying within the Semantic Web. |
|
Christoph Kiefer, Abraham Bernstein, Jonas Tappolet, Analyzing Software with iSPARQL, In: Proceedings of the 3rd International Workshop on Semantic Web Enabled Software Engineering (SWESE 2007), Springer, June 2007. (Conference or Workshop Paper)
 
|
|
Dennis Weiss, Mining Customer Networks and Inter-Product Relations in Internet / Digital Entertainment Provider Data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)

Today’s telecommunication companies have at their disposal large quantities of detailed transaction data. Data mining methods can be used to generate information on product use, customer behaviour and interaction between customers. Cross-selling analyses, customer segmentation and social network analysis are only some of the practices that can be employed to facilitate direct marketing. This thesis illustrates an approach for identifying customer groups and their networks, from which management implications may be derived by means of propositional as well as relational data mining. In this context, triple play customers - i.e. subscribers of broadband internet, fixed-line telephony and digital TV - were segmented on the basis of data generated from product use. In addition, network analysis and the search for multi-relational patterns provided further insight into both customer types and their respective needs. |
|
Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, Andreas Zeller, How Long will it Take to Fix This Bug?, In: Proceedings of the Fourth International Workshop on Mining Software Repositories, IEEE Computer Society, May 2007. (Conference or Workshop Paper)

Predicting the time and effort for a software problem has long been a difficult task. We present an approach that automatically predicts the fixing effort, i.e., the person-hours spent on fixing an issue. Our technique leverages existing issue tracking systems: given a new issue report, we use the Lucene framework to search for similar, earlier reports and use their average time as a prediction. Our approach thus allows for early effort estimation, helping in assigning issues and scheduling stable releases. We evaluated our approach using effort data from the JBoss project. Given a sufficient number of issue reports, our automatic predictions are close to the actual effort; for issues that are bugs, we are off by only one hour, beating naive predictions by a factor of four. |
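The prediction step described in this abstract (find similar earlier reports, average their effort) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses the Lucene framework for similarity search, whereas the sketch below substitutes a simple Jaccard overlap of word sets, and the names `IssueReport`, `jaccard`, and `predict_effort` as well as the parameter `k` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class IssueReport:
    """A resolved issue with its recorded fixing effort (hypothetical structure)."""
    text: str
    effort_hours: float


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it on whitespace into a set of words."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two word sets; a stand-in for Lucene's text scoring."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def predict_effort(new_text: str, history: list[IssueReport], k: int = 3) -> float:
    """Average the effort of the k earlier reports most similar to the new one."""
    if not history:
        raise ValueError("no earlier reports to learn from")
    query = tokenize(new_text)
    ranked = sorted(
        history,
        key=lambda report: jaccard(query, tokenize(report.text)),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(report.effort_hours for report in nearest) / len(nearest)
```

Called with a new report text and a list of resolved reports, `predict_effort` returns the mean effort of the `k` closest matches. The choice of `k` and of the similarity function are assumptions of this sketch, not values taken from the paper.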
|
Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, Andreas Zeller, Predicting Effort to fix Software Bugs, In: Proceedings of the 9th Workshop Software Reengineering, May 2007. (Conference or Workshop Paper)

|
|
Jonas Tappolet, Mining Software Repositories - A Semantic Web Approach, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
 
Modern software development has become a complex task. Software systems grow larger and are densely interconnected with other systems, making extensive use of large communication frameworks. To cope with this complexity, software developers and project managers need the assistance of tools that extract information about flaws in code as well as general information about the state of a project. In this thesis, we first introduce a data exchange format based on OWL/RDF, the Semantic Web’s format of choice today, that is able to store data and metadata from the source code, the versioning system (e.g., CVS) and the bug tracking system (e.g., Bugzilla). In a next step, we present a tool to retrieve the data from online software repositories and to store it in OWL/RDF. This tool is implemented as a plug-in for the Eclipse IDE and is able to harvest data from projects managed by Eclipse. Finally, we evaluate our data format and tools by applying a set of software metric calculations, pattern detections and similarity measures using iSPARQL and SimPack. The results of the conducted experiments are promising and offer a first proof of concept for our approach. |
|
Esther Kaufmann, Abraham Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users?, In: 6th International Semantic Web Conference (ISWC 2007), March 2007. (Conference or Workshop Paper)
 
Natural language interfaces offer end-users a familiar and convenient option for querying ontology-based knowledge bases. Several studies have shown that they can achieve high retrieval performance as well as domain independence. This paper focuses on usability and investigates if NLIs are useful from an end-user's point of view. To that end, we introduce four interfaces each allowing a different query language and present a usability study benchmarking these interfaces. The results of the study reveal a clear preference for full sentences as query language and confirm that NLIs are useful for querying Semantic Web data. |
|
Christoph Kiefer, Abraham Bernstein, Markus Stocker, The Fundamentals of iSPARQL - A Virtual Triple Approach For Similarity-Based Semantic Web Tasks, In: Proceedings of the 6th International Semantic Web Conference (ISWC), Springer, March 2007. (Conference or Workshop Paper)
 
This research explores three SPARQL-based techniques to solve Semantic Web tasks that often require similarity measures, such as semantic data integration, ontology mapping, and Semantic Web service matchmaking. Our aim is to see how far it is possible to integrate customized similarity functions (CSF) into SPARQL to achieve good results for these tasks. Our first approach exploits virtual triples calling property functions to establish virtual relations among resources under comparison; the second approach uses extension functions to filter out resources that do not meet the requested similarity criteria; finally, our third technique applies new solution modifiers to post-process a SPARQL solution sequence. The semantics of the three approaches are formally elaborated and discussed. We close the paper with a demonstration of the usefulness of our iSPARQL framework in the context of a data integration and an ontology mapping experiment. |
|
Abraham Bernstein, Michael Daenzer, The NExT System: Towards True Dynamic Adaptions of Semantic Web Service Compositions (System Description), In: Proceedings of the 4th European Semantic Web Conference (ESWC '07), Springer, March 2007. (Conference or Workshop Paper)
 
Traditional process support systems typically offer a static composition of atomic tasks into more powerful services. In the real world, however, processes change over time: business needs evolve rapidly, thus changing the work itself, and relevant information may be unknown until workflow execution run-time. Hence, the static approach does not sufficiently address the need for dynamism. Based on applications in the life science domain, this paper puts forward five requirements for dynamic process support systems. These demand a focus on tight user interaction throughout the whole process life cycle. The system and the user establish a continuous feedback loop, resulting in a mixed-initiative approach that requires a partial execution and resumption feature to adapt a running process to changing needs. Here we present our prototype implementation NExT and discuss a preliminary validation based on a real-world scenario. |
|
Christoph Kiefer, Abraham Bernstein, Jonas Tappolet, Mining Software Repositories with iSPARQL and a Software Evolution Ontology, In: Proceedings of the 2007 International Workshop on Mining Software Repositories (MSR '07), IEEE Computer Society, March 2007. (Conference or Workshop Paper)
 
One of the most important decisions researchers face when analyzing the evolution of software systems is the choice of a proper data analysis/exchange format. Most existing formats have to be processed with special programs written specifically for that purpose and are not easily extendible. Most scientists, therefore, use their own database(s), requiring each of them to repeat the work of writing the import/export programs for their format. We present EvoOnt, a software repository data exchange format based on the Web Ontology Language (OWL). EvoOnt includes software, release, and bug-related information. Since OWL describes the semantics of the data, EvoOnt (1) is easily extendible, (2) comes with many existing tools, and (3) allows assertions to be derived through its inherent Description Logic reasoning capabilities. The paper also shows iSPARQL – our SPARQL-based Semantic Web query engine containing similarity joins. Together with EvoOnt, iSPARQL can accomplish a sizable number of tasks sought in software repository mining projects, such as an assessment of the amount of change between versions or the detection of bad code smells. To illustrate the usefulness of EvoOnt (and iSPARQL), we perform a series of experiments with a real-world Java project. These show that a number of software analyses can be reduced to simple iSPARQL queries on an EvoOnt dataset. |
|
Christoph Kiefer, Abraham Bernstein, Hong Joo Lee, Mark Klein, Markus Stocker, Semantic Process Retrieval with iSPARQL, In: Proceedings of the 4th European Semantic Web Conference (ESWC '07), Springer, March 2007. (Conference or Workshop Paper)
 
The vision of semantic business processes is to enable the integration and inter-operability of business processes across organizational boundaries. Since different organizations model their processes differently, the discovery and retrieval of similar semantic business processes is necessary in order to foster inter-organizational collaborations. This paper presents our approach of using iSPARQL, our imprecise query engine based on SPARQL, to query the OWL MIT Process Handbook, a large collection of over 5000 semantic business processes. We particularly show how easy it is to use iSPARQL to perform the presented process retrieval task. Furthermore, since choosing the best performing similarity strategy is a non-trivial, data-, and context-dependent task, we evaluate the performance of three simple and two human-engineered similarity strategies. In addition, we conduct machine learning experiments to learn similarity measures, showing that complementary information contained in the different notions of similarity strategies provides a very high retrieval accuracy. Our preliminary results indicate that iSPARQL is indeed useful for extending the reach of queries and that it, therefore, is an enabler for inter- and intra-organizational collaborations. |
|
Hermann Wotruba, Thomas Scharrenbach, New developments in sorting technology - the use of microwave excitation and infrared sensors for ore sorting, In: Meddelanden från MinFo No. 37: Conference in Mineral Processing, Luleå, Sweden, February 2007, Stockholm, Sweden, 2007-02-06. (Conference or Workshop Paper published in Proceedings)

|
|
Simon Lützelschwab, Case-based Reasoner for OWL-S Web Services, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
 
This thesis explores a novel approach that takes techniques found in case-based reasoning systems and applies them to the Semantic Web. In the context of the Web Service Ontology OWL-S, a framework following the principles of case-based reasoning is introduced. A suitable case structure is defined that forms the basis of the system. Furthermore, various similarity strategies are implemented to select suitable cases based on the new problem presented to the system. Similarity is measured using semantic, syntactic and graph measurements. Additionally, different adaptation strategies are introduced to facilitate the reuse process. The framework's architecture allows additional similarity and adaptation strategies to be added in the future. |
|
Christian Kündig, A User Model Editor for Ontology-based Cultural Personalization, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Bachelor's Thesis)

|
|
Abraham Bernstein, Markus Stocker, Christoph Kiefer, SPARQL Query Optimization Using Selectivity Estimation, 2007. (Other Publication)
 
This poster describes three static SPARQL optimization approaches for in-memory RDF graphs: (1) a selectivity estimation index (SEI) for single query triple patterns; (2) a query pattern index (QPI) for joined triple patterns; and (3) a hybrid optimization approach that combines both indexes. Using the Lehigh University Benchmark (LUBM), we show that the hybrid approach outperforms other SPARQL query engines such as ARQ and Sesame for in-memory graphs. |
|