Shen Gao, Thomas Scharrenbach, Jörg-Uwe Kietz, Abraham Bernstein, Running out of Bindings? Integrating Facts and Events in Linked Data Stream Processing, In: 4th International Workshop on Ordering and Reasoning, s.n., Aachen, Germany, 2015-10-11. (Conference or Workshop Paper published in Proceedings)
Processing streams of linked data has gained importance over the past years. In many cases, the streams contain events generated by sources such as traffic control systems or news releases. In response to this need, a number of languages and systems aimed at processing linked data streams have been developed. These systems and languages follow one of two pertinent traditions: they perform either complex event processing or stream reasoning. However, both kinds of systems can only simulate system states as a sequence of events.
This paper proposes to model a new kind of data: Facts. Facts are temporal states that are stored in the system and that combine events. Essentially, they trade space complexity for time complexity and reduce the number of intermediate variable bindings compared to other approaches. They also have the advantage of keeping queries relatively simple. In our evaluation, we compile queries for typical sensor-based use cases in TEF-SPARQL (our SPARQL extension supporting Facts), C-SPARQL, and EP-SPARQL to the well-established Event Processing Language (EPL) running on the Esper complex event processing engine. Compared to simulating Facts, we show that modeling Facts directly creates less than 1% of the intermediate bindings and improves throughput by up to 4 times.
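The contrast between simulating a state through repeated events and storing it directly can be illustrated with a small sketch (hypothetical Python, not TEF-SPARQL or the paper's implementation; the `Fact`/`FactStore` names are assumptions): a fact keeps one current temporal state per subject and predicate, so events that merely confirm the existing state produce no new intermediate bindings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    value: object
    start: int                  # timestamp at which the fact became valid
    end: Optional[int] = None   # None while the fact still holds

class FactStore:
    """Keeps one current fact per (subject, predicate) instead of
    re-deriving the state from every event inside a window."""

    def __init__(self):
        self.current = {}   # (subject, predicate) -> open Fact
        self.history = []   # closed facts with a full validity interval

    def on_event(self, subject, predicate, value, ts):
        key = (subject, predicate)
        old = self.current.get(key)
        if old is not None and old.value == value:
            return old                  # state confirmed: no new binding
        if old is not None:
            old.end = ts                # close the superseded state
            self.history.append(old)
        fact = Fact(subject, predicate, value, ts)
        self.current[key] = fact
        return fact

    def holds_at(self, subject, predicate, ts):
        f = self.current.get((subject, predicate))
        return f is not None and f.start <= ts and (f.end is None or ts < f.end)
```

The space-for-time trade-off mentioned in the abstract shows up here: the store keeps facts and their history around, but queries over the current state touch a single entry rather than re-aggregating a window of events.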
Lorenz Fischer, Roi Blanco, Peter Mika, Abraham Bernstein, Timely Semantics: A Study of a Stream-based Ranking System for Entity Relationships, In: The 14th International Semantic Web Conference, Heidelberg, Germany, 2015-10-11. (Conference or Workshop Paper published in Proceedings)
In recent years, search engines have started presenting semantically relevant entity information together with document search results. Entity ranking systems are used to compute recommendations for related entities that a user might also be interested in exploring. Typically, this is done by ranking relationships between entities in a semantic knowledge graph using signals found in a data source as well as type annotations on the nodes and links of the graph. However, the process of producing these rankings can take a substantial amount of time. As a result, entity ranking systems typically lag behind real-world events and present relevant entities with outdated relationships to the search term, or even outdated entities that should be replaced with more recent relations or entities.
This paper presents a study using a real-world stream-processing-based implementation of an entity ranking system to understand the effect of data timeliness on entity rankings. We describe the system and the data it processes in detail. Using a longitudinal case study, we demonstrate (i) that low-latency, large-scale entity relationship ranking is feasible using moderate resources and (ii) that stream-based entity ranking improves the freshness of related entities while maintaining relevance.
Markus Christen, Thema im Fokus „Urteilsfähigkeit“ – Ethische Kernfragen, Thema im Fokus : die Zeitschrift von Dialog Ethik, Vol. 2015 (Oktober), 2015. (Journal Article)
Markus Christen, Thomas Niederberger, Thomas Ott, Suleiman Aryobsei, Reto Hofstetter, Micro-text classification between small and big data, Nonlinear Theory and Its Applications, Vol. 6 (4), 2015. (Journal Article)
Micro-texts emerging from social media platforms have become an important source for research. Automated classification and interpretation of such micro-texts is challenging. The problem is exacerbated if the number of texts is at a medium level, making it too small for effective machine learning, but too big to be efficiently analyzed solely by humans. We present a semi-supervised learning system for micro-text classification that combines machine learning techniques with the unmatched human ability to make demanding, i.e., nonlinear, decisions based on sparse data. We compare our system with human performance and a predefined optimal classifier using a validated benchmark data set.
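The division of labour between machine and human that such a system implies can be sketched as follows (a minimal illustration with assumed names and threshold, not the authors' system): texts the classifier labels with high confidence are accepted automatically, while the rest are routed to the scarcer, more expensive human annotators.

```python
def route(texts, classifier, human_label, threshold=0.8):
    """Label each micro-text automatically when the classifier is confident
    enough, and fall back to a human annotator otherwise.

    classifier(text) -> (label, confidence); human_label(text) -> label.
    """
    labels, to_human = {}, []
    for text in texts:
        label, confidence = classifier(text)
        if confidence >= threshold:
            labels[text] = label        # cheap automatic decision
        else:
            to_human.append(text)       # demanding case: defer to a human
    for text in to_human:
        labels[text] = human_label(text)
    return labels, to_human
```

Lowering the threshold shifts work from humans to the machine; the medium-sized data regime described in the abstract is precisely where tuning this balance matters.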
Thomas Brenner, Modeling of User Preferences using graph-based Recommender Systems, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
Recommender systems have become an important tool for coping with the immense flood of information on the Internet. In recent years, the focus has shifted from merely increasing accuracy to improving user satisfaction by producing more diverse recommendations. This thesis seeks a deeper understanding of diversity and of how users approach it. Users are assigned to one of two groups, a diversity-seeking and a non-diversity-seeking group, and different ways to separate the groups are explained. In a second part, alterations to graph-based recommender systems, i.e., applying the tf-idf scheme and employing users' neighbourhood relations, are discussed. The separation of users into groups and the recommender-system variations are evaluated, and a useful combination for optimizing the results according to a user's preferences is proposed. These new variations of recommender systems succeed in providing more accurate and, at the same time, more diverse recommendations for certain groups of users compared to state-of-the-art recommender systems.
David Arpad Pinezich, Crowdsourced recognition of recoil black holes, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
This thesis focuses on the "Blackhole Chaser", a crowdsourcing platform prototype that helps to find new recoil black holes. It first gives a general overview of crowdsourcing and related research fields. It then explains how the "Blackhole Chaser" was built and how its architecture was planned. After that, it describes how different kinds of users (paid crowd workers, IT specialists, and professionals) act on the platform and whether their classifications coincide, based on their individual cognitive skills, which were tested prior to using the platform with the renowned ETS testing framework. The thesis closes with a conclusion and a list of future tasks.
Sofia Orlova, Interactive Advertising Analytics, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
Advertisement is everywhere. Whether we are aware of it or not, we are exposed to 2,500 to 10,000 ads daily. We are used to customized ads from Google Search, YouTube, and several other big players, but not every industry has come that far yet. In fact, one of the oldest communication media, television, does not show customized ads yet: the demographic information needed for such customization simply was not available. Online television portals, however, are gaining more and more users, who register with the necessary data, which makes their habits traceable. Given this development, tools to recognize the ads in live streams are required in order to use the demographic information of the users and propose customized ads to them. This thesis describes how I built such a tool, which compares video streams to ads based on their colour distributions. This comparison mechanism can be extended with further recognition features and combined with the extraction of demographic data.
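The core comparison step, matching stream frames to known ads by colour distribution, could look roughly like this (a numpy sketch with assumed function names, bin counts, and threshold, not the actual tool): build a normalized per-channel colour histogram for a frame and compare it to an ad's histogram.

```python
import numpy as np

def colour_histogram(pixels, bins=8):
    """Normalized per-channel colour histogram of an (N, 3) RGB pixel array."""
    hists = [np.histogram(pixels[:, channel], bins=bins, range=(0, 256))[0]
             for channel in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """1.0 for identical distributions, 0.0 for completely disjoint ones."""
    return float(np.minimum(h1, h2).sum())

def matches_ad(frame_pixels, ad_hist, threshold=0.9):
    """Flag a frame as belonging to an ad when its colour distribution
    overlaps the ad's reference histogram strongly enough."""
    return histogram_intersection(colour_histogram(frame_pixels), ad_hist) >= threshold
```

Histogram intersection is only one of several common histogram distances; it has the convenient property of being bounded in [0, 1] for normalized histograms, which makes the threshold easy to interpret.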
Fabian Christoffel, Bibek Paudel, Chris Newell, Abraham Bernstein, Blockbusters and Wallflowers: Speeding up Diverse and Accurate Recommendations with Random Walks, In: 9th ACM Conference on Recommender Systems RecSys 2015, ACM Press, New York, NY, USA, 2015-09-16. (Conference or Workshop Paper published in Proceedings)
User satisfaction often depends on providing accurate and diverse recommendations. In this paper, we explore algorithms that exploit random walks as a sampling technique to obtain diverse recommendations without compromising on efficiency and accuracy. Specifically, we present a novel graph vertex ranking recommendation algorithm called RP3β that re-ranks items based on 3-hop random walk transition probabilities. We show empirically that RP3β provides accurate recommendations with a high long-tail item frequency at the top of the recommendation list. We also present approximate versions of RP3β and of the two most accurate previously published vertex ranking algorithms based on random walk transition probabilities, and show that these approximations converge with an increasing number of samples.
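The re-ranking idea can be sketched in a few lines of numpy (a dense, exact sketch for illustration, not the authors' sampling-based implementation; the matrix encoding and helper name are assumptions): compute the 3-hop random-walk transition probability from each user to each item on the user-item graph, then damp each item's score by its popularity raised to β, which promotes long-tail items.

```python
import numpy as np

def rp3beta_scores(R, beta=0.5):
    """RP3beta-style recommendation scores on a binary user-item matrix R
    of shape (users, items); assumes every user and item has at least one
    interaction so the degree divisions are well-defined."""
    user_deg = R.sum(axis=1, keepdims=True)    # interactions per user
    item_deg = R.sum(axis=0, keepdims=True)    # popularity per item
    P_ui = R / user_deg                        # user -> item transition step
    P_iu = (R / item_deg).T                    # item -> user transition step
    P3 = P_ui @ P_iu @ P_ui                    # 3-hop user -> item probability
    return P3 / item_deg ** beta               # demote popular items
```

With beta = 0 the scores are plain 3-hop transition probabilities (each user's row sums to 1); increasing beta trades some of that popularity signal for diversity.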
Dmitry Moor, Tobias Grubenmann, Sven Seuken, Abraham Bernstein, A Double Auction for Querying the Web of Data, In: The Third Conference on Auctions, Market Mechanisms and Their Applications, ACM, New York, USA, 2015-09-08. (Conference or Workshop Paper published in Proceedings)
Lorenz Fischer, Shen Gao, Abraham Bernstein, Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization, In: 2015 IEEE International Conference on Cluster Computing (CLUSTER 2015), IEEE Computer Society, 2015-09-08. (Conference or Workshop Paper published in Proceedings)
Modern distributed computing frameworks such as Apache Hadoop, Spark, or Storm distribute the workload of applications across a large number of machines. Whilst they abstract away the details of distribution, they do require the programmer to set a number of configuration parameters before deployment. These parameter settings (usually) have a substantial impact on execution efficiency. Finding the right values for these parameters is considered a difficult task and requires domain, application, and framework expertise.
In this paper, we propose a machine learning approach to the problem of configuring a distributed computing framework. Specifically, we propose using Bayesian Optimization to find good parameter settings. In an extensive empirical evaluation, we show that Bayesian Optimization can effectively find good parameter settings for four different stream processing topologies implemented in Apache Storm, resulting in significant gains over a parallel linear approach.
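Bayesian Optimization itself can be sketched compactly (a from-scratch numpy illustration with an RBF-kernel Gaussian process surrogate and expected improvement as the acquisition function; the paper applies the technique to Storm configuration parameters, and every name, kernel choice, and constant below is an assumption): fit a surrogate to the evaluations seen so far, then evaluate next wherever the expected improvement over the best known setting is highest.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf_kernel(A, B, length=1.0):
    """Squared-exponential kernel between two point sets of shape (n, d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * (diff ** 2).sum(-1) / length ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    """Gaussian-process posterior mean and variance at query points Xq."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf_kernel(Xq, Xq) - Ks.T @ K_inv @ Ks)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    """EI for minimization: expected amount by which a candidate beats `best`."""
    sigma = np.sqrt(var)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.array([erf(v / sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(objective, bounds, n_init=3, n_iter=10, seed=0):
    """Minimize a 1-D objective by repeatedly evaluating the random
    candidate with the highest expected improvement."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))
    y = np.array([objective(x[0]) for x in X])
    for _ in range(n_iter):
        candidates = rng.uniform(lo, hi, size=(256, 1))
        mu, var = gp_posterior(X, y, candidates)
        x_next = candidates[np.argmax(expected_improvement(mu, var, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next[0]))
    return X[np.argmin(y), 0], y.min()
```

In the configuration-tuning setting, `objective` would be an expensive full run of a topology under a candidate parameter setting, which is why spending a little computation on the surrogate to pick each next evaluation pays off.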
Basil Philipp, A Flexible Viewership Analytics System for Online TV, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
The technologies used by online television providers make it possible to collect significantly more information on viewer behaviour than is possible with traditional, panel-based measurements. The fragmented market and the large data size call for novel approaches to handle this data and turn it into valuable insights. We propose a system that can deal with multiple data sources and offers advanced analyses of the data. We demonstrate its capabilities with an exemplary market analysis, an audience-flow analysis, and a viewership prediction.
András Heé, Large-Scale Social Network Analysis with the igraph Toolbox and Signal/Collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
In recent years, the processing of huge graphs with millions or even billions of vertices and edges has become feasible thanks to highly scalable distributed frameworks. Current systems, however, struggle to provide a high-level language abstraction that allows data scientists to express large-scale data analysis tasks. Our contribution has two main goals. Firstly, we build a generic network analysis toolbox (NAT) on top of Signal/Collect, a vertex-centric graph processing framework, to support integration into existing statistical and scientific programming environments; we deliver an interface to the popular network analysis tool igraph. Secondly, we address the challenge of porting social network analysis and graph exploration algorithms to the vertex-centric programming model, finding implementations that neither operate on adjacency-matrix representations of the graphs nor rely on global state.
Patrick Winzenried, A domain specific bidding language for SPARQL, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
With the increasing popularity of the Semantic Web, the innovative use of value-adding solutions such as a SPARQL query market is not only a promising but also an important step towards an economically viable Semantic Web.
First, however, a precise description of a domain-specific bidding language for SPARQL is needed in order to allow users not only to retrieve data but also to express matching valuations.
The purpose of this master's thesis is therefore to assess, through exploratory, semi-structured interviews (n = 25), the most important requirements and valuation functions. Based on this requirements engineering, it is possible to formulate the required features and to propose a bidding language that can represent the most frequently discussed valuation expressions.
On the basis of the results of this research, it can be concluded that the demand for quality and the possible use of piecewise-affine-linear value functions depending on the result cardinality are among the most important requirements.
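A piecewise-affine-linear value function over result cardinality, of the kind the interviews point to, can be illustrated as follows (hypothetical Python; the segment encoding is an assumption, not the thesis's bidding-language syntax): each segment is a threshold plus an affine function, and the last segment whose threshold the cardinality reaches applies.

```python
def piecewise_linear_value(cardinality, segments):
    """Value of a query answer as a piecewise-affine-linear function of its
    result cardinality.

    segments: list of (threshold, slope, intercept) tuples sorted by
    threshold; the last segment whose threshold is reached applies.
    """
    value = 0.0
    for threshold, slope, intercept in segments:
        if cardinality >= threshold:
            value = slope * cardinality + intercept
    return value
```

A bidder who chooses the intercepts so that adjacent segments meet at the thresholds obtains a continuous value function, e.g. full marginal value for the first 10 results, half for the next 10, and nothing beyond 20.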
Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein, Approximate Continuous Query Answering Over Streams and Dynamic Linked Data Sets, In: 15th International Conference on Web Engineering, Switzerland, 2015-06-23. (Conference or Workshop Paper published in Proceedings)
Markus Christen, J Domingo-Ferrer, Dominik Herrmann, Jeroen van den Hoven, Beyond informed consent – investigating ethical justifications for disclosing, donating or sharing personal data in research, In: Joint conference of the International Society for Ethics and Information Technology and the International Association for Computing and Philosophy, s.n., 2015-06-22. (Conference or Workshop Paper published in Proceedings)
Yiftach Nagar, Patrick De Boer, Ana Cristina Bicharra Garcia, James W. Pennebaker, Klemens Mang, Aiding Expert Judges in Open-Innovation Challenges, In: Collective Intelligence 2015. 2015. (Conference Presentation)
Patrick De Boer, Abraham Bernstein, PPLib: towards systematic crowd process design using recombination and auto-experimentation, In: Collective Intelligence 2015, University of Michigan, Santa Clara, CA, 2015-05-31. (Conference or Workshop Paper)
Christian Rupietta, Uschi Backes-Gellner, Abraham Bernstein, Effectiveness of Small Coaching Activities in Massive Open Online Courses: Evidence from a Randomized Experiment, In: 49th Annual Conference of the Canadian Economics Association. 2015. (Conference Presentation)
Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein, Online view maintenance for continuous query evaluation, In: WWW 2015, s.n., 2015-05-18. (Conference or Workshop Paper published in Proceedings)
In Web stream processing, there are queries that integrate Web data of varying velocity, broadly categorized as streaming (i.e., fast-changing) and background (i.e., slow-changing) data. The introduction of local views on the background data speeds up the query answering process, but requires maintenance processes to keep the replicated data up to date. In this work, we study the problem of maintaining local views in a Web setting, where background data are usually stored remotely, are exposed through services with constraints on data access (e.g., invocation rate limits and data access patterns) and, contrary to the database setting, do not provide streams of changes to their content. We then propose an initial solution: WBM, a method to maintain the content of the view with regard to query- and user-defined constraints on accuracy and responsiveness.
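The maintenance trade-off can be illustrated with a toy refresh policy (all names are assumptions, and WBM itself is more sophisticated, additionally honouring accuracy and responsiveness constraints): given a limited invocation budget per window, refresh the replicated views expected to have drifted the most from their remote sources.

```python
def pick_views_to_refresh(views, now, budget):
    """Choose which local views to refresh under an invocation rate limit.

    views: list of (view_id, last_refresh_ts, change_rate) tuples, where
    change_rate approximates how quickly the remote source drifts.
    Returns the `budget` view ids with the highest estimated staleness.
    """
    staleness = [(rate * (now - last_refresh), view_id)
                 for view_id, last_refresh, rate in views]
    staleness.sort(reverse=True)           # stalest views first
    return [view_id for _, view_id in staleness[:budget]]
```

Because the remote services expose no change stream, the change rate itself has to be estimated, e.g. from differences observed across past refreshes; that estimation is exactly where the hard part of the problem lies.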
Philip Stutz, Bibek Paudel, Coralia-Mihaela Verman, Abraham Bernstein, Random-walk triplerush: asynchronous graph querying and sampling, In: 24th International World Wide Web Conference (WWW 2015), International World Wide Web Conferences Steering Committee Republic and Canton of Geneva, 2015-05-18. (Conference or Workshop Paper published in Proceedings)
Most Semantic Web applications rely on querying graphs, typically by using SPARQL with a triple store. Increasingly, applications also analyze properties of the graph structure to compute statistical inferences. The current Semantic Web infrastructure, however, does not efficiently support such operations. Hence, developers have to painstakingly retrieve the relevant data for statistical post-processing.
In this paper, we propose to rethink query execution in a triple store as a highly parallelized asynchronous graph exploration on an active index data structure. This approach also makes it possible to integrate SPARQL querying with the sampling of graph properties.
To evaluate this architecture, we implemented Random Walk TripleRush, which is built on a distributed graph processing system and operates by routing query and path descriptions through a novel active index data structure. In experiments, we find that our architecture can be used to build a competitive distributed graph store. Thanks to its asynchronous architecture, it can often return first results quickly. We show that our architecture supports the execution of various types of random walks with restarts that sample interesting graph properties. We also evaluate the scalability and show that the architecture supports fast answer times even on a dataset with more than a billion triples.
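The kind of sampling such an architecture supports can be illustrated in plain Python (a single-walker sketch with assumed names, unrelated to the distributed TripleRush implementation): a random walk with restarts from a query vertex approximates personalized-PageRank-style relevance scores by counting visits.

```python
import random

def rwr_visit_frequencies(graph, start, restart_prob=0.15, steps=10000, rng=None):
    """Approximate personalized-PageRank-style scores for `start` by running
    one long random walk with restarts and counting vertex visits.

    graph: dict mapping each vertex to the list of its out-neighbours.
    """
    rng = rng or random.Random(0)
    counts = {vertex: 0 for vertex in graph}
    current = start
    for _ in range(steps):
        counts[current] += 1
        neighbours = graph[current]
        if not neighbours or rng.random() < restart_prob:
            current = start                 # restart at the query vertex
        else:
            current = rng.choice(neighbours)  # continue the walk
    return {vertex: count / steps for vertex, count in counts.items()}
```

In a distributed, vertex-centric setting the same computation is expressed by routing many walker messages through the graph concurrently rather than stepping a single walker, which is what makes the asynchronous active-index design a good fit.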