
Contributions published at Data Analytics (Ingo Scholtes)

Joel Leupp, Interactive Visualization of Scientific Collaboration Networks based on Graph Neural Networks, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Collaborative research is becoming increasingly important and is associated with higher productivity and higher-quality research output. It also serves as a key mechanism for knowledge diffusion within a research community. In this thesis, collaborations in computer science, as captured by co-authorships of research publications, are analyzed and visualized to uncover the social interactions and relationships between authors, institutions and countries. Bibliographic data from DBLP and detailed author information from CSRankings are collected to create a large-scale collaboration network that includes 76’546 publications from 127 conferences, 148’379 collaborations, and 14’555 authors from 597 institutions located in 55 countries. An exploratory data analysis is conducted, and the network is visualized using recent advances in deep learning on graphs, namely Graph Convolutional Networks. The thesis introduces the publicly available CSCollab tool, which allows the network to be filtered by geographical scope, research area and year of publication. It provides an interactive geographical map of the network, an interactive graph visualization, various visualizations of network analytics and statistics, and features to explore the underlying collaboration data.
Baris Özakar, Time-Aware Centralities and Embeddings of Nodes for Influence Prediction in Evolving Socio-Financial Networks, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Financial markets are complex and constantly evolving systems, where investors make decisions based on market conditions, company performance, and global economic trends. However, recent studies suggest that peer effects can also play a significant role in shaping investment decisions. Peer effects refer to the influence that one's peers have on one's decision-making; in the context of financial decision-making, they can cause investors to follow trends in herding behavior. This influence process can result in cascading behavior, where the actions of a few investors trigger a chain reaction of buying or selling, leading to significant price movements. The impact of peer effects has been amplified by social networks, which have revolutionized the way we communicate and share information. In this thesis, we investigate the role of peer effects in financial markets and their impact on cascading behavior. Using a real-life evolving socio-financial network, we aim to quantify the extent to which individual investors influence the generation of cascading behavior, with a particular focus on the spatio-temporal features of individual investors within the network. We formulate a prediction task that forecasts the influence of individual users by utilizing various centrality measures and time-aware node embeddings. We evaluate the effectiveness of these centrality measures and time-aware node embeddings in predicting the influence of users in generating cascades of trades through the network. Our study contributes to a better understanding of the spatio-temporal factors that facilitate cascading behavior in financial markets, highlighting the need to understand their impact in various contexts, including real-life socio-financial networks.
Lisi Qarkaxhija, Vincenzo Perri, Ingo Scholtes, De Bruijn goes Neural: Causality-Aware Graph Neural Networks for Time Series Data on Dynamic Graphs, In: Learning on Graphs Conference, 2022. (Conference or Workshop Paper published in Proceedings)
Christopher Blöcker, Jelena Smiljanic, Ingo Scholtes, Martin Rosvall, Similarity-based Link Prediction from Modular Compression of Network Flows, In: Learning on Graphs Conference, 2022. (Conference or Workshop Paper published in Proceedings)
Christoph Gote, Vincenzo Perri, Ingo Scholtes, Predicting Influential Higher-Order Patterns in Temporal Network Data, In: IEEE/ACM International Conference on Advances in Social Network Analysis and Mining, 2022. (Conference or Workshop Paper published in Proceedings)
Laura Salathe, Predicting Human Error in Geolocation Tasks Using Online Metadata - An Exploratory Study, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) This thesis addresses the problem of predicting human errors in image geolocation tasks with the help of collected metadata. For this purpose, an interactive experiment architecture was designed that collects mouse coordinates, click events, timestamps, and further human interaction data. One hundred participants took part in the experiment and solved various geolocation tasks. The collected data was used to train different machine learning classifiers, such as logistic regression models, k-nearest neighbors, and support vector machines. The best-performing model predicts human errors in geolocation tasks only to a limited extent: its accuracy on unseen test data is 10% higher than random chance and 4% better than the simplest rule-based model, which classifies all answers as correct.
Andris Prokofjevs, Digital Message in a Bottle, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) This thesis designs and builds a Twitter-like network that uses short-range wireless technologies such as WiFi or Bluetooth and leverages the benefits of the Elixir programming language. Bluetooth is chosen over WiFi. A system design is described, which illustrates how the network should behave in different situations and how messages are routed. A unique GMM (Get Missing Messages) approach is developed to make message exchange as cheap as possible in terms of the amount of data transferred. Finally, a prototype is built and presented, and its drawbacks, ways to address them, and potential future development of the system are discussed.
Vincenzo Perri, Lisi Qarkaxhija, Albin Zehe, Andreas Hotho, Ingo Scholtes, One Graph to Rule them All: Using NLP and Graph Neural Networks to analyse Tolkien's Legendarium, In: Proceedings of the Computational Humanities Research Conference 2022, CHR 2022, Antwerp, Belgium, December 12-14, 2022, CEUR-WS.org, 2022. (Conference or Workshop Paper published in Proceedings)
Timothy LaRock, Ingo Scholtes, Tina Eliassi-Rad, Sequential motifs in observed walks, J. Complex Networks, Vol. 10 (5), 2022. (Journal Article)
Leonore Röseler, Ingo Scholtes, Bernhard Sendhoff, Aniko Hannak, Willing to Revise? Confidence and Recommendation Adoption in AI-Assisted Image Recognition, In: HHAI 2022: Augmenting Human Intellect - Proceedings of the First International Conference on Hybrid Human-Artificial Intelligence, Amsterdam, The Netherlands, 13-17 June 2022, IOS Press, 2022. (Conference or Workshop Paper published in Proceedings)
Christoph Gote, Pavlin Mavrodiev, Frank Schweitzer, Ingo Scholtes, Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set, In: 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022, ACM, 2022. (Conference or Workshop Paper published in Proceedings)
Luka Petrovic, Ingo Scholtes, Learning the Markov Order of Paths in Graphs, In: WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25 - 29, 2022, ACM, 2022. (Conference or Workshop Paper published in Proceedings)
Ziawasch Abedjan, Thomas Bendig, Ulf Brefeld, Joachim Bürkle, Jörg Desel, Stefan Edlich, Thomas Eppler, Michael Goedicke, Nils Hachmeister, Jens Heidrich, Stefan Höppner, Stefan M Kast, Daniel Krupka, Klaus Lang, Peter Liggesmeyer, Lisa Meiser, Ingo Scholtes, Marina Tropmann-Frick, Empfehlungen für Masterstudiengänge „Data Science“ – auf Basis eines Bachelors in (Wirtschafts-)Informatik oder Mathematik [Recommendations for "Data Science" Master's programmes building on a Bachelor's degree in (Business) Informatics or Mathematics], Gesellschaft für Informatik e.V., 2021. (Journal Article)
Christoph Gote, Vincenzo Perri, Ingo Scholtes, Predicting Influential Higher-Order Patterns in Temporal Network Data, CoRR, Vol. abs/2107.12100, 2021. (Journal Article) Networks are frequently used to model complex systems composed of interacting elements. While links capture the topology of direct interactions, the true complexity of many systems originates from higher-order patterns in paths by which nodes can indirectly influence each other. Path data, representing ordered sequences of consecutive direct interactions, can be used to model these patterns. However, to avoid overfitting, such models should only consider those higher-order patterns for which the data provide sufficient statistical evidence. On the other hand, we hypothesise that network models, which capture only direct interactions, underfit higher-order patterns present in data. Consequently, both approaches are likely to misidentify influential nodes in complex networks. We address this issue by proposing eight centrality measures based on MOGen, a multi-order generative model that accounts for all paths up to a maximum distance but disregards paths at higher distances. We compare MOGen-based centralities to equivalent measures for network models and path data in a prediction experiment where we aim to identify influential nodes in out-of-sample data. Our results show strong evidence supporting our hypothesis. MOGen consistently outperforms both the network model and path-based prediction. We further show that the performance difference between MOGen and the path-based approach disappears if we have sufficient observations, confirming that the error is due to overfitting.
Unai Alvarez-Rodriguez, Luka V. Petrović, Ingo Scholtes, Inference of time-ordered multibody interactions, arXiv preprint arXiv:2111.14611, 2021. (Journal Article) We introduce time-ordered multibody interactions to describe complex systems manifesting temporal as well as multibody dependencies. First, we show how the dynamics of multivariate Markov chains can be decomposed into ensembles of time-ordered multibody interactions. Then, we present an algorithm to extract combined interactions from data and a measure to characterize the complexity of interaction ensembles. Finally, we experimentally validate the robustness of our algorithm against statistical errors and its efficiency at obtaining simple interaction ensembles.
Luka V. Petrović, Ingo Scholtes, PaCo: Fast Counting of Causal Paths in Temporal Network Data, In: Companion Proceedings of the Web Conference 2021, Association for Computing Machinery, New York, NY, USA, 2021. (Conference or Workshop Paper published in Proceedings) Graph or network representations are an important foundation for data mining and machine learning tasks in relational data. Many tools of network analysis, like centrality measures, information ranking, or cluster detection, rest on the assumption that links capture direct influence, and that paths represent possible indirect influence. This assumption is invalidated in time series data capturing, e.g., time-stamped social interactions, time-resolved co-occurrences or other types of relational time series. In such data, for two time-stamped links (A,B) and (B,C), the chronological ordering and timing determine whether a causal path from node A via B to C exists. A number of works have shown that, for this reason, network analysis cannot be directly applied to time-stamped data. Existing methods to address this issue require statistics on causal paths, which are computationally challenging to obtain for big time series data. Addressing this problem, we develop an efficient algorithm to count causal paths in time-stamped network data. Applying it to empirical data, we show that our method is more efficient than a baseline method implemented in an open-source data analytics package. Our method works efficiently for different values of the maximum time difference between consecutive links of a causal path and supports streaming scenarios. With it, we are closing a gap that hinders an efficient analysis of large temporal networks.
Timothy LaRock, Ingo Scholtes, Tina Eliassi-Rad, Sequential Motifs in Observed Walks, arXiv preprint arXiv:2112.05642, 2021. (Journal Article) The structure of complex networks can be characterized by counting and analyzing network motifs. Motifs are small subgraphs that occur repeatedly in a network, such as triangles or chains. Recent work has generalized motifs to temporal and dynamic network data. However, existing techniques do not generalize to sequential or trajectory data, which represent entities moving through the nodes of a network, such as passengers moving through transportation networks. The unit of observation in these data is fundamentally different, since we analyze full observations of trajectories (e.g., a trip from airport A to airport C through airport B), rather than independent observations of edges or snapshots of graphs over time. In this work, we define sequential motifs in trajectory data, which are small, directed, and edge-weighted subgraphs corresponding to patterns in observed sequences. We draw a connection between counting and analysis of sequential motifs and Higher-Order Network (HON) models. We show that by mapping edges of a HON, specifically a k-th-order De Bruijn graph, to sequential motifs, we can count and evaluate their importance in observed data. We test our methodology with two datasets: (1) passengers navigating an airport network and (2) people navigating the Wikipedia article network. We find that the most prevalent and important sequential motifs correspond to intuitive patterns of traversal in the real systems, and show empirically that the heterogeneity of edge weights in an observed higher-order De Bruijn graph has implications for the distributions of sequential motifs we expect to see across our null models.
Vincenzo Perri, Ingo Scholtes, Visualisation of Temporal Network Data via Time-Aware Static Representations with HOTVis, In: Companion Proceedings of the Web Conference 2021, Association for Computing Machinery, New York, NY, USA, 2021. (Conference or Workshop Paper) The visual analysis of temporal network data is often hindered by the cognitively demanding nature of dynamic graph visualizations. Addressing this issue, the graph visualization tool HOTVis generates time-aware static network visualizations that highlight the causal topology of temporal networks, i.e., which nodes can directly and indirectly influence each other, and are thus considerably easier to interpret than state-of-the-art dynamic graph visualizations.
Tina Eliassi-Rad, Vito Latora, Martin Rosvall, Ingo Scholtes, Higher-Order Graph Models: From Theoretical Foundations to Machine Learning (Dagstuhl Seminar 21352), Dagstuhl Reports, Vol. 11 (7), 2021. (Journal Article) Graph and network models are essential for data science applications in computer science, social sciences, and life sciences. They help to detect patterns in data on dyadic relations between pairs of genes, humans, or documents, and have improved our understanding of complex networks across disciplines. While the advantages of graph models of relational data are undisputed, we often have access to data with multiple types of higher-order relations not captured by simple graphs. Such data arise in social systems with non-dyadic or group-based interactions, multi-modal transportation networks with multiple connection types, or time series containing specific sequences of nodes traversed on paths. The complex relational structure of such data questions the validity of graph-based data mining and modelling, and jeopardises interdisciplinary applications of network analysis and machine learning. To address this challenge, researchers in topological data analysis, network science, machine learning, and physics recently started to generalise network analysis to higher-order graph models that capture more than dyadic relations. These higher-order models differ from standard network analysis in assumptions, applications, and mathematical formalisms. As a result, the emerging field lacks a shared terminology, common challenges, benchmark data and metrics to facilitate fair comparisons. By bringing together researchers from different disciplines, Dagstuhl Seminar 21352 "Higher-Order Graph Models: From Theoretical Foundations to Machine Learning" aimed at the development of a common language and a shared understanding of key challenges in the field that foster progress in data analytics and machine learning for data with complex relational structure. 
This report documents the program and the outcomes of this seminar.
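The abstract of "Sequential Motifs in Observed Walks" describes counting sequential motifs by mapping edges of a k-th-order De Bruijn graph to motifs. A minimal Python sketch of that idea, under simplifying assumptions: each edge of the k-th-order De Bruijn graph links two overlapping k-node tuples and is weighted by how often the corresponding (k+1)-node subpath occurs in the observed walks; each subpath is then reduced to a hypothetical "shape" by relabeling its nodes in order of first appearance, so (A, B, A) and (C, B, C) both become (0, 1, 0). This shape is an illustrative stand-in, not necessarily the paper's exact motif definition, and the null models are not covered; all function names are hypothetical.

```python
from collections import Counter


def de_bruijn_edge_counts(walks, k):
    """Edge weights of a k-th-order De Bruijn graph built from observed walks.

    Each edge links two overlapping k-node tuples and corresponds to one
    (k+1)-node subpath; its weight is how often that subpath occurs.
    """
    counts = Counter()
    for walk in walks:
        for i in range(len(walk) - k):
            subpath = tuple(walk[i:i + k + 1])
            counts[(subpath[:-1], subpath[1:])] += 1
    return counts


def motif_shape(subpath):
    """Relabel nodes by order of first appearance, e.g. (A, B, A) -> (0, 1, 0)."""
    labels = {}
    return tuple(labels.setdefault(node, len(labels)) for node in subpath)


def count_motif_shapes(walks, k):
    """Aggregate De Bruijn edge weights by the shape of their subpath."""
    shapes = Counter()
    for (src, dst), weight in de_bruijn_edge_counts(walks, k).items():
        subpath = src + dst[-1:]  # rebuild the (k+1)-node subpath from the edge
        shapes[motif_shape(subpath)] += weight
    return shapes


walks = [["A", "B", "C"], ["A", "B", "A"], ["C", "B", "C", "B"]]
print(count_motif_shapes(walks, k=2))  # Counter({(0, 1, 0): 3, (0, 1, 2): 1})
```

Here three of the four observed second-order subpaths are back-and-forth traversals, and only A → B → C is a simple chain through three distinct nodes.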
