# Computer Science

## Computer Science School Seminar

### Coordinator: Elazar Goldenberg

Monday, May 22, 2023 at 1400, Weston 127.

### Speaker: Hila Naaman.

Monday, January 16, 2023 at 1400, Weston 127.

### Minimal Influential Sets in Majority Dynamics

We investigate the minimum set of vertices required to take over a network under majority dynamics. We demonstrate that in typical networks, this set must be relatively large, scaling linearly with the size of the network. However, we also identify special cases where the bound on the size of the set is significantly smaller. Additionally, we discuss the specific vertices that make up this minimal influential set.

Joint with Itai Arieli, Galit Golan-Ashkenazi.

Monday, January 2, 2023 at 1400, Staff conference room.

### עיתונות בתנועה: שימוש באלגוריתמים ובניתוח רשתות לחקר עיתונות במאה ה-19

בשנים האחרונות חל גידול במה שמכונה "מדעי הרוח הדיגיטליים", חקר מדעי הרוח באמצעות כלים חישוביים. מחקרי הנוכחי, שאת הרעיון מאחוריו אציג בסמינר, בוחן תהליכי האחדה ופירוד בחברה האמריקאית במאה ה-19 באמצעות ניתוח קורפוס נרחב של מאות אלפי גיליונות של עיתונים אמריקאים בין 1840 ל -1884. המחקר מיישם מספר גישות וטכניקות חישוביות לצורך זיהוי ואפיון המבנה, התכנים והדינמיקה של הרשתות העיתונאיות באמריקה. בין השאר, אנו משתמשים בכלים שפותחו לזיהוי מקוריות (plagiarism detection) כדי לאתר שימוש חוזר בטקסטים ובמידול נושאי (topic modeling) כדי לזהות צבירים (clusters) של תוכן בזמן ובמרחב. התוצאה הכוללת היא רשת רשת קשרים בין עיתונים בזמן ובמרחב, שתאפשר לזהות את הכיוון והקצב של זרימת חדשות. מחקר זה זכה השנה במענק ארבע שנתי של הקרן הלאומית למדעים (ISF)

מעבר לשאלות ההיסטוריות הצצות במחקר זה, הקשורות ביחסי הגומלין בין התקשורת האמריקאית עם התפתחויות פוליטיות, אידיאולוגיות, חברתיות וטכנולוגיות, ישנן שאלות רבות הקשורות למדעי הנתונים. כיצד מעבדים יחד את המשתנים הרבים הקשורים לרשת זו, כיצד מייצגים את הרשת וקשריה, וכיצד מבצעים את המעבר בין ה-data להבנה ההיסטורית.

בסמינר אציג את שתי המתודות (זיהוי מקוריות ומידול נושאי) ואת התועלת שלהן לחקר רשתות עיתונות היסטורית. במקביל, אציג את הקשיים הנובעים מיישום של כלים חישוביים לחקר ההיסטוריה.

Monday, December 12, 2022 at 1400, Staff conference room.

### Streaming algorithms for submodular maximization with a cardinality constraint

Motivated by machine learning applications, much research over the last decade was devoted to solving submodular maximization problems under Big Data computational models. Perhaps the most basic such problem is the problem of maximizing a submodular function subject to a cardinality constraint. A recent series of papers has studied this problem in the data stream model, and in particular, fully determined the approximation ratio that can be obtained for it by (semi-)-streaming algorithms both when the objective function is monotone and non-monotone. In this talk we will survey the main ideas behind these tight results.

Based on joint works with Naor Alaluf, Alina Ene, Huy Nguyen, Ashkan Norouzi-Fard, Andrew Suh, Ola Svensson and Rico Zenklusen.

Monday, November 28, 2022 at 1400, Staff conference room.

### Quantum Computing -- Today and Tomorrow

Quantum computers are at an active stage of research and development that may eventually lead to their practical realization, with potential impact on many fields from chemistry and medication, to financial risk analysis and AI. In this talk, we will review the basic concepts of quantum computing: qubits, measurements, quantum gates, quantum circuits, superposition, entanglement etc, and briefly present IBM's quantum software and hardware roadmap.

We will focus on our IBM Quantum Israel team research and open-source code contribution in experimentalists tools and efficient quantum circuit synthesis.

Dr. Shelly Garion is a researcher at IBM Research Israel at the Quantum Computing group, and a contributor to Qiskit open-source

Monday, May 2, 2022 at 1400, Weston 127.

### Risk Aware Stochastic Placement of Cloud services

Allocating the right amount of resources to each service in any of the datacenters in a cloud environment is a very difficult task. This task becomes much harder due to the dynamic nature of the workload and the fact that while long term statistics about the demand may be known, it is impossible to predict the exact demand at each point in time. As a result, service providers either over allocate resources and hurt the service cost efficiency, or run into situations where the allocated local resources are insufficient to support the current demand. In these cases, the service providers deploy overflow mechanisms, such as redirecting traffic to a remote datacenter or temporarily leasing additional resources (at a higher price) from the cloud infrastructure owner. The additional cost is in many cases proportional to the amount of overflow demand.

In this work we study this approach and develop a novel mechanism to assign services to datacenters based on the available resources in each datacenter and the distribution of the demand for each service. We use comprehensive analysis to prove that the overall overflow cost is almost optimal for arbitrary demand distributions, as long as there are no dependencies among the services. We further show, using simulation based on real data, that the scheme performs very well on realistic service workloads.

Monday, April 4, 2022 at 1400, Weston 127.

### Localizing and repairing problems in specifications for reactive synthesis

In recent years reactive synthesis from temporal logic specifications has attracted much research interest. Synthesis aims to automatically produce correct by construction models, circuits, implementations, etc. from specifications. In this talk we focus on reactive synthesis from temporal logic specifications. Reactive synthesis creates reactive systems, i.e., systems that receive inputs and produce outputs at every computation step, and are expected to run indefinitely. We focus also on GR(1) specifications. GR(1) is an expressive fragment of Linear Temporal Logic (LTL) that allows efficient synthesis. Several synthesis tools exist for GR(1) with many applications, including industrial ones.

Writing correct and complete formal specifications, reading and understanding formal specifications, and repairing them as needed, are all very challenging tasks. Thus, synthesis in itself is not enough, and tools for assisting engineers in these tasks are essential. These tools must also be efficient enough to deliver results in reasonable time.

We present two works that deal with unrealizability, the case where specifications have no correct implementation. The first work localizes causes for unrealizability. The second work automatically repairs unrealizable specifications. Finally, we will briefly discuss two other works that improve the quality of specifications and synthesized controllers, by locating, explaining, and removing redundant specification elements.

Monday, February 28, 2022 at 1400, Weston 127.

### Manipulation-resistant false-name-proof facility location mechanisms for complex graphs

In many real-life scenarios, a group of agents needs to agree on a common action, e.g., on a location for a public facility, while there is some consistency between their preferences, e.g., all preferences are derived from a common metric space. The facility location problem models such scenarios, and it is a well-studied problem in social choice. We study mechanisms for facility location on unweighted undirected graphs that are resistant to manipulations (strategy-proof, abstention-proof, and false-name-proof) by both individuals and coalitions on one hand and anonymous and efficient (Pareto-optimal) on the other. We define a new family of graphs, ZV-line graphs, and show a general facility location mechanism for these graphs that satisfies all these desired properties.

This mechanism can also be computed in polynomial time, and it can equivalently be defined as the first Pareto-optimal location according to some predefined order. Our main result, the ZV-line graphs family and the mechanism we present for it, unifies all works in the literature of false-name-proof facility location on discrete graphs, including the preliminary (unpublished) results we are aware of. In particular, we show mechanisms for all graphs of at most five vertices, discrete trees, bicliques, and clique tree graphs.

Finally, we discuss some generalizations and limitations of our result for facility location problems on other structures: Weighted graphs, large discrete cycles, infinite graphs, and facility location problems concerning infinite societies.

Monday, February 14, 2022 at 1400, Weston 127.

### A Computational Approach to the Study of Bilingualism

Major features of bilingualism, including grammatical, cognitive, and social aspects, have been extensively studied by scholars for over half a century. Crucially, much of this research has been conducted with small, carefully-curated datasets or in a laboratory experimental setup. Nowadays, the ubiquity of English as an online lingua franca offers a rich opportunity for computational research on second language acquisition and on tools for aiding non-native speakers. The availability of large and diverse datasets stimulates new opportunities for pursuing the emerging direction of computational investigation of bilingualism, thereby tying empirical results with well-established theoretical foundations.

I tackle empirical tasks motivated by theoretical hypotheses on second language acquisition, leveraging linguistic insights and foundations into practical value through applications in the area of second-language education. My main focus lies in developing robust and practical tools for better understanding, more accurate modeling and processing of the language of advanced, highly proficient non-native speakers – a population that has enjoyed very little attention in computational investigation thus far.

Monday, February 7, 2022 at 1400, Weston 127.

### Caching Systems with Indicators

Indicators reduce costs in caching systems by allowing a user to select which cache to query for the desired datum. Such indicators are used in large-scale systems, such as content delivery networks, 5G networks, and information-centric networks.

Due to bandwidth and energy constraints, a practical indicator is typically not a fresh, accurate list of all the items currently stored in the cache. Instead, the indicator is a stale approximation of the list of the currently cached items. As a result, indicators experience false indications.

Our work targets the problem of minimizing costs while satisfying bandwidth constraints. To this end, we develop lightweight, dynamic, auto-adjusting algorithms for both the cache and user sides. Using rigorous mathematical analysis, we provide performance guarantees for our algorithms. Our extensive evaluation study, based on real-world cache traces, shows a significant improvement over existing approaches.

Monday, December 6, 2021 at 1400, Weston 127.

### On the Size of Succinct Non-interactive Arguments in the Random Oracle Model

Imagine being able to prove the correctness of massive computations while only communicating a few bits. Such schemes can be used to solve many of the challenges of the modern online world. Formally, these schemes are known as Succinct Non-interactive ARGuments (SNARGs): a proof system that lets a prover convince a verifier about the truthfulness of a statement via a single short message.

In a seminal work, Micali (FOCS 1994) gave the first succinct non-interactive argument in the random oracle model (ROM). The construction is elegant and compiles an information-theoretic PCP and a cryptographic commitment scheme. The resulting SNARG has many attractive features: it is fast, requires no setup, and satisfies post-quantum security. However, it also has one major drawback: a large (quadratic) argument size.

Are all SNARG constructions in the random oracle model inherently large? The answer to this question is still open, and I will survey recent work that makes significant progress towards a solution. In particular, we will see a new construction that achieves a smaller argument size. This is the first progress on the Micali construction since it was introduced over 25 years ago. Our construction relies on a strong soundness notion for PCPs and a weak binding notion for commitments. We hope that our work paves the way for understanding if a linear argument size is achievable in the ROM.

The talk is based on several joint works with a set of authors including Alessandro Chiesa, Daniel Nukrai, and Iftach Hairner.

Monday, November 15, 2021 at 1400, Weston 127.

### From Deep Learning to Autonomous Driving in Mobileye

Deep Learning has revolutionized Artificial Intelligence in the past years. At Mobileye, Deep Learning lies at the core of our computer vision system, allowing cars to drive autonomously in Israel and around the world. In the talk I will present in detail the computer vision and machine learning systems that are used in Mobileye, and discuss how developments in Mobileye are driving some academic progress on the fundamentals of Deep Learning.

Monday, October 25, 2021 at 1400, Economics auditorium.

### Project Debater – how persuasive can a computer be?

Project Debater is the first AI system that can meaningfully debate a human opponent. The system, an IBM Grand Challenge, is designed to build coherent, convincing speeches on its own, as well as provide rebuttals to the opponent's main arguments. In 2019, Project Debater competed against Harish Natarajan, who holds the world record for most debate victories, in an event held in San Francisco that was broadcasted live world-wide. In this talk we will tell the story of Project Debater, from conception to a climactic final event, describe its underlying technology, and discuss how it can be leveraged for advancing decision making and critical thinking.

Monday, June 7, 2021 at 1400, Weston 127.

### Current trends in arithmetic dynamics

Arithmetic dynamics is a relatively new field of research. The goal is to study number theoretic properties of classical dynamical systems. The problems in this area are varied, from counting numbers of rational periodic points to Galois actions on orbits. The tools required for research in arithmetic dynamics are also quite varied as it lies in the intersection of analytic and algebraic number theory, arithmetic geometry and classical dynamical systems. In this lecture we discuss recent progress in the field, and some of our recent results.

Monday, April 26, 2021 at 1400, Weston 127.

### Orthogonality Dimension, Kneser Graphs, and Applications

The orthogonality dimension of a graph is the smallest integer $t$ for which one can assign to every vertex a nonzero vector in $R^t$ so that adjacent vertices in the graph receive orthogonal vectors. The Kneser graph $K(d,s)$ is the graph whose vertices are all the $s$-subsets of a universe of size $d$, where two sets are adjacent if they are disjoint.

In the talk, I will discuss the (generalized) orthogonality dimension of (generalized) Kneser graphs, and present several applications in theoretical computer science and information theory.

Monday, April 19, 2021 at 1400, Weston 127.

### Larger Corner-Free Sets from Better NOF Exactly-$N$ Protocols

A subset of the integer planar grid $[N] \times [N]$ is called corner-free if it contains no triple of the form $(x,y)$, $(x+\delta,y)$, $(x,y+\delta)$. It is known that such a set has a vanishingly small density, but how large this density can be remains unknown. The best previous construction was based on Behrend's large subset of $[N]$ with no 3-term arithmetic progression. Here we provide the first substantial improvement to this lower bound in decades. Our approach to the problem is based on the theory of communication complexity.

No previous knowledge needed.

(Joint work with N. Linial)

Monday, March 8, 2021 at 1400, Weston 127.

### Efficient Verification in Distributed Networks

In every complex system, faults can occur. Detecting faults, in many cases, is the first step in dealing with them. In this work, we focus on efficient detection of faults in distributed networks. A distributed network is a set of interconnected processors. We consider a standard message-passing model of distributed computation, where there is no central control or shared memory, and processors communicate only by sending messages on communication links in synchronous rounds. Distributed verification has received much attention over the years due to its applications to various domains. For example, checking the results obtained from the execution of a distributed program, constructing self-stabilizing algorithms, and developing a distributed complexity theory inspired by the sequential complexity theory.

In this work, we address the problem of locally verifying global properties of the network, and we study the effect of different network resources and relaxations on the complexity of verification. In particular, we first show that using randomization reduces the communication complexity exponentially. Also, approximations can significantly reduce space and communication complexity. The ability to send a different message on each link is a crucial factor which can greatly reduce the communication complexity of verification as well. Finally, we show that using multiple communication rounds can sometimes reduce space complexity even more than linearly in the number of rounds.

Monday, March 1, 2021 at 1400, Weston 127.

### Dynamic and Fault-Tolerant Graph Algorithms and Data-Structures

In many algorithms used in computing environments such as massive storage devices, large scale parallel computation, and communication networks, recovering from failures must be an integral part. Therefore, designing algorithms and data-structures whose running time is efficient even in the presence of failures is an important task. In this talk I will present variants of shortest path algorithms and data-structures in setting with failures. A replacement path $P(s,t,e)$ is a shortest path from $s$ to $t$ that does not go through $e$. I will present, at a high level, the results I obtained during my PhD for efficiently computing replacement paths [SODA 2019, ICALP 2019], as well as efficiently constructing compact data-structures for answering replacement paths/distance queries [SODA 2017, STOC 2020]. Then, I will deep-dive into the analysis of one of these results.

I will also present some results obtained during my postdoc in multiple research areas. Motivated by COVID-19 I researched network dynamics and developed a model for the spread of infection in social networks that is motivated by some properties from COVID-19 spreading. I will also present a project in which I developed face-detection deep neural networks that classifies faces as either wearing or not-wearing face-masks, as part of a new "competitive programming with deep learning" course that I lecture.

Tuesday, January 28, 2020 at 1500, Weston 127.

### Fully Understanding the Hashing-Trick

Feature hashing, also known as the hashing trick, introduced by Weinberger et al. (2009), is one of the key techniques used in scaling-up machine learning algorithms. Loosely speaking, feature hashing uses a random hashing of the entries of an $n$-dimensional $x$ in order to reduce the dimension of the data from $n$ to $m$ (where $m<<n$), while approximately preserving the Euclidean norm. Weinberger et al. showed tail bounds on the norm of the new vector. Specifically they showed that if an $n$-dimensional vector $x$ is sufficiently "balanced", and the target dimension $m$ is sufficiently large, then the norm of the hashed vector approximates the norm of x with high probability. These bounds were later extended by Dasgupta et al. (2010) and most recently refined by Dahlgaard et al. (2017), however, the true nature of the performance of this key technique, and specifically the correct tradeoff between the pivotal parameters (how "balanced" $x$ is, and what are the constraints on the error) remained an open question. We settle this question by giving tight asymptotic bounds on the exact tradeoff between the central parameters, thus providing a complete understanding of the performance of feature hashing. We complement the asymptotic bound with empirical data, which shows that the constants "hiding" in the asymptotic notation are, in fact, very close to $1$, thus further illustrating the tightness of the presented bounds in practice.

(Joint work with Casper Freksen and Kasper Green Larsen)

Monday, January 13, 2020 at 1400, Weston 127.

### Vertex Sparsifiers of Planar Graphs

These days, more than ever, we deal with huge graphs such as social networks, communication networks, roadmaps and so forth. But even when our main interest is only in a small portion of the input graph $G$, we still need to process all or most of it in order to answer our query, since the runtime and memory requirements of many common graph algorithms depend on the input (graph) size. Therefore, a natural question is whether we can find a vertex sparsifier $H$, which is a smaller graph that exactly (or approximately) preserves some properties of the original graph such as distances, cuts and flows. This talk presents the state of the art results of vertex sparsifiers for planar graphs, and further refines these bounds by a topological parameter called the terminal face cover.

Monday, December 16, 2019 at 1400, Weston 127.

### Distributed Computing with Rational Agents

In recent years, there is a growing interest in distributed algorithms for networks in which the participants are not faulty but rational, and may deviate from the prescribed algorithm in order to increase their profit. Until now, these distributed algorithms for rational agents have assumed a-priori knowledge of the size of the network.

We generalize the game-theoretic model of distributed computing and present basic building blocks that are in equilibrium, i.e., where no agent can profit by cheating. We then challenge this assumption by proving that, without a-priori knowledge of the network size, there is no equilibrium for many different distributed computing problems.

Monday, November 18, 2019 at 1400, Weston 127.