colloquium
Upcoming Lectures
Photos and videos are now a main mode of communication, used to tell stories, share experiences and convey ideas. However, common media editing tools are often either too complex to master, or oversimplified and limited.
In this talk I will present my strategy towards the creation of media editing techniques that are easy to learn, yet expressive enough to reflect unique creative objectives. We will mostly discuss one specific domain, human heads, which is both extremely common (i.e. people care about people) and technologically challenging. I will present several works on editing video by editing text, perspective manipulation, and in-camera lighting feedback. I will also discuss exciting future opportunities related to neural rendering and digital representations of humans.
BIO:
Ohad Fried is a postdoctoral research scholar at Stanford University. His work lies in the intersection of computer graphics, computer vision, and human-computer interaction. He holds a PhD in computer science from Princeton University, and an M.Sc. in computer science and a B.Sc. in computational biology from The Hebrew University. Ohad’s research focuses on tools, algorithms, and new paradigms for photo and video editing. Ohad is the recipient of several awards, including a Siebel Scholarship and a Google PhD Fellowship. If you own a cable modem, there’s a non-negligible chance that Ohad’s code runs within it, so feel free to blame him for your slow internet connection. www.ohadf.com
Previous Lectures
Integrity constraints such as functional dependencies (FDs) and multi-valued dependencies (MVDs)
are fundamental in database schema design, query optimization, and enforcing data integrity.
Modern data-intensive applications, such as ML algorithms, process observational data that is often
unnormalized, inconsistent, erroneous, and noisy. In these applications, quite often the constraints
need to be inferred from the data and are not required to hold exactly; it suffices if they hold
only to a certain degree.
In this work, we use information theory to quantify the degree of satisfaction of a constraint,
giving rise to two major challenges that I will cover in this talk: the implication problem for soft constraints, and discovering soft constraints in data. The implication problem for soft constraints asks whether a set of constraints (antecedents) that hold in the data to a large degree imply a high degree of satisfaction of another constraint (consequent). The implication problem has been investigated in both the Database and AI literature, but only under the assumption that all constraints hold exactly; our work extends this to the case of soft constraints.
Next, we address the problem of mining soft constraints from data, and present an algorithm for discovering complete schemas from data. The algorithm employs pruning techniques that take advantage of the properties of the information-theoretic measures associated with the constraints, and allow it to scale to datasets with up to 1M tuples and up to 30 attributes.
Based on joint work with Dan Suciu, Pranay Mundra, Guna Prasad, and Babak Salimi (to be presented at ICDT 2020 and SIGMOD 2020).
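As a concrete (if simplified) illustration of measuring the degree to which a constraint holds: for a functional dependency X → Y, the conditional entropy H(Y | X) is zero exactly when the dependency holds, and grows with the amount of violation. The sketch below is only a minimal empirical-entropy computation, not the paper's actual measure or discovery algorithm:

```python
import math
from collections import Counter

def conditional_entropy(rows, x_idx, y_idx):
    """H(Y | X) estimated from the empirical distribution of the rows.
    Zero iff the functional dependency X -> Y holds exactly in the data."""
    n = len(rows)
    joint = Counter((r[x_idx], r[y_idx]) for r in rows)
    marg_x = Counter(r[x_idx] for r in rows)
    h = 0.0
    for (x, _), c in joint.items():
        p_xy = c / n                    # empirical P(X=x, Y=y)
        p_y_given_x = c / marg_x[x]     # empirical P(Y=y | X=x)
        h -= p_xy * math.log2(p_y_given_x)
    return h

# zip -> city holds exactly, so the entropy is 0
rows = [("10001", "NYC"), ("10001", "NYC"), ("94103", "SF")]
print(conditional_entropy(rows, 0, 1))  # 0.0

# a single violating tuple yields a small positive value
print(conditional_entropy(rows + [("10001", "Boston")], 0, 1) > 0)  # True
```

A soft-constraint miner would threshold such a measure rather than require it to be exactly zero.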
In this talk, we present Variational Bayesian Networks (VBNets), a novel scalable Bayesian hierarchical model that utilizes both implicit and explicit relations for learning entity representations. VBNets are designed for Microsoft Store and Xbox services that handle around a billion users worldwide. Unlike point-estimate solutions that map entities to vectors and are usually overconfident, VBNets map entities to densities and hence model uncertainty. VBNets are based on analytical approximations of the intractable entities' posterior and the posterior predictive distribution of the data. We demonstrate the effectiveness of VBNets on linguistic, recommendation, and medical-informatics tasks, where they are shown to outperform alternative methods that facilitate Bayesian modeling with or without semantic priors. In addition, we show that VBNets produce superior representations for rare words and cold items.
If time permits, we will give a brief overview of several recent deep learning works in the domains of deep neural attention mechanisms, multi-view representation learning and inverse problems, with applications for natural language understanding, recommender systems, computer vision, sound synthesis and biometrics.
Formal verification of infinite-state systems, and distributed systems in particular, is a long-standing research goal. I will describe a series of works that develop a methodology for verifying distributed algorithms and systems using decidable logics, employing decomposition, abstraction, and user interaction. This methodology is implemented in an open-source tool, and has resulted in the first mechanized proofs of several important distributed protocols. I will also describe a novel approach to the problem of invariant inference based on a newly formalized duality between reachability and mathematical induction. The duality leads to a primal-dual search algorithm, and a prototype implementation already handles challenging examples that other state-of-the-art techniques cannot handle. I will briefly describe several other works, including a new optimization technique for deep learning computations that achieves significant speedups relative to existing deep learning frameworks.
In this talk, we will present the main building blocks that allow for an interpretable analysis and processing of dynamical-systems data. The overarching theme of our work is based on the theory of Bernard Koopman (1931). Key to our approach is the interplay between the underlying dynamics and its embedding into an infinite-dimensional space of scalar functions. This embedding encodes a potentially nonlinear system via a linear object known as the Koopman operator. Koopman's perspective is advantageous since it involves the manipulation of linear operators, which can be done efficiently. Moreover, algebraic properties of the Koopman matrix are directly associated with dynamical features of the system, which in turn are linked to high-level questions. Overall, the combination of Koopman theory with novel dimensionality-reduction techniques and data-science approaches leads to a highly powerful framework.
To demonstrate the effectiveness of our approach, we consider several challenging problems in various fields. In geometry processing, we show how optimizing for a spectral basis and a Koopman operator (a functional map) leads to improved shape matching results. In fluid mechanics, we develop a provably convergent ADMM scheme for computing Koopman operators that admits state-of-the-art results on data with high levels of sensor noise. In image processing, our methodology generates a discrete transform of a nonlinear flow up to two orders of magnitude faster than existing approaches. Finally, we construct a novel framework for recovering dynamics from questionnaire data that arise in the social sciences.
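To make the Koopman-operator idea concrete, here is a minimal sketch of the standard least-squares approximation behind Dynamic Mode Decomposition (DMD): given snapshot data, find the linear operator that best advances one snapshot to the next. This is only a textbook baseline, not the ADMM scheme or the geometry-processing machinery from the talk:

```python
import numpy as np

def dmd_operator(X):
    """Least-squares Koopman/DMD approximation: the matrix A minimizing
    ||X[:, 1:] - A @ X[:, :-1]|| over all snapshot pairs."""
    X0, X1 = X[:, :-1], X[:, 1:]
    return X1 @ np.linalg.pinv(X0)

# For snapshots of a linear system x_{t+1} = A_true x_t, DMD recovers A_true.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
x = np.array([1.0, 1.0])
snaps = [x]
for _ in range(10):
    x = A_true @ x
    snaps.append(x)
X = np.stack(snaps, axis=1)   # 2 x 11 snapshot matrix

A_est = dmd_operator(X)
print(np.allclose(A_est, A_true))  # True
```

For genuinely nonlinear systems one first lifts the state into a dictionary of observables; the eigenvalues of the estimated operator then expose the dynamical features mentioned above.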
In the metric compression problem, we are given n points in a metric space, and the goal is to construct a compact representation (sketch) of the points, such that the distance between every pair can be approximately recovered from the sketch, up to a small distortion of (1 ± epsilon). Such sketches are widely used for fast nearest neighbor search in high-dimensional Euclidean spaces.
We give a new algorithm for sketching Euclidean metric spaces. It provably achieves the optimal compression bound, improving over the classical dimension reduction theorem of Johnson and Lindenstrauss. In particular, while the latter theorem represents each point by log(n)/epsilon^2 *coordinates* (each containing a multi-bit number), we show that log(n)/epsilon^2 *bits* are both sufficient and necessary. Empirically, our algorithm either matches or improves over state-of-the-art heuristics.
Based on joint works with Piotr Indyk and Ilya Razenshteyn.
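For intuition about the baseline the talk improves on, the sketch below applies the classical Johnson-Lindenstrauss random projection (which stores coordinates, not bits). The constant 8 in the target dimension is an arbitrary illustrative choice, not a tight bound:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def jl_sketch(points, eps):
    """Classical JL baseline: project n points in R^d onto
    k = O(log n / eps^2) random Gaussian directions."""
    n, d = points.shape
    k = int(np.ceil(8 * np.log(n) / eps**2))
    proj = rng.normal(size=(d, k)) / np.sqrt(k)
    return points @ proj

n, d, eps = 50, 1000, 0.5
pts = rng.normal(size=(n, d))
sk = jl_sketch(pts, eps)

# pairwise distances are preserved up to roughly a (1 +/- eps) factor
ratios = [np.linalg.norm(sk[i] - sk[j]) / np.linalg.norm(pts[i] - pts[j])
          for i, j in combinations(range(n), 2)]
print(0.7 < np.median(ratios) < 1.3)  # True
```

The talk's result replaces each multi-bit coordinate of such a sketch with a single bit per log(n)/epsilon^2, which the JL approach cannot achieve.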
In recent years there has been an increasing gap between the success of machine learning algorithms and our ability to explain that success theoretically.
Namely, many of the problems that are solved to a satisfactory degree of precision are computationally hard in the worst case. Fortunately, there are often reasonable assumptions that help us get around these worst-case impediments and allow us to rigorously analyze heuristics that are used in practice.
In this talk I will advocate a complementary approach, where instead of explicitly characterizing some desired "niceness" properties of the data, we assume access to an optimization oracle that solves a relatively simpler task. This allows us to identify the sources of hardness and extend our theoretical understanding to new domains. Furthermore, we will show that seemingly innocent (and arguably justifiable) modifications to the oracle can lead to tractable reductions and even bypass hardness results.
We demonstrate these ideas using the following results: (i) an efficient algorithm for non-convex online learning using an optimization oracle; (ii) a faster boosting algorithm using a "simple" weak learner; (iii) an efficient reduction from online to private learning.
Joint works with Naman Agarwal, Noga Alon, Elad Hazan, and Shay Moran.
In the 'frequent items' problem one sees a sequence of items in a stream (e.g. a stream of words coming into a search query engine like Google) and wants to report a small list of items containing all frequent items. In the 'change detection' problem one sees two streams, say one from yesterday and one from today, and wants to report a small list of items containing all those whose frequencies changed significantly. For both of these problems, we would like algorithms that use memory substantially sublinear in the length of the stream.
We describe new state-of-the-art solutions to both problems. For the former, we make use of chaining methods to control the suprema of Rademacher processes to develop an algorithm, BPTree, with provably near-optimal memory consumption. For the latter, we develop the first
algorithm to simultaneously achieve asymptotically optimal space, fast query and update time, and high success probability (over the random coins flipped by the algorithm) by reducing the problem to a certain graph partitioning problem, which we then give a new algorithm for.
Based on joint works with Vladimir Braverman, Stephen Chestnut, Nikita Ivkin, Kasper Green Larsen, Huy Le Nguyen, Mikkel Thorup, Zhengyu Wang, and David P. Woodruff.
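For readers unfamiliar with the frequent-items problem, the classic Misra-Gries summary below solves it with only k counters of memory; it is a textbook baseline, not the BPTree algorithm from the talk:

```python
def misra_gries(stream, k):
    """Misra-Gries summary with k counters: any item occurring more than
    len(stream) / (k + 1) times is guaranteed to survive in the summary."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # no free counter: decrement all, dropping those that hit zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# "a" and "b" occur well above n/(k+1) times, so both must be reported
stream = ["a"] * 60 + ["b"] * 30 + list("cdefghij")
summary = misra_gries(stream, k=3)
print("a" in summary and "b" in summary)  # True
```

The summary may also contain infrequent stragglers, so a second pass (or a small over-allocation of counters) is used to filter the candidate list.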
In recent years, new forms of communication between people and devices have revolutionized our daily lives. The Internet has become the leading platform for human interaction (e.g., social networks), commerce, information, and also control of physical devices (e.g., Internet of Things). This new connectivity creates new security and privacy risks for individual users and organizations. It also increases the complexity and diversity of the different security and cryptographic solutions we need to protect against increasingly sophisticated and motivated attackers. Designing and implementing a secure system is a very elusive process. One needs to clearly identify the security targets (e.g., maintaining the confidentiality of messages or preventing access by unauthorized entities) as well as the adversarial capabilities.
In this talk, I will show how we can combine cryptanalytic techniques with various side channels to break the security guarantees of real-world implementations of cryptographic protocols, and how novel solutions can help mitigate the root causes of these vulnerabilities.
A distance labeling scheme is an assignment of bit-labels to the vertices of an undirected, unweighted graph such that the distance between any pair of vertices can be decoded solely from their labels. An important class of distance labeling schemes is that of hub labelings, where a node v∈V stores its distance to the so-called hubs S(v)⊆V, chosen so that for any u,v∈V there is w∈S(u)∩S(v) belonging to some shortest u-v path. Notice that for most graph classes, the best existing distance labeling constructions use a hub labeling scheme, at least as a key building block. Our interest lies in hub labelings of sparse graphs, i.e., those with |E(G)|=O(n), for which we show a lower bound of n/2^O(√log n) on the average size of the hubsets. Additionally, we show a hub labeling construction for sparse graphs of average size O(n/RS(n)^c) for some 0<c<1, where RS(n) is the so-called Ruzsa–Szemerédi function, linked to the structure of induced matchings in dense graphs. This implies that further improving the lower bound on hub labeling size to n/2^(log n)^o(1) would require a breakthrough in the study of lower bounds on RS(n), which have resisted substantial improvement in the last 70 years. For general distance labeling of sparse graphs, we show a lower bound of SumIndex(n)/2^O(√log n), where SumIndex(n) is the communication complexity of the Sum-Index problem over Z_n. Our results suggest that the best achievable hub-label size and distance-label size in sparse graphs may be Θ(n/2^(log n)^c) for some 0<c<1.
Joint work with Adrian Kosowski and Laurent Viennot (presented at PODC 2019).
Deep Learning has fundamentally changed Information Retrieval. Traditional search retrieves text documents based on text queries. Today, however, companies want to search for images, videos, customers, jobs, shopping catalog items, friends, places, and much more. For such searches, Deep Learning models provide the most accurate results. Unfortunately, deploying and serving machine learning models at massive scale is still a herculean effort, even for highly tech-heavy companies. This talk will survey some of those challenges. We will also introduce a new cloud-managed service by HyperCube that serves such real-time workloads.
In this talk, we discuss the problem of recognizing regular languages in a variant of the streaming model of computation, called the sliding window model. In this model, we are given a sliding window size n and a stream of symbols. At each time instant, we must decide whether the suffix of length n of the current stream (“the active window”) belongs to a given regular language.
Recent works showed that the space complexity of an optimal deterministic sliding window algorithm for this problem is either constant, logarithmic, or linear in the window size n; and either constant, double logarithmic, logarithmic, or linear in the randomized setting.
In this work, we make an important step forward and combine the sliding window model with the property testing setting, which results in ultra-efficient algorithms for all regular languages. Informally, a sliding window property tester must accept the active window if it belongs to the language and reject it if it is far from the language. We consider deterministic and randomized sliding window property testers with one-sided and two-sided errors. In particular, we show that for any regular language, there is a deterministic sliding window property tester that uses logarithmic space and a randomized sliding window property tester with two-sided error that uses constant space.
The talk is based on a joint work with M. Ganardi, D. Hucke, and M. Lohrey accepted to ISAAC 2019.
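As a point of reference, the trivial sliding-window algorithm simply stores the entire window, using O(n) space, which is exactly the cost that the talk's logarithmic- and constant-space testers avoid. A minimal sketch of this baseline, with an arbitrary membership predicate standing in for a regular language:

```python
from collections import deque

def sliding_window_membership(stream, n, accepts):
    """Trivial baseline: keep the whole length-n window in memory and
    re-check membership after each arriving symbol (O(n) space)."""
    window = deque(maxlen=n)  # automatically drops the oldest symbol
    out = []
    for sym in stream:
        window.append(sym)
        out.append(len(window) == n and accepts("".join(window)))
    return out

# language: binary strings ending in "1"; window size 3
res = sliding_window_membership("010110", 3, lambda w: w.endswith("1"))
print(res)  # [False, False, False, True, True, False]
```

A property tester is allowed to answer incorrectly on windows that are close to the language, which is what makes sublinear space possible.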
Software engineering encompasses methods for developing systems that enable division of labor, efficiency, and a reduced number of faults. As a subfield of software engineering, "formal methods" deals with techniques for checking systems and eliminating errors. Toward the end of the 1960s, methods for proving the correctness of software were developed, with the initial motivation stemming, among other things, from the space programs, which demanded unprecedented reliability from their systems. The topic has evolved over the years, as computer systems penetrate many domains such as medicine, transportation, and so on. The older practice of software testing has survived as well, and although the reliability it provides is weaker than that of correctness proofs, it is still more feasible in terms of computational power. In recent years, runtime verification has been gaining popularity: the ability to track a system's execution and to warn about, or even prevent, faults as they happen. This technique lies between correctness proofs, since it does not guarantee in advance a full proof of the software's correctness, and software testing, since it checks the system, at runtime, against a rich specification.
In this talk, I will describe the field of runtime verification, the specifications given to software, and the tools used for such checks. I will also describe a runtime verification tool that was developed in collaboration with NASA and is used, among other purposes, to analyze data arriving from the space agency's Mars probe.
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread applications. The goal is to preprocess a database such that, given a query text, we can quickly find the database entry that is most similar to the query. This similar text may differ by character replacements, insertions, or deletions; in other words, we want the most similar item (or at least an approximately most-similar item) under edit distance. This problem has widespread uses in text processing and computational biology.
Previous approaches to this problem fell into three categories. Brute-force approaches search through the whole database, using very little preprocessing at the cost of large query time. Prefix-tree approaches can be much faster, but they have very large exponential terms, so they are only efficient when the desired text is very similar (has a small edit distance). Embedding approaches can overcome these large search costs regardless of the similarity of the desired text, but embeddings can only be used when the approximation factor is very large (in fact super-constant).
I will be presenting a new method that overcomes these issues: a locality-sensitive hashing approach. This method uses a simple hash function, under which similar texts are more likely to hash together than dissimilar texts. Using this hash function, we can answer queries using a hash table lookup, achieving good bounds for all approximation factors and moderate distances to the most similar database text. I will also discuss some potential future directions of this approach, as well as some other recent work on related problems.
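To illustrate the locality-sensitive-hashing principle (similar texts collide more often than dissimilar ones), here is a classic min-hash sketch over q-gram shingles. This is a standard set-similarity proxy, not the edit-distance hash family from the talk, and the parameters q=3 and 20 hash functions are arbitrary illustrative choices:

```python
import zlib

def minhash_signature(text, q=3, num_hashes=20):
    """Min-hash over q-gram shingles: for each salted hash function, keep
    the minimum hash value over all q-grams of the text. Two texts agree
    on a coordinate with probability equal to their shingle-set Jaccard
    similarity, so similar texts produce similar signatures."""
    shingles = {text[i:i + q] for i in range(len(text) - q + 1)}
    return tuple(
        min(zlib.crc32(f"{salt}:{s}".encode()) for s in shingles)
        for salt in range(num_hashes)
    )

def agreement(sig_a, sig_b):
    """Fraction of signature coordinates on which the two sketches collide."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature("the quick brown fox jumps over the lazy dog")
b = minhash_signature("the quick brown fox jumps over the lazy cat")
c = minhash_signature("completely unrelated text with no overlap at all")
print(agreement(a, b) > agreement(a, c))
```

In an index, each signature coordinate (or band of coordinates) becomes a hash-table key, so near-duplicates are found with a lookup instead of a scan.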
Given a stream S of N elements, an f-heavy hitter is an item that occurs at least fN times in S. The problem of finding heavy hitters has been extensively studied in the streaming literature. In this talk, I will present a related problem. We say that there is an f-event at time t if an element occurs exactly fN times in the stream of elements up to time t. Thus, for each f-heavy hitter there is a single f-event, which occurs when its count reaches the reporting threshold fN. We define the online event-detection problem (OEDP) as: given f and a stream S, report all f-events as soon as they occur.
Many real-world monitoring systems demand event detection where all events must be reported (no false negatives), in a timely manner, with no non-events reported (no false positives), and with a low reporting threshold. As a result, the OEDP requires a large amount of space (Omega(N) words) and is not solvable in the streaming model or via standard sampling-based approaches. I will focus on cache-efficient algorithms in the external-memory model and present algorithms for the OEDP that are within a log factor of optimal.
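A minimal exact (and therefore space-hungry) f-event detector looks as follows; keeping one counter per distinct element is precisely the Omega(N)-word cost that the cache-efficient algorithms in the talk work around:

```python
from collections import Counter

def f_events(stream, f):
    """Report every f-event: the time t at which some element's count
    first reaches exactly f * N, where N is the stream length.
    Exactness (no false positives or negatives) forces one counter per
    distinct element, i.e. Omega(N) words of space in the worst case."""
    threshold = f * len(stream)
    counts = Counter()
    events = []
    for t, item in enumerate(stream, start=1):
        counts[item] += 1
        if counts[item] == threshold:
            events.append((t, item))
    return events

# "a" reaches the threshold 0.5 * 4 = 2 at time step 3
print(f_events(["a", "b", "a", "c"], f=0.5))  # [(3, 'a')]
```

The external-memory algorithms in the talk keep the same reporting guarantees while laying these counters out cache-efficiently.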
Arguably the most common network communication problem is multiple unicasts: distinct packets at different nodes in a network need to be delivered to a destination specific to each packet, as fast as possible.
The famous multiple-unicast conjecture posits that, for this natural problem, there is no performance gap between routing and network coding, at least in terms of throughput. We study the same network coding gap, but in terms of completion time.
While throughput corresponds to the completion time for asymptotically large transmissions, we look at completion times of multiple unicasts for arbitrary amounts of data. We develop nearly matching upper and lower bounds. In particular, we prove that the network coding gap for the completion time of k unicasts is at most polylogarithmic in k, and that there exist instances of k unicasts for which this coding gap is polylogarithmic in k.
This talk is based on joint work with Bernhard Haeupler and Goran Zuzic.
Supervised methods are commonly used for machine-learning-based applications but require expensive labeled-dataset creation and maintenance. Increasingly, practitioners employ weak supervision approaches, where training labels are programmatically generated in higher-level but noisier ways. However, these approaches require domain experts with programming skills. Additionally, highly imbalanced data is often a significant practical challenge for these approaches. In this work, we propose Osprey, a weak-supervision system suited for highly imbalanced data, built on top of the Snorkel framework. In order to support non-coders, the programmatic labeling is decoupled into a code layer and a configuration layer. This decoupling enables rapid development of end-to-end systems by encoding the business logic into the configuration layer. We apply the resulting system to highly imbalanced (0.05% positive) social-media data, using a synthetic data rebalancing and augmentation approach, and a novel technique of ensembling a generative model over the legacy rules with a learned discriminative model. We demonstrate how an existing rule-based model can be transformed easily into a weakly supervised one. For three relation-extraction applications based on real-world deployments at Intel, we show that with a fraction of the cost, we achieve gains of 18.5 precision points and 28.5 coverage points over prior traditionally supervised and rule-based approaches.
Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. This distance measure finds applications in fields such as computational biology, pattern recognition, text processing, and information retrieval. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time, which is also known to be almost optimal under the SETH assumption [Backurs, Indyk 2015]. Andoni, Krauthgamer and Onak (2010) gave a nearly linear time (randomized) algorithm that approximates edit distance within a polylog(n) approximation factor. In this talk we discuss a randomized algorithm with running time $\tilde{O}(n^{2-2/7})$ that approximates the edit distance within a constant factor.
(based on a joint work with Das, Goldenberg, Koucky and Saks, appeared in FOCS’18)
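For reference, the quadratic-time dynamic program mentioned above can be sketched as follows:

```python
def edit_distance(s, t):
    """Classic O(mn) dynamic program: dp[i][j] is the edit distance
    between the prefixes s[:i] and t[:j]."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```

The talk's subquadratic algorithm gives up exactness, returning a constant-factor approximation of this value.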
In this talk, I will present the first algorithm that breaks the O(n)-time barrier for BWT construction. Given a binary string of length n, it builds the Burrows–Wheeler transform in O(n / √log n) time and O(n / log n) space. Any further progress in the time complexity of BWT construction would yield faster algorithms for the problem of counting inversions: it would improve upon the state-of-the-art O(m √log m)-time solution by Chan and Pătrașcu (SODA 2010).
The new algorithm is based on the concept of string synchronizing sets, originally designed for answering Longest Common Extension (LCE) queries, which is of independent interest. I will also show how this technique yields a data structure of the optimal size O(n / log n) that answers LCE queries in O(1) time and, furthermore, can be deterministically constructed in the optimal O(n / log n) time.
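For readers unfamiliar with the transform itself, a naive construction sorts all rotations of the string and reads off their last characters; it runs in O(n^2 log n) time, far from the talk's O(n / √log n) bound, but shows what object is being built:

```python
def bwt(s):
    """Naive Burrows-Wheeler transform: sort all rotations of s (with a
    sentinel appended) and concatenate the last character of each."""
    s += "$"  # unique sentinel, lexicographically smallest
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

# the transform groups equal characters together, which is why it
# underlies compressors and compressed text indexes
print(bwt("banana"))  # annb$aa
```

Linear-time constructions replace the explicit sort with suffix-array machinery; the talk shows how to go even below linear time.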
We prove that Θ(kd^2/ε^2) samples are necessary and sufficient for learning a mixture of k Gaussians in R^d, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that O(kd/ε^2) samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians. The upper bound is shown using a novel technique for distribution learning based on a notion of compression. Any class of distributions that allows such a compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in R^d admits a small-sized compression scheme.
(Joint work with Hassan Ashtiani, Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian and Yaniv Plan)
This talk will describe my past and ongoing work on translating images and words between very different datasets without supervision. Although humans often do not require supervision to make connections between very different sources of data, this is still difficult for machines. Recently, great progress was made by using adversarial training, a powerful yet tricky method. Although adversarial methods have had great success, they have critical failings that significantly limit their breadth of applicability and motivate research into alternative non-adversarial methods. In this talk, I will describe novel non-adversarial methods for unsupervised word translation and for translating images between very different datasets (with Lior Wolf). As image generation models are an important component of our method, I will present a non-adversarial image generation approach, which is often better than current adversarial approaches (with Jitendra Malik).
A homomorphic secret-sharing scheme is a secret-sharing scheme that allows locally mapping shares of a secret x to compact shares of a function of x. The talk will survey the current state of the art on homomorphic secret sharing, covering applications in cryptography and complexity theory, efficient constructions, and open questions.
The surprising success of machine learning with deep neural networks poses two fundamental challenges. One is understanding why these multi-layer networks work so well on many different artificial intelligence tasks, in some cases approaching or exceeding human performance. The second is what this success can tell us about human intelligence and our biological brain.
Our recent Information Bottleneck theory of Deep Learning provides new insights and answers to the first question. It shows that the layers of deep neural networks achieve the optimal information-theoretic tradeoff between training sample size and accuracy, and that this optimality is achieved through the noisy process of stochastic gradient descent. Moreover, it shows that the benefit of the multi-layer structure is mainly computational: it exponentially reduces the time of convergence to these optimal representations. In this talk I will address the relevance of these findings to the emergence of hierarchies in deep neural networks and in biological brains.

Machine Learning, Computer Vision, and Drones at the Supermarket
We will show that the following problems are surprisingly related, together with
provable approximations, and cool videos that demonstrate their real-world, real-time performance.
1) Having a group of lawful (<200 grams) autonomous toy-drones carry personalized ads inside the supermarket "Rami-Levi Hashikma LTD" near the university.
This is related to SLAM (simultaneous localization and mapping).
2) Given n lines and n points, compute a rotation, translation, and 1-to-1 matching that will minimize the sum of distances from each point to its matched line.
This is a special case of a fundamental problem in computer vision, called PnP (Perspective-n-Point).
3) Minimize ‖Ax − b‖ + λ‖x‖ over every λ > 0 and x in R^d.
This is called ridge regression in machine learning when λ is given. We suggest the first provable approximation without tuning λ.
4) Headache caused by current Augmented Reality glasses.
Partially based on papers in IEEE International Conference on Robotics and Automation (ICRA'19) and IEEE Robotics and Automation Letters (RAL'19).
Joint work with Ibrahim Jubran, David Cohn, Ariel Hutterer, and Daniel Jeryes from the lab.
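For problem 3 above, the fixed-λ baseline has a well-known closed form; the sketch below shows that standard solution (for the squared-norm formulation of ridge regression), not the talk's tuning-free approximation:

```python
import numpy as np

def ridge(A, b, lam):
    """Closed-form ridge regression for a fixed lambda:
    argmin_x ||Ax - b||^2 + lam * ||x||^2  =  (A^T A + lam I)^{-1} A^T b."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

x = ridge(A, b, lam=0.0)      # lam = 0 reduces to ordinary least squares
print(np.allclose(A @ x, b))  # True: b lies in the column span of A here
```

Larger λ shrinks the solution toward zero; the open question addressed in the talk is approximating the objective provably without choosing λ at all.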


The growing complexity of network operations is the subject of many debates. In general, there are two major factors impacting the complexity of network operations: the size and structure of the manageable state, and the frequency of its changes. Networks should be as autonomous as possible, ideally excluding operators from the operational loop completely. However, this can significantly increase the size of the manageable state and the complexity of the network infrastructure. Currently, there is no way to systematically quantify the efficiency versus complexity of network operations achieved by potential design principles. In this talk we will consider this fundamental tradeoff. Our goal is to understand design principles and show their effectiveness in allowing networks to become self-driving.
Deep learning has become one of the leading methods of machine learning, with significant breakthroughs in the domains of computer vision and speech recognition, among others. In this talk, I will describe some major obstacles, such as computation time and adversarial attacks, and present a number of possible directions we are taking to overcome these challenges.
Recent years have shown adoption of new software trends such as machine learning, programmable computer networks, and blockchain frameworks. Several software bugs and attacks have demonstrated the importance of bringing formal guarantees to such systems. Techniques from verification, automated reasoning and program synthesis were shown successful in providing such guarantees for software systems. My research extends these techniques to these new software trends.
In this talk, I will focus on deep learning and will present two novel, complementary methods, leveraging abstract interpretation and logical constraints, for making deep learning models more reliable and interpretable.
Cloud computing opens a new chapter in information technology, by enabling global access to shared pools of resources such as services, data, servers, and computer networks. It drives new digital businesses across enterprises. In the last few years, an unprecedented amount of data center capacity has been built to support cloud computing services' growth. Therefore, optimizing the energy budget of data centers, without harming service level agreements, would result in massive savings for their operators, and significantly contribute to greater environmental sustainability. A key challenge in optimizing cloud computing services is their online nature. That is, they require immediate and irrevocable decisions to be made, based on incomplete input.
In this talk, I will discuss my work on two major online optimization problems for cloud services: switch routing and virtual machine placement. I will show how these problems relate to graph coloring, one of the best-known, most popular, and most extensively researched areas in graph theory. First, I will present tight bounds for online edge coloring in bipartite graphs, which lead to an optimal switch routing scheduler. I will then discuss vector balancing problems, a well-studied model for virtual machine placement in cloud services. In these problems, jobs have vector loads and the goal is to balance the load on all dimensions simultaneously. For this purpose, I will first consider two natural relaxations of the vertex coloring problem and show new online lower bounds for them. I will then show how to port these bounds back to vector balancing, and prove that they are tight by presenting matching upper bounds. Finally, for practical applications these bounds are unsatisfactory, so I will also discuss how to improve the upper bounds by adding restrictive, yet practical, assumptions.
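As background for the edge-coloring connection, the natural greedy online algorithm assigns each arriving edge the smallest color free at both of its endpoints, using at most 2Δ−1 colors for graphs of maximum degree Δ; the tight bounds in the talk concern how much closer to Δ one can get online in bipartite graphs. A minimal sketch:

```python
def greedy_edge_coloring(edges):
    """Greedy online edge coloring: give each arriving edge the smallest
    color not already used at either endpoint. Since each endpoint blocks
    at most Delta - 1 colors, at most 2*Delta - 1 colors are ever needed."""
    used = {}       # vertex -> set of colors on its incident edges
    coloring = {}   # edge -> assigned color
    for u, v in edges:
        taken = used.setdefault(u, set()) | used.setdefault(v, set())
        c = 0
        while c in taken:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return coloring

# a path a-b-c-d has Delta = 2; greedy alternates two colors
print(greedy_edge_coloring([("a", "b"), ("b", "c"), ("c", "d")]))
# {('a', 'b'): 0, ('b', 'c'): 1, ('c', 'd'): 0}
```

In the switch-routing application, colors correspond to time slots, so fewer colors means a shorter schedule.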
It is no secret that online companies, hospitals, credit-card companies and governments hold massive datasets composed of our sensitive personal details. Information from such datasets is often released using some privacy preserving heuristics, which have been repeatedly shown to fail. That is why in recent years the notion of differential privacy has been gaining much attention, as an approach for conducting data analysis that adheres to a strong and mathematically rigorous notion of privacy. Indeed, many differentially private analogs of existing data-analysis techniques have already been devised. These are, however, new algorithms, that require the use of additional random noise on top of existing techniques.
In this talk we will demonstrate how existing techniques, which were developed independently of any privacy consideration, preserve differential privacy by themselves, when parameters are properly set. The main focus of the talk will be the Johnson-Lindenstrauss transform, which preserves differential privacy provided the input satisfies some "well spread" properties. We will discuss applications of this algorithm in approximating multiple linear regressions and in statistical inference. Moreover, focusing on linear regression, we will exhibit additional techniques that preserve privacy: regularization, addition of random data points and Bayesian sampling.
(Time permitting, we will survey a very different technique, denoising neural networks, that also aligns with the definition of differential privacy in the local model.)
The talk is self-contained and no prior knowledge is assumed.
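For readers unfamiliar with it, the Johnson-Lindenstrauss transform itself is easy to sketch. The code below only illustrates the random projection and its distance preservation; it does not reproduce the privacy analysis from the talk, and the function names are mine.

```python
# Johnson-Lindenstrauss transform: project n d-dimensional rows into k
# dimensions with a random Gaussian matrix, approximately preserving
# pairwise Euclidean distances.
import numpy as np

def jl_transform(X, k, rng):
    d = X.shape[1]
    G = rng.standard_normal((d, k))  # i.i.d. N(0, 1) entries
    return X @ G / np.sqrt(k)        # scaling keeps expected norms unchanged

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
Y = jl_transform(X, 200, rng)

# Pairwise distances survive the 1000 -> 200 reduction up to a modest
# multiplicative error (roughly 1/sqrt(k) relative fluctuation).
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
```

The talk's claim is that, when the input is "well spread", this projection is differentially private as-is, with no extra noise added on top.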
In a majority of computer vision applications, establishing visual correspondence is a fundamental computational stage, since it provides a geometrical understanding of the data, which is crucial for high-level decision making. It includes a wide range of alignment, registration or matching tasks, between color images, range scans, 3D models and more, which can provide information such as the calibration of a camera, its position in space, and the 3D structure and motion of objects in a scene.
In such tasks, real-world conditions arising from complex object or camera motion, scene geometry and illumination are difficult to model, and hence algorithms are typically challenged with high levels of noise and outlier data.
In this talk, for a range of matching problems, I will present efforts in designing rich models along with robust and efficient optimization algorithms that extend the performance limits of existing methods. I will present a very general and efficient template matcher as well as some new approaches to robust estimation that excel in the regime of high noise and outlier rates. Lastly, I will discuss the state of visual matching in the era of deep learning, including some current and future research directions.
Understanding deep learning calls for addressing three fundamental questions: expressiveness, optimization and generalization. Expressiveness refers to the ability of compactly sized deep neural networks to represent functions capable of solving real-world problems. Optimization concerns the effectiveness of simple gradient-based algorithms in solving non-convex neural network training programs. Generalization treats the phenomenon of deep learning models not overfitting despite having many more parameters than examples to learn from. This talk will describe a series of works aimed at unraveling some of the mysteries behind optimization and expressiveness. I will begin by discussing recent analyses of optimization for deep linear neural networks. By studying the trajectories of gradient descent, we will derive the most general guarantee to date for efficient convergence to global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that, sometimes, gradient descent can train a deep linear network faster than a classic linear model. In other words, depth can accelerate optimization, even without any gain in expressiveness, and despite introducing non-convexity to a formerly convex problem. In the second (shorter) part of the talk, I will present an equivalence between convolutional and recurrent networks, the most successful deep learning architectures to date, and hierarchical tensor decompositions. The equivalence brings forth answers to various questions concerning expressiveness, resulting in new theoretically backed tools for deep network design.
Optimization works covered in this talk were in collaboration with Sanjeev Arora, Elad Hazan, Noah Golowich and Wei Hu. Expressiveness works were with Amnon Shashua, Or Sharir, Yoav Levine, Ronen Tamari and David Yakira.
As software has grown increasingly critical to our society’s infrastructure, mechanically verified software has grown increasingly important, feasible, and prevalent. Proof assistants have seen tremendous growth in recent years because of their success in the mechanical verification of high-value applications in many areas, including cyber security, cyber-physical systems, operating systems, compilers, and microkernels. These proof assistants are built on top of constructive type theory whose computational interpretation is given by the proofs-as-programs paradigm, which establishes a correspondence between formal proofs and computer programs. However, while both proof theory and programming languages have evolved significantly over the past years, the cross-fertilization of the independent new developments in each of these fields has yet to be explored in the context of this paradigm. This naturally gives rise to the following questions: how can modern notions of computation influence and contribute to formal foundations, and how can modern reasoning techniques improve the way we design and reason about programs?
In this talk I first demonstrate how using programming principles that go beyond the standard lambda calculus, namely state and nondeterminism, promotes the specification and verification of modern systems, e.g. distributed systems. I then illustrate the surprising fragility of proof assistants in the presence of such new computational capabilities, and outline my ongoing efforts to develop a more robust foundation. For the converse direction, I show how incorporating modern proof-theoretic techniques offers a more congenial framework for reasoning about hard programming problems and hence facilitates the verification effort.
We consider fault-tolerant boolean formulas in which the output of a faulty gate is short circuited to one of the gate’s inputs. A recent result by Kalai et al. [FOCS 2012] converts any boolean formula into a resilient formula of polynomial size that works correctly if less than a fraction 1/6 of the gates (on every input-to-output path) are faulty. We improve the result of Kalai et al., and show how to efficiently fortify any boolean formula against a fraction 1/5 of short-circuit gates per path, with only a polynomial blowup in size. We additionally show that it is impossible to obtain formulas with higher resilience and subexponential growth in size.
Towards our results, we consider interactive coding schemes when noiseless feedback is present; these produce resilient boolean formulas via a Karchmer-Wigderson relation. We develop a coding scheme that resists up to a fraction 1/5 of corrupted transmissions in each direction of the interactive channel. We further show that such a level of noise is maximal for coding schemes with subexponential blowup in communication. Our coding scheme takes a surprising inspiration from Blockchain technology.
Joint work with Mark Braverman, Klim Efremenko, and Michael A. Yitayew
Cryptographic hash functions are the basis of many important and far-reaching results in cryptography, complexity theory, and beyond. In particular, hash functions are the primary building block of fundamental applications like digital signatures and verifiable computation, and they are the tool underlying the proofs-of-work which drive blockchains.
Because of the central role of cryptographic hash functions in both theory and practice, it is crucial to understand their security guarantees, toward basing applications on the minimal possible notion of security. Indeed, there are many ways to formalize the security requirement of a hash function; each way is sufficient for different applications. Also, there are many candidate hash functions, offering various tradeoffs between security, efficiency, and other desirable properties.
In this talk, I will present an application [Komargodski-Naor-Yogev, FOCS 2017] of a relatively weak notion of hashing (collision resistance) that goes well beyond cryptography into a fundamental problem in the intersection of complexity theory and combinatorics: the Ramsey problem. This will lead us to new emerging aspects of hash functions, including relaxed security notions and their applications, addressing a recent attack on SHA-1. I will conclude with several exciting open problems and challenges.
The tremendous success of the Machine Learning paradigm heavily relies on the development of powerful optimization methods. The canonical algorithm for training learning models is SGD (Stochastic Gradient Descent), yet this method has its limitations. It is often unable to exploit useful statistical/geometric structure, it might degrade upon encountering prevalent non-convex phenomena, and it is hard to parallelize. In this talk I will discuss an ongoing line of research where we develop alternative methods that resolve some of SGD’s limitations. The methods that I describe are as efficient as SGD, and implicitly adapt to the underlying structure of the problem in a data-dependent manner.
In the first part of the talk, I will discuss a method that is able to take advantage of hard/easy training samples. In the second part, I will discuss a method that enables an efficient parallelization of SGD. Finally, I will briefly describe a method that implicitly adapts to the smoothness and noise properties of the learning objective.
Handling the flood of deep DNA sequencing data calls for better algorithms. Minimizers are a central recent paradigm that has improved many sequence analysis tasks. We present an alternative paradigm that leads to substantial further improvement in such tasks. For integers k and L>k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We present a heuristic called DOCKS to find a compact UHS.
We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L, and by applying them to real genomes show that UHSs indeed offer substantial savings compared to minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers.
Joint work with Yaron Orenstein (BGU), David Pellow (TAU), Guillaume Marcais, Daniel Bork and Carl Kingsford (CMU)
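The UHS definition above can be checked by brute force for toy parameters; a minimal sketch follows. DOCKS exists precisely because real parameter ranges make this enumeration infeasible, and the function name here is illustrative.

```python
# Brute-force check of the universal-hitting-set property: a set of
# k-mers is a UHS for (k, L) if every L-long string over the alphabet
# contains at least one k-mer from the set. Only feasible for tiny k, L.
from itertools import product

def is_uhs(kmers, k, L, alphabet="ACGT"):
    for seq in product(alphabet, repeat=L):
        s = "".join(seq)
        if not any(s[i:i + k] in kmers for i in range(L - k + 1)):
            return False  # found an L-long sequence that avoids the set
    return True

# With k=1, L=2 over {A,C,G,T}: the set {"A"} misses e.g. "CC",
# while the full alphabet trivially hits every 2-long sequence.
print(is_uhs({"A"}, 1, 2))        # False
print(is_uhs(set("ACGT"), 1, 2))  # True
```

Minimizing the size of such a set while keeping the property is the compaction problem that DOCKS heuristically solves.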
The modern era is an era of interactive communication between many parties, who actively send messages based on the information they have received. However, errors may ruin the communication. In this talk, I will describe how one can convert a multiparty protocol into an error-resilient one, and I will also show the limits of such schemes.
Deep neural networks have emerged as an effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present an efficient technique for verifying properties of deep neural networks (or providing counterexamples). The technique can also be used to measure a network's robustness to "adversarial inputs": slight perturbations to a network's input that cause it to err. Our approach is based on the simplex method, extended to handle piecewise-linear activation functions, which are a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a deep neural network implementation of the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu), proving various properties about them and in one case identifying incorrect behavior.
[Based on joint work with Clark Barrett, David Dill, Kyle Julian and Mykel Kochenderfer]
Bio:
Guy Katz is a senior lecturer at the Hebrew University of Jerusalem, Israel. He received his Ph.D. at the Weizmann Institute of Science in 2015. His research interests lie at the intersection between Formal Methods and Software Engineering, and in particular in the application of formal methods to software systems with components generated via machine learning.
Participatory budgeting is a recent, exciting development in deliberative grassroots democracy in which, e.g., residents of a city specify their preferences on how to construct their city budget, and an algorithm is then applied to aggregate their preferences and decide upon the budget. Even though it is gaining increasing popularity and is used to decide upon many millions of dollars every year, the foundations of participatory budgeting are not yet well understood. I will discuss some recent developments, specifically novel, efficient aggregation algorithms satisfying certain axiomatic properties desired in participatory budgeting scenarios.
In this talk I will tell you how analyzing economic markets where agents have cognitive biases has led to a better understanding of the communication complexity of local search procedures.
We begin the talk by studying combinatorial auctions with bidders that exhibit the endowment effect. In most of the previous work on cognitive biases in algorithmic game theory (e.g., [Kleinberg and Oren, EC'14] and its follow-ups) the focus was on analyzing the implications of such biases and mitigating their negative consequences. In contrast, we show how cognitive biases can sometimes be harnessed to improve the outcome.
We consider one-sided error property testing of $\mathcal{F}$-minor freeness in bounded-degree graphs for any finite family of graphs $\mathcal{F}$ that contains a minor of $K_{2,k}$, the $k$-circus graph, or the $(k\times 2)$-grid for any $k\in\mathbb{N}$.
This includes, for instance, testing whether a graph is outerplanar or a cactus graph.
This is joint work with Hendrik Fichtenberger, Yadu Vasudev and Maximilian Wötzel.
The Planar Graph Metric Compression Problem is to compactly encode the distances among $k$ nodes in a planar graph of size $n$. Two naive solutions are to store the graph using $O(n)$ bits, or to explicitly store the distance matrix with $O(k^2 \log{n})$ bits. The only lower bounds are from the seminal work of Gavoille, Peleg, Pérennes, and Raz [SODA'01], who rule out compressions into a polynomially smaller number of bits, for \emph{weighted} planar graphs, but leave a large gap for unweighted planar graphs. For example, when $k=\sqrt{n}$, the upper bound is $O(n)$ and their constructions imply an $\Omega(n^{3/4})$ lower bound. This gap is directly related to other major open questions in labeling schemes, dynamic algorithms, and compact routing.
Computer science has grown out of the seed of imitation. From von Neumann's machine to the famous Turing test, which sparked the field of AI, algorithms have always tried to imitate humans and nature. Examples of such "imitation algorithms" are simulated annealing, which imitates thermodynamics, genetic algorithms, which imitate biology, or deep learning, which imitates human learning.
In this talk, I describe an algorithm which imitates human psychology. Specifically, I discuss $M$ algorithms, which serve as a simple example of psychology-based imitation algorithms. The $M$ algorithm is one of the simplest natural language processing (NLP) algorithms.
Respecting the long tradition of imitation algorithms, the $M$ algorithm is both extremely simple and extremely powerful. Like other imitation algorithms, the $M$ algorithm is able to solve extraordinarily difficult problems efficiently. The $M$ algorithm efficiently pinpoints critical events in films, theater productions, and other scripts, revealing the rhythm of the texts.
At first glance, when trying to design an algorithm which pinpoints critical events of a text, it seems necessary for the algorithm to understand the complete text. Additionally, it would be expected that all layers of the narrative, background information, etc., would also be necessary. In short, it would be expected that the algorithm would imitate the human process of comprehending a text.
Surprisingly, the $M$ algorithm utilizes the structure of the complete text itself without understanding even a *single* word, sentence, or character in order to discover critical events. The content of the narrative is not necessary for the algorithm to work. Other than an awareness of the illusion of time, borrowed from psychology, the $M$ algorithm circumvents the human process of reading.
This talk is based on a book (in process).
I discuss two models of BCI that correlate with neural error detection and with enhanced motor memory consolidation, and then apply them to a three-tier model of applied BCI for enhanced human relearning of motor skills, such as after stroke or accidents, or in general a BCI system for motor learning.
Three components are embedded in relearning of motor skills: brain signals of error detection, training to reduce errors, and memory consolidation of the improved motor execution. I integrate here three components that I have previously studied separately, into a triple-stage paradigm that includes EEG markers for error detection, BCI-based motor correction and BCI for memory consolidation of the corrected motor motion through neurofeedback processes. In my earlier work I show that error potentials are uniquely associated with error characteristics, suggesting a potential BCI system for error-tailored correction, rather than generic BCI. The third component relates to neurofeedback. My earlier studies of theta neurofeedback after motor learning showed a significant effect of enhanced motor memory consolidation in the experimental group compared to the control groups. Here I describe a preliminary three-stage protocol of motor rehabilitation: error detection, BCI for motor correction and then neurofeedback for consolidation of the corrected motion.
Computing shortest paths is one of the fundamental problems of graph algorithms.
The goal of *dynamic* single source shortest paths (SSSP) is to maintain a shortest path tree from a fixed source s as the edges of the graph change over time.
The most general case is the fully dynamic one, where each adversarial update inserts or deletes an arbitrary edge. The trivial algorithm is to recompute SSSP after every update in O(m) time. For the fully dynamic case, no nontrivial algorithm is known.
We can, however, improve upon the trivial algorithm by restricting the update sequence to be partially dynamic: only insertions (referred to as incremental), or only deletions (referred to as decremental).
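The trivial fully dynamic algorithm mentioned above is easy to make concrete: recompute shortest paths from the source after each update. A minimal sketch using Dijkstra on an undirected weighted graph (class and method names are mine):

```python
# Trivial fully dynamic SSSP: rerun Dijkstra after every edge insertion
# or deletion. Each update therefore costs about O(m log n), which is the
# baseline the partially dynamic algorithms improve upon.
import heapq
from collections import defaultdict

class TrivialDynamicSSSP:
    def __init__(self, source):
        self.source = source
        self.adj = defaultdict(dict)   # u -> {v: weight}
        self.dist = {source: 0}

    def _recompute(self):
        dist = {self.source: 0}
        pq = [(0, self.source)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue                     # stale heap entry
            for v, w in self.adj[u].items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
        self.dist = dist

    def insert(self, u, v, w=1):
        self.adj[u][v] = self.adj[v][u] = w
        self._recompute()

    def delete(self, u, v):
        self.adj[u].pop(v, None)
        self.adj[v].pop(u, None)
        self._recompute()

g = TrivialDynamicSSSP("s")
g.insert("s", "a"); g.insert("a", "b"); g.insert("s", "b", 5)
d_before = g.dist["b"]   # 2, via the path s-a-b
g.delete("a", "b")
d_after = g.dist["b"]    # 5, via the direct edge s-b
```

The incremental and decremental settings restrict which of these two update methods the adversary may call, which is what makes amortized bounds below O(m) per update possible.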
One of the simplest problems on directed graphs is that of identifying the set of vertices reachable from a designated source vertex. This problem can be solved easily sequentially by performing a graph search, but efficient parallel algorithms have eluded researchers for decades. For sparse high-diameter graphs in particular, there is no previously known parallel algorithm with both nearly linear work and nontrivial parallelism. This talk presents the first such algorithm. Specifically, this talk presents a randomized parallel algorithm for digraph reachability with work O(m*polylog(n)) and span O(n^{2/3}*polylog(n)), with high probability, where n is the number of vertices and m is the number of arcs.
The main technical contribution is an efficient Monte Carlo algorithm that, through the addition of O(n*polylog(n)) shortcuts, reduces the diameter of the graph to O(n^{2/3}*polylog(n)) with high probability. While there are existing algorithms that achieve similar guarantees, they are not computationally efficient; the sequential algorithms have running times much worse than linear. This talk presents a surprisingly simple sequential algorithm with running time O(m log^2(n)) that achieves the stated diameter reduction. Parallelizing that algorithm yields the main result, but doing so involves overcoming several additional challenges.
The colloquium lecture on Thursday, 25.1.2018, is cancelled.
The theory of NP-hardness allows computer scientists to classify nearly all problems as polynomial-time solvable (in P) or NP-hard. Recently, there has been a similar development in classifying problems within the class P. These are known as conditional lower bounds; that is, they depend on the conjectured hardness of certain archetypal problems such as CNF Satisfiability and all-pairs shortest paths (APSP). Proving conditional lower bounds for fundamental problems in P is now a flourishing research area with the goal of better understanding the landscape of P. Most of the problems for which conditional lower bounds were recently shown are Stringology problems.
In my talk I will focus on the problems of computing the edit distance of strings and the edit distance of trees. I will first show that these problems are very similar in terms of their upper bounds, by showing that their state-of-the-art algorithm is in fact the same algorithm. Its running time for strings is O(n^2) and for trees is O(n^3). Then, I will show that in terms of lower bounds the problems are in fact very different: string edit distance exhibits the hardness of the strong exponential time hypothesis (asserting that CNF Satisfiability requires O(2^n) time) while tree edit distance exhibits the hardness of the APSP hypothesis (asserting that APSP requires O(n^3) time).
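The classic quadratic-time dynamic program for string edit distance referred to above can be sketched as follows (unit costs for insertion, deletion, and substitution):

```python
# O(n*m) dynamic program for edit distance: dp[i][j] is the cost of
# transforming the prefix a[:i] into b[:j].
def edit_distance(a, b):
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i          # delete all of a[:i]
    for j in range(m + 1):
        dp[0][j] = j          # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete a[i-1]
                           dp[i][j - 1] + 1,        # insert b[j-1]
                           dp[i - 1][j - 1] + sub)  # match or substitute
    return dp[n][m]

print(edit_distance("kitten", "sitting"))  # 3
```

The conditional lower bounds in the talk say that, under SETH, no algorithm can beat this quadratic behavior by a polynomial factor.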
We present novel oblivious routing algorithms for both splittable and unsplittable multicommodity flow. Our algorithm for minimizing congestion for unsplittable multicommodity flow is the first oblivious routing algorithm for this setting. As an intermediate step towards this algorithm, we present a novel generalization of Valiant’s classical load balancing scheme for packet-switched networks to arbitrary graphs, which is of independent interest. Our algorithm for minimizing congestion for splittable multicommodity flow improves upon the state-of-the-art, in terms of both running time and performance, for graphs that exhibit good expansion guarantees. Our algorithms rely on diffusing traffic via iterative applications of the random walk operator. Consequently, the performance guarantees of our algorithms are derived from the convergence of the random walk operator to the stationary distribution and are expressed in terms of the spectral gap of the graph (which dominates the mixing time).
Joint work with Michael Schapira
The security of any system is only as good as its weakest link. Even if the system's security is theoretically proven under some set of assumptions, when faced with real-world adversaries, many of these assumptions become flaky, inaccurate and often completely incorrect.
In this talk I will present two cases for bridging this gap between security theory and security practice:
* Utilizing unintentional and abstraction-defying side-channel leakage from physical computing devices in order to extract secret cryptographic keys, and the relation of these attacks to leakage-resilient cryptography.
* Constructing and deploying secure computation schemes for arbitrary C programs.
The talk will discuss cryptographic techniques and will include live demonstrations.
Graph matching is one of the most well-studied problems in combinatorial optimization, with applications ranging from scheduling and object recognition to numerical analysis and computational chemistry.
Nevertheless, until recently very little was known about this problem in real-life **dynamic networks**, which aim to model the constantly changing physical world.
In the first part of the talk we'll discuss our work on dynamic graph matching, and in the second part we'll highlight our work on a few related problems.
Bio:
Shay Solomon is currently a Herman Goldstine Postdoctoral Fellow at IBM T. J. Watson Research Center.
Prior to joining IBM, he was a Rothschild and Fulbright Postdoctoral Fellow at Stanford University, hosted by Prof. Moses Charikar and Prof. Virginia Vassilevska Williams.
Solomon received a Ph.D. degree in Computer Science from Ben-Gurion University under the guidance of Prof. Michael Elkin.
Solomon's Ph.D. dissertation investigates several longstanding graph compression problems, and has received numerous awards, including a best student paper award for his single-authored SODA'11 paper.
His postdoctoral work focuses on fundamental computational challenges that arise when dealing with dynamic networks.
Deep neural networks are the new default machine learning approach in many domains, such as computer vision, speech processing, and natural language processing. Given sufficient data for a target task, end-to-end models can be learned with fairly simple, almost universal algorithms. Such models learn their own internal representations, which in many cases appear to be similar to human-engineered ones. This may lead us to wonder whether domain-specific techniques or domain knowledge are needed at all.
This talk will provide a perspective on these issues from the domain of speech processing. It will discuss when and how domain knowledge can be helpful, and describe two lines of work attempting to take advantage of such knowledge without compromising the benefits of deep learning. The main application will be speech recognition, but the techniques discussed are general.
Large-scale storage systems lie at the heart of the big data revolution. As these systems grow in scale and capacity, their complexity grows accordingly, building on new storage media, hybrid memory hierarchies, and distributed architectures. Numerous layers of abstraction hide this complexity from the applications, but also hide valuable information that could improve the system's performance considerably.
I will demonstrate how to bridge this semantic gap in the context of erasure codes, which are used to guarantee data availability and durability. Current theoretical research efforts focus on codes that will reduce the storage, network, and compute overheads of the systems that use them, without sacrificing their reliability. However, the semantic gap makes it difficult to observe the theoretical benefit of the resulting codes in real implementations. I will follow the example of regeneration and locally recoverable codes, showing the key challenges in applying optimal erasure codes to real systems, and how they can be addressed. This part is based on joint work with Matan Liram, Oleg Kolosov, Eitan Yaakobi, Itzhak Tamo and Alexander Barg.
I will then briefly describe the challenges introduced by the semantic gap in other layers of the "storage stack", and my experience in addressing them. I will refer to the memory hierarchy, flash-based solid-state drives, workload analysis, and aspects of data security.
Distributed protocols such as Paxos play an important role in many computer systems.
Therefore, a bug in a distributed protocol may have tremendous effects.
Accordingly, a lot of effort has been invested in verifying such protocols.
However, due to the infinite state space (e.g., an unbounded number of nodes and messages) and the complexity of the protocols, verification is both undecidable and hard in practice.
I will describe a deductive approach for verification of distributed protocols, based on first-order logic, inductive invariants and user interaction.
The use of first-order logic and a decidable fragment of universally quantified invariants makes it possible to completely automate some verification tasks.
Tasks that remain undecidable (e.g. finding inductive invariants) are solved with user interaction, based on graphically displayed counterexamples.
I will also describe the application of these techniques to verify safety of several variants of Paxos, and a way to extend the approach to verify liveness and temporal properties.
Bio:
Oded Padon is a fourth-year PhD student at Tel Aviv University, under the supervision of Prof. Mooly Sagiv. His research focuses on verification of distributed protocols using first-order logic.
He is a recipient of the 2017 Google PhD fellowship in programming languages.
Minimum Spanning Trees of weighted graphs are fundamental objects in numerous applications. In particular in distributed networks, the minimum spanning tree of the network is often used to route messages between network nodes. Unfortunately, while being most efficient in the total cost of connecting all nodes, minimum spanning trees fail miserably in the desired property of approximately preserving distances between pairs. While known lower bounds exclude the possibility of the worst-case distortion of a tree being small, Abraham et al. showed that there exists a spanning tree with constant average distortion. Yet, the weight of such a tree may be significantly larger than that of the MST. In this paper, we show that any weighted undirected graph admits a spanning tree whose weight is at most $(1+\rho)$ times that of the MST, providing constant average distortion $O(1/\rho)$.
The constant average distortion bound is implied by a stronger property of scaling distortion, i.e., improved distortion for smaller fractions of the pairs. The result is achieved by first showing the existence of a low weight spanner with small prioritized distortion, a property allowing to prioritize the nodes whose associated distortions will be improved. We show that prioritized distortion is essentially equivalent to coarse scaling distortion via a general transformation, which has further implications and may be of independent interest. In particular, we obtain an embedding for arbitrary metrics into Euclidean space with optimal prioritized distortion.
Joint work with Yair Bartal and Ofer Neiman.
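As background, the MST baseline that the result above is measured against can be computed with Kruskal's algorithm; a minimal sketch with a union-find structure (names are illustrative):

```python
# Kruskal's MST: scan edges in increasing weight order, keeping an edge
# only if it connects two different components (tracked via union-find).
def kruskal_mst(n, edges):
    """edges: list of (weight, u, v) tuples with vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components
            parent[ru] = rv
            mst.append((u, v))
            total += w
    return mst, total

# A 4-vertex example: the cycle edge (3, 0, 2) is skipped.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
mst, total = kruskal_mst(4, edges)
```

The talk's result asks for a different tree: one whose total weight is within a $(1+\rho)$ factor of this minimum, while also keeping average distortion constant, a property the plain MST does not provide.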
Random matrices have become a very active area of research in recent years and have found enormous applications in modern mathematics, physics, engineering, biological modeling, and other fields. In this work, we focus on symmetric sign (+/-1) matrices (SSMs) that were originally utilized by Wigner to model the nuclei of heavy atoms in the mid-50s. Assuming the entries of the upper triangular part to be independent +/-1 with equal probabilities, Wigner showed in his pioneering works that when the sizes of matrices grow, their empirical spectra converge to a non-random measure having a semicircular shape. Later, this fundamental result was improved and substantially extended to more general families of matrices and finer spectral properties. In many physical phenomena, however, the entries of matrices exhibit significant correlations. At the same time, almost all available analytical tools heavily rely on the independence condition, making the study of matrices with structure (dependencies) very challenging. The few existing works in this direction consider very specific setups and are limited by particular techniques, lacking a unified framework and tight information-theoretic bounds that would quantify the exact amount of structure that matrices may possess without affecting the limiting semicircular form of their spectra.
From a different perspective, in many applications one needs to simulate random objects. Generation of large random matrices requires very powerful sources of randomness due to the independence condition; the experiments are impossible to reproduce, and atypical or non-random-looking outcomes may appear with positive probability. Reliable deterministic construction of SSMs with random-looking spectra and low algorithmic and computational complexity is of particular interest due to the natural correspondence between SSMs and undirected graphs, since the latter are extensively used in combinatorial and CS applications, e.g., for the purposes of derandomization. Unfortunately, most of the existing constructions of pseudorandom graphs focus on the extreme eigenvalues and do not provide guarantees on the whole spectrum. In this work, using binary Golomb sequences, we propose a simple, completely deterministic construction of circulant SSMs with spectra converging to the semicircular law at the same rate as in the original Wigner ensemble. We show that this construction has close to the lowest possible algorithmic complexity and is very explicit. Essentially, the algorithm requires at most 2log(n) bits, implying that the real amount of randomness conveyed by the semicircular property is quite small.
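The semicircular law itself is easy to observe numerically. The sketch below (added here for illustration; it samples a classical random Wigner SSM rather than the talk's deterministic Golomb construction) checks that the normalized spectrum concentrates on the semicircle's support [-2, 2]:

```python
import numpy as np

def wigner_sign_matrix(n, rng):
    # Symmetric sign matrix: independent +/-1 entries on and above the diagonal.
    a = rng.choice([-1.0, 1.0], size=(n, n))
    return np.triu(a) + np.triu(a, 1).T

rng = np.random.default_rng(0)
n = 1000
m = wigner_sign_matrix(n, rng)

# After normalizing by sqrt(n), the empirical spectrum approaches the
# semicircle distribution supported on [-2, 2].
eigs = np.linalg.eigvalsh(m) / np.sqrt(n)
print(eigs.min(), eigs.max())  # both endpoints are close to -2 and 2
```

A deterministic construction with the same limiting spectrum would pass the same check while consuming almost no randomness.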

Bloom filters and other approximate membership query data structures (AMQs) are one of the great successes of theoretical computer science, with uses throughout databases, file systems, networks, and beyond. An AMQ maintains a set under insertions (and sometimes deletions) and answers membership queries with one-sided error: if a queried element is in the set, the AMQ returns \texttt{present}; if it is not, the AMQ returns \texttt{present} with probability at most $\epsilon$. An optimal AMQ uses $n\log\epsilon^{-1}$ bits of space when the represented set has $n$ elements.
Yet there is a gap between the theoretical bounds provided by AMQs and the requirements of applications in the field.
In this talk, I will describe how AMQs are used in practice and how this changes (for the better) our theoretical understanding of these data structures.
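As a concrete reference point, here is a minimal Bloom filter, the classic AMQ (a sketch I am adding for illustration; the parameters are arbitrary, and the practical AMQs this line of work studies, such as quotient filters, differ substantially):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: one-sided error, no deletions."""
    def __init__(self, m_bits, k_hashes):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def query(self, item):
        # Never a false negative; false positives occur with small probability.
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

bf = BloomFilter(m_bits=4096, k_hashes=5)
for word in ["cache", "disk", "network"]:
    bf.add(word)
print(bf.query("cache"))  # True: a present item is always reported present
```

The one-sided error is visible in the two branches of `query`: a stored item sets all its bits, so it can never be missed, while an absent item is misreported only if all k of its positions happen to be set.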
Martin Farach-Colton is a Professor of Computer Science at Rutgers University, New Brunswick, New Jersey. His research focuses on both the theory and practice of external memory and storage systems. He was a pioneer in the theory of cache-oblivious analysis. His current research focuses on the use of write optimization to improve performance in both read- and write-intensive big data systems. He has also worked on the algorithmics of strings and metric spaces, with applications to bioinformatics. In addition to his academic work, Professor Farach-Colton has extensive industrial experience. He was CTO and co-founder of Tokutek, a database company that was founded to commercialize his research and acquired in 2015. During 2000–2002, he was a Senior Research Scientist at Google.
Audio recordings are a data source of great value used for analyzing conversations and enabling digital assistants. An important aspect of analyzing single-channel audio conversations is identifying who said what, a task known as speaker diarization. The task is further complicated when the number of speakers is a priori unknown. In this talk we'll review the motivation for speaker diarization and verification, outline previous approaches to diarization and their relation to "real" diarization needs (as in audio applications such as Chorus.ai's Conversation Analytics platform), and present the pipeline required for integrating these solutions, as well as recent state-of-the-art end-to-end deep learning approaches to this problem.
Non-volatile memory (NVM) technologies such as PCM, ReRAM and STT-RAM allow processors to directly write values to persistent storage at speeds that are significantly faster than previous durable media such as hard drives or SSDs.
Many applications of NVM are constructed on a logging subsystem, which enables operations to appear to execute atomically and facilitates recovery from failures.
Writes to NVM, however, pass through a processor's memory system, which can delay and reorder them, impairing the correctness and increasing the cost of logging algorithms.
Reordering arises because of out-of-order execution in a CPU and the inter-processor cache coherence protocol.
By carefully considering the properties of these reorderings, this work develops a logging protocol that requires only one round trip to non-volatile memory while avoiding expensive computations.
We show how to extend the logging protocol to build a persistent set (hash map) that also requires only a single round trip to non-volatile memory for insertion, update, or deletion.
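The protocol itself is in the paper; as a rough illustration of why a single round trip can suffice, the toy simulation below appends each log entry as one contiguous write whose trailing checksum lets recovery detect a torn tail. The byte layout and the commented-out persist barrier are hypothetical, not the paper's design:

```python
import struct
import zlib

# Toy simulation of a single-round-trip log append: the entry and its
# checksum are written contiguously, so one persist barrier (flush) after
# the write would suffice, and a torn write is detected at recovery time.
LOG = bytearray()  # stands in for a region of non-volatile memory

def log_append(payload: bytes):
    entry = struct.pack("<I", len(payload)) + payload
    entry += struct.pack("<I", zlib.crc32(entry))
    LOG.extend(entry)        # one contiguous write ...
    # persist_barrier()      # ... then one flush to NVM (hypothetical)

def recover(log: bytes):
    """Return the payloads of all fully persisted entries, stopping at a torn tail."""
    out, i = [], 0
    while i + 4 <= len(log):
        (n,) = struct.unpack_from("<I", log, i)
        end = i + 4 + n
        if end + 4 > len(log):
            break  # entry or its checksum is incomplete
        (crc,) = struct.unpack_from("<I", log, end)
        if crc != zlib.crc32(log[i:end]):
            break  # torn or corrupted tail: stop replay here
        out.append(bytes(log[i + 4:end]))
        i = end + 4
    return out

log_append(b"put k1 v1")
log_append(b"put k2 v2")
print(recover(bytes(LOG)))  # both entries survive recovery
```

A recovery pass over a truncated log keeps only the complete, checksummed prefix, which is the invariant the single flush must establish.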
In this talk, I introduce the often-overlooked concept of cache admission policies. Once the cache is full, in order to admit a new item we need to evict one of the stored items. An admission policy decides whether the new item should be admitted to the cache at all. I will present two admission policies that enhance performance in different fields. The TinyLFU cache admission policy uses past frequency to estimate the benefit of data items. This policy is implemented in the Caffeine high-performance Java library and is used by many open-source projects such as Apache Cassandra, Apache Accumulo, JBoss Infinispan, Druid, Neo4j, and Spring, to name a few. It is also used in industry by companies such as Google, Netflix, and LinkedIn. Alternatively, the RAP admission policy is a randomized streaming algorithm for finding the most frequent items. In RAP, we use a coin flip to decide whether a new item should be admitted to the cache. This approach improves the space-to-accuracy ratio of state-of-the-art algorithms by up to 32x on real traces and up to 128x on synthetic ones.
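To make the notion of an admission policy concrete, here is a heavily simplified, hypothetical sketch in the spirit of frequency-based admission. Real TinyLFU uses approximate sketch counters with aging rather than the exact counts used here:

```python
from collections import Counter, OrderedDict

class AdmissionCache:
    """Toy LRU cache with a frequency-based admission test: a new item is
    admitted only if it has been seen more often than the eviction victim.
    (Simplified sketch; real TinyLFU uses approximate counters.)"""
    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = Counter()       # stand-in for an approximate counter
        self.data = OrderedDict()   # maintains LRU order

    def access(self, key):
        self.freq[key] += 1
        if key in self.data:
            self.data.move_to_end(key)
            return True             # hit
        if len(self.data) < self.capacity:
            self.data[key] = None
        else:
            victim = next(iter(self.data))          # LRU eviction candidate
            if self.freq[key] > self.freq[victim]:  # admission test
                del self.data[victim]
                self.data[key] = None
        return False                # miss

cache = AdmissionCache(capacity=2)
for key in ["hot", "hot", "hot", "cold", "warm", "scan1", "scan2", "hot"]:
    cache.access(key)
print("hot" in cache.data)  # True: the frequent item resists one-off scan traffic
```

Without the admission test, the two scan keys would have evicted "hot"; with it, low-frequency newcomers are simply rejected.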
A spanner of a given graph is a sparse subgraph that approximately preserves distances. Since their introduction in the late 1980s, spanners have found numerous applications in synchronization problems, information dissemination, routing schemes and more.
Many applications of spanners are in computer networks, where the network needs to find a spanner for its own communication graph. We present distributed algorithms for constructing additive spanners in networks of bounded message size, namely in the CONGEST model. In addition, we present an innovative technique for showing lower bounds for constructing spanners in this setting, a technique that can be useful for other distributed graph problems.
Based on joint works with Keren Censor-Hillel, Telikepalli Kavitha, Noam Ravid, and Amir Yehudayoff.
The theory of two-sided matching (e.g., assigning residents to hospitals, or students to schools) has been extensively developed, and it has been applied to design clearinghouse mechanisms in various markets in practice. As the theory has been applied to increasingly diverse types of environments, however, researchers and practitioners have encountered various forms of distributional constraints. As these features were precluded from consideration until recently, they pose new challenges for market designers. One example of such distributional constraints is a minimum quota: e.g., school districts may need at least a certain number of students in each school in order for the school to operate. In this talk, I present an overview of research on designing mechanisms that work under distributional constraints.
In this work we study a facility location problem from the perspective of mechanism design. We focus on the problem of locating a facility deterministically on discrete graphs with cycles, including grids and hypercubes. We further assume that mechanisms are defined for any number of participating agents, so that we can discuss consistency properties of mechanisms' behavior with respect to the number of agents. One such well-studied property is false-name-proofness, requiring that no agent can benefit by adding fake agents into the market. Our contribution is twofold. First, for cycle graphs, we show that there exists a false-name-proof and Pareto-efficient mechanism if and only if the underlying graph has at most five vertices. Second, for l×m grid graphs (where l,m >= 2), we show that such a mechanism exists if and only if either l=2 or m=2 holds. We finally show that for any n(>2)-dimensional binary-cube graph, there is no such mechanism.
Machine learning has attracted a lot of interest in recent years, more than ever before, as it utilizes the increasing amounts of data for many smart applications. This creates a flood of innovative ideas in this domain, constantly introducing new algorithmic and modeling techniques. The rapid pace of innovation poses a challenge in designing hardware that would best fit the needs of this domain several years from now. The goal of this work was to help cope with this challenge by constructing a benchmark that represents the variety of fundamental compute requirements relevant to this field. The talk describes the analysis method used for building this benchmark. While no one can predict the future of this domain, interesting insights were deduced from its past, and these helped hardware and software experts assess several possibilities for the future.
Traditional software is built using information hiding and decomposition: the system is broken into subsystems, which are then broken into components, and so on until the lowest object in the hierarchy is defined. With the introduction of components that are implemented using machine learning and dependent on data, a host of new challenges to system reliability is introduced. Challenges include a good match between business objectives, machine learning optimization objectives, and relevant data; stability of the quality of the solution under changes in data; artistic choices of machine learning parameters; as well as correctness of components in probability. In this talk I present an end-to-end view of these challenges and possible ways to address them, as well as some resulting interesting research directions.
Joint work with Senia Kalma, Bar Magnezi and Hen Porcilan (The College of Management Academic Studies)
Presented at IEEE Security & Privacy 2017.
We present the password reset MitM (PRMitM) attack and show how it can be used to take over user accounts. The PRMitM attack exploits the similarity of the registration and password reset processes to launch a man-in-the-middle (MitM) attack at the application level. The attacker initiates a password reset process with a website and forwards every challenge to the victim, who either wishes to register on the attacking site or to access a particular resource on it. The attack has several variants, including exploitation of a password reset process that relies on the victim’s mobile phone, using either SMS or a phone call. We evaluated the PRMitM attacks on Google and Facebook users in several experiments, and found that their password reset processes are vulnerable to the PRMitM attack. Other websites and some popular mobile applications are vulnerable as well. Although solutions seem trivial in some cases, our experiments show that the straightforward solutions are not as effective as expected. We designed two secure password reset processes and evaluated them on users of Google and Facebook. Our results indicate a significant improvement in security. Since millions of accounts are currently vulnerable to the PRMitM attack, we also present a list of recommendations for implementing and auditing the password reset process.
Affiliation: Cyberpion & College of Management Academic Studies.
This talk discusses some of the history of graphical models and neural networks and speculates on the future of both fields with examples from the particular problem of image restoration.
Labeling schemes seek to assign a label to each node in a network, so that a function on two nodes can be computed by examining their labels alone. The goal is to minimize the maximum length of a label and (as a secondary goal) the time to evaluate the function. As a prime example, we might want to compute the distance between two nodes of a network using only their labels. We consider this question for two natural classes of networks: trees and planar graphs. For trees on n nodes, we
design labels consisting of (1/4)log^2(n) bits (up to lower-order terms), thus matching a recent lower bound of Alstrup et al. [ICALP 2016]. For planar graphs, the situation is much more complex. A major open problem is to close the gap between an upper bound of O(n^{1/2} log n) bits and a lower bound of Ω(n^{1/3}) bits for unweighted planar graphs. We show that, for undirected unweighted planar graphs, there is no hope of achieving a higher lower bound using the known techniques. This is done by designing a centralized structure of size ~O(min(k^2, (kn)^{1/2})) that can calculate the distance between any pair of k designated terminal nodes. We show that this size is tight, up to polylogarithmic terms, for such centralized structures. This is complemented by an improved upper bound of O(n^{1/2}) for labeling the nodes of an undirected unweighted planar graph for calculating distances.
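As a toy illustration of the labeling-scheme interface (not the near-optimal scheme from the talk, whose labels are far smaller), one can label each tree node with its root path and then compute distances from the two labels alone:

```python
def make_labels(parent):
    """Toy distance labeling for a rooted tree: each label is the full
    root-to-node path (so labels are O(depth * log n) bits, far from the
    (1/4)log^2(n)-bit bound). The point is the interface: distances are
    computed from two labels with no access to the tree itself."""
    labels = {}
    for v in parent:
        path, u = [], v
        while u is not None:
            path.append(u)
            u = parent[u]
        labels[v] = list(reversed(path))  # root first
    return labels

def distance(label_u, label_v):
    # Depth of the lowest common ancestor = length of the common prefix - 1,
    # so the distance is the total length of both paths past that prefix.
    common = 0
    for a, b in zip(label_u, label_v):
        if a != b:
            break
        common += 1
    return (len(label_u) - common) + (len(label_v) - common)

# Small example tree: 0 is the root; 1 and 2 are children of 0; 3 is a child of 1.
parent = {0: None, 1: 0, 2: 0, 3: 1}
labels = make_labels(parent)
print(distance(labels[3], labels[2]))  # 3, via the path 3-1-0-2
```

Shrinking these labels toward the information-theoretic limit is exactly where the (1/4)log^2(n) construction comes in.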
Sequence-to-sequence learning (Sutskever et al., 2014) equipped with the attention mechanism (Bahdanau et al., 2015) became the state-of-the-art approach to machine translation, automatic summarization, and many other natural language processing tasks, while being much simpler than the previously dominant methods. In this talk we will overview sequence-to-sequence learning with neural networks, followed by two of our recent contributions: the Hard Attention mechanism for linear-time inference, and the String-to-Tree model for syntax-aware neural machine translation.
Consider a multi-layered graph, where the different layers correspond to different proprietary social networks on the same ground set of users. Suppose that the owners of the different networks (called hosts) are mutually non-trusting parties: how can they compute a centrality score for each of the users using all the layers, but without disclosing information about their private graphs? Under this setting we study a suite of three centrality measures whose algebraic structure allows performing that computation with provable security and efficiency. The first measure counts the nodes reachable from a node within a given radius. The second measure extends the first one by counting the number of paths between any two nodes. The final one is a generalization to the multi-layered graph case: not only is the number of paths counted, but the multiplicity of these paths in the different layers is also considered.
We devise a suite of multiparty protocols to compute those centrality measures, which are all provably secure in the information-theoretic sense. One typical challenge and limitation of secure multiparty computation protocols is their scalability. We tackle this problem and devise a protocol which is highly scalable and still provably secure.
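For reference, the first measure is straightforward to compute in the clear; the difficulty the talk addresses is computing it without revealing the private graphs. A plaintext sketch:

```python
from collections import deque

def reach_centrality(adj, source, radius):
    """Number of nodes reachable from `source` within `radius` hops,
    via breadth-first search. This is the plaintext version of the first
    centrality measure; the talk's contribution is computing it securely
    across layers owned by mutually non-trusting hosts."""
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == radius:
            continue  # do not expand past the radius
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append((v, d + 1))
    return len(seen) - 1  # exclude the source itself

# Small undirected example: a path 0-1-3-4 plus a leaf 2 hanging off node 0.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
print(reach_centrality(adj, 0, 2))  # 3: nodes 1, 2, and 3 are within 2 hops
```

In the secure setting, the union of layers (and therefore the reachable set) must be computed under encryption or secret sharing, never materialized at any single host.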
Background:
Multilayer neural networks, trained with simple variants of stochastic gradient descent (SGD), have achieved state-of-the-art performance in many areas of machine learning. It has long been a mystery why SGD works so well, rather than converging to suboptimal local minima with high training error (and therefore high test error).
Results:
We examine a neural network with a single hidden layer, quadratic loss, and piecewise linear units, trained in a binary classification task on standard normal input. We prove that the volume of differentiable regions of the empirical loss containing suboptimal differentiable local minima is exponentially vanishing in comparison with the volume of global minima, given "mild" (polylogarithmic) over-parameterization. This suggests why SGD tends to converge to global minima in such networks.
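A miniature instance of this setting can be trained directly: one hidden layer of ReLU (piecewise linear) units, quadratic loss, binary labels, and standard normal inputs, optimized with plain SGD. The sizes and learning rate below are arbitrary choices of mine, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data: standard normal inputs with +/-1 labels from a random linear rule.
d, h, n = 5, 16, 256
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

# One hidden layer of ReLU units, linear output, quadratic loss.
W1 = rng.standard_normal((d, h)) * 0.1
w2 = rng.standard_normal(h) * 0.1

def forward(x):
    a = np.maximum(x @ W1, 0.0)  # ReLU activations
    return a, a @ w2

def quadratic_loss():
    _, out = forward(X)
    return 0.5 * np.mean((out - y) ** 2)

loss_before = quadratic_loss()
lr = 0.002
for epoch in range(10):
    for i in rng.permutation(n):       # plain per-sample SGD
        a, out = forward(X[i])
        err = out - y[i]               # d(loss)/d(output)
        grad_w2 = err * a
        grad_W1 = np.outer(X[i], err * w2 * (a > 0))  # ReLU gate
        w2 -= lr * grad_w2
        W1 -= lr * grad_W1
loss_after = quadratic_loss()
print(loss_after < loss_before)  # SGD reaches a lower empirical loss
```

The paper's result concerns where such runs end up: with mild over-parameterization, the differentiable regions around bad local minima occupy exponentially little volume compared with those around global minima.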
As robots become integrated into human environments, they increasingly interact directly with people. This is particularly true for assistive robots, which help people through social interactions (like tutoring) or physical interactions (like preparing a meal). Developing effective human-robot interactions in these cases requires a multidisciplinary approach involving both fundamental algorithms from robotics and insights from cognitive science. My research brings together these two areas to extend the science of human-robot interaction, with a particular focus on assistive robotics. In the process of developing cognitively inspired algorithms for robot behavior, I seek to answer fundamental questions about human-robot interaction: What makes a robot appear intelligent? How can robots communicate their internal states to human partners to improve their ability to collaborate? And vice versa, how can robots "read" human behaviors that reveal people's goals, intentions, and difficulties, to identify where assistance is required?
In this talk, I describe my vision for robots that collaborate with humans on complex tasks by leveraging natural, intuitive human behaviors. I explain how models of human attention, drawn from cognitive science, can help select robot behaviors that improve human performance on a collaborative task. I detail my work on algorithms that predict people's mental states based on their eye gaze and provide assistance in response to those predictions. And I show how breaking the seamlessness of an interaction can make robots appear smarter. Throughout the talk, I will describe how techniques and knowledge from cognitive science help us develop robot algorithms that lead to more effective interactions between people and their robot partners.
That context plays an important – possibly crucial – role in language processing and decision making is a truism. But what counts as context differs across tasks and across people. I will discuss several facets of context used as major sources of the heuristic knowledge required of an artificial intelligent agent emulating human functioning. I will concentrate on tasks related to language understanding and generation, as well as decision making in a goal-oriented task environment where the agent is a member of a team whose members can be other agents or humans.
I will be happy to talk to whoever would want to talk with me!
But, of course, the main reason for my visit is being able to talk with you about potential collaborations.
Communication has played a growing role in our lives over the past decade. It often becomes a bottleneck in the performance of our daily tasks. This motivates the pursuit of more efficient communication. However, efficiency is becoming more challenging from the computational aspect, due to several characteristics of modern communications.
One such characteristic is the interactivity of protocols in today's noisy communication environments. One natural approach to overcoming this challenge is called interactive coding. Interactive coding is an efficient black-box mechanism that converts any interactive protocol that performs well in a noiseless environment into one that is also resilient to errors, while preserving the efficiency of the protocol.
Another characteristic which challenges today’s communications is the dynamic and interactive nature of modern protocols. This may lead to desynchronization between the communicating parties. However, until now, almost all designed systems assume that both parties are perfectly synchronized and all context is shared perfectly by the communicating agents. Thus, any violation of those protocols leads to a breakdown in communication.
This talk will address both of the aforementioned challenges. The first part of the talk will be devoted to interactive coding, and in the second part we will focus on designing protocols under uncertainty about the shared context.
I will describe two branches of my work related to algorithms for distributed networks.
The main focus will be devoted to Fault-Tolerant (FT) network structures.
The undisrupted operation of structures and services is a crucial requirement in modern-day communication networks. As the vertices and edges of the network may occasionally fail or malfunction, it is desirable to make those structures robust against failures.
FT network structures are low-cost, highly resilient structures, constructed on top of a given network, that satisfy certain desirable performance requirements concerning, e.g., connectivity, distance, or capacity. We will overview some results on fault-tolerant graph structures, with a special focus on FT Breadth-First Search.
The second part of the talk will discuss distributed models and algorithms for largescale networks. Towards the end, we will see some connections between distributed computing and other areas such as EE and Biology.
Arrow's famous impossibility theorem (1951) states that there is no perfect voting rule: for three or more candidates, no voting rule can satisfy a small set of very appealing axioms. However, this is no longer the case if we assume that voters' preferences satisfy certain restrictions, such as being single-peaked or single-crossing. In this talk, we discuss single-peaked and single-crossing elections, as well as some other closely related restricted preference domains, and provide an overview of recent algorithmic results for these domains.
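A classical consequence of single-peakedness, Black's median voter theorem, is easy to demonstrate: the median of the voters' peaks beats every other candidate in a pairwise majority vote. The sketch below assumes the special case where each voter ranks candidates by distance from their peak (symmetric single-peaked preferences):

```python
def condorcet_winner_single_peaked(peaks):
    """For single-peaked preferences over an ordered axis, Black's theorem
    says the median of the voters' peaks is a Condorcet winner."""
    s = sorted(peaks)
    return s[len(s) // 2]

def beats(x, y, peaks):
    """Does x beat y in a pairwise majority vote? Each voter is assumed
    to prefer the candidate closer to their own peak."""
    x_votes = sum(1 for p in peaks if abs(p - x) < abs(p - y))
    y_votes = sum(1 for p in peaks if abs(p - y) < abs(p - x))
    return x_votes > y_votes

peaks = [1, 2, 2, 5, 9]  # five voters' ideal points on the axis
winner = condorcet_winner_single_peaked(peaks)
print(winner)  # 2: the median peak
```

On unrestricted preference profiles no such winner need exist, which is exactly the gap that Arrow's theorem formalizes and that these restricted domains escape.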
Algorithms and the Internet are revolutionizing how society allocates its resources. Examples range from wireless spectrum and electricity to online advertising and carpooling opportunities. A fundamental question is how to allocate such resources efficiently by designing robust computational markets.
In this talk I will demonstrate recent progress on this question by considering a problem crucial for major industry players like Google: how to design revenue-maximizing allocation mechanisms. Most existing designs hinge on “getting the price right” – selling goods to buyers at prices low enough to encourage a sale, but high enough to garner nontrivial revenue. This approach is difficult to implement when the seller has little or no a priori information about buyers’ valuations, or when the setting is sufficiently complex, as in the case of markets with heterogeneous goods.
I will show a robust approach to designing auctions for revenue, which “lets the market do the work” by allowing prices to emerge from enhanced competition for scarce goods. This work provides guidelines for a seller choosing among data acquisition, sophisticated pricing, and investment in drawing additional buyers.
Bio: Inbal Talgam-Cohen is a Marie Curie postdoctoral researcher at HUJI and a visiting postdoctoral researcher at TAU. She holds a PhD from Stanford supervised by Tim Roughgarden, an MSc from Weizmann, and a BSc from TAU in computer science, as well as an LLB in law. Her research is in algorithmic game theory, including computational and data aspects of market design and applications to Internet economics. Her awards include the 2015 Best Doctoral Dissertation Award of ACM SIGecom, the Stanford Interdisciplinary Graduate Fellowship, and the Best Student Paper Award at EC’15.
In the past several years, crowd activities have expanded dramatically in both their scope and their size – more diverse activities are involving online crowds, and the number of people (and money) involved in such activities is growing fast. I will focus in this talk on two topics, which share as a central feature the partitioning of the crowd into subgroups.
In peer selection, agents try to select the top k agents among themselves. We present several algorithms that try to make this process impartial, i.e., agents will not benefit from lying about their peers, while still choosing "good" agents. We will show algorithms that provably achieve impartiality using the division of the agents into subgroups, and then show a set of simulations that give us further insight into which techniques work better in scenarios resembling real-world settings.
From there, we will move to district selection: decision making in settings where each partition (a company subunit, an electoral district, etc.) reaches a decision, and the decisions from all partitions are amalgamated into a final choice. We explore how this affects representability, and in the case of geographical manipulations ("gerrymandering"), present both computational hardness results and experiments on real-world data.
Parameterized Complexity was introduced in the 1990s by Downey and Fellows. Nowadays, various aspects of this field are ubiquitous, vibrant areas of research. In a nutshell, Parameterized Complexity aims to reduce the running times of algorithms for NP-hard problems by confining the combinatorial explosion to a parameter k. More precisely, a problem is fixed-parameter tractable (FPT) with respect to a parameter k if it can be solved by an algorithm, called a parameterized algorithm, whose time complexity is bounded by f(k)n^{O(1)} for some function f that depends only on k. By careful examination of real datasets, one can choose a parameter k that will often be significantly smaller than the input size n. Indeed, parameterized problems arise in a variety of areas outside of pure theoretical computer science and math.
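The definition is well illustrated by the textbook example of Vertex Cover parameterized by solution size k, solvable by a bounded search tree in O(2^k · m) time. This is a standard example, not specific to this talk:

```python
def vertex_cover_at_most_k(edges, k):
    """Classic FPT branching: decide whether the graph has a vertex cover
    of size at most k. Pick any uncovered edge (u, v); some endpoint must
    be in the cover, so branch on both choices. The search tree has at
    most 2^k leaves, giving f(k) * poly(n) total time."""
    if not edges:
        return True   # every edge is covered
    if k == 0:
        return False  # an edge remains but the budget is exhausted
    u, v = edges[0]
    without_u = [e for e in edges if u not in e]  # take u into the cover
    without_v = [e for e in edges if v not in e]  # take v into the cover
    return (vertex_cover_at_most_k(without_u, k - 1)
            or vertex_cover_at_most_k(without_v, k - 1))

# A 4-cycle has a vertex cover of size 2 (two opposite vertices) but not 1.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(vertex_cover_at_most_k(cycle, 2), vertex_cover_at_most_k(cycle, 1))
```

Here the combinatorial explosion is confined entirely to k: the 2^k branching factor never touches n, which is the essence of fixed-parameter tractability.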
In this talk, I give a brief overview of some of my recent contributions in Parameterized Complexity. In the first part of my talk, I present results in core research areas such as Kernelization and Parameterized Algorithms. Here, problems are primarily concerned with graphs. However, in practice, inputs are not necessarily graphs. In the second part of my talk, I address this issue. Here, I describe my foray into input domains that consist of polygons, preferences and matrices.
With the proliferation of multicore processors, shared-memory concurrent programming has become increasingly important. Nevertheless, despite decades of research, there are still no adequate answers to the following fundamental questions:
1. What is the right semantics for concurrent programs in higher-level languages?
2. Which reasoning principles apply to realistic shared-memory concurrency?
Concerning the first question, the challenge lies in simultaneously allowing efficient implementation on modern hardware with compiler optimizations, while exposing a well-behaved programming model. Both the Java and C/C++ standards tried to address this question, but their solutions were discovered to be flawed: they either allow spurious "out-of-thin-air" program behaviors or incur a prohibitive runtime cost. In the main part of the talk, I will present a solution to this problem based on a novel idea of promises, which supports efficient implementations and useful programming guarantees.
As for the second question, I will outline my ongoing efforts to identify (i) logical laws for deductive reasoning about concurrent programs; and (ii) programming disciplines that enable the application of well-established verification techniques.
Short Bio:
Ori Lahav is a postdoctoral researcher at the Max Planck Institute for Software Systems (MPI-SWS). His research interests include programming languages, formal methods and verification, semantics and logic. Previously, Ori was a postdoctoral researcher at the Programming Languages Group at Tel Aviv University. He obtained his Ph.D. from Tel Aviv University in 2014 under the supervision of Arnon Avron. Ori received a Dan David Prize scholarship for young researchers in 2014, the Kleene Award for the best student paper at LICS 2013, and a Wolf Prize for excellent graduate students in 2012.
This talk covers two closely related lines of work.
For a number of speech tasks, it can be useful to represent speech segments of arbitrary length by fixed-dimensional vectors, or embeddings. In particular, vectors representing word segments, called acoustic word embeddings, can be used in query-by-example tasks, example-based speech recognition, or spoken term discovery. *Textual* word embeddings have been common in natural language processing for a number of years now; the acoustic analogue is only recently starting to be explored. This talk will present our work on acoustic word embeddings, including a variety of models in unsupervised, weakly supervised, and supervised settings.
Teamwork is a core human activity, essential to progress in many areas. A vast body of research in the social sciences and in computer science has studied teamwork and developed tools to support it. Although the technologies resulting from this work have enabled teams to work together more effectively in many settings, they have proved inadequate for supporting the coordination of distributed teams that operate in a loosely coupled manner. In this talk, I will present three integrated research efforts towards developing intelligent systems that reduce coordination overhead in such teams: an in-depth formative study of complex healthcare teams, the design of new computational methods for personalizing the information shared with team members about collaborators' activities, and an evaluation of those methods in a realistic teamwork setting, demonstrating that personalized information sharing resulted in higher productivity and lower perceived workload of team members.
We are pleased and honored by the presence of
Mr. Roger Grass,
donor of the Computer Science building.
Privacy-Preserving Computations in the Digital Era
Speaker: Prof. Yehuda Lindell
Natural Language Processing with Neural Networks
Speaker: PhD student Roee Aharoni
Automated Agents that Interact Proficiently with People
Speaker: Prof. Sarit Kraus
There is a lot of concern, among legislators, experts, and the general public, about the loss of privacy. In the computer science research community, extensive efforts are dedicated to developing Privacy Enhancing Technologies, mostly based on cryptography; much of this effort is designed to provide privacy even against (cloud) providers.
However, the reality is that users explicitly agree to share their private data with a few providers (Google, FB, ...), although these providers explicitly state that they may make commercial use of this private data, and in spite of numerous incidents where such data was exposed to unauthorized third parties, and papers showing design failures allowing such unauthorized disclosure of private data. This is known as the privacy paradox.
We present a simple economic analysis of privacy, explaining the privacy paradox and showing a potential market failure that harms social welfare. We discuss possible solutions, including regulatory and cryptographic controls.
Recently, many popular Instant Messaging (IM) applications announced support for end-to-end encryption, claiming confidentiality even against a rogue operator. Is this, finally, a positive answer to the basic challenge of usable security presented in the seminal paper, ‘Why Johnny Can’t Encrypt’?
Our work evaluates the implementations of end-to-end encryption in popular IM applications (WhatsApp, Viber, Telegram, and Signal) against established usable-security principles, and in quantitative and qualitative usability experiments. Unfortunately, although participants expressed interest in confidentiality, even against a rogue operator, our results show that the current mechanisms are impractical to use, leaving users with only the illusion of security.
Hope is not lost. We conclude with directions that may allow usable end-to-end encryption for IM applications.
Connectivity-related concepts are of fundamental interest in graph theory. The area has received extensive attention over four decades, but many problems remain unsettled, especially for directed graphs. A strongly connected directed graph is 2-edge-connected (resp., 2-vertex-connected) if the removal of any edge (resp., vertex) leaves the graph strongly connected. In this talk we present improved algorithms for computing the maximal 2-edge- and 2-vertex-connected induced subgraphs of a given directed graph. These problems were first studied more than 35 years ago, with O(mn)-time algorithms for graphs with m edges and n vertices known since the late 1980s. In contrast, the same problems for undirected graphs are known to be solvable in linear time. For the directed case, we showed 1) O(n^2)-time algorithms in joint work with Monika Henzinger and Sebastian Krinninger (ICALP'15) and 2) O(m^1.5)-time algorithms in joint work with Shiri Chechik, Thomas D. Hansen, Giuseppe F. Italiano, and Nikos Parotsidis (SODA'17).
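For orientation, testing whether a directed graph is 2-edge-connected has a simple O(m(m+n)) baseline: delete each edge in turn and retest strong connectivity. The sketch below (mine, for illustration; the talk's algorithms are far faster and also compute the maximal subgraphs) implements that baseline:

```python
def strongly_connected(n, edges):
    """Strong connectivity via two reachability passes: node 0 must reach
    every node in the graph and in its reverse."""
    def full_reach(adj):
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    fwd = [[] for _ in range(n)]
    rev = [[] for _ in range(n)]
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    return full_reach(fwd) and full_reach(rev)

def two_edge_connected(n, edges):
    """Naive O(m(m+n)) test: the graph is 2-edge-connected iff it stays
    strongly connected after the removal of any single edge."""
    return strongly_connected(n, edges) and all(
        strongly_connected(n, edges[:i] + edges[i + 1:])
        for i in range(len(edges)))

one_way = [(0, 1), (1, 2), (2, 0)]                  # a single directed 3-cycle
two_way = one_way + [(1, 0), (2, 1), (0, 2)]        # the same cycle in both directions
print(two_edge_connected(3, one_way), two_edge_connected(3, two_way))
```

The one-way cycle fails (removing any edge disconnects it) while the bidirected triangle survives every single-edge deletion; the O(n^2) and O(m^1.5) algorithms from the talk avoid the per-edge recomputation entirely.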
Everyone wants to program with “high-level concepts”, rather than meddle with the fine details of the implementation, such as pointers, network packets, and asynchronous callbacks. This is usually achieved by introducing layers of abstraction – but every layer incurs some overhead, and when they accumulate, this overhead becomes significant and sometimes prohibitive. Optimizing the program often requires breaking abstractions, which leads to suboptimal design, higher maintenance costs, and subtle hard-to-trace bugs.
I will present two recent projects that attempt to address this difficulty. TransCal is a rewrite-based interactive synthesis engine that helps a developer mechanically transform a high-level program into a tractable, semantically equivalent version. TransCal attempts to encode and reuse practices that programmers usually reason about informally. The programs targeted are general functional-style programs with operations on collections.
Bellmania is a specialized tool for accelerating dynamic-programming algorithms by generating recursive divide-and-conquer implementations of them. Recursive divide-and-conquer is an algorithmic technique that was developed to obtain better memory locality properties. Bellmania includes a high-level language for specifying dynamic programming algorithms and a calculus that facilitates gradual transformation of these specifications into efficient implementations. These transformations formalize the divide-and-conquer technique; a visualization interface helps users to interactively guide the process, while an SMT-based backend verifies each step and takes care of low-level reasoning required for parallelism.
Projected gradient descent (PGD), and its close variants, are often considered the methods of choice for solving a large variety of machine learning optimization problems, including empirical risk minimization, statistical learning, and online convex optimization. This is not surprising, since PGD is often optimal in a very appealing information-theoretic sense. However, for many problems PGD is infeasible both in theory and practice, since each step requires computing an orthogonal projection onto the feasible set. In many important cases, such as when the feasible set is a non-trivial polytope or a convex surrogate for a low-rank structure, computing the projection is computationally inefficient in high-dimensional settings. An alternative is the conditional gradient method (CG), aka the Frank-Wolfe algorithm, which replaces the expensive projection step with a linear optimization step over the feasible set. Indeed, in many problems of interest the linear optimization step admits much more efficient algorithms than the projection step, which is the reason for the substantial renewed interest in this method in the past decade. On the downside, the convergence rates of the CG method often fall behind those of PGD and its variants.
In this talk I will survey an ongoing effort to design CG variants that on one hand enjoy the cheap iteration complexity of the original method, and on the other hand converge provably faster and are applicable to a wider variety of machine learning settings. In particular I will focus on the cases in which the feasible set is either a polytope or a convex surrogate for low-rank matrices. Results will be demonstrated on applications including LASSO, video co-localization, optical character recognition, matrix completion, and multi-class classification.
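To make the projection-free idea concrete, here is a minimal sketch of the basic conditional gradient (Frank-Wolfe) method for a least-squares objective over the probability simplex, where the linear optimization step reduces to selecting a single vertex. This is the textbook variant, not the faster variants from the talk; all names and the toy problem are illustrative:

```python
import numpy as np

def frank_wolfe_simplex(A, b, iters=500):
    """Minimize ||Ax - b||^2 over the probability simplex via conditional gradient."""
    n = A.shape[1]
    x = np.ones(n) / n                    # feasible starting point (uniform)
    for t in range(iters):
        grad = 2 * A.T @ (A @ x - b)      # gradient of the objective
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0          # linear minimization: best simplex vertex
        gamma = 2.0 / (t + 2)             # standard step-size schedule
        x = (1 - gamma) * x + gamma * s   # convex combination stays feasible
    return x
```

Note that no projection is ever computed: each iterate is a convex combination of simplex vertices, so feasibility is maintained for free.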
The colloquium lecture on Thursday, 23.6.16, is cancelled.
We propose a new measurement scheme for low-rank matrix recovery, where each measurement
is a linear combination of elements in one row or one column of the unknown matrix.
This setting arises naturally in applications, but current algorithms, developed for standard matrix recovery problems, do not perform well in this case; hence the need for developing new algorithms and theory.
We propose a simple algorithm for the problem based on Singular Value Decomposition (SVD) and least-squares (LS), which we term SVLS. We prove favourable theoretical guarantees for our algorithm for both the noiseless and noisy cases, compared to standard matrix completion measurement schemes and algorithms. Simulations show improved speed and accuracy, including for the problem of unknown rank estimation.
Our results suggest that the proposed row-and-column measurement scheme, together with our recovery algorithm, may provide a powerful framework for affine matrix recovery.
Time permitting, I will also describe progress on other data analysis projects for sparse and structured-sparse data, including a new group-sparse clustering algorithm.
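To illustrate the flavor of row-and-column measurements, the following toy sketch recovers a noiseless low-rank matrix from measurements Y_rows = X G_rows (each entry a linear combination within one row of X) and Y_cols = G_cols X (combinations within one column), via an SVD step followed by least squares. This is only a hedged illustration of the general idea, not the authors' SVLS algorithm or its guarantees:

```python
import numpy as np

def recover_from_row_col_measurements(Y_rows, Y_cols, G_cols, rank):
    """Toy noiseless recovery of a rank-`rank` matrix X from
    Y_rows = X @ G_rows and Y_cols = G_cols @ X."""
    # SVD step: the leading left singular vectors of Y_rows span col(X),
    # since Y_rows = X @ G_rows shares X's column space (generically).
    U, _, _ = np.linalg.svd(Y_rows, full_matrices=False)
    U = U[:, :rank]
    # Least-squares step: X = U @ Z, so Y_cols = (G_cols @ U) @ Z; solve for Z.
    Z, *_ = np.linalg.lstsq(G_cols @ U, Y_cols, rcond=None)
    return U @ Z
```

In the noiseless setting with generic random measurement matrices of width at least the rank, this sketch recovers X exactly.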
TBA
Transfer learning plays a major role in the recent success of deep face recognition methods. Deep networks are trained to solve the multi-class classification problem using a cross-entropy loss, and the learned representations are then transferred to a different domain. Moreover, the task changes post-transfer to face verification (same/not-same). In the talk, I will point to a few research questions, including: What is the ideal source dataset? How to train in the target domain? How to estimate the certainty of the identification?
One of the observations we make is that the transferred representations support only a few modes of separation and much of their dimensionality is underutilized. To alleviate this, we suggest learning, in the source domain, multiple orthogonal classifiers. We prove that this leads to a reduced rank representation, which, however, supports more discriminative directions. For example, we obtain the 2nd best result on LFW for a single network. This is achieved using a training set that is a few orders of magnitude smaller than that of the leading literature network, and using a very compact representation of only 51 dimensions.
Given time, I will describe recent achievements in other computer vision domains: optical flow, action recognition, image annotation, and more.
Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the distributed LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this talk we show that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each of our results has a very compelling takeaway message:
1. Fast Delta-coloring of trees requires random bits.
2. Randomized lower bounds imply higher deterministic lower bounds.
3. Deterministic lower bounds imply randomized lower bounds.
Joint work with Yi-Jun Chang and Seth Pettie.
We will give an overview of technologies built at Google, for Google, and how we externalize some of them to the world, either as a service, through open-source software, or both. We will talk about several products, including:
 Google Cloud Platform: A review of the products that are part of it and when it makes sense to use the Cloud
 Big Data tools and services: Best practices and demo showing services that make big data manageable
 Machine Learning: TensorFlow open-sourcing and Cloud ML, with a preview of the Vision API and other libraries.
The lecture will be one hour to an hour and a half, and will be held in English.
The presenter will be Matthieu Mayran, Cloud Solutions Architect
The research division of IBM has been publishing a monthly challenge, Ponder This, since May 1998. In the talk we will walk through some of these challenges, give some examples of the solvers and their solutions, and share some insights on the 'behind the scenes' of authoring Ponder This.
I will talk about fair division for strategic agents and will start with the cake cutting problem, which models the division of a heterogeneous divisible good among players with different preferences. Here I will talk about an impossibility result (dictatorship theorem), as well as an
algorithmic framework for strategic agents, where fair outcomes can be implemented in the equilibria of protocols.
Time permitting, I will also mention the problem of allocating multiple divisible goods under the fairness solution concept of Nash social welfare (NSW). When the preferences are known, this solution is achieved in the Fisher market equilibrium outcomes. However, with strategic agents the strong fairness guarantees of market equilibria can be destroyed. We find that for additive valuations, the Fisher market approximates the NSW within a factor of 2, but for Leontief utilities the approximation degrades linearly with the number of players. Surprisingly, a mechanism known as Trading Post not only retains the constant 2-approximation of Fisher for additive utilities, but approximately implements the NSW objective for Leontief valuations: its price of anarchy is 1 + epsilon for every epsilon > 0.
This is joint work with Peter Bro Miltersen, Ioannis Caragiannis, David Kurokawa, Ariel Procaccia, Vasilis Gkatzelis, and Ruta Mehta.
Matrix decompositions, and especially the SVD, are very important tools in data analysis. When big data is processed, the computation of matrix decompositions becomes expensive and impractical. In recent years, several algorithms that approximate matrix decompositions have been developed. These algorithms are based on metric-preserving properties of random projections onto linear subspaces. We present a randomized method, based on a sparse random matrix distribution, that achieves a fast approximation with bounded error for low-rank matrix decomposition.
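The random-projection template behind such approximations can be sketched with a dense Gaussian test matrix. The method presented in the talk uses a sparse random matrix distribution instead, so the following generic sketch is only an illustration of the overall idea, not the presented algorithm:

```python
import numpy as np

def randomized_svd(A, rank, oversample=5, seed=0):
    """Approximate rank-`rank` SVD of A via a random projection sketch."""
    rng = np.random.default_rng(seed)
    k = rank + oversample
    Y = A @ rng.standard_normal((A.shape[1], k))  # sketch the column space of A
    Q, _ = np.linalg.qr(Y)                        # orthonormal basis for the sketch
    # Project A onto the small subspace and take an exact SVD there.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]
```

The expensive full SVD on A is replaced by a matrix multiply plus an SVD of a much smaller matrix; sparse test matrices make the multiply itself cheaper still.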
In this talk I will present our recent successful integration of various techniques from the Artificial Intelligence (AI) literature into the software debugging and testing process. First, we show how data that is already stored by industry-standard software engineering tools can be used to learn a fault prediction model that accurately predicts which software components are likely to contain bugs. This allows focusing testing efforts on such error-prone components. Then, we show how this learned fault prediction model can be used to augment existing software diagnosis algorithms, providing a better understanding of which software components need to be replaced to correct an observed bug. Moreover, for the case where further tests are needed to identify the faulty software component, we present a test-planning algorithm based on Markov Decision Processes (MDPs).
Importantly, the presented approach, which combines a fault prediction model learned from past failures with a model-based diagnosis algorithm, is general and can be applied to other fields beyond software troubleshooting.
I will describe how dynamic pricing can be used to solve parking-related problems. I will also discuss dynamic pricing for k-server problems, task systems, unit-demand bidders, and other problems.
Miniature crawling robots, on the order of centimeter scale and below, have a variety of applications, including medical procedures, search and rescue operations, maintenance, reconnaissance, environmental monitoring, assisted agriculture, cave exploration, mapping, studying biological organisms, security, defense, and exploration of hazardous environments. Their small size allows them to navigate in remote areas otherwise inaccessible to wheeled or larger robots, such as collapsed buildings, caves, and biological vessels, while their low cost makes them disposable and allows their use in large quantities and swarm formations. As miniature crawling robots at this scale are underactuated, key challenges include achieving stability at high velocities over varying terrain, reducing the cost of transport, overcoming obstacles, controlling jumping and landing, adhering quickly to different surfaces, transitioning between horizontal and vertical motion, and climbing.
This talk will address some of the modeling and actuation challenges of crawling robots outdoors and inside biological vessels, while taking into account contact compliance and sliding.
Biography: David Zarrouk is currently a Senior Lecturer at the Mechanical Engineering Department of Ben-Gurion University (BGU). He received his M.S. (2007) and Ph.D. (2011) degrees from the Faculty of Mechanical Engineering at the Technion. Between Aug. 2011 and Oct. 2013, he was a postdoctoral scholar at the Biomimetics and Millisystems Lab in the EECS Department of U.C. Berkeley, working on miniature crawling robots. His research interests are in the fields of biomimetics, millisystems, miniature robotics, flexible and slippery interactions, underactuated mechanisms, and theoretical kinematics. He has received many prizes for excellence in research and teaching, including a Fulbright Postdoctoral Fellowship, a Fulbright-Ilan Ramon Postdoctoral Fellowship, the Hershel Rich Innovation Award, the Aharon and Ovadia Barazani Prize for an excellent Ph.D. thesis, and the Alfred and Yehuda Weisman Prize for consistent excellence in teaching.
People collaborate in carrying out such complex activities as treating patients, co-authoring documents and developing software. Technologies such as Google Drive, Dropbox and Github enable teams to share work artifacts remotely and asynchronously. The coordination of team members’ activities remains a challenge, however, because these technologies do not have capabilities for focusing people’s attention on the actions taken by others that matter most to their own activities. In this talk, I will present our work towards developing intelligent systems for supporting information sharing in distributed teams. Based on a study of complex health care teams, we formalize the problem of information sharing in loosely coupled, extended-duration teamwork. We develop a new representation, Mutual Influence Potential Networks, that implicitly learns collaboration patterns and dependencies among activities from team members’ interactions, and an algorithm that uses this representation to determine the information that is most relevant to each team member. Analysis of Wikipedia revision history data and an evaluation in a simulation environment demonstrate the ability of the proposed approach to identify relevant information to share with team members.
Joint work with Barbara Grosz and Krzysztof Gajos.
System synthesis refers to the task of automatically generating an executable component of a system (e.g., a software or hardware component) from a specification of the component's behavior. The traditional formalization of the problem assumes the specification is given by a logical formalism. Recent trends in synthesis relax the problem definition and consider a variety of inputs, including logical requirements, incomplete programs, and example behaviors. In this talk I will describe some of the challenges on the road to usable synthesis, a variety of current approaches for coping with them, and some success stories.
Short bio:
Dana Fisman is a research scientist at the University of Pennsylvania, the Associate Director of the NSF expedition ExCAPE on system synthesis, and a visiting fellow at Yale University. She did her PhD at the Weizmann Institute under the supervision of Amir Pnueli, and worked for many years in industry, at IBM Haifa Research Labs and at Synopsys Inc. Dana’s research interests are in the area of formal methods in system design. She is best known for her work on PSL, the IEEE standard for property specification language, for which she received numerous awards from IEEE, IBM and Synopsys.
In recent years, machine learning has emerged as an important and influential discipline in computer science and engineering. Modern applications of machine learning involve reasoning about complex objects like images, videos, and large documents. Treatment of such high-dimensional data requires the development of new tools, since traditional methods in machine learning no longer apply. In this talk I will present two recent works in this direction. The first work introduces a family of novel and efficient methods for inference and learning in structured output spaces. This framework is based on applying principles from convex optimization while exploiting the special structure of these problems to obtain efficient algorithms. The second work studies the success of a certain type of approximate inference methods based on linear programming relaxations. In particular, it has been observed that such relaxations are often tight in real applications, and I will present a theoretical explanation for this interesting phenomenon.
Bio:
Ofer Meshi is a Research Assistant Professor at the Toyota Technological Institute at Chicago. Prior to that he obtained his Ph.D. and M.Sc. in Computer Science from the Hebrew University of Jerusalem. His B.Sc. in Computer Science and Philosophy is from Tel Aviv University. Ofer’s research focuses on machine learning, with an emphasis on efficient optimization methods for inference and learning with highdimensional structured outputs. During his doctoral studies Ofer was a recipient of Google's European Fellowship in Machine Learning.
Information diffusion is the process in which nuggets of information spread in a network (typically a social network). This is a complex process that depends on the network topology, the social structures and the information itself (content/language). In this talk I will discuss information diffusion from these different yet complementary perspectives. In the first part of the talk I will focus on the features of the diffusing information. I will present a gradient-boosted trees algorithm modified for learning user preferences of Twitter hashtags. In the second part of my talk I will focus on the network structure. I will use exponential random graph models (ERGM) in order to learn what latent factors contribute to network formation, and I will show how network structure and social roles contribute to the information spread. Specifically, I will present promising results obtained on political networks in the American political system, and an analysis of the partisan use of political hashtags.
In both parts I put emphasis on interpretable models that go beyond accurate prediction and facilitate a better understanding of complex social processes.
Bio sketch:
Oren is a postdoctoral fellow at the Harvard School of Engineering and Applied Sciences (SEAS), jointly with the Network Science Institute at Northeastern University, and is also affiliated with Harvard's Institute for Quantitative Social Science (IQSS). He received his Ph.D. in Computer Science from the Hebrew University in 2013. In his work he combines natural language processing and network analysis in order to model and predict complex social dynamics. His work has been published in both the NLP and the web/data science communities. Oren co-organized the workshop on NLP and Social Dynamics at ACL 2014, is co-organizing the EMNLP 2016 workshop on NLP and Computational Social Science, and will be giving a tutorial on Understanding Offline Political Systems by Mining Online Political Data at WSDM 2016. His research on sarcasm detection (with Dmitry Davidov and Ari Rappoport) was listed in a Time Magazine special issue as one of the 50 best inventions of the year (2010).
Homepage: http://people.seas.harvard.edu/~orentsur
In distributed storage systems (DSS) data is stored over a large number of storage nodes in such a way that a user can always retrieve the stored data, even if some storage nodes fail. To achieve such resilience against node failures, DSS introduce data redundancy based on different coding techniques. When a single node fails, the system performs node repair, i.e., reconstructs the data stored in the failed node in order to maintain the required level of redundancy.
In this talk we introduce a new approach to constructing different families of optimal codes for DSS via rank-metric codes. In the first part of the talk we deal with two important goals that guide the design of codes for DSS: reducing the repair bandwidth, i.e., the amount of data downloaded from the nodes during node repair, and achieving locality, i.e., reducing the number of nodes participating in a node repair process. First, we present a construction of a new family of optimal locally repairable codes, which enable repair of a failed node using only a few other nodes. Second, we introduce a new class of codes which minimize repair bandwidth for given locality parameters.
In the second part of the talk we consider the problem of making DSS resilient to adversarial attacks. In such attacks, an adversary may gain read-only access to information stored in DSS (passive eavesdropper) or may modify it (active adversary). We present a novel construction of codes for DSS which provide perfect secrecy, i.e., codes which allow storing data without revealing any information to an eavesdropper. This construction is also based on rank-metric codes.
Bio:
Natalia Silberstein received her B.A. degree in Computer Science (cum laude), B.A. degree in Mathematics (cum laude), M.Sc. degree in Applied Mathematics (summa cum laude), and Ph.D. degree in Computer Science from the Technion, in 2004, 2007, and 2011, respectively. She then spent two years as a postdoctoral fellow in the Wireless Networking and Communications Group, at the Department of Electrical and Computer Engineering at the University of Texas at Austin, USA. She is currently a postdoctoral fellow at the Computer Science Department at the Technion.
Her research focuses on coding for distributed storage systems, network coding, algebraic errorcorrecting codes, applications of combinatorial designs and graphs to coding theory.
 Last modified: 24/12/2015