# Elite and Periphery in Social Networks: An Axiomatic Approach

Many societies exhibit a two-tier structure composed of a social elite, namely, a relatively small but well-connected and highly influential group of powerful individuals, and the rest of society (referred to hereafter as the periphery). The talk will concern understanding the structure, size and properties of the elite and the powers that shape it, using an axiom-based model for the relationships between the elite and the periphery.

(Joint work with Chen Avin, Zvi Lotker, Yvonne-Anne Pignolet, and Itzik Turkel)

### Previous Lectures

Discrepancy theory has developed into a diverse and fascinating field with numerous closely related areas. In this talk, I will survey several classic results in combinatorial and geometric discrepancy, and then present discrepancy bounds of two kinds: the first is a size-sensitive bound for geometric set systems (that is, set systems realized by simply-shaped regions, such as halfspaces and balls in d dimensions), and the second is a discrepancy bound for abstract set systems of low degree, a problem motivated by the Beck-Fiala conjecture. Both bounds are nearly optimal. For the first bound, our analysis exploits the so-called "entropy method" and "partial coloring", combined with the existence of small packing numbers; for the second bound, our mechanism exploits the Lovász Local Lemma.
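
For small instances, the central quantity, the combinatorial discrepancy of a set system (the minimum over all +/-1 colorings of the largest imbalance of any set), can be computed by brute force. The following toy sketch is illustrative only and has nothing to do with the talk's techniques:

```python
from itertools import product

def discrepancy(n, sets):
    """Brute-force combinatorial discrepancy of a set system over {0, ..., n-1}:
    the minimum over all +/-1 colorings of the largest imbalance of any set."""
    best = float("inf")
    for coloring in product((-1, 1), repeat=n):
        worst = max(abs(sum(coloring[i] for i in s)) for s in sets)
        best = min(best, worst)
    return best

# Three disjoint singletons can each be balanced only up to +/-1:
print(discrepancy(3, [{0}, {1}, {2}]))  # 1
```

The exponential enumeration is exactly what the entropy method and partial coloring avoid for large systems.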

I will also present the notion of relative approximations, initially introduced in learning theory, and their relation to discrepancy, and will conclude with their applications to approximate range search, geometric optimization and machine learning.

Cooperative game theory concerns the formation of coalitions among collaborative agents, as well as reasonable payoff divisions among them. This branch of game theory is rooted in von Neumann and Morgenstern's foundational work and is rich in beautiful theoretical ideas; however, it has seen relatively sparse application. In this talk, I will discuss several research thrusts that aim to make the theory of cooperative games more applicable; in particular, I will discuss how the introduction of overlapping coalition structures, i.e., allowing agents to divide their resources among more than one coalition, allows one to model complex agent interaction.

Moreover, I will show how one can overcome the computational challenges traditionally associated with finding cooperative solution concepts by relaxing our requirements. By looking for a probably approximately correct (PAC) solution, and applying ideas from computational learning theory, one can find good solutions to cooperative games while eliminating computational overhead.

Finally, I will discuss exciting directions for the study of cooperative games, both in the application of the theory to causality and classification, and in empirical human trials.

**Bio**:

Yair Zick is a postdoctoral research fellow in the Computer Science Department at Carnegie Mellon University. He completed his PhD at Nanyang Technological University, SPMS (funded by the Singapore A*STAR SINGA award), and received his B.Sc. (Mathematics and the "Amirim" honors program) from the Hebrew University of Jerusalem. His research interests include game theory, fair division, and their applications to domains such as machine learning, security, and privacy.

He is the recipient of the 2014 IFAAMAS Victor Lesser Distinguished Dissertation Award, and the 2011 Pragnesh Jay Modi Best Student Paper Award.

For decades, the Web did not support fundamental cryptographic primitives for applications, and user authentication has been limited to usernames and passwords. Due to the work of the W3C Web Cryptography Working Group this year and new efforts around Web Authentication next year, the Web has begun evolving into a more secure platform for application development. The Web Cryptography API provides a standardized JavaScript API, currently supported across browsers, that supports fundamental primitives ranging from PRNG access to key derivation functions. The proposed FIDO 2.0 standard to replace passwords with one-factor cryptographic authentication has already gained widespread industry support from Google and Microsoft, and should become an open standard next year. We'll overview how these new Web-level standards interact with the larger standards-based ecosystem, including new developments at the TLS level at the IETF, revisiting the debates over recommended NIST curves in the CFRG, and work on encrypted and privacy-preserving messaging standards.


Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency or correctness is analyzed using probabilistic tools, under the assumption that the inputs and queries are independent of the internal randomness of the data structure.

In this talk, we will consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we will consider Bloom filters, a data structure for approximate set membership, and prove a tight connection between Bloom filters in this model and cryptography.
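
For context, a minimal Bloom filter can be sketched in a few lines (an illustrative construction, not the one analyzed in the talk). The adversarial model asks what happens when an adaptive adversary probes such a structure and hunts for false positives:

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter sketch: k hash positions per item in an m-bit array.
    Membership answers have no false negatives but may have false positives."""

    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))
```

In the classical analysis the queries are fixed in advance; the talk's point is that once queries may depend on previous answers, guaranteeing a low false-positive rate becomes a cryptographic question.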

Joint work with Eylon Yogev

Over the past three and a half years, we have been designing, implementing, and deploying a wildlife tracking system based on the reverse-GPS, or time-of-arrival, principle. The system estimates the locations of wild animals from arrival-time estimates of signals emitted by miniature radio transmitters attached to the animals. The talk will briefly describe the system but will focus mainly on the algorithms that it uses. I will describe the algorithms, some of which are new, the assumptions that underlie them, and how the assumptions and the algorithms affect the accuracy and performance of the system. The accuracy evaluation is completely realistic; it is based both on stationary transmitters at known positions and on transmitters attached to stationary wild animals (owls, which are stationary during the day).

This is joint work with Adi Weller-Weiser, Ran Nathan, Yotam Orchan, Tony Weiss, and others.

**This week we will start 10 minutes earlier.**

Finding the complexity of the n-dimensional Fourier transform computation has been an elusive problem for over 50 years. The FFT (Fast Fourier Transform) of Cooley and Tukey (1965) gives an upper bound of O(n log n), but nothing better than Omega(n) has been known in a reasonable model of computation. In the talk I will show that any speedup of Fourier computation implies a loss of accuracy on a machine using words of fixed size (e.g., 32 or 64 bits). In order to recover the accuracy, one must work with words of size omega(1), giving rise to an effective lower bound of Omega(n log log n).
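
For reference, the O(n log n) upper bound comes from the radix-2 divide-and-conquer recurrence. A minimal sketch, using exact complex arithmetic rather than the fixed-word-size model the talk analyzes:

```python
import cmath

def fft(a):
    """Radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    Runs in O(n log n) by splitting into even- and odd-indexed halves."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The twiddle factors exp(-2πik/n) are exactly where finite-precision words lose accuracy, which is the source of the trade-off in the talk.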

Clustering is an area of huge practical relevance but rather meager theoretical foundations.

I will discuss some fundamental challenges that a theory of clustering needs to face. In particular, I'll describe two different approaches to addressing those challenges: an axiomatic approach and a statistical/machine-learning perspective.

If time permits, I will also discuss the computational complexity of some common clustering formulations - another area in which our theoretical understanding lags far behind the practical scene.

I will outline recent progress made along these directions, as well as allude to some common misconceptions and potential pitfalls. The talk is geared towards stimulating discussions and highlighting open questions more than to providing answers or boasting definitive results.

I will survey the immense progress in combinatorial and computational geometry in the past seven years, triggered by the infusion of techniques from algebraic geometry and algebra, as introduced by Guth and Katz and further developed by the community at large.

This has led to solutions of very hard problems, the crown jewel being a nearly complete solution to Erdős's 1946 problem on distinct distances in the plane.

In this talk I will survey the recent developments. They include new bounds for other variants of the distinct distances problem, new bounds for incidences between points and lines or other geometric entities in various contexts, and a re-examination of the theory of Elekes, Rónyai, and Szabó on polynomials vanishing on grids, with numerous applications thereof.

In the (short) time that I will have I will only highlight some of the key developments, and demonstrate the new approach by a few examples.

The talk might assume some basic knowledge in algebra and geometry. Passion for geometric problems is always a plus.

For those interested in attending: the colloquium will take place on Thursday, 8/10/15, at 17:00 in Building 211 (the Faculty of Exact Sciences), in the seminar room.

Nethanel Gelernter, from Bar-Ilan, is going to give a talk.

Title: Cross-site Search Attacks

Abstract: Cross-site search (XS-search) attacks circumvent the same-origin policy and extract sensitive information, by using the time it takes for the browser to receive responses to search queries. This side-channel is usually considered impractical, due to the limited attack duration and high variability of delays. This may be true for naive XS-search attacks; however, we show that the use of better tools facilitates effective XS-search attacks, exposing information efficiently and precisely.

We present and evaluate three types of tools: (1) appropriate statistical tests, (2) amplification of the timing side-channel, by ‘inflating’ communication or computation, and (3) optimized, tailored divide-and-conquer algorithms, to identify terms from large ‘dictionaries’. These techniques may be applicable in other scenarios. We implemented and evaluated the attacks against the popular Gmail and Bing services, in several environments and ethical experiments, taking careful, IRB-approved measures to avoid exposure of personal information.

Joint work with Amir Herzberg; to be presented in ACM CCS'2015.

Modeling large multi-molecular assemblies at atomic resolution is a key task in elucidating cell function. Since no single experimental method can deliver atomic-resolution structures of such large molecules, hybrid methods, which integrate data from various experimental modalities, are being developed for this task.

We have developed a new integrative method, which combines atomic resolution models of individual assembly components with an electron microscopy map of the full assembly. It can also naturally accommodate available chemical cross link (Xlink) data.

Specifically, the input to our algorithm is an intermediate-resolution (6-10 Angstrom) electron density map of the full assembly, atomic-resolution (2 Angstrom) maps of the individual assembly subunits, and, if available, cross-link information between some residues of neighboring subunits (an Xlink can be visualized as a loose ~30 Angstrom string connecting two atoms on the surfaces of neighboring subunits). The output is an atomic-resolution map of the whole assembly.

The algorithm was highly successful and efficient on all the intermediate-resolution EM complexes from the 2010 Cryo-EM Modeling Challenge. Remarkably, a 6.8 Angstrom resolution 20S proteasome map, consisting of 28 (structurally homologous) units, was modeled at 1.5 Angstrom RMSD from native in about 10 minutes on a Core i7 laptop. In the case of missing (or poorly modeled) individual subunits, the method can return partial solutions, thus enabling interactive modeling.

From a purely geometric viewpoint, the task can be viewed as an assembly of a large multiple piece puzzle, where we have relatively accurate models of the individual subunits, and a rough, low resolution scan of the full puzzle volume.

This is joint work with my M.Sc. students Dan Cohen and Naama Amir.

**Light refreshments will be served**

A dataset has been classified by some unknown classifier into two types of points. What were the most important factors in determining the classification outcome? In this work, we employ an axiomatic approach in order to uniquely characterize an influence measure: a function that, given a set of classified points, outputs a value for each feature corresponding to its influence in determining the classification outcome. We discuss the relation between our influence measure and causality: showing that we in fact measure the expected (counterfactual) responsibility of a feature on the classification outcome. We show that our influence measure takes on an intuitive form when the unknown classifier is linear. Finally, we employ our influence measure in order to analyze the effects of user profiling on Google’s online display advertising.

In this talk I will present several principles behind state-of-the-art algorithms for solving combinatorial optimization tasks defined over graphical models (Bayesian networks, Markov networks, constraint networks, satisfiability) and demonstrate their performance on some benchmarks.

Specifically, I will present branch-and-bound search algorithms that explore the AND/OR search space over graphical models and thus exploit the problem's decomposition (using AND nodes), equivalence (by caching), and the pruning of irrelevant subspaces via the power of bounding heuristics. In particular, I will show how two ideas, mini-bucket partitioning, which relaxes the input problem using node duplication only, and linear programming relaxations, which optimize cost-shifting/re-parameterization schemes, can together yield tight bounding heuristic information within systematic, anytime search.

Notably, a solver for finding the most probable explanation (MPE or MAP) embedding these principles won first place in all time categories in the 2012 PASCAL2 approximate inference challenge, and first or second place in the UAI-2014 competitions.

Parts of this work were done jointly with Radu Marinescu, Robert Mateescu, Lars Otten, Alex Ihler, Natalia Flerova, and Kalev Kask.

**Bio**:

Rina Dechter is a professor of Computer Science at the University of California, Irvine. She received her PhD in Computer Science from UCLA in 1985, an MS degree in Applied Mathematics from the Weizmann Institute, and a B.S. in Mathematics and Statistics from the Hebrew University, Jerusalem. Her research centers on computational aspects of automated reasoning and knowledge representation, including search, constraint processing, and probabilistic reasoning.

Professor Dechter is the author of ‘Constraint Processing’ (Morgan Kaufmann, 2003) and ‘Reasoning with Probabilistic and Deterministic Graphical Models: Exact Algorithms’ (Morgan & Claypool, 2013), has authored over 150 research papers, and has served on the editorial boards of Artificial Intelligence, the Constraints journal, the Journal of Artificial Intelligence Research, and the Journal of Machine Learning Research (JMLR). She was awarded the Presidential Young Investigator award in 1991, has been a fellow of the American Association for Artificial Intelligence since 1994, was a Radcliffe Fellow in 2005-2006, received the 2007 Association for Constraint Programming (ACP) research excellence award, and is a 2013 Fellow of the ACM. She has been Co-Editor-in-Chief of Artificial Intelligence since 2011.

**Building 202, room 104**

Bio: Brigadier General (res.) Nadav Zafrir - cyber and intelligence expert, founder of Team 8. Former Commander of the IDF's technology & intelligence unit (8200), and founder of the IDF Cyber Command.

Building 305 room 10

The sign-rank of a real matrix A with no zero entries is the minimum rank of a matrix B such that A_{ij} B_{ij} > 0 for all i, j. The study of this notion combines combinatorial, algebraic, geometric, and probabilistic techniques with tools from real algebraic geometry, and is related to questions in communication complexity, computational learning, and asymptotic enumeration. I will discuss the topic and describe its background, several recent results from joint work with Moran and Yehudayoff, and some intriguing open problems.
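
The defining condition is easy to state in code. A small illustrative check (plain Python; the matrices and helper name are hypothetical, chosen only to demonstrate the definition):

```python
def sign_represents(B, A):
    """Check the defining condition of sign-rank: A is a +/-1 matrix and B is a
    real matrix with A_ij * B_ij > 0 for every entry."""
    return all(a * b > 0 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# A rank-1 matrix B = u v^T sign-represents A whenever A_ij = sign(u_i) * sign(v_j),
# so this particular A has sign-rank 1:
u, v = [1.0, -2.0], [0.5, -3.0, 1.0]
B = [[ui * vj for vj in v] for ui in u]
A = [[1, -1, 1], [-1, 1, -1]]
print(sign_represents(B, A))  # True
```

The interesting matrices are of course those where every sign-representing B must have high rank.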

This talk describes three different scenarios in which systems can be designed to be robust despite malicious participants. The scenarios are very different and have very different defenses. One is how to build trust chains of certificates considering that some CAs might be untrustworthy. Another is how to build a network that guarantees that two nodes can communicate, with a fair share of bandwidth, even if some switches are malicious (e.g., giving false information in the distributed routing algorithm, flooding the network with garbage traffic, mis-forwarding traffic, or throwing away traffic from one source). The third is a way to give a data item an expiration date, after which it is unrecoverable from a storage system. Although this might seem unrelated to the topic, it really is related. Trust me.

Time-offset interaction is a new technology that enables conversational interaction with a person who is not present, using pre-recorded video statements: a large set of statements are prepared in advance, and users access these statements through natural conversation that mimics face-to-face interaction. The speaker's reactions to user questions are selected by a statistical classifier, using technology that is similar to interactive systems with synthetic characters; recordings of answers, listening and idle behaviors, and blending techniques are used to create a persistent visual image of the speaker throughout the interaction.

A preliminary working system of time-offset interaction has been created with statements recorded by Pinchas Gutter, a Holocaust survivor, talking about his personal experiences before, during and after the Holocaust. Statements were recorded in two rounds; user questions were collected between the rounds through a “Wizard of Oz” system, where live operators select an appropriate reaction to each user utterance in real time. The result of this process is that 95% of newly elicited user questions are addressed by the recorded statements. The automated conversational system is presently undergoing beta testing in a museum. Time-offset interaction will allow future generations to experience a face-to-face conversation with a Holocaust survivor.

This talk will present the concept of time-offset interaction, the underlying language processing technologies, and the process of content elicitation needed to ensure robust coverage. The talk will include a live demonstration of time-offset interaction.

Micro-structure studies the dynamics of trading using synthesized toy models. We will present a few basic problems and corresponding models, analyze them, and explore the trading features they reveal. Finally, we will examine how their fits and predictions hold up in real-life trading.

Shlomo Ahal founded Istra Research together with Michael Levin. Before that, they founded Kashya and sold it to EMC.

The Evolutionary Random Forest algorithm chooses the best trees from the population and then creates new, more accurate "child" trees from those "parents". In effect, this creates a new population of trees with enhanced classification accuracy. Predictions are carried out using a small population of enhanced-accuracy "child" trees, thus increasing the prediction algorithm's speed and accuracy.

There are numerous situations where we study a large complex system whose behavior is determined by pairwise interactions among its constituents. These can be interacting molecules, companies involved in some business, humans etc. This explains, at least partially, why graph theory is so ubiquitous wherever mathematics is applied in the real world. However, there are numerous interesting and important situations where the underlying interactions involve more than two constituents. This applies in all the above-mentioned scenarios. The obvious place to look for a relevant mathematical theory is hypergraph theory. Unfortunately, this part of combinatorics is not nearly as well understood as graph theory. In this talk I will speak about a special class of hypergraphs, namely, simplicial complexes. It turns out that there is a fascinating combinatorial theory of simplicial complexes that is rapidly developing in recent years. In this lecture I explain some of the exciting recent findings in this area.

My coworkers in these investigations are (in historical order): Roy Meshulam, Mishael Rosenthal, Lior Aronshtam, Tomasz Luczak, Yuval Peled, Yuri Rabinovich, and Ilan Newman.

Yossi is a recipient of the Gödel Prize and an ACM Fellow for contributions to the analysis of big data and the field of streaming algorithms. He is the only Google VP outside of the US.

ROOM CHANGE!!

This week's colloquium will take place in Building 507, room 107.

Bitcoin is the first digital currency to see widespread adoption. While payments are conducted between pseudonyms, Bitcoin cannot offer strong privacy guarantees: payment transactions are recorded in a public decentralized ledger, from which much information can be deduced. Zerocoin (Miers et al., IEEE S&P 2013) tackles some of these privacy issues by unlinking transactions from the payment's origin. Yet, it still reveals payments' destinations and amounts, and is limited in functionality.

In this paper, we construct a full-fledged ledger-based digital currency with strong privacy guarantees. Our results leverage recent advances in zero-knowledge Succinct Non-interactive ARguments of Knowledge (zk-SNARKs).

First, we formulate and construct decentralized anonymous payment schemes (DAP schemes). A DAP scheme enables users to directly pay each other privately: the corresponding transaction hides the payment's origin, destination, and transferred amount. We provide formal definitions and proofs of the construction's security.

Second, we build Zerocash, a practical instantiation of our DAP scheme construction. In Zerocash, transactions are less than 1 kB and take under 6 ms to verify --- orders of magnitude more efficient than the less-anonymous Zerocoin and competitive with plain Bitcoin.

Joint work with Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, and Madars Virza

A long-standing challenge in Web search is how to accurately determine the intention behind a searcher's query, which is needed to rank, organize, and present information most effectively. The difficulty is that users often do not (or cannot) provide sufficient information about their search goals. As this talk will show, it is nevertheless possible to read their intentions through clues revealed by behavior, such as the amount of attention paid to a document or a text fragment.

I will overview the approaches that have emerged for acquiring and mining behavioral data for inferring search intent, ranging from contextualizing query interpretation and suggestion, to modeling fine-grained user interactions such as mouse cursor movements in the searcher's browser. The latter can also be used to measure the searcher's attention "in the wild", with granularity approaching that of using eye tracking equipment in the laboratory. The resulting techniques and models have already shown noteworthy improvements for search tasks such as ranking, relevance estimation, and result summary generation, and have applications to other domains, such as psychology, neurology, and online education.

Biosketch:

Eugene is a Principal Research Scientist at Yahoo Labs in Haifa, on leave from Emory University, where he is an Associate Professor and founder of the IR Laboratory. Eugene's research spans the areas of information retrieval, data mining, and human computer interaction. At Yahoo Labs, he works on the Answering Research team, trying to automatically answer questions from millions of searchers. Dr. Agichtein is actively involved in the international research community, having co-authored over 100 publications (including 4 best paper awards), co-chaired the ACM WSDM 2012 conference (with Yoelle Maarek), and served on the program or organizing committees of all the main information-retrieval and web search conferences.

Counting the number of types of squares rather than their occurrences, we consider the problem of bounding the maximum number of distinct squares in a string. Fraenkel and Simpson showed in 1998 that a string of length n contains at most 2n distinct squares. Ilie presented in 2007 an asymptotic upper bound of 2n - Θ(log n). We show that a string of length n contains at most ⌊11n/6⌋ distinct squares. This new upper bound is obtained by investigating the combinatorial structure of double squares and showing that a string of length n contains at most ⌊5n/6⌋ double squares. In addition, the established structural properties provide a novel proof of Fraenkel and Simpson's result. Joint work with Antoine Deza and Adrien Thierry.
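
The quantity being bounded can be made concrete with a brute-force counter, an O(n^3) illustration that has nothing to do with the paper's combinatorial machinery:

```python
def distinct_squares(s):
    """Count distinct square substrings (of the form ww) of s by brute force;
    the bound discussed above says the answer is at most floor(11n/6)."""
    seen = set()
    n = len(s)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                seen.add(s[i:i + 2 * half])
    return len(seen)

# "aabaabaa" contains the squares "aa", "aabaab", "abaaba", "baabaa":
print(distinct_squares("aabaabaa"))  # 4
```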

Many types of multi-dimensional data have a natural division into two "views", such as audio and video or images and text. Multi-view learning includes a variety of techniques that use multiple views of data to learn improved models for each of the views. The views can be multiple measurement modalities (like the examples above) but also can be different types of information extracted from the same source (words + context, document text + links) or any division of the data dimensions into subsets satisfying certain learning assumptions. Theoretical and empirical results show that multi-view techniques can improve over single-view ones in certain settings. In many cases multiple views help by reducing noise in some sense. In this talk, I will focus on multi-view learning of representations (features), especially using canonical correlation analysis (CCA) and related techniques. I will give an overview of CCA and its relationship with other techniques such as partial least squares (PLS) and linear discriminant analysis (LDA). I will also present extensions developed by ourselves and others, such as kernel, deep, and generalized ("many-view") CCA. Finally, I will give recent results on speech and language tasks, and demonstrate our publicly available code.
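
As a concrete reference point, linear CCA can be sketched via an SVD of the whitened cross-covariance matrix. This is a minimal illustration under simple assumptions (a small ridge term for numerical stability), not the kernel, deep, or generalized variants discussed in the talk:

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-8):
    """Linear CCA sketch: top-k canonical directions and correlations via an SVD
    of the whitened cross-covariance matrix."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T  # acts as Cxx^{-1/2}
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    # columns are canonical directions; s holds the canonical correlations
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]
```

When the two views share a common latent signal plus independent noise, the top canonical correlation is close to 1, which is exactly the "noise reduction" intuition mentioned above.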

Based on joint work with Raman Arora, Weiran Wang, Jeff Bilmes, Galen Andrew, and others.

The nearest-neighbor search (NN) problem is widely used in many fields of computer science such as machine learning, computer vision and databases. Given a database of points in R^d and a query point, NN search tries to obtain the data point closest to the query point under some metric distance. However, in many settings such searches are known to suffer from the notorious curse of dimensionality, where running time grows exponentially with d. This causes severe performance degradation when working in high-dimensional spaces.
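
For concreteness, the exact linear-scan baseline that sub-linear and hardware-assisted schemes compete with fits in a few lines (Euclidean distance assumed; this is not the TCAM construction of the talk):

```python
import math

def nearest_neighbor(points, q):
    """Exact nearest-neighbor search by linear scan under Euclidean distance:
    the O(n*d) baseline that faster schemes aim to beat."""
    return min(points, key=lambda p: math.dist(p, q))

print(nearest_neighbor([(0, 0), (5, 5), (2, 1)], (2, 2)))  # (2, 1)
```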

In this talk we propose a new way to solve this problem using a special hardware device called ternary content addressable memory (TCAM). TCAM is an associative memory, a special type of computer memory that is widely used in switches and routers for very high speed search applications. We show that the TCAM computational model can be leveraged and adjusted to solve NN search problems in a single TCAM lookup cycle, and with linear space. This approach does not suffer from the curse of dimensionality; simulation results demonstrate an improvement of more than four orders of magnitude over the best known approaches for NN, and suggest that TCAM devices may play a critical role in future large-scale databases and cloud applications.

Our approach relies on efficient range encoding on TCAM, a known problem in the field of packet classification. We also show how our encoding technique is useful for applications from this field, and provide theoretical analysis showing that it comes closest to the lower bound on the number of bits used.

Joint work with Anat Bremler-Barr (IDC), David Hay (HUJI), and Yacov Hel-Or (IDC).

The dynamic process in which agents compete for the public's attention is of great interest to computer scientists as well as political scientists, communication scholars, and marketing experts. In this talk I will present a novel framework for unsupervised analysis of these dynamic processes. In the spirit of the time, I'll focus on the political domain, showing how we can use probabilistic topic models combined with sentiment analysis and time-series regression analysis (autoregressive distributed-lag models) to gain insights into political processes, including partisan attention, topic ownership, and coordinated campaigns for agenda setting and attention shifts, commonly known as 'spin'. This approach can be applied in other, similar settings in which agents compete for the public's attention, whether promoting an ideology or a commercial product.

Short Bio:

Oren Tsur is a postdoctoral fellow at Harvard University (SEAS & IQSS), jointly with Lazer's lab at Northeastern University. He earned his PhD in Computer Science from the Hebrew University, and his research combines Natural Language Processing and Network Science. Oren received the 2014 NSF fellowship for research of Political Networks.

15 minutes of fame: our work on sarcasm detection was selected by Time Magazine as one of the 50 Best Inventions of 2010. (Here is a pop-sci talk [HEB].)


Abstract: In the list-decoding problem, a sender (Alice) sends a message to a receiver (Bob) over a noisy channel, and Bob's job is to come up with a short list of messages with the guarantee that Alice's true message appears in the list. It turns out that Alice and Bob can succeed even when nearly all of the symbols that Alice sends to Bob are corrupted. It is known (and not hard to see) that a completely random code will achieve this, and it is also known (and much harder) that there are explicit codes which achieve this. In this talk, we'll explore the middle ground of structured random codes: for example, random linear codes, or Reed-Solomon codes with random evaluation points. Our main result is that if you start with any q-ary code with sufficiently good distance and randomly puncture it, then with high probability, you get a code that is list-decodable up to a 1 - 1/q - epsilon fraction of errors, with near-optimal rate and list sizes. Our results imply that "most" Reed-Solomon codes are list-decodable beyond the Johnson bound; whether or not such codes even existed was previously open. Our results also imply improved bounds on the list-decodability of random linear codes over large alphabets. This is joint work with Atri Rudra.

Many operations in scientific computing rely on operations on large matrices. These include clustering, signal denoising, regression, dimension reduction and many more. If these matrices are too large to hold in memory (or on disk), one must operate on their sketches or approximations as surrogates. In this talk I will survey the progression of ideas, results and techniques developed in over a decade of research into this important problem.
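
One representative technique from this line of research is the frequent-directions sketch, which maintains a small matrix B whose Gram matrix approximates that of the full input stream. A minimal, unoptimized illustration:

```python
import numpy as np

def frequent_directions(A, ell):
    """Frequent-directions sketch of an n x d matrix A: an ell x d matrix B
    satisfying ||A^T A - B^T B||_2 <= 2 * ||A||_F^2 / ell."""
    B = np.zeros((ell, A.shape[1]))
    for row in A:
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            # buffer full: shrink all singular values, zeroing out half the rows
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s ** 2 - s[ell // 2] ** 2, 0.0))
            B = np.diag(s) @ Vt
            zero_rows = np.where(~B.any(axis=1))[0]
        B[zero_rows[0]] = row
    return B
```

The sketch uses only O(ell * d) memory and a single pass over the rows, which is exactly the regime described above where the matrix is too large to hold in memory.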

Bio:

Edo Liberty is a Research Director at Yahoo Labs and manages Yahoo's Scalable Machine Learning group in New York. He received his BSc from Tel Aviv university and his PhD in Computer Science from Yale university, where he also held a postdoctoral position in the Program in Applied Mathematics.

Managing the valuable data of large-scale networks is among the most fundamental challenges in computer science. Whenever the data involves network distances, one may use a *spanner*, which provides a good approximation for all the pairwise distances using a small (ideally, a linear) number of network links. *Geometric spanners* are spanners for networks induced by points in the plane, or more generally, by arbitrary Euclidean spaces. Such spanners have been intensively studied since the mid-80s, with applications ranging from compact routing and distance oracles to robotics and machine learning.

In this talk I will discuss novel non-geometric techniques for geometric spanners, and demonstrate their effectiveness in solving several long-standing open questions. The main message of my talk is somewhat surprising: the right approach to geometric spanners is often non-geometric.
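For concreteness, the classical greedy construction builds a t-spanner by examining pairs of points in increasing order of distance and adding an edge only when the current spanner distance exceeds t times the Euclidean distance. A minimal sketch (illustrative only, not the talk's techniques):

```python
import heapq, math

def dijkstra(adj, n, src):
    """Shortest-path distances from src in a weighted adjacency list."""
    dist = [math.inf] * n
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def greedy_spanner(points, t):
    """Greedy t-spanner: scan pairs by increasing distance; add an edge only
    if the spanner's current distance exceeds t times the Euclidean one."""
    n = len(points)
    d = lambda i, j: math.dist(points[i], points[j])
    pairs = sorted((d(i, j), i, j) for i in range(n) for j in range(i + 1, n))
    adj = [[] for _ in range(n)]
    edges = []
    for w, i, j in pairs:
        if dijkstra(adj, n, i)[j] > t * w:
            adj[i].append((j, w)); adj[j].append((i, w))
            edges.append((i, j))
    return edges, adj

points = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
edges, adj = greedy_spanner(points, t=1.5)
# Verify the stretch guarantee for every pair.
n = len(points)
ok = all(dijkstra(adj, n, i)[j] <= 1.5 * math.dist(points[i], points[j]) + 1e-9
         for i in range(n) for j in range(n))
print(len(edges), ok)
```

By construction every added edge is necessary at the moment it is added, which is why the stretch check at the end always passes.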

Will take place in the faculty room of the Faculty of Exact Sciences.

Submodular functions form a large natural class of set functions with applications in many fields including social networks, machine learning and game theory. Optimization of submodular functions subject to various constraints attracted much attention in recent years, both from theoretical and practical points of view. This talk considers the problem of maximizing a submodular function subject to a matroid constraint, which is a central problem demonstrating many of the modern approaches and techniques used in the field of submodular maximization. Many aspects of this problem have been studied, including its polynomial time approximability, fast (almost linear time) algorithms and online models. This talk surveys some of these aspects and explores a few of the main results obtained recently.
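As a concrete illustration, the classical greedy algorithm for monotone submodular maximization under a cardinality constraint (the simplest matroid) achieves a (1 - 1/e)-approximation. A minimal sketch on a coverage function, which is monotone submodular (the names are illustrative):

```python
def greedy_max_coverage(sets, k):
    """Greedily pick k sets maximizing the coverage function f(S) = |union of S|.
    Coverage is monotone submodular, so greedy is a (1 - 1/e)-approximation."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(range(len(sets)),
                   key=lambda i: len(sets[i] - covered))
        if len(sets[best] - covered) == 0:
            break  # no marginal gain left
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(sets, 2)
print(chosen, len(covered))  # → [2, 0] 7: picks {4,5,6,7}, then {1,2,3}
```

For a general matroid constraint the same greedy idea gives only a 1/2-approximation; the improved algorithms surveyed in the talk (e.g., continuous-greedy-style methods) are what close that gap.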

Crowd data sourcing is increasingly used to gather information from the crowd for different purposes. In this talk, I will present a novel approach for crowd data sourcing that enables users to pose general questions to the system, which in turn mines the crowd for potentially relevant data. The retrieved answers are concise and represent frequent, significant data patterns. Our approach, termed crowd mining, utilizes an ontological knowledge base as well as virtual databases representing the knowledge of crowd members. A formal query language is used to declaratively specify patterns of interest to be mined from these knowledge sources. Queries are evaluated by an efficient engine that employs means of statistical modeling and semantic inference, for minimizing the effort of the crowd and the probability of error. I will present the theoretical analysis of the algorithm underlying the query evaluation engine, as well as its practical implementation.

Machine learning is the backbone of a large array of fields, including Vision, Speech Recognition, NLP, Computational Biology and many more. Despite 30 years of intensive research, learning is still a great mystery, forming a major challenge to theoretical computer science and beyond.

I will survey learning theory as we know it today, addressing both its statistical and computational aspects, and highlighting new and surprising results:

1) I will show that statistically optimal algorithms are surprisingly different from what has been believed since the 70s.

2) I will present a new methodology to prove computational hardness of learning. It is used to prove, under a certain complexity assumption, the hardness of many basic learning problems. Prior to that, and despite 30 years of research, there were huge gaps between the performance of known algorithms and hardness results.

The last result provides evidence that, from today's perspective, learning problems are extremely hard. Hardness of learning stands in sharp contrast with the ability of computers and humans to learn. I will briefly discuss possible avenues for developing a theory that overcomes this discrepancy.

Based on joint works with Nati Linial, Shai Shalev-Shwartz, Shai Ben-David, Yonatan Bilu, Sivan Sabato, and Mike Saks

**Light refreshments will be served at 11:30 in honor of Hanukkah.**

Principal Component Analysis is a widely used pre-processing technique in machine learning for reducing the dimension of the data and cleaning noise.

In standard PCA, the input to the problem is a set of d dimensional vectors x_1,... x_n and a target dimension k < d; the output is a set of k dimensional vectors y_1,..., y_n that best capture the top singular directions of the original vectors.
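For intuition about what "top singular directions" means, the offline top direction can be recovered by power iteration on the covariance matrix; a minimal 2-D sketch, not the online algorithm of the talk:

```python
import math

def top_direction(xs, iters=100):
    """Power iteration for the top eigenvector of the (uncentered) covariance
    C = sum_t x_t x_t^T, for 2-D points. Projecting onto it is rank-1 PCA."""
    c00 = sum(x * x for x, _ in xs)
    c01 = sum(x * y for x, y in xs)
    c11 = sum(y * y for _, y in xs)
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (c00 * v[0] + c01 * v[1], c01 * v[0] + c11 * v[1])
        n = math.hypot(*w)
        v = (w[0] / n, w[1] / n)
    return v

# Points lying (mostly) along the line y = x: top direction ≈ (1/√2, 1/√2).
pts = [(1.0, 1.1), (2.0, 1.9), (-1.0, -1.0), (3.0, 3.2)]
v = top_direction(pts)
print(v)
```

The difficulty in the online setting is precisely that each y_t must be emitted before the later points, and hence the covariance, are known.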

In the online setting, the vectors x_t are presented to the algorithm one by one, and for every presented x_t the algorithm must output a low-dimensional vector y_t before receiving x_{t+1}. This setting is interesting for instance in case that the PCA is part of an online learning pipeline and the low-dimensional vectors y_t are fed to an online learning algorithm.

We present the first approximation algorithms for this setting of online PCA. Our algorithm produces vectors of dimension k * poly(1/\epsilon) whose quality admits an additive \epsilon approximation to the optimal offline solution that is allowed to use k dimensions.

We consider sequential decision making in a setting where regret is measured with respect to a set of stateful reference policies, and feedback is limited to observing the rewards of the actions performed (the so-called "bandit" setting). If either the reference policies are stateless rather than stateful, or the feedback includes the rewards of all actions (the so-called "expert" setting), previous work shows that the optimal regret grows like O(\sqrt{T}) in terms of the number of decision rounds T.

The difficulty in our setting is that the decision maker unavoidably loses track of the internal states of the reference policies, and thus cannot reliably attribute rewards observed in a certain round to any of the reference policies. In fact, in this setting it is impossible for the algorithm to estimate which policy gives the highest (or even approximately highest) total reward. Nevertheless, we design an algorithm that achieves expected regret that is sublinear in T, of the form O(T/polylog T). Our algorithm is based on a certain local repetition lemma that may be of independent interest. We also show that no algorithm can guarantee expected regret better than O(T/polylog T).

Joint work with Tomer Koren and Moshe Tennenholtz

Handling increasingly large datasets is one of the major problems faced by machine learning today. One approach is to distribute the learning task, and split the data among several machines which can run in parallel. Ideally, a distributed learning algorithm on k machines should provably (1) run k times faster than an algorithm designed for a single machine; (2) reach the same statistical learning performance with the same amount of training data; and (3) use minimal communication between the machines, since it is usually much slower than internal processing. In other words, such an algorithm should combine computational efficiency, statistical efficiency, and communication efficiency. In this talk, I'll survey the challenges of designing such algorithms for convex learning problems, and describe some recent advances as well as fundamental limitations.

Includes joint work with Andrew Cotter, Ofer Dekel, Ran Gilad-Bachrach, Nathan Srebro, Karthik Sridharan, Lin Xiao and Tong Zhang.

**Noam Shental, The department of Computer Science, The Open University of Israel**

**High resolution microbial profiling**

The emergence of massively parallel sequencing technology has revolutionized microbial profiling, allowing the unprecedented comparison of microbial diversity across time and space in a wide range of host-associated and environmental ecosystems. Although the high throughput nature of such methods enables the detection of low frequency bacteria, these advances come at the cost of sequencing read length, limiting the phylogenetic resolution possible by current methods.

We present a generic approach for integrating short reads from large genomic regions, thus enabling phylogenetic resolution far exceeding current methods. The approach is based on a mapping to a statistical model that is later solved as a constrained optimization problem.

We demonstrate the utility of this method by analyzing human saliva and Drosophila samples, using Illumina single-end sequencing of a 750bp amplicon of the 16S rRNA gene. Phylogenetic resolution is significantly extended while reducing the number of falsely detected bacteria, as compared to standard single-region Roche 454 Pyrosequencing.

Our approach can be seamlessly applied to simultaneous sequencing of multiple genes providing a higher resolution view of the composition and activity of complex microbial communities.

Joint work with Amnon Amir, Amit Zeisel, Michael Elgart, Shay Stern, Ohad Shamir and Yoav Soen, from Weizmann Institute of Science, Or Zuk from the Hebrew University and Peter J. Turnbaugh, Harvard University.

References

- Amnon Amir, Amit Zeisel, Or Zuk, Michael Elgart, Shay Stern, Ohad Shamir, Peter J. Turnbaugh, Yoav Soen and Noam Shental. *High resolution microbial community reconstruction by integrating short reads from multiple 16S rRNA regions*. Nucleic Acids Research, Nov. 2013.
- Or Zuk, Amnon Amir, Amit Zeisel, Ohad Shamir and Noam Shental. *Accurate Profiling of Microbial Communities from Massively Parallel Sequencing Using Convex Optimization*. SPIRE, 2013.
- Yael Fridmann-Sirkis, Shay Stern, Michael Elgart, Matana Galili, Amit Zeisel, Noam Shental and Yoav Soen. *Delayed development induced by toxicity to the host can be inherited by a bacterial-dependent, transgenerational effect*. Frontiers in Genetics, 5:27, Jan. 2014.

They are going to tell us how to fix maps and how to estimate time of arrival.

Over the past half-century, we have transitioned from a world with just a handful of mainframe computers owned by large corporations, to a world in which private individuals have multiple computers in their homes, in their cars, in their pockets, and even on their bodies. This transition was enabled by computer science research in multiple areas such as systems, networking, programming languages, human-computer interaction, and artificial intelligence. We are now in the midst of a similar transition in the area of robotics.

Today, most robots are still found in controlled, industrial settings. However, robots are starting to emerge in the consumer market, and we are rapidly transitioning towards a time when private individuals will have useful robots in their homes, cars, and workplaces. For robots to operate robustly in such dynamic, uncertain environments, we are still in need of multidisciplinary research advances in many areas such as computer vision, tactile sensing, compliant motion, manipulation, locomotion, high-level decision-making, and many others. This talk will focus on two essential capabilities for robust autonomous intelligent robots, namely online learning from experience, and the ability to interact with other robots and with people. Examples of theoretically grounded research in these areas will be highlighted, as well as concrete applications in domains including robot soccer and autonomous driving.

**Location: Building 507, Classroom 1**


All oxide PV are expected to be very inexpensive, stable and environmentally safe. However, the electronic properties of most known MOs, i.e. short lifetime of excited electronic states and low mobility of charge carriers, prevented their use as active solar cell materials. To bring all-oxide PV to a breakthrough, new materials have to be developed. The prospect of finding these unique materials lies in combinatorial material science which can produce novel MOs consisting of two, three, four or more elements. While most binary MOs are known, the number of unknown compositions is drastically increasing with the number of components. The new materials are tested in combinatorial PV device libraries to investigate them under PV operating conditions. Data analysis, storage and handling are organized using a dedicated database. Tools for data mining are developed to reveal complex correlations between the material properties and the device performance and to improve our understanding of all-oxide PV device operation. The methodology, new photoactive MOs and opportunities will be reported.

We show how the use of genetic programming, in combination with model checking and testing, provides a powerful way to synthesize programs. Whereas classical algorithmic synthesis yields alarmingly high complexity and undecidability results, the genetic approach provides a surprisingly successful heuristic. We describe several versions of a method for synthesizing sequential and concurrent systems. To cope with the constraints of model checking and of theorem proving, we combine such exhaustive verification methods with testing. We show several examples where we used our approach to synthesize, improve and correct code.

Nearest neighbor searching is among the most fundamental retrieval problems in geometry. Given a set of points in d-dimensional space, the objective is to preprocess these points so that the closest point to a given query point q can be computed efficiently. Provably efficient solutions are known in only one- and two-dimensional space, and research over two decades has focused on solving the problem approximately.

In this talk I will survey developments in approximate nearest neighbor searching in Euclidean space of constant dimension. Recent work on establishing the best computational bounds has been based on the solution to a completely different problem, namely approximating a convex polytope with a minimum number of facets. We will see how a number of classical concepts from convex geometry can be applied to obtain the best known bounds for approximate nearest neighbor searching.

faculty room (next to the dean's office)

Suppose you are given a map that subdivides the earth into regions (countries, states, voting districts, and so on). Given your GPS coordinates, you would like to know, as fast as possible, which region you are in. Your goal is to build an efficient data structure for answering such queries.

Suppose there are n regions. Generalizing binary search, an ideal solution uses space O(n) and answers queries in O(log n) time. But what if you know that some regions (such as highly populated cities) are much more likely to be sought than others (such as desolate wastelands). Rather than worst-case time, you want to minimize AVERAGE query time. Here, an ideal solution runs in time proportional to the ENTROPY of the access distribution.

In this talk, we will derive data structures for answering this problem both in 1D and 2D space. We will explain what entropy is and how it is related to this problem. Finally, we will present a simple, randomized algorithm that achieves a running time that is within a constant factor of the entropy bound.
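In 1D, the entropy bound can be met by a weighted search tree that splits at the weighted median, so that each step discards at least half of the remaining probability mass and a region of probability p is found in about log2(1/p) steps. A minimal sketch of that classic construction (illustrative, not the talk's data structure):

```python
import math

def build(lo, hi, probs):
    """Build a search tree over regions lo..hi (inclusive), splitting at the
    weighted median so each subtree carries at most ~half the mass."""
    if lo > hi:
        return None
    if lo == hi:
        return (lo, None, None)
    total, acc, split = sum(probs[lo:hi + 1]), 0.0, lo
    for i in range(lo, hi + 1):
        acc += probs[i]
        if acc >= total / 2:
            split = i
            break
    return (split, build(lo, split - 1, probs), build(split + 1, hi, probs))

def query_depth(tree, i):
    node, left, right = tree
    if i == node:
        return 1
    return 1 + query_depth(left if i < node else right, i)

# Skewed access distribution over 8 regions (a "popular city" first).
probs = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.0078125]
tree = build(0, 7, probs)
H = sum(p * math.log2(1 / p) for p in probs)
avg = sum(p * query_depth(tree, i) for i, p in enumerate(probs))
print(H, avg)  # average query depth is within a small constant of the entropy
```

Note the contrast with plain binary search: uniform splitting would cost log2(8) = 3 comparisons on average, while the weighted tree answers the popular queries in 1-2 steps.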

Finding corresponding points between images is challenging, particularly when objects change their pose non-rigidly, in wide-baseline conditions, or when instances of a perceptual category are compared. In this talk I will present an algorithm for finding a geometrically consistent set of point matches between two images. Given a set of candidate matches that may include many outliers, our method seeks the largest subset of these correspondences that can be aligned using a non-rigid deformation that exerts a bounded distortion. I will discuss theoretical guarantees and show experimentally that this algorithm produces excellent results on a number of test sets, in comparison to several state-of-the-art approaches. In the second part of the talk I will introduce a convex framework for problems that involve singular values. Specifically, the framework enables optimization of functionals and constraints expressed in terms of the extremal singular values of matrices. I will show applications of this framework in geometry processing problems such as non-rigid shape registration and computing extremal quasi-conformal maps.

This is joint work with Yaron Lipman, Stav Yagev, Roi Poranne, David Jacobs, Shahar Kovalsky and Noam Aigerman.

In many multi-agent systems we find information brokers or information technologies aiming to provide the agents with more information or to reduce the cost of acquiring new information. In this talk I will show that better information can hurt: the presence of an information provider, even if the use of her services is optional, can degrade both individual agents' utilities and overall social welfare. The talk will focus on two specific domains: auctions (where the provided information relates to the common value of the auctioned item) and cooperative information gathering (where costly information is shared between the agents). For the first, I'll show that with the information provider in the market, in conflict with classic auction theory, the auctioneer may prefer to limit the number of bidders that participate in the auction, and similarly bidders may prefer to have greater competition. Also, bidders' unawareness of the auctioneer's option to purchase the information does not necessarily play into the hands of the auctioneer, and similarly, bidders may sometimes benefit from not knowing that the auctioneer has the option to purchase such information. For cooperative information gathering, I'll present three methods for improving overall and individual performance, all based on limiting and constraining information sharing. Along the way, we will also discuss ways that charities could benefit from group buying, and why it makes sense to pay someone to over-price information she wants to sell you.

The 3SUM problem is to decide, given a set of N real numbers, whether any three of them sum to zero. A simple algorithm solves 3SUM in O(N^2) time, and it has long been conjectured that this is the best possible.
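The simple quadratic algorithm is worth making concrete: sort the input once, then for each element sweep two pointers over the rest. A standard sketch:

```python
def three_sum_zero(nums):
    """Decide whether any three (distinct-index) elements sum to zero.
    Sort once, then for each a[i] sweep two pointers: O(N^2) total."""
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1
            else:
                hi -= 1
    return False

print(three_sum_zero([3.5, -1.0, 7.0, -2.5, 4.0]))  # → True: -2.5 - 1.0 + 3.5 == 0
print(three_sum_zero([1, 2, 4, 8]))                  # → False
```

The talk's result is that this quadratic bound is not tight: subquadratic decision trees and an O(N^2/polylog N)-time algorithm exist.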

The significance of the 3SUM problem does not lie with its practical applications (roughly zero) but with the long list of problems in computational geometry that are reducible from 3SUM. Some examples include deciding whether a point set contains 3 collinear points, calculating the area of the union of a set of triangles, and determining whether one convex polygon can be placed within another convex polygon. If 3SUM requires N^2 time then all 3SUM-hard problems require N^2 time. More recently, Patrascu gave many conditional lower bounds on triangle enumeration and dynamic data structures that rely on the hardness of 3SUM over the integers.

In this talk I'll present a non-uniform (decision-tree) algorithm for 3SUM that performs only N^{3/2} comparisons, and a uniform 3SUM algorithm running in O(N^2/polylog(N)) time. This result refutes the 3SUM conjecture and casts serious doubts on the optimality of many O(N^2) geometric algorithms.

This is joint work with Allan Gronlund. A manuscript is available at arXiv:1404.0799.

Obtaining a non-trivial (super-linear) lower bound for computation of the Fourier transform in the linear circuit model has been a long-standing open problem for over 40 years.

An early result by Morgenstern from 1973 provides an $\Omega(n \log n)$ lower bound for the UNNORMALIZED Fourier transform when the constants used in the computation are bounded. The proof uses a potential function related to a determinant. The determinant of the unnormalized Fourier transform is $n^{n/2}$, and thus showing that it can grow by at most a constant factor after each step yields the result. This classic result does not explain why the NORMALIZED Fourier transform (of unit determinant) should require $\Omega(n\log n)$ steps.

More recently, Ailon (2013) showed that if only unitary 2-by-2 gates are used, one obtains an $\Omega(n\log n)$ lower bound, using a notion of matrix entropy. In this work we take another step and show an $\Omega(n\log n)$ lower bound for computing the Fourier transform, under ANY scaling, in a model allowing the gates to perform any nonsingular (not necessarily unitary) transformation involving 2 variables. Our restriction is that the composition of all gates up to any given step is uniformly well conditioned. This is a practical requirement, because an ill-conditioned transformation introduces numerical instability.

The main technical contribution is an extension of the matrix entropy used in Ailon (2013) to "negative probabilities".

A fast pattern matching scheme termed Matching by Tone Mapping (MTM) is introduced, which allows matching under non-linear tone mappings. We show that, when the tone mapping is approximated by a piecewise constant/linear function, a fast computational scheme is possible, requiring computational time similar to the fast implementation of Normalized Cross Correlation (NCC). In fact, the MTM measure can be viewed as a generalization of NCC to non-linear mappings, and it actually reduces to NCC when the mappings are restricted to be linear. We empirically show that MTM is highly discriminative and robust to noise, with performance comparable to that of the well-performing Mutual Information, but on par with NCC in terms of computation time. If time permits, I'll show that MTM can be formulated as a graph Laplacian, where the pattern's pixels are represented by a weighted graph and fast computation is possible due to a low-rank decomposition. Generalizing the Laplacian distance yields distance measures that are tolerant to various visual distortions.
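The reduction to NCC under linear mappings rests on NCC's invariance to affine intensity changes, which is easy to verify directly (an illustrative sketch, not the paper's implementation):

```python
import math

def ncc(p, w):
    """Normalized cross-correlation between two equal-length signals."""
    n = len(p)
    mp, mw = sum(p) / n, sum(w) / n
    num = sum((a - mp) * (b - mw) for a, b in zip(p, w))
    dp = math.sqrt(sum((a - mp) ** 2 for a in p))
    dw = math.sqrt(sum((b - mw) ** 2 for b in w))
    return num / (dp * dw)

pattern = [10.0, 20.0, 35.0, 5.0, 50.0]
window = [2.0 * v + 7.0 for v in pattern]  # linear tone mapping of the pattern
print(round(ncc(pattern, window), 6))      # → 1.0: NCC is invariant to linear maps
gamma = [v ** 1.5 for v in pattern]        # non-linear (gamma-like) mapping
print(ncc(pattern, gamma) < 1.0)           # NCC drops below 1; MTM handles this case
```

MTM's point is precisely the second case: scores that stay high under piecewise constant/linear approximations of non-linear mappings, at NCC-like cost.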

I will present a few approaches for detecting the ancestry of an individual based on the individual's genome. The general approach for ancestry inference is based either on generative probabilistic models that are dichotomous (e.g., mixture models, clustering), or on linear methods such as PCA. In this talk I will review these methods and describe two new methods that are based on a hybrid of these two approaches.

By analyzing the user's utterance while it is still in progress, dialogue systems and virtual humans can provide more natural, human-like conversational behavior. This includes (non-verbal) listening behavior, backchannels, quick responses and interruptions, and collaborative completions.

I will present recent work at the Institute for Creative Technologies aimed at incremental understanding and dialogue management while someone is speaking. A particular focus will be on a model of grounding (reaching assumed mutual understanding of what is said) that is updated very rapidly and facilitates feedback decisions while listening. These models have been implemented within a multi-party virtual human negotiation application. I will also present work on providing additional listener feedback to indicate attitudes toward the content as well as level of understanding.

There will not be a talk today.

The research presented in this talk is within the discipline of computer science (CS) education. This is a sub-field of mathematics and science education, which in turn is a sub-field of the discipline of education. Thus, research in CS education is based on methodologies of social science, as well as on understanding the special nature of CS – as a discipline that has something in common with mathematics, science and engineering, but also has unique characteristics of its own. The research of CS education focuses on two educational processes: the teaching process (what should we teach our students in order to better prepare them to be CS graduates, and how should we teach it?), and the learning process (how do our students interpret and perceive what we teach them?).

This talk will focus on non-determinism, a fundamental idea in CS that is relevant to many CS areas, and is strongly connected to abstraction, another fundamental idea in CS. I will describe a series of studies. The findings of some of these studies indicate difficulties students have in perceiving and using non-determinism, while others indicate that certain teaching strategies can affect students' perception of non-determinism. I will discuss issues concerning the manner in which non-determinism can be taught and the implications of this for students' perception.

Proof complexity is the field that provides the foundation for proof-search algorithms (such as industrial DPLL-based SAT-solvers), as well as an established direction towards the central questions of computational complexity; namely, the fundamental questions about the limits of efficient computation. At its heart is the notion of a propositional proof system: each proof system defines a specific way to write proofs of (or certify) tautologies, and the complexity of a given proof is the number of symbols it takes to write it down. Establishing strong size lower bounds on strong enough propositional proof systems, that is, showing that a certain family of tautologies does not have small proofs in the system, is an extremely hard problem in itself, which has striking implications for computational complexity.

In recent years, a new purely algebraic approach to proof complexity has emerged via the notion of arithmetic proofs, introduced by Hrubes and myself (CCC'09). This approach sets out to provide new structures and tools to old problems in the field; as well as providing an elegant solution to some long-standing open problems and giving rise to new seemingly fundamental open problems. In this talk I will give a short background into propositional proof complexity and discuss the algebraic approach, some of its achievements, prospects, and basic open problems.

Based on joint works with Pavel Hrubes (CCC'09, STOC'12), Fu Li ('13) and Stephen A. Cook ('13).

-------

Bio: Dr. Iddo Tzameret is an Assistant Professor at the Institute for Interdisciplinary Information Sciences, Tsinghua University. He received his PhD degree from Tel Aviv University in 2008 under the supervision of Ran Raz (Weizmann Institute) and Nachum Dershowitz (Tel Aviv). He was awarded a postdoctoral fellowship by the Eduard Cech Center for Algebra and Geometry. He has been conducting extensive research on the foundations of computer science, computational complexity, satisfiability (in practice and theory), and applications of logic in computer science. He has been most successful in applying methods from computational complexity, algebraic complexity and logic in the area of efficient reasoning and proof complexity. His research is funded by the National Natural Science Foundation of China.

Vertex and edge connectivity are fundamental graph-theoretic concepts, as they give measures for flows and robustness. While there is extensive knowledge in the literature about edge connectivity, using the essential tool of edge-disjoint spanning trees, much less is known about vertex connectivity, mainly due to the exponential number of vertex cuts, as opposed to the polynomial number of edge cuts.

In this talk I will present new results that propose CDS (connected dominating set) partitions and packings as tools that help us argue about the vertex connectivity of graphs. I will show algorithms for obtaining an optimal CDS packing of size Omega(k/log n) and a CDS partition of size Omega(k/log^5 n) in k-vertex connected graphs. Our results also apply to analyzing vertex connectivity under random vertex sampling, showing an almost optimal bound of \tilde{Omega}(kp^2) for the vertex connectivity of a graph whose vertices are sampled with probability p from a k-vertex connected graph. Finally, I will discuss applications to the throughput of store-and-forward broadcast algorithms.

Based on joint work with Mohsen Ghaffari and Fabian Kuhn.

No Colloquium Today

We describe an approach for synthesizing data representations for concurrent programs. Our compiler takes as input a program written using concurrent relations, and synthesizes a representation of the relations as sets of cooperating data structures, as well as the placement and acquisition of locks to synchronize concurrent access to those data structures. The resulting code is correct by construction: individual relational operations are implemented correctly, and the aggregate set of operations is serializable and deadlock-free. The relational specification also permits a high-level optimizer to choose the best performing of many possible legal data representations and locking strategies, which we demonstrate with an experiment autotuning a graph benchmark.

This is joint work with Alex Aiken and Peter Hawkins (Stanford), Kathleen Fisher (DARPA), and Martin Rinard (MIT). This work is part of Peter Hawkins' thesis:

http://theory.stanford.edu/~hawkinsp/

Please also look into the December 12 CACM article

Joint work with Yonatan Sompolinsky

Bitcoin is a potentially disruptive new cryptocurrency based on a decentralized open-source cryptographic protocol. The Bitcoin network consists of nodes who contribute their computational power to approve transactions. Transactions are approved in batches, called blocks, once every 10 minutes (in expectation). Transactions need multiple such approvals before they can be considered irreversible with sufficiently high probability. This implies a long waiting time for each individual transaction (a typical transaction may wait for an hour or so).
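The "hour or so" figure is simple arithmetic on the block schedule; a back-of-the-envelope sketch, assuming the common convention of waiting for 6 block approvals (the talk itself analyzes the far subtler security effects):

```python
def expected_wait_minutes(confirmations, block_interval_min):
    """Expected wait: block arrivals are (roughly) Poisson, so waiting for k
    blocks takes k * interval minutes in expectation."""
    return confirmations * block_interval_min

print(expected_wait_minutes(6, 10))  # → 60: the typical "hour or so"
print(expected_wait_minutes(6, 1))   # → 6: with a (hypothetical) 1-minute interval
```

Shrinking the block interval shortens the wait proportionally, which is exactly the trade-off against propagation delay that the analysis quantifies.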

Additionally, blocks are currently restricted in size to 1MB, which limits the average number of transactions that can be processed per second.

We seek to improve both the waiting time for transaction authorization and the number of transactions processed per second, by lowering the block authorization time to far below 10 minutes and by increasing the maximal allowed block size. We analyze the effects such changes would have on the security of the protocol. Using the typical block propagation time in the Bitcoin network (recently measured by Decker & Wattenhofer), our findings indicate that:

1) At today's transaction rates, the waiting time for approval can be significantly decreased with negligible compromise to the security of the protocol.

2) Bitcoin's ability to scale up to greater transaction volumes is inherently limited by the propagation delay in the network. Our analysis allows us to derive estimates regarding this limit.


There will be no colloquium this week.

BISFAI takes place on Wed-Fri.

http://websrv.cs.biu.ac.il/bisfai13//

See you next year.

Yahoo! Answers is one of the largest community-based question answering sites today, with over a billion answers for hundreds of millions of questions.

In this talk I will present two works that leverage this amount of data. The first work, presented in WWW 2013, is a personalized recommender system that recommends open questions for users to answer, based on the answering activity of each user. This task has unique attributes: most of the users and questions are new, with hardly any information about them. In addition, proposing relevant questions is not the only requirement of the users: diversity and freshness play a significant role in the quality of recommendations. This system was recently deployed in Yahoo! Answers.

The second work, presented in WWW 2012, is an automatic question answering algorithm that matches past answers to new open questions. Prior question answering systems focused on factoid questions for which a single correct answer exists. However, our algorithm needs to address many types of open-ended questions, including asking for suggestions, opinions and social feedback.

The first work is a joint work with Dan Pelleg and Yoelle Maarek.

The second work is a joint work with Anna Shtok, Gideon Dror and Yoelle Maarek.

Bloom filters and counting Bloom filters (CBFs) are widely used in networking device algorithms. They implement fast set representations to support membership queries with limited error. Unlike Bloom filters, CBFs also support element deletions.

In the first part of the talk, we introduce a new general method based on variable increments to improve the efficiency of CBFs and their variants. We demonstrate that this method can always achieve a lower false positive rate and a lower overflow probability bound than CBF in practical systems.
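For readers unfamiliar with the baseline that the variable-increment method improves upon, here is a minimal counting Bloom filter sketch (fixed increments of 1; the talk's variable-increment scheme is not shown):

```python
import hashlib

class CountingBloomFilter:
    """Baseline counting Bloom filter: like a Bloom filter, but each cell
    holds a counter instead of a bit, so elements can also be deleted."""

    def __init__(self, m: int, k: int):
        self.counters = [0] * m  # m counter cells
        self.m, self.k = m, k    # k hash functions

    def _indexes(self, item: str):
        # Derive k cell indexes from k salted hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item: str):
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item: str):
        for idx in self._indexes(item):
            self.counters[idx] -= 1

    def might_contain(self, item: str) -> bool:
        # "Yes" may be a false positive; "no" is always correct.
        return all(self.counters[idx] > 0 for idx in self._indexes(item))

cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("alice")
cbf.add("bob")
cbf.remove("bob")
print(cbf.might_contain("alice"), cbf.might_contain("bob"))
```

The false-positive rate and the counter-overflow probability of this baseline are exactly what the variable-increment method is shown to improve.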

In the second part of the talk, we uncover the Bloom paradox in Bloom filters: sometimes, it is better to disregard the query results of Bloom filters, and in fact not to even query them, thus making them useless. We analyze conditions under which the Bloom paradox occurs, and demonstrate that it depends on the a priori probability that a given element belongs to the represented set.
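The dependence on the prior can be made concrete with Bayes' rule (a simple sanity check, not the paper's full analysis): given false-positive rate f and no false negatives, a positive answer from the filter yields posterior p / (p + (1 - p) * f). When the prior p is far below f, even a "yes" is almost certainly a false positive:

```python
def posterior_membership(prior: float, false_positive_rate: float) -> float:
    """P(element is in the set | Bloom filter says 'yes'), by Bayes' rule.
    Assumes the filter has no false negatives."""
    p, f = prior, false_positive_rate
    return p / (p + (1 - p) * f)

# With a 1% false-positive rate, watch the posterior collapse with the prior:
for prior in (1e-1, 1e-3, 1e-5):
    print(f"prior {prior:.0e} -> posterior {posterior_membership(prior, 0.01):.4f}")
```

With a prior of 10^-5 the posterior is about 0.001, so acting on the filter's answer is worse than ignoring it, which is the paradox in miniature.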

Short Bio:

Ori Rottenstreich is a Ph.D. student at the Electrical Engineering department of the Technion, Haifa, Israel. His current research interests include exploring novel coding techniques for networking applications. Ori is a recipient of the Google Europe Fellowship in Computer Networking, the Andrew Viterbi graduate fellowship, the Jacobs-Qualcomm fellowship, the Intel graduate fellowship and the Gutwirth Memorial fellowship.


In this talk we present relations between linear programming, dynamic programming, and the min-sum algorithm (also known as the sum-product algorithm).

Linear programming is an algorithmic technique for optimization problems over the real numbers. Dynamic programming is an algorithmic technique for optimization problems over trees and acyclic graphs. The min-sum algorithm is a variant of Belief Propagation and is used for performing inference under graphical models.

The study of relations between these algorithmic techniques is motivated by the decoding of error-correcting codes. Many variants of the min-sum algorithm have been proposed for decoding, starting with Gallager in 1963. Viterbi's algorithm from 1967 for decoding convolutional codes is a dynamic programming algorithm. Feldman, Karger, and Wainwright proposed in 2005 to use linear programming for decoding binary linear codes.
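As a reminder of why Viterbi's decoder is a dynamic program, here is a generic min-cost-path sketch over a trellis (an illustration, not the coding-specific formulation):

```python
def viterbi(trellis_costs):
    """Minimal Viterbi-style dynamic program: find the min-cost path
    through a trellis, where trellis_costs[t][i][j] is the cost of
    moving from state i at time t to state j at time t+1."""
    n_states = len(trellis_costs[0])
    best = [0.0] * n_states  # best cost of reaching each state so far
    for stage in trellis_costs:
        best = [min(best[i] + stage[i][j] for i in range(n_states))
                for j in range(n_states)]
    return min(best)

# Two states, two stages: the cheapest path has cost 1 + 0 = 1.
costs = [
    [[1.0, 5.0], [4.0, 2.0]],
    [[0.0, 3.0], [3.0, 0.0]],
]
print(viterbi(costs))
```

Each update keeps only the best cost per state, which is exactly the structure the talk exploits to interpret the min-sum algorithm.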

The talk will focus on two results:

1. Inverse polynomial bounds on the decoding error for a wide class of error-correcting codes with logarithmic girth (including LDPC codes and irregular repeat-accumulate codes with both even and odd repetition factors). The bounds apply both to linear-programming decoding and to a variant of the min-sum algorithm.

2. A proof that the min-sum algorithm for integer packing and covering problems is not better than linear programming.

Both results use dynamic programming to interpret the min-sum algorithm. In addition, graph covers are used as a tool that enables a comparison between optimal solutions of linear programming and optimal solutions of dynamic programming.

Joint work with Nissim Halabi.

Building 507, room 061

One of the negative side effects of the computer revolution is the phenomenon of intrusive computing, manifested, among other things, in unauthorized access to computers. The consequences of this phenomenon do not stop at violations of human rights such as the rights to autonomy, privacy, property, and freedom of expression; they have broader social aspects, such as the proper functioning of computer-operated systems or commercial infrastructures on which normal social activity depends.

However, from a legal standpoint it is very difficult to cope with the phenomenon of intrusive computing: in most cases the victims of the offense are not even aware that their computer was breached, few of them file complaints with the police, which suffers from limited manpower, and the law itself struggles to define the scope of the computer-intrusion offense because of its intangible aspects. For example, a magistrates' court recently acquitted a defendant who had broken into other people's bank accounts of the offense of computer intrusion (which prohibits unlawfully accessing computer material on a computer), since the offense and its scope are insufficiently clear (State of Israel v. Nir Ezra)!

Because of these weighty difficulties, a proper solution to the phenomenon of intrusive computing should consist not only of criminal and tort prohibitions of the unauthorized intrusion itself, but also of a preventive penal approach. That is, the law should prohibit the very creation of intrusive software and its distribution to the general public. Indeed, in Amendment No. 1 to the Computers Law the legislator tried to cope properly with the phenomenon of intrusive computing by adopting preparation and distribution offenses; Section 6 of the amended law prohibits writing software that disrupts a computer, deletes computerized material, or enables forgery of computerized material, computer intrusion, or wiretapping, all **"with the aim of committing them unlawfully"**.

Regarding the prohibition on programming software that enables computer intrusion, the question arises whether it also covers developing intrusive software intended for espionage in enemy states; whether it prohibits programming intrusive software for teaching and demonstration purposes; and whether the prohibition in fact also flags the development of cookies and toolbars. **The key concept for answering this question is the term "unlawfully"**. Yet it is extremely difficult to pin down the essence of this term, and over the years the courts have struggled to interpret it. Some have read it as "without justification in law external to the statute in question," i.e., as a kind of "link" that allows importing a defense established in some other statute into the statute in which the term is embedded, for example the Computers Law. Others have interpreted "unlawfully" as referring to the question of consent: unlawful intrusion into computer material is intrusion performed without the consent of the owner of the computerized material (State of Israel v. Badir).

In my view, to prevent over-criminalization under Section 6 of the Computers Law, the term "unlawfully" should be interpreted very broadly, so as to also protect the development of software for research purposes. That is, when a computer science lecturer demonstrates to his students how to write and program intrusive software, this is not an "unlawful" act, and certainly not a criminal offense. As for intrusive cookies and toolbars: although their insertion into the target computer is usually done without the consent or knowledge of the computer's owner, and although in many cases the software that enables their insertion harasses the computer's owner, it seems to me that the legislator should address them specifically. That is, the prohibition of such widespread intrusive software should be enacted through dedicated legislation that addresses them directly and requires the informed consent of the computer's owner, along with readily available means for removing them from the breached computer, rather than through application of Section 6.

Protein networks have become the workhorse of biological research in recent years, providing mechanistic explanations for basic cellular processes in health and disease. However, these explanations remain topological in nature, as the underlying logic of these networks is for the most part unknown. In this talk I will describe the work in my group toward the automated learning of the Boolean rules that govern network behavior under varying conditions. I will highlight the algorithmic problems involved and demonstrate how they can be tackled using integer linear programming techniques.

The talk is self-contained.

The Border Gateway Protocol (BGP) establishes routes between the smaller networks that make up today's Internet (Google, AT&T, etc.) and can be viewed as the glue that holds the Internet together. As evidenced by recent major Internet outages, BGP is remarkably vulnerable to configuration errors and attacks. To remedy this, several security enhancements of BGP have been proposed. Yet, despite over a decade of work, deployment is still not on the horizon. While at first it seemed that the main obstacle en route to deployment was technical infeasibility, it is now clear that the main challenges lie elsewhere: (1) lingering disagreements about which of the major proposals to adopt; and (2) lack of economic incentives for adoption. I will survey recent results along these lines, including ways to quantify Internet routing security and economic mechanisms for driving the transition to BGP security.

Joint work with Sharon Goldberg, Pete Hummon and Jennifer Rexford (2010), with Phillipa Gill and Sharon Goldberg (2011), and with Robert Lychev and Sharon Goldberg (2013)

The Crossing Number of a graph $G$ is the minimum number of edge crossings in a drawing of $G$ in the plane. According to the Crossing Lemma, there is a constant $c$ such that the crossing number of every (not too sparse) graph $G=(V,E)$ is at least $c|E|^3/|V|^2$. This bound is tight, apart from the multiplicative constant $c$.

The first result that I will present states that if an $n$-vertex graph can be drawn such that each of its edges is involved in at most four crossings, then it has at most $6n-12$ edges. This settles a conjecture of Pach, Radoicic, Tardos, and G. Toth, and implies a better lower bound for the constant $c$ of the Crossing Lemma.

The second result that I will present considers the degenerate crossing number of a graph $G$, namely, the minimum number of crossing \emph{points} in a drawing of $G$ in the plane. It has been an open problem whether this crossing number behaves like the usual crossing number. In a recent joint work with Rom Pinchasi we proved that it does.
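The Crossing Lemma bound from the first paragraph is easy to evaluate numerically. In the sketch below, $c = 1/64$ is the classical constant from the standard counting proof (valid when $|E| \ge 4|V|$); improving this constant is part of what the first result achieves:

```python
def crossing_lower_bound(n_vertices: int, n_edges: int, c: float) -> float:
    """Crossing Lemma lower bound: cr(G) >= c * |E|^3 / |V|^2,
    valid only for sufficiently dense graphs (e.g. |E| >= 4|V|)."""
    return c * n_edges ** 3 / n_vertices ** 2

# A graph with 100 vertices and 1000 edges needs at least ~1562 crossings:
print(crossing_lower_bound(100, 1000, 1 / 64))
```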

Recent developments in graphical models and the logic of counterfactuals have had a marked effect on the way scientists treat problems involving cause-effect relationships. Paradoxes and controversies have been resolved, slippery concepts have been demystified, and practical problems requiring causal information, which were long regarded as either metaphysical or unmanageable, can now be solved using elementary mathematics.

I will review concepts, principles, and mathematical tools that were found useful in this transformation, and will demonstrate their applications in several data-intensive sciences.

These include questions of policy evaluation, model testing, mediation mechanisms, missing data, generalization from experiments, and integrating data from diverse sources.

Reference: J. Pearl, Causality (Cambridge University Press, 2000, 2009). Background material:

http://ftp.cs.ucla.edu/pub/stat_ser/r379.pdf

http://ftp.cs.ucla.edu/pub/stat_ser/r355-reprint.pdf

http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf

http://ftp.cs.ucla.edu/pub/stat_ser/r400.pdf

Working papers:

http://bayes.cs.ucla.edu/csl_papers.html

**Building 410 Beck Auditorium**


Modbus/TCP is used in SCADA networks to communicate between the Human Machine Interface (HMI) and the Programmable Logic Controllers (PLCs). Therefore, deploying Intrusion Detection Systems (IDS) on Modbus networks is an important security measure. In this paper we introduce a model-based IDS specifically built for Modbus/TCP. Our approach is based on a key observation: Modbus traffic to and from a specific PLC is highly periodic. As a result, we can model each HMI-PLC channel by its own unique deterministic finite automaton (DFA). Our IDS looks deep into the Modbus packets and produces a very detailed model of the traffic. Thus, our method is very sensitive and is able to flag anomalies such as a message appearing out of its position in the normal sequence, or a message referring to a single unexpected bit. We designed an algorithm to automatically construct the channel's DFA based on about 100 captured messages. A significant contribution is that we tested our approach on a production Modbus system. Despite its high sensitivity, the system enjoyed a very low false-positive rate: on 5 out of the 7 PLCs we observed a perfect match of the model to the traffic, without a single false alarm over 111 hours. Further, our system successfully flagged real anomalies that were caused by technicians troubleshooting the HMI system, and it also helped uncover one incorrectly configured PLC.
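The core modeling idea (learn the channel's normal message sequence from a benign capture, then flag deviations) can be sketched as follows; this is a simplified illustration, as the actual system builds a full DFA over detailed Modbus message fields:

```python
def learn_transitions(messages):
    """Learn a DFA-like model of a periodic Modbus channel: record every
    consecutive message pair seen in a (benign) training capture."""
    return set(zip(messages, messages[1:]))

def flag_anomalies(allowed, traffic):
    """Flag every message transition never seen during training."""
    return [(a, b) for a, b in zip(traffic, traffic[1:]) if (a, b) not in allowed]

# Toy example: the HMI polls the PLC in a fixed, periodic cycle.
training = ["read_coils", "read_registers", "write_register"] * 30
model = learn_transitions(training)

normal = ["read_coils", "read_registers", "write_register", "read_coils"]
attack = ["read_coils", "write_register"]  # a message out of its normal position
print(flag_anomalies(model, normal))
print(flag_anomalies(model, attack))
```

Because the benign traffic is highly periodic, even ~100 captured messages cover the whole cycle, which is why such a sensitive model can still have a low false-positive rate.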

Everyone is concerned about Internet security, yet most traffic is not cryptographically protected. The usual justification is that most attackers are only {\em off-path} and cannot intercept traffic; hence, challenge-response mechanisms suffice to ensure authenticity. Usually, the challenges re-use existing `unpredictable' protocol header fields; this allows use of existing, widely-deployed protocols such as TCP and DNS.

We argue that this practice may only give an {\em illusion of security}. We present recent off-path TCP injection and DNS poisoning attacks, allowing circumvention of existing challenge-response defenses.

Both the TCP and DNS attacks are non-trivial, yet very efficient and practical. The attacks allow circumvention of widely deployed security mechanisms, such as the Same Origin Policy, and enable a wide range of exploits, e.g., long-term caching of malicious objects and scripts.

We hope that this article will motivate adoption of cryptographic mechanisms such as SSL/TLS, IPsec and DNSSEC, and of correct, secure challenge-response mechanisms.

(Joint work with Yossi Gilad and Haya Shulman)


Matrix computations over the real and complex numbers are immensely important algorithmic building blocks (for physical simulations, optimization, machine learning, and many other applications). These computations include the solution of systems of linear equations, minimization of quadratic forms, and computation of spectral objects, such as eigen/singular values and vectors. All of these problems can be solved in polynomial time and algorithms for solving them have been investigated extensively for over 60 years (with some algorithms going back to Gauss). But deep and important open questions remain. The talk will explain what is still not known, why these open questions are important, and what was discovered about them recently. The selection of open problems will be fairly subjective and highly influenced by my own research; it will not be the most objective and comprehensive survey of open problems in numerical linear algebra.

This talk will focus on efficient methods for single- and multi-robot autonomous localization and structure from motion (SfM) related problems such as mobile vision, augmented reality and 3D reconstruction. High-rate performance and high accuracy are a challenge, in particular when operating in large scale environments, over long time periods and in presence of loop closure observations. This challenge is further enhanced in multi-robot configurations, where communication and computation budgets are limited and consistent information fusion should be enforced. In this talk, I will describe approaches that address these challenges.

First, I will present a computationally efficient method for incremental bundle adjustment. The method, incremental light bundle adjustment (iLBA), incorporates two key components to substantially reduce computational complexity: first, the observed 3D points are algebraically eliminated, leading to a cost function that is formulated in terms of multiple view geometry constraints; second, incremental smoothing uses graphical models to adaptively identify the variables that should be re-eliminated at each step. The described method will be demonstrated in both SfM and robot navigation scenarios. Since high-rate performance for general loop closure observations is not guaranteed, I will overview an approach that addresses this issue by parallelizing computations, partitioning the underlying graphical structure of the problem at hand. Next, I will focus on multi-agent configurations and discuss approaches that exploit commonly observed 3D points both to perform cooperative localization, improving pose estimation of the individuals in the group, and to extend the sensing horizon. Two methods will be described: sharing the estimated distributions of observed 3D points, and sharing the actual (image) observations of these points, thereby employing distributed multiple view geometry and extending iLBA. Special attention will be given to consistent information fusion, i.e., avoiding double-counting of measurements.

**The colloquium will take place in Building 202, Room 106**

Next week: Michael Segal from the Department of Communication Systems Engineering.

In recent decades a vast variety of non-classical logics have been introduced, driven by various CS applications. Temporal logics, separation logics, fuzzy logics and paraconsistent logics are just a few prominent examples, used in verification of software and hardware, medical expert systems, data and knowledge bases, etc. A useful logic should ideally have two components: a simple and intuitive semantics, which can provide real insights into the logic, and a corresponding analytic proof system which is the key to effective proof search strategies for automated deduction methods. Obtaining these components for a given logic is a challenging process, which is usually tailored to the particular logic at hand. However, due to the increasing number of new application-driven logics, there is a need for a systematic approach to obtaining these components, which could be used for developing tools for automatic support for the design and investigation of logical systems.

In this talk we show that this goal can be achieved at least for some useful families of non-classical logics. We provide a uniform and modular method for a systematic generation of effective semantics and analytic proof systems for a very large family of paraconsistent logics used for reasoning with inconsistent information, thus making a substantial step towards the development of efficient paraconsistent theorem provers. The method, implemented by the Prolog system PARAlyzer, has been extended to infinitely many other logics formulated in terms of axiomatic systems of a certain natural form.

The processor landscape has fractured into latency-optimized CPUs, throughput-oriented GPUs, and soon, custom accelerators. Future applications will need to cohesively use a variety of hardware to achieve their performance and power goals. However, building efficient systems that use accelerators today is incredibly difficult.

In this talk I will argue that the root cause of this complexity lies in the lack of adequate operating system support for accelerators. While operating systems provide optimized resource management and Input/Output (I/O) services to CPU applications, they make no such services available to accelerator programs.

I propose GPUfs, an operating system layer which enables access to files directly from programs running on throughput-oriented accelerators, such as GPUs. GPUfs extends the constrained GPU-as-coprocessor programming model, turning GPUs into first-class computing devices with full file I/O support. It provides a POSIX-like API for GPU programs, exploits parallelism for efficiency, and optimizes for access locality by extending a CPU buffer cache into the physical memories of all GPUs in a single machine.

Using real benchmarks, I show that GPUfs simplifies the development of efficient applications by eliminating GPU management complexity, and broadens the range of applications that can be accelerated by GPUs. For example, a simple self-contained GPU program which searches for a set of strings in the entire tree of Linux kernel source files completes in about a third of the time of an 8-CPU-core run.

Joint work with Idit Keidar (Technion), Bryan Ford (Yale) and Emmett Witchel (UT Austin).

The goal of discriminative learning is to train a system to optimize a certain desired measure of performance. In simple classification we seek a function that assigns a binary label to a single object and tries to minimize the error rate (correct or incorrect) on unseen data. In structured prediction we are interested in predicting a structured label, where the input is a complex object. Typically, each structured prediction task has its own measure of performance, or task loss, such as the word error rate in speech recognition, the BLEU score in machine translation or the intersection-over-union score in object segmentation. Not only are these task losses much more involved than the binary error rate; the structured prediction itself spans an exponentially large label space. In the talk, I will present two algorithms, each designed to minimize a given task loss, and applied to speech recognition.

In the first part of the talk, I will present an algorithm which aims at achieving a high area under the receiver operating characteristic (ROC) curve. This measure of performance is often used to evaluate the detection of events over time. We will give some theoretical bounds that relate the performance of the algorithm to the expected area under the ROC curve, and will demonstrate its efficiency on the task of keyword spotting, i.e., the detection of all occurrences of a given word in a speech signal.
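The area under the ROC curve has a direct probabilistic reading: it is the probability that a randomly chosen positive example outscores a randomly chosen negative one. A minimal sketch of that computation (an illustration, not the talk's algorithm):

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve: the probability that a random positive
    example is scored higher than a random negative one (ties count 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positive_scores
               for n in negative_scores)
    return wins / (len(positive_scores) * len(negative_scores))

# Keyword-spotting style example: detector scores at true keyword
# occurrences vs. scores at non-occurrences.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.1]))
```

An AUC of 1.0 means every keyword occurrence outscores every non-occurrence, which is exactly the ranking property the algorithm optimizes.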

In the second part of the talk, I will describe a new algorithm which aims to minimize a regularized task loss. The algorithm is derived by directly minimizing a generalization bound for structured prediction, which gives an upper bound on the expected task loss in terms of the empirical task loss. The resulting algorithm is iterative and easy to implement, and as far as we know, it is the only algorithm that can handle a non-separable task loss. We will present experimental results on the task of phoneme recognition, and will show that the algorithm achieves the lowest phoneme error rate (normalized edit distance) compared to other discriminative and generative models with the same expressive power.
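For concreteness, the phoneme error rate mentioned above is the edit (Levenshtein) distance between the predicted and reference phoneme sequences, normalized by the reference length; a minimal sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions, deletions all cost 1), using a single rolling row."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (r != h)) # substitution / match
    return d[-1]

def phoneme_error_rate(ref, hyp):
    """Task loss used in phoneme recognition: edit distance
    normalized by the reference length."""
    return edit_distance(ref, hyp) / len(ref)

ref = ["sil", "k", "ae", "t", "sil"]
hyp = ["sil", "k", "ah", "t"]
print(phoneme_error_rate(ref, hyp))  # 1 substitution + 1 deletion out of 5
```

Unlike the binary error rate, this loss does not decompose over independent predictions, which is what makes direct minimization non-trivial.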

- Last modified: 30/11/2015