Invited Talk: Sanjeev Arora
Sanjeev gave a survey of recent work by himself and others on developing provable bounds on the performance of algorithms for many problems across unsupervised learning. The talk was entitled "Overcoming Intractability for Unsupervised Learning", where "unsupervised" refers (loosely) to being given a pile of raw data and asked to extract some relevant statistics, e.g. given a collection of newspaper articles, what topics do they cover. The foothold for these problems is assuming that the data comes from some (unknown) distribution, and many of the problems relate to learning something about that distribution.

Almost all of the problems covered are NP-hard, and the first part of the talk was an explanation of how this is overcome. In Sanjeev's words, "NP-hardness is not the right way to think about this". He explained that the starting point is determining whether the set of interesting/realistic/natural instances lies in a tractable subset, by looking for realistic assumptions that lead to tractable instances.
As an example, he described the topic modelling problem, where one is given a corpus of text (e.g. all New York Times articles) and the goal is to describe each article as a linear combination of topics, where the topics are not given in advance (i.e. unsupervised learning). A geometric view of this problem: given a set of points inside a convex polytope (the articles), find the vertices of the polytope (the topics). The problem is NP-hard in general, but can be made tractable by assuming that the data is separable: the vertices are included among the given points. A polynomial-time algorithm can then be obtained: for each point, test whether it lies in the convex hull of the other points; if it does, it can be removed, and otherwise it must be a vertex.
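The membership test at the heart of that procedure can be phrased as a small linear program. Here is a minimal sketch of the idea, assuming SciPy is available; it is meant to illustrate the separability argument, not to reproduce the authors' actual algorithm.

```python
# Sketch of the separability-based vertex finding: a point that lies in the
# convex hull of the other points cannot be a vertex and can be discarded.
# Illustrative LP-based membership test, not the algorithm from the talk.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(p, points):
    """Return True if point p is a convex combination of the rows of `points`."""
    m = len(points)
    # Feasibility LP: find weights w >= 0 with sum(w) = 1 and points^T w = p.
    A_eq = np.vstack([points.T, np.ones(m)])
    b_eq = np.append(p, 1.0)
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.success

def find_vertices(points):
    """Keep only the points that are not in the convex hull of the others."""
    points = np.asarray(points, dtype=float)
    keep = []
    for i in range(len(points)):
        others = np.delete(points, i, axis=0)
        if not in_convex_hull(points[i], others):
            keep.append(points[i])
    return np.array(keep)
```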
He also spoke about making progress on other machine learning techniques, including dictionary learning (given complex objects built from a small, simple dictionary, compute the dictionary) and approaches like deep learning (many-level neural networks). Dictionary learning and deep learning are both thought to take place in the human visual cortex, giving good evidence that their worst-case intractability is not a barrier to their effective use. For the case of deep learning, Sanjeev pointed out that learning the network is effectively trying to learn a TC^0 circuit (ouch!), and yet deep networks have been used successfully by machine learning people to achieve high accuracy rates on large corpora.
Amir Abboud, Virginia Vassilevska Williams and Oren Weimann. Consequences of Faster Alignment of Sequences.
Amir gave this talk on conditional lower bounds for local alignment of a pair of strings: given a function defining a score for every pair of symbols, find the pair of substrings, one from each input string, that maximizes the score. Like other string problems, including longest common subsequence, the best known algorithm for local alignment takes quadratic time (sketched after the list below) and has resisted improvements in the exponent to, say, O(n^1.99).

Amir was quick to point out that local alignment is hugely important in genomic sequence analysis (heuristics like BLAST are hugely popular) and that quadratic time is too slow in these cases, since n can be several billion. Abboud, Williams, and Weimann prove that an O(n^{2-ε})-time algorithm for local alignment would imply the following results:
- 3SUM solved in O(n^{2-δ}) time for some δ, refuting the weak form of the 3SUM conjecture.
- CNF-SAT solvable in O((2-δ)^n) time, refuting the strong exponential time hypothesis.
- Max-4-Clique solvable in O(n^{4-δ}) time for some δ, resolving a long-standing open problem of improving the best-known algorithm for Max-4-Clique.
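For context, the quadratic-time algorithm referred to above is the classic Smith-Waterman-style dynamic program. A minimal sketch follows; the scoring function and gap penalty below are illustrative assumptions, not taken from the paper.

```python
# Minimal Smith-Waterman-style local alignment sketch: O(n*m) time, which is
# the quadratic bound the paper argues is hard to beat.
def local_alignment_score(s, t, score=lambda a, b: 2 if a == b else -1, gap=-2):
    n, m = len(s), len(t)
    best = 0
    prev = [0] * (m + 1)          # DP row for prefix s[:i-1]
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            curr[j] = max(0,                                        # restart alignment
                          prev[j - 1] + score(s[i - 1], t[j - 1]),  # align s[i-1] with t[j-1]
                          prev[j] + gap,                            # gap: skip s[i-1]
                          curr[j - 1] + gap)                        # gap: skip t[j-1]
            best = max(best, curr[j])
        prev = curr
    return best
```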
Amihood Amir, Timothy M. Chan, Moshe Lewenstein and Noa Lewenstein. On Hardness of Jumbled Indexing.
This work has a similar flavor to the Abboud, Williams, Weimann paper, using the 3SUM conjecture to develop lower bounds for string problems. Unlike that paper, this work gives lower bounds for data structures answering queries. The main problem considered is jumbled, a.k.a. histogram, indexing: given a string T over an alphabet of size k, preprocess T to answer queries of the form "which substrings of T have the given character occurrence counts", i.e. which substrings have the given histogram. For many pattern matching problems on strings, suffix trees/arrays are used to achieve linear-time queries.

For histogram indexing, naive approaches yield either O(1) preprocessing and O(n) query time (scan the string with a window whose size is the sum of the entries in the histogram) or O(kn^2) preprocessing time and O(1) query time (precompute the histograms of all substrings and look up the answer). The contribution of Amir, Chan, Lewenstein, and Lewenstein is to prove that, assuming the 3SUM conjecture, these results are the best possible.
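The O(1)-preprocessing baseline is just a sliding window over the text. A minimal sketch, purely to make the naive query concrete (it is not the paper's data structure, and it assumes the query histogram has positive counts):

```python
from collections import Counter

# Naive O(n)-per-query approach for jumbled (histogram) indexing with O(1)
# preprocessing: slide a window whose length is the total count in the query
# histogram and compare character counts as the window moves.
def jumbled_query(text, histogram):
    """Return start positions of substrings of `text` whose character counts
    match `histogram` (a dict of character -> positive count)."""
    target = Counter(histogram)
    window = sum(target.values())
    if window == 0 or window > len(text):
        return []
    counts = Counter(text[:window])
    matches = [0] if counts == target else []
    for i in range(1, len(text) - window + 1):
        # Slide the window one position to the right.
        left, right = text[i - 1], text[i + window - 1]
        counts[left] -= 1
        if counts[left] == 0:
            del counts[left]
        counts[right] += 1
        if counts == target:
            matches.append(i)
    return matches

# Example: jumbled_query("abcba", {"a": 1, "b": 1, "c": 1}) -> [0, 2]
```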
For a fixed alphabet of size k at least 3, they prove that achieving O(n^{2-2/k}) preprocessing and O(n^{1-1/k}) query time simultaneously is 3SUM-hard. For alphabets of non-constant size, they prove that achieving O(n^{2-ε}) preprocessing and O(n^{1-δ}) query time is 3SUM-hard. He also mentioned that unpublished work with Timothy Chan achieves a positive result in the form of a data structure with O(n^{2 - 2/k + o(1)}) preprocessing and O(n^{1 - 1/k + o(1)}) query time simultaneously for fixed alphabets.
Moshe did a good job of mixing the problems, motivation, and technical details in a way that kept the talk interesting. The reductions in this work start from convolution 3SUM (testing whether there exist two indices i, j of the input array such that A[i] + A[j] = A[i+j]); he also recommended that people read Patrascu's paper proving that this special form of 3SUM is 3SUM-hard.
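To make the definition concrete, here is the quadratic brute-force check for the convolution 3SUM property as stated above; it is only an illustration of the problem statement and plays no role in the reductions themselves.

```python
# Quadratic brute-force check of convolution 3SUM: are there indices i, j
# (with i + j still inside the array) such that A[i] + A[j] = A[i + j]?
def has_convolution_3sum(A):
    n = len(A)
    for i in range(n):
        for j in range(n - i):  # keep i + j a valid index
            if A[i] + A[j] == A[i + j]:
                return True
    return False
```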
John Iacono and Özgür Özkan. Why Some Heaps Support Constant-Amortized-Time Decrease-Key Operations, and Others Do Not
John's hand-drawn slides caught the attention of some people from the very beginning, and combined with his deadpan delivery and promise to "keep it light", they left the audience quietly chuckling through the talk. The talk started with a chat about heaps. All good heaps have O(log(n))-time insert and extract-min operations. Decrease-key is also an operation of heaps; some heaps are better at it. There are three families of heaps with three different running times for decrease-key:

- Fibonacci heaps and "improvements" that achieve O(1) time.
- Pairing heaps that achieve O(2^{2 sqrt(loglog(n))}) time.
- Elmasry/sort heaps that achieve O(loglog(n)) time.

He then described how each family combines trees:
- Pairing heaps: repeatedly pair trees, repeating until one tree remains (see the sketch after this list).
- Elmasry/sort heaps: break roots into blocks of size log(n), chain all roots of a block together.
- Fibonacci heaps: pair trees whose roots have the same number of children.
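As a concrete illustration of the pairing rule mentioned above, here is a minimal pairing-heap sketch (two-pass pairing on extract-min). The structure and names are my own illustrative choices, not the construction analyzed in the paper, and decrease-key is omitted.

```python
# Minimal pairing-heap sketch: meld links two trees, and extract_min rebuilds
# the heap by pairing the orphaned children and then melding the pairs.
class Node:
    def __init__(self, key):
        self.key = key
        self.children = []

def meld(a, b):
    """Combine two trees: the root with the smaller key adopts the other."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    return a

def insert(root, key):
    return meld(root, Node(key))

def extract_min(root):
    """Remove the minimum; pair up its children left to right, then meld the
    pairs right to left until a single tree remains."""
    kids = root.children
    paired = [meld(kids[i], kids[i + 1] if i + 1 < len(kids) else None)
              for i in range(0, len(kids), 2)]
    new_root = None
    for tree in reversed(paired):
        new_root = meld(tree, new_root)
    return root.key, new_root

# Usage: push a few keys and pop them back in sorted order.
if __name__ == "__main__":
    root = None
    for k in [5, 3, 8, 1, 9, 2]:
        root = insert(root, k)
    out = []
    while root is not None:
        k, root = extract_min(root)
        out.append(k)
    print(out)  # [1, 2, 3, 5, 8, 9]
```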
The contribution of Iacono and Özkan is a new lower bound that drops the requirement (used in earlier lower bounds) that trees are combined when they are compared, but adds a new restriction: the data structure must pay pointer-model costs. This result yields a lower bound of Omega(loglog(n)) for pairing and sort heaps, but says nothing about Fibonacci heaps because they cheat: they do not pay pointer-model costs. A pointer-model version would have to build a tree above the array, which would have height loglog(n) and give decrease-key a similar running time. The actual proof is adversarial, and considers a whole family of distinct heaps that must grow larger than the total number of possible heaps, a contradiction.