-----1
[1] David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137-146, 2003.
[2] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In KDD, pages 199-208, 2009.
[3] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In ICDM, pages 88-97, 2010.
[4] Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. A data-based approach to social influence maximization. Proc. VLDB Endow., 5, 2011.
[5] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne M. VanBriesen, and Natalie S. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420-429, 2007.
[6] Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In KDD, pages 61-70, 2002.
[7] Manuel Gomez-Rodriguez and Bernhard Schölkopf. Influence maximization in continuous time diffusion networks. In ICML '12, 2012.
[8] Nan Du, Le Song, Alexander J. Smola, and Ming Yuan. Learning networks of heterogeneous influence. In NIPS, 2012.
[9] Nan Du, Le Song, Hyenkyun Woo, and Hongyuan Zha. Uncover topic-sensitive information diffusion networks. In AISTATS, 2013.
[10] Manuel Gomez-Rodriguez, David Balduzzi, and Bernhard Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML, pages 561-568, 2011.
[11] Manuel Gomez-Rodriguez, Jure Leskovec, and Bernhard Schölkopf. Structure and Dynamics of Information Pathways in On-line Media. In WSDM, 2013.
[12] Ke Zhou, Le Song, and Hongyuan Zha. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In Artificial Intelligence and Statistics (AISTATS), 2013.
[13] Ke Zhou, Hongyuan Zha, and Le Song. Learning triggering kernels for multi-dimensional Hawkes processes. In International Conference on Machine Learning (ICML), 2013.
[14] Manuel Gomez-Rodriguez, Jure Leskovec, and Bernhard Schölkopf. Modeling information propagation with survival theory. In ICML, 2013.
[15] Shuanghong Yang and Hongyuan Zha. Mixture of mutually exciting processes for viral diffusion. In International Conference on Machine Learning (ICML), 2013.
[16] Jerald F. Lawless. Statistical Models and Methods for Lifetime Data. Wiley-Interscience, 2002.
[17] Edith Cohen. Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences, 55(3):441-453, 1997.
[18] GL Nemhauser, LA Wolsey, and ML Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1), 1978.
[19] Andreas Krause. Ph.D. Thesis. CMU, 2008.
[20] Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: An approach to modeling networks. JMLR, 11, 2010.
[21] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. In KDD, 2010.
[22] David Easley and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
[23] Jure Leskovec, Lars Backstrom, and Jon M. Kleinberg. Meme-tracking and the dynamics of the news cycle. In KDD, 2009.
[24] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. In SIGMETRICS/PERFORMANCE, pages 211-222. ACM, 2012.
[25] Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD '10, pages 1029-1038, 2010.
-----1
[1] G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Approximation algorithms for k-anonymity. Journal of Privacy Technology, 2005.
[2] M. Allman and V. Paxson. Issues and etiquette concerning use of shared measurement data. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, 2007.
[3] M. Bugliesi, B. Preneel, V. Sassone, I. Wegener, and C. Dwork. Lecture Notes in Computer Science - Automata, Languages and Programming, chapter Differential Privacy. Springer Berlin / Heidelberg, 2006.
[4] K. Chaudhuri, C. Monteleone, and A.D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, (12):1069-1109, 2011.
[5] G. Cormode, D. Srivastava, S. Bhagat, and B. Krishnamurthy. Class-based graph anonymization for social network data. In PVLDB, volume 2, pages 766-777, 2009.
[6] G. Cormode, D. Srivastava, T. Yu, and Q. Zhang. Anonymizing bipartite graph data using safe groupings. VLDB J., 19(1):115-139, 2010.
[7] R. Duan and S. Pettie. Approximating maximum weight matching in near-linear time. In Proceedings 51st Symposium on Foundations of Computer Science, 2010.
[8] J. Edmonds. Paths, trees and flowers. Canadian Journal of Mathematics, 17, 1965.
[9] H.N. Gabow. An efficient reduction technique for degree-constrained subgraph and bidirected network flow problems. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, 1983.
[10] A. Gionis, A. Mazza, and T. Tassa. k-anonymization revisited. In ICDE, 2008.
[11] B. Huang and T. Jebara. Fast b-matching via sufficient selection belief propagation. In Artificial Intelligence and Statistics, 2011.
[12] M.I. Jordan, Z. Ghahramani, T. Jaakkola, and L.K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183-233, 1999.
[13] V.N. Kolmogorov. Blossom V: A new implementation of a minimum cost perfect matching algorithm. Mathematical Programming Computation, 1(1):43-67, 2009.
[14] N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, 2007.
[15] S. Lodha and D. Thomas. Probabilistic anonymity. In PinKDD, 2007.
[16] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. L-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1, 2007.
[17] A. Meyerson and R. Williams. On the complexity of optimal k-anonymity. In PODS, 2004.
[18] P. Samarati and L. Sweeney. Generalizing data to provide anonymity when disclosing information. In PODS, 1998.
[19] L. Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):571-588, 2002.
[20] Y. Tao and X. Xiao. Personalized privacy preservation. In SIGMOD Conference, 2006.
[21] Y. Tao and X. Xiao. Personalized privacy preservation. In Privacy-Preserving Data Mining, 2008.
[22] O. Williams and F. McSherry. Probabilistic inference and differential privacy. In NIPS, 2010.
[23] M. Xue, P. Karras, C. Rassi, J. Vaidya, and K.-L. Tan. Anonymizing set-valued data by nonreciprocal recoding. In KDD, 2012.
[24] E. Zheleva and L. Getoor. Preserving the privacy of sensitive relationships in graph data. In KDD, 2007.
-----1
[1] Evrim Acar, Daniel M Dunlavy, Tamara G Kolda, and Morten Mørup. Scalable tensor factorizations for incomplete data. Chemometrics and Intelligent Laboratory Systems, 106(1):41-56, 2011.
[2] M Berry et al. SVDPACKC (version 1.0) user's guide. University of Tennessee Tech. Report (393-194), 1993 (revised October 1996).
[3] Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956-1982, 2010.
[4] Emmanuel J Candès and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925-936, 2010.
[5] Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[6] A Evgeniou and Massimiliano Pontil. Multi-task feature learning. 2007.
[7] Maryam Fazel, Haitham Hindi, and Stephen P Boyd. A rank minimization heuristic with application to minimum order system approximation. In American Control Conference, 2001, 2001.
[8] David Gross, Yi-Kai Liu, Steven T Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Physical review letters, 105(15):150401, 2010.
[9] Johan Håstad. Tensor rank is NP-complete. Journal of Algorithms, 11(4):644-654, 1990.
[10] Christopher Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. JACM, 2013.
[11] Prateek Jain, Raghu Meka, and Inderjit Dhillon. Guaranteed rank minimization via singular value projection. In NIPS, 2010.
[12] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM review, 51(3):455-500, 2009.
[13] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.
[14] Rasmus Munk Larsen. Propack-software for large and sparse svd calculations. Available online., 2004.
[15] Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. Tensor completion for estimating missing values in visual data. In ICCV, 2009.
[16] Ian Porteous, Evgeniy Bart, and Max Welling. Multi-hdp: A non-parametric bayesian model for tensor factorization. In AAAI, 2008.
[17] Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In SIGKDD, 2009.
[18] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. Factorizing personalized markov chains for next-basket recommendation. In WWW, 2010.
[19] Steffen Rendle and Lars Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In ICDM, 2010.
[20] Amnon Shashua and Tamir Hazan. Non-negative tensor factorization with applications to statistics and computer vision. In ICML, 2005.
[21] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver. Tfmap: Optimizing map for top-n context-aware recommendation. In SIGIR, 2012.
[22] Nathan Srebro, Jason DM Rennie, and Tommi Jaakkola. Maximum-margin matrix factorization. NIPS, 2005.
[23] Ryota Tomioka, Kohei Hayashi, and Hisashi Kashima. Estimation of low-rank tensors via convex optimization. arXiv preprint arXiv:1010.0789, 2010.
[24] Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, and Hisashi Kashima. Statistical performance of convex tensor decomposition. NIPS, 2011.
[25] Jason Weston, Chong Wang, Ron Weiss, and Adam Berenzweig. Latent collaborative retrieval. ICML, 2012.
[26] Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, and Jaime G Carbonell. Temporal collaborative filtering with bayesian probabilistic tensor factorization. In SDM, 2010.
-----1
[1] P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111-126, 1994.
[2] D. Lee and H. Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788-791, 1999.
[3] J. Ramsay and B. Silverman. Functional Data Analysis. Springer, New York, 2006.
[4] F. Bach, J. Mairal, and J. Ponce. Convex Sparse Matrix Factorization. Technical report, ENS, Paris, 2008.
[5] D. Witten, R. Tibshirani, and T. Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10:515-534, 2009.
[6] A-J. van der Veen. Analytical Method for Blind Binary Signal Separation. IEEE Signal Processing, 45:1078-1082, 1997.
[7] J. Liao, R. Boscolo, Y. Yang, L. Tran, C. Sabatti, and V. Roychowdhury. Network component analysis: reconstruction of regulatory signals in biological systems. PNAS, 100(26):15522-15527, 2003.
[8] S. Tu, R. Chen, and L. Xu. Transcription Network Analysis by a Sparse Binary Factor Analysis Algorithm. Journal of Integrative Bioinformatics, 9:198, 2012.
[9] E. Houseman et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics, 13:86, 2012.
[10] A. Banerjee, C. Krumpelman, J. Ghosh, S. Basu, and R. Mooney. Model-based overlapping clustering. In KDD, 2005.
[11] E. Segal, A. Battle, and D. Koller. Decomposing gene expression into cellular processes. In Proceedings of the 8th Pacific Symposium on Biocomputing, 2003.
[12] A. Schein, L. Saul, and L. Ungar. A generalized linear model for principal component analysis of binary data. In AISTATS, 2003.
[13] A. Kaban and E. Bingham. Factorisation and denoising of 0-1 data: a variational approach. Neurocomputing, 71:2291-2308, 2008.
[14] E. Meeds, Z. Ghahramani, R. Neal, and S. Roweis. Modeling dyadic data with binary latent factors. In NIPS, 2007.
[15] Z. Zhang, C. Ding, T. Li, and X. Zhang. Binary matrix factorization with applications. In IEEE ICDM, 2007.
[16] P. Miettinen, T. Mielikainen, A. Gionis, G. Das, and H. Mannila. The discrete basis problem. In PKDD, 2006.
[17] S. Arora, R. Ge, R. Kannan, and A. Moitra. Computing a nonnegative matrix factorization - provably. STOC, 2012.
[18] V. Bittdorf, B. Recht, C. Re, and J. Tropp. Factoring nonnegative matrices with linear programs. In NIPS, 2012.
[19] D. Donoho and V. Stodden. When does non-negative matrix factorization give a correct decomposition into parts? In NIPS, 2003.
[20] P. Erdős. On a lemma of Littlewood and Offord. Bull. Amer. Math. Soc, 51:898-902, 1951.
[21] M. Gu and S. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing, 17:848-869, 1996.
[22] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
[23] A. Odlyzko. On Subspaces Spanned by Random Selections of ±1 vectors. Journal of Combinatorial Theory A, 47:124-133, 1988.
[24] J. Kahn, J. Komlós, and E. Szemerédi. On the Probability that a ±1 matrix is singular. Journal of the American Mathematical Society, 8:223-240, 1995.
[25] H. Nguyen and V. Vu. Small ball probability, Inverse theorems, and applications. arXiv:1301.0019.
[26] T. Tao and V. Vu. The Littlewood-Offord problem in high-dimensions and a conjecture of Frankl and Füredi. Combinatorica, 32:363-372, 2012.
[27] C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19:2756-2779, 2007.
[28] P. Tao and L. An. Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Mathematica Vietnamica, pages 289-355, 1997.
[29] https://sites.google.com/site/nicolasgillis/publications.
-----1
[1] L. Getoor and B. Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007.
[2] Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton, editors. Probabilistic inductive logic programming: theory and applications. Springer-Verlag, 2008.
[3] David Poole. First-order probabilistic inference. In Proceedings of IJCAI, pages 985-991, 2003.
[4] Manfred Jaeger and Guy Van den Broeck. Liftability of probabilistic inference: Upper and lower bounds. In Proceedings of the 2nd International Workshop on Statistical Relational AI, 2012.
[5] Guy Van den Broeck. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Advances in Neural Information Processing Systems 24 (NIPS), pages 1386-1394, 2011.
[6] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1319-1325, 2005.
[7] B. Milch, L.S. Zettlemoyer, K. Kersting, M. Haimes, and L.P. Kaelbling. Lifted probabilistic inference with counting formulas. Proceedings of the 23rd AAAI Conference on Artificial Intelligence, 2008.
[8] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of IJCAI, pages 2178-2185, 2011.
[9] N. Taghipour, D. Fierens, J. Davis, and H. Blockeel. Lifted variable elimination with arbitrary constraints. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 2012.
[10] H.H. Bui, T.N. Huynh, and R. de Salvo Braz. Exact lifted inference with distinct soft evidence on every object. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012.
[11] Guy Van den Broeck and Jesse Davis. Conditioning in first-order knowledge compilation and lifted probabilistic inference. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012.
[12] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 256-265, 2011.
[13] A. Jha, V. Gogate, A. Meliou, and D. Suciu. Lifted inference seen from the other side: The tractable features. In Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS), 2010.
[14] Guy Van den Broeck, Wannes Meert, and Jesse Davis. Lifted generative parameter learning. In Statistical Relational AI (StaRAI) workshop, July 2013.
[15] K. Kersting, B. Ahmadi, and S. Natarajan. Counting belief propagation. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), pages 277-284, 2009.
[16] M. Richardson and P. Domingos. Markov logic networks. Machine learning, 62(1):107-136, 2006.
[17] D. Seung and L. Lee. Algorithms for non-negative matrix factorization. Advances in neural information processing systems, 13:556-562, 2001.
[18] M. Berry, M. Browne, A. Langville, V. Pauca, and R. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. In Computational Statistics and Data Analysis, 2006.
[19] Pauli Miettinen, Taneli Mielikainen, Aristides Gionis, Gautam Das, and Heikki Mannila. The discrete basis problem. In Knowledge Discovery in Databases, pages 335-346. Springer, 2006.
[20] Pauli Miettinen, Taneli Mielikainen, Aristides Gionis, Gautam Das, and Heikki Mannila. The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering, 20(10):1348-1362, 2008.
[21] Pauli Miettinen. Sparse Boolean matrix factorizations. In IEEE 10th International Conference on Data Mining (ICDM), pages 935-940. IEEE, 2010.
[22] Boris Mirkin. Mathematical classification and clustering, volume 11. Kluwer Academic Pub, 1996.
[23] Floris Geerts, Bart Goethals, and Taneli Mielikainen. Tiling databases. In Discovery science, 2004.
[24] Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109-137, 1983.
[25] Pauli Miettinen. Matrix decomposition methods for data mining: Computational complexity and algorithms. PhD thesis, 2009.
[26] Guy Van den Broeck. Lifted Inference and Learning in Statistical Relational Models. PhD thesis, KU Leuven, January 2013.
[27] Hans L Bodlaender. Treewidth: Algorithmic techniques and results. Springer, 1997.
[28] M. Craven and S. Slattery. Relational learning with statistical predicate invention: Better models for hypertext. Machine Learning Journal, 43(1/2):97-119, 2001.
[29] Mathias Niepert. Markov chains on orbits of permutation groups. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
[30] Mathias Niepert. Symmetry-aware marginal density estimation. In Proceedings of the 27th Conference on Artificial Intelligence (AAAI), 2013.
[31] Jesse Davis and Pedro Domingos. Deep transfer via second-order Markov logic. In Proceedings of the 26th annual international conference on machine learning, pages 217-224, 2009.
[32] R. de Salvo Braz, S. Natarajan, H. Bui, J. Shavlik, and S. Russell. Anytime lifted belief propagation. Proceedings of the 6th International Workshop on Statistical Relational Learning, 2009.
[33] K. Kersting, Y. El Massaoudi, B. Ahmadi, and F. Hadiji. Informed lifting for message-passing. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010.
-----1
[1] R. Bailly, X. Carreras, F. M. Luque, and A. Quattoni. Unsupervised spectral learning of wcfg as low-rank matrix completion. EMNLP, 2013.
[2] R. Bailly, F. Denis, and L. Ralaivola. Grammatical inference as a principal component analysis problem. In Proc. ICML, 2009.
[3] B. Balle and M. Mohri. Spectral learning of general weighted automata via constrained matrix completion. In Proc. of NIPS, 2012.
[4] B. Balle, A. Quattoni, and X. Carreras. A spectral learning algorithm for finite state transducers. ECMLPKDD, 2011.
[5] Borja Balle, Ariadna Quattoni, and Xavier Carreras. Local loss optimization in operator models: A new insight into spectral learning. In John Langford and Joelle Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML '12, pages 1879-1886, New York, NY, USA, July 2012. Omnipress.
[6] M. Bernard, J-C. Janodet, and M. Sebban. A discriminative model of stochastic edit distance in the form of a conditional transducer. Grammatical Inference: Algorithms and Applications, 4201, 2006.
[7] B. Boots, S. Siddiqi, and G. Gordon. Closing the learning planning loop with predictive state representations. I. J. Robotic Research, 2011.
[8] F. Casacuberta. Inference of finite-state transducers by using regular grammars and morphisms. Grammatical Inference: Algorithms and Applications, 1891, 2000.
[9] A. Clark. Partially supervised learning of morphology with stochastic transducers. In Proc. of NLPRS, pages 341-348, 2001.
[10] Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar. Spectral learning of latent-variable pcfgs. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 223-231, Jeju Island, Korea, July 2012. Association for Computational Linguistics.
[11] J. Eisner. Parameter estimation for probabilistic finite-state transducers. In Proc. of ACL, pages 1-8, 2002.
[12] G. Stewart and J.-G. Sun. Matrix perturbation theory. Academic Press, 1990.
[13] Maryam Fazel. Matrix rank minimization with applications. PhD thesis, Stanford University, Electrical Engineering Dept., 2002.
[14] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In Proc. of COLT, 2009.
[15] Mehryar Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269-311, 1997.
[16] A.P. Parikh, L. Song, and E.P. Xing. A spectral algorithm for latent tree graphical models. ICML, 2011.
[17] S.M. Siddiqi, B. Boots, and G.J. Gordon. Reduced-rank hidden markov models. AISTATS, 2010.
[18] L. Song, B. Boots, S. Siddiqi, G. Gordon, and A. Smola. Hilbert space embeddings of hidden markov models. ICML, 2010.
[19] L. Zwald and G. Blanchard. On the convergence of eigenspaces in kernel principal component analysis. NIPS, 2005.
-----1
[1] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
[2] Peter Bühlmann and Sara van de Geer. Statistics for High-Dimensional Data. Springer, 2011.
[3] Patrick L. Combettes and Valérie R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4(4):1168-1200, 2005.
[4] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
[5] Yurii Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, Series B, 140:125-161, 2013.
[6] Jerome Friedman, Trevor Hastie, Holger Höfling, and Robert Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302-332, 2007.
[7] Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski, and Francis Bach. Proximal methods for hierarchical sparse coding. Journal of Machine Learning Research, 12:2297-2334, 2011.
[8] Jiayu Zhou, Jun Liu, Vaibhav A. Narayan, and Jieping Ye. Modeling disease progression via fused sparse group lasso. In Conference on Knowledge Discovery and Data Mining, 2012.
[9] Francesco Dinuzzo and Bernhard Schölkopf. The representer theorem for Hilbert spaces: a necessary and sufficient condition. In NIPS, 2012.
[10] Yao-Liang Yu, Hao Cheng, Dale Schuurmans, and Csaba Szepesvári. Characterizing the representer theorem. In ICML, 2013.
[11] Jean J. Moreau. Proximité et dualité dans un espace Hilbertien. Bulletin de la Société Mathématique de France, 93:273-299, 1965.
[12] Patrick L. Combettes, Đinh Dũng, and Bằng Công Vũ. Proximity for sums of composite functions. Journal of Mathematical Analysis and Applications, 380(2):680-688, 2011.
[13] André F. T. Martins, Noah A. Smith, Eric P. Xing, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo. Online learning of structured predictors with multiple kernels. In Conference on Artificial Intelligence and Statistics, 2011.
[14] Patrick L. Combettes and Jean-Christophe Pesquet. Proximal thresholding algorithm for minimization over orthonormal bases. SIAM Journal on Optimization, 18(4):1351-1376, 2007.
[15] Yaoliang Yu. Fast Gradient Algorithms for Structured Sparsity. PhD thesis, University of Alberta, 2013.
[16] Heinz H. Bauschke, Rafal Goebel, Yves Lucet, and Xianfu Wang. The proximal average: Basic theory. SIAM Journal on Optimization, 19(2):766-785, 2008.
[17] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67:301-320, 2005.
[18] Art B. Owen. A robust hybrid of lasso and ridge regression. In Prediction and Discovery, pages 59-72. AMS, 2007.
[19] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805-849, 2012.
[20] Xinhua Zhang, Yaoliang Yu, and Dale Schuurmans. Polar operators for structured sparse estimation. In NIPS, 2013.
[21] Howard Bondell and Brian Reich. Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics, 64(1):115-123, 2008.
[22] Leon Wenliang Zhong and James T. Kwok. Efficient sparse modeling with automatic feature grouping. In ICML, 2011.
[23] John Duchi and Yoram Singer. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research, 10:2899-2934, 2009.
-----1
[1] S. D. Babacan, R. Molina, M. N. Do, and A. K. Katsaggelos. Bayesian blind deconvolution with general sparse image priors. In ECCV, 2012.
[2] E. Candès and Y. Plan. Near-ideal model selection by ℓ1 minimization. The Annals of Statistics, (5A):2145-2177.
[3] R. Chartrand and W. Yin. Iteratively reweighted algorithms for compressive sensing. In ICASSP, 2008.
[4] S. Cho, H. Cho, Y.-W. Tai, and S. Lee. Registration based non-uniform motion deblurring. Comput. Graph. Forum, 31(7-2):2183-2192, 2012.
[5] S. Cho and S. Lee. Fast motion deblurring. In SIGGRAPH ASIA, 2009.
[6] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. In SIGGRAPH, 2006.
[7] A. Gupta, N. Joshi, C. L. Zitnick, M. Cohen, and B. Curless. Single image deblurring using motion density functions. In ECCV, 2010.
[8] S. Harmeling, M. Hirsch, and B. Schölkopf. Space-variant single-image blind deconvolution for removing camera shake. In NIPS, 2010.
[9] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Schölkopf. Fast removal of non-uniform camera shake. In ICCV, 2011.
[10] M. Hirsch, S. Sra, B. Schölkopf, and S. Harmeling. Efficient filter flow for space-variant multiframe blind deconvolution. In CVPR, 2010.
[11] Z. Hu and M.-H. Yang. Fast non-uniform deblurring using constrained camera pose subspace. In BMVC, 2012.
[12] H. Ji and K. Wang. A two-stage approach to blind spatially-varying motion deblurring. In CVPR, 2012.
[13] N. Joshi, S. B. Kang, C. L. Zitnick, and R. Szeliski. Image deblurring using inertial measurement sensors. In ACM SIGGRAPH, 2010.
[14] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In CVPR, 2011.
[15] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Deconvolution using natural image priors. Technical report, MIT, 2007.
[16] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Efficient marginal likelihood optimization in blind deconvolution. In CVPR, 2011.
[17] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding blind deconvolution algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 33(12):2354-2367, 2011.
[18] J. G. Nagy and D. P. O'Leary. Restoring images degraded by spatially variant blur. SIAM J. Sci. Comput., 19(4):1063-1082, 1998.
[19] J. A. Palmer. Relative convexity. Technical report, UCSD, 2003.
[20] J. A. Palmer, D. P. Wipf, K. Kreutz-Delgado, and B. D. Rao. Variational EM algorithms for non-Gaussian latent variable models. In NIPS, 2006.
[21] Q. Shan, J. Jia, and A. Agarwala. High-quality motion deblurring from a single image. In SIGGRAPH, 2008.
[22] M. Sorel and F. Sroubek. Image Restoration: Fundamentals and Advances. CRC Press, 2012.
[23] Y.-W. Tai, P. Tan, and M. S. Brown. Richardson-Lucy deblurring for scenes under a projective motion path. IEEE Trans. Pattern Anal. Mach. Intell., 33(8):1603-1618, 2011.
[24] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244, 2001.
[25] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniform deblurring for shaken images. In CVPR, 2010.
[26] D. P. Wipf, B. D. Rao, and S. S. Nagarajan. Latent variable Bayesian models for promoting sparsity. IEEE Trans. Information Theory, 57(9):6236-6255, 2011.
[27] D. P. Wipf and H. Zhang. Revisiting Bayesian blind deconvolution. submitted to Journal of Machine Learning Research, 2013.
[28] L. Xu and J. Jia. Two-phase kernel estimation for robust motion deblurring. In ECCV, 2010.
[29] H. Zhang, D. P. Wipf, and Y. Zhang. Multi-image blind deblurring using a coupled adaptive sparse prior. In CVPR, 2013.
-----1
[1] D. Alonso-Gutierrez. On the isotropy constant of random convex sets. Proceedings of the American Mathematical Society, 136(9):3293-3300, 2008.
[2] L. Bako. Identification of switched linear systems via sparse optimization. Automatica, 47(4):668-677, 2011.
[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[4] E.J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[5] G. Chen and G. Lerman. Spectral curvature clustering (SCC). International Journal of Computer Vision, 81(3):317-330, 2009.
[6] M. Elad. Sparse and redundant representations. Springer, 2010.
[7] E. Elhamifar and R. Vidal. Sparse subspace clustering. In Computer Vision and Pattern Recognition (CVPR'09), pages 2790-2797. IEEE, 2009.
[8] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2013.
[9] P. Favaro, R. Vidal, and A. Ravichandran. A closed form solution to robust subspace estimation and clustering. In Computer Vision and Pattern Recognition (CVPR'11), pages 1801-1807. IEEE, 2011.
[10] P. Gritzmann and V. Klee. Computational complexity of inner and outer j-radii of polytopes in finite-dimensional normed spaces. Mathematical programming, 59(1):163-213, 1993.
[11] N. Hurley and S. Rickard. Comparing measures of sparsity. Information Theory, IEEE Transactions on, 55(10):4723-4741, 2009.
[12] A. Jalali, Y. Chen, S. Sanghavi, and H. Xu. Clustering partially observed graphs via convex optimization. In International Conference on Machine Learning (ICML'11), pages 1001-1008, 2011.
[13] I.T. Jolliffe. Principal component analysis, volume 487. Springer-Verlag New York, 1986.
[14] F. Lauer and C. Schnörr. Spectral clustering of linear subspaces for motion segmentation. In International Conference on Computer Vision (ICCV'09), pages 678-685. IEEE, 2009.
[15] Z. Lin, R. Liu, and Z. Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In Advances in Neural Information Processing Systems 24 (NIPS'11), pages 612-620. 2011.
[16] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 2012.
[17] G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In International Conference on Machine Learning (ICML'10), pages 663–670, 2010.
[18] G. Liu, H. Xu, and S. Yan. Exact subspace segmentation and outlier detection by low-rank representation. In International Conference on Artificial Intelligence and Statistics (AISTATS'12), 2012.
[19] B. Nasihatkon and R. Hartley. Graph connectivity in sparse subspace clustering. In Computer Vision and Pattern Recognition (CVPR'11), pages 2137–2144. IEEE, 2011.
[20] A.Y. Ng, M.I. Jordan, Y. Weiss, et al. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 15 (NIPS'02), volume 2, pages 849–856, 2002.
[21] E. Richard, P. Savalle, and N. Vayatis. Estimation of simultaneously sparse and low rank matrices. In International Conference on Machine Learning (ICML'12), 2012.
[22] M. Soltanolkotabi and E.J. Candes. A geometric analysis of subspace clustering with outliers. The Annals of Statistics, 40(4):2195–2238, 2012.
[23] R. Tron and R. Vidal. A benchmark for the comparison of 3-d motion segmentation algorithms. In Computer Vision and Pattern Recognition (CVPR'07), pages 1–8. IEEE, 2007.
[24] R. Vidal. Subspace clustering. Signal Processing Magazine, IEEE, 28(2):52–68, 2011.
[25] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12):1945–1959, 2005.
[26] R. Vidal, S. Soatto, Y. Ma, and S. Sastry. An algebraic geometric approach to the identification of a class of linear hybrid systems. In Decision and Control, 2003. Proceedings. 42nd IEEE Conference on, volume 1, pages 167–172. IEEE, 2003.
[27] R. Vidal, R. Tron, and R. Hartley. Multiframe motion segmentation with missing data using powerfactorization and GPCA. International Journal of Computer Vision, 79(1):85–105, 2008.
[28] U. Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
[29] Y.X. Wang and H. Xu. Noisy sparse subspace clustering. In International Conference on Machine Learning (ICML'13), volume 28, pages 100–108, 2013.
-----1
[1] N. Srebro, J. D. M. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In Neural Information Processing Systems, 2005.
[2] N. Srebro and A. Shraibman. Rank, trace-norm and max-norm. In 18th Annual Conference on Computational Learning Theory (COLT), pages 545–560, 2005.
[3] R. Foygel and N. Srebro. Concentration-based guarantees for low-rank matrix reconstruction. Technical report, arXiv, 2011.
[4] E. J. Candes and T. Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
[5] E. J. Candes and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
[6] B. Recht. A simpler approach to matrix completion. Technical report, arXiv, 2009.
[7] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057–2078, 2010.
[8] V. Koltchinskii, A. B. Tsybakov, and K. Lounici. Nuclear norm penalization and optimal rates for noisy low rank matrix completion. Technical report, arXiv, 2010.
[9] E. Heiman, G. Schechtman, and A. Shraibman. Deterministic algorithms for matrix completion. Random Structures and Algorithms, 2013.
[10] A. Lubotzky, R. Phillips, and P. Sarnak. Ramanujan graphs. Combinatorica, 8:261–277, 1988.
[11] N. Linial, S. Mendelson, G. Schechtman, and A. Shraibman. Complexity measures of sign matrices. Combinatorica, 27(4):439–463, 2007.
-----1
[1] Q. Le, M. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In Proceedings ICML, 2012.
[2] N. Srivastava and R. Salakhutdinov. Multimodal learning with deep Boltzmann machines. In NIPS, 2012.
[3] Y. Bengio. Learning deep architectures for AI. Foundat. and Trends in Machine Learning, 2:1–127, 2009.
[4] G. Hinton. Learning multiple layers of representations. Trends in Cognitive Sciences, 11:428–434, 2007.
[5] G. Hinton, S. Osindero, and Y. Teh. A fast algorithm for deep belief nets. Neur. Comp., 18(7), 2006.
[6] N. Lawrence. Probabilistic non-linear principal component analysis. JMLR, 6:1783–1816, 2005.
[7] A. Banerjee, S. Merugu, I. Dhillon, and J. Ghosh. Clustering with Bregman divergences. J. Mach. Learn. Res., 6:1705–1749, 2005.
[8] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. on Image Processing, 15:3736–3745, 2006.
[9] P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.
[10] M. Carreira-Perpinan and Z. Lu. Dimensionality reduction by unsupervised regression. In CVPR, 2010.
[11] N. Tishby, F. Pereira, and W. Bialek. The information bottleneck method. In Allerton Conf., 1999.
[12] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. In ICML, 2011.
[13] K. Swersky, M. Ranzato, D. Buchman, B. Marlin, and N. de Freitas. On autoencoders and score matching for energy based models. In Proceedings ICML, 2011.
[14] Y. LeCun. Who is afraid of non-convex loss functions? http://videolectures.net/eml07_lecun_wia, 2007.
[15] Y. Bengio, N. Le Roux, P. Vincent, and O. Delalleau. Convex neural networks. In NIPS, 2005.
[16] S. Nowozin and G. Bakir. A decoupled approach to exemplar-based unsupervised learning. In Proceedings of the International Conference on Machine Learning, 2008.
[17] D. Bradley and J. Bagnell. Convex coding. In UAI, 2009.
[18] A. Joulin and F. Bach. A convex relaxation for weakly supervised classifiers. In Proc. ICML, 2012.
[19] A. Joulin, F. Bach, and J. Ponce. Efficient optimization for discrimin. latent class models. In NIPS, 2010.
[20] Y. Guo and D. Schuurmans. Convex relaxations of latent variable training. In Proc. NIPS 20, 2007.
[21] A. Goldberg, X. Zhu, B. Recht, J. Xu, and R. Nowak. Transduction with matrix completion: Three birds with one stone. In NIPS 23, 2010.
[22] E. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? arXiv:0912.3599, 2009.
[23] X. Zhang, Y. Yu, and D. Schuurmans. Accelerated training for matrix-norm regularization: A boosting approach. In Advances in Neural Information Processing Systems 25, 2012.
[24] A. Anandkumar, D. Hsu, and S. Kakade. A method of moments for mixture models and hidden Markov models. In Proc. Conference on Learning Theory, 2012.
[25] D. Hsu and S. Kakade. Learning mixtures of spherical Gaussians: Moment methods and spectral decompositions. In Innovations in Theoretical Computer Science (ITCS), 2013.
[26] Y. Cho and L. Saul. Large margin classification in infinite neural networks. Neural Comput., 22, 2010.
[27] R. Neal. Connectionist learning of belief networks. Artificial Intelligence, 56(1):71–113, 1992.
[28] G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. JMAA, 33:82–95, 1971.
[29] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, pages 265–292, 2001.
[30] J. Fuernkranz, E. Huellermeier, E. Mencia, and K. Brinker. Multilabel classification via calibrated label ranking. Machine Learning, 73(2):133–153, 2008.
[31] Y. Guo and D. Schuurmans. Adaptive large margin training for multilabel classification. In AAAI, 2011.
[32] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Mach. Learn., 73(3), 2008.
[33] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. 1994.
[34] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge U. Press, 2004.
[35] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundat. Trends in Mach. Learn., 3(1):1–123, 2010.
[36] S. Laue. A hybrid algorithm for convex semidefinite optimization. In Proc. ICML, 2012.
[37] O. Chapelle. Training a support vector machine in the primal. Neural Comput., 19(5):1155–1178, 2007.
[38] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, 1999.
[39] V. Sindhwani and S. Keerthi. Large scale semi-supervised linear SVMs. In SIGIR, 2006.
[40] http://olivier.chapelle.cc/ssl-book/benchmarks.html.
[41] http://archive.ics.uci.edu/ml/datasets.
[42] http://www.cs.toronto.edu/~kriz/cifar.html.
-----1
[1] Arthur E. Hoerl and Robert W. Kennard. Ridge regression: applications to nonorthogonal problems. Technometrics, 12(1):69–82, 1970.
[2] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58(1):267–288, 1996.
[3] Matthieu Kowalski. Sparse regression using mixed norms. Applied and Computational Harmonic Analysis, 27(3):303–324, 2009.
[4] Francis Bach, Rodolphe Jenatton, Julien Mairal, and Guillaume Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1–106, 2012.
[5] Rodolphe Jenatton, Guillaume Obozinski, and Francis Bach. Active set algorithm for structured sparsity-inducing norms. In OPT 2009: 2nd NIPS Workshop on Optimization for Machine Learning, 2009.
[6] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[7] Remi Gribonval, Volkan Cevher, and Mike E. Davies. Compressible Distributions for High-dimensional Statistics. IEEE Transactions on Information Theory, 2012.
[8] Remi Gribonval. Should penalized least squares regression be interpreted as maximum a posteriori estimation? IEEE Transactions on Signal Processing, 59(5):2405–2410, 2011.
[9] Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. Core discussion papers, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, 2010.
[10] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. Sathiya Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning, pages 408–415, 2008.
[11] Pierre Machart, Thomas Peel, Liva Ralaivola, Sandrine Anthoine, and Herve Glotin. Stochastic low-rank kernel learning for regression. In 28th International Conference on Machine Learning, 2011.
[12] Martin Raphan and Eero P. Simoncelli. Learning to be Bayesian without supervision. In Adv. Neural Information Processing Systems (NIPS*06). MIT Press, 2007.
[13] Remi Gribonval and Pierre Machart. Reconciling priors & priors without prejudice? Research report RR-8366, INRIA, September 2013.
-----0
Artemiou, A. and Li, B. (2009). On principal components and regression: a statistical explanation of a natural phenomenon. Statistica Sinica, 19(4):1557.
Bunea, F. and Xiao, L. (2012). On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. arXiv preprint arXiv:1212.5321.
Cai, T. T., Ma, Z., and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. The Annals of Statistics (to appear).
Choi, K. and Marden, J. (1998). A multivariate version of Kendall's τ. Journal of Nonparametric Statistics, 9(3):261–293.
Cook, R. D. (2007). Fisher lecture: Dimension reduction in regression. Statistical Science, 22(1):1–26.
Croux, C., Ollila, E., and Oja, H. (2002). Sign and rank covariance matrices: statistical properties and application to principal components analysis. In Statistical data analysis based on the L1-norm and related methods, pages 257–269. Springer.
d'Aspremont, A., El Ghaoui, L., Jordan, M. I., and Lanckriet, G. R. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM review, 49(3):434–448.
Fang, K., Kotz, S., and Ng, K. (1990). Symmetric multivariate and related distributions. Chapman & Hall, London.
Han, F. and Liu, H. (2013a). Optimal sparse principal component analysis in high dimensional elliptical model. arXiv preprint arXiv:1310.3561.
Han, F. and Liu, H. (2013b). Scale-invariant sparse PCA on high dimensional meta-elliptical data. Journal of the American Statistical Association (in press).
Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486).
Kendall, M. G. (1968). A course in multivariate analysis.
Kluppelberg, C., Kuhn, G., and Peng, L. (2007). Estimating the tail dependence function of an elliptical distribution. Bernoulli, 13(1):229–251.
Lounici, K. (2012). Sparse principal component analysis with missing observations. arXiv preprint arXiv:1205.7060.
Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. To appear in The Annals of Statistics.
Massy, W. F. (1965). Principal components regression in exploratory statistical research. Journal of the American Statistical Association, 60(309):234–256.
Moghaddam, B., Weiss, Y., and Avidan, S. (2006). Spectral bounds for sparse PCA: Exact and greedy algorithms. Advances in neural information processing systems, 18:915.
Oja, H. (2010). Multivariate Nonparametric Methods with R: An approach based on spatial signs and ranks, volume 199. Springer.
Ravikumar, P., Raskutti, G., Wainwright, M., and Yu, B. (2008). Model selection in gaussian graphical models: High-dimensional consistency of l1-regularized mle. Advances in Neural Information Processing Systems (NIPS), 21.
Tyler, D. E. (1987). A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, 15(1):234–251.
Vershynin, R. (2010). Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027.
Vu, V. Q. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. Journal of Machine Learning Research (AIStats Track).
Yuan, X. and Zhang, T. (2013). Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research, 14:899–925.
Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis. Journal of computational and graphical statistics, 15(2):265–286.
-----1
[1] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[3] Chaitanya Desai, Deva Ramanan, and Charless C. Fowlkes. Discriminative models for multi-class object layout. International Journal of Computer Vision, 95(1):1–12, 2011.
[4] Thomas G. Dietterich, Adam Ashenfelter, and Yaroslav Bulatov. Training conditional random fields via gradient tree boosting. In ICML, 2004.
[5] Justin Domke. Learning graphical model parameters with approximate marginal inference. PAMI, 35(10):2454–2467, 2013.
[6] Thomas Finley and Thorsten Joachims. Training structural svms when exact inference is intractable. In ICML, 2008.
[7] Jerome H. Friedman. Stochastic gradient boosting. Computational Statistics and Data Analysis, 38:367–378, 1999.
[8] Stephen Gould, Jim Rodgers, David Cohen, Gal Elidan, and Daphne Koller. Multi-class segmentation with relative location prior. IJCV, 80(3):300–316, 2008.
[9] Tamir Hazan and Raquel Urtasun. Efficient learning of structured predictors in general graphical models. CoRR, abs/1210.2346, 2012.
[10] Xuming He, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In CVPR, 2004.
[11] Tom Heskes. Convexity arguments for efficient minimization of the Bethe and Kikuchi free energies. J. Artif. Intell. Res. (JAIR), 26:153–190, 2006.
[12] Sanjiv Kumar and Martial Hebert. Discriminative fields for modeling spatial dependencies in natural images. In NIPS, 2003.
[13] Lubor Ladicky, Christopher Russell, Pushmeet Kohli, and Philip H. S. Torr. Associative hierarchical CRFs for object class image segmentation. In ICCV, 2009.
[14] André F. T. Martins, Noah A. Smith, and Eric P. Xing. Polyhedral outer approximations with application to natural language parsing. In ICML, 2009.
[15] Ofer Meshi, Tommi Jaakkola, and Amir Globerson. Convergence rate analysis of MAP coordinate minimization algorithms. In NIPS, 2012.
[16] Ofer Meshi, David Sontag, Tommi Jaakkola, and Amir Globerson. Learning efficiently with approximate inference via dual losses. In ICML, 2010.
[17] Sebastian Nowozin, Peter V. Gehler, and Christoph H. Lampert. On parameter learning in CRF-based approaches to object class image segmentation. In ECCV, 2010.
[18] Sebastian Nowozin, Carsten Rother, Shai Bagon, Toby Sharp, Bangpeng Yao, and Pushmeet Kohli. Decision tree fields. In ICCV, 2011.
[19] Florian Schroff, Antonio Criminisi, and Andrew Zisserman. Object class segmentation using random forests. In BMVC, 2008.
[20] Jamie Shotton, John M. Winn, Carsten Rother, and Antonio Criminisi. Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 81(1):2–23, 2009.
[21] Nathan Silberman and Rob Fergus. Indoor scene segmentation using a structured light sensor. In ICCV Workshops, 2011.
[22] Benjamin Taskar, Carlos Guestrin, and Daphne Koller. Max-margin markov networks. In NIPS, 2003.
[23] Jakob J. Verbeek and Bill Triggs. Scene segmentation with crfs learned from partially labeled images. In NIPS, 2007.
[24] John M. Winn and Jamie Shotton. The layout consistent random field for recognizing and segmenting partially occluded objects. In CVPR, 2006.
[25] Jianxiong Xiao and Long Quan. Multiple view semantic segmentation for street view images. In ICCV, 2009.
-----2
1. Sommer, F.T. & Dayan, P. Bayesian retrieval in associative memories with storage errors. IEEE transactions on neural networks 9, 705–713 (1998).
2. Lengyel, M., Kwag, J., Paulsen, O. & Dayan, P. Matching storage and recall: hippocampal spike timing-dependent plasticity and phase response curves. Nature Neuroscience 8, 1677–1683 (2005).
3. Lengyel, M. & Dayan, P. Uncertainty, phase and oscillatory hippocampal recall. Advances in Neural Information Processing (2007).
4. Savin, C., Dayan, P. & Lengyel, M. Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories. in Advances in Neural Information Processing Systems, 24 (MIT Press, Cambridge, MA, 2011).
5. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554–2558 (1982).
6. Song, S., Sjostrom, P.J., Reigl, M., Nelson, S. & Chklovskii, D.B. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS biology 3, e68 (2005).
7. Dayan, P. & Abbott, L. Theoretical Neuroscience (MIT Press, 2001).
8. Averbeck, B.B., Latham, P.E. & Pouget, A. Neural correlations, population coding and computation. Nature Reviews Neuroscience 7, 358–366 (2006).
9. Pillow, J.W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454, 995–999 (2008).
10. Latham, P.E. & Nirenberg, S. Synergy, redundancy, and independence in population codes, revisited. Journal of Neuroscience 25, 5195–5206 (2005).
11. Branco, T. & Hausser, M. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69, 885–892 (2011).
12. Hasselmo, M.E. & Bower, J.M. Acetylcholine and memory. Trends Neurosci. 16, 218–222 (1993).
13. MacKay, D.J.C. Maximum entropy connections: neural networks. in Maximum Entropy and Bayesian Methods, Laramie, 1990 (eds. Grandy, Jr, W.T. & Schick, L.H.) 237–244 (Kluwer, Dordrecht, The Netherlands, 1991).
14. Fusi, S., Drew, P.J. & Abbott, L.F. Cascade models of synaptically stored memories. Neuron 45, 599–611 (2005).
15. Abraham, W.C. Metaplasticity: tuning synapses and networks for plasticity. Nature Reviews Neuroscience 9, 387 (2008).
16. For details, see Supplementary Information.
17. Zhang, W. & Linden, D. The other side of the engram: experience-driven changes in neuronal intrinsic excitability. Nature Reviews Neuroscience (2003).
18. Engel, A., Englisch, H. & Schutte, A. Improved retrieval in neural networks with external fields. Europhysics Letters (EPL) 8, 393–397 (1989).
19. Leibold, C. & Kempter, R. Sparseness constrains the prolongation of memory lifetime via synaptic metaplasticity. Cerebral cortex (New York, N.Y. : 1991) 18, 67–77 (2008).
20. Amit, Y. & Huang, Y. Precise capacity analysis in binary networks with multiple coding level inputs. Neural computation 22, 660–688 (2010).
21. Huang, Y. & Amit, Y. Capacity analysis in multi-state synaptic models: a retrieval probability perspective. Journal of computational neuroscience (2011).
22. Dayan Rubin, B. & Fusi, S. Long memory lifetimes require complex synapses and limited sparseness. Frontiers in Computational Neuroscience (2007).
23. Thouless, D.J., Anderson, P.W. & Palmer, R.G. Solution of 'Solvable model of a spin glass'. Philosophical Magazine 35, 593–601 (1977).
24. Amit, D., Gutfreund, H. & Sompolinsky, H. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys Rev Lett 55, 1530–1533 (1985).
25. Treves, A. & Rolls, E.T. What determines the capacity of autoassociative memories in the brain? Network 2, 371–397 (1991).
-----1
[1] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. U.S.A. 79 (1982) no. 8, 2554–2558.
[2] D. J. Amit, H. Gutfreund, and H. Sompolinsky, Spin-glass models of neural networks, Phys. Rev. A 32 (Aug, 1985) 1007–1018.
[3] E. Gardner, The space of interactions in neural network models, Journal of Physics A: Mathematical and General 21 (1988) no. 1, 257.
[4] T. V. P. Bliss and G. L. Collingridge, A synaptic model of memory: long-term potentiation in the hippocampus, Nature 361 (Jan, 1993) 31–39.
[5] C. C. H. Petersen, R. C. Malenka, R. A. Nicoll, and J. J. Hopfield, All-or-none potentiation at CA3-CA1 synapses, Proc. Natl. Acad. Sci. U.S.A. 95 (1998) no. 8, 4732–4737.
[6] D. H. O'Connor, G. M. Wittenberg, and S. S.-H. Wang, Graded bidirectional synaptic plasticity is composed of switch-like unitary events, Proc. Natl. Acad. Sci. U.S.A. 102 (2005) no. 27, 9679–9684.
[7] R. Enoki, Y. ling Hu, D. Hamilton, and A. Fine, Expression of Long-Term Plasticity at Individual Synapses in Hippocampus Is Graded, Bidirectional, and Mainly Presynaptic: Optical Quantal Analysis, Neuron 62 (2009) no. 2, 242–253.
[8] D. J. Amit and S. Fusi, Constraints on learning in dynamic synapses, Network: Computation in Neural Systems 3 (1992) no. 4, 443–464.
[9] D. J. Amit and S. Fusi, Learning in neural networks with material synapses, Neural Computation 6 (1994) no. 5, 957–982.
[10] S. Fusi, P. J. Drew, and L. F. Abbott, Cascade models of synaptically stored memories, Neuron 45 (Feb, 2005) 599–611.
[11] S. Fusi and L. F. Abbott, Limits on the memory storage capacity of bounded synapses, Nat. Neurosci. 10 (Apr, 2007) 485–493.
[12] C. Leibold and R. Kempter, Sparseness Constrains the Prolongation of Memory Lifetime via Synaptic Metaplasticity, Cerebral Cortex 18 (2008) no. 1, 67–77.
[13] D. S. Bredt and R. A. Nicoll, AMPA Receptor Trafficking at Excitatory Synapses, Neuron 40 (2003) no. 2, 361–379.
[14] M. P. Coba, A. J. Pocklington, M. O. Collins, M. V. Kopanitsa, R. T. Uren, S. Swamy, M. D. Croning, J. S. Choudhary, and S. G. Grant, Neurotransmitters drive combinatorial multistate postsynaptic density networks, Sci Signal 2 (2009) no. 68, ra19.
[15] W. C. Abraham and M. F. Bear, Metaplasticity: the plasticity of synaptic plasticity, Trends in Neurosciences 19 (1996) no. 4, 126–130.
[16] J. M. Montgomery and D. V. Madison, State-Dependent Heterogeneity in Synaptic Depression between Pyramidal Cell Pairs, Neuron 33 (2002) no. 5, 765–777.
[17] R. D. Emes and S. G. Grant, Evolution of Synapse Complexity and Diversity, Annual Review of Neuroscience 35 (2012) no. 1, 111–131.
[18] A. B. Barrett and M. C. van Rossum, Optimal learning rules for discrete synapses, PLoS Comput. Biol. 4 (Nov, 2008) e1000230.
[19] J. Kemeny and J. Snell, Finite markov chains. Springer, 1960.
[20] C. Burke and M. Rosenblatt, A Markovian function of a Markov chain, The Annals of Mathematical Statistics 29 (1958) no. 4, 1112–1122.
[21] F. Ball and G. F. Yeo, Lumpability and Marginalisability for Continuous-Time Markov Chains, Journal of Applied Probability 30 (1993) no. 3, 518–528.
-----1
[1] K. H. Schindler, M. Palus, M. Vejmelka, and J. Bhattacharya. Causality detection based on information-theoretic approaches in time series analysis. Physics Reports, 441:1–46, 2007.
[2] A. Renyi. On measures of dependence. Acta Mathematica Hungarica, 10(3-4):441–451, 9 1959.
[3] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. Information Theory, IEEE Transactions on, 14(3):462–467, 1968.
[4] A. Chao and T. Shen. Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10(4):429–443, 2003.
[5] P. Grassberger. Estimating the information content of symbol sequences and efficient codes. Information Theory, IEEE Transactions on, 35(3):669–675, 1989.
[6] S. Ma. Calculation of entropy from data of motion. Journal of Statistical Physics, 26(2):221–240, 1981.
[7] S. Panzeri, R. Senatore, M. A. Montemurro, and R. S. Petersen. Correcting for the sampling bias problem in spike train information measures. J Neurophysiol, 98(3):1064–1072, Sep 2007.
[8] L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15:1191–1253, 2003.
[9] W. Bialek, F. Rieke, R. R. de Ruyter van Steveninck, and D. Warland. Reading a neural code. Science, 252:1854–1857, 1991.
[10] R. Strong, S. Koberle, de Ruyter van Steveninck R., and W. Bialek. Entropy and information in neural spike trains. Physical Review Letters, 80:197–202, 1998.
[11] L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15:1191–1253, 2003.
[12] R. Barbieri, L. Frank, D. Nguyen, M. Quirk, V. Solo, M. Wilson, and E. Brown. Dynamic analyses of information encoding in neural ensembles. Neural Computation, 16:277–307, 2004.
[13] M. Kennel, J. Shlens, H. Abarbanel, and E. Chichilnisky. Estimating entropy rates with Bayesian confidence intervals. Neural Computation, 17:1531–1576, 2005.
[14] J. Victor. Approaches to information-theoretic analysis of neural activity. Biological theory, 1(3):302–316, 2006.
[15] J. Shlens, M. B. Kennel, H. D. I. Abarbanel, and E. J. Chichilnisky. Estimating information rates with confidence intervals in neural spike trains. Neural Computation, 19(7):1683–1719, Jul 2007.
[16] V. Q. Vu, B. Yu, and R. E. Kass. Coverage-adjusted entropy estimation. Statistics in medicine, 26(21):4039–4060, 2007.
[17] V. Q. Vu, B. Yu, and R. E. Kass. Information in the nonstationary case. Neural Computation, 21(3):688–703, 2009, http://www.mitpressjournals.org/doi/pdf/10.1162/neco.2008.01-08-700. PMID: 18928371.
[18] E. Archer, I. M. Park, and J. Pillow. Bayesian estimation of discrete entropy with mixtures of stick-breaking priors. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2024–2032. MIT Press, Cambridge, MA, 2012.
[19] I. Nemenman, F. Shafee, and W. Bialek. Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14, pages 471–478. MIT Press, Cambridge, MA, 2002.
[20] M. Okun, P. Yger, S. L. Marguet, F. Gerard-Mercier, A. Benucci, S. Katzner, L. Busse, M. Carandini, and K. D. Harris. Population rate dynamics and multineuron firing patterns in sensory cortex. The Journal of Neuroscience, 32(48):17108–17119, 2012, http://www.jneurosci.org/content/32/48/17108.full.pdf+html.
[21] G. Tkačik, O. Marre, T. Mora, D. Amodei, M. J. Berry II, and W. Bialek. The simplest maximum entropy model for collective behavior in a neural network. Journal of Statistical Mechanics: Theory and Experiment, 2013(03):P03011, 2013.
[22] D. Wolpert and D. Wolf. Estimating functions of probability distributions from a finite set of samples. Physical Review E, 52(6):6841–6854, 1995.
[23] I. M. Park, E. Archer, K. Latimer, and J. W. Pillow. Universal models for binary spike patterns using centered Dirichlet processes. In Advances in Neural Information Processing Systems (NIPS), 2013.
[24] A. Panzeri, S. Treves, S. Schultz, and E. Rolls. On decoding the responses of a population of neurons from short time windows. Neural Computation, 11:1553–1577, 1999.
-----1
[1] J. N. D. Kerr and W. Denk, Imaging in vivo: watching the brain in action, Nat Rev Neurosci, vol. 9, no. 3, pp. 195–205, 2008.
[2] C. Grienberger and A. Konnerth, Imaging calcium in neurons, Neuron, vol. 73, no. 5, pp. 862–885, 2012.
[3] S. Lefort, C. Tomm, J.-C. Floyd Sarria, and C. C. H. Petersen, The excitatory neuronal network of the C2 barrel column in mouse primary somatosensory cortex, Neuron, vol. 61, no. 2, pp. 301–316, 2009.
[4] D. J. Tolhurst, J. A. Movshon, and A. F. Dean, The statistical reliability of signals in single neurons in cat and monkey visual cortex, Vision research, vol. 23, no. 8, pp. 775–785, 1983.
[5] W. R. Softky and C. Koch, The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs, The Journal of Neuroscience, vol. 13, no. 1, pp. 334–350, 1993.
[6] Y. Mishchenko, J. T. Vogelstein, and L. Paninski, A bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data, The Annals of Applied Statistics, vol. 5, no. 2B, pp. 1229–1261, 2011.
[7] O. Stetter, D. Battaglia, J. Soriano, and T. Geisel, Model-free reconstruction of excitatory neuronal connectivity from calcium imaging signals, PLoS Comp Bio, vol. 8, no. 8, p. e1002653, 2012.
[8] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli, Spatio-temporal correlations and visual signalling in a complete neuronal population, Nature, vol. 454, no. 7207, pp. 995–999, 2008.
[9] I. H. Stevenson, J. M. Rebesco, L. E. Miller, and K. P. Kording, Inferring functional connections between neurons, Current opinion in neurobiology, vol. 18, no. 6, pp. 582–588, 2008.
[10] A. Singh and N. A. Lesica, Incremental mutual information: A new method for characterizing the strength and dynamics of connections in neuronal circuits, PLoS Comp Bio, vol. 6, no. 12, p. e1001035, 2010.
[11] D. Song, H. Wang, C. Y. Tu, V. Z. Marmarelis, R. E. Hampson, S. A. Deadwyler, and T. W. Berger, Identification of sparse neural functional connectivity using penalized likelihood estimation and basis functions, J Comp Neurosci, pp. 1–23, 2013.
[12] A. Wohrer, R. Romo, and C. Machens, Linear readout from a neural population with partial correlation data, in Advances in Neural Information Processing Systems, vol. 22, Curran Associates, Inc., 2010.
[13] J. W. Pillow and P. Latham, Neural characterization in partially observed populations of spiking neurons, Adv Neural Information Processing Systems, vol. 20, no. 3.5, 2008.
[14] A. Pakman, J. H. Huggins, and P. L., Fast penalized state-space methods for inferring dendritic synaptic connectivity, Journal of Computational Neuroscience, 2013.
[15] Y. Mishchenko and L. Paninski, A bayesian compressed-sensing approach for reconstructing neural connectivity from subsampled anatomical data, J Comp Neurosci, vol. 33, no. 2, pp. 371–388, 2012.
[16] J. T. Vogelstein, B. O. Watson, A. M. Packer, R. Yuste, B. Jedynak, and L. Paninski, Spike inference from calcium imaging using sequential monte carlo methods, Biophysical Journal, vol. 97, no. 2, pp. 636–655, 2009.
[17] M. Vidne, Y. Ahmadian, J. Shlens, J. Pillow, J. Kulkarni, A. Litke, E. Chichilnisky, E. Simoncelli, and L. Paninski, Modeling the impact of common noise inputs on the network activity of retinal ganglion cells, J Comput Neurosci, 2011.
[18] J. H. Macke, L. Busing, J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani, Empirical models of spiking in neural populations, in Advances in Neural Information Processing Systems, vol. 24, Curran Associates, Inc., 2012.
[19] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J R Stat Soc Ser B, vol. 39, no. 1, pp. 1–38, 1977.
[20] P. Liang and D. Klein, Online EM for unsupervised models, in NAACL '09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2009.
[21] T. Katayama, Subspace methods for system identification. Springer Verlag, 2005.
[22] L. E. Baum and T. Petrie, Statistical Inference for Probabilistic Functions of Finite State Markov Chains, The Annals of Mathematical Statistics, vol. 37, no. 6, pp. 1554-1563, 1966.
[23] F. Takens, Detecting Strange Attractors In Turbulence, in Dynamical Systems and Turbulence (D. A. Rand and L. S. Young, eds.), vol. 898 of Lecture Notes in Mathematics, (Warwick), pp. 366-381, Springer-Verlag, Berlin, 1981.
[24] S. W. Linderman and R. P. Adams, Inferring functional connectivity with priors on network topology, in Cosyne Abstracts, 2013.
-----1
[1] A. Treves and E. T. Rolls, Computational analysis of the role of the hippocampus in memory, Hippocampus, vol. 4, pp. 374-391, Jun. 1994.
[2] D. A. Wilson and R. M. Sullivan, Cortical processing of odor objects, Neuron, vol. 72, pp. 506-519, Nov. 2011.
[3] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. U.S.A., vol. 79, pp. 2554-2558, Apr. 1982.
[4] R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. S. Venkatesh, The capacity of the Hopfield associative memory, IEEE Trans. Inf. Theory, vol. IT-33, pp. 461-482, 1987.
[5] D. J. Amit and S. Fusi, Learning in neural networks with material synapses, Neural Comput., vol. 6, pp. 957-982, Sep. 1994.
[6] B. A. Olshausen and D. J. Field, Sparse coding of sensory inputs, Curr. Opin. Neurobiol., vol. 14, pp. 481-487, Aug. 2004.
[7] A. A. Koulakov and D. Rinberg, Sparse incomplete representations: A potential role of olfactory granule cells, Neuron, vol. 72, pp. 124-136, Oct. 2011.
[8] A. H. Salavati and A. Karbasi, Multi-level error-resilient neural networks, in Proc. 2012 IEEE Int. Symp. Inf. Theory, Jul. 2012, pp. 1064-1068.
[9] A. Karbasi, A. H. Salavati, and A. Shokrollahi, Iterative learning and denoising in convolutional neural associative memories, in Proc. 30th Int. Conf. Mach. Learn. (ICML 2013), Jun. 2013, pp. 445-453.
[10] N. Brunel, V. Hakim, P. Isope, J.-P. Nadal, and B. Barbour, Optimal information storage and the distribution of synaptic weights: Perceptron versus Purkinje cell, Neuron, vol. 43, pp. 745-757, 2004.
[11] L. R. Varshney, P. J. Sjostrom, and D. B. Chklovskii, Optimal information storage in noisy synapses under resource constraints, Neuron, vol. 52, pp. 409-423, Nov. 2006.
[12] C. Koch, Biophysics of Computation. New York: Oxford University Press, 1999.
[13] M. D. McDonnell and L. M. Ward, The benefits of noise in neural systems: bridging theory and experiment, Nat. Rev. Neurosci., vol. 12, pp. 415-426, Jul. 2011.
[14] H. Chen, P. K. Varshney, S. M. Kay, and J. H. Michels, Theory of the stochastic resonance effect in signal detection: Part I - fixed detectors, IEEE Trans. Signal Process., vol. 55, pp. 3172-3184, Jul. 2007.
[15] D. A. Spielman and S.-H. Teng, Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time, J. ACM, vol. 51, pp. 385-463, May 2004.
[16] D. J. Amit, Modeling Brain Function. Cambridge: Cambridge University Press, 1992.
[17] M. G. Taylor, Reliable information storage in memories designed from unreliable components, Bell Syst. Tech. J., vol. 47, pp. 2299-2337, Dec. 1968.
[18] A. V. Kuznetsov, Information storage in a memory assembled from unreliable components, Probl. Inf. Transm., vol. 9, pp. 100-114, July-Sept. 1973.
[19] L. R. Varshney, Performance of LDPC codes under faulty iterative decoding, IEEE Trans. Inf. Theory, vol. 57, pp. 4427-4444, Jul. 2011.
[20] V. Gripon and C. Berrou, Sparse neural networks with large learning diversity, IEEE Trans. Neural Netw., vol. 22, pp. 1087-1096, Jul. 2011.
[21] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in Proc. 25th Int. Conf. Mach. Learn. (ICML 2008), Jul. 2008, pp. 1096-1103.
[22] Q. V. Le, J. Ngiam, Z. Chen, D. Chia, P. W. Koh, and A. Y. Ng, Tiled convolutional neural networks, in Advances in Neural Information Processing Systems 23, J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, Eds. Cambridge, MA: MIT Press, 2010, pp. 1279-1287.
[23] A. Karbasi, A. H. Salavati, A. Shokrollahi, and L. R. Varshney, Noise-enhanced associative memories, arXiv, 2013.
[24] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, Efficient erasure correcting codes, IEEE Trans. Inf. Theory, vol. 47, pp. 569-584, Feb. 2001.
[25] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge: Cambridge University Press, 2008.
[26] M. Yoshida, H. Hayashi, K. Tateno, and S. Ishizuka, Stochastic resonance in the hippocampal CA3-CA1 model: a possible memory recall mechanism, Neural Netw., vol. 15, pp. 1171-1183, Dec. 2002.
[27] R. Sarpeshkar, Analog versus digital: Extrapolating from electronics to neurobiology, Neural Comput., vol. 10, pp. 1601-1638, Oct. 1998.
[28] N. H. Mackworth, Effects of heat on wireless telegraphy operators' hearing and recording Morse messages, Br. J. Ind. Med., vol. 3, pp. 143-158, Jul. 1946.
-----1
[1] J. Fiser, P. Berkes, G. Orban, and M. Lengyel. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn. Sci. (Regul. Ed.), 14(3):119-130, Mar 2010.
[2] R. Vincis, O. Gschwend, K. Bhaukaurally, J. Beroud, and A. Carleton. Dense representation of natural odorants in the mouse olfactory bulb. Nat. Neurosci., 15(4):537-539, Apr 2012.
[3] Jeff Beck, Katherine Heller, and Alexandre Pouget. Complex inference in neural circuits with probabilistic population codes and topic models. In NIPS, 2012.
[4] W. Rall and G. M. Shepherd. Theoretical reconstruction of field potentials and dendrodendritic synaptic interactions in olfactory bulb. J. Neurophysiol., 31(6):884-915, Nov 1968.
[5] Shepherd GM, Chen WR, and Greer CA. The synaptic organization of the brain, volume 4, chapter Olfactory bulb, pages 165-216. Oxford University Press Oxford, 2004.
[6] A. A. Koulakov and D. Rinberg. Sparse incomplete representations: a potential role of olfactory granule cells. Neuron, 72(1):124-136, Oct 2011.
[7] Shawn Olsen, Vikas Bhandawat, and Rachel Wilson. Divisive normalization in olfactory population codes. Neuron, 66(2):287-299, 2010.
[8] P. Mombaerts. Genes and ligands for odorant, vomeronasal and taste receptors. Nat. Rev. Neurosci., 5(4):263-278, Apr 2004.
[9] D. G. Laing and G. W. Francis. The capacity of humans to identify odors in mixtures. Physiol. Behav., 46(5):809-814, Nov 1989.
[10] W. J. Ma, J. M. Beck, P. E. Latham, and A. Pouget. Bayesian inference with probabilistic population codes. Nat. Neurosci., 9(11):1432-1438, Nov 2006.
[11] R. Kiani and M. N. Shadlen. Representation of confidence associated with a decision by neurons in the parietal cortex. Science, 324(5928):759-764, May 2009.
[12] G. Laurent, M. Stopfer, R. W. Friedrich, M. I. Rabinovich, A. Volkovskii, and H. D. Abarbanel. Odor encoding as an active, dynamical process: experiments, computation, and theory. Annu. Rev. Neurosci., 24:263-297, 2001.
[13] H. Spors and A. Grinvald. Spatio-temporal dynamics of odor representations in the mammalian olfactory bulb. Neuron, 34(2):301-315, Apr 2002.
[14] Kevin Cury and Naoshige Uchida. Robust odor coding via inhalation-coupled transient activity in the mammalian olfactory bulb. Neuron, 68(3):570-585, 2010.
[15] Z. Li and J. J. Hopfield. Modeling the olfactory bulb and its neural oscillatory processings. Biol Cybern, 61(5):379-392, 1989.
[16] Y. Yu, T. S. McTavish, M. L. Hines, G. M. Shepherd, C. Valenti, and M. Migliore. Sparse distributed representation of odors in a large-scale olfactory bulb circuit. PLoS Comput. Biol., 9(3):e1003014, 2013.
[17] Z. Li. A model of olfactory adaptation and sensitivity enhancement in the olfactory bulb. Biol Cybern, 62(4):349-361, 1990.
[18] Julie Chapuis and Donald Wilson. Bidirectional plasticity of cortical pattern recognition and behavioral sensory acuity. Nature neuroscience, 15(1):155-161, 2012.
[19] Keiji Miura, Zachary Mainen, and Naoshige Uchida. Odor representations in olfactory cortex: distributed rate coding and decorrelated population activity. Neuron, 74(6):1087-1098, 2012.
[20] R. A. Fuentes, M. I. Aguilar, M. L. Aylwin, and P. E. Maldonado. Neuronal activity of mitral-tufted cells in awake rats during passive and active odorant stimulation. J. Neurophysiol., 100(1):422-430, Jul 2008.
-----1
[1] E Schneidman, MJ Berry, R Segev, and W Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440:1007-1012, 2005.
[2] Gyorgy Buzsaki. Large-scale recording of neuronal ensembles. Nat Neurosci, 7(5):446-51, 2004.
[3] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207):995-999, 2008.
[4] Mark M. Churchland, Byron M. Yu, Maneesh Sahani, and Krishna V. Shenoy. Techniques for extracting single-trial activity patterns from large-scale neural recordings. Curr Opin Neurobiol, 17(5):609-618, 2007.
[5] BM Yu, A Afshar, G Santhanam, SI Ryu, KV Shenoy, and M Sahani. Extracting dynamical structure embedded in neural activity. Advances in Neural Information Processing Systems, 18:1545-1552, 2006.
[6] JH Macke, L Büsing, JP Cunningham, BM Yu, KV Shenoy, and M Sahani. Empirical models of spiking in neural populations. Advances in Neural Information Processing Systems, 24:1350-1358, 2011.
[7] R.E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35-45, 1960.
[8] JL Elman. Finding structure in time. Cognitive Science, 14:179-211, 1990.
[9] L Buesing, JH Macke, and M Sahani. Spectral learning of linear dynamics from generalised-linear observations with application to neural population data. Advances in Neural Information Processing Systems, 25, 2012.
[10] DE Rumelhart, GE Hinton, and RJ Williams. Learning internal representations by error propagation. MIT Press Computational Models of Cognition and Perception Series, pages 318-462, 1986.
[11] T Mikolov, A Deoras, S Kombrink, L Burget, and JH Cernocky. Empirical evaluation and combination of advanced language modeling techniques. Conference of the International Speech Communication Association, 2011.
[12] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
-----1
[1] P. Baldi and P. Sadowski. The dropout learning algorithm. Artificial Intelligence, 2013. Submitted.
[2] E. F. Beckenbach and R. Bellman. Inequalities. Springer-Verlag Berlin, 1965.
[3] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), Austin, TX, June 2010. Oral Presentation.
[4] L. Bottou. Online algorithms and stochastic approximations. In D. Saad, editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK, 1998.
[5] L. Bottou. Stochastic learning. In O. Bousquet and U. von Luxburg, editors, Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence, LNAI 3176, pages 146-168. Springer Verlag, Berlin, 2004.
[6] D. Cartwright and M. Field. A refinement of the arithmetic mean-geometric mean inequality. Proceedings of the American Mathematical Society, pages 36-38, 1978.
[7] G. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. http://arxiv.org/abs/1207.0580, 2012.
[8] E. Neuman and J. Sandor. On the Ky Fan inequality and related inequalities I. Mathematical Inequalities and Applications, 5:49-56, 2002.
[9] E. Neuman and J. Sandor. On the Ky Fan inequality and related inequalities II. Bulletin of the Australian Mathematical Society, 72(1):87-108, 2005.
[10] S. Nitish. Improving Neural Networks with Dropout. PhD thesis, University of Toronto, Toronto, Canada, 2013.
[11] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. Optimizing methods in statistics, pages 233-257, 1971.
[12] D. Warde-Farley, I. Goodfellow, P. Lamblin, G. Desjardins, F. Bastien, and Y. Bengio. pylearn2. 2011. http://deeplearning.net/software/pylearn2.
-----1
[1] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. on Inf. Theory, 51(7):2282-2312, 2005.
[2] Martin J. Wainwright, Tommi Jaakkola, and Alan S. Willsky. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51(7):2313-2335, 2005.
[3] Amir Globerson and Tommi Jaakkola. Approximate Inference Using Conditional Entropy Decompositions. In 11th International Workshop on AI and Statistics (AISTATS2007), 2007.
[4] Radford Neal. Annealed importance sampling. Statistics and Computing, 11:125-139, 2001.
[5] John Skilling. Nested sampling for general Bayesian computation. Bayesian Analysis, 1(4):833-859, 2006.
[6] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Methodology), 68(3):411-436, 2006.
[7] Jascha Sohl-Dickstein and Benjamin J. Culpepper. Hamiltonian annealed importance sampling for partition function estimation. Technical report, Redwood Center, UC Berkeley, 2012.
[8] Lucas Theis, Sebastian Gerwinn, Fabian Sinz, and Matthias Bethge. In all likelihood, deep belief is not enough. Journal of Machine Learning Research, 12:3071-3096, 2011.
[9] Ruslan Salakhutdinov and Ian Murray. On the quantitative analysis of deep belief networks. In Intl Conf. on Machine Learning, pages 6424-6429, 2008.
[10] Guillaume Desjardins, Aaron Courville, and Yoshua Bengio. On tracking the partition function. In NIPS 24. MIT Press, 2011.
[11] Graham Taylor and Geoffrey Hinton. Products of hidden Markov models: It takes N > 1 to tango. In Uncertainty in Artificial Intelligence, 2009.
[12] Andrew Gelman and Xiao-Li Meng. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163-186, 1998.
[13] Christopher Jarzynski. Equilibrium free-energy differences from nonequilibrium measurements: A master-equation approach. Physical Review E, 56:5018-5035, 1997.
[14] Daan Frenkel and Berend Smit. Understanding Molecular Simulation: From Algorithms to Applications. Academic Press, 2 edition, 2002.
[15] Radford Neal. Sampling from multimodal distributions using tempered transitions. Statistics and Computing, 6:353-366, 1996.
[16] Gundula Behrens, Nial Friel, and Merrilee Hurn. Tuning tempered transitions. Statistics and Computing, 22:65-78, 2012.
[17] Ben Calderhead and Mark Girolami. Estimating Bayes factors via thermodynamic integration and population MCMC. Computational Statistics and Data Analysis, 53(12):4028-4045, 2009.
[18] Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[19] Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800, 2002.
[20] Tijmen Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Intl. Conf. on Machine Learning, 2008.
[21] Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. Oxford University Press, 2000.
[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[23] Y. Iba. Extended ensemble Monte Carlo. International Journal of Modern Physics C, 12(5):623-656, 2001.
-----1
[1] S. Ghosal. The Dirichlet process, related priors and posterior asymptotics. In N.L. Hjort, C. Holmes, P. Muller, and S.G. Walker, editors, Bayesian Nonparametrics, pages 36-83. Cambridge University Press, 2010.
[2] J.P. Huelsenbeck and P. Andolfatto. Inference of population structure under a Dirichlet process model. Genetics, 175(4):1787-1802, 2007.
[3] M. Medvedovic and S. Sivaganesan. Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics, 18(9):1194-1206, 2002.
[4] E. Otranto and G.M. Gallo. A nonparametric Bayesian approach to detect the number of regimes in Markov switching models. Econometric Reviews, 21(4):477-496, 2002.
[5] E.P. Xing, K.A. Sohn, M.I. Jordan, and Y.W. Teh. Bayesian multi-population haplotype inference via a hierarchical Dirichlet process mixture. In Proceedings of the 23rd International Conference on Machine Learning, pages 1049-1056, 2006.
[6] P. Fearnhead. Particle filters for mixture models with an unknown number of components. Statistics and Computing, 14(1):11-21, 2004.
[7] J. W. Miller and M. T. Harrison. Inconsistency of Pitman-Yor process mixtures for the number of components. arXiv:1309.0024, 2013.
[8] M. West, P. Muller, and M.D. Escobar. Hierarchical priors and mixture models, with application in regression and density estimation. Institute of Statistics and Decision Sciences, Duke University, 1994.
[9] A. Onogi, M. Nurimoto, and M. Morita. Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods. BMC Bioinformatics, 12(1):263, 2011.
[10] A. Nobile. Bayesian Analysis of Finite Mixture Distributions. PhD thesis, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, 1994.
[11] S. Richardson and P.J. Green. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society. Series B, 59(4):731-792, 1997.
[12] P.J. Green and S. Richardson. Modeling heterogeneity with and without the Dirichlet process. Scandinavian Journal of Statistics, 28(2):355-375, June 2001.
[13] A. Nobile and A.T. Fearnside. Bayesian finite mixtures with an unknown number of components: The allocation sampler. Statistics and Computing, 17(2):147-162, 2007.
[14] W. Kruijer, J. Rousseau, and A. Van der Vaart. Adaptive Bayesian density estimation with location-scale mixtures. Electronic Journal of Statistics, 4:1225-1257, 2010.
[15] P. McCullagh and J. Yang. How many clusters? Bayesian Analysis, 3(1):101-120, 2008.
[16] T.S. Ferguson. Bayesian density estimation by mixtures of normal distributions. In M. H. Rizvi, J. Rustagi, and D. Siegmund, editors, Recent Advances in Statistics, pages 287-302. Academic Press, 1983.
[17] A. Y. Lo. On a class of Bayesian nonparametric estimates: I. Density estimates. The Annals of Statistics, 12(1):351-357, 1984.
[18] M.D. Escobar and M. West. Computing nonparametric hierarchical models. In D. Dey, P. Muller, and D. Sinha, editors, Practical Nonparametric and Semiparametric Bayesian Statistics, pages 1-22. Springer-Verlag, New York, 1998.
[19] W. Hoeffding. The strong law of large numbers for U-statistics. Institute of Statistics, Univ. of N. Carolina, Mimeograph Series, 302, 1961.
[20] R.L. Graham, D.E. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, 1989.
-----1
[1] Jose Manuel Alvarez, Theo Gevers, and Antonio M Lopez. 3D scene priors for road detection. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE. 2010, pp. 57-64.
[2] Mohamed Aly. Real time detection of lane markers in urban streets. In: Intelligent Vehicles Symposium, 2008 IEEE. IEEE. 2008, pp. 7-12.
[3] Paul Dagum and Michael Luby. An optimal approximation algorithm for Bayesian inference. In: Artificial Intelligence 93.1 (1997), pp. 1-27.
[4] L Del Pero et al. Bayesian geometric modeling of indoor scenes. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE. 2012, pp. 2719-2726.
[5] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE. 2012, pp. 3354-3361.
[6] Noah Goodman, Vikash Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua Tenenbaum. Church: A language for generative models. In: UAI. 2008.
[7] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. In: Neural computation 18.7 (2006), pp. 1527-1554.
[8] Derek Hoiem, Alexei A Efros, and Martial Hebert. Putting objects in perspective. In: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE. 2006, pp. 2137-2144.
[9] Derek Hoiem, Alexei A Efros, and Martial Hebert. Recovering surface layout from an image. In: International Journal of Computer Vision 75.1 (2007), pp. 151-172.
[10] Berthold Klaus Paul Horn. Robot vision. the MIT Press, 1986.
[11] Yann LeCun and Yoshua Bengio. Convolutional networks for images, speech, and time series. In: The handbook of brain theory and neural networks 3361 (1995).
[12] Paul Marjoram, John Molitor, Vincent Plagnol, and Simon Tavare. Markov chain Monte Carlo without likelihoods. In: Proceedings of the National Academy of Sciences 100.26 (2003).
[13] Greg Mori and Jitendra Malik. Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA. In: Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on. Vol. 1. IEEE. 2003, pp. I-134.
[14] Javier Portilla and Eero P Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. In: International Journal of Computer Vision 40.1 (2000).
[15] Oliver Ratmann, Christophe Andrieu, Carsten Wiuf, and Sylvia Richardson. Model criticism based on likelihood-free inference, with an application to protein network evolution. In: 106.26 (2009), pp. 10576-10581.
[16] Ray Smith. An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition. Vol. 2. IEEE. 2007, pp. 629-633.
[17] Zhuowen Tu, Xiangrong Chen, Alan L Yuille, and Song-Chun Zhu. Image parsing: Unifying segmentation, detection, and recognition. In: International Journal of Computer Vision 63.2 (2005), pp. 113-140.
[18] Zhuowen Tu and Song-Chun Zhu. Image Segmentation by Data-Driven Markov Chain Monte Carlo. In: IEEE Trans. Pattern Anal. Mach. Intell. 24.5 (May 2002).
[19] Richard D Wilkinson. Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. In: arXiv preprint arXiv:0811.3355 (2008).
[20] David Wingate, Noah D Goodman, A Stuhlmueller, and J Siskind. Nonstandard interpretations of probabilistic programs for efficient inference. In: Advances in Neural Information Processing Systems 23 (2011).
[21] Alan Yuille and Daniel Kersten. Vision as Bayesian inference: analysis by synthesis? In: Trends in cognitive sciences 10.7 (2006), pp. 301-308.
[22] Yibiao Zhao and Song-Chun Zhu. Image Parsing via Stochastic Scene Grammar. In: Advances in Neural Information Processing Systems. 2011.
-----1
[1] Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
[2] Laurens van der Maaten, Minmin Chen, Stephen Tyree, and Kilian Q Weinberger. Learning with marginalized corrupted features. In Proceedings of the International Conference on Machine Learning, 2013.
[3] Sida I Wang and Christopher D Manning. Fast dropout training. In Proceedings of the International Conference on Machine Learning, 2013.
[4] Yaser S Abu-Mostafa. Learning from hints in neural networks. Journal of Complexity, 6(2):192-198, 1990.
[5] Chris J.C. Burges and Bernhard Schölkopf. Improving the accuracy and speed of support vector machines. In Advances in Neural Information Processing Systems, pages 375-381, 1997.
[6] Patrice Y Simard, Yann A Le Cun, John S Denker, and Bernard Victorri. Transformation invariance in pattern recognition: Tangent distance and propagation. International Journal of Imaging Systems and Technology, 11(3):181-197, 2000.
[7] Salah Rifai, Yann Dauphin, Pascal Vincent, Yoshua Bengio, and Xavier Muller. The manifold tangent classifier. Advances in Neural Information Processing Systems, 24:2294-2302, 2011.
[8] Kiyotoshi Matsuoka. Noise injection into inputs in back-propagation learning. Systems, Man and Cybernetics, IEEE Transactions on, 22(3):436-440, 1992.
[9] Chris M Bishop. Training with noise is equivalent to Tikhonov regularization. Neural computation, 7(1):108-116, 1995.
[10] Salah Rifai, Xavier Glorot, Yoshua Bengio, and Pascal Vincent. Adding noise to the input of a model trained with a regularized objective. arXiv preprint arXiv:1104.3250, 2011.
[11] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2010.
[12] Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 142-150. Association for Computational Linguistics, 2011.
[13] Sida I Wang, Mengqiu Wang, Stefan Wager, Percy Liang, and Christopher D Manning. Feature noising for log-linear structured prediction. In Empirical Methods in Natural Language Processing, 2013.
[14] Ian J Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. In Proceedings of the International Conference on Machine Learning, 2013.
[15] Sida Wang and Christopher D Manning. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 90-94. Association for Computational Linguistics, 2012.
[16] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1, 2010.
[17] Erich Leo Lehmann and George Casella. Theory of Point Estimation. Springer, 1998.
[18] Koby Crammer, Alex Kulesza, Mark Dredze, et al. Adaptive regularization of weight vectors. Advances in Neural Information Processing Systems, 22:414-422, 2009.
[19] Jiang Su, Jelber Sayyad Shirab, and Stan Matwin. Large scale text classification using semi-supervised multinomial naive Bayes. In Proceedings of the International Conference on Machine Learning, 2011.
[20] Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun, and Tom Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2-3):103-134, May 2000.
[21] G. Bouchard and B. Triggs. The trade-off between generative and discriminative classifiers. In International Conference on Computational Statistics, pages 721-728, 2004.
[22] R. Raina, Y. Shen, A. Ng, and A. McCallum. Classification with hybrid generative/discriminative models. In Advances in Neural Information Processing Systems, Cambridge, MA, 2004. MIT Press.
[23] J. Suzuki, A. Fujino, and H. Isozaki. Semi-supervised structured output learning based on a hybrid generative and discriminative approach. In Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.
[24] Y. Grandvalet and Y. Bengio. Entropy regularization. In Semi-Supervised Learning, United Kingdom, 2005. Springer.
[25] Thorsten Joachims. Transductive inference for text classification using support vector machines. In Proceedings of the International Conference on Machine Learning, pages 200-209, 1999.
-----1
[ABW12] Sungjin Ahn, Anoop Korattikara Balan, and Max Welling, Bayesian posterior sampling via stochastic gradient Fisher scoring, ICML, 2012.
[AKW12] S. Ahn, A. Korattikara, and M. Welling, Bayesian posterior sampling via stochastic gradient Fisher scoring, Proceedings of the International Conference on Machine Learning, 2012.
[Ama95] S. Amari, Information geometry of the EM and em algorithms for neural networks, Neural Networks 8 (1995), no. 9, 1379-1408.
[AWST09] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh, On smoothing and inference for topic models, Proceedings of the International Conference on Uncertainty in Artificial Intelligence, vol. 25, 2009.
[Bea03] M. J. Beal, Variational algorithms for approximate Bayesian inference, Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
[BNJ03] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3 (2003), 993-1022.
[GC11] M. Girolami and B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, Journal of the Royal Statistical Society B 73 (2011), 137.
[GCPT07] A. Globerson, G. Chechik, F. Pereira, and N. Tishby, Euclidean Embedding of Co-occurrence Data, The Journal of Machine Learning Research 8 (2007), 2265-2295.
[GRS96] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Markov chain Monte Carlo in practice, Chapman and Hall, 1996.
[GS04] T. L. Griffiths and M. Steyvers, Finding scientific topics, Proceedings of the National Academy of Sciences, 2004.
[HBB10] M. D. Hoffman, D. M. Blei, and F. Bach, Online learning for latent dirichlet allocation, Advances in Neural Information Processing Systems, 2010.
[Hec99] D. Heckerman, A tutorial on learning with Bayesian networks, Learning in Graphical Models (M. I. Jordan, ed.), Kluwer Academic Publishers, 1999.
[Ken78] J. Kent, Time-reversible diffusions, Advances in Applied Probability 10 (1978), 819-835.
[Ken90] A. D. Kennedy, The theory of hybrid stochastic algorithms, Probabilistic Methods in Quantum Field Theory and Quantum Gravity, Plenum Press, 1990.
[MHB12] D. Mimno, M. Hoffman, and D. Blei, Sparse stochastic inference for latent Dirichlet allocation, Proceedings of the International Conference on Machine Learning, 2012.
[NASW09] D. Newman, A. Asuncion, P. Smyth, and M. Welling, Distributed algorithms for topic models, Journal of Machine Learning Research (2009).
[Nea10] R. M. Neal, MCMC using Hamiltonian dynamics, Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, eds.), Chapman & Hall / CRC Press, 2010.
[PSD00] J.K. Pritchard, M. Stephens, and P. Donnelly, Inference of population structure using multilocus genotype data, Genetics 155 (2000), 945-959.
[RM51] H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics 22 (1951), no. 3, 400-407.
[RS02] G. O. Roberts and O. Stramer, Langevin diffusions and Metropolis-Hastings algorithms, Methodology and Computing in Applied Probability 4 (2002), 337-357, 10.1023/A:1023562417138.
[Sat01] M. Sato, Online model selection based on the variational Bayes, Neural Computation 13 (2001), 1649-1681.
[TNW07] Y. W. Teh, D. Newman, and M. Welling, A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation, Advances in Neural Information Processing Systems, vol. 19, 2007, pp. 1353-1360.
[WJ08] M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning 1 (2008), no. 1-2, 1-305.
[WMSM09] Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno, Evaluation methods for topic models, Proceedings of the 26th International Conference on Machine Learning (ICML) (Montreal) (Leon Bottou and Michael Littman, eds.), Omnipress, June 2009, pp. 1105-1112.
[WT11] M. Welling and Y. W. Teh, Bayesian learning via stochastic gradient Langevin dynamics, Proceedings of the International Conference on Machine Learning, 2011.
-----1
[1] D. Aldous. Exchangeability and related topics. École d'Été de Probabilités de Saint-Flour XIII, pages 1–198, 1985.
[2] D. J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4):581–598, 1981.
[3] R. E. Barlow and K. D. Heidtmann. Computing k-out-of-n system reliability. IEEE Transactions on Reliability, 33:322–323, 1984.
[4] F. Caron. Bayesian nonparametric models for bipartite graphs. In Neural Information Processing Systems, 2012.
[5] S. X. Chen, A. P. Dempster, and J. S. Liu. Weighted finite population sampling to maximize entropy. Biometrika, 81:457–469, 1994.
[6] S. X. Chen and J. S. Liu. Statistical applications of the Poisson-binomial and conditional Bernoulli distributions. Statistica Sinica, 7:875–892, 1997.
[7] F. Doshi-Velez and Z. Ghahramani. Accelerated Gibbs sampling for the Indian buffet process. In International Conference on Machine Learning, 2009.
[8] M. Fernandez and S. Williams. Closed-form expression for the Poisson-binomial probability density function. IEEE Transactions on Aerospace Electronic Systems, 46:803–817, 2010.
[9] S. Fortini, L. Ladelli, and E. Regazzini. Exchangeability, predictive distributions and parametric models. Sankhya: The Indian Journal of Statistics, Series A, pages 86–109, 2000.
[10] E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. Sharing features among dynamical systems with beta processes. In Neural Information Processing Systems, 2010.
[11] T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In Neural Information Processing Systems, 2005.
[12] J. F. C. Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
[13] K. T. Miller, T. L. Griffiths, and M. I. Jordan. Nonparametric latent feature models for link prediction. In Neural Information Processing Systems, 2009.
[14] R. M. Neal. Slice sampling. Annals of Statistics, 31(3):705–767, 2003.
[15] Y. W. Teh and D. Görür. Indian buffet processes with power law behaviour. In Neural Information Processing Systems, 2009.
[16] Y. W. Teh, D. Görür, and Z. Ghahramani. Stick-breaking construction for the Indian buffet process. In Artificial Intelligence and Statistics, 2007.
[17] R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In Artificial Intelligence and Statistics, 2007.
[18] M. Titsias. The infinite gamma-Poisson feature model. In Neural Information Processing Systems, 2007.
[19] A. Y. Volkova. A refinement of the central limit theorem for sums of independent random indicators. Theory of Probability and its Applications, 40:791–794, 1996.
[20] F. Wood, T. L. Griffiths, and Z. Ghahramani. A non-parametric Bayesian method for inferring hidden causes. In Uncertainty in Artificial Intelligence, 2006.
[21] M. Zhou, L. A. Hannah, D. B. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In Artificial Intelligence and Statistics, 2012.
[22] G. K. Zipf. Selective Studies and the Principle of Relative Frequency in Language. Harvard University Press, 1932.
-----0
C. Archambeau, D. Cornford, M. Opper, and J. Shawe-Taylor. Gaussian process approximations of stochastic differential equations. Journal of Machine Learning Research Proceedings Track, 1:1–16, 2007.
B. Cseke and T. Heskes. Properties of Bethe free energies and message passing in Gaussian models. Journal of Artificial Intelligence Research, 41:1–24, 2011a.
B. Cseke and T. Heskes. Approximate marginals in latent Gaussian models. Journal of Machine Learning Research, 12:417–457, 2011b.
T. A. Davis. Direct Methods for Sparse Linear Systems (Fundamentals of Algorithms 2). Society for Industrial and Applied Mathematics, Philadelphia, 2006.
P. M. Di Lorenzo and J. D. Victor. Taste response variability and temporal coding in the nucleus of the solitary tract of the rat. Journal of Neurophysiology, 90:1418–1431, 2003.
C. W. Gardiner. Handbook of stochastic methods: for physics, chemistry and the natural sciences. Springer series in synergetics, 13. Springer, 2002.
T. Heskes, M. Opper, W. Wiegerinck, O. Winther, and O. Zoeter. Approximate inference techniques with expectation constraints. Journal of Statistical Mechanics: Theory and Experiment, 2005.
H. J. Kappen, V. Gomez, and M. Opper. Optimal control as a graphical model inference problem. Machine Learning, 87(2):159–182, 2012.
J. F. C. Kingman. Poisson Processes. Oxford Statistical Science Series. Oxford University Press, New York, 1992.
S. L. Lauritzen. Graphical Models. Oxford Statistical Science Series. Oxford University Press, New York, 1996.
J. H. Macke, L. Buesing, J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani. Empirical models of spiking in neural populations. In Advances in Neural Information Processing Systems 24, pages 1350–1358, 2011.
T. P. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, MIT, 2001.
I. Murray, R. P. Adams, and D. J. C. MacKay. Elliptical slice sampling. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pages 541–548, 2010.
A. Ocone, A. J. Millar, and G. Sanguinetti. Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics. Bioinformatics, 29(7):910–916, 2013.
B. Øksendal. Stochastic differential equations. Universitext. Springer, 2010.
M. Opper and G. Sanguinetti. Variational inference for Markov jump processes. In Advances in Neural Information Processing Systems 20, 2008.
M. Opper and O. Winther. Gaussian processes for classification: Mean-field algorithms. Neural Computation, 12(11):2655–2684, 2000.
M. Opper and O. Winther. Expectation consistent approximate inference. Journal of Machine Learning Research, 6:2177–2204, 2005.
M. Opper, U. Paquet, and O. Winther. Improving on Expectation Propagation. In Advances in Neural Information Processing Systems 21, pages 1241–1248. MIT, Cambridge, MA, US, 2009.
M. Opper, A. Ruttor, and G. Sanguinetti. Approximate inference in continuous time Gaussian-Jump processes. In Advances in Neural Information Processing Systems 23, pages 1831–1839, 2010.
V. Rao and Y. W. Teh. MCMC for continuous-time discrete-state systems. In Advances in Neural Information Processing Systems 25, pages 710–718, 2012.
S. Särkkä. Recursive Bayesian Inference on Stochastic Differential Equations. PhD thesis, Helsinki University of Technology, 2006.
A. C. Smith and E. N. Brown. Estimating a state-space model from point process observations. Neural Computation, 15(5):965–991, 2003.
W. Wiegerinck and T. Heskes. Fractional Belief Propagation. In Advances in Neural Information Processing Systems 15, pages 438–445, Cambridge, MA, 2003. The MIT Press.
J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In Advances in Neural Information Processing Systems 12, pages 689–695, Cambridge, MA, 2000. The MIT Press.
A. Zammit Mangion, K. Yuan, V. Kadirkamanathan, M. Niranjan, and G. Sanguinetti. Online variational inference for state-space models with point-process observations. Neural Computation, 23(8):1967–1999, 2011.
A. Zammit-Mangion, M. Dewar, V. Kadirkamanathan, and G. Sanguinetti. Point process modelling of the Afghan war diary. Proceedings of the National Academy of Sciences, 2012. doi: 10.1073/pnas.1203177109.
-----1
[1] A. N. Shiryayev. Optimal Stopping Rules. Springer-Verlag, 1978.
[2] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[3] M. I. Jordan. Graphical models. Statistical Science, 19:140–155, 2004.
[4] P. Diaconis and D. Freedman. Iterated random functions. SIAM Rev., 41(1):45–76, 1999.
[5] O. P. Kreidl and A. Willsky. Inference with minimum communication: a decision-theoretic variational approach. In NIPS, 2007.
[6] M. Cetin, L. Chen, J. W. Fisher III, A. Ihler, R. Moses, M. Wainwright, and A. Willsky. Distributed fusion in sensor networks: A graphical models perspective. IEEE Signal Processing Magazine, July:42–55, 2006.
[7] X. Nguyen, A. A. Amini, and R. Rajagopal. Message-passing sequential detection of multiple change points in networks. In ISIT, 2012.
[8] A. Frank, P. Smyth, and A. Ihler. A graphical model representation of the track-oriented multiple hypothesis tracker. In Proceedings, IEEE Statistical Signal Processing (SSP). August 2012.
[9] A. T. Ihler, J. W. Fisher III, and A. S. Willsky. Loopy belief propagation: Convergence and effects of message errors. Journal of Machine Learning Research, 6:905–936, May 2005.
[10] Alexander Ihler. Accuracy bounds for belief propagation. In Proceedings of UAI 2007, July 2007.
[11] T. G. Roosta, M. Wainwright, and S. S. Sastry. Convergence analysis of reweighted sum-product algorithms. IEEE Trans. Signal Processing, 56(9):4293–4305, 2008.
[12] D. Steinsaltz. Locally contractive iterated function systems. Ann. Probab., 27(4):1952–1979, 1999.
[13] W. B. Wu and M. Woodroofe. A central limit theorem for iterated random functions. J. Appl. Probab., 37(3):748–755, 2000.
[14] W. B. Wu and X. Shao. Limit theorems for iterated random functions. J. Appl. Probab., 41(2):425–436, 2004.
[15] O. Stenflo. A survey of average contractive iterated function systems. J. Diff. Equa. and Appl., 18(8):1355–1380, 2012.
[16] A. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996.
[17] A. A. Amini and X. Nguyen. Bayesian inference as iterated random functions with applications to sequential inference in graphical models. arXiv preprint.
[18] A. A. Amini and X. Nguyen. Sequential detection of multiple change points in networks: a graphical model approach. IEEE Transactions on Information Theory, 59(9):5824–5841, 2013.
-----0
Anderson, E. J., & Ferris, M. C. (2001). A direct search algorithm for optimization with noisy function evaluations. SIAM Journal of Optimization, 11, 837–857.
Anderson, J. R., Conrad, F. G., & Corbett, A. T. (1989). Skill acquisition and the LISP tutor. Cognitive Science, 13, 467–506.
Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009, June). Curriculum learning. In L. Bottou & M. Littman (Eds.), Proceedings of the 26th international conference on machine learning (pp. 41–48). Montreal: Omnipress.
Boeck, P. D., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
Carvalho, P. F., & Goldstone, R. L. (2011, November). Stimulus similarity relations modulate benefits for blocking versus interleaving during category learning. (Presentation at the 52nd Annual Meeting of the Psychonomics Society, Seattle, WA) 
Chi, M., VanLehn, K., Litman, D., & Jordan, P. (2011). Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies. User Modeling and User-Adapted Interaction. Special Issue on Data Mining for Personalized Educational Systems, 21, 137–180.
Christian, B. (2012). The A/B test: Inside the technology that's changing the rules of business. Wired, 20(4).
de Jonge, M., Tabbers, H. K., Pecher, D., & Zeelenberg, R. (2012). The effect of study time distribution on learning and retention: A goldilocks principle for presentation rate. J. Exp. Psych.: Learning, Mem., & Cog., 38, 405–412.
Forrester, A. I. J., & Keane, A. J. (2009). Recent advances in surrogate-based optimization. Progress in Aerospace Sciences, 45, 50–79.
Goldstone, R. L., & Steyvers, M. (2001). The sensitization and differentiation of dimensions during category learning. Journal of Experimental Psychology: General, 130, 116–139.
Kang, S. H. K., & Pashler, H. (2011). Learning painting styles: Spacing is advantageous when it promotes discriminative contrast. Applied Cognitive Psychology, 26, 97–103.
Khan, F., Zhu, X. J., & Mutlu, B. (2011). How do humans teach: On curriculum learning and teaching dimension. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, & K. Weinberger (Eds.), Adv. in NIPS 24 (pp. 1449–1457). La Jolla, CA: NIPS Found.
Koedinger, K. R., & Corbett, A. T. (2006). Cognitive tutors: Technology bringing learning science to the classroom. In K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 61–78). Cambridge UK: Cambridge University Press.
Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the enemy of induction? Psychological Science, 19, 585–592.
Martin, J., & VanLehn, K. (1995). Student assessment using Bayesian nets. International Journal of Human-Computer Studies, 42, 575–591.
Minear, M., & Park, D. C. (2004). A lifespan database of adult facial stimuli. Behavior Research Methods, Instruments, and Computers, 36, 630–633.
Murray, I., Adams, R. P., & MacKay, D. J. (2010). Elliptical slice sampling. J. of Machine Learn. Res., 9, 541–548.
Osborne, M. A., Garnett, R., & Roberts, S. J. (2009, January). Gaussian processes for global optimization. In 3d intl. conf. on learning and intell. opt. Trento, Italy.
Pashler, H., & Mozer, M. C. (in press). Enhancing perceptual category learning through fading: When does it help? J. of Exptl. Psych.: Learning, Mem., & Cog.
Rafferty, A. N., Brunskill, E. B., Griffiths, T. L., & Shafto, P. (2011). Faster teaching by POMDP planning. In Proc. of the 15th intl. conf. on AI in education.
Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4, 409–435.
Salmon, J. P., McMullen, P. A., & Filliter, J. H. (2010). Norms for two types of manipulability (graspability and functional usage), familiarity, and age of acquisition for 320 photographs of objects. Behavioral Research Methods, 42, 82–95.
Schloss, K. B., & Palmer, S. E. (2011). Aesthetic response to color combinations: preference, harmony, and similarity. Attention, Perception, & Psychophysics, 73, 551–571.
Srinivas, N., Krause, A., Kakade, S., & Seeger, M. (2010). Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th international conference on machine learning. Haifa, Israel.
Whitehill, J., & Movellan, J. R. (2010). Optimal teaching machines (Tech. Rep.). La Jolla, CA: Department of Computer Science, UCSD.
-----1
[1] G. Gigerenzer, P. M. Todd, and the ABC Research Group. Simple heuristics that make us smart. Oxford University Press, New York, 1999.
[2] G. Gigerenzer, R. Hertwig, and T. Pachur, editors. Heuristics: The Foundations of Adaptive Behavior. Oxford University Press, New York, 2011.
[3] K. V. Katsikopoulos. Psychological heuristics for making inferences: Definition, performance, and the emerging theory and practice. Decision Analysis, 8(1):10–29, 2011.
[4] R. M. Hogarth and N. Karelaia. Take-the-best and other simple strategies: Why and when they work well with binary cues. Theory and Decision, 61(3):205–249, 2006b.
[5] M. Baucells, J. A. Carrasco, and R. M. Hogarth. Cumulative dominance and heuristic performance in binary multiattribute choice. Operations Research, 56(5):1289–1304, 2008.
[6] J. A. Carrasco and M. Baucells. Tight upper bounds for the expected loss of lexicographic heuristics in binary multi-attribute choice. Mathematical Social Sciences, 55(2):156–189, 2008.
[7] L. Martignon and U. Hoffrage. Why does one-reason decision making work? In G. Gigerenzer, P. M. Todd, and the ABC Research Group, editors, Simple heuristics that make us smart, pages 119–140. Oxford University Press, New York, 1999.
[8] L. Martignon and U. Hoffrage. Fast, frugal, and fit: Simple heuristics for paired comparison. Theory and Decision, 52(1):29–71, 2002.
[9] R. M. Hogarth and N. Karelaia. Simple models for multiattribute choice with many alternatives: When it does and does not pay to face trade-offs with binary attributes. Management Science, 51(12):1860–1872, 2005.
[10] K. V. Katsikopoulos and L. Martignon. Naïve heuristics for paired comparisons: Some results on their relative accuracy. Journal of Mathematical Psychology, 50(5):488–494, 2006.
[11] K. V. Katsikopoulos. Why do simple heuristics perform well in choices with binary attributes? Decision Analysis, To appear.
[12] S. S. Wilks. Weighting systems for linear functions of correlated variables when there is no dependent variable. Psychometrika, 3(1):23–40, 1938.
[13] F. L. Schmidt. The relative efficiency of regression and simple unit weighting predictor weights in applied differential psychology. Educational and Psychological Measurement, 31:699–714, 1971.
[14] R. M. Dawes and B. Corrigan. Linear models in decision making. Psychological Bulletin, 81(2):95–106, 1974.
[15] R. M. Dawes. The robust beauty of improper linear models in decision making. American Psychologist, 34(7):571–582, 1979.
[16] H. J. Einhorn and R. M. Hogarth. Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13(2):171–192, 1975.
[17] C. P. Davis-Stober. A geometric analysis of when fixed weighting schemes will outperform ordinary least squares. Psychometrika, 76(4):650–669, 2011.
[18] P. C. Fishburn. Lexicographic orders, utilities and decision rules: A survey. Management Science, 20(11):1442–1471, 1974.
[19] G. Gigerenzer and D. G. Goldstein. Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103(4):650–669, 1996.
[20] J. Czerlinski, G. Gigerenzer, and D. G. Goldstein. How good are simple heuristics? In G. Gigerenzer, P. M. Todd, and the ABC Research Group, editors, Simple heuristics that make us smart, pages 97–118. Oxford University Press, New York, 1999.
[21] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67:301–320, 2005.
[22] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.
[23] S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58, 1992.
[24] H. Brighton and G. Gigerenzer. Bayesian brains and cognitive mechanisms: Harmony or dissonance? In N. Chater and M. Oaksford, editors, The probabilistic mind: Prospects for Bayesian cognitive science, pages 189–208. Oxford University Press, New York, 2008.
[25] G. Gigerenzer and H. Brighton. Homo Heuristicus: Why biased minds make better inferences. Topics in Cognitive Science, 1(1):107–143, 2009.
-----0
James Surowiecki. The wisdom of crowds. Anchor, 2005.
Victor S Sheng, Foster Provost, and Panagiotis G Ipeirotis. Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proc. SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, pages 614–622. ACM, 2008.
A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pages 20–28, 1979.
Jacob Whitehill, Paul Ruvolo, Tingfan Wu, Jacob Bergsma, and Javier Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Advances in Neural Information Processing Systems (NIPS), pages 2035–2043, 2009.
D. R. Karger, S. Oh, and D. Shah. Iterative learning for reliable crowdsourcing systems. In Advances in Neural Information Processing Systems (NIPS), pages 1953–1961, 2011.
Qiang Liu, Jian Peng, and Alexander Ihler. Variational inference for crowdsourcing. In Advances in Neural Information Processing Systems (NIPS), pages 701–709, 2012.
Dengyong Zhou, John Platt, Sumit Basu, and Yi Mao. Learning from the wisdom of crowds by minimax entropy. In Advances in Neural Information Processing Systems (NIPS), pages 2204–2212, 2012.
A Kimball Romney, William H Batchelder, and Susan C Weller. Recent applications of cultural consensus theory. American Behavioral Scientist, 31(2):163–177, 1987.
Michael D Lee, Mark Steyvers, Mindy de Young, and Brent Miller. Inferring expertise in knowledge and prediction ranking tasks. Topics in Cognitive Science, 4(1):151–163, 2012.
Gary Chamberlain. Multivariate regression models for panel data. Journal of Econometrics, 18(1):5–46, 1982.
Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge University Press, 2000.
Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439–561, 2006.
Joel Friedman, Jeff Kahn, and Endre Szemerédi. On the second eigenvalue of random regular graphs. In Proc. ACM Symp. on Theory of Computing, pages 587–598. ACM, 1989.
Doron Puder. Expansion of random graphs: New proofs, new results. arXiv preprint arXiv:1212.5216, 2012.
Cade Massey, Joseph P Simmons, and David A Armor. Hope over experience: Desirability and the persistence of optimism. Psychological Science, 22(2):274–281, 2011.
-----1
[1] R. Reid, "From Functional Architecture to Functional Connectomics," Neuron, vol. 75, pp. 209–217, July 2012.
[2] M. Ashby and J. Isaac, "Maturation of a recurrent excitatory neocortical circuit by experience-dependent unsilencing of newly formed dendritic spines," Neuron, vol. 70, no. 3, pp. 510–521, 2011.
[3] E. Fino and R. Yuste, "Dense Inhibitory Connectivity in Neocortex," Neuron, vol. 69, pp. 1188–1203, Mar. 2011.
[4] V. Nikolenko, K. E. Poskanzer, and R. Yuste, "Two-photon photostimulation and imaging of neural circuits," Nat Meth, vol. 4, pp. 943–950, Nov. 2007.
[5] A. M. Packer, D. S. Peterka, J. J. Hirtz, R. Prakash, K. Deisseroth, and R. Yuste, "Two-photon optogenetics of dendritic spines and neural circuits," Nat Meth, vol. 9, pp. 1202–1205, Dec. 2012.
[6] A. M. Packer and R. Yuste, "Dense, unspecific connectivity of neocortical parvalbumin-positive interneurons: A canonical microcircuit for inhibition?" The Journal of Neuroscience, vol. 31, no. 37, pp. 13260–13271, 2011.
[7] B. Barbour, N. Brunel, V. Hakim, and J.-P. Nadal, "What can we learn from synaptic weight distributions?" Trends in neurosciences, vol. 30, pp. 622–629, Dec. 2007.
[8] C. Holmgren, T. Harkany, B. Svennenfors, and Y. Zilberter, "Pyramidal cell communication within local networks in layer 2/3 of rat neocortex," The Journal of Physiology, vol. 551, no. 1, pp. 139–153, 2003.
[9] J. Kozloski, F. Hamzei-Sichani, and R. Yuste, "Stereotyped position of local synaptic targets in neocortex," Science, vol. 293, no. 5531, pp. 868–872, 2001.
[10] R. B. Levy and A. D. Reyes, "Spatial profile of excitatory and inhibitory synaptic connectivity in mouse primary auditory cortex," The Journal of Neuroscience, vol. 32, no. 16, pp. 5609–5619, 2012.
[11] R. Perin, T. K. Berger, and H. Markram, "A synaptic organizing principle for cortical neuronal groups," Proceedings of the National Academy of Sciences, vol. 108, no. 13, pp. 5419–5424, 2011.
[12] S. Song, P. J. Sjöström, M. Reigl, S. Nelson, and D. B. Chklovskii, "Highly nonrandom features of synaptic connectivity in local cortical circuits," PLoS biology, vol. 3, p. e68, Mar. 2005.
[13] E. I. George and R. E. McCulloch, "Variable selection via Gibbs sampling," Journal of the American Statistical Association, vol. 88, no. 423, pp. 881–889, 1993.
[14] T. J. Mitchell and J. J. Beauchamp, "Bayesian variable selection in linear regression," Journal of the American Statistical Association, vol. 83, no. 404, pp. 1023–1032, 1988.
[15] S. Mohamed, K. A. Heller, and Z. Ghahramani, "Bayesian and l1 approaches to sparse unsupervised learning," CoRR, vol. abs/1106.1157, 2011.
[16] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2007.
[17] P. Carbonetto and M. Stephens, "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies," Bayesian Analysis, vol. 7, no. 1, pp. 73–108, 2012.
[18] M. Titsias and M. Lázaro-Gredilla, "Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning," in Advances in Neural Information Processing Systems 24, pp. 2339–2347, 2011.
[19] Y. Dodge, V. Fedorov, and H. Wynn, eds., Optimal Design and Analysis of Experiments. North Holland, 1988.
[20] D. J. C. MacKay, "Information-based objective functions for active data selection," Neural Comput., vol. 4, pp. 590–604, July 1992.
[21] L. Paninski, "Asymptotic Theory of Information-Theoretic Experimental Design," Neural Comput., vol. 17, pp. 1480–1507, July 2005.
[22] E. Arias-Castro, E. J. Candès, and M. A. Davenport, "On the fundamental limits of adaptive sensing," IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 472–481, 2013.
[23] T. Hu, A. Leonardo, and D. Chklovskii, "Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)," in Advances in Neural Information Processing Systems 22, pp. 790–798, 2009.
[24] S. Ji and L. Carin, "Bayesian compressive sensing and projection optimization," in Proceedings of the 24th international conference on Machine learning, ICML '07, (New York, NY, USA), pp. 377–384, ACM, 2007.
[25] M. Malloy and R. D. Nowak, "Near-optimal adaptive compressed sensing," CoRR, vol. abs/1306.6239, 2013.
-----1
[1] Francis Bach. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression. arXiv preprint arXiv:1303.6149, 2013.
[2] S. Chatterjee, A. Banerjee, and A. Ganguly. Sparse group lasso for regression on land climate variables. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pages 1–8. IEEE, 2011.
[3] S. Dasgupta, D. Hsu, and N. Verma. A concentration theorem for projections. arXiv preprint arXiv:1206.6813, 2012.
[4] Eva Feredoes, Giulio Tononi, and Bradley R Postle. The neural bases of the short-term storage of verbal information are anatomically variable across individuals. The Journal of Neuroscience, 27(41):11003–11008, 2007.
[5] L. Jacob, G. Obozinski, and J. P. Vert. Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 433–440. ACM, 2009.
[6] A. Jalali, P. Ravikumar, S. Sanghavi, and C. Ruan. A dirty model for multi-task learning. Advances in Neural Information Processing Systems, 23:964–972, 2010.
[7] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. arXiv preprint arXiv:1009.2139, 2010.
[8] K. Lounici, M. Pontil, A. B. Tsybakov, and S. van de Geer. Taking advantage of sparsity in multi-task learning. arXiv preprint arXiv:0903.1468, 2009.
[9] S. N. Negahban, P. Ravikumar, M. J. Wainwright, and Bin Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4):538–557, 2012.
[10] G. Obozinski, L. Jacob, and J. P. Vert. Group lasso with overlaps: The latent group lasso approach. arXiv preprint arXiv:1110.0413, 2011.
[11] N. Rao, B. Recht, and R. Nowak. Universal measurement bounds for structured sparse signal recovery. In Proceedings of AISTATS, volume 2102, 2012.
[12] Irina Rish, Guillermo A Cecchi, Kyle Heuton, Marwan N Baliki, and A Vania Apkarian. Sparse regression analysis of task-relevant information distribution in the brain. In Proceedings of SPIE, volume 8314, page 831412, 2012.
[13] Srikanth Ryali, Kaustubh Supekar, Daniel A Abrams, and Vinod Menon. Sparse logistic regression for whole brain classification of fMRI data. NeuroImage, 51(2):752, 2010.
[14] N. Simon, J. Friedman, T. Hastie, and R. Tibshirani. A sparse-group lasso. Journal of Computational and Graphical Statistics, (just-accepted), 2012.
[15] P. Sprechmann, I. Ramirez, G. Sapiro, and Y. Eldar. Collaborative hierarchical sparse modeling. In Information Sciences and Systems (CISS), 2010 44th Annual Conference on, pages 1–6. IEEE, 2010.
[16] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
[17] Marcel van Gerven, Christian Hesse, Ole Jensen, and Tom Heskes. Interpreting single trial data using groupwise regularisation. NeuroImage, 46(3):665–676, 2009.
[18] X. Wang, T. M. Mitchell, and R. Hutchinson. Using machine learning to detect cognitive states across multiple subjects. CALD KDD project paper, 2003.
[19] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, 2006.
[20] J. Zhou, J. Chen, and J. Ye. MALSAR: Multi-task learning via structural regularization, 2012.
[21] Y. Zhou, R. Jin, and S. C. Hoi. Exclusive lasso for multi-task feature selection. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
-----1
[1] S. R. Becker, E. Candès, and M. Grant. Templates for convex cone problems with applications to sparse signal recovery. Technical report, Stanford University, 2010.
[2] D. P. Bertsekas. Convex Analysis and Optimization. Athena Scientific, 2003.
[3] A. Bruckstein, D. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51:34–81, 2009.
[4] E. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematics, 2006.
[5] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43:129–159, 2001.
[6] D. L. Donoho and Y. Tsaig. Fast solution of l-1 norm minimization problems when the solution may be sparse. IEEE Transactions on Information Theory, 54:4789–4812, 2008.
[7] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2004.
[8] L. El Ghaoui, V. Viallon, and T. Rabbani. Safe feature elimination in sparse supervised learning. Pacific Journal of Optimization, 8:667–698, 2012.
[9] J. Fan and J. Lv. Sure independence screening for ultrahigh dimensional feature spaces. Journal of the Royal Statistical Society Series B, 70:849–911, 2008.
[10] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1:302–332, 2007.
[11] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33:1–22, 2010.
[12] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An interior-point method for large scale l1-regularized least squares. IEEE Journal of Selected Topics in Signal Processing, 1:606–617, 2007.
[13] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, 1998.
[14] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.
[15] J. Mairal and B. Yu. Complexity analysis of the lasso regularization path. In ICML, 2012.
[16] S. A. Nene, S. K. Nayar, and H. Murase. Columbia object image library (COIL). Technical report, No. CUCS-006-96, Dept. Comp. Science, Columbia University, 1996.
[17] M. R. Osborne, B. Presnell, and B. A. Turlach. A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20:389–404, 2000.
[18] M. Y. Park and T. Hastie. L1-regularized path algorithm for generalized linear models. Journal of the Royal Statistical Society Series B, 69:659–677, 2007.
[19] F. Samaria and A. Harter. Parameterisation of a stochastic model for human face identification. In Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, 1994.
[20] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58:267–288, 1996.
[21] R. Tibshirani, J. Bien, J. Friedman, T. Hastie, N. Simon, J. Taylor, and R. Tibshirani. Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society Series B, 74:245–266, 2012.
[22] N. Ueda and K. Saito. Parametric mixture models for multi-labeled text. Advances in neural information processing systems, 15:721–728, 2002.
[23] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan. Sparse representation for computer vision and pattern recognition. In Proceedings of IEEE, 2010.
[24] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange. Genomewide association analysis by lasso penalized logistic regression. Bioinformatics, 25:714–721, 2009.
[25] Z. J. Xiang and P. J. Ramadge. Fast lasso screening tests based on correlations. In IEEE ICASSP, 2012.
[26] Z. J. Xiang, H. Xu, and P. J. Ramadge. Learning sparse representation of high dimensional data on large scale dictionaries. In NIPS, 2011.
[27] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68:49–67, 2006.
[28] P. Zhao and B. Yu. On model selection consistency of lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.
[29] J. Zhou, L. Yuan, J. Liu, and J. Ye. A multi-task learning formulation for predicting disease progression. In KDD, pages 814–822. ACM, 2011.
-----1
[1] A. Gretton, O. Bousquet, A. Smola, and B. Scholkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, pages 63–78, 2005.
[2] G. Szekely, M. Rizzo, and N.K. Bakirov. Measuring and testing dependence by correlation of distances. Ann. Stat., 35(6):2769–2794, 2007.
[3] D. Sejdinovic, A. Gretton, B. Sriperumbudur, and K. Fukumizu. Hypothesis testing using pairwise distances and associated kernels. In ICML, 2012.
[4] F. R. Bach and M. I. Jordan. Kernel independent component analysis. J. Mach. Learn. Res., 3:1–48, 2002.
[5] K. Fukumizu, F. Bach, and A. Gretton. Statistical consistency of kernel canonical correlation analysis. J. Mach. Learn. Res., 8:361–383, 2007.
[6] J. Dauxois and G. M. Nkiet. Nonlinear canonical analysis and independence tests. Ann. Stat., 26(4):1254–1278, 1998.
[7] D. Pal, B. Poczos, and Cs. Szepesvari. Estimation of Renyi entropy and mutual information based on generalized nearest-neighbor graphs. In NIPS 23, 2010.
[8] A. Kankainen. Consistent Testing of Total Independence Based on the Empirical Characteristic Function. PhD thesis, University of Jyvaskyla, 1995.
[9] S. Bernstein. The Theory of Probabilities. Gastehizdat Publishing House, Moscow, 1946.
[10] M. Kayano, I. Takigawa, M. Shiga, K. Tsuda, and H. Mamitsuka. Efficiently finding genome-wide three-way gene interactions from transcript- and genotype-data. Bioinformatics, 25(21):2735–2743, 2009.
[11] N. Meinshausen and P. Buhlmann. High dimensional graphs and variable selection with the lasso. Ann. Stat., 34(3):1436–1462, 2006.
[12] P. Ravikumar, M.J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electron. J. Stat., 4:935–980, 2011.
[13] J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2001.
[14] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. 2nd edition, 2000.
[15] M. Kalisch and P. Buhlmann. Estimating high-dimensional directed acyclic graphs with the PC algorithm. J. Mach. Learn. Res., 8:613–636, 2007.
[16] X. Sun, D. Janzing, B. Scholkopf, and K. Fukumizu. A kernel-based causal learning algorithm. In ICML, pages 855–862, 2007.
[17] R. Tillman, A. Gretton, and P. Spirtes. Nonlinear directed acyclic structure learning with weakly additive noise models. In NIPS 22, 2009.
[18] K. Zhang, J. Peters, D. Janzing, and B. Schoelkopf. Kernel-based conditional independence test and application in causal discovery. In UAI, pages 804–813, 2011.
[19] A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Scholkopf, and A. Smola. A kernel statistical test of independence. In NIPS 20, pages 585–592, Cambridge, MA, 2008. MIT Press.
[20] K. Fukumizu, A. Gretton, X. Sun, and B. Scholkopf. Kernel measures of conditional dependence. In NIPS 20, pages 489–496, 2008.
[21] H.O. Lancaster. The Chi-Squared Distribution. Wiley, London, 1969.
[22] B. Streitberg. Lancaster interactions revisited. Ann. Stat., 18(4):1878–1885, 1990.
[23] K. Fukumizu, B. Sriperumbudur, A. Gretton, and B. Schoelkopf. Characteristic kernels on groups and semigroups. In NIPS 21, pages 473–480, 2009.
[24] A. Kankainen. Consistent Testing of Total Independence Based on the Empirical Characteristic Function. PhD thesis, University of Jyvaskyla, 1995.
[25] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, 2004.
[26] D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. arXiv:1207.6076, 2012.
[27] B. Sriperumbudur, K. Fukumizu, and G. Lanckriet. Universality, characteristic kernels and RKHS embedding of measures. J. Mach. Learn. Res., 12:2389–2410, 2011.
[28] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Scholkopf. Hilbert space embeddings and metrics on probability measures. J. Mach. Learn. Res., 11:1517–1561, 2010.
[29] A. Gretton, K. Borgwardt, M. Rasch, B. Scholkopf, and A. Smola. A kernel two-sample test. J. Mach. Learn. Res., 13:723–773, 2012.
[30] G. Szekely and M. Rizzo. Testing for equal distributions in high dimension. InterStat, (5), November 2004.
[31] L. Baringhaus and C. Franz. On a new multivariate two-sample test. J. Multivariate Anal., 88(1):190–206, 2004.
[32] G. Szekely and M. Rizzo. Brownian distance covariance. Ann. Appl. Stat., 4(3):1233–1303, 2009.
[33] R. Serfling. Approximation Theorems of Mathematical Statistics. Wiley, New York, 1980.
[34] T.P. Speed. Cumulants and partition lattices. Austral. J. Statist., 25:378–388, 1983.
[35] S. Holm. A simple sequentially rejective multiple test procedure. Scand. J. Statist., 6(2):65–70, 1979.
[36] A. Gretton, K. Fukumizu, Z. Harchaoui, and B. Sriperumbudur. A fast, consistent kernel two-sample test. In NIPS 22, Red Hook, NY, 2009. Curran Associates Inc.
-----1
[1] A. Agarwal and J. C. Duchi. Distributed delayed stochastic optimization. In Decision and Control (CDC), 2012 IEEE 51st Annual Conference on, pages 5451–5452. IEEE, 2012.
[2] A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. In WSDM, pages 123–132, 2012.
[3] N. R. Alexandros Labrinidis. Balancing performance and data freshness in web database servers. Pages 393–404, September 2003.
[4] M. Bouzeghoub. A framework for analysis of data freshness. In Proceedings of the 2004 international workshop on Information quality in information systems, IQIS '04, pages 59–67, 2004.
[5] J. K. Bradley, A. Kyrola, D. Bickson, and C. Guestrin. Parallel coordinate descent for l1-regularized loss minimization. In International Conference on Machine Learning (ICML 2011), June 2011.
[6] L. Bright and L. Raschid. Using latency-recency profiles for data delivery on the web. In Proceedings of the 28th international conference on Very Large Data Bases, VLDB '02, pages 550–561, 2002.
[7] J. Cipar, G. Ganger, K. Keeton, C. B. Morrey, III, C. A. Soules, and A. Veitch. LazyBase: trading freshness for performance in a scalable database. In Proceedings of the 7th ACM European conference on Computer Systems, pages 169–182, 2012.
[8] J. Cipar, Q. Ho, J. K. Kim, S. Lee, G. R. Ganger, G. Gibson, K. Keeton, and E. Xing. Solving the straggler problem with bounded staleness. In HotOS '13. Usenix, 2013.
[9] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In NIPS 2012, 2012.
[10] Facebook. www.facebook.com/note.php?note_id=10150388519243859, January 2013.
[11] C. J. Fidge. Timestamps in Message-Passing Systems that Preserve the Partial Ordering. In 11th Australian Computer Science Conference, pages 55–66, University of Queensland, Australia, 1988.
[12] R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. In KDD, pages 69–77. ACM, 2011.
[13] L. Golab and T. Johnson. Consistency in a stream warehouse. In CIDR 2011, pages 114–122.
[14] C.-T. Huang. Loft: Low-overhead freshness transmission in sensor networks. In SUTC 2008, pages 241–248, Washington, DC, USA, 2008. IEEE Computer Society.
[15] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, July 1978.
[16] J. Langford, L. Li, and A. Strehl. Vowpal wabbit online learning project, 2007.
[17] J. Langford, A. J. Smola, and M. Zinkevich. Slow learners are fast. In Advances in Neural Information Processing Systems, pages 2331–2339, 2009.
[18] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. PVLDB, 2012.
[19] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 International Conference on Management of Data, pages 135–146. ACM, 2010.
[20] F. Mattern. Virtual time and global states of distributed systems. In C. M. et al., editor, Proc. Workshop on Parallel and Distributed Algorithms, pages 215–226, North-Holland / Elsevier, 1989.
[21] F. Niu, B. Recht, C. Re, and S. J. Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011.
[22] R. Power and J. Li. Piccolo: building fast, distributed programs with partitioned tables. In Proceedings of the USENIX conference on Operating systems design and implementation (OSDI), pages 1–14, 2010.
[23] U. Rohm, K. Bohm, H.-J. Schek, and H. Schuldt. FAS: a freshness-sensitive coordination middleware for a cluster of OLAP components. In VLDB 2002, pages 754–765. VLDB Endowment, 2002.
[24] D. Terry. Replicated data consistency explained through baseball. Technical Report MSR-TR-2011-137, Microsoft Research, October 2011.
[25] Yahoo! http://webscope.sandbox.yahoo.com/catalog.php?datatype=g, 2013.
[26] H. Yu and A. Vahdat. Design and evaluation of a conit-based continuous consistency model for replicated services. ACM Transactions on Computer Systems, 20(3):239–282, Aug. 2002.
[27] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, 2010.
[28] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. Advances in Neural Information Processing Systems, 23(23):1–9, 2010.
-----1
[1] G. E. Hinton. Learning translation invariant recognition in massively parallel networks. In Proceedings Conference on Parallel Architectures and Languages Europe, pages 1–13. Springer, 1987.
[2] M. Ferraro and T. M. Caelli. Lie transformation groups, integral transforms, and invariant pattern recognition. Spatial Vision, 8:33–44, 1994.
[3] P. Simard, Y. LeCun, J. S. Denker, and B. Victorri. Transformation invariance in pattern recognition – tangent distance and tangent propagation. In Neural Networks: Tricks of the Trade, pages 239–274, 1996.
[4] O. Chapelle, B. Scholkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, 2006.
[5] S. Ben-David, T. Lu, and D. Pal. Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning. In COLT, 2008.
[6] T. Zhang and F. J. Oles. A probability analysis on the value of unlabeled data for classification problems. In ICML, 2000.
[7] O. Bousquet, O. Chapelle, and M. Hein. Measure based regularization. In NIPS, 2003.
[8] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, 1999.
[9] A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In ICML, 2001.
[10] M. Belkin, P. Niyogi, and V. Sindhwani. On manifold regularization. In AI-Stats, 2005.
[11] X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, 2003.
[12] D. Zhou and B. Scholkopf. Discrete Regularization, pages 221–232. MIT Press, 2006.
[13] H. Xu, C. Caramanis, and S. Mannor. Robustness and regularization of support vector machines. Journal of Machine Learning Research, 10:3589–3646, 2009.
[14] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton University Press, 2008.
[15] C. Bhattacharyya, K. S. Pannagadatta, and A. J. Smola. A second order cone programming formulation for classifying missing data. In NIPS, 2005.
[16] A. Globerson and S. Roweis. Nightmare at test time: Robust learning by feature deletion. In ICML, 2006.
[17] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. Adversarial classification. In KDD, 2004.
[18] T. Graepel and R. Herbrich. Invariant pattern recognition by semidefinite programming machines. In NIPS, 2004.
[19] C. H. Teo, A. Globerson, S. Roweis, and A. Smola. Convex learning with invariances. In NIPS, 2007.
[20] D. DeCoste and B. Scholkopf. Training invariant support vector machines. Machine Learning, 46:161–190, 2002.
[21] O. Chapelle and B. Scholkopf. Incorporating invariances in nonlinear support vector machines. In NIPS, 2001.
[22] G. Wahba. An introduction to model building with reproducing kernel Hilbert spaces. Technical Report TR 1020, University of Wisconsin-Madison, 2000.
[23] C. Walder and O. Chapelle. Learning with transformation invariant kernels. In NIPS, 2007.
[24] C. Burges. Geometry and invariance in kernel based methods. In B. Scholkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 89–116. MIT Press, 1999.
[25] B. Scholkopf and A. Smola. Learning with Kernels. MIT Press, 2001.
[26] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge, UK, 2000.
[27] I. Steinwart and A. Christmann. Support Vector Machines. Information Science and Statistics. Springer, 2008.
[28] E. Kreyszig. Introductory Functional Analysis with Applications. Wiley, 1989.
[29] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, 2007.
[30] C. H. Teo, S. V. N. Vishwanathan, A. J. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. Journal of Machine Learning Research, 11:311–365, January 2010.
[31] Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012.
[32] O. Chapelle. Training a support vector machine in the primal. Neural Comput., 19(5):1155–1178, 2007.
[33] http://www.cs.ubc.ca/~pcarbo/lbfgsb-for-matlab.html.
[34] K. Bache and M. Lichman. UCI machine learning repository, 2013. University of California, Irvine.
[35] http://www.dii.unisi.it/~melacci/lapsvmp.
[36] V. Sindhwani, P. Niyogi, and M. Belkin. Beyond the point cloud: from transductive to semi-supervised learning. In ICML, 2005.
[37] http://www.cs.nyu.edu/~roweis/data.html.
[38] G. Wu and E. Y. Chang. Adaptive feature-space conformal transformation for imbalanced-data learning. In ICML, 2003.
-----1
[1] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, in Proc. 21st ICML, ACM, 2004.
[2] C. Cortes, M. Mohri, and A. Rostamizadeh, Generalization bounds for learning kernels, in Proceedings, 27th ICML, pp. 247–254, 2010.
[3] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, lp-norm multiple kernel learning, Journal of Machine Learning Research, vol. 12, pp. 953–997, Mar 2011.
[4] G. Lanckriet, N. Cristianini, L. E. Ghaoui, P. Bartlett, and M. I. Jordan, Learning the kernel matrix with semi-definite programming, JMLR, vol. 5, pp. 27–72, 2004.
[5] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, SimpleMKL, J. Mach. Learn. Res., vol. 9, pp. 2491–2521, 2008.
[6] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, Large scale multiple kernel learning, Journal of Machine Learning Research, vol. 7, pp. 1531–1565, July 2006.
[7] P. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, vol. 3, pp. 463–482, Nov. 2002.
[8] V. Koltchinskii and D. Panchenko, Empirical margin distributions and bounding the generalization error of combined classifiers, Annals of Statistics, vol. 30, pp. 1–50, 2002.
[9] N. Srebro and S. Ben-David, Learning bounds for support vector machines with learned kernels, in Proc. 19th COLT, pp. 169–183, 2006.
[10] Y. Ying and C. Campbell, Generalization bounds for learning the kernel problem, in COLT, 2009.
[11] P. L. Bartlett, O. Bousquet, and S. Mendelson, Local Rademacher complexities, Ann. Stat., vol. 33, no. 4, pp. 1497–1537, 2005.
[12] V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization, Annals of Statistics, vol. 34, no. 6, pp. 2593–2656, 2006.
[13] S. Mendelson, On the performance of kernel classes, J. Mach. Learn. Res., vol. 4, pp. 759–771, December 2003.
[14] B. Scholkopf and A. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[15] P. D. Tao and L. T. H. An, A DC optimization algorithm for solving the trust-region subproblem, SIAM Journal on Optimization, vol. 8, no. 2, pp. 476–505, 1998.
[16] A. L. Yuille and A. Rangarajan, The concave-convex procedure, Neural Computation, vol. 15, pp. 915–936, Apr. 2003.
[17] C. Cortes and V. Vapnik, Support vector networks, Machine Learning, vol. 20, pp. 273–297, 1995.
[18] B. Boser, I. Guyon, and V. Vapnik, A training algorithm for optimal margin classifiers, in Proc. 5th Annual ACM Workshop on Computational Learning Theory (D. Haussler, ed.), pp. 144–152, 1992.
[19] M. Kloft and G. Blanchard, On the convergence rate of lp-norm multiple kernel learning, Journal of Machine Learning Research, vol. 13, pp. 2465–2502, Aug 2012.
[20] V. Koltchinskii and M. Yuan, Sparsity in multiple kernel learning, Ann. Stat., vol. 38, no. 6, pp. 3660–3695, 2010.
[21] T. Suzuki, Unifying framework for fast learning rate of non-sparse multiple kernel learning, in Advances in Neural Information Processing Systems 24, pp. 1575–1583, 2011.
[22] P. Gehler and S. Nowozin, On feature combination for multiclass object classification, in International Conference on Computer Vision, pp. 221–228, 2009.
[23] S. Sonnenburg, A. Zien, and G. Ratsch, ARTS: Accurate recognition of transcription starts in human, Bioinformatics, vol. 22, no. 14, pp. e472–e480, 2006.
[24] T. Abeel, Y. V. de Peer, and Y. Saeys, Towards a gold standard for promoter prediction evaluation, Bioinformatics, 2009.
[25] S. Sonnenburg, G. Ratsch, S. Henschel, C. Widmer, J. Behr, A. Zien, F. de Bona, A. Binder, C. Gehl, and V. Franc, The SHOGUN Machine Learning Toolbox, J. Mach. Learn. Res., 2010.
[26] F. Orabona and L. Jie, Ultra-fast optimization algorithm for sparse multi kernel learning, in Proceedings of the 28th International Conference on Machine Learning, 2011.
[27] A. Zien and C. S. Ong, Multiclass multiple kernel learning, in ICML 24, pp. 1191–1198, ACM, 2007.
[28] T. Damoulas and M. A. Girolami, Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection, Bioinformatics, vol. 24, no. 10, pp. 1264–1270, 2008.
[29] P. Bartlett and S. Mendelson, Empirical minimization, Probab. Theory Related Fields, vol. 135(3), pp. 311–334, 2006.
[30] A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning, Ann. Stat., vol. 32, pp. 135–166, 2004.
-----1
[1] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. JMLR, 7:2399–2434, 2006.
[2] S. Bickel, M. Bruckner, and T. Scheffer. Discriminative learning for differing training and test distributions. In ICML, 2007.
[3] E. De Vito, L. Rosasco, A. Caponnetto, U. De Giovannini, and F. Odone. Learning from examples as an inverse problem. JMLR, 6:883, 2006.
[4] E. De Vito, L. Rosasco, and A. Toigo. Spectral regularization for support estimation. In NIPS, pages 487–495, 2010.
[5] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems. Springer, 1996.
[6] A. Gretton, A. Smola, J. Huang, M. Schmittfull, K. Borgwardt, and B. Scholkopf. Covariate shift by kernel mean matching. Dataset shift in machine learning, pages 131–160, 2009.
[7] S. Grunewalder, A. Gretton, and J. Shawe-Taylor. Smooth operators. In ICML, 2013.
[8] S. Grunewalder, G. Lever, L. Baldassarre, S. Patterson, A. Gretton, and M. Pontil. Conditional mean embeddings as regressors. In ICML, 2012.
[9] J. Huang, A. Gretton, K. M. Borgwardt, B. Scholkopf, and A. Smola. Correcting sample selection bias by unlabeled data. In NIPS, pages 601–608, 2006.
[10] T. Kanamori, S. Hido, and M. Sugiyama. A least-squares approach to direct importance estimation. JMLR, 10:1391–1445, 2009.
[11] J. S. Kim and C. Scott. Robust kernel density estimation. In ICASSP, pages 3381–3384, 2008.
[12] S. Mukherjee and V. Vapnik. Support vector method for multivariate density estimation. In Center for Biological and Computational Learning. Department of Brain and Cognitive Sciences, MIT. CBCL, volume 170, 1999.
[13] R. M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.
[14] X. Nguyen, M. J. Wainwright, and M. I. Jordan. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization. NIPS, 20:1089–1096, 2008.
[15] S. J. Pan and Q. Yang. A survey on transfer learning. Knowledge and Data Engineering, IEEE Transactions on, 22(10):1345–1359, 2010.
[16] B. Scholkopf and A. J. Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT press, 2001.
[17] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge university press, 2004.
[18] T. Shi, M. Belkin, and B. Yu. Data spectroscopy: Eigenspaces of convolution operators and clustering. The Annals of Statistics, 37(6B):3960–3984, 2009.
[19] H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227–244, 2000.
[20] A. J. Smola and B. Scholkopf. On a kernel-based method for pattern recognition, regression, approximation, and operator inversion. Algorithmica, 22(1):211–231, 1998.
[21] I. Steinwart and A. Christmann. Support vector machines. Springer, 2008.
[22] M. Sugiyama, M. Krauledat, and K. Muller. Covariate shift adaptation by importance weighted cross validation. JMLR, 8:985–1005, 2007.
[23] Masashi Sugiyama, Shinichi Nakajima, Hisashi Kashima, Paul Von Buenau, and Motoaki Kawanabe. Direct importance estimation with model selection and its application to covariate shift adaptation. NIPS, 20:1433–1440, 2008.
[24] A. Tsybakov. Introduction to nonparametric estimation. Springer, 2009.
[25] C. Williams and M. Seeger. The effect of the input density distribution on kernel-based classifiers. In ICML, 2000.
[26] Y. Yu and C. Szepesvari. Analysis of kernel mean matching under covariate shift. In ICML, 2012.
[27] B. Zadrozny. Learning and evaluating classifiers under sample selection bias. In ICML, 2004.
-----1
[1] Y. Ben-Haim and E. Tom-Tov. A streaming parallel decision tree algorithm. Journal of Machine Learning Research, 11:849–872, 2010.
[2] A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbors. ICML, 2006.
[3] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.
[4] K. Clarkson. Nearest-neighbor searching and metric space dimensions. Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, 2005.
[5] P. Domingos and G. Hulten. Mining high-speed data streams. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 71–80, 2000.
[6] H. Gu and J. Lafferty. Sequential nonparametric regression. ICML, 2012.
[7] L. Gyorfi, M. Kohler, A. Krzyzak, and H. Walk. A Distribution Free Theory of Nonparametric Regression. Springer, New York, NY, 2002.
[8] A. Kalai and S. Vempala. Efficient algorithms for universal portfolios. Journal of Machine Learning Research, 3:423–440, 2002.
[9] S. Kpotufe. k-NN Regression Adapts to Local Intrinsic Dimension. NIPS, 2011.
[10] R. Krauthgamer and J. R. Lee. Navigating nets: simple algorithms for proximity search. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, SODA '04, pages 798–807, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics.
[11] B. Pfahringer, G. Holmes, and R. Kirkby. Handling numeric attributes in Hoeffding trees. In Advances in Knowledge Discovery and Data Mining: Proceedings of the 12th Pacific-Asia Conference (PAKDD), volume 5012, pages 296–307. Springer, 2008.
[12] S. Schaal and C. Atkeson. Robot Juggling: An Implementation of Memory-based Learning. Control Systems Magazine, IEEE, 1994.
[13] C. J. Stone. Optimal rates of convergence for non-parametric estimators. Ann. Statist., 8:1348–1360, 1980.
[14] C. J. Stone. Optimal global rates of convergence for non-parametric estimators. Ann. Statist., 10:1340–1353, 1982.
[15] M. A. Taddy, R. B. Gramacy, and N. G. Polson. Dynamic trees for learning and design. Journal of the American Statistical Association, 106(493), 2011.
[16] S. Vijayakumar and S. Schaal. Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 1079–1086, 2000.
-----1
[1] P.D. Allison. Missing data series: Quantitative applications in the social sciences, 2002.
[2] T. Bu, N. Duffield, F.L. Presti, and D. Towsley. Network tomography on general topologies. In ACM SIGMETRICS Performance Evaluation Review, volume 30, pages 21–30. ACM, 2002.
[3] E.R. Buhi, P. Goodson, and T.B. Neilands. Out of sight, not out of mind: strategies for handling missing data. American journal of health behavior, 32:83–92, 2008.
[4] R.M. Daniel, M.G. Kenward, S.N. Cousens, and B.L. De Stavola. Using causal diagrams to guide analysis in missing data problems. Statistical Methods in Medical Research, 21(3):243–256, 2012.
[5] R. Dechter, I. Meiri, and J. Pearl. Temporal constraint networks. Artificial intelligence, 1991.
[6] C.K. Enders. Applied Missing Data Analysis. Guilford Press, 2010.
[7] U.M. Fayyad. Data mining and knowledge discovery: Making sense out of data. IEEE expert, 11(5):20–25, 1996.
[8] F. M. Garcia. Definition and diagnosis of problematic attrition in randomized controlled experiments. Working paper, April 2013. Available at SSRN: http://ssrn.com/abstract=2267120.
[9] R.D. Gill and J.M. Robins. Sequential models for coarsening and missingness. In Proceedings of the First Seattle Symposium in Biostatistics, pages 295–305. Springer, 1997.
[10] R.D. Gill, M.J. Van Der Laan, and J.M. Robins. Coarsening at random: Characterizations, conjectures, counter-examples. In Proceedings of the First Seattle Symposium in Biostatistics, pages 255–294. Springer, 1997.
[11] J.W. Graham. Missing Data: Analysis and Design (Statistics for Social and Behavioral Sciences). Springer, 2012.
[12] D.F. Heitjan and D.B. Rubin. Ignorability and coarse data. The Annals of Statistics, pages 2244–2253, 1991.
[13] R.J.A. Little and D.B. Rubin. Statistical analysis with missing data. Wiley, 2002.
[14] B.M. Marlin and R.S. Zemel. Collaborative prediction and ranking with non-random missing data. In Proceedings of the third ACM conference on Recommender systems, pages 5–12. ACM, 2009.
[15] B.M. Marlin, R.S. Zemel, S. Roweis, and M. Slaney. Collaborative filtering and the missing at random assumption. In UAI, 2007.
[16] B.M. Marlin, R.S. Zemel, S.T. Roweis, and M. Slaney. Recommender systems: missing data and statistical model estimation. In IJCAI, 2011.
[17] P.E. McKnight, K.M. McKnight, S. Sidani, and A.J. Figueredo. Missing data: A gentle introduction. Guilford Press, 2007.
[18] Harvey J Miller and Jiawei Han. Geographic data mining and knowledge discovery. CRC, 2009.
[19] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
[20] J. Pearl. Causality: models, reasoning and inference. Cambridge Univ Press, New York, 2009.
[21] J. Pearl and K. Mohan. Recoverability and testability of missing data: Introduction and summary of results. Technical Report R-417, UCLA, 2013. Available at http://ftp.cs.ucla.edu/pub/stat_ser/r417.pdf.
[22] C.Y.J. Peng, M. Harwell, S.M. Liou, and L.H. Ehman. Advances in missing data methods and implications for educational research. Real data analysis, pages 31–78, 2006.
[23] J.L. Peugh and C.K. Enders. Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of educational research, 74(4):525–556, 2004.
[24] D.B. Rubin. Inference and missing data. Biometrika, 63:581–592, 1976.
[25] D.B. Rubin. Multiple Imputation for Nonresponse in Surveys. Wiley Online Library, New York, NY, 1987.
[26] D.B. Rubin. Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434):473–489, 1996.
[27] J.L. Schafer and J.W. Graham. Missing data: our view of the state of the art. Psychological Methods, 7(2):147–177, 2002.
[28] F. Thoemmes and N. Rose. Selection of auxiliary variables in missing data problems: Not all auxiliary variables are created equal. Technical Report R-002, Cornell University, 2013.
[29] M.J. Van der Laan and J.M. Robins. Unified methods for censored longitudinal data and causality. Springer Verlag, 2003.
[30] W. Wothke. Longitudinal and multigroup modeling with missing data. Lawrence Erlbaum Associates Publishers, 2000.
-----1
[1] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
[2] B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855, 1992.
[3] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In Adv. NIPS, 2008.
[4] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Proc. ICML, 2007.
[5] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.
[6] F. Bach and E. Moulines. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Adv. NIPS, 2011.
[7] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2010.
[8] A. S. Nemirovsky and D. B. Yudin. Problem complexity and method efficiency in optimization. Wiley & Sons, 1983.
[9] Y. Nesterov. Introductory lectures on convex optimization. Kluwer, 2004.
[10] G. Lan. An optimal method for stochastic composite optimization. Mathematical Programming, 133(1-2):365–397, 2012.
[11] L. Gyorfi and H. Walk. On the averaged stochastic approximation for linear regression. SIAM Journal on Control and Optimization, 34(1):31–61, 1996.
[12] H. J. Kushner and G. G. Yin. Stochastic approximation and recursive algorithms and applications. Springer-Verlag, second edition, 2003.
[13] C. Gu. Smoothing spline ANOVA models. Springer, 2002.
[14] R. Aguech, E. Moulines, and P. Priouret. On a perturbation approach for the analysis of stochastic tracking algorithms. SIAM J. Control and Optimization, 39(3):872–899, 2000.
[15] F. Bach and E. Moulines. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n). Technical Report 00831977, HAL, 2013.
[16] A. B. Tsybakov. Optimal rates of aggregation. In Proc. COLT, 2003.
[17] O. Macchi. Adaptive processing: The least mean squares approach with applications in transmission. Wiley West Sussex, 1995.
[18] S. Meyn and R. Tweedie. Markov Chains and Stochastic Stability. Cambridge U. P., 2009.
[19] A. Hyvarinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural computation, 9(7):1483–1492, 1997.
[20] N.J. Bershad. Analysis of the normalized LMS algorithm with Gaussian inputs. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(4):793–806, 1986.
[21] A. Nedic and D. Bertsekas. Convergence rate of incremental subgradient algorithms. Stochastic Optimization: Algorithms and Applications, pages 263–304, 2000.
[22] F. Bach. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression. Technical Report 00804431-v2, HAL, 2013.
[23] V. S. Borkar. Stochastic approximation with two time scales. Systems & Control Letters, 29(5):291–294, 1997.
[24] A. W. Van der Vaart. Asymptotic statistics, volume 3. Cambridge Univ. Press, 2000.
[25] F. Bach. Self-concordant analysis for logistic regression. Electronic Journal of Statistics, 4:384–414, 2010.
[26] E. Hazan and S. Kale. Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. In Proc. COLT, 2011.
[27] M. Schmidt, N. Le Roux, and F. Bach. Minimizing finite sums with the stochastic average gradient. Technical Report 00860051, HAL, 2013.
-----1
[1] Subhayu Basu, Yoram Gerchman, Cynthia H. Collins, Frances H. Arnold, and Ron Weiss. A synthetic multicellular system for programmed pattern formation. Nature, 434:1130–1134, 2005.
[2] Christopher M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, 2006.
[3] Ho-Lin Chen, David Doty, and David Soloveichik. Deterministic function computation with chemical reaction networks. In Darko Stefanovic and Andrew Turberfield, editors, DNA Computing and Molecular Programming, volume 7433 of Lecture Notes in Computer Science, pages 25–42. Springer Berlin Heidelberg, 2012.
[4] Tal Danino, Octavio Mondragon-Palomino, Lev Tsimring, and Jeff Hasty. A synchronized quorum of genetic clocks. Nature, 463:326–330, 2010.
[5] Michael B. Elowitz and Stanislas Leibler. A synthetic oscillatory network of transcriptional regulators. Nature, 403:335–338, 2000.
[6] A Hjelmfelt, E D Weinberger, and J Ross. Chemical implementation of neural networks and turing machines. Proceedings of the National Academy of Sciences, 88(24):10983–10987, 1991.
[7] Jongmin Kim, John J. Hopfield, and Erik Winfree. Neural network computation by in vitro transcriptional circuits. In Advances in Neural Information Processing Systems 17 (NIPS 2004). MIT Press, 2004.
[8] Chengde Mao, Thomas H. LaBean, John H. Reif, and Nadrian C. Seeman. Logical computation using algorithmic self-assembly of dna triple-crossover molecules. Nature, 407:493–496, 2000.
[9] K. Oishi and E. Klavins. Biomolecular implementation of linear i/o systems. Systems Biology, IET, 5(4):252–260, 2011.
[10] Lulu Qian, Erik Winfree, and Jehoshua Bruck. Neural network computation with dna strand displacement cascades. Nature, 475:368–372, 2011.
[11] Paul W. K. Rothemund, Nick Papadakis, and Erik Winfree. Algorithmic self-assembly of dna sierpinski triangles. PLoS Biol, 2(12):e424, 12 2004.
[12] Georg Seelig, David Soloveichik, David Yu Zhang, and Erik Winfree. Enzyme-free nucleic acid logic circuits. Science, 314(5805):1585–1588, 2006.
[13] David Soloveichik, Georg Seelig, and Erik Winfree. Dna as a universal substrate for chemical kinetics. Proceedings of the National Academy of Sciences, 107(12):5393–5398, 2010.
[14] Benjamin Vigoda. Analog Logic: Continuous-Time Analog Circuits for Statistical Signal Processing. PhD thesis, Massachusetts Institute of Technology, 2003.
[15] Jonathan S. Yedidia, W.T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. Information Theory, IEEE Transactions on, 51(7):2282–2312, 2005.
[16] David Yu Zhang and Georg Seelig. Dna-based fixed gain amplifiers and linear classifier circuits. In Yasubumi Sakakibara and Yongli Mi, editors, DNA Computing and Molecular Programming, volume 6518 of Lecture Notes in Computer Science, pages 176–186. Springer Berlin Heidelberg, 2011.
-----1
[1] H. Abelson. Lower bounds on information transfer in distributed computations. Journal of the ACM, 27(2):384–392, 1980.
[2] M.-F. Balcan, A. Blum, S. Fine, and Y. Mansour. Distributed learning, communication complexity and privacy. In Proceedings of the Twenty Fifth Annual Conference on Computational Learning Theory, 2012. URL http://arxiv.org/abs/1204.3514.
[3] K. Ball. An elementary introduction to modern convex geometry. In S. Levy, editor, Flavors of Geometry, pages 1–58. MSRI Publications, 1997.
[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 2011.
[6] T. M. Cover and J. A. Thomas. Elements of Information Theory, Second Edition. Wiley, 2006.
[7] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research, 13:165–202, 2012.
[8] J. C. Duchi and M. J. Wainwright. Distance-based and continuum fano inequalities with applications to statistical estimation. arXiv [cs.IT], to appear, 2013.
[9] J. C. Duchi, A. Agarwal, and M. J. Wainwright. Dual averaging for distributed optimization: convergence analysis and network scaling. IEEE Transactions on Automatic Control, 57(3):592–606, 2012.
[10] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. arXiv:1302.3203 [math.ST], 2013. URL http://arXiv.org/abs/1302.3203.
[11] S. Han and S. Amari. Statistical inference under multiterminal data compression. IEEE Transactions on Information Theory, 44(6):2300–2324, 1998.
[12] I. A. Ibragimov and R. Z. Hasminskii. Statistical Estimation: Asymptotic Theory. Springer-Verlag, 1981.
[13] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
[14] E. L. Lehmann and G. Casella. Theory of Point Estimation, Second Edition. Springer, 1998.
[15] Z.-Q. Luo. Universal decentralized estimation in a bandwidth constrained sensor network. IEEE Transactions on Information Theory, 51(6):2210–2219, 2005.
[16] Z.-Q. Luo and J. N. Tsitsiklis. Data fusion with minimal communication. IEEE Transactions on Information Theory, 40(5):1551–1563, 1994.
[17] R. McDonald, K. Hall, and G. Mann. Distributed training strategies for the structured perceptron. In North American Chapter of the Association for Computational Linguistics (NAACL), 2010.
[18] J. N. Tsitsiklis. Decentralized detection. In Advances in Signal Processing, Vol. 2, pages 297–344. JAI Press, 1993.
[19] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.
[20] Y. Yang and A. Barron. Information-theoretic determination of minimax rates of convergence. Annals of Statistics, 27(5):1564–1599, 1999.
[21] A. C.-C. Yao. Some complexity questions related to distributive computing (preliminary report). In Proceedings of the Eleventh Annual ACM Symposium on the Theory of Computing, pages 209–213. ACM, 1979.
[22] B. Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer-Verlag, 1997.
[23] Y. Zhang, J. C. Duchi, and M. J. Wainwright. Communication-efficient algorithms for statistical optimization. In Advances in Neural Information Processing Systems 26, 2012.
-----1
[1] John Langford and John Shawe-Taylor. PAC-Bayes & margins. In Advances in Neural Information Processing Systems (NIPS), 2002.
[2] David McAllester. PAC-Bayesian stochastic model selection. Machine Learning, 51(1), 2003.
[3] John Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 2005.
[4] Yevgeny Seldin and Naftali Tishby. PAC-Bayesian analysis of co-clustering and beyond. Journal of Machine Learning Research, 11, 2010.
[5] Matthew Higgs and John Shawe-Taylor. A PAC-Bayes bound for tailored density estimation. In Proceedings of the International Conference on Algorithmic Learning Theory (ALT), 2010.
[6] Yevgeny Seldin, Peter Auer, Francois Laviolette, John Shawe-Taylor, and Ronald Ortner. PAC-Bayesian analysis of contextual bandits. In Advances in Neural Information Processing Systems (NIPS), 2011.
[7] Matthias Seeger. PAC-Bayesian generalization error bounds for Gaussian process classification. Journal of Machine Learning Research, 2002.
[8] Yevgeny Seldin, Francois Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, and Peter Auer. PAC-Bayesian inequalities for martingales. IEEE Transactions on Information Theory, 58, 2012.
[9] Andreas Maurer and Massimiliano Pontil. Empirical Bernstein bounds and sample variance penalization. In Proceedings of the International Conference on Computational Learning Theory (COLT), 2009.
[10] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.
[11] Andreas Maurer. A note on the PAC-Bayesian theorem. www.arxiv.org, 2004.
[12] David McAllester. Some PAC-Bayesian theorems. Machine Learning, 37, 1999.
[13] A.W. Van Der Vaart. Asymptotic statistics. Cambridge University Press, 1998.
[14] Stephane Boucheron, Gabor Lugosi, and Olivier Bousquet. Concentration inequalities. In O. Bousquet, U.v. Luxburg, and G. Ratsch, editors, Advanced Lectures in Machine Learning. Springer, 2004.
[15] A. Asuncion and D.J. Newman. UCI machine learning repository, 2007. http://www.ics.uci.edu/~mlearn/MLRepository.html.
-----1
[1] A. Agarwal, S. Negahban, and M. J. Wainwright. Fast global convergence of gradient methods for high-dimensional statistical recovery. Annals of Statistics, 40(5):2452–2482, 2012.
[2] P. Breheny and J. Huang. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Annals of Applied Statistics, 5(1):232–253, 2011.
[3] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96:1348–1360, 2001.
[4] D. R. Hunter and R. Li. Variable selection using MM algorithms. Annals of Statistics, 33(4):1617–1642, 2005.
[5] E.L. Lehmann and G. Casella. Theory of Point Estimation. Springer Verlag, 1998.
[6] P. Loh and M.J. Wainwright. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Annals of Statistics, 40(3):1637–1664, 2012.
[7] P. Loh and M.J. Wainwright. Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. arXiv e-prints, May 2013. Available at http://arxiv.org/abs/1305.2436.
[8] P. McCullagh and J. A. Nelder. Generalized Linear Models (Second Edition). London: Chapman & Hall, 1989.
[9] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4):538–557, December 2012. See arXiv version for lemma/propositions cited here.
[10] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Papers 2007076, Université Catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2007.
[11] Y. Nesterov and A. Nemirovskii. Interior Point Polynomial Algorithms in Convex Programming. SIAM studies in applied and numerical mathematics. Society for Industrial and Applied Mathematics, 1987.
[12] S. A. Vavasis. Complexity issues in global optimization: A survey. In Handbook of Global Optimization, pages 27–41. Kluwer, 1995.
[13] C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2):894–942, 2010.
[14] C.-H. Zhang and T. Zhang. A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27(4):576–593, 2012.
[15] H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics, 36(4):1509–1533, 2008.
-----0
Benny Applebaum, Boaz Barak, and David Xiao. On basing lower-bounds for learning on worst-case assumptions. In Foundations of Computer Science, 2008. FOCS'08. IEEE 49th Annual IEEE Symposium on, pages 211–220. IEEE, 2008.
Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In COLT, 2013.
Nicolo Cesa-Bianchi, Alex Conconi, and Claudio Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50:2050–2057, 2001.
Venkat Chandrasekaran and Michael I. Jordan. Computational and statistical tradeoffs via convex relaxation. Proceedings of the National Academy of Sciences, 2013.
S. Decatur, O. Goldreich, and D. Ron. Computational sample complexity. SIAM Journal on Computing, 29, 1998.
O. Dubois, R. Monasson, B. Selman, and R. Zecchina (Guest Editors). Phase Transitions in Combinatorial Problems. Theoretical Computer Science, Volume 265, Numbers 1-2, 2001.
U. Feige. Relations between average case complexity and approximation complexity. In STOC, pages 534–543, 2002.
Uriel Feige and Eran Ofek. Easily refutable subformulas of large random 3cnf formulas. Theory of Computing, 3(1):25–43, 2007.
E. Hazan, S. Kale, and S. Shalev-Shwartz. Near-optimal algorithms for online matrix prediction. In COLT, 2012.
P. Long and R. Servedio. Low-weight halfspaces for sparse boolean vectors. In ITCS, 2013.
R. Servedio. Computational sample complexity and attribute-efficient learning. J. of Comput. Syst. Sci., 60(1):161–178, 2000.
Shai Shalev-Shwartz, Ohad Shamir, and Eran Tromer. Using more data to speed-up training time. In AISTATS, 2012.
V.N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
-----1
[1] Gabor Lugosi and Nicolas Vayatis. On the Bayes-risk consistency of regularized boosting methods. Annals of Statistics, 32(1):30–55, 2004.
[2] Wenxin Jiang. Process consistency for AdaBoost. Annals of Statistics, 32(1):13–29, 2004.
[3] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1):56–134, 2004.
[4] Ingo Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128–142, 2005.
[5] Peter L. Bartlett, Michael Jordan, and Jon McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
[6] Tong Zhang. Statistical analysis of some multi-category large margin classification methods. Journal of Machine Learning Research, 5:1225–1251, 2004.
[7] Ambuj Tewari and Peter L. Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research, 8:1007–1025, 2007.
[8] Ingo Steinwart. How to compare different loss functions and their risks. Constructive Approximation, 26:225–287, 2007.
[9] David Cossock and Tong Zhang. Statistical analysis of bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11):5140–5154, 2008.
[10] Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. Listwise approach to learning to rank: Theory and algorithm. In International Conference on Machine Learning, 2008.
[11] John Duchi, Lester Mackey, and Michael Jordan. On the consistency of ranking algorithms. In International Conference on Machine Learning, 2010.
[12] Pradeep Ravikumar, Ambuj Tewari, and Eunho Yang. On NDCG consistency of listwise ranking methods. In International Conference on Artificial Intelligence and Statistics, 2011.
[13] David Buffoni, Clément Calauzènes, Patrick Gallinari, and Nicolas Usunier. Learning scoring functions with order-preserving losses and standardized supervision. In International Conference on Machine Learning, 2011.
[14] Wei Gao and Zhi-Hua Zhou. On the consistency of multi-label learning. In Conference on Learning Theory, 2011.
[15] Clément Calauzènes, Nicolas Usunier, and Patrick Gallinari. On the (non-)existence of convex, calibrated surrogate losses for ranking. In Advances in Neural Information Processing Systems 25, pages 197–205. 2012.
[16] Harish G. Ramaswamy and Shivani Agarwal. Classification calibration dimension for general multiclass losses. In Advances in Neural Information Processing Systems 25, pages 2087–2095. 2012.
[17] Yanyan Lan, Jiafeng Guo, Xueqi Cheng, and Tie-Yan Liu. Statistical consistency of ranking methods in a rank-differentiable probability space. In Advances in Neural Information Processing Systems 25, pages 1241–1249. 2012.
[18] Quoc V. Le and Alex Smola. Direct optimization of ranking measures, arXiv:0704.3359, 2007.
[19] Yisong Yue, Thomas Finley, Filip Radlinski, and Thorsten Joachims. A support vector method for optimizing average precision. In Proceedings of the 30th ACM SIGIR International Conference on Research and Development in Information Retrieval, 2007.
-----1
[1] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1):56–134, 2004.
[2] P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
[3] M. D. Reid and R. C. Williamson. Surrogate regret bounds for proper losses. In ICML, 2009.
[4] C. Scott. Calibrated asymmetric surrogate losses. Electronic Journal of Statistics, 6:958–992, 2012.
[5] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.
[6] C. Cortes and M. Mohri. AUC optimization vs. error rate minimization. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
[7] S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393–425, 2005.
[8] S. Clemencon, G. Lugosi, and N. Vayatis. Ranking and empirical minimization of U-statistics. Annals of Statistics, 36:844–874, 2008.
[9] A. Buja, W. Stuetzle, and Y. Shen. Loss functions for binary class probability estimation: Structure and applications. Technical report, University of Pennsylvania, November 2005.
[10] M. D. Reid and R. C. Williamson. Composite binary losses. Journal of Machine Learning Research, 11:2387–2422, 2010.
[11] L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
[12] S. Clemencon and S. Robbiano. Minimax learning rates for bipartite ranking and plug-in rules. In Proceedings of the 28th International Conference on Machine Learning, 2011.
[13] John Langford and Bianca Zadrozny. Estimating class membership probabilities using classifier learners. In AISTATS, 2005.
[14] C. Rudin and R.E. Schapire. Margin-based ranking and an equivalence between adaboost and rankboost. Journal of Machine Learning Research, 10:2193–2232, 2009.
[15] S. Ertekin and C. Rudin. On equivalence relationships between classification and ranking algorithms. Journal of Machine Learning Research, 12:2905–2929, 2011.
[16] M. Ayer, H.D. Brunk, G. M. Ewing, W. T. Reid, and E. Silverman. An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics, 26(4):641–647, 1955.
[17] H. D. Brunk. On the estimation of parameters restricted by inequalities. The Annals of Mathematical Statistics, 29(2):437–454, 1958.
[18] B. Zadrozny and C. Elkan. Transforming classifier scores into accurate multiclass probability estimates. In KDD, 2002.
[19] A.K. Menon, X. Jiang, S. Vembu, C. Elkan, and L. Ohno-Machado. Predicting accurate probabilities with a ranking loss. In ICML, 2012.
[20] J. Hernandez-Orallo, P. Flach, and C. Ferri. A unified view of performance metrics: Translating threshold choice into expected classification loss. Journal of Machine Learning Research, 13:2813–2869, 2012.
[21] A. Guyader, N. Hengartner, N. Jegou, and E. Matzner-Løber. Iterative isotonic regression. arXiv:1303.4288, 2013.
[22] C. Drummond and R.C. Holte. Cost curves: An improved method for visualizing classifier performance. Machine Learning, 65(1):95–130, 2006.
[23] M.A. Maloof. Learning when data sets are imbalanced and when costs are unequal and unknown. In ICML-2003 Workshop on Learning from Imbalanced Data Sets II, volume 2, 2003.
[24] A.T. Kalai and R. Sastry. The isotron algorithm: High-dimensional isotonic regression. In COLT, 2009.
[25] T. Fawcett and A. Niculescu-Mizil. PAV and the ROC convex hull. Machine Learning, 68(1):97–106, 2007.
[26] S. Agarwal. Surrogate regret bounds for the area under the ROC curve via strongly proper losses. In COLT, 2013.
[27] D. Anevski and P. Soulier. Monotone spectral density estimation. Annals of Statistics, 39(1):418–438, 2011.
[28] P. Groeneboom and G. Jongbloed. Generalized continuous isotonic regression. Statistics & Probability Letters, 80(3–4):248–253, 2010.
-----1
[1] N. Alon and J. H. Spencer. The probabilistic method. John Wiley & Sons, 2004.
[2] Jean-Yves Audibert and Sebastien Bubeck. Minimax policies for adversarial and stochastic bandits. In COLT, 2009.
[3] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
[4] Peter Auer, Nicolò Cesa-Bianchi, and Claudio Gentile. Adaptive and self-confident on-line learning algorithms. J. Comput. Syst. Sci., 64(1):48–75, 2002.
[5] Y. Caro. New results on the independence number. In Tech. Report, Tel-Aviv University, 1979.
[6] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. J. ACM, 44(3):427–485, 1997.
[7] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[8] Nicolò Cesa-Bianchi and Gabor Lugosi. Combinatorial bandits. J. Comput. Syst. Sci., 78(5):1404–1422, 2012.
[9] V. Chvatal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4(3):233–235, 1979.
[10] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Euro-COLT, pages 23–37. Springer-Verlag, 1995. Also, JCSS 55(1):119–139 (1997).
[11] A. M. Frieze. On the independence number of random graphs. Discrete Mathematics, 81:171–175, 1990.
[12] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71:291–307, 2005.
[13] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108:212–261, 1994.
[14] S. Mannor and O. Shamir. From bandits to experts: On the value of side-observations. In 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), 2011.
[15] Alan Said, Ernesto W De Luca, and Sahin Albayrak. How social relationships affect user similarities. In Proceedings of the International Conference on Intelligent User Interfaces Workshop on Social Recommender Systems, Hong Kong, 2010.
[16] V. G. Vovk. Aggregating strategies. In COLT, pages 371–386, 1990.
[17] V. K. Wei. A lower bound on the stability number of a simple graph. In Bell Lab. Tech. Memo No. 81-11217-9, 1981.
-----1
[1] V. Dani, T.P. Hayes, and S.M. Kakade. Stochastic linear optimization under bandit feedback. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), pages 355–366, 2008.
[2] Y. Abbasi-Yadkori, D. Pal, and C. Szepesvari. Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems, 24, 2011.
[3] P. Rusmevichientong and J.N. Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395–411, 2010.
[4] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In Proceedings of the 40th ACM Symposium on Theory of Computing, 2008.
[5] N. Srinivas, A. Krause, S.M. Kakade, and M. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. Information Theory, IEEE Transactions on, 58(5):3250–3265, May 2012. ISSN 0018-9448. doi: 10.1109/TIT.2011.2182033.
[6] S. Filippi, O. Cappe, A. Garivier, and C. Szepesvari. Parametric bandits: The generalized linear case. Advances in Neural Information Processing Systems, 23:1–9, 2010.
[7] Y. Abbasi-Yadkori, D. Pal, and C. Szepesvari. Online-to-confidence-set conversions and application to sparse stochastic bandits. In Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
[8] T.L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6(1):4–22, 1985.
[9] Tze Leung Lai. Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics, pages 1091–1114, 1987.
[10] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47(2):235–256, 2002.
[11] O. Cappe, A. Garivier, O.-A. Maillard, R. Munos, and G. Stoltz. Kullback-Leibler upper confidence bounds for optimal sequential allocation. Submitted to the Annals of Statistics.
[12] Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. The Journal of Machine Learning Research, 99:1563–1600, 2010.
[13] Peter L Bartlett and Ambuj Tewari. Regal: A regularization based algorithm for reinforcement learning in weakly communicating mdps. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 35–42. AUAI Press, 2009.
[14] Levente Kocsis and Csaba Szepesvari. Bandit based monte-carlo planning. In Machine Learning: ECML 2006, pages 282–293. Springer, 2006.
[15] Daniel Russo and Benjamin Van Roy. Learning to optimize via posterior sampling. arXiv preprint arXiv:1301.2609, 2013.
[16] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294, 1933.
[17] S.L. Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6):639–658, 2010.
[18] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Neural Information Processing Systems (NIPS), 2011.
[19] S. Agrawal and N. Goyal. Analysis of Thompson sampling for the multi-armed bandit problem. 2012.
[20] S. Agrawal and N. Goyal. Further optimal regret bounds for Thompson sampling. arXiv preprint arXiv:1209.3353, 2012.
[21] E. Kauffmann, N. Korda, and R. Munos. Thompson sampling: an asymptotically optimal finite time analysis. In International Conference on Algorithmic Learning Theory, 2012.
[22] S. Agrawal and N. Goyal. Thompson sampling for contextual bandits with linear payoffs. arXiv preprint arXiv:1209.3352, 2012.
[23] A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R.E. Schapire. Contextual bandit algorithms with supervised learning guarantees. In Conference on Artificial Intelligence and Statistics (AISTATS), volume 15. JMLR Workshop and Conference Proceedings, 2011.
[24] K. Amin, M. Kearns, and U. Syed. Bandits, query learning, and the haystack dimension. In Proceedings of the 24th Annual Conference on Learning Theory (COLT), 2011.
[25] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. The Journal of Machine Learning Research, 3:397–422, 2003.
-----1
[1] J. Abernethy, Y. Chen, and J. Wortman Vaughan. An optimization-based framework for automated market-making. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 297–306, 2011.
[2] S. Agrawal, E. Delage, M. Peters, Z. Wang, and Y. Ye. A unified framework for dynamic prediction market design. Operations research, 59(3):550–568, 2011.
[3] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. Machine Learning, 35(3):193–205, 1999.
[4] T. Chakraborty and M. Kearns. Market making and mean reversion. In Proceedings of the 12th ACM conference on Electronic commerce, pages 307–314. ACM, 2011.
[5] Y. Chen and D. M. Pennock. A utility framework for bounded-loss market makers. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, pages 49–56, 2007.
[6] Y. Chen and J. Wortman Vaughan. A new understanding of prediction markets via no-regret learning. In Proceedings of the 11th ACM Conference on Electronic Commerce, pages 189–198, 2010.
[7] T. M. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2):348–363, 1996.
[8] S. Das. A learning market-maker in the Glosten–Milgrom model. Quantitative Finance, 5(2):169–180, 2005.
[9] S. Das. The effects of market-making on price dynamics. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 887–894, 2008.
[10] S. Das and M. Magdon-Ismail. Adapting to a market shock: Optimal sequential market-making. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems, pages 361–368, 2008.
[11] L. R. Glosten and P. R. Milgrom. Bid, ask and transaction prices in a specialist market with heterogeneously informed traders. Journal of financial economics, 14(1):71–100, 1985.
[12] R. Hanson. Combinatorial information market design. Information Systems Frontiers, 5(1):105–119, 2003.
[13] R. Hanson. Logarithmic market scoring rules for modular combinatorial information aggregation. Journal of Prediction Markets, 1(1):3–15, 2007.
[14] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Learning Theory, pages 499–513. Springer, 2006.
[15] D. P. Helmbold, R. E. Schapire, Y. Singer, and M. K. Warmuth. On-line portfolio selection using multiplicative updates. Mathematical Finance, 8(4):325–347, 1998.
[16] A. T. Kalai and S. Vempala. Efficient algorithms for universal portfolios. The Journal of Machine Learning Research, 3:423–440, 2003.
[17] A. T. Kalai and S. Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 71(3):291–307, 2005.
[18] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Inf. Comput., 108(2):212–261, 1994.
[19] N. Della Penna and M. D. Reid. Bandit market makers. arXiv preprint arXiv:1112.0076, 2011.
[20] T. M. Cover. Universal portfolios. Mathematical Finance, 1(1):1–29, January 1991.
-----1
[1] A. Atamturk and V. Narayanan. The submodular knapsack polytope. Discrete Optimization, 2009.
[2] M. Conforti and G. Cornuejols. Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Applied Mathematics, 7(3):251–274, 1984.
[3] U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 1998.
[4] S. Fujishige. Submodular functions and optimization, volume 58. Elsevier Science, 2005.
[5] J. Garofolo, F. Lamel, L., J. W., Fiscus, D. Pallet, and N. Dahlgren. Timit, acoustic-phonetic continuous speech corpus. In DARPA, 1993.
[6] M. Goemans, N. Harvey, S. Iwata, and V. Mirrokni. Approximating submodular functions everywhere. In SODA, pages 535–544, 2009.
[7] A. Guillory and J. Bilmes. Interactive submodular set cover. In ICML, 2010.
[8] A. Guillory and J. Bilmes. Simultaneous learning and covering with adversarial noise. In ICML, 2011.
[9] R. Iyer and J. Bilmes. Algorithms for approximate minimization of the difference between submodular functions, with applications. In UAI, 2012.
[10] R. Iyer and J. Bilmes. The submodular Bregman and Lovasz-Bregman divergences with applications. In NIPS, 2012.
[11] R. Iyer and J. Bilmes. Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints: Extended arxiv version, 2013.
[12] R. Iyer, S. Jegelka, and J. Bilmes. Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions. In NIPS, 2013.
[13] R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential based submodular function optimization. In ICML, 2013.
[14] S. Jegelka and J. A. Bilmes. Submodularity beyond submodular energies: coupling edges in graph cuts. In CVPR, 2011.
[15] Y. Kawahara and T. Washio. Prismatic algorithm for discrete dc programming problems. In NIPS, 2011.
[16] H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack problems. Springer Verlag, 2004.
[17] A. Krause and C. Guestrin. A note on the budgeted maximization on submodular functions. Technical Report CMU-CALD-05-103, Carnegie Mellon University, 2005.
[18] A. Krause, B. McMahan, C. Guestrin, and A. Gupta. Robust submodular observation selection. Journal of Machine Learning Research (JMLR), 9:2761–2801, 2008.
[19] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, 9:235–284, 2008.
[20] H. Lin and J. Bilmes. How to select a good training-data subset for transcription: Submodular active selection for sequences. In Interspeech, 2009.
[21] H. Lin and J. Bilmes. Multi-document summarization via budgeted maximization of submodular functions. In NAACL, 2010.
[22] H. Lin and J. Bilmes. A class of submodular functions for document summarization. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL/HLT-2011), Portland, OR, June 2011.
[23] H. Lin and J. Bilmes. Optimal selection of limited vocabulary speech corpora. In Interspeech, 2011.
[24] R. C. Moore and W. Lewis. Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, pages 220-224. Association for Computational Linguistics, 2010.
[25] M. Narasimhan and J. Bilmes. A submodular-supermodular procedure with applications to discriminative structure learning. In UAI, 2005.
[26] E. Nikolova. Approximation algorithms for offline risk-averse combinatorial optimization, 2010.
[27] J. Rousu and J. Shawe-Taylor. Efficient computation of gapped substring kernels on large alphabets. Journal of Machine Learning Research, 6(2):1323, 2006.
[28] M. Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41-43, 2004.
[29] L. A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385-393, 1982.
-----1
[1] J. Abernethy, R. M. Frongillo, and A. Wibisono. Minimax option pricing meets Black-Scholes in the limit. In Howard J. Karloff and Toniann Pitassi, editors, STOC, pages 1029-1040. ACM, 2012.
[2] F. Black and M. Scholes. The pricing of options and corporate liabilities. The Journal of Political Economy, pages 637-654, 1973.
[3] P. DeMarzo, I. Kremer, and Y. Mansour. Online trading algorithms and robust option pricing. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pages 477-486. ACM, 2006.
[4] R. Durrett. Probability: Theory and Examples (Fourth Edition). Cambridge University Press, 2010.
[5] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational learning theory, pages 23-37. Springer, 1995.
[6] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212-261, 1994.
[7] G. Shafer and V. Vovk. Probability and Finance: It's Only a Game!, volume 373. Wiley-Interscience, 2001.
[8] J. M. Steele. Stochastic Calculus and Financial Applications, volume 45. Springer Verlag, 2001.
-----1
[1] C. E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2(6):1152-1174, 1974.
[2] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705-1749, 2005.
[3] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In Advances in neural information processing systems, 2002.
[4] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[5] T. Broderick, B. Kulis, and M. I. Jordan. MAD-Bayes: MAP-based asymptotic derivations from Bayes. In Proceedings of the 30th International Conference on Machine Learning, 2013.
[6] J. V. Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[7] K. Jiang, B. Kulis, and M. I. Jordan. Small-variance asymptotics for exponential family Dirichlet process mixture models. In Advances in Neural Information Processing Systems, 2012.
[8] B. Kulis and M. I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.
[9] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.
[10] S. Roweis. EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems, 1998.
[11] E. Sudderth. Toward reliable Bayesian nonparametric learning. In NIPS Workshop on Bayesian Nonparametric Models for Reliable Planning and Decision-Making Under Uncertainty, 2012.
[12] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.
[13] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of Royal Statistical Society, Series B, 21(3):611-622, 1999.
[14] S. Tong and D. Koller. Restricted Bayes optimal classifiers. In Proc. 17th AAAI Conference, 2000.
-----1
[1] Y. Huang, Q. Liu, and D. Metaxas. Video object segmentation by hypergraph cut. In CVPR, pages 1738-1745, 2009.
[2] P. Ochs and T. Brox. Higher order motion models and spectral clustering. In CVPR, pages 614-621, 2012.
[3] S. Klamt, U.-U. Haus, and F. Theis. Hypergraphs and cellular networks. PLoS Computational Biology, 5:e1000385, 2009.
[4] Z. Tian, T. Hwang, and R. Kuang. A hypergraph-based learning algorithm for classifying gene expression and arrayCGH data with prior knowledge. Bioinformatics, 25:2831-2838, 2009.
[5] D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical data: an approach based on dynamical systems. VLDB Journal, 8:222-236, 2000.
[6] J. Bu, S. Tan, C. Chen, C. Wang, H. Wu, L. Zhang, and X. He. Music recommendation by unified hypergraph: Combining social media information and music content. In Proc. of the Int. Conf. on Multimedia (MM), pages 391-400, 2010.
[7] A. Shashua, R. Zass, and T. Hazan. Multi-way clustering using super-symmetric non-negative tensor factorization. In ECCV, pages 595-608, 2006.
[8] S. Rota Bulo and M. Pelillo. A game-theoretic approach to hypergraph clustering. In NIPS, pages 1571-1579, 2009.
[9] M. Leordeanu and C. Sminchisescu. Efficient hypergraph clustering. In AISTATS, pages 676-684, 2012.
[10] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. J. Kriegman, and S. Belongie. Beyond pairwise clustering. In CVPR, pages 838-845, 2005.
[11] D. Zhou, J. Huang, and B. Scholkopf. Learning with hypergraphs: Clustering, classification, and embedding. In NIPS, pages 1601-1608, 2006.
[12] S. Agarwal, K. Branson, and S. Belongie. Higher order learning with graphs. In ICML, pages 17-24, 2006.
[13] E. Ihler, D. Wagner, and F. Wagner. Modeling hypergraphs by graphs with the same mincut properties. Information Processing Letters, 45:171-175, 1993.
[14] M. Hein and T. Buhler. An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In NIPS, pages 847-855, 2010.
[15] A. Szlam and X. Bresson. Total variation and Cheeger cuts. In ICML, pages 1039-1046, 2010.
[16] M. Hein and S. Setzer. Beyond spectral clustering - tight relaxations of balanced graph cuts. In NIPS, pages 2366-2374, 2011.
[17] T. Buhler, S. Rangapuram, S. Setzer, and M. Hein. Constrained fractional set programs and their application in local clustering and community detection. In ICML, pages 624-632, 2013.
[18] F. Bach. Learning with submodular functions: A convex optimization perspective. CoRR, abs/1111.6453, 2011.
[19] M. Belkin and P. Niyogi. Semi-supervised learning on manifolds. Machine Learning, 56:209-239, 2004.
[20] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Scholkopf. Learning with local and global consistency. In NIPS, volume 16, pages 321-328, 2004.
[21] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Patt. Anal. Mach. Intell., 22(8):888-905, 2000.
[22] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17:395-416, 2007.
[23] E. Esser, X. Zhang, and T. F. Chan. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4):1015-1046, 2010.
[24] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. of Math. Imaging and Vision, 40:120-145, 2011.
[25] L. Condat. A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optimization Theory and Applications, 158(2):460-479, 2013.
[26] K. Kiwiel. On Linear-Time algorithms for the continuous quadratic knapsack problem. J. Opt. Theory Appl., 134(3):549-554, 2007.
-----1
[1] Mikhail Belkin and Kaushik Sinha, Polynomial learning of distribution families, Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, IEEE, 2010, pp. 103-112.
[2] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira, Analysis of representations for domain adaptation, Advances in neural information processing systems 19 (2007), 137.
[3] David M Blei, Andrew Y Ng, and Michael I Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3 (2003), 993-1022.
[4] Kamalika Chaudhuri and Satish Rao, Learning mixtures of product distributions using correlations and independence, Proc. of COLT, 2008.
[5] Sanjoy Dasgupta, Learning mixtures of Gaussians, Foundations of Computer Science, 1999. 40th Annual Symposium on, IEEE, 1999, pp. 634-644.
[6] Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant, Efficiently learning mixtures of two Gaussians, Proceedings of the 42nd ACM symposium on Theory of computing, ACM, 2010, pp. 553-562.
[7] Ravindran Kannan, Hadi Salmasian, and Santosh Vempala, The spectral method for general mixture models, Learning Theory, Springer, 2005, pp. 444-457.
[8] Ankur Moitra and Gregory Valiant, Settling the polynomial learnability of mixtures of Gaussians, Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, IEEE, 2010, pp. 93-102.
[9] Christos H Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala, Latent semantic indexing: A probabilistic analysis, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, ACM, 1998, pp. 159-168.
[10] Sanjeev Arora and Ravi Kannan, Learning mixtures of arbitrary Gaussians, Proceedings of the thirty-third annual ACM symposium on Theory of computing, ACM, 2001, pp. 247-257.
[11] Santosh Vempala and Grant Wang, A spectral algorithm for learning mixtures of distributions, Foundations of Computer Science, 2002. Proceedings. The 43rd Annual IEEE Symposium on, IEEE, 2002, pp. 113-122.
-----1
[1] P. Abbeel and A.Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proc. ICML, 2004.
[2] R. H. Affandi, A. Kulesza, and E. B. Fox. Markov determinantal point processes. In Proc. UAI, 2012.
[3] R.H. Affandi, A. Kulesza, E.B. Fox, and B. Taskar. Nystrom approximation for large-scale determinantal processes. In Proc. AISTATS, 2013.
[4] R. A. Bernstein and M. Gobbel. Partitioning of space in communities of ants. Journal of Animal Ecology, 48(3):931-942, 1979.
[5] A. Borodin and E.M. Rains. Eynard-Mehta theorem, Schur process, and their Pfaffian analogs. Journal of statistical physics, 121(3):291-317, 2005.
[6] CMU. Carnegie Mellon University graphics lab motion capture database. http://mocap.cs.cmu.edu/, 2009.
[7] D.J. Daley and D. Vere-Jones. An introduction to the theory of point processes: Volume I: Elementary theory and methods. Springer, 2003.
[8] G.E. Fasshauer and M.J. McCourt. Stable evaluation of Gaussian radial basis function interpolants. SIAM Journal on Scientific Computing, 34(2):737-762, 2012.
[9] J. Gillenwater, A. Kulesza, and B. Taskar. Discovering diverse and salient threads in document collections. In Proc. EMNLP, 2012.
[10] J.B. Hough, M. Krishnapur, Y. Peres, and B. Virag. Determinantal processes and independence. Probability Surveys, 3:206-229, 2006.
[11] A. Kulesza and B. Taskar. Structured determinantal point processes. In Proc. NIPS, 2010.
[12] A. Kulesza and B. Taskar. k-DPPs: Fixed-size determinantal point processes. In ICML, 2011.
[13] A. Kulesza and B. Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3), 2012.
[14] F. Lavancier, J. Møller, and E. Rubak. Statistical aspects of determinantal point processes. arXiv preprint arXiv:1205.4818, 2012.
[15] O. Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, pages 83-122, 1975.
[16] B. Matern. Spatial variation. Springer-Verlag, 1986.
[17] T. Neeff, G. S. Biging, L. V. Dutra, C. C. Freitas, and J. R. Dos Santos. Markov point processes for modeling of spatial forest patterns in Amazonia derived from interferometric height. Remote Sensing of Environment, 97(4):484-494, 2005.
[18] F. Petralia, V. Rao, and D. Dunson. Repulsive mixtures. In NIPS, 2012.
[19] A. Rahimi and B. Recht. Random features for large-scale kernel machines. NIPS, 2007.
[20] S. Richardson and P. J. Green. On Bayesian analysis of mixtures with an unknown number of components (with discussion). JRSS:B, 59(4):731-792, 1997.
[21] C.P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, 2nd edition, 2004.
[22] J. Schur. Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind. Journal für die reine und angewandte Mathematik, 147:205-232, 1917.
[23] M. Stephens. Dealing with label switching in mixture models. JRSS:B, 62(4):795-809, 2000.
[24] C.A. Sugar and G.M. James. Finding the number of clusters in a dataset: An information-theoretic approach. JASA, 98(463):750-763, 2003.
[25] L. A. Waller, A. Sarkka, V. Olsbo, M. Myllymaki, I.G. Panoutsopoulou, W.R. Kennedy, and G. Wendelschafer-Crabb. Second-order spatial analysis of epidermal nerve fibers. Statistics in Medicine, 30(23):2827-2841, 2011.
[26] J. Wang. Consistent selection of the number of clusters via crossvalidation. Biometrika, 97(4):893-904, 2010.
[27] C.K.I. Williams and M. Seeger. Using the Nystrom method to speed up kernel machines. NIPS, 2000.
[28] T. Yang, Y.-F. Li, M. Mahdavi, R. Jin, and Z.-H. Zhou. Nystrom method vs random fourier features: A theoretical and empirical comparison. NIPS, 2012.
[29] J. Zou and R.P. Adams. Priors for diversity in generative latent variable models. In NIPS, 2012.
-----1
[1] D. Bertsekas. Nonlinear programming. Athena Scientific, 1999.
[2] S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee. Natural actor-critic algorithms. Automatica, 45(11):2471-2482, 2009.
[3] S. Bhatnagar, H. Prasad, and L.A. Prashanth. Stochastic Recursive Algorithms for Optimization, volume 434. Springer, 2013.
[4] V. Borkar. A sensitivity formula for the risk-sensitive cost and the actor-critic algorithm. Systems & Control Letters, 44:339-346, 2001.
[5] V. Borkar. Q-learning for risk-sensitive control. Mathematics of Operations Research, 27:294-311, 2002.
[6] H. Chen, T. Duncan, and B. Pasik-Duncan. A Kiefer-Wolfowitz algorithm with randomized differences. IEEE Transactions on Automatic Control, 44(3):442-453, 1999.
[7] E. Delage and S. Mannor. Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research, 58(1):203-213, 2010.
[8] J. Filar, L. Kallenberg, and H. Lee. Variance-penalized Markov decision processes. Mathematics of Operations Research, 14(1):147-161, 1989.
[9] J. Filar, D. Krass, and K. Ross. Percentile performance criteria for limiting average Markov decision processes. IEEE Transactions on Automatic Control, 40(1):2-10, 1995.
[10] R. Howard and J. Matheson. Risk sensitive Markov decision processes. Management Science, 18(7):356-369, 1972.
[11] V. Katkovnik and Y. Kulchitsky. Convergence of a class of random search algorithms. Automatic Remote Control, 8:81-87, 1972.
[12] A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780-798, 2005.
[13] J. Peters, S. Vijayakumar, and S. Schaal. Natural actor-critic. In Proceedings of the Sixteenth European Conference on Machine Learning, pages 280-291, 2005.
[14] L.A. Prashanth and S. Bhatnagar. Reinforcement learning with average cost for adaptive control of traffic lights at intersections. In Proceedings of the Fourteenth International IEEE Conference on Intelligent Transportation Systems, pages 1640-1645. IEEE, 2011.
[15] L.A. Prashanth and S. Bhatnagar. Reinforcement Learning With Function Approximation for Traffic Signal Control. IEEE Transactions on Intelligent Transportation Systems, 12(2):412-421, June 2011.
[16] L.A. Prashanth and S. Bhatnagar. Threshold Tuning Using Stochastic Optimization for Graded Signal Control. IEEE Transactions on Vehicular Technology, 61(9):3865-3880, Nov. 2012.
[17] L.A. Prashanth and M. Ghavamzadeh. Actor-Critic Algorithms for Risk-Sensitive MDPs. Technical report inria-00794721, INRIA, 2013.
[18] W. Sharpe. Mutual fund performance. Journal of Business, 39(1):119-138, 1966.
[19] M. Sobel. The variance of discounted Markov decision processes. Applied Probability, pages 794-802, 1982.
[20] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 37(3):332-341, 1992.
[21] R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of Advances in Neural Information Processing Systems 12, pages 1057-1063, 2000.
[22] A. Tamar, D. Di Castro, and S. Mannor. Policy gradients with variance related risk criteria. In Proceedings of the Twenty-Ninth International Conference on Machine Learning, pages 387-396, 2012.
[23] A. Tamar, D. Di Castro, and S. Mannor. Temporal difference methods for the variance of the reward to go. In Proceedings of the Thirtieth International Conference on Machine Learning, pages 495-503, 2013.
[24] H. Xu and S. Mannor. Distributionally robust Markov decision processes. Mathematics of Operations Research, 37(2):288-300, 2012.
-----1
[1] S. Ross, G. Gordon, and J. A. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, 2011.
[2] S. Chernova and M. Veloso. Interactive policy learning through confidence-based autonomy. Journal of Artificial Intelligence Research, 34, 2009.
[3] B. Argall, M. Veloso, and B. Browning. Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot. Robotics and Autonomous Systems, 59(3-4), 2011.
[4] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[5] Cs. Szepesvari. Algorithms for Reinforcement Learning. Morgan & Claypool Publishers, 2010.
[6] M. G. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107-1149, 2003.
[7] D. P. Bertsekas. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, 9(3):310-335, 2011.
[8] A.-m. Farahmand, M. Ghavamzadeh, Cs. Szepesvari, and S. Mannor. Regularized policy iteration. In NIPS 21, 2009.
[9] J. Z. Kolter and A. Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In ICML, 2009.
[10] G. Taylor and R. Parr. Kernelized value function approximation for reinforcement learning. In ICML, 2009.
[11] M. Ghavamzadeh, A. Lazaric, R. Munos, and M. Hoffman. Finite-sample analysis of Lasso-TD. In ICML, 2011.
[12] I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
[13] I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun, and Y. Singer. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6(2):1453-1484, 2006.
[14] A. Antos, Cs. Szepesvari, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71:89-129, 2008.
[15] A.-m. Farahmand and Cs. Szepesvari. Model selection in reinforcement learning. Machine Learning, 85(3):299-332, 2011.
[16] R. Munos. Error bounds for approximate policy iteration. In ICML, 2003.
[17] A.-m. Farahmand, R. Munos, and Cs. Szepesvari. Error propagation for approximate policy and value iteration. In NIPS 23, 2010.
[18] B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, 22(1):94-116, January 1994.
[19] P.-M. Samson. Concentration of measure inequalities for Markov chains and Φ-mixing processes. The Annals of Probability, 28(1):416-461, 2000.
[20] S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: A survey of some recent advances. ESAIM: Probability and Statistics, 9:323-375, 2005.
[21] T. Hester, M. Quinlan, and P. Stone. RTMBA: A real-time model-based reinforcement learning architecture for robot control. In ICRA, 2012.
[22] CVX Research, Inc. CVX: Matlab software for disciplined convex programming, version 2.0. http://cvxr.com/cvx, August 2012.
[23] S. Ross and J. A. Bagnell. Efficient reductions for imitation learning. In AISTATS, 2010.
[24] W. B. Knox and P. Stone. Reinforcement learning from simultaneous human and MDP reward. In AAMAS, 2012.
[25] D. Ramachandran and E. Amir. Bayesian inverse reinforcement learning. In IJCAI, 2007.
[26] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In AAAI, 2008.
[27] A. J. Ijspeert, J. Nakanishi, and S. Schaal. Learning attractor landscapes for learning motor primitives. In NIPS 15, 2002.
[28] J. Kober and J. Peters. Policy search for motor primitives in robotics. Machine Learning, 84(1-2):171-203, 2011.
[29] A. Coates, P. Abbeel, and A. Y. Ng. Learning for control from multiple demonstrations. In ICML, 2008.
-----1
[1] A. Agarwal and J. C. Duchi. Distributed delayed stochastic optimization. In NIPS, pages 873-881, 2011.
[2] D. Agarwal, B.-C. Chen, P. Elango, N. Motgi, S.-T. Park, R. Ramakrishnan, S. Roy, and J. Zachariah. Online models for content optimization. In NIPS, pages 17-24, December 2008.
[3] J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In COLT, pages 41-53, 2010.
[4] P. Auer and R. Ortner. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1-2):55-65, 2010.
[5] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235-256, 2002.
[6] M. Balcan, A. Blum, S. Fine, and Y. Mansour. Distributed learning, communication complexity and privacy. arXiv preprint arXiv:1204.3514, 2012.
[7] S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In Algorithmic Learning Theory, pages 23-37. Springer, 2009.
[8] D. Chakrabarti, R. Kumar, F. Radlinski, and E. Upfal. Mortal multi-armed bandits. In NIPS, pages 273-280, 2008.
[9] H. Daume III, J. M. Phillips, A. Saha, and S. Venkatasubramanian. Efficient protocols for distributed classification and optimization. In ALT, 2012.
[10] H. Daume III, J. M. Phillips, A. Saha, and S. Venkatasubramanian. Protocols for learning classifiers on distributed data. In AISTATS, 2012.
[11] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, Jan. 2008.
[12] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research, 13:165-202, 2012.
[13] J. Duchi, A. Agarwal, and M. J. Wainwright. Distributed dual averaging in networks. NIPS, 23, 2010.
[14] E. Even-Dar, S. Mannor, and Y. Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. The Journal of Machine Learning Research, 7:1079-1105, 2006.
[15] V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck. Multi-bandit best arm identification. NIPS, 2011.
[16] E. Hillel, Z. Karnin, T. Koren, R. Lempel, and O. Somekh. Distributed exploration in multi-armed bandits. arXiv preprint arXiv:1311.0800, 2013.
[17] V. Kanade, Z. Liu, and B. Radunovic. Distributed non-stochastic experts. In Advances in Neural Information Processing Systems 25, pages 260-268, 2012.
[18] Z. Karnin, T. Koren, and O. Somekh. Almost optimal exploration in multi-armed bandits. In Proceedings of the 30th International Conference on Machine Learning, 2013.
[19] K. Liu and Q. Zhao. Distributed learning in multi-armed bandit with multiple players. IEEE Transactions on Signal Processing, 58(11):5667-5681, Nov. 2010.
[20] S. Mannor and J. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. The Journal of Machine Learning Research, 5:623-648, 2004.
[21] O. Maron and A. W. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In NIPS, 1994.
[22] V. Mnih, C. Szepesvari, and J.-Y. Audibert. Empirical Bernstein stopping. In ICML, pages 672-679. ACM, 2008.
[23] F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data reflect retrieval quality? In CIKM, pages 43-52, October 2008.
[24] Y. Yue and T. Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. In ICML, page 151, June 2009.
-----1
[1] J. Abernethy, A. Agarwal, P. L. Bartlett, and A. Rakhlin. A stochastic view of optimal regret through minimax duality. In COLT, 2009.
[2] P. Auer, N. Cesa-Bianchi, and C. Gentile. Adaptive and self-confident on-line learning algorithms. J. Comput. Syst. Sci., 64(1):48-75, 2002.
[3] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[4] J. C. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.
[5] E. Even-Dar, M. Kearns, Y. Mansour, and J. Wortman. Regret to the best vs. regret to the average. In N. H. Bshouty and C. Gentile, editors, COLT, volume 4539 of Lecture Notes in Computer Science, pages 233-247. Springer, 2007.
[6] Y. Freund and R. E. Schapire. Large margin classification using the Perceptron algorithm. Machine Learning, pages 277-296, 1999.
[7] C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265-299, 2003.
[8] S. M. Kakade, S. Shalev-Shwartz, and A. Tewari. Regularization techniques for learning with matrices. CoRR, abs/0910.0610, 2009.
[9] J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-63, January 1997.
[10] N. Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2(4):285-318, 1988.
[11] Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer, 2003.
[12] F. Orabona and K. Crammer. New adaptive algorithms for online classification. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1840-1848. 2010.
[13] F. Orabona, K. Crammer, and N. Cesa-Bianchi. A generalized online mirror descent with applications to classification and regression, 2013. arXiv:1304.2994.
[14] R. T. Rockafellar. Convex Analysis (Princeton Mathematical Series). Princeton University Press, 1970.
[15] S. Shalev-Shwartz. Online learning: Theory, algorithms, and applications. Technical report, The Hebrew University, 2007. PhD thesis.
[16] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2), 2012.
[17] S. Shalev-Shwartz and Y. Singer. A primal-dual perspective of online learning algorithms. Machine Learning Journal, 2007.
[18] N. Srebro, K. Sridharan, and A. Tewari. Smoothness, low noise and fast rates. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2199-2207. 2010.
[19] M. Streeter and B. McMahan. No-regret algorithms for unconstrained online convex optimization. In P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2411-2419. 2012.
[20] V. Vovk. On-line regression competitive with reproducing kernel Hilbert spaces. In Jin-Yi Cai, S. Barry Cooper, and Angsheng Li, editors, Theory and Applications of Models of Computation, volume 3959 of Lecture Notes in Computer Science, pages 452-463. Springer Berlin Heidelberg, 2006.
[21] V. G. Vovk. Aggregating strategies. In COLT, pages 371-386, 1990.
-----1
[1] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer, 2008.
[2] Charles Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proc. 3rd Berkeley Sympos. Math. Statist. Probability, volume 1, pages 197-206, 1956.
[3] Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365-411, 2004.
[4] Jerome H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165-175, 1989.
[5] Juliane Schafer and Korbinian Strimmer. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1):1175-1189, 2005.
[6] Boaz Nadler. Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, 36(6):2791-2817, 2008.
[7] Harry Markowitz. Portfolio selection. Journal of Finance, VII(1):77-91, March 1952.
[8] Daniel Bartz, Kerr Hatrick, Christian W. Hesse, Klaus-Robert Muller, and Steven Lemm. Directional Variance Adjustment: Bias reduction in covariance matrices based on factor analysis with an application to portfolio optimization. PLoS ONE, 8(7):e67503, 07 2013.
[9] Olivier Ledoit and Michael Wolf. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10:603-621, 2003.
[10] Jonathan J. Hull. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550-554, May 1994.
[11] Mark A Fanty and Ronald Cole. Spoken letter recognition. In Advances in Neural Information Processing Systems, volume 3, pages 220-226, 1990.
[12] Kevin Bache and Moshe Lichman. UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences, 2013.
[13] Anne Kerstin Porbadnigk, Jan-Niklas Antons, Benjamin Blankertz, Matthias S Treder, Robert Schleicher, Sebastian Moller, and Gabriel Curio. Using ERPs for assessing the (sub)conscious perception of noise. In 32nd Annual Intl. Conf. of the IEEE Engineering in Medicine and Biology Society, pages 2690-2693, 2010.
[14] Anne Kerstin Porbadnigk, Matthias S Treder, Benjamin Blankertz, Jan-Niklas Antons, Robert Schleicher, Sebastian Moller, Gabriel Curio, and Klaus-Robert Muller. Single-trial analysis of the neural correlates of speech quality perception. Journal of neural engineering, 10(5):056003, 2013.
-----1
[1] G. Dornhege, J. del R. Millan, T. Hinterberger, D. McFarland, and K.-R. Muller, Eds., Toward Brain-Computer Interfacing. Cambridge, MA: MIT Press, 2007.
[2] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, Brain- computer interfaces for communication and control, Clin. Neurophysiol., vol. 113, no. 6, pp.767791, 2002.
[3] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Muller, Optimizing Spatial filters for Robust EEG Single-Trial Analysis, IEEE Signal Proc. Magazine, vol. 25, no. 1, pp.4156, 2008.
[4] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, Optimal spatial filtering of single trial eeg during imagined hand movement, IEEE Trans. Rehab. Eng., vol. 8, no. 4, pp. 441446, 1998.
[5] L. C. Parra, C. D. Spence, A. D. Gerson, and P. Sajda, Recipes for the linear analysis of eeg, NeuroImage, vol. 28, pp. 326-341, 2005.
[6] S. Lemm, B. Blankertz, T. Dickhaus, and K.-R. Muller, Introduction to machine learning for brain imaging, NeuroImage, vol. 56, no. 2, pp. 387-399, 2011.
[7] F. Lotte and C. Guan, Regularizing common spatial patterns to improve bci designs: Unified theory and new algorithms, IEEE Trans. Biomed. Eng., vol. 58, no. 2, pp. 355-362, 2011.
[8] W. Samek, C. Vidaurre, K.-R. Muller, and M. Kawanabe, Stationary common spatial patterns for brain-computer interfacing, Journal of Neural Engineering, vol. 9, no. 2, p. 026013, 2012.
[9] O. Ledoit and M. Wolf, A well-conditioned estimator for large-dimensional covariance matrices, Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365-411, 2004.
[10] H. Lu, H.-L. Eng, C. Guan, K. Plataniotis, and A. Venetsanopoulos, Regularized common spatial pattern with aggregation for eeg classification in small-sample setting, IEEE Transactions on Biomedical Engineering, vol. 57, no. 12, pp. 2936-2946, 2010.
[11] D. Devlaminck, B. Wyns, M. Grosse-Wentrup, G. Otte, and P. Santens, Multi-subject learning for common spatial patterns in motor-imagery bci, Computational Intelligence and Neuroscience, vol. 2011, no. 217987, pp. 1-9, 2011.
[12] B. Blankertz, M. Kawanabe, R. Tomioka, F. U. Hohlefeld, V. Nikulin, and K.-R. Muller, Invariant common spatial patterns: Alleviating nonstationarities in brain-computer interfacing, in Ad. in NIPS 20, 2008, pp. 113-120.
[13] W. Samek, F. C. Meinecke, and K.-R. Muller, Transferring subspaces between subjects in brain-computer interfacing, IEEE Transactions on Biomedical Engineering, vol. 60, no. 8, pp. 2289-2298, 2013.
[14] M. Arvaneh, C. Guan, K. K. Ang, and C. Quek, Optimizing spatial filters by minimizing within-class dissimilarities in electroencephalogram-based brain-computer interface, IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 4, pp. 610-619, 2013.
[15] S. Amari, H. Nagaoka, and D. Harada, Methods of information geometry. American Mathematical Society, 2000.
[16] S. Eguchi and Y. Kano, Robustifying maximum likelihood estimation, Tokyo Institute of Statistical Mathematics, Tokyo, Japan, Tech. Rep, 2001.
[17] R. Bhatia, Matrix analysis, ser. Graduate Texts in Mathematics. Springer, 1997, vol. 169.
[18] M. Mihoko and S. Eguchi, Robust blind source separation by beta divergence, Neural Comput., vol. 14, no. 8, pp. 1859-1886, Aug. 2002.
[19] C. Fevotte and J. Idier, Algorithms for nonnegative matrix factorization with the β-divergence, Neural Comput., vol. 23, no. 9, pp. 2421-2456, Sep. 2011.
[20] A. Hyvarinen, Survey on independent component analysis, Neural Computing Surveys, vol. 2, pp. 94-128, 1999.
[21] M. Kawanabe, W. Samek, P. von Bunau, and F. Meinecke, An information geometrical view of stationary subspace analysis, in Artificial Neural Networks and Machine Learning - ICANN 2011, ser. LNCS. Springer Berlin / Heidelberg, 2011, vol. 6792, pp. 397-404.
[22] N. Murata, T. Takenouchi, and T. Kanamori, Information geometry of u-boost and bregman divergence, Neural Computation, vol. 16, pp. 1437-1481, 2004.
[23] H. Wang, Harmonic mean of Kullback-Leibler divergences for optimizing multi-class eeg spatio-temporal filters, Neural Processing Letters, vol. 36, no. 2, pp. 161-171, 2012.
[24] P. von Bunau, F. C. Meinecke, F. C. Kiraly, and K.-R. Muller, Finding Stationary Subspaces in Multivariate Time Series, Physical Review Letters, vol. 103, no. 21, pp. 214 101+, 2009.
[25] P. von Bunau, Stationary subspace analysis - towards understanding non-stationary data, Ph.D. dissertation, Technische Universitat Berlin, 2012.
[26] W. Samek, M. Kawanabe, and K.-R. Muller, Divergence-based framework for common spatial patterns algorithms, IEEE Reviews in Biomedical Engineering, 2014, in press.
[27] A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones, Robust and efficient estimation by minimising a density power divergence, Biometrika, vol. 85, no. 3, pp. 549-559, 1998.
[28] P. J. Huber, Robust Statistics, ser. Wiley Series in Probability and Statistics. Wiley-Interscience, 1981.
[29] B. Blankertz, C. Sannelli, S. Halder, E. M. Hammer, A. Kubler, K.-R. Muller, G. Curio, and T. Dickhaus, Neurophysiological predictor of smr-based bci performance, NeuroImage, vol. 51, no. 4, pp. 1303-1309, 2010.
[30] P. J. Rousseeuw and K. V. Driessen, A fast algorithm for the minimum covariance determinant estimator, Technometrics, vol. 41, no. 3, pp. 212-223, 1999.
[31] B. Blankertz, S. Lemm, M. S. Treder, S. Haufe, and K.-R. Muller, Single-trial analysis and classification of ERP components - a tutorial, NeuroImage, vol. 56, no. 2, pp. 814-825, 2011.
[32] J. Baik and J. Silverstein, Eigenvalues of large sample covariance matrices of spiked population models, Journal of Multivariate Analysis, vol. 97, no. 6, pp. 1382-1408, 2006.
-----1
[1] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of community hierarchies in large networks. J. Stat Mech, 2008.
[2] T. Cai, W. Liu, and X. Luo. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of American Statistical Association, 106:594-607, 2011.
[3] A. d'Aspremont, O. Banerjee, and L. E. Ghaoui. First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and its Applications, 30(1):56-66, 2008.
[4] R. S. Dembo, S. C. Eisenstat, and T. Steihaug. Inexact Newton methods. SIAM J. Numerical Anal., 19(2):400-408, 1982.
[5] I. S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 29(11):1944-1957, 2007.
[6] J. Duchi, S. Gould, and D. Koller. Projected subgradient methods for learning sparse Gaussians. UAI, 2008.
[7] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, July 2008.
[8] C.-J. Hsieh, I. S. Dhillon, P. Ravikumar, and A. Banerjee. A divide-and-conquer method for sparse inverse covariance estimation. In NIPS, 2012.
[9] C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, and P. Ravikumar. Sparse inverse covariance estimation using quadratic approximation. 2013.
[10] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359-392, 1999.
[11] J. D. Lee, Y. Sun, and M. A. Saunders. Proximal newton-type methods for minimizing composite functions. In NIPS, 2012.
[12] L. Li and K.-C. Toh. An inexact interior point method for ℓ1-regularized sparse covariance selection. Mathematical Programming Computation, 2:291-315, 2010.
[13] R. Mazumder and T. Hastie. Exact covariance thresholding into connected components for large-scale graphical lasso. Journal of Machine Learning Research, 13:723-736, 2012.
[14] N. Meinshausen and P. Buhlmann. High dimensional graphs and variable selection with the lasso. Annals of Statistics, 34:1436-1462, 2006.
[15] P. Olsen, F. Oztoprak, J. Nocedal, and S. Rennie. Newton-like methods for sparse inverse covariance estimation. Technical report, Optimization Center, Northwestern University, 2012.
[16] B. Rolfs, B. Rajaratnam, D. Guillot, A. Maleki, and I. Wong. Iterative thresholding algorithm for sparse inverse covariance estimation. In NIPS, 2012.
[17] K. Scheinberg, S. Ma, and D. Goldfarb. Sparse inverse covariance selection via alternating linearization methods. NIPS, 2010.
[18] K. Scheinberg and I. Rish. Learning sparse Gaussian Markov networks using a greedy coordinate ascent approach. In J. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, editors, Machine Learning and Knowledge Discovery in Databases, volume 6323 of Lecture Notes in Computer Science, pages 196-212. Springer Berlin / Heidelberg, 2010.
[19] D. M. Witten, J. H. Friedman, and N. Simon. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, 20(4):892-900, 2011.
-----1
[1] J. Ashburner and K. J. Friston. Voxel-based morphometry - the methods. NeuroImage, 11(6):805-821, 2000.
[2] J. Ashburner and K. J. Friston. Why voxel-based morphometry should be used. NeuroImage, 14(6):1238-1243, 2001.
[3] P. H. Westfall and S. S. Young. Resampling-based multiple testing: examples and methods for p-value adjustment, volume 279. Wiley-Interscience, 1993.
[4] J. M. Bland and D. G. Altman. Multiple significance tests: the bonferroni method. British Medical Journal, 310(6973):170, 1995.
[5] J. Li and L. Ji. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 95(3):221-227, 2005.
[6] J. Storey and R. Tibshirani. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 100(16):9440-9445, 2003.
[7] H. Finner and V. Gontscharuk. Controlling the familywise error rate with plug-in estimator for the proportion of true null hypotheses. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5):1031-1048, 2009.
[8] J. T. Leek and J. D. Storey. A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences, 105(48):18718-18723, 2008.
[9] S. Clarke and P. Hall. Robustness of multiple testing procedures against dependence. The Annals of Statistics, pages 332-358, 2009.
[10] S. García, A. Fernandez, J. Luengo, and F. Herrera. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information Sciences, 180(10):2044-2064, 2010.
[11] Y. Ge, S. Dudoit, and T. P. Speed. Resampling-based multiple testing for microarray data analysis. Test, 12(1):1-77, 2003.
[12] T. Nichols and S. Hayasaka. Controlling the familywise error rate in functional neuroimaging: a comparative review. Statistical Methods in Medical Research, 12:419-446, 2003.
[13] K. D. Singh, G. R. Barnes, and A. Hillebrand. Group imaging of task-related changes in cortical synchronisation using nonparametric permutation testing. NeuroImage, 19(4):1589-1601, 2003.
[14] D. Pantazis, T. E. Nichols, S. Baillet, and R. M. Leahy. A comparison of random field theory and permutation methods for the statistical analysis of meg data. NeuroImage, 25(2):383-394, 2005.
[15] B. Gaonkar and C. Davatzikos. Analytic estimation of statistical significance maps for support vector machine based multi-variate image analysis and classification. NeuroImage, 78:270-283, 2013.
[16] J. M. Cheverud. A simple correction for multiple comparisons in interval mapping genome scans. Heredity, 87(1):52-58, 2001.
[17] J. He, L. Balzano, and A. Szlam. Incremental gradient on the grassmannian for online foreground and background separation in subsampled video. In CVPR, 2012.
[18] M. Dwass. Modified randomization tests for nonparametric hypotheses. The Annals of Mathematical Statistics, 28(1):181-187, 1957.
[19] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053-2080, 2010.
[20] M. Fazel, H. Hindi, and S. Boyd. Rank minimization and applications in system theory. In American Control Conference, volume 4, 2004.
[21] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. Arxiv Preprint, 2007. arxiv:0706.4138.
[22] L. Balzano, R. Nowak, and B. Recht. Online identification and tracking of subspaces from highly incomplete information. Arxiv Preprint, 2010. arxiv:1006.4046.
[23] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572-596, 2011.
[24] F. Benaych-Georges and R. R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494-521, 2011.
-----0
Almohamad, H. and Duffuaa, S. A linear programming approach for the weighted graph matching problem. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 15(5):522-525, 1993.
Barabasi, A. and Albert, R. Emergence of scaling in random networks. Science, 286(5439):509-512, 1999.
Bertsekas, D. and Tsitsiklis, J. Parallel and Distributed Computation: Numerical Methods. Prentice Hall, 1989.
Chiquet, J., Grandvalet, Y., and Ambroise, C. Inferring multiple graphical structures. Statistics and Computing, 21(4):537-553, 2011.
Conte, D., Foggia, P., Sansone, C., and Vento, M. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(03):265-298, 2004.
Craddock, R.C., James, G.A., Holtzheimer, P.E., Hu, X.P., and Mayberg, H.S. A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping, 33(8):1914-1928, 2012.
Erdős, P. and Rényi, A. On random graphs, I. Publicationes Mathematicae, 6:290-297, 1959.
Fiori, Marcelo, Muse, Pablo, and Sapiro, Guillermo. Topology constraints in graphical models. In Advances in Neural Information Processing Systems 25, pp. 800-808, 2012.
Fiori, Marcelo, Muse, Pablo, Hariri, Ahmad, and Sapiro, Guillermo. Multimodal graphical models via group lasso. Signal Processing with Adaptive Sparse Structured Representations, 2013.
Kuhn, H. W. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97, 1955.
Lin, Z., Liu, R., and Su, Z. Linearized alternating direction method with adaptive penalty for low-rank representation. In Advances in Neural Information Processing Systems 24, pp. 612-620, 2011.
Loh, P. and Wainwright, M. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. In Advances in Neural Information Processing Systems 25, pp. 2096-2104, 2012.
Newman, M. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.
Nooner, K. et al. The NKI-Rockland sample: A model for accelerating the pace of discovery science in psychiatry. Frontiers in Neuroscience, 6(152), 2012.
Seshadhri, C., Kolda, T.G., and Pinar, A. Community structure and scale-free collections of Erdős-Rényi graphs. Physical Review E, 85(5):056109, 2012.
Umeyama, S. An eigendecomposition approach to weighted graph matching problems. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 10(5):695-703, 1988.
Varoquaux, G., Gramfort, A., Poline, J.B., and Thirion, B. Brain covariance selection: better individual functional connectivity models using population prior. In Advances in Neural Information Processing Systems 23, pp. 2334-2342, 2010.
Varshney, L., Chen, B., Paniagua, E., Hall, D., and Chklovskii, D. Structural properties of the caenorhabditis elegans neuronal network. PLoS Computational Biology, 7(2):e1001066, 2011.
Vogelstein, J.T., Conroy, J.M., Podrazik, L.J., Kratzer, S.G., Harley, E.T., Fishkind, D.E., Vogelstein, R.J., and Priebe, C.E. Fast approximate quadratic programming for large (brain) graph matching. arXiv:1112.5507, 2012.
Yuan, M. and Lin, Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49-67, 2006.
Yuan, M. and Lin, Y. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19-35, February 2007.
Zaslavskiy, M., Bach, F., and Vert, J.P. A path following algorithm for the graph matching problem. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(12):2227-2242, 2009.
-----1
[1] A. Agarwal and B. Triggs. Hyperfeatures - multilevel local coding for visual recognition. In Proc. ECCV, pages 30-43, 2006.
[2] R. Arandjelovic and A. Zisserman. Three things everyone should know to improve object retrieval. In Proc. CVPR, 2012.
[3] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, pages 153-160, 2006.
[4] A Berg, J Deng, and L Fei-Fei. Large scale visual recognition challenge (ILSVRC), 2010. URL http://www.image-net.org/challenges/LSVRC/2010/.
[5] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In Proc. ECCV, pages 430-443, 2012.
[6] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In Proc. BMVC., 2011.
[7] D. C. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Proc. CVPR, pages 3642-3649, 2012.
[8] A. Coates, A. Y. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, 2011.
[9] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1-22, 2004.
[10] A. Gordo, J. A. Rodríguez-Serrano, F. Perronnin, and E. Valveny. Leveraging category-level labels for instance-level image retrieval. In Proc. CVPR, pages 3045-3052, 2012.
[11] B. Hariharan, J. Malik, and D. Ramanan. Discriminative decorrelation for clustering and classification. In Proc. ECCV, 2012.
[12] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, 2006.
[13] T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In NIPS, pages 487-493, 1998.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106-1114, 2012.
[15] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In Proc. CVPR, pages 951-958, 2009.
[16] S. Lazebnik, C. Schmid, and J Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In Proc. CVPR, 2006.
[17] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In Proc. ICML, 2012.
[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[19] D. Lowe. Object recognition from local scale-invariant features. In Proc. ICCV, pages 1150-1157, Sep 1999.
[20] F. Perronnin, J. Sanchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Proc. ECCV, 2010.
[21] F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid. Towards good practice in large-scale learning for image classification. In Proc. CVPR, pages 3482-3489, 2012.
[22] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019-1025, 1999.
[23] J. Sanchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In Proc. CVPR, 2011.
[24] J. Sanchez, F. Perronnin, and T. Emídio de Campos. Modeling the spatial layout of images beyond spatial pyramids. Pattern Recognition Letters, 33(16):2216-2223, 2012.
[25] P. Sermanet and Y. LeCun. Traffic sign recognition with multi-scale convolutional networks. In International Joint Conference on Neural Networks, pages 2809-2813, 2011.
[26] T. Serre, L. Wolf, and T. Poggio. A new biologically motivated framework for robust object recognition. Proc. CVPR, 2005.
[27] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient SOlver for SVM. In ICML, volume 227, 2007.
[28] K. Simonyan, O. M. Parkhi, A. Vedaldi, and A. Zisserman. Fisher Vector Faces in the Wild. In Proc. BMVC., 2013.
[29] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, volume 2, pages 1470-1477, 2003.
[30] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In Proc. ECCV, pages 776-789, sep 2010.
[31] J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling up to large vocabulary image annotation. In IJCAI, pages 2764-2770, 2011.
[32] S. Yan, X. Xu, D. Xu, S. Lin, and X. Li. Beyond spatial pyramids: A new feature extraction framework with dense spatial sampling for image classification. In Proc. ECCV, pages 473-487, 2012.
[33] J. Yang, K. Yu, Y. Gong, and T. S. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proc. CVPR, pages 1794-1801, 2009.
-----0
Ahuja, R., Magnanti, T., and Orlin, J. (1993). Network Flows: Theory, Algorithms and Applications. Prentice Hall.
Avis, D. (1980). On the extreme rays of the metric cone. Canadian Journal of Mathematics, 32(1):126-144.
Berg, C., Christensen, J., and Ressel, P. (1984). Harmonic Analysis on Semigroups. Number 100 in Graduate Texts in Mathematics. Springer Verlag.
Bieling, J., Peschlow, P., and Martini, P. (2010). An efficient gpu implementation of the revised simplex method. In Parallel Distributed Processing, 2010 IEEE International Symposium on, pages 1-8.
Brickell, J., Dhillon, I., Sra, S., and Tropp, J. (2008). The metric nearness problem. SIAM J. Matrix Anal. Appl, 30(1):375-396.
Brualdi, R. A. (2006). Combinatorial matrix classes, volume 108. Cambridge University Press.
Cover, T. and Thomas, J. (1991). Elements of Information Theory. Wiley & Sons.
Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43(5):1470-1480.
Erlander, S. and Stewart, N. (1990). The gravity model in transportation analysis: theory and extensions. Vsp.
Ferradans, S., Papadakis, N., Rabin, J., Peyre, G., Aujol, J.-F., et al. (2013). Regularized discrete optimal transport. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 1-12.
Franklin, J. and Lorenz, J. (1989). On the scaling of multidimensional matrices. Linear Algebra and its applications, 114:717-735.
Good, I. (1963). Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. The Annals of Mathematical Statistics, pages 911-934.
Grauman, K. and Darrell, T. (2004). Fast contour matching using approximate earth mover's distance. In IEEE Conf. Vision and Patt. Recog., pages 220-227.
Gudmundsson, J., Klein, O., Knauer, C., and Smid, M. (2007). Small manhattan networks and algorithmic applications for the earth mover's distance. In Proceedings of the 23rd European Workshop on Computational Geometry, pages 174-177.
Indyk, P. and Thaper, N. (2003). Fast image retrieval via embeddings. In 3rd International Workshop on Statistical and Computational Theories of Vision (at ICCV).
Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev., 106:620-630.
Knight, P. A. (2008). The sinkhorn-knopp algorithm: convergence and applications. SIAM Journal on Matrix Analysis and Applications, 30(1):261-275.
Ling, H. and Okada, K. (2007). An efficient earth mover's distance algorithm for robust histogram comparison. IEEE transactions on Patt. An. and Mach. Intell., pages 840-853.
Naor, A. and Schechtman, G. (2007). Planar earthmover is not in l1. SIAM J. Comput., 37(3):804-826.
Pele, O. and Werman, M. (2009). Fast and robust earth mover's distances. In ICCV09.
Rubner, Y., Guibas, L., and Tomasi, C. (1997). The earth mover's distance, multi-dimensional scaling, and color-based image retrieval. In Proceedings of the ARPA Image Understanding Workshop, pages 661-668.
Shirdhonkar, S. and Jacobs, D. (2008). Approximate earth mover's distance in linear time. In CVPR 2008, pages 1-8. IEEE.
Sinkhorn, R. (1967). Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402-405.
Villani, C. (2009). Optimal transport: old and new, volume 338. Springer Verlag.
Wilson, A. G. (1969). The use of entropy maximising models, in the theory of trip distribution, mode split and route split. Journal of Transport Economics and Policy, pages 108-126.
-----0
Biau, G. (2012). Analysis of a random forests model. The Journal of Machine Learning Research, 13:1063-1095.
Biau, G., Devroye, L., and Lugosi, G. (2008). Consistency of random forests and other averaging classifiers. The Journal of Machine Learning Research, 9:2015-2033.
Breiman, L. (1996). Bagging predictors. Machine learning, 24(2):123-140.
Breiman, L. (2001). Random forests. Machine learning, 45(1):5-32.
Breiman, L. (2002). Manual on setting up, using, and understanding random forests v3. 1. Statistics Department University of California Berkeley, CA, USA.
Breiman, L. (2004). Consistency for a simple model of random forests. Technical report, UC Berkeley.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and regression trees.
Genuer, R., Poggi, J.-M., and Tuleau-Malot, C. (2010). Variable selection using random forests. Pattern Recognition Letters, 31(14):2225-2236.
Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1):3-42.
Ho, T. (1998). The random subspace method for constructing decision forests. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 20(8):832-844.
Ishwaran, H. (2007). Variable importance in binary regression trees and forests. Electronic Journal of Statistics, 1:519-537.
Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. Artificial intelligence, 97(1):273-324.
Liaw, A. and Wiener, M. (2002). Classification and regression by randomforest. R news, 2(3):18-22.
Lin, Y. and Jeon, Y. (2006). Random forests and adaptive nearest neighbors. Journal of the American Statistical Association, 101(474):578-590.
Meinshausen, N. (2006). Quantile regression forests. The Journal of Machine Learning Research, 7:983-999.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in python. The Journal of Machine Learning Research, 12:2825-2830.
Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., and Zeileis, A. (2008). Conditional variable importance for random forests. BMC bioinformatics, 9(1):307.
Strobl, C., Boulesteix, A.-L., Zeileis, A., and Hothorn, T. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC bioinformatics, 8(1):25.
White, A. P. and Liu, W. Z. (1994). Technical note: Bias in information-based measures in decision tree induction. Machine Learning, 15(3):321-329.
Zhao, G. (2000). A new perspective on classification. PhD thesis, Utah State University, Department of Mathematics and Statistics.
-----1
[1] D.A. Baldwin. Early referential understanding: Infants' ability to recognize referential acts for what they are. Developmental Psychology, 29(5):832-843, 1993.
[2] D. Barner, N. Brooks, and A. Bale. Accessing the unsaid: The role of scalar alternatives in children's pragmatic inference. Cognition, 118(1):84, 2011.
[3] L. Bergen, N. D. Goodman, and R. Levy. That's what she (could have) said: How alternative utterances affect language use. In Proceedings of the 34th Annual Conference of the Cognitive Science Society, 2012.
[4] R.A.H. Bion, A. Borovsky, and A. Fernald. Fast mapping, slow learning: Disambiguation of novel word-object mappings in relation to vocabulary learning at 18, 24, and 30 months. Cognition, 2012.
[5] C. F. Camerer, T.-H. Ho, and J.-K. Chong. A cognitive hierarchy model of games. The Quarterly Journal of Economics, 119(3):861-898, 2004.
[6] E.V. Clark. On the logic of contrast. Journal of Child Language, 15:317-335, 1988.
[7] Herbert H Clark and Deanna Wilkes-Gibbs. Referring as a collaborative process. Cognition, 22(1):1-39, 1986.
[8] R. Dale and E. Reiter. Computational interpretations of the gricean maxims in the generation of referring expressions. Cognitive Science, 19(2):233-263, 1995.
[9] M. C. Frank and N. D. Goodman. Predicting pragmatic reasoning in language games. Science, 336(6084):998-998, 2012.
[10] M. C. Frank, N. D. Goodman, and J. B. Tenenbaum. Using speakers' referential intentions to model early cross-situational word learning. Psychological Science, 20:578-585, 2009.
[11] B. Galantucci. An experimental study of the emergence of human communication systems. Cognitive science, 29(5):737-767, 2005.
[12] D. Golland, P. Liang, and D. Klein. A game-theoretic approach to generating spatial descriptions. In Proceedings of EMNLP 2010, pages 410-419. Association for Computational Linguistics, 2010.
[13] Noah D. Goodman and Andreas Stuhlmuller. Knowledge and implicature: Modeling language understanding as social cognition. Topics in Cognitive Science, 5:173-184, 2013.
[14] H.P. Grice. Logic and conversation. Syntax and Semantics, 3:41-58, 1975.
[15] L. Horn. Toward a new taxonomy for pragmatic inference: Q-based and r-based implicature. In Meaning, form, and use in context, volume 42. Washington: Georgetown University Press, 1984.
[16] J. S. Horst and L. K. Samuelson. Fast mapping but poor retention by 24-month-old infants. Infancy, 13(2):128-157, 2008.
[17] G. Kachergis, C. Yu, and R. M. Shiffrin. An associative model of adaptive inference for learning word-referent mappings. Psychonomic Bulletin & Review, 19(2):317-324, April 2012.
[18] S. Kirby, H. Cornish, and K. Smith. Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31):10681-10686, 2008.
[19] R. M. Krauss and S. Weinheimer. Changes in reference phrases as a function of frequency of usage in social interaction: A preliminary study. Psychonomic Science, 1964.
[20] T. Kwiatkowski, S. Goldwater, L. Zettlemoyer, and M. Steedman. A probabilistic model of syntactic and semantic acquisition from child-directed utterances and their meanings. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 234-244, 2012.
[21] S.C. Levinson. Presumptive meanings: The theory of generalized conversational implicature. MIT Press, 2000.
[22] E. M. Markman and G. F. Wachtel. Children's use of mutual exclusivity to constrain the meanings of words. Cognitive Psychology, 20:121-157, 1988.
[23] A. Papafragou and J. Musolino. Scalar implicatures: Experiments at the semantics-pragmatics interface. Cognition, 86(3):253-282, 2003.
[24] S. T. Piantadosi, H. Tily, and E. Gibson. Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences, 108(9):3526-3529, 2011.
[25] R. van Rooy. Evolution of conventional meaning and conversational principles. Synthese, 139(2):331-366, 2004.
[26] G. Zipf. The Psychobiology of Language. Routledge, London, 1936.
-----1
[1] F. R. Bach. Bolasso: model consistent lasso estimation through the bootstrap. In Proceedings of the 25th international conference on Machine learning, pages 33-40. ACM, 2008.
[2] P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, 37:1705-1732, 2009.
[3] P. Buhlmann and S. van de Geer. Statistics for high-dimensional data. Springer-Verlag, 2011.
[4] E. Candès and Y. Plan. Near-ideal model selection by ℓ1 minimization. The Annals of Statistics, 37(5A):2145-2177, 2009.
[5] E. Candès, J. K. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. on Inform. Theory, 52:489-509, 2006.
[6] E. Candès and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics, 35:2313-2351, 2007.
[7] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Trans. on Inform. Theory, 51:4203-4215, 2005.
[8] S. Chen and D. Donoho. Examples of basis pursuit. In Proceedings of Wavelet Applications in Signal and Image Processing III, San Diego, CA, 1995.
[9] D. L. Donoho. Compressed sensing. IEEE Trans. on Inform. Theory, 52:1289-1306, April 2006.
[10] A. Javanmard and A. Montanari. Hypothesis testing in high-dimensional regression under the gaussian random design model: Asymptotic theory. arXiv preprint arXiv:1301.4240, 2013.
[11] A. Javanmard and A. Montanari. Model selection for high-dimensional regression under the generalized irrepresentability condition. arXiv:1305.0355, 2013.
[12] K. Knight and W. Fu. Asymptotics for lasso-type estimators. Annals of Statistics, pages 1356-1378, 2000.
[13] K. Lounici. Sup-norm convergence rate and sign concentration property of lasso and dantzig estimators. Electronic Journal of Statistics, 2:90-102, 2008.
[14] N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the lasso. Ann. Statist., 34:1436-1462, 2006.
[15] J. Peng, J. Zhu, A. Bergamaschi, W. Han, D.-Y. Noh, J. R. Pollack, and P. Wang. Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. The Annals of Applied Statistics, 4(1):53-77, 2010.
[16] S. K. Shevade and S. S. Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246-2253, 2003.
[17] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B, 58:267-288, 1996.
[18] R. J. Tibshirani. The lasso problem and uniqueness. Electronic Journal of Statistics, 7:1456-1490, 2013.
[19] S. van de Geer and P. Buhlmann. On the conditions used to prove oracle results for the lasso. Electron. J. Statist., 3:1360-1392, 2009.
[20] S. van de Geer, P. Buhlmann, and S. Zhou. The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron. J. Stat., 5:688-749, 2011.
[21] M. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming. IEEE Trans. on Inform. Theory, 55:2183-2202, 2009.
[22] F. Ye and C.-H. Zhang. Rate minimaxity of the lasso and dantzig selector for the lq loss in lr balls. Journal of Machine Learning Research, 11:3519-3540, 2010.
[23] P. Zhao and B. Yu. On model selection consistency of Lasso. The Journal of Machine Learning Research, 7:2541-2563, 2006.
[24] S. Zhou. Thresholded Lasso for high dimensional variable selection and statistical estimation. arXiv:1002.1583v2, 2010.
[25] H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418-1429, 2006.
-----1
[1] Practice Fusion Diabetes Classification. http://www.kaggle.com/c/pf2012-diabetes, 2012. Kaggle competition dataset.
[2] P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics, 37:1705-1732, 2009.
[3] P. Buhlmann. Statistical significance in high-dimensional linear models. arXiv:1202.1377, 2012.
[4] P. Buhlmann and S. van de Geer. Statistics for high-dimensional data. Springer-Verlag, 2011.
[5] E. Candes and Y. Plan. Near-ideal model selection by l1 minimization. The Annals of Statistics, 37(5A):2145-2177, 2009.
[6] E. J. Candes and T. Tao. Decoding by linear programming. IEEE Trans. on Inform. Theory, 51:4203-4215, 2005.
[7] S. Chen and D. Donoho. Examples of basis pursuit. In Proceedings of Wavelet Applications in Signal and Image Processing III, San Diego, CA, 1995.
[8] E. Greenshtein and Y. Ritov. Persistence in high-dimensional predictor selection and the virtue of over-parametrization. Bernoulli, 10:971-988, 2004.
[9] A. Javanmard and A. Montanari. Confidence Intervals and Hypothesis Testing for High-Dimensional Regression. arXiv:1306.3171, 2013.
[10] A. Javanmard and A. Montanari. Hypothesis testing in high-dimensional regression under the gaussian random design model: Asymptotic theory. arXiv:1301.4240, 2013.
[11] A. Javanmard and A. Montanari. Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression. arXiv:1311.0274, 2013.
[12] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, August 2009.
[13] E. Lehmann and G. Casella. Theory of point estimation. Springer, 2nd edition, 1998.
[14] E. Lehmann and J. Romano. Testing statistical hypotheses. Springer, 2005.
[15] R. Lockhart, J. Taylor, R. Tibshirani, and R. Tibshirani. A significance test for the lasso. arXiv preprint arXiv:1301.7161, 2013.
[16] M. Lustig, D. Donoho, J. Santos, and J. Pauly. Compressed sensing MRI. IEEE Signal Processing Magazine, 25:72-82, 2008.
[17] N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the lasso. Ann. Statist., 34:1436-1462, 2006.
[18] N. Meinshausen and P. Buhlmann. Stability selection. J. R. Statist. Soc. B, 72:417-473, 2010.
[19] J. Minnier, L. Tian, and T. Cai. A perturbation method for inference on regularized regression estimates. Journal of the American Statistical Association, 106(496), 2011.
[20] S. N. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of m-estimators with decomposable regularizers. Statistical Science, 27(4):538-557, 2012.
[21] J. Peng, J. Zhu, A. Bergamaschi, W. Han, D.-Y. Noh, J. R. Pollack, and P. Wang. Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. The Annals of Applied Statistics, 4(1):53-77, 2010.
[22] M. Rudelson and S. Zhou. Reconstruction from anisotropic random measurements. IEEE Transactions on Information Theory, 59(6):3434-3447, 2013.
[23] T. Sun and C.-H. Zhang. Scaled sparse linear regression. Biometrika, 99(4):879-898, 2012.
[24] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B, 58:267-288, 1996.
[25] S. van de Geer, P. Buhlmann, and Y. Ritov. On asymptotically optimal confidence regions and tests for high-dimensional models. arXiv:1303.0518, 2013.
[26] A. W. Van der Vaart. Asymptotic statistics, volume 3. Cambridge University Press, 2000.
[27] M. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming. IEEE Trans. on Inform. Theory, 55:2183-2202, 2009.
[28] L. Wasserman. All of statistics: a concise course in statistical inference. Springer Verlag, 2004.
[29] L. Wasserman and K. Roeder. High dimensional variable selection. Annals of statistics, 37(5A):2178, 2009.
[30] C.-H. Zhang and S. Zhang. Confidence Intervals for Low-Dimensional Parameters in High-Dimensional Linear Models. arXiv:1110.2563, 2011.
[31] P. Zhao and B. Yu. On model selection consistency of Lasso. The Journal of Machine Learning Research, 7:2541-2563, 2006.
-----1
[1] D. Benedetto, E. Caglioti, and V. Loreto. Language trees and zipping. PRL, 88(4):048702, 2002.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[3] A. Bratko, B. Filipic, G. V. Cormack, T. R. Lynam, and B. Zupan. Spam filtering using statistical data compression models. JMLR, 7:2673-2698, 2006.
[4] P. Brucker. An O(n) algorithm for quadratic knapsack problems. Operations Research Letters, 3(3):163-166, 1984.
[5] E. Candes, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted l1 minimization. J Fourier Analysis and Applications, 14(5-6):877-905, 2008.
[6] R. Cilibrasi and P. M. Vitanyi. Clustering by compression. TIT, 51(4):1523-1545, 2005.
[7] E. Frank, C. Chui, and I. Witten. Text categorization using compression models. Technical Report 00/02, University of Waikato, Department of Computer Science, 2000.
[8] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. J Stat Softw, 33(1):1-22, 2010.
[9] E. Gabrilovich and S. Markovitch. Text categorization with many redundant features: Using aggressive feature selection to make SVMs competitive with C4.5. In ICML, 2004.
[10] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. JMLR, 3:1157-1182, 2003.
[11] E. Keogh, S. Lonardi, and C. A. Ratanamahatana. Towards parameter-free data mining. In KDD, 2004.
[12] D. Kim, S. Sra, and I. S. Dhillon. Tackling box-constrained optimization via a new projected quasi-newton approach. SIAM Journal on Scientific Computing, 32(6):3548-3563, 2010.
[13] V. Kuleshov. Fast algorithms for sparse principal component analysis based on Rayleigh quotient iteration. In ICML, 2013.
[14] M. Lan, C. Tan, and H. Low. Proposing a new term weighting scheme for text categorization. In AAAI, 2006.
[15] K. Lang. Newsweeder: Learning to filter netnews. In ICML, 1995.
[16] H. Larochelle and Y. Bengio. Classification using discriminative restricted Boltzmann machines. In ICML, 2008.
[17] B. Li and C. Vogel. Improving multiclass text classification with error-correcting output coding and sub-class partitions. In Can Conf Adv Art Int, 2010.
[18] H. Liu and L. Yu. Toward integrating feature selection algorithms for classification and clustering. TKDE, 17(4):491-502, 2005.
[19] T. Liu, S. Liu, Z. Chen, and W. Ma. An evaluation on feature selection for text clustering. In ICML, 2003.
[20] A. Maas, R. Daly, P. Pham, D. Huang, A. Y. Ng, and C. Potts. Learning word vectors for sentiment analysis. In ACL, 2011.
[21] H. S. Paskov, R. West, J. C. Mitchell, and T. J. Hastie. Supplementary material for Compressive Feature Learning, 2013.
[22] J. Rennie. 20 Newsgroups dataset, 2008. http://qwone.com/~jason/20Newsgroups (accessed May 31, 2013).
[23] J. Rissanen. Modeling by shortest data description. Automatica, 14(5):465-471, 1978.
[24] D. Sculley and C. E. Brodley. Compression and machine learning: A new perspective on feature space vectors. In DCC, 2006.
[25] R. Tibshirani. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B, 58(1):267-288, 1996.
[26] Y. Yang and J. Pedersen. A comparative study on feature selection in text categorization. In ICML, 1997.
[27] J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani. 1-norm support vector machines. In NIPS, 2004.
[28] J. Ziv and A. Lempel. A universal algorithm for sequential data compression. TIT, 23(3):337-343, 1977.
[29] H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. JCGS, 15(2):265-286, 2006.
-----1
[1] M. Gu and S. C. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Computing, 17(4):848-869, 1996.
[2] C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm for the column subset selection problem. In Claire Mathieu, editor, Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 968-977. SIAM, 2009.
[3] C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix reconstruction, February 2011. arXiv e-print (arXiv:1103.0995).
[4] A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M. W. Mahoney. Feature selection methods for text classification. In Pavel Berkhin, Rich Caruana, and Xindong Wu, editors, KDD, pages 230-239. ACM, 2007.
[5] C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Sparse features for PCA-like linear regression. In John Shawe-Taylor, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger, editors, NIPS, pages 2285-2293, 2011.
[6] V. Guruswami and A. K. Sinop. Optimal column-based low-rank matrix reconstruction. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1207-1214. SIAM, 2012.
[7] Z. Li, Y. Yang, J. Liu, X. Zhou, and H. Lu. Unsupervised feature selection using nonnegative spectral analysis. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, July 22-26, 2012, Toronto, Ontario, Canada. AAAI Press, 2012.
[8] S. Zhang, H.S. Wong, Y. Shen, and D. Xie. A new unsupervised feature ranking method for gene expression data based on consensus affinity. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4):1257-1263, July 2012.
[9] G. H. Golub and C. F. Van-Loan. Matrix computations. The Johns Hopkins University Press, third edition, 1996.
[10] P. Businger and G. H. Golub. Linear least squares solutions by Householder transformations. Numer. Math., 7:269-276, 1965.
[11] A. Civril and M. Magdon-Ismail. Column subset selection via sparse approximation of SVD. Theoretical Computer Science, 421:1-14, March 2012.
[12] A. M. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In IEEE Symposium on Foundations of Computer Science, pages 370-378, 1998.
[13] A. M. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM, 51(6):1025-1041, 2004.
[14] A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2(12):225-247, 2006.
[15] A. Deshpande and L. Rademacher. Efficient volume sampling for row/column subset selection. In FOCS, pages 329-338. IEEE Computer Society Press, 2010.
[16] M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697-702, 2009.
[17] P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13:3441-3472, 2012.
[18] K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input sparsity time. arXiv e-print (arXiv:1207.6365v4), April 2013.
[19] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011.
[20] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to algorithms. MIT Press and McGraw-Hill Book Company, third edition, 2009.
-----1
[1] Walter Rudin. Principles of mathematical analysis. McGraw-Hill, 3rd edition, 1976.
[2] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
[3] Yurii Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, Series B, 140:125-161, 2013.
[4] Francis Bach, Rodolphe Jenatton, Julien Mairal, and Guillaume Obozinski. Structured sparsity through convex optimization. Statistical Science, 27(4):450-468, 2012.
[5] Peng Zhao, Guilherme Rocha, and Bin Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468-3497, 2009.
[6] Seyoung Kim and Eric P. Xing. Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genetics, 5(8):1-18, 2009.
[7] Naum Z. Shor. Minimization Methods for Non-Differentiable Functions. Springer, 1985.
[8] Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
[9] Patrick L. Combettes and Jean-Christophe Pesquet. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pages 185-212. Springer, 2011.
[10] Silvia Villa, Saverio Salzo, Luca Baldassarre, and Alessandro Verri. Accelerated and inexact forward-backward algorithms. SIAM Journal on Optimization, 23(3):1607-1633, 2013.
[11] Heinz H. Bauschke, Rafal Goebel, Yves Lucet, and Xianfu Wang. The proximal average: Basic theory. SIAM Journal on Optimization, 19(2):766-785, 2008.
[12] Robert Tibshirani, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B, 67:91-108, 2005.
[13] Xi Chen, Qihan Lin, Seyoung Kim, Jaime G. Carbonell, and Eric P. Xing. Smoothing proximal gradient method for general structured sparse regression. The Annals of Applied Statistics, 6(2):719-752, 2012.
[14] Ralph Tyrrell Rockafellar and Roger J-B Wets. Variational Analysis. Springer, 1998.
[15] Jean J. Moreau. Proximite et dualite dans un espace Hilbertien. Bulletin de la Societe Mathematique de France, 93:273-299, 1965.
[16] Ralph Tyrrell Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877-898, 1976.
[17] Heinz H. Bauschke and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 1st edition, 2011.
[18] Hua Ouyang, Niao He, Long Q. Tran, and Alexander Gray. Stochastic alternating direction method of multipliers. In International Conference on Machine Learning, 2013.
[19] Taiji Suzuki. Dual averaging and proximal gradient descent for online alternating direction multiplier method. In International Conference on Machine Learning, 2013.
[20] Yaoliang Yu. Fast Gradient Algorithms for Structured Sparsity. PhD thesis, University of Alberta, 2013.
-----1
[1] P. Buhlmann and S. van de Geer. Statistics for High-Dimensional Data. Springer, 2011.
[2] Y. Eldar and G. Kutyniok, editors. Compressed Sensing: Theory and Applications. Cambridge, 2012.
[3] J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. JMLR, 12:3371-3412, 2011.
[4] S. Kim and E. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, 2010.
[5] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. JMLR, 12:2297-2334, 2011.
[6] G. Obozinski and F. Bach. Convex relaxation for combinatorial penalties. Technical Report HAL 00694765, 2012.
[7] P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468-3497, 2009.
[8] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1-106, 2012.
[9] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
[10] Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140:125-161, 2013.
[11] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Convex and network flow optimization for structured sparsity. JMLR, 12:2681-2720, 2011.
[12] J. Mairal and B. Yu. Supervised feature selection in graphs with path coding penalties and network flows. JMLR, 14:2449-2485, 2013.
[13] M. Dudik, Z. Harchaoui, and J. Malick. Lifted coordinate descent for learning with trace-norm regularizations. In AISTATS, 2012.
[14] X. Zhang, Y. Yu, and D. Schuurmans. Accelerated training for matrix-norm regularization: A boosting approach. In NIPS, 2012.
[15] S. Laue. A hybrid algorithm for convex semidefinite optimization. In ICML, 2012.
[16] B. Mishra, G. Meyer, F. Bach, and R. Sepulchre. Low-rank optimization with trace norm penalty. Technical report, 2011. http://arxiv.org/abs/1112.2318.
[17] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
[18] G. Nowak, T. Hastie, J. R. Pollack, and R. Tibshirani. A fused lasso latent feature model for analyzing multi-sample aCGH data. Biostatistics, 12(4):776-791, 2011.
[19] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[20] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805-849, 2012.
[21] F. Bach. Convex analysis and optimization with submodular functions: a tutorial. Technical Report HAL 00527714, 2010.
[22] W. Dinkelbach. On nonlinear fractional programming. Management Science, 13(7), 1967.
[23] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1st edition, 1986.
[24] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B, 67:91-108, 2005.
[25] J. Friedman, T. Hastie, H. Hofling, and R. Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302-332, 2007.
[26] J. Liu, L. Yuan, and J. Ye. An efficient algorithm for a class of fused lasso problems. In Conference on Knowledge Discovery and Data Mining, 2010.
[27] M. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697-702, 2009.
[28] URL http://www.gems-system.or.
[29] URL http://spams-devel.gforge.inria.fr.
[30] URL http://drwn.anu.edu.au/index.html.
[31] M. Van De Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. The New England Journal of Medicine, 347(25):1999-2009, 2002.
[32] H. Chuang, E. Lee, Y. Liu, D. Lee, and T. Ideker. Network-based classification of breast cancer metastasis. Molecular Systems Biology, 3(140), 2007.
-----1
[1] Y. Amit, M. Fink, N. Srebro, and S. Ullman. Uncovering Shared Structures in Multiclass Classification. In Proc. 24th ICML, pages 17-24, 2007.
[2] A. Argyriou, T. Evgeniou, and M. Pontil. Convex Multi-Task Feature Learning. Mach. Learn., 73(3):243-272, 2008.
[3] A. Beck and M. Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci., 2(1):183-202, 2009.
[4] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania, 2001.
[5] M. Fazel, H. Hindi, and S. P. Boyd. A Rank Minimization Heuristic with Application to Minimum Order System Approximation. In Proc. 2001 ACC, pages 4734-4739, 2001.
[6] D. Gross. Recovering Low-Rank Matrices from Few Coefficients in Any Basis. IEEE Trans. Inf. Theory, 57(3):1548-1566, 2011.
[7] S. Ji, K.-F. Sze, Z. Zhou, A. M.-C. So, and Y. Ye. Beyond Convex Relaxation: A Polynomial-Time Non-Convex Optimization Approach to Network Localization. In Proc. 32nd IEEE INFOCOM, pages 2499-2507, 2013.
[8] S. Ji and J. Ye. An Accelerated Gradient Method for Trace Norm Minimization. In Proc. 26th ICML, pages 457-464, 2009.
[9] V. Koltchinskii, K. Lounici, and A. B. Tsybakov. Nuclear-Norm Penalization and Optimal Rates for Noisy Low-Rank Matrix Completion. Ann. Stat., 39(5):2302-2329, 2011.
[10] Z.-Q. Luo and P. Tseng. Error Bounds and Convergence Analysis of Feasible Descent Methods: A General Approach. Ann. Oper. Res., 46(1):157-178, 1993.
[11] S. Ma, D. Goldfarb, and L. Chen. Fixed Point and Bregman Iterative Methods for Matrix Rank Minimization. Math. Program., 128(1-2):321-353, 2011.
[12] Yu. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston, 2004.
[13] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization. SIAM Rev., 52(3):471-501, 2010.
[14] R. T. Rockafellar. Convex Analysis. Princeton Landmarks in Mathematics and Physics. Princeton University Press, Princeton, New Jersey, 1997.
[15] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin Heidelberg, second edition, 2004.
[16] M. Schmidt, N. Le Roux, and F. Bach. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization. In Proc. NIPS 2011, pages 1458-1466, 2011.
[17] A. M.-C. So, Y. Ye, and J. Zhang. A Unified Theorem on SDP Rank Reduction. Math. Oper. Res., 33(4):910-920, 2008.
[18] W. So. Facial Structures of Schatten p-Norms. Linear and Multilinear Algebra, 27(3):207-212, 1990.
[19] K.-C. Toh and S. Yun. An Accelerated Proximal Gradient Algorithm for Nuclear Norm Regularized Linear Least Squares Problems. Pac. J. Optim., 6(3):615-640, 2010.
[20] R. Tomioka and K. Aihara. Classifying Matrices with a Spectral Regularization. In Proc. of the 24th ICML, pages 895-902, 2007.
[21] R. Tomioka, T. Suzuki, M. Sugiyama, and H. Kashima. A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices. In Proc. 27th ICML, pages 1087-1094, 2010.
[22] P. Tseng. Approximation Accuracy, Gradient Methods, and Error Bound for Structured Convex Optimization. Math. Program., 125(2):263-295, 2010.
[23] P. Tseng and S. Yun. A Coordinate Gradient Descent Method for Nonsmooth Separable Minimization. Math. Program., 117(1-2):387-423, 2009.
[24] M. White, Y. Yu, X. Zhang, and D. Schuurmans. Convex Multi-View Subspace Learning. In Proc. NIPS 2012, pages 1682-1690, 2012.
[25] H. Zhang, J. Jiang, and Z.-Q. Luo. On the Linear Convergence of a Proximal Gradient Method for a Class of Nonsmooth Convex Minimization Problems. J. Oper. Res. Soc. China, 1(2):163-186, 2013.
-----0
C.J. Hsieh, K.W. Chang, C.J. Lin, S.S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In ICML, pages 408-415, 2008.
Nicolas Le Roux, Mark Schmidt, and Francis Bach. A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets. arXiv preprint arXiv:1202.6258, 2012.
Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, 2004.
Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. arXiv preprint arXiv:1309.2388, 2013.
S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In International Conference on Machine Learning, pages 807-814, 2007.
Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. arXiv preprint arXiv:1209.1873, 2012.
T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the Twenty-First International Conference on Machine Learning, 2004.
-----0
Alekh Agarwal and John C Duchi. Distributed delayed stochastic optimization. In Decision and Control (CDC), 2012 IEEE 51st Annual Conference on, pages 5451-5452. IEEE, 2012.
Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, and John Langford. A reliable effective terascale linear learning system. arXiv preprint arXiv:1110.4198, 2011.
Maria-Florina Balcan, Avrim Blum, Shai Fine, and Yishay Mansour. Distributed learning, communication complexity and privacy. arXiv preprint arXiv:1204.3514, 2012.
Joseph K Bradley, Aapo Kyrola, Danny Bickson, and Carlos Guestrin. Parallel coordinate descent for l1-regularized loss minimization. In ICML, 2011.
Andrew Cotter, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Better mini-batch algorithms via accelerated gradient methods. arXiv preprint arXiv:1106.4574, 2011.
Hal Daume III, Jeff M Phillips, Avishek Saha, and Suresh Venkatasubramanian. Protocols for learning classifiers on distributed data. arXiv preprint arXiv:1202.6078, 2012.
Ofer Dekel. Distribution-calibrated hierarchical classification. In NIPS, 2010.
Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, and Lin Xiao. Optimal distributed online prediction using mini-batches. The Journal of Machine Learning Research, 13:165-202, 2012.
2 It should be noted that one can use our results for Lipschitz functions as well by smoothing the loss function (see Nesterov [2005]). By doing so, we can interpolate between the $1/\epsilon^2$ rate of the non-accelerated method and the $1/\epsilon$ rate of accelerated gradient.
3 There are a few exceptions in the context of stochastic coordinate descent in the primal. See, for example, Bradley et al. [2011], Richtarik and Takac [2012b].
John Duchi, Alekh Agarwal, and Martin J Wainwright. Distributed dual averaging in networks. Advances in Neural Information Processing Systems, 23, 2010.
Olivier Fercoq and Peter Richtarik. Smooth minimization of nonsmooth functions with parallel coordinate descent methods. arXiv preprint arXiv:1309.5885, 2013.
Nicolas Le Roux, Mark Schmidt, and Francis Bach. A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets. arXiv preprint arXiv:1202.6258, 2012.
Phil Long and Rocco Servedio. Algorithms and hardness results for parallel large margin learning. In NIPS, 2011.
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M Hellerstein. Graphlab: A new framework for parallel machine learning. arXiv preprint arXiv:1006.4990, 2010.
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 5(8):716-727, 2012.
Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
Yurii Nesterov. Gradient methods for minimizing composite objective function, 2007.
Feng Niu, Benjamin Recht, Christopher Re, and Stephen J Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. arXiv preprint arXiv:1106.5730, 2011.
Peter Richtarik and Martin Takac. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, pages 1-38, 2012a.
Peter Richtarik and Martin Takac. Parallel coordinate descent methods for big data optimization. arXiv preprint arXiv:1212.0873, 2012b.
Peter Richtarik and Martin Takac. Distributed coordinate descent method for learning with big data. arXiv preprint arXiv:1310.2059, 2013.
Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14:567-599, Feb 2013a.
Shai Shalev-Shwartz and Tong Zhang. Accelerated mini-batch stochastic dual coordinate ascent. arXiv, 2013b.
Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In ICML, pages 807-814, 2007.
Martin Takac, Avleen Bijral, Peter Richtarik, and Nathan Srebro. Mini-batch primal and dual methods for SVMs. arXiv, 2013.
-----1
[1] P. Auer and C. Gentile. Adaptive and self-confident online learning algorithms. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, 2000.
[2] P. Buhlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, 2011.
[3] N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050-2057, September 2004.
[4] J. C. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.
[5] E. Hazan. The convex optimization approach to regret minimization. In Optimization for Machine Learning, chapter 10. MIT Press, 2012.
[6] J. Hiriart-Urruty and C. Lemarechal. Convex Analysis and Minimization Algorithms I & II. Springer, New York, 1996.
[7] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker. Identifying malicious URLs: An application of large-scale online learning. In Proceedings of the 26th International Conference on Machine Learning, 2009.
[8] C. Manning and H. Schutze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[9] B. McMahan and M. Streeter. Adaptive bound optimization for online convex optimization. In Proceedings of the Twenty Third Annual Conference on Computational Learning Theory, 2010.
[10] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574-1609, 2009.
[11] Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):261-283, 2009.
[12] F. Niu, B. Recht, C. Re, and S. Wright. Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24, 2011.
[13] P. Richtarik and M. Takac. Parallel coordinate descent methods for big data optimization. arXiv:1212.0873 [math.OC], 2012. URL http://arxiv.org/abs/1212.0873.
[14] M. Takac, A. Bijral, P. Richtarik, and N. Srebro. Mini-batch primal and dual methods for SVMs. In Proceedings of the 30th International Conference on Machine Learning, 2013.
-----1
[1] A. Agarwal, P. L. Bartlett, P. Ravikumar, and M. J. Wainwright. Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization. IEEE Transactions on Information Theory, 58(5):3235-3249, 2012.
[2] D. P. Bertsekas. A new class of incremental gradient methods for least squares problems. SIAM Journal on Optimization, 7(4):913-926, 1997.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[5] M. Friedlander and M. Schmidt. Hybrid deterministic-stochastic methods for data fitting. SIAM Journal on Scientific Computing, 34(3):A1380-A1405, 2012.
[6] S. Ghadimi and G. Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: a generic algorithmic framework. SIAM Journal on Optimization, 22(4):1469-1492, 2012.
[7] A. Gittens and J. A. Tropp. Tail bounds for all eigenvalues of a sum of random matrices. ArXiv e-prints, arXiv:1104.4513, 2011.
[8] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York, 2009.
[9] E. Hazan and S. Kale. Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. In Proceedings of the 24th Annual Conference on Learning Theory, pages 421-436, 2011.
[10] A. Juditsky and Y. Nesterov. Primal-dual subgradient methods for minimizing uniformly convex functions. Technical report, 2010.
[11] G. Lan. An optimal method for stochastic composite optimization. Mathematical Programming, 133:365-397, 2012.
[12] K. Marti. On solutions of stochastic programming problems by descent procedures with stochastic and deterministic directions. Methods of Operations Research, 33:281-293, 1979.
[13] K. Marti and E. Fuchs. Rates of convergence of semi-stochastic approximation procedures for solving stochastic optimization problems. Optimization, 17(2):243-265, 1986.
[14] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574-1609, 2009.
[15] A. Nemirovski and D. B. Yudin. Problem complexity and method efficiency in optimization. John Wiley & Sons Ltd, 1983.
[16] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Doklady AN SSSR (translated as Soviet. Math. Docl.), 269:543-547, 1983.
[17] Y. Nesterov. Introductory lectures on convex optimization: a basic course, volume 87 of Applied optimization. Kluwer Academic Publishers, 2004.
[18] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
[19] Y. Nesterov. Gradient methods for minimizing composite objective function. Core discussion papers, 2007.
[20] D. P. Palomar and Y. C. Eldar, editors. Convex Optimization in Signal Processing and Communications. Cambridge University Press, 2010.
[21] A. Rakhlin, O. Shamir, and K. Sridharan. Making gradient descent optimal for strongly convex stochastic optimization. In Proceedings of the 29th International Conference on Machine Learning, pages 449-456, 2012.
[22] N. L. Roux, M. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems 25, pages 2672-2680, 2012.
[23] S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14:567-599, 2013.
[24] S. Sra, S. Nowozin, and S. J. Wright, editors. Optimization for Machine Learning. The MIT Press, 2011.
[25] Q. Wu and D.-X. Zhou. SVM soft margin classifiers: Linear programming versus quadratic programming. Neural Computation, 17(5):1160-1187, 2005.
[26] L. Zhang, T. Yang, R. Jin, and X. He. O(log T) projections for stochastic optimization of smooth and strongly convex functions. In Proceedings of the 30th International Conference on Machine Learning (ICML), pages 621-629, 2013.
-----1
[1] A. Agarwal, P. L. Bartlett, P. D. Ravikumar, and M. J. Wainwright. Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization. IEEE Transactions on Information Theory, 58(5):3235-3249, 2012.
[2] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett., 31(3):167-175, 2003.
[3] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In NIPS, pages 161-168, 2008.
[4] S. Boucheron, G. Lugosi, and O. Bousquet. Concentration inequalities. In Advanced Lectures on Machine Learning, pages 208-240, 2003.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[6] R. H. Byrd, G. M. Chin, J. Nocedal, and Y. Wu. Sample size selection in optimization methods for machine learning. Mathematical programming, 134(1):127-155, 2012.
[7] A. Cotter, O. Shamir, N. Srebro, and K. Sridharan. Better mini-batch algorithms via accelerated gradient methods. In NIPS, pages 1647-1655, 2011.
[8] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. The Journal of Machine Learning Research, 13:165-202, 2012.
[9] M. P. Friedlander and M. Schmidt. Hybrid deterministic-stochastic methods for data fitting. SIAM Journal on Scientific Computing, 34(3):A1380-A1405, 2012.
[10] E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.
[11] E. Hazan and S. Kale. Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. Journal of Machine Learning Research - Proceedings Track, 19:421-436, 2011.
[12] Q. Lin, X. Chen, and J. Pena. A smoothing stochastic gradient method for composite optimization. arXiv preprint arXiv:1008.5204, 2010.
[13] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. on Optimization, 19:1574-1609, 2009.
[14] A. S. Nemirovsky and D. B. Yudin. Problem complexity and method efficiency in optimization. 1983.
[15] Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k^2). In Soviet Mathematics Doklady, volume 27, pages 372-376, 1983.
[16] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.
[17] Y. Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16(1):235-249, 2005.
[18] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1):127-152, 2005.
[19] A. Rakhlin, O. Shamir, and K. Sridharan. Making gradient descent optimal for strongly convex stochastic optimization. In ICML, 2012.
[20] N. L. Roux, M. W. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In NIPS, pages 2672-2680, 2012.
[21] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, pages 807-814, 2007.
[22] S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. JMLR, 14:567-599, 2013.
[23] O. Shamir and T. Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. ICML, 2013.
[24] L. Zhang, T. Yang, R. Jin, and X. He. O(log T) projections for stochastic optimization of smooth and strongly convex functions. ICML, 2013.
-----1
[1] F. B. Abdelaziz. Solution approaches for the multiobjective stochastic programming. European Journal of Operational Research, 216(1):1-16, 2012.
[2] F. B. Abdelaziz, B. Aouni, and R. E. Fayedh. Multi-objective stochastic programming for portfolio selection. European Journal of Operational Research, 177(3):1811-1823, 2007.
[3] A. Agarwal, P. L. Bartlett, P. D. Ravikumar, and M. J. Wainwright. Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization. IEEE Transactions on Information Theory, 58(5):3235-3249, 2012.
[4] F. Bach and E. Moulines. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In NIPS, pages 451-459, 2011.
[5] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust optimization. Princeton University Press, 2009.
[6] S. Boucheron, G. Lugosi, and O. Bousquet. Concentration inequalities. In Advanced Lectures on Machine Learning, pages 208-240, 2003.
[7] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[8] R. Caballero, E. Cerda, M. del Mar Munoz, and L. Rey. Stochastic approach versus multiobjective approach for obtaining efficient solutions in stochastic multiobjective programming problems. European Journal of Operational Research, 158(3):633-648, 2004.
[9] M. Ehrgott. Multicriteria optimization. Springer, 2005.
[10] E. Hazan and S. Kale. Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. Journal of Machine Learning Research - Proceedings Track, 19:421-436, 2011.
[11] K.-J. Hsiao, K. S. Xu, J. Calder, and A. O. Hero III. Multi-criteria anomaly detection using Pareto depth analysis. In NIPS, pages 854-862, 2012.
[12] Y. Jin and B. Sendhoff. Pareto-based multiobjective machine learning: An overview and case studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 38(3):397-415, 2008.
[13] M. Mahdavi, R. Jin, and T. Yang. Trading regret for efficiency: online convex optimization with long term constraints. JMLR, 13:2465-2490, 2012.
[14] S. Mannor, J. N. Tsitsiklis, and J. Y. Yu. Online learning with sample path constraints. Journal of Machine Learning Research, 10:569-590, 2009.
[15] H. Markowitz. Portfolio selection. The Journal of Finance, 7(1):77-91, 1952.
[16] A. Nemirovski. Efficient methods in convex programming. Lecture Notes, available at http://www2.isye.gatech.edu/~nemirovs, 1994.
[17] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. on Optimization, 19:1574-1609, 2009.
[18] A. Rakhlin, O. Shamir, and K. Sridharan. Making gradient descent optimal for strongly convex stochastic optimization. In ICML, 2012.
[19] P. Rigollet and X. Tong. Neyman-Pearson classification, convexity and stochastic constraints. The Journal of Machine Learning Research, 12:2831-2855, 2011.
[20] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107-194, 2012.
[21] S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Stochastic convex optimization. In COLT, 2009.
[22] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, pages 807-814, 2007.
[23] K. Sridharan. Learning from an optimization viewpoint. PhD Thesis, 2012.
[24] K. M. Svore, M. N. Volkovs, and C. J. Burges. Learning to rank with multiple objective functions. In WWW, pages 367-376. ACM, 2011.
[25] H. Xu and F. Meng. Convergence analysis of sample average approximation methods for a class of stochastic mathematical programs with equality constraints. Mathematics of Operations Research, 32(3):648-668, 2007.
[26] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928-936, 2003.
-----1
[1] J. B. Lasserre. A semidefinite programming approach to the generalized problem of moments. Math. Programming, 112:65-92, 2008.
[2] A. Schied. Optimal investments for robust utility functionals in complete market models. Math. Oper. Research, 30(3):750-764, 2005.
[3] E. Delage and Y. Ye. Distributionally robust optimization under moment uncertainty with applications to data-driven problems. Operations Research, 2009.
[4] D. Bertsimas, X. V. Doan, K. Natarajan, and C.-P. Teo. Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res., 35(3):580-602, 2010.
[5] S. Mehrotra and H. Zhang. Models and algorithms for distributionally robust least squares problems. Preprint, 2011.
[6] D. Bertsimas and I. Popescu. Optimal inequalities in probability theory: a convex optimization approach. SIAM J. Optimization, 15:780-804, 2000.
[7] P. R. Halmos. The theory of unbiased estimation. The Annals of Mathematical Statistics, 17(1):34-43, 1946.
[8] B.W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1998.
[9] L. Devroye and L. Györfi. Nonparametric Density Estimation. Wiley, 1985.
[10] P. Hall. On the rate of convergence of orthogonal series density estimators. Journal of the Royal Statistical Society. Series B, 48(1):115-122, 1986.
[11] G. Pflug and D. Wozabal. Ambiguity in portfolio selection. Quantitative Finance, 7(4):435-442, 2007.
[12] A. Shapiro and S. Ahmed. On a class of minimax stochastic programs. SIAM J. Optim., 14(4):1237-1249, 2004.
[13] A. Ben-Tal, D. den Hertog, A. de Waegenaere, B. Melenberg, and G. Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 2012.
[14] H. Xu and S. Mannor. Distributionally robust Markov decision processes. Mathematics of Operations Research, 37(2):288-300, 2012.
[15] R. Laraki and J. B. Lasserre. Semidefinite programming for min-max problems and games. Math. Programming A, 131:305-332, 2010.
[16] P. Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Annals of Probability, 18(3):1269-1283, 1990.
[17] B. J. Eck and M. Mevissen. Valve placement in water networks. Technical report, IBM Research, 2012. Report No. RC25307 (IRE1209-014).
[18] A. Sterling and A. Bargiela. Leakage reduction by optimised control of valves in water networks. Transactions of the Institute of Measurement and Control, 6(6):293-298, 1984.
[19] H. Waki, S. Kim, M. Kojima, M. Muramatsu, and H. Sugimoto. SparsePOP: a sparse semidefinite programming relaxation of polynomial optimization problems. ACM Transactions on Mathematical Software, 35(2), 2008.
[20] M. Yamashita, K. Fujisawa, K. Nakata, M. Nakata, M. Fukuda, K. Kobayashi, and K. Goto. A high-performance software package for semidefinite programs: SDPA 7. Technical report, Tokyo Institute of Technology, 2010.
[21] A. Waechter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25-57, 2006.
[22] M. Schweighofer. Optimization of polynomials on compact semialgebraic sets. SIAM J. Optimization, 15:805-825, 2005.
-----1
[1] I. U. Rahman, I. Drori, V. C. Stodden, and D. L. Donoho. Multiscale representations for manifold-valued data. SIAM J. Multiscale Model, 4:1201-1232, 2005.
[2] W.K. Allard, G. Chen, and M. Maggioni. Multiscale geometric methods for data sets II: geometric wavelets. Applied and Computational Harmonic Analysis, 32:435-462, 2012.
[3] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixture of local experts. Neural Computation, 3:79-87, 1991.
[4] W. X. Jiang and M. A. Tanner. Hierarchical mixtures-of-experts for exponential family regression models: approximation and maximum likelihood estimation. Annals of Statistics, 27:987-1011, 1999.
[5] J. Q. Fan, Q. W. Yao, and H. Tong. Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83:189-206, 1996.
[6] M. P. Holmes, G. A. Gray, and C. L. Isbell. Fast kernel conditional density estimation: a dual-tree Monte Carlo approach. Computational Statistics & Data Analysis, 54:1707-1718, 2010.
[7] G. Fu, F. Y. Shih, and H. Wang. A kernel-based parametric method for conditional density estimation. Pattern Recognition, 44:284-294, 2011.
[8] D. J. Nott, S. L. Tan, M. Villani, and R. Kohn. Regression density estimation with variational methods and stochastic approximation. Journal of Computational and Graphical Statistics, 21:797-820, 2012.
[9] M. N. Tran, D. J. Nott, and R. Kohn. Simultaneous variable selection and component selection for regression density estimation with mixtures of heteroscedastic experts. Electronic Journal of Statistics, 6:1170-1199, 2012.
[10] A. Norets and J. Pelenis. Bayesian modeling of joint and conditional distributions. Journal of Econometrics, 168:332-346, 2012.
[11] J. E. Griffin and M. F. J. Steel. Order-based dependent Dirichlet processes. Journal of the American Statistical Association, 101:179-194, 2006.
[12] D. B. Dunson, N. Pillai, and J. H. Park. Bayesian density regression. Journal of the Royal Statistical Society Series B-Statistical Methodology, 69:163-183, 2007.
[13] Y. Chung and D. B. Dunson. Nonparametric Bayes conditional distribution modeling with variable selection. Journal of the American Statistical Association, 104:1646-1660, 2009.
[14] S. T. Tokdar, Y. M. Zhu, and J. K. Ghosh. Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis, 5:319-344, 2010.
[15] I. Mossavat and O. Amft. Sparse Bayesian hierarchical mixture of experts. IEEE Statistical Signal Processing Workshop (SSP), 2011.
[16] Isabelle Guyon and Andre Elisseeff. An introduction to variable and feature selection. The Journal of Machine Learning Research, 3:1157-1182, 2003.
[17] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359-392, 1999.
[18] G. Chen, M. Iwen, S. Chin, and M. Maggioni. A fast multiscale framework for data in high-dimensions: Measure estimation, anomaly detection, and compressive measurements. In VCIP, 2012 IEEE, 2012.
[19] Ingrid Daubechies. Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series in Applied Mathematics). SIAM: Society for Industrial and Applied Mathematics, 1992.
[20] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639-650, 1994.
[21] Didier Chauveau and Jean Diebolt. An automated stopping rule for MCMC convergence assessment. Computational Statistics, 14:419-442, 1998.
[22] R. Arden, R. S. Chavez, R. Grazioplene, and R. E. Jung. Neuroimaging creativity: a psychometric view. Behavioural Brain Research, 214:143-156, 2010.
[23] R.E. Jung, R. Grazioplene, A. Caprihan, R.S. Chavez, and R.J. Haier. White matter integrity, creativity, and psychopathology: Disentangling constructs with diffusion tensor imaging. PloS one, 5(3):e9818, 2010.
[24] W.R. Gray, J.A. Bogovic, J.T. Vogelstein, B.A. Landman, J. L. Prince, and R.J. Vogelstein. Magnetic resonance connectome automated pipeline: an overview. IEEE pulse, 3(2):42-8, March 2010.
[25] Susumu Mori and Jiangyang Zhang. Principles of diffusion tensor imaging and its applications to basic neuroscience research. Neuron, 51(5):527-39, September 2006.
[26] ABIDE. http://fcon_1000.projects.nitrc.org/indi/abide/.
[27] S. Sikka, J.T. Vogelstein, and M.P. Milham. Towards Automated Analysis of Connectomes: The Configurable Pipeline for the Analysis of Connectomes (C-PAC). Neuroinformatics, 2012.
[28] Q-H. Zou, C-Z. Zhu, Y. Yang, X-N. Zuo, X-Y. Long, Q-J. Cao, Y-F. Wang, and Y-F. Zang. An improved approach to detection of amplitude of low-frequency fluctuation (ALFF) for resting-state fMRI: fractional ALFF. Journal of Neuroscience Methods, 172(1):137-141, July 2008.
[29] J. D. Power, K. A. Barnes, C. J. Stone, and R. A. Olshen. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. Neuroimage, 59:2142-2154, 2012.
[30] Leo Breiman. Statistical Modeling: The Two Cultures. Statistical Science, 16(3):199-231, 2001.
-----1
[1] G. Beer. Topologies on Closed and Closed Convex Sets. Springer, 1993.
[2] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373-1396, 2003.
[3] Y. Bengio, O. Delalleau, N.L. Roux, J.F. Paiement, P. Vincent, and M. Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197-2219, 2004.
[4] Y. Bengio, J.F. Paiement, et al. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. Advances in Neural Information Processing Systems, 16:177-184, 2004.
[5] S. Bernstein. The Theory of Probabilities. Gastehizdat Publishing House, Moscow, 1946.
[6] G. Blanchard, O. Bousquet, and L. Zwald. Statistical properties of kernel principal component analysis. Machine Learning, 66(2):259-294, 2007.
[7] I. Borg and P.J.F. Groenen. Modern multidimensional scaling: Theory and applications. Springer, 2005.
[8] Ernesto De Vito, Lorenzo Rosasco, et al. Learning sets with separating kernels. arXiv:1204.3573, 2012.
[9] Ernesto De Vito, Lorenzo Rosasco, and Alessandro Toigo. Spectral regularization for support estimation. Advances in Neural Information Processing Systems, NIPS Foundation, pages 1-9, 2010.
[10] D.L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591-5596, 2003.
[11] J. Ham, D.D. Lee, S. Mika, and B. Scholkopf. A kernel view of the dimensionality reduction of manifolds. In Proceedings of the twenty-first international conference on Machine learning, page 47. ACM, 2004.
[12] I. Jolliffe. Principal component analysis. Wiley Online Library, 2005.
[13] Andreas Maurer and Massimiliano Pontil. K-dimensional coding schemes in Hilbert spaces. IEEE Transactions on Information Theory, 56(11):5839-5846, 2010.
[14] Iosif Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. The Annals of Probability, pages 1679-1706, 1994.
[15] J.R. Retherford. Hilbert Space: Compact Operators and the Trace Theorem. London Mathematical Society Student Texts. Cambridge University Press, 1993.
[16] S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323-2326, 2000.
[17] L.K. Saul and S.T. Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. The Journal of Machine Learning Research, 4:119-155, 2003.
[18] B. Scholkopf, A. Smola, and K.R. Muller. Kernel principal component analysis. Artificial Neural Networks-ICANN'97, pages 583-588, 1997.
[19] J. Shawe-Taylor, C. K. Williams, N. Cristianini, and J. Kandola. On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. Information Theory, IEEE Transactions on, 51(7), 2005.
[20] I. Steinwart and A. Christmann. Support vector machines. Information science and statistics. Springer-Verlag, New York, 2008.
[21] J. Sun, S. Boyd, L. Xiao, and P. Diaconis. The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Review, 48(4):681-699, 2006.
[22] J.B. Tenenbaum, V. De Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, 2000.
[23] J.A. Tropp. User-friendly tools for random matrices: An introduction. 2012.
[24] K.Q. Weinberger and L.K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 2, pages II-988. IEEE, 2004.
[25] K.Q. Weinberger and L.K. Saul. Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision, 70(1):77-90, 2006.
[26] C.K.I. Williams. On a connection between kernel PCA and metric multidimensional scaling. Machine Learning, 46(1):11-19, 2002.
-----1
[1] F. R. Bach and M. I. Jordan. Predictive low-rank decomposition for kernel methods. In Proceedings of the 22nd international conference on Machine learning - ICML '05, pages 33-40, New York, New York, USA, 2005. ACM Press.
[2] E. D. Boer and P. Kuyper. Triggered Correlation, 1968.
[3] N. Brenner, W. Bialek, and R. De Ruyter Van Steveninck. Adaptive rescaling maximizes information transmission. Neuron, 26(3):695-702, 2000.
[4] E. J. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst, 12:199-213, 2001.
[5] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research, 5(1):73-99, 2004.
[6] K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel dimension reduction in regression. Annals of Statistics, 37(4):1871-1905, 2009.
[7] A. Gretton, K. M. Borgwardt, M. Rasch, B. Scholkopf, and A. Smola. A kernel method for the two sample problem. In B. Scholkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 513-520, Cambridge, MA, 2007. MIT Press.
[8] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Scholkopf, and A. Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13:723-773, 2012.
[9] A. Gretton, O. Bousquet, A. Smola, and B. Scholkopf. Measuring Statistical Dependence with Hilbert-Schmidt Norms. In S. Jain, H. U. Simon, and E. Tomita, editors, Advances in Neural Information Processing Systems, pages 63-77. Springer Berlin / Heidelberg, 2005.
[10] A. Gretton, K. Fukumizu, Z. Harchaoui, and B. K. Sriperumbudur. A Fast, Consistent Kernel Two-Sample Test. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems, pages 673-681. Curran, Red Hook, NY, USA, 2009.
[11] J. D. Hunter. Matplotlib: A 2D graphics environment. Computing In Science & Engineering, 9(3):90-95, 2007.
[12] J. Macke, G. Zeck, and M. Bethge. Receptive Fields without Spike-Triggering. Advances in Neural Information Processing Systems 20, pages 1-8, 2007.
[13] J. H. Manton. Optimization algorithms exploiting unitary constraints. Signal Processing, IEEE Transactions on, 50(3):635-650, 2002.
[14] P. Z. Marmarelis and K. Naka. White-noise analysis of a neuron chain: an application of the Wiener theory. Science, 175(27):1276-1278, 1972.
[15] P McCullagh and J A Nelder. Generalized Linear Models, Second Edition. Chapman and Hall, 1989.
[16] T. P. Minka. Old and New Matrix Algebra Useful for Statistics. MIT Media Lab Note, pages 1-19, 2000.
[17] A. Muller. Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability, 29(2):429-443, 1997.
[18] L. Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4):243-262, 2004.
[19] J. W. Pillow and E. P. Simoncelli. Dimensionality reduction in neural models: an information-theoretic generalization of spike-triggered average and covariance analysis. Journal of Vision, 6(4):414-428, 2006.
[20] H. Scheich, T. H. Bullock, and R. H. Hamstra. Coding properties of two classes of afferent nerve fibers: high-frequency electroreceptors in the electric fish, Eigenmannia. Journal of Neurophysiology, 36(1):39-60, 1973.
[21] B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, volume 98 of Adaptive computation and machine learning. MIT Press, 2001.
[22] T. Sharpee, N. C. Rust, and W. Bialek. Analyzing neural responses to natural signals: maximally informative dimensions. Neural Computation, 16(2):223-250, 2004.
[23] A. Smola, A. Gretton, L. Song, and B. Scholkopf. A Hilbert Space Embedding for Distributions. In Algorithmic Learning Theory: 18th International Conference, pages 13-31. Springer-Verlag, Berlin/Heidelberg, 2007.
[24] L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt. Feature selection via dependence maximization. Journal of Machine Learning Research, 13(May):1393-1434, 2012.
[25] B. K. Sriperumbudur, K. Fukumizu, A. Gretton, and G. R. G. Lanckriet. On Integral Probability Metrics, phi-divergences and binary classification. Technical Report 1, arXiv, 2009.
[26] B. K. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Scholkopf. Injective Hilbert Space Embeddings of Probability Measures. In Proceedings of the 21st Annual Conference on Learning Theory, number i, pages 111-122. Omnipress, 2008.
[27] B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Scholkopf, and G. R. G. Lanckriet. Hilbert Space Embeddings and Metrics on Probability Measures. Journal of Machine Learning Research, 11(1):48, 2010.
[28] R. S. Williamson, M. Sahani, and J. W. Pillow. Equating information-theoretic and likelihood-based methods for neural dimensionality reduction. Technical Report 1, arXiv, 2013.
-----1
[1] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Trans. Inform. Theory, 51:4203, 2005.
[2] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52:1289, 2006.
[3] B. C. Ng and C. M. S. See. Sensor-array calibration using a maximum-likelihood approach. IEEE Transactions on Antennas and Propagation, 44(6):827-835, 1996.
[4] Z. Yang, C. Zhang, and L. Xie. Robustly stable signal recovery in compressed sensing with structured matrix perturbation. IEEE Transactions on Signal Processing, 60(9):4658-4671, 2012.
[5] R. Mignot, L. Daudet, and F. Ollivier. Compressed sensing for acoustic response reconstruction: Interpolation of the early part. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 225-228, 2011.
[6] T. Ragheb, J. N. Laska, H. Nejati, S. Kirolos, R. G. Baraniuk, and Y. Massoud. A prototype hardware for random demodulation based compressive analog-to-digital conversion. In 51st Midwest Symposium on Circuits and Systems (MWSCAS), pages 37-40. IEEE, 2008.
[7] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk. Beyond Nyquist: Efficient sampling of sparse bandlimited signals. IEEE Trans. Inform. Theory, 56(1):520-544, 2010.
[8] P. J. Pankiewicz, T. Arildsen, and T. Larsen. Model-based calibration of filter imperfections in the random demodulator for compressive sensing. arXiv:1303.6135, 2013.
[9] R. Gribonval, G. Chardon, and L. Daudet. Blind calibration for compressed sensing by convex optimization. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2713-2716, 2012.
[10] C. Bilen, G. Puy, R. Gribonval, and L. Daudet. Blind sensor calibration in sparse recovery using convex optimization. In 10th Int. Conf. on Sampling Theory and Applications, 2013.
[11] D. L. Donoho and J. Tanner. Sparse nonnegative solution of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci., 102(27):9446-9451, 2005.
[12] D. L. Donoho, A. Maleki, and A. Montanari. Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci., 106(45):18914-18919, 2009.
[13] D.L. Donoho, A. Maleki, and A. Montanari. Message passing algorithms for compressed sensing: I. Motivation and construction. In IEEE Information Theory Workshop (ITW), pages 1-5, 2010.
[14] S. Rangan. Generalized approximate message passing for estimation with random linear mixing. In Proc. of the IEEE Int. Symp. on Inform. Theory (ISIT), pages 2168-2172, 2011.
[15] F. Krzakala, M. Mezard, F. Sausset, Y.F. Sun, and L. Zdeborova. Statistical physics-based reconstruction in compressed sensing. Phys. Rev. X, 2:021005, 2012.
[16] D. L. Donoho, A. Javanmard, and A. Montanari. Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing. In Proc. of the IEEE Int. Symposium on Information Theory (ISIT), pages 1231-1235, 2012.
[17] J. Boutros and G. Caire. Iterative multiuser joint decoding: Unified framework and asymptotic analysis. IEEE Trans. Inform. Theory, 48(7):1772-1793, 2002.
[18] Y. Kabashima. A CDMA multiuser detection algorithm on the basis of belief propagation. J. Phys. A: Math. and Gen., 36(43):11111, 2003.
[19] U. S. Kamilov, A. Bourquard, E. Bostan, and M. Unser. Autocalibrated signal reconstruction from linear measurements using adaptive GAMP. Online preprint, 2013.
[20] F. Krzakala, M. Mezard, and L. Zdeborova. Phase diagram and approximate message passing for blind calibration and dictionary learning. ISIT 2013, arXiv:1301.5898, 2013.
[21] F. Krzakala, M. Mezard, F. Sausset, Y.F. Sun, and L. Zdeborova. Probabilistic reconstruction in compressed sensing: Algorithms, phase diagrams, and threshold achieving matrices. J. Stat. Mech., P08009, 2012.
[22] http://aspics.krzakala.org/.
[23] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx, 2012.
-----1
[BC13] A. Belloni and V. Chernozhukov, Least Squares after Model Selection in High-Dimensional Sparse Models, Bernoulli (2013).
[BdG11] P. Buhlmann and S. Van de Geer, Statistics for high-dimensional data, Springer-Verlag Berlin Heidelberg, 2011.
[BEM13] M. Bayati, M. A. Erdogdu, and A. Montanari, Estimating LASSO Risk and Noise Level, long version (in preparation), 2013.
[BM12a] M. Bayati and A. Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Trans. on Inform. Theory 57 (2012), 764-785.
[BM12b] M. Bayati and A. Montanari, The LASSO risk for Gaussian matrices, IEEE Trans. on Inform. Theory 58 (2012).
[BRT09] P. Bickel, Y. Ritov, and A. Tsybakov, Simultaneous Analysis of Lasso and Dantzig Selector, The Annals of Statistics 37 (2009), 1705-1732.
[BS05] Z. Bai and J. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, Springer, 2005.
[BT09] A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM J. Imaging Sciences 2 (2009), 183-202.
[BY93] Z. D. Bai and Y. Q. Yin, Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix, The Annals of Probability 21 (1993), 1275-1294.
[CD95] S. S. Chen and D. L. Donoho, Examples of basis pursuit, Proceedings of Wavelet Applications in Signal and Image Processing III (San Diego, CA), 1995.
[CRT06] E. Candès, J. K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59 (2006), 1207-1223.
[CT07] E. Candès and T. Tao, The Dantzig selector: statistical estimation when p is much larger than n, Annals of Statistics 35 (2007), 2313-2351.
[DMM09] D. L. Donoho, A. Maleki, and A. Montanari, Message Passing Algorithms for Compressed Sensing, Proceedings of the National Academy of Sciences 106 (2009), 18914-18919.
[DMM11] D. L. Donoho, A. Maleki, and A. Montanari, The noise-sensitivity phase transition in compressed sensing, Information Theory, IEEE Transactions on 57 (2011), no. 10, 6920-6941.
[FGH12] J. Fan, S. Guo, and N. Hao, Variance estimation using refitted cross-validation in ultrahigh dimensional regression, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74 (2012), 1467-9868.
[JM13] A. Javanmard and A. Montanari, Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory, preprint available at arXiv:1301.4240, 2013.
[Joh12] I. Johnstone, Gaussian estimation: Sequence and wavelet models, Book draft, 2012.
[MB06] N. Meinshausen and P. Buhlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics 34 (2006), no. 3, 1436-1462.
[NSvdG10] N. Stadler, P. Buhlmann, and S. van de Geer, ℓ1-penalization for Mixture Regression Models (with discussion), Test 19 (2010), 209-285.
[RFG09] S. Rangan, A. K. Fletcher, and V. K. Goyal, Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing, 2009.
[SJKG07] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, An Interior-Point Method for Large-Scale ℓ1-Regularized Least Squares, IEEE Journal on Selected Topics in Signal Processing 4 (2007), 606-617.
[Ste81] C. Stein, Estimation of the mean of a multivariate normal distribution, The Annals of Statistics 9 (1981), 1135-1151.
[SZ12] T. Sun and C. H. Zhang, Scaled sparse linear regression, Biometrika (2012), 1-20.
[Tib96] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Royal. Statist. Soc B 58 (1996), 267-288.
[Wai09] M. J. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming, Information Theory, IEEE Transactions on 55 (2009), no. 5, 2183-2202.
[ZY06] P. Zhao and B. Yu, On model selection consistency of Lasso, The Journal of Machine Learning Research 7 (2006), 2541-2563.
-----1
[1] J. Yedidia, W. Freeman, and Y. Weiss, Constructing free-energy approximations and generalized belief propagation algorithms, IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2282-2312, 2005.
[2] T. J. Richardson and R. L. Urbanke, Modern Coding Theory. Cambridge University Press, 2008.
[3] M. Mezard and A. Montanari, Information, physics, and computation, ser. Oxford Graduate Texts. Oxford: Oxford Univ. Press, 2009.
[4] M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning, vol. 1, no. 1, pp. 1-305, 2008.
[5] J. Gonzalez, Y. Low, and C. Guestrin, Residual splash for optimally parallelizing belief propagation, in International Conference on Artificial Intelligence and Statistics, 2009.
[6] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, GraphLab: A New Parallel Framework for Machine Learning, in Conference on Uncertainty in Artificial Intelligence (UAI), 2010.
[7] M. Bayati, D. Shah, and M. Sharma, Max-product for maximum weight matching: Convergence, correctness, and LP duality, IEEE Transactions on Information Theory, vol. 54, no. 3, pp. 1241-1251, 2008.
[8] S. Sanghavi, D. Malioutov, and A. Willsky, Linear Programming Analysis of Loopy Belief Propagation for Weighted Matching, in Neural Information Processing Systems (NIPS), 2007.
[9] B. Huang and T. Jebara, Loopy belief propagation for bipartite maximum weight b-matching, in Artificial Intelligence and Statistics (AISTATS), 2007.
[10] M. Bayati, C. Borgs, J. Chayes, and R. Zecchina, Belief-Propagation for Weighted b-Matchings on Arbitrary Graphs and its Relation to Linear Programs with Integer Solutions, SIAM Journal on Discrete Mathematics, vol. 25, pp. 989-1011, 2011.
[11] S. Sanghavi, D. Shah, and A. Willsky, Message-passing for max-weight independent set, in Neural Information Processing Systems (NIPS), 2007.
[12] D. Gamarnik, D. Shah, and Y. Wei, Belief propagation for min-cost network flow: convergence & correctness, in SODA, pp. 279-292, 2010.
[13] J. Edmonds, Paths, trees, and flowers, Canadian Journal of Mathematics, vol. 3, pp. 449-467, 1965.
[14] G. Dantzig, R. Fulkerson, and S. Johnson, Solution of a large-scale traveling-salesman problem, Operations Research, vol. 2, no. 4, pp. 393-410, 1954.
[15] K. Chandrasekaran, L. A. Vegh, and S. Vempala, The cutting plane method is polynomial for perfect matchings, in Foundations of Computer Science (FOCS), 2012.
[16] R. G. Gallager, Low Density Parity Check Codes, MIT Press, Cambridge, MA, 1963.
[17] Y. Weiss, Belief propagation and revision in networks with loops, MIT AI Laboratory, Technical Report 1616, 1997.
[18] B. J. Frey and R. Koetter, Exact inference using the attenuated max-product algorithm, Advanced Mean Field Methods: Theory and Practice, ed. Manfred Opper and David Saad, MIT Press, 2000.
[19] Y. Weiss and W. T. Freeman, On the Optimality of Solutions of the Max-Product Belief-Propagation Algorithm in Arbitrary Graphs, IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 736-744, 2001.
[20] M. Grotschel and O. Holland, Solving matching problems with linear programming, Mathematical Programming, vol. 33, no. 3, pp. 243-259, 1985.
[21] J. Shin, A.E. Gelfand, and M. Chertkov, A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles, arXiv preprint arXiv:1306.1167 (2013).
-----1
[1] C. M. Kreucher, A. O. Hero, and K. D. Kastella. An information-based approach to sensor management in large dynamic networks. Proc. IEEE, Special Issue on Modeling, Identification, & Control of Large-Scale Dynamical Systems, 95(5):978-999, May 2007.
[2] H.-L. Choi and J. P. How. Continuous trajectory planning of mobile sensors for informative forecasting. Automatica, 46(8):1266-1275, 2010.
[3] V. Chandrasekaran, N. Srebro, and P. Harsha. Complexity of inference in graphical models. In Proc. Uncertainty in Artificial Intelligence, 2008.
[4] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[5] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498-519, Feb 2001.
[6] A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proc. Uncertainty in Artificial Intelligence (UAI), 2005.
[7] G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14:489-498, 1978.
[8] J. L. Williams, J. W. Fisher III, and A. S. Willsky. Performance guarantees for information theoretic active inference. In M. Meila and X. Shen, editors, Proc. Eleventh Int. Conf. on Artificial Intelligence and Statistics, pages 616-623, 2007.
[9] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 2nd edition, 2006.
[10] Y. Weiss and W. T. Freeman. Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 13(10):2173-2200, 2001.
[11] D. M. Malioutov, J. K. Johnson, and A. S. Willsky. Walk-sums and belief propagation in Gaussian graphical models. Journal of Machine Learning Research, 7:2031-2064, 2006.
[12] Y. Liu, V. Chandrasekaran, A. Anandkumar, and A. S. Willsky. Feedback message passing for inference in Gaussian graphical models. IEEE Transactions on Signal Processing, 60(8):4135-4150, Aug 2012.
[13] C. Ko, J. Lee, and M. Queyranne. An exact algorithm for maximum entropy sampling. Operations Research, 43:684-691, 1995.
[14] A. Krause and C. Guestrin. Optimal value of information in graphical models. Journal of Artificial Intelligence Research, 35:557-591, 2009.
[15] P. L. Erdős, M. A. Steel, L. A. Szekely, and T. J. Warnow. A few logs suffice to build (almost) all trees: Part II. Theoretical Computer Science, 221:77-118, 1999.
[16] M. J. Choi, V. Y. F. Tan, A. Anandkumar, and A. S. Willsky. Learning latent tree graphical models. Journal of Machine Learning Research, 12:1771-1812, May 2011.
[17] CERN - European Organization for Nuclear Research. Colt, 1999.
-----0
Das, Abhimanyu and Kempe, David. Algorithms for subset selection in linear regression. In Proceedings of the 40th annual ACM symposium on Theory of computing, pp. 45-54. ACM, 2008.
Friedland, S and Gaubert, S. Submodular spectral functions of principal submatrices of a hermitian matrix, extensions and applications. Linear Algebra and its Applications, 2011.
Garnett, Roman, Krishnamurthy, Yamuna, Xiong, Xuehan, Schneider, Jeff, and Mann, Richard. Bayesian optimal active search and surveying. In ICML, 2012.
Ji, Ming and Han, Jiawei. A variance minimization criterion to active learning on graphs. In AISTATS, 2012.
Krause, Andreas, Singh, Ajit, and Guestrin, Carlos. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research (JMLR), 9:235-284, February 2008.
Martin, Shawn, Brown, W Michael, Klavans, Richard, and Boyack, Kevin W. OpenOrd: an open-source toolbox for large graph layout. In IS&T/SPIE Electronic Imaging, pp. 786806-786806. International Society for Optics and Photonics, 2011.
Nemhauser, George L, Wolsey, Laurence A, and Fisher, Marshall L. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265-294, 1978.
Rasmussen, Carl Edward and Williams, Christopher KI. Gaussian processes for machine learning, volume 1. MIT Press, Cambridge, MA, 2006.
Settles, Burr. Active learning literature survey. University of Wisconsin, Madison, 2010.
Walker, David A. Suppressor variable(s) importance within a regression model: an example of salary compression from career services. Journal of College Student Development, 44(1):127-133, 2003.
Wu, Xiao-Ming, Li, Zhenguo, So, Anthony Man-Cho, Wright, John, and Chang, Shih-Fu. Learning with partially absorbing random walks. In Advances in Neural Information Processing Systems 25, pp. 3086-3094, 2012.
Zhu, Xiaojin and Ghahramani, Zoubin. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.
Zhu, Xiaojin, Lafferty, John, and Ghahramani, Zoubin. Combining active learning and semisupervised learning using Gaussian fields and harmonic functions. In ICML 2003 workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, pp. 58-65, 2003.
-----1
[1] B. Settles, Active learning. Morgan & Claypool, 2012.
[2] D. Jones, A taxonomy of global optimization methods based on response surfaces, Journal of Global Optimization, vol. 21, pp. 345-383, 2001.
[3] E. Brochu, M. Cora, and N. de Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, Tech. Rep., 2009.
[4] L. D. Stone, The theory of optimal search, 1975.
[5] B. O. Koopman, The theory of search. I. Kinematic bases, Operations Research, vol. 4, 1956.
[6] J. Najemnik and W. S. Geisler, Optimal eye movement strategies in visual search, Nature, 2005.
[7] W. B. Powell and I. O. Ryzhov, Optimal Learning. J. Wiley and Sons, 2012.
[8] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, 1972.
[9] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. MIT Press, 2006.
[10] J. A. Nelder and R. Mead, A simplex method for function minimization, Computer Journal, 1965.
[11] J. Kennedy and R. Eberhart, Particle swarm optimization, in Proc. IEEE ICNN, 1995.
[12] J. H. Holland, Adaptation in natural and artificial systems, 1975.
[13] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. 1989.
[14] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science, 1983.
[15] D. Finkel, DIRECT Optimization Algorithm User Guide. 2003.
[16] A. Moore, J. Schneider, J. Boyan, and M. S. Lee, Q2: Memory-based active learning for optimizing noisy continuous functions, in ICML, pp. 386-394, 1998.
[17] D. Lewis and W. Gale, A sequential algorithm for training text classifiers, in Proc. ACM SIGIR Conference on Research and Development in Information Retrieval, 1994.
[18] H. J. Kushner, A new method of locating the maximum of an arbitrary multipeak curve in the presence of noise, J. Basic Engineering, vol. 86, pp. 97-106, 1964.
[19] J. Elder, Global R^d optimization when probes are expensive: the GROPE algorithm, in IEEE International Conference on Systems, Man and Cybernetics, 1992.
[20] J. Mockus, V. Tiesis, and A. Zilinskas, The application of Bayesian methods for seeking the extremum, in Towards Global Optimization (L. Dixon and E. Szego, eds.), 1978.
[21] M. Locatelli, Bayesian algorithms for one-dimensional global optimization, Journal of Global Optimization, vol. 10, pp. 57-76, 1997.
[22] D. D. Cox and S. John, A statistical method for global optimization, in Proc. IEEE Conference on Systems, Man and Cybernetics, 1992.
[23] M. Osborne, R. Garnett, and S. Roberts, Gaussian processes for global optimization, in LION3, 2009.
[24] T. L. Griffiths, C. Lucas, J. J. Williams, and M. L. Kalish, Modeling human function learning with Gaussian processes, in NIPS, 2009.
[25] B. R. Gibson, X. Zhu, T. T. Rogers, C. Kalish, and J. Harrison, Humans learn using manifolds, reluctantly, in NIPS, 2010.
[26] J. D. Carroll, Functional learning: The learning of continuous functional mappings relating stimulus and response continua, Educational Testing Service, Princeton, NJ, 1963.
[27] K. Koh and D. E. Meyer, Function learning: Induction of continuous stimulus-response relations, Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 17, pp. 811-836, 1991.
[28] J. Najemnik and G. Geisler, Eye movement statistics in humans are consistent with an optimal search strategy, Journal of Vision, vol. 8, no. 3, pp. 1-14, 2008.
[29] W. S. Geisler and R. L. Diehl, A Bayesian approach to the evolution of perceptual and cognitive systems, Cogn. Sci., vol. 27, pp. 379-402, 2003.
[30] R. Castro, C. Kalish, R. Nowak, R. Qian, T. Rogers, and X. Zhu, Human active learning, NIPS, 2008.
[31] P. V. J. Palmer and M. Pavel, The psychophysics of visual search, Vision Research, vol. 40, 2000.
[32] J. M. Wolfe, in Attention (H. Pashler, ed.), pp. 13-74, Psychology Press, 1998.
[33] K. Nakayama and P. Martini, Situating visual search, Vision Research, vol. 51, pp. 1526-1537, 2011.
[34] M. P. Eckstein, Visual search: A retrospective, Journal of Vision, vol. 11, no. 5, 2011.
[35] A. Borji and L. Itti, State-of-the-art in modeling visual attention, IEEE PAMI, 2012.
[36] A. C. Kamil, J. R. Krebs, and H. R. Pulliam (eds.), Foraging Behavior. New York: Plenum, 1987.
-----1
[1] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multi-armed bandit problem. Machine Learning, 2002.
[2] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The non-stochastic multi-armed bandit problem. SIAM J. on Computing, 2002.
[3] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning and Games. Cambridge University Press, 2006.
[4] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 1994.
[5] D. Cohn, L. Atlas, R. Ladner, M. El-Sharkawi, R. Marks II, M. Aggoune, and D. Park. Training connectionist networks with queries and selective sampling. In Proc. NIPS, 1990.
[6] D. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. J. of Artificial Intelligence Research, 1996.
[7] A. Culotta and A. McCallum. Reducing labeling effort for structured prediction tasks. In Proc. AAAI, 2005.
[8] A. Farhangfar, R. Greiner, and C. Szepesvari. Learning to Segment from a Few Well-Selected Training Images. In Proc. ICML, 2009.
[9] A. Fathi, M. F. Balcan, X. Ren, and J. M. Rehg. Combining Self Training and Active Learning for Video Segmentation. In Proc. BMVC, 2011.
[10] T. Hazan and A. Shashua. Norm-Product Belief Propagation: Primal-Dual Message-Passing for LP-Relaxation and Approximate-Inference. Trans. Information Theory, 2010.
[11] T. Hazan and R. Urtasun. A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction. In Proc. NIPS, 2010.
[12] V. Hedau, D. Hoiem, and D. A. Forsyth. Recovering the Spatial Layout of Cluttered Rooms. In Proc. ICCV, 2009.
[13] D. Hoiem, A. A. Efros, and M. Hebert. Recovering Surface Layout from an Image. IJCV, 2007.
[14] A. Kapoor, K. Grauman, R. Urtasun, and T. Darrell. Active Learning with Gaussian Processes for Object Categorization. In Proc. ICCV, 2007.
[15] P. Kohli and P. Torr. Measuring Uncertainty in Graph Cut Solutions - Efficiently Computing Min-marginal Energies using Dynamic Graph Cuts. In Proc. ECCV, 2006.
[16] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. ICML, 2001.
[17] D. C. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces. In Proc. NIPS, 2010.
[18] D. C. Lee, M. Hebert, and T. Kanade. Geometric Reasoning for Single Image Structure Recovery. In Proc. CVPR, 2009.
[19] D. Lewis and J. Catlett. Heterogeneous uncertainty sampling for supervised learning. In Proc. ICML, 1994.
[20] D. Lewis and W. Gale. A sequential algorithm for training text classifiers. In Proc. Research and Development in Info. Retrieval, 1994.
[21] T. Mensink, J. Verbeek, and G. Csurka. Learning Structured Prediction Models for Interactive Image Labeling. In Proc. CVPR, 2011.
[22] D. Roth and K. Small. Margin-based Active Learning for Structured Output Spaces. In Proc. ECML, 2006.
[23] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts. In Proc. SIGGRAPH, 2004.
[24] N. Roy and A. McCallum. Toward optimal active learning through sampling estimation of error reduction. In Proc. ICML, 2001.
[25] T. Scheffer, C. Decomain, and S. Wrobel. Active hidden Markov models for information extraction. In Proc. Intl Conf. Advances in Intelligent Data Analysis, 2001.
[26] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Distributed Message Passing for Large Scale Graphical Models. In Proc. CVPR, 2011.
[27] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Efficient Structured Prediction for 3D Indoor Scene Understanding. In Proc. CVPR, 2012.
[28] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Efficient Structured Prediction with Latent Variables for General Graphical Models. In Proc. ICML, 2012.
[29] B. Settles, M. Craven, and S. Ray. Multiple-instance active learning. In Proc. NIPS, 2008.
[30] P. Shivaswamy and T. Joachims. Online Structured Prediction via Coactive Learning. In Proc. ICML, 2012.
[31] B. Siddiquie and A. Gupta. Beyond Active Noun Tagging: Modeling Contextual Interactions for Multi-Class Active Learning. In Proc. CVPR, 2010.
[32] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. JMLR, 2001.
[33] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large Margin Methods for Structured and Interdependent Output Variables. JMLR, 2005.
[34] A. Vezhnevets, V. Ferrari, and J. M. Buhmann. Active Learning for Semantic Segmentation with Expected Change. In Proc. CVPR, 2012.
[35] S. Vijayanarasimhan and K. Grauman. Cost-Sensitive Active Visual Category Learning. IJCV, 2010.
[36] S. Vijayanarasimhan and K. Grauman. Active Frame Selection for Label Propagation in Videos. In Proc. ECCV, 2012.
[37] A. L. Yuille and A. Rangarajan. The Concave-Convex Procedure. Neural Computation, 2003.
-----1
[1] Dimitris Achlioptas and Frank McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM (JACM), 54(2):9, 2007.
[2] Laura Balzano, Benjamin Recht, and Robert Nowak. High-dimensional matched subspace detection when data are missing. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, pages 1638-1642. IEEE, 2010.
[3] Christos Boutsidis, Michael W Mahoney, and Petros Drineas. An improved approximation algorithm for the column subset selection problem. In Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 968-977. Society for Industrial and Applied Mathematics, 2009.
[4] Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956-1982, 2010.
[5] Emmanuel J Candès and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925-936, 2010.
[6] Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[7] Emmanuel J Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. Information Theory, IEEE Transactions on, 56(5):2053-2080, 2010.
[8] Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, and Rachel Ward. Coherent matrix completion. arXiv preprint arXiv:1306.2979, 2013.
[9] Mark A Davenport and Ery Arias-Castro. Compressive binary search. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 1827-1831. IEEE, 2012.
[10] Amit Deshpande, Luis Rademacher, Santosh Vempala, and Grant Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2:225-247, 2006.
[11] Silvia Gandy, Benjamin Recht, and Isao Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27(2):025010, 2011.
[12] Alex Gittens. The spectral norm error of the naive Nystrom extension. arXiv preprint arXiv:1110.5305, 2011.
[13] David Gross. Recovering low-rank matrices from few coefficients in any basis. Information Theory, IEEE Transactions on, 57(3):1548-1566, 2011.
[14] Venkatesan Guruswami and Ali Kemal Sinop. Optimal column-based low-rank matrix reconstruction. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1207-1214. SIAM, 2012.
[15] Jarvis D Haupt, Richard G Baraniuk, Rui M Castro, and Robert D Nowak. Compressive distilled sensing: Sparse recovery using adaptivity in compressive measurements. In Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on, pages 1551-1555. IEEE, 2009.
[16] Jun He, Laura Balzano, and John Lui. Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827, 2011.
[17] Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from noisy entries. The Journal of Machine Learning Research, 11:2057-2078, 2010.
[18] Sanjiv Kumar, Mehryar Mohri, and Ameet Talwalkar. Sampling methods for the Nystrom method. The Journal of Machine Learning Research, 13:981-1006, 2012.
[19] Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302-1338, 2000.
[20] Sahand Negahban and Martin J Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. The Journal of Machine Learning Research, 2012.
[21] Benjamin Recht. A simpler approach to matrix completion. The Journal of Machine Learning Research, 12:3413-3430, 2011.
[22] Ryota Tomioka, Kohei Hayashi, and Hisashi Kashima. Estimation of low-rank tensors via convex optimization. arXiv preprint arXiv:1010.0789, 2010.
[23] Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, and Hisashi Kashima. Statistical performance of convex tensor decomposition. In Advances in Neural Information Processing Systems, pages 972-980, 2011.
[24] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.
-----1
[1] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235-256, 2002.
[2] Sandilya Bhamidipati, Branislav Kveton, and S. Muthukrishnan. Minimal interaction search: Multi-way search with item categories. In Proceedings of AAAI Workshop on Intelligent Techniques for Web Personalization and Recommendation, 2013.
[3] Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, 2006.
[4] Sanjoy Dasgupta. Analysis of a greedy active learning strategy. In Advances in Neural Information Processing Systems 17, pages 337-344, 2005.
[5] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 42:427-486, 2011.
[6] Andrew Guillory and Jeff Bilmes. Online submodular set cover, ranking, and repeated active learning. In Advances in Neural Information Processing Systems 24, pages 1107-1115, 2011.
[7] Andrew Guillory and Jeff Bilmes. Simultaneous learning and covering with adversarial noise. In Proceedings of the 28th International Conference on Machine Learning, pages 369-376, 2011.
[8] Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563-1600, 2010.
[9] David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137-146, 2003.
[10] Andreas Krause, Ajit Paul Singh, and Carlos Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9:235-284, 2008.
[11] Shyong Lam and Jon Herlocker. MovieLens 1M Dataset. http://www.grouplens.org/node/12, 2012.
[12] Christopher Manning, Prabhakar Raghavan, and Hinrich Schutze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, 2008.
[13] Remi Munos. The optimistic principle applied to games, optimization, and planning: Towards foundations of Monte-Carlo tree search. Foundations and Trends in Machine Learning, 2012.
[14] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265-294, 1978.
[15] Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
[16] Zheng Wen, Branislav Kveton, Brian Eriksson, and Sandilya Bhamidipati. Sequential Bayesian search. In Proceedings of the 30th International Conference on Machine Learning, pages 977-983, 2013.
[17] Yisong Yue and Carlos Guestrin. Linear submodular bandits and their application to diversified retrieval. In Advances in Neural Information Processing Systems 24, pages 2483-2491, 2011.
-----0
M. F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In Proceedings of the 23rd international conference on Machine learning (ICML), pages 65-72, 2006.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2002.
A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 49-56. ACM, 2009.
J. Blocki, N. Christin, A. Dutta, and A. Sinha. Regret minimizing audits: A learning-theoretic basis for privacy protection. In Proceedings of 24th IEEE Computer Security Foundations Symposium, 2011.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929-965, Oct. 1989.
D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15:201-221, 1994.
S. Dasgupta. Analysis of a greedy active learning strategy. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 337-344. MIT Press, Cambridge, MA, 2005.
S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 353-360. MIT Press, Cambridge, MA, 2008.
L. Devroye and G. Lugosi. Lower bounds in pattern recognition and learning. Pattern Recognition, 28(7):1011-1018, 1995.
U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634-652, 1998.
D. Golovin and A. Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 42:427-486, 2011.
A. Gonen, S. Sabato, and S. Shalev-Shwartz. Efficient active learning of halfspaces: an aggressive approach. In The 30th International Conference on Machine Learning (ICML), 2013.
S. Hanneke. A bound on the label complexity of agnostic active learning. In Proceedings of the 24th international conference on Machine learning, pages 353-360. ACM, 2007a.
S. Hanneke. Teaching dimension and the complexity of active learning. In Learning Theory, pages 66-81. Springer, 2007b.
S. Hanneke. Rates of convergence in active learning. The Annals of Statistics, 39(1):333-361, 2011.
L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15-17, May 1976.
A. Kapoor, E. Horvitz, and S. Basu. Selective supervision: Guiding supervised learning with decision-theoretic active learning. In Proceedings of IJCAI, 2007.
M. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983-1006, 1998.
S. R. Kulkarni, S. K. Mitter, and J. N. Tsitsiklis. Active learning using arbitrary binary valued queries. Machine Learning, 11(1):23-35, 1993.
P. M. Long and L. Tan. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Machine Learning, 30(1):7-21, 1998.
D. D. Margineantu. Active cost-sensitive learning. In Proceedings of IJCAI, 2007.
S. Sabato, A. D. Sarwate, and N. Srebro. Auditing: Active learning with outcome-dependent query costs. arXiv preprint arXiv:1306.2347, 2013.
B. Settles, M. Craven, and L. Friedland. Active learning with real annotation costs. In Proceedings of the NIPS Workshop on Cost-Sensitive Learning, 2008.
V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, XVI(2):264-280, 1971.
-----1
[1] V. S. Sheng and C. X. Ling. Feature value acquisition in testing: a sequential batch test algorithm. In Proceedings of the 23rd international conference on Machine learning, 2006.
[2] S. Chakraborty, V. Balasubramanian, and S. Panchanathan. An optimization based framework for dynamic batch mode active learning. In Advances in Neural Information Processing, 2010.
[3] S. Dasgupta, A. Kalai, and C. Monteleoni. Analysis of perceptron-based active learning. Journal of Machine Learning Research, 10:281-299, 2009.
[4] S. Dasgupta. Coarse sample complexity bounds for active learning. In Advances in Neural Information Processing Systems 18, 2005.
[5] M. F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In Proc. of the 23rd International Conference on Machine Learning, 2006.
[6] S. Hanneke. A bound on the label complexity of agnostic active learning. In Proceedings of the 24th International Conference on Machine Learning, 2007.
[7] S. Hanneke. Rates of convergence in active learning. The Annals of Statistics, 39(1):333-361, 2011.
[8] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201-221, 1994.
[9] V. Vapnik. Estimation of Dependencies Based on Empirical Data. Springer-Verlag, New York, 1982.
[10] M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
[11] E. Mammen and A.B. Tsybakov. Smooth discrimination analysis. The Annals of Statistics, 27:1808-1829, 1999.
[12] P. Massart and E. Nedelec. Risk bounds for statistical learning. The Annals of Statistics, 34(5):2326-2366, 2006.
[13] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34(6):2593-2656, 2006.
[14] S. Hanneke. Activized learning: Transforming passive to active with improved label complexity. Journal of Machine Learning Research, 13(5):1469-1587, 2012.
[15] M.-F. Balcan, A. Broder, and T. Zhang. Margin based active learning. In Proceedings of the 20th Conference on Learning Theory, 2007.
[16] C. H. Papadimitriou and M. Sipser. Communication complexity. Journal of Computer and System Sciences, 28(2):260-269, 1984.
[17] P. Harsha, Y. Ishai, Joe Kilian, Kobbi Nissim, and Srinivasan Venkatesh. Communication versus computation. In The 31st International Colloquium on Automata, Languages and Programming, pages 745-756, 2004.
-----1
[1] Andrew McCallum and Kamal Nigam. Employing EM and Pool-Based Active Learning for Text Classification. In International Conference on Machine Learning (ICML), pages 350-358, 1998.
[2] Daniel Golovin and Andreas Krause. Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization. Journal of Artificial Intelligence Research, 42(1):427-486, 2011.
[3] Yuxin Chen and Andreas Krause. Near-optimal Batch Mode Active Learning and Adaptive Submodular Optimization. In International Conference on Machine Learning (ICML), pages 160-168, 2013.
[4] Constantino Tsallis and Edgardo Brigatti. Nonextensive statistical mechanics: A brief introduction. Continuum Mechanics and Thermodynamics, 16(3):223-235, 2004.
[5] Steven CH Hoi, Rong Jin, Jianke Zhu, and Michael R Lyu. Batch Mode Active Learning and Its Application to Medical Image Classification. In International Conference on Machine learning (ICML), pages 417-424. ACM, 2006.
[6] John Lafferty, Andrew McCallum, and Fernando CN Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In International Conference on Machine Learning (ICML), pages 282-289, 2001.
[7] Bassem Sayrafi, Dirk Van Gucht, and Marc Gyssens. The implication problem for measure-based constraints. Information Systems, 33(2):221-239, 2008.
[8] G.L. Nemhauser and L.A. Wolsey. Best Algorithms for Approximating the Maximum of a Submodular Set Function. Mathematics of Operations Research, 3(3):177-188, 1978.
[9] Sunita Sarawagi and William W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction. Advances in Neural Information Processing Systems (NIPS), 17:1185-1192, 2004.
[10] Viet Cuong Nguyen, Nan Ye, Wee Sun Lee, and Hai Leong Chieu. Semi-Markov Conditional Random Field with High-Order Features. In ICML Workshop on Structured Sparsity: Learning and Inference, 2011.
[11] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In Proceedings of the 17th Conference on Natural Language Learning (HLT-NAACL 2003), pages 142-147, 2003.
[12] Burr Settles and Mark Craven. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1070-1079. Association for Computational Linguistics, 2008.
[13] Thorsten Joachims. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. Technical report, DTIC Document, 1996.
[14] Burr Settles. Active Learning Literature Survey. Technical Report 1648, University of Wisconsin-Madison, 2009.
[15] Robert Nowak. Noisy Generalized Binary Search. Advances in Neural Information Processing Systems (NIPS), 22:1366-1374, 2009.
[16] Daniel Golovin, Andreas Krause, and Debajyoti Ray. Near-Optimal Bayesian Active Learning with Noisy Observations. In Advances in Neural Information Processing Systems (NIPS), pages 766-774, 2010.
-----1
[1] Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[2] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[3] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1), 2008.
[4] S. Lauritzen. Graphical Models. Oxford University Press, 1996.
[5] T. Hazan and T. Jaakkola. On the partition function and random maximum a-posteriori perturbations. Proceedings of the 29th International Conference on Machine Learning, 2012.
[6] J. K. Johnson, V. Chandrasekaran, and A. S. Willsky. Learning markov structure by maximum entropy relaxation. In 11th International Conference in Artificial Intelligence and Statistics (AISTATS 2007), 2007.
[7] V. Chandrasekaran, J. K. Johnson, and A. S. Willsky. Maximum entropy relaxation for graphical model selection given inconsistent statistics. In IEEE Statistical Signal Processing Workshop (SSP 2007), 2007.
[8] D. Sontag and T. Jaakkola. New outer bounds on the marginal polytope. In Neural Information Processing Systems (NIPS), 2007.
[9] L. G. Khachiyan. A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 20(1):191-194, 1979.
[10] A. Ben-Tal and A. Nemirovski. Optimization iii. Lecture notes, 2012.
[11] M. Grotschel, L. Lovasz, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988. Second Edition, 1993.
[12] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge, 2004.
[13] M. Singh and N. Vishnoi. Entropy, optimization and counting. arXiv, (1304.8108), 2013.
-----1
[1] J. N. Darroch, Steffen L. Lauritzen, and T. P. Speed. Markov fields and log-linear interaction models for contingency tables. The Annals of Statistics, 8:522-539, 1980.
[2] Steffen L. Lauritzen and Nanny Wermuth. Graphical models for associations between variables, some of which are qualitative and some quantitative. The Annals of Statistics, 17:31-57, 1989.
[3] Jukka Corander. Bayesian graphical model determination using decision theory. Journal of Multivariate Analysis, 85:253-266, 2003.
[4] Jukka Corander, Magnus Ekdahl, and Timo Koski. Parallel interacting MCMC for learning of topologies of graphical models. Data Mining and Knowledge Discovery, 17:431-456, 2008.
[5] Petros Dellaportas and Jonathan J. Forster. Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika, 86:615-633, 1999.
[6] Paolo Giudici and Robert Castello. Improving Markov chain Monte Carlo model search for data mining. Machine Learning, 50:127-158, 2003.
[7] Paolo Giudici and Peter J. Green. Decomposable graphical Gaussian model determination. Biometrika, 86:785-801, 1999.
[8] Mikko Koivisto and Kismat Sood. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research, 5:549-573, 2004.
[9] David Madigan and Adrian E. Raftery. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89:1535-1546, 1994.
[10] Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. Inducing features of random fields. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(4):380-393, 1997.
[11] Su-In Lee, Varun Ganapathi, and Daphne Koller. Efficient structure learning of Markov networks using L1-regularization. In Advances in Neural Information Processing Systems 19, pages 817-824. MIT Press, 2006.
[12] M. Schmidt, A. Niculescu-Mizil, and K. Murphy. Learning graphical model structure using L1-regularization paths. In Proceedings of the National Conference on Artificial Intelligence, page 1278. AAAI Press / MIT Press, 2007.
[13] Holger Hofling and Robert Tibshirani. Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. Journal of Machine Learning Research, 10:883-906, 2009.
[14] Armin Biere, Marijn J. H. Heule, Hans van Maaren, and Toby Walsh, editors. Handbook of Satisfiability. IOS Press, 2009.
[15] Chu Min Li and Felip Manyà. MaxSAT, Hard and Soft Constraints, chapter 19, pages 613-631. In Biere et al. [14], 2009.
[16] Clark Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. Satisfiability Modulo Theories, chapter 26, pages 825-885. In Biere et al. [14], 2009.
[17] Gerhard Brewka, Thomas Eiter, and Miroslaw Truszczynski. Answer set programming at a glance. Commun. ACM, 54(12):92-103, 2011.
[18] Joe Whittaker. Graphical models in applied multivariate statistics. Wiley Publishing, 1990.
[19] Martin C. Golumbic. Algorithmic Graph Theory and Perfect Graphs. Academic Press, 1980.
[20] Ronald L. Graham and Pavol Hell. On the history of the minimum spanning tree problem. Annals of the History of Computing, 7(1):43-57, 1985.
[21] Yukio Shibata. On the tree representation of chordal graphs. Journal of Graph Theory, 12(3):421-428, 1988.
[22] Finn V. Jensen and Frank Jensen. Optimal junction trees. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (UAI-94), pages 360-366, 1994.
[23] Carsten Sinz. Towards an optimal CNF encoding of Boolean cardinality constraints. In Principles and Practice of Constraint Programming - CP 2005, number 3709 in Lecture Notes in Computer Science, pages 827-831. Springer-Verlag, 2005.
[24] Daniel Le Berre and Anne Parrain. The Sat4j library, release 2.2 system description. Journal on Satisfiability, Boolean Modeling and Computation, 7:59-64, 2010.
[25] Ruben Martins, Vasco Manquinho, and Ines Lynce. Parallel search for maximum satisfiability. AI Communications, 25:75-95, 2012.
[26] Roberto Sebastiani and Silvia Tomasi. Optimization in SMT with LA(Q) cost functions. In Automated Reasoning, volume 7364 of LNCS, pages 484-498. Springer-Verlag, 2012.
[27] Martin Gebser, Benjamin Kaufmann, and Torsten Schaub. Conflict-driven answer set solving: From theory to practice. Artif. Intell., 187:52-89, 2012.
[28] Martin Gebser, Benjamin Kaufmann, Ramon Otero, Javier Romero, Torsten Schaub, and Philipp Wanko. Domain-specific heuristics in answer set programming. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. AAAI Press, 2013.
[29] Tomi Janhunen and Ilkka Niemela. Compact translations of non-disjunctive answer set programs to propositional clauses. In Gelfond Festschrift, Vol. 6565 of LNCS, pages 111-130. Springer, 2011.
[30] James Cussens. Bayesian network learning by compiling to weighted MAX-SAT. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 105-112, 2008.
[31] James Cussens. Bayesian network learning with cutting planes. In Proceedings of the Twenty-Seventh Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), pages 153-160. AUAI Press, 2011.
[32] Mark Bartlett and James Cussens. Advances in Bayesian network learning using integer programming. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013), pages 182-191. AUAI Press, 2013.
-----1
[1] A. U. Asuncion, Q. Liu, A. T. Ihler, and P. Smyth. Particle filtered MCMC-MLE with connections to contrastive divergence. In ICML, 2010.
[2] O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. JMLR, 9:485-516, June 2008.
[3] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In NIPS, 2002.
[4] J. Besag. Statistical analysis of non-lattice data. JRSS-D, 24(3):179-195, 1975.
[5] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993-1022, 2003.
[6] S. P. Chatzis and G. Tsechpenakis. The infinite hidden Markov random field model. In ICCV, 2009.
[7] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209-230, 1973.
[8] J. V. Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In ICML, 2008.
[9] A. Gelman and J. Hill. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, New York, 2007.
[10] S. J. Gershman and D. M. Blei. A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56(1):1-12, 2012.
[11] C. J. Geyer. Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics, pages 156-163, 1991.
[12] M. Gutmann and J. Hirayama. Bregman divergence as general framework to estimate unnormalized statistical models. In UAI, pages 283-290, Corvallis, Oregon, 2011. AUAI Press.
[13] M. Gutmann and A. Hyvarinen. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In AISTATS, 2010.
[14] L. A. Hannah, D. M. Blei, and W. B. Powell. Dirichlet process mixtures of generalized linear models. JMLR, 12:1923-1953, 2011.
[15] G. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771-1800, 2002.
[16] A. Hyvarinen. Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Neural Networks, IEEE Transactions on, 18(5):1529-1531, 2007.
[17] A. Hyvarinen. Some extensions of score matching. Computational statistics & data analysis, 51(5):2499-2512, 2007.
[18] S. Lyu. Unifying non-maximum likelihood learning objectives with minimum KL contraction. NIPS, 2011.
[19] M. Meila. Comparing clusterings by the variation of information. In COLT, 2003.
[20] J. Møller, A. Pettitt, R. Reeves, and K. Berthelsen. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2):451-458, 2006.
[21] I. Murray, Z. Ghahramani, and D. J. C. MacKay. MCMC for doubly-intractable distributions. In UAI, 2006.
[22] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249-265, 2000.
[23] P. Orbanz and Y. W. Teh. Bayesian nonparametric models. In Encyclopedia of Machine Learning. Springer, 2010.
[24] K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In ICML, 2012.
[25] J. G. Propp and D. B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random structures and Algorithms, 9(1-2):223-252, 1996.
[26] C. E. Rasmussen. The infinite Gaussian mixture model. In NIPS, 2000.
[27] C. E. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts. In NIPS, 2001.
[28] B. Shahbaba and R. Neal. Nonlinear models using Dirichlet process mixtures. JMLR, 10:1829-1850, 2009.
[29] T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML, 2008.
[30] T. Tieleman and G. Hinton. Using fast weights to improve persistent contrastive divergence. In ICML, 2009.
[31] D. Vickrey, C. Lin, and D. Koller. Non-local contrastive objectives. In Proc. of the International Conference on Machine Learning. Citeseer, 2010.
[32] J. Zhu, N. Chen, and E. P. Xing. Infinite latent SVM for classification and multi-task learning. In NIPS, 2011.
[33] J. Zhu, N. Chen, and E. P. Xing. Infinite SVM: a Dirichlet process mixture of large-margin kernel machines. In ICML, 2011.
[34] S. C. Zhu and X. Liu. Learning in Gibbsian fields: How accurate and how fast can it be? IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:1001-1006, 2002.
-----1
[1] Alexandre Bouchard-Cote and Michael I Jordan. Optimization of structured mean field objectives. In AUAI, pages 67-74, 2009.
[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 2001.
[3] L.A. Goldberg and M. Jerrum. The complexity of ferromagnetic Ising with local fields. Combinatorics Probability and Computing, 16(1):43, 2007.
[4] E.J. Gumbel and J. Lieblein. Statistical theory of extreme values and some practical applications: a series of lectures, volume 33. US Govt. Print. Office, 1954.
[5] T. Hazan and T. Jaakkola. On the partition function and random maximum a-posteriori perturbations. In Proceedings of the 29th International Conference on Machine Learning, 2012.
[6] T. Hazan, S. Maji, J. Keshet, and T. Jaakkola. Learning efficient random maximum a-posteriori predictors with non-decomposable loss functions. Advances in Neural Information Processing Systems, 2013.
[7] M. Jerrum and A. Sinclair. Polynomial-time approximation algorithms for the Ising model. SIAM Journal on computing, 22(5):1087-1116, 1993.
[8] M.I. Jordan, Z. Ghahramani, T.S. Jaakkola, and L.K. Saul. An introduction to variational methods for graphical models. Machine learning, 37(2):183-233, 1999.
[9] J. Keshet, D. McAllester, and T. Hazan. Pac-bayesian approach for minimization of phoneme error rate. In ICASSP, 2011.
[10] Pushmeet Kohli and Philip HS Torr. Measuring uncertainty in graph cut solutions - efficiently computing min-marginal energies using dynamic graph cuts. In ECCV, pages 30-43. 2006.
[11] D. Koller and N. Friedman. Probabilistic graphical models. MIT press, 2009.
[12] S. Kotz and S. Nadarajah. Extreme value distributions: theory and applications. World Scientific Publishing Company, 2000.
[13] A. Kulesza and B. Taskar. Structured determinantal point processes. In Proc. Neural Information Processing Systems, 2010.
[14] Qiang Liu and Alexander T Ihler. Negative tree reweighted belief propagation. arXiv preprint arXiv:1203.3494, 2012.
[15] Francesco Orabona, Tamir Hazan, Anand D Sarwate, and Tommi Jaakkola. On measure concentration of random maximum a-posteriori perturbations. arXiv:1310.4227, 2013.
[16] G. Papandreou and A. Yuille. Gaussian sampling by local perturbations. In Proc. Int. Conf. on Neural Information Processing Systems (NIPS), pages 1858-1866, December 2010.
[17] G. Papandreou and A. Yuille. Perturb-and-map random fields: Using discrete optimization to learn and sample from energy models. In ICCV, Barcelona, Spain, November 2011.
[18] Nicholas Ruozzi. The bethe partition function of log-supermodular graphical models. arXiv preprint arXiv:1202.6035, 2012.
[19] D. Sontag, T. Meltzer, A. Globerson, T. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. In Conf. Uncertainty in Artificial Intelligence (UAI), 2008.
[20] E.B. Sudderth, M.J. Wainwright, and A.S. Willsky. Loop series and Bethe variational bounds in attractive graphical models. Advances in neural information processing systems, 20, 2008.
[21] D. Tarlow, R.P. Adams, and R.S. Zemel. Randomized optimum models for structured prediction. In Proceedings of the 15th Conference on Artificial Intelligence and Statistics, 2012.
[22] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. A new class of upper bounds on the log partition function. Trans. on Information Theory, 51(7):2313-2335, 2005.
[23] Max Welling. Herding dynamical weights to learn. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1121-1128. ACM, 2009.
[24] T. Werner. High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (map-mrf). In CVPR, pages 1-8, 2008.
[25] J. Zhang, H. Liang, and F. Bai. Approximating partition functions of the two-state spin system. Information Processing Letters, 111(14):702-710, 2011.
-----1
[1] Dimitri P. Bertsekas and John N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.
[2] J. Besag. Statistical Analysis of Non-Lattice Data. The Statistician, 24:179-195, 1975.
[3] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] Hei Chan and Adnan Darwiche. On the revision of probabilistic beliefs using uncertain evidence. AIJ, 163:67-90, 2005.
[5] Arthur Choi, Khaled S. Refaat, and Adnan Darwiche. EDML: A method for learning parameters in Bayesian networks. In UAI, 2011.
[6] Adnan Darwiche. A differential approach to inference in Bayesian networks. JACM, 50(3):280-305, 2003.
[7] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1-38, 1977.
[8] Pedro Domingos and Daniel Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009.
[9] Radim Jirousek and Stanislav Preucil. On the effective implementation of the iterative proportional fitting procedure. Computational Statistics & Data Analysis, 19(2):177-189, 1995.
[10] Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[11] S. L. Lauritzen. The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19:191-201, 1995.
[12] D. C. Liu and J. Nocedal. On the Limited Memory BFGS Method for Large Scale Optimization. Mathematical Programming, 45(3):503-528, 1989.
[13] Kevin Patrick Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[14] James Park and Adnan Darwiche. A differential semantics for jointree algorithms. AIJ, 156:197-216, 2004.
[15] Stephen Della Pietra, Vincent J. Della Pietra, and John D. Lafferty. Inducing features of random fields. IEEE Trans. Pattern Anal. Mach. Intell., 19(4):380-393, 1997.
[16] Khaled S. Refaat, Arthur Choi, and Adnan Darwiche. New advances and theoretical insights into EDML. In UAI, pages 705-714, 2012.
-----1
[1] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput., 16(5):1190-1208, 1995.
[2] Mary Kathryn Cowles and Bradley P. Carlin. Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association, 91:883-904, 1996.
[3] John C. Duchi, Alekh Agarwal, Mikael Johansson, and Michael I. Jordan. Ergodic mirror descent. SIAM Journal on Optimization, 22(4):1549-1578, 2012.
[4] Martin E. Dyer, Leslie Ann Goldberg, and Mark Jerrum. Matrix norms and rapid mixing for spin systems. Ann. Appl. Probab., 19:71-107, 2009.
[5] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., 6(6):721-741, 1984.
[6] Amir Globerson and Tommi Jaakkola. Approximate inference using planar graph decomposition. In NIPS, pages 473-480, 2006.
[7] Firas Hamze and Nando de Freitas. From fields to trees. In UAI, 2004.
[8] Thomas P. Hayes. A simple condition implying rapid mixing of single-site dynamics on spin systems. In FOCS, pages 39-46, 2006.
[9] Tamir Hazan and Amnon Shashua. Convergent message-passing algorithms for inference over general graphs with convex free energies. In UAI, pages 264-273, 2008.
[10] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov chains and mixing times. American Mathematical Society, 2006.
[11] Eyal Lubetzky and Allan Sly. Critical Ising on the square lattice mixes in polynomial time. Commun. Math. Phys., 313(3):815-836, 2012.
[12] Thomas Minka. Divergence measures and message passing. Technical report, 2005.
[13] Yuval Peres and Peter Winkler. Can extra updates delay mixing? arXiv/1112.0603, 2011.
[14] C. Peterson and J. R. Anderson. A mean field theory learning algorithm for neural networks. Complex Systems, 1:995-1019, 1987.
[15] Patrick Pletscher, Cheng S. Ong, and Joachim M. Buhmann. Spanning Tree Approximations for Conditional Random Fields. In AISTATS, 2009.
[16] Lawrence K. Saul and Michael I. Jordan. Exploiting tractable substructures in intractable networks. In NIPS, pages 486-492, 1995.
[17] Charles Sutton and Andrew McCallum. Piecewise training for structured prediction. Machine Learning, 77:165-194, 2009.
[18] Robert H. Swendsen and Jian-Sheng Wang. Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett., 58:86-88, Jan 1987.
[19] Martin Wainwright, Tommi Jaakkola, and Alan Willsky. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51(7):2313-2335, 2005.
[20] Eric P. Xing, Michael I. Jordan, and Stuart Russell. A generalized mean field algorithm for variational inference in exponential families. In UAI, 2003.
[21] Jonathan Yedidia, William Freeman, and Yair Weiss. Constructing free energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51:2282-2312, 2005.
-----1
[1] N.N. Madras. Lectures on Monte Carlo Methods. American Mathematical Society, 2002. ISBN 0821829785.
[2] M. Jerrum and A. Sinclair. The Markov chain Monte Carlo method: an approach to approximate counting and integration. Approximation algorithms for NP-hard problems, pages 482-520, 1997.
[3] Mihir Bellare, Oded Goldreich, and Erez Petrank. Uniform generation of NP-witnesses using an NP-oracle. Information and Computation, 163(2):510-526, 2000.
[4] Stefano Ermon, Carla P. Gomes, and Bart Selman. Uniform solution sampling using a constraint solver as an oracle. In UAI, pages 255-264, 2012.
[5] C.P. Gomes, A. Sabharwal, and B. Selman. Near-uniform sampling of combinatorial spaces using XOR constraints. In NIPS-2006, pages 481-488, 2006.
[6] S. Chakraborty, K. Meel, and M. Vardi. A scalable and nearly uniform generator of SAT witnesses. In CAV-2013, 2013.
[7] Vibhav Gogate and Pedro Domingos. Approximation by quantization. In UAI, pages 247-255, 2011.
[8] Radford M Neal. Slice sampling. Annals of statistics, pages 705-741, 2003.
[9] Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In ICML, 2013.
[10] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[11] S. Vadhan. Pseudorandomness. Foundations and Trends in Theoretical Computer Science, 2011.
[12] O. Goldreich. Randomized methods in computation. Lecture Notes, 2011.
[13] D. Allouche, S. de Givry, and T. Schiex. Toulbar2, an open source exact cost function network solver. Technical report, INRIA, 2010.
[14] IBM ILOG. IBM ILOG CPLEX Optimization Studio 12.3, 2011.
[15] Carla P. Gomes, Willem Jan van Hoeve, Ashish Sabharwal, and Bart Selman. Counting CSP solutions using generalized XOR constraints. In AAAI, 2007.
[16] Steffen L Lauritzen and David J Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological), pages 157-224, 1988.
[17] Yehuda Naveh, Michal Rimon, Itai Jaeger, Yoav Katz, Michael Vinov, Eitan s Marcu, and Gil Shurek. Constraint-based random stimuli generation for hardware verification. AI magazine, 28(3):13, 2007.
[18] Clark Barrett, Aaron Stump, and Cesare Tinelli. The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org, 2010.
[19] Patrice Godefroid, Michael Y Levin, David Molnar, et al. Automated whitebox fuzz testing. In NDSS, 2008.
[20] Patrice Godefroid, Michael Y. Levin, and David Molnar. Sage: Whitebox fuzzing for security testing. Queue, 10(1):20:20-20:27, January 2012. ISSN 1542-7730.
[21] M. Soos, K. Nohl, and C. Castelluccia. Extending SAT solvers to cryptographic problems. In SAT-2009. Springer, 2009.
[22] Robert J Bayardo and Joseph Daniel Pehoushek. Counting models using connected components. In AAAI-2000, pages 157-162, 2000.
-----0
J. Cheng and M. Druzdzel. AIS-BN: An adaptive importance sampling algorithm for evidential reasoning in large bayesian networks. Journal of Artificial Intelligence Research, 2000.
H. Haario, M. Laine, A. Mira, and E. Saksman. DRAM: efficient adaptive MCMC. Statistics and Computing, 16(4):339-354, 2006.
L. D. Hernandez, S. Moral, and A. Salmeron. A Monte Carlo algorithm for probabilistic propagation in belief networks based on importance sampling and stratified simulation techniques. International Journal of Approximate Reasoning, 18(1):53-91, 1998.
B. K. Horn. Understanding image intensities. Artificial intelligence, 8(2):201-231, 1977.
R. Mateescu, K. Kask, V. Gogate, and R. Dechter. Join-graph propagation algorithms. Journal of Artificial Intelligence Research, 37(1):279-328, 2010.
Q. Morris. Recognition networks for approximate inference in BN20 networks. Morgan Kaufmann Publishers Inc., Aug. 2001.
L. E. Ortiz and L. P. Kaelbling. Adaptive importance sampling for estimation in structured domains. In Proc. of the 16th Ann. Conf. on Uncertainty in A.I. (UAI-00), pages 446-454. Morgan Kaufmann Publishers, 2000.
M. Plummer et al. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. URL http://citeseer.ist.psu.edu/plummer03jags.html, 2003.
G. Roberts and J. Rosenthal. Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367, 2009.
A. Salmeron, A. Cano, and S. Moral. Importance sampling in Bayesian networks using probability trees. Computational Statistics and Data Analysis, 34(4):387-413, Oct. 2000.
A. N. Sanborn, V. K. Mansinghka, and T. L. Griffiths. Reconciling intuitive physics and Newtonian mechanics for colliding objects. Psychological Review, 120(2):411, Apr. 2013.
R. D. Shachter and M. A. Peot. Simulation approaches to general probabilistic inference on belief networks. In Proc. of the 5th Ann. Conf. on Uncertainty in A.I. (UAI-89), pages 311-318, New York, NY, 1989. Elsevier Science.
K. Watanabe and S. Shimojo. When sound affects vision: effects of auditory grouping on visual motion perception. Psychological Science, 12(2):109-116, 2001.
D. Wingate and T. Weber. Automated variational inference in probabilistic programming. arXiv preprint arXiv:1301.1299, 2013.
H. Yu and R. A. Van Engelen. Refractor importance sampling. arXiv preprint arXiv:1206.3295, 2012.
C. Yuan and M. J. Druzdzel. Importance sampling in Bayesian networks: An influence-based approximation strategy for importance functions. arXiv preprint arXiv:1207.1422, 2012.
-----1
[1] Michalis K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. JMLR WC&P, 5:567-574, 2009.
[2] Marc Deisenroth and Shakir Mohamed. Expectation propagation in Gaussian process dynamical systems. In P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2618-2626. 2012.
[3] Jonathan Ko and Dieter Fox. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1):75-90, July 2009.
[4] Cedric Archambeau, Manfred Opper, Yuan Shen, Dan Cornford, and John Shawe-Taylor. Variational inference for diffusion processes. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 17-24. MIT Press, Cambridge, MA, 2008.
[5] Jose Bento Ayres Pereira, Morteza Ibrahimi, and Andrea Montanari. Learning networks of stochastic differential equations. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 172-180. 2010.
[6] Danilo J. Rezende, Daan Wierstra, and Wulfram Gerstner. Variational learning for recurrent spiking networks. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 136-144. 2011.
[7] Simon Lyons, Amos Storkey, and Simo Sarkka. The coloured noise expansion and parameter estimation of diffusion processes. In P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1961-1969. 2012.
[8] Omiros Papaspiliopoulos, Yvo Pokern, Gareth O. Roberts, and Andrew M. Stuart. Nonparametric estimation of diffusions: a differential equations approach. Biometrika, 99(3):511-531, 2012.
[9] Yvo Pokern, Andrew M. Stuart, and J.H. van Zanten. Posterior consistency via precision operators for Bayesian nonparametric drift estimation in SDEs. Stochastic Processes and their Applications, 123(2):603-628, 2013.
[10] P. E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer, New York, corrected edition, June 2011.
[11] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[12] Lehel Csato, Manfred Opper, and Ole Winther. TAP Gibbs free energy, belief propagation and sparsity. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 657-663. MIT Press, 2002.
[13] C. W. Gardiner. Handbook of Stochastic Methods. Springer, Berlin, second edition, 1996.
[14] Manfred Opper, Andreas Ruttor, and Guido Sanguinetti. Approximate inference in continuous time Gaussian-jump processes. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1831-1839. 2010.
[15] Florian Stimberg, Manfred Opper, and Andreas Ruttor. Bayesian inference for change points in dynamical systems with reusable states - a Chinese restaurant process approach. JMLR WC&P, 22:1117-1124, 2012.
[16] Frank Kwasniok. Analysis and modelling of glacial climate transitions using simple dynamical systems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1991), 2013.
-----1
[1] C. Antoniak. Mixtures of dirichlet processes with applications to bayesian nonparametric problems. The Annals of Statistics, 2(6):1152-1174, 1974.
[2] Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman. Julia: A fast dynamic language for technical computing. CoRR, abs/1209.5145, 2012.
[3] David Blei, Andrew Ng, and Michael Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[4] David M. Blei and Michael I. Jordan. Variational methods for the Dirichlet process. In Proc. of ICML'04, 2004.
[5] Michael Bryant and Erik Sudderth. Truly nonparametric online variational inference for hierarchical dirichlet processes. In Proc. of NIPS'12, 2012.
[6] David B. Dahl. Sequentially-allocated merge-split sampler for conjugate and nonconjugate dirichlet process mixture models, 2005.
[7] Nils Lid Hjort, Chris Holmes, Peter Muller, and Stephen G. Walker. Bayesian Nonparametrics: Principles and Practice. Cambridge University Press, 2010.
[8] Matt Hoffman, David M. Blei, Chong Wang, and John Paisley. Stochastic variational inference. arXiv eprints, 1206.7501, 2012.
[9] Michael C. Hughes, Emily B. Fox, and Erik B. Sudderth. Effective split-merge monte carlo methods for nonparametric models of sequential data. 2012.
[10] Lancelot F. James, Antonio Lijoi, and Igor Prünster. Posterior analysis for normalized random measures with independent increments. Scandinavian Journal of Statistics, 36:76-97, 2009.
[11] Kenichi Kurihara, Max Welling, and Yee Whye Teh. Collapsed variational dirichlet process mixture models. In Proc. of IJCAI'07, 2007.
[12] Radford M. Neal. Markov Chain Sampling Methods for Dirichlet Process Mixture Models. Journal of computational and graphical statistics, 9(2):249-265, 2000.
[13] David J. Nott, Xiaole Zhang, Christopher Yau, and Ajay Jasra. A sequential algorithm for fast fitting of dirichlet process mixture models. In Arxiv: 1301.2897, 2013.
[14] Ian Porteous, Alex Ihler, Padhraic Smyth, and Max Welling. Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick-breaking Representation. In Proc. of UAI'06, 2006.
[15] Carl Edward Rasmussen. The Infinite Gaussian Mixture Model. In Proc. of NIPS'00, 2000.
[16] Jayaram Sethuraman. A constructive definition of dirichlet priors. Statistica Sinica, 4:639-650, 1994.
[17] S. Jain and R.M. Neal. A split-merge markov chain monte carlo procedure for the dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13(1):158-182, 2004.
[18] Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101(476):1566-1581, 2007.
[19] Y.W. Teh, K. Kurihara, and Max Welling. Collapsed Variational Inference for HDP. In Proc. of NIPS'07, volume 20, 2007.
[20] Chong Wang and David Blei. A split-merge mcmc algorithm for the hierarchical dirichlet process. arXiv eprints, 1201.1657, 2012.
[21] Chong Wang and David Blei. Truncation-free stochastic variational inference for bayesian nonparametric models. In Proc. of NIPS'12, 2012.
[22] Chong Wang and David M Blei. Variational Inference for the Nested Chinese Restaurant Process. In Proc. of NIPS'09, 2009.
[23] Chong Wang, John Paisley, and David Blei. Online variational inference for the hierarchical dirichlet process. In AISTATS'11, 2011.
[24] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In Proc. of CVPR'10, 2010.
-----1
[1] M. Hoffman, D. Blei, C. Wang, and J. Paisley. Stochastic variational inference. JMLR, 14:1303-1347, 2013.
[2] R. Ranganath, C. Wang, D. Blei, and E. Xing. An adaptive learning rate for stochastic variational inference. In ICML, 2013.
[3] P. Gopalan, D. M. Mimno, S. Gerrish, M. J. Freedman, and D. M. Blei. Scalable inference of overlapping communities. In NIPS, 2012.
[4] M. Bryant and E. Sudderth. Truly nonparametric online variational inference for hierarchical Dirichlet processes. In NIPS, 2012.
[5] S. Jain and R.M. Neal. A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13(1):158-182, 2004.
[6] D. B. Dahl. Sequentially-allocated merge-split sampler for conjugate and nonconjugate Dirichlet process mixture models. Submitted to Journal of Computational and Graphical Statistics, 2005.
[7] D. M. Blei and M. I. Jordan. Variational inference for Dirichlet process mixture models. Bayesian Analysis, 1(1):121-144, 2006.
[8] Y. W. Teh, K. Kurihara, and M. Welling. Collapsed variational inference for HDP. In NIPS, 2008.
[9] K. Kurihara, M. Welling, and N. Vlassis. Accelerated variational Dirichlet process mixtures. In NIPS, 2006.
[10] R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in graphical models, 1999.
[11] O. Papaspiliopoulos and G. O. Roberts. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika, 95(1):169-186, 2008.
[12] N. Goodman, V. Mansinghka, D. M. Roy, K. Bonawitz, and J. Tenenbaum. Church: A language for generative models. In Uncertainty in Artificial Intelligence, 2008.
[13] N. Le Roux, M. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In NIPS, 2012.
[14] N. Ueda and Z. Ghahramani. Bayesian model search for mixture models based on optimizing variational bounds. Neural Networks, 15(1):1223-1241, 2002.
[15] C. Wang and D. Blei. Truncation-free stochastic variational inference for Bayesian nonparametric models. In NIPS, 2012.
[16] N. Ueda, R. Nakano, Z. Ghahramani, and G. Hinton. SMEM algorithm for mixture models. Neural Computation, 12(9):2109-2128, 2000.
[17] D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In ACM-SIAM Symposium on Discrete Algorithms, pages 1027-1035, 2007.
[18] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.
[19] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In ICCV, 2011.
[20] D. Zoran and Y. Weiss. Natural images, Gaussian mixtures and dead leaves. In NIPS, 2012.
-----1
[1] J. Andrew Bagnell, Andrew Y. Ng, and Jeff G. Schneider. Solving uncertain markov decision processes. Technical report, Carnegie Mellon University, 2001.
[2] Sebastien Bubeck, Remi Munos, and Gilles Stoltz. Pure exploration in multi-armed bandits problems. In Proceedings of the 20th international conference on Algorithmic learning theory, Algorithmic Learning Theory, 2009.
[3] Robert Givan, Sonia Leach, and Thomas Dean. Bounded-parameter markov decision processes. Artificial Intelligence, 122, 2000.
[4] G. Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30, 2004.
[5] Michael Kearns, Yishay Mansour, and Andrew Y. Ng. A sparse sampling algorithm for near-optimal planning in large markov decision processes. Machine Learning, 49, 2002.
[6] Shie Mannor, Ofir Mebel, and Huan Xu. Lightning does not strike twice: Robust MDPs with coupled uncertainty. In International Conference on Machine Learning (ICML), 2012.
[7] Andrew Mastin and Patrick Jaillet. Loss bounds for uncertain transition probabilities in markov decision processes. In IEEE Annual Conference on Decision and Control (CDC), 2012.
[8] Arnab Nilim and Laurent El Ghaoui. Robust control of markov decision processes with uncertain transition matrices. Operations Research, 2005.
[9] Joelle Pineau, Geoffrey J. Gordon, and Sebastian Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence, 2003.
[10] Martin Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, 1994.
[11] Kevin Regan and Craig Boutilier. Regret-based reward elicitation for markov decision processes. In Uncertainty in Artificial Intelligence, 2009.
[12] Kevin Regan and Craig Boutilier. Robust policy computation in reward-uncertain MDPs using nondominated policies. In National Conference on Artificial Intelligence (AAAI), 2010.
[13] Leonard Savage. The Foundations of Statistics. Wiley, 1954.
[14] A. Shapiro. Monte carlo sampling methods. In Stochastic Programming, volume 10 of Handbooks in Operations Research and Management Science. Elsevier, 2003.
[15] Wolfram Wiesemann, Daniel Kuhn, and Berç Rustem. Robust markov decision processes. Mathematics of Operations Research, 38(1), 2013.
[16] Huan Xu and Shie Mannor. Parametric regret in uncertain markov decision processes. In IEEE Conference on Decision and Control, CDC, 2009.
-----1
[1] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[2] J. Fearnley. Exponential lower bounds for policy iteration. In Proceedings of the 37th international colloquium conference on Automata, languages and programming: Part II, ICALP'10, pages 551-562, Berlin, Heidelberg, 2010. Springer-Verlag.
[3] T.D. Hansen, P.B. Miltersen, and U. Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. J. ACM, 60(1):1:1-1:16, February 2013.
[4] T.D. Hansen and U. Zwick. Lower bounds for Howard's algorithm for finding minimum mean-cost cycles. In ISAAC (1), pages 415-426, 2010.
[5] R. Hollanders, J.C. Delvenne, and R. Jungers. The complexity of policy iteration is exponential for discounted markov decision processes. In 51st IEEE conference on Decision and control (CDC'12), 2012.
[6] Y. Mansour and S.P. Singh. On the complexity of policy iteration. In UAI, pages 401-408, 1999.
[7] M. Melekopoglou and A. Condon. On the complexity of the policy improvement algorithm for markov decision processes. INFORMS Journal on Computing, 6(2):188-192, 1994.
[8] I. Post and Y. Ye. The simplex method is strongly polynomial for deterministic markov decision processes. Technical report, arXiv:1208.5083v2, 2012.
[9] M. Puterman. Markov Decision Processes. Wiley, New York, 1994.
[10] N. Schmitz. How good is Howard's policy improvement algorithm? Zeitschrift für Operations Research, 29(7):315-316, 1985.
[11] Y. Ye. The simplex and policy-iteration methods are strongly polynomial for the markov decision problem with a fixed discount rate. Math. Oper. Res., 36(4):593-603, 2011.
-----1
[1] Yasin Abbasi-Yadkori and Csaba Szepesvari. Regret bounds for the adaptive control of linear quadratic systems. Journal of Machine Learning Research - Proceedings Track, 19:1-26, 2011.
[2] Peter Auer and Ronald Ortner. Logarithmic online regret bounds for undiscounted reinforcement learning. In NIPS, pages 49-56, 2006.
[3] Mohammad Gheshlaghi Azar, Alessandro Lazaric, and Emma Brunskill. Regret bounds for reinforcement learning with policy advice. CoRR, abs/1305.1027, 2013.
[4] Peter L. Bartlett and Ambuj Tewari. REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI2009), pages 35-42, June 2009.
[5] Dimitri P. Bertsekas and John Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, September 1996.
[6] Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
[7] Geoffrey Gordon. Online fitted reinforcement learning. In Advances in Neural Information Processing Systems 8, pages 1052-1058. MIT Press, 1995.
[8] Morteza Ibrahimi, Adel Javanmard, and Benjamin Van Roy. Efficient reinforcement learning for high dimensional linear quadratic systems. In NIPS, 2012.
[9] Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563-1600, 2010.
[10] Sham Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, 2003.
[11] Narendra Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373-396, 1984.
[12] Michael J. Kearns and Daphne Koller. Efficient reinforcement learning in factored MDPs. In IJCAI, pages 740-747, 1999.
[13] Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
[14] Tor Lattimore, Marcus Hutter, and Peter Sunehag. The sample-complexity of general reinforcement learning. In ICML, 2013.
[15] Lihong Li and Michael Littman. Reducing reinforcement learning to kwik online regression. Annals of Mathematics and Artificial Intelligence, 2010.
[16] Lihong Li, Michael L. Littman, and Thomas J. Walsh. Knows what it knows: a framework for self-aware learning. In ICML, pages 568-575, 2008.
[17] Ronald Ortner and Daniil Ryabko. Online regret bounds for undiscounted continuous reinforcement learning. In NIPS, 2012.
[18] Warren Powell and Ilya Ryzhov. Optimal Learning. John Wiley and Sons, 2011.
[19] G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical report, 1994.
[20] Daniel Russo and Benjamin Van Roy. Learning to optimize via posterior sampling. CoRR, abs/1301.2609, 2013.
[21] Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan. Reinforcement learning with soft state aggregation. In NIPS, pages 361-368, 1994.
[22] Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman. PAC model-free reinforcement learning. In Proceedings of the 23rd international conference on Machine learning, pages 881-888, 2006.
[23] Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, March 1998.
[24] Csaba Szepesvari. Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2010.
[25] John N. Tsitsiklis and Benjamin Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22(1-3):59-94, 1996.
[26] Benjamin Van Roy. Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res., 31(2):234-244, 2006.
-----1
[1] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite time analysis of multiarmed bandit problems. Machine Learning, 47:235-256, 2002.
[2] L. Busoniu and R. Munos. Optimistic planning for Markov decision processes. In International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W & CP 22, pages 182-189, 2012.
[3] E. F. Camacho and C. Bordons. Model Predictive Control. Springer, 2004.
[4] R. Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. Computers and Games, pages 72-83, 2007.
[5] B. Defourny, D. Ernst, and L. Wehenkel. Lazy planning under uncertainty by optimizing decisions on an ensemble of incomplete disturbance trees. In Recent Advances in Reinforcement Learning - European Workshop on Reinforcement Learning (EWRL), pages 1-14, 2008.
[6] R. Fonteneau, L. Busoniu, and R. Munos. Optimistic planning for belief-augmented Markov decision processes. In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2013.
[7] S. Gelly, Y. Wang, R. Munos, and O. Teytaud. Modification of UCT with patterns in Monte-Carlo go. Technical report, INRIA RR-6062, 2006.
[8] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. Systems Science and Cybernetics, IEEE Transactions on, 4(2):100-107, 1968.
[9] J. F. Hren and R. Munos. Optimistic planning of deterministic systems. Recent Advances in Reinforcement Learning, pages 151-164, 2008.
[10] J. E. Ingersoll. Theory of Financial Decision Making. Rowman and Littlefield Publishers, Inc., 1987.
[11] M. Kearns, Y. Mansour, and A. Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning, 49(2-3):193-208, 2002.
[12] L. Kocsis and C. Szepesvári. Bandit based Monte-Carlo planning. Machine Learning: ECML 2006, pages 282-293, 2006.
[13] R. Munos. From bandits to Monte-Carlo Tree Search: The optimistic principle applied to optimization and planning. To appear in Foundations and Trends in Machine Learning, 2013.
[14] S. A. Murphy. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society, Series B, 65(2):331-366, 2003.
[15] J. Peters, S. Vijayakumar, and S. Schaal. Reinforcement learning for humanoid robotics. In IEEE-RAS International Conference on Humanoid Robots, pages 1-20, 2003.
[16] R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, 1998.
[17] T. J. Walsh, S. Goschin, and M. L. Littman. Integrating sample-based planning and model-based reinforcement learning. In AAAI Conference on Artificial Intelligence, 2010.
-----1
[1] Abernethy, J., Hazan, E., and Rakhlin, A. (2008). Competing in the dark: An efficient algorithm for bandit linear optimization. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), pages 263-274.
[2] Audibert, J. Y., Bubeck, S., and Lugosi, G. (2013). Regret in online combinatorial optimization. Mathematics of Operations Research. To appear.
[3] Bartok, G., Pal, D., Szepesvari, C., and Szita, I. (2011). Online learning. Lecture notes, University of Alberta. https://moodle.cs.ualberta.ca/file.php/354/notes.pdf.
[4] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
[5] Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA.
[6] Daniel, C., Neumann, G., and Peters, J. (2012). Hierarchical relative entropy policy search. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 of JMLR Workshop and Conference Proceedings, pages 273-281.
[7] Dekel, O. and Hazan, E. (2013). Better rates for any adversarial deterministic mdp. In Dasgupta, S. and McAllester, D., editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 675-683. JMLR Workshop and Conference Proceedings.
[8] Even-Dar, E., Kakade, S. M., and Mansour, Y. (2005). Experts in a Markov decision process. In NIPS-17, pages 401-408.
[9] Even-Dar, E., Kakade, S. M., and Mansour, Y. (2009). Online Markov decision processes. Mathematics of Operations Research, 34(3):726-736.
[10] Gyorgy, A., Linder, T., Lugosi, G., and Ottucsak, Gy. (2007). The on-line shortest path problem under partial monitoring. Journal of Machine Learning Research, 8:2369-2403.
[11] Kakade, S. (2001). A natural policy gradient. In Advances in Neural Information Processing Systems 14 (NIPS), pages 1531-1538.
[12] Koolen, W. M., Warmuth, M. K., and Kivinen, J. (2010). Hedging structured concepts. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT), pages 93-105.
[13] Martinet, B. (1970). Regularisation d'inequations variationnelles par approximations successives. ESAIM: Mathematical Modelling and Numerical Analysis - Modelisation Mathematique et Analyse Numerique, 4(R3):154-158.
[14] Neu, G., Gyorgy, A., and Szepesvari, Cs. (2010a). The online loop-free stochastic shortest-path problem. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT), pages 231-243.
[15] Neu, G., Gyorgy, A., and Szepesvari, Cs. (2012). The adversarial stochastic shortest path problem with unknown transition probabilities. In AISTATS 2012, pages 805-813.
[16] Neu, G., Gyorgy, A., Szepesvari, Cs., and Antos, A. (2010b). Online Markov decision processes under bandit feedback. In NIPS-23, pages 1804-1812.
[17] Peters, J., Mulling, K., and Altun, Y. (2010). Relative entropy policy search. In AAAI 2010, pages 1607-1612.
[18] Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience.
[19] Rakhlin, A. (2009). Lecture notes on online learning.
[20] Rockafellar, R. T. (1976). Monotone Operators and the Proximal Point Algorithm. SIAM Journal on Control and Optimization, 14(5):877-898.
[21] Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press.
[22] Szepesvari, Cs. (2010). Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers.
[23] Yu, J. Y., Mannor, S., and Shimkin, N. (2009). Markov decision processes with arbitrary reward processes. Mathematics of Operations Research, 34(3):737-757.
[24] Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, pages 928-936.
-----1
[1] Nicolò Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.
[2] Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997.
[3] T. Jaksch, R. Ortner, and P. Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563-1600, 2010.
[4] P. L. Bartlett and A. Tewari. REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. In UAI, 2009.
[5] Yasin Abbasi-Yadkori and Csaba Szepesvari. Regret bounds for the adaptive control of linear quadratic systems. In COLT, 2011.
[6] Yasin Abbasi-Yadkori. Online Learning for Linearly Parametrized Control Problems. PhD thesis, University of Alberta, 2012.
[7] Ronald Ortner and Daniil Ryabko. Online regret bounds for undiscounted continuous reinforcement learning. In NIPS, 2012.
[8] Eyal Even-Dar, Sham M. Kakade, and Yishay Mansour. Experts in a Markov decision process. In NIPS, 2004.
[9] Jia Yuan Yu and Shie Mannor. Arbitrarily modulated Markov decision processes. In IEEE Conference on Decision and Control, 2009.
[10] Jia Yuan Yu and Shie Mannor. Online learning in Markov decision processes with arbitrarily changing rewards and transitions. In GameNets, 2009.
[11] Gergely Neu, Andras Gyorgy, and Csaba Szepesvari. The adversarial stochastic shortest path problem with unknown transition probabilities. In AISTATS, 2012.
[12] Eyal Even-Dar, Sham M. Kakade, and Yishay Mansour. Online Markov decision processes. Mathematics of Operations Research, 34(3):726-736, 2009.
[13] Eyal Even-Dar. Personal communication., 2013.
[14] Gergely Neu, Andras Gyorgy, Csaba Szepesvari, and Andras Antos. Online Markov decision processes under bandit feedback. In NIPS, 2010.
[15] Vladimir Vovk. Aggregating strategies. In COLT, pages 372-383, 1990.
[16] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212-261, 1994.
[17] Sascha Geulen, Berthold Vocking, and Melanie Winkler. Regret minimization for online buffering problems using the weighted majority algorithm. In COLT, 2010.
[18] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291-307, 2005.
[19] Gergely Neu, Andras Gyorgy, and Csaba Szepesvari. The online loop-free stochastic shortest path problem. In COLT, 2010.
[20] Adam Tauman Kalai, Yishay Mansour, and Elad Verbin. On agnostic boosting and parity learning. In STOC, pages 629-638, 2008.
[21] Oded Regev. On lattices, learning with errors, random linear codes, and cryptography. In STOC, pages 84-93, 2005.
[22] Gergely Neu, Andras Gyorgy, and Csaba Szepesvari. The adversarial stochastic shortest path problem with unknown transition probabilities. In AISTATS, 2012.
[23] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 2002.
[24] V. Dani, T. P. Hayes, and S. M. Kakade. Stochastic linear optimization under bandit feedback. In Rocco Servedio and Tong Zhang, editors, COLT, pages 355-366, 2008.
[25] Yasin Abbasi-Yadkori, David Pal, and Csaba Szepesvari. Improved algorithms for linear stochastic bandits. In NIPS, 2011.
[26] Nick Littlestone. From on-line to batch learning. In COLT, pages 269-284, 1989.
[27] Varun Kanade and Thomas Steinke. Learning hurdles for sleeping experts. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS '12, pages 11-18, 2012.
-----1
[1] M. H. DeGroot, Reaching a consensus, Journal of the American Statistical Association, vol. 69, no. 345, pp. 118-121, 1974.
[2] A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi, Non-bayesian social learning, Games and Economic Behavior, vol. 76, no. 1, pp. 210-225, 2012.
[3] E. Mossel and O. Tamuz, Efficient bayesian learning in social networks with gaussian estimators, arXiv preprint arXiv:1002.0747, 2010.
[4] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, Optimal distributed online prediction using mini-batches, The Journal of Machine Learning Research, vol. 13, pp. 165-202, 2012.
[5] L. Xiao, S. Boyd, and S. Lall, A scheme for robust distributed sensor fusion based on average consensus, in Fourth International Symposium on Information Processing in Sensor Networks. IEEE, 2005, pp. 63-70.
[6] S. Kar, J. M. Moura, and K. Ramanan, Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication, IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3575-3605, 2012.
[7] S. Shahrampour and A. Jadbabaie, Exponentially fast parameter estimation in networks using distributed dual averaging, arXiv preprint arXiv:1309.2350, 2013.
[8] D. Acemoglu, A. Nedic, and A. Ozdaglar, Convergence of rule-of-thumb learning rules in social networks, in 47th IEEE Conference on Decision and Control, 2008, pp. 1714-1720.
[9] R. M. Frongillo, G. Schoenebeck, and O. Tamuz, Social learning in a changing world, in Internet and Network Economics. Springer, 2011, pp. 146-157.
[10] U. A. Khan, S. Kar, A. Jadbabaie, and J. M. Moura, On connectivity, observability, and stability in distributed estimation, in 49th IEEE Conference on Decision and Control, 2010, pp. 6639-6644.
[11] R. Olfati-Saber, Distributed kalman filtering for sensor networks, in 46th IEEE Conference on Decision and Control, 2007, pp. 5492-5498.
[12] N. Cesa-Bianchi, Prediction, learning, and games. Cambridge University Press, 2006.
[13] J. C. Duchi, A. Agarwal, and M. J. Wainwright, Dual averaging for distributed optimization: convergence analysis and network scaling, IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592-606, 2012.
[14] R. E. Kalman et al., A new approach to linear filtering and prediction problems, Journal of basic Engineering, vol. 82, no. 1, pp. 35-45, 1960.
[15] M. Mesbahi and M. Egerstedt, Graph theoretic methods in multiagent networks. Princeton University Press, 2010.
[16] J. A. Tropp, User-friendly tail bounds for sums of random matrices, Foundations of Computational Mathematics, vol. 12, no. 4, pp. 389-434, 2012.
[17] G. Biau, K. Bleakley, L. Gyorfi, and G. Ottucsak, Nonparametric sequential prediction of time series, Journal of Nonparametric Statistics, vol. 22, no. 3, pp. 297-317, 2010.
[18] L. Gyorfi and G. Ottucsak, Sequential prediction of unbounded stationary time series, IEEE Transactions on Information Theory, vol. 53, no. 5, pp. 1866-1872, 2007.
[19] L. Gyorfi, G. Lugosi et al., Strategies for sequential prediction of stationary time series. Springer, 2000.
-----1
[1] L. A. Adamic and N. Glance. The political blogosphere and the 2004 U.S. election: divided they blog. In LinkKDD, LinkKDD '05, pages 36-43, New York, NY, USA, 2005. ACM.
[2] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. J. Mach. Learn. Res., 9:1981-2014, June 2008.
[3] S. Amari. Differential geometry of curved exponential Families-Curvatures and information loss. The Annals of Statistics, 10(2):357-385, June 1982.
[4] B. Ball, B. Karrer, and M. E. J. Newman. Efficient and principled method for detecting communities in networks. Physical Review E, 84(3):036103, Sept. 2011.
[5] J. Davis and M. Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd international conference on Machine learning, ICML '06, pages 233-240, New York, NY, USA, 2006. ACM.
[6] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75-174, Feb. 2010.
[7] T. M. J. Fruchterman and E. M. Reingold. Graph drawing by force-directed placement. Softw. Pract. Exper., 21(11):1129-1164, Nov. 1991.
[8] S. Geisser and W. Eddy. A predictive approach to model selection. Journal of the American Statistical Association, 74:153-160, 1979.
[9] P. K. Gopalan and D. M. Blei. Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences, 110(36):14534-14539, 2013.
[10] P. Hoff, A. Raftery, and M. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090-1098, 2002.
[11] M. Hoffman, D. M. Blei, C. Wang, and J. Paisley. Stochastic variational inference. Journal of Machine Learning Research, 2013.
[12] H. Jeong, Z. Néda, and A. L. Barabási. Measuring preferential attachment in evolving networks. EPL (Europhysics Letters), 61(4):567, 2003.
[13] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Mach. Learn., 37(2):183-233, Nov. 1999.
[14] B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Phys. Rev. E, 83:016107, Jan 2011.
[15] D. I. Kim, P. Gopalan, D. M. Blei, and E. B. Sudderth. Efficient online inference for bayesian nonparametric relational models. In Neural Information Processing Systems, 2013.
[16] P. N. Krivitsky, M. S. Handcock, A. E. Raftery, and P. D. Hoff. Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks, 31(3):204-213, July 2009.
[17] A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 80(1):016118, July 2009.
[18] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. In Internet Mathematics, 2008.
[19] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Proceedings of the twelfth international conference on Information and knowledge management, CIKM '03, pages 556-559, New York, NY, USA, 2003. ACM.
[20] P. McCullagh and J. A. Nelder. Generalized Linear Models, Second Edition. Chapman and Hall/CRC, 2 edition, Aug. 1989.
[21] M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89(20):208701, Oct. 2002.
[22] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.
[23] K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077-1087, Sept. 2001.
[24] F. Papadopoulos, M. Kitsak, M. Á. Serrano, M. Boguñá, and D. Krioukov. Popularity versus similarity in growing networks. Nature, 489(7417):537-540, Sept. 2012.
[25] RITA. U.S. Air Carrier Traffic Statistics, Bur. Trans. Stats, 2010.
[26] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400-407, Sept. 1951.
[27] Y. J. Wang and G. Y. Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8-19, 1987.
-----1
[1] E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981-2014, 2008.
[2] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276, 1998.
[3] L. Bottou. Stochastic learning. Advanced Lectures on Machine Learning, pages 146-168, 2004.
[4] M. Carman, F. Crestani, M. Harvey, and M. Baillie. Towards query log based personalization using topic models. In Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10), pages 1849-1852, 2010.
[5] P. Gopalan, D. Mimno, S. Gerrish, M. Freedman, and D. Blei. Scalable inference of overlapping communities. In Advances in Neural Information Processing Systems 25, pages 2258-2266. 2012.
[6] M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360-1380, 1973.
[7] Q. Ho, A. Parikh, and E. Xing. A multiscale community blockmodel for network exploration. Journal of the American Statistical Association, 107(499), 2012.
[8] Q. Ho, J. Yin, and E. Xing. On triangular versus edge representations - towards scalable modeling of networks. In Advances in Neural Information Processing Systems 25, pages 2141-2149. 2012.
[9] P. Hoff, A. Raftery, and M. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090-1098, 2002.
[10] M. Hoffman, D. Blei, C. Wang, and J. Paisley. Stochastic variational inference. Journal of Machine Learning Research, 14:1303-1347, 2013.
[11] D. Hunter, S. Goodreau, and M. Handcock. Goodness of fit of social network models. Journal of the American Statistical Association, 103(481):248-258, 2008.
[12] A. Lancichinetti, S. Fortunato, and J. Kertész. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015+, 2009.
[13] Y. Low, D. Agarwal, and A. Smola. Multiple domain user personalization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '11), pages 123-131, 2011.
[14] K. Miller, T. Griffiths, and M. Jordan. Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems 22, pages 1276-1284. 2009.
[15] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824-827, 2002.
[16] M. Morris, M. Handcock, and D. Hunter. Specification of exponential-family random graph models: Terms and computational aspects. Journal of Statistical Software, 24(4), 2008.
[17] M. Newman, S. Strogatz, and D. Watts. Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64(2), 2001.
[18] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400-407, 1951.
[19] P. Sarkar and A. Moore. Dynamic social network analysis using latent space models. ACM SIGKDD Explorations Newsletter, 7(2):31-40, 2005.
[20] M. Sato. Online model selection based on the variational Bayes. Neural Computation, 13(7):1649-1681, 2001.
[21] G. Simmel and K. Wolff. The Sociology of Georg Simmel. Free Press, 1950.
[22] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[23] J. Xie, S. Kelley, and B. Szymanski. Overlapping community detection in networks: the state of the art and comparative study. ACM Computing Surveys, 45(4), 2013.
[24] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. ACM, 2012.
-----1
[1] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local svm approach. In ICPR, 2004.
[2] M. Rodriguez, J. Ahmed, and M. Shah. Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In CVPR, 2008.
[3] Ucf50 action dataset. http://vision.eecs.ucf.edu/data/ucf50.rar.
[4] Y.W. Fu, T.M. Hospedales, T. Xiang, and S.G. Gong. Attribute learning for understanding unstructured social activity. In ECCV, 2012.
[5] R. Salakhutdinov and G.E. Hinton. Replicated softmax: an undirected topic model. In NIPS, 2009.
[6] M.E. Tipping. Sparse bayesian learning and the relevance vector machine. JMLR, 2001.
[7] V. Nair and G.E. Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, 2010.
[8] D. Bohning. Multinomial logistic regression algorithm. AISM, 1992.
[9] P. Turaga, R. Chellappa, V.S. Subrahmanian, and O. Udrea. Machine recognition of human activities: a survey. TCSVT, 2008.
[10] J. Varadarajan, R. Emonet, and J.-M. Odobez. A sequential topic model for mining recurrent activities from long term video logs. IJCV, 2013.
[11] D. Blei, A.Y. Ng, and M.I. Jordan. Latent dirichlet allocation. JMLR, 2003.
[12] J. Zhu, A. Ahmed, and E.P. Xing. Medlda: Maximum margin supervised topic models. JMLR, 2012.
[13] H. Larochelle, M. Mandel, R. Pascanu, and Y. Bengio. Learning algorithms for the classification restricted boltzmann machine. JMLR, 2012.
[14] R.M. Neal. Bayesian learning for neural networks. PhD thesis, University of Toronto, 1995.
[15] G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 2002.
[16] K.P. Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012.
[17] M. Harva and A. Kaban. Variational learning for rectified factor analysis. Signal Processing, 2007.
[18] D.G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[19] I. Laptev. On space-time interest points. IJCV, 2005.
[20] B. Logan. Mel frequency cepstral coefficients for music modeling. ISMIR, 2000.
[21] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector learning for interdependent and structured output spaces. In ICML, 2004.
-----1
[1] F. Niu, B. Recht, C. Re, and S. J. Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Neural Information Processing Systems, 2011.
[2] A. Kleiner, A. Talwalkar, P. Sarkar, and M. Jordan. The big data bootstrap. In International Conference on Machine Learning, 2012.
[3] M. Hoffman, D. M. Blei, and F. Bach. Online learning for latent Dirichlet allocation. In Neural Information Processing Systems, volume 23, pages 856-864, 2010.
[4] M. Hoffman, D. M. Blei, J. Paisley, and C. Wang. Stochastic variational inference. Journal of Machine Learning Research, 14:1303-1347.
[5] C. Wang, J. Paisley, and D. M. Blei. Online variational inference for the hierarchical Dirichlet process. In Artificial Intelligence and Statistics, 2011.
[6] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[7] T. P. Minka. Expectation propagation for approximate Bayesian inference. In Uncertainty in Artificial Intelligence, pages 362-369. Morgan Kaufmann, 2001.
[8] T. P. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, Massachusetts Institute of Technology, 2001.
[9] M. Opper. A Bayesian approach to on-line learning.
[10] K. R Canini, L. Shi, and T. L Griffiths. Online inference of topics with latent Dirichlet allocation. In Artificial Intelligence and Statistics, volume 5, 2009.
[11] A. Honkela and H. Valpola. On-line variational Bayesian learning. In International Symposium on Independent Component Analysis and Blind Signal Separation, pages 803-808, 2003.
[12] J. Luts, T. Broderick, and M. P. Wand. Real-time semiparametric regression. Journal of Computational and Graphical Statistics, to appear. Preprint arXiv:1209.3550.
[13] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[14] T. Minka and J. Lafferty. Expectation-propagation for the generative aspect model. In Uncertainty in Artificial Intelligence, pages 352-359. Morgan Kaufmann, 2002.
[15] Y. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Neural Information Processing Systems, 2006.
[16] A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On smoothing and inference for topic models. In Uncertainty in Artificial Intelligence, 2009.
[17] M. Hoffman. Online inference for LDA (Python code) at http://www.cs.princeton.edu/blei/downloads/onlineldavb.tar, 2010.
[18] R. Ranganath, C. Wang, D. M. Blei, and E. P. Xing. An adaptive learning rate for stochastic variational inference. In International Conference on Machine Learning, 2013.
[19] W. L. Buntine and A. Jakulin. Applying discrete PCA in data analysis. In Uncertainty in Artificial Intelligence.
[20] M. Seeger. Expectation propagation for exponential families. Technical report, University of California at Berkeley, 2005.
-----1
[1] A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. Scalable inference in latent variable models. In International Conference on Web Search and Data Mining (WSDM), 2012.
[2] K. Bache and M. Lichman. UCI machine learning repository, 2013.
[3] D. Blei and J. Lafferty. Correlated topic models. In Advances in Neural Information Processing Systems (NIPS), 2006.
[4] D. Blei and J. Lafferty. Dynamic topic models. In International Conference on Machine Learning (ICML), 2006.
[5] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[6] N. Chen, J. Zhu, F. Xia, and B. Zhang. Generalized relational topic models with data augmentation. In International Joint Conference on Artificial Intelligence (IJCAI), 2013.
[7] M. Hoffman, D. Blei, and F. Bach. Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems (NIPS), 2010.
[8] C. Holmes and L. Held. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1):145-168, 2006.
[9] D. Mimno, H. Wallach, and A. McCallum. Gibbs sampling for logistic normal topic models with graph-based priors. In NIPS Workshop on Analyzing Graphs, 2008.
[10] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed algorithms for topic models. Journal of Machine Learning Research, (10):1801-1828, 2009.
[11] J. Paisley, C. Wang, and D. Blei. The discrete infinite logistic normal distribution for mixed-membership modeling. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[12] N. G. Polson and J. G. Scott. Default bayesian analysis for multi-way tables: a data-augmentation approach. arXiv:1109.4180, 2011.
[13] N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using Polya-Gamma latent variables. arXiv:1205.0310v2, 2013.
[14] C. P. Robert. Simulation of truncated normal variables. Statistics and Computing, 5:121-125, 1995.
[15] W. Rudin. Principles of mathematical analysis. McGraw-Hill Book Co., 1964.
[16] A. Smola and S. Narayanamurthy. An architecture for parallel topic models. Very Large Data Base (VLDB), 2010.
[17] M. A. Tanner and W. H. Wong. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398):528-540, 1987.
[18] D. van Dyk and X. Meng. The art of data augmentation. Journal of Computational and Graphical Statistics, 10(1):1-50, 2001.
[19] L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In International Conference on Knowledge Discovery and Data mining (SIGKDD), 2009.
[20] A. Zhang, J. Zhu, and B. Zhang. Sparse online topic models. In International Conference on World Wide Web (WWW), 2013.
[21] J. Zhu, X. Zheng, and B. Zhang. Improved bayesian supervised topic models with data augmentation. In Annual Meeting of the Association for Computational Linguistics (ACL), 2013.
-----1
[1] Andre Uschmajew. Local convergence of the alternating least squares algorithm for canonical tensor approximation. SIAM Journal on Matrix Analysis and Applications, 33(2):639-652, 2012.
[2] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[3] J. K. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155:945-959, 2000.
[4] J.B. Kruskal. More factors than subjects, tests and treatments: an indeterminacy theorem for canonical decomposition and individual differences scaling. Psychometrika, 41(3):281-293, 1976.
[5] Tamara Kolda and Brett Bader. Tensor decompositions and applications. SIREV, 51(3):455-500, 2009.
[6] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, 2012.
[7] XuanLong Nguyen. Posterior contraction of the population polytope in finite admixture models. arXiv preprint arXiv:1206.0068, 2012.
[8] T. Austin et al. On exchangeable random variables and the statistics of large graphs and hypergraphs. Probab. Surv, 5:80-145, 2008.
[9] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor Methods for Learning Latent Variable Models. Under Review. J. of Machine Learning. Available at arXiv:1210.7559, Oct. 2012.
[10] Elizabeth S. Allman, John A. Rhodes, and Amelia Taylor. A semialgebraic description of the general markov model on phylogenetic trees. Arxiv preprint arXiv:1212.1200, Dec. 2012.
[11] J.B. Kruskal. Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear algebra and its applications, 18(2):95-138, 1977.
[12] A. Anandkumar, D. Hsu, M. Janzamin, and S. Kakade. When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity. Preprint available on arXiv:1308.2853, Aug. 2013.
[13] A. Anandkumar, D. Hsu, A. Javanmard, and S. M. Kakade. Learning Linear Bayesian Networks with Latent Variables. ArXiv e-prints, September 2012.
[14] Daniel A. Spielman, Huan Wang, and John Wright. Exact recovery of sparsely-used dictionaries. ArXiv preprint, abs/1206.5882, 2012.
[15] L. De Lathauwer, J. Castaing, and J.-F Cardoso. Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Tran. on Signal Processing, 55:2965-2973, June 2007.
-----0
Dimitris Achlioptas and Frank McSherry. On spectral learning of mixtures of distributions. In Learning Theory, pages 458-469. Springer, 2005.
Sanjeev Arora and Ravi Kannan. Learning mixtures of arbitrary gaussians. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 247-257. ACM, 2001.
Mikhail Belkin and Kaushik Sinha. Polynomial learning of distribution families. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 103-112. IEEE, 2010.
S Charles Brubaker and Santosh S Vempala. Isotropic pca and affine-invariant clustering. In Building Bridges, pages 241-281. Springer, 2008.
Kamalika Chaudhuri and Satish Rao. Learning mixtures of product distributions using correlations and independence. In COLT, pages 9-20, 2008.
Kamalika Chaudhuri, Sanjoy Dasgupta, and Andrea Vattani. Learning mixtures of gaussians using the k-means algorithm. arXiv preprint arXiv:0912.0086, 2009.
Sanjoy Dasgupta. Learning mixtures of gaussians. In Foundations of Computer Science, 1999. 40th Annual Symposium on, pages 634-644. IEEE, 1999.
Jian Guo, Elizaveta Levina, George Michailidis, and Ji Zhu. Pairwise variable selection for high-dimensional model-based clustering. Biometrics, 66(3):793-804, 2010.
Daniel Hsu and Sham M Kakade. Learning mixtures of spherical gaussians: moment methods and spectral decompositions. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 11-20. ACM, 2013.
Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant. Disentangling gaussians. Communications of the ACM, 55(2):113-120, 2012.
Ravindran Kannan, Hadi Salmasian, and Santosh Vempala. The spectral method for general mixture models. In Learning Theory, pages 444-457. Springer, 2005.
Pascal Massart. Concentration inequalities and model selection. 2007.
Wei Pan and Xiaotong Shen. Penalized model-based clustering with application to variable selection. The Journal of Machine Learning Research, 8:1145-1164, 2007.
Adrian E Raftery and Nema Dean. Variable selection for model-based clustering. Journal of the American Statistical Association, 101(473):168-178, 2006.
Leonard J. Schulman and Sanjoy Dasgupta. A two-round variant of em for gaussian mixtures. In Proc. 16th UAI (Conference on Uncertainty in Artificial Intelligence), pages 152-159, 2000.
Wei Sun, Junhui Wang, and Yixin Fang. Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electronic Journal of Statistics, 6:148-167, 2012.
Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, 2009.
Santosh Vempala and Grant Wang. A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, 68(4):841-860, 2004.
Vincent Q Vu and Jing Lei. Minimax sparse principal subspace estimation in high dimensions. arXiv preprint arXiv:1211.0373, 2012.
Daniela M Witten and Robert Tibshirani. A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 2010.
-----0
S. Balakrishnan, A. Rinaldo, D. Sheehy, A. Singh, and L. Wasserman. Minimax rates for homology inference. AISTATS, 2012.
P. Bickel and B. Li. Local polynomial regression on unknown manifolds. In Technical report, Department of Statistics, UC Berkeley. 2006.
K. Chaudhuri and S. Dasgupta. Rates of convergence for the cluster tree. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 343-351. 2010.
F. Chazal. An upper bound for the volume of geodesic balls in submanifolds of euclidean spaces. Personal Communication, available at http://geometrica.saclay.inria.fr/team/Fred.Chazal/BallVolumeJan2013.pdf, 2013.
A. Cuevas and R. Fraiman. A plug-in approach to support estimation. Annals of Statistics, 25(6):2300-2312, 1997.
A. Cuevas, W. González-Manteiga, and A. Rodríguez-Casal. Plug-in estimation of general level sets. Aust. N.Z. J. Stat., 48(1):7-19, 2006.
S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. In STOC, pages 537-546. 2008.
C. R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman. Minimax manifold estimation. Journal of Machine Learning Research, 13:1263-1291, 2012.
J. A. Hartigan. Consistency of single linkage for high-density clusters. Journal of the American Statistical Association, 76(374):388-394, 1981.
S. Kpotufe and S. Dasgupta. A tree-based regressor that adapts to intrinsic dimension. J. Comput. Syst. Sci., 78(5):1496-1515, 2012.
S. Kpotufe and U. von Luxburg. Pruning nearest neighbor cluster trees. In L. Getoor and T. Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML '11, pages 225-232. ACM, New York, NY, USA, 2011.
M. Maier, M. Hein, and U. von Luxburg. Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters. Theor. Comput. Sci., 410(19):1749-1764, 2009.
P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1-3):419-441, 2008.
W. Polonik. Measuring mass concentrations and estimating density contour clusters: an excess mass approach. Annals of Statistics, 23(3):855-882, 1995.
P. Rigollet and R. Vert. Fast rates for plug-in estimators of density level sets. Bernoulli, 15(4):1154-1178, 2009.
A. Rinaldo, A. Singh, R. Nugent, and L. Wasserman. Stability of density-based clustering. Journal of Machine Learning Research, 13:905-948, 2012.
A. Rinaldo and L. Wasserman. Generalized density clustering. The Annals of Statistics, 38(5):2678-2722, 2010. 0907.3454.
A. Singh, C. Scott, and R. Nowak. Adaptive Hausdorff estimation of density level sets. Ann. Statist., 37(5B):2760-2782, 2009.
B. K. Sriperumbudur and I. Steinwart. Consistency and rates for clustering with dbscan. Journal of Machine Learning Research Proceedings Track, 22:1090-1098, 2012.
I. Steinwart. Adaptive density level set clustering. Journal of Machine Learning Research Proceedings Track, 19:703-738, 2011.
W. Stuetzle. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J. Classification, 20(1):025-047, 2003.
W. Stuetzle and R. Nugent. A generalized single linkage method for estimating the cluster tree of a density. Journal of Computational and Graphical Statistics, 19(2):397-418, 2010.
A. B. Tsybakov. On nonparametric estimation of density level sets. Ann. Statist., 25(3):948-969, 1997.
G. Walther. Granulometric smoothing. Annals of Statistics, 25(6):2273-2299, 1997.
D. Wishart. Mode analysis: a generalization of nearest neighbor which reduces chaining. In Proceedings of the Colloquium on Numerical Taxonomy held in the University of St. Andrews, pages 282-308. 1969.
-----1
[1] A. Agarwal, S. Negahban, and M. Wainwright. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. Technical report, arXiv:1102.4807v2, 2011.
[2] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Convex optimization with sparsity-inducing norms. In Optimization for Machine Learning. MIT Press, 2011.
[3] E. J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Technical report, arXiv:0912.3599, 2009.
[4] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky. The convex geometry of linear inverse problems, preprint. Technical report, arXiv:1012.0621v2, 2010.
[5] L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl., 21(4):1253-1278, 2000.
[6] L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 21(4):1324-1342, 2000.
[7] M. Fazel, H. Hindi, and S. P. Boyd. A Rank Minimization Heuristic with Application to Minimum Order System Approximation. In Proc. of the American Control Conference, 2001.
[8] R. Foygel and N. Srebro. Concentration-based guarantees for low-rank matrix reconstruction. Technical report, arXiv:1102.3923, 2011.
[9] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl., 2(1):17-40, 1976.
[10] S. Gandy, B. Recht, and I. Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27:025010, 2011.
[11] F. L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. J. Math. Phys., 6(1):164-189, 1927.
[12] D. Hsu, S. M. Kakade, and T. Zhang. Robust matrix decomposition with sparse corruptions. Information Theory, IEEE Transactions on, 57(11):7221-7234, 2011.
[13] A. Jalali, P. Ravikumar, S. Sanghavi, and C. Ruan. A dirty model for multi-task learning. In Advances in NIPS 23, pages 964-972. 2010.
[14] R. Jenatton, J. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. J. Mach. Learn. Res., 12:2777-2824, 2011.
[15] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455-500, 2009.
[16] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. In Proc. ICCV, 2009.
[17] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Convex and network flow optimization for structured sparsity. J. Mach. Learn. Res., 12:2681-2720, 2011.
[18] A. Maurer and M. Pontil. Structured sparsity and generalization. Technical report, arXiv:1108.3476, 2011.
[19] M. Mørup. Applications of tensor (multiway array) factorizations and decompositions in data mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):24-40, 2011.
[20] S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of m-estimators with decomposable regularizers. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in NIPS 22, pages 1348-1356. 2009.
[21] G. Obozinski, L. Jacob, and J.-P. Vert. Group lasso with overlaps: the latent group lasso approach. Technical report, arXiv:1110.0413, 2011.
[22] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471-501, 2010.
[23] M. Signoretto, L. De Lathauwer, and J. Suykens. Nuclear norms for tensors and their use for convex multilinear estimation. Technical Report 10-186, ESAT-SISTA, K.U.Leuven, 2010.
[24] N. Srebro and A. Shraibman. Rank, trace-norm and max-norm. In Proc. of the 18th Annual Conference on Learning Theory (COLT), pages 545-560. Springer, 2005.
[25] R. Tomioka, K. Hayashi, and H. Kashima. Estimation of low-rank tensors via convex optimization. Technical report, arXiv:1010.0789, 2011.
[26] R. Tomioka, T. Suzuki, K. Hayashi, and H. Kashima. Statistical performance of convex tensor decomposition. In Advances in NIPS 24, pages 972-980. 2011.
[27] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279-311, 1966.
[28] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. Technical report, arXiv:1011.3027, 2010.
-----1
[1] William S Robinson. A method for chronologically ordering archaeological deposits. American antiquity, 16(4):293-301, 1951.
[2] Stephen T Barnard, Alex Pothen, and Horst Simon. A spectral algorithm for envelope reduction of sparse matrices. Numerical linear algebra with applications, 2(4):317-334, 1995.
[3] D.R. Fulkerson and O. A. Gross. Incidence matrices and interval graphs. Pacific journal of mathematics, 15(3):835, 1965.
[4] Gemma C Garriga, Esa Junttila, and Heikki Mannila. Banded structure in binary matrices. Knowledge and information systems, 28(1):197-226, 2011.
[5] Joao Meidanis, Oscar Porto, and Guilherme P Telles. On the consecutive ones property. Discrete Applied Mathematics, 88(1):325-354, 1998.
[6] David G Kendall. Abundance matrices and seriation in archaeology. Probability Theory and Related Fields, 17(2):104-112, 1971.
[7] Chris Ding and Xiaofeng He. Linearized cluster assignment via spectral ordering. In Proceedings of the twenty-first international conference on Machine learning, page 30. ACM, 2004.
[8] Niko Vuokko. Consecutive ones property and spectral ordering. In Proceedings of the 10th SIAM International Conference on Data Mining (SDM10), pages 350-360, 2010.
[9] Innar Liiv. Seriation and matrix reordering methods: An historical overview. Statistical analysis and data mining, 3(2):70-91, 2010.
[10] Alan George and Alex Pothen. An analysis of spectral envelope reduction via quadratic assignment problems. SIAM Journal on Matrix Analysis and Applications, 18(3):706-732, 1997.
[11] Eugene L Lawler. The quadratic assignment problem. Management science, 9(4):586-599, 1963.
[12] Qing Zhao, Stefan E Karisch, Franz Rendl, and Henry Wolkowicz. Semidefinite programming relaxations for the quadratic assignment problem. Journal of Combinatorial Optimization, 2(1):71-109, 1998.
[13] A. Nemirovski. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Mathematical programming, 109(2):283-317, 2007.
[14] Anthony Man-Cho So. Moment inequalities for sums of random matrices and their applications in optimization. Mathematical programming, 130(1):125-151, 2011.
[15] J.E. Atkins, E.G. Boman, B. Hendrickson, et al. A spectral algorithm for seriation and the consecutive ones problem. SIAM J. Comput., 28(1):297-310, 1998.
[16] F. Fogel, R. Jenatton, F. Bach, and A. d'Aspremont. Convex relaxations for permutation problems. arXiv:1306.4805, 2013.
[17] Erling D Andersen and Knud D Andersen. The mosek interior point optimizer for linear programming: an implementation of the homogeneous algorithm. High performance optimization, 33:197-232, 2000.
[18] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval research logistics quarterly, 3(1-2):95-110, 1956.
[19] L Portugal, F Bastos, J Judice, J Paixao, and T Terlaky. An investigation of interior-point algorithms for the linear transportation problem. SIAM Journal on Scientific Computing, 17(5):1202-1223, 1996.
[20] Y. Nesterov. Introductory Lectures on Convex Optimization. Springer, 2003.
[21] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1998.
[22] Frank Roy Hodson. The La Tène cemetery at Münsingen-Rain: catalogue and relative chronology, volume 5. Stampfli, 1968.
[23] Thomas M Cover and Joy A Thomas. Elements of information theory. Wiley-interscience, 2012.
-----1
[1] R. E. Burkard, M. Dell'Amico, and S. Martello. Assignment problems. SIAM, 2009.
[2] D. Shen and C. Davatzikos. Hammer: hierarchical attribute matching mechanism for elastic registration. TMI, IEEE, 21, 2002.
[3] K. Duan, D. Parikh, D. Crandall, and K. Grauman. Discovering localized attributes for fine-grained recognition. In CVPR, 2012.
[4] M.F. Demirci, A. Shokoufandeh, Y. Keselman, L. Bretzner, and S. Dickinson. Object recognition as many-to-many feature matching. IJCV, 69, 2006.
[5] M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S.M. Seitz. Multi-view stereo for community photo collections. In ICCV, 2007.
[6] A.C. Berg, T.L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondences. In CVPR, 2005.
[7] J. Petterson, T. Caetano, J. McAuley, and J. Yu. Exponential family graph matching and ranking. NIPS, 2009.
[8] S. Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S.M. Seitz, and R. Szeliski. Building Rome in a day. Communications of the ACM, 54, 2011.
[9] I. Simon, N. Snavely, and S.M. Seitz. Scene summarization for online image collections. In ICCV, 2007.
[10] P.A. Pevzner. Multiple alignment, communication cost, and graph matching. SIAM JAM, 52, 1992.
[11] S. Lacoste-Julien, B. Taskar, D. Klein, and M.I. Jordan. Word alignment via quadratic assignment. In Proc. HLT - NAACL, 2006.
[12] A.J. Smola, S.V.N. Vishwanathan, and Q. Le. Bundle methods for machine learning. NIPS, 20, 2008.
[13] I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun, and Y. Singer. Large margin methods for structured and interdependent output variables. JMLR, 6, 2006.
[14] M. Volkovs and R. Zemel. Efficient sampling for bipartite matching problems. In NIPS, 2012.
[15] A. Singer and Y. Shkolnisky. Three-dimensional structure determination from common lines in cryo-EM by eigenvectors and semidefinite programming. SIAM Journal on Imaging Sciences, 4(2):543–572, 2011.
[16] R. Hadani and A. Singer. Representation theoretic patterns in three dimensional cryo-electron microscopy I: the intrinsic reconstitution algorithm. Annals of Mathematics, 174(2):1219–1241, 2011.
[17] R. Hadani and A. Singer. Representation theoretic patterns in three-dimensional cryo-electron microscopy II: the class averaging problem. Foundations of Computational Mathematics, 11(5):589–616, 2011.
[18] H.W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2, 1955.
[19] E.P. Wigner. On the distribution of the roots of certain symmetric matrices. Ann. Math, 67, 1958.
[20] Z. Furedi and J. Komlos. The eigenvalues of random symmetric matrices. Combinatorica, 1, 1981.
[21] F. Benaych-Georges and R.R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.
[22] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24(4):509–522, 2002.
[23] R. Roberts, S. Sinha, R. Szeliski, and D. Steedly. Structure from motion for scenes with large duplicate structures. In CVPR, 2011.
[24] T.S. Caetano, J.J. McAuley, L. Cheng, Q.V. Le, and A.J. Smola. Learning graph matching. PAMI, 31(6):1048–1058, 2009.
[25] M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and map inference. In NIPS, 2009.
[26] T. Jebara, J. Wang, and S.F. Chang. Graph construction and b-matching for semi-supervised learning. In ICML, 2009.
-----1
[1] F. Bach. Learning with submodular functions: A convex optimization perspective. Arxiv preprint arXiv:1111.6453v2, 2013.
[2] A. Barbero and S. Sra. Fast Newton-type methods for total variation regularization. In ICML, 2011.
[3] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2011.
[4] H. H. Bauschke, P. L. Combettes, and D. R. Luke. Finding best approximation pairs relative to two closed convex sets in Hilbert spaces. J. Approx. Theory, 127(2):178–192, 2004.
[5] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[6] D. P. Bertsekas. Nonlinear programming. Athena Scientific, 1999.
[7] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE TPAMI, 23(11):1222–1239, 2001.
[8] B. Savchynskyy, S. Schmidt, J. H. Kappes, and C. Schnörr. Efficient MRF energy minimization via adaptive diminishing smoothing. In UAI, 2012.
[9] A. Chambolle. An algorithm for total variation minimization and applications. J Math. Imaging and Vision, 20(1):89–97, 2004.
[10] A. Chambolle and J. Darbon. On total variation minimization and surface evolution using parametric maximum flows. Int. Journal of Comp. Vision, 84(3):288–307, 2009.
[11] F. Chudak and K. Nagano. Efficient solutions to relaxations of combinatorial problems with submodular penalties via the Lovász extension and non-smooth convex optimization. In SODA, 2007.
[12] P. L. Combettes and J.-C. Pesquet. Proximal Splitting Methods in Signal Processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pages 185–212. Springer, 2011.
[13] F. R. Deutsch. Best Approximation in Inner Product Spaces. Springer Verlag, first edition, 2001.
[14] J. Douglas and H. H. Rachford. On the numerical solution of the heat conduction problem in 2 and 3 space variables. Tran. Amer. Math. Soc., 82:421–439, 1956.
[15] J. Edmonds. Submodular functions, matroids, and certain polyhedra. In Combinatorial optimization - Eureka, you shrink!, pages 11–26. Springer, 2003.
[16] U. Feige, V. S. Mirrokni, and J. Vondrák. Maximizing non-monotone submodular functions. SIAM J Comp, 40(4):1133–1153, 2011.
[17] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3:95–110, 1956.
[18] S. Fujishige. Lexicographically optimal base of a polymatroid with respect to a weight vector. Mathematics of Operations Research, pages 186–196, 1980.
[19] S. Fujishige. Submodular Functions and Optimization. Elsevier, 2005.
[20] S. Fujishige and S. Isotani. A submodular function minimization algorithm based on the minimum-norm base. Pacific Journal of Optimization, 7:3–17, 2011.
[21] H. Groenevelt. Two algorithms for maximizing a separable concave function over a polymatroid feasible region. European Journal of Operational Research, 54(2):227–236, 1991.
[22] D.S. Hochbaum and S.-P. Hong. About strongly polynomial time algorithms for quadratic optimization over submodular constraints. Math. Prog., pages 269–309, 1995.
[23] S. Iwata and N. Zuiki. A network flow approach to cost allocation for rooted trees. Networks, 44:297–301, 2004.
[24] S. Jegelka, H. Lin, and J. Bilmes. On fast approximate submodular minimization. In NIPS, 2011.
[25] S. Jegelka, F. Bach, and S. Sra. Reflection methods for user-friendly submodular optimization (extended version). arXiv, 2013.
[26] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. Journal of Machine Learning Research, pages 2297–2334, 2011.
[27] P. Kohli, L. Ladicky, and P. Torr. Robust higher order potentials for enforcing label consistency. Int. Journal of Comp. Vision, 82, 2009.
[28] V. Kolmogorov. Minimizing a sum of submodular functions. Disc. Appl. Math., 160(15), 2012.
[29] N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. IEEE TPAMI, 33(3):531–552, 2011.
[30] A. Krause and C. Guestrin. Submodularity and its applications in optimized information gathering. ACM Transactions on Intelligent Systems and Technology, 2(4), 2011.
[31] H. Lin and J. Bilmes. A class of submodular functions for document summarization. In NAACL/HLT, 2011.
[32] L. Lovász. Submodular functions and convexity. Mathematical programming: the state of the art, Bonn, pages 235–257, 1982.
[33] S. T. McCormick. Submodular function minimization. Discrete Optimization, 12:321–391, 2005.
[34] O. Meshi, T. Jaakkola, and A. Globerson. Convergence rate analysis of MAP coordinate minimization algorithms. In NIPS, 2012.
[35] G.L. Nemhauser, L.A. Wolsey, and M.L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Math. Prog., 14(1):265–294, 1978.
[36] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Prog., 103(1):127–152, 2005.
[37] J. B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. Math. Prog., 118(2):237–251, 2009.
[38] B. Savchynskyy, S. Schmidt, J. Kappes, and C. Schnörr. A study of Nesterov's scheme for Lagrangian decomposition and MAP labeling. In CVPR, 2011.
[39] A. Shekhovtsov and V. Hlavac. A distributed mincut/maxflow algorithm combining path augmentation and push-relabel. In Energy Minimization Methods in Computer Vision and Pattern Recognition, 2011.
[40] P. Stobbe. Convex Analysis for Minimizing and Learning Submodular Set functions. PhD thesis, California Institute of Technology, 2013.
[41] P. Stobbe and A. Krause. Efficient minimization of decomposable submodular functions. In NIPS, 2010.
[42] R. Tarjan, J. Ward, B. Zhang, Y. Zhou, and J. Mao. Balancing applied to maximum network flow problems. In European Symp. on Algorithms (ESA), pages 612–623, 2006.
-----1
[1] M. F. Balcan, F. Constantin, S. Iwata, and L. Wang. Learning valuation functions. COLT, 2011.
[2] N. Balcan and N. Harvey. Submodular functions: Learnability, structure, and optimization. In Arxiv preprint, 2012.
[3] M. Conforti and G. Cornuejols. Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Applied Mathematics, 7(3):251–274, 1984.
[4] W. H. Cunningham. Decomposition of submodular functions. Combinatorica, 3(1):53–68, 1983.
[5] A. Das and D. Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In ICML, 2011.
[6] G. Goel, C. Karande, P. Tripathi, and L. Wang. Approximability of combinatorial problems with multi-agent submodular cost functions. In FOCS, 2009.
[7] G. Goel, P. Tripathi, and L. Wang. Combinatorial problems with discounted price functions in multi-agent systems. In FSTTCS, 2010.
[8] M. Goemans, N. Harvey, S. Iwata, and V. Mirrokni. Approximating submodular functions everywhere. In SODA, pages 535–544, 2009.
[9] R. Hassin, J. Monnot, and D. Segev. Approximation algorithms and hardness results for labeled connectivity problems. J Combinatorial Optimization, 14(4):437–453, 2007.
[10] S. Iwata and K. Nagano. Submodular function minimization under covering constraints. In FOCS, pages 671–680. IEEE, 2009.
[11] R. Iyer and J. Bilmes. Algorithms for approximate minimization of the difference between submodular functions, with applications. In UAI, 2012.
[12] R. Iyer, S. Jegelka, and J. Bilmes. Curvature and Optimal Algorithms for Learning and Optimization of Submodular Functions: Extended arxiv version, 2013.
[13] R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential based submodular function optimization. In ICML, 2013.
[14] S. Jegelka. Combinatorial Problems with submodular coupling in machine learning and computer vision. PhD thesis, ETH Zurich, 2012.
[15] S. Jegelka and J. Bilmes. Online submodular minimization for combinatorial structures. ICML, 2011.
[16] S. Jegelka and J. A. Bilmes. Approximation bounds for inference using cooperative cuts. In ICML, 2011.
[17] S. Jegelka and J. A. Bilmes. Submodularity beyond submodular energies: coupling edges in graph cuts. In CVPR, 2011.
[18] P. Kohli, A. Osokin, and S. Jegelka. A principled deep random field for image segmentation. In CVPR, 2013.
[19] A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proceedings of Uncertainty in Artificial Intelligence. UAI, 2005.
[20] E. Lawler and C. Martel. Computing maximal polymatroidal network flows. Mathematics of Operations Research, 7(3):334–347, 1982.
[21] H. Lin and J. Bilmes. Optimal selection of limited vocabulary speech corpora. In Interspeech, 2011.
[22] H. Lin and J. Bilmes. A class of submodular functions for document summarization. In The 49th Meeting of the Assoc. for Comp. Ling. Human Lang. Technologies (ACL/HLT-2011), Portland, OR, June 2011.
[23] H. Lin and J. Bilmes. Learning mixtures of submodular shells with application to document summarization. In UAI, 2012.
[24] E. Nikolova. Approximation algorithms for offline risk-averse combinatorial optimization, 2010.
[25] J. Soto and M. Goemans. Symmetric submodular function minimization under hereditary family constraints. arXiv:1007.2140, 2010.
[26] P. Stobbe and A. Krause. Learning fourier sparse set functions. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
[27] Z. Svitkina and L. Fleischer. Submodular approximation: Sampling-based algorithms and lower bounds. In FOCS, pages 697–706, 2008.
[28] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
[29] J. Vondrák. Submodularity and curvature: the optimal algorithm. RIMS Kokyuroku Bessatsu, 23, 2010.
[30] P.-J. Wan, G. Calinescu, X.-Y. Li, and O. Frieder. Minimum-energy broadcasting in static ad hoc wireless networks. Wireless Networks, 8:607–617, 2002.
[31] P. Zhang, J.-Y. Cai, L.-Q. Tang, and W.-B. Zhao. Approximation and hardness results for label cut and related problems. Journal of Combinatorial Optimization, 21(2):192–208, 2011.
-----1
[1] Nikhil Bansal, Nitish Korula, Viswanath Nagarajan, and Aravind Srinivasan. Solving packing integer programs via randomized rounding with alterations. Theory of Computing, 8(1):533–565, 2012.
[2] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[3] Jacob Bien and Robert Tibshirani. Classification by set cover: The prototype vector machine. arXiv preprint arXiv:0908.2284, 2009.
[4] Yuri Boykov and Vladimir Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26:1124–1137, 2004.
[5] Gruia Călinescu, Howard Karloff, and Yuval Rabani. An improved approximation algorithm for multiway cut. In Proceedings of the thirtieth annual ACM symposium on Theory of Computing, pages 48–52. ACM, 1998.
[6] Jonathan Eckstein and Paulo JS Silva. A practical relative error criterion for augmented Lagrangians. Mathematical Programming, pages 1–30, 2010.
[7] Dorit S Hochbaum. Approximation algorithms for the set covering and vertex cover problems. SIAM Journal on Computing, 11(3):555–556, 1982.
[8] VK Koval and MI Schlesinger. Two-dimensional programming in image analysis problems. USSR Academy of Science, Automatics and Telemechanics, 8:149–168, 1976.
[9] Frank R Kschischang, Brendan J Frey, and H-A Loeliger. Factor graphs and the sum-product algorithm. Information Theory, IEEE Transactions on, 47(2):498–519, 2001.
[10] Taesung Lee, Zhongyuan Wang, Haixun Wang, and Seung-won Hwang. Web scale entity resolution using relational evidence. Technical report, Microsoft Research, 2011.
[11] Victor Lempitsky and Yuri Boykov. Global optimization for shape fitting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pages 1–8. IEEE, 2007.
[12] Ji Liu, Stephen J. Wright, Christopher Ré, and Victor Bittorf. An asynchronous parallel stochastic coordinate descent algorithm. Technical report, University of Wisconsin-Madison, October 2013.
[13] F Manshadi, Baruch Awerbuch, Rainer Gemulla, Rohit Khandekar, Julian Mestre, and Mauro Sozio. A distributed algorithm for large-scale generalized matching. Proceedings of the VLDB Endowment, 2013.
[14] Feng Niu, Benjamin Recht, Christopher Ré, and Stephen J. Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. arXiv preprint arXiv:1106.5730, 2011.
[15] Jorge Nocedal and Stephen J Wright. Numerical Optimization. Springer, 2006.
[16] Pradeep Ravikumar, Alekh Agarwal, and Martin J Wainwright. Message-passing for graph-structured linear programs: Proximal methods and rounding schemes. The Journal of Machine Learning Research, 11:1043–1080, 2010.
[17] J. Renegar. Some perturbation theory for linear programming. Mathematical Programming, Series A, 65:73–92, 1994.
[18] Dan Roth and Wen-tau Yih. Integer linear programming inference for conditional random fields. In Proceedings of the 22nd International Conference on Machine Learning, pages 736–743. ACM, 2005.
[19] Sujay Sanghavi, Dmitry Malioutov, and Alan S Willsky. Linear programming analysis of loopy belief propagation for weighted matching. In Advances in Neural Information Processing Systems, pages 1273–1280, 2007.
[20] Aravind Srinivasan. Improved approximation guarantees for packing and covering integer programs. SIAM Journal on Computing, 29(2):648–670, 1999.
[21] Jurgen Van Gael and Xiaojin Zhu. Correlation clustering for crosslingual link detection. In IJCAI, pages 1744–1749, 2007.
[22] Vijay V Vazirani. Approximation Algorithms. Springer, 2004.
[23] Stephen J. Wright. Implementing proximal point methods for linear programming. Journal of Optimization Theory and Applications, 65(3):531–554, 1990.
[24] Zheng Wu, Ashwin Thangali, Stan Sclaroff, and Margrit Betke. Coupling detection and data association for multiple object tracking. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1948–1955. IEEE, 2012.
[25] Ke Xu and Wei Li. Many hard examples in exact phase transitions. Theoretical Computer Science, 355(3):291–302, 2006.
[26] Ce Zhang and Christopher Ré. Towards high-throughput Gibbs sampling at scale: A study across storage managers. In SIGMOD Proceedings, 2013.
-----1
[1] J. Bergstra, D. Yamins, and D.D. Cox. Making a Science of Model Search, 2012.
[2] C. Cadieu, H. Hong, D. Yamins, N. Pinto, N. Majaj, and J.J. DiCarlo. The neural representation benchmark and its evaluation on brain and machine. In International Conference on Learning Representations, May 2013.
[3] C. Cadieu, M. Kouh, A. Pasupathy, C. E. Connor, M. Riesenhuber, and T. Poggio. A model of V4 shape selectivity and invariance. J Neurophysiol, 98(3):1733–50, 2007.
[4] M. Carandini, J. B. Demb, V. Mante, D. J. Tolhurst, Y. Dan, B. A. Olshausen, J. L. Gallant, and N. C. Rust. Do we know what the early visual system does? J Neurosci, 25(46):10577–97, 2005.
[5] M. Churchland and K. Shenoy. Temporal complexity and heterogeneity of single-neuron activity in premotor and motor cortex. Journal of Neurophysiology, 97(6):4235–4257, 2007.
[6] C. E. Connor, S. L. Brincat, and A. Pasupathy. Transformation of shape information in the ventral pathway. Curr Opin Neurobiol, 17(2):140–7, 2007.
[7] S. V. David, B. Y. Hayden, and J. L. Gallant. Spectral receptive field properties explain shape selectivity in area V4. J Neurophysiol, 96(6):3492–505, 2006.
[8] R. Desimone, T. D. Albright, C. G. Gross, and C. Bruce. Stimulus-selective properties of inferior temporal neurons in the macaque. J Neurosci, 4(8):2051–62, 1984.
[9] J. J. DiCarlo, D. Zoccolan, and N. C. Rust. How does the brain solve visual object recognition? Neuron, 73(3):415–34, 2012.
[10] P.E. Downing, Y. Jiang, M. Shuman, and N. Kanwisher. A cortical area selective for visual processing of the human body. Science, 293:2470–2473, 2001.
[11] D.J. Felleman and D.C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1:1–47, 1991.
[12] J. Freeman and E. Simoncelli. Metamers of the ventral stream. Nature Neuroscience, 14(9):1195–1201, 2011.
[13] K. Grill-Spector, Z. Kourtzi, and N. Kanwisher. The lateral occipital complex and its role in object recognition. Vision research, 41(10-11):1409–1422, 2001.
[14] C. P. Hung, G. Kreiman, T. Poggio, and J. J. DiCarlo. Fast readout of object identity from macaque inferior temporal cortex. Science, 310(5749):863–6, 2005.
[15] N. Kanwisher, J. McDermott, and M. M. Chun. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci, 17(11):4302–11, 1997.
[16] R. Kiani, H. Esteky, K. Mirpour, and K. Tanaka. Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. J Neurophysiol, 97(6):4296–309, 2007.
[17] Z. Kourtzi and N. Kanwisher. Representation of perceived object shape by the human lateral occipital complex. Science, 293(5534):1506–1509, 2001.
[18] N. Kriegeskorte. Relating population-code representations between man, monkey, and computational models. Frontiers in Neuroscience, 3(3):363, 2009.
[19] N. Kriegeskorte, M. Mur, D. A. Ruff, R. Kiani, J. Bodurka, H. Esteky, K. Tanaka, and P. A. Bandettini. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6):1126–41, 2008.
[20] A Krizhevsky, I Sutskever, and G Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012.
[21] P. Lennie and J. A. Movshon. Coding of color and form in the geniculostriate visual pathway (invited review). J Opt Soc Am A Opt Image Sci Vis, 22(10):2013–33, 2005.
[22] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[23] N. Majaj, H. Najib, E. Solomon, and J.J. DiCarlo. A unified neuronal population code fully explains human object recognition. In Computational and Systems Neuroscience (COSYNE), 2012.
[24] David Marr, Tomaso Poggio, and Shimon Ullman. Vision. A Computational Investigation Into the Human Representation and Processing of Visual Information. MIT Press, July 2010.
[25] J. Mutch and D. G. Lowe. Object class recognition and localization using sparse features with limited receptive fields. IJCV, 2008.
[26] Bruno A Olshausen and David J Field. How close are we to understanding V1? Neural computation, 17(8):1665–1699, 2005.
[27] N. Pinto, D. D. Cox, and J. J. DiCarlo. Why is Real-World Visual Object Recognition Hard. PLoS Comput Biol, 2008.
[28] N. Pinto, D. Doukhan, J. J. DiCarlo, and D. D. Cox. A High-Throughput Screening Approach to Discovering Good Forms of Biologically Inspired Visual Representation. PLoS Computational Biology, 5(11), 2009.
[29] T. Serre, A. Oliva, and T. Poggio. A feedforward architecture accounts for rapid categorization. Proc Natl Acad Sci U S A, 104(15):6424–9, 2007.
[30] Thomas Serre, Gabriel Kreiman, Minjoon Kouh, Charles Cadieu, Ulf Knoblich, and Tomaso Poggio. A quantitative theory of immediate visual recognition. In Prog. Brain Res., volume 165, pages 33–56. Elsevier, 2007.
-----1
[1] F. Theunissen, S. David, N. Singh, A. Hsu, W. Vinje, and J. Gallant. Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network: Computation in Neural Systems, 12:289–316, 2001.
[2] D. Smyth, B. Willmore, G. Baker, I. Thompson, and D. Tolhurst. The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. Journal of Neuroscience, 23:4746–4759, 2003.
[3] M. Sahani and J. Linden. Evidence optimization techniques for estimating stimulus-response functions. NIPS, 15, 2003.
[4] S.V. David and J.L. Gallant. Predicting neuronal responses during natural vision. Network: Computation in Neural Systems, 16(2):239–260, 2005.
[5] M. Park and J. W. Pillow. Receptive field inference with localized priors. PLoS Comput Biol, 7(10):e1002219, 2011.
[6] Jennifer F. Linden, Robert C. Liu, Maneesh Sahani, Christoph E. Schreiner, and Michael M. Merzenich. Spectrotemporal structure of receptive fields in areas AI and AAF of mouse auditory cortex. Journal of Neurophysiology, 90(4):2660–2675, 2003.
[7] Anqi Qiu, Christoph E. Schreiner, and Monty A. Escabí. Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition. Journal of Neurophysiology, 90(1):456–476, 2003.
[8] J. W. Pillow and E. P. Simoncelli. Dimensionality reduction in neural models: An information-theoretic generalization of spike-triggered average and covariance analysis. Journal of Vision, 6(4):414–428, 4 2006.
[9] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature, 454:995–999, 2008.
[10] A.J. Izenman. Reduced-rank regression for the multivariate linear model. Journal of multivariate analysis, 5(2):248–264, 1975.
[11] Gregory C Reinsel and Rajabather Palani Velu. Multivariate reduced-rank regression: theory and applications. Springer New York, 1998.
[12] John Geweke. Bayesian reduced rank regression in econometrics. Journal of Econometrics, 75(1):121–146, 1996.
[13] A.P. Dawid. Some matrix-variate distribution theory: notational considerations and a Bayesian application. Biometrika, 68(1):265, 1981.
[14] L. Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15:243–262, 2004.
[15] M. Park and J. W. Pillow. Bayesian active learning with localized priors for fast receptive field characterization. In NIPS, pages 2357–2365, 2012.
[16] N. C. Rust, O. Schwartz, J. A. Movshon, and E. P. Simoncelli. Spatiotemporal elements of macaque V1 receptive fields. Neuron, 46(6):945–956, 2005.
-----1
[1] P. Z. Marmarelis and V. Marmarelis. Analysis of physiological systems: the white-noise approach. Plenum Press, New York, 1978.
[2] Taiho Koh and E. Powers. Second-order Volterra filtering and its application to nonlinear system identification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6):1445–1455, 1985.
[3] Il Memming Park and Jonathan W. Pillow. Bayesian spike-triggered covariance analysis. Advances in Neural Information Processing Systems 24, pp 1692–1700, 2011.
[4] E. P. Simoncelli, J. W. Pillow, L. Paninski, and O. Schwartz. Characterization of neural responses with stochastic stimuli. The Cognitive Neurosciences, III, chapter 23, pp 327–338. MIT Press, Cambridge, MA, October 2004.
[5] L. Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15:243–262, 2004.
[6] S. Gerwinn, J. H. Macke, M. Seeger, and M. Bethge. Bayesian inference for spiking neuron models with a sparsity prior. Advances in Neural Information Processing Systems, pp 529–536, 2008.
[7] J. Bussgang. Crosscorrelation functions of amplitude-distorted Gaussian signals. RLE Technical Reports, 216, 1952.
[8] E. deBoer and P. Kuyper. Triggered correlation. IEEE Transact. Biomed. Eng., 15, pp 169–179, 1968.
[9] E. J. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199–213, 2001.
[10] R. R. de Ruyter van Steveninck and W. Bialek. Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transmission in short spike sequences. Proc. R. Soc. Lond. B, 234:379–414, 1988.
[11] O. Schwartz, J. W. Pillow, N. C. Rust, and E. P. Simoncelli. Spike-triggered neural characterization. J. Vision, 6(4):484–507, 7 2006.
[12] RD Cook and S. Weisberg. Comment on sliced inverse regression for dimension reduction by K.-C. Li. Journal of the American Statistical Association, 86:328–332, 1991.
[13] Ker-Chau Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, 1991.
[14] Alexandro D. Ramirez and Liam Paninski. Fast inference in generalized linear models via expected log-likelihoods. Journal of Computational Neuroscience, pp 1–20, 2013.
[15] W. Truccolo, U. T. Eden, M. R. Fellows, J. P. Donoghue, and E. N. Brown. A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. J. Neurophysiol, 93(2):1074–1089, 2005.
[16] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature, 454:995–999, 2008.
[17] L. Paninski. Convergence properties of some spike-triggered analysis techniques. Network: Computation in Neural Systems, 14:437–464, 2003.
[18] J. W. Pillow and E. P. Simoncelli. Dimensionality reduction in neural models: An information-theoretic generalization of spike-triggered average and covariance analysis. J. Vision, 6(4):414–428, 4 2006.
[19] Ines Samengo and Tim Gollisch. Spike-triggered covariance: geometric proof, symmetry properties, and extension beyond Gaussian stimuli. Journal of Computational Neuroscience, 34(1):137–161, 2013.
[20] Tatyana Sharpee, Nicole C. Rust, and William Bialek. Analyzing neural responses to natural signals: maximally informative dimensions. Neural Comput, 16(2):223–250, Feb 2004.
[21] R. S. Williamson, M. Sahani, and J. W. Pillow. Equating information-theoretic and likelihood-based methods for neural dimensionality reduction. arXiv:1308.3542 [q-bio.NC], 2013.
[22] J. D. Fitzgerald, R. J. Rowekamp, L. C. Sincich, and T. O. Sharpee. Second order dimensionality reduction using minimum and maximum mutual information models. PLoS Comput Biol, 7(10):e1002249, 2011.
[23] K. Rajan and W. Bialek. Maximally informative stimulus energies in the analysis of neural responses to natural signals. arXiv:1201.0321v1 [q-bio.NC], 2012.
[24] James M. McFarland, Yuwei Cui, and Daniel A. Butts. Inferring nonlinear neuronal computation based on physiologically plausible inputs. PLoS Comput Biol, 9(7):e1003143+, July 2013.
[25] L. Theis, A. M. Chagas, D. Arnstein, C. Schwarz, and M. Bethge. Beyond glms: A generative mixture modeling approach to neural system identification. PLoS Computational Biology, Nov 2013. in press.
[26] A. M. Mathai and S. B. Provost. Quadratic forms in random variables: theory and applications. M. Dekker, 1992.
[27] Y. S. Cho and E. J. Powers. Estimation of quadratically nonlinear systems with an i.i.d. input. [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing, pp 3117–3120, vol. 5. IEEE, 1991.
[28] V. J. Uzzell and E. J. Chichilnisky. Precision of spike trains in primate retinal ganglion cells. Journal of Neurophysiology, 92:780–789, 2004.
-----1
[1] Kukjin Kang, Robert M Shapley, and Haim Sompolinsky. Information tuning of populations of neurons in primary visual cortex. The Journal of neuroscience, 24(15):3726–3735, 2004.
[2] Apostolos P. Georgopoulos, Andrew B. Schwartz, and Ronald E. Kettner. Neuronal population coding of movement direction. Science, 233:1416–1419, 1986.
[3] FE Theunissen and JP Miller. Representation of sensory information in the cricket cercal sensory system. II. information theoretic calculation of system accuracy and optimal tuning-curve widths of four primary interneurons. J Neurophysiol, 66(5):1690–1703, November 1991.
[4] DC Fitzpatrick, R Batra, TR Stanford, and S Kuwada. A neuronal population code for sound localization. Nature, 388:871–874, 1997.
[5] NS Harper and D McAlpine. Optimal neural population coding of an auditory spatial cue. Nature, 430:682–686, 2004.
[6] N Brenner, W Bialek, and R de Ruyter van Steveninck. Adaptive rescaling maximizes information transmission. Neuron, 26:695–702, 2000.
[7] Tvd Twer and DIA MacLeod. Optimal nonlinear codes for the perception of natural colours. Network: Computation in Neural Systems, 12(3):395–407, 2001.
[8] I Dean, NS Harper, and D McAlpine. Neural population coding of sound level adapts to stimulus statistics. Nature neuroscience, 8:1684–1689, 2005.
[9] Y Ozuysal and SA Baccus. Linking the computational structure of variance adaptation to biophysical mechanisms. Neuron, 73:1002–1015, 2012.
[10] SB Laughlin. A simple coding procedure enhances a neuron's information capacity. Z. Naturforschung, 36c(3):910–912, 1981.
[11] J-P Nadal and N Parga. Non linear neurons in the low noise limit: A factorial code maximizes information transfer, 1994.
[12] M Bethge, D Rotermund, and K Pawelzik. Optimal short-term population coding: when Fisher information fails. Neural Computation, 14:2317–2351, 2002.
[13] M Bethge, D Rotermund, and K Pawelzik. Optimal neural rate coding leads to bimodal firing rate distributions. Netw. Comput. Neural Syst., 14:303–319, 2003.
[14] MD McDonnell and NG Stocks. Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. Phys. Rev. Lett., 101:058103, 2008.
[15] Z Wang, A Stocker, and DD Lee. Optimal neural tuning curves for arbitrary stimulus distributions: Discrimax, infomax and minimum lp loss. Adv. Neural Information Processing Systems, 25:2177–2185, 2012.
[16] N Brunel and J-P Nadal. Mutual information, Fisher information and population coding. Neural Computation, 10(7):1731–1757, 1998.
[17] K Zhang and TJ Sejnowski. Neuronal tuning: To sharpen or broaden? Neural Computation, 11:75–84, 1999.
[18] A Pouget, S Deneve, J-C Ducom, and PE Latham. Narrow versus wide tuning curves: What's best for a population code? Neural Computation, 11:85–90, 1999.
[19] AP Nikitin, NG Stocks, RP Morse, and MD McDonnell. Neural population coding is optimized by discrete tuning curves. Phys. Rev. Lett., 103:138101, 2009.
[20] D Ganguli and EP Simoncelli. Implicit encoding of prior probabilities in optimal neural populations. Adv. Neural Information Processing Systems, 23:658–666, 2010.
[21] S Yaeli and R Meir. Error-based analysis of optimal tuning functions explains phenomena observed in sensory neurons. Front Comput Neurosci, 4, 2010.
[22] Haim Sompolinsky and Hyoungsoo Yoon. The effect of correlations on the fisher information of population codes. Advances in Neural Information Processing Systems 11: Proceedings of the 1998 Conference, 11:167, 1999.
[23] AJ Bell and TJ Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159, 1995.
[24] BA Olshausen and DJ Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
[25] A Hyvarinen and E Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13:411–430, 2000.
[26] P Berens, A Ecker, S Gerwinn, AS Tolias, and M Bethge. Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, 11:4423–4428, 2011.
[27] TM Cover and J Thomas. Elements of Information Theory. Wiley, 1991.
[28] EL Lehmann and G Casella. Theory of point estimation. New York: Springer-Verlag., 1999.
[29] GH Hardy, JE Littlewood, and G Polya. Inequalities, 2nd ed. Cambridge University Press, 1988.
-----1
[1] I. H. Stevenson and K. P. Kording, How advances in neural recording affect data analysis, Nature neuroscience, vol. 14, no. 2, pp. 139–142, 2011.
[2] M. Okun, P. Yger, S. L. Marguet, F. Gerard-Mercier, A. Benucci, S. Katzner, L. Busse, M. Carandini, and K. D. Harris, Population rate dynamics and multineuron firing patterns in sensory cortex, The Journal of Neuroscience, vol. 32, no. 48, pp. 17108–17119, 2012.
[3] K. L. Briggman, H. D. I. Abarbanel, and W. B. Kristan, Optical imaging of neuronal populations during decision-making, Science, vol. 307, no. 5711, pp. 896–901, 2005.
[4] C. K. Machens, R. Romo, and C. D. Brody, Functional, but not anatomical, separation of what and when in prefrontal cortex, The Journal of Neuroscience, vol. 30, no. 1, pp. 350–360, 2010.
[5] M. Stopfer, V. Jayaraman, and G. Laurent, Intensity versus identity coding in an olfactory system, Neuron, vol. 39, no. 6, pp. 991–1004, 2003.
[6] M. M. Churchland, J. P. Cunningham, M. T. Kaufman, J. D. Foster, P. Nuyujukian, S. I. Ryu, and K. V. Shenoy, Neural population dynamics during reaching, Nature, 2012.
[7] W. Brendel, R. Romo, and C. K. Machens, Demixed principal component analysis, Advances in Neural Information Processing Systems, vol. 24, pp. 1–9, 2011.
[8] L. Paninski, Y. Ahmadian, D. G. Ferreira, S. Koyama, K. R. Rad, M. Vidne, J. Vogelstein, and W. Wu, A new look at state-space models for neural data, Journal of Computational Neuroscience, vol. 29, no. 1-2, pp. 107–126, 2010.
[9] B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani, Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity, Journal of neurophysiology, vol. 102, no. 1, pp. 614–635, 2009.
[10] B. Petreska, B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani, Dynamical segmentation of single trials from population neural data, Advances in neural information processing systems, vol. 24, 2011.
[11] J. E. Kulkarni and L. Paninski, Common-input models for multiple neural spike-train data, Network: Computation in Neural Systems, vol. 18, no. 4, pp. 375–407, 2007.
[12] A. Smith and E. Brown, Estimating a state-space model from point process observations, Neural Computation, vol. 15, no. 5, pp. 965–991, 2003.
[13] M. Fazel, H. Hindi, and S. P. Boyd, A rank minimization heuristic with application to minimum order system approximation, Proceedings of the American Control Conference, vol. 6, pp. 4734–4739, 2001.
[14] Z. Liu and L. Vandenberghe, Interior-point method for nuclear norm approximation with application to system identification, SIAM Journal on Matrix Analysis and Applications, vol. 31, pp. 1235–1256, 2009.
[15] Z. Liu, A. Hansson, and L. Vandenberghe, Nuclear norm system identification with missing inputs and outputs, Systems & Control Letters, vol. 62, no. 8, pp. 605–612, 2013.
[16] L. Buesing, J. Macke, and M. Sahani, Spectral learning of linear dynamics from generalised-linear observations with application to neural population data, Advances in neural information processing systems, vol. 25, 2012.
[17] L. Paninski, J. Pillow, and E. Simoncelli, Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model, Neural computation, vol. 16, no. 12, pp. 2533–2561, 2004.
[18] E. Chornoboy, L. Schramm, and A. Karr, Maximum likelihood identification of neural point process systems, Biological cybernetics, vol. 59, no. 4-5, pp. 265–275, 1988.
[19] J. Macke, J. Cunningham, M. Byron, K. Shenoy, and M. Sahani, Empirical models of spiking in neural populations, Advances in neural information processing systems, vol. 24, 2011.
[20] M. Collins, S. Dasgupta, and R. E. Schapire, A generalization of principal component analysis to the exponential family, Advances in neural information processing systems, vol. 14, 2001.
[21] V. Solo and S. A. Pasha, Point-process principal components analysis via geometric optimization, Neural Computation, vol. 25, no. 1, pp. 101–122, 2013.
[22] Z. Zhou, X. Li, J. Wright, E. Candes, and Y. Ma, Stable principal component pursuit, Proceedings of the IEEE International Symposium on Information Theory, pp. 1518–1522, 2010.
[23] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman divergences, The Journal of Machine Learning Research, vol. 6, pp. 1705–1749, 2005.
[24] S. P. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[25] P. Van Overschee and B. De Moor, Subspace identification for linear systems: theory, implementation, applications, 1996.
[26] V. Lawhern, W. Wu, N. Hatsopoulos, and L. Paninski, Population decoding of motor cortical activity using a generalized linear model with hidden states, Journal of neuroscience methods, vol. 189, no. 2, pp. 267–280, 2010.
[27] S. Koyama, L. Castellanos Perez-Bolde, C. R. Shalizi, and R. E. Kass, Approximate methods for state-space models, Journal of the American Statistical Association, vol. 105, no. 489, pp. 170–180, 2010.
[28] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. Chichilnisky, and E. P. Simoncelli, Spatio-temporal correlations and visual signalling in a complete neuronal population, Nature, vol. 454, no. 7207, pp. 995–999, 2008.
[29] M. Harrison, Conditional inference for learning the network structure of cortical microcircuits, in 2012 Joint Statistical Meeting, (San Diego, CA), 2012.
-----0
Ahrens, M. B., M. B. Orger, D. N. Robson, J. M. Li, and P. J. Keller (2013). Whole-brain functional imaging at cellular resolution using light-sheet microscopy. Nature methods 10(5), 413–420.
Amelunxen, D., M. Lotz, M. B. McCoy, and J. A. Tropp (2013). Living on the edge: A geometric theory of phase transitions in convex optimization. arXiv preprint arXiv:1303.6672.
Ba, D., B. Babadi, P. Purdon, and E. Brown (2012). Exact and stable recovery of sequences of signals with sparse increments via differential l1-minimization. In Advances in Neural Information Processing Systems 25, pp. 2636–2644.
Blanchard, J. D., C. Cartis, and J. Tanner (2011). Compressed sensing: How sharp is the restricted isometry property? SIAM review 53(1), 105–125.
Boyd, S., N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3(1), 1–122.
Branco, T., B. A. Clark, and M. Hausser (2010). Dendritic discrimination of temporal input sequences in cortical neurons. Science 329, 1671–1675.
Candes, E. J., Y. C. Eldar, D. Needell, and P. Randall (2011). Compressed sensing with coherent and redundant dictionaries. Applied and Computational Harmonic Analysis 31(1), 59–73.
Candes, E. J. and T. Tao (2005). Decoding by linear programming. Information Theory, IEEE Transactions on 51(12), 4203–4215.
Chandrasekaran, V., B. Recht, P. A. Parrilo, and A. S. Willsky (2012). The convex geometry of linear inverse problems. Foundations of Computational Mathematics 12(6), 805–849.
Cotton, R. J., E. Froudarakis, P. Storer, P. Saggau, and A. S. Tolias (2013). Three-dimensional mapping of microcircuit correlation structure. Frontiers in Neural Circuits 7.
Donoho, D. and J. Tanner (2009). Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367(1906), 4273–4293.
Duarte, M. F., M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk (2008). Single-pixel imaging via compressive sampling. Signal Processing Magazine, IEEE 25(2), 83–91.
Fazel, M. (2002). Matrix rank minimization with applications. Ph. D. thesis, Stanford University.
Gehm, M., R. John, D. Brady, R. Willett, and T. Schulz (2007). Single-shot compressive spectral imaging with a dual-disperser architecture. Opt. Express 15(21), 14013–14027.
Katz, O., Y. Bromberg, and Y. Silberberg (2009). Compressive ghost imaging. Applied Physics Letters 95(13).
Lewicki, M. (1998). A review of methods for spike sorting: the detection and classification of neural action potentials. Network: Computation in Neural Systems 9, R53–R78.
Lustig, M., D. Donoho, and J. M. Pauly (2007). Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic resonance in medicine 58(6), 1182–1195.
Nikolenko, V., B. Watson, R. Araya, A. Woodruff, D. Peterka, and R. Yuste (2008). SLM microscopy: Scanless two-photon imaging and photostimulation using spatial light modulators. Frontiers in Neural Circuits 2, 5.
Pnevmatikakis, E., T. Machado, L. Grosenick, B. Poole, J. Vogelstein, and L. Paninski (2013). Rank-penalized nonnegative spatiotemporal deconvolution and demixing of calcium imaging data. In Computational and Systems Neuroscience Meeting COSYNE. (journal paper in preparation for PLoS Computational Biology).
Recht, B., M. Fazel, and P. Parrilo (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM review 52(3), 471–501.
Reddy, G., K. Kelleher, R. Fink, and P. Saggau (2008). Three-dimensional random access multiphoton microscopy for functional imaging of neuronal activity. Nature Neuroscience 11(6), 713–720.
Rockafellar, R. (1970). Convex Analysis. Princeton University Press.
Rust, M. J., M. Bates, and X. Zhuang (2006). Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nature methods 3(10), 793–796.
Studer, V., J. Bobin, M. Chahid, H. S. Mousavi, E. Candes, and M. Dahan (2012). Compressive fluorescence microscopy for biological and hyperspectral imaging. Proceedings of the National Academy of Sciences 109(26), E1679–E1687.
Vogelstein, J., A. Packer, T. Machado, T. Sippy, B. Babadi, R. Yuste, and L. Paninski (2010). Fast non-negative deconvolution for spike train inference from population calcium imaging. Journal of Neurophysiology 104(6), 3691–3704.
Yap, H. L., A. Eftekhari, M. B. Wakin, and C. J. Rozell (2011). The restricted isometry property for block diagonal matrices. In Information Sciences and Systems (CISS), 2011 45th Annual Conference on, pp. 1–6.
-----1
[1] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 126–134, Lake Tahoe, NV, USA, 2012.
[2] Ralph Allan Bradley and Milton E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
[3] Marquis de Condorcet. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: L'Imprimerie Royale, 1785.
[4] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th World Wide Web Conference, pages 613–622, 2001.
[5] Patricia Everaere, Sebastien Konieczny, and Pierre Marquis. The strategy-proofness landscape of merging. Journal of Artificial Intelligence Research, 28:49–105, 2007.
[6] Lester R. Ford, Jr. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28–33, 1957.
[7] Lars Peter Hansen. Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50(4):1029–1054, 1982.
[8] David R. Hunter. MM algorithms for generalized Bradley-Terry models. In The Annals of Statistics, volume 32, pages 384–406, 2004.
[9] Toshihiro Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining (KDD), pages 583–588, Washington, DC, USA, 2003.
[10] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008.
[11] Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.
[12] Tyler Lu and Craig Boutilier. Learning Mallows models with pairwise preferences. In Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML 2011), pages 145–152, Bellevue, WA, USA, 2011.
[13] Robert Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.
[14] Colin L. Mallows. Non-null ranking model. Biometrika, 44(1/2):114–130, 1957.
[15] Andrew Mao, Ariel D. Procaccia, and Yiling Chen. Better human computation through principled voting. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Bellevue, WA, USA, 2013.
[16] Sahand Negahban, Sewoong Oh, and Devavrat Shah. Iterative ranking from pair-wise comparisons. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 2483–2491, Lake Tahoe, NV, USA, 2012.
[17] Robin L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(2):193–202, 1975.
[18] Louis Leon Thurstone. A law of comparative judgement. Psychological Review, 34(4):273–286, 1927.
-----1
[1] Daniel A. Ackerberg. Advertising, learning, and consumer choice in experience goods: An empirical examination. International Economic Review, 44(3):1007–1040, 2003.
[2] N. Atienza, J. Garcia-Heras, and J.M. Muñoz-Pichardo. A new condition for identifiability of finite mixture distributions. Metrika, 63(2):215–221, 2006.
[3] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 126–134, Lake Tahoe, NV, USA, 2012.
[4] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Preference elicitation for generalized random utility models. In Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence (UAI), Bellevue, Washington, USA, 2013.
[5] Patrick Bajari and C. Lanier Benkard. Discrete choice models as structural models of demand: Some economic implications of common approaches. Technical report, Working Paper, 2003.
[6] James Berkovec and John Rust. A nested logit model of automobile holdings for one vehicle households. Transportation Research Part B: Methodological, 19(4):275–285, 1985.
[7] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, 63(4):841–890, 1995.
[8] Steven Berry, James Levinsohn, and Ariel Pakes. Voluntary export restraints on automobiles: evaluating a trade policy. The American Economic Review, 89(3):400–430, 1999.
[9] Steven Berry, James Levinsohn, and Ariel Pakes. Differentiated products demand systems from a combination of micro and macro data: The new car market. Journal of Political Economy, 112(1):68–105, 2004.
[10] Steven Berry and Ariel Pakes. Some applications and limitations of recent advances in empirical industrial organization: Merger analysis. The American Economic Review, 83(2):247–252, 1993.
[11] Steven Berry. Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, pages 242–262, 1994.
[12] Edwin Bonilla, Shengbo Guo, and Scott Sanner. Gaussian process preference elicitation. In Advances in Neural Information Processing Systems 23, pages 262–270, 2010.
[13] Stephen P Brooks, Paulo Giudici, and Gareth O Roberts. Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1):3–39, 2003.
[14] Wei Chu and Zoubin Ghahramani. Gaussian processes for ordinal regression. In Journal of Machine Learning Research, pages 1019–1041, 2005.
[15] Chris Fraley and Adrian E. Raftery. How many clusters? which clustering method? answers via model-based cluster analysis. The Computer Journal, 41(8):578–588, 1998.
[16] John Geweke, Michael Keane, and David Runkle. Alternative computational approaches to inference in the multinomial probit model. Review of Economics and Statistics, pages 609–632, 1994.
[17] P.J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995.
[18] Bettina Grün and Friedrich Leisch. Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. Journal of Classification, 25(2):225–247, 2008.
[19] Jerry A. Hausman. Valuation of new goods under perfect and imperfect competition. In The Economics of New Goods, pages 207–248. University of Chicago Press, 1996.
[20] James J. Heckman and Burton Singer. Econometric duration analysis. Journal of Econometrics, 24(1-2):63–132, 1984.
[21] Igal Hendel and Aviv Nevo. Measuring the implications of sales and consumer inventory behavior. Econometrica, 74(6):1637–1673, 2006.
[22] Katherine Ho. The welfare effects of restricted hospital choice in the us medical care market. Journal of Applied Econometrics, 21(7):1039–1079, 2006.
[23] Neil Houlsby, Jose Miguel Hernandez-Lobato, Ferenc Huszar, and Zoubin Ghahramani. Collaborative gaussian processes for preference learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 2105–2113. Lake Tahoe, NV, USA, 2012.
[24] Kamel Jedidi, Harsharanjeet S. Jagpal, and Wayne S. DeSarbo. Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science, 16(1):39–59, 1997.
[25] Toshihiro Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining (KDD), pages 583–588, Washington, DC, USA, 2003.
[26] Daniel McFadden. The measurement of urban travel demand. Journal of Public Economics, 3(4):303–328, 1974.
[27] Daniel McFadden. Modelling the choice of residential location. In Daniel McFadden, A Karlqvist, L Lundqvist, F Snickars, and J Weibull, editors, Spatial Interaction Theory and Planning Models, pages 75–96. New York: Academic Press, 1978.
[28] Damien McParland and Claire Gormley. Clustering ordinal data via latent variable models. IFCS 2013 Conference of the International Federation of Classification Societies, Tilburg University, The Netherlands, 2013.
[29] Marina Meila and Harr Chen. Dirichlet process mixtures of generalized Mallows models. arXiv preprint arXiv:1203.3496, 2012.
[30] Andras Prekopa. Logarithmic concave measures and related topics. In Stochastic Programming, pages 63–82. Academic Press, 1980.
[31] Mahlet G. Tadesse, Naijun Sha, and Marina Vannucci. Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association, 100(470):602–617, 2005.
-----1
[1] J. Abernethy, F. Bach, T. Evgeniou, and J.-P. Vert. A new approach to collaborative filtering: Operator estimation with spectral regularization. JMLR, 10:803–826, 2009.
[2] R. Adams, G. Dahl, and I. Murray. Incorporating side information in probabilistic matrix factorization with gaussian processes. In UAI, 2010.
[3] D. Agarwal and B.-C. Chen. Regression-based latent factor models. In KDD, 2009.
[4] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. MLJ, 73(3):243–272, 2008.
[5] H. Avron, S. Kale, S. Kasiviswanathan, and V. Sindhwani. Efficient and practical stochastic subgradient descent for nuclear norm regularization. In ICML, 2012.
[6] F. Bach. Consistency of trace norm minimization. JMLR, 9:1019–1048, 2008.
[7] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.
[8] S. Bucak, R. Jin, and A. Jain. Multi-label learning with incomplete class assignments. In CVPR, 2011.
[9] R. Cabral, F. Torre, J. Costeira, and A. Bernardino. Matrix completion for multi-label image classification. In NIPS, 2011.
[10] J.-F. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
[11] E. Candès and B. Recht. Exact matrix completion via convex optimization. CACM, 55(6):111–119, 2012.
[12] E. Candès and T. Tao. The power of convex relaxation: near-optimal matrix completion. IEEE TIT, 56(5):2053–2080, 2010.
[13] C.-C. Chang and C.-J. Lin. Libsvm: A library for support vector machines. ACM TIST, 2(3):27, 2011.
[14] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng. Nus-wide: A real-world web image database from national university of singapore. In CIVR, 2009.
[15] B. Eriksson, L. Balzano, and R. Nowak. High-rank matrix completion and subspace clustering with missing data. CoRR, 2011.

Table 2: Results on transductive incomplete multi-label learning. Algo. specifies the name of the algorithms. Time is the CPU time measured in seconds. AP is Average Precision measured based on test data; the higher the AP, the better the performance. ω% represents the percentage of training instances with observed label assignment for each label. The best result and its comparable ones (pairwise single-tailed t-tests at 95% confidence level) are bolded.

Data           Algo.    ω% = 10%              ω% = 20%              ω% = 40%
                        time         AP       time         AP       time         AP
Arts           Maxide   3.09 × 10^0  0.548    3.60 × 10^0  0.572    4.42 × 10^0  0.596
               MC-b     2.47 × 10^4  0.428    1.59 × 10^4  0.444    9.54 × 10^3  0.434
               MC-1     2.39 × 10^4  0.430    2.05 × 10^4  0.494    1.27 × 10^4  0.473
               BR-R     1.63 × 10^1  0.540    2.98 × 10^1  0.563    5.71 × 10^1  0.574
               BR-1     1.77 × 10^1  0.540    3.07 × 10^1  0.563    7.10 × 10^1  0.575
Business       Maxide   3.24 × 10^0  0.868    3.89 × 10^0  0.860    5.04 × 10^0  0.872
               MC-b     2.94 × 10^4  0.865    1.83 × 10^4  0.851    1.08 × 10^4  0.858
               MC-1     3.25 × 10^4  0.865    2.18 × 10^4  0.855    1.21 × 10^4  0.862
               BR-R     1.02 × 10^1  0.846    1.78 × 10^1  0.841    3.32 × 10^1  0.854
               BR-1     1.19 × 10^1  0.846    1.96 × 10^1  0.841    4.30 × 10^1  0.854
Computers      Maxide   4.67 × 10^0  0.635    5.81 × 10^0  0.660    7.79 × 10^0  0.675
               MC-b     5.58 × 10^4  0.597    3.38 × 10^4  0.599    1.87 × 10^4  0.604
               MC-1     6.56 × 10^4  0.600    4.40 × 10^4  0.608    2.30 × 10^4  0.618
               BR-R     2.34 × 10^1  0.622    4.13 × 10^1  0.649    7.68 × 10^1  0.662
               BR-1     2.70 × 10^1  0.621    4.50 × 10^1  0.648    8.25 × 10^1  0.661
Education      Maxide   4.40 × 10^0  0.566    5.41 × 10^0  0.604    6.73 × 10^0  0.618
               MC-b     3.82 × 10^4  0.472    2.40 × 10^4  0.478    1.32 × 10^4  0.474
               MC-1     4.68 × 10^4  0.484    3.02 × 10^4  0.536    1.55 × 10^4  0.564
               BR-R     1.77 × 10^1  0.535    3.16 × 10^1  0.568    6.01 × 10^1  0.583
               BR-1     1.94 × 10^1  0.535    3.28 × 10^1  0.568    6.94 × 10^1  0.583
Entertainment  Maxide   2.77 × 10^0  0.631    3.41 × 10^0  0.650    4.56 × 10^0  0.679
               MC-b     4.86 × 10^4  0.474    3.13 × 10^4  0.467    1.73 × 10^4  0.468
               MC-1     4.40 × 10^4  0.489    4.15 × 10^4  0.492    2.27 × 10^4  0.578
               BR-R     1.89 × 10^1  0.628    3.38 × 10^1  0.638    6.47 × 10^1  0.668
               BR-1     2.04 × 10^1  0.627    3.44 × 10^1  0.640    6.41 × 10^1  0.667
Health         Maxide   4.31 × 10^0  0.725    5.36 × 10^0  0.746    7.11 × 10^0  0.769
               MC-b     4.98 × 10^4  0.609    2.99 × 10^4  0.607    1.71 × 10^4  0.610
               MC-1     5.82 × 10^4  0.626    3.82 × 10^4  0.632    2.03 × 10^4  0.645
               BR-R     2.03 × 10^1  0.725    3.61 × 10^1  0.742    6.83 × 10^1  0.757
               BR-1     2.16 × 10^1  0.725    3.59 × 10^1  0.741    7.05 × 10^1  0.757
Recreation     Maxide   2.75 × 10^0  0.559    3.38 × 10^0  0.592    4.44 × 10^0  0.614
               MC-b     3.56 × 10^4  0.381    2.41 × 10^4  0.381    1.30 × 10^4  0.378
               MC-1     3.48 × 10^4  0.381    3.25 × 10^4  0.430    1.90 × 10^4  0.421
               BR-R     1.97 × 10^1  0.548    3.48 × 10^1  0.574    6.53 × 10^1  0.596
               BR-1     2.24 × 10^1  0.547    3.74 × 10^1  0.573    6.86 × 10^1  0.596
Reference      Maxide   5.11 × 10^0  0.635    6.47 × 10^0  0.666    8.49 × 10^0  0.696
               MC-b     9.38 × 10^4  0.565    5.38 × 10^4  0.561    2.75 × 10^4  0.575
               MC-1     1.11 × 10^5  0.576    6.53 × 10^4  0.576    3.22 × 10^4  0.575
               BR-R     2.28 × 10^1  0.644    3.89 × 10^1  0.670    7.08 × 10^1  0.693
               BR-1     2.71 × 10^1  0.644    4.34 × 10^1  0.669    7.48 × 10^1  0.692
Science        Maxide   6.21 × 10^0  0.513    7.67 × 10^0  0.543    1.02 × 10^1  0.568
               MC-b     6.80 × 10^4  0.395    3.94 × 10^4  0.403    2.06 × 10^4  0.394
               MC-1     8.50 × 10^4  0.411    4.97 × 10^4  0.470    2.52 × 10^4  0.414
               BR-R     2.93 × 10^1  0.506    5.06 × 10^1  0.535    9.30 × 10^1  0.557
               BR-1     3.60 × 10^1  0.506    5.91 × 10^1  0.535    1.04 × 10^2  0.557
Social         Maxide   7.18 × 10^0  0.721    9.09 × 10^0  0.748    1.21 × 10^1  0.754
               MC-b     1.71 × 10^5  0.582    9.65 × 10^4  0.595    4.56 × 10^4  0.594
               MC-1     2.22 × 10^5  0.602    1.17 × 10^5  0.625    5.41 × 10^4  0.604
               BR-R     3.09 × 10^1  0.717    5.35 × 10^1  0.746    9.74 × 10^1  0.751
               BR-1     3.71 × 10^1  0.717    6.00 × 10^1  0.746    1.02 × 10^2  0.751
Society        Maxide   3.69 × 10^0  0.580    4.54 × 10^0  0.594    5.80 × 10^0  0.616
               MC-b     4.75 × 10^4  0.550    2.93 × 10^4  0.545    1.62 × 10^4  0.552
               MC-1     4.14 × 10^4  0.550    3.65 × 10^4  0.561    2.04 × 10^4  0.590
               BR-R     2.50 × 10^1  0.571    4.54 × 10^1  0.590    8.59 × 10^1  0.600
               BR-1     2.84 × 10^1  0.572    4.92 × 10^1  0.590    9.58 × 10^1  0.601
NUS-WIDE       Maxide   1.47 × 10^3  0.513    2.10 × 10^3  0.519    3.53 × 10^3  0.522
               BR-1     1.24 × 10^2  0.329    2.38 × 10^2  0.398    4.81 × 10^2  0.466
Flickr         Maxide   1.33 × 10^4  0.124    1.89 × 10^4  0.124    2.67 × 10^4  0.124
               BR-1     2.48 × 10^4  0.064    4.74 × 10^4  0.074    1.11 × 10^5  0.077

[16] Y. Fang and L. Si. Matrix co-factorization for recommendation with rich side information and implicit feedback. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, 2011.
[17] A. Goldberg, X. Zhu, B. Recht, J.-M. Xu, and R. Nowak. Transduction with matrix completion: Three birds with one stone. In NIPS, 2010.
[18] Y. Guo and D. Schuurmans. Semi-supervised multi-label classification - a simultaneous large-margin, subspace learning approach. In ECML, 2012.
[19] P. Jain, P. Netrapalli, and S. Sanghavi. Provable matrix sensing using alternating minimization. In NIPS Workshop on Optimization for Machine Learning, 2012.
[20] A. Jalali, Y. Chen, S. Sanghavi, and H. Xu. Clustering partially observed graphs via convex optimization. In ICML, 2011.
[21] S. Ji, L. Tang, S. Yu, and J. Ye. Extracting shared subspace for multi-label classification. In KDD, 2008.
[22] S. Ji and J. Ye. An accelerated gradient method for trace norm minimization. In ICML, 2009.
[23] R. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE TIT, 56(6):2980–2998, 2010.
[24] X. Kong, M. Ng, and Z.-H. Zhou. Transductive multi-label learning via label set propagation. IEEE TKDE, 25(3):704–719, 2013.
[25] Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical report, UIUC, 2009.
[26] Y. Liu, R. Jin, and L. Yang. Semi-supervised multi-label learning by constrained non-negative matrix factorization. In AAAI, 2006.
[27] S. Ma, D. Goldfarb, and L. Chen. Fixed point and bregman iterative methods for matrix rank minimization. Mathematical Programming, 128(1-2):321–353, 2011.
[28] R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. JMLR, 11:2287–2322, 2010.
[29] A. Menon, K. Chitrapura, S. Garg, D. Agarwal, and N. Kota. Response prediction using collaborative filtering with hierarchies and side-information. In KDD, 2011.
[30] S. Negahban and M. Wainwright. Estimation of (near) low-rank matrices with noise and high dimensional scaling. Annals of Statistics, 39(2):1069–1097, 2011.
[31] G. Obozinski, B. Taskar, and M. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2):231–252, 2010.
[32] W. Pan, E. Xiang, N. Liu, and Q. Yang. Transfer learning in collaborative filtering for sparsity reduction. In AAAI, 2010.
[33] I. Porteous, A. Asuncion, and M. Welling. Bayesian matrix factorization with side information and dirichlet process mixtures. In AAAI, 2010.
[34] B. Recht. A simpler approach to matrix completion. JMLR, 12:3413–3430, 2011.
[35] J. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, 2005.
[36] A. Rohde and A. Tsybakov. Estimation of high dimensional low rank matrices. Annals of Statistics, 39(2):887–930, 2011.
[37] N. Srebro, Jason D. Rennie, and T. Jaakkola. Maximum-margin matrix factorization. In NIPS, 2005.
[38] Y.-Y. Sun, Y. Zhang, and Z.-H. Zhou. Multi-label learning with weak label. In AAAI, 2010.
[39] K.-C. Toh and Y. Sangwoon. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization, 2010.
[40] N. Ueda and K. Saito. Parametric mixture models for multi-labeled text. In NIPS, 2002.
[41] K. Weinberger and L. Saul. Unsupervised learning of image manifolds by semidefinite programming. IJCV, 70(1):77–90, 2006.
[42] J. Yi, T. Yang, R. Jin, A. Jain, and M. Mahdavi. Robust ensemble clustering by matrix completion. In ICDM, 2012.
[43] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu. Transductive multi-label ensemble classification for protein function prediction. In KDD, 2012.
[44] M.-L. Zhang and Z.-H. Zhou. A review on multi-label learning algorithms. IEEE TKDE, in press.
[45] J. Zhuang and S. Hoi. A two-view learning approach for image tag ranking. In WSDM, 2011.
-----1
[1] Williams C, Seeger M: Using the Nystrom method to speed up kernel machines. In NIPS 2001.
[2] Rahimi A, Recht B: Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. In Adv in Neural Information Processing Systems (NIPS) 2008.
[3] Yang T, Li YF, Mahdavi M, Jin R, Zhou ZH: Nystrom Method vs Random Fourier Features: A Theoretical and Empirical Comparison. In NIPS 2012.
[4] Gittens A, Mahoney MW: Revisiting the Nystrom method for improved large-scale machine learning. In ICML 2013.
[5] Bach F: Sharp analysis of low-rank kernel approximations. In COLT 2013.
[6] Rahimi A, Recht B: Random Features for Large-Scale Kernel Machines. In Adv in Neural Information Processing Systems 2007.
[7] Kakade S, Foster DP: Multi-view Regression Via Canonical Correlation Analysis. In Computational Learning Theory (COLT) 2007.
[8] Hotelling H: Relations between two sets of variates. Biometrika 1936, 28:312–377.
[9] Hardoon DR, Szedmak S, Shawe-Taylor J: Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Comp 2004, 16(12):2639–2664.
[10] Ji M, Yang T, Lin B, Jin R, Han J: A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound. In ICML 2012.
[11] Belkin M, Niyogi P, Sindhwani V: Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. JMLR 2006, 7:2399–2434.
[12] Blum A, Mitchell T: Combining labeled and unlabeled data with co-training. In COLT 1998.
[13] Chaudhuri K, Kakade SM, Livescu K, Sridharan K: Multiview clustering via Canonical Correlation Analysis. In ICML 2009.
[14] McWilliams B, Montana G: Multi-view predictive partitioning in high dimensions. Statistical Analysis and Data Mining 2012, 5:304-321.
[15] Drineas P, Mahoney MW: On the Nystrom Method for Approximating a Gram Matrix for Improved Kernel-Based Learning. JMLR 2005, 6:2153-2175.
[16] Avron H, Boutsidis C, Toledo S, Zouzias A: Efficient Dimensionality Reduction for Canonical Correlation Analysis. In ICML 2013.
[17] Hsu D, Kakade S, Zhang T: An Analysis of Random Design Linear Regression. In COLT 2012.
[18] Dhillon PS, Foster DP, Kakade SM, Ungar LH: A Risk Comparison of Ordinary Least Squares vs Ridge Regression. Journal of Machine Learning Research 2013, 14:1505-1511.
[19] Andrew G, Arora R, Bilmes J, Livescu K: Deep Canonical Correlation Analysis. In ICML 2013.
[20] Kumar S, Mohri M, Talwalkar A: Sampling methods for the Nystrom method. JMLR 2012, 13:981-1006.
-----1
[1] X. Zhu, Z. Ghahramani, and J. D. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in Proc. of the 20th ICML (T. Fawcett and N. Mishra, eds.), pp. 912-919, AAAI Press, 2003.
[2] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Scholkopf, Learning with local and global consistency, in Advances in NIPS 16 (S. Thrun, L. Saul, and B. Scholkopf, eds.), MIT Press, 2004.
[3] A. Kapoor, Y. A. Qi, H. Ahn, and R. Picard, Hyperparameter and kernel learning for graph based semi-supervised classification, in Advances in NIPS 18 (Y. Weiss, B. Scholkopf, and J. Platt, eds.), pp. 627-634, MIT Press, 2006.
[4] X. Zhang and W. S. Lee, Hyperparameter learning for graph based semi-supervised learning algorithms, in Advances in NIPS 19 (B. Scholkopf, J. Platt, and T. Hoffman, eds.), pp. 1585-1592, MIT Press, 2007.
[5] F. Wang and C. Zhang, Label propagation through linear neighborhoods, IEEE TKDE, vol. 20, pp. 55-67, 2008.
[6] S. Roweis and L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[7] S. I. Daitch, J. A. Kelner, and D. A. Spielman, Fitting a graph to vector data, in Proc. of the 26th ICML, (New York, NY, USA), pp. 201-208, ACM, 2009.
[8] H. Cheng, Z. Liu, and J. Yang, Sparsity induced similarity measure for label propagation, in IEEE 12th ICCV, pp. 317-324, IEEE, 2009.
[9] W. Liu, J. He, and S.-F. Chang, Large graph construction for scalable semi-supervised learning, in Proc. of the 27th ICML, pp. 679-686, Omnipress, 2010.
[10] J. Chen and Y. Liu, Locally linear embedding: a survey, Artificial Intelligence Review, vol. 36, pp. 29-48, 2011.
[11] L. K. Saul and S. T. Roweis, Think globally, fit locally: unsupervised learning of low dimensional manifolds, JMLR, vol. 4, pp. 119-155, Dec. 2003.
[12] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Scholkopf, and A. J. Smola, A kernel method for the two-sample-problem, in Advances in NIPS 19 (B. Scholkopf, J. C. Platt, and T. Hoffman, eds.), pp. 513-520, MIT Press, 2007.
[13] E. Elhamifar and R. Vidal, Sparse manifold clustering and embedding, in Advances in NIPS 24 (J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, eds.), pp. 55-63, 2011.
[14] D. Kong, C. H. Ding, H. Huang, and F. Nie, An iterative locally linear embedding algorithm, in Proc. of the 29th ICML (J. Langford and J. Pineau, eds.), pp. 1647-1654, Omnipress, 2012.
[15] X. Zhu, J. Kandola, Z. Ghahramani, and J. Lafferty, Nonparametric transforms of graph kernels for semi-supervised learning, in Advances in NIPS 17 (L. K. Saul, Y. Weiss, and L. Bottou, eds.), pp. 1641-1648, MIT Press, 2005.
[16] F. R. Bach and M. I. Jordan, Learning spectral clustering, in Advances in NIPS 16 (S. Thrun, L. K. Saul, and B. Scholkopf, eds.), 2004.
[17] T. Jebara, J. Wang, and S.-F. Chang, Graph construction and b-matching for semi-supervised learning, in Proc. of the 26th ICML (A. P. Danyluk, L. Bottou, and M. L. Littman, eds.), pp. 441-448, ACM, 2009.
[18] M. S. Baghshah and S. B. Shouraki, Metric learning for semi-supervised clustering using pairwise constraints and the geometrical structure of data, Intelligent Data Analysis, vol. 13, no. 6, pp. 887-899, 2009.
[19] B. Shaw, B. Huang, and T. Jebara, Learning a distance metric from a network, in Advances in NIPS 24 (J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, eds.), pp. 1899-1907, 2011.
[20] S. A. Nene, S. K. Nayar, and H. Murase, Columbia object image library, tech. rep., CUCS-005-96, 1996.
[21] T. Hastie, R. Tibshirani, and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction. New York: Springer-Verlag, 2001.
[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[23] F. Samaria and A. Harter, Parameterisation of a stochastic model for human face identification, in Proceedings of the Second IEEE Workshop on Applications of Computer Vision, pp. 138-142, 1994.
[24] A. Asuncion and D. J. Newman, UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html, 2007.
[25] A. Georghiades, P. Belhumeur, and D. Kriegman, From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE TPAMI, vol. 23, no. 6, pp. 643-660, 2001.
[26] D. B. Graham and N. M. Allinson, Characterizing virtual eigensignatures for general purpose face recognition, in Face Recognition: From Theory to Applications; NATO ASI Series F, Computer and Systems Sciences (H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang, eds.), vol. 163, pp. 446-456, 1998.
[27] L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, in Advances in NIPS 17, pp. 1601-1608, MIT Press, 2004.
-----1
[1] M. Aharon, M. Elad, and A. Bruckstein. k-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Sig. Proc., 54(11):4311-4322, 2006.
[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci., 2:183-202, March 2009.
[3] E. Benetos and S. Dixon. Multiple-instrument polyphonic music transcription using a convolutive probabilistic model. In Sound and Music Computing Conference, pages 19-24, 2011.
[4] D.P. Bertsekas. Nonlinear programming. 1999.
[5] H. Bischof, Y. Chen, and T. Pock. Learning l1-based analysis and synthesis sparsity priors using bi-level optimization. NIPS workshop, 2012.
[6] M. M. Bronstein, A. M. Bronstein, M. Zibulevsky, and Y. Y. Zeevi. Blind deconvolution of images using optimal sparse representations. IEEE Trans. Im. Proc., 14(6):726-736, 2005.
[7] J. C. Brown. Calculation of a constant Q spectral transform. The Journal of the Acoustical Society of America, 89:425, 1991.
[8] B. Colson, P. Marcotte, and G. Savard. An overview of bilevel optimization. Annals of operations research, 153(1):235-256, 2007.
[9] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. on Im. Proc., 54(12):3736-3745, 2006.
[10] V. Emiya, R. Badeau, and B. David. Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Trans. Audio, Speech, and Language Proc., 18(6):1643-1654, 2010.
[11] K. Gregor and Y. LeCun. Learning fast approximations of sparse coding. In ICML, pages 399-406, 2010.
[12] J. Mairal, F. Bach, and J. Ponce. Task-driven dictionary learning. IEEE Trans. PAMI, 34(4):791-804, 2012.
[13] J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Trans. on Im. Proc., 17(1):53-69, 2008.
[14] S. Mallat. A Wavelet Tour of Signal Processing, Second Edition. Academic Press, 1999.
[15] Y. Nesterov. Gradient methods for minimizing composite objective function. In CORE. Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2007.
[16] B.A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607-609, 1996.
[17] G. Peyre and J. Fadili. Learning analysis sparsity priors. SAMPTA11, 2011.
[18] G. E. Poliner and D. Ellis. A discriminative model for polyphonic piano transcription. EURASIP J. Adv. in Sig. Proc., 2007, 2006.
[19] L.I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation-based noise removal algorithms. Physica D, 60(1-4):259-268, 1992.
[20] P. Sprechmann, A. M. Bronstein, and G. Sapiro. Learning efficient sparse and low rank models. arXiv preprint arXiv:1212.3631, 2012.
[21] R. Tibshirani. Regression shrinkage and selection via the LASSO. J. Royal Stat. Society: Series B, 58(1):267-288, 1996.
[22] Ryan Joseph Tibshirani. The solution path of the generalized lasso. Stanford University, 2011.
[23] S. Vaiter, G. Peyre, C. Dossal, and J. Fadili. Robust sparse analysis regularization. Information Theory, IEEE Transactions on, 59(4):2001-2016, 2013.
[24] J. Yang, John W., T. Huang, and Y. Ma. Image super-resolution as sparse representation of raw image patches. In Proc. CVPR, pages 1-8. IEEE, 2008.
[25] G. Yu and J.-M. Morel. On the consistency of the SIFT method. Inverse problems and Imaging, 2009.
[26] G. Yu, G. Sapiro, and S. Mallat. Solving inverse problems with piecewise linear estimators: from gaussian mixture models to structured sparsity. IEEE Trans. Im. Proc., 21(5):2481-2499, 2012.
-----1
[1] M. Segal, K. Dahlquist, and B. Conklin, Regression approaches for microarray data analysis, Journal of Computational Biology, vol. 10, no. 6, pp. 961-980, 2003.
[2] H. Zhu and G. Giannakis, Sparse overcomplete representations for efficient identification of power line outages, IEEE Transactions on Power Systems, vol. 27, no. 4, pp. 2215-2224, Nov. 2012.
[3] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Information Theory, vol. 52, no. 2, pp. 489-509, 2006.
[4] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, Single-pixel imaging via compressive sampling, IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 83-91, Mar. 2008.
[5] N. Meinshausen and P. Buhlmann, High-dimensional graphs and variable selection with the Lasso, Annals of Statistics, vol. 34, no. 3, pp. 1436, 2006.
[6] P. Ravikumar, M. Wainwright, and J. Lafferty, High-dimensional Ising model selection using l1-regularized logistic regression, Annals of Statistics, vol. 38, no. 3, pp. 1287-1319, 2010.
[7] G. Varoquaux, A. Gramfort, and B. Thirion, Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering, in Proceedings of the 29th International Conference on Machine Learning (ICML-12), 2012, pp. 1375-1382.
[8] P. Zhao and B. Yu, On model selection consistency of Lasso, Journal of Machine Learning Research, vol. 7, pp. 2541-2563, 2006.
[9] J. A. Tropp and A. C. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit, IEEE Transactions Information Theory, vol. 53, no. 12, pp. 4655-4666, 2007.
[10] M. J. Wainwright, Sharp thresholds for noisy and high-dimensional recovery of sparsity using l1-constrained quadratic programming (Lasso), IEEE Transactions Information Theory, vol. 55, no. 5, May 2009.
[11] N. Meinshausen and B. Yu, Lasso-type recovery of sparse representations for high-dimensional data, Annals of Statistics, vol. 37, no. 1, pp. 246-270, 2009.
[12] P. J. Bickel, Y. Ritov, and A. B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, Annals of Statistics, vol. 37, no. 4, pp. 1705-1732, 2009.
[13] P. Buhlmann and S. Van De Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer-Verlag New York Inc, 2011.
[14] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301-320, 2005.
[15] Y. She, Sparse regression with exact clustering, Electronic Journal Statistics, vol. 4, pp. 1055-1096, 2010.
[16] E. Grave, G. R. Obozinski, and F. R. Bach, Trace Lasso: A trace norm regularization for correlated designs, in Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, Eds., 2011, pp. 2187-2195.
[17] J. Huang, S. Ma, H. Li, and C. Zhang, The sparse laplacian shrinkage estimator for high-dimensional regression, Annals of Statistics, vol. 39, no. 4, pp. 2021, 2011.
[18] P. Buhlmann, P. Rutimann, S. van de Geer, and C.-H. Zhang, Correlated variables in regression: clustering and sparse estimation, Journal of Statistical Planning and Inference, vol. 143, pp. 1835-1858, Nov. 2013.
[19] J. Khan, J. S. Wei, M. Ringner, L. H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. R. Antonescu, C. Peterson, et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nature medicine, vol. 7, no. 6, pp. 673-679, 2001.
[20] T. Zhang, Adaptive forward-backward greedy algorithm for learning sparse representations, IEEE Transactions Information Theory, vol. 57, no. 7, pp. 4689-4708, 2011.
[21] D. Needell and J. A. Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples, Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301-321, 2009.
[22] T. Zhang, Some sharp performance bounds for least squares regression with l1 regularization, The Annals of Statistics, vol. 37, no. 5A, pp. 2109-2144, 2009.
[23] L. Wasserman and K. Roeder, High dimensional variable selection, Annals of statistics, vol. 37, no. 5A, pp. 2178, 2009.
[24] T. Zhang, Analysis of multi-stage convex relaxation for sparse regularization, Journal of Machine Learning Research, vol. 11, pp. 1081-1107, Mar. 2010.
[25] S. van de Geer, P. Buhlmann, and S. Zhou, The adaptive and the thresholded lasso for potentially misspecified models (and a lower bound for the lasso), Electronic Journal of Statistics, vol. 5, pp. 688-749, 2011.
[26] N. Meinshausen and P. Buhlmann, Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 72, no. 4, pp. 417-473, 2010.
[27] D. Vats and R. G. Baraniuk, Swapping for high-dimensional sparse regression from correlated measurements, Preprint, 2013.
-----1
[1] M. Slaney. Web-scale multimedia analysis: Does content matter? MultiMedia, IEEE, 18(2):12-15, 2011.
[2] Ò. Celma. Music Recommendation and Discovery in the Long Tail. PhD thesis, Universitat Pompeu Fabra, Barcelona, 2008.
[3] Malcolm Slaney, Kilian Q. Weinberger, and William White. Learning a metric for music similarity. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR), 2008.
[4] Jan Schluter and Christian Osendorfer. Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine. In Proceedings of the 10th International Conference on Machine Learning and Applications (ICMLA), 2011.
[5] Brian McFee, Luke Barrington, and Gert R. G. Lanckriet. Learning content similarity for music recommendation. IEEE Transactions on Audio, Speech & Language Processing, 20(8), 2012.
[6] Richard Stenzel and Thomas Kamps. Improving Content-Based Similarity Measures by Training a Collaborative Model. pages 264-271, London, UK, September 2005. University of London.
[7] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.
[8] James Bennett and Stan Lanning. The netflix prize. In Proceedings of KDD cup and workshop, volume 2007, page 35, 2007.
[9] Eric J. Humphrey, Juan P. Bello, and Yann LeCun. Moving beyond feature design: Deep architectures and automatic feature learning in music informatics. In Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), 2012.
[10] Philippe Hamel and Douglas Eck. Learning features from music audio with deep belief networks. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), 2010.
[11] Honglak Lee, Peter Pham, Yan Largman, and Andrew Ng. Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in Neural Information Processing Systems 22. 2009.
[12] Sander Dieleman, Philemon Brakel, and Benjamin Schrauwen. Audio-based music classification with a pretrained convolutional network. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), 2011.
[13] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), 2011.
[14] Brian McFee, Thierry Bertin-Mahieux, Daniel P.W. Ellis, and Gert R.G. Lanckriet. The million song dataset challenge. In Proceedings of the 21st international conference companion on World Wide Web, 2012.
[15] Andreas Rauber, Alexander Schindler, and Rudolf Mayer. Facilitating comprehensive benchmarking experiments on the million song dataset. In Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), 2012.
[16] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, 2008.
[17] Jason Weston, Chong Wang, Ron Weiss, and Adam Berenzweig. Latent collaborative retrieval. In Proceedings of the 29th international conference on Machine learning, 2012.
[18] Jason Weston, Samy Bengio, and Philippe Hamel. Large-scale music annotation and retrieval: Learning to rank in joint semantic spaces. Journal of New Music Research, 2011.
[19] Jonathan T Foote. Content-based retrieval of music and audio. In Voice, Video, and Data Communications, pages 138-147. International Society for Optics and Photonics, 1997.
[20] Matthew Hoffman, David Blei, and Perry Cook. Easy As CBA: A Simple Probabilistic Model for Tagging Music. In Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR), 2009.
[21] Brian McFee and Gert R. G. Lanckriet. Metric learning to rank. In Proceedings of the 27th International Conference on Machine Learning, 2010.
[22] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. Signal Processing Magazine, IEEE, 29(6):82-97, 2012.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, 2012.
[24] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
[25] James Bergstra, Olivier Breuleux, Frederic Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.
[26] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. Technical report, University of Toronto, 2012.
[27] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.
[28] Chong Wang and David M. Blei. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011.
[29] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20, 2008.
-----1
[1] N. Srebro, J.D.M. Rennie, and T. Jaakkola. Maximum-Margin Matrix Factorization. In Advances in neural information processing systems, volume 17, pages 1329-1336. MIT Press, 2005.
[2] E.J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational mathematics, 9(6):717-772, 2009.
[3] E.J. Candès and Y. Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925-936, 2010.
[4] R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. The Journal of Machine Learning Research, 11:2287-2322, 2010.
[5] M. Fazel. Matrix rank minimization with applications. PhD thesis, Stanford University, 2002.
[6] E.J. Candès, M.B. Wakin, and S.P. Boyd. Enhancing sparsity by reweighted l1 minimization. Journal of Fourier Analysis and Applications, 14(5):877-905, 2008.
[7] J.F. Cai, E.J. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956-1982, 2010.
[8] Anthony Lee, Francois Caron, Arnaud Doucet, and Chris Holmes. A hierarchical Bayesian framework for constructing sparsity-inducing priors. arXiv preprint arXiv:1009.1914, 2010.
[9] M. Fazel, H. Hindi, and S.P. Boyd. Log-det heuristic for matrix rank minimization with applications to hankel and euclidean distance matrices. In American Control Conference, 2003. Proceedings of the 2003, volume 3, pages 2156-2162. IEEE, 2003.
[10] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, pages 1-38, 1977.
[11] S. Gaïffas and G. Lecue. Weighted algorithms for compressed sensing and matrix completion. arXiv preprint arXiv:1107.1638, 2011.
[12] Kun Chen, Hongbo Dong, and Kung-Sik Chan. Reduced rank regression via adaptive nuclear norm penalization. arXiv preprint arXiv:1201.0381, 2012.
[13] N. Srebro and T. Jaakkola. Weighted low-rank approximations. In NIPS, volume 20, page 720, 2003.
[14] F. Bach. Consistency of trace norm minimization. The Journal of Machine Learning Research, 9:1019-1048, 2008.
[15] R. M. Larsen. Lanczos bidiagonalization with partial reorthogonalization. Technical report, DAIMI PB-357, 1998.
[16] R. M. Larsen. PROPACK: software for large and sparse SVD calculations. Available online. URL http://sun.stanford.edu/~rmunk/PROPACK, 2004.
[17] J. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd international conference on Machine learning, pages 713-719. ACM, 2005.
[18] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133-151, 2001.
[19] Z. Zhang, S. Wang, D. Liu, and M. I. Jordan. EP-GIG priors and applications in Bayesian sparse learning. The Journal of Machine Learning Research, 13:2031-2061, 2012.
[20] M. Seeger and G. Bouchard. Fast variational Bayesian inference for non-conjugate matrix factorization models. In Proc. of AISTATS, 2012.
[21] S. Nakajima, M. Sugiyama, S. D. Babacan, and R. Tomioka. Global analytic solution of fully-observed variational Bayesian matrix factorization. Journal of Machine Learning Research, 14:1-37, 2013.
-----1
[1] Y. Abbasi-Yadkori, D. Pal, and C. Szepesvari. Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems, 2011.
[2] K. Amin, M. Kearns, and U. Syed. Graphical models for bandit problems. Proceedings of the Twenty-Seventh Conference Uncertainty in Artificial Intelligence, 2011.
[3] D. Asanov. Algorithms and methods in recommender systems. Berlin Institute of Technology, Berlin, Germany, 2011.
[4] P. Auer. Using confidence bounds for exploration-exploitation trade-offs. Journal of Machine Learning Research, 3:397-422, 2002.
[5] T. Bogers. Movie recommendation using random walks over the contextual graph. In CARS10: Proceedings of the 2nd Workshop on Context-Aware Recommender Systems, 2010.
[6] I. Cantador, P. Brusilovsky, and T. Kuflik. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). In Proceedings of the 5th ACM Conference on Recommender Systems, RecSys 2011. ACM, 2011.
[7] S. Caron, B. Kveton, M. Lelarge, and S. Bhagat. Leveraging side observations in stochastic bandits. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, pages 142-151, 2012.
[8] G. Cavallanti, N. Cesa-Bianchi, and C. Gentile. Linear algorithms for online multitask classification. Journal of Machine Learning Research, 11:2597-2630, 2010.
[9] W. Chu, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandits with linear payoff functions. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 208-214, 2011.
[10] K. Crammer and C. Gentile. Multiclass classification with bandit feedback using adaptive regularization. Machine Learning, 90(3):347-383, 2013.
[11] I. S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: a multilevel approach. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(11):1944-1957, 2007.
[12] T. Evgeniou and M. Pontil. Regularized multitask learning. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD 04, pages 109-117, New York, NY, USA, 2004. ACM.
[13] I. Guy, N. Zwerdling, D. Carmel, I. Ronen, E. Uziel, S. Yogev, and S. Ofek-Koifman. Personalized recommendation of social software items based on social relations. In Proceedings of the Third ACM Conference on Recommender Systems, pages 53-60. ACM, 2009.
[14] S. Kar, H. V. Poor, and S. Cui. Bandit problems in networks: Asymptotically efficient distributed allocation rules. In Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, pages 1771-1778. IEEE, 2011.
[15] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661-670. ACM, 2010.
[16] S. Mannor and O. Shamir. From bandits to experts: On the value of side-observations. In Advances in Neural Information Processing Systems, pages 684-692, 2011.
[17] C. A. Micchelli and M. Pontil. Kernels for multitask learning. In Advances in Neural Information Processing Systems, pages 921-928, 2004.
[18] A. Said, E. W. De Luca, and S. Albayrak. How social relationships affect user similarities. In Proceedings of the 2010 Workshop on Social Recommender Systems, pages 1-4, 2010.
[19] A. Slivkins. Contextual bandits with similarity information. Journal of Machine Learning Research - Proceedings Track, 19:679-702, 2011.
[20] B. Swapna, A. Eryilmaz, and N. B. Shroff. Multi-armed bandits in the presence of side observations in social networks. In Proceedings of 52nd IEEE Conference on Decision and Control (CDC), 2013.
[21] B. Szorenyi, R. Busa-Fekete, I. Hegedus, R. Ormandi, M. Jelasity, and B. Kegl. Gossip-based distributed stochastic bandit algorithms. Proceedings of the 30th International Conference on Machine Learning, 2013.
[22] K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th International Conference on Machine Learning, pages 1113-1120. Omnipress, 2009.
-----1
[1] David M. Blei, Andrew Ng, and Michael Jordan. Latent Dirichlet allocation. JMLR, 3:993-1022, 2003.
[2] Leonard E. Baum and J. A. Eagon. An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bull. Amer. Math. Soc., 73(3):360-363, 1967.
[3] J. Zou and R. Adams. Priors for diversity in generative latent variable models. In Advances in Neural Information Processing Systems 25, 2012.
[4] B. Scholkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt. Support vector method for novelty detection. In Advances in Neural Information Processing Systems 25, 2000.
[5] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and T. Telgarsky. Tensor decompositions for learning latent variable models, 2012. arXiv:1210.7559.
[6] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. Journal of Computer and System Sciences, 78(5):1460-1480, 2012.
[7] S. Siddiqi, B. Boots, and G. Gordon. Reduced rank hidden markov models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
[8] B. Balle, A. Quattoni, and X. Carreras. Local loss optimization in operator models: A new insight into spectral learning. In Twenty-Ninth International Conference on Machine Learning, 2012.
[9] S. B. Cohen, K. Stratos, M. Collins, D. P. Foster, and L. Ungar. Spectral learning of latent variable PCFGs. In Proceedings of Association of Computational Linguistics, 2012.
[10] A. Anandkumar, D. P. Foster, D. Hsu, S. M. Kakade, and Y. K. Liu. A spectral algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 25, 2012.
[11] D. Hsu, S. M. Kakade, and P. Liang. Identifiability and unmixing of latent parse trees. In Advances in Neural Information Processing Systems 25, 2012.
[12] S. B. Cohen, K. Stratos, M. Collins, D. P. Foster, and L. Ungar. Experiments with spectral learning of latent-variable PCFGs. In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics, 2013.
[13] A. T. Chaganty and P. Liang. Spectral experts for estimating mixtures of linear regressions. In Thirtieth International Conference on Machine Learning, 2013.
[14] A. Kontorovich, B. Nadler, and R. Weiss. On learning parametric-output HMMs. In Thirtieth International Conference on Machine Learning, 2013.
[15] J. Zhu et al. Genome-wide chromatin state transitions associated with developmental and environmental cues. Cell, 152(3):642-54, 2013.
[16] J. Ernst et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473(7345):43-49, 2011.
[17] D. Beck et al. Signal analysis for genome wide maps of histone modifications measured by chip-seq. Bioinformatics, 28(8):1062-9, 2012.
[18] L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank-(R1, R2, ..., Rn) approximation and applications of higher-order tensors. SIAM J. Matrix Anal. Appl., 21(4):1324-1342, 2000.
[19] Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. Class-based n-gram models of natural language. Comput. Linguist., 18(4):467-479, 1992.
-----1
[1] A. Kulesza and B. Taskar. Determinantal point processes for machine learning. ArXiv, 2012.
[2] A. Kulesza and B. Taskar. k-DPPs: Fixed-size determinantal point processes. In Proceedings of ICML, 2011.
[3] A. Kulesza and B. Taskar. Learning determinantal point processes. In Proceedings of UAI, 2011.
[4] J.B. Hough, M. Krishnapur, Y. Peres, and B. Virag. Determinantal processes and independence. Probability Surveys, 3, 2006.
[5] I. Dhillon, Y. Guan, and B. Kulis. Kernel k-means, spectral clustering and normalized cuts. In Proceedings of ACM SIGKDD, 2004.
[6] G. Golub and C. van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
[7] A. Daniel, D. Amit, H. Pierre, and P. Preyas. NP-hardness of euclidean sum-of-squares clustering. Machine Learning, 75:245-248, 2009.
[8] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings of NIPS, 2001.
[9] M. Ester, H. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of KDD, 1996.
[10] C. Fraley and A. E. Raftery. How many clusters? which clustering method? answers via model-based cluster analysis. The Computer Journal, 41(8), 1998.
[11] J. Gillenwater, A. Kulesza, and B. Taskar. Near-optimal MAP inference for determinantal point processes. In Proceedings of NIPS, 2012.
[12] D. Slate. Letter recognition data set. http://archive.ics.uci.edu/ml/datasets/Letter+Recognition, 1991.
[13] G. Hebrail and A. Berard. Individual household electric power consumption data set. http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption, 2012.
[14] C. Goutte, L. K. Hansen, M. G. Liptrot, and E. Rostrup. Feature-space clustering for fMRI meta-analysis. Human Brain Mapping, 13, 2001.
[15] V. Guruswami. Rapidly mixing Markov chains: A comparison of techniques. Unpublished manuscript, 2000.
-----1
[1] B. Cipra. The best of the 20th century: Editors name top 10 algorithms. SIAM News, 33(4):1, May 2000.
[2] T.M. Semkow, S. Pomm, S. Jerome, and D.J. Strom, editors. Applied Modeling and Computations in Nuclear Science. American Chemical Society, Washington, DC, 2006.
[3] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, November 1999.
[4] S. Asmussen and P. Glynn. Stochastic Simulation: Algorithms and Analysis (Stochastic Modelling and Applied Probability). Springer, 2010.
[5] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21:1087, 1953.
[6] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97-109, 1970.
[7] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, 1996.
[8] D. Aldous and J. Fill. Reversible Markov chains and random walks on graphs: Chapter 2 (General Markov chains). Book in preparation. URL: http://www.stat.berkeley.edu/~aldous/RWG/Chap2.pdf, pages 7, 19-20, 1999.
[9] B. Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, pages 502-525, 1982.
[10] P. Diaconis and L. Saloff-Coste. What do we know about the Metropolis algorithm? Journal of Computer and System Sciences, 57(1):20-36, 1998.
[11] P. Diaconis. The Markov chain Monte Carlo revolution. Bulletin of the American Mathematical Society, 46(2):179-205, 2009.
[12] D.A. Levin, Y. Peres, and E.L. Wilmer. Markov chains and mixing times. Amer Mathematical Society, 2009.
[13] G. Jeh and J. Widom. Scaling personalized web search. In Proceedings of the 12th international conference on World Wide Web, pages 271-279, New York, NY, USA, 2003.
[14] D. Fogaras, B. Racz, K. Csalogany, and T. Sarlos. Towards scaling fully personalized PageRank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333-358, 2005.
[15] K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM Journal on Numerical Analysis, 45(2):890-904, 2007.
[16] B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized PageRank. Proc. VLDB Endow., 4(3):173-184, December 2010.
[17] C. Borgs, M. Brautbar, J. Chayes, and S.-H. Teng. Sublinear time algorithm for PageRank computations and related applications. CoRR, abs/1202.2771, 2012.
[18] S.P. Meyn and R.L. Tweedie. Markov chains and stochastic stability. Springer-Verlag, 1993.
[19] C.E. Lee, A. Ozdaglar, and D. Shah. Computing the stationary distribution locally. MIT LIDS Report 2914, Nov 2013. URL: http://www.mit.edu/~celee/LocalStationaryDistribution.pdf.
[20] F.G. Foster. On the stochastic matrices associated with certain queuing processes. The Annals of Mathematical Statistics, 24(3):355-360, 1953.
-----1
[1] Alessandro Acquisti and Hal R. Varian. Conditioning prices on purchase history. Marketing Science, 24(3):367-381, 2005.
[2] Raman Arora, Ofer Dekel, and Ambuj Tewari. Online bandit learning against an adaptive adversary: from regret to policy regret. In ICML, 2012.
[3] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47(2-3):235-256, 2002.
[4] Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. The nonstochastic multiarmed bandit problem. Journal on Computing, 32(1):48-77, 2002.
[5] Moshe Babaioff, Robert D Kleinberg, and Aleksandrs Slivkins. Truthful mechanisms with implicit payment computation. In Proceedings of the Conference on Electronic Commerce, pages 43-52. ACM, 2010.
[6] Moshe Babaioff, Yogeshwer Sharma, and Aleksandrs Slivkins. Characterizing truthful multi-armed bandit mechanisms. In Proceedings of Conference on Electronic Commerce, pages 79-88. ACM, 2009.
[7] Ziv Bar-Yossef, Kirsten Hildrum, and Felix Wu. Incentive-compatible online auctions for digital goods. In Proceedings of Symposium on Discrete Algorithms, pages 964-970. SIAM, 2002.
[8] Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu. Online learning in online auctions. In Proceedings Symposium on Discrete algorithms, pages 202-204. SIAM, 2003.
[9] Nicolo Cesa-Bianchi, Claudio Gentile, and Yishay Mansour. Regret minimization for reserve prices in second-price auctions. In Proceedings of the Symposium on Discrete Algorithms. SIAM, 2013.
[10] Ofer Dekel, Felix Fischer, and Ariel D Procaccia. Incentive compatible regression learning. Journal of Computer and System Sciences, 76(8):759-777, 2010.
[11] Nikhil R Devanur and Sham M Kakade. The price of truthfulness for pay-per-click auctions. In Proceedings of the Conference on Electronic commerce, pages 99-106. ACM, 2009.
[12] Benjamin Edelman and Michael Ostrovsky. Strategic bidder behavior in sponsored search auctions. Decision support systems, 43(1):192-198, 2007.
[13] Drew Fudenberg and J. Miguel Villas-Boas. Behavior-Based Price Discrimination and Customer Recognition. Elsevier Science, Oxford, 2007.
[14] Jason Hartline. Dynamic posted price mechanisms, 2001.
[15] Robert Kleinberg and Tom Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In Symposium on Foundations of Computer Science, pages 594-605. IEEE, 2003.
[16] Volodymyr Kuleshov and Doina Precup. Algorithms for the multi-armed bandit problem. Journal of Machine Learning, 2010.
[17] Reshef Meir, Ariel D Procaccia, and Jeffrey S Rosenschein. Strategyproof classification with shared inputs. Proc. of 21st IJCAI, pages 220-225, 2009.
[18] Herbert Robbins. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers, pages 169-177. Springer, 1985.
-----1
[1] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In PODS, pages 273-282. ACM, 2007.
[2] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC, pages 609-618. ACM, 2008.
[3] A. Blum and A. Roth. Fast private data release algorithms for sparse queries. arXiv preprint arXiv:1111.6842, 2011.
[4] K. Chaudhuri and D. Hsu. Sample complexity bounds for differentially private learning. In COLT, 2011.
[5] K. Chaudhuri, C. Monteleoni, and A.D. Sarwate. Differentially private empirical risk minimization. JMLR, 12:1069, 2011.
[6] K. Chaudhuri, A. Sarwate, and K. Sinha. Near-optimal differentially private principal components. In NIPS, pages 998-1006, 2012.
[7] M. Cheraghchi, A. Klivans, P. Kothari, and H.K. Lee. Submodular functions are noise stable. In SODA, pages 1586-1592. SIAM, 2012.
[8] R.A. DeVore and G. G. Lorentz. Constructive approximation, volume 303. Springer Verlag, 1993.
[9] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC, pages 265-284, 2006.
[10] C. Dwork, M. Naor, O. Reingold, G.N. Rothblum, and S. Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In STOC, pages 381-390. ACM, 2009.
[11] C. Dwork, G.N. Rothblum, and S. Vadhan. Boosting and differential privacy. In FOCS, pages 51-60. IEEE, 2010.
[12] T. Gerstner and M. Griebel. Numerical integration using sparse grids. Numerical algorithms, 18(3-4):209-232, 1998.
[13] A. Gupta, M. Hardt, A. Roth, and J. Ullman. Privately releasing conjunctions and the statistical query barrier. In STOC, pages 803-812. ACM, 2011.
[14] M. Hardt, K. Ligett, and F. McSherry. A simple and practical algorithm for differentially private data release. In NIPS, 2012.
[15] M. Hardt, G. N. Rothblum, and R. A. Servedio. Private data release via learning thresholds. In SODA, pages 168-187. SIAM, 2012.
[16] M. Hardt and G.N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In FOCS, pages 61-70. IEEE Computer Society, 2010.
[17] D. Kifer and B.R. Lin. Towards an axiomatization of statistical privacy and utility. In PODS, pages 147-158. ACM, 2010.
[18] J. Lei. Differentially private M-estimators. In NIPS, 2011.
[19] C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor. Optimizing linear counting queries under differential privacy. In PODS, pages 123-134. ACM, 2010.
[20] P. Jain, P. Kothari, and A. Thakurta. Differentially private online learning. In COLT, 2012.
[21] A. Roth and T. Roughgarden. Interactive privacy via the median mechanism. In STOC, pages 765-774. ACM, 2010.
[22] A. Smola, B. Scholkopf, and K. Muller. The connection between regularization operators and support vector kernels. Neural Networks, 11(4):637-649, 1998.
[23] V.N. Temlyakov. Approximation of periodic functions. Nova Science Pub Inc, 1994.
[24] J. Thaler, J. Ullman, and S. Vadhan. Faster algorithms for privately releasing marginals. In ICALP, pages 810-821. Springer, 2012.
[25] J. Ullman. Answering n^{2+o(1)} counting queries with differential privacy is hard. In STOC. ACM, 2013.
[26] A. van der Vaart and J.A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
[27] G. Wahba et al. Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. Advances in Kernel Methods-Support Vector Learning, 6:69-87, 1999.
[28] L. Wang. Smoothness, disagreement coefficient, and the label complexity of agnostic active learning. Journal of Machine Learning Research, 12:2269-2292, 2011.
[29] L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375-389, 2010.
[30] O. Williams and F. McSherry. Probabilistic inference and differential privacy. In NIPS, 2010.
-----1
[ACBF02] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine learning, 2002.
[ADX10] Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, 2010.
[AHR08] Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, 2008.
[BCB12] Sebastien Bubeck and Nicolo Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721, 2012.
[BDMN05] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: The SuLQ framework. In PODS, 2005.
[CM08] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In NIPS, 2008.
[CMS11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069-1109, 2011.
[CSS10] TH Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. In ICALP, 2010.
[DJW13] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In IEEE Symp. on Foundations of Computer Science (FOCS), 2013. http://arxiv.org/abs/1302.3203.
[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[DN10] Cynthia Dwork and Moni Naor. On the difficulties of disclosure prevention in statistical databases or the case for differential privacy. J. Privacy and Confidentiality, 2(1), 2010.
[DNPR10] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N Rothblum. Differential privacy under continual observation. In Proceedings of the 42nd ACM symposium on Theory of computing, 2010.
[Dwo06] Cynthia Dwork. Differential privacy. In ICALP, 2006.
[FKM05] Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA, 2005.
[HAK07] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Journal of Machine Learning Research, 2007.
[Han57] James Hannan. Approximation to bayes risk in repeated play. 1957.
[JKT12] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In COLT, 2012.
[JT13] Prateek Jain and Abhradeep Thakurta. Differentially private learning with kernels. In ICML, 2013.
[KLN+08] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? In FOCS, 2008.
[KM12] Daniel Kifer and Ashwin Machanavajjhala. A rigorous and customizable framework for privacy. In PODS, 2012.
[KS08] Shiva Prasad Kasiviswanathan and Adam Smith. A note on differential privacy: Defining resistance to arbitrary side information. CoRR, arXiv:0803.39461 [cs.CR], 2008.
[KST12] Daniel Kifer, Adam Smith, and Abhradeep Thakurta. Private convex empirical risk minimization and high-dimensional regression. In COLT, 2012.
[Sha11] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 2011.
[Smi11] Adam Smith. Privacy-preserving statistical estimators with optimal convergence rates. In STOC, 2011.
[Zin03] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.
-----1
[1] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In Proceedings of the 26th ACM Symposium on Principles of Database Systems, 2007.
[2] A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. In Advances in Cryptology, volume 5157 of Lecture Notes in Computer Science, pages 451-468. Springer, 2008.
[3] P. Brucker. An O(n) algorithm for quadratic knapsack problems. Operations Research Letters, 3(3):163-166, 1984.
[4] R. Carroll and P. Hall. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association, 83(404):1184-1186, 1988.
[5] K. Chaudhuri and D. Hsu. Convergence rates for differentially private statistical estimation. In Proceedings of the 29th International Conference on Machine Learning, 2012.
[6] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069-1109, 2011.
[7] T. M. Cover and J. A. Thomas. Elements of Information Theory, Second Edition. Wiley, 2006.
[8] A. De. Lower bounds in differential privacy. In Proceedings of the Ninth Theory of Cryptography Conference, 2012. URL http://arxiv.org/abs/1107.2183.
[9] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. arXiv:1302.3203 [math.ST], 2013. URL http://arxiv.org/abs/1302.3203.
[10] G. T. Duncan and D. Lambert. Disclosure-limited data dissemination. Journal of the American Statistical Association, 81(393):10-18, 1986.
[11] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265-284, 2006.
[12] S. Efromovich. Nonparametric Curve Estimation: Methods, Theory, and Applications. Springer-Verlag, 1999.
[13] A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, pages 211-222, 2003.
[14] I. P. Fellegi. On the question of statistical confidentiality. Journal of the American Statistical Association, 67(337):7-18, 1972.
[15] S. E. Fienberg, U. E. Makov, and R. J. Steele. Disclosure limitation using perturbation and related methods for categorical data. Journal of Official Statistics, 14(4):485-502, 1998.
[16] M. Hardt and K. Talwar. On the geometry of differential privacy. In Proceedings of the Forty-Second Annual ACM Symposium on the Theory of Computing, pages 705-714, 2010. URL http://arxiv.org/abs/0907.3754.
[17] I. A. Ibragimov and R. Z. Hasminskii. Statistical Estimation: Asymptotic Theory. Springer-Verlag, 1981.
[18] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793-826, 2011.
[19] J. Lei. Differentially private M-estimators. In Advances in Neural Information Processing Systems 25, 2011.
[20] D. Scott. On optimal and data-based histograms. Biometrika, 66(3):605-610, 1979.
[21] A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-Third Annual ACM Symposium on the Theory of Computing, 2011.
[22] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.
[23] S. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63-69, 1965.
[24] L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375-389, 2010.
[25] Y. Yang and A. Barron. Information-theoretic determination of minimax rates of convergence. Annals of Statistics, 27(5):1564-1599, 1999.
[26] B. Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423-435. Springer-Verlag, 1997.
-----1
[1] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In PODS, 2005.
[2] K. Chaudhuri and D. Hsu. Convergence rates for differentially private statistical estimation. In ICML, 2012.
[3] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069-1109, March 2011.
[4] K. Chaudhuri, A.D. Sarwate, and K. Sinha. Near-optimal algorithms for differentially-private principal components. Journal of Machine Learning Research, 2013 (to appear).
[5] L. Devroye and G. Lugosi. Combinatorial methods in density estimation. New York: Springer, 2001.
[6] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Berlin, Heidelberg, 2006.
[7] C. Dwork, G. Rothblum, and S. Vadhan. Boosting and differential privacy. In FOCS, 2010.
[8] A. Frank and A. Asuncion. UCI machine learning repository, 2013.
[9] A. Friedman and A. Schuster. Data mining with differential privacy. In KDD, 2010.
[10] S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In KDD, 2008.
[11] M. Hardt and A. Roth. Beyond worst-case analysis in private singular vector computation. In STOC, 2013.
[12] M. Hardt and G. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In FOCS, pages 61-70, 2010.
[13] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. PVLDB, 3(1):1021-1032, 2010.
[14] P. Jain, P. Kothari, and A. Thakurta. Differentially private online learning. In COLT, 2012.
[15] M C Jones, J S Marron, and S J Sheather. A brief survey of bandwidth selection for density estimation. JASA, 91(433):401-407, 1996.
[16] M. Kapralov and K. Talwar. On differentially private low rank approximation. In SODA, 2013.
[17] D. Kifer, A. Smith, and A. Thakurta. Private convex optimization for empirical risk minimization with applications to high-dimensional regression. In COLT, 2012.
[18] J. Lei. Differentially private M-estimators. In NIPS 24, 2011.
[19] A. Machanavajjhala, D. Kifer, J. M. Abowd, J. Gehrke, and L. Vilhuber. Privacy: Theory meets practice on the map. In ICDE, 2008.
[20] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.
[21] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013.
[22] B. Rubinstein, P. Bartlett, L. Huang, and N. Taft. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. Journal of Privacy and Confidentiality, 2012.
[23] A.D. Sarwate and K. Chaudhuri. Signal processing and machine learning with differential privacy: Algorithms, theory and challenges. IEEE Signal Processing Magazine, 2013.
[24] J. A. Swets and R. M. Pickett. Evaluation of Diagnostic Systems. Methods from Signal Detection Theory. Academic Press, New York, 1982.
[25] Berwin A Turlach. Bandwidth selection in kernel density estimation: A review. In CORE and Institut de Statistique. Citeseer, 1993.
[26] S. Vinterbo. Differentially private projected histograms: Construction and use for prediction. In ECML, 2012.
[27] L. Wasserman and S. Zhou. A statistical framework for differential privacy. JASA, 105(489):375-389, 2010.
[28] J. Xu, Z. Zhang, X. Xiao, Y. Yang, and G. Yu. Differentially private histogram publication. In ICDE, 2012.
-----1
[1] NIPS0-12 dataset. http://www.stats.ox.ac.uk/teh/data.html.
[2] I. Abraham, S. Chechik, D. Kempe, and A. Slivkins. Low-distortion Inference of Latent Similarities from a Multiplex Social Network. CoRR, abs/1202.0922, 2012.
[3] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9:1981-2014, June 2008.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[5] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic Metric Learning. In ICML, 2007.
[6] S. E. Fienberg, M. M. Meyer, and S. S. Wasserman. Statistical Analysis of Multiple Sociometric Relations. Journal of the American Statistical Association, 80(389):51-67, March 1985.
[7] A. Globerson and S. Roweis. Metric Learning by Collapsing Classes. In NIPS, 2005.
[8] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood Components Analysis. In NIPS, 2004.
[9] S. Hauberg, O. Freifeld, and M. Black. A Geometric take on Metric Learning. In NIPS, 2012.
[10] T. S. Jaakkola and M. I. Jordan. Variational Probabilistic Inference and the QMR-DT Network. Journal of Artificial Intelligence Research, 10(1):291-322, May 1999.
[11] P. Jain, B. Kulis, I. Dhillon, and K. Grauman. Online Metric Learning and Fast Similarity Search. In NIPS, 2008.
[12] D. Lin. An Information-Theoretic Definition of Similarity. In ICML, 1998.
[13] S. Parameswaran and K. Weinberger. Large Margin Multi-Task Metric Learning. In NIPS, 2010.
[14] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988.
[15] M. Szell, R. Lambiotte, and S. Thurner. Multirelational Organization of Large-scale Social Networks in an Online World. Proceedings of the National Academy of Sciences, 2010.
[16] L. van der Maaten and G. Hinton. Visualizing Non-Metric Similarities in Multiple Maps. Machine Learning, 33:33-55, 2012.
[17] J. Wang, A. Woznica, and A. Kalousis. Parametric Local Metric Learning for Nearest Neighbor Classification. In NIPS, 2012.
[18] K. Q. Weinberger and L. K. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research, 10:207-244, 2009.
[19] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance Metric Learning, with Application to Clustering with Side-information. In NIPS, 2002.
-----1
[1] Javier Alonso-Mora, Andreas Breitenmoser, Martin Rufli, Roland Siegwart, and Paul Beardsley. Image and animation display with multiple mobile robots. 31(6):753-773, 2012.
[2] Peter R. Wurman, Raffaello D'Andrea, and Mick Mountz. Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine, 29(1):9-19, 2008.
[3] Stephen J. Guy, Jatin Chhugani, Changkyu Kim, Nadathur Satish, Ming Lin, Dinesh Manocha, and Pradeep Dubey. Clearpath: highly parallel collision avoidance for multi-agent simulation. In Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 177-187, 2009.
[4] John H. Reif. Complexity of the mover's problem and generalizations. In IEEE Annual Symposium on Foundations of Computer Science, pages 421-427, 1979.
[5] John E. Hopcroft, Jacob T. Schwartz, and Micha Sharir. On the complexity of motion planning for multiple independent objects; PSPACE-hardness of the warehouseman's problem. The International Journal of Robotics Research, 3(4):76-88, 1984.
[6] Maren Bennewitz, Wolfram Burgard, and Sebastian Thrun. Finding and optimizing solvable priority schemes for decoupled path planning techniques for teams of mobile robots. Robotics and Autonomous Systems, 41(2-3):89-99, 2002.
[7] Daniel Mellinger, Alex Kushleyev, and Vijay Kumar. Mixed-integer quadratic program trajectory generation for heterogeneous quadrotor teams. In IEEE International Conference on Robotics and Automation, pages 477-483, 2012.
[8] Federico Augugliaro, Angela P. Schoellig, and Raffaello D'Andrea. Generation of collision-free trajectories for a quadrocopter fleet: A sequential convex programming approach. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1917-1922, 2012.
[9] Steven M. LaValle and James J. Kuffner. Randomized kinodynamic planning. The International Journal of Robotics Research, 20(5):378-400, 2001.
[10] Oussama Khatib. Real-time obstacle avoidance for manipulators and mobile robots. The International Journal of Robotics Research, 5(1):90-98, 1986.
[11] Paolo Fiorini and Zvi Shiller. Motion planning in dynamic environments using velocity obstacles. The International Journal of Robotics Research, 17(7):760-772, 1998.
[12] Javier Alonso-Mora, Martin Rufli, Roland Siegwart, and Paul Beardsley. Collision avoidance for multiple agents with joint utility maximization. In IEEE International Conference on Robotics and Automation, 2013.
[13] Nate Derbinsky, Jose Bento, Veit Elser, and Jonathan S. Yedidia. An improved three-weight message-passing algorithm. arXiv:1305.1961 [cs.AI], 2013.
[14] G. David Forney Jr. Codes on graphs: Normal realizations. IEEE Transactions on Information Theory, 47(2):520-548, 2001.
[15] Sertac Karaman and Emilio Frazzoli. Incremental sampling-based algorithms for optimal motion planning. arXiv preprint arXiv:1005.0416, 2010.
[16] R. Glowinski and A. Marrocco. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Revue Française d'Automatique, Informatique, et Recherche Opérationnelle, 9(2):41-76, 1975.
[17] Daniel Gabay and Bertrand Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17-40, 1976.
[18] Hugh Everett III. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research, 11(3):399-417, 1963.
[19] Magnus R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4(5):303-320, 1969.
[20] Magnus R. Hestenes. Multiplier and gradient methods. In L.A. Zadeh et al., editor, Computing Methods in Optimization Problems 2. Academic Press, New York, 1969.
[21] M.J.D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization. Academic Press, London, 1969.
[22] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
-----1
[1] M. Datar, N. Immorlica, P. Indyk, and V.S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, pages 253-262. ACM, 2004.
[2] W. Dong and M. Charikar. Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces. SIGIR, 2008.
[3] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. TPAMI, 2012.
[4] A. Gordo and F. Perronnin. Asymmetric distances for binary embeddings. CVPR, 2011.
[5] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. NIPS, 2009.
[6] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. CVPR, 2012.
[7] W. Liu, J. Wang, S. Kumar, and S.-F. Chang. Hashing with graphs. ICML, 2011.
[8] M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. ICML, 2011.
[9] M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. NIPS, 2012.
[10] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. NIPS, 2009.
[11] R. Salakhutdinov and G. Hinton. Semantic hashing. International Journal of Approximate Reasoning, 2009.
[12] N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: Exploring photo collections in 3D. In Proc. SIGGRAPH, 2006.
[13] A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. CVPR, 2008.
[14] J. Wang, S. Kumar, and S. Chang. Sequential projection learning for hashing with compact codes. ICML, 2010.
[15] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. NIPS, 2008.
-----1
[1] G. Amato, F. Rabitti, P. Savino, and P. Zezula. Region proximity in metric spaces and its use for approximate similarity search. ACM Trans. Inf. Syst., 21(2):192-227, Apr. 2003.
[2] G. Amato and P. Savino. Approximate similarity search in metric spaces using inverted files. In Proceedings of the 3rd international conference on Scalable information systems, InfoScale '08, pages 28:1-28:10, ICST, Brussels, Belgium, 2008. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).
[3] V. Athitsos, J. Alon, S. Sclaroff, and G. Kollios. BoostMap: A method for efficient approximate similarity rankings. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 2, pages II-268-II-275, 27 June-2 July 2004.
[4] J. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509-517, 1975.
[5] L. Boytsov and B. Naidan. Engineering efficient and effective Non-Metric Space Library. In N. Brisaboa, O. Pedreira, and P. Zezula, editors, Similarity Search and Applications, volume 8199 of Lecture Notes in Computer Science, pages 280-293. Springer Berlin Heidelberg, 2013.
[6] L. Cayton. Fast nearest neighbor retrieval for Bregman divergences. In Proceedings of the 25th international conference on Machine learning, ICML '08, pages 112-119, New York, NY, USA, 2008. ACM.
[7] L. Cayton and S. Dasgupta. A learning framework for nearest neighbor search. Advances in Neural Information Processing Systems, 20, 2007.
[8] E. Chavez, K. Figueroa, and G. Navarro. Effective proximity retrieval by ordering permutations. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(9):1647-1658, Sept. 2008.
[9] E. Chavez and G. Navarro. Probabilistic proximity search: Fighting the curse of dimensionality in metric spaces. Information Processing Letters, 85(1):39-46, 2003.
[10] E. Chavez, G. Navarro, R. Baeza-Yates, and J. L. Marroquin. Searching in metric spaces. ACM Computing Surveys, 33(3):273-321, 2001.
[11] W. Dong, Z. Wang, W. Josephson, M. Charikar, and K. Li. Modeling LSH for performance tuning. In Proceedings of the 17th ACM conference on Information and knowledge management, CIKM '08, pages 669-678, New York, NY, USA, 2008. ACM.
[12] O. Edsberg and M. L. Hetland. Indexing inexact proximity search with distance regression in pivot space. In Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP '10, pages 51-58, New York, NY, USA, 2010. ACM.
[13] A. Esuli. Use of permutation prefixes for efficient and scalable approximate similarity search.Inf. Process. Manage., 48(5):889902, Sept. 2012.
[14] E. Gonzalez, K. Figueroa, and G. Navarro. Effective proximity retrieval by ordering permu- tations. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(9):16471658, 2008.
[15] L. V. Hedges and J. L. Vevea. Fixed- and random-effects models in meta-analysis. Psychological methods, 3(4):486–504, 1998.
[16] G. Hjaltason and H. Samet. Properties of embedding methods for similarity searching in metric spaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(5):530–549, 2003.
[17] P. Indyk. Nearest neighbors in high-dimensional spaces. In J. E. Goodman and J. O'Rourke, editors, Handbook of discrete and computational geometry, pages 877–892. Chapman and Hall/CRC, 2004.
[18] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, STOC '98, pages 604–613, New York, NY, USA, 1998. ACM.
[19] D. Jacobs, D. Weinshall, and Y. Gdalyahu. Classification with nonmetric distances: Image retrieval and class representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(6):583–600, 2000.
[20] E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the 30th annual ACM symposium on Theory of computing, STOC '98, pages 614–623, New York, NY, USA, 1998. ACM.
[21] H. Lejsek, F. Asmundsson, B. Jonsson, and L. Amsaleg. NV-Tree: An efficient disk-based index for approximate search in very large high-dimensional collections. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(5):869–883, May 2009.
[22] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, VLDB '07, pages 950–961. VLDB Endowment, 2007.
[23] Y. Mu and S. Yan. Non-metric locality-sensitive hashing. In AAAI, 2010.
[24] T. Murakami, K. Takahashi, S. Serita, and Y. Fujii. Versatile probability-based indexing for approximate similarity search. In Proceedings of the Fourth International Conference on SImilarity Search and APplications, SISAP '11, pages 51–58, New York, NY, USA, 2011. ACM.
[25] V. Pestov. Indexability, concentration, and VC theory. Journal of Discrete Algorithms, 13(0):2–18, 2012. Best Papers from the 3rd International Conference on Similarity Search and Applications (SISAP 2010).
[26] P. Ram, D. Lee, H. Ouyang, and A. G. Gray. Rank-approximate nearest neighbor search: Retaining meaning and speed in high dimensions. In Advances in Neural Information Processing Systems, pages 1536–1544, 2009.
[27] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., 2005.
[28] J. Uhlmann. Satisfying general proximity similarity queries with metric trees. Information Processing Letters, 40:175–179, 1991.
[29] J. Vermorel. Near neighbor search in metric and nonmetric space, 2005. http://hal.archives-ouvertes.fr/docs/00/03/04/85/PDF/densitree.pdf, last accessed on Nov 1st 2012.
[30] R. Weber, H. J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the 24th International Conference on Very Large Data Bases, pages 194–205. Morgan Kaufmann, August 1998.
[31] P. N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, SODA '93, pages 311–321, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.
[32] P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2005.
[33] P. Zezula, P. Savino, G. Amato, and F. Rabitti. Approximate similarity retrieval with M-trees. The VLDB Journal, 7(4):275–293, Dec. 1998.
[34] Z. Zhang, B. C. Ooi, S. Parthasarathy, and A. K. H. Tung. Similarity search on Bregman divergence: towards non-metric indexing. Proc. VLDB Endow., 2(1):13–24, Aug. 2009.
-----1
[1] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and K. Weinberger. Supervised semantic indexing. In CIKM09, pages 187–196, 2009.
[2] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[3] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In Proc. of Computer Vision and Pattern Recognition Conference. IEEE Press, 2005.
[4] A. Dennis and D. Ventura. Learning the architecture of sum-product networks using clustering on variables. In Advances in Neural Information Processing Systems 25.
[5] R. Gens and P. Domingos. Discriminative learning of sum-product networks. In NIPS, pages 3248–3256, 2012.
[6] D. Grangier and S. Bengio. A discriminative kernel-based model to rank images from text queries. IEEE transactions on PAMI, 30(8):1371–1384, 2008.
[7] D. Hardoon and J. Shawe-Taylor. Kcca for different level precision in content-based image retrieval. In Proceedings of Third International Workshop on Content-Based Multimedia Indexing, 2003.
[8] K. Jarvelin and J. Kekalainen. IR evaluation methods for retrieving highly relevant documents. In SIGIR, pages 41–48, 2000.
[9] Y. LeCun, L. Bottou, G. Orr, and K. Muller. Efficient backprop. In G. Orr and M. K., editors, Neural Networks: Tricks of the trade. Springer, 1998.
[10] M. Littman, S. Dumais, and T. Landauer. Automatic cross-language information retrieval using latent semantic indexing. In Cross-Language Information Retrieval, chapter 5, pages 51–62, 1998.
[11] A. K. Menon and C. Elkan. Link prediction via matrix factorization. In Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II, ECML PKDD11, pages 437–452, 2011.
[12] M. Minsky and S. Papert. Perceptrons - an introduction to computational geometry. MIT Press, 1987.
[13] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodal deep learning. In International Conference on Machine Learning (ICML), Bellevue, USA, June 2011.
[14] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In Neural Information Processing Systems (NIPS), 2011.
[15] H. Poon and P. Domingos. Sum-product networks: A new deep architecture. In UAI, pages 337–346, 2011.
[16] R. Socher, E. Huang, J. Pennington, A. Ng, and C. Manning. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. In Advances in NIPS 24. 2011.
[17] N. Srivastava and R. Salakhutdinov. Multimodal learning with deep boltzmann machines. In NIPS, pages 2231–2239, 2012.
[18] B. Wang, X. Wang, C. Sun, B. Liu, and L. Sun. Modeling semantic relevance for question-answer pairs in web social communities. In ACL, pages 1230–1238, 2010.
[19] W. Wu, H. Li, and J. Xu. Learning query and document similarities from click-through bipartite graph with metadata. In Proceedings of the sixth ACM international conference on WSDM, pages 687–696, 2013.
[20] W. Wu, Z. Lu, and H. Li. Regularized mapping to latent structures and its application to web search. Technical report.
-----0
Aaron Courville, James Bergstra, and Yoshua Bengio. Unsupervised models of images by spike-and-slab RBMs. In Proceedings of the 28th International Conference on Machine Learning, pages 952–960, 2011.
Maria Angelica Cueto, Jason Morton, and Bernd Sturmfels. Geometry of the Restricted Boltzmann Machine. arXiv:0908.4425v1, 2009.
J. Forster. A linear lower bound on the unbounded error probabilistic communication complexity. J. Comput. Syst. Sci., 65(4):612–625, 2002.
Yoav Freund and David Haussler. Unsupervised learning of distributions on binary vectors using two layer networks, 1994.
A. Hajnal, W. Maass, P. Pudlak, M. Szegedy, and G. Turan. Threshold circuits of bounded depth. J. Comput. System Sci., 46:129–154, 1993.
G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006. ISSN 1095-9203.
Geoffrey Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:2002, 2000.
Geoffrey E. Hinton, Simon Osindero, Max Welling, and Yee Whye Teh. Unsupervised discovery of nonlinear structure using contrastive backpropagation. Cognitive Science, 30(4):725–731, 2006.
Nicolas Le Roux and Yoshua Bengio. Representational power of Restricted Boltzmann Machines and deep belief networks. Neural Computation, 20(6):1631–1649, 2008.
Philip Long and Rocco Servedio. Restricted Boltzmann Machines are hard to approximately evaluate or simulate. In Proceedings of the 27th International Conference on Machine Learning, pages 952–960, 2010.
Wolfgang Maass. Bounds for the computational power and learning complexity of analog neural nets (extended abstract). In Proc. of the 25th ACM Symp. Theory of Computing, pages 335–344, 1992.
Wolfgang Maass, Georg Schnitger, and Eduardo D. Sontag. A comparison of the computational power of sigmoid and boolean threshold circuits. In Theoretical Advances in Neural Computation and Learning, pages 127–151. Kluwer, 1994.
Benjamin M. Marlin, Kevin Swersky, Bo Chen, and Nando de Freitas. Inductive principles for Restricted Boltzmann Machine learning. Journal of Machine Learning Research Proceedings Track, 9:509–516, 2010.
G. Montufar, J. Rauh, and N. Ay. Expressive power and approximation errors of Restricted Boltzmann Machines. In Advances in Neural Information Processing Systems, 2011.
Guido Montufar and Nihat Ay. Refinements of universal approximation results for deep belief networks and Restricted Boltzmann Machines. Neural Comput., 23(5):1306–1319, May 2011.
Saburo Muroga. Threshold logic and its applications. Wiley, 1971.
Ruslan Salakhutdinov and Geoffrey E. Hinton. Deep boltzmann machines. Journal of Machine Learning Research Proceedings Track, 5:448–455, 2009.
Ruslan Salakhutdinov and Iain Murray. On the quantitative analysis of Deep Belief Networks. In Andrew McCallum and Sam Roweis, editors, Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), pages 872–879. Omnipress, 2008.
Yichuan Tang and Ilya Sutskever. Data normalization in the learning of Restricted Boltzmann Machines. Technical Report UTML-TR-11-2, Department of Computer Science, University of Toronto, 2011.
-----1
[1] Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003.
[2] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning, pages 160–167. ACM, 2008.
[3] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In ICML, pages 513–520, 2011.
[4] Michael U Gutmann and Aapo Hyvarinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. The Journal of Machine Learning Research, 13:307–361, 2012.
[5] Tomas Mikolov, Stefan Kombrink, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. Extensions of recurrent neural network language model. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 5528–5531. IEEE, 2011.
[6] Tomas Mikolov, Anoop Deoras, Daniel Povey, Lukas Burget, and Jan Cernocky. Strategies for Training Large Scale Neural Network Language Models. In Proc. Automatic Speech Recognition and Understanding, 2011.
[7] Tomas Mikolov. Statistical Language Models Based on Neural Networks. PhD thesis, Brno University of Technology, 2012.
[8] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.
[9] Tomas Mikolov, Wen-tau Yih and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.
[10] Andriy Mnih and Geoffrey E Hinton. A scalable hierarchical distributed language model. Advances in neural information processing systems, 21:1081–1088, 2009.
[11] Andriy Mnih and Yee Whye Teh. A fast and simple algorithm for training neural probabilistic language models. arXiv preprint arXiv:1206.6426, 2012.
[12] Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In Proceedings of the international workshop on artificial intelligence and statistics, pages 246–252, 2005.
[13] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[14] Holger Schwenk. Continuous space language models. Computer Speech and Language, vol. 21, 2007.
[15] Richard Socher, Cliff C. Lin, Andrew Y. Ng, and Christopher D. Manning. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 26th International Conference on Machine Learning (ICML), volume 2, 2011.
[16] Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Semantic Compositionality Through Recursive Matrix-Vector Spaces. In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.
[17] Joseph Turian, Lev Ratinov, and Yoshua Bengio. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394. Association for Computational Linguistics, 2010.
[18] Peter D. Turney and Patrick Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188, 2010.
[19] Peter D. Turney. Distributional semantics beyond words: Supervised learning of analogy and paraphrase. Transactions of the Association for Computational Linguistics (TACL), pages 353–366, 2013.
[20] Jason Weston, Samy Bengio, and Nicolas Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence, Volume Three, pages 2764–2770. AAAI Press, 2011.
-----0
Bengio, Y. (2009). Learning deep architectures for AI. Now Publishers.
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy layer-wise training of deep networks. In NIPS2006.
Bengio, Y., Courville, A., and Vincent, P. (2012). Representation learning: A review and new perspectives. Technical report, arXiv:1206.5538.
Bengio, Y., Mesnil, G., Dauphin, Y., and Rifai, S. (2013). Better mixing via deep representations. In ICML13.
Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.
Dahl, G., Adams, R., and Larochelle, H. (2012). Training restricted boltzmann machines on word observations. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML '12, pages 679–686, New York, NY, USA. Omnipress.
Dauphin, Y., Glorot, X., and Bengio, Y. (2011). Large-scale learning of embeddings with reconstruction sampling. In ICML11.
Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., and Bengio, S. (2010). Why does unsupervised pre-training help deep learning? JMLR, 11, 625–660.
Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. Technical report, arXiv:1207.0580.
Hyvarinen, A. (2005). Estimation of non-normalized statistical models using score matching. Journal of Machine Learning Research, 6, 695–709.
Hyvarinen, A. (2007). Some extensions of score matching. Computational Statistics and Data Analysis, 51, 2499–2512.
Larochelle, H., Mandel, M. I., Pascanu, R., and Bengio, Y. (2012). Learning algorithms for the classification restricted boltzmann machine. Journal of Machine Learning Research, 13, 643–669.
Lewis, D. D., Yang, Y., Rose, T. G., Li, F., Dietterich, G., and Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361–397.
Marlin, B., Swersky, K., Chen, B., and de Freitas, N. (2010). Inductive principles for restricted Boltzmann machine learning. volume 9, pages 509–516.
Salakhutdinov, R. and Murray, I. (2008). On the quantitative analysis of deep belief networks. In ICML 2008, volume 25, pages 872–879.
Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML2008, pages 1064–1071.
Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics and Stochastic Reports, 65(3), 177–228.
-----0
Alain, G. and Bengio, Y. (2013). What regularized auto-encoders learn from the data generating distribution. In International Conference on Learning Representations (ICLR2013).
Bengio, Y. and Yao, L. (2013). Bounding the test log-likelihood of generative models. Technical report, U. Montreal, arXiv.
Bengio, Y., Larochelle, H., and Vincent, P. (2006a). Non-local manifold Parzen windows. In NIPS05, pages 115–122. MIT Press.
Bengio, Y., Monperrus, M., and Larochelle, H. (2006b). Nonlocal estimation of manifold structure. Neural Computation, 18(10).
Bengio, Y., Thibodeau-Laufer, E., and Yosinski, J. (2013a). Deep generative stochastic networks trainable by backprop. Technical Report arXiv:1306.1091, Universite de Montreal.
Bengio, Y., Courville, A., and Vincent, P. (2013b). Unsupervised feature learning and deep learning: A review and new perspectives. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI).
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy).
Cho, K., Raiko, T., and Ilin, A. (2013). Enhanced gradient for training restricted boltzmann machines. Neural computation, 25(3), 805–831.
Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite, R., and Kadie, C. (2000). Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1, 49–75.
Hinton, G. E. (1999). Products of experts. In ICANN1999.
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
Hyvarinen, A. (2005). Estimation of non-normalized statistical models using score matching. Journal of Machine Learning Research, 6, 695–709.
Kingma, D. and LeCun, Y. (2010). Regularized estimation of image statistics by score matching. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1126–1134.
Ranzato, M., Boureau, Y.-L., and LeCun, Y. (2008). Sparse feature learning for deep belief networks. In NIPS07, pages 1185–1192, Cambridge, MA. MIT Press.
Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio, Y. (2011). Contractive auto-encoders: Explicit invariance during feature extraction. In ICML2011.
Rifai, S., Bengio, Y., Dauphin, Y., and Vincent, P. (2012). A generative process for sampling contractive auto-encoders. In ICML2012.
Swersky, K., Ranzato, M., Buchman, D., Marlin, B., and de Freitas, N. (2011). On autoencoders and score matching for energy based models. In ICML2011. ACM.
Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Computation, 23(7).
-----1
[1] Arnold, L. and Ollivier, Y. (2012). Layer-wise learning of deep generative models. Technical report, arXiv:1212.1524.
[2] Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N., and Bengio, Y. (2012). Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.
[3] Bengio, Y., Thibodeau-Laufer, E., and Yosinski, J. (2013). Deep generative stochastic networks trainable by backprop. Technical Report arXiv:1306.1091, Universite de Montreal.
[4] Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). Oral Presentation.
[5] Blackwell, D. (1947). Conditional Expectation and Unbiased Sequential Estimation. Ann. Math. Statist., 18, 105–110.
[6] Brakel, P., Stroobandt, D., and Schrauwen, B. (2013). Training energy-based models for time-series imputation. Journal of Machine Learning Research, 14, 2771–2797.
[7] Goodfellow, I. J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F., and Bengio, Y. (2013a). Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214.
[8] Goodfellow, I. J., Courville, A., and Bengio, Y. (2013b). Scaling up spike-and-slab models for unsupervised feature learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1902–1914.
[9] Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite, R., and Kadie, C. (2000). Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1, 49–75.
[10] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. Technical report, arXiv:1207.0580.
[11] Kolmogorov, A. (1953). Unbiased Estimates. American Mathematical Society translations. American Mathematical Society.
[12] Le Roux, N. and Bengio, Y. (2008). Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6), 1631–1649.
[13] LeCun, Y., Huang, F.-J., and Bottou, L. (2004). Learning methods for generic object recognition with invariance to pose and lighting. In CVPR2004, pages 97–104.
[14] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
[15] Montavon, G. and Muller, K.-R. (2012). Learning feature hierarchies with centered deep Boltzmann machines. CoRR, abs/1203.4416.
[16] Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11(2), 125–139.
[17] Rao, C. R. (1973). Linear Statistical Inference and its Applications. J. Wiley and Sons, New York, 2nd edition.
[18] Salakhutdinov, R. and Hinton, G. (2009). Deep Boltzmann machines. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), volume 8.
[19] Srebro, N. and Shraibman, A. (2005). Rank, trace-norm and max-norm. In Proceedings of the 18th Annual Conference on Learning Theory, pages 545–560. Springer-Verlag.
[20] Stoyanov, V., Ropson, A., and Eisner, J. (2011). Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure. In AISTATS2011.
[21] Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML2008, pages 1064–1071.
[22] Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics and Stochastic Reports, 65(3), 177–228.
-----1
[1] Y. Bengio. Deep learning of representations: Looking forward. Technical Report arXiv:1305.0445, Universite de Montreal, 2013.
[2] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In IEEE Computer Vision and Pattern Recognition, pages 3642–3649, 2012.
[3] D. Ciresan, U. Meier, and J. Masci. High-performance neural networks for visual object classification. arXiv:1102.0183, 2011.
[4] A. Coates, A. Karpathy, and A. Ng. Emergence of object-selective features in unsupervised feature learning. In Advances in Neural Information Processing Systems, pages 2690–2698, 2012.
[5] A. Coates and A. Y. Ng. Selecting receptive fields in deep networks. In Advances in Neural Information Processing Systems, pages 2528–2536, 2011.
[6] A. Coates, A. Y. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In Artificial Intelligence and Statistics, 2011.
[7] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1232–1240, 2012.
[8] L. Deng, D. Yu, and J. Platt. Scalable stacking and learning for building deep architectures. In International Conference on Acoustics, Speech, and Signal Processing, pages 2133–2136, 2012.
[9] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In International Conference on Machine Learning, 2013.
[10] K. Gregor and Y. LeCun. Emergence of complex-like cells in a temporal product network with local receptive fields. arXiv preprint arXiv:1006.0448, 2010.
[11] C. Gulcehre and Y. Bengio. Knowledge matters: Importance of prior information for optimization. In International Conference on Learning Representations, 2013.
[12] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.
[13] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1106–1114, 2012.
[14] K. Lang and G. Hinton. Dimensionality reduction and prior knowledge in e-set recognition. In Advances in Neural Information Processing Systems, 1990.
[15] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng. ICA with reconstruction cost for efficient overcomplete feature learning. Advances in Neural Information Processing Systems, 24:1017–1025, 2011.
[16] Q. V. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning, 2012.
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[18] Y. LeCun, J. S. Denker, S. Solla, R. E. Howard, and L. D. Jackel. Optimal brain damage. In Advances in Neural Information Processing Systems, pages 598–605, 1990.
[19] K.-F. Lee and H.-W. Hon. Speaker-independent phone recognition using hidden markov models. Acoustics, Speech and Signal Processing, IEEE Transactions on, 37(11):1641–1648, 1989.
[20] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proc. 27th International Conference on Machine Learning, pages 807–814. Omnipress Madison, WI, 2010.
[21] M. Ranzato, A. Krizhevsky, and G. E. Hinton. Factored 3-way restricted Boltzmann machines for modeling natural images. In Artificial Intelligence and Statistics, 2010.
[22] R. Rigamonti, A. Sironi, V. Lepetit, and P. Fua. Learning separable filters. In IEEE Computer Vision and Pattern Recognition, 2013.
[23] R. Rubinstein, M. Zibulevsky, and M. Elad. Double sparsity: learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing, 58:1553–1564, 2010.
[24] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[25] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, New York, NY, USA, 2004.
[26] K. Swersky, M. Ranzato, D. Buchman, B. Marlin, and N. Freitas. On autoencoders and score matching for energy based models. In International Conference on Machine Learning, pages 1201–1208, 2011.
[27] P. Vincent and Y. Bengio. A neural support vector network architecture with adaptive kernels. In International Joint Conference on Neural Networks, pages 187–192, 2000.
-----1
[1] C. M. Bishop. Mixture density networks. Technical Report NCRG/94/004, Aston University, 1994.
[2] R. M. Neal. Connectionist learning of belief networks. volume 56, pages 71–113, July 1992.
[3] R. M. Neal. Learning stochastic feedforward networks. Technical report, University of Toronto, 1990.
[4] Lawrence K. Saul, Tommi Jaakkola, and Michael I. Jordan. Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4:61–76, 1996.
[5] David Barber and Peter Sollich. Gaussian fields for approximate inference in layered sigmoid belief networks. In Sara A. Solla, Todd K. Leen, and Klaus-Robert Muller, editors, NIPS, pages 393–399. The MIT Press, 1999.
[6] G. Taylor, G. E. Hinton, and S. Roweis. Modeling human motion using binary latent variables. In NIPS, 2006.
[7] Carl Edward Rasmussen. Gaussian processes for machine learning. MIT Press, 2006.
[8] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications, volume 104 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 2005.
[9] John Lafferty. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. pages 282–289. Morgan Kaufmann, 2001.
[10] Volodymyr Mnih, Hugo Larochelle, and Geoffrey Hinton. Conditional restricted boltzmann machines for structured output prediction. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 2011.
[11] Yujia Li, Daniel Tarlow, and Richard Zemel. Exploring compositional high order pattern potentials for structured output learning. In Proceedings of International Conference on Computer Vision and Pattern Recognition, 2013.
[12] R. M. Neal and G. E. Hinton. A new view of the EM algorithm that justifies incremental, sparse and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355–368. 1998.
[13] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.
[14] R. M. Neal. Annealed importance sampling. Statistics and Computing, 11:125–139, 2001.
[15] R. Salakhutdinov and I. Murray. On the quantitative analysis of deep belief networks. In Proceedings of the Intl. Conf. on Machine Learning, volume 25, 2008.
[16] J.M. Susskind. The Toronto Face Database. Technical report, 2011. http://aclab.ca/users/josh/TFD.html.
[17] Zoubin Ghahramani and G. E. Hinton. The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, 1996.
[18] Ian Nabney. NETLAB: algorithms for pattern recognitions. Advances in pattern recognition. Springer-Verlag, 2002.
[19] V. Nair and G. E. Hinton. 3-D object recognition with deep belief nets. In NIPS 22, 2009.
[20] J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders. The amsterdam library of object images. International Journal of Computer Vision, 61(1), January 2005.
[21] Eran Borenstein and Shimon Ullman. Class-specific, top-down segmentation. In ECCV, pages 109–124, 2002.
[22] G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal. The wake-sleep algorithm for unsupervised neural networks. Science, 268(5214):1158–1161, 1995.
[23] R. Salakhutdinov and H. Larochelle. Efficient learning of deep boltzmann machines. AISTATS, 2010.
-----1
[1] M. Baroni and A. Lenci. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721, 2010.
[2] E. Bart and S. Ullman. Cross-generalization: learning novel classes from a single example by feature replacement. In CVPR, 2005.
[3] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3, March 2003.
[4] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In ACL, 2007.
[5] E. Bruni, G. Boleda, M. Baroni, and N. Tran. Distributional semantics in technicolor. In ACL, 2012.
[6] A. Coates and A. Ng. The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization. In ICML, 2011.
[7] R. Collobert and J. Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML, 2008.
[8] J. Curran. From Distributional to Semantic Similarity. PhD thesis, University of Edinburgh, 2004.
[9] K. Erk and S. Pado. A structured vector space model for word meaning in context. In EMNLP, 2008.
[10] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. In CVPR, 2009.
[11] Y. Feng and M. Lapata. Visual information in semantic representation. In HLT-NAACL, 2010.
[12] M. Fink. Object classification from a single example utilizing class relevance pseudo-metrics. In NIPS, 2004.
[13] X. Glorot, A. Bordes, and Y. Bengio. Domain adaptation for Large-Scale sentiment classification: A deep learning approach. In ICML, 2011.
[14] D. Hoiem, A. A. Efros, and M. Hebert. Geometric context from a single image. In ICCV, 2005.
[15] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL, 2012.
[16] Yangqing Jia, Chang Huang, and T. Darrell. Beyond spatial pyramids: Receptive field learning for pooled image features. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3370-3377, June 2012.
[17] H. Kriegel, P. Kroger, E. Schubert, and A. Zimek. LoOP: Local Outlier Probabilities. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM '09, 2009.
[18] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Master's thesis, Computer Science Department, University of Toronto, 2009.
[19] L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. TPAMI, 28, 2006.
[20] B. M. Lake, R. Salakhutdinov, J. Gross, and J. B. Tenenbaum. One shot learning of simple visual concepts. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011.
[21] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer. In CVPR, 2009.
[22] T. K. Landauer and S. T. Dumais. A solution to Plato's problem: the Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2):211-240, 1997.
[23] C.W. Leong and R. Mihalcea. Going beyond text: A hybrid image-text approach for measuring word relatedness. In IJCNLP, 2011.
[24] D. Lin. Automatic retrieval and clustering of similar words. In Proceedings of COLING-ACL, pages 768-774, 1998.
[25] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A.Y. Ng. Multimodal deep learning. In ICML, 2011.
[26] S. Pado and M. Lapata. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161-199, 2007.
[27] M. Palatucci, D. Pomerleau, G. Hinton, and T. Mitchell. Zero-shot learning with semantic output codes. In NIPS, 2009.
[28] Guo-Jun Qi, C. Aggarwal, Y. Rui, Q. Tian, S. Chang, and T. Huang. Towards cross-category knowledge propagation for learning visual concepts. In CVPR, 2011.
[29] R. Salakhutdinov, J. Tenenbaum, and A. Torralba. Learning to learn with compound hierarchical-deep models. In NIPS, 2012.
[30] H. Schutze. Automatic word sense discrimination. Computational Linguistics, 24:97-124, 1998.
[31] R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010.
[32] P. D. Turney and P. Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141-188, 2010.
[33] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008.
-----1
[1] G.A. Miller. WordNet: A Lexical Database for English. Communications of the ACM, 1995.
[2] F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web, 2007.
[3] J. Graupmann, R. Schenkel, and G. Weikum. The SphereSearch engine for unified ranked retrieval of heterogeneous XML and web documents. In Proceedings of the 31st international conference on Very large data bases, VLDB, 2005.
[4] V. Ng and C. Cardie. Improving machine learning approaches to coreference resolution. In ACL, 2002.
[5] R. Snow, D. Jurafsky, and A. Y. Ng. Learning syntactic patterns for automatic hypernym discovery. In NIPS, 2005.
[6] A. Fader, S. Soderland, and O. Etzioni. Identifying relations for open information extraction. In EMNLP, 2011.
[7] G. Angeli and C. D. Manning. Philosophers are mortal: Inferring the truth of unseen facts. In CoNLL, 2013.
[8] A. Bordes, J. Weston, R. Collobert, and Y. Bengio. Learning structured embeddings of knowledge bases. In AAAI, 2011.
[9] R. Jenatton, N. Le Roux, A. Bordes, and G. Obozinski. A latent factor model for highly multi-relational data. In NIPS, 2012.
[10] A. Bordes, X. Glorot, J. Weston, and Y. Bengio. Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing. AISTATS, 2012.
[11] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum. Modelling relational data using Bayesian clustered tensor factorization. In NIPS, 2009.
[12] M. Ranzato, A. Krizhevsky, and G. E. Hinton. Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images. AISTATS, 2010.
[13] D. Yu, L. Deng, and F. Seide. Large vocabulary speech recognition using deep tensor neural networks. In INTERSPEECH, 2012.
[14] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, 2013.
[15] A. Yates, M. Banko, M. Broadhead, M. J. Cafarella, O. Etzioni, and S. Soderland. Textrunner: Open information extraction on the web. In HLT-NAACL (Demonstrations), 2007.
[16] M. Nickel, V. Tresp, and H. Kriegel. A three-way model for collective learning on multi-relational data. In ICML, 2011.
[17] A. Bordes, N. Usunier, A. García-Durán, J. Weston, and O. Yakhnenko. Irreflexive and hierarchical relations as translations. CoRR, abs/1304.7158, 2013.
[18] J. Turian, L. Ratinov, and Y. Bengio. Word representations: a simple and general method for semi-supervised learning. In Proceedings of ACL, pages 384-394, 2010.
[19] R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. Semantic Compositionality Through Recursive Matrix-Vector Spaces. In EMNLP, 2012.
[20] R. Collobert and J. Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML, 2008.
[21] R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, and C. D. Manning. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. In NIPS. MIT Press, 2011.
[22] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL, 2012.
[23] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD, 2008.
[24] N. Tandon, G. de Melo, and G. Weikum. Deriving a web-scale commonsense fact database. In AAAI Conference on Artificial Intelligence (AAAI 2011), 2011.
-----1
[1] E. Bart, I. Porteous, P. Perona, and M. Welling. Unsupervised learning of visual taxonomies. In CVPR, pages 1-8, 2008.
[2] Y. Bengio and Y. LeCun. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, 2007.
[3] Hal Daumé III. Bayesian multitask learning with latent hierarchies. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09, pages 135-142, Arlington, Virginia, United States, 2009. AUAI Press.
[4] Theodoros Evgeniou and Massimiliano Pontil. Regularized multi-task learning. In ACM SIGKDD, 2004.
[5] Li Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Trans. Pattern Analysis and Machine Intelligence, 28(4):594-611, April 2006.
[6] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1319-1327, 2013.
[7] M. Guillaumin, J. Verbeek, and C. Schmid. Multimodal semi-supervised learning for image classification. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 902-909, June 2010.
[8] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.
[9] Mark J. Huiskes and Michael S. Lew. The MIR Flickr retrieval evaluation. In MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval, New York, NY, USA, 2008. ACM.
[10] Mark J. Huiskes, Bart Thomee, and Michael S. Lew. New trends and ideas in visual concept detection: the MIR Flickr retrieval evaluation initiative. In Multimedia Information Retrieval, pages 527-536, 2010.
[11] Zhuoliang Kang, Kristen Grauman, and Fei Sha. Learning with whom to share in multi-task feature learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML '11, pages 521-528, New York, NY, USA, June 2011. ACM.
[12] Seyoung Kim and Eric P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, pages 543-550, 2010.
[13] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25. MIT Press, 2012.
[15] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th International Conference on Machine Learning, pages 609-616, 2009.
[16] George A. Miller. WordNet: a lexical database for English. Commun. ACM, 38(11):39-41, November 1995.
[17] R. Salakhutdinov, J. Tenenbaum, and A. Torralba. Learning to learn with compound hierarchical-deep models. In NIPS. MIT Press, 2011.
[18] R. Salakhutdinov, A. Torralba, and J. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011.
[19] R. R. Salakhutdinov and G. E. Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 12, 2009.
[20] Babak Shahbaba and Radford M. Neal. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Analysis, 2(1):221-238, 2007.
[21] Nitish Srivastava and Ruslan Salakhutdinov. Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems 25, pages 2231-2239. MIT Press, 2012.
[22] Jakob Verbeek, Matthieu Guillaumin, Thomas Mensink, and Cordelia Schmid. Image Annotation with TagProp on the MIRFLICKR set. In 11th ACM International Conference on Multimedia Information Retrieval (MIR '10), pages 537-546. ACM Press, 2010.
[23] Ya Xue, Xuejun Liao, Lawrence Carin, and Balaji Krishnapuram. Multi-task learning for classification with Dirichlet process priors. J. Mach. Learn. Res., 8:35-63, May 2007.
[24] Matthew D. Zeiler and Rob Fergus. Stochastic pooling for regularization of deep convolutional neural networks. CoRR, abs/1301.3557, 2013.
[25] Alon Zweig and Daphna Weinshall. Hierarchical regularization cascade for joint learning. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 37-45, May 2013.
-----1
[1] G. R. Arce. Nonlinear signal processing: A statistical approach. Wiley-Interscience, 2005.
[2] E. Arias-Castro and D. L. Donoho. Does median filtering truly preserve edges better than linear filtering? The Annals of Statistics, 37(3):1172-1206, 2009.
[3] D. I. Barnea and H. F. Silverman. A class of algorithms for fast digital image registration. IEEE Transactions on Computers, 100(2):179-186, 1972.
[4] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
[5] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, 2007.
[6] R. Bourne. Image filters. In Fundamentals of Digital Imaging in Medicine, pages 137-172. Springer London, 2010.
[7] R. G. Brown and P. Y. Hwang. Introduction to random signals and applied Kalman filtering, volume 1. John Wiley & Sons New York, 1992.
[8] A. Buades, B. Coll, and J.-M. Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation, 4(2):490-530, 2005.
[9] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In CVPR, 2012.
[10] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In CVPR, 2012.
[11] K. Dabov, R. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080-2095, 2007.
[12] L. W. Goldman. Principles of CT: Radiation dose and image quality. Journal of Nuclear Medicine Technology, 35(4):213-225, 2007.
[13] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, 2006.
[14] G. E. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[15] W. Huda. Dose and image quality in CT. Pediatric Radiology, 32(10):709-713, 2002.
[16] N. C. Institute. The Cancer Imaging Archive. http://www.cancerimagingarchive.net, 2013.
[17] V. Jain and H. S. Seung. Natural image denoising with convolutional networks. In NIPS, 2008.
[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[19] F. Luisier, T. Blu, and M. Unser. A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding. IEEE Transactions on Image Processing, 16(3):593-606, 2007.
[20] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In ICCV, 2009.
[21] M. C. Motwani, M. C. Gadiya, R. C. Motwani, and F. C. Harris. Survey of image denoising techniques. In GSPX, 2004.
[22] J. Park and I. W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3(2):246-257, 1991.
[23] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Transactions on Image Processing, 12(11):1338-1351, 2003.
[24] M. G. Rathor, M. A. Kaushik, and M. V. Gupta. Medical images denoising techniques review. International Journal of Electronics Communication and Microelectronics Designing, 1(1):33-36, 2012.
[25] R. Siemund, A. Love, D. van Westen, L. Stenberg, C. Petersen, and I. Bjorkman-Burtscher. Radiation dose reduction in CT of the brain: Can advanced noise filtering compensate for loss of image quality? Acta Radiologica, 53(4):468-472, 2012.
[26] Y. Tang and C. Eliasmith. Deep networks for robust visual recognition. In ICML, 2010.
[27] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 11:3371-3408, 2010.
[28] N. Wiener. Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications. Technology Press of the Massachusetts Institute of Technology, 1950.
[29] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In NIPS, 2012.
-----1
[1] G. E. Hinton, S. Osindero, and Y.-W. Teh, A fast learning algorithm for deep belief networks, Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[2] Y. LeCun, Une procédure d'apprentissage pour réseau à seuil asymmetrique (A learning scheme for asymmetric threshold networks), in Cognitiva 85, 1985.
[3] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol. 323, pp. 533-536, October 1986.
[4] Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1-127, 2009.
[5] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, in NIPS, 2006.
[6] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, Efficient learning of sparse representations with an energy-based model, in NIPS, 2006.
[7] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in ICML, 2008.
[8] P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, in Parallel Distributed Processing: Volume 1: Foundations, ch. 6, pp. 194-281, MIT Press, 1986.
[9] G. E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation, vol. 14, no. 8, pp. 1771-1800, 2002.
[10] H. Lee, C. Ekanadham, and A. Ng, Sparse deep belief net model for visual area V2, in NIPS, 2008.
[11] H. Goh, N. Thome, and M. Cord, Biasing restricted Boltzmann machines to manipulate latent selectivity and sparsity, in NIPS Workshop, 2010.
[12] H. Larochelle and Y. Bengio, Classification using discriminative restricted Boltzmann machines, in ICML, 2008.
[13] N. Le Roux and Y. Bengio, Representational power of restricted Boltzmann machines and deep belief networks, Neural Computation, vol. 20, pp. 1631-1649, June 2008.
[14] I. Sutskever and G. E. Hinton, Learning multilevel distributed representations for high-dimensional sequences, in AISTATS, 2007.
[15] G. E. Hinton and R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, vol. 28, pp. 504-507, 2006.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol. 86, pp. 2278-2324, November 1998.
[17] L. Fei-Fei, R. Fergus, and P. Perona, Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories, CVPR Workshop, 2004.
[18] R. Salakhutdinov and G. E. Hinton, Learning a nonlinear embedding by preserving class neighbourhood structure, in AISTATS, 2007.
[19] L. Deng and D. Yu, Deep convex net: A scalable architecture for speech pattern classification, in Interspeech, 2011.
[20] D. C. Ciresan, U. Meier, and J. Schmidhuber, Multi-column deep neural networks for image classification, in CVPR, 2012.
[21] D. Lowe, Object recognition from local scale-invariant features, in CVPR, 1999.
[22] Y. Boureau, F. Bach, Y. LeCun, and J. Ponce, Learning mid-level features for recognition, in CVPR, 2010.
[23] H. Goh, N. Thome, M. Cord, and J.-H. Lim, Unsupervised and supervised visual codes with restricted Boltzmann machines, in ECCV, 2012.
[24] S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in CVPR, 2006.
[25] S. Avila, N. Thome, M. Cord, E. Valle, and A. Araujo, Pooling in image representation: the visual codeword point of view, Computer Vision and Image Understanding, pp. 453-465, May 2013.
[26] C. Theriault, N. Thome, and M. Cord, Extended coding and pooling in the HMAX model, IEEE Transactions on Image Processing, 2013.
[27] K. Sohn, D. Y. Jung, H. Lee, and A. Hero III, Efficient learning of sparse, distributed, convolutional feature representations for object recognition, in ICCV, 2011.
[28] K. Sohn, G. Zhou, C. Lee, and H. Lee, Learning and selecting features jointly with point-wise gated boltzmann machines, in ICML, 2013.
-----1
[1] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. Advances in neural information processing systems, 19:153, 2007.
[2] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
[3] C.M. Bishop. Training with noise is equivalent to Tikhonov regularization. Neural computation, 7(1):108-116, 1995.
[4] A. Coates and A.Y. Ng. The importance of encoding versus training with sparse coding and vector quantization. In International Conference on Machine Learning, volume 8, page 10, 2011.
[5] Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In International Conference on Machine Learning, 2010.
[6] G.E. Hinton, S. Osindero, and Y.W. Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527-1554, 2006.
[7] G.E. Hinton and R.R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[8] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
[9] Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun. What is the best multi-stage architecture for object recognition? In Computer Vision, 2009 IEEE 12th International Conference on, pages 2146-2153. IEEE, 2009.
[10] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012.
[11] Yann LeCun, Fu Jie Huang, and Leon Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 2, pages II-97. IEEE, 2004.
[12] V. Nair and G. Hinton. 3D object recognition with deep belief nets. Advances in Neural Information Processing Systems, 22:1339-1347, 2009.
[13] V. Nair and G.E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proc. 27th International Conference on Machine Learning, pages 807-814. Omnipress Madison, WI, 2010.
[14] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the Twenty-eight International Conference on Machine Learning (ICML'11), 2011.
[15] Ruslan Salakhutdinov and Hugo Larochelle. Efficient learning of deep Boltzmann machines. In International Conference on Artificial Intelligence and Statistics. Citeseer, 2010.
[16] J. Sietsma and R.J.F. Dow. Creating artificial neural networks that generalize. Neural Networks, 4(1):67-79, 1991.
[17] Jasper Snoek, Ryan P Adams, and Hugo Larochelle. Nonparametric guidance of autoencoder representations using label information. Journal of Machine Learning Research, 13:2567-2588, 2012.
[18] Jasper Snoek, Hugo Larochelle, and Ryan Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25, pages 2960-2968, 2012.
[19] Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning.
[20] Tijmen Tieleman. Gnumpy: an easy way to use gpu boards in python. Department of Computer Science, University of Toronto, 2010.
[21] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 11:3371-3408, 2010.
-----0
Arora, Raman, Cotter, Andrew, Livescu, Karen, and Srebro, Nathan. Stochastic optimization for PCA and PLS. In 50th Annual Allerton Conference on Communication, Control, and Computing, 2012.
Balzano, Laura, Nowak, Robert, and Recht, Benjamin. Online identification and tracking of subspaces from highly incomplete information. In 48th Annual Allerton Conference on Communication, Control, and Computing, 2010.
Beck, A. and Teboulle, M. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167-175, 2003.
Bottou, Leon and Bousquet, Olivier. The tradeoffs of large scale learning. In NIPS'07, pp. 161-168, 2007.
Boyd, Stephen and Vandenberghe, Lieven. Convex Optimization. Cambridge University Press, 2004.
Brand, Matthew. Incremental singular value decomposition of uncertain data with missing values. In ECCV, 2002.
Collins, Michael, Globerson, Amir, Koo, Terry, Carreras, Xavier, and Bartlett, Peter L. Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. J. Mach. Learn. Res., 9:1775-1822, June 2008.
Duchi, John, Shalev-Shwartz, Shai, Singer, Yoram, and Chandra, Tushar. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp. 272-279, New York, NY, USA, 2008. ACM.
Nemirovski, Arkadi and Yudin, David. Problem complexity and method efficiency in optimization. John Wiley & Sons Ltd, 1983.
Nemirovski, Arkadi, Juditsky, Anatoli, Lan, Guanghui, and Shapiro, Alexander. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574-1609, January 2009.
Oja, Erkki and Karhunen, Juha. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications, 106:69-84, 1985.
Sanger, Terence D. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 12:459-473, 1989.
Shalev-Shwartz, Shai and Srebro, Nathan. SVM optimization: Inverse dependence on training set size. In ICML'08, pp. 928-935, 2008.
Shalev-Shwartz, Shai and Tewari, Ambuj. Stochastic methods for l1 regularized loss minimization. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML'09, pp. 929-936, New York, NY, USA, 2009. ACM.
Shalev-Shwartz, Shai, Singer, Yoram, and Srebro, Nathan. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In ICML'07, pp. 807-814, 2007.
Srebro, N., Sridharan, K., and Tewari, A. On the universality of online mirror descent. Advances in Neural Information Processing Systems, 24, 2011.
Warmuth, Manfred K. and Kuzmin, Dima. Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension. Journal of Machine Learning Research (JMLR), 9:2287-2320, 2008.
-----1
[1] Spall, J. Introduction to stochastic search and optimization: Estimation, simulation, and control. John Wiley and Sons, 2003.
[2] Bottou, L. Stochastic learning. In O. Bousquet, U. von Luxburg, eds., Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence, LNAI 3176, pages 146-168. Springer Verlag, Berlin, 2004.
[3] Ross, S. M. Simulation. Elsevier, fourth edn., 2006.
[4] Nemirovski, A., A. Juditsky, G. Lan, et al. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574-1609, 2009.
[5] Paisley, J., D. Blei, M. Jordan. Variational Bayesian inference with stochastic search. In International Conference on Machine Learning. 2012.
[6] Lan, G. An optimal method for stochastic composite optimization. Mathematical Programming, 133:365-397, 2012.
[7] Chen, X., Q. Lin, J. Pena. Optimal regularized dual averaging methods for stochastic optimization. In Advances in Neural Information Processing Systems (NIPS). 2012.
[8] Boyd, S., L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[9] Schaul, T., S. Zhang, Y. LeCun. No More Pesky Learning Rates. ArXiv e-prints, 2012.
[10] Ranganath, R., C. Wang, D. M. Blei, et al. An adaptive learning rate for stochastic variational inference. In International Conference on Machine Learning. 2013.
[11] Hoffman, M., D. Blei, F. Bach. Online inference for latent Dirichlet allocation. In Neural Information Processing Systems. 2010.
[12] Teh, Y., M. Jordan, M. Beal, et al. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2007.
[13] Wang, C., J. Paisley, D. Blei. Online variational inference for the hierarchical Dirichlet process. In International Conference on Artificial Intelligence and Statistics. 2011.
[14] Seung, D., L. Lee. Algorithms for non-negative matrix factorization. In Neural Information Processing Systems. 2001.
[15] Bishop, C. Pattern Recognition and Machine Learning. Springer New York, 2006.
[16] Jaakkola, T., M. Jordan. A variational approach to Bayesian logistic regression models and their extensions. In International Workshop on Artificial Intelligence and Statistics. 1996.
[17] Blei, D., A. Ng, M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[18] Blei, D., J. Lafferty. Topic models. In A. Srivastava, M. Sahami, eds., Text Mining: Theory and Applications. Taylor and Francis, 2009.
[19] Jordan, M., Z. Ghahramani, T. Jaakkola, et al. Introduction to variational methods for graphical models. Machine Learning, 37:183-233, 1999.
[20] Amari, S. Natural gradient works efficiently in learning. Neural computation, 10(2):251-276, 1998.
[21] Asuncion, A., M. Welling, P. Smyth, et al. On smoothing and inference for topic models. In Uncertainty in Artificial Intelligence. 2009.
[22] Hoffman, M., D. Blei, C. Wang, et al. Stochastic Variational Inference. Journal of Machine Learning Research, 2013.
-----0
Arora, R., Cotter, A., Livescu, K., and Srebro, N. Stochastic optimization for PCA and PLS. In 50th Allerton 
Conference on Communication, Control, and Computing, Monticello, IL, 2012.Balzano, L., Nowak, R., and Recht, B. Online identification and tracking of subspaces from highly incomplete information. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pp. 704711, 2010.
Brand, M. Fast low-rank modifications of the thin singular value decomposition. Linear algebra and its applications, 415(1):2030, 2006.Brand, Matthew. Incremental singular value decomposition of uncertain data with missing values. Computer VisionECCV 2002, pp. 707720, 2002.
Clarkson, Kenneth L. and Woodruff, David P. Numerical linear algebra in the streaming model. In Proceedings of the 41st annual ACM symposium on Theory of computing, pp. 205214, 2009.
Comon, P. and Golub, G. H. Tracking a few extreme singular values and vectors in signal processing. Proceedings of the IEEE, 78(8):13271343, 1990.
Golub, Gene H. and Van Loan, Charles F. Matrix computations, volume 3. JHUP, 2012.Halko, Nathan, Martinsson, Per-Gunnar, and Tropp, Joel A. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217288, 2011.
He, J., Balzano, L., and Lui, J. Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827, 2011.Herbster, Mark and Warmuth, Manfred K. Tracking the best linear predictor. The Journal of Machine Learning Research, 1:281309, 2001.
Johnstone, Iain M. On the distribution of the largest eigenvalue in principal components analysis.(english. Ann.Statist, 29(2):295327, 2001.
Li, Y. On incremental and robust subspace learning. Pattern recognition, 37(7):15091518, 2004.Mitliagkas, Ioannis, Caramanis, Constantine, and Jain, Prateek. Memory limited, streaming PCA. arXiv preprint arXiv:1307.0032, 2013.Nadler, Boaz. Finite sample approximation results for principal component analysis: a matrix perturbation approach. The Annals of Statistics, pp. 27912817, 2008.
Porteous, Ian, Newman, David, Ihler, Alexander, Asuncion, Arthur, Smyth, Padhraic, and Welling, Max. Fast collapsed gibbs sampling for latent dirichlet allocation. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 569-577, 2008.
Robbins, Herbert and Monro, Sutton. A stochastic approximation method. The Annals of Mathematical Statistics, pp. 400-407, 1951.
Roweis, Sam. EM algorithms for PCA and SPCA. Advances in neural information processing systems, pp. 626-632, 1998.
Rudelson, Mark and Vershynin, Roman. Smallest singular value of a random rectangular matrix. Communications on Pure and Applied Mathematics, 62(12):1707-1739, 2009.
Tipping, Michael E. and Bishop, Christopher M. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611-622, 1999.
Vershynin, R. How close is the sample covariance matrix to the actual covariance matrix? Journal of Theoretical Probability, pp. 1-32, 2010a.
Vershynin, Roman. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010b.
Warmuth, Manfred K. and Kuzmin, Dima. Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension. Journal of Machine Learning Research, 9:2287-2320, 2008.
-----1
[AHK05] Sanjeev Arora, Elad Hazan, and Satyen Kale. Fast algorithms for approximate semidefinite programming using the multiplicative weights update method. In Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on, pages 339-348. IEEE, 2005.
[AHK06] Sanjeev Arora, Elad Hazan, and Satyen Kale. A fast random sampling algorithm for sparsifying matrices. In Proceedings of the 9th international conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th international conference on Randomization and Computation, APPROX'06/RANDOM'06, pages 272-279, Berlin, Heidelberg, 2006. Springer-Verlag.
[AKL13] Dimitris Achlioptas, Zohar Karnin, and Edo Liberty. Near-optimal entrywise sampling for data matrices. arXiv preprint, 2013.
[AKV02] Noga Alon, Michael Krivelevich, and Van H. Vu. On the concentration of eigenvalues of random symmetric matrices. Israel Journal of Mathematics, 131:259-267, 2002.
[AM01] Dimitris Achlioptas and Frank McSherry. Fast computation of low rank matrix approximations. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 611-618. ACM, 2001.
[AM07] Dimitris Achlioptas and Frank McSherry. Fast computation of low-rank matrix approximations. J. ACM, 54(2), April 2007.
[AW02] Rudolf Ahlswede and Andreas Winter. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3):569-579, 2002.
[CR09] Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[CT10] Emmanuel J Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. Information Theory, IEEE Transactions on, 56(5):2053-2080, 2010.
[dA08] Alexandre d'Aspremont. Subsampling algorithms for semidefinite programming. arXiv preprint arXiv:0803.1990, 2008.
[DKM06] Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast monte carlo algorithms for matrices; approximating matrix multiplication. SIAM J. Comput., 36(1):132-157, July 2006.
[DZ11] Petros Drineas and Anastasios Zouzias. A note on element-wise matrix sparsification via a matrix-valued bernstein inequality. Inf. Process. Lett., 111(8):385-389, 2011.
[FK81] Z. Füredi and J. Komlós. The eigenvalues of random symmetric matrices. Combinatorica, 1(3):233-241, 1981.
[GT09] Alex Gittens and Joel A Tropp. Error bounds for random matrix approximation schemes. arXiv preprint arXiv:0911.4108, 2009.
[Juh81] F. Juhász. On the spectrum of a random graph. In Algebraic methods in graph theory, Vol. I, II (Szeged, 1978), volume 25 of Colloq. Math. Soc. János Bolyai, pages 313-316. North-Holland, Amsterdam, 1981.
[NDT09] NH Nguyen, Petros Drineas, and TD Tran. Matrix sparsification via the khintchine inequality, 2009.
[NDT10] Nam H Nguyen, Petros Drineas, and Trac D Tran. Tensor sparsification via a bound on the spectral norm of random tensors. arXiv preprint arXiv:1005.4732, 2010.
[PCI+07] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[Rec11] Benjamin Recht. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413-3430, December 2011.
[RV07] Mark Rudelson and Roman Vershynin. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4), July 2007.
[Sty11] Will Styler. The enronsent corpus. In Technical Report 01-2011, University of Colorado at Boulder Institute of Cognitive Science, Boulder, CO., 2011.
[Tro12a] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.
[Tro12b] Joel A Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.
[Wig58] Eugene P. Wigner. On the distribution of the roots of certain symmetric matrices. Annals of Mathematics, 67(2):325-327, 1958.
-----1
[1] O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. JMLR, 9:2261-2286, 2008.
[2] S. Boyd, E. Chu, N. Parikh, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 2011.
[3] T. Cai, W. Liu, and X. Luo. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106:594-607, 2011.
[4] T. Cai, W. Liu, and H. Zhou. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Preprint, 2012.
[5] J. Choi. A new parallel matrix multiplication algorithm on distributed-memory concurrent computers. In High Performance Computing on the Information Superhighway, 1997.
[6] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, 2012.
[7] J. Dean and S. Ghemawat. Map-Reduce: simplified data processing on large clusters. In CACM, 2008.
[8] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9:432-441, 2008.
[9] Q. Fu, H. Wang, and A. Banerjee. Bethe-ADMM for tree decomposition based parallel MAP inference. In UAI, 2013.
[10] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, and C. D. Bloomfield. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, pages 531-537, 1999.
[11] K. Goto and R. Van De Geijn. High-performance implementation of the level-3 BLAS. ACM Transactions on Mathematical Software, 35:1-14, 2008.
[12] B. He and X. Yuan. On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Preprint, 2012.
[13] C. Hsieh, I. Dhillon, P. Ravikumar, and A. Banerjee. A divide-and-conquer method for sparse inverse covariance estimation. In NIPS, 2012.
[14] C. Hsieh, M. Sustik, I. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. In NIPS, 2011.
[15] L. Blackford, J. Choi, A. Cleary, J. Demmel, I. S. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users' Guide. SIAM, 1997.
[16] M. Lam, E. Rothberg, and M. Wolf. The cache performance and optimization of blocked algorithms. In Architectural Support for Programming Languages and Operating Systems, 1991.
[17] L. Li and K.-C. Toh. An inexact interior point method for L1-regularized sparse covariance selection. Mathematical Programming Computation, 2:291-315, 2010.
[18] X. Li, T. Zhao, X. Yuan, and H. Liu. An R package flare for high dimensional linear regression and precision matrix estimation. http://cran.r-project.org/web/packages/flare, 2013.
[19] H. Liu and L. Wang. Tiger: A tuning-insensitive approach for optimally estimating Gaussian graphical models. Preprint, 2012.
[20] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. Distributed graphlab: A framework for machine learning in the cloud. In VLDB, 2012.
[21] R. Mazumder and T. Hastie. Exact covariance thresholding into connected components for large-scale graphical lasso. JMLR, 13:723-736, 2012.
[22] F. Niu, B. Recht, C. Ré, and S. J. Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011.
[23] N. Parikh and S. Boyd. Graph projection block splitting for distributed optimization. Preprint, 2012.
[24] R. Raina, A. Madhavan, and A. Y. Ng. Large-scale deep unsupervised learning using graphics processors. In ICML, 2009.
[25] P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935-980, 2011.
[26] H. Wang and A. Banerjee. Online alternating direction method. In ICML, 2012.
[27] J. Yang and Y. Zhang. Alternating direction algorithms for L1-problems in compressive sensing. ArXiv, 2009.
[28] M. Yuan. Sparse inverse covariance matrix estimation via linear programming. JMLR, 11, 2010.
[29] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. In NIPS, 2010.
-----1
[1] J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs sampling: From colored fields to thin junction trees. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 324-332, 2011.
[2] Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proceedings of the 38th International Conference on Very Large Data Bases (VLDB), Istanbul, 2012.
[3] Benjamin Recht, Christopher Ré, Stephen J. Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems (NIPS) 24, pages 693-701, Granada, 2011.
[4] Amr Ahmed, Mohamed Aly, Joseph Gonzalez, Shravan Narayanamurthy, and Alexander J. Smola. Scalable inference in latent variable models. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM), 2012.
[5] Hsiang-Tsung Kung and John T Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213-226, 1981.
[6] Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of 29th International Conference on Machine Learning (ICML), Edinburgh, 2012.
[7] Tamara Broderick, Brian Kulis, and Michael I. Jordan. MAD-bayes: MAP-based asymptotic derivations from Bayes. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.
[8] Leslie G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103-111, 1990.
[9] Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 2010.
[10] A. Meyerson. Online facility location. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS), Las Vegas, 2001.
[11] A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515-528, 2003.
[12] N. Ailon, R. Jaiswal, and C. Monteleoni. Streaming k-means approximation. In Advances in Neural Information Processing Systems (NIPS) 22, Vancouver, 2009.
[13] John Paisley, David Blei, and Michael I Jordan. Stick-breaking Beta processes and the Poisson process. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
[14] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for Latent Dirichlet Allocation. In Advances in Neural Information Processing Systems (NIPS) 20, Vancouver, 2007.
[15] D. Lovell, J. Malmaud, R. P. Adams, and V. K. Mansinghka. ClusterCluster: Parallel Markov chain Monte Carlo for Dirichlet process mixtures. ArXiv e-prints, April 2013.
[16] F. Doshi-Velez, D. Knowles, S. Mohamed, and Z. Ghahramani. Large scale nonparametric Bayesian inference: Data parallelisation in the Indian Buffet process. In Advances in Neural Information Processing Systems (NIPS) 22, Vancouver, 2009.
[17] Tianbing Xu and Alexander Ihler. Multicore Gibbs sampling in dense, unstructured graphs. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[18] I. Dhillon and D.S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Workshop on Large-Scale Parallel KDD Systems, 2000.
[19] A. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: Scalable online collaborative filtering. In Proceedings of the 16th World Wide Web Conference, Banff, 2007.
[20] A. Ene, S. Im, and B. Moseley. Fast clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Diego, 2011.
[21] M. Shindler, A. Wong, and A. Meyerson. Fast and accurate k-means for large datasets. In Advances in Neural Information Processing Systems (NIPS) 24, Granada, 2011.
[22] Moses Charikar, Liadan O'Callaghan, and Rina Panigrahy. Better streaming algorithms for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC), 2003.
[23] Mihai Bădoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002.
[24] D. Feldman, A. Krause, and M. Faulkner. Scalable training of mixture models via coresets. In Advances in Neural Information Processing Systems (NIPS) 24, Granada, 2011.
[25] B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. Scalable k-means++. In Proceedings of the 38th International Conference on Very Large Data Bases (VLDB), Istanbul, 2012.
-----1
[1] Delbert Dueck and Brendan J. Frey. Non-metric affinity propagation for unsupervised image categorization. In ICCV, 2007.
[2] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). 2006.
[3] Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In North American chapter of the Assoc. for Comp. Linguistics/Human Lang. Tech., 2011.
[4] Ryan Gomes and Andreas Krause. Budgeted nonparametric learning from data streams. In Proc. International Conference on Machine Learning (ICML), 2010.
[5] Andreas Krause and Daniel Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press, 2013.
[6] David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD, 2003.
[7] Andreas Krause and Carlos Guestrin. Submodularity and its applications in optimized information gathering. ACM Transactions on Intelligent Systems and Technology, 2011.
[8] Andrew Guillory and Jeff Bilmes. Active semi-supervised learning using submodular functions. In Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, July 2011. AUAI.
[9] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 2011.
[10] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 1978.
[11] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Math. Oper. Research, 1978.
[12] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 1998.
[13] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, 2004.
[14] Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, and Andrew Y. Ng. Map-reduce for machine learning on multicore. In NIPS, 2006.
[15] Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox. Mapreduce for data intensive scientific analyses. In Proc. of the 4th IEEE Inter. Conf. on eScience.
[16] Daniel Golovin, Matthew Faulkner, and Andreas Krause. Online distributed sensor selection. In IPSN, 2010.
[17] Graham Cormode, Howard Karloff, and Anthony Wirth. Set cover algorithms for very large datasets. In Proc. of the 19th ACM intern. conf. on Inf. knowl. manag.
[18] Flavio Chierichetti, Ravi Kumar, and Andrew Tomkins. Max-cover in map-reduce. In Proceedings of the 19th international conference on World wide web, 2010.
[19] Guy E. Blelloch, Richard Peng, and Kanat Tangwongsan. Linear-work greedy parallel approximate set cover and variants. In SPAA, 2011.
[20] Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In SPAA, 2011.
[21] A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proc. of Uncertainty in Artificial Intelligence (UAI), 2005.
[22] M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, LNCS, pages 234-243, 1978.
[23] Leonard Kaufman and Peter J Rousseeuw. Finding groups in data: an introduction to cluster analysis, volume 344. Wiley-Interscience, 2009.
[24] Antonio Torralba, Rob Fergus, and William T Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 2008.
[25] Athanasios Tsanas, Max A Little, Patrick E McSharry, and Lorraine O Ramig. Enhanced classical dysphonia measures and sparse regression for telemonitoring of Parkinson's disease progression. In IEEE Int. Conf. Acoust. Speech Signal Process., 2010.
[26] Yahoo! academic relations. r6a, yahoo! front page today module user click log dataset, version 1.0, 2012.
-----1
[1] Z. Zhang, A. Ganesh, X. Liang, and Y. Ma, TILT: Transform-Invariant Low-rank Textures, International Journal of Computer Vision, 99(1): 1-24, 2012.
[2] G. Huang, V. Jain, and E. Learned-Miller, Unsupervised joint alignment of complex images, International Conference on Computer Vision, pp. 1-8, 2007.
[3] E. Learned-Miller, Data Driven Image Models Through Continuous Joint Alignment, IEEE Trans. on Pattern Analysis and Machine Intelligence, 28(2):236-250, 2006.
[4] M. Cox, S. Lucey, S. Sridharan, and J. Cohn, Least Squares Congealing for Unsupervised Alignment of Images, International Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[5] A. Vedaldi, G. Guidi, and S. Soatto, Joint Alignment Up to (Lossy) Transformations, International Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[6] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, RASL: Robust Alignment by Sparse and Low-rank Decomposition for Linearly Correlated Images, IEEE Trans. on Pattern Analysis and Machine Intelligence, 34(11): 2233-2246, 2012.
[7] J. Kruskal, Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra and its Applications, 18(2): 95-138, 1977.
[8] T. Kolda and B. Bader, Tensor decompositions and applications, SIAM Review, 51(3): 455-500, 2009.
[9] E. Candès, X. Li, Y. Ma, and J. Wright, Robust principal component analysis?, Journal of the ACM, 2011.
[10] S. Gandy, B. Recht, and I. Yamada, Tensor Completion and Low-N-Rank Tensor Recovery via Convex Optimization, Inverse Problems, 2011.
[11] M. Signoretto, L. Lathauwer, and J. Suykens, Nuclear Norms for Tensors and Their Use for Convex Multilinear Estimation, Linear Algebra and Its Applications, 2010.
[12] J. Liu, P. Musialski, P. Wonka, and J. Ye, Tensor Completion for Estimating Missing Values in Visual Data, IEEE Trans. on Pattern Analysis and Machine Intelligence, 35(1): 208-220, 2013.
[13] R. Tomioka, K. Hayashi, and H. Kashima, Estimation of low-rank tensors via convex optimization, Technical report, arXiv:1010.0789, 2011.
[14] Y. Li, J. Yan, Y. Zhou, and J. Yang, Optimum Subspace Learning and Error Correction for Tensors, European Conference on Computer Vision, pp. 790-803, 2010.
[15] Z. Lin, M. Chen, L. Wu, and Y. Ma, The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices, Technical Report UILU-ENG-09-2215, UIUC Technical Report, 2009.
[16] J. Yang and X. Yuan, Linearized augmented lagrangian and alternating direction methods for nuclear norm minimization, Mathematics of Computation, 82(281): 301-329, 2013.
-----1
[1] J. Abrahams and A. Leslie. Methods used in the structure determination of bovine mitochondrial F1 ATPase. Acta Crystallographica Section D: Biological Crystallography, 52(1):30-42, 1996.
[2] H. H. Bauschke, P. L. Combettes, and D. R. Luke. Hybrid projection-reflection method for phase retrieval. JOSA A, 20(6):1025-1034, 2003.
[3] L. Bregman. Finding the common point of convex sets by the method of successive projection. (Russian). In Dokl. Akad. Nauk SSSR, volume 162, pages 487-490, 1965.
[4] E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski. Phase retrieval via matrix completion. SIAM Journal on Imaging Sciences, 6(1):199-225, 2013.
[5] E. J. Candès and X. Li. Solving quadratic equations via phaselift when there are about as many equations as unknowns. arXiv preprint arXiv:1208.6247, 2012.
[6] E. J. Candès, T. Strohmer, and V. Voroninski. Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 2012.
[7] A. Chai, M. Moscoso, and G. Papanicolaou. Array imaging using intensity-only measurements. Inverse Problems, 27(1):015005, 2011.
[8] J. C. Dainty and J. R. Fienup. Phase retrieval and image reconstruction for astronomy. Image Recovery: Theory and Application, ed. by H. Stark, Academic Press, San Diego, pages 231-275, 1987.
[9] H. Duadi, O. Margalit, V. Mico, J. A. Rodrigo, T. Alieva, J. Garcia, and Z. Zalevsky. Digital holography and phase retrieval. Source: Holography, Research and Technologies. InTech, 2011.
[10] V. Elser. Phase retrieval by iterated projections. JOSA A, 20(1):40-55, 2003.
[11] J. R. Fienup et al. Phase retrieval algorithms: a comparison. Applied optics, 21(15):2758-2769, 1982.
[12] D. Gabor. A new microscopic principle. Nature, 161(4098):777-778, 1948.
[13] R. W. Gerchberg and W. O. Saxton. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik, 35:237, 1972.
[14] N. E. Hurt. Phase Retrieval and Zero Crossings: Mathematical Methods in Image Reconstruction, volume 52. Kluwer Academic Print on Demand, 2001.
[15] P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. arXiv preprint arXiv:1212.0467, 2012.
[16] R. H. Keshavan. Efficient algorithms for collaborative filtering. Phd Thesis, Stanford University, 2012.
[17] W. V. Li and A. Wei. Gaussian integrals involving absolute value functions. In Proceedings of the Conference in Luminy, 2009.
[18] X. Li and V. Voroninski. Sparse signal recovery from quadratic measurements via convex programming. arXiv preprint arXiv:1209.4785, 2012.
[19] S. Marchesini. Invited article: A unified evaluation of iterative projection algorithms for phase retrieval. Review of Scientific Instruments, 78(1):011301-011301, 2007.
[20] J. Miao, P. Charalambous, J. Kirz, and D. Sayre. Extending the methodology of x-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens. Nature, 400(6742):342-344, 1999.
[21] D. Misell. A method for the solution of the phase problem in electron microscopy. Journal of Physics D: Applied Physics, 6(1):L6, 1973.
[22] H. Ohlsson, A. Y. Yang, R. Dong, and S. S. Sastry. Compressive phase retrieval from squared output measurements via semidefinite programming. arXiv preprint arXiv:1111.6323, 2011.
[23] S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi. Simultaneously structured models with application to sparse and low-rank matrices. arXiv preprint arXiv:1212.3753, 2012.
[24] J. L. Sanz. Mathematical considerations for the problem of fourier transform phase retrieval from magnitude. SIAM Journal on Applied Mathematics, 45(4):651-664, 1985.
[25] Y. Shechtman, Y. C. Eldar, A. Szameit, and M. Segev. Sparsity based sub-wavelength imaging with partially incoherent light via quadratic compressed sensing. arXiv preprint arXiv:1104.4406, 2011.
[26] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.
[27] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.
[28] I. Waldspurger, A. d'Aspremont, and S. Mallat. Phase recovery, maxcut and complex semidefinite programming. arXiv preprint arXiv:1206.0102, 2012.
[29] D. C. Youla and H. Webb. Image restoration by the method of convex projections: Part 1-theory. Medical Imaging, IEEE Transactions on, 1(2):81-94, 1982.
-----1
[1] D. Angluin. Queries revisited. Theor. Comput. Sci., 313(2):175-194, 2004.
[2] F. J. Balbach and T. Zeugmann. Teaching randomized learners. In COLT, pages 229-243. Springer, 2006.
[3] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In ICML, 2009.
[4] B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In ICML, 2012.
[5] L. D. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory. Institute of Mathematical Statistics, Hayward, CA, USA, 1986.
[6] M. Cakmak and M. Lopes. Algorithmic and human teaching of sequential decision tasks. In AAAI Conference on Artificial Intelligence, 2012.
[7] N. Chater and M. Oaksford. The probabilistic mind: prospects for Bayesian cognitive science. Oxford University Press, 2008.
[8] M. C. Frank and N. D. Goodman. Predicting Pragmatic Reasoning in Language Games. Science, 336(6084):998, May 2012.
[9] G. Giguère and B. C. Love. Limits in decision making arise from limits in memory retrieval. Proceedings of the National Academy of Sciences, Apr. 2013.
[10] S. Goldman and M. Kearns. On the complexity of teaching. Journal of Computer and System Sciences, 50(1):20-31, 1995.
[11] S. Hanneke. Teaching dimension and the complexity of active learning. In COLT, pages 66-81, 2007.
[12] T. Hegedűs. Generalized teaching dimensions and the query complexity of learning. In COLT, pages 108-117, 1995.
[13] F. Khan, X. Zhu, and B. Mutlu. How do humans teach: On curriculum learning and teaching dimension. In Advances in Neural Information Processing Systems (NIPS) 25. 2011.
[14] H. Kobayashi and A. Shinohara. Complexity of teaching by a restricted number of examples. In COLT, pages 293-302, 2009.
[15] M. P. Kumar, B. Packer, and D. Koller. Self-paced learning for latent variable models. In NIPS, 2010.
[16] Y. J. Lee and K. Grauman. Learning the easy things first: Self-paced visual category discovery. In CVPR, 2011.
[17] B. D. McCandliss, J. A. Fiez, A. Protopapas, M. Conway, and J. L. McClelland. Success and failure in teaching the [r]-[l] contrast to Japanese adults: Tests of a Hebbian model of plasticity and stabilization in spoken language perception. Cognitive, Affective, & Behavioral Neuroscience, 2(2):89-108, 2002.
[18] H. Pashler and M. C. Mozer. When does fading enhance perceptual category learning? Journal of Experimental Psychology: Learning, Memory, and Cognition, 2013. In press.
[19] A. N. Rafferty and T. L. Griffiths. Optimal language learning: The importance of starting representative. 32nd Annual Conference of the Cognitive Science Society, 2010.
[20] P. Shafto and N. Goodman. Teaching Games: Statistical Sampling Assumptions for Learning in Pedagogical Situations. In CogSci, pages 1632-1637, 2008.
[21] S. Singh, R. L. Lewis, A. G. Barto, and J. Sorg. Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Trans. on Auton. Ment. Dev., 2(2):70-82, June 2010.
[22] J. B. Tenenbaum and T. L. Griffiths. The rational basis of representativeness. 23rd Annual Conference of the Cognitive Science Society, 2001.
[23] J. B. Tenenbaum, T. L. Griffiths, and C. Kemp. Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7):309-318, 2006.
[24] F. Xu and J. B. Tenenbaum. Word learning as Bayesian inference. Psychological review, 114(2), 2007.
[25] S. Zilles, S. Lange, R. Holte, and M. Zinkevich. Models of cooperative teaching and learning. Journal of Machine Learning Research, 12:349-384, 2011.
-----1
[1] F. Niu, B. Recht, C. Ré, and S.J. Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In: Advances in Neural Information Processing Systems (2011).
[2] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent dirichlet allocation. In: Advances in Neural Information Processing Systems 20.1081-1088 (2007), pp. 17-24.
[3] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed algorithms for topic models. In: The Journal of Machine Learning Research 10 (2009), pp. 1801-1828.
[4] Z. Liu, Y. Zhang, E.Y. Chang, and M. Sun. PLDA+: Parallel latent dirichlet allocation with data placement and pipeline processing. In: ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011), p. 26.
[5] R. Bekkerman, M. Bilenko, and J. Langford. Scaling up machine learning: Parallel and distributed approaches. Cambridge University Press, 2012.
[6] A. Ihler and D. Newman. Understanding Errors in Approximate Distributed Latent Dirichlet Allocation. In: Knowledge and Data Engineering, IEEE Transactions on 24.5 (2012), pp. 952-960.
[7] Y. Liu, O. Kosut, and A. S. Willsky. Sampling GMRFs by Subgraph Correction. In: NIPS 2012 Workshop: Perturbations, Optimization, and Statistics (2012).
[8] G. Papandreou and A. Yuille. Gaussian sampling by local perturbations. In: Neural Information Processing Systems (NIPS). 2010.
[9] A. Parker and C. Fox. Sampling Gaussian distributions in Krylov spaces with conjugate gradients. In: SIAM Journal on Scientific Computing 34.3 (2012), pp. 312-334.
[10] Colin Fox and Albert Parker. Convergence in Variance of First-Order and Second-Order Chebyshev Accelerated Gibbs Samplers. 2013. URL: http://www.physics.otago.ac.nz/data/fox/publications/SIAM_CS_2012-11-30.pdf.
[11] J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees. In: Artificial Intelligence and Statistics (AISTATS). Ft. Lauderdale, FL, May 2011.
[12] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. In: Foundations and Trends in Machine Learning 1.1-2 (2008), pp. 1-305.
[13] A. Berman and R.J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. In: Classics in Applied Mathematics, 9 (1979).
[14] M. J. Castel V. Migalln and J. Penads. On Parallel two-stage methods for Hermitian pos- itive definite matrices with applications to preconditioning. In: Electronic Transactions on Numerical Analysis 12 (2001), pp. 88112.
[15] D. Serre. Nov. 2011. URL: http://mathoverflow.net/questions/80793/ is- gauss- seidel- guaranteed- to- converge- on- semi- positive- definite-matrices/80845#80845.
[16] Nicholas Ruozzi and Sekhar Tatikonda. Message-Passing Algorithms for Quadratic Minimization. In: Journal of Machine Learning Research 14 (2013), pp. 2287-2314. URL: http://jmlr.org/papers/v14/ruozzi13a.html.
[17] A. Frommer and D.B. Szyld. On asynchronous iterations. In: Journal of computational and applied mathematics 123.1 (2000), pp. 201-216.
[18] A. Frommer and D.B. Szyld. Asynchronous two-stage iterative methods. In: Numerische Mathematik 69.2 (1994), pp. 141-153.
[19] J. A. Kelner, L. Orecchia, A. Sidford, and Z. A. Zhu. A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time. 2013. arXiv: 1301.6628 [cs.DS].
-----1
[1] Y. Bishop, S. Fienberg, and P. Holland. Discrete Multivariate Analysis: Theory and Practice. MIT Press, 1975.
[2] A. Dobra and A. Lenkoski. Copula Gaussian graphical models and their application to modeling functional disability data. Annals of Applied Statistics, 5:969-993, 2011.
[3] R. O. Duda and P. E. Hart. Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1):11-15, 1972.
[4] G. Elidan. Copulas and machine learning. Proceedings of the Copulae in Mathematical and Quantitative Finance workshop, to appear, 2013.
[5] F. Han and H. Liu. Semiparametric principal component analysis. Advances in Neural Information Processing Systems, 25:171-179, 2012.
[6] G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[7] P. Hoff. Extending the rank likelihood for semiparametric copula estimation. Annals of Applied Statistics, 1:265-283, 2007.
[8] R. Jarvis. On the identification of the convex hull of a finite set of points in the plane. Information Processing Letters, 2(1):18-21, 1973.
[9] H. Joe. Multivariate Models and Dependence Concepts. Chapman-Hall, 1997.
[10] S. Kirshner. Learning with tree-averaged densities and distributions. Neural Information Processing Systems, 2007.
[11] S. Lauritzen. Graphical Models. Oxford University Press, 1996.
[12] I. Murray, R. Adams, and D. MacKay. Elliptical slice sampling. JMLR Workshop and Conference Proceedings: AISTATS 2010, 9:541-548, 2010.
[13] J. Murray, D. Dunson, L. Carin, and J. Lucas. Bayesian Gaussian copula factor models for mixed data. Journal of the American Statistical Association, to appear, 2013.
[14] R. Neal. Slice sampling. The Annals of Statistics, 31:705-767, 2003.
[15] R. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, pages 113-162, 2010.
[16] R. Nelsen. An Introduction to Copulas. Springer-Verlag, 2007.
[17] A. Pakman and L. Paninski. Exact Hamiltonian Monte Carlo for truncated multivariate Gaussians. arXiv:1208.4118, 2012.
[18] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[19] Y. Yu and X. L. Meng. To center or not to center: That is not the question - An ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency. Journal of Computational and Graphical Statistics, 20(3):531-570, 2011.
-----1
[1] R. Neal. MCMC Using Hamiltonian Dynamics. Handbook of Markov Chain Monte Carlo, pages 113-162, 2011.
[2] Ari Pakman and Liam Paninski. Exact Hamiltonian Monte Carlo for Truncated Multivariate Gaussians. Journal of Computational and Graphical Statistics, 2013, arXiv:1208.4118.
[3] John A Hertz, Anders S Krogh, and Richard G Palmer. Introduction to the theory of neural computation, volume 1. Westview press, 1991.
[4] Yichuan Zhang, Charles Sutton, Amos Storkey, and Zoubin Ghahramani. Continuous Relaxations for Discrete Hamiltonian Monte Carlo. In Advances in Neural Information Processing Systems 25, pages 3203-3211, 2012.
[5] M.D. Hoffman and A. Gelman. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Arxiv preprint arXiv:1111.4246, 2011.
[6] T. Park and G. Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681-686, 2008.
[7] C.M. Carvalho, N.G. Polson, and J.G. Scott. The horseshoe estimator for sparse signals. Biometrika, 97(2):465-480, 2010.
[8] T.J. Mitchell and J.J. Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023-1032, 1988.
[9] E.I. George and R.E. McCulloch. Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881-889, 1993.
[10] S. Mohamed, K. Heller, and Z. Ghahramani. Bayesian and L1 approaches to sparse unsupervised learning. arXiv preprint arXiv:1106.1157, 2011.
[11] I.J. Goodfellow, A. Courville, and Y. Bengio. Spike-and-slab sparse coding for unsupervised feature discovery. arXiv preprint arXiv:1201.3382, 2012.
[12] Yutian Chen and Max Welling. Bayesian structure learning for Markov random fields with a spike and slab prior. arXiv preprint arXiv:1206.1088, 2012.
[13] Peter J Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711-732, 1995.
[14] Mark E.J. Newman and Gerard T. Barkema. Monte Carlo methods in statistical physics. Oxford: Clarendon Press, 1999.
[15] Alan D Sokal. Monte Carlo methods in statistical mechanics: foundations and new algorithms, 1989.
[16] Fugao Wang and David P Landau. Efficient, multiple-range random walk algorithm to calculate the density of states. Physical Review Letters, 86(10):2050-2053, 2001.
-----1
[1] M. Belkin and P. Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56(1-3):209-239, 2004.
[2] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 153-160. MIT Press, Cambridge, MA, 2007.
[3] J. Bruna and S. Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872-1886, 2013.
[4] R. L. Claypoole, G. Davis, W. Sweldens, and R. G. Baraniuk. Nonlinear wavelet transforms for image coding via lifting. IEEE Transactions on Image Processing, 12(12):1449-1459, Dec. 2003.
[5] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5-30, July 2006.
[6] R. R. Coifman and M. Maggioni. Diffusion wavelets. Appl. Comput. Harmon. Anal., 21(1):53-94, 2006.
[7] M. Crovella and E. D. Kolaczyk. Graph wavelets for spatial traffic analysis. In INFOCOM, 2003.
[8] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl., 4(3):245-267, 1998.
[9] M. N. Do and Y. M. Lu. Multidimensional filter banks and multiscale geometric representations. Foundations and Trends in Signal Processing, 5(3):157-264, 2012.
[10] M. Gavish, B. Nadler, and R. R. Coifman. Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning. In ICML, pages 367-374, 2010.
[11] A. Georghiades, P. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intelligence, 23(6):643-660, 2001.
[12] D. K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Appl. Comput. Harmon. Anal., 30(2):129-150, 2011.
[13] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Comput., 18(7):1527-1554, 2006.
[14] G. E. Hinton and R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, 313:504-507, July 2006.
[15] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng. On optimization methods for deep learning. In ICML, pages 265-272, 2011.
[16] S. Mallat. A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way. Academic Press, 3rd edition, 2008.
[17] S. K. Narang and A. Ortega. Multi-dimensional separable critically sampled wavelet filterbanks on arbitrary graphs. In ICASSP, pages 3501-3504, 2012.
[18] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, pages 849-856, 2001.
[19] I. Ram, M. Elad, and I. Cohen. Generalized tree-based wavelet transform. IEEE Transactions on Signal Processing, 59(9):4199-4209, 2011.
[20] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1137-1144. MIT Press, Cambridge, MA, 2007.
[21] R. M. Rustamov. Average interpolating wavelets on point clouds and graphs. CoRR, abs/1110.2227, 2011.
[22] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag., 30(3):83-98, 2013.
[23] W. Sweldens. The lifting scheme: A construction of second generation wavelets. SIAM Journal on Mathematical Analysis, 29(2):511-546, 1998.
[24] A. D. Szlam, M. Maggioni, R. R. Coifman, and J. C. Bremer. Diffusion-driven multiscale analysis on manifolds and graphs: top-down and bottom-up constructions. In SPIE, volume 5914, 2005.
[25] X. Zhang, X. Dong, and P. Frossard. Learning of structured graph dictionaries. In ICASSP, pages 3373-3376, 2012.
[26] X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, pages 912-919, 2003.
-----1
[1] E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing. Mixed-membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981-2014, 2008.
[2] D.J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11:581-598, 1981.
[3] H. Azari and E. M. Airoldi. Graphlet decomposition of a weighted network. Journal of Machine Learning Research, W&CP, 22:54-63, 2012.
[4] P.J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA, 106:21068-21073, 2009.
[5] P.J. Bickel, A. Chen, and E. Levina. The method of moments and degree distributions for network models. Annals of Statistics, 39(5):2280-2301, 2011.
[6] C. Borgs, J. Chayes, L. Lovász, V. T. Sós, B. Szegedy, and K. Vesztergombi. Graph limits and parameter testing. In Proc. ACM Symposium on Theory of Computing, pages 261-270, 2006.
[7] A. Channarond, J. Daudin, and S. Robin. Classification and estimation in the Stochastic Blockmodel based on the empirical degrees. Electronic Journal of Statistics, 6:2574-2601, 2012.
[8] S. Chatterjee. Matrix estimation by universal singular value thresholding. ArXiv:1212.1247. 2012.
[9] D.S. Choi and P.J. Wolfe. Co-clustering separately exchangeable network data. ArXiv:1212.4093. 2012.
[10] D.S. Choi, P.J. Wolfe, and E.M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, 99:273-284, 2012.
[11] P. Diaconis and S. Janson. Graph limits and exchangeable random graphs. Rendiconti di Matematica e delle sue Applicazioni, Series VII, pages 33-61, 2008.
[12] A. Goldenberg, A.X. Zheng, S.E. Fienberg, and E.M. Airoldi. A survey of statistical network models. Foundations and Trends in Machine Learning, 2:129-233, 2009.
[13] P.D. Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. In Neural Information Processing Systems (NIPS), volume 20, pages 657-664, 2008.
[14] P.D. Hoff, A.E. Raftery, and M.S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090-1098, 2002.
[15] D.N. Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 1979.
[16] O. Kallenberg. On the representation theorem for exchangeable arrays. Journal of Multivariate Analysis, 30(1):137-154, 1989.
[17] R.H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Information Theory, 56:2980-2998, Jun. 2010.
[18] N.D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783-1816, 2005.
[19] J.R. Lloyd, P. Orbanz, Z. Ghahramani, and D.M. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In Neural Information Processing Systems (NIPS), 2012.
[20] L. Lovász and B. Szegedy. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B, 96:933-957, 2006.
[21] K.T. Miller, T.L. Griffiths, and M.I. Jordan. Nonparametric latent feature models for link prediction. In Neural Information Processing Systems (NIPS), 2009.
[22] K. Nowicki and T.A. Snijders. Estimation and prediction of stochastic block structures. Journal of American Statistical Association, 96:1077-1087, 2001.
[23] P. Orbanz and D.M. Roy. Bayesian models of graphs, arrays and other exchangeable random structures, 2013. Unpublished manuscript.
[24] P. Latouche and S. Robin. Bayesian model averaging of stochastic block models to estimate the graphon function and motif frequencies in a w-graph model. ArXiv:1310.6150, October 2013. Unpublished manuscript.
[25] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39(4):1878-1915, 2011.
[26] M. Tang, D.L. Sussman, and C.E. Priebe. Universally consistent vertex classification for latent positions graphs. Annals of Statistics, 2013. In press.
[27] L. Wasserman. All of Nonparametric Statistics. Springer, 2005.
[28] P.J. Wolfe and S.C. Olhede. Nonparametric graphon estimation. ArXiv:1309.5936, September 2013. Unpublished manuscript.
[29] Z. Xu, F. Yan, and Y. Qi. Infinite Tucker decomposition: nonparametric Bayesian models for multiway data analysis. In Proc. Intl. Conf. Machine Learning (ICML), 2012.
[30] Y. Zhao, E. Levina, and J. Zhu. Community extraction for social networks. In Proc. Natl. Acad. Sci. USA, volume 108, pages 7321-7326, 2011.
-----1
[1] P. Holland, K.B. Laskey, and S. Leinhardt. Stochastic blockmodels: Some first steps. Social Networks, 5:109-137, 1983.
[2] Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed membership stochastic blockmodel. Journal of Machine Learning Research, 9:1981-2014, 2008.
[3] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS, 99:7821-7826, 2002.
[4] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70, 2004.
[5] Charles Kemp, Joshua B. Tenenbaum, Thomas L. Griffiths, Takeshi Yamada, and Naonori Ueda. Learning systems of concepts with an infinite relational model. AAAI, 2006.
[6] Zhao Xu, Volker Tresp, Kai Yu, and Hans-Peter Kriegel. Infinite hidden relational models. Uncertainty in Artificial Intelligence (UAI), 2006.
[7] Morten Mørup and Mikkel N. Schmidt. Bayesian community detection. Neural Computation, 24:2434-2456, 2012.
[8] T. Herlau, M. Mørup, M. N. Schmidt, and L. K. Hansen. Detecting hierarchical structure in networks. In Cognitive Information Processing, 2012.
[9] Phaedon-Stelios Koutsourelakis and Tina Eliassi-Rad. Finding mixed-memberships in social networks. 2008 AAAI Spring Symposium on Social Information Processing (AAAI-SS08), 2008.
[10] Konstantina Palla, David A. Knowles, and Zoubin Ghahramani. An infinite latent attribute model for network data. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012. July 2012.
[11] Qirong Ho, Ankur P. Parikh, Le Song, and Eric P. Xing. Multiscale community blockmodel for network exploration. Proceedings of the Fourteenth International Workshop on Artificial Intelligence and Statistics (AISTATS), 2011.
[12] D. M. Roy and Y. W. Teh. The Mondrian process. In Advances in Neural Information Processing Systems, volume 21, 2009.
[13] K. A. Heller and Z. Ghahramani. Bayesian hierarchical clustering. In Proceedings of the International Conference on Machine Learning, volume 22, 2005.
[14] C. Blundell, Y. Teh, and K. A. Heller. Bayesian Rose trees. UAI, 2010.
[15] T. Snijders and K. Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14:75-100, 1997.
[16] Jake M. Hofman and Chris H. Wiggins. Bayesian approach to network modularity. Physical Review Letters, 100(25):258701, 2008.
[17] S. F. Sampson. A novitiate in a period of change. an experimental and case study of social relationships. 1968.
[18] Kurt T. Miller, Thomas L. Griffiths, and Michael I. Jordan. Nonparametric latent feature models for link prediction. Neural Information Processing Systems (NIPS), 2009.
[19] A. Globerson, G. Chechik, F. Pereira, and N. Tishby. Euclidean embedding of co-occurrence data. Journal of Machine Learning Research, 8:2265-2295, 2007.
[20] Charles Kemp. Infinite relational model implementation. http://www.psy.cmu.edu/~ckemp/code/irm.html. Accessed: 2013-04-08.
[21] Q. Ho, J. Yin, and E. P. Xing. On triangular versus edge representations - towards scalable modeling of networks. Neural Information Processing Systems (NIPS), 2012.
-----1
[1] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. JMLR, 9, 2008.
[2] L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. In WSDM, 2011.
[3] S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216-222, 1987.
[4] J. Foulds, A. U. Asuncion, C. DuBois, C. T. Butts, and P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. In AISTATS, 2011.
[5] W. Fu, L. Song, and E. P. Xing. Dynamic mixed membership blockmodel for evolving networks. In ICML, 2009.
[6] J. V. Gael, Y. W. Teh, and Z. Ghahramani. The infinite factorial hidden Markov model. In NIPS, 2009.
[7] S. J. Gershman, P. I. Frazier, and D. M. Blei. Distance dependent infinite latent feature models. arXiv:1110.5454, 2012.
[8] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29(2-3):245-273, 1997.
[9] F. Guo, S. Hanneke, W. Fu, and E. P. Xing. Recovering temporally rewiring networks: a model-based approach. In ICML, 2007.
[10] S. Hanneke, W. Fu, and E. P. Xing. Discrete temporal models of social networks. Electron. J. Statist., 4:585-605, 2010.
[11] C. Heaukulani and Z. Ghahramani. Dynamic probabilistic models for latent feature propagation in social networks. In ICML, 2013.
[12] Q. Ho, L. Song, and E. P. Xing. Evolving cluster mixed-membership blockmodel for time-varying networks. In AISTATS, 2011.
[13] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. JASA, 97(460):1090-1098, 2002.
[14] K. Ishiguro, T. Iwata, N. Ueda, and J. Tenenbaum. Dynamic infinite relational model for time-varying relational data analysis. In NIPS, 2010.
[15] S. Kairam, D. Wang, and J. Leskovec. The life and death of online groups: Predicting group growth and longevity. In WSDM, 2012.
[16] M. Kim and J. Leskovec. Modeling social networks with node attributes using the multiplicative attribute graph model. In UAI, 2011.
[17] M. Kim and J. Leskovec. Latent multi-group membership graph model. In ICML, 2012.
[18] M. Kim and J. Leskovec. Multiplicative attribute graph model of real-world networks. Internet Mathematics, 8(1-2):113-160, 2012.
[19] M. Kim and J. Leskovec. Nonparametric multi-group membership model for dynamic networks. http://i.stanford.edu/~mykim/pub/nips13-dmmg.pdf, 2013.
[20] J. R. Lloyd, P. Orbanz, Z. Ghahramani, and D. M. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In NIPS, 2012.
[21] K. T. Miller, T. L. Griffiths, and M. I. Jordan. Nonparametric latent feature models for link prediction. In NIPS, 2010.
[22] M. Mørup, M. N. Schmidt, and L. K. Hansen. Infinite multiple membership relational modeling for complex networks. In MLSP, 2011.
[23] K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In ICML, 2012.
[24] P. Sarkar and A. W. Moore. Dynamic social network analysis using latent space models. In NIPS, 2005.
[25] J. Scott, R. Gass, J. Crowcroft, P. Hui, C. Diot, and A. Chaintreau. CRAWDAD data set cambridge/haggle (v. 2009-05-29), May 2009.
[26] S. L. Scott. Bayesian methods for hidden Markov models. JASA, 97(457):337-351, 2002.
[27] T. A. B. Snijders, G. G. van de Bunt, and C. E. G. Steglich. Introduction to stochastic actor-based models for network dynamics. Social Networks, 32(1):44-60, 2010.
[28] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: Extraction and mining of academic social networks. In KDD08, 2008.
[29] J. Yang and J. Leskovec. Community-affiliation graph model for overlapping community detection. In ICDM, 2012.
-----1
[1] I. E. Ohiorhenuan, F. Mechler, K. P. Purpura, A. M. Schmid, Q. Hu, and J. D. Victor. Sparse coding and high-order correlations in fine-scale cortical networks. Nature, 466(7306):617-621, July 2010.
[2] P. Ravikumar, M. Wainwright, and J. Lafferty. High-dimensional Ising model selection using L1-regularized logistic regression. The Annals of Statistics, 38(3):1287-1319, 2010.
[3] E. Ganmor, R. Segev, and E. Schneidman. Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proceedings of the National Academy of Sciences, 108(23):9679-9684, 2011.
[4] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007-1012, Apr 2006.
[5] J. Shlens, G. Field, J. Gauthier, M. Grivich, D. Petrusca, A. Sher, L. A. M., and E. J. Chichilnisky. The structure of multi-neuron firing patterns in primate retina. J Neurosci, 26:8254-8266, 2006.
[6] P. Smolensky. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, chapter Information processing in dynamical systems: foundations of harmony theory, pages 194-281. MIT Press, Cambridge, MA, USA, 1986.
[7] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[8] G. J. McLachlan and D. Peel. Finite mixture models. Wiley, 2000.
[9] M. Bethge and P. Berens. Near-maximum entropy models for binary neural representations of natural images. Advances in neural information processing systems, 20:97-104, 2008.
[10] P. Müller and F. A. Quintana. Nonparametric Bayesian data analysis. Statistical science, 19(1):95-110, 2004.
[11] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.
[12] W. Truccolo and J. P. Donoghue. Nonparametric modeling of neural point processes via stochastic gradient boosting regression. Neural computation, 19(3):672-705, 2007.
[13] R. P. Adams, I. Murray, and D. J. C. MacKay. Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM New York, NY, USA, 2009.
[14] A. Kottas, S. Behseta, D. E. Moorman, V. Poynor, and C. R. Olson. Bayesian nonparametric analysis of neuronal intensity rates. Journal of Neuroscience Methods, 203(1):241-253, January 2012.
[15] E. Archer, I. M. Park, and J. W. Pillow. Bayesian entropy estimation for binary spike train data using parametric prior knowledge. In Advances in Neural Information Processing Systems (NIPS), 2013.
[16] M. Pachitariu, L. Buesing, and M. Sahani. Generalized recurrent linear models. In Computational and Systems Neuroscience (CoSyNe), Salt Lake City, Utah, February 2013.
[17] E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. In Proceedings of the 1969 24th national conference, ACM '69, pages 157-172, New York, NY, USA, 1969. ACM.
-----0
Emery N. Brown. Theory of point processes for neural systems. In Methods and Models in Neurophysics, chapter 14, pages 691-726. 2005.
J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature, 454(7206):995-999, Aug 2008.
Elad Schneidman, Michael J. Berry, Ronen Segev, and William Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007-1012, April 2006.
David R. Brillinger. Maximum likelihood analysis of spike trains of interacting nerve cells. Biological Cybernetics, 59(3):189-200, August 1988.
E.S. Chornoboy, L.P. Schramm, and A.F. Karr. Maximum likelihood identification of neural point process systems. Biological Cybernetics, 59(3):265-275, 1988.
Liam Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4):243-262, 2004.
W. Truccolo, U. T. Eden, M. R. Fellows, J. P. Donoghue, and E. N. Brown. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2):1074, 2005.
K. D. Harris, J. Csicsvari, H. Hirase, G. Dragoi, and G. Buzsáki. Organization of cell assemblies in the hippocampus. Nature, 424:552-555, 2003.
J. Ben Hough, Manjunath Krishnapur, Yuval Peres, and Bálint Virág. Determinantal processes and independence. Probability Surveys, 3:206-229, 2006.
Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3), 2012.
James Zou and Ryan P. Adams. Priors for diversity in generative latent variable models. In Advances in Neural Information Processing Systems, 2012.
Raja H. Affandi, Alex Kulesza, Emily Fox, and Ben Taskar. Nyström Approximation for Large-Scale Determinantal Processes. In Artificial Intelligence and Statistics, 2013.
Alex Kulesza and Ben Taskar. Structured determinantal point processes. In Advances in Neural Information Processing Systems, 2011.
Matteo Carandini and David J. Heeger. Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1):51-62, January 2012.
Martin J. Wainwright, Odelia Schwartz, and Eero P. Simoncelli. Natural image statistics and divisive normalization: Modeling nonlinearity and adaptation in cortical neurons. In R Rao, B Olshausen, and M Lewicki, editors, Probabilistic Models of the Brain: Perception and Neural Function, chapter 10, pages 203-222. MIT Press, February 2002.
J. Csicsvari, H. Hirase, A. Czurko, A. Mamiya, and G. Buzsáki. Oscillatory coupling of hippocampal pyramidal cells and interneurons in the behaving rat. The Journal of Neuroscience, 19(1):274-287, Jan 1999.
Carl E. Rasmussen and Christopher Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
Kenji Mizuseki, Anton Sirota, Eva Pastalkova, and György Buzsáki. Theta oscillations provide temporal windows for local circuit computation in the entorhinal-hippocampal loop. Neuron, 64(2):267-280, October 2009.
-----1
[1] L. G. Ungerleider and J. V. Haxby, What and where in the human brain. Current Opinion in Neurobiology, vol. 4, no. 2, pp. 157-65, 1994.
[2] J. M. Singer and D. L. Sheinberg, Temporal cortex neurons encode articulated actions as slow sequences of integrated poses. Journal of Neuroscience, vol. 30, no. 8, pp. 3133-45, 2010.
[3] M. W. Oram and D. I. Perrett, Integration of form and motion in the anterior superior temporal polysensory area of the macaque monkey. Journal of Neurophysiology, vol. 76, no. 1, pp. 109-29, 1996.
[4] J. S. Baizer, L. G. Ungerleider, and R. Desimone, Organization of visual inputs to the inferior temporal and posterior parietal cortex in macaques. Journal of Neuroscience, vol. 11, no. 1, pp. 168-90, 1991.
[5] T. Jellema, G. Maassen, and D. I. Perrett, Single cell integration of animate form, motion and location in the superior temporal cortex of the macaque monkey. Cerebral Cortex, vol. 14, no. 7, pp. 781-90, 2004.
[6] C. Bruce, R. Desimone, and C. G. Gross, Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology, vol. 46, no. 2, pp. 369-84, 1981.
[7] J. Vangeneugden, F. Pollick, and R. Vogels, Functional differentiation of macaque visual temporal cortical neurons using a parametric action space. Cerebral Cortex, vol. 19, no. 3, pp. 593-611, 2009.
[8] G. Johansson, Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, vol. 14, pp. 201-211, 1973.
[9] E. Grossman, M. Donnelly, R. Price, D. Pickens, V. Morgan, G. Neighbor, and R. Blake, Brain areas involved in perception of biological motion. Journal of Cognitive Neuroscience, vol. 12, no. 5, pp. 711-20, 2000.
[10] J. A. Beintema and M. Lappe, Perception of biological motion without local image motion. Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 8, pp. 5661-3, 2002.
[11] D. I. Perrett, P. A. Smith, A. J. Mistlin, A. J. Chitty, A. S. Head, D. D. Potter, R. Broennimann, A. D. Milner, and M. A. Jeeves, Visual analysis of body movements by neurones in the temporal cortex of the macaque monkey: a preliminary report. Behavioural Brain Research, vol. 16, no. 2-3, pp. 153-70, 1985.
[12] M. S. Beauchamp, K. E. Lee, J. V. Haxby, and A. Martin, FMRI responses to video and point-light displays of moving humans and manipulable objects. Journal of Cognitive Neuroscience, vol. 15, no. 7, pp. 991-1001, 2003.
[13] N. C. Rust, V. Mante, E. P. Simoncelli, and J. A. Movshon, How MT cells analyze the motion of visual patterns. Nature Neuroscience, vol. 9, no. 11, pp. 1421-31, 2006.
[14] M. Riesenhuber and T. Poggio, Hierarchical models of object recognition in cortex. Nature Neuroscience, vol. 2, no. 11, pp. 1019-25, 1999.
[15] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, A Biologically Inspired System for Action Recognition, in 2007 IEEE 11th International Conference on Computer Vision. IEEE, 2007.
[16] C. G. Gross, C. E. Rocha-Miranda, and D. B. Bender, Visual properties of neurons in inferotemporal cortex of the macaque. Journal of Neurophysiology, vol. 35, no. 1, pp. 96-111, 1972.
[17] P. Dayan and L. F. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Cambridge, MA: The MIT Press, 2005.
[18] J. A. Movshon and W. T. Newsome, Visual response properties of striate cortical neurons projecting to area MT in macaque monkeys. Journal of Neuroscience, vol. 16, no. 23, pp. 7733-41, 1996.
[19] W. Reichardt, Autocorrelation, a principle for the evaluation of sensory information by the central nervous system, Sensory Communication, pp. 303-17, 1961.
[20] K. Schindler and L. van Gool, Action snippets: How many frames does human action recognition require? in 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008.
[21] M. A. Giese and T. Poggio, Neural mechanisms for the recognition of biological movements. Nature Reviews Neuroscience, vol. 4, no. 3, pp. 179-92, 2003.
[22] J. Lange and M. Lappe, A model of biological motion perception from configural form cues. Journal of Neuroscience, vol. 26, no. 11, pp. 2894-906, 2006.
[23] P. J. Mineault, F. A. Khawaja, D. A. Butts, and C. C. Pack, Hierarchical processing of complex motion along the primate dorsal visual pathway. Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 16, pp. E972-80, 2012.
[24] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, in 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008.
[25] A. Bobick and J. Davis, The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257-267, 2001.
-----1
[1] Adrian, E.D. and Zotterman, Y. (1926) The impulses produced by sensory nerve endings. The Journal of physiology 49(61): 156-193 
[2] Wohrer A., Humphries M.D. and Machens C.K. (2012) Population-wide distributions of neural activity during perceptual decision-making. Progress in neurobiology 103: 156-193 
[3] Sclar, G. and Freeman, R.D. (1982) Orientation selectivity in the cat's striate cortex is invariant with stimulus contrast. Experimental brain research 46(3): 457-61.
[4] Aksay E., Olasagasti I., Mensh B.D., Baker R., Goldman, M.S. and Tank, D.W. (2007) Functional dissection of circuitry in a neural integrator. Nature neuroscience 10(4): 494-504.
[5] Hubel D.H. and Wiesel T.N. (1962) Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Physiological Soc 1:(160)
[6] Olshausen B.A. and Field D.J. (2005) How close are we to understanding V1? Neural computation 8(17): 470-3.
[7] Aertsen A., Johannesma P.I.M. and Hermes D.J. (1980) Spectro-temporal receptive fields of auditory neurons in the grassfrog. Biological Cybernetics 
[8] Machens C.K., Wehr M.S. and Zador A.M. (2004) Linearity of cortical receptive fields measured with natural sounds. The Journal of neuroscience : the official journal of the Society for Neuroscience 5(24): 1089-100.
[9] Ginzburg I. and Sompolinsky H. (1994) Theory of correlations in stochastic neural networks. Physical Review E 4(50): 3171-3191.
[10] Trousdale J., Hu Y., Shea-Brown E. and Josic K. (2012) Impact of network structure and cellular response on spike time correlations. PLoS computational biology 3(8): e1002408 
[11] Beck J., Bejjanki V.R. and Pouget A. (2011) Insights from a simple expression for linear fisher information in a recurrently connected population of spiking neurons. Neural computation 6(23): 1484-502 
[12] van Vreeswijk C. and Sompolinsky H. (1996) Chaos in neuronal networks with balanced excitatory and inhibitory activity. Neural computation 5293(274): 1724-1726 
[13] van Vreeswijk C. and Sompolinsky H. (1998) Chaotic balanced state in a model of cortical circuits. Neural computation 6(10): 1321-1371 
[14] Haider, B., Duque, A., Hasenstaub, A.R. and McCormick, D.A. (2006) Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. The Journal of neuroscience : the official journal of the Society for Neuroscience 17(26): 4535-45
[15] Bourdoukan R., Barrett D.G.T., Machens C. and Deneve S. (2012) Learning optimal spike-based representations. Advances in Neural Information Processing Systems 25: 2294-2302.
[16] Knight B.W. (1972) Dynamics of encoding in a population of neurons. The Journal of general physiology 6(59): 734-66 
[17] Boerlin M., Machens, C.K. and Deneve S. (2012) Predictive coding of dynamical variables in balanced spiking networks. PLoS computational biology, in press.
[18] Boerlin M., Deneve S. (2011) Spike-based population coding and working memory. PLoS Comput Biol 7, e1001080.
[19] Boyd S. and Vandenberghe L. (2004) Convex optimization.
[20] Braitenberg V. and Schuz A. (1991) Anatomy of the cortex. Statistics and Geometry. Springer
[21] Salinas E. (2006) How behavioral constraints may determine optimal sensory representations. PLoS biology 12(4): 1545-7885
[22] Rozell C.J., Johnson D.H., Baraniuk R.G. and Olshausen B.A. (2011) Spike-based population coding and working memory. PLoS Comput Biol 7, e1001080.
-----1
[1] Hertz, J., Krogh, A. & Palmer, R.G. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley.
[2] Rosenblatt, F. (1957) The Perceptron: a perceiving and recognizing automaton. Report 85-460-1.
[3] Minsky M. L. & Papert S. A. (1969) Perceptrons Cambridge, MA: MIT Press.
[4] Diederich, S. & Opper, M. (1987) Learning of correlated patterns in spin-glass networks by local learning rules. Physical Review Letters 58(9):949-952.
[5] Gutig R. & Sompolinsky H. (2006) The Tempotron: a neuron that learns spike timing-based decisions. Nature Neuroscience 9(3):420-8.
[6] Dan, Y. & Poo, M. (2004) Spike Timing-Dependent Plasticity of Neural Circuits Neuron 44:23-30.
[7] Dan, Y. & Poo, M. (2006) Spike timing-dependent plasticity: from synapse to perception. Physiological Reviews 86(3):1033-48.
[8] Caporale, N. & Dan, Y. (2008) Spike Timing-Dependent Plasticity: A Hebbian Learning Rule Annual Review of Neuroscience 31:25-46.
[9] Froemke, R. C., Poo, M.-M., Dan, Y. (2005) Spike-timing-dependent synaptic plasticity depends on dendritic location. Nature 434:221-225.
[10] Sjostrom, P. J., Hausser, M. (2006) A Cooperative Switch Determines the Sign of Synaptic Plasticity in Distal Dendrites of Neocortical Pyramidal Neurons. Neuron 51:227-238.
[11] Haas, J.S., Nowotny, T. & Abarbanel, H.D.I. (2006) Spike-Timing-Dependent Plasticity of Inhibitory Synapses in the Entorhinal Cortex Journal of Neurophysiology 96(6):3305-3313.
[12] Sjostrom, P.J., Turrigiano, G.G. and Nelson, S.B. (2004) Endocannabinoid-Dependent Neocortical Layer-5 LTD in the Absence of Postsynaptic Spiking. J Neurophysiol 92:3338-3343
[13] D'Souza, P., Liu, S.-C. & Hahnloser, R. H. R. (2010) Perceptron learning rule derived from spike-frequency adaptation and spike-time-dependent plasticity PNAS 107(10):4722-4727.
[14] Song, S., Miller, K. D., Abbott, L. F. (2000) Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience 3:919-926.
[15] Izhikevich, E. M., Desai, N. S. (2003) Relating STDP to BCM. Neural Computation 15:1511-1523 
[16] Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C. & Gerstner, W. (2011) Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks Science 334(6062):1569-1573.
[17] Florian, R.V. (2012) The Chronotron: A Neuron That Learns to Fire Temporally Precise Spike Patterns PLoS ONE 7(8): e40233 
[18] Rubin, R., Monasson, R. & Sompolinsky, H. (2010) Theory of Spike Timing-Based Neural Classifiers Physical Review Letters 105(21): 218102.
[19] Ponulak, F., Kasinski, A. (2010) Supervised Learning in Spiking Neural Networks with ReSuMe: Sequence Learning, Classification, and Spike Shifting Neural Computation 22:467-510
[20] Xu, Y., Zeng, X., Zhong, S. (2013) A New Supervised Learning Algorithm for Spiking Neurons Neural Computation 25: 1475-1511 
[21] Legenstein, R., Naeger, C., Maas, W. (2005) What Can a Neuron Learn with Spike-Timing-Dependent Plasticity? Neural Computation 17:2337-2382 
[22] Clopath, C., Busing, L., Vasilaki, E., Gerstner, W. (2010) Connectivity reflects coding: a model of voltage-based STDP with homeostasis Nature Neuroscience 13:344-355
[23] Fino, E., Deniau, J-M., Venance, L. (2009) Brief Subthreshold Events Can Act as Hebbian Signals for Long-Term Plasticity PLoS ONE 4(8): e6557 
-----1
[1] L. R. Harris, M. Jenkin, D. C. Zikovitz, Experimental Brain Research 135, 12 (2000).
[2] R. Bertin, A. Berthoz, Experimental Brain Research 154, 11 (2004).
[3] B. E. Stein, T. R. Stanford, Nature Reviews Neuroscience 9, 255 (2008).
[4] R. J. van Beers, A. C. Sittig, J. J. D. van der Gon, Journal of Neurophysiology 81, 1355 (1999).
[5] M. O. Ernst, M. S. Banks, Nature 415, 429 (2002).
[6] Y. Gu, D. E. Angelaki, G. C. DeAngelis, Nature Neuroscience 11, 1201 (2008).
[7] A. Chen, G. C. DeAngelis, D. E. Angelaki, The Journal of Neuroscience 33, 3567 (2013).
[8] R. A. Jacobs, Vision Research 39, 3621 (1999).
[9] W. J. Ma, J. M. Beck, P. E. Latham, A. Pouget, Nature Neuroscience 9, 1432 (2006).
[10] Y. Gu, P. V. Watkins, D. E. Angelaki, G. C. DeAngelis, The Journal of Neuroscience 26, 73 (2006).
[11] A. Chen, G. C. DeAngelis, D. E. Angelaki, The Journal of Neuroscience 31, 3082 (2011).
[12] D. Boussaoud, L. G. Ungerleider, R. Desimone, Journal of Comparative Neurology 296, 462 (1990).
[13] J. S. Baizer, L. G. Ungerleider, R. Desimone, The Journal of Neuroscience 11, 168 (1991).
[14] J. Vincent, et al., Nature 447, 83 (2007).
[15] A. Chen, G. C. DeAngelis, D. E. Angelaki, The Journal of Neuroscience 31, 12036 (2011).
[16] K. Zhang, The Journal of Neuroscience 16, 2112 (1996).
[17] S. Deneve, P. Latham, A. Pouget, Nature Neuroscience 2, 740 (1999).
[18] S. Wu, S.-I. Amari, H. Nakahara, Neural Computation 14, 999 (2002).
[19] C. A. Fung, K. M. Wong, S. Wu, Neural Computation 22, 752 (2010).
[20] S.-I. Amari, Biological Cybernetics 27, 77 (1977).
[21] A. P. Georgopoulos, M. Taira, A. Lukashin, et al., Science 260, 47 (1993).
[22] A. Samsonovich, B. L. McNaughton, The Journal of Neuroscience 17, 5900 (1997).
[23] C. R. Fetsch, A. Pouget, G. C. DeAngelis, D. E. Angelaki, Nature Neuroscience 15, 146 (2011).
[24] K. H. Britten, M. N. Shadlen, W. T. Newsome, J. A. Movshon, et al., Visual Neuroscience 10, 1157 (1993).
-----1
[1] Barry E. Stein and Terrence R. Stanford. Multisensory integration: Current issues from the perspective of a single neuron. Nature Reviews Neuroscience, 9:255-266, April 2008.
[2] Christoph Kayser, Christopher I. Petkov, and Nikos K. Logothetis. Multisensory interactions in primate auditory cortex: fmri and electrophysiology. Hearing Research, 258:80-88, March 2009.
[3] Stephen J. Huston and Vivek Jayaraman. Studying sensorimotor integration in insects. Current Opinion in Neurobiology, 21:527-534, June 2011.
[4] Barry E. Stein and M. Alex Meredith. The merging of the senses. The MIT Press, 1993.
[5] David A. Bulkin and Jennifer M. Groh. Seeing sounds: Visual and auditory interactions in the brain. Current Opinion in Neurobiology, 16:415-419, July 2006.
[6] Jon Driver and Toemme Noesselt. Multisensory interplay reveals crossmodal influences on sensory-specific brain regions, natural responses, and judgments. Neuron, 57:11-23, January 2008.
[7] Christoph Kayser, Nikos K. Logothetis, and Stefano Panzeri. Visual enhancement of the information representation in auditory cortex. Current Biology, pages 19-24, January 2010.
[8] Asif A. Ghazanfar and Charles E. Schroeder. Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10:278-285, June 2006.
[9] Paul J. Laurienti, Thomas J. Perrault, Terrence R. Stanford, Mark T. Wallace, and Barry E. Stein. On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research, 166:289-297, 2005.
[10] Konrad P. Kording and Joshua B. Tenenbaum. Causal inference in sensorimotor integration. Advances in Neural Information Processing Systems 19, 2007.
[11] Ulrik R. Beierholm, Konrad P. Kording, Ladan Shams, and Wei Ji Ma. Comparing bayesian models for multisensory cue combination without mandatory integration. Advances in Neural Information Processing Systems 20, 2008.
[12] Daniel C. Kadunce, J. William Vaughan, Mark T. Wallace, and Barry E. Stein. The influence of visual and auditory receptive field organization on multisensory integration in the superior colliculus. Experimental Brain Research, 2001.
[13] Wei Ji Ma and Alexandre Pouget. Linking neurons to behavior in multisensory perception: A computational review. Brain Research, 1242:4-12, 2008.
[14] Mark A. Frye. Multisensory systems integration for high-performance motor control in flies. Current Opinion in Neurobiology, 20:347-352, 2010.
[15] Aurel A. Lazar and Yevgeniy B. Slutskiy. Channel Identification Machines. Computational Intelligence and Neuroscience, 2012.
[16] Aurel A. Lazar. Time encoding with an integrate-and-fire neuron with a refractory period. Neurocomputing, 58-60:53-58, June 2004.
[17] Aurel A. Lazar. Population encoding with Hodgkin-Huxley neurons. IEEE Transactions on Information Theory, 56(2), February 2010.
[18] Aurel A. Lazar and Laszlo T. Toth. Perfect recovery and sensitivity analysis of time encoded bandlimited signals. IEEE Transactions on Circuits and Systems-I: Regular Papers, 51(10):2060-2073, 2004.
[19] Aurel A. Lazar and Eftychios A. Pnevmatikakis. Faithful representation of stimuli with a population of integrate-and-fire neurons. Neural Computation, 20(11):2715-2744, November 2008.
[20] Aurel A. Lazar and Yevgeniy B. Slutskiy. Functional identification of spike-processing neural circuits. Neural Computation, in press, 2013.
[21] Anmo J. Kim and Aurel A. Lazar. Recovery of stimuli encoded with a Hodgkin-Huxley neuron using conditional PRCs. In N.W. Schultheiss, A.A. Prinz, and R.J. Butera, editors, Phase Response Curves in Neuroscience. Springer, 2011.
[22] Alain Berlinet and Christine Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers, 2004.
[23] Aurel A. Lazar, Eftychios A. Pnevmatikakis, and Yiyin Zhou. Encoding natural scenes with neural circuits with random thresholds. Vision Research, 2010. Special Issue on Mathematical Models of Visual Coding.
[24] Aurel A. Lazar and Eftychios A. Pnevmatikakis. Reconstruction of sensory stimuli encoded with integrate-and-fire neurons with random thresholds. EURASIP Journal on Advances in Signal Processing, 2009, 2009.
[25] Yevgeniy B. Slutskiy. Identification of Dendritic Processing in Spiking Neural Circuits. PhD thesis, Columbia University, 2013.
-----1
[1] Pietro Berkes, Gergő Orbán, Máté Lengyel, and József Fiser. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science, 331(6013):83-87, 2011.
[2] Scott Kirkpatrick, D. Gelatt Jr., and Mario P Vecchi. Optimization by simulated annealing. science, 220(4598):671-680, 1983.
[3] John J Hopfield and David W Tank. Neural computation of decisions in optimization problems. Biological cybernetics, 52(3):141-152, 1985.
[4] Behzad Kamgar-Parsi. Dynamical stability and parameter selection in neural optimization. In Neural Networks, 1992. IJCNN., International Joint Conference on, volume 4, pages 566-571. IEEE, 1992.
[5] Sreeram VB Aiyer, Mahesan Niranjan, and Frank Fallside. A theoretical investigation into the performance of the hopfield model. Neural Networks, IEEE Transactions on, 1(2):204-215, 1990.
[6] Kate Smith, Marimuthu Palaniswami, and Mohan Krishnamoorthy. Neural techniques for combinatorial optimization with applications. Neural Networks, IEEE Transactions on, 9(6):1301-1318, 1998.
[7] Luonan Chen and Kazuyuki Aihara. Chaotic simulated annealing by a neural network model with transient chaos. Neural networks, 8(6):915-930, 1995.
[8] James P Crutchfield. Critical computation, phase transitions, and hierarchical learning. Towards the Harnessing of Chaos, Amsterdam, 1994.
[9] Jerry A Fodor. Information and association. Notre Dame journal of formal logic, 27(3):307-323, 1986.
[10] Rodney J. Douglas and Kevan A. Martin. Neuronal circuits of the neocortex. Annual review of neuroscience, 27(1):419-451, 2004.
[11] R. J. Douglas and K. A. Martin. A functional microcircuit for cat visual cortex. The Journal of Physiology, 440(1):735-769, January 1991.
[12] Carver Mead. Neuromorphic electronic systems. In Proc. IEEE, 78:1629-1636, 1990.
[13] J. Lazzaro, S. Ryckebusch, M.A. Mahowald, and C.A. Mead. Winner-take-all networks of O(n) complexity. In D.S. Touretzky, editor, Advances in neural information processing systems, volume 2, pages 703-711, San Mateo - CA, 1989. Morgan Kaufmann.
[14] G. Indiveri, B. Linares-Barranco, T.J. Hamilton, A. van Schaik, R. Etienne-Cummings, T. Delbruck, S.-C. Liu, P. Dudek, P. Hafliger, S. Renaud, J. Schemmel, G. Cauwenberghs, J. Arthur, K. Hynna, F. Folowosele, S. Saighi, T. Serrano-Gotarredona, J. Wijekoon, Y. Wang, and K. Boahen. Neuromorphic silicon neuron circuits. Frontiers in Neuroscience, 5:1-23, 2011.
[15] Ueli Rutishauser, Rodney J Douglas, and Jean-Jacques Slotine. Collective stability of networks of winner-take-all circuits. Neural computation, 23(3):735-773, 2011.
[16] Stefano Fusi and Maurizio Mattia. Collective Behavior of Networks with Linear (VLSI) Integrate-and-Fire Neurons. Neural Computation, 11(3):633-652, April 1999.
[17] Maurizio Mattia and Paolo Del Giudice. Population dynamics of interacting spiking neurons. Physical Review E, 66(5):051917+, November 2002.
[18] E.L. Bienenstock, L.N. Cooper, and P.W. Munro. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. The Journal of Neuroscience, 2(1):32-48, 1982.
[19] Per J. Sjostrom, Gina G. Turrigiano, and Sacha B. Nelson. Rate, Timing, and Cooperativity Jointly Determine Cortical Synaptic Plasticity. Neuron, 32(6):1149-1164, December 2001.
[20] Bob Kanefsky and W Taylor. Where the really hard problems are. In Proceedings of IJCAI, volume 91, pages 163-169, 1991.
[21] Xiao-Jing Wang. Neurophysiological and computational principles of cortical rhythms in cognition. Physiological reviews, 90(3):1195-1268, 2010.
[22] Pascal Fries et al. A mechanism for cognitive dynamics: neuronal communication through neuronal coherence. Trends in cognitive sciences, 9(10):474-480, 2005.
[23] Thilo Womelsdorf, Jan-Mathijs Schoffelen, Robert Oostenveld, Wolf Singer, Robert Desimone, Andreas K Engel, and Pascal Fries. Modulation of neuronal interactions through neuronal synchronization. science, 316(5831):1609-1612, 2007.
[24] Peter Lakatos, George Karmos, Ashesh D Mehta, Istvan Ulbert, and Charles E Schroeder. Entrainment of neuronal oscillations as a mechanism of attentional selection. science, 320(5872):110-113, 2008.
[25] Xiao-Jing Wang. Decision making in recurrent neuronal circuits. Neuron, 60(2):215-234, 2008.
[26] G. Indiveri. A current-mode hysteretic winner-take-all network, with excitatory and inhibitory coupling. Analog Integrated Circuits and Signal Processing, 28(3):279-291, September 2001.
-----1
[1] D. J. Amit. Modeling Brain Function: The World of Attractor Neural Networks. Cambridge, 1989.
[2] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Spin-glass models of neural networks. Phys. Rev. A, 32:1007-1018, 1985.
[3] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett., 55:1530-1533, Sep 1985.
[4] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Information storage in neural networks with low levels of activity. Phys. Rev. A, 35:2293-2303, Mar 1987.
[5] L. E. Baum and M. Katz. Convergence rates in the law of large numbers. Transactions of the American Mathematical Society, 120(1):108-123, 1965.
[6] P. Billingsley. Probability and Measure. John Wiley & Sons, second edition, 1986.
[7] E. Bolthausen. Random media and spin glasses: An introduction into some mathematical results and problems. In E. Bolthausen and A. Bovier, editors, Spin Glasses, volume 1900 of Lecture Notes in Mathematics. Springer, 2007.
[8] A. Bovier and V. Gayrard. Hopfield models as generalized random mean field models. In A. Bovier and P. Picco, editors, Mathematical Aspects of Spin Glasses and Neural Networks, pages 3-89. Birkhäuser, 1998.
[9] John Bowlby. Attachment: Volume One of the Attachment and Loss Trilogy. Pimlico, second revised edition, 1997.
[10] L. Cozolino. The Neuroscience of Human Relationships. W. W. Norton, 2006.
[11] F. Crick and G. Mitchison. The function of dream sleep. Nature, 304:111-114, 1983.
[12] L. F. Cugliandolo and M. V. Tsodyks. Capacity of networks with correlated attractors. Journal of Physics A: Mathematical and General, 27(3):741, 1994.
[13] V. Dotsenko. An Introduction to the theory of spin glasses and neural networks. World Scientific, 1994.
[14] A. Edalat and F. Mancinelli. Strong attractors of Hopfield neural networks to model attachment types and behavioural patterns. In IJCNN 2013 Conference Proceedings. IEEE, August 2013.
[15] K. H. Fischer and J. A. Hertz. Spin Glasses (Cambridge Studies in Magnetism). Cambridge, 1993.
[16] T. Geszti. Physical Models of Neural Networks. World Scientific, 1990.
[17] R. J. Glauber. Time-dependent statistics of the Ising model. J. Math. Phys., 294(4), 1963.
[18] H. Gutfreund. Neural networks with hierarchically correlated patterns. Phys. Rev. A, 37:570-577, Jan 1988.
[19] J. A. Hertz, A. S. Krogh, and R. G. Palmer. Introduction To The Theory Of Neural Computation. Westview Press, 1991.
[20] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527-1554, 2006.
[21] R. E. Hoffman. Computer simulations of neural information processing and the schizophrenia-mania dichotomy. Arch Gen Psychiatry., 44(2):178-88, 1987.
[22] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Science, USA, 79:2554-2558, 1982.
[23] T. Lewis, F. Amini, and R. Richard. A General Theory of Love. Vintage, 2000.
[24] Matthias Lowe. On The Storage Capacity of Hopfield Models with Correlated Patterns. Annals of Applied Probability, 8(4):1216-1250, 1998.
[25] M. Mezard, G. Parisi, and M. Virasoro, editors. Spin Glass Theory and Beyond. World Scientific, 1986.
[26] P. Peretto. On learning rules and memory storage abilities of asymmetrical neural networks. J. Phys. France, 49:711-726, 1998.
[27] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted boltzmann machines for collaborative filtering. In Proceedings of the 24th international conference on Machine learning, pages 791-798, 2007.
[28] A. N. Schore. Affect Dysregulation and Disorders of the Self. W. W. Norton, 2003.
[29] T. S. Smith, G. T. Stevens, and S. Caldwell. The familiar and the strange: Hopfield network models for prototype-entrained. In David D. Franks and Thomas S. Smith, editors, Mind, brain, and society: Toward a neurosociology of emotion, volume 5 of Social perspectives on emotion. Elsevier Science/JAI Press, 1999.
[30] M. Tsodyks and M. Feigelman. Enhanced storage capacity in neural networks with low level of activity. Europhysics Letters, 6:101-105, 1988.
-----1
[1] Per Anderson, Gary N. Gross, Terje Lømo, and Ola Sveen. Participation of inhibitory and excitatory interneurones in the control of hippocampal cortical output. In Mary A.B. Brazier, editor, The Interneuron, volume 11. University of California Press, Los Angeles, 1969.
[2] John Carew Eccles, Masao Ito, and János Szentágothai. The cerebellum as a neuronal machine. Springer-Verlag New York, 1967.
[3] Costas Stefanis. Interneuronal mechanisms in the cortex. In Mary A.B. Brazier, editor, The Interneuron, volume 11. University of California Press, Los Angeles, 1969.
[4] Stephen Grossberg. Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52:213-257, 1973.
[5] Michael McCloskey and Neal J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation, 24:109-164, 1989.
[6] Gail A. Carpenter and Stephen Grossberg. The art of adaptive pattern recognition by a self-organising neural network. Computer, 21(3):77-88, 1988.
[7] Mark B. Ring. Continual Learning in Reinforcement Environments. PhD thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, August 1994.
[8] Samuel A. Ellias and Stephen Grossberg. Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Bio. Cybernetics, 1975.
[9] Brad Ermentrout. Complex dynamics in winner-take-all neural nets with slow inhibition. Neural Networks, 5(1):415-431, 1992.
[10] Christoph von der Malsburg. Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14(2):85-100, December 1973.
[11] Teuvo Kohonen. Self-organized formation of topologically correct feature maps. Biological cybernetics, 43(1):59-69, 1982.
[12] Risto Miikkulainen, James A. Bednar, Yoonsuck Choe, and Joseph Sirosh. Computational maps in the visual cortex. Springer Science+Business Media, 2005.
[13] Dale K. Lee, Laurent Itti, Christof Koch, and Jochen Braun. Attention activates winner-take-all competition among visual filters. Nature Neuroscience, 2(4):375-81, April 1999.
[14] Matthias Oster and Shih-Chii Liu. Spiking inputs to a winner-take-all network. In Proceedings of NIPS, volume 18. MIT; 1998, 2006.
[15] John P. Lazzaro, Sylvie Ryckebusch, Misha Anne Mahowald, and Carver A. Mead. Winner-take-all networks of O(n) complexity. Technical report, 1988.
[16] Giacomo Indiveri. Modeling selective attention using a neuromorphic analog VLSI device. Neural Computation, 12(12):2857-2880, 2000.
[17] Wolfgang Maass. Neural computation with winner-take-all as the only nonlinear operation. In Proceedings of NIPS, volume 12, 1999.
[18] Wolfgang Maass. On the computational power of winner-take-all. Neural Computation, 12:2519-2535, 2000.
[19] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. In Proceedings of the ICML, 2013.
[20] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors, 2012. arXiv:1207.0580.
[21] Juergen Schmidhuber. A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4):403-412, 1989.
[22] Rupesh K. Srivastava, Bas R. Steunebrink, and Juergen Schmidhuber. First experiments with powerplay. Neural Networks, 2013.
[23] Maximilian Riesenhuber and Tomaso Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1999.
[24] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of NIPS, pages 1-9, 2012.
[25] Dan Ciresan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. Proceedings of the CVPR, 2012.
[26] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the ICML, number 3, 2010.
[27] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier networks. In AISTATS, volume 15, pages 315-323, 2011.
[28] George E. Dahl, Tara N. Sainath, and Geoffrey E. Hinton. Improving Deep Neural Networks for LVCSR using Rectified Linear Units and Dropout. In Proceedings of ICASSP, 2013.
[29] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the ICML, 2013.
[30] Tijmen Tieleman. Gnumpy: an easy way to use GPU boards in Python. Department of Computer Science, University of Toronto, 2010.
[31] Volodymyr Mnih. CUDAMat: a CUDA-based matrix class for Python. Department of Computer Science, University of Toronto, Tech. Rep. UTML TR, 4, 2009.
[32] Patrice Y. Simard, Dave Steinkraus, and John C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In International Conference on Document Analysis and Recognition (ICDAR), 2003.
[33] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
[34] Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann LeCun. Efficient learning of sparse representations with an energy-based model. In Proceedings of NIPS, 2007.
[35] Matthew D. Zeiler and Rob Fergus. Stochastic pooling for regularization of deep convolutional neural networks. In Proceedings of the ICLR, 2013.
[36] Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun. What is the best multi-stage architecture for object recognition? In Proc. of the ICCV, pages 2146-2153, 2009.
[37] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. Annual Meeting-ACL, 2007.
[38] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the ICML, number 1, 2011.
-----1
[1] D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009.
[2] T. Cacoullos. Estimation of a multivariate density. Annals of the Institute of Statistical Mathematics, 18(1):179-189, 1966.
[3] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In International Conference on Computer Vision, pages 479-486. IEEE, 2011.
[4] R. Salakhutdinov and I. Murray. On the quantitative analysis of deep belief networks. In Proceedings of the 25th International Conference on Machine learning, pages 872-879. Omnipress, 2008.
[5] H. Larochelle and I. Murray. The neural autoregressive distribution estimator. Journal of Machine Learning Research W&CP, 15:29-37, 2011.
[6] C. M. Bishop. Mixture density networks. Technical Report NCRG 4288, Neural Computing Research Group, Aston University, Birmingham, 1994.
[7] B. J. Frey, G. E. Hinton, and P. Dayan. Does the wake-sleep algorithm produce good density estimators? In Advances in Neural Information Processing Systems 8, pages 661-670. MIT Press, 1996.
[8] Y. Bengio and S. Bengio. Modeling high-dimensional discrete data with multi-layer neural networks. Advances in Neural Information Processing Systems, 12:400-406, 2000.
[9] H. Larochelle and S. Lauly. A neural autoregressive topic model. In Advances in Neural Information Processing Systems 25, 2012.
[10] Y. Bengio. Discussion of the neural autoregressive distribution estimator. Journal of Machine Learning Research W&CP, 15:38-39, 2011.
[11] L. Theis, R. Hosseini, and M. Bethge. Mixtures of conditional Gaussian scale mixtures applied to multiscale image representations. PLoS ONE, 7(7), 2012. doi: 10.1371/journal.pone.0039857.
[12] N. Friedman and I. Nachman. Gaussian process networks. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 211-219. Morgan Kaufmann Publishers Inc., 2000.
[13] I. Murray and R. Salakhutdinov. Evaluating probabilities under high-dimensional latent variable models. In Advances in Neural Information Processing Systems 21, pages 1137-1144, 2009.
[14] L. Theis, S. Gerwinn, F. Sinz, and M. Bethge. In all likelihood, deep belief is not enough. Journal of Machine Learning Research, 12:3071-3096, 2011.
[15] M. A. Ranzato and G. E. Hinton. Modeling pixel means and covariances using factorized third-order Boltzmann machines. In Computer Vision and Pattern Recognition, pages 2551-2558. IEEE, 2010.
[16] A. Courville, J. Bergstra, and Y. Bengio. A spike and slab restricted Boltzmann machine. Journal of Machine Learning Research, W&CP, 15, 2011.
[17] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, pages 807-814. Omnipress, 2010.
[18] J. S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Neuro-computing: algorithms, architectures and applications, pages 227-236. Springer-Verlag, 1989.
[19] T. Robinson. SHORTEN: simple lossless and near-lossless waveform compression. Technical Report CUED/F-INFENG/TR.156, Engineering Department, Cambridge University, 1994.
[20] Y. Tang, R. Salakhutdinov, and G. Hinton. Deep mixtures of factor analysers. In Proceedings of the 29th International Conference on Machine Learning, pages 505-512. Omnipress, 2012.
[21] D. Zoran and Y. Weiss. Natural images, Gaussian mixtures and dead leaves. Advances in Neural Information Processing Systems, 25:1745-1753, 2012.
[22] K. Bache and M. Lichman. UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml.
[23] R. Silva, C. Blundell, and Y. W. Teh. Mixed cumulative distribution networks. Journal of Machine Learning Research W&CP, 15:670-678, 2011.
[24] Z. Ghahramani and G. E. Hinton. The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, 1996.
[25] J. Verbeek. Mixture of factor analyzers Matlab implementation, 2005. http://lear.inrialpes.fr/~verbeek/code/.
[26] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In International Conference on Computer Vision, volume 2, pages 416-423. IEEE, July 2001.
[27] D. Zoran. Personal communication, 2013.
[28] B. Frey. Graphical models for machine learning and digital communication. MIT Press, 1998.
[29] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue. Timit acoustic-phonetic continuous speech corpus. Linguistic Data Consortium, 10(5):0, 1993.
[30] T. Schaul, S. Zhang, and Y. LeCun. No More Pesky Learning Rates. In Proceedings of the 30th international conference on Machine learning, 2013.
[31] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13:281-305, 2012.
[32] J. Snoek, H. Larochelle, and R. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25, pages 2960-2968, 2012.
[33] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. Arxiv preprint arXiv:1207.0580, 2012.
-----1
[1] J W Pillow, J Shlens, E J Chichilnisky, and E P Simoncelli. A model-based spike sorting algorithm for removing correlation artifacts in multi-neuron recordings. PLoS ONE, 8(5):1-15, 2013.
[2] J S Prentice, J Homann, K D Simmons, G Tkačik, V Balasubramanian, and P C Nelson. Fast, scalable, Bayesian spike identification for multi-electrode arrays. PLoS ONE, 6(7):e19884, January 2011.
[3] F Franke, M Natora, C Boucsein, M H J Munk, and K Obermayer. An online spike detection and spike classification algorithm capable of instantaneous resolution of overlapping spikes. Journal of Computational Neuroscience, 29(1-2):127-148, August 2010.
[4] W Gerstner and W M Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 1 edition, August 2002.
[5] M S Lewicki. A review of methods for spike sorting: the detection and classification of neural action potentials. Network: Computation in Neural Systems, 1998.
[6] C E Rasmussen and C K I Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[7] J F C Kingman. Poisson processes, volume 3 of Oxford Studies in Probability. The Clarendon Press Oxford University Press, New York, 1993. Oxford Science Publications.
[8] F Wood and M J Black. A non-parametric Bayesian alternative to spike sorting. Journal of Neuroscience Methods, 173:1-12, 2008.
[9] J F C Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59-78, 1967.
[10] L F James, A Lijoi, and I Pruenster. Posterior analysis for normalized random measures with independent increments. Scand. J. Stat., 36:76-97, 2009.
[11] N L Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. Annals of Statistics, 18(3):1259-1294, 1990.
[12] R Thibaux and M I Jordan. Hierarchical beta processes and the Indian buffet process. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, volume 11, 2007.
[13] K Sato. Levy Processes and Infinitely Divisible Distributions. Cambridge University Press, 1990.
[14] D Applebaum. Levy Processes and Stochastic Calculus. Cambridge studies in advanced mathematics. University Press, 2004.
[15] T S Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209-230, 1973.
[16] A Y Lo. On a class of bayesian nonparametric estimates: I. density estimates. Annals of Statistics, 12(1):351-357, 1984.
[17] J Pitman. Combinatorial stochastic processes. Technical Report 621, Department of Statistics, University of California at Berkeley, 2002. Lecture notes for St. Flour Summer School.
[18] R M Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9:249-265, 2000.
[19] H Ishwaran and L F James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161-173, 2001.
[20] D M Blei and M I Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1):121-144, 2006.
[21] T P Minka and Z Ghahramani. Expectation propagation for infinite mixtures. Presented at NIPS2003 Workshop on Nonparametric Bayesian Methods and Infinite Models, 2003.
[22] L Wang and D B Dunson. Fast bayesian inference in dirichlet process mixture models. Journal of Computational & Graphical Statistics, 2009.
[23] D A Henze, Z Borhegyi, J Csicsvari, A Mamiya, K D Harris, and G Buzsaki. Intracellular features predicted by extracellular recordings in the hippocampus in vivo. J. Neurophysiology, 2000.
[24] D E Carlson, Q Wu, W Lian, M Zhou, C R Stoetzner, D Kipke, D Weber, J T Vogelstein, D B Dunson, and L Carin. Multichannel Electrophysiological Spike Sorting via Joint Dictionary Learning and Mixture Modeling. IEEE TBME, 2013.
[25] U Rutishauser, E M Schuman, and A N Mamelak. Online detection and sorting of extracellularly recorded action potentials in human medial temporal lobe recordings, in vivo. J. Neuro. Methods, 2006.
[26] A Calabrese and L Paninski. Kalman filter mixture model for spike sorting of non-stationary data. Journal of neuroscience methods, 196(1):159-169, 2011.
[27] J Gasthaus, F D Wood, D Gorur, and Y W Teh. Dependent dirichlet process spike sorting. Advances in neural information processing systems, 21:497-504, 2009.
[28] G Chen, M Iwen, S Chin, and M. Maggioni. A fast multiscale framework for data in high-dimensions: Measure estimation, anomaly detection, and compressive measurements. In VCIP, 2012 IEEE, 2012.
-----1
[1] J. Pearl and E. Bareinboim. Transportability of causal and statistical relations: A formal approach. In W. Burgard and D. Roth, editors, Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence, pages 247-254. AAAI Press, Menlo Park, CA, 2011.
[2] D. Campbell and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, Chicago, 1963.
[3] C. Manski. Identification for Prediction and Decision. Harvard University Press, Cambridge, Massachusetts, 2007.
[4] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, January 1985.
[5] W.R. Shadish, T.D. Cook, and D.T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston, second edition, 2002.
[6] S. Morgan and C. Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY, 2007.
[7] J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669-710, 1995.
[8] P. Spirtes, C.N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000.
[9] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009.
[10] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[11] E. Bareinboim and J. Pearl. Transportability of causal effects: Completeness results. In J. Hoffmann and B. Selman, editors, Proceedings of the Twenty-Sixth National Conference on Artificial Intelligence, pages 698-704. AAAI Press, Menlo Park, CA, 2012.
[12] E. Bareinboim and J. Pearl. Causal transportability with limited experiments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 95-101, Menlo Park, CA, 2013. AAAI Press.
[13] S. Lee and V. Honavar. Causal transportability of experiments on controllable subsets of variables: z-transportability. In A. Nicholson and P. Smyth, editors, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), pages 361-370. AUAI Press, 2013.
[14] E. Bareinboim and J. Pearl. Meta-transportability of causal effects: A formal approach. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pages 135-143. JMLR W&CP 31, 2013.
[15] S. Lee and V. Honavar. m-transportability: Transportability of a causal effect from multiple environments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 583-590, Menlo Park, CA, 2013. AAAI Press.
[16] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101-126, 2006.
[17] A.J. Storkey. When training and test sets are different: characterising learning transfer. In J. Candela, M. Sugiyama, A. Schwaighofer, and N.D. Lawrence, editors, Dataset Shift in Machine Learning, pages 3-28. MIT Press, Cambridge, MA, 2009.
[18] B. Scholkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal learning. In J Langford and J Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255-1262, New York, NY, USA, 2012. Omnipress.
[19] K. Zhang, B. Scholkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR: W&CP volume 28, 2013.
[20] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identifiability. In N. Freitas and K. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI), pages 113-120. AUAI Press, 2012.
[21] M. Kuroki and M. Miyakawa. Identifiability criteria for causal effects of joint interventions. Journal of the Royal Statistical Society, 29:105-117, 1999.
[22] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567-573. AAAI Press/The MIT Press, Menlo Park, CA, 2002.
[23] I. Shpitser and J. Pearl. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219-1226. AAAI Press, Menlo Park, CA, 2006.
-----0
N. Ancona, D. Marinazzo, and S. Stramaglia. Radial basis function approach to nonlinear Granger causality of time series. Phys. Rev. E, 70(5):056221, 2004.
A. Asuncion and D. J. Newman. UCI repository. http://archive.ics.uci.edu/ml/, 2007.
A. Azzalini and A. W. Bowman. A look at some data on the Old Faithful Geyser. Applied Statistics, 39(3):357-365, 1990.
D. Bell, J. Kay, and J. Malley. A non-parametric approach to non-linear causality testing. Economics Letters, 51(1):7-18, 1996.
G. E. P. Box, G. M. Jenkins, and G. C. Reinsel. Time series analysis: forecasting and control. Wiley series in probability and statistics. John Wiley, 2008.
P. J. Brockwell and R. A. Davis. Time Series: Theory and Methods. Springer, 2nd edition, 1991.
R. B. Buxton, E. C. Wong, and L. R. Frank. Dynamics of blood flow and oxygenation changes during brain activation: The balloon model. Magnetic Resonance in Medicine, 39(6):855-864, 1998.
Y. Chen, G. Rangarajan, J. Feng, and M. Ding. Analyzing multiple nonlinear time series with extended Granger causality. Physics Letters A, 324, 2004.
Z. Chen, K. Zhang, and L. Chan. Causal discovery with scale-mixture model for spatiotemporal variance dependencies. In NIPS 25, 2012.
T. Chu and C. Glymour. Search for additive nonlinear time series causal models. Journal of Machine Learning Research, 9:967-991, 2008.
M. Eichler. Graphical modelling of multivariate time series. Probability Theory and Related Fields, 2011.
D. Entner and P. Hoyer. Discovering unconfounded causal relationships using linear non-Gaussian models. In JSAI-isAI Workshops, 2010.
J. P. Florens and M. Mouchart. A note on noncausality. Econometrica, 50(3):583-591, 1982.
C. W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3):424-38, July 1969.
A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Scholkopf, and A. Smola. A kernel statistical test of independence. In NIPS 20, Canada, 2008.
T. J. Hastie and R. J. Tibshirani. Generalized additive models. London: Chapman & Hall, 1990.
P. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Scholkopf. Nonlinear causal discovery with additive noise models. In NIPS 21, Canada, 2009.
A. Hyvarinen, S. Shimizu, and P. Hoyer. Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-gaussianity. In ICML 25, 2008.
D. Janzing and B. Steudel. Justifying additive-noise-model based causal discovery via algorithmic information theory. Open Systems and Information Dynamics, 17:189-212, 2010.
D. Janzing, J. Peters, J.M. Mooij, and B. Scholkopf. Identifying confounders using additive noise models. In UAI 25, 2009.
J. Mooij, D. Janzing, J. Peters, and B. Scholkopf. Regression by dependence minimization and its application to causal inference. In ICML 26, 2009.
G. Nolte, A. Ziehe, V. Nikulin, A. Schlogl, N. Kramer, T. Brismar, and K.-R. Muller. Robustly Estimating the Flow Direction of Information in Complex Physical Systems. Physical Review Letters, 100, 2008.
J. Pearl. Causality: Models, reasoning, and inference. Cambridge Univ. Press, 2nd edition, 2009.
J. Peters, D. Janzing, A. Gretton, and B. Scholkopf. Detecting the dir. of causal time series. In ICML 26, 2009.
J. Peters, D. Janzing, and B. Scholkopf. Causal inference on discrete data using additive noise models. IEEE Trans. Pattern Analysis Machine Intelligence, 33(12):2436-2450, 2011a.
J. Peters, J. Mooij, D. Janzing, and B. Scholkopf. Identifiability of causal graphs using functional models. In UAI 27, 2011b.
J. Peters, J. Mooij, D. Janzing, and B. Scholkopf. Causal discovery with continuous additive noise models, 2013. arXiv:1309.6779.
C. Quinn, T. Coleman, N. Kiyavash, and N. Hatsopoulos. Estimating the directed information to infer causal relationships in ensemble neural spike train recordings. Journal of Comp. Neuroscience, 30(1):17-44, 2011.
S. Shimizu, P. Hoyer, A. Hyvarinen, and A. J. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003-2030, 2006.
S. M. Smith, K. L. Miller, G. Salimi-Khorshidi, M. Webster, C. F. Beckmann, T. E. Nichols, J. D. Ramsey, and M. W. Woolrich. Network modelling methods for FMRI. NeuroImage, 54(2):875-891, 2011.
P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000.
M. Yamada and M. Sugiyama. Dependence minimizing regression with model selection for non-linear causal inference under non-Gaussian noise. In AAAI. AAAI Press, 2010.
-----0
Anandkumar, Anima, Chaudhuri, Kamalika, Hsu, Daniel, Kakade, Sham, Song, Le, & Zhang, Tong. 2011. Spectral Methods for Learning Multivariate Latent Tree Structure. Proceedings of NIPS 24, 2025-2033.
Anandkumar, Anima, Foster, Dean, Hsu, Daniel, Kakade, Sham, & Liu, Yi-Kai. 2012a. A spectral algorithm for latent Dirichlet allocation. Proceedings of NIPS 25, 926-934.
Anandkumar, Animashree, Hsu, Daniel, & Kakade, Sham M. 2012b. A method of moments for mixture models and hidden Markov models. In: Proceedings of COLT.
Anandkumar, Animashree, Javanmard, Adel, Hsu, Daniel J, & Kakade, Sham M. 2013. Learning Linear Bayesian Networks with Latent Variables. Pages 249-257 of: Proceedings of ICML.
Chang, Joseph T. 1996. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Mathematical biosciences, 137(1), 51-73.
Cooper, Gregory F. 1987. Probabilistic Inference Using Belief Networks Is NP-Hard. Technical Report BMIR-1987-0195. Medical Computer Science Group, Stanford University.
Elidan, Gal, & Friedman, Nir. 2006. Learning hidden variable networks: The information bottleneck approach. Journal of Machine Learning Research, 6(1), 81.
Elidan, Gal, Lotner, Noam, Friedman, Nir, & Koller, Daphne. 2001. Discovering hidden variables: A structure-based approach. Advances in Neural Information Processing Systems, 479-485.
Elsner, Ludwig. 1985. An optimal bound for the spectral variation of two matrices. Linear algebra and its applications, 71, 77-80.
Eriksson, Nicholas. 2005. Tree construction using singular value decomposition. Algebraic Statistics for computational biology, 347-358.
Friedman, Nir. 1997. Learning Belief Networks in the Presence of Missing Values and Hidden Variables. Pages 125-133 of: ICML 97.
Halpern, Yoni, & Sontag, David. 2013. Unsupervised Learning of Noisy-Or Bayesian Networks. In: Conference on Uncertainty in Artificial Intelligence (UAI-13).
Ishteva, Mariya, Park, Haesun, & Song, Le. 2013. Unfolding Latent Tree Structures using 4th Order Tensors. In: ICML 13.
Kearns, Michael, & Mansour, Yishay. 1998. Exact inference of hidden structure from sample data in noisy-OR networks. Pages 304-310 of: Proceedings of UAI 14.
Lazarsfeld, Paul. 1950. Latent Structure Analysis. In: Stouffer, Samuel, Guttman, Louis, Suchman, Edward, Lazarsfeld, Paul, Star, Shirley, & Clausen, John (eds), Measurement and Prediction. Princeton, New Jersey: Princeton University Press.
Lazic, Nevena, Bishop, Christopher M, & Winn, John. 2013. Structural Expectation Propagation: Bayesian structure learning for networks with latent variables. In: Proceedings of AISTATS 16.
Martin, J, & VanLehn, Kurt. 1995. Discrete factor analysis: Learning hidden variables in Bayesian networks. Tech. rept. Department of Computer Science, University of Pittsburgh.
Mossel, Elchanan, & Roch, Sebastien. 2005. Learning nonsingular phylogenies and hidden Markov models. Pages 366-375 of: Proceedings of 37th STOC. ACM.
Pearl, Judea, & Tarsi, Michael. 1986. Structuring causal trees. Journal of Complexity, 2(1), 60-77.
Saund, Eric. 1995. A multiple cause mixture model for unsupervised learning. Neural Computation, 7(1), 51-71.
Shwe, Michael A, Middleton, B, Heckerman, DE, Henrion, M, Horvitz, EJ, Lehmann, HP, & Cooper, GF. 1991. Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Meth. Inform. Med, 30, 241-255.
Silva, Ricardo, Scheines, Richard, Glymour, Clark, & Spirtes, Peter. 2006. Learning the structure of linear latent variable models. The Journal of Machine Learning Research, 7, 191-246.
Šingliar, Tomáš, & Hauskrecht, Miloš. 2006. Noisy-or component analysis and its application to link analysis. The Journal of Machine Learning Research, 7, 2189-2213.
-----1
[1] A. Anandkumar, D. P. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu. A spectral algorithm for latent dirichlet allocation. arXiv preprint arXiv:1204.6703v4, 2013.
[2] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. arXiv preprint arXiv:1210.7559v2, 2012.
[3] A. Anandkumar, D. Hsu, and S. M. Kakade. A method of moments for mixture models and hidden Markov models. arXiv preprint arXiv:1203.0683, 2012.
[4] S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, and M. Zhu. A practical algorithm for topic modeling with provable guarantees. arXiv preprint arXiv:1212.4777, 2012.
[5] D. Hsu and S. M. Kakade. Learning mixtures of spherical gaussians: moment methods and spectral decompositions. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 11-20. ACM, 2013.
[6] T.-K. Huang and J. Schneider. Learning linear dynamical systems without sequence information. In Proceedings of the 26th International Conference on Machine Learning, pages 425-432, 2009.
[7] T.-K. Huang and J. Schneider. Learning auto-regressive models from sequence and non-sequence data. In Advances in Neural Information Processing Systems 24, pages 1548-1556. 2011.
[8] T.-K. Huang, L. Song, and J. Schneider. Learning nonlinear dynamic models from non-sequenced data. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010.
[9] M. G. Rabbat, M. A. Figueiredo, and R. D. Nowak. Network inference from co-occurrences. Information Theory, IEEE Transactions on, 54(9):4053-4068, 2008.
[10] G. Stewart. On the perturbation of pseudo-inverses, projections and linear least squares problems. SIAM review, 19(4):634-662, 1977.
[11] X. Zhu, A. B. Goldberg, M. Rabbat, and R. Nowak. Learning bigrams from unigrams. In the Proceedings of 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technology, Columbus, OH, 2008.
-----1
[1] Andrew Blake, Carsten Rother, Matthew Brown, Patrick Perez, and Philip Torr. Interactive image segmentation using an adaptive gmmrf model. In ECCV 2004, pages 428-441. 2004.
[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 2001.
[3] O. Catoni. Pac-bayesian supervised classification: the thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248, 2007.
[4] G.B. Folland. Real analysis: Modern techniques and their applications. John Wiley & Sons, New York, 1999.
[5] P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. Pac-bayesian learning of linear classifiers. In ICML, pages 353-360. ACM, 2009.
[6] A. Globerson and T. S. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. Advances in Neural Information Processing Systems, 21, 2007.
[7] L.A. Goldberg and M. Jerrum. The complexity of ferromagnetic ising with local fields. Combinatorics Probability and Computing, 16(1):43, 2007.
[8] T. Hazan and T. Jaakkola. On the partition function and random maximum a-posteriori perturbations. In Proceedings of the 29th International Conference on Machine Learning, 2012.
[9] T. Hazan, S. Maji, and T. Jaakkola. On sampling from the gibbs distribution with random maximum a-posteriori perturbations. Advances in Neural Information Processing Systems, 2013.
[10] J. Keshet, D. McAllester, and T. Hazan. Pac-bayesian approach for minimization of phoneme error rate. In ICASSP, 2011.
[11] John Langford and John Shawe-Taylor. Pac-bayes & margins. Advances in neural information processing systems, 15:423-430, 2002.
[12] Erich Leo Lehmann and George Casella. Theory of point estimation, volume 31. 1998.
[13] Andreas Maurer. A note on the pac bayesian theorem. arXiv preprint cs/0411099, 2004.
[14] D. McAllester. Simplified pac-bayesian margin bounds. Learning Theory and Kernel Machines, pages 203-215, 2003.
[15] D. McAllester, T. Hazan, and J. Keshet. Direct loss minimization for structured prediction. Advances in Neural Information Processing Systems, 23:1594-1602, 2010.
[16] Francesco Orabona, Tamir Hazan, Anand D Sarwate, and Tommi Jaakkola. On measure concentration of random maximum a-posteriori perturbations. arXiv:1310.4227, 2013.
[17] G. Papandreou and A. Yuille. Perturb-and-map random fields: Using discrete optimization to learn and sample from energy models. In ICCV, Barcelona, Spain, November 2011.
[18] A.M. Rush and M. Collins. A tutorial on dual decomposition and lagrangian relaxation for inference in natural language processing.
[19] Matthias Seeger. Pac-bayesian generalisation error bounds for gaussian process classification. The Journal of Machine Learning Research, 3:233-269, 2003.
[20] Yevgeny Seldin. A PAC-Bayesian Approach to Structure Learning. PhD thesis, 2009.
[21] D. Sontag, T. Meltzer, A. Globerson, T. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. In Conf. Uncertainty in Artificial Intelligence (UAI), 2008.
[22] Martin Szummer, Pushmeet Kohli, and Derek Hoiem. Learning crfs using graph cuts. In Computer Vision - ECCV 2008, pages 582-595. Springer, 2008.
[23] D. Tarlow, R.P. Adams, and R.S. Zemel. Randomized optimum models for structured prediction. In AISTATS, pages 21-23, 2012.
[24] Daniel Tarlow and Richard S Zemel. Structured output learning with high order loss functions. In International Conference on Artificial Intelligence and Statistics, pages 1212-1220, 2012.
[25] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. Advances in neural information processing systems, 16:51, 2004.
[26] Max Welling. Herding dynamical weights to learn. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1121-1128. ACM, 2009.
-----0
R Iris Bahar, Erica A Frohm, Charles M Gaona, Gary D Hachtel, Enrico Macii, Abelardo Pardo, and Fabio Somenzi. Algebraic decision diagrams and their applications. In IEEE/ACM International Conference on Computer-Aided Design, pages 188-191, 1993.
David Barber. Bayesian reasoning and machine learning. Cambridge University Press, 2011.
Craig Boutilier, Richard Dearden, and Moises Goldszmidt. Stochastic dynamic programming with factored representations. Artificial Intelligence, 121(1):49-107, 2000.
Nicklas Forsell and Regis Sabbadin. Approximate linear-programming algorithms for graph-based markov decision processes. Frontiers in Artificial Intelligence and Applications, 141:590, 2006.
Carlos Guestrin, Daphne Koller, and Ronald Parr. Multiagent planning with factored MDPs. Advances in Neural Information Processing Systems, 14:1523-1530, 2001.
Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. Efficient solution algorithms for factored mdps. Journal of Artificial Intelligence Research, 19:399-468, 2003.
Jesse Hoey, Robert St-Aubin, Alan Hu, and Craig Boutilier. SPUDD: Stochastic planning using decision diagrams. In Proceedings of the Fifteenth conference on Uncertainty in Artificial Intelligence, pages 279-288, 1999.
Daphne Koller and Ronald Parr. Policy iteration for factored mdps. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 326-334, 2000.
Qiang Liu and Alexander Ihler. Variational algorithms for marginal MAP. In Uncertainty in Artificial Intelligence (UAI), 2011.
Qiang Liu and Alexander Ihler. Belief propagation for structured decision making. In Uncertainty in Artificial Intelligence (UAI), pages 523-532, August 2012.
A. Nath and P. Domingos. Efficient belief propagation for utility maximization and repeated inference. In The Proceeding of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.
Nathalie Peyrard and Regis Sabbadin. Mean field approximation of the policy iteration algorithm for graph-based markov decision processes. Frontiers in Artificial Intelligence and Applications, 141:595, 2006.
Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. Wiley-Interscience, 2009.
Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 61-70, 2002.
R. Sabbadin, N. Peyrard, and N. Forsell. A framework and a mean-field algorithm for the local control of spatial processes. International Journal of Approximate Reasoning, 53(1):66-86, 2012.
Ross D Shachter. Model building with belief networks and influence diagrams. Advances in decision analysis: from foundations to applications, pages 177-201, 2007.
Olivier Sigaud, Olivier Buffet, et al. Markov decision processes in artificial intelligence. ISTE-John Wiley & Sons, 2010.
M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
-----1
[1] D. Gamerman and H. F. Lopes. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. Chapman & Hall Texts in Statistical Science Series. Taylor & Francis, 2006.
[2] C. P. Robert and G. Casella. Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2005.
[3] R. E. Kass and D. Steffey. Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J. Am. Statist. Assoc., 84(407):717-726, 1989.
[4] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In Learning in graphical models, pages 105-161, Cambridge, MA, 1999. MIT Press.
[5] T. P. Minka. Expectation propagation for approximate Bayesian inference. In J. S. Breese and D. Koller, editors, Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pages 362-369, 2001.
[6] J. T. Ormerod. Skew-normal variational approximations for Bayesian inference. Technical Report CRG-TR-93-1, School of Mathematics and Statistics, University of Sydney, 2011.
[7] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B, 71(2):319-392, 2009.
[8] J. Hensman, M. Rattray, and N. D. Lawrence. Fast variational inference in the conjugate exponential family. In Advances in Neural Information Processing Systems, 2012.
[9] J. Foulds, L. Boyles, C. Dubois, P. Smyth, and M. Welling. Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation. In 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2013.
[10] J. W. Paisley, D. M. Blei, and M. I. Jordan. Variational Bayesian inference with stochastic search. In International Conference on Machine Learning, 2012.
[11] C. Wang and D. M. Blei. Truncation-free online variational inference for Bayesian nonparametric models. In Advances in Neural Information Processing Systems, 2012.
[12] S. J. Gershman, M. D. Hoffman, and D. M. Blei. Nonparametric variational inference. In International Conference on Machine Learning, 2012.
[13] E. Challis and D. Barber. Concave Gaussian variational approximations for inference in large-scale Bayesian linear models. Journal of Machine Learning Research - Proceedings Track, 15:199-207, 2011.
[14] M. E. Khan, S. Mohamed, and K. P. Murphy. Fast Bayesian inference for non-conjugate Gaussian process regression. In Advances in Neural Information Processing Systems, 2012.
[15] M. J. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
[16] C. Ritter and M. A. Tanner. Facilitating the Gibbs sampler: The Gibbs stopper and the griddy-Gibbs sampler. J. Am. Statist. Assoc., 87(419):861-868, 1992.
[17] M. Opper and C. Archambeau. The variational Gaussian approximation revisited. Neural Comput., 21(3):786-792, 2009.
[18] E. Challis and D. Barber. Affine independence variational inference. In Advances in Neural Information Processing Systems, 2012.
[19] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[20] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. J. Am. Statist. Assoc., 81:82-86, 1986.
[21] T. Park and G. Casella. The Bayesian Lasso. J. Am. Statist. Assoc., 103(482):681-686, 2008.
[22] D. F. Andrews and C. L. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society. Series B, 36(1):99-102, 1974.
[23] R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58:267-288, 1996.
[24] G. H. Golub and C. V. Loan. Matrix Computations (Third Edition). Johns Hopkins University Press, 1996.
[25] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407-499, 2004.
[26] C. Hans. Bayesian Lasso regression. Biometrika, 96(4):835-845, 2009.
[27] T. Stamey, J. Kabalin, J. McNeal, I. Johnstone, F. Freha, E. Redwine, and N. Yang. Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. ii. radical prostatectomy treated patients. Journal of Urology, 16:1076-1083, 1989.
[28] M. W. Seeger and H. Nickisch. Large scale Bayesian inference and experimental design for sparse linear models. SIAM J. Imaging Sciences, 4(1):166-199, 2011.
[29] B. Cseke and T. Heskes. Approximate marginals in latent Gaussian models. J. Mach. Learn. Res., 12:417-454, 2011.
-----1
[1] H. Attias. Inferring parameters and structure of latent variable models by variational Bayes. In Proc. of UAI, pages 21-30, 1999.
[2] S. D. Babacan, S. Nakajima, and M. N. Do. Probabilistic low-rank subspace clustering. In Advances in Neural Information Processing Systems 25, pages 2753-2761, 2012.
[3] J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society B, 48:259-302, 1986.
[4] C. M. Bishop. Variational principal components. In Proc. of International Conference on Artificial Neural Networks, volume 1, pages 514509, 1999.
[5] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, NY, USA, 2006.
[6] F. J. Drexler. A homotopy method for the calculation of all zeros of zero-dimensional polynomial ideals.In H. J. Wacker, editor, Continuation methods, pages 6993, New York, 1978. Academic Press.
[7] E. Elhamifar and R. Vidal. Sparse subspace clustering. In Proc. of CVPR, pages 27902797, 2009.
[8] P. Favaro, R. Vidal, and A. Ravichandran. A closed form solution to robust subspace estimation and clustering. In Proceedings of CVPR, pages 18011807, 2011.
[9] C. B. Garcia and W. I. Zangwill. Determining all solutions to certain systems of nonlinear equations.Mathematics of Operations Research, 4:114, 1979.
[10] T. Gunji, S. Kim, M. Kojima, A. Takeda, K. Fujisawa, and T. Mizutani. Phoma polyhedral homotopy continuation method. Computing, 73:5777, 2004.
[11] A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman and Hall/CRC, 1999.
[12] T. L. Lee, T. Y. Li, and C. H. Tsai. Hom4ps-2.0: a software package for solving polynomial systems by the polyhedral homotopy continuation method. Computing, 83:109133, 2008.
[13] G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In Proc. of ICML, pages 663670, 2010.
[14] G. Liu, H. Xu, and S. Yan. Exact subspace segmentation and outlier detection by low-rank representation.In Proc. of AISTATS, 2012.
[15] G. Liu and S. Yan. Latent low-rank representation for subspace segmentation and feature extraction. In Proc. of ICCV, 2011.
[16] S. Nakajima, M. Sugiyama, and S. D. Babacan. Variational Bayesian sparse additive matrix factorization.Machine Learning, 92:3191347, 2013.
[17] S. Nakajima, M. Sugiyama, S. D. Babacan, and R. Tomioka. Global analytic solution of fully-observed variational Bayesian matrix factorization. Journal of Machine Learning Research, 14:137, 2013.
[18] M. Seeger and G. Bouchard. Fast variational Bayesian inference for non-conjugate matrix factorization models. In Proceedings of International Conference on Artificial Intelligence and Statistics, La Palma, Spain, 2012.
[19] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intell., 22(8):888905, 2000.
[20] M. Soltanolkotabi and E. J. Cande`s. A geometric analysis of subspace clustering with outliers. CoRR, 2011.
[21] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In Proc.of CVPR, 2007.
-----1
[1] S. Barthelme and N. Chopin. ABC-EP: Expectation Propagation for likelihood-free Bayesian computation. In Proceedings of the 28th International Conference on Machine Learning, 2011.
[2] J. Domke. Parameter learning with truncated message-passing. InComputer Vision and Pattern Recognition (CVPR). IEEE, 2011.
[3] J. Domke. Learning graphical model parameters with approximate marginal inference. Pattern Analysis and Machine Intelligence (PAMI), 2013.
[4] N.D. Goodman, V.K. Mansinghka, D.M. Roy, K. Bonawitz, and J.B. Tenenbaum. Church: A language for generative models. In Proc. of Uncertainty in Artificial Intelligence (UAI), 2008.
[5] R. Herbrich, T.P. Minka, and T. Graepel. Trueskill: A Bayesian skill rating system. Advances in Neural Information Processing Systems, 19:569, 2007.
[6] T.P. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, Mas- sachusetts Institute of Technology, 2001.
[7] T.P. Minka and J. Winn. Gates: A graphical notation for mixture models. In Advances in Neural Information Processing Systems, 2008.
[8] T.P. Minka, J.M. Winn, J.P. Guiver, and D.A. Knowles. Infer.NET 2.5, 2012. Microsoft Research. http://research.microsoft.com/infernet.
[9] P. Kohli R. Shapovalov, D. Vetrov. Spatial inference machines. InComputer Vision and Pattern Recognition (CVPR). IEEE, 2013.
[10] S. Ross, D. Munoz, M. Hebert, and J.A. Bagnell. Learning message-passing inference ma- chines for structured prediction. In Computer Vision and Pattern Recognition (CVPR). IEEE, 2011.
[11] D.B. Rubin. Bayesianly justifiable and relevant frequency calculations for the applies statisti- cian. The Annals of Statistics, pages 11511172, 1984.
[12] Stan Development Team. Stan: A C++ library for probability and sampling, version 1.3, 2013.
[13] D.H. Stern, R. Herbrich, and T. Graepel. Matchbox: Large scale online Bayesian recommenda- tions. In Proceedings of the 18th international conference on World Wide Web, pages 111120.ACM, 2009.
[14] A. Thomas. BUGS: A statistical modelling package. RTA/BCS Modular Languages Newslet- ter, 1994.
[15] D. Wingate, N.D. Goodman, A. Stuhlmueller, and J. Siskind. Nonstandard interpretations of probabilistic programs for efficient inference. In Advances in Neural Information Processing Systems, 2011.
[16] D. Wingate and T. Weber. Automated variational inference in probabilistic programming. In arXiv:1301.1299, 2013.
-----1
[1] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008.
[2] A. Bordes, X. Glorot, J. Weston, and Y. Bengio. A semantic matching energy function for learning with multi-relational data. Machine Learning, 2013.
[3] A. Bordes, J. Weston, R. Collobert, and Y. Bengio. Learning structured embeddings of knowl- edge bases. In Proceedings of the 25th Annual Conference on Artificial Intelligence (AAAI), 2011.
[4] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural net- works. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)., 2010.
[5] R. A. Harshman and M. E. Lundy. Parafac: parallel factor analysis. Computational Statistics & Data Analysis, 18(1):3972, Aug. 1994.
[6] R. Jenatton, N. Le Roux, A. Bordes, G. Obozinski, et al. A latent factor model for highly multi-relational data. In Advances in Neural Information Processing Systems (NIPS 25), 2012.
[7] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In Proceedings of the 21st Annual Conference on Artificial Intelligence (AAAI), 2006.
[8] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS 26), 2013.
[9] G. Miller. WordNet: a Lexical Database for English. Communications of the ACM, 38(11):39 41, 1995.
[10] K. Miller, T. Griffiths, and M. Jordan. Nonparametric latent feature models for link prediction.In Advances in Neural Information Processing Systems (NIPS 22), 2009.
[11] M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning on multi- relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.
[12] M. Nickel, V. Tresp, and H.-P. Kriegel. Factorizing YAGO: scalable machine learning for linked data. In Proceedings of the 21st international conference on World Wide Web (WWW), 2012.
[13] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2008.
[14] R. Socher, D. Chen, C. D. Manning, and A. Y. Ng. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors. In Advances in Neural Information Processing Systems (NIPS 26), 2013.
[15] I. Sutskever, R. Salakhutdinov, and J. Tenenbaum. Modelling relational data using bayesian clustered tensor factorization. In Advances in Neural Information Processing Systems (NIPS 22), 2009.
[16] J. Weston, A. Bordes, O. Yakhnenko, and N. Usunier. Connecting language and knowledge bases with embedding models for relation extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013.
[17] J. Zhu. Max-margin nonparametric latent feature models for link prediction. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
-----1
[1] E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed membership stochastic blockmodels. JMLR, 9, 2008.
[2] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251276, 1998.
[3] M. Bastian, S. Heymann, and M. Jacomy. Gephi: An open source software for exploring and manipulating networks, 2009.
[4] D. M. Blei and M. I. Jordan. Variational methods for the dirichlet process. In ICML, 2004.
[5] M. Bryant and E. B. Sudderth. Truly nonparametric online variational inference for hierarchical dirichlet processes. In NIPS, pages 27082716, 2012.
[6] P. Gopalan, D. M. Mimno, S. Gerrish, M. J. Freedman, and D. M. Blei. Scalable inference of overlapping communities. In NIPS, pages 22582266, 2012.
[7] T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. Technical Report 2005-001, Gatsby Computational Neuroscience Unit, May 2005.
[8] M. Hoffman, D. Blei, C. Wang, and J. Paisley. Stochastic variational inference. arXiv preprint arXiv:1206.7051, 2012.
[9] M. D. Hoffman, D. M. Blei, and F. R. Bach. Online learning for latent dirichlet allocation. In NIPS, pages 856864, 2010.
[10] C. Kemp, J. Tenenbaum, T. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In AAAI, 2006.
[11] D. Kim, M. C. Hughes, and E. B. Sudderth. The nonparametric metadata dependent relational model. In ICML, 2012.
[12] A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E, 80(1):016118, July 2009.
[13] littlesis.org. Littlesis is a free database detailing the connections between powerful people and organiza- tions, June 2009.
[14] K. Miller, T. Griffiths, and M. Jordan. Nonparametric latent feature models for link prediction. In NIPS, 2009.
[15] M. Morup, M. N. Schmidt, and L. K. Hansen. Infinite multiple membership relational modeling for com- plex networks. In Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on, pages 16. IEEE, 2011.
[16] T. Nepusz, A. Petrczi, L. Ngyessy, and F. Bazs. Fuzzy communities and the concept of bridgeness in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys, 77(1 Pt 2):016107, 2008.
[17] M. Sato. Online model selection based on the variational bayes. Neural Computation, 13(7):16491681, 2001.
[18] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. JASA, 101(476):15661581, Dec. 2006.
[19] Y. W. Teh, K. Kurihara, and M. Welling. Collapsed variational inference for hdp. In NIPS, 2007.
[20] Y. Wang and G. Wong. Stochastic blockmodels for directed graphs. JASA, 82(397):819, 1987.
-----1
[1] P. Paatero, Least squares formulation of robust non-negative factor analysis, Chemometrics and Intelli- gent Laboratory Systems, vol. 37, no. 1, pp. 2335, May 1997.
[2] P. O. Hoyer, Non-negative matrix factorization with sparseness constraints, The Journal of Machine Learning Research, vol. 5, pp. 14571469, Dec. 2004.
[3] R. Salakhutdinov and A. Mnih, Bayesian probabilistic matrix factorization using Markov chain Monte Carlo, in Proceedings of the 25th International Conference on Machine Learning, New York, NY, Jul. 5 Aug. 9, 2008, pp. 880887.
[4] Y. J. Lim and Y. W. Teh, Variational Bayesian approach to movie rating prediction, in Proceedings of KDD Cup and Workshop, San Jose, CA, Aug. 12, 2007.
[5] T. Raiko, A. Ilin, and J. Karhunen, Principal component analysis for large scale problems with lots of missing values, in Machine Learning: ECML 2007, ser. Lecture Notes in Computer Science, J. N.Kok, J. Koronacki, R. L. de Mantaras, S. Matwin, D. Mladenic?, and A. Skowron, Eds. Springer Berlin Heidelberg, 2007, vol. 4701, pp. 691698.
[6] D. L. Donoho, A. Maleki, and A. Montanari, Message-passing algorithms for compressed sensing, Proceedings of the National Academy of Sciences USA, vol. 106, no. 45, pp. 18 91418 919, Nov. 2009.
[7] S. Rangan, Generalized approximate message passing for estimation with random linear mixing, in Pro- ceedings of 2011 IEEE International Symposium on Information Theory, St. Petersburg, Russia, Jul. 31 Aug. 5, 2011, pp. 21682172.
[8] S. Rangan and A. K. Fletcher, Iterative estimation of constrained rank-one matrices in noise, in Pro- ceedings of 2012 IEEE International Symposium on Information Theory, Cambridge, MA, Jul. 16, 2012, pp. 12461250.
[9] R. Matsushita and T. Tanaka, Approximate message passing algorithm for low-rank matrix reconstruc- tion, in Proceedings of the 35th Symposium on Information Theory and its Applications, Oita, Japan, Dec. 1114, 2012, pp. 314319.
[10] W. Xu, X. Liu, and Y. Gong, Document clustering based on non-negative matrix factorization, in Pro- ceedings of the 26th annual international ACM SIGIR conference on Research and development in infor- maion retrieval, Toronto, Canada, Jul. 28Aug. 1, 2003, pp. 267273.
[11] C. Ding, T. Li, and M. Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 4555, Jan. 2010.
[12] S. P. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, vol. IT-28, no. 2, pp. 129137, Mar. 1982.
[13] F. Krzakala, M. Mezard, and L. Zdeborova, Phase diagram and approximate message passing for blind calibration and dictionary learning, preprint, Jan. 2013, arXiv:1301.5898v1 [cs.IT].
[14] J. T. Parker, P. Schniter, and V. Cevher, Bilinear generalized approximate message passing, preprint, Oct. 2013, arXiv:1310.2632v1 [cs.IT].
[15] S. Nakajima and M. Sugiyama, Theoretical analysis of Bayesian matrix factorization, Journal of Ma- chine Learning Research, vol. 12, pp. 25832648, Sep. 2011.
[16] D. Arthur and S. Vassilvitskii, k-means++: the advantages of careful seeding, in SODA 07 Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, Jan. 79, 2007, pp. 10271035.
[17] J. S. Yedidia, W. T. Freeman, and Y. Weiss, Constructing free-energy approximations and generalized belief propagation algorithms, IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 22822312, Jul. 2005.
[18] M. Bayati and A. Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 764785, Feb. 2011.
[19] H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, vol. 2, no. 12, pp. 8397, Mar. 1955.
[20] F. S. Samaria and A. C. Harter, Parameterisation of a stochastic model for human face identification, in Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, Sarasota FL, Dec. 1994, pp.138142. [Online]. Available: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html 
-----1
[1] http://googleresearch.blogspot.com/2010/04/ lessons-learned-developing-practical.html.
[2] Bogdan Alexe, Thomas Deselaers, and Vittorio Ferrari. What is an object? In CVPR, pages 7380, 2010.
[3] Leon Bottou. http://leon.bottou.org/projects/sgd.
[4] Olivier Chapelle, Patrick Haffner, and Vladimir N. Vapnik. Support vector machines for histogram-based image classification. IEEE Trans. Neural Networks, 10(5):10551064, 1999.
[5] Moses S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, 2002.
[6] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. Liblinear: A library for large linear classification. Journal of Machine Learning Research, 9:18711874, 2008.
[7] Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, and Nakul Verma. Learning the structure of manifolds using random projections. In NIPS, Vancouver, BC, Canada, 2008.
[8] Jerome H. Friedman, F. Baskett, and L. Shustek. An algorithm for finding nearest neighbors. IEEE Transactions on Computers, 24:10001006, 1975.
[9] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of ACM, 42(6):11151145, 1995.
[10] Matthias Hein and Olivier Bousquet. Hilbertian metrics and positive definite kernels on probability mea- sures. In AISTATS, pages 136143, Barbados, 2005.
[11] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation.Journal of ACM, 53(3):307323, 2006.
[12] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimen- sionality. In STOC, pages 604613, Dallas, TX, 1998.
[13] Yugang Jiang, Chongwah Ngo, and Jun Yang. Towards optimal bag-of-features for object categorization and semantic video retrieval. In CIVR, pages 494501, Amsterdam, Netherlands, 2007.
[14] Thorsten Joachims. Training linear svms in linear time. In KDD, pages 217226, Pittsburgh, PA, 2006.
[15] Fuxin Li, Guy Lebanon, and Cristian Sminchisescu. A linear approximation to the 2 kernel with geo- metric convergence. Technical report, arXiv:1206.4074, 2013.
[16] Ping Li. Very sparse stable random projections for dimension reduction in l (0 <   2) norm. In KDD, San Jose, CA, 2007.
[17] Ping Li. Estimators and tail bounds for dimension reduction in l (0 <   2) using stable random projections. In SODA, pages 10  19, San Francisco, CA, 2008.
[18] Ping Li. Improving compressed counting. In UAI, Montreal, CA, 2009.
[19] Ping Li, Michael Mitzenmacher, and Anshumali Shrivastava. Coding for random projections. 2013.
[20] Ping Li, Art B Owen, and Cun-Hui Zhang. One permutation hashing. In NIPS, Lake Tahoe, NV, 2012.
[21] Ping Li, Cun-Hui Zhang, and Tong Zhang. Compressed counting meets compressed sensing. 2013.
[22] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1:117236, 2 2005.
[23] Noam Nisan. Pseudorandom generators for space-bounded computations. In STOC, 1990.
[24] Gennady Samorodnitsky and Murad S. Taqqu. Stable Non-Gaussian Random Processes. Chapman & Hall, New York, 1994.
[25] Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In ICML, pages 807814, Corvalis, Oregon, 2007.
[26] Andrea Vedaldi and Andrew Zisserman. Efficient additive kernels via explicit feature maps. IEEE Trans.Pattern Anal. Mach. Intell., 34(3):480492, 2012.
[27] Sreekanth Vempati, Andrea Vedaldi, Andrew Zisserman, and C. V. Jawahar. Generalized rbf feature maps for efficient detection. In BMVC, pages 111, Aberystwyth, UK, 2010.
[28] Gang Wang, Derek Hoiem, and David A. Forsyth. Building text features for object image classification.In CVPR, pages 13671374, Miami, Florida, 2009.
[29] Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas S. Huang, and Yihong Gong. Locality- constrained linear coding for image classification. In CVPR, pages 33603367, San Francisco, CA, 2010.
[30] Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. Feature hashing for large scale multitask learning. In ICML, pages 11131120, 2009.
[31] Haiquan (Chuck) Zhao, Nan Hua, Ashwin Lall, Ping Li, JiaWang, and Jun Xu. Towards a universal sketch for origin-destination network measurements. In Network and Parallel Computing, pages 201213, 2011.
-----1
[1] G. Bergqvist and E. G. Larsson. The Higher-Order Singular Value Decomposition Theory and an Appli- cation. IEEE Signal Processing Magazine, 27(3):151154, 2010.
[2] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[3] A. Cichocki and R. Zdunek. Multilayer nonnegative matrix factorization. Electronics Letters, 42:947 948, 2006.
[4] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari. Nonnegative Matrix and Tensor Factorizations - Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley, 2009.
[5] F. Diego, S. Reichinnek, M. Both, and F. A. Hamprecht. Automated identification of neuronal activity from calcium imaging by sparse dictionary learning. In International Symposium on Biomedical Imaging, in press, 2013.
[6] W. Goebel and F. Helmchen. In vivo calcium imaging of neural network function. Physiology, 2007.
[7] C. Grienberger and A. Konnerth. Imaging calcium in neurons. Neuron, 2011.
[8] Q. Ho, J. Eisenstein, and E. P. Xing. Document hierarchies from text and links. In Proc. of the 21st Int.World Wide Web Conference (WWW 2012), pages 739748. ACM, 2012.
[9] R. Jenatton, A. Gramfort, V. Michel, G. Obozinski, E. Eger, F. Bach, and B. Thirion. Multi-scale Mining of fMRI data with Hierarchical Structured Sparsity. SIAM Journal on Imaging Sciences, 5(3), 2012.
[10] R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[11] J. Kerr and W. Denk. Imaging in vivo: watching the brain in action. Nature Review Neuroscience, 2008.
[12] H. Kim and H. Park. Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J. on Matrix Analysis and Applications, 2008.
[13] S. Kim and E. P. Xing. Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. Ann. Appl. Stat., 2012.
[14] Y. Li and A. Ngom. The non-negative matrix factorization toolbox for biological data mining. In BMC Source Code for Biology and Medicine, 2013.
[15] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
[16] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online Learning for Matrix Factorization and Sparse Coding.Journal of Machine Learning Research, 2010.
[17] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and R. Jenatton. Sparse modeling software. http://spams- devel.gforge.inria.fr/.
[18] E. A. Mukamel, A. Nimmerjahn, and M. J. Schnitzer. Automated analysis of cellular signals from large- scale calcium imaging data. Neuron, 2009.
[19] M. Protter and M. Elad. Image sequence denoising via sparse and redundant representations. IEEE Transactions on Image Processing, 18(1), 2009.
[20] S. Reichinnek, A. von Kameke, A. M. Hagenston, E. Freitag, F. C. Roth, H. Bading, M. T. Hasan, A. Draguhn, and M. Both. Reliable optical detection of coherent neuronal activity in fast oscillating networks in vitro. NeuroImage, 60(1), 2012.
[21] R. Rubinstein, M. Zibulevsky, and M. Elad. Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing, 2010.
[22] A. P. Singh and G. J. Gordon. A unified view of matrix factorization models. ECML PKDD, 2008.
[23] M. Sun and H. Van Hamme. A two-layer non-negative matrix factorization model for vocabulary discov- ery. In Symposium on machine learning in speech and language processing, 2011.
[24] Q. Sun, P. Wu, Y. Wu, M. Guo, and J. Lu. Unsupervised multi-level non-negative matrix factorization model: Binary data case. Journal of Information Security, 2012.
[25] J. Yang, Z. Wang, Z. Lin, X. Shu, and T. S. Huang. Bilevel sparse coding for coupled feature spaces. In CVPR12, pages 23602367. IEEE, 2012.
[26] B. Zhao, L. Fei-Fei, and E. P. Xing. Online detection of unusual events in videos via dynamic sparse coding. In The Twenty-Fourth IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
-----1
applied to various fields, ranging from collaborative filtering [15], to computer vision [17], and medical imaging [9], among others. In this paper, we propose a new method for tensor completion, based on a convex regularizer that encourages low-rank tensors, and we develop an algorithm for solving the associated regularization problem.

Arguably the most widely used convex approach to tensor completion is based upon the extension of trace norm regularization [24] to that context. This involves computing the average of the trace norm of each matricization of the tensor [16]. A key insight behind using trace norm regularization for matrix completion is that this norm provides a tight convex relaxation of the rank of a matrix defined on the spectral unit ball [8]. Unfortunately, the extension of this methodology to the more general tensor setting presents some difficulties. In particular, we shall prove in this paper that the tensor trace norm is not a tight convex relaxation of the tensor rank.

The above negative result stems from the fact that the spectral norm, used to compute the convex relaxation for the trace norm, is not an invariant property of the matricization of a tensor. This observation leads us to take a different route and study afresh the convex relaxation of tensor rank on the Euclidean ball. We show that this relaxation is tighter than the tensor trace norm, and we describe a technique to solve the associated regularization problem. This method builds upon the alternating direction method of multipliers and a subgradient method to compute the proximity operator of the proposed regularizer. Furthermore, we present numerical experiments on one synthetic dataset and two real-life datasets, which indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable.

The paper is organized in the following manner.
In Section 2, we describe the tensor completion framework. In Section 3, we highlight some limitations of the tensor trace norm regularizer and present an alternative convex relaxation for the tensor rank. In Section 4, we describe a method to solve the associated regularization problem. In Section 5, we report on our numerical experience with the proposed method. Finally, in Section 6, we summarize the main contributions of this paper and discuss future directions of research.

2 Preliminaries

In this section, we begin by introducing some notation and then proceed to describe the learning problem. We denote by $\mathbb{N}$ the set of natural numbers and, for every $k \in \mathbb{N}$, we define $[k] = \{1, \dots, k\}$. Let $N \in \mathbb{N}$ and let$^1$ $p_1, \dots, p_N \geq 2$. An $N$-order tensor $\boldsymbol{\mathcal{W}} \in \mathbb{R}^{p_1 \times \cdots \times p_N}$ is a collection of real numbers $(\mathcal{W}_{i_1,\dots,i_N} : i_n \in [p_n],\ n \in [N])$. Boldface Euler scripts, e.g. $\boldsymbol{\mathcal{W}}$, will be used to denote tensors of order higher than two. Vectors are 1-order tensors and will be denoted by lower case letters, e.g. $x$ or $a$; matrices are 2-order tensors and will be denoted by upper case letters, e.g. $W$. If $x \in \mathbb{R}^d$ then for every $r \leq s \leq d$, we define $x_{r:s} := (x_i : r \leq i \leq s)$. We also use the notation $p_{\min} = \min\{p_1, \dots, p_N\}$ and $p_{\max} = \max\{p_1, \dots, p_N\}$.

A mode-$n$ fiber of a tensor $\boldsymbol{\mathcal{W}}$ is a vector composed of the elements of $\boldsymbol{\mathcal{W}}$ obtained by fixing all indices but one, corresponding to the $n$-th mode. This notion is a higher order analogue of columns (mode-1 fibers) and rows (mode-2 fibers) for matrices. The mode-$n$ matricization (or unfolding) of $\boldsymbol{\mathcal{W}}$, denoted by $W_{(n)}$, is a matrix obtained by arranging the mode-$n$ fibers of $\boldsymbol{\mathcal{W}}$ so that each of them is a column of $W_{(n)} \in \mathbb{R}^{p_n \times J_n}$, where $J_n := \prod_{k \neq n} p_k$. Note that the ordering of the columns is not important as long as it is used consistently.

We are now ready to describe the learning problem. We choose a linear operator $\mathcal{I} : \mathbb{R}^{p_1 \times \cdots \times p_N} \to \mathbb{R}^m$, representing a set of linear measurements obtained from a target tensor $\boldsymbol{\mathcal{W}}^0$ as $y = \mathcal{I}(\boldsymbol{\mathcal{W}}^0) + \xi$, where $\xi$
is some disturbance noise. Tensor completion is an important example of this setting, in this case the operator I returns the known elements of the tensor. That is, we have I(W0) = (W0i1(j),...,iN (j) : j ? [m]), where, for every j ? [m] and n ? [N ], the index in(j) is a prescribed integer in the set [pn]. Our aim is to recover the tensor W 0 from the data (I, y). To this end, we solve the regularization problem min {?y ? I(W)?22 + ?R(W) : W ? Rp1׷pN} (1) where ? is a positive parameter which may be chosen by cross validation. The role of the regularizer R is to encourage solutions W which have a simple structure in the sense that they involve a small number of degrees of freedom. A natural choice is to consider the average of the rank of the tensors matricizations. Specifically, we consider the combinatorial regularizer R(W) = N N? n=1 rank(W(n)). (2) Finding a convex relaxation of this regularizer has been the subject of recent works [9, 17, 23]. They all agree to use the sum of nuclear norms as a convex proxy of R. This is defined as the average of the trace norm of each matricization of W , that is, ?W?tr = 1 N N? n=1 ?W(n)?tr (3) where ?W(n)?tr is the trace (or nuclear) norm of matrix W(n), namely the ?1-norm of the vector of singular values of matrix W(n) (see, e.g. [14]). Note that in the particular case of 2-order tensors, functions (2) and (3) coincide with the usual notion of rank and trace norm of a matrix, respectively.A rational behind the regularizer (3) is that the trace norm is the tightest convex lower bound to the rank of a matrix on the spectral unit ball, see [8, Thm. 1]. This lower bound is given by the convex envelope of the function ?(W ) = { rank(W ), if ?W?? ? 1 +?, otherwise (4) 1For simplicity we assume that pn ? 2 for every n ? [N ], otherwise we simply reduce the order of the tensor without loss of information.where ?  ?? is the spectral norm, namely the largest singular value of W . 
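The definitions of the mode-$n$ matricization, the averaged rank (2) and the tensor trace norm (3) can be sketched numerically as follows. This is a minimal illustration in NumPy, not the authors' code: the helper names `unfold`, `average_rank` and `tensor_trace_norm` are hypothetical, modes are indexed from 0 rather than 1, and the column ordering of the unfolding is one arbitrary (but consistent) choice.

```python
import numpy as np

def unfold(W, n):
    """Mode-n matricization: mode-n fibers become the columns of a p_n x J_n matrix.
    Modes are 0-based here; the column ordering is one fixed, arbitrary choice."""
    return np.moveaxis(W, n, 0).reshape(W.shape[n], -1)

def average_rank(W):
    """Combinatorial regularizer R of equation (2): mean rank of the unfoldings."""
    return float(np.mean([np.linalg.matrix_rank(unfold(W, n)) for n in range(W.ndim)]))

def tensor_trace_norm(W):
    """Tensor trace norm of equation (3): mean nuclear norm of the unfoldings."""
    return float(np.mean([np.linalg.norm(unfold(W, n), ord="nuc") for n in range(W.ndim)]))

# A rank-one 3-order tensor: every unfolding has rank one.
a = np.array([1.0, 2.0])
b = np.array([1.0, 0.0, -1.0])
c = np.array([2.0, 1.0, 0.0, 1.0])
W = np.einsum("i,j,k->ijk", a, b, c)

print(average_rank(W))        # averaged rank of the unfoldings
print(tensor_trace_norm(W))   # averaged nuclear norm of the unfoldings
```

For a rank-one tensor each unfolding has a single nonzero singular value, so the tensor trace norm coincides with the Euclidean norm of the tensor; for a matrix the two functions reduce to the usual rank and trace norm, as noted above.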
The convex envelope can be derived by computing the double conjugate of $\Psi$. This is defined as

$$\Psi^{**}(W) = \sup \left\{ \langle W, S \rangle - \Psi^{*}(S) : S \in \mathbb{R}^{p_1 \times p_2} \right\} \tag{5}$$

where $\Psi^{*}$ is the conjugate of $\Psi$, namely $\Psi^{*}(S) = \sup \{ \langle W, S \rangle - \Psi(W) : W \in \mathbb{R}^{p_1 \times p_2} \}$.

Note that $\Psi$ is a spectral function, that is, $\Psi(W) = \psi(\sigma(W))$ where $\psi : \mathbb{R}^d_+ \to \mathbb{R}$ denotes the associated symmetric gauge function. Using von Neumann's trace theorem (see e.g. [14]) it is easily seen that $\Psi^{*}(S)$ is also a spectral function. That is, $\Psi^{*}(S) = \psi^{*}(\sigma(S))$, where $\psi^{*}(\sigma) = \sup \{ \langle \sigma, w \rangle - \psi(w) : w \in \mathbb{R}^d_+ \}$, with $d := \min(p_1, p_2)$.

We refer to [8] for a detailed discussion of these ideas. We will use this equivalence between spectral and gauge functions repeatedly in the paper.

3 Alternative Convex Relaxation

In this section, we show that the tensor trace norm is not a tight convex relaxation of the tensor rank $R$ in equation (2). We then propose an alternative convex relaxation for this function.

Note that due to the composite nature of the function $R$, computing its convex envelope is a challenging task and one needs to resort to approximations. In [22], the authors note that the tensor trace norm $\|\cdot\|_{\mathrm{tr}}$ in equation (3) is a convex lower bound to $R$ on the set

$$\mathcal{G}_\infty := \left\{ \mathcal{W} \in \mathbb{R}^{p_1 \times \cdots \times p_N} : \|W_{(n)}\|_\infty \le 1,\ \forall n \in [N] \right\}.$$

The key insight behind this observation is summarized in Lemma 4, which we report in Appendix A.

However, the authors of [22] leave open the question of whether the tensor trace norm is the convex envelope of $R$ on the set $\mathcal{G}_\infty$. In the following, we will prove that this question has a negative answer by showing that there exists a convex function $\Omega \ne \|\cdot\|_{\mathrm{tr}}$ which underestimates the function $R$ on $\mathcal{G}_\infty$ and such that for some tensor $\mathcal{W} \in \mathcal{G}_\infty$ it holds that $\Omega(\mathcal{W}) > \|\mathcal{W}\|_{\mathrm{tr}}$.

To describe our observation we introduce the set

$$\mathcal{G}_2 := \left\{ \mathcal{W} \in \mathbb{R}^{p_1 \times \cdots \times p_N} : \|\mathcal{W}\|_2 \le 1 \right\}$$

where $\|\cdot\|_2$ is the Euclidean norm for tensors, that is,

$$\|\mathcal{W}\|_2^2 := \sum_{i_1=1}^{p_1} \cdots \sum_{i_N=1}^{p_N} (\mathcal{W}_{i_1,\dots,i_N})^2.$$

We will choose

$$\Omega(\mathcal{W}) = \Omega_\alpha(\mathcal{W}) := \frac{1}{N} \sum_{n=1}^{N} \omega_\alpha^{**}\left( \sigma\left( W_{(n)} \right) \right) \tag{6}$$

where $\omega_\alpha^{**}$ is the convex envelope of the cardinality of a vector on the $\ell_2$-ball of radius $\alpha$ and we will choose $\alpha = \sqrt{p_{\min}}$. Note, by Lemma 4 stated in Appendix A, that for every $\alpha > 0$, function $\Omega_\alpha$ is a convex lower bound of function $R$ on the set $\alpha \mathcal{G}_2$.

Below, for every vector $s \in \mathbb{R}^d$ we denote by $s^{\downarrow}$ the vector obtained by reordering the components of $s$ so that they are non increasing in absolute value, that is, $|s^{\downarrow}_1| \ge \cdots \ge |s^{\downarrow}_d|$.

Lemma 1. Let $\omega_\alpha^{**}$ be the convex envelope of the cardinality on the $\ell_2$-ball of radius $\alpha$. Then, for every $x \in \mathbb{R}^d$ such that $\|x\|_2 = \alpha$, it holds that $\omega_\alpha^{**}(x) = \operatorname{card}(x)$.

This lemma is proved in Appendix B. The function $\omega_\alpha^{**}$ resembles the norm developed in [1], which corresponds to the convex envelope of the indicator function of the cardinality of a vector in the $\ell_2$ ball. The extension of its application to tensors is not straightforward though, as it is required to specify beforehand the rank of each matricization.

The next lemma provides, together with Lemma 1, a sufficient condition for the existence of a tensor $\mathcal{W} \in \mathcal{G}_\infty$ at which the regularizer in equation (6) is strictly larger than the tensor trace norm.

Lemma 2. If $N \ge 3$ and $p_1, \dots, p_N$ are not all equal to each other, then there exists $\mathcal{W} \in \mathbb{R}^{p_1 \times \cdots \times p_N}$ such that: (a) $\|\mathcal{W}\|_2 = \sqrt{p_{\min}}$, (b) $\mathcal{W} \in \mathcal{G}_\infty$, (c) $\min_{n \in [N]} \operatorname{rank}(W_{(n)}) < \max_{n \in [N]} \operatorname{rank}(W_{(n)})$.

The proof of this lemma is presented in Appendix C. We are now ready to formulate the main result of this section.

Proposition 3. Let $p_1, \dots, p_N \in \mathbb{N}$, let $\|\cdot\|_{\mathrm{tr}}$ be the tensor trace norm in equation (3) and let $\Omega_\alpha$ be the function in equation (6) for $\alpha = \sqrt{p_{\min}}$. If $p_{\min} < p_{\max}$, then there are infinitely many tensors $\mathcal{W} \in \mathcal{G}_\infty$ such that $\Omega_\alpha(\mathcal{W}) > \|\mathcal{W}\|_{\mathrm{tr}}$. Moreover, for every $\mathcal{W} \in \mathcal{G}_2$, it holds that $\Omega_1(\mathcal{W}) \ge \|\mathcal{W}\|_{\mathrm{tr}}$.

Proof. By construction $\Omega_\alpha(\mathcal{W}) \le R(\mathcal{W})$ for every $\mathcal{W} \in \alpha \mathcal{G}_2$. Since $\mathcal{G}_\infty \subset \alpha \mathcal{G}_2$ then $\Omega_\alpha$ is a convex lower bound for the tensor rank $R$ on the set $\mathcal{G}_\infty$ as well. The first claim now follows by Lemmas 1 and 2.
Indeed, all tensors obtained following the process described in the proof of Lemma 2 (in Appendix C) have the property that

$$\|\mathcal{W}\|_{\mathrm{tr}} = \frac{1}{N} \sum_{n=1}^{N} \|\sigma(W_{(n)})\|_1 = \frac{1}{N} \left( p_{\min}(N-1) + \sqrt{p_{\min}^2 + p_{\min}} \right) < \frac{1}{N} \left( p_{\min}(N-1) + p_{\min} + 1 \right) = \Omega(\mathcal{W}) = R(\mathcal{W}).$$

Furthermore there are infinitely many such tensors which satisfy this claim (see Appendix C).

With respect to the second claim, given that $\omega_1^{**}$ is the convex envelope of the cardinality card on the Euclidean unit ball, then $\omega_1^{**}(\sigma) \ge \|\sigma\|_1$ for every vector $\sigma$ such that $\|\sigma\|_2 \le 1$. Consequently,

$$\Omega_1(\mathcal{W}) = \frac{1}{N} \sum_{n=1}^{N} \omega_1^{**}\left( \sigma\left( W_{(n)} \right) \right) \ge \frac{1}{N} \sum_{n=1}^{N} \|\sigma(W_{(n)})\|_1 = \|\mathcal{W}\|_{\mathrm{tr}}.$$

The above result stems from the fact that the spectral norm is not an invariant property of the matricization of a tensor, whereas the Euclidean (Frobenius) norm is. This observation leads us to further study the function $\Omega_\alpha$.

4 Optimization Method

In this section, we explain how to solve the regularization problem associated with the regularizer (6). For this purpose, we first recall the alternating direction method of multipliers (ADMM) [4], which was conveniently applied to tensor trace norm regularization in [9, 22].

4.1 Alternating Direction Method of Multipliers (ADMM)

To explain ADMM we consider a more general problem comprising both tensor trace norm regularization and the regularizer we propose,

$$\min_{\mathcal{W}} \left\{ E(\mathcal{W}) + \gamma \sum_{n=1}^{N} \Psi\left( W_{(n)} \right) \right\} \tag{7}$$

where $E(\mathcal{W})$ is an error term such as $\|y - \mathcal{I}(\mathcal{W})\|_2^2$ and $\Psi$ is a convex spectral function. It is defined, for every matrix $A$, as $\Psi(A) = \psi(\sigma(A))$ where $\psi$ is a gauge function, namely a function which is symmetric and invariant under permutations. In particular, if $\psi$ is the $\ell_1$ norm then problem (7) corresponds to tensor trace norm regularization, whereas if $\psi = \omega_\alpha^{**}$ it implements the proposed regularizer.

Problem (7) poses some difficulties because the terms under the summation are interdependent, due to the different matricizations of $\mathcal{W}$ having the same elements rearranged in a different way.
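The remark that the matricizations hold the same elements in different arrangements, so that the Frobenius norm is shared across modes while the spectral norm is not, can be checked on a small hand-built tensor. The sketch below is only an illustration under our own assumptions: the helper `unfold` and the 2 x 2 x 2 example tensor are hypothetical choices, not the construction used in Appendix C, and modes are indexed from 0.

```python
import numpy as np

def unfold(W, n):
    """Mode-n matricization (0-based mode index, fixed arbitrary column order)."""
    return np.moveaxis(W, n, 0).reshape(W.shape[n], -1)

# A 2 x 2 x 2 tensor with two unit entries; all other entries are zero.
W = np.zeros((2, 2, 2))
W[0, 0, 0] = 1.0
W[1, 1, 0] = 1.0

# The Frobenius norm only depends on the multiset of entries, so it is the same
# for every unfolding; the largest singular value depends on the arrangement.
frobenius = [np.linalg.norm(unfold(W, n), ord="fro") for n in range(W.ndim)]
spectral = [np.linalg.norm(unfold(W, n), ord=2) for n in range(W.ndim)]
ranks = [int(np.linalg.matrix_rank(unfold(W, n))) for n in range(W.ndim)]

print(frobenius)  # identical across the three modes
print(spectral)   # differs between modes
print(ranks)      # the unfoldings also differ in rank
```

In this example the first two unfoldings have two orthogonal unit rows, whereas the third collects both nonzero entries in a single row, which changes its spectral norm and rank but not its Frobenius norm. The shared entries are also what couples the summands in problem (7).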
In order to overcome this difficulty, the authors of [9, 22] proposed to use ADMM as a natural way to decouple the regularization term appearing in problem (7). This strategy is based on the introduction of $N$ auxiliary tensors, $\mathcal{B}_1, \dots, \mathcal{B}_N \in \mathbb{R}^{p_1 \times \cdots \times p_N}$, so that problem (7) can be reformulated as$^2$

$$\min_{\mathcal{W}, \mathcal{B}_1, \dots, \mathcal{B}_N} \left\{ \frac{1}{\gamma} E(\mathcal{W}) + \sum_{n=1}^{N} \Psi\left( B_{n(n)} \right) : \mathcal{B}_n = \mathcal{W},\ n \in [N] \right\} \tag{8}$$

$^2$ The somewhat cumbersome notation $B_{n(n)}$ denotes the mode-$n$ matricization of tensor $\mathcal{B}_n$, that is, $B_{n(n)} = (\mathcal{B}_n)_{(n)}$.

The corresponding augmented Lagrangian (see e.g. [4, 5]) is given by

$$\mathcal{L}(\mathcal{W}, \mathcal{B}, \mathcal{A}) = \frac{1}{\gamma} E(\mathcal{W}) + \sum_{n=1}^{N} \left( \Psi\left( B_{n(n)} \right) - \langle \mathcal{A}_n, \mathcal{W} - \mathcal{B}_n \rangle + \frac{\beta}{2} \|\mathcal{W} - \mathcal{B}_n\|_2^2 \right), \tag{9}$$

where $\langle \cdot, \cdot \rangle$ denotes the scalar product between tensors, $\beta$ is a positive parameter and $\mathcal{A}_1, \dots, \mathcal{A}_N \in \mathbb{R}^{p_1 \times \cdots \times p_N}$ are the set of Lagrange multipliers associated with the constraints in problem (8).

ADMM is based on the following iterative scheme

$$\mathcal{W}^{[i+1]} \leftarrow \operatorname*{argmin}_{\mathcal{W}} \mathcal{L}\left( \mathcal{W}, \mathcal{B}^{[i]}, \mathcal{A}^{[i]} \right) \tag{10}$$

$$\mathcal{B}_n^{[i+1]} \leftarrow \operatorname*{argmin}_{\mathcal{B}_n} \mathcal{L}\left( \mathcal{W}^{[i+1]}, \mathcal{B}, \mathcal{A}^{[i]} \right) \tag{11}$$

$$\mathcal{A}_n^{[i+1]} \leftarrow \mathcal{A}_n^{[i]} - \left( \beta \mathcal{W}^{[i+1]} - \mathcal{B}_n^{[i+1]} \right). \tag{12}$$

Step (12) is straightforward, whereas step (10) is described in [9]. Here we focus on the step (11) since this is the only problem which involves function $\Psi$. We restate it with more explanatory notations as

$$\operatorname*{argmin}_{B_{n(n)}} \left\{ \Psi\left( B_{n(n)} \right) - \langle A_{n(n)}, W_{(n)} - B_{n(n)} \rangle + \frac{\beta}{2} \left\| W_{(n)} - B_{n(n)} \right\|_2^2 \right\}.$$

By completing the square in the right hand side, the solution of this problem is given by

$$B_{n(n)} = \operatorname{prox}_{\frac{1}{\beta} \Psi}(X) := \operatorname*{argmin}_{B_{n(n)}} \left\{ \frac{1}{\beta} \Psi\left( B_{n(n)} \right) + \frac{1}{2} \left\| B_{n(n)} - X \right\|_2^2 \right\},$$

where $X = W_{(n)} - \frac{1}{\beta} A_{n(n)}$. By using properties of proximity operators (see e.g. [2, Prop. 3.1]) we know that if $\psi$ is a gauge function then

$$\operatorname{prox}_{\frac{1}{\beta} \Psi}(X) = U_X \operatorname{diag}\left( \operatorname{prox}_{\frac{1}{\beta} \psi}(\sigma(X)) \right) V_X^{\top},$$

where $U_X$ and $V_X$ are the orthogonal matrices formed by the left and right singular vectors of $X$, respectively. If we choose $\psi = \|\cdot\|_1$ the associated proximity operator is the well-known soft thresholding operator, that is, $\operatorname{prox}_{\frac{1}{\beta} \|\cdot\|_1}(\sigma) = v$, where the vector $v$ has components

$$v_i = \operatorname{sign}(\sigma_i) \left( |\sigma_i| - \frac{1}{\beta} \right)_+.$$

On the other hand, if we choose $\psi = \omega_\alpha^{**}$, we need to compute $\operatorname{prox}_{\frac{1}{\beta} \omega_\alpha^{**}}$. In the next section, we describe a method to accomplish this task.

4.2 Computation of the Proximity Operator

To compute the proximity operator of the function $\frac{1}{\beta} \omega_\alpha^{**}$ we will use several properties of proximity calculus. First, we use the formula (see e.g. [7]) $\operatorname{prox}_{g^*}(x) = x - \operatorname{prox}_g(x)$ for $g^* = \frac{1}{\beta} \omega_\alpha^{**}$. Next we use a property of conjugate functions from [21, 13], which states that $g(\cdot) = \frac{1}{\beta} \omega_\alpha^{*}(\beta\, \cdot)$. Finally, by the scaling property of proximity operators [7], we have that $\operatorname{prox}_g(x) = \frac{1}{\beta} \operatorname{prox}_{\beta \omega_\alpha^{*}}(\beta x)$.

It remains to compute the proximity operator of a multiple of the function $\omega_\alpha^{*}$ in equation (13), that is, for any $\beta > 0$, $y \in S$, we wish to compute

$$\operatorname{prox}_{\beta \omega_\alpha^{*}}(y) = \operatorname*{argmin}_{w} \{ h(w) : w \in S \}$$

where we have defined $S := \{ w \in \mathbb{R}^d : w_1 \ge \cdots \ge w_d \ge 0 \}$ and

$$h(w) = \frac{1}{2} \|w - y\|_2^2 + \beta \max_{r=0}^{d} \left\{ \alpha \|w_{1:r}\|_2 - r \right\}.$$

In order to solve this problem we employ the projected subgradient method, see e.g. [6]. It consists in applying two steps at each iteration. First, it advances along a negative subgradient of the current solution; second, it projects the resultant point onto the feasible set $S$. In fact, according to [6], it is sufficient to compute an approximate projection, a step which we describe in Appendix D. To compute a subgradient of $h$ at $w$, we first find any integer $k$ such that $k \in \operatorname*{argmax}_{r=0}^{d} \{ \alpha \|w_{1:r}\|_2 - r \}$. Then, we calculate a subgradient $g$ of the function $h$ at $w$ by the formula

$$g_i = \begin{cases} \left( 1 + \frac{\alpha \beta}{\|w_{1:k}\|_2} \right) w_i - y_i, & \text{if } i \le k, \\ w_i - y_i, & \text{otherwise.} \end{cases}$$

Now we have all the ingredients to apply the projected subgradient method, which is summarized in Algorithm 1. In our implementation we stop the algorithm when an update of $\hat{w}$ is not made for more than $10^2$ iterations.

Algorithm 1 Computation of $\operatorname{prox}_{\beta \omega_\alpha^{*}}(y)$
Input: $y \in \mathbb{R}^d$, $\alpha, \beta > 0$.
Output: $\hat{w} \in \mathbb{R}^d$.
Initialization: initial step $\tau_0 = \frac{1}{2}$, initial and best found solution $w^0 = \hat{w} = P_S(y) \in \mathbb{R}^d$.
for $t = 1, 2, \dots$ do
    $\tau \leftarrow \tau_0 / \sqrt{t}$
    Find $k$ such that $k \in \operatorname{argmax} \{ \alpha \|w^{t-1}_{1:r}\|_2 - r : 0 \le r \le d \}$
    $\tilde{w}_{1:k} \leftarrow w^{t-1}_{1:k} - \tau \left( w^{t-1}_{1:k} \left( 1 + \frac{\alpha \beta}{\|w^{t-1}_{1:k}\|_2} \right) - y_{1:k} \right)$
    $\tilde{w}_{k+1:d} \leftarrow w^{t-1}_{k+1:d} - \tau \left( w^{t-1}_{k+1:d} - y_{k+1:d} \right)$
    $w^t \leftarrow P_S(\tilde{w})$
    If $h(w^t) < h(\hat{w})$ then $\hat{w} \leftarrow w^t$
    If Stopping Condition = True then terminate.
end for

5 Experiments

We have conducted a set of experiments to assess whether there is any advantage of using the proposed regularizer over the tensor trace norm for tensor completion$^3$. First, we have designed a synthetic experiment to evaluate the performance of both approaches under controlled conditions. Then, we have tried both methods on two real-data tensor completion problems. In all cases, we have used a validation procedure to tune the hyper-parameter $\gamma$, present in both approaches, among the values $\{ 10^j : j = -7, -6, \dots, 1 \}$. In our proposed approach there is one further hyper-parameter, $\alpha$, to be specified. It should take the value of the Euclidean norm of the underlying tensor.
Since this is unknown, we propose to use the estimate

$$\hat{\alpha} = \sqrt{ \|w\|_2^2 + \left( \operatorname{mean}(w)^2 + \operatorname{var}(w) \right) \left( \prod_{i=1}^{N} p_i - m \right) },$$

where $m$ is the number of known entries and $w \in \mathbb{R}^m$ contains their values. This estimator assumes that each value in the tensor is sampled from $\mathcal{N}(\operatorname{mean}(w), \operatorname{var}(w))$, where $\operatorname{mean}(w)$ and $\operatorname{var}(w)$ are the average and the variance of the elements in $w$.

$^3$ The code is available at http://romera-paredes.com/code/tensor-completion

[Figure 1 plots omitted. Left panel: RMSE versus $\log \sigma^2$; right panel: running time in seconds versus $p$; curves: Tensor Trace Norm, Proposed Regularizer.]
Figure 1: Synthetic dataset: (Left) Root Mean Squared Error (RMSE) of tensor trace norm and the proposed regularizer. (Right) Running time execution for different sizes of the tensor.

5.1 Synthetic Dataset

We have generated a 3-order tensor $\mathcal{W}^0 \in \mathbb{R}^{40 \times 20 \times 10}$ by the following procedure. First we generated a tensor $\mathcal{W}$ with ranks $(12, 6, 3)$ using Tucker decomposition (see e.g. [16])

$$\mathcal{W}_{i_1,i_2,i_3} = \sum_{j_1=1}^{12} \sum_{j_2=1}^{6} \sum_{j_3=1}^{3} \mathcal{C}_{j_1,j_2,j_3} M^{(1)}_{i_1,j_1} M^{(2)}_{i_2,j_2} M^{(3)}_{i_3,j_3}, \quad (i_1, i_2, i_3) \in [40] \times [20] \times [10]$$

where each entry of the Tucker decomposition components is sampled from the standard Gaussian distribution $\mathcal{N}(0, 1)$. We then created the ground truth tensor $\mathcal{W}^0$ by the equation

$$\mathcal{W}^0_{i_1,i_2,i_3} = \frac{\mathcal{W}_{i_1,i_2,i_3} - \operatorname{mean}(\mathcal{W})}{\sqrt{N} \operatorname{std}(\mathcal{W})} + \xi_{i_1,i_2,i_3}$$

where $\operatorname{mean}(\mathcal{W})$ and $\operatorname{std}(\mathcal{W})$ are the mean and standard deviation of the elements of $\mathcal{W}$, $N$ is the total number of elements of $\mathcal{W}$, and the $\xi_{i_1,i_2,i_3}$ are i.i.d. Gaussian random variables with zero mean and variance $\sigma^2$. We have randomly sampled 10% of the elements of the tensor to compose the training set, 45% for the validation set, and the remaining 45% for the test set. After repeating this process 20 times, we report the average results in Figure 1 (Left). Having conducted a paired t-test for each value of $\sigma^2$, we conclude that the visible differences in the performances are highly significant, obtaining always p-values less than 0.01 for $\sigma^2 \le 10^{-2}$.

Furthermore, we have conducted an experiment to test the running time of both approaches. We have generated tensors $\mathcal{W}^0 \in \mathbb{R}^{p \times p \times p}$ for different values of $p \in \{20, 40, \dots, 200\}$, following the same procedure as outlined above. The results are reported in Figure 1 (Right). For low values of $p$, the ratio between the running time of our approach and that of the trace norm regularization method is quite high. For example, for the lowest value of $p$ tried in this experiment, $p = 20$, this ratio is 22.661. However, as the volume of the tensor increases, the ratio quickly decreases. For example, for $p = 200$, the running time ratio is 1.9113. These outcomes are expected because when $p$ is low, the most demanding routine in our method is the one described in Algorithm 1, where each iteration is of order $O(p)$ and $O(p^2)$ in the best and worst case, respectively. However, as $p$ increases the singular value decomposition routine, which is common to both methods, becomes the most demanding because it has a time complexity $O(p^3)$ [10]. Therefore, we can conclude that even though our approach is slower than the trace norm based method, this difference becomes much smaller as the size of the tensor increases.

5.2 School Dataset

The first real dataset we have tried is the Inner London Education Authority (ILEA) dataset. It is composed of examination marks ranging from 0 to 70, of 15362 students who are described by a set of attributes such as school and ethnic group. Most of these attributes are categorical, so we can think of exam mark prediction as a tensor completion problem where each of the modes corresponds to a categorical attribute. In particular, we have used the following attributes: school (139), gender (2), VR-band (3), ethnic (11), and year (3), leading to a 5-order tensor $\mathcal{W} \in \mathbb{R}^{139 \times 2 \times 3 \times 11 \times 3}$.

[Figure 2 plots omitted. Both panels: RMSE versus $m$ (training set size); curves: Tensor Trace Norm, Proposed Regularizer.]
Figure 2: Root Mean Squared Error (RMSE) of tensor trace norm and the proposed regularizer for ILEA dataset (Left) and Ocean video (Right).

We have selected randomly 5% of the instances to make the test set and another 5% of the instances for the validation set. From the remaining instances, we have randomly chosen $m$ of them for several values of $m$. This procedure has been repeated 20 times and the average performance is presented in Figure 2 (Left). There is a distinguishable improvement of our approach with respect to tensor trace norm regularization for values of $m > 7000$. To check whether this gap is significant, we have conducted a set of paired t-tests in this regime. In all these cases we obtained a p-value below 0.01.

5.3 Video Completion

In the second real-data experiment we have performed a video completion test.
Any video can be treated as a 4-order tensor: width × height × RGB × video length, so we can use tensor completion algorithms to rebuild a video from a few inputs, a procedure that can be useful for compression purposes. In our case, we have used the Ocean video, available at [17]. This video sequence can be treated as a tensor $\mathcal{W} \in \mathbb{R}^{160 \times 112 \times 3 \times 32}$. We have randomly sampled $m$ tensor elements as training data, 5% of them as validation data, and the remaining ones composed the test set. After repeating this procedure 10 times, we present the average results in Figure 2 (Right). The proposed approach is noticeably better than the tensor trace norm in this experiment. This apparent outcome is strongly supported by the paired t-tests which we ran for each value of $m$, obtaining always p-values below 0.01, and for the cases $m > 5 \times 10^4$, we obtained p-values below $10^{-6}$.

6 Conclusion

In this paper, we proposed a convex relaxation for the average of the rank of the matricizations of a tensor. We compared this relaxation to a commonly used convex relaxation in the context of tensor completion, which is based on the trace norm. We proved that this second relaxation is not tight and argued that the proposed convex regularizer may be advantageous. Our numerical experience indicates that our method consistently improves in terms of estimation error over tensor trace norm regularization, while being computationally comparable on the range of problems we considered. In the future it would be interesting to study methods to speed up the computation of the proximity operator of our regularizer and investigate its utility in tensor learning problems beyond tensor completion such as multilinear multitask learning [20].

Acknowledgements

We wish to thank Andreas Argyriou, Raphael Hauser, Charles Micchelli and Marco Signoretto for useful comments. A valuable contribution was made by one of the anonymous referees.
Part of this work was supported by EPSRC Grant EP/H017178/1, EP/H027203/1 and Royal Society International Joint Project 2012/R2.
[1] A. Argyriou, R. Foygel and N. Srebro. Sparse Prediction with the k-Support Norm. Advances in Neural Information Processing Systems 25, pages 1466–1474, 2012.
[2] A. Argyriou, C.A. Micchelli, M. Pontil, L. Shen and Y. Xu. Efficient first order methods for linear composite regularizers. arXiv:1104.1436, 2011.
[3] R. Bhatia. Matrix Analysis. Springer Verlag, 1997.
[4] D.P. Bertsekas, J.N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.
[5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[6] S. Boyd, L. Xiao, A. Mutapcic. Subgradient methods, Stanford University, 2003.
[7] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering (H. H. Bauschke et al. Eds), pages 185–212, Springer, 2011.
[8] M. Fazel, H. Hindi, and S. Boyd. A rank minimization heuristic with application to minimum order system approximation. Proc. American Control Conference, Vol. 6, pages 4734–4739, 2001.
[9] S. Gandy, B. Recht, I. Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27(2), 2011.
[10] G. H. Golub, C. F. Van Loan. Matrix Computations. 3rd Edition. Johns Hopkins University Press, 1996.
[11] Z. Harchaoui, M. Douze, M. Paulin, M. Dudik, J. Malick. Large-scale image classification with trace-norm regularization. IEEE Conference on Computer Vision & Pattern Recognition (CVPR), pages 3386–3393, 2012.
[12] J-B. Hiriart-Urruty and C. Lemarechal. Convex Analysis and Minimization Algorithms, Part I. Springer, 1996.
[13] J-B. Hiriart-Urruty and C. Lemarechal. Convex Analysis and Minimization Algorithms, Part II. Springer, 1993.
[14] R.A. Horn and C.R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 2005.
[15] A. Karatzoglou, X. Amatriain, L. Baltrunas, N. Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. Proc. 4th ACM Conference on Recommender Systems, pages 79–86, 2010.
[16] T.G. Kolda and B.W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
[17] J. Liu, P. Musialski, P. Wonka, J. Ye. Tensor completion for estimating missing values in visual data. Proc. 12th International Conference on Computer Vision (ICCV), pages 2114–2121, 2009.
[18] Y. Nesterov. Gradient methods for minimizing composite objective functions. ECORE Discussion Paper, 2007/96, 2007.
[19] B. Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12:3413–3430, 2009.
[20] B. Romera-Paredes, H. Aung, N. Bianchi-Berthouze and M. Pontil. Multilinear multitask learning. Proc. 30th International Conference on Machine Learning (ICML), pages 1444–1452, 2013.
[21] N. Z. Shor. Minimization Methods for Non-differentiable Functions. Springer, 1985.
[22] M. Signoretto, Q. Tran Dinh, L. De Lathauwer, J.A.K. Suykens. Learning with tensors: a framework based on convex optimization and spectral regularization. Machine Learning, to appear.
[23] M. Signoretto, R. Van de Plas, B. De Moor, J.A.K. Suykens. Tensor versus matrix completion: a comparison with application to spectral data. IEEE Signal Processing Letters, 18(7):403–406, 2011.
[24] N. Srebro, J. Rennie and T. Jaakkola. Maximum margin matrix factorization. Advances in Neural Information Processing Systems (NIPS) 17, pages 1329–1336, 2005.
[25] R. Tomioka, K. Hayashi, H. Kashima, J.S.T. Presto. Estimation of low-rank tensors via convex optimization. arXiv:1010.0789, 2010.
[26] R. Tomioka and T. Suzuki. Convex tensor decomposition via structured Schatten norm regularization. arXiv:1303.6370, 2013.
[27] R. Tomioka, T. Suzuki, K. Hayashi, H. Kashima. Statistical performance of convex tensor decomposition. Advances in Neural Information Processing Systems (NIPS) 24, pages 972–980, 2013.
-----1
[1] S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning.In NIPS, 2002.
[2] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector ma- chines. Journal of Machine Learning Research, 2:265292, 2001.
[3] T. M. T. Do and T. Artie`res. Large margin training for hidden Markov models with partially observed states. In ICML, 2009.
[4] A. Farhadi and M. K. Tabrizi. Learning to recognize activities from the wrong view point. In ECCV, 2008.
[5] P. F. Felzenszwalb, D. A. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008.
[6] R. Gopalan and J. Sankaranarayanan. Max-margin clustering: Detecting margins from projections of points on lines. In CVPR, 2011.
[7] J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Applied Statistics, 28:100108, 1979.
[8] M. Hoai and A. Zisserman. Discriminative sub-categorization. In CVPR, 2013.
[9] C.-F. Hsu, J. Caverlee, and E. Khabiri. Hierarchical comments-based clustering. In SAC, 2011.
[10] H. Izadinia and M. Shah. Recognizing complex events using large margin joint low-level event model. In ECCV, 2012.
[11] A. Jain and R. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
[12] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, 1999.
[13] A. Klaser, M. Marszalek, and C. Schmid. A spatio-temporal descriptor based on 3d-gradients. In BMVC, 2008.
[14] M. P. Kumar, B. Packer, and D. Koller. Self-paced learning for latent variable models. In NIPS, 2010.
[15] T. O. Kvalseth. Entropy and correlation: Some comments. IEEE Transactions on Systems, Man and Cybernetics, 17(3):517519, 1987.
[16] Y.-F. Li, I. W. Tsang, J. T.-Y. Kwok, and Z.-H. Zhou. Tighter and convex maximum margin clustering. In AISTATS, 2009.
[17] J. Liu, B. Kuipers, and S. Savarese. Recognizing human actions by attributes. In CVPR, 2011.
[18] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.
[19] P. Over, G. Awad, J. Fiscus, A. F. Smeaton, W. Kraaij, and G. Quenot. TRECVID 2011  an overview of the goals, tasks, data, evaluation mechanisms and metrics. In TRECVID, 2011.
[20] G.-J. Qi, X.-S. Hua, Y. Rui, J. Tang, T. Mei, and H.-J. Zhang. Correlative multi-label video annotation.In ACM Multimedia, 2007.
[21] W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statis- tical Association, 66(336):846850, 1971.
[22] R. Redner and H. Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195239, 1984.
[23] M. D. Rodriguez, J. Ahmed, and M. Shah. Action MACH a spatio-temporal maximum average correlation height filter for action recognition. In CVPR, 2008.
[24] S. Sadanand and J. J. Corso. Action Bank: A high-level representation of activity in video. In CVPR, 2012.
[25] F. Schroff, C. L. Zitnick, and S. Baker. Clustering videos by location. In BMVC, 2009.
[26] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In ICPR, 2004.
[27] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888905, 2000.
[28] A. Vahdat and G. Mori. Handling uncertain tags in visual recognition. In ICCV, 2013.
[29] H. Valizadegan and R. Jin. Generalized maximum margin clustering and unsupervised kernel learning. In NIPS, 2006.
[30] Y. Wang and L. Cao. Discovering latent clusters from geotagged beach images. In MMM, 2013.
[31] Y. Wang and G. Mori. Max-margin hidden conditional random fields for human action recognition. In CVPR, 2009.
[32] L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. In NIPS, 2004.
[33] L. Xu and D. Schuurmans. Unsupervised and semi-supervised multi-class support vector machines. In AAAI, 2005.
[34] W. Yang and G. Toderici. Discriminative tag learning on YouTube videos with latent sub-tags. In CVPR, 2011.
[35] W. Yang, Y. Wang, A. Vahdat, and G. Mori. Kernel latent SVM for visual recognition. In NIPS, 2012.
[36] C.-N. J. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009.
[37] K. Zhang, I. W. Tsang, and J. T. Kwok. Maximum margin clustering made practical. In ICML, 2007.
[38] B. Zhao, F. Wang, and C. Zhang. Efficient multiclass maximum margin clustering. In ICML, 2008.
-----1
[1] A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics.Kluwer Academic Boston, 2004.
[2] M. Besserve, D. Janzing, N. Logothetis, and B. Scholkopf. Finding dependencies between frequencies with the kernel cross-spectral density. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 20802083, 2011.
[3] G. Blanchard, O. Bousquet, and L. Zwald. Statistical properties of kernel principal component analysis.Machine Learning, 66(2-3):259294, 2007.
[4] D. Brillinger. Time series: data analysis and theory. Holt, Rinehart, and Winston New York,, 1974.
[5] J.-F. Cardoso. High-order contrasts for independent component analysis. Neural computation, 11(1):157 192, 1999.
[6] K. Fukumizu, F. Bach, and A. Gretton. Statistical convergence of kernel CCA. In Advances in Neural Information Processing Systems, pages 387394, 2006.
[7] K. Fukumizu, F. Bach, and M. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res., 5:7399, 2004.
[8] K. Fukumizu, A. Gretton, G. R. Lanckriet, B. Scholkopf, and B. K. Sriperumbudur. Kernel choice and classifiability for RKHS embeddings of probability distributions. In Advances in neural information processing systems, pages 17501758, 2009.
[9] K. Fukumizu, A. Gretton, X. Sun, and B. Scholkopf. Kernel Measures of Conditional Dependence. In Advances in Neural Information Processing Systems, pages 489496, 2008.
[10] G. B. Giannakis and J. M. Mendel. Identification of nonminimum phase systems using higher order statistics. Acoustics, Speech and Signal Processing, IEEE Transactions on, 37(3):360377, 1989.
[11] A. Gretton, K. Fukumizu, C. Teo, L. Song, B. Scholkopf, and A. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pages 585592. 2008.
[12] A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B. K. Sripe- rumbudur. Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems, pages 12141222, 2012.
[13] A. Hyvarinen, S. Shimizu, and P. O. Hoyer. Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-gaussianity. In Proceedings of the 25th international conference on Machine learning, pages 424431. ACM, 2008.
[14] C. Leslie, E. Eskin, and W. Noble. The spectrum kernel: a string kernel for SVM protein classification.In Pac Symp Biocomput., 2002.
[15] C. Nikias and A. Petropulu. Higher-Order Spectra Analysis - A Non-linear Signal Processing Framework.Prentice-Hall PTR, Englewood Cliffs, NJ, 1993.
[16] D. Pantazis, T. Nichols, S. Baillet, and R. Leahy. A comparison of random field theory and permutation methods for the statistical analysis of MEG data. NeuroImage, 25:383  394, 2005.
[17] J. Pearl. Causality - Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK, 2000.
[18] J. Peters, D. Janzing, and B. Scholkopf. Causal inference on time series using structural equation models.In Advances in Neural Information Processing Systems, 2013.
[19] B. Scholkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[20] V. Sindhwani, H. Q. Minh, and A. C. Lozano. Scalable matrix-valued kernel learning for high-dimensional nonlinear multivariate regression and granger causality. In Proceedings of the 29th Conference on Uncer- tainty in Artificial Intelligence, 2013.
[21] K. Whittingstall and N. Logothetis. Frequency-band coupling in surface EEG reflects spiking activity in monkey visual cortex. Neuron, 64:2819, 2009 Oct 29.
[22] X. Zhang, L. Song, A. Gretton, and A. Smola. Kernel Measures of Independence for Non-IID Data. In Advances in neural information processing systems 21, pages 19371944, 2009.
-----1
[1] A. J. Smola, A. Gretton, L. Song, and B. Scholkopf. A Hilbert space embedding for distributions. In Proceedings of the International Conference on Algorithmic Learning Theory, volume 4754, pages 13-31. Springer, 2007.
[2] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Scholkopf. Injective Hilbert space embeddings of probability measures. In Proc. Annual Conf. Computational Learning Theory, pages 111-122, 2008.
[3] A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Scholkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pages 585-592, Cambridge, MA, 2008. MIT Press.
[4] L. Song, A. Gretton, D. Bickson, Y. Low, and C. Guestrin. Kernel belief propagation. In Proc. Intl. Conference on Artificial Intelligence and Statistics, volume 10 of JMLR workshop and conference proceedings, 2011.
[5] L. Song, B. Boots, S. Siddiqi, G. Gordon, and A. J. Smola. Hilbert space embeddings of hidden markov models. In International Conference on Machine Learning, 2010.
[6] L. Song, A. Parikh, and E.P. Xing. Kernel embeddings of latent tree graphical models. In Advances in Neural Information Processing Systems, volume 25, 2011.
[7] L. Song, M. Ishteva, H. Park, A. Parikh, and E. Xing. Hierarchical tensor decomposition of latent tree graphical models. In International Conference on Machine Learning (ICML), 2013.
[8] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, 2004.
[9] L. Song, J. Huang, A. J. Smola, and K. Fukumizu. Hilbert space embeddings of conditional distributions. In Proceedings of the International Conference on Machine Learning, 2009.
[10] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455-500, 2009.
[11] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications, 31(4):2029-2054, 2010.
[12] I. Oseledets. Tensor-train decomposition. SIAM Journal on Scientific Computing, 33(5):2295-2317, 2011.
[13] L. Rosasco, M. Belkin, and E.D. Vito. On learning with integral operators. Journal of Machine Learning Research, 11:905-934, 2010.
-----1
[1] Bengt Von Bahr. On the convergence of moments in the central limit theorem. The Annals of Mathematical Statistics, 36(3):808-818, 1965.
[2] L. Baringhaus and C. Franz. On a new multivariate two-sample test. J. Multivariate Anal., 88:190-206, 2004.
[3] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, 2004.
[4] Andrew C Berry. The accuracy of the gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society, 49(1):122-136, 1941.
[5] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, 1993.
[6] M. Fromont, B. Laurent, M. Lerasle, and P. Reynaud-Bouret. Kernels based tests with non-asymptotic bootstrap approaches for two-sample problems. In COLT, 2012.
[7] A Gretton, K Fukumizu, Z Harchaoui, and BK Sriperumbudur. A fast, consistent kernel two-sample test. In Advances in Neural Information Processing Systems 22, pages 673-681, 2009.
[8] A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Scholkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pages 585-592, Cambridge, MA, 2008. MIT Press.
[9] A Gretton, B Sriperumbudur, D Sejdinovic, H Strathmann, S Balakrishnan, M Pontil, and K Fukumizu. Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems 25, pages 1214-1222, 2012.
[10] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Scholkopf, and Alexander Smola. A kernel two-sample test. J. Mach. Learn. Res., 13:723-773, March 2012.
[11] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Scholkopf, and Alexander J. Smola. A kernel method for the two-sample-problem. In NIPS, pages 513-520, 2006.
[12] Z. Harchaoui, F. Bach, and E. Moulines. Testing for homogeneity with kernel Fisher discriminant analysis. In NIPS, pages 609-616. MIT Press, Cambridge, MA, 2008.
[13] Norman Lloyd Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan. Continuous univariate distributions. Distributions in statistics. Wiley, 2nd edition, 1994.
[14] A. Kleiner, A. Talwalkar, P. Sarkar, and M. I. Jordan. A scalable bootstrap for massive data. Journal of the Royal Statistical Society, Series B, In Press.
[15] Andrey N Kolmogorov. Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4(1):83-91, 1933.
[16] B Scholkopf. Support vector learning. Oldenbourg, Munchen, Germany, 1997.
[17] D. Sejdinovic, A. Gretton, B. Sriperumbudur, and K. Fukumizu. Hypothesis testing using pairwise distances and associated kernels. In ICML, 2012.
[18] R. Serfling. Approximation Theorems of Mathematical Statistics. Wiley, New York, 1980.
[19] Nickolay Smirnov. Table for estimating the goodness of fit of empirical distributions. The Annals of Mathematical Statistics, 19(2):279-281, 1948.
[20] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Scholkopf. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517-1561, 2010.
[21] G. Szekely and M. Rizzo. Testing for equal distributions in high dimension. InterStat, (5), November 2004.
[22] G. Szekely, M. Rizzo, and N. Bakirov. Measuring and testing dependence by correlation of distances. Ann. Stat., 35(6):2769-2794, 2007.
[23] M. Yamada, T. Suzuki, T. Kanamori, H. Hachiya, and M. Sugiyama. Relative density-ratio estimation for robust distribution comparison. Neural Computation, 25(5):1324-1370, 2013.
-----1
[1] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2002.
[2] S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In Advances in Neural Information Processing Systems 23, pages 163-171, 2010.
[3] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In Proceedings 13th ACM International Conference on Information and Knowledge Management (CIKM), pages 78-87. ACM, 2004.
[4] O. Dekel. Distribution-calibrated hierarchical classification. In Advances in Neural Information Processing Systems 22, pages 450-458, 2009.
[5] O. Dekel, J. Keshet, and Y. Singer. Large margin hierarchical classification. In Proceedings of the 21st International Conference on Machine Learning, pages 27-35, 2004.
[6] J. Deng, S. Satheesh, A. C. Berg, and F.-F. Li. Fast and balanced: Efficient label tree learning for large scale object recognition. In Advances in Neural Information Processing Systems 24, pages 567-575, 2011.
[7] S. Dumais and H. Chen. Hierarchical classification of web content. In Proceedings of the 23rd annual international ACM SIGIR conference, pages 256-263, 2000.
[8] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
[9] T. Gao and D. Koller. Discriminative learning of relaxed hierarchy for large-scale visual recognition. In IEEE International Conference on Computer Vision (ICCV), pages 2072-2079, 2011.
[10] S. Gopal, Y. Yang, and A. Niculescu-Mizil. Regularization framework for large scale hierarchical classification. In Large Scale Hierarchical Classification, ECML/PKDD Discovery Challenge Workshop, 2012.
[11] S. Gopal, Y. Yang, B. Bai, and A. Niculescu-Mizil. Bayesian models for large-scale hierarchical classification. In Advances in Neural Information Processing Systems 25, 2012.
[12] Y. Guermeur. Sample complexity of classifiers taking values in R^q, application to multi-class SVMs. Communications in Statistics - Theory and Methods, 39, 2010.
[13] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer New York Inc., 2001.
[14] T.-Y. Liu, Y. Yang, H. Wan, H.-J. Zeng, Z. Chen, and W.-Y. Ma. Support vector machines classification with a very large-scale taxonomy. SIGKDD, 2005.
[15] H. Malik. Improving hierarchical SVMs by hierarchy flattening and lazy classification. In 1st Pascal Workshop on Large Scale Hierarchical Classification, 2009.
[16] F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid. Towards good practice in large-scale learning for image classification. In Computer Vision and Pattern Recognition, pages 3482-3489, 2012.
[17] M. Schervish. Theory of Statistics. Springer Series in Statistics. Springer New York Inc., 1995.
[18] X. Wang and B.-L. Lu. Flatten hierarchies for large-scale hierarchical text categorization. In 5th International Conference on Digital Information Management, pages 139-144, 2010.
[19] K. Q. Weinberger and O. Chapelle. Large margin taxonomy embedding for document categorization. In Advances in Neural Information Processing Systems 21, pages 1737-1744, 2008.
[20] Y. Yang and X. Liu. A re-examination of text categorization methods. In Proceedings of the 22nd annual International ACM SIGIR conference, pages 42-49. ACM, 1999.
[21] J. Zhang, L. Tang, and H. Liu. Automatically adjusting content taxonomies for hierarchical classification. In Proceedings of the 4th Workshop on Text Mining, 2006.
-----1
[1] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 10, 2008.
[2] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422-426, 1970.
[3] L. Carter, R. Floyd, J. Gill, G. Markowsky, and M. Wegman. Exact and approximate membership testers. In Proceedings of the tenth annual ACM symposium on Theory of computing, STOC '78, pages 59-65, New York, NY, USA, 1978. ACM.
[4] Y.-N. Chen and H.-T. Lin. Feature-aware label space dimension reduction for multi-label classification. In NIPS, pages 1538-1546, 2012.
[5] W. Cheng and E. Hullermeier. Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, 76(2-3):211-225, 2009.
[6] K. Christensen, A. Roginsky, and M. Jimeno. A new analysis of the false positive rate of a bloom filter. Inf. Process. Lett., 110(21):944-949, Oct. 2010.
[7] O. Dekel and O. Shamir. Multiclass-multilabel classification with more classes than examples. Volume 9, pages 137-144, 2010.
[8] K. Dembczynski, W. Cheng, and E. Hullermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, pages 279-286, 2010.
[9] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier. On label dependence and loss minimization in multi-label classification. Machine Learning, 88(1-2):5-45, 2012.
[10] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. Liblinear: A library for large linear classification. J. Mach. Learn. Res., 9:1871-1874, June 2008.
[11] B. Hariharan, S. V. N. Vishwanathan, and M. Varma. Large Scale Max-Margin Multi-Label Classification with Prior Knowledge about Densely Correlated Labels. In Proceedings of International Conference on Machine Learning, 2010.
[12] D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In NIPS, pages 772-780, 2009.
[13] RCV1. RCV1 Dataset, http://www.daviddlewis.com/resources/testcollections/rcv1/.
[14] J. Read, B. Pfahringer, G. Holmes, and E. Frank. Classifier chains for multi-label classification. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II, ECML PKDD '09, pages 254-269, Berlin, Heidelberg, 2009. Springer-Verlag.
[15] F. Tai and H.-T. Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508-2542, 2012.
[16] G. Tsoumakas, I. Katakis, and I. Vlahavas. A Review of Multi-Label Classification Methods. In Proceedings of the 2nd ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD 2006), pages 99-109, 2006.
[17] Wikipedia. Wikipedia Dataset, http://lshtc.iit.demokritos.gr/.
-----1
[1] L. Bauwens, S. Laurent, and J. V. K. Rombouts. Multivariate GARCH models: a survey. Journal of Applied Econometrics, 21(1):79-109, 2006.
[2] T. Bedford and R. M. Cooke. Probability density decomposition for conditionally dependent random variables modeled by vines. Annals of Mathematics and Artificial Intelligence, 32(1-4):245-268, 2001.
[3] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 2007.
[4] T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307-327, 1986.
[5] T. Bollerslev, R. F. Engle, and J. M. Wooldridge. A capital asset pricing model with time-varying covariances. The Journal of Political Economy, pages 116-131, 1988.
[6] G. Elidan. Copulas and machine learning. In Invited survey to appear in the proceedings of the Copulae in Mathematical and Quantitative Finance workshop, 2012.
[7] R. F. Engle and K. F. Kroner. Multivariate simultaneous generalized ARCH. Econometric theory, 11(1):122-150, 1995.
[8] E. B. Fox and D. B. Dunson. Bayesian nonparametric covariance regression. arXiv:1101.2017, 2011.
[9] J. M. Hernandez-Lobato, D. Hernandez-Lobato, and A. Suarez. GARCH processes with non-parametric innovations for market risk estimation. In Artificial Neural Networks - ICANN 2007, volume 4669 of Lecture Notes in Computer Science, pages 718-727. Springer Berlin Heidelberg, 2007.
[10] H. Joe. Asymptotic efficiency of the two-stage estimation method for copula-based models. Journal of Multivariate Analysis, 94(2):401-419, 2005.
[11] E. Jondeau and M. Rockinger. The Copula-GARCH model of conditional dependencies: An international stock market application. Journal of International Money and Finance, 25(5):827-853, 2006.
[12] D. Lopez-Paz, J. M. Hernandez-Lobato, and Z. Ghahramani. Gaussian process vine copulas for multivariate dependence. In S Dasgupta and D McAllester, editors, JMLR W&CP 28(2): Proceedings of The 30th International Conference on Machine Learning, pages 10-18. JMLR, 2013.
[13] T. P. Minka. Expectation Propagation for approximate Bayesian inference. Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pages 362-369, 2001.
[14] A. Naish-Guzman and S. Holden. The generalized fitc approximation. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1057-1064. MIT Press, Cambridge, MA, 2008.
[15] A. J. Patton. Modelling asymmetric exchange rate dependence. International Economic Review, 47(2):527-556, 2006.
[16] A. Ranaldo and P. Soderlind. Safe haven currencies. Review of Finance, 14(3):385-407, 2010.
[17] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
[18] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statis. Univ. Paris, 8(1):229-231, 1959.
[19] E. Snelson and Z. Ghahramani. Sparse gaussian processes using pseudo-inputs. In Y. Weiss, B. Scholkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1257-1264. MIT Press, Cambridge, MA, 2006.
[20] Y. K. Tse and A. K. C. Tsui. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business & Economic Statistics, 20(3):351-362, 2002.
[21] M. A. J. van Gerven, B. Cseke, F. P. de Lange, and T. Heskes. Efficient bayesian multivariate fmri analysis using a sparsifying spatio-temporal prior. NeuroImage, 50(1):150-161, 2010.
[22] A. G. Wilson and Z. Ghahramani. Generalised Wishart processes. In F. Cozman and A. Pfeffer, editors, Proceedings of the Twenty-Seventh Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), Barcelona, Spain, 2011. AUAI Press.
[23] Y. Wu, J. M. Hernandez-Lobato, and Z. Ghahramani. Dynamic covariance models for multivariate financial time series. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 558-566. JMLR Workshop and Conference Proceedings, 2013.
-----1
[1] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.
[2] R. Turner, M. P. Deisenroth, and C. E. Rasmussen, State-space inference and learning with Gaussian processes, in 13th International Conference on Artificial Intelligence and Statistics, ser. W&CP, Y. W. Teh and M. Titterington, Eds., vol. 9, Chia Laguna, Sardinia, Italy, May 13-15, 2010, pp. 868-875.
[3] C. Andrieu, A. Doucet, and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 72, no. 3, pp. 269-342, 2010.
[4] F. Lindsten, M. Jordan, and T. B. Schon, Ancestor sampling for particle Gibbs, in Advances in Neural Information Processing Systems 25, P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 2012, pp. 2600-2608.
[5] M. Deisenroth, R. Turner, M. Huber, U. Hanebeck, and C. Rasmussen, Robust filtering and smoothing with Gaussian processes, IEEE Transactions on Automatic Control, vol. 57, no. 7, pp. 1865-1871, July 2012.
[6] M. Deisenroth and S. Mohamed, Expectation Propagation in Gaussian process dynamical systems, in Advances in Neural Information Processing Systems 25, P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 2012, pp. 2618-2626.
[7] Z. Ghahramani and S. Roweis, Learning nonlinear dynamical systems using an EM algorithm, in Advances in Neural Information Processing Systems 11, M. J. Kearns, S. A. Solla, and D. A. Cohn, Eds. MIT Press, 1999.
[8] J. Wang, D. Fleet, and A. Hertzmann, Gaussian process dynamical models, in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Scholkopf, and J. Platt, Eds. Cambridge, MA: MIT Press, 2006, pp. 1441-1448.
[9] J. S. Liu, Monte Carlo Strategies in Scientific Computing. Springer, 2001.
[10] A. Doucet and A. Johansen, A tutorial on particle filtering and smoothing: Fifteen years later, in The Oxford Handbook of Nonlinear Filtering, D. Crisan and B. Rozovsky, Eds. Oxford University Press, 2011.
[11] F. Gustafsson, Particle filter theory and practice with positioning applications, IEEE Aerospace and Electronic Systems Magazine, vol. 25, no. 7, pp. 53-82, 2010.
[12] M. K. Pitt and N. Shephard, Filtering via simulation: Auxiliary particle filters, Journal of the American Statistical Association, vol. 94, no. 446, pp. 590-599, 1999.
[13] F. Lindsten and T. B. Schon, Backward simulation methods for Monte Carlo statistical inference, Foundations and Trends in Machine Learning, vol. 6, no. 1, pp. 1-143, 2013.
[14] F. Lindsten and T. B. Schon, On the use of backward simulation in the particle Gibbs sampler, in Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, Mar. 2012.
[15] D. K. Agarwal and A. E. Gelfand, Slice sampling for simulation based fitting of spatial data models, Statistics and Computing, vol. 15, no. 1, pp. 61-69, 2005.
[16] E. Snelson and Z. Ghahramani, Sparse Gaussian processes using pseudo-inputs, in Advances in Neural Information Processing Systems (NIPS), Y. Weiss, B. Scholkopf, and J. Platt, Eds., Cambridge, MA, 2006, pp. 1257-1264.
[17] J. Quinonero-Candela and C. E. Rasmussen, A unifying view of sparse approximate Gaussian process regression, Journal of Machine Learning Research, vol. 6, pp. 1939-1959, 2005.
[18] M. Seeger, C. Williams, and N. Lawrence, Fast Forward Selection to Speed Up Sparse Gaussian Process Regression, in Artificial Intelligence and Statistics 9, 2003.
[19] Y. Chen, M. Welling, and A. Smola, Super-samples from kernel herding, in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), P. Grunwald and P. Spirtes, Eds. AUAI Press, 2010.
[20] M. Deisenroth, Efficient reinforcement learning using Gaussian processes, Ph.D. dissertation, Karlsruher Institut fur Technologie, 2010.
-----1
[1] Eric Brochu, T Brochu, and Nando de Freitas. A Bayesian interactive optimization approach to procedural animation design. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2010.
[2] Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: no regret and experimental design. In ICML, 2010.
[3] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Learning and Intelligent Optimization 5, 2011.
[4] M. A. Osborne, R. Garnett, and S. J. Roberts. Gaussian processes for global optimization. In LION, 2009.
[5] James Bergstra, Remi Bardenet, Yoshua Bengio, and Balazs Kegl. Algorithms for hyper-parameter optimization. In NIPS, 2011.
[6] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In NIPS, 2012.
[7] James Bergstra, Daniel Yamins, and David Cox. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In ICML, 2013.
[8] Carl E. Rasmussen and Christopher Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[9] Iain Murray and Ryan P. Adams. Slice sampling covariance hyperparameters of latent Gaussian models. In NIPS, 2010.
[10] Andre G. Journel and Charles J. Huijbregts. Mining Geostatistics. Academic press London, 1978.
[11] Pierre Goovaerts. Geostatistics for natural resources evaluation. Oxford University Press, 1997.
[12] Matthias Seeger, Yee-Whye Teh, and Michael I. Jordan. Semiparametric latent factor models. In AISTATS, 2005.
[13] Edwin V. Bonilla, Kian Ming A. Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In NIPS, 2008.
[14] Mauricio A Alvarez and Neil D Lawrence. Computationally efficient convolved multiple output Gaussian processes. Journal of Machine Learning Research, 12, 2011.
[15] Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2, 1978.
[16] Matthew Hoffman, Eric Brochu, and Nando de Freitas. Portfolio allocation for Bayesian optimization. In UAI, 2011.
[17] Donald R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21, 2001.
[18] Philipp Hennig and Christian J. Schuler. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13, 2012.
[19] Remi Bardenet, Matyas Brendel, Balazs Kegl, and Michèle Sebag. Collaborative hyperparameter tuning. In ICML, 2013.
[20] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[21] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, Department of Computer Science, University of Toronto, 2009.
[22] Pierre Sermanet, Soumith Chintala, and Yann LeCun. Convolutional neural networks applied to house numbers digit classification. In ICPR, 2012.
[23] Adam Coates, Honglak Lee, and Andrew Y Ng. An analysis of single-layer networks in unsupervised feature learning. AISTATS, 2011.
[24] Robert Gens and Pedro Domingos. Discriminative learning of sum-product networks. In NIPS, 2012.
[25] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint, 2012.
[26] Liefeng Bo, Xiaofeng Ren, and Dieter Fox. Unsupervised feature learning for RGB-D based object recognition. ISER, 2012.
[27] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. NIPS, 2008.
[28] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. An algorithmic framework for performing collaborative filtering. In ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.
[29] Matthew Hoffman, David M. Blei, and Francis Bach. Online learning for latent Dirichlet allocation. In NIPS, 2010.
-----1
[1] Bach, F. R. and Jordan, M. I. Predictive low-rank decomposition for kernel methods. ICML, pp. 33-40, New York, 2005.
[2] Bo, L. and Sminchisescu, C. Twin gaussian processes for structured prediction. IJCV, 87:28-52, 2010.
[3] Y. Cao, M.A. Brubaker, D.J. Fleet, and A. Hertzmann. Project page: supplementary material and software for efficient optimization for sparse gaussian process regression. www.cs.toronto.edu/caoy/opt_sgpr, 2013.
[4] Csato, L. and Opper, M. Sparse on-line gaussian processes. Neural Comput., 14:641-668, 2002.
[5] Dalal, N. and Triggs, B. Histograms of oriented gradients for human detection. IEEE CVPR, pp. 886-893, 2005.
[6] S.S. Keerthi and W. Chu. A matching pursuit approach to sparse gaussian process regression. NIPS 18, pp. 643-650, 2006.
[7] Lawrence, N.D., Seeger, M., and Herbrich, R. Fast sparse gaussian process methods: The informative vector machine. NIPS 15, pp. 609-616, 2003.
[8] Lee, J.J. Libpmk: A pyramid match toolkit. TR: MIT-CSAIL-TR-2008-17, MIT CSAIL, 2008. URL http://hdl.handle.net/1721.1/41070.
[9] Quinonero-Candela, J. and Rasmussen, C.E. A unifying view of sparse approximate gaussian process regression. JMLR, 6:1939-1959, 2005.
[10] Rasmussen, C.E. and Williams, C.K.I. Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, 2006.
[11] Seeger, M., Williams, C.K.I., Lawrence, N.D., and Dp, S.S. Fast forward selection to speed up sparse gaussian process regression. AI & Stats. 9, 2003.
[12] A.J. Smola and P. Bartlett. Sparse greedy gaussian process regression. In Advances in Neural Information Processing Systems 13, pp. 619-625, 2001.
[13] Snelson, Edward and Ghahramani, Zoubin. Sparse gaussian processes using pseudo-inputs. NIPS 18, pp. 1257-1264, 2006.
[14] Titsias, M.K. Variational learning of inducing variables in sparse gaussian processes. JMLR, 5:567-574, 2009.
[15] Christian Walder, Kwang In Kim, and Bernhard Scholkopf. Sparse multiscale gaussian process regression. ICML, pp. 1112-1119, New York, NY, USA, 2008.
-----0
Bishop, C. M. (1999). Variational principal components. In Proceedings Ninth International Conference on Artificial Neural Networks, ICANN'99, pages 509-514.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1st ed. 2006 edition.
MacKay, D. J. (1994). Bayesian non-linear modelling for the energy prediction competition. ASHRAE Transactions, 4:448-472.
Murray, I. and Adams, R. P. (2010). Slice sampling covariance hyperparameters of latent Gaussian models. In Lafferty, J., Williams, C. K. I., Zemel, R., Shawe-Taylor, J., and Culotta, A., editors, Advances in Neural Information Processing Systems 23, pages 1723-1731.
Neal, R. M. (1998). Assessing relevance determination methods using delve. Neural Networks and Machine Learning, pages 97-129.
Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press.
Snelson, E. and Ghahramani, Z. (2006a). Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18, pages 1259-1266. MIT Press.
Snelson, E. and Ghahramani, Z. (2006b). Variable noise and dimensionality reduction for sparse Gaussian processes. In Uncertainty in Artificial Intelligence.
Tipping, M. E. (2001). Sparse bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244.
Titsias, M. K. (2009). Variational learning of inducing variables in sparse Gaussian processes. In Proc. of the 12th International Workshop on AI Stats.
Titsias, M. K. and Lawrence, N. D. (2010). Bayesian Gaussian process latent variable model. Journal of Machine Learning Research Proceedings Track, 9:844-851.
Vivarelli, F. and Williams, C. K. I. (1998). Discovering hidden features with Gaussian processes regression. In Advances in Neural Information Processing Systems, pages 613-619.
Weinberger, K. Q. and Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res., 10:207-244.
Xing, E., Ng, A., Jordan, M., and Russell, S. (2003). Distance metric learning, with application to clustering with side-information.
-----1
[1] Edwin V. Bonilla, Kian Ming Adam Chai, and Christopher K. I. Williams. Multi-task gaussian process prediction. In NIPS, 2007.
[2] Mauricio A. Alvarez and Neil D. Lawrence. Sparse convolved gaussian processes for multi-output regression. In NIPS, pages 57-64, 2008.
[3] Edwin V. Bonilla, Felix V. Agakov, and Christopher K. I. Williams. Kernel multi-task learning using task-specific features. In AISTATS, 2007.
[4] Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, and Maneesh Sahani. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. In NIPS, pages 1881-1888, 2008.
[5] Oliver Stegle, Christoph Lippert, Joris M. Mooij, Neil D. Lawrence, and Karsten M. Borgwardt. Efficient inference in matrix-variate gaussian models with iid observation noise. In NIPS, pages 630-638, 2011.
[6] Karin Meyer. Estimating variances and covariances for multivariate animal models by restricted maximum likelihood. Genetics Selection Evolution, 23(1):67-83, 1991.
[7] V Ducrocq and H Chapuis. Generalizing the use of the canonical transformation for the solution of multivariate mixed model equations. Genetics Selection Evolution, 29(2):205-224, 1997.
[8] Hao Zhang. Maximum-likelihood estimation for multivariate spatial linear coregionalization models. Environmetrics, 18(2):125-139, 2007.
[9] Andrew Gordon Wilson, David A. Knowles, and Zoubin Ghahramani. Gaussian process regression networks. In ICML, 2012.
[10] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.
[11] Alfredo A. Kalaitzis and Neil D. Lawrence. Residual components analysis. In ICML, 2012.
[12] Ciyou Zhu, Richard H. Byrd, Peihuang Lu, and Jorge Nocedal. Algorithm 778: L-bfgs-b: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw., 23(4):550-560, December 1997.
[13] Ulrike Ober, Julien F. Ayroles, Eric A. Stone, Stephen Richards, and et al. Using Whole-Genome Sequence Data to Predict Quantitative Trait Phenotypes in Drosophila melanogaster. PLoS Genetics, 8(5):e1002685+, May 2012.
[14] Erin N Smith and Leonid Kruglyak. Gene-environment interaction in yeast gene expression. PLoS Biology, 6(4):e83, 2008.
[15] S. Atwell, Y. S. Huang, B. J. Vilhjalmsson, Willems, and et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature, 465(7298):627-631, Jun 2010.
-----1
[1] Matthew B Kennel, Jonathon Shlens, Henry DI Abarbanel, and EJ Chichilnisky. Estimating entropy rates with bayesian confidence intervals. Neural Computation, 17(7):1531-1576, 2005.
[2] Abraham Lempel and Jacob Ziv. On the complexity of finite sequences. Information Theory, IEEE Transactions on, 22(1):75-81, 1976.
[3] Ilya Nemenman, Fariel Shafee, and William Bialek. Entropy and inference, revisited. arXiv preprint physics/0108025, 2001.
[4] George Armitage Miller and William Gregory Madow. On the Maximum Likelihood Estimate of the Shannon-Weiner Measure of Information. Operational Applications Laboratory, Air Force Cambridge Research Center, Air Research and Development Command, Bolling Air Force Base, 1954.
[5] Yee Whye Teh, Michael I Jordan, Matthew J Beal, and David M Blei. Hierarchical dirichlet processes. Journal of the American Statistical Association, 101(476), 2006.
[6] Yee Whye Teh. A hierarchical bayesian language model based on pitman-yor processes. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 985-992. Association for Computational Linguistics, 2006.
[7] Frank Wood, Cedric Archambeau, Jan Gasthaus, Lancelot James, and Yee Whye Teh. A stochastic memoizer for sequence data. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1129-1136. ACM, 2009.
[8] V. J. Uzzell and E. J. Chichilnisky. Precision of spike trains in primate retinal ganglion cells. Journal of Neurophysiology, 92:780-789, 2004.
-----1
[1] R. Atar and T. Weissman. Mutual information, relative entropy, and estimation in the Poisson channel. IEEE Transactions on Information Theory, 58(3):1302–1318, March 2012.
[2] A. Banerjee, S. Merugu, I.S. Dhillon, and J. Ghosh. Clustering with bregman divergences. JMLR, 2005.
[3] M.W. Berry, M. Browne, A.N. Langville, V.P. Pauca, and R. J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 2007.
[4] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. JMLR, 2003.
[5] L.M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR computational mathematics and mathematical physics, 1967.
[6] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. on Inform. Theory, 2006.
[7] W.R. Carson, M. Chen, M.R.D. Rodrigues, R. Calderbank, and L. Carin. Communications-inspired projection design with application to compressive sensing. SIAM J. Imaging Sciences, 2013.
[8] M. Chen, W. Carson, M. Rodrigues, R. Calderbank, and L. Carin. Communications inspired linear discriminant analysis. In ICML, 2012.
[9] G.B. Folland. Real Analysis: Modern Techniques and Their Applications. Wiley New York, 1999.
[10] D. Guo. Information and estimation over binomial and negative binomial models. arXiv preprint arXiv:1207.7144, 2012.
[11] D. Guo, S. Shamai, and S. Verdú. Mutual information and minimum mean-square error in Gaussian channels. IEEE Transactions on Information Theory, 51(4):1261–1282, April 2005.
[12] D. Guo, S. Shamai, and S. Verdú. Mutual information and conditional mean estimation in Poisson channels. IEEE Transactions on Information Theory, 54(5):1837–1849, May 2008.
[13] S.M. Haas and J.H. Shapiro. Capacity of wireless optical communications. IEEE Journal on Selected Areas in Communications, 21(8):1346–1357, Aug. 2003.
[14] M. Hellman and J. Raviv. Probability of error, equivocation, and the Chernoff bound. IEEE Transactions on Information Theory, 1970.
[15] A. Lapidoth and S. Shamai. The Poisson multiple-access channel. IEEE Transactions on Information Theory, 44(2):488–501, Feb. 1998.
[16] R.S. Liptser and A.N. Shiryaev. Statistics of Random Processes: II. Applications, volume 2. Springer, 2000.
[17] D.P. Palomar and S. Verdú. Gradient of mutual information in linear vector Gaussian channels. IEEE Transactions on Information Theory, 52(1):141–154, Jan. 2006.
[18] D.P. Palomar and S. Verdú. Representation of mutual information via input estimates. IEEE Transactions on Information Theory, 53(2):453–470, Feb. 2007.
[19] S. Prasad. Certain relations between mutual information and fidelity of statistical estimation. http://arxiv.org/pdf/1010.1508v1.pdf, 2012.
[20] M. Raginsky, R.M. Willett, Z.T. Harmany, and R.F. Marcia. Compressed sensing performance bounds under poisson noise. IEEE Trans. Signal Processing, 2010.
[21] M. Seeger, H. Nickisch, R. Pohmann, and B. Schoelkopf. Optimization of k-space trajectories for compressed sensing by Bayesian experimental design. Magnetic Resonance in Medicine, 2010.
[22] C.G. Taborda and F. Perez-Cruz. Mutual information and relative entropy over the binomial and negative binomial channels. In IEEE International Symposium on Information Theory Proceedings (ISIT), pages 696–700. IEEE, 2012.
[23] S. Verdú. Mismatched estimation and relative entropy. IEEE Transactions on Information Theory, 56(8):3712–3720, Aug. 2010.
[24] T. Weissman. The relationship between causal and noncausal mismatched estimation in continuous-time AWGN channels. IEEE Transactions on Information Theory, 2010.
[25] D.S. Wilcox, G.T. Buzzard, B.J. Lucier, P. Wang, and D. Ben-Amotz. Photon level chemical classification using digital compressive detection. Analytica Chimica Acta, 2012.
[26] M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. AISTATS, 2012.
-----1
[1] A. Agarwal, S. Negahban, and M. J. Wainwright. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. Annals of Statistics, 40(2):1171–1197, 2012.
[2] E. J. Candès, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
[3] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3), May 2011.
[4] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. In 48th Annual Allerton Conference on Communication, Control and Computing, 2010.
[5] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2), 2011.
[6] V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky. Latent variable graphical model selection via convex optimization. Annals of Statistics (with discussion), 40(4), 2012.
[7] D. Hsu, S. M. Kakade, and T. Zhang. Robust matrix decomposition with sparse corruptions. IEEE Trans. Inform. Theory, 57:7221–7234, 2011.
[8] A. Jalali, P. Ravikumar, S. Sanghavi, and C. Ruan. A dirty model for multi-task learning. In Neur. Info. Proc. Sys. (NIPS), 23, 2010.
[9] M. McCoy and J. A. Tropp. Two proposals for robust PCA using semidefinite programming. Electron. J. Statist., 5:1123–1160, 2011.
[10] S. Negahban and M. J. Wainwright. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Annals of Statistics, 39(2):1069–1097, 2011.
[11] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4):538–557, 2012.
[12] G. Raskutti, M. J. Wainwright, and B. Yu. Restricted eigenvalue properties for correlated Gaussian designs. Journal of Machine Learning Research (JMLR), 99:2241–2259, 2010.
[13] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
[14] H. Xu and C. Leng. Robust multi-task regression with grossly corrupted observations. Inter. Conf. on AI and Statistics (AISTATS), 2012.
[15] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. IEEE Transactions on Information Theory, 58(5):3047–3064, 2012.
-----1
[1] Ferguson, T. S. (1973) A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1(2):209–230.
[2] Teh, Y. W. (2010) Dirichlet Processes. In Encyclopedia of Machine Learning. Springer.
[3] Kingman, J. F. C. (1992). Poisson processes. Oxford University Press.
[4] Pitman, J., & Yor, M. (1997) The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25:855-900.
[5] Pitman, J. (2006) Combinatorial Stochastic Processes. Lecture Notes in Mathematics. Springer-Verlag.
[6] Sethuraman, J. (1994) A constructive definition of Dirichlet priors. Statistica Sinica, 4, 639-650.
[7] Neal, R. M. (2000) Markov chain sampling methods for Dirichlet process mixture models, Journal of Computational and Graphical Statistics, 9:249–265.
[8] Meeds, E., Ghahramani, Z., Neal, R., & Roweis, S. (2007) Modelling dyadic data with binary latent factors. In Advances in Neural Information Processing 19.
[9] Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2006) Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581.
[10] Griffiths, T. L. and Ghahramani, Z. (2011) The Indian buffet process: An introduction and review. Journal of Machine Learning Research, 12:1185–1224.
[11] Broderick, T., Pitman, J., & Jordan, M. I. (2013). Feature allocations, probability functions, and paintboxes. arXiv preprint arXiv:1301.6647.
[12] Teh, Y. W., Blundell, C., & Elliott, L. T. (2011). Modelling genetic variations with fragmentation-coagulation processes. In Advances in Neural Information Processing Systems 23.
[13] Orbanz, P. & Teh, Y. W. (2010). Bayesian Nonparametric Models. In Encyclopedia of Machine Learning. Springer.
[14] Medvedovic, M. & Sivaganesan, S. (2002) Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics, 18:1194–1206.
[15] Medvedovic, M., Yeung, K. and Bumgarner, R. (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20:1222–1232.
[16] Liu X., Sivaganesan, S., Yeung, K.Y., Guo, J., Bumgarner, R. E. and Medvedovic, M. (2006) Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset. Bioinformatics, 22:1737-1744.
[17] Shannon, C. E. (1948) A Mathematical Theory of Communication. Bell System Technical Journal 27(3):379–423.
[18] I. Nemenman, F. Shafee, & W. Bialek. (2002) Entropy and inference, revisited. In Advances in Neural Information Processing Systems, 14.
[19] Archer, E., Park, I. M., & Pillow, J. (2013) Bayesian Entropy Estimation for Countable Discrete Distributions. arXiv preprint arXiv:1302.0328.
[20] Simovici, D. (2007) On Generalized Entropy and Entropic Metrics. Journal of Multiple Valued Logic and Soft Computing, 13(4/6):295.
[21] Ellerman, D. (2009) Counting distinctions: on the conceptual foundations of Shannon's information theory. Synthese, 168(1):119-149.
[22] Neal, R. M. (1992) Bayesian mixture modeling, in Maximum Entropy and Bayesian Methods: Proceedings of the 11th International Workshop on Maximum Entropy and Bayesian Methods of Statistical Analysis, Seattle, 1991, eds, Smith, Erickson, & Neudorfer, Dordrecht: Kluwer Academic Publishers, 197-211.
[23] Eisen, M. B., Spellman, P. T., Brown, P. O., & Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863-14868.
[24] Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179-188.
[25] Ideker, T., Thorsson, V., Ranish, J. A., Christmas, R., Buhler, J., Eng, J. K., Bumgarner, R., Goodlett, D. R., Aebersold, R. & Hood, L. (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science, 292(5518):929-934.
[26] Pevehouse, J. C., Nordstrom, T. & Warnke, K. (2004) The COW-2 International Organizations Dataset Version 2.0. Conflict Management and Peace Science 21(2):101-119. http://www.correlatesofwar.org/COW2%20Data/IGOs/IGOv2-1.htm
-----1
[1] Yee Whye Teh. Dirichlet processes. In Encyclopedia of Machine Learning. Springer, New York, 2010.
[2] Radford M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, 2000.
[3] David M. Blei and Michael I. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1):121–144, 2006.
[4] Carlos M. Carvalho, Hedibert F. Lopes, Nicholas G. Polson, and Matt A. Taddy. Particle learning for general mixtures. Bayesian Analysis, 5(4):709–740, 2010.
[5] Steven N. MacEachern. Dependent nonparametric processes. In Proceedings of the Bayesian Statistical Science Section. American Statistical Association, 1999.
[6] Dahua Lin, Eric Grimson, and John Fisher. Construction of dependent dirichlet processes based on poisson processes. In Neural Information Processing Systems, 2010.
[7] Matt Hoffman, David Blei, Chong Wang, and John Paisley. Stochastic variational inference. arXiv ePrint 1206.7051, 2012.
[8] Finale Doshi-Velez and Zoubin Ghahramani. Accelerated sampling for the indian buffet process. In Proceedings of the International Conference on Machine Learning, 2009.
[9] Felix Endres, Christian Plagemann, Cyrill Stachniss, and Wolfram Burgard. Unsupervised discovery of object classes from range data using latent dirichlet allocation. In Robotics Science and Systems, 2005.
[10] Matthias Luber, Kai Arras, Christian Plagemann, and Wolfram Burgard. Classifying dynamic objects: An unsupervised learning approach. In Robotics Science and Systems, 2004.
[11] Zhikun Wang, Marc Deisenroth, Heni Ben Amor, David Vogt, Bernhard Schölkopf, and Jan Peters. Probabilistic modeling of human movements for intention inference. In Robotics Science and Systems, 2008.
[12] Stuart P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
[13] Dan Pelleg and Andrew Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In Proceedings of the 17th International Conference on Machine Learning, 2000.
[14] Robert Tibshirani, Guenther Walther, and Trevor Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society B, 63(2):411–423, 2001.
[15] Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, Scotland, 2012.
[16] Thomas S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230, 1973.
[17] Jayaram Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
[18] Tsunenori Ishioka. Extended k-means with an efficient estimation of the number of clusters. In Proceedings of the 2nd International Conference on Intelligent Data Engineering and Automated Learning, pages 17–22, 2000.
[19] Myra Spiliopoulou, Irene Ntoutsi, Yannis Theodoridis, and Rene Schult. Monic - modeling and monitoring cluster transitions. In Proceedings of the 12th International Conference on Knowledge Discovering and Data Mining, pages 706–711, 2006.
[20] Panos Kalnis, Nikos Mamoulis, and Spiridon Bakiras. On discovering moving clusters in spatio-temporal data. In Proceedings of the 9th International Symposium on Spatial and Temporal Databases, pages 364–381. Springer, 2005.
[21] Deepayan Chakraborti, Ravi Kumar, and Andrew Tomkins. Evolutionary clustering. In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[22] Kevin Xu, Mark Kliger, and Alfred Hero III. Adaptive evolutionary clustering. Data Mining and Knowledge Discovery, pages 1–33, 2012.
[23] Carlos M. Carvalho, Michael S. Johannes, Hedibert F. Lopes, and Nicholas G. Polson. Particle learning and smoothing. Statistical Science, 25(1):88–106, 2010.
[24] Carine Hue, Jean-Pierre Le Cadre, and Patrick Perez. Tracking multiple objects with particle filtering. IEEE Transactions on Aerospace and Electronic Systems, 38(3):791–812, 2002.
[25] Jaco Vermaak, Arnaud Doucet, and Patrick Perez. Maintaining multi-modality through mixture tracking. In Proceedings of the 9th IEEE International Conference on Computer Vision, 2003.
[26] Jasper Snoek, Hugo Larochelle, and Ryan Adams. Practical bayesian optimization of machine learning algorithms. In Neural Information Processing Systems, 2012.
-----1
[1] H. Alt and L. Guibas, Discrete geometric shapes: matching, interpolation, and approximation, in: J.-R. Sack, J. Urrutia (Eds.), Handbook of Computational Geometry, Elsevier, Amsterdam, 1999, pp. 121-153.
[2] K. S. Arun, T. S. Huang and S. D. Blostein: Least-Squares Fitting of Two 3-D Point Sets. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 9(5):698-700, 1987.
[3] R. Berezney. Regulating the mammalian genome: the role of nuclear architecture. Advances in Enzyme Regulation, 42:39-52, 2002.
[4] P.J. Besl and N.D. McKay, A method for registration of 3-d shapes, IEEE Trans. Pattern Anal. Mach. Intell. 14 (2) 239-256, 1992.
[5] S. Belongie, J. Malik and J. Puzicha. Shape Matching and Object Recognition Using Shape Contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24(4): 509-522 (2002) 
[6] J. Croft, J. Bridger, S. Boyle, P. Perry, P. Teague and W. Bickmore. Differences in the localization and morphology of chromosomes in the human nucleus. J. Cell. Biol., 145(6):1119-1131, 1999.
[7] T. Cremer, M. Cremer, S. Dietzel, S. Müller, I. Solovei, and S. Fakan. Chromosome territories – a functional nuclear landscape. Curr. Opin. Cell. Biol., 18(3):307-316, 2006.
[8] S. D. Cohen and L. J. Guibas. The Earth Mover's Distance under Transformation Sets. In ICCV, 1076-1083, 1999.
[9] M. Charikar, S. Guha, E. Tardos and D. B. Shmoys. A constant-factor approximation algorithm for the k-median problem (extended abstract). In Proceedings of the thirtieth annual ACM symposium on Theory of computing, STOC '99, pages 1-10, New York, NY, USA, 1999.
[10] H. Ding, B. Stojkovic, R. Berezney and J. Xu. Gauging Association Patterns of Chromosome Territories via Chromatic Median. In CVPR 2013: 1296-1303 
[11] M. F. Demirci, A. Shokoufandeh and S. J. Dickinson. Skeletal Shape Abstraction from Examples. IEEE Trans. Pattern Anal. Mach. Intell. 31(5): 944-952, 2009.
[12] H. Ding and J. Xu. Solving the Chromatic Cone Clustering Problem via Minimum Spanning Sphere. In ICALP (1) 2011: 773-784 
[13] H. Ding and J. Xu. FPTAS for Minimizing Earth Mover's Distance under Rigid Transformations. In ESA 2013: 397-408
[14] M. Ferrer, D. Karatzas, E. Valveny, I. Bardají and H. Bunke. A generic framework for median graph computation based on a recursive embedding approach. Computer Vision and Image Understanding 115(7): 919-928, 2011.
[15] J. C. Gower. Generalized Procrustes analysis. Psychometrika 40: 33–51, 1975.
[16] R. I. Hartley and F. Kahl. Global Optimization through Rotation Space Search. International Journal of Computer Vision 82(1): 64-79 (2009) 
[17] T. Jiang, F. Jurie and C. Schmid. Learning shape prior models for object matching. In CVPR 2009: 848-855 
[18] X. Jiang, A. Munger and H. Bunke. On Median Graphs: Properties, Algorithms, and Applications, IEEE TPAMI, vol. 23, no. 10, pp. 1144-1151, Oct. 2001.
[19] S. Li and O. Svensson: Approximating k-median via pseudo-approximation. In STOC 2013: 901-910.
[20] D. Macrini, K. Siddiqi and S. J. Dickinson. From skeletons to bone graphs: Medial abstraction for object recognition. In CVPR, 2008.
[21] T. B. Sebastian, P. N. Klein and B. B. Kimia. Recognition of Shapes by Editing Shock Graphs. In ICCV, 755-762, 2001.
[22] N. H. Trinh and B. B. Kimia. Learning Prototypical Shapes for Object Categories. In SMiCV, 2010.
[23] E. Weiszfeld. On the point for which the sum of the distances to n given points is minimum. Tohoku Math. Journal, 43:355-386, 1937.
[24] D. B. West, Introduction to Graph Theory , Prentice Hall, Chapter 3, ISBN 0-13-014400-2, 1999.
[25] M.J. Zeitz, L. Mukherjee, J. Xu, S. Bhattacharya and R. Berezney. A Probabilistic Model for the Arrangement of a Subset of Chromosome Territories in WI38 Human Fibroblasts. Journal of Cellular Physiology, 221, 120-129, 2009.
-----1
[1] R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 2002.
[2] P. Awasthi and M. Balcan. Center based clustering: A foundational perspective. Survey Chapter in Handbook of Cluster Analysis (Manuscript), 2013.
[3] J. Considine, F. Li, G. Kollios, and J. Byers. Approximate aggregation techniques for sensor databases. In Proceedings of the International Conference on Data Engineering, 2004.
[4] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Google's globally-distributed database. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, 2012.
[5] S. Dutta, C. Gianella, and H. Kargupta. K-means clustering over peer-to-peer networks. In Proceedings of the International Workshop on High Performance and Distributed Mining, 2005.
[6] D. Feldman and M. Langberg. A unified framework for approximating and clustering data. In Proceedings of the Annual ACM Symposium on Theory of Computing, 2011.
[7] D. Feldman, A. Sugaya, and D. Rus. An effective coreset compression algorithm for large scale sensor networks. In Proceedings of the International Conference on Information Processing in Sensor Networks, 2012.
[8] G. Forman and B. Zhang. Distributed data clustering can be efficient and exact. ACM SIGKDD Explorations Newsletter, 2000.
[9] S. Greenhill and S. Venkatesh. Distributed query processing for mobile surveillance. In Proceedings of the International Conference on Multimedia, 2007.
[10] M. Greenwald and S. Khanna. Power-conserving computation of order-statistics over sensor networks. In Proceedings of the ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2004.
[11] S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the Annual ACM Symposium on Theory of Computing, 2004.
[12] E. Januzaj, H. Kriegel, and M. Pfeifle. Towards effective and efficient distributed clustering. In Workshop on Clustering Large Data Sets in the IEEE International Conference on Data Mining, 2003.
[13] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. A local search approximation algorithm for k-means clustering. In Proceedings of the Annual Symposium on Computational Geometry, 2002.
[14] H. Kargupta, W. Huang, K. Sivakumar, and E. Johnson. Distributed clustering using collective principal component analysis. Knowledge and Information Systems, 2001.
[15] S. Li and O. Svensson. Approximating k-median via pseudo-approximation. In Proceedings of the Annual ACM Symposium on Theory of Computing, 2013.
[16] Y. Li, P. M. Long, and A. Srinivasan. Improved bounds on the sample complexity of learning. In Proceedings of the eleventh annual ACM-SIAM Symposium on Discrete Algorithms, 2000.
[17] S. Mitra, M. Agrawal, A. Yadav, N. Carlsson, D. Eager, and A. Mahanti. Characterizing web-based video sharing workloads. ACM Transactions on the Web, 2011.
[18] C. Olston, J. Jiang, and J. Widom. Adaptive filters for continuous queries over distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2003.
[19] D. Tasoulis and M. Vrahatis. Unsupervised distributed clustering. In Proceedings of the International Conference on Parallel and Distributed Computing and Networks, 2004.
[20] Q. Zhang, J. Liu, and W. Wang. Approximate clustering on distributed data streams. In Proceedings of the IEEE International Conference on Data Engineering, 2008.
-----1
[1] Raman Arora, M Gupta, Amol Kapila, and Maryam Fazel. Clustering by left-stochastic matrix factorization. In International Conference on Machine Learning (ICML), pages 761–768, 2011.
[2] A. Bertozzi and A. Flenner. Diffuse Interface Models on Graphs for Classification of High Dimensional Data. Multiscale Modeling and Simulation, 10(3):1090–1118, 2012.
[3] X. Bresson, T. Laurent, D. Uminsky, and J. von Brecht. Convergence and energy landscape for Cheeger cut clustering. In Advances in Neural Information Processing Systems (NIPS), pages 1394–1402, 2012.
[4] X. Bresson, X.-C. Tai, T.F. Chan, and A. Szlam. Multi-Class Transductive Learning based on ℓ1 Relaxations of Cheeger Cut and Mumford-Shah-Potts Model. UCLA CAM Report, 2012.
[5] T. Bühler and M. Hein. Spectral Clustering Based on the Graph p-Laplacian. In International Conference on Machine Learning (ICML), pages 81–88, 2009.
[6] A. Chambolle and T. Pock. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
[7] J. Cheeger. A Lower Bound for the Smallest Eigenvalue of the Laplacian. Problems in Analysis, pages 195–199, 1970.
[8] F. R. K. Chung. Spectral Graph Theory, volume 92 of CBMS Regional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences, Washington, DC, 1997.
[9] Chris Ding, Xiaofeng He, and Horst D Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. SIAM Data Mining Conf, number 4, pages 606–610, 2005.
[10] C. Garcia-Cardona, E. Merkurjev, A. L. Bertozzi, A. Flenner, and A. G. Percus. Fast multiclass segmentation using diffuse interface methods on graphs. Submitted, 2013.
[11] M. Hein and T. Bühler. An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA. In Advances in Neural Information Processing Systems (NIPS), pages 847–855, 2010.
[12] M. Hein and S. Setzer. Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts. In Advances in Neural Information Processing Systems (NIPS), 2011.
[13] E. Merkurjev, T. Kostic, and A. Bertozzi. An MBO scheme on graphs for segmentation and image processing. UCLA CAM Report 12-46, 2012.
[14] C. Michelot. A Finite Algorithm for Finding the Projection of a Point onto the Canonical Simplex of R^n. Journal of Optimization Theory and Applications, 50(1):195–200, 1986.
[15] S. Rangapuram and M. Hein. Constrained 1-Spectral Clustering. In International conference on Artificial Intelligence and Statistics (AISTATS), pages 1143–1151, 2012.
[16] J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(8):888–905, 2000.
[17] A. Szlam and X. Bresson. A total variation-based graph clustering algorithm for cheeger ratio cuts. UCLA CAM Report 09-68, 2009.
[18] A. Szlam and X. Bresson. Total variation and Cheeger cuts. In International Conference on Machine Learning (ICML), pages 1039–1046, 2010.
[19] Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, and Erkki Oja. Clustering by nonnegative matrix factorization using graph random walk. In Advances in Neural Information Processing Systems (NIPS), pages 1088–1096, 2012.
[20] Zhirong Yang and Erkki Oja. Clustering by low-rank doubly stochastic matrix decomposition. In International Conference on Machine Learning (ICML), 2012.
-----1
[1] S. Lloyd. Least squares quantization in PCM. Information Theory, IEEE Transactions on, 28(2):129–137, 1982.
[2] P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981.
[3] J.A. Hartigan and M.A. Wong. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108, 1979.
[4] R. Ostrovsky, Y. Rabani, L.J. Schulman, and C. Swamy. The effectiveness of Lloyd-type methods for the k-means problem. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on, pages 165–176. IEEE, 2006.
[5] P. Hansen, E. Ngai, B.K. Cheung, and N. Mladenovic. Analysis of global k-means, an incremental heuristic for minimum sum-of-squares clustering. Journal of classification, 22(2):287–310, 2005.
[6] G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York, 1998.
[7] Mikhail Belkin and Kaushik Sinha. Polynomial learning of distribution families. In FOCS 2010: Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science, pages 103–112. IEEE Computer Society, 2010.
[8] G. Chen and M. Maggioni. Multiscale geometric and spectral analysis of plane arrangements. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2825–2832. IEEE, 2011.
[9] Yaoliang Yu and Dale Schuurmans. Rank/norm regularization with closed-form solutions: Application to subspace clustering. In Fabio Gagliardi Cozman and Avi Pfeffer, editors, UAI, pages 778–785. AUAI Press, 2011.
[10] M. Soltanolkotabi and E.J. Candès. A geometric analysis of subspace clustering with outliers. Arxiv preprint arXiv:1112.4258, 2011.
[11] A. Maurer and M. Pontil. k-dimensional coding schemes in Hilbert spaces. Information Theory, IEEE Transactions on, 56(11):5839–5846, 2010.
[12] A.J. Smola, S. Mika, B. Schölkopf, and R.C. Williamson. Regularized principal manifolds. The Journal of Machine Learning Research, 1:179–209, 2001.
[13] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39(1):1–38, 1977.
[14] R.N. Dave and R. Krishnapuram. Robust clustering methods: a unified view. Fuzzy Systems, IEEE Transactions on, 5(2):270–293, 1997.
[15] Huan Xu, Constantine Caramanis, and Shie Mannor. Outlier-robust PCA: The high-dimensional case. IEEE transactions on information theory, 59(1):546–572, 2013.
[16] B. Zhang. Regression clustering. In Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, pages 451–458. IEEE, 2003.
[17] P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley & Sons, New York, 1987.
[18] R. A. Maronna, R. D. Martin, and V. J. Yohai. Robust Statistics: Theory and Methods. John Wiley & Sons, New York, 2006.
[19] Olivier Bousquet and Andre Elisseeff. Stability and generalization. The Journal of Machine Learning Research, 2:499–526, 2002.
[20] M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The planar k-means problem is NP-hard. WALCOM: Algorithms and Computation, pages 274–285, 2009.
[21] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[22] Roberto Tron and Rene Vidal. A benchmark for the comparison of 3-d motion segmentation algorithms. In CVPR. IEEE Computer Society, 2007.
[23] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning, pages 272–279, 2008.
-----1
[1] K. Chaudhuri, F. Chung, and A. Tsiatas. Spectral clustering of graphs with general degrees in the extended planted partition model. Journal of Machine Learning Research, pages 1–23, 2012.
[2] Arash A Amini, Aiyou Chen, Peter J Bickel, and Elizaveta Levina. Pseudo-likelihood methods for community detection in large sparse networks. 2012.
[3] F. McSherry. Spectral partitioning of random graphs. In Foundations of Computer Science, 2001. Proceedings. 42nd IEEE Symposium on, pages 529–537. IEEE, 2001.
[4] Anirban Dasgupta, John E Hopcroft, and Frank McSherry. Spectral analysis of random graphs with skewed degree distributions. In Foundations of Computer Science, 2004. Proceedings. 45th Annual IEEE Symposium on, pages 602–610. IEEE, 2004.
[5] Amin Coja-Oghlan and Andre Lanka. Finding planted partitions in random graphs with general degree distributions. SIAM Journal on Discrete Mathematics, 23(4):1682–1714, 2009.
[6] Brendan PW Ames and Stephen A Vavasis. Convex optimization for the planted k-disjoint-clique problem. arXiv preprint arXiv:1008.2814, 2010.
[7] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878–1915, 2011.
[8] D.L. Sussman, M. Tang, D.E. Fishkind, and C.E. Priebe. A consistent adjacency spectral embedding for stochastic blockmodel graphs. Journal of the American Statistical Association, 107(499):1119–1128, 2012.
[9] P.W. Holland and S. Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109–137, 1983.
[10] Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.
[11] Michael W Mahoney. Randomized algorithms for matrices and data. Advances in Machine Learning and Data Mining for Astronomy, CRC Press, Taylor & Francis Group, Eds.: Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, Ashok N. Srivastava, p. 647-672, 1:647–672, 2012.
[12] Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems, 2:849–856, 2002.
[13] Jiashun Jin. Fast network community detection by score. arXiv preprint arXiv:1211.5803, 2012.
[14] Fan Chung and Mary Radcliffe. On the spectra of general random graphs. the electronic journal of combinatorics, 18(P215):1, 2011.
[15] Peter H Schonemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.
[16] D.S. Choi, P.J. Wolfe, and E.M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, 99(2):273–284, 2012.
[17] Lada A Adamic and Natalie Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on Link discovery, pages 36–43. ACM, 2005.
-----1
[1] David Pollard. Strong consistency of k-means clustering. The Annals of Statistics, 9(1):135-140, 1981.
[2] Gábor Lugosi and Kenneth Zeger. Rates of convergence in the source coding theorem, in empirical quantizer design, and in universal lossy source coding. IEEE Trans. Inform. Theory, 40:1728-1740, 1994.
[3] Shai Ben-David. A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In COLT, pages 415-426. Springer, 2004.
[4] Alexander Rakhlin and Andrea Caponnetto. Stability of k-means clustering. In NIPS, pages 1121-1128, 2006.
[5] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. Wiley, 2 edition, 2001.
[6] Thomas S. Ferguson. A course in large sample theory. Chapman & Hall, 1996.
[7] David Pollard. A central limit theorem for k-means clustering. The Annals of Probability, 10(4):919-926, 1982.
[8] Shai Ben-David, Ulrike Von Luxburg, and David Pal. A sober look at clustering stability. In COLT, pages 5-19. Springer, 2006.
[9] Ohad Shamir and Naftali Tishby. Cluster stability for finite samples. In Annals of Probability, 10(4), pages 919-926, 1982.
[10] Ohad Shamir and Naftali Tishby. Model selection and stability in k-means clustering. In COLT, 2008.
[11] Stephane Boucheron, Olivier Bousquet, and Gábor Lugosi. Theory of classification: a survey of recent advances. ESAIM: Probability and Statistics, 9:323-375, 2005.
[12] Stephane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford, 2013.
[13] Jon Wellner. Consistency and rates of convergence for maximum likelihood estimators via empirical process theory. 2005.
[14] Aad van der Vaart and Jon Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
[15] FuChang Gao and Jon A. Wellner. On the rate of convergence of the maximum likelihood estimator of a k-monotone density. Science in China Series A: Mathematics, 52(7):1525-1538, 2009.
[16] Yair Al Censor and Stavros A. Zenios. Parallel Optimization: Theory, Algorithms and Applications. Oxford University Press, 1997.
[17] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705-1749, 2005.
[18] Terence Tao. 254a notes 1: Concentration of measure, January 2010. URL http://terrytao.wordpress.com/2010/01/03/254a-notes-1-concentration-of-measure/.
[19] I. F. Pinelis and S. A. Utev. Estimates of the moments of sums of independent random variables. Teor. Veroyatnost. i Primenen., 29(3):554-557, 1984. Translation to English by Bernard Seckler.
[20] Shai Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, The Hebrew University of Jerusalem, July 2007.
[21] Jean-Baptiste Hiriart-Urruty and Claude Lemarechal. Fundamentals of Convex Analysis. Springer Publishing Company, Incorporated, 2001.
-----1
[1] D. Angluin and P. Laird. Learning from noisy examples. Machine Learning, 2:343-370, 1988.
[2] J. Aslam and S. Decatur. Specification and simulation of statistical query algorithms for efficiency and noise tolerance. JCSS, 56:191-208, 1998.
[3] M. Balcan, A. Broder, and T. Zhang. Margin based active learning. In COLT, pages 35-50, 2007.
[4] M. F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In ICML, 2006.
[5] M. F. Balcan and V. Feldman. Statistical active learning algorithms, 2013. ArXiv:1307.3102.
[6] M.-F. Balcan and S. Hanneke. Robust interactive learning. In COLT, 2012.
[7] M.-F. Balcan, S. Hanneke, and J. Wortman. The true sample complexity of active learning. In COLT, 2008.
[8] M.-F. Balcan and P. M. Long. Active and passive learning of linear separators under log-concave distributions. JMLR - COLT proceedings (to appear), 2013.
[9] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML, pages 49-56, 2009.
[10] A. Beygelzimer, D. Hsu, J. Langford, and T. Zhang. Agnostic active learning without constraints. In NIPS, 2010.
[11] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In Proceedings of PODS, pages 128-138, 2005.
[12] A. Blum, A. Frieze, R. Kannan, and S. Vempala. A polynomial time algorithm for learning noisy linear threshold functions. Algorithmica, 22(1/2):35-52, 1997.
[13] A. Blum, M. Furst, J. Jackson, M. Kearns, Y. Mansour, and S. Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, pages 253-262, 1994.
[14] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929-965, 1989.
[15] R. Castro and R. Nowak. Minimax bounds for active learning. In COLT, 2007.
[16] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Learning noisy linear classifiers via adaptive and selective sampling. Machine Learning, 2010.
[17] C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In Proceedings of NIPS, pages 281-288, 2006.
[18] S. Dasgupta. Coarse sample complexity bounds for active learning. In NIPS, volume 18, 2005.
[19] S. Dasgupta. Active learning. Encyclopedia of Machine Learning, 2011.
[20] S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In ICML, pages 208-215, 2008.
[21] S. Dasgupta, D.J. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. NIPS, 20, 2007.
[22] S. Dasgupta, A. Tauman Kalai, and C. Monteleoni. Analysis of perceptron-based active learning. Journal of Machine Learning Research, 10:281-299, 2009.
[23] O. Dekel, C. Gentile, and K. Sridharan. Selective sampling and active learning from single and multiple teachers. JMLR, 2012.
[24] J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. In STOC, pages 315-320, 2004.
[25] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265-284, 2006.
[26] V. Feldman. A complete characterization of statistical query learning with applications to evolvability. Journal of Computer System Sciences, 78(5):1444-1459, 2012.
[27] Y. Freund, H.S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3):133-168, 1997.
[28] A. Gonen, S. Sabato, and S. Shalev-Shwartz. Efficient pool-based active learning of halfspaces. In ICML, 2013.
[29] S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML, 2007.
[30] M. Kearns. Efficient noise-tolerant learning from statistical queries. JACM, 45(6):983-1006, 1998.
[31] V. Koltchinskii. Rademacher complexities and bounding the excess risk in active learning. JMLR, 11:2457-2485, 2010.
[32] L. Lovasz and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Struct. Algorithms, 30(3):307-358, 2007.
[33] A. McCallum and K. Nigam. Employing EM in pool-based active learning for text classification. In ICML, pages 350-358, 1998.
[34] M. Raginsky and A. Rakhlin. Lower bounds for passive and active learning. In NIPS, pages 1026-1034, 2011.
[35] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.
-----1
[1] S. D. Babacan, R. Molina, M. N. Do, and A. K. Katsaggelos. Bayesian blind deconvolution with general sparse image priors. In ECCV, 2012.
[2] E. Candès and Y. Plan. Near-ideal model selection by ℓ1 minimization. The Annals of Statistics, (5A):2145-2177.
[3] R. Chartrand and W. Yin. Iteratively reweighted algorithms for compressive sensing. In ICASSP, 2008.
[4] S. Cho, H. Cho, Y.-W. Tai, and S. Lee. Registration based non-uniform motion deblurring. Comput. Graph. Forum, 31(7-2):2183-2192, 2012.
[5] S. Cho and S. Lee. Fast motion deblurring. In SIGGRAPH ASIA, 2009.
[6] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. In SIGGRAPH, 2006.
[7] A. Gupta, N. Joshi, C. L. Zitnick, M. Cohen, and B. Curless. Single image deblurring using motion density functions. In ECCV, 2010.
[8] S. Harmeling, M. Hirsch, and B. Schölkopf. Space-variant single-image blind deconvolution for removing camera shake. In NIPS, 2010.
[9] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Schölkopf. Fast removal of non-uniform camera shake. In ICCV, 2011.
[10] M. Hirsch, S. Sra, B. Schölkopf, and S. Harmeling. Efficient filter flow for space-variant multiframe blind deconvolution. In CVPR, 2010.
[11] Z. Hu and M.-H. Yang. Fast non-uniform deblurring using constrained camera pose subspace. In BMVC, 2012.
[12] H. Ji and K. Wang. A two-stage approach to blind spatially-varying motion deblurring. In CVPR, 2012.
[13] N. Joshi, S. B. Kang, C. L. Zitnick, and R. Szeliski. Image deblurring using inertial measurement sensors. In ACM SIGGRAPH, 2010.
[14] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In CVPR, 2011.
[15] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Deconvolution using natural image priors. Technical report, MIT, 2007.
[16] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Efficient marginal likelihood optimization in blind deconvolution. In CVPR, 2011.
[17] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding blind deconvolution algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 33(12):2354-2367, 2011.
[18] J. G. Nagy and D. P. O'Leary. Restoring images degraded by spatially variant blur. SIAM J. Sci. Comput., 19(4):1063-1082, 1998.
[19] J. A. Palmer. Relative convexity. Technical report, UCSD, 2003.
[20] J. A. Palmer, D. P. Wipf, K. Kreutz-Delgado, and B. D. Rao. Variational EM algorithms for non-Gaussian latent variable models. In NIPS, 2006.
[21] Q. Shan, J. Jia, and A. Agarwala. High-quality motion deblurring from a single image. In SIGGRAPH, 2008.
[22] M. Sorel and F. Sroubek. Image Restoration: Fundamentals and Advances. CRC Press, 2012.
[23] Y.-W. Tai, P. Tan, and M. S. Brown. Richardson-Lucy deblurring for scenes under a projective motion path. IEEE Trans. Pattern Anal. Mach. Intell., 33(8):1603-1618, 2011.
[24] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244, 2001.
[25] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniform deblurring for shaken images. In CVPR, 2010.
[26] D. P. Wipf, B. D. Rao, and S. S. Nagarajan. Latent variable Bayesian models for promoting sparsity. IEEE Trans. Information Theory, 57(9):6236-6255, 2011.
[27] D. P. Wipf and H. Zhang. Revisiting Bayesian blind deconvolution. submitted to Journal of Machine Learning Research, 2013.
[28] L. Xu and J. Jia. Two-phase kernel estimation for robust motion deblurring. In ECCV, 2010.
[29] H. Zhang, D. P. Wipf, and Y. Zhang. Multi-image blind deblurring using a coupled adaptive sparse prior. In CVPR, 2013.
-----1
[1] C. J. Stone. Optimal rates of convergence for non-parametric estimators. Ann. Statist., 8:1348-1360, 1980.
[2] C. J. Stone. Optimal global rates of convergence for non-parametric estimators. Ann. Statist., 10:1340-1353, 1982.
[3] W. S. Cleveland and C. Loader. Smoothing by local regression: Principles and methods. Statistical theory and computational aspects of smoothing, 10-49, 1996.
[4] L. Gyorfi, M. Kohler, A. Krzyzak, and H. Walk. A Distribution Free Theory of Nonparametric Regression. Springer, New York, NY, 2002.
[5] J. Lafferty and L. Wasserman. Rodeo: Sparse nonparametric regression in high dimensions. Arxiv preprint math/0506342, 2005.
[6] O. V. Lepski, E. Mammen, and V. G. Spokoiny. Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. The Annals of Statistics, pages 929-947, 1997.
[7] O. V. Lepski and V. G. Spokoiny. Optimal pointwise adaptive methods in nonparametric estimation. The Annals of Statistics, 25(6):2512-2546, 1997.
[8] O. V. Lepski and B. Y. Levit. Adaptive minimax estimation of infinitely differentiable functions. Mathematical Methods of Statistics, 7(2):123-156, 1998.
[9] S. Kpotufe. k-NN Regression Adapts to Local Intrinsic Dimension. NIPS, 2011.
[10] K. Clarkson. Nearest-neighbor searching and metric space dimensions. Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, 2005.
-----1
[1] D.L. Donoho, M. Elad, and V.N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. Information Theory, IEEE Transactions on, 52(1):6-18, 2006.
[2] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96:1348-1360, 2001.
[3] J.J. Fuchs. Recovery of exact sparse representations in the presence of bounded noise. Information Theory, IEEE Transactions on, 51(10):3601-3608, 2005.
[4] H.-C. Huang, N.-J. Hsu, D.M. Theobald, and F.J. Breidt. Spatial Lasso with applications to GIS model selection. Journal of Computational and Graphical Statistics, 19(4):963-983, 2010.
[5] J.C. Huang and N. Jojic. Variable selection through Correlation Sifting. In V. Bafna and S.C. Sahinalp, editors, RECOMB, volume 6577 of Lecture Notes in Computer Science, pages 106-123. Springer, 2011.
[6] J. Jia and K. Rohe. Preconditioning to comply with the irrepresentable condition. 2012.
[7] N. Meinshausen. Lasso with relaxation. Technical Report 129, Eidgenossische Technische Hochschule, Zurich, 2005.
[8] N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34(3):1436-1462, 2006.
[9] D. Paul, E. Bair, T. Hastie, and R. Tibshirani. Preconditioning for feature selection and regression in high-dimensional problems. Annals of Statistics, 36(4):1595-1618, 2008.
[10] R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1994.
[11] R.J. Tibshirani. The solution path of the Generalized Lasso. Stanford University, 2011.
[12] M.J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5):2183-2202, 2009.
[13] P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541-2563, 2006.
[14] H. Zou. The Adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101:1418-1429, 2006.
[15] H. Zou and T. Hastie. Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, 67:301-320, 2005.
-----1
[1] Boutsidis, C., Gittens, A.: Improved matrix algorithms via the subsampled randomized hadamard transform. CoRR abs/1204.0062 (2012) 
[2] Tygert, M.: A fast algorithm for computing minimal-norm solutions to underdetermined systems of linear equations. CoRR abs/0905.4745 (2009)
[3] Rokhlin, V., Tygert, M.: A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Sciences 105(36) (September 2008) 13212-13217
[4] Drineas, P., Mahoney, M.W., Muthukrishnan, S., Sarlos, T.: Faster least squares approximation. CoRR abs/0710.1435 (2007)
[5] Mahoney, M.W.: Randomized algorithms for matrices and data. (April 2011) 
[6] Ailon, N., Chazelle, B.: Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In: STOC. (2006) 557-563
[7] Avron, H., Maymounkov, P., Toledo, S.: Blendenpik: Supercharging LAPACK's least-squares solver. SIAM J. Sci. Comput. 32(3) (April 2010) 1217-1236
[8] Vershynin, R.: How Close is the Sample Covariance Matrix to the Actual Covariance Matrix? Journal of Theoretical Probability 25(3) (September 2012) 655-686
[9] Golub, G.H., Van Loan, C.F.: Matrix Computations (Johns Hopkins Studies in Mathematical Sciences)(3rd Edition). 3rd edn. The Johns Hopkins University Press (October 1996) 
[10] Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. CoRR abs/1011.3027 (2010) 
[11] Dhillon, P.S., Rodu, J., Foster, D., Ungar, L.: Two step cca: A new spectral method for estimating vector models of words. In: Proceedings of the 29th International Conference on Machine learning. ICML12 (2012) 
[12] Dhillon, P.S., Foster, D., Ungar, L.: Multi-view learning of word embeddings via cca. In: Advances in Neural Information Processing Systems (NIPS). Volume 24. (2011) 
[13] Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. CoRR abs/1209.1873 (2012) 
-----1
[1] Nir Ailon and Bernard Chazelle. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In STOC, pages 557-563, 2006.
[2] Nir Ailon and Edo Liberty. Fast dimension reduction using rademacher series on dual bch codes. Technical report, 2007.
[3] Francis Bach. Sharp analysis of low-rank kernel matrix approximations. CoRR, abs/1208.2015, 2012.
[4] Christos Boutsidis and Alex Gittens. Improved matrix algorithms via the subsampled randomized Hadamard transform. CoRR, abs/1204.0062, 2012.
[5] Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, and Lyle H. Ungar. A risk comparison of ordinary least squares vs ridge regression. Journal of Machine Learning Research, 14:1505-1511, 2013.
[6] Petros Drineas, Michael W. Mahoney, S. Muthukrishnan, and Tamás Sarlós. Faster least squares approximation. CoRR, abs/0710.1435, 2007.
[7] Isabelle Guyon. Design of experiments for the nips 2003 variable selection benchmark. 2003.
[8] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217-288, May 2011.
[9] Nathan Halko, Per-Gunnar Martinsson, Yoel Shkolnisky, and Mark Tygert. An algorithm for the principal component analysis of large data sets. SIAM J. Scientific Computing, 33(5):2580-2594, 2011.
[10] Daniel Hsu, Sham M. Kakade, and Tong Zhang. Analysis of a randomized approximation scheme for matrix multiplication. CoRR, abs/1211.5414, 2012.
[11] S. Jung and J.S. Marron. PCA consistency in high dimension, low sample size context. Annals of Statistics, 37:4104-4130, 2009.
[12] Quoc Le, Tamás Sarlós, and Alex Smola. Fastfood - approximating kernel expansions in loglinear time. ICML, 2013.
[13] W.F. Massy. Principal components regression in exploratory statistical research. Journal of the American Statistical Association, 60:234-256, 1965.
[14] Xiangrui Meng, Michael A. Saunders, and Michael W. Mahoney. Lsrn: A parallel iterative solver for strongly over- or under-determined systems. CoRR, abs/1109.5981, 2011.
[15] Ali Rahimi and Ben Recht. Random features for large-scale kernel machines. In Neural Information Processing Systems, 2007.
[16] Vladimir Rokhlin, Arthur Szlam, and Mark Tygert. A randomized algorithm for principal component analysis. SIAM J. Matrix Analysis Applications, 31(3):1100-1124, 2009.
[17] Vladimir Rokhlin and Mark Tygert. A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Sciences, 105(36):13212-13217, September 2008.
[18] Tamás Sarlós. Improved approximation algorithms for large matrices via random projections. In Proc. 47th Annu. IEEE Sympos. Found. Comput. Sci, pages 143-152. IEEE Computer Society, 2006.
[19] G. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proc. 15th International Conf. on Machine Learning, pages 515-521. Morgan Kaufmann, San Francisco, CA, 1998.
[20] Joel A. Tropp. Improved analysis of the subsampled randomized hadamard transform. CoRR, abs/1011.1595, 2010.
[21] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.
[22] Mark Tygert. A fast algorithm for computing minimal-norm solutions to underdetermined systems of linear equations. CoRR, abs/0905.4745, 2009.
-----0
Agarwal, A., Dudík, M., Kale, S., Langford, J., and Schapire, R. E. (2012). Contextual bandit learning with predictable rewards. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS'12).
Anandkumar, A., Foster, D. P., Hsu, D., Kakade, S., and Liu, Y.-K. (2012a). A spectral algorithm for latent Dirichlet allocation. In Proceedings of Advances in Neural Information Processing Systems 25 (NIPS'12), pages 926-934.
Anandkumar, A., Ge, R., Hsu, D., and Kakade, S. M. (2013). A tensor spectral approach to learning mixed membership community models. Journal of Machine Learning Research, 1:65.
Anandkumar, A., Ge, R., Hsu, D., Kakade, S. M., and Telgarsky, M. (2012b). Tensor decompositions for learning latent variable models. CoRR, abs/1210.7559.
Anandkumar, A., Hsu, D., and Kakade, S. M. (2012c). A method of moments for mixture models and hidden Markov models. In Proceeding of the 25th Annual Conference on Learning Theory (COLT'12), volume 23, pages 33.1-33.34.
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47:235-256.
Azar, M. G., Lazaric, A., and Brunskill, E. (2013). Sequential transfer in multi-armed bandit with finite set of models. CoRR, abs/1307.6887.
Cavallanti, G., Cesa-Bianchi, N., and Gentile, C. (2010). Linear algorithms for online multitask classification. Journal of Machine Learning Research, 11:2901-2934.
Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
Dekel, O., Long, P. M., and Singer, Y. (2006). Online multitask learning. In Proceedings of the 19th Annual Conference on Learning Theory (COLT'06), pages 453-467.
Garivier, A. and Moulines, E. (2011). On upper-confidence bound policies for switching bandit problems. In Proceedings of the 22nd international conference on Algorithmic learning theory, ALT'11, pages 174-188, Berlin, Heidelberg. Springer-Verlag.
Kleibergen, F. and Paap, R. (2006). Generalized reduced rank tests using the singular value decomposition. Journal of Econometrics, 133(1):97-126.
Langford, J. and Zhang, T. (2007). The epoch-greedy algorithm for multi-armed bandits with side information. In Proceedings of Advances in Neural Information Processing Systems 20 (NIPS'07).
Lazaric, A. (2011). Transfer in reinforcement learning: a framework and a survey. In Wiering, M. and van Otterlo, M., editors, Reinforcement Learning: State of the Art. Springer.
Lugosi, G., Papaspiliopoulos, O., and Stoltz, G. (2009). Online multi-task learning with hard constraints. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT'09).
Mann, T. A. and Choe, Y. (2012). Directed exploration in reinforcement learning with transferred knowledge. In Proceedings of the Tenth European Workshop on Reinforcement Learning (EWRL'12).
Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the AMS, 58:527-535.
Saha, A., Rai, P., Daumé III, H., and Venkatasubramanian, S. (2011). Online learning of multiple tasks and their relationships. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS'11), Ft. Lauderdale, Florida.
Stewart, G. W. and Sun, J.-g. (1990). Matrix perturbation theory. Academic Press.
Taylor, M. E. (2009). Transfer in Reinforcement Learning Domains. Springer-Verlag.
Wedin, P. (1972). Perturbation bounds in connection with singular value decomposition. BIT Numerical Mathematics, 12(1):99-111.
-----0
S. Agrawal and N. Goyal. Analysis of Thompson sampling for the multi-armed bandit problem. In Proceedings of the 25th Annual Conference on Learning Theory (COLT), 2012a.
S. Agrawal and N. Goyal. Further optimal regret bounds for Thompson sampling, 2012b. arXiv:1209.3353.
J.-Y. Audibert and S. Bubeck. Minimax policies for adversarial and stochastic bandits. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
J.-Y. Audibert and S. Bubeck. Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11:2635-2686, 2010.
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning Journal, 47(2-3):235-256, 2002.
S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1-122, 2012.
S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In Proceedings of the 20th International Conference on Algorithmic Learning Theory (ALT), 2009.
S. Bubeck, V. Perchet, and P. Rigollet. Bounded regret in stochastic multi-armed bandits. In Proceedings of the 26th Annual Conference on Learning Theory (COLT), 2013.
O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems (NIPS), 2011.
J.C. Gittins. Bandit processes and dynamic allocation indices. Journal Royal Statistical Society Series B, 14:148-167, 1979.
E. Kaufmann, N. Korda, and R. Munos. Thompson sampling: an asymptotically optimal finite-time analysis. In Proceedings of the 23rd International Conference on Algorithmic Learning Theory (ALT), 2012.
H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematics Society, 58:527-535, 1952.
D. Russo and B. Van Roy. Learning to optimize via posterior sampling, 2013. arXiv:1301.2609.
W. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Bulletin of the American Mathematics Society, 25:285-294, 1933.
-----1
[1] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Mach. Learn., 47(2-3):235-256, May 2002.
[2] Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, and Larry A. Shepp. Bandit problems with infinitely many arms. Annals of Statistics, 25(5):2103-2116, 1997.
[3] Olivier Cappe, Aurelien Garivier, Odalric-Ambrym Maillard, Remi Munos, and Gilles Stoltz. Kullback-Leibler upper confidence bounds for optimal sequential allocation. To appear in Annals of Statistics, 2013.
[4] Kung-Yu Chen and Chien-Tai Lin. A note on strategies for bandit problems with infinitely many arms. Metrika, 59(2):193-203, 2004.
[5] Kung-Yu Chen and Chien-Tai Lin. A note on infinite-armed Bernoulli bandit problems with generalized beta prior distributions. Statistical Papers, 46(1):129-140, 2005.
[6] Stephen J Herschkorn, Erol Pekoez, and Sheldon M Ross. Policies without memory for the infinite-armed Bernoulli bandit under the average-reward criterion. Probability in the Engineering and Informational Sciences, 10:21-28, 1996.
[7] Ying-Chao Hung. Optimal Bayesian strategies for the infinite-armed Bernoulli bandit. Journal of Statistical Planning and Inference, 142(1):86-94, 2012.
[8] Emilie Kaufmann, Nathaniel Korda, and Remi Munos. Thompson sampling: An asymptotically optimal finite-time analysis. In Algorithmic Learning Theory, pages 199-213. Springer, 2012.
[9] Tze L. Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4-22, 1985.
[10] Chien-Tai Lin and CJ Shiau. Some optimal strategies for bandit problems with beta prior distributions. Annals of the Institute of Statistical Mathematics, 52(2):397-405, 2000.
[11] C.L. Mallows and Herbert Robbins. Some problems of optimal sampling strategy. Journal of Mathematical Analysis and Applications, 8(1):90-103, 1964.
[12] Olivier Teytaud, Sylvain Gelly, and Michèle Sebag. Anytime many-armed bandits. In CAP07, 2007.
[13] Yizao Wang, Jean-Yves Audibert, and Remi Munos. Algorithms for infinitely many-armed bandits. In NIPS, 2008.
-----1
[1] S. Agrawal and N. Goyal. Analysis of Thompson sampling for the multi-armed bandit problem. In Conference On Learning Theory (COLT), 2012.
[2] S. Agrawal and N. Goyal. Further optimal regret bounds for thompson sampling. In Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
[3] S. Agrawal and N. Goyal. Thompson sampling for contextual bandits with linear payoffs. In 30th International Conference on Machine Learning (ICML), 2013.
[4] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235-256, 2002.
[5] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities. Oxford University Press, 2013.
[6] S. Bubeck and Che-Yu Liu. A note on the Bayesian regret of Thompson sampling with an arbitrary prior. arXiv:1304.5758, 2013.
[7] O. Cappe, A. Garivier, O-A. Maillard, R. Munos, and G. Stoltz. Kullback-Leibler upper confidence bounds for optimal sequential allocation. Annals of Statistics, 41(3):1516-1541, 2013.
[8] A. Garivier and O. Cappe. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Conference On Learning Theory (COLT), 2011.
[9] J. Honda and A. Takemura. An asymptotically optimal bandit algorithm for bounded support models. In Conference On Learning Theory (COLT), 2010.
[10] H. Jeffreys. An invariant form for prior probability in estimation problem. Proceedings of the Royal Society of London, 186:453-461, 1946.
[11] E. Kaufmann, N. Korda, and R. Munos. Thompson sampling: An asymptotically optimal finite-time analysis. In Algorithmic Learning Theory, Lecture Notes in Computer Science, pages 199-213. Springer, 2012.
[12] T.L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4-22, 1985.
[13] B.C. May, N. Korda, A. Lee, and D. Leslie. Optimistic Bayesian sampling in contextual bandit problems. Journal of Machine Learning Research, 13:2069-2106, 2012.
[14] D. Russo and B. Van Roy. Learning to optimize via posterior sampling. arXiv:1301.2609, 2013.
[15] W.R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285-294, 1933.
[16] A.W Van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.
[17] L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer Publishing Company, Incorporated, 2010.
-----1
[1] S. Gelly and D. Silver. Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence, 175(11):1856-1875, 2011.
[2] Mark HM Winands, Yngvi Bjornsson, and J Saito. Monte Carlo tree search in Lines of Action. IEEE Transactions on Computational Intelligence and AI in Games, 2(4):239-250, 2010.
[3] L. Kocsis and C. Szepesvari. Bandit based Monte-Carlo planning. In European Conference on Machine Learning, pages 282-293, 2006.
[4] D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, pages 2164-2172, 2010.
[5] Feng Wu, Shlomo Zilberstein, and Xiaoping Chen. Online planning for ad hoc autonomous agent teams. In International Joint Conference on Artificial Intelligence, pages 439-445, 2011.
[6] Arthur Guez, David Silver, and Peter Dayan. Efficient Bayes-adaptive reinforcement learning using sample-based search. In Advances in Neural Information Processing Systems, pages 1034-1042, 2012.
[7] John Asmuth and Michael L. Littman. Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search. In Uncertainty in Artificial Intelligence, pages 19-26, 2011.
[8] William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285-294, 1933.
[9] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249-2257, 2011.
[10] Emilie Kaufmann, Nathaniel Korda, and Remi Munos. Thompson sampling: An optimal finite time analysis. In Algorithmic Learning Theory, pages 199-213, 2012.
[11] Richard Dearden, Nir Friedman, and Stuart Russell. Bayesian Q-learning. In AAAI Conference on Artificial Intelligence, pages 761-768, 1998.
[12] Gerald Tesauro, V. T. Rajan, and Richard Segal. Bayesian inference in Monte-Carlo tree search. In Uncertainty in Artificial Intelligence, pages 580-588, 2010.
[13] Galin L Jones. On the Markov chain central limit theorem. Probability Surveys, 1:299-320, 2004.
[14] Anirban DasGupta. Asymptotic theory of statistics and probability. Springer, 2008.
[15] Shipra Agrawal and Navin Goyal. Further optimal regret bounds for thompson sampling. In Artificial Intelligence and Statistics, pages 99-107, 2013.
[16] Blai Bonet and Hector Geffner. Action selection for mdps: Anytime ao* vs. uct. In AAAI Conference on Artificial Intelligence, pages 1749-1755, 2012.
[17] Christos H Papadimitriou and Mihalis Yannakakis. Shortest paths without a map. Theoretical Computer Science, 84(1):127-150, 1991.
[18] Patrick Eyerich, Thomas Keller, and Malte Helmert. High-quality policies for the canadian travelers problem. In AAAI Conference on Artificial Intelligence, pages 51-58, 2010.
[19] A.G. Barto, S.J. Bradtke, and S.P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1-2):81-138, 1995.
-----0
S. Agarwal, J. Wills, L. Cayton, G. Lanckriet, D. Kriegman, and S. Belongie. Generalized nonmetric multidimensional scaling. In AISTATS, 2007.
M. Alamgir and U. von Luxburg. Shortest path distance in random k-nearest neighbor graphs. In International Conference on Machine Learning (ICML), 2012.
N. Alon, M. Bădoiu, E. Demaine, M. Farach-Colton, M. Hajiaghayi, and A. Sidiropoulos. Ordinal embeddings of minimum relaxation: general properties, trees, and ultrametrics. ACM Transactions on Algorithms, 4(4):46, 2008.
M. Bădoiu, E. Demaine, M. Hajiaghayi, A. Sidiropoulos, and M. Zadimoghaddam. Ordinal embedding: approximation algorithms and dimensionality reduction. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques. Springer, 2008.
Y. Bilu and N. Linial. Monotone maps, sphericity and bounded second eigenvalue. Journal of Combinatorial Theory, Series B, 95(2):283-299, 2005.
I. Borg and P. Groenen. Modern multidimensional scaling: Theory and applications. Springer, 2005.
D. Burago, Y. Burago, and S. Ivanov. A course in metric geometry. American Mathematical Society, 2001.
G. Gutin, E. Kim, M. Mnich, and A. Yeo. Ordinal embedding relaxations parameterized above tight lower bound. arXiv preprint arXiv:0907.5427, 2009.
K. Jamieson and R. Nowak. Low-dimensional embedding using adaptively selected ordinal data. In Conference on Communication, Control, and Computing, pages 1077-1084, 2011.
B. McFee and G. Lanckriet. Partial order embedding with multiple kernels. In International Conference on Machine Learning (ICML), 2009.
H. Ouyang and A. Gray. Learning dissimilarities by ranking: from SDP to QP. In International Conference on Machine Learning (ICML), pages 728-735, 2008.
M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In Neural Information Processing Systems (NIPS), 2004.
B. Shaw and T. Jebara. Structure preserving embedding. In International Conference on Machine Learning (ICML), 2009.
B. Shaw, B. Huang, and T. Jebara. Learning a distance metric from a network. Neural Information Processing Systems (NIPS), 2011.
R. Shepard. Metric structures in ordinal data. Journal of Mathematical Psychology, 3(2):287-315, 1966.
-----1
[1] C. Boutsidis and A. Gittens. Improved matrix algorithms via the Subsampled Randomized Hadamard Transform. ArXiv e-prints, Mar. 2012. To appear in the SIAM Journal on Matrix Analysis and Applications.
[2] P. Buhlmann and S. v. d. Geer. Statistics for High-dimensional Data. Springer, 2011.
[3] Y. Chang, C. Hsieh, K. Chang, M. Ringgaard, and C. Lin. Low-degree polynomial mapping of data for svm. JMLR, 11, 2010.
[4] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science, 312(1):3-15, 2004.
[5] K. L. Clarkson, P. Drineas, M. Magdon-Ismail, M. W. Mahoney, X. Meng, and D. P. Woodruff. The Fast Cauchy Transform and faster robust regression. CoRR, abs/1207.4684, 2012. Also in SODA 2013.
[6] K. L. Clarkson and D. P. Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the 41st annual ACM Symposium on Theory of Computing, STOC '09, pages 205-214, New York, NY, USA, 2009. ACM.
[7] K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input sparsity time. In Proceedings of the 45th annual ACM Symposium on Theory of Computing, STOC '13, pages 81-90, New York, NY, USA, 2013. ACM.
[8] A. Dasgupta, P. Drineas, B. Harb, R. Kumar, and M. Mahoney. Sampling algorithms and coresets for ℓp regression. SIAM Journal on Computing, 38(5):2060-2078, 2009.
[9] A. Gilbert and P. Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937-947, 2010.
[10] N. Halko, P. G. Martinsson, and J. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011.
[11] Q. Le, T. Sarlos, and A. Smola. Fastfood - computing Hilbert space expansions in loglinear time. In Proceedings of International Conference on Machine Learning, ICML '13, 2013.
[12] M. W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2):123-224, 2011.
[13] M. W. Mahoney, P. Drineas, M. Magdon-Ismail, and D. P. Woodruff. Fast approximation of matrix coherence and statistical leverage. In Proceedings of the 29th International Conference on Machine Learning, ICML '12, 2012.
[14] X. Meng and M. W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proceedings of the 45th annual ACM Symposium on Theory of Computing, STOC '13, pages 91-100, New York, NY, USA, 2013. ACM.
[15] J. Nelson and H. L. Nguyen. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. CoRR, abs/1211.1002, 2012.
[16] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Proceedings of Neural Information Processing Systems, NIPS '07, 2007.
[17] V. Rokhlin and M. Tygert. A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Sciences, 105(36):13212, 2008.
[18] W. Rudin. Fourier Analysis on Groups. Wiley Classics Library. Wiley-Interscience, New York, 1994.
[19] T. Sarlos. Improved approximation algorithms for large matrices via random projections. In Proceeding of IEEE Symposium on Foundations of Computer Science, FOCS '06, pages 143-152, 2006.
[20] C. Sohler and D. P. Woodruff. Subspace embeddings for the l1-norm with applications. In Proceedings of the 43rd annual ACM Symposium on Theory of Computing, STOC '11, pages 755-764, 2011.
[21] D. P. Woodruff and Q. Zhang. Subspace embeddings and lp regression using exponential random variables. In COLT, 2013.
-----1
[1] A. Agarwal and J. C. Duchi. Distributed delayed stochastic optimization. In CDC, pages 5451-5452, 2012.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3:1-122, 2011.
[3] J. K. Bradley, A. Kyrola, D. Bickson, and C. Guestrin. Parallel Coordinate Descent for L1-Regularized Loss Minimization. In ICML, 2011.
[4] C. T. Chu, S. K. Kim, Y. A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun. Map-Reduce for machine learning on multicore. In NIPS, pages 281-288, 2006.
[5] W. Deng and W. Yin. On the global and linear convergence of the generalized alternating direction method of multipliers. Technical report, 2012.
[6] M. Eberts and I. Steinwart. Optimal learning rates for least squares svms using gaussian kernels. In NIPS, pages 1539-1547, 2011.
[7] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl., 2:17-40, 1976.
[8] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear svm. In ICML, pages 408-415, 2008.
[9] H. Daumé III, J. M. Phillips, A. Saha, and S. Venkatasubramanian. Protocols for learning classifiers on distributed data. JMLR - Proceedings Track, 22:282-290, 2012.
[10] S. Lacoste-Julien, M. Jaggi, M. W. Schmidt, and P. Pletscher. Stochastic block-coordinate frank-wolfe optimization for structural svms. CoRR, abs/1207.4747, 2012.
[11] J. Langford, A. Smola, and M. Zinkevich. Slow learners are fast. In NIPS, pages 2331-2339, 2009.
[12] Z. Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, pages 7-35, 1992.
[13] G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient Large-Scale distributed training of conditional maximum entropy models. In NIPS, pages 1231-1239, 2009.
[14] H. Ouyang, N. He, L. Tran, and A. G. Gray. Stochastic alternating direction method of multipliers. In ICML, pages 80-88, 2013.
[15] N. L. Roux, M. W. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In NIPS, pages 2672-2680, 2012.
[16] S. Shalev-Shwartz and T. Zhang. Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization. JMLR, 2013.
[17] S. Smale and D.-X. Zhou. Estimating the approximation error in learning theory. Anal. Appl. (Singap.), 1(1):17-41, 2003.
[18] K. Sridharan, S. Shalev-Shwartz, and N. Srebro. Fast rates for regularized objectives. In NIPS, pages 1545-1552, 2008.
[19] T. Suzuki. Dual averaging and proximal gradient descent for online alternating direction multiplier method. In ICML, pages 392-400, 2013.
[20] M. Takac, A. S. Bijral, P. Richtarik, and N. Srebro. Mini-batch primal and dual methods for svms. In ICML, 2013.
[21] C. H. Teo, S. Vishwanathan, A. J. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. JMLR, pages 311-365, 2010.
[22] K. I. Tsianos, S. Lawlor, and M. G. Rabbat. Communication/computation tradeoffs in consensus-based distributed optimization. In NIPS, pages 1952-1960, 2012.
[23] C. Zhang, H. Lee, and K. G. Shin. Efficient distributed linear classification algorithms via the alternating direction method of multipliers. In AISTATS, pages 1398-1406, 2012.
[24] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. In NIPS, pages 2595-2603, 2010.
-----1
[1] Tsay, R.S. (2005). Analysis of Financial Time Series. Hoboken, New Jersey: Wiley.
[2] Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82:35-45.
[3] Rasmussen, C.E. & Williams, C.K.I. (2006). Gaussian processes for machine learning. Boston: MIT Press.
[4] Huang, J.Z., Wu, C.O. & Zhou, L. (2002). Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89:111-128.
[5] Hastie, T. J. & Tibshirani, R. J. (1990). Generalized Additive Models. London: Chapman and Hall.
[6] Wu C.O., Chiang C.T. & Hoover D.R. (1998). Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. JASA 93:1388-1402.
[7] Friedman, J. H. (1991). Multivariate Adaptive Regression Splines. Annals of Statistics 19:1-67.
[8] Smith, M. & Kohn, R. (1996). Nonparametric regression using Bayesian variable selection. Journal of Econometrics 75:317-343.
[9] George, E.I. & McCulloch, R.E. (1993). Variable selection via Gibbs sampling. JASA 88:881-889.
[10] Donoho, D.L. & Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. JASA 90:1200-1224.
[11] Fan, J. & Gijbels, I. (1995). Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. JRSS. Series B 57:371-394.
[12] Wolpert, R.L., Clyde M.A. & Tu, C. (2011). Stochastic expansions using continuous dictionaries: Levy adaptive regression kernels. Annals of Statistics 39:1916-1962.
[13] Bollerslev, T., Engle, R.F. & Wooldridge, J.M. (1988). A capital-asset pricing model with time-varying covariances. Journal of Political Economy 96:116-131.
[14] Engle, R.F. & Kroner, K.F. (1995). Multivariate simultaneous generalized ARCH. Econometric Theory 11:122-150.
[15] Engle, R.F. (2002). Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20:339-350.
[16] Burns, P. (2005). Multivariate GARCH with Only Univariate Estimation. http://www.burns-stat.com.
[17] Alexander, C.O. (2001). Orthogonal GARCH. Mastering Risk 2:21-38.
[18] van der Weide, R. (2002). GO-GARCH: a multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics 17:549-564.
[19] Nakajima, J. & West, M. (2012). Dynamic factor volatility modeling: A Bayesian latent threshold approach. Journal of Financial Econometrics, in press.
[20] Wilson, A.G. & Ghahramani Z. (2010). Generalised Wishart Processes. arXiv:1101.0240.
[21] Bru, M. (1991). Wishart Processes. Journal of Theoretical Probability 4:725-751.
[22] Fox E. & Dunson D.B. (2011). Bayesian Nonparametric Covariance Regression. arXiv:1101.2017.
[23] Zhu B. & Dunson D.B. (2012). Locally Adaptive Bayes Nonparametric Regression via Nested Gaussian Processes. arXiv:1201.4403.
[24] Durbin, J. & Koopman, S. (2002). A simple and efficient simulation smoother for state space time series analysis. Biometrika 89:603-616.
[25] Durbin, J. & Koopman, S. (2001). Time Series Analysis by State Space Methods. New York: Oxford University Press Inc.
[26] Donoho, D.L. & Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425-455.
[27] Gelman, A. & Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7:457-511.
-----1
[1] Anthony Bagnall, Luke Davis, Jon Hills, and Jason Lines. Transformation based ensembles for time series classification. In Proceedings of the 12th SIAM International Conference on Data Mining, pages 307-319, 2012.
[2] Gustavo E.A.P.A. Batista, Xiaoyue Wang, and Eamonn J. Keogh. A complexity-invariant distance measure for time series. In Proceedings of the 11th SIAM International Conference on Data Mining, pages 699-710, 2011.
[3] Hila Becker, Mor Naaman, and Luis Gravano. Beyond trending topics: Real-world event identification on Twitter. In Proceedings of the Fifth International Conference on Weblogs and Social Media, 2011.
[4] Mario Cataldi, Luigi Di Caro, and Claudio Schifanella. Emerging topic detection on twitter based on temporal and social terms evaluation. In Proceedings of the 10th International Workshop on Multimedia Data Mining, 2010.
[5] Thomas M. Cover and Peter E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21-27, 1967.
[6] Sanjoy Dasgupta and Leonard Schulman. A probabilistic analysis of EM for mixtures of separated, spherical gaussians. Journal of Machine Learning Research, 8:203-226, 2007.
[7] Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh. Querying and mining of time series data: experimental comparison of representations and distance measures. Proceedings of the VLDB Endowment, 1(2):1542-1552, 2008.
[8] Dan Feldman, Matthew Faulkner, and Andreas Krause. Scalable training of mixture models via coresets. In Advances in Neural Information Processing Systems 24, 2011.
[9] Keinosuke Fukunaga. Introduction to statistical pattern recognition (2nd ed.). Academic Press Professional, Inc., 1990.
[10] Daniel Hsu and Sham M. Kakade. Learning mixtures of spherical gaussians: Moment methods and spectral decompositions, 2013. arXiv:1206.5766.
[11] Shiva Prasad Kasiviswanathan, Prem Melville, Arindam Banerjee, and Vikas Sindhwani. Emerging topic detection using dictionary learning. In Proceedings of the 20th ACM Conference on Information and Knowledge Management, pages 745-754, 2011.
[12] Shiva Prasad Kasiviswanathan, Huahua Wang, Arindam Banerjee, and Prem Melville. Online l1-dictionary learning with application to novel document detection. In Advances in Neural Information Processing Systems 25, pages 2267-2275, 2012.
[13] Michael Mathioudakis and Nick Koudas. Twittermonitor: trend detection over the Twitter stream. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 2010.
[14] Ankur Moitra and Gregory Valiant. Settling the polynomial learnability of mixtures of gaussians. In 51st Annual IEEE Symposium on Foundations of Computer Science, pages 93-102, 2010.
[15] Alex Nanopoulos, Rob Alcock, and Yannis Manolopoulos. Feature-based classification of time-series data. International Journal of Computer Research, 10, 2001.
[16] Juan J. Rodríguez and Carlos J. Alonso. Interval and dynamic time warping-based decision trees. In Proceedings of the 2004 ACM Symposium on Applied Computing, 2004.
[17] Santosh Vempala and Grant Wang. A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, 68(4):841-860, 2004.
[18] Kilian Q. Weinberger and Lawrence K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207-244, 2009.
[19] Yi Wu and Edward Y. Chang. Distance-function design and fusion for sequence data. In Proceedings of the 2004 ACM International Conference on Information and Knowledge Management, 2004.
[20] Xiaopeng Xi, Eamonn J. Keogh, Christian R. Shelton, Li Wei, and Chotirat Ann Ratanamahatana. Fast time series classification using numerosity reduction. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
-----1
[1] Ledyard R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279-311, 1966.
[2] Tamara G. Kolda. Tensor decompositions and applications. SIAM Review, 51(3):455-500, 2009.
[3] Tamara G. Kolda. Multilinear operators for higher-order decompositions. Technical report, United States Department of Energy, 2006.
[4] Amy N. Langville and William J. Stuart. The Kronecker product and stochastic automata networks. Journal of Computational and Applied Mathematics, 167(2):429-447, 2004.
[5] Jan R. Magnus and Heinz Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, revised edition, 1999.
[6] Vin De Silva and Lek-Heng Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications, 30(3):1084-1127, 2008.
[7] Peter J. Basser and Sinisa Pajevic. A normal distribution for tensor-valued random variables: applications to diffusion tensor MRI. IEEE Transactions on Medical Imaging, 22(7):785-794, 2003.
[8] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977.
[9] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 1st edition, 2006.
[10] NOAA/Pacific Marine Environmental Laboratory. Tropical Atmosphere Ocean Project. http://www.pmel.noaa.gov/tao/data_deliv/deliv.html. Accessed: May 23, 2013.
[11] Zoubin Ghahramani and Geoffrey E. Hinton. Parameter estimation for linear dynamical systems. Technical Report CRG-TR-96-2, University of Toronto Department of Computer Science, 1996.
[12] Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond streams and graphs: dynamic tensor analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 374-383. ACM, 2006.
[13] Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, and Jaime G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of SIAM Data Mining, 2010.
[14] Haipin Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos. MPCA: Multilinear principal components analysis of tensor objects. IEEE Transactions on Neural Networks, 19(1), 2008.
[15] Tamara Kolda and Jimeng Sun. Scalable tensor decompositions for multi-aspect data mining. In Eighth IEEE International Conference on Data Mining. IEEE, 2008.
[16] Dacheng Tao, Mingli Song, Xuelong Li, Jialie Shen, Jimeng Sun, Xindong Wu, Christos Faloutsos, and Stephen J. Maybank. Bayesian tensor approach for 3-D face modeling. IEEE Transactions on Circuits and Systems for Video Technology, 18(10):1397-1410, 2008.
[17] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.
[18] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20, pages 1257-1264, 2008.
[19] Ruslan Salakhutdinov and Andriy Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
[20] Cyril Goutte and Massih-Reza Amini. Probabilistic tensor factorization and model selection. In Tensors, Kernels, and Machine Learning (TKLM 2010), pages 14, 2010.
[21] Christian F. Beckmann and Stephen M. Smith. Tensorial extensions of independent component analysis for multisubject FMRI analysis. Neuroimage, 25(1):294-311, 2005.
[22] Y. Kenan Yilmaz, A. Taylan Cemgil, and Umut Simsekli. Generalized coupled tensor factorization. In Neural Information Processing Systems. MIT Press, 2011.
-----1
[1] A. Bekessy, P. Bekessy, and J. Komplos. Asymptotic enumeration of regular matrices. Studia Scientiarum Mathematicarum Hungarica, pages 343-353, 1972.
[2] I. Bezakova, N. Bhatnagar, and E. Vigoda. Sampling binary contingency tables with a greedy start. Random Struct. Algorithms, 30(1-2):168-205, 2007.
[3] T. D. Bie, K.-N. Kontonasios, and E. Spyropoulou. A framework for mining interesting pattern sets. SIGKDD Explorations, 12(2):92-100, 2010.
[4] T. Bond and C. Fox. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Lawrence Erlbaum, 2007.
[5] R. Brualdi and H. J. Ryser. Combinatorial Matrix Theory. Cambridge University Press, 1991.
[6] Y. Chen, P. Diaconis, S. Holmes, and J. Liu. Sequential monte carlo methods for statistical analysis of tables. Journal of American Statistical Association, JASA, 100:109-120, 2005.
[7] G. W. Cobb and Y.-P. Chen. An application of Markov chain Monte Carlo to community ecology. Amer. Math. Month., 110(4):264-288, 2003.
[8] M. Gail and N. Mantel. Counting the number of r × c contingency tables with fixed margins. Journal of the American Statistical Association, 72(360):859-862, 1977.
[9] A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas. Assessing data mining results via swap randomization. TKDD, 1(3), 2007.
[10] I. J. Good and J. Crook. The enumeration of arrays and a generalization related to contingency tables. Discrete Mathematics, 19(1):23-45, 1977.
[11] S. Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph. Journal of the Society for Industrial and Applied Mathematics, 10(3):496-506, 1962.
[12] S. Hyvonen, P. Miettinen, and E. Terzi. Interpretable nonnegative matrix decompositions. In KDD, pages 345-353, 2008.
[13] T. Kariya and H. Kurata. Generalized Least Squares. Wiley, 2004.
[14] N. Kashtan, S. Itzkovitz, R. Milo, and U. Alon. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics, 20(11):1746-1758, 2004.
[15] M. Mampaey, J. Vreeken, and N. Tatti. Summarizing data succinctly with the most informative itemsets. TKDD, 6(4), 2012.
[16] H. Mannila and E. Terzi. Nestedness and segmented nestedness. In KDD, pages 480-489, 2007.
[17] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824-827, 2002.
[18] P. Erdos and T. Gallai. Graphs with prescribed degrees of vertices. Mat. Lapok., 1960.
[19] G. Rasch. Probabilistic models for some intelligence and attainment tests. 1960. Technical Report, Danish Institute for Educational Research, Copenhagen.
[20] J. Sanderson. Testing ecological patterns. Amer. Sci., 88(4):332-339, 2000.
[21] T. Snijders. Enumeration and simulation methods for 0-1 matrices with given marginals. Psychometrika, 56(3):397-417, 1991.
[22] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in Facebook. In WOSN, pages 37-42, 2009.
[23] B. Wang and F. Zhang. On the precise number of (0,1)-matrices in u(r,s). Discrete Mathematics, 187(1-3):211-220, 1998.
[24] M. Yannakakis. Computing the minimum fill-in is NP-Complete. SIAM Journal on Algebraic and Discrete Methods, 2(1):77-79, 1981.
-----1
[1] E. Acar, D. Dunlavy, and T. Kolda. Link prediction on evolving data using matrix and tensor factorizations. In Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on, pages 262-269. IEEE, 2009.
[2] A. Argyriou, C. A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multi-task structure learning. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in NIPS 20, pages 25-32. MIT Press, Cambridge, MA, 2008.
[3] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717-772, 2009. ISSN 1615-3375. doi: 10.1007/s10208-009-9045-5. URL http://dx.doi.org/10.1007/s10208-009-9045-5.
[4] A. Goldberg, X. Zhu, B. Recht, J. Xu, and R. Nowak. Transduction with matrix completion: Three birds with one stone. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 757-765. 2010.
[5] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Inform. Theory, 56(6):2980-2998, 2010. ISSN 0018-9448. doi: 10.1109/TIT.2010.2046205. URL http://dx.doi.org/10.1109/TIT.2010.2046205.
[6] F. J. Kiraly, L. Theran, R. Tomioka, and T. Uno. The algebraic combinatorial approach for low-rank matrix completion. Preprint, arXiv:1211.4116v4, 2012. URL http://arxiv.org/abs/1211.4116.
[7] F. J. Kiraly and L. Theran. AlCoCoMa, 2013. http://mloss.org/software/view/524/.
[8] A. Menon and C. Elkan. Link prediction via matrix factorization. Machine Learning and Knowledge Discovery in Databases, pages 437-452, 2011.
[9] N. Srebro, J. D. M. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in NIPS 17, pages 1329-1336. MIT Press, Cambridge, MA, 2005.
[10] R. Tomioka, K. Hayashi, and H. Kashima. On the extension of trace norm to tensors. In NIPS Workshop on Tensors, Kernels, and Machine Learning, 2010.
-----1
[1] E. Amir and A. Chang. Learning partially observable deterministic action models. Journal of Artificial Intelligence Research, 33(1):349-402, 2008.
[2] D. Bryce, S. Kambhampati, and D. Smith. Sequential monte carlo in probabilistic planning reachability heuristics. Proceedings of ICAPS'06, 2006.
[3] K. Delgado, S. Sanner, and L. De Barros. Efficient solutions to factored mdps with imprecise transition probabilities. Artificial Intelligence, 2011.
[4] C. Domshlak and J. Hoffmann. Probabilistic planning via heuristic forward search and weighted model counting. Journal of Artificial Intelligence Research, 30(1):565-620, 2007.
[5] T. Eiter, W. Faber, N. Leone, G. Pfeifer, and A. Polleres. Planning under incomplete knowledge. Computational Logic - CL 2000, pages 807-821, 2000.
[6] M. Fox, R. Howey, and D. Long. Exploration of the robustness of plans. In Proceedings of the National Conference on Artificial Intelligence, volume 21, page 834, 2006.
[7] A. Garland and N. Lesh. Plan evaluation with incomplete action descriptions. In Proceedings of the National Conference on Artificial Intelligence, pages 461-467, 2002.
[8] R. Givan, S. Leach, and T. Dean. Bounded-parameter markov decision processes. Artificial Intelligence, 122(1-2):71-109, 2000.
[9] P. Gmytrasiewicz and P. Doshi. A framework for sequential planning in multiagent settings. Journal of Artificial Intelligence Research, 24(1):49-79, 2005.
[10] P. Gmytrasiewicz and E. Durfee. Rational coordination in multi-agent environments. Autonomous Agents and Multi-Agent Systems, 3(4):319-350, 2000.
[11] N. Hyafil and F. Bacchus. Conformant probabilistic planning via csps. In Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling, pages 205-214, 2003.
[12] R. Jensen, M. Veloso, and R. Bryant. Fault tolerant planning: Toward probabilistic uncertainty models in symbolic non-deterministic planning. In Proceedings of the 14th International Conference on Automated Planning and Scheduling (ICAPS), volume 4, pages 235-344, 2004.
[13] S. Kambhampati. Model-lite planning for the web age masses: The challenges of planning with incomplete and evolving domain models. In Proceedings of the National Conference on Artificial Intelligence, volume 22, page 1601, 2007.
[14] D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.
[15] N. Kushmerick, S. Hanks, and D. Weld. An algorithm for probabilistic planning. Artificial Intelligence, 76(1-2):239-286, 1995.
[16] D. Morwood and D. Bryce. Evaluating temporal plans in incomplete domains. In Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
[17] A. Nilim and L. Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780-798, 2005.
[18] F. A. Oliehoek. Value-Based Planning for Teams of Agents in Stochastic Partially Observable Environments. PhD thesis, Informatics Institute, University of Amsterdam, Feb. 2010.
[19] J. Satia and R. Lave Jr. Markovian decision processes with uncertain transition probabilities. Operations Research, pages 728-740, 1973.
[20] S. Seuken and S. Zilberstein. Formal models and algorithms for decentralized decision making under uncertainty. Autonomous Agents and Multi-Agent Systems, 17(2):190-250, 2008.
[21] A. Shapiro and A. Kleywegt. Minimax analysis of stochastic problems. Optimization Methods and Software, 17(3):523-542, 2002.
[22] L. Valiant. The complexity of computing the permanent. Theoretical computer science, 8(2):189-201, 1979.
[23] L. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410-421, 1979.
[24] C. Weber and D. Bryce. Planning and acting in incomplete domains. Proceedings of ICAPS'11, 2011.
[25] C. White III and H. Eldeib. Markov decision processes with imprecise transition probabilities. Operations Research, pages 739-749, 1994.
[26] Q. Yang, K. Wu, and Y. Jiang. Learning action models from plan examples using weighted max-sat. Artificial Intelligence, 171(2):107-143, 2007.
-----1
[1] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions in Mathematical Software, 1977.
[2] N. Verma, S. Kpotufe, and S. Dasgupta. Which Spatial Partition Trees are Adaptive to Intrinsic Dimension? In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2009.
[3] R.F. Sproull. Refinements to Nearest-Neighbor Searching in k-dimensional Trees. Algorithmica, 1991.
[4] J. McNames. A Fast Nearest-Neighbor Algorithm based on a Principal Axis Search Tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.
[5] K. Fukunaga and P. M. Nagendra. A Branch-and-Bound Algorithm for Computing k-Nearest-Neighbors. IEEE Transactions on Computing, 1975.
[6] D. Nister and H. Stewenius. Scalable Recognition with a Vocabulary Tree. In IEEE Conference on Computer Vision and Pattern Recognition, 2006.
[7] S. Dasgupta and Y. Freund. Random Projection trees and Low Dimensional Manifolds. In Proceedings of ACM Symposium on Theory of Computing, 2008.
[8] P. Ram, D. Lee, and A. G. Gray. Nearest-neighbor Search on a Time Budget via Max-Margin Trees. In SIAM International Conference on Data Mining, 2012.
[9] L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum Margin Clustering. Advances in Neural Information Processing Systems, 2005.
[10] P. Indyk and R. Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proceedings of ACM Symposium on Theory of Computing, 1998.
[11] T. Liu, A. W. Moore, A. G. Gray, and K. Yang. An Investigation of Practical Approximate Nearest Neighbor Algorithms. Advances in Neural Information Processing Systems, 2005.
[12] S. Dasgupta and K. Sinha. Randomized Partition Trees for Exact Nearest Neighbor Search. In Proceedings of the Conference on Learning Theory, 2013.
[13] J. He, S. Kumar and S. F. Chang. On the Difficulty of Nearest Neighbor Search. In Proceedings of the International Conference on Machine Learning, 2012.
[14] Y. Freund, S. Dasgupta, M. Kabra, and N. Verma. Learning the Structure of Manifolds using Random Projections. Advances in Neural Information Processing Systems, 2007.
[15] D. R. Karger and M. Ruhl. Finding Nearest Neighbors in Growth-Restricted Metrics. In Proceedings of ACM Symposium on Theory of Computing, 2002.
[16] B. Zhao, F. Wang, and C. Zhang. Efficient Maximum Margin Clustering via Cutting Plane Algorithm. In SIAM International Conference on Data Mining, 2008.
[17] Y. Gong and S. Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes. In IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[18] R. Salakhutdinov and G. Hinton. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure. In Artificial Intelligence and Statistics, 2007.
[19] Y. Weiss, A. Torralba, and R. Fergus. Spectral Hashing. Advances in Neural Information Processing Systems, 2008.
[20] J. Wang, S. Kumar, and S. Chang. Semi-Supervised Hashing for Scalable Image Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[21] M. Norouzi and D. J. Fleet. Minimal Loss Hashing for Compact Binary Codes. In Proceedings of the International Conference on Machine Learning, 2011.
[22] S. Lloyd. Least Squares Quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
-----1
[1] P. Abbeel and A. Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proc. of International Conference on Machine Learning, 2004.
[2] AccessKenya.com. http://traffic.accesskenya.com/.
[3] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998.
[4] S. Amari and H. Nagaoka. Methods of Information Geometry. Oxford University Press, 2000.
[5] S. Amari, H. Park, and K. Fukumizu. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 12(6):1399–1409, 2000.
[6] H. Balzter. Markov chain models for vegetation dynamics. Ecological Modelling, 126(2-3):139–154, 2000.
[7] J. Baxter and P. L. Bartlett. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319–350, 2001.
[8] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[9] H. H. Bui, S. Venkatesh, and G. West. On the recognition of abstract Markov policies. In Proc. of AAAI Conference on Artificial Intelligence, pages 524–530, 2000.
[10] S. L. Chang, L. S. Chen, Y. C. Chung, and S. W. Chen. Automatic license plate recognition. In Proc. of IEEE Transactions on Intelligent Transportation Systems, pages 42–53, 2004.
[11] E. Crisostomi, S. Kirkland, and R. Shorten. A Google-like model of road network dynamics and its application to regulation and control. International Journal of Control, 84(3):633–651, 1995.
[12] M. Gamon and A. C. Konig. Navigation patterns from and to social media. In Proc. of AAAI Conference on Weblogs and Social Media, 2009.
[13] J. E. Gonzales, C. C. Chavis, Y. Li, and C. F. Daganzo. Multimodal transport in Nairobi, Kenya: Insights and recommendations with a macroscopic evidence-based model. In Proc. of Transportation Research Board 90th Annual Meeting, 2011.
[14] T. Ide and M. Sugiyama. Trajectory regression on road networks. In Proc. of AAAI Conference on Artificial Intelligence, pages 203–208, 2011.
[15] T. Katasuki, T. Morimura, and T. Ide. Bayesian unsupervised vehicle counting. In Technical Report. IBM Research, RT0951, 2013.
[16] D. Levin, Y. Peres, and E. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008.
[17] D. MacKay. Information theory, inference, and learning algorithms. Cambridge University Press, 2003.
[18] T. Morimura and S. Kato. Statistical origin-destination generation with multiple sources. In Proc. of International Conference on Pattern Recognition, pages 283–290, 2012.
[19] T. Morimura, E. Uchibe, J. Yoshimoto, and K. Doya. A generalized natural actor-critic algorithm. In Proc. of Advances in Neural Information Processing Systems, volume 22, 2009.
[20] A. Y. Ng and S. Russell. Algorithms for inverse reinforcement learning. In Proc. of International Conference on Machine Learning, 2000.
[21] J. R. Norris. Markov Chains. Cambridge University Press, 1998.
[22] OpenStreetMap. http://wiki.openstreetmap.org/.
[23] C. C. Pegels and A. E. Jelmert. An evaluation of blood-inventory policies: A Markov chain application. Operations Research, 18(6):1087–1098, 1970.
[24] J.A. Quinn and R. Nakibuule. Traffic flow monitoring in crowded cities. In Proc. of AAAI Spring Symposium on Artificial Intelligence for Development, 2010.
[25] C. M. Roberts. Radio frequency identification (RFID). Computers & Security, 25(1):18–26, 2006.
[26] S. M. Ross. Stochastic processes. John Wiley & Sons Inc, 1996.
[27] S. Santini. Analysis of traffic flow in urban areas using web cameras. In Proc. of IEEE Workshop on Applications of Computer Vision, pages 140–145, 2000.
[28] R. R. Sarukkai. Link prediction and path analysis using Markov chains. Computer Networks, 33(1-6):377–386, 2000.
[29] G. Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986.
[30] Y. Zhang, M. Roughan, C. Lund, and D. Donoho. An information-theoretic approach to traffic matrix estimation. In Proc. of Conference on Applications, technologies, architectures, and protocols for computer communications, pages 301–312. ACM, 2003.
[31] J. Zhu, J. Hong, and J. G. Hughes. Using Markov chains for link prediction in adaptive Web sites. In Proc. of Soft-Ware 2002: Computing in an Imperfect World, volume 2311, pages 60–73. Springer, 2002.
[32] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In Proc. of AAAI Conference on Artificial Intelligence, pages 1433–1438, 2008.
-----1
[1] D.P. Bertsekas. Dynamic Programming and Optimal Control, Vol. II. Athena Scientific, 3rd edition, 2007.
[2] W. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley-Blackwell, 2007.
[3] L. Devroye. The uniform convergence of the Nadaraya-Watson regression function estimate. Canadian Journal of Statistics, 6(2):179–191, 1978.
[4] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton University Press, 2009.
[5] A. Keshavarz and S. Boyd. Quadratic approximate dynamic programming for input-affine systems. International Journal of Robust and Nonlinear Control, 2012. Forthcoming.
[6] A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, 2005.
[7] G. Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30(2):257–280, 2005.
[8] S. Mannor, O. Mebel, and H. Xu. Lightning does not strike twice: Robust MDPs with coupled uncertainty. In Proceedings of the 29th International Conference on Machine Learning, pages 385–392, 2012.
[9] W. Wiesemann, D. Kuhn, and B. Rustem. Robust Markov decision processes. Mathematics of Operations Research, 38(1):153–183, 2013.
[10] M.G. Lagoudakis and R. Parr. Least-squares policy iteration. The Journal of Machine Learning Research, 4:1107–1149, 2003.
[11] D.P. Bertsekas. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, 9(3):310–335, 2011.
[12] C.E. Rasmussen and M. Kuss. Gaussian processes in reinforcement learning. In Advances in Neural Information Processing Systems, pages 751–759, 2004.
[13] L. Busoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuška, and B. De Schutter. Least-squares methods for policy iteration. In Reinforcement Learning, pages 75–109. Springer, 2012.
[14] X. Xu, T. Xie, D. Hu, and X. Lu. Kernel least-squares temporal difference learning. International Journal of Information Technology, 11(9):54–63, 2005.
[15] Y. Engel, S. Mannor, and R. Meir. Reinforcement learning with Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, pages 201–208, 2005.
[16] G. Taylor and R. Parr. Kernelized value function approximation for reinforcement learning. In Proceedings of the 26th International Conference on Machine Learning, pages 1017–1024, 2009.
[17] E.A. Nadaraya. On estimating regression. Theory of Probability & its Applications, 9(1):141–142, 1964.
[18] G.S. Watson. Smooth regression analysis. Sankhya: The Indian Journal of Statistics, Series A, 26(4):359–372, 1964.
[19] B. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.
[20] L. Hannah, W. Powell, and D. Blei. Nonparametric density estimation for stochastic optimization with an observable state variable. In Advances in Neural Information Processing Systems, pages 820–828, 2010.
[21] A. Cybakov. Introduction to Nonparametric Estimation. Springer, 2009.
[22] L. Pardo. Statistical Inference Based on Divergence Measures, volume 185 of Statistics: A Series of Textbooks and Monographs. Chapman and Hall/CRC, 2005.
[23] Z. Wang, P.W. Glynn, and Y. Ye. Likelihood robust optimization for data-driven newsvendor problems. Technical report, Stanford University, 2009.
[24] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.
[25] A. Shapiro, D. Dentcheva, and A. Ruszczynski. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009.
[26] J.F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11-12:625–654, 1999.
[27] J. Löfberg. YALMIP : A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, 2004.
[28] L. Hannah and D. Dunson. Approximate dynamic programming for storage problems. In Proceedings of the 28th International Conference on Machine Learning, pages 337–344, 2011.
[29] J.H. Kim and W.B. Powell. Optimal energy commitments with storage and intermittent supply. Operations Research, 59(6):1347–1360, 2011.
[30] M. Kraning, Y. Wang, E. Akuiyibo, and S. Boyd. Operation and configuration of a storage portfolio via convex optimization. In Proceedings of the IFAC World Congress, pages 10487–10492, 2011.
-----1
[1] Adams, R. P. and MacKay, D. J. (2007). Bayesian online changepoint detection. Technical report, University of Cambridge, Cambridge, UK.
[2] Bar-Shalom, Y., Li, X. R., and Kirubarajan, T. (2011). Tracking and Data Fusion: A Handbook of Algorithms. YBS Publishing.
[3] Beal, M. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. In Bayesian Statistics, volume 7, pages 453–464.
[4] Bishop, C. M. (2007). Pattern Recognition and Machine Learning. Springer.
[5] Braun, J. V., Braun, R., and Muller, H.-G. (2000). Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika, 87(2):301–314.
[6] Byrd, E. (2003). Single integrated air picture (SIAP) attributes version 2.0. Technical Report 2003-029, DTIC.
[7] Chen, J. and Gupta, A. (1997). Testing and locating variance changepoints with application to stock prices. Journal of the American Statistical Association, 92(438):739–747.
[8] Courant, R. and Hilbert, D. (1953). Methods of Mathematical Physics. Interscience.
[9] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
[10] Ehrman, L. M. and Blair, W. D. (2006). Comparison of methods for using target amplitude to improve measurement-to-track association in multi-target tracking. In Information Fusion, 2006 9th International Conference on, pages 1–8. IEEE.
[11] Fearnhead, P. and Liu, Z. (2007). Online inference for multiple changepoint problems. Journal of the Royal Statistical Society, Series B, 69(4):589–605.
[12] Hipp, C. (1974). Sufficient statistics and exponential families. The Annals of Statistics, 2(6):1283–1292.
[13] Honkela, A. and Valpola, H. (2003). On-line variational Bayesian learning. In 4th International Symposium on Independent Component Analysis and Blind Signal Separation, pages 803–808.
[14] Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME – Journal of Basic Engineering, 82(Series D):35–45.
[15] Lauwers, L., Barbe, K., Van Moer, W., and Pintelon, R. (2009). Estimating the parameters of a Rice distribution: A Bayesian approach. In Instrumentation and Measurement Technology Conference, 2009. I2MTC'09. IEEE, pages 114–117. IEEE.
[16] Mahler, R. (2003). Multi-target Bayes filtering via first-order multi-target moments. IEEE Trans. AES, 39(4):1152–1178.
[17] Marcum, J. (1950). Table of Q functions. U.S. Air Force RAND Research Memorandum M-339, Rand Corporation, Santa Monica, CA.
[18] Mardia, K. V. and Jupp, P. E. (2000). Directional Statistics. John Wiley & Sons, New York.
[19] Murray, I. (2007). Advances in Markov chain Monte Carlo methods. PhD thesis, Gatsby computational neuroscience unit, University College London, London, UK.
[20] Poore, A. P., Rijavec, N., Barker, T. N., and Munger, M. L. (1993). Data association problems posed as multidimensional assignment problems: algorithm development. In Optical Engineering and Photonics in Aerospace Sensing, pages 172–182. International Society for Optics and Photonics.
[21] Richards, M. A., Scheer, J., and Holm, W. A., editors (2010). Principles of Modern Radar: Basic Principles. SciTech Pub.
[22] Rogers, S. (2012). The world's top 100 airports: listed, ranked and mapped. The Guardian.
[23] Saatci, Y., Turner, R., and Rasmussen, C. E. (2010). Gaussian process change point models. In 27th International Conference on Machine Learning, pages 927–934, Haifa, Israel. Omnipress.
[24] Sato, M.-A. (2001). Online model selection based on the variational Bayes. Neural Computation, 13(7):1649–1681.
[25] Slocumb, B. J. and Klusman III, M. E. (2005). A multiple model SNR/RCS likelihood ratio score for radar-based feature-aided tracking. In Optics & Photonics 2005, pages 59131N–59131N. International Society for Optics and Photonics.
[26] Swerling, P. (1954). Probability of detection for fluctuating targets. Technical Report RM-1217, Rand Corporation.
[27] Turner, R. (2011). Gaussian Processes for State Space Models and Change Point Detection. PhD thesis, University of Cambridge, Cambridge, UK.
-----1
[1] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001.
[2] Yixin Chen, Xin Dang, Hanxiang Peng, and Henry L. Bart. Outlier detection with the kernelized spatial depth function. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):288–305, 2009.
[3] G. Fasano and A. Franceschini. A multidimensional version of the Kolmogorov-Smirnov test. Monthly Notices of the Royal Astronomical Society, 225:155–170, 1987.
[4] A. Glazer, M. Lindenbaum, and S. Markovitch. Learning high-density regions for a generalized Kolmogorov-Smirnov test in high-dimensional data. In NIPS, pages 737–745, 2012.
[5] John A Hartigan. Clustering Algorithms. John Wiley & Sons, Inc., 1975.
[6] V. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2):85–126, 2004.
[7] Roger Koenker. Quantile regression. Cambridge University Press, 2005.
[8] Gyemin Lee and Clayton Scott. Nested support vector machines. Signal Processing, IEEE Transactions on, 58(3):1648–1660, 2010.
[9] Youjuan Li, Yufeng Liu, and Ji Zhu. Quantile regression in reproducing kernel Hilbert spaces. Journal of the American Statistical Association, 102(477):255–268, 2007.
[10] D.G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, 3rd edition, 2008.
[11] A. Munoz and J.M. Moguerza. Estimation of high-density regions using one-class neighbor machines. In PAMI, pages 476–480, 2006.
[12] W. Polonik. Concentration and goodness-of-fit in higher dimensions: (asymptotically) distribution-free methods. The Annals of Statistics, 27(4):1210–1229, 1999.
[13] B. Scholkopf, A.J. Smola, R.C. Williamson, and P.L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207–1245, 2000.
[14] Bernhard Scholkopf, John C. Platt, John C. Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001.
[15] R. Serfling. Quantile functions for multivariate analysis: approaches and applications. Statistica Neerlandica, 56(2):214–232, 2002.
[16] Ingo Steinwart, Don R Hush, and Clint Scovel. A classification framework for anomaly detection. In JMLR, pages 211–232, 2005.
[17] Ichiro Takeuchi, Quoc V Le, Timothy D Sears, and Alexander J Smola. Nonparametric quantile estimation. JMLR, 7:1231–1264, 2006.
[18] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 2nd edition, 1998.
[19] R. Vert and J.P. Vert. Consistency and convergence rates of one-class SVMs and related algorithms. The Journal of Machine Learning Research, 7:817–854, 2006.
[20] W. Zhang, X. Lin, M.A. Cheema, Y. Zhang, and W. Wang. Quantile-based knn over multi-valued objects. In ICDE, pages 16–27. IEEE, 2010.
[21] Yijun Zuo and Robert Serfling. General notions of statistical depth function. The Annals of Statistics, 28(2):461–482, 2000.
-----1
[1] S.-C. Zhu and D. Mumford, A stochastic grammar of images, Found. Trends. Comput. Graph. Vis., vol. 2, no. 4, pp. 259–362, 2006.
[2] Y. Jin and S. Geman, Context and hierarchy in a probabilistic image model, in CVPR, 2006.
[3] Y. Zhao and S. C. Zhu, Image parsing with stochastic scene grammar, in NIPS, 2011.
[4] Y. A. Ivanov and A. F. Bobick, Recognition of visual activities and interactions by stochastic parsing, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, no. 8, pp. 852–872, 2000.
[5] M. S. Ryoo and J. K. Aggarwal, Recognition of composite human activities through context-free grammar based representation, in CVPR, 2006.
[6] Z. Zhang, T. Tan, and K. Huang, An extended grammar system for learning and recognizing complex visual events, IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 2, pp. 240–255, Feb. 2011.
[7] M. Pei, Y. Jia, and S.-C. Zhu, Parsing video events with goal inference and intent prediction, in ICCV, 2011.
[8] C. D. Manning and H. Schutze, Foundations of statistical natural language processing. Cambridge, MA, USA: MIT Press, 1999.
[9] P. Liang, M. I. Jordan, and D. Klein, Probabilistic grammars and hierarchical dirichlet processes, The handbook of applied Bayesian analysis, 2009.
[10] H. Poon and P. Domingos, Sum-product networks : A new deep architecture, in Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI), 2011.
[11] J. K. Baker, Trainable grammars for speech recognition, in Speech Communication Papers for the 97th Meeting of the Acoustical Society of America, 1979.
[12] D. Klein and C. D. Manning, Corpus-based induction of syntactic structure: Models of dependency and constituency, in Proceedings of ACL, 2004.
[13] S. Wang, Y. Wang, and S.-C. Zhu, Hierarchical space tiling for scene modeling, in Computer Vision – ACCV 2012. Springer, 2013, pp. 796–810.
[14] A. Stolcke and S. M. Omohundro, Inducing probabilistic grammars by Bayesian model merging, in ICGI, 1994, pp. 106–118.
[15] Z. Solan, D. Horn, E. Ruppin, and S. Edelman, Unsupervised learning of natural languages, Proc. Natl. Acad. Sci., vol. 102, no. 33, pp. 11629–11634, August 2005.
[16] K. Tu and V. Honavar, Unsupervised learning of probabilistic context-free grammar using iterative biclustering, in Proceedings of 9th International Colloquium on Grammatical Inference (ICGI 2008), ser. LNCS 5278, 2008.
[17] Z. Si and S. Zhu, Learning and-or templates for object modeling and recognition, IEEE Trans on Pattern Analysis and Machine Intelligence, 2013.
[18] Z. Si, M. Pei, B. Yao, and S.-C. Zhu, Unsupervised learning of event and-or grammar and semantics from video, in ICCV, 2011.
[19] J. F. Allen, Towards a general theory of action and time, Artificial intelligence, vol. 23, no. 2, pp. 123–154, 1984.
[20] V. I. Spitkovsky, H. Alshawi, D. Jurafsky, and C. D. Manning, Viterbi training improves unsupervised dependency parsing, in Proceedings of the Fourteenth Conference on Computational Natural Language Learning, ser. CoNLL '10, 2010.
[21] K. Tu and V. Honavar, Unambiguity regularization for unsupervised learning of probabilistic grammars, in Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL 2012), 2012.
[22] S. C. Madeira and A. L. Oliveira, Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Trans. on Comp. Biol. and Bioinformatics, vol. 1, no. 1, pp. 24–45, 2004.
[23] P. Wei, N. Zheng, Y. Zhao, and S.-C. Zhu, Concurrent action detection with structural prediction, in Proc. Int'l Conference on Computer Vision (ICCV), 2013.
[24] A. Barbu, M. Pavlovskaia, and S. Zhu, Rates for inductive learning of compositional models, in AAAI Workshop on Learning Rich Representations from Low-Level Sensors (RepLearning), 2013.
-----1
[1] Aggarwal, C. C. Outlier Analysis. Springer, 2013.
[2] Bache, K. and Lichman, M. UCI machine learning repository, 2013.
[3] Bakar, Z. A., Mohemad, R., Ahmad, A., and Deris, M. M. A comparative study for outlier detection techniques in data mining. In Proceedings of IEEE International Conference on Cybernetics and Intelligent Systems, 1–6, 2006.
[4] Bay, S. D. and Schwabacher, M. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 29–38, 2003.
[5] Berchtold, S., Keim, D. A., and Kriegel, H.-P. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd International Conference on Very Large Data Bases, 28–39, 1996.
[6] Bhaduri, K., Matthews, B. L., and Giannella, C. R. Algorithms for speeding up distance-based outlier detection. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 859–867, 2011.
[7] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 93–104, 2000.
[8] Caputo, B., Sim, K., Furesjo, F., and Smola, A. Appearance-based object recognition using SVMs: Which kernel should I use? In Proceedings of NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision, 2002.
[9] de Vries, T., Chawla, S., and Houle, M. E. Density-preserving projections for large-scale local anomaly detection. Knowledge and Information Systems, 1–28, 2011.
[10] Hawkins, D. Identification of Outliers. Chapman and Hall, 1980.
[11] Knorr, E. M. and Ng, R. T. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th International Conference on Very Large Data Bases, 392–403, 1998.
[12] Knorr, E. M., Ng, R. T., and Tucakov, V. Distance-based outliers: algorithms and applications. The VLDB Journal, 8(3):237–253, 2000.
[13] Kriegel, H.-P., Kroger, P., and Zimek, A. Outlier detection techniques. Tutorial at 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.
[14] Kriegel, H.-P., Schubert, M., and Zimek, A. Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 444–452, 2008.
[15] Liu, F. T., Ting, K. M., and Zhou, Z. H. Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data, 6(1):3:1–3:39, 2012.
[16] Orair, G. H., Teixeira, C. H. C., Wang, Y., Meira Jr., W., and Parthasarathy, S. Distance-based outlier detection: consolidation and renewed bearing. PVLDB, 3(2):1469–1480, 2010.
[17] Pham, N. and Pagh, R. A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data. In Proceedings of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 877–885, 2012.
[18] Ramaswamy, S., Rastogi, R., and Shim, K. Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 427–438, 2000.
[19] Scholkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001.
[20] Weber, R., Schek, H.-J., and Blott, S. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the International Conference on Very Large Data Bases, 194–205, 1998.
[21] Wu, M. and Jermaine, C. Outlier detection by sampling with accuracy guarantees. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 767–772, 2006.
[22] Yamanishi, K., Takeuchi, J., Williams, G., and Milne, P. On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Mining and Knowledge Discovery, 8(3):275–300, 2004.
[23] Yu, D., Sheikholeslami, G., and Zhang, A. FindOut: Finding outliers in very large datasets. Knowledge and Information Systems, 4(4):387–412, 2002.
[24] Zimek, A., Schubert, E., and Kriegel, H.-P. A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining, 5(5):363–387, 2012.
-----1
[1] M. K. Babcock and J. Freyd. Perception of dynamic information in static handwritten forms. American Journal of Psychology, 101(1):111–130, 1988.
[2] I. Biederman. Recognition-by-components: a theory of human image understanding. Psychological Review, 94(2):115–47, 1987.
[3] S. Carey and E. Bartlett. Acquiring a single new word. Papers and Reports on Child Language Development, 15:17–29, 1978.
[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[5] L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, 2006.
[6] J. Feldman. The structure of perceptual categories. Journal of Mathematical Psychology, 41:145–170, 1997.
[7] J. Freyd. Representing the dynamics of a static form. Memory and Cognition, 11(4):342–346, 1983.
[8] S. Geman, E. Bienenstock, and R. Doursat. Neural Networks and the Bias/Variance Dilemma. Neural Computation, 4:1–58, 1992.
[9] E. Gilet, J. Diard, and P. Bessière. Bayesian action-perception computational model: interaction of production and recognition of cursive letters. PLoS ONE, 6(6), 2011.
[10] G. E. Hinton and V. Nair. Inferring motor programs from images of handwritten digits. In Advances in Neural Information Processing Systems 19, 2006.
[11] K. H. James and I. Gauthier. Letter processing automatically recruits a sensory-motor brain network. Neuropsychologia, 44(14):2937–2949, 2006.
[12] K. H. James and I. Gauthier. When writing impairs reading: letter perception's susceptibility to motor interference. Journal of Experimental Psychology: General, 138(3):416–31, Aug. 2009.
[13] C. Kemp and A. Jern. Abstraction and relational learning. In Advances in Neural Information Processing Systems 22, 2009.
[14] A. Krizhevsky. Learning multiple layers of features from tiny images. PhD thesis, University of Toronto, 2009.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25, 2012.
[16] B. M. Lake, R. Salakhutdinov, J. Gross, and J. B. Tenenbaum. One shot learning of simple visual concepts. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011.
[17] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. Concept learning as motor program induction: A large-scale empirical study. In Proceedings of the 34th Annual Conference of the Cognitive Science Society, 2012.
[18] L. Lam, S.-W. Lee, and C. Y. Suen. Thinning Methodologies - A Comprehensive Survey. IEEE Transactions of Pattern Analysis and Machine Intelligence, 14(9):869–885, 1992.
[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278–2323, 1998.
[20] K. Liu, Y. S. Huang, and C. Y. Suen. Identification of Fork Points on the Skeletons of Handwritten Chinese Characters. IEEE Transactions of Pattern Analysis and Machine Intelligence, 21(10):1095–1100, 1999.
[21] M. Longcamp, J. L. Anton, M. Roth, and J. L. Velay. Visual presentation of single letters activates a premotor area involved in writing. Neuroimage, 19(4):1492–1500, 2003.
[22] E. M. Markman. Categorization and Naming in Children. MIT Press, Cambridge, MA, 1989.
[23] E. G. Miller, N. E. Matsakis, and P. A. Viola. Learning from one example through shared densities on transformations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000.
[24] M. Revow, C. K. I. Williams, and G. E. Hinton. Using Generative Models for Handwritten Digit Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):592–606, 1996.
[25] R. Salakhutdinov and G. E. Hinton. Deep Boltzmann Machines. In 12th International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
[26] R. Salakhutdinov, J. B. Tenenbaum, and A. Torralba. Learning with Hierarchical-Deep Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1958–71, 2013.
[27] P. H. Winston. Learning structural descriptions from examples. In P. H. Winston, editor, The Psychology of Computer Vision. McGraw-Hill, New York, 1975.
[28] F. Xu and J. B. Tenenbaum. Word Learning as Bayesian Inference. Psychological Review, 114(2):245–272, 2007.
-----1
[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2(1):183–202, 2009.
[2] J.M. Borwein and A.S. Lewis. Convex analysis and nonlinear optimization. Springer, 2006.
[3] L. Bottou. Online algorithms and stochastic approximations. In David Saad, editor, Online Learning and Neural Networks. 1998.
[4] L. Bottou and O. Bousquet. The trade-offs of large scale learning. In Adv. NIPS, 2008.
[5] O. Cappe and E. Moulines. On-line expectation–maximization algorithm for latent data models. J. Roy. Stat. Soc. B, 71(3):593–613, 2009.
[6] J. Duchi and Y. Singer. Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res., 10:2899–2934, 2009.
[7] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res., 9:1871–1874, 2008.
[8] G. Gasso, A. Rakotomamonjy, and S. Canu. Recovering sparse signals with non-convex penalties and DC programming. IEEE T. Signal Process., 57(12):4686–4698, 2009.
[9] S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. Technical report, 2013.
[10] E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Mach. Learn., 69(2-3):169–192, 2007.
[11] E. Hazan and S. Kale. Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. In Proc. COLT, 2011.
[12] C. Hu, J. Kwok, and W. Pan. Accelerated gradient methods for stochastic optimization and online learning. In Adv. NIPS, 2009.
[13] R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. In Proc. AISTATS, 2010.
[14] G. Lan. An optimal method for stochastic composite optimization. Math. Program., 133:365–397, 2012.
[15] K. Lange, D.R. Hunter, and I. Yang. Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat., 9(1):1–20, 2000.
[16] J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. J. Mach. Learn. Res., 10:777–801, 2009.
[17] N. Le Roux, M. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In Adv. NIPS, 2012.
[18] J. Mairal. Optimization with first-order surrogate functions. In Proc. ICML, 2013. arXiv:1305.3120.
[19] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res., 11:19–60, 2010.
[20] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Network flow algorithms for structured sparsity. In Adv. NIPS, 2010.
[21] R.M. Neal and G.E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in graphical models, 89, 1998.
[22] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. Optimiz., 19(4):1574–1609, 2009.
[23] Y. Nesterov. Gradient methods for minimizing composite objective functions. Technical report, CORE Discussion Paper, 2007.
[24] S. Shalev-Shwartz and T. Zhang. Proximal stochastic dual coordinate ascent. arXiv 1211.2717v1, 2012.
[25] S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Stochastic convex optimization. In Proc. COLT, 2009.
[26] S. Shalev-Shwartz and A. Tewari. Stochastic methods for ℓ1 regularized loss minimization. In Proc. ICML, 2009.
[27] A. W. Van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.
[28] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn., 1(1-2):1–305, 2008.
[29] S. Wright, R. Nowak, and M. Figueiredo. Sparse reconstruction by separable approximation. IEEE T. Signal Process., 57(7):2479–2493, 2009.
[30] L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res., 11:2543–2596, 2010.
-----1
[1] F. Bach. Consistency of trace norm minimization. Journal of Machine Learning Research, 9:1019–1048, 2008.
[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sciences, 2(1):183–202, 2009.
[3] M. Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, 2002.
[4] S. Gupta, D. Phung, B. Adams, and S. Venkatesh. Regularized nonnegative shared subspace learning. Data Mining and Knowledge Discovery, 26:57–97, 2013.
[5] P. Huber. Robust Statistics. New York, New York, 1981.
[6] S. Ji and J. Ye. An accelerated gradient method for trace norm minimization. In Proc. of International Conference on Machine Learning (ICML), 2009.
[7] I. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, New York, 1986.
[8] Q. Ke and T. Kanade. Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[9] S. Ma, D. Goldfarb, and L. Chen. Fixed point and bregman iterative methods for matrix rank minimization. Mathematical Programming, 2009.
[10] S. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010.
[11] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
[12] M. Tipping and C. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, B, 61(3):611–622, 1999.
[13] F. De La Torre and M. Black. A framework for robust subspace learning. International Journal of Computer Vision (IJCV), 54(1-3):117–142, 2003.
[14] J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao. Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization. In Advances in Neural Information Processing Systems (NIPS), 2009.
[15] X. Zhang, Y. Yu, M. White, R. Huang, and D. Schuurmans. Convex sparse coding, subspace learning, and semi-supervised extensions. In Proc. of AAAI Conference on Artificial Intelligence (AAAI), 2011.
-----1
[1] M. Artac, M. Jogan, and A. Leonardis. Incremental PCA for on-line visual learning and recognition. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, volume 3, pages 781–784. IEEE, 2002.
[2] D.P. Bertsekas. Nonlinear programming. Athena Scientific, 1999.
[3] Samuel Burer and Renato Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program., 2003.
[4] E.J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? ArXiv:0912.3599, 2009.
[5] V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, and A.S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.
[6] M. Fazel. Matrix rank minimization with applications. PhD thesis, Stanford University, 2002.
[7] J. Feng, H. Xu, and S. Yan. Robust PCA in high-dimension: A deterministic approach. In ICML, 2012.
[8] D.L. Fisk. Quasi-martingales. Transactions of the American Mathematical Society, 1965.
[9] N. Guan, D. Tao, Z. Luo, and B. Yuan. Online nonnegative matrix factorization with robust stochastic approximation. Neural Networks and Learning Systems, IEEE Transactions on, 23(7):1087–1099, 2012.
[10] Jun He, Laura Balzano, and John Lui. Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827, 2011.
[11] P.J. Huber, E. Ronchetti, and MyiLibrary. Robust statistics. John Wiley & Sons, New York, 1981.
[12] M. Hubert, P.J. Rousseeuw, and K.V. Branden. Robpca: a new approach to robust principal component analysis. Technometrics, 2005.
[13] Y. Li. On incremental and robust subspace learning. Pattern recognition, 2004.
[14] Z. Lin, M. Chen, and Y. Ma. The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055, 2010.
[15] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, and Y. Ma. Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2009.
[16] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. JMLR, 2010.
[17] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 1901.
[18] C. Qiu, N. Vaswani, and L. Hogben. Recursive robust pca or recursive sparse recovery in large but structured noise. arXiv preprint arXiv:1211.3754, 2012.
[19] B. Recht, M. Fazel, and P.A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM review, 52(3):471–501, 2010.
[20] Jason Rennie and Nathan Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, 2005.
[21] H. Xu, C. Caramanis, and S. Mannor. Principal component analysis with contaminated data: The high dimensional case. In COLT, 2010.
[22] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. Information Theory, IEEE Transactions on, 58(5):3047–3064, 2012.
-----1
[1] A. Agarwal, O. Chapelle, M. Dudík, and J. Langford. A reliable effective terascale linear learning system. CoRR, abs/1110.4198, 2011.
[2] R. Arora, A. Cotter, K. Livescu, and N. Srebro. Stochastic optimization for PCA and PLS. In 50th Annual Allerton Conference on Communication, Control, and Computing, pages 861–868. 2012.
[3] R. Arora, A. Cotter, and N. Srebro. Stochastic optimization of PCA with capped MSG. In Advances in Neural Information Processing Systems, 2013.
[4] G. Blanchard, O. Bousquet, and L. Zwald. Statistical properties of kernel principal component analysis. Machine Learning, 66(2-3):259–294, 2007.
[5] T. T. Cai, Z. Ma, and Y. Wu. Sparse PCA: Optimal rates and adaptive estimation. CoRR, abs/1211.1309, 2012.
[6] R. Durrett. Probability: Theory and Examples. Duxbury, second edition, 1995.
[7] T.P. Krasulina. A method of stochastic approximation for the determination of the least eigenvalue of a symmetrical matrix. USSR Computational Mathematics and Mathematical Physics, 9(6):189–195, 1969.
[8] I. Mitliagkas, C. Caramanis, and P. Jain. Memory limited, streaming PCA. In Advances in Neural Information Processing Systems, 2013.
[9] E. Oja. Subspace Methods of Pattern Recognition. Research Studies Press, 1983.
[10] E. Oja and J. Karhunen. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Math. Analysis and Applications, 106:69–84, 1985.
[11] A. Rakhlin, O. Shamir, and K. Sridharan. Making gradient descent optimal for strongly convex stochastic optimization. In International Conference on Machine Learning, 2012.
[12] S. Roweis. EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems, 1997.
[13] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1615–1618, 2003.
[14] V.Q. Vu and J. Lei. Minimax rates of estimation for sparse PCA in high dimensions. Journal of Machine Learning Research - Proceedings Track, 22:1278–1286, 2012.
[15] M.K. Warmuth and D. Kuzmin. Randomized PCA algorithms with regret bounds that are logarithmic in the dimension. In Advances in Neural Information Processing Systems. 2007.
[16] J. Weng, Y. Zhang, and W.-S. Hwang. Candid covariance-free incremental principal component analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):1034–1040, 2003.
[17] L. Zwald and G. Blanchard. On the convergence of eigenspaces in kernel principal component analysis. In Advances in Neural Information Processing Systems, 2005.
-----1
[1] F. R. Bach and M. I. Jordan. A probabilistic interpretation of canonical correlation analysis. Technical Report 608, Department of Statistics, University of California, Berkeley, 2005.
[2] A. Bhattacharya and D. B. Dunson. Nonparametric Bayesian density estimation on manifolds with applications to planar shapes. Biometrika, 97(4):851–865, 2010.
[3] C. M. Bishop. Bayesian PCA. Advances in neural information processing systems, pages 382–388, 1999.
[4] S. Byrne and M. Girolami. Geodesic Monte Carlo on embedded manifolds. arXiv preprint arXiv:1301.6064, 2013.
[5] N. Courty, T. Burger, and P. F. Marteau. Geodesic analysis on the Gaussian RKHS hypersphere. In Machine Learning and Knowledge Discovery in Databases, pages 299–313, 2012.
[6] M. do Carmo. Riemannian Geometry. Birkhäuser, 1992.
[7] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
[8] P. T. Fletcher. Geodesic regression and the theory of least squares on Riemannian manifolds. International Journal of Computer Vision, pages 1–15, 2012.
[9] P. T. Fletcher and S. Joshi. Principal geodesic analysis on symmetric spaces: statistics of diffusion tensors. In Workshop on Computer Vision Approaches to Medical Image Analysis (CVAMIA), 2004.
[10] P. T. Fletcher, C. Lu, and S. Joshi. Statistics of shape via principal geodesic analysis on Lie groups. In Computer Vision and Pattern Recognition, pages 95–101, 2003.
[11] S. Huckemann and H. Ziezold. Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces. Advances in Applied Probability, 38(2):299–319, 2006.
[12] I. T. Jolliffe. Principal Component Analysis, volume 487. Springer-Verlag New York, 1986.
[13] S. Jung, I. L. Dryden, and J. S. Marron. Analysis of principal nested spheres. Biometrika, 99(3):551–568, 2012.
[14] D. G. Kendall. Shape manifolds, Procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16:81–121, 1984.
[15] N. D. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. Advances in neural information processing systems, 16:329–336, 2004.
[16] K. V. Mardia. Directional Statistics. John Wiley and Sons, 1999.
[17] X. Pennec. Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1), 2006.
[18] S. Roweis. EM algorithms for PCA and SPCA. Advances in neural information processing systems, pages 626–632, 1998.
[19] S. Said, N. Courty, N. Le Bihan, and S. J. Sangwine. Exact principal geodesic analysis for data on SO(3). In Proceedings of the 15th European Signal Processing Conference, pages 1700–1705, 2007.
[20] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
[21] S. Sommer, F. Lauze, S. Hauberg, and M. Nielsen. Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations. In Proceedings of the European Conference on Computer Vision, pages 43–56, 2010.
[22] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611–622, 1999.
[23] P. Turaga, A. Veeraraghavan, A. Srivastava, and R. Chellappa. Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 33(11):2273–2286, 2011.
[24] O. Tuzel, F. Porikli, and P. Meer. Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(10):1713–1727, 2008.
[25] M. Zhang, N. Singh, and P. T. Fletcher. Bayesian estimation of regularization and atlas building in diffeomorphic image registration. In Information Processing in Medical Imaging, pages 37–48. Springer, 2013.
-----1
[1] S. Amari, A. Cichocki, H. H. Yang, et al. A new learning algorithm for blind signal separation. Advances in neural information processing systems, pages 757–763, 1996.
[2] S. Arora, R. Ge, A. Moitra, and S. Sachdeva. Provable ICA with unknown Gaussian noise, with implications for Gaussian mixtures and autoencoders. In NIPS, pages 2384–2392, 2012.
[3] M. Belkin, L. Rademacher, and J. Voss. Blind signal separation in the presence of Gaussian noise. In JMLR W&CP, volume 30: COLT, pages 270–287, 2013.
[4] C. M. Bishop. Variational principal components. Proc. Ninth Int. Conf. on Artificial Neural Networks. ICANN, 1:509–514, 1999.
[5] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? CoRR, abs/0912.3599, 2009.
[6] J. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. In Radar and Signal Processing, IEE Proceedings F, volume 140, pages 362–370. IET, 1993.
[7] J.-F. Cardoso and A. Souloumiac. Matlab JADE for real-valued data v 1.8. http://perso.telecom-paristech.fr/cardoso/Algo/Jade/jadeR.m, 2005. [Online; accessed 8-May-2013].
[8] P. Comon and C. Jutten, editors. Handbook of Blind Source Separation. Academic Press, 2010.
[9] X. Ding, L. He, and L. Carin. Bayesian robust principal component analysis. Image Processing, IEEE Transactions on, 20(12):3419–3430, 2011.
[10] H. Gävert, J. Hurri, J. Särelä, and A. Hyvärinen. Matlab FastICA v 2.5. http://research.ics.aalto.fi/ica/fastica/code/dlcode.shtml, 2005. [Online; accessed 1-May-2013].
[11] D. Hsu and S. M. Kakade. Learning mixtures of spherical Gaussians: Moment methods and spectral decompositions. In ITCS, pages 11–20, 2013.
[12] A. Hyvärinen. Independent component analysis in the presence of Gaussian noise by maximizing joint likelihood. Neurocomputing, 22(1-3):49–67, 1998.
[13] A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634, 1999.
[14] A. Hyvärinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4-5):411–430, 2000.
[15] J. F. Kenney and E. S. Keeping. Mathematics of Statistics, part 2. van Nostrand, 1962.
[16] H. Li and T. Adali. A class of complex ICA algorithms based on the kurtosis cost function. IEEE Transactions on Neural Networks, 19(3):408–420, 2008.
[17] L. Mattner. What are cumulants. Documenta Mathematica, 4:601–622, 1999.
[18] P. Q. Nguyen and O. Regev. Learning a parallelepiped: Cryptanalysis of GGH and NTRU signatures. J. Cryptology, 22(2):139–160, 2009.
[19] J. Voss, L. Rademacher, and M. Belkin. Matlab GI-ICA implementation. http://sourceforge.net/projects/giica/, 2013. [Online].
[20] M. Welling. Robust higher order statistics. In Tenth International Workshop on Artificial Intelligence and Statistics, pages 405–412, 2005.
[21] A. Yeredor. Blind source separation via the second characteristic function. Signal Processing, 80(5):897–902, 2000.
[22] V. Zarzoso and P. Comon. How fast is FastICA. EUSIPCO, 2006.
-----1
[1] J.R. Bunch and C.P. Nielsen. Updating the singular value decomposition. Numerische Mathematik, 1978.
[2] E.J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? ArXiv:0912.3599, 2009.
[3] C. Croux and A. Ruiz-Gazen. A fast algorithm for robust principal components based on projection pursuit. In COMPSTAT, 1996.
[4] C. Croux and A. Ruiz-Gazen. High breakdown estimators for principal components: the projection-pursuit approach revisited. Journal of Multivariate Analysis, 2005.
[5] A. d'Aspremont, F. Bach, and L. Ghaoui. Optimal solutions for sparse principal component analysis. JMLR, 2008.
[6] J. Feng, H. Xu, and S. Yan. Robust PCA in high-dimension: A deterministic approach. In ICML, 2012.
[7] David Gross and Vincent Nesme. Note on sampling without replacing from a finite collection of matrices. arXiv preprint arXiv:1001.2738, 2010.
[8] P. Hall, D. Marshall, and R. Martin. Merging and splitting eigenspace models. TPAMI, 2000.
[9] Jun He, Laura Balzano, and John Lui. Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827, 2011.
[10] P. Honeine. Online kernel principal component analysis: a reduced-order model. TPAMI, 2012.
[11] P.J. Huber, E. Ronchetti, and MyiLibrary. Robust statistics. John Wiley & Sons, New York, 1981.
[12] M. Hubert, P.J. Rousseeuw, and K.V. Branden. Robpca: a new approach to robust principal component analysis. Technometrics, 2005.
[13] M. Hubert, P.J. Rousseeuw, and S. Verboven. A fast method for robust principal components with applications to chemometrics. Chemometrics and Intelligent Laboratory Systems, 2002.
[14] G. Li and Z. Chen. Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and monte carlo. Journal of the American Statistical Association, 1985.
[15] Y. Li. On incremental and robust subspace learning. Pattern recognition, 2004.
[16] Michael W Mahoney. Randomized algorithms for matrices and data. arXiv preprint arXiv:1104.5557, 2011.
[17] R.A. Maronna. Robust M-estimators of multivariate location and scatter. The Annals of Statistics, 1976.
[18] S. Ozawa, S. Pang, and N. Kasabov. A modified incremental principal component analysis for on-line learning of feature space and classifier. PRICAI, 2004.
[19] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 1901.
[20] C. Qiu, N. Vaswani, and L. Hogben. Recursive robust pca or recursive sparse recovery in large but structured noise. arXiv preprint arXiv:1211.3754, 2012.
[21] P.J. Rousseeuw. Least median of squares regression. Journal of the American Statistical Association, 1984.
[22] P.J. Rousseeuw and A.M. Leroy. Robust regression and outlier detection. John Wiley & Sons Inc, 1987.
[23] M.K. Warmuth and D. Kuzmin. Randomized online pca algorithms with regret bounds that are logarithmic in the dimension. JMLR, 2008.
[24] H. Xu, C. Caramanis, and S. Mannor. Principal component analysis with contaminated data: The high dimensional case. In COLT, 2010.
[25] H. Zhao, P.C. Yuen, and J.T. Kwok. A novel incremental principal component analysis and its application for face recognition. TSMC-B, 2006.
-----1
Over the past decade, there has been a fever of activity to address the drawbacks of PCA with a class of techniques called sparse PCA that combine the essence of PCA with the assumption that the phenomena of interest depend mostly on a few variables. Examples include algorithmic [e.g., 1, 3–10] and theoretical [e.g., 11–14] developments. However, much of this work has focused on the first principal component. One rationale behind this focus is by analogy with ordinary PCA: additional components can be found by iteratively deflating the input matrix to account for variation uncovered by previous components. However, the use of deflation with sparse PCA entails complications of non-orthogonality, sub-optimality, and multiple tuning parameters [15]. Identifiability and consistency present more subtle issues. The principal directions of variation correspond to eigenvectors of some population matrix $\Sigma$. There is no reason to assume a priori that the $d$ largest eigenvalues of $\Sigma$ are distinct. Even if the eigenvalues are distinct, estimates of individual eigenvectors can be unreliable if the gap between their eigenvalues is small. So it seems reasonable, if not necessary, to de-emphasize eigenvectors and to instead focus on their span, i.e. the principal subspace of variation.

There has been relatively little work on the problem of estimating the principal subspace or even multiple eigenvectors simultaneously. Most works that do are limited to iterative deflation schemes or optimization problems whose global solution is intractable to compute. Sole exceptions are the diagonal thresholding method [2], which is just ordinary PCA applied to the subset of variables with largest marginal sample variance, or refinements such as iterative thresholding [16], which use diagonal thresholding as an initial estimate. These works are limited, because they cannot be used when the variables have equal variances (e.g., correlation matrices).
Theoretical results are equally limited in their applicability. Although the optimal minimax rates for the sparse principal subspace problem are known in both the spiked [17] and general [18] covariance models, existing statistical guarantees only hold under the restrictive spiked covariance model, which essentially guarantees that diagonal thresholding has good properties, or for estimators that are computationally intractable.

In this paper, we propose a novel convex optimization problem to estimate the $d$-dimensional principal subspace of a population matrix $\Sigma$ based on a noisy input matrix $S$. We show that if $S$ is a sample covariance matrix and the projection $\Pi$ of the $d$-dimensional principal subspace of $\Sigma$ depends only on $s$ variables, then with a suitable choice of regularization parameter, the Frobenius norm of the error of our estimator $\hat{X}$ is bounded with high probability
$$|||\hat{X} - \Pi|||_2 = O\left( (\lambda_1/\delta)\, s \sqrt{\log p / n} \right)$$
where $\lambda_1$ is the largest eigenvalue of $\Sigma$ and $\delta$ the gap between the $d$th and $(d+1)$th largest eigenvalues of $\Sigma$. This rate turns out to be nearly minimax optimal (Corollary 3.3), and under additional assumptions on signal strength, it also allows us to recover the support of the principal subspace (Theorem 3.2). Moreover, we provide easy to verify conditions (Theorem 3.3) that yield near-optimal statistical guarantees for other choices of input matrix, such as Pearson's correlation and Kendall's tau correlation matrices (Corollary 3.4).

Our estimator turns out to be a semidefinite program (SDP) that generalizes the DSPCA approach of [1] to $d \ge 1$ dimensions. It is based on a convex body, called the Fantope, that provides a tight relaxation for simultaneous rank and orthogonality constraints on the positive semidefinite cone. Solving the SDP is non-trivial. We find that an alternating direction method of multipliers (ADMM) algorithm [e.g., 19] can efficiently compute its global optimum (Section 4).

In summary, the main contributions of this paper are as follows.
1. We formulate the sparse principal subspace problem as a novel semidefinite program with a Fantope constraint (Section 2).
2. We show that the proposed estimator achieves a near optimal rate of convergence in subspace estimation without assumptions on the rank of the solution or restrictive spiked covariance models. This is a first for both $d = 1$ and $d > 1$ (Section 3).
3. We provide a general theoretical framework that accommodates other matrices, in addition to sample covariance, such as Pearson's correlation and Kendall's tau.
4. We develop an efficient ADMM algorithm to solve the SDP (Section 4), and provide numerical examples that demonstrate the superiority of our approach over deflation methods in both computational and statistical efficiency (Section 5).
The remainder of the paper explains each of these contributions in detail, but we defer all proofs to Appendix A.

Related work. Existing work most closely related to ours is the DSPCA approach for single component sparse PCA that was first proposed in [1]. Subsequently, there has been theoretical analysis under a spiked covariance model and restrictions on the entries of the eigenvectors [11], and algorithmic developments including block coordinate ascent [9] and ADMM [20]. The crucial difference with our work is that this previous work only considered $d = 1$. The $d > 1$ case requires invention and novel techniques to deal with a convex body, the Fantope, that has never before been used in sparse PCA.

Notation. For matrices $A, B$ of compatible dimension, $\langle A, B \rangle := \operatorname{tr}(A^T B)$ is the Frobenius inner product, and $|||A|||_2^2 := \langle A, A \rangle$ is the squared Frobenius norm. $\|x\|_q$ is the usual $\ell_q$ norm with $\|x\|_0$ defined as the number of nonzero entries of $x$. $\|A\|_{a,b}$ is the $(a, b)$-norm defined to be the $\ell_b$ norm of the vector of rowwise $\ell_a$ norms of $A$, e.g. $\|A\|_{1,\infty}$ is the maximum absolute row sum. For a symmetric matrix $A$, we define $\lambda_1(A) \ge \lambda_2(A) \ge \cdots$
to be the eigenvalues of $A$ with multiplicity. When the context is obvious we write $\lambda_j := \lambda_j(A)$ as shorthand. For two subspaces $\mathcal{M}_1$ and $\mathcal{M}_2$, $\sin\Theta(\mathcal{M}_1, \mathcal{M}_2)$ is defined to be the matrix whose diagonals are the sines of the canonical angles between the two subspaces [see 21, §VII].

2 Sparse subspace estimation

Given a symmetric input matrix $S$, we propose a sparse principal subspace estimator $\hat{X}$ that is defined to be a solution of the semidefinite program
$$\text{maximize } \langle S, X \rangle - \lambda \|X\|_{1,1} \quad \text{subject to } X \in \mathcal{F}^d \tag{1}$$
in the variable $X$, where $\mathcal{F}^d := \{X : 0 \preceq X \preceq I \text{ and } \operatorname{tr}(X) = d\}$ is a convex body called the Fantope [22, §2.3.2], and $\lambda \ge 0$ is a regularization parameter that encourages sparsity. When $d = 1$, the spectral norm bound in $\mathcal{F}^d$ is redundant and (1) reduces to the DSPCA approach of [1]. The motivation behind (1) is based on two key insights.

The first insight is a variational characterization of the principal subspace of a symmetric matrix. The sum of the $d$ largest eigenvalues of a symmetric matrix $A$ can be expressed as
$$\sum_{i=1}^{d} \lambda_i(A) \overset{(a)}{=} \max_{V^T V = I_d} \langle A, V V^T \rangle \overset{(b)}{=} \max_{X \in \mathcal{F}^d} \langle A, X \rangle . \tag{2}$$
Identity (a) is an extremal property known as Ky Fan's maximum principle [23]; (b) is based on the less well known observation that $\mathcal{F}^d = \operatorname{conv}(\{V V^T : V^T V = I_d\})$, i.e. the extremal points of $\mathcal{F}^d$ are the rank-$d$ projection matrices. See [24] for proofs of both. Thus, the constraint in (1) is a convex relaxation of an orthogonality constraint.

The second insight is a connection between the $(1, 1)$-norm and a notion of subspace sparsity introduced by [18]. Any $X \succeq 0$ can be factorized (non-uniquely) as $X = V V^T$.

Lemma 2.1. If $X = V V^T$, then $\|X\|_{1,1} \le \|V\|_{2,1}^2 \le \|V\|_{2,0}^2 \operatorname{tr}(X)$.

Consequently, any $X \in \mathcal{F}^d$ that has at most $s$ non-zero rows necessarily has $\|X\|_{1,1} \le s^2 d$. Thus, $\|X\|_{1,1}$ is a convex relaxation of what [18] call row sparsity for subspaces.

These two insights reveal that (1) is a semidefinite relaxation of the non-convex problem
$$\text{maximize } \langle S, V V^T \rangle - \lambda \|V\|_{2,0}^2\, d \quad \text{subject to } V^T V = I_d .$$
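As a quick numerical sanity check of identity (2) and Lemma 2.1, the following NumPy sketch may help. It is an illustration only and not part of the paper's method: the helper names (`top_d_projection`, `random_fantope_point`, `norm_ab`), the dimensions, and the random test matrices are all assumptions made for this example. Fantope points are generated as convex combinations of random rank-$d$ projections, which lie in $\mathcal{F}^d$ by the convex hull characterization above.

```python
import numpy as np

def top_d_projection(A, d):
    """Return (V, V V^T) for the d leading eigenvectors of symmetric A."""
    _, eigvecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :d]               # keep the d leading eigenvectors
    return V, V @ V.T

def random_fantope_point(p, d, rng, n_mix=5):
    """Convex combination of random rank-d projections (hypothetical sampler)."""
    weights = rng.dirichlet(np.ones(n_mix))
    X = np.zeros((p, p))
    for w in weights:
        Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
        X += w * (Q @ Q.T)
    return X

def norm_ab(M, a, b):
    """(a, b)-norm: the l_b norm of the vector of row-wise l_a norms."""
    row_norms = np.linalg.norm(M, ord=a, axis=1)
    if b == 0:
        return float(np.count_nonzero(row_norms > 1e-12))
    return float(np.linalg.norm(row_norms, ord=b))

rng = np.random.default_rng(0)
p, d, s = 8, 2, 3                             # illustrative sizes only

# Identity (2): sum of the d largest eigenvalues vs. <A, V V^T> and <A, X>.
B = rng.standard_normal((p, p))
A = (B + B.T) / 2
top_sum = np.sort(np.linalg.eigvalsh(A))[::-1][:d].sum()
V, P = top_d_projection(A, d)
ky_fan_value = np.sum(A * P)                  # Frobenius inner product <A, P>
fantope_values = [np.sum(A * random_fantope_point(p, d, rng))
                  for _ in range(200)]

# Lemma 2.1 on an orthonormal V with exactly s non-zero rows.
Vs = np.zeros((p, d))
Q, _ = np.linalg.qr(rng.standard_normal((s, d)))
Vs[:s] = Q
Xs = Vs @ Vs.T
lhs = norm_ab(Xs, 1, 1)                       # ||X||_{1,1}
mid = norm_ab(Vs, 2, 1) ** 2                  # ||V||_{2,1}^2
rhs = norm_ab(Vs, 2, 0) ** 2 * np.trace(Xs)   # ||V||_{2,0}^2 tr(X)
```

Under these assumptions, `ky_fan_value` should agree with `top_sum` up to rounding, no sampled Fantope point should exceed it, and the chain `lhs <= mid <= rhs` should hold.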
[18] proposed solving an equivalent form of the above optimization problem and showed that the estimator corresponding to its global solution is minimax rate optimal under a general statistical model for $S$. Their estimator requires solving an NP-hard problem. The advantage of (1) is that it is computationally tractable.

Subspace estimation. The constraint $\hat{X} \in \mathcal{F}^d$ guarantees that its rank is $\ge d$. However $\hat{X}$ need not be an extremal point of $\mathcal{F}^d$, i.e. a rank-$d$ projection matrix. In order to obtain a proper $d$-dimensional subspace estimate, we can extract the $d$ leading eigenvectors of $\hat{X}$, say $\hat{V}$, and form the projection matrix $\hat{\Pi} = \hat{V} \hat{V}^T$. The projection is unique, but the choice of basis is arbitrary. We can follow the convention of standard PCA by choosing an orthogonal matrix $O$ so that $(\hat{V} O)^T S (\hat{V} O)$ is diagonal, and take $\hat{V} O$ as the orthonormal basis for the subspace estimate.

3 Theory

In this section we describe our theoretical framework for studying the statistical properties of $\hat{X}$ given by (1) with arbitrary input matrices that satisfy the following assumptions.

Assumption 1 (Symmetry). $S$ and $\Sigma$ are $p \times p$ symmetric matrices.

Assumption 2 (Identifiability). $\delta = \delta(\Sigma) = \lambda_d(\Sigma) - \lambda_{d+1}(\Sigma) > 0$.

Assumption 3 (Sparsity). The projection $\Pi$ onto the subspace spanned by the eigenvectors of $\Sigma$ corresponding to its $d$ largest eigenvalues satisfies $\|\Pi\|_{2,0} \le s$, or equivalently, $\|\operatorname{diag}(\Pi)\|_0 \le s$.

The key result (Theorem 3.1 below) implies that the statistical properties of the error of the estimator, $\Delta := \hat{X} - \Pi$, can be derived, in many cases, by routine analysis of the entrywise errors of the input matrix, $W := S - \Sigma$.

There are two main ideas in our analysis of $\hat{X}$. The first is relating the difference in the values of the objective function in (1) at $\Pi$ and $\hat{X}$ to $\Delta$. The second is exploiting the decomposability of the regularizer. Conceptually, this is the same approach taken by [25] in analyzing the statistical properties of regularized M-estimators.
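The subspace extraction step from the "Subspace estimation" paragraph of Section 2 (take the $d$ leading eigenvectors of $\hat{X}$, then rotate the basis so that the projected input matrix is diagonal) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: `extract_subspace` is a hypothetical helper name, and the matrix standing in for $\hat{X}$ is a synthetic Fantope point, not an actual solution of (1).

```python
import numpy as np

def extract_subspace(X_hat, S, d):
    """Turn a Fantope point X_hat into a rank-d projection and a PCA-style basis."""
    # d leading eigenvectors of the (symmetrized) matrix X_hat
    _, eigvecs = np.linalg.eigh((X_hat + X_hat.T) / 2)
    V_hat = eigvecs[:, ::-1][:, :d]
    Pi_hat = V_hat @ V_hat.T                  # projection onto the estimated subspace

    # Choose an orthogonal O so that (V_hat O)^T S (V_hat O) is diagonal
    M = V_hat.T @ S @ V_hat
    evals, O = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(evals)[::-1]           # largest projected variance first
    basis = V_hat @ O[:, order]
    return Pi_hat, basis

rng = np.random.default_rng(1)
p, d = 6, 2                                   # illustrative sizes only

# Synthetic stand-in for X_hat: a mixture of two rank-d projections,
# which lies in the Fantope but generally has rank larger than d.
Q1, _ = np.linalg.qr(rng.standard_normal((p, d)))
Q2, _ = np.linalg.qr(rng.standard_normal((p, d)))
X_hat = 0.8 * (Q1 @ Q1.T) + 0.2 * (Q2 @ Q2.T)

# Synthetic symmetric input matrix S (a sample second-moment matrix)
G = rng.standard_normal((20, p))
S = G.T @ G / 20

Pi_hat, basis = extract_subspace(X_hat, S, d)
D = basis.T @ S @ basis                       # diagonal up to rounding
```

The rotation changes only the basis, not the subspace: `basis @ basis.T` reproduces `Pi_hat`, while `D` is diagonal with entries in decreasing order.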
It is worth noting that the curvature result in our problem comes from the geometry of the constraint set in (1). It is different from the restricted strong convexity in [25], a notion of curvature tailored for regularization in the form of penalizing an unconstrained convex objective.

3.1 Variational analysis on the Fantope

The first step of our analysis is to establish a bound on the curvature of the objective function along the Fantope and away from the truth.

Lemma 3.1 (Curvature). Let $A$ be a symmetric matrix and $E$ be the projection onto the subspace spanned by the eigenvectors of $A$ corresponding to its $d$ largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$. If $\delta_A = \lambda_d - \lambda_{d+1} > 0$, then
$$\frac{\delta_A}{2}\, |||E - F|||_2^2 \le \langle A, E - F \rangle$$
for all $F$ satisfying $0 \preceq F \preceq I$ and $\operatorname{tr}(F) = d$.

A version of Lemma 3.1 first appeared in [18] with the additional restriction that $F$ is a projection matrix. Our proof of the above extension is a minor modification of their proof.

The following is an immediate corollary of Lemma 3.1 and the Ky Fan maximal principle.

Corollary 3.1 (A $\sin\Theta$ theorem [18]). Let $A, B$ be symmetric matrices and $\mathcal{M}_A$, $\mathcal{M}_B$ be their respective $d$-dimensional principal subspaces. If $\delta_{A,B} = [\lambda_d(A) - \lambda_{d+1}(A)] \vee [\lambda_d(B) - \lambda_{d+1}(B)]$, then
$$|||\sin\Theta(\mathcal{M}_A, \mathcal{M}_B)|||_2 \le \frac{\sqrt{2}}{\delta_{A,B}}\, |||A - B|||_2 .$$

The advantage of Corollary 3.1 over the Davis-Kahan Theorem [see, e.g., 21, §VII.3] is that it does not require a bound on the differences between eigenvalues of $A$ and eigenvalues of $B$. This means that typical applications of the Davis-Kahan Theorem require the additional invocation of Weyl's Theorem. Our primary use of this result is to show that even if $\operatorname{rank}(\hat{X}) \ne d$, its principal subspace will be close to that of $\Pi$ if $\Delta$ is small.

Corollary 3.2 (Subspace error bound). If $\mathcal{M}$ is the principal $d$-dimensional subspace of $\Sigma$ and $\hat{\mathcal{M}}$ is the principal $d$-dimensional subspace of $\hat{X}$, then
$$|||\sin\Theta(\mathcal{M}, \hat{\mathcal{M}})|||_2 \le \sqrt{2}\, |||\Delta|||_2 .$$

3.2 Deterministic error

With Lemma 3.1, it is straightforward to prove the following theorem.

Theorem 3.1 (Deterministic error bound). If $\lambda \ge \|W\|_{\infty,\infty}$ and $s \ge \|\Pi\|_{2,0}$ then
$$|||\Delta|||_2 \le 4 s \lambda / \delta .$$

Theorem 3.1 holds for any global optimizer $\hat{X}$ of (1). It does not assume that the solution is rank-$d$ as in [11]. The next theorem gives a sufficient condition for support recovery by diagonal thresholding of $\hat{X}$.

Theorem 3.2 (Support recovery). For all $t > 0$
$$\left|\{j : \Pi_{jj} = 0,\ \hat{X}_{jj} \ge t\}\right| + \left|\{j : \Pi_{jj} \ge 2t,\ \hat{X}_{jj} < t\}\right| \le \frac{|||\Delta|||_2^2}{t^2} .$$
As a consequence, the variable selection procedure $\hat{J}(t) := \{ j : \hat{X}_{jj} \ge t \}$ succeeds if $\min_{j : \Pi_{jj} \ne 0} \Pi_{jj} \ge 2t > 2|||\Delta|||_2$.

3.3 Statistical properties

In this section we use Theorem 3.1 to derive the statistical properties of $\hat{X}$ in a generic setting where the entries of $W$ uniformly obey a restricted sub-Gaussian deviation inequality. This is not the most general result possible, but it allows us to illustrate the statistical properties of $\hat{X}$ for two different types of input matrices: sample covariance and Kendall's tau correlation. The former is the standard input for PCA; the latter has recently been shown to be a useful robust and nonparametric tool for high-dimensional graphical models [26].

Theorem 3.3 (General statistical error bound). If there exists $\sigma > 0$ and $n > 0$ such that $\Sigma$ and $S$ satisfy
$$\max_{ij} P\left(|S_{ij} - \Sigma_{ij}| \ge t\right) \le 2 \exp\left(-4 n t^2 / \sigma^2\right) \tag{3}$$
for all $t \le \sigma$ and
$$\lambda = \sigma \sqrt{\log p / n} \le \sigma , \tag{4}$$
then
$$|||\hat{X} - \Pi|||_2 \le \frac{4\sigma}{\delta}\, s \sqrt{\log p / n}$$
with probability at least $1 - 2/p^2$.

Sample covariance. Consider the setting where the input matrix is the sample covariance matrix of a random sample of size $n > 1$ from a sub-Gaussian distribution. A random vector $Y$ with $\Sigma = \operatorname{Var}(Y)$ has sub-Gaussian distribution if there exists a constant $L > 0$ such that
$$P\left(|\langle Y - EY, u \rangle| \ge t\right) \le \exp\left(-L t^2 / \|\Sigma^{1/2} u\|_2^2\right) \tag{5}$$
for all $u$ and $t \ge 0$. Under this condition we have the following corollary of Theorem 3.3.

Corollary 3.3. Let $S$ be the sample covariance matrix of an i.i.d. sample of size $n > 1$ from a sub-Gaussian distribution (5) with population covariance matrix $\Sigma$. If $\lambda$ is chosen to satisfy (4) with $\sigma = c\lambda_1$, then
$$|||\hat{X} - \Pi|||_2 \le C\, \frac{\lambda_1}{\delta}\, s \sqrt{\log p / n}$$
with probability at least $1 - 2/p^2$, where $c, C$ are constants depending only on $L$.

Comparing with the minimax lower bounds derived in [17, 18], we see that the rate in Corollary 3.3 is roughly larger than the optimal minimax rate by a factor of
$$\sqrt{\lambda_1 / \lambda_{d+1}} \cdot \sqrt{s / d} .$$
The first term only becomes important in the near-degenerate case where $\lambda_{d+1} \ll \lambda_1$. It is possible with much more technical work to get sharp dependence on the eigenvalues, but we prefer to retain brevity and clarity in our proof of the version here. The second term is likely to be unimprovable without additional conditions on $S$ and $\Sigma$ such as a spiked covariance model. Very recently, [14] showed in a testing framework with similar assumptions as ours when $d = 1$ that the extra factor $\sqrt{s}$ is necessary for any polynomial time procedure if the planted clique problem cannot be solved in randomized polynomial time.

Kendall's tau. Kendall's tau correlation provides a robust and nonparametric alternative to ordinary (Pearson) correlation. Given an $n \times p$ matrix whose rows are i.i.d. $p$-variate random vectors, the theoretical and empirical versions of Kendall's tau correlation matrix are
$$\tau_{ij} := \operatorname{Cor}\left(\operatorname{sign}(Y_{1i} - Y_{2i}),\ \operatorname{sign}(Y_{1j} - Y_{2j})\right)$$
$$\hat{\tau}_{ij} := \frac{2}{n(n-1)} \sum_{s<t} \operatorname{sign}(Y_{si} - Y_{ti}) \operatorname{sign}(Y_{sj} - Y_{tj}) .$$
A key feature of Kendall's tau is that it is invariant under strictly monotone transformations, i.e.
$$\operatorname{sign}(Y_{si} - Y_{ti}) \operatorname{sign}(Y_{sj} - Y_{tj}) = \operatorname{sign}(f_i(Y_{si}) - f_i(Y_{ti})) \operatorname{sign}(f_j(Y_{sj}) - f_j(Y_{tj})) ,$$
where $f_i, f_j$ are strictly monotone transformations. When $Y$ is multivariate Gaussian, there is also a one-to-one correspondence between $\tau_{ij}$ and $\rho_{ij} = \operatorname{Cor}(Y_{1i}, Y_{1j})$ [27]:
$$\tau_{ij} = \frac{2}{\pi} \arcsin(\rho_{ij}) . \tag{6}$$
These two observations led [26] to propose using
$$\hat{T}_{ij} = \begin{cases} \sin\left(\frac{\pi}{2} \hat{\tau}_{ij}\right) & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases} \tag{7}$$
as an input matrix to Gaussian graphical model estimators in order to extend the applicability of those procedures to the wider class of nonparanormal distributions [28]. This same idea was extended to sparse PCA by [29]; they proposed and analyzed using $\hat{T}$
as an input matrix to the non-convex sparse PCA procedure of [13]. A shortcoming of that approach is that their theoretical guarantees only hold for the global solution of an NP-hard optimization problem. The following corollary of Theorem 3.3 rectifies the situation by showing that X? with Kendalls tau is nearly optimal.Corollary 3.4. Let S = T? as defined in (7) for an i.i.d. sample of size n > 1 and let ? = T be the analogous quantity with ?ij in place of ?ij . If ? is chosen to satisfy (4) with ? = ? 8?, then |||X? ??|||2 ? ? 2? ? s ? log p/n with probablity at least 1? 2/p2.Note that Corollary 3.4 only requires that ? be computed from an i.i.d. sample. It does not specify the marginal distribution of the observations. So ? = T is not necessarily positive semidefinite and may be difficult to interpret. However, under additional conditions, the following lemma gives meaning to T by extending (6) to a wide class of distributions, called transelliptical by [29], that includes the nonparanormal. See [29, 30] for further information.Lemma ([29, 30]). If (Y11, . . . , Y1p) has continuous distribution and there exist monotone transfor- mations f1, . . . , fp such that ( f1(Y11), . . . , fp(Y1p) ) has elliptical distribution with scatter matrix ?, then Tij = ?ij/ ? ?ii?jj .Moreover, if fj(Y1j), j = 1, . . . , p have finite variance, then Tij = Cor ( fi(Y1i), fj(Y1j) ) .This lemma together with Corollary 3.4 shows that Kendalls tau can be used in place of the sample correlation matrix for a wide class of distributions without much loss of efficiency.4 An ADMM algorithm The chief difficulty in directly solving (1) is the interaction between the penalty and the Fantope constraint. Without either of these features, the optimization problemwould be much easier. ADMM can exploit this fact if we first rewrite (1) as the equivalent equality constrained problem minimize ?  1Fd(X)? ?S,X?+ ??Y ?1,1 subject to X ? 
Y = 0 , (8) Algorithm 1 Fantope Projection and Selection (FPS) Require: S = ST , d ? 1, ? ? 0, ? > 0, ? > 0 Y (0) ? 0, U (0) ? 0 ? Initialization repeat t = 0, 1, 2, 3, . . .X(t+1) ? PFd ( Y (t) ? U (t) + S/?) ? Fantope projection Y (t+1) ? S?/? ( X(t+1) + U (t) ) ? Elementwise soft thresholding U (t+1) ? U (t) +X(t+1) ? Y (t+1) ? Dual variable update until max(|||X(t) ? Y (t)|||22 , ?2|||Y (t) ? Y (t?1)|||22) ? d?2 ? Stopping criterion return Y (t) in the variablesX and Y , where 1Fd is the 0-1 indicator function forFd and we adopt the convention?  0 = 0. The augmented Lagrangian associated with (8) has the form L?(X,Y, U) :=?  1Fd(X)? ?S,X?+ ??Y ?1,1 + ? ( |||X ? Y + U |||22 ? |||U |||22 ) , (9) where U = (1/?)Z is the scaled ADMM dual variable and ? is the ADMM penalty parameter [see 19, 3.1]. ADMM consists of iteratively minimizing L? with respect to X , minimizing L? with respect to Y , and then updating the dual variable. Algorithm 1 summarizes the main steps.In light of the separation of X and Y in (9) and some algebraic manipulation, the X and Y updates reduce to computing the proximal operators PFd ( Y ? U + S/?) := argmin X?Fd |||X ? (Y ? U + S/?)|||22 S?/?(X + U) := argmin Y ? ? ?Y ?1,1 + |||(X + U)? Y |||22 .S?/? is the elementwise soft thresholding operator [e.g., 19, 4.4.3] defined as S?/?(x) = sign(x)max(|x|? ?/?, 0) .PFd is the Euclidean projection onto Fd and is given in closed form in the following lemma.Lemma 4.1 (Fantope projection). If X = ? i ?iuiu T i is a spectral decomposition of X , then PFd(X) = ? i ? + i (?)uiu T i , where ? + i (?) = min(max(?i ? ?, 0), 1) and ? satisfies the equation? i ? + i (?) = d.Thus,PFd(X) involves computing an eigendecomposition of Y , and then modifying the eigenvalues by solving a monotone, piecewise linear equation.Rather than fix the ADMM penalty parameter ? in Algorithm 1 at some constant value, we recom- mend using the varying penalty scheme described in [19, 3.4.1] that dynamically updates ? 
after each iteration of the ADMM to keep the primal and dual residual norms (the two sum of squares in the stopping criterion of Algorithm 1) within a constant factor of each other. This eliminates an additional tuning parameter, and in our experience, yields faster convergence.5 Simulation results We conducted a simulation study to compare the effectiveness of FPS against three deflation-based methods: DSPCA (which is just FPS with d = 1), GPower?1 [7], and SPC [5, 6]. These methods obtain multiple component estimates by taking the kth component estimate vk from input matrix Sk, and then re-running the method with the deflated input matrix: Sk+1 = (I ? vkvTk )Sk(I ? vkvTk ).The resulting d-dimensional principal subspace estimate is the span of v1, . . . , vd. Tuning parameter selection can be much more complicated for these iterative deflation methods. In our simulations, we simply fixed the regularization parameter to be the same for all d components.We generated input matrices by sampling n = 100, i.i.d. observations from a Np(0,?), p = 200 distribution and taking S to be the usual sample covariance matrix. We considered two different types of sparse ? = V V T of rank d = 5: those with disjoint support for the nonzero entries of the s:10, noise:1 s:10, noise:10 s:25, noise:1 s:25, noise:10 ?3 ?2 ?1 ?1 support:disjoint support:shared 5 10 20 30 5 10 20 30 5 10 20 30 5 10 20 30 (2,1)?norm of estimate log (M SE ) Figure 1: Mean squared error of FPS ( ), DSPCAwith deflation ( ), GPower?1 ( ), and SPC ( ) across 100 replicates each of a variety of simulation designs with n = 100, p = 200, d = 5, s ? {10, 25}, noise ?2 ? {1, 10}.columns of V and those with shared support. We generated V by sampling its nonzero entries from a standard Gaussian distribution and then orthnormalizing V while retaining the desired sparsity pattern. In both cases, the number of nonzero rows of V is equal to s ? {10, 25}. We then embedded ? inside the population covariance matrix ? = ?? 
+ (I ? ?)?0(I ? ?), where ?0 is a Wishart matrix with p degrees of freedom and ? > 0 is chosen so that the effective noise level (in the optimal minimax rate [18]), ?2 = ? ?1?d+1/(?d ? ?d+1) ? {1, 10}.Figure 1 summarizes the resulting mean squared error |||?? ? ?|||22 across 100 replicates for each of the different combinations of simulation parameters. Each methods regularization parameter varies over a range and the x-axis shows the (2, 1)-norm of the corresponding estimate. At the right extreme, all methods essentially correspond to standard PCA. It is clear that regularization is beneficial, because all the methods have significantly smaller MSE than standard PCA when they are sufficiently sparse. Comparing between methods, we see that FPS dominates in all cases, but the competition is much closer in the disjoint support case. Finally, all methods degrade when the number of active variables or noise level increases.6 Discussion Estimating sparse principal subspaces in high-dimensions poses both computational and statistical challenges. The contribution of this papera novel SDP based estimator, an efficient algorithm, and strong statistical guarantees for a wide array of input matricesis a significant leap forward on both fronts. Yet, there are newly open problems and many possible extensions related to this work.For instance, it would be interesting to investigate the performance of FPS a under weak, rather than exact, sparsity assumption on ? (e.g., ?q , 0 < q ? sparsity). The optimization problem (1) and ADMM algorithm can easily be modified handle other types of penalties. In some cases, extensions of Theorem 3.1 would require minimal modifications to its proof. Finally, the choices of dimension d and regularization parameter ? are of great practical interest. 
Techniques like cross-validation need to be carefully formulated and studied in the context of principal subspace estimation.

Acknowledgments

This research was supported in part by NSF grants DMS-0903120, DMS-1309998, BCS-0941518, and NIH grant MH057881.
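
As an illustration of the two proximal steps used by Algorithm 1 in Section 4, the following is a minimal Python sketch of the elementwise soft thresholding operator and of the eigenvalue adjustment in Lemma 4.1. It is not the authors' implementation: the function names (`soft_threshold`, `capped_sum`, `fantope_eigenvalues`) and the use of bisection to solve the monotone, piecewise linear equation are assumptions made here for illustration. The eigenvalues passed in are assumed to come from any symmetric eigendecomposition routine, and the eigenvectors are left unchanged by the projection.

```python
from typing import List


def soft_threshold(x: float, kappa: float) -> float:
    """Soft thresholding of one entry: sign(x) * max(|x| - kappa, 0)."""
    magnitude = max(abs(x) - kappa, 0.0)
    return magnitude if x >= 0 else -magnitude


def capped_sum(gammas: List[float], theta: float) -> float:
    """Sum of min(max(gamma_i - theta, 0), 1); nonincreasing in theta."""
    return sum(min(max(g - theta, 0.0), 1.0) for g in gammas)


def fantope_eigenvalues(gammas: List[float], d: int, iters: int = 100) -> List[float]:
    """Shift and clip eigenvalues so that they lie in [0, 1] and sum to d.

    Hypothetical helper: finds the shift theta by bisection (an assumption
    of this sketch; an exact breakpoint search would also work).
    """
    if not 1 <= d <= len(gammas):
        raise ValueError("d must satisfy 1 <= d <= len(gammas)")
    # At theta = min - 1 every term is clipped to 1, so the sum is p >= d.
    # At theta = max every term is clipped to 0, so the sum is 0 < d.
    lo, hi = min(gammas) - 1.0, max(gammas)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if capped_sum(gammas, mid) >= d:
            lo = mid  # sum still at least d: the shift can grow
        else:
            hi = mid  # sum below d: the shift is too large
    return [min(max(g - lo, 0.0), 1.0) for g in gammas]


if __name__ == "__main__":
    # Well-separated eigenvalues: the projection keeps the top d directions.
    print(fantope_eigenvalues([3.0, 2.0, 0.5], d=2))
    # Tied eigenvalues: the mass d is shared between the tied directions.
    print(fantope_eigenvalues([0.6, 0.6], d=1))
    print(soft_threshold(-2.0, 0.5))
```

When the gap between the $d$th and $(d+1)$th eigenvalues is at least 1, the adjusted eigenvalues are exactly 0 or 1 and the projection is a rank-$d$ projection matrix; otherwise the adjusted eigenvalues can be fractional, which is why the Fantope projection need not have rank $d$.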
[1] A. d'Aspremont et al. "A direct formulation of sparse PCA using semidefinite programming". In: SIAM Review 49.3 (2007).
[2] I. M. Johnstone and A. Y. Lu. "On consistency and sparsity for principal components analysis in high dimensions". In: JASA 104.486 (2009), pp. 682–693.
[3] I. T. Jolliffe, N. T. Trendafilov, and M. Uddin. "A modified principal component technique based on the Lasso". In: JCGS 12 (2003), pp. 531–547.
[4] H. Zou, T. Hastie, and R. Tibshirani. "Sparse principal component analysis". In: JCGS 15.2 (2006), pp. 265–286.
[5] H. Shen and J. Z. Huang. "Sparse principal component analysis via regularized low rank matrix approximation". In: Journal of Multivariate Analysis 99 (2008), pp. 1015–1034.
[6] D. M. Witten, R. Tibshirani, and T. Hastie. "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis". In: Biostatistics 10 (2009), pp. 515–534.
[7] M. Journée et al. "Generalized power method for sparse principal component analysis". In: JMLR 11 (2010), pp. 517–553.
[8] B. K. Sriperumbudur, D. A. Torres, and G. R. G. Lanckriet. "A majorization-minimization approach to the sparse generalized eigenvalue problem". In: Machine Learning 85.1–2 (2011), pp. 3–39.
[9] Y. Zhang and L. E. Ghaoui. "Large-scale sparse principal component analysis with application to text data". In: NIPS 24. Ed. by J. Shawe-Taylor et al. 2011, pp. 532–539.
[10] X. Yuan and T. Zhang. "Truncated power method for sparse eigenvalue problems". In: JMLR 14 (2013), pp. 899–925.
[11] A. A. Amini and M. J. Wainwright. "High-dimensional analysis of semidefinite relaxations for sparse principal components". In: Ann. Statis. 37.5B (2009), pp. 2877–2921.
[12] A. Birnbaum et al. "Minimax bounds for sparse PCA with noisy high-dimensional data". In: Ann. Statis. 41.3 (2013), pp. 1055–1084.
[13] V. Q. Vu and J. Lei. "Minimax rates of estimation for sparse PCA in high dimensions". In: AISTATS 15. Ed. by N. Lawrence and M. Girolami. Vol. 22. JMLR W&CP. 2012, pp. 1278–1286.
[14] Q. Berthet and P. Rigollet. "Computational lower bounds for sparse PCA". In: (2013). arXiv: 1304.0828.
[15] L. Mackey. "Deflation methods for sparse PCA". In: NIPS 21. Ed. by D. Koller et al. 2009, pp. 1017–1024.
[16] Z. Ma. "Sparse principal component analysis and iterative thresholding". In: Ann. Statis. 41.2 (2013).
[17] T. T. Cai, Z. Ma, and Y. Wu. "Sparse PCA: optimal rates and adaptive estimation". In: Ann. Statis. (2013). To appear. arXiv: 1211.1309.
[18] V. Q. Vu and J. Lei. "Minimax sparse principal subspace estimation in high dimensions". In: Ann. Statis. (2013). To appear. arXiv: 1211.0373.
[19] S. Boyd et al. "Distributed optimization and statistical learning via the alternating direction method of multipliers". In: Foundations and Trends in Machine Learning 3.1 (2010), pp. 1–122.
[20] S. Ma. "Alternating direction method of multipliers for sparse principal component analysis". In: (2011). arXiv: 1111.6703.
[21] R. Bhatia. Matrix analysis. Springer-Verlag, 1997.
[22] J. Dattorro. Convex optimization & Euclidean distance geometry. Meboo Publishing USA, 2005.
[23] K. Fan. "On a theorem of Weyl concerning eigenvalues of linear transformations I". In: Proceedings of the National Academy of Sciences 35.11 (1949), pp. 652–655.
[24] M. Overton and R. Womersley. "On the sum of the largest eigenvalues of a symmetric matrix". In: SIAM Journal on Matrix Analysis and Applications 13.1 (1992), pp. 41–45.
[25] S. N. Negahban et al. "A unified framework for the high-dimensional analysis of M-estimators with decomposable regularizers". In: Statistical Science 27.4 (2012), pp. 538–557.
[26] H. Liu et al. "High-dimensional semiparametric Gaussian copula graphical models". In: Ann. Statis. 40.4 (2012), pp. 2293–2326.
[27] W. H. Kruskal. "Ordinal measures of association". In: JASA 53.284 (1958), pp. 814–861.
[28] H. Liu, J. Lafferty, and L. Wasserman. "The nonparanormal: semiparametric estimation of high dimensional undirected graphs". In: JMLR 10 (2009), pp. 2295–2328.
[29] F. Han and H. Liu. "Transelliptical component analysis". In: NIPS 25. Ed. by P. Bartlett et al. 2012, pp. 368–376.
[30] F. Lindskog, A. McNeil, and U. Schmock. "Kendall's tau for elliptical distributions". In: Credit Risk. Ed. by G. Bol et al. Contributions to Economics. Physica-Verlag HD, 2003, pp. 149–156.
-----1
[1] L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28:594611, 2006.
[2] R. Salakhutdinov, J.B. Tenenbaum, and A. Torralba. One-shot learning with a hierarchical nonparametric Bayesian model. JMLR Workshop and Conference Proceedings Volume 26: Unsupervised and Transfer Learning Workshop, 27:195206, 2012.
[3] M.C. Frank, N.D. Goodman, and J.B. Tenenbaum. A Bayesian framework for cross-situational word- learning. Advances in Neural Information Processing Systems, 20:2029, 2007.
[4] J.B. Tenenbaum, T.L. Griffiths, and C. Kemp. Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10:309318, 2006.
[5] C. Kemp, A. Perfors, and J.B. Tenenbaum. Learning overhypotheses with hierarchical Bayesian models.Developmental Science, 10:307321, 2007.
[6] S. Carey and E. Bartlett. Acquiring a single new word. Proceedings of the Stanford Child Language Conference, 15:1729, 1978.
[7] L.B. Smith, S.S. Jones, B. Landau, L. Gershkoff-Stowe, and L. Samuelson. Object name learning provides on-the-job training for attention. Psychological Science, 13:1319, 2002.
[8] F. Xu and J.B. Tenenbaum. Word learning as Bayesian inference. Psychological Review, 114:245272, 2007.
[9] M. Fink. Object classification from a single example utilizing class relevance metrics. Advances in Neural Information Processing Systems, 17:449456, 2005.
[10] P. Hall, J.S. Marron, and A. Neeman. Geometric representation of high dimension, low sample size data.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67:427444, 2005.
[11] P. Hall, Y. Pittelkow, and M. Ghosh. Theoretical measures of relative performance of classifiers for high dimensional data with small sample sizes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70:159173, 2008.
[12] Y.I. Ingster, C. Pouet, and A.B. Tsybakov. Classification of sparse high-dimensional vectors. Philosoph- ical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367:4427 4448, 2009.
[13] W.F. Massy. Principal components regression in exploratory statistical research. Journal of the American Statistical Association, 60:234256, 1965.
[14] I.M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29:295327, 2001.
[15] D. Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17:16171642, 2007.
[16] B. Nadler. Finite sample approximation results for principal component analysis: A matrix perturbation approach. Annals of Statistics, 36:27912817, 2008.
[17] I.M. Johnstone and A.Y. Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104:682693, 2009.
[18] S. Jung and J.S. Marron. PCA consistency in high dimension, low sample size context. Annals of Statis- tics, 37:41044130, 2009.
[19] S. Lee, F. Zou, and F.A. Wright. Convergence and prediction of principal component scores in high- dimensional settings. Annals of Statistics, 38:36053629, 2010.
[20] Q. Berthet and P. Rigollet. Optimal detection of sparse principal components in high dimension. arXiv preprint arXiv:1202.5070, 2012.
[21] S. Jung, A. Sen, and J.S Marron. Boundary behavior in high dimension, low sample size asymptotics of PCA. Journal of Multivariate Analysis, 109:190203, 2012.
[22] Z. Ma. Sparse principal component analysis and iterative thresholding. Annals of Statistics, 41:772801, 2013.
[23] C. Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 197206, 1955.
[24] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society.Series B (Methodological), 58:267288, 1996.
[25] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition, 2009.
[26] V.N. Vapnik. Statistical Learning Theory. Wiley, 1998.
[27] K.S. Azoury and M.K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43:211246, 2001.
-----1
[1] F. R. Bach and M. I. Jordan. Kernel independent component analysis. JMLR, 3:148, 2002.
[2] L. Breiman and J. H. Friedman. Estimating Optimal Transformations for Multiple Regression and Correlation. Journal of the American Statistical Association, 80(391):580598, 1985.
[3] H. Gebelein. Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. Zeitschrift fur Angewandte Mathematik und Mechanik, 21(6):364379, 1941.
[4] I.I. Gihman and A.V. Skorohod. The Theory of Stochastic Processes, volume 1. Springer, 1974s.
[5] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Scholkopf, and A. Smola. A kernel two-sample test. JMLR, 13:723773, 2012.
[6] A. Gretton, O. Bousquet, A. Smola, and B. Scholkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In Proceedings of the 16th international conference on Algorithmic Learning Theory, pages 6377. Springer-Verlag, 2005.
[7] W. K. Hardle and L. Simar. Applied Multivariate Statistical Analysis. Springer, 2nd edition, 2007.
[8] D. Hardoon and J. Shawe-Taylor. Convergence analysis of kernel canonical correlation analy- sis: theory and practice. Machine Learning, 74(1):2338, 2009.
[9] T. Hastie and R. Tibshirani. Generalized additive models. Statistical Science, 1:297310, 1986.
[10] L. K. Jones. A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. Annals of Statistics, 20(1):608 613, 1992.
[11] Q. Le, T. Sarlos, and A. Smola. Fastfood  Approximating kernel expansions in loglinear time.In ICML, 2013.
[12] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, 1979.
[13] P. Massart. The tight constant in the Dvoretzky-Kiefer-wolfowitz inequality. The Annals of Probability, 18(3), 1990.
[14] R. Nelsen. An Introduction to Copulas. Springer Series in Statistics, 2nd edition, 2006.
[15] B. Poczos, Z. Ghahramani, and J. Schneider. Copula-based kernel dependency measures. In ICML, 2012.
[16] A. Rahimi and B. Recht. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. NIPS, 2008.
[17] A. Renyi. On measures of dependence. Acta Mathematica Academiae Scientiarum Hungari- cae, 10:441451, 1959.
[18] D. N. Reshef, Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. Detecting novel associations in large data sets. Science, 334(6062):15181524, 2011.
[19] B. Scholkopf and A.J. Smola. Learning with Kernels. MIT Press, 2002.
[20] A. Sklar. Fonctions de repartition a` n dimension set leurs marges. Publ. Inst. Statis. Univ.Paris, 8(1):229231, 1959.
[21] L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt. Feature selection via dependence maximization. JMLR, 13:13931434, June 2012.
[22] M.L. Stein. Interpolation of Spatial Data. Springer, 1999.
[23] G. J. Szekely and M. L. Rizzo. Rejoinder: Brownian distance covariance. Annals of Applied Statistics, 3(4):13031308, 2009.
[24] G. J. Szekely, M. L. Rizzo, and N. K. Bakirov. Measuring and testing dependence by correla- tion of distances. Annals of Statistics, 35(6), 2007.
[25] K. Zhang, J. Peters, D. Janzing, and B.Scholkopf. Kernel-based conditional independence test and application in causal discovery. CoRR, abs/1202.3775, 2012.
-----1
[1] A. Ahmed and E. P. Xing. Staying informed: supervised and semi-supervised multi-view topical analysis of ideological pespective. In Proc. EMNLP, pages 11401150, 2010.
[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183202, 2009.
[3] D. Blei and J. McAuliffe. Supervised topic models. In Advances in NIPS, pages 121128.2008.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. JMLR, 3:9931022, 2003.
[5] D. Bohning. Multinomial logistic regression algorithm. Annals of Inst. of Stat. Math., 44:197 200, 1992.
[6] G. Bouchard. Efficient bounds for the softmax function, applications to inference in hybrid models. In Workshop for Approximate Bayesian Inference in Continuous/Hybrid Systems at NIPS07, 2007.
[7] X. Chen, Q. Lin, S. Kim, J. G. Carbonell, and E. P. Xing. Smoothing proximal gradient method for general structured sparse regression. The Annals of Applied Statistics, 6(2):719752, 2012.
[8] X. Ding, L. He, and L. Carin. Bayesian robust principal component analysis. IEEE Trans.Image Processing, 20(12):34193430, 2011.
[9] J. Eckstein. Augmented Lagrangian and alternating direction methods for convex optimization: A tutorial and some illustrative computational results. Technical report, RUTCOR Research Report RRR 32-2012, 2012.
[10] J. Eisenstein, A. Ahmed, and E. P. Xing. Sparse additive generative models of text. In Proc.ICML, 2011.
[11] J. Eisenstein and E. P. Xing. The CMU 2008 political blog corpus. Technical report, Carnegie Mellon University, School of Computer Science, Machine Learning Department, 2010.
[12] M. A. T. Figueiredo. Adaptive sparseness using Jeffreys prior. In Advances in NIPS, pages 679704. 2002.
[13] M. R. Gormley, M. Dredze, B. Van Durme, and J. Eisner. Shared components topic models.In Proc. NAACL-HLT, pages 783792, 2012.
[14] L. Hong, A. Ahmed, S. Gurumurthy, A. J. Smola, and K. Tsioutsiouliklis. Discovering geo- graphical topics in the twitter stream. In Proc. 12th WWW, pages 769778, 2012.
[15] T. Jaakkola and M. I. Jordan. A variational approach to Bayesian logistic regression problems and their extensions. In Proc. AISTATS, 1996.
[16] M. Jaggi and M. Sulovsky`. A simple algorithm for nuclear norm regularized problems. In Proc. ICML, pages 471478, 2010.
[17] Y. Jiang and A. Saxena. Discovering different types of topics: Factored topics models. In Proc.IJCAI, 2013.
[18] A. Joulin, F. Bach, and J. Ponce. Efficient optimization for discriminative latent class models.In Advances in NIPS, pages 10451053. 2010.
[19] J. D. Lafferty and M. D. Blei. Correlated topic models. In Advances in NIPS, pages 147155, 2006.
[20] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, and Y. Ma. Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. Technical report, UIUC Technical Report UILU-ENG-09-2214, August 2009.
[21] Q. Mei, X. Ling, M. Wondra, H. Su, and C. X. Zhai. Topic sentiment mixture: modeling facets and opinions in webblogs. In Proc. WWW, 2007.
[22] T. P. Minka. Estimating a dirichlet distribution. Technical report, Massachusetts Institute of Technology, 2003.
[23] M. Paul and R. Girju. A two-dimensional topic-aspect model for discovering multi-faceted topics. In Proc. AAAI, 2010.
[24] E. Richard, P.-A. Savalle, and N. Vayatis. Estimation of simultaneously sparse and low rank matrices. In Proc. ICML, pages 13511358, 2012.
[25] Y. S. N. A. Smith and D. A. Smith. Discovering factions in the computational linguistics community. In ACL Workshop on Rediscovering 50 Years of Discoveries, 2012.
[26] C. Wang and D. Blei. Decoupling sparsity and smoothness in the discrete hierarchical dirichlet process. In Advances in NIPS, pages 19821989. 2009.
[27] C. Wang and D. M. Blei. Variational inference in nonconjugate models. To appear in JMLR.
[28] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Advances in NIPS, pages 20802088. 2009.
[29] J. Yang and X. Yuan. Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math. Comp., 82:301329, 2013.
[30] J. Zhu, A. Ahmed, and E. P. Xing. MedLDA: maximum margin supervised topic models.JMLR, 13:22372278, 2012.
-----1
[1] Blei, D., Ng, A., Jordan, M.: Latent dirichlet allocation. Journal of machine Learning Research 3 (2003) 9931022 
[2] Reisinger, J., Waters, A., Silverthorn, B., Mooney, R.J.: Spherical topic models. In: ICML 10: Proceed- ings of the 27th international conference on Machine learning. (2010) 
[3] Jojic, N., Perina, A.: Multidimensional counting grids: Inferring word order from disordered bags of words. In: Proceedings of conference on Uncertainty in artificial intelligence (UAI). (2011) 547556 
[4] Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning Journal 42 (2001) 177196 
[5] Blei, D.M., Lafferty, J.D.: Correlated topic models. In: NIPS. (2005) 
[6] Banerjee, A., Basu, S.: Topic models over text streams: a study of batch and online unsupervised learning.In: In Proc. 7th SIAM Intl. Conf. on Data Mining. (2007) 
[7] Jia, Y., Salzmann, M., Darrell, T.: Learning cross-modality similarity for multinomial data. In: Proceed- ings of the 2011 International Conference on Computer Vision. ICCV 11, Washington, DC, USA, IEEE Computer Society (2011) 24072414 
[8] Neal, R.M., Hinton, G.E.: A view of the em algorithm that justifies incremental, sparse, and other variants.Learning in graphical models (1999) 355368 
[9] Asuncion, A., Welling, M., Smyth, P., Teh, Y.W.: On smoothing and inference for topic models. In: In Proceedings of Uncertainty in Artificial Intelligence. (2009) 
[10] Minka, T.P.: Estimating a Dirichlet distribution. Technical report, Microsoft Research (2012) 
[11] Frey, B.J., Jojic, N.: Transformation-invariant clustering using the em algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 117 
[12] Dunson, D.B., Park, J.H.: Kernel stick-breaking processes. Biometrika 95 (2008) 307323 
[13] Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: Free energy score spaces: Using generative information in discriminative classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 34 (2012) 12491262 
[14] Raina, R., Shen, Y., Ng, A.Y., Mccallum, A.: Classification with hybrid generative/discriminative models.In: In Advances in Neural Information Processing Systems 16, MIT Press (2003) 
[15] Jebara, T., Kondor, R., Howard, A.: Probability product kernels. J. Mach. Learn. Res. 5 (2004) 819844 
[16] Bosch, A., Zisserman, A., Munoz, X.: Scene classification using a hybrid generative/discriminative approach. IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 712727 
[17] Bicego, M., Lovato, P., Perina, A., Fasoli, M., Delledonne, M., Pezzotti, M., Polverari, A., Murino, V.: Investigating topic models capabilities in expression microarray data classification. IEEE/ACM Trans.Comput. Biology Bioinform. 9 (2012) 18311836 
[18] Perina, A., Jojic, N.: Image analysis by counting on a grid. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). (2011) 19851992 
[19] Blei, D.M., Jordan, M.I.: Modeling annotated data. In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval. SIGIR 03 (2003) 127134 
[20] Thomas, J., Cook, K.: Illuminating the Path: The Research and Development Agenda for Visual Analyt- ics. IEEE Press (2005) 
[21] Tenenbaum, J.B., de Silva, V., Langford, J.C.: A Global Geometric Framework for Nonlinear Dimen- sionality Reduction. Science 290 (2000) 23192323 
[22] Globerson, A., Chechik, G., Pereira, F., Tishby, N.: Euclidean embedding of co-occurrence data. Journal of Machine Learning Research 8 (2007) 22652295 
[23] Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. SCIENCE 290 (2000) 23232326 
-----1
[1] Daniel D. Lee and H. Sebastian Seung. Algorithms for nonnegative matrix factorization. In Advances in Neural Information Processing Systems (NIPS 13), 2001. 1, 2 
[2] A. Cichocki, R. Zdunek, A.H. Phan, and S. Amari. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley, 2009. 1 
[3] Jong-Hoon Ahn, Seungjin Choi, and Jong-Hoon Oh. A multiplicative up-propagation algorithm. In ICML, 2004. 2
[4] Nicolas Gillis and François Glineur. A multilevel approach for nonnegative matrix factorization. Journal of Computational and Applied Mathematics, 236(7):1708-1723, 2012. 2
[5] A Cichocki and R Zdunek. Multilayer nonnegative matrix factorisation. Electronics Letters, 42(16):947-948, 2006. 2, 7
[6] Patrik O. Hoyer and Peter Dayan. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5:1457-1469, 2004. 1, 2
[7] Madhusudana Shashanka, Bhiksha Raj, and Paris Smaragdis. Sparse overcomplete latent variable decomposition of counts data. In NIPS, 2007. 1, 2
[8] Martin Larsson and Johan Ugander. A concave regularization technique for sparse mixture models. In NIPS, 2011. 1, 2, 5, 8 
[9] Suvrit Sra and Inderjit S Dhillon. Nonnegative matrix approximation: Algorithms and applications. Computer Science Department, University of Texas at Austin, 2006. 2, 7, 8
[10] Jussi Kujala. Sparse topic modeling with concave-convex procedure: EMish algorithm for latent dirichlet allocation. In Technical Report, 2004. 2 
[11] Jagannadan Varadarajan, Remi Emonet, and Jean-Marc Odobez. A sequential topic model for mining recurrent activities from long term video logs. International Journal of Computer Vision, 103(1):100-126, 2013. 2, 6
[12] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2005. 4 
[13] P. Gahinet, P. Apkarian, and M. Chilali. Affine parameter-dependent Lyapunov functions and real parametric uncertainty. IEEE Transactions on Automatic Control, 41(3):436-442, 1996. 4
[14] Wray Buntine and Aleks Jakulin. Discrete component analysis. In Subspace, Latent Structure and Feature Selection Techniques. Springer-Verlag, 2006. 5 
[15] N. Hurley and Scott Rickard. Comparing measures of sparsity. Information Theory, IEEE Transactions on, 55(10):4723-4741, 2009. 6
[16] Misha Denil and Nando de Freitas. Recklessly approximate sparse coding. CoRR, abs/1208.0959, 2012.
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998. 7
[18] Andrzej Cichocki, Rafal Zdunek, and Shun-ichi Amari. Csiszár's divergences for non-negative matrix factorization: Family of new algorithms. In Independent Component Analysis and Blind Signal Separation, pages 32-39. Springer, 2006. 8
[19] Cedric Fevotte, Nancy Bertin, and Jean-Louis Durrieu. Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis. Neural Computation, 21(3):793-830, 2009. 8
[20] Andrzej Cichocki, Rafal Zdunek, Seungjin Choi, Robert Plemmons, and Shun-Ichi Amari. Non-negative tensor factorization using alpha and beta divergences. In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, volume 3, pages III-1393. IEEE, 2007. 8
[21] V. Chvatal. Linear Programming. W. H. Freeman and Company, New York, 1983.
-----1
[1] Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden Markov support vector machines. In Proc. ICML, 2003.
[2] M. Chen, Z. Xu, K.Q. Weinberger, O. Chapelle, and D. Kedem. Classifier cascade for minimizing feature evaluation cost. In AISTATS, 2012.
[3] M. Collins. Discriminative training methods for hidden markov models: theory and experiments with perceptron algorithms. In Proc. EMNLP, 2002.
[4] T. Gao and D. Koller. Active classification based on value of classifier. In NIPS, 2011.
[5] A. Grubb and D. Bagnell. Speedboost: Anytime prediction with uniform near-optimality. In AISTATS, 2012.
[6] H. He, H. Daume III, and J. Eisner. Imitation learning by coaching. In NIPS, 2012.
[7] H. He, H. Daume III, and J. Eisner. Dynamic feature selection for dependency parsing. In EMNLP, 2013.
[8] R. A Howard. Information value theory. Systems Science and Cybernetics, IEEE Transactions on, 2(1):22-26, 1966.
[9] Andreas Krause and Carlos Guestrin. Optimal value of information in graphical models. Journal of Artificial Intelligence Research (JAIR), 35:557-591, 2009.
[10] J.D. Lafferty, A. McCallum, and F.C.N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. ICML, 2001.
[11] M. Lagoudakis and R. Parr. Least-squares policy iteration. JMLR, 2003.
[12] Dennis V Lindley. On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, pages 986-1005, 1956.
[13] C. Liu. Beyond Pixels: Exploring New Representations and Applications for Motion Analysis. PhD thesis, MIT, 2009.
[14] V.C. Raykar, B. Krishnapuram, and S. Yu. Designing efficient cascaded classifiers: tradeoff between accuracy and cost. In SIGKDD, 2010.
[15] B. Sapp and B. Taskar. MODEC: Multimodal decomposable models for human pose estimation. In CVPR, 2013.
[16] B. Sapp, D. Weiss, and B. Taskar. Parsing human motion with stretchable models. In CVPR, 2011.
[17] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, 2007.
[18] B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin. Learning structured prediction models: A large margin approach. In ICML, 2005.
[19] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.
[20] K. Trapeznikov and V. Saligrama. Supervised sequential classification under budget constraints. In AISTATS, 2013.
[21] C. Watkins and P. Dayan. Q-learning. Machine learning, 1992.
[22] D. Weiss, B. Sapp, and B. Taskar. Dynamic structured model selection. In ICCV, 2013.
[23] D. Weiss and B. Taskar. Structured prediction cascades. In AISTATS, 2010.
-----1
[1] Jesse Hoey, Robert St-Aubin, Alan Hu, and Craig Boutilier. SPUDD: Stochastic Planning Using Decision Diagrams. In Proceedings of the Fifteenth conference on Uncertainty in Artificial Intelligence (UAI), 1999.
[2] Robert St-Aubin, Jesse Hoey, and Craig Boutilier. APRICODD: Approximate Policy Construction Using Decision Diagrams. Advances in Neural Information Processing Systems (NIPS), 2001.
[3] Scott Sanner, William Uther, and Karina Valdivia Delgado. Approximate Dynamic Programming with Affine ADDs. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, 2010.
[4] Aswin Raghavan, Saket Joshi, Alan Fern, Prasad Tadepalli, and Roni Khardon. Planning in Factored Action Spaces with Symbolic Dynamic Programming. In Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2012.
[5] Martin L Puterman and Moon Chirl Shin. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. Management Science, 1978.
[6] Craig Boutilier, Richard Dearden, and Moises Goldszmidt. Exploiting Structure in Policy Construction. In International Joint Conference on Artificial Intelligence (IJCAI), 1995.
[7] R Iris Bahar, Erica A Frohm, Charles M Gaona, Gary D Hachtel, Enrico Macii, Abelardo Pardo, and Fabio Somenzi. Algebraic Decision Diagrams and their Applications. In Computer-Aided Design, 1993.
[8] Chenggang Wang and Roni Khardon. Policy Iteration for Relational MDPs. arXiv preprint arXiv:1206.5287, 2012.
[9] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. 1996.
[10] Jason Pazis and Ronald Parr. Generalized Value Functions for Large Action Sets. In Proc. of ICML, 2011.
[11] Scott Sanner. Relational Dynamic Influence Diagram Language (RDDL): Language Description. Unpublished ms. Australian National University, 2010.
[12] Carlos Guestrin, Daphne Koller, and Ronald Parr. Multiagent Planning with Factored MDPs. Advances in Neural Information Processing Systems (NIPS), 2001.
[13] Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, and Matthieu Geist. Approximate Modified Policy Iteration. In ICML, 2012.
-----1
[1] C. Amato, D. S. Bernstein, and S. Zilberstein. Optimizing memory-bounded controllers for decentralized POMDPs. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, pages 1-8, Vancouver, British Columbia, 2007.
[2] C. Amato and S. Zilberstein. Achieving goals in decentralized pomdps. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, pages 593-600, 2009.
[3] J. S. Dibangoye, C. Amato, O. Buffet, F. Charpillet, S. Nicol, T. Iwamura, O. Buffet, I. Chades, M. Tagorti, B. Scherrer, et al. Optimally solving dec-pomdps as continuous-state mdps. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
[4] P. Doshi and P. Gmytrasiewicz. Monte Carlo sampling methods for approximating interactive POMDPs. Journal of Artificial Intelligence Research, 34(1):297-337, 2009.
[5] R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Approximate solutions for partially observable stochastic games with common payoffs. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 1, pages 136-143. IEEE Computer Society, 2004.
[6] R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Game theoretic control for robot teams. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on, pages 1163-1169. IEEE, 2005.
[7] F. Forges. Correlated equilibrium in games with incomplete information revisited. Theory and decision, 61(4):329-344, 2006.
[8] A. Kumar and S. Zilberstein. Anytime planning for decentralized pomdps using expectation maximization. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 2012.
[9] F. A. Oliehoek. Sufficient plan-time statistics for decentralized pomdps. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
[10] F. A. Oliehoek, M. T. Spaan, J. S. Dibangoye, and C. Amato. Heuristic search for identical payoff bayesian games. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, 2010.
[11] F. A. Oliehoek, M. T. Spaan, and N. Vlassis. Optimal and approximate q-value functions for decentralized pomdps. Journal of Artificial Intelligence Research, 32(1):289-353, 2008.
[12] F. A. Oliehoek and N. Vlassis. Q-value functions for decentralized pomdps. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, page 220. ACM, 2007.
[13] F. A. Oliehoek, S. Whiteson, and M. T. Spaan. Lossless clustering of histories in decentralized pomdps. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, pages 577-584, 2009.
[14] J. Pajarinen and J. Peltonen. Periodic finite state controllers for efficient pomdp and dec-pomdp planning. In Proc. of the 25th Annual Conf. on Neural Information Processing Systems, 2011.
[15] C. H. Papadimitriou and J. Tsitsiklis. On the complexity of designing distributed protocols. Information and Control, 53(3):211-218, 1982.
[16] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: an anytime algorithm for pomdps. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI'03, pages 1025-1030, 2003.
[17] M. Roth, R. Simmons, and M. Veloso. Reasoning about joint beliefs for execution-time communication decisions. In Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, pages 786-793. ACM, 2005.
[18] M. T. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for pomdps. Journal of artificial intelligence research, 24(1):195-220, 2005.
[19] D. Szer, F. Charpillet, S. Zilberstein, et al. Maa*: A heuristic search algorithm for solving decentralized pomdps. In 21st Conference on Uncertainty in Artificial Intelligence-UAI2005, 2005.
-----1
[1] Manish Jain, Dmytro Korzhyk, Ondrej Vanek, Vincent Conitzer, Michal Pechoucek, and Milind Tambe. A double oracle algorithm for zero-sum security games. In Tenth International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), pages 327-334, 2011.
[2] Michael Johanson, Nolan Bard, Neil Burch, and Michael Bowling. Finding optimal abstract strategies in extensive-form games. In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12), pages 1371-1379, 2012.
[3] S. M. Ross. Goofspiel - the game of pure strategy. Journal of Applied Probability, 8(3):621-625, 1971.
[4] Glenn C. Rhoads and Laurent Bartholdi. Computer solution to the game of pure strategy. Games, 3(4):150-156, 2012.
[5] Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning (ICML-1994), pages 157-163. Morgan Kaufmann, 1994.
[6] M. Genesereth and N. Love. General game-playing: Overview of the AAAI competition. AI Magazine, 26:62-72, 2005.
[7] Michael Buro. Solving the Oshi-Zumo game. In Proceedings of Advances in Computer Games 10, pages 361-366, 2003.
[8] Abdallah Saffidine, Hilmar Finnsson, and Michael Buro. Alpha-beta pruning for games with simultaneous moves. In Proceedings of the Thirty-Second Conference on Artificial Intelligence (AAAI-12), pages 556-562, 2012.
[9] Branislav Bosansky, Viliam Lisy, Jiri Cermak, Roman Vitek, and Michal Pechoucek. Using double-oracle method and serialized alpha-beta search for pruning in simultaneous moves games. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), pages 48-54, 2013.
[10] H. Finnsson and Y. Bjornsson. Simulation-based approach to general game-playing. In The Twenty-Third AAAI Conference on Artificial Intelligence, pages 259-264. AAAI Press, 2008.
[11] Olivier Teytaud and Sebastien Flory. Upper confidence trees with short term partial information. In Applications of Evolutionary Computation (EvoApplications 2011), Part I, volume 6624 of LNCS, pages 153-162, Berlin, Heidelberg, 2011. Springer-Verlag.
[12] Pierre Perick, David L. St-Pierre, Francis Maes, and Damien Ernst. Comparison of different selection strategies in monte-carlo tree search for the game of Tron. In Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG), pages 242-249, 2012.
[13] Hilmar Finnsson. Simulation-Based General Game Playing. PhD thesis, Reykjavik University, 2012.
[14] L. Kocsis and C. Szepesvari. Bandit-based Monte Carlo planning. In 15th European Conference on Machine Learning, volume 4212 of LNCS, pages 282-293, 2006.
[15] S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127-1150, 2000.
[16] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
[17] M. Shafiei, N. R. Sturtevant, and J. Schaeffer. Comparing UCT versus CFR in simultaneous games. In Proceeding of the IJCAI Workshop on General Game-Playing (GIGA), pages 75-82, 2009.
[18] Kevin Waugh. Abstraction in large extensive games. Master's thesis, University of Alberta, 2009.
[19] A. Blum and Y. Mansour. Learning, regret minimization, and equilibria. In Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V. Vazirani, editors, Algorithmic Game Theory, chapter 4. Cambridge University Press, 2007.
[20] Marc Lanctot, Viliam Lisy, and Mark H.M. Winands. Monte Carlo tree search in simultaneous move games with applications to Goofspiel. In Workshop on Computer Games at IJCAI, 2013.
-----1
[1] K. Alam. A two-sample estimate of the largest mean. Annals of the Institute of Statistical Mathematics, 19(1):271-283, 1967.
[2] M. Babaioff, R.D. Kleinberg, and A. Slivkins. Truthful mechanisms with implicit payment computation. arXiv preprint arXiv:1004.3630, 2010.
[3] S. Blumenthal and A. Cohen. Estimation of the larger of two normal means. Journal of the American Statistical Association, pages 861-876, 1968.
[4] Bhaeiyal Ishwaei D, D. Shabma, and K. Krishnamoorthy. Non-existence of unbiased estimators of ordered parameters. Statistics: A Journal of Theoretical and Applied Statistics, 16(1):89-95, 1985.
[5] N.R. Devanur and S.M. Kakade. The price of truthfulness for pay-per-click auctions. In Proceedings of the tenth ACM conference on Electronic commerce, pages 99-106, 2009.
[6] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized second price auction: Selling billions of dollars worth of keywords. Technical report, National Bureau of Economic Research, 2005.
[7] N. Gatti, A. Lazaric, and F. Trovò. A truthful learning mechanism for contextual multi-slot sponsored search auctions with externalities. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 605-622. ACM, 2012.
[8] R. Gonen and E. Pavlov. An incentive-compatible multi-armed bandit mechanism. In Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing, pages 362-363. ACM, 2007.
[9] S.M. Li, M. Mahdian, and R. McAfee. Value of learning in sponsored search auctions. Internet and Network Economics, pages 294-305, 2010.
[10] Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V Vazirani. Algorithmic game theory. Cambridge University Press, 2007.
[11] Sandeep Pandey and Christopher Olston. Handling advertisements of unknown quality in search advertising. Advances in Neural Information Processing Systems, 19:1065, 2007.
[12] A.D. Sarma, S. Gujar, and Y. Narahari. Multi-armed bandit mechanisms for multi-slot sponsored search auctions. arXiv preprint arXiv:1001.1414, 2010.
[13] J. Wortman, Y. Vorobeychik, L. Li, and J. Langford. Maintaining equilibria during exploration in sponsored search auctions. Internet and Network Economics, pages 119-130, 2007.
-----1
[1] S. Arora, E. Hazan, and S. Kale. The multiplicative weights update method: A meta-algorithm and applications. Theory of Computing, 8(1):121-164, 2012.
[2] P. Auer, N. Cesa-Bianchi, and C. Gentile. Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64(1):48-75, 2002.
[3] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[4] C.-K. Chiang, T. Yang, C.-J. Lee, M. Mahdavi, C.-J. Lu, R. Jin, and S. Zhu. Online optimization with gradual variations. In COLT, 2012.
[5] P. Christiano, J. A Kelner, A. Madry, D. A. Spielman, and S.-H. Teng. Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs. In Proceedings of the 43rd annual ACM symposium on Theory of computing, pages 273-282. ACM, 2011.
[6] C. Daskalakis, A. Deckelbaum, and A. Kim. Near-optimal no-regret algorithms for zero-sum games. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pages 235-254. SIAM, 2011.
[7] Y. Freund and R. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1):79-103, 1999.
[8] A. Goldberg and S. Rao. Beyond the flow decomposition barrier. Journal of the ACM (JACM), 45(5):783-797, 1998.
[9] A. Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229-251, 2004.
[10] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
[11] A. Rakhlin and K. Sridharan. Online learning with predictable sequences. In Proceedings of the 26th Annual Conference on Learning Theory (COLT), 2013.
-----0
Jacob Abernethy and Manfred K. Warmuth. Repeated games against budgeted adversaries. In NIPS, 2010.
Jacob Abernethy, Peter L. Bartlett, Alexander Rakhlin, and Ambuj Tewari. Optimal strategies and minimax lower bounds for online convex games. In COLT, 2008a.
Jacob Abernethy, Manfred K Warmuth, and Joel Yellin. Optimal strategies from random walks. In Proceedings of The 21st Annual Conference on Learning Theory, pages 437-446. Citeseer, 2008b.
Jacob Abernethy, Alekh Agarwal, Peter Bartlett, and Alexander Rakhlin. A stochastic view of optimal regret through minimax duality. In COLT, 2009.
Jacob Abernethy, Rafael M. Frongillo, and Andre Wibisono. Minimax option pricing meets Black-Scholes in the limit. In STOC, 2012.
Amit Agarwal, Elad Hazan, Satyen Kale, and Robert E. Schapire. Algorithms for portfolio management based on the Newton method. In ICML, 2006.
Nicolò Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
A. de Moivre. The Doctrine of Chances: or, A Method of Calculating the Probabilities of Events in Play. 1718.
Ofer Dekel, Ambuj Tewari, and Raman Arora. Online bandit learning against an adaptive adversary: from regret to policy regret. In ICML, 2012.
Peter DeMarzo, Ilan Kremer, and Yishay Mansour. Online trading algorithms and robust option pricing. In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pages 477-486. ACM, 2006.
Persi Diaconis and Sandy Zabell. Closed form summation for classical distributions: Variations on a theme of de Moivre. Statistical Science, 6(3), 1991.
Elad Hazan and Satyen Kale. On stochastic and worst-case models for investing. In NIPS, 2009.
J. L. Kelly Jr. A new interpretation of information rate. Bell System Technical Journal, 1956.
Wouter Koolen, Dmitry Adamskiy, and Manfred Warmuth. Putting bayes to sleep. In NIPS, 2012.
N. Merhav, E. Ordentlich, G. Seroussi, and M. J. Weinberger. On sequential strategies for loss functions with memory. IEEE Trans. Inf. Theor., 48(7), September 2006.
Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. Relax and randomize: From value to algorithms. In NIPS, 2012.
Ralph T. Rockafellar. Convex Analysis (Princeton Landmarks in Mathematics and Physics). Princeton University Press, 1997.
Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 2012.
Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: Primal estimated sub-gradient solver for svm. Mathematical Programming, 127(1):3-30, 2011.
Gilles Stoltz. Contributions to the sequential prediction of arbitrary sequences: applications to the theory of repeated games and empirical studies of the performance of the aggregation of experts. Habilitation à diriger des recherches, Université Paris-Sud, 2011.
Matthew Streeter and H. Brendan McMahan. No-regret algorithms for unconstrained online convex optimization. In NIPS, 2012.
Volodya Vovk. Competitive on-line statistics. International Statistical Review, 69, 2001.
Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.
-----0
Agarwal, A. and Duchi, J. C. (2011). Distributed delayed stochastic optimization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q., editors, NIPS, pages 873-881.
Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48-77.
Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, learning, and games. Cambridge Univ Pr.
Cesa-Bianchi, N., Lugosi, G., and Stoltz, G. (2005). Minimizing regret with label efficient prediction. IEEE Transactions on Information Theory, 51(6):2152-2162.
Cesa-Bianchi, N., Lugosi, G., and Stoltz, G. (2006). Regret minimization under partial monitoring. Math. Oper. Res., 31(3):562-580.
Cesa-Bianchi, N., Shalev-Shwartz, S., and Shamir, O. (2010). Efficient learning with partially observed attributes. CoRR, abs/1004.4421.
Dekel, O., Shamir, O., and Xiao, L. (2010). Learning to classify with missing and corrupted features. Machine Learning, 81(2):149-178.
Joulani, P., Gyorgy, A., and Szepesvari, C. (2013). Online learning under delayed feedback. In 30th International Conference on Machine Learning, Atlanta, GA, USA.
Kapoor, A. and Greiner, R. (2005). Learning and classifying under hard budgets. In European Conference on Machine Learning (ECML), pages 166-173.
Lizotte, D., Madani, O., and Greiner, R. (2003). Budgeted learning of naive-Bayes classifiers. In Conference on Uncertainty in Artificial Intelligence (UAI).
Mannor, S. and Shamir, O. (2011). From bandits to experts: On the value of side-observations. CoRR, abs/1106.2436.
Mesterharm, C. (2005). On-line learning with delayed label feedback. In Proceedings of the 16th international conference on Algorithmic Learning Theory, ALT'05, pages 399-413, Berlin, Heidelberg. Springer-Verlag.
Rostamizadeh, A., Agarwal, A., and Bartlett, P. L. (2011). Learning with missing features. In UAI, pages 635-642.
Settles, B. (2009). Active learning literature survey. Technical report.
Weinberger, M. J. and Ordentlich, E. (2006). On delayed prediction of individual sequences. IEEE Trans. Inf. Theor., 48(7):1959-1976.
-----1
[1] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119-139, 1997.
[2] Marcus Hutter and Jan Poland. Adaptive online prediction by following the perturbed leader. Journal of Machine Learning Research, 6:639-660, 2005.
[3] Jacob Abernethy, Manfred K. Warmuth, and Joel Yellin. When random play is optimal against an adversary. In Rocco A. Servedio and Tong Zhang, editors, COLT, pages 437-446. Omnipress, 2008.
[4] Wouter M. Koolen. Combining Strategies Efficiently: High-quality Decisions from Conflicting Advice. PhD thesis, Institute of Logic, Language and Computation (ILLC), University of Amsterdam, January 2011.
[5] Nicolò Cesa-Bianchi and Ohad Shamir. Efficient online learning via randomized rounding. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 343-351, 2011.
[6] Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427-485, 1997.
[7] Jacob Abernethy, John Langford, and Manfred K Warmuth. Continuous experts and the Binning algorithm. In Learning Theory, pages 544-558. Springer, 2006.
[8] Jacob Abernethy and Manfred K. Warmuth. Repeated games against budgeted adversaries. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1-9, 2010.
[9] Sasha Rakhlin, Ohad Shamir, and Karthik Sridharan. Relax and randomize: From value to algorithms. In P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2150-2158, 2012.
[10] Eyal Even-Dar, Michael Kearns, Yishay Mansour, and Jennifer Wortman. Regret to the best vs. regret to the average. Machine Learning, 72(1-2):21-37, 2008.
[11] Michael Kapralov and Rina Panigrahy. Prediction strategies without loss. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 828-836, 2011.
[12] Kamalika Chaudhuri, Yoav Freund, and Daniel Hsu. A parameter-free hedging algorithm. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 297-305, 2009.
[13] Alexey V. Chernov and Vladimir Vovk. Prediction with advice of unknown number of experts. In Peter Grunwald and Peter Spirtes, editors, UAI, pages 117-125. AUAI Press, 2010.
[14] Nicolò Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[15] Peter Auer, Nicolò Cesa-Bianchi, and Claudio Gentile. Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64(1):48-75, 2002.
[16] Nicolò Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2-3):321-352, 2007.
[17] Elad Hazan and Satyen Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. Machine learning, 80(2-3):165-188, 2010.
[18] Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Proceedings of the 25th Annual Conference on Learning Theory, number 23 in JMLR W&CP, pages 6.1-6.20, June 2012.
[19] Steven de Rooij, Tim van Erven, Peter D. Grunwald, and Wouter M. Koolen. Follow the leader if you can, Hedge if you must. ArXiv, 1301.0534, January 2013.
-----1
[1] J. Abernethy and A. Rakhlin. Beating the adaptive bandit with high probability. In COLT, 2009.
[2] R. Agrawal, M.V. Hegde, and D. Teneketzis. Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Transactions on Automatic Control, 33(10):899-906, 1988.
[3] R. Arora, O. Dekel, and A. Tewari. Online bandit learning against an adaptive adversary: from regret to policy regret. In Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012.
[4] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
[5] A. Borodin and R. El-Yaniv. Online computation and competitive analysis. Cambridge University Press, 1998.
[6] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvari. X-armed bandits. Journal of Machine Learning Research, 12:1655-1695, 2011.
[7] N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA'13), 2013.
[8] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[9] N. Cesa-Bianchi, Y. Mansour, and G. Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2/3):321-352, 2007.
[10] V. Dani and T. P. Hayes. Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2006.
[11] Y. Freund and R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and System Sciences, 55(1):119-139, 1997.
[12] A. Gyorgy and G. Neu. Near-optimal rates for limited-delay universal lossy source coding. In IEEE International Symposium on Information Theory, pages 2218-2222, 2011.
[13] T. Jun. A survey on the bandit problem with switching costs. De Economist, 152:513-541, 2004.
[14] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71:291-307, 2005.
[15] N. Littlestone and M.K. Warmuth. The weighted majority algorithm. Information and Computation, 108:212-261, 1994.
[16] O. Maillard and R. Munos. Adaptive bandits: Towards the best history-dependent strategy. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
[17] H. B. McMahan and A. Blum. Online geometric optimization in the bandit setting against an adaptive adversary. In Proceedings of the Seventeenth Annual Conference on Learning Theory, 2004.
[18] N. Merhav, E. Ordentlich, G. Seroussi, and M.J. Weinberger. Sequential strategies for loss functions with memory. IEEE Transactions on Information Theory, 48(7):1947-1958, 2002.
[19] C. Mesterharm. Online learning with delayed label feedback. In Proceedings of the Sixteenth International Conference on Algorithmic Learning Theory, 2005.
[20] R. Ortner. Online regret bounds for Markov decision processes with deterministic transitions. Theoretical Computer Science, 411(29-30):2684-2695, 2010.
[21] O. Shamir. On the complexity of bandit and derivative-free stochastic convex optimization. CoRR, abs/1209.2388, 2012.
-----1
[1] D. Lizotte, T. Wang, M. Bowling, and D. Schuurmans. Automatic gait optimization with gaussian process regression. In Proc. of IJCAI, pages 944-949, 2007.
[2] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13:281-305, 2012.
[3] Jasper Snoek, Hugo Larochelle, and Ryan Prescott Adams. Practical bayesian optimization of machine learning algorithms. In Neural Information Processing Systems, 2012.
[4] Ker-Chau Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316-327, 1991.
[5] G. Raskutti, M.J. Wainwright, and B. Yu. Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. Information Theory, IEEE Transactions on, 57(10):6976-6994, 2011.
[6] S. Mukherjee, Q. Wu, and D. Zhou. Learning gradients on manifolds. Bernoulli, 16(1):181-207, 2010.
[7] H. Tyagi and V. Cevher. Active learning of multi-index function models. In Advances in Neural Information Processing Systems 25, pages 1475-1483, 2012.
[8] H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527-535, 1952.
[9] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. The Journal of Machine Learning Research, 3:397-422, 2003.
[10] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In STOC, pages 681-690, 2008.
[11] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvari. Online optimization in X-armed bandits. In NIPS, 2008.
[12] N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250-3265, May 2012.
[13] E. Brochu, V.M. Cora, and N. De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
[14] J. Mockus. On bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference Novosibirsk, July 1-7, 1974, pages 400-404. Springer, 1975.
[15] A.D. Bull. Convergence rates of efficient global optimization algorithms. The Journal of Machine Learning Research, 12:2879-2904, 2011.
[16] A. Carpentier and R. Munos. Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. Journal of Machine Learning Research - Proceedings Track, 22:190-198, 2012.
[17] Y. Abbasi-Yadkori, D. Pal, and C. Szepesvari. Online-to-confidence-set conversions and application to sparse stochastic bandits. In Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
[18] B. Chen, R. Castro, and A. Krause. Joint optimization and variable selection of high-dimensional gaussian processes. In Proc. International Conference on Machine Learning (ICML), 2012.
[19] Z. Wang, M. Zoghi, F. Hutter, D. Matheson, and N. de Freitas. Bayesian optimization in high dimensions via random embeddings. In Proc. IJCAI, 2013.
[20] R.A. DeVore and G.G. Lorentz. Constructive approximation, volume 303. Springer Verlag, 1993.
[21] M. Fornasier, K. Schnass, and J. Vybiral. Learning functions of few arbitrary linear parameters in high dimensions. Foundations of Computational Mathematics, pages 1-34, 2012.
[22] B. Scholkopf and A.J. Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press, 2001.
[23] E.J. Candes and Y. Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. Information Theory, IEEE Transactions on, 57(4):2342-2359, 2011.
[24] P. A. Wedin. Perturbation bounds in connection with singular value decomposition. BIT Numerical Mathematics, 12(1):99-111, 1972.
[25] G.W. Stewart and J. Sun. Matrix Perturbation Theory, volume 175. Academic Press New York, 1990.
[26] John G Daugman et al. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Optical Society of America, Journal, A: Optics and Image Science, 2:1160-1169, 1985.
-----1
[1] N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436-1462, 2006.
[2] M. Yuan and Y. Lin. Model selection and estimation in the gaussian graphical model. Biometrika, 94(1):19, 2007.
[3] O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. The Journal of Machine Learning Research, 9:485-516, 2008.
[4] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the lasso. Biostatistics, 9(3):432-441, 2007.
[5] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 38(3):1287-1319, 2010.
[6] A. Jalali, P. Ravikumar, V. Vasuki, and S. Sanghavi. On learning discrete graphical models using group-sparse regularization. In Inter. Conf. on AI and Statistics (AISTATS), 14, 2011.
[7] H. Liu, J. Lafferty, and L. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10:2295-2328, 2009.
[8] A. Dobra and A. Lenkoski. Copula gaussian graphical models and their application to modeling functional disability data. The Annals of Applied Statistics, 5(2A):969-993, 2011.
[9] H. Liu, F. Han, M. Yuan, J. Lafferty, and L. Wasserman. High dimensional semiparametric gaussian copula graphical models. Arxiv preprint arXiv:1202.2169, 2012.
[10] H. Liu, F. Han, M. Yuan, J. Lafferty, and L. Wasserman. The nonparanormal skeptic. Arxiv preprint arXiv:1206.6488, 2012.
[11] S. L. Lauritzen. Graphical models, volume 17. Oxford University Press, USA, 1996.
[12] I. Yahav and G. Shmueli. An elegant method for generating multivariate poisson random variable. Arxiv preprint arXiv:0710.5670, 2007.
[13] A. S. Krishnamoorthy. Multivariate binomial and poisson distributions. Sankhya: The Indian Journal of Statistics (1933-1960), 11(2):117-124, 1951.
[14] P. Holgate. Estimation for the bivariate poisson distribution. Biometrika, 51(1-2):241-287, 1964.
[15] D. Karlis. An em algorithm for multivariate poisson distribution and related models. Journal of Applied Statistics, 30(1):63-77, 2003.
[16] N. A. C. Cressie. Statistics for spatial data. Wiley series in probability and mathematical statistics, 1991.
[17] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2):192-236, 1974.
[18] E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. Graphical models via generalized linear models. In Neur. Info. Proc. Sys., 25, 2012.
[19] E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. On graphical models via univariate exponential family distributions. Arxiv preprint arXiv:1301.4183, 2013.
[20] M. S. Kaiser and N. Cressie. Modeling poisson variables with positive spatial dependence. Statistics & Probability Letters, 35(4):423-432, 1997.
[21] D. A. Griffith. A spatial filtering specification for the auto-poisson model. Statistics & probability letters, 58(3):245-251, 2002.
[22] J. C. Marioni, C. E. Mason, S. M. Mane, M. Stephens, and Y. Gilad. Rna-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome research, 18(9):1509-1517, 2008.
[23] Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumours. Nature, 490(7418):61-70, 2012.
[24] G. I. Allen and Z. Liu. A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. IEEE International Conference on Bioinformatics and Biomedicine, 2012.
[25] H. Liu, K. Roeder, and L. Wasserman. Stability approach to regularization selection (stars) for high dimensional graphical models. Arxiv preprint arXiv:1006.3316, 2010.
[26] L. Ma, F. Reinhardt, E. Pan, J. Soutschek, B. Bhat, E. G. Marcusson, J. Teruya-Feldstein, G. W. Bell, and R. A. Weinberg. Therapeutic silencing of mir-10b inhibits metastasis in a mouse mammary tumor model. Nature biotechnology, 28(4):341-347, 2010.
[27] P. de Souza Rocha Simonini, A. Breiling, N. Gupta, M. Malekpour, M. Youns, R. Omranipour, F. Malekpour, S. Volinia, C. M. Croce, H. Najmabadi, et al. Epigenetically deregulated microrna-375 is involved in a positive feedback loop with estrogen receptor α in breast cancer cells. Cancer research, 70(22):9175-9184, 2010.
-----1
[1] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, December 2008.
[2] J. Cheng, E. Levina, P. Wang, and J. Zhu. Sparse ising models with covariates. Arxiv preprint arXiv:1209.6419, 2012.
[3] S. Ding, G. Wahba, and X. Zhu. Learning Higher-Order Graph Structure with Features by Structure Penalty. In NIPS, 2011.
[4] T. Cai, H. Li, W. Liu, and J. Xie. Covariate adjusted precision matrix estimation with an application in genetical genomics. Biometrika, 2011.
[5] S. Kim and E. P. Xing. Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genetics, 2009.
[6] H. Liu, X. Chen, J. Lafferty, and L. Wasserman. Graph-valued regression. In NIPS, 2010.
[7] J. Yin and H. Li. A sparse conditional gaussian graphical model for analysis of genetical genomics data. Annals of Applied Statistics, 5(4):2630-2650, 2011.
[8] X. Yuan and T. Zhang. Partial gaussian graphical model estimation. Arxiv preprint arXiv:1209.6419, 2012.
[9] E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. Graphical models via generalized linear models. In Neur. Info. Proc. Sys., 25, 2012.
[10] E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. On graphical models via univariate exponential family distributions. Arxiv preprint arXiv:1301.4183, 2013.
[11] A. Jalali, P. Ravikumar, V. Vasuki, and S. Sanghavi. On learning discrete graphical models using group-sparse regularization. In Inter. Conf. on AI and Statistics (AISTATS), 14, 2011.
[12] N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436-1462, 2006.
[13] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 38(3):1287-1319, 2010.
[14] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2):192-236, 1974.
[15] A. Y. Ng and M. I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In Neur. Info. Proc. Sys., 2002.
[16] M. Schmidt, K. Murphy, G. Fung, and R. Rosales. Structure learning in random fields for heart motion abnormality detection. In Computer Vision and Pattern Recognition (CVPR), pages 1-8, 2008.
[17] A. Torralba, K. P. Murphy, and W. T. Freeman. Contextual models for object detection using boosted random fields. In NIPS, 2004.
[18] J. K. Bradley and C. Guestrin. Learning tree conditional random fields. In ICML, 2010.
[19] D. Shahaf, A. Chechetka, and C. Guestrin. Learning thin junction trees via graph cuts. In AISTATS, 2009.
[20] Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216):1061-1068, October 2008.
[21] H. Liu, K. Roeder, and L. Wasserman. Stability approach to regularization selection (stars) for high dimensional graphical models. Arxiv preprint arXiv:1006.3316, 2010.
[22] J. Yang, S. A. Mani, J. L. Donaher, S. Ramaswamy, R. A. Itzykson, C. Come, P. Savagner, I. Gitelman, A. Richardson, and R. A. Weinberg. Twist, a master regulator of morphogenesis, plays an essential role in tumor metastasis. Cell, 117(7):927-939, 2004.
[23] S. A. Mikheeva, A. M. Mikheev, A. Petit, R. Beyer, R. G. Oxford, L. Khorasani, J.-P. Maxwell, C. A. Glackin, H. Wakimoto, I. Gonzalez-Herrero, et al. Twist1 promotes invasion through mesenchymal change in human glioblastoma. Mol Cancer, 9:194, 2010.
[24] M. A. Smit, T. R. Geiger, J.-Y. Song, I. Gitelman, and D. S. Peeper. A twist-snail axis critical for trkb-induced epithelial-mesenchymal transition-like transformation, anoikis resistance, and metastasis. Molecular and cellular biology, 29(13):3722-3737, 2009.
[25] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Information Theory, 55:2183-2202, May 2009.
[26] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Arxiv preprint arXiv:1010.2731, 2010.
-----1
[1] N. Shervashidze, S.V.N. Vishwanathan, T. Petri, K. Mehlhorn, and K.M. Borgwardt. Efficient graphlet kernels for large graph comparison. JMLR, 5:488-495, 2009.
[2] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, and K.M. Borgwardt. Weisfeiler-Lehman graph kernels. JMLR, 12:2539-2561, 2011.
[3] D. Haussler. Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz, 1999.
[4] M. Collins and N. Duffy. Convolution kernels for natural language. In NIPS, pages 625-632, 2001.
[5] S.V.N. Vishwanathan and A.J. Smola. Fast kernels for string and tree matching. In NIPS, pages 569-576, 2002.
[6] C. Cortes, P. Haffner, and M. Mohri. Rational kernels: Theory and algorithms. JMLR, 5:1035-1062, 2004.
[7] D. Kimura and H. Kashima. Fast computation of subpath kernel for trees. In ICML, 2012.
[8] P. Mahe and J.-P. Vert. Graph kernels based on tree patterns for molecules. Machine Learning, 75:3-35, 2009.
[9] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In ICML, pages 321-328, 2003.
[10] T. Gartner, P. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines, volume 2777 of LNCS, pages 129-143, 2003.
[11] S.V.N. Vishwanathan, N.N. Schraudolph, R.I. Kondor, and K.M. Borgwardt. Graph kernels. JMLR, 11:1201-1242, 2010.
[12] F.R. Bach. Graph kernels between point clouds. In ICML, pages 25-32, 2008.
[13] P. Mahe, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert. Extensions of marginalized graph kernels. In ICML, 2004.
[14] N. Kriege and P. Mutzel. Subgraph matching kernels for attributed graphs. In ICML, 2012.
[15] B. Gaüzère, L. Brun, and D. Villemin. Two new graphs kernels in chemoinformatics. Pattern Recognition Letters, 15:2038-2047, 2012.
[16] M. Neumann, N. Patricia, R. Garnett, and K. Kersting. Efficient graph kernels by randomization. In ECML/PKDD (1), pages 378-393, 2012.
[17] K.M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. ICDM, 2005.
[18] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to Algorithms (3. ed.). MIT Press, 2009.
[19] A. Feragen, J. Petersen, D. Grimm, A. Dirksen, J.H. Pedersen, K. Borgwardt, and M. de Bruijne. Geometric tree kernels: Classification of COPD from airway tree geometry. In IPMI 2013, 2013.
[20] N. Shervashidze. Graph kernels code, http://mlcb.is.tuebingen.mpg.de/Mitarbeiter/Nino/Graphkernels/.
[21] D. Gleich. MatlabBGL http://dgleich.github.io/matlab-bgl/.
[22] I. Schomburg, A. Chang, C. Ebeling, M. Gremse, C. Heldt, G. Huhn, and D. Schomburg. Brenda, the enzyme database: updates and major new developments. Nucleic Acids Research, 32:431-433, 2004.
[23] P.D. Dobson and A.J. Doig. Distinguishing enzyme structures from non-enzymes without alignments. Journal of Molecular Biology, 330(4):771-783, 2003.
[24] K.M. Borgwardt, C.S. Ong, S. Schonauer, S.V.N. Vishwanathan, A.J. Smola, and H.-P. Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(suppl 1):i47-i56, 2005.
[25] J. Pedersen, H. Ashraf, A. Dirksen, K. Bach, H. Hansen, P. Toennesen, H. Thorsen, J. Brodersen, B. Skov, M. Døssing, J. Mortensen, K. Richter, P. Clementsen, and N. Seersholm. The Danish randomized lung cancer CT screening trial - overall design and results of the prevalence round. J Thorac Oncol, 4(5):608-614, May 2009.
[26] J. Petersen, M. Nielsen, P. Lo, Z. Saghir, A. Dirksen, and M. de Bruijne. Optimal graph based segmentation using flow lines with application to airway wall segmentation. In IPMI, LNCS, pages 49-60, 2011.
[27] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Trans. Int. Syst. and Tech., 2:27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
-----1
[1] E. Arias-Castro, E.J. Candes, and A. Durand. Detection of an anomalous cluster in a network. The Annals of Statistics, 39(1):278-304, 2011.
[2] L. Addario-Berry, N. Broutin, L. Devroye, and G. Lugosi. On combinatorial testing problems. The Annals of Statistics, 38(5):3063-3092, 2010.
[3] E. Arias-Castro, E.J. Candes, H. Helgason, and O. Zeitouni. Searching for a trail of evidence in a maze. The Annals of Statistics, 36(4):1726-1757, 2008.
[4] V. Cevher, C. Hegde, M.F. Duarte, and R.G. Baraniuk. Sparse signal recovery using markov random fields. Technical report, DTIC Document, 2009.
[5] P. Ravikumar and J.D. Lafferty. Quadratic programming relaxations for metric labeling and markov random field map estimation. 2006.
[6] R.I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 315-322. Citeseer, 2002.
[7] A. Smola and R. Kondor. Kernels and regularization on graphs. Learning theory and kernel machines, pages 144-158, 2003.
[8] Y.I. Ingster. Minimax testing of nonparametric hypotheses on a distribution density in the lp metrics. Theory of Probability and its Applications, 31:333, 1987.
[9] Y.I. Ingster and I.A. Suslina. Nonparametric goodness-of-fit testing under Gaussian models, volume 169. Springer Verlag, 2003.
[10] E. Arias-Castro, D. Donoho, and X. Huo. Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory, 51(7):2402-2425, 2005.
[11] L. Jacob, P. Neuvial, and S. Dudoit. Gains in power from structured two-sample tests of means on graphs. Arxiv preprint arXiv:1009.5173, 2010.
[12] Daniel B Neill and Andrew W Moore. Rapid detection of significant spatial clusters. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 256-265. ACM, 2004.
[13] Deepak Agarwal, Andrew McGregor, Jeff M Phillips, Suresh Venkatasubramanian, and Zhengyuan Zhu. Spatial scan statistics: approximations and performance study. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 24-33. ACM, 2006.
[14] Carey E Priebe, John M Conroy, David J Marchette, and Youngser Park. Scan statistics on enron graphs. Computational & Mathematical Organization Theory, 11(3):229-247, 2005.
[15] Chih-Wei Yi. A unified analytic framework based on minimum scan statistics for wireless ad hoc and sensor networks. Parallel and Distributed Systems, IEEE Transactions on, 20(9):1233-1245, 2009.
[16] J. Sharpnack, A. Rinaldo, and A. Singh. Changepoint detection over graphs with the spectral scan statistic. Arxiv preprint arXiv:1206.0773, 2012.
[17] James Sharpnack, Akshay Krishnamurthy, and Aarti Singh. Detecting activations over graphs using spanning tree wavelet bases. arXiv preprint arXiv:1206.0937, 2012.
[18] Christos H Papadimitriou and Kenneth Steiglitz. Combinatorial optimization: algorithms and complexity. Courier Dover Publications, 1998.
[19] Francis Bach. Convex analysis and optimization with submodular functions: a tutorial. arXiv preprint arXiv:1010.4207, 2010.
[20] Vladimir Kolmogorov and Ramin Zabih. What energy functions can be minimized via graph cuts? Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(2):147-159, 2004.
[21] Petter Strandmark and Fredrik Kahl. Parallel and distributed graph cuts by dual decomposition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2085-2092. IEEE, 2010.
[22] David Sontag, Amir Globerson, and Tommi Jaakkola. Introduction to dual decomposition for inference. Optimization for Machine Learning, 1, 2011.
[23] R. Lyons and Y. Peres. Probability on trees and networks. Book in preparation., 2000.
[24] G. Kirchhoff. Ueber die auflosung der gleichungen, auf welche man bei der untersuchung der linearen vertheilung galvanischer strome gefuhrt wird. Annalen der Physik, 148(12):497-508, 1847.
[25] R.M. Foster. The average impedance of an electrical network. Contributions to Applied Mechanics (Reissner Anniversary Volume), pages 333-340, 1949.
[26] P. Tetali. Random walks and the effective resistance of networks. Journal of Theoretical Probability, 4(1):101-109, 1991.
[27] Ulrike Von Luxburg, Agnes Radl, and Matthias Hein. Hitting and commute times in large graphs are often misleading. ReCALL, 2010.
[28] R Tyrell Rockafellar. Convex analysis, volume 28. Princeton university press, 1997.
[29] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. MIT press, 2001.
[30] Wai Shing Fung and Nicholas JA Harvey. Graph sparsification by edge-connectivity and random spanning trees. arXiv preprint arXiv:1005.0265, 2010.
[31] Michel Ledoux. The concentration of measure phenomenon, volume 89. American Mathematical Soc., 2001.
-----1
[1] M. Alamgir and U. von Luxburg. Phase transition in the family of p-resistances. In NIPS, 2011.
[2] R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, pages 475-486, 2006.
[3] M. Belkin. Problems of Learning on Manifolds. PhD thesis, The University of Chicago, 2003.
[4] M. Belkin, I. Matveeva, and P. Niyogi. Regularization and semi-supervised learning on large graphs. In COLT, pages 624-638, 2004.
[5] M. Belkin, Q. Que, Y. Wang, and X. Zhou. Toward understanding complex spaces: Graph laplacians on manifolds with singularities and boundaries. In COLT, 2012.
[6] O. Bousquet, O. Chapelle, and M. Hein. Measure based regularization. In NIPS, 2003.
[7] F. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
[8] R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5-30, 2006.
[9] P. G. Doyle and J. L. Snell. Random walks and electric networks. Mathematical Association of America, 1984.
[10] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355-369, 2007.
[11] L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE transactions on Computer-aided design of integrated circuits and systems, 11(9):1074-1085, 1992.
[12] D. J. Klein and M. Randic. Resistance distance. Journal of Mathematical Chemistry, 12(1):81-95, 1993.
[13] S. S. Lafon. Diffusion maps and geometric harmonics. PhD thesis, Yale University, 2004.
[14] M. H. G. Lever and M. Herbster. Predicting the labelling of a graph via minimum p-seminorm interpolation. In COLT, 2009.
[15] Q. Mei, D. Zhou, and K. Church. Query suggestion using hitting time. In CIKM, pages 469-478, 2008.
[16] B. Nadler, N. Srebro, and X. Zhou. Statistical analysis of semi-supervised learning: The limit of infinite unlabelled data. In NIPS, pages 1330-1338, 2009.
[17] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. 1999.
[18] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. PAMI, 22(8):888-905, 2000.
[19] M. Szummer and T. Jaakkola. Partially labeled classification with Markov random walks. In NIPS, pages 945-952, 2002.
[20] U. Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395-416, 2007.
[21] U. Von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555-586, 2008.
[22] U. Von Luxburg, A. Radl, and M. Hein. Hitting and commute times in large graphs are often misleading. Arxiv preprint arXiv:1003.1266, 2010.
[23] X.-M. Wu, Z. Li, A. M.-C. So, J. Wright, and S.-F. Chang. Learning with partially absorbing random walks. In NIPS, 2012.
[24] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Scholkopf. Learning with local and global consistency. In NIPS, 2004.
[25] X. Zhou and M. Belkin. Semi-supervised learning by higher order regularization. In AISTATS, 2011.
[26] X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, 2003.
-----1
[1] J. Pearl, A constraint propagation approach to probabilistic reasoning, Proc. Uncertainty in Artificial Intell. (UAI), 1986.
[2] C. Chow and C. Liu, Approximating discrete probability distributions with dependence trees, IEEE Trans. Inform. Theory, vol. 14, no. 3, pp. 462-467, 1968.
[3] M. Choi, V. Chandrasekaran, and A. Willsky, Exploiting sparse Markov and covariance structure in multiresolution models, in Proc. 26th Annu. Int. Conf. on Machine Learning. ACM, 2009, pp. 177-184.
[4] M. Comer and E. Delp, Segmentation of textured images using a multiresolution Gaussian autoregressive model, IEEE Trans. Image Process., vol. 8, no. 3, pp. 408-420, 1999.
[5] C. Bouman and M. Shapiro, A multiscale random field model for Bayesian image segmentation, IEEE Trans. Image Process., vol. 3, no. 2, pp. 162-177, 1994.
[6] D. Karger and N. Srebro, Learning Markov networks: Maximum bounded tree-width graphs, in Proc. 12th Annu. ACM-SIAM Symp. on Discrete Algorithms, 2001, pp. 392-401.
[7] M. Jordan, Graphical models, Statistical Sci., pp. 140-155, 2004.
[8] P. Abbeel, D. Koller, and A. Ng, Learning factor graphs in polynomial time and sample complexity, J. Machine Learning Research, vol. 7, pp. 1743-1788, 2006.
[9] A. Dobra, C. Hans, B. Jones, J. Nevins, G. Yao, and M. West, Sparse graphical models for exploring gene expression data, J. Multivariate Anal., vol. 90, no. 1, pp. 196-212, 2004.
[10] M. Tipping, Sparse Bayesian learning and the relevance vector machine, J. Machine Learning Research, vol. 1, pp. 211-244, 2001.
[11] J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, vol. 9, no. 3, pp. 432-441, 2008.
[12] P. Ravikumar, G. Raskutti, M. Wainwright, and B. Yu, Model selection in Gaussian graphical models: High-dimensional consistency of l1-regularized MLE, Advances in Neural Information Processing Systems (NIPS), vol. 21, 2008.
[13] V. Vazirani, Approximation Algorithms. New York: Springer, 2004.
[14] Y. Liu, V. Chandrasekaran, A. Anandkumar, and A. Willsky, Feedback message passing for inference in Gaussian graphical models, IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4135-4150, 2012.
[15] N. Friedman, D. Geiger, and M. Goldszmidt, Bayesian network classifiers, Machine learning, vol. 29, no. 2, pp. 131-163, 1997.
[16] V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky, Latent variable graphical model selection via convex optimization, in Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on. IEEE, 2010, pp. 1610-1613.
[17] M. Dinneen, K. Cattell, and M. Fellows, Forbidden minors to graphs with small feedback sets, Discrete Mathematics, vol. 230, no. 1, pp. 215-252, 2001.
[18] F. Brandt, Minimal stable sets in tournaments, J. Econ. Theory, vol. 146, no. 4, pp. 1481-1499, 2011.
[19] V. Bafna, P. Berman, and T. Fujito, A 2-approximation algorithm for the undirected feedback vertex set problem, SIAM J. Discrete Mathematics, vol. 12, p. 289, 1999.
[20] S. Kirshner, P. Smyth, and A. W. Robertson, Conditional Chow-Liu tree structures for modeling discrete-valued vector time series, in Proceedings of the 20th conference on Uncertainty in artificial intelligence. AUAI Press, 2004, pp. 317-324.
[21] M. J. Choi, V. Y. Tan, A. Anandkumar, and A. S. Willsky, Learning latent tree graphical models, Journal of Machine Learning Research, vol. 12, pp. 1729-1770, 2011.
-----1
[1] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[2] T. Werner. A linear programming approach to max-sum problem: A review. IEEE Trans. on PAMI, 29(7), July 2007.
[3] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn., 1(1-2):1-305, 2008.
[4] M. Schlesinger. Syntactic analysis of two-dimensional visual signals in the presence of noise. Kibernetika, (4):113-130, 1976.
[5] M. Wainwright, T. Jaakkola, and A. Willsky. MAP estimation via agreement on (hyper)trees: message passing and linear programming approaches. IEEE Trans. on Inf. Th., 51(11), 2005.
[6] N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. IEEE Trans. on PAMI, 33(3):531-552, March 2011.
[7] B. Savchynskyy, J. H. Kappes, S. Schmidt, and C. Schnorr. A study of Nesterov's scheme for Lagrangian decomposition and MAP labeling. In CVPR 2011, 2011.
[8] S. Schmidt, B. Savchynskyy, J. H. Kappes, and C. Schnorr. Evaluation of a first-order primal-dual algorithm for MRF energy minimization. In EMMCVPR, pages 89-103, 2011.
[9] O. Meshi and A. Globerson. An alternating direction method for dual MAP LP relaxation. In ECML/PKDD (2), pages 470-483, 2011.
[10] A. F. T. Martins, M. A. T. Figueiredo, P. M. Q. Aguiar, N. A. Smith, and E. P. Xing. An augmented Lagrangian approach to constrained MAP inference. In ICML, 2011.
[11] B. Savchynskyy, S. Schmidt, J. H. Kappes, and C. Schnorr. Efficient MRF energy minimization via adaptive diminishing smoothing. In UAI-2012, pages 746-755.
[12] D. V. N. Luong, P. Parpas, D. Rueckert, and B. Rustem. Solving MRF minimization by mirror descent. In Advances in Visual Computing, volume 7431, pages 587-598. Springer Berlin Heidelberg, 2012.
[13] J. H. Kappes, B. Savchynskyy, and C. Schnorr. A bundle approach to efficient MAP-inference by Lagrangian relaxation. In CVPR 2012, 2012.
[14] B. Savchynskyy and S. Schmidt. Getting feasible variable estimates from infeasible ones: MRF local polytope study. Technical report, arXiv:1210.4081, 2012.
[15] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnorr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, J. Lellmann, N. Komodakis, and C. Rother. A comparative study of modern inference techniques for discrete energy minimization problems. In CVPR, 2013.
[16] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. on PAMI, 28(10):1568-1583, 2006.
[17] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, 2007.
[18] T. Hazan and A. Shashua. Norm-product belief propagation: Primal-dual message-passing for approximate inference. IEEE Trans. on Inf. Theory, 56(12):6294-6316, 2010.
[19] M. I. Schlesinger and K. V. Antoniuk. Diffusion algorithms and structural recognition optimization problems. Cybernetics and Systems Analysis, 47(2):175-192, 2011.
[20] V. Franc, S. Sonnenburg, and T. Werner. Cutting-Plane Methods in Machine Learning, chapter 7, pages 185-218. The MIT Press, Cambridge, USA, 2012.
[21] J. H. Kappes, M. Speth, B. Andres, G. Reinelt, and C. Schnorr. Globally optimal image partitioning by multicuts. In EMMCVPR, 2011.
[22] M. Bergtholdt, J. H. Kappes, S. Schmidt, and C. Schnorr. A study of parts-based object class detection using complete graphs. IJCV, 87(1-2):93-117, 2010.
[23] M. Sun, M. Telaprolu, H. Lee, and S. Savarese. Efficient and exact MAP-MRF inference using branch and bound. In AISTATS-2012.
[24] L. Otten and R. Dechter. Anytime AND/OR depth-first search for combinatorial optimization. In Proceedings of the Annual Symposium on Combinatorial Search (SOCS), 2011.
[25] M. C. Cooper, S. de Givry, M. Sanchez, T. Schiex, M. Zytnicki, and T. Werner. Soft arc consistency revisited. Artificial Intelligence, 174(7-8):449-478, May 2010.
[26] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. PAMI, 30:1068-1080, June 2008.
[27] ILOG, Inc. ILOG CPLEX: High-performance software for mathematical programming and optimization. See http://www.ilog.com/products/cplex/.
[28] B. Andres, T. Beier, and J. H. Kappes. OpenGM: A C++ library for discrete graphical models. ArXiv e-prints, 2012. Project page: http://hci.iwr.uni-heidelberg.de/opengm2/.
[29] I. Kovtun. Partial optimal labeling search for a NP-hard subclass of (max, +) problems. In Proceedings of the DAGM Symposium, 2003.
[30] J. H. Kappes, M. Speth, G. Reinelt, and C. Schnorr. Towards efficient and exact MAP-inference for large scale discrete computer vision problems via combinatorial optimization. In CVPR, 2013.
[31] S. Chopra and M. R. Rao. On the multiway cut polyhedron. Networks, 21(1):51-89, 1991.
[32] P. Swoboda, B. Savchynskyy, J. H. Kappes, and C. Schnorr. Partial optimality via iterative pruning for the Potts model. In SSVM, 2013.
[33] D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, and T. Jaakkola. Tightening LP relaxations for MAP using message-passing. In UAI, pages 503-510, 2008.
[34] D. Sontag. C++ code for MAP inference in graphical models. See http://cs.nyu.edu/ dsontag/code/mplp_ver2.tgz.
-----1
[1] F. Bacchus, S. Dalmao, and T. Pitassi. Solving #-SAT and Bayesian inference with backtracking search. Journal of Artificial Intelligence Research, 34(2):391, 2009.
[2] Adnan Darwiche. Recursive conditioning. Artif. Intell., 126(1-2):5-41, 2001.
[3] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), pages 1319-1325, 2005.
[4] Rina Dechter. Bucket elimination: A unifying framework for reasoning. Artif. Intell., 113(1-2):41-85, 1999.
[5] Lise Getoor and Ben Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007.
[6] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 256-265, 2011.
[7] Manfred Jaeger and Guy Van den Broeck. Liftability of probabilistic inference: Upper and lower bounds. In Proceedings of the 2nd International Workshop on Statistical Relational AI (StaRAI), 2012.
[8] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted inference seen from the other side: The tractable features. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS), pages 973-981, 2010.
[9] Kristian Kersting, Babak Ahmadi, and Sriraam Natarajan. Counting belief propagation. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), pages 277-284, 2009.
[10] Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted probabilistic inference with counting formulas. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), pages 1062-1068, 2008.
[11] Mathias Niepert. Markov chains on orbits of permutation groups. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), pages 624-633, 2012.
[12] David Poole. First-order probabilistic inference. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), pages 985-991, 2003.
[13] David Poole, Fahiem Bacchus, and Jacek Kisynski. Towards completely lifted search-based probabilistic inference. CoRR, abs/1107.4035, 2011.
[14] David Poole and Nevin Lianwen Zhang. Exploiting contextual independence in probabilistic inference. J. Artif. Intell. Res. (JAIR), 18:263-313, 2003.
[15] Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), pages 1094-1099, 2008.
[16] Nima Taghipour and Jesse Davis. Generalized counting for lifted variable elimination. In Proceedings of the 2nd International Workshop on Statistical Relational AI (StaRAI), 2012.
[17] Nima Taghipour, Daan Fierens, Jesse Davis, and Hendrik Blockeel. Lifted variable elimination with arbitrary constraints. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1194-1202, 2012.
[18] Nima Taghipour, Daan Fierens, Guy Van den Broeck, Jesse Davis, and Hendrik Blockeel. Completeness results for lifted variable elimination. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), 2013.
[19] Guy Van den Broeck. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Proceedings of the 24th Annual Conference on Advances in Neural Information Processing Systems (NIPS), pages 1386-1394, 2011.
[20] Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Lifted relax, compensate and then recover: From approximate to exact lifted probabilistic inference. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), pages 131-141, 2012.
[21] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), pages 2178-2185, 2011.
[22] Deepak Venugopal and Vibhav Gogate. On lifting the Gibbs sampling algorithm. In Proceedings of the 26th Annual Conference on Advances in Neural Information Processing Systems (NIPS), pages 16, 2012.
-----1
[1] Kaufman, L., P. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley, 1990.
[2] Jain, A. K. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651-666, 2010.
[3] Brown, P. F., V. J. D. Pietra, P. V. deSouza, et al. Class-based n-gram models of natural language. Computational Linguistics, 18:184, 1990.
[4] Bergen, J., P. Anandan, K. Hanna, et al. Hierarchical model-based motion estimation. In ECCV. 1992.
[5] Girvan, M., M. E. J. Newman. Community structure in social and biological networks. PNAS, 99:7821-7826, 2002.
[6] Kingman, J. F. C. On the genealogy of large populations. Journal of Applied Probability, 19:27-43, 1982.
[7] Pitman, J. Coalescents with multiple collisions. The Annals of Probability, 27:1870-1902, 1999.
[8] Berestycki, N. Recent progress in coalescent theory. In Ensaios Matematicos, vol. 16. 2009.
[9] Teh, Y. W., H. Daume III, D. M. Roy. Bayesian agglomerative clustering with coalescents. In NIPS. 2008.
[10] Heller, K. A., Z. Ghahramani. Bayesian hierarchical clustering. In ICML. 2005.
[11] Blundell, C., Y. W. Teh, K. A. Heller. Bayesian rose trees. In UAI. 2010.
[12] Adams, R., Z. Ghahramani, M. Jordan. Tree-structured stick breaking for hierarchical data. In NIPS. 2010.
[13] Knowles, D., Z. Ghahramani. Pitman-Yor diffusion trees. In UAI. 2011.
[14] Neal, R. M. Density modeling and clustering using Dirichlet diffusion trees. Bayesian Statistics, 7:619-629, 2003.
[15] Sagitov, S. The general coalescent with asynchronous mergers of ancestral lines. Journal of Applied Probability, 36:1116-1125, 1999.
[16] Neal, R. M. Annealed importance sampling. Technical report 9805, University of Toronto, 1998.
[17] Fearnhead, P. Sequential Monte Carlo method in filter theory. PhD thesis, University of Oxford, 1998.
[18] Felsenstein, J. Maximum-likelihood estimation of evolutionary trees from continuous characters. Am J Hum Genet, 25(5):471-492, 1973.
[19] Birkner, M., J. Blath, M. Steinrucken. Importance sampling for lambda-coalescents in the infinitely many sites model. Theoretical population biology, 79(4):155-73, 2011.
[20] Doucet, A., N. De Freitas, N. Gordon, eds. Sequential Monte Carlo methods in practice. 2001.
[21] Gordon, N., D. Salmond, A. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F, Radar and Signal Processing, 140(2):107-113, 1993.
[22] Gorur, D., L. Boyles, M. Welling. Scalable inference on Kingman's coalescent using pair similarity. JMLR, 22:440-448, 2012.
[23] Antoniak, C. E. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2(6):1152-1174, 1974.
[24] Cappe, O., S. Godsill, E. Moulines. An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE, 95(5):899, 2007.
[25] Chen, Z. Bayesian filtering: From Kalman filters to particle filters, and beyond. McMaster, [Online], 2003.
[26] Eads, D. Hierarchical clustering (scipy.cluster.hierarchy). SciPy, 2007.
[27] Neal, R. M. Slice sampling. Annals of Statistics, 31:705-767, 2003.
[28] Powers, D. M. W. Unsupervised learning of linguistic structure: an empirical evaluation. International Journal of Corpus Linguistics, 2:91-131, 1997.
[29] Jongeneel, C., M. Delorenzi, C. Iseli, et al. An atlas of human gene expression from massively parallel signature sequencing (MPSS). Genome Res, 15:1007-1014, 2005.
[30] Shlens, J. A tutorial on principal component analysis. In Systems Neurobiology Laboratory, Salk Institute for Biological Studies. 2005.
[31] Blei, D. M., A. Ng, M. Jordan. Latent Dirichlet allocation. JMLR, 2003.
[32] Rai, P., H. Daume III. The infinite hierarchical factor regression model. In NIPS. 2008.
[33] Koo, T., X. Carreras, M. Collins. Simple semi-supervised dependency parsing. In ACL. 2008.
[34] Boyd-Graber, J., P. Resnik. Holistic sentiment analysis across languages: Multilingual supervised latent Dirichlet allocation. In EMNLP. 2010.
[35] Andrzejewski, D., X. Zhu, M. Craven. Incorporating domain knowledge into topic modeling via Dirichlet forest priors. In ICML. 2009.
-----1
[1] K. Bache and M. Lichman. UCI machine learning repository, 2013.
[2] D. M. Blei, T. L. Griffiths, M. I. Jordan, and J. B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In NIPS, 2003.
[3] D. M. Blei and M. I. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1:121-144, 2005.
[4] C. A. Bush and S. N. MacEachern. A semiparametric Bayesian model for randomised block designs. Biometrika, 83:275-285, 1973.
[5] D. B. Dahl. An improved merge-split sampler for conjugate Dirichlet process mixture models. Technical report, University of Wisconsin - Madison Dept. of Statistics, 2003.
[6] M. D. Escobar and M. West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90(430):577-588, 1995.
[7] S. Favaro and Y. W. Teh. MCMC for normalized random measure mixture models. Statistical Science, 2013.
[8] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209-230, 1973.
[9] P. J. Green and S. Richardson. Modelling heterogeneity with and without the Dirichlet process. Scandinavian Journal of Statistics, pages 355-375, 2001.
[10] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97-109, 1970.
[11] H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96:161-173, 2001.
[12] H. Ishwaran and M. Zarepour. Exact and approximate sum-representations for the Dirichlet process. Canadian Journal of Statistics, 30:269-283, 2002.
[13] S. Jain and R. Neal. A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13:158-182, 2000.
[14] S. Jain and R. Neal. Splitting and merging components of a nonconjugate Dirichlet process mixture model. Bayesian Analysis, 2(3):445-472, 2007.
[15] K. Kurihara, M. Welling, and Y. W. Teh. Collapsed variational Dirichlet process mixture models. In International Joint Conference on Artificial Intelligence, 2007.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[17] P. Liang, M. I. Jordan, and B. Taskar. A permutation-augmented sampler for DP mixture models. In Proceedings of the 24th international conference on Machine learning, 2007.
[18] D. Lin, E. Grimson, and J. W. Fisher III. Construction of dependent Dirichlet processes based on Poisson processes. In NIPS, 2010.
[19] D. Lovell, R. P. Adams, and V. K. Mansinghka. Parallel Markov chain Monte Carlo for Dirichlet process mixtures. In Workshop on Big Learning, NIPS, 2012.
[20] S. N. MacEachern. Estimating normal means with a conjugate style Dirichlet process prior. In Communications in Statistics: Simulation and Computation, 1994.
[21] S. N. MacEachern and P. Muller. Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics, 7(2):223-238, June 1998.
[22] R. Neal. Bayesian mixture modeling. In Proceedings of the 11th International Workshop on Maximum Entropy and Bayesian Methods of Statistical Analysis, 1992.
[23] R. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249-265, June 2000.
[24] O. Papaspiliopoulos and G. O. Roberts. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika, 95(1):169-186, 2008.
[25] J. Pitman. Combinatorial stochastic processes. Technical report, U.C. Berkeley Dept. of Statistics, 2002.
[26] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, pages 639-650, 1994.
[27] E. B. Sudderth. Graphical Models for Visual Object Recognition and Tracking. PhD thesis, Massachusetts Institute of Technology, 2006.
[28] E. B. Sudderth, A. B. Torralba, W. T. Freeman, and A. S. Willsky. Describing visual scenes using transformed Dirichlet processes. In NIPS, 2006.
[29] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.
[30] M. West, P. Muller, and S. N. MacEachern. Hierarchical priors and mixture models, with application in regression and density estimation. Aspects of Uncertainty, pages 363-386, 1994.
[31] S. A. Williamson, A. Dubey, and E. P. Xing. Parallel Markov chain Monte Carlo for nonparametric mixture models. In ICML, 2013.
[32] E. P. Xing, R. Sharan, and M. I. Jordan. Bayesian haplotype inference via the Dirichlet process. In ICML, 2004.
-----1
[1] McCombs, M. The agenda-setting role of the mass media in the shaping of public opinion. North, 2009(05-12):21, 2002.
[2] McCombs, M., S. Ghanem. The convergence of agenda setting and framing. In Framing public life. 2001.
[3] Baumgartner, F. R., S. L. De Boef, A. E. Boydstun. The decline of the death penalty and the discovery of innocence. Cambridge University Press, 2008.
[4] Blei, D. M., A. Ng, M. Jordan. Latent Dirichlet allocation. JMLR, 3, 2003.
[5] Grimmer, J. A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis, 18(1):1-35, 2010.
[6] Zhang, J. Explore objects and categories in unexplored environments based on multimodal data. Ph.D. thesis, University of Hamburg, 2012.
[7] Blei, D. M., T. L. Griffiths, M. I. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM, 57(2), 2010.
[8] Teh, Y. W., M. I. Jordan, M. J. Beal, et al. Hierarchical Dirichlet processes. JASA, 101(476), 2006.
[9] Paisley, J. W., C. Wang, D. M. Blei, et al. Nested hierarchical Dirichlet processes. arXiv:1210.6738, 2012.
[10] Ahmed, A., L. Hong, A. Smola. The nested Chinese restaurant franchise process: User tracking and document modeling. In ICML. 2013.
[11] Kim, J. H., D. Kim, S. Kim, et al. Modeling topic hierarchies with the recursive Chinese restaurant process. In CIKM, pages 783-792. 2012.
[12] Blei, D. M., J. D. McAuliffe. Supervised topic models. In NIPS. 2007.
[13] Liu, D., J. Nocedal. On the limited memory BFGS method for large scale optimization. Math. Prog., 1989.
[14] Thomas, M., B. Pang, L. Lee. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In EMNLP. 2006.
[15] Lewis, J. B., K. T. Poole. Measuring bias and uncertainty in ideal point estimates via the parametric bootstrap. Political Analysis, 12(2), 2004.
[16] Jindal, N., B. Liu. Opinion spam and analysis. In WSDM. 2008.
[17] Pang, B., L. Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL. 2005.
[18] Joachims, T. Making large-scale SVM learning practical. In Adv. in Kernel Methods - SVM. 1999.
[19] Neal, R. M. Slice sampling. Annals of Statistics, 31:705-767, 2003.
[20] Yu, B., D. Diermeier, S. Kaufmann. Classifying party affiliation from political speech. JITP, 2008.
[21] Chang, J., J. Boyd-Graber, C. Wang, et al. Reading tea leaves: How humans interpret topic models. In NIPS. 2009.
[22] Petinot, Y., K. McKeown, K. Thadani. A hierarchical model of web summaries. In HLT. 2011.
[23] Mao, X., Z. Ming, T.-S. Chua, et al. SSHLDA: A semi-supervised hierarchical topic model. In EMNLP. 2012.
[24] Boyd-Graber, J., P. Resnik. Holistic sentiment analysis across languages: Multilingual supervised latent Dirichlet allocation. In EMNLP. 2010.
[25] Mimno, D. M., A. McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In UAI. 2008.
[26] Perotte, A. J., F. Wood, N. Elhadad, et al. Hierarchically supervised latent Dirichlet allocation. In NIPS. 2011.
[27] Ahmed, A., E. P. Xing. Staying informed: Supervised and semi-supervised multi-view topical analysis of ideological perspective. In EMNLP. 2010.
[28] Eisenstein, J., A. Ahmed, E. P. Xing. Sparse additive generative models of text. In ICML. 2011.
[29] Jo, Y., A. H. Oh. Aspect and sentiment unification model for online review analysis. In WSDM. 2011.
[30] Pang, B., L. Lee. Opinion Mining and Sentiment Analysis. Now Publishers Inc, 2008.
[31] Monroe, B. L., M. P. Colaresi, K. M. Quinn. Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis, 16(4):372-403, 2008.
[32] Greene, S., P. Resnik. More than words: Syntactic packaging and implicit sentiment. In NAACL. 2009.
[33] Yano, T., P. Resnik, N. A. Smith. Shedding (a thousand points of) light on biased language. In NAACL-HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. 2010.
[34] Recasens, M., C. Danescu-Niculescu-Mizil, D. Jurafsky. Linguistic models for analyzing and detecting biased language. In ACL. 2013.
-----1
[1] M. Amini and C. Goutte. A co-classification approach to learning from multilingual corpora. Machine Learning, 79:105-121, 2010.
[2] M. Amini, N. Usunier, and C. Goutte. Learning from multiple partially observed views - an application to multilingual text categorization. In NIPS, 2009.
[3] B. A.R., A. Joshi, and P. Bhattacharyya. Cross-lingual sentiment analysis for Indian languages using linked wordnets. In Proc. of COLING, 2012.
[4] E. Candes and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[5] C. Chang and C. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011.
[6] W. Dai, Y. Chen, G. Xue, Q. Yang, and Y. Yu. Translated learning: Transfer learning across different feature spaces. In NIPS, 2008.
[7] A. Gliozzo. Exploiting comparable corpora and bilingual dictionaries for cross-language text categorization. In Proc. of ICCL-ACL, 2006.
[8] J. Jagarlamudi, R. Udupa, H. Daume III, and A. Bhole. Improving bilingual projections via sparse covariance matrices. In Proc. of EMNLP, 2011.
[9] X. Ling, G. Xue, W. Dai, Y. Jiang, Q. Yang, and Y. Yu. Can Chinese web pages be classified with English data source? In Proc. of WWW, 2008.
[10] M. Littman, S. Dumais, and T. Landauer. Automatic cross-language information retrieval using latent semantic indexing. In Cross-Language Information Retrieval, chapter 5, pages 51-62. Kluwer Academic Publishers, 1998.
[11] S. Ma, D. Goldfarb, and L. Chen. Fixed point and Bregman iterative methods for matrix rank minimization. Mathematical Programming: Series A and B, 128(1-2), 2011.
[12] X. Meng, F. Wei, X. Liu, M. Zhou, G. Xu, and H. Wang. Cross-lingual mixture model for sentiment classification. In Proc. of ACL, 2012.
[13] J. Pan, G. Xue, Y. Yu, and Y. Wang. Cross-lingual sentiment classification via bi-view non-negative matrix tri-factorization. In Proc. of PAKDD, 2011.
[14] P. Petrenz and B. Webber. Label propagation for fine-grained cross-lingual genre classification. In Proc. of the NIPS xLiTe workshop, 2012.
[15] J. Platt, K. Toutanova, and W. Yih. Translingual document representations from discriminative projections. In Proc. of EMNLP, 2010.
[16] P. Prettenhofer and B. Stein. Cross-language text classification using structural correspondence learning. In Proc. of ACL, 2010.
[17] L. Rigutini and M. Maggini. An EM based training algorithm for cross-language text categorization. In Proc. of the Web Intelligence Conference, 2005.
[18] J. Shanahan, G. Grefenstette, Y. Qu, and D. Evans. Mining multilingual opinions through classification and translation. In AAAI Spring Symp. on Explor. Attit. and Affect in Text, 2004.
[19] W. Smet, J. Tang, and M. Moens. Knowledge transfer across multilingual corpora via latent topics. In Proc. of PAKDD, 2011.
[20] A. Vinokourov, J. Shawe-Taylor, and N. Cristianini. Inferring a semantic representation of text via cross-language correlation analysis. In NIPS, 2002.
[21] C. Wan, R. Pan, and J. Li. Bi-weighting domain adaptation for cross-language text classification. In Proc. of IJCAI, 2011.
[22] X. Wan. Co-training for cross-lingual sentiment classification. In Proc. of ACL-IJCNLP, 2009.
[23] K. Wu, X. Wang, and B. Lu. Cross language text categorization using a bilingual lexicon. In Proc. of IJCNLP, 2008.
-----1
[1] Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155, 2003.
[2] Yoshua Bengio and Jean-Sebastien Senecal. Quick training of probabilistic neural nets by importance sampling. In AISTATS'03, 2003.
[3] Yoshua Bengio and Jean-Sebastien Senecal. Adaptive importance sampling to accelerate training of a neural probabilistic language model. IEEE Transactions on Neural Networks, 19(4):713-722, 2008.
[4] R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[5] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2010.
[6] M.U. Gutmann and A. Hyvarinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of Machine Learning Research, 13:307-361, 2012.
[7] Zellig S Harris. Distributional structure. Word, 1954.
[8] Eric H Huang, Richard Socher, Christopher D Manning, and Andrew Y Ng. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 873-882, 2012.
[9] T. Mikolov, M. Karafiat, L. Burget, J. Černocký, and S. Khudanpur. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association, 2010.
[10] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. International Conference on Learning Representations 2013, 2013.
[11] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations. Proceedings of NAACL-HLT, 2013.
[12] A. Mnih and G. Hinton. Three new graphical models for statistical language modelling. Proceedings of the 24th International Conference on Machine Learning, pages 641-648, 2007.
[13] Andriy Mnih and Geoffrey Hinton. A scalable hierarchical distributed language model. In Advances in Neural Information Processing Systems, volume 21, 2009.
[14] Andriy Mnih and Yee Whye Teh. A fast and simple algorithm for training neural probabilistic language models. In Proceedings of the 29th International Conference on Machine Learning, pages 1751-1758, 2012.
[15] Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In AISTATS'05, pages 246-252, 2005.
[16] Magnus Sahlgren. The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. PhD thesis, Stockholm, 2006.
[17] R. Socher, C.C. Lin, A.Y. Ng, and C.D. Manning. Parsing natural scenes and natural language with recursive neural networks. In International Conference on Machine Learning (ICML), 2011.
[18] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384-394, 2010.
[19] G. Zweig and C.J.C. Burges. The Microsoft Research Sentence Completion Challenge. Technical Report MSR-TR-2011-129, Microsoft Research, 2011.
[20] Geoffrey Zweig and Chris J.C. Burges. A challenge set for advancing language modeling. In Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, pages 29-36, 2012.
-----1
[1] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.
[2] L. Bottou and O. Bousquet. The tradeoffs of large-scale learning. Optimization for Machine Learning, page 351, 2011.
[3] W.-Y. Chen, Y.-F. Liao, and S.-H. Chen. Speech recognition with hierarchical recurrent neural networks. Pattern Recognition, 28(6):795-805, 1995.
[4] D. Ciresan, U. Meier, L. Gambardella, and J. Schmidhuber. Deep, big, simple neural nets for handwritten digit recognition. Neural computation, 22(12):3207-3220, 2010.
[5] S. El Hihi and Y. Bengio. Hierarchical recurrent neural networks for long-term dependencies. Advances in Neural Information Processing Systems, 8:493-499, 1996.
[6] S. Fernandez, A. Graves, and J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI 2007, Hyderabad, India, January 2007.
[7] J. Garofolo, National Institute of Standards and Technology (US), Linguistic Data Consortium, Information Science and Technology Office, United States, and Defense Advanced Research Projects Agency. TIMIT Acoustic-phonetic Continuous Speech Corpus. Linguistic Data Consortium, 1993.
[8] A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In ICASSP, 2013.
[9] G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527-1554, 2006.
[10] G. E. Hinton. Reducing the dimensionality of data with neural networks. Science, 313:504-507, 2006.
[11] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735-1780, 1997.
[12] M. Hutter. The human knowledge compression prize, 2006.
[13] H. Jaeger. Long short-term memory in echo state networks: Details of a simulation study. Technical report, Jacobs University, 2012.
[14] M. Mahoney. Adaptive weighing of context models for lossless data compression. Florida Tech., Melbourne, USA, Tech. Rep, 2005.
[15] J. Martens. Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning, pages 735-742, 2010.
[16] J. Martens and I. Sutskever. Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning, volume 46, page 68. Omnipress Madison, WI, 2011.
[17] A. Mohamed, G. Dahl, and G. Hinton. Acoustic modeling using deep belief networks. Audio, Speech, and Language Processing, IEEE Transactions on, 20(1):14-22, 2012.
[18] C. E. Shannon. Prediction and entropy of printed English. Bell system technical journal, 30(1):50-64, 1951.
[19] I. Sutskever, J. Martens, and G. Hinton. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning, pages 1017-1024, 2011.
[20] P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine learning, pages 1096-1103, 2008.
-----1
[1] M Oberlaender, VJ Dercksen, R Egger, M Gensel, B Sakmann, and HC Hege. Automated three-dimensional detection and counting of neuron somata. J Neuroscience Methods, 180:147-160, 2009.
[2] EA Mukamel, A Nimmerjahn, and MJ Schnitzer. Automated analysis of cellular signals from large-scale calcium imaging data. Neuron, 63:747-760, 2009.
[3] I Ozden, HM Lee, MR Sullivan, and SSH Wang. Identification and clustering of event patterns from in vivo multiphoton optical recordings of neuronal ensembles. J Neurophysiol, 100:495-503, 2008.
[4] K Kavukcuoglu, P Sermanet, YL Boureau, K Gregor, M Mathieu, and Y LeCun. Learning convolutional feature hierarchies for visual recognition. Advances in Neural Information Processing, 2010.
[5] K Gregor, A Szlam, and Y LeCun. Structured sparse coding via lateral inhibition. Advances in Neural Information Processing, 2011.
[6] A Szlam, K Kavukcuoglu, and Y LeCun. Convolutional matching pursuit and dictionary training. arXiv, page 1010.0422v1, 2010.
[7] A Hyvarinen, J Hurri, and PO Hoyer. Natural Image Statistics. Springer, 2009.
[8] P Berkes, RE Turner, and M Sahani. A structured model of video produces primary visual cortical organisation. PLoS Computational Biology, 5, 2009.
[9] SG Mallat and Z Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397-3415, 1993.
[10] M Aharon, M Elad, and A Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311-4322, 2006.
-----1
[1] N. Kanwisher, J. McDermott, and M. M. Chun, The fusiform face area: a module in human extrastriate cortex specialized for face perception., J Neurosci, vol. 17, p. 4302, 1997.
[2] R. Poldrack, Can cognitive processes be inferred from neuroimaging data?, Trends in cognitive sciences, vol. 10, p. 59, 2006.
[3] A. Laird, J. Lancaster, and P. Fox, Brainmap, Neuroinformatics, vol. 3, p. 65, 2005.
[4] T. Yarkoni, R. Poldrack, T. Nichols, D. V. Essen, and T. Wager, Large-scale automated synthesis of human functional neuroimaging data, Nature Methods, vol. 8, p. 665, 2011.
[5] R. Poldrack, Y. Halchenko, and S. Hanson, Decoding the large-scale structure of brain function by classifying mental states across individuals, Psychological Science, vol. 20, p. 1364, 2009.
[6] S. Hanson and Y. Halchenko, Brain reading using full brain support vector machines for object recognition: there is no face identification area, Neural Computation, vol. 20, p. 486, 2008.
[7] G. Salimi-Khorshidi, S. M. Smith, J. R. Keltner, T. D. Wager, et al., Meta-analysis of neuroimaging data: a comparison of image-based and coordinate-based pooling of studies, Neuroimage, vol. 45, p. 810, 2009.
[8] C. Pallier, A. Devauchelle, and S. Dehaene, Cortical representation of the constituent structure of sentences, Proc Natl Acad Sci, vol. 108, p. 2522, 2011.
[9] J. Turner and A. Laird, The cognitive paradigm ontology: design and application, Neuroinformatics, vol. 10, p. 57, 2012.
[10] V. Michel, A. Gramfort, G. Varoquaux, E. Eger, C. Keribin, and B. Thirion, A supervised clustering approach for fMRI-based inference of brain states, Pattern Recognition, vol. 45, p. 2041, 2012.
[11] H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function, Journal of statistical planning and inference, vol. 90, p. 227, 2000.
[12] R. Poldrack, D. Barch, J. Mitchell, T. Wager, A. Wagner, J. Devlin, C. Cumba, and M. Milham, Towards open sharing of task-based fMRI data: The openfMRI project (in press), Frontiers in Neuroinformatics.
[13] T. Schonberg, C. Fox, J. Mumford, C. Congdon, C. Trepel, and R. Poldrack, Decreasing ventromedial prefrontal cortex activity during sequential risk-taking: an fMRI investigation of the balloon analog risk task, Frontiers in Neuroscience, vol. 6, 2012.
[14] S. Tom, C. Fox, C. Trepel, and R. Poldrack, The neural basis of loss aversion in decision-making under risk, Science, vol. 315, p. 515, 2007.
[15] A. Aron, M. Gluck, and R. Poldrack, Long-term test-retest reliability of functional MRI in a classification learning task, Neuroimage, vol. 29, p. 1000, 2006.
[16] K. Foerde, B. Knowlton, and R. Poldrack, Modulation of competing memory systems by distraction, Proc Natl Acad Sci, vol. 103, p. 11778, 2006.
[17] R. Poldrack, J. Clark, E. Pare-Blagoev, D. Shohamy, J. Creso Moyano, C. Myers, and M. Gluck, Interactive memory systems in the human brain, Nature, vol. 414, p. 546, 2001.
[18] G. Xue and R. Poldrack, The neural substrates of visual perceptual learning of words: implications for the visual word form area hypothesis, J Cognitive Neurosci, vol. 19, p. 1643, 2007.
[19] L. Vagharchakian, G. Dehaene-Lambertz, C. Pallier, and S. Dehaene, A temporal bottleneck in the language comprehension network, J Neurosci, vol. 32, p. 9089, 2012.
[20] G. Xue, A. Aron, and R. Poldrack, Common neural substrates for inhibition of spoken and manual responses, Cerebral Cortex, vol. 18, p. 1923, 2008.
[21] A. Kelly, L. Q. Uddin, B. B. Biswal, F. Castellanos, and M. Milham, Competition between functional brain networks mediates behavioral variability, Neuroimage, vol. 39, p. 527, 2008.
[22] J. Haxby, I. Gobbini, M. Furey, A. Ishai, J. Schouten, and P. Pietrini, Distributed and overlapping representations of faces and objects in ventral temporal cortex, Science, vol. 293, p. 2425, 2001.
[23] K. Duncan, C. Pattamadilok, I. Knierim, and J. Devlin, Consistency and variability in functional localisers, Neuroimage, vol. 46, p. 1018, 2009.
[24] P. Pinel, B. Thirion, S. Meriaux, A. Jobert, J. Serres, D. L. Bihan, J. B. Poline, and S. Dehaene, Fast reproducible identification and large-scale databasing of individual functional cognitive networks, BMC neuroscience, vol. 8, p. 91, 2007.
[25] P. Pinel and S. Dehaene, Genetic and environmental contributions to brain activation during calculation, NeuroImage, vol. in press, 2013.
[26] A. Knops, B. Thirion, E. M. Hubbard, V. Michel, and S. Dehaene, Recruitment of an area involved in eye movements during mental arithmetic, Science, vol. 324, p. 1583, 2009.
[27] J. Deng, A. Berg, K. Li, and L. Fei-Fei, What does classifying more than 10,000 image categories tell us?, in Computer Vision - ECCV 2010, p. 71, 2010.
[28] W. W. Seeley, V. Menon, A. F. Schatzberg, J. Keller, G. H. Glover, H. Kenna, A. L. Reiss, and M. D. Greicius, Dissociable intrinsic connectivity networks for salience processing and executive control, J Neurosci, vol. 27, p. 2349, 2007.
-----1
[1] R. Bhatia. Positive Definite Matrices. Princeton University Press, 2007.
[2] R. Bhatia and R. L. Karandikar. The matrix geometric mean. Technical Report isid/ms/2-11/02, Indian Statistical Institute, 2011.
[3] D. A. Bini and B. Iannazzo. Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra and its Applications, 438(4):1700-1710, 2013.
[4] G. Blekherman and P. A. Parrilo, editors. Semidefinite Optimization and Convex Algebraic Geometry. SIAM, 2013.
[5] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt: a Matlab toolbox for optimization on manifolds. arXiv Preprint 1308.5200, 2013.
[6] S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi. A Tutorial on Geometric Programming. Optimization and Engineering, 8(1):67-127, 2007.
[7] M. R. Bridson and A. Haefliger. Metric Spaces of Non-Positive Curvature. Springer, 1999.
[8] S. Cambanis, S. Huang, and G. Simons. On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11(3):368-385, 1981.
[9] Y. Chen, A. Wiesel, and A. Hero. Robust shrinkage estimation of high-dimensional covariance matrices. IEEE Transactions on Signal Processing, 59(9):4097-4107, 2011.
[10] G. Cheng and B. Vemuri. A novel dynamic system in the space of SPD matrices with applications to appearance tracking. SIAM Journal on Imaging Sciences, 6(1):592-615, 2013.
[11] G. Cheng, H. Salehian, and B. C. Vemuri. Efficient Recursive Algorithms for Computing the Mean Diffusion Tensor and Applications to DTI Segmentation. In European Conference on Computer Vision (ECCV), volume 7, pages 390-401, 2012.
[12] A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Jensen-Bregman LogDet Divergence for Efficient Similarity Computations on Positive Definite Tensors. IEEE TPAMI, 2012.
[13] A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman and Hall/CRC, 1999.
[14] L. Gurvits and A. Samorodnitsky. A deterministic algorithm for approximating mixed discriminant and mixed volume, and a combinatorial corollary. Disc. Comp. Geom., 27(4), 2002.
[15] K.-T. Fang, S. Kotz, and K. W. Ng. Symmetric multivariate and related distributions. Chapman & Hall, 1990.
[16] J. T. Kent and D. E. Tyler. Redescending M-estimates of multivariate location and scatter. The Annals of Statistics, 19(4):2102-2119, Dec. 1991.
[17] H. Lee and Y. Lim. Invariant metrics, contractions and nonlinear matrix equations. Nonlinearity, 21:857-878, 2008.
[18] B. Lemmens and R. Nussbaum. Nonlinear Perron-Frobenius Theory. Cambridge Univ. Press, 2012.
[19] Y. Lim and M. Palfia. Matrix power means and the Karcher mean. J. Functional Analysis, 262:1498-1514, 2012.
[20] R. J. Muirhead. Aspects of multivariate statistical theory. John-Wiley, 1982.
[21] Y. Nesterov and A. S. Nemirovskii. Interior-point polynomial algorithms in convex programming. SIAM, 1994.
[22] F. Nielsen and R. Bhatia, editors. Matrix Information Geometry. Springer, 2013.
[23] E. Ollila, D. Tyler, V. Koivunen, and H. V. Poor. Complex elliptically symmetric distributions: Survey, new results and applications. IEEE Transactions on Signal Processing, 60(11):5597-5625, 2011.
[24] A. Papadopoulos. Metric spaces, convexity and nonpositive curvature. Europ. Math. Soc., 2005.
[25] T. Rapcsak. Geodesic convexity in nonlinear optimization. J. Optim. Theory and Appl., 69(1):169-183, 1991.
[26] R. T. Rockafellar and R. J.-B. Wets. Variational analysis. Springer, 1998.
[27] S. Sra. Positive Definite Matrices and the Symmetric Stein Divergence. arXiv:1110.1773, Oct. 2012.
[28] S. Sra and R. Hosseini. Conic geometric optimisation on the manifold of positive definite matrices. arXiv preprint, 2013.
[29] A. Wiesel. Geodesic convexity and covariance estimation. IEEE Transactions on Signal Processing, 60(12):6182-89, 2012.
[30] T. Zhang, A. Wiesel, and S. Greco. Multivariate generalized Gaussian distribution: Convexity and graphical models. arXiv preprint arXiv:1304.3206, 60(11):5597-5625, Nov. 2013.
[31] H. Zhu, H. Zhang, J. Ibrahim, and B. Peterson. Statistical analysis of diffusion tensors in diffusion-weighted magnetic resonance imaging data. Journal of the American Statistical Association, 102(480):1085-1102, 2007.
-----1
[1] G. Valiant and P. Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Symposium on Theory of Computing (STOC), 2011.
[2] G. Valiant and P. Valiant. The power of linear estimators. In IEEE Symposium on Foundations of Computer Science (FOCS), 2011.
[3] M. R. Nelson et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science, 337(6090):100-104, 2012.
[4] J. A. Tennessen et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337(6090):64-69, 2012.
[5] A. Keinan and A. G. Clark. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science, 336(6082):740-743, 2012.
[6] F. Olken and D. Rotem. Random sampling from database files: a survey. In Proceedings of the Fifth International Workshop on Statistical and Scientific Data Management, 1990.
[7] P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on random sampling. Journal of Computer and System Sciences, 52(3):550-569, 1996.
[8] R.A. Fisher, A. Corbet, and C.B. Williams. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of the British Ecological Society, 12(1):42-58, 1943.
[9] I. J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40(16):237-264, 1953.
[10] D. A. McAllester and R.E. Schapire. On the convergence rate of Good-Turing estimators. In Conference on Learning Theory (COLT), 2000.
[11] A. Orlitsky, N.P. Santhanam, and J. Zhang. Always Good Turing: Asymptotically optimal probability estimation. Science, 302(5644):427-431, October 2003.
[12] A. Orlitsky, N. Santhanam, K. Viswanathan, and J. Zhang. On modeling profiles instead of values. Uncertainty in Artificial Intelligence, 2004.
[13] J. Acharya, A. Orlitsky, and S. Pan. The maximum likelihood probability of unique-singleton, ternary, and length-7 patterns. In IEEE Symp. on Information Theory, 2009.
[14] J. Acharya, H. Das, A. Orlitsky, and S. Pan. Competitive closeness testing. In COLT, 2011.
[15] L. Paninski. Estimation of entropy and mutual information. Neural Comp., 15(6):1191-1253, 2003.
[16] J. Bunge and M. Fitzpatrick. Estimating the number of species: A review. Journal of the American Statistical Association, 88(421):364-373, 1993.
[17] J. Bunge. Bibliography of references on the problem of estimating support size, available at http://www.stat.cornell.edu/bunge/bibliography.html.
[18] Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Sampling algorithms: lower bounds and applications. In STOC, 2001.
[19] T. Batu. Testing Properties of Distributions. Ph.D. thesis, Cornell, 2001.
[20] M. Charikar, S. Chaudhuri, R. Motwani, and V.R. Narasayya. Towards estimation error guarantees for distinct values. In SODA, 2000.
[21] T. Batu, L. Fortnow, R. Rubinfeld, W.D. Smith, and P. White. Testing that distributions are close. In IEEE Symposium on Foundations of Computer Science (FOCS), 2000.
[22] V.Q. Vu, B. Yu, and R.E. Kass. Coverage-adjusted entropy estimation. Statistics in Medicine, 26(21):4039-4060, 2007.
[23] G. Miller. Note on the bias of information estimates. Information Theory in Psychology II-B, ed. H. Quastler (Glencoe, IL: Free Press), pp. 95-100, 1955.
[24] S. Panzeri and A. Treves. Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7:87-107, 1996.
[25] S. Zahl. Jackknifing an index of diversity. Ecology, 58:907-913, 1977.
[26] B. Efron and C. Stein. The jackknife estimate of variance. Annals of Statistics, 9:586-596, 1981.
[27] A. Chao and T.J. Shen. Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10:429-443, 2003.
[28] D.G. Horvitz and D.J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663-685, 1952.
[29] P. Valiant. Testing Symmetric Properties of Distributions. SIAM J. Comput., 40(6):1927-1968, 2011.
-----1
[1] T. Broderick, B. Kulis, and M. I. Jordan. MAD-Bayes: MAP-based Asymptotic Derivations from Bayes. In ICML, 2013.
[2] F. Doshi-Velez and Z. Ghahramani. Accelerated sampling for the Indian buffet process. In ICML, 2009.
[3] F. Doshi-Velez, K. T. Miller, J. Van Gael, and Y. W. Teh. Variational inference for the Indian buffet process. In AISTATS, 2009.
[4] A. Frank and A. Asuncion. UCI machine learning repository, 2010.
[5] R. Fujimaki and K. Hayashi. Factorized asymptotic Bayesian hidden Markov model. In ICML, 2012.
[6] R. Fujimaki and S. Morinaga. Factorized asymptotic Bayesian inference for mixture modeling. In AISTATS, 2012.
[7] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:643-660, 2001.
[8] Z. Ghahramani. Factorial learning and the EM algorithm. In NIPS, 1995.
[9] Z. Ghahramani, T. L. Griffiths, and P. Sollich. Bayesian nonparametric latent feature models (with discussion). In 8th Valencia International Meeting on Bayesian Statistics, 2006.
[10] T. Griffiths and Z. Ghahramani. Infinite latent feature models and the indian buffet process, 2005.
[11] T. L. Griffiths and Z. Ghahramani. The Indian buffet process: An introduction and review. JMLR, 12:1185-1224, 2011.
[12] U. Hoffmann, G. Garcia, J. M. Vesin, K. Diserens, and T. Ebrahimi. A boosting approach to P300 detection with application to brain-computer interfaces. In International IEEE EMBS Conference on Neural Engineering, pages 97-100, 2005.
[13] J. J. Hull. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550-554, 1994.
[14] M. W. Kadous. Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series. PhD thesis, School of Computer Science & Engineering, University of New South Wales, 2002.
[15] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-63, 1997.
[16] K. Miller, T. Griffiths, and M. Jordan. Nonparametric latent feature models for link prediction. In NIPS, 2009.
[17] K. T. Miller. Bayesian Nonparametric Latent Feature Models. PhD thesis, University of California, Berkeley, 2011.
[18] S. Nakajima, M. Sugiyama, and D. Babacan. On bayesian PCA: Automatic dimensionality selection and analytic solution. In ICML, 2011.
[19] K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In ICML, 2012.
[20] C. Peterson and J. Anderson. A mean field theory learning algorithm for neural networks. Complex Systems, 1:995-1019, 1987.
[21] G. E. Poliner and D. P. W. Ellis. A discriminative model for polyphonic piano transcription. EURASIP Journal of Advances in Signal Processing, 2007(1):154, 2007.
[22] C. Reed and Z. Ghahramani. Scaling the indian buffet process via submodular maximization. In ICML, 2013.
[23] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
[24] M. Tipping and C. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society. Series B, 61(3):611-622, 1999.
[25] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[26] S. Watanabe. Algebraic analysis for nonidentifiable learning machines. Neural Computation, 13(4):899-933, 2001.
[27] S. Watanabe. Algebraic Geometry and Statistical Learning Theory (Cambridge Monographs on Applied and Computational Mathematics). Cambridge University Press, 2009.
[28] R. Wong. Asymptotic Approximation of Integrals (Classics in Applied Mathematics). SIAM, 2001.
[29] A. L. Yuille and A. Rangarajan. The Concave-Convex procedure. Neural Computation, 15(4):915-936, 2003.
[30] R. S. Zemel and G. E. Hinton. Learning population codes by minimizing description length. Neural Computation, 7(3):11-18, 1994.
-----1
[1] R. P. Adams and D. J. C. MacKay. Bayesian online changepoint detection. Technical report, University of Cambridge, Cambridge, UK, 2007. arXiv:0710.3742v1 [stat.ML].
[2] D. M. Chickering. Learning Bayesian networks is NP-complete. In Proceedings of AI and Statistics, 1995.
[3] D. M. Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507-554, 2002.
[4] F. Desobry, M. Davy, and C. Doncarli. An online kernel change detection algorithm. IEEE Transactions on Signal Processing, 8:2961-2974, 2005.
[5] E. Kummerfeld and D. Danks. Model change and methodological virtues in scientific inference. Technical report, Carnegie Mellon University, Pittsburgh, Pennsylvania, 2013.
[6] S. L. Lauritzen. Graphical models. Clarendon Press, 1996.
[7] T. Liptak. On the combination of independent tests. Magyar Tud. Akad. Mat. Kutato Int. Kozl., 3:171-197, 1958.
[8] P. C. Mahalanobis. On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 2:49-55, 1936.
[9] A. McCallum, D. Freitag, and F. C. N. Pereira. Maximum entropy Markov models of information extraction and segmentation. In Proceedings of ICML-2000, pages 591-598, 2000.
[10] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
[11] M.R. Siracusa and J.W. Fisher III. Tractable bayesian inference of time-series dependence structure. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, 2009.
[12] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000.
[13] R. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
[14] M. Talih and N. Hengartner. Structural learning with time-varying components: tracking the cross-section of financial time series. Journal of the Royal Statistical Society - Series B: Statistical Methodology, 67(3):321-341, 2005.
-----0
BANERJEE, O., EL GHAOUI, L. and D'ASPREMONT, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research 9 485-516.
BELLONI, A., CHERNOZHUKOV, V. and WANG, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98 791-806.
CAI, T., LIU, W. and LUO, X. (2011). A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association 106 594-607.
CAMBANIS, S., HUANG, S. and SIMONS, G. (1981). On the theory of elliptically contoured distributions. Journal of Multivariate Analysis 11 368-385.
CATONI, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 48 1148-1185.
DEMPSTER, A. P. (1972). Covariance selection. Biometrics 157-175.
FANG, K.-T., KOTZ, S. and NG, K. W. (1990). Symmetric Multivariate and Related Distributions, Monographs on Statistics and Applied Probability, 36. London: Chapman and Hall Ltd. MR1071174.
FRIEDMAN, J., HASTIE, T. and TIBSHIRANI, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432-441.
GAUTIER, E. and TSYBAKOV, A. B. (2011). High-dimensional instrumental variables regression and confidence sets. Tech. rep., ENSAE ParisTech.
HESS, K. R., ANDERSON, K., SYMMANS, W. F., VALERO, V., IBRAHIM, N., MEJIA, J. A., BOOSER, D., THERIAULT, R. L., BUZDAR, A. U., DEMPSEY, P. J. ET AL. (2006). Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. Journal of Clinical Oncology 24 4236-4244.
KRUSKAL, W. H. (1958). Ordinal measures of association. Journal of the American Statistical Association 53 814-861.
LAURITZEN, S. L. (1996). Graphical models, vol. 17. Oxford University Press.
LIU, H., HAN, F., YUAN, M., LAFFERTY, J. and WASSERMAN, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics 40 2293-2326.
LIU, H. and WANG, L. (2012). Tiger: A tuning-insensitive approach for optimally estimating Gaussian graphical models. Tech. rep., Massachusetts Institute of Technology.
LIU, H., WANG, L. and ZHAO, T. (2013). Multivariate regression with calibration. arXiv preprint arXiv:1305.2238.
STOER, J., BULIRSCH, R., BARTELS, R., GAUTSCHI, W. and WITZGALL, C. (1993). Introduction to numerical analysis, vol. 2. Springer New York.
YUAN, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. The Journal of Machine Learning Research 11 2261-2286.
YUAN, M. and LIN, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19-35.
ZHAO, T., LIU, H., ROEDER, K., LAFFERTY, J. and WASSERMAN, L. (2012). The huge package for high-dimensional undirected graph estimation in R. The Journal of Machine Learning Research 9 1059-1062.
ZHAO, T., ROEDER, K. and LIU, H. (2013). Positive semidefinite rank-based correlation matrix estimation with application to semiparametric graph estimation. Journal of Computational and Graphical Statistics To appear.
-----1
[1] David Maxwell Chickering. Learning Bayesian networks is NP-complete. In Learning from data, pages 121-130. Springer, 1996.
[2] Nir Friedman, Iftach Nachman, and Dana Pe'er. Learning Bayesian network structure from massive datasets: the Sparse Candidate algorithm. In Proceedings of the Fifteenth conference on Uncertainty in Artificial Intelligence, pages 206-215. Morgan Kaufmann Publishers Inc., 1999.
[3] Wenjiang J Fu. Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3):397-416, 1998.
[4] David Heckerman, Dan Geiger, and David M Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine learning, 20(3):197-243, 1995.
[5] Shuai Huang, Jing Li, Jieping Ye, Adam Fleisher, Kewei Chen, Teresa Wu, and Eric Reiman. A sparse structure learning algorithm for Gaussian Bayesian network identification from high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6):1328-1342, 2013.
[6] Tommi Jaakkola, David Sontag, Amir Globerson, and Marina Meila. Learning Bayesian network structure using LP relaxations. In Proceedings of the Thirteenth International Conference on Artificial intelligence and Statistics (AISTATS), 2010.
[7] Mikko Koivisto and Kismat Sood. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research, 5:549-573, 2004.
[8] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009.
[9] Wai Lam and Fahiem Bacchus. Learning Bayesian belief networks: An approach based on the MDL principle. Computational intelligence, 10(3):269-293, 1994.
[10] Maxim Likhachev, Geoff Gordon, and Sebastian Thrun. ARA*: Anytime A* with provable bounds on sub-optimality. Advances in Neural Information Processing Systems (NIPS), 16, 2003.
[11] Jean-Philippe Pellet and Andre Elisseeff. Using Markov blankets for causal structure learning. The Journal of Machine Learning Research, 9:1295-1342, 2008.
[12] Stuart Jonathan Russell, Peter Norvig, John F Canny, Jitendra M Malik, and Douglas D Edwards. Artificial intelligence: a modern approach, volume 74. Prentice Hall, Englewood Cliffs, 1995.
[13] Mark Schmidt, Alexandru Niculescu-Mizil, and Kevin Murphy. Learning graphical model structure using L1-regularization paths. In Proceedings of the National Conference on Artificial Intelligence, volume 22, page 1278, 2007.
[14] Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
[15] Ajit Singh and Andrew Moore. Finding optimal Bayesian networks by dynamic programming. Technical Report 05-106, School of Computer Science, Carnegie Mellon University, 2005.
[16] Marc Teyssier and Daphne Koller. Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In Proceedings of the Twentieth conference on Uncertainty in Artificial Intelligence, pages 584-590, 2005.
[17] Ioannis Tsamardinos, Laura E Brown, and Constantin F Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31-78, 2006.
[18] Ioannis Tsamardinos, Alexander Statnikov, Laura E Brown, and Constantin F Aliferis. Generating realistic large Bayesian networks by tiling. In the Nineteenth International FLAIRS conference, pages 592-597, 2006.
[19] Changhe Yuan, Brandon Malone, and Xiaojian Wu. Learning optimal Bayesian networks using A* search. In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence, pages 2186-2191. AAAI Press, 2011.
-----1
[1] F. Bach. Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res., 9:1179-1225, 2008.
[2] P.J. Bickel, Y. Ritov, and A.B. Tsybakov. Simultaneous analysis of lasso and Dantzig selector. Ann. Statis., 37(4):1705-1732, 2009.
[3] P. Buhlmann and S. van de Geer. Statistics for high-dimensional data: Methods, theory and applications. 2011.
[4] F. Bunea. Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1+ℓ2 penalization. Electron. J. Stat., 2:1153-1194, 2008.
[5] E. Candès and B. Recht. Simple bounds for recovering low-complexity models. Math. Prog. Ser. A, pages 1-13, 2012.
[6] J. Guo, E. Levina, G. Michailidis, and J. Zhu. Asymptotic properties of the joint neighborhood selection method for estimating categorical markov networks. arXiv preprint.
[7] L. Jacob, G. Obozinski, and J. Vert. Group lasso with overlap and graph lasso. In Int. Conf. Mach. Learn. (ICML), pages 433-440. ACM, 2009.
[8] A. Jalali, P. Ravikumar, V. Vasuki, and S. Sanghavi. On learning discrete graphical models using group-sparse regularization. In Int. Conf. Artif. Intell. Stat. (AISTATS), 2011.
[9] G.M. James, C. Paulson, and P. Rusmevichientong. The constrained lasso. Technical report, University of Southern California, 2012.
[10] S.M. Kakade, O. Shamir, K. Sridharan, and A. Tewari. Learning exponential families in high-dimensions: Strong convexity and sparsity. In Int. Conf. Artif. Intell. Stat. (AISTATS), 2010.
[11] M. Kolar, L. Song, A. Ahmed, and E. Xing. Estimating time-varying networks. Ann. Appl. Stat., 4(1):94-123, 2010.
[12] C. Lam and J. Fan. Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statis., 37(6B):4254, 2009.
[13] J.D. Lee and T. Hastie. Learning mixed graphical models. arXiv preprint arXiv:1205.5012, 2012.
[14] P.L. Loh and M.J. Wainwright. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. arXiv:1212.0478, 2012.
[15] N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the lasso. Ann. Statis., 34(3):1436-1462, 2006.
[16] Y. Nardi and A. Rinaldo. On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Stat., 2:605-633, 2008.
[17] S.N. Negahban, P. Ravikumar, M.J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of m-estimators with decomposable regularizers. Statist. Sci., 27(4):538-557, 2012.
[18] G. Obozinski, M.J. Wainwright, and M.I. Jordan. Support union recovery in high-dimensional multivariate regression. Ann. Statis., 39(1):1-47, 2011.
[19] P. Ravikumar, M.J. Wainwright, and J.D. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Ann. Statis., 38(3):1287-1319, 2010.
[20] P. Ravikumar, M.J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electron. J. Stat., 5:935-980, 2011.
[21] A.J. Rothman, P.J. Bickel, E. Levina, and J. Zhu. Sparse permutation invariant covariance estimation. Electron. J. Stat., 2:494-515, 2008.
[22] Y. She. Sparse regression with exact clustering. Electron. J. Stat., 4:1055-1096, 2010.
[23] R.J. Tibshirani and J.E. Taylor. The solution path of the generalized lasso. Ann. Statis., 39(3):1335-1371, 2011.
[24] S. Vaiter, G. Peyre, C. Dossal, and J. Fadili. Robust sparse analysis regularization. IEEE Trans. Inform. Theory, 59(4):2001-2016, 2013.
[25] S. van de Geer. Weakly decomposable regularization penalties and structured sparsity. arXiv preprint arXiv:1204.4813, 2012.
[26] M.J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (lasso). IEEE Trans. Inform. Theory, 55(5):2183-2202, 2009.
[27] E. Yang and P. Ravikumar. Dirty statistical models. In Adv. Neural Inf. Process. Syst. (NIPS), pages 827-835, 2013.
[28] E. Yang, P. Ravikumar, G.I. Allen, and Z. Liu. On graphical models via univariate exponential family distributions. arXiv:1301.4183, 2013.
[29] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol., 68(1):49-67, 2006.
[30] P. Zhao and B. Yu. On model selection consistency of lasso. J. Mach. Learn. Res., 7:2541-2563, 2006.
-----1
[1] M. D. Serruya, N. G. Hatsopoulos, L. Paninski, M. R. Fellows, and J. P. Donoghue, Instant neural control of a movement signal, Nature, vol. 416, no. 6877, pp. 141-142, 2002.
[2] K. J. Miller et al., Cortical activity during motor execution, motor imagery, and imagery-based online feedback, PNAS, vol. 107, no. 9, pp. 4430-4435, 2010.
[3] D. J. McFarland, W. A. Sarnacki, and J. R. Wolpaw, Electroencephalographic (EEG) control of three-dimensional movement, Journal of Neural Engineering, vol. 7, no. 3, p. 036007, 2010.
[4] V. Gilja et al., A high-performance neural prosthesis enabled by control algorithm design, Nat Neurosci, 2012.
[5] L. R. Hochberg et al., Reach and grasp by people with tetraplegia using a neurally controlled robotic arm, Nature, vol. 485, no. 7398, pp. 372-375, 2012.
[6] D. Putrino et al., Development of a closed-loop feedback system for real-time control of a high-dimensional brain machine interface, Conf Proc IEEE EMBS, vol. 2012, pp. 4567-4570, 2012.
[7] S. Koyama et al., Comparison of brain-computer interface decoding algorithms in open-loop and closed-loop control, Journal of Computational Neuroscience, vol. 29, no. 1-2, pp. 73-87, 2010.
[8] J. M. Carmena et al., Learning to control a brain-machine interface for reaching and grasping by primates, PLoS Biology, vol. 1, no. 2, p. E42, 2003.
[9] V. Gilja et al., A brain machine interface control algorithm designed from a feedback control perspective, Conf Proc IEEE Eng Med Biol Soc, vol. 2012, pp. 1318-22, 2012.
[10] Z. Li, J. E. O'Doherty, M. A. Lebedev, and M. A. L. Nicolelis, Adaptive decoding for brain-machine interfaces through Bayesian parameter updates, Neural Comput., vol. 23, no. 12, pp. 3162-204, 2011.
[11] K. Kowalski, B. He, and L. Srinivasan, Dynamic analysis of naive adaptive brain-machine interfaces, Neural Comput., vol. 25, no. 9, pp. 2373-2420, 2013.
[12] C. Vidaurre, C. Sannelli, K.-R. Müller, and B. Blankertz, Machine-learning based co-adaptive calibration for brain-computer interfaces, Neural Computation, vol. 816, no. 3, pp. 791-816, 2011.
[13] M. Lagang and L. Srinivasan, Stochastic optimal control as a theory of brain-machine interface operation, Neural Comput., vol. 25, pp. 374-417, Feb. 2013.
[14] R. Heliot, K. Ganguly, J. Jimenez, and J. M. Carmena, Learning in closed-loop brain-machine interfaces: Modeling and experimental validation, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 40, no. 5, pp. 1387-1397, 2010.
[15] S. Dangi, A. L. Orsborn, H. G. Moorman, and J. M. Carmena, Design and Analysis of Closed-Loop Decoder Adaptation Algorithms for Brain-Machine Interfaces, Neural Computation, pp. 1-39, Apr. 2013.
[16] Y. Zhang, A. B. Schwartz, S. M. Chase, and R. E. Kass, Bayesian learning in assisted brain-computer interface tasks, Conf Proc IEEE Eng Med Biol Soc, vol. 2012, pp. 2740-3, 2012.
[17] S. Waldert et al., A review on directional information in neural signals for brain-machine interfaces, Journal Of Physiology Paris, vol. 103, no. 3-5, pp. 244-254, 2009.
[18] G. P. Papavassilopoulos, Solution of some stochastic quadratic Nash and leader-follower games, SIAM J. Control Optim., vol. 19, pp. 651-666, Sept. 1981.
[19] E. Doi and M. S. Lewicki, Characterization of minimum error linear coding with sensory and neural noise., Neural Computation, vol. 23, no. 10, pp. 24982510, 2011.
[20] M. Athans, The discrete time linear-quadratic-Gaussian stochastic control problem, Annals of Economic and Social Measurement, vol. 1, pp. 446488, September 1972.
[21] M. D. Golub, S. M. Chase, and B. M. Yu, Learning an internal dynamics model from control demonstra- tion., 30th International Conference on Machine Learning, 2013.
[22] R. Shadmehr, M. A. Smith, and J. W. Krakauer, Error correction, sensory prediction, and adaptation in motor control., Annual Review of Neuroscience, vol. 33, no. March, pp. 89108, 2010.
[23] L. Shpigelman, H. Lalazar, and E. Vaadia, Kernel-arma for hand tracking and brain-machine interfacing during 3d motor control, in NIPS, pp. 14891496, 2008.
[24] A. C. Koralek, X. Jin, J. D. Long II, R. M. Costa, and J. M. Carmena, Corticostriatal plasticity is neces- sary for learning intentional neuroprosthetic skills., Nature, vol. 483, no. 7389, pp. 331335, 2012.
[25] T. Basar, On the uniqueness of the Nash solution in linear-quadratic differential games, International Journal of Game Theory, vol. 5, no. 2-3, pp. 6590, 1976.
[26] J. P. Cunningham et al., A closed-loop human simulator for investigating the role of feedback control in brain-machine interfaces., Journal of Neurophysiology, vol. 105, no. 4, pp. 19321949, 2010.
[27] Y. T. Wong et al., Decoding arm and hand movements across layers of the macaque frontal cortices., Conf Proc IEEE Eng Med Biol Soc, vol. 2012, pp. 175760, 2012.
-----1
[1] A. Ijspeert and S. Schaal. Learning Attractor Landscapes for Learning Motor Primitives. In Advances in Neural Information Processing Systems 15, (NIPS). MIT Press, Cambridge, MA, 2003.
[2] M. Khansari-Zadeh and A. Billard. Learning Stable Non-Linear Dynamical Systems with Gaussian Mix- ture Models. IEEE Transaction on Robotics, 2011.
[3] J. Kober, K. Mlling, O. Kroemer, C. Lampert, B. Schlkopf, and J. Peters. Movement Templates for Learning of Hitting and Batting. In International Conference on Robotics and Automation (ICRA), 2010.
[4] J. Kober and J. Peters. Policy Search for Motor Primitives in Robotics. Machine Learning, pages 133, 2010.
[5] A. Ude, A. Gams, T. Asfour, and J. Morimoto. Task-Specific Generalization of Discrete and Periodic Dynamic Movement Primitives. Trans. Rob., (5), October 2010.
[6] B. da Silva, G. Konidaris, and A. Barto. Learning Parameterized Skills. In International Conference on Machine Learning, 2012.
[7] P. Kormushev, S. Calinon, and D. Caldwell. Robot Motor Skill Coordination with EM-based Reinforce- ment Learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010.
[8] C. Daniel, G. Neumann, and J. Peters. Learning Concurrent Motor Skills in Versatile Solution Spaces. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012.
[9] George Konidaris, Scott Kuindersma, Roderic Grupen, and Andrew Barto. Robot Learning from Demon- stration by Constructing Skill Trees. International Journal of Robotics Research, 31(3):360375, March 2012.
[10] A. dAvella and E. Bizzi. Shared and Specific Muscle Synergies in Natural Motor Behaviors. Proceedings of the National Academy of Sciences (PNAS), 102(3):30763081, 2005.
[11] M. Williams, B.and Toussaint and A. Storkey. Modelling Motion Primitives and their Timing in Biologi- cally Executed Movements. In Advances in Neural Information Processing Systems (NIPS), 2007.
[12] L. Rozo, S. Calinon, D. G. Caldwell, P. Jimenez, and C. Torras. Learning Collaborative Impedance-Based Robot Behaviors. In AAAI Conference on Artificial Intelligence, 2013.
[13] E. Rueckert, G. Neumann, M. Toussaint, and W.Pr Maass. Learned Graphical Models for Probabilistic Planning provide a new Class of Movement Primitives. 2012.
[14] L. Righetti and A Ijspeert. Programmable central pattern generators: an application to biped locomotion control. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation, 2006.
[15] A. Paraschos, G Neumann, and J. Peters. A probabilistic approach to robot trajectory generation. In Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2013.
[16] S. Calinon, P. Kormushev, and D. Caldwell. Compliant Skills Acquisition and Multi-Optima Policy Search with EM-based Reinforcement Learning. Robotics and Autonomous Systems (RAS), 61(4):369  379, 2013.
[17] E. Todorov and M. Jordan. Optimal Feedback Control as a Theory of Motor Coordination. Nature Neuroscience, 5:12261235, 2002.
[18] S. Schaal, J. Peters, J. Nakanishi, and A. Ijspeert. Learning Movement Primitives. In International Symposium on Robotics Research, (ISRR), 2003.
[19] A. Lazaric and M. Ghavamzadeh. Bayesian Multi-Task Reinforcement Learning. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
[20] J. Peters, M. Mistry, F. E. Udwadia, J. Nakanishi, and S. Schaal. A Unifying Methodology for Robot Control with Redundant DOFs. Autonomous Robots, (1):112, 2008.
[21] H. Stark and J. Woods. Probability and Random Processes with Applications to Signal Processing (3rd Edition). 3 edition, August 2001.
[22] M. Toussaint. Robot Trajectory Optimization using Approximate Inference. In Proceedings of the 26th International Conference on Machine Learning, (ICML), 2009.
-----1
[1] A. Bagnell, S. Kakade, A. Ng, and J. Schneider. Policy search by dynamic programming. In Advances in Neural Information Processing Systems (NIPS), 2003.
[2] M. Deisenroth and C. Rasmussen. PILCO: a model-based and data-efficient approach to policy search. In International Conference on Machine Learning (ICML), 2011.
[3] T. Furmston and D. Barber. Variational methods for reinforcement learning. Journal of Ma- chine Learning Research, 9:241248, 2010.
[4] D. Jacobson and D. Mayne. Differential Dynamic Programming. Elsevier, 1970.
[5] S. Julier and J. Uhlmann. A new extension of the Kalman filter to nonlinear systems. In International Symposium on Aerospace/Defense Sensing, Simulation, and Control, 1997.
[6] S. Kakade and J. Langford. Approximately optimal approximate reinforcement learning. In International Conference on Machine Learning (ICML), 2002.
[7] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal. STOMP: stochastic trajec- tory optimization for motion planning. In International Conference on Robotics and Automa- tion, 2011.
[8] J. Kober and J. Peters. Learning motor primitives for robotics. In International Conference on Robotics and Automation, 2009.
[9] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[10] S. Levine and V. Koltun. Guided policy search. In International Conference on Machine Learning (ICML), 2013.
[11] G. Neumann. Variational inference for policy search in changing situations. In International Conference on Machine Learning (ICML), 2011.
[12] J. Peters and S. Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682697, 2008.
[13] K. Rawlik, M. Toussaint, and S. Vijayakumar. On stochastic optimal control and reinforcement learning by approximate inference. In Robotics: Science and Systems, 2012.
[14] S. Ross, G. Gordon, and A. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research, 15:627635, 2011.
[15] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393):8286, 1986.
[16] E. Todorov. Policy gradients in linearly-solvable MDPs. In Advances in Neural Information Processing Systems (NIPS 23), 2010.
[17] E. Todorov and W. Li. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In American Control Conference, 2005.
[18] E. Todorov and Y. Tassa. Iterative local dynamic programming. In IEEE Symposium on Adap- tive Dynamic Programming and Reinforcement Learning (ADPRL), 2009.
[19] M. Toussaint. Robot trajectory optimization using approximate inference. In International Conference on Machine Learning (ICML), 2009.
[20] M. Toussaint, L. Charlin, and P. Poupart. Hierarchical POMDP controller optimization by likelihood maximization. In Uncertainty in Artificial Intelligence (UAI), 2008.
[21] N. Vlassis, M. Toussaint, G. Kontes, and S. Piperidis. Learning model-free robot control by a Monte Carlo EM algorithm. Autonomous Robots, 27(2):123130, 2009.
[22] K. Yin, K. Loken, and M. van de Panne. SIMBICON: simple biped locomotion control. ACM Transactions Graphics, 26(3), 2007.
[23] B. Ziebart. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University, 2010.
-----1
[1] P. Abbeel, A. Coates, and A. Y. Ng. Autonomous helicopter aerobatics through apprenticeship learning.IJRR, 29(13), 2010.
[2] B. Akgun, M. Cakmak, K. Jiang, and A. L. Thomaz. Keyframe-based learning from demonstration. IJSR, 4(4):343355, 2012.
[3] R. Alterovitz, T. Simon, and K. Goldberg. The stochastic motion roadmap: A sampling framework for planning with markov motion uncertainty. In RSS, 2007.
[4] J. V. D. Berg, P. Abbeel, and K. Goldberg. Lqg-mp: Optimized path planning for robots with motion uncertainty and imperfect state information. In RSS, 2010.
[5] S. Calinon, F. Guenter, and A. Billard. On learning, representing, and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man, and Cybernetics, 2007.
[6] D. Dey, T. Y. Liu, M. Hebert, and J. A. Bagnell. Contextual sequence prediction with application to control library optimization. In RSS, 2012.
[7] R. Diankov. Automated Construction of Robotic Manipulation Programs. PhD thesis, CMU, RI, 2010.
[8] A. Dragan and S. Srinivasa. Generating legible motion. In RSS, 2013.
[9] C. J. Green and A. Kelly. Toward optimal sampling in the space of paths. In ISRR. 2007.
[10] Y. Jiang, M. Lim, and A. Saxena. Learning object arrangements in 3d scenes using human context. In ICML, 2012.
[11] Y. Jiang, M. Lim, C. Zheng, and A. Saxena. Learning to place new objects in a scene. IJRR, 31(9), 2012.
[12] Y. Jiang, H. Koppula, and A. Saxena. Hallucinated humans as the hidden context for labeling 3d scenes.In CVPR, 2013.
[13] T. Joachims. Training linear svms in linear time. In KDD, 2006.
[14] T. Joachims, T. Finley, and C. Yu. Cutting-plane training of structural svms. Mach Learn, 77(1), 2009.
[15] S. Karaman and E. Frazzoli. Incremental sampling-based algorithms for optimal motion planning. In RSS, 2010.
[16] E. Klingbeil, D. Rao, B. Carpenter, V. Ganapathi, A. Y. Ng, and O. Khatib. Grasping with application to an autonomous checkout robot. In ICRA, 2011.
[17] J. Kober and J. Peters. Policy search for motor primitives in robotics. Machine Learning, 84(1), 2011.
[18] H. S. Koppula and A. Saxena. Anticipating human activities using object affordances for reactive robotic response. In RSS, 2013.
[19] H. S. Koppula, A. Anand, T. Joachims, and A. Saxena. Semantic labeling of 3d point clouds for indoor scenes. In NIPS, 2011.
[20] S. M. LaValle and J. J. Kuffner. Randomized kinodynamic planning. IJRR, 20(5):378400, 2001.
[21] I. Lenz, H. Lee, and A. Saxena. Deep learning for detecting robotic grasps. In RSS, 2013.
[22] S. Levine and V. Koltun. Continuous inverse optimal control with locally optimal examples. In ICML, 2012.
[23] J. Mainprice, E. A. Sisbot, L. Jaillet, J. Corts, R. Alami, and T. Simon. Planning human-aware motions using a sampling-based costmap planner. In ICRA, 2011.
[24] C. D. Manning, P. Raghavan, and H. Schtze. Introduction to information retrieval, volume 1. Cambridge University Press Cambridge, 2008.
[25] N. Ratliff. Learning to Search: Structured Prediction Techniques for Imitation Learning. PhD thesis, CMU, RI, 2009.
[26] N. Ratliff, J. A. Bagnell, and M. Zinkevich. Maximum margin planning. In ICML, 2006.
[27] N. Ratliff, D. Bradley, J. A. Bagnell, and J. Chestnutt. Boosting structured prediction for imitation learn- ing. In NIPS, 2007.
[28] N. Ratliff, D. Silver, and J. A. Bagnell. Learning to search: Functional gradient techniques for imitation learning. Autonomous Robots, 27(1):2553, 2009.
[29] N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa. Chomp: Gradient optimization techniques for efficient motion planning. In ICRA, 2009.
[30] A. Saxena, J. Driemeyer, and A.Y. Ng. Robotic grasping of novel objects using vision. IJRR, 27(2), 2008.
[31] P. Shivaswamy and T. Joachims. Online structured prediction via coactive learning. In ICML, 2012.
[32] B. Shneiderman and C. Plaisant. Designing The User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley Publication, 2010.
[33] E. A. Sisbot, L. F. Marin, and R. Alami. Spatial reasoning for human robot interaction. In IROS, 2007.
[34] E. A. Sisbot, L. F. Marin-Urias, R. Alami, and T. Simeon. A human aware mobile robot motion planner.IEEE Transactions on Robotics, 2007.
[35] I. A. Sucan, M. Moll, and L. E. Kavraki. The Open Motion Planning Library. IEEE Robotics & Automa- tion Magazine, 19(4):7282, 2012. http://ompl.kavrakilab.org.
[36] P. Vernaza and J. A. Bagnell. Efficient high dimensional maximum entropy modeling via symmetric partition functions. In NIPS, 2012.
[37] A. Wilson, A. Fern, and P. Tadepalli. A bayesian approach for policy learning from trajectory preference queries. In NIPS, 2012.
[38] F. Zacharias, C. Schlette, F. Schmidt, C. Borst, J. Rossmann, and G. Hirzinger. Making planned paths look more human-like in humanoid robot manipulation planning. In ICRA, 2011.
[39] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning.In AAAI, 2008.
-----1
[1] J. Banks, M. Olson, and D. Porter. An experimental analysis of the bandit problem. Economic Theory, 10:5577, 2013.
[2] R. Bellman. On the theory of dynamic programming. Proceedings of the National Academy of Sciences, 1952.
[3] R. Cho, L. Nystrom, E. Brown, A. Jones, T. Braver, P. Holmes, and J. D. Cohen. Mechanisms underlying dependencies of performance on stimulus history in a two-alternative forced-choice task. Cognitive, Affective and Behavioral Neuroscience, 2:283299, 2002.
[4] J. D. Cohen, S. M. McClure, and A. J. Yu. Should I stay or should I go? Exploration versus exploitation. Philosophical Transactions of the Royal Society B: Biological Sciences, 362:933 942, 2007.
[5] N. D. Daw, J. P. ODoherty, P. Dayan, B. Seymour, and R. J. Dolan. Cortical substrates for exploratory decisions in humans. Nature, 441:876879, 2006.
[6] A. Ejova, D. J. Navarro, and A. F. Perfors. When to walk away: The effect of variability on keeping options viable. In N. Taatgen, H. van Rijn, L. Schomaker, and J. Nerbonne, editors, Proceedings of the 31st Annual Conference of the Cognitive Science Society, Austin, TX, 2009.
[7] P. Frazier, W. Powell, and S. Dayanik. A knowledge-gradient policy for sequential information collection. SIAM Journal on Control and Optimization, 47:24102439, 2008.
[8] W. R. Garner. An informational analysis of absolute judgments of loudness. Journal of Exper- imental Psychology, 46:373380, 1953.
[9] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman & Hall/CRC, Boca Raton, FL, 2 edition, 2004.
[10] J. C. Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, 41:148177, 1979.
[11] L. P. Kaebling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237285, 1996.
[12] M. D. Lee, S. Zhang, M. Munro, and M. Steyvers. Psychological models of human and optimal performance in bandit problems. Cognitive Systems Research, 12:164174, 2011.
[13] M. I. Posner and Y. Cohen. Components of visual orienting. Attention and Performance Vol.X, 1984.
[14] W. Powell and I. Ryzhov. Optimal Learning. Wiley, 1 edition, 2012.
[15] H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527535, 1952.
[16] I. Ryzhov, W. Powell, and P. Frazier. The knowledge gradient algorithm for a general class of online learning problems. Operations Research, 60:180195, 2012.
[17] J. Shin and D. Ariely. Keeping doors open: The effect of unavailability on incentives to keep options viable. MANAGEMENT SCIENCE, 50:575586, 2004.
[18] M. Steyvers, M. D. Lee, and E.-J. Wagenmakers. A bayesian analysis of human decision- making on bandit problems. Journal of Mathematical Psychology, 53:168179, 2009.
[19] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cam- bridge, MA, 1998.
[20] M. C. Treisman and T. C. Williams. A theory of criterion setting with an application to se- quential dependencies. Psychological Review, 91:68111, 1984.
[21] A. J. Yu and J. D. Cohen. Sequential effects: Superstition or rational behavior? In Advances in Neural Information Processing Systems, volume 21, pages 18731880, Cambridge, MA., 2009. MIT Press.
[22] S. Zhang and A. J. Yu. Cheap but clever: Human active learning in a bandit setting. In Proceedings of the Cognitive Science Society Conference, 2013.
-----0
-----1
[1] D. Di Castro and S. Mannor. Adaptive bases for reinforcement learning. Machine Learning and Knowl- edge Discovery in Databases, pages 312327, 2010.
[2] J.Z. Kolter and A.Y. Ng. Regularization and feature selection in least-squares temporal difference learn- ing. In International Conference on Machine Learning, 2009.
[3] P.W. Keller, S. Mannor, and D. Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. In International Conference on Machine Learning, 2006.
[4] P. Manoonpong, F. Worgotter, and J. Morimoto. Extraction of reward-related feature space using correlation-based and reward-based learning methods. Neural Information Processing. Theory and Al- gorithms, pages 414421, 2010.
[5] A. Geramifard, F. Doshi, J. Redding, N. Roy, and J.P. How. Online discovery of feature dependencies. In International Conference on Machine Learning, 2011.
[6] R. Parr, C. Painter-Wakefield, L. Li, and M. Littman. Analyzing feature generation for value-function approximation. In International Conference on Machine Learning, 2007.
[7] J. Boyan and A.W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems, 1995.
[8] E.J. Cande`s and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies. Information Theory, IEEE Transactions on, 52(12):54065425, 2006.
[9] E.J. Cande`s and M.B. Wakin. An introduction to compressive sampling. Signal Processing Magazine, IEEE, 25(2):2130, 2008.
[10] O.A. Maillard and R. Munos. Linear regression with random projections. Journal of Machine Learning Research, 13:27352772, 2012.
[11] M.M. Fard, Y. Grinberg, J. Pineau, and D. Precup. Compressed least-squares regression on sparse spaces.In AAAI, 2012.
[12] O.A. Maillard and R. Munos. Compressed least-squares regression. In Advances in Neural Information Processing Systems, 2009.
[13] S. Zhou, J. Lafferty, and L. Wasserman. Compressed regression. In Proceedings of Advances in neural information processing systems, 2007.
[14] M. Ghavamzadeh, A. Lazaric, O.A. Maillard, and R. Munos. LSTD with random projections. In Advances in Neural Information Processing Systems, 2010.
[15] B.A. Olshausen, P. Sallee, and M.S. Lewicki. Learning sparse image codes using a wavelet pyramid architecture. In Advances in neural information processing systems, 2001.
[16] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
[17] S.J. Bradtke and A.G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1):3357, 1996.
[18] J.A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, 49(2): 233246, 2002.
[19] M.G. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4: 11071149, 2003. ISSN 1532-4435.
[20] H.R. Maei and R.S. Sutton. GQ (?): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Third Conference on Artificial General Intelligence, 2010.
[21] I. Menache, S. Mannor, and N. Shimkin. Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134(1):215238, 2005.
[22] M.A. Davenport, M.B. Wakin, and R.G. Baraniuk. Detection and estimation with compressive measure- ments. Dept. of ECE, Rice University, Tech. Rep, 2006.
[23] M.M. Fard, Y. Grinberg, J. Pineau, and D. Precup. Random projections preserve linearity in sparse spaces.School of Computer Science, Mcgill University, Tech. Rep, 2012.
[24] M.A. Davenport, P.T. Boufounos, M.B. Wakin, and R.G. Baraniuk. Signal processing with compressive measurements. Selected Topics in Signal Processing, IEEE Journal of, 4(2):445460, 2010.
[25] Andrew Y Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger, and Eric Liang. Autonomous inverted helicopter flight via reinforcement learning. In Experimental Robotics IX, pages 363372. Springer, 2006.
[26] Richard Barrett, Michael Berry, Tony F Chan, James Demmel, June Donato, Jack Dongarra, Victor Ei- jkhout, Roldan Pozo, Charles Romine, and Henk Van der Vorst. Templates for the solution of linear systems: building blocks for iterative methods. Number 43. Society for Industrial and Applied Mathemat- ics, 1987.
[27] A.M. Farahmand, M. Ghavamzadeh, and C. Szepesvari. Regularized policy iteration. In Advances in Neural Information Processing Systems, 2010.
[28] J. Johns, C. Painter-Wakefield, and R. Parr. Linear complementarity for regularized policy evaluation and improvement. In Advances in Neural Information Processing Systems, 2010.
-----1
-----1
[1] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10:251276, 1998.
[2] K. J. A?strom and T. Hagglund. PID Controllers: Theory, Design, and Tuning. ISA: The Instrumentation, Systems, and Automation Society, 1995.
[3] M. T. Soylemez, N. Munro, and H. Baki. Fast calculation of stabilizing PID controllers. Automatica, 39 (1):121126, 2003.
[4] C. L. Lynch and M. R. Popovic. Functional electrical stimulation. In IEEE Control Systems Magazine, volume 28, pages 4050.
[5] E. K. Chadwick, D. Blana, A. J. van den Bogert, and R. F. Kirsch. A real-time 3-D musculoskeletal model for dynamic simulation of arm movements. In IEEE Transactions on Biomedical Engineering, volume 56, pages 941948, 2009.
[6] K. Jagodnik and A. van den Bogert. A proportional derivative FES controller for planar arm movement.In 12th Annual Conference International FES Society, Philadelphia, PA, 2007.
[7] P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Application of the actor-critic architecture to functional electrical stimulation control of a human arm. In Proceedings of the Twenty- First Innovative Applications of Artificial Intelligence, 2009.
[8] T. J. Perkins and A. G. Barto. Lyapunov design for safe reinforcement learning. Journal of Machine Learning Research, 3:803832, 2003.
[9] H. Bendrahim and J. A. Franklin. Biped dynamic walking using reinforcement learning. Robotics and Autonomous Systems, 22:283302, 1997.
[10] A. Arapostathis, R. Kumar, and S. P. Hsu. Control of markov chains with safety bounds. In IEEE Transactions on Automation Science and Engineering, volume 2, pages 333343, October 2005.
[11] E. Arvelo and N. C. Martins. Control design for Markov chains under safety constraints: A convex approach. CoRR, abs/1209.2883, 2012.
[12] P. Geibel and F. Wysotzki. Risk-sensitive reinforcement learning applied to control under constraints.Journal of Artificial Intelligence Research 24, pages 81108, 2005.
[13] S. Kuindersma, R. Grupen, and A. G. Barto. Variational bayesian optimization for runtime risk-sensitive control. In Robotics: Science and Systems VIII, 2012.
[14] S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee. Natural actor-critic algorithms. Automatica, 45(11):24712482, 2009.
[15] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Technical Report UCB/EECS-2010-24, Electrical Engineering and Computer Sciences, University of California at Berkeley, March 2010.
[16] S. Amari and S. Douglas. Why natural gradient? In Proceedings of the 1998 IEEE International Confer- ence on Acoustics, Speech, and Signal Processing, volume 2, pages 12131216, 1998.
[17] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.
[18] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex opti- mization. Operations Research Letters, 2003.
[19] S. Mahadevan and B. Liu. Sparse Q-learning with mirror descent. In Proceedings of the Conference on Unvertainty in Artificial Intelligence, 2012.
[20] S. Mahadevan, S. Giguere, and N. Jacek. Basis adaptation for sparse nonlinear reinforcement learning.In Proceedings of the Conference on Artificial Intelligence, 2013.
[21] R. Tyrell Rockafellar. Convex Analysis. Princeton University Press, Princeton, New Jersey, 1970.
[22] J. Nocedal and S. Wright. Numerical Optimization. Springer, second edition, 2006.
[23] S. Kakade. A natural policy gradient. In Advances in Neural Information Processing Systems, volume 14, pages 15311538, 2002.
[24] R. S. Sutton, D. McAllester, S. Singh, and Y.Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057 1063, 2000.
[25] T. Morimura, E. Uchibe, and K. Doya. Utilizing the natural gradient in temporal difference reinforcement learning with eligibility traces. In International Symposium on Information Geometry and its Application, 2005.
[26] J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71:11801190, 2008.
[27] P. S. Thomas and A. G. Barto. Motor primitive discovery. In Procedings of the IEEE Conference on Development and Learning and EPigenetic Robotics, 2012.
[28] T. Degris, P. M. Pilarski, and R. S. Sutton. Model-free reinforcement learning with continuous action in practice. In Proceedings of the 2012 American Control Conference, 2012.
[29] P. S. Thomas. Bias in natural actor-critic algorithms. Technical Report UM-CS-2012-018, Department of Computer Science, University of Massachusetts at Amherst, 2012.
[30] D. Blana, R. F. Kirsch, and E. K. Chadwick. Combined feedforward and feedback control of a redundant, nonlinear, dynamic musculoskeletal system. Medical and Biological Engineering and Computing, 47: 533542, 2009.
[31] P. Deegan. Whole-Body Strategies for Mobility and Manipulation. PhD thesis, University of Mas- sachusetts Amherst, 2010.
[32] S. R. Kuindersma, E. Hannigan, D. Ruiken, and R. A. Grupen. Dexterous mobility with the uBot-5 mobile manipulator. In Proceedings of the 14th International Conference on Advanced Robotics, 2009.
-----1
[1] A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997.
[2] P. R. Kumar and P. Varaiya. Stochastic systems: estimation, identification and adaptive control. Prentice-Hall, Inc., 1986.
[3] T.L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6(1):4-22, 1985.
[4] T. Jaksch, R. Ortner, and P. Auer. Near-optimal regret bounds for reinforcement learning. The Journal of Machine Learning Research, 99:1563-1600, 2010.
[5] P. L. Bartlett and A. Tewari. REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 35-42. AUAI Press, 2009.
[6] R. I. Brafman and M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213-231, 2003.
[7] S. M. Kakade. On the sample complexity of reinforcement learning. PhD thesis, University of London, 2003.
[8] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
[9] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285-294, 1933.
[10] O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In Neural Information Processing Systems (NIPS), 2011.
[11] S.L. Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6):639-658, 2010.
[12] S. Agrawal and N. Goyal. Further optimal regret bounds for Thompson sampling. arXiv preprint arXiv:1209.3353, 2012.
[13] S. Agrawal and N. Goyal. Thompson sampling for contextual bandits with linear payoffs. arXiv preprint arXiv:1209.3352, 2012.
[14] E. Kaufmann, N. Korda, and R. Munos. Thompson sampling: an asymptotically optimal finite time analysis. In International Conference on Algorithmic Learning Theory, 2012.
[15] D. Russo and B. Van Roy. Learning to optimize via posterior sampling. CoRR, abs/1301.2609, 2013.
[16] M. Strens. A Bayesian framework for reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, pages 943-950, 2000.
[17] J. Z. Kolter and A. Y. Ng. Near-Bayesian exploration in polynomial time. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 513-520. ACM, 2009.
[18] T. Wang, D. Lizotte, M. Bowling, and D. Schuurmans. Bayesian sparse sampling for on-line reward optimization. In Proceedings of the 22nd international conference on Machine learning, pages 956-963. ACM, 2005.
[19] A. Guez, D. Silver, and P. Dayan. Efficient Bayes-adaptive reinforcement learning using sample-based search. arXiv preprint arXiv:1205.3109, 2012.
[20] J. Asmuth and M. L. Littman. Approaching Bayes-optimality using Monte-Carlo tree search. In Proc. 21st Int. Conf. Automat. Plan. Sched., Freiburg, Germany, 2011.
[21] A. L. Strehl and M. L. Littman. An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8):1309-1331, 2008.
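[22] S. Filippi, O. Cappé, and A. Garivier. Optimism in reinforcement learning based on Kullback-Leibler divergence. CoRR, abs/1004.5229, 2010.

A Relating Bayesian to frequentist regret

Let $\mathcal{M}$ be any family of MDPs with non-zero probability under the prior. Then, for any $\epsilon > 0$, $\alpha > \frac{1}{2}$:

$$P\left( \frac{\mathrm{Regret}(T, \pi^{PS}_\tau)}{T^\alpha} > \epsilon \,\middle|\, M^* \in \mathcal{M} \right) \to 0$$

This provides regret bounds even if $M^*$ is not distributed according to $f$. As long as the true MDP is not impossible under the prior, we will have an asymptotic frequentist regret close to the theoretical lower bounds in its $T$-dependence of $O(\sqrt{T})$.

Proof. We have for any $\epsilon > 0$:

$$\frac{E[\mathrm{Regret}(T, \pi^{PS}_\tau)]}{T^\alpha} \ge E\left[ \frac{\mathrm{Regret}(T, \pi^{PS}_\tau)}{T^\alpha} \,\middle|\, M^* \in \mathcal{M} \right] P(M^* \in \mathcal{M}) \ge \epsilon \, P\left( \frac{\mathrm{Regret}(T, \pi^{PS}_\tau)}{T^\alpha} \ge \epsilon \,\middle|\, M^* \in \mathcal{M} \right) P(M^* \in \mathcal{M})$$

Therefore via theorem (1), for any $\alpha > \frac{1}{2}$:

$$P\left( \frac{\mathrm{Regret}(T, \pi^{PS}_\tau)}{T^\alpha} \ge \epsilon \,\middle|\, M^* \in \mathcal{M} \right) \le \left( \frac{1}{\epsilon \, P(M^* \in \mathcal{M})} \right) \frac{E[\mathrm{Regret}(T, \pi^{PS}_\tau)]}{T^\alpha} \to 0$$

B Bounding the sum of confidence set widths

We are interested in bounding

$$\min\left\{ \tau \sum_{k=1}^{m} \sum_{i=1}^{\tau} \min\{\beta_k(s_{t_k+i}, a_{t_k+i}), 1\},\; T \right\}$$

which we claim is $O(\tau S \sqrt{AT \log(SAT)})$ for

$$\beta_k(s, a) := \sqrt{\frac{14 S \log(2SAmt_k)}{\max\{1, N_{t_k}(s, a)\}}}.$$

Proof. In a manner similar to [4] we can say:

$$\sum_{k=1}^{m} \sum_{i=1}^{\tau} \sqrt{\frac{14 S \log(2SAmt_k)}{\max\{1, N_{t_k}(s, a)\}}} \le \sum_{k=1}^{m} \sum_{i=1}^{\tau} 1\{N_{t_k} \le \tau\} + \sum_{k=1}^{m} \sum_{i=1}^{\tau} 1\{N_{t_k} > \tau\} \sqrt{\frac{14 S \log(2SAmt_k)}{\max\{1, N_{t_k}(s, a)\}}}$$

Now consider the event $(s_t, a_t) = (s, a)$ and $(N_{t_k}(s, a) \le \tau)$. This can happen fewer than $2\tau$ times per state-action pair. Therefore, $\sum_{k=1}^{m} \sum_{i=1}^{\tau} 1(N_{t_k}(s, a) \le \tau) \le 2\tau SA$.

Now, suppose $N_{t_k}(s, a) > \tau$. Then for any $t \in \{t_k, \ldots, t_{k+1} - 1\}$, $N_t(s, a) + 1 \le N_{t_k}(s, a) + \tau \le 2 N_{t_k}(s, a)$. Therefore:

$$\sum_{k=1}^{m} \sum_{t=t_k}^{t_{k+1}-1} \sqrt{\frac{1(N_{t_k}(s_t, a_t) > \tau)}{N_{t_k}(s_t, a_t)}} \le \sum_{k=1}^{m} \sum_{t=t_k}^{t_{k+1}-1} \sqrt{\frac{2}{N_t(s_t, a_t) + 1}} = \sqrt{2} \sum_{t=1}^{T} (N_t(s_t, a_t) + 1)^{-1/2}$$

$$\le \sqrt{2} \sum_{s,a} \sum_{j=1}^{N_{T+1}(s,a)} j^{-1/2} \le \sqrt{2} \sum_{s,a} \int_{x=0}^{N_{T+1}(s,a)} x^{-1/2} \, dx \le \sqrt{2SA \sum_{s,a} N_{T+1}(s, a)} = \sqrt{2SAT}$$

Note that since all rewards and transitions are absolutely constrained $\in [0, 1]$, our regret satisfies

$$\min\left\{ \tau \sum_{k=1}^{m} \sum_{i=1}^{\tau} \min\{\beta_k(s_{t_k+i}, a_{t_k+i}), 1\},\; T \right\} \le \min\left\{ 2\tau^2 SA + \tau \sqrt{28 S^2 AT \log(SAT)},\; T \right\} \le \sqrt{2\tau^2 SAT} + \tau \sqrt{28 S^2 AT \log(SAT)} \le \tau S \sqrt{30 AT \log(SAT)}$$

which is our required result.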
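Two elementary steps in the proof above can be checked numerically: the comparison of the sum of j^(-1/2) with the integral of x^(-1/2), and the Cauchy-Schwarz step that turns a sum of square roots of visit counts into the square root of (number of state-action pairs) times (total count). The following is only a sanity-check sketch. The visit counts are randomly generated and hypothetical, the function names are illustrative, and the constants are those of the two elementary inequalities rather than those of the proof.

```python
import math
import random


def inverse_sqrt_sum(n):
    """Return sum_{j=1}^{n} j^(-1/2); zero when n == 0."""
    return sum(1.0 / math.sqrt(j) for j in range(1, n + 1))


def integral_bound(n):
    """Return the integral of x^(-1/2) over [0, n], which equals 2*sqrt(n)."""
    return 2.0 * math.sqrt(n)


def check_counts(counts):
    """Evaluate both steps for a table with one visit count per state-action pair."""
    total = sum(counts)
    num_pairs = len(counts)
    lhs = sum(inverse_sqrt_sum(n) for n in counts)
    middle = sum(integral_bound(n) for n in counts)
    # Cauchy-Schwarz: sum_i sqrt(n_i) <= sqrt(num_pairs * sum_i n_i)
    rhs = 2.0 * math.sqrt(num_pairs * total)
    return lhs, middle, rhs


random.seed(0)
counts = [random.randint(0, 500) for _ in range(20)]  # hypothetical visit counts
lhs, middle, rhs = check_counts(counts)
print(lhs <= middle <= rhs)
```

The check holds for any non-negative integer counts, since each inequality is applied per state-action pair before summing.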
-----1
[1] Jan Peters and Stefan Schaal. Policy gradient methods for robotics. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 2219-2225. IEEE, 2006.
[2] James C Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. Automatic Control, IEEE Transactions on, 37(3):332-341, 1992.
[3] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, May 1992.
[4] Jonathan Baxter and Peter L. Bartlett. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319-350, 2001.
[5] Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems, 12(22), 2000.
[6] Jan Peters and Stefan Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682-697, 2008.
[7] Sham Kakade. A natural policy gradient. Advances in neural information processing systems, 14:1531-1538, 2001.
[8] Jan Peters and Stefan Schaal. Natural actor-critic. Neurocomputing, 71(7):1180-1190, 2008.
[9] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400-407, 1951.
[10] P. Wagner. A reinterpretation of the policy oscillation phenomenon in approximate policy iteration. Advances in Neural Information Processing Systems, 24, 2011.
[11] Jorge J More and David J Thuente. Line search algorithms with guaranteed sufficient decrease. ACM Transactions on Mathematical Software (TOMS), 20(3):286-307, 1994.
[12] J. Kober and J. Peters. Policy search for motor primitives in robotics. In Advances in Neural Information Processing Systems 22 (NIPS 2008), Cambridge, MA: MIT Press, 2009.
[13] Nikos Vlassis, Marc Toussaint, Georgios Kontes, and Savas Piperidis. Learning model-free robot control by a Monte Carlo EM algorithm. Autonomous Robots, 27(2):123-130, 2009.
[14] S.M. Kakade. On the sample complexity of reinforcement learning. PhD thesis, University College London, 2003.
[15] Matteo Pirotta, Marcello Restelli, Alessio Pecorino, and Daniele Calandriello. Safe policy iteration. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 307-315. JMLR Workshop and Conference Proceedings, May 2013.
[16] S. Pinsker. Information and Information Stability of Random Variables and Processes. Holden-Day Series in Time Series Analysis. Holden-Day, Inc., 1964.
[17] Tingting Zhao, Hirotaka Hachiya, Gang Niu, and Masashi Sugiyama. Analysis and improvement of policy gradient estimation. Neural Networks, 26(0):118-129, 2012.
-----1
[1] C. L. Isbell, C. Shelton, M. Kearns, S. Singh, and P. Stone, A social reinforcement learning agent, in Proc. of the 5th Intl. Conf. on Autonomous Agents, pp. 377-384, 2001.
[2] H. S. Chang, Reinforcement learning with supervision by combining multiple learnings and expert advices, in Proc. of the American Control Conference, 2006.
[3] W. B. Knox and P. Stone, Tamer: Training an agent manually via evaluative reinforcement, in Proc. of the 7th IEEE ICDL, pp. 292-297, 2008.
[4] A. Tenorio-Gonzalez, E. Morales, and L. Villaseñor-Pineda, Dynamic reward shaping: training a robot by voice, in Advances in Artificial Intelligence - IBERAMIA, pp. 483-492, 2010.
[5] P. M. Pilarski, M. R. Dawson, T. Degris, F. Fahimi, J. P. Carey, and R. S. Sutton, Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning, in Proc. of the IEEE ICORR, pp. 1-7, 2011.
[6] A. L. Thomaz and C. Breazeal, Teachable robots: Understanding human teaching behavior to build more effective robot learners, Artificial Intelligence, vol. 172, no. 6-7, pp. 716-737, 2008.
[7] W. B. Knox and P. Stone, Combining manual feedback with subsequent MDP reward signals for reinforcement learning, in Proc. of the 9th Intl. Conf. on AAMAS, pp. 5-12, 2010.
[8] R. Dearden, N. Friedman, and S. Russell, Bayesian Q-learning, in Proc. of the 15th AAAI, pp. 761-768, 1998.
[9] C. Watkins and P. Dayan, Q learning: Technical note, Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
[10] T. Matthews, S. D. Ramchurn, and G. Chalkiadakis, Competing with humans at fantasy football: Team formation in large partially-observable domains, in Proc. of the 26th AAAI, pp. 1394-1400, 2012.
[11] A. Y. Ng, D. Harada, and S. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, in Proc. of the 16th ICML, pp. 341-348, 1999.
[12] C. L. Isbell, M. Kearns, S. Singh, C. R. Shelton, P. Stone, and D. Kormann, Cobot in LambdaMOO: An Adaptive Social Statistics Agent, JAAMAS, vol. 13, no. 3, pp. 327-354, 2006.
[13] W. B. Knox and P. Stone, Reinforcement learning from simultaneous human and MDP reward, in Proc. of the 11th Intl. Conf. on AAMAS, pp. 475-482, 2012.
[14] A. Y. Ng and S. Russell, Algorithms for inverse reinforcement learning, in Proc. of the 17th ICML, 2000.
[15] P. Abbeel and A. Y. Ng, Apprenticeship learning via inverse reinforcement learning, in Proc. of the 21st ICML, 2004.
[16] C. Atkeson and S. Schaal, Learning tasks from a single demonstration, in Proc. of the IEEE ICRA, pp. 1706-1712, 1997.
[17] M. Taylor, H. B. Suay, and S. Chernova, Integrating reinforcement learning with human demonstrations of varying ability, in Proc. of the Intl. Conf. on AAMAS, pp. 617-624, 2011.
[18] L. P. Kaelbling, M. L. Littman, and A. W. Moore, Reinforcement learning: A survey, JAIR, vol. 4, pp. 237-285, 1996.
[19] W. D. Smart and L. P. Kaelbling, Effective reinforcement learning for mobile robots, 2002.
[20] R. Maclin and J. W. Shavlik, Creating advice-taking reinforcement learners, Machine Learning, vol. 22, no. 1-3, pp. 251-281, 1996.
[21] L. Torrey, J. Shavlik, T. Walker, and R. Maclin, Transfer learning via advice taking, in Advances in Machine Learning I, Studies in Computational Intelligence (J. Koronacki, S. Wirzchon, Z. Ras, and J. Kacprzyk, eds.), vol. 262, pp. 147-170, Springer Berlin Heidelberg, 2010.
[22] C. Bailer-Jones and K. Smith, Combining probabilities. GAIA-C8-TN-MPIA-CBJ-053, 2011.
[23] M. L. Littman, G. A. Keim, and N. Shazeer, A probabilistic approach to solving crossword puzzles, Artificial Intelligence, vol. 134, no. 1-2, pp. 23-55, 2002.
[24] G. Konidaris and A. Barto, Autonomous shaping: Knowledge transfer in reinforcement learning, in Proc. of the 23rd ICML, pp. 489-496, 2006.
-----1
convergence can be established with enough softness [18, 14]. The natural actor-critic algorithm, which is convergent and optimal, is placed in the lower left corner by Theorem 1, while the inevitable non-optimality of soft-greedy variants toward the right follows from Theorems 2 and 3. The exact (problem-dependent) place and shape of the line separating non-convergent and convergent soft-greedy variants (dashed line on the map) remains an open problem.

Figure 1: The hyperparameter space of the general form of (approximate) optimistic policy iteration in (4), with known convergence and optimality properties (see text for assumptions). Regions shown on the map (✓ = holds, ✗ = fails):
Non-optimistic hard-greedy: ✗ oscillation (Bertsekas, ...); ✗ non-optimality
Optimistic hard-greedy: ✗ chattering (Bertsekas, ...); ✗ non-optimality
Natural actor-critic: ✓ convergence (Theorem 1); ✓ optimality
Non-optimistic soft-greedy (small τ): ✗ non-convergence (Perkins & Precup); ✗ non-optimality (Theorems 2-3)
Non-optimistic soft-greedy (large τ): ✓ convergence (Perkins & Precup); ✗ non-optimality (Theorems 2-3)
Optimistic soft-greedy (large τ): ✓ convergence (Melo et al.); ✗ non-optimality (Theorems 2-3)
Two sets of variants on the map are cross-referenced to Fig. 2b and Fig. 2c.

Figure 2: Empirical illustration of the behavior of optimistic policy iteration ((1), (2), (4) and (5), with tabular φ) in the proximity of a stochastic optimum. The problem is shown in Fig. 2a. In Figures 2b and 2c, the optimum at θ(left) − θ(right) = log(2) is denoted by a solid green line. The uniformly stochastic policy is denoted by a dashed red line.
(a) A non-Markovian problem (adapted from [24]). The incoming arrow indicates the start state. Arrows leading out indicate termination with the shown reward.
(b) Non-optimality or oscillation with τ ↛ 0. The variants are marked in Fig. 1 (schematic). Horizontal axis: iteration; vertical axis: θ(left) − θ(right).
(c) Interpolation toward NAC with κ → 0 and τ → 0. The variants are marked in Fig. 1 (schematic). Horizontal axis: iteration; vertical axis: θ(left) − θ(right).

The main value of Theorem 1 is in bringing the greedy value function and policy gradient methodologies closer to each other. In our context, the unifying NAC(κ) formulation in (8) permits interpolation between the methodologies using the κ parameter. As discussed at the end of Section 4, the policy-forgetting term requires a Markovian problem to be justified: a greedy update implicitly stands on a Markov assumption, and the κ parameter in (8) can be interpreted as adjusting the strength of this assumption. In this respect, the policy improvement parameter κ in NAC(κ) can be seen (inversely) as a dual in spirit to the policy evaluation parameter λ in TD(λ)-style algorithms. On the policy evaluation side, having λ = 0 obtains variance reduction by assuming and exploiting Markovianity of the problem, while λ = 1 obtains unbiased estimates also for non-Markovian problems.

On the policy improvement side, with κ = 1, we have strictly greedy updates that gain in speed, as the policy can respond instantly to new opportunities appearing in the value function (for empirical observations of such a speed gain, see [11, 25]), and in representational flexibility, due to the lack of continuity constraints between successive policies (for a canonical example, consider fitted Q iteration). This comes at the price of either oscillation or non-optimality if the Markov assumption fails to hold, which is illustrated in Figure 2b for the problem in Figure 2a. With κ → 0, we approach natural gradient updates that remain sound also in non-Markovian settings, which is illustrated in Figure 2c.

The possibility to interpolate between the approaches might turn out to be useful in problems with partial Markovianity: a large κ in the NAC(κ) formulation can be used to quickly find the rough direction of the strongest attractors, after which gradually decreasing κ allows a convergent final ascent toward an optimum.

Acknowledgments

This work has been financially supported by the Academy of Finland through project no. 254104, and by the Foundation of Nokia Corporation.
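As a numerical aside on the vertical axis of Figures 2b and 2c: if the two-action policy is assumed to be a Gibbs (softmax) distribution over per-action preferences, then a preference difference of log(2) between left and right corresponds to choosing left with probability 2/3, and a difference of 0 corresponds to the uniformly stochastic policy. The sketch below only illustrates that mapping. The function name `gibbs_policy`, its `temperature` argument, and the softmax parameterization itself are assumptions made for illustration, not the paper's implementation.

```python
import math


def gibbs_policy(preferences, temperature=1.0):
    """Softmax over action preferences (hypothetical helper for illustration)."""
    scaled = [p / temperature for p in preferences]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


# Preference difference log(2) between left and right (the optimum in the figure).
optimum = gibbs_policy([math.log(2.0), 0.0])
# Preference difference 0 (the uniformly stochastic policy).
uniform = gibbs_policy([0.0, 0.0])
print(optimum, uniform)
```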
[1] Armijo, L. (1966). Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics, 16(1), 1-3.
[2] Baxter, J., & Bartlett, P. L. (2000). Reinforcement learning in POMDPs via direct gradient ascent. In Proceedings of the Seventeenth International Conference on Machine Learning, (pp. 41-48).
[3] Bertsekas, D. P. (1997). A new class of incremental gradient methods for least squares problems. SIAM Journal on Optimization, 7(4), 913-926.
[4] Bertsekas, D. P. (2005). Dynamic Programming and Optimal Control. Athena Scientific.
[5] Bertsekas, D. P. (2010). Pathologies of temporal difference methods in approximate dynamic programming. In 49th IEEE Conference on Decision and Control, (pp. 3034-3039).
[6] Bertsekas, D. P. (2011). Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, 9(3), 310-335.
[7] Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
[8] Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., & Lee, M. (2009). Natural actor-critic algorithms. Automatica, 45(11), 2471-2482.
[9] Busoniu, L., Babuška, R., De Schutter, B., & Ernst, D. (2010). Reinforcement learning and dynamic programming using function approximators. CRC Press.
[10] De Farias, D. P., & Van Roy, B. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3), 589-608.
[11] Kakade, S. M. (2002). A natural policy gradient. In Advances in Neural Information Processing Systems.
[12] Kakade, S. M. (2003). On the Sample Complexity of Reinforcement Learning. Ph.D. thesis, University College London.
[13] Konda, V. R., & Tsitsiklis, J. N. (2004). On actor-critic algorithms. SIAM Journal on Control and Optimization, 42(4), 1143-1166.
[14] Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. In Proceedings of the 25th International Conference on Machine Learning, (pp. 664-671).
[15] Nedic, A., & Bertsekas, D. P. (2003). Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems: Theory and Applications, 13(1-2), 79-110.
[16] Pendrith, M. D., & McGarity, M. J. (1998). An analysis of direct reinforcement learning in non-Markovian domains. In Proceedings of the Fifteenth International Conference on Machine Learning.
[17] Perkins, T. J., & Pendrith, M. D. (2002). On the existence of fixed points for Q-learning and sarsa in partially observable domains. In Proceedings of the Nineteenth International Conference on Machine Learning, (pp. 490-497).
[18] Perkins, T. J., & Precup, D. (2003). A convergent form of approximate policy iteration. In Advances in Neural Information Processing Systems.
[19] Peters, J., & Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71(7-9), 1180-1190.
[20] Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learning without state-estimation in partially observable Markovian decision processes. In Proceedings of the Eleventh International Conference on Machine Learning.
[21] Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
[22] Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems.
[23] Szepesvari, C. (2010). Algorithms for Reinforcement Learning. Morgan & Claypool Publishers.
[24] Wagner, P. (2011). A reinterpretation of the policy oscillation phenomenon in approximate policy iteration. In Advances in Neural Information Processing Systems 24, (pp. 2573-2581).
[25] Wagner, P. (to appear). Policy oscillation is overshooting. Neural Networks. Author manuscript available at http://users.ics.aalto.fi/pwagner/.
-----1
[1] O. Madani, S. Hanks, and A. Condon. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proc. AAAI Conf. on Artificial Intelligence, pages 541-548, 1999.
[2] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa. Online planning algorithms for POMDPs. J. Artificial Intelligence Research, 32(1):663-704, 2008.
[3] D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems (NIPS). 2010.
[4] L. Kocsis and C. Szepesvari. Bandit based Monte-Carlo planning. In Proc. Eur. Conf. on Machine Learning, pages 282-293, 2006.
[5] P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. In Proc. Uncertainty in Artificial Intelligence, 2007.
[6] A.Y. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In Proc. Uncertainty in Artificial Intelligence, pages 406-415, 2000.
[7] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In Proc. Int. Jnt. Conf. on Artificial Intelligence, pages 477-484, 2003.
[8] M.T.J. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for POMDPs. J. Artificial Intelligence Research, 24:195-220, 2005.
[9] T. Smith and R. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. In Proc. Uncertainty in Artificial Intelligence, 2005.
[10] H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science and Systems, 2008.
[11] M. Kearns, Y. Mansour, and A.Y. Ng. Approximate planning in large POMDPs via reusable trajectories. In Advances in Neural Information Processing Systems (NIPS), volume 12, pages 1001-1007. 1999.
[12] D. McAllester and S. Singh. Approximate planning for factored POMDPs using belief state simplification. In Proc. Uncertainty in Artificial Intelligence, 1999.
[13] J. Asmuth and M.L. Littman. Approaching Bayes-optimality using Monte-Carlo tree search. In Proc. Int. Conf. on Automated Planning & Scheduling, 2011.
[14] D.P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, 3rd edition, 2005.
[15] S. Gelly and D. Silver. Combining online and offline knowledge in UCT. In Proc. Int. Conf. on Machine Learning, 2007.
[16] R. He, E. Brunskill, and N. Roy. Efficient planning under uncertainty with macro-actions. J. Artificial Intelligence Research, 40(1):523-570, 2011.
[17] E.K.P. Chong, R.L. Givan, and H.S. Chang. A framework for simulation-based network control via hindsight optimization. In Proc. IEEE Conf. on Decision & Control, volume 2, pages 1433-1438, 2000.
[18] S.W. Yoon, A. Fern, R. Givan, and S. Kambhampati. Probabilistic planning via determinization in hindsight. In AAAI, pages 1010-1016, 2008.
[19] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78-150, 1992.
[20] T. Smith and R. Simmons. Heuristic search value iteration for POMDPs. In Proc. Uncertainty in Artificial Intelligence, pages 520-527, 2004.
[21] S. Ross and B. Chaib-Draa. AEMS: An anytime online search algorithm for approximate policy refinement in large POMDPs. In Proc. Int. Jnt. Conf. on Artificial Intelligence, pages 2592-2598. 2007.
[22] S.C.W. Ong, S.W. Png, D. Hsu, and W.S. Lee. Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robotics Research, 29(8):1053-1068, 2010.
-----1
[1] D. Bertsekas and S. Ioffe. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Technical report, MIT, 1996.
[2] D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[3] H. Burgiel. How to Lose at Tetris. Mathematical Gazette, 81:194-200, 1997.
[4] E. Demaine, S. Hohenberger, and D. Liben-Nowell. Tetris is hard, even to approximate. In Proceedings of the Ninth International Computing and Combinatorics Conference, pages 351-363, 2003.
[5] C. Fahey. Tetris AI, Computer plays Tetris, 2003. http://colinfahey.com/tetris/tetris.html.
[6] V. Farias and B. van Roy. Tetris: A study of randomized constraint sampling. Springer-Verlag, 2006.
[7] A. Fern, S. Yoon, and R. Givan. Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes. Journal of Artificial Intelligence Research, 25:75-118, 2006.
[8] T. Furmston and D. Barber. A unifying perspective of parametric policy search methods for Markov decision processes. In Proceedings of the Advances in Neural Information Processing Systems, pages 2726-2734, 2012.
[9] V. Gabillon, A. Lazaric, M. Ghavamzadeh, and B. Scherrer. Classification-based policy iteration with a critic. In Proceedings of ICML, pages 1049-1056, 2011.
[10] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9:159-195, 2001.
[11] S. Kakade. A natural policy gradient. In Proceedings of the Advances in Neural Information Processing Systems, pages 1531-1538, 2001.
[12] M. Lagoudakis and R. Parr. Reinforcement Learning as Classification: Leveraging Modern Classifiers. In Proceedings of ICML, pages 424-431, 2003.
[13] A. Lazaric, M. Ghavamzadeh, and R. Munos. Analysis of a Classification-based Policy Iteration Algorithm. In Proceedings of ICML, pages 607-614, 2010.
[14] M. Puterman and M. Shin. Modified policy iteration algorithms for discounted Markov decision problems. Management Science, 24(11), 1978.
[15] R. Rubinstein and D. Kroese. The cross-entropy method: A unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning. Springer-Verlag, 2004.
[16] B. Scherrer. Performance Bounds for λ-Policy Iteration and Application to the Game of Tetris. Journal of Machine Learning Research, 14:1175-1221, 2013.
[17] B. Scherrer, M. Ghavamzadeh, V. Gabillon, and M. Geist. Approximate modified policy iteration. In Proceedings of ICML, pages 1207-1214, 2012.
[18] I. Szita and A. Lőrincz. Learning Tetris Using the Noisy Cross-Entropy Method. Neural Computation, 18(12):2936-2941, 2006.
[19] C. Thiery and B. Scherrer. Building Controllers for Tetris. International Computer Games Association Journal, 32:3-11, 2009.
[20] C. Thiery and B. Scherrer. Improvements on Learning Tetris with Cross Entropy. International Computer Games Association Journal, 32, 2009.
[21] C. Thiery and B. Scherrer. MDPTetris features documentation, 2010. http://mdptetris.gforge.inria.fr/doc/feature_functions_8h.html.
[22] J. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94, 1996.
-----1
[1] Christopher G. Atkeson and Juan Carlos Santamaria. A comparison of direct and model-based reinforcement learning. In International Conference on Robotics and Automation, pages 3557-3564, 1997.
[2] Peter L Bartlett and Jonathan Baxter. Stochastic optimization of controlled partially observable Markov decision processes. In Proceedings of the 39th IEEE Conference on Decision and Control, volume 1, pages 124-129, 2000.
[3] Urszula Chajewska, Daphne Koller, and Ronald Parr. Making rational decisions using adaptive utility elicitation. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 363-369, 2000.
[4] Levente Kocsis and Csaba Szepesvari. Bandit based Monte-Carlo planning. In Machine Learning: ECML, pages 282-293. Springer, 2006.
[5] George Konidaris and Andrew Barto. Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd International Conference on Machine learning, pages 489-496, 2006.
[6] George Konidaris and Andrew G Barto. Building portable options: Skill transfer in reinforcement learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, volume 2, pages 895-900, 2007.
[7] Alessandro Lazaric, Marcello Restelli, and Andrea Bonarini. Transfer of samples in batch reinforcement learning. In Proceedings of the 25th International Conference on Machine learning, pages 544-551, 2008.
[8] Yaxin Liu and Peter Stone. Value-function-based transfer for reinforcement learning using structure mapping. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, volume 21(1), page 415, 2006.
[9] Sriraam Natarajan and Prasad Tadepalli. Dynamic preferences in multi-criteria reinforcement learning. In Proceedings of the 22nd International Conference on Machine learning, 2005.
[10] Andrew Y. Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 278-287, 1999.
[11] Andrew Y Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 663-670, 2000.
[12] Theodore J Perkins and Doina Precup. Using options for knowledge transfer in reinforcement learning. University of Massachusetts, Amherst, MA, USA, Tech. Rep, 1999.
[13] Satinder Singh, Richard L Lewis, Andrew G Barto, and Jonathan Sorg. Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development, 2(2):70-82, 2010.
[14] Jonathan Sorg, Satinder Singh, and Richard L Lewis. Reward design via online gradient ascent. Advances in Neural Information Processing Systems, 23, 2010.
[15] Jonathan Sorg, Satinder Singh, and Richard L Lewis. Optimal rewards versus leaf-evaluation heuristics in planning agents. In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence, 2011.
[16] Fumihide Tanaka and Masayuki Yamamura. Multitask reinforcement learning on the distribution of MDPs. In Proceedings IEEE International Symposium on Computational Intelligence in Robotics and Automation, volume 3, pages 1108-1113, 2003.
[17] Matthew E Taylor, Nicholas K Jong, and Peter Stone. Transferring instances for model-based reinforcement learning. In Machine Learning and Knowledge Discovery in Databases, pages 488-505. 2008.
[18] Matthew E Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. The Journal of Machine Learning Research, 10:1633-1685, 2009.
[19] Matthew E Taylor, Shimon Whiteson, and Peter Stone. Transfer via inter-task mappings in policy search reinforcement learning. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, page 37, 2007.
[20] Lisa Torrey and Jude Shavlik. Policy transfer via Markov logic networks. In Inductive Logic Programming, pages 234-248. Springer, 2010.
[21] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279-292, 1992.
-----1
[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. In CVPR, pages 798-805, 2006.
[2] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174-188, 2002.
[3] B. Babenko, M. Yang, and S. Belongie. Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1619-1632, 2011.
[4] C. Bao, Y. Wu, H. Ling, and H. Ji. Real time robust L1 tracker using accelerated proximal gradient approach. In CVPR, pages 1830-1837, 2012.
[5] A. Doucet, D. N. Freitas, and N. Gordon. Sequential Monte Carlo Methods In Practice. Springer, New York, 2001.
[6] H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. In BMVC, pages 47-56, 2006.
[7] H. Grabner, C. Leistner, and H. Bischof. Semi-supervised on-line boosting for robust tracking. In ECCV, pages 234-247, 2008.
[8] S. Hare, A. Saffari, and P. H. Torr. Struck: Structured output tracking with kernels. In ICCV, pages 263-270, 2011.
[9] G. Hinton. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade, pages 599-619. 2012.
[10] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6):82-97, 2012.
[11] X. Jia, H. Lu, and M. Yang. Visual tracking via adaptive structural local sparse appearance model. In CVPR, pages 1822-1829, 2012.
[12] Z. Kalal, J. Matas, and K. Mikolajczyk. P-N learning: Bootstrapping binary classifiers by structural constraints. In CVPR, pages 49-56, 2010.
[13] Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1409-1422, 2012.
[14] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106-1114, 2012.
[15] J. Kwon and K. Lee. Visual tracking decomposition. In CVPR, pages 1269-1276, 2010.
[16] X. Mei and H. Ling. Robust visual tracking using l1 minimization. In ICCV, pages 1436-1443, 2009.
[17] B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311-3326, 1997.
[18] D. Ross, J. Lim, R. Lin, and M. Yang. Incremental learning for robust visual tracking. International Journal of Computer Vision, 77(1):125-141, 2008.
[19] A. Torralba, R. Fergus, and W. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958-1970, 2008.
[20] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11:3371-3408, 2010.
[21] D. Wang, H. Lu, and M. Yang. Online object tracking with sparse prototypes. IEEE Transactions on Image Processing, 22(1), 2013.
[22] Q. Wang, F. Chen, J. Yang, W. Xu, and M. Yang. Transferring visual prior for online object tracking. IEEE Transactions on Image Processing, 21(7):3296-3305, 2012.
[23] Y. Wu, J. Lim, and M. Yang. Online object tracking: A benchmark. In CVPR, 2013.
[24] K. Zhang, L. Zhang, and M.-H. Yang. Real-time compressive tracking. In ECCV, pages 864-877, 2012.
[25] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Low-rank sparse learning for robust visual tracking. In ECCV, pages 470-484, 2012.
[26] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust visual tracking via multi-task sparse learning. In CVPR, pages 2042-2049, 2012.
-----1
[1] M. Bethge. Factorial coding of natural images: how effective are linear models in removing higher-order dependencies? 23(6):1253-1268, June 2006.
[2] Michael J. Black and P. Anandan. A framework for the robust estimation of optical flow. In ICCV, pages 231-236, 1993.
[3] Daniel J. Butler, Jonas Wulff, Garrett B. Stanley, and Michael J. Black. A naturalistic open source movie for optical flow evaluation. In ECCV (6), pages 611-625, 2012.
[4] David J. Fleet, Michael J. Black, Yaser Yacoob, and Allan D. Jepson. Design and use of linear models for image motion analysis. International Journal of Computer Vision, 36(3):171-193, 2000.
[5] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, pages 3354-3361, 2012.
[6] Berthold KP Horn and Brian G Schunck. Determining optical flow. Artificial intelligence, 17(1):185-203, 1981.
[7] Bruce D Lucas, Takeo Kanade, et al. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th international joint conference on Artificial intelligence, 1981.
[8] Xiaofeng Ren. Local grouping for optical flow. In CVPR, 2008.
[9] Stefan Roth and Michael J. Black. On the spatial statistics of optical flow. International Journal of Computer Vision, 74(1):33-50, 2007.
[10] J Sohl-Dickstein and BJ Culpepper. Hamiltonian annealed importance sampling for partition function estimation. 2011.
[11] Deqing Sun, Stefan Roth, and Michael J Black. Secrets of optical flow estimation and their principles. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2432-2439. IEEE, 2010.
[12] Li Xu, Zhenlong Dai, and Jiaya Jia. Scale invariant optical flow. In Computer Vision - ECCV 2012, pages 385-399. Springer, 2012.
[13] Henning Zimmer, Andres Bruhn, and Joachim Weickert. Optic flow in harmony. International Journal of Computer Vision, 93(3):368-388, 2011.
[14] Daniel Zoran and Yair Weiss. Natural images, gaussian mixtures and dead leaves. In NIPS, pages 1745-1753, 2012.
-----1
[1] Jonas August and Steven W Zucker. The curve indicator random field: Curve organization via edge correlation. In Perceptual organization for artificial vision systems, pages 265-288. Springer, 2000.
[2] A.J. Bell and T.J. Sejnowski. The independent components of natural scenes are edge filters. Vision research, 37(23):3327-3338, 1997.
[3] O. Ben-Shahar and S. Zucker. Geometrical computations explain projection patterns of long-range horizontal connections in visual cortex. Neural Computation, 16(3):445-476, 2004.
[4] William H Bosking, Ying Zhang, Brett Schofield, and David Fitzpatrick. Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. The Journal of Neuroscience, 17(6):2112-2127, 1997.
[5] Heather J. Chisum, François Mooser, and David Fitzpatrick. Emergent properties of layer 2/3 neurons reflect the collinear arrangement of horizontal connections in tree shrew visual cortex. The Journal of Neuroscience, 23(7):2947-2960, 2003.
[6] James H Elder and Richard M Goldberg. Ecological statistics of gestalt laws for the perceptual organization of contours. Journal of Vision, 2(4), 2002.
[7] J.H. Elder and RM Goldberg. The statistics of natural image contours. In Proceedings of the IEEE Workshop on Perceptual Organisation in Computer Vision. Citeseer, 1998.
[8] WS Geisler, JS Perry, BJ Super, and DP Gallogly. Edge co-occurrence in natural images predicts contour grouping performance. Vision research, 41(6):711-724, 2001.
[9] Bruce C Hansen and Edward A Essock. A horizontal bias in human visual processing of orientation and its correspondence to the structural components of natural scenes. Journal of Vision, 4(12), 2004.
[10] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings: Biological Sciences, 265(1394):359-366, Mar 1998.
[11] Norbert Kruger. Collinearity and parallelism are statistically significant second-order relations of complex cell responses. Neural Processing Letters, 8(2):117-129, 1998.
[12] Ifije E Ohiorhenuan and Jonathan D Victor. Information-geometric measure of 3-neuron firing patterns characterizes scale-dependence in cortical networks. Journal of computational neuroscience, 30(1):125-141, 2011.
[13] Bruno A Olshausen et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607-609, 1996.
[14] T.K. Sato, I. Nauhaus, and M. Carandini. Traveling waves in visual cortex. Neuron, 75(2):218-229, 2012.
[15] Elad Schneidman, Michael J Berry, Ronen Segev, and William Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007-1012, 2006.
[16] Gašper Tkačik, Jason S Prentice, Jonathan D Victor, and Vijay Balasubramanian. Local statistics in natural scenes predict the saliency of synthetic textures. Proceedings of the National Academy of Sciences, 107(42):18149-18154, 2010.
[17] JH Van Hateren. A theory of maximizing sensory information. Biological cybernetics, 68(1):23-29, 1992.
[18] William E Vinje and Jack L Gallant. Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287(5456):1273-1276, 2000.
-----1
[1] D. Mumford and B. Gidas. Stochastic models for generic images. Q. Appl. Math., 59:85-111, 2001.
[2] J. Lucke, R. Turner, M. Sahani, and M. Henniges. Occlusive Components Analysis. NIPS, 22:1069-77, 2009.
[3] Nicolas LeRoux, Nicolas Heess, Jamie Shotton, and John Winn. Learning a generative model of images by factoring appearance and shape. Neural Computation, 23:593-650, 2011.
[4] D. Zoran and Y. Weiss. Natural images, gaussian mixtures and dead leaves. In NIPS, pages 1745-1753. 2012.
[5] J. Bornschein, M. Henniges, and J. Lucke. Are V1 receptive fields shaped by low-level visual occlusions? A comparative study. PLoS Computational Biology, 9(6):e1003062, June 2013.
[6] G. E. Hinton. A parallel computation that assigns canonical object-based frames of reference. In Proc. IJCAI, pages 683-685, 1981.
[7] C. H. Anderson and D. C. Van Essen. Shifter circuits: a computational strategy for dynamic aspects of visual processing. PNAS, 84(17):6297-6301, 1987.
[8] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3):300-311, 1993.
[9] A. Hyvarinen and P. Hoyer. Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7):1705-20, July 2000.
[10] D.W. Arathorn. Map-seeking circuits in visual cognition: a computational mechanism for biological and machine vision. 2002.
[11] J. Lucke, C. Keck, and C. von der Malsburg. Rapid convergence to feature layer correspondences. Neural Computation, 20(10):2441-2463, 2008.
[12] Y. LeCun, K. Kavukcuoglu, and C. Farabet. Convolutional networks and applications in vision. Proceedings of 2010 IEEE International Symposium on Circuits and Systems, pages 253-6, 2010.
[13] N. Jojic and B. Frey. Learning flexible sprites in video layers. In CVPR, 2001.
[14] C. K. I. Williams and M. K. Titsias. Greedy learning of multiple objects in images using robust statistics and factorial learning. Neural Computation, 16:1039-62, 2004.
[15] J. B. Tenenbaum and W. T. Freeman. Separating Style and Content with Bilinear Models. Neural Computation, 12(6):1247-83, June 2000.
[16] P. Berkes, R. E. Turner, and M. Sahani. A structured model of video reproduces primary visual cortical organisation. PLoS computational biology, 5(9):e1000495, 2009.
[17] C. F. Cadieu and B. A. Olshausen. Learning intermediate-level representations of form and motion from natural movies. Neural Computation, 24(4):827-866, April 2012.
[18] K. Kavukcuoglu, P. Sermanet, Y.L. Boureau, K. Gregor, M. Mathieu, and Y. LeCun. Learning convolutional feature hierarchies for visual recognition. NIPS, 23:14, 2010.
[19] K. Gregor and Y. LeCun. Efficient learning of sparse invariant representations. CoRR, abs/1105.5307, 2011.
[20] J. Eggert, H. Wersing, and E. Korner. Transformation-invariant representation and NMF. In 2004 IEEE International Joint Conference on Neural Networks, pages 2535-39, 2004.
[21] Z. Dai and J. Lucke. Unsupervised learning of translation invariant occlusive components. In CVPR, pages 2400-2407. 2012.
[22] J. Lucke and J. Eggert. Expectation truncation and the benefits of preselection in training generative models. Journal of Machine Learning Research, 11:2855-900, 2010.
[23] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings of the Royal Society of London B, 265:359-66, 1998.
[24] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019-1025, 1999.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, volume 25, pages 1106-1114, 2012.
[26] M. Rehn and F. T. Sommer. A network that uses few active neurones to code visual input predicts the diverse shapes of cortical receptive fields. Journal of Computational Neuroscience, 22(2):135-46, 2007.
[27] J. Lucke. Receptive field self-organization in a model of the fine-structure in V1 cortical columns. Neural Computation, 21(10):2805-45, 2009.
[28] D. L. Ringach. Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. Journal of Neurophysiology, 88:455-63, 2002.
[29] G. Puertas, J. Bornschein, and J. Lucke. The maximal causes of natural scenes are edge filters. In NIPS, volume 23, pages 1939-1947. 2010.
[30] A. Szlam, K. Kavukcuoglu, and Y. LeCun. Convolutional matching pursuit and dictionary training. arXiv preprint arXiv:1010.0422, 2010.
[31] B. A. Olshausen, C. F. Cadieu, and D. K. Warland. Learning real and complex overcomplete representations from the statistics of natural images. volume 7446, page 74460S. SPIE, 2009.
[32] B. A. Olshausen. Highly overcomplete sparse coding. In Proc. of HVEI, page 86510S, 2013.
[33] M. Eickenberg, R.J. Rowekamp, M. Kouh, and T.O. Sharpee. Characterizing responses of translation-invariant neurons to natural stimuli: maximally informative invariant dimensions. Neural Comput., 24(9):2384-421, 2012.
[34] B. Vintch, A. Zaharia, J.A. Movshon, and E.P. Simoncelli. Efficient and direct estimation of a neural subunit model for sensory coding. In Proc. of NIPS, pages 3113-3121, 2012.
[35] B. A. Olshausen, C. H. Anderson, and D. C. Van Essen. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. The Journal of Neuroscience, 13(11):4700-4719, 1993.
[36] R. Memisevic and G. E. Hinton. Learning to represent spatial transformations with factored higher-order Boltzmann machines. Neural Computation, 22(6):1473-1492, 2010.
[37] M.J.D. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7(2):155-162, 1964.
-----1
[1] E. Bazavan, F. Li, and C. Sminchisescu. Fourier kernel learning. In European Conference on Computer Vision, 2012.
[2] A. Borji and L. Itti. State-of-the-art in visual attention modelling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 2011.
[3] G. T. Buswell. How People Look at Pictures: A Study of the Psychology of Perception in Art. Chicago University Press, 1935.
[4] N. J. Butko and J. R. Movellan. Infomax control of eye movements. IEEE Transactions on Autonomous Mental Development, 2:91-107, 2010.
[5] M. S. Castelhano, M. L. Mack, and J. M. Henderson. Viewing task influences eye movement control during active scene perception. Journal of Vision, 9, 2008.
[6] M. Cerf, E. P. Frady, and C. Koch. Faces and text attract gaze independent of the task: Experimental data and computer model. Journal of Vision, 9, 2009.
[7] M. Cerf, J. Harel, W. Einhauser, and C. Koch. Predicting human gaze using low-level saliency combined with face detection. In Advances in Neural Information Processing Systems, 2007.
[8] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE International Conference on Computer Vision and Pattern Recognition, 2005.
[9] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1977.
[10] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
[11] L. Itti and C. Koch. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 2000.
[12] T. Judd, F. Durand, and A. Torralba. Fixations on low resolution images. In IEEE International Conference on Computer Vision, 2009.
[13] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. In IEEE International Conference on Computer Vision, 2009.
[14] K. A. Ehinger, B. Sotelo, A. Torralba, and A. Oliva. Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 2009.
[15] W. Kienzle, B. Scholkopf, F. Wichmann, and M. Franz. How to find interesting locations in video: a spatiotemporal interest point detector learned from human eye movements. In DAGM, 2007.
[16] M. F. Land and B. W. Tatler. Looking and Acting. Oxford University Press, 2009.
[17] F. Li, G. Lebanon, and C. Sminchisescu. Chebyshev approximations to the histogram χ² kernel. In IEEE International Conference on Computer Vision and Pattern Recognition, 2012.
[18] E. Marinoiu, D. Papava, and C. Sminchisescu. Pictorial human spaces: How well do humans perceive a 3d articulated pose? In IEEE International Conference on Computer Vision, 2013.
[19] S. Mathe and C. Sminchisescu. Dynamic eye movement datasets and learnt saliency models for visual action recognition. In European Conference on Computer Vision, 2012.
[20] L. W. Renninger, J. Coughlan, P. Verghese, and J. Malik. An information maximization model of eye movements. In Advances in Neural Information Processing Systems, pages 1121-1128, 2004.
[21] R. Subramanian, H. Katti, N. Sebe, M. Kankanhalli, and T.-S. Chua. An eye fixation database for saliency detection in images. In European Conference on Computer Vision, 2010.
[22] A. Torralba, A. Oliva, M. Castelhano, and J. Henderson. Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113, 2006.
[23] E. Vig, M. Dorr, and D. D. Cox. Space-variant descriptor sampling for action recognition based on saliency and eye movements. In European Conference on Computer Vision, 2012.
[24] S. Winkler and R. Subramanian. Overview of eye tracking datasets. In International Workshop on Quality of Multimedia Experience, 2013.
[25] A. Yarbus. Eye Movements and Vision. New York Plenum Press, 1967.
[26] K. Yun, Y. Pen, D. Samaras, G. J. Zelinsky, and T. L. Berg. Studying relationships between human gaze, description and computer vision. In IEEE International Conference on Computer Vision and Pattern Recognition, 2013.
[27] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In AAAI Conference on Artificial Intelligence, 2008.
-----1
[1] M. Blaschko and C. Lampert. Learning to localize objects with structured output regression. ECCV, 2008.
[2] C. Chen and K. Grauman. Efficient activity detection with max-subgraph search. In CVPR, 2012.
[3] K. G. Derpanis, M. Sizintsev, K. Cannons, and R. P. Wildes. Efficient action spotting based on a spacetime oriented structure representation. In CVPR, 2010.
[4] D. V. Essen, B. Olshausen, C. Anderson, and J. Gallant. Pattern recognition, attention, and information bottlenecks in the primate visual system. SPIE Conference on Visual Information Processing: From Neurons to Chips, 1991.
[5] A. Fathi, Y. Li, and J. M. Rehg. Learning to recognize daily actions using gaze. In ECCV, 2012.
[6] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE PAMI, 2010.
[7] Y. Ke, R. Sukthankar, and M. Hebert. Event detection in crowded videos. In ICCV, 2007.
[8] A. Kovashka and K. Grauman. Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition. In CVPR, 2010.
[9] T. Lan, Y. Wang, and G. Mori. Discriminative figure-centric models for joint action localization and recognition. In ICCV, 2011.
[10] I. Laptev. On space-time interest points. IJCV, 64, 2005.
[11] S. Mathe and C. Sminchisescu. Dynamic eye movement datasets and learnt saliency models for visual action recognition. In ECCV, 2012.
[12] M. Raptis, I. Kokkinos, and S. Soatto. Discovering discriminative action parts from mid-level video representations. In CVPR, 2012.
[13] M. Raptis and L. Sigal. Poselet key-framing: A model for human activity recognition. In CVPR, 2013.
[14] M. Rodriguez, J. Ahmed, and M. Shah. Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In CVPR, 2008.
[15] M. Ryoo and J. Aggarwal. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In ICCV, 2009.
[16] N. Shapovalova, A. Vahdat, K. Cannons, T. Lan, and G. Mori. Similarity constrained latent support vector machine: An application to weakly supervised action classification. In ECCV, 2012.
[17] P. Siva and T. Xiang. Weakly supervised action detection. In BMVC, 2011.
[18] D. Tran and J. Yuan. Optimal spatio-temporal path discovery for video event detection. In CVPR, 2011.
[19] D. Tran and J. Yuan. Max-margin structured output regression for spatio-temporal action localization. In NIPS, 2012.
[20] P. Turaga, R. Chellappa, V. Subrahmanian, and O. Udrea. Machine recognition of human activities: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 18(11):1473-1488, 2008.
[21] E. Vig, M. Dorr, and D. Cox. Space-variant descriptor sampling for action recognition based on saliency and eye movements. In ECCV, 2012.
[22] H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2009.
[23] Y. Wang and G. Mori. Hidden part models for human action recognition: Probabilistic vs. max-margin. IEEE PAMI, 2010.
[24] D. Weinland, R. Ronfard, and E. Boyer. A survey of vision-based methods for action representation, segmentation and recognition. Computer Vision and Image Understanding, 115(2):224-241, 2011.
[25] J. Yang and M.-H. Yang. Top-down visual saliency via joint crf and dictionary learning. In CVPR, 2012.
[26] C.-N. J. Yu and T. Joachims. Learning structural svms with latent variables. In ICML, 2009.
[27] J. Yuan, Z. Liu, and Y. Wu. Discriminative subvolume search for efficient action detection. In CVPR, 2009.
[28] J. Yuan, Z. Liu, and Y. Wu. Discriminative video pattern search for efficient action detection. IEEE PAMI, 33(9), 2011.
-----1
[1] Barron, J.T. & Malik, J. (2012) Shape, albedo, and illumination from a single image of an unknown object. In IEEE CVPR, pp. 334-341. Providence, USA.
[2] Barron, J.T. & Malik, J. (2012) Color constancy, intrinsic images, and shape estimation. In ECCV, pp. 57-70. Florence, Italy.
[3] Barron, J.T. & Malik, J. (2012) High-frequency shape and albedo from shading using natural image statistics. In IEEE CVPR, pp. 2521-2528. CO, USA.
[4] Barron, J., & Malik, J. (2013) Intrinsic scene properties from a single RGB-D image. In IEEE CVPR.
[5] Gehler, P.V., Rother, C., Kiefel, M., Zhang, L. & Bernhard, S. (2011) Recovering intrinsic images with a global sparsity prior on reflectance. In NIPS, pp. 765-773. Granada, Spain.
[6] Farhadi, A., Endres, I., Hoiem, D. & Forsyth D.A., (2009) Describing objects by their attributes. In IEEE CVPR, pp. 1778-1785. Miami, USA.
[7] Kohli, P., Kumar, M.P., & Torr, P.H.S. (2009) P³ & beyond: move making algorithms for solving higher order functions. In IEEE PAMI, pp. 1645-1656.
[8] Ladicky, L., Sturgess, P., Russell C., Sengupta, S., Bastnlar, Y., Clocksin, W.F., & Torr P.H.S. (2012) Joint optimization for object class segmentation and dense stereo reconstruction. In IJCV, pp. 739-746.
[9] Ladicky, L., Russell C., Kohli P. & Torr P.H.S., (2009) Associative hierarchical CRFs for object class image segmentation. In IEEE ICCV, pp. 739-746. Kyoto, Japan.
[10] Sloan, P.P., Kautz, J., & Snyder, J., (2002) Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. In SIGGRAPH, pp. 527-536.
[11] Vineet, V., Warrell J., & Torr P.H.S., (2012) Filter-based mean-field inference for random fields with higher-order terms and product label-spaces. In IEEE ECCV, pp. 31-44. Florence, Italy.
[12] Krahenbuhl P. & Koltun V., (2011) Efficient inference in fully connected CRFs with Gaussian edge potentials. In IEEE NIPS, pp. 109-117. Granada, Spain.
[13] Barrow, H.G. & Tenenbaum, J.M. (1978) Recovering intrinsic scene characteristics from images. In A. Hanson and E. Riseman, editors, Computer Vision Systems, pp. 3-26. Academic Press, 1978.
[14] Weijer, J.V.d., Schmid, C. & Verbeek, J. (2007) Using high-level visual information for color constancy. In IEEE, ICCV, pp. 1-8.
[15] Liu, C., Sharan, L., Adelson, E.H., & Rosenholtz, R. (2010) Exploring features in a bayesian framework for material recognition. In IEEE, CVPR, pp. 239-246.
[16] Horn, B.K.P. (1970) Shape from shading: a method for obtaining the shape of a smooth opaque object from one view. Technical Report, MIT.
[17] Land, E.H., & McCann, J.J. (1971) Lightness and retinex theory. In JOSA.
[18] Osadchy, M., Jacobs, D.W. & Ramamoorthi, R. (2003) Using specularities for recognition. In IEEE ICCV.
[19] Adelson, E.H. (2000) Lightness perception and lightness illusions. The New Cognitive Neuroscience, 2nd Ed. MIT Press, pp. 339-351.
[20] Adelson, E.H. (2001) On seeing stuff: the perception of materials by humans and machines. SPIE, vol. 4299, pp. 1-12.
[21] Felzenszwalb, P.F., & Huttenlocher, D.P. (2004) Efficient graph-based image segmentation. In IJCV.
[22] Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2003) TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context. In IEEE IJCV.
[23] Tighe, J. & Lazebnik, S. (2011) Understanding scenes on many levels. In IEEE ICCV pp. 335-342.
[24] LeCun, Y., Huang, F.J., & Bottou, L. (2004) Learning methods for generic object recognition with invariance to pose and lighting. In IEEE CVPR pp. 97-104.
[25] Silberman, N., Hoiem, D., Kohli, P., & Fergus, R. (2012) Indoor segmentation and support inference from RGBD images. In IEEE ECCV pp. 746-760.
[26] Everingham, M., Gool, L.J.V., Williams, C.K.I., Winn, J.M. & Zisserman, A. (2010) The pascal visual object classes (VOC) challenge. In IEEE IJCV pp. 303-338.
[27] Cheng, M. M., Zheng, S., Lin, W.Y., Warrell, J., Vineet, V., Sturgess, P., Mitra, N., Crook, N., & Torr, P.H.S. (2013) ImageSpirit: Verbal Guided Image Parsing. Oxford Brookes Technical Report.
[28] Domj, Q. T., Necoara, I., & Diehl, M. (2013) Fast Inexact Decomposition Algorithms for Large-Scale Separable Convex Optimization. In JOTA.
[29] Kohli, P., Ladicky, L., & Torr, P.H.S. (2008) on. In IEEE CVPR, 2008.
-----1
[1] Y. Amit and D. Geman. Randomized inquiries about shape; an application to handwritten digit recognition. Technical Report 401, Dept. of Statistics, University of Chicago, IL, Nov 1994.
[2] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705-1749, Oct. 2005.
[3] D. Benbouzid, R. Busa-Fekete, and B. Kegl. Fast classification using sparse decision DAGs. In Proc. Int'l Conf. on Machine Learning (ICML), New York, NY, USA, 2012. ACM.
[4] K. P. Bennett, N. Cristianini, J. Shawe-Taylor, and D. Wu. Enlarging the margins in perceptron decision trees. Machine Learning, 41(3):295-313, 2000.
[5] L. Breiman. Random forests. Machine Learning, 45(1), 2001.
[6] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Chapman and Hall/CRC, 1984.
[7] H. Chipman, E. I. George, and R. E. Mcculloch. Bayesian CART model search. Journal of the American Statistical Association, 93:935-960, 1997.
[8] P. Chou. Optimal partitioning for classification and regression trees. IEEE Trans. PAMI, 13(4), 1991.
[9] A. Criminisi and J. Shotton. Decision Forests for Computer Vision and Medical Image Analysis. Springer, 2013.
[10] T. Elomaa and M. Kaariainen. On the practice of branching program boosting. In European Conf. on Machine Learning (ECML), 2001.
[11] M. Everingham, L. van Gool, C. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes (VOC) Challenge. http://www.pascal-network.org/challenges/VOC/.
[12] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In Proc. IEEE ICCV, 2009.
[13] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.
[14] E. B. Hunt, J. Marin, and P. T. Stone. Experiments in Induction. Academic Press, New York, 1966.
[15] L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15-17, 1976.
[16] B. Kijsirikul, N. Ussivakul, and S. Meknavin. Adaptive directed acyclic graphs for multiclass classification. In Pacific Rim Int'l Conference on Artificial Intelligence (PRICAI), 2002.
[17] R. Kohavi and C.-H. Li. Oblivious decision trees, graphs, and top-down pruning. In Int'l Joint Conf. on Artificial Intelligence (IJCAI), 1995.
[18] P. Kontschieder, P. Kohli, J. Shotton, and A. Criminisi. GeoF: Geodesic forests for learning coupled predictors. In Proc. IEEE CVPR, 2013.
[19] V. Lepetit and P. Fua. Keypoint recognition using randomized trees. IEEE Trans. PAMI, 2006.
[20] J. Mahoney and R. J. Mooney. Initializing ID5R with a domain theory: some negative results. Technical Report 91-154, Dept. of Computer Science, University of Texas, Austin, TX, 1991.
[21] K. V. S. Murthy and S. L. Salzberg. On growing better decision trees from data. PhD thesis, Johns Hopkins University, 1995.
[22] D. Newman, S. Hettich, C. Blake, and C. Merz. UCI repository of machine learning databases. Technical Report 28, University of California, Irvine, Department of Information and Computer Science, 1998.
[23] A. L. Oliveira and A. Sangiovanni-Vincentelli. Using the minimum description length principle to infer reduced ordered decision graphs. Machine Learning, 12, 1995.
[24] J. J. Oliver. Decision graphs - an extension of decision trees. Technical Report 92/173, Dept. of Computer Science, Monash University, Victoria, Australia, 1992.
[25] A. H. Peterson and T. R. Martinez. Reducing decision trees ensemble size using parallel decision DAGs. Int'l Journ. on Artificial Intelligence Tools, 18(4), 2009.
[26] J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Proc. NIPS, pages 547-553, 2000.
[27] J. R. Quinlan. Induction of decision trees. Machine Learning, 1986.
[28] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
[29] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, and A. Blake. Efficient human pose estimation from single depth images. IEEE Trans. PAMI, 2013.
-----1
[1] Jiang, J.: A literature survey on domain adaptation of statistical classifiers. (2008) 
[2] Caruana, R.: Multitask Learning. Machine Learning 28 (1997) 
[3] Evgeniou, T., Micchelli, C., Pontil, M.: Learning Multiple Tasks with Kernel Methods. JMLR 6 (2005) 
[4] Bach, F.R., Jordan, M.I.: Kernel Independent Component Analysis. JMLR 3 (2002) 1-48
[5] Ek, C.H., Torr, P.H., Lawrence, N.D.: Ambiguity Modelling in Latent Spaces. In: MLMI. (2008)
[6] Salzmann, M., Ek, C.H., Urtasun, R., Darrell, T.: Factorized Orthogonal Latent Spaces. In: AISTATS. (2010) 
[7] Memisevic, R., Sigal, L., Fleet, D.J.: Shared Kernel Information Embedding for Discriminative Inference. PAMI (April 2012) 778-790
[8] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer (2001) 
[9] Zheng, Z., Zha, H., Zhang, T., Chapelle, O., Sun, G.: A General Boosting Method and Its Application to Learning Ranking Functions for Web Search. In: NIPS. (2007) 
[10] Chapelle, O., Shivaswamy, P., Vadrevu, S., Weinberger, K., Zhang, Y., Tseng, B.: Boosted Multi-Task Learning. Machine Learning (2010) 
[11] Turetken, E., Benmansour, F., Fua, P.: Automated Reconstruction of Tree Structures Using Path Classifiers and Mixed Integer Programming. In: CVPR. (June 2012) 
[12] Baxter, J.: A Model of Inductive Bias Learning. Journal of Artificial Intelligence Research (2000) 
[13] Ando, R.K., Zhang, T.: A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. JMLR 6 (2005) 1817-1853
[14] Daume, H.: Bayesian Multitask Learning with Latent Hierarchies. In: UAI. (2009) 
[15] Kumar, A., Daume, H.: Learning Task Grouping and Overlap in Multi-task Learning. In: ICML. (2012) 
[16] Xue, Y., Liao, X., Carin, L., Krishnapuram, B.: Multi-task Learning for Classification with Dirichlet Process Priors. JMLR 8 (2007) 
[17] Jacob, L., Bach, F., Vert, J.P.: Clustered Multi-task Learning: a Convex Formulation. In: NIPS. (2008) 
[18] Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting Visual Category Models to New Domains. In: ECCV. (2010)
[19] Shon, A.P., Grochow, K., Hertzmann, A., Rao, R.P.N.: Learning Shared Latent Structure for Image Synthesis and Robotic Imitation. In: NIPS. (2006) 1233-1240
[20] Kulis, B., Saenko, K., Darrell, T.: What You Saw is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms. In: CVPR. (2011) 
[21] Gopalan, R., Li, R., Chellappa, R.: Domain Adaptation for Object Recognition: An Unsupervised Approach. In: ICCV. (2011)
[22] Rosset, S., Zhu, J., Hastie, T.: Boosting as a Regularized Path to a Maximum Margin Classifier. JMLR (2004)
[23] Caruana, R., Niculescu-Mizil, A.: An Empirical Comparison of Supervised Learning Algorithms. In: ICML. (2006)
[24] Viola, P., Jones, M.: Rapid Object Detection Using a Boosted Cascade of Simple Features. In: CVPR. (2001) 
[25] Ali, K., Fleuret, F., Hasler, D., Fua, P.: A Real-Time Deformable Detector. PAMI 34(2) (February 2012) 225-239
[26] Freund, Y., Schapire, R.: A Short Introduction to Boosting. Journal of Japanese Society for Artificial Intelligence 14(5) (1999) 771-780
[27] Becker, C., Ali, K., Knott, G., Fua, P.: Learning Context Cues for Synapse Segmentation. TMI (2013) In Press.
-----1
[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE TPAMI, 2012.
[2] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE TPAMI, 2010.
[3] E. Bertin. Global fluctuations and Gumbel statistics. Physical Review Letters, 2005.
[4] E. Bertin and M. Clusel. Generalised extreme value statistics and sum of correlated variables. Journal of Physics A, 2006.
[5] M. J. Bravo and H. Farid. Search for a category target in clutter. Perception, 2004.
[6] M. J. Bravo and H. Farid. A scale invariant measure of clutter. Journal of Vision, 2008.
[7] G. J. Burghouts, A. W. M. Smeulders, and J.-M. Geusebroek. The distribution family of similarity distances. In NIPS, 2007.
[8] J. Carreira and C. Sminchisescu. Constrained parametric min-cuts for automatic object segmentation. In CVPR, 2010.
[9] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE TPAMI, 2002.
[10] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[11] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. IJCV, 2004.
[12] J.-M. Geusebroek and A. W. Smeulders. A six-stimulus theory for stochastic texture. IJCV, 2005.
[13] J. M. Henderson, M. Chanceaux, and T. J. Smith. The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements. Journal of Vision, 2009.
[14] A. Ion, J. Carreira, and C. Sminchisescu. Image segmentation by figure-ground composition into maximal cliques. In ICCV, 2011.
[15] A. J. Izenman. Recent developments in nonparametric density estimation. Journal of the American Statistical Association, 1991.
[16] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 1998.
[17] E. Levina and P. Bickel. The earth mover's distance is the Mallows distance: some insights from statistics. In ICCV, 2001.
[18] M. C. Lohrenz, J. G. Trafton, R. M. Beck, and M. L. Gendron. A model of clutter for complex, multivariate geospatial displays. Human Factors, 2009.
[19] M. L. Mack and A. Oliva. Computational estimation of visual complexity. In the 12th Annual Object, Perception, Attention, and Memory Conference, 2004.
[20] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
[21] M. B. Neider and G. J. Zelinsky. Cutting through the clutter: searching for targets in evolving complex scenes. Journal of Vision, 2011.
[22] J. A. Nelder and R. Mead. A simplex method for function minimization. The computer journal, 1965.
[23] S. R. Rao, H. Mobahi, A. Y. Yang, S. Sastry, and Y. Ma. Natural image segmentation with adaptive texture and boundary encoding. In ACCV, 2009.
[24] R. A. Rensink. Seeing, sensing, and scrutinizing. Vision Research, 2000.
[25] R. Rosenholtz, Y. Li, and L. Nakano. Measuring visual clutter. Journal of Vision, 2007.
[26] Y. Rubner, C. Tomasi, and L. J. Guibas. A metric for distributions with applications to image databases. In ICCV, 1998.
[27] T. Steihaug. The conjugate gradient method and trust regions in large scale optimization. SIAM Journal on Numerical Analysis, 1983.
[28] P. L. Toint. Towards an efficient sparsity exploiting newton method for minimization. Sparse Matrices and Their Uses, 1981.
[29] R. van den Berg, F. W. Cornelissen, and J. B. T. M. Roerdink. A crowding model of visual clutter. Journal of Vision, 2009.
[30] O. Veksler, Y. Boykov, and P. Mehrani. Superpixels and supervoxels in an energy optimization framework. In ECCV, 2010.
[31] M. Wischnewski, A. Belardinelli, W. X. Schneider, and J. J. Steil. Where to look next? Combining static and dynamic proto-objects in a TVA-based model of visual attention. Cognitive Computation, 2010.
[32] J. M. Wolfe. Visual search. Attention, 1998.
[33] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.
[34] A. Y. Yang, J. Wright, Y. Ma, and S. Sastry. Unsupervised segmentation of natural images via lossy data compression. CVIU, 2008.
[35] V. Yanulevskaya and J.-M. Geusebroek. Significance of the Weibull distribution and its sub-models in natural image statistics. In Int. Conference on Computer Vision Theory and Applications, 2009.
-----1
[1] H. E. Cetingul and R. Vidal. Intrinsic mean shift for clustering on Stiefel and Grassmann manifolds. In CVPR, 2009.
[2] Y. Cheng. Mean shift, mode seeking, and clustering. PAMI, 17(8):790-799, 1995.
[3] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In CVPR, 2000.
[4] D. Comaniciu, V. Ramesh, and P. Meer. The variable bandwidth mean shift and data-driven scale selection. In ICCV, 2001.
[5] C. Doersch, S. Singh, A. Gupta, J. Sivic, and A. A. Efros. What makes Paris look like Paris? SIGGRAPH, 2012.
[6] I. Endres, K. Shih, J. Jiaa, and D. Hoiem. Learning collections of part models for object recognition. In CVPR, 2013.
[7] D. F. Fouhey, A. Gupta, and M. Hebert. Data-driven 3D primitives for single image understanding. In ICCV, 2013.
[8] K. Fukunaga and L. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. Information Theory, 1975.
[9] B. Georgescu, I. Shimshoni, and P. Meer. Mean shift based clustering in high dimensions: A texture classification example. In CVPR, 2003.
[10] B. Hariharan, J. Malik, and D. Ramanan. Discriminative decorrelation for clustering and classification. In ECCV, 2012.
[11] A. Jain, A. Gupta, M. Rodriguez, and L. Davis. Representing videos using mid-level discriminative patches. In CVPR, 2013.
[12] M. Juneja, A. Vedaldi, C. V. Jawahar, and A. Zisserman. Blocks that shout: Distinctive parts for scene classification. In CVPR, 2013.
[13] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
[14] L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification and semantic feature sparsification. NIPS, 2010.
[15] Q. Li, J. Wu, and Z. Tu. Harvesting mid-level visual concepts from large-scale internet images. In CVPR, 2013.
[16] T. Malisiewicz and A. A. Efros. Recognition by association via learning per-exemplar distances. In CVPR, 2008.
[17] M. Pandey and S. Lazebnik. Scene recognition and weakly supervised object localization with deformable part-based models. In ICCV, 2011.
[18] S. N. Parizi, J. G. Oberlin, and P. F. Felzenszwalb. Reconfigurable models for scene recognition. In CVPR, 2012.
[19] A. Quattoni and A. Torralba. Recognizing indoor scenes. In CVPR, 2009.
[20] M. Radovanovic, A. Nanopoulos, and M. Ivanovic. Nearest neighbors in high-dimensional data: The emergence and influence of hubs. In ICML, 2009.
[21] B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In CVPR, 2006.
[22] F. Sadeghi and M. F. Tappen. Latent pyramidal regions for recognizing scenes. In ECCV, 2012.
[23] S. Singh, A. Gupta, and A. A. Efros. Unsupervised discovery of mid-level discriminative patches. In ECCV, 2012.
[24] J. Sivic and A. Zisserman. Video google: A text retrieval approach to object matching in videos. In ICCV, 2003.
[25] M. Sugiyama, T. Suzuki, and T. Kanamori. Density ratio estimation: A comprehensive review. RIMS Kokyuroku, 2010.
[26] J. Sun and J. Ponce. Learning discriminative part detectors for image classification and cosegmentation. In ICCV, 2013.
[27] X. Wang, B. Wang, X. Bai, W. Liu, and Z. Tu. Max-margin multiple-instance dictionary learning. In ICML, 2013.
[28] J. Wu and J. M. Rehg. Centrist: A visual descriptor for scene categorization. PAMI, 2011.
[29] L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. In NIPS, 2004.
[30] J. Zhu, L.-J. Li, L. Fei-Fei, and E. P. Xing. Large margin learning of upstream scene understanding models. NIPS, 2010.
-----1
[1] E. H. Adelson and J. R. Bergen. Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, Optics and Image Science, 2(2):284-99, 1985.
[2] K. R. Brooks, T. Morris, and P. Thompson. Contrast and stimulus complexity moderate the relationship between spatial frequency and perceived speed: Implications for MT models of speed perception. Journal of Vision, 11(14), 2011.
[3] M. Carandini and D. J. Heeger. Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1):51-62, 2012.
[4] M. O. Ernst and M. S. Banks. Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870):429-33, 2002.
[5] N. Graham and J. Nachmias. Detection of grating patterns containing two spatial frequencies: a comparison of single-channel and multiple-channel models. Vision Research, pages 251-259, 1971.
[6] D. J. Heeger. Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2):181-197, 1992.
[7] J. M. Hillis, S. J. Watt, M. S. Landy, and M. S. Banks. Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4(12):967-992, 2004.
[8] F. Hurlimann, D. C. Kiper, and M. Carandini. Testing the Bayesian model of perceived speed. Vision Research, 42:2253-2257, 2002.
[9] H. Nover, C. H. Anderson, and G. C. DeAngelis. A logarithmic, scale-invariant representation of speed in macaque middle temporal area accounts for speed discrimination performance. J. Neurosci, 25:10049-60, 2005.
[10] N. J. Priebe and S. G. Lisberger. Estimating target speed from the population response in visual area MT. Journal of Neuroscience, 24(8):1907-1916, 2004.
[11] T. D. Sanger. Probability density estimation for the interpretation of neural population codes. J. Neurophysiology, 76(4):2790-93, 1996.
[12] E. P. Simoncelli and D. J. Heeger. A model of neuronal responses in visual area MT. Vision Research, 38(5):743-761, 1998.
[13] E. P. Simoncelli and O. Schwartz. Modeling surround suppression in V1 neurons with a statistically-derived normalization model. Advances in Neural Information Processing Systems (NIPS), 11, 1999.
[14] A. T. Smith and G. K. Edgar. Perceived speed and direction of complex gratings and plaids. Journal of the Optical Society of America A, Optics and Image Science, 8(7):1161-1171, 1991.
[15] A. A. Stocker. Analog integrated 2-D optical flow sensor. Analog Integrated Circuits and Signal Processing, 46(2):121-138, February 2006.
[16] A. A. Stocker and E. P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nat Neurosci, 4(9):578-85, 2006.
[17] L. S. Stone and P. Thompson. Human speed perception is contrast dependent. Vision Research, 32(8):1535-1549, 1992.
[18] Y. Weiss, E. P. Simoncelli, and E. H. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598-604, 2002.
-----1
[1] S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In Advances in Neural Information Processing Systems, NIPS, 2010.
[2] Y. Bengio, R. Ducharme, and P. Vincent. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155, 2003.
[3] A. Coates and A. Ng. The importance of encoding versus training with sparse coding and vector quantization. In International Conference on Machine Learning (ICML), 2011.
[4] Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, NIPS, 2012.
[5] Thomas Dean, Mark Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, and Jay Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[6] Jia Deng, Alex Berg, Sanjeev Satheesh, Hao Su, Aditya Khosla, and Fei-Fei Li. Imagenet large scale visual recognition challenge 2012.
[7] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[8] Thomas Deselaers and Vittorio Ferrari. Visual and semantic similarity in imagenet. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[9] J. C. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.
[10] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, NIPS, 2012.
[12] Thomas Mensink, Jakob Verbeek, Florent Perronnin, and Gabriela Csurka. Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In European Conference on Computer Vision (ECCV), 2012.
[13] Tomas Mikolov, Kai Chen, Greg S. Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In International Conference on Learning Representations (ICLR), Scottsdale, Arizona, USA, 2013.
[14] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, NIPS, 2013.
[15] Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Jonathon Shlens, Andrea Frome, Greg S. Corrado, and Jeffrey Dean. Zero-shot learning by convex combination of semantic embeddings. arXiv (to be submitted), 2013.
[16] Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, and Tom M. Mitchell. Zero-shot learning with semantic output codes. In Advances in Neural Information Processing Systems, NIPS, 2009.
[17] Marcus Rohrbach, Michael Stark, and Bernt Schiele. Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[18] R. Socher, M. Ganjoo, H. Sridhar, O. Bastani, C. D. Manning, and A. Y. Ng. Zero-shot learning through cross-modal transfer. In International Conference on Learning Representations (ICLR), Scottsdale, Arizona, USA, 2013.
[19] L.J.P. van der Maaten and G.E. Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.
[20] Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21-35, 2010.
-----1
[1] J. T. Abbott, J. L. Austerweil, and T. L. Griffiths. Constructing a hypothesis space from the Web for large-scale Bayesian word learning. In Proceedings of the 34th Annual Conference of the Cognitive Science Society, 2012.
[2] A. Berg, J. Deng, and L. Fei-Fei. ILSVRC 2010. http://www.image-net.org/challenges/LSVRC/2010/.
[3] S. Carey. The child as word learner. Linguistic Theory and Psychological Reality, 1978.
[4] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[5] J. Deng, J. Krause, A. Berg, and L. Fei-Fei. Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition. In CVPR, 2012.
[6] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12:2121-2159, 2010.
[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) challenge. IJCV, 88(2):303-338, 2010.
[8] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. In CVPR, 2009.
[9] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
[10] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In ICML, 2012.
[11] Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao, and T. Huang. Large-scale image classification: fast feature extraction and svm training. In CVPR, 2011.
[12] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145-175, 2001.
[13] D. Parikh and K. Grauman. Relative attributes. In ICCV, 2011.
[14] A. Quattoni, M. Collins, and T. Darrell. Transfer learning for image classification with sparse prototype representations. In CVPR, 2008.
[15] E. Rosch, C. B. Mervis, W. D. Gray, D. M. Johnson, and P. Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8(3):382-439, 1976.
[16] R. Salakhutdinov, A. Torralba, and J.B. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011.
[17] R. N. Shepard. Towards a universal law of generalization for psychological science. Science, 237:1317-1323, 1987.
[18] J. B. Tenenbaum. Bayesian modeling of human concept learning. In NIPS, 1999.
[19] J. B. Tenenbaum. Rules and similarity in concept learning. In NIPS, 2000.
[20] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24(4):629-640, 2001.
[21] F. Xu and J.B. Tenenbaum. Word learning as Bayesian inference. Psychological Review, 114(2):245-272, 2007.
-----1
[1] T. Poggio, J. Mutch, F. Anselmi, J. Z. Leibo, L. Rosasco, and A. Tacchetti, The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work), MIT-CSAIL-TR-2012-035, 2012.
[2] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, Labeled faces in the wild: A database for studying face recognition in unconstrained environments, in Workshop on faces in real-life images: Detection, alignment and recognition (ECCV), (Marseille, Fr), 2008.
[3] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, Attribute and Simile Classifiers for Face Verification, in IEEE International Conference on Computer Vision (ICCV), (Kyoto, JP), Oct. 2009.
[4] N. Pinto, Z. Stone, T. Zickler, and D. D. Cox, Scaling-up Biologically-Inspired Computer Vision: A Case-Study on Facebook, in IEEE Computer Vision and Pattern Recognition, Workshop on Biologically Consistent Vision, 2011.
[5] S. Dali, The persistence of memory (1931). Museum of Modern Art, New York, NY.
[6] A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in neural information processing systems, vol. 25, (Lake Tahoe, CA), 2012.
[7] O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4277-4280, 2012.
[8] C. F. Cadieu, H. Hong, D. Yamins, N. Pinto, N. J. Majaj, and J. J. DiCarlo, The neural representation benchmark and its evaluation on brain and machine, arXiv preprint arXiv:1301.3530, 2013.
Footnote 9. Note: Our method of testing does not strictly conform to the protocol recommended by the creators of LFW [2]: we re-aligned (worse) the faces. We also use the identities of the individuals during training.
Footnote 10. The original PubFig dataset was only provided as a list of URLs from which the images could be downloaded. Now only half the images remain available. On the original dataset, the strongest performance reported is 78.7% [3]. The authors of that study also made their features available, so we estimated the performance of their features on the available subset of images (using SVM). We found that an SVM classifier, using their features and our cross-validation splits, gets 78.4% correct, 3.3% lower than our best model.
[9] P. Foldiak, Learning invariance from transformation sequences, Neural Computation, vol. 3, no. 2, pp. 194-200, 1991.
[10] L. Wiskott and T. Sejnowski, Slow feature analysis: Unsupervised learning of invariances, Neural computation, vol. 14, no. 4, pp. 715-770, 2002.
[11] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, What is the best multi-stage architecture for object recognition?, IEEE International Conference on Computer Vision, pp. 2146-2153, 2009.
[12] J. Z. Leibo, J. Mutch, L. Rosasco, S. Ullman, and T. Poggio, Learning Generic Invariances in Object Recognition: Translation and Scale, MIT-CSAIL-TR-2010-061, CBCL-294, 2010.
[13] A. Saxe, P. W. Koh, Z. Chen, M. Bhand, B. Suresh, and A. Y. Ng, On random weights and unsupervised feature learning, Proceedings of the International Conference on Machine Learning (ICML), 2011.
[14] M. Riesenhuber and T. Poggio, Hierarchical models of object recognition in cortex, Nature Neuroscience, vol. 2, pp. 1019-1025, Nov. 1999.
[15] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, Robust Object Recognition with Cortex-Like Mechanisms, IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 411-426, 2007.
[16] K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol. 36, pp. 193-202, Apr. 1980.
[17] Y. LeCun, F. J. Huang, and L. Bottou, Learning methods for generic object recognition with invariance to pose and lighting, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 9097, IEEE, 2004.
[18] E. Bart and S. Ullman, Class-based feature matching across unrestricted transformations, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 9, pp. 1618-1631, 2008.
[19] N. Pinto, Y. Barhomi, D. Cox, and J. J. DiCarlo, Comparing state-of-the-art visual features on invariant object recognition tasks, in Applications of Computer Vision (WACV), 2011 IEEE Workshop on, pp. 463-470, IEEE, 2011.
[20] T. Vetter, A. Hurlbert, and T. Poggio, View-based models of 3D object recognition: invariance to imaging transformations, Cerebral Cortex, vol. 5, no. 3, p. 261, 1995.
[21] J. Z. Leibo, J. Mutch, and T. Poggio, Why The Brain Separates Face Recognition From Object Recognition, in Advances in Neural Information Processing Systems (NIPS), (Granada, Spain), 2011.
[22] H. Kim, J. Wohlwend, J. Z. Leibo, and T. Poggio, Body-form and body-pose recognition with a hierarchical model of the ventral stream, MIT-CSAIL-TR-2013-013, CBCL-312, 2013.
[23] N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 886-893, 2005.
[24] E. Oja, Simplified neuron model as a principal component analyzer, Journal of mathematical biology, vol. 15, no. 3, pp. 267-273, 1982.
[25] A. Afraz, M. V. Pashkam, and P. Cavanagh, Spatial heterogeneity in the perception of face and form attributes, Current Biology, vol. 20, no. 23, pp. 2112-2116, 2010.
[26] J. Z. Leibo, Q. Liao, and T. Poggio, Subtasks of Unconstrained Face Recognition, in International Joint Conference on Computer Vision, Imaging and Computer Graphics, VISIGRAPP, (Portugal), 2014.
[27] T. Ojala, M. Pietikainen, and T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 7, pp. 971-987, 2002.
[28] X. Tan and B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, in Analysis and Modeling of Faces and Gestures, pp. 168-182, Springer, 2007.
[29] V. Ojansivu and J. Heikkila, Blur insensitive texture classification using local phase quantization, in Image and Signal Processing, pp. 236-243, Springer, 2008.
[30] X. Zhu and D. Ramanan, Face detection, pose estimation, and landmark localization in the wild, in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), (Providence, RI), pp. 2879-2886, 2012.
[31] S. u. Hussain, T. Napoleon, and F. Jurie, Face recognition using local quantized patterns, in Proc. British Machine Vision Conference (BMVC), vol. 1, (Guildford, UK), pp. 52-61, 2012.
[32] M. Kouh and T. Poggio, A canonical neural circuit for cortical nonlinear operations, Neural computation, vol. 20, no. 6, pp. 1427-1451, 2008.
[33] D. Hubel and T. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology, vol. 160, no. 1, p. 106, 1962.
-----1
[1] Narendra Ahuja and Sinisa Todorovic. Learning the taxonomy and models of categories present in arbitrary images. In International Conference on Computer Vision, 2007.
[2] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
[3] Dan Ciresan, Alessandro Giusti, Juergen Schmidhuber, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in Neural Information Processing Systems 25, 2012.
[4] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005.
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In Computer Vision and Pattern Recognition, 2009.
[6] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. In Conference on Learning Theory. ACL, 2010.
[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303-338, 2010.
[8] Clement Farabet, Camille Couprie, Laurent Najman, and Yann LeCun. Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1915-1929, 2013.
[9] Pedro F Felzenszwalb, Ross B Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627-1645, 2010.
[10] Sanja Fidler and Aleš Leonardis. Towards scalable representations of object categories: Learning a hierarchy of parts. In Computer Vision and Pattern Recognition, 2007.
[11] R. B. Girshick, P. F. Felzenszwalb, and D. McAllester. Discriminatively trained deformable part models, release 5. http://people.cs.uchicago.edu/~rbg/latent-release5/.
[12] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[13] Iasonas Kokkinos and Alan Yuille. Inference and learning with hierarchical shape models. International Journal of Computer Vision, 93(2):201-225, 2011.
[14] Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, 2012.
[15] Quoc V Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S Corrado, Jeff Dean, and Andrew Y Ng. Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning, 2012.
[16] Yann LeCun and Yoshua Bengio. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 1995.
[17] Jorge Sanchez and Florent Perronnin. High-dimensional signature compression for large-scale image classification. In Computer Vision and Pattern Recognition, 2011.
[18] Hannes Schulz and Sven Behnke. Object-class segmentation using deep convolutional neural networks. In Proceedings of the DAGM Workshop on New Challenges in Neural Computation, 2011.
[19] Long Zhu, Yuanhao Chen, Alan Yuille, and William Freeman. Latent hierarchical structural learning for object detection. In Computer Vision and Pattern Recognition, 2010.
[20] Song Chun Zhu and David Mumford. A stochastic grammar of images. Computer Graphics and Vision, 2(4):259-362, 2007.
-----1
[1] T. Malisiewicz and A. Gupta and A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In International Conference on Computer Vision, 2011.
[2] P. F. Felzenszwalb and R. B. Girshick and D. McAllester and D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[4] H. Rowley and S. Baluja and T. Kanade. Neural Network-Based Face Detection. In IEEE Transactions On Pattern Analysis and Machine intelligence, 1998.
[5] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Conference on Computer Vision and Pattern Recognition, 2001.
[6] R. Sznitman, C. Becker, F. Fleuret, and P. Fua. Fast Object Detection with Entropy-Driven Evaluation. In Conference on Computer Vision and Pattern Recognition, 2013.
[7] P. F. Felzenszwalb and R. B. Girshick and D. McAllester. Cascade Object Detection with Deformable Part Models. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[8] M. Pedersoli and J. Gonzalez and A. Bagdanov and JJ. Villanueva. Recursive Coarse-to-Fine Localization for fast Object Detection. In European Conference on Computer Vision, 2010.
[9] C. Dubout and F. Fleuret. Exact Acceleration of Linear Object Detectors. In European Conference on Computer Vision, 2012.
[10] T. Dean and M. Ruzon and M. Segal and J. Shlens and S. Vijayanarasimhan and J. Yagnik. Fast, Accurate Detection of 100,000 Object Classes on a Single Machine. In IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[11] P. Indyk and R. Motwani. Approximate nearest neighbours: Towards removing the curse of dimensionality. In ACM Symposium on Theory of Computing, 1998.
[12] A. Vedaldi and A. Zisserman. Sparse Kernel Approximations for Efficient Classification and Detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[13] S. Maji and A. Berg and J. Malik. Efficient Classification for Additive Kernel SVMs. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[14] I. Kokkinos. Bounding Part Scores for Rapid Detection with Deformable Part Models. In 2nd Parts and Attributes Workshop, in conjunction with ECCV, 2012.
[15] Hervé Jégou and Matthijs Douze and Cordelia Schmid. Product quantization for nearest neighbour search. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
[16] R. M. Gray and D. L. Neuhoff. Quantization. In IEEE Transactions on Information Theory, 1998.
[17] S. Singh and A. Gupta and A. Efros. Unsupervised Discovery of Mid-level Discriminative Patches. In European Conference on Computer Vision, 2012.
[18] I. Endres and K. Shih and J. Jiaa and D. Hoiem. Learning Collections of Part Models for Object Recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[19] C. Vondrick and A. Khosla and T. Malisiewicz and A. Torralba. Inverting and Visualizing Features for Object Detection. In arXiv preprint arXiv:1212.2278, 2012.
[20] X. Ren and D. Ramanan. Histograms of Sparse Codes for Object Detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[21] P. Felzenszwalb and R. Girshick and D. McAllester. Discriminatively Trained Deformable Part Models, Release 4. http://people.cs.uchicago.edu/~pff/latent-release4/.
[22] R. Girshick and P. Felzenszwalb and D. McAllester. Discriminatively Trained Deformable Part Models, Release 5. http://people.cs.uchicago.edu/~rbg/latent-release5/.
[23] S. Divvala and A. Efros and M. Hebert. How important are Deformable Parts in the Deformable Parts Model? In European Conference on Computer Vision, Parts and Attributes Workshop, 2012.
[24] S. Shalev-Shwartz and Y. Singer and N. Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In Proceedings of the 24th International Conference on Machine Learning.
-----1
[1] E. Bart & S. Ullman. Single-example learning of novel classes using representation by similarity. In BMVC, 2005.
[2] A. Berg, J. Deng, & L. Fei-Fei. ILSVRC 2010. www.image-net.org/challenges/LSVRC/2010/, 2010.
[3] U. Blanke & B. Schiele. Remember and transfer what you have learned - recognizing composite activities based on activity spotting. In ISWC, 2010.
[4] J. Choi, M. Rastegari, A. Farhadi, & L. S. Davis. Adding Unlabeled Samples to Categories by Learned Attributes. In CVPR, 2013.
[5] S. Ebert, D. Larlus, & B. Schiele. Extracting Structures in Image Collections for Object Recognition. In ECCV, 2010.
[6] R. Farrell, O. Oza, V. Morariu, T. Darrell, & L. S. Davis. Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance. In ICCV, 2011.
[7] R. Fergus, Y. Weiss, & A. Torralba. Semi-supervised learning in gigantic image collections. In NIPS, 2009.
[8] M. Fink. Object classification from a single example utilizing class relevance pseudo-metrics. In NIPS, 2004.
[9] Y. Fu, T. M. Hospedales, T. Xiang, & S. Gong. Learning multi-modal latent attributes. TPAMI, PP(99), 2013.
[10] P. Kankuekul, A. Kawewong, S. Tangruamsub, & O. Hasegawa. Online Incremental Attribute-based Zero-shot Learning. In CVPR, 2012.
[11] C. Lampert, H. Nickisch, & S. Harmeling. Attribute-based classification for zero-shot learning of object categories. TPAMI, PP(99), 2013.
[12] H.-T. Lin, C.-J. Lin, & R. C. Weng. A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 2007.
[13] J. Liu, B. Kuipers, & S. Savarese. Recognizing human actions by attributes. In CVPR, 2011.
[14] U. Luxburg. A tutorial on spectral clustering. Stat Comput, 17(4):395-416, 2007.
[15] M. Maier, U. V. Luxburg, & M. Hein. Influence of graph construction on graph-based clustering measures. In NIPS, 2008.
[16] T. Mensink, J. Verbeek, F. Perronnin, & G. Csurka. Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost. In ECCV, 2012.
[17] Y. Moses, S. Ullman, & S. Edelman. Generalization to novel images in upright and inverted faces. Perception, 25:443-461, 1996.
[18] A. Y. Ng, M. I. Jordan, & Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2002.
[19] M. Palatucci, D. Pomerleau, G. Hinton, & T. Mitchell. Zero-shot learning with semantic output codes. In NIPS, 2009.
[20] S. J. Pan & Q. Yang. A survey on transfer learning. TKDE, 22:1345-1359, 2010.
[21] R. Raina, A. Battle, H. Lee, B. Packer, & A. Ng. Self-taught learning: Transfer learning from unlabeled data. In ICML, 2007.
[22] M. Rohrbach, M. Regneri, M. Andriluka, S. Amin, M. Pinkal, & B. Schiele. Script data for attribute-based recognition of composite activities. In ECCV, 2012.
[23] M. Rohrbach, M. Stark, & B. Schiele. Evaluating Knowledge Transfer and Zero-Shot Learning in a Large-Scale Setting. In CVPR, 2011.
[24] M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, & B. Schiele. What Helps Where - And Why? Semantic Relatedness for Knowledge Transfer. In CVPR, 2010.
[25] K. Saenko, B. Kulis, M. Fritz, & T. Darrell. Adapting visual category models to new domains. In ECCV, 2010.
[26] V. Sharmanska, N. Quadrianto, & C. H. Lampert. Augmented Attribute Representations. In ECCV, 2012.
[27] A. Shrivastava, S. Singh, & A. Gupta. Constrained Semi-Supervised Learning Using Attributes and Comparative Attributes. In ECCV, 2012.
[28] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, & W. T. Freeman. Discovering Object Categories in Image Collections. In ICCV, 2005.
[29] S. Thrun. Is learning the n-th thing any easier than learning the first? In NIPS, 1996.
[30] A. Torralba, K. Murphy, & W. Freeman. Sharing visual features for multiclass and multiview object detection. In CVPR, 2004.
[31] D. Tran & A. Sorokin. Human activity recognition with metric learning. In ECCV, 2008.
[32] M. Weber, M. Welling, & P. Perona. Towards automatic discovery of object categories. In CVPR, 2000.
[33] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, & B. Schölkopf. Learning with Local and Global Consistency. In NIPS, 2004.
[34] X. Zhu, Z. Ghahramani, & J. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, 2003.
[35] A. Zweig & D. Weinshall. Exploiting object hierarchy: Combining models from different category levels. In ICCV, 2007.
-----1
[1] L. Duan, D. Xu, I.W. Tsang, and J. Luo. Visual event recognition in videos by learning from web data. In CVPR, 2010.
[2] K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In ECCV, 2010.
[3] R. Gopalan, R. Li, and R. Chellappa. Domain adaptation for object recognition: An unsupervised approach. In ICCV, 2011.
[4] B. Gong, Y. Shi, F. Sha, and K. Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, 2012.
[5] H. Daumé III. Frustratingly easy domain adaptation. In ACL, 2007.
[6] J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. In EMNLP, 2006.
[7] J. Huang, A.J. Smola, A. Gretton, K.M. Borgwardt, and B. Schölkopf. Correcting sample selection bias by unlabeled data. In NIPS, 2007.
[8] S.J. Pan, I.W. Tsang, J.T. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. IEEE Trans. NN, (99):1-12, 2009.
[9] J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N.D. Lawrence. Dataset shift in machine learning. The MIT Press, 2009.
[10] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation with multiple sources. In NIPS, 2009.
[11] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[12] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303-338, 2010.
[13] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: a database and web-based tool for image annotation. IJCV, 77:157-173, 2008.
[14] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical report, California Institute of Technology, 2007.
[15] D. Weinland, E. Boyer, and R. Ronfard. Action recognition from arbitrary views using 3d exemplars. In ICCV, 2007.
[16] A. Torralba and A.A. Efros. Unbiased look at dataset bias. In CVPR, 2011.
[17] B. Gong, F. Sha, and K. Grauman. Overcoming dataset bias: An unsupervised domain adaptation approach. In NIPS Workshop on Large Scale Visual Recognition and Retrieval, 2012.
[18] L. Cao, Z. Liu, and T. S. Huang. Cross-dataset action detection. In CVPR, 2010.
[19] T. Tommasi, N. Quadrianto, B. Caputo, and C. Lampert. Beyond dataset bias: multi-task unaligned shared knowledge transfer. In ACCV, 2012.
[20] J. Hoffman, B. Kulis, T. Darrell, and K. Saenko. Discovering latent domains for multisource domain adaptation. In ECCV, 2012.
[21] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two-sample-problem. In NIPS, 2007.
[22] H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227-244, 2000.
[23] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In NIPS, 2007.
[24] R. Li and T. Zickler. Discriminative virtual views for cross-view action recognition. In CVPR, 2012.
[25] K. Tang, V. Ramanathan, L. Fei-Fei, and D. Koller. Shifting weights: Adapting object detectors from image to video. In NIPS, 2012.
[26] B. Gong, K. Grauman, and F. Sha. Connecting the dots with landmarks: Discriminatively learning domain-invariant features for unsupervised domain adaptation. In ICML, 2013.
[27] S. Jegelka, A. Gretton, B. Schölkopf, B. K. Sriperumbudur, and U. Von Luxburg. Generalized clustering via kernel embeddings. In Advances in Artificial Intelligence, 2009.
[28] Q. Sun, R. Chattopadhyay, S. Panchanathan, and J. Ye. A two-stage weighting framework for multi-source domain adaptation. In NIPS, 2011.
[29] L. Duan, I. W. Tsang, D. Xu, and T. Chua. Domain adaptation from multiple sources via auxiliary classifiers. In ICML, 2009.
[30] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In ECCV, 2006.
[31] D. Tran and A. Sorokin. Human activity recognition with metric learning. In ECCV, 2008.
[32] A. Bergamo and L. Torresani. Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach. In NIPS, 2010.
[33] A. Farhadi and M. Tabrizi. Learning to recognize activities from the wrong view point. In ECCV, 2008.
[34] C.-H. Huang, Y.-R. Yeh, and Y.-C. Wang. Recognizing actions across cameras by exploring the correlated subspace. In ECCV, 2012.
[35] J. Liu, M. Shah, B. Kuipers, and S. Savarese. Cross-view action recognition via view knowledge transfer. In CVPR, 2011.
-----1
[1] R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817-1853, 2005.
[2] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 41-48, Vancouver, British Columbia, Canada, 2006.
[3] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83-99, 2003.
[4] J. Baxter. A Bayesian/information theoretic model of learning to learn via multiple task sampling. Machine Learning, 28(1):7-39, 1997.
[5] J. C. Bezdek and R. J. Hathaway. Convergence of alternating optimization. Neural, Parallel & Scientific Computations, 11(4):351-368, 2003.
[6] E. Bonilla, K. M. A. Chai, and C. Williams. Multi-task Gaussian process prediction. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 153-160, Vancouver, British Columbia, Canada, 2007.
[7] L. Bottou and V. Vapnik. Local learning algorithms. Neural Computation, 4(6):888-900, 1992.
[8] R. Caruana. Multitask learning. Machine Learning, 28(1):41-75, 1997.
[9] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 109-117, Seattle, Washington, USA, 2004.
[10] T. V. Gestel, J. A. K. Suykens, B. Baesens, S. Viaene, J. Vanthienen, G. Dedene, B. De Moor, and J. Vandewalle. Benchmarking least squares support vector machine classifiers. Machine Learning, 54(1):5-32, 2004.
[11] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, 2011.
[12] L. Jacob, F. Bach, and J.-P. Vert. Clustered multi-task learning: a convex formulation. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 745-752, Vancouver, British Columbia, Canada, 2008.
[13] A. Kumar and H. Daumé III. Learning task grouping and overlap in multi-task learning. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012.
[14] S. Parameswaran and K. Weinberger. Large margin multi-task metric learning. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1867-1875, 2010.
[15] J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press, 1998.
[16] S. Thrun. Is learning the n-th thing any easier than learning the first? In D. S. Touretzky, M. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 640-646, Denver, CO, 1995.
[17] S. Thrun and J. O'Sullivan. Discovering structure in multiple learning tasks: The TC algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 489-497, Bari, Italy, 1996.
[18] M. Wu and B. Schölkopf. A local learning approach for clustering. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1529-1536, Vancouver, British Columbia, Canada, 2006.
[19] M. Wu, K. Yu, S. Yu, and B. Schölkopf. Local learning projections. In Proceedings of the Twenty-Fourth International Conference on Machine Learning, pages 1039-1046, Corvallis, Oregon, USA, 2007.
[20] Y. Zhang and D.-Y. Yeung. A convex formulation for learning task relationships in multi-task learning. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, pages 733-742, Catalina Island, California, 2010.
-----1
[1] I. M. Johnstone and D. M. Titterington. Statistical challenges of high-dimensional data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906):4237, 2009.
[2] T. J. Mitchell and J. J. Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023-1032, 1988.
[3] C. M. Carvalho, N. G. Polson, and J. G. Scott. Handling sparsity via the horseshoe. Journal of Machine Learning Research W&CP, 5:73-80, 2009.
[4] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267-288, 1996.
[5] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244, 2001.
[6] J. M. Hernández-Lobato, D. Hernández-Lobato, and A. Suárez. Network-based sparse Bayesian classification. Pattern Recognition, 44:886-900, 2011.
[7] M. Van Gerven, B. Cseke, R. Oostenveld, and T. Heskes. Bayesian source localization with the multivariate Laplace prior. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1901-1909, 2009.
[8] Julia E. Vogt and Volker Roth. The group-lasso: ℓ1,∞ regularization versus ℓ1,2 regularization. In Goesele et al., editor, 32nd Annual Symposium of the German Association for Pattern Recognition, volume 6376, pages 252-261. Springer, 2010.
[9] Y. Kim, J. Kim, and Y. Kim. Blockwise sparse regression. Statistica Sinica, 16(2):375, 2006.
[10] D. Hernández-Lobato, J. M. Hernández-Lobato, T. Helleputte, and P. Dupont. Expectation propagation for Bayesian multi-task feature selection. In José L. Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag, editors, Proceedings of the European Conference on Machine Learning, volume 6321, pages 522-537. Springer, 2010.
[11] G. Obozinski, B. Taskar, and M. I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, pages 1-22, 2009.
[12] T. Xiong, J. Bi, B. Rao, and V. Cherkassky. Probabilistic joint feature selection for multi-task learning. In Proceedings of the Seventh SIAM International Conference on Data Mining, pages 332-342. SIAM, 2007.
[13] T. Jebara. Multi-task feature and kernel selection for SVMs. In Proceedings of the Twenty-First International Conference on Machine Learning, pages 55-62. ACM, 2004.
[14] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 41-48. MIT Press, Cambridge, MA, 2007.
[15] A. Jalali, P. Ravikumar, S. Sanghavi, and C. Ruan. A dirty model for multi-task learning. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 964-972. 2010.
[16] P. Garrigues and B. Olshausen. Learning horizontal connections in a sparse coding model of natural images. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 505-512. MIT Press, Cambridge, MA, 2008.
[17] T. Peleg, Y. C. Eldar, and M. Elad. Exploiting statistical dependencies in sparse representations for signal recovery. IEEE Transactions on Signal Processing, 60(5):2286-2303, 2012.
[18] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1984.
[19] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, August 2006.
[20] T. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, 2001.
[21] M. W. Seeger. Expectation propagation for exponential families. Technical report, Department of EECS, University of California, Berkeley, 2006.
[22] T. Minka. Power EP. Technical report, Carnegie Mellon University, Department of Statistics, 2004.
[23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[24] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19-60, 2010.
-----1
[1] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In Advances in Neural Information Processing Systems, volume 19, pages 41-48. 2007.
[2] A. Argyriou, C. A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multi-task structure learning. In Advances in Neural Information Processing Systems, volume 20, pages 25-32. 2008.
[3] L. Cao and F. Tay. Support vector machine with adaptive parameters in financial time series forecasting. IEEE Transactions on Neural Networks, 14(6):1506-1518, 2003.
[4] F. R. Bach, D. Heckerman, and E. Horvitz. Considering cost asymmetry in learning classifiers. Journal of Machine Learning Research, 7:1713-1741, 2006.
[5] R. Koenker. Quantile Regression. Cambridge University Press, 2005.
[6] K. Ritter. On parametric linear and quadratic programming problems. Mathematical Programming: Proceedings of the International Congress on Mathematical Programming, pages 307-335, 1984.
[7] E. L. Allgower and K. Georg. Continuation and path following. Acta Numerica, 2:1-63, 1993.
[8] T. Gal. Postoptimal Analysis, Parametric Programming, and Related Topics. Walter de Gruyter, 1995.
[9] M. J. Best. An algorithm for the solution of the parametric quadratic programming problem. Applied Mathematics and Parallel Computing, pages 57-76, 1996.
[10] M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, volume 6, pages 4734-4739, 2001.
[11] B. A. Turlach, W. N. Venables, and S. J. Wright. Simultaneous variable selection. Technometrics, 47:349-363, 2005.
[12] G. Obozinski, B. Taskar, and M. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2):231-252, 2010.
[13] M. R. Osborne, B. Presnell, and B. A. Turlach. A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20(20):389-404, 2000.
[14] B. Efron and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407-499, 2004.
[15] T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu. The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5:1391-1415, 2004.
[16] Y. Lin, Y. Lee, and G. Wahba. Support vector machines for classification in nonstandard situations. Machine Learning, 46:191-202, 2002.
[17] M. A. Davenport, R. G. Baraniuk, and C. D. Scott. Tuning support vector machine for minimax and Neyman-Pearson classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
[18] G. Lee and C. Scott. Nested support vector machines. IEEE Transactions on Signal Processing, 58(3):1648-1660, 2010.
[19] R. Koenker. Quantile Regression. Cambridge University Press, 2005.
[20] I. Takeuchi, Q. V. Le, T. Sears, and A. J. Smola. Nonparametric quantile estimation. Journal of Machine Learning Research, 7:1231-1264, 2006.
[21] L. K. Bachrach, T. Hastie, M. C. Wang, B. Narasimhan, and R. Marcus. Acquisition in healthy Asian, Hispanic, Black and Caucasian youth: A longitudinal study. The Journal of Clinical Endocrinology and Metabolism, 84:4702-4712, 1999.
[22] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
-----1
[1] P. Bartlett and M. Traskin. AdaBoost is consistent. Journal of Machine Learning Research, 8:2347-2368, 2007.
[2] M. Bazaraa, H. Sherali and C. Shetty. Nonlinear Programming: Theory and Algorithms, 3rd Edition. Wiley-Interscience, 2006.
[3] D. P. Bertsekas. A distributed algorithm for the assignment problem. Technical Report, MIT, 1979.
[4] D. Bertsekas. Network Optimization: Continuous and Discrete Models. Athena Scientific, 1998.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[6] A. Demiriz, K. Bennett and J. Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46:225-254, 2002.
[7] L. Devroye, L. Györfi and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996.
[8] A. Frank and A. Asuncion. UCI Machine Learning Repository. School of Information and Computer Science, University of California at Irvine, 2006.
[9] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
[10] Y. Freund. An adaptive version of the boost by majority algorithm. Machine Learning, 43(3):293-318, 2001.
[11] J. Friedman, T. Hastie and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2):337-374, 2000.
[12] K. Glocer. Entropy regularization and soft margin maximization. Ph.D. Dissertation, UCSC, 2009.
[13] K. Höffgen, H. Simon and K. van Horn. Robust trainability of single neurons. Journal of Computer and System Sciences, 50(1):114-125, 1995.
[14] P. Long and R. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning, 78:287-304, 2010.
[15] E. Mammen and A. Tsybakov. Smooth discrimination analysis. The Annals of Statistics, 27, 1808-1829, 1999.
[16] D. McAllester, T. Hazan and J. Keshet. Direct loss minimization for structured prediction. Neural Information Processing Systems (NIPS), 1594-1602, 2010.
[17] R. Schapire, Y. Freund, P. Bartlett and W. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998.
[18] R. Schapire and Y. Freund. Boosting: Foundations and Algorithms. MIT Press, 2012.
[19] S. Shalev-Shwartz and Y. Singer. On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms. Machine Learning, 80(2-3):141-163, 2010.
[20] I. Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128-142, 2005.
[21] P. Tseng and D. Bertsekas. Relaxation methods for strictly convex costs and linear constraints. Mathematics of Operations Research, 16:462-481, 1991.
[22] P. Tseng. Convergence of block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475-494, 2001.
[23] A. Tsybakov. Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1):135-166, 2004.
[24] V. Vapnik. Statistical Learning Theory. John Wiley, 1998.
[25] M. Warmuth, K. Glocer and G. Rätsch. Boosting algorithms for maximizing the soft margin. Advances in Neural Information Processing Systems (NIPS), 21, 1585-1592, 2007.
[26] M. Warmuth, K. Glocer and S. Vishwanathan. Entropy regularized LPBoost. The 19th International conference on Algorithmic Learning Theory (ALT), 256-271, 2008.
[27] S. Zhai, T. Xia, M. Tan and S. Wang. Direct 0-1 loss minimization and margin maximization with boosting. Technical Report, 2013.
[28] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32(1):56-85, 2004.
[29] T. Zhang and B. Yu. Boosting with early stopping: Convergence and consistency. The Annals of Statistics, 33:1538-1579, 2005.
-----1
[1] Antoine Bordes, Seyda Ertekin, Jason Weston, and Léon Bottou. Fast kernel classifiers with online and active learning. J. Mach. Learn. Res., 6:1579-1619, December 2005.
[2] Joseph K. Bradley and Robert E. Schapire. Filterboost: Regression and classification on large datasets. In NIPS, 2007.
[3] Nicolò Cesa-Bianchi and Claudio Gentile. Tracking the best hyperplane with a simple budget perceptron. In Proc. of Nineteenth Annual Conference on Computational Learning Theory, pages 483-498. Springer-Verlag, 2006.
[4] Shang-Tse Chen, Hsuan-Tien Lin, and Chi-Jen Lu. An online boosting algorithm with theoretical justifications. In John Langford and Joelle Pineau, editors, ICML, ICML '12, pages 1007-1014, New York, NY, USA, July 2012. Omnipress.
[5] Adam Coates and Andrew Ng. The importance of encoding versus training with sparse coding and vector quantization. In Lise Getoor and Tobias Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML '11, pages 921-928, New York, NY, USA, June 2011. ACM.
[6] Koby Crammer, Jaz S. Kandola, and Yoram Singer. Online classification on a budget. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, NIPS. MIT Press, 2003.
[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886-893, 2005.
[8] Ofer Dekel and Yoram Singer. Support vector machines on a budget. In NIPS, pages 345-352, 2006.
[9] Carlos Domingo and Osamu Watanabe. MadaBoost: A modification of AdaBoost. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, COLT '00, pages 180-189, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
[10] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119-139, August 1997.
[11] Helmut Grabner and Horst Bischof. On-line boosting and vision. In CVPR (1), pages 260-267, 2006.
[12] Mihajlo Grbovic and Slobodan Vucetic. Tracking concept change with incremental boosting by minimization of the evolving exponential loss. In ECML PKDD, ECML PKDD '11, pages 516-532, Berlin, Heidelberg, 2011. Springer-Verlag.
[13] Zdenek Kalal, Jiri Matas, and Krystian Mikolajczyk. Weighted sampling for large-scale boosting. In BMVC, 2008.
[14] C. Leistner, A. Saffari, P.M. Roth, and H. Bischof. On robustness of on-line boosting - a competitive study. In Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, pages 1362-1369, Sept. 27-Oct. 4, 2009.
[15] Nikunj C. Oza and Stuart Russell. Online bagging and boosting. In Artificial Intelligence and Statistics 2001, pages 105-112. Morgan Kaufmann, 2001.
[16] Antoine Bordes, Jason Weston, and Léon Bottou. Online (and offline) on an even tighter budget. In Artificial Intelligence and Statistics 2005, 2005.
-----1
[1] http://www.howtogeek.com/howto/15799/how-to-use-autofill-on-a-google-docs-spreadsheet-quick-tips/.
[2] S. Agarwal, Jongwoo Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In CVPR, 2005.
[3] Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das, and Cong Yu. Group recommendation: semantics and efficiency. Proc. VLDB Endow., 2(1):754-765, 2009.
[4] Christina Brandt, Thorsten Joachims, Yisong Yue, and Jacob Bank. Dynamic ranked retrieval. In WSDM, pages 247-256, 2011.
[5] Andrei Z. Broder. On the resemblance and containment of documents. In the Compression and Complexity of Sequences, pages 21-29, Positano, Italy, 1997.
[6] Andrei Z. Broder, Moses Charikar, Alan M. Frieze, and Michael Mitzenmacher. Min-wise independent permutations (extended abstract). In STOC, pages 327-336, Dallas, TX, 1998.
[7] Olivier Chapelle, Patrick Haffner, and Vladimir N. Vapnik. Support vector machines for histogram-based image classification. IEEE Trans. Neural Networks, 10(5):1055-1064, 1999.
[8] Moses S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, 2002.
[9] Flavio Chierichetti and Ravi Kumar. LSH-preserving functions and their applications. In SODA, 2012.
[10] Dennis Fetterly, Mark Manasse, Marc Najork, and Janet L. Wiener. A large-scale study of the evolution of web pages. In WWW, pages 669-678, Budapest, Hungary, 2003.
[11] Jerome H. Friedman, F. Baskett, and L. Shustek. An algorithm for finding nearest neighbors. IEEE Transactions on Computers, 24:1000-1006, 1975.
[12] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of ACM, 42(6):1115-1145, 1995.
[13] Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 8(14):321-350, 2012.
[14] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, pages 604-613, Dallas, TX, 1998.
[15] Sergey Ioffe. Improved consistent sampling, weighted minhash and l1 sketching. In ICDM, 2010.
[16] Yugang Jiang, Chongwah Ngo, and Jun Yang. Towards optimal bag-of-features for object categorization and semantic video retrieval. In CIVR, pages 494-501, Amsterdam, Netherlands, 2007.
[17] Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Technical report, arXiv:1207.6083, 2013.
[18] Ping Li and Kenneth W. Church. A sketch algorithm for estimating two-way and multi-way associations. Computational Linguistics (Preliminary results appeared in HLT/EMNLP 2005), 33(3):305-354, 2007.
[19] Ping Li, Kenneth W. Church, and Trevor J. Hastie. Conditional random sampling: A sketch-based sampling technique for sparse data. In NIPS, pages 873-880, Vancouver, BC, Canada, 2006.
[20] Ping Li and Arnd Christian König. b-bit minwise hashing. In WWW, pages 671-680, Raleigh, NC, 2010.
[21] Ping Li, Arnd Christian König, and Wenhao Gui. b-bit minwise hashing for estimating three-way similarities. In NIPS, Vancouver, BC, 2010.
[22] Ping Li, Art B Owen, and Cun-Hui Zhang. One permutation hashing. In NIPS, Lake Tahoe, NV, 2012.
[23] Ping Li, Anshumali Shrivastava, and Arnd Christian König. b-bit minwise hashing in practice. In Internetware, Changsha, China, 2013.
[24] Avner Magen and Anastasios Zouzias. Near optimal dimensionality reductions that preserve volumes. In APPROX / RANDOM, pages 523-534, 2008.
[25] Mark Manasse, Frank McSherry, and Kunal Talwar. Consistent weighted sampling. Technical Report MSR-TR-2010-73, Microsoft Research, 2010.
[26] Anshumali Shrivastava and Ping Li. Fast near neighbor search in high-dimensional binary data. In ECML, 2012.
[27] Anshumali Shrivastava and Ping Li. Densifying one permutation hashing via rotation for fast near neighbor search. Technical report, 2013.
[28] Roger Weber, Hans-Jörg Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, pages 194-205, 1998.
[29] Yair Weiss, Antonio Torralba, and Robert Fergus. Spectral hashing. In NIPS, 2008.
[30] D. Zhou, J. Huang, and B. Schölkopf. Beyond pairwise classification and clustering using hypergraphs. 2006.