-----0
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. TPAMI, 2012.
D. Batra, S. Nowozin, and P. Kohli. Tighter relaxations for MAP-MRF inference: A local primal-dual gap based separation algorithm. In AISTATS, 2011.
D. Batra, P. Yadollahpour, A. Guzman-Rivera, and G. Shakhnarovich. Diverse M-best solutions in Markov Random Fields. ECCV, 2012.
C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-specific independence in Bayesian networks. In UAI, 1996.
T. Cour, S. Yu, and J. Shi. Matlab normalized cuts segmentation code. www.seas.upenn.edu/~timothee/software/ncut/ncut.html, 2010.
M. Fromer and A. Globerson. An LP view of the M-best MAP problem. NIPS, 2009.
A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. NIPS, 2007.
S. Gould. Max-margin learning for lower linear envelope potentials in binary Markov Random Fields. In ICML, 2011.
D. Heckerman. A tractable inference algorithm for diagnosing multiple diseases. In UAI, 1989.
P. Kohli, M. Kumar, and P. Torr. P3 & beyond: Solving energies with higher order cliques. In CVPR, 2007.
P. Kohli, L. Ladicky, and P. H. Torr. Robust higher order potentials for enforcing label consistency. IJCV, 2009.
D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. TPAMI, 2006.
N. Komodakis and N. Paragios. Beyond loose LP-relaxations: Optimizing MRFs by repairing cycles. 2008.
N. Komodakis and N. Paragios. Beyond pairwise energies: Efficient optimization for higher-order MRFs. In CVPR, 2009.
N. Komodakis, N. Paragios, and G. Tziritas. MRF optimization via dual decomposition: Message-passing revisited. 2007.
N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. TPAMI, 2010.
T. Koo, A. Rush, M. Collins, T. Jaakkola, and D. Sontag. Dual decomposition for parsing with non-projective head automata. In ACL-EMNLP, 2010.
T. Leung and J. Malik. Contour continuity in region based image segmentation. ECCV, 1998.
D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
A. Martins, N. Smith, E. Xing, P. Aguiar, and M. Figueiredo. Turbo parsers: Dependency parsing by approximate variational inference. In ACL-EMNLP, 2010.
E. Mezuman and Y. Weiss. Globally optimizing graph partitioning problems using message passing. In AISTATS, 2012.
A. Mueller. Superpixels for python pretty SLIC, 2012.
S. Nowozin and C. Lampert. Global connectivity potentials for random field models. In CVPR, 2009.
J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
C. Rother, P. Kohli, W. Feng, and J. Jia. Minimizing sparse higher order energy functions of discrete variables. In CVPR, 2009.
E. Santos Jr. On the generation of alternative explanations with implications for belief revision. In UAI, 1991.
M. Schmidt. UGM: Matlab code for undirected graphical models. www.di.ens.fr/~mschmidt/Software/UGM.html, 2012.
D. Smith and J. Eisner. Dependency parsing by belief propagation. In ACL-EMNLP, 2008.
D. Sontag. Approximate Inference in Graphical Models using LP Relaxations. PhD thesis, MIT, EECS, 2010.
D. Sontag, T. Meltzer, A. Globerson, T. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. 2008.
D. Sontag, A. Globerson, and T. Jaakkola. Introduction to dual decomposition for inference. 2010.
D. Sontag, D. K. Choe, and Y. Li. Efficiently searching for frustrated cycles in MAP inference. In UAI, 2012.
K. Swersky, D. Tarlow, I. Sutskever, R. Salakhutdinov, R. Zemel, and R. Adams. Cardinality Restricted Boltzmann Machines. In NIPS, 2012.
D. Tarlow, I. Givoni, and R. Zemel. HOP-MAP: Efficient message passing with high order potentials. In AISTATS, 2010.
D. Tarlow, D. Batra, P. Kohli, and V. Kolmogorov. Dynamic tree block coordinate ascent. In ICML, 2011.
D. Tarlow, K. Swersky, R. Zemel, R. Adams, and B. Frey. Fast exact inference for recursive cardinality models. In UAI, 2012.
S. Vicente, V. Kolmogorov, and C. Rother. Joint optimization of segmentation and appearance models. In ICCV, 2009.
M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 2008.
Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In UAI, 2007.
Y. Weiss, C. Yanover, and T. Meltzer. Linear Programming and variants of Belief Propagation in (Blake and Rother, ed.) Markov Random Fields for Vision and Image Processing. MIT Press, 2011.
T. Werner. A linear programming approach to max-sum problem: A review. TPAMI, 2007.
T. Werner. High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (MAP-MRF). In CVPR, 2008.
J. Yedidia, W. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. on Info. Theory, 2005.
-----1
[1] P. Abbeel and A.Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the 21st international conference on Machine learning (ICML 2004), 2004.
[2] Abdeslam Boularias, Jens Kober, and Jan Peters. Relative entropy inverse reinforcement learning. In AISTATS 2011, Journal of Machine Learning Research - Proceedings Track, volume 15, pages 182-189, 2011.
[3] C. Boutilier. A POMDP formulation of preference elicitation problems. In Proceedings of the National Conference on Artificial Intelligence, pages 239-246. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, 2002.
[4] S.J. Bradtke and A.G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1):33-57, 1996.
[5] J. Choi and K.E. Kim. MAP inference for Bayesian inverse reinforcement learning. In NIPS, pages 1989-1997, 2011.
[6] J. Choi and K.E. Kim. Nonparametric Bayesian inverse reinforcement learning for multiple reward functions. In NIPS, 2012.
[7] Christos Dimitrakakis and Constantin A. Rothkopf. Bayesian multitask inverse reinforcement learning. In European Workshop on Reinforcement Learning (EWRL 2011), number 7188 in LNCS, pages 273-284, 2011.
[8] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Batch, Off-policy and Model-free Apprenticeship Learning. In Proceedings of the 9th European Workshop on Reinforcement Learning, pages 1-12, Athens, Greece, September 2011.
[9] Edouard Klein, Matthieu Geist, Bilal Piot, and Olivier Pietquin. Inverse Reinforcement Learning through Structured Classification. In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe (NV, USA), December 2012.
[10] Wolfgang Konen and Thomas Bartz-Beielstein. Reinforcement learning for games: failures and successes. In Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, pages 2641-2648. ACM, 2009.
[11] M.G. Lagoudakis and R. Parr. Least-squares policy iteration. The Journal of Machine Learning Research, 4:1107-1149, 2003.
[12] Sergey Levine, Zoran Popović, and Vladlen Koltun. Feature construction for inverse reinforcement learning. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1342-1350, 2010.
[13] Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157-163. Morgan Kaufmann, 1994.
[14] Takeshi Mori, Matthew Howard, and Sethu Vijayakumar. Model-free apprenticeship learning for transfer of human impedance behaviour. In IEEE Int. Conf. Humanoid Robots, 2011.
[15] Andrew Y. Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conf. on Machine Learning, pages 663-670. Morgan Kaufmann, 2000.
[16] Nathan D. Ratliff, J. Andrew Bagnell, and Martin A. Zinkevich. Maximum margin planning. In ICML '06: Proceedings of the 23rd international conference on Machine learning, pages 729-736, New York, NY, USA, 2006.
[17] Constantin A. Rothkopf and Christos Dimitrakakis. Preference elicitation and inverse reinforcement learning. In ECML/PKDD (3), volume 6913 of LNCS, pages 34-48, 2011.
[18] L. S. Shapley. Stochastic games. PNAS, pages 1095-1100, 1953.
[19] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. 1998.
[20] Umar Syed and Robert E. Schapire. A game-theoretic approach to apprenticeship learning. In Advances in Neural Information Processing Systems, volume 10, 2008.
[21] Brian D. Ziebart, J. Andrew Bagnell, and Anind K. Dey. Modelling interaction via the principle of maximum causal entropy. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel.
-----1
[1] C.E. Rasmussen. The innite Gaussian mixture model. Advances in Neural Information Process- ing Systems, 12(5.2):2, 2000.
[2] C.E. Rasmussen and CKI Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA, USA, 2006.
[3] N.D. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. Advances in Neural Information Processing Systems, 16:329-336, 2004.
[4] H. Nickisch and C. Rasmussen. Gaussian mixture modeling with Gaussian process latent variable models. Pattern Recognition, pages 272-282, 2010.
[5] S.N. MacEachern and P. Müller. Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics, pages 223-238, 1998.
[6] Jayaram Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639-650, 1994.
[7] J. Quiñonero-Candela and C.E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. The Journal of Machine Learning Research, 6:1939-1959, 2005.
[8] E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. Advances in Neural Information Processing Systems, 2006.
[9] M. Salzmann, R. Urtasun, and P. Fua. Local deformation models for monocular 3D shape recovery. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pages 1-8, 2008.
[10] N.D. Lawrence and R. Urtasun. Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 601-608. ACM, 2009.
[11] M. Titsias and N. Lawrence. Bayesian Gaussian process latent variable model. AISTATS, 2010.
[12] A. Geiger, R. Urtasun, and T. Darrell. Rank priors for continuous non-linear dimensionality reduction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 880-887. IEEE, 2009.
[13] M.E. Tipping and C.M. Bishop. Mixtures of probabilistic principal component analyzers. Neural computation, 11(2):443-482, 1999.
[14] Z. Ghahramani and M.J. Beal. Variational inference for Bayesian mixtures of factor analysers. Advances in Neural Information Processing Systems, 12:449-455, 2000.
[15] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849-856, 2002.
[16] W. Cao and R. Haralick. Nonlinear manifold clustering by dimensionality. In International Conference on Pattern Recognition (ICPR), volume 1, pages 920-924. IEEE, 2006.
[17] Ehsan Elhamifar and René Vidal. Sparse manifold clustering and embedding. In Advances in Neural Information Processing Systems, pages 55-63, 2011.
[18] J. Wang, J. Lee, and C. Zhang. Kernel trick embedded Gaussian mixture model. In Algorithmic Learning Theory, pages 159-174. Springer, 2003.
[19] Carlos E Rodríguez and Stephen G Walker. Univariate Bayesian nonparametric mixture modeling with unimodal kernels. Statistics and Computing, pages 1-15, 2012.
[20] Daniel B Graham and Nigel M Allinson. Characterizing virtual eigensignatures for general purpose face recognition. Face Recognition: From Theory to Applications, 163:446-456, 1998.
[21] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol., 2(3):27:1-27:27, 2011.
[22] R.P. Adams and Z. Ghahramani. Archipelago: nonparametric Bayesian semi-supervised learning. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.
[23] W.M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, pages 846-850, 1971.
[24] Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.
[25] R.M. Neal. Density modeling and clustering using Dirichlet diffusion trees. Bayesian Statistics, 7:619-629, 2003.
[26] Y. Zhang and C. Sutton. Quasi-Newton Markov chain Monte Carlo. Advances in Neural Information Processing Systems, pages 2393-2401, 2011.
-----0
Allman, E. S., Matias, C., and Rhodes, J. A. Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37:3099-3132, 2009.
Benaglia, M., Chauveau, D., and Hunter, D. R. Bandwidth selection in an EM-like algorithm for nonparametric multivariate mixtures. In Nonparametrics and Mixture Models, 2011.
Böhning, D. and Seidel, W. Editorial: recent developments in mixture models. Computational Statistics & Data Analysis, 41(3):349-357, 2003.
Chauveau, D., Hunter, D. R., and Levine, M. Estimation for conditional independence multivariate finite mixture models. Technical Report, 2010.
De Silva, V. and Lim, L.H. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications, 30(3):1084-1127, 2008.
Feng, Z. D. and McCulloch, C. E. Using bootstrap likelihood ratios in finite mixture models. Journal of the Royal Statistical Society. Series B (Methodological), 58(3):609-617, 1996.
Frank, A. and Asuncion, A. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
Fukumizu, K., Gretton, A., Sun, X., and Schölkopf, B. Kernel measures of conditional dependence. In Advances in Neural Information Processing Systems, volume 20, pp. 489-496. MIT Press, 2008.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., and Smola, A. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pp. 585-592, Cambridge, 2008. MIT Press.
Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., and Smola, A. A kernel two-sample test. Journal of Machine Learning Research, 13:671-721, 2012.
Hall, P. and Zhou, X.-H. Nonparametric estimation of component distributions in a multivariate mixture. The Annals of Statistics, 31(1):201-224, 2003.
Hoyer, P., Janzing, D., Mooij, J., Peters, J., and Schölkopf, B. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 2008. MIT Press, 2009.
Iwata, T., Duvenaud, D., and Ghahramani, Z. Warped mixtures for nonparametric cluster shapes. In Uncertainty in Artificial Intelligence, 2013.
Janzing, D., Peters, J., Mooij, J., and Schölkopf, B. Identifying latent confounders using additive noise models. In Uncertainty in Artificial Intelligence, pp. 249-257, 2009.
Janzing, D., Sgouritsa, E., Stegle, O., Peters, J., and Schölkopf, B. Detecting low-complexity unobserved causes. In Uncertainty in Artificial Intelligence, pp. 383-391, 2011.
Kasahara, H. and Shimotsu, K. Nonparametric identification of multivariate mixtures. Technical report, 2010.
Kruskal, J. B. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications, 18(2):95-138, 1977.
Levine, M., Hunter, D. R., and Chauveau, D. Maximum smoothed likelihood for multivariate mixtures. Biometrika, 98(2):403-416, 2011.
Pearl, J. Causality. Cambridge University Press, 2000.
Rasmussen, Carl Edward. The infinite Gaussian mixture model. Advances in Neural Information Processing Systems, 12(5.2):2, 2000.
Schölkopf, B. and Smola, A. Learning with kernels. MIT Press, Cambridge, MA, 2002.
Shimizu, S., Hoyer, P., and Hyvärinen, A. Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing, 72(7-9):2024-2027, 2009.
Silva, R., Scheines, R., Glymour, C., and Spirtes, P. Learning the structure of linear latent variable models. JMLR, 7:191-246, 2006.
Smola, A., Gretton, A., Song, L., and Schölkopf, B. A Hilbert space embedding for distributions. In Algorithmic Learning Theory, pp. 13-31. Springer-Verlag, 2007.
Spirtes, P., Glymour, C., and Scheines, R. Causation, Prediction, and Search. MIT Press, 2nd edition, 2001.
Von Luxburg, U. A tutorial on spectral clustering. Statistics and computing, 17(4):395-416, 2007.
-----1
[1] D. Allouche, S. de Givry, and T. Schiex. Toulbar2, an open source exact cost function network solver. Technical report, INRIA, 2010.
[2] S. Arora, L. Babai, J. Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes, and systems of linear equations. In Foundations of Computer Science, 1993. Proceedings., 34th Annual Symposium on, pp. 724-733. IEEE, 1993.
[3] E. Berlekamp, R. McEliece, and H. Van Tilborg. On the inherent intractability of certain coding problems. Information Theory, IEEE Transactions on, 24(3):384-386, 1978.
[4] D. Bertsimas and J. N. Tsitsiklis. Introduction to linear optimization. Athena Scientific Belmont, MA, 1997.
[5] J. L. Carter and M. N. Wegman. Universal classes of hash functions. Journal of computer and system sciences, 18(2):143-154, 1979.
[6] S. Ermon, C. Gomes, A. Sabharwal, and B. Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In ICML (to appear), 2013.
[7] J. Feldman, M. J. Wainwright, and D. R. Karger. Using linear programming to decode binary linear codes. Information Theory, IEEE Transactions on, 51(3):954-972, 2005.
[8] R. Gallager. Low-density parity-check codes. Information Theory, IRE Transactions on, 8(1):21-28, 1962.
[9] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. Advances in Neural Information Processing Systems, 21(1.6), 2007.
[10] V. Gogate and R. Dechter. SampleSearch: A scheme that searches for consistent samples. In Proc. 10th International Conference on Artificial Intelligence and Statistics (AISTATS), 2007.
[11] V. Gogate and R. Dechter. SampleSearch: Importance sampling in presence of determinism. Artificial Intelligence, 175(2):694-729, 2011.
[12] O. Goldreich. Randomized methods in computation. Lecture Notes, 2011.
[13] C. Gomes, A. Sabharwal, and B. Selman. Model counting: A new strategy for obtaining good bounds. In AAAI, pp. 54-61, 2006.
[14] C. Gomes, A. Sabharwal, and B. Selman. Near-uniform sampling of combinatorial spaces using XOR constraints. Advances In Neural Information Processing Systems, 19:481-488, 2006.
[15] IBM ILOG. IBM ILOG CPLEX Optimization Studio 12.3, 2011.
[16] R. Jeroslow. On defining sets of vertices of the hypercube by linear inequalities. Discrete Mathematics, 11(2):119-124, 1975.
[17] M. Jerrum and A. Sinclair. The Markov chain Monte Carlo method: an approach to approximate counting and integration. Approximation algorithms for NP-hard problems, pp. 482-520, 1997.
[18] D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009.
[19] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. Information Theory, IEEE Transactions on, 47(2):498-519, 2001.
[20] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological), pp. 157-224, 1988.
[21] N. Madras. Lectures on Monte Carlo Methods. American Mathematical Society, 2002. ISBN 0821829785.
[22] J. Mooij. libDAI: A free and open source C++ library for discrete approximate inference in graphical models. JMLR, 11:2169-2173, 2010.
[23] K. Murphy, Y. Weiss, and M. Jordan. Loopy belief propagation for approximate inference: An empirical study. In UAI, 1999.
[24] K. B. Petersen and M. S. Pedersen. The matrix cookbook. Technical University of Denmark, pp. 715, 2008.
[25] D. Sontag, T. Meltzer, A. Globerson, T. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. In UAI, 2008.
[26] J. Stern. Approximating the number of error locations within a constant ratio is NP-complete. In Proceedings of the 10th International Symposium on Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, pp. 325-331. Springer-Verlag, 1993.
[27] D. R. Stinson. On the connections between universal hashing, combinatorial designs and error-correcting codes. Congressus Numerantium, pp. 7-28, 1996.
[28] S. Vadhan. Pseudorandomness. Foundations and Trends in Theoretical Computer Science, 2011.
[29] A. Vardy. Algorithmic complexity in coding theory and the minimum distance problem. In STOC, 1997.
[30] M. Wainwright. Tree-reweighted belief propagation algorithms and approximate ML estimation via pseudo-moment matching. In AISTATS, 2003.
[31] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[32] W. Wei and B. Selman. A new approach to model counting. In Theory and Applications of Satisfiability Testing (SAT), pp. 324-339, 2005.
[33] A. Wigderson. Lectures on the fusion method and derandomization. Technical report, Technical Report SOCS-95.2, School of Computer Science, McGill University, 1995.
[34] M. Yannakakis. Expressing combinatorial optimization problems by linear programs. Journal of Computer and System Sciences, 43(3):441-466, 1991.
[35] C. Yanover, T. Meltzer, and Y. Weiss. Linear programming relaxations and belief propagation: an empirical study. The Journal of Machine Learning Research, 7:1887-1907, 2006.
-----0
Arya, Sunil, Mount, David M., Netanyahu, Nathan S., Silverman, Ruth, and Wu, Angela Y. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM, 1998.
Austerweil, Joseph and Griffiths, Tom. Learning invariant features using the transformed Indian buffet process. In NIPS, 2010.
Beygelzimer, Alina, Kakade, Sham, and Langford, John. Cover trees for nearest neighbor. In ICML, 2006.
Doshi-Velez, Finale and Ghahramani, Zoubin. Correlated non-parametric latent feature models. In UAI, 2009.
Gershman, Samuel J., Frazier, Peter I., and Blei, David M. Distance dependent infinite latent feature models. Technical report, arXiv, 2012.
Gilks, Walter R. and Wild, Pascal. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 1992.
Goldberger, Jacob, Roweis, Sam T., Hinton, Geoffrey E., and Salakhutdinov, Ruslan. Neighbourhood components analysis. In NIPS, 2004.
Görür, Dilan, Jäkel, Frank, and Rasmussen, Carl Edward. A choice model with infinitely many latent features. In ICML, 2006.
Griffiths, Thomas L. and Ghahramani, Zoubin. Infinite latent feature models and the Indian buffet process. In NIPS, 2005.
Griffiths, Thomas L. and Ghahramani, Zoubin. The Indian buffet process: An introduction and review. Journal of Machine Learning Research, 2011.
Indyk, Piotr and Motwani, Rajeev. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC, 1998.
Kemp, Charles, Tenenbaum, Joshua B., Griffiths, Thomas L., Yamada, Takeshi, and Ueda, Naonori. Learning systems of concepts with an infinite relational model. In AAAI, 2006.
Knowles, David and Ghahramani, Zoubin. Infinite sparse factor analysis and infinite independent components analysis. In ICA, 2007.
Lampert, Christoph H., Nickisch, Hannes, and Harmeling, Stefan. Learning to detect unseen object classes by between-class attribute transfer. In CVPR, 2009.
Meeds, Edward, Ghahramani, Zoubin, Neal, Radford M., and Roweis, Sam T. Modeling dyadic data with binary latent factors. In NIPS, 2007.
Miller, Kurt, Griffiths, Thomas, and Jordan, Michael. Nonparametric latent feature models for link prediction. In NIPS, 2009.
Miller, Kurt T., Griffiths, Thomas L., and Jordan, Michael I. The phylogenetic Indian buffet process: A non-exchangeable nonparametric prior for latent features. In UAI, 2008.
Mu, Yadong, Shen, Jialie, and Yan, Shuicheng. Weakly-supervised hashing in kernel space. In CVPR, 2010.
Murray, Iain, Adams, Ryan Prescott, and MacKay, David J. C. Elliptical slice sampling. AISTATS, 2010.
Neal, Radford M. Slice sampling. Annals of Statistics, 2003.
Norouzi, Mohammad, Fleet, David J., and Salakhutdinov, Ruslan. Hamming distance metric learning. In NIPS, 2012.
Osherson, Daniel N., Stern, Joshua, Wilkie, Ormond, Stob, Michael, and Smith, Edward E. Default probability. Cognitive Science, 1991.
Quadrianto, Novi and Lampert, Christoph H. Learning multi-view neighborhood preserving projections. In ICML, 2011.
Rai, Piyush and Daumé III, Hal. Multi-label prediction via sparse infinite CCA. In NIPS, 2009.
Restle, Frank. Psychology of judgment and choice: A theoretical essay. John Wiley & Sons, 1961.
Salakhutdinov, Ruslan and Hinton, Geoffrey E. Semantic Hashing. In SIGIR workshop on Information Retrieval and applications of Graphical Models, 2007.
Schultz, Matthew and Joachims, Thorsten. Learning a distance metric from relative comparisons. In NIPS, 2003.
Teh, Yee Whye, Görür, Dilan, and Ghahramani, Zoubin. Stick-breaking construction for the Indian buffet process. AISTATS, 2007.
Torralba, Antonio B., Fergus, Robert, and Weiss, Yair. Small codes and large image databases for recognition. In CVPR, 2008.
Wang, Jun, Kumar, Sanjiv, and Chang, Shih-Fu. Semi-supervised hashing for large-scale search. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2012.
Weinberger, Kilian Q. and Saul, Lawrence K. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 2009.
Weiss, Yair, Torralba, Antonio, and Fergus, Rob. Spectral hashing. In NIPS, 2009.
Williamson, Sinead, Orbanz, Peter, and Ghahramani, Zoubin. Dependent Indian buffet processes. AISTATS, 2010.
Zhai, Ke, Hu, Yuening, Boyd-Graber, Jordan L., and Williamson, Sinead. Modeling images using transformed Indian buffet processes. In ICML, 2012.
-----0
D. Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(3):507-554, 2002.
D. Chickering, D. Heckerman, and C. Meek. Large-sample learning of Bayesian networks is NP-hard. J. Mach. Learn. Res., 5:1287-1330, December 2004.
T. Claassen and T. Heskes. A logical characterization of constraint-based causal discovery. In Proc. of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 135-144, 2011.
T. Claassen, J. Mooij, and T. Heskes. Proof supplement to "Learning sparse causal models is not NP-hard". Technical report, Faculty of Science, Radboud University Nijmegen, 2013. http://www.cs.ru.nl/~tomc/docs/NPHardSup.pdf.
D. Colombo, M. Maathuis, M. Kalisch, and T. Richardson. Learning high-dimensional DAGs with latent and selection variables. The Annals of Statistics, 40(1):294-321, 2012.
G. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393-405, 1990.
J. Cussens. Bayesian network learning with cutting planes. In Proc. of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 153-160. AUAI Press, 2011.
R. Evans and T. Richardson. Maximum likelihood fitting of acyclic directed mixed graphs to binary data. In Proc. of the 26th Conference on Uncertainty in Artificial Intelligence, pages 177-184, 2010.
M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., 1979.
O. Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge University Press, 2008.
D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.
D. Margaritis and S. Thrun. Bayesian network induction via local neighborhoods. In Advances in Neural Information Processing Systems 12, pages 505-511, 1999.
J. Pearl. Causality: models, reasoning and inference. Cambridge University Press, 2000.
J. Pearl and T. Verma. A theory of inferred causation. In Knowledge Representation and Reasoning: Proc. of the Second Int. Conf., pages 441-452, 1991.
T. Richardson and P. Spirtes. Ancestral graph Markov models. Ann. Stat., 30(4):962-1030, 2002.
P. Spirtes, C. Meek, and T. Richardson. An algorithm for causal inference in the presence of latent variables and selection bias. In Computation, Causation, and Discovery, pages 211-252. AAAI Press, Menlo Park, CA, 1999.
P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. The MIT Press, Cambridge, Massachusetts, 2nd edition, 2000.
J. Tian, A. Paz, and J. Pearl. Finding minimal d-separators. Technical Report R-254, UCLA Cognitive Systems Laboratory, 1998.
C. Yuan and B. Malone. An improved admissible heuristic for learning optimal Bayesian networks. In Proc. of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), pages 924-933, Corvallis, Oregon, 2012. AUAI Press.
J. Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172(16-17):1873-1896, 2008.
-----1
[1] R. Iris Bahar, Erica Frohm, Charles Gaona, Gary Hachtel, Enrico Macii, Abelardo Pardo, and Fabio Somenzi. Algebraic Decision Diagrams and their applications. In IEEE/ACM International Conference on CAD, 1993.
[2] Richard E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
[3] Craig Boutilier, Thomas Dean, and Steve Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. JAIR, 11:1-94, 1999.
[4] Justin Boyan and Michael Littman. Exact solutions to time-dependent MDPs. In Advances in Neural Information Processing Systems NIPS-00, pages 1026-1032, 2001.
[5] John L. Bresina, Richard Dearden, Nicolas Meuleau, Sailesh Ramkrishnan, David E. Smith, and Richard Washington. Planning under continuous time and resource uncertainty: a challenge for AI. In Uncertainty in Artificial Intelligence (UAI-02), 2002.
[6] Thomas Dean and Keiji Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142-150, 1989.
[7] Zhengzhu Feng, Richard Dearden, Nicolas Meuleau, and Richard Washington. Dynamic programming for structured continuous Markov decision problems. In Uncertainty in Artificial Intelligence (UAI-04), pages 154-161, 2004.
[8] Branislav Kveton and Milos Hauskrecht. Learning basis functions in hybrid domains. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), pages 1161-1166, Boston, USA, 2006.
[9] Branislav Kveton, Milos Hauskrecht, and Carlos Guestrin. Solving factored MDPs with hybrid state and action variables. Journal of Artificial Intelligence Research (JAIR), 27:153-201, 2006.
[10] Lihong Li and Michael L. Littman. Lazy approximation for solving continuous finite-horizon MDPs. In National Conference on Artificial Intelligence AAAI-05, pages 1175-1180, 2005.
[11] Janusz Marecki, Sven Koenig, and Milind Tambe. A fast analytical algorithm for solving Markov decision processes with real-valued resources. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2536-2541, 2007.
[12] Nicolas Meuleau, Emmanuel Benazera, Ronen I. Brafman, Eric A. Hansen, and Mausam. A heuristic search approach to planning with continuous resources in stochastic domains. Journal of Artificial Intelligence Research (JAIR), 34:27-59, 2009.
[13] Remi Munos and Andrew Moore. Variable resolution discretization in optimal control. Machine Learning, 49(2-3):291-323, 2002.
[14] Scott Sanner, Karina Valdivia Delgado, and Leliane Nunes de Barros. Symbolic dynamic programming for discrete and continuous state MDPs. In Proceedings of the 27th Conference on Uncertainty in AI (UAI-2011), Barcelona, 2011.
[15] Herbert E. Scarf. Inventory theory. Operations Research, 50(1):186-191, 2002.
[16] Robert St-Aubin, Jesse Hoey, and Craig Boutilier. APRICODD: Approximate policy construction using decision diagrams. In NIPS-2000, pages 1089-1095, Denver, 2000.
[17] Zahra Zamani, Scott Sanner, and Cheng Fang. Symbolic dynamic programming for continuous state and action MDPs. In Jorg Hoffmann and Bart Selman, editors, AAAI. AAAI Press, 2012.
-----1
[1] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77(2):257-285, 1989.
[2] Yoshua Bengio and Paolo Frasconi. An Input Output HMM Architecture. In Advances in Neural Information Processing Systems, 1995.
[3] Michael Littman, Richard Sutton, and Satinder Singh. Predictive representations of state. In Advances in Neural Information Processing Systems (NIPS), 2002.
[4] Byron Boots, Sajid M. Siddiqi, and Geoffrey J. Gordon. Closing the learning-planning loop with predictive state representations. In Proceedings of Robotics: Science and Systems VI, 2010.
[5] David Wingate and Satinder Singh. Exponential family predictive representations of state. In Proc. NIPS, 2007.
[6] David Wingate and Satinder Singh. On discovery and learning of models with predictive representations of state for agents with continuous actions and observations. In Proc. AAMAS, 2007.
[7] A.J. Smola, A. Gretton, L. Song, and B. Scholkopf. A Hilbert space embedding for distributions. In E. Takimoto, editor, Algorithmic Learning Theory, Lecture Notes in Computer Science. Springer, 2007.
[8] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Scholkopf. Injective Hilbert space embeddings of probability measures. 2008.
[9] Kenji Fukumizu, Le Song, and Arthur Gretton. Kernel Bayes' rule. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 1737-1745, 2011.
[10] L. Song, B. Boots, S. M. Siddiqi, G. J. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. In Proc. 27th Intl. Conf. on Machine Learning (ICML), 2010.
[11] Byron Boots and Geoffrey Gordon. Two-manifold problems with applications to nonlinear system identification. In Proc. 29th Intl. Conf. on Machine Learning (ICML), 2012.
[12] Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massimiliano Pontil, and Arthur Gretton. Modelling transition dynamics in MDPs with RKHS embeddings. CoRR, abs/1206.4655, 2012.
[13] Y. Nishiyama, A. Boularias, A. Gretton, and K. Fukumizu. Hilbert space embeddings of POMDPs. 2012.
[14] Judea Pearl. Causality: models, reasoning, and inference. Cambridge University Press, 2000.
[15] Michael Bowling, Peter McCracken, Michael James, James Neufeld, and Dana Wilkinson. Learning predictive state representations using non-blind policies. In Proc. ICML, 2006.
[16] Satinder Singh, Michael James, and Matthew Rudary. Predictive state representations: A new theory for modeling dynamical systems. In Proc. UAI, 2004.
[17] C. Baker. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186:273-289, 1973.
[18] L. Song, J. Huang, A. Smola, and K. Fukumizu. Hilbert space embeddings of conditional distributions. In International Conference on Machine Learning, 2009.
[19] S. Grunewalder, G. Lever, L. Baldassarre, S. Patterson, A. Gretton, and M. Pontil. Conditional mean embeddings as regressors. In ICML, 2012.
[20] Byron Boots and Geoffrey Gordon. An online spectral learning algorithm for partially observable nonlinear dynamical systems. In Proceedings of the 25th National Conference on Artificial Intelligence (AAAI-2011), 2011.
[21] V. Verdult, J.A.K. Suykens, J. Boets, I. Goethals, and B. De Moor. Least squares support vector machines for kernel CCA in nonlinear state-space identification. In Proceedings of the 16th international symposium on Mathematical Theory of Networks and Systems (MTNS2004), Leuven, Belgium, 2004.
[22] Yoshinobu Kawahara, Takehisa Yairi, and Kazuo Machida. A kernel subspace method by stochastic realization for learning nonlinear dynamical systems, 2006.
[23] B. Boots. Learning Stable Linear Dynamical Systems. Data Analysis Project, Carnegie Mellon University, 2009.
[24] Sajid Siddiqi, Byron Boots, and Geoffrey J. Gordon. A constraint generation approach to learning stable linear dynamical systems. In Proc. NIPS, 2007.
-----0
Biere, A., Heule, M. J. H., van Maaren, H., and Walsh, T., editors (2009). Handbook of Satisfiability. IOS Press.
Borboudakis, G., Triantafilou, S., Lagani, V., and Tsamardinos, I. (2011). A constraint-based approach to incorporate prior knowledge in causal models. In Proc. ESANN, pages 321-326.
Borboudakis, G. and Tsamardinos, I. (2012). Incorporating causal prior knowledge as path-constraints in Bayesian networks and maximal ancestral graphs. In Proc. ICML, pages 1799-1806.
Claassen, T. and Heskes, T. (2010). Causal discovery in multiple models from different experiments. In Proc. NIPS, pages 415-423.
Een, N. and Sorensson, N. (2003). Temporal induction by incremental SAT solving. Electr. Notes Theor. Comput. Sci., 89(4):543-560.
Een, N. and Sorensson, N. (2004). An extensible SAT-solver. In Proc. SAT 2003, pages 502-518.
Geiger, D., Verma, T., and Pearl, J. (1990). Identifying independence in Bayesian networks. Networks, 20:507-533.
Hoyer, P. O., Shimizu, S., Kerminen, A. J., and Palviainen, M. (2008). Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49:362-378.
Hyttinen, A., Eberhardt, F., and Hoyer, P. O. (2012). Causal discovery of linear cyclic models from multiple experimental data sets with overlapping variables. In Proc. UAI, pages 387-396.
Jarvisalo, M., Le Berre, D., Roussel, O., and Simon, L. (2012). The international SAT solver competitions. AI Magazine, 33(1):89-92.
Koster, J. T. A. (2002). Marginalizing and conditioning in graphical models. Bernoulli, 8(6):817-840.
Lagani, V., Tsamardinos, I., and Triantafilou, S. (2012). Learning from mixture of experimental data: A constraint-based approach. In Proc. SETN, pages 124-131.
Marques-Silva, J. P. and Sakallah, K. A. (1999). GRASP: A search algorithm for propositional satisfiability. IEEE Trans. Computers, 48(5):506-521.
Moskewicz, M. W., Madigan, C. F., Zhao, Y., Zhang, L., and Malik, S. (2001). Chaff: Engineering an efficient SAT solver. In Proc. DAC, pages 530-535.
Neal, R. (2000). On deducing conditional independence from d-separation in causal graphs with feedback. Journal of Artificial Intelligence Research, 12:87-91.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Pearl, J. and Dechter, R. (1996). Identifying independencies in causal graphs with feedback. In Proc. UAI, pages 420-426.
Peters, J., Janzing, D., and Scholkopf, B. (2010). Identifying cause and effect on discrete data using additive noise models. In Proc. AISTATS, pages 597-604.
Richardson, T. and Spirtes, P. (1999). Automated discovery of linear feedback models. In Glymour, C. and Cooper, G. F., editors, Computation, Causation & Discovery, pages 253-302. AAAI / MIT Press.
Richardson, T. S. (1996). Feedback Models: Interpretation and Discovery. PhD thesis, Carnegie Mellon University.
Schmidt, M. and Murphy, K. (2009). Modeling discrete interventional data using directed cyclic graphical models. In Proc. UAI, pages 487-495.
Spirtes, P. (1995). Directed cyclic graphical representation of feedback models. In Proc. UAI, pages 491-498.
Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search. Springer-Verlag.
Spirtes, P., Meek, C., and Richardson, T. (1999). An algorithm for causal inference in the presence of latent variables and selection bias. In Glymour, C. and Cooper, G. F., editors, Computation, Causation & Discovery, pages 211-252. AAAI / MIT Press.
Studeny, M. (1998). Bayesian networks from the point of view of chain graphs. In Proc. UAI, pages 496-503.
Tillman, R. E., Danks, D., and Glymour, C. (2009). Integrating locally learned causal structures with overlapping variables. In Proc. NIPS 2008, pages 1665-1672.
Triantafillou, S., Tsamardinos, I., and Tollis, I. G. (2010). Learning causal structure from overlapping variable sets. In Proc. AISTATS, pages 860-867.
Tsamardinos, I., Triantafillou, S., and Lagani, V. (2012). Towards integrative causal analysis of heterogeneous data sets and studies. Journal of Machine Learning Research, 13:1097-1157.
Tseitin, G. S. (1983). On the complexity of derivation in propositional calculus. In Automation of Reasoning 2: Classical Papers on Computational Logic 1967-1970, pages 466-483. Springer.
-----0
Aslam, J. A., & Montague, M. (2001). Models for metasearch. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval SIGIR2001 (pp. 276-284). New Orleans, Louisiana, United States: ACM.
Carterette, B., & Petkova, D. (2006). Learning a ranking from pairwise preferences. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval SIGIR2006 (pp. 629-630). Seattle, Washington, USA: ACM.
Chapelle, O., Metzler, D., Zhang, Y., & Grinspan, P. (2009). Expected reciprocal rank for graded relevance. Proceedings of the 18th ACM international conference on Information and knowledge management CIKM2009 (pp. 621-630). Hong Kong, China.
Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval SIGIR2009 (pp. 758-759). New York, NY, USA: ACM.
Dwork, C., Kumar, R., Naor, M., & Sivakumar, D. (2001). Rank aggregation methods for the web. Proceedings of the 10th international conference on World Wide Web WWW2001 (pp. 613-622). New York, NY, USA: ACM.
Farah, M., & Vanderpooten, D. (2007). An outranking approach for rank aggregation in information retrieval. Proceedings of the 30th international ACM SIGIR conference on Research and development in information retrieval SIGIR2007 (pp. 591-598). Amsterdam, The Netherlands: ACM.
Gleich, D. F., & Lim, L.-h. (2011). Rank aggregation via nuclear norm minimization. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining KDD2011 (pp. 60-68). San Diego, California, USA: ACM.
Guiver, J., & Snelson, E. (2009). Bayesian inference for Plackett-Luce ranking models. Proceedings of the 26th Annual International Conference on Machine Learning ICML2009 (pp. 377-384). New York, NY, USA: ACM.
Hong, L., Bekkerman, R., Adler, J., & Davison, B. D. (2012). Learning to rank social update streams. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval SIGIR2012 (pp. 651-660). Portland, Oregon, USA: ACM.
Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 20, 422-446.
Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Rev., 51, 455-500.
Lebanon, G., & Lafferty, J. D. (2002). Cranking: Combining rankings using conditional probability models on permutations. Proceedings of the 19th Annual International Conference on Machine Learning ICML2002 (pp. 363-370).
Liu, Y.-T., Liu, T.-Y., Qin, T., Ma, Z.-M., & Li, H. (2007). Supervised rank aggregation. Proceedings of the 16th international conference on World Wide Web WWW2007 (pp. 481-490). New York, NY, USA: ACM.
Moffat, A., & Zobel, J. (2008). Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst., 27, 2:1-2:27.
Qin, T., Geng, X., & Liu, T.-Y. (2010). A new probabilistic model for rank aggregation. Advances in Neural Information Processing Systems 23 NIPS2010 (pp. 1948-1956).
Thurstone, L. L. (1927). The method of paired comparisons for social values. The Journal of Abnormal and Social Psychology, 21, 384-400.
Volkovs, M. N., Larochelle, H., & Zemel, R. S. (2012). Learning to rank by aggregating expert preferences. Proceedings of the 21st ACM international conference on Information and knowledge management CIKM2012 (pp. 843-851). Maui, Hawaii, USA: ACM.
Volkovs, M. N., & Zemel, R. S. (2012). A flexible generative model for preference aggregation. Proceedings of the 21st international conference on World Wide Web WWW2012 (pp. 479-488). New York, NY, USA: ACM.
Voorhees, E. M. (2002). The philosophy of information retrieval evaluation. CLEF '01 (pp. 355-370). Springer-Verlag.
-----0
Bekkerman, R., Bilenko, M., and Langford, J. (2011). Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge Univ. Press, NY.
Cao, N., Low, K. H., and Dolan, J. M. (2013). Multi-robot informative path planning for active sensing of environmental phenomena: A tale of two algorithms. In Proc. AAMAS, pages 7-14.
Chang, E. Y., Zhu, K., Wang, H., Bai, H., Li, J., Qiu, Z., and Cui, H. (2007). Parallelizing support vector machines on distributed computers. In Proc. NIPS.
Chen, J., Low, K. H., Tan, C. K.-Y., Oran, A., Jaillet, P., Dolan, J. M., and Sukhatme, G. S. (2012). Decentralized data fusion and active sensing with mobile sensors for modeling and predicting spatiotemporal traffic phenomena. In Proc. UAI, pages 163-173.
Chen, J., Cao, N., Low, K. H., Ouyang, R., Tan, C. K.-Y., and Jaillet, P. (2013). Parallel Gaussian process regression with low-rank covariance matrix approximations. arXiv:1305.5826.
Choudhury, A., Nair, P. B., and Keane, A. J. (2002). A data parallel approach for large-scale Gaussian process modeling. In Proc. SDM, pages 95-111.
Das, K. and Srivastava, A. N. (2010). Block-GP: Scalable Gaussian process regression for multimodal data. In Proc. ICDM, pages 791-796.
Dolan, J. M., Podnar, G., Stancliff, S., Low, K. H., Elfes, A., Higinbotham, J., Hosler, J. C., Moisan, T. A., and Moisan, J. (2009). Cooperative aquatic sensing using the telesupervised adaptive ocean sensor fleet. In Proc. SPIE Conference on Remote Sensing of the Ocean, Sea Ice, and Large Water Regions, volume 7473.
Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. JCGS, 15(3), 502-523.
Golub, G. H. and Van Loan, C.-F. (1996). Matrix Computations. Johns Hopkins Univ. Press, third edition.
Ingram, B. and Cornford, D. (2010). Parallel geostatistics for sparse and dense datasets. In P. M. Atkinson and C. D. Lloyd, editors, Proc. geoENV VII, pages 371-381. Quantitative Geology and Geostatistics Volume 16, Springer, Netherlands.
Krause, A., Singh, A., and Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, 9, 235-284.
Lawrence, N. D., Seeger, M., and Herbrich, R. (2003). Fast sparse Gaussian process methods: The informative vector machine. In Advances in Neural Information Processing Systems 15, pages 609-616. MIT Press.
Low, K. H., Gordon, G. J., Dolan, J. M., and Khosla, P. (2007). Adaptive sampling for multi-robot wide-area exploration. In Proc. IEEE ICRA, pages 755-760.
Low, K. H., Dolan, J. M., and Khosla, P. (2011). Active Markov information-theoretic path planning for robotic environmental sensing. In Proc. AAMAS, pages 753-760.
Low, K. H., Chen, J., Dolan, J. M., Chien, S., and Thompson, D. R. (2012). Decentralized active robotic exploration and mapping for probabilistic field classification in environmental sensing. In Proc. AAMAS, pages 105-112.
Park, C., Huang, J. Z., and Ding, Y. (2011). Domain decomposition approach for fast Gaussian process regression of large spatial data sets. JMLR, 12, 1697-1728.
Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G. E., Gabriel, E., and Dongarra, J. (2007). Performance analysis of MPI collective operations. Cluster Computing, 10(2), 127-143.
Podnar, G., Dolan, J. M., Low, K. H., and Elfes, A. (2010). Telesupervised remote surface water quality sensing. In Proc. IEEE Aerospace Conference.
Quinonero-Candela, J. and Rasmussen, C. E. (2005). A unifying view of sparse approximate Gaussian process regression. JMLR, 6, 1939-1959.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA.
Schwaighofer, A. and Tresp, V. (2002). Transductive and inductive methods for approximate Gaussian process regression. In Proc. NIPS, pages 953-960.
Seeger, M. and Williams, C. (2003). Fast forward selection to speed up sparse Gaussian process regression. In Proc. AISTATS.
Snelson, E. (2007). Local and global sparse Gaussian process approximations. In Proc. AISTATS.
Snelson, E. and Ghahramani, Z. (2005). Sparse Gaussian processes using pseudo-inputs. In Proc. NIPS.
Vanhatalo, J. and Vehtari, A. (2008). Modeling local and global phenomena with sparse Gaussian processes. In Proc. UAI, pages 571-578.
Vijayakumar, S., D'Souza, A., and Schaal, S. (2005). Incremental online learning in high dimensions. Neural Comput., 17(12), 2602-2634.
Williams, C. K. I. and Seeger, M. (2000). Using the Nystrom method to speed up kernel machines. In Proc. NIPS, pages 682-688.
Yu, J., Low, K. H., Oran, A., and Jaillet, P. (2012). Hierarchical Bayesian nonparametric approach to modeling and learning the wisdom of crowds of urban traffic route planning agents. In Proc. IAT, pages 478-485.
-----0
Barandela, Ricardo and Gasca, Eduardo. Decontamination of Training Samples for Supervised Pattern Recognition Methods. In Advances in Pattern Recognition, volume 1876 of Lecture Notes in Computer Science, pp. 621-630. 2000.
Bootkrajang, Jakramate and Kaban, Ata. Label-Noise Robust Logistic Regression and Its Applications. In Proceedings of the 2012 European conference on Machine learning and knowledge discovery in databases Volume Part I, ECML-PKDD'12, pp. 143-158. Springer-Verlag, 2012.
Bootkrajang, Jakramate and Kaban, Ata. Classification of mislabelled microarrays using robust sparse logistic regression. Bioinformatics, 29(7):870-877, 2013.
Bouveyron, Charles and Girard, Stephane. Robust supervised classification with mixture models: Learning from data with uncertain labels. Pattern Recognition, 42(11):2649-2658, 2009.
Fan, Wei and Stolfo, Salvatore J. AdaCost: misclassification cost-sensitive boosting. In Proceedings of the 16th International Conference on Machine Learning, pp. 97-105. Morgan Kaufmann, 1999.
Frank, J. Andrew and Asuncion, Arthur. UCI Machine Learning Repository, 2010. URL http://archive.ics.uci.edu/ml.
Freund, Yoav. A more robust boosting algorithm. Technical Report arXiv:0905.2138, 2009.
Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert. Additive Logistic Regression: a Statistical View of Boosting. Annals of Statistics, 28, 1998.
Hernandez-Lobato, Daniel, Hernandez-Lobato, Jose Miguel, and Dupont, Pierre. Robust Multi-Class Gaussian Process Classification. In NIPS, pp. 280-288, 2011.
Karmaker, Amitava and Kwek, Stephen. A boosting approach to remove class label noise. International Journal of Hybrid Intelligent Systems, 3(3):169-177, August 2006.
Krieger, Abba, Long, Chuan, and Wyner, Abraham. Boosting Noisy Data. In Proceedings of the 18th International Conference on Machine Learning, ICML'01, pp. 274-281. Morgan Kaufmann, 2001.
Lawrence, Neil D. and Scholkopf, Bernhard. Estimating a Kernel Fisher Discriminant in the Presence of Label Noise. In Proceedings of the 18th International Conference on Machine Learning, pp. 306-313. Morgan Kaufmann, 2001.
Long, Philip M. and Servedio, Rocco A. Random classification noise defeats all convex potential boosters. Machine Learning, 78(3):287-304, March 2010.
Masnadi-Shirazi, Hamed and Vasconcelos, Nuno. Cost-sensitive boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):294-309, 2011.
Niculescu-Mizil, Alexandru and Caruana, Rich. Obtaining Calibrated Probabilities from Boosting. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, UAI'05, pp. 413-420. AUAI Press, 2005.
Platt, John C. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In Advances in large margin classifiers, pp. 61-74. MIT Press, 1999.
Raykar, Vikas C., Yu, Shipeng, Zhao, Linda H., Valadez, Gerardo Hermosillo, Florin, Charles, Bogoni, Luca, and Moy, Linda. Learning From Crowds. Journal of Machine Learning Research, 11:1297-1322, 2010.
Robertson, Tim, Wright, F. T., and Dykstra, Richard L. Order Restricted Statistical Inference. John Wiley and Sons, New York, 1988.
Takenouchi, Takashi, Eguchi, Shinto, Murata, Noboru, and Kanamori, Takafumi. Robust boosting algorithm against mislabeling in multiclass problems. Neural Computation, 20(6):1596-1630, June 2008.
Vezhnevets, Alexander and Vezhnevets, Vladimir. Modest AdaBoost: Teaching AdaBoost to Generalize Better. In GraphiCon, Novosibirsk Akademgorodok, Russia, 2005.
-----0
Steven J. Bradtke and Andrew G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
Alborz Geramifard, Finale Doshi, Joshua Redding, Nicholas Roy, and Jonathan How. Online discovery of feature dependencies. In Lise Getoor and Tobias Scheffer, editors, International Conference on Machine Learning (ICML), pages 881-888. ACM, June 2011.
Alborz Geramifard, Robert H Klein, and Jonathan P How. RLPy: The Reinforcement Learning Library for Education and Research. http://acl.mit.edu/RLPy, April 2013.
Noah D. Goodman, Joshua B. Tenenbaum, Thomas L. Griffiths, and Jacob Feldman. Compositionality in rational analysis: Grammar-based induction for concept learning. In M. Oaksford and N. Chater, editors, The probabilistic mind: Prospects for Bayesian cognitive science, 2008.
Carlos Guestrin, Daphne Koller, and Ronald Parr. Max-norm projections for factored MDPs. In International Joint Conference on Artificial Intelligence (IJCAI), pages 673-682, 2001.
Michail G. Lagoudakis and Ronald Parr. Least-squares policy iteration. Journal of Machine Learning Research (JMLR), 4:1107-1149, 2003.
Stephen Lin and Robert Wright. Evolutionary tile coding: An automated state abstraction algorithm for reinforcement learning. In AAAI Workshop: Abstraction, Reformulation, and Approximation, Atlanta, Georgia, USA, 2010.
Sridhar Mahadevan, Mauro Maggioni, and Carlos Guestrin. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research (JMLR), 8:2007, 2006.
Shie Mannor and Doina Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. In International Conference on Machine Learning (ICML), pages 449-456. ACM Press, 2006.
Christopher Painter-Wakefield and Ronald Parr. Greedy algorithms for sparse reinforcement learning. In International Conference on Machine Learning (ICML), pages 968-975. ACM, 2012.
Ronald Parr, Christopher Painter-Wakefield, Lihong Li, and Michael Littman. Analyzing feature generation for value-function approximation. In International Conference on Machine Learning (ICML), pages 737-744, New York, NY, USA, 2007. ACM.
Bohdana Ratitch and Doina Precup. Sparse distributed memories for on-line value-based reinforcement learning. In European Conference on Machine Learning (ECML), pages 347-358, 2004.
David Silver, Richard S. Sutton, and Martin Muller. Sample-based learning and search with permanent and transient memories. In International Conference on Machine Learning (ICML), pages 968-975, New York, NY, USA, 2008. ACM.
Peter Stone, Richard S. Sutton, and Gregory Kuhlmann. Reinforcement learning for RoboCup-soccer keepaway. International Society for Adaptive Behavior, 13(3):165-188, 2005.
Nathan R. Sturtevant and Adam M. White. Feature construction for reinforcement learning in Hearts. In 5th International Conference on Computers and Games, 2006.
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
Shimon Whiteson, Matthew E. Taylor, and Peter Stone. Adaptive tile coding for value function approximation. Technical Report AI-TR-07-339, University of Texas at Austin, 2007.
-----1
Charles W. Anderson. Learning and Problem Solving with Multilayer Connectionist Systems. PhD thesis, University of Massachusetts, Department of Computer and Information Science, Amherst, MA, USA, 1986.
A.G. Barto, R.S. Sutton, and C. Anderson. Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834-846, 1983.
Richard Bellman. A Markovian decision process. Indiana Univ. Math. J., 6:679-684, 1957. ISSN 0022-2518.
Dimitri P. Bertsekas. Distributed dynamic programming. IEEE Transactions on Automatic Control, AC-27:610-616, 1982.
C. Buyukkoc, Pravin Varaiya, and Jean Walrand. Extensions of the multi-armed bandit problem. Technical Report UCB/ERL M83/14, EECS Department, University of California, Berkeley, 1983.
Q. Chen. Approximate Kalman Filtering. Series in Approximations and Decompositions. World Scientific, 1993. ISBN 9789810213596.
David Choi and Benjamin Van Roy. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems, 16(2):207-239, 2006.
Peter Dayan. The convergence of TD(λ) for general λ. Machine Learning, 8:341-362, 1992.
A. Doucet, N. De Freitas, and N. Gordon. Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, 2001. ISBN 9780387951461.
Damien Ernst, Pierre Geurts, Louis Wehenkel, and L. Littman. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503-556, 2005.
Răzvan V. Florian. Correct equations for the dynamics of the cart-pole, 2007.
Jurgen Forster and Manfred K. Warmuth. Relative loss bounds for temporal-difference learning. Machine Learning, 51(1):23-50, 2003.
N.J. Gordon, D.J. Salmond, and A.F.M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Radar and Signal Processing, IEE Proceedings F, 140(2):107-113, Apr 1993. ISSN 0956-375X.
Ronald A. Howard. Dynamic Programming and Markov Processes. The M.I.T. Press, Cambridge, MA, USA, 1960. ISBN 0-262-08009-5.
Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35-45, 1960.
G. Kitagawa and W. Gersch. Smoothness Priors Analysis of Time Series. Lecture Notes in Statistics. Springer, 1996. ISBN 9780387948195.
Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009. ISBN 978-0-262-01319-2.
Michail G. Lagoudakis, Ronald Parr, and L. Bartlett. Least-squares policy iteration. Journal of Machine Learning Research, 4:2003, 2003.
D. Michie and R.A. Chambers. BOXES: An experiment in adaptive control. In E. Dale and D. Michie, editors, Machine Intelligence 2, pages 137-152. Oliver and Boyd, Edinburgh, 1968.
Fernando J. Pineda. Mean-field theory for batched TD(λ). Neural Comput., 9(7):1403-1419, October 1997. ISSN 0899-7667.
Ben Van Roy. Learning and Value Function Approximation in Complex Decision Processes. PhD thesis, MIT, 1998.
Donald B. Rubin. The SIR algorithm. Journal of the American Statistical Association, 82(398):543-546, 1987. ISSN 01621459.
Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, second edition, 2003. ISBN 978-81-203-2382-7.
Robert E. Schapire and Manfred K. Warmuth. On the worst-case analysis of temporal-difference learning algorithms. Machine Learning, 22(1-3):95-121, 1996.
Jürgen Schmidhuber. Networks adjusting networks. In Proceedings of Distributed Adaptive Neural Information Processing, St. Augustin, pages 24-25. Oldenbourg, 1990.
J. Si and Yu-Tsung Wang. On-line learning control by association and reinforcement. Neural Networks, IEEE Transactions on, 12(2):264-276, Mar 2001. ISSN 1045-9227. doi: 10.1109/72.914523.
Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988. ISSN 0885-6125. doi: 10.1007/BF00115009.
John N. Tsitsiklis and Benjamin Van Roy. Analysis of temporal-difference learning with function approximation. In NIPS, pages 1075-1081, 1996.
C. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, England, 1989.
Greg Welch and Gary Bishop. An introduction to the Kalman filter. Technical report, SIGGRAPH 2001, 2001. Course Notes.
-----0
Antonucci, A., Bruhlmann, R., Piatti, A., and Zaffalon, M. (2009). Credal networks for military identification problems. International Journal of Approximate Reasoning, 50(4):666-679.
Antonucci, A. and Piatti, A. (2009). Modeling unreliable observations in Bayesian networks by credal networks. In Proceedings of the 3rd International Conference on Scalable Uncertainty Management (SUM), volume 5785 of Lecture Notes in Computer Science, pages 28-39. Springer.
Antonucci, A., Piatti, A., and Zaffalon, M. (2007). Credal networks for operational risk measurement and management. In International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES), volume LNCS 4693, pages 604-611.
Antonucci, A. and Zaffalon, M. (2006). Equivalence between Bayesian and credal nets on an updating problem. In Proceedings of 3rd International Conference on Soft Methods in Probability and Statistics (SMPS), pages 223-230. Springer.
Benavoli, A., Zaffalon, M., and Miranda, E. (2011). Robust filtering through coherent lower previsions. IEEE Transactions on Automatic Control, 56(7):1567-1581.
Cozman, F. G. (2000). Credal networks. Artificial Intelligence, 120(2):199-233.
Cozman, F. G., De Campos, C. P., Ide, J. S., and da Rocha, J. C. F. (2004). Propositional and relational Bayesian networks associated with imprecise and qualitative probabilistic assessments. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI), pages 104-111. AUAI Press.
de Campos, C. P. (2011). New complexity results for MAP in Bayesian networks. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2100-2106. AAAI Press.
de Campos, C. P. and Cozman, F. G. (2005). The inferential complexity of Bayesian and credal networks. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), pages 1313-1318, San Francisco, CA, USA. Morgan Kaufmann.
de Cooman, G., Hermans, F., Antonucci, A., and Zaffalon, M. (2010). Epistemic irrelevance in credal nets: the case of imprecise Markov trees. International Journal of Approximate Reasoning, 51(9):1029-1052.
de Cooman, G. and Troffaes, M. (2004). Coherent lower previsions in systems modelling: products and aggregation rules. Reliability engineering & system safety, 85(1-3):113-134.
Fagiuoli, E. and Zaffalon, M. (1998). 2U: An exact interval propagation algorithm for polytrees with binary variables. Artificial Intelligence, 106(1):77-107.
Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models. MIT press.
Kwisthout, J. and van der Gaag, L. C. (2008). The computational complexity of sensitivity analysis and parameter tuning. In Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence (UAI), pages 349-356. AUAI Press.
Levi, I. (1980). The Enterprise of Knowledge. MIT Press, London.
Maua, D. D., de Campos, C. P., and Zaffalon, M. (2011). Solving limited memory influence diagrams. CoRR, abs/1109.1754.
Maua, D. D., de Campos, C. P., and Zaffalon, M. (2012). Solving limited memory influence diagrams. Journal of Artificial Intelligence Research, 44:97-140.
Maua, D. D., de Campos, C. P., and Zaffalon, M. (2012). Updating credal networks is approximable in polynomial time. International Journal of Approximate Reasoning, 53(8):1183-1199.
Park, J. D. and Darwiche, A. (2004). Complexity results and approximation strategies for MAP explanations. Journal of Artificial Intelligence Research, 21:101-133.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, California.
Piatti, A., Antonucci, A., and Zaffalon, M. (2010). Building Knowledge-Based Systems by Credal Networks: A Tutorial. Nova Science.
Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London.
Zaffalon, M. and Miranda, E. (2009). Conservative inference rule for uncertain reasoning under incompleteness. Journal of Artificial Intelligence Research, 34:757-821.
-----0
J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), volume 3, page 3, 2008.
Jean-Yves Audibert, Remi Munos, and Csaba Szepesvari. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci., 2009a.
J.Y. Audibert, R. Munos, and C. Szepesvari. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876-1902, 2009b.
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235-256, 2002a.
P. Auer, N. Cesa-Bianchi, Y. Freund, and R.E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002b.
M.F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. JCSS, 75(1), 2009.
Y. Baram, R. El-Yaniv, and K. Luz. Online choice of active learning algorithms. The Journal of Machine Learning Research, 5:255-291, 2004.
P.L. Bartlett, V. Dani, T. Hayes, S. Kakade, A. Rakhlin, and A. Tewari. High-probability regret bounds for bandit online linear optimization. COLT, 2008.
E.B. Baum and K. Lang. Query learning can work poorly when a human oracle is used. In IJCNN, 1992.
Sebastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721, 2012.
W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. Tseng. Unbiased online active learning in data streams. In SIGKDD, 2011.
D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2), 1994.
S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. NIPS, 2007.
Patrick Flaherty, Michael I. Jordan, and Adam P. Arkin. Robust design of biological experiments. In Neural Information Processing Systems, 2005.
R. Ganti and A. Gray. UPAL: Unbiased pool based active learning. arXiv preprint arXiv:1111.1784, 2011.
Y. Guo and R. Greiner. Optimistic active learning using mutual information. In IJCAI, 2007.
S. Hanneke and L. Yang. Surrogate losses in passive and active learning. arXiv preprint arXiv:1207.3772, 2012.
S.C.H. Hoi, R. Jin, J. Zhu, and M.R. Lyu. Batch mode active learning and its application to medical image classification. In ICML, 2006.
D.D. Lewis and W.A. Gale. A sequential algorithm for training text classifiers. In SIGIR, 1994.
Yurii Nesterov and A Nemirovsky. Interior point polynomial methods in convex programming, 1994.
Antoine Salomon and Jean-Yves Audibert. Deviations of stochastic bandit regret. In Algorithmic Learning Theory, pages 159-173. Springer, 2011.
B. Settles and M. Craven. An analysis of active learning strategies for sequence labeling tasks. In EMNLP, 2008.
Burr Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009.
H.S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In COLT, pages 287-294. ACM, 1992.
S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proceedings of the ninth ACM international conference on Multimedia, 2001.
T. Zhang and F. Oles. The value of unlabeled data for classification problems. In ICML, 2000.
Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In ICML, 2003.
-----1
[1] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
[2] R. Balla and A. Fern. UCT for tactical assault planning in real-time strategy games. In IJCAI, pages 40-45, 2009.
[3] R. Bjarnason, A. Fern, and P. Tadepalli. Lower bounding Klondike Solitaire with Monte-Carlo planning. In ICAPS, 2009.
[4] B. Bonet and H. Geffner. Action selection for MDPs: Anytime AO* vs. UCT. In AAAI, 2012.
[5] S. Bubeck and R. Munos. Open loop optimistic planning. In COLT, pages 477-489, 2010.
[6] S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in finitely-armed and continuous-armed bandits. Theor. Comput. Sci., 412(19):1832-1852, 2011.
[7] L. Busoniu and R. Munos. Optimistic planning for Markov decision processes. In AISTATS, number 22 in JMLR (Proceedings Track), pages 182-189, 2012.
[8] T. Cazenave. Nested Monte-Carlo search. In IJCAI, pages 456-461, 2009.
[9] P-A. Coquelin and R. Munos. Bandit algorithms for tree search. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), pages 67-74, Vancouver, BC, Canada, 2007.
[10] P. Eyerich, T. Keller, and M. Helmert. High-quality policies for the Canadian Traveler's problem. In AAAI, 2010.
[11] Z. Feldman and C. Domshlak. Simple regret optimization in online planning for Markov decision processes. CoRR, arXiv:1206.3382v2 [cs.AI], 2012.
[12] S. Gelly and D. Silver. Monte-Carlo tree search and rapid action value estimation in computer Go. AIJ, 175(11):1856-1875, 2011.
[13] N. Hay, S. E. Shimony, D. Tolpin, and S. Russell. Selecting computations: Theory and applications. In UAI, 2012.
[14] T. Keller and P. Eyerich. Probabilistic planning based on UCT. In ICAPS, 2012.
[15] L. Kocsis and C. Szepesvari. Bandit based Monte-Carlo planning. In ECML, pages 282-293, 2006.
[16] A. Kolobov, Mausam, and D. Weld. LRTDP vs. UCT for online probabilistic planning. In AAAI, 2012.
[17] A. Y. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In UAI, 2000.
[18] L. Peret and F. Garcia. On-line search for solving Markov decision processes via heuristic sampling. In ECAI, pages 530-534, 2004.
[19] M. Puterman. Markov Decision Processes. Wiley, 1994.
[20] H. Robbins. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 58(5):527-535, 1952.
[21] C. D. Rosin. Nested rollout policy adaptation for Monte Carlo tree search. In IJCAI, pages 649-654, 2011.
[22] N. Sturtevant. An analysis of UCT in multi-player games. In CCG, pages 37-49, 2008.
[23] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[24] D. Tolpin and S. E. Shimony. Doing better than UCT: Rational Monte Carlo sampling in trees. CoRR, arXiv:1108.3711v1 [cs.AI], 2011.
[25] D. Tolpin and S. E. Shimony. MCTS based on simple regret. In AAAI, 2012.
[26] S. W. Yoon, A. Fern, R. Givan, and S. Kambhampati. Probabilistic planning via determinization in hindsight. In AAAI, pages 1010-1016, 2008.
-----1
[1] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243-272, 2008.
[2] A. Argyriou, C. A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multi-task structure learning. NIPS, 2008.
[3] S. Bengio, F. Pereira, Y. Singer, and D. Strelow. Group sparse coding. NIPS, 22, 2009.
[4] J. Bigot, R. Biscay, J. M. Loubes, and L. M. Alvarez. Group Lasso estimation of high-dimensional covariance matrices. Journal of Machine Learning Research, 2011.
[5] J. Bigot, R. Biscay, J. M. Loubes, and L. Muniz-Alvarez. Nonparametric estimation of covariance functions by model selection. EJS, 4, 2010.
[6] E.J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? JACM, 2011.
[7] V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, and A.S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 2011.
[8] J. Fan, Y. Fan, and J. Lv. High dimensional covariance matrix estimation using a factor model. Journal of Econometrics, 147(1):186-197, 2008.
[9] Pinghua Gong, Jieping Ye, and Changshui Zhang. Multi-stage multi-task feature learning. NIPS, 2012.
[10] D.A. Harville. Maximum likelihood approaches to variance component estimation and to related problems. JASA, pages 320-338, 1977.
[11] D. Hsu, S. M. Kakade, and T. Zhang. Robust Matrix Decomposition with Outliers. IEEE Transactions on Information Theory, 2011.
[12] J. Huang and T. Zhang. The benefit of group sparsity. The Annals of Statistics, 38(4):1978-2004, 2010.
[13] K. Lounici, M. Pontil, A. B. Tsybakov, and S. van de Geer. Taking advantage of sparsity in multi-task learning. COLT09, 2009.
[14] J. A. Tropp. Algorithms for simultaneous sparse approximation. Part II: Convex relaxation. Signal Processing, 86:589-602, 2006.
[15] D. P. Wipf and B. D. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. Signal Processing, IEEE Transactions on, 55(7):3704-3716, 2007.
[16] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. JRSS: Series B, 68, 2006.
[17] Tong Zhang. Analysis of multi-stage convex relaxation for sparse regularization. JMLR, 2010.
-----0
S. Amizadeh, S. Wang, and M. Hauskrecht. An efficient framework for constructing generalized locally-induced text metrics. In IJCAI-11, pages 1159-1164, 2011.
S. Amizadeh, B. Thiesson, and M. Hauskrecht. Variational dual-tree framework for large-scale transition matrix approximation. In the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12), pages 64-73, 2012a.
S. Amizadeh, H. Valizadegan, and M. Hauskrecht. Factorized diffusion map approximation. Journal of Machine Learning Research-Proceedings Track, 22:37-46, 2012b.
A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha. A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. In ACM SIGKDD, pages 509-514, 2004.
A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. JMLR, 6:1705-1749, 2005.
M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15:1373-1396, 2002.
L. Cayton. Fast nearest neighbor retrieval for Bregman divergences. In ICML, pages 112-119, 2008.
H. Deng, J. Han, B. Zhao, Y. Yu, and C. X. Lin. Probabilistic topic models with biased propagation on heterogeneous information networks. In KDD, pages 1271-1279, 2011.
I. Dhillon and S. Sra. Generalized nonnegative matrix approximations with Bregman divergences. 2005.
D. Greene and P. Cunningham. Practical solutions to the problem of diagonal dominance in kernel document clustering. In ICML, pages 377-384, 2006.
T. Jebara, J. Wang, and S. Chang. Graph construction and b-matching for semi-supervised learning. In ICML, volume 382, page 56. ACM, 2009.
S. Kumar, M. Mohri, and A. Talwalkar. Sampling techniques for the Nyström method. Journal of Machine Learning Research Proceedings Track, 5:304-311, 2009.
Ken Lang. NewsWeeder: Learning to filter netnews. In Machine Learning International Workshop, pages 331-339. Morgan Kaufmann Publishers, Inc., 1995.
D. Lee, A.G. Gray, and A.W. Moore. Dual-tree fast Gauss transforms. arXiv preprint arXiv:1102.2878, 2011.
A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. Learning word vectors for sentiment analysis. In Proc. of 49th An. Meeting of the Assoc. for Comp. Ling., pages 142-150, June 2011.
A. W. Moore. The anchors hierarchy: Using the triangle inequality to survive high dimensional data. In UAI, pages 397-405, 2000.
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, 2001a.
A. Y. Ng, A. X. Zheng, and M. I. Jordan. Stable algorithms for link analysis. In SIGIR '01, pages 258-266, August 2001b.
A. Y. Ng, A. X. Zheng, and M. I. Jordan. Link analysis, eigenvectors and stability. IJCAI, pages 903-910, 2001c.
L. Qiao, S. Chen, and X. Tan. Sparsity preserving projections with applications to face recognition. Pattern Recognition, 43(1):331-341, 2010.
T. Šingliar and M. Hauskrecht. Noisy-or component analysis and its application to link analysis. The Journal of Machine Learning Research, 7:2189-2213, 2006.
A. Talwalkar, S. Kumar, and H. A. Rowley. Large-scale manifold learning. In CVPR, pages 1-8, 2008.
M. Telgarsky and S. Dasgupta. Agglomerative Bregman clustering. arXiv:1206.6446, 2012.
B. Thiesson and J. Kim. Fast variational mode-seeking. In AISTATS 2012, JMLR W&CP 22. Journal of Machine Learning Research, 2012.
U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395-416, 2007.
C. Yang, R. Duraiswami, N.A. Gumerov, and L. Davis. Improved fast Gauss transform and efficient kernel density estimation. In Computer Vision, pages 664-671, 2003.
C. Yang, R. Duraiswami, L. Davis, et al. Efficient kernel machines using the improved fast Gauss transform. NIPS, 17:1561-1568, 2005.
L. Zhang, S. Chen, and L. Qiao. Graph optimization for dimensionality reduction with sparsity constraints. Pattern Recognition, 45(3):1205-1210, 2012.
Z. Zhang, B. C. Ooi, S. Parthasarathy, and A. KH Tung. Similarity search on Bregman divergence: Towards non-metric indexing. Proc. of the VLDB Endowment, 2(1):13-24, 2009.
D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In NIPS. MIT Press, 2003.
X. Zhu. Semi-supervised learning with graphs. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2005.
-----1
[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.
[2] F. Bach. Structured sparsity-inducing norms through submodular functions. In Advances in Neural Information Processing Systems 23 (NIPS 2010), pages 118-126, 2010.
[3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
[4] A. Chambolle and J. Darbon. On total variation minimization and surface evolution using parametric maximal flows. International Journal of Computer Vision, 84(3), 2009.
[5] P. Boldi et al. Laboratory for Web Algorithmics. http://law.di.unimi.it/datasets.php.
[6] L. Fleischer and S. Iwata. A push-relabel framework for submodular function minimization and applications to parametric optimization. Discrete Appl. Math., 131:311-322, 2003.
[7] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1(2):302-332, 2007.
[8] S. Fujishige. Lexicographically optimal base of a polymatroid with respect to a weight vector. Mathematics of Operations Research, 5:186-196, 1980.
[9] S. Fujishige. Submodular Functions and Optimization. Elsevier, 2nd edition, 2005.
[10] S. Fujishige, T. Hayashi, and S. Isotani. The minimum-norm-point algorithm applied to submodular function minimization and linear programming. Technical report, Research Institute for Mathematical Sciences Preprint RIMS-1571, Kyoto University, Kyoto, Japan, 2006.
[11] G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing, 18:30-55, 1989.
[12] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum flow problem. Journal of the ACM, 35:921-940, 1988.
[13] S. Iwata and K. Nagano. Submodular function minimization under covering constraints. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2009), pages 671-680, 2009.
[14] S. Jegelka, H. Lin, and J. Bilmes. On fast approximate submodular minimization. In Advances in Neural Information Processing Systems 24 (NIPS 2011), pages 460-468, 2011.
[15] Pushmeet Kohli, Lubor Ladicky, and Philip H.S. Torr. Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision, 82:302-324, 2009.
[16] V. Kolmogorov. A faster algorithm for computing the principal sequence of partitions of a graph. Algorithmica, 56:394-412, 2010.
[17] L. Lovász. Submodular functions and convexity. In A. Bachem, M. Grötschel, and B. Korte, editors, Mathematical Programming: The State of the Art, pages 235-257. Springer-Verlag, 1983.
[18] J. Mairal, F. Bach, J. Ponce, G. Sapiro, R. Jenatton, and G. Obozinski. SPArse Modeling Software. http://spams-devel.gforge.inria.fr/.
[19] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Convex and network flow optimization for structured sparsity. Journal of Machine Learning Research, 12:2681-2720, 2011.
[20] N. Megiddo. Optimal flows in networks with multiple sources and sinks. Mathematical Programming, 7:97-107, 1974.
[21] K. Nagano. A faster parametric submodular function minimization algorithm and applications. Technical report, METR 2007-43, University of Tokyo, Tokyo, Japan, 2007.
[22] K. Nagano and K. Aihara. Equivalence of convex minimization problems over base polytopes. Japan Journal of Industrial and Applied Mathematics, 29:519-534, 2012.
[23] K. Nagano, Y. Kawahara, and K. Aihara. Size-constrained submodular minimization through minimum norm base. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pages 977-984, 2011.
[24] K. Nagano, Y. Kawahara, and S. Iwata. Minimum average cost clustering. In Advances in Neural Information Processing Systems 23 (NIPS 2010), pages 1759-1767, 2010.
[25] M. Narasimhan, N. Jojic, and J. Bilmes. Q-clustering. In Advances in Neural Information Processing Systems 18 (NIPS 2005), pages 979-986, 2005.
[26] Yu. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103:127-152, 2005.
[27] J. B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. Mathematical Programming, 118:237-251, 2009.
[28] J. B. Orlin. Max flows in O(nm) time or less. In Proceedings of the 45th ACM Symposium on the Theory of Computing (STOC 2013), 2013. To appear.
[29] M. Queyranne. Minimizing symmetric submodular functions. Mathematical Programming, 82:3-12, 1998.
[30] A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer-Verlag, 2003.
[31] P. Stobbe and A. Krause. Efficient minimization of decomposable submodular functions. In Advances in Neural Information Processing Systems 23 (NIPS 2010), pages 2208-2216, 2010.
-----1
[1] C. M. Bishop. Pattern recognition and machine learning, volume 4. Springer New York, 2006.
[2] J. Chen, K. H. Low, C. K.-Y. Tan, A. Oran, P. Jaillet, J. Dolan, and G. Sukhatme. Decentralized data fusion and active sensing with mobile sensors for modeling and predicting spatiotemporal traffic phenomena. In Proceedings of the Twenty-Eighth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-12), pages 163-173, Corvallis, Oregon, 2012.
[3] M. Cherubini, M. Zhu, N. Oliver, and M. Cebrian. Exploring Social Networks as an Infrastructure for Transportation Networks. In Presentation at the International School and Conference on Network Science (NetSci10), Boston, MA, USA, 2010.
[4] E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1082-1090. ACM, 2011.
[5] CIA Directorate of Intelligence. The world factbook. July 2008.
[6] N. Eagle and A. S. Pentland. Eigenbehaviors: identifying structure in routine. Behavioral Ecology and Sociobiology, 63(7):1057-1066, 2009.
[7] H. Gao, J. Tang, and H. Liu. Exploring social-historical ties on location-based social networks. In 6th International AAAI Conference on Weblogs and Social Media, 2012.
[8] M. Gonzalez, C. Hidalgo, and A. Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779-782, June 2008.
[9] L. Hufnagel, D. Brockmann, and T. Geisel. Forecast and control of epidemics in a globalized world. Proceedings of the National Academy of Sciences of the United States of America, 101(42):15124-15129, 2004.
[10] B. Keller, P. von Bergen, R. Wattenhofer, and S. Welten. On the feasibility of opportunistic ad hoc music sharing. Mobile Data Challenge by Nokia Workshop, in conjunction with International Conference on Pervasive Computing, 2012.
[11] K. Laskey, N. Xu, and C. H. Chen. Propagation of delays in the national airspace system. In Proceedings of the Twenty-Second Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 265-272, Arlington, Virginia, 2006.
[12] J. Laurila, D. Gatica-Perez, I. Aad, J. Blom, O. Bornet, T. Do, O. Dousse, J. Eberle, and M. Miettinen. The mobile data challenge: Big data for mobile computing research. In Mobile Data Challenge by Nokia Workshop, in conjunction with International Conference on Pervasive Computing, Newcastle, UK, 2012.
[13] C. Liu and J. Wu. Practical routing in a cyclic mobispace. Networking, IEEE/ACM Transactions on, 19(2):369-382, 2011.
[14] J. McInerney, A. Rogers, and N. R. Jennings. Improving location prediction services for new users with probabilistic latent semantic analysis. In Mobile Data Challenge by Nokia Workshop, in conjunction with International Conference on Pervasive Computing, 2012.
[15] J. McInerney, J. Zheng, A. Rogers, and N. R. Jennings. Modelling heterogeneous location habits in human populations for location prediction under data sparsity. In International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2013), in press.
[16] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249-265, 2000.
[17] E. Nikolova, M. Brand, and D. R. Karger. Optimal route planning under uncertainty. In Proceedings of International Conference on Automated Planning and Scheduling, 2006.
[18] E. Nikolova and D. R. Karger. Route planning under uncertainty: The Canadian traveller problem. In Proc. AAAI, pages 969-974, 2008.
[19] G. Pickard, I. Rahwan, W. Pan, M. Cebrian, R. Crane, A. Madan, and A. Pentland. Time critical social mobilization: The DARPA network challenge winning strategy. 2010.
[20] M. L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, Inc., 1994.
[21] I. Rahwan, S. Dsouza, A. Rutherford, V. Naroditskiy, J. McInerney, M. Venanzi, N. Jennings, and M. Cebrian. Global manhunt pushes the limits of social mobilization. IEEE Computer, 46(4):68-75, 2013.
[22] A. Sadilek and J. Krumm. Far out: Predicting long-term human mobility. In Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
[23] S. Scellato, M. Musolesi, C. Mascolo, V. Latora, and A. Campbell. Nextplace: a spatio-temporal prediction framework for pervasive systems. In Pervasive Computing, pages 152-169, San Francisco, CA, USA, 2011. Springer.
[24] J. Scott, A. J. Brush, J. Krumm, B. Meyers, M. Hazas, S. Hodges, and N. Villar. PreHeat: controlling home heating using occupancy prediction. In Proceedings of the 13th international conference on Ubiquitous computing (UbiComp 2011), pages 281-290, Beijing, China, 2011.
[25] L. Song, D. Kotz, R. Jain, and X. He. Evaluating next-cell predictors with extensive wi-fi mobility data. IEEE Transactions on Mobile Computing, 5(12):1633-1649, 2006.
[26] E. Stevens-Navarro, Y. Lin, and V. W. Wong. An MDP-based vertical handoff decision algorithm for heterogeneous wireless networks. Vehicular Technology, IEEE Transactions on, 57(2):1243-1254, 2008.
[27] V. Vukadinovic, Ó. R. Helgason, and G. Karlsson. A mobility model for pedestrian content distribution. In Proceedings of the 2nd International Conference on Simulation Tools and Techniques, page 93. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2009.
[28] A. Wesolowski, N. Eagle, A. J. Tatem, D. L. Smith, A. M. Noor, R. W. Snow, and C. O. Buckee. Quantifying the impact of human mobility on malaria. Science, 338(6104):267-270, 2012.
[29] J. H. Wu and R. Givan. Feature-discovering approximate value iteration methods. Abstraction, Reformulation and Approximation, pages 901-901, 2005.
-----1
[1] Walid Ben-Ameur and Jose Neto. Acceleration of cutting-plane and column generation algorithms: applications to network design. Networks, 49(1):3-17, 2007.
[2] Hung Hai Bui, Tuyen N. Huynh, and Rodrigo de Salvo Braz. Lifted inference with distinct soft evidence on every object. In AAAI-2012, 2012.
[3] Hung Hai Bui, Tuyen N. Huynh, and Sebastian Riedel. Automorphism groups of graphical models and lifted variational inference. Technical report, 2012. URL http://arxiv.org/abs/1207.4814.
[4] R. de Salvo Braz, E. Amir, and D. Roth. Lifted first-order probabilistic inference. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 05), pages 1319-1325, 2005.
[5] Amir Globerson and Tommi Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Advances in Neural Information Processing Systems (NIPS 07), pages 553-560, 2007.
[6] Chris Godsil and Gordon Royle. Algebraic Graph Theory. Springer, 2001.
[7] V. Gogate and P. Domingos. Exploiting logical structure in lifted probabilistic inference. In AAAI Workshop on Statistical Relational AI, 2010.
[8] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of the Twenty-Seventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), pages 256-265, 2011.
[9] Ariel Jaimovich, Ofer Meshi, and Nir Friedman. Template based inference in symmetric relational Markov random fields. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, Vancouver, BC, Canada, July 19-22, 2007, pages 191-199. AUAI Press, 2007.
[10] Brendan D. McKay. Practical Graph Isomorphism. Congressus Numerantium, 30:45-87, 1981.
[11] M. Mladenov, B. Ahmadi, and K. Kersting. Lifted linear programming. In 15th International Conference on Artificial Intelligence and Statistics (AISTATS 2012), 2012.
[12] Mathias Niepert. Lifted probabilistic inference: an MCMC perspective. In Statistical Relational AI Workshop at UAI 2012, 2012.
[13] Mathias Niepert. Markov chains on orbits of permutation groups. In UAI-2012, 2012.
[14] Matt Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62:107-136, 2006.
[15] Hanif D. Sherali and Warren P. Adams. A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics, 3(3):411-430, 1990.
[16] Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI 08), pages 1094-1099, 2008.
[17] D. Sontag and T. Jaakkola. New outer bounds on the marginal polytope. In Advances in Neural Information Processing Systems (NIPS 07), pages 1393-1400, 2007.
[18] David Sontag. Approximate Inference in Graphical Models using LP Relaxations. PhD thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2010.
[19] Martin Wainwright and Michael Jordan. Graphical Models, Exponential Families, and Variational Inference. Now Publishers, 2008.
-----1
[1] F. Bach. Structured sparsity-inducing norms through submodular functions. NIPS, 2010.
[2] F. Bach. Learning with Submodular functions: A convex Optimization Perspective. arXiv, 2011.
[3] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. JMLR, 6:1705-1749, 2005.
[4] J. Bartholdi, C. Tovey, and M. Trick. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6(2):157-165, 1989.
[5] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math and Math Physics, 7, 1967.
[6] L. Busse, P. Orbanz, and J. Buhmann. Cluster analysis of heterogeneous rank data. In ICML, volume 227, pages 113-120, 2007.
[7] Y. Censor and S. Zenios. Parallel optimization: Theory, algorithms, and applications. Oxford University Press, USA, 1997.
[8] S. Chakrabarti, R. Khanna, U. Sawant, and C. Bhattacharyya. Structured learning for non-smooth ranking losses. In SIGKDD, pages 88-96. ACM, 2008.
[9] G. Choquet. Theory of capacities. In Annales de l'institut Fourier, volume 5, page 87, 1953.
[10] D. Critchlow. Metric methods for analyzing partially ranked data. Lecture Notes in Statistics No. 34. Springer-Verlag, Berlin, 1985.
[11] W. H. Cunningham. Decomposition of submodular functions. Combinatorica, 3(1):53-68, 1983.
[12] A. Dubey, J. Machchhar, C. Bhattacharyya, and S. Chakrabarti. Conditional models for non-smooth ranking loss functions. In ICDM, pages 129-138, 2009.
[13] J. Edmonds. Submodular functions, matroids and certain polyhedra. Combinatorial structures and their Applications, 1970.
[14] M. Fligner and J. Verducci. Distance based ranking models. Journal of the Royal Statistical Society. Series B (Methodological), pages 359-369, 1986.
[15] M. Fligner and J. Verducci. Multistage ranking models. Journal of the American Statistical Association, 83(403):892-901, 1988.
[16] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. JMLR, 4:933-969, 2003.
[17] S. Fujishige. Submodular functions and optimization, volume 58. Elsevier Science, 2005.
[18] G. Gordon. Regret bounds for prediction problems. In COLT, pages 29-40. ACM, 1999.
[19] R. Iyer and J. Bilmes. The submodular Bregman and Lovász-Bregman divergences with applications. In NIPS, 2012.
[20] R. Iyer and J. Bilmes. The Lovász-Bregman Divergence and connections to rank aggregation, clustering and web ranking: Extended Version of UAI paper. 2013.
[21] R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential based submodular function optimization. In ICML, 2013.
[22] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In SIGIR, pages 41-48. ACM, 2000.
[23] M. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81-93, 1938.
[24] K. Kirchhoff et al. Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments. In ICSLP, volume 98, pages 891-894. Citeseer, 1998.
[25] K. Kiwiel. Proximal minimization methods with generalized Bregman functions. SIAM Journal on Control and Optimization, 35(4):1142-1168, 1997.
[26] A. Klementiev, D. Roth, and K. Small. Unsupervised rank aggregation with distance-based models. In ICML, 2008.
[27] G. Lebanon and J. Lafferty. Cranking: Combining rankings using conditional probability models on permutations. In ICML, 2002.
[28] T.-Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225-331, 2009.
[29] L. Lovász. Submodular functions and convexity. Mathematical Programming, 1983.
[30] C. Mallows. Non-null ranking models. I. Biometrika, 44(1/2):114-130, 1957.
[31] M. Meilă, K. Phadnis, A. Patterson, and J. Bilmes. Consensus ranking under the exponential model. In UAI, 2007.
[32] T. Murphy and D. Martin. Mixtures of distance-based models for ranking data. Computational Statistics & Data Analysis, 41(3):645-655, 2003.
[33] R. Rockafellar. Convex analysis, volume 28. Princeton Univ Pr, 1970.
[34] A.-V. I. Rosti, N. F. Ayan, B. Xiang, S. Matsoukas, R. Schwartz, and B. Dorr. Combining outputs from multiple machine translation systems. In NAACL - HLT, 2007.
[35] K. A. Spackman. Signal detection theory: Valu- able tools for evaluating inductive learning. In Proceedings of the sixth international workshop on Machine learning, 1989.
[36] A. F. Tehrani, W. Cheng, and E. Hullermeier. Pref- erence learning using the choquet integral: The case of multipartite ranking. IEEE Transactions on Fuzzy Systems, 2012.
[37] M. Telgarsky and S. Dasgupta. Agglomerative Bregman clustering. In ICML, 2012.
[38] Y. Yue, T. Finley, F. Radlinski, and T. Joachims.A support vector method for optimizing average precision. In SIGIR. ACM, 2007.
-----0
Abbasi-Yadkori, Y., Pal, D., & Szepesvári, C. (2011). Improved Algorithms for Linear Stochastic Bandits. In Advances in Neural Information Processing Systems 24, pp. 2312-2320.
Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 397-422.
Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2003). The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing, 32 (1), 48-77.
Beygelzimer, A., Langford, J., Li, L., Reyzin, L., & Schapire, R. E. (2010). Contextual Bandit Algorithms with Supervised Learning Guarantees. Machine Learning, 15, 14.
Bubeck, S., & Cesa-Bianchi, N. (2012). Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, 5, 1-122.
Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press, New York, NY.
Chen, Y., Garcia, E. K., Gupta, M. R., Rahimi, A., & Cazzanti, L. (2009). Similarity-based Classification: Concepts and Algorithms. Journal of Machine Learning Research, 10 (206), 747-776.
Chu, W., Li, L., Reyzin, L., & Schapire, R. E. (2011). Contextual Bandits with Linear Payoff Functions. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics.
Dani, V., Hayes, T. P., & Kakade, S. M. (2008). Stochastic Linear Optimization under Bandit Feedback. In The 21st Annual Conference on Learning Theory, pp. 355-366.
Dudik, M., Hsu, D., Kale, S., Karampatziakis, N., Langford, J., Reyzin, L., & Zhang, T. (2011). Efficient Optimal Learning for Contextual Bandits. Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence.
Grunewalder, S., Audibert, J.-Y., Opper, M., & Shawe-Taylor, J. (2010). Regret Bounds for Gaussian Process Bandit Problems. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics.
Haasdonk, B., & Pekalska, E. (2010). Classification with Kernel Mahalanobis Distance Classifiers. Advances in Data Analysis, Data Handling and Business Intelligence, 351-361.
Kleinberg, R., Slivkins, A., & Upfal, E. (2008). Multiarmed bandit problems in metric spaces. In Proceedings of the 40th ACM Symposium on Theory of Computing, pp. 681-690.
Krause, A., & Ong, C. S. (2011). Contextual Gaussian Process Bandit Optimization. In Proceedings of Neural Information Processing Systems (NIPS).
Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6 (1), 4-22.
Langford, J., & Zhang, T. (2008). The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information. In Platt, J. C., Koller, D., Singer, Y., & Roweis, S. (Eds.), Advances in Neural Information Processing Systems 20, pp. 817-824. MIT Press, Cambridge, MA.
Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A Contextual-Bandit Approach to Personalized News Article Recommendation. WWW '10, 173, 10.
Lu, T., Pal, D., & Pal, M. (2010). Contextual Multi-Armed Bandits. In Teh, Y. W., & Titterington, M. (Eds.), Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, Vol. 9, pp. 485-492.
Rusmevichientong, P., & Tsitsiklis, J. N. (2010). Linearly Parameterized Bandits. Math. Oper. Res., 35 (2), 395-411.
Seldin, Y., Auer, P., Laviolette, F., Shawe-Taylor, J. S., & Ortner, R. (2011). PAC-Bayesian Analysis of Contextual Bandits. In Neural Information Processing Systems (NIPS), pp. 1683-1691.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
Slivkins, A. (2009). Contextual Bandits with Similarity Information. Proceedings of the 24th annual Conference On Learning Theory, 127.
Srinivas, N., Krause, A., Kakade, S., & Seeger, M. (2010). Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. Proceedings of International Conference on Machine Learning, 1015-1022.
Steinberger, R., Pouliquen, B., & Van der Goot, E. (2009). An Introduction to the Europe Media Monitor Family of Applications. In Information Access in a Multilingual World: Proceedings of the SIGIR 2009 Workshop (SIGIR-CLIR2009), pp. 1-8.
Zhang, F. (2005). The Schur complement and its applications, Vol. 4. Springer.
-----1
[Abiteboul et al., 1995] Abiteboul, S., Hull, R., and Vianu, V. (1995). Foundations of Databases.
[Akers, 1978] Akers, S. B. (1978). Binary decision diagrams. IEEE Trans. Computers, 27(6):509-516.
[Bacchus et al., 2003] Bacchus, F., Dalmao, S., and Pitassi, T. (2003). Algorithms and complexity results for #SAT and Bayesian inference. In FOCS, pages 340-351.
[Bayardo et al., 2000] Bayardo, R. J., Jr., and Pehoushek, J. D. (2000). Counting models using connected components. In AAAI, pages 157-162.
[Beame et al., 2010] Beame, P., Impagliazzo, R., Pitassi, T., and Segerlind, N. (2010). Formula caching in DPLL. TOCT, 1(3).
[Beame et al., 2004] Beame, P., Kautz, H. A., and Sabharwal, A. (2004). Towards understanding and harnessing the potential of clause learning. J. Artif. Intell. Res. (JAIR), 22:319-351.
[Ben-Sasson and Wigderson, 2001] Ben-Sasson, E. and Wigderson, A. (2001). Short proofs are narrow - resolution made simple. J. ACM, 48(2):149-169.
[Bollig and Wegener, 1998] Bollig, B. and Wegener, I. (1998). A very simple function that requires exponential size read-once branching programs. Inf. Process. Lett., 66(2):53-57.
[Bryant, 1986] Bryant, R. E. (1986). Graph-based algorithms for Boolean function manipulation. IEEE Trans. Computers, 35(8):677-691.
[Darwiche, 2001a] Darwiche, A. (2001a). Decomposable negation normal form. J. ACM, 48(4):608-647.
[Darwiche, 2001b] Darwiche, A. (2001b). On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 11(1-2):11-34.
[Darwiche and Marquis, 2002] Darwiche, A. and Marquis, P. (2002). A knowledge compilation map. J. Artif. Int. Res., 17(1):229-264.
[Davis et al., 1962] Davis, M., Logemann, G., and Loveland, D. (1962). A machine program for theorem-proving. Commun. ACM, 5(7):394-397.
[Davis and Putnam, 1960] Davis, M. and Putnam, H. (1960). A computing procedure for quantification theory. J. ACM, 7(3):201-215.
[Domingos and Lowd, 2009] Domingos, P. and Lowd, D. (2009). Markov Logic: An Interface Layer for Artificial Intelligence.
[Gomes et al., 2009] Gomes, C. P., Sabharwal, A., and Selman, B. (2009). Model counting. In Handbook of Satisfiability, pages 633-654.
[Huang and Darwiche, 2005] Huang, J. and Darwiche, A. (2005). DPLL with a trace: From SAT to knowledge compilation. In IJCAI, pages 156-162.
[Huang and Darwiche, 2007] Huang, J. and Darwiche, A. (2007). The language of search. JAIR, 29:191-219.
[Jha et al., 2010] Jha, A. K., Gogate, V., Meliou, A., and Suciu, D. (2010). Lifted inference seen from the other side: The tractable features. In NIPS, pages 973-981.
[Majercik and Littman, 1998] Majercik, S. M. and Littman, M. L. (1998). Using caching to solve larger probabilistic planning problems. In AAAI, pages 954-959.
[Masek, 1976] Masek, W. J. (1976). A fast algorithm for the string editing problem and decision graph complexity. Master's thesis, MIT.
[Muise et al., 2012] Muise, C., McIlraith, S. A., Beck, J. C., and Hsu, E. I. (2012). Dsharp: fast d-DNNF compilation with sharpSAT. In Canadian AI, pages 356-361.
[Sang et al., 2004] Sang, T., Bacchus, F., Beame, P., Kautz, H. A., and Pitassi, T. (2004). Combining component caching and clause learning for effective model counting. In SAT.
[Suciu et al., 2011] Suciu, D., Olteanu, D., Ré, C., and Koch, C. (2011). Probabilistic Databases.
[Thurley, 2006] Thurley, M. (2006). sharpSAT: counting models with advanced component caching and implicit BCP. In SAT, pages 424-429.
[Vardi, 1982] Vardi, M. Y. (1982). The complexity of relational query languages. In STOC, pages 137-146.
[Wegener, 2000] Wegener, I. (2000). Branching programs and binary decision diagrams: theory and applications.
-----1
[1] David M. Blei. Probabilistic topic models. Commun. ACM, 55(4):77-84, 2012.
[2] David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM, 57(2), 2010.
[3] David M. Blei, Andrew Ng, and Michael Jordan. Latent Dirichlet allocation. JMLR, 3:993-1022, 2003.
[4] K. Canini, L. Shi, and T. Griffiths. Online inference of topics with latent Dirichlet allocation. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, 2009.
[5] T. Griffiths and M. Steyvers. Finding scientific topics. In Proceedings of the National Academy of Sciences, volume 101, pages 5228-5235, 2004.
[6] Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800, 2002.
[7] Hugo Larochelle and Stanislas Lauly. A neural autoregressive topic model. In Advances in Neural Information Processing Systems 25, pages 2717-2725. 2012.
[8] D. Mimno and A. McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In UAI, pages 411-418, 2008.
[9] Radford M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125-139, April 2001.
[10] R. R. Salakhutdinov and G. E. Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 12, 2009.
[11] Ruslan Salakhutdinov and Geoff Hinton. A better way to pretrain deep Boltzmann machines. In Advances in Neural Information Processing Systems 25, pages 2456-2464. 2012.
[12] Ruslan Salakhutdinov and Geoffrey Hinton. Replicated softmax: an undirected topic model. In Advances in Neural Information Processing Systems 22, pages 1607-1614. 2009.
[13] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.
[14] Y. W. Teh, K. Kurihara, and M. Welling. Collapsed variational inference for HDP. In Advances in Neural Information Processing Systems, volume 20, 2008.
[15] T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML. ACM, 2008.
[16] Chong Wang and David M. Blei. Variational inference for the nested Chinese restaurant process. In NIPS, pages 1990-1998, 2009.
[17] Eric P. Xing, Rong Yan, and Alexander G. Hauptmann. Mining associated text and images with dual-wing harmoniums. In UAI, pages 633-641. AUAI Press, 2005.
[18] L. Younes. On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates, March 17 2000.
-----1
[1] Naoki Abe, Prem Melville, Cezar Pendus, Vince P. Thomas, James J. Bennett, Gary F. Anderson, Brent R. Cooley, Melissa Kowalczyk, Mark Domick, and Timothy Gardinier. Optimizing debt collections using constrained reinforcement learning. In KDD, 2010.
[2] Eitan Altman. Constrained Markov Decision Processes. Chapman and Hall/CRC, 1999.
[3] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] Garud N. Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30:257-280, 2005.
[5] Arnab Nilim and Laurent El Ghaoui. Robust Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780-798, 2005.
[6] J. Pazis and M. Lagoudakis. Binary action search for learning continuous action control policies. In International Conference on Machine Learning, pages 793-800, 2009.
[7] Marek Petrik and Dharmashankar Subramanian. An approximate solution method for large risk-averse Markov decision processes. In Proceedings of Conference on Uncertainty in Artificial Intelligence, 2012.
[8] Marek Petrik and Shlomo Zilberstein. Linear dynamic programs for resource management. In Conference on Artificial Intelligence, 2011.
[9] Martin L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, Inc., 2005.
[10] Mohit Tawarmalani, Jean-Philippe P. Richard, and Chuanhui Xiong. Explicit convex and concave envelopes through polyhedral subdivisions. Mathematical Programming, 2012.
[11] Ari Weinstein and Michael L. Littman. Bandit-based planning and learning in continuous action Markov decision processes. In International Conference on Automated Planning and Scheduling, 2012.
-----1
[1] L. An and P. Tao. The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Annals of Operations Research, 133:23-46, 2005.
[2] F. Bach and M. Jordan. Thin junction trees. In Neural Information Processing Systems, 2001.
[3] S. Bach, M. Broecheler, L. Getoor, and D. O'Leary. Scaling MPE inference for constrained continuous Markov random fields with consensus optimization. In Neural Information Processing Systems, 2012.
[4] J. Besag. Statistical analysis of non-lattice data. Journal of the Royal Statistical Society, 24(3):179-195, 1975.
[5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 2011.
[6] M. Broecheler, L. Mihalkova, and L. Getoor. Probabilistic similarity logic. In Uncertainty in Artificial Intelligence, 2010.
[7] M. Collins. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Empirical Methods in Natural Language Processing, 2002.
[8] P. Domingos and W. Webb. A tractable first-order probabilistic logic. In AAAI Conference on Artificial Intelligence, 2012.
[9] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133-151, July 2001.
[10] B. Huang, A. Kimmig, L. Getoor, and J. Golbeck. A flexible framework for probabilistic models of social trust. In Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction, 2013.
[11] T. Huynh and R. Mooney. Online max-margin weight learning for Markov logic networks. In SIAM International Conference on Data Mining, 2011.
[12] T. Joachims, T. Finley, and C. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1):27-59, 2009.
[13] A. Kimmig, S. Bach, M. Broecheler, B. Huang, and L. Getoor. A short introduction to probabilistic soft logic. In NIPS Workshop on Probabilistic Programming: Foundations and Applications, 2012.
[14] D. Lowd and P. Domingos. Efficient weight learning for Markov logic networks. In Principles and Practice of Knowledge Discovery in Databases, 2007.
[15] A. Martins, M. Figueiredo, P. Aguiar, N. Smith, and E. Xing. An augmented Lagrangian approach to constrained MAP inference. In International Conference on Machine Learning, 2011.
[16] O. Meshi and A. Globerson. An alternating direction method for dual MAP LP relaxation. In European Conference on Machine Learning and Knowledge Discovery in Databases, 2011.
[17] J. Neville and D. Jensen. Relational dependency networks. Journal of Machine Learning Research, 8:653-692, 2007.
[18] H. Poon and P. Domingos. Sum-product networks: A new deep architecture. In Uncertainty in Artificial Intelligence, 2011.
[19] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1-2):107-136, 2006.
[20] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In International Conference on Machine Learning, 2008.
[21] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93-106, 2008.
[22] D. Sontag and T. Jaakkola. Tree block coordinate descent for MAP in graphical models. In Artificial Intelligence and Statistics, 2009.
[23] B. Taskar, M. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In Neural Information Processing Systems, 2003.
[24] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Neural Information Processing Systems, 2004.
[25] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2), January 2008.
[26] L. Xiong, X. Chen, T. Huang, J. Schneider, and J. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In SIAM International Conference on Data Mining, 2010.
-----1
[1] Tobias Achterberg. Constraint Integer Programming. PhD thesis, TU Berlin, July 2007.
[2] David Avis. A revised implementation of the reverse search vertex enumeration algorithm. In Gil Kalai and Günter M. Ziegler, editors, Polytopes - Combinatorics and Computation, volume 29 of DMV Seminar, pages 177-198. Birkhäuser Basel, 2000.
[3] Egon Balas and Shu Ming Ng. On the set covering polytope: 1. All the facets with coefficients in {0,1,2}. Mathematical Programming, 43:57-69, 1989.
[4] Václav Chvátal. Edmonds polytopes and a hierarchy of combinatorial problems. Discrete Mathematics, 4:305-337, 1973.
[5] James Cussens. Bayesian network learning by compiling to weighted MAX-SAT. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI 2008), pages 105-112, Helsinki, 2008. AUAI Press.
[6] James Cussens. Bayesian network learning with cutting planes. In Fabio G. Cozman and Avi Pfeffer, editors, Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), pages 153-160, Barcelona, 2011. AUAI Press.
[7] James Cussens. Column generation for exact BN learning: Work in progress. In Proc. ECAI-2012 workshop on COmbining COnstraint solving with MIning and LEarning (CoCoMile 2012), 2012.
[8] James Cussens, Mark Bartlett, Elinor M. Jones, and Nuala A. Sheehan. Maximum likelihood pedigree reconstruction using integer linear programming. Genetic Epidemiology, 37(1):69-83, January 2013.
[9] Cassio de Campos and Qiang Ji. Properties of Bayesian Dirichlet scores to learn Bayesian network structures. In AAAI-10, pages 431-436, 2010.
[10] Michel X. Goemans and Leslie A. Hall. The strongest facets of the acyclic subgraph polytope are unknown. In Integer Programming and Combinatorial Optimization, volume 1084 of Lecture Notes in Computer Science, pages 415-429. Springer, 1996.
[11] Martin Grötschel, Michael Jünger, and Gerhard Reinelt. On the acyclic subgraph polytope. Mathematical Programming, 33(1):28-42, 1985.
[12] Raymond Hemmecke, Silvia Lindner, and Milan Studený. Characteristic imsets for learning Bayesian network structure. International Journal of Approximate Reasoning, 53:1336-1349, 2012.
[13] Tommi Jaakkola, David Sontag, Amir Globerson, and Marina Meila. Learning Bayesian network structure using LP relaxations. In Proceedings of 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010), volume 9, pages 358-365, 2010. Journal of Machine Learning Research Workshop and Conference Proceedings.
[14] Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[15] Tomi Silander and Petri Myllymäki. A simple approach for finding the globally optimal Bayesian network structure. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 44545, 2006.
[16] Laurence A. Wolsey. Integer Programming. John Wiley, 1998.
[17] Changhe Yuan and Brandon Malone. An improved admissible heuristic for learning optimal Bayesian networks. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12), Catalina Island, CA, 2012.
-----0
T. Adel, B. Smith, and D. Stashuk. Muscle categorization using pdf estimation and naïve Bayes classification. In IEEE Engineering in Medicine & Biology Society (EMBC), pp. 2619-22, 2012.
E. Alpaydin. Introduction to Machine Learning. MIT Press, 2nd edition, 2010.
S. Andrews and T. Hofmann. Multiple-instance learning via disjunctive programming boosting. In NIPS, pp. 65-72, 2003.
S. Andrews, T. Hofmann, and I. Tsochantaridis. Multiple instance learning with generalized support vector machines. In AAAI/IAAI, pp. 943-944, 2002a.
S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In NIPS, pp. 561-568, 2002b.
J. Basmajian and C. D. Luca. Muscles Alive: Their Functions Revealed by Electromyography. Williams & Wilkins, 1985.
A. Blum and A. Kalai. A note on learning from multiple-instance examples. Mach. Learn., 30(1):23-29, 1998.
C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011.
C. Cortes and V. Vapnik. Support-vector networks. Mach. Learn., 20(3):273-297, Sept. 1995.
N. de Freitas and H. Kuck. Learning about individuals from group statistics. In UAI, pp. 332-339, 2005.
T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell., 89(1-2):31-71, 1997.
D. I. Diochnos, R. H. Sloan, and G. Turán. On multiple-instance learning of halfspaces. Inf. Process. Lett., 112(23):933-936, 2012.
D. Dumitru, A. Amato, and M. Zwarts. Electrodiagnostic Medicine. Hanley & Belfus, first edition, 1995.
M. Dundar, G. Fung, B. Krishnapuram, and R. B. Rao. Multiple-instance learning algorithms for computer-aided detection. IEEE Trans. Biomed. Engineering, 55(3):1015-1021, 2008.
C. Farkas, D. Stashuk, A. Hamilton-Wright, and H. Parsaei. A review of clinical quantitative electromyography. Crit. Rev. Biomed. Eng, 38(5):467-485, 2010.
J. R. Foulds and E. Frank. A review of multi-instance learning assumptions. Knowledge Eng. Review, 25(1):1-25, 2010.
J. R. Foulds and P. Smyth. Multi-instance mixture models. In SDM, pp. 606-617. SIAM, 2011.
T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola. Multi-instance kernels. In ICML, pp. 179-186, 2002.
F. Han, D. Wang, and X. Liao. An improved multiple-instance learning algorithm. In Advances in Neural Networks - ISNN 2007, number 4491 in LNCS, pp. 1104-1109, Jan. 2007.
T.-K. Huang, R.-C. Weng, and C.-J. Lin. Generalized Bradley-Terry models and multi-class probability estimates. J. Mach. Learn. Res., 7:85-115, Dec. 2006.
D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
P. M. Long and L. Tan. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Mach. Learn., 30(1):7-21, 1998.
M. I. Mandel and D. P. W. Ellis. Multiple-instance learning for music information retrieval. In ISMIR, pp. 577-582, 2008.
O. Maron and T. Lozano-Pérez. A framework for multiple-instance learning. In NIPS, pp. 570-576, 1998.
O. Maron and A. L. Ratan. Multiple-instance learning for natural scene classification. In ICML, pp. 341-349, 1998.
L. Pino. Neuromuscular clinical decision support using motor unit potentials characterized by pattern discovery. University of Waterloo, PhD Thesis, 2008.
R. Rahmani and S. A. Goldman. MISSL: multiple-instance semi-supervised learning. In ICML, pp. 705-712, 2006.
S. Sabato, N. Srebro, and N. Tishby. Reducing label complexity by learning from bags. Journal of Machine Learning Research, 9:685-692, 2010.
B. Settles, M. Craven, and S. Ray. Multiple-instance active learning. In NIPS, pp. 1289-1296, 2007.
H. U. Simon. PAC-learning in the presence of one-sided classification noise. In ISAIM, 2012.
A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8:229-231, 1959.
D. Stashuk. Decomposition and quantitative analysis of clinical electromyographic signals. Med Eng Phys, 21:389-404, 1999.
M. Stikic and B. Schiele. Activity recognition from sparsely labeled data using multi-instance learning. In LoCA, pp. 156-173, 2009.
Q. Tao, S. D. Scott, N. V. Vinodchandran, and T. T. Osugi. SVM-based generalized multiple-instance learning via approximate box counting. In ICML, 2004.
G. R. Terrell. The maximal smoothing principle in density estimation. Journal of the American Statistical Association, 85(410):470-477, 1990.
P. A. Viola, J. C. Platt, and C. Zhang. Multiple instance boosting for object detection. In NIPS, pp. 1417-1426, 2005.
J. Wang and J.-D. Zucker. Solving the multiple-instance problem: A lazy learning approach. In ICML, pp. 1119-1126, 2000.
X. Xu and E. Frank. Logistic regression and boosting for labeled bags of instances. In PAKDD, pp. 272-281, 2004.
S.-H. Yang, H. Zha, and B.-G. Hu. Dirichlet-Bernoulli alignment: A generative model for multi-class multi-label multi-instance corpora. In NIPS, pp. 2143-2150, 2009.
Q. Zhang and S. A. Goldman. EM-DD: An improved multiple-instance learning technique. In NIPS, pp. 1073-1080, 2001.
Q. Zhang, S. A. Goldman, W. Yu, and J. E. Fritts. Content-based image retrieval using multiple-instance learning. In ICML, pp. 682-689, 2002.
Z.-H. Zhou and J.-M. Xu. On the relation between multi-instance learning and semi-supervised learning. In ICML, pp. 1167-1174, 2007.
-----0
R. I. Brafman and M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, October 2002.
E. Brunskill. Bayes-optimal reinforcement learning for discrete uncertainty domains. In Proceedings of the Eleventh International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1385-1386, 2012.
T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142-150, 1989.
C. Diuk, L. Li, and B. R. Leffler. The adaptive k-meteorologists problem and its application to structure discovery and feature selection in reinforcement learning. In Proceedings of the Twenty-Sixth International Conference on Machine Learning (ICML), pages 249-256, 2009.
K. Dyagilev, S. Mannor, and N. Shimkin. Efficient reinforcement learning in parameterized models: Discrete parameter case. In Recent Advances in Reinforcement Learning, volume 5323 of Lecture Notes in Computer Science, pages 41-54, 2008.
T. P. Hayes. A large-deviation inequality for vector-valued martingales, 2005. Unpublished manuscript.
T. Jaksch, R. Ortner, and P. Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563-1600, 2010.
S. M. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, 2003.
M. J. Kearns and S. P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
T. Lattimore, M. Hutter, and P. Sunehag. The sample-complexity of general reinforcement learning. In Proceedings of Thirtieth International Conference on Machine Learning (ICML), 2013. To appear.
A. Lazaric and M. Restelli. Transfer from multiple MDPs. In Proceedings of the Neural Information Processing Systems (NIPS), pages 1746-1754, 2011.
L. Li, M. L. Littman, T. J. Walsh, and A. L. Strehl. Knows what it knows: A framework for self-aware learning. Machine Learning, 82(3):399-443, 2011.
T. A. Mann and Y. Choe. Directed exploration in reinforcement learning with transferred knowledge. In European Workshop on Reinforcement Learning, 2012.
N. Mehta, S. Natarajan, P. Tadepalli, and A. Fern. Transfer in variable-reward hierarchical reinforcement learning. Machine Learning, 73(3):289-312, 2008.
P. Poupart, N. Vlassis, J. Hoey, and K. Regan. An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 697-704, 2006.
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, New York, 1994. ISBN 0-471-61977-9.
J. Sorg and S. P. Singh. Transfer via soft homomorphisms. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 741-748, 2009.
A. L. Strehl and M. L. Littman. An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8):1309-1331, 2008.
A. L. Strehl, L. Li, and M. L. Littman. Incremental model-based learners with formal learning-time guarantees. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI), pages 485-493, 2006a.
A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman. PAC model-free reinforcement learning. In Proceedings of the Twenty-Third International Conference on Machine Learning (ICML), pages 881-888, 2006b.
A. L. Strehl, L. Li, and M. L. Littman. Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10:2413-2444, 2009.
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, March 1998. ISBN 0-262-19398-1.
I. Szita and C. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In Proceedings of the Twenty-Seventh International Conference on Machine Learning (ICML), pages 1031-1038, 2010.
M. E. Taylor and P. Stone. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10(1):1633-1685, 2009.
A. Wilson, A. Fern, S. Ray, and P. Tadepalli. Multi-task reinforcement learning: a hierarchical Bayesian approach. In Proceedings of the International Conference on Machine Learning (ICML), pages 1015-1022, 2007.
-----0
Bartholomew, D. J., Steele, F., Moustaki, I., & Galbraith,  J. I. (2002). The Analysis and Interpretation of  Multivariate Data for Social Scientists (Texts in  Statistical Science Series). 
Chapman & Hall/CRC.  Bollen, K. A. (1989). Structural Equations with Latent  Variables. Wiley-Interscience.  
Drton, M., Massam, H., & Olkin, I. (2008).  Moments of  minors of Wishart matrices. Annals of Statistics 36(5):  2261-2283  
Elidan, G., Lotner, N., Friedman, N., & Koller, D. (2001).  Discovering hidden variables: A structure-based  approach. Proceedings from Advanced in Neural  Information Processing Systems.   
Harman, H. H. (1976). Modern Factor Analysis.  University Of Chicago Press.  
Kalisch, M., and P. Bhlmann (2007). Estimating highdimensional directed acyclic graphs with the PCalgorithm. Journal of Machine Learning Research, 8,  613636.  
Kilmer, S., Light, W., Sun, X. & Yu, X. (1996)  Approximation by Translates of a Positive Definite  Function, J. Mathematical Analysis and Applications,  201, 631-641.  
Pearl, J. (2000). Causality: Models, Reasoning, and  Inference. Cambridge University Press.  
Robins, J., Scheines, R.,  Spirtes, P. & Wasserman, L.  (2003) Uniform Consistency In Causal Inference,  
Biometrika, September, 2003, 90, 491-515.   Jackson, A., and Scheines, R. (2005) Single Mothers'  Self-Efficacy, Parenting in the Home Environment, and  Children's Development in a Two-Wave Study, in Social  Work Research, 29, 1, pp. 7-20.  
Silva, R., Scheines, R., Glymour, C., & Spirtes, P. (2006). Learning the structure of linear latent variable models. J Mach Learn Res, 7, 191-246.
Spirtes, P. (1995). Directed Cyclic Graphical Representation of Feedback Models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, ed. by Philippe Besnard and Steve Hanks, Morgan Kaufmann Publishers, Inc., San Mateo.
Spirtes, P., Meek, C., & Richardson, T. S. (1995). Causal inference in the presence of latent variables and selection bias. Proceedings from Eleventh Conference on Uncertainty in Artificial Intelligence, San Francisco, CA.
Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, Prediction, and Search, Second Edition (Adaptive Computation and Machine Learning). The MIT Press.
Sullivant, S., Talaska, K., & Draisma, J. (2010). Trek Separation for Gaussian Graphical Models. Ann Stat, 38(3), 1665-1685.
Thurstone, L. (1936). The Vectors of Mind: Multiple Factor Analysis for the Isolation of Primary Traits. Nabu Press.
Uhler, C., Raskutti, G., Bühlmann, P., & Yu, B. (2012). Geometry of faithfulness assumption in causal inference. arXiv:1207.0547 [math], to appear in Annals of Statistics.
-----1
[Allwein et al., 2001] Allwein, E., Schapire, R., and Singer, Y. (2001). Reducing multiclass to binary: A unifying approach for margin classifiers. The Journal of Machine Learning Research, 1:113-141.
[Bergstra and Bengio, 2012] Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13:281-305.
[Bertin-Mahieux et al., 2011] Bertin-Mahieux, T., Ellis, D. P., Whitman, B., and Lamere, P. (2011). The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011).
[Cesa-Bianchi et al., 2005] Cesa-Bianchi, N., Conconi, A., and Gentile, C. (2005). A second-order perceptron algorithm. SIAM Journal on Computing, 34(3):640-668.
[Corcuera and Giummole, 1998] Corcuera, J. M. and Giummole, F. (1998). A characterization of monotone and regular divergences. Annals of the Institute of Statistical Mathematics, 50(3):433-450.
[Crammer et al., 2009] Crammer, K., Kulesza, A., and Dredze, M. (2009). Adaptive regularization of weight vectors. In Advances in Neural Information Processing Systems.
[Duchi et al., 2011] Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. JMLR.
[Frank and Asuncion, 2010] Frank, A. and Asuncion, A. (2010). UCI machine learning repository.
[Hazan, 2006] Hazan, E. (2006). Efficient algorithms for online convex optimization and their applications. Technical report, Princeton.
[Hutter et al., 2013] Hutter, F., Hoos, H., and Leyton-Brown, K. (2013). Identifying key algorithm parameters and instance features using forward selection. In Learning and Intelligent Optimization (LION 7).
[Li and Zhang, 1998] Li, G. and Zhang, J. (1998). Sphering and its properties. Sankhya: The Indian Journal of Statistics, Series A, pages 119-133.
[Loosli et al., 2007] Loosli, G., Canu, S., and Bottou, L. (2007). Training invariant support vector machines using selective sampling. In Bottou, L., Chapelle, O., DeCoste, D., and Weston, J., editors, Large Scale Kernel Machines, pages 301-320. MIT Press, Cambridge, MA.
[McMahan and Streeter, 2010] McMahan, H. B. and Streeter, M. (2010). Adaptive bound optimization for online convex optimization. In Conference on Learning Theory.
[Moro et al., 2011] Moro, S., Laureano, R., and Cortez, P. (2011). Using data mining for bank direct marketing: An application of the CRISP-DM methodology. In et al., P. N., editor, Proceedings of the European Simulation and Modelling Conference - ESM2011, pages 117-121, Guimaraes, Portugal. EUROSIS.
[Orabona et al., 2012] Orabona, F., Crammer, K., and Cesa-Bianchi, N. (2012). A generalized online mirror descent with applications to classification and regression. Technical report, Unimi.
[Oren, 1974] Oren, S. S. (1974). Self-scaling variable metric (SSVM) algorithms. Part II: Implementation and experiments. Management Science, 20(5):863-874.
[Ross et al., 2012] Ross, S., Mineiro, P., and Langford, J. (2012). Vowpal wabbit implementation of nag and snag. Technical report, github.com.
[Schaul et al., 2012] Schaul, T., Zhang, S., and LeCun, Y. (2012). No more pesky learning rates. CoRR, abs/1206.1106.
[Snoek et al., 2012] Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms. In NIPS.
[Sonnenburg and Franc, 2010] Sonnenburg, S. and Franc, V. (2010). COFFIN: a computational framework for linear SVMs. In Proceedings of the 27th International Machine Learning Conference.
[Wagenaar, 1998] Wagenaar, D. (1998). Information geometry for neural networks. Technical report, King's College London.
[Zinkevich, 2003] Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the International Conference on Machine Learning (ICML 2003), pages 928-936.
-----0
W. L. Buntine. Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2:159-225, 1994.
T. Claassen and T. Heskes. A Bayesian approach to constraint based causal inference. In Proceedings of the Twenty-Eighth Annual Conference on Uncertainty in Artificial Intelligence, pages 207-216, 2012.
D. Colombo and M. H. Maathuis. A Modification of the PC Algorithm Yielding Order-Independent Skeletons. arXiv preprint arXiv:1211.3295, 2012.
N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In International Joint Conference on Artificial Intelligence, volume 16, pages 1300-1309, 1999.
L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of link structure. Journal of Machine Learning Research, 3:679-707, 2002.
L. Getoor, N. Friedman, D. Koller, A. Pfeffer, and B. Taskar. Probabilistic relational models. In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning, pages 129-174. MIT Press, Cambridge, MA, 2007.
W. R. Gilks, A. Thomas, and D. J. Spiegelhalter. A language and program for complex Bayesian modeling. The Statistician, 43:169-177, 1994.
D. Heckerman, C. Meek, and D. Koller. Probabilistic entity-relationship models, PRMs, and plate models. In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning, pages 201-238. MIT Press, Cambridge, MA, 2007.
P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 22, pages 689-696, 2008.
D. Jensen and J. Neville. Linkage and autocorrelation cause feature selection bias in relational learning. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 259-266, 2002.
S. Kramer, N. Lavrač, and P. Flach. Propositionalization approaches to relational data mining. In S. Džeroski and N. Lavrač, editors, Relational Data Mining, pages 262-286. Springer-Verlag, New York, NY, 2001.
M. Maier, B. Taylor, H. Oktay, and D. Jensen. Learning causal models of relational domains. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, pages 531-538, 2010.
M. Maier, K. Marazopoulou, and D. Jensen. Reasoning about Independence in Probabilistic Models of Relational Data. arXiv preprint arXiv:1302.4381, 2013.
D. Margaritis and S. Thrun. Bayesian network induction via local neighborhoods. In Advances in Neural Information Processing Systems 12, pages 505-511, 1999.
C. Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 403-410, 1995.
J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, NY, 2000.
J.-P. Pellet and A. Elisseeff. Using Markov blankets for causal structure learning. Journal of Machine Learning Research, 9:1295-1342, 2008.
J. Peters, D. Janzing, and B. Schölkopf. Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2436-2450, 2011.
M. J. Rattigan, M. Maier, and D. Jensen. Relational blocking for causal discovery. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pages 145-151, 2011.
S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. Kerminen. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003-2030, 2006.
P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction and Search. MIT Press, Cambridge, MA, 2nd edition, 2000.
-----1
[1] C. P. de Campos and Q. Ji. Strategy selection in influence diagrams using imprecise probabilities. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI-08), pages 121-128, 2008.
[2] R. Dechter and R. Mateescu. AND/OR search spaces for graphical models. Artificial Intelligence, 171(2-3):73-106, 2007.
[3] E. A. Hansen, D. S. Bernstein, and S. Zilberstein. Dynamic programming for partially observable stochastic games. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI-04), pages 709-715, 2004.
[4] M. C. Horsch and D. Poole. An anytime algorithm for decision making under uncertainty. In Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), 1998.
[5] R. A. Howard and J. E. Matheson. Influence diagrams. In R. A. Howard and J. E. Matheson, editors, The Principles and Applications of Decision Analysis, pages 719-762, Menlo Park, CA, 1981.
[6] F. Jensen, F. V. Jensen, and S. L. Dittmer. From influence diagrams to junction trees. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence (UAI-94), pages 367-373, 1994.
[7] F. V. Jensen and T. D. Nielsen. Bayesian Networks and Decision Graphs, chapter 10, page 358. Springer Science+Business Media, LLC, New York, 2nd edition, 2007.
[8] S. L. Lauritzen and D. Nilsson. Representing and solving decision problems with limited information. Management Science, 47(9):1235-1251, 2001.
[9] R. Marinescu. A new approach to influence diagrams evaluation. In Proceedings of the 29th SGAI International Conference on Artificial Intelligence (AI-2009), pages 107-120, 2009.
[10] D. Maua, C. de Campos, and M. Zaffalon. Solving limited memory influence diagrams. Journal of Artificial Intelligence Research, 44:97-140, 2012.
[11] D. D. Maua and C. P. de Campos. Solving decision problems with limited information. In Advances in Neural Information Processing Systems 24: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS-11), 2011.
[12] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella. Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03), pages 705-711, 2003.
[13] T. D. Nielsen and F. V. Jensen. Well-defined decision scenarios. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 502-511, 1999.
[14] D. Nilsson and M. Hohle. Computing bounds on expected utilities for optimal policies based on limited information. Technical Report 94, Danish Informatics Network in the Agricultural Sciences, 2001.
[15] S. M. Olmsted. On representing and solving decision problems. PhD thesis, Stanford University, 1983.
[16] R. Qi and D. L. Poole. A new method for influence diagram evaluation. Computational Intelligence, 11:498-528, 1995.
[17] H. Raiffa and R. Schlaifer. Applied Statistical Decision Theory. MIT Press, Cambridge, 1961.
[18] S. Seuken and S. Zilberstein. Memory-bounded dynamic programming for DEC-POMDPs. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2009-2015, 2007.
[19] R. Shachter. Evaluating influence diagrams. Operations Research, 34:871-882, 1986.
[20] R. Shachter and M. Peot. Decision making using probabilistic inference methods. In Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence (UAI-92), pages 276-283, 1992.
[21] R. D. Shachter. Efficient value of information computation. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 594-601, 1999.
[22] J. E. Smith, S. Holtzman, and J. E. Matheson. Structuring conditional relationships in influence diagrams. Operations Research, 41:280-297, April 1993.
[23] C. Yuan and E. A. Hansen. Efficient computation of jointree bounds for systematic MAP search. In Proceedings of 21st International Joint Conference on Artificial Intelligence (IJCAI-09), 2009.
[24] C. Yuan, X. Wu, and E. A. Hansen. Solving multistage influence diagrams using branch-and-bound search. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI-10), pages 691-700, 2010.
-----0
R. P. Adams and Z. Ghahramani. Archipelago: Nonparametric Bayesian Semi-Supervised Learning. Proceedings of the 26th International Conference on Machine Learning, 2009.
M. Charytanowicz, J. Niewczas, P. Kulczycki, P. A. Kowalski, S. Lukasik, and S. Zak. Information Technologies in Biomedicine, volume 69 of Advances in Intelligent and Soft Computing. Springer Berlin Heidelberg, 2010.
W. Chu, V. Sindhwani, Z. Ghahramani, and S. S. Keerthi. Relational learning with Gaussian Processes. Advances in Neural Information Processing Systems, pages 289-296, 2007.
M. Cuturi. Fast global alignment kernels. Proceedings of the 28th International Conference on Machine Learning, pages 929-936, 2011.
I. Dhillon, Y. Guan, and B. Kulis.
M. D. Escobar and M. West. Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, 90:577-588, 1995.
J. E. Gentle. Matrix Algebra: Theory, Computations and Applications in Statistics. Springer, 2007.
J. Ginibre. Statistical Ensembles of Complex, Quaternion and Real Matrices. Journal of Mathematical Physics, 6:440, 1965.
L. Hubert and P. Arabie. Comparing Partitions. Journal of Classification, pages 193-218, 1985.
A. Kulesza and B. Taskar. Structured Determinantal Point Processes. Advances in Neural Information Processing Systems, 2010.
A. Kulesza and B. Taskar. Learning Determinantal Point Processes. Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, 2011a.
A. Kulesza and B. Taskar. k-dpps: Fixed-size Determinantal Point Processes. Proceedings of the 28th International Conference on Machine Learning, 2011b.
A. Kulesza and B. Taskar. Determinantal Point Processes for Machine Learning, 2013. URL http://arxiv.org/abs/1207.6083.
B. Kulis and M.I. Jordan. Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Proceedings of the 29th International Conference on Machine Learning, 2012.
N. D. Lawrence and M. I. Jordan. Semi-supervised Learning via Gaussian Processes. Advances in Neural Information Processing Systems, pages 753-760, 2005.
O. Macchi. The Coincidence Approach to Stochastic Point Processes. Advances in Applied Probability, 7(1):83-122, 1975.
C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
M. L. Mehta and M. Gaudin. On the density of Eigenvalues of a Random Matrix. Nuclear Physics, 18(0):420-427, 1960.
I. Murray, Z. Ghahramani, and D. MacKay. MCMC for Doubly-Intractable Distributions. Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, pages 359-366, 2006.
C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
S. Rogers and M. Girolami. Multi-class Semi-supervised Learning with the ε-truncated Multinomial Probit Gaussian Process. Journal of Machine Learning Research: Gaussian Processes in Practise, pages 17-32, 2007.
J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE Transactions on PAMI, 2000.
V. Sindhwani, W. Chu, and S. S. Keerthi. Semi-supervised Gaussian process classifiers. International Joint Conference on Artificial Intelligence, pages 1059-1064, 2007.
R. Socher, A. Maas, and C. D. Manning. Spectral Chinese Restaurant Processes: Nonparametric Clustering Based on Similarities. Proceedings of the 14th Conference on Artificial Intelligence and Statistics, 2011.
J. Wang, J. Lee, and C. Zhang. Kernel Trick Embedded Gaussian Mixture Model. Proceedings of the 14th International Conference of Algorithmic Learning Theory, pages 159-174, 2003.
H. S. Wilf. Generatingfunctionology. A K Peters, Ltd., 3rd edition, 2006.
Z. Wu and R. Leahy. An Optimal Graph Theoretic Approach to Data Clustering: Theory and its Application to Image Segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(11):1101-1113, 1993.
X. Zhu. Semi-Supervised Learning Literature Survey. Technical Report 1530, Dept. of Computer Sciences, University of Wisconsin, Madison, 2005.
-----1
[1] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147-169, 1985.
[2] W. P. Bergsma and T. Rudas. Marginal models for categorical data. Annals of Statistics, 30(1):140-159, 2002.
[3] M. Drton and T. S. Richardson. Binary models for marginal independence. Journal of the Royal Statistical Society (Series B), 70(2):287-309, 2008.
[4] R. J. Evans. Parametrizations of Discrete Graphical Models. PhD thesis, Department of Statistics, University of Washington, 2011.
[5] R. J. Evans and A. Forcina. Two algorithms for fitting constrained marginal models. Computational Statistics & Data Analysis, 66:1-7, 2013.
[6] R. J. Evans and T. S. Richardson. Marginal log-linear parameters for graphical Markov models. Journal of the Royal Statistical Society (Series B) (to appear), 2013.
[7] D. Geiger, D. Heckerman, H. King, and C. Meek. Stratified exponential families: Graphical models and model selection. Annals of Statistics, 29(2):505-529, 2001.
[8] S. L. Lauritzen. Graphical Models. Oxford, U.K.: Clarendon, 1996.
[9] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436-1462, 2006.
[10] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, 1988.
[11] T. S. Richardson, J. M. Robins, and I. Shpitser. Nested Markov properties for acyclic directed mixed graphs. In 28th Conference on Uncertainty in Artificial Intelligence (UAI-12). AUAI Press, 2012.
[12] T. Rudas, W. P. Bergsma, and R. Nemeth. Parameterization and estimation of path models for categorical data. In Proceedings in Computational Statistics, 17th Symposium, pages 383-394. Physica-Verlag HD, 2006.
[13] G. E. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461-464, 1978.
[14] I. Shpitser, T. S. Richardson, and J. M. Robins. An efficient algorithm for computing interventional distributions in latent variable causal models. In 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). AUAI Press, 2011.
[15] I. Shpitser, T. S. Richardson, J. M. Robins, and R. J. Evans. Parameter and structure learning in nested Markov models. In UAI Workshop on Causal Structure Learning 2012, 2012.
[16] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society (Series B), 58(1):267-288, 1996.
[17] T. S. Verma and Judea Pearl. Equivalence and synthesis of causal models. Technical Report R-150, Department of Computer Science, University of California, Los Angeles, 1990.
-----1
[1] E. Altman. Constrained Markov Decision Processes. CRC Press, 1999.
[2] R.E. Bellman. Dynamic programming. Princeton University Press, Princeton, 1957.
[3] C. Boutilier. Sequential optimality and coordination in multiagent systems. In Proc. IJCAI, 1999.
[4] K. Chatterjee, R. Majumdar, and T.A. Henzinger. Markov decision processes with multiple objectives. In STACS, 2006.
[5] K.M. Chong. An induction theorem for rearrangements. Canadian Journal of Mathematics, 28:154-160, 1976.
[6] I. Diakonikolas and M. Yannakakis. Small approximate pareto sets for biobjective shortest paths and other problems. SIAM Journal on Computing, 39(4):1340-1371, 2009.
[7] D. Dolgov and E. Durfee. Stationary deterministic policies for constrained MDPs with multiple rewards, costs, and discount factors. In IJCAI, 2005.
[8] N. Furukawa. Vector-valued markovian decision processes with countable state space. Ann. Math. Stat., 36, 1965.
[9] C. Guestrin, D. Koller, and R. Parr. Multiagent planning with factored MDPs. In NIPS, 2001.
[10] P. Hansen. Multiple Criteria Decision Making Theory and Application, chapter Bicriterion Path Problems, pages 109-127. Springer, 1979.
[11] R.A. Howard. Dynamic Programming and Markov Processes. The M.I.T. Press, 1960.
[12] J.Y. Kwak, P. Varakantham, R. Maheswaran, M. Tambe, F. Jazizadeh, G. Kavulya, L. Klein, B. Becerik-Gerber, T. Hayes, and W. Wood. Saves: A sustainable multiagent application to conserve building energy considering occupants. In AAMAS, 2012.
[13] M. Laumanns, L. Thiele, K. Deb, and E. Zitzler. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10(3):263-282, 2002.
[14] A.W. Marshall and I. Olkin. Inequalities: Theory of Majorization and its Applications. Academic Press, 1979.
[15] A.I. Mouaddib. Multi-objective decision-theoretic path planning. IEEE Int. Conf. Robotics and Automation, 3:2814-2819, 2004.
[16] W. Ogryczak, P. Perny, and P. Weng. On minimizing ordered weighted regrets in multiobjective Markov decision processes. In Int. Conf. on Algorithmic Decision Theory, 2011.
[17] W. Ogryczak and T. Sliwinski. On solving linear programs with the ordered weighted averaging objective. Eur. J. Operational Research, 148:80-91, 2003.
[18] A. Osyczka. An approach to multicriterion optimization problems for engineering design. Computer Methods in Applied Mechanics and Engineering, 15(3):309-333, 1978.
[19] C.H. Papadimitriou and M. Yannakakis. On the approximability of trade-offs and optimal access of web sources. In FOCS, pages 86-92, 2000.
[20] P. Perny and O. Spanjaard. An axiomatic approach to robustness in search problems with multiple scenarios. In UAI, volume 19, pages 469-476, 2003.
[21] P. Perny and P. Weng. On finding compromise solutions in multiobjective Markov decision processes. In European Conference on Artificial Intelligence Multidisciplinary Workshop on Advances in Preference Handling, 2010.
[22] M.L. Puterman. Markov decision processes: discrete stochastic dynamic programming. Wiley, 1994.
[23] P. Serafini. Some considerations about computational complexity for multiobjective combinatorial problems. In J. Jahn and W. Krabs, editors, Recent advances and historical development of vector optimization, volume 294 of Lecture Notes in Economics and Mathematical Systems, Berlin, 1986. Springer-Verlag.
[24] A.F. Shorrocks. Ranking income distributions. Economica, 50:3-17, 1983.
[25] R.E. Steuer. Multiple criteria optimization. John Wiley, 1986.
[26] B. Viswanathan, V.V. Aggarwal, and K.P.K. Nair. Multiple criteria Markov decision processes. TIMS Studies in the Management Sciences, 6:263-272, 1977.
[27] D.J. White. Multi-objective infinite-horizon discounted Markov decision processes. J. Math. Anal. Appls., 89:639-647, 1982.
[28] A.P. Wierzbicki. A mathematical basis for satisficing decision making. Mathematical Modelling, 3:391-405, 1982.
-----0
N. Alon and J. H. Spencer. The probabilistic method. Wiley-Interscience series in discrete mathematics and optimization. Wiley, 2000.
M. Biskup, C. Borgs, J. T. Chayes, and R. Kotecky. Gibbs states of graphical representations of the Potts model with external fields. Journal of Mathematical Physics, 41(3):1170-1210, 2000.
Y. Crama and P. L. Hammer. Boolean functions: Theory, algorithms, and applications, volume 142. Cambridge University Press, 2011.
L. Goldberg and M. Jerrum. Approximating the partition function of the ferromagnetic Potts model. In Automata, Languages and Programming, volume 6198 of Lecture Notes in Computer Science, pages 396-407. Springer Berlin Heidelberg, 2010.
G. Grimmett. The Random-Cluster Model. Number 333 in Grundl. Math. Wissen. Springer-Verlag, New York, 2006.
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183-233, 1999.
L. Lovász. Submodular functions and convexity. In Mathematical Programming: The State of the Art, pages 235-257. Springer Berlin Heidelberg, 1983.
N. Ruozzi. The Bethe partition function of log-supermodular graphical models. In Neural Information Processing Systems (NIPS), Lake Tahoe, NV, Dec. 2012.
D. Schlesinger. Exact solution of permuted submodular minsum problems. In Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), pages 28-38. Springer, 2007.
A. D. Sokal. Surveys in Combinatorics 2005, chapter The multivariate Tutte polynomial (alias Potts model) for graphs and matroids. Cambridge University Press, 2005.
B. Szegedy. Edge coloring models and reflection positivity. J. Amer. Math. Soc., 20(4):969-988, 2007.
P. O. Vontobel. Counting in graph covers: A combinatorial characterization of the Bethe entropy function. Information Theory, IEEE Transactions on, Jan. 2013.
P. O. Vontobel and R. Koetter. Graph-cover decoding and finite-length analysis of message-passing iterative decoding of LDPC codes. CoRR, abs/cs/0512078, 2005.
Y. Weiss. Advanced Mean Field Methods: Theory and Practice, chapter Comparing the Mean Field Method and Belief Propagation for Approximate Inference in MRFs, pages 229-239. MIT Press, 2001.
J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. Information Theory, IEEE Transactions on, 51(7):2282-2312, July 2005.
-----1
[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[2] Y. Censor and S. Zenios. Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, 1998.
[3] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[4] C. Chekuri, S. Khanna, J. Naor, and L. Zosin. A linear programming formulation and approximation algorithms for the metric labeling problem. SIAM Journal on Discrete Mathematics, 18(3):608-625, Mar. 2005.
[5] Q. Fu, A. Banerjee, S. Liess, and P. K. Snyder. Drought detection of the last century: An MRF-based approach. In Proceedings of the SIAM International Conference on Data Mining, 2012.
[6] Q. Fu, H. Wang, A. Banerjee, S. Liess, and P. K. Snyder. MAP inference on million node graphical models: KL-divergence based alternating directions method. Technical report, Computer Science and Engineering Department, University of Minnesota, 2012.
[7] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, 2007.
[8] B. He and X. Yuan. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM Journal on Numerical Analysis, 50(2):700-709, 2012.
[9] V. Jojic, S. Gould, and D. Koller. Fast and smooth: Accelerated dual decomposition for MAP inference. In Proceedings of the Twenty-Seventh International Conference on Machine Learning, 2010.
[10] S. P. Kasiviswanathan, P. Melville, A. Banerjee, and V. Sindhwani. Emerging topic detection using dictionary learning. In Proceedings of the Twentieth ACM international conference on Information and knowledge management, 2011.
[11] N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3):531-552, March 2011.
[12] F. R. Kschischang, B. J. Frey, and H. A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498-519, 2001.
[13] A. F. Martins. The Geometry of Constrained Structured Prediction: Applications to Inference and Learning of Natural Language Syntax. PhD thesis, Carnegie Mellon University, 2012.
[14] A. F. Martins, P. M. Aguiar, M. A. Figueiredo, N. A. Smith, and E. P. Xing. An augmented Lagrangian approach to constrained MAP inference. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, 2011.
[15] O. Meshi and A. Globerson. An alternating direction method for dual MAP LP relaxation. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011.
[16] T. D. Mitchell, T. R. Carter, P. D. Jones, M. Hulme, and M. New. A comprehensive set of high-resolution grids of monthly climate for Europe and the globe: the observed record (1901-2000) and 16 scenarios (2001-2100). Tyndall Centre for Climate Change Research, 2004.
[17] P. Raghavan and C. D. Thompson. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Combinatorica, 7(4):365-374, 1987.
[18] P. Ravikumar, A. Agarwal, and M. J. Wainwright. Message-passing for graph-structured linear programs: Proximal methods and rounding schemes. Journal of Machine Learning Research, 11:1043-1080, 2010.
[19] D. Sontag, A. Globerson, and T. Jaakkola. Introduction to dual decomposition for inference. In S. Sra, S. Nowozin, and S. J. Wright, editors, Optimization for Machine Learning. MIT Press, 2011.
[20] D. Sontag and T. Jaakkola. Tree block coordinate descent for MAP in graphical models. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics.
[21] D. Tarlow, D. Batra, P. Kohli, and V. Kolmogorov. Dynamic tree block coordinate ascent. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, 2011.
[22] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. MAP estimation via agreement on (hyper)trees: Message-passing and linear-programming approaches. IEEE Transactions on Information Theory, 51(11):3697-3717, 2005.
[23] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[24] H. Wang and A. Banerjee. Online alternating direction method. In Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012.
[25] J. Yang and Y. Zhang. Alternating direction algorithms for l1-problems in compressive sensing. SIAM Journal on Scientific Computing, 33(1):250-278, 2011.
[26] C. Yanover, T. Meltzer, and Y. Weiss. Linear programming relaxations and belief propagation: an empirical study. Journal of Machine Learning Research, 7:1887-1907, 2006.
-----0
B. Abramson, J. Brown, W. Edwards, A. Murphy, and R. L. Winkler. Hailfinder: A Bayesian system for forecasting severe weather. International Journal of Forecasting, 12(1):57-71, 1996.
I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks. In J. Hunter, editor, Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine, pages 247-256. Springer-Verlag, 1989.
C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998.
W. Buntine. Theory refinement on Bayesian networks. In B. D'Ambrosio and P. Smets, editors, UAI, pages 52-60. Morgan Kaufmann, 1991.
M. Charikar, P. Indyk, and R. Panigrahy. New algorithms for subset query, partial match, orthogonal range searching, and related problems. In P. Widmayer, F. T. Ruiz, R. M. Bueno, M. Hennessy, S. Eidenbenz, and R. Conejo, editors, ICALP, volume 2380 of Lecture Notes in Computer Science, pages 451-462. Springer, 2002.
B. Ellis and W. H. Wong. Learning causal Bayesian network structures from experimental data. Journal of the American Statistical Association, 103:778-789, 2008.
N. Friedman and D. Koller. Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50:95-125, 2003.
D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243, 1995.
R. Kennes. Computational aspects of the Moebius transformation of graphs. IEEE Transactions on Systems, Man, and Cybernetics, 22:201-223, 1991.
R. Kennes and P. Smets. Computational aspects of the Möbius transformation. In P. P. Bonissone, M. Henrion, L. N. Kanal, and J. F. Lemmer, editors, UAI, pages 401-416. Elsevier, 1990.
M. Koivisto. Advances in exact Bayesian structure discovery in Bayesian networks. In UAI, pages 241-248. AUAI, 2006.
M. Koivisto and K. Sood. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research, 5:549-573, 2004.
T. Niinimaki and M. Koivisto. Annealed importance sampling for structure learning in Bayesian networks. In IJCAI, 2013. To appear.
T. Niinimaki, P. Parviainen, and M. Koivisto. Partial order MCMC for structure discovery in Bayesian networks. In F. G. Cozman and A. Pfeffer, editors, UAI, pages 557-564. AUAI, 2011.
J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, 1988.
J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, 2000.
M. Pătraşcu. Unifying the landscape of cell-probe lower bounds. SIAM J. Comput., 40(3):827-847, 2011.
F. Yates. The design and analysis of factorial experiments. Harpenden: Imperial Bureau of Soil Science Technical Communication 35, 1937.
-----1
[1] C. S. Jensen, U. Kjaerulff, and A. Kong, Blocking Gibbs Sampling in Very Large Probabilistic Expert Systems, International Journal of Human Computer Studies. Special Issue on Real-World Applications of Uncertain Reasoning, vol. 42, pp. 647-666, 1993.
[2] J. S. Liu, W. H. Wong, and A. Kong, Covariance structure of the Gibbs sampler with applications to the comparison of estimators and augmentation schemes, Biometrika, vol. 81, pp. 27-40, 1994.
[3] S. Geman and D. Geman, Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, 1984.
[4] F. Hamze and N. de Freitas, From Fields to Trees, in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence, pp. 243-250, 2004.
[5] A. Darwiche, Modeling and Reasoning with Bayesian Networks. Cambridge University Press, 2009.
[6] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[7] J. S. Liu, Monte Carlo Strategies in Scientific Computing. Springer Publishing Company, Incorporated, 2001.
[8] S. L. Lauritzen and D. J. Spiegelhalter, Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems, Journal of the Royal Statistical Society. Series B (Methodological), vol. 50, no. 2, pp. 157-224, 1988.
[9] R. Dechter and R. Mateescu, AND/OR Search Spaces for Graphical Models, Artificial Intelligence, vol. 171, no. 2-3, pp. 73-106, 2007.
[10] N. Zhang and D. Poole, A simple approach to Bayesian network computations, in Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference, 1994.
[11] R. Dechter, Bucket elimination: A unifying framework for reasoning, Artificial Intelligence, vol. 113, pp. 41-85, 1999.
[12] S. Arnborg, D. G. Corneil, and A. Proskurowski, Complexity of finding embeddings in a k-tree, SIAM Journal of Algebraic Discrete Methods, vol. 8, pp. 277-284, Apr. 1987.
[13] R. T. Marler and J. S. Arora, Survey of multi-objective optimization methods for engineering, Structural and Multidisciplinary Optimization, vol. 26, pp. 369-395, April 2004.
[14] C. Hwang and M. A. S., Multiple objective decision making, methods and applications: a state-of-the-art survey. Springer-Verlag, 1979.
[15] G. Roberts and J. Rosenthal, Coupling and Ergodicity of Adaptive MCMC, Journal of Applied Probability, vol. 44(2), pp. 458-477, 2007.
[16] B. Bidyuk and R. Dechter, Cutset Sampling for Bayesian Networks, Journal of Artificial Intelligence Research, vol. 28, pp. 1-48, 2007.
[17] M. A. Paskin, Sample Propagation, in Advances in Neural Information Processing Systems, pp. 425-432, 2003.
[18] D. Venugopal and V. Gogate, On lifting the Gibbs sampling algorithm, in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS), pp. 1664-1672, 2012.
[19] J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin, Parallel Gibbs sampling: From colored fields to thin junction trees, in Artificial Intelligence and Statistics, May 2011.
[20] L. Getoor and B. Taskar, eds., Introduction to Statistical Relational Learning. MIT Press, 2007.
[21] M. Richardson and P. Domingos, Markov Logic Networks, Machine Learning, vol. 62, pp. 107-136, 2006.
[22] P. Domingos and D. Lowd, Markov Logic: An Interface Layer for Artificial Intelligence. San Rafael, CA: Morgan & Claypool, 2009.
[23] M. Fishelson and D. Geiger, Optimizing Exact Genetic Linkage Computations, Journal of Computational Biology, vol. 11, no. 2/3, pp. 263-275, 2004.
[24] B. Wemmenhove, J. M. Mooij, W. Wiegerinck, M. A. R. Leisink, H. J. Kappen, and J. P. Neijt, Inference in the promedas medical expert system, in Eleventh Conference on Artificial Intelligence in Medicine, pp. 456-460, 2007.
-----1
[Boutilier et al., 2001] Craig Boutilier, Fahiem Bacchus, and Ronen I. Brafman. UCP-networks: A directed graphical representation of conditional utilities. In UAI, pages 56-64, 2001.
[Boutilier et al., 2004] Craig Boutilier, Ronen I. Brafman, Carmel Domshlak, Holger H. Hoos, and David Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. JAIR, 21:135-191, 2004.
[Conitzer et al., 2011] Vincent Conitzer, Jerome Lang, and Lirong Xia. Hypercubewise preference aggregation in multi-issue domains. In IJCAI, pages 158-163, 2011.
[Cornelio, 2012] Cristina Cornelio. Dynamic and probabilistic CP-nets. Master's thesis, University of Padova, 2012.
[de Amo et al., 2012] Sandra de Amo, Marcos L. P. Bueno, Guilherme Alves, and Nadia Felix F. da Silva. CPrefMiner: An algorithm for mining user contextual preferences based on Bayesian networks. In ICTAI, pages 114-121, 2012.
[Flum and Grohe, 2006] Jorg Flum and Martin Grohe. Parameterized Complexity Theory (Texts in Theoretical Computer Science. An EATCS Series). Springer-Verlag, 2006.
[Gelle and Weigel, 1996] Esther Gelle and Rainer Weigel. Interactive configuration using constraint satisfaction techniques. In PACT, pages 37-44, 1996.
[Goldsmith et al., 2005] Judy Goldsmith, Jerome Lang, Miroslaw Truszczynski, and Nic Wilson. The computational complexity of dominance and consistency in CP-nets. In IJCAI, pages 144-149, 2005.
[Gonzales et al., 2008] Christophe Gonzales, Patrice Perny, and Sergio Queiroz. Preference aggregation with graphical utility models. In AAAI, pages 1037-1042, 2008.
[Koriche and Zanuttini, 2009] Frederic Koriche and Bruno Zanuttini. Learning conditional preference networks with queries. In IJCAI, 2009.
[Li et al., 2011] Minyi Li, Quoc Bao Vo, and Ryszard Kowalczyk. Majority-rule-based preference aggregation on multi-attribute domains with CP-nets. In The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, AAMAS, 2011.
[Rao and Georgeff, 1991] Anand S. Rao and Michael P. Georgeff. Modeling rational agents within a BDI-architecture. In KR, 1991.
[Vadhan, 2002] Salil P. Vadhan. The complexity of counting in sparse, regular, and planar graphs. SIAM J. Comput., 31(2):398-427, February 2002.
[Xia et al., 2008] Lirong Xia, Vincent Conitzer, and Jerome Lang. Voting on multiattribute domains with cyclic preferential dependencies. In AAAI, pages 202-207, 2008.
-----0
Bollen, K. A. (1989). Structural Equations with Latent Variables. John Wiley & Sons.
Eaton, D. and Murphy, K. (2007). Exact bayesian structure learning from uncertain interventions. In Proceedings AISTATS 2007.
Eberhardt, F., Hoyer, P. O., and Scheines, R. (2010). Combining experiments to discover linear cyclic models with latent variables. Proceedings AISTATS 2010, pages 185-192.
Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., and Scholkopf, B. (2009). Nonlinear causal discovery with additive noise models. In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 21 (NIPS*2008), pages 689-696.
Hyttinen, A., Eberhardt, F., and Hoyer, P. (2012). Learning linear cyclic causal models with latent variables. Journal of Machine Learning Research, 13:3387-3439.
Itani, S., Ohannessian, M., Sachs, K., Nolan, G. P., and Dahleh, M. A. (2010). Structure learning in causal cyclic networks. In JMLR Workshop and Conference Proceedings, volume 6, pages 165-176.
Lacerda, G., Spirtes, P., Ramsey, J., and Hoyer, P. O. (2008). Discovering cyclic causal models by independent components analysis. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI-2008).
Laplace, P. (1774). Memoir on the probability of causes of events. In Memoires de Mathematique et de Physique, Tome Sixième.
Meinshausen, N. and Buhlmann, P. (2010). Stability selection (with discussion). Journal of the Royal Statistical Society Series B, 72:417-473.
Mooij, J. M., Janzing, D., Heskes, T., and Scholkopf, B. (2011). On causal discovery with cyclic additive noise models. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems 24 (NIPS*2011), pages 639-647.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Peters, J., Mooij, J. M., Janzing, D., and Scholkopf, B. (2011). Identifiability of causal graphs using functional models. In Cozman, F. G. and Pfeffer, A., editors, Proceedings of the 27th Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), pages 589-598. AUAI Press.
Rasmussen, C. E. and Williams, C. (2006). Gaussian Processes for Machine Learning. MIT Press.
Sachs, K., Perez, O., Peer, D., Lauffenburger, D., and Nolan, G. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308:523-529.
Schmidt, M. and Murphy, K. (2009). Modeling discrete interventional data using directed cyclic graphical models. In Proceedings of the 25th Annual Conference on Uncertainty in Artificial Intelligence (UAI-09).
Shimizu, S., Hoyer, P. O., Hyvarinen, A., and Kerminen, A. J. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003-2030.
Snelson, E. and Ghahramani, Z. (2006). Sparse gaussian processes using pseudo-inputs. In Weiss, Y., Scholkopf, B., and Platt, J., editors, Advances in Neural Information Processing Systems 18 (NIPS*2005), pages 1257-1264. MIT Press, Cambridge, MA.
Solak, E., Murray-Smith, R., Leithead, W. E., Leith, D. J., and Rasmussen, C. E. (2003). Derivative observations in gaussian process models of dynamic systems. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15 (NIPS*2002), pages 1033-1040. MIT Press, Cambridge, MA.
Zhang, K. and Hyvarinen, A. (2009). On the identifiability of the post-nonlinear causal model. In Proceedings of the 25th Annual Conference on Uncertainty in Artificial Intelligence (UAI-09).
-----0
Bollen, K. A. (1989). Structural Equations with Latent Variables. John Wiley & Sons.
Dash, D. (2005). Restructuring dynamic causal systems in equilibrium. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS 2005).
Hyttinen, A., Eberhardt, F., and Hoyer, P. (2012). Learning linear cyclic causal models with latent variables. Journal of Machine Learning Research, 13:3387-3439.
Itani, S., Ohannessian, M., Sachs, K., Nolan, G. P., and Dahleh, M. A. (2010). Structure learning in causal cyclic networks. In JMLR Workshop and Conference Proceedings, volume 6, pages 165-176.
Iwasaki, Y. and Simon, H. A. (1994). Causality and model abstraction. Artificial Intelligence, 67:143-194.
Koster, J. T. A. (1996). Markov properties of nonrecursive causal models. Annals of Statistics, 24(5):2148-2177.
Lacerda, G., Spirtes, P., Ramsey, J., and Hoyer, P. O. (2008). Discovering cyclic causal models by independent components analysis. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI-2008).
Lauritzen, S. (1996). Graphical models. Clarendon Press.
Mooij, J. M., Janzing, D., Heskes, T., and Scholkopf, B. (2011). On causal discovery with cyclic additive noise models. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems 24 (NIPS*2011), pages 639-647.
Murray, J. (2002). Mathematical Biology. I: An Introduction. Springer, 3rd edition.
Neal, R. (2000). On deducing conditional independence from d-separation in causal graphs with feedback. Journal of Artificial Intelligence Research, 12:87-91.
Pearl, J. (2000). Causality. Cambridge University Press.
Pearl, J. and Dechter, R. (1996). Identifying independence in causal graphs with feedback. In Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence (UAI-96), pages 420-426.
Richardson, T. (1996). A discovery algorithm for directed cyclic graphs. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-1996).
Schmidt, M. and Murphy, K. (2009). Modeling discrete interventional data using directed cyclic graphical models. In Proceedings of the 25th Annual Conference on Uncertainty in Artificial Intelligence (UAI-09).
Sokol, A. and Hansen, N. R. (2013). Causal interpretation of stochastic differential equations. arXiv.org preprint, arXiv:1304.0217 [math.PR].
Spirtes, P. (1995). Directed cyclic graphical representations of feedback models. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI-95), pages 491-499.
Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, prediction, and search. Springer-Verlag. (2nd edition MIT Press 2000).
Voortman, M., Dash, D., and Druzdzel, M. (2010). Learning why things change: The difference-based causality learner. In Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 641-650, Corvallis, Oregon. AUAI Press.
-----1
[1] M. Araya-López, V. Thomas, O. Buffet, and F. Charpillet. A closer look at MOMDPs. In Proceedings of the Twenty-Second IEEE International Conference on Tools with Artificial Intelligence (ICTAI-10), 2010.
[2] Richard Bellman. A Markovian Decision Process. Indiana Univ. Math. J., 6:679-684, 1957.
[3] Didier Dubois and Henri Prade. Possibility theory as a basis for qualitative decision theory. pages 1924-1930. Morgan Kaufmann, 1995.
[4] Didier Dubois, Henri Prade, and Sandra Sandri. On possibility/probability transformations. In Proceedings of Fourth IFSA Conference, pages 103-112. Kluwer Academic Publ, 1993.
[5] Didier Dubois, Henri Prade, and Philippe Smets. Representing partial ignorance. IEEE Trans. on Systems, Man and Cybernetics, 26:361-377, 1996.
[6] Hélène Fargier, Jerome Lang, and Regis Sabbadin. Towards qualitative approaches to multi-stage decision making. 19:441-471, 1998.
[7] Hideaki Itoh and Kiyohiko Nakamura. Partially observable markov decision processes with imprecise parameters. Artificial Intelligence, 171(8-9):453-490, 2007.
[8] Yaodong Ni and Zhi-Qiang Liu. Policy iteration for bounded-parameter POMDPs. Soft Computing, pages 1-12, 2012.
[9] Sylvie C. W. Ong, Shao Wei Png, David Hsu, and Wee Sun Lee. Planning under uncertainty for robotic tasks with mixed observability. Int. J. Rob. Res., 29(8):1053-1068, July 2010.
[10] Cedric Pralet, Thomas Schiex, and Gerard Verfaillie. Sequential Decision-Making Problems - Representation and Solution. Wiley, 2009.
[11] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, USA, 1st edition, 1994.
[12] Regis Sabbadin. A possibilistic model for qualitative sequential decision problems under uncertainty in partially observable environments. In 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), Stockholm, 30/07/99-01/08/99, pages 567564, San Francisco, July 1999. Morgan Kaufmann.
[13] Regis Sabbadin. Empirical comparison of probabilistic and possibilistic markov decision processes algorithms. In Werner Horn, editor, ECAI, pages 586-590. IOS Press, 2000.
[14] Regis Sabbadin. Possibilistic markov decision processes. Engineering Applications of Artificial Intelligence, 14(3):287-300, 2001. Soft Computing for Planning and Scheduling.
[15] Richard D. Smallwood and Edward J. Sondik. The Optimal Control of Partially Observable Markov Processes Over a Finite Horizon, volume 21. INFORMS, 1973.
[16] Paul Weng. Conditions générales pour l'admissibilité de la programmation dynamique dans la décision séquentielle possibiliste. Revue d'Intelligence Artificielle, 21(1):129-143, 2007.
-----0
Mauricio A. Alvarez and Neil D. Lawrence. Computationally efficient convolved multiple output Gaussian processes. Journal of Machine Learning Research, 12:1425-1466, May 2011.
Lehel Csato and Manfred Opper. Sparse on-line Gaussian processes. Neural Computation, 14(3):641-668, 2002.
Andreas Damianou, Michalis K. Titsias, and Neil D. Lawrence. Variational Gaussian process dynamical systems. In Peter Bartlett, Fernando Pereira, Chris Williams, and John Lafferty, editors, Advances in Neural Information Processing Systems, volume 24, Cambridge, MA, 2011. MIT Press.
Andreas Damianou, Carl Henrik Ek, Michalis K. Titsias, and Neil D. Lawrence. Manifold relevance determination. In John Langford and Joelle Pineau, editors, Proceedings of the International Conference in Machine Learning, volume 29, San Francisco, CA, 2012. Morgan Kaufmann. To appear.
Mark N. Gibbs and David J. C. MacKay. Variational Gaussian process classifiers. IEEE Transactions on Neural Networks, 11(6):1458-1464, 2000.
James Hensman, Magnus Rattray, and Neil D. Lawrence. Fast variational inference in the exponential family. NIPS 2012, 2012.
Matthew Hoffman, David M. Blei, Chong Wang, and John Paisley. Stochastic variational inference. arXiv preprint arXiv:1206.7051, 2012.
Matthew Hoffman, David M. Blei, Chong Wang, and John Paisley. Stochastic variational inference. arXiv preprint arXiv:1206.7051, 2012.
Malte Kuss and Carl Edward Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6:1679-1704, 2005.
Neil D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783-1816, Nov. 2005.
Joaquin Quinonero Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939-1959, 2005.
Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. ISBN 0262-18253-X.
Matthias Seeger, Christopher K. I. Williams, and Neil D. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In Christopher M. Bishop and Brendan J. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, 3-6 Jan 2003.
Edward Snelson and Zoubin Ghahramani. Local and global sparse Gaussian process approximations. In Marina Meila and Xiaotong Shen, editors, Proceedings of the Eleventh International Workshop on Artificial Intelligence and Statistics, San Juan, Puerto Rico, 21-24 March 2007. Omnipress.
Michalis K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Workshop on Artificial Intelligence and Statistics, volume 5, pages 567-574, Clearwater Beach, FL, 16-18 April 2009. JMLR W&CP 5.
Michalis K. Titsias and Neil D. Lawrence. Bayesian Gaussian process latent variable model. In Yee Whye Teh and D. Michael Titterington, editors, Proceedings of the Thirteenth International Workshop on Artificial Intelligence and Statistics, volume 9, pages 844-851, Chia Laguna Resort, Sardinia, Italy, 13-16 May 2010. JMLR W&CP 9.
Raquel Urtasun and Trevor Darrell. Local probabilistic regression for activity-independent human pose inference. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, 2008.
-----0
Carmen Lacave and Francisco J Díez. A review of explanation methods for Bayesian networks. The Knowledge Engineering Review, 17(2):107-127, 2002.
J. Pearl. Probabilistic reasoning in intelligent systems. Morgan Kaufmann, San Francisco, CA, 1988.
Changhe Yuan and Tsai-Ching Lu. Finding explanations in Bayesian networks. In The 18th International Workshop on Principles of Diagnosis, pages 414-419, 2007.
Tania Lombrozo. Explanation and abductive inference. Oxford handbook of thinking and reasoning, pages 260-276, 2012.
Ulf Nielsen, Jean-Philippe Pellet, and Andre Elisseeff. Explanation trees for causal Bayesian networks. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI), 2008.
M Flores, Jose Gamez, and Serafín Moral. Abductive inference in Bayesian networks: finding a partition of the explanation space. Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pages 470470, 2005.
Changhe Yuan, Heejin Lim, and Tsai-Ching Lu. Most relevant explanation in Bayesian networks. Journal of Artificial Intelligence Research, 42(1):309-352, 2011.
Solomon E Shimony. Explanation, irrelevance and statistical independence. In Proceedings of the ninth National conference on Artificial intelligence, Volume 1, pages 482-487, 1991.
Luis M De Campos, Jose A Gamez, and Serafín Moral. Simplifying explanations in Bayesian belief networks. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(04):461-489, 2001.
Branden Fitelson. Likelihoodism, Bayesianism, and relational confirmation. Synthese, 156(3):473-489, 2007.
Joshua B Tenenbaum and Thomas L Griffiths. The rational basis of representativeness. In Proceedings of the 23rd annual conference of the Cognitive Science Society, pages 1036-1041, 2001.
Joshua T Abbott, Katherine A Heller, Zoubin Ghahramani, and Thomas L Griffiths. Testing a Bayesian Measure of Representativeness Using a Large Image Database. In Proceedings of the Thirty-First Annual Conference of the Cognitive Science Society, 2012.
Nihat Ay and Daniel Polani. Information flows in causal networks. Advances in Complex Systems, 11(01):17-41, 2008.
J. Pearl. Causality: Models, reasoning and inference. Cambridge University Press, Cambridge, UK, 2000.
Tania Lombrozo. Simplicity and probability in causal explanation. Cognitive Psychology, 55(3):232-257, 2007.
Changhe Yuan. Some properties of Most Relevant Explanation. In Proceedings of the 21st International Joint Conference on Artificial Intelligence ExaCt Workshop, pages 118-126, 2009.
-----1
[1] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 126-134, Lake Tahoe, NV, USA, 2012.
[2] James O. Berger. Statistical Decision Theory and Bayesian Analysis. James O. Berger, 2nd edition, 1985.
[3] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, 63(4):841-890, 1995.
[4] Steven Berry, James Levinsohn, and Ariel Pakes. Differentiated products demand systems from a combination of micro and macro data: The new car market. Journal of Political Economy, 112(1):68-105, 2004.
[5] Edwin Bonilla, Shengbo Guo, and Scott Sanner. Gaussian process preference elicitation. In Advances in Neural Information Processing Systems 23, pages 262-270. 2010.
[6] Craig Boutilier. On the foundations of expected expected utility. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), pages 285-290, Acapulco, Mexico, 2003.
[7] Craig Boutilier. Computational Decision Support: Regret-based Models for Optimization and Preference Elicitation. In P. H. Crowley and T. R. Zentall, editors, Comparative Decision Making: Analysis and Support Across Disciplines and Applications. Oxford University Press, 2013.
[8] Ralph Allan Bradley and Milton E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324-345, 1952.
[9] Urszula Chajewska, Daphne Koller, and Ron Parr. Making rational decisions using adaptive utility elicitation. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 363-369, Austin, TX, USA, 2000.
[10] Kathryn Chaloner and Isabella Verdinelli. Bayesian Experimental Design: A Review. Statistical Science, 10(3):273-304, 1995.
[11] Vincent Conitzer and Tuomas Sandholm. Vote elicitation: Complexity and strategy-proofness. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 392-397, Edmonton, AB, Canada, 2002.
[12] Lester R. Ford, Jr. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28-33, 1957.
[13] Neil Houlsby, Jose Miguel Hernandez-Lobato, Ferenc Huszar, and Zoubin Ghahramani. Collaborative gaussian processes for preference learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 2105-2113. Lake Tahoe, NV, USA, 2012.
[14] Toshihiro Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining (KDD), pages 583-588, Washington, DC, USA, 2003.
[15] Jerome Lang and Lirong Xia. Sequential composition of voting rules in multi-issue domains. Mathematical Social Sciences, 57(3):304-324, 2009.
[16] Thomas A. Louis. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society Series B (Statistical Methodology), 44:226-233, 1982.
[17] Tyler Lu and Craig Boutilier. Robust approximation and incremental elicitation in voting protocols. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), pages 287-293, Barcelona, Catalonia, Spain, 2011.
[18] Robert Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.
[19] Thomas Pfeiffer, Xi Alice Gao, Andrew Mao, Yiling Chen, and David G. Rand. Adaptive Polling and Information Aggregation. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 122-128, Toronto, Canada, 2012.
[20] Robin L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(2):193-202, 1975.
[21] Tuomas Sandholm and Craig Boutilier. Preference elicitation in combinatorial auctions. In Peter Cramton, Yoav Shoham, and Richard Steinberg, editors, Combinatorial Auctions, chapter 10, pages 233-263. MIT Press, 2006.
[22] Louis Leon Thurstone. A law of comparative judgment. Psychological Review, 34(4):273-286, 1927.
[23] Joan Walker and Moshe Ben-Akiva. Generalized random utility model. Mathematical Social Sciences, 43(3):303-343, 2002.
-----1
[1] M. A. Alvarez, L. Rosasco, and N. Lawrence. Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3):195-266, 2012.
[2] M. Arbeitman, E. Furlong, F. Imam, E. Johnson, B. Null, B. Baker, M. Krasnow, M. Scott, R. Davis, and K. White. Gene expression during the life cycle of drosophila melanogaster, 2002.
[3] A. Arnold, Y. Liu, and N. Abe. Temporal causal modeling with graphical granger methods. In KDD, 2007.
[4] N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337-404, 1950.
[5] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity inducing penalties. Foundations and Trends in Machine Learning, 2011.
[6] G. Bakir, T. Hofmann, B. Schölkopf, A. Smola, B. Taskar, and S. E. Vishwanathan. Predicting Structured Data. MIT Press, 2007.
[7] P. Bartlett and S. Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. JMLR, 3:463-482, 2002.
[8] P. Buhlmann and S. V. D. Geer. Statistics for High Dimensional Data. Springer, 2010.
[9] A. Caponnetto, M. Pontil, C. Micchelli, and Y. Ying. Universal multi-task kernels. Journal of Machine Learning Research, 9:1615-1646, 2008.
[10] K. Clarkson. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms, 2010.
[11] C. Cortes, M. Mohri, and A. Rostamizadeh. Generalization bounds for learning kernels. In ICML, 2010.
[12] F. Dinuzzo, C. S. Ong, P. Gehler, and G. Pillonetto. Learning output kernels with block coordinate descent. In ICML, 2011.
[13] B. Gartner and J. Matousek. Approximation Algorithms and Semi-definite Programming. Springer-Verlag, Berlin Heidelberg, 2012.
[14] P. Gehler and S. Nowozin. On feature combination for multiclass object classification. In ICCV, 2009.
[15] C. Granger. Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control, 2:329-352, 1980.
[16] E. Hazan. Sparse approximate solutions to semidefinite programs. In LATIN, 2008.
[17] M. Jaggi and M. Sulovsky. A simple algorithm for nuclear norm regularized problems. In ICML, 2010.
[18] H. Kadri, A. Rakotomamonjy, F. Bach, and P. Preux. Multiple operator-valued kernel learning. In NIPS, 2012.
[19] S. Kakade, S. Shalev-Shwartz, and A. Tewari. Regularization techniques for learning with matrices. Journal of Machine Learning Research, 2012.
[20] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. lp-norm multiple kernel learning. JMLR, 12:953-997, 2011.
[21] V. Koltchinskii and M. Yuan. Sparsity in multiple kernel learning. The Annals of Statistics, 38(6):3660-3695, 2010.
[22] A. C. Lozano, N. Abe, Y. Liu, and S. Rosset. Grouped graphical granger modeling methods for temporal causal modeling. In KDD, pages 577-586, 2009.
[23] A. Maurer. The rademacher complexity of linear transformation classes. In Proceedings of the Conference on Learning Theory (COLT), 2006.
[24] A. Maurer and M. Pontil. Structured sparsity and generalization. Journal of Machine Learning Research, 13:671-690, 2012.
[25] C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17:177-204, 2005.
[26] C. Micchelli and M. Pontil. Learning the kernel function via regularization. JMLR, 6:1099-1125, 2005.
[27] C. Micchelli and M. Pontil. Regularizers for structured sparsity. Advances in Computational Mathematics, 2011.
[28] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In NIPS, 2007.
[29] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491-2521, 2008.
[30] J. Ramsay and B. W. Silverman. Functional Data Analysis. Springer, Providence, RI, 2005.
[31] A. J. Rothman, E. Levina, and J. Zhu. Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics, 1:947-962, 2010.
[32] L. Schwartz. Sous-espaces hilbertiens d'espaces vectoriels topologiques et noyaux associés (noyaux reproduisants). J. Analyse Math., 13:115-256, 1964.
[33] A. Shojaie and G. Michailidis. Discovering graphical granger causality using the truncating lasso penalty. Bioinformatics, 26(18):i517-i523, Sept. 2010.
[34] V. Sindhwani, H. Q. Minh, and A. Lozano. Supplementary material at: http://arxiv.org/abs/1210.4792v2. Technical report, 2013.
[35] A. Tikhonov. Regularization of incorrectly posed problems. Sov. Math. Dokl., 17:4:1035-1038, 1963.
[36] R. Tomioka and T. Suzuki. Regularization strategies and empirical bayesian learning for MKL. In NIPS Workshops 2010, 2010.
[37] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In International Conference on Computer Vision, 2009.
[38] Y. Ying and C. Campbell. Generalization bounds for learning the kernel problem. In COLT, 2009.
[39] M. Yuan, A. Ekici, Z. Lu, and R. Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society Series B, 69:329-346, 2007.
-----0
Arora, C., Banerjee, S., Kalra, P., & Maheshwari, S. N. (2012). Generic cuts: An efficient algorithm for optimal inference in higher order MRF-MAP. ECCV (5) (pp. 17-30).
Bayati, M., Borgs, C., Chayes, J., & Zecchina, R. (2008). On the exactness of the cavity method for weighted b-matchings on arbitrary graphs and its relation to linear programs. Journal of Statistical Mechanics: Theory and Experiment, 2008, L06001 (10pp).
Bayati, M., Shah, D., & Sharma, M. (2005). Maximum weight matching via max-product belief propagation. IEEE International Symposium on Information Theory.
Bertele, U., & Brioschi, F. (1972). Nonserial dynamic programming. Academic Press. ISBN 0-12-093450-7.
Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1124-1137.
Chan, T.-H. H., Chang, K., & Raman, R. (2009). An SDP primal-dual algorithm for approximating the Lovász theta function. IEEE International Symposium on Information Theory.
Chudnovsky, M., Robertson, N., Seymour, P., & Thomas, R. (2006). The strong perfect graph theorem. Ann. Math., 164, 51-229.
Chvatal, V. (1985). Star-cutsets and perfect graphs. J. Comb. Theory, Ser. B, 39, 189-199.
Cooper, M. C., de Givry, S., Sanchez, M., Schiex, T., Zytnicki, M., & Werner, T. (2010). Soft arc consistency revisited. Artif. Intell., 174, 449-478.
Diestel, R. (2010). Graph theory. Springer. Fourth edition.
Faenza, Y., Oriolo, G., & Stauffer, G. (2011). An algorithmic decomposition of claw-free graphs leading to an O(n^3)-algorithm for the weighted stable set problem. SODA (pp. 630-646).
Foulds, J., Navaroli, N., Smyth, P., & Ihler, A. (2011). Revisiting MAP estimation, message passing, and perfect graphs. Artificial Intelligence and Statistics.
Gallai, T. (1962). Graphen mit triangulierbaren ungeraden Vielecken. Magyar Tud. Akad. Mat. Kutato Int. Kozl., 7, 336.
Greig, D., Porteous, B., & Seheult, A. (1989). Exact maximum a posteriori estimation for binary images. J. Royal Statistical Soc., Series B, 51, 271279.
Grotschel, M., Lovasz, L., & Schrijver, A. (1984). Topics on perfect graphs, chapter Polynomial algorithms for perfect graphs. North-Hollannd, Amsterdam.
Harary, F. (1953). On the notion of balance of a signed graph. Michigan Mathematical Journal, 2, 143146.
Huang, B., & Jebara, T. (2007). Loopy belief propagation for bipartite maximum weight b-matching. Artificial Intelligence and Statistics.
Jebara, T. (2009). MAP estimation, message passing, and perfect graphs. Uncertainty in Artificial Intelligence.
Jebara, T. (2012). Tractability: Practical approaches to hard problems, chapter Perfect graphs and graphical modeling. Cambridge Press.
Jegou, P. (1993). Decomposition of domains based on the micro-structure of finite constraint-satisfaction problems. AAAI (pp. 731-736).
Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Analysis and Machine Intelligence, 26, 147-159.
Lovasz, L. (1972). Normal hypergraphs and the perfect graph conjecture. Discrete Mathematics, 2, 253-267.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.
Pletscher, P., & Kohli, P. (2012). Learning low-order models for enforcing high-order statistics. AISTATS.
Ravikumar, P., & Lafferty, J. (2006). Quadratic programming relaxations for metric labeling and Markov random field MAP estimation. International Conference on Machine Learning.
Rother, C., Kolmogorov, V., Lempitsky, V. S., & Szummer, M. (2007). Optimizing binary MRFs via extended roof duality. CVPR.
Sanghavi, S., Malioutov, D., & Willsky, A. (2008). Linear programming analysis of loopy belief propagation for weighted matching. Neural Information Processing Systems.
Sanghavi, S., Shah, D., & Willsky, A. S. (2009). Message passing for maximum weight independent set. IEEE Transactions on Information Theory, 55, 4822-4834.
Shimony, S. (1994). Finding MAPs for belief networks is NP-hard. Artificial Intelligence, 68, 399-410.
Sontag, D., Meltzer, T., Globerson, A., Jaakkola, T., & Weiss, Y. (2008). Tightening LP relaxations for MAP using message passing. UAI (pp. 503-510). AUAI Press.
Wainwright, M., & Jordan, M. (2008). Graphical models, exponential families and variational inference. Foundations and Trends in Machine Learning, 1, 1-305.
Weiss, Y., Yanover, C., & Meltzer, T. (2007). MAP estimation, linear programming and belief propagation with convex free energies. UAI (pp. 416-425). AUAI Press.
Yedidia, J., Freeman, W., & Weiss, Y. (2001). Understanding belief propagation and its generalizations. International Joint Conference on Artificial Intelligence, Distinguished Lecture Track.
Yildirim, E., & Fan-Orzechowski, X. (2006). On extracting maximum stable sets in perfect graphs using Lovasz's theta function. Computational Optimization and Applications, 33, 229-247.
Zivny, S., Cohen, D. A., & Jeavons, P. G. (2009). The expressive power of binary submodular functions. Discrete Applied Mathematics, 157, 3347-3358.
-----0
Achterberg, Tobias. 2009. SCIP: Solving constraint integer programs. Mathematical Programming Computation, 1(1), 1-41.
Aliferis, Constantin F., Tsamardinos, Ioannis, Statnikov, Alexander R., & Brown, Laura E. 2003. Causal Explorer: A Causal Probabilistic Network Learning Toolkit for Biomedical Discovery. Pages 371-376 of: METMBS.
Beinlich, I. A., Suermondt, H. J., Chavez, R. M., & Cooper, G. F. 1989. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks. Pages 247-256 of: Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine. Springer-Verlag.
Chickering, D. 1996. Learning Bayesian Networks is NP-Complete. Pages 121-130 of: Fisher, D., & Lenz, H.J. (eds), Learning from Data: Artificial Intelligence and Statistics V. Springer-Verlag.
Chickering, D. 2002. Learning Equivalence Classes of Bayesian-Network Structures. Journal of Machine Learning Research, 2, 445-498.
Chickering, D., Heckerman, D., & Meek, C. 2004. Large-Sample Learning of Bayesian Networks is NP-Hard. Journal of Machine Learning Research, 5, 1287-1330.
Cover, T.M., & Thomas, J.A. 2006. Elements of information theory. John Wiley & Sons.
Csiszar, I., & Korner, J. 2011. Information theory: coding theorems for discrete memoryless systems. Cambridge University Press.
Cussens, James. 2011. Bayesian network learning with cutting planes. Pages 153-160 of: Proceedings of the Twenty-Seventh Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-11). Corvallis, Oregon: AUAI Press.
Dasgupta, S. 1999. Learning polytrees. In: Proc. of the 15th Conference on Uncertainty in Artificial Intelligence.
de Campos, Luis M. 2006. A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests. J. Mach. Learn. Res., 7(Dec.), 2149-2187.
Dembo, A., & Zeitouni, O. 2009. Large deviations techniques and applications. Vol. 38. Springer.
Fast, A. 2010. Learning the structure of Bayesian networks with constraint satisfaction. Ph.D. thesis, University of Massachusetts Amherst.
Friedman, N., & Yakhini, Z. 1996. On the Sample Complexity of Learning Bayesian Networks. In: Proc. of the 12th Conference on Uncertainty in Artificial Intelligence.
Friedman, Nir, Nachman, Iftach, & Peer, Dana. 1999. Learning Bayesian Network Structure from Massive Datasets: The Sparse Candidate Algorithm. Pages 206-215 of: UAI.
Heckerman, D., Geiger, D., & Chickering, D. M. 1995. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning, 20(3), 197-243. Available as Technical Report MSR-TR-94-09.
Hoeffding, W. 1965. Asymptotically optimal tests for multinomial distributions. The Annals of Mathematical Statistics, 369-401.
Hoffgen, K.U. 1993. Learning and robust learning of product distributions. Pages 77-83 of: Proceedings of the sixth annual conference on Computational learning theory. ACM.
Jaakkola, Tommi, Sontag, David, Globerson, Amir, & Meila, Marina. 2010. Learning Bayesian Network Structure using LP Relaxations. Pages 358-365 of: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 9. JMLR: W&CP.
Lam, Wai, & Bacchus, Fahiem. 1994. Learning Bayesian Belief Networks: An Approach Based on the MDL Principle. Computational Intelligence, 10, 269-294.
Paninski, Liam. 2003. Estimation of entropy and mutual information. Neural Comput., 15(6), 1191-1253.
Pearl, Judea, & Verma, Thomas. 1991. A Theory of Inferred Causation. Pages 441-452 of: KR.
Sachs, Karen, Perez, Omar, Peer, Dana, Lauffenburger, Douglas A., & Nolan, Garry P. 2005. Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Science, 308(5721), 523-529.
Spirtes, P., Glymour, C., & Scheines, R. 2001. Causation, Prediction, and Search, 2nd Edition. The MIT Press.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. 2006. The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65(1), 31-78.
Zuk, Or, Margel, Shiri, & Domany, Eytan. 2006. On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network. In: UAI. AUAI Press.
-----1
[1] S. B. Andersson and D. Hristu. Symbolic feedback control for navigation. IEEE Transactions on Automatic Control, 51(6):926-937, 2006.
[2] A. R. Cassandra, L. P. Kaelbling, and M. L. Littman. Acting optimally in partially observable stochastic domains. In Proceedings of the National Conference on Artificial Intelligence, pages 1023-1023. JOHN WILEY & SONS LTD, 1995.
[3] A. Condon and R. J. Lipton. On the complexity of space bounded interactive proofs. In FOCS, pages 462-467, 1989.
[4] K. Culik and J. Kari. Digital images and formal languages. Handbook of formal languages, pages 599-616, 1997.
[5] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge Univ. Press, 1998.
[6] R. Durrett. Probability: Theory and Examples (Second Edition). Duxbury Press, 1996.
[7] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, 1997.
[8] H. Gimbert and Y. Oualhadj. Probabilistic automata on finite words: Decidable and undecidable problems. In Proc. of ICALP, LNCS 6199, pages 527-538. Springer, 2010.
[9] E. A. Hansen and R. Zhou. Synthesis of hierarchical finite-state controllers for POMDPs. In ICAPS, pages 113-122, 2003.
[10] R. A. Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.
[11] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1):99-134, 1998.
[12] L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. J. of Artif. Intell. Research, 4:237-285, 1996.
[13] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas. Temporal-logic-based reactive mission and motion planning. IEEE Transactions on Robotics, 25(6):1370-1381, 2009.
[14] M. L. Littman, J. Goldsmith, and M. Mundhenk. The computational complexity of probabilistic planning. J. Artif. Intell. Res. (JAIR), 9:1-36, 1998.
[15] M. L. Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, 1996.
[16] O. Madani, S. Hanks, and A. Condon. On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell., 147(1-2):5-34, 2003.
[17] N. Meuleau, L. Peshkin, K.-E. Kim, and L. P. Kaelbling. Learning finite-state controllers for partially observable environments. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, UAI'99, pages 427-436, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
[18] N. Meuleau, K.-E. Kim, L. P. Kaelbling, and A. R. Cassandra. Solving POMDPs by searching the space of finite policies. In UAI, pages 417-426, 1999.
[19] M. Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269-311, 1997.
[20] C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12:441-450, 1987.
[21] A. Paz. Introduction to probabilistic automata (Computer science and applied mathematics). Academic Press, 1971.
[22] A. Pogosyants, R. Segala, and N. Lynch. Verification of the randomized consensus algorithm of Aspnes and Herlihy: a case study. Distributed Computing, 13(3):155-186, 2000.
[23] M. L. Puterman. Markov Decision Processes. John Wiley and Sons, 1994.
[24] M. O. Rabin. Probabilistic automata. Information and Control, 6:230-245, 1963.
[25] J. H. Reif. The complexity of two-player games of incomplete information. Journal of Computer and System Sciences, 29(2):274-301, 1984.
[26] M. I. A. Stoelinga. Fun with FireWire: Experiments with verifying the IEEE1394 root contention protocol. In Formal Aspects of Computing, 2002.
[27] J. D. Williams and S. Young. Partially observable Markov decision processes for spoken dialog systems. Computer Speech & Language, 21(2):393-422, 2007.
-----0
Charu C Aggarwal and ChengXiang Zhai. A survey of text clustering algorithms. Mining Text Data, pages 77-128, 2012.
Amr Ahmed and Eric P Xing. Staying informed: supervised and semi-supervised multi-view topical analysis of ideological perspective. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1140-1150. Association for Computational Linguistics, 2010.
David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993-1022, 2003.
Jonathan Boyd-Graber, Jordan Chang, Sean Gerrish, Chong Wang, and David Blei. Reading tea leaves: how humans interpret topic models. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems, 2009.
Deng Cai, Xiaofei He, and Jiawei Han. Locally consistent concept factorization for document clustering. Knowledge and Data Engineering, IEEE Transactions on, 23(6):902-913, 2011.
Chaitanya Chemudugunta, Padhraic Smyth, and Mark Steyvers. Modeling general and specific aspects of documents with a probabilistic topic model. In Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference, volume 19, page 241. MIT Press, 2007.
Scott Deerwester, Susan T. Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
Li Fei-Fei and Pietro Perona. A Bayesian hierarchical model for learning natural scene categories. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 524-531. IEEE, 2005.
Thomas Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1):177-196, 2001.
Yue Lu, Qiaozhu Mei, and ChengXiang Zhai. Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA. Information Retrieval, 14(2):178-203, 2011.
Andrew Y Ng, Michael I Jordan, Yair Weiss, et al. On spectral clustering: analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849-856, 2002.
Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):888-905, 2000.
Ivan Titov and Ryan McDonald. Modeling online reviews with multi-grain topic models. In Proceedings of the 17th international conference on World Wide Web, pages 111-120. ACM, 2008.
Martin J Wainwright and Michael I Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
Hanna M Wallach. Structured topic models for language. Unpublished doctoral dissertation, Univ. of Cambridge, 2008.
Wei Xu and Yihong Gong. Document clustering by concept factorization. In Proceedings of the 27th annual international ACM SIGIR conference on Research and Development in Information Retrieval, pages 202-209. ACM, 2004.
Wei Xu, Xin Liu, and Yihong Gong. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and Development in Information Retrieval, pages 267-273. ACM, 2003.
Jun Zhu, Li-Jia Li, Li Fei-Fei, and Eric P Xing. Large margin learning of upstream scene understanding models. Advances in Neural Information Processing Systems, 24, 2010.
-----0
F. Bacchus, J. Y. Halpern, and H. J. Levesque. Reasoning about noisy sensors and effectors in the situation calculus. Artificial Intelligence, 111(1-2):171-208, 1999.
V. Belle and H. J. Levesque. Reasoning about continuous uncertainty in the situation calculus. In Proc. IJCAI, 2013.
V. Belle and H. J. Levesque. Robot location estimation in the situation calculus. In ICAPS Workshop on Planning and Robotics. 2013.
C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. Decision-theoretic, high-level agent programming in the situation calculus. In Proc. AAAI, pages 355-362, 2000.
C. Boutilier, R. Reiter, and B. Price. Symbolic dynamic programming for first-order MDPs. In Proc. IJCAI, pages 690-697, 2001.
G. E. P. Box and G. C. Tiao. Bayesian inference in statistical analysis. Addison-Wesley, 1973.
X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In Proc. UAI, pages 33-42, 1998.
T. Dean and K. Kanazawa. Probabilistic temporal reasoning. In Proc. AAAI, pages 524-529, 1988.
T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(2):142-150, 1989.
T. Dean and M. Wellman. Planning and control. Morgan Kaufmann Publishers Inc., 1991.
R. Fagin and J. Y. Halpern. Reasoning about knowledge and probability. J. ACM, 41(2):340-367, 1994.
D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello. Bayesian filtering for location estimation. Pervasive Computing, IEEE, 2(3):24-33, 2003.
C. Fritz and S. A. McIlraith. Monitoring plan optimality during execution. In Proc. ICAPS, pages 144-151, 2007.
V. Gogate and P. Domingos. Formula-based probabilistic inference. In Proc. UAI, pages 210-219, 2010.
H. Hajishirzi and E. Amir. Reasoning about deterministic actions with probabilistic prior and application to stochastic filtering. In Proc. KR, 2010.
J. Y. Halpern and M. R. Tuttle. Knowledge, probability, and adversaries. J. ACM, 40:917-960, 1993.
J. Y. Halpern. An analysis of first-order logics of probability. Artificial Intelligence, 46(3):311-350, 1990.
N. Kushmerick, S. Hanks, and D. S. Weld. An algorithm for probabilistic planning. Artificial Intelligence, 76(1):239-286, 1995.
H. J. Levesque, F. Pirri, and R. Reiter. Foundations for the situation calculus. Electron. Trans. Artif. Intell., 2:159-178, 1998.
H. J. Levesque. What is planning in the presence of sensing? In Proc. AAAI / IAAI, pages 1139-1146, 1996.
J. McCarthy and P. J. Hayes. Some philosophical problems from the standpoint of artificial intelligence. In Machine Intelligence, pages 463-502, 1969.
J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
D. Poole. Decision theory, the situation calculus and conditional plans. Electron. Trans. Artif. Intell., 2:105-158, 1998.
D. Poole. First-order probabilistic inference. In Proc. IJCAI, pages 985-991, 2003.
R. Reiter. Knowledge in action: logical foundations for specifying and implementing dynamical systems. MIT Press, 2001.
M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1):107-136, 2006.
J. Rintanen. Regression for classical and nondeterministic planning. In Proc. ECAI, pages 568-572, 2008.
S. Sanner, K. V. Delgado, and L. N. de Barros. Symbolic dynamic programming for discrete and continuous state MDPs. In Proc. UAI, pages 643-652, 2011.
S. Sanner. Relational dynamic influence diagram language (RDDL): Language description. Technical report, Australian National University, 2011.
R. B. Scherl and H. J. Levesque. Knowledge, action, and the frame problem. Artificial Intelligence, 144(1-2):1-39, 2003.
M. Thielscher. Planning with noisy actions (preliminary report). In Proc. Australian Joint Conference on Artificial Intelligence, pages 27-45, 2001.
S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2005.
J. Van Benthem, J. Gerbrandy, and B. Kooi. Dynamic update with probabilities. Studia Logica, 93(1):67-96, 2009.
H. Van Ditmarsch, A. Herzig, and T. De Lima. Optimal regression for reasoning about knowledge and actions. In Proc. AAAI, pages 1070-1075, 2007.
R. Waldinger. Achieving several goals simultaneously. In Machine Intelligence, volume 8, pages 94-136. 1977.
H. Younes and M. Littman. PPDDL 1.0: An extension to PDDL for expressing planning domains with probabilistic effects. Technical report, Carnegie Mellon University, 2004.
-----0
I. S. Abramson. On bandwidth variation in kernel estimates: a square root law. The Annals of Statistics, 10(4):1217-1223, 1982.
A. Berlinet and T. C. Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers, 2004.
P. C. Bhat. Multivariate Analysis Methods in Particle Physics. Ann. Rev. Nucl. Part. Sci., 61:281-309, 2011.
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
J. Bovy, J. F. Hennawi, D. W. Hogg, A. D. Myers, J. A. Kirkpatrick, D. J. Schlegel, N. P. Ross, E. S. Sheldon, I. D. McGreer, D. P. Schneider, and B. A. Weaver. Think outside the color box: Probabilistic target selection and the SDSS-XDQSO quasar targeting catalog. The Astrophysical Journal, 729(2):141, 2011.
L. Breiman, W. Meisel, and E. Purcell. Variable kernel estimates of multivariate densities. Technometrics, 19(2):135-144, 1977.
P. K. Chan and M. V. Mahoney. Modeling multiple time series for anomaly detection. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), pages 90-97. IEEE Computer Society, 2005.
V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3):1-58, July 2009.
K. Das, J. Schneider, and D. B. Neill. Anomaly pattern detection in categorical datasets. In ACM-SIGKDD, pages 169-176. ACM, 2008.
K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73-99, December 2004.
H. Hoffmann. Kernel PCA for novelty detection. Pattern Recognition, 40(3):863-874, 2007.
J. A. Kirkpatrick, D. J. Schlegel, N. P. Ross, A. D. Myers, J. F. Hennawi, E. S. Sheldon, D. P. Schneider, and B. A. Weaver. A simple likelihood method for quasar target selection. The Astrophysical Journal, 743(2):125, 2011.
K. Muandet, K. Fukumizu, F. Dinuzzo, and B. Scholkopf. Learning from distributions via support measure machines. In Advances in Neural Information Processing Systems (NIPS), pages 10-18. 2012.
B. Poczos, L. Xiong, and J. G. Schneider. Nonparametric divergence estimation with applications to machine learning on distributions. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 599-608, 2011.
N. P. Ross, A. D. Myers, E. S. Sheldon, C. Yèche, M. A. Strauss, J. Bovy, J. A. Kirkpatrick, G. T. Richards, É. Aubourg, M. R. Blanton, W. N. Brandt, W. C. Carithers, R. A. C. Croft, R. da Silva, K. Dawson, D. J. Eisenstein, J. F. Hennawi, S. Ho, D. W. Hogg, K.-G. Lee, B. Lundgren, R. G. McMahon, J. Miralda-Escudé, N. Palanque-Delabrouille, I. Pâris, P. Petitjean, M. M. Pieri, J. Rich, N. A. Roe, D. Schiminovich, D. J. Schlegel, D. P. Schneider, A. Slosar, N. Suzuki, J. L. Tinker, D. H. Weinberg, A. Weyant, M. White, and W. M. Wood-Vasey. The SDSS-III Baryon Oscillation Spectroscopic Survey: Quasar target selection for data release nine. The Astrophysical Journal Supplement Series, 199(1):3, 2012.
B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001. ISBN 0262194759.
B. Scholkopf, J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443-1471, 2001.
A. Smola, A. Gretton, L. Song, and B. Scholkopf. A Hilbert space embedding for distributions. In Proceedings of the 18th International Conference on Algorithmic Learning Theory, pages 13-31. Springer-Verlag, 2007.
B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Scholkopf, and G. R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 2010.
D. M. J. Tax and R. P. W. Duin. Support vector domain description. Pattern Recognition Letters, 20:1191-1199, 1999.
D. M. J. Tax and R. P. W. Duin. Support vector data description. Machine Learning, 54(1):45-66, 2004.
G. R. Terrell and D. W. Scott. Variable kernel density estimation. The Annals of Statistics, 20(3):1236-1265, 1992.
T. Vatanen, M. Kuusela, E. Malmi, T. Raiko, T. Aaltonen, and Y. Nagai. Semi-supervised detection of collective anomalies with an application in high energy particle physics. In IJCNN, pages 1-8. IEEE, 2012.
L. Xiong, B. Poczos, and J. Schneider. Group anomaly detection using flexible genre models. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2011a.
L. Xiong, B. Poczos, J. G. Schneider, A. Connolly, and J. VanderPlas. Hierarchical probabilistic models for group anomaly detection. Journal of Machine Learning Research Proceedings Track, 15:789-797, 2011b.
M. Zhao and V. Saligrama. Anomaly detection with score functions based on nearest neighbor graphs. In Proceedings of Advances in Neural Information Processing Systems (NIPS), pages 2250-2258, 2009.
-----1
[1] Olivier Bousquet and Manfred K. Warmuth. Tracking a small set of experts by mixing past posteriors. Journal of Machine Learning Research, 3:363-396, 2002.
[2] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[3] Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. J. ACM, 44(3):427-485, 1997.
[4] Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551-585, 2006.
[5] Koby Crammer, Mark Dredze, and Fernando Pereira. Exact convex confidence-weighted learning. In Advances in Neural Information Processing Systems (NIPS), pages 345-352, 2008.
[6] Koby Crammer, Alex Kulesza, and Mark Dredze. Adaptive regularization of weight vectors. In NIPS, pages 414-422, 2009.
[7] Mark Dredze, Koby Crammer, and Fernando Pereira. Confidence-weighted linear classification. In Proceedings of the 25th International Conference on Machine Learning (ICML2008), pages 264-271, 2008.
[8] Dean P. Foster and Rakesh V. Vohra. A randomization rule for selecting forecasts. Oper. Res., 41:704-709, July 1993.
[9] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55:119-139, August 1997.
[10] Claudio Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213-242, 2001.
[11] J. Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97-139, 1957.
[12] David Haussler, Jyrki Kivinen, and Manfred K. Warmuth. Tight worst-case loss bounds for predicting with expert advice. In EuroCOLT, pages 69-83, 1995.
[13] Mark Herbster and Manfred K. Warmuth. Tracking the best expert. Machine Learning, 32(2):151-178, 1998.
[14] Steven C. H. Hoi, Rong Jin, Peilin Zhao, and Tianbao Yang. Online multiple kernel classification. Machine Learning, 90(2):289-316, 2013.
[15] Steven C.H. Hoi, Rong Jin, Jianke Zhu, and Michael R Lyu. Semisupervised SVM batch mode active learning with applications to image retrieval. ACM Transactions on Information Systems (TOIS), 27(3):16, 2009.
[16] Steven C.H. Hoi, Jialei Wang, and Peilin Zhao. LIBOL: A Library for Online Learning Algorithms. Nanyang Technological University, 2012.
[17] Guangxia Li, Steven C. H. Hoi, Kuiyu Chang, and Ramesh Jain. Micro-blogging sentiment detection by collaborative online learning. In ICDM, pages 893-898, 2010.
[18] Yi Li and Philip M. Long. The relaxed online maximum margin algorithm. In Advances in Neural Information Processing Systems (NIPS), pages 498-504, 1999.
[19] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Inf. Comput., 108(2):212-261, 1994.
[20] Francesco Orabona and Koby Crammer. New adaptive algorithms for online classification. In NIPS, pages 1840-1848, 2010.
[21] Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386-407, 1958.
[22] Volodimir G. Vovk. Aggregating strategies. In Proceedings of the third annual workshop on Computational learning theory (COLT'90), pages 371-386, 1990.
[23] Jialei Wang, Peilin Zhao, and Steven C. H. Hoi. Cost-sensitive online classification. In ICDM, pages 1140-1145, 2012.
[24] Jialei Wang, Peilin Zhao, and Steven C. H. Hoi. Exact soft confidence-weighted learning. In ICML, 2012.
[25] Jialei Wang, Peilin Zhao, Steven C.H. Hoi, and Rong Jin. Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, pages 1-14, 2013.
[26] Peilin Zhao, Steven C. H. Hoi, and Rong Jin. Double updating online learning. Journal of Machine Learning Research, 12:1587-1615, 2011.
-----0
N. Angelopoulos and J. Cussens. Bayesian learning of Bayesian networks with informative priors. Annals of Mathematics and Artificial Intelligence, 54(1-3):53-98, 2008.
I. Beinlich, G. Suermondt, R. Chavez, and G. Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. Proceedings of the 2nd European Conference in Artificial Intelligence in Medicine, pages 247-256, 1989.
J. Binder, D. Koller, S. Russell, and K. Kanazawa. Adaptive probabilistic networks with hidden variables. Machine Learning, 29(2-3):213-244, 1997.
G. Borboudakis and I. Tsamardinos. Incorporating causal prior knowledge as path-constraints in Bayesian networks and maximal ancestral graphs. Proceedings of the 29th International Conference on Machine Learning, pages 1799-1806, 2012.
G. Borboudakis, S. Triantafillou, V. Lagani, and I. Tsamardinos. A constraint-based approach to incorporate prior knowledge in causal models. Proceeding of the 19th European Symposium on Artificial Neural Networks, pages 321-326, 2011.
W. Buntine. Theory refinement on Bayesian networks. Proceedings of the 7th Conference on Uncertainty in Artificial Intelligence, pages 52-60, 1991.
G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309-347, 1992.
T. M. Cover and J. A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, 2nd edition, 2006.
I. Csiszar. I-Divergence geometry of probability distributions and minimization problems. Annals of Probability, 3(1):146-158, 1975.
J. N. Darroch and D. Ratcliff. Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics, 43(5):1470-1480, 1972.
C. Demetrescu and G. F. Italiano. Maintaining dynamic matrices for fully dynamic transitive closure. Algorithmica, 51(4):387-427, 2008.
W. E. Deming and F. F. Stephan. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics, 11(4):427-444, 1940.
P. Hansen, B. Jaumard, M. Poggi de Aragao, F. Chauny, and S. Perron. Probabilistic satisfiability with imprecise probabilities. International Journal of Approximate Reasoning, 24(2-3):171-189, 2000.
D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197-243, 1995.
J. Kuipers and G. Moffa. Uniform generation of large random acyclic digraphs. ArXiv e-prints, May 2013.
C. Meek. Causal inference and causal explanation with background knowledge. Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence, pages 403–410, August 1995.
G. Melançon, M. Bousquet-Mélou, and I. Dutour. Random generation of DAGs for graph drawing. Technical report, CWI, Stichting Mathematisch Centrum, 2000.
R. E. Neapolitan. Learning Bayesian Networks. Prentice Hall, Upper Saddle River, NJ, USA, 2003.
R. S. Niculescu, T. M. Mitchell, and R. B. Rao. Bayesian network learning with parameter constraints. Journal of Machine Learning Research, 7(July):1357–1383, 2006.
R. T. O'Donnell, A. E. Nicholson, B. Han, K. B. Korb, M. J. Alam, and L. R. Hope. Incorporating expert elicited structural information in the CaMML causal discovery program. Proceedings of the 19th Australian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence, pages 116, 2006.
J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, NY, USA, 2000.
R. Castelo and A. Siebes. Priors on network structures. Biasing the search for Bayesian networks. International Journal of Approximate Reasoning, 24(1):39–57, 2000.
R. W. Robinson. Counting labeled acyclic digraphs. New Directions in the Theory of Graphs: Proceedings of the Third Annual Arbor Conference on Graph Theory, pages 239–273, 1973.
K. Simon. An improved algorithm for transitive closure on acyclic digraphs. Theoretical Computer Science, 58(1-3):325–346, 1988.
P. Spirtes. Introduction to causal inference. Journal of Machine Learning Research, 11(May):1643–1662, 2010.
P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000.
I. Tsamardinos, L. Brown, and C. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006.
J. Vomlel. Integrating inconsistent data in a probabilistic model. Journal of Applied Non-Classical Logics, 14(3):367–386, 2004.
-----0
Allman, Elizabeth S, Matias, Catherine, & Rhodes, John A. 2009. Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37(6A), 3099–3132.
Anandkumar, A., Hsu, D., & Kakade, S. 2012a. A method of moments for mixture models and hidden Markov models. In: COLT.
Anandkumar, Anima, Foster, Dean, Hsu, Daniel, Kakade, Sham, & Liu, Yi-Kai. 2012b. A spectral algorithm for latent Dirichlet allocation. Pages 926–934 of: Advances in Neural Information Processing Systems 25.
Anandkumar, Animashree, Hsu, Daniel, Javanmard, Adel, & Kakade, Sham M. 2012c. Learning Linear Bayesian Networks with Latent Variables. arXiv preprint arXiv:1209.5350.
Anandkumar, Animashree, Hsu, Daniel, & Kakade, Sham M. 2012d. A method of moments for mixture models and hidden Markov models. arXiv preprint arXiv:1203.0683.
Berge, Jos M. F. 1991. Kruskal's polynomial for 2×2×2 arrays and a generalization to 2×n×n arrays. Psychometrika, 56, 631–636.
Chang, Joseph T. 1996. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Mathematical biosciences, 137(1), 51–73.
Cooper, Gregory F. 1987. Probabilistic Inference Using Belief Networks Is NP-Hard. Technical Report BMIR-1987-0195. Medical Computer Science Group, Stanford University.
Heckerman, David E. 1990. A tractable inference algorithm for diagnosing multiple diseases. Knowledge Systems Laboratory, Stanford University.
Hsu, D., Kakade, S. M., & Liang, P. 2012. Identifiability and Unmixing of Latent Parse Trees. In: Advances in Neural Information Processing Systems (NIPS).
Jaakkola, Tommi S, & Jordan, Michael I. 1999. Variational Probabilistic Inference and the QMR-DT Network. Journal of Artificial Intelligence Research, 10, 291–322.
Kearns, Michael, & Mansour, Yishay. 1998. Exact inference of hidden structure from sample data in noisy-OR networks. Pages 304–310 of: Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc.
Miller, Randolph A., Pople, Harry E., & Myers, Jack D. 1982. Internist-I, an Experimental Computer-Based Diagnostic Consultant for General Internal Medicine. New England Journal of Medicine, 307(8), 468–476.
Miller, Randolph A., McNeil, Melissa A., Challinor, Sue M., Fred E. Masarie, Jr., & Myers, Jack D. 1986. The INTERNIST-1/QUICK MEDICAL REFERENCE project: Status report. West J Med, 145(Dec), 816–822.
Morris, Quaid. 2001. Anonymised QMR KB to aQMRDT.
Mossel, Elchanan, & Roch, Sebastien. 2005. Learning nonsingular phylogenies and hidden Markov models. Pages 366–375 of: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing. ACM.
Ng, Andrew Y, & Jordan, Michael I. 2000. Approximate inference algorithms for two-layer Bayesian networks. Advances in neural information processing systems, 12.
Shwe, Michael A, Middleton, B, Heckerman, DE, Henrion, M, Horvitz, EJ, Lehmann, HP, & Cooper, GF. 1991. Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Meth. Inform. Med, 30, 241–255.
Šingliar, Tomáš, & Hauskrecht, Miloš. 2006. Noisy-OR component analysis and its application to link analysis. The Journal of Machine Learning Research, 7, 2189–2213.
-----0
P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497–1537, 2005.
S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. Machine Learning, 79(1):151–175, 2010.
G. Bennett. Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57(297):33–45, 1962.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
O. Bousquet. A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus Mathematique, 334(6):495–500, 2002.
O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to statistical learning theory. Advanced Lectures on Machine Learning, pages 169–207, 2004.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
R. Latała. Sudakov minoration principle and supremum of some processes. Geometric & Functional Analysis GAFA, 7(5):936–953, 1997.
S. Mendelson. A few notes on statistical learning theory. Advanced Lectures on Machine Learning, pages 1–40, 2003.
M. Mohri and A. Rostamizadeh. Stability bounds for stationary φ-mixing and β-mixing processes. Journal of Machine Learning Research, 11:798–814, 2010.
Michel Talagrand. Constructions of majorizing measures, Bernoulli processes and cotype. Geometric & Functional Analysis GAFA, 4(6):660–717, 1994a.
Michel Talagrand. The supremum of some canonical processes. American Journal of Mathematics, 116(2):283–325, 1994b.
A. Van der Vaart and J. Wellner. Weak Convergence and Empirical Processes: with Applications to Statistics. Springer, 1996.
V.N. Vapnik. Statistical Learning Theory. Wiley, 1998.
-----0
Aine, S.; Chakrabarti, P. P.; and Kumar, R. 2007. AWA*: a window constrained anytime heuristic search algorithm. In Proceedings of the 20th international joint conference on Artificial intelligence, IJCAI'07, 2250–2255. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Buntine, W. 1991. Theory refinement on Bayesian networks. In Proceedings of the seventh conference (1991) on Uncertainty in artificial intelligence, 52–60. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Chickering, D. M. 1996. Learning Bayesian networks is NP-complete. In Learning from Data: Artificial Intelligence and Statistics V, 121–130. Springer-Verlag.
Chickering, D. M. 2002. Learning equivalence classes of Bayesian-network structures. J. Mach. Learn. Res. 2.
Cussens, J. 2011. Bayesian network learning with cutting planes. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI-11), 153–160. Corvallis, Oregon: AUAI Press.
de Campos, C. P., and Ji, Q. 2011. Efficient learning of Bayesian networks using constraints. Journal of Machine Learning Research 12:663–689.
Hansen, E. A., and Zhou, R. 2007. Anytime heuristic search. Journal of Artificial Intelligence Research 28.
Hart, P. E.; Nilsson, N. J.; and Raphael, B. 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions On Systems Science And Cybernetics 4(2):100–107.
Heckerman, D. 1996. A tutorial on learning with Bayesian networks. Technical report, Learning in Graphical Models.
Ide, J. S., and Cozman, F. G. 2002. Random generation of Bayesian networks. In Brazilian Symposium on Artificial Intelligence, 366–375.
Jaakkola, T.; Sontag, D.; Globerson, A.; and Meila, M. 2010. Learning Bayesian network structure using LP relaxations. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS).
Koivisto, M., and Sood, K. 2004. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research 549–573.
Lam, W., and Bacchus, F. 1994. Learning Bayesian belief networks: An approach based on the MDL principle. Computational Intelligence 10:269–293.
Likhachev, M.; Gordon, G.; and Thrun, S. 2003. ARA*: Anytime A* search with provable bounds on suboptimality. In Thrun, S.; Saul, L.; and Scholkopf, B., eds., Proceedings of Conference on Neural Information Processing Systems (NIPS). MIT Press.
Malone, B., and Yuan, C. 2012. A parallel, anytime, bounded error algorithm for exact Bayesian network structure learning. In Proceedings of the Sixth European Workshop on Probabilistic Graphical Models (PGM-12).
Malone, B.; Yuan, C.; Hansen, E.; and Bridges, S. 2011. Improving the scalability of optimal Bayesian network learning with external-memory frontier breadth-first branch and bound search. In Conference on Uncertainty in Artificial Intelligence (UAI-11), 479–488. Corvallis, Oregon: AUAI Press.
Malone, B.; Yuan, C.; and Hansen, E. 2011. Memory-efficient dynamic programming for learning optimal Bayesian networks. In National conference on Artificial intelligence.
Moore, A., and Wong, W.-K. 2003. Optimal reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning. In Intl. Conf. on Machine Learning, 552–559.
Ott, S.; Imoto, S.; and Miyano, S. 2004. Finding optimal models for small gene networks. In Pac. Symp. Biocomput, 557–567.
Pohl, I. 1970. Heuristic search viewed as path finding in a graph. Artificial Intelligence 1(3-4):193–204.
Silander, T., and Myllymaki, P. 2006. A simple approach for finding the globally optimal Bayesian network structure. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06). Arlington, Virginia: AUAI Press.
Silander, T.; Roos, T.; Kontkanen, P.; and Myllymaki, P. 2008. Factorized normalized maximum likelihood criterion for learning Bayesian network structures. In Proceedings of the 4th European Workshop on Probabilistic Graphical Models (PGM-08), 257–272.
Singh, A., and Moore, A. 2005. Finding optimal Bayesian networks by dynamic programming. Technical report, Carnegie Mellon University.
Teyssier, M., and Koller, D. 2005. Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In Proceedings of the Twenty-First Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), 584–590. Arlington, Virginia: AUAI Press.
Yuan, C., and Malone, B. 2012. An improved admissible heuristic for finding optimal Bayesian networks. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI-12). AUAI Press.
Yuan, C.; Malone, B.; and Wu, X. 2011. Learning optimal Bayesian networks using A* search. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence.
-----0
A. R. Sampson. Positive dependence properties of elliptically symmetric distributions. Journal of Multivariate Analysis, 13(2):375–381, 1983.
T. Bedford and R. Cooke. Vines: a new graphical model for dependent random variables. Annals of Statistics, 2002.
A. Bowman and A. Azzalini. Applied Smoothing Techniques for Data Analysis. Oxford University Press, 1997.
T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, 1991.
G. Elidan. Lightning-speed structure learning of nonlinear continuous networks. In Proceedings of the AI and Statistics Conference (AISTATS), 2012.
Gal Elidan. Copula Bayesian networks. In Advances in Neural Information Processing Systems (NIPS), 2010.
A. M. Hanea, D. Kurowicka, Roger M. Cooke, and D. A. Ababei. Mining and visualising ordinal data with non-parametric continuous BBNs. Comp Statistics and Data Analysis, 54(3):668–687, 2010.
H. Joe. Majorization, randomness and dependence for multivariate distributions. The Annals of Probability, 15(3):1217–1225, 1987.
H. Joe. Multivariate models and dependence concepts. Monographs on Statistics and Applied Probability, 73, 1997.
S. Kirshner. Learning with tree-averaged densities and distributions. In Advances in Neural Information Processing Systems (NIPS), 2007.
D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.
D. Kurowicka and R. Cooke. The vine copula method for representing high dimensional dependent distributions: Applications to cont. belief nets. In Proc. of the Simulation Conf., 2002.
R. M. Marion, A. Regev, E. Segal, Y. Barash, D. Koller, N. Friedman, and E. K. O'Shea. Sfp1 is a stress- and nutrient-sensitive regulator of ribosomal protein gene expression. Proc Natl Acad Sci U S A, 101(40):14315–22, 2004.
R. Nelsen. An Introduction to Copulas. Springer, 2007.
E. Parzen. On estimation of a probability density function and mode. Annals of Math. Statistics, 33:1065–1076, 1962.
J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
B. Schweizer and E. Wolff. On nonparametric measures of dependence for random variables. The Annals of Statistics, 9, 1981.
M. Shaked and J. Shanthikumar. Stochastic Orders. Springer, 2007.
A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8:229–231, 1959.
-----0
H. Aissi, C. Bazgan, and D. Vanderpooten. Approximation complexity of min-max (regret) versions of shortest path, spanning tree, and knapsack. In Proceedings of the 13th annual European conference on Algorithms, pages 862–873, Berlin, Heidelberg, 2005. Springer-Verlag.
F. Bach. Structured sparsity-inducing norms through submodular functions. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 118–126. 2010.
G. H. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan. Predicting Structured Data. The MIT Press, 2007.
C. Berge. The Theory of Graphs. Dover Books on Mathematics Series. Dover, 1962.
J. K. Bradley and C. Guestrin. Learning tree conditional random fields. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 127–134. Omnipress, 2010.
A. Chechetka and C. Guestrin. Evidence-specific structures for rich tractable CRFs. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 352–360. 2010.
C. I. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14:462–467, 1968.
M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, 2002.
K. Dembczynski, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, pages 279–286, 2010.
T. Finley and T. Joachims. Training structural SVMs when exact inference is intractable. In Proceedings of the 25th International Conference on Machine learning, pages 304–311, 2008.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
E. Hazan and S. Kale. Beyond convexity: Online submodular minimization. In Advances in Neural Information Processing Systems 22, pages 700–708. 2009.
J. B. Kruskal. On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proceedings of the American Math. Society, 7(1):48–50, 1956.
S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher. Block-coordinate Frank-Wolfe optimization for structural SVMs. In Proceedings of The 30th International Conference on Machine Learning, pages 53–61, 2013.
J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. 18th Int. Conf. on Machine Learning, pages 282–289, 2001.
S.-I. Lee, V. Ganapathi, and D. Koller. Efficient structure learning of Markov networks using L1-regularization. In Advances in Neural Information Processing Systems (NIPS 2006), 2007.
O. Meshi, D. Sontag, T. Jaakkola, and A. Globerson. Learning efficiently with approximate inference via dual losses. In ICML, pages 783–790, New York, NY, USA, 2010. ACM.
J. G. Oxley. Matroid Theory. Oxford University Press, 2006.
J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
P. Ravikumar, G. Raskutti, M. J. Wainwright, and B. Yu. Model selection in Gaussian graphical models: High-dimensional consistency of l1-regularized MLE. In Advances in Neural Info. Processing Systems 17. 2008.
P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using l1-regularized logistic regression. Annals of Statistics, 38(3):1287–1319, 2010.
M. Schmidt, K. Murphy, G. Fung, and R. Rosales. Structure learning in random fields for heart motion abnormality detection. In CVPR, pages 1–8, 2008.
Y. Shimony. Finding the MAPs for belief networks is NP-hard. Artificial Intelligence, 68(2):399–410, 1994.
D. Sontag, O. Meshi, T. Jaakkola, and A. Globerson. More data means less inference: A pseudo-max approach to structured learning. In Advances in Neural Information Processing Systems 23, pages 2181–2189. 2010.
P. D. Tao and L. T. H. An. Convex analysis approach to DC programming: theory, algorithms and applications. Acta Mathematica Vietnamica, 22(1):289–355, 1997.
B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems. MIT Press, 2003.
A. Torralba, K. P. Murphy, and W. T. Freeman. Contextual models for object detection using boosted random fields. In Advances in Neural Information Processing Systems 17, pages 1401–1408. 2004.
I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6(2):1453, 2006.
M. Wainwright and M. I. Jordan. Graphical Models, Exponential Families, and Variational Inference. Now Publishers Inc., Hanover, MA, USA, 2008.
A. L. Yuille and A. Rangarajan. The concave-convex procedure. Neural Comput., 15(4):915–936, Apr. 2003.
Y. Zhang and J. Schneider. Maximum margin output coding. In ICML, 2012.
J. Zhu, E. P. Xing, and B. Zhang. Primal sparse max-margin Markov networks. In Proceedings of the ACM SIGKDD international conf. on Knowledge discovery and data mining, KDD '09, pages 1047–1056, 2009.
-----0
Bareinboim, E., and Pearl, J. 2012a. Causal inference by surrogate experiments: z-identifiability. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 113–120. AUAI Press.
Bareinboim, E., and Pearl, J. 2012b. Transportability of causal effects: Completeness results. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 698–704. AAAI Press.
Bareinboim, E., and Pearl, J. 2013a. Causal transportability of limited experiments. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. AAAI Press. To appear.
Bareinboim, E., and Pearl, J. 2013b. Meta-transportability of causal effects: A formal approach. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics. To appear.
Galles, D., and Pearl, J. 1995. Testing identifiability of causal effects. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence, 185–195. Morgan Kaufmann.
Huang, Y., and Valtorta, M. 2006. Pearl's calculus of intervention is complete. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 217–224. AUAI Press.
Lee, S., and Honavar, V. 2013. m-transportability: Transportability of a causal effect from multiple environments. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. AAAI Press. To appear.
Pearl, J., and Bareinboim, E. 2011. Transportability of causal and statistical relations: A formal approach. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 247–254. AAAI Press.
Pearl, J. 1995. Causal diagrams for empirical research. Biometrika 82(4):669–688.
Pearl, J. 2000. Causality: models, reasoning, and inference. New York, NY, USA: Cambridge University Press.
Pearl, J. 2012. The do-calculus revisited. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 411. AUAI Press.
Shpitser, I., and Pearl, J. 2006a. Identification of conditional interventional distributions. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 437–444. AUAI Press.
Shpitser, I., and Pearl, J. 2006b. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, 1219–1226. AAAI Press.
Tian, J., and Pearl, J. 2002. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, 567–573. AAAI Press / The MIT Press.
Tian, J. 2004. Identifying conditional causal effects. In Proceedings of the Twentieth Conference in Uncertainty in Artificial Intelligence, 561–568. AUAI Press.
-----0
S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In NIPS, 2002.
B. Babenko, M.H. Yang, and S. Belongie. Visual tracking with online multiple instance learning. In Computer Vision and Pattern Recognition (CVPR), 2009.
R. C. Bunescu and R. J. Mooney. Multiple instance learning for sparse positive bags. In Proceedings of the 24th International Conference on Machine Learning (ICML), pages 105–112. ACM, 2007.
C. C. Chang and C. J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
Y. Chen, J. Bi, and J. Z. Wang. MILES: Multiple-instance learning via embedded instance selection. T-PAMI, 28(12):1931–1947, 2006.
Thomas Deselaers and Vittorio Ferrari. A conditional random field for multiple-instance learning. 2010.
T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997.
T. M. T. Do and T. Artières. Large margin training for hidden Markov models with partially observed states. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 265–272. ACM, 2009.
L. Duan, W. Li, I. Tsang, and D. Xu. Improving web image search by bag-based re-ranking. IEEE Transactions on Image Processing, 20(11):3280–3290, 2011.
P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 32(9):1627–1645, 2010.
T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola. Multi-instance kernels. In Proceedings of the 19th International Conference on Machine Learning (ICML), pages 179–186, 2002.
P. Gehler and O. Chapelle. Deterministic annealing for multiple-instance learning. In AISTATS, 2007.
R. Gupta, A. A. Diwan, and S. Sarawagi. Efficient inference with cardinality-based clique potentials. In Proceedings of the 24th International Conference on Machine Learning (ICML), pages 329–336. ACM, 2007.
H. Hajimirsadeghi and G. Mori. Multiple instance real boosting with aggregation functions. In Proceedings of the 21st International Conference on Pattern Recognition, 2012.
H. Kueck and N. de Freitas. Learning about individuals from group statistics. In Conference on Uncertainty in Artificial Intelligence (UAI), 2005.
C. Leistner, A. Saffari, and H. Bischof. MIForests: Multiple-instance learning with randomized trees. In European Conference on Computer Vision (ECCV), 2010.
F. Li and C. Sminchisescu. Convex multiple-instance learning by estimating likelihood ratio. Advances in Neural Information Processing Systems (NIPS), pages 1360–1368, 2010.
W. Li, L. Duan, D. Xu, and I. W. H. Tsang. Text-based image retrieval using progressive multi-instance learning. In International Conference on Computer Vision (ICCV), pages 2049–2055. IEEE, 2011.
J. Malik, S. Belongie, T. K. Leung, and J. Shi. Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43(1):7–27, 2001.
O. Maron and T. Lozano-Pérez. A framework for multiple-instance learning. Advances in Neural Information Processing Systems (NIPS), pages 570–576, 1998.
N. Quadrianto, A. Smola, T. Caetano, and Q. Le. Estimating labels from label proportions. The Journal of Machine Learning Research, 10:2349–2374, 2009.
S. Ray and M. Craven. Supervised versus multiple instance learning: An empirical comparison. In Proceedings of the 22nd International Conference on Machine Learning (ICML), pages 697–704. ACM, 2005.
S. Rueping. SVM classifier estimation from group probabilities. In Proc. of the 27th Int. Conf. on Machine Learning (ICML), 2010.
Daniel Tarlow, Kevin Swersky, Richard S Zemel, Ryan Prescott Adams, and Brendan J Frey. Fast exact inference for recursive cardinality models. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3):480–492, 2012.
P. Viola, J. Platt, and C. Zhang. Multiple instance boosting for object detection. In NIPS, 2006.
Jonathan Warrell and Philip HS Torr. Multiple-instance learning with structured bag models. In Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 369–384. Springer, 2011.
Q. Zhang and S. A. Goldman. EM-DD: An improved multiple-instance learning technique. Advances in Neural Information Processing Systems (NIPS), 14:1073–1080, 2002.
Z. H. Zhou, Y. Y. Sun, and Y. F. Li. Multi-instance learning by treating instances as non-iid samples. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 1249–1256. ACM, 2009.
-----0
Aloise, D., Deshpande, A., Hansen, P., and Popat, P. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75:245–249, 2009.
Anandkumar, A., Hsu, D., and Kakade, S. A method of moments for mixture models and hidden Markov models. In Proc. Conference on Learning Theory, 2012.
Arora, S. and Kannan, R. Learning mixtures of separated nonspherical Gaussians. The Annals of Applied Probability, 15(1A):69–92, 2005.
Bach, F. and Harchaoui, Z. DIFFRAC: A discriminative and flexible framework for clustering. In Advances in Neural Information Processing Systems 20, 2007.
Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705–1749, 2005.
Berman, A. and Xu, C. 5×5 completely positive matrices. Linear Algebra and its Applications, 393:55–71, 2004.
Borwein, J. and Lewis, A. Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS books in Mathematics. Canadian Mathematical Society, 2000.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1123, 2010.
Chapelle, O., Schölkopf, B., and Zien, A. (eds.). Semi-Supervised Learning. MIT Press, 2006.
Chaudhuri, K., Dasgupta, S., and Vattani, A. Learning mixtures of Gaussians using the k-means algorithm. arXiv:0912.0086v1, 2009.
Dasgupta, S. The hardness of k-means clustering. Technical Report CS2008-0916, CSE Department, UCSD, 2008.
Dasgupta, S. and Schulman, L. A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. Journal of Machine Learning Research, 8:203–226, 2007.
Dempster, A., Laird, N., and Rubin, D. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1):122, 1977.
Frank, A. and Asuncion, A. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
Guo, Y. and Schuurmans, D. Convex relaxations of latent variable training. In Adv. Neural Infor. Processing Systems 20, 2007.
Hansen, P., Jaumard, B., and Mladenovic, N. Minimum sum of squares clustering in a low dimensional space. Journal of Classification, 15(1):37–55, 1998.
Horn, R. and Johnson, C. Matrix Analysis. Cambridge University Press, Cambridge, 1985.
Hsu, D. and Kakade, S. Learning mixtures of spherical Gaussians: Moment methods and spectral decompositions. In Innovations in Theoretical Computer Science (ITCS), 2013.
Inaba, M., Katoh, N., and Imai, H. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. In Proc. Symp. Computational Geometry, 1994.
Joulin, A. and Bach, F. A convex relaxation for weakly supervised classifiers. In Proceedings of the International Conference on Machine Learning, 2012.
Joulin, A., Bach, F., and Ponce, J. Efficient optimization for discriminative latent class models. In Advances in Neural Information Processing Systems 23, 2010.
Kalai, A., Moitra, A., and Valiant, G. Efficiently learning mixtures of two Gaussians. In Proceedings ACM Symposium on Theory of Computing, 2010.
Kumar, A., Sabharwal, Y., and Sen, S. A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. In Proc. Symposium on Foundations of Computer Science, 2004.
Lashkari, D. and Golland, P. Convex clustering with exemplar-based models. In Advances in Neural Information Processing Systems 20, 2007.
Laue, S. A hybrid algorithm for convex semidefinite optimization. In Proceedings of the International Conference on Machine Learning, 2012.
MacQueen, J. Some methods of classification and analysis of multivariate observations. In Proc. 5th Berkeley Symposium on Math., Stat., and Prob., pp. 281. 1967.
Mirsky, L. An Introduction to Linear Algebra. Oxford, 1955.
Mirsky, L. A trace inequality of John von Neumann. Monatsh. Math., 79(4):303–306, 1975.
Moitra, A. and Valiant, G. Settling the polynomial learnability of mixtures of Gaussians. In Proc. Symposium on Foundations of Computer Science, 2010.
Neal, R. and Hinton, G. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Jordan, M. (ed.), Learning in Graphical Models. Kluwer, 1998.
Ng, A., Jordan, M., and Weiss, Y. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, 2001.
Nowozin, S. and Bakir, G. A decoupled approach to exemplar-based unsupervised learning. In Proceedings of the International Conference on Machine Learning, 2008.
Peng, J. and Wei, Y. Approximating k-means-type clustering via semidefinite programming. SIAM J. on Optimization, 18:186–205, 2007.
Shi, J. and Malik, J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
Tsuda, K., Ratsch, G., and Warmuth, M. Matrix exponentiated gradient updates for on-line learning and Bregman projections. In Advances in Neural Information Processing Systems 17, 2004.
Wang, S. and Schuurmans, D. Learning continuous latent variable models with Bregman divergence. In International Conference on Algorithmic Learning Theory, 2003.
Xing, E. and Jordan, M. On semidefinite relaxation for normalized k-cut and connections to spectral clustering. Technical Report UCB/CSD-03-1265, EECS Department, University of California, Berkeley, 2003.
Xu, L. and Schuurmans, D. Unsupervised and semi-supervised multi-class support vector machines. In Proc. Conf. Association for the Advancement of Artificial Intelligence (AAAI), 2005.
Zass, R. and Shashua, A. A unifying approach to hard and probabilistic clustering. In Proc. Intl. Conf. Computer Vision, 2005.
Zha, H., Ding, C., Gu, M., He, X., and Simon, H. Spectral relaxation for k-means clustering. In Advances in Neural Information Processing Systems (NIPS), 2001.
Zhang, X., Yu, Y., and Schuurmans, D. Accelerated training for matrix-norm regularization: A boosting approach. In Advances in Neural Information Processing Systems 25, 2012.
-----0
Banerjee, O., El Ghaoui, L., & d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. JMLR.
Banerjee, O., El Ghaoui, L., d'Aspremont, A., & Natsoulis, G. (2006). Convex optimization techniques for fitting sparse Gaussian graphical models. ICML.
Cohen, A., Dahmen, W., & DeVore, R. (2009). Compressed sensing and best k-term approximation. J. Amer. Math. Soc.
Dempster, A. (1972). Covariance selection. Biometrics.
Duchi, J., Gould, S., & Koller, D. (2008). Projected subgradient methods for learning sparse Gaussians. UAI.
Friedman, J., Hastie, T., & Tibshirani, R. (2007). Sparse inverse covariance estimation with the graphical lasso. Biostatistics.
Guillot, D., Rajaratnam, B., Rolfs, B., Maleki, A., & Wong, I. (2012). Iterative thresholding algorithm for sparse inverse covariance estimation. NIPS.
Hsieh, C., Dhillon, I., Ravikumar, P., & Banerjee, A. (2012). A divide-and-conquer procedure for sparse inverse covariance estimation. NIPS.
Hsieh, C., Sustik, M., Dhillon, I., & Ravikumar, P. (2011). Sparse inverse covariance matrix estimation using quadratic approximation. NIPS.
Huang, J., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalized normal likelihood. Biometrika.
Johnson, C., Jalali, A., & Ravikumar, P. (2012). High-dimensional sparse inverse covariance estimation using greedy methods. AISTATS.
Johnstone, I. (2001). On the distribution of the largest principal component. The Annals of Statistics.
Lauritzen, S. (1996). Graphical models. Oxford Press.
Lim, Y. (2006). The matrix golden mean and its applications to Riccati matrix equations. SIAM Journal on Matrix Analysis and Applications.
Lu, Z. (2009). Smooth optimization approach for sparse covariance selection. SIAM Journal on Optimization.
Mazumder, R., & Hastie, T. (2012). Exact covariance thresholding into connected components for large-scale graphical lasso. JMLR.
Meinshausen, N., & Buhlmann, P. (2006). High dimensional graphs and variable selection with the lasso. The Annals of Statistics.
Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics.
Ng, A. (2004). Feature selection, L1 vs. L2 regularization, and rotational invariance. ICML.
Olsen, P., Oztoprak, F., Nocedal, J., & Rennie, S. (2012). Newton-like methods for sparse inverse covariance estimation. NIPS.
Ravikumar, P., Wainwright, M., Raskutti, G., & Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics.
Rothman, A., Bickel, P., Levina, E., & Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics.
Scheinberg, K., Ma, S., & Goldfarb, D. (2010). Sparse inverse covariance selection via alternating linearization methods. NIPS.
Scheinberg, K., & Rish, I. (2010). Learning sparse Gaussian Markov networks using a greedy coordinate ascent approach. ECML.
Schmidt, M., van den Berg, E., Friedlander, M., & Murphy, K. (2009). Optimizing costly functions with simple constraints: A limited-memory projected quasi-Newton algorithm. AISTATS.
Witten, D., & Tibshirani, R. (2009). Covariance-regularized regression and classification for high-dimensional problems. Journal of the Royal Statistical Society.
Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika.
Yun, S., Tseng, P., & Toh, K. (2011). A block coordinate gradient descent method for regularized convex separable optimization and covariance selection. Mathematical Programming.
Zhou, S., Rutimann, P., Xu, M., & Buhlmann, P. (2011). High-dimensional covariance estimation based on Gaussian graphical models. JMLR.
-----0
Mauricio Araya-Lopez, Vincent Thomas, Olivier Buffet, and Francois Charpillet. A closer look at MOMDPs. In Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on, volume 2, pages 197-204. IEEE, 2010.
R Bellman. On the theory of dynamic programming. PNAS, 38(8):716-719, 1952.
S Branson, P Perona, and S Belongie. Strong supervision from weak annotation: Interactive training of deformable part models. IEEE International Conference on Computer Vision, 2:1832-1839, 2011.
M.D. Buhmann. Radial basis functions: theory and implementations, volume 12. Cambridge University Press, 2003.
N J Butko and J R Movellan. Infomax control of eye movements. IEEE Transactions on Autonomous Mental Development, 2(2):91-107, 2010.
Timothy H. Chung and Joel W. Burdick. A decision-making framework for control strategies in probabilistic search. Intl. Conference on Robotics and Automation. ICRA, April 2007.
T. Darrell and A. Pentland. Active gesture recognition using partially observable Markov decision processes. In Pattern Recognition, 1996., Proceedings of the 13th International Conference on, volume 3, pages 984-988. IEEE, 1996.
L. Dorard, D. Glowacka, and J. Shawe-Taylor. Gaussian process modelling of dependencies in multi-armed bandit problems. In Int. Symp. Op. Res, 2009.
J M Findley. Global processing for saccadic eye movements. Vision Research, 1982.
J C Gittins. Bandit processes and dynamic allocation indices. J. Royal Stat. Soc., 41:148-177, 1979.
D. Golovin, A. Krause, and D. Ray. Near-optimal Bayesian active learning with noisy observations. arXiv preprint arXiv:1010.3091, 2010.
L Itti and P Baldi. Bayesian surprise attracts human attention. In Advances in Neural Information Processing Systems, Vol. 19, pages 1-8, Cambridge, MA, 2006. MIT Press.
L Itti and C Koch. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10-12):1489-1506, 2000.
S. Ji, R. Parr, and L. Carin. Nonmyopic multiaspect sensing with partially observable Markov decision processes. Signal Processing, IEEE Transactions on, 55(6):2720-2730, 2007.
R. Kaplow. Point-based POMDP solvers: Survey and comparative analysis. PhD thesis, McGill University, 2010.
C Koch and S Ullman. Shifts in selective visual attention: towards the underlying neural circuitry. Hum. Neurobiol., 1985.
C. Kwok and D. Fox. Reinforcement learning for sensing strategies. In Intelligent Robots and Systems, 2004 (IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on, volume 4, pages 3158-3163. IEEE, 2004.
M.G. Lagoudakis and R. Parr. Least-squares policy iteration. The Journal of Machine Learning Research, 4:1107-1149, 2003.
Tai Sing Lee and Stella Yu. An information-theoretic framework for understanding saccadic behaviors. In Advances in Neural Information Processing Systems, volume 12, Cambridge, MA, 2000. MIT Press.
K. Liu and Q. Zhao. Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. Information Theory, IEEE Transactions on, 56(11):5547-5567, 2010.
W.S. Lovejoy. Computationally feasible bounds for partially observed Markov decision processes. Operations Research, 39(1):162-175, 1991.
D Martin, C Fowlkes, D Tal, and J Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th International Conference on Computer Vision, volume 2, pages 416-423, 2001.
S Minut and S Mahadevan. A reinforcement learning model of selective visual attention. In Proceedings of the Fifth International Conference on Autonomous Agents, Montreal, 2001.
M. Naghshvar and T. Javidi. Active M-ary sequential hypothesis testing. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, pages 1623-1627. IEEE, 2010.
J Najemnik and W S Geisler. Optimal eye movement strategies in visual search. Nature, 434(7031):387-391, 2005.
Sylvie CW Ong, Shao Wei Png, David Hsu, and Wee Sun Lee. Planning under uncertainty for robotic tasks with mixed observability. The International Journal of Robotics Research, 29(8):1053-1068, 2010.
S. Pandey, D. Chakrabarti, and D. Agarwal. Multi-armed bandit problems with dependent arms. In Proceedings of the 24th international conference on Machine learning, pages 721-728. ACM, 2007.
J. Pineau, G. Gordon, and S. Thrun. Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27(1):335-380, 2006.
W.B. Powell. Approximate Dynamic Programming: Solving the curses of dimensionality, volume 703. Wiley-Interscience, 2007.
M Sridharan, J Wyatt, and R Dearden. Planning to see: A hierarchical approach to planning visual actions on a robot using POMDPs. Artificial Intelligence, pages 704-725, 2010.
R. Washburn and M. Schneider. Optimal policies for a class of restless multi-armed bandit scheduling problems with applications to sensor management. Journal of Advances in Information Fusion. v3, 2008.
P Whittle. Restless bandits: activity allocation in a changing world. J. App. Probability, 25A:287-298, 1988.
C K I Williams and C E Rasmussen. Gaussian processes for regression. In M. C. Mozer, D. S. Touretzky, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 514-520. MIT Press, Cambridge, MA, 1996.
A F Yarbus. Eye Movements and Vision. Plenum Press, New York, 1967.
G J Zelinsky, R P Rao, M M Hayhoe, and D H Ballard. Eye movements reveal the spatio-temporal dynamics of visual search. Psychol. Sci., 1997.
-----0
H. Abbey. An examination of the Reed-Frost theory of epidemics. Human Biology, 24(3):201-233, 1952.
Kiyan Ahmadizadeh, Bistra Dilkina, Carla P. Gomes, and Ashish Sabharwal. An empirical study of optimization for maximizing diffusion in networks. In Proceedings of the 16th International Conference on Principles and Practice of Constraint Programming, pages 514-521, 2010.
Roy M. Anderson and Robert M. May, editors. Infectious diseases of humans: dynamics and control. Oxford Press, 2002.
C.F. Daganzo. The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological, 28(4):269-287, 1994.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
Pedro Domingos and Matt Richardson. Mining the network value of customers. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57-66, 2001.
Robert Fourer, David M. Gay, and Brian W. Kernighan. AMPL: A Modeling Language for Mathematical Programming. Duxbury Press, November 2002.
Philip E. Gill, Walter Murray, and Michael A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Journal on Optimization, 12(4):979-1006, 2002.
Daniel Golovin, Andreas Krause, Beth Gardner, Sarah J. Converse, and Steve Morey. Dynamic resource allocation in conservation planning. In Proceedings of the 25th Conference on Artificial Intelligence, pages 1331-1336, 2011.
Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery From Data, 5(4):21, 2012.
M. Elizabeth Halloran, Ira Longini, and Claudio Struchiner. Binomial and stochastic transmission models. In Design and Analysis of Vaccine Studies, Statistics for Biology and Health, pages 63-84. Springer New York, 2010.
I. Hanski, editor. Metapopulation ecology. Oxford University Press, 1999.
David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. In ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137-146, 2003.
Akshat Kumar, Xiaojian Wu, and Shlomo Zilberstein. Lagrangian relaxation techniques for scalable spatial conservation planning. In AAAI Conference on Artificial Intelligence, pages 309-315, 2012.
Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. The dynamics of viral marketing. ACM Trans. Web, 1(1), May 2007.
Seth A. Myers and Jure Leskovec. On the convexity of latent social network inference. In Advances in Neural Information Processing Systems, pages 1741-1749, 2010.
Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. In ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '12, pages 211-222, 2012.
V. N. Nguyen, P. Kilby, and P. Lamb. Validation of the TRITRAM traffic network simulation model. In International Conference of ITS Australia, 1997.
Daniel R. Sheldon and Thomas G. Dietterich. Collective graphical models. In Advances in Neural Information Processing Systems, pages 1161-1169, 2011.
Daniel Sheldon, Bistra Dilkina, Adam Elmachtoub, Ryan Finseth, Ashish Sabharwal, Jon Conrad, Carla Gomes, David Shmoys, William Allen, Ole Amundsen, and William Vaughan. Maximizing the spread of cascades using network design. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, pages 517-526, 2010.
Sylvie Thiebaux, Peter Lamb, and Bella Robinson. Updating turn probabilities at intersections in response to traffic incidents. In International Conference of ITS Australia, 1999.
Liaoruo Wang, Stefano Ermon, and John E. Hopcroft. Feature-enhanced probabilistic models for diffusion network inference. In European conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD'12, pages 499-514, 2012.
-----1
[1] R. Bahar, E. Frohm, C. Gaona, G. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Algebraic decision diagrams and their applications. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pages 188-191, 1993.
[2] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-Specific Independence in Bayesian Networks. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, pages 115-123, 1996.
[3] M. Chavira and A. Darwiche. Compiling Bayesian Networks Using Variable Elimination. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence, pages 2443-2449, 2007.
[4] M. Chavira and A. Darwiche. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6-7):772-799, 2008.
[5] A. Culotta, A. McCallum, B. Selman, and A. Sabharwal. Sparse message passing algorithms for weighted maximum satisfiability. In New England Student Colloquium on Artificial Intelligence (NESCAI), 2007.
[6] A. Darwiche. Modeling and Reasoning with Bayesian Networks. Cambridge University Press, 2009.
[7] A. P. Dawid, U. Kjaerulff, and S. L. Lauritzen. Hybrid Propagation in Junction Trees. In Advances in Intelligent Computing, pages 85-97, 1994.
[8] G. Elidan and A. Globerson. The 2010 UAI Approximate Inference Challenge. Available online at: http://www.cs.huji.ac.il/project/UAI10/index.php, 2010.
[9] V. Gogate and R. Dechter. AND/OR Importance Sampling. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 212-219, 2008.
[10] V. Gogate and P. Domingos. Approximation by Quantization. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 247-255, 2011.
[11] V. Gogate and P. Domingos. Probabilistic Theorem Proving. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 256-265. AUAI Press, 2011.
[12] L. D. Hernandez and S. Moral. Mixing Exact and Importance Sampling Propagation Algorithms in Dependence Graphs. International Journal of Approximate Reasoning, 12(8):553-576, 1995.
[13] C. Huang and A. Darwiche. Inference in Belief Networks: A Procedural Guide. International Journal of Approximate Reasoning, 15(3):225-263, 1996.
[14] A. T. Ihler and D. A. McAllester. Particle Belief Propagation. Twelfth International Conference on Artificial Intelligence and Statistics, pages 256-263, 2009.
[15] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[16] D. Larkin and R. Dechter. Bayesian inference in presence of determinism. In Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
[17] S. L. Lauritzen and D. J. Spiegelhalter. Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B (Methodological), 50(2):157-224, 1988.
[18] D. Lowd and P. Domingos. Approximate inference by compilation to arithmetic circuits. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS), pages 1477-1485, 2010.
[19] R. Mateescu, K. Kask, V. Gogate, and R. Dechter. Iterative Join Graph Propagation algorithms. Journal of Artificial Intelligence Research, 37:279-328, 2010.
[20] T. P. Minka. Expectation Propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 362-369, 2001.
[21] T. P. Minka and Y. Qi. Tree-structured Approximations by Expectation Propagation. In Proceedings of the 17th Annual Conference on Neural Information Processing Systems (NIPS), 2003.
[22] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy Belief Propagation for Approximate Inference: An Empirical Study. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pages 467-475, 1999.
[23] M. A. Paskin. Sample Propagation. In Advances in Neural Information Processing Systems, pages 425-432, 2003.
[24] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, 1988.
[25] T. Sang, P. Beame, and H. Kautz. Solving Bayesian networks by weighted model counting. In Proceedings of the Twentieth National Conference on Artificial Intelligence, pages 475-482, 2005.
[26] S. Sanner and D. A. McAllester. Affine algebraic decision diagrams (AADDs) and their application to structured probabilistic inference. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, pages 1384-1390, 2005.
[27] S. Sanner, W. T. B. Uther, and K. V. Delgado. Approximate dynamic programming with affine ADDs. In Ninth International Conference on Autonomous Agents and Multiagent Systems, pages 1349-1356, 2010.
[28] A. Silberschatz, H. F. Korth, and S. Sudarshan. Database System Concepts, 4th Edition. McGraw-Hill Book Company, 2001.
[29] F. Somenzi. CUDD: CU Decision Diagram Package Release 2.2.0, 1998.
[30] R. St-Aubin, J. Hoey, and C. Boutilier. APRICODD: Approximate policy construction using decision diagrams. In Proceedings of Conference on Neural Information Processing Systems, pages 1089-1095, 2000.
[31] E. B. Sudderth, A. T. Ihler, W. T. Freeman, and A. S. Willsky. Nonparametric Belief Propagation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 605-612, 2003.
[32] T. Walsh. SAT v CSP. In Sixth International Conference on Principles and Practice of Constraint Programming, pages 441-456, 2000.
[33] M. Welling, T. P. Minka, and Y. W. Teh. Structured Region Graphs: Morphing EP into GBP. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, pages 609-614, 2005.
[34] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized Belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282-2312, 2005.
-----1
[1] James Aspnes, Kevin L. Chang, and Aleksandr Yampolskiy. Inoculation strategies for victims of viruses and the sum-of-squares partition problem. J. Comput. Syst. Sci., 72(6):1077-1093, 2006.
[2] Po-An Chen, Mary David, and David Kempe. Better vaccination strategies for better people. In ACM Conference on Electronic Commerce, pages 179-188, 2010.
[3] A. Fabrikant, C. H. Papadimitriou, and K. Talwar. The complexity of pure Nash equilibria. In STOC, pages 604-612, 2004.
[4] S. Hart and Y. Mansour. The communication complexity of uncoupled Nash equilibrium procedures. In STOC, pages 345-353, 2007.
[5] David Heckerman, Eric Horvitz, and Blackford Middleton. An approximate nonmyopic computation for value of information. In UAI, pages 135-141, 1991.
[6] R. A. Howard. Information value theory. IEEE Transactions on Systems Science and Cybernetics, 2:22-26, 1966.
[7] Michael Kearns and Luis E. Ortiz. Algorithms for interdependent security games. In Advances in Neural Information Processing Systems. MIT Press, 2004.
[8] Howard Kunreuther and Geoffrey Heal. Interdependent security. Journal of Risk and Uncertainty, 26(2-3):231-249, 2003.
[9] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1996.
[10] Igal Milchtaich. Congestion games with player-specific payoff functions. Games and Economic Behavior, 13(1):111-124, 1996.
[11] D. Monderer and L.S. Shapley. Potential games. Games and Economic Behavior, 14:124-143, 1996.
[12] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, California, 1988.
[13] Kim-Leng Poh and Eric Horvitz. A graph-theoretic analysis of information value. In UAI, pages 427-435, 1996.
[14] R.W. Rosenthal. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory, 2:65-67, 1973.
[15] H. Varian. Price discrimination and social welfare. American Economic Review, 75(4):870-875, 1985.
[16] H. Varian. Price discrimination. In R. Schmalensee and R. Willig, editors, Handbook of Industrial Organization: Volume I, pages 597-654. 1989.
[17] A. C. Yao. Some complexity questions related to distributive computing. In ACM Symposium on Theory of Computing, pages 209-213, 1979.
-----0
Abdi, Hervé. Partial least squares regression and projection on latent structure regression. Wiley Interdisciplinary Reviews: Computational Statistics, 2(1), 2010.
Allen, Genevera I. and Tibshirani, Robert. Inference with transposable data: modelling the effects of row and column correlations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2012.
Altun, Yasemin and Smola, Alexander J. Unifying divergence minimization and statistical inference via convex duality. In COLT, 2006.
Argyriou, Andreas, Evgeniou, Theodoros, and Pontil, Massimiliano. Multi-task feature learning. In NIPS. 2007.
Bishop, Christopher M. Bayesian PCA. In NIPS, 1998.
Bishop, C.M. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, 2006.
Borwein, J.M. and Zhu, Q.J. Techniques of Variational Analysis. CMS Books in Mathematics. Springer, 2005.
Brown, L.D. Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. IMS Lecture Notes-Monograph Series, Vol. 9. Institute of Mathematical Statistics, 1986.
Candes, Emmanuel J. and Recht, Benjamin. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
Cortes, Corinna and Mohri, Mehryar. AUC optimization vs. error rate minimization. In Advances in Neural Information Processing Systems. MIT Press, 2004.
Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.
Ganchev, Kuzman, Graça, João, Gillenwater, Jennifer, and Taskar, Ben. Posterior regularization for structured latent variable models. J. Mach. Learn. Res., 11:2001-2049, 2010.
Gelfand, Alan E., Smith, Adrian F. M., and Lee, Tai-Ming. Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. Journal of the American Statistical Association, 87(418):523-532, 1992.
Jaakkola, Tommi, Meila, Marina, and Jebara, Tony. Maximum entropy discrimination. In NIPS, 1999.
Kullback, Solomon. Information Theory and Statistics. Dover, 1959.
Li, Lu and Toh, Kim-Chuan. An inexact interior point method for ℓ1-regularized sparse covariance selection. Mathematical Programming Computation, 2(3):291-315, 2010.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12:2825-2830, 2011.
Pereira, Francisco, Mitchell, Tom, and Botvinick, Matthew. Machine learning classifiers and fMRI: A tutorial overview. NeuroImage, 45:S199-S209, 2009. Mathematics in Brain Imaging.
Poldrack, Russell J.A. Inferring mental states from neuroimaging data: From reverse inference to large-scale decoding. Neuron, 72(5):692-697, 2011.
Rai, Piyush and Daume III, Hal. Infinite predictor subspace models for multitask learning. In AISTATS, 2010.
Salakhutdinov, Ruslan and Mnih, Andriy. Probabilistic matrix factorization. In NIPS, volume 20, 2008.
Smola, A. J. and Kondor, I.R. Kernels and regularization on graphs. In COLT, 2003.
Stegle, Oliver, Lippert, Christoph, Mooij, Joris M., Lawrence, Neil D., and Borgwardt, Karsten M. Efficient inference in matrix-variate Gaussian models with observation noise. In NIPS, 2011.
Williams, Peter M. Bayesian conditionalisation and the principle of minimum information. The British Journal for the Philosophy of Science, 31(2):131-144, 1980.
Ye, Jieping and Xiong, Tao. SVM versus least squares SVM. In AISTATS, 2007.
Yuan, Ming, Ekici, Ali, Lu, Zhaosong, and Monteiro, Renato. Dimension reduction and coefficient estimation in multivariate linear regression. J. Roy. Stat. Soc. B, 69(3):329-346, 2007.
Zellner, Arnold. Optimal information processing and Bayes's theorem. The American Statistician, 42(4):278-280, 1988.
Zhang, Yi and Schneider, Jeff G. Learning multiple tasks with a sparse matrix-normal penalty. In NIPS, 2010.
Zhu, Jun. Max-margin nonparametric latent feature models for link prediction. In ICML, 2012.
Zhu, Jun, Ahmed, Amr, and Xing, Eric P. MedLDA: maximum margin supervised topic models for regression and classification. In ICML, 2009.
Zhu, Jun, Chen, Ning, and Xing, Eric P. Infinite Latent SVM for Classification and Multi-task Learning. In NIPS, 2011.
Zhu, Jun, Chen, Ning, and Xing, Eric P. Bayesian inference with posterior regularization and infinite latent support vector machines. CoRR, abs/1210.1766, 2012. Retrieved 02/01/2013.