Edoardo M Airoldi

Associate Professor of Statistics, Harvard University
Associate Faculty Member, The Broad Institute of MIT & Harvard
I lead the Harvard Laboratory for Applied Statistical Methodology

mailing address: Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138, USA
phone: (617) 496-8318  
fax: (617) 496-8057
email: airoldi AT fas DOT harvard DOT edu


bio.   I received a Ph.D. from Carnegie Mellon University in 2007, working at the intersection of statistical machine learning and computational social science with Stephen Fienberg and Kathleen Carley. My Ph.D. thesis explored modeling approaches and inference strategies for analyzing social and biological networks. Until December 2008, I was a postdoctoral fellow in the Lewis-Sigler Institute for Integrative Genomics and the Department of Computer Science at Princeton University, working with Olga Troyanskaya and David Botstein. We developed mechanistic models of regulation, leveraging high-throughput technology, to gain insights into aspects of cellular dynamics that are not directly measurable at the desired resolution, such as growth rate. I joined the Statistics Department at Harvard University in 2009. See my CV for more details.

research.   My research explores modeling, inferential, and other methodological issues that often arise in applied problems where network data (i.e., measurements on pairs of units, or tuples more generally) need to be considered, and standard statistical theory and methods are no longer adequate to support the goals of the analysis. More broadly, my research interests encompass statistical methodology and theory with application to molecular biology and computational social science, including
  1. Design and analysis of experiments in the presence of interference
  2. Design and evaluation of network sampling mechanisms, and inference from non-ignorable sampling mechanisms
  3. Geometry of inference in ill-posed inverse problems, including network tomography and contingency tables
  4. Theory and methods for the analysis of network data
  5. Modeling and inference in high-throughput biology, including sequencing and mass spectrometry
  6. Applications to computational social science and marketing
  7. Foundations of a theory of statistics and computing, including statistical computing strategies for massive data sets

Areas of technical interest include approximation theorems, inequalities, convex and combinatorial optimization, and geometry.

funding.   My work has been supported by an NSF Career Award, an ONR Young Investigator Award, a Sloan Research Fellowship, and grants from Google, Facebook, AT&T, the NSF, the NIH and the US Army.


  news   recent and upcoming talks
  • TBD. MIT (Stochastics & Statistics), September 25, 2015, Cambridge, MA.

  • TBD. Microsoft Research New England (Statistics & Data Science Symposium), June 12-13, 2015, Cambridge, MA.

  • TBD. University of Washington (Computer Science), June 2, 2015, Seattle, WA.

  • TBD. University of Washington (Statistics), June 1, 2015, Seattle, WA.

  • TBD. University of Montreal (Centre de Recherches Mathématiques), May 4 - 8, 2015, Montreal, Canada.

  • TBD. Harvard University (Center of Mathematical Sciences and Applications), April 30 - May 2, 2015, Cambridge, MA.

  • TBD. University of Chicago (Booth School of Business), April 2, 2015, Chicago, IL.

  • Designing optimal experiments in the presence of social interference. National Academy of Sciences (Sackler Symposium on Drawing Causal Inference from Big Data), March 26-27, 2015, Washington, DC.

  • TBD. University of Connecticut (Statistics), March 11, 2015, Storrs, CT.

  • Valid statistical analyses and reproducible science in the era of high-throughput biology. Harvard School of Public Health (PQG Short Course), March 10, 2015, Boston, MA.

  • Statistical and machine learning challenges in the analysis of large networks. UC Berkeley (IEOR), February 13, 2015, Berkeley, CA.

  • Statistical and machine learning challenges in the analysis of large networks. Northeastern University (Mathematics), February 10, 2015, Boston, MA.

  • Design and analysis of experiments in the presence of network interference. Princeton University (Political Science), December 5, 2014, Princeton, NJ.

  • Statistical and machine learning challenges in the analysis of large networks. Princeton University (Computer Science), December 2, 2014, Princeton, NJ.

  • Design and analysis of experiments in the presence of network interference. Yale University (Statistics), December 2, 2013, New Haven, CT.

  • Design and analysis of experiments with interfering units. Simons Institute for the Theory of Computing, November 18-21, 2013, UC Berkeley, CA.

  pre-prints
  • Geometric representations of distributions on hypergraphs. (pdf)

  • Bayesian inference from non-ignorable network sampling designs. (pdf)

  • Inference of network summary statistics through network denoising. (pdf)

  • Sharp total variation bounds for finitely exchangeable arrays. (pdf)

  • The geometry of 2x2 contingency tables. (java app) (source code)

  • Estimating cellular pathways from an ensemble of heterogeneous data sources. (pdf)

  • Variable stoichiometry among core ribosomal proteins. (pdf)

  selected publications (see my CV or Google Scholar for more publications and bibliographic details)

  interface between statistics and computing
  • Scalable estimation strategies based on stochastic approximations: Classical results and new insights. Statistics and Computing, 2015.

  • Implicit stochastic gradient methods for principled estimation with large data sets. (pdf) (a shorter version appeared at ICML 2014)
  theory and methods for network data analysis
  • A consistent total variation estimator for exchangeable graph models. (pdf) (a shorter version appeared at ICML 2014)

  • Stochastic blockmodel approximation of a graphon: Theory and consistent estimation. NIPS, 2013. (pdf)

  • Stochastic blockmodels with growing number of classes. Biometrika, 2012. (pdf)

  • Confidence sets for network structure. Statistical Analysis and Data Mining, 2011. (pdf) (a shorter version appeared at NIPS 2011)

  • Graphlets decomposition of a weighted network. Journal of Machine Learning Research, W&CP, 2011. (pdf) (MSR best student paper award, NESS 2012)

  • Network sampling and classification: An investigation of network model representations. Decision Support Systems, 2011. (pdf)

  • A survey of statistical network models. Foundations and Trends in Machine Learning, 2010. (pdf)

  • Mixed-membership stochastic blockmodels. Journal of Machine Learning Research, 2008. (pdf) (r code) (fast code) (John Van Ryzin award, 2006)
  geometry and inference in ill-posed inverse problems
  • Estimating latent processes on a network from indirect measurements. Journal of the American Statistical Association, 2013. (pdf) (supp) (r code) (IBM best student paper award, NESS 2011)

  • Polytope samplers for inference in ill-posed inverse problems. Journal of Machine Learning Research, W&CP, 2011. (pdf)

  • Tree preserving embedding. Proceedings of the National Academy of Sciences, 2011. (pdf) (r code) (a shorter version appeared at ICML 2011)
  modeling and inference in high-throughput biology
  • Estimating a structured covariance matrix from multi-lab measurements in high-throughput biology. Journal of the American Statistical Association, in press. (IBM best student paper award, NESS 2013)

  • Generalized species sampling priors with latent beta reinforcements. Journal of the American Statistical Association, 2014. (pdf)

  • Multi-way blockmodels for analyzing coordinated high-dimensional responses. Annals of Applied Statistics, 2013. (pdf) (supp)

  • Analysis and design of RNA sequencing experiments for identifying mRNA isoform regulation. Nature Methods, 2010. (pdf) (supp) (code)

  • Ranking relations using analogies in biological and information networks. Annals of Applied Statistics, 2010. (pdf) (code)

  • Predicting cellular growth from gene expression signatures. PLoS Computational Biology, 2009. (pdf) (code & data) (a shorter version appeared at NIPS 2008)

  • Getting started in probabilistic graphical models. PLoS Computational Biology, 2007. (pdf)
  applications in molecular biology
  • Musashi proteins are post-transcriptional regulators of the epithelial-luminal cell state. eLife, 2014. (pdf) (editor's choice in Science)

  • Quantifying condition-dependent intracellular protein levels enables high-precision fitness estimates. PLoS One, 2013. (pdf)

  • A conserved cell growth cycle can account for the environmental stress responses of divergent eukaryotes. Molecular Biology of the Cell, 2012. (pdf)

  • Systems-level dynamic analyses of fate change in murine embryonic stem cells. Nature, 2009. (pdf) (supp) (F1000) (news & views, Nat BT) (editor's choice, Sci Sig)

  • Coordination of growth rate, cell cycle, stress response and metabolic activity in yeast. Molecular Biology of the Cell, 2008. (pdf) (code & data)
  modeling and inference in computational social science
  • Causal inference for ordinal outcomes. (pdf)

  • A model of text for experimentation in the social sciences. (pdf) (a shorter version appeared at NIPS 2013)

  • Robust summaries of topical content with word frequency and exclusivity. (pdf) (a shorter version appeared at ICML 2012)
  applications in computational social science
  • A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, in press.

  • Discussion of Hennig and Liao 'How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification'. Journal of the Royal Statistical Society, Series C, 2013. (pdf) (article)

  • Reconceptualizing the classification of PNAS articles. Proceedings of the National Academy of Sciences, 2010. (pdf) (editorial feature)

  • Whose ideas? Whose words? Authorship of the Ronald Reagan radio addresses. Political Science & Politics, 2007. (pdf) (op-ed by Skinner & Rice)

  • Who wrote Ronald Reagan's radio addresses? Bayesian Analysis, 2006. (pdf) (tr with detailed predictions) (notes on Negative Binomial)
  theses
  • Bayesian mixed-membership models of complex and evolving networks. Doctoral dissertation, 2007. (Savage award honorable mention, 2007)

  • The theory of weak convergence of probability measures and its applications in statistics. Undergraduate thesis, 1999. (Gold medal for best graduates, 1999)