Edoardo M Airoldi

Associate Professor of Statistics, Harvard University
Associate Faculty Member, The Broad Institute of MIT & Harvard
At Harvard, I lead a research group in Applied Statistical Methodology & Data Science
  (formerly known as the Harvard Laboratory for Applied Statistical Methodology & Data Science, 2009-2017)

mailing address: Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138, USA
phone: (617) 496-8318  
fax: (617) 496-8057
email: airoldi AT fas DOT harvard DOT edu

bio

I received a Ph.D. from Carnegie Mellon University in 2007, working at the intersection of statistical machine learning and computational social science with Stephen Fienberg and Kathleen Carley. My Ph.D. thesis explored modeling approaches and inference strategies for analyzing social and biological networks. Until December 2008, I was a postdoctoral fellow in the Lewis-Sigler Institute for Integrative Genomics and the Department of Computer Science at Princeton University, working with Olga Troyanskaya and David Botstein. We developed mechanistic models of regulation, leveraging high-throughput technology, to gain insights into aspects of cellular dynamics that are not directly measurable at the desired resolution, such as growth rate. I joined the Statistics Department at Harvard University in 2009. See my CV for more details.

research

My research explores modeling, inferential, and other methodological issues that often arise in applied problems where network data (i.e., measurements on pairs of units, or tuples more generally) need to be considered, and standard statistical theory and methods are no longer adequate to support the goals of the analysis. More broadly, my research interests encompass statistical methodology and theory with application to molecular biology and computational social science, including:

Theory and methods for the analysis of network data

Design and analysis of experiments in the presence of interference

Inference from, and design and evaluation of, non-ignorable network sampling mechanisms

Geometry of inference in ill-posed inverse problems, including network tomography and contingency tables

Modeling and inference of regulation and signaling dynamics, including mass spectrometry and next-generation sequencing

Approximate inference strategies for data analysis at scale

Areas of technical interest include approximation theorems, inequalities, convex and combinatorial optimization, and geometry.

funding

My work has been supported by an NSF Career Award, an ONR Young Investigator Award, a Sloan Research Fellowship, a Shutzer Fellowship, grants from the National Science Foundation, the National Institutes of Health and the US Army, and gifts from the Broad Institute, Google, Microsoft Research, LinkedIn, Facebook, and AT&T.

  news
  • Postdoctoral positions available. (postdoc flyer pdf)

  • The Handbook of Mixed Membership Models (co-edited with David Blei, Elena Erosheva and Stephen Fienberg) is out. (www)

  recent and upcoming talks
  • Two model-assisted strategies for designing experiments on networks. UC Berkeley (Econometrics & Statistics of Networks), November 4-5, 2016, Berkeley, CA.

  • Optimal design of experiments on social networks. MIT (Conference on Digital Experimentation), October 14-15, 2016, Cambridge, MA.

  • Discussant. University of Chicago (Machine Learning & Economics), September 23-24, 2016, Chicago, IL.

  pre-prints
  • Optimal model-assisted design of experiments for network correlated outcomes suggests new notions of network balance. (pdf)

  • Valid inference from non-ignorable network sampling designs. (pdf)

  • Implicit stochastic approximation. (pdf)

  • Non-standard conditionally specified models for non-ignorable missing data. (pdf)

  • Causal inference for ordinal outcomes. (pdf)

  • The geometry of 2×2 contingency tables. (java app) (source code)

  selected publications (see my CV or Google Scholar for more publications and bibliographic details)

  statistical inference strategies for massive data sets
  • Asymptotic and finite-sample properties of estimators based on stochastic gradients. Annals of Statistics, 2016. (pdf)

  • Scalable estimation strategies based on stochastic approximations: Classical results and new insights. Statistics and Computing, 2015. (pdf)
  theory and methods for network data analysis
  • Geometric representations of distributions on hypergraphs. Journal of the American Statistical Association, 2016. (pdf)

  • Nonparametric estimation and testing of exchangeable graph models. Journal of Machine Learning Research, W&CP, 2014. (pdf)

  • A consistent total variation estimator for exchangeable graph models. (pdf) (a shorter version appeared at ICML 2014)

  • Stochastic blockmodel approximation of a graphon: Theory and consistent estimation. NIPS, 2013. (pdf)

  • Stochastic blockmodels with growing number of classes. Biometrika, 2012. (pdf)

  • Confidence sets for network structure. Statistical Analysis and Data Mining, 2011. (pdf) (a shorter version appeared at NIPS 2011)

  • Graphlets decomposition of a weighted network. Journal of Machine Learning Research, W&CP (AISTATS), 2011. (pdf) (MSR best student paper award, NESS 2012)

  • A survey of statistical network models. Foundations and Trends in Machine Learning, 2010. (pdf)

  • Mixed-membership stochastic blockmodels. Journal of Machine Learning Research, 2008. (pdf) (r code) (fast code) (John Van Ryzin award, 2006)
  geometry and inference in ill-posed inverse problems
  • Estimating latent processes on a network from indirect measurements. Journal of the American Statistical Association, 2013. (pdf) (supp) (r code) (IBM best student paper award, NESS 2011)

  • Polytope samplers for inference in ill-posed inverse problems. Journal of Machine Learning Research, W&CP (AISTATS), 2011. (pdf)

  • Tree preserving embedding. Proceedings of the National Academy of Sciences, 2011. (pdf) (r code) (a shorter version appeared at ICML 2011)
  modeling and inference in high-throughput biology
  • Template-based models for genome-wide analysis of next-generation sequencing data at base-pair resolution. Journal of the American Statistical Association, 2016. (pdf)

  • Estimating cellular pathways from an ensemble of heterogeneous data sources. Annals of Applied Statistics, 2015. (pdf)

  • Accounting for experimental noise reveals that mRNA levels, amplified by post-transcriptional processes, largely determine steady-state protein levels in yeast. PLoS Genetics, 2015. (pdf) (data)

  • Estimating a structured covariance matrix from multi-lab measurements in high-throughput biology. Journal of the American Statistical Association, 2015. (pdf) (IBM best student paper award, NESS 2013) (W. J. Youden Award in Interlaboratory Testing, ASA 2015)

  • Generalized species sampling priors with latent beta reinforcements. Journal of the American Statistical Association, 2014. (pdf)

  • Multi-way blockmodels for analyzing coordinated high-dimensional responses. Annals of Applied Statistics, 2013. (pdf) (supp)

  • Analysis and design of RNA sequencing experiments for identifying mRNA isoform regulation. Nature Methods, 2010. (pdf) (supp) (code)

  • Ranking relations using analogies in biological and information networks. Annals of Applied Statistics, 2010. (pdf) (code)

  • Predicting cellular growth from gene expression signatures. PLoS Computational Biology, 2009. (pdf) (code & data) (a shorter version appeared at NIPS 2008)
  applications in molecular biology
  • Defining the essential function of yeast Hsf1 reveals a compact transcriptional program for maintaining eukaryotic proteostasis. Molecular Cell, 2016. (pdf) (preview)

  • Reversible, specific, active aggregates of endogenous proteins assemble upon heat stress. Cell, 2015. (pdf)

  • Differential stoichiometry among core ribosomal proteins. Cell Reports, 2015. (pdf)

  • Musashi proteins are post-transcriptional regulators of the epithelial-luminal cell state. eLife, 2014. (pdf) (editor’s choice in Science)

  • Systems-level dynamic analyses of fate change in murine embryonic stem cells. Nature, 2009. (pdf) (supp) (F1000) (news & views, Nat BT) (editor’s choice, Sci Sig)

  • Coordination of growth rate, cell cycle, stress response and metabolic activity in yeast. Molecular Biology of the Cell, 2008. (pdf) (code & data)
  applied methodology in computational social science
  • A regularization scheme on word occurrence rates that improves estimation and interpretation of topical content (with discussion). Journal of the American Statistical Association, 2016. (pdf) (Outstanding Statistical Application Award, ASA 2016)

  • A model of text for experimentation in the social sciences. Journal of the American Statistical Association, 2016. (pdf)

  • Predicting traffic volumes and estimating the effects of shocks in massive transportation systems. Proceedings of the National Academy of Sciences, 2015.

  • A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, 2015.

  • Reconceptualizing the classification of PNAS articles. Proceedings of the National Academy of Sciences, 2010. (pdf) (editorial feature)

  • Whose ideas? Whose words? Authorship of the Ronald Reagan radio addresses. Political Science & Politics, 2007. (pdf) (op-ed by Skinner & Rice)

  • Who wrote Ronald Reagan’s radio addresses? Bayesian Analysis, 2006. (pdf) (tr with detailed predictions) (notes on Negative Binomial)
  theses
  • Bayesian mixed-membership models of complex and evolving networks. Doctoral dissertation, 2007. (Savage award honorable mention, 2007)

  • The theory of weak convergence of probability measures and its applications in statistics. Undergraduate thesis, 1999. (Gold medal for best graduates, 1999)