Readings

Table key

[MHE] = Angrist, J. D., and J. S. Pischke. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, 2009. ISBN: 9780691120355.

TOPIC # READINGS
I

Regression Recap

[MHE] Chapters 1–2 and 3.1–3.2.
The first two chapters explain our experimentalist perspective on applied econometrics. Chapter 3 covers regression basics and more advanced topics related to regression and matching.

Limited Dependent Variables and Marginal Effects

[MHE] Section 3.4.2

Dale, S., and A. Krueger. "Estimating the Payoff to Attending a More Selective College: An Application of Selection on Observables and Unobservables." The Quarterly Journal of Economics 117, no. 4 (2002): 1491–527.

———. "Estimating the Return to College Selectivity Over the Career using Administrative Earnings Data." (PDF) The Journal of Human Resources, 2014. (NBER Working Paper no. 17159)

II

Matching

[MHE] Section 3.3.1.

Angrist, J. "Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants." Econometrica 66, no. 2 (1998): 249–88.

Abadie, A., and G. Imbens. "Large Sample Properties of Matching Estimators for Average Treatment Effects." Econometrica 74, no. 1 (2006): 235–67.

Imbens, G. "Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review." The Review of Economics and Statistics 86, no. 1 (2004): 4–29.

Training and the Propensity Score

[MHE] Sections 3.3.2–3.3.3.

Ashenfelter, O. "Estimating the Effect of Training Programs on Earnings." The Review of Economics and Statistics 60, no. 1 (1978): 47–57.

Ashenfelter, O., and D. Card. "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." The Review of Economics and Statistics 67, no. 4 (1985): 648–60.

LaLonde, R. "Evaluating the Econometric Evaluations of Training Programs with Experimental Data." The American Economic Review 76, no. 4 (1986): 604–20.

Heckman, J., and J. Hotz. "Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training." Journal of the American Statistical Association 84, no. 408 (1989): 862–74.

Rosenbaum, P. R., and D. B. Rubin. "Reducing Bias in Observational Studies Using Subclassification on the Propensity Score." Journal of the American Statistical Association 79, no. 387 (1984): 516–24.

Dehejia, R., and S. Wahba. "Causal Effects in Nonexperimental Studies: Re-evaluating the Evaluation of Training Programs." Journal of the American Statistical Association 94, no. 448 (1999): 1053–62.

Smith, J., and P. Todd. "Does Matching Overcome LaLonde's Critique of Nonexperimental Estimators?" Journal of Econometrics 125, no. 1–2 (2005): 305–53.

Kline, P. "Oaxaca-Blinder as a Reweighting Estimator." The American Economic Review 101, no. 3 (2011): 532–37.

Hahn, J. "On the Role of the Propensity Score in Efficient Estimation of Average Treatment Effects." Econometrica 66, no. 2 (1998): 315–31.

Angrist, J., and J. Hahn. "When to Control for Covariates? Panel Asymptotics for Estimates of Treatment Effects." The Review of Economics and Statistics 86, no. 1 (2004): 58–72.

Hirano, K., G. Imbens, et al. "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score." Econometrica 71, no. 4 (2003): 1161–89.

Abadie, A., and G. Imbens. "Matching on the Estimated Propensity Score." (PDF) Mimeo, NBER Working Paper no. 15301, 2012.

III

Part 1

2SLS with Constant Effects; The Wald Estimator, Grouped Data

[MHE] Section 4.1.

Angrist, J., and A. Krueger. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments." Journal of Economic Perspectives 15, no. 4 (2001): 69–85.

Angrist, J. "Grouped Data Estimation and Testing in Simple Labor Supply Models." Journal of Econometrics 47, no. 2–3 (1991): 243–66.

———. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records." The American Economic Review 80, no. 3 (1990): 313–36.

Two-sample IV and Related Estimators

[MHE] Section 4.3.

Angrist, J., and A. Krueger. "The Effect of Age at School Entry on Educational Attainment: An Application of Instrumental Variables with Moments from Two Samples." (PDF) Journal of the American Statistical Association 57, no. 412 (1992): 11.

———. "Split-sample Instrumental Variables Estimates of the Returns to Schooling." Journal of Business and Economic Statistics 13, no. 2 (1995): 225–35.

Inoue, Atsushi, and G. Solon. "Two-sample Instrumental Variables Estimators." The Review of Economics and Statistics 92, no. 3 (2010): 557–61.

IV Details

[MHE] Section 4.6.1: 2SLS Mistakes.

[MHE] Section 4.6.4: The Bias of 2SLS.

Angrist, J., G. Imbens, et al. "Jackknife Instrumental Variables Estimation." Journal of Applied Econometrics 14, no. 1 (1999): 57–67.

Flores-Lagunes, Alfonso. "Finite-sample Evidence of IV Estimators under Weak Instruments." Journal of Applied Econometrics 22, no. 3 (2007): 677–94.

Kolesar, M. "Estimation in an Instrumental Variables Model with Treatment Effect Heterogeneity." (PDF) Princeton Department of Economics, Mimeo, 2013.

III

Part 2

Instrumental Variables with Heterogeneous Potential Outcomes

[MHE] Section 4.4.

Imbens, G., and J. Angrist. "Identification and Estimation of Local Average Treatment Effects." Econometrica 62, no. 2 (1994): 467–75.

Angrist, J., G. Imbens, et al. "Identification of Causal Effects Using Instrumental Variables." Journal of the American Statistical Association 91, no. 434 (1996): 444–55.

Abadie, A. "Bootstrap Tests for Distributional Treatment Effects in Instrumental Variables Models." (PDF) Journal of the American Statistical Association 97, no. 457 (2002): 284–92.

———. "Semiparametric Instrumental Variable Estimation of Treatment Response Models." Journal of Econometrics 113, no. 2 (2003): 231–63.

Abadie, A., J. Angrist, et al. "Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings." Econometrica 70, no. 1 (2002): 91–117.

Angrist, J. "Instrumental Variables Methods in Experimental Criminological Research: What, Why, and How." Journal of Experimental Criminology 2, no. 1 (2005): 23–44.

Angrist, J., S. Cohodes, et al. "Stand and Deliver: Effects of Boston's Charter High Schools on College Preparation, Entry, and Choice." (PDF) The Journal of Labor Economics, 2015. (NBER Working Paper no. 19275) [Forthcoming]

Behaghel, L., B. Crepon, et al. "Robustness of the Encouragement Design in a Two-treatment Randomized Control Trial." (PDF) IZA DP No. 7447, 2013.

Models with Variable and Continuous Treatment Intensity

[MHE] Section 4.5.3.

Angrist, J., and G. Imbens. "Two-stage Least Squares Estimation of Average Causal Effects in Models With Variable Treatment Intensity." Journal of the American Statistical Association 90, no. 430 (1995): 431–42.

Angrist, J., and A. Krueger. "Does Compulsory School Attendance Affect Schooling and Earnings?" The Quarterly Journal of Economics 106, no. 4 (1991): 979–1014.

Card, D. "The Causal Effect of Education on Earnings." Chapter 30 in The Handbook of Labor Economics: Volume 3A. North Holland, 1999. ISBN: 9780444501875.

Angrist, J., G. Imbens, et al. "The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish." Review of Economic Studies 67, no. 3 (2000): 499–528.

Powers, D. E., and S. S. Swinton. "Effects of Self-study for Coachable Test Item Types." Journal of Educational Psychology 76, no. 2 (1984): 266–78.

IV

External Validity

Angrist, J. "Treatment Effect Heterogeneity in Theory and Practice." The Economic Journal 114, no. 494 (2004): C52–C83.

Oreopoulos, P. "Estimating Average and Local Average Treatment Effects of Education when Compulsory Schooling Laws Really Matter." The American Economic Review 96, no. 1 (2006): 152–75. (See also Oreopoulos's corrigendum posted on the AER web site.)

Angrist, J., V. Lavy, et al. "Multiple Experiments for the Causal Link Between the Quantity and Quality of Children." Journal of Labor Economics 28, no. 4 (2010): 773–823.

Angrist, J., and I. Fernandez-val. "ExtrapoLATE-ing: External Validity and Overidentification in the LATE Framework." In Advances in Econometrics Theory and Applications, Tenth World Congress. Vol. III. Cambridge University Press, 2013. ISBN: 9781107628861.

Testing Exclusion

Imbens, G., and D. Rubin. "Estimating Outcome Distributions for Compliers in Instrumental Variable Models." The Review of Economic Studies 64, no. 4 (1997): 555–74.

Kitagawa, T. "A Test for Instrument Validity." (PDF) UCL, Working Papers CWP34 / 14, 2014.

Huber, M. "Testing the Validity of the Sibling Sex Ratio Instrument." University of St. Gallen Department of Economics, Working Paper, 2013.

Chaisemartin, Clement de. "All you Need is LATE." (PDF) University of Warwick, Mimeo, 2012.

Evdokimov, Kirill, and David Lee. "Diagnostics for Exclusion Restrictions in Instrumental Variables Estimation." (PDF) Princeton Department of Economics, Mimeo, 2014.

Peer Effects

[MHE] Section 4.6.2.

Angrist, J. "The Perils of Peer Effects." (PDF) NBER Working Paper no. 19774, 2013.

Sacerdote, B. "Peer Effects with Random Assignment: Results for Dartmouth Roommates." The Quarterly Journal of Economics 116, no. 2 (2001): 681–704.

Townsend, R. "Risk and Insurance in Village India." Econometrica 62, no. 3 (1994): 539–91.

Angrist, J., and K. Lang. "Does School Integration Generate Peer Effects? Evidence from Boston's Metco Program." The American Economic Review 94, no. 5 (2004): 1613–34.

Kling, J., J. Liebman, et al. "Experimental Analysis of Neighborhood Effects." Econometrica 75, no. 1 (2007): 83–119.

Guryan, J., K. Kroft, et al. "Peer Effects in the Workplace: Evidence from Random Groupings in Professional Golf Tournaments." American Economic Journal: Applied Economics 1, no. 4 (2009): 34–68.

Duflo, E., P. Dupas, et al. "Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya." The American Economic Review 101, no. 5 (2011): 1739–74.

Crepon, B., Esther Duflo, et al. "Do Labor Market Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment." The Quarterly Journal of Economics 128, no. 2 (2013): 531–80.

V

Differences-in-differences

[MHE] Chapter 5.

Abadie, A., A. Diamond, et al. "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." Journal of the American Statistical Association 105, no. 490 (2010): 493–505.

Abadie, A. "Semiparametric Differences-in-Differences Estimators." The Review of Economic Studies 72, no. 1 (2005): 1–19.

Athey, S., and G. Imbens. "Identification and Inference in Nonlinear Difference-in-Difference Models." Econometrica 74, no. 2 (2006): 431–97.

Chaisemartin, C. de, and X. D'Haultfoeuille. "Fuzzy Changes in Changes." (PDF) The Paris School of Economics, Mimeo, 2012.

VI

Regression-discontinuity Designs

Basics

[MHE] Chapter 6.

Hahn, J., P. Todd, et al. "Identification and Estimation of Treatment Effects with a Regression-discontinuity Design." Econometrica 69, no. 1 (2001): 201–9.

Cook, T. "Waiting for Life to Arrive: A History of the Regression-discontinuity Design in Psychology, Statistics, and Economics." Journal of Econometrics 142, no. 2 (2008): 636–54.

Imbens, G., and T. Lemieux. "Regression Discontinuity Designs: A Guide to Practice." Journal of Econometrics 142, no. 2 (2008): 615–35.

Lee, D. "Randomized Experiments from Non-random Selection in U.S. House Elections." Journal of Econometrics 142, no. 2 (2008): 675–97.

Angrist, J., and V. Lavy. "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement." The Quarterly Journal of Economics 114, no. 2 (1999): 533–75.

Porter, J. "Estimation in the Regression Discontinuity Model." (PDF) University of Wisconsin Department of Economics, Mimeo, 2003.

Abdulkadiroglu, A., J. Angrist, et al. "The Elite Illusion: Achievement Effects at Boston and New York Exam Schools." Econometrica 82, no. 1 (2014): 137–96.

Regression Kinks

Card, D., D. Lee, et al. "Nonlinear Policy Rules and the Identification and Estimation of Causal Effects in a Generalized Regression Kink Design." (PDF) NBER Working Paper no. 18564, 2012.

Extrapolation

Angrist, J., and M. Rokkanen. "Wanna Get Away? RD Identification Away from the Cutoff." NBER Working Paper no. 18662, 2012.

Dong, Y., and A. Lewbel. "Identifying the Effect of Changing Policy Thresholds in Regression Discontinuity Models." (PDF) Boston College Department of Economics, Mimeo, 2014. (Revised)

Rokkanen, M. "Exam Schools, Ability, and the Effects of Affirmative Action: Latent Factor Extrapolation in the Regression Discontinuity Design." (PDF) MIT Department of Economics, Mimeo, 2013.

Wing, C., T. Cook, et al. "Strengthening the Regression Discontinuity Design Using Additional Design Elements: A Within-study Comparison." Journal of Policy Analysis and Management 32, no. 4 (2013): 853–77.

Nonparametrics

Ludwig, J., and D. Miller. "Does Head Start Improve Children's Life Chances? Evidence from a Regression Discontinuity Design." The Quarterly Journal of Economics 122, no. 1 (2007): 159–208.

Frandsen, B., M. Frölich, et al. "Quantile Treatment Effects in the Regression Discontinuity Design." Journal of Econometrics 168, no. 2 (2012): 382–95.

Imbens, G., and K. Kalyanaraman. "Optimal Bandwidth Choice for the Regression Discontinuity Estimator." (PDF) Review of Economic Studies 79, no. 3 (2012): 933–59.

Calonico, S., M. Cattaneo, et al. "Robust Nonparametric Confidence Intervals for Regression Discontinuity Designs." Econometrica 82, no. 6 (2014): 2295–326. 

Heaping

Almond, D., J. Doyle, et al. "Estimating the Marginal Returns to Medical Care: Evidence from At-risk Newborns." The Quarterly Journal of Economics 125, no. 2 (2010): 591–634.

Barreca, A., M. Guldi, et al. "Saving Babies? Revisiting the Effect of Very Low Birthweight Classification." The Quarterly Journal of Economics 126, no. 4 (2011): 2117–23.

Almond, Douglas, Joseph J. Doyle, Jr., Amanda E. Kowalski, et al. "Reply to Barreca, et al." (PDF) The Quarterly Journal of Economics 126, no. 4 (2011).

Dong, Y. "Regression Discontinuity Applications with Rounding Errors in the Running Variable." Journal of Applied Econometrics 30, no. 3 (2015).

VII

Review of Large-sample Theory

[MHE] Section 3.1.3.

Chamberlain, G. "Panel Data." Chapter 22 in Handbook of Econometrics. Vol. 2. North-Holland, 1987. ISBN: 9780444861863.

Finite-sample Issues

[MHE] Chapter 8.

Chesher, A., and I. Jewitt. "The Bias of a Heteroskedasticity-consistent Covariance Matrix Estimator." Econometrica 55, no. 5 (1987): 1217–22.

Moulton, Brent. "Random Group Effects and the Precision of Regression Estimates." Journal of Econometrics 32, no. 3 (1986): 385–97.

Bertrand, Marianne, Esther Duflo, et al. "How Much Should We Trust Differences-in-Differences Estimates?" The Quarterly Journal of Economics 119, no. 1 (2004): 249–75.

Hansen, C. "Asymptotic Properties of a Robust Variance Estimator for Panel Data When T is Large." Journal of Econometrics 141, no. 2 (2007): 597–620.

———. "Generalized Least Squares Inference in Panel and Multilevel Models with Serial Correlation and Fixed Effects." Journal of Econometrics 140, no. 2 (2007): 670–94.

Cameron, C., J. Gelbach, et al. "Bootstrap-based Improvements for Inference with Clustered Errors." The Review of Economics and Statistics 90, no. 3 (2008): 414–27.

Imbens, G., and M. Kolesar. "Robust Standard Errors in Small Samples: Some Practical Advice." (PDF) NBER Working Paper no. 18478, 2012.

Abadie, A., G. Imbens, et al. "Inference for Misspecified Models with Fixed Regressors." Journal of the American Statistical Association 109, no. 508 (2014): 1601–14.

Abadie, A., M. Chingos, et al. "Endogenous Stratification in Randomized Experiments." Harvard Mimeo, 2014.

VIII

Prediction with a Large Number of Covariates ("Big P")

Varian, Hal R. "Big Data: New Tricks for Econometrics." Journal of Economic Perspectives 28, no. 2 (2014): 3–28.
This reference gives a helicopter tour of various methods; we shall focus on the most useful ones.

Chernozhukov, V., and C. Hansen. "Econometrics of High-Dimensional Sparse Models." NBER Lectures and Video Materials. Accessed June 22, 2015. http://www.nber.org/econometrics_minicourse_2013/.

Belloni, Alexandre, Victor Chernozhukov, et al. "Inference for High-dimensional Sparse Econometric Models." Econometric Society World Congress, 2011, arXiv preprint arXiv:1201.0220.
Treats prediction and provides uniformly valid confidence interval construction in linear models with classical errors.

Additional References

Bickel, Peter J., Ya'acov Ritov, et al. "Simultaneous Analysis of Lasso and Dantzig Selector." The Annals of Statistics 37, no. 4 (2009): 1705–32.
This is a widely cited reference giving a powerful analysis of Lasso in the Gaussian regression setting.

Candes, Emmanuel, and Terence Tao. "The Dantzig Selector: Statistical Estimation when p is much Larger than n." The Annals of Statistics 35, no. 6 (2007): 2313–51.
This is a classical reference introducing the Dantzig selector, a variant of Lasso.

Leeb, Hannes, and Benedikt M. Pötscher. "Sparse Estimators and the Oracle Property, or the Return of Hodges' Estimator." Journal of Econometrics 142, no. 1 (2008): 201–11.
This is a quintessential reference that raises a serious criticism of "perfect model selection" and the implications derived from it. The methods we emphasize never rely on "perfect selection" for their validity.

Belloni, A., D. Chen, et al. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain." Econometrica 80, no. 6 (2012): 2369–429.
Prediction using Lasso and Post-Lasso with heteroscedastic, non-Gaussian data.

Belloni, A., V. Chernozhukov, et al. "Square-root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming." Biometrika 98, no. 4 (2011): 791–806.
Introduces a self-tuned Lasso.

Belloni, A., V. Chernozhukov, et al. "Pivotal Estimation via Square-root Lasso in Nonparametric Regression." The Annals of Statistics 42, no. 2 (2014): 757–88.
Extends self-tuned Lasso to heteroscedastic, non-Gaussian data.

Manresa, E. "Estimating the Social Structure of Interactions Using Panel Data." (PDF) MIT Sloan Working Paper, 2013.
This paper applies modern selection methods in a panel setting to estimate spillover effects across firms.

Brodie, J., I. Daubechies, et al. "Sparse and Stable Markowitz Portfolios." Proceedings of the National Academy of Sciences of the United States of America 106, no. 30 (2009): 12267–72.
This is an important financial application to a highly practical portfolio selection problem.

IX

Valid Confidence Intervals with a Large Number of Covariates ("Big P")

Chernozhukov, V., and C. Hansen. "Econometrics of High-Dimensional Sparse Models." NBER Lectures and Video Materials. Accessed June 22, 2015. http://www.nber.org/econometrics_minicourse_2013/.

Belloni, A., V. Chernozhukov, et al. "High-dimensional Methods and Inference on Structural and Treatment Effects." Journal of Economic Perspectives 28, no. 2 (2014): 29–50.
This is a light expository article focusing on the problem of building valid confidence bands after model selection.

Belloni, Alexandre, Victor Chernozhukov, et al. "Inference for High-dimensional Sparse Econometric Models." Econometric Society World Congress, 2011, arXiv preprint arXiv:1201.0220.
Treats prediction and provides uniformly valid confidence interval construction in linear models.

Leeb, Hannes, and Benedikt M. Pötscher. "Model Selection and Inference: Facts and Fiction." Econometric Theory 21, no. 1 (2005): 21–59.
This paper proves that naive post-model-selection and post-penalized estimators are not what they are claimed to be. We shall emphasize non-naive selection that guards against the caveats raised by Leeb and Pötscher.

Additional References

Belloni, A., D. Chen, et al. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain." Econometrica 80, no. 6 (2012): 2369–429.
Provides valid confidence intervals after a selection of many instruments.

Belloni, A., Victor Chernozhukov, et al. "Inference on Treatment Effects after Selection among High-dimensional Controls." The Review of Economic Studies 81, no. 2 (2014): 608–50. (ArXiv 2011).
Proposes a "double selection" approach that builds in orthogonality with respect to nuisance parameters (also known as double robustness). The approach yields confidence bands for a regression coefficient in a linear model with non-classical errors, and for the ATE and the ATE for the treated in a heterogeneous model.

Farrell, Max H. "Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations." Chicago Booth Working Paper, 2014.
Provides uniformly valid confidence intervals for the ATE with multivalued treatments, greatly extending Belloni, Chernozhukov, and Hansen (BCH) and also relying on orthogonality.

Kozbur, D. "Inference in Additively Separable Models with a High Dimensional Component." (PDF) Chicago Booth Working Paper, 2014.
Considers inference on average derivatives with regression having a high-dimensional nuisance component; inference relies on orthogonality.

Belloni, A., V. Chernozhukov, et al. "Pivotal Estimation via Square-root Lasso in Nonparametric Regression." The Annals of Statistics 42, no. 2 (2014): 757–88.
Confidence intervals on a low-dimensional parameter in moment condition models, with moments orthogonalized with respect to the nuisance high-dimensional parameters.

Belloni, A., V. Chernozhukov, et al. "Uniform Post Selection Inference for Least Absolute Deviation Regression and other Z-estimation Problems." Biometrika 102, no. 1 (2015): 77–94.
Confidence intervals on a very high-dimensional parameter in moment condition models, with moments for each parameter orthogonalized with respect to other high-dimensional parameters.

Belloni, A., V. Chernozhukov, et al. "Program Evaluation with High-dimensional Data." 2013, arXiv preprint arXiv:1311.2645.
This paper considers LATE, LQTE, and other similar effects in a modern program evaluation framework with many controls; functional response data are allowed.

X

Analysis with Large Sample Sizes ("Big N")

Varian, Hal R. "Big Data: New Tricks for Econometrics." Journal of Economic Perspectives 28, no. 2 (2014): 3–28.
The reference also gives an overview of dealing with big N.

Gentzkow, M., and J. Shapiro. "Nuts and Bolts: Computing with Large Data." NBER Lecture and Videos. Accessed June 22, 2015. http://www.nber.org/econometrics_minicourse_2013/
This reference describes the current state of the art in statistical computing with big-N data, giving rules of thumb for different values of N as well as an overview of the main principles.

Breen, Jeffrey. "R and Hadoop: Step-by-step Tutorials." Revolutions. Accessed June 22, 2015. http://blog.revolutionanalytics.com/2012/03/r-and-hadoop-step-by-step-tutorials.html.
This is one of the ways of dealing with very large N, with many TBs of data.