In this work, we develop a distributed least-square approximation (DLSA) method that can solve a large family of regression problems (e.g., linear regression, logistic regression, and Cox’s model) on a distributed system. By approximating each local objective function with a local quadratic form, we obtain a combined estimator as a weighted average of the local estimators. The resulting estimator is proved to be statistically as efficient as the global estimator, and it requires only one round of communication. We further conduct shrinkage estimation based on the DLSA estimator using an adaptive Lasso approach; the solution can be easily obtained by running the LARS algorithm on the master node. We show theoretically that the resulting estimator possesses the oracle property and is selection consistent when a newly designed distributed Bayesian information criterion is used. The finite-sample performance and computational efficiency are further illustrated by an extensive numerical study and an airline dataset that is 52 GB in size. The entire methodology has been implemented in Python for a de facto standard Spark system. On the Spark system, the proposed DLSA algorithm takes 26 min to obtain a logistic regression estimator, which is more efficient and memory friendly than conventional methods. Supplementary materials for this article are available online.
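The one-round weighted-average idea can be sketched for the simplest case, linear regression. This is a minimal illustration, not the authors' Spark implementation. It assumes that each worker returns its local estimate together with a matrix weight proportional to the inverse of that estimate's covariance (here the local Gram matrix `X.T @ X`), and that the noise variance is the same on every worker. The names `local_fit` and `combine`, and the simulated data, are hypothetical.

```python
import numpy as np

def local_fit(X, y):
    """Worker-side step: local OLS estimate plus its matrix weight.

    Assumption: the weight is the local Gram matrix, which is proportional
    to the inverse covariance of the local OLS estimator when the noise
    variance is common across workers.
    """
    gram = X.T @ X
    theta = np.linalg.solve(gram, X.T @ y)
    return theta, gram

def combine(local_results):
    """Master-side step: matrix-weighted average of the local estimates."""
    weight_sum = sum(w for _, w in local_results)
    weighted_theta = sum(w @ t for t, w in local_results)
    return np.linalg.solve(weight_sum, weighted_theta)

# Simulated data (hypothetical), split across three "workers".
rng = np.random.default_rng(0)
n, p, workers = 3000, 4, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(size=n)

parts = np.array_split(np.arange(n), workers)

# One round of communication: each worker sends (theta_k, weight_k) once.
local_results = [local_fit(X[idx], y[idx]) for idx in parts]
theta_combined = combine(local_results)

# Reference: the estimator computed on the full data in one place.
theta_global = np.linalg.solve(X.T @ X, X.T @ y)
```

In this linear case the combined estimate agrees with the full-data least-squares estimate up to floating-point error. For models such as logistic regression, the weight would presumably be built from the local Hessian or information matrix instead, and the agreement with the global estimator would hold only asymptotically.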
Background: Air pollutants, particularly fine particulate matter (PM2.5), have been associated with mental disorders such as depression. Clean air policy (CAP, i.e., a series of emission-control actions) has been shown to reduce the public health burden of air pollution. Few studies have examined the effects of CAP on mental health, particularly in low-income and middle-income countries (LMICs). We investigated the association between a stringent CAP and depressive symptoms among adults in China. Methods: We used three waves (2011, 2013 and 2015) of the China Health and Retirement Longitudinal Study (CHARLS), a prospective nationwide cohort of the middle-aged and older population in China. We assessed exposure to PM2.5 using a satellite-retrieved dataset. We implemented a difference-in-differences (DID) approach, under the quasi-experimental framework of the temporal contrast between 2011 (before the CAP) and 2015 (after the CAP), to evaluate the effect of CAP on depressive symptoms. The association was further explored using a mixed-effects model of the three waves. To aid interpretation, the estimated impact of PM2.5 was compared with that of aging, an established risk factor for depression. Findings: Our analysis included 15,954 participants. In the DID model, a 10-µg/m³ reduction in PM2.5 concentration was associated with a 4.14% (95% CI: 0.41–8.00%) decrement in the depressive score. The estimate was similar to that from the mixed-effects model (3.63% [95% CI: 2.00–5.27%]). We also found that the improvement in air quality during 2011–2015 offset the negative impact of 5 years of aging. Interpretation: The findings suggest that implementing CAP may improve the mental wellbeing of adults in China and other LMICs.
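The before/after contrast described in the Methods can be illustrated with a two-period first-difference regression, which is one common way to estimate a difference-in-differences effect when the exposure is continuous. This is a simplified sketch under stated assumptions, not the study's model: it uses no covariates, every number is simulated (hypothetical), and the log-linear link between the depressive score and PM2.5 is assumed here only so that the result can be read as a percent change.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical per-person change in PM2.5 (µg/m^3) between the two waves.
d_pm = rng.normal(loc=-10.0, scale=5.0, size=n)

# Made-up effect: change in log(score) per 1 µg/m^3 change in PM2.5.
true_beta = 0.004
d_log_score = 0.02 + true_beta * d_pm + rng.normal(scale=0.05, size=n)

# Differencing each person's two observations removes time-invariant
# individual effects; the intercept absorbs the common time trend.
design = np.column_stack([np.ones(n), d_pm])
coef, *_ = np.linalg.lstsq(design, d_log_score, rcond=None)
beta_hat = coef[1]

# Percent change in the score associated with a 10 µg/m^3 reduction.
pct_change_per_10 = (np.exp(beta_hat * -10.0) - 1.0) * 100.0
```

A negative `pct_change_per_10` corresponds to a lower depressive score after a reduction in PM2.5, which is the direction of the association reported in the Findings.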