Research Output by Year: 2021

2021
Janeway MG, Zhao X, Rosenthaler M, Zuo Y, Balasubramaniyan K, Poulson M, Neufeld M, Siracuse JJ, Takahashi CE, Allee L, et al. Clinical Diagnostic Phenotypes in Hospitalizations Due to Self-Inflicted Firearm Injury. Journal of Affective Disorders. 2021;278:172–180.
Hospitalized self-inflicted firearm injuries have not been extensively studied, particularly regarding clinical diagnoses at the index admission. The objective of this study was to discover the diagnostic phenotypes (DPs), or clusters, of hospitalized self-inflicted firearm injuries. Using US Nationwide Inpatient Sample data from 1993 to 2014, we identified self-inflicted firearm injuries among those ≥18 years of age with International Classification of Diseases, Ninth Revision (ICD-9) codes. The 25 most frequent diagnostic codes were used to compute a dissimilarity matrix and the optimal number of clusters, and we used hierarchical clustering to identify the main DPs. The overall cohort included 14,072 hospitalizations; self-inflicted firearm injuries occurred mainly among patients aged 16 to 45 years, black patients, and patients with co-occurring tobacco use, alcohol use, and mental illness. Of the three identified DPs, DP1 was the largest (n=10,110) and shared the most common diagnoses of the overall cohort, including major depressive disorder (27.7%), hypertension (16.8%), acute posthemorrhagic anemia (16.7%), tobacco use (15.7%), and alcohol use (12.6%). DP2 (n=3,725) was not characterized by any of the top 25 ICD-9 diagnosis codes and included children and peripartum women. DP3, the smallest phenotype (n=237), had a high prevalence of depression, similar to DP1, and was defined by fewer fatal injuries of the chest and abdomen. Hospitalizations due to self-inflicted firearm injuries thus showed three distinct diagnostic phenotypes. Further research is needed to determine how DPs can be used to tailor clinical care and prevention efforts.
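The abstract names the pipeline (top 25 diagnosis codes, a dissimilarity matrix, hierarchical clustering) but not the distance or the linkage. The Python sketch below is an illustration under assumptions, not the authors' code: it assumes a Jaccard distance between each admission's set of top codes and average-linkage agglomerative clustering stopped at a fixed number of clusters. The diagnosis labels, the toy admissions, and the helper names are hypothetical, and choosing the optimal number of clusters is not shown.

```python
from collections import Counter
from itertools import combinations

def restrict_to_top_codes(admissions, k):
    """Keep only the k most frequent codes across all admissions (the study used k = 25)."""
    counts = Counter(code for adm in admissions for code in set(adm))
    top = {code for code, _ in counts.most_common(k)}
    return [set(adm) & top for adm in admissions]

def jaccard_distance(a, b):
    """Assumed dissimilarity: 1 - |A & B| / |A | B| between two code sets."""
    union = a | b
    if not union:
        return 0.0  # two admissions with none of the top codes count as identical
    return 1.0 - len(a & b) / len(union)

def average_linkage(records, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the smallest mean pairwise distance."""
    n = len(records)
    # Pairwise dissimilarity matrix between admissions.
    dist = [[jaccard_distance(records[i], records[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
            d /= len(clusters[p]) * len(clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # p < q, so deleting q keeps p's index valid
        del clusters[q]
    return clusters

# Hypothetical admissions, each a list of diagnosis labels (not real ICD-9 data).
admissions = [
    ["depression", "tobacco"],
    ["depression", "tobacco", "alcohol"],
    ["hypertension", "anemia"],
    ["hypertension", "anemia", "tobacco"],
    ["rare_code"],
    ["other_rare_code"],
]
records = restrict_to_top_codes(admissions, k=4)
clusters = average_linkage(records, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4, 5]]
```

In this toy run the two admissions carrying none of the retained codes form their own cluster, loosely mirroring how DP2 was not characterized by any of the top 25 codes.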
Kang Y, Spiliotis E, Petropoulos F, Athiniotis N, Li F, Assimakopoulos V. Déjà vu: A Data-Centric Forecasting Approach through Time Series Cross-Similarity. Journal of Business Research. 2021;132:719–731.
Accurate forecasts are vital for supporting the decisions of modern companies. Forecasters typically select the most appropriate statistical model for each time series. However, statistical models usually presume some data generation process while making strong assumptions about the errors. In this paper, we present a novel data-centric approach, ‘forecasting with cross-similarity’, which tackles model uncertainty in a model-free manner. Existing similarity-based methods focus on identifying similar patterns within the series, i.e., ‘self-similarity’. In contrast, we propose searching for similar patterns in a reference set, i.e., ‘cross-similarity’. Instead of extrapolating the target series, we aggregate the future paths of the similar series to obtain its forecasts. Building on the cross-learning concept, our approach allows similarity-based forecasting to be applied to series of limited length. We evaluate the approach on a rich collection of real data and show that it yields competitive accuracy for both point forecasts and prediction intervals.
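A minimal Python sketch of the cross-similarity idea, under stated assumptions rather than the paper's exact procedure: the most recent n + h values of each reference series supply a matching window and an h-step future path, windows are scaled by their last value, similarity is plain Euclidean distance, and the k nearest future paths are combined with a per-horizon median. The paper's preprocessing, distance measures, and prediction-interval construction may differ; the series and function names are made up for illustration.

```python
import math
import statistics

def scale(window):
    """Assumed normalization: divide by the last value (requires a nonzero, e.g. positive, series)."""
    last = window[-1]
    return [x / last for x in window]

def cross_similarity_forecast(target, reference, h, k):
    """Forecast h >= 1 steps of `target` from the future paths of its k most similar reference series."""
    n = len(target)
    target_scaled = scale(target)
    candidates = []
    for ref in reference:
        if len(ref) < n + h:
            continue  # too short to supply a matching window plus h future values
        window, future = ref[-(n + h):-h], ref[-h:]
        distance = math.dist(target_scaled, scale(window))  # Euclidean distance (an assumption)
        # Express the reference's future path relative to the end of its matching window.
        candidates.append((distance, [f / window[-1] for f in future]))
    candidates.sort(key=lambda c: c[0])
    nearest = [path for _, path in candidates[:k]]
    # Aggregate the k paths with the median at each horizon, then rescale to the target's level.
    return [target[-1] * statistics.median(path[t] for path in nearest) for t in range(h)]

# Hypothetical reference set; the last series is too short and is skipped.
reference = [
    [5, 6, 7, 8, 9],
    [20, 24, 28, 30, 36],
    [10, 12, 14, 18, 18],
    [100, 50, 10, 1, 1],
    [1, 2],
]
forecast = cross_similarity_forecast([10, 12, 14], reference, h=2, k=3)
print(forecast)  # approximately [16.0, 18.0]
```

No model is fitted to the target itself, which is the "model-free" aspect the abstract describes; quantiles of the same k paths could be used in place of the median to sketch prediction intervals.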
Zhu X, Li F, Wang H. Least-Square Approximation for a Distributed System. Journal of Computational and Graphical Statistics. 2021;30:1004–1018.
In this work, we develop a distributed least-square approximation (DLSA) method that can solve a large family of regression problems (e.g., linear regression, logistic regression, and Cox’s model) on a distributed system. By approximating the local objective function with a local quadratic form, we obtain a combined estimator as a weighted average of local estimators. The resulting estimator is proved to be as statistically efficient as the global estimator, and it requires only one round of communication. We further conduct shrinkage estimation based on the DLSA estimator using an adaptive Lasso approach; the solution can be easily obtained with the LARS algorithm on the master node. We show theoretically that the resulting estimator possesses the oracle property and is selection consistent when a newly designed distributed Bayesian information criterion is used. The finite-sample performance and computational efficiency are further illustrated by an extensive numerical study and a 52 GB airline dataset. The entire methodology has been implemented in Python for Spark, a de facto standard distributed computing system. On Spark, the proposed DLSA algorithm obtains a logistic regression estimator in 26 min, which is more efficient and memory-friendly than conventional methods. Supplementary materials for this article are available online.
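For linear regression the local quadratic approximation is exact, which makes it a convenient check of the one-round combination step. The Python sketch below assumes each worker returns its OLS estimate and its Gram matrix X_k^T X_k (the local Hessian up to a constant factor), and the master node forms the Hessian-weighted average (sum of H_k)^-1 (sum of H_k theta_k). Workers are simulated as in-memory partitions rather than Spark executors, the helper names and data are hypothetical, and the adaptive-Lasso/LARS shrinkage step is not shown.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

def gram(x):
    """X^T X for a design matrix given as a list of rows."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]

def xty(x, y):
    """X^T y for a design matrix given as a list of rows."""
    p = len(x[0])
    return [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]

def local_fit(x, y):
    """Worker step (hypothetical): local OLS estimate plus its weight matrix H_k = X_k^T X_k."""
    h = gram(x)
    return solve(h, xty(x, y)), h

def dlsa_combine(local_results):
    """Master step: Hessian-weighted average of the local estimates after one round of communication."""
    p = len(local_results[0][0])
    h_sum = [[0.0] * p for _ in range(p)]
    rhs = [0.0] * p
    for theta, h in local_results:
        for i in range(p):
            rhs[i] += sum(h[i][j] * theta[j] for j in range(p))
            for j in range(p):
                h_sum[i][j] += h[i][j]
    return solve(h_sum, rhs)

# Hypothetical data: y is roughly 2 + 3t; each row of x has an intercept column.
y = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0, 19.9, 23.2, 25.8]
x = [[1.0, float(t)] for t in range(9)]

# Simulate three workers, each holding three consecutive rows.
partitions = [(x[i:i + 3], y[i:i + 3]) for i in range(0, 9, 3)]
local_results = [local_fit(xk, yk) for xk, yk in partitions]  # one message per worker
combined = dlsa_combine(local_results)
global_estimate, _ = local_fit(x, y)  # full-data OLS, for comparison only
print(combined, global_estimate)  # equal up to floating-point rounding
```

Because H_k theta_k = X_k^T y_k for OLS, the combined estimate matches the full-data estimate exactly here; for logistic regression or Cox's model the abstract's claim is statistical efficiency, not exact equality.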