Research Outputs by Year: 2024

2024
Gao Y, Pan R, Li F, Zhang R, Wang H. Grid Point Approximation for Distributed Nonparametric Smoothing and Prediction. Journal of Computational and Graphical Statistics [Internet]. 2024:1–29.
Kernel smoothing is a widely used nonparametric method in modern statistical analysis. Efficiently conducting kernel smoothing for a massive dataset on a distributed system is a problem of great importance. In this work, we find that the widely used one-shot type estimator is highly inefficient for prediction purposes. To address this, we propose a novel grid point approximation (GPA) method, which has the following advantages. First, the resulting GPA estimator is as statistically efficient as the global estimator under mild conditions. Second, it requires no communication and is extremely efficient in terms of computation for prediction. Third, it is applicable to the case where the data are not randomly distributed across different machines. To select a suitable bandwidth, two novel bandwidth selectors are further developed and theoretically supported. Extensive numerical studies are conducted to corroborate our theoretical findings. Two real data examples are also provided to demonstrate the usefulness of our GPA method.
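As a rough illustration of the setting this abstract describes, the sketch below contrasts a global Nadaraya-Watson kernel smoother (all data on one machine) with a simple one-shot estimator that averages the per-machine estimates. It is a minimal sketch under stated assumptions: a Gaussian kernel, a fixed bandwidth `h`, synthetic data `y = sin(2πx) + noise`, and in-memory lists standing in for machines. The function names `nadaraya_watson` and `one_shot` are hypothetical, and the code does not implement the authors' GPA estimator or their bandwidth selectors.

```python
import math
import random


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the smoothing kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0: float, xs: list[float], ys: list[float], h: float) -> float:
    """Global estimator: kernel-weighted average of ys at x0 with bandwidth h."""
    weights = [gaussian_kernel((x - x0) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # No observation carries non-negligible weight near x0.
        return float("nan")
    return sum(w * y for w, y in zip(weights, ys)) / total


def one_shot(x0: float, shards: list[tuple[list[float], list[float]]], h: float) -> float:
    """Hypothetical one-shot estimator: average of per-machine NW estimates."""
    local = [nadaraya_watson(x0, xs, ys, h) for xs, ys in shards]
    return sum(local) / len(local)


if __name__ == "__main__":
    random.seed(0)
    n, n_machines, h = 2000, 4, 0.05  # assumed sizes and bandwidth
    xs = [random.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.3) for x in xs]
    # Contiguous slices of a randomly ordered sample stand in for machines.
    size = n // n_machines
    shards = [(xs[i:i + size], ys[i:i + size]) for i in range(0, n, size)]
    x0 = 0.25  # true regression value is sin(pi / 2) = 1
    print("global  :", nadaraya_watson(x0, xs, ys, h))
    print("one-shot:", one_shot(x0, shards, h))
```

In this sketch, each new prediction point triggers a kernel evaluation over every shard, which hints at why prediction cost matters in the distributed setting; the paper's own efficiency analysis is not reproduced here.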
Li F. Book Review of Causality: Models, Reasoning, and Inference, Judea Pearl. (Second Edition). (2009). International Journal of Forecasting [Internet]. 2024;40:423–425.
Given the widespread popularity and success of Judea Pearl's original causality book, this review covers the main topics updated in the 2009 second edition and illustrates an easy-to-follow causal inference strategy in a forecasting scenario. It further discusses some potential benefits and challenges of causal inference in time series forecasting when modeling counterfactuals, estimating uncertainty, and incorporating prior knowledge to estimate causal effects in different forecasting scenarios.
Li L, Li F, Kang Y. Forecasting Large Collections of Time Series: Feature-Based Methods. In: Hamoudia M, Makridakis S, Spiliotis E, editors. Forecasting with Artificial Intelligence: Theory and Applications. Springer Nature Switzerland; 2024. pp. 251–276.
In economics and many other forecasting domains, real-world problems are too complex for a single model that assumes a specific data generation process. The forecasting performance of different methods changes depending on the nature of the time series. When forecasting large collections of time series, two lines of approach have been developed using time series features, namely feature-based model selection and feature-based model combination. This chapter discusses state-of-the-art feature-based methods, with reference to open-source software implementations.
Huang Y, Li F, Li T, Lin T-C. Local Information Advantage and Stock Returns: Evidence from Social Media. Contemporary Accounting Research [Internet]. 2024;41:1089–1119.
We examine the information asymmetry between local and nonlocal investors with a large dataset of stock message board postings. We document that abnormal relative postings of a firm, i.e., unusual changes in the volume of postings from local versus nonlocal investors, capture locals' information advantage. This measure positively predicts firms' short-term stock returns as well as those of peer firms in the same city. Sentiment analysis shows that posting activities primarily reflect good news, potentially due to social transmission bias and short-sales constraints. We identify the information driving return predictability through content-based analysis. Abnormal relative postings also lead analysts' forecast revisions. Overall, investors' interactions on social media contain valuable geography-based private information.
Wang H, Wang W, Li F, Kang Y, Li H. Catastrophe Duration and Loss Prediction via Natural Language Processing. Variance. 2024;Forthcoming.
Textual information from online news is more timely than insurance claim data during catastrophes, and there is value in using this information to obtain earlier damage estimates. In this paper, we use text-based information to predict the duration and severity of catastrophes. We construct text vectors with Word2Vec and BERT models and use Random Forest, LightGBM, and XGBoost as learners, all of which give satisfactory prediction results. This new approach can provide timely warnings of the severity of a catastrophe, which can aid decision-making and support appropriate responses.