Publications by Type: Journal Articles

2026

Li F, Ruan T. Recovery-Informed Forecasting Strategy Enhancement. Annals of Tourism Research [Internet]. 2026;118:104164.

We propose a three-stage framework named Recovery-Informed Strategy Enhancement (RISE) to forecast the recovery of Chinese outbound tourism following the coronavirus disease 2019 pandemic. The framework decomposes the forecasts into three parts: the initial forecasts, the terminal forecasts, and the recovery curve forecasts that connect the two points. We integrate multiple sources of information and employ forecast combination techniques in all stages, enhancing both the accuracy and robustness of recovery forecasts. Compared with conventional forecasting approaches, our framework provides a structured and transparent pipeline to integrate model-based forecasts with expert-informed judgment under structural breaks and high uncertainty. Our findings demonstrate the effectiveness of this framework, offering an adaptable tool for recovery trajectory forecasting in post-crisis contexts.

2025

Wang W, Li F. Forecasting Chinese Outbound Tourist Numbers Based on a Segmented Combination VARX Model. 经济管理学刊. 2025;4:255–284.

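This paper studies tourism demand subject to structural change and proposes a segmented combination forecasting method based on the vector autoregressive model with exogenous variables (VARX). Unlike existing studies, which typically build combination forecasting models on the complete dataset, this paper brings the time dimension into forecast combination: it treats the variables in different time periods as independent units and builds a combination forecasting model on segmented time series datasets. The method uses tourists' online search behavior as exogenous variables to forecast tourist numbers and captures the differing effects of these exogenous variables on tourist numbers at different points in time, in particular the dynamic changes under sudden shocks such as the COVID-19 pandemic. Empirical results show that the segmented combination of VARX models achieves higher accuracy in forecasting Chinese outbound tourist numbers, with the gain in accuracy coming from accounting for the period-specific effects of the exogenous variables. Ex post analysis, in particular the out-of-sample forecasts of Chinese outbound tourism trends for 2024, further shows that as the impact of the COVID-19 pandemic gradually fades and the global tourism market gradually recovers, Chinese outbound tourist numbers will follow a positive growth trend. This conclusion is consistent with trend analyses in the publicly available literature and further confirms the practical value of the proposed forecasting method.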

Zhong Y, Ren Y, Cao G, Li F, Qi H. Optimal starting point for time series forecasting. Expert Systems with Applications [Internet]. 2025;273:126798.

Recent advances in time series forecasting mainly focus on improving the forecasting models themselves. However, when the time series data suffer from potential structural breaks or concept drifts, the forecasting performance might be significantly reduced. In this paper, we introduce a novel approach called Optimal Starting Point Time Series Forecast (OSP-TSP) for optimal forecasting, which can be combined with existing time series forecasting models. By adjusting the sequence length using the XGBoost and LightGBM models, the proposed approach can determine the optimal starting point (OSP) of the time series and then enhance the prediction performance of the base forecasting models. To illustrate the effectiveness of the proposed approach, comprehensive empirical analyses have been conducted on the M4 dataset and other real-world datasets. Empirical results indicate that predictions based on the OSP-TSP approach consistently outperform those using the complete time series dataset. Moreover, comparison results reveal that combining our approach with existing forecasting models can achieve better prediction accuracy, which also reflects the advantages of the proposed approach.

2024

Gao Y, Pan R, Li F, Zhang R, Wang H. Grid Point Approximation for Distributed Nonparametric Smoothing and Prediction. Journal of Computational and Graphical Statistics [Internet]. 2024:1–29.

Kernel smoothing is a widely used nonparametric method in modern statistical analysis. Efficiently conducting kernel smoothing for a massive dataset on a distributed system is a problem of great importance. In this work, we find that the popularly used one-shot type estimator is highly inefficient for prediction purposes. To this end, we propose a novel grid point approximation (GPA) method, which has the following advantages. First, the resulting GPA estimator is as statistically efficient as the global estimator under mild conditions. Second, it requires no communication and is extremely efficient in terms of computation for prediction. Third, it is applicable to the case where the data are not randomly distributed across different machines. To select a suitable bandwidth, two novel bandwidth selectors are further developed and theoretically supported. Extensive numerical studies are conducted to corroborate our theoretical findings. Two real data examples are also provided to demonstrate the usefulness of our GPA method.

Li F. Book Review of Causality: Models, Reasoning, and Inference, Judea Pearl. (Second Edition). (2009). International Journal of Forecasting [Internet]. 2024;40:423–425.

Given the great popularity and success of Judea Pearl's original causality book, this review covers the main topics updated in the second edition in 2009 and illustrates an easy-to-follow causal inference strategy in a forecasting scenario. It further discusses some potential benefits and challenges for causal inference with time series forecasting when modeling the counterfactuals, estimating the uncertainty, and incorporating prior knowledge to estimate causal effects in different forecasting scenarios.

Huang Y, Li F, Li T, Lin T-C. Local Information Advantage and Stock Returns: Evidence from Social Media. Contemporary Accounting Research [Internet]. 2024;41:1089–1119.

We examine the information asymmetry between local and nonlocal investors with a large dataset of stock message board postings. We document that abnormal relative postings of a firm, i.e., unusual changes in the volume of postings from local versus nonlocal investors, capture locals' information advantage. This measure positively predicts firms' short-term stock returns as well as those of peer firms in the same city. Sentiment analysis shows that posting activities primarily reflect good news, potentially due to social transmission bias and short-sales constraints. We identify the information driving return predictability through content-based analysis. Abnormal relative postings also lead analysts' forecast revisions. Overall, investors' interactions on social media contain valuable geography-based private information.

Wang H, Wang W, Li F, Kang Y, Li H. Catastrophe Duration and Loss Prediction via Natural Language Processing. Variance. 2024;Forthcoming.

Textual information from online news is more timely than insurance claim data during catastrophes, and there is value in using this information to achieve earlier damage estimates. In this paper, we use text-based information to predict the duration and severity of catastrophes. We construct text vectors through Word2Vec and BERT models and use Random Forest, LightGBM, and XGBoost as learners, all of which show satisfactory prediction results. This new approach is informative in providing timely warnings of the severity of a catastrophe, which can aid decision-making and support appropriate responses.

2023

Li L, Kang Y, Petropoulos F, Li F. Feature-Based Intermittent Demand Forecast Combinations: Accuracy and Inventory Implications. International Journal of Production Research [Internet]. 2023;61:7557–7572.

Intermittent demand forecasting is a ubiquitous and challenging problem in production systems and supply chain management. In recent years, there has been a growing focus on developing forecasting approaches for intermittent demand from academic and practical perspectives. However, limited attention has been given to forecast combination methods, which have achieved competitive performance in forecasting fast-moving time series. The current study aims to examine the empirical outcomes of some existing forecast combination methods and propose a generalized feature-based framework for intermittent demand forecasting. The proposed framework has been shown to improve the accuracy of point and quantile forecasts based on two real data sets. Further, some analysis of features, forecasting pools and computational efficiency is also provided. The findings indicate the intelligibility and flexibility of the proposed approach in intermittent demand forecasting and offer insights regarding inventory decisions.

Zhang B, Kang Y, Panagiotelis A, Li F. Optimal Reconciliation with Immutable Forecasts. European Journal of Operational Research [Internet]. 2023;308:650–660.

The practical importance of coherent forecasts in hierarchical forecasting has inspired many studies on forecast reconciliation. Under this approach, so-called base forecasts are produced for every series in the hierarchy and are subsequently adjusted to be coherent in a second reconciliation step. Reconciliation methods have been shown to improve forecast accuracy, but will, in general, adjust the base forecast of every series. However, in an operational context, it is sometimes necessary or beneficial to keep forecasts of some variables unchanged after forecast reconciliation. In this paper, we formulate a reconciliation methodology that keeps forecasts of a pre-specified subset of variables unchanged or "immutable". In contrast to existing approaches, these immutable forecasts need not all come from the same level of a hierarchy, and our method can also be applied to grouped hierarchies. We prove that our approach preserves unbiasedness in base forecasts. Our method can also account for correlations between base forecasting errors and ensure non-negativity of forecasts. We also perform empirical experiments, including an application to sales of a large-scale online retailer, to assess the impacts of our proposed methodology.

Wang X, Hyndman RJ, Li F, Kang Y. Forecast Combinations: An over 50-Year Review. International Journal of Forecasting [Internet]. 2023;39:1518–1547.

Forecast combinations have flourished remarkably in the forecasting community and, in recent years, have become part of mainstream forecasting research and activities. Combining multiple forecasts produced for a target time series is now widely used to improve accuracy through the integration of information gleaned from different sources, thereby avoiding the need to identify a single “best” forecast. Combination schemes have evolved from simple combination methods without estimation to sophisticated techniques involving time-varying weights, nonlinear combinations, correlations among components, and cross-learning. They include combining point forecasts and combining probabilistic forecasts. This paper provides an up-to-date review of the extensive literature on forecast combinations and a reference to available open-source software implementations. We discuss the potential and limitations of various methods and highlight how these ideas have developed over time. Some crucial issues concerning the utility of forecast combinations are also surveyed. Finally, we conclude with current research gaps and potential insights for future research.

Li L, Kang Y, Li F. Bayesian Forecast Combination Using Time-Varying Features. International Journal of Forecasting [Internet]. 2023;39:1287–1302.

In this work, we propose a novel framework for density forecast combination by constructing time-varying weights based on time-varying features. Our framework estimates weights in the forecast combination via Bayesian log predictive scores, in which the optimal forecast combination is determined by time series features from historical information. In particular, we use an automatic Bayesian variable selection method to identify the importance of different features. As a result, our approach has better interpretability compared to other black-box forecast combination schemes. We apply our framework to stock market data and M3 competition data. Based on our structure, a simple maximum-a-posteriori scheme outperforms benchmark methods, and Bayesian variable selection can further enhance the accuracy for both point forecasts and density forecasts.

Wang X, Kang Y, Hyndman RJ, Li F. Distributed ARIMA Models for Ultra-Long Time Series. International Journal of Forecasting [Internet]. 2023;39:1163–1184.

Providing forecasts for ultra-long time series plays a vital role in various activities, such as investment decisions, industrial production arrangements, and farm management. This paper develops a novel distributed forecasting framework to tackle the challenges of forecasting ultra-long time series using the industry-standard MapReduce framework. The proposed model combination approach retains the local time dependency. It utilizes a straightforward splitting across samples to facilitate distributed forecasting by combining the local estimators of time series models delivered from worker nodes and minimizing a global loss function. Instead of unrealistically assuming the data generating process (DGP) of an ultra-long time series stays invariant, we only make assumptions on the DGP of subseries spanning shorter time periods. We investigate the performance of the proposed approach with AutoRegressive Integrated Moving Average (ARIMA) models using a real data application as well as numerical simulations. Our approach improves forecasting accuracy and computational efficiency in point forecasts and prediction intervals, especially for longer forecast horizons, compared to directly fitting the whole data with ARIMA models. Moreover, we explore some potential factors that may affect the forecasting performance of our approach.

2022

Talagala TS, Li F, Kang Y. FFORMPP: Feature-Based Forecast Model Performance Prediction. International Journal of Forecasting [Internet]. 2022;38:920–943.

This paper introduces a novel meta-learning algorithm for time series forecast model performance prediction. We model the forecast error as a function of time series features calculated from historical time series with an efficient Bayesian multivariate surface regression approach. The minimum predicted forecast error is then used to identify an individual model or a combination of models to produce the final forecasts. It is well known that the performance of most meta-learning models depends on the representativeness of the reference dataset used for training. In such circumstances, we augment the reference dataset with a feature-based time series simulation approach, namely GRATIS, to generate a rich and representative time series collection. The proposed framework is tested using the M4 competition data and is compared against commonly used forecasting approaches. Our approach provides comparable performance to other model selection and combination approaches but at a lower computational cost and a higher degree of interpretability, which is important for supporting decisions. We also provide useful insights regarding which forecasting models are expected to work better for particular types of time series, the intrinsic mechanisms of the meta-learners, and how the forecasting performance is affected by various factors.

Anderer M, Li F. Hierarchical Forecasting with a Top-down Alignment of Independent-Level Forecasts. International Journal of Forecasting [Internet]. 2022;38:1405–1414.

Hierarchical forecasting with intermittent time series is a challenge in both research and empirical studies. Extensive research focuses on improving the accuracy of each hierarchy, especially the intermittent time series at bottom levels. Then, hierarchical reconciliation can be used to improve the overall performance further. In this paper, we present a hierarchical-forecasting-with-alignment approach that treats the bottom-level forecasts as mutable to ensure higher forecasting accuracy on the upper levels of the hierarchy. We employ a pure deep learning forecasting approach, N-BEATS, for continuous time series at the top levels, and a widely used tree-based algorithm, LightGBM, for intermittent time series at the bottom level. The hierarchical-forecasting-with-alignment approach is a simple yet effective variant of the bottom-up method, accounting for biases that are difficult to observe at the bottom level. It allows suboptimal forecasts at the lower level to retain a higher overall performance. The approach in this empirical study was developed by the first author during the M5 Accuracy competition, where it ranked second. The method is also business orientated and can be used to facilitate strategic business planning.

Wang Z, Pang Y, Gan M, Skitmore M, Li F. Escalator Accident Mechanism Analysis and Injury Prediction Approaches in Heavy Capacity Metro Rail Transit Stations. Safety Science. 2022;154:105850.

The semi-open character with high passenger flow in Metro Rail Transport Stations (MRTS) makes safety management of human-electromechanical interaction escalator systems more complex. Safety management should not consider only single failures, but also the complex interactions in the system. This study applies task driven behavior theory and system theory to reveal a generic framework of the MRTS escalator accident mechanism and uses Lasso-Logistic Regression (LLR) for escalator injury prediction. Escalator accidents in the Beijing MRTS are used as a case study to estimate the applicability of the methodologies. The main results affirm that the application of System-Theoretical Process Analysis (STPA) and Task Driven Accident Process Analysis (TDAPA) to the generic escalator accident mechanism reveals non-failure state task driven passenger behaviors and constraints on safety that are not addressed in previous studies. The results also confirm that LLR is able to predict escalator accidents where there is a relatively large number of variables with limited observations. Additionally, increasing the amount of data improves the prediction accuracy for all three types of injuries in the case study, suggesting the LLR model has good extrapolation ability. The results can be applied in MRTS as instruments for both escalator accident investigation and accident prevention.

Kang Y, Cao W, Petropoulos F, Li F. Forecast with Forecasts: Diversity Matters. European Journal of Operational Research [Internet]. 2022;301:180–190.

Forecast combinations have been widely applied in the last few decades to improve forecasting. Estimating optimal weights that can outperform simple averages is not always an easy task. In recent years, the idea of using time series features for forecast combinations has flourished. Although this idea has been proved to be beneficial in several forecasting competitions, it may not be practical in many situations. For example, the task of selecting appropriate features to build forecasting models is often challenging. Even if there were an acceptable way to define the features, existing features are estimated based on the historical patterns, which are likely to change in the future. Other times, the estimation of the features is infeasible due to limited historical data. In this work, we suggest a change of focus from the historical data to the produced forecasts to extract features. We use out-of-sample forecasts to obtain weights for forecast combinations by amplifying the diversity of the pool of methods being combined. A rich set of time series is used to evaluate the performance of the proposed method. Experimental results show that our diversity-based forecast combination framework not only simplifies the modeling process but also achieves superior forecasting performance in terms of both point forecasts and prediction intervals. The value of our proposition lies in its simplicity, transparency, and computational efficiency, elements that are important from both an optimization and a decision analysis perspective.

Petropoulos F, Apiletti D, Assimakopoulos V, Babai MZ, Barrow DK, Ben Taieb S, Bergmeir C, Bessa RJ, Bijak J, Boylan JE, et al. Forecasting: Theory and Practice. International Journal of Forecasting [Internet]. 2022;38:705–871.

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

Pan R, Ren T, Guo B, Li F, Li G, Wang H. A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating. Journal of Business and Economic Statistics [Internet]. 2022;40:1691–1700.

Quantile regression is a method of fundamental importance. How to efficiently conduct quantile regression for a large dataset on a distributed system is of great importance. We show that the popularly used one-shot estimation is statistically inefficient if data are not randomly distributed across different workers. To fix the problem, a novel one-step estimation method is developed with the following nice properties. First, the algorithm is communication efficient; that is, the communication cost demanded is practically acceptable. Second, the resulting estimator is statistically efficient; that is, its asymptotic covariance is the same as that of the global estimator. Third, the estimator is robust against data distribution; that is, its consistency is guaranteed even if data are not randomly distributed across different workers. Numerical experiments are provided to corroborate our findings. A real example is also presented for illustration.

Wang X, Kang Y, Petropoulos F, Li F. The Uncertainty Estimation of Feature-Based Forecast Combinations. Journal of the Operational Research Society [Internet]. 2022;73:979–993.

Forecasting is an indispensable element of operational research (OR) and an important aid to planning. The accurate estimation of the forecast uncertainty facilitates several operations management activities, predominantly in supporting decisions in inventory and supply chain management and effectively setting safety stocks. In this paper, we introduce a feature-based framework, which uses the relationship between time series features and interval forecasting performance to provide reliable interval forecasts. We propose an optimal threshold ratio searching algorithm and a new weight determination mechanism for selecting an appropriate subset of models and assigning combination weights for each time series tailored to the observed features. We evaluate our approach using a large set of time series from the M4 competition. Our experiments show that our approach significantly outperforms a wide range of benchmark models in terms of both point forecasts and prediction intervals.

2021

Janeway MG, Zhao X, Rosenthaler M, Zuo Y, Balasubramaniyan K, Poulson M, Neufeld M, Siracuse JJ, Takahashi CE, Allee L, et al. Clinical Diagnostic Phenotypes in Hospitalizations Due to Self-Inflicted Firearm Injury. Journal of Affective Disorders. 2021;278:172–180.

Hospitalized self-inflicted firearm injuries have not been extensively studied, particularly regarding clinical diagnoses at the index admission. The objective of this study was to discover the diagnostic phenotypes (DPs) or clusters of hospitalized self-inflicted firearm injuries. Using Nationwide Inpatient Sample data in the US from 1993 to 2014, we used International Classification of Diseases, Ninth Revision codes to identify self-inflicted firearm injuries among those ≥18 years of age. The 25 most frequent diagnostic codes were used to compute a dissimilarity matrix and the optimal number of clusters. We used hierarchical clustering to identify the main DPs. The overall cohort included 14072 hospitalizations, with self-inflicted firearm injuries occurring mainly in those between 16 and 45 years of age, black, with co-occurring tobacco and alcohol use, and mental illness. Out of the three identified DPs, DP1 was the largest (n=10,110), and included the most common diagnoses, similar to the overall cohort, including major depressive disorders (27.7%), hypertension (16.8%), acute post hemorrhagic anemia (16.7%), tobacco (15.7%) and alcohol use (12.6%). DP2 (n=3,725) was not characterized by any of the top 25 ICD-9 diagnosis codes, and included children and peripartum women. DP3, the smallest phenotype (n=237), had a high prevalence of depression similar to DP1, and was defined by fewer fatal injuries of the chest and abdomen. There were three distinct diagnostic phenotypes in hospitalizations due to self-inflicted firearm injuries. Further research is needed to determine how DPs can be used to tailor clinical care and prevention efforts.

李丰 (Feng Li)

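Associate Professor, Research Fellow, and Doctoral Supervisor, Department of Business Statistics and Econometrics, Guanghua School of Management, Peking University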
