# Research Output

Zhang H, Jia J. Elastic-net Regularized High-dimensional Negative Binomial Regression: Consistency and Weak Signals Detection. Working Paper [Internet]. Manuscript.
Yu J, Zhang H. Optimal Subsampling for Big Data Generalized Linear Models. Working Paper. Manuscript.
Abstract: A new subsampling algorithm motivated by the A-optimality criterion (OSMAC) was proposed in Wang et al. (2017, JASA) for fast approximation of the maximum likelihood estimator in massive-data logistic regression. In this paper, we extend the OSMAC subsampling framework to generalized linear models with a canonical link function. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and we derive optimal subsampling probabilities under slightly modified versions of the MV and MVc optimality criteria. Next, we determine the subsample size based on an $F$-norm matrix concentration inequality. Because the optimal subsampling probabilities depend on the full data, we establish a two-step algorithm to approximate the optimal subsampling procedure, and we also prove consistency and asymptotic normality of the resulting estimator. As applications, the optimal subsampling strategy is worked out in several situations.
2017
Zhang H, Li B, Jay KG. A characterization of signed discrete infinitely divisible distributions. Studia Scientiarum Mathematicarum Hungarica [Internet]. 2017;54(4):446–470.
2016
Zhang H. Infinite Divisibility and Compound Poisson Law: Related Count Data Models and High-Dimensional Variable Selection. Master's thesis, Central China Normal University (华中师范大学) [Internet]. 2016.
Abstract: In this master's thesis, we explore the probability theory, statistical inference, and numerical computation of the discrete compound Poisson (DCP) distribution. In particular, we give a comprehensive literature review of DCP distributions and their applications in related statistical models for count data, and we discuss penalized generalized linear models for count data regression.

DCP distributions have a probability generating function of the form $G(z) = \exp\left\{\lambda \sum_{i=1}^{\infty} a_i (z^i - 1)\right\}$ for $|z| \le 1$, where $\lambda > 0$, $a_i \ge 0$, and $\sum_{i=1}^{\infty} a_i = 1$. Feller's famous characterization of the compound Poisson law states that a discrete distribution is compound Poisson if and only if it is discrete infinitely divisible. This is a special case of the Lévy-Khinchine formula. When the $\{a_i\}_{i=1}^{\infty}$ may take negative values and the sum is absolutely convergent, the distribution is called a pseudo discrete compound Poisson distribution.

In the first chapter, we introduce important tools (the probability generating function and the Fourier transform) as preliminaries and repair the flawed proof of Feller's characterization, and then we give a short introduction to Lasso-type variable selection methods and their generalizations. We close this chapter with infinitely divisible prior distributions in the Bayesian Lasso, and we envisage an appropriate zero-inflated distribution as a prior that yields a sparse estimate of the nonzero coefficients. Chapter II discusses characterizations of the DCP distribution (process); ten methods of proving the probability mass function are given in the Appendix, and over a hundred special cases or sub-families of the DCP distribution are listed in a table with references. We use the Stein-Chen method and the operator semigroup method to obtain an upper bound on the total variation distance between a sum of independent discrete random variables and a related discrete compound Poisson random variable, and we use row sums of a random triangular array to approximate the discrete compound Poisson distribution.

Chapter IV studies statistics, parameter estimation, and the FFT of the DCP probability mass function. Chapter V first applies cumulant estimation and Fourier transform estimation to actuarial claim data that exhibit zero inflation and overdispersion, and then compares the fits using the Kolmogorov-Smirnov test and the chi-squared test. We give a theorem that a set of count data follows a discrete pseudo compound Poisson distribution if its probability of zero is larger than its probability of nonzero values. Furthermore, combining this zero-inflation property of the pseudo discrete compound Poisson distribution with a technique of adding virtual frequencies, we obtain an algorithm to fit any discrete distribution. Chapter V also discusses count GLMs related to the DCP distribution and uses penalized estimation to select important regression variables. In particular, we consider Elastic net estimates for negative binomial regression and give a necessary and sufficient condition (similar to the Karush-Kuhn-Tucker conditions) for nonzero (zero) coefficient estimates. We analyze a real spider count data set by negative binomial regression using MLE and the Lasso and Elastic net penalties. Next, we set forth the survival functions in discrete frailty models and cure rate models (or long-term survivor models with competing causes) that are derived from some DCP distributions. In the last section, we look forward to future work on using mixed Poisson distributions to approximate any discrete distribution, and we state the problem of variable selection in mixture components. Due to the complexity of the mixture, this leads to a high-dimensional problem.
Zhang H, Li B. Characterizations of discrete compound Poisson distributions. Communications in Statistics-Theory and Methods [Internet]. 2016;45(22):6789-6802.
Zhang H. New proofs of Chaundy–Bullard identity in "The Problem of Points". The Mathematical Intelligencer [Internet]. 2016;38(1):4-5.
2015

2014
Zhang H, Liu Y, Li B. Notes on discrete compound Poisson model with applications to risk theory. Insurance: Mathematics and Economics [Internet]. 2014;56:325-336.