Infinite Divisibility and Compound Poisson Law: Related Count Data Models and High-Dimensional Variable Selection

Thesis Type:

Master thesis


In this master's thesis, we study the probability theory, statistical inference, and numerical computation of the discrete compound Poisson (DCP) distribution. In particular, we give a comprehensive literature review of DCP distributions and their applications in statistical models for count data, and we discuss penalized generalized linear models for count data regression.

A discrete compound Poisson distribution has a probability generating function of the form

$$G(z) = \exp\left\{\lambda \sum_{i=1}^{\infty} a_i \left(z^i - 1\right)\right\}, \qquad |z| \le 1,$$

where $\lambda > 0$, $a_i \ge 0$, and $\sum_{i=1}^{\infty} a_i = 1$. Feller's well-known characterization of the compound Poisson law states that a discrete distribution is compound Poisson if and only if it is discrete infinitely divisible. This is a special case of the Lévy-Khinchine formula. When the $\{a_i\}_{i=1}^{\infty}$ may take negative values and the sum is absolutely convergent, the distribution is called a pseudo discrete compound Poisson distribution.

In the first chapter, we introduce important tools (the probability generating function and the Fourier transform) as preliminaries and repair a flawed proof of Feller's characterization; we then give a short introduction to variable selection with the Lasso and its generalizations. We close the chapter with infinitely divisible prior distributions in the Bayesian Lasso, and we envisage an appropriate zero-inflated prior distribution that yields sparse estimates of the nonzero coefficients.

Chapter Ⅱ discusses characterizations of the DCP distribution (process); ten methods for deriving the probability mass function are given in the Appendix, and over a hundred special cases or sub-families of the DCP distribution are listed in a table with references. We use the Stein-Chen method and the operator semigroup method to obtain an upper bound on the total variation distance between a sum of independent discrete random variables and a related discrete compound Poisson random variable, and we use row sums of a random triangular array to approximate the discrete compound Poisson distribution.
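As a rough numerical illustration of how a DCP probability mass function can be obtained from its probability generating function, the following Python sketch uses the standard recursion $n P_n = \lambda \sum_{i=1}^{n} i\, a_i P_{n-i}$ with $P_0 = e^{-\lambda}$, which follows from differentiating $G(z) = \exp\{\lambda \sum_i a_i (z^i - 1)\}$. The function name `dcp_pmf`, its arguments, and the truncation of $\{a_i\}$ to a finite list are assumptions made for this example. It is not necessarily one of the ten methods given in the thesis.

```python
import math


def dcp_pmf(lam, a, n_max):
    """Return [P(X=0), ..., P(X=n_max)] for a discrete compound Poisson X.

    lam : Poisson rate (lambda > 0).
    a   : list with a[i-1] = a_i, the probability of a jump of size i
          (assumed here to be a finite list; entries beyond it are zero).
    """
    p = [math.exp(-lam)]  # P_0 = exp(-lambda)
    for n in range(1, n_max + 1):
        # n * P_n = lambda * sum_{i=1}^{n} i * a_i * P_{n-i}
        upper = min(n, len(a))
        s = sum(i * a[i - 1] * p[n - i] for i in range(1, upper + 1))
        p.append(lam * s / n)
    return p


if __name__ == "__main__":
    # With a_1 = 1 the DCP law reduces to an ordinary Poisson(lambda) law.
    poisson = dcp_pmf(2.0, [1.0], 10)
    print(poisson[:4])
    # Jumps of size 1 or 2 with equal probability.
    print(dcp_pmf(1.0, [0.5, 0.5], 5))
```

Setting `a = [1.0]` recovers the Poisson distribution, which gives a simple sanity check for the recursion.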
Chapter Ⅳ studies statistics, parameter estimation, and FFT computation of the DCP probability mass function. Chapter Ⅴ first applies cumulant estimation and Fourier transform estimation to actuarial claim data exhibiting zero inflation and overdispersion, and then compares the fits using the Kolmogorov-Smirnov test and the chi-squared test. We give a theorem stating that a set of count data follows a discrete pseudo compound Poisson distribution if its probability of zero is larger than its probability of nonzero values. Furthermore, combining this zero-inflation property of the pseudo discrete compound Poisson distribution with a technique of adding virtual frequencies, we obtain an algorithm that can fit any discrete distribution. Chapter V also discusses count GLMs related to the DCP distribution and uses penalized estimation to select important regression variables. In particular, we consider Elastic net estimates for negative binomial regression, and we give a necessary and sufficient condition (similar to the Karush-Kuhn-Tucker conditions) for nonzero (zero) coefficient estimates. We analyze a real spider count data set using negative binomial regression with MLE, Lasso, and Elastic net penalties. Next, we set out the survival functions in discrete frailty models and cure rate models (or long-term survivor models with competing causes), which are derived from certain DCP distributions. In the last section, we look ahead to future work on using mixed Poisson distributions to approximate any discrete distribution, and we state the problem of variable selection in mixture components. The complexity of the mixture leads to a high-dimensional problem.
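The zero-inflation theorem mentioned above can be explored numerically. The sketch below assumes a finitely supported probability mass function with $P_0 > 0$ and inverts the recursion $n P_n = \sum_{i=1}^{n} i\, c_i P_{n-i}$ to recover $\lambda = -\log P_0$ and the coefficients $c_k = \lambda a_k$ of $\log G(z) = -\lambda + \sum_k c_k z^k$. The name `pseudo_dcp_coefficients` is hypothetical, and the code is an illustration only, not the fitting algorithm developed in the thesis.

```python
import math


def pseudo_dcp_coefficients(pmf, n_terms):
    """Recover lambda and c_k = lambda * a_k (k = 1..n_terms) from a pmf.

    pmf : list with pmf[n] = P(X = n); entries beyond the list are zero.
    Negative c_k indicate a pseudo (not ordinary) compound Poisson law.
    """
    p0 = pmf[0]
    if p0 <= 0.0:
        raise ValueError("P(X = 0) must be positive")

    def p(n):
        return pmf[n] if n < len(pmf) else 0.0

    c = [0.0] * (n_terms + 1)  # c[0] is unused
    for n in range(1, n_terms + 1):
        # Solve n*P_n = sum_{i=1}^{n} i*c_i*P_{n-i} for c_n.
        s = sum(i * c[i] * p(n - i) for i in range(1, n))
        c[n] = (n * p(n) - s) / (n * p0)
    lam = -math.log(p0)
    return lam, c[1:]


if __name__ == "__main__":
    # Bernoulli(0.3): P(X = 0) = 0.7 exceeds P(X != 0) = 0.3.
    lam, c = pseudo_dcp_coefficients([0.7, 0.3], 40)
    print("lambda =", lam)
    print("first coefficients:", c[:4])
    print("sum of |c_k|:", sum(abs(x) for x in c))
```

For the Bernoulli(0.3) law, the recovered coefficients are $c_k = (-1)^{k+1} (3/7)^k / k$: they alternate in sign and are absolutely summable, so the representation is pseudo compound Poisson rather than compound Poisson.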

