Estimation of Gini-Simpson index for SNP data
Journal of the Korean Data & Information Science Society 2017;28:1557-64
Published online November 30, 2017
© 2017 Korean Data & Information Science Society.

Joonsung Kang1

1Department of Information Statistics, Gangneung-Wonju National University
Correspondence to: Joonsung Kang
Associate professor, Department of Information Statistics, Gangneung-Wonju National University, Jukheon-gil 7, Gangneung-si, Korea. E-mail:
Received September 29, 2017; Revised October 30, 2017; Accepted November 1, 2017.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
We consider high-dimensional, low-sample-size (HDLSS) genomic sequence data whose response categories are unordered. For constructing an appropriate test statistic in this setting, the classical multivariate analysis of variance (MANOVA) approach may not be useful, because the number of parameters is very large and the sample size is very small. We therefore present a pseudo-marginal model based on the Gini-Simpson index, estimated via a Bayesian approach. In view of the small sample size, we use the permutation distribution generated by all n! equally likely permutations of the pooled sample observations across the G groups (of sizes n1, ..., nG). We simulate data and apply the false discovery rate (FDR) and positive false discovery rate (pFDR) procedures, with the proposed test statistics, to the simulated data. We also analyze real SARS data and compute the FDR and pFDR. By exact conditional permutation theory, the FDR and pFDR procedures, together with the associated test statistic for each gene, control the FDR and pFDR, respectively, at any level α for the resulting set of p-values.
Keywords: Gini-Simpson index, HDLSS, FDR, MANOVA
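
The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the article's method. It rests on several assumptions: the Gini-Simpson index is computed as the plug-in estimate 1 - sum_k p_k^2 rather than the Bayesian estimate; the test statistic (pooled diversity minus the size-weighted within-group diversity) is a hypothetical stand-in for the proposed statistics; the permutation distribution is approximated by random shuffles rather than enumerating all n! permutations; and FDR control is shown with the standard Benjamini-Hochberg step-up adjustment. All function names are illustrative.

```python
import random
from collections import Counter


def gini_simpson(categories):
    """Plug-in Gini-Simpson index: 1 - sum_k p_k^2 (assumption: not the Bayesian estimate)."""
    n = len(categories)
    counts = Counter(categories)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def between_group_stat(pooled, sizes):
    """Hypothetical statistic: pooled diversity minus size-weighted within-group diversity."""
    n = len(pooled)
    within = 0.0
    start = 0
    for size in sizes:
        within += (size / n) * gini_simpson(pooled[start:start + size])
        start += size
    return gini_simpson(pooled) - within


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Monte Carlo approximation to the permutation p-value at one position."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    observed = between_group_stat(pooled, sizes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the pooled observations to the G groups
        if between_group_stat(pooled, sizes) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data (invented for illustration): two groups, two positions.
    position_1 = [list("AAAA"), list("CCCC")]  # groups fully separated
    position_2 = [list("ACGT"), list("ACGT")]  # groups identical
    p_values = [permutation_p_value(g) for g in (position_1, position_2)]
    print(benjamini_hochberg(p_values))
```

In this sketch each position (or gene) yields one permutation p-value, and the adjusted p-values are then compared with the chosen level α. Replacing the random shuffles with a full enumeration gives the exact permutation distribution when n is small enough.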