RT Journal Article
JF IEEE Transactions on Knowledge & Data Engineering
YR 2013
VO 25
IS 11
SP 2658
TI Bias Correction in a Small Sample from Big Data
A1 Jianguo Lu
A1 Dingding Li
K1 Equations
K1 Estimation
K1 Sociology
K1 Statistics
K1 Mathematical model
K1 Twitter
K1 Information management
K1 size estimation
K1 Big data
K1 online social networks
K1 small sample
K1 bias
AB This paper discusses the bias problem when estimating the population size of big data such as online social networks (OSN) using uniform random sampling and simple random walk. Unlike the traditional estimation problem where the sample size is not very small relative to the data size, in big data, a small sample relative to the data size is already very large and costly to obtain. We point out that when small samples are used, there is a bias that is no longer negligible. This paper shows analytically that the relative bias can be approximated by the reciprocal of the number of collisions; thereby, a bias correction estimator is introduced. The result is further supported by both simulation studies and the real Twitter network that contains 41.7 million nodes.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 1041-4347
LA English
DO 10.1109/TKDE.2012.220
LK http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.220