Target Distribution Guided Network Sampling

Renjie Fan, Zhiwen Yu, Bin Guo, Liang Wang, Dingqi Yang

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

Abstract

Studying public user data on social networks to provide services and predictions for society has become a widespread and effective approach, thanks to the rapid rise of social networks. However, the population structure of online users usually differs from that of the physical world, which may significantly influence research results. This biased network population structure can therefore become an essential limitation for studies that derive knowledge from social media data. Traditional sampling approaches are either resource-intensive or data-biased. In this paper, we propose a target-distribution-guided sampling process to solve the problem of imbalanced user data in the virtual space. We intervene in the sampling procedure according to the real-time divergence of the collected sample set from the target distribution, apply the theory of homophily to discover users with matching features, and refine the samples with recursive sampling. Experiments show that this method successfully constrains the overall structure of the samples to the given distribution within a given JS divergence of 0.1, while leaving the unrelated features randomly distributed. Moreover, the proposed method requires fewer accesses to collect a given number of samples, and thus saves time and computing resources.
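The divergence-guided step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not spell out: the function names, the greedy acceptance rule, and the `tol` parameter are assumptions made for illustration, and the homophily-based user discovery and recursive refinement are not modelled. The sketch only shows how a running Jensen-Shannon (JS) divergence against a target distribution could gate which candidates are kept.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions (dicts).

    Base-2 logarithms are used, so the value lies in [0, 1].
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def empirical(counts):
    """Turn feature counts into relative frequencies."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def guided_sample(candidates, target, n, tol=0.1):
    """Hypothetical greedy filter over (user_id, feature) pairs.

    A candidate is kept if adding it lowers the JS divergence of the
    sample from `target`, or keeps that divergence within `tol`.
    """
    counts = Counter()
    kept = []
    for user_id, feature in candidates:
        if len(kept) >= n:
            break
        before = js_divergence(empirical(counts), target) if kept else math.inf
        counts[feature] += 1
        after = js_divergence(empirical(counts), target)
        if after < before or after <= tol:
            kept.append(user_id)
        else:
            counts[feature] -= 1  # undo the tentative addition
    return kept, js_divergence(empirical(counts), target)
```

For example, feeding a stream in which one feature value is heavily over-represented together with a uniform `target` yields a kept set whose divergence from the target stays within `tol`, at the cost of skipping many candidates. Reducing that cost is the role the abstract assigns to homophily-based discovery.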

Original language: English
Title of host publication: Proceedings - 5th International Conference on Advanced Cloud and Big Data, CBD 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 374-379
Number of pages: 6
ISBN (electronic): 9781538610725
Publication status: Published - 6 Sep 2017
Externally published
Event: 5th International Conference on Advanced Cloud and Big Data, CBD 2017 - Shanghai, China
Duration: 13 Aug 2017 to 16 Aug 2017

Publication series

Name: Proceedings - 5th International Conference on Advanced Cloud and Big Data, CBD 2017

Conference

Conference: 5th International Conference on Advanced Cloud and Big Data, CBD 2017
Country/Territory: China
City: Shanghai
Period: 13/08/17 to 16/08/17
