Abstract
The proliferation of contaminated data on Internet of Things (IoT) devices can undermine the accuracy of data-driven decision-making by altering the distribution of the original data. Existing data cleaning methods rely primarily on the cloud center or on cloud-edge cooperation, which leads to long data transmission delays and reduced cleaning accuracy. In this study, we identify edge server placement as a crucial step closely aligned with data cleaning, and we treat collaborative edge server placement with distributed data cleaning (SPDC) as a holistic problem. We comprehensively quantify the complexity of this problem by analyzing numerous scenarios. To address it, we introduce a novel distributed collaborative edge framework comprising two key stages: server placement and data cleaning. For the former, we propose an optimized clustering algorithm that accounts for the data distribution on the IoT layer and the constraints of the edge layer. For the latter, we introduce a gossip-based data cleaning algorithm that fully exploits edge collaboration to improve data cleaning accuracy. The algorithm exhibits an approximate performance complexity of O(ln m), where m is the number of users' tasks. Both theoretical analysis and experimental results show that our algorithm achieves an average improvement of 9.02% in data cleaning accuracy and a 36.61% reduction in delay, outperforming state-of-the-art works in various scenarios.
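The abstract does not describe the internals of the gossip-based cleaning algorithm. As a rough illustration of the general idea only, the following minimal Python sketch assumes that contamination shows up as statistical outliers. Edge servers use randomized pairwise gossip to agree on global summary statistics without any cloud center, and each server then applies a k-sigma filter to its own readings. This is not the authors' algorithm: the functions `local_stats`, `gossip_average`, and `clean`, the k-sigma rule, and the sample data are all hypothetical.

```python
import math
import random
from typing import List, Tuple

# Hypothetical per-server summary: [count, sum, sum of squares].
Stats = List[float]


def local_stats(readings: List[float]) -> Stats:
    """Summarize one edge server's readings (illustrative, not from the paper)."""
    return [float(len(readings)), sum(readings), sum(x * x for x in readings)]


def gossip_average(stats: List[Stats], rounds: int, seed: int = 0) -> List[Stats]:
    """Randomized pairwise gossip: two servers replace their vectors with their mean.

    Each exchange preserves the network-wide total, so every server's vector
    converges to (global total) / (number of servers).
    """
    rng = random.Random(seed)
    state = [list(s) for s in stats]
    n = len(state)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)  # pick two distinct servers
        avg = [(a + b) / 2 for a, b in zip(state[i], state[j])]
        state[i], state[j] = avg, list(avg)
    return state


def clean(readings: List[float], estimate: Stats, k: float = 3.0) -> Tuple[List[float], List[float]]:
    """Split local readings into (kept, flagged) with a k-sigma rule on gossiped stats."""
    count, total, total_sq = estimate
    mean = total / count  # the common 1/n scaling cancels in these ratios
    std = math.sqrt(max(total_sq / count - mean * mean, 0.0))
    kept = [x for x in readings if abs(x - mean) <= k * std]
    flagged = [x for x in readings if abs(x - mean) > k * std]
    return kept, flagged


# Hypothetical readings held by three edge servers; 55.0 is a contaminated value.
servers = [
    [20.1, 19.8, 20.3, 20.0],
    [19.9, 20.2, 20.4, 55.0],
    [20.0, 19.7, 20.1, 20.2],
]
estimates = gossip_average([local_stats(r) for r in servers], rounds=200)
for readings, est in zip(servers, estimates):
    kept, flagged = clean(readings, est, k=2.0)
    print(flagged)
```

Exchanging sufficient statistics rather than raw readings keeps each gossip message small and constant-sized. The trade-off is that a simple global k-sigma rule ignores the local data distributions that the paper's placement stage takes into account.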
Original language | English |
---|---|
Journal | IEEE Transactions on Services Computing |
State | Accepted/In press - 2025 |
Keywords
- collaborative edge computing
- data cleaning
- server placement