Author

Dongqing Xiao

Faculty Advisor or Committee Member

Mohamed Y. Eltabakh, Advisor

Faculty Advisor or Committee Member

Elke A. Rundensteiner, Committee Chair

Faculty Advisor or Committee Member

Xiangnan Kong, Committee Member

Faculty Advisor or Committee Member

Yuanyuan Tian, Committee Member

Faculty Advisor or Committee Member

Craig Wills, Department Head

Identifier

etd-050317-134652

Abstract

In many prevalent application domains, such as business-to-business networks, social networks, and sensor networks, graphs serve as a powerful model to capture the complex relationships within them. These graphs are of significant importance in various domains such as marketing, psychology, and system design. The management and analysis of these graphs is a recurring research theme. The increasing scale of data poses a new challenge for graph analysis tasks. Meanwhile, the edge uncertainty revealed in a released graph raises new privacy concerns for the individuals involved. In this dissertation, we first study how to design efficient distributed triangle listing algorithms for web-scale graphs with MapReduce. This is a challenging task since triangle listing requires accessing the neighbors of the neighbors of a vertex, which may be scattered arbitrarily across different graph partitions (poor locality in data access). We present the “Bermuda” method, which effectively reduces the size of the intermediate data via redundancy elimination and sharing of messages whenever possible. Bermuda encompasses two general optimization principles that fully utilize the locality and re-use distance of local pivot messages. Leveraging these two principles, Bermuda not only speeds up triangle listing computations by up to a factor of 10 but also scales to larger datasets. Second, we focus on designing an anonymization approach that resists de-anonymization with little utility loss over uncertain graphs. In uncertain graphs, the adversary can also take advantage of the additional information in the released uncertain graph, such as the uncertainty of edge existence, to re-identify the graph nodes. In this research, we first show that conventional graph anonymization techniques either fail to guarantee anonymity or degrade utility over uncertain graphs. To this end, we devise a novel and efficient framework, Chameleon, that seamlessly integrates uncertainty.
First, a proper utility evaluation model for uncertain graphs is proposed. It focuses on the changes in the reliability features of the uncertain graph rather than purely on the amount of injected noise. Second, an efficient algorithm is designed to anonymize a given uncertain graph with relatively small utility loss, enabled by reliability-oriented edge selection and anonymity-oriented edge perturbation. Experiments confirm that, at the same level of anonymity, Chameleon provides higher utility than an adapted version of deterministic graph anonymization methods. Lastly, we consider resisting more complex re-identification risks and propose a simple yet effective framework, Galaxy, for anonymizing uncertain graphs by strategically injecting edge uncertainty based on nodes’ roles. In particular, the edge modifications are bounded by the derived anonymous probabilistic degree sequence. Experiments show that our method effectively generates anonymized uncertain graphs with high utility.
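
As a rough illustration of the triangle listing task mentioned in the abstract, the following single-machine Python sketch lists each triangle once by ordering vertices by degree and intersecting neighbor sets. It is not the Bermuda method and involves no MapReduce; the function name `list_triangles`, the edge-list input format, and the degree-based ordering are assumptions made for this example only. The innermost step is the "neighbors of the neighbors" access that becomes expensive once the adjacency lists sit in different partitions.

```python
from collections import defaultdict


def list_triangles(edges):
    """List every triangle of an undirected simple graph exactly once.

    `edges` is an iterable of (u, v) pairs with mutually comparable vertex
    ids (for example, integers). This is an illustrative in-memory sketch,
    not a distributed algorithm.
    """
    # Build undirected adjacency sets, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, ties broken by vertex id.
    rank = {v: (len(adj[v]), v) for v in adj}

    # Keep only neighbors that come later in the order, so each triangle
    # is reported from its lowest-ranked vertex (the "pivot") only.
    higher = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    triangles = []
    for u in higher:
        for v in higher[u]:
            # Neighbors-of-neighbors step: look up v's adjacency from u.
            for w in higher[u] & higher[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)


if __name__ == "__main__":
    sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(list_triangles(sample_edges))
```

In a partitioned setting, the lookup of `higher[v]` while processing `u` is the step that may require moving data between partitions, which is the locality problem the abstract describes.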

Publisher

Worcester Polytechnic Institute

Degree Name

PhD

Department

Computer Science

Project Type

Dissertation

Date Accepted

2017-05-03

Accessibility

Unrestricted

Subjects

Graph Analytic, Graph Privacy
