A Study of Replicated and Distributed Web Content

John, Nitin Abraham

ETD

With the increase in traffic on the web, popular web sites receive a large number of requests. Servers at these sites are sometimes unable to handle the load, and clients of such sites experience long delays. One approach to overcoming this problem is to distribute or replicate content over multiple servers, so that client requests can be spread across them. Several techniques have been proposed for directing client requests to multiple servers, and we discuss these techniques. In this work we aim to study the extent and methods of content replication and distribution at web sites. To understand how content is distributed and replicated, we ran client programs that retrieved the headers and bodies of web pages and observed how they changed over multiple requests. We also aim to understand the problems that clients of such sites could face as a result of caching and of the standardization of newer protocols such as HTTP/1.1. The main contribution of this work is an understanding of how replicated and distributed content is actually implemented on multiple servers and what this implies for clients. Our investigations revealed problems with replicated and distributed content, and with its effect on caching, caused by incorrect identifiers sent by different servers serving the same content. We were able to identify web sites that use application-layer switching mechanisms such as DNS and HTTP redirection. Detecting switching at lower layers required examining the HTTP responses from servers, and this was hampered by the insufficient tags sent by servers. We find that web sites distribute a large amount of embedded content, and the ramifications of this for HTTP/1.1 need further investigation.
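The method described above, fetching the headers and bodies of a page repeatedly and looking for servers that label the same content with different identifiers, can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual client programs: it assumes the identifiers of interest are the HTTP `ETag` and `Last-Modified` headers, and the names `fetch_samples`, `header_variation`, and `etag_conflicts` are hypothetical.

```python
import hashlib
from collections import defaultdict
from urllib.request import Request, urlopen


def fetch_samples(url, n=5, timeout=10):
    """Hypothetical helper: GET `url` n times, return a list of (headers, body).

    Header names are lower-cased. urlopen follows HTTP redirects silently,
    so this helper does not by itself reveal HTTP redirection.
    """
    samples = []
    for _ in range(n):
        # Ask intermediate caches to revalidate so replicas are actually reached.
        req = Request(url, headers={"Cache-Control": "no-cache"})
        with urlopen(req, timeout=timeout) as resp:
            headers = {k.lower(): v for k, v in resp.headers.items()}
            samples.append((headers, resp.read()))
    return samples


def header_variation(samples, fields=("etag", "last-modified", "server")):
    """Return {field: set_of_values} for each field whose value differs
    across samples. A missing header is recorded as None."""
    varied = {}
    for field in fields:
        values = {headers.get(field) for headers, _ in samples}
        if len(values) > 1:
            varied[field] = values
    return varied


def etag_conflicts(samples):
    """Return {body_sha256: set_of_etags} for every byte-identical body that
    was labelled with more than one ETag across samples."""
    by_body = defaultdict(set)
    for headers, body in samples:
        digest = hashlib.sha256(body).hexdigest()
        by_body[digest].add(headers.get("etag"))
    return {d: tags for d, tags in by_body.items() if len(tags) > 1}


if __name__ == "__main__":
    # Synthetic samples: the same body served with two different ETags,
    # as might happen when two replicas compute identifiers independently.
    demo = [
        ({"etag": '"3e1-a"', "server": "Apache"}, b"<html>same</html>"),
        ({"etag": '"3e1-b"', "server": "Apache"}, b"<html>same</html>"),
    ]
    print(header_variation(demo))
    print(etag_conflicts(demo))
```

Comparing bodies by hash separates two cases: different ETags on different content is expected, while different ETags on byte-identical content suggests replicas that assign identifiers independently. In the latter case a conditional request (`If-None-Match`) routed to another replica may fail to match and force a full transfer.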