wiki:2012/Projects/Caching

Project Goal: To provide an in-network caching service to end users so that the latency of retrieving requested content is reduced. The caching service is to be installed on multiple virtual machines instantiated on intermediate routers called points of presence.

Challenges: The project involves three major challenges. The first is deciding what types of files to cache. To address it, we will agree on a set of configuration rules and configure the cache service accordingly; the rules could depend on the size of a file as well as the request frequency for that file. The second is deciding where on the network files should be cached so that the end user can retrieve them quickly. One possible solution is to cache content on a well-connected node that links to many nodes spread around the network, some of which are likely to be close to the upcoming location of a moving mobile device. Also, if we can detect that a mobile device is travelling on public transportation, we can predict the device's next location and cache the file directly on the router close to that location. The third challenge is caching overhead: what routing information about a packet's route through the network should be stored on each node as the packet traverses from one node to another.
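As a rough sketch of the first challenge, the admission rule could combine a file-size cap with a request-frequency threshold. The thresholds below are illustrative placeholders, not agreed project parameters:

```python
# Sketch of a cache-admission rule based on file size and request frequency.
# MAX_FILE_SIZE and MIN_REQUESTS are placeholder values, not settled rules.
from collections import Counter

MAX_FILE_SIZE = 100 * 1024 * 1024   # skip files larger than 100 MB
MIN_REQUESTS = 3                    # cache only after 3 requests for a file

request_counts = Counter()          # requests seen so far, per URL

def should_cache(url, size_bytes):
    """Return True if this file qualifies for caching under the rules."""
    request_counts[url] += 1
    return size_bytes <= MAX_FILE_SIZE and request_counts[url] >= MIN_REQUESTS
```

Under these placeholder thresholds, a small file qualifies on its third request, while an oversized file never does regardless of popularity.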

We are implementing our cache on virtual machines to make better use of the servers. Each virtual machine will cache some type of content based on rules we determine. With VMs, we can allocate the most resources to whichever service is in demand during a particular time period. For instance, if there is a period during which users are downloading only video files, we can shut down all VMs that are not caching video files and allocate the freed resources to the VMs that are. To avoid crashing a VM by caching too many content files on it, we will load balance: we will set a caching limit for each VM, and when that limit is reached we will instantiate more VMs and cache any new content files on them.

As of now, we have installed 3-4 VMs on sb5 and installed the Squid caching proxy server on node1-1 on sb5. To get this far, we had to set up a tunnel between our laptop and node1-1 on sb5. To instantiate VMs on the node, we first had to install the KVM hypervisor on the node's Ubuntu OS. We then had to set up a bridge and configure networking on the VM so that a VM instantiated on the node could reach the external network. We also installed the UltraVNC viewer on our laptop so that we could view our VM on the node through the tunnel between the laptop and the node. After setting up the VMs, we tried out the Squid proxy server on our laptops: we installed and configured it on one laptop so that only certain sites would be allowed when browsing requests went through the proxy, then configured the browser on the other laptop to go through the proxy on the first laptop and verified that the browsing requests did go through it.
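The site whitelist we tried on the laptop proxy can be expressed in `squid.conf` roughly as follows; the domain names are placeholders, not the actual sites used in our test:

```
# squid.conf fragment: allow browsing only to whitelisted sites.
# The domains listed are illustrative placeholders.
acl allowed_sites dstdomain .example.com .ubuntu.com
http_access allow allowed_sites
http_access deny all
```

Squid evaluates `http_access` rules in order, so the final `deny all` blocks every request that does not match the whitelist.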

Our immediate goal is to install Squid on the node and test the cache service by running the wget command, configured to work behind the proxy server on the node, on each VM to download the same large file. This will let us determine whether that large file is cached once it has been requested by multiple VMs. During the next week we also want to determine the maximum number of VMs we can operate on a single node without crashing it, as well as the data storage limit of a single VM.
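From inside a VM, the wget test could look roughly like this; the proxy address and file URL are placeholder values, and network access to the proxy is assumed:

```shell
# Route wget through the Squid proxy on the node (placeholder address/port).
export http_proxy=http://192.168.122.1:3128

# Fetch the same large file twice; -S prints the server response headers,
# so the second request can be checked for a cache hit (Squid reports one
# in its X-Cache header and in its access log).
wget -S -O /dev/null http://example.com/large-file.iso
wget -S -O /dev/null http://example.com/large-file.iso
```

Repeating the second fetch from the other VMs would then show whether a file cached for one VM is served from the cache for all of them.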

Last modified on Jul 2, 2012, 2:22:53 PM