PLDC Reading Club

Welcome to the official PLDC reading club!

The idea is to propose papers to discuss within the group. From time to time (maybe weekly), someone proposes a paper and presents it during the group meeting.

Ideally, the presenter should write down a summary with the most interesting points of the paper and the discussion, and upload it to this wiki!

The following papers have already been discussed (click to hopefully read an awesome summary :P). The last one is the next paper to be discussed.

Paper presented in the reading group: SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services

Presentation Date: 2014/01/22

Published at: SOSP'13

Presenter: Ying Liu

Motivation: A common way for an application to deliver good service to its clients is to deploy it in several geographic locations, so that each client is served by the closest deployment, which reduces access latency. The easiest way to deploy a geo-distributed service is to use a public Cloud service. However, the Cloud services do not directly manage the replication of the application's data. SpanStore is built to compute an optimal plan for deploying and replicating a geo-distributed storage service on public storage Clouds, minimizing cost while satisfying the application's requirements.

Implementation: Major Components:

SpanStore Instance: It manages the metadata generated by the workload and the optimal replication policy calculated by the Placement Manager.

Placement Manager: It calculates the optimal replication policy offline from two static inputs and one dynamic input. The static inputs are the client application requirements (latency SLOs, consistency and fault tolerance) and the data center metrics (inter-DC latencies and prices). The dynamic input is the workload pattern periodically reported by the SpanStore Instances. The resulting policy is then distributed to all SpanStore Instances.

Library: A library that sits between client requests and SpanStore and forwards each request to the correct Cloud storage instances.
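To make the role of the library concrete, here is a minimal sketch of how a request could be routed once a replication policy is known. Everything in it is an assumption made for illustration: the `ReplicationPolicy` and `ClientLibrary` names, the data center and storage service labels, and the rule that a PUT is forwarded to every replica are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical policy record, as a Placement Manager might publish it."""
    replicas: tuple   # storage services that hold a copy of the data
    get_from: dict    # client data center -> replica that serves its GETs


class ClientLibrary:
    """Toy stand-in for the library between the application and the stores."""

    def __init__(self, local_dc, policy):
        self.local_dc = local_dc
        self.policy = policy

    def route_get(self):
        # A GET goes to the one replica the policy assigns to this data center.
        return self.policy.get_from[self.local_dc]

    def route_put(self):
        # Assumption: a PUT is forwarded to every replica in the policy.
        return list(self.policy.replicas)


# Made-up service names and mapping, for illustration only.
policy = ReplicationPolicy(
    replicas=("s3-us-east", "azure-eu-west"),
    get_from={
        "us-east": "s3-us-east",
        "eu-west": "azure-eu-west",
        "us-west": "s3-us-east",
    },
)
lib = ClientLibrary("us-west", policy)
print(lib.route_get())
print(lib.route_put())
```

The point of the sketch is the division of labour: the policy is computed elsewhere and only looked up here, so the request path itself stays cheap.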

Optimization Problem: The costs of hosting a geo-replicated storage service in public Cloud services include Storage Cost, Request Access Cost and Data Transfer Cost. These costs differ across Cloud service providers and across their data center locations, and no single provider is the cheapest on every dimension. Furthermore, the workload pattern of an application also influences its optimal replication policy. With these costs and constraints identified, the optimization problem is solved with Integer Linear Programming.
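The shape of this optimization can be shown with a toy example. The sketch below is not the paper's Integer Linear Programming formulation: it simply enumerates every replica set and keeps the cheapest one that meets a GET latency SLO. All prices, latencies, workload figures and names (`STORAGE_PRICE`, `plan_cost`, `best_plan`, the services "A", "B", "C") are made up for illustration, and PUT costs are ignored.

```python
from itertools import combinations

# All figures are invented for illustration; they are not real prices.
STORAGE_PRICE = {"A": 0.10, "B": 0.08, "C": 0.12}     # $ per GB-month
REQUEST_PRICE = {"A": 0.004, "B": 0.010, "C": 0.005}  # $ per 1000 GETs
TRANSFER_PRICE = {"A": 0.12, "B": 0.05, "C": 0.09}    # $ per GB read
LATENCY_MS = {  # client region -> storage service -> GET latency
    "us": {"A": 20, "B": 90, "C": 150},
    "eu": {"A": 100, "B": 25, "C": 120},
    "asia": {"A": 180, "B": 160, "C": 30},
}
# region -> (thousands of GETs, GB read) per month
WORKLOAD = {"us": (500, 40), "eu": (200, 15), "asia": (50, 5)}
DATA_GB = 100


def plan_cost(replicas, slo_ms):
    """Monthly cost of a replica set, or None if a region misses the SLO."""
    cost = sum(STORAGE_PRICE[r] * DATA_GB for r in replicas)
    for region, (kgets, gb_read) in WORKLOAD.items():
        eligible = [r for r in replicas if LATENCY_MS[region][r] <= slo_ms]
        if not eligible:
            return None
        # Each region reads from its cheapest replica that meets the SLO.
        cost += min(
            REQUEST_PRICE[r] * kgets + TRANSFER_PRICE[r] * gb_read
            for r in eligible
        )
    return cost


def best_plan(slo_ms):
    """Exhaustively search replica sets; return (replicas, cost) or None."""
    best = None
    services = sorted(STORAGE_PRICE)
    for k in range(1, len(services) + 1):
        for replicas in combinations(services, k):
            cost = plan_cost(replicas, slo_ms)
            if cost is not None and (best is None or cost < best[1]):
                best = (replicas, cost)
    return best


print(best_plan(200))
print(best_plan(100))
```

With these invented numbers, a loose 200 ms SLO is met most cheaply by a single replica, while tightening it to 100 ms forces a second replica and raises the cost. Exhaustive search only works for a handful of data centers, which is one reason to use a solver for the real problem.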

Other Techniques: The paper also presents metadata management and consistency management, but the approaches are very basic. Much could be improved in how efficiently the consistency requirements are met.

Comments: This paper presents a good idea: integrating several Cloud storage providers to give clients a unified view, together with an optimal pricing plan for hosting their data and services. In my view, the consistency algorithms in SpanStore are not sufficiently implemented or discussed in the evaluation. The evaluation only covers a geo-spread workload with an eventual consistency requirement. The performance of SpanStore under a stringent consistency requirement with a geo-spread workload is somewhat hidden in the paper. Given their two-phase locking algorithm, I would expect no performance improvement, or even a performance degradation, from using SpanStore in that setting.
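The concern about two-phase locking can be illustrated with a back-of-envelope latency model. This is only a sketch under stated assumptions, not the protocol from the paper: it assumes an eventually consistent PUT is acknowledged by the nearest replica, while a strongly consistent PUT needs one round trip to lock all replicas and a second one to write to them, with locks released in the background. The round-trip times and function names are hypothetical.

```python
# Hypothetical round-trip times (ms) from one client data center to each replica.
RTT_MS = {"near": 10, "mid": 80, "far": 200}


def eventual_put_latency(rtts):
    # Assumption: the PUT returns once the nearest replica has the data;
    # the other replicas are updated in the background.
    return min(rtts.values())


def strong_put_latency(rtts):
    # Assumption: one round to acquire locks on all replicas, then one round
    # to write to all of them; each round is as slow as the farthest replica.
    slowest = max(rtts.values())
    return 2 * slowest


print(eventual_put_latency(RTT_MS))
print(strong_put_latency(RTT_MS))
```

Under these assumptions the strongly consistent PUT is dominated by the farthest replica, so a placement that is optimal for cost may do little for latency when replicas are far apart.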

