Thursday, March 16, 2006

Bandwidth Management Scheme

Normal Operation

For any one particular node in the system, the bandwidth is managed in a fairly simple manner. The node allocates to each incoming connection an equal, fair share of whatever bandwidth it has. Thus, the total capacity of the node's link is divided evenly among the open connections.
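
In other words, the per-connection share is just the link capacity divided by the number of open connections. A toy sketch (the names are mine, not the prototype's):

    #include <cstdio>

    // Hypothetical helper: evenly divide the link capacity among open connections.
    double fair_share_kbps(double link_capacity_kbps, int num_connections) {
        if (num_connections <= 0)
            return link_capacity_kbps;          // no peers: the whole link is free
        return link_capacity_kbps / num_connections;
    }

    int main() {
        // e.g. a 1024 kbps link shared by 4 incoming connections -> 256 kbps each
        std::printf("%.1f kbps per connection\n", fair_share_kbps(1024.0, 4));
        return 0;
    }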

Update and Challenges

When the backup interval elapses, signaling that an update of the node's data needs to occur, the node computes the minimum amount of bandwidth needed for all the open connections from remote nodes. Each of these connections is then throttled down to its minimum, and the rest of the bandwidth is dedicated to sending out the updated blocks or downloading the requested blocks. Once the update or challenge completes, the existing incoming connections are repartitioned to a fair share of the total bandwidth, provided the opposite operation (challenge or update) is not still occurring.
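
Roughly, the throttling step and the repartitioning afterwards could look like the following sketch; the Conn structure and its field names are my own illustration, not the prototype's:

    #include <vector>

    // Illustrative per-connection record; the field names are assumptions.
    struct Conn {
        double min_kbps;    // minimum rate promised in the partnership agreement
        double rate_kbps;   // rate currently granted to the connection
    };

    // During an update/challenge, pin every incoming connection to its minimum
    // and hand whatever is left of the link to the update traffic.
    double throttle_for_update(std::vector<Conn>& conns, double link_kbps) {
        double reserved = 0.0;
        for (Conn& c : conns) {
            c.rate_kbps = c.min_kbps;
            reserved += c.min_kbps;
        }
        double update_kbps = link_kbps - reserved;
        return update_kbps > 0.0 ? update_kbps : 0.0;   // bandwidth left for the update
    }

    // After the update completes, return to an equal split of the link.
    void restore_fair_share(std::vector<Conn>& conns, double link_kbps) {
        if (conns.empty()) return;
        double share = link_kbps / conns.size();
        for (Conn& c : conns)
            c.rate_kbps = share;
    }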

Assumptions and Simplifications

For this to be accurate, each of the incoming connections must have the same minimum bandwidth requirement. If they did not, then as the number of connections grew and the link was divided evenly among them, the node would first violate the agreement of the connection with the largest minimum requirement when, instead, it could partition the bandwidth unequally in such a manner that each connection stayed at or above its minimum.
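
For what it's worth, handling unequal minimums would not be hard: give every connection its minimum first and split any surplus evenly. A purely illustrative sketch (the prototype assumes equal minimums and does not do this):

    #include <vector>

    // Unequal partition that respects per-connection minimums: every connection
    // gets its minimum first, then any surplus is divided evenly among them.
    std::vector<double> partition_with_minimums(const std::vector<double>& mins,
                                                double link_kbps) {
        std::vector<double> rates = mins;
        double reserved = 0.0;
        for (double m : mins) reserved += m;
        double surplus = link_kbps - reserved;
        if (surplus > 0.0 && !mins.empty()) {
            double extra = surplus / mins.size();
            for (double& r : rates) r += extra;
        }
        return rates;   // if surplus < 0 the link cannot honor all agreements
    }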

Presumably, when an update occurs, the outgoing connections could be shaped by the remote nodes in such a way that the total capacity of the node's link is not being utilized. In this case, the node should dedicate more bandwidth to the incoming connections rather than just keeping them at their minimum requirements until the update finishes.

Monday, March 13, 2006

General Prototype Design

Main

The main thread of the prototype initializes the program, creates a listening socket, and loops forever, waiting for connections and timing events. The loop first calculates how much time it can wait before an event must occur. It then calls select() to poll the listening socket for the specified amount of time, which blocks until one of two events occurs: a connection is available to accept or the timeout expires. If a connection is available, the main thread accepts the connection, creates a PeerConnection object around the socket, and spawns a new thread for the PeerConnection object to process the communication over the socket.
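
The skeleton of that loop looks roughly like this; next_event_seconds(), handle_peer(), and handle_timeout() are placeholder names standing in for the prototype's own code:

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <cstdio>

    // Placeholder hooks; in the prototype these would spawn a PeerConnection
    // thread and trigger update()/restore(). The names here are mine.
    static long next_event_seconds() { return 5; }
    static void handle_peer(int fd)  { std::printf("accepted fd %d\n", fd); }
    static void handle_timeout()     { std::printf("timer event\n"); }

    // Sketch of the main loop: poll the listening socket with select() until
    // either a connection arrives or the next timed event is due.
    void main_loop(int listen_fd) {
        for (;;) {
            fd_set readfds;
            FD_ZERO(&readfds);
            FD_SET(listen_fd, &readfds);

            struct timeval tv;
            tv.tv_sec  = next_event_seconds();   // how long we may block before an event
            tv.tv_usec = 0;

            int ready = select(listen_fd + 1, &readfds, nullptr, nullptr, &tv);
            if (ready > 0 && FD_ISSET(listen_fd, &readfds)) {
                int peer_fd = accept(listen_fd, nullptr, nullptr);
                if (peer_fd >= 0)
                    handle_peer(peer_fd);   // wrap in a PeerConnection, spawn a thread
            } else if (ready == 0) {
                handle_timeout();           // update interval or challenge time reached
            }
        }
    }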

If a timeout occurs, there are two cases the main thread must handle. The first occurs when the timeout means that an update interval has been reached. In this case, the main thread spawns another thread calling the update() method, which simulates performing an update by creating objects of, and spawning threads for, the UpdateConnection class. The second case occurs when a challenge session time is reached. Here, a similar scenario plays out: the main thread spawns another thread calling the restore() method, which itself spawns multiple threads, each to service a RestoreConnection object that handles the challenge communication with one particular partner.
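
The update() side amounts to one object and one thread per partnership, joined before the update is considered finished. A sketch, using std::thread and a stand-in UpdateConnection for brevity:

    #include <thread>
    #include <vector>

    // Stand-in for the UpdateConnection class; run() would connect to one
    // partner and push the changed blocks to it.
    struct UpdateConnection {
        int partner_id;
        void run() { /* connect, send blocks, close */ }
    };

    // Sketch of update(): one UpdateConnection (and thread) per partnership.
    void update(const std::vector<int>& partner_ids) {
        std::vector<std::thread> workers;
        std::vector<UpdateConnection> conns;
        conns.reserve(partner_ids.size());       // keep pointers to elements stable
        for (int id : partner_ids) {
            conns.push_back(UpdateConnection{id});
            workers.emplace_back(&UpdateConnection::run, &conns.back());
        }
        for (std::thread& t : workers)
            t.join();   // the update finishes when every partner has been serviced
    }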

Bandwidth Management

All of the bandwidth accounting is done in the main(), update(), and restore() functions. While not described below, each spawned connection thread measures its own bandwidth usage and stores the value in a mutex-protected shared variable. These three functions then total the usage of all of their spawned connections in order to determine network utilization. The scheme for this is not yet fully implemented.
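
The shared variable is just a counter guarded by a mutex, something along these lines (the class name is mine):

    #include <mutex>
    #include <cstddef>

    // Shared counter each connection thread updates as it sends or receives;
    // main(), update(), and restore() read it to estimate utilization.
    class ByteCounter {
    public:
        void add(std::size_t n) {
            std::lock_guard<std::mutex> lock(mtx_);
            total_ += n;
        }
        std::size_t total() const {
            std::lock_guard<std::mutex> lock(mtx_);
            return total_;
        }
    private:
        mutable std::mutex mtx_;
        std::size_t total_ = 0;
    };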

PeerConnection

Each PeerConnection is associated with a single accepted socket. The object loops indefinitely waiting to read data from the socket. The remote host is either trying to store data or retrieve it. Thus, the PeerConnection first reads a single byte that determines which case it should expect. If the remote host wishes to store a block, then the PeerConnection loops until it has read in an entire block, which is then "stored to disk," and by that I mean discarded. If the remote host is asking for a block, the PeerConnection generates a block of data and loops until the entire 4KB block has been sent. It then returns to block on a recv() until another byte of data can be read from the socket. When the socket is closed, the PeerConnection is terminated.
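
In sketch form, with 'S' and 'R' as placeholder opcode values of my own choosing:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstring>
    #include <cstddef>

    static const std::size_t BLOCK_SIZE = 4096;   // 4KB blocks, as in the prototype

    // Sketch of the PeerConnection loop: read a one-byte opcode, then either
    // receive a block to "store" or generate and send a requested block.
    void peer_connection(int fd) {
        char op;
        bool open = true;
        while (open && recv(fd, &op, 1, 0) == 1) {
            char block[BLOCK_SIZE];
            if (op == 'S') {                              // remote host stores a block
                std::size_t got = 0;
                while (got < BLOCK_SIZE) {
                    ssize_t n = recv(fd, block + got, BLOCK_SIZE - got, 0);
                    if (n <= 0) { open = false; break; }
                    got += n;                             // block is then discarded
                }
            } else if (op == 'R') {                       // remote host requests a block
                std::memset(block, 0xAB, BLOCK_SIZE);     // generate dummy data
                std::size_t sent = 0;
                while (sent < BLOCK_SIZE) {
                    ssize_t n = send(fd, block + sent, BLOCK_SIZE - sent, 0);
                    if (n <= 0) { open = false; break; }
                    sent += n;
                }
            }
        }
        close(fd);   // socket closed by the remote host: the PeerConnection terminates
    }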

UpdateConnection

Each UpdateConnection is associated with one particular partnership. It picks a random number of blocks, connects to the partner, and sends the blocks to the remote host while implementing the simple protocol, described above, that the PeerConnection needs in order to respond correctly. Once it has completed, it closes the socket and terminates the thread.
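
The client side of that exchange is roughly the following; the address handling and the 'S' opcode are placeholders matching the PeerConnection sketch above:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>
    #include <cstring>
    #include <cstdlib>
    #include <cstddef>

    static const std::size_t BLOCK_SIZE = 4096;

    // Sketch of one UpdateConnection: connect to the partner, push a random
    // number of blocks using the one-byte "store" opcode, then close.
    void update_connection(const char* partner_ip, unsigned short port) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return;

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(port);
        inet_pton(AF_INET, partner_ip, &addr.sin_addr);

        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            int nblocks = 1 + std::rand() % 16;          // random number of changed blocks
            char block[BLOCK_SIZE];
            std::memset(block, 0xCD, BLOCK_SIZE);
            for (int i = 0; i < nblocks; ++i) {
                char op = 'S';                            // "store this block"
                send(fd, &op, 1, 0);
                send(fd, block, BLOCK_SIZE, 0);           // error handling omitted
            }
        }
        close(fd);   // the connecting host terminates the connection
    }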

RestoreConnection

Each RestoreConnection is, similar to UpdateConnection, associated with one particular partnership. It randomly selects a number of blocks to retrieve from the partner, connects to the host, and asks for blocks one at a time. Once each has been received, it closes the socket and terminates the thread.

For simplicity, it is assumed the remote peer will generate some kind of data to send back, whereas an actual implementation would have to account for the possibility that the host is not there or can't produce the data.
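
For completeness, the retrieve loop is the mirror image of the store loop, something like this (again with the placeholder 'R' opcode and an already-connected socket):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstddef>

    static const std::size_t BLOCK_SIZE = 4096;

    // Sketch of one RestoreConnection: ask the connected partner for a random
    // number of blocks, one at a time, then close.
    void restore_connection(int fd) {
        int nblocks = 1 + std::rand() % 16;      // random number of blocks to retrieve
        char block[BLOCK_SIZE];
        bool ok = true;
        for (int i = 0; ok && i < nblocks; ++i) {
            char op = 'R';                       // "send me a block"
            if (send(fd, &op, 1, 0) != 1) break;
            std::size_t got = 0;
            while (got < BLOCK_SIZE) {
                ssize_t n = recv(fd, block + got, BLOCK_SIZE - got, 0);
                if (n <= 0) { ok = false; break; }
                got += n;
            }
        }
        close(fd);   // the connecting host terminates the connection
    }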

Monday, March 06, 2006

Netnice Reloaded

Netnice provides QoS by setting up what are called VIFs in the /proc/networks directory. Each VIF can be given a weight, priority, and/or bandwidth limitation in order to achieve a network QoS model where the collection of VIFs share the physical network connection.

For the purposes of this experiment, it seems as though we would need a VIF for each connection that a remote peer initiates with a node (at the least), since it's the node's responsibility to shape its peers' access to itself. However, it would be best to shape the connections the software makes as well. Therefore, the software will need to dynamically create and destroy VIFs for each connection it accepts, creates, and terminates. It will further need to be able to set the QoS parameters for each based on established partnership agreements.

I need to find some code samples that do this.... www.netnice.org has none.

Thursday, March 02, 2006

Connection detail

There are two "events" that need to be dealt with in the software: the point in time when an update needs to occur and new data is sent to the partners, and the event when a partner connects to the node in order to store data. (We don't need to worry about a host connecting in order to negotiate a partnership just yet.)

On an Update

Here, there are several things that need to happen. First, a comparison between an old snapshot and the new state of the data needs to be made in order to identify those blocks that need to be updated on other partners. Then, one at a time, a connection needs to be formed with the necessary partners and the data sent while bandwidth is available to support the transfer. In each connection, we need to identify this host to the remote partner, in turn specify the blocks that are going to be sent (given an assumed block size), and send them, allowing the remote partner to determine the transfer rate (using netnice).

We model this by assuming a large file of data segmented into blocks, a subset of which is randomly selected and assumed to have changed, and they conveniently reside on a single remote partner. Thus, the software needs to form a single connection with this one partner, and it can allow its IP address to identify itself. It then sends a block ID number of some kind followed by the block itself.
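
One possible on-the-wire layout for such a message, purely as an illustration (the field sizes are assumptions):

    #include <cstdint>

    // Since the sender's IP address identifies the host, each update message
    // only needs a block ID followed by the block itself.
    struct UpdateMessage {
        std::uint32_t block_id;      // which block is being replaced
        char          data[4096];    // the (assumed) fixed-size block
    };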

Accepting a connection

When accepting a connection, the software first needs an identification from the remote, connecting partner so as to determine which of its partnerships this connection is servicing. Second, it needs to determine whether this particular connection is requesting data from it or requesting to store an updated block (again forgoing the possibility that this might be an attempt to negotiate a new partnership). Then, it can expect a block ID number to follow. If it's a request for a restore, the host should send some kind of acknowledgement followed by the block. If not, then the remote partner should immediately begin sending the block to store.
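
Put another way, the accepting side expects something like the following header before any block data; the field names and sizes here are my own placeholders rather than a fixed protocol:

    #include <cstdint>

    enum RequestType : std::uint8_t { STORE_BLOCK, RESTORE_BLOCK };

    struct RequestHeader {
        std::uint32_t partner_id;    // identifies which partnership is being serviced
        std::uint8_t  type;          // STORE_BLOCK or RESTORE_BLOCK
        std::uint32_t block_id;      // the block to store or return
    };
    // For a RESTORE_BLOCK request the host replies with an acknowledgement and
    // the block; for a STORE_BLOCK request the partner immediately sends the block.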

Aside from the file system interface, this needs a full implementation.

Notes

  • It should be the connecting host's responsibility to terminate all connections. The accepting (serving) host should be prepared to store or restore successive blocks until the connection is terminated.