Monday, June 07, 2010

Image Upgrade

Slightly improved on my previous images.

I'm getting more into infographics for communicating information. They take much longer to create, but they communicate information at a higher bandwidth, meaning you pick up the information faster.

In this one, I'm splitting the concepts I need to discuss into a conceptual abstraction of these systems and an architectural outline for them (or for the one I'm proposing).

At the conceptual level, a patient exhibits a current Situation, which is the collection of his or her vital signs together with the environmental factors that not only influence those vital signs but also impact the system itself. This Situation is the target of the system's observation and monitoring efforts. By collecting a view of it, the system can make an assessment of the patient's current state and take any needed action, such as alerting a medical professional.

We implement this observation of the patient's situation with a set of sensors attached to the patient that organize into a body area network (a low-power, peer-to-peer wireless network). The sensors upload their data to a central coordinator in the body area network, which, having the collected information, can implement the algorithms to assess the patient's state. It can then initiate the appropriate actions to respond to this assessment.
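
To make that concrete, here's a toy sketch of the coordinator's role in Python. The sensor names and thresholds are made up purely for illustration; the real assessment algorithms are exactly what's still to be worked out.

    # Toy sketch: the coordinator aggregates sensor readings and assesses them.
    # Sensor names and thresholds are illustrative, not from any real device.

    def assess(readings):
        """Map a snapshot of vital signs to a coarse assessment."""
        if readings.get("heart_rate", 0) > 150 or readings.get("spo2", 100) < 85:
            return "alert"          # notify a medical professional
        if readings.get("heart_rate", 0) > 110:
            return "watch"          # increase local monitoring
        return "normal"

    snapshot = {"heart_rate": 72, "spo2": 97}   # uploaded by body-area sensors
    print(assess(snapshot))                     # -> "normal"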

If communication with a medical professional is needed, the middleware component handles the details of passing that communication from the coordinator to the appropriate doctor, nurse, first responder, or technician. The system has to rely on the wireless networking infrastructure in the patient's environment (since we don't want to restrict where the patient goes), which makes it tough to make guarantees about how good the connection will be... if it even has one. The middleware hides all this, taking messages from the coordinator, connecting to WiFi and 3G/4G networks as needed, and delivering them when a path is found to their recipient.
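
In spirit, the middleware acts like a store-and-forward queue. Here's a minimal sketch of that behavior, where available_links() and send_over() stand in for the real WiFi and 3G/4G plumbing:

    # Minimal store-and-forward sketch: queue messages, deliver when a link is up.
    from collections import deque

    class Middleware:
        def __init__(self):
            self.pending = deque()

        def submit(self, message, recipient):
            self.pending.append((message, recipient))   # from the coordinator

        def pump(self, available_links, send_over):
            """Try to deliver queued messages over whatever link is up."""
            for _ in range(len(self.pending)):
                message, recipient = self.pending.popleft()
                link = available_links(recipient)       # e.g. WiFi, 3G/4G, or None
                if link is None or not send_over(link, message, recipient):
                    self.pending.append((message, recipient))  # retry later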

This last piece is where my work will focus.

Thursday, May 27, 2010

Change to the Application Diagram

Small change to the Application Diagram:

The Middleware will provide information about who's interested in the data and what network interfaces are available, both of which will ultimately affect the decisions made, so they've been added as part of the physical context.

Tuesday, May 25, 2010

Publish/Subscribe

I want to go back to the middleware diagram.

There's an agreement implied between the application and the middleware in the messages exchanged between them. The application has to describe enough information, in the proper format, for the middleware components, the location service and the routing protocol, to do their jobs. Since the diagram doesn't lay out any requirements for that, you might deduce that my proposal will be equally ambiguous. In reality, this is part of what I intend to examine in my work.

To start, though, we've noticed that the interaction of the patients and their healthcare professionals fits nicely with what is known as a Publish/Subscribe Architecture.

The idea of Pub/Sub is that some entity produces (publishes) a stream of data packets to some publication, and other entities subscribe to that publication. The architecture then moves all the data packets from the publisher to the subscribers. It's easiest to think of this in terms of a blog (like this one), where each new blog entry is ultimately sent to the RSS readers of those people subscribed to the blog's RSS feed. The only thing that publishers and subscribers need to communicate is the name of the publication.
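
A minimal sketch of the topic-based flavor in Python (the publication name and callbacks are invented for illustration):

    # Topic-based pub/sub in miniature: the only shared vocabulary is the
    # publication name, just like subscribing to a blog's RSS feed.
    from collections import defaultdict

    subscribers = defaultdict(list)   # publication name -> callbacks

    def subscribe(name, callback):
        subscribers[name].append(callback)

    def publish(name, message):
        for callback in subscribers[name]:
            callback(message)

    subscribe("research-blog", lambda m: print("got:", m))
    publish("research-blog", "new entry posted")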

A more complicated implementation is called Content-based Publish/Subscribe. Here the publishers, instead of attaching just a name to each message, attach a series of values, like patientID = 1234, signal = ECG, situation = 'heart attack', etc. An interested party can then subscribe by declaring a filter that says something like, "send me all messages where the 'situation' is 'heart attack'."
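
And the content-based flavor, sketched the same way; the attribute names are just the examples above, and the dispatch is entirely hypothetical:

    # Content-based pub/sub in miniature: subscribers declare a filter over the
    # attribute/value pairs attached to each message.
    subscriptions = []   # list of (predicate, callback)

    def subscribe(predicate, callback):
        subscriptions.append((predicate, callback))

    def publish(attributes):
        for predicate, callback in subscriptions:
            if predicate(attributes):
                callback(attributes)

    # "send me all messages where the 'situation' is 'heart attack'"
    subscribe(lambda a: a.get("situation") == "heart attack",
              lambda a: print("dispatch to cardiologist:", a))

    publish({"patientID": 1234, "signal": "ECG", "situation": "heart attack"})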

You might imagine that Content-based Pub/Sub is harder to implement. It might, however, make the entire system easier to operate.

Thus, one of the questions I have to (want to) answer in my work is:

Which Pub/Sub Architecture better serves the application/system?

Monday, May 24, 2010

A Middleware Architecture for Wireless Health Monitoring

A little bit about what I'm getting into these days.

We all know healthcare costs are going up. We've identified a class of applications, falling under the label Wireless Health Monitoring, that is argued to help reduce these costs. Essentially, they're concerned with the remote observation of vital signs and behavior. They help reduce costs by keeping people out of a hospital bed, potentially moving the elderly from the nursing home to an assisted living facility, or moving them from assisted living to their own home (i.e. "aging in place").

Here's a diagram of the typical application:

A set of sensors is attached to the body and produces a set of signals for the application. We combine these inputs with other signals, like the battery power of these devices and the network conditions, to define an application context, which is organized by the Monitoring component and presented to the Decision Making Component as the current situation.

The Decision Making Component computes the severity of the situation and takes one of a set of possible actions. Low severity, indicating normal or expected vital signs, requires no action unless there is an outstanding request to service or a regular report to generate. As the severity increases, the application starts taking local action to increase the frequency or granularity of monitoring. A high severity level is reported to appropriate healthcare professionals, such as the patient's Primary Care Physician (PCP), nurses, or Emergency Responders.
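
Roughly, the policy looks like this; the severity levels and action strings here are placeholders for whatever the real assessment produces:

    # Sketch of the Decision Making Component's severity-to-action policy.
    def decide(severity, outstanding_request=False):
        if severity == "low":
            return "report" if outstanding_request else "no action"
        if severity == "medium":
            return "increase monitoring frequency/granularity"   # local action
        return "notify PCP / nurse / emergency responder"        # high severity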

The part I'm adding to this dance is the middleware architecture. This part takes responsibility for delivering messages to and from the application, or, more formally, given a message, to deliver that message to all of the intended recipients. Here's a diagram outlining the components of the middleware:

We define a Location Service that looks up the most likely location of the recipients (patients, doctors, nurses, or emergency responders), who could be at home, at work, in transit, or outside their normal routine. The location service uses their past known locations, and potentially their calendar service, to predict a likely location for the person. This location, if it's found, then aids the routing component, which is responsible for finding a set of paths to the recipients.
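
As a toy example of the kind of prediction I mean (the history format and calendar lookup are entirely my own invention):

    # Rough sketch of the Location Service: guess the most likely location from
    # past sightings at the same hour, falling back on a calendar entry if any.
    from collections import Counter

    def predict_location(history, hour, calendar=None):
        """history: list of (hour, location) sightings for one person."""
        if calendar and hour in calendar:          # e.g. {14: "clinic"}
            return calendar[hour]
        seen = Counter(loc for h, loc in history if h == hour)
        return seen.most_common(1)[0][0] if seen else None

    history = [(9, "office"), (9, "office"), (9, "in transit"), (20, "home")]
    print(predict_location(history, 9))   # -> "office"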

My Tasks:

  1. Design the location service and the routing protocol (or a bunch of possible designs)
  2. Prove to you that they meet the needs of the application I just overviewed

Saturday, May 22, 2010

Check Up

I should have known that I'd never really update this guy. This is my perennial resolution to update my research blog more. This one, though, comes with the commitment to update my website as well. I'm going for a CSS3 and HTML5 experience. I also want this page to match that layout as best it can. If these Blogger templates don't let that happen, then I might go with something else.

As far as my research goes, I'm working on my proposal. The idea is to determine the systems infrastructure needed to support Personalized Health IT Applications. Many people are starting to track their own vital signs, and we're entering an era where your doctor can ask you to start tracking them in your everyday life. The question, then, is how we connect you to your doctor in this new, intimate, personalized health monitoring environment.

At the moment, I'm having trouble convincing my advisor that I understand the goal of all this.

Wednesday, June 04, 2008

Summer Intern 2008

I sought an internship for this summer (2008) because I was a little tired of making next to nothing. Ericsson has an R&D facility just north of Pittsburgh and was kind enough to put me to work for the summer. It's my third day, and I'm enjoying it.... well, the work I enjoy, but the whole 8-5 thing is a bummer. Yet, this being the fourth internship of my life, I knew what I was committing to.

The work will be a good experience. It's everything else involved in working that I'm going to have fun commenting on this summer.

Stay tuned.

Thursday, September 06, 2007

Network Simulators

You would think that Computer Science researchers would develop the simplest, most effective software around for each other to use, test their ideas with, and advance the industry as quickly as possible. Not the case. Simulators and support infrastructure are developed hastily and haphazardly, with little concern for ease of use, extensibility, or maintenance. In short, we aren't software engineers, or at least the tight funding opportunities and rapidly approaching deadlines stress us all into quick-and-dirty design tactics. Consequently, I'm left to muddle through the relatively complicated design of this simulator, for which little to no documentation is provided.

Monday, September 11, 2006

Where have I been

It's been a long time since my last research post. I haven't taken the time to write about what I'm doing because I'm actually doing it. Over the past year and a half, I've been exploring my research options here at Pitt and settling on an area. I attempted to move the distributed storage project, which used Netnice, to the PlanetLab testbed, for which I had to develop my own simple quality-of-service implementation. I then, as part of a class, got involved in a file distribution system, which I'm still planning to work on "in the background" with time I don't have.

More recently, though, I've been involved in wireless networks. I evaluated some routing protocols as part of a class, and I've just been getting into a Delay Tolerant Networking approach over the summer.

Thursday, March 16, 2006

Bandwidth Management Scheme

Normal Operation

For any one particular node in the system, the bandwidth is managed in a fairly simple manner. For each incoming connection, the node fairly allocates as much bandwidth to the connection as it can. Thus, the total capacity of the node's link is equally divided among the number of open connections.

Update and Challenges

When the backup interval elapses, signaling that an update of the node's data needs to occur, the node computes the minimum amount of bandwidth needed for all the open connections from remote nodes. Each of these connections is then throttled down to its minimum, and the rest of the bandwidth is dedicated to sending out the updated blocks or downloading the requested blocks. Once the update or challenge completes, the existing incoming connections are repartitioned to a fair share of the total bandwidth provided the opposite challenge or update is not occurring.
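
Sketched in Python, the two allocation modes might look like this; the units and names are arbitrary, and I assume the sum of the minimums fits within the link capacity:

    # Normal operation: fair shares. During an update: throttle everyone to the
    # agreed minimum and dedicate the remainder to moving blocks.
    def fair_shares(capacity, connections):
        share = capacity / len(connections)
        return {c: share for c in connections}

    def update_shares(capacity, minimums):
        """minimums: dict of connection -> agreed minimum bandwidth."""
        reserved = sum(minimums.values())
        update_budget = capacity - reserved        # assumed non-negative here
        return dict(minimums), update_budget

    mins = {"peerA": 10, "peerB": 15}              # units arbitrary
    print(fair_shares(100, list(mins)))            # {'peerA': 50.0, 'peerB': 50.0}
    print(update_shares(100, mins))                # ({'peerA': 10, ...}, 75)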

Assumptions and Simplifications

For this to be accurate, each of the incoming connections must have the same minimum bandwidth requirement. If they did not, then as the number of connections grew and a fair proportion was divided among each of them, the node would first violate the agreement of the connection with the largest minimum requirement, when it could instead partition the bandwidth unequally in such a manner that each connection stayed at or above its minimum.

Presumably, when an update occurs, the outgoing connections could be shaped by the remote nodes in such a way that the total capacity of the node's link is not being utilized. In this case, the node should dedicate more bandwidth to the incoming connections rather than just keeping them at their minimum requirements until the update finishes.

Monday, March 13, 2006

General Prototype Design

Main

The main thread of the prototype initializes the program, creates a listening socket, and loops forever, waiting for connections and timing events. The loop first calculates how much time it can wait before an event must occur. It then calls select() to poll the listening socket for the specified amount of time, which blocks until one of two events occurs: a connection is available to accept or a timeout is reached. If a connection is available, the main thread accepts the connection, creates a PeerConnection object around the socket, and spawns a new thread for the PeerConnection object to process the communication over the socket.

If a timeout occurs, there are two cases the main thread must handle. The first occurs if the timeout means that an update interval was reached. In this case, the main thread spawns another thread calling the update() method, which simulates performing an update by creating objects of and spawning threads for the UpdateConnection class. The second case occurs when a challenge session time is reached. In this case, a similar scenario plays out: the main thread spawns another thread calling the restore() method, which itself spawns multiple threads, each to service a RestoreConnection object that handles the challenge communication with one particular partner.
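
Here's a rough Python skeleton of that loop; the callbacks stand in for the real accept/update/restore machinery described above:

    # Skeleton of the main loop: select() on the listening socket with a
    # timeout computed from the next update/challenge deadline.
    import select, socket, threading, time

    def serve(sock, next_event_time, on_accept, on_timeout):
        while True:
            timeout = max(0, next_event_time() - time.time())
            readable, _, _ = select.select([sock], [], [], timeout)
            if readable:
                conn, _addr = sock.accept()
                threading.Thread(target=on_accept, args=(conn,)).start()
            else:
                threading.Thread(target=on_timeout).start()   # update()/restore()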

Bandwidth Management

All of the bandwidth accounting is done in the main(), update(), and restore() functions. While not described below, each spawned connection thread measures its own bandwidth usage and stores the value in a mutex-protected shared variable. These three functions then compute the total over all their spawned connections in order to determine network utilization. The scheme for this is not yet fully implemented.
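
A minimal Python stand-in for that accounting, with a lock playing the role of the mutex:

    # Each connection thread records its usage in a lock-protected counter;
    # main()/update()/restore() sum these to estimate utilization.
    import threading

    class UsageCounter:
        def __init__(self):
            self._lock = threading.Lock()
            self._bytes = 0

        def add(self, n):
            with self._lock:
                self._bytes += n

        def total(self):
            with self._lock:
                return self._bytes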

PeerConnection

Each PeerConnection is associated with a single accepted socket. The object loops indefinitely, waiting to read data from the socket. The remote host is either trying to store data or retrieve it. Thus, the PeerConnection first reads a single byte that determines which case it should expect. If the remote host wishes to store a block, then the PeerConnection loops until it has read in an entire block, which is then "stored to disk," and by that I mean discarded. If the remote host is asking for a block, the PeerConnection generates a block of data and loops until the entire 4KB block has been sent. It then returns to block on a recv() until another byte of data can be read from the socket. When the socket is closed, the PeerConnection is terminated.
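
Sketched in Python, with invented opcode bytes (the prose doesn't pin down the actual values):

    # Sketch of the PeerConnection loop: one opcode byte selects store vs.
    # retrieve, then a full 4KB block is read or written.
    BLOCK = 4096

    def peer_connection(conn):
        while True:
            op = conn.recv(1)
            if not op:                      # socket closed by the remote host
                break
            if op == b"S":                  # remote host stores a block
                remaining = BLOCK
                while remaining:
                    chunk = conn.recv(remaining)
                    if not chunk:
                        return
                    remaining -= len(chunk) # block is "stored", i.e. discarded
            elif op == b"R":                # remote host retrieves a block
                conn.sendall(bytes(BLOCK))  # generate and send a whole block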

UpdateConnection

Each UpdateConnection is associated with one particular partnership. It picks a random number of blocks, connects to the partner, and sends the blocks to the remote host while implementing the simple protocol, described above, that lets the PeerConnection respond correctly. Once it has completed, it closes the socket and terminates the thread.

RestoreConnection

Each RestoreConnection is, similar to UpdateConnection, associated with one particular partnership. It randomly selects a number of blocks to retrieve from the partner, connects to the host, and asks for blocks one at a time. Once each has been received, it closes the socket and terminates the thread.

For simplicity, it is assumed the remote peer will generate some kind of data to send back, whereas an actual implementation would have to account for the possibility that the host is not there or can't produce the data.

Monday, March 06, 2006

Netnice Reloaded

Netnice provides QoS by setting up what are called VIFs in the /proc/networks directory. Each VIF can be given a weight, priority, and/or bandwidth limitation in order to achieve a network QoS model where the collection of VIFs share the physical network connection.

For the purposes of this experiment, it seems as though we would need a VIF for each connection that a remote peer initiates with a node (at the least), since it's the node's responsibility to shape its peers' access to itself. However, it would be best to shape the connections the software makes as well. Therefore, the software will need to dynamically create and destroy VIFs for each connection it accepts, creates, and terminates. It will further need to be able to set the QoS parameters for each based on established partnership agreements.

I need to find some code samples that do this.... www.netnice.org has none.

Thursday, March 02, 2006

Connection detail

There are two "events" that need to be dealt with in the software: the point in time when an update needs to occur where new data is sent to the partners and the event when a partner connects to the node in order to store data. (We don't need to worry about a host connecting in order to negogiate a parntership just yet).

On an Update

Here, there are several things that need to happen. First, a comparison between an old snapshot and the new state of the data needs to be made in order to identify those blocks that need to be updated on other partners. Then, one at a time, a connection needs to be formed with each of the necessary partners and the data sent while bandwidth is available to support the transfer. In each connection, we need to identify this host to the remote partner, in turn specify the blocks that are going to be sent (given an assumed block size), and send them, allowing the remote partner to determine the transfer rate (using Netnice).

We model this by assuming a large file of data segmented into blocks, a subset of which are randomly selected and assumed to have changed, and which conveniently reside on a single remote partner. Thus, the software needs to form a single connection with this one partner, and it can allow its IP address to identify itself. It then sends a block ID number of some kind followed by the block itself.

Accepting a connection

When accepting a connection, the software first needs an identification from the remote, connecting partner so as to determine which of its partnerships this connection is servicing. Second, it needs to determine whether this particular connection is requesting data from it or requesting to store an updated block (again forgoing the possibility that this might be an attempt to negotiate a new partnership). Then, it can expect a block ID number to follow. If it's a request for a restore, the host should send some kind of acknowledgement followed by the block. If not, then the remote partner should immediately begin sending the block to store.
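
One possible wire layout for this exchange, sketched in Python; all the field sizes here are my own assumptions, since the prose leaves them open:

    # Hypothetical framing: partner ID, a store/restore request byte, block ID.
    import struct

    HEADER = struct.Struct("!16s c I")   # 16-byte partner ID, opcode, block ID

    def make_header(partner_id, is_restore, block_id):
        op = b"r" if is_restore else b"s"
        return HEADER.pack(partner_id.ljust(16)[:16], op, block_id)

    hdr = make_header(b"partner-42", is_restore=True, block_id=7)
    pid, op, block_id = HEADER.unpack(hdr)   # serving side parses, then ACKs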

Aside from the file system interface, this needs a full implementation.

Notes

  • It should be the connecting host's responsibility to terminate all connections. The accepting (serving) host should be prepared to store or restore successive blocks until the connection is terminated.

Sunday, February 26, 2006

Rough Software Spec

I'm hoping Landon Cox will send me the Samsara code soon, but, as was discussed, it's not wise to wait on him. Thus, this is a bare minimum description of what the software prototype needs to do in the event we don't get Samsara as a base from which to work.

File System.

The full scale software system would need some kind of configuration interface where the user could specify the set of directories or disk mounts that the software should back up. From this the software would essentially create a single huge block of addressable data, which would then be split into chunks for distribution. It would then need to be able to track changes made to the given set of directories and map those changes into the single block.

For this prototype, it would suffice to use a single huge file to model the addressable data space the full scale system would create. Some means of modifying this file would then need to be devised. The prototype would then need to notice the change and accurately determine the section of the file that has been modified. We discussed the use of Delta encoding for this purpose (we have to get more information about its implementation if Samsara never comes).

At a bare minimum, the prototype could simply maintain several small blocks, a subset of which are assumed to have changed with each backup "epoch."

Questions

Given a set of directories, the prototype flattens them and maps an address space over the result. What then happens as files are added to the directories? Would the flattened block then grow from the middle, shifting all subsequent data to different addresses?

Partnership management.

While Samsara/Pastiche have already tackled the problem of finding partners based on comparable software installations, our system would have to add a service level agreement advertisement and negotiation scheme to this for a full-blown implementation of the Hard Data Proposal. This would require the user to give a configurable estimate of how much data changes, how variable that measure is, how quickly it should be backed up, and how "important" it is to get good service, which would direct the system to give up proportionately more resources relative to its needs, making a more desirable offer and attracting better partners.

At a minimum, the prototype could assume a set of partnerships (as little as one), each with an assumed service level agreement that may or may not be an equal partnership.

Per Partnership needs

For each partnership, the prototype would need to store:

  • From The Partner
    • The service level agreement for the data being stored, which comprises the size of data to store and the minimum bandwidth needed
    • An identifiable tag for each block being stored for the partner (perhaps the starting address and size)
    • The location of all blocks for which the local node is responsible
  • To The Partner
    • The service level agreement for the data being held by the partner
    • An identifiable tag for each block being held by the partner
    • The segments of the address space that the partner is servicing, i.e. storing the equivalent blocks

On an Update or Restore

When a connection is created, the connecting host either wants to update the data stored or retrieve it. The initial communication of the connection should communicate this, along with a requested amount of bandwidth for the update or restore and an identification of the blocks to be accessed. The serving host can then grant the connecting host a bandwidth allotment between the minimum agreed-upon bandwidth and the requested amount, where the request could indicate the host wants all that's available.
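
The grant rule boils down to a one-liner; here's a sketch, treating a request of None as "give me everything available":

    # Grant at least the agreed minimum, at most the requested amount.
    def grant(agreed_min, requested, available):
        want = available if requested is None else requested
        return max(agreed_min, min(want, available))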

Responsibilities

The serving host must be able to control this as the update or restore progresses while other connections from other partners are created and severed. Furthermore, it is the responsibility of the serving host to monitor and control this bandwidth, because a connecting host greedily wants as much as possible while the serving host only wants to give up the smallest amount of resources that ensures the partnership continues. It is then the connecting host's responsibility to ensure its service level agreement is not being starved.

Questions

The question that arises from this is the serving host's ability to manage its network connection. Should it only make agreements such that the sum of the minimum bandwidth requirements is no more than its network capacity? Or can it overshoot its capacity, relying on the fact that not all partners will be using the connection at once? If so, can it then deny or delay a connection based on preoccupied resources? Is this then grounds for termination of an SLA? If not, how frequently can it occur before an SLA should be terminated? (While this would be interesting to research, it would require a large number of hosts and a significantly long experiment. So what assumptions can we make?)

Update Frequency

Based on the user settings, the prototype would have to ensure the given data, either the single file or the set of directories, is up to date within a given timeframe. The simplest way to do this would be to have the prototype wake up at regular intervals in step with the user configuration. Upon such an event, the prototype would need to examine the entire set of data in some manner and compare the result with that of the last wakeup event. This comparison should yield a set of address ranges that have changed since the last update, which are then compared with each partnership to obtain the set of partners that are storing the corresponding old data. The prototype then connects to each and uploads the newer data while monitoring the upload bandwidth of each connection to ensure it meets its SLA. Once complete, it retires until the next update.
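
The epoch loop in outline, sketched in Python; every callback here is a placeholder for machinery described elsewhere in this post:

    # Wake at each interval, diff against the last snapshot, group changed
    # ranges by the partner storing them, upload, then sleep again.
    import time

    def run_epochs(interval, snapshot, diff, partner_for, upload):
        last = snapshot()
        while True:
            time.sleep(interval)
            current = snapshot()
            for addr_range in diff(last, current):      # changed address ranges
                upload(partner_for(addr_range), addr_range)
            last = current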

Considerations

To do this, the prototype would need to be able to map the address ranges back to the directory structure, which could potentially map over several files and directories.

The prototype supposedly would determine whether the outgoing network capacity was sufficient to service the user's demand at the time the user configured the software. But this relies on an average update size and a measure of deviation from it. Thus, the prototype could potentially have more data to update than expected, or a set of circumstances could arise that otherwise impedes the update capability. The prototype must therefore also manage the outgoing bandwidth such that the update occurs as quickly as possible but its own network capacity does not become the bottleneck, restricting the bandwidth of a connection in a way the prototype might incorrectly read as the serving host violating the SLA. One way of overcoming this obstacle is to make a single connection at a time, dedicating the entirety of the upload pipe to it and then allowing the serving host to throttle the bandwidth down. Subsequent connections can then be made, one after another, with a similar protocol until there is not enough bandwidth left to properly service another connection.

Lastly, it's not difficult to assume that updates to the data do not occur all at once, but rather are spread over the update interval. Therefore, the size of the needed update would increase with the length of the interval. Network utilization would be negatively affected, though, as larger bursts of transfers would be required at longer intervals, while a more sustained average usage would be needed for shorter, almost continuous, intervals. This, however, assumes that the set of data to be backed up, if it were a set of directories, does not include directories where running processes would be creating temporary files. (We could potentially examine this trade-off.)

Issues/Questions

This raises the issue of what happens if more data needs to be backed up than the software has time to transfer. Would the software potentially miss an update interval? What then happens if an update occurs while it's trying to transfer the previous snapshot? Can we include the update? Would it simply be easier to include that in the next snapshot? Would that violate what the user needs?

Thursday, February 23, 2006

The Plan

Steps to completing this project:
  1. Practice using Netnice
  2. Hope I get Samsara soon (I have heard from Landon Cox at Duke)
  3. Examine how Samsara/Pastiche find and create storage partnerships
  4. Implement bandwidth reservations/restrictions for all partnerships
  5. Implement SLA negotiation between two peers
  6. Ensure the SLA cannot be violated
  7. Optionally augment the Pastiche routing protocol to include finding peers with desired SLA characteristics
  8. Perform some kind of experimentation on the system to determine.... something
  9. Write a paper describing the work and the experiment

Wednesday, February 22, 2006

Netnice

After discussing the basics of Netnice with someone more knowledgeable about it than I am, I've found a potential snag in its usefulness. Apparently, Netnice is capable of limiting the bandwidth on a process-by-process basis, whereas the demands of this application seem to need control on a connection-by-connection basis, and all of the connections may be created within a single process.

However, this may not be an issue. If the Samsara code I'm expecting to get soon forks a separate process for each connection, Netnice should work perfectly. Or, if it's multithreaded in a way similar to the fork() call, which I've seen Linux do, Netnice will probably be able to control its bandwidth with the granularity needed.

I'm expecting the daemon to be a self-contained, single process, though, in which case there are two possible solutions I imagine at the moment. At any given time, only a certain number of connections will be established. Each will have a set bandwidth allocation, and the aggregate total of these could be the value Netnice uses to shape the process' bandwidth. The alternative would be to implement a custom shaping scheme involving the packet size and a timer.
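
The second option, the custom scheme involving packet size and a timer, might amount to simple pacing. A sketch, where the packet size and rate are just parameters (nothing here is a Netnice API):

    # Pace a transfer: after each send, sleep long enough that the average
    # bytes/second stays at the configured limit.
    import time

    def paced_send(sock, data, rate_bytes_per_sec, packet=4096):
        for i in range(0, len(data), packet):
            chunk = data[i:i + packet]
            sock.sendall(chunk)
            time.sleep(len(chunk) / rate_bytes_per_sec)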

Once I get the code, I'll reexamine my options...

Administration

A portion of the bandwidth used by any particular node in a P2P backup system has to be dedicated to administrative purposes. After a service level agreement is negotiated, a node must constantly ensure that its partner is available and that its data is available on that partner's shared storage. While the former is relatively negligible, ensuring the data is available could be rather expensive. The Samsara and/or Pastiche papers suggested using a computed hash of the data to ensure the data is stored on the host, but Lillibridge points out that this will not ensure the partnering node will provide the data when it's needed. The benefit a partner gains from withholding the data, namely the unused bandwidth, seems rather small next to the steep cost of having to hold the data anyway, but it still creates a potential impact on the reliability of the system.

Thus, the added administrative cost is the size of the data block that needs to be transferred, the bandwidth (time) used in performing the transfer, and the frequency with which this needs to occur for all partners of a particular node.

Presumably, this cost is fixed, and hence assumed, for any service level agreement between two partners. It should, however, be chosen so as to minimize the cost of administering the partnership.

Tuesday, February 21, 2006

Negotiation

Each node in the P2P backup system must be able to negotiate its own service level agreements based on the needs specified by the user. Assuming that a node engages in reciprocal storage relationships with N other nodes in the system, it can agree upon an individual service level agreement at the same time with each of the N nodes based upon the portion of the data it wishes to store on any given one of them. The bandwidth needed to perform an update on the data can be divided among the N nodes in proportion to the amount of data that will be stored on the given node, but it may not necessarily equal the bandwidth the node agrees to supply to its partners.
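
For example, the proportional split is just this (the numbers are illustrative):

    # Divide the update bandwidth among N partners in proportion to the
    # amount of data each will store.
    def split_bandwidth(total_bw, stored_bytes):
        whole = sum(stored_bytes.values())
        return {p: total_bw * b / whole for p, b in stored_bytes.items()}

    print(split_bandwidth(100, {"A": 30, "B": 70}))   # {'A': 30.0, 'B': 70.0}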

The duty of enforcing a service level agreement falls upon the supplier. It is in the node's best interest to ensure that none of its partners are consuming more bandwidth or storage than was agreed upon. While enforcing storage space limits has been dealt with in Samsara, a node must be able to restrict a connection from a partner to the agreed-upon bandwidth limit if it needs or chooses to allocate the bandwidth elsewhere. Netnice should work well for this, provided it allows the bandwidth allotted to a given connection to be changed arbitrarily.

A malicious node may throttle a connection below the agreed-upon bandwidth, but a partner in such a situation would be able to terminate the agreement and find another, more benevolent partner. This would then starve malicious nodes, forcing them at the very least to meet the minimum of an agreed-upon SLA.

Monday, February 20, 2006

P2P Backup Systems

The System. Peer-to-peer backup systems offer a relatively cheap alternative to Internet backup sites while potentially providing the added security of spreading backed-up data across multiple, geographically distant hosts. Reliability in such systems depends on ensuring resources are available to perform the backup, store the data on a remote host, and retrieve the data when needed. This requires that a large portion of the hosts meet the needs of other peers to store and supply data when needed, and that the system ensure no single host is unexpectedly or greedily consuming resources (bandwidth or storage) in a way that deprives other hosts of proper use of the system.

Commercialization of these systems then requires each host to make and conform to service level agreements, which simultaneously predict the host's backup behavior and allow the system to ensure resources are available to service this typical behavior.

Service Level Agreements. A typical service level agreement would be similar to a user stating, "I have this directory that needs to be backed up every t [minutes, hours, days, weeks, years], and, on average, X bytes of data change in that time frame." Two criteria come from this: the initial size of the directory, which is the size of the storage that will be needed across the system; and the bandwidth needed to meet the update demands, which requires some analysis.

Bandwidth. If at every interval of time, X and only X bytes of data needed to be backed up, the bare minimum bandwidth needed for the system to back up the data would be X/t. While this may provide a good estimate, it would be wise to assume the user of the backing-up node may have spikes and lulls in the amount of data that needs to be backed up from interval to interval. Thus, a measure of the standard deviation in this average amount of data would allow an upper bound to be determined on the amount of bandwidth needed to service the majority of the user's needs. Furthermore, it would allow the system to reject, or charge extra for, needs that exceed this agreed-upon SLA.
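
As a back-of-envelope function (k, the number of deviations of headroom, is a policy knob of my own choosing):

    # X/t as the floor, plus k standard deviations of headroom for spikes.
    def needed_bandwidth(avg_bytes_X, interval_t, stddev=0.0, k=2):
        return (avg_bytes_X + k * stddev) / interval_t

    print(needed_bandwidth(10_000_000, 3600))            # bare minimum X/t
    print(needed_bandwidth(10_000_000, 3600, 2_000_000)) # with spike headroom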

A corollary to this argument is that, if the bandwidth is available, the node should consume all unused bandwidth to achieve a consistent, backed-up state as soon as possible. Under these conditions, each node must still be guaranteed the minimum bandwidth it needs to meet its needs, which requires the system to be able to back off a node that is consuming more bandwidth than it needs in order to ensure that every other node receives its share of the bandwidth.