Short Stories

// Tales from software development

Calculating typical network data rate


I have a Windows service that monitors internet connectivity by periodically fetching a web page from a well-known web site.

As I’ve recently been having problems with the speed of my internet connection, I thought it would be useful to also download a reference file a few times each day and time how long it takes, in order to monitor download speeds.
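As a rough illustration of the idea (not the service's actual code), timing a download and converting the result to bits per second could be sketched in Python as follows. The function names and the use of `urllib` are assumptions made for this example, and the URL would be whatever reference file is chosen.

```python
import time
import urllib.request


def bits_per_second(num_bytes: int, elapsed_seconds: float) -> float:
    """Convert a byte count and an elapsed time into a rate in bits per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_bytes * 8 / elapsed_seconds


def measure_download_speed(url: str, timeout: float = 30.0) -> float:
    """Download `url` once and return the observed speed in bits per second.

    Hypothetical helper: the timing includes connection setup, so a small
    reference file will tend to understate the achievable speed.
    """
    start = time.perf_counter()
    total_bytes = 0
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Read in chunks so the whole file never has to sit in memory.
        while chunk := response.read(64 * 1024):
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return bits_per_second(total_bytes, elapsed)
```

Keeping the rate calculation separate from the download makes it easy to check the arithmetic without a network connection.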

The service is configured to run the download speed test several times, at half hour intervals, in the early hours of the morning.

So, I now have a set of download speed samples for each day, but how should these speeds be reported? As the minimum, the maximum, or the average?

The problem with the minimum is that it may be spurious. Most internet connections suffer reduced download speed from time to time and it’s also possible that a low download speed may reflect bandwidth limitations in the local network or higher than usual CPU activity on the server where the test is running.

To some extent this applies to all values less than the maximum sample value. The download speed samples should, ideally, all be a single consistent value rather than a range of values fitting, for example, a normal distribution curve. But in the real world, there will be a range of values and I’d like to reject any samples that are perhaps not an accurate reflection of typical download speeds.

There are some statistical methods for rejecting outlier samples, such as Chauvenet’s Criterion, Grubbs’ Test and Peirce’s Criterion, but they are all questionable techniques, especially when applied without any consideration of the nature of the data and its distribution.

Fortunately, I didn’t need a rigorously correct method; I just needed a means to reject low value samples that were not an accurate reflection of download speed and would skew any attempt to determine a typical download speed.

I decided that a simplified form of Chauvenet’s method would be quick and simple to implement, but with a small twist. As I’ve applied it, the method calculates the mean and standard deviation of the sample set and then rejects any samples that are more than one standard deviation from the mean. (Strictly, Chauvenet’s Criterion derives its rejection threshold from the number of samples rather than using a fixed one standard deviation, so this is a simplification.) However, while this sample set may contain spurious low values, the high values can be trusted because a sample cannot exceed the maximum download speed that the connection is actually capable of. So, rather than reject all samples more than one standard deviation from the mean, the method I implemented only rejects samples that are more than one standard deviation below the mean.

The mean of the remaining samples is calculated and presented as the ‘typical download speed’.
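The rule described above can be sketched in a few lines of Python. This is an illustration rather than the code running in the service: the function name `typical_speed` and the `threshold` parameter are hypothetical, and it assumes the sample (n - 1) form of the standard deviation.

```python
from statistics import mean, stdev


def typical_speed(samples, threshold=1.0):
    """Mean of the samples after dropping those too far below the mean.

    A sample is rejected only if it lies more than `threshold` standard
    deviations BELOW the mean; high values are always kept.
    """
    if len(samples) < 2:
        # A standard deviation needs at least two samples.
        return mean(samples)
    centre = mean(samples)
    # Assumption: sample (n - 1) standard deviation.
    cutoff = centre - threshold * stdev(samples)
    kept = [s for s in samples if s >= cutoff]
    return mean(kept)
```

At least one sample is always at or above the mean, so the list of kept samples can never be empty.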

An example set of samples looks like this (download speed in bits per second):

6391320
6052659
6264538
5965232
5500726
5723570
5854644
5867441
5822894
4742675

A graph of these values helps show that while most of the values are in a fairly narrow range, the first and last samples are, arguably, outliers:

[Figure: download_speed]

Plotting each sample’s absolute deviation from the mean, measured in standard deviations, shows that the first and last values, at more than one standard deviation from the mean, are definitely candidates for outlier values and that sample three is right on the threshold:

[Figure: download_speed_deviation]

The first value, 6391320, is 1.25 standard deviations above the mean of 5818570, while the last value, 4742675, is 2.36 standard deviations below it. Both these samples would be rejected by Chauvenet’s Criterion as applied here (a symmetric one-standard-deviation cut-off), but the modified version I’ve implemented only rejects the lower value sample.
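The deviations quoted here can be reproduced with a short Python sketch. It assumes the sample (n - 1) standard deviation, which is the form consistent with the 1.25 figure for the first sample; the variable names are just for illustration.

```python
from statistics import mean, stdev

samples = [6391320, 6052659, 6264538, 5965232, 5500726,
           5723570, 5854644, 5867441, 5822894, 4742675]

centre = mean(samples)
spread = stdev(samples)  # assumption: sample (n - 1) standard deviation

# Signed distance of each sample from the mean, in standard deviations.
z_scores = [(speed - centre) / spread for speed in samples]

for index, (speed, z) in enumerate(zip(samples, z_scores), start=1):
    flag = "candidate outlier" if abs(z) > 1 else ""
    print(f"{index:2d}  {speed:8d}  {z:+.2f}  {flag}")
```

Only samples 1 and 10 are flagged, and sample 3 falls just under the one standard deviation mark.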

An average of all the samples gives a value of 5818570. Applying Chauvenet’s Criterion in this one-standard-deviation form and rejecting the first and last samples results in an average of the remaining samples of 5881463, while using the modified version and rejecting only the last sample gives an average of 5938114.
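As a check on the arithmetic, the three averages can be computed side by side. This is a sketch under the same assumption of a sample (n - 1) standard deviation, with illustrative variable names.

```python
from statistics import mean, stdev

samples = [6391320, 6052659, 6264538, 5965232, 5500726,
           5723570, 5854644, 5867441, 5822894, 4742675]

centre = mean(samples)
spread = stdev(samples)  # assumption: sample (n - 1) standard deviation

# No rejection at all.
mean_all = centre
# Symmetric rule: drop anything more than one standard deviation from the mean.
mean_symmetric = mean([s for s in samples if abs(s - centre) <= spread])
# Modified rule: drop only values more than one standard deviation below the mean.
mean_low_only = mean([s for s in samples if s >= centre - spread])

print(round(mean_all), round(mean_symmetric), round(mean_low_only))
# prints: 5818570 5881463 5938114
```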

While this modified version of Chauvenet’s Criterion skews the result towards the higher value samples, I believe that this is correct for this connection speed data.


Written by Sea Monkey

July 16, 2014 at 6:00 pm

Posted in Environments, General
