## Calculating typical network data rate

I have a Windows service that monitors internet connectivity by periodically fetching a web page from a well known web site.

As I’ve recently been having problems with the speed of my internet connection, I thought it would be useful to also download a reference file a few times each day and time how long it takes, in order to monitor download speeds.

The service is configured to run the download speed test several times, at half hour intervals, in the early hours of the morning.

So, I now have a set of download speed samples for each day, but how should these speeds be reported? The minimum, maximum, or average?

The problem with the minimum is that it may be spurious. Most internet connections suffer reduced download speed from time to time and it’s also possible that a low download speed may reflect bandwidth limitations in the local network or higher than usual CPU activity on the server where the test is running.

To some extent this applies to all values less than the maximum sample value. The download speed samples should, ideally, all be a single consistent value rather than a range of values fitting, for example, a normal distribution curve. But in the real world, there will be a range of values and I’d like to reject any samples that are perhaps not an accurate reflection of typical download speeds.

There are some statistical methods for rejecting outlier samples, such as Chauvenet’s Criterion, Grubbs’ Test and Peirce’s Criterion, but they are all questionable techniques, especially when applied without any consideration of the nature of the data and its distribution.

Fortunately, I didn’t need a rigorously correct method; I just needed a means to reject low-value samples that were not an accurate reflection of download speed and would skew any attempt to determine a typical download speed.

I decided that a method based on Chauvenet’s would be quick and simple to implement, but with a small twist. The version I use calculates the standard deviation of the sample set and then rejects any sample that lies more than one standard deviation from the mean (strictly, Chauvenet’s Criterion derives its rejection threshold from the sample size, but a fixed cut-off of one standard deviation is close enough for this purpose). However, while this sample set may contain spurious low values, the high values can be trusted, because it’s not possible to have values beyond the actual maximum download speed that the connection is capable of. So, rather than reject all samples more than one standard deviation from the mean, the method I implemented only rejects samples that are below the mean by more than one standard deviation.

The mean of the remaining samples is calculated and presented as the ‘typical download speed’.
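The method described above can be sketched in a few lines of Python (the function name and the use of the standard library’s `statistics` module are my own choices, not taken from the original service):

```python
import statistics

def typical_speed(samples):
    """Estimate a typical download speed (bits/s) from a list of samples.

    Unlike a two-sided rejection, only samples more than one (sample)
    standard deviation *below* the mean are discarded; the mean of the
    remaining samples is returned.
    """
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    kept = [s for s in samples if s >= mean - stdev]
    return statistics.mean(kept)
```

Note that spuriously high samples are impossible by assumption here, so only the lower cut-off is applied.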

An example set of samples looks like this (download speed in bits per second):

```
6391320
6052659
6264538
5965232
5500726
5723570
5854644
5867441
5822894
4742675
```

A graph of these values helps show that while most of the values are in a fairly narrow range, the first and last samples are, arguably, outliers.

Plotting each sample’s absolute deviation from the mean, in units of the standard deviation, shows that the first and last values, lying more than one standard deviation from the mean, are definite candidates for rejection as outliers, and that sample three is right on the threshold.

The first value, 6391320, is 1.25 standard deviations above the mean of 5818570, while the last value, 4742675, is 2.35 standard deviations below the mean. Both of these samples would be rejected when using Chauvenet’s Criterion, but the modified version I’ve implemented only rejects the lower-value sample.

An average of all the samples gives a value of 5818570. Applying Chauvenet’s Criterion and rejecting the first and last samples results in an average of the remaining samples of 5881463, while using the modified version and rejecting only the last sample gives a sample average of 5938114.

While this modified version of Chauvenet’s Criterion skews the result towards the higher-value samples, I believe that this is correct for this connection speed data.
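The arithmetic above can be checked with a short script (a sketch using Python’s standard `statistics` module, which is my choice and not part of the original service):

```python
import statistics

samples = [6391320, 6052659, 6264538, 5965232, 5500726,
           5723570, 5854644, 5867441, 5822894, 4742675]

mean = statistics.mean(samples)    # 5818569.9
stdev = statistics.stdev(samples)  # sample standard deviation, ~456380

# Two-sided rejection: drop anything more than one standard
# deviation from the mean (the first and last samples).
two_sided = [s for s in samples if abs(s - mean) <= stdev]

# One-sided rejection: drop only the low outliers (the last sample).
one_sided = [s for s in samples if s >= mean - stdev]

print(round(mean))                        # 5818570
print(round(statistics.mean(two_sided)))  # 5881463
print(round(statistics.mean(one_sided)))  # 5938114
```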
