Nonlinear Filtering of Noisy Data

Abstract

Median filters, implemented in Perl, remove noise from data. The example application demonstrates the Perl inline module and uses data generated by the Hofstadter chaotic sequence.

Test Data

The Hofstadter sequence is computed by this Perl code:
#!/usr/bin/perl
 
for ($i=1; $i<4000; $i++)
{
  print $i," ",qhof($i),"\n";
}
 
sub qhof
{
  my ($n)= @_;
  if ($n==1 or $n==2) {return 1}
  $qc{$n} ||= &qhof($n - &qhof($n-1))+ &qhof($n - &qhof($n-2));
  return  $qc{$n}
} 

Of note is use of a hash for the computed values of the qhof() subroutine and the use of ||= in the assignment. This the 'Orcish Maneuver' in Perl. A more general approach is to use the Memoize module. The ||= operator has the effect of caching the value returned by the recursive subroutine. In this program it speeds up the otherwise extremely slow recursive calculation. While the perl code is much slower than the C code equivalent, this example shows how the algorithmic improvement (that is, caching) can often overcome raw efficiency, and minimize the coding effort.

Using the inline module without caching, the code looks like this:

#!/usr/bin/perl

use Inline C => 'DATA';

for ($i=1; $i<4000; $i++)
{
  print qhof($i),"\n";
}
 
__END__
 
__C__
 
int qhof(int n) {
  if ((n==1) || (n==2)) return 1;
  return qhof(n-qhof(n-1))+qhof(n-qhof(n-2));
}

The Hofstadter sequence makes nice test data for filtering. It looks like this:

This graph was made with gnuplot using these commands:

set terminal gif
set output "/perl/smoothing/hofstadter.gif"
plot "pts"
Where "pts" is a file created by the perl program above.

The nice thing about the Hofstadter sequence is that it looks like a noisy version of y=x/2.

set terminal gif
set xrange [0:4000]
set linestyle 3 lw 2
plot "pts",x/2 ls 3 
Since lines with width don't work with gif output in gnuplot, I used xwd instead to grab the bitmap from the Gnuplot graphics window:
xwd -name Gnuplot > x.xwd
convert x.xwd hof_fit1.gif
Here is the code for the median filter:
#!/usr/bin/perl 

my $cnt=0;
my ($i, $j, $p);
my @pts;
my @arr;
my $median_samples=25;

while(<>)
{
  chomp;
  ($i, $p)=split(/ +/,$_);
  push @pts, $p;
  $cnt++;
}

my $hmedian= int($median_samples/2);
for ($i=0; $i<=$cnt-$median_samples; $i++)
{
  my @arr;
  for ($j=0; $j<=$median_samples; $j++)
  {
    $arr[$j]= $pts[$i+$j]
  }
  my @sorted= sort {$a <=> $b} @arr;
  push @filtered, $sorted[$hmedian];
}

for ($i=0; $i< $hmedian; $i++)
{
  unshift @filtered, $filtered[0];
  push @filtered, $filtered[$#filtered];
}

# print map "$_\n", @filtered;
for ($i=0; $i< $cnt; $i++)
{
  print $i," ", $filtered[$i],"\n";
}
Here is the result of the fit, compared to the original sequence and output:

This filter does a nice job of noise reduction, but there are other filters that will similarly well on this type of data. A moving average filter is one example. A moving average filters is one special case of a wider class of filters that can be designed to remove this type of noise. The median filter has several advantages over moving average filters, so now I will do a comparison, to see if for some noise reduction tasks the median filter can outperform the moving average filter.

Moving Average Filter

The moving average can be implemented like this in perl:
#!/usr/bin/perl

use strict;

my $cnt=0;
my ($i, $p);
my @pts;
my $mov_avg_pts=25;
my $total=0;

while(<>)
{
  chomp;
  ($i, $p)=split(/ +/,$_);
  push @pts, $p;
  $cnt++;
}

for ($i=0; $i<$mov_avg_pts; $i++)
{
  $total += $pts[$i]
}

for ($i=$mov_avg_pts; $i<$#pts; $i++)
{
  print $i," ",$total/$mov_avg_pts,"\n";
  $total += $pts[$i]-$pts[$i-$mov_avg_pts];
}

This code takes advantage of the total of the moving average being incrementally calculated from the previously calculated moving average.

For this data, the moving average performs just as well as the median filter. It is also much less computational effort. To see why you might want a median filter, a different data set will be required.

One advantage of the median filter is that it rejects outliers well. A dataset with a 10% dropout rate can be created with this code:

#!/usr/bin/perl

use strict;

my $cnt=0;
my ($i, $p);
my @pts;
my @dpts;

while(<>)
{
  chomp;
  ($i, $p)=split(/ +/,$_);
  push @pts, $p;
  $cnt++;
}

@dpts= map { (rand(1)<.1) || $_ } @pts;
for ($i=0; $i<$#dpts; $i++)
{
  print $i," ",$dpts[$i],"\n";
}
After filtering the data with both a median filter and a moving average filter, Gnuplot graphs can be created with these commands:
set terminal gif
set output "/perl/smoothing/hof_ma2.gif"
plot "pts.drops","pts.drops.movavg","pts.drops.mfilt",x/2

This shows that the median filter rejects the dropouts while the moving average is substantially lowered by the dropouts.

These types of dropouts are common when a data creation process and a data collection process are not contained within a single process. Examples include:

02/17/2001

By toma