~mcepl/num-utils

5567a5242f3d689cfc2d6878738bd9316feb4409 — Suso Banderas 19 years ago c1ba275 master 0.5
Fifth Beta Release v0.5

- Cleaned up a few bits of code in numsum.
- Adjusted copyright years to 2004.
- Added -c,-r,-x,-y options to numsum to allow it to sum columns
  and rows and specify which rows and columns you want.
  * Special thanks to the folks on #perl on irc.freenode.org for
    helping me out with the regex for this one.
- Added -<n> number hack to numsum to quickly specify the column
  number that you want to sum.  For now it only does columns
  1 through 9 because I need to figure out how to get past
  Getopt::Std to get the whole number specified.  Since this is
  such a hack, I might just leave it as is or drop it.
- Made some changes to the GOALS file in regards to what options
  are being planned.
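
The -<n> shortcut in the list above stops at 9 because each digit is declared to Getopt::Std as its own one-letter flag. One possible way around that is to rewrite a bare -<n> argument into "-c -x <n>" before getopts runs. The sketch below is an assumption for illustration, not something this release does; expand_numeric_shortcut is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper (not part of numsum): turn a bare -<n> argument,
# such as -10, into the equivalent "-c -x <n>" before Getopt::Std sees it.
# Assumes no option value or file name is itself written as -<digits>.
sub expand_numeric_shortcut {
    my @expanded;
    for my $arg (@_) {
        if ($arg =~ /^-(\d+)$/) {
            push @expanded, '-c', '-x', $1;   # the whole number is kept
        } else {
            push @expanded, $arg;
        }
    }
    return @expanded;
}

# Example command line: numsum -10 access_log
@ARGV = expand_numeric_shortcut('-10', 'access_log');

my %opts;
getopts('cdhiIqrs:Vx:y:', \%opts);   # the digit flags are no longer declared

print "c=$opts{c} x=$opts{x} file=$ARGV[0]\n";
```

With this pre-pass the option string would not need the 123456789 flags at all. The trade-off is that an argument such as a negative number passed to -s would also be rewritten, so a real implementation would have to track which arguments are option values.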
17 files changed, 296 insertions(+), 58 deletions(-)

M CHANGELOG
M GOALS
M VERSION
M average
M bound
M interval
M normalize
M num-utils.spec.in
M numgrep
M numprocess
M numsum
M random
M range
M round
M template
A tests/columns
A tests/columns2
M CHANGELOG => CHANGELOG +29 -0
@@ 1,4 1,33 @@

+ Nov. 19th, 2004 04:53 GMT
+----------------------------
+Fifth Beta Release v0.5
+
+ - Cleaned up a few bits of code in numsum.
+ - Adjusted copyright years to 2004.
+
+
+ Aug. 28th, 2004 06:40 GMT
+----------------------------
+
+ - Added -c,-r,-x,-y options to numsum to allow it
+ to sum columns and rows and specify which rows and columns you want.
+
+  * Special thanks to the folks on #perl on irc.freenode.org for helping
+    me out with the regex for this one.
+
+ - Added -<n> number hack to numsum to quickly specify the column number
+  that you want to sum.  For now it only does columns 1 through 9 because
+  I need to figure out how to get past Getopt::Std to get the whole number
+  specified.  Since this is such a hack, I might just leave it as is or drop it.
+
+
+ Aug. 25th, 2004 19:56 GMT
+----------------------------
+
+ - Made some changes to the GOALS file in regards to what options are being planned.
+
+
 Sept. 23rd, 2003 22:23 GMT
----------------------------
 Fourth Beta Release v0.4

M GOALS => GOALS +32 -20
@@ 97,6 97,7 @@ X   -d   -- Debug information for developers.  This implies verbose and more.

     o tar format, both gz and bz2
     o rpm package
     o gentoo ebuild tree
     o deb package

 7. The Makefile should be setup to be able to create the different package


@@ 180,16 181,16 @@ X   -d   -- Debug information for developers.  This implies verbose and more.

   -a   -- Add all numbers in the file, not just the first ones found on each
           line.   
-   -c   -- Treat each line as a set of columns separated by white space, or a
+X  -c   -- Treat each line as a set of columns separated by white space, or a
           string if the -s option is used.  Sum up the values in each column
           and print out the result of each column seperated by the seperation
           character.  This is shown above in the advanced examples section.
-   -s <string>  [for columns]
+X  -s <string>  [for columns]
        -- Use <string> as the separator between each column.  This is allowed
           to be more than one character and possibly even a number.
-   -r   -- Treat each line (by default) as a row of numbers to sum up.  The
+X  -r   -- Treat each line (by default) as a row of numbers to sum up.  The
           results of the sums of each row will be printed on seperate lines.
-   -s <string>  [for rows]
+   -s <string>  [for rows] 
        -- When used with the -r flag, this will specify the seperator for
           rows.  By default it is the new line character.  It could be a
           character, set of characters or even a number.


@@ 207,10 208,18 @@ X   -d   -- Debug information for developers.  This implies verbose and more.

 Options that might be included eventually.

-   -<n> -- Where <n> is some number.  This would be a shortcut for adding up
+X  -x <n> -- Where <n> is some number.  This would be a shortcut for adding up
           all the numbers in the <n>th column of the input.  By default, the
           columns would be determined by white space, but could also be
-           determined by the -s flag.
+           determined by the -s flag.  This must be used with the -c or -r flag.
+           So you can do something like:
+           
+           $ numsum -c -x 10 access_log  
+
+           To get the total bytes transfered in an access_log.
+
+           Maybe this could also be able to handle comma seperated values, so
+           1,5,10 would sum up the 1st, 5th and 10th columns.


-- numgrep --


@@ 238,7 247,7 @@ X  o search for numbers from -10 to 10.

      numgrep /-10..10/ data.txt

-   o search for numbers that are multiples of 7
+X  o search for numbers that are multiples of 7
   
      numgrep /m7/  data.txt



@@ 253,7 262,7 @@ X  o search for numbers from -10 to 10.

   o seach for numbers that are in the set 1, 4, 7, 10, 13 and 16
   
-      numgrep /1..16%3/ data.txt
+      numgrep /1..16i3/ data.txt

 Usage options  (in addition to the standard options)



@@ 317,17 326,17 @@ mode values.
    
  Usage options:

-   -m   -- Print out the mode value of all the numbers entered.  The mode is
+X  -m   -- Print out the mode value of all the numbers entered.  The mode is
           the most frequently occuring value in the set.

-   -M   -- Print out the median of the set of numbers entered.  The median is
+X  -M   -- Print out the median of the set of numbers entered.  The median is
           the middle value all all numbers encountered.  So if the numbers
           88, 12, 2, 1, 9, 100 and 1000 are encountered, the median of that
           set is 12.  Illustrated:

              1 2 9 12 88 100 1000
                    ^^
-   -l   -- Use the lower number of the median on even counted sets.
+X  -l   -- Use the lower number of the median on even counted sets.

   -a   -- average all numbers in the file, not just the first ones found on
           each line.   


@@ 349,10 358,10 @@ mode values.

 Options that might be included eventually.

-   -<n> -- Where <n> is some number.  This would be a shortcut for averaging
+   -n <n> -- Where <n> is some number.  This would be a shortcut for averaging
           all the numbers in the <n>th column of the input.  By default, the
           columns would be determined by white space, but could also be
-           determined by the -s flag.
+           determined by the -s flag.  This must be used with the -c or -r flags.





@@ 361,9 370,12 @@ mode values.

-- normalize --

+  This program will distribute a group of numbers between 0 and 1 by default according
+  to their initial value.  You can change the range using the -R option.

  Usage options:

-  -R <range>  --  THis is for specifying a range to normalize for instead of 0..1
+X  -R <range>  --  This is for specifying a range to normalize for instead of 0..1





@@ 376,14 388,14 @@ mode values.

   Options:
   
-    -n <n>  -- Round to the nearest factor of <n>.  Instead of just rounding all
+X   -n <n>  -- Round to the nearest factor of <n>.  Instead of just rounding all
              decimal numbers, you can also round to a factor of any number.  So
              if you set <n> to 1000 and you encounter the number 6777, it will
              round that number to 7000.  If you set <n> to 3 and encounter the
              number 7, it will round it to 6.
 
-    -c  -- Find the ceiling of each number encountered.  Round up.
-    -f  -- Find the floor of each number.  Round down.
+X   -c  -- Find the ceiling of each number encountered.  Round up.
+X   -f  -- Find the floor of each number.  Round down.


-- range --


@@ 410,11 422,11 @@ mode values.

   -p <prefix> -- Put the following prefix before each number.
   -s <suffix> -- Put the following suffix after each number.
-   -n <separator> -- Use <separator> as a character to separate each number.
+X  -n <separator> -- Use <separator> as a character to separate each number.
                    By default, a space is used.  Use the sequence \n to specify
                    a newline.
-   -N          -- Shortcut for using a newline separator.
-   -e <set>
+X  -N          -- Shortcut for using a newline separator.
+X  -e <set>
      -- Exclude the numbers in <set> from the output.  This is so that if
         you want to do a complex range without including certain numbers.
         <set> is a list of numbers seperated by a ','.

M VERSION => VERSION +1 -1
@@ 1,1 1,1 @@
-0.4
+0.5

M average => average +1 -1
@@ 2,7 2,7 @@

# average:  Find the average of a set of numbers.
#
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

M bound => bound +1 -1
@@ 2,7 2,7 @@

# bound: Find boundary numbers in files or STDIN.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

M interval => interval +1 -1
@@ 4,7 4,7 @@
# first line and the second, between the second line and the third on
# through the end of the file.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

M normalize => normalize +1 -1
@@ 2,7 2,7 @@

# normalize:  Normalize a set of numbers. By default between 0 and 1.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

M num-utils.spec.in => num-utils.spec.in +7 -4
@@ 1,4 1,4 @@
-# $Id: num-utils.spec.in,v 1.11 2003/09/23 22:26:12 suso Exp $
+# $Id: num-utils.spec.in,v 1.12 2004/11/19 04:55:01 suso Exp $

Summary: num-utils are a set of programs for dealing with numbers.
Name: num-utils


@@ 13,9 13,9 @@ Packager: Suso Banderas <suso@suso.org>
Vendor: suso.org
%description
The num-utils, short for numeric utilities are a set of programs designed
-to work together from the Unix shell to do numeric operations on input.
-They are basically the numeric equivilent of common Unix text utilities
-and aim to help complete the Unix shell vocabulary.
+to work together from the unix shell to do numeric operations on input.
+They are basically the numeric equivilent of common unix text utilities
+and aim to help complete the unix shell vocabulary.

%prep
%setup


@@ 54,6 54,9 @@ make ROOT="$RPM_BUILD_ROOT" rpminstall


%changelog
+* Fri Nov 19 2004 Suso Banderas <suso@suso.org>
+- 0.5 release

* Tue Sep 23 2003 Suso Banderas <suso@suso.org>
- 0.4 release
- added file entries for normalize program.

M numgrep => numgrep +1 -1
@@ 3,7 3,7 @@
# numgrep:  This program is the numeric equivilent of the grep
# utility.  It searches for numbers, sets of numbers and so on.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

M numprocess => numprocess +1 -1
@@ 2,7 2,7 @@

# numprocess: This program mutates numbers as it encounters them.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

M numsum => numsum +183 -20
@@ 3,7 3,7 @@
# numsum:  This program adds up all numbers it encounters and prints out
#          the total at the end on STDOUT. 
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License


@@ 26,12 26,12 @@
#######################

use Getopt::Std;
-use strict;
+#use strict;
use vars qw/ %opts /;

-my ($file, $finalsum, @number_array, $verbose);
+my ($file, $finalsum, @number_array, @columns_to_print, @column_output, $output_line, $string_seperator, $verbose);

-getopts('dhiIqV', \%opts);
+getopts('123456789cdhiIqrs:Vx:y:', \%opts);


if ($opts{'h'}) {


@@ 51,6 51,30 @@ if ($opts{'d'}) {
    $verbose = 1;  # Normal output.
}

+# I'm not sure whether this should stay in or not.  It's mainly just for convience.
+# It's a bad hack because you can only specify 1 through 9.  Maybe sometime I'll figure
+# out how to bypass getopts to specify extra options.
+for ($n = 1 ; $n < 10; $n++) {
+    if ($opts{$n}) {
+        $opts{'c'} = 1;
+        $opts{'x'} = $n;
+    }
+}
+
+if ($opts{'x'}) {
+    @columns_to_print = split(/,/, $opts{'x'});
+}
+
+if ($opts{'y'}) {
+    @rows_to_print = split(/,/, $opts{'y'});
+}
+
+if ($opts{'s'}) {
+    $string_seperator = $opts{'s'};
+} else {
+    $string_seperator = "\\s+";
+}


################
# MAIN PROGRAM #


@@ 68,19 92,69 @@ if (@ARGV) {
    process_filehandle(\*STDIN, \@number_array);
}

-$finalsum = add_array(\@number_array);
+my $number_array_length = @number_array;

-if ($opts{'i'}) {
-    $finalsum = int($finalsum);
-} elsif ($opts{'I'}) {
-    if ($finalsum == int($finalsum)) {
-        $finalsum = 0;
+if ($opts{'c'}) { # Handle column sum output.
+  
+    my @column_output = ();
+    my $i;
+    for ($i = 0; $i < $number_array_length; $i++) {
+        my @add_this_array = split(/$string_seperator/, $number_array[$i]);
+        sum_columns(\@column_output, \@add_this_array);
+    }
+
+    # Now print out the columular output.
+    if ($opts{'x'}) {
+        my @final_output = ();
+        foreach $column_number (@columns_to_print) {
+            push (@final_output, $column_output[$column_number-1]);
+        }
+        $output_line = join(" ", @final_output);
    } else {
-        $finalsum =~ s/^(\-?)[0-9]*\.([0-9]*)$/$1.$2/;
+        $output_line = join(" ", @column_output);
    }
}

-print "$finalsum\n";
+    print "$output_line\n";
+
+} elsif ($opts{'r'}) { # Handle row sum output.
+
+    my $i;
+    if ($opts{'y'}) {
+        foreach $row_number (@rows_to_print) {
+            my @add_this_array = split(/ /, $number_array[$row_number-1]);
+            my $row_sum = 0;
+            foreach $row_value (@add_this_array) {
+                $row_sum += $row_value;
+            }
+            print "$row_sum\n";
+        }
+    } else {
+        for ($i = 0; $i < $number_array_length; $i++) {
+            my @add_this_array = split(/ /, $number_array[$i]);
+            my $row_sum = 0;
+            foreach $row_value (@add_this_array) {
+                next if ($row_value !~ m/^[0-9]+$/);
+                $row_sum += $row_value;
+            }
+            print "$row_sum\n";
+        }
+    }
+
+} else {
+    $finalsum = add_array(\@number_array);
+
+    if ($opts{'i'}) {
+        $finalsum = int($finalsum);
+    } elsif ($opts{'I'}) {
+        if ($finalsum == int($finalsum)) {
+            $finalsum = 0;
+        } else {
+            $finalsum =~ s/^(\-?)[0-9]*\.([0-9]*)$/$1.$2/;
+        }
+    }
+
+    print "$finalsum\n";
+}

exit(0);



@@ 103,6 177,15 @@ Options:
        -i      Only return the integer portion of the final sum.
        -I      Only return the decimal portion of the final sum

+        -c      Print out the sum of each column.
+        -r      Print out the sum of each row.
+
+        -x <n>  Specify a comma seperated list of columns to print.
+        -y <n>  Specify a comma seperated list of rows to print.
+
+        -s <string> Specify a seperator string for spliting columns.
+                    This defaults to consecutive whitespace.

        -d      Debug. For developers only.
        -h      Help: You're looking at it.
        -V      Increase verbosity.


@@ 117,14 200,34 @@ sub process_filehandle {
    my $number_array_ref = shift;

    while (<$filehandle>) {
-        if ($_ =~ /^\s*(\-?[0-9]*\.?[0-9]+)/) {
-            print STDERR "number: $1\n" if ($verbose >= 3);
-            push(@$number_array_ref, $1);
+        my $line = $_;
+        chomp($line);
+        if ($opts{'c'} || $opts{'r'}) {   # Process columns or rows
+
+            # Make each line column friendly by changing all non-numeric words into 0.
+
+
+            # Some ideas from the #perl channel on irc.freenode.org.  Thanks to v, dkr, Khisanth and dudeman
+
+            #$input = join ' ', map { /\D/ ? 0 : $_ } split / +/, $input;
+            #s!(\S+)!$1=~/\D/?0:$1!ge;
+          
+            # This one works best. 
+            $line =~ s!(\S+)!$1=~/\D/?0:$1!ge;
+
+            push(@$number_array_ref, $line);
+
+        } else {   # Normal processing.
+            if ($line =~ /^\s*(\-?[0-9]*\.?[0-9]+)/) {
+                print STDERR "number: $1\n" if ($verbose >= 3);
+                push(@$number_array_ref, $1);
+            }
        }
    }
    return 1;
}


# Function for adding up numbers
sub add_array {
    my $arrayref = shift;


@@ 138,6 241,23 @@ sub add_array {
    return $runningtotal;
}

+# Function for summing up the columns.
+sub sum_columns {
+    my $sum_array_ref = shift;
+    my $input_array_ref = shift;
+
+
+    my $input_array_length = @$input_array_ref;
+    my $x;
+    for ($x = 0; $x < $input_array_length; $x++) {
+        if (${$sum_array_ref}[$x]) {
+            ${$sum_array_ref}[$x] += ${$input_array_ref}[$x];
+        } else {
+            ${$sum_array_ref}[$x] = ${$input_array_ref}[$x];
+        }
+    }
+    return 1;
+}

# Lay down some of that perl pod action.
=pod


@@ 148,11 268,11 @@ numsum - numsum program file

=head1 SYNOPSIS

-B<numsum> [-iIdhv] <FILE>
+B<numsum> [-iIcdhrsvxy] <FILE>

-| B<numsum> [-iIdhv]   (Input on STDIN from pipeline.)
+| B<numsum> [-iIcdhrsvxy]   (Input on STDIN from pipeline.)

-B<numsum> [-iIdhv]     (Input on STDIN.  Use Ctrl-D to stop.)
+B<numsum> [-iIcdhrsvxy]     (Input on STDIN.  Use Ctrl-D to stop.)


=head1 DESCRIPTION


@@ 167,6 287,15 @@ handles negative numbers and numbers with decimals.
    -i  Only return the integer portion of the final sum.
    -I  Only return the decimal portion of the final sum.

+    -c      Print out the sum of each column.
+    -r      Print out the sum of each row.
+
+    -x <n>  Specify a comma seperated list of columns to print.
+    -y <n>  Specify a comma seperated list of rows to print.
+
+    -s <string> Specify a string to use as a seperator for columns.
+                This defaults to be consecutive whitespace (\s+).

    -h  Help: You're looking at it.
    -V  Increase verbosity.
    -d  Debug mode.  For developers


@@ 187,12 316,46 @@ Enter your own numbers on STDIN. The last number is the answer.
    223

Use it in a command pipeline.
-    $ ls -1s | numsum
+    $ ls -1s | grep .mp3 | numsum -c -x 5
    72288

Add up the total byte count in a http log file.
    $ cat access_log | awk {'print $10'} numsum

+    or 
+
+    numsum -c -x 10 access_log
+
+Add up the columns of numbers of a file.
+
+    $ cat columns
+    1 6 11 16 21
+    2 7 12 17 22
+    3 8 13 18 23
+    4 9 14 19 24
+    5 10 15 20 25
+    $ numsum -c columns
+    15 40 65 90 115
+
+Add up the 1st, 2nd and 5th columns only.
+
+    $ numsum -c -x 1,2,5 columns
+    15 40 115    
+
+Add up the rows of numbers of a file.
+
+     $ numsum -r columns
+     55
+     60
+     65
+     70
+     75
+
+Add up the 2nd and 4th rows.
+
+     $ numsum -r -y 2,4 columns
+     60
+     70

=head1 SEE ALSO


M random => random +1 -1
@@ 2,7 2,7 @@

# random: Print out a random number.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

M range => range +25 -4
@@ 2,7 2,7 @@

# range: Print out a range of numbers for use in for loops and such.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License


@@ 30,7 30,10 @@ use vars qw/ %opts $verbose /;

my ($expression, @output_array, $number);

-getopts('e:n:NhdV', \%opts);
+getopts('e:n:Np:s:hdV', \%opts);

+my $prefix = "";
+my $suffix = "";

if ($opts{'h'}) {
    &help();


@@ 50,6 53,15 @@ if ($opts{'d'}) {
}

my $range_expression = shift || die "You must specify a range expression\n";

+# A shortcut for specifying a prefix and suffix.
+$range_expression =~ m/^([^\/]+)\//;
+$prefix = $1;
+$range_expression =~ s/^[^\/]+\//\//;
+$range_expression =~ m/\/([^\/]+)$/;
+$suffix = $1;
+$range_expression =~ s/\/[^\/]+$/\//;

$range_expression =~ s/^\///;
$range_expression =~ s/\/$//;
my @range_expressions = split(/,/, $range_expression);


@@ 58,6 70,12 @@ my $separator = $opts{'n'} || " ";
$separator =~ s/\\n/\n/g;
$separator = "\n" if ($opts{'N'});

+if ($opts{'p'}) {
+    $prefix = $opts{'p'};
+}
+if ($opts{'s'}) {
+    $suffix = $opts{'s'};
+}

# Make the array of excluded numbers
my @excludes;


@@ 119,9 137,9 @@ my $i = 0;
my $max = @output_array;
while ($i < $max) {
    if ($i == ($max - 1)) {
-        print $output_array[$i];  # No need to put a separator for the last element.
+        print $prefix . $output_array[$i] . $suffix;  # No need to put a separator for the last element.
    } else {
-        print $output_array[$i] . $separator;
+        print $prefix . $output_array[$i] . $suffix . $separator;
    }
    $i++;
}


@@ 150,6 168,9 @@ Options:
                is a space, use '\n' or \\n for newline or the -N option.
        -N      Just a quick option for using a newline as the separator.

+        -p <string>  Specify a prefix to use for every number output.
+        -s <string>  Specify a suffix to use for every number output.

        -d      Debug. For developers only.
        -h      Help: You're looking at it.
        -V      Increase verbosity.

M round => round +1 -1
@@ 2,7 2,7 @@

# round: A program that rounds off numbers it encounters.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

M template => template +1 -1
@@ 3,7 3,7 @@
# template: This is the default template file used to create
# new files.
#   
-# Copyright (C) 2002-2003 Suso Banderas
+# Copyright (C) 2002-2004 Suso Banderas

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License

A tests/columns => tests/columns +5 -0
@@ 0,0 1,5 @@
+1 6  11 16 21 26 31 36 41 46 51
+2 7  12 17 22 27 32 37 42 47 52 57 62 67 72
+3 8 13 18 23 28  33 38 43 48 53 58 63 68 73 78 83 88 93 98  103
+4 9 14 19 24 29  34 39 44 49 54 59 64 69 74 79 84 89 94 99  104
+5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 

A tests/columns2 => tests/columns2 +5 -0
@@ 0,0 1,5 @@
+1 a 6 11 16 c2t     21
+2 b 7 12  17 d0g 22
+3 c 8    13 18 33 23
+4 d 9 14 19 7ark 24
+5 e 10 15 20    m0us3   25
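
The tests/columns2 fixture above mixes words with numbers and uses uneven spacing, which is the case the substitution added to process_filehandle in this commit is meant to handle. The following is a minimal standalone sketch of that cleaning step followed by a column sum. The substitution itself is copied from the numsum diff; zero_non_numeric and column_sums are hypothetical helper names, and the summing loop is a simplified stand-in for sum_columns rather than the code from this release.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every whitespace-delimited word that contains a non-digit
# character with 0, leaving the original spacing untouched.
# (Same substitution as in process_filehandle in this commit.)
sub zero_non_numeric {
    my ($line) = @_;
    $line =~ s!(\S+)!$1=~/\D/?0:$1!ge;
    return $line;
}

# Hypothetical helper: clean each line, split it on runs of whitespace
# and add every field to the running total for its column.
sub column_sums {
    my @sums;
    for my $line (@_) {
        my @fields = split ' ', zero_non_numeric($line);
        $sums[$_] += $fields[$_] for 0 .. $#fields;
    }
    return @sums;
}

# The five rows of tests/columns2.
my @rows = (
    "1 a 6 11 16 c2t     21",
    "2 b 7 12  17 d0g 22",
    "3 c 8    13 18 33 23",
    "4 d 9 14 19 7ark 24",
    "5 e 10 15 20    m0us3   25",
);

print zero_non_numeric($rows[0]), "\n";      # 1 0 6 11 16 0     21
print join(" ", column_sums(@rows)), "\n";   # 15 0 40 65 90 33 115
```

Because the test is "contains any non-digit", words such as -5 or 1.5 are also turned into 0 by this substitution, so the sketch only sums unsigned integers.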