Tuesday, April 29, 2008

Further perf: Ruby, Python and C++ file reading

Following on from the log files article, I decided to do some basic perf checks of Ruby and Python reading text files. The results were a little disappointing: performance was roughly the same, so my Ruby log file reading optimisation was complete rot.



Further experimentation required.

ARGV.each do |param|
  cc = 0
  # Sum the length of every line in the file
  File.new(param, 'r').each_line do |line|
    cc += line.size
  end
  puts "File has #{cc} characters"
end



Really simple script, and probably the most obvious: add up the length of all the lines in the file.


File has 1673435763 characters

real 0m56.035s
user 0m33.873s
sys 0m3.609s





ARGV.each do |param|
  cc = 0
  i = File.open(param, "r")
  begin
    # readline raises EOFError at end of file rather than returning nil,
    # so the loop is ended by the rescue below
    loop do
      cc += i.readline.size
    end
  rescue EOFError
    # end of file reached
  ensure
    i.close
  end
  puts "File has #{cc} characters"
end



Based on previous observations, this one uses the readline method from the IO library, but it did not affect the performance.


File has 1673435763 characters

real 0m55.569s
user 0m35.506s
sys 0m3.451s




import sys

cc = 0

source = open(sys.argv[1])
for line in source:
    # add the length of each line, including its newline
    cc += len(line)
source.close()
print('file has %d characters' % cc)



As a benchmark, a simple Python script, again adding up all the line lengths in the file.


file has 1673435763 characters

real 0m53.462s
user 0m23.147s
sys 0m3.781s
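
The three figures reported for each run (real, user, sys) look like the output of the shell's `time` command. As a rough sketch, similar numbers can be collected from inside Python; the helper name `timed` is purely illustrative and is not part of the scripts measured here.

```python
import os
import time


def timed(func, *args):
    """Run func(*args) and return (result, real, user, system) in seconds."""
    start_real = time.perf_counter()   # wall-clock time, like "real"
    start_cpu = os.times()             # process CPU times, like "user"/"sys"
    result = func(*args)
    end_cpu = os.times()
    real = time.perf_counter() - start_real
    user = end_cpu.user - start_cpu.user
    system = end_cpu.system - start_cpu.system
    return result, real, user, system
```

A large gap between real and user + system, as in the runs above, usually points at time spent waiting for the disk rather than executing code.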




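#include <stdio.h>

int main(int argc, char** argv)
{
    int count = 0;
    FILE* f = fopen(argv[1], "r");
    if (f == NULL)
        return 1;

    // Compare against EOF: the original test, while (getc(f)), only ends
    // the loop on a zero byte, which is the likely reason the count
    // printed for this run is lower than the others.
    while (getc(f) != EOF)
        count++;

    fclose(f);
    printf("File has %d characters\n", count);
    return 0;
}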



Baseline written in C++


File has 1673392372 characters

real 0m53.167s
user 0m31.473s
sys 0m3.094s
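
The count reported for this run (1673392372) is lower than the 1673435763 printed by the other scripts. One possible explanation, which is an assumption and was not checked in these runs, is that the file contains a zero byte: `getc` returns 0 for such a byte, so a loop tested as `while (getc(f))` would end there instead of at end of file. A minimal Python sketch for checking this; the function name `first_zero_byte_offset` and the chunk size are illustrative choices.

```python
def first_zero_byte_offset(path, chunk_size=1 << 20):
    """Return the offset of the first zero byte in the file, or -1 if none."""
    offset = 0
    with open(path, "rb") as source:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                return -1          # reached end of file without a zero byte
            pos = chunk.find(b"\x00")
            if pos != -1:
                return offset + pos
            offset += len(chunk)
```

If the offset it returns for the log file matches the lower count, that would support the explanation.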




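#include <stdio.h>

int main(int argc, char** argv)
{
    int count = 0;
    FILE* f = fopen(argv[1], "r");
    if (f == NULL)
        return 1;

    char buffer[512];
    // fread returns the number of bytes read; 0 means EOF or an error
    size_t read = fread(buffer, 1, sizeof buffer, f);

    while (read > 0) {
        count += (int)read;
        read = fread(buffer, 1, sizeof buffer, f);
    }

    fclose(f);
    printf("File has %d characters\n", count);
    return 0;
}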



A (poor) buffered version of the baseline written in C++


File has 1673435763 characters

real 0m52.425s
user 0m1.526s
sys 0m4.473s
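
The buffered run takes roughly the same real time as the others but far less user time (1.526s against 31.473s for the `getc` loop). That suggests the per-character and per-line work is CPU overhead, while the wall-clock time is dominated by getting the data off the disk. As a sketch only, and not benchmarked here, the same chunked approach could be tried in Python; `count_bytes` and the 64 KiB chunk size are illustrative assumptions.

```python
def count_bytes(path, chunk_size=64 * 1024):
    """Count the bytes in a file by reading fixed-size chunks."""
    total = 0
    # Binary mode avoids line splitting and text decoding
    with open(path, "rb") as source:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break              # empty read means end of file
            total += len(chunk)
    return total
```

It could be called as `count_bytes(sys.argv[1])` in place of the line loop in the Python script.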


