Sunday, December 21, 2008

SCMRSS reaches Alpha 1 release

SCMRSS is a simple web application that turns source control events into an RSS feed. It is written in Ruby using the Ramaze web framework. Once configured, the web server polls the source control repository for changes and, when it finds any, delivers them as a simple RSS feed.

This is the first alpha release, so I would welcome any and all feedback, either here or as issues at RubyForge.

Saturday, December 20, 2008

Remote Pair Programming setup

For a few years now I have been looking for a solution that allows two or more people who are not sitting side by side to share an editing session.

Each OS has its own preferred way of sharing a desktop/workspace for collaborative working. Some work cross platform (e.g. VNC) and some work via the internet (e.g. GoToMyPC). But for much of my daily work I want a nice fast way of sharing the code that I am working on with a colleague. One or both of us are likely to be behind firewalls, proxies and all sorts of important security that makes collaborative remote working so very very hard.

On my current project we are using Eclipse as our development IDE, and a local IRC server for ad-hoc team communication and point-to-point instant messaging, backed up by Skype for person-to-person video and occasionally more traditional email and phone for that personal touch.

All in all this is working fine, and I have to admit I had forgotten how useful IRC is in comparison to IM when it comes to team working. Just having everyone aware of the conversations that are taking place can be a real boon.

I had some spare time last evening and decided to see if I could track down a viable solution to the remote pair programming. Confining my requirements to either complete desktop sharing or Eclipse-based pairing helped quite a bit, because the other added complication is that we have a mixed OS development team including Windows, Linux and Mac OS X. The number of OSs is likely to settle down to just two, but at the moment we have quite a mixed bag - which is actually quite refreshing.

So back to the remote pairing issue. I quickly realised that sharing the editing session was likely to be sufficient for our needs, and that other tools could provide text, voice and video quite effectively and did not need to be replaced.

My first efforts involved the Eclipse Communication Framework (ECF). I had played about with much earlier versions and concluded that it was a little fussy and difficult to work with for my tastes but decided to give it a go. Working behind a proxy with limited ports meant setting up and running a local server.

I can see a lot of promise in the ECF, but it just feels far too heavyweight. I might have made some errors, but I could not get it to work. Connecting to the server seemed OK, although there were a lot of stack backtraces on startup - I really can't understand why people still dump these things to the console with the idea that users, even developer users, will have a hope of understanding what went wrong. With three pages of small text flying past it is just too easy to miss the line that tells you what went wrong. So after an unhappy hour or two trying to get two instances of Eclipse to open a shared editing session, I gave up and went back to searching for an alternative.

After a little bit more searching I came upon XPairtise, an open source project that seemed to do exactly what I now wanted: share an editing session. Unusually for an open source project the documentation is pretty good, although it was early in the morning and I almost missed that there are two downloads: one for the Eclipse plugin and the other for the server.

The setup which I eventually came up with involved the XPairtise server and Eclipse running natively on my Mac, with an Ubuntu VM mimicking a remote pair. The server is nice and quiet, just reporting that it is up and running - a refreshing change. Setting up accounts through the Eclipse preferences pane was a little quirky on the UI the first time around, but it was heartwarming to receive the 'account created' message.

Getting the shared editing session to work took some time. First, when creating a shared workspace, all the files from the project are shipped up to the server. When joining the shared workspace the project files are brought back down again, so take heed of the backup dialog or work in a different Eclipse workspace for shared working.

It took at least two Eclipse restarts to get the shared editing to work, and there is a note on the XPairtise site about using the eclipse -clean option to refresh all the plug-ins.

But after this initial setup headache I have a configuration that will allow remote pair programming.


Update: I have just rerun the setup within a distributed team (2 locations) and after a bit of a lag in synchronising the project contents everything worked fine with 3 concurrent users (Driver, Navigator and Spectator).

Sunday, October 5, 2008

Setting Windows Shell Font

I keep losing track of this little tidbit:

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont" /v 00 /d Consolas


The full article can be found here

Saturday, October 4, 2008


About a month ago I was asked to look at an application that has been
under active development and support for quite some time. Written in C
and C++ (my old stomping grounds for 20 years) I was really looking
forward to getting stuck into the code.

What I was not expecting was the lack of tool support. I sort of knew
that refactoring tools, tests and code coverage would be a bit of a
challenge, but I did not think that it would be the challenge that it
turned out to be.

I have gotten used to being able to download tools, try them out and
if appropriate buy them and add them to my toolbox. Ideally there
would be an open source tool I could use instead to get the job
done. Moving back into the C++ world also took me back in time to an
era where evaluation downloads are not available. Instead I would have
to register my interest and someone would be in touch to talk to me
about the wonders of their application and to perhaps run the
application over my source code. Well, first, I tend to get about a bit,
and at the time I was out of my home country and well away from the
tool vendor's timezone. Second, I was up against the clock in putting
together some initial findings and recommendations. And lastly, it was
not my source code, so I could not offer it up for someone else to go
poking around in anyway.

Strike one - keep on searching for someone with a little less of a
protectionist attitude.

Unfortunately this seemed to become a theme. Other suppliers had no
evaluations and insisted on money up front and physical shipping. In a way
they may have been doing me a favour: if the software is so complex
that it needs to be demonstrated, and the licensing enforcement so
strong that it becomes difficult to use (I have been hit before by
node-locked licenses on machines that decided to die), then maybe this
approach has just saved me a lot of heartache and pain.

On the plus side there are tools out in 'open source' land that can
help and there *are* a few companies that have moved with the times
and provide a more open licensing scheme. When I have time I will list
out the toolkit that I came up with.

Thursday, September 25, 2008

Too many VMs

As part of my work life I regularly need to run Windows development software on my Mac, typically for a client engagement. VMs make this a really convenient setup: I can take a snapshot before I start work on something new and then at the end roll back to the original state, ready to work on the next project.

One of the downsides I have encountered recently is the need to keep all these machines up to date with patches and other software updates that seem to be part of everyday life.

Don't get me wrong, I think it's great that we have this steady stream of updates to make things better, but when you start multiplying this up for the VMs I have on my machine, the effort to keep them up to date starts to become a bit of a pain.

I think it is time to start rationalising to a smaller number - just so I can keep them up to date.

Monday, August 4, 2008

Quietly Does it

I have had my eye on a pair of Bose headphones for some time but never managed to justify the cost. And to be honest I am not sure that I ever will. Except that they are truly fabulous, and my first trip with them on a plane left me feeling relaxed and comfortable. The noise onboard a plane seems to raise my tension levels, and removing that noise really helped me enjoy the flight.

I was told when I bought them that the batteries were fully charged and would each last 10 hours. I can only testify to 8 on one battery with almost continuous use.

I will confess that I have tried a number of different headphones, both in and out of ear, with and without noise cancelling but nothing comes close to the peace that these phones bring with them.
So maybe I have justified them :)

Toronto - Sunday


Today is my first day in Toronto, preparing for Agile 2008. After a surprisingly smooth and peaceful crossing from the UK, and an even smoother journey to downtown Toronto I find myself preparing some slides for my short experience report 'TeamPace - Keeping the build times down' on Thursday.

I am staying at the Hilton just across the street from the conference at the Sheraton Centre for an early morning start.

Friday, August 1, 2008

Code is Poetry

On a recent visit to the WordPress site I was struck by their tagline "Code is poetry". I have been searching for some time for a phrase that would sum up how I feel about well crafted code; code that is easy to understand and read and generally feels 'just right'.

I am not completely convinced that code is poetry - but it comes pretty close.


Next week I will be in Toronto for Agile 2008 to present a short experience report, catch up with friends and immerse myself in all things Agile :)

Monday, May 19, 2008

Trains and WiFi

I am just returning from Leeds to London by train, with power and free wireless internet. OK, it is not the fastest link, but it works well - even for blogging.

Tuesday, May 13, 2008

Keyboard touching

I am definitely with Jeff Atwood on when it is appropriate to touch monitors - NEVER.

Tuesday, April 29, 2008

Further perf ruby, python C++ file reading

Following on from the log files article, I decided to do some basic perf checks of Ruby and Python reading text files. The results were a little disappointing - performance was roughly the same, so my Ruby log file reading optimisation was complete rot.

Further experimentation required.

ARGV.each do |param|
  cc = 0
  File.open(param, 'r').each_line do |line|
    cc += line.size
  end
  puts "File has #{cc} characters"
end


Really simple script - and probably the most obvious approach - add up the length of all the lines in the file.

File has 1673435763 characters

real 0m56.035s
user 0m33.873s
sys 0m3.609s

ARGV.each do |param|
  cc = 0
  i = File.open(param, "r")
  begin
    until i.eof?
      line = i.readline
      cc += line.size
    end
  rescue EOFError
    # readline raises EOFError at end of file
  end
  puts "File has #{cc} characters"
end


Based on previous observations this one uses the readline method from the IO library, but it did not affect the performance.

File has 1673435763 characters

real 0m55.569s
user 0m35.506s
sys 0m3.451s

import sys
cc = 0

source = open(sys.argv[1])
for line in source:
    cc += len(line)
print 'file has ', cc, ' characters'


As a benchmark, a simple Python script - again adding up all the line lengths in the file.

file has 1673435763 characters

real 0m53.462s
user 0m23.147s
sys 0m3.781s

#include <stdio.h>

int main(int argc, char** argv)
{
    int count = 0;
    FILE* f = fopen(argv[1], "r");

    while (getc(f) != EOF)
        count++;

    printf("File has %d characters\n", count);
    return 0;
}


Baseline written in C++

File has 1673392372 characters

real 0m53.167s
user 0m31.473s
sys 0m3.094s

#include <stdio.h>

int main(int argc, char** argv)
{
    int count = 0;
    FILE* f = fopen(argv[1], "r");

    char buffer[512];
    int read = fread(buffer, 1, 512, f);

    while (read > 0) {
        count += read;
        read = fread(buffer, 1, 512, f);
    }

    printf("File has %d characters\n", count);
    return 0;
}


A (poor) buffered version of the baseline written in C++

File has 1673435763 characters

real 0m52.425s
user 0m1.526s
sys 0m4.473s

Sunday, April 27, 2008

Blogging Code


I quite often find myself blogging about program source code. That code is typically stored in source files, which I then run through a pretty printer (something like source-highlight). Combining everything together means some copy and pasting - not the most repeatable process - and quite often the code and article evolve together, so I end up copying and pasting quite often.

So I came up with a mashup: a small Ruby program that processes HTML files and handles include directives, either inlining another file or, for this purpose, inlining the results of a process.

The following source was included with:

<x:include value="source-highlight -o STDOUT ~/projects/mashup/mashup"/>

By running:

mashup blogging-code.html > blogging-code-publish.html


ARGV.each do |arg|
  contents = File.read(arg)

  contents.sub!(/<s:include\s+value="([^"]*)"\s*\/>/) do |match|
    replacement = File.open($1).read
    replacement.gsub!(/.*<body>/m, '')
    replacement.gsub!(/<\/body>.*/m, '')
    replacement
  end

  contents.sub!(/<x:include\s+value="([^"]*)"\s*\/>/) do |match|
    replacement = `#{$1}`
    replacement.gsub!(/.*<body>/m, '')
    replacement.gsub!(/<\/body>.*/m, '')
    replacement
  end

  puts contents
end

Monday, April 14, 2008

Log files


Log files are one of those must-have things for any web application. It is just so hard to predict all of the possible ways users are going to interact with the site that gathering post-live information about application behaviour is essential. It does, however, produce quite a lot of data.

On my current project, after a fairly significant release, we resolved to check the log files to see if there were any unexpected incidents that required fixes or web content changes. The log files ran to several gigabytes, containing entries not only from the application but also from some very noisy subsystems. After a quick look it became evident that just scanning through the log files would not be effective, and that some cleaning or automated analysis was required.

On the initial scan we noticed that some Java stack traces were being repeated, so we wanted something that could capture the distinct stack traces and then list the errors that caused them. In this way we could look at the general issues in priority order (by number of occurrences).

Our first effort was to write a fairly simple ruby script using a hash map keyed on the text of the stack trace. Each map entry contained an array of error lines from the log files. We kicked the script off after testing with a small log file and went to lunch.

When we came back the script was still running. Sometime later it ran out of memory - not ideal.
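For illustration, a minimal sketch of the kind of script we started with: a hash keyed on the stack trace text, each entry collecting the error lines that produced that trace. This is a hypothetical reconstruction - the header detection, token names and sample log lines are invented, not our actual script.

```ruby
# Group error header lines by the stack trace text that follows them.
# A line starting with '#' (or containing 'Notice') marks a new entry;
# everything else is treated as part of the current stack trace.
def group_errors(lines)
  errors = {}
  last_error = nil
  last_stack = ''

  lines.each do |line|
    if line.start_with?('#') || line.include?('Notice')
      # Flush the previous header under its accumulated stack trace.
      (errors[last_stack] ||= []) << last_error if last_error && !last_stack.empty?
      last_error = line
      last_stack = ''
    else
      last_stack << line
    end
  end

  errors
end

# Hypothetical sample log lines.
sample = [
  "#ERROR failed to render page",
  "  at com.example.Page.render\n",
  "  at com.example.Servlet.doGet\n",
  "#ERROR failed to render page",
  "  at com.example.Page.render\n",
  "  at com.example.Servlet.doGet\n",
  "#Notice end of log\n"
]

grouped = group_errors(sample)
grouped.each do |stack, reports|
  puts "#{reports.size} instances of: #{reports.first}" if reports.size > 1
end
```

Holding every error line for every trace in memory at once is exactly what made our real attempt blow up on multi-gigabyte input; streaming counts instead of full lines would have been kinder.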

It has been some time since I have written any C++. Most of my work these days involves Java, C# and a little bit of Ruby for working around the codebase, so it took a little while for my C++ brain to kick in. A colleague also took up the challenge by writing a solution in Python.

The results were quite startling with the Python script performing almost as well as the C++ at around 500,000 lines per second.

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <map>
#include <vector>

using namespace std;

class progress {
    int count;
public:
    progress() {
        count = 0;
    }
    void ping() {
        cerr << "\b" << "|/-\\"[count++ % 4];
    }
};

typedef map<string, vector<string>*> error_map;

void print_errors(error_map& errors)
{
    for (error_map::iterator iter = errors.begin(); iter != errors.end(); iter++) {
        vector<string>* reports = iter->second;
        if (reports->size() > 1) {
            string first = (*reports)[0];
            cout << "\n\n\n";
            cout << reports->size() << " instances of\n";
            cout << "FIRST instance " << first << "\n";
            cout << "LAST instance " << (*reports)[reports->size() - 1] << "\n";
            cout << iter->first;
        }
    }
}

void process_file(char* filename, error_map& errors)
{
    progress p;
    cerr << "\bProcessing " << filename << "\n";

    ifstream file(filename);
    string line;
    string pending_error;
    string pending_stack;
    int stack_lines = 0;
    int line_number = 1;
    bool skipping = false;

    while (getline(file, line)) {
        if (line_number++ % 100000 == 0)

        if (!line.empty() && (line[0] == '#' || line.find("Notice") != string::npos)) {
            if (pending_stack.size() != 0) {
                // Process stack trace
                if (errors[pending_stack] == NULL)
                    errors[pending_stack] = new vector<string>();

                errors[pending_stack]->insert(errors[pending_stack]->end(), pending_error);
            }

            // Ignore lines from systems we are not interested in
            skipping = line.find("ignore-one") != string::npos
                || line.find("ignore-two") != string::npos
                || line.find("ignore-three") != string::npos
                || line.find("INFO") != string::npos
                || line.find("WARN") != string::npos;

            if (!skipping)
                pending_error = line;

            pending_stack = "";
            stack_lines = 0;
        } else if (!skipping && stack_lines < 20) {
            pending_stack += line + "\n";
            stack_lines++;
        }
    }
}

int main(int argc, char* const argv[]) {
    error_map* e = new map<string, vector<string>*>();
    error_map& errors = *e;

    for (int i = 1; i < argc; i++)
        process_file(argv[i], errors);

    print_errors(errors);
    return 0;
}

The Python solution is a little shorter, however:

import os, sys

def is_valid(item):
    for token in ['ignore-one', 'ignore-two', 'ignore-three', 'NavigationLink instance']:
        if token in item:
            return False
    return True

directory = sys.argv[1]

errors = {}

for filename in os.listdir(directory):
    last_error = ''
    last_stack = ''
    stack_count = 0

    file_path = os.path.join(directory, filename)
    print 'Processing', file_path

    source = open(file_path)

    for line in source:
        if line.startswith('#') or ('<Notice>' in line):
            if is_valid(last_stack) and is_valid(last_error):
                errors.setdefault(last_stack, []).append(last_error)
            last_error = line
            last_stack = ''
            stack_count = 0
        else:
            stack_count += 1
            if stack_count <= 20:
                last_stack += line

print 'Writing report to grok.txt'

out_file = open('grok.txt', 'w')

for stack, error_list in errors.iteritems():
    if (len(error_list) > 1) and (len(stack.strip()) > 0):
        out_file.write('Found %d items like: %s' % (len(error_list), error_list[0]))


Both solutions limited the number of lines in the stack trace used for the key to 20. This was fine for non-reflected methods.

Saturday, March 1, 2008

Ruby initialise array and add in one step

I keep forgetting the syntax of this so perhaps this will help.

Given a hash where each value is an array of values this gives a nice concise way of setting things up (even if it is a little obscure).

map = {}

(map[:key] ||= []) << :value

The above results in a hash containing the key :key with an array containing :value.

Tuesday, February 26, 2008

cruisecontrolrb and java

I have been meaning to set up a cruisecontrol.rb instance for some time. Cruisecontrol.rb is an implementation of a continuous integration build server in Ruby. It is a simple self-contained download with a really quick initial setup time.

I have just started an open source Java project for stack trace analysis called why, and wanted a build server, so thought I would give cruisecontrol.rb a go.

The download took a couple of minutes. Adding the project (cruise add why --url took a few more minutes to do the initial checkout.

Starting cruise (cruise start) in a command shell failed.

I needed to provide a custom build command in the cruise_config.rb file in projects/why:

Project.configure do |project|
  project.build_command = 'b'
end

I already had a shell script called b to run ant and build the project. Result - build failed.

Small scratching of head and a chmod +x b later and - build passed.

From start to finish about 20 mins - most of which was download and initial checkout.

Sunday, February 3, 2008

Tagging builds in subversion

At the end of every build I like to add a tag to the project source repository. Although the build log contains the revision of the source being built and so the build can be re-created, I like the fact that all the information about the source and the build is in one place.

<target name="tag-build" description="tag the build revision">
  <svn javahl="false">
    <status path="${basedir}" revisionProperty="svn.revision" />
  </svn>
  <echo message="Tagging revision ${svn.revision} as tag ${label}" />
  <svn username="cruise" password="cruise" javahl="false">
    <copy srcURL="http://repository/path/trunk" revision="${svn.revision}" destUrl="http://repository/path/tags/${label}" message="Cruise: Tagging build ${label}" />
  </svn>
</target>

Monday, January 21, 2008

Project Kit List

Essential Items for Software Projects

A recent conversation with a friend made me realise that things I have come to think of as standard practice for software projects are by no means universal and that problems I had thought not relevant any more are still around and causing mayhem and chaos around the world for developers and managers alike.

So, casting caution to the wind, I thought I would draw up a list of things I find essential for a project:

  • Version Control

  • Build server

  • Wiki

  • Issue tracking

  • Text Editor

Version Control

It is difficult to understand why any project would not use version control. No decision is final and no PC or server is 100% reliable. At the very least, version control is a personal safety net for things that can and will go wrong.

I like to take things a little further and make version control the only way of communicating changes to the build server. This ensures that everything is under version control and that those little changes that I will remember forever (hah) are written down somewhere. (The build server configuration should also be in version control - it makes rebuilding a build server a breeze.)

My personal preference is Subversion. It's free, open source and works really, really well. If you are using a version control system with features not supported by Subversion, think seriously about whether you really need those features. No, really: many version control systems have features that promote complex processes and practices that can really get in the way.

Build Server

My second essential item of kit is a build server. Hardware is now so inexpensive that there are few excuses for not having one of these.

If you cannot afford an additional server, or it is not practical (on the road with a laptop perhaps), then invest in a good virtualisation system and run your build server on the same hardware, or on one of the dev boxes if you are working in a team.


Wiki

If you are working in a team, or even by yourself, using a wiki to record things as the project goes along is great - especially if your memory is anything like mine. Wikis are especially useful if you need to share information about the project or team with people outside your office.

Wikis such as Trac also have a bug tracking system built in and integrate with Subversion version control.

Issue Tracking

As your code base matures you are going to need something to keep track of the problems and changes that people want.

Don't go overboard. Putting too much reliance on an issue tracking system is the same as leaning on code generators and wizards in your IDE - they stop you thinking about the problem and generate a lot of noise.

Having said that I am a great fan of code generators :)

Text Editor

OK, so why put a text editor in the same rankings as version control, wikis and such?

I am with the Pragmatic Programmers on this one. Having a good text editor to call upon is essential. By good I mean it fits your work, you and the things you need to do. But like all good tools, you need to spend time with it, learning what it is good at. My personal choices are TextMate on the Mac and Emacs just about everywhere.

Best Practice

OK, now the preachy bit. As with tools, make sure you need a practice before you use it. I have seen projects adopt practices because they have seen them work well on other projects, without considering the impact or relevance to the current project. Having said that, some things really are universal.

Build servers and pipelines

I have talked a bit about having a build server. If you haven't got one, look into getting one.

If you have one then there are some additional things you should consider.

  • Everything on the server came through version control. OK, maybe everything is a bit steep. You're going to need an OS, runtime, build software and other stuff that is just not practical to put through version control. But perhaps scripting the installation and putting that script in version control would work. It would certainly make it easier to set up another build box.

  • Everything going to production comes from a build server. Yep, no more quick file deploys and patches from dev boxes. If it is important enough to put through testing, it is important enough to have traceability through the build.

In some situations one build server is not enough. Perhaps you are maintaining multiple versions of the application, or there are complex integration points that are tested as part of a different build. (It is often useful to have a fast developer build to provide rapid feedback to the developers and then a longer integration build that makes sure the application works in its intended environment.)

Integration builds

Having mentioned integration builds, I had better expand on what I mean by integration builds and integration as a concept.

Integration and Continuous Integration (CI) (as part of build systems such as CruiseControl etc.) are talked about a lot, but what does this mean in practice, and how best can we make use of these tools?

The words integration and continuous integration cover a lot of concepts. Perhaps the first level of integration, and I believe the original intent, is the integration of the changes made by each developer in the team.

Managing session state in persistent objects

My current web project, like most web projects, needed a way of managing user session state. We wanted a clean separation from the web tier and a way of attaching warnings or annotations to the supplied user data in case the data was not valid.

We decided to store the information supplied in a GET or POST request in document objects. These document objects would then be used by the service tiers to perform the requested operation. Each document object was kept simple by only allowing fields/attributes. The documents would be validated by a validator for each document type, the validators using a utility class of validations to keep the duplicate code count low. The documents can be persisted for long running user transactions.

Perhaps the best way to illustrate how this works is with an example. Let's assume that we have a request handler for a registration service that takes a map of name/value pairs for the values the user has supplied on the registration form. For simplicity the response and error handling paths have been ignored.

import java.util.Map;

public class RegisterRequestHandler {
    private RegistrationService registrationService;

    public RegisterRequestHandler(RegistrationService registrationService) {
        this.registrationService = registrationService;
    }

    public void handlePost(Map<String, String> params) {
        RegisterDocument document = readDocument(params);

        RegisterDocumentValidator validator = new RegisterDocumentValidator();

        if (validator.validate(document)) {
            // Warnings found: persist the annotated document and
            // send the user back to the registration form.
            persistDocument(document);
            redirectToRegisterPath();
        } else {
            registrationService.register(document);
            redirectToSuccessfulConfirmation();
        }
    }

    private void redirectToSuccessfulConfirmation() {
        // ...
    }

    private void persistDocument(RegisterDocument document) {
        // ...
    }

    private void redirectToRegisterPath() {
        // ...
    }

    private RegisterDocument readDocument(Map<String, String> params) {
        RegisterDocument registerDocument = new RegisterDocument(); = new Field(params.get("name"));
        registerDocument.password = new Field(params.get("password"));
        registerDocument.userName = new Field(params.get("username"));

        return registerDocument;
    }
}
The handlePost entry point turns the name/value pairs of parameters into a registration document representing the user's application for an account. The document is validated and, if valid, a request for a new account is passed on to the appropriate service. If the document is not valid, the user is redirected to the registration application form with any warnings identified in the persisted registration document.

The registration document contains the fields we expect from the UI and very simple helper methods.

public class RegisterDocument {
    Field userName;
    Field password;
    Field name;

    public boolean hasWarning() {
        return userName.hasWarning() || password.hasWarning() || name.hasWarning();
    }
}

The field class manages the value and any warnings.

public class Field {
    private String value;
    private Annotation annotation;

    public Field(String value, Annotation annotation) {
        this.value = value;
        this.annotation = annotation;
    }

    public Field(String value) {
        this(value, Annotation.none);
    }

    public String getValue() {
        return value;
    }

    public void addWarning(String warning) {
        annotation = new Annotation(warning, Annotation.AnnotationType.warning);
    }

    public boolean hasWarning() {
        return annotation.getType() == Annotation.AnnotationType.warning;
    }
}

Perhaps choosing the most overloaded name possible (Annotation), this class models the validator's response, allowing warnings or information to be associated with the value. It is used to report problems back to the user.

public class Annotation {
    enum AnnotationType {
        warning, info, none
    }

    private AnnotationType type;
    private String value;

    public static Annotation none = new Annotation("", AnnotationType.none);

    public Annotation(String value, AnnotationType type) {
        this.value = value;
        this.type = type;
    }

    public AnnotationType getType() {
        return type;
    }
}

public class RegisterDocumentValidator {
    public boolean validate(RegisterDocument document) {
        if (isEmpty( {
  "You must supply a name");
        }

        // ...

        return document.hasWarning();
    }

    private boolean isEmpty(String name) {
        return (name == null || name.length() == 0);
    }
}

Any services use the documents as parameters. They are decoupled from the UI and can be more easily unit tested.

public interface RegistrationService {
    void register(RegisterDocument document);
}