Djun Kim's blog

Thousands of lines of code, millions of dollars

Djun Kim - November 16, 2007 - 5:45pm

Inspired by Ohloh, and a need to start scoping migration of a large cluster of Drupal sites from Drupal 4.7 to Drupal 5.x/6.x, I've started some work on a static analysis / code metrics tool specifically intended for PHP / Drupal.

Code is available from Bryght's public svn repository, with repo URL https://svn.bryght.com/dev/svn/scripts/metrics.

The metrics.php program is currently implemented as a command line script, which is pointed at a directory containing code to be analyzed:

% php metrics.php code_dir

Reports are generated on standard output.

History

This code started from the sloccount.php script written by Arto Bendiken. I've tried to extend the script by making it more comprehensive and precise, but also more flexible and general. The architecture is intended to be pluggable, allowing users to easily write analysis tools for new types of files.

Currently there are two supported file types: generic, and PHP.

The 'generic' code analysis supports 'C' style code, with in-line comments initiated by '//' and comment blocks set off by '/*' and '*/'. This gives rough metrics for JavaScript and CSS files.
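
As a rough illustration of this kind of 'generic' counting, here is a minimal sketch. It is not taken from metrics.php; the function name generic_line_counts() and the three categories are assumptions made for the example. It ignores comment markers inside strings, and a line with code followed by a trailing comment counts as code.

```php
<?php
// Hypothetical helper (not part of metrics.php): classify each line of
// C-style source as code, comment, or blank.
function generic_line_counts($source) {
  $counts = array('code' => 0, 'comment' => 0, 'blank' => 0);
  $in_block = FALSE;
  foreach (preg_split('/\r\n|\r|\n/', $source) as $line) {
    $line = trim($line);
    if ($in_block) {
      // Inside a block comment: the line is a comment until '*/' appears.
      $counts['comment']++;
      if (strpos($line, '*/') !== FALSE) {
        $in_block = FALSE;
      }
    }
    elseif ($line === '') {
      $counts['blank']++;
    }
    elseif (strpos($line, '//') === 0) {
      $counts['comment']++;
    }
    elseif (strpos($line, '/*') === 0) {
      $counts['comment']++;
      // A block comment may open and close on the same line.
      if (strpos($line, '*/', 2) === FALSE) {
        $in_block = TRUE;
      }
    }
    else {
      $counts['code']++;
    }
  }
  return $counts;
}
```

Because '//' is not a comment in CSS, and a block comment can begin partway through a line, the numbers from a sketch like this are approximate.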

The PHP analyzer uses PHP's tokenizer and a rudimentary parser to obtain additional information about PHP code.
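
To give a sense of what the tokenizer makes possible, the sketch below tallies two token types using PHP's token_get_all(). The function name php_token_metrics() and the choice of metrics are assumptions for illustration only; the actual analyzer in metrics.php may work quite differently.

```php
<?php
// Hypothetical helper (not part of metrics.php): count 'function' keywords
// and comments in a string of PHP source using the tokenizer extension.
function php_token_metrics($source) {
  $metrics = array('functions' => 0, 'comments' => 0);
  foreach (token_get_all($source) as $token) {
    // Single-character tokens such as '{' or ';' are plain strings.
    if (!is_array($token)) {
      continue;
    }
    switch ($token[0]) {
      case T_FUNCTION:
        // Counts every 'function' keyword, named or anonymous.
        $metrics['functions']++;
        break;

      case T_COMMENT:
      case T_DOC_COMMENT:
        $metrics['comments']++;
        break;
    }
  }
  return $metrics;
}
```

Unlike line-based counting, the tokenizer is not fooled by '//' or '/*' appearing inside string literals.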

Analysis is completely separated from reporting: the analysis phase builds an array of metrics per file, where each metric can itself be structured.
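
One way to picture this separation is sketched below. The function names, the array layout, and the use of in-memory strings in place of files are all assumptions made for the example, not the structure metrics.php actually uses.

```php
<?php
// Analysis phase (hypothetical): build a nested array of metrics per file.
// $sources maps a file name to its contents.
function analyze_sources($sources) {
  $metrics = array();
  foreach ($sources as $name => $text) {
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $blank = 0;
    foreach ($lines as $line) {
      if (trim($line) === '') {
        $blank++;
      }
    }
    $metrics[$name] = array(
      'bytes' => strlen($text),
      // A metric can itself be structured.
      'lines' => array('total' => count($lines), 'blank' => $blank),
    );
  }
  return $metrics;
}

// Reporting phase (hypothetical): reads only the metrics array and never
// touches the source text.
function report_text($metrics) {
  $out = '';
  foreach ($metrics as $name => $m) {
    $out .= sprintf("%s: %d lines (%d blank), %d bytes\n",
      $name, $m['lines']['total'], $m['lines']['blank'], $m['bytes']);
  }
  return $out;
}
```

With this split, a different report format (HTML or CSV, say) only needs a new reporting function over the same array.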


Tweaking Drupal Search

Djun Kim - October 8, 2007 - 10:51pm

I've recently been doing work extending and adapting Drupal's search module, and I thought I'd take the opportunity to show just how easy it is to tweak search module's indexing behaviour.

One of the issues I was asked to address is the default behaviour of search module with respect to hyphenated words. One might reasonably expect that searching for 'intuitive' or 'counter' would find a post containing the hyphenated word 'counter-intuitive'. However, search module will by default not return a match unless you type 'counter-intuitive' or 'counterintuitive'.

Reading through the code for search module, we find that embedded dots, underscores and hyphens are simply stripped out of words, to allow meaningful search behaviour for URLs and acronyms.

Fair enough - we'd like searches for 'F.B.I.' to match documents containing 'FBI', and vice-versa. But it is counter-intuitive that searching for the constituents of a hyphenated word won't necessarily find posts containing that word.

Fixing this seemed like it might involve a lot of work. I was happy to discover that the architecture of search module allowed me to enable the behaviour I wanted for hyphens, without breaking the nice default behaviour for such things as acronyms, with a half-dozen lines of code.

The key to doing this is the search_preprocess() function, which invokes the hook_search_preprocess() hook in all modules that implement it. The hook takes a string (initially the text to be indexed) and returns a transformed version of the text. Fortunately, this hook is invoked before hyphens and such are stripped out. It is possible to imagine many applications of this kind of transformation, but it is easy to see that we can use it to append the individual words constituting a hyphenated compound to the text.
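
A minimal sketch of such a hook implementation follows. The module name 'hyphensearch' and the regular expression are assumptions made for illustration, and this is not necessarily the code the full post arrives at. Only the hook's shape (a string in, a transformed string out) comes from the description above.

```php
<?php
/**
 * Sketch of hook_search_preprocess() for a hypothetical 'hyphensearch' module.
 *
 * Appends the parts of each hyphenated compound to the text, so that
 * 'counter-intuitive' is also indexed under 'counter' and 'intuitive'.
 */
function hyphensearch_search_preprocess($text) {
  // Find runs of word characters joined by hyphens.
  if (preg_match_all('/\b\w+(?:-\w+)+\b/', $text, $matches)) {
    foreach ($matches[0] as $compound) {
      // Keep the original text intact and append the separated words.
      $text .= ' ' . str_replace('-', ' ', $compound);
    }
  }
  return $text;
}
```

Since the original compound is left in place, search module's default stripping should still index 'counterintuitive' as before, and text containing only dots, as in 'F.B.I.', passes through unchanged.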

Categories: hyphenation · search

Maxwell's Demon

Djun Kim - July 31, 2006 - 9:34pm
I thought I'd set down a few thoughts about what I will be trying to accomplish at Bryght. 

The analogy came to me while trying to write a bio for the Bryght Team page. I described myself as a stand-in for Maxwell's demon, and since this particular demon has not as far as I know made an appearance on Buffy, I feel obligated to explain. 

The second law of thermodynamics implies, roughly speaking, that the disorder in the universe increases over time. James Clerk Maxwell, the famous Scottish physicist, proposed a thought experiment which he claimed refuted this second law. The setup is this: a tiny demon stands at a doorway connecting two chambers A and B which initially are filled with gas at the same temperature and pressure (i.e., the average speed of molecules in A is the same as the average speed of the molecules in B). The demon observes the molecules flying about and bouncing off the door, and opens the door to allow fast-moving molecules to pass through from A to B, and to allow slow-moving molecules to pass through from B to A. In this way, over time, the average speed of molecules in B will increase, that is, B will get hotter, while A cools off. Maxwell claimed that all of this happened 'without any work being done on or by the system', which violates the second law.
Categories: chaos · entropy · order · thermodynamics · work