Code style

This is an idea I noticed in a reply to a tweet from @garybernhardt.

Tabs or spaces, two or four spaces of indentation, where you place the opening and closing braces, how you break long lines, whether (and in which contexts) you put spaces around operators, how many blank lines you leave between class declarations: all of those things should be irrelevant. They are just visual aids, which:

Should not show up in your diffs when you are reviewing somebody's pull request.
Should be ignored by your version control system; any commit that merely changes whitespace or formatting is a non-change, since nothing changes functionally.
Should be yours to configure: you tell your IDE how you want to look at the code and which code style you want to use, and that never shows up as a change (a commit) in your VCS.
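
A minimal sketch of what "formatting is a non-change" could mean in practice, written in Python for illustration. It assumes that two versions of a file are equivalent whenever they parse to the same syntax tree; the helper name same_code is hypothetical, and this is not a feature of any existing VCS:

```python
import ast


def same_code(old_source: str, new_source: str) -> bool:
    """Return True when both sources parse to the same syntax tree.

    ast.dump() leaves out line/column positions by default, and the
    parser discards comments and whitespace, so only structural
    changes make the two dumps differ.
    """
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


ORIGINAL = "def area(w,h):\n  return w*h\n"
REFORMATTED = "def area(w, h):\n\n    # width times height\n    return w * h\n"
CHANGED = "def area(w, h):\n    return w + h\n"

print(same_code(ORIGINAL, REFORMATTED))  # True: only layout and a comment differ
print(same_code(ORIGINAL, CHANGED))      # False: the operator changed
```

A diff tool built on this idea would skip the reformatted version entirely and only flag the one where the operator changed.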

Files

Your IDE should take care of creating/naming files. It should also take care of where to put the things you create as a programmer: globals, functions, classes, modules, etc.

Ideally, your only responsibility should be telling your IDE what your module structure is. It should deal with the rest.

Also, your VCS should stop caring about files. We don't need to know that 'a file was renamed', nor that 'a function was moved from this file to that file' if the function itself didn't change.

But 'a function was moved from this module to this other module' is something I do care about, as I also care about 'all the changes made to this single function'. Perhaps the VCS should version not files, but a standardized, machine-friendly, machine-diffable representation of the code we write. The IDE or the VCS should know how to generate that representation for you, automatically. The IDE is possibly the better fit, in which case the VCS would be just dumb storage, versioning a standardized representation of the code generated by the IDE.
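
One way to picture versioning functions instead of files, as a rough Python sketch. It assumes that a function's identity is its name and that its "content" is a hash of its syntax tree; function_fingerprints and changed_functions are hypothetical names, and only top-level functions are considered:

```python
import ast
import hashlib


def function_fingerprints(files: dict[str, str]) -> dict[str, str]:
    """Map each top-level function name to a hash of its syntax tree.

    The file a function lives in is deliberately not part of the result.
    Simplification: functions with the same name in two files collide.
    """
    result = {}
    for source in files.values():
        for node in ast.parse(source).body:
            if isinstance(node, ast.FunctionDef):
                dump = ast.dump(node)  # no positions, comments or layout
                result[node.name] = hashlib.sha256(dump.encode()).hexdigest()
    return result


def changed_functions(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Names of functions that were added, removed or actually modified."""
    old, new = function_fingerprints(before), function_fingerprints(after)
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}


BEFORE = {
    "utils.py": "def greet(name):\n    return 'hi ' + name\n\n"
                "def double(x):\n    return x * 2\n",
}
AFTER = {
    "utils.py": "def double(x):\n    return x + x\n",
    "text.py": "def greet(name):\n    return 'hi ' + name\n",
}

print(changed_functions(BEFORE, AFTER))  # {'double'}
```

Here greet moved to another file without changing, so it is not reported; double stayed in place but its body changed, so it is.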

Code analysis

In the past, mostly because of this article from Joel Spolsky and my own experience as a reviewer, I tried to avoid exceptions and preferred error codes, because error codes:

Are visible to the reader.
Make their handling (or the lack thereof) visible to the reader as well.

Because of that, you don't have to zip around your whole codebase when reading a piece of code to check whether a method throws an exception or, in the opposite situation, to check all of the places where an exception is handled and figure out which of them sits above the function you are reading right now.

Writing exception-based code does give you clean code, with centralized/localized handling of errors (when used properly). You basically end up with the code that handles the happy path in a single place, and the code that handles the possible errors somewhere else, wherever the exceptions that might be raised are caught. But that code is awfully hard to read with our current tools, because the error handling is scattered over so many different files and places in your project, often with no obvious way to reach it: many "final" exception handlers (for example, DRF's custom exception handlers) are defined dynamically, just by putting a class name in a string in a settings file somewhere, so you will not find them by following the call chain when reading the code.

But that is a limitation of our current development tools, nothing more. All in all, I think exception-based code is better than error-status-based code, fundamentally because a dead program is better than a limping one: exceptions guarantee that if you did not handle them, your program crashes (and you get to see the stack trace of the crash), instead of carrying on with God-knows-what after something fails and somebody didn't handle the error code properly.

As for reading exception-based code, I wish I could ask my IDE to:

Show me what non-runtime exceptions a given function raises, which is of course the aggregate of whatever exceptions it, or its callees, raise. (This is one of the things that I think Java got right: the compiler tells you that, and the language distinguishes unchecked runtime errors from checked, business-logic-related exceptions.)
Show me the call tree of a given function: who calls it, who calls its caller(s), and so on and so forth until hitting either main or whatever is the boundary for the framework I'm using: e.g., a Rails controller, a Django view, etc.
Show me all of the places where a given exception (or its ancestors) is handled. And show that somehow in the call tree.
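
The first item in that list can be approximated with static analysis. The following is a sketch in Python using the standard ast module, under loud assumptions: it only looks at top-level functions in a single module, only understands raise SomeName(...) and plain name(...) calls, and ignores try/except entirely, so it over-reports anything that is caught along the way. The names direct_info and raised_by are hypothetical:

```python
import ast

SOURCE = '''
def parse(text):
    if not text:
        raise ValueError("empty")
    return int(text)

def load(path):
    if path is None:
        raise FileNotFoundError(path)
    return parse(path)

def main():
    return load("42")
'''


def direct_info(tree):
    """Map each top-level function to (exceptions it raises, names it calls)."""
    info = {}
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        raised, called = set(), set()
        for node in ast.walk(func):
            if isinstance(node, ast.Raise) and node.exc is not None:
                # `raise Foo(...)` wraps the name in a Call; `raise Foo` does not.
                target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
                if isinstance(target, ast.Name):
                    raised.add(target.id)
            elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                called.add(node.func.id)
        info[func.name] = (raised, called)
    return info


def raised_by(name, info, seen=None):
    """Exceptions raised by `name` itself or, transitively, by its callees."""
    seen = set() if seen is None else seen
    if name in seen or name not in info:  # unknown names (builtins, etc.) are skipped
        return set()
    seen.add(name)
    raised, called = info[name]
    result = set(raised)
    for callee in called:
        result |= raised_by(callee, info, seen)
    return result


INFO = direct_info(ast.parse(SOURCE))
print(sorted(raised_by("main", INFO)))  # ['FileNotFoundError', 'ValueError']
```

A real tool would also need type information to resolve method calls and would have to subtract the exceptions handled on the way up, which is exactly the part that is hard to do by hand.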

Also related, and occasionally useful, is being able to see the (for lack of a better name) reverse call tree of a function: which functions it calls, which functions those functions call, and so on and so forth. Of course skipping (optionally) the core language/framework stuff.
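
Both directions of the tree (who calls a function, and what a function calls) can be sketched from the same data. This Python example is only an approximation under stated assumptions: it handles a single module, top-level functions and direct name(...) calls, and the names call_graph, invert and reachable are made up for the illustration:

```python
import ast
from collections import defaultdict

SOURCE = '''
def helper():
    return 1

def view():
    return helper()

def other_view():
    return helper() + 1

def main():
    return view()
'''


def call_graph(source):
    """Map each top-level function to the module-level functions it calls."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    callees = defaultdict(set)
    for func in functions:
        for node in ast.walk(func):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in defined):  # skips builtins and library calls
                callees[func.name].add(node.func.id)
    return callees


def invert(graph):
    """Turn a caller -> callees mapping into callee -> callers."""
    callers = defaultdict(set)
    for caller, targets in graph.items():
        for target in targets:
            callers[target].add(caller)
    return callers


def reachable(start, graph):
    """Everything reachable from `start` by following `graph` edges."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


CALLEES = call_graph(SOURCE)
CALLERS = invert(CALLEES)
print(sorted(reachable("helper", CALLERS)))  # ['main', 'other_view', 'view']
print(sorted(reachable("main", CALLEES)))    # ['helper', 'view']
```

The first query walks upwards from helper to its direct and indirect callers; the second walks downwards from main to everything it ends up calling.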

When analyzing a function, it would also be nice to be able to ask the IDE to show you the dependencies of a variable: where it has been changed in the current function so far (something function-aware, not just a find-in-file), and which other variables/expressions are used to calculate it.
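
A function-aware version of that query could look roughly like this Python sketch. It is a simplification: it only inspects assignment statements inside one named function, treats every name appearing on the left-hand side as assigned, and does not follow the dependencies transitively. The helper name variable_dependencies is hypothetical:

```python
import ast

SOURCE = (
    "def total(prices, tax):\n"
    "    subtotal = sum(prices)\n"
    "    fee = 2\n"
    "    amount = subtotal + fee\n"
    "    amount *= 1 + tax\n"
    "    return amount\n"
)


def variable_dependencies(source, func_name, var):
    """List (line number, names read) for each assignment to `var` in `func_name`."""
    for func in ast.walk(ast.parse(source)):
        if not (isinstance(func, ast.FunctionDef) and func.name == func_name):
            continue
        found = []
        for node in ast.walk(func):
            if isinstance(node, ast.Assign):
                targets = node.targets
            elif isinstance(node, (ast.AugAssign, ast.AnnAssign)):
                targets = [node.target]
            else:
                continue
            if node.value is None:  # bare annotation such as `x: int`
                continue
            assigned = {n.id for t in targets for n in ast.walk(t)
                        if isinstance(n, ast.Name)}
            if var not in assigned:
                continue
            read = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            if isinstance(node, ast.AugAssign):
                read.add(var)  # `x *= y` also reads the old value of x
            found.append((node.lineno, sorted(read)))
        return sorted(found)
    return []


print(variable_dependencies(SOURCE, "total", "amount"))
# [(4, ['fee', 'subtotal']), (5, ['amount', 'tax'])]
```

Unlike a find-in-file, the search is limited to the body of total, and each hit comes with the names that feed into the new value.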

Testing

Somewhat related to the code analysis section: I should be able to modify something in the codebase and have my IDE run only the affected tests, and no others. The IDE should figure out where the thing I modified is tested, maybe from coverage information from previous test runs, static analysis, or whatever.
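
Assuming the coverage information from a previous run is already available as a mapping from each test to the source files it executed, selecting the affected tests is a small set intersection. The sketch below is in Python; the mapping format, the file names and the function name affected_tests are all made up for the example:

```python
def affected_tests(coverage_map, changed_files):
    """Return the tests whose recorded coverage touches any changed file."""
    changed = set(changed_files)
    return sorted(
        test for test, covered in coverage_map.items() if changed & set(covered)
    )


# Hypothetical data: which files each test executed during the last full run.
COVERAGE = {
    "tests/test_cart.py::test_total": {"shop/cart.py", "shop/prices.py"},
    "tests/test_cart.py::test_empty": {"shop/cart.py"},
    "tests/test_users.py::test_login": {"shop/users.py"},
}

print(affected_tests(COVERAGE, ["shop/prices.py"]))
# ['tests/test_cart.py::test_total']
```

File granularity is the coarsest useful choice; recording covered functions or lines instead would narrow the selection further, at the cost of a larger map to keep up to date.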

Tags: programming

In the ongoing saga of "distributed is better"...

I want the documentation for whatever tool I'm using at the moment to be available on my local system, without having to be online. Most of the time, such documentation is available as HTML pages on some website. Making a mirror of such a site to read it offline is not really that hard (I often do). But one of the nice features of such websites is that they also provide a really handy search feature, which of course my local mirror doesn't have, because it depends on the server over there on the Internet.

The goal here is to have a local (as in localhost) mechanism that provides a web search interface for a local mirror of a website. I should be able to type some text in a search form, get a listing of the most relevant matches, and see the corresponding pages. Without having to be online.

Creating a local index for the mirror, and the corresponding web search interface for it, is not really that hard (for morlocks!). You have to perform an initial installation and configuration, only once, and then a couple of steps every time you want to create a new index for a new set of documents/pages.

Note: I'm assuming you use Linux, and more specifically Debian. If not, well, you should :P

Initial configuration

Install Apache, and Xapian Omega:

apt-get install apache2 xapian-omega

Now, create a few directories: one to hold the files we want to index, another for their indexes and another for Omega's config files:

DOC_DIR=/multimedia/documentation
CFG_DIR=/multimedia/omega
IDX_DIR=/multimedia/omega/indexes
mkdir -p "$DOC_DIR" "$CFG_DIR" "$IDX_DIR"

Configure Omega. Create the $CFG_DIR/omega_config file, with the following content:

# Directory containing Xapian databases: this is the value of $IDX_DIR
database_dir /multimedia/omega/indexes

# This value is valid for Debian installations
template_dir /usr/share/xapian-omega/templates

# Directory to write Omega logs to. Make sure that the user used to run
# Apache has write permissions there
log_dir /var/log/xapian-omega

# This value is valid for Debian installations
cdb_dir /var/lib/xapian-omega/cdb

Now configure Apache. This consists mostly of telling Apache where to find the CGI script that performs the search, telling that script where its configuration is via an environment variable, and telling Apache where the documents it should serve (our local mirrors) are. Create a file named /etc/apache2/sites-available/omega, with the following content:

<VirtualHost *:80>
    # Change this to a proper value
    ServerAdmin admin@localhost

    ServerName s.home.org
    DefaultType text/html

    # this is $CFG_DIR/omega_config; adjust appropriately
    SetEnv OMEGA_CONFIG_FILE /multimedia/omega/omega_config

    # this is $DOC_DIR; adjust appropriately
    DocumentRoot /multimedia/documentation
    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>

    # this is $DOC_DIR; adjust appropriately
    <Directory /multimedia/documentation>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>

    ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
    <Directory "/usr/lib/cgi-bin">
        AllowOverride None
        Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
        Order allow,deny
        Allow from all
    </Directory>

    ErrorLog ${APACHE_LOG_DIR}/error.log
    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/access.log combined

</VirtualHost>

Now you must make s.home.org point to the local machine. Edit the /etc/hosts file and add the following line:

127.0.1.1   s.home.org  s

Now enable the new Apache site:

a2ensite omega
/etc/init.d/apache2 restart

Finally, you need a page with a proper search form. Create the $DOC_DIR/index.html file with the following content:

<html>
    <head>
        <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    </head>
    <body>
        <div>
            <form action="/cgi-bin/omega/omega" target="_top" method="GET">
                <input type="search" name="P" value="" size="15">
                <input type="hidden" name="DEFAULTOP" value="and">
                <input type="hidden" name="xFILTERS" value="--O">
                <input type="SUBMIT" value="Search">
            </form>
        </div>
    </body>
</html>

That's the initial configuration.

Adding a website

I'll illustrate the process of adding a website using the Django documentation.

First of all, you should fetch the documentation to your computer. Usually wget would be the tool to use (something like wget -r -np -nc -p -k <your-site>), but in this case, we can just download a zip file with all the documentation: https://www.djangoproject.com/m/docs/django-docs-1.3-en.zip. Fetch that file, and extract it to $DOC_DIR/django:

mkdir $DOC_DIR/django
cd $DOC_DIR/django
wget https://www.djangoproject.com/m/docs/django-docs-1.3-en.zip
unzip django-docs-1.3-en.zip
rm django-docs-1.3-en.zip

Now, create the index for those files. Execute the following command:

omindex --mime-type=:text/html --db $IDX_DIR/django --url /django/ $DOC_DIR/django

That command does the following:

Indexes all the files in $DOC_DIR/django.
Saves all the files required for the index in a new Xapian "database": $IDX_DIR/django.
Configures the search results to always prepend /django/ to the URL of each result, so that it matches the DocumentRoot we defined in the Apache configuration and the directory we created there ($DOC_DIR/django).
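
Since the same command is repeated for every site you add, it can be generated from the site name. The Python sketch below only builds the argument list, using exactly the omindex options shown above; the helper name omindex_command is hypothetical, and the directory values are the ones assumed earlier in this post:

```python
from pathlib import PurePosixPath


def omindex_command(site: str, doc_dir: str, idx_dir: str) -> list[str]:
    """Build the omindex invocation for the mirror stored in doc_dir/site."""
    return [
        "omindex",
        "--mime-type=:text/html",
        "--db", str(PurePosixPath(idx_dir) / site),   # one Xapian database per site
        "--url", f"/{site}/",                         # must match the path under DocumentRoot
        str(PurePosixPath(doc_dir) / site),
    ]


command = omindex_command(
    "django", "/multimedia/documentation", "/multimedia/omega/indexes"
)
print(" ".join(command))
# omindex --mime-type=:text/html --db /multimedia/omega/indexes/django --url /django/ /multimedia/documentation/django
```

The resulting list can be handed to subprocess.run as-is, which avoids any shell quoting issues if a site name ever contains unusual characters.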

That's all you have to do: fetch the docs, put them somewhere, create an index for them. You can now visit http://s.home.org and search those documents.

However, there is still some room for improvement: the search on that page checks all the databases, which is to say, all the websites you might have indexed. You can narrow the scope of the search by passing a DB GET parameter to the CGI script, naming the database to search. The value of this parameter can be any of the directory names available in $IDX_DIR.
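
For illustration, this is how such a search URL could be assembled in Python. The parameter names (P, DB, DEFAULTOP, xFILTERS) and their values are the ones used by the search form earlier in this post; the function name search_url is made up:

```python
from urllib.parse import urlencode


def search_url(query, database=None, base="http://s.home.org/cgi-bin/omega/omega"):
    """Build the Omega search URL, optionally restricted to one database."""
    params = {"P": query}
    if database is not None:
        params["DB"] = database  # a directory name under $IDX_DIR
    params["DEFAULTOP"] = "and"
    params["xFILTERS"] = "--O"
    return f"{base}?{urlencode(params)}"


print(search_url("query sets", "django"))
# http://s.home.org/cgi-bin/omega/omega?P=query+sets&DB=django&DEFAULTOP=and&xFILTERS=--O
```

Leaving out the database argument produces the same request the plain search form sends.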

With this in mind, you could add a select field to the form at http://s.home.org and add a new entry every time you add a new website, or generate that field with a script that reads all the directories in $IDX_DIR on each request.
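
The script variant could be as small as the following Python sketch, which renders the select field from whatever directories exist in the index directory. It assumes every subdirectory there is a Xapian database; the function name database_select is hypothetical, and wiring it into a CGI script or a template is left out:

```python
import html
import os
import tempfile


def database_select(idx_dir):
    """Render a <select name="DB"> with one option per directory in idx_dir."""
    names = sorted(entry.name for entry in os.scandir(idx_dir) if entry.is_dir())
    options = "\n".join(
        f'    <option value="{html.escape(name)}">{html.escape(name)}</option>'
        for name in names
    )
    return f'<select name="DB">\n{options}\n</select>'


# Demonstration against a throwaway directory standing in for $IDX_DIR.
with tempfile.TemporaryDirectory() as idx_dir:
    for site in ("django", "python"):
        os.mkdir(os.path.join(idx_dir, site))
    print(database_select(idx_dir))
```

Because the options are read from disk on every call, indexing a new website is enough to make it appear in the form.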

Or, if you are lazy like me, you can just add a custom search to your browser, one for each database you want to use. With Chromium, it goes like this: Preferences -> Manage Search Engines -> Other search engines, and add a search engine for django, with the keyword dj, and the URL http://s.home.org/cgi-bin/omega/omega?P=%s&DB=django&DEFAULTOP=and&xFILTERS=--O. Note both the DB=django and the P=%s GET arguments. With this in place, you can type dj <whatever you want to search for> in Chromium's address bar, and jump straight to the results.

There you go: lightning-fast, offline access and search in your documentation.

Addenda

There are tools that do precisely this, like dwww, or doc-central, tailored to index all the documentation available in your system. I tried some of them, and found them inadequate: I either didn't like the search interface (which forced me to use words of more than 3 characters (!)), or the way of adding new documents. But those might be useful to you, so I'm mentioning them.

Tags: programming
