Lee on March 14th, 2009

Hmm. Qt’s kinetic is pretty cool. Of course, Amigas could easily do something very similar back in the 80’s (with BOBs), but it’s a big step forward for more “modern” systems ;)


This is going to be a slightly cheesed off post. Better skip it if you’re not thick-skinned. Back to happier django posts soon… if I can forgive this.

So I’m deploying my first real django app to a live server, with modwsgi. ModWSGI itself isn’t great, as it insists on a single python version per apache server (and a global WSGIPythonHome rather than per vhost), thereby negating most of the benefits of virtualenv [update: seems I can use site.addsitedir() to get around this]. But my real problem with this deployment has not been modwsgi — that’s the nice, smooth part.
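A minimal sketch of that site.addsitedir() workaround, as it might sit at the top of a WSGI script. The virtualenv path is a made-up placeholder rather than anything from the real deployment, and the helper name is purely illustrative:

```python
import os
import site
import sys


def add_virtualenv(site_packages):
    """Put a virtualenv's site-packages directory on sys.path.

    site.addsitedir() also processes any .pth files found in the
    directory, which a plain sys.path.append() would not do.
    Returns True if the directory ended up on sys.path.
    """
    site.addsitedir(site_packages)
    wanted = os.path.abspath(site_packages)
    return wanted in [os.path.abspath(p) for p in sys.path]


# Hypothetical location; substitute the real virtualenv for each vhost.
VENV_SITE_PACKAGES = "/srv/example/venv/lib/python2.6/site-packages"
add_virtualenv(VENV_SITE_PACKAGES)
```

Each vhost's WSGI script can point at its own directory this way, though the interpreter itself is still shared across the whole apache server.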

MySQL table name length limits

The problem is that django behaves entirely differently when deployed under modwsgi than it does when tested under runserver(_plus).

First off, I did the obvious: copy the site over to the server, and change the database config from a local SQLite file (used for development) to a MySQL database on the live server. The next step is to run:

./manage.py syncdb

to initialise the database tables. This led to the first problem, which was a huge one. Django creates table names that are too long for MySQL (the limit is 64 characters, I believe) — the most common open source DB around! As a result, it simply doesn’t work with that database backend:

  File "/var/lib/python-support/python2.6/django/core/management/commands/syncdb.py", line 80, in handle_noargs
    cursor.execute(statement)
  File "/var/lib/python-support/python2.6/django/db/backends/util.py", line 19, in execute
    return self.cursor.execute(sql, params)
  File "/var/lib/python-support/python2.6/django/db/backends/mysql/base.py", line 83, in execute
    return self.cursor.execute(query, args)
  File "/var/lib/python-support/python2.6/MySQLdb/cursors.py", line 166, in execute
    self.errorhandler(self, exc, value)
  File "/var/lib/python-support/python2.6/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1059, "Identifier name 'preferred_beneficiary_id_refs_basebeneficiary_ptr_id_24644201ef7702ef' is too long")
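The offending identifier is easy to check by hand. Here is a quick sketch, taking the 64-character limit as given; the function name is my own, not part of Django or MySQLdb:

```python
# MySQL's maximum identifier length, as discussed above.
MYSQL_MAX_IDENTIFIER_LENGTH = 64


def too_long_for_mysql(identifier, limit=MYSQL_MAX_IDENTIFIER_LENGTH):
    """Return True if the identifier exceeds the given length limit."""
    return len(identifier) > limit


# The identifier quoted in the traceback above.
name = "preferred_beneficiary_id_refs_basebeneficiary_ptr_id_24644201ef7702ef"
print(len(name), too_long_for_mysql(name))
```

A check along these lines, run against generated names during development, is the kind of early warning that would have helped here.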

The only solution seems to be shortening the names of all your models, which is a bit useless after you’ve just completed testing your app and are ready to deploy it, or using another backend. I’ve read that the problem also exists with PostgreSQL. Surprisingly, it existed in Oracle too, much more severely, but that has been fixed:

Naming issues

Oracle imposes a name length limit of 30 characters. To accommodate this, the backend truncates database identifiers to fit, replacing the final four characters of the truncated name with a repeatable MD5 hash value.
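To illustrate the idea in that quote, here is a rough sketch of hash-suffixed truncation. It is not Django's actual code; the function name, defaults, and the choice of hex digits for the suffix are all assumptions made for the example:

```python
import hashlib


def truncate_name(name, max_length=30, hash_length=4):
    """Shorten an identifier, ending it with a short MD5-derived suffix.

    Names that already fit are returned unchanged. The same input always
    gives the same output, so the shortened name is repeatable.
    """
    if len(name) <= max_length:
        return name
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:hash_length]
    return name[: max_length - hash_length] + digest


long_name = "preferred_beneficiary_id_refs_basebeneficiary_ptr_id_24644201ef7702ef"
print(truncate_name(long_name))
print(truncate_name(long_name, max_length=64))
```

The suffix keeps two long names with the same prefix from colliding in most cases, which plain truncation would not.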

I don’t have (or want) Oracle available of course, so that leaves a file-based backend for now, and re-naming lots of core models as soon as I get a bit of time. Not fun. A simple assertion check in the Django code against my model names while I developed the thing (or a few unit tests by Django developers) would have saved me all this deployment hassle.

But that’s just the start of the trouble, it seems.

Middleware ordering

I’m also seeing lots of problems related to middleware and (probably) to their load-order. Lovely errors like “ModWSGIRequest.user is not defined”, which is pretty fundamental to Django, considering that just about every view uses the .user attribute from its first request argument, and that every template in a dynamic app that depends on the logged-in user also depends on it.

All of that would be fine, if it was my mistake. The thing that gets me is, this load-order stuff was almost entirely copied from django’s (and satchmo’s) examples, and worked fine in the development test server. If anything, I’d expect the test server (especially with debugging enabled) to be the more fussy target.

I’m still getting to the bottom of all this, and I’m not optimistic about solving it soon, since, as others have pointed out, there is no canonical reference for django middleware order, even though the docs often warn that it’s very important. Different examples give different orders (or say things like “must be last”, despite other middleware having the same vague requirement).
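One ordering constraint that is documented: the auth middleware, which is what attaches .user to the request, relies on the session middleware having run before it. Below is a sketch of a settings fragment with a sanity check. The comes_before helper is my own illustration, not something Django provides, and this ordering is not necessarily the cause of the error above:

```python
# settings.py fragment, using the Django 1.0-era setting name.
MIDDLEWARE_CLASSES = (
    "django.middleware.common.CommonMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
)


def comes_before(first, second, middleware=MIDDLEWARE_CLASSES):
    """Return True if both entries are present and `first` is listed earlier."""
    return (
        first in middleware
        and second in middleware
        and middleware.index(first) < middleware.index(second)
    )


# Hypothetical check: sessions must be set up before auth looks up the user.
assert comes_before(
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
)
```

A handful of assertions like this in a test module would at least catch an accidental reshuffle between the development and live settings.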

Then there are lovely issues with satchmo. Satchmo is a whole other beast though, so I won’t get into those right now.

Suffice to say, deploying django after developing is not at all a straightforward matter of uploading code and initialising or updating the live database. If I ever develop with django again, I’ll almost certainly use a development server with a setup identical to the live one. Runserver just doesn’t cut it. In reality, I’m very much thinking of Pylons or Turbogears for future projects.


Lee on March 11th, 2009

So I upgraded my desktop to 6GB recently (from 2GB), mainly for virtual machines, but also just because I hate virtual memory. That necessitated switching to full-time use of an AMD64 edition. Up to then, I’d been jumping back and forth trying different combinations of 64-bit OS and 64-/32-bit apps, given issues with drivers, software availability, etc. Since 64-bit flash came out a while back, that pretty much closed the issue. However, I discovered after upgrading that the drivers for my onboard Marvell ethernet didn’t work with that much memory installed, even though it did work fine in 64-bit previously. Annoying. Still, I had an unused USB ethernet adapter lying around, and it’s mainly used for broadband, so no performance issue there really.

I also had bought an external drive (Western Digital MyBook Pro) about a year ago, for backups. It was the most reasonable one I could find that offered USB and firewire. Performance was abysmal though, and it lasted literally a week, before packing up. Seems I’m far from the only one who discovered their cheap PSUs. Tracked down a compatible PSU after a while, and it kind of worked for another week, but then wouldn’t boot any more. Cracked the case open, took the drive out, and installed it in my PC, where it’s been running nicely ever since. Their actual drives are nice, but after that so-called “Pro” product, I’ve scratched Western Digital off my manufacturers list.

Anyway. I used that for backup for quite a while, but don’t really have that much actual personal, irretrievable data to back up, as opposed to stuff I can download from a distro site or wherever. I also have a small laptop at home running as a server, and a few servers online. Between them, I have more than enough sources and targets for rdiff-backup. So yesterday, I converted my desktop to RAID-1 (mirroring) with mdadm, and LVM on top. Between that and the memory, things seem pretty zippy so far :)

One thing with doing that is that Debian Sid doesn’t seem to be able to detect mdadm/LVM and build a decent initrd for them right now. When booting, it gets to the initrd stage, but then I have to manually run mdadm to start the raid array, and vgchange to enable the logical volumes. After that, I can exit the initrd shell, and boot will continue as normal, so it’s not too much hassle. There’s a debian bug filed for this, but I can’t be bothered tracking it down right now. Seems to be just a regression, so I’m sure it’ll get fixed soon.

Oh, while I’m on about upgrades, I also got me an Ergotron MX Desk-mount Monitor Arm. Very nice bit of kit, and great for moving the monitor around when your neck gets sore or something. The only thing is, I mainly got it for swapping to portrait orientation (with XRandR), but I haven’t used that half as much as I expected to, for a variety of (mainly technical) reasons. Still, I’m getting good use out of it, so it definitely fits in the “upgrade” category :)


Lee on March 9th, 2009

Aaron Toponce posted some interesting “libraries of congress” analogies about the size of the IPv6 address space. I loved how he said that “18,446,744,073,709,551,616…may not look large” compared to the (less than) 2^32 addresses available in IPv4 — he seems to think about future compatibility in the same way I do :)
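For reference, the figure quoted there is 2^64, and the full IPv6 space is 2^128 addresses against 2^32 for IPv4. A few lines of arithmetic to confirm the relationship (the variable names are just for illustration):

```python
# The number quoted from the article, with the commas removed.
quoted = 18446744073709551616

ipv4_total = 2 ** 32   # upper bound on IPv4 addresses
ipv6_total = 2 ** 128  # size of the IPv6 address space

print(quoted == 2 ** 64)
print(ipv6_total // ipv4_total == 2 ** 96)
```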

Anyway, I did find the article a little disappointing. When I read up to the first visualisation of the currently used IPv4 space, I thought the rest of the article would be about the uptake of IPv6 by comparison, otherwise known as “is it happening” :) Instead it was really just about the size of IPv6 as designed.

I know that many OSes and distros are installing an IPv6 stack now, but how many are actually using it for browsing, tunnels, crypto, etc.? What rate is this happening at, and when are we projected to have IPv6 as pervasive as IPv4 is now? It’s a large address space, but how does actual usage of those IPs compare to the expected usage? We thought IPv4 was big too, but I guess we didn’t count on people putting toasters on the net, and other people owning huge unused blocks.

I recently signed up with SixXS, but haven’t yet bothered to do anything with it. Mainly that’s just lack of time, but I’ve also been surprised a little by my lack of interest. For me, IPv6 is still just an interesting experiment, of which there are many, and it gets low priority.


Lee on March 8th, 2009

It seems that Python 3.1 alpha 1 has been released already. A quick scan of the NEWS file reveals some nice performance improvements in there:

  • IO stuff reimplemented in C
  • new garbage collector
  • faster Unicode handling
  • computed gotos (which I’ve never heard of, but which apparently offer a 20% speedup on supported compilers)

Things like this keep shocking me in python’s release notes though:

- Issue #4707: round(x, n) now returns an integer if x is an integer.
Previously it returned a float.

and from Python 3.0.1’s NEWS:

- Issue #4998: Decimal no longer subclasses from or is registered to
numbers.Real. Instead, it is registered to numbers.Number so that
isinstance(d, Number) will work.

In a release of the C standard library or something, changes like that would mean API and ABI breakage. With python’s flexible typing, it’s not so much of an issue, I guess. Still, it would be nice if someone had sat down and thought about the math a bit more, and been sure about what was right from the beginning. Working out if Decimals should be Real numbers (or, indeed, if Reals should be Numbers) shouldn’t be this hard, should it?
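Both changes are easy to see from an interpreter. A short sketch, assuming a Python 3 release that includes the two fixes quoted above:

```python
import numbers
from decimal import Decimal

# Issue #4707: rounding an int with an explicit ndigits stays an int.
rounded = round(1234, -2)
print(type(rounded).__name__, rounded)

# Issue #4998: Decimal is registered as a Number, but not as a Real.
d = Decimal("1.5")
print(isinstance(d, numbers.Number), isinstance(d, numbers.Real))
```

Code that tests for numbers.Real to decide whether something is "a number it can do arithmetic on" will therefore skip Decimals, which is exactly the sort of behaviour change that would be an API break elsewhere.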

Hmm. I was sure Java would have gotten this right, but it seems not:

class java.lang.Number
* class java.lang.Double
* class java.lang.Float
* class java.lang.Integer
* class java.lang.Long

So longs aren’t integers, and Floats and Doubles are unrelated, except for being some kind of Number? Oh well. It’s not like math is important in computer science, right? ;)

Anyway. I’m trying to keep track of Python 3.x, but until a lot more libraries are available that run with it, it’s mostly a far off dream. Shame, as I’ve been dying to dump 2.x in favor of the new Unicode stuff :)


Lee on March 8th, 2009

I seem to be in a love/hate relationship with Git. On the one hand, it’s… well, horrible, usability-wise, with all its unexpected behaviors and weird terminology for well known concepts. On the other hand, it does nice things like this when changes are committed:


...
rename noodil.py => kit_library/hello.py (71%)
mode change 100755 => 100644
rewrite noodil.py (95%)
...

What I had here was one file, noodil.py, which was an initial piece of executable test code. Moving on a bit, I wanted to make the main executable a wrapper that could run other code, with the initial test code becoming a hello.py demo script. The way I accomplished this was to copy noodil.py to kit_library/hello.py, remove the few startup/teardown lines, and then in the noodil.py file, leave essentially JUST those startup/teardown lines. A little bit of extra editing was done to each file, of course.

Anyway, that “rename (71%) and rewrite (95%)” is a pretty good summary of what happened. The mode change, too, is there indicating that I copied the file, but that it’s no longer executable in its new location (actually I saved a second copy from my editor). It’s probably nothing much more than a per-line checksum, but this stuff really does give an impression of git understanding and tracking my work at a high level. I’m impressed :)
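To get a feel for what a line-based similarity percentage looks like, here is a stand-in using Python's difflib. It makes no claim to match how git actually computes its figures, and the file contents are made up, loosely modelled on the split described above:

```python
import difflib


def similarity_percent(old_lines, new_lines):
    """Return a rough 0-100 similarity score between two lists of lines."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return int(round(matcher.ratio() * 100))


# Hypothetical contents: one original script, split into a demo and a wrapper.
original = ["import sys", "setup()", "print('hello')", "print('world')", "teardown()"]
demo = ["import sys", "print('hello')", "print('world')"]
wrapper = ["setup()", "run_script()", "teardown()"]

print(similarity_percent(original, demo))
print(similarity_percent(original, wrapper))
```

The file that kept most of the original lines scores higher, which is the same intuition behind calling one file a rename and the other a rewrite.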

For what it’s worth, Noodil is my fledgling python project to bring a combination of Unix pipes, Automator, Powershell, and other tools to the GUI in a new way. More on that later, when I have something useful to demonstrate — other than the unquestionably useful Hello World, of course :)


Lee on March 6th, 2009

Hmm. Chrome-center has an article up on the performance of VIA Chrome chips on Windows and Linux for Nexuiz, with pretty charts showing that Nexuiz runs faster on Linux. I think this is way off.

For one thing, the Xorg and video driver guys are busy re-writing the entire Linux video stack because it’s entirely out of date.

For another thing, Nexuiz is an open source game. It makes some sense that performance would be more optimized on open source platforms.

Finally, I’d like to know what render settings are being used (both by default and by choice in the drivers) on both Windows and Linux.


Jeffrey Way posted a screencast on NetTuts with the tongue-in-cheek title, How I Can Code Twice as Fast as You.

The basic premise is that repeating long blocks of code known as boilerplate that are always the same (but are required nonetheless) is time-consuming, and that it’s more efficient to use an editor with text expansion to get around this. There’s a little bit of merit in this under certain circumstances, and Jeffrey’s example was possibly contrived. However, I find it quite a wrong-headed approach to development under most circumstances — so much so, that it’s a pet hate of mine at the moment. I’m going to bemoan it a bit then, and in the process, I’ll also attempt to be constructive, sketching out an altogether nicer alternative.

Text Expansion as Proposed

The screencast advocated text expansion shortcuts allowing for:

jquery

followed by a key (such as tab) to expand to something very specific like:

<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>

and:

starthtml

to expand to a whole bunch of HTML boilerplate, something like this:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<link href="css/style.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="wrapper">
</div>
</body>
</html>

It then goes on to fill this page with more repetitive but quick code entry using further expansions, such as the jquery expansion above, and a startform keyword that expands to five lines of form code, with sample inputs for requesting a name and email address, and empty attributes such as action and method.

Repetition is the Devil

The problem with this is that, if you’re repeating yourself, you’re wasting time and introducing bugs and inconsistencies. No one should be writing boilerplate over and over again, much less HTML boilerplate. By very definition, this text expansion is proposed to save typing repetitive text. Yet, it still doesn’t shrink from creating all that repetitive text by a more shadowy route. All this repetitive text still has to be read and maintained later.

Let’s say we do use this method. Let’s say we build a simple website: a welcome page, a news page, a contact page, an about page, and a few random pages of content. We start with the welcome page. Open a text editor, type our starthtml tag, and we have the basic code for a page. We create our CSS to make it look pretty, type out our nice welcome header making sure to use proper HTML header tags so our CSS can style it, type “Welcome to blahblah...” text between the divs, save in the proper place, and we’re done.

Page two then: About. Same again: a quick starthtml expansion to get us started, fill in the heading, the content. Oh, we just noticed the title tag in the header. Let’s fill that in too. About. OK, done. Save in the right place. Oh, better go fix the title tag on the first page, too. And link it to the second page. Some link expansion to <a href="about.html">About</a>, of course. Oh, better go to the About page and link back to the welcome page now. Hey… these Welcome and About links are turning into a navbar, so we should standardise the look of this across both files. That’ll mean going to both files, adding a div with a particular name, adding the actual links, making sure they’re in the same order in both files, making sure the appropriate link is disabled and emphasized in one file while being a clickable link in the other, etc.

OK, so how many steps was that again? Relative to how many pages of content? Already, we’ve jumped between files 3 times, missed some important title tags, had a slight chance of inconsistency by missing the title tag in one file or another, and other inconsistencies by wondering where to put the navigation, how to style it, etc. Two files, and it’s already a lot of trouble, with lots of potential for mistakes.

Do you really want to continue building your site like this? Even when you have enough pages and sites to work on that it matters how fast you can type a form? This is bordering on insanity, or at the very least, masochism.

Single-Sourcing: Templates and dynamic, server-side code

Any time you need to repeat code, it should be inherited from a single source; a single template that gets inherited by all the other, much simpler, focused templates. The other templates should look something like:

(inherit base template)
(start content section)
Specific content for this page
(end content section)

And then you’re done. What’s more, you can change the entire layout of a site by just modifying the base template, with every page being updated for perfect consistency. This is exactly how templates work in Django, and many other modern web development frameworks. Even in an embedded HTML scripting language like PHP, or in the lowly Server-Side Includes (SSI) you can do this easily.
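Here is a tiny sketch of the single-sourcing idea using nothing but Python's standard library. It is a stand-in for a real template engine, not Django's syntax, and the page names and content are invented for the example:

```python
from string import Template

# The one base template that every page gets its layout from.
BASE = Template(
    "<html>\n"
    "<head><title>$title</title></head>\n"
    "<body>\n"
    '<div id="wrapper">\n'
    "$content\n"
    "</div>\n"
    "</body>\n"
    "</html>\n"
)

# Each page only supplies what is specific to it.
PAGES = {
    "welcome": {"title": "Welcome", "content": "<h1>Welcome to blahblah...</h1>"},
    "about": {"title": "About", "content": "<h1>About</h1>"},
}


def render(page_name):
    """Fill the shared base layout with one page's title and content."""
    return BASE.substitute(PAGES[page_name])


print(render("about"))
```

Adding a navbar now means editing BASE once; every page picks it up the next time it is rendered, and nobody can forget a title tag because the layout asks for one.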

Beyond Templates

Going beyond templates, you can also maintain a complete project layout in a version control system. Let’s say your sites always include CSS, a contact page, a welcome page, and a few images. You can easily write this project once, laying out the files:

  • css/
    • style.css
  • images/
    • designed_by_logo.jpg
  • templates/
    • base.shtml
  • welcome.shtml
  • about.shtml

And then commit the whole thing to version control. That’s your complete site template ready to use. Now, when you want to start a project, you can just clone the template into a new directory, edit the welcome and about pages, and change the colors in the stylesheet, say.

That amounts to about two minutes to create a basic but complete (and maintainable) website.

Minimising Maintenance with Version Control

Why is this so maintainable? Well, it’s much easier to maintain work that’s simple, clear, and has been built in a structured way — especially if that structure is shared not just across one project, but across all similar projects. Let’s say we’ve been busy web developers, and have a hundred sites to update with a new logo. This is a slightly contrived example, since (if you’re smart about it) you’d probably just link to a single logo image on your own website, from each client’s website.

How would you go about this logo update with the text expansion method? Best case scenario, you’d just do a search and replace across all the sites. You might open a text editor, and tell it to replace:

<img src="graphics/designed_by_logo.jpg">

With:

<img src="graphics/designed_by_new_logo.jpg">

Simple enough, right?

But what if the edit isn’t so simple? What if some of your customers’ contracts say you can’t update their site graphics without a long-winded approval process, or during the shop’s sale seasons, but you want to update the rest? Fine, you update some, but not others. A bit more work separating the different sites to replace text on, but nothing insurmountable. Just a lot of donkey work.

Let’s imagine though, that a law came into force (in reality, it already has) requiring that all your site graphics also have text descriptions for blind users. You’d have to update the logo descriptions by adding an alt attribute to each img tag. Let’s say you do these updates at different times for some sites, and do them in a slightly different, but largely equivalent way. Now, you have some sites with img tags that look like:

<img src="graphics/designed_by_new_logo.jpg" alt="Designed by BobInc (Company Logo)">

and some that look like

<img alt="Logo: Designed by BobInc" src="graphics/designed_by_logo.jpg">

Now, you’ve broken at least two aspects of your simple logo:

  1. your company’s branding is no longer consistent, using different text descriptions in different places
  2. your code is physically different in different files, causing problems for tools like search and replace.

If you now look at one of these files, and decide to remove the logos altogether, your search and replace for <img alt="Logo: Designed by BobInc" src="graphics/designed_by_logo.jpg"> will miss some of the logos. You have just lost the battle to maintain your sites. Your site code is now crufty and no longer maintained. You could well be in breach of a maintenance contract at this point, and if you go further, deleting that “old” logo, you now have highly visible broken links on some of your sites (broken links as an advert for your web firm, that is).
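The failure is easy to reproduce. A small sketch using the two img variants shown earlier; the variable names are just for illustration:

```python
# The two hand-edited variants from the examples above.
site_a = '<img src="graphics/designed_by_new_logo.jpg" alt="Designed by BobInc (Company Logo)">'
site_b = '<img alt="Logo: Designed by BobInc" src="graphics/designed_by_logo.jpg">'

# A literal search string only matches the variant it was copied from.
target = '<img alt="Logo: Designed by BobInc" src="graphics/designed_by_logo.jpg">'

cleaned = [page.replace(target, "") for page in (site_a, site_b)]
remaining = [page for page in cleaned if "<img" in page]
print(len(remaining))
```

One site comes out clean and the other keeps its logo, with nothing to warn you that the replacement only half worked.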

With version control, none of this is an issue: it’s a simple, verifiable edit to a single file. To update the logo, you update your master template, and tell the other sites to update themselves based on that. In most version control systems, this goes something like:

vcs update

And you’re done. If you have some sites to update now, and some to update later, it’s no problem; you update them later. If there is one change to make to all sites, and many more changes to make with just a few sites, then you update the changes you want by selectively choosing them, or updating from a special version control “branch” with changes for their contract type.

It ain’t just the web

The situation is pretty similar with other kinds of development work, like writing documentation, or coding an application, or even producing clip art. If you’re repeating yourself in anything more than the most superficial way, you’re almost certainly involved in bad work that needs to be re-thought. Now, when I say superficial, I mean just that. For example, I don’t really mind typing:

def myfunction():
    some code

Because that's pretty short, and it's only a few keystrokes more than the name of the function. A mnemonic for a template that expands to this wouldn't really be any simpler. If you're really lazy you could assign that template to a keystroke, or even write a macro that pops up a window where you can type in "myfunction", and then have it write the def and (): parts for you. But really, what would you gain? A few seconds, at the price of remembering more keys, and being annoyed when not every possible thing has its own key? What if you're having to code at another machine which doesn't have all your favourite keystrokes? Really, a nice clear, simple, terse language is as good as it gets. def or func or even function is already a shortcut for "The following is a function." If your language or toolkit forces much more than that on you, then you almost certainly just chose bad tools.

Conclusions

Jeffrey wrote of saving about an hour a day through cumulatively saving little bits of time when writing boilerplate. But every time you repeat the same code, you increase your workload and maintenance burden. By definition, if you can't stand typing so much, you definitely can't stand going back over it all multiple times editing in little changes in navbars for each additional page, etc.

I hope I've shown (or at least hinted at the possibility) that there are better tools out there than text expansion. Some of these tools aren't well known to web developers on Windows, but in serious app development, and web development on Unix, they've all been around a long time, and are tried and tested. The sooner you start using them, the sooner you'll start achieving real efficiency gains.

In summary, please don't repeat yourself, even if that repetition is pressing the same shortcut key over and over again. To me, this is the single most fundamental law of computing. It applies to everything from word processing, web development, file management, and finance to business process management and every other aspect of computer use. The whole point of computers is to take care of mundane repetitive details that humans aren't good at looking after, so we can get on with the fun stuff. Don't be a drone bogged down in the repetition; have fun, and spend your time on creativity!

Links

  1. Django: a decent web framework that'll save you some of this trouble. It's far from perfect right now, but stay tuned as I'll be opening version control repositories soon that make it a lot faster and more intuitive to use (see Pluggable Django: Git templates for Turnkey Projects and Apps for more on this).
  2. Wikipedia's page on Version Control
  3. CherryPy's Documentation on Choosing a Template Language


Lee on February 20th, 2009

Mark Shuttleworth posted about GNOME’s new fast user switch applet recently.

Initially, I was skeptical about the design of this, but it’s actually a pretty nice little tool. When someone wants to borrow a browser, being able to just hit guest, let them play on their own desktop, log them out, and return to what you were doing, is very handy.

The idea of having availability on the main session menu is growing on me too. It makes a lot of sense to centralise your availability, given that you could be available on IRC, by IM, by VOIP, by Avahi, etc., in different apps.


Lee on February 19th, 2009

So I upgraded to ext4 with jaunty on my desktop lately. I needed the 2.6.28 kernel to get my tv card (hvr-3000) working, and since ext4dev had been retagged as ext4, it seemed a good time.

All in all, ext4 has been nice so far, but over about three weeks of use, I’ve gotten “no space left on device” errors twice now, despite having over 250GB free. The second instance just happened. First time around, I had to run a reboot/fsck to fix it. The fsck didn’t report fixing anything, so maybe just rebooting did it. I expect that one of those will work again.

Anyway, a quick google for ext4 “no space left” shows I’m far from alone in this. Caveat emptor**.

** What is latin for “installer” anyway? ;)
