Vanishing Countryside

I still haven’t, as I promised, addressed in detail the CPRE’s latest “overcrowded Britain” nonsense, but here’s a very very simple refutation:

Google Local

The tiny dark blob dead centre of the map is Birmingham.

Remember, by 2035, “The countryside is all but over”. Except for nearly all of it, which you can only see from the air, because, er, no-one lives there.

See also this economics piece, by Robin Hanson at Marginal Revolution, on the positive externalities of urban expansion.

We also neglect the benefits we provide others when choosing to live at the edge of the populated area, versus living in an unpopulated area… Local governments are in a position to reduce this externality, but they seem to mostly make matters worse. Minimum lot sizes, maximum building heights, maximum densities, and barriers to development at the populated edge are far more common than their opposites.

Bill Thompson

In looking at news coverage of the Sony story, I saw a piece by Bill Thompson, a technology analyst for the BBC. His insight into issues seems to be consistently good. I was particularly impressed by this piece on eBay and tax, where he makes the seemingly obvious but often ignored point:

The internet is not a separate space, but part of the real world.
Politicians have to get to grips with this

He has a blog, but the good stuff seems to be copies of his BBC articles.

Warning about dodgy sources of music

I am aware that some people download music files from the internet. I don’t do this myself, because of the time and hassle, and the risk of getting something other than what I wanted.

I may have to rethink this attitude, however. It is now official – buying music legitimately from the copyright owners can install trojan horses and spyware on your computer, potentially resulting in crashes and other malfunctioning. Much safer to get an MP3.

I already have a CD that I can play at work only because I made a copy of it at home – the original will not play on my work PC because of its copy protection. It’s now getting worse.

A few jobs back we bought a copy of Rational Rose – we never used it because we couldn’t get past the copy protection. It sat on a shelf for years. No repeat business there.

Illegal rip-off software, music and DVDs are generally of higher quality than the legal stuff. A free MP3 is worth more than an iTunes download or an original CD, because it’s compatible with more hardware. A hacked game is worth more than the legal copy, because you don’t have to fuss with the license key.

They never learn.

Linus Torvalds

Linus Torvalds wrote the kernel of the Linux operating system, which has been enhanced, expanded and ported by thousands of individuals and companies working over the internet. See everywhere.

Nowadays, he still controls kernel development, but for several years his role has been pretty much entirely one of co-ordination. He is an IT manager. Just as Free Software is software whose source code we can all look at, Linus’ management is done in public for us all to watch and learn from.

I just hope someday he writes a management textbook. It will be full of stuff like this:

Guys, stop being stupid about things. I already sent rmk an email in private.
And Alan, there’s absolutely no point in making things even worse.
Mistakes happen, and the way you fix them is not to pull a tantrum, but tell people that they are idiots and they broke something, and get them to fix it instead.
You don’t have to be polite about it, and swearing is fine. So instead of saying “I don’t want to play any more because Davem made a mistake”, say something like “Davem is a f*cking clueless moron, here’s what he did and here’s why it’s wrong”.
Notice? In both cases you get to vent your unhappiness. In the second case, you make the person who made a mistake look bad. But in the first case, it’s just yourself that looks bad.

More XML

(Apologies to those of my readers who aren’t interested in this stuff. I’ve been giving more time & attention to my work of late, and the results are less blogging, and technical stuff being at the top of my mind more than current affairs.)

Very good piece by Jim Waldo of Sun that chimes (in my mind at least) with my piece below. He emphasises the limited scope of what XML is. He doesn’t echo my discussion of whether XML is good; rather, he shoves that aside as irrelevant – the comparison is with ASCII. We don’t spend much time arguing over whether ASCII is a good character set – is 32 really the best place to put a space? Do we really need the “at” sign more than the line-and-two-dots “divide-by” sign? Who cares? The goodness or badness of ASCII isn’t the point, and the badness of XML isn’t really the point either.

The comparison with ASCII is very interesting – Waldo talks about using the classic Unix command-line tools like tr, sort, cut, head and so on, which can be combined to do all sorts of powerful things with line-oriented ASCII data files. XML, apparently, is like that.
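To make the comparison concrete, here is roughly the shape of that kind of tool chain, rewritten as small Python generator functions. It is a sketch, not Waldo’s actual pipeline: the function names are mine, and the shell command each one imitates (something like tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | sort -u) is noted in its docstring.

```python
import re
import sys
from typing import Iterable, Iterator


def words(lines: Iterable[str]) -> Iterator[str]:
    """Like `tr -cs 'A-Za-z' '\\n'`: split each line into runs of letters."""
    for line in lines:
        yield from re.findall(r"[A-Za-z]+", line)


def lower(items: Iterable[str]) -> Iterator[str]:
    """Like `tr 'A-Z' 'a-z'`: fold everything to lower case."""
    for item in items:
        yield item.lower()


def sort_unique(items: Iterable[str]) -> list[str]:
    """Like `sort -u`: a sorted list with duplicates removed."""
    return sorted(set(items))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python words.py document.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for w in sort_unique(lower(words(f))):
            print(w)
```

Each stage assumes nothing except “a stream of lines or words”, and that shared convention is exactly what lets the Unix versions be plugged together.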

Well, yes, I agree with all that. But, just a sec, where are those tools? Where are the tools that will do transforms on arbitrary XML data, and that can be combined to do powerful things? It all seems perfectly logical that they should exist and would be useful, but I’ve never seen any! If I want to perform exactly Waldo’s example: producing a unique list of words from an English document, on a file in XML (say OOWriter‘s output), how do I do it? If I want to list all the font sizes used, how do I do that? I can write a 20-30 line program in XSLT or perl to do what I want, just as Waldo could have written a 20-30 line program in Awk or C to do his job, but I can’t just plug together pre-existing tools as Waldo did on his ascii file.
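For what it’s worth, here is the kind of 20-30 line program I mean, as a Python sketch using the standard library’s ElementTree. It assumes the XML has already been pulled out of the OOWriter file (the document itself is a zip archive), and that font sizes live in attributes whose local name is font-size, as with OpenOffice’s fo:font-size. Both are assumptions about the input, not anything XML guarantees.

```python
import re
import sys
import xml.etree.ElementTree as ET


def local_name(name: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def unique_words(root: ET.Element) -> list[str]:
    """All distinct lower-cased words in the document's character data.

    Text chunks are joined with spaces, so a word split across inline
    markup (e.g. <b>Hel</b>lo) is counted as two words.
    """
    text = " ".join(root.itertext())
    return sorted({w.lower() for w in re.findall(r"[A-Za-z]+", text)})


def attribute_values(root: ET.Element, name: str) -> list[str]:
    """Distinct values of every attribute whose local name is `name`, in any namespace."""
    found = set()
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            if local_name(attr) == name:
                found.add(value)
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python xmlwords.py content.xml
    root = ET.parse(sys.argv[1]).getroot()
    print("\n".join(unique_words(root)))
    print("\n".join(attribute_values(root, "font-size")))
```

Neither function is a reusable tool in the sort/cut sense, though; each one bakes in its own idea of which part of the document matters.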

There are tools like IE or XMLSpy that can interactively view, navigate, or edit XML data, and there is XSLT in which you can write programs to do specific transformations for specific XML dialects, but that’s like saying, with Unix ascii data, you’ve got Emacs and Perl – get on with it! The equivalents of sort, join, head and so on, either as commandline tools for scripting or a standard library for compiling against, are conspicuous by their absence.
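To show the kind of thing I mean, here is a sketch of what XML equivalents of head and sort might look like. The names xml_head and xml_sort are hypothetical, not an existing package, and the sketch assumes the “records” are the direct children of the root element. That is itself a guess about the data which line-oriented tools never have to make.

```python
import copy
import xml.etree.ElementTree as ET


def xml_head(root: ET.Element, n: int) -> ET.Element:
    """Hypothetical 'head' for XML: a copy of root keeping only its first n child elements."""
    result = copy.deepcopy(root)
    for child in list(result)[n:]:
        result.remove(child)
    return result


def xml_sort(root: ET.Element, key_attr: str) -> ET.Element:
    """Hypothetical 'sort' for XML: a copy of root with its children ordered by one attribute.

    Children missing the attribute sort as if it were the empty string.
    """
    result = copy.deepcopy(root)
    children = sorted(result, key=lambda c: c.get(key_attr, ""))
    for child in list(result):
        result.remove(child)
    result.extend(children)
    return result
```

In one process they compose the way you would hope, e.g. xml_head(xml_sort(root, "n"), 1). As separate command-line tools, each would have to parse and re-serialise the document.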

The nearest thing I can think of is something called XMLStarlet, but even that looks more like awk than like a collection of simple tools, and in any case it is not widely used. Significantly, one of its more useful features is the ability to convert between XML and the PYX format, a data format that is equivalent to XML but easier to read, edit, and process with software (in other words – superior in every way).
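As I understand PYX, each parse event goes on its own line with a one-character prefix: ( for a start tag, ) for an end tag, A for an attribute name followed by a space and its value, - for character data with newlines written as \n, and ? for a processing instruction. Here is a rough sketch of the XML-to-PYX direction using Python’s built-in SAX parser. It is not XMLStarlet’s implementation, and it deliberately ignores comments, DOCTYPEs and any escaping beyond newlines.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def _escape(text: str) -> str:
    # Assumption: only newlines need escaping, written as backslash-n.
    return text.replace("\n", "\\n")


class PyxWriter(ContentHandler):
    """Collects PYX lines while the SAX parser walks the document."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []
        self._text: list[str] = []  # character data may arrive in several chunks

    def _flush_text(self) -> None:
        if self._text:
            self.lines.append("-" + _escape("".join(self._text)))
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.lines.append("(" + name)
        for attr_name in attrs.getNames():
            self.lines.append("A" + attr_name + " " + _escape(attrs.getValue(attr_name)))

    def endElement(self, name):
        self._flush_text()
        self.lines.append(")" + name)

    def characters(self, content):
        self._text.append(content)

    def processingInstruction(self, target, data):
        self._flush_text()
        self.lines.append("?" + target + " " + data)


def xml_to_pyx(xml_text: str) -> list[str]:
    """Convert an XML document to a list of PYX lines."""
    handler = PyxWriter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines
```

Once the data is in that form, ordinary line tools work again: on a document that uses the fo prefix, piping the output through grep '^Afo:font-size ' | sort -u would list its font sizes.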

As a complete aside – note that pyx would be slightly horrible for marked-up text: it would look a bit like nroff or something. XML is optimised for web pages at the expense of every other function. That is why it is so bad.

Maybe I’m impatient. XML 1.0 has been around since 1998, and while that seems like a long time, it may not be long enough. Any process that involves forming new ways for people to do things actually takes a period of time that is independent of Moore’s law, or “internet time”, or whatever. The general-purpose tools for manipulating arbitrary XML data in useful ways may yet arrive.

But I think the tools have been prevented, or at least held up, by the problems of the XML syntax itself. You could write rough-and-ready implementations of most of the Unix text utilities in a few lines of C, and program size and speed are excellent. To write any kind of tool for processing XML, you’ve got to link in a parser. Until recently, that alone would make your program large and slow. The complete source for the GNU textutils is a 2.7M tgz file, while the source for xerces-c alone is 7.4M. The libc library containing C’s basic string-handling functions (and much more) is a 1.3Mb library; xerces-c is 4.5Mb.

If you have to perform several operations on the data, it is much more efficient to parse the file into a data structure once, apply all the transformations to that structure, and then stream it back to the file, than to have each operation parse the file afresh. That efficiency probably doesn’t matter, but efficiency matters to many programmers much more than it should. It takes a serious effort of will to build something that uses such an inefficient method. Most programmers will have been drawn irresistibly to bundling a series of transformations into a single process, using XSLT or a conventional language, rather than making them independent subprocesses. The thought that 99% of their program’s activity is going to be building a data structure from the XML, then throwing it away so it has to be built up again by the next tool, just “feels” wrong, even if you don’t actually know or care whether the whole run will take 5ms or 500.
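Here is the trade-off in miniature, as a Python sketch with two made-up transformations (remove_attribute and rename_tag are hypothetical examples, not library functions). Both runners give the same answer. The pipeline version, standing in for a chain of separate tools, parses and serialises the document once per step instead of once in total.

```python
import xml.etree.ElementTree as ET
from typing import Callable

Transform = Callable[[ET.Element], ET.Element]


def remove_attribute(name: str) -> Transform:
    """Hypothetical transform: drop one attribute from every element."""
    def step(root: ET.Element) -> ET.Element:
        for elem in root.iter():
            elem.attrib.pop(name, None)
        return root
    return step


def rename_tag(old: str, new: str) -> Transform:
    """Hypothetical transform: rename every element with a given tag."""
    def step(root: ET.Element) -> ET.Element:
        for elem in root.iter():
            if elem.tag == old:
                elem.tag = new
        return root
    return step


def run_in_process(xml_text: str, steps: list[Transform]) -> str:
    """Parse once, apply every step to the same tree, serialise once."""
    tree = ET.fromstring(xml_text)
    for step in steps:
        tree = step(tree)
    return ET.tostring(tree, encoding="unicode")


def run_as_pipeline(xml_text: str, steps: list[Transform]) -> str:
    """What separate tools would do: re-parse and re-serialise at every stage."""
    for step in steps:
        xml_text = ET.tostring(step(ET.fromstring(xml_text)), encoding="unicode")
    return xml_text
```

With two steps that is two parses instead of one. Whether the difference matters for a given file is exactly the 5ms-or-500 question; I haven’t measured it.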

In case I haven’t been clear – I think the “xmlutils” tools are needed, I don’t think the efficiency considerations above are good reasons not to make or use them, but I think they might be the cause of the tools’ unfortunate non-existence.

I also don’t see how the tools can be used as an argument in favour of XML when they don’t exist.

See also: Terence Parr – when not to use XML

XML Sucks

Pain. Once again, I have had to put structured data in a text file. Once again, I have had to decide whether to use a sane, simple format for the data, knocking up a parser for it in half an hour, or whether to use XML, sacrificing simplicity of code and easy editability of data on the altar of standardisation. Once again, I’ve had to accept that sanity is out and XML is in.

The objections to XML seem trivial. It’s verbose – big deal. It has a pointless distinction between “element content” and “attributes” which adds unnecessary complexity, but not that much unnecessary complexity. It is hideously hard to write a parser for, but who cares? The parsers are written; you just link to one.

The triviality of the objections is put in better context alongside the triviality of the problem which XML solves. XML is a text format for arbitrary hierarchically-structured data. That’s not a difficult problem. I firmly believe that I could invent one in 15 minutes, and implement a parser for it in 30, and that it would be superior in every way to XML. If a solution to a difficult problem has trivial flaws, that’s acceptable. If a solution to a trivial problem has trivial flaws, that’s unjustifiable.

And yet XML proliferates. Why?

Since the only distinctive thing about it is its sheer badness, that is probably the reason. Here’s the mechanism. There was a clear need for a widely-adopted standard format for arbitrary hierarchically-structured data in text files, and yet, prior to XML, none existed. Plenty of formats did exist, most of them clearly superior to XML, but none had the status of a standard.

Why not? Well, because the problem is so easy. It’s easier to design and implement a suitable format than to find, download and learn the interface to someone else’s. Why use someone else’s library for working with, say, Lisp S-expressions when you could write your own just as easily, and have it customised precisely to your immediate needs? So no widely-used standard emerged.

On the other hand, if you want something like XML, but with a slight variation, you’d have to spend weeks implementing its insanities. It’s not worth it – you’re better off using xerces and living with it. Therefore XML is a standard, when nothing else has been.

This is not the “Worse is Better” argument – it’s almost the opposite. The original Richard Gabriel argument is that a simple half-solution will spread widely because of its simplicity, while a full solution will be held back by its complexity. But that only applies to complex problems. In hierarchical data formats, there is no complex “full solution” – the simple solutions are also full. That is why we went so long without one standard. “Worse is Better” is driven by practical functionality over correctness. “Insane is Better” is driven by the (real) need for standardisation over practical functionality, and therefore the baroque drives out the straightforward. Poor design is XML’s unique selling point.
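The post doesn’t say what the 15-minute format would be. Purely to show the scale involved, here is a minimal reader for Lisp-style S-expressions, one of the alternatives mentioned above. It is a sketch under stated assumptions: atoms are bare words or double-quoted strings with no escape sequences, and parse_sexpr is my own illustrative name.

```python
import re

# One token: "(", ")", a double-quoted string (no escapes, by assumption), or a bare atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')


def parse_sexpr(text: str):
    """Parse exactly one S-expression into nested Python lists of strings."""
    pos = 0
    stack: list[list] = [[]]  # stack[-1] is the list currently being filled
    while True:
        match = TOKEN.match(text, pos)
        if not match:
            break
        pos = match.end()
        open_paren, close_paren, quoted, atom = match.groups()
        if open_paren:
            stack.append([])
        elif close_paren:
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(quoted if quoted is not None else atom)
    if text[pos:].strip():
        raise ValueError(f"unexpected input at offset {pos}")
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete expression")
    return stack[0][0]
```

Given (config (name "my app") (port 8080)) it returns ["config", ["name", "my app"], ["port", "8080"]]. Everything comes back as a string; deciding that 8080 is a number is left to whoever reads the data.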

Large and Small Organisations

Very good piece by Arnold Kling on the differences between large and small organisations.

If large organizations are dehumanizing, then why do they exist? Brad DeLong says that my assessment of large organizations must be incorrect, or else we would not have Wal-Mart.

A point Kling doesn’t make about Wal-Mart is that it is a fairly young organisation. It was in the 1970s that it became a really large organisation, and in the 1980s that it became spectacularly huge. As I have pointed out previously, it is over time that the bad effects of states and other large organisations accumulate. After thirty years, Wal-Mart is a very effective organisation, but one would expect the problems to start soon. The massive state-managed economy Britain instituted in the 1940s started falling apart in the 1970s, and the Soviet organisation set up through the 1920s and 30s probably peaked in effectiveness in the early 60s. Small organisations can stay effective indefinitely.

This piece by Paul Graham is also relevant – describing the Venture Capital / takeover cycle as a way of getting more of the best of both worlds.

Death toll

OK, so the death toll from the Great North Run matched that of the Hatfield rail crash.

I wonder how long the court case will last?

Dr Andrew Vallance-Owen of BUPA said, “At BUPA we encourage everyone to take an active interest in their health and running is a great way to keep fit. This year BUPA is sponsoring six runs including the BUPA Great North and BUPA Great South Runs.”

Oops, wrong page. That was last year. Actually, he said that fun-runners who failed to prepare properly for such gruelling events could suffer heart attacks. (Metro, Monday 19 Sep).

Not that there is any evidence that the victims did fail to prepare properly. The brother of 28-year-old Reuben Wilson said that Wilson had trained for the race. The immediate assertion that “if it didn’t work, you weren’t doing it properly” is one of those things that I generally find very annoying. Facts first, please, then conclusions.

Seriously, I don’t think that the organisers of the race should be considered liable for the deaths that occurred. But there is at least as much justification for holding them liable as in many other cases of accidental death, including Hatfield.

Politeness

There are two views of politeness. One is that it’s a kind of magical fairy-dust that you can add to whatever you do by using meaningless words like “please”.

That might be OK for teaching toddlers, but it’s rubbish.

Real politeness is caring about other people. “Please” isn’t meaningless, it’s a contraction of “if you please”, and it means that you’re recognising that the person you’re talking to might not want to do what you’re asking, and that you’re accepting that they might choose not to do it.

Giving an order including the word “please” isn’t polite, it’s gibberish. Saying “please” isn’t polite, unless you mean it.

Now the message you get if you go to http://www.legos.com/

“… We would sincerely like your help … Please always refer to our products as LEGO bricks …”

Is, as far as I can see, genuinely polite. They’re not giving orders or making threats. They’re pointing out what they call the stuff they make, and saying that they’d prefer it if their customers called it the same. There’s nothing to suggest that they are unaware that Cory Doctorow or anybody else can call it whatever they like, but like other global companies these days, they prefer to call their product by the same name everywhere (Snickers, anyone?). Unlike Mars, they can’t rename their product from “Legos” to “LEGO”, because it was never Legos in the first place, it’s just that Americans seem to be a bit confused. So they’ve made this polite request. Complaining about it seems ridiculously touchy.

The problem here is not BoingBoing, it is the people who never got beyond toddler level, who don’t know the difference between speaking politely and being polite, who say “please do not smoke here” when they mean “if you smoke here we’ll send security guards to throw you out”, who say “please do not copy this CD” when they mean “if you copy this CD we’ll sue you for $100,000”. They leave us in the position where we’re not quite sure whether the Lego message is insufferable bossiness or a mild request.

On reflection, the motive might not even be marketing. It might just make their skin crawl to hear the word “legos”. Mine does, a little, and I’m nothing to do with the company at all.