Elements and Attributes and CSS

VoIP technically sucks: trying to fake a circuit-switched connection on top of packet switching is inherently inefficient.

However, if 99% of the data on the network is well suited to packet switching, putting the rest of the data on the same platform is much more sensible than having a whole separate network just for 1%. I don't know if voice traffic is as low as 1% of total traffic over the world's data network, but if it isn't yet, it soon will be. VoIP is therefore the only sensible way to carry voice traffic.

That was just a demonstration.

99% of the web page data you are reading is marked-up text: words you want to read, along with markup describing how different bits of the text should be presented. HTML is a decent format for that, and XHTML is much better – more logical, easier to parse, more extensible.

The other 1% (the <head> element) is document-level metadata – not stuff you're meant to read. XHTML is a poor format for that, but it's only 1%, and it's better to use an inappropriate format than to add a separate format into the same document for 1% of the content. So we put up with <meta name="generator" content="blogger"/> despite its clunkiness.

XML is designed for marked-up-text formats like XHTML. At a pinch, it can be used for other things (like document-level metadata), but it’s fairly crap. So when Tim Bray says:

Today I observe empirically that people who write markup languages like having elements and attributes, and I feel nervous about telling people what they should and shouldn't like. Also, I have one argument by example that I think is incredibly powerful, a show-stopper: <a href="http://www.w3.org/">the W3C</a>. This just seems like an elegantly simple and expressive way to encode an anchored one-way hyperlink, and I would resent any syntax that forced me to write it differently.

He’s arguing against “use the best general-purpose format for everything”, and in favour of “use a suitable special-purpose format for the job at hand, like XML for marked-up text”.

A special prize to those who noticed that my XHTML <head> example was just plain wrong. 90% of the head of this document is not XML at all – a document with a completely different syntax is embedded in the XML. Blogger and the W3C have decided that XML is so inappropriate that it shouldn’t be used for this data, even at the cost of needing two parsers to parse one document.

To paraphrase Tim Bray,
writing body{margin:0px;padding:0px;background:#f6f6f6;color:#000000;font-family:"Trebuchet MS",Trebuchet,Verdana,Sans-Serif;} just seems like an elegantly simple and expressive way to encode complex structured information, and I would resent any syntax that forced me to write about 1K of XML to do the same thing.
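To make the comparison concrete, an element-and-attribute encoding of that one rule – purely hypothetical, not any real schema – might look something like this:

    <rule>
      <selector>body</selector>
      <declaration property="margin" value="0px"/>
      <declaration property="padding" value="0px"/>
      <declaration property="background" value="#f6f6f6"/>
      <declaration property="color" value="#000000"/>
      <declaration property="font-family" value="'Trebuchet MS',Trebuchet,Verdana,Sans-Serif"/>
    </rule>

and that is before you add a root element and an XML declaration to make it a complete document.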

Deborah Davis

Saw this on boingboing: Deborah Davis in Denver, Colorado is being prosecuted for refusing to show ID on a bus.

The case is complicated by the status of the bus, which while available to the public is run by a Federal government office complex, and runs through that complex. But leaving that aside, what is interesting is that she was always OK when she said she didn't have ID; she was arrested (on a later occasion) when she said she had some but wasn't going to show it.

The US, like the UK, doesn't have a compulsory ID card. That means she was practically OK claiming not to be carrying ID – the problem came when she (bravely, and admirably) made an issue of it by admitting she happened to be carrying ID, but refusing to produce it.

This story is the perfect answer to the "we already have so many IDs, what difference does one more make" argument. It is the difference between "I am not carrying ID" and "I won't show you my ID", which the police in this case, typically, considered so important.

(Of course, technically the government claim that the ID cards they are introducing will not be made compulsory. If you believe that…)
Support No2ID.

Copyright trespass suits

The BPI has brought more civil actions against uploaders of music to peer-to-peer networks in Britain.

Once again, this is a plea not to complain. As I said last time, the practical intellectual property debate is over whether the scope of copyright and patent law should be increased in the light of new technologies. The Right Answer is that it should not. It might be that without such expansion of copyright, certain business models will cease to be sustainable on a large scale. Whether that is the case, and whether different business models can flourish, remain to be seen.

As these matters unfold, copyright owners will attempt to apply existing laws in defence of existing business models. To the extent that this attempt succeeds, there will be less reason to extend those laws, and, most importantly, less justification for restricting the manufacture, sale and use of ordinary general-purpose tools that can be used for copying, modifying and distributing digital information. If widespread unauthorised distribution of copyrighted material can be substantially prevented by bringing civil suits against the people who do it, then the copyright owners' problems are solved with the least impact on everyone else.

If these legal actions are not effective in protecting the copyright owners’ business models, then the real battle will follow. Showing respect for the law as it stands, and for the copyright owners’ attempts to employ it, is a solid foundation from which one can make principled objections to copyright expansion. “I want free stuff” is not.

Does information want to be free? If I say it does, I mean that restricting copying and distribution of digital data is likely to be very difficult, and that copying is likely to continue despite these suits. It does not, by itself, make a moral argument. You could say, in the same way, that petrol wants to be on fire, but it's not an excuse to get the matches out.

Beyond what I called the "practical intellectual property debate", there is the question of whether copyrights and patents are a good thing at all, and whether their scope should be reduced. Some good arguments have been made, but they don't really amount to a criticism of the BPI for seeking to protect the legal rights they hold, and have traditionally held. If their programme of protecting their business model is entirely unsuccessful, that might strengthen the argument for changing the legal status of information entirely, at the same time as it strengthens the arguments for creating new IP law powers. I think it's an entirely separate argument.

Vanishing Countryside

I still haven’t, as I promised, addressed in detail the CPRE’s latest “overcrowded Britain” nonsense, but here’s a very very simple refutation:

Google Local

The tiny dark blob dead centre of the map is Birmingham.

Remember, by 2035, “The countryside is all but over”. Except for nearly all of it, which you can only see from the air, because, er, no-one lives there.

See also this economics piece, by Robin Hanson at Marginal Revolution, on the positive externalities of urban expansion.

We also neglect the benefits we provide others when choosing to live at the edge of the populated area, versus living in an unpopulated area… Local governments are in a position to reduce this externality, but they seem to mostly make matters worse. Minimum lot sizes, maximum building heights, maximum densities, and barriers to development at the populated edge are far more common than their opposites.

Bill Thompson

In looking at news coverage of the Sony story, I saw a piece by Bill Thompson, a technology analyst for the BBC. His insight into these issues seems consistently good. I was particularly impressed by this piece on eBay and tax, where he makes the seemingly obvious but often ignored point:

The internet is not a separate space, but part of the real world.
Politicians have to get to grips with this

He has a blog, but the good stuff seems to be copies of his BBC articles.

Warning about dodgy sources of music

I am aware that some people download music files from the internet. I don’t do this myself, because of the time and hassle, and the risk of getting something other than what I wanted.

I may have to rethink this attitude, however. It is now official – buying music legitimately from the copyright owners can install trojan horses and spyware on your computer, potentially resulting in crashes and other malfunctioning. Much safer to get an MP3.

I already have a CD that I can play at work only because I made a copy of it at home – the original will not play on my work PC because of its copy protection. It’s now getting worse.

A few jobs back we bought a copy of Rational Rose – we never used it because we couldn’t get past the copy protection. It sat on a shelf for years. No repeat business there.

Illegal rip-off software, music and DVDs are generally of higher quality than the legal stuff. A free MP3 is worth more than an iTunes download or an original CD, because it's compatible with more hardware. A hacked game is worth more than the legal copy, because you don't have to fuss with the license key.

They never learn.

Linus Torvalds

Linus Torvalds wrote the kernel of the Linux operating system, which has been enhanced, expanded and ported by thousands of individuals and companies working over the internet. See everywhere.

Nowadays, he still controls kernel development, but his role for several years has been pretty much entirely one of co-ordination. He is an IT manager. Just as Free Software is software that we can look at the source code of, Linus’ management is done in public for us all to watch and learn from.

I just hope someday he writes a management textbook. It will be full of stuff like this:

Guys, stop being stupid about things. I already sent rmk an email in private.
And Alan, there’s absolutely no point in making things even worse.
Mistakes happen, and the way you fix them is not to pull a tantrum, but tell people that they are idiots and they broke something, and get them to fix it instead.
You don’t have to be polite about it, and swearing is fine. So instead of saying “I don’t want to play any more because Davem made a mistake”, say something like “Davem is a f*cking clueless moron, here’s what he did and here’s why it’s wrong”.
Notice? In both cases you get to vent your unhappiness. In the second case, you make the person who made a mistake look bad. But in the first case, it’s just yourself that looks bad.

More XML

(Apologies to those of my readers who aren't interested in this stuff. I've been giving more time and attention to my work of late, and the result is less blogging, and technical stuff being at the top of my mind more than current affairs.)

Very good piece by Jim Waldo of Sun that chimes (in my mind at least) with my piece below. He emphasises the limited scope of what XML is. He doesn’t echo my discussion of whether XML is good, rather he shoves that aside as irrelevant – the comparison is with ASCII. We don’t spend much time arguing over whether ASCII is a good character set – is 32 really the best place to put a space? Do we really need the “at” sign more than the line-and-two-dots “divide-by” sign? Who cares? The goodness or badness of ASCII isn’t the point, and the badness of XML isn’t really the point either.

The comparison with ASCII is very interesting – Waldo talks about using the classic Unix command-line tools like tr, sort, cut, head and so on, which can be combined to do all sorts of powerful things with line-oriented ASCII data files. XML, apparently, is like that.

Well, yes, I agree with all that. But, just a sec, where are those tools? Where are the tools that will do transforms on arbitrary XML data, and that can be combined to do powerful things? It all seems perfectly logical that they should exist and would be useful, but I've never seen any! If I want to perform exactly Waldo's example – producing a unique list of words from an English document – on a file in XML (say OOWriter's output), how do I do it? If I want to list all the font sizes used, how do I do that? I can write a 20-30 line program in XSLT or Perl to do what I want, just as Waldo could have written a 20-30 line program in Awk or C to do his job, but I can't just plug together pre-existing tools as Waldo did on his ASCII file.
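For plain text, Waldo's job really is just a matter of plugging tools together – something along the lines of the classic pipeline (exact flags vary a little between systems):

    # unique list of words in a plain-text document
    tr -cs 'A-Za-z' '\n' < document.txt | tr 'A-Z' 'a-z' | sort -u

There is no equivalent incantation for the XML case: before the line-oriented tools can do anything useful, something has to get the text out of the markup.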

There are tools like IE or XMLSpy that can interactively view, navigate, or edit XML data, and there is XSLT, in which you can write programs to do specific transformations for specific XML dialects, but that's like saying, with Unix ASCII data, you've got Emacs and Perl – get on with it! The equivalents of sort, join, head and so on, either as command-line tools for scripting or as a standard library for compiling against, are conspicuous by their absence.
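Just to be concrete about what is missing, this is roughly what I would like to be able to type to answer the font-size question above – the xml-attrs tool is pure invention on my part, which is precisely the complaint:

    # An OOWriter file is a zip archive whose body lives in content.xml.
    # xml-attrs is a *hypothetical* tool that would print every value of the
    # named attribute, one per line -- no such standard tool exists.
    unzip -p mydoc.sxw content.xml | xml-attrs fo:font-size | sort -u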

The nearest thing I can think of is something called XMLStarlet, but even that looks more like awk than like a collection of simple tools, and in any case it is not widely used. Significantly, one of its more useful features is the ability to convert between XML and the PYX format, a data format that is equivalent to XML but easier to read, edit, and process with software (in other words – superior in every way).
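For illustration, Tim Bray's hyperlink example comes out in PYX roughly like this – one node per line, with "(" opening an element, "A" carrying an attribute, "-" carrying character data and ")" closing the element:

    (a
    Ahref http://www.w3.org/
    -the W3C
    )a

At which point the ordinary line-oriented tools start working again: something like grep '^Ahref ' lists every link target in a document without a parser in sight.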

As a complete aside – note that PYX would be slightly horrible for marked-up text: it would look a bit like nroff or something. XML is optimised for web pages at the expense of every other function. That is why it is so bad.

Maybe I’m impatient. XML 1.0 has been around since 1998, and while that seems like a long time, it may not be long enough. Any process that involves forming new ways for people to do things actually takes a period of time that is independent of Moore’s law, or “internet time”, or whatever. The general-purpose tools for manipulating arbitrary XML data in useful ways may yet arrive.

But I think the tools have been prevented, or at least held up, by the problems of the XML syntax itself. You could write rough-and-ready implementations of most of the Unix text utilities in a few lines of C, and program size and speed would be excellent. To write any kind of tool for processing XML, you've got to link in a parser, and until recently that in itself would make your program large and slow. The complete source for the GNU textutils is a 2.7MB tgz file, while the source for xerces-c alone is 7.4MB. The libc library containing C's basic string-handling functions (and much more) is 1.3MB; xerces-c is 4.5MB.

If you have to perform several operations on the data, it is much more efficient to parse the file into a data structure once, apply all the transformations to that structure, and then stream it back out to a file. That efficiency probably doesn't matter, but efficiency matters to many programmers much more than it should. It takes a serious effort of will to build something that uses such an inefficient method. Most programmers will have been drawn irresistibly to bundling a series of transformations into a single process, using XSLT or a conventional language, rather than making them independent subprocesses. The thought that 99% of their program's activity is going to be building a data structure from the XML, then throwing it away so it has to be built up again by the next tool, just "feels" wrong, even if you don't actually know or care whether the whole run will take 5ms or 500.

In case I haven't been clear: I think the "xmlutils" tools are needed; I don't think the efficiency considerations above are good reasons not to make or use them; but I think they might be the cause of the tools' unfortunate non-existence.

I also don’t see how they can be used as an argument in favour of XML when they don’t exist.

See also: Terence Parr – when not to use XML

XML Sucks

Pain. Once again, I have had to put structured data in a text file. Once again, I have had to decide whether to use a sane, simple format for the data, knocking up a parser for it in half an hour, or whether to use XML, sacrificing simplicity of code and easy editability of data on the altar of standardisation. Once again, I’ve had to accept that sanity is out and XML is in.

The objections to XML seem trivial. It's verbose – big deal. It has a pointless distinction between "element content" and "attributes" which makes for unnecessary complexity, but not that much unnecessary complexity. It is hideously hard to write a parser for, but who cares? The parsers are written; you just link to one.

The triviality of the objections is put in better context alongside the triviality of the problem which XML solves. XML is a text format for arbitrary hierarchically-structured data. That's not a difficult problem. I firmly believe that I could invent one in 15 minutes, and implement a parser for it in 30, and that it would be superior in every way to XML. If a solution to a difficult problem has trivial flaws, that's acceptable. If a solution to a trivial problem has trivial flaws, that's unjustifiable.

And yet XML proliferates. Why?

Since the only distinctive thing about it is its sheer badness, that is probably the reason. Here's the mechanism: there was a clear need for a widely-adopted standard format for arbitrary hierarchically-structured data in text files, and yet, prior to XML, none existed. Plenty of formats did exist, most of them clearly superior to XML, but none had the status of a standard.

Why not? Well, because the problem is so easy. It's easier to design and implement a suitable format than to find, download and learn the interface to someone else's. Why use someone else's library for working with, say, Lisp S-expressions when you could write your own just as easily, and have it customised precisely to your immediate needs? So no widely-used standard emerged.

On the other hand, if you want something like XML, but with a slight variation, you'd have to spend weeks implementing its insanities. It's not worth it – you're better off using Xerces and living with it. Therefore XML is a standard, when nothing else has been.

This is not the "Worse is Better" argument – it's almost the opposite. The original Richard Gabriel argument is that a simple half-solution will spread widely because of its simplicity, while a full solution will be held back by its complexity. But that only applies to complex problems. In hierarchical data formats, there is no complex "full solution" – the simple solutions are also full. That is why we went so long without one standard. "Worse is Better" is driven by practical functionality over correctness. "Insane is Better" is driven by the (real) need for standardisation over practical functionality, and therefore the baroque drives out the straightforward. Poor design is XML's unique selling point.
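To give a flavour of the kind of alternative I have in mind – this is just an illustration, not a worked-out proposal – Tim Bray's hyperlink could be written as an S-expression:

    (a (href "http://www.w3.org/") "the W3C")

Anyone who has ever written a recursive-descent parser could knock up a reader for that sort of thing in well under the half hour.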

Large and Small Organisations

Very good piece by Arnold Kling on the differences between large and small organisations.

If large organizations are dehumanizing, then why do they exist? Brad DeLong says that my assessment of large organizations must be incorrect, or else we would not have Wal-Mart.

A point Kling doesn’t make about Wal-Mart is that it is a fairly young organisation. It was in the 1970s that it became a really large organisation, and in the 1980s that it became spectacularly huge. As I have pointed out previously, it is over time that the bad effects of states and other large organisations accumulate. After thirty years, Wal-Mart is a very effective organisation, but one would expect the problems to start soon. The massive state-managed economy Britain instituted in the 1940s started falling apart in the 1970s, and the Soviet organisation set up through the 1920s and 30s probably peaked in effectiveness in the early 60s. Small organisations can stay effective indefinitely.

This piece by Paul Graham is also relevant – describing the Venture Capital / takeover cycle as a way of getting more of the best of both worlds.