New Realities
technology, politics and the future




Wednesday, December 19, 2001


Winzip turns detective

Winzip, a popular piece of shareware used for compressing and decompressing files, turns out to have an amazing extra ability. It can be used to analyse patterns in text to determine such things as a document's subject matter, the language it is written in, or even its true author.

The technique was devised by Vittorio Loreto and two colleagues at La Sapienza University in Rome. New Scientist describes it in its 15 December 2001 issue, and Physical Review Letters is due to publish a more detailed technical description soon.

Unfortunately neither publication makes its full text available online, but versions appear on Vittorio's site: the New Scientist piece 'A gift for language' only as PostScript, and the longer piece 'Language trees and zipping' best downloaded as an Acrobat PDF file.

Winzip and other compression programs take a sequence of characters and attempt to transform them into the shortest possible file. They typically do this by scanning the text for sequences of characters that repeat, replacing the most common sequences with shorter codes.
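To see the repeat-spotting principle in action, here is a minimal sketch in Python. It uses the standard-library zlib module, which implements DEFLATE, the compression method most .zip files use. It is a stand-in for Winzip rather than Winzip itself, and the sample strings are invented for illustration.

```python
import random
import string
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the DEFLATE-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


# One short phrase repeated many times: full of repeated sequences.
repetitive = "the cat sat on the mat. " * 40

# A string of the same length built from random letters and spaces,
# so there are no deliberate repeats for the compressor to exploit.
rng = random.Random(0)
scrambled = "".join(
    rng.choice(string.ascii_lowercase + " ") for _ in range(len(repetitive))
)

print("repetitive:", len(repetitive), "chars ->", compressed_size(repetitive), "bytes")
print("scrambled: ", len(scrambled), "chars ->", compressed_size(scrambled), "bytes")
```

The repeated phrase shrinks to a small fraction of its length, because after the first copy every later copy can be replaced by a short back-reference to earlier text. The scrambled string has no such repeats, so it shrinks far less.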

The more text Winzip has to work with, the more efficiently it compresses the file, because it is able to allocate the shortest codes to the most frequently occurring sequences more accurately. So Winzip optimises its compression method to the specific language pattern used in the document.

What Vittorio and his colleagues noticed is that, in doing this, Winzip effectively builds a fingerprint of the document's language patterns, which can then be used to identify other texts.

Consider, for example, two documents, one written in English, the other in Italian. Suppose the Italian text is appended to the end of the English text and the combined document is submitted to Winzip. The zipper starts at the beginning with the English text and, after a while, learns its rules and starts encoding it efficiently. When it reaches the Italian text, it initially encodes it with the rules it has learned for English, so the Italian is not compressed very efficiently. But if there is enough Italian text to work with, the zip program eventually learns new rules appropriate for Italian and starts encoding the Italian efficiently too.

How can this be used to learn something about an unfamiliar text? To understand the principle, let's say you want to know whether something is written in English or Italian.

First take a long document known to be written in English, and encode it with Winzip. Then take a long document written in Italian, and encode that. Now take your mystery text and append it to the end of the original English document, and compress the combined file. And then take the mystery text, add it to the Italian original and compress that combination.

So you end up with four compressed files. The final stage is to compare their lengths.

If the mystery text is written in English then the English-plus-mystery-text file will only be slightly longer than the known English document on its own. On the other hand, if the mystery text is in Italian then the English-plus-mystery-text file will have gone up significantly in length (because Italian won't have been compressed efficiently), while the Italian-plus-mystery-text file will be only slightly longer than the known Italian document on its own.
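The four-file comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it uses the standard zlib module (DEFLATE, as in .zip files) in place of Winzip, and the two reference "documents" are hypothetical made-up sentences repeated to give the compressor something to learn from, standing in for the long known texts the method really needs. One practical caveat: DEFLATE only looks back about 32 KB, so only roughly the last 32 KB of each reference text can influence how the mystery text is encoded.

```python
import zlib


def zipped_len(text: str) -> int:
    """Bytes needed to store text after DEFLATE compression at maximum level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_cost(reference: str, mystery: str) -> int:
    """Extra compressed bytes the mystery text adds when appended to reference.

    A small value means the compressor could reuse patterns it learned from
    the reference, suggesting the two texts share a language.
    """
    return zipped_len(reference + mystery) - zipped_len(reference)


def guess_language(references: dict[str, str], mystery: str) -> str:
    """Return the label of the reference text the mystery text fits best."""
    return min(references, key=lambda lang: extra_cost(references[lang], mystery))


# Hypothetical sample reference texts; real use needs much longer documents.
references = {
    "English": (
        "The government has decided that the new law will be debated in the house "
        "this week, and many people think that the decision was the right one. "
    ) * 20,
    "Italian": (
        "Il governo ha deciso che la nuova legge sarà discussa nella camera "
        "questa settimana, e molte persone pensano che la decisione sia giusta. "
    ) * 20,
}

mystery = "Many people think that the new law will be debated this week."
for lang, ref in references.items():
    print(lang, "extra bytes:", extra_cost(ref, mystery))
print("Best guess:", guess_language(references, mystery))
```

The English sentence costs only a few extra bytes after the English reference, since most of its phrases already appear there, but many more after the Italian one. Real use would need long reference texts and probably some normalisation for the mystery text's length, so treat this purely as an illustration of the idea rather than the authors' procedure.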

This may all seem rather laborious, but it does have some potentially useful applications. For a start, it could be used to address those popular academic puzzles about who 'really' wrote Shakespeare's plays or authored the individual Federalist papers.

But Vittorio and his colleagues report that the technique works even with very short fragments of mystery text. This makes it a candidate for some practical automated applications, for example routing incoming email messages to an appropriate language translation module.

Search engines might also use a version of the technique to look for web pages resembling a target fragment.




Are Open Source programmers motivated by altruism - or self-interest?

An interesting alternative explanation of why people are willing to give their time for free to Open Source software projects is advanced by David Lancashire in First Monday.

The usual explanations (most famously advanced by Eric Raymond) fall back on altruism or make analogies with the gift-giving behaviour often reported in simpler societies by anthropologists.

But Lancashire, a PhD student at Berkeley, thinks that more traditional economic concepts may suffice.

He notes that a disproportionate number of Open Source developers are from Europe or Canada, rather than the US. Many of them are young. And a lot of them later migrate to the US to work in paid jobs. So a simpler explanation for all this supposed altruism might be reputation-building for economically-rational careerist reasons.

'We expect individuals to produce free software if doing so can help them shift to a higher wage-level ... Developers may embrace open source work as a way to tap into lucrative corporate networks abroad ... In other words, the appropriate analogy for open source development may not be to cooking-pots and cauldrons so much as to the Mayflower.'



Friday, December 14, 2001


Religious-hatred law dropped to allow other anti-terror laws to go through

The struggle over the proposal to make 'incitement to religious hatred' a criminal offense in the UK has ended with the government giving way to its critics. Yesterday (Thursday) the House of Lords yet again rejected the religious clauses by a large majority. Rather than risk holding up the whole Anti-terrorism, Crime and Security bill, the Home Secretary David Blunkett agreed to drop the whole 'incitement to religious hatred' idea.

The Commons then sent the bill, minus the offending clauses, back to the Lords, which passed it early Friday morning. Some time today the bill is likely to receive royal assent, the final stage in getting it onto the statute book. The police will then be able to use their new legal powers this weekend.

The Lords also won one other important concession in the prolonged struggle over the bill - this time on police access to emails and telephone logs. The original proposal gave the police broad powers to seize email and telephone logs. In the amended law police seizure powers only apply when investigating terrorist offenses, not ordinary criminal ones.
BBC: new anti-terror laws at a glance



Thursday, December 13, 2001


Struggle continues on UK religious hatred law

Late yesterday (Wednesday) the Commons reversed the Lords amendments to the government-sponsored Anti-terrorism, Crime and Security Bill. This sends the controversial package of new laws back to the Lords, the UK's upper chamber, where opposition is strong to those parts of the bill that touch on religion (see below under Monday December 10th).

The key area of difficulty concerns clauses that create a new offense of inciting religious hatred, modeled on existing laws against inciting racial hatred, and punishable by up to seven years in jail.

But many people, Lords and an increasing number of Muslims included, think the Labour government, made up largely of atheists and agnostics, is blundering into territory it does not understand.

Its law may well have the unintended effect of outlawing freedom of expression on issues of fundamental importance to many religious people - people who can't be counted on to shut up just because some man-made law tells them to do so.

The end result may well be the opposite of what the government intends, with religious tensions inflamed rather than calmed.

But the government appears deeply attached to its idea of using the law to 'outlaw religious hatred'. It thinks it will work and is determined to keep the relevant clauses in - despite a revolt last night by 27 of its MPs in the Commons.

The Lords is unlikely to be convinced. A deal is not yet in sight, so all the government's new anti-terrorism measures may be blocked if it insists on keeping the religious stuff in.

The full text of the debate on Monday which led the Lords to reject the government's position (by a large 240 to 141 majority) is now online. (I link to a point about three-quarters of the way through.)

House of Lords debates have a curious genteel texture, an impression enhanced by the fanciful names of some of the Lords. But on this occasion they are dealing with issues of real principle, and making a fairly good job of it.

One Lord pointed out that, since Osama Bin Laden is clearly acting from religious motives, several government ministers have already broken the terms of their own proposed law by urging (indeed ordering) people to destroy him: a clear-cut 'incitement to hatred' if ever there was one.



Wednesday, December 12, 2001


Calls for UK to copy US Megan's Law following child murder case

The conviction of a known sex offender for the kidnap and murder of eight-year-old Sarah Payne is sure to revive calls for the wider publication of details of child sex offenders in the UK.

Roy Whiting had previously been convicted of the kidnap and indecent assault of a nine-year-old girl in 1995, but was released from prison after only two years of a four-year sentence.

His record was known to the police and probation (prison aftercare) service where Sarah lived, but not to local parents or the general public. Indeed, in the UK there are few legal ways for even schools and childcare groups to find out if they are dealing with someone of known danger to children.

The situation in the US is radically different. Following a similar child murder, of seven-year-old Megan Kanka in 1994, new so-called Megan's Laws have been introduced in all 50 states. They make it the duty of the authorities to inform the local community of any sex offender living nearby.

Each state has enacted the law slightly differently. The most suitable model for the UK is probably California. It has roughly the same number of convicted child sex offenders as the UK, about 100,000, and its public access arrangements incorporate numerous safeguards to prevent vigilante action and keep the police firmly in control.

The California Attorney General's site explains the policy. In California offenders' details are NOT put up on the Internet. Instead the authorities encourage anyone wanting such information to go to a police station in person, where the data is held on CD-ROM.

You have to prove that you live locally and that you are not yourself a sex offender - to prevent paedophiles using the information to contact each other.

Some California districts such as the city of Fresno publish generalised maps online of where sex offenders live. But to get detailed information you still have to talk to a police officer.

This means that in California the police have a record of everyone to whom they've given offender details. It also means the police have an opportunity to counsel enquirers on what they can and can't do with the information.

The whole procedure is designed to discourage vigilante attacks whilst getting essential information into the hands of those who need it most to protect children - in particular school and community leaders and local parents.

In the UK the big fear of the police is that any kind of open offender register will lead to mob violence. Indeed, in the summer of 2000, at the height of a campaign to change the law, there was some rioting in the south coast city of Portsmouth, not far from where Sarah Payne was murdered.

But the US experience has been very different: even in states that publish offenders' details in full on the Internet, such as Texas, Florida and Ohio, attacks on sex offenders are rare.

For example, most Ohio counties put the addresses and photographs of offenders online. The police in Dayton don't seem to be overly concerned about the mob attacks the UK authorities so fear.

In the seven years since President Clinton signed the federal act requiring all states to pass Megan's Laws, the US experience has been broadly positive. Most police and state attorneys' departments are now in favour of the law, and do not seem to find that it makes sex offenders harder to track down and prosecute. Nor do they find themselves having to deal with a rash of vigilante attacks.

Since these are all fears that have been raised here repeatedly as reasons to NOT implement a UK equivalent of Megan's Law, it is clearly the duty of the government to investigate the US experience much more seriously.

In the UK the tradition of state secrecy is very strong. It is all too easy for the police and government ministers to dismiss the idea of an equivalent UK Sarah's Law as unworkable and carry on in their habitual guarded way.

But this would be a tragic mistake if, as the US experience of Megan's Law suggests, releasing sex offender information more widely really could help save children's lives.

Sarah Payne's mother is on record as saying: "More than 80 per cent of the population support Sarah's Law. Please don't let her death be in vain."

News of the World campaign to change UK law
Summary of US Megan's Laws in all 50 states



Monday, December 10, 2001


Religious hatred law rejected again

The UK government's attempt to pass a new law that seeks to 'outlaw' religious hatred is running into heavy opposition in parliament. Earlier today the House of Lords, the UK's upper chamber, rejected the proposal by an unexpectedly wide margin. This doesn't kill the measure stone dead, but it makes some kind of deal more likely.

See below under Monday, November 26, 2001 for why I think this law, although well-intentioned, is a really bad idea and likely to prove counterproductive. What happens now is that the bill goes back to the Commons, where the government will win, but this will just bring it back to the Lords again later this week - and the Lords can throw it out again. Theoretically this stalemate can go on for some time.

In the UK's parliamentary system the Lords has the role of revising new laws - and blocking bad ones until they are improved or withdrawn. The Lords is considerably weaker than the Commons because it is appointed rather than elected. Ultimately in any struggle between Lords and Commons, the elected Commons always wins.

However, by repeatedly throwing out a measure the Lords flags up the issue to the press and public, and creates a delay in which it can be properly discussed. This is what is happening with the proposed religious hatred law. And the government appears to be getting the worst of the wider argument in the country.

Since the dispute is holding up other less controversial measures needed to beef up security in the wake of September 11th, there is still a chance the government may drop the religious parts of the package to get the rest in place before Christmas.




Friday, December 07, 2001


Amazon launches easy-access version of site

Amazon has launched an easy-access version of its site for partially-sighted and blind users.

What's significant is that it hasn't gone down the usual giant-type-in-lurid-colours route - used on, for example, the BBC's new site.

Instead Amazon has radically simplified the navigation structure.

This helps because many blind and partially-sighted users rely on special audio software that reads web pages out to them. Normal graphics-oriented sites, even with the images turned off and fonts displayed at huge size, can still be time-consuming and confusing to surf this way.

Amazon's new site, which can be found at www.amazon.com/access, looks very sparse - even sparser than austere search site Google. The fonts are displayed at normal browser size, albeit with a high-contrast colour scheme of black and blue against a plain white background.

This also makes sense, because most partially-sighted users will already have set their browser preferences to override the defaults with a font size of their choice.

Amazon has based Amazon Access on the versions it already deploys for mobile users with WAP and i-Mode phones or other simple hand-held devices. Because these devices have only limited displays, Amazon had to rethink and simplify navigation. This work has stood it in good stead when making a version for blind and partially-sighted users.

Amazon's approach differs from the traditional notions of accessibility propounded by the W3C, but isn't fundamentally incompatible with them. The W3C recommendations are largely concerned with getting ordinary web authors to make their sites a bit more accessible by following some simple design principles, rather than redesigning their web sites from scratch.

W3C accessibility guidelines
BBC big-font approach
Amazon press release





UK web revenues will soar - thanks to gambling

The UK could take more than a third of all European online consumer entertainment revenue by 2005 - but mainly because of its lax laws on Internet gambling.

According to Schema Consulting, 10 per cent of gambling will be online by 2005, compared with just 1 per cent now. One reason for the very rapid growth is that women are more likely to gamble online than to visit a High Street betting shop. Schema talked to over 6,000 consumers in six European countries to compile its report.

Schema brochure
CNET story



