Stripmining The User, Part II – A Response to EB

Because this blog’s theme makes comment quoting and deep-threading a nightmare, here is a post in response to Elias’ comment:


Hi – Bizarro here.

Nice to meet a superhero of sorts!

1) your refutation of “the intelligent personal agents that are able to process this structured data still have a long way to go before becoming fully actualized.” Really?

Ook?

I told Alisa to include this when she asked for feedback on her original post. My reasoning being personal agents do exist but they are still dumb. And the reason they are dumb is because there is not *enough* machine-processable information to allow them to act without human guidance.

That’s just one of the places where we differ; for me, they’re dumb because we’re dumb, and the gap will never close. No amount of knowledge will ever equate to wisdom, and no quantity of facts about somebody will ever replace a relationship or a connection, and something inside us wants a connection.

Wishing for more and more semantic-able data until some sort of critical mass arises is just another form of Bullwinkle’s “this time fer sure!” – no rabbit is ever going to be pulled out of that hat.

I would agree that “more data = more nifty stuff being built” – witness mashups of UK Parliamentary Expenses versus distance traveled from London, colour-coded on Google Maps.

That was really enlightening a few months ago; but politicians’ expenses are (or should be) public information. Where I have private information about myself, I should choose with whom it is shared. Where I and another party have shared information about myself, a policy needs to be agreed.

There’s nothing hard about that.

We are getting there, but it’s no mass market opportunity yet.

  1. how much personal semantic data needs to exist before a solution spontaneously erupts?
  2. why should the data be created in the first place?

I’d love to know the answers to those questions.

2) Channeling Adriana? I’m also a member of the VRM Project. So no coincidence we have similar ideas. But to get more evidence of the access point, you can read this blog post I wrote

http://eliasbizannes.com/blog/2008/11/you-dont-nor-need-to-own-your-data/

There is another post I made on the mailing list well over a year ago, and I essentially made the point that your identity data only has value if it is recent…hence why access is more valuable than capturing the data (as you always get the latest data).

If we’re into datestamps:

  • I explained the Mine! concept (including “value of data decreases with time”) to Eve Maler in March 2008 – my iPhoto tells me so. Maybe that’s how you picked it up?
  • We published the white paper in Feb 2008, which alludes to it in §v0.4
  • Further back we mooted the Mine as a technical solution at IIW2007b.
  • And Adriana was talking about the dynamics and value of data as a proxy for control in relationships earlier than spring 2007; the nascent VRM community were wedded to Higgins back then, didn’t want to know.

So: it goes back a long way before 2008/11. As an aside the “user-driven” concept was Adriana’s, too, until it got fubared by people who wanted to use it in a tail-wagging-dog kind of way to describe their pet projects. Alas and alack.

Anyway:

So for example, your relationship status and current employer changes dramatically every year, five years, ten years – if Facebook had that data about you five years ago, it’s less valuable than if you updated it two days ago.

Yep. Agreed, totally. That’s where we started from in November 2007. Now, we’re approaching beta for the first Mine! software, and the cool thing is that it doesn’t have a technology adoption curve.

3) Your broader point about a distaste for monetising people’s identity, I think you need to recognise that’s just how the world works. And rather than preach a utopia where companies cannot do so, we need to instead shape a world where we control our data and the benefit surrounding it.

Accusations of / belief in utopianism generally come from someone who can only see the world as a zero-sum game – either the good guys win, or the bad ones do.

In reality it doesn’t work like that.

  • I have lots of data about my Amazon purchases.
  • Amazon has that too.
  • Amazon doubtless monetise that information.
  • I want to do that, too.

I want to share the data on my own terms and technology is arising that permits me to do that; subscribers to my data will get the raw data, direct from the source, ie: me.

That’s very valuable stuff, as you yourself agree, and market dynamics suggest what might happen next.

Rather than prohibiting an existing practice, we need to re-engineer it and do so along the lines of incentives for business as that is how you create change.

Prohibiting? Who said anything about prohibiting? I am just talking about throwing away the middle-men. We are users. We can do that. On the ‘net we are waking up to the idea that we don’t need intermediaries, except at our convenience.

Every iPhone and Android runs Unix, but Unix is “too complex for users”. Every BitTorrent client is a webserver, but “running a webserver is too complex for users”.

Just imagine what will have been too complex for users in 2010! 🙂

Stripmining The User: DataPortability, The “Pragmatic” Web, And A Bad Philosophy

Regarding the ReadWriteWeb article “The Future Is All About Context: The Pragmatic Web” … Well, I really think you should read it for yourself, ‘coz between the lines it is rather shocking.

I don’t much disagree with the first couple of paragraphs; I’m deeply cynical about the “semantic” web and still hew to my belief that “if you create documents that a computer can read, only a computer will want to read them”. And although Alisa Leonard-Hansen [ed: henceforth ALH] echoes the AI zealots of my youth when writing:

…the intelligent personal agents that are able to process this structured data still have a long way to go before becoming fully actualized.

…I think ultimately we disagree because I believe (a) the personal agents are already here, (b) they are called ‘smartphones’ and (c) they will evolve and improve in ways that we cannot adequately imagine, but they certainly will give people a platform capability which, as a Unix sysadmin, I would have drooled for in 1995.

ALH continues by boosting for “the pragmatic web”[1] and ties that term to some thinking with which I mostly agree, viz: your digital identity equates to your digital footprint:

We need to better understand our identity as it begins to define our experience of the Web and the networked-enabled world we inhabit. Our online identity will increasingly be defined by three “pillars”: who I say I am, what I do and say, and who I connect to (and who connects to me).

To clarify, our online identities are comprised primarily of three specific kinds of data:

  • Explicit or prescriptive data (i.e. the data that I input about myself: name, age, occupation, etc.);
  • Activity or behavioral data (i.e. what I do and say online);
  • Relationship data (i.e. my social graph and what my connections say about me).

Indeed, Adriana has been saying this, better, for years; but then ALH’s article goes horribly wrong. The rest of the article flows from a bunch of unstated premises which, written out, I think would read:

  • The user’s identity is not under his control
  • This cannot, perhaps even should not be changed or fixed
  • This is a good thing because it affords business opportunity
  • ALH’s “pragmatic web” is a good thing which must be brought about

Rather than just strawman this, I’ll try to justify how I reverse-engineered these premises:

“The user’s identity is not under his control”

Well, yes, this is a given on Facebook (at least) – you hand over your data, poke your friends with vampires, post pictures of yourself vomiting, and then have to fight to control who sees them – if you actually care.

“This cannot, perhaps even should not be changed or fixed”

I justify the “cannot” because there is no alternative presented:

But the centralization of identity data on one or two major networks […] won’t realize the vision of the pragmatic Web. So, how will the pragmatic Web come to be? How do we realize the power of a dynamic Web that is based on our [ed: distributed and uncontrolled digital footprint] identities?

…and instead we see merry pictures of how having one’s identity hanging-out-there-in-public can be exploited for profit – sorry, “monetised”:

The resulting vision is that of a highly personalized, dynamic, relevant and remixable Web experience, yielding greater access to information through discovery, communication and collaboration. For enterprise, this could mean the rise of innovative new business models, based on data-driven value exchange.

I further justify the “should not” because it leads into another pseudopremise:

“This is a good thing because it affords business opportunity”

For me this is justified by the money quote:

Consider this: as media companies scramble to identify new and innovative ways to advertise to the sea of nameless, pixeled users who graze through their content each day, a rich supply of highly valuable identity data lies just beneath the surface, left unmeasured and unmonetized.

There it is, folks: you are all natural inforesources begging to be crushed and rendered into yummy data that feed the advertising industry. You are a chicken and the advertisers want McNuggets. Yes they really think like that; they just don’t put it that way because it sounds bad, but what you browse and what you like are more important than “you”, in this world. You exist as a demographic.

And finally:

“The ‘pragmatic web’ is a good thing which must be brought about”

…well, if you sold these concepts to advertisers and vendors (“So, how will the pragmatic Web come to be? How do we realize the power”) you would believe and write everything from that perspective.

So like in any endeavour, with this pragmatic web we have:

  • the motivation (profit)
  • the opportunity (data just lying around waiting to be harvested)

…which leaves only:

  • the ability (tools and a suitable environment)

…to be created. This is what almost everyone is trying to do, nowadays; it’s where the money is. ALH starts to suggest that Elias Bizannes of DataPortability is also channeling Adriana, with:

One final note on identity data as it relates to enterprise. As Bizannes points out, the value of this kind of identity data rests on the key factors of time and timeliness. Essentially, identity data is valuable only if it is recent. Facebook wouldn’t be able to sell your (permissions-enabled) data to advertisers if it used your explicit data from a year ago rather than from today.

…which is astonishingly similar to The Mine Project’s longstanding philosophy: that relationships are maintained by sharing information, and that because currency is valuable, your ability to control people’s access to your current data, thoughts and feelings – your “identity” – gives you the whip hand in a digital relationship.

However Bizannes apparently holds a Bizarro approach to this line of thought:

So, Bizannes argues that real-time “access” to someone’s identity matters most, and it’s no longer about data “capture.” Thus, as new business models arise out of monetizing permissions-enabled identity data, the value of the business models will depend on these entities having real-time access to the data.

I really do wonder whether ALH is quoting Bizannes correctly?

Anyway, this is another example of what I call FacebookEnvy[2] – you can smell the line of thought:

  • Facebook has all this identity information!
  • We should make it open, so that it’s better!
  • But we need to keep a back door, so we can monetise it!
  • So what we’ll do is, we’ll be intermediaries!
  • Like Facebook is!

I shan’t name them here but this is a growth area of the web at the moment – world-class hot-air merchants developing systems to empower the little guy / the common man, arguing that the way to benefit humanity is for you to adopt this wonderful new [THING] to interpose between [YOURSELF] and [THE OUTSIDE WORLD].

Pay no attention to the revenue model behind the curtain, and God (or Law, or Protocol) please forbid that users have, control and use platforms for themselves, or that anyone trust what people say about themselves.

But that’s a rant for a future blog post.

– alec


Postscript 1: The Pragmatic Web

I wonder if ALH really means the same “Pragmatic Web” concept that appears to live at http://www.pragmaticweb.info/ – where they publish a manifesto with the following helpful definition:

The vision of the Pragmatic Web is thus to augment human collaboration effectively by appropriate technologies, such as systems for ontology negotiations, for ontology-based business interactions, and for pragmatic ontology-building efforts in communities of practice. In this view, the Pragmatic Web complements the Semantic Web by improving the quality and legitimacy of collaborative, goal-oriented discourses in communities.

Which is all very “semantic” and seems to overlap with ALH’s article, somewhat.

To highlight the value of this “Pragmatic Web”, the authors also write:

To search for potential window manufacturers (WMs), current search engines suffice, although a general ontology may offer improvement. But once negotiations with different window manufacturers begin, a branch-specific ontology is required that includes, for example, the specification of construction materials.

The WM should only use highly insulated window frames and should construct the windows using specific techniques to avoid thermal bridges.

If the WM is not German, the legal regulations might be unknown and so the manufacturer must understand the underlying ontology and commit to it.

It can also occur that the partners must add new concepts to the existing ontology. For example, they might have to agree on a specific type of low-energy house, namely one using three litres of energy per square meter of area with controlled ventilation and using geological heat sources.

Such a concept is not an objective description of a given reality, but is developed within the conversation between the parties, who in their conceptualization of this kind of house take into account many tacit, non-formalizable context factors. The effect of the resultant joint definition may be that contract negotiation is smoothened, or even that the costs are reduced since some requirements may turn out to be superfluous.

[ed: paragraph breaks added for clarity]

Excusing energy being measured in “litres”, I can almost see what they are getting at, because even the godlike powers of Google fail to provide the ability to deal with queries such as:

http://www.google.de/?q="windows windowtype:upvc legalsystem:german glazedepth:3 heatloss:-3kw +ventilation:true +attractiveness:pretty +tint:clear +style:art-deco"

…and I agree totally that there are insufficient means to describe windows for a mechanical search, and I understand how it can keep some semantic/markup/librarian-types up all night, worrying about it.

But Me? As in most of my semantic-web scenarios, in reality I use an ultrasoft-AI approach:

I do a search and pick up the phone to discuss what I want.

It works. I’d call that the “pragmatic” approach.


Postscript 2: FacebookEnvy

A line of thought: “Facebook does X. We should do X, but open, so it’s better.” Leads to any amount of pseudoinnovation. Replace with TwitterEnvy where appropriate.

Performance Art: Why Secure E-mail Never Went Mainstream…

…or “Why Johnny doesn’t want to encrypt.”

There’s been a thread on Perry Metzger’s “CRYPTOGRAPHY” maillist, about “Why the poor uptake of encrypted email?”.

There was the usual citation of “Why Johnny Can’t Encrypt” (summary: cryptography is hard for people to grasp and harder for programmers to make user-friendly) – but because of my work on Adriana’s Mine Project I see another reason which is not really covered in the paper.

To try to drive the point home I submitted my response encrypted under rot13, and bless him, Perry was kind enough to post it verbatim without querying it.

So what I have done by “protecting” that message with cryptography is deny it indexing by Google (cf: deny it integrated search like Spotlight) – make it harder to find, read, quote… all the same side-effects that “secure” e-mail imposes upon its recipients in pursuit of solving a transport security problem.
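To see the effect in miniature – rot13 via tr(1), the sample line being merely illustrative:

$ echo "Why the poor uptake of encrypted email?" | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Jul gur cbbe hcgnxr bs rapelcgrq rznvy?

$ echo "Jul gur cbbe hcgnxr bs rapelcgrq rznvy?" | grep -c encrypted
0

Even “encryption” that feeble puts the text beyond the reach of grep, Spotlight and Google alike; real cryptography is no different in this respect.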

I attach the (unencrypted) body of the e-mail, below; the final few paragraphs are the key ones for the next decade of computing.

I believe there are some fundamental axioms, or postulates of computing, that are flat-out wrong – or will be, soon – but which strongly inform our notions of “how computing should be”; to me these are like the “parallel postulate” of Euclidean geometry – when they fall, when the world suddenly sees their “self-evident” nature is false, then exciting things will happen.

What are these computational parallel-postulates? Stuff like:

  1. users can’t be online 24×7 (cf: ADSL, hosting providers)
  2. (also phrased as) connection time is prohibitively expensive (ie: dial-up)
  3. users are not able / not allowed to serve information to third-parties (cf: blogs, feeds, apache)
  4. users can’t afford to host a server to act upon their behalf (cf: hosting, iPhone)
  5. users can’t store heaps of data (cf: terabyte hard disks)
  6. secure end-to-end communication is not possible between servers / across the internet (cf: im, skype)
  7. the client-server programming model means we should implement hierarchies, not peer-to-peer meshes (iTunes Store vs: BitTorrent) [updated 1]
  8. the web is a series of hierarchies, not a mesh (also known as the “deep linking is bad” argument) [updated 2]
  9. computers need to be rebooted daily/weekly [updated 3]

[ed: I am adding extra entries to the above list, denoted with square brackets]

…and a host of others; I could spend a day enumerating these fallacies and defunct limitations. Almost the entire identity-provider industry is based upon numbers 1, 3, 4 and 6 – whilst people gape at the extraordinary power of BitTorrent and the shocking proposition of OpenID without seeing them as logical consequences of overturning numbers [1,2,3,5,7] and [3,4,6] respectively.

This is one of the reasons I like The Mine Project – it kicks over a bunch of those rules – most, perhaps all of them – and in such a new space very exciting things may happen.

Anyway – herewith the e-mail:

Beyond the “Why Johnny” paper – focusing upon usability – I think there is a higher problem of interoperability and information-access at play here.

There can be no access to your mail without use of a client if you are using cryptography – even ROT13 – and this alone is a big problem, because mediated access to your e-mail is *really* painful.

For some 15 years I used mh/nmh/exmh (latterly with fetchmail), then moved to Mail.app, recently tried Thunderbird for a few months, and am re-considering nmh for long-term archiving of e-mail. I also use my iPod, three laptops with varying species of Unix, and a 3G phone to access e-mail. Occasionally I still copy stuff out of /var/mail/.

I would have suffered immensely were I required to use a particular crypto-enabled client to deal with my e-mail at each stage, or were I required to use historical crypto-clients to access older mails.

Anyone whose college thesis is in WordPerfect on a 5.25″ floppy at the back of a closet somewhere should understand this problem.

To this day Project Gutenberg uses flat ASCII as a lowest common denominator format, and similarly I need my e-mail in the simplest form so that I can grep it, perl it, quote it and search it.

So “why has encrypted e-mail failed?” I suspect that static data encryption revolts against the nature of personal communication and the needs of personal information re-use.

For comparison, consider the convergence of instant messaging and e-mail – they are becoming ever more alike, but the former mostly relies upon end-to-end transport security, often assuming that the privacy of logs at either end is at the whim of *that* user.

For some reason this works rather well; as security geeks we complain about it, but there have been many times when Skype has bailed me out of trouble with its ability to drill through almost anything and provide me with messaging and file-transfer.

Similarly AIM, Jabber, GChat – all of which I happily run with OTR – give me necessary mostly-secure communication.

In the world of e-mail the problem is that the end-user inherits a blob of data which was encrypted in order to defend the message as it passes hop by hop over the store-and-forward SMTP-relay (or UUCP?) e-mail network… but the user is left to deal with the effects of solving the *transport* security problem.

The model is old. It is busted. It is (today) wrong.

It’s like ordering lobster bisque, and having a live lobster turn up at your table; what you want is in there – heavily armored – and yes you can render what you receive into what you actually desire; BUT it’s messy and you’re really stuck unless you have a mouli, a saucepan and a small PGP hotplate at hand.

And of course you have to archive copies of the lobster, not the soup.

S/MIME and its brethren exist to simultaneously address the security of [both] data in motion and data at rest – but people don’t want the latter in the form that it provides, because it inhibits interoperability and usability at a level above the “this software sucks” matter…

And if the “data in motion”/”end to end” security issue is being addressed by things like IM/OTR and Skype, then perhaps “secure” e-mail will soon go the way of Telnet and FTP?

Hankering For A World Without “Identity” or “Federation”

Author’s note: this is not a white paper. This is an opinion-piece, possibly a polemic. In it I expound what I believe rather than making an argument for you to believe it too; however if through it you arrive at a technical question or desire clarification, then please leave a comment using the tool provided. Also, there are footnotes annotated in square brackets. They are worth reading as you go along. Once I have had more coffee I’ll get round to making them into hyperlinks. Sorry.

Abstract

This posting began as a standalone article to describe my tussle with “Identity” in all its various forms, however it has evolved into a companion piece to Adriana’s musings on identity – not only because upon reading her posting I found us using like words and like metaphors to much the same conclusion, but also doubtless because it was she who singlehandedly provided me with the alternative: a world without (or with much-reduced) “Big I” Identity.

However I wish to spell out my beliefs rather more bluntly, so here we go:

I believe that Identity is bunk.

I believe that the technologies of Identity are founded upon and perpetuate an outdated model of a passive user who lacks both the critical authority and the ability to participate in an authentication transaction, and further I submit that Identity’s commitment to this model inhibits its further evolution in the modern era.

…but before continuing I want to address a few potential misconstructions to aid later clarity – so for contrast I shall begin by listing a selection of identity-related topics which are emphatically not bunk:

identity theft

I have written about Identity Theft at length elsewhere, and although I still maintain the viewpoint that identity theft is straightforward fraud more than anything novel, I cannot deny that it occurs or that it is a serious matter.

identity management

I have a former BOFH sysadmin’s view of Identity Management, which means that of course I am going to welcome any set of tools which (a) permit me to unify my users’ authentication mechanisms into a homogeneous solution and (b) allow me to effect bans, lockouts or password changes on 30,000 machines at the same time. To deny the utility of this would be insane.

authentication

the act of establishing rights or privileges to access resources is one of the most fundamental (and common) actions to occur within a computer network.

strong authentication

one-time passwords, authentication tokens, javacards, sunray cards, stuff to authenticate more strongly (ie: definitely) to my network? Sure, “bring it on”.

single sign-on

see the section on “strong authentication”, in fact see all of what I have written above. Within a security domain it’s a wonderful thing to not have to keep typing-in your password to authenticate separately to Mail, Calendar, Web and database. It’s a neat trick if you can do it.

I consider all of the above to be perfectly decent, supposedly identity-related matters; where I diverge is in the field that I refer to as “Big I” Identity.

So What Is “Big I” Identity?

“Big I” Identity – let’s just call it “Identity” from now on, so that I don’t go mad spelling it out each time – is the umbrella term I use to describe processes and identity enabling technologies such as:

  1. Digital Identity
  2. Cross-domain Federated Identity
  3. Identity 2.0
  4. Identity Metasystems
  5. CardSpace
  6. Higgins
  7. …and an entire dumpster-load of other projects, toys, tools, XML standards, etc, all born of the mindset which led to the above

Identity exponents paint a future in which your identity is a digital puppet – or possibly a hive-mind of several – living in a studio flat in cyberspace, buying goods, paying taxes and dealing with the other bureaucracies of life on your behalf, able to transact within cyberspace because your puppet has been certified into existence by some higher authority – most likely after payment of some real-world money.

In some ways the model is very like “Second Life”:

  • You pay for your Identity avatar to continue to exist, so it may transact for you, and it will continue to exist only for as long as you pay for it.

  • You imbue it with some of your personal qualities.

  • You manage it awkwardly via remote control.

  • And you likely wish it was somehow also portable into “World of Warcraft” and “Everquest”, or vice versa – a process of federation.

A moment’s consideration of the above will reveal a fundamental concern of mine: your Second Life avatar only exists with the permission of Linden Labs, and its future is bound to theirs.

Similarly: if your identity exists at the whim of another organisation, then it is not under your control and could cease to exist without your approval.

That would be a bad thing.

But before going further with that matter, I want to rhetorically ask:

Why Pursue Identity At All?

Our culture – our biology – seems geared for use of certificates to gain access to resources: having the “right” scent to enter the anthill, dressing an orphaned lamb in the skin of a dead one so that the latter’s mother will feed it… these demonstrate that nature has some grasp of authentication for a service, even if sometimes it implements weak authentication – e.g. a cuckoo’s egg in a reed-warbler’s nest[5].

What happens next is (I believe) unique to humans: we conflate “authentication” with an abstract concept of “identity”, and thence indirect from that to “authorisation” – so that somehow your state of mind, your beliefs, learnings, and capabilities can be captured, documented and carried-about as a certificate.

To be technical for a moment, traditionally speaking:

  1. authentication is the act of proving your “identity”

  2. a certificate documents an authorisation in an authoritative manner

  3. the process of authorisation provably binds an “identity” to the permission or ability to use or access a privileged resource

…or as otherwise experienced with a Norwegian police officer:

  1. “Yes Officer, my name’s Alec Muffett. Here’s my Passport.” (authentication)

  2. “Yes I am permitted to drive a motorcycle, here’s my license.” (certificate)

  3. “Feel free to check the license, it’s got the hologram, etc.” (authorisation)

  4. “The freeway speed limit is 50 km/h? You have got to be joking…” (negotiation)

…so when demanded by one authority (Norwegian Police) I am required to show two verifiable / hard-to-forge certificates: one linking the abstract concept of “Alec Muffett” to the actual human being in front of him, and the other linking the abstract concept of “Alec Muffett” with the privilege of riding a motorcycle in the United Kingdom.

In passing, note that Norway’s recognition of the UK’s motorbike test is some manner of cross-domain federation.

The abstract concept known as “Alec Muffett” is my identity.

The UK Government understands “Alec Muffett” as the identity of a person who in 2001 passed the UK motorbike test thereby granting “Alec Muffett” the privilege of riding a motorbike on the UK’s roads – but although congruent, the identity of “Alec Muffett” is not equal to the six-foot-four hominid commonly associated with the name and who is typing this posting; instead it’s more a cloud of “claims” (either explicit or implicit) which are associated with the latter.

Claims are, for instance:

  1. Explicit: Alec Muffett is male

  2. Explicit: Alec Muffett passed a UK Motorcycle Riding test in 2001

  3. Explicit: Alec Muffett was born in 1968

  4. Implicit: Alec Muffett is old enough to buy alcohol in the state of California
    (since he was born in 1968 and thus is older than 21)

It would be really bad if we had to go around carrying certificates to authorise us for each and every one of the claims which dominate our lives. The cloud of explicit claims about Alec Muffett is large; the cloud of implicit ones is much larger, because an implicit claim derives from the context of someone seeking to verify the claim (eg: a Californian bartender) – and there are a near-infinite number of potential contexts in the universe.[2]
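To make “implicit” concrete, a throwaway sketch – the years come from the claims above, and the threshold is supplied entirely by the verifier’s context:

#!/bin/sh
# an implicit claim is computed per-context from explicit data,
# rather than carried around as yet another certificate
birth_year=1968      # explicit claim
current_year=2008
min_age=21           # context: a Californian bartender
if [ $(( current_year - birth_year )) -ge $min_age ]; then
    echo "implicit claim holds: old enough to buy alcohol"
fi

Swap in a different context – a different min_age, a different jurisdiction – and a different implicit claim pops out of the same explicit data; that is why the implicit cloud is effectively unbounded.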

However, in the real world, carrying physical certificates seems to be what biology has predisposed us towards.

What happens when we move our identities “online”? What happens is that folk try to replicate the authorise-via-trusted-certificate model of access control; and then they fret about the management issues regarding having done so.

Why do they do this, and why do they fret?

To move back to the Norwegian police analogy above: rather than resorting to credentials and identity to prove my ability to ride a motorcycle to a police officer, why not appeal to the officer’s ability to observe that:

  1. I am demonstrably riding a motorcycle now.

  2. He has observed me riding it for a few miles.

  3. I would be perfectly happy to undertake a small test, there and then.

In short: why could the police officer not observe me, develop a relationship with me, and from that satisfy themselves of my capabilities?[3]

What if, instead of being observed once-off for a couple of miles by a police officer, he knew me from the local motorcycling club? Wouldn’t having that relationship shortcut questions about my authorisation to ride a motorcycle – and shortcut invocation of a whole heap of paperwork and certificates, unless I was actually being booked?

The answer is obviously “it doesn’t work like that in the real world – relationships don’t scale in the real world”.

Yes, of course, but why should it not work like that in cyberspace? Because relationships do scale in cyberspace.

So What Am I Saying About Authorisation?

I am saying that authorisation need not be linked to an identity when it can be linked to a relationship with an entity, instead.

Anyone who has heard me speak at length about security in the past ten years or so, will have heard me utter something like:

Amazon really don’t care who you are in respect of your driver’s license. They likely don’t care what your passport number is either, or who the government say you are. What they really care about is that the person placing an order today is the same person who placed an order last month, and the month before, and that each time before, the person paid.

I submit that the frippery of Identity – that whole circus of indirection from me to an identity, and from that identity to some authorisations – contains a potentially unnecessary step, one that can sometimes (perhaps frequently) be circumvented by maintaining a relationship with the entity to which you might otherwise have to authenticate.

What eluded me completely was the obvious next step, which was later inspired by months of talking with Adriana Lukas about Project VRM – Doc Searls’ pursuit of Vendor Relationship Management.

The next step is simple: create a tool that maintains a person’s relationships with third parties, but puts them under his or her own control.

A Different Way To Approach Authentication

To recap the above: traditionally there are three tines of authentication – three things you assert to prove your right to access a resource:

  • something you have

  • something you know

  • something you are[4]

eg: you have a key to a door, you know the password, you are the General in uniform or the appropriately-coloured cuckoo’s egg in a reed-warbler’s nest.

(Author’s note: at this point, if you’ve not read it already, please go read footnote [5] – you’ll need the background in a moment)

All of the above are predicated on the notion of need for repeated authentication – you use your door-key daily, your password likewise, you check your eggs each time you return to the nest.

But here’s a new spin on “something you are” – what if, instead of checking the shape and colour of the eggs each time we return to the nest, we just watched the eggs, ever vigilant and unblinking, all the way from laying to hatching?

What if the reed-warbler was able to stretch its attentions beyond all conceivable bounds and move from weak authentication of the form:

You ARE an egg of the correct shape and colour

…to a more radical strong authentication of:

You ARE the specific egg that was laid, and I can guarantee that fact because I have never ceased to watch you since the time you were laid

In short, what if you had a relationship with your eggs, and could stretch that initial relationship (egg laying) through to conclusion (hatching) without any interruption?

If you were capable of doing that, you would have invented a new style of authentication – “relationship based authentication” – that requires no external parties or authorities to function.

And, interestingly, it would be a form of “single sign-on”.

The Third Form Of Single Sign-On

Eve Maler and Drummond Reed recently published The Venn Of Identity in IEEE Security and Privacy magazine, and it serves as an excellent introduction to a lot of the thinking, terminology, concerns, and perhaps some of the fads of the Identity community.

For me, the critical section is headed “Overview: Federated Identity Model” on page 17, which defines terms like “user”, “user-agent”, “identity provider” (IdP) and “service provider” (SP), and goes on to describe how “Single Sign-On” comes in two flavours:

SP-initiated Single Sign-On
Alice wants to buy something online; the vendor (SP) authenticates Alice by contacting a higher authority (her IdP; compare with Norwegian validation of a UK driving license, above)

IdP-initiated Single Sign-On
Alice wants to buy something online; she connects to her IdP which provides pre-authenticated channels to other vendors from whom she can buy.

My question is: Where is the third party in all this? Why has the user no authority or involvement?

Where is “User-initiated Single Sign-On”?

Where is my ability to talk to a vendor and for them to have surety that I am me (and for me to be sure that they are themselves) by virtue of the fact that I am the same person who has been dealing with them for several years?

This also brings me back to my fundamental issue with “Big I” Identity, viz: that the Identity universe is currently predicated upon ignoring the most important person in an authentication transaction: the user.

In Identity-land, the user is considered passive and non-authoritative – the papers and protocols all pay lip service to the need for “self-asserted claims” – letting a person describe themselves authoritatively – but answers to heavy-hitting questions like:

  • Is this person old enough to buy booze?

  • Is this person permitted to ride a motorcycle?

…are all still dealt with using cyberspace metaphors of the old driving-license-certified-by-authority model.[6]

However, as I’ve outlined above, that is not the only way.

On the web we have an additional way to authenticate – via ongoing relationship; technologies that can implement this are already well-used and well-understood; any network engineer can explain how to use TCP to establish a reliable connection between two nodes, albeit layered atop an unreliable datagram service. All we need in order to establish a reliable relationship is to stretch the communications mechanism out over time rather than distance – like a warbler watching its eggs rather than riskily re-authenticating them time and again.

You sign-on with a vendor, once. A single time. You can bootstrap that into authenticating all future communications.

This provides “User-initiated single sign-on”.
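A minimal sketch of the idea, using nothing fancier than openssl(1) – the names and the message are illustrative, and this is emphatically not a description of how the Mine actually does it:

#!/bin/sh
# first contact: user and vendor agree a random shared secret, once
secret=$(openssl rand -hex 32)

# thereafter, every message carries an HMAC under that secret; the
# vendor need not care who I "am", only that I am the same party
# who signed on originally - continuity IS the authentication
printf 'order: one motorcycle helmet' | openssl dgst -sha256 -hmac "$secret"

Anything that can keep a secret and compute a keyed digest can do this; no IdP, no certificate authority, no federation required.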

Identity: Your Part In Somebody Else’s Goldmine

Way back in 2001 some chap at Microsoft came up with a really brilliant idea – everyone in the world could have a free Hotmail account, and could use that e-mail address as an identifier to log into all of the e-commerce sites in the world, the latter being able to query Microsoft (now an Identity Provider, IdP) to prove who it was that was trying to buy stuff.

PressPass: How widely does Microsoft expect this federation to be adopted?

Payne: We strongly believe that a universal authentication model is extremely valuable to virtually every business. Over time we expect that this interoperability will become as important and ubiquitous as interoperability of e-mail is today. So, I guess you could say we expect adoption to be very strong. Large business and corporations are especially interested in ways in which they can unite their divergent worlds of authentication within their own company’s networks. They also want to [be] enabling users [to] navigate inside the company’s firewall with just one authentication and a single sign-in. Or when they need to visit the site or services of a trusted, third-party vendor, supplier or customer. For instance, imagine how easy an employee will find it to have just one password and ID that they can use securely when visiting their company’s HR benefits page, then leave the internal site to visit their company’s travel-services site — even though that site is run by an external vendor.

The rest of the world threw rocks at the idea: your Hotmail account would become the “mark of the beast”, you would not be able to transact without it, Microsoft would hold a treasure trove of information about you, and if Microsoft crashed the world would not be able to transact… and thus was the Liberty Alliance born, an organisation to challenge the threat from Passport and provide an alternative:

The Liberty Alliance was formed in 2001 by approximately 30 organizations to establish open standards, guidelines and best practices for federated identity management. The Liberty Alliance met this goal with the release of Liberty Federation in 2002, the industry standard for successfully addressing the many authentication, privacy and security challenges surrounding online identity management. Deployed by organizations around the world, Liberty Federation allows consumers and users of Internet-based services and e-commerce applications to authenticate and sign-on to a network or domain once from any device and then visit or take part in services from multiple Web sites. This federated approach does not require the user to reauthenticate and can support privacy controls established by the user.

Now here’s the funny thing: the Identity model back in 2001 was very authority-centric, and with some validity (at the time) assumed that the user – beyond use of passwords, etc – was incapable of participating in an authentication process, incapable of making authoritative statements about themselves, and incapable of transacting on the web on their own terms.

The model has not evolved since that time; but the world has moved on immensely.

As I write in 2008 some one million, perhaps nearly two million people carry BSD/Unix servers in their pockets – they are called iPhones – and the world’s populace are gradually moving online 24×7; those who don’t yet have Apache running on their phones have hosted servers, blogs, wikis, e-mail accounts…

So the key realisation missing from Identity today is that there is the potential for three equal parties to participate in a transaction – the User, the Service Provider (e.g. vendor) and the Identity Provider.

Or even, as described above, we can drop the IdP out of the loop for some purposes; and the User will take back physical possession of their own data, and perforce will become authoritative regarding their own data, and will be able to project control over their own data.

“Big I” Identity In The Large

Summing up what has been discussed so far:

1) Identity is predicated on an old model of the disempowered user – dating from the Microsoft Passport era of 2001, if not before – and little if any thought seems to be given to the potential for active, even leading participation of a User and his or her iPhone in the authentication process.

2) Following from the above, where the old world of Identity focused upon the importance of third-parties making authoritative statements about someone, a new zeitgeist could concentrate upon people taking charge of their own data, and becoming the definitive source of claims about themselves in the process.

3) And from that, the role of Authority in Identity will fade somewhat.

Adriana describes it most clearly:

In the offline world identity is really third-party driven, to put it crudely, we are what our papers say we are. Your birth certificate attests to your date of birth, your utility bills to your residence, your diploma to your education etc etc. It has been so because our identity management has had several fundamental features – it is centralised, system-centric and it is read-only. We are used to deriving our authority and credibility from a system that grants and confirms it. It is important that we can do that as the only way we can transact in a hierarchical environment is via authorisation from the level above us. (a definition of hierarchy is that in order to interact with somebody on the same level I have to go via a superior level).

Whatever the web turns out to be, it is not a hierarchy. It is a network, i.e. a heterarchy, a network of elements in which each element shares the same “horizontal” position of power and authority, each playing a theoretically equal role. This has impact on how my identity is defined and who defines it. From blogs to social network profiles, people are learning how to define their thoughts and ideas, record their lives in multimedia formats, share their experiences, swarm around causes and defy companies, institutions and authorities. From linky love to P2P, they are bypassing traditional media and distribution channels, learning the ways of direct connections.

People online build and destroy reputations, create and squander careers, establish themselves as experts or celebrities. That’s the bird’s-eye view. The closer look reveals emergence of self-defined (and self-driven) identities. By writing I learn to articulate my thoughts better, by sharing I learn to differentiate from, as well as identify with, others. I become aware of myself and my preferences in ways that in the times before the web were available to a select few – writers, artists, politicians and the more articulate celebrities. We have ways of connecting with others who become validators and authenticators of our self-defined and persistent identities. The challenge is to understand and find how to evolve and use those for other than communication and information transactions.

When attending Identity conferences I encounter startup after startup whose concept of “enabling user-centric Identity” is to reinvent Microsoft Passport in the small; they all promise that you can give them your personal data – and maybe some money – and they will manage your data (your “identity”) securely on your behalf, somehow giving you added value in the process.

There’s even a software project out there now, again predicated on the Identity notion that you are neither fit nor capable to look after your own data, nor are you capable of being an authoritative and accessible resource for the same – but you may be permitted a pretty interface to manage your own data, when held on someone else’s website.

So that’s what Identity’s definition of “user-centric identity” is all about; for a second time (and in a separate posting) Adriana hits the nail on the head:

User-centric says – “we are going to build a system, put the user in the centre instead of the system”. So far, so good, but this sits uncomfortably with me as a user especially as one that is used to the online tools that have changed many an old way. The tools – blogs, wikis, feeds and feed readers, BitTorrent, Flickr, Dopplr, Twitter etc – are revolutionary not just because of their functionality, bits of code or their interface, but their design for usefulness, their modularity and constant evolution. There is an element of open-endedness in their design, either accidental or deliberate, recognising that the designers cannot foresee all the uses to which people will put the tools. The simplicity is the key, the complexity coming from usage rather than the design. In other words, they are user-driven.

And that’s where I think we’re going, and I don’t think there is any way of stopping it, even if I wanted to. The web is creating this enormous mass of user-capability, and the sheer gravity will drag us all sideways into a world of user-driven identity.

So what happens to “Big I” Identity?

It won’t die, but identity will have to adapt to the user’s definition.

– alec

ps: I am not here going to investigate ideas like transitive-trust as applied to User-initiated Single Sign-On – e.g. that the fact I have a relationship with one party could be used to help me establish a separate relationship with another party; to discuss this would be re-opening notions of federation[7] which I am trying to get away from.

The new user-defined-identity space will be based upon having multiple independent relationships – not some form of corporate-enabled polyamory.

pps: (UPDATE) I am also here not going to get into the weirdness of Identity wherein the goal is to centralise your personal information to make management of it convenient, and then expend phenomenal amounts of brainpower implementing limited-disclosure mechanisms and other mathematica, in order to re-constrain the amount of information that is shared; e.g. “prove you are old enough to buy booze without disclosing how old you are”. Why consolidate the information in the first place, if it’s gonna be more work to keep it secret henceforth? It’s enough to drive you round the twist, but it’ll have to wait for a separate rant.

Footnotes

[1] There is no footnote #1.

[2] For starters, there are at least as many contexts as there are pubs.

[3] Oddly enough, it works exactly like this for drink-driving – in a drink-driving scenario it is assumed that although you may have passed a test at some point in the past, the issue at hand is whether you are capable of driving a vehicle at this precise moment in time. Hence all the “can you walk in a straight line, are your reactions impaired” stuff.

[4] Over (several) drinks recently, Ben Laurie amusingly cited to me someone who described these rather more accurately as “Something you had, Something you forgot, Something you were” – but alas, I forget which wit came up with that.

[5] The Cuckoo lays eggs parasitically; it finds the nest of one of the host species (typically containing 3..4 eggs) and removes a single egg laying one of its own as a replacement. The surrogate parents do not spot the impostor because the total egg-count is the same, and Cuckoo eggs may be somewhat larger than but have similar colouration to the original eggs. The surrogates brood all the eggs, however the Cuckoo chick hatches early and pushes all other eggs/chicks out of the nest so that there is no competition for resources. The surrogates feed the solitary cuckoo chick, until it fledges. This is clearly a case of identity theft, fraud, and security failure due to weak authentication.

[6] I have heard too many times, statements such as “Governments won’t accept self-asserted claims – for information like my home address – without some third party’s certificate that attests to the accuracy of that data”; somehow the people who tell me this ignore that every time I use a pen to fill-in my address on a tax return, let alone on a DVLA web-form, I am making a self-asserted claim with which the tax office seem perfectly content…

[7] I have nothing against federation within a security domain, eg: if one company merges with another, then it’s nice to have tools which permit hybridisation of the two user-bases without pain; see the BOFH/sysadmin comment in the introduction. However I draw a mental line between that, versus using my DVLA driver identification number to authenticate my purchase of beer from Amazon, or whatever…

National Loyalty Card II – Life Imitates Sarcasm

Old:

UK National Loyalty Card

Think of the possibilities: You could accrue Citizenship Points for snitching on benefit cheats and badly-parked vehicles, teaching immigrants how to talk proper like, y’know, or organising community-minded projects like wheel-clamping or neighbourhood-watch schemes. The sort of complex projects that would otherwise require a middle-class neighbourhood with a high percentage of social-climbers to achieve.

These Citizenship Points could then be redeemed for positive benefits: being let-off the occasional speeding ticket, a minor discount on taxes, automatic granting of planning-permission for small household extensions – or, at the extreme end, honours, peerages, and a lunch at Buckingham Palace with the Home Secretary of the day.

New:

give yourself the gift of snitch

this holiday why not rat out someone who is cheating on their income taxes (as long as they earn over 200K a year), if you do, you can get up to 30% off of your income tax bill (reward is considered taxable income) :: the form, the rules :: oh my ::

Maybe I should have a word with the new Home Secretary…

Reasons to Love and Hate ZFS

So yesterday – inspired by Chris Gerhard – I installed my own homebrew “take a snapshot every 5 minutes” script on my shiny new home NAS box. Everything has been going swimmingly, so I decided to cut over from my old striped-external-disk solution on the iMac to the new 2.5TB RAID-Z on the Solaris box.

After all, what could possibly go wrong?

So: yesterday I copied over the entire iMac data store – nearly 500GB – to the AMD’s ZFS filesystem and snapshotted it for good measure, independent of the script.[0] Then – and this is where I really goofed – I erased and reformatted the old iMac RAID array.

So it was karma this morning when I awoke to find a moderately wedged Solaris box, and skimmed through the /var/adm/messages file to find:

(heavily edited for repetitititition)

Sep 13 05:33:15 suzi scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d,1/ide@0 (ata4):
Sep 13 05:33:15 suzi timeout: abort request, target=0 lun=0

Sep 13 05:33:16 suzi scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d/ide@1 (ata3):
Sep 13 05:33:16 suzi timeout: abort request, target=0 lun=0

Sep 13 05:33:16 suzi scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d/ide@0 (ata2):
Sep 13 05:33:16 suzi timeout: abort request, target=0 lun=0

Sep 13 05:33:16 suzi gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d,1/ide@0/cmdk@0,0 (Disk2):
Sep 13 05:33:16 suzi Error for command ‘write sector’ Error Level: Informational
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Sense Key: aborted command
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Vendor ‘Gen-ATA ‘ error code: 0x3

Sep 13 05:33:16 suzi gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d/ide@1/cmdk@0,0 (Disk1):
Sep 13 05:33:16 suzi Error for command ‘write sector’ Error Level: Informational
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Sense Key: aborted command
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Vendor ‘Gen-ATA ‘ error code: 0x3

Sep 13 05:33:16 suzi gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@d/ide@0/cmdk@0,0 (Disk4):
Sep 13 05:33:16 suzi Error for command ‘write sector’ Error Level: Informational
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Sense Key: aborted command
Sep 13 05:33:16 suzi gda: [ID 107833 kern.notice] Vendor ‘Gen-ATA ‘ error code: 0x3

Sep 13 05:40:06 suzi fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
Sep 13 05:40:06 suzi EVENT-TIME: Thu Sep 13 05:40:06 BST 2007
Sep 13 05:40:06 suzi PLATFORM: System Product Name, CSN: System Serial Number, HOSTNAME: suzi
Sep 13 05:40:06 suzi SOURCE: zfs-diagnosis, REV: 1.0
Sep 13 05:40:06 suzi EVENT-ID: 5ad39b52-c2e7-6d53-b937-d43694ed2568
Sep 13 05:40:06 suzi DESC: The number of checksum errors associated with a ZFS device
Sep 13 05:40:06 suzi exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Sep 13 05:40:06 suzi AUTO-RESPONSE: The device has been marked as degraded. An attempt
Sep 13 05:40:06 suzi will be made to activate a hot spare if available.
Sep 13 05:40:06 suzi IMPACT: Fault tolerance of the pool may be compromised.
Sep 13 05:40:06 suzi REC-ACTION: Run ‘zpool status -x’ and replace the bad device.

…which was deeply scary. What was more scary to me was the output of zpool status:

pool: tank
state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
[…] errors: Permanent errors have been detected in the following files:
<metadata>:<0x305>
tank/local/alecm@quick.20070913.053500:<0x0>

Let me put it to you like this: 15 years of my life are on those spindles – and for that moment they contained the only copy. There was no backup from which to restore.

I didn’t panic – I’m not given to panic in such circumstances – but I downed the machine and thought about it very hard. And swore, rather a lot. Especially about the lack of coffee.

Zpool was calling the tank unmountable, because three drives were missing and two “degraded”, and there were not enough “replicas” to get the pool back up.

Start with the basics: I swapped cables around, checked the SATA power-leads were properly seated (I had had one loosen, before). Cutting a long story short, the next few hours of research were deeply fraught, with multiple reboots, testing hypotheses that the motherboard’s SATA controller might be fried, or that a particular disk might be fried because it was making ticking noises (it’s useful, having a stethoscope) – and so forth.

Particularly worrying was the “cascade” failure – three disks seemed to have died at the same time, although only one of them was ticking curiously – and what really confused me was that the names of the faulty disks in /var/adm/messages seemed to bear no relationship to the ones which zpool was reporting at fault.

In the end I bet each-way on two possibilities, and fixed both:

  1. that the daisychain powerlead driving three disks in a bank was at fault
  2. that the SATA data leads were too close/tied together, and crosstalk under heavy load was causing ‘sense errors’

So I separated all the SATA cables out so they have airgaps[1], and will run them separately next time I service the machine; I then reconnected the power cable using different plugs – and the “ticking” stopped, so I am betting that the tail-end power connector was flaky, leading to an undervolt on one/more of the drives.

But this still left the toasted zpool.

The machine booted and mounted the filesystems, but did not appear to enjoy the experience. Occasionally it would hang. I was curiously pleased that it was complaining about having lost metadata, since although that would surely require a rebuild:

Unfortunately, the data cannot be repaired, and the only choice to repair the data is to restore the pool from backup.

…at least it gave me hope that data could be recovered without loss before having to rebuild the pool.

So once zpool could see all the disks again, I used zpool clear tank and rebooted. And eventually it would hang. And then I rebooted again. And it hung. And again. And again.

This was overkill, but in between each reboot I had time to rsync a little more data onto the reformatted iMac RAID system; and gradually, miracle of all miracles, I built up a copy of all the unreplicated data.
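
For what it’s worth, the salvage was nothing fancier than re-running rsync between hangs; a minimal sketch, with hypothetical source path and target host rather than my real ones:

    # re-runnable salvage copy: rsync skips whatever made it across on the
    # previous pass, so each run between hangs banks a little more data
    rsync -avP /tank/local/alecm/ imac:/Volumes/Raid/rescue/

The -P flag keeps partial transfers, which matters when the machine may hang mid-file.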

So now I had a corrupt zpool, and nothing to lose; so I could afford to experiment. Noting that the corrupt metadata was pertinent to an (inherently readonly) snapshot:

tank/local/alecm@quick.20070913.053500:<0x0>

– I destroyed all the snapshots on the machine, leaving only the filesystems. Then I iterated:

  1. zpool scrub tank
  2. zpool clear tank
  3. lather, rinse, repeat (a loop like the one sketched below)…
…until magically it sprang back to life:

22:14:43 suzi:~ $ zpool status -x
all pools are healthy
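
For anyone wanting to mechanise that lather-rinse-repeat, it amounts to something like the following sketch; the grep strings are assumptions about zpool’s status wording, so eyeball them against your own output before trusting the loop:

    #!/bin/sh
    # scrub-and-clear until zpool stops complaining
    until zpool status -x | grep 'all pools are healthy' > /dev/null
    do
            zpool scrub tank
            # wait for the scrub to run to completion before clearing
            while zpool status tank | grep 'in progress' > /dev/null
            do
                    sleep 300
            done
            zpool clear tank
    done

Note that if the errors really are permanent this will spin forever, so keep an eye on it; in my case each pass cleaned up a little more, until the pool came back.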

I am still gonna rebuild the pool – not taking chances – but the data is safe and I can take time to hammer the pool thoroughly before committing my archive data to it as a primary store.

First thing I shall do is reinstate the snapshot script.
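
There’s no magic in the general shape of such a script, by the way; what follows is a purely hypothetical sketch – not the actual script – assuming the quick.YYYYMMDD.HHMMSS naming visible in the error output above:

    #!/bin/sh
    # hypothetical: snapshot every filesystem in the pool with a common timestamp
    STAMP=`date '+%Y%m%d.%H%M%S'`
    for fs in `zfs list -H -o name -t filesystem | grep '^tank'`
    do
            zfs snapshot "$fs@quick.$STAMP"
    done

Run it from cron as often as paranoia dictates, and expire old snapshots to taste.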

So why do I love ZFS?

Well I can guarantee I have not lost any data to corruption, or indeed any data at all. No files missing, none with bad blocks. If there were, it would have told me. And I managed to salvage the situation.

So why hate it?

Well, if the documentation – and the tool itself – tells you “action: Destroy the pool and restore from backup.” then you will pretty obviously assume you are screwed. Not only is that defeatist, it is (potentially) wrong – so if you get stuck in this situation, I recommend you consider having a go at working around it.

It worked for me.


[0] Which I am beta-testing and will release once I am happy.
[1] I chatted with Chris Gerhard, and he confirmed it with some research: SATA sucks for crosstalk, the cables are unshielded, and under no account should you tie them together “to improve airflow” because it’ll bugger your system. Keep the data leads away from the power leads, too…

The Deep Truth Of IT Security

Adriana pointed me to a short post on Euan Semple’s blog which has led to quite an amusing bunfight:

A thought on IT security …. there’s not enough granularity in their paranoia.

Security is a tradeoff. Security professionals get it wrong by not considering the business case. Users get it wrong by asking for things which are not possible to secure and not listening to advice. Is more “granularity” the solution? Or is it just proper evaluation and open-mindedness on each side?

The comment was one I came up with in a conversation with others here in Boston on IT security and the problems that are caused with the broad brush approach that most organisations take. As with everything there are no doubt exceptions.

[T]he prevailing behaviour is to apply sweeping security measures that constrict low risk, legitimate activities, for the sake of the small number of high risk ones that grab headlines. Social computing IS about discussing things which is why we are. My concern is for those inside corporate firewalls who can’t take part in our conversation because their IT department has deployed filters that block my blog!

…etc; I see stuff like this quite often, and it’s a good thing to see; people and companies should take the matter of their IT security seriously, and such discussion is a good indication that people care.

Before I carry on, let me establish my credentials; I’ve been working in the field of IT security since 1988, and was hacking for 2 or 3 years before that. Since that time I’ve published papers, presented, taught, moderated USENET groups, broken world records in, done TV programs about, defined software interfaces for, run development teams about, built communities and implemented tools of, argued about, fought over, been misrepresented in the press regarding, won vengeance and generally lived large in the world of IT Security.

So I’ve seen a lot of shit; and there’s one thing which hardly anybody in the commercial security industry is ever going to tell you straight. So listen up, because this is it:

There Is No Such Thing As IT Security. There Is Only Policy. And A Lot Of *That* Is Bad Or Wrong Or Designed By Idiots, Or Pushed By Idiots With Product To Sell And/Or Who Want To Keep Their Jobs.

Most people will naturally focus on the inflammatory wording in the second half of that statement, so instead I shall focus on the first part, not only in order to build suspense, but also because it is far far far more philosophically important.

“Security” is a will-o’-the-wisp. It’s a meta-quality. You can take a Windows XP machine without firewalls or virus-protection, strap it to the internet backbone, and – if you choose – it will be perfectly secure. Yes, within a matter of minutes it will be overrun with viruses and worms, will become a haven for the lowlife scum who run botnets, and will probably crash (possibly terminally); but hey, if that’s what you want, and/or if you don’t mind being a sort of digital Typhoid Mary and a platform from which others may be attacked, then you will remain perfectly secure.

Yes, this may seem a perverse way of thinking, but it is the correct one.

You see: when you get hacked, you don’t really suffer a “security failure”; instead you may lose system integrity; you may suffer an exploitation which defeats access controls, bypassing or negating the software which enforces separation of privilege, leading to the execution of unauthorised code.

Your data may be copied, leading to a loss of secrecy or loss of privacy. This can break your chain of trust. If they are unencrypted or weakly encrypted, files containing your credentials could be copied, permitting criminals to fraudulently identify themselves as you and spoof authentication processes, thereby stealing from third-party vendors.[1]

Or your data may be destroyed, leading to a denial of service – your ability to work – although this problem can equally be caused by someone swamping your network – legitimately or otherwise – or by the other demons of fire, flood, disease, power/hardware failure, and so forth.

Notice: nowhere in the above do I refer to “security”.

So what is security? You know when you’ve got it, sure, but you lose it by losing something else entirely – access, secrecy, privacy, credentials, privileges (Edit:) and resources which should be yours alone to use or disburse…

In short: security is not any of the above. It’s somewhat intangible. Possibly even fictional. It doesn’t even satisfy the laws of mathematical commutation – you can add a firewall to a network, and that’s all you’ve done; but if someone breaks that firewall then they also break your security, so surely they subtract from you something that you never explicitly added?

So how does that work? I’ll explain a little later.

Getting back to the topic: what does “security” mean more practically? In the real world? From the perspective of 20 years in the trade?

Well, it means “policy”. You may not immediately see it that way, but bear with me.

If you choose to wire the aforementioned unprotected XP machine to the internet and call it “secure”, you can do that. Alternatively if you choose to install antivirus software on a laptop and call anything that gets past that a “security failure” then you have implicitly created a “security policy”.

People create security policies all the time. A lot of them tend to come with the culture:

I bought that laptop / that DSL line / that webhosting service, therefore anyone who uses it without my permission is a bad person.

…and I wouldn’t disagree.

This also explains the firewall paradox above: the security you lose (via the broken firewall) is the confidence you gained from the implicit expectation of control it provided to you. That the firewall was bypassed is an insult to your policy (implicit or explicit), and therefore an insult to your ability to project control over your own resources, and it is therefore an insult to you.

Thus the scope of your security policy – we could almost call this your “morality” – and the extent of your sacrifice of effort towards its adherence (firewalls, latency, hardware cost, tithing) defines how much insult you take in others’ (hackers’) wild behaviour towards your IT resources.

So if you choose to be more permissive – or if you live a simple zen-like security lifestyle which is hard to insult – then in either case you will be “more secure”; the former because you are too open-minded to be insulted by anything (ie: to consider anything a breach of security), the latter because you have nothing much to insult (ie: nothing much to attack).

Ah, but this is where the fun begins; wherever you find the permissive or the simple, you also find the authoritarian, and (possibly worse) you find the middle-class wannabees who think they “get it” and who start defining security, and therefore start by writing policy. Lots of policy, since they perceive “policy = security” and therefore “more policy = more security”. And this is a bad thing, because once you get amateurs drafting policy the whole thing goes rapidly toxic, as any Whitehall civil servant will tell you.

So what happens? Amateurs equate security with fine-grained control, and sometimes that’s good and correct but most of the time it’s overzealous, tasteless and wrong, like painting your bedroom black because you think black is ‘cool’.[2]

For instance: if you have a company of 10,000 employees it is not good security practice to try to maintain a list of which employees, in which positions, are permitted to access which particular websites. Such a rule is neither permissive, nor is it simple, and therefore it almost certainly will be burdensome, expensive to maintain, hard to scale, and dumb. It may sound really impressive to say that you’re going to classify and qualify and constrain every individual employee’s access to the near-infinite resource of the Internet, but really you’re creating an N-squared or NxM scaling problem – and maybe N=10,000, but the value of M is really close to infinity.

That’s a bad thing.

So this is why I tell people to paint their security policy with a broad brush; stick content-filters on your Internet gateways to block access to porn, if that’s your thing, but make your controls equally applicable to everybody. That way you don’t have the cost of having to deploy an authentication solution – a potential single point of failure – for anyone who merely wants to access the interweb.
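
To make that concrete – and this is just a sketch in Squid’s configuration language, with a made-up blocklist file, not an endorsement of any particular product – the broad-brush version is a handful of lines, identical for every user:

    # one rule for everybody: no per-user authentication, no exceptions list
    acl blocked dstdomain "/etc/squid/blocked-domains.txt"
    http_access deny blocked
    http_access allow all

    # ...and log everything, per the stated policy
    access_log /var/log/squid/access.log squid

There is nothing here to maintain beyond a single list of domains, so the NxM problem never arises.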

Ignore those zealots who would micromanage your access controls and privileges – and if they complain, put them personally in charge of firewall maintenance for a couple of months. Don’t give them any budget. That generally puts life into perspective for anyone.

(Edit) Ignore vendors who would try to sell you tools to implement the above cited micromanagement. They make money by pandering to the stupid.

To be practically secure: be permissive (“all employees can access the web…”) – be simple (“but we log everything that goes on, we block porn sites and the websites of companies who are suing us, and we will fire you if you get us into legal trouble…”) – and be explicit about the extent of your policy (“and these are our rules and are the totality of our rules”). Write an Acceptable Use Policy and make it something that people will bother adhering to:

Acceptable Use Policy — (abbreviation: AUP)
A formal set of rules that governs how a network may be used. For example, the original NSFnet Acceptable Use Policy forbade non-research use by commercial organizations. AUPs sometimes restrict the type of material that can be made publicly available; many AUPs ban the transmission of pornographic material.
The enforcement of AUPs has historically been very uneven. This was true of the NSFnet AUP: its limitations on commercial activity were so widely ignored that it was finally abandoned in 1994, enabling the development of today’s commercial Internet. See also Netiquette, Terms of Service.

…and don’t make life difficult for yourself; after all, you’ll be the one to bear the cost, and unlike other forms of morality there is no IT Security Heaven in which you’ll get your just rewards.

You’ll just have an easier life down here on Earth.

ps: I’ll post a rant about vendors, some other time. Update: see the edits marked above.


[1] This latter is often misrepresented in the press as “identity theft”; this is bogus. Nobody ever steals your identity. People replicate it to commit fraud and in the process tarnish your name. You’re not the victim – that honour goes to the person who sells goods to the fraudster who poses as you. Your lot is that of “collateral damage” and yes it is a bloody nuisance, but that still does not make you the victim. That the press make you out to be so is possibly one of the stupider misconceptions of security, since it aids and abets profound idiocies like “Identity Cards” which do nothing to address the real problem.

[2] I speak as an advocate of Role Based Access Control, but that’s typically for permitting a handful of users to access a handful of privileged commands – not a matter of bookkeeping which each and every user can access which file in /usr/bin.

On Retrieving Readership Statistics From RSS Feed Data

Several people have asked me recently how they can extract useful “readership statistics” for content which they are making available via RSS, ATOM and the like.

It’s a thorny question – there are many challenges, dead-ends and false-starts, and I don’t want to take up too much time analysing what not to do and why; there is just too little time at the moment for me to go into the problems in depth.

Let’s just say “web bugs are not guaranteed, and neither is javascript; also people seem to hate click-throughs and partial feeds, and GoogleReader (which has the lion’s share of FeedReading and is only likely to grow) caches prettymuch everything, so hundreds of people could hide behind a single URL retrieval.”

So here is my solution and my process. It works for me. If you don’t like it, please leave a comment.

  1. Make a list of the RSS or ATOM URLs in which you are interested; in the case of this example, we shall be interested in only one:

    http://www.crypticide.com/dropsafe/index.rss

  2. From the HTTP server logs, obtain logs of all retrievals of the target URLs; note that we can be very specific about which URLs we’re logging, so this is a vastly reduced amount of data, much less than the whole corpus of logs. Note that each record contains a timestamp.

  3. Reduce that data once again, extracting only those retrievals which are sourced from GoogleReader, NewsGator or Bloglines; I am told that these three popular Blog Aggregators / Readers comprise more than 90% of the FeedReader market, and they provide their respective numbers of subscribers-per-feed as part of the “User-Agent” data, thusly:

    Raw Apache Log Data:

    65.214.44.29 – – [15/Mar/2007:08:33:31 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 77762 “-” “Bloglines/3.1 (http://www.bloglines.com; 37 subscribers)”

    72.14.199.65 – – [15/Mar/2007:08:44:28 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 77762 “-” “Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 43 subscribers; feed-id=9998166800354916924)”

    38.102.128.140 – – [15/Mar/2007:09:11:28 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 77762 “-” “Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; Rojo 1.0; http://www.rojo.com/corporate/help/agg/; Aggregating on behalf of 1 subscriber(s) online at http://www.rojo.com/?feed-id=243732) Gecko/20021130″

    64.78.155.100 – – [15/Mar/2007:09:16:42 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 12084 “-” “NewsGatorOnline/2.0 (http://www.newsgator.com; 8 subscribers)”

    65.214.44.29 – – [15/Mar/2007:09:38:30 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 77762 “-” “Bloglines/3.1 (http://www.bloglines.com; 38 subscribers)”

    72.14.199.65 – – [15/Mar/2007:09:44:29 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 77762 “-” “Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 43 subscribers; feed-id=9998166800354916924)”

    65.214.44.29 – – [15/Mar/2007:10:10:06 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 77762 “-” “Bloglines/3.1 (http://www.bloglines.com; 38 subscribers)”

    65.214.44.29 – – [15/Mar/2007:10:38:44 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 77762 “-” “Bloglines/3.1 (http://www.bloglines.com; 38 subscribers)”

    72.14.199.65 – – [15/Mar/2007:10:44:30 -0400] “GET /dropsafe/index.rss HTTP/1.1” 200 77762 “-” “Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 43 subscribers; feed-id=9998166800354916924)”

    …Rojo is included in the above, also; just because I could.

    Note that some fetches are performed at hourly intervals, some at half-hourly, some daily, and so forth; this reinforces that each feedname’s statistics must be treated / graphed independently.

  4. Process this data and extract timestamp, feedname and subscriber count (a sketch of this extraction follows this list). Graph the number of readers per feedname against time. This will give you several trend lines.

  5. If the number of readers is generally rising against time, then you are doing something right. If the number is flat, you are not growing your readership, probably a bad thing. If the number is decreasing, you are doing something very wrong indeed.
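
As promised in step 4, here is a minimal sketch of the extraction in shell, assuming Apache’s standard “combined” log format; the regular expression is mine and makes assumptions about the User-Agent wording, so test it against your own logs:

    #!/bin/sh
    # emit one line per fetch: timestamp, aggregator name, subscriber count
    grep 'GET /dropsafe/index.rss' access_log |
    egrep 'Bloglines|Feedfetcher-Google|NewsGator' |
    sed -n 's|^[^[]*\[\([^ ]*\)[^]]*\].*"\([A-Za-z-]*\)[/;].*[(;] \([0-9][0-9]*\) subscribers.*|\1 \2 \3|p'

That yields lines like “15/Mar/2007:08:33:31 Bloglines 37” – one trend line per aggregator per feed, ready for gnuplot or a spreadsheet.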

Two of the key points to remember are the elective nature of these statistics, and that they are per-feed based; by such means you will measure the value and “interestingness” of the feeds as a whole.

Regarding the elective nature: what you are getting here are the numbers of people who have chosen to read your feed via their preferred feed reader. They want to subscribe, they have subscribed, and it’s likely (in the nature of feed-readers) that the statistics will reflect individual people, rather than groups or teams.

This is probably what you actually want to know, when you think about it.

Regarding the per-feed nature: some people – particularly those who write stuff – might want to know how well individual articles within a feed have been received; this is fruitless and not measurable via any RSS mechanism. My usual analogy for this is:

Overall, do people subscribe to “Playboy” because of one or two article headlines that they have chanced to read, or is it because they are more interested in the general theme of the content?

I reckon that the latter is more likely, and further that said observation indicates the way towards better communication:

Do away with click-thru postings (“Click here to read the rest of the article…“) and instead put the real, unexpurgated posting into the feed, and then you should track the rate of growth of feed popularity.

Deal in feeds and communication, not in high-school essays. If you want to know how interesting a particular article is, go count the page retrievals from the logs, but remember that you’ll need to compensate for all the people who read the whole thing (right?) via RSS.

In short: coercing people into having to “click-thru” is a barrier to communication, one imposed by personal shortsightedness and easily circumvented by not being shortsighted.

So, none of this will make you able to say precisely how many people read your blog — but by this technique you will be able to make a plausible argument for your being (say) 50% more popular now, than you were six months ago.

The Multi-Headed Beast of Web2.0 Adoption

I recently was forced to describe Web2.0 as a “Multi-Headed Beast”, and as the number of heads went up from Cerberus to Hydra proportions, I thought I’d whimsically arrange them into a spectrum of Web2.0 adoption, beginning with the ignorably trivial and working through towards the larger goals of Web2.0; the result led to blind alleys and confusion, so I split it into two parts:

First, I propose a list of (some of) the stages of Web2.0 adoption:

  1. “We have wikis, internally”
  2. “We have blogs, internally”
  3. “We have employees who write blogs”
  4. “We have employees who write blogs, hosted upon a corporate blog server”
  5. “All our press releases are posted on the blogserver and have a RSS feed”
  6. “Several of our CxOs have got blogs”
  7. “The CEO has a blog and actually writes his own postings”
  8. “We have wikis, externally, and allow employees to modify them”
  9. “We have employees who read and comment upon blogs of their peers, partners and customers”
  10. “We assign our employees a significant percentage of time to write, read and comment on blogs”
  11. “We have wikis, externally, and allow anyone to modify them”
  12. “We stopped writing press releases, and started communicating with people instead”
  13. “We wash our corporate laundry in public on the blogserver”
  14. “We disabled AJAX hyperlink popups, realising they are evil”
  15. “We fired our public relations and marketing staff, realising they are no longer relevant”

…and then there is the list of Web2.0 things which are NOT REALLY Web2.0 things – ie: things you can do without actually achieving anything:

  1. “Our website contains a mashup with google maps”
  2. “We sell hardware and software to ‘web2.0-focused’ startups and service providers”
  3. “We sell software and consultancy to ordinary companies which want to do blogs and wikis internally”
  4. “Our developers live and breathe AJAX, and have enabled cute popups on all our hyperlinks”

More suggestions are welcomed in the comments section. 🙂

When Hardware Vendors try to do Web2.0

When hardware vendors try to embrace a phenomenon like DotCom, Linux, Web2.0 or somesuch, takeover press releases like this generally result:-

Cisco

SAN JOSE, Calif., February 9, 2007 – Cisco Systems, Inc., (NASDAQ: CSCO) today announced a definitive agreement to acquire privately held Five Across, Inc. of San Francisco, Calif., a leading vendor in the social networking marketplace.

The Five Across platform, Connect Community Builder, empowers companies to easily augment their websites with full-featured communities and user-generated content such as audio/video/photo sharing, blogs, podcasts, and profiles. These user-interaction functions help companies improve the interaction with their customers and overall customer experience on their websites. Social networking functions are of unique interest to media companies, sports leagues, affinity groups and any organization wishing to increase its interaction with its online constituency.

“Cisco believes the network is the platform for organizations to connect with their constituents and for individuals to connect with each other,” said Dan Scheinman, senior vice president and general manager of the Cisco Media Solutions Group (CMSG). “With the acquisition of Five Across, Cisco is taking an important step towards helping its customers evolve their website experience into something more relevant and valuable to the end-user.”

(Via)

It doesn’t get much better when you drill into the product verbiage:-

Enterprise-class platform for social networking and online communities

Five Across is committed to the long-term success of your online community and your business. The Five Across Connect 1.8 Community Builder platform is the foundation of our solution and leverages our expertise and relationships to help you build a high impact, high traffic, high participation community. The Connect 1.8 Community Builder platform addresses the critical needs of high traffic websites looking to promote and monetize user-generated content in order to expand the online experience of their community members. Based on enterprise-class technology, Connect 1.8 delivers the scalability, reliability and performance required by large scale commercial usage.

My first reaction when reading this stuff – well, not my first reaction, ‘coz that was “I really must form a company called Seven Sideways so I can be guaranteed to be bought out overnight” – my second reaction, then, can largely be summed up as “barf.”

“Website experience”? And who is this “end-user” who magically transmogrifies into a “community participant” somewhere between the press release and the white paper?

There’s always an attractive proposition in offering what my dear colleagues and I would hopefully term “decent shit which is easy to use, scales well and doesn’t crash”, but any company which kicks off a Web2.0 product pitch with the notion of “building” a “high participation” community through the mere provision of technology is one which is deluding its potential customer base – (cf: “Men! Using Lynx deodorant will cause women to spontaneously get naked and chase you along the beach…”) – or perhaps the vendors just don’t understand the actual needs of their potential customers?

Stuff like help? Advice? Experience? You know, sell solutions rather than parachuting a hardware or software box into the customer site, leaving the technical Maquis at the customer without a clue what to do with the new weapons of change?

One colleague recently attended a Cisco training week and accreditation course, and although he’s worked with their kit for years he was still gobsmacked at the “from soup to nuts” approach of the Cisco product set. They’re a one-stop shop where the harried IT or Network Manager can purchase prettymuch everything needed to re-outfit a datacentre – not necessarily the most efficient or powerful solutions, but nobody ever got fired – sorry, no brains are required – for buying Cisco.

My question regarding community building is: do they get it? – and although I find some of the Five Aside software stack features appealing:

  • Video and Audio transcoding allows users to upload and share their own content
  • RSS enables syndication of any file type
  • AJAX tools allow for online edits through the browser
  • Dynamic page loading eliminates the lag time associated with real-time access
  • Ratings system can be utilized with all content types including profiles
  • Keyword linking and tagging categorizes profiles and other content for easy search
  • Automatic generation of multiple list types based on recent edits, rankings and page views
  • Dynamic ad inclusion for flexible integration of ads on user pages

(OK, except the last one, I hate the last one) – aside from some of the apparent niftiness of the toys, I cannot help but feel that the answer is “no”; that in the lumbering hands of Cisco’s salesforce the world is about to be populated by yet more ghost communities (“curmunities”?), each inhabited by a handful of product purchasers who never check whether anyone ever followed up their solitary, lonely posting.