Tuesday, January 13, 2009

The pitfalls of Web authentication, and how to defeat OpenID

At Coding Horror, there was a discussion yesterday about the merits of various authentication schemes used on Web sites, and the tendency of users to pick weak passwords. The discussion focused on methods to ensure that valid username/password pairs don't get guessed by scripts.

One assumption that seemed to be made by everybody was that most Web sites absolutely need to have authentication. However, from the user's point of view, there is no reason why most Web sites should ever ask them to register in the first place.

Consider the Web site reddit.com. Most readers already know what it does, but I'll describe it anyway for clarity: Reddit.com displays a list of rows from a database, each of which has a score that is used as a sorting key. A person using a Web browser can change this score by clicking on an up-arrow or down-arrow icon displayed next to the data from the row, thereby influencing that row's position in the list.
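To make the point concrete, here is a minimal sketch of that concept in Python. It is not Reddit's actual code; the names `Row`, `vote`, and `ranked` are made up for illustration. Nothing in it needs to know who is clicking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    """One database row: some data plus the score used as the sorting key."""
    title: str
    score: int = 0


def vote(row: Row, direction: int) -> None:
    """Apply an up-arrow (+1) or down-arrow (-1) click to a row's score."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    row.score += direction


def ranked(rows: List[Row]) -> List[Row]:
    """Return the rows ordered by score, highest first."""
    return sorted(rows, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    rows = [Row("first story"), Row("second story")]
    vote(rows[1], 1)   # someone clicks the up arrow on the second row
    vote(rows[0], -1)  # someone clicks the down arrow on the first row
    for row in ranked(rows):
        print(row.score, row.title)
```

A real site would add rate limiting or some other defense against scripted clicks, but that is a separate problem from knowing a user's e-mail address.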

The part of this concept that requires authentication escapes me. Nonetheless, Reddit.com, along with the vast majority of Web sites out there, will refuse to do anything useful for a user who cannot prove, by entering a username/password pair, that they've given out their e-mail address.

For most Web sites (that is, excluding banks, e-mail providers, and sites that charge users based on usage), users have no incentive to give out e-mail addresses, create user accounts, think of strong passwords, or do anything whatsoever to keep their passwords secret. I lose nothing if I create a Reddit account and someone else guesses the password and uses my account to influence the scores. It's only a mild annoyance if such a user changes the password. This is why Web sites such as Bugmenot are so very popular, and also why most people don't really care if their password is the first word in the script-kiddie's dictionary.

Web site owners, on the other hand, do have an incentive to ask for users' e-mail addresses, and will therefore continue to do so. E-mail addresses can be sold to spammers. Each Web site owner has the incentive to try to build the biggest database of e-mail addresses possible. This is how those Viagra peddlers got your e-mail address.

For users, the ideal Web would be one where authentication only existed when there was actual sensitive information to protect. OpenID, an effort to allow users to have a single username/password that works everywhere, is a step in the wrong direction. OpenID is a decentralized scheme based on the assumption that if one Web site has authorized a user to access it, then it should be okay for other Web sites to do something for that user other than display the message "Login or Register." It is assumed that at some point, somebody asked for an e-mail address and is keeping it in a database somewhere.

Web site owners have no incentive to accept OpenID: it robs them of e-mail addresses to sell to spammers. They accept it anyway, because for smaller Web sites that don't do very much, it's more important not to drive users away with yet another registration process, and the authors of those sites (or their bosses) have never questioned the central assumption that everything must be authenticated. This creates an opportunity for the Web sites that do the original authentication, because Web site popularity in a particular area tends to concentrate like wealth in a market economy. This means that eventually, 90% or more of OpenIDs will originate from a single Web site, and that Web site's e-mail address database is going to be enormous. There will be an opportunity to collect subscription fees from spammers in exchange for access to the data. This database gains added value if each e-mail address is associated with statistics about which sites the user logs into most frequently.

The other possibility is that OpenID will eventually be rendered completely useless. Here's how: OpenID allows anyone to be an OpenID provider. When a site (the Relying Party) uses OpenID for authentication, it simply asks the HTTP server at some address (indirectly specified by the end user) whether that server knows the user, and accepts the server's positive assertion, as long as the assertion is signed according to a protocol that can be implemented by anyone who can understand the specification. A program could be written that runs on an end user's machine and provides positive assertions for any Claimed Identifier about which a Relying Party might ask.
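The core of such a program could be sketched roughly as follows. This is an illustrative sketch, not a complete OpenID provider: the field names and the key-value signing format are modeled on OpenID Authentication 2.0, but discovery, association setup, and the HTTP redirects are left out. The function name `positive_assertion`, its parameters, and the assumption that `mac_key` was already shared with the Relying Party are all hypothetical.

```python
import base64
import hashlib
import hmac

# Fields covered by the signature, in the order they are signed.
SIGNED_FIELDS = [
    "op_endpoint", "claimed_id", "identity",
    "return_to", "response_nonce", "assoc_handle",
]


def positive_assertion(claimed_id, return_to, op_endpoint,
                       assoc_handle, mac_key, nonce):
    """Build a signed "yes, I know this user" response for ANY identifier.

    Note what is missing: there is no password check and no user database.
    Whatever identifier is asked about, the answer is yes.
    """
    fields = {
        "ns": "http://specs.openid.net/auth/2.0",
        "mode": "id_res",
        "op_endpoint": op_endpoint,
        "claimed_id": claimed_id,
        "identity": claimed_id,
        "return_to": return_to,
        "response_nonce": nonce,
        "assoc_handle": assoc_handle,
        "signed": ",".join(SIGNED_FIELDS),
    }
    # Key-value form: one "key:value" line per signed field.
    payload = "".join(f"{k}:{fields[k]}\n" for k in SIGNED_FIELDS)
    digest = hmac.new(mac_key, payload.encode("utf-8"), hashlib.sha256).digest()
    fields["sig"] = base64.b64encode(digest).decode("ascii")
    # Response parameters carry the "openid." prefix.
    return {"openid." + k: v for k, v in fields.items()}
```

The signature here proves only that the response came from whoever holds `mac_key`. It says nothing about whether anyone actually verified the user, which is the whole problem.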

This program would have to be small, and would provide a simple mechanism to allow a user to copy a valid OpenID to the clipboard. If this program enjoyed wide enough distribution (for example, if it were built into a popular open-source Web browser), there would be no point in using OpenID, just as there's no longer any point in IRC servers using ident, because the only real information you'd get from OpenID is the user's IP address, which you already get from the REMOTE_ADDR CGI variable.
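For comparison, this is roughly all it takes to get that same information in a CGI script, with no OpenID involved. The function name `client_address` and the "unknown" fallback are illustrative choices; the Web server sets REMOTE_ADDR in the script's environment.

```python
import os


def client_address(environ=os.environ):
    """Return the client's IP address as passed to a CGI script by the server."""
    return environ.get("REMOTE_ADDR", "unknown")
```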