TfL Has Been Accidentally Allowing Anyone With A Computer To Track Your Boris Bike Journeys

    A hole in Transport for London’s datasets shows why pseudonymised data might not always be anonymous.

    London's transport authority has come under fire for releasing data that allowed anyone with a computer to track the movements of people using its bike-share scheme.

    A researcher at the University of Nottingham told BuzzFeed that this incident should never have occurred, particularly with information that is especially personal.

    Dr Gilad Rosner, who specialises in digital identity and privacy, said that TfL should be applauded for encouraging the idea of open data but was careless in its approach, particularly with something as sensitive as movement data.

    Not only are we talking about TfL who has access to a tremendous amount of users but this is movement data, some of the most sensitive around. With movement data, we can figure out where are you at night, where you drink, did you go to an STI clinic, all this really sensitive information. So the danger of getting a data release like this wrong in terms of its privacy characteristic is very high.

    This comes after James Siddle downloaded the data, freely available from TfL's website, and created visualisations to highlight the potential privacy implications.

    Siddle, 38, also suggested in a blog post that "with a little effort, it's possible to find the actual people who have made the journeys".

    Information in TfL's datasets includes the start and end locations as well as journey times. Crucially, until Siddle's blog post, this information also included a unique customer ID, which meant that anyone who downloaded the datasets could link separate journeys to the same customer and potentially predict that person's movements.

    This means anyone who downloaded the data would have been able to narrow down journeys made by individual commuters.
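    To illustrate why a persistent customer ID matters, here is a minimal Python sketch. It is not Siddle's method or TfL's actual schema: the records, field names and station names below are invented for illustration. It simply shows that once every journey carries the same ID, grouping by that ID exposes a person's routine.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Invented toy records. Field names and values are hypothetical,
# not taken from TfL's published files.
JOURNEYS = [
    {"customer_id": "A1", "start": "Station X", "end": "Station Y",
     "start_time": "2012-08-01T08:05:00"},
    {"customer_id": "A1", "start": "Station X", "end": "Station Y",
     "start_time": "2012-08-02T08:10:00"},
    {"customer_id": "A1", "start": "Station Y", "end": "Station X",
     "start_time": "2012-08-02T18:30:00"},
    {"customer_id": "B2", "start": "Station Z", "end": "Station X",
     "start_time": "2012-08-01T09:15:00"},
]


def journeys_by_customer(journeys):
    """Group journey records by their pseudonymous customer ID."""
    grouped = defaultdict(list)
    for journey in journeys:
        grouped[journey["customer_id"]].append(journey)
    return grouped


def most_common_morning_route(journeys):
    """Return ((start, end), count) for the most frequent pre-noon route, or None."""
    routes = Counter(
        (j["start"], j["end"])
        for j in journeys
        if datetime.fromisoformat(j["start_time"]).hour < 12
    )
    return routes.most_common(1)[0] if routes else None


if __name__ == "__main__":
    for customer_id, trips in journeys_by_customer(JOURNEYS).items():
        print(customer_id, most_common_morning_route(trips))
```

    Without the ID column, the same rows are just a pile of unconnected trips. With it, a repeated morning route starts to look like a commute, and one outside clue about who rides that route could be enough to attach a name to the whole profile.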

    Siddle claims he informed the department about the privacy issue and a spokesperson for TfL told BuzzFeed that it took down the entire dataset following his blog post.

    It re-published the data late last week after removing any sensitive information but emphasised that it would be very difficult to track down individuals.

    But Siddle's blog post disputed these claims. "All that's needed to work out who this profile belongs to is one bit of connecting information," he wrote.

    Although a TfL spokesperson told BuzzFeed it would be very difficult to identify specific individuals, Siddle claims that pseudonymised data can become very personal when combined with other datasets.

    It's anonymous but it's in a gray area. It's data that you could argue that it's okay. We're generally putting ourselves at risk. When you start to aggregate datasets together, you start to build an intimate picture of someone's life, which many people probably don't know is possible.

    TfL told BuzzFeed that the information was erroneously made available when transferring to their new website last year.

    TfL's General Manager of Cycle Hire, Nick Aldworth, said: "We're committed to improving transparency across all our services and publish a range of data for customers and stakeholders online.

    "Due to an administrative error, anonymised user identification numbers were shown against individual trips made between 22 July 2012 and 2 February 2013.

    "The data, which did not identify any individual customers online, was removed as soon as the matter was brought to our attention."

    "I'm not going to say that TfL is cavalier but it shows the need to have greater privacy oversight because what's happening is that the availability of data is increasing.

    "The real question is the next time they release the dataset, who's checking to make sure that error doesn't happen again and who's checking that the methodology for releasing the data is sound from a privacy perspective," he added.