I Will Invade Your Privacy (In Ethical Ways) For Stories

    Americans produce unfathomable volumes of digital data every day, much of it publicly accessible on social media. What does it mean to use social data ethically?

    “hi jeffrey” -- that was Angel Leong’s first Facebook chat message to Jeffrey Ngo. She was an activist in Hong Kong. He was a Hong Kong exchange student in New York.

    This online encounter represented the beginning of an international political movement. Together with Angel and other people he met on Facebook, Ngo put together dozens of protests around the world to support the student protests that were taking place in Hong Kong at the time.

    When I interviewed him for Al Jazeera America a year later, Jeffrey didn’t remember how he met Angel. He did, however, allow me to scrape and analyze his archive of Facebook messages, and together we were able to reconstruct the start of an international political movement.

    We conduct more and more of our lives online. Every click, swipe, and tap is stored somewhere. One MIT study found that the average American office worker was producing 5GB of data each day. That was in 2013 -- we haven’t slowed down.

    Journalists already use the data we generate, whether by aggregating every insult in @realDonaldTrump’s Twitter feed or by searching for clues to Ken Bone’s character in his Reddit posts. But for the next year, I want to find more structured and ethical ways to explore our digital archives. As a BuzzFeed Open Lab fellow, I’ll be writing scrapers and producing data visualizations to find stories in social media data and new ways to tell them.

    I’ll be guided in that work by a few core principles.

    Don’t be a creep!


    While the data we all generate on social media is rich in potential, mining it raises real ethical questions.

    A lot of us have only a limited awareness of what our data could be used for. Think of the last time you read through the terms of service for any service you use. We tweet political rants or vacation photos without expecting them to be seen by more than the few hundred folks who follow us. When journalists pick up on a tweet and include it in a story, those observations are exposed to a much larger audience.

    To lurk or not to lurk

    Journalists are taught early on to tell interviewees that they are journalists. When they walk into a church with a notepad or approach strangers at a demonstration, they always start by telling people why they’re there.

    When we’re working with stories that originate in Facebook groups, subreddits or other online forums, journalists don’t always follow those rules. What language and means (messenger, announcements in Facebook groups, etc.) should journalists use to introduce themselves to a Facebook group with thousands of members before they use the group’s conversations in a story?

    Does ‘public’ mean ‘public’?

    When we join a Facebook group or respond to a tweet, we often think of our conversations as private. We're writing for our friends, or followers. We're responding to an ongoing conversation, not writing for a national broadcast.

    While some may say that’s fair game — publicly posted information is public — I think journalists need a more nuanced ethical framework before they use social information for their stories. After all, newsrooms have developed ethical practices around many other parts of journalism. For example, investigative journalists usually send out “no surprises” letters to their sources, detailing what they have found in their reporting and asking for comment.

    What protocols should journalists put in place when using social media records and data? How much are journalists obliged to protect sources from things they say in public forums? Do different policies apply to different people? Should they treat vulnerable communities the same way they treat public figures?

    Don’t accidentally leak someone’s information

    Journalists don’t use only public tweets and Facebook posts. Sometimes sources will hand journalists access to otherwise private Facebook messages, WhatsApp logs or other private information.

    When sources give journalists sensitive information, journalists carry the responsibility of keeping that information safe. For some sources, this kind of access can mean the difference between a harassment-free life and a life under public scrutiny, sometimes even between life and death.

    Migrants, for example, have been using smartphones to perform tasks like navigating the seas using their mapping app’s GPS or staying in touch with loved ones around the world. French journalists at the newspaper Le Monde reconstructed a Syrian migrant’s journey to Europe using WhatsApp data. A BBC video on YouTube traces the journey of a migrant through their smartphone.

    If these journalists store this data on their computers, in their email or in the cloud, they need to ensure it’s safe. Many migrants have fled persecution at home. If their data is not secured, groups like ISIS could find ways to track down the migrants they want to target.

    What best practices should we implement when receiving access to a source’s data, whether by virtue of being their ‘friend’ on Facebook or by receiving a CSV of their tweets or Facebook data?
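
    One possible starting point, offered as a sketch and not as an established newsroom practice: replace real names with stable pseudonyms before a source’s export is stored or shared. The Python below assumes a hypothetical CSV with a "sender" column; the column name, the function names and the "person_" prefix are all illustrative. It uses a keyed hash (HMAC) from the standard library, so the same person always gets the same pseudonym but nobody without the key can check a guessed name against it. Names that appear inside the message text itself are not touched, and the key must be kept separate from the data.

```python
import csv
import hashlib
import hmac
import io
import secrets


def pseudonymize(name: str, key: bytes) -> str:
    """Return a stable pseudonym: the same name and key always map to the same value."""
    # Normalize so "Alice" and " alice " are treated as the same person.
    normalized = name.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return "person_" + digest[:12]


def pseudonymize_csv(src, dst, key: bytes, columns=("sender",)) -> None:
    """Copy CSV rows from src to dst, replacing the named columns with pseudonyms."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in columns:
            if row.get(col):  # skip missing or empty cells
                row[col] = pseudonymize(row[col], key)
        writer.writerow(row)


if __name__ == "__main__":
    # Assumption: the key is generated once and stored apart from the data,
    # for example in a password manager.
    key = secrets.token_bytes(32)
    # Made-up sample rows, not real messages.
    sample = io.StringIO(
        "sender,timestamp,text\n"
        "Source A,2014-10-01T10:00,hello\n"
        "Contact B,2014-10-01T10:02,hi there\n"
    )
    out = io.StringIO()
    pseudonymize_csv(sample, out, key)
    print(out.getvalue())
```

    This only addresses who said what. It does not replace encrypting the files themselves or limiting who in the newsroom can open them.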

    These are just some of the issues I will surely encounter as I get going with my projects. If you have any thoughts, please get in touch!

    lam.vo@buzzfeed.com

    Open Lab for Journalism, Technology, and the Arts is a workshop in BuzzFeed’s San Francisco bureau. We offer fellowships to artists and programmers and storytellers to spend a year making new work in a collaborative environment. Read more about the lab or sign up for our newsletter.