Take a look at this video about Woosh Water’s project. Woosh provides an amazing service, where ordinary municipal water fountains are replaced by hi-tech fountains, providing clean water for residents, with good user experience and less of the dirt and homelessness often associated with water fountains. Their project seems, at first glance, amazing. However, when you inspect the service in depth, some interesting points emerge.
In order to provide their service, Woosh requires that you register and receive a wireless token. When you use Woosh’s services, your location data is stored, as well as your usage. This effectively means that in order to drink purified tap water you have to sign a ten-page agreement. This creates quite a problem.
But it’s not just water, you know. When people interact, either online or offline, they create crumbs of information. For example, many public transport systems allow or require the purchase of multi-trip tickets in the form of digitized passes or electronic cards. However, they may require the purchaser to provide personal information which is stored on the card and may then be effectively shared with transit operators. Not all of this information is really required, and it certainly doesn’t need to be stored. For example, there’s really no need for the bus operators to retain your photos or travel history, yet it is not certain that such data is erased after use.
The same goes for your location data from the cellular phone operators. While the cellular operator needs your current location to serve your calls, it does not need to retain a history of your data. However, once it retains this information, others may use it. While most countries have in place a system for restricting access to consumers’ personal data, anonymized data – and sometimes not so anonymized – can often be accessed by both government and commercial interests.
We call this information “Residual Data”.
Now, when you develop an application, you’re eager to store as much information as possible. Who knows what you may need it for in the future? This depends on two wrong assumptions. The first is that people will not misuse the information. We don’t have to look far for obvious and numerous examples of authorized and unauthorized misuse of stored information. The second wrong assumption is that statistical and anonymous information, if gathered, is harmless. The act of re-identifying anonymous information becomes easier with the growing power of computing.
For me, the problem begins when you retain information: you want people to access the information you retain (if you’re a social network, for example), and you can’t really protect stored information that must always be available. Obtaining significant personal information about someone can often be a remarkably easy thing to do. However, you usually only start to think about privacy when the personal information leaks.
Here is how I (usually) work when I help clients design their project.
First, we ask, “Do we really need this information?” This goes for every aspect: not just names and email addresses, but also information that is considered anonymous but may later be re-identified, such as browsing history, IP address or browser identification. Ask yourself why you need it, and whether it can be replaced (either with a hash or with other information). For example, keeping your users’ email to contact them is great; but keeping their IP address for more than 14 days has no actual use.
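The two ideas above, replacing an identifier with a hash and dropping IP addresses after 14 days, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout (`ip`, `seen_at`) and the use of a keyed hash (HMAC-SHA256) are my choices for the example, not something prescribed by any particular system. A keyed hash is used because a plain hash of an IP address can be reversed simply by trying every possible address.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: the 14-day window mentioned in the text.
IP_RETENTION = timedelta(days=14)


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def drop_expired_ips(records: list[dict], now: datetime) -> list[dict]:
    """Return copies of the records, without the IP once it is past retention.

    Each record is a hypothetical dict with an "ip" and a "seen_at" datetime.
    """
    cleaned = []
    for record in records:
        record = dict(record)  # copy, so the caller's data is not modified
        if now - record["seen_at"] > IP_RETENTION:
            record.pop("ip", None)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    log = [
        {"user": "a", "ip": "203.0.113.7", "seen_at": now - timedelta(days=20)},
        {"user": "b", "ip": "203.0.113.9", "seen_at": now - timedelta(days=3)},
    ]
    print(drop_expired_ips(log, now))
    print(pseudonymize("203.0.113.7", b"hypothetical-secret-key"))
```

In practice the purge would run on a schedule against your actual data store, and the secret key would be kept separate from the data it protects.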
Next, ask yourself if the end-user can store the information at the client’s end, and not on your server. Often, using distributed storage may save costs for application developers, and may also limit the scope of a data breach. Quite a lot of information, where it is not needed for processing, may be saved at the client end.
Then, once we decide what information is used, we should ask ourselves, “What are the benefits of retaining this information?” For example, if we save a person’s purchase history in order to profile them and tailor advertisements, we might consider just storing the profile information or the categories of the purchased products.
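As a rough sketch of that last suggestion, the itemized purchase history can be collapsed into per-category counts before anything is stored. The record fields (`item`, `category`) and the function name are hypothetical, chosen only for this example.

```python
from collections import Counter


def category_profile(purchases: list[dict]) -> dict[str, int]:
    """Collapse an itemized purchase history into per-category counts.

    Only the category of each purchase survives; item names are not kept.
    """
    return dict(Counter(purchase["category"] for purchase in purchases))


if __name__ == "__main__":
    history = [
        {"item": "Novel A", "category": "books"},
        {"item": "Novel B", "category": "books"},
        {"item": "Garden hose", "category": "garden"},
    ]
    # Store this profile instead of the full history.
    print(category_profile(history))
```

The profile is still enough to tailor advertisements by category, while a leak would no longer reveal exactly what each person bought.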
Then, let’s examine the cost of retaining the information. The cost is divided into two groups: (a) the actual cost of saving the information; and (b) the cost of repairing a data breach. This means we need to ask whether the benefit of storing a large amount of data is lower than the cost of repairing the breach where the personal information of X users is online (see the Health Net privacy breach as an example).
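To make the comparison concrete, here is a back-of-the-envelope sketch. All names and figures are hypothetical, and weighting the breach cost by an estimated probability of a breach is my own assumption for the example; the point is only that the benefit of retention should be set against both cost groups.

```python
def expected_breach_cost(users: int, cost_per_user: float,
                         breach_probability: float) -> float:
    """Cost group (b): repair cost per affected user, weighted by the
    estimated chance of a breach (an assumption of this sketch)."""
    return users * cost_per_user * breach_probability


def retention_is_worthwhile(benefit: float, storage_cost: float, users: int,
                            cost_per_user: float,
                            breach_probability: float) -> bool:
    """True if the benefit exceeds storage cost (a) plus breach cost (b)."""
    total_cost = storage_cost + expected_breach_cost(
        users, cost_per_user, breach_probability
    )
    return benefit > total_cost


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(retention_is_worthwhile(
        benefit=8_000.0,
        storage_cost=1_000.0,
        users=100_000,
        cost_per_user=5.0,
        breach_probability=0.02,
    ))
```

With these made-up numbers the expected breach cost alone is about 10,000, so retaining the data would not pay for itself.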
So, what can you do?
My recommendation is to plan privacy ahead. Think of your product as something that should not “keep everything, analyze later” but “keep what we must, and dump the rest”.
This will make the cost of a data breach lower, and will actually help you in the long run by making your product more privacy-oriented.
Jonathan Klinger is an Israeli Cyberlaw attorney and blogger; acting as a legal consultant for several high-tech companies and start-ups. Legal counsel to the Israeli Bitcoin Association, Hamakor, Israel’s Open Source Society, Eshnav, people for Intelligent Internet Use, Israel’s Digital Right Movement, and others. Jonathan was chosen by Who’s Who Legal 500 as one of the top practitioners in the field of Internet & e-Commerce both in 2012 and 2013. This post was based on his presentation at WordCamp 2014 (Prezi available here), and was also published on his blog.