Surveillance start-up Clearview AI, an American technology company, found itself the focus of media and public scrutiny earlier this year. An interesting angle that emerged from the story was Clearview AI’s use of a practice known as ‘screen scraping’ (or ‘data scraping’ / ‘web harvesting’). Using an algorithm that automatically scans and collects third party website data, Clearview AI amassed an unprecedented database of over 3 billion images of individuals from a wide range of sources including Facebook, Google and LinkedIn. (Compare this to the mere 460 million images held on the FBI database.) The company then used this database to train and develop what is widely considered to be the most comprehensive surveillance tool of its kind on the market.
But, leaving aside any ethical concerns for now, did Clearview AI actually do anything wrong? Is screen scraping ‘legal’? Has Clearview AI become a scapegoat for a practice that is actually widespread and, in the context of some industries, normalised?
To answer the last question first: screen scraping is a widely used tool. For example, it is the technique used to source the data that sits behind many travel and hotel booking aggregation sites. These sites ‘scrape’ price data from a vast swathe of third party providers in order to display comparative information. Interestingly, very few individuals appear to have an issue with the practice in this context (since its effect, generally speaking, is to save individuals money), although of course the businesses affected contend that it diverts custom from them and pushes their prices down.
So looking at the other questions: did Clearview AI actually do anything wrong and is screen scraping legal?
There are currently no laws in Australia that specifically prohibit the practice of screen-scraping per se, although as the use of this technique becomes ever more prevalent and extends into new industries and new use cases, one might expect this to change.
However, there are a number of other legal and contractual frameworks that exist that need to be considered when assessing whether or not a particular form of screen-scraping activity is permissible.
Generally speaking, click-wrap agreements are likely to be enforceable as a contract where knowledge and acceptance of the contract terms are a clear pre-condition to the use of the site. The enforceability of browse-wrap agreements, on the other hand, is less clear cut where notice of the terms is less prominent.
Another issue to consider is the drafting of any prohibition on screen-scraping in the terms. Although Australian courts are yet to consider this in detail, the Court of Justice of the European Union held in Ryanair Ltd v PR Aviation BV that a Dutch price aggregation website was contractually prohibited from using an automated system to extract data from the Ryanair website for commercial purposes. Unsurprisingly, the more specific the language relating to screen scraping, the more effective the provision is likely to be.
Copyright laws may also apply to prohibit screen-scraping in certain situations.
For website content to be protected by copyright under the Copyright Act 1968 (Cth) it must be an original ‘literary work’. It is of course possible for website data to be considered an “original” work; this would be the case when the data is organised in a structured way and the way it is presented is the result of someone’s “originality”.
A simple compilation of data (for example, a list of telephone numbers or prices) generally would not pass the originality threshold. Generally speaking, there must be some reduction of the database to a material form, and some intellectual effort in the creation of that material (see, for example, IceTV Pty Ltd v Nine Network Australia Pty Ltd (2009)).
In the case of Clearview AI, however, where the subject data was photos of individuals rather than bland list “data”, copyright protection would likely apply to the images collected by the company’s data scraping efforts. Photographs are ‘artistic works’ for purposes of the Copyright Act, with authorship (and unless an agreement is in place that suggests otherwise, ownership) being determined in this context by the person who ‘took’ the photograph.
Note, however, that, assuming the photographers in question had not assigned their rights in the photographs to the website operator, any action against Clearview AI would have to be taken by the photographers themselves, which would depend on those individuals knowing of Clearview AI’s practices. Given the sheer size of the database collated, and what the images were used for, this may be unlikely.
For certain categories of more “sensitive” data, there are a number of State and Federal laws which make it an offence to access restricted computer data without authority. For example, section 308H of the Crimes Act 1900 (NSW) and Div 478 of the Schedule to the Criminal Code Act 1995 (Cth) prohibit unauthorised access to restricted data, i.e. data protected by a password or other security access control. The maximum penalty for these offences is 2 years’ imprisonment.
Note however that while these laws generally act to safeguard confidential, private or commercially sensitive information, they won’t apply in the majority of screen scraping instances, which target publicly available databases and more “bland” information.
The law of trespass has existed for decades (if not centuries) to prohibit unauthorised interference with (physical) property that belongs to another person. However, it is unclear whether Australian courts will start to expand this to the digital domain.
If we look overseas, we can see that this is a possibility. For example, the US District Court for the Northern District of California granted a preliminary injunction to eBay on the basis that the screen-scraping activities of another company (Bidder’s Edge) constituted trespass. The court held that the eBay website was the company’s personal property, and that by intentionally conducting over 10,000 searches Bidder’s Edge caused harm to eBay by draining its resources.
Privacy and Biometric Scanning
Let us return to our Clearview AI example and the extraction of biometric data from the images collected.
Biometric information is categorised as ‘sensitive information’ under the Privacy Act 1988 (Cth). Where an organisation is subject to the Privacy Act, it is obliged to obtain an individual’s explicit or inferred consent to collect such information about the individual. The screen-scraping practices of Clearview AI clearly did not involve gathering explicit consent from those whose photographs were obtained. But could their consent be inferred?
In the US case of hiQ Labs Inc v LinkedIn Corporation, LinkedIn attempted to block the scraping of its public profiles by hiQ, a company that provides analytics tools on workforce statistics (e.g. when someone is likely to leave their employment). The US Court of Appeals for the Ninth Circuit held in a 3-0 decision that hiQ was able to scrape this data. Although the central question in this case was whether the scraping constituted ‘hacking’ under the US Computer Fraud and Abuse Act, the judges noted that there is “little evidence that LinkedIn users who choose to make their profiles public actually maintain an expectation of privacy with respect to the information that they post publicly” and that with regard to “publicly available profiles, the users quite evidently intend them to be accessed by others.”
No doubt Clearview AI would seek to argue that the public nature of the information they obtained means that inferred consent was given. It remains to be seen if this would be upheld in Australia, given the relatively strong protections afforded to sensitive information under the Privacy Act.
Authors: Lesley Sutton, Nikhil Shah and Alexander Ryan