Website Crawling and Data Scraping Thoughts

Website crawling and data scraping have burdened the growth of e-commerce, as website owners watch their data being scraped while the legal questions linger. Crawling and scraping have become all too much the norm among those who use web content for business, research, or marketing purposes. The common theme is that website scraping is used by those seeking a shortcut to catch up with their competition, to emulate their competition, or to extract information that would otherwise take too much time to gather. Crawling can be useful for enhancing search relevance, indexing, and accuracy. The software used is not unique. It can be automated to extract information much as search engines do and then go a step further by converting the data into a form usable within a database. The data sought can be extracted from many types of sources. A new business hoping to start on an equal footing, for example, could seek its data from booking websites, Yelp, eBay, or even a directory. Would-be scrapers can also target a business they wish to emulate. The purposes for which website scraping is pursued give “big data” gathering a new image, with unsavory impressions.

The reality is that the Internet is not without web crawlers and scrapers, which are instrumental in analyzing website performance against search terms and in measuring traffic volume. Yet the use of web crawlers to aggregate news content or enhance the relevancy of search results has drawn attention to the legal issues and consequences they create. For the scraped website business, the potential arguments span a spectrum from a violation of the website’s terms of use to computer abuse. Between them lies a list of legal considerations that includes trespass to chattels, copyright infringement, trademark infringement, and unauthorized access to computer information, all in the name of online data collection, for better or for worse. The “for worse” side involves a software application that collects online data through scripts, also known as a “bot”, and the analysis of the data it gathers. These bots give the impression of actual human online interaction. The legal questions and the impact of a bot’s scraping of website data are diverse.

Among the legal issues is the violation of terms of use stated on websites that prohibit scraping and crawling, which is essentially the copying of the website’s content and data. The argument rests on the notion of contract: anyone who uses or visits the website is understood to be bound by its terms of use (ToS). An act that violates a website’s ToS supports a breach of contract argument. This short note does not go into the details of “clickwrap” and “browsewrap” agreements. Both hinge on informed consent, the means by which a user expresses consent, and the user’s ‘constructive knowledge’ in light of the prominence of the ToS on the website. The prominence of the ToS, and of the conditions a website user must be aware of, is pivotal to establishing a breach of the terms of use. Whether the user actually read the terms is immaterial. However, the same cannot be said when the crawling or scraping is done by a bot that is not scripted to read and consent to a website’s ToS. The way this is technically done skirts the legal elements of ‘consent’, ‘constructive knowledge’, and ‘prominent and clear notice’ that are required to establish a breach. The arguments over prohibited uses of a website have reached the point of distinguishing commercial from personal uses, with the former being the one restricted and prohibited by the ToS.

In addition to the ToS concern, there is the issue of copyright infringement when website data and content are scraped. The ultimate question is which of these theories provides the best argument. The Copyright Act protects expression, whether in a visibly readable form or in digital form on a server. The Copyright Act may not, however, be effective in addressing or preempting the use that the website owner seeks to stop. For instance, if the crawling and scraping are not done for commercial purposes, the Copyright Act may not yield the necessary leverage. Yet Facebook’s case against Power.com, which invoked the Copyright Act, was effective because the defendant was aggregating Facebook’s data onto another site in violation of Facebook’s terms. The U.S. District Court for the Northern District of California denied the defendant’s motion to dismiss, determining that scraping involves the copying that Facebook explicitly restricts in its ToS.

Aside from the copyright infringement issues, there is the view that scraping or crawling a website against the owner’s ToS is tantamount to unauthorized access, or to exceeding the permitted use of the website and its content. This view relies on the Computer Fraud and Abuse Act (CFAA), which addresses both unauthorized access to a computer system and access that exceeds the scope of what is permitted. The use of the website must have exceeded what was authorized, and the website must have carried an express and clear statement of which uses of, or activities involving, its content and data were prohibited. Often raised alongside this consideration is the defensive crutch of ‘fair use’. Yet scraping website content does not inherently qualify for the ‘fair use’ defense.

Furthermore, web crawling and scraping raise the question of whether damages exist if website content and data are treated as ‘chattel’. In its case against Bidder’s Edge, eBay argued that its platform content and data were chattel on which Bidder’s Edge had trespassed. eBay also argued that the defendant’s acts interrupted eBay’s operations. However, the effectiveness of the argument depends on the existence of damages. Without damages, the argument withers, and courts do not see trespass to chattels as a workable argument against website scraping and crawling. Another frequently used argument against web crawling and scraping relies on the Digital Millennium Copyright Act (“DMCA”), which can restrict what would otherwise be fair use of content. What matters here is the actual bypassing that takes place to circumvent a website’s measures to restrict web crawling and scraping. The DMCA provides a means of enforcing copyright in a website’s digital content.

The complexity created by the use of bots is evident, if elusive. Also evident is that the fair use defense, the absence of damages, and the potential absence of consent and constructive knowledge will continue to be points of contention as website owners oppose web scrapers. The legal issues thus far have ranged from intellectual property and contract concerns to unauthorized access to a network or computer system, raising the specter of continued legal disputes over website scraping.
Lorenzo Law Firm is “Working to Protect your Business, Ideas, and Property on the Web.” Copyright 2016, all rights reserved, Lorenzo Law Firm, P.A.

Internet Law Lawyers, Data Security Law, Intellectual Property Law - Lorenzo Law Firm, P.A.

Monday, September 19, 2016
