June 9, 2023

Tishamarie online

Specialists in technology

A new framework for web scraping data to ensure its validity for use in marketing studies


Researchers from Erasmus University Rotterdam, Tilburg University, INSEAD, and the University of Oxford published a new paper in the Journal of Marketing that proposes a methodological framework aimed at improving the validity of web data.

The research is authored by Johannes Boegershausen, Hannes Datta, Abhishek Borah, and Andrew T. Stephen.

The recent ruling of the Ninth Circuit in hiQ Labs v. LinkedIn underscores the importance of navigating the legal challenges of using web scraping to collect data for academic research. Although it may be permissible to collect information from publicly available web pages, researchers still need to be careful about how they design their extraction software. For example, collecting information from publicly accessible user profiles may raise privacy concerns in some jurisdictions, which prompts researchers to anonymize their data during collection.
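As an illustration of anonymizing data during collection, the following is a minimal sketch, not the authors' method. It assumes that scraped profile records arrive as Python dictionaries and that the identifying fields are called `username` and `profile_url`; those field names, the `SECRET_KEY` value, and the helper names are all hypothetical. Identifiers are replaced with a keyed HMAC-SHA256 hash before anything is written to disk, so records from the same profile can still be linked while the raw identifier is never stored.

```python
import hashlib
import hmac

# Hypothetical project secret: generate a random value and keep it out of
# the released dataset, otherwise the hashes could be recomputed.
SECRET_KEY = b"replace-with-a-random-project-secret"


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (64 hex characters) of an identifier."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, id_fields=("username", "profile_url")) -> dict:
    """Return a copy of a scraped record with identifying fields hashed.

    The field names in id_fields are assumptions for illustration; adapt
    them to whatever the extraction software actually collects.
    """
    clean = dict(record)  # copy so the caller's record is left untouched
    for field in id_fields:
        if clean.get(field) is not None:
            clean[field] = pseudonymize(str(clean[field]))
    return clean


if __name__ == "__main__":
    raw = {"username": "jane_doe",
           "profile_url": "https://example.com/u/jane_doe",
           "rating": 4}
    print(anonymize_record(raw))
```

Because the same input always maps to the same token, this is strictly pseudonymization rather than full anonymization; whether it is sufficient depends on the jurisdiction and on what other fields remain in the record.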

While marketing researchers increasingly use web data, the idiosyncratic and sometimes insidious challenges in collecting it have received limited attention. How can researchers ensure that the datasets generated via web scraping and APIs are valid? The research team developed a novel framework showing that addressing validity concerns requires jointly considering idiosyncratic technical and legal/ethical issues.

The authors say that their "framework covers the broad spectrum of validity concerns that arise along the three stages of the automated collection of web data for academic use: selecting data sources, designing the data collection, and extracting the data. In discussing the methodological framework, we provide a stylized marketing example for illustration. We also provide recommendations for addressing challenges researchers face during the collection of web data via web scraping and APIs."
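To make the design and extraction stages more concrete, here is a minimal sketch under stated assumptions; it is not taken from the paper. It assumes two common safeguards: honoring a site's robots.txt (one widely used ethical convention, not one the article names) and attaching provenance metadata, namely the source URL and a UTC retrieval timestamp, to every extracted record so that its validity can be traced later. The user-agent string, the function names, and the `price` field in the usage lines are hypothetical; the robots.txt text is assumed to have been fetched separately.

```python
from datetime import datetime, timezone
from urllib import robotparser

USER_AGENT = "research-scraper"  # hypothetical user-agent string


def build_robot_checker(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt content that was fetched separately."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_requests(urls, checker, user_agent=USER_AGENT):
    """Design stage: split candidate URLs into allowed and skipped lists."""
    allowed, skipped = [], []
    for url in urls:
        target = allowed if checker.can_fetch(user_agent, url) else skipped
        target.append(url)
    return allowed, skipped


def extraction_record(url: str, payload: dict) -> dict:
    """Extraction stage: wrap extracted fields with provenance metadata."""
    return {
        "source_url": url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        **payload,
    }


if __name__ == "__main__":
    checker = build_robot_checker("User-agent: *\nDisallow: /private/\n")
    allowed, skipped = plan_requests(
        ["https://example.com/products/1", "https://example.com/private/x"],
        checker,
    )
    print(allowed, skipped)
    print(extraction_record(allowed[0], {"price": 9.99}))
```

Logging skipped URLs alongside the collected ones is a deliberate choice here: it documents which parts of a source were not sampled, which matters when judging whether the resulting dataset is representative.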

The article also provides a systematic review of more than 300 articles using web data published in the top five marketing journals. Drawing on this review, the researchers show how web data has advanced marketing thought. Understanding the richness and versatility of web data is invaluable for scholars interested in integrating it into their research programs.

Interested researchers can access the database created for this review on the companion website. The website also offers additional practical resources and tutorials for collecting web data via web scraping and APIs.

The researchers add that they use their "methodological framework and typology to unearth new and underexploited 'fields of gold' associated with web data. We seek to demystify the use of web scraping and APIs and thereby facilitate broader adoption of web data across the marketing discipline. Our Future Research section highlights novel and creative avenues of using web data that include exploring underutilized sources, creating rich multi-source datasets, and fully exploiting the potential of APIs beyond data extraction."




More information:
Johannes Boegershausen et al, EXPRESS: Fields of Gold: Scraping Web Data for Marketing Insights, Journal of Marketing (2022). DOI: 10.1177/00222429221100750

Website database: web-scraping.org/

Provided by
American Marketing Association


Citation:
A new framework for web scraping data to ensure its validity for use in marketing studies (2022, June 2)
retrieved 5 June 2022
from https://techxplore.com/information/2022-06-framework-internet-validity.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.