At webLyzard technology, we always strive to collect the minimal amount of information required to operate our services. Fully compliant with the General Data Protection Regulation (GDPR), we only capture and process public content (i.e., Web pages and postings “manifestly made public” in the sense of Article 9 of the GDPR) using a Web crawler and the Application Programming Interfaces (APIs) of social media platforms, as outlined below.
Data Collection and Processing
A Web crawler collects and updates the Web pages added to the webLyzard archive. We currently use the Java-based open-source StormCrawler (released under the Apache License 2.0) to perform this task, typically no more than twice a week and with bandwidth limits in place to minimize the load on third-party servers. The data collection process respects each Web site owner’s robots.txt settings (a text file placed in the site’s top-level directory that administrators use to restrict access to files and directories on a Web server). Please contact us if you are a site administrator and have questions about this process.
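As a minimal sketch of how a crawler honors robots.txt, the following example uses Python’s standard-library robots.txt parser; the user-agent string, URLs, and robots.txt content are purely illustrative (a real crawler such as StormCrawler fetches the file from the target site and applies its own configuration):

```python
from urllib import robotparser

# Illustrative robots.txt a site administrator might publish; in practice
# the crawler fetches this from https://example.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Before fetching a page, the crawler checks whether its user agent may access it.
print(parser.can_fetch("ExampleCrawler", "https://example.com/index.html"))          # True
print(parser.can_fetch("ExampleCrawler", "https://example.com/private/report.html")) # False
```

Pages under a disallowed path are simply skipped, so site owners retain control over what enters the archive.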
The majority of data is currently gathered for UNEP Live Web Intelligence, an information exploration system for analyzing online media coverage of the Sustainable Development Goals, and for the research projects InVID (In Video Veritas), ReTV (Re-Inventing TV for the Interactive Age), and CommuniData (Open Data for Local Communities).
Social Media Content
To gather social media content, we use the official APIs provided by the various networking platforms – strictly adhering to these platforms’ usage restrictions and only accessing the public portion of the content. To maintain GDPR compliance, this includes processing status deletion notices and running additional batch-mode checks so that deleted content is removed from all storage systems.
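The handling of status deletion notices can be sketched as follows. The message shape mirrors the deletion-notice format used by streaming social media APIs (e.g., {"delete": {"status": {"id": ...}}} as in the Twitter streaming API); the function and store names are illustrative, not webLyzard’s actual pipeline:

```python
def handle_message(message: dict, store: dict) -> None:
    """Process one streaming message, honoring deletion notices.

    A deletion notice removes the referenced status from local storage;
    any other message is treated as a public status and kept.
    """
    if "delete" in message:
        status_id = message["delete"]["status"]["id"]
        store.pop(status_id, None)   # remove the deleted status if present
    else:
        store[message["id"]] = message  # archive the public status

# Usage: a public status arrives, then a deletion notice for it.
store = {}
handle_message({"id": 42, "text": "a public posting"}, store)
handle_message({"delete": {"status": {"id": 42}}}, store)
print(store)  # {}
```

A periodic batch job applying the same check against all storage systems covers notices that arrive while a consumer is offline.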
Cookies and Log File Analysis