At webLyzard technology, we strive to collect only the minimum amount of information required to operate our services. In full compliance with the General Data Protection Regulation (GDPR), we capture and process only public content (i.e., Web pages and postings “manifestly made public” within the meaning of Article 9 of the GDPR). We collect this content with a Web crawler and through the Application Programming Interfaces (APIs) of social media platforms, as outlined below.
Data Collection and Processing
A Web crawler collects and updates the Web pages that are added to the webLyzard archive. We currently use the open-source, Java-based Apache StormCrawler for this task, which is released under the Apache License 2.0. Sites are typically crawled no more than twice a week, and bandwidth limits keep the load on third-party servers to a minimum. The data collection process respects each Web site owner’s robots.txt settings. robots.txt is a text file placed in the root directory of a Web site, which site administrators use to restrict crawler access to files and directories on their Web server. Please contact us if you are a site administrator and have questions regarding this process.
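For site administrators, the following is a minimal sketch of how a crawler that honors robots.txt decides whether it may fetch a page. It uses Python's standard `urllib.robotparser` module rather than StormCrawler's own Java implementation. The robots.txt content, the `example.org` domain and the `ExampleCrawler` user-agent name are hypothetical and do not describe the webLyzard crawler's actual configuration. Note that `Crawl-delay` is a non-standard directive that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site administrator might publish:
# block /private/ for all crawlers and ask for 10 s between requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

# Hypothetical user-agent name, used only for illustration.
USER_AGENT = "ExampleCrawler"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the parsed rules allow USER_AGENT to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("https://example.org/news/index.html",
                "https://example.org/private/report.html"):
        # A compliant crawler skips any URL that the rules disallow.
        print(url, "->", "allowed" if may_fetch(rules, url) else "skipped")
    # Requested pause between requests, in seconds (None if not specified).
    print("crawl delay (s):", rules.crawl_delay(USER_AGENT))
```

In this sketch, the page under `/news/` is allowed and the page under `/private/` is skipped. A production crawler would also download robots.txt from the site's root and periodically refresh its cached copy.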
The majority of the data is currently gathered for UNEP Live Web Intelligence, an information exploration system that analyzes online media coverage of the Sustainable Development Goals. It is also gathered for the research projects InVID (In Video Veritas), ReTV (Re-Inventing TV for the Interactive Age), and CommuniData (Open Data for Local Communities).
Social Media Content
When collecting content through the official APIs of social networking platforms, our system strictly adheres to each platform’s usage restrictions and accesses only the public portion of the content. Projects specify their topics of interest with channel or page names and keyword search terms. Based on these, we gather posts and comments together with basic account details, which may include the account name, the number of followers, and public geo-annotations.
The content analysis we perform includes story detection, sentiment analysis, brand perception analysis, and an assessment of a posting’s potential reach. It is never used to build user profiles or to infer details about individuals. Users can request the deletion of their content via email.
Cookies and Log File Analysis