The webLyzard crawler collects and updates Web pages to be added to our knowledge archive. We currently use Scrapy, a Python-based open-source framework released under the terms of the BSD License, to perform this task – typically crawling a site no more than twice a week and with limited bandwidth to minimize the resulting load on the target servers.
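The politeness behavior described above maps onto standard Scrapy configuration options. The following is an illustrative sketch of such a settings module, not our actual production configuration; all option names are standard Scrapy settings, while the values and user-agent string are assumptions for the example.

```python
# Illustrative excerpt from a Scrapy settings module showing
# "polite crawler" options; values here are examples only.

# Identify the crawler honestly so site operators can reach us
# (URL shown is the project site, used here as an example contact).
USER_AGENT = "webLyzard-crawler (+https://www.webLyzard.com)"

# Honor robots.txt directives on every target site.
ROBOTSTXT_OBEY = True

# Keep the load on remote servers low: few concurrent requests
# per domain and a fixed delay between requests.
CONCURRENT_REQUESTS_PER_DOMAIN = 2
DOWNLOAD_DELAY = 5.0  # seconds between requests to the same domain

# Let AutoThrottle back off further when a server responds slowly.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

Scrapy applies these settings automatically to every spider in the project, so bandwidth limits do not have to be re-implemented per crawl.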
Currently, the majority of data is gathered for UNEP Live Web Intelligence, an information exploration system to analyze news and social media coverage on sustainable development goals, and for the research projects InVID (In Video Veritas) and CommuniData (Open Data for Local Communities).
Social Media Content
To gather social media content, we use the official Application Programming Interfaces (APIs) provided by the various platforms – strictly adhering to each platform's usage restrictions and accessing only the public portion of the content. This includes processing status deletion notices and performing additional checks in batch mode to ensure that deleted content is removed from memory and from all storage systems.
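Deletion-notice handling of this kind can be sketched as follows. The message shape follows the classic Twitter streaming API "delete" notice format; the in-memory store and function names are hypothetical stand-ins for the real storage systems.

```python
# Minimal sketch of stream deletion-notice processing; the dictionary
# below is a hypothetical stand-in for the actual storage systems.
from typing import Any, Dict

# status id -> stored status content
status_store: Dict[str, Any] = {}

def handle_stream_message(message: Dict[str, Any]) -> None:
    """Store public statuses; purge statuses named in deletion notices."""
    if "delete" in message:
        # Classic streaming-API shape: {"delete": {"status": {"id_str": ...}}}
        deleted_id = message["delete"]["status"]["id_str"]
        # Remove the deleted status from every store we maintain.
        status_store.pop(deleted_id, None)
    elif "id_str" in message:
        status_store[message["id_str"]] = message
```

A separate batch job would then periodically re-check stored ids against the platform and drop any that have since been deleted, covering notices that arrive while the stream consumer is offline.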
Log File Analysis
To support our ongoing efforts to increase the performance and usability of the www.webLyzard.com Web site, we use Google Analytics to track and examine the level of activity and popularity of specific pages. Google may use the data collected to contextualize and personalize the ads of its own advertising network.