Team 7 ArtVerc

ARTVERC - Fighting Drug Resistance at the Policy Level

ARTVERC: New Ways of Monitoring HIV Guidelines

“HIV down by law and eaten by Python”


Since 1999, the World Health Organization (WHO) has published guidelines on HIV drug treatment. Each country implements these guidelines, but there is currently no pipeline for reconciling the national implementations with the WHO guidelines. Starting from national guidelines published as PDFs (e.g. in Kenya) or as pictures (e.g. in Tanzania (link)), we want to improve this consolidation by using web scraping and text mining within a microservice architecture.

The Context of the initiative

This initiative came out of the HIV Hackathon held at DigitYser in Brussels in 2018. Over several months, people with different backgrounds brainstormed ways to analyze national data to fight HIV drug resistance. From the ideas that emerged, each team then spent a two-day hackathon tackling one of the identified problems from scratch. Our team decided to give the community a tool that reconstructs a database from local legislation and compares it with the WHO guidelines.

Technical implementation

All of the following components are stored on GitHub and written in Python.

1) Web scraping and text extraction with Scrapy

Official Kenyan legislation texts were extracted with Scrapy and saved as JSON files for later text mining.
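To illustrate how such JSON files can feed the text mining step, here is a minimal sketch that reads a JSON array of scraped items (the shape produced by Scrapy's JSON feed export) and keeps the ones that mention HIV. The field names `url` and `text`, the keyword list, and all function names are assumptions made for this example, not the project's actual schema.

```python
import json
import re
from pathlib import Path

# Hypothetical keyword list; the real project may use a different one.
HIV_KEYWORDS = ("hiv", "antiretroviral", "arv")


def parse_items(json_text):
    """Parse a JSON array of scraped items into a list of dicts."""
    return json.loads(json_text)


def load_items(path):
    """Read a JSON file written by the scraper and parse its items."""
    return parse_items(Path(path).read_text(encoding="utf-8"))


def clean_text(text):
    """Collapse runs of whitespace left over from HTML extraction."""
    return re.sub(r"\s+", " ", text).strip()


def mentions_keyword(text, keywords=HIV_KEYWORDS):
    """Return True if any keyword appears as a whole word in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return any(keyword in words for keyword in keywords)


def select_relevant(items, keywords=HIV_KEYWORDS):
    """Keep only items whose cleaned text mentions one of the keywords."""
    relevant = []
    for item in items:
        text = clean_text(item.get("text", ""))
        if mentions_keyword(text, keywords):
            relevant.append({**item, "text": text})
    return relevant
```

Calling `select_relevant(load_items("kenya.json"))` (a hypothetical file name) would then return only the legislation items worth passing on to the indexing step.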

2) OCR with Tesseract

Tesseract is a widely used open-source OCR engine for extracting text from pictures. It was applied to the Tanzanian dataset to extract text in English and in Swahili.
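As a rough sketch of this OCR step, the snippet below calls Tesseract through the `pytesseract` wrapper with both the English (`eng`) and Swahili (`swa`) language models, then tidies the raw output. It assumes that `pytesseract`, Pillow, and a Tesseract installation with both language packs are available; the function names and the clean-up rules are illustrative and not taken from the project.

```python
import re


def normalize_ocr_text(raw):
    """Tidy raw OCR output before it is indexed."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"-\n(?=\w)", "", raw)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def ocr_image(path, languages="eng+swa"):
    """Run Tesseract on one image and return the cleaned text.

    The imports are local so that the clean-up helper above can be used
    without an OCR setup installed.
    """
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        raw = pytesseract.image_to_string(image, lang=languages)
    return normalize_ocr_text(raw)
```

Passing `eng+swa` lets Tesseract recognize pages that mix both languages, which is convenient when a single guideline picture contains English headings and Swahili body text.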

3) Indexing data with ELK (Elasticsearch, Logstash, Kibana)

All text and data were logged in real time to ELK, which indexes the text and makes it possible to retrieve it and to find items containing HIV keywords.
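The project sends its data through Logstash; as a simpler illustration of the same idea, the sketch below talks to Elasticsearch directly with the official Python client instead. It assumes the `elasticsearch` package (8.x style keyword arguments), a reachable cluster, and a hypothetical index name `hiv-guidelines` with a `text` field; none of these names come from the project itself.

```python
def to_bulk_actions(documents, index="hiv-guidelines"):
    """Wrap each extracted document as an action for the bulk helper."""
    for document in documents:
        yield {"_index": index, "_source": document}


def keyword_query(keywords, field="text"):
    """Build a match query that hits documents containing any keyword."""
    return {"match": {field: {"query": " ".join(keywords), "operator": "or"}}}


def index_documents(client, documents, index="hiv-guidelines"):
    """Send all documents to Elasticsearch in one bulk request."""
    # Local import so the pure helpers above work without the client installed.
    from elasticsearch import helpers

    return helpers.bulk(client, to_bulk_actions(documents, index))


def find_documents(client, keywords, index="hiv-guidelines"):
    """Return the stored documents that match any of the keywords."""
    response = client.search(index=index, query=keyword_query(keywords))
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

With a client created as `Elasticsearch("http://localhost:9200")` (an assumed local address), `index_documents` would load the scraped and OCR-extracted texts, and `find_documents(client, ["hiv", "antiretroviral"])` would retrieve the matching items. Kibana can then be pointed at the same index to explore the results.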

Now it is your turn

We have laid the foundations of our vision for consolidating HIV guidelines. Now it is your turn, dear reader: fork the project on GitHub and add new building blocks to improve the system and take it to the next level!