Haripriya Sridharan

Visualising Wikipedia: Pet Project

Updated: Nov 4, 2024

I love maps, data and charts. Whenever I read any text, my natural instinct is to use maps to visualise it in a demographic context, which, in turn, deepens my understanding of the topic.


I have been eager to create visualisations of internet data with a demographic focus, and eventually to get to a point where I don't have to interact with a bot to retrieve information. (Influenced by all the sci-fi capers I grew up with!)


As a starting point, I aimed to present population data across the globe.


Open Data Sources

I started with the Wikipedia dataset from Hugging Face, exploring their latest collection (2022).

from datasets import load_dataset

# Load the English Wikipedia dump dated 2022-03-01
dataset = load_dataset("wikipedia", "20220301.en")

# Count the articles in the 'train' split
num_entries = len(dataset['train'])
print(num_entries)

They have quite a few collections, but each had about 64 lakh (6.4 million) entries. So I parked it for later use.


Then I stumbled upon population.io. I liked their animation of the live population count.


I also tested it by entering the current date as my date of birth, and this is what I got:

We estimate you will live until age 82.1 years 13 Jun 2105

Obviously inaccurate, but it looks pretty cool! It inspired me to look for open APIs.


I eventually came across the open-source Wikipedia API, which returns data in JSON format. It is pretty fast and requires no API key. The important parameter here is "action": "parse". You can also use "action": "query", but you wouldn't be able to scrape tabulated data.


api_url = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "parse",
    "page": "List_of_countries_and_dependencies_by_population",
    "format": "json"
}
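
As a rough illustration of what can be done with these parameters, here is a minimal sketch that sends the request and pulls table rows out of the response using only the standard library. It assumes the default JSON layout of the parse action, where the rendered HTML sits under parse → text → "*". The helper names (fetch_parse_response, rows_from_parse_response, parse_count) and the User-Agent string are made up for this example and are not taken from the project's code.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

API_URL = "https://en.wikipedia.org/w/api.php"


class TableRowParser(HTMLParser):
    """Collects the text of every cell, grouped by <tr>, in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def fetch_parse_response(page):
    """Call the parse action for one page and return the decoded JSON."""
    params = {"action": "parse", "page": page, "format": "json"}
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    request = urllib.request.Request(
        url, headers={"User-Agent": "population-demo/0.1 (example script)"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def rows_from_parse_response(payload):
    """Return every table row (as a list of cell strings) in the page HTML."""
    html = payload["parse"]["text"]["*"]
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def parse_count(text):
    """Turn a formatted count such as '1,000' into an integer."""
    return int(text.replace(",", ""))
```

Calling rows_from_parse_response(fetch_parse_response("List_of_countries_and_dependencies_by_population")) would return the rows of every table on the page. Which columns hold the country name and the population depends on the page's current layout, so those indices should be checked by hand before relying on them.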

Developing Server and Client


For the backend: I wrote some simple server-side code to scrape the population counts from the Wikipedia page List_of_countries_and_dependencies_by_population. Code source: here


For the frontend: I started with the popular 3D geospatial service Cesium ion. However, it has a limitation: it works only with latitude and longitude values.


For my needs, I required a solution that connects latitude and longitude coordinates to specific countries. So I turned to OpenCage. They offer a generous allowance of 2,500 free calls per day. (Not bad.)
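
To make the lookup concrete, here is a minimal sketch of reverse geocoding a coordinate with OpenCage and matching the country to a population table. It is written in Python for consistency with the snippets above, and the real frontend may do this differently. The function names, the population_by_country dictionary and the placeholder API key are assumptions for illustration only. The sketch relies on the country name being returned under results[0] → components → country.

```python
import json
import urllib.parse
import urllib.request

OPENCAGE_URL = "https://api.opencagedata.com/geocode/v1/json"


def build_reverse_geocode_url(lat, lng, api_key):
    """Build the request URL for a 'lat,lng' reverse-geocoding query."""
    query = urllib.parse.urlencode({"q": f"{lat},{lng}", "key": api_key})
    return f"{OPENCAGE_URL}?{query}"


def country_from_response(payload):
    """Return the country name of the first result, or None if there is none."""
    results = payload.get("results", [])
    if not results:
        return None
    return results[0].get("components", {}).get("country")


def population_at(lat, lng, api_key, population_by_country):
    """Look up the country at a coordinate and its population, if known.

    population_by_country is a hypothetical dict such as {"India": 1000},
    for example built from the scraped Wikipedia table.
    """
    url = build_reverse_geocode_url(lat, lng, api_key)
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    country = country_from_response(payload)
    return country, population_by_country.get(country)
```

Country names returned by the geocoder may not always match the spelling used in the Wikipedia table, so a real implementation would probably need a small mapping between the two, or a join on country codes.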


Here is how I connected them.


This is the final result.



You can find the ongoing project here. I'm curious to keep this journey going and see where it takes me. I'll be sharing more as I explore new platforms and data sources.
