Saturday, March 2, 2019

Python script to find structure of opportunities on Slovakia job market

Introduction


I will try to answer this simple question using google colab + python notebook + web crawling Slovak job ad site + simple NLP (mainly using regex and simple text transformations) and pandas with sklearn.

It's impossible to answer this question using something like TIOBE programming index. This index is composed using trend searches in popular search engines. It doesn't take into consideration what is actual demand for some programming language on job market, let alone niche market like Slovak.

How is this possible?


This is possible now due to change of law on Slovak job market, which basically force companies to publish lowest possible salary they are willing to pay for position. Companies tends to put higher figures in ads, to compete with each other. So real salaries are bit higher, but it should average itself out. There is one problem tough. There is no regulation what type of salary they should put on ad, so there are companies that put net salary and other put gross salary. But. there are not so many of those that put gross salaries.

Data


For correctness, jobs with salary lower than 900 and bigger than 5500 EUR will be ignored, because there is higher probability they are false positive.

We will crawl most popular Slovak job ad site. Crawler will crawl through roughly 1200 pages of IT jobs. Some of which are full programming jobs, others are something in between (Managers, Support, Testers)

We will use corpus of words that will represent most popular programming languages. There will be tree different strategies for parsing programming languages from ad text. You shouldn't worry to much about this. Main reason for this it's difficulty to parse words like "C" or "R" programming languages from ad text, so we must treat it as single word that have no word boundaries.

Python scripts

Here is link to read only google colab python notebook without crawl code (code that actually rip/downloads content from job ad site)

Click here to see the scripts


Summary


As you can see there are some interesting surprises. Java is main language to learn if you want to make between 3000 and 4000 Euros.

Who knew bash is so important to learn? But on other hand is not so hard to learn it. :)

For lower paying positions PHP is main language, but you can also see there R at second spot (maybe some error in parsing?)

It no surprise that for higher salaries than 4000 EUR there is no clear winner. You must be generalist at these positions (Architects, Team Leads, Tech Leads). So answer to the question in title is: None, or there is not silver bullet, just be good at what you do and make sure to learn as much as you can.