Saturday, March 2, 2019

Python script to find the programming language that pays the biggest salary on the Slovak job market

Introduction


I will try to answer this simple question using Google Colab, a Python notebook, web crawling of a Slovak job ad site, simple NLP (mainly regex and simple text transformations), and pandas with sklearn.

It's impossible to answer this question using something like the TIOBE programming index. That index is composed from search trends in popular search engines. It doesn't take into consideration the actual demand for a programming language on the job market, let alone on a niche market like the Slovak one.

How is this possible?


This is possible now due to a change in the law governing the Slovak job market, which basically forces companies to publish the lowest salary they are willing to pay for a position. Companies tend to put higher figures in ads to compete with each other. So real salaries are a bit higher, but it should average itself out. There is one problem though. There is no regulation on what type of salary should appear in an ad, so some companies put net salary and others put gross salary. But there are not many that put gross salaries.

Data


For correctness, jobs with a salary lower than 900 EUR or higher than 5500 EUR will be ignored, because there is a higher probability that they are false positives.

We will crawl the most popular Slovak job ad site. The crawler will go through roughly 1200 pages of IT jobs. Some of them are pure programming jobs, others are something in between (managers, support, testers).

We will use a corpus of words that represent the most popular programming languages. There will be three different strategies for parsing programming languages from ad text. You shouldn't worry too much about this. The main reason is the difficulty of parsing names like "C" or "R" from ad text, so we must treat them as single words that have no word boundaries.
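The notebook does its parsing in Python. As a rough illustration of why names like "C" and "R" need special care, here is a minimal Java sketch. The corpus, the class name LanguageMatcher and the lookaround-based pattern are assumptions made for this example, not the notebook's actual strategies. A plain \b word boundary would match the "C" inside "C++" or "C#", so the pattern instead forbids word characters, '+' and '#' on either side of the name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class LanguageMatcher {
    // Hypothetical corpus; the real notebook uses its own list.
    private static final List<String> CORPUS =
            List.of("Java", "JavaScript", "Python", "PHP", "C", "C++", "C#", "R", "bash");

    // Build a pattern that only matches the name when it stands alone.
    static Pattern patternFor(String name) {
        String regex = "(?<![\\w+#])" + Pattern.quote(name) + "(?![\\w+#])";
        // Single letters such as "C" or "R" are matched case-sensitively,
        // otherwise a standalone lowercase "c" or "r" would count as a hit.
        int flags = name.length() == 1 ? 0 : Pattern.CASE_INSENSITIVE;
        return Pattern.compile(regex, flags);
    }

    // Return the corpus entries mentioned in the ad text, in corpus order.
    static Set<String> find(String adText) {
        Set<String> found = new LinkedHashSet<>();
        for (String name : CORPUS) {
            if (patternFor(name).matcher(adText).find()) {
                found.add(name);
            }
        }
        return found;
    }
}
```

For the text "We need a Java and C++ developer with R skills." this sketch reports Java, C++ and R, but not C. Abbreviations such as "R&D" would still be counted as R, which is the kind of false positive a real parser has to watch for.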

Python scripts

Here is a link to a read-only Google Colab Python notebook without the crawl code (the code that actually rips/downloads content from the job ad site).

Click here to see the scripts


Summary


As you can see, there are some interesting surprises. Java is the main language to learn if you want to make between 3000 and 4000 EUR.

Who knew bash is so important to learn? On the other hand, it is not so hard to learn. :)

For lower-paying positions PHP is the main language, but you can also see R in second place there (maybe an error in parsing?).

It is no surprise that for salaries higher than 4000 EUR there is no clear winner. You must be a generalist in these positions (Architects, Team Leads, Tech Leads). So the answer to the question in the title is: none. There is no silver bullet; just be good at what you do and make sure to learn as much as you can.

Monday, August 13, 2018

Serverless Architectures



Serverless architectures (SA) are a new approach to application design. This is a hot topic in the software architecture world right now. All of the "Big Three" (Amazon, Google, Microsoft) are investing heavily in "serverlessness" right now.

What is it?

Unlike traditional architectures (where the application runs on a server), the code runs in stateless compute containers that are event-triggered, ephemeral (may only last for one invocation), and fully managed by a third party. These are typically “rich client” applications: think single-page web apps or mobile apps. These apps use the vast ecosystem of cloud-accessible databases, authentication services (e.g., Auth0, AWS Cognito), and so on. But if you need computation, you can use, for example, AWS Lambda or Azure Functions.

Upsides and downsides

Upsides:

  • No need for system administration (everything is handled by a third party)
  • Decreased complexity of your product
  • Natively a microservice architecture
  • Lower cost to scale
  • Elasticity (native scaling)
  • Smaller development and operational costs
  • Decreased time to market

Downsides:

  • Debugging and monitoring are still an issue.
  • Tooling is not there quite yet.
  • Cold start issues.





Tuesday, March 27, 2018

Truffle execute external script call contract function

errors:

 * Error: VM Exception while processing transaction: out of gas
 * Error: Cannot create instance of YourContract; no code at address
 * Error: sender account not recognized
 * Error: invalid address

If you get any of these errors while trying to execute a contract function from a Truffle script, here is the proper way to do it:

Monday, March 26, 2018

Truffle external scripts working example

If you are getting the errors below while trying to execute your Truffle external script, read on.

errors:

TypeError: Cannot read property 'apply' of undefined 

exception at require.js:128:1 

TypeError: fn is not a function 

For some reason it was quite difficult to find out how to run an external script. After some time I finally figured it out, so I'm sharing it with the world:


Thursday, March 22, 2018

Smart Contracts and Ethereum solidity

WHAT ARE SMART CONTRACTS?

I have the opportunity to work on an interesting smart contract project. For that reason I decided to write something about this hot topic on my blog. So, what's a smart contract anyway?

In 1994, Nick Szabo, a legal scholar and cryptographer, realized that the decentralized ledger could be used for smart contracts, otherwise called self-executing contracts, blockchain contracts, or digital contracts. In this format, contracts could be converted to computer code, stored and replicated on the system, and supervised by the network of computers that run the blockchain. This would also result in ledger feedback such as transferring money and receiving the product or service.

Smart contracts help you exchange money, property, shares, or anything of value in a transparent, conflict-free way while avoiding the services of a middleman.

Can laws be written as smart contracts? What kind of world would that be? Would programmers become lawyers? Imagine code reviews and testing for those contracts.

HOW ARE THESE CONTRACTS EXECUTED?


Most smart contracts written today are based on the Ethereum blockchain. The Ethereum blockchain is a world computer that can execute smart contracts. Smart contracts are written in a language called Solidity (the current version is 0.4.21). The most popular framework for working with smart contracts is called Truffle. Contracts serve as the back-end of an application, and the front-end is written using web3. Web3 is a new way of building web apps, called the decentralized web. Basically there is no classical back-end, only a front-end with decentralized smart contracts.

Wednesday, December 9, 2015

Creating intentional memory leak in Java

In Java you can get the impression that you don't have to think about memory management. This is true in the majority of cases. But there are limits: if you create too many objects of mixed sizes too fast, the garbage collector will work harder and the application will be slow.

Memory can become more fragmented, which again forces the garbage collector to compact the heap space and make long pauses, or to throw a "java.lang.OutOfMemoryError". These long pauses are typically triggered when your Java program attempts to allocate a large object, such as an array.

Nowadays, modern VMs are very efficient and can deal with rapid creation of small objects, but if you hit the limit your application will die or become unresponsive.

The concept of a memory leak is very simple: you introduce memory leaks by maintaining obsolete references to objects. An obsolete reference is simply a reference that will never be dereferenced again. This is the so-called "simple memory leak".
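A well-known way to picture an obsolete reference is a hand-written stack whose pop forgets to clear the slot it just vacated. The sketch below is a generic illustration (the class LeakyStack and its method names are made up for this example, and it is not taken from the repository linked later in this post). The popped object stays reachable through the internal array even though the stack will never hand it out again.

```java
import java.util.Arrays;

public class LeakyStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(Object element) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size++] = element;
    }

    // Leaky: the array slot still points at the popped object.
    public Object leakyPop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return elements[--size];
    }

    // Fixed: the obsolete reference is cleared so the GC can reclaim the object.
    public Object cleanPop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object result = elements[--size];
        elements[size] = null;
        return result;
    }

    // Helper for demonstration only: is a slot still holding a reference?
    boolean slotStillReferenced(int index) {
        return elements[index] != null;
    }
}
```

After leakyPop the vacated slot is still non-null, while after cleanPop it is null. That single assignment is the whole difference between leaking and not leaking here.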

There are also "true memory leaks". You introduce these leaks when you create objects that are inaccessible to running code but still stored in memory.

One famous example of a true leak is a concoction of a custom class loader and a long-running thread with thread-local variables, preferably inside an application container - mmmmmm, so good! :).
This works because the ThreadLocal keeps a reference to the object, which keeps a reference to its Class, which in turn keeps a reference to its ClassLoader. The ClassLoader, in turn, keeps a reference to all the Classes it has loaded.
After multiple deploys your application will break with a totally unexpected out-of-memory error in the permanent generation.

There are many kinds of "out of memory" errors. If you are interested, look here for a description: memory leaks

But in practice, you will see these three most often:
  • java.lang.OutOfMemoryError: Java heap space
    • The heap is full.
  • java.lang.OutOfMemoryError: PermGen space
    • The permanent generation space is full.
  • java.lang.OutOfMemoryError: GC overhead limit exceeded
    • The GC is working way too hard with little or no result.

In this blog post I decided to show how easy it is to create a memory leak. This can come in handy in a coding interview, or it can be a good example of what not to do.

All examples are runnable; all you need to do is clone the https://github.com/spookysleeper/codingwithpassion/tree/master/leaks repository and run the gradle script.

 

Byte leak


To run this example type: "gradlew runByteTest"

This is a demonstration of a pretty straightforward memory leak using an array list and byte arrays. The list keeps growing and each element holds a reference to a one-megabyte byte array. Arrays need to be allocated as contiguous chunks of memory within the heap space, and if memory is fragmented the GC struggles and breaks in the end with a java.lang.OutOfMemoryError: Java heap space error.
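The idea can be sketched in a few lines. This is not the code from the repository, only a minimal approximation; the class name ByteLeak and the bounded allocate method are assumptions made so that the example can be run safely.

```java
import java.util.ArrayList;
import java.util.List;

public class ByteLeak {
    private static final int ONE_MEGABYTE = 1024 * 1024;

    // Allocates `count` one-megabyte arrays and keeps every one of them reachable.
    // As long as the returned list is alive, the GC cannot reclaim any of the arrays.
    static List<byte[]> allocate(int count) {
        List<byte[]> hoard = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            hoard.add(new byte[ONE_MEGABYTE]);
        }
        return hoard;
    }
}
```

With a small count nothing bad happens. Asking for more megabytes than the heap can hold (for example with a small -Xmx setting) should end in the heap space error described above.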



As you can see from this graph, the GC didn't have a chance. It's a massacre!



List leak


To run this example type: "gradlew runListTest"

The list leak is similar to the previous example. It creates a list of BigDecimal objects whose references are never released. Simple and effective.
BigDecimal was chosen only because it is heavier than a simple Integer or Float.



You can see that this time GC is trying really hard to clean heap, but fails eventually.

 

 

Map key leak


The next leak is a bit more sophisticated, but at its core no different from the list leak. This is a demonstration of what happens when your implementation of hashCode is bad.
Elements will be added indefinitely, and every time the reference will remain active.

You can run this example by typing: "gradlew runMapBadKeyTest" or you can type "gradlew runMapGoodKeyTest" to test it with good key.
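To make the difference between a bad and a good key concrete, here is a minimal sketch. It assumes that "bad" means a key class that does not override equals and hashCode; the repository's actual key classes may differ, and the names BadKey, GoodKey and MapKeyLeak are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class MapKeyLeak {
    // Bad key: inherits identity-based equals/hashCode from Object,
    // so two keys with the same id are never considered equal.
    static final class BadKey {
        final String id;
        BadKey(String id) { this.id = id; }
    }

    // Good key: equality and hash code are derived from the id.
    static final class GoodKey {
        final String id;
        GoodKey(String id) { this.id = id; }
        @Override public boolean equals(Object other) {
            return other instanceof GoodKey && ((GoodKey) other).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // Puts the "same" logical key n times; every put adds a new entry.
    static int sizeAfterBadPuts(int n) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("same"), "value");
        }
        return map.size();
    }

    // Same loop with a proper key; every put overwrites the single entry.
    static int sizeAfterGoodPuts(int n) {
        Map<GoodKey, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new GoodKey("same"), "value");
        }
        return map.size();
    }
}
```

With the bad key the map ends up with one entry per put, so a long-running loop grows without bound. With the good key the map stays at a single entry no matter how many times you put.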



This time the GC is not even trying, maybe because a StringBuilder with 100000 elements is so much heavier than a BigDecimal that the GC simply doesn't have time to do anything.




Class leak


The permanent generation holds internal representations of Java classes, among other things (names of classes, methods, Strings...). The simplest way of introducing a memory leak in this memory area is to create too many classes. Another, more sophisticated example was mentioned earlier in this post as the "true memory leak".

To run example type: "gradlew runClassTest"



As you can see, it escalates pretty quickly. Because of this, you don't even get a PermGen error every time you run it; it just breaks on some random thing.





Thanks for reading, hope you like it! :)

Saturday, July 25, 2015

Java 8 Streams


Introduction

Every application creates and processes collections. Until recently, if you wanted to do some "finding" or "grouping" on collections in Java, you had to code it yourself. It was not very exciting and the job is repetitive in nature. Groovy, for example, offers great tools for transforming and managing collections. Check this link for some great examples. Java 8 borrows some concepts from Groovy, but also goes one step further with multi-core processing and the stream concept.

In SQL you don't need to implement how to calculate a grouping or anything else; you just describe your expectation (what you want to have). The Stream API in Java 8 is guided by the same philosophy.

What is stream?

A stream is basically a sequence of elements from a source that supports aggregate operations. Let's break this statement down:
  • Sequence of elements: A stream provides an interface to a sequenced set of values. Implementations of this interface don't store values; values are calculated at run time.
  • Source: This is where the values are stored: collections, arrays, I/O.
  • Aggregate operations: All common SQL-like operations (group, count, sum) and functional programming constructs (filter, map, reduce, find, match, sorted).
Streams also have two fundamental characteristics:
  • Pipelining: This allows operations on a stream to be chained into a larger pipeline.
  • Internal iteration: Collections are iterated externally (explicit iteration), while streams do the iteration behind the scenes.
Streams are not collections! In a nutshell, collections are about data and streams are about computations. The difference between collections and streams has to do with when things are computed. Every element in a collection has to be computed before it can be added to the collection. In contrast, a stream is a conceptually fixed data structure in which elements are computed on demand. In the following example, no work is actually done until collect is invoked:
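// Requires: import java.util.Arrays; import java.util.List; import java.util.stream.Collectors;
List<Integer> numbers = Arrays.asList(1, 4, 1, 4, 2, 8, 5);
// map and distinct are lazy; collect is the terminal operation that triggers the work
List<Integer> distinct = numbers.stream().map(i -> i * i)
      .distinct().collect(Collectors.toList());
System.out.printf("integers: %s, squares : %s %n", numbers, distinct);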
There are two types of stream operations:
  • Intermediate: can be connected together because their return type is a Stream.
  • Terminal: this kind of operation produces a result from a pipeline, such as a List, an Integer, or even void (any non-Stream type).
Intermediate operations do not perform any processing until a terminal operation is invoked on the stream pipeline; they are “lazy.”

Streams also use short-circuiting where we need to process only part of the stream, not all of it, to return a result. This is similar to evaluating a large Boolean expression chained with the and operator.
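Laziness and short-circuiting are easiest to see by counting how many source elements a pipeline actually touches. The following is a small sketch; the class name LazyStreams, the method firstSquareAbove and the counter parameter are made up for this illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyStreams {
    // Returns the first square greater than `limit`.
    // `visited` counts how many source elements the pipeline actually pulled.
    static Optional<Integer> firstSquareAbove(List<Integer> numbers, int limit, AtomicInteger visited) {
        return numbers.stream()
                .peek(n -> visited.incrementAndGet()) // intermediate, lazy
                .map(n -> n * n)                      // intermediate, lazy
                .filter(square -> square > limit)     // intermediate, lazy
                .findFirst();                         // terminal, short-circuiting
    }
}
```

For the list 1, 4, 1, 4, 2, 8, 5 and a limit of 10, the result is 16 and the counter ends at 2: the pipeline stops as soon as the second element produces a match, and the remaining five elements are never squared.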

This was just a high-level overview, without detailed examples of the Stream API. It is easy to find examples in other sources, like here. In my opinion the Stream API is a great and refreshing new feature in Java 8, especially with its lazy, short-circuiting, multi-core features.

Tuesday, July 21, 2015

JavaScript Promises

What are promises?

 

Promises are quickly becoming the standard way to handle asynchronous operations in JavaScript. Everybody who codes even a little bit in JavaScript is familiar with callbacks. The essence of using callback functions in JavaScript is passing a function as an argument to another function, which later executes that passed-in function or even returns it to be executed later.
There are several problems with callbacks. For example, when you need to be sure that two callbacks have finished before you do something, you must introduce new variables to track the state of each callback. Callbacks also lead to another problem, which you should already be familiar with: callback hell.

Callback hell

 

I think this all started with node.js, and callback hell gets a bad rap from the node.js community. This is because when you have a node application with express and mongoose, callbacks are all over the place.
When you need to perform a number of actions in a specific sequence in JavaScript, you must use nested functions. Something like this:
asyncCall(function(err, data1){
    if(err) return callback(err);       
    anotherAsyncCall(function(err2, data2){
        if(err2) return callback(err2);
        oneMoreAsyncCall(function(err3, data3){
            if(err3) return callback(err3);
            // are we done yet?
        });
    });
});
You can use promises to make this code prettier:
asyncCall()
.then(function(data1){
    // do something...
    return anotherAsyncCall();
})
.then(function(data2){
    // do something...  
    return oneMoreAsyncCall();    
})
.then(function(data3){
   // the third and final async response
})
.fail(function(err) {
   // handle any error resulting from any of the above calls    
})
.done();
A lot nicer, isn't it?
You can see that instead of requiring a callback, each call returns a Promise object. Promises can be chained, because every then() call on a Promise object also returns a promise.
We don't need to check for an error in every callback, only at the end of the promise chain. This is another feature of promises. (The fail() and done() methods used here come from promise libraries such as Q; native promises use catch() instead.)

Promises are not the only solution to callback hell. Sometimes callback hell is a direct consequence of poor code organization. In some cases promises only hide underlying structural problems in the code. I mean, if you need five levels of indentation you're screwed anyway and should fix your program. You can find some hints on how to resolve callback hell here.

Implementation 

 

Promises have arrived natively in JavaScript, but to finish I want to provide a half-baked promise implementation with comments, so you get a feeling for how promises are (or could be) implemented:
function Promise(fn) {
  var state = 'pending';
  var value;
  var deferred;
  
  //When the function we passed in is done, this will be called.
  //If then is called before resolve, the handler is deferred until resolve runs.
  //If then is called after resolve, the value is read from the internal state.
  function resolve(newValue) {
    value = newValue;
    state = 'resolved';
    
    if(deferred) {
      handle(deferred);
    }
  }

  function handle(onResolved) {
    if(state === 'pending') {
      deferred = onResolved;
      return;
    }

    onResolved(value);
  }
  
  //This will be invoked when the client calls it.
  this.then = function(onResolved) {
    handle(onResolved);
  };

  //Execute the function that was passed into the promise.
  //We wait until this function is finished.
  fn(resolve);
}

function testPromise() {
    return new Promise(function(resolve) {
        var value = readFromDatabase();
        resolve(value);
    });
}

testPromise().then(function(databaseValue) {
    log(databaseValue);
});

Monday, September 15, 2014

SOLID object-oriented design

Do you know what SOLID (not solid, but S.O.L.I.D) object-oriented design stands for? It stands for: Single responsibility, Open-closed, Liskov substitution, Interface segregation and Dependency inversion.

The principles behind this acronym were popularized by Robert Martin. According to him, these principles form the backbone of solid object-oriented design. You can read more about them in his book "Agile Software Development: Principles, Patterns, and Practices". I will try to describe these principles in following posts, in a timely manner of course. :)

For starters here are his views on bad object oriented design and what should be avoided:
  • Rigidity - It is hard to change because every change affects too many other parts of the system.
  • Fragility - When you make a change, unexpected parts of the system break.
  • Immobility - It is hard to reuse in another application because it cannot be disentangled from the current application.

Sunday, September 14, 2014

Java built-in profiling and monitoring tools

Java profiling is a very useful technique for finding performance bottlenecks and/or resolving complete system failures. Common problems that can occur in a Java system of any size are slow services, JVM crashes, hangs, deadlocks, frequent JVM pauses, sudden or persistent high CPU usage, or even the dreaded OutOfMemoryError (OOME).

Finding this kind of bug is like an art, and you need a lot of experience to be good at it. That's why some programmers specialize in Java profiling. In some cases finding the bug is impossible if you don't know how the system works. Every Java programmer should at least know the basic profiling tools, because you can't always pay an external specialist to fix memory leaks or deadlocks for you.

Java comes with built-in tools for profiling and monitoring. Some of these tools are:

jmap

This is an internal Java tool and it is not a profiling tool as such, but it is very useful. Oracle describes jmap as an application that “prints shared object memory maps or heap memory details of a given process or core file or remote debug server”. And it is exactly that. The most useful option prints a memory histogram report (jmap -histo <pid>). The resulting report shows a row for each class type currently on the heap, with the count of allocated instances and total bytes consumed. Using this report you can easily identify memory leaks if you have any.

jstack

jstack is also not a profiling tool, but it can help you identify thread deadlocks. The output of "jstack" is very useful for debugging. It shows how many deadlocks exist in the JVM process, along with stack traces of the waiting threads with source code line numbers, if the sources were compiled with debug options.
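To try jstack out you need a process that is actually deadlocked. The sketch below creates a classic two-lock deadlock on daemon threads and, as a rough in-process counterpart to what jstack reports, asks the JVM's ThreadMXBean for deadlocked threads. The class DeadlockDemo and its method names are invented for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {
    // Starts a daemon thread that takes `first`, waits for its peer, then wants `second`.
    static void startWorker(String name, Object first, Object second, CountDownLatch bothReady) {
        Thread worker = new Thread(() -> {
            synchronized (first) {
                bothReady.countDown();
                try {
                    bothReady.await(); // make sure the peer already holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the peer holds `second` and waits for `first`
                }
            }
        }, name);
        worker.setDaemon(true); // do not keep the JVM alive
        worker.start();
    }

    // Creates a two-thread deadlock and returns how many deadlocked threads the JVM reports.
    static int createAndDetectDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothReady = new CountDownLatch(2);
        startWorker("worker-1", lockA, lockB, bothReady);
        startWorker("worker-2", lockB, lockA, bothReady);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] deadlocked = threads.findMonitorDeadlockedThreads();
            if (deadlocked != null) {
                return deadlocked.length;
            }
            try {
                Thread.sleep(50); // give the workers time to block on the second lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }
}
```

If you keep such a process alive (for example by using non-daemon threads) and run jstack with its process id, the thread dump should show both workers blocked on each other's monitor.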

jconsole


JConsole is a graphical tool for monitoring the Java Virtual Machine (JVM) and Java applications on both local and remote machines. It is meant for monitoring, not profiling, so you are better off using VisualVM, described below.

VisualVM

Another tool currently built into the JVM is VisualVM, described by its creators as “a visual tool integrating several command line JDK tools and lightweight profiling capabilities”. This tool can generate a memory graph that shows how your application consumes memory over time. VisualVM also provides a sampler and a lightweight profiler. The sampler lets you sample your application periodically for CPU and memory usage. It’s possible to get statistics similar to those available through jmap, with the additional capability to sample your method calls’ CPU usage. The VisualVM profiler gives you the same information as the sampler, but rather than sampling your application at regular intervals, it gathers statistics throughout normal execution by instrumenting the application's bytecode.

For me these built-in tools work quite well, but if you want more specialized and more powerful profiling tools, you can check out BTrace, EurekaJ and Eclipse Memory Analyzer (MAT).