Welcome to the EDSA Data Science Skill Demand Dashboard – examining demand for Data Scientists and the specific skills required of Data Scientists in different geographical regions and industry sectors across the EU. Each view provides a different perspective on a common data set, to support policy- and decision-makers, job seekers and trainees, and practitioners, with a set of exploratory visual analytics and statistical analysis options.

The dashboard serves a second purpose: to collect feedback from practitioners and others with an interest in Data Science, in order to move closer to the ground truth in the picture of Data Science demand, and therefore determine also the gap between demand and the capability of today's workforce to fill this demand.

 

This page briefly describes the input data, the different perspectives provided by the dashboard, and the exploratory analysis and search tools available in each, along with some sample queries. Do contact us if you have any feedback or further questions.



Map Views

These location-based overviews display data on job postings advertised across the EU and some EEA countries. There are two perspectives. The first is targeted at policy- and decision-makers at management level. The second maps the location of postings and the skills required in each to retrieve relevant learning material – to aid trainee Data Scientists and job seekers in identifying job roles that match their interests and any further training they may require to carry out each role effectively. (These pages open in a new window; close or navigate back to this page when done.)

Data filters:

Search by skill:

Filter by location and/or country:

Filter by time using the coupled timeline slider at the bottom of the map

Search by skill AND location (city or country) using the auto-complete fields. Querying is currently exact match – as you type, the auto-complete narrows to the most specific matches, or to none, providing a preview of potential results.

Pan, physical and semantic zoom

Pan and zoom using the mouse; physical zoom translates into semantic zoom beyond a threshold, showing more detail for each cluster in a result set.

Statistical Analysis

The charts update after a query has been posted to the map.

 

 

Sample queries

E.g., search for all jobs in London, Greater Manchester, Madrid, Berlin OR Brussels requiring Java AND Statistics. The bar charts on the right summarise the result set and provide more detail around the filters selected. No matches for both skills were found in Berlin.
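
The boolean logic of this sample query (any of the listed locations, all of the listed skills) can be sketched in a few lines of Python. This is an illustration only, not the dashboard's implementation: the `Posting` class, the `filter_postings` function and the sample postings are hypothetical, and matching is assumed to be exact apart from letter case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """Hypothetical, minimal stand-in for a job posting record."""
    title: str
    location: str
    skills: frozenset


def filter_postings(postings, locations, required_skills):
    """Keep postings in ANY of the locations that mention ALL required skills."""
    wanted_locations = {loc.lower() for loc in locations}
    wanted_skills = {skill.lower() for skill in required_skills}
    return [
        p for p in postings
        if p.location.lower() in wanted_locations
        and wanted_skills <= {skill.lower() for skill in p.skills}
    ]


# Made-up sample data, for illustration only.
postings = [
    Posting("Data Scientist", "London", frozenset({"Java", "Statistics", "SQL"})),
    Posting("Data Engineer", "Berlin", frozenset({"Java", "Hadoop"})),
    Posting("Analyst", "Madrid", frozenset({"Statistics", "R"})),
    Posting("ML Engineer", "Paris", frozenset({"Java", "Statistics"})),
]

matches = filter_postings(
    postings,
    locations=["London", "Greater Manchester", "Madrid", "Berlin", "Brussels"],
    required_skills=["Java", "Statistics"],
)
print([p.title for p in matches])
```

In this made-up data only the London posting satisfies both conditions: the Berlin and Madrid postings each lack one of the two skills, and the Paris posting lies outside the selected locations.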

Repeating the query above in the job seeker/trainee view reveals jobs for both skills but no matches for both skills in the learning resources database. Searching for statistics alone reveals matches in the latter.

Skills Analysis

The Skill Sets Viewer makes use of parallel coordinates to provide an overview of co-occurrence of skills in job postings. The coupled skill set selector on the left lists 46 skills grouped into 7 skill sets, identified as core to Data Science in the initial stages of the project, through interviews with industry experts.
Skill set grouping is reflected in the colour-coded axis labels in the parallel coordinates view. Skills not mentioned in at least one posting are greyed out. Each polyline represents an aggregate per country over a week, starting from the earliest posting date in the dataset, intersecting along each skill axis to show the total frequency of mention for that aggregate. Two trend lines for maximum and median frequency across all aggregates are shown using thick, broken lines, colour-coded rosy pink and green respectively.
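
The aggregation behind each polyline and the two trend lines can be sketched as follows. This is a minimal sketch under stated assumptions, not the viewer's code: the function names `weekly_skill_counts` and `trend_lines`, the tuple layout of a posting and the sample data are all hypothetical, and weeks are assumed to be counted in 7-day steps from the earliest posting date.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import median


def weekly_skill_counts(postings):
    """Group postings by (country, week) and count skill mentions.

    postings: iterable of (country, posting_date, skills) tuples.
    Weeks are numbered from the earliest posting date in the data.
    """
    postings = list(postings)
    start = min(posted for _, posted, _ in postings)
    aggregates = defaultdict(Counter)
    for country, posted, skills in postings:
        week = (posted - start).days // 7
        aggregates[(country, week)].update(skills)
    return dict(aggregates)


def trend_lines(aggregates, skills):
    """Maximum and median frequency of each skill across all aggregates."""
    max_line, median_line = {}, {}
    for skill in skills:
        # A Counter returns 0 for skills an aggregate never mentions.
        values = [counts[skill] for counts in aggregates.values()]
        max_line[skill] = max(values)
        median_line[skill] = median(values)
    return max_line, median_line


# Made-up sample data, for illustration only.
sample = [
    ("UK", date(2015, 9, 1), ["Python", "Statistics"]),
    ("UK", date(2015, 9, 3), ["Python"]),
    ("UK", date(2015, 9, 10), ["Statistics"]),
    ("DE", date(2015, 9, 2), ["Python", "Java"]),
]

aggregates = weekly_skill_counts(sample)
max_line, median_line = trend_lines(aggregates, ["Python", "Statistics", "Java"])
print(max_line)
print(median_line)
```

Each key of `aggregates` corresponds to one polyline, and each skill in `max_line` and `median_line` to one point on the respective trend line.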

Co-ordinated data filters:

Lower the skill frequency axes height to reveal lower peaks in the overview.

 

Sample queries

E.g., search for all jobs requiring Python and Statistics posted in Sep 2015. Mouse over or double-click on a set of postings (polyline) in the result set to reveal selected attributes and the frequency of mention of skills.

E.g., identify the most frequently required (mentioned) skills by country.

Job Demand Data

Job postings in Data Science are being mined from online portals such as LinkedIn, Monster, Indeed, Stack Overflow and Adzuna. Terminology matching the skill sets found to be required of Data Scientists is used as a filter in the data acquisition process, for jobs advertised in the EU. The aim is to collect a dataset that allows us to analyse historical and current trends in Data Science demand, and therefore to predict the future Data Science landscape across the European Union (EU).

Ontology & Knowledge Framework

To aid data acquisition and effective (re)use of the data, as part of our analysis process we continue a cycle of constructing and refining Saro, the Skills and Recruitment Ontology, to capture job postings and other related data, maintaining provenance in each instance. The structure of each post is mapped to the JobPosting concept, which extends schema.org.JobPosting, listing attributes including the Skill concept, which is based on the ESCO Skills Pillar. Other key information extracted includes job posting date, hiring organisation, geographical location and salary.
To connect jobs to learning resources, the ontology also captures information on the educator or trainer who designs curricula and delivers courses based on them, the qualifications that result from these and their awarding bodies. We also define a User, sub-classed into our target user types, including the three main types whose tasks we used to guide the redesign of the demand analysis dashboard – the decision maker, the trainee and the practitioner.

Data Acquisition & Pre-processing

The data acquisition pipeline starts by crawling for postings online or extracting them through an API. The data is then analysed using a Wikifier to extract skills, either from a predefined list or by matching to relevant terms defined in DBpedia. Location information in the postings is then mapped to GeoNames to extract latitude and longitude and to normalise location descriptions. The enriched, annotated data is then encoded as RDF.
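
The shape of this pipeline can be sketched in Python. The sketch makes several simplifying assumptions that are not part of the project: a small in-memory skill list and gazetteer (`SKILLS`, `GAZETTEER`) stand in for the Wikifier, DBpedia and GeoNames; the coordinates are illustrative; the output uses schema.org and W3C WGS84 terms attached directly to the posting, rather than the actual Saro vocabulary; and the posting URI is a placeholder.

```python
import re

SCHEMA = "http://schema.org/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical stand-ins for the predefined skill list and a GeoNames lookup.
SKILLS = ["python", "statistics", "java", "machine learning"]
GAZETTEER = {"london": ("London", 51.5074, -0.1278)}  # illustrative coordinates


def extract_skills(text):
    """Return the listed skills that appear as whole words in the text."""
    lowered = text.lower()
    return [s for s in SKILLS if re.search(r"\b" + re.escape(s) + r"\b", lowered)]


def normalise_location(raw):
    """Map a raw location string to (name, latitude, longitude), or None."""
    return GAZETTEER.get(raw.strip().lower())


def literal(value):
    """Format a value as a plain N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(posting_uri, title, text, raw_location):
    """Encode one enriched posting as a list of N-Triples lines."""
    subject = f"<{posting_uri}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{SCHEMA}JobPosting> .",
        f"{subject} <{SCHEMA}title> {literal(title)} .",
    ]
    for skill in extract_skills(text):
        triples.append(f"{subject} <{SCHEMA}skills> {literal(skill)} .")
    place = normalise_location(raw_location)
    if place is not None:
        name, lat, lon = place
        triples.append(f"{subject} <{SCHEMA}jobLocation> {literal(name)} .")
        triples.append(f"{subject} <{GEO}lat> {literal(lat)} .")
        triples.append(f"{subject} <{GEO}long> {literal(lon)} .")
    return triples


triples = to_ntriples(
    "http://example.org/posting/1",  # placeholder URI
    "Data Scientist",
    "We need Python and statistics; JavaScript a plus.",
    " London ",
)
print("\n".join(triples))
```

Whole-word matching is used so that "JavaScript" in the sample text does not count as a mention of "java".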

As part of the EDSA project's commitment to release findings using open standards, we aim ultimately to make our data available as Linked Data. We currently host the enriched data in an RDF store (Jena Fuseki 1.1). The Data Browser provides sample queries to retrieve the enriched, annotated data describing job roles and the skills required in each.
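
As a rough illustration of what such a query might look like, the Python sketch below builds a SPARQL query for postings that mention a given skill and wraps it in a SPARQL protocol GET URL. Everything specific here is an assumption: the endpoint address is a placeholder for a local Fuseki instance, the function names are hypothetical, and the query uses schema.org properties, which may differ from the terms actually defined in Saro. The sketch only constructs the request; it does not send it.

```python
from urllib.parse import urlencode

# Placeholder address; the real endpoint is not given on this page.
ENDPOINT = "http://localhost:3030/edsa/query"


def build_query(skill, limit=10):
    """Build a SPARQL query for the most recent postings mentioning a skill."""
    safe = skill.replace("\\", "\\\\").replace('"', '\\"').lower()
    return f"""
PREFIX schema: <http://schema.org/>
SELECT ?posting ?title ?date
WHERE {{
  ?posting a schema:JobPosting ;
           schema:title ?title ;
           schema:datePosted ?date ;
           schema:skills ?skill .
  FILTER (LCASE(STR(?skill)) = "{safe}")
}}
ORDER BY DESC(?date)
LIMIT {int(limit)}
""".strip()


def request_url(skill, limit=10):
    """Wrap the query in a SPARQL protocol GET request URL."""
    return ENDPOINT + "?" + urlencode({"query": build_query(skill, limit)})


print(build_query("Python", limit=5))
print(request_url("Python", limit=5))
```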

Recommended Browsers

Other Browsers

Firefox and IE have some issues with interactive pages using JavaScript. If you must, use the most recent version of either to obtain workable interaction.

Contribute to the EDSA Skills Survey

Do you have data about data science jobs in Europe? Are you a job site, hosting such posts? Are you a practising Data Scientist or a policy or decision maker in an institution requiring data scientists? If so, we need your help!

To contribute your data or help us to extend and refine our initial set of Data Science skills, please complete our skills survey or contact us. More detail ...
