Data Hackathon at our Prague Lab

At Rossum, we just talked about how critical data is for machine learning and for us specifically. At the same time, we’d like to think we are doing some pretty exciting machine learning research in our Prague lab that we want to share.

Sometimes people come to us thinking “meh, invoices” at first, but we can convince them that the machine learning behind the financial data extraction is a game changer – for all documents out there. It’s not about accounting or fintech for Rossum’s long game – it’s about getting access to the knowledge any document contains.

We wanted to create an opportunity to showcase our tech, and for great developers to meet, work together and get to know other inspiring people. So, we are holding a hackathon! We invited machine learning researchers, Python wizards and our bright former students.

To be honest, we are a bit clueless about “hackathons” ourselves – the social aspect is most important to us. But we have a clear task that should be pretty stimulating and be a huge help to us at the same time. It’s about the data! One of our dataset sources is the internet, and we want to challenge you: find interesting ways to get invoices on the web (also called focused crawling).

Data Hackathon - Invoice scraper
Our current invoice scraper is codenamed “Hrabish”.

You’ll get all the bits we already have – the Bing search API, a download tool and an invoice classifier, as well as sample documents (the part of our dataset you don’t need an NDA for) for some quick machine learning. What next – clever hacks? Reinforcement learning? It’s up to you from there! But don’t worry, we’ll form teams of 4 people so you won’t be fighting alone. And there will be a prize.
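To make the task a bit more concrete, here is a minimal sketch of how those three pieces could be wired into a focused crawl loop. It is not how Hrabish actually works. The function focused_crawl and its parameters (search, download, classify, expand, threshold, budget) are hypothetical names made up for illustration, and the real Bing client, download tool and classifier would be plugged in as the callables. The idea shown is the usual focused-crawling one: queries that produced a high share of invoices get their follow-up queries explored first.

```python
import heapq
from typing import Callable, Iterable, List, Optional


def focused_crawl(
    seed_queries: Iterable[str],
    search: Callable[[str], List[str]],          # query -> candidate URLs (e.g. a search API wrapper)
    download: Callable[[str], Optional[bytes]],  # URL -> document bytes, or None on failure
    classify: Callable[[bytes], float],          # document -> invoice score in [0, 1]
    expand: Callable[[str, float], List[str]],   # (query, hit rate) -> follow-up queries
    threshold: float = 0.5,
    budget: int = 100,
) -> List[str]:
    """Return URLs whose documents score at or above `threshold`.

    All callables are placeholders (assumptions); nothing here touches the network.
    """
    # heapq is a min-heap, so priorities are negated: the most promising
    # query is popped first. Untried seed queries start out optimistic.
    frontier = [(-1.0, q) for q in seed_queries]
    heapq.heapify(frontier)
    seen_queries = {q for _, q in frontier}
    seen_urls = set()
    invoices = []
    downloads = 0

    while frontier and downloads < budget:
        _, query = heapq.heappop(frontier)
        hits = tried = 0
        for url in search(query):
            # Skip duplicates and stop spending once the download budget is used up.
            if url in seen_urls or downloads >= budget:
                continue
            seen_urls.add(url)
            downloads += 1
            doc = download(url)
            if doc is None:
                continue
            tried += 1
            if classify(doc) >= threshold:
                hits += 1
                invoices.append(url)

        # The share of invoices among fetched documents is the "focus" signal:
        # follow-up queries inherit it as their priority.
        hit_rate = hits / tried if tried else 0.0
        for follow_up in expand(query, hit_rate):
            if follow_up not in seen_queries:
                seen_queries.add(follow_up)
                heapq.heappush(frontier, (-hit_rate, follow_up))

    return invoices
```

The interesting part is left open on purpose: expand is where the clever hacks (or a learned policy) would go, for example by deriving new queries from the text of documents the classifier liked.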

The hackathon is on Friday, Jun 30 in Dejvice, Prague (near CTU) and starts at 17:00. We have only a few spots left! Is your Friday night still free? Drop us an email, but don’t wait too long!
