Big Data or Big Mess?
Big Data, machine learning, AI: the hype words that pull you into the magic circle of modern technology. Everyone wants it, everyone wants the possibilities, the growth, the kickstart that a good amount of data and smartly designed software built around it can give you.
But then why do 87% of data analytics, big data and AI projects fail? Aside from organizational and cooperation-related problems (more info here), there is more behind this strikingly high failure rate.
Is your data good data?
You could ask: are good data and bad data actually a thing? Well, maybe not in themselves, but data without context is worthless (read more in our earlier blog post here). Building the wrong relations and connections, or taking the wrong approach and drawing the wrong conclusions, can turn your data into bad data.
Like in The Big Bang Theory, when Leonard, Howard and Raj find the notebooks of a late physicist, Professor Abbot, filled with hundreds of pages of seemingly random numbers without any notes or explanation. It seems like something important, maybe his life’s work. Exciting, isn’t it? Then they realize it is in fact his daily calorie intake diary. Such a bummer...
You can’t know whether your data is valuable if you don’t have the context. You have to know what the data is about, otherwise you might end up building a model to predict some long-dead professor’s calorie intake...
Seemed like a good idea...
There are a lot of projects that fail because they seemed like a good idea, but in reality were completely worthless.
A company with relatively high employee turnover decided to up their "HR game". They started researching the signs of people resigning, trying to predict which employees would leave the company soon. They tracked measurable data, like the number of years employees had worked for the company, their commuting distance, salary, overtime and sick leave, and some hardly quantifiable data like their engagement with the company. They were looking for detectable connections that would make their employees’ next move predictable.
While in some cases collecting data and finding connections simply turns out not to be viable in production, that was not the problem here. The issue was a wrong understanding of what data is and what data science and statistics can do for you.
The company was actually planning to use the knowledge they gained in their evaluation and hiring process. The idea was to evaluate new candidates on their likelihood of leaving, and only hire applicants whose likelihood of leaving the company in the first 3 years was below a certain threshold.
...but it simply does not work that way!
Now this sounds great, doesn’t it? Just add in a couple of variables about your new candidates, and the software will spit out the probability of them becoming a loyal, long-term employee.
Is it doable? From a programming point of view, with quantifiable variables, it is no magic to create a program like that. But will it work? The answer is a straight and obvious no.
Besides the fact that collecting data like this is questionable at best with regard to employee privacy, it does not make sense from a business perspective. If we think deeply about why anyone would want to quit their job, the factors are many, and they are neither predictable nor measurable to begin with. How someone will respond to workplace dynamics, or to personal issues they might encounter, is not quantifiable and cannot be predicted even with the utmost care.
And beware of the self-fulfilling prophecy! A simple notification to a manager that a certain employee is likely to leave the company in the near future, based on these parameters, will itself affect the workplace dynamics, and might bring about exactly what one wanted to prevent.
The idea completely ignores the single most important reason why some people stay at a workplace even when underpaid or working extreme hours, and why others with a generous salary or personalized benefits will still leave. And that is the human factor (read more here).
Causes or consequences – did you draw the right conclusions?
But let’s assume that you are able to quantify and take into consideration every human factor. That is probably mission impossible, but sticking to the above example, let’s assume it for a second and dig deeper into why the idea is inherently wrong.
The software would work with a vast number of variables that are not necessarily causes, but most likely consequences, of an employee leaving the company. Working less overtime or going on sick leave, among other factors, can actually be consequences of a decision already made: our employee has already decided to leave and, preparing for the change, is looking for another job, and so on.
The algorithm does not predict, it just tells us what is already happening. We are too late to do anything about this employee leaving. That is how wrong connections, or relations that only seem to exist, can make your data bad data, and lead to wrong conclusions and a wrong business strategy.
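To make the timing problem concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the monthly overtime figures, the resignation date and the helper names are assumptions, not data from the company in the example. It only shows that a signal such as "overtime drops" can be absent from the data you would have had early enough to act on.

```python
from datetime import date

# Hypothetical monthly observations for one employee: (month, overtime hours).
observations = [
    (date(2023, 1, 1), 12),
    (date(2023, 2, 1), 11),
    (date(2023, 3, 1), 13),
    (date(2023, 4, 1), 4),  # overtime drops once the decision to leave is made
    (date(2023, 5, 1), 2),
]
resignation = date(2023, 6, 1)  # hypothetical resignation date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def overtime_known_ahead(observations, reference, lead_months):
    """Overtime values recorded at least `lead_months` before `reference`."""
    return [hours for month, hours in observations
            if months_between(month, reference) >= lead_months]


# What we would have known three months ahead: no visible drop at all.
early_view = overtime_known_ahead(observations, resignation, lead_months=3)

# What we know one month ahead: the drop is there, but so is the decision.
late_view = overtime_known_ahead(observations, resignation, lead_months=1)

print(early_view)  # [12, 11, 13]
print(late_view)   # [12, 11, 13, 4, 2]
```

In this toy data the drop only shows up in the last two months, so a model trained on it would be describing a departure that is already under way rather than predicting one.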
Machine learning and its limits
Last but not least, even if you get every single part of your use case right: we mentioned it in our previous blog post, but it cannot be emphasized enough. Machine learning can be a goal, but it is very rarely the first step of data analytics. The reason is simple: you need a huge, and I mean really huge, amount of data for your software to find working relations, rule out insignificant factors, account for unique cases and extremes, and create predictions that give you useful and trustworthy information.
Having a machine learn relations and connections and create predictions is not so different from having economists and researchers conduct statistical analysis. You need an unbiased, representative sample to reach a conclusion, and to avoid a biased sample you need to know your population. In our example, even if we ignore all other issues, reviewing the actions of 500 employees is simply not enough to get clean data that can serve as the basis of any sort of prediction.
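A rough back-of-the-envelope calculation shows why a few hundred records run out quickly. The sketch below uses the standard normal-approximation margin of error for a proportion. The 10% leaving rate and the subgroup size of 50 are assumptions picked purely for illustration, and the approximation itself gets shaky for groups this small.

```python
import math


def margin_of_error(rate, sample_size, z=1.96):
    """Approximate 95% margin of error for an observed proportion."""
    return z * math.sqrt(rate * (1 - rate) / sample_size)


assumed_rate = 0.10  # hypothetical share of employees who leave

# All 500 employees together.
whole_company = margin_of_error(assumed_rate, 500)

# One subgroup after slicing by a few variables, e.g. 50 people with a
# long commute and a given salary band (hypothetical size).
subgroup = margin_of_error(assumed_rate, 50)

print(f"500 employees: +/- {whole_company:.1%}")
print(f" 50 employees: +/- {subgroup:.1%}")
```

Under these assumptions the estimate for the whole company is uncertain by a few percentage points, while the estimate for a 50-person subgroup is uncertain by almost as much as the rate itself. Every extra variable you slice by shrinks the groups further.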
For a kickstart, most companies don’t need machine learning right away: synchronizing and centralizing your data and automating just some of your daily tasks can already be a huge step in the right direction.
This is how small but fast companies are sometimes able to steal market share from well-known giants: they already know that there is no need to reinvent the wheel to gain momentum.
Being a teenager in the 90s, with the internet, mobile phones and all sorts of exciting new technologies coming up, it was hard not to be enthusiastic about Computer Science!
After finishing my degree (Computer Science and Networks), I took a detour, working as a consultant, busy with the soft side of IT (writing design documents and requirements, sketching business and IT processes, that sort of thing). After a while I realized that I wanted to go back to the core of technology and changed focus towards more technical and code-related work.
The real joy in coding for me is creating something out of your brain and seeing it come to life, something most programmers would recognize. In addition to the creative thinking and joy of creation, computers follow straightforward logical patterns: what you code is what you get. Refreshing compared to working with humans (not that I have misanthropic tendencies, but hey, everyone will agree, people can be difficult ;) ).
I am result-driven, inquisitive, and professional, always curious, and ready to learn new technologies as well as dive deeper into what I already know. I love challenges and new discoveries, which makes my field as a professional really broad. I do not like to fit in a box and am eager to always keep learning! So, Java bytecode, deep learning, graph databases, SQL, high-level architecture, ontologies, web technologies, security, networking, bring it on! :)
I am one of the owners, Cuurios is my baby. As a child of the first internet bubble, of the Amazon and the Google era, being an entrepreneur has always been one of my deepest dreams. You can see it as an extension of the coder’s creative drive. Thinking out, designing, and developing a visionary product and bringing it to market, with full control and total responsibility. Daunting, yet exhilarating!
As owner and lead developer, Cuurios gives me the possibility to give direction to the work according to my own ideas and to where I want to go. It also gives me the freedom of setting my own agenda and being able to work as I wish – probably every programmer dreams about that at some point.
In addition to that, I believe Cuurios does not only hold value for the owners or the employees. We aspire to deliver real value to our customers, always helping them to get the most out of their experience with us: not just with our products, but also at every level of our communication and every step of working together. We are very proactive, and we tell it how it is: you can expect no BS from us!
After a first month as part of the YES!Delft investor readiness program, it is time to share some thoughts on the process, what we’ve learned so far and how it impacted us.
DISCLAIMER: This light-hearted description is only very loosely based on reality and should definitely not be taken at face value :-)
Like most, we started this journey as enthusiastic entrepreneurs, completely sold on our idea, focused on how great it is and all the cool stuff we have in petto, the myriad of functionality it will provide and how it will revolutionize the market. We were pretty confident, sure of our game.
Then we had 1 minute to pitch.
- Everybody understood we had something great, nobody could quite get what it was all about.
Investors are from Mars
That would be the first thing we learned. Pitching to an investor (or a prospective client) is not about YOU, it’s about THEM.
Investors speak the language of TAM, SAM, SOM, ROI, Churn, Business Models, Valuation, EBITDA, Cash Flows... Learning the investor’s lingo was a daunting task, like learning a second language full of acronyms and numbers. Frustrating at the beginning, very enlightening in the end. It really helped us get our story straight and forced us to do the math. Cause you’ve got to do the math. You can’t stay forever in the fluffy stage.
With the first hurdle passed, it was not the end of it. We had some numbers, great, but what was it again that we sold? And, most importantly, to whom?
- Ok, you’ve got 30 seconds.
Hmm, yes, you know, we make software that does stuff with data, and it’s super cool.
At this point we knew we had to do something about that elevator pitch.
This is where the WOW method comes in. A simple method to get your ideas straight and write an elevator pitch with a true WOW factor (yes, it is in the name).
Suddenly, it all made sense. This is the story we want to tell, these are our customers, and this is what their pain is.
Elevator pitch much improved, still some way to go, but at least most people now understand what we are trying to achieve.
No free lunch
Now that we’ve got a good enough elevator pitch, and some numbers, time for what we’ve all come for: Show me the money!
Again, a good learning experience: there is no such thing as a free lunch in this world. Investors are not charities, they’re in for the big bucks! And how do you convince them that you are the next Google when you are a fledgling start-up?
- Get some customers on board, build traction, show you’ve got it!
But wait a sec, to build any sort of real traction we first need to finish our product! And wasn’t raising money supposed to help get us there? It seems a bit like a chicken-and-egg story, doesn’t it?
Welcome to the real world, people!
Fortunately there are some options, such as government grants and loans, to help you get started and get you through this first stage of your development, without of course forgetting the most common investment of all: some good old-fashioned hard work!
So far, a very valuable program, it has helped us to sharpen our message, build up our case, and get us ready for the next stage!
Part of my final year of International Business at The Hague University of Applied Sciences is to do an internship. During this engrossing year I was lucky enough to do my internship here at Cuurios. I have followed almost every business-related subject during the last 3 years at my university, ranging from human resources to finance to operations management. I always missed a really tech-related subject, which is quite important nowadays. I’m really happy I can fill in that missing piece here at Cuurios.
I applied to do my internship at Cuurios because it is a relatively young, still small sized but growing company. I like this size during this internship because even though I am an intern at Cuurios I am also part of the team and I am really responsible for the tasks assigned to me. Although the pandemic has made us work from home most of the time, I am still able to get proper guidance and plenty of sparring opportunities.
Whilst I’m doing my internship here, Cuurios is participating in the Investor Readiness program by YES!Delft. It is a really interesting program where I get the opportunity to translate knowledge gained in my studies into reality. I help Cuurios prepare for an acceleration of their business, which ranges from a good company value proposition and pitch to financial planning and a business plan. The tasks are diverse and I’m able to work however I feel best suited while still being coached and questioned, which I find is working perfectly.
For me, programming came in late. I wanted to be a lawyer, and I had graduated high school with all hopes of studying law, but a light shone and a voice called out to me: “Michael, study Computer Science instead”. I honoured the call and started preparing to get admission into university to study Computer Science. Thankfully, I got in.
In my first year (2012), I was introduced to the art of programming.
The idea of me building something for people to use was similar to being given a magic wand, which felt very good. I started experimenting with Visual Basic, the drag and drop system helped me easily visualize my ideas.
Year after year, I delved deeper, building applications for friends and small organizations. Everything changed when I was paid to build an application in my third year, a holy grail was given to me. I didn’t know people would pay you for what you enjoy doing most. It was an eye-opener.
To me, programming is not a task, but a hobby, and creating things is wired in my core. I became a frontend web developer because it’s the closest programmable bit to the user (had not discovered Product Design at that time) and I enjoy that feeling of being able to engineer experiences for users whilst controlling what they see and how they use the application as a whole.
As humans, it’s pure happiness to see people follow you. In programming, it’s the same feeling, if not more when you see metrics of the people that depend on what you build. I like the influence, though little, to control how people carry out their daily important business, leisure or personal tasks using my applications.
I joined Cuurios in October 2018; a very good decision I must say. I applied because I wanted to learn how things are done in other companies, and Cuurios’ “Data to Actions” tagline sounded like a place that would boost my programming knowledge and nudge me to code more complex applications.
At the very beginning, my first project gave me sleepless nights, as I didn’t understand most of the application. I bought whiteboards and started breaking down the project to understand the whole, quite complex system. Now, however, I have a better grasp of working on complex systems, and my frontend skills have improved dramatically. The best decision so far. I feel my role at Cuurios is important (very much so to me): I control how and what the customer sees. You need a very keen eye for design to do this, and Cuurios has enabled me to perform this art efficiently, even using my little Product Design skill. Although I cannot single-handedly add a button anywhere I like, I can make sure the button sits where it can be easily accessed.
At Cuurios, every ticket is like a HackerRank question, especially when it comes from Leen (COO). Sometimes I have to read and re-read a ticket to digest the problem and think of a suitable solution, which has improved my problem-solving ability. I ask a lot of questions, and that has helped me grow. In addition to that, Gaetan’s (CTO) experience has made me a better programmer. I take time to study the codebases of the applications built. (When you learn from the best, you become like them.)
I also wonder sometimes how Leen does it, being everywhere from a business standpoint. I’ve learnt from him that you need to understand the customers’ requests inside out.
I believe Cuurios is a place to be to sky-rocket your career and build fantastic projects, and most importantly everyone at Cuurios is human.
One of the common issues we face when developing applications for industrial customers is accurately representing their domain.
A domain is the set of assets, equipment, departments and systems that make up the whole of a company’s operations. Organizations usually have fine-grained definitions of who should be responsible for managing a specific asset, which department resupplies systems, etc.
A software system should integrate with an organization’s structure and enable its operations. More often than not, software systems achieve the opposite, requiring organizations to fit their processes and structure into the system’s own rigid asset structure.
While promoting standardization, this approach stifles organizational innovation. It leads to faulty and incomplete domain representation, as assets are recorded not as they are but as they fit the system, or are not recorded at all. In the end, many systems end up being hacked by system integrators or in-house teams to make them fit, or a custom solution is developed.
Very often these constraints are driven by technological limitations: SQL databases (still the norm for most industrial applications) require a rigid structure to be performant.
We think that domains can best be described using graphs. A graph is a representation of information using nodes (vertices) and links (edges).
The following example should help to shed some light on the concept, and explain why we think it is such a great fit for representing domains:
- Company A operates a small plant.
- The plant has systems, composed of equipment or sub-systems. A piece of equipment or a sub-system can be shared by multiple parent systems.
- At the same time, each piece of equipment is linked to 2 departments: the maintenance department and the production department.
- Each piece of equipment also has a supervisor, a specific person, and a back-up supervisor pool, a group of people who can be called upon if the supervisor is unavailable.
You can see how this quickly becomes complex when designed in the traditional fashion, leading to inflexible implementations.
Now look at how we could implement this using a graph:
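One way to picture the plant example in code is the minimal Python sketch below. It is an illustration only, not our actual implementation: the DomainGraph class, the relation names (HAS_PART, SUPERVISED_BY and so on) and the people and equipment names are all made up for the example, and a real system would use a proper graph database rather than in-memory dictionaries.

```python
from collections import defaultdict


class DomainGraph:
    """A tiny in-memory graph: typed nodes connected by named links."""

    def __init__(self):
        self.nodes = {}                 # node id -> kind, e.g. "equipment"
        self.links = defaultdict(list)  # source id -> [(relation, target id)]

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def add_link(self, source, relation, target):
        for node_id in (source, target):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.links[source].append((relation, target))

    def targets(self, source, relation):
        """Nodes reached from `source` over links named `relation`."""
        return [t for r, t in self.links.get(source, []) if r == relation]

    def sources(self, relation, target):
        """Nodes that point at `target` over links named `relation`."""
        return [s for s, out in self.links.items()
                for r, t in out if r == relation and t == target]


g = DomainGraph()
for node_id, kind in [
    ("plant", "plant"),
    ("cooling_system", "system"), ("packaging_line", "system"),
    ("pump_1", "equipment"),
    ("maintenance", "department"), ("production", "department"),
    ("alice", "person"), ("bob", "person"), ("carol", "person"),
    ("pump_1_backup_pool", "pool"),
]:
    g.add_node(node_id, kind)

g.add_link("plant", "HAS_SYSTEM", "cooling_system")
g.add_link("plant", "HAS_SYSTEM", "packaging_line")
# The same pump is shared by two parent systems: just two links.
g.add_link("cooling_system", "HAS_PART", "pump_1")
g.add_link("packaging_line", "HAS_PART", "pump_1")
# Each piece of equipment is linked to two departments.
g.add_link("pump_1", "MAINTAINED_BY", "maintenance")
g.add_link("pump_1", "OPERATED_BY", "production")
# One supervisor plus a back-up pool of people.
g.add_link("pump_1", "SUPERVISED_BY", "alice")
g.add_link("pump_1", "HAS_BACKUP_POOL", "pump_1_backup_pool")
g.add_link("pump_1_backup_pool", "HAS_MEMBER", "bob")
g.add_link("pump_1_backup_pool", "HAS_MEMBER", "carol")

parents = g.sources("HAS_PART", "pump_1")
backups = [person
           for pool in g.targets("pump_1", "HAS_BACKUP_POOL")
           for person in g.targets(pool, "HAS_MEMBER")]

print(parents)  # ['cooling_system', 'packaging_line']
print(backups)  # ['bob', 'carol']
```

Note that nothing in the structure had to change to support shared equipment or a back-up pool: each new kind of relationship is just another named link, which is the flexibility a rigid table layout struggles to offer.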