How a major bank wasted 1,000 hours troubleshooting what machine learning could have done in seconds

Posted by Ivar Sagemo


Vipps, a mobile payment app from Norway's largest bank, DnB, shut down last week. Launched in 2015, Vipps has quickly become a major payment solution with more than 1.9 million users.


According to a news article, DnB admits to spending about 1,000 hours troubleshooting the problem.

The problem turned out to be a system update from their payment gateway, Nets. Yes, you guessed it: the system update was not communicated to DnB, and the Nets engineer assigned to help with the troubleshooting was not aware of the upgrade either.

This is not a surprise. It's a common error.  

The problem is that organizations are not facing up to the consequences of increasingly connected software that communicates in real time, where society in general is held hostage by any downtime.

Organizations face significant losses from such downtime, in this case two days for 1.9 million users. DnB admits to spending 1,000 hours troubleshooting. The cost of 1,000 wasted hours of IT operations' and developers' time is of course significant, but the largest loss here is likely reputational damage for the bank, along with lost revenue for any organization relying on Vipps for payment acceptance.

This reminds me of transit time I spent at Arlanda Airport in Stockholm last year, where all electronic payment systems in the whole terminal were down for at least the 2.5 hours I waited for a flight. Swedish kronor in my wallet? Nope. The thousands of passengers in the terminal had no way of paying for their food, beverages, and tax-free items unless they had cash. For the merchants, that is lost revenue, and it will never return.

The same applies across the retail sector. Transactions are real-time, and buyers have little or no patience for payment delays. It does not matter whether it is a physical store or an online store: slow transaction processing leads to lost revenue and, potentially, customers lost in the long term.

So, with these consequences in mind:

  • Do organizations not understand the vulnerability of their systems and their reliance on other parts of the value chain (read: other connected systems)? In my experience, the IT teams do.
  • Do management and line-of-business owners understand? Probably.
  • Is there a knowledge gap between tech and business? Definitely.
  • Is tech capable of, and interested in, communicating its needs and vulnerabilities to the business? Most often, no.

At AIMS we meet with several medium to very large organizations every week and discuss challenges around core software systems and the vulnerability of their dependencies and integrations with other internal or external applications.

So let's face it: the number of applications that are integrated today (and rely on real-time, low-latency communication) is so large that it is hard for anyone to even contemplate the potential consequences if the weakest link breaks. Across finance, insurance, healthcare, retail, supply chain, logistics, transportation, security, and public services, the number of applications is exploding, and with digitization they are increasingly connected, coexisting with real-time dependencies.

Historically, the solution has been cumbersome, manual monitoring (people or software), but with today's extent of integration and real-time communication, manual processes and people are bound to fail. The vulnerability, and the consequences of failure to society in general, are so severe that we owe it to each other to implement processes and systems that enable prevention before users are impacted.

The only solution to this challenge is to leverage software that continuously works to identify anomalies early, using big data analytics, machine learning, and sophisticated anomaly detection.
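To make the idea concrete, here is a minimal sketch of automated anomaly detection (not AIMS's actual product, and far simpler than a production system): learn a metric's normal behaviour from recent history and flag deviations the moment they appear, rather than waiting for humans to notice. The metric name and numbers below are invented for illustration; this uses a simple rolling z-score on a stream of payment-latency samples.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(samples, window=20, threshold=3.0):
    """Flag samples that deviate more than `threshold` sample standard
    deviations from the mean of the previous `window` samples."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append((i, value))
        history.append(value)  # the baseline adapts as new samples arrive
    return anomalies

# Hypothetical scenario: latencies hover around 100 ms, then spike
# after an uncommunicated upstream change.
normal = [100 + (i % 5) for i in range(40)]
spiked = normal + [450, 470, 460]
print(detect_anomalies(spiked))  # the spike is flagged immediately
```

A detector like this raises an alert on the first bad sample, in milliseconds, which is exactly the gap between "1,000 hours of manual troubleshooting" and "seconds". Real systems replace the z-score with richer models that handle seasonality, trends, and correlated metrics.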
