IFS Digitalization CollABorative: Think Tank - Data/Cybersecurity with Vaibs Kumar
Date of Meeting: 20 February 2025 10:00 AM US Eastern Standard Time
Vaibs Kumar Background:
- SVP of Technology at IFS
- Been at IFS for just over 1 year
A conversation with Vaibs Kumar:
- [Vaibs] I run the technology group within the R&D function; that's about 1/3 of R&D, and it is where we do the hardcore engineering. My group builds the base platform: the application framework, the AI stack, the data stack, and the integration elements. We provide that framework and platform to our application developers, the rest of R&D, who then develop all of the business logic very rapidly.
- We have a very broad suite of applications that also go very deep, and we do that with only 2,000 people in R&D, compared to some of our competitors who have 30,000 plus. The secret behind that is effectively the framework-based development we do on the application side. It's my team that owns and maintains that platform and framework and all the richness around it, whether it's AI, data, and so on.
- I have worked in tech for over 15 years. Before I joined IFS as a software company, I was on the other side of the table, where you are: I worked for Shell, the oil and gas company, for 14 years, so that's the only other company I've worked for. After a few very good years at Shell I realized that I'm really a software person at heart, hence I made the switch over to IFS, and I'm really enjoying it.
- I think part of my heritage and where I come from helps, because when Sarah came and asked me to talk about data and cybersecurity, the first thing I asked was: data and cybersecurity from the perspective of how we secure the product we provide, or from a customer's perspective? So it's a pleasure for me to think back to how I looked at data and cybersecurity when I was at Shell, and I can probably share some nuggets from there; in fact, I have some pointed views. My last role before I left Shell and joined IFS was chief architect for the renewable energy group, and we did some things there that were very pertinent to data, so I have a few thoughts. I have not prepared any charts; maybe that's unusual, or maybe it will actually help me make a few topical points so we can have a good discussion. Equally, I'm very happy to follow the energy of the room: if you want to talk about AI, or software development in general, anything goes. Data is really the underpinning factor in all of these things, so it can all be related.
- Maybe a couple of salient points. AI is topical, and AI does have an impact on your data strategy and your cybersecurity strategy, but the core notion is that data is really the fuel for AI. I'm sure you realize it's very relevant to you: even in a world of Gen AI, where large language models have been trained on the corpus of all the publicly available data you can find on the Internet, your data is still very relevant at the end of the day. You still want to ground those large language models on your data. So getting hold of your data, and treating it as an asset, is extremely important and a foundational element of your ability to create value from it. Creating value from that data can be as simple as an analytical insight on a dashboard, or as complex as the most advanced AI model you train on that data, which gives you a very advanced analytical insight.
- The traditional way data has been treated from a technology standpoint is that data sits within systems. The real gold enterprises have is the structured data sitting within systems; that's the fuel that powers all the business processes in your enterprise. Typically you use a system for certain sub-processes, and in an enterprise landscape you end up integrating systems, passing data from one system to another to effectively stitch up a business process. Traditionally we said: the operational business processes are supported by data within these systems and the integrations across them, and then you export that data onto a data platform, bring it all together in one place, and build advanced analytical layers on top, which you visualize or feed into your AI models. That has generally been the traditional paradigm: the operational paradigm, where data sitting within your systems is integrated to support operational business processes, and the analytical paradigm, which is all about exporting data out into big data technologies, data lakes, data warehouses, and so on, and applying analytical engines on top.
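As a minimal illustration of that traditional export paradigm, the sketch below batch-copies a table out of an operational system into a flat file for a separate analytical store. The orders table, schema, and file name are hypothetical, not IFS specifics.

```python
# Sketch of the traditional analytical paradigm: batch-export operational
# data out of a system into a separate analytical store. Table and file
# names are hypothetical, for illustration only.
import csv
import sqlite3

# An in-memory SQLite database stands in for the operational system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ACME", 120.0), (2, "Globex", 75.5)],
)

# Nightly batch job: dump the table into the "data platform" as a flat file.
# This is the copy that drifts out of date when the business process changes.
with open("orders_export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "customer", "total"])
    writer.writerows(source.execute("SELECT id, customer, total FROM orders"))
```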
- What tends to happen here in my experience, and this was the case in Shell, is that you create teams that build the data platform and the data warehouse, bring the data together, and then try to build up latent knowledge of how that data joins up to create analytical insight, which is good at the start. But eventually those data platforms become very disconnected. Why? Because your business does not sit still. Business changes happen, and when they do, you make changes in your systems and in your operational integrations because business processes change, but those changes don't flow out into your data platform. So over time your data platform becomes disconnected. The value of data lies only in how much trust it creates, or how trustworthy it is. As soon as your data becomes old and doesn't live up to your operational world, people lose trust in it, and then you start to see shadow analytics: people export data from the system into Excel, build their own little things, and say, well, that's really the data, that's really the truth, not the stuff I'm getting from the data platform. This is a pattern I have seen with customers now at IFS when we talk about data problems, but it was equally my experience at Shell.
- So perhaps the one contrarian but very important point I want to make today around data, if you really want to start treating data as an asset, is to think about architectures and patterns where operational integration between systems is not merely a concern of an integration platform that stitches data up (it comes from here, goes there, comes back, goes here) and orchestrates a business process by just passing data between systems. Instead, treat the integration between operational systems as the very key plumbing that defines your business, and treat the data traversing that plumbing for your operational business processes as the source from which you pull data on a continuous basis for your analytics. Change the pattern of treating the operational paradigm and the analytical paradigm as two completely separate things, and instead look at how you take the integration you're building within your operational landscape, which you need to do anyway, and plug your analytics right on top of it. That is also the new pattern you see in the industry, where data lake engines, Databricks being a very popular platform, allow you to do exactly this: tap into your operational integration so data does not become old over time.
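The pattern Vaibs describes can be sketched minimally as a dual-write: every message that traverses the operational integration is also appended to the lake, so the analytical copy cannot drift. The handler, events, and in-memory lake below are hypothetical stand-ins; in practice this would be an event bus or integration platform feeding a lake engine such as Databricks.

```python
# Sketch: treat the operational integration as the single source for analytics.
# Each message is forwarded to the target system AND appended to the lake,
# so the analytical copy can never drift from the operational flow.
# Handlers and the events list are hypothetical stand-ins.
import json
from typing import Callable

lake: list[dict] = []  # stands in for a data lake / lakehouse table

def deliver_to_target(event: dict) -> None:
    """Stand-in for the operational target system (e.g. ERP to CRM)."""
    print(f"delivered {event['type']} to target system")

def integrate(event: dict, target: Callable[[dict], None]) -> None:
    target(event)                                # operational business process
    lake.append(json.loads(json.dumps(event)))   # analytics tap: immutable copy

for event in [{"type": "order.created", "id": 1},
              {"type": "order.shipped", "id": 1}]:
    integrate(event, deliver_to_target)

print(f"{len(lake)} events available for analytics")
```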
- If you also look at it from a cybersecurity perspective, one of the key safeguards ensuring that people only get as much access to data as their privileges allow within operational systems is the permissions model: the security model codified within systems, where you grant the right roles, privileges, and permissions to specific users.
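A minimal sketch of such a codified permissions model, assuming hypothetical roles, tables, and rows: access is resolved from role grants at read time, which is exactly the model a data lake export would otherwise have to recreate.

```python
# Sketch of a role-based permissions model codified in the operational
# system: a user only sees the rows their role grants. Roles, tables and
# rows are hypothetical.
ROLE_GRANTS = {
    "finance": {"invoices", "orders"},
    "warehouse": {"orders"},
}

ROWS = [
    {"table": "invoices", "id": 1, "amount": 900.0},
    {"table": "orders", "id": 7, "status": "shipped"},
]

def visible_rows(role: str) -> list[dict]:
    """Return only the rows the role's grants allow it to read."""
    allowed = ROLE_GRANTS.get(role, set())
    return [row for row in ROWS if row["table"] in allowed]

print(visible_rows("warehouse"))  # -> only the orders row
```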
- When you start to take data out of systems and put it into data lakes, you basically have to recreate that permissions model, and that is a big overhead: maintaining two different permissions models for data access, confidentiality, and security in two different places. Being able to pull data from those integrations instead means you do not have to manage those permissions and security models in two places. There are a few different patterns you see in the industry: the Lambda pattern, which treats data as fast moving versus slow moving, with different stacks or technologies supporting each, and the Kappa pattern, which says data is data, whether slow or fast moving, and uses one stack to treat it. Those are effective architectural patterns to think through as you look at how to really tap into all the data you have, create meaning out of it, and build technology stacks and data platforms that can live beyond just the first few years after you create them. So that's perhaps the first big top-of-mind nugget worth sharing.
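A minimal sketch of the Kappa idea described above, with hypothetical events: one processing function serves both replayed history and live data, instead of the Lambda pattern's separate batch and speed stacks.

```python
# Sketch of the Kappa pattern: "data is data" - a single processing
# function handles both replayed history and live events, instead of
# maintaining separate batch (slow) and speed (fast) stacks as in Lambda.
from collections import Counter

totals: Counter = Counter()

def process(event: dict) -> None:
    """The one processing path, whatever the event's age."""
    totals[event["customer"]] += event["amount"]

history = [{"customer": "ACME", "amount": 100.0}]   # replayed from the log
live = [{"customer": "ACME", "amount": 20.0}]       # arriving right now

for event in history + live:   # same code path for slow and fast data
    process(event)

print(totals)  # Counter({'ACME': 120.0})
```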
- And then in the world of cybersecurity, I think what's particularly top of mind for you is the proliferation of Gen AI, and the ability to ensure that Gen AI is useful for you. That means you have to ground Gen AI with your data. But then how do you make sure your data is not leaked out? How do you make sure Gen AI is not using your data to give insights to others? And equally, how do you make sure you're only giving it as much data as you need, and giving it data in such a way that the different users of Gen AI in your organization can ground the large language model yet still access only the data, or the insight from the data, that they are allowed to access? We can talk about how that is enabled, but again it goes back to the operational integration: pulling from it and grounding LLMs on that, rather than exporting data sets. So that's another cybersecurity concern you should have. Aside from that, there's human education we need to be doing around people's use of the information assets that enterprises provide. Social engineering is a big thing, and in the world of Gen AI the amount of social engineering you can do is increasing, so that's a growing risk. We're going to have to educate our workforce to understand what deepfakes are and to recognize them, to keep the information assets of the enterprise safe. There's also the whole zero-trust movement: if you're building software within the enterprise using popular microservices patterns, you do it in a way that there is zero trust between those services. So there are a number of cybersecurity elements one can consider.
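One way to enable the grounding Vaibs mentions is permission-aware retrieval: filter documents by the caller's roles before anything reaches the model, so each user can only ground the LLM on data they are allowed to see. The document store, role tags, and call_llm stub below are hypothetical.

```python
# Sketch: ground a Gen AI model only on data the requesting user may
# access, by filtering at retrieval time. Documents, role tags and the
# call_llm stub are hypothetical stand-ins.
DOCUMENTS = [
    {"text": "Q3 revenue was 4.2M", "allowed_roles": {"finance"}},
    {"text": "Warehouse picks/hour: 38", "allowed_roles": {"warehouse", "finance"}},
]

def retrieve(query: str, user_roles: set[str]) -> list[str]:
    """Return only documents the user's roles permit; relevance scoring omitted."""
    return [d["text"] for d in DOCUMENTS if d["allowed_roles"] & user_roles]

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return f"answer grounded on: {prompt!r}"

context = retrieve("how is the business doing?", user_roles={"warehouse"})
print(call_llm("\n".join(context)))  # grounded only on warehouse-visible data
```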
Questions / Answers / Feedback / Responses:
- Q: From past experience, there needs to be a lot of buy-in as we create more and more data lakes or digitalization, like you said. That also means getting the business to accept some of those data security or cybersecurity scenarios, which brings in some level of change management. So how do you get complete buy-in? A good number of us are already using Copilot and making it work. Anonymization of some of those tasks has spread through our enterprise, with departments creating silos, now security silos. So who has access? There are going to be layers of security access, and sometimes those layers can be huge as you give people access. How do you create that buy-in? Are there key lessons learned, or past stories and scenarios, that one can easily pick up from?
- A: That's a very interesting question. It made me reflect a little on some of the challenges we had. In a big company like Shell, you have vast amounts of data stored in all different places. So the inherent risks of people having more access than they need, or of the same data having 15 different versions with people not knowing what the source of truth is, those were very common problems we used to deal with. How do you make sure there is always a social aspect around the data being shared? How do you know who produced that data? How do you know who to connect with if you want more explanation of it? Particularly when you talk about spreadsheets stored all over the place, on SharePoint, or shared on OneDrive that you then happen to have access to. The general tendency we all have is to believe numbers within systems; there is an intrinsic trust we build that the system we use holds the truth on certain types of data. So that's another way to promote the use of data from the source far more than the copies that get made. And finally, particularly in the context of Gen AI and people using copilots, you can't prevent people from using tools that make them productive; that's going to happen. But one thing worth considering is that, for example, Microsoft Copilot for enterprise has features ensuring that the documents and data you throw at the copilot do not actually get outside the corporate domain, so that data does not end up in the corpus of the reinforcement learning these LLMs do. Those are good means to protect against data leakage at the perimeter level, by using the right type of copilot services versus others.
- Q: One of the points that gets brought up a lot in conversations with business leaders around AI, which to your point at the beginning is on everyone's minds: how do we make the best use of it? How do we mitigate risk? Are we ready to use it in its more sophisticated forms? One of the biggest questions is, are we ready from a data perspective? There seem to be two very different schools of thought. One is more conservative, if you will: we have a lot of work to do on our data before we can be ready to use AI better in our business. On the far other hand: let's do the minimum we need to start using it, and then figure out more as we go. So my question is, from a data perspective, what advice do you have for organizations on how to know their AI readiness? Are there key things you absolutely have to accomplish first, and other things you can do in progress? In the debate between how much work you do up front before going all in versus how much you figure out as you go along, is there certain criteria, almost like a checklist? For example: as long as you have this, this, and this, you can get started, even though you might need to refine and continually improve.
- A: Actually, there is a checklist that exists. I might send it to you so you can share it with the community after the call. We have a thriving partnership with MIT CISR, and they have this concept of a real-time business. They say it very elegantly: a real-time business is built on strong data foundations. To that extent they have a bit of a flywheel covering the different aspects of data that define those foundations, and it serves as a very good checklist for organizations to think about their maturity. If you try to do AI on top of weak data foundations, you will only have pockets of AI and pockets of value. Is that bad? Maybe not; even pockets of value create some value. But they've done empirical research showing that businesses that became real-time businesses, with strong data foundations, were able to make materially better profits, use data as an asset, feed better analytics and better AI, and hence compete in a much stronger way in the market. So invest in those data foundations, and invest right. As far as I'm concerned, that means knowing very well that you can potentially outsource the technologies and their procurement, but not forgetting that the data that goes into those technologies is still your data. It's almost like buying software: don't think the software will solve your problem by itself. You buy the software, but you still need to build the protocols for what data goes in, how that data is kept truthful and trustworthy, and how it is used. Don't think of your data problem as someone else's problem, or as a software provider's problem. Investing in building those strong data foundations is the only way you make AI or BI scale.
- F: That was my thinking. If you go back a couple of years, when the topic of AI first started really trending, you had people at conferences or in conversations who were just really jazzed: OK, let's buy this tool, let's do this thing. But to your point, the technology is very capable and sophisticated, yet it isn't just about investing in the technology; you have to have those foundations to work from. That was my thinking: what's the division of responsibility, in a way, between what an organization needs to do to prepare internally versus what benefit comes from making the investment in the technology itself?
- R: Yeah, and there are of course benefits to buying technology. Buying technology makes your life easier, but it doesn't give you a turnkey solution to strong data foundations, particularly if, in an enterprise landscape, you're using more than one system, which inevitably you do. Unless you're a very small company that wants to use one system for everything, you generally end up with a landscape of multiple systems, and the challenge comes when data needs to be passed between those systems to stitch up the complete view of the world. The logic those systems represent and the business processes they support are all good and great; that's what you get turnkey. But you must treat the integration between those systems as your responsibility, as a very strategic asset, because those integrations are exactly where data is exchanged, and where that data then needs to be the truthful, trustworthy data you can put into your analytics. Treating those integrations as a strategic asset, and investing in using that source to build your data foundations, is to me probably the single most important thing I would say.
- F: Yeah, I love your point about preventing a false trust in data just because it's data within an organization. I chuckled to myself; I don't know why this is the example that came to mind, but if you've ever been at a conference or watched a presentation where someone puts up a point that is so valid to what they want to say, and then it was sourced in 2002 or something. Of course you can manipulate things, and I'm not saying that's the intent, but it was a good reminder that if there are versions of things getting out into the ecosystem, you need to make sure you are working from the actual truth, the real-time data, to your point. I think that's a really good reminder.
- Q: It's very interesting what you said at the start, that we typically use the data platform for BI, but it typically gets disconnected. Now that we are following Evergreen, on IFS Cloud doing the different release updates, there are so many data sets that change, and we find it difficult to do proper change management so that the Power BI reports and the data platform also get updated. That is the challenge, and we do not have an enterprise architecture either. We are absolutely on the journey of doing all these Azure integrations to connect the different systems we have. So maybe that's also a question to the others on the call: maybe you're seeing the same challenges we are facing, and maybe you have some best practices when it comes to keeping control of the architecture.
- A: I mean, if you use Power BI for some of your reporting or viewing: as Vaibs said, making sure your data is reliable is the first foundational step, making sure it is really what the business wants. A lot of the time, as was mentioned at the beginning, the business is changing but the data really is not, because someone somewhere is saving some of the new data in an Excel sheet, putting it on a SharePoint, and sharing it only within their own department. So you don't have up-to-date data. You need a complete scenario where data is easily pulled over to Power BI, so there needs to be a complete end-to-end consolidation of the data. That can be a serious architectural undertaking that needs to be well designed, so that as the data changes, the front end pulling into Power BI changes with it. Hope that helps. Microsoft is also evolving Power BI; it depends on what you have in mind, and there is also reporting from IFS that could easily be leveraged, depending on the business need. That's another issue, so having business requirements well captured is also key.
- A: One thing we allow is creating Power BI reports using the data sets that we provide, and we even allow you to embed those Power BI reports into IFS Cloud itself. So it's the Power BI report you are creating, embedded within IFS Cloud, because that just makes it so much more convenient. If John needs to use IFS and John needs to view those Power BI reports, it's not useful having a different browser tab for John to view those reports in, right? Ideally it should live in the same experience John is using on a daily basis, which is what we provide today already. But equally, we're making sure that some of the default reports we have are built on open-source technologies within IFS, so there is no real dependency on Power BI. You don't necessarily need to buy Power BI just to use IFS and do the visualization in IFS.
- The other strategy we're working on is the concept of data products: providing you curated data products from within IFS into wherever, into whichever data lake you have, so that we are always giving you the most up-to-date data as quickly as you need it, always at your fingertips. That solves the lack-of-trust data problem.
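A minimal sketch of what such a curated data product could carry, under the assumption (not an IFS specification) that ownership and freshness metadata travel with the data, since that is what keeps it trustworthy:

```python
# Sketch of a data product contract: curated data shipped with the
# metadata (owner, schema, freshness) that makes it trustworthy.
# All field names are hypothetical, not an IFS specification.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataProduct:
    name: str
    owner: str                    # who to contact about this data
    schema: dict[str, str]        # column -> type
    freshness_sla_minutes: int    # how stale the data may become
    last_refreshed: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def is_fresh(self) -> bool:
        age = datetime.now(timezone.utc) - self.last_refreshed
        return age.total_seconds() / 60 <= self.freshness_sla_minutes

orders = DataProduct(
    name="orders_curated",
    owner="erp-team@example.com",
    schema={"id": "int", "customer": "string", "total": "decimal"},
    freshness_sla_minutes=15,
)
print(orders.is_fresh())  # True right after a refresh
```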
- R: Maybe I'm making a very obvious observation here, but with the example you just gave about the Power BI reports, and the example you used earlier about Copilot for enterprise, to me one of the ways to manage the change, or help mitigate risk in these behaviours, is understanding the intent of the behaviour. Instead of creating a blanket approach without understanding those things, it's understanding what the person doing this thing that could potentially bring risk is trying to accomplish, and rather than just telling them no and having them maybe do it anyway, asking whether there is a way to accommodate their objective in a more secure way. So it's understanding what people's aims are and taking that into account with the parameters you're trying to set, so that if they're working toward an end or taking a shortcut because it helps them accomplish a goal, you can find a more secure way for them to accomplish that same goal.
- I think, going back to the question, I was thinking about your first point on integrations and changing the approach if you want to treat your data as the asset it is, and I would argue most companies do want to do that, or should. It does seem overwhelming to think about making these changes in a landscape where everything is moving so fast. We're talking about best practices and things companies need to consider or do, but while you're trying to do those things, the data is moving and everything is changing. I don't know if you have any advice on how to manage that, how to keep pace with how quickly everything moves, and how to make sure the efforts you're taking will help you keep pace with change.
- R: The ability for people to experiment, and to create pockets of experimentation like I was saying, we should not prohibit that at all; in fact, we should enable and encourage it. But for you to do anything material at scale takes time, hard work, and perseverance toward a target and a goal. It's the same in the world of data: you're not going to magically build your data foundations overnight. It takes time and perseverance, and it's like that flywheel effect; there's a tipping point after which it starts to deliver exponential value. But very practically, my guess is that you're looking to develop an enterprise architecture and you're going to use Azure Integration Services, the four Azure services geared towards integration of systems, effectively passing data from one system to another. It's a great opportunity to say: I want to do the hard work to make sure those integrations are well documented. That we know exactly what data is traveling from where to where. What type of data is it: transactional data, reference data, master data? How can we make sure that same data also leaves a copy in my data lake, and how can we build the analytics engine on top of that, rather than having a whole separate pattern where creating integrations is one exercise, and exporting data from systems and trying to replicate the logic by joining data up is another? And make sure there are owners for the systems in your landscape, owners focused not just on the system but on the data within it: how that data is kept up to date, where it comes from, where it goes, and what happens to it within the system. Those are the things people need to be able to articulate. There are a number of those basics; if you start with them and do it diligently, there's a lot of value to be found.
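A minimal sketch of documenting integrations the way Vaibs suggests, with hypothetical systems and owners: each integration records what entity travels where, its classification (transactional, reference, or master data), and whether a copy lands in the lake.

```python
# Sketch of an integration catalog: each operational integration is
# documented with source, target, data classification and whether it
# leaves a copy in the data lake. Systems and flows are hypothetical.
from dataclasses import dataclass
from enum import Enum

class DataKind(Enum):
    TRANSACTIONAL = "transactional"
    REFERENCE = "reference"
    MASTER = "master"

@dataclass
class Integration:
    source: str
    target: str
    entity: str
    kind: DataKind
    copied_to_lake: bool
    owner: str

CATALOG = [
    Integration("ERP", "CRM", "customer", DataKind.MASTER, True, "data-team"),
    Integration("ERP", "WMS", "order", DataKind.TRANSACTIONAL, False, "ops-team"),
]

# Quick audit: which integrations never leave a copy for analytics?
missing = [i for i in CATALOG if not i.copied_to_lake]
print(f"{len(missing)} integration(s) not yet tapped for analytics")
```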
Next CollABoratives:
- 11 March 2025 11:00 AM US ET / 15:00 GMT / 16:00 CET
IFS Assets CollABorative: Tech Talk: Road Map Session
- 13 March 2025 10:00 AM US ET / 14:00 GMT / 15:00 CET
IFS Digitalization CollABorative: Meet the Member - Journey to IFS Cloud w/ Chris Rundell, Covia
- 19 March 2025 10:00 AM US ET / 14:00 GMT / 15:00 CET
IFS Service CollABorative: Think Tank - Change Management with Alfa Laval
If you are an IFS Customer and you do not have the next meeting invitation to this CollABorative and would like to join, please click here to fill out the form.