The algorithm is important, but the data it runs on is foundational
AI’s power is tethered to the data used to train it and the data it acts on. If we focus only on the algorithmic issues and don’t pay enough attention to the data issues, we’re not looking at the whole picture. Biased algorithms are actually pretty difficult to design. What's usually biased is the data used to train them.
Generative AI models like those from OpenAI were mostly trained on what's accessible on the public internet, like Wikipedia and Reddit. What's really going to move the needle from a corporate perspective is a company’s ability to leverage its own data to train the AI, especially what I would call “operational data”: your financial system data and all of your ERP data. If you're in an industry built on intellectual property, you're developing new products or new capabilities, and you've got all sorts of intellectual property in your research and development. All of that data is really, really juicy.
There are many challenges here. Most companies’ data is quite confidential. Until recently, the technology to monitor the broad set of data that most companies have has not been adequate, and the technologies that do exist are largely focused on things like email or office documents: things that human beings write. Operational data is harder to access. Plus, data sits in different formats and different systems, and is growing very rapidly. Finally, not many businesses have full control of their data, because it’s stored in many different places, such as the cloud. It moves around quite a bit, and its sensitivity may change quickly.
So here is my advice to leaders, if they want to make the most of this AI moment:
Get full observability of your data. Any business leader who really wants to leverage the power of AI for their company has to get a handle on that operational data. I suspect that in most industries it's difficult to know where the data is, how it's moving and how it's changing. And there is going to be significantly more of it in the future than there is now, because data is becoming the new oil: it's being generated by everything and used by everything. Any business that wasn't dependent on data before is now or will be.
For example, healthcare generates some of the world's largest volumes of data. That’s only going to get bigger for that industry, especially when you start thinking about things like Fitbit data, Apple Watch data and the other wearables that people are using to manage their health and fitness. Medical providers are going to use those types of tools much more for providing and managing care, and add all that data to the system. And that’s just one example where handling the challenges of today isn't going to be sufficient.
Collaborate to define data governance. My primary recommendation for security and IT leaders is not to assume that they can handle data governance alone. There are a lot of security groups out there that are too disconnected from the business because they tend to remain focused on the technology side of security. They don't recognize how the business is using data. Security leaders do have an advantage that they should lean on, though: to be effective, they already need to understand how the entire business operates. They don't just sit in research and development, or in commercial or manufacturing or HR or legal. They have to cover the entire company.
If no one else is driving these discussions, security and IT leadership can leverage that responsibility and breadth of scope to start them. But it’s critical to recognize that they need people from the business to get involved and to understand their role in owning and stewarding data.
In the past, an R&D, marketing or procurement leader may not have recognized that their line of business, their function and their organization generate and require data to run. Maybe they assumed that IT would take care of that. Now they own that data. They understand its value to the business. They understand its appropriate use. They understand the insights they want to get out of it, and they have to step up and understand what that means from a roles and responsibilities perspective.
Once those conversations begin, adapt your security and risk mindset. You can start coaching the business on the risky scenarios that may impact that data harmfully or inappropriately, and understanding them better yourself. Discuss what’s most important to the business. That matters because in many organizations, maybe 10-15% of the data is the most sensitive: the data that could hurt the business if it's compromised or exposed. How do you navigate all the data that you have and find that 10-15%? You need the business's help to do that.
Prepare to up-level your data protection and access control capabilities. Most security organizations have built their data protection capabilities around data loss prevention (DLP), and their access control governance around on-premises applications. Neither is sufficient for the massive growth and dynamics of today's data models. Building on the visibility and understanding of data I’ve already mentioned, we must build and operationalize modern data security and access control capabilities to address today's and tomorrow's challenges: cloud scale, the variability of data and access models, and the context of data lifecycle and usage.
In sum, your data could fuel the most helpful AI for your organization — if you can get to it, govern it and act on it meaningfully.
Mike Towers has three decades of experience in digital trust, data protection, global information security and risk management. He is a strategic advisor and board member, has served as a chief digital trust officer and CISO for multinational corporations like Takeda, Allergan and GlaxoSmithKline, and is the founder and principal of Digital Trust Group.