Artificial Intelligence (AI) is being incorporated into many areas of our lives, especially in the digital ecosystem. Whether it is powering an online search or a website that provides materials for students, AI is being used. The premise behind AI and machine learning (ML) is simple: use compute resources to speed up work that would otherwise take orders of magnitude longer to do manually and to identify patterns that humans would not normally spot, whether those patterns are in language, medicine or astrophysics. Given the proliferation of AI/ML solutions, it is important to understand potential areas of concern, especially ones that affect the reputation and trustworthiness of organizations and individuals within the digital ecosystem.
Bias Will Always Be a Problem
AI/ML solutions generally work by processing large amounts of data, especially to build working models. Therefore, anyone working in AI/ML, or using solutions that integrate AI/ML, should be concerned about bias. The good news is that data science has already blazed a trail of methods and training for identifying and handling bias. When the data science field exploded, one major concern was the bias that data science professionals themselves might introduce. We all carry biases, and unfortunately, we might not be aware of our own. As a result, a great deal of training has been developed, along with algorithms and procedures, to detect personal bias.
In addition to personal bias, data scientists are also concerned with bias in the data, especially bias that is not obvious. A classic example of this is the use of zip codes in the United States.1 A data scientist including race as a decision factor in an experiment or test is an obvious bias. If that same data scientist were to use zip codes instead, it would not seem like a racial bias. However, zip codes are strongly correlated with race because neighborhoods tend to be segregated. Therefore, using zip codes as a decision factor is effectively the same as using race and indirectly introduces bias.
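To make the zip code example concrete, the short Python sketch below measures how strongly each candidate feature is associated with a protected attribute and flags likely proxies before they reach a model. The column names, toy data and 0.5 threshold are hypothetical, chosen only for illustration; they do not come from any cited study.

```python
# Hypothetical sketch: flag features that act as proxies for a protected attribute
# by measuring their categorical association (Cramér's V) with that attribute.
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical series: 0 = no association, 1 = perfect."""
    table = pd.crosstab(x, y)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, k = table.shape
    return (chi2 / (n * (min(r, k) - 1))) ** 0.5

def flag_proxy_features(df: pd.DataFrame, protected: str, threshold: float = 0.5) -> dict:
    """Return features whose association with the protected attribute meets the threshold."""
    scores = {col: cramers_v(df[col], df[protected]) for col in df.columns if col != protected}
    return {col: round(score, 2) for col, score in scores.items() if score >= threshold}

# Toy data: zip_code tracks race almost perfectly, so it is flagged; shoe_size is not.
df = pd.DataFrame({
    "race":      ["A", "A", "B", "B", "A", "B", "A", "B"],
    "zip_code":  ["29201", "29201", "29203", "29203", "29201", "29203", "29201", "29203"],
    "shoe_size": ["9", "10", "9", "10", "11", "9", "10", "11"],
})
print(flag_proxy_features(df, protected="race"))  # {'zip_code': 1.0}
```

In practice, a data scientist would run a check like this across the full candidate feature set and investigate anything that scores high, rather than relying on a single arbitrary cutoff.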
The way data is collected and processed can also introduce bias. For instance, during the COVID-19 pandemic, when US schools transitioned to remote learning, families at the bottom of the socioeconomic scale struggled to obtain consistent Internet access. If a school put out an online survey for parents to complete, the data collected from that survey would be skewed toward the upper end of the socioeconomic scale, because families at the lower end may not have had equal access to complete the survey.
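A tiny simulation illustrates the mechanism. The income deciles and response rates below are invented purely for illustration; the point is that when the probability of answering an online survey rises with socioeconomic status, the survey average drifts upward even though the underlying population has not changed.

```python
# Hypothetical sketch: online-only survey responses skew toward higher-income households.
import random

random.seed(0)

# Synthetic population: income deciles 1 (lowest) through 10 (highest), uniformly spread.
population = [random.randint(1, 10) for _ in range(10_000)]

def responds(decile: int) -> bool:
    """Assumed response probability: roughly 12% at decile 1, 75% at decile 10."""
    return random.random() < 0.05 + 0.07 * decile

respondents = [decile for decile in population if responds(decile)]

print(f"population mean decile:  {sum(population) / len(population):.2f}")    # about 5.5
print(f"respondents mean decile: {sum(respondents) / len(respondents):.2f}")  # noticeably higher
```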
The quality of the data and how it is collected become critical concerns when we start talking about AI solutions. In 2016, Microsoft introduced Tay, a chatbot on Twitter (now X) that could participate in conversations on that platform. Tay was designed to emulate a female teenager and learn about language by detecting patterns and using those same patterns in its own responses. What was intended as a novel experiment quickly turned into a nightmare for Microsoft.2 As the chatbot interacted with other accounts on Twitter, its speech became increasingly racist, and Microsoft was forced to shut it down.
In this case, Tay’s interactions on Twitter were the chatbot’s data. Members of 4chan put out a call to interact with Tay using bigoted language.3 This concerted effort by 4chan users to skew Tay’s responses worked: their actions caused Tay to produce racist and misogynistic responses. One can argue quite convincingly that Tay’s failure was a design failure.4 In short, Tay should have been built in such a way that it properly filtered what entered its dataset. Unfortunately, it was not.
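What might proper filtering have looked like? The sketch below is purely illustrative and assumes nothing about Microsoft’s actual architecture: it simply shows a gate that screens incoming messages before they are admitted to a chatbot’s learning corpus, with a keyword blocklist standing in for a real toxicity classifier.

```python
# Illustrative sketch only: screen user input before it enters a learning chatbot's
# training pool. The blocklist and is_toxic() stub are placeholders; a production
# system would call a real toxicity classifier and log rejected messages for review.
from typing import List

BLOCKLIST = {"slur1", "slur2"}  # placeholder tokens, not a real curated list

def is_toxic(message: str) -> bool:
    """Stand-in for a proper toxicity model; here it only checks a keyword blocklist."""
    return bool(set(message.lower().split()) & BLOCKLIST)

class LearningChatbot:
    def __init__(self) -> None:
        self.training_pool: List[str] = []

    def ingest(self, message: str) -> None:
        """Admit a message into the training data only if it passes the toxicity gate."""
        if is_toxic(message):
            return  # drop the message instead of learning from it
        self.training_pool.append(message)

bot = LearningChatbot()
bot.ingest("what a lovely day")
bot.ingest("slur1 filled rant")
print(len(bot.training_pool))  # 1 -- only the benign message was retained
```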
Microsoft was lampooned in both the technical and the general media. Its reputation certainly took a hit: regardless of the 4chan users’ efforts to derail the experiment, Microsoft owned the experiment, so the negative reputational impact fell entirely on Microsoft.
Because Tay was a public experiment on a social media platform and not a launched business service, the reputational impact likely did not show noticeably in Microsoft’s financial bottom line. However, we can glean from that situation, and others like it, that bias in data or in AI development can negatively impact the digital reputation of an organization. Impact the reputation of an organization and you also impact the trust others place in that organization within a digital ecosystem.
Ethics in AI
On December 2, 2020, Timnit Gebru, former head of ethics for AI at Google, tweeted about how she was forced out of Google. The reason for her ousting centered on a research paper, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”5 Leadership at Google demanded that the names of the researchers who worked at Google be removed from the research paper. This led to a showdown that eventually ended with Gebru’s departure.
The paper dealt with the ethical use of AI, including the cost and impact to the environment, the disproportionate impact on those lower on the socioeconomic scale and embedded biases that are subtle but have the potential for great impact. Google has a vested interest in the use of AI/ML, as Google AI powers Google search.6 Specifically, Google developed the Transformer architecture,7 which is the foundation for offerings such as OpenAI’s ChatGPT.8
Research casting ethical doubt on Google’s own use of AI would certainly be cause for concern within Google, especially if Google employees were listed among the paper’s authors.
The issue of ethics in AI is a global concern, so much so that UNESCO has published recommendations centered on four core values and ten core principles.9 The first core value focuses on human rights and human dignity. However, we have seen organizations ignore or deprioritize human rights and dignity. Therefore, we should not be surprised if such organizations use AI unethically.
AI and the Criminal Element
When it comes to AI, a substantial concern is that criminal elements are beginning to weaponize the technology. Consider a couple of potential scenarios.
The first scenario concerns phishing attacks. Generative AI can produce text that is professional and crafted for the user’s purpose, whether that purpose is designing a party invitation, writing a term paper or generating the body of an email. It is the latter that is of concern in phishing attacks. After all, we are typically able to spot a fake email if it has poor grammar, misspellings and other obvious signs that it did not come from the purported sender. If using AI to generate the body of an email works for legitimate purposes, it will work for criminal ones. Currently, it is virtually impossible to tell whether a phishing email was crafted using generative AI.10 Consider emails that sound like they are from real people and attack the reputation of a particular organization; such attacks could be automated. Although organizations may have protections in place, in the information security world we often must react quickly to new techniques, tools and exploits from bad actors. As rigorous as organizations are in maintaining security, bad actors with enough incentive will find weaknesses in those protections.
The second scenario that could involve the criminal element is using AI to accelerate solutions, such as writing code to perform certain tasks. Take the most recent Pwn2Own competition, in which a team used ChatGPT to accelerate its development of exploits.11 The security researchers found vulnerabilities and then used ChatGPT to start a code solution to deliver an exploit they had conceived. Although the researchers still had to modify the code to get a working solution, they were able to use ChatGPT to generate enough code to greatly reduce the time to a final solution. There is nothing stopping bad actors from doing the exact same thing and thus accelerating their own attacks. The threat increases when the bad actor is part of an advanced persistent threat (APT), typically a government-backed group with a mission to infiltrate and compromise its adversaries.
AI Is Useful and Powerful, but We Have Real Challenges Ahead
Although AI/ML is not new, broadly speaking, explosions in computing capability and in data have made AI/ML significantly more powerful and usable, even by nontechnologists. Furthermore, innovative mechanisms for AI/ML development, such as Google’s creation of the Transformer architecture, all but ensure that AI/ML usage will continue to grow. We want it to grow. After all, AI might help cure cancer12 and address other worldwide problems, whether they are ones we have worked on for decades or ones we have only recently discovered.
However, as with almost any development in technology, we do face real challenges. Bias, whether introduced by the professionals developing models or by the way data is collected, is always going to be a challenge, just as it is in data science. Ethical use of AI is paramount, yet we need to recognize that there are people who do not care about ethics. Some of those people are bad actors who would weaponize AI to attack other organizations, and generative AI increases both the speed and the quality of the attacks they can create. All of these issues need to be addressed properly to ensure a safe future with AI.
Endnotes
1 George, A.; “Thwarting Bias in AI Systems,” Carnegie Mellon College of Engineering, December 2018, http://engineering.cmu.edu/news-events/news/2018/12/11-datta-proxies.html
2 Schwarz, O.; “In 2016, Microsoft’s Racist Chatbot Revealed the Dangers of Online Conversation,” IEEE Spectrum, 25 November 2019, http://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation
3 Ibid.
4 Sinders, C.; “Microsoft’s Tay Is an Example of Bad Design,” Medium, 24 March 2016, http://medium.com/@carolinesinders/microsoft-s-tay-is-an-example-of-bad-design-d4e65bb2569f
5 Hao, K.; “We Read the Paper That Forced Timnit Gebru out of Google. Here’s What It Says,” MIT Technology Review, 4 December 2020, http://www.technologyreview.com/2020/12/04/1013294/google-ai-ethics-research-paper-forced-out-timnit-gebru/
6 Ibid.
7 Uszkoreit, J.; “Transformer: A Novel Neural Network Architecture for Language Understanding,” Google Research, 27 August 2017, http://blog.research.google/2017/08/transformer-novel-neural-network.html
8 Molander, O.; “ChatGPTs, LLMs, and Foundation Models–A Closer Look Into the Hype and Implications for Startups,” Medium, 1 February 2023, http://betterprogramming.pub/chatgpt-llms-and-foundation-models-a-closer-look-into-the-hype-and-implications-for-startups-b2f1d82f4d46
9 UNESCO, “Ethics of Artificial Intelligence,” http://www.unesco.org/en/artificial-intelligence/recommendation-ethics
10 Sjouwerman, S.; “It’s Official–Generative AI Has Made Phishing Emails Foolproof,” KnowBe4, 28 September 2023, http://blog.knowbe4.com/generative-ai-foolproof-phishing-emails
11 Vishak, V.; “Used ChatGPT to Win the Pwn2Own Hacker Competition,” CodeandHack, 14 March 2023, http://codeandhack.com/used-chatgpt-to-win-the-pwn2own-hacker-competition/
12 Marks, R.; “How AI Found the Words to Kill Cancer Cells,” University of California San Francisco, USA, 8 December 2022, http://www.ucsf.edu/news/2022/12/424406/how-ai-found-words-kill-cancer-cells
K. BRIAN KELLEY | CISA, CDPSE, CSPO, MCSE, SECURITY+
Is an author and columnist focusing primarily on Microsoft SQL Server and Windows security. He currently serves as a data architect and an independent infrastructure/security architect concentrating on Active Directory, SQL Server and Windows Server. He has served in a myriad of other positions, including senior database administrator, data warehouse architect, web developer, incident response team lead and project manager. Kelley has spoken at 24 Hours of PASS, IT/Dev Connections, SQLConnections, the TechnoSecurity and Forensics Investigation Conference, the IT GRC Forum, SyntaxCon, and at various SQL Saturdays, Code Camps and user groups.