Accepted Abstracts

Human vs. Machine: Behavioral Differences between Expert Humans and Language Models in Wargame Simulations

Max Lamparth, Anthony Corso, Jacob Ganz, Oriana Skylar Mastro, Jacquelyn Schneider, Harold Trinkunas

Short Abstract: To some, the advent of artificial intelligence (AI) promises better decision-making and increased military effectiveness while reducing the influence of human error and emotions. However, there is still debate about how AI systems, especially large language models (LLMs) that can be applied to many tasks, behave compared to humans in high-stakes military decision-making scenarios with the potential for increased risks towards escalation and unnecessary conflicts. To test this potential and scrutinize the use of LLMs for such purposes, we use a new wargame experiment with 214 national security experts designed to examine crisis escalation in a fictional U.S.-China scenario and compare the behavior of human player teams to LLM-simulated team responses in separate simulations. Wargames have a long history in the development of military strategy and the response of nations to threats or attacks. Here, we find that the LLM-simulated responses can be more aggressive and significantly affected by changes in the scenario. We show a considerable high-level agreement in the LLM and human responses and significant quantitative and qualitative differences in individual actions and strategic tendencies. These differences depend on intrinsic biases in LLMs regarding the appropriate level of violence following strategic instructions, the choice of LLM, and whether the LLMs are tasked to decide for a team of players directly or first to simulate dialog between a team of players. When simulating the dialog, the discussions lack quality and maintain a farcical harmony. The LLM simulations cannot account for human player characteristics, showing no significant difference even for extreme traits, such as “pacifist” or “aggressive sociopath.” When probing behavioral consistency across individual moves of the simulation, the tested LLMs deviated from each other but generally showed somewhat consistent behavior. Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.

Key Words: natural language processing, decision making, AI safety, military AI, AI ethics, language model

Poster Talk

Understanding Computer Science Students' views of Military (and Military-adjacent) Work

Sahar Abdalla, Alicia Cappello, Mohamed Abdalla, Catherine Stinson

Short Abstract: The increased (potential) adoption of AI by militaries around the world has drawn the attention and raised concerns of both legislators and computer scientists working in industry. However, we do not have a good sense of the views of the field. More specifically: Are computer science students seeking jobs concerned about their labour being used for military purposes or used in military contexts? Are they aware of the working relationships between large US technology companies and militaries around the world? How does this knowledge (or lack thereof) affect their decision to apply to these companies? What would it take to make students reconsider working for companies known to apply for military contracts? We conducted an online survey of computer science students at Canadian universities who are seeking full-time jobs (or recent graduates who have recently obtained their first post-graduation job). Initial results seem to indicate that the majority of students do not particularly privilege the ethics of their labour over other considerations (e.g., remuneration or location). The majority of students were not concerned with their labour being used for military purposes, though this was not the case for all demographic subgroups. For those who were concerned about their labour being used for military purposes, a plurality knew of at least some, if not all, of the military contracts taken by the companies to which they applied. Compared to other ethical concerns (such as environmental impact), students were less concerned by the usage of their work in military contexts (or for military purposes). Understanding students' views to the above questions is vital for a myriad of roles, be it educators looking to study the effectiveness of ethics courses, industry trying to gauge incoming worker sentiments, or military recruiters attempting to understand possible challenges.

Key Words: Survey, Opinion Poll, AI Ethics, Student Views, Industry Jobs

Poster Talk

Balancing Power and Ethics: A Framework for Addressing Human Rights Concerns in Military AI

Mst Rafia Islam, Azmine Toushik Wasi

Short Abstract: AI has made significant strides recently, leading to various applications in both civilian and military sectors. The military sees AI as a solution for developing more effective and faster technologies. While AI offers benefits like improved operational efficiency and precision targeting, it also raises serious ethical and legal concerns, particularly regarding human rights violations. Autonomous weapons that make decisions without human input can threaten the right to life and violate international humanitarian law. To address these issues, we propose a three-stage framework (Design, In Deployment, and During/After Use) for evaluating human rights concerns in the design, deployment, and use of military AI. Each phase includes multiple components that address various concerns specific to that phase, ranging from bias and regulatory issues to violations of International Humanitarian Law. By this framework, we aim to balance the advantages of AI in military operations with the need to protect human rights.

Key Words: Military AI, Human Rights, AI Ethics, Power and Politics

Poster Talk

The Fallacy of Precision: Deconstructing the Narrative Supporting AI-Enhanced Military Weaponry

Sonia Fereidooni, Vicka Heidt

Short Abstract: Recent pro-military arguments have attempted to morally justify the integration of AI into military systems, claiming it leads to more precise and sophisticated weaponry. However, this narrative obscures the reality of AI-based weaponry and is deeply flawed for several reasons. First, the use of AI in military contexts is morally reprehensible, as it perpetuates violence and dehumanization through biased training data and the unethical experimentation on human lives, undermining claims of ethical justification. The development of AI-powered autonomous weapon systems often relies on datasets that reflect existing societal biases, potentially leading to discriminatory targeting and disproportionate impacts on marginalized communities. Furthermore, the refinement of these systems necessitates a troubling 'trial and error' approach using real-world conflicts as testing grounds, effectively treating human lives as expendable data points for AI optimization. Second, contrary to the assertion that AI enhances precision and human control, military AI often leads to reduced oversight, human control, and accountability. In practice, AI military systems, such as autonomous weapon systems (AWS), fail to adequately distinguish between combatants and civilians. Finally, the deployment of AI in warfare contradicts current international humanitarian law (IHL), rendering its use legally indefensible. This paper aims to critically investigate the misleading philosophies driving the push for militaristic AI. This paper argues that militaristic AI is (1) morally indefensible because it necessitates extensive experimentation on human lives to develop sophisticated AI weaponry, disproportionately affecting marginalized communities in the Global South, (2) is associated with reduced human control and precision, evidenced by the high civilian toll inherent in currently deployed AI military systems, and (3) constitutes a violation of IHL. The paper presents case studies of AI systems like 'Where's Daddy?', 'Lavender', and 'The Gospel,' employed by the Israel Defense Forces (IDF) in Palestine, demonstrating how AI-driven 'kill lists' disregard civilian casualties and facilitate the automation of violence. By unmasking the deceptive rhetoric surrounding military AI, this paper aims to elicit critical discourse on the practical ramifications of the use of AI in warfare.

Key Words: Palestine, Gaza, IDF, Warfare, Philosophy, Morality, Dehumanization, AI Military, IHL, International Humanitarian Law

Poster

Machine intelligence cyberwar: evaluating societal risks from military uses of Machine Intelligence Cyber Agents

Timothy Dubber, Seth Lazar

Short Abstract: Cyber warfare is almost certainly the first domain in which fully autonomous machine intelligence combatants will be deployed. This is because cyber warfare occurs in a “constrained” domain, unlike the physical domains of land, air, maritime and space. A Machine Intelligence Cyber Agent (MICA) would not need to be embodied with a comprehensive set of perceptual functions to understand the battlespace. This is because computer network information is already processed in a machine-readable format. Thus, a machine intelligence combatant is already 'native' to the cyber domain. In this paper, we first characterise both the incentives to build MICAs and the current state of the art before articulating five key priorities that researchers and practitioners should pursue now to reduce the risk of MICAs causing catastrophic harm.

Key Words: Military uses of AI; cybersecurity; cyber warfare

Poster

Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations

Aryan Shrivastava, Jessica Hullman, Max Lamparth

Short Abstract: There is an increasing interest in using language models (LMs) for automated decision-making, with multiple countries actively testing LMs to aid in military crisis decision-making. To scrutinize relying on LM decision-making in high-stakes settings, we examine the inconsistency of responses in a crisis simulation ('wargame'), similar to reported tests conducted by the US military. Prior work illustrated escalatory tendencies and varying levels of aggression among LMs but were constrained to simulations with pre-defined actions. This was due to the challenges associated with quantitatively measuring semantic differences and evaluating natural language decision-making without relying on pre-defined actions. In this work, we query LMs for free form responses and use a metric based on BERTScore to measure response inconsistency quantitatively. Leveraging the benefits of BERTScore, we show that the inconsistency metric is robust to linguistic variations that preserve semantic meaning in a question-answering setting across text lengths. We show that all five tested LMs exhibit levels of inconsistency that indicate semantic differences, even when adjusting the wargame setting, anonymizing involved conflict countries, or adjusting the sampling temperature parameter . Further qualitative evaluation shows that models recommend courses of action that share few to no similarities. We also study the impact of different prompt sensitivity variations on inconsistency at temperature . We find that inconsistency due to semantically equivalent prompt variations can exceed response inconsistency from temperature sampling for most studied models across different levels of ablations. Given the high-stakes nature of military deployment, we recommend further consideration be taken before using LMs to inform military decisions or other cases of high-stakes decision-making.

Key Words: Military, AI Safety, Transparency, Inconsistency, Language Models, Natural Language Processing

Poster

Autonomous Weapons Systems Proliferation Poses Risks to Human Rights and International Security

Leif Monnett

Short Abstract: Autonomous weapons systems (AWS) are rapidly being developed, and are likely to proliferate. Many factors will determine which kinds of AWS proliferate, to whom they proliferate, and the pace and scope of proliferation. This paper identifies international security and law risks from the proliferation of AWS to state and non-state actors, including risks to civil society. Potential uses of AWS by state and non-state actors include warfighting, policing, and extrajudicial killing. Challenges to international human rights and humanitarian law from the use of AWS in warfighting and policing have previously been well-examined . However, proliferation of AWS may facilitate the targeted killing of a wide range of at-risk individuals and vulnerable populations by state and non-state actors. The human rights implications of the use of AWS for extrajudicial killing are manifold, and include violations of the rights to life, dignity, freedom of opinion and expression, freedom of religion, freedom of peaceful assembly and association, protection from discrimination, etc. Challenges posed to attribution by AWS may undermine accountability: a core principle underlying international law. This paper provides specific policy recommendations to mitigate such risks, and argues that international action to address these issues is urgently needed.

Key Words: human rights, international security, international law

Poster

Can AI-powered military comply with international humanitarian laws?

Hager Radi Abdelwahed, Omnia Farrag Othman, Hadeer El Ashhab

Short Abstract: Artificial Intelligence (AI) has recorded unprecedented progress in the last few years, which sparked debates about AI safety. Concerns emerged that AI is advancing too fast without considering all safety issues, which led to calls for a slowdown in AI research due to its growing impact on everyone's lives. We believe the AI community has largely overlooked autonomous weapons systems (AWS), which is a concerning use case of AI and an alarming threat to human life. We build upon current work as we question if AI in military can be safely regulated, in light of international humanitarian law (IHL). Our goal was to highlight the gap between the current state of AI use in military and international laws, and show how hard it is to legalize AWS. In future work, we will analyze, in more technical detail, how current AI systems do not comply with international humanitarian law and hence not ready to be used in wars.

Key Words: AI safety, autonomous weapons systems (AWS), AI regulations, International Humanitarian Law (IHL)

Poster

Impending Expansion of AI Misuse towards Militarization in India

Param Raval

Short Abstract: With greater data consolidation capabilities, artificial intelligence synergizes with extended dataveillance to give malicious actors in power, including private and state institutions, better tools to achieve objectives harmful to the greater population. With an urban population of 500 million out of a total 1.45 billion, and an emerging base of around 750-800 million active internet users, India provides a compelling case study to analyze AI misuse. Being a non-western democracy, it also allows us to critique the existing and proposed frameworks to regulate the state use of surveillance and AI technology for public safety. We argue that the recent deterioration of human rights safety in India, and state and military deployment of facial recognition in policing points to a larger threat in impending expansion of harmful use of AI systems. We observe that frameworks proposed to risk assess and regulate such systems are found lacking in this case. Combined with weaponization of AI towards swaying public opinion to favor the government, the situation sets the stage for dangerous advancements in the near future. Based on our observations, we call for a re-evaluation of global regulatory frameworks and to extend this reasoning to other nations with systems vulnerable to AI-driven misuse by harmful state actors.

Key Words: Human rights, India, Digital governance, AI ethics, AI safety, Facial recognition technology

Poster

The purpose of a system is what it does, and science is a thing which people do': from AI epistemology to AI military ethics

Zhanpei Fang

Short Abstract: Drawing upon recent work on understanding the usage of machine learning for natural sciences research, I posit that the epistemic problems of ML have significant bearing upon concerns related to its deployment in a military context. In particular I try to sketch out throughlines between the epistemology and the ethics of AI by way of the useful philosophical lenses of the theory-free ideal and instrumental reason. The urgency of this task is underlined when we consider ML/AI's growing role in administering human life, in military and statecraft as well as in many other contexts. The faulty epistemic practices performed by AI practitioners, commentators and policymakers have real consequences on the social and natural world. I consider ethical consequences of the `conceptual poorness' assigned to ML methods, and provide some theoretical scaffolding to fold AI into other discourses of technology, namely the critique of instrumental reason, additionally applying the Marxian notion of reification to understanding AI as a social technology or organizing activity. Informed by my experiences as a junior applied-ML researcher in the space-tech industry, now in academia studying novel ML methods on satellite imagery in regions of humanitarian & conflict concern, and illustrating with some recent examples, I provide a few policy recommendations as well as recommendations for AI ethics & fairness research directions.

Key Words: military AI, AI epistemology, AI ethics, critical theory, philosophy of technology

Poster

AI in Military Decision Support Systems: A Review of Developments and Debates

Anna Nadibaidze, Ingvild Bode, Qiaochu Zhang

Short Abstract: Reports from war zones underline that artificial intelligence (AI) technologies are increasingly integrated into military decision-making. Armed forces are developing and employing AI-based systems as part of the complex and multilayered process of decision-making that relates to the use of force. Such uses of AI in security and warfare are associated with opportunities and challenges which deserve further scrutiny. To contribute to ongoing discussions on AI-based decision support systems (AI DSS), this report provides a review of 1) the main developments in relation to AI DSS (at the time of writing in September 2024), focusing on specific examples of existing systems; and 2) the main debates about opportunities and challenges related to various uses of AI DSS, with a focus on issues of human-machine interaction in warfare.

Key Words: AI, decision-support, targeting, agency, human-machine interaction

Poster

AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research

Riley Simmons-Edler, Ryan Paul Badman, Shayne Longpre, Kanaka Rajan

Short Abstract: The recent embrace of machine learning (ML) in the development of autonomous weapons systems (AWS) creates serious risks to geopolitical stability and the free exchange of ideas in AI research. This topic has received comparatively little attention of late compared to risks stemming from superintelligent artificial general intelligence (AGI), but requires fewer assumptions about the course of technological development and is thus a nearer-future issue. ML is already enabling the substitution of AWS for human soldiers in many battlefield roles, reducing the upfront human cost, and thus political cost, of waging offensive war. In the case of peer adversaries, this increases the likelihood of 'low intensity' conflicts which risk escalation to broader warfare. In the case of non-peer adversaries, it reduces the domestic blowback to wars of aggression. This effect can occur regardless of other ethical issues around the use of military AI such as the risk of civilian casualties, and does not require any superhuman AI capabilities. Further, the military value of AWS raises the specter of an AI-powered arms race and the misguided imposition of national security restrictions on AI research. Our goal in this paper is to raise awareness among the public and ML researchers on the near-future risks posed by full or near-full autonomy in military technology, and we provide regulatory suggestions to mitigate these risks. We call upon AI policy experts and the defense AI community in particular to embrace transparency and caution in their development and deployment of AWS to avoid the negative effects on global stability and AI research that we highlight here.

Key Words: Autonomous Weapons Systems, Military AI, AI Risks

Poster

Inconsistencies in Artificial Intelligence Strategy Alignment of NATO Member States

Itai Epstein, Dane Malenfant, Sara Parker, Cella Wardrop

Short Abstract: The increasing discourse of the use of AI in military applications, from autonomous weapons to surveillance, raises global security concerns, particularly due to the secrecy surrounding these technologies. Uncertainty about AI's role in warfare could fuel an arms race similar to mid-20th century nuclear proliferation. Our research examines NATO member states' willingness to cooperate on military AI, analyzing public policies and statements in official reports and documents with a table. While only 34% of NATO members have specific military AI policies, 88% have national AI strategies, highlighting a broader interest in AI but limited focus on its military use. Future work will explore correlations between military expenditures, R&D investments, and AI policy development.

Key Words: NATO, policy, military, AI, cybersecurity, digital

Poster

DisasterQA: A Benchmark for Assessing the performance of LLMs in Disaster Response

Rajat Rawat, Kevin Zhu

Short Abstract: The military plays a key role in Humanitarian Assistance and Disaster Relief. Disasters can result in the deaths of many, making quick response times vital. Large Language Models (LLMs) have emerged as valuable in the field. LLMs can be used to process vast amounts of textual information quickly providing situational context during a disaster. However, the question remains whether LLMs should be used for advice and decision-making in a disaster. To evaluate the capabilities of LLMs in disaster response knowledge, we introduce a benchmark: DisasterQA created from six online sources. The benchmark covers a wide range of disaster response topics. We evaluated five LLMs each with four different prompting methods on our benchmark, measuring both accuracy and confidence levels. We hope that this benchmark pushes forth further development of LLMs in disaster response, ultimately enabling these models to work alongside emergency managers in disasters.

Key Words: Humanitarian Assistance and Disaster Relief, Large Language Models, Emergency Response, Disaster Management, Benchmark

Poster

MILITARISING ML: FUNCTIONALITY & HARM

Mel Andrews, Andrew J Smart

Short Abstract: As the purview of artificial intelligence (AI) or machine learning (ML) continues to expand, so does the use of AI/ML in military application. How do we characterise the dimensions of ethical consideration relevant to military uses of ML? We propose that any normative deliberation over the use of ML in warfare must begin with an understanding of the function of the technology and the full range of harms that might flow from its (mis)use. In the first place, it is necessary to understand that the methods of ML are tools for data analysis. As such, they are “epistemic technologies” (c.f., Alvarado, 2023). The methods of ML are therefore to be understood as techniques wielded by human beings towards the ends of gaining information about the world. ML-based systems are not autonomous reasoners, decision-makers, or actors, and any discussion of the ethics of their use in warfare must not treat them as such, at risk of obfuscating or wrongfully absolving human responsibility. Of equal importance is the development of a taxonomy of potential harms which might flow from the military use of ML. We propose that, at the highest level, we ought to distinguish between use cases in which potential harms are specific to the use of ML versus those agnostic to the involvement of ML, what we term means-dependent and means-independent harms. Within the category of means-dependent harms, it is crucial to distinguish between harms which flow from the technology functioning “as intended” versus those resulting from malfunction (c.f., Raji, Kumar, Horowitz, & Selbst, 2022). In attempting to understand how the functionality of ML systems relates to their potential for harm, it is important to recognise in which cases the learning problem is, as specified, not feasible in principle. Problem misspecification and attempts to use ML to accomplish misguided or impossible epistemic tasks pose one of the greatest ethical risks for the use of ML in any domain (c.f., Andrews, Smart, & Birhane, 2024). Lastly, we highlight the role of “AI exceptionalism;” the assumption that the involvement of AI/ML methods makes possible tasks which are widely understood to be impossible, or renders ethical applications of interventions which are, in general, regarded as unethical. We view the discursive role played by AI/ML in modern military operations as an instance of this “AI exceptionalism” (c.f, Fang, 2024; Weirich, 2024).

Key Words: AI, ML, military, functionality, ethics, harms, philosophy

Poster

From Risk to Regulation: Navigating US-China Trust Gaps and Risk Mismatches in Military AI

Jean Dong, Ahmed Mehdi Inane, Riley Simmons-Edler, Ryan Paul Badman

Short Abstract: Lethal Autonomous Weapons (LAWS), defined by the United Nations as systems capable of independently identifying, selecting, and engaging targets without human intervention, represent a transformative yet controversial domain of modern military technology. Since the rise of systems like the Predator drone and earlier autonomous units such as the Israeli Harpy, international calls for regulation have grown. However, most regulatory efforts, including those led by the UN and NGOs, have focused on blanket bans, which global powers often find overly restrictive and impractical. As a result, no significant global agreements on LAWS have been reached. The proliferation of AI-powered LAWS in conflict zones, combined with advancements in military AI, has exacerbated the urgency for effective regulation. Current proposals have struggled to keep pace with AI advancements, often failing to distinguish between AI-enabled autonomous systems and older, remote-controlled technologies. Diverging perspectives from global superpowers further complicate the regulatory landscape: while the United States has resisted nuanced definitions, China has emphasized regulating only the most advanced AI-enabled LAWS. These fragmented approaches and policy failures reflect the broader breakdown of dialogue, rooted in mismatched categorizations and inadequate assessments of AI's unique challenges relative to traditional warfare. This paper identifies trust gaps and risk mismatches in military AI between China and the United States as well as military AI applications that pose significant ethical and security risks, advocating for immediate international consensus to ban or limit these high-risk cases. By establishing clear norms and precedents, the global community can build a foundation for more comprehensive regulation of near-future military AI technologies. This approach aims to bridge trust gaps, mitigate the arms race between the U.S. and China, and promote responsible governance of military AI.

Key Words: Artificial Intelligence, Lethal Autonomous Weapon Systems, Regulation, China, United States

Poster

Accepted Abstracts

Please find a list of the accepted abstracts.

Short talks:

Posters: