The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming

Authors:
James Finnie-Ansley, The University of Auckland, New Zealand
Paul Denny, The University of Auckland, New Zealand
Brett A. Becker, University College Dublin, Ireland
Andrew Luxton-Reilly, The University of Auckland, New Zealand
James Prather, Abilene Christian University, United States

ACE '22: Proceedings of the 24th Australasian Computing Education Conference, February 2022, Pages 10–19. https://doi.org/10.1145/3511861.3511863

Published: 14 February 2022

ABSTRACT

Recent advances in artificial intelligence have been driven by an exponential growth in digitised data. Natural language processing, in particular, has been transformed by machine learning models such as OpenAI’s GPT-3, which generates human-like text so realistic that its developers have warned of the dangers of its misuse. In recent months OpenAI released Codex, a new deep learning model trained on Python code from more than 50 million GitHub repositories. Provided with a natural language description of a programming problem as input, Codex generates solution code as output. It can also explain (in English) input code, translate code between programming languages, and more. In this work, we explore how Codex performs on typical introductory programming problems. We report its performance on real questions taken from introductory programming exams and compare it to results from students who took these same exams under normal conditions, demonstrating that Codex outscores most students. We then explore how Codex handles subtle variations in problem wording using several published variants of the well-known “Rainfall Problem” along with one unpublished variant we have used in our teaching. We find the model passes many test cases for all variants. We also explore how much variation there is in the Codex-generated solutions, observing that an identical input prompt frequently leads to very different solutions in terms of algorithmic approach and code length. Finally, we discuss the implications that such technology will have for computing education as it continues to evolve, including both challenges and opportunities.
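
For readers unfamiliar with the Rainfall Problem, the sketch below shows one common formulation in Python: average the non-negative values that appear before a sentinel. The function name `average_rainfall`, the sentinel value 99999, and the choice to return 0.0 when no valid readings exist are illustrative assumptions. Published variants differ on these details, and this is neither code generated by Codex nor code taken from the study.

```python
from typing import Iterable

# Assumed sentinel value; published variants of the problem use different ones.
SENTINEL = 99999


def average_rainfall(readings: Iterable[float], sentinel: float = SENTINEL) -> float:
    """Average the non-negative readings that appear before the sentinel.

    Negative readings are skipped as invalid. If no valid reading is seen,
    0.0 is returned (an assumption; some variants print a message instead).
    """
    total = 0.0
    count = 0
    for value in readings:
        if value == sentinel:
            break  # stop at the sentinel; anything after it is ignored
        if value < 0:
            continue  # skip invalid (negative) readings
        total += value
        count += 1
    return total / count if count else 0.0
```

Small changes in wording, such as whether negative values are ignored or what happens on empty input, change which branches a correct solution needs, which is why the problem is a useful probe of sensitivity to problem wording.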

Index Terms

1. Computing methodologies
  1. Machine learning
2. Social and professional topics
  1. Professional topics
    1. Computing education
      1. Computing education programs
        Computer science education

Index terms have been assigned to the content through auto-classification.

Published in
ACE '22: Proceedings of the 24th Australasian Computing Education Conference
February 2022, 200 pages
ISBN: 9781450396431
DOI: 10.1145/3511861
Editors: Judy Sheard (Monash University), Paul Denny (The University of Auckland)
Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution-NoDerivatives International 4.0 License.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
AI
CS1
Codex
GPT-3
GitHub
OpenAI
academic integrity
artificial intelligence
code generation
code writing
copilot
deep learning
introductory programming
machine learning
neural networks
novice programming
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 161 of 359 submissions, 45%