DOI: 10.1145/3511861.3511863
research-article
Open Access

The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming

Published: 14 February 2022

ABSTRACT

Recent advances in artificial intelligence have been driven by an exponential growth in digitised data. Natural language processing, in particular, has been transformed by machine learning models such as OpenAI’s GPT-3, which generates human-like text so realistic that its developers have warned of the dangers of its misuse. In recent months OpenAI released Codex, a new deep learning model trained on Python code from more than 50 million GitHub repositories. Provided with a natural language description of a programming problem as input, Codex generates solution code as output. It can also explain (in English) input code, translate code between programming languages, and more. In this work, we explore how Codex performs on typical introductory programming problems. We report its performance on real questions taken from introductory programming exams and compare it to results from students who took these same exams under normal conditions, demonstrating that Codex outscores most students. We then explore how Codex handles subtle variations in problem wording using several published variants of the well-known “Rainfall Problem” along with one unpublished variant we have used in our teaching. We find the model passes many test cases for all variants. We also explore how much variation there is in the Codex-generated solutions, observing that an identical input prompt frequently leads to very different solutions in terms of algorithmic approach and code length. Finally, we discuss the implications that such technology will have for computing education as it continues to evolve, including both challenges and opportunities.
