DOI: https://doi.org/10.1145/3501385.3543957 · ICER Conference Proceedings
Research Article · Open Access · Best Paper

Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models

Published: 03 August 2022

ABSTRACT

This article explores the natural language generation capabilities of large language models, applied to producing two types of learning resources common in programming courses. Using OpenAI Codex as the large language model, we create programming exercises (including sample solutions and test cases) and code explanations, assessing these qualitatively and quantitatively. Our results suggest that the majority of the automatically generated content is both novel and sensible, and in some cases ready to use as is. When creating exercises, we find that it is remarkably easy to influence both the programming concepts and the contextual themes they contain, simply by supplying keywords as input to the model. Our analysis suggests that there is significant value in massive generative machine learning models as a tool for instructors, although some oversight remains necessary to ensure the quality of the generated content before it is delivered to students. We further discuss the implications of OpenAI Codex and similar tools for introductory programming education and highlight future research streams that have the potential to improve the quality of the educational experience for teachers and students alike.
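
To make the keyword-priming idea concrete, here is a minimal sketch of how an instructor might supply concept and theme keywords to a Codex-style completion endpoint, written against the legacy pre-1.0 openai Python package that was current when this article was published. The prompt layout, the "Keywords:"/"Theme:" labels, the helper name generate_exercise, and the model identifier "code-davinci-002" are illustrative assumptions, not the authors' exact setup.

import os

import openai

# Legacy (pre-1.0) openai package interface; the library has since changed.
openai.api_key = os.environ["OPENAI_API_KEY"]

def generate_exercise(concepts, theme):
    """Ask a Codex-style model for a themed exercise with solution and tests.

    The prompt format below is an illustrative assumption, not the
    authors' exact priming scheme.
    """
    prompt = (
        f"Keywords: {', '.join(concepts)}\n"
        f"Theme: {theme}\n"
        "Write a Python programming exercise with a problem statement,\n"
        "a sample solution, and test cases.\n"
        "Problem statement:\n"
    )
    response = openai.Completion.create(
        model="code-davinci-002",  # assumed Codex-family model name
        prompt=prompt,
        max_tokens=512,
        temperature=0.7,  # nonzero temperature varies output across calls
    )
    return response["choices"][0]["text"]

if __name__ == "__main__":
    # The concept keywords steer which constructs the exercise practices;
    # the theme keyword steers the contextual cover story.
    print(generate_exercise(["for loop", "list"], "fishing"))

Varying the concept keywords changes which programming constructs appear in the generated exercise, while the theme keyword changes its cover story; repeated calls at a nonzero temperature produce different candidate exercises for the instructor to vet.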

