DOI: https://doi.org/10.1145/3501385.3543957 · ICER Conference Proceedings
Research Article · Open Access · Best Paper

Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models

Published: 03 August 2022

ABSTRACT

This article explores the natural language generation capabilities of large language models, applied to producing two types of learning resources common in programming courses. Using OpenAI Codex as the large language model, we create programming exercises (including sample solutions and test cases) and code explanations, assessing these qualitatively and quantitatively. Our results suggest that the majority of the automatically generated content is both novel and sensible, and in some cases ready to use as is. When creating exercises, we find that it is remarkably easy to influence both the programming concepts and the contextual themes they contain, simply by supplying keywords as input to the model. Our analysis suggests that there is significant value in massive generative machine learning models as a tool for instructors, although some oversight remains necessary to ensure the quality of the generated content before it is delivered to students. We further discuss the implications of OpenAI Codex and similar tools for introductory programming education and highlight future research streams that have the potential to improve the quality of the educational experience for teachers and students alike.
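
To make the keyword-priming idea concrete, here is a minimal sketch of how an instructor might supply concept and theme keywords to a Codex-style completion endpoint, written against the legacy pre-1.0 openai Python package that was current when this article was published. The prompt layout, the "Keywords:"/"Theme:" labels, the helper name generate_exercise, and the model identifier "code-davinci-002" are illustrative assumptions, not the authors' exact setup.

import os

import openai

# Legacy (pre-1.0) openai package interface; the library has since changed.
openai.api_key = os.environ["OPENAI_API_KEY"]

def generate_exercise(concepts, theme):
    """Ask a Codex-style model for a themed exercise with solution and tests.

    The prompt format below is an illustrative assumption, not the
    authors' exact priming scheme.
    """
    prompt = (
        f"Keywords: {', '.join(concepts)}\n"
        f"Theme: {theme}\n"
        "Write a Python programming exercise with a problem statement,\n"
        "a sample solution, and test cases.\n"
        "Problem statement:\n"
    )
    response = openai.Completion.create(
        model="code-davinci-002",  # assumed Codex-family model name
        prompt=prompt,
        max_tokens=512,
        temperature=0.7,  # nonzero temperature varies output across calls
    )
    return response["choices"][0]["text"]

if __name__ == "__main__":
    # The concept keywords steer which constructs the exercise practices;
    # the theme keyword steers the contextual cover story.
    print(generate_exercise(["for loop", "list"], "fishing"))

Varying the concept keywords changes which programming constructs appear in the generated exercise, while the theme keyword changes its cover story; repeated calls at a nonzero temperature produce different candidate exercises for the instructor to vet.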

