Automated Answer Scoring for Engineering’s Open-Ended Questions

  • Muhammad S Ahmed Eastern Michigan University
Keywords: Audience Response System, Close-Ended Questions, Automated Text Scoring, Automated Essay Evaluation, Natural Language Processing


Audience Response Systems (ARS), such as "clickers," have proven effective in engaging students and enhancing their learning. Apart from close-ended questions, an ARS can help instructors pose open-ended questions. Such questions are not scored automatically by the ARS; Automated Text Scoring (ATS) is widely used for that purpose. This paper presents findings from the development of an intelligent Automated Text Scoring system, iATS, which provides instantaneous scoring of students' responses to STEM-related factual questions. iATS is integrated with an ARS known as iRes, which captures students' responses in a traditional classroom environment using smartphones. The research coded and tested three Natural Language Processing (NLP) text similarity methods. The code was developed in PHP and Python environments. Experiments were performed to compare the scores from cosine similarity, the Jaccard index, and corpus-based and knowledge-based measures (CKM) against the instructor's manual grades. The results suggest that cosine similarity and the Jaccard index underestimate the scores, with errors of 22% and 26%, respectively. CKM has a lower error (18%) but overestimates the scores. It is concluded that the code needs to be modified to use a corpus developed within the knowledge domain, and that a new regression model should be created to improve the accuracy of automatic scoring.
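As an illustration of two of the similarity measures named in the abstract, the following minimal Python sketch scores a student response against a reference answer using cosine similarity over term-frequency vectors and the Jaccard index over token sets. It is not the iATS code: the function names, the simple regular-expression tokenizer, and the sample sentences are assumptions made for this example, and a production scorer would likely add steps such as stop-word removal or stemming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    Assumption: a plain regex tokenizer, chosen only to keep the sketch
    self-contained; it is not the tokenizer used by iATS.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(answer, reference):
    """Cosine of the angle between the two term-frequency vectors."""
    a = Counter(tokenize(answer))
    b = Counter(tokenize(reference))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def jaccard_index(answer, reference):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a = set(tokenize(answer))
    b = set(tokenize(reference))
    union = a | b
    if not union:
        return 0.0  # both texts are empty
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical reference answer and student response.
    reference = "Stress is force divided by area"
    student = "Stress equals force per unit area"
    print(round(cosine_similarity(student, reference), 3))
    print(round(jaccard_index(student, reference), 3))
```

In this hypothetical pair, the two sentences share only three of their six words each, so both measures return a low similarity even though the answers mean the same thing. That behavior is consistent with the underestimation reported for these two methods, since neither measure recognizes synonyms or paraphrases.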



Author Biography

Muhammad S Ahmed, Eastern Michigan University

118 Sill Hall, Eastern Michigan University. Ypsilanti, MI. USA


How to Cite
Ahmed, M. S. (2019). Automated Answer Scoring for Engineering’s Open-Ended Questions. International Journal of Research in Education Methodology, 10, 3398-3406.