Article type: Research paper - applied

Authors

1 Department of Computer Engineering, Islamic Azad University, Rasht Branch, Guilan, Iran.

2 Department of Computer Engineering, Ayandegan Institute of Higher Education, Tonekabon, Iran.

Abstract

Purpose: Automatic grading of descriptive (short-answer) exams is the process of automatically assessing answers to text-based questions using computational and machine learning methods. The expanding use of intelligent tutoring systems and the importance of assessment have increased the need for automated exam-grading systems more than ever.
Methodology: Since, in the automatic grading process, the textual answers provided by students are compared with an ideal answer according to their degree of similarity, techniques for computing semantic relatedness and similarity between texts can also be used for this purpose. Accordingly, this paper first compares different semantic relatedness measures in the application of automatic assessment of descriptive exams and examines the effect of the domain and size of the background knowledge source on the accuracy of the algorithms. Then, an approach for improving the performance of the automatic grading system is introduced that uses the answers of test-takers who received the highest scores as feedback.
Findings: To evaluate the efficiency of semantic similarity and relatedness measures in automatic grading of descriptive exams and the performance of the proposed model, experiments were conducted on the dataset provided by Mohler and Mihalcea, which contains 7 questions with 630 descriptive answers.
Originality/Scientific added value: Based on the experimental results, not only do semantic relatedness measures perform well in the automatic assessment of descriptive exams, but using automatic feedback can also considerably increase the accuracy and efficiency of semantic relatedness measures for this purpose.

Keywords

Subjects

Article title [English]

Automatic assessment of short answers based on computational and data mining approaches

Authors [English]

  • Hossein Sadr 1
  • Mojdeh Nazari Soleimandarabi 1
  • Zeinab Khodaverdian 2

1 Department of Computer Engineering, Islamic Azad University, Rasht Branch, Guilan, Iran.

2 Department of Computer Engineering, Ayandegan Institute of Higher Education, Tonekabon, Iran.

Abstract [English]

Purpose: Automatic short answer grading is the task of automatically assessing natural-language answers using computational methods and machine learning algorithms. The proliferation of large-scale intelligent education systems and the importance of assessment as a key factor in the education process have increased the need for highly flexible automated systems for scoring exams.
Methodology: Since, in the process of automatic short answer grading, a student's answer is compared with an ideal response and scored according to their similarity, semantic relatedness and similarity measures can also be employed for this purpose. To this end, several semantic relatedness and similarity measures are first compared in the application of short answer grading. Subsequently, a method for improving the performance of short answer grading systems based on semantic relatedness and similarity measures is proposed, which leverages the students' answers that received the highest scores as feedback.
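To make the grading and feedback steps concrete, the sketch below illustrates the general idea under stated assumptions; it is not the implementation used in the paper. TF-IDF cosine similarity stands in for the knowledge-based and corpus-based relatedness measures that the study actually compares, and the names score_answers and score_with_feedback, the top_k parameter, and the toy answers are hypothetical.

```python
# Minimal sketch: similarity-based short answer grading with answer feedback.
# Assumption: TF-IDF cosine similarity is used as the text-relatedness measure;
# the paper compares several knowledge-based and corpus-based measures instead.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def score_answers(reference, answers, max_grade=5.0):
    """Grade each answer by its cosine similarity to the reference answer."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([reference] + list(answers))
    similarities = cosine_similarity(matrix[0], matrix[1:]).ravel()
    return [float(s) * max_grade for s in similarities]


def score_with_feedback(reference, answers, max_grade=5.0, top_k=3):
    """Two-pass grading: append the top-scored answers from the first pass
    to the reference answer (feedback) and grade everything again."""
    first_pass = score_answers(reference, answers, max_grade)
    top = sorted(range(len(answers)), key=lambda i: first_pass[i], reverse=True)[:top_k]
    expanded_reference = " ".join([reference] + [answers[i] for i in top])
    return score_answers(expanded_reference, answers, max_grade)


if __name__ == "__main__":
    reference = "A stack is a data structure with last-in, first-out access."
    student_answers = [
        "A stack stores items in last-in, first-out order.",
        "It is a structure where elements leave in arrival order.",
        "A last-in, first-out container supporting push and pop.",
    ]
    print(score_with_feedback(reference, student_answers, top_k=2))
```

In the paper's setting, WordNet- or Wikipedia-based relatedness measures (see the references below) would take the place of the TF-IDF similarity used here, and the feedback step corresponds to reusing the highest-scored student answers as additional reference material.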
Findings: To evaluate the performance of semantic relatedness and similarity methods in the application of automatic short answer grading, as well as the proposed model, various experiments were conducted on the Mohler and Mihalcea dataset, which contains 7 questions and 630 answers.
Originality/Value: Based on the empirical results, not only do semantic relatedness and similarity measures perform well in automatic short answer grading, but using students' answers as feedback can also considerably improve their accuracy and performance for this task.

Keywords [English]

  • Data mining approaches
  • Short answer grading
  • Semantic relatedness
  • Semantic similarity
References

Budanitsky, A., & Hirst, G. (2006). Evaluating wordnet-based measures of lexical semantic relatedness. Computational linguistics, 32(1), 13-47. https://doi.org/10.1162/coli.2006.32.1.13
Burrows, S., Gurevych, I., & Stein, B. (2015). The eras and trends of automatic short answer grading. International journal of artificial intelligence in education, 25(1), 60-117. https://doi.org/10.1007/s40593-014-0026-8
Dumais, S. T. (2004). Latent semantic analysis. Annual review of information science and technology, 38(1), 188-230. https://doi.org/10.1002/aris.1440380105
Filighera, A., Steuer, T., & Rensing, C. (2020, July). Fooling automatic short answer grading systems. International conference on artificial intelligence in education (pp. 177-190). Cham: Springer. https://doi.org/10.1007/978-3-030-52237-7_15
Gabrilovich, E., & Markovitch, S. (2009). Wikipedia-based semantic interpretation for natural language processing. Journal of artificial intelligence research, 34, 443-498. https://doi.org/10.1613/jair.2669
Hirst, G., & St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. WordNet: An electronic lexical database, 305, 305-332.
Jarmasz, M., & Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. In N. Nicolov, K. Bontcheva, G. Angelova, & R. Mitkov (Eds.), Recent advances in natural language processing III. John Benjamins Publishing Co.
Jarmasz, M., & Szpakowicz, S. (2012). Roget's Thesaurus and semantic similarity. Proceedings of conference on recent advances in natural language processing. arXiv:1204.0245
Jiang, J. J., & Conrath, D. W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg/9709008.
Leacock, C., & Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. WordNet: An electronic lexical database, 49(2), 265-283.
Lee, Y. Y., Ke, H., Yen, T. Y., Huang, H. H., & Chen, H. H. (2020). Combining and learning word embedding with WordNet for semantic relatedness and similarity measurement. Journal of the association for information science and technology, 71(6), 657-670. https://doi.org/10.1002/asi.24289
Lesk, M. (1986, June). Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. Proceedings of the 5th annual international conference on systems documentation, 24-26. https://doi.org/10.1145/318723.318728
Li, P., Xiao, B., Ma, W., Jiang, Y., & Zhang, Z. (2017). A graph-based semantic relatedness assessment method combining wikipedia features. Engineering applications of artificial intelligence, 65, 268-281. https://doi.org/10.1016/j.engappai.2017.07.027
Lin, D. (1998). An information-theoretic definition of similarity. The fifteenth international conference on machine learning (pp. 296-304). https://dl.acm.org/doi/10.5555/645527.657297
Mihalcea, R., Corley, C., & Strapparava, C. (2006, July). Corpus-based and knowledge-based measures of text semantic similarity. Proceedings of the 21st national conference on artificial intelligence (pp. 775-780). https://dl.acm.org/doi/10.5555/1597538.1597662
Mohler, M., & Mihalcea, R. (2009, March). Text-to-text semantic similarity for automatic short answer grading. The 12th conference of the european chapter of the ACL (pp. 567-575). Athens, Greece. https://doi.org/10.3115/1609067.1609130
Mohler, M., Bunescu, R., & Mihalcea, R. (2011, June). Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. The 49th annual meeting of the association for computational linguistics: Human language technologies (pp. 752-762). https://dl.acm.org/doi/10.5555/2002472.2002568
Nazari Soleimandarabi, M., Mirroshandel, S. A., & Sadr, H. (2015a). The significance of semantic relatedness and similarity measures in geographic information science. International journal of computer science and network solutions, 3(2), 12-23.
Nazari Soleimandarabi, M., Mirroshandel, S. A., & Sadr, H. (2015b). A Survey of semantic relatedness measures. International journal of computer science and network solutions, 3(2), 1-11.
Patwardhan, S., & Pedersen, T. (2006). Using WordNet-based context vectors to estimate the semantic relatedness of concepts. The workshop on making sense of sense: bringing psycholinguistics and computational linguistics together.    
Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004, July). WordNet::Similarity: Measuring the relatedness of concepts. HLT-NAACL--Demonstrations '04: Demonstration papers at HLT-NAACL 2004 (pp. 38-41). Association for Computational Linguistics.
Peinelt, N., Nguyen, D., & Liakata, M. (2020, July). tBERT: Topic models and BERT joining forces for semantic similarity detection. Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 7047-7055). 
Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. Proceedings of LREC 2010 workshop new challenges for NLP frameworks. Valletta, Malta: University of Malta. https://is.muni.cz/publication/884893/en/Software-Framework-for-Topic-Modelling-with-Large-Corpora/Rehurek-Sojka
Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007
Roitman, H., & Kurland, O. (2019, July). Query performance prediction for pseudo-feedback-based retrieval.  The 42nd international ACM SIGIR conference on research and development in information retrieval (pp. 1261-1264). https://doi.org/10.1145/3331184.3331369  
Roy, S., Rajkumar, A., & Narahari, Y. (2018). Selection of automatic short answer grading techniques using contextual bandits for different evaluation measures. International journal of advances in engineering sciences and applied mathematics, 10(1), 105-113. https://doi.org/10.1007/s12572-017-0202-9
Sadr, H., & Nazari Solimandarabi, M. (2019). Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures. Journal of advances in computer research, 10(2), 17-30.
Sadr, H., Nazari, M., Pedram, M. M., & Teshnehlab, M. (2019a). Exploring the efficiency of topic-based models in computing semantic relatedness of geographic terms. International journal of web research, 2(2), 23-35.
Sadr, H., Pedram, M. M., & Teshnehlab, M. (2019c). A robust sentiment analysis method based on sequential combination of convolutional and recursive neural networks. Neural processing letters, 50(3), 2745-2761. https://doi.org/10.1007/s11063-019-10049-1
Sadr, H., Pedram, M. M., & Teshnehlab, M. (2020). Multi-view deep network: A deep model based on learning features from heterogeneous neural networks for sentiment analysis. IEEE access, 8, 86984-86997. https://doi.org/10.1109/ACCESS.2020.2992063
Sadr, H., Pedram, M. M., & Teshnehlab, M. (2021). Convolutional neural network equipped with attention mechanism and transfer learning for enhancing performance of sentiment analysis. Journal of AI and data mining, 9(2), 141-151. https://doi.org/10.22044/jadm.2021.9618.2100
Sadr, H., Pedram, M. M., & Teshnelab, M. (2019b). Improving the performance of text sentiment analysis using deep convolutional neural network integrated with hierarchical attention layer. International journal of information and communication technology research, 11(3), 57-67. http://ijict.itrc.ac.ir/article-1-416-en.html
Shermis, M. D., & Burstein, J. (Eds.). (2013). Handbook of automated essay evaluation: Current applications and new directions. Routledge.
Strube, M., & Ponzetto, S. P. (2006, July). WikiRelate! Computing semantic relatedness using Wikipedia. AAAI'06 Proceedings of the 21st national conference on Artificial intelligence (pp. 1419-1424).  https://dl.acm.org/doi/10.5555/1597348.1597414
Süzen, N., Gorban, A. N., Levesley, J., & Mirkes, E. M. (2020). Automatic short answer grading and feedback using text mining methods. Procedia computer science, 169, 726-743. https://doi.org/10.1016/j.procs.2020.02.171
Taieb, M. A. H., Aouicha, M. B., & Hamadou, A. B. (2013). Computing semantic relatedness using Wikipedia features. Knowledge-based systems, 50, 260-278. https://doi.org/10.1016/j.knosys.2013.06.015
Taieb, M. A. H., Zesch, T., & Aouicha, M. B. (2020). A survey of semantic relatedness evaluation datasets and procedures. Artificial intelligence review, 53(6), 4407-4448. https://doi.org/10.1007/s10462-019-09796-3
Witten, I. H., & Milne, D. N. (2008). An effective, low-cost measure of semantic relatedness obtained from Wikipedia links.
Wu, Z., & Palmer, M. (1994). Verbs semantics and lexical selection. The 32nd annual meeting of the Association for Computational Linguistics. arXiv preprint cmp-lg/9406033.
Young, J. R. (2012). Inside the Coursera contract: How an upstart company might profit from free courses. The chronicle of higher education, 19(07), 2012.
Zesch, T., & Gurevych, I. (2010). Wisdom of crowds versus wisdom of linguists–measuring the semantic relatedness of words. Natural language engineering, 16(1), 25-59.
Zhang, L., Huang, Y., Yang, X., Yu, S., & Zhuang, F. (2019). An automatic short-answer grading model for semi-open-ended questions. Interactive learning environments, 1-14. https://doi.org/10.1080/10494820.2019.1648300
Zhang, Y., Lin, C., & Chi, M. (2020). Going deeper: Automatic short-answer grading by combining student and question models. User modeling and user-adapted interaction, 30(1), 51-80. https://doi.org/10.1007/s11257-019-09251-6
Zhang, Z., Gentile, A. L., & Ciravegna, F. (2013). Recent advances in methods of lexical semantic relatedness–a survey. Natural language engineering, 19(4), 411-479.
Zhu, X., Guo, Q., Zhang, B., & Li, F. (2019). An efficient approach for measuring semantic relatedness using Wikipedia bidirectional links. Applied intelligence, 49(10), 3708-3730. https://doi.org/10.1007/s10489-019-01452-1