Keywords: similarity / dissimilarity measure, semantic similarity, lexical affinity, text data
ABOUT APPROACHES TO DETERMINE THE MEASURE OF DISSIMILARITY IN TEXT DATA
UDC 004.6
The rapid growth of textual data makes its processing increasingly important. This article reviews approaches to measuring similarity and dissimilarity in text data. The methods presented cover both the lexical similarity of strings and the semantic dissimilarity of texts.
For citation: Reshetnikov A.D. ABOUT APPROACHES TO DETERMINE THE MEASURE OF DISSIMILARITY IN TEXT DATA. Bulletin of the Voronezh Institute of High Technologies. 2019;13(3). Available from: https://vestnikvivt.ru/ru/journal/pdf?id=968 (In Russ).
Published 30.09.2019