ON APPROACHES TO DETERMINING A DISSIMILARITY MEASURE FOR TEXT DATA
SCIENTIFIC JOURNAL BULLETIN OF VORONEZH INSTITUTE OF HIGH TECHNOLOGIES
Online media
ISSN 2949-4443

ABOUT APPROACHES TO DETERMINE THE MEASURE OF DISSIMILARITY IN TEXT DATA

Reshetnikov A.D.  

UDC 004.6


The volume of textual data is growing rapidly, which makes processing it increasingly important. This article reviews ways to obtain a similarity/dissimilarity measure for text data. It presents methods that account for the lexical similarity of strings as well as the semantic discrepancy between texts.
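As an illustration of what a lexical dissimilarity measure between strings can look like, the sketch below implements two well-known ones: edit (Levenshtein) distance and a Dice-coefficient dissimilarity over character bigrams. These were chosen as common examples and are not taken from the article's full text; the function names `levenshtein`, `bigrams`, and `dice_dissimilarity` are hypothetical, and the choice of bigrams and of returning 0.0 for two strings without bigrams are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set:
    """Set of adjacent character pairs in s (assumption: n = 2)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_dissimilarity(a: str, b: str) -> float:
    """1 minus the Dice coefficient of the two bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0  # assumption: strings with no bigrams count as identical
    return 1.0 - 2 * len(x & y) / (len(x) + len(y))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))      # 3
    print(dice_dissimilarity("night", "nacht"))  # 0.75
```

Both functions compare surface forms only: "car" and "automobile" come out as highly dissimilar even though they mean the same thing, which is the gap that semantic measures are meant to address.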

1. Yunianta A. Semantic data mapping technology to solve semantic data problem on heterogeneity aspect / A. Yunianta, O. M. Barukab, N. Yusof, N. Dengen, H. Haviluddin, M. S. Othman // International Journal of Advances in Intelligent Informatics. – 2017. – vol. 3, no. 3. – pp. 161–172.

2. Hidayat E. Y. Automatic Text Summarization Using Latent Drichlet Allocation (LDA) for Document Clustering / E. Y. Hidayat, F. Firdausillah, K. Hastuti, I. N. Dewi, A. Azhari // International Journal of Advances in Intelligent Informatics. – 2015. – vol. 1, no. 3. – p. 132.

3. Hall P. A. V. Approximate string matching / P. A. V. Hall, G. R. Dowling // Computing Surveys. – 1980. – vol. 12, no. 4. – pp. 381–402.

4. Jaro M. A. Advances in record linkage methodology as applied to the 1985 census of Tampa, Florida / M. A. Jaro // Journal of the American Statistical Association. – 1989. – vol. 84, no. 406. – pp. 414–420.

5. Jaro M. A. Probabilistic linkage of large public health data files / M. A. Jaro // Statistics in Medicine. – 1995. – vol. 14. – pp. 491–498.

6. Kondrak G. N-gram similarity and distance / G. Kondrak // International symposium on string processing and information retrieval. – 2005. – pp. 115–126.

7. Yu M. String similarity search and join: a survey / M. Yu, G. Li, D. Deng, J. Feng // Frontiers of Computer Science. – 2016. – vol. 10, no. 3. – pp. 399–417.

8. Krause E. F. Taxicab Geometry / E. F. Krause. – Dover Publications, 1987. – 96 p.

9. Dice L. R. Measures of the Amount of Ecologic Association Between Species / L. R. Dice // Ecology. – 1945. – vol. 26, no. 3. – pp. 297–302.

10. Lund K. Semantic and associative priming in high-dimensional semantic space / K. Lund // Proc. of the 17th Annual Conference of the Cognitive Science Society. – 1995. – pp. 660–665.

11. Landauer T. K. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge / T. K. Landauer, S. T. Dumais // Psychological Review. – 1997. – vol. 104, no. 2. – pp. 211–240.

12. Gabrilovich E. Computing semantic relatedness using Wikipedia-based explicit semantic analysis / E. Gabrilovich, S. Markovitch // IJCAI. – 2007. – vol. 7. – pp. 1606–1611.

13. Mihalcea R. Corpus-based and knowledge-based measures of text semantic similarity / R. Mihalcea, C. Corley, C. Strapparava // American Association for Artificial Intelligence. – 2006. – vol. 6. – pp. 775–780.

14. Slimani T. Description and Evaluation of Semantic Similarity Measures Approaches / T. Slimani // International Journal of Computer Applications. – 2013. – vol. 80, no. 10. – pp. 25–33.

15. Tversky A. Features of similarity / A. Tversky // Psychological Review. – 1977. – vol. 84, no. 4. – pp. 327–352.

Reshetnikov A. D.


Voronezh, Russia

Keywords: similarity / dissimilarity measure, semantic similarity, lexical affinity, text data

For citation: Reshetnikov A.D. ABOUT APPROACHES TO DETERMINE THE MEASURE OF DISSIMILARITY IN TEXT DATA. Bulletin of the Voronezh Institute of High Technologies. 2019;13(3). Available from: https://vestnikvivt.ru/ru/journal/pdf?id=968 (In Russ.).



Published 30.09.2019