Desarrollo de un sistema de análisis de autoría de textos de literatura de autores hispanohablantes

Borja Macías, David Elías

dc.contributor.advisor	Martínez Quezada, Daniel Orlando
dc.contributor.advisor	Ortiz Beltrán, Ariel Orlando
dc.contributor.author	Borja Macías, David Elías
dc.coverage.spatial	Colombia	spa
dc.date.accessioned	2021-08-26T19:38:22Z
dc.date.available	2021-08-26T19:38:22Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/20.500.12749/14040
dc.description.abstract	Tras el notorio auge de aplicaciones de Machine Learning en los últimos años, mayoritariamente en el procesamiento de imágenes y audio, son pocas las aplicaciones en el área de la literatura, especialmente en el reconocimiento de autoría. Por eso surge la pregunta: ¿qué tan efectivas son las técnicas de Machine Learning para la identificación de patrones en grandes volúmenes de textos literarios en el contexto hispanoamericano? Por ende, el objetivo de este trabajo fue desarrollar un sistema inteligente de reconocimiento de estilos literarios basado en obras de literatura universal en español, para automatizar la creación de textos que repliquen el estilo de los autores. Para llevar a cabo la investigación se realizó una revisión del estado del arte en técnicas de Machine Learning para la problemática de clasificación de textos y el procesamiento del lenguaje natural. Posteriormente se recolectaron 86 obras literarias de dominio público de 8 autores, a las cuales se les realizó un preprocesamiento para la extracción de características de frecuencia de término-frecuencia inversa de documento (TF-IDF), que se usan para formar vectores de características. Los modelos de Machine Learning propuestos fueron Naïve Bayes, Support Vector Machine y K-Nearest Neighbors para la clasificación, y cadenas de Markov para la generación de texto. El modelo de clasificación con mejor resultado fue Naïve Bayes, con un accuracy de 0.6453125, y el mejor valor del hiperparámetro keysize para la cadena de Markov fue 3. Teniendo esto en cuenta, cabe resaltar las limitaciones de este proyecto debidas a los modelos de Machine Learning utilizados y a la cantidad de características extraídas, y se recomienda implementar nuevos modelos capacitados en el análisis de series temporales.	spa
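The TF-IDF feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, relative term frequency, and an unsmoothed natural-log IDF; the thesis record does not state which variant or library was used, and the function name `tf_idf` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (illustrative variant only)."""
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # tf = relative frequency in the document, idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus, not taken from the 86 works in the thesis.
docs = [
    "en un lugar de la mancha",
    "la casa de los espíritus",
    "cien años de soledad",
]
weights = tf_idf(docs)
```

A term that occurs in every document (here "de") receives weight zero under this variant, while terms unique to one document receive the highest weights, which is why such vectors can help separate authors by vocabulary.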
dc.description.tableofcontents	1. INTRODUCCIÓN, 7; 2. OBJETIVO, 9; 2.1 OBJETIVO GENERAL, 9; 2.2 OBJETIVOS ESPECÍFICOS, 9; 2.3 RESULTADOS, 10; 2.4 METODOLOGÍA, 11; 3. MARCO TEÓRICO, 13; 3.1 ESTADO DEL ARTE, 13; 3.2 BASE TEÓRICA, 28; 3.2.1 Aprendizaje Automática en máquinas, 28; 3.2.2 Selección de características, 34; 3.2.3 Selección de características, 37; 3.2.4 Modelo de clasificación, 41; 3.3 BASE CONCEPTUAL, 46; 4. RESULTADOS, 51; 4.1 CLASIFICADOR, 51; 4.2 GENERADOR, 57; 5. CONCLUSIONES, 59; 6. REFERENCIAS, 61; 7. ANEXOS, 72	spa
dc.format.mimetype	application/pdf	spa
dc.language.iso	spa	spa
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.5/co/	*
dc.title	Desarrollo de un sistema de análisis de autoría de textos de literatura de autores hispanohablantes	spa
dc.title.translated	Development of a system for analyzing the authorship of literature texts by Spanish-speaking authors	spa
dc.degree.name	Ingeniero de Sistemas	spa
dc.publisher.grantor	Universidad Autónoma de Bucaramanga UNAB	spa
dc.rights.local	Abierto (Texto Completo)	spa
dc.publisher.faculty	Facultad Ingeniería	spa
dc.publisher.program	Pregrado Ingeniería de Sistemas	spa
dc.description.degreelevel	Pregrado	spa
dc.type.driver	info:eu-repo/semantics/bachelorThesis
dc.type.local	Trabajo de Grado	spa
dc.type.coar	http://purl.org/coar/resource_type/c_7a1f
dc.subject.keywords	Systems engineer	spa
dc.subject.keywords	Technological innovations	spa
dc.subject.keywords	Machine learning	spa
dc.subject.keywords	Authorship	spa
dc.subject.keywords	Literature	spa
dc.subject.keywords	Natural language processing	spa
dc.subject.keywords	Categorization	spa
dc.subject.keywords	Artificial intelligence	spa
dc.subject.keywords	Machine theory	spa
dc.subject.keywords	Authors	spa
dc.subject.keywords	Data processing	spa
dc.identifier.instname	instname:Universidad Autónoma de Bucaramanga - UNAB	spa
dc.identifier.reponame	reponame:Repositorio Institucional UNAB	spa
dc.type.hasversion	info:eu-repo/semantics/acceptedVersion
dc.rights.accessrights	info:eu-repo/semantics/openAccess	spa
dc.rights.accessrights	http://purl.org/coar/access_right/c_abf2	spa
dc.relation.references	Analytics Software & Solutions. (s. f.-a). Aprendizaje automático: Qué es y por qué es importante. Recuperado 22 de marzo de 2019, de https://www.sas.com/es_co/insights/analytics/machine-learning.html	spa
dc.relation.references	Analytics Software & Solutions. (s. f.-b). What is Natural Language Processing? Recuperado 29 de marzo de 2019, de https://www.sas.com/en_us/insights/analytics/what-is-natural-languageprocessing-nlp.html	spa
dc.relation.references	Arcila-Calderón, C., Ortega-Mohedano, F., Jiménez-Amores, J., & Trullenque, S. (2017). Análisis supervisado de sentimientos políticos en español: Clasificación en tiempo real de tweets basada en aprendizaje automático. El profesional de la información (EPI), 26(5), 973-982. https://doi.org/10.3145/epi.2017.sep.18	spa
dc.relation.references	Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the author of an anonymous text. Communications of the ACM, 52(2), 119. https://doi.org/10.1145/1461928.1461959	spa
dc.relation.references	Bermejo, E., & Martínez, Á. (2017, marzo). Machine Learning Whitepaper. Recuperado de https://www.slideshare.net/raona/machine-learningwhitepaper	spa
dc.relation.references	Betancourt, G. A. (2005). LAS MÁQUINAS DE SOPORTE VECTORIAL (SVMs). Scientia et technica, 1(27). https://doi.org/10.22517/23447214.6895	spa
dc.relation.references	Caballero, Y., Bello, R., Arco, L., Cárdenas, B., Márquez, Y., & García, M. M. (2010). LA TEORÍA DE LOS CONJUNTOS APROXIMADOS PARA EL DESCUBRIMIENTO DE CONOCIMIENTO. (162), 261-270	spa
dc.relation.references	Camacho, J. A. (2018, octubre 26). Linear Discriminant Analysis. Recuperado 24 de octubre de 2019, de JacobSoft website: https://www.jacobsoft.com.mx/es_mx/linear-discriminant-analysis/	spa
dc.relation.references	CLiPS. (2010, octubre 13). MBSP for Python | CLiPS. Recuperado 17 de mayo de 2019, de http://www.clips.ua.ac.be/pages/MBSP	spa
dc.relation.references	Cortes Vasquez, A. (2015). Learning System of Web Navigation Patterns through Hypertext Probabilistic Grammars. 11, 72-78. http://dx.doi.org/10.17981/ingecuc.11.1.2015.07	spa
dc.relation.references	Dans, E. (2013). Estilometría y anonimato. Recuperado 8 de abril de 2019, de EnriqueDans website: https://www.enriquedans.com/2013/08/estilometria-yanonimato.html	spa
dc.relation.references	ESAcademic. (s. f.). Derivación (lingüística) [Diccionario]. Recuperado 21 de abril de 2019, de Los diccionarios y las enciclopedias sobre el Académico website: http://www.esacademic.com/dic.nsf/eswiki/343084	spa
dc.relation.references	Espitia Betancourt, C. A., & Páramo Lozada, J. P. (2018). Aplicación del aprendizaje automático en la clasificación de textos cortos: Un caso de estudio en el conflicto armado colombiano. Recuperado de https://repository.ucatolica.edu.co/handle/10983/22546	spa
dc.relation.references	estilometria.com. (s. f.). Estilometría. Recuperado 7 de abril de 2019, de ESTILOMETRÍA website: http://www.estilometria.com/	spa
dc.relation.references	García, L. G. (2018). CLASIFICADOR MEJORADO DE TEXTOS PARA EL CONTEXTO DE MEDIO AMBIENTE USANDO NAIVE BAYES MULTINOMIAL EN MÉXICO. 12.	spa
dc.relation.references	González, C., Vega, Á., Vega, G., & Luengos, G. (2017). EstilometríaTSO – Estilometría aplicada al teatro del Siglo de Oro. Recuperado 8 de abril de 2019, de http://estilometriatso.com/	spa
dc.relation.references	Gonzalez, L. (2019). Curvas ROC y Área bajo la curva (AUC) | #34 Curso Machine Learning con Python. Recuperado de https://www.youtube.com/watch?v=AcbbkCL0dlo	spa
dc.relation.references	González, L. (2019, enero 4). Métodos de Selección de Características. Recuperado 24 de octubre de 2019, de Ligdi González website: http://ligdigonzalez.com/metodos-de-seleccion-de-caracteristicas-machinelearning/	spa
dc.relation.references	González-Avella, J. C., Tudury, J. M., & Rul-lan, G. (s. f.). Análisis de Series Temporales Usando Redes Neuronales Recurrentes. Recuperado 22 de marzo de 2019, de https://www.apsl.net/blog/2017/06/14/analisis-de-seriestemporales-usando-redes-neuronales-recurrentes/	spa
dc.relation.references	González-Meneses, Y. N., Pedroza-Méndez, B. E., López-Briones, F., PérezCorona, C., & Ramírez-Cruz, J. F. (2014). Implementación del clasificador naive Bayes para la acentuación automática de palabras ambiguas del español. . . ISSN, 9.	spa
dc.relation.references	InternetWorldStats. (2018, septiembre 8). Spanish Speaking Internet Users and Population - Statistics 2018. Recuperado 15 de agosto de 2019, de https://www.internetworldstats.com/stats13.htm	spa
dc.relation.references	InternetWorldStats. (2019, julio 10). Top Ten Internet Languages in The World— Internet Statistics. Recuperado 15 de agosto de 2019, de https://www.internetworldstats.com/stats7.htm	spa
dc.relation.references	Jamal, N., Mohd, M., & Noah, S. A. (2012). Poetry Classification Using Support Vector Machines.	spa
dc.relation.references	Jockers, M. L., & Witten, D. M. (2010). A comparative study of machine learning methods for authorship attribution. Literary and Linguistic Computing, 25(2), 215-223. https://doi.org/10.1093/llc/fqq001	spa
dc.relation.references	Khan, A., Baharudin, B., Hong Lee, L., & Khan, K. (2010, febrero). A Review of Machine Learning Algorithms for Text-Documents Classification. 1(1). Recuperado de https://s3.amazonaws.com/academia.edu.documents/30773019/jait0101.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1551884637&Signature=AlZd%2FGICjpWt2735Mt%2B7Zi83adA%3D&response-content-disposition=inline%3B%20filename%3DJournal_of_Advances_in_Information_Techn.pdf#page=6	spa
dc.relation.references	Khatiboun, A. F. (2019). Machine learning en ciberseguridad. 50	spa
dc.relation.references	Ko van der Sloot, & Maarten van Gompel. (s. f.-a). MBT. Recuperado 18 de mayo de 2019, de https://languagemachines.github.io/mbt/	spa
dc.relation.references	Ko van der Sloot, & Maarten van Gompel. (s. f.-b). TiMBL. Recuperado 17 de mayo de 2019, de https://languagemachines.github.io/timbl/	spa
dc.relation.references	Koppel, M., & Schler, J. (s. f.). Exploiting stylistic idiosyncrasies for authorship attribution. Recuperado de https://cs.biu.ac.il/~koppel/papers/ijcaiidiosyncrasy-final.ps	spa
dc.relation.references	Krepych, S., & Spivak, I. (2018). Algorithm of Automatic Generation of Hotel Descriptions Using Templates Based on Markov Chains. 2018 International Scientific-Practical Conference Problems of Infocommunications. Science and Technology (PIC S T), 257-260. https://doi.org/10.1109/INFOCOMMST.2018.8632149	spa
dc.relation.references	Kumar, V., & Minz, S. (2014). Poem Classification Using Machine Learning Approach. En B. V. Babu, A. Nagar, K. Deep, M. Pant, J. C. Bansal, K. Ray, & U. Gupta (Eds.), Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012 (pp. 675-682). Springer India	spa
dc.relation.references	León, R. A., Furlán, L. R., & Prieto, J. T. (2016). La detección de ansiedad y estrés en el lenguaje escrito mediante procesamiento automatizado por computadora. 86-95	spa
dc.relation.references	Lou, A., Inkpen, D., & Tanasescu, C. (2015). Multilabel Subject-Based Classification of Poetry. The Twenty-Eighth International Flairs Conference. Presentado en The Twenty-Eighth International Flairs Conference. Recuperado de https://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS15/paper/view/10372	spa
dc.relation.references	Luyckx, K., & Daelemans, W. (2005, noviembre). Shallow Text Analysis and Machine Learning for Authorship Attribution [Part of book or chapter of book]. Recuperado 6 de marzo de 2019, de LOT Occasional Series website: http://dspace.library.uu.nl/handle/1874/296538	spa
dc.relation.references	Luyckx, K., & Daelemans, W. (2008). Authorship Attribution and Verification with Many Authors and Limited Data. Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, 513–520. Recuperado de http://dl.acm.org/citation.cfm?id=1599081.1599146	spa
dc.relation.references	Minitab, LLC. (s. f.-a). ¿Qué es ANOVA? [Mtbconcept]. Recuperado 24 de octubre de 2019, de https://support.minitab.com/es-mx/minitab/18/help-and-how-to/modeling-statistics/anova/supporting-topics/basics/what-is-anova/	spa
dc.relation.references	Minitab, LLC. (s. f.-b). ¿Qué es una prueba de chi-cuadrada? [Mtbconcept]. Recuperado 24 de octubre de 2019, de https://support.minitab.com/es-mx/minitab/18/help-and-how-to/statistics/tables/supporting-topics/chi-square/what-is-a-chi-square-test/	spa
dc.relation.references	Mitchell, T. M. (1997). Machine Learning. Recuperado de http://profsite.um.ac.ir/~monsefi/machine-learning/pdf/Machine-LearningTom-Mitchell.pdf	spa
dc.relation.references	Moreno, A., Armengol, E., Béjar, J., Belanche, L., Cortés, U., Gavaldà, R., … Sànchez, M. (1994). Aprendizaje automático. Recuperado de http://hdl.handle.net/2099.3/36157	spa
dc.relation.references	Neethu, M. S., & Rajasree, R. (2013). Sentiment analysis in twitter using machine learning techniques. 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), 1-5. https://doi.org/10.1109/ICCCNT.2013.6726818	spa
dc.relation.references	Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs Up?: Sentiment Classification Using Machine Learning Techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, 79–86. https://doi.org/10.3115/1118693.1118704	spa
dc.relation.references	Pazzani, M. J., & Billsus, D. (2007). Content-Based Recommendation Systems. En P. Brusilovsky, A. Kobsa, & W. Nejdl (Eds.), The Adaptive Web: Methods and Strategies of Web Personalization (pp. 325-341). https://doi.org/10.1007/978-3-540-72079-9_10	spa
dc.relation.references	Pelechano, V., & Pastor, A. (2005). Neuroticismo y trastornos de personalidad. Análisis y Modificación de Conducta, 31(139). Recuperado de http://rabida.uhu.es/dspace/bitstream/handle/10272/12605/Neuroticismo.pdf?sequence=2	spa
dc.relation.references	Pereira, J. (2016). Leveraging Chatbots to Improve Self-guided Learning Through Conversational Quizzes. Proceedings of the Fourth International Conference on Technological Ecosystems for Enhancing Multiculturality, 911–918. https://doi.org/10.1145/3012430.3012625	spa
dc.relation.references	Pereira-Toledo, A., López-Cabrera, J. D., & Quintero-Domínguez, L. A. (2017). Estudio experimental para la comparación del desempeño de Naïve Bayes con otros clasificadores bayesianos. Revista Cubana de Ciencias Informáticas, 11(4), 67-84	spa
dc.relation.references	Pérez-Planells, Ll., Delegido, J., Rivera-Caicedo, J. P., & Verrelst, J. (2015). Análisis de métodos de validación cruzada para la obtención robusta de parámetros biofísicos. Revista de Teledetección, (44), 55. https://doi.org/10.4995/raet.2015.4153	spa
dc.relation.references	Pérez-Rubido, R. (2013). Una revisión a algoritmos de selección de atributos que tratan la redundancia en datos microarreglos. Revista Cubana de Ciencias Informáticas, 7(4), 16-30.	spa
dc.relation.references	R, J. E. R., F, H. A. B., & M, S. P. B. (2011). Software para el filtrado de páginas web pornográficas basado en el clasificador KNN - UDWEBPORN. Revista Avances en Sistemas e Informática, 8(1), 43-49	spa
dc.relation.references	Rauet Garcia, A. (2019). Big Data aplicado al Marketing (Universitat Politècnica de Catalunya). Recuperado de https://upcommons.upc.edu/bitstream/handle/2117/165595/BigDataAplicadoalMarketing_Aleix_Rauet.pdf	spa
dc.relation.references	Rodríguez, Y., Fernández, Y., Bello, R., & Caballero, Y. (2014). Selección de atributos relevantes aplicando algoritmos que combinan conjuntos aproximados y optimización en colonias de hormigas. Revista Cubana de Ciencias Informáticas, 8(1), 79-86	spa
dc.relation.references	Romero, L. A. (s. f.). Redes Neuronales. Recuperado 22 de marzo de 2019, de http://avellano.fis.usal.es/~lalonso/RNA/index.htm	spa
dc.relation.references	RosettaCode. (2019, septiembre 4). Markov chain text generator—Rosetta Code. Recuperado 7 de noviembre de 2019, de https://rosettacode.org/wiki/Markov_chain_text_generator#Functional	spa
dc.relation.references	Rubio Terrés, C. (2000). Introducción a la utilización de los modelos de Markov en el análisis farmacoeconómico. Farmacia Hospitalaria, 24(4), 241-247.	spa
dc.relation.references	Russo, C., Ramón, H., Alonso, N., Cicerchia, B., Esnaola, L., & Tessore, J. P. (2017). Tratamiento Masivo de Datos Utilizando Técnicas de Machine Learning. 131-134	spa
dc.relation.references	Salazar-Serrudo, C., & García-Villalba, J. (s. f.). A Web Searching Agent that Uses Intelligent Techniques. 10.	spa
dc.relation.references	Sarro, L. M. (2009). Compromiso sesgo-varianza. Recuperado de https://canal.uned.es/video/5a6f8828b1111f4c618b45ea	spa
dc.relation.references	Scikit-Learn. (s. f.). Choosing the right estimator - Scikit-learn 0.21.3 documentation. Recuperado 20 de octubre de 2019, de https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html	spa
dc.relation.references	Sebastiani, F. (2002). Machine Learning in Automated Text Categorization. ACM Comput. Surv., 34(1), 1–47. https://doi.org/10.1145/505282.505283	spa
dc.relation.references	Sreeja, P. S., & Mahalakshmi, G. S. (2016). Comparison of Probabilistic Corpus Based Method and Vector Space Model for Emotion Recognition from Poems. Recuperado de http://docsdrive.com/pdfs/medwelljournals/ajit/2016/908-915.pdf	spa
dc.relation.references	Stańczyk, U., & Cyran, K. A. (2007). Machine learning approach to authorship attribution of literary texts. 1(4), 8.	spa
dc.relation.references	tfidf.com. (s. f.). Tf-idf: A Single-Page Tutorial—Information Retrieval and Text Mining. Recuperado 7 de abril de 2019, de http://www.tfidf.com/	spa
dc.relation.references	Tim Jones, M. (2017, octubre 4). Aprendizaje profundo y Caffe, Deeplearning4j, TensorFlow y DDL. Recuperado 22 de marzo de 2019, de http://www.ibm.com/developerworks/ssa/library/cc-machine-learning-deeplearning-architectures/index.html	spa
dc.relation.references	Tong, S., & Koller, D. (2001). Support Vector Machine Active Learning with Applications to Text Classification. Journal of Machine Learning Research, 2(Nov), 45-66.	spa
dc.relation.references	Ugarriza, N. (1999). Neuroticismo, expresiones emocionales y percepción de la violencia en escolares. Revista de la Facultad de Psicología, (2), 79-110.	spa
dc.relation.references	ULLmedia - Universidad de La Laguna. (2014). Representación de documentos mediante TF-IDF. Recuperado de https://www.youtube.com/watch?v=OkSZZ0F7ToA	spa
dc.relation.references	Universidad de Sevilla. (s. f.-a). Capítulo 3 - Perceptrón multicapa. Recuperado 7 de abril de 2019, de http://bibing.us.es/proyectos/abreproy/12166/fichero/Volumen+1++Memoria+descriptiva+del+proyecto%252F3+-+Perceptron+multicapa.pdf	spa
dc.relation.references	Universidad de Sevilla. (s. f.-b). Capítulo 4 - El perceptrón. Recuperado 7 de abril de 2019, de http://bibing.us.es/proyectos/abreproy/11084/fichero/Memoria+por+cap%C3%ADtulos+%252FCap%C3%ADtulo+4.pdf+	spa
dc.relation.references	Universidad de Sevilla. (s. f.-c). Coeficiente de correlación lineal de Pearson. Recuperado de https://personal.us.es/vararey/adatos2/correlacion.pdf	spa
dc.relation.references	Universitat Politècnica de Catalunya. (s. f.). Aprendizaje Automático | Facultad de Informática de Barcelona. Recuperado 2 de abril de 2019, de Aprendizaje Automático - Facultad de Informática de Barcelona website: https://www.fib.upc.edu/es/estudios/grados/grado-en-ingenieria-informatica/plan-de-estudios/asignaturas/APA	spa
dc.relation.references	Viera, A. F. G. (2017). Técnicas de aprendizaje de máquina utilizadas para la minería de texto. Investigación bibliotecológica, 31(71), 103-126. https://doi.org/10.22201/iibi.0187358xp.2017.71.57812	spa
dc.relation.references	Wilbur, W. J., & Sirotkin, K. (1992). The automatic identification of stop words. Journal of Information Science, 18(1), 45-55. https://doi.org/10.1177/016555159201800106	spa
dc.relation.references	Witten, I. H., Frank, E., & Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques (3.a ed.). USA: Elsevier.	spa
dc.relation.references	Ye, Q., Zhang, Z., & Law, R. (2009). Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Systems with Applications, 36(3, Part 2), 6527-6535. https://doi.org/10.1016/j.eswa.2008.07.035	spa
dc.relation.references	Zhang, D., & Lee, W. S. (2006). Extracting Key-substring-group Features for Text Classification. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 474–483. https://doi.org/10.1145/1150402.1150455	spa
dc.contributor.cvlac	Martínez Quezada, Daniel Orlando [0000041131]	spa
dc.contributor.cvlac	Ortiz Beltrán, Ariel Orlando [0001459925]	spa
dc.contributor.googlescholar	Ortiz Beltrán, Ariel Orlando [FS1dky4AAAAJ&hl=es&oi=ao]	spa
dc.contributor.orcid	Martínez Quezada, Daniel Orlando [0000-0002-9910-1770]	spa
dc.contributor.orcid	Ortiz Beltrán, Ariel Orlando [0000-0003-1522-2362]	spa
dc.contributor.researchgate	Martínez Quezada, Daniel Orlando [Daniel-Martinez-Quezada]	spa
dc.contributor.researchgate	Ortiz Beltrán, Ariel Orlando [Ariel-Ortiz-Beltran]	spa
dc.subject.lemb	Ingeniería de sistemas	spa
dc.subject.lemb	Innovaciones tecnológicas	spa
dc.subject.lemb	Inteligencia artificial	spa
dc.subject.lemb	Teoría de las máquinas	spa
dc.subject.lemb	Autores	spa
dc.subject.lemb	Procesamiento de datos	spa
dc.identifier.repourl	repourl:https://repository.unab.edu.co	spa
dc.description.abstractenglish	Despite the marked rise of Machine Learning applications in recent years, mostly in image and audio processing, few applications address literature, and authorship recognition in particular. This raises the question: how effective are Machine Learning techniques at identifying patterns in large volumes of literary texts in the Hispanic American context? The objective of this work was therefore to develop an intelligent system for recognizing literary styles, based on works of universal literature in Spanish, in order to automate the creation of texts that replicate the authors' styles. The research began with a review of the state of the art in Machine Learning techniques for text classification and natural language processing. Next, 86 public-domain literary works by 8 authors were collected and preprocessed to extract term frequency-inverse document frequency (TF-IDF) features, which were used to build feature vectors. The proposed Machine Learning models were Naïve Bayes, Support Vector Machine, and K-Nearest Neighbors for classification, and Markov chains for text generation. The best-performing classification model was Naïve Bayes, with an accuracy of 0.6453125, and the best value of the keysize hyperparameter for the Markov chain was 3. The project was limited by the Machine Learning models used and by the number of features extracted, and it is recommended to implement new models capable of time-series analysis.	spa
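The role of the keysize hyperparameter in the Markov chain generator can be shown with a short sketch. It assumes a word-level chain in which each key is a tuple of `keysize` consecutive words, which is how the Rosetta Code generator cited in the references works; the thesis record does not confirm the actual implementation, and the names `build_chain` and `generate` and the sample sentence are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(words, keysize=3):
    """Map each tuple of `keysize` consecutive words to the words that follow it."""
    chain = defaultdict(list)
    for i in range(len(words) - keysize):
        key = tuple(words[i:i + keysize])
        chain[key].append(words[i + keysize])
    return chain


def generate(chain, length, seed=None):
    """Walk the chain from a random key until `length` words or a dead end."""
    rng = random.Random(seed)
    key = rng.choice(list(chain))  # raises IndexError if the chain is empty
    output = list(key)
    while len(output) < length:
        followers = chain.get(key)
        if not followers:
            break  # this key was never followed by anything in the source text
        output.append(rng.choice(followers))
        key = tuple(output[-len(key):])
    return " ".join(output)


# Hypothetical toy input; a real run would use an author's full corpus.
words = "en un lugar de la mancha de cuyo nombre no quiero acordarme".split()
chain = build_chain(words, keysize=3)
print(generate(chain, length=8, seed=1))
```

With a larger keysize the generated text copies longer runs of the source and reads more coherently but is less novel. With a smaller keysize it is more varied but less grammatical, which is the trade-off a value such as 3 balances.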
dc.subject.proposal	Aprendizaje automático	spa
dc.subject.proposal	Autoría	spa
dc.subject.proposal	Literatura	spa
dc.subject.proposal	Lenguaje natural	spa
dc.subject.proposal	Procesamiento	spa
dc.subject.proposal	Categorización	spa
dc.type.redcol	http://purl.org/redcol/resource_type/TP
dc.rights.creativecommons	Atribución-NoComercial-SinDerivadas 2.5 Colombia	*
dc.coverage.campus	UNAB Campus Bucaramanga	spa
dc.description.learningmodality	Modalidad Presencial	spa

Files in this item

Name: 2019_Tesis_David_Elias_Borja.pdf
Size: 2.718Mb
Format: PDF
Description: Thesis

Name: 2019_Licencia_David_Elias_Borja.pdf
Size: 496.6Kb
Format: PDF
Description: License

This item appears in the following collection(s)

Ingeniería de Sistemas [374]

Except where otherwise noted, this item's license is described as Atribución-NoComercial-SinDerivadas 2.5 Colombia