Applied AI Scientist and AI Architect working across enterprise speech AI, multilingual AI, LLM systems, agentic AI, knowledge graphs, retrieval-based AI, evaluation systems, and AI architecture.
- 👋 Hi, I'm Tatjana Chernenko
- 📫 Contact: tatjana.chernenko.work@gmail.com
- Applied AI Scientist and AI Architect with work spanning enterprise speech AI, multilingual AI, applied NLP, LLM systems, agentic AI, RAG, knowledge graphs, evaluation and benchmarking systems, terminology intelligence, and AI-ready data architecture.
- My public GitHub contains a selective set of personal, academic, and research-oriented technical artifacts. Most of my recent enterprise work is not public due to confidentiality and employer constraints.
- Speech AI, multilingual AI, and language technologies
- Applied NLP, terminology intelligence, and specialised-vocabulary handling
- LLM systems, GenAI
- Retrieval-augmented generation (RAG)
- Agentic AI workflows
- Knowledge graphs, knowledge-enhanced AI, and workflow automation
- Evaluation, benchmarking, and reliability-oriented AI quality systems
- Enterprise AI architecture, data foundations, and governance-aware AI execution
Practitioner writing on AI evaluation, benchmarking, and enterprise speech AI — published on Hugging Face.
- Representativeness Before Metrics: Rethinking AI Evaluation for Deployment [HuggingFace link pending]
  What enterprise speech AI evaluation reveals about benchmark reliability, and why the lessons reach further than speech. Argues that weak benchmark representativeness, not metric design, is the primary bottleneck between benchmark success and deployment confidence.
- When Benchmarks Saturate: Ecological Validity in AI Evaluation [HuggingFace link pending]
  Why discriminative power, behavioral realism, and decision relevance matter more as systems improve. Examines how saturation and weak ecological validity compound each other, and what evaluation surfaces need to do to remain informative.
- LREC 2026: A Dataset for Evaluating ASR on Specialized Vocabulary
  Emily Haubert Klering, Eduardo Gabriel Cortes, Tatjana Chernenko, Mariana Vargas Trarbach, Gabriel de Oliveira Ramos, Sandro José Rigo, Maitê Dupont, Ana Luiza Treichel Vianna, Gabriela Krause dos Santos, Vinicius Meirelles Pereira, Denis Andrei de Araujo, Rafael Kunst.
  SAP–UNISINOS research collaboration. Focused on evaluation methodology and benchmark design for specialised-vocabulary robustness in enterprise ASR.
- US Patent: Semantic Domain Assignment Referencing Governance Domains and Term Databases
  T. Chernenko, B. Schork, M. DANEI. US Patent 12,518,105 (2026)
- US Patent Application: Adaptive Fidelity Pipeline for Minimizing Hallucinations and Skipped Content in Speech-to-Text Systems
  US Patent App. 250089US01 (2025) [link pending]
- US Patent: System and Method Performing Terminology Disambiguation
  T. Chernenko, B. Schork, M. DANEI. US Patent 12,386,820 (2025)
- US Patent: Detection of Abbreviation and Mapping to Full Original Term
  T. Chernenko, A. Snitko, J. Scharnbacher, M. Vasiltschenko. US Patent 12,067,370 (2024)
- CHERTOY: Word Sense Induction for Web Search Result Clustering
  Academic NLP research project at the Institute for Computational Linguistics, Heidelberg University, based on the SemEval-2013 WSI task. Built an unsupervised word sense induction pipeline that clusters ambiguous web-search snippets into semantically coherent subtopic groups, using sense2vec word representations, vector-mixture bag-of-words snippet embeddings, and MeanShift clustering. Evaluated 40 controlled experimental variants across preprocessing, embedding models, compositional representations, and clustering algorithms, improving pairwise clustering quality over the baseline.
  GitHub: CHERTOY System
- Natural Language Generation from Structured Inputs for Image Description Generation
  Academic research project at the Institute for Computational Linguistics, Heidelberg University, on structured-to-text generation for image description. Built an encoder-decoder architecture with a feed-forward encoder over normalized attribute vectors and an LSTM decoder for sequence generation. Used MS COCO, V-COCO, and COCO-a to model objects, actions, semantic roles, spatial relations, and descriptive attributes, with both automatic and human evaluation.
  GitHub: Data-to-text Generation
- LexRank-based Text Summarization with Semantic Similarity Enhancements
  Research project on extractive summarization that extends LexRank with semantic-similarity features to improve sentence ranking and summary quality in longer documents.
  GitHub: Text Summarization with LexRank
Selected older repositories in areas such as predictive maintenance, anomaly detection, reinforcement learning, speech adaptation, and data augmentation remain available in the profile history as secondary technical artifacts.