Arlchoose-code/README.md
░██████╗██╗░░░██╗░█████╗░██╗░░██╗██████╗░██╗██╗░░░░░
██╔════╝╚██╗░██╔╝██╔══██╗██║░░██║██╔══██╗██║██║░░░░░
╚█████╗░░╚████╔╝░███████║███████║██████╔╝██║██║░░░░░
░╚═══██╗░░╚██╔╝░░██╔══██║██╔══██║██╔══██╗██║██║░░░░░
██████╔╝░░░██║░░░██║░░██║██║░░██║██║░░██║██║███████╗
╚═════╝░░░░╚═╝░░░╚═╝░░╚═╝╚═╝░░╚═╝╚═╝░░╚═╝╚═╝╚══════╝

██╗░░██╗░█████╗░██████╗░██╗░░░██╗░█████╗░███╗░░██╗░█████╗░
██║░░██║██╔══██╗██╔══██╗╚██╗░██╔╝██╔══██╗████╗░██║██╔══██╗
███████║███████║██████╔╝░╚████╔╝░██║░░██║██╔██╗██║██║░░██║
██╔══██║██╔══██║██╔══██╗░░╚██╔╝░░██║░░██║██║╚████║██║░░██║
██║░░██║██║░░██║██║░░██║░░░██║░░░╚█████╔╝██║░╚███║╚█████╔╝
╚═╝░░╚═╝╚═╝░░╚═╝╚═╝░░╚═╝░░░╚═╝░░░░╚════╝░╚═╝░░╚══╝░╚════╝░

Building Indonesian Language Intelligence, from scratch.

Linguist · AI Engineer · Open Source Builder · Security-Aware Developer

Website LinkedIn HuggingFace Email Profile Views


🔭 Currently Working On

status:       "Preparing graduate research applications"
focus:        "Language Technology × Computational Sociolinguistics"
next:         "Graduate research in Language Technology & Computational Sociolinguistics"
open_to:      "Collaborators, compute resources, research mentors"
building:     "Aibys2: next-gen Indonesian LLM (tokenizer · training · SFT · tool calling · vision)"
recent:       "Aibys AI tools suite (research, medical, legal, invoice) · ArLface Recognition"
learning:     "Sociolinguistics research methodology, academic writing EN"

🤝 Open to Collaborate On


  • 🇮🇩 Indonesian NLP: anything that makes Bahasa Indonesia better represented in AI
  • 🏘️ AI for underserved communities, especially rural or low-resource contexts
  • 🔬 Low-resource language modeling: training, fine-tuning, evaluation
  • 🔐 Security-aware AI systems: threat modeling, robust architecture
  • 📚 Language technology research, if you're a researcher looking for a motivated collaborator

Got compute? I've got the pipeline. 😄


👋 About Me

I'm Syahril Haryono, an Indonesian developer at an unusual intersection: German linguistics × AI engineering × grassroots community work.

I started in tech at 13, not through courses or bootcamps but by probing the edges of systems: exploring web vulnerabilities and understanding how things break. That early obsession with how systems fail under the hood became the foundation of how I build today: with a deep instinct for failure modes, security implications, and why robust architecture must be designed in from day one rather than bolted on as an afterthought.

I studied German Language Education at Universitas Negeri Jakarta, which gave me something most engineers lack: a rigorous understanding of how language is structured, how meaning is encoded, and how communication breaks down across cultures and communities.

Along the way, two experiences left me with questions I still can't stop thinking about.

The first: Building Aibys, an Indonesian LLM from scratch, made me realize how severely underrepresented Bahasa Indonesia is in the global NLP landscape. It has 270 million speakers, yet most language models treat it as an afterthought. Why does this gap exist, and what does it actually take to close it properly?

The second: Leading a digital transformation program in a rural village in Karawang (training 20–50 locals, then watching the platform get abandoned within a year) made me realize that the problem isn't just technical. Why do communities with genuine needs still reject technology built "for" them? What is the actual barrier: is it linguistic, cultural, or something deeper?

These aren't questions I can answer alone, or with just more coding.

They're the questions I intend to bring into graduate research, and I'm actively looking for the right academic environment to pursue them.


🔬 The Aibys Ecosystem

An open-source pipeline for building a Large Language Model for Bahasa Indonesia, entirely from scratch.

Core LLM Stack

| Repo | What it does |
| --- | --- |
| 🗃️ Aibys-Data-Collector | Collect, clean, shuffle & prepare Indonesian text datasets. Streaming mode for 50GB+ corpora. Estimated corpus: ~13B tokens. |
| 🏗️ Indonesian-LLM-Starter | Decoder-only Transformer written from scratch in PyTorch: RMSNorm · RoPE · SwiGLU · Flash Attention 2 · GGUF export. |
| 🎯 Indonesian-LLM-Finetune | LoRA fine-tuning pipeline: turn a pre-trained checkpoint into a conversational Bahasa Indonesia assistant. |
| 🔤 aibys-tokenizer | BPE tokenizer · 32K vocab · trained on 10M sentences · weighted sampling optimized for Bahasa Indonesia. |
| ⚡ Aibys2 | Next-gen runnable LLM starter: tokenizer · training · checkpointing · SFT scaffolding · tool calling · vision dataset support. |

Aibys Data Collector  →  Indonesian LLM Starter  →  Indonesian LLM Finetune
  (corpus pipeline)        (pre-training)              (instruction tuning)
         ↓                       ↓                            ↓
  ~13B token corpus    →   aibys_final.pt          →   chat-ready model 🇮🇩
                                 ↓
                             Aibys2 (next iteration: SFT · tool calling · vision)

Current status: Full pipeline functional. Proof-of-concept training completed (20K steps → coherent Indonesian text generation ✓). Aibys2 actively in development. Full training pending compute resources.
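
As a rough illustration of what cleaning and shuffling in streaming mode can look like, here is a minimal sketch of a bounded shuffle buffer, which approximates a global shuffle without loading the whole corpus into memory. It is not taken from Aibys-Data-Collector: the names `clean_line` and `stream_shuffled`, the whitespace-only cleaning rule, and the `buffer_size` default are all assumptions made for this example.

```python
import random
from typing import Iterable, Iterator


def clean_line(line: str) -> str:
    """Hypothetical cleaning rule: collapse whitespace runs and trim the ends."""
    return " ".join(line.split())


def stream_shuffled(
    lines: Iterable[str], buffer_size: int = 10_000, seed: int = 0
) -> Iterator[str]:
    """Yield cleaned, non-empty lines in approximately shuffled order.

    Only `buffer_size` lines are held in memory at any time.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for raw in lines:
        text = clean_line(raw)
        if not text:
            continue  # drop lines that are empty after cleaning
        if len(buffer) < buffer_size:
            buffer.append(text)
            continue
        # Buffer is full: emit a random resident line and store the new one.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = text
    # Input exhausted: emit whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    sample = ["  halo   dunia ", "", "apa kabar", "selamat pagi", "terima kasih"]
    for line in stream_shuffled(sample, buffer_size=2, seed=1):
        print(line)
```

A larger `buffer_size` gives a shuffle closer to uniform at the cost of memory; because `lines` can be any iterable, an open file handle can be passed in directly so the corpus is read one line at a time.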

Aibys AI Tools Suite

Local-first, privacy-preserving AI tools, all powered by Ollama and running fully on your machine.

| Repo | What it does |
| --- | --- |
| 📄 aibys-research-summarizer | Turns PDF/TXT research papers into structured plain-language summaries, key results, limitations, follow-up questions, and exportable reports. |
| 🏥 aibys-medical-explainer | Explains medical reports from PDF/TXT/image uploads, highlights notable results, and saves JSON/CSV/Markdown history. |
| ⚖️ aibys-legal-analyzer | Summarizes contracts, highlights risky clauses, scores risk, and saves local JSON/CSV/Markdown reports. |
| 🧾 aibys-invoice-extractor | Extracts structured data from invoice/receipt PDFs and images. Export to CSV. Vision-powered, runs fully local. |

👁️ Computer Vision

| Repo | What it does |
| --- | --- |
| 🤖 ArLface-Recognition | Open-source face recognition system built with FastAPI and Python. Uses AuraFace (ArcFace) for embeddings; all application logic is built from scratch. Real-time, OpenCV-powered. |
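
For readers unfamiliar with embedding-based recognition: ArcFace-style embeddings are commonly compared with cosine similarity against a gallery of enrolled faces. The sketch below shows that idea in plain Python. It illustrates the general technique only and is not ArLface-Recognition's actual code: `cosine_similarity`, `best_match`, the toy 2-D vectors, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off; tune on real data
) -> Optional[Tuple[str, float]]:
    """Return (name, score) of the most similar enrolled embedding,
    or None when nothing reaches the threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return best_name, best_score


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real face embeddings.
    enrolled = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}
    print(best_match([0.9, 0.1], enrolled))
    print(best_match([-1.0, 0.0], enrolled))
```

Returning `None` below the threshold lets the caller treat a face as unknown instead of forcing the nearest enrolled identity.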

🌏 Community Work

Separate from my AI projects, but these shaped how I think about who technology is actually built for.

🌏 Desa Medalsari Digital (Karawang, 2024)

Community service project in collaboration with Universitas Negeri Jakarta.

  • Designed and deployed a digital platform for a rural village in Karawang, West Java
  • Conducted a one-week on-site digital literacy training for 20–50 local residents
  • Platform was eventually discontinued, not due to technical failure but because of low adoption

This experience raised questions I haven't stopped thinking about: Why does a working platform, with trained users, still get abandoned? Is it the interface? The language? The relevance to their daily lives? Or is it something about how we define "digital readiness" that's fundamentally wrong?

🎭 Goethe-Institut Jakarta (Volunteer, 2024)

Science exhibition: "UNIVERSUM · MENSCH · INTELLIGENZ" at Perpustakaan Nasional RI. Assisted visitors exploring interactive installations on AI, the universe, and human intelligence.


🛠️ Tech Stack

AI / ML & NLP: PyTorch · HuggingFace Transformers · SentencePiece · LoRA / PEFT · Flash Attention 2 · GGUF · Ollama · llama.cpp · OpenCV · ArcFace / AuraFace · Claude API · MCP (Model Context Protocol) · Microsoft Azure AI · Google Cloud Vertex AI · Amazon Bedrock

Systems & Backend: Python · Go · Rust · PHP · Node.js / Bun · FastAPI · Gin · Echo · Laravel · Express · Hono

Frontend: React · Next.js · Vue · Nuxt.js · TypeScript · Tailwind CSS · Vanilla JS

Databases: PostgreSQL · MySQL · MongoDB · Redis · SQLite

Human Languages

| Language | Level |
| --- | --- |
| 🇮🇩 Bahasa Indonesia | Native |
| 🇬🇧 English | Professional working proficiency |
| 🇩🇪 Deutsch | B2; studied 3+ years, volunteered at Goethe-Institut Jakarta |

📜 Certifications

🟠 Anthropic: 10 certificates
  • Claude 101 · Building with the Claude API · Claude Code in Action
  • Introduction to Model Context Protocol · MCP: Advanced Topics
  • AI Fluency: Framework & Foundations · Teaching AI Fluency · AI Fluency for Educators
  • Claude with Google Cloud's Vertex AI · Claude in Amazon Bedrock
🔵 Microsoft: 5 certificates
  • Foundations of AI and Machine Learning
  • AI and Machine Learning Algorithms and Techniques
  • Microsoft Azure for AI and Machine Learning
  • Advanced AI and Machine Learning Techniques and Capstone
  • Building Intelligent Troubleshooting Agents · Full-Stack Developer Capstone
🔴 IBM: 3 certificates
  • Machine Learning with Python
  • Python for Data Science, AI & Development
  • Full Stack Software Developer Assessment
🟡 Google Cloud: 3 certificates
  • Google Cloud Fundamentals: Core Infrastructure
  • Developing a REST API with Go and Cloud Run
  • Process Documents with Python Using the Document AI API
🟠 Amazon · 🟣 Duke · 🔵 Meta · others
  • Amazon: Generative AI in Software Development · Full Stack Web Development
  • Duke University: Rust Fundamentals
  • Meta: Programming with JavaScript · Version Control · Introduction to Front-End Development

⚡ Fun Facts

  • 🔓 Started hacking systems at 13; now I build them with security in mind from day one
  • 🇩🇪 Studying German language education while building an Indonesian LLM (yes, both at the same time)
  • 💻 Built a 13B-token corpus pipeline on a laptop that couldn't finish the training run
  • 🏘️ Got a whole village to use a digital platform in one week, then watched it die in one year
  • 🧠 Believes the most interesting problems in AI are not technical; they're linguistic and social
  • 👁️ Built a face recognition system from scratch because "just use a library" felt like cheating
  • ☕ Powered by questions that don't have Stack Overflow answers

🗺️ Origin Story

[2014] ──── Age 13. First contact with the internet's underbelly.
    │        Explored web vulnerabilities, network weaknesses, defacing.
    │        Not malice, just pure curiosity about how systems work.
    │        → Gained something no course teaches:
    │          an instinct for where systems fail,
    │          and why security must be designed in, not bolted on.
    │
[2018] ──── Channeled that energy into building, not breaking.
    │        Joined an IT community. Co-founded ByteDevCode.
    │        Started developing real products for real users.
    │
[2022] ──── Enrolled in German Language Education @ UNJ.
    │        Studied linguistics, pedagogy, cross-cultural communication.
    │        → Language became a new lens: how humans and machines
    │          communicate, and why they so often fail to.
    │
[2024] ──── Volunteered at Goethe-Institut Jakarta (UNIVERSUM·MENSCH·INTELLIGENZ).
    │
    │        Led digital transformation at Desa Medalsari, Karawang.
    │        Built the platform. Trained 20–50 locals in one week.
    │        Platform abandoned within a year.
    │        → Left with more questions than answers.
    │          That discomfort became a research direction.
    │
[2025] ──── Started building Aibys, an Indonesian LLM from scratch.
    │        Trained BPE tokenizer (32K vocab, 10M sentences).
    │        Built ~13B-token corpus pipeline.
    │        First training run: 20K steps → coherent Indonesian text ✓
    │        → More questions: why is Bahasa Indonesia so underrepresented
    │          in global NLP? What would it take to change that?
    │
[2026] ──── Open-sourced the full Aibys ecosystem.
    │        Built Aibys2: next-gen LLM starter with tool calling & vision.
    │        Shipped Aibys AI tools suite:
    │          research summarizer · medical explainer ·
    │          legal analyzer · invoice extractor
    │        Built ArLface Recognition: open-source face recognition
    │          from scratch with ArcFace embeddings.
    │        Certifications: Anthropic · Microsoft · IBM · Google · Amazon
    │
[NEXT] ──── The questions accumulated.
             Solo projects and self-study can only go so far.
             The next step is finding the right research environment
             to investigate them properly, and the right people
             to investigate them with. 🇮🇩


"I learned how systems break before I learned how to build them. That's not a detour; that's the foundation."

Syahril Haryono · Bogor, Indonesia 🇮🇩

arlab.my.id

Pinned

  1. ArLface-Recognition (Public)

    ArLface Recognition is an open-source face recognition system built with FastAPI and Python. It uses the AuraFace (ArcFace) pretrained model only for face embeddings, while all application logic, b…

    HTML 2

  2. Indonesian-LLM-Starter (Public)

    A starter kit for building your own Indonesian Large Language Model (LLM) from scratch: architecture, data pipeline, tokenizer, and training loop included.

    Python 4

  3. Aibys2 (Public)

    A runnable from-scratch LLM starter with tokenizer, training, checkpointing, SFT scaffolding, tool calling, and vision dataset support.

    Python 1

  4. Aibys-Data-Collector (Public)

    Tools to collect, clean, shuffle, and prepare Indonesian text datasets for LLM pre-training.

    Python 1