csng98/get_next_line

This project has been created as part of the 42 curriculum by csekakul.

Get Next Line: Dynamic Stream Parsing via Static State Contexts 📑

Welcome to get_next_line, a C programming project focused on implementing a custom function that reads and extracts exactly one line at a time from a specific file descriptor stream.

🚀 Project Overview

The core objective of this project is to develop an efficient line-reading engine. Because read() returns data in fixed-size chunks that ignore line boundaries, the function accumulates those chunks, extracts one line per call, and keeps any leftover bytes for the next call.

📂 Project Structure

| File Name | Description |
| --- | --- |
| 📄 get_next_line.h | Central header file containing mandatory prototypes and structure definitions. |
| 📄 get_next_line.c | Core logic functions handling line parsing, state checking, and standard reading loops. |
| 📄 get_next_line_utils.c | Essential utility helper functions (string concatenation, character allocation, etc.). |
| 📄 get_next_line_bonus.h | Expanded header file supporting infrastructure for multiple file descriptors. |
| 📄 get_next_line_bonus.c | Multi-FD optimized core parsing loop employing state-isolation variables. |
| 📄 get_next_line_utils_bonus.c | Helper functions explicitly tailored and named for the bonus files. |
| 📄 README.md | Architectural overview, technical analysis, and system documentation. |

Description

get_next_line is a C project where you implement a function that reads one line at a time from a file descriptor.

char *get_next_line(int fd);

The function must:

  • Return the next line including the \n (if present)
  • Return the last line even if it does not end with \n
  • Return NULL when EOF is reached or an error occurs
  • Work with any BUFFER_SIZE
  • Handle very large lines correctly

This project introduces:

  • Static variables
  • File descriptors
  • The read() system call
  • Memory management
  • Persistent state between function calls

How It Works

Since read() does not read “lines” but only raw bytes, we must:

  1. Read chunks of size BUFFER_SIZE
  2. Store them in persistent memory (called a stash)
  3. Stop reading once a \n is found
  4. Extract one line
  5. Keep leftover characters for the next function call
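
The five steps above can be sketched as follows. This is a minimal illustration, not the code in this repository: the names sketch_next_line, stash_append and stash_take_line are made up, it uses libc helpers (strlen, strchr, memcpy) that the 42 subject does not allow, and it assumes BUFFER_SIZE is positive, small enough for a stack buffer, and that the input contains no NUL bytes.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef BUFFER_SIZE
# define BUFFER_SIZE 4
#endif

/* Hypothetical helper: returns a new stash holding the old stash plus n bytes
 * of chunk. The old stash is always freed. Returns NULL if malloc fails. */
static char *stash_append(char *stash, const char *chunk, size_t n)
{
    size_t  old_len;
    char    *grown;

    old_len = 0;
    if (stash)
        old_len = strlen(stash);
    grown = malloc(old_len + n + 1);
    if (grown)
    {
        if (stash)
            memcpy(grown, stash, old_len);
        memcpy(grown + old_len, chunk, n);
        grown[old_len + n] = '\0';
    }
    free(stash);
    return (grown);
}

/* Hypothetical helper: cuts the first line (up to and including '\n', or the
 * whole stash if it has none) out of *stash and leaves the remainder there.
 * Returns NULL when the stash is empty or an allocation fails. */
static char *stash_take_line(char **stash)
{
    char    *nl;
    char    *line;
    char    *rest;
    size_t  line_len;
    size_t  rest_len;

    line = NULL;
    rest = NULL;
    if (*stash && **stash != '\0')
    {
        nl = strchr(*stash, '\n');
        line_len = strlen(*stash);
        if (nl)
            line_len = (size_t)(nl - *stash) + 1;
        rest_len = strlen(*stash + line_len);
        line = malloc(line_len + 1);
        rest = malloc(rest_len + 1);
        if (line && rest)
        {
            memcpy(line, *stash, line_len);
            line[line_len] = '\0';
            memcpy(rest, *stash + line_len, rest_len + 1);
        }
        else
        {
            free(line);
            free(rest);
            line = NULL;
            rest = NULL;
        }
    }
    free(*stash);
    *stash = rest;
    return (line);
}

char    *sketch_next_line(int fd)
{
    static char *stash;             /* survives between calls */
    char        buf[BUFFER_SIZE];
    ssize_t     n;

    n = 1;
    /* Step 3: stop reading as soon as the stash contains a '\n'. */
    while (n > 0 && (!stash || !strchr(stash, '\n')))
    {
        n = read(fd, buf, BUFFER_SIZE);                     /* step 1 */
        if (n > 0)
        {
            stash = stash_append(stash, buf, (size_t)n);    /* step 2 */
            if (!stash)
                return (NULL);
        }
    }
    if (n < 0)
    {
        free(stash);
        stash = NULL;
        return (NULL);
    }
    return (stash_take_line(&stash));                       /* steps 4 and 5 */
}
```

With the default BUFFER_SIZE of 4 and the input "Hello\nWorld", successive calls return "Hello\n", then "World", then NULL. The caller frees each returned line.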

Core Concepts

File Descriptors

When you open a file:

int fd = open("file.txt", O_RDONLY);

You receive a file descriptor (an integer).
Each open() call gets its own internal cursor (file offset), so descriptors opened separately advance independently, even on the same file.


read()

ssize_t read(int fd, void *buf, size_t nbyte);
  • Reads up to nbyte bytes
  • Moves the file offset forward
  • Returns:
    • > 0 → bytes read
    • 0 → EOF
    • -1 → error

read() itself keeps no memory between calls: the kernel stores the current offset with the open file, so each call continues where the previous one stopped.
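
To see the three kinds of return value in practice, here is a small sketch that pushes a known string through a pipe and records what each read() call returns. The helper name record_read_sizes and the fixed 4-byte buffer are assumptions made for this example only.

```c
#include <unistd.h>

/*
 * Hypothetical helper: writes len bytes of text into a pipe, closes the write
 * end, then calls read() with a 4-byte buffer until it returns 0 or -1.
 * Every return value is stored in results (at most max entries).
 * Returns the number of read() calls recorded, or -1 if the setup fails.
 */
int record_read_sizes(const char *text, size_t len, ssize_t *results, int max)
{
    int     p[2];
    char    buf[4];
    int     calls;
    ssize_t n;

    if (pipe(p) < 0)
        return (-1);
    if (write(p[1], text, len) != (ssize_t)len)
    {
        close(p[0]);
        close(p[1]);
        return (-1);
    }
    close(p[1]);            /* no writer left, so read() can report EOF */
    calls = 0;
    n = 1;
    while (n > 0 && calls < max)
    {
        n = read(p[0], buf, sizeof(buf));
        results[calls++] = n;
    }
    close(p[0]);
    return (calls);
}
```

For the 6-byte input "Hello\n" this records 4, then 2, then 0: two partial chunks followed by EOF. All data is written before the first read here, which is why the chunk sizes are predictable; on a live pipe or terminal, read() may return fewer bytes than requested at any time.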


Static Variables

A static variable:

  • Is initialized only once
  • Keeps its value between function calls
  • Lives until program termination

Example:

int counter(void)
{
    static int i = 0;
    i++;
    return (i);
}

This is how get_next_line remembers leftover data between calls.
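
As a quick check of that behaviour, the sketch below repeats the counter from above and adds a made-up helper, call_counter_n_times, that calls it in a loop. The value keeps growing across calls because i is never re-initialized.

```c
/* Same counter as above: i is initialized once and survives between calls. */
int counter(void)
{
    static int i = 0;

    i++;
    return (i);
}

/* Hypothetical helper: calls counter() n times and returns the last value. */
int call_counter_n_times(int n)
{
    int last;

    last = 0;
    while (n-- > 0)
        last = counter();
    return (last);
}
```

Calling counter() twice returns 1 and then 2; a following call_counter_n_times(3) returns 5, since the three extra calls continue from the stored value.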


Why We Need a “Stash”

Example with BUFFER_SIZE = 4
File content:

Hello\n

read() behaves like:

"Hell"
"o\n"

Neither chunk is a complete line, so if each read() result were discarded after use, "Hell" would be lost before the \n arrived.

So we use a static stash to accumulate content until a full line is available.


My Implementation Approach

There are two common approaches.


Option A: Stash as a Single String (my final approach)

static char *stash;

You:

  • Append buffer to stash
  • Search for \n
  • Extract line
  • Keep the remainder

Pros

  • Simple design
  • Easy to understand
  • One growing string
  • One extraction
  • One cleanup

Cons

  • Multiple reallocations
  • Repeated copying of large strings
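
The copying cost can be put into rough numbers. The sketch below is a cost model, not a measurement of this repository's code: it assumes the stash is rebuilt on every read() by allocating a new string and copying the old stash plus the new chunk into it (a strjoin-style helper). The function name bytes_copied_single_string is made up for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical cost model: returns how many bytes are copied in total while a
 * line of line_len bytes is accumulated buffer_size bytes at a time, assuming
 * each step copies the whole old stash plus the new chunk into a new buffer.
 */
size_t  bytes_copied_single_string(size_t line_len, size_t buffer_size)
{
    size_t  stash_len;
    size_t  copied;
    size_t  chunk;

    stash_len = 0;
    copied = 0;
    if (buffer_size == 0)
        return (0);
    while (stash_len < line_len)
    {
        chunk = line_len - stash_len;
        if (chunk > buffer_size)
            chunk = buffer_size;
        stash_len += chunk;
        copied += stash_len;    /* old stash + new chunk go into the new buffer */
    }
    return (copied);
}
```

Under that assumption, a 1000-byte line costs 500500 copied bytes with BUFFER_SIZE=1 but only 1000 with BUFFER_SIZE=1000, so the penalty mostly shows up with small buffers and long lines.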

Option B: Stash as a Linked List (this was my first approach)

typedef struct s_gnl_list
{
    char *content;
    struct s_gnl_list *next;
}   t_gnl_list;

Instead of continuously reallocating a large string:

  • Each read() creates a new node
  • Nodes are linked together
  • When building the final line:
    • Calculate exact length
    • Allocate once
    • Copy data once

Why I wanted to choose this first:

  • Fewer large reallocations
  • Potentially faster for long lines
  • Good practice with linked lists
  • Better memory control
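
A minimal sketch of the "calculate exact length, allocate once, copy once" idea, using the node type shown above. The helpers list_push_back, list_join and list_clear are hypothetical names, libc string functions are used for brevity, and the sketch joins every node; a real get_next_line would stop at the first \n and keep the leftover for the next call.

```c
#include <stdlib.h>
#include <string.h>

typedef struct s_gnl_list
{
    char                *content;
    struct s_gnl_list   *next;
}   t_gnl_list;

/* Hypothetical helper: appends a copy of chunk as a new tail node.
 * Returns 0 on success, -1 if an allocation fails. */
int list_push_back(t_gnl_list **head, const char *chunk)
{
    t_gnl_list  *node;
    t_gnl_list  *cur;
    size_t      len;

    len = strlen(chunk);
    node = malloc(sizeof(*node));
    if (!node)
        return (-1);
    node->content = malloc(len + 1);
    if (!node->content)
    {
        free(node);
        return (-1);
    }
    memcpy(node->content, chunk, len + 1);
    node->next = NULL;
    if (!*head)
    {
        *head = node;
        return (0);
    }
    cur = *head;
    while (cur->next)
        cur = cur->next;
    cur->next = node;
    return (0);
}

/* Hypothetical helper: measures all nodes, allocates the result once,
 * then copies each node's content into place exactly once. */
char    *list_join(const t_gnl_list *head)
{
    const t_gnl_list    *cur;
    size_t              total;
    size_t              len;
    char                *line;
    char                *dst;

    total = 0;
    cur = head;
    while (cur)
    {
        total += strlen(cur->content);
        cur = cur->next;
    }
    line = malloc(total + 1);
    if (!line)
        return (NULL);
    dst = line;
    cur = head;
    while (cur)
    {
        len = strlen(cur->content);
        memcpy(dst, cur->content, len);
        dst += len;
        cur = cur->next;
    }
    *dst = '\0';
    return (line);
}

/* Hypothetical helper: frees every node and resets the head to NULL. */
void    list_clear(t_gnl_list **head)
{
    t_gnl_list  *next;

    while (*head)
    {
        next = (*head)->next;
        free((*head)->content);
        free(*head);
        *head = next;
    }
}
```

Pushing "Hell" and then "o\n" and joining yields "Hello\n". Each byte is copied twice (once into its node, once into the final line), regardless of how many reads it took to collect the line.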

Edge Cases Handled

  • Empty file
  • One-character file
  • No newline at EOF
  • Very long lines
  • BUFFER_SIZE = 1
  • Invalid file descriptor
  • read() returning -1
  • (Bonus) Multiple file descriptors
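
For the multiple-descriptor case in the list above, one common way to keep leftovers separate is to give every descriptor its own stash slot in a static array. The sketch below shows only that slot lookup. It is an assumption about a typical approach, not a description of the bonus files here: SKETCH_MAX_FD and stash_slot_for are made-up names, and 1024 is an arbitrary bound.

```c
#include <stddef.h>

#define SKETCH_MAX_FD 1024  /* assumed upper bound, not taken from the project */

/*
 * Hypothetical helper: returns the address of the stash pointer reserved for
 * fd, or NULL if fd is outside the supported range. Every slot starts as NULL
 * because static storage is zero-initialized.
 */
char    **stash_slot_for(int fd)
{
    static char *stashes[SKETCH_MAX_FD];

    if (fd < 0 || fd >= SKETCH_MAX_FD)
        return (NULL);
    return (&stashes[fd]);
}
```

A reader function would fetch its slot once per call and then work on that pointer exactly as it would on a single static stash, so leftovers from one descriptor never mix with another's. Rejecting out-of-range values also covers the invalid file descriptor case before any read() is attempted.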

Compilation

Using Makefile

make

Clean object files

make fclean

Manual compilation

cc -Wall -Wextra -Werror -D BUFFER_SIZE=42 main.c get_next_line.c get_next_line_utils.c

If using a static library:

cc -Wall -Wextra -Werror -D BUFFER_SIZE=42 main.c get_next_line.a

Instructions

Example main.c

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "get_next_line.h"

int main(void)
{
    int     fd = open("test.txt", O_RDONLY);
    char    *line;

    if (fd < 0)
    {
        printf("Invalid file descriptor.\n");
        return (1);
    }
    line = get_next_line(fd);
    while (line != NULL)
    {
        printf("line -> %s", line);
        free(line);
        line = get_next_line(fd);
    }
    close(fd);
    return (0);
}

Example main.c with command line arguments

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "get_next_line.h"

int	main(int argc, char **argv)
{
	int		fd;
	char	*line;
	int		i = 1;

	if (argc != 2)
	{
		printf("Usage: ./a.out file\n");
		return (1);
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
	{
		perror("open");
		return (1);
	}

	while ((line = get_next_line(fd)))
	{
		printf("Line %d: %s", i++, line);
		free(line);
	}
	close(fd);
}

Example main.c for multiple file descriptors

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "get_next_line.h"

int main(void)
{
    int     fd1;
    int     fd2;
    char    *line1 = NULL;
    char    *line2 = NULL;

    fd1 = open("file1.txt", O_RDONLY);
    fd2 = open("file2.txt", O_RDONLY);
    if (fd1 < 0 || fd2 < 0)
    {
        perror("open");
        return 1;
    }
    while (1)
    {
        line1 = get_next_line(fd1);
        line2 = get_next_line(fd2);
        if (!line1 && !line2)
            break;
        if (line1)
        {
            printf("FD1: %s", line1);
            free(line1);
        }
        if (line2)
        {
            printf("FD2: %s", line2);
            free(line2);
        }
    }
    close(fd1);
    close(fd2);
    return 0;
}

Test Different BUFFER_SIZE Values

cc -Wall -Wextra -Werror -D BUFFER_SIZE=1  main.c get_next_line.c get_next_line_utils.c
cc -Wall -Wextra -Werror -D BUFFER_SIZE=42 main.c get_next_line.c get_next_line_utils.c
cc -Wall -Wextra -Werror -D BUFFER_SIZE=9999 main.c get_next_line.c get_next_line_utils.c

Memory Leak Check

Using Valgrind:

valgrind --leak-check=full --show-leak-kinds=all ./a.out

You want:

  • No leaks
  • No invalid reads/writes

Libraries Used

In get_next_line.h:

  • <unistd.h> → read()
  • <stdlib.h> → malloc(), free()

In testing:

  • <fcntl.h> → open()
  • <stdio.h> → printf()
  • <unistd.h> → close()

Resources


AI Usage

AI tools were used only for:

  • Improving documentation clarity
  • Reorganizing structure
  • Explaining theoretical concepts

All implementation decisions, debugging, and testing were done manually.


Final Result

get_next_line:

  • Efficiently reads one line at a time
  • Manages memory safely
  • Handles edge cases
  • Works with any buffer size
  • Supports multiple file descriptors
