26 Days of C at 42 Piscine: What Can You Actually Build After That? | Part I

Feb 16, 2025

In January, I plunged headfirst into an unforgettable experience at the @42Porto Piscine — 26 days of coding marathons, 10 hours a day, 7 days a week, building projects in C. It was paradoxically intense in many ways. Now that it’s over, I’ve been eager to apply what I learned, with one question burning in my mind: “What can I build that’s both challenging and practical?”

One week later, I created something genuinely exciting. It’s simple but born entirely from my own vision — and I’m proud of it.

As a cybersecurity enthusiast, I’ve spent countless hours simulating real-world scenarios: ethical hacking, social engineering, and data collection. But there’s a catch: the cybersecurity landscape is a sea of fragmented tools. Want to enumerate subdomains? You’ll need Amass for depth, Sublist3r for speed, and a dozen others for edge cases. Each tool has quirks — API keys, resource hogging, traffic blocks — and integrating them feels like herding cats. Compatibility issues abound, and choosing the right tool for specific needs often leads to overwhelm and wasted time. So, I decided to build my own subdomain enumeration tool in C.

Why C? While Python tools dominate the space, C offers raw performance and a deeper understanding of low-level mechanics. Plus, I wanted to push the boundaries of what I’d learned at 42.

“Oh, why don’t you just use aggregation tools that combine the most popular subdomain enumeration tools, or build an aggregator? Like a normal person?”

Because it’s not fun! And I don’t want to! I want to see how far I can go by building my own functional program in C, with my own libraries and functions. Sure, there are countless Python tools and tutorials for building this kind of thing in Python, but almost nothing about how to do it in C. So, let’s try something new.

First of all, what the heck is subdomain enumeration?

It’s the process of discovering subdomains (like mail.google.com) associated with a root domain. This is a critical step for ethical hackers to map an organization’s attack surface, uncovering hidden services or potential vulnerabilities.

Does it sound complicated? Absolutely. But I’m not aiming to build a complete tool with all the bells and whistles — at least not yet. I’m starting with an MVP (Minimum Viable Product). This project is all about that. I’ll walk you through how I outlined the MVP features, made my technical choices, structured my workflow, and thought through the entire process.

How did I build it?

First, I needed to outline the MVP features. The simplest approach is to read subdomains from a wordlist and check their DNS records. That’s it. No frills, no extras — just the core functionality. Essentially, I wanted to build something like this:

``` mermaid
graph TD
    A[Start] --> B[Read Wordlist]
    B --> C[Generate FQDN]
    C --> D[Check DNS Records]
    D --> E[Display Valid Subdomains]
    E --> F[End]
```

Hint: You can paste the Mermaid code above into a Mermaid interpreter to visualize the workflow.
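The "Generate FQDN" step in the diagram is just string joining, so here is a minimal sketch of how it could look. The helper name `build_fqdn` and its return convention are my own assumptions for illustration, not necessarily what ends up in the final tool.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: joins a wordlist entry and a root domain into
 * "word.domain". Returns 0 on success, -1 if the result does not fit. */
int build_fqdn(char *out, size_t out_size, const char *word, const char *domain)
{
    int written = snprintf(out, out_size, "%s.%s", word, domain);

    /* snprintf returns the length it wanted to write; a value greater
     * than or equal to out_size means the output was truncated. */
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

Called with "mail" and "example.com", this fills the output buffer with "mail.example.com". Checking the return value of snprintf() keeps a truncated name from being sent to the DNS check.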

Since the explanation will be extensive, I’ve divided the post into two parts. In this first part, I’ll outline the solution and explain the first function, which deals with file handling. I’ll also detail the reasoning behind my technical choices.

For more details about any other functions, “Google is your friend”. Those who understand will understand.

What’s a Wordlist?

A wordlist is simply a list of strings that represent possible subdomains (e.g., "ftp", "mail", "admin", etc.). Think of it as a master key used to test all the locks in a house. It’s the most basic approach for enumeration and a great starting point because it allows me to begin with simple operations without needing complex algorithms. Plus, well-known tools like Sublist3r and Amass also start with wordlists.

So, the first step is to build a function that reads a file. This function should:

Return an array of strings (a pointer to a pointer to char), where each string is a word from the wordlist (e.g., "mail", "api").
Ensure the filename isn’t modified within the function (using const).
Store the number of words read from the file.

With these requirements in mind, here’s what we’re aiming for:

read_wordlist()

char **read_wordlist(const char *filename, int *count);

We can break this section into a few parts:
1. File opening;
2. File reading;
3. File closing; and
4. Return of Array of Strings.

fopen()

FILE *file = fopen(filename, "r");
if (!file)
{
    perror("Error opening file");
    exit(EXIT_FAILURE);
}

First of all, we need to open the wordlist file with fopen() in read mode (“r”).

Good Practice: ERROR verification
If the file does not exist or cannot be opened, fopen() returns NULL. perror() then prints the message “Error opening file” followed by the system’s description of the failure, and the program exits with an error code instead of continuing with an invalid file.
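Exiting on failure is fine for an MVP. As a possible alternative, here is a sketch in which the function reports the error and lets the caller decide what to do. The name `open_wordlist` is hypothetical, and the exact text perror() appends depends on the system.

```c
#include <stdio.h>

/* Hypothetical variant: report the failure and return NULL so the
 * caller can decide whether to exit, retry, or ask for another file. */
FILE *open_wordlist(const char *filename)
{
    FILE *file = fopen(filename, "r");
    if (!file)
    {
        /* perror() prints our prefix, a colon, and the reason taken
         * from errno (on POSIX systems, something like
         * "No such file or directory"). */
        perror("Error opening file");
        return NULL;
    }
    return file;
}
```

The caller then checks the result for NULL before reading, and still has to call fclose() on success.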

Now we can initialize the variables:

char **words = NULL;
char buffer[256];
*count = 0;

Since no memory has been allocated yet, we initialize the array of strings to NULL. But why NULL? It avoids a garbage pointer value, and it lets realloc() work correctly on the first iteration, because realloc() called with a NULL pointer behaves like malloc(). For this MVP, I define a temporary buffer to hold each line read from the file. 256 bytes is enough for most subdomains (a single DNS label is limited to 63 characters). Then we set the string counter that `count` points to to 0 (zero).
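To see the NULL behaviour in isolation, here is a tiny sketch. The function name `grow_from_null` is made up for this demonstration only.

```c
#include <stdlib.h>

/* Returns 1 if growing from an empty (NULL) array works, 0 otherwise. */
int grow_from_null(void)
{
    char **words = NULL;    /* nothing allocated yet */

    /* With a NULL pointer, realloc() behaves like malloc(size). */
    char **temp = realloc(words, 1 * sizeof(char *));
    if (temp == NULL)
        return 0;
    words = temp;
    words[0] = NULL;        /* the first slot is now usable */

    free(words);
    return 1;
}
```

If `words` had been left uninitialized instead, the first realloc() call would receive a garbage address, which is undefined behaviour.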

fgets()

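while (fgets(buffer, sizeof(buffer), file))
{
    /* Cut the line at the newline, if there is one */
    buffer[strcspn(buffer, "\n")] = 0;

    /* Make room for one more pointer in the array */
    char **temp = realloc(words, (*count + 1) * sizeof(char *));
    if (temp == NULL)
    {
        printf("Error reallocating memory!\n");
        exit(1);
    }
    words = temp;

    /* Allocate the string itself and copy the buffer into it */
    words[*count] = malloc(strlen(buffer) + 1);
    if (words[*count] == NULL)
    {
        printf("Error allocating memory!\n");
        exit(1);
    }
    strcpy(words[*count], buffer);

    (*count)++;
}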

Before explaining this loop, it’s important to explain why I used fgets() instead of fscanf(). fgets() is a safer way to read an entire line from a file (or input): it reads until it encounters a newline (\n) or reaches the specified maximum length. This means spaces and tabs are included in the read, which is essential for processing full text or multi-word lines. fgets() also makes you specify the buffer size, avoiding the buffer overflows that are a common problem with functions like gets(). fscanf(), on the other hand, is designed to read formatted data. With %s it stops at the first space, tab, or line break, and it can be dangerous if not used carefully: unless you give it a field width (such as %255s), it does not limit the number of characters read, which can lead to vulnerabilities.
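A small sketch makes the difference visible. The helper `read_both_ways` and the 16-byte token size are assumptions chosen only for this comparison.

```c
#include <stdio.h>

/* Reads the start of `file` twice: once as a whole line with fgets(),
 * once as a single token with a width-limited fscanf(). */
void read_both_ways(FILE *file, char *line, int line_size, char token[16])
{
    rewind(file);
    if (!fgets(line, line_size, file))
        line[0] = '\0';

    rewind(file);
    /* "%15s" caps the read at 15 characters plus '\0';
     * a bare "%s" would have no cap at all. */
    if (fscanf(file, "%15s", token) != 1)
        token[0] = '\0';
}
```

For a file whose first line is "hello world", fgets() returns the whole line including the newline, while fscanf() with %s stops at the space and returns only "hello".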

Now, about the while loop, I like to read this way:

“As long as the fgets() function can read a line from the file and store it in the buffer array, with the maximum size of sizeof(buffer), continue executing the code block inside the while.”

It is a good exercise if you have difficulty thinking through and visualizing the code (like me). If you want to be more objective, you can read it as:
“The fgets function reads a line of text from the file, then stores that line in the buffer array.”

strcspn()
The line `buffer[strcspn(buffer, "\n")] = 0;` is quite important because it ensures that the string read from the file has no newline (\n) character at the end: strcspn() returns the index of the first \n in the buffer, and that position is overwritten with the null character (\0), which marks the end of a string in C. This effectively removes the newline character from the string. But (again) why is this important here?

When you read a line from a file with fgets(), the function keeps the newline character (\n) at the end of the string (if the whole line fits in the `buffer`). This newline character can cause problems when you manipulate the string later, such as in comparisons, concatenations, or when storing the string in a data structure. For instance, if the file contains the lines “hello\n” and “world”, fgets() will store “hello\n” in the `buffer`. Without removing the \n, the string would be treated as “hello\n” instead of “hello”, and an FQDN built from it would contain a line break in the middle of the hostname. With the \n removed, the stored strings are:

hello
world

This is especially important when working with functions like strcpy(), strcmp(), or when storing the string in a string array (which is precisely what we are doing here). Also, if you don’t remove the \n, the string length (`strlen(buffer)`) will include the newline character, so one byte more than necessary is allocated for every word.
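Here is the same idea as a standalone sketch. The helper name `strip_newline` is hypothetical, and adding "\r" to the set is my own assumption to cover wordlists saved with Windows line endings; the original line only strips "\n".

```c
#include <string.h>

/* Cuts the string at the first '\r' or '\n' and returns the new length. */
size_t strip_newline(char *s)
{
    /* strcspn() returns the index of the first character that is in the
     * given set, or strlen(s) if none is found, so this write always
     * lands inside the string (at worst on the existing '\0'). */
    s[strcspn(s, "\r\n")] = '\0';
    return strlen(s);
}
```

With "hello\n" as input, the length drops from 6 to 5. A string with no newline, like the last line of many files, is left unchanged.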

realloc()
The realloc function is used to resize a memory block previously allocated with malloc() or calloc(). It takes two arguments:
— The pointer to the original memory block (`words`).
— The new desired size `((*count + 1) * sizeof(char *))`.
>> `*count` is the current number of strings stored in the `words` array;
>> `*count + 1` increases the size of the array by 1, to accommodate a new string;
>> `sizeof(char *)` is the size of a pointer to char (i.e., the size of one element of the `words` array).
In our loop, the result of realloc() is first stored in `temp` and then assigned back to `words`. This is necessary because the `words` array is dynamic, that is, it grows as new strings are read from the file. Without realloc(), we would need to know the exact number of strings in advance, which is not always possible. Growing the array one slot at a time means we never reserve more memory than we need. And because realloc() may move the block to a new address, the original pointer (`words`) needs to be updated with the returned value.

Good Practice: Verifying realloc()
realloc() may fail and return NULL. If we assigned its result directly to `words` and that happened, the original pointer would be overwritten and lost, causing a memory leak. That is why we use a temporary pointer (`temp`) to check whether realloc() was successful before updating `words`.
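The temporary-pointer pattern can be wrapped in a small function. This is only a sketch: `grow_words` and its return convention are assumptions for illustration, not part of the original code.

```c
#include <stdlib.h>

/* Hypothetical helper: makes room for one more pointer in the array.
 * Returns 0 on success. On failure it returns -1 and leaves *words
 * pointing at the old, still valid block. */
int grow_words(char ***words, int count)
{
    char **temp = realloc(*words, ((size_t)count + 1) * sizeof(char *));
    if (temp == NULL)
        return -1;      /* the old block can still be used or freed */
    *words = temp;
    return 0;
}
```

Because the old block survives a failed call, the caller still has a chance to free everything that was read so far before giving up.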

malloc(), strlen() and strcpy()
OK, since the `words` array is dynamic, we need to allocate memory for each new string and copy the contents of the buffer into it. This is the job of malloc(): it dynamically allocates memory to store the string contained in the `buffer` variable. Sizing the allocation from the string’s length ensures that each string has exactly the space needed for its contents, avoiding wasted memory.

words[*count] = malloc(strlen(buffer) + 1);

`strlen(buffer)` returns the length of the string in the `buffer` variable (not counting the null character \0). The + 1 reserves space for the null character (\0), which marks the end of the string in C.

In other words:
“Dynamically allocate memory to store a string the size of the current `buffer` contents plus one, and assign the address of that memory to the element at position `*count` of the `words` array.”

`strcpy(words[*count], buffer);`
The strcpy (string copy) function copies all the characters of the string in the buffer (including the null character \0) to the location pointed to by `words[*count]`. Without strcpy(), the memory allocated for `words[*count]` would hold indeterminate data (garbage), because malloc() does not initialize it.
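The malloc(), strlen() and strcpy() trio can be read as one "duplicate this string" step. Below is a sketch with a hypothetical helper named `dup_string`; it also checks the result of malloc(), which is an extra safety step I am assuming you would want.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of `src`,
 * or NULL if malloc() fails. The caller must free() the copy. */
char *dup_string(const char *src)
{
    size_t len = strlen(src);       /* characters, without the '\0' */
    char *copy = malloc(len + 1);   /* + 1 for the terminator */
    if (copy == NULL)
        return NULL;
    strcpy(copy, src);              /* copies the '\0' as well */
    return copy;
}
```

The copy matters because `buffer` is reused on every loop iteration. Storing a pointer to `buffer` itself would leave every element of `words` pointing at the last line read.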

`(*count)++;`
After that, we increment the integer that `count` points to, which holds the number of strings stored in the `words` array. The parentheses matter: `*count++` would advance the pointer instead of the value.
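A minimal sketch of incrementing through a pointer, using a made-up function `add_one`:

```c
/* Increments the integer owned by the caller. */
void add_one(int *count)
{
    /* Dereference first, then increment the value. Written as *count++,
     * the ++ would apply to the local copy of the pointer and the
     * caller's integer would stay the same. */
    (*count)++;
}
```

This is the reason read_wordlist() can report the number of words through its `count` parameter while returning the array itself.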

At the end, we release the resources associated with the file with `fclose(file);` to avoid leaking file descriptors, and return the updated `words` array.
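For reference, here are the pieces from this post assembled into one function. Treat it as a sketch: it follows the snippets above and includes a check on the result of malloc().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **read_wordlist(const char *filename, int *count)
{
    /* 1. File opening */
    FILE *file = fopen(filename, "r");
    if (!file)
    {
        perror("Error opening file");
        exit(EXIT_FAILURE);
    }

    char **words = NULL;
    char buffer[256];
    *count = 0;

    /* 2. File reading, one line per iteration */
    while (fgets(buffer, sizeof(buffer), file))
    {
        buffer[strcspn(buffer, "\n")] = 0;

        char **temp = realloc(words, (*count + 1) * sizeof(char *));
        if (temp == NULL)
        {
            printf("Error reallocating memory!\n");
            exit(1);
        }
        words = temp;

        words[*count] = malloc(strlen(buffer) + 1);
        if (words[*count] == NULL)
        {
            printf("Error allocating memory!\n");
            exit(1);
        }
        strcpy(words[*count], buffer);

        (*count)++;
    }

    /* 3. File closing */
    fclose(file);

    /* 4. Return of the array of strings */
    return words;
}
```

Note that an empty file gives back NULL with `*count` set to 0, so the caller should rely on the count rather than on the pointer to know how many words there are.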

Now, you must be wondering where the free function is. Don’t worry, I left it in the main function.
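As a rough idea of what that cleanup could look like, here is a sketch. The helper name `free_wordlist` is hypothetical; the actual code in main may be organized differently.

```c
#include <stdlib.h>

/* Hypothetical helper: releases everything read_wordlist() allocated. */
void free_wordlist(char **words, int count)
{
    for (int i = 0; i < count; i++)
        free(words[i]);     /* each string first */
    free(words);            /* then the array of pointers */
}
```

The order matters: freeing `words` first would throw away the only pointers to the individual strings. Calling it with NULL and a count of 0 is harmless, since free(NULL) does nothing.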

After execution, the memory will look like this:

words →  [0] → "www\0"
         [1] → "mail\0"
         [2] → "ftp\0"
         …

Questions for Reflection
— What happens if the file has a line longer than 256 characters?
— How could we improve error handling?
— Why don’t we use calloc() instead of malloc()?
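If you want to experiment with the first question, one possible way is to count how many fgets() calls a single long line needs. The function `count_chunks` below is an assumption made only for this experiment.

```c
#include <stdio.h>

/* Counts how many fgets() calls it takes to consume the whole stream
 * when reading at most `size` bytes at a time (size between 2 and 256). */
int count_chunks(FILE *file, int size)
{
    char buffer[256];
    int chunks = 0;

    if (size < 2 || size > (int)sizeof(buffer))
        return -1;

    rewind(file);
    while (fgets(buffer, size, file))
        chunks++;
    return chunks;
}
```

fgets() stores at most size - 1 characters per call, so a line of 300 characters read with a 256-byte buffer arrives in two pieces, and our loop would store them as two separate words.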

That’s All for Now!

This wraps up Part I, where I’ve outlined the MVP, explored the concept of wordlists, and started building the foundation for our subdomain enumeration tool. But this is just the warmup!

In Part II, we’ll focus on essential next steps:

Implementing DNS resolution to check subdomain validity.
Error handling: Gracefully handle file read failures and DNS timeouts.
User input validation: Ensure valid domains and wordlist formats.
Measuring execution time to optimize performance.
Exploring memory management techniques to handle large datasets efficiently.
Adding colorful output for better readability and user experience. (yep, it matters too)

We’ll also discuss (not execute) potential enhancements for our v1.0 and future roadmap.

Parallel DNS checks: Speed up enumeration with multi-threading.
Wildcard DNS detection: Avoid false positives from catch-all DNS configurations.
Progress tracking: Add a simple progress bar for large wordlists.
Output formatting: Save results to a file for later analysis.
Timeout handling: Prevent hangs during DNS resolution.

Why these?

Practical: Address common pain points in subdomain tools.
Achievable: Build on the existing C foundation without overcomplicating.
Impactful: Transform the MVP into a usable tool for real-world scenarios.

Want to see how we implement these or contribute? Check out the code in the GitHub repo.

had-nu - Overview

Master's Degree Student in Information Systems | AppSec & Cybersecurity Researcher | Threat Intell, Social Engineering…

github.com