Secure Text Input

C is a bit of a tricky language. You really need to understand at least a little about how memory works — otherwise, you're flying blind. Take pointers, for example: they're completely unusable if you don't know what's going on under the hood... and if you try anyway, well, things can get pretty wild o_O

One of the trickiest aspects of the language is reading text input. You've seen the scanf function, which we used early in this course. You're probably thinking, "What could be simpler and more straightforward?"

Well, guess what — it's anything but simple. I deliberately glossed over the tricky parts when introducing it, just to keep things accessible. But in reality, this stuff can get complex fast.

Why? What is complex about reading user input?

Because the person using your program is a human — and humans make mistakes and behave unpredictably. If you ask, "How old are you?" what's stopping them from replying, "STFU SiteRaw is the best website on the Internets"?

The goal of this chapter is to walk you through the problems you might encounter using scanf (especially security issues) and to show you a much safer alternative: the fgets function.

Make sure you've fully grasped the chapter on strings before diving into this one.

The limits of scanf

The scanf() function, which I introduced at the start of the course, is a double-edged sword:

It's easy to use when you're starting out (which is why I showed it to you)...
...but its inner workings are complex, and it can even be dangerous in certain cases.

Sounds contradictory, right? :D

In reality, scanf seems easy to use, but it's not so simple in practice. Let me show you its limits through two real-world examples.

1. Entering a string with spaces

Let's say you ask the user to enter a string, and they type something with a space in it:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    char name[20] = {0};
    printf("What is your name? ");
    scanf("%s", name);
    printf("Ah! So your name is %s!\n\n", name);
    return 0;
}

Output:

What is your name? SiteRaw Boss
Ah! So your name is SiteRaw!

Where did "Boss" go?

Well, with the %s format, scanf stops reading as soon as it hits a space, tab, or newline. So you can't read in a full string this way if it contains spaces.

Actually, the word "Boss" is still in memory — in what's called the buffer. The next time you call scanf, it'll read "Boss" automatically, since it's still sitting there in memory, waiting.
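
To see that leftover word being picked up, here is a minimal sketch. The helper name readTwoWords is made up for this illustration, and it takes the stream as a parameter (pass stdin to get the behavior of scanf) so that it can be tried on any input. The 19 in %19s is my addition: it keeps the read within a 20-char array.

```c
#include <stdio.h>

/* Hypothetical helper: reads two whitespace-separated words from `in`.
   Each array must hold at least 20 chars (19 characters plus the '\0').
   Returns the number of words actually read (0, 1 or 2). */
int readTwoWords(FILE *in, char first[20], char second[20])
{
    int count = 0;

    first[0] = '\0';
    second[0] = '\0';

    /* First call: stops at the space, the rest stays in the buffer */
    if (fscanf(in, "%19s", first) == 1)
    {
        count++;
        /* Second call: nothing new is typed, it reads what was left over */
        if (fscanf(in, "%19s", second) == 1)
        {
            count++;
        }
    }
    return count;
}
```

Calling readTwoWords(stdin, first, second) and typing "SiteRaw Boss" once should fill both arrays from that single line.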

It is possible to make scanf read spaces, but it's a bit tricky.
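
For the curious, one way to do it is a "scanset" conversion: %[^\n] means "accept every character that is not a newline". This is only a sketch, assuming the 20-char array from the example above; readFullName is a hypothetical helper, and the stream parameter stands in for stdin.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line, spaces included, into `name`.
   `name` must hold at least 20 chars. Returns 1 on success, 0 otherwise. */
int readFullName(FILE *in, char name[20])
{
    /* %19[^\n] : accept up to 19 characters that are not a newline.
       The width (19) leaves room for the '\0' and prevents overflow. */
    if (fscanf(in, "%19[^\n]", name) != 1)
    {
        name[0] = '\0'; /* nothing read: empty line or end of input */
        return 0;
    }
    return 1;
}
```

Note that the newline itself is not consumed: it stays in the buffer for the next read, which is part of what makes this approach tricky.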

2. Entering a string that's too long

Now here's a much more serious issue: memory overflow.

Let's take the exact same code as above, but tweak it slightly:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    char name[5] = {0};
    printf("What is your name? ");
    scanf("%s", name);
    printf("Ah! So your name is %s!\n\n", name);
    return 0;
}

This time, we've allocated 5 characters for our name array. That means we can store 4 characters — the last slot is reserved for the null terminator \0.

If you've forgotten this part, definitely go review the chapter on strings.

What happens if you enter more characters than the array can hold?

Output:

What is your name? SiteRaw
Ah! So your name is SiteRaw!

Looks fine, right? But what you've just seen is a programmer's nightmare, and on another machine the very same program might just as well crash or print garbage. What just happened is called a buffer overflow.

We had allocated 5 characters for the name, but we actually needed 8. So what did scanf do? It just kept writing past the end of the array like it was no big deal! It spilled over into memory zones that weren't meant to be written to.

The extra characters overwrote other things in memory. That's why we call it a buffer overflow.

Why is that dangerous?

Without going into too much technical detail (it gets complicated, and that's not the focus of this chapter), just know this: if a program doesn't guard against overflows, a user can write whatever they want into memory.

In some cases, they can even inject executable code and trick the program into running it. This is the infamous buffer overflow attack — a well-known, though difficult-to-execute, hacking technique.

Our goal in this chapter is to make our input handling secure — to prevent buffer overflows. Sure, you could allocate a huge buffer (like 10,000 characters), but that doesn't fix the problem. A determined attacker could just input more than 10,000 characters and still succeed.
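
For what it's worth, scanf can be told to stop early: a number placed between % and s is a maximum field width. Here is a sketch assuming the 5-char array from the example; readShortName is a hypothetical helper, and the stream parameter stands in for stdin. The width has to be written by hand as "array size minus one", which is easy to get wrong.

```c
#include <stdio.h>

/* Hypothetical helper: reads one word into a 5-char array without overflow.
   Returns 1 if a word was read, 0 otherwise. */
int readShortName(FILE *in, char name[5])
{
    /* %4s : store at most 4 characters, then the '\0' (5 bytes in total) */
    if (fscanf(in, "%4s", name) != 1)
    {
        name[0] = '\0';
        return 0;
    }
    return 1;
}
```

With the input "SiteRaw", the array receives "Site" and the remaining "Raw" stays in the buffer.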

As silly as it sounds, many programmers just didn't bother with this in the past. If they had done things right from the start, a good chunk of today's "security vulnerabilities" wouldn't even exist.

Reading a String Safely

There are several standard C functions that can read a line of text. Aside from scanf (which is too complex to cover in full here), you've also got:

gets: reads an entire string, but it's extremely dangerous because it doesn't prevent buffer overflows! It was removed from the C standard in C11.
fgets: does what gets does, but safely: it lets you control how many characters are written to memory.

As you can imagine: even though it was a standard C function for decades, gets is downright dangerous. Any program that uses it is vulnerable to buffer overflow attacks.

So let's look at how fgets works and how to use it in real programs to replace scanf.

The fgets function

Here's the function prototype for fgets (from stdio.h):

char *fgets(char *str, int num, FILE *stream);

Let's break that down:

str: a pointer to a memory-allocated array where fgets will store the user's input.
num: the size of the str buffer passed as the first argument. Keep in mind: if you allocate a 10-char array, fgets will only read up to 9 characters (it always saves one slot for the \0 terminator).
stream: a pointer to the file you're reading from. In this case, the "file" is the standard input — your keyboard. To read from standard input, pass the stdin pointer, which is automatically defined in C's standard headers.

That said, fgets can also be used to read from actual files, like we saw in the chapter about file handling.

fgets returns the same pointer as str if everything goes well, or NULL if there was an error or if the end of the input was reached before any character could be read. So you can just check the return value to detect issues.
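
As a sketch of that check: when fgets returns NULL, the standard feof function tells you whether the input simply ran out. The helper name lineStatus and the strings it returns are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read one line and reports what happened. */
const char *lineStatus(FILE *in, char *buffer, int size)
{
    if (fgets(buffer, size, in) != NULL)
    {
        return "ok"; /* buffer now holds a valid string */
    }
    if (feof(in))
    {
        return "end of input"; /* nothing left to read */
    }
    return "read error"; /* fgets failed for another reason */
}
```

In a real program you would call it as lineStatus(stdin, name, 10) and only use the buffer when the answer is "ok".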

Let's give it a try:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    char name[10] = {0}; // Zero-initialized: still a valid (empty) string if fgets fails
    printf("What is your name? ");
    fgets(name, 10, stdin);
    printf("Ah! So your name is %s!\n\n", name);
    return 0;
}

Output:

What is your name? SiteRaw
Ah! So your name is SiteRaw
!

It works perfectly, except for one small detail: when you hit "Enter", fgets keeps the newline character (\n) that comes from pressing the Enter key. That's why the exclamation mark ends up on a line of its own in the output.

There's nothing you can do to stop fgets from writing that \n. That's just how it works. But nothing's stopping us from writing our own wrapper function that calls fgets and automatically removes the newline every time!

Creating Your Own Input Function

It's actually not that hard to write your own little input function — one that can automatically fix things up for you each time.

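We'll call this function read (original, I know). It will return 1 if everything went well, and 0 if there was an error. One caveat: read is also the name of a POSIX system function, so on Linux or macOS a different name (readLine, for instance) is a safer choice in real projects.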

Removing the newline character "\n"

Our read function will call fgets, and if everything works correctly, it'll look for the newline character \n using the strchr function — you should already be familiar with that one. If a \n is found, it gets replaced with a \0 (end of string), so we don't keep the Enter keypress in the string.

Here's the step-by-step, commented code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h> // Don't forget to include string.h for strchr()
int read(char *string, int length)
{
    char *entryPosition = NULL;
    // Read the text typed on the keyboard
    if (fgets(string, length, stdin) != NULL)  // If input is successful
    {
        entryPosition = strchr(string, '\n'); // Look for newline
        if (entryPosition != NULL) // If we find it
        {
            *entryPosition = '\0'; // Replace newline with \0
        }
        return 1; // Return 1 if everything went fine
    }
    else
    {
        return 0; // Return 0 if there was an error
    }
}

Notice how I'm calling fgets directly inside an if — that saves me from storing the result in a pointer just to check if it's NULL.

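Once that first if runs, I already know whether fgets worked or not (it fails, for instance, when the input ends before anything could be read). Note that typing too many characters is not a failure: fgets simply stops early, as we'll see further down.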

If all goes well, I then search for the \n using strchr and replace it with a \0.

Having two \0s in a row is no big deal. The computer stops at the first one and treats it as the end of the string anyway.
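
A common alternative to strchr, shown here only as a sketch and not as the method used in the rest of this chapter: strcspn (also from string.h) returns the number of characters before the first '\n', or the length of the string if there is none. That result can be used directly as an index. The name stripNewline is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\n', if any. */
void stripNewline(char *string)
{
    /* strcspn gives the index of the first '\n', or of the final '\0'
       when there is no newline; writing '\0' there is safe either way. */
    string[strcspn(string, "\n")] = '\0';
}
```

The design trade-off: it is shorter, but it no longer tells you whether a newline was actually found, which we'll need later to detect input that was too long.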

The result? It works! :D

int main(int argc, char *argv[])
{
    char nom[10] = {0}; // Zero-initialized: still a valid (empty) string if read fails
    printf("What's your name? ");
    read(nom, 10);
    printf("Ah! So your name is %s!\n\n", nom);
    return 0;
}

Output:

What's your name? SiteRaw  
Ah! So your name is SiteRaw!

Emptying the buffer

But we're not out of the woods yet. We haven't looked at what happens if the user tries to type more characters than we have space for!

Example input:

What's your name? SiteRaw The Best Website On The Internets  
Ah! So your name is SiteRaw T!

Since fgets is secure, it stopped reading after the 9th character — because we only allocated an array of 10 char (don't forget the \0 at the end, which takes up the 10th slot).

The problem is, the rest of the string — "he Best Website On The Internets" — hasn't disappeared! It's still sitting in the buffer. The buffer is basically a special memory area that temporarily holds the keyboard input. It acts as a bridge between the keyboard and your variable.

In C, you access this buffer through a stream pointer: it's the stdin I mentioned earlier!

When the user types on the keyboard, the operating system (like Windows) copies that input into the stdin buffer. This buffer just temporarily holds the text.

The job of fgets is to grab as much as it can from the buffer and copy it into the array you provide. Once fgets has done its copying, it clears out everything it was able to copy from the buffer.

So, if it managed to copy everything from the buffer, the buffer is empty afterward.

But if the user types too much, and fgets can only fit part of it into your array (because you only allocated 10 char), then only the read part is cleared. The rest stays in the buffer!

Let's try it with a long string:

int main(int argc, char *argv[])
{
    char nom[10] = {0}; // Zero-initialized: still a valid (empty) string if read fails
    printf("What's your name? ");
    read(nom, 10);
    printf("Ah! So your name is %s!\n\n", nom);
    return 0;
}

Output:

What's your name? SiteRaw The Best Website On The Internets  
Ah! So your name is SiteRaw T!

Just as expected, fgets could only copy the first 9 characters. But the rest of the text is still in the buffer!

That means if you call fgets again, it'll read whatever is still sitting in the buffer!

Try this code:

int main(int argc, char *argv[])
{
    char nom[10] = {0}; // Zero-initialized: still a valid (empty) string if read fails
    printf("What's your name? ");
    read(nom, 10);
    printf("Ah! So your name is %s!\n\n", nom);
    read(nom, 10);
    printf("Ah! So your name is %s!\n\n", nom);
    return 0;
}

We're calling the read function twice. But as you'll see, you won't get to type your name a second time — because fgets just grabs what was left over in the buffer!

Output:

What's your name? SiteRaw The Best Website On The Internets
Ah! So your name is SiteRaw T!

Ah! So your name is he Best W!

When the user types too many characters, fgets does protect us from memory overflow — but the leftover characters are still in the buffer. We need to flush the buffer.

So let's upgrade our read function by calling a helper function, emptyBuffer, whenever we detect that too much was typed:

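void emptyBuffer(void)
{
    int c = 0; // An int, not a char: it must be able to hold EOF
    // Read and discard characters until the end of the line or of the input
    while (c != '\n' && c != EOF)
    {
        c = getchar();
    }
}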
int read(char *string, int length)
{
    char *entryPosition = NULL;
    if (fgets(string, length, stdin) != NULL)
    {
        entryPosition = strchr(string, '\n');
        if (entryPosition != NULL)
        {
            *entryPosition = '\0';
        }
        else
        {
            emptyBuffer();
        }
        return 1;
    }
    else
    {
        emptyBuffer();
        return 0;
    }
}

The read function now calls emptyBuffer in two situations:

The input was too long (we know this because \n wasn't found in the string).
There was an error of some kind, and we need to flush the buffer anyway, just to be safe.

The emptyBuffer function is short but mighty.

It reads characters from the buffer one at a time using getchar. This function returns an int rather than a char because it must be able to return EOF, a special value that is different from every valid character. We store that int in a temporary variable c, and loop until we find a \n (end of the line) or the EOF marker (End Of File, meaning end of the input), which both signal that there is nothing left to throw away.

As soon as we hit one of those, we stop.
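
The same loop can be written against any stream, which makes it easier to experiment with. This is a sketch: discardLine is a hypothetical variant of emptyBuffer that takes the stream as a parameter (fgetc is the stream version of getchar) and returns how many characters it threw away.

```c
#include <stdio.h>

/* Hypothetical variant of emptyBuffer: discards the rest of the current
   line from `in` and returns the number of characters discarded
   (the '\n' itself is counted, EOF is not). */
int discardLine(FILE *in)
{
    int c;
    int discarded = 0;

    /* c is an int so that it can hold EOF as well as any character */
    while ((c = fgetc(in)) != EOF)
    {
        discarded++;
        if (c == '\n')
        {
            break; /* end of the line: stop here */
        }
    }
    return discarded;
}
```

Calling discardLine(stdin) after a truncated fgets leaves the buffer positioned at the start of the next line.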

And there you have it! :) It's a bit technical and might seem complicated at first, but trust me: it makes sense once you get the hang of it. Take your time with these functions, and with the help of my diagrams, you'll totally get it!

Converting Strings to Numbers

Our read function is now efficient and reliable, but it only reads text. You're probably wondering: "But how do we get a number from that?" o_O

Well, read is a basic input function. With fgets, you can only capture text, but there are other functions that let you convert that text into a number afterward.

strtol: Convert a String to a long

The prototype of the strtol function is a little unusual:

long strtol(const char *start, char **end, int base);

The function reads the character string you pass it (start) and tries to convert it into a long, using the base you specify (usually 10, since we typically use digits from 0 to 9). It returns the number it successfully converted.

As for the end pointer-to-pointer, the function uses it to return the address of the first character it didn't convert. But we won't need that, so we can just pass NULL to say "I don't care."
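
If you ever do need it, here is a small sketch of what end gives you. restAfterNumber is a hypothetical helper that hands back the part of the string strtol did not convert.

```c
#include <stdlib.h>

/* Hypothetical helper: converts the start of `text` to a long (stored in
   *value) and returns a pointer to the first unconverted character. */
const char *restAfterNumber(const char *text, long *value)
{
    char *end = NULL;

    *value = strtol(text, &end, 10);
    return end; /* equals `text` itself if no digits were found */
}
```

For "147 Big Macs", value receives 147 and the returned pointer designates " Big Macs".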

The string must start with a number (spaces and a + or - sign in front of it are fine); anything after the number is ignored.

Let's look at some examples to get a feel for how it works:

long i;
i = strtol("147", NULL, 10);                      // i = 147
i = strtol("147.215", NULL, 10);                  // i = 147
i = strtol("   147.215", NULL, 10);               // i = 147 (leading spaces are skipped)
i = strtol("147+34", NULL, 10);                   // i = 147
i = strtol("147 Big Macs", NULL, 10);             // i = 147
i = strtol("There are 147 werewolves", NULL, 10); // i = 0 (error: doesn't start with a number)

Any string that starts with a number (or spaces followed by a number) will be converted to a long — up to the first invalid character (like . or +). But if the string doesn't begin with a number at all, conversion fails and strtol simply returns 0.
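
Since 0 is also a perfectly valid number, the return value alone cannot tell "0" apart from "not a number". Here is a sketch of a stricter conversion, assuming you want to reject trailing garbage and values too big for a long. parseLong is a hypothetical helper; it relies on the end pointer and on errno, which strtol sets to ERANGE when the value is out of range.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: stores the converted number in *result and returns 1
   only if `text` is a whole number (surrounding spaces allowed) that fits
   in a long. Returns 0 otherwise and leaves *result untouched. */
int parseLong(const char *text, long *result)
{
    char *end = NULL;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text)
    {
        return 0; /* no digits at all */
    }
    if (errno == ERANGE)
    {
        return 0; /* number too large or too small for a long */
    }
    /* Skip trailing spaces, then require the end of the string */
    while (isspace((unsigned char)*end))
    {
        end++;
    }
    if (*end != '\0')
    {
        return 0; /* something other than spaces follows the number */
    }
    *result = value;
    return 1;
}
```

Whether "147 Big Macs" should be accepted as 147 or rejected is a design choice; this sketch rejects it.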

Let's write a readLong function that calls our earlier read (which reads a string), then converts that string into a number:

long readLong(void)
{
    char numberText[100] = {0}; // 100 characters should be plenty
    if (read(numberText, 100))
    {
        // If reading went well, convert to long and return the result
        return strtol(numberText, NULL, 10);
    }
    else
    {
        // If reading failed, return 0
        return 0;
    }
}

You can test this with a simple main function:

int main(int argc, char *argv[])
{
    long age = 0;
    printf("What's your age? ");
    age = readLong();
    printf("Ah! So you're %ld years old!\n\n", age); // %ld because age is a long
    return 0;
}

Output:

What's your age? 17  
Ah! So you're 17 years old!

strtod: Convert a String to a double

The strtod function works just like strtol, except it reads decimal numbers and returns a double:

double strtod(const char *start, char **end);

You'll notice the third base parameter is gone, but we still have to deal with the end pointer — even if we don't use it.
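
To get a feel for strtod without giving away the challenge below, here is a sketch that uses the end pointer to check that a number was actually found. parseDouble is a hypothetical helper, and it assumes the default "C" locale, where the decimal separator is a dot.

```c
#include <stdlib.h>

/* Hypothetical helper: converts the start of `text` to a double.
   Returns 1 and stores the value in *result if a number was found,
   0 otherwise. Anything after the number is ignored, as with strtol. */
int parseDouble(const char *text, double *result)
{
    char *end = NULL;
    double value = strtod(text, &end);

    if (end == text)
    {
        return 0; /* the string does not start with a number */
    }
    *result = value;
    return 1;
}
```

Unlike strtol, strtod also understands exponents, so a string such as "1e3" is converted to 1000.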

Challenge: Try writing a readDouble function. It should be just like readLong, except this time you call strtod and return a double. Simple!

With that done, you could write something like this:

How much do you weigh? 167.9  
Ah! So you weigh 167.900000 lbs!

This last chapter of part IV of the C tutorial might have been a bit tougher, but it showed you how to safely capture user input and convert it into numbers using solid, reliable techniques. I strongly recommend using these kinds of functions in your own projects rather than relying on scanf, especially for security reasons. (It's way too easy to mess up your memory with scanf, as we've seen.)
