Strings in the C Language
Overview
C doesn’t have a dedicated string type. Instead, strings are treated as arrays of characters (char arrays). For example, the string “Hello” is processed as the array {‘H’, ‘e’, ‘l’, ‘l’, ‘o’}.
The compiler allocates a contiguous block of memory for the array, with all characters stored in adjacent memory units. C automatically adds a null byte (‘\0’) at the end of the string to indicate its termination. This null character is different from the character ‘0’. The null character has an ASCII code of 0 (binary 00000000), while ‘0’ has an ASCII code of 48 (binary 00110000). So, the string “Hello” is actually stored as {‘H’, ‘e’, ‘l’, ‘l’, ‘o’, ‘\0’}.
Every string ends with the null character ‘\0’. This allows C to read strings from memory without knowing their length in advance - it simply reads characters until it encounters ‘\0’.
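To make the stored bytes visible, here is a minimal sketch that walks a string the way the paragraph above describes, stopping only at the terminator. The helper name print_bytes is hypothetical, and the codes mentioned in the note assume an ASCII system.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints the numeric code of every byte in s,
   the terminating '\0' included, and returns how many bytes it printed. */
int print_bytes(const char *s) {
    assert(s != NULL);              /* s must point to a terminated string */
    int count = 0;
    do {
        printf("%d ", (unsigned char)s[count]);
    } while (s[count++] != '\0');   /* stop once the terminator was printed */
    printf("\n");
    return count;
}
```

On an ASCII system, print_bytes("Hello") prints 72 101 108 108 111 0 and returns 6, while print_bytes("0") prints 48 0, which shows that the digit character and the terminator are different bytes.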
For example:
    char localString[10];
This declares a character array with 10 elements, which can be used as a string. Since one position must be reserved for ‘\0’, it can hold a string of up to 9 characters.
Writing strings as arrays can be cumbersome. C provides a shorthand: characters within double quotes are automatically treated as character arrays.
    {'H', 'e', 'l', 'l', 'o', '\0'}
    "Hello"
Both representations are equivalent and stored the same way internally. With double quotes, you don’t need to add ‘\0’ yourself - C does it automatically.
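Note: Double quotes denote strings, while single quotes denote characters. They’re not interchangeable. Putting “Hello” in single quotes produces a multi-character constant rather than a string, and the compiler will warn about it or reject it when it is used as a string.
    'Hello'   // Error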
Even a single character in double quotes (like “a”) is treated as a string (stored as 2 bytes), not as a character ‘a’ (stored as 1 byte).
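The size difference can be checked at compile time. The sketch below assumes a C11 compiler (for static_assert); the function name char_variable_size is made up for this illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* "a" is an array of two chars: the letter and the terminator. */
static_assert(sizeof "a" == 2, "string literal a takes two bytes");
static_assert(sizeof(char) == 1, "a char always takes one byte");

/* Returns the storage size of a char variable initialized with 'a'. */
size_t char_variable_size(void) {
    char c = 'a';
    return sizeof c;
}
```

One caveat: in C the constant 'a' itself has type int, so sizeof 'a' equals sizeof(int). The one-byte figure applies once the value is stored in a char.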
If a string contains double quotes, they need to be escaped with a backslash:
    "She replied, \"It does.\""
Backslashes can also represent other special characters, like newlines (\n) or tabs (\t):
    "Hello, world!\n"
For long strings, you can use a backslash (\) at the end of a line to split it across multiple lines:
    "hello \
    world"
This approach has a drawback: the second line must start at the leftmost column, or any indentation will be included in the string. To solve this, C allows concatenation of string literals if they’re adjacent or separated only by whitespace:
    char greeting[50] = "Hello, ""how are you ""today!";
This syntax also supports multi-line string concatenation:
    char greeting[50] = "Hello, "
      "how are you "
      "today!";
To print strings, use the %s format specifier with printf():
    printf("%s\n", "hello world");
String Variable Declaration
String variables can be declared either as character arrays or as pointers to char:
    // Method 1
    char s[14] = "Hello, world!";
    // Method 2
    char* s = "Hello, world!";
Both methods declare a string variable s. With the first method, you can let the compiler calculate the array length:
    char s[] = "Hello, world!";
The compiler will set the length of array s to 14, exactly fitting the string.
The array length can be larger than the actual string length:
    char s[50] = "hello";
Here, s has a length of 50, but “hello” only uses 6 positions (including ‘\0’). The remaining 44 positions are initialized to ‘\0’.
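The zero-filling of the unused positions can be verified with a short sketch. The function name count_zero_bytes is hypothetical, and static_assert assumes a C11 compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Counts the '\0' bytes in a 50-element array initialized with "hello". */
size_t count_zero_bytes(void) {
    char s[50] = "hello";
    static_assert(sizeof s == 50, "the array keeps its declared size");
    size_t zeros = 0;
    for (size_t i = 0; i < sizeof s; i++) {
        if (s[i] == '\0') {
            zeros++;
        }
    }
    return zeros;   /* the terminator plus the 44 unused positions */
}
```

The function returns 45: one byte for the string's own terminator and 44 for the positions the initializer did not cover.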
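However, the array must be large enough for the string and its null terminator:
    char s[5] = "hello";   // Problem: no room for '\0'
C accepts this particular initialization but silently drops the null terminator, so s is not a valid string and must not be passed to string functions. If the array is too small even for the characters themselves (for example, char s[4]), the compiler issues a diagnostic.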
While character pointers and character arrays are mostly equivalent for declaring string variables, there are two key differences:
- Strings declared as pointers refer to string literals, which must be treated as constants and cannot be modified:
    char* s = "Hello, world!";
    s[0] = 'z';   // Undefined behavior
Attempting this assignment leads to unpredictable results or errors at runtime.
With array declaration, you can modify any array element:
    char s[] = "Hello, world!";
    s[0] = 'z';
This difference arises because string literals are typically stored in a read-only memory area, and the C standard leaves the effect of modifying them undefined. When declared as a pointer, the variable stores an address pointing to this area. But when declared as an array, the compiler allocates separate memory and copies the characters there, allowing modification.
To emphasize that a string pointer is read-only, you can use the const keyword:
    const char* s = "Hello, world!";
- Pointer variables can be reassigned to other strings:
    char* s = "hello";
    s = "world";
But array variables cannot point to another string:
    char s[] = "hello";
    s = "world";   // Error
The array name always refers to its initial memory address and can’t be changed.
For the same reason, you can’t directly assign a string to a character array after declaration:
    char s[10];
    s = "abc";   // Error
To assign a new string to an array, use the strcpy() function:
    char s[10];
    strcpy(s, "abc");
This copies the string “abc” into s without changing s’s address.
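Returning to the first difference described in this section, the following sketch checks that an array initialized from a literal holds its own copy of the characters. The function name array_holds_private_copy is an assumption made for this illustration, and static_assert requires C11.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <string.h>

/* Returns true when modifying an array initialized from a literal
   leaves the literal itself untouched. */
bool array_holds_private_copy(void) {
    const char *ptr = "Hello, world!";   /* address of the literal */
    char arr[] = "Hello, world!";        /* characters copied into arr */
    static_assert(sizeof arr == 14, "13 characters plus the terminator");

    arr[0] = 'z';                        /* changes the copy only */
    return (const char *)arr != ptr      /* separate storage */
        && strcmp(arr, "zello, world!") == 0
        && strcmp(ptr, "Hello, world!") == 0;
}
```

Declaring ptr as const char* lets the compiler reject an accidental write through the pointer instead of leaving it to fail at runtime.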
strlen()
The strlen()
function returns the byte length of a string, excluding the null terminator (\0
). Its prototype is as follows:
1 | // string.h |
The parameter is a string variable, and it returns a value of type size_t
, which is an unsigned integer. In most cases, it can be treated as an int
, unless dealing with extremely long strings. Here’s an example of how to use it:
1 | char* str = "hello"; |
The prototype for strlen() is declared in the standard library header file string.h, so you need to include it:
    #include <string.h>
Note that string length (strlen()) and the size of the string variable (sizeof()) are two different concepts.
    char s[50] = "hello";
    printf("%zu\n", strlen(s));   // 5
    printf("%zu\n", sizeof(s));   // 50
In this example, the string length is 5, while the size of the string variable is 50 bytes.
If you choose not to use strlen(), you can manually calculate the string length by checking for the null terminator:
    int my_strlen(char *s) {
      int count = 0;
      while (s[count] != '\0')
        count++;
      return count;
    }
strcpy()
In C, to copy strings, you cannot use the assignment operator (=) directly with character arrays. For example:
    char str1[10];
    char str2[10];
    str1 = "abc";   // Error
    str2 = str1;    // Error
Both attempts result in errors because array variable names represent fixed memory addresses that cannot be reassigned.
For character pointers, using the assignment operator (=) merely copies the address:
    char* s1;
    char* s2;
    s1 = "abc";
    s2 = s1;
Here, s1 and s2 point to the same string; the string content is not copied.
C provides the strcpy() function to copy the content of one string to another, effectively functioning as a string assignment. The prototype is found in the string.h header file:
    char* strcpy(char dest[], const char source[]);
The first parameter is the destination string array, and the second is the source. Ensure that the destination is large enough to hold the source, including its null terminator; otherwise, the copy overflows the buffer, causing unpredictable behavior. The const qualifier indicates that the source string will not be modified.
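Here’s an example:
    char s[] = "Hello, world!";
    char t[100];
    strcpy(t, s);
    t[0] = 'z';   // s is still "Hello, world!"; t is now "zello, world!"
In this example, the contents of s are copied to t, resulting in two separate strings. Modifying one does not affect the other. Note that the uninitialized positions in t may contain random values.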
You can also use strcpy() to assign a string to a character array:
    char str[10];
    strcpy(str, "abcd");
The return value of strcpy() is a pointer to the first character of the destination string.
Another example:
    char* s1 = "beast";
    strcpy(s2 + 7, s1);   // s2 is a character array holding a longer string
This code copies “beast” into s2 starting at offset 7, altering the latter’s content.
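A sketch of this kind of partial overwrite, with the bounds checked first, might look as follows. The helper name overwrite_from and the sample sentence in the note are assumptions for illustration, not taken from the original example.

```c
#include <assert.h>
#include <string.h>

/* Overwrites buf from index offset onward with word.
   Returns what strcpy() returns, which is buf + offset. */
char *overwrite_from(char *buf, size_t buf_size, size_t offset,
                     const char *word) {
    assert(offset <= strlen(buf));                   /* start inside the string */
    assert(offset + strlen(word) + 1 <= buf_size);   /* room for word and '\0' */
    return strcpy(buf + offset, word);
}
```

With char s2[40] = "Be the best that you can be."; the call overwrite_from(s2, sizeof s2, 7, "beast") leaves s2 holding "Be the beast", because strcpy() also writes a terminator right after the copied word.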
strcpy() can also be nested to assign the same string to multiple arrays:
    strcpy(str1, strcpy(str2, "abcd"));
However, the first argument must refer to storage that has already been set aside, such as an array, not an uninitialized pointer:
    char* str;
    strcpy(str, "hello world");   // Error
This is problematic because str points to a random location.
If you prefer to implement your own string copy function, you can do so like this:
    char* my_strcpy(char* dest, const char* source) {
      char* ptr = dest;
      while (*dest++ = *source++);
      return ptr;
    }
The crucial line is while (*dest++ = *source++);, which copies each character, including the null terminator (\0), and stops once the terminator has been copied.
Finally, note that strcpy() poses security risks, as it does not check if the destination has enough space. If there’s a chance of overflow, consider using strncpy() instead.
strncpy()
The usage of strncpy() is similar to that of strcpy(), with the addition of a third parameter that specifies the maximum number of characters to copy. This helps prevent buffer overflow in the destination string.
    char* strncpy(char* dest, const char* src, size_t n);
In the prototype above, the third parameter n defines the maximum number of characters to copy. If the source string has n or more characters, copying stops after n characters, and the destination string will not have a null terminator (\0). This is an important consideration. If the source string has fewer characters than n, strncpy() copies it like strcpy() and then fills the rest of the n positions with \0.
Here’s an example:
    strncpy(str1, str2, sizeof(str1) - 1);
    str1[sizeof(str1) - 1] = '\0';
In this example, str2 is copied to str1, but only up to sizeof(str1) - 1 characters. The last position in str1 is reserved for the null terminator. Since strncpy() does not append a \0 when the source is too long, it’s necessary to add it manually.
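The copy-then-terminate pattern can be wrapped in a small helper. This is a sketch under the assumption that silent truncation is acceptable to the caller; the name copy_truncated is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dest (capacity dest_size), truncating if needed,
   and always leaves dest null-terminated. */
void copy_truncated(char *dest, size_t dest_size, const char *src) {
    assert(dest_size > 0);               /* need room for at least '\0' */
    strncpy(dest, src, dest_size - 1);   /* may leave dest unterminated */
    dest[dest_size - 1] = '\0';          /* so terminate it explicitly */
}
```

Copying "hello world" into a 6-byte buffer this way yields "hello". Copying a short string such as "hi" into an 8-byte buffer yields "hi" followed by zero bytes, since strncpy() pads the remainder.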
strncpy() can also be used to copy a portion of a string:
    char s1[40];
    char s2[12] = "hello world";
    strncpy(s1, s2, 5);
    s1[5] = '\0';
In this example, only the first 5 characters of s2 are copied to s1.
strcat()
The strcat() function is used to concatenate two strings. It takes two strings as parameters and appends a copy of the second string to the end of the first, modifying the first string in the process. The function prototype is defined in the string.h header file:
    char* strcat(char* s1, const char* s2);
The return value is a pointer to the first string.
For example:
    char s1[12] = "hello";
    char s2[6] = "world";
    strcat(s1, s2);
    puts(s1);   // helloworld
Note that the first string must have enough space to hold the concatenated result. If not, it may overflow, leading to undefined behavior. To avoid this, consider using strncat().
strncat()
The strncat() function is used to concatenate two strings, similar to strcat(), but it includes a third parameter that specifies the maximum number of characters to append. The function stops adding characters when it reaches the specified limit or encounters a null character (\0) in the source string. Its prototype is defined in the string.h header file:
    char* strncat(char* dest, const char* src, size_t n);
The strncat() function returns a pointer to the destination string.
To ensure that the concatenated string does not exceed the length of the destination buffer, strncat() is typically called like this:
    strncat(str1, str2, sizeof(str1) - strlen(str1) - 1);
strncat() automatically adds a null character (\0) at the end of the concatenated result. Therefore, the maximum value for the third parameter should be the size of str1 minus its current length, minus one. Here’s an example:
    char s1[10] = "Monday";
    char s2[8] = "Tuesday";
    strncat(s1, s2, 3);
    puts(s1);   // MondayTue
In this example, s1 has a total size of 10 and a current length of 6. The difference, minus one, is 3, meaning s1 can safely add three more characters. Thus, the result is “MondayTue”.
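Rather than working out the limit by hand, the remaining space can be computed at the call site. The sketch below assumes the caller knows the destination's total capacity; the helper name append_bounded is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Appends as much of src as fits into dest (capacity dest_size).
   dest must already hold a null-terminated string. */
void append_bounded(char *dest, size_t dest_size, const char *src) {
    size_t used = strlen(dest);
    assert(used < dest_size);                    /* dest is a valid string */
    strncat(dest, src, dest_size - used - 1);    /* leave room for '\0' */
}
```

Called with a 10-byte array holding "Monday" and the source "Tuesday", it computes a limit of 3 and produces "MondayTue", matching the manual calculation.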
strcmp()
You can’t compare the contents of two strings with the == operator, which would only compare their addresses; instead, the strings have to be compared character by character. The C language provides the strcmp() function for this purpose.
The strcmp() function is used to compare the contents of two strings. Its prototype is defined in the string.h header file:
    int strcmp(const char* s1, const char* s2);
The function compares the strings in lexicographical order. It returns:
- 0 if the strings are equal,
- a value less than 0 if s1 is less than s2,
- a value greater than 0 if s1 is greater than s2.
Here’s a usage example:
    // s1 = "Happy New Year"
    strcmp(s1, "Happy New Year");   // 0
Note that strcmp() is intended solely for comparing strings, not individual characters. Since characters are essentially small integers, you can compare them directly using the equality operator (==). Therefore, do not pass character values to strcmp().
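Because only the sign of the result is specified, code should test it against 0 rather than against particular values such as 1 or -1. The sketch below normalizes the result to make that explicit; the helper name compare_sign and the second sample string in the note are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

/* Reduces strcmp()'s result to -1, 0, or 1. */
int compare_sign(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

For instance, compare_sign("Happy New Year", "Happy Holidays") is 1 because the strings first differ at 'N' versus 'H', and swapping the arguments gives -1.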
strncmp()
strncmp() is a variation of strcmp() that allows comparison of only a specified number of characters from two strings. It includes a third parameter, n, which defines the maximum number of characters to compare. The function prototype is defined in the string.h header file:
    int strncmp(const char* s1, const char* s2, size_t n);
The return value is similar to strcmp(): it returns 0 if the compared characters are identical, a negative value if s1 is less than s2, and a positive value if s1 is greater than s2.
Here’s an example:
    char s1[12] = "hello world";
    strncmp(s1, "hello C", 5);   // 0: the first 5 characters match
In this example, only the first 5 characters of the two strings are compared.
sprintf(), snprintf()
The sprintf() and snprintf() functions are used in C to format strings. While sprintf() writes formatted data to a string, snprintf() adds an additional layer of safety by preventing buffer overflows.
sprintf()
The prototype for sprintf() is defined in the stdio.h header file:
    int sprintf(char* s, const char* format, ...);
- The first parameter is a pointer to the output string.
- The second parameter is a format string, followed by any additional variables to be formatted.
Example:
    char first[6] = "hello";
    char last[6] = "world";
    char s[40];
    sprintf(s, "%s %s", first, last);   // s is now "hello world"
In this example, sprintf() combines the strings “hello” and “world” into the variable s. The function returns the number of characters written (excluding the null terminator). If an error occurs, it returns a negative value.
However, sprintf() poses a significant security risk: if the formatted output exceeds the allocated buffer size, it overflows the buffer.
snprintf()
To mitigate this risk, C provides snprintf(), which has an additional parameter to specify the maximum number of characters to write:
    int snprintf(char* s, size_t n, const char* format, ...);
- The second parameter n limits the output to n - 1 characters, leaving space for the null terminator.
Example:
    snprintf(s, 12, "%s %s", "hello", "world");
In this case, snprintf() will ensure that at most 11 characters are written, plus the null terminator. The function always writes the null terminator (as long as n is greater than 0), and if the formatted output needs n or more characters, it writes only the first n - 1 of them, ensuring safety.
The return value of snprintf() is similar to that of sprintf(): it indicates the number of characters that would have been written given unlimited space (excluding the null terminator), or a negative value on error. If the formatted string does not fit in n bytes, the return value is n or greater. To confirm a complete write, the return value should be non-negative and less than n.
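The return-value check described above can be sketched as follows. The function name join_words and its parameters are hypothetical; only snprintf() itself is part of the standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Formats "first last" into buf (capacity n). Returns true only when the
   whole result fit, that is, when snprintf() reported 0 <= written < n. */
bool join_words(char *buf, size_t n, const char *first, const char *last) {
    assert(buf != NULL && n > 0);
    int written = snprintf(buf, n, "%s %s", first, last);
    return written >= 0 && (size_t)written < n;
}
```

With a 12-byte buffer, joining "hello" and "world" succeeds. With an 8-byte buffer the function returns false and the buffer holds the truncated, still terminated string "hello w".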
String Array
A string array can be implemented using a two-dimensional character array when each element is a string. In this example, we define an array of strings representing the days of the week:
    char weekdays[7][10] = {
      "Monday", "Tuesday", "Wednesday", "Thursday",
      "Friday", "Saturday", "Sunday"
    };
Here, we have a string array with 7 elements, where the maximum length of any string, including the null terminator, is 10 characters.
Since the compiler can automatically determine the size of the first dimension, we can simplify the declaration:
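    char weekdays[][10] = {
      "Monday", "Tuesday", "Wednesday", "Thursday",
      "Friday", "Saturday", "Sunday"
    };
However, this approach pads every row to 10 bytes even though most strings are shorter. An alternative that avoids the fixed row width is an array of character pointers, where each string takes only the space it needs (plus one pointer per element):
    char* weekdays[] = {
      "Monday", "Tuesday", "Wednesday", "Thursday",
      "Friday", "Saturday", "Sunday"
    };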
In this case, we have a one-dimensional array of 7 pointers, each pointing to a string.
To iterate over this string array, you can use the following code:
    for (int i = 0; i < 7; i++) {
      printf("%s\n", weekdays[i]);
    }
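The loop bound does not have to be hard-coded. The sketch below derives the element count from the array itself and finds the longest name, which is what determines the second dimension of the two-dimensional version. The macro WEEKDAY_COUNT and the function longest_weekday_length are hypothetical names, and static_assert assumes C11.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <string.h>

static const char *const weekdays[] = {
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday"
};

/* Element count computed from the array instead of a literal 7. */
#define WEEKDAY_COUNT (sizeof weekdays / sizeof weekdays[0])
static_assert(WEEKDAY_COUNT == 7, "one entry per day of the week");

/* Returns the length of the longest name, terminator excluded. */
size_t longest_weekday_length(void) {
    size_t longest = 0;
    for (size_t i = 0; i < WEEKDAY_COUNT; i++) {
        size_t len = strlen(weekdays[i]);
        if (len > longest) {
            longest = len;
        }
    }
    return longest;
}
```

The function returns 9 (for "Wednesday"), which is why the two-dimensional declaration needs a row width of 10.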