A brief analysis of arrays and character arrays in C language

  • 2020-05-05 11:37:21
  • OfStack

Let's write a program to count the occurrences of each digit, of white-space characters (spaces, tabs, and newlines), and of all other characters. The program is artificial, but it lets us illustrate several aspects of C in one place.

The input falls into twelve categories: the ten digits, white space, and everything else. It is therefore convenient to hold the number of occurrences of each digit in an array rather than in ten separate variables. Here is one version of the program:


#include <stdio.h>

/* count digits, white space, others */
int main(void)
{
    int c, i, nwhite, nother;
    int ndigit[10];     /* ndigit[d] counts occurrences of digit d */

    nwhite = nother = 0;
    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;

    while ((c = getchar()) != EOF)
        if (c >= '0' && c <= '9')
            ++ndigit[c-'0'];
        else if (c == ' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;

    printf("digits =");
    for (i = 0; i < 10; ++i)
        printf(" %d", ndigit[i]);
    printf(", white space = %d, other = %d\n", nwhite, nother);
    return 0;
}

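When the original listing of this program is used as its own input, the output is:

digits = 9 3 0 0 0 0 0 0 0 1, white space = 123, other = 345

The ten counts are separated by spaces, one for each digit from 0 to 9. The exact figures depend on how the source text is laid out.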
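To try the counting logic without feeding a file to the program, the same loop can be run over a string. The following is only a sketch: the struct counts type and the count_chars function are hypothetical names introduced here and are not part of the original program.

```c
/* Hypothetical helper type: holds the twelve counters in one value. */
struct counts {
    int ndigit[10];   /* occurrences of '0' through '9' */
    int nwhite;       /* spaces, tabs, and newlines */
    int nother;       /* everything else */
};

/* count_chars: classify every character of the string s
   (hypothetical name; same tests as the program above) */
struct counts count_chars(const char s[])
{
    struct counts k = {{0}, 0, 0};   /* all counters start at zero */
    int i, c;

    for (i = 0; (c = s[i]) != '\0'; ++i)
        if (c >= '0' && c <= '9')
            ++k.ndigit[c - '0'];
        else if (c == ' ' || c == '\n' || c == '\t')
            ++k.nwhite;
        else
            ++k.nother;
    return k;
}
```

For example, count_chars("a1 b22\tc\n") counts one '1', two '2's, three white-space characters, and three other characters.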

The declaration int ndigit[10]; declares ndigit to be an array of 10 integers. Array subscripts always start at zero in C, so the elements are ndigit[0], ndigit[1], ..., ndigit[9]. This is reflected in the two for loops that initialize and print the array.

Array subscripts can be any integer expression, including integer variables (such as i) and integer constants.
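As a small illustration of subscripts that are expressions rather than plain variables, here is a sketch that reverses an array in place. The function name reverse_ints is hypothetical and does not come from the original program.

```c
/* reverse_ints: reverse the n elements of a in place
   (hypothetical helper, not part of the original program) */
void reverse_ints(int a[], int n)
{
    int i, tmp;

    for (i = 0; i < n / 2; ++i) {
        tmp = a[i];               /* subscript is a variable */
        a[i] = a[n - 1 - i];      /* subscript is an arithmetic expression */
        a[n - 1 - i] = tmp;
    }
}
```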

This particular program relies on the properties of the character representation of the digits. For example, the test if (c >= '0' && c <= '9') determines whether the character in c is a digit. If it is, the numeric value of that digit is c - '0'. This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets, and the C standard requires it.

By definition, chars are just small integers, so char variables and constants behave like ints in arithmetic expressions. This is natural and convenient; for example, c - '0' is an integer expression with a value between 0 and 9 when the character stored in c is '0' through '9', so it is a valid subscript for the array ndigit.
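The same c - '0' arithmetic is what turns a run of digit characters into a number. The sketch below shows one way to do that; digit_value and leading_number are hypothetical names chosen for this illustration, and overflow is deliberately not checked.

```c
/* digit_value: numeric value of the digit character c, or -1 if c
   is not a digit (hypothetical helper) */
int digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';     /* '0' maps to 0, '1' to 1, ..., '9' to 9 */
    return -1;
}

/* leading_number: convert the leading digits of s to an int;
   returns 0 if s does not start with a digit; overflow is not checked */
int leading_number(const char s[])
{
    int i, n;

    n = 0;
    for (i = 0; digit_value(s[i]) >= 0; ++i)
        n = 10 * n + digit_value(s[i]);
    return n;
}
```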

The decision as to whether a character is a digit, white space, or something else is made with the following sequence of statements:


if (c >= '0' && c <= '9')
 ++ndigit[c-'0'];
else if (c == ' ' || c == '\n' || c == '\t')
 ++nwhite;
else
 ++nother;

Multi-way decisions are often expressed with the following pattern:

if (condition 1)
  statement 1
else if (condition 2)
  statement 2
  ...
  ...
else
  statement n

The conditions are evaluated in order from the top until one is satisfied; at that point the corresponding statement is executed, and the entire construction is finished. (Any statement can be several statements enclosed in braces.) If none of the conditions is satisfied, the statement after the final else is executed, if it is present. If the final else and its statement are omitted, as in the word-counting program, no action takes place. There can be any number of groups of the following form between the initial if and the final else:
else if (condition)
  statement
As a matter of style, it is advisable to format this construction as shown here. If each if were indented past the previous else, a long sequence of decisions would march off the right side of the page.
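Because the conditions are tested in order, each later branch can rely on all earlier conditions having failed. The sketch below makes that explicit; length_class is a hypothetical function, and the boundaries 40 and 80 are arbitrary values picked for the illustration.

```c
/* length_class: sort a line length into one of four classes.
   The class boundaries (40 and 80) are arbitrary values chosen
   for this sketch. */
int length_class(int len)
{
    int kind;

    if (len <= 0)
        kind = 0;       /* empty */
    else if (len < 40)
        kind = 1;       /* reached only when len > 0 */
    else if (len < 80)
        kind = 2;       /* reached only when len >= 40 */
    else
        kind = 3;       /* none of the conditions held */
    return kind;
}
```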

Character arrays

The most common type of array in C is the array of characters. To illustrate the use of character arrays and the functions that manipulate them, let's write a program that reads a set of text lines and prints the longest. The outline is simple enough:

while (there is another line)
  if (it is longer than the previous longest)
    save it
    save its length
print the longest line

This outline makes it clear that the program divides naturally into pieces: one piece gets a new line, another tests it, another saves it, and the rest controls the process.
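The outline can be tried on lines that are already in memory before dealing with input at all. The following is a sketch under that assumption: longest_index is a hypothetical helper that takes an array of strings, and it remembers the position of the longest line instead of copying it.

```c
#include <string.h>

/* longest_index: return the index of the longest of the n strings in
   lines, or -1 if n is 0 (hypothetical helper; the real program reads
   its lines from the input instead of an array) */
int longest_index(const char *lines[], int n)
{
    int i, len, max, best;

    max = 0;
    best = -1;
    for (i = 0; i < n; ++i) {             /* while there is another line */
        len = (int) strlen(lines[i]);
        if (len > max || best < 0) {      /* longer than the longest so far */
            max = len;                    /* save its length */
            best = i;                     /* remember which line it was */
        }
    }
    return best;
}
```

When two lines tie for the greatest length, this sketch keeps the first one, because the comparison is a strict "greater than".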

Since things divide so neatly, it makes sense to write the program that way too. First, let's write a separate function, getline, to fetch the next line of input. We will try to make the function useful in other contexts. At a minimum, getline has to return a signal about a possible end of file; a more useful design is to return the length of the line, or zero if end of file is encountered. Zero is an acceptable end-of-file return because it is never a valid line length: every text line has at least one character, and even a line containing only a newline has length 1.

When we find a line that is longer than the previous longest line, it must be saved somewhere. This suggests a second function, copy, to copy the new line to a safe place.

Finally, we need a main program to control getline and copy. Here is the result:


#include <stdio.h>
#define MAXLINE 1000    /* maximum input line length */

/* Note: on POSIX systems <stdio.h> declares its own getline; rename
   this function (and its calls) if the compiler reports a conflict. */
int getline(char line[], int maxline);
void copy(char to[], char from[]);

/* print the longest input line */
int main(void)
{
    int len;                /* current line length */
    int max;                /* maximum length seen so far */
    char line[MAXLINE];     /* current input line */
    char longest[MAXLINE];  /* longest line saved here */

    max = 0;
    while ((len = getline(line, MAXLINE)) > 0)
        if (len > max) {
            max = len;
            copy(longest, line);
        }
    if (max > 0)    /* there was a line */
        printf("%s", longest);
    return 0;
}

/* getline: read a line into s, return length */
int getline(char s[], int lim)
{
    int c, i;

    for (i = 0; i < lim-1 && (c = getchar()) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}

/* copy: copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
    int i;

    i = 0;
    while ((to[i] = from[i]) != '\0')
        ++i;
}

The functions getline and copy are declared at the beginning of the program, which we assume is contained in a single file.

main and getline communicate through a pair of arguments and a return value. In getline, the arguments are declared by the line


int getline(char s[], int lim)

which specifies that the first argument, s, is an array, and the second, lim, is an integer. The purpose of supplying the size of an array in a declaration is to set aside storage. The length of the array s need not be given in getline, since its size is set in main. Like the power function, getline uses a return statement to send a value back to its caller. This line also declares that getline returns an int. In the original (pre-C99) language, int was the default return type and could be omitted; C99 and later require the return type to be written out.
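To see that an array parameter refers to storage owned by the caller, and that the limit has to be passed separately, consider the following sketch. fill_line is a hypothetical function written for this illustration, and it assumes lim is at least 1.

```c
/* fill_line: store n copies of ch in s, followed by '\0'.
   At most lim-1 characters are stored so the terminator always fits;
   assumes lim >= 1.  Returns the number of characters stored
   (hypothetical helper). */
int fill_line(char s[], int lim, char ch, int n)
{
    int i;

    for (i = 0; i < lim - 1 && i < n; ++i)
        s[i] = ch;          /* writes into the caller's array */
    s[i] = '\0';
    return i;
}
```

The function never sees how large the caller's array really is; it only knows the limit it was given, which is why getline takes lim as a second argument.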

Some functions return useful values, while others (such as copy) are only used to perform actions and do not return values. The return value type of the copy function is void, which explicitly states that the function does not return any value.

getline puts the character '\0' (the null character, whose value is zero) at the end of the array it is creating, to mark the end of the string of characters. This convention is also used by the C language itself: when a string constant such as


"hello\n"

appears in a C program, it is stored as an array of characters containing the characters of the string and terminated with a '\0' to mark the end.
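The terminating '\0' is what lets a function find the end of a string without being told its length. Here is a minimal sketch; string_length is a hypothetical name, and the standard library already provides strlen for the same job.

```c
/* string_length: count the characters of s up to, but not including,
   the terminating '\0' (hypothetical name; the standard library
   provides strlen for this) */
int string_length(const char s[])
{
    int i;

    i = 0;
    while (s[i] != '\0')
        ++i;
    return i;
}
```

For the constant "hello\n", string_length returns 6, while sizeof "hello\n" is 7 because the stored array also holds the '\0'.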

The %s format specification in printf expects the corresponding argument to be a string represented in this form. copy also relies on the fact that its input argument is terminated with '\0', and it copies this character into the output argument. All of this implies that '\0' is not part of normal text.

It is worth mentioning in passing that even a program as small as this one presents some sticky design problems. For example, what should main do if it encounters a line that is longer than its limit? getline works safely, in that it stops collecting characters when the array is full, even if no newline has been seen. By testing the length and the last character returned, main can determine whether the line was too long and then cope as it wishes. In the interest of brevity, we have ignored the issue here.
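The test on the length and the last character could look like the sketch below. line_truncated is a hypothetical helper, not something the original program defines; what main should do when it returns 1 is left open, as in the text.

```c
/* line_truncated: return 1 if the len characters in s do not end with
   a newline, 0 otherwise (hypothetical helper).  A getline-style
   reader yields such a line when the buffer filled up first; note
   that a final line with no newline before end of file looks the same. */
int line_truncated(const char s[], int len)
{
    return len > 0 && s[len - 1] != '\n';
}
```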

A caller of getline has no way of knowing in advance how long an input line might be, so getline checks for overflow. A caller of copy, on the other hand, already knows (or can find out) how long the strings are, so copy does no error checking.
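If a caller cannot guarantee that the destination is large enough, a checked variant of copy is easy to sketch by borrowing the limit test from getline. copy_bounded is a hypothetical name; it is a variation for illustration, not the copy function used in the program above, and it assumes lim is at least 1.

```c
/* copy_bounded: copy 'from' into 'to', storing at most lim-1 characters
   plus the terminating '\0'; assumes lim >= 1.  Returns the number of
   characters copied, not counting the '\0' (hypothetical variant of copy). */
int copy_bounded(char to[], const char from[], int lim)
{
    int i;

    for (i = 0; i < lim - 1 && from[i] != '\0'; ++i)
        to[i] = from[i];
    to[i] = '\0';
    return i;
}
```

A return value smaller than the length of the source string tells the caller that the copy was cut short.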

