sscanf, scanf, and fscanf in C: regular-expression-style usage

  • 2020-06-03 08:00:58
  • OfStack

Languages differ in how well they support regular expressions. In C, the three formatted-input functions offer only limited, regex-like matching, but they are still worth understanding.

First, take a look at their prototypes:


#include <stdio.h>
int scanf(const char *format, ...);
int fscanf(FILE *stream, const char *format, ...);
int sscanf(const char *str, const char *format, ...);

sscanf is similar to scanf, except that scanf uses standard input (stdin) as the input source while sscanf reads from a string. The most important part is the format parameter, which consists of one or more of: {%[*] [width] [{h | l | I64 | L}]type | ' ' | '\t' | '\n' | non-% symbol}.

Parameter interpretation:

1. An asterisk (*) in a conversion (e.g. %*d or %*s) means the matched data is skipped and not stored.

2. {a|b|c} means choose one of a, b, or c; [d] means d is optional.

3. width is the maximum number of characters to read for the conversion.

4. {h | l | I64 | L} is the size modifier: h for short, l for long (or double with floating-point conversions), L for long double, and I64 for a 64-bit integer (a Microsoft-specific extension).

5. type is the conversion character, such as %s or %d.

6. Special case: %*[width][{h | l | I64 | L}]type matches and discards the input; no value is written to a target argument.
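The modifiers listed above can be sketched in a small example. The helper names first_three_digits and parse_fields are hypothetical and exist only for illustration; the sketch assumes well-formed, whitespace-separated input.

```c
#include <stdio.h>

/* Hypothetical helper: "%3d" reads at most three characters as an int. */
int first_three_digits(const char *text, int *out)
{
    return sscanf(text, "%3d", out);
}

/* Hypothetical helper: parses "<id> <short> <long> <double>".
 *   %*d : read an int and discard it (not counted in the return value)
 *   %hd : store into a short
 *   %ld : store into a long
 *   %lf : store into a double
 * Returns the number of items stored (3 on success). */
int parse_fields(const char *text, short *s, long *l, double *d)
{
    return sscanf(text, "%*d %hd %ld %lf", s, l, d);
}
```

Calling first_three_digits("123456", &n) stores 123, and parse_fields("7 12 100000 2.5", &s, &l, &d) skips the 7 and stores the remaining three values.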

Supported set operations:

%[a-z] matches any characters from a to z, greedily (as many as possible).
%[aB'] matches any of the characters a, B, or ', greedily.
%[^a] matches any characters other than a, greedily.
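A minimal sketch of the three set forms follows. The function names are hypothetical, and the width 31 is an added assumption so that a 32-byte buffer cannot overflow.

```c
#include <stdio.h>

/* Reads the leading run of lowercase letters; out must hold 32 bytes. */
int leading_lowercase(const char *text, char out[32])
{
    return sscanf(text, "%31[a-z]", out);
}

/* Reads the leading run made up only of the characters a, B and '. */
int leading_aBquote(const char *text, char out[32])
{
    return sscanf(text, "%31[aB']", out);
}

/* Reads everything up to (not including) the first 'a'. */
int up_to_a(const char *text, char out[32])
{
    return sscanf(text, "%31[^a]", out);
}
```

Each function returns 1 when at least one character matched and 0 when the very first character is outside the set, in which case out is not written.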

The return value

These three functions return the number of input items that were successfully matched and assigned. This number can be smaller than the number of conversions in the format string, because some of them may fail to match. If the very first match fails, the return value is 0. EOF is returned if the end of the input is reached before the first conversion, and also if a read error occurs. In the error case you can inspect errno for the error code.
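The return values described above can be observed with a short sketch. The wrapper read_three is hypothetical; it simply forwards whatever sscanf returns.

```c
#include <stdio.h>

/* Tries to read three comma-separated ints and returns sscanf's result:
 * 3 if all matched, 1 or 2 on a partial match, 0 if the first
 * conversion fails to match, EOF if the input ends before any conversion. */
int read_three(const char *text, int *a, int *b, int *c)
{
    return sscanf(text, "%d,%d,%d", a, b, c);
}
```

With this wrapper, "1,2,3" gives 3, "1,2,x" gives 2, "x" gives 0, and the empty string gives EOF. Comparing the result against the expected count is usually the safest check.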

Using fscanf to detect the end of a file is risky: if a match fails, the offending input stays in the stream, every subsequent call fails the same way, and the return value is 0 rather than EOF, so a loop that waits for EOF never terminates. The scanf functions read data into a buffer first and then parse from that buffer.
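One way to avoid the endless loop is to treat a return value of 0 separately and discard the token that failed to match. The following is only a sketch under that assumption; sum_ints is a hypothetical helper, and skipping the bad token with "%*s" is one possible recovery policy among several.

```c
#include <stdio.h>

/* Sums every integer in the stream and skips tokens that are not integers. */
long sum_ints(FILE *fp)
{
    long total = 0;
    int value;
    int rc;

    while ((rc = fscanf(fp, "%d", &value)) != EOF) {
        if (rc == 1) {
            total += value;
        } else {
            /* Matching failure: the bad token is still in the stream.
             * Discard it, otherwise the loop would never reach EOF. */
            fscanf(fp, "%*s");
        }
    }
    return total;
}
```

For a file containing "1 2 abc 3", the loop adds 1 and 2, skips abc, adds 3, and then stops at EOF.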

Note: For most conversions the scanf family of functions skips leading whitespace; %c, %[ and %n are the exceptions.
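The whitespace behaviour can be checked with a small sketch. Both function names are hypothetical; the only difference between them is the space in the format string.

```c
#include <stdio.h>

/* "%d%c": %d skips leading whitespace, but %c takes the very next
 * character after the number, even if it is a space. */
int char_after_int_raw(const char *text, char *ch)
{
    int n;
    return sscanf(text, "%d%c", &n, ch);
}

/* "%d %c": the space in the format skips any whitespace before %c. */
int char_after_int_skip(const char *text, char *ch)
{
    int n;
    return sscanf(text, "%d %c", &n, ch);
}
```

For the input "  42  x", the first function stores a space in ch and the second stores x.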

Regex-like usage of sscanf/scanf

The usage of %[]

%[] reads a run of characters from the given set; if the first character after [ is ^, the set is negated.

The set inside [] can contain one or more characters. An empty set (%[]) is invalid and leads to unpredictable results. %[^] is also invalid.

%[a-z] reads characters in the range a-z and stops at the first character outside that range, as in


char s[]="hello, my friend";  // Note: the comma is not in the range a-z
char string[32] = "";
sscanf(s, "%[a-z]", string);  // string=hello

%[^a-z] reads characters that are not in the range a-z and stops at the first character that is in that range, as in


char s[]="HELLOkitty";  // Note: the first lowercase letter, k, is in the range a-z
sscanf(s, "%[^a-z]", string);  // string=HELLO

%*[^=] has a * after the %, which means the matched string is not stored in a variable; the matching characters are simply skipped.


char s[]="notepad=1.0.0.1001";
char szfilename[32] = "";
int i = sscanf(s, "%*[^=]", szfilename);    // i=0, szfilename is still "" because nothing was stored
int j = sscanf(s, "%*[^=]=%s", szfilename); // j=1, szfilename=1.0.0.1001

%40c reads 40 characters

The run-time library does not automatically append a null terminator to the characters read, and scanf() does not return until it has read all 40 characters. Because the library uses buffered input, you must press ENTER before the typed characters reach scanf(). If you press ENTER before 40 characters have been entered, the newline is read as an ordinary character and the library keeps waiting for more input until it has read 40 characters.
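Because %Nc does not terminate the result, the caller has to do it. The sketch below uses a width of 5 instead of 40 to keep it short; read_five is a hypothetical helper, and the sketch assumes the input holds at least five characters.

```c
#include <stdio.h>

/* Copies exactly 5 characters (spaces included) into out, which must
 * hold 6 bytes, and appends the terminator by hand. */
int read_five(const char *text, char out[6])
{
    int rc = sscanf(text, "%5c", out);
    if (rc == 1) {
        out[5] = '\0';  /* %c never writes a null terminator itself */
    }
    return rc;
}
```

Note that, unlike %s, the %c conversion does not skip or stop at whitespace, so "ab cd ef" yields "ab cd".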

%[^=] reads the string until it hits the '=' sign. More delimiter characters can be listed after the ^, such as:


char s[]="notepad=1.0.0.1001" ;
char szfilename [32] = "" ;
int i = sscanf( s, "%[^=]", szfilename ) ; // szfilename=notepad 

If the parameter format is: %[^=:], then you can also read notepad from notepad:1.0.0.1001
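A short sketch of the two-delimiter form follows. The name read_name is hypothetical, and the width 31 is an added assumption to protect a 32-byte buffer.

```c
#include <stdio.h>

/* Reads the name part up to the first '=' or ':'; name must hold 32 bytes. */
int read_name(const char *text, char name[32])
{
    return sscanf(text, "%31[^=:]", name);
}
```

Both "notepad=1.0.0.1001" and "notepad:1.0.0.1001" leave notepad in name.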

Use examples:


char s[]="notepad=1.0.0.1001" ;
char szname [32] = "" ;
char szver [32] =  ""  ;
sscanf( s, "%[^=]=%s", szname , szver ) ; // szname=notepad, szver=1.0.0.1001
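The same split can be written a little more defensively by bounding each conversion and checking the return value. This is a sketch rather than the only way to do it; parse_key_value is a hypothetical name and the 32-byte buffers mirror the example above.

```c
#include <stdio.h>

/* Splits "name=value" into key and value (32 bytes each).
 * Returns 1 only if both parts were read, 0 otherwise. */
int parse_key_value(const char *line, char key[32], char value[32])
{
    /* The widths keep long input from overflowing the buffers. */
    return sscanf(line, "%31[^=]=%31s", key, value) == 2;
}
```

A line without '=' or with an empty name makes the function return 0 instead of leaving the caller with half-filled buffers.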

Conclusion: %[] is powerful, but it is not used very often, mainly because:

1. Many systems have bugs in the scanf functions (a typical example is TC mishandling floating-point input).
2. The usage is complex and error-prone.
3. The format string is hard for the compiler to analyze, which affects the quality and execution efficiency of the object code.

Personally, I find point 3 the most serious. The more complex the function, the less efficiently it executes. Simple string parsing can be handled by hand.
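As an illustration of handling simple parsing by hand, here is a sketch that splits a name=value string with strchr and memcpy instead of a scanf format. The function split_at_equals and its parameters are hypothetical. Unlike %s, it keeps everything after the '=' including any spaces.

```c
#include <stddef.h>
#include <string.h>

/* Splits line at the first '=' without using the scanf family.
 * Returns 1 on success, 0 if there is no '=', the key is empty,
 * or either part does not fit into its buffer. */
int split_at_equals(const char *line,
                    char *key, size_t key_size,
                    char *value, size_t value_size)
{
    const char *eq = strchr(line, '=');
    size_t key_len, value_len;

    if (eq == NULL) {
        return 0;
    }
    key_len = (size_t)(eq - line);
    value_len = strlen(eq + 1);
    if (key_len == 0 || key_len >= key_size || value_len >= value_size) {
        return 0;
    }
    memcpy(key, line, key_len);
    key[key_len] = '\0';
    memcpy(value, eq + 1, value_len + 1);  /* +1 copies the terminator */
    return 1;
}
```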

Usage of and differences between scanf(), sscanf(), and fscanf() in C

scanf(), sscanf(), fscanf():
scanf() reads input from the console (keyboard);
sscanf() reads input from a string;
fscanf() reads input from a file.
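Since the three functions share the same format language, one format string can serve all three sources. The wrappers below are hypothetical names used only to show this side by side.

```c
#include <stdio.h>

/* Reads "a,b" from a string. */
int from_string(const char *text, int *a, int *b)
{
    return sscanf(text, "%d,%d", a, b);
}

/* Reads "a,b" from any open stream, such as a file. */
int from_file(FILE *fp, int *a, int *b)
{
    return fscanf(fp, "%d,%d", a, b);
}

/* scanf(fmt, ...) behaves like fscanf(stdin, fmt, ...). */
int from_stdin(int *a, int *b)
{
    return from_file(stdin, a, b);
}
```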
scanf
The scanf() function reads from stdin (standard input) according to the format string and stores the data in the remaining arguments.


#include <stdio.h>

int main()
{
  int a,b,c;
  printf("Input: a,b,c\n");
  scanf("%d,%d,%d",&a,&b,&c);  // the commas must appear literally in the input
  printf("a = %d b = %d c = %d\n",a,b,c);
  return 0;
}

sscanf
The sscanf() function is similar to scanf() in that both are used for input; the difference is that scanf() takes the keyboard (stdin) as the input source, while sscanf() reads from a fixed string buffer.

Usage:
%[] reads a run of characters from the given set; if the first character after [ is ^, the set is negated. The set inside [] can contain one or more characters. An empty set (%[]) is invalid and can lead to unpredictable results. %[^] is also invalid.

1. Common usage.

char buf[512];
sscanf("123456 ", "%s", buf); // buf is the array name; this stores 123456 in buf as a string
printf("%s\n", buf);
The result is: 123456

2. Takes a string of the specified length. As in the following example, take a string with a maximum length of 4 bytes.

sscanf("123456 ", "%4s", buf);
printf("%s\n", buf);
The result is: 1234

3. A string up to the specified character. As in the following example, take a string until a space is encountered.

sscanf("123456 abcdedf", "%[^ ]", buf);
printf("%s\n", buf);
The result is: 123456

4. Takes a string containing only the specified character set. As in the following example, take a string that contains only 1 through 9 and lowercase letters.

sscanf("123456abcdedfBCDEF", "%[1-9a-z]", buf);
printf("%s\n", buf);
The result is: 123456abcdedf
When the set is changed to uppercase letters: sscanf("123456abcdedfBCDEF","%[1-9A-Z]",buf);
printf("%s\n",buf);
The result is: 123456

5. A string up to the specified character set. As in the following example, take a string until an uppercase letter is encountered.

sscanf("123456abcdedfBCDEF", "%[^A-Z]", buf);
printf("%s\n", buf);
The result is: 123456abcdedf

6. Given the string iios/12DDWDFF@122, get the string between / and @.

First skip "iios/", then store the run of non-'@' characters in buf.
sscanf("iios/12DDWDFF@122", "%*[^/]/%[^@]", buf);
printf("%s\n", buf);
The result is: 12DDWDFF

7. Given the string "hello, world", keep only world.

(Note: the "," is followed by a space. %s stops at whitespace, and * discards the first string read.)
sscanf("hello, world", "%*s%s", buf);
printf("%s\n", buf);
The result is: world
%*s means the first string matched by %s is discarded, i.e. "hello," is filtered out.
If there is no space, %*s consumes the whole string, nothing is stored, and buf is left unchanged.
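The case without a space is easiest to detect through the return value. In this sketch second_word is a hypothetical helper, and the width 31 is an added assumption for a 32-byte buffer.

```c
#include <stdio.h>

/* Discards the first whitespace-delimited word and stores the second.
 * Returns 1 if a second word was stored; any other value means
 * out was not written. */
int second_word(const char *text, char out[32])
{
    return sscanf(text, "%*s%31s", out);
}
```

With "hello, world" the function returns 1 and out holds world. With "hello,world" the whole input is consumed by %*s, the return value is not 1, and out keeps whatever it held before the call.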

