
gawk

Linux Command – gawk is an important and very powerful command for Unix/Linux users. It can transform, search, and generate text data in whatever form the user needs, however complex. It was developed as an extension of awk.

 

Examples

Processing and searching a text file. The examples below use this sample file:

$ cat saixiii.txt
line,1,Hello world!
line,2,My name is saixiii
line,3,Good Bye
line,4,Bye

 

1. Print data by field

By default, whitespace separates the fields of each line. A field is referenced with $ followed by its number, and numbering starts at 1. To print the whole line, use $0.

$ gawk '{print $1}' saixiii.txt
line,1,Hello
line,2,My
line,3,Good
line,4,Bye
$ gawk '{print $0}' saixiii.txt
line,1,Hello world!
line,2,My name is saixiii
line,3,Good Bye
line,4,Bye

 

2. Specify the string used as the field separator

-F'{string}' sets the character (or string) used as the field delimiter.

$ gawk -F',' '{print $2}' saixiii.txt
1
2
3
4
$ gawk -F',' '{print $3}' saixiii.txt
Hello world!
My name is saixiii
Good Bye
Bye

 

3. Add a condition to filter the output

'{if (expression) {print ...}}'

$ gawk -F',' '{if($2%2==1) {print $0}}' saixiii.txt
line,1,Hello world!
line,3,Good Bye

 

4. Process the whole file and print a result at the end

Use a BEGIN block to initialize variables before any input is read, write the expressions to run on each line in the main block, and finally print the result in the END block.

$ gawk -F',' '{print $2}' saixiii.txt
1
2
3
4
$ gawk -F',' 'BEGIN{n=0} {n=n+$2} END{print n}' saixiii.txt
10

 

Syntax

gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

 

Description

gawk is a programming language for processing text data. It was developed from awk to provide a freely usable implementation, before awk itself became freely available as it is today.

 

Option

-f program-file
--file program-file
Read the AWK program source from the file program-file, instead of from the first command line argument. Multiple -f (or --file) options may be used.

-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS predefined variable).

-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the BEGIN rule of an AWK program.

-b
--characters-as-bytes
Treat all input data as single-byte characters. In other words, don't pay any attention to the locale information when attempting to process strings as multibyte characters.
The --posix option overrides this one.

-c
--traditional
Run in compatibility mode. In compatibility mode, gawk behaves identically to Brian Kernighan's awk; none of the GNU-specific extensions are recognized. See the GNU EXTENSIONS section of the gawk(1) man page for more information.

-C
--copyright
Print the short version of the GNU copyright information message on the standard output and exit successfully.

-d[file]
--dump-variables[=file]
Print a sorted list of global variables, their types and final values to file. If no file is provided, gawk uses a file named awkvars.out in the current directory.
Having a list of all the global variables is a good way to look for typographical errors in your programs. You would also use this option if you have a large program with a
lot of functions, and you want to be sure that your functions don't inadvertently use global variables that you meant to be local. (This is a particularly easy mistake to
make with simple variable names like i, j, and so on.)

-D[file]
--debug[=file]
Enable debugging of AWK programs. By default, the debugger reads commands interactively from the keyboard (standard input). The optional file argument specifies a file
with a list of commands for the debugger to execute non-interactively.

-e program-text
--source program-text
Use program-text as AWK program source code. This option allows the easy intermixing of library functions (used via the -f and --file options) with source code entered on
the command line. It is intended primarily for medium to large AWK programs used in shell scripts.

-E file
--exec file
Similar to -f; however, this option is the last one processed. This should be used with #! scripts, particularly for CGI applications, to avoid passing in options or
source code (!) on the command line from a URL. This option disables command-line variable assignments.

-g
--gen-pot
Scan and parse the AWK program, and generate a GNU .pot (Portable Object Template) format file on standard output with entries for all localizable strings in the program.
The program itself is not executed. See the GNU gettext distribution for more information on .pot files.

-h
--help
Print a relatively short summary of the available options on the standard output. (Per the GNU Coding Standards, these options cause an immediate, successful exit.)

-i include-file
--include include-file
Load an awk source library. This searches for the library using the AWKPATH environment variable. If the initial search fails, another attempt will be made after appending
the .awk suffix. The file will be loaded only once (i.e., duplicates are eliminated), and the code does not constitute the main program source.

-l lib
--load lib
Load a shared library lib. This searches for the library using the AWKLIBPATH environment variable. If the initial search fails, another attempt will be made after appending
the default shared library suffix for the platform. The library initialization routine is expected to be named dl_load().

-L [value]
--lint[=value]
Provide warnings about constructs that are dubious or non-portable to other AWK implementations. With an optional argument of fatal, lint warnings become fatal errors.
This may be drastic, but its use will certainly encourage the development of cleaner AWK programs. With an optional argument of invalid, only warnings about things that are
actually invalid are issued. (This is not fully implemented yet.)

-M
--bignum
Force arbitrary precision arithmetic on numbers. This option has no effect if gawk is not compiled to use the GNU MPFR and GMP libraries.

-n
--non-decimal-data
Recognize octal and hexadecimal values in input data. Use this option with great caution!

-N
--use-lc-numeric
This forces gawk to use the locale's decimal point character when parsing input data. Although the POSIX standard requires this behavior, and gawk does so when --posix is
in effect, the default is to follow traditional behavior and use a period as the decimal point, even in locales where the period is not the decimal point character. This
option overrides the default behavior, without the full draconian strictness of the --posix option.

-o[file]
--pretty-print[=file]
Output a pretty printed version of the program to file. If no file is provided, gawk uses a file named awkprof.out in the current directory.

-O
--optimize
Enable optimizations upon the internal representation of the program. Currently, this includes simple constant-folding, and tail call elimination for recursive functions.
The gawk maintainer hopes to add additional optimizations over time.

-p[prof-file]
--profile[=prof-file]
Start a profiling session, and send the profiling data to prof-file. The default is awkprof.out. The profile contains execution counts of each statement in the program in
the left margin and function call counts for each user-defined function.

-P
--posix
This turns on compatibility mode, with the following additional restrictions:

• \x escape sequences are not recognized.

• Only space and tab act as field separators when FS is set to a single space; newline does not.

• You cannot continue lines after ? and :.

• The synonym func for the keyword function is not recognized.

• The operators ** and **= cannot be used in place of ^ and ^=.

-r
--re-interval
Enable the use of interval expressions in regular expression matching (see the Regular Expressions section of the gawk(1) man page). Interval expressions were not traditionally available in the AWK
language. The POSIX standard added them, to make awk and egrep consistent with each other. They are enabled by default, but this option remains for use with --traditional.

-S
--sandbox
Runs gawk in sandbox mode, disabling the system() function, input redirection with getline, output redirection with print and printf, and loading dynamic extensions. Command
execution (through pipelines) is also disabled. This effectively blocks a script from accessing local resources (except for the files specified on the command line).

-t
--lint-old
Provide warnings about constructs that are not portable to the original version of UNIX awk.

-V
--version
Print version information for this particular copy of gawk on the standard output. This is useful mainly for knowing if the current copy of gawk on your system is up to
date with respect to whatever the Free Software Foundation is distributing. This is also useful when reporting bugs. (Per the GNU Coding Standards, these options cause an
immediate, successful exit.)

--
Signal the end of options. This is useful to allow further arguments to the AWK program itself to start with a “-”. This provides consistency with the argument parsing
convention used by most other POSIX programs.

In compatibility mode, any other options are flagged as invalid, but are otherwise ignored. In normal operation, as long as program text has been supplied, unknown options are
passed on to the AWK program in the ARGV array for processing. This is particularly useful for running AWK programs via the “#!” executable interpreter mechanism.

For POSIX compatibility, the -W option may be used, followed by the name of a long option.

 

Related commands

awk(1), egrep(1), sed(1), getpid(2), getppid(2), getpgrp(2), getuid(2), geteuid(2), getgid(2), getegid(2), getgroups(2), usleep(3)

 

Reference:

Unix Commands – Linux Command

Linux, Unix

 

Author: Suphakit Annoppornchai

Credit: https://saixiii.com
