Global substitution in a string is done by patsubst:

Builtin: patsubst (string, regexp, [replacement])
Searches string for matches of regexp, and substitutes replacement for each match. The syntax for regular expressions is the same as in GNU Emacs (see Regexp).
The parts of string that are not covered by any match of regexp are copied to the expansion. Whenever a match is found, the search proceeds from the end of the match, so a character from string will never be substituted twice. If regexp matches a string of zero length, the start position for the search is incremented, to avoid infinite loops.
When a replacement is to be made, replacement is inserted into the expansion, with ‘\n’ substituted by the text matched by the nth parenthesized sub-expression of regexp, for up to nine sub-expressions. The escape ‘\&’ is replaced by the text of the entire regular expression matched. For all other characters, ‘\’ treats the next character literally. A warning is issued if there were fewer sub-expressions than the ‘\n’ requested, or if there is a trailing ‘\’.
The replacement argument can be omitted, in which case the text matched by regexp is deleted.
The macro patsubst is recognized only with parameters.
    patsubst(`GNUs not Unix', `^', `OBS: ')
    ⇒OBS: GNUs not Unix
    patsubst(`GNUs not Unix', `\<', `OBS: ')
    ⇒OBS: GNUs OBS: not OBS: Unix
    patsubst(`GNUs not Unix', `\w*', `(\&)')
    ⇒(GNUs)() (not)() (Unix)()
    patsubst(`GNUs not Unix', `\w+', `(\&)')
    ⇒(GNUs) (not) (Unix)
    patsubst(`GNUs not Unix', `[A-Z][a-z]+')
    ⇒GN not 
    patsubst(`GNUs not Unix', `not', `NOT\')
    error→m4:stdin:6: Warning: trailing \ ignored in replacement
    ⇒GNUs NOT Unix
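The examples above use only ‘\&’. As a small sketch of the numbered escapes, the following calls reorder the pieces captured by parenthesized sub-expressions. The input strings are made up for illustration; as elsewhere in this section, lines marked ⇒ show the expected expansion and are not part of the input.

```m4
patsubst(`2024-01-31', `\([0-9]+\)-\([0-9]+\)-\([0-9]+\)', `\3/\2/\1')
⇒31/01/2024
patsubst(`key=value', `\(\w+\)=\(\w+\)', `\2=\1')
⇒value=key
```

Each ‘\n’ in the replacement refers to the nth ‘\(...\)’ group of regexp, counted by opening parenthesis, so the groups can be emitted in any order or repeated.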
Here is a slightly more realistic example, which capitalizes individual words or whole sentences, by substituting calls of the macros upcase and downcase into the strings.
Composite: upcase (text)
Composite: downcase (text)
Composite: capitalize (text)

Expand to text, but with capitalization changed: upcase changes all letters to upper case, downcase changes all letters to lower case, and capitalize changes the first character of each word to upper case and the remaining characters to lower case.
First, an example of their usage, using implementations distributed in m4-1.4.19/examples/capitalize.m4.
    $ m4 -I examples
    include(`capitalize.m4')
    ⇒
    upcase(`GNUs not Unix')
    ⇒GNUS NOT UNIX
    downcase(`GNUs not Unix')
    ⇒gnus not unix
    capitalize(`GNUs not Unix')
    ⇒Gnus Not Unix
Now for the implementation. There is a helper macro _capitalize which puts only its first word in mixed case. Then capitalize merely parses out the words, and replaces them with an invocation of _capitalize. (As presented here, the capitalize macro has some subtle flaws. You should try to see if you can find and correct them; or see Answers).
    $ m4 -I examples
    undivert(`capitalize.m4')dnl
    ⇒divert(`-1')
    ⇒# upcase(text)
    ⇒# downcase(text)
    ⇒# capitalize(text)
    ⇒#   change case of text, simple version
    ⇒define(`upcase', `translit(`$*', `a-z', `A-Z')')
    ⇒define(`downcase', `translit(`$*', `A-Z', `a-z')')
    ⇒define(`_capitalize',
    ⇒        `regexp(`$1', `^\(\w\)\(\w*\)',
    ⇒                `upcase(`\1')`'downcase(`\2')')')
    ⇒define(`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')')
    ⇒divert`'dnl
While regexp replaces the whole input with the replacement as soon as there is a match, patsubst replaces each occurrence of a match and preserves non-matching pieces:
    define(`patreg',
    `patsubst($@)
    regexp($@)')dnl
    patreg(`bar foo baz Foo', `foo\|Foo', `FOO')
    ⇒bar FOO baz FOO
    ⇒FOO
    patreg(`aba abb 121', `\(.\)\(.\)\1', `\2\1\2')
    ⇒bab abb 212
    ⇒bab
Omitting regexp evokes a warning, but still produces output; contrast this with an empty regexp argument.
    patsubst(`abc')
    error→m4:stdin:1: Warning: too few arguments to builtin `patsubst'
    ⇒abc
    patsubst(`abc', `')
    ⇒abc
    patsubst(`abc', `', `\\-')
    ⇒\-a\-b\-c\-
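The empty regexp is one instance of the zero-length match rule described earlier: any regexp that can match the empty string is tried at every position, including the end of the string. The sketch below, using made-up inputs, shows a pattern that never matches a character and one that matches a single character in the middle. Lines marked ⇒ show the expected expansion.

```m4
patsubst(`abc', `x*', `-')
⇒-a-b-c-
patsubst(`abc', `b*', `-')
⇒-a--c-
```

In the second call, ‘b*’ first matches the empty string before ‘a’, then matches ‘b’ itself, and then matches the empty string again at the position just after ‘b’, which is why two replacements appear side by side. Using ‘b+’ instead would avoid the empty matches.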