Introduction and overview

In Part 4 we reviewed some basic properties of TeX macros in preparation for the next two articles, in which we take a close look at the underlying mechanics of TeX macros: specialized token lists. In these final two articles we use diagrams, called node lists, that were prepared from data generated using a specially modified version of Knuth’s original TeX software. Those modifications were designed to access internal TeX data structures which are normally inaccessible to the user. By “hooking into” TeX’s internal macro-processing and execution routines it was possible to write out graphical data which enables a more detailed and accurate discussion of TeX’s macro-processing behaviour. Overleaf hopes that these diagrams help readers achieve a better understanding of how TeX macros really work.

Possible additional background reading

Overleaf has already published two token-related articles that provide additional background information on TeX tokens and TeX token lists. Please take time to check them out if you need to fill any gaps in your understanding; they will help you get the most from Parts 5 and 6 of this series.

What is a "TeX token"?
What is a TeX token list?
Macros as token lists

When TeX detects a macro-creation command (\def, \edef, \gdef or \xdef) within the input stream it triggers a process which converts both the <parameter text> and <replacement text> sections of our macro’s definition into one long token list, but a very particular type of token list. Token lists for macros are slightly different from other token lists used within TeX because they contain “special” token values that only processes internal to TeX itself can generate: those special tokens cannot be directly created by any commands that you can include in your .tex file. TeX creates and uses those “special” token values to help with processing your macro call, as we’ll explore and explain below.

A brief word on how token lists are stored: nodes

To store a list of tokens (integer values) TeX uses a data structure called a linked list which, in TeX’s case, comprises a list of so-called nodes. You can think of a node as a small package of computer memory which can be used to store a collection of data items. To store a macro, these nodes are strung together like a chain, where each node (link in the chain) can store several pieces of information, including a token value and the memory address of the next node in the list. For further information, you can read the article What is a TeX token list?, but the following diagram summarizes the key features of a macro stored as a token list:

Reminder: the 4 parts of a macro definition

As discussed in Part 4, the structure of any macro can be written as:

<macro command><macro name><parameter text>{<replacement text>}

where:

<macro command> = one of \def, \edef, \gdef or \xdef;
<macro name> = the name of your macro, such as \foo;
<parameter text> can be “null” (not present) or it can be a string of delimiter tokens and macro parameter tokens;
<replacement text> is the actual body of your macro: the section that is “executed” when you call the macro.

NOTE: (As also observed in Part 4) throughout the discussion we are assuming that <macro name> will be followed by a space character of category code 10 to act as a delimiter to terminate the <macro name>.
We have not explicitly shown that space character in our text/discussion but we assume it is there. Strictly speaking, we should represent it something like this:

<macro command><macro name><space><parameter text>{<replacement text>}

However, we will omit explicit inclusion of a <space> character and implicitly assume its presence.

NOTE: The characters { and } do not become part of the macro token list: their purpose is simply to tell TeX’s input scanner (which creates tokens) where the <replacement text> starts and stops.

When TeX defines a macro, the <parameter text> and <replacement text> sections are converted into one long continuous token list; the total number of tokens in that list depends on the complexity of the macro. As we’ve seen, the <parameter text> section has the specific purpose of acting as a “token template” or “blueprint” that TeX uses to pick out the tokens which form the arguments (values) to use with the actual macro: i.e., the tokens to feed into the <replacement text>.

To firm up these ideas, let’s take an example macro but keep it short so that subsequent diagrams do not become too cluttered:

\def\foo A#1\fake{123 #1}

For our macro,

<macro name> = \foo
<parameter text> = A#1\fake
<replacement text> = 123 #1

Although this example is a simple macro, it contains all the features we need to explore. As noted, TeX will convert <parameter text> and <replacement text> into one long token list, which you can see in the diagram below. In our example, the tokens formed from A#1\fake{123 #1} have been converted to a consecutive sequence of tokens stored in a token list (as a linked list of nodes).

Graphic showing a real macro token list

The following diagram, showing how the macro \def\foo A#1\fake{123 #1} is stored, uses real data from inside a TeX engine. It was created using a customized version of Knuth’s TeX that was modified with additional code to intercept macro calls, examine TeX’s internal data and export it in a format suitable for processing with the open-source graphics program Graphviz. You can download the following graphic as a PDF file (675 KB) or SVG file (1.8 MB).
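The linked-list storage described earlier can be sketched in a few lines of Python. This is only a toy model, not TeX's actual memory layout: the class name `TokenMemory`, the starting address 1000 and the use of `None` for the null pointer are all assumptions made for illustration. The three integers stored are token values quoted in this article (letter A, \fake and digit 1), used here simply as sample data.

```python
# Illustrative only: TeX's real memory layout and addresses differ.
NULL = None  # stands in for the "null" pointer that ends a list


class TokenMemory:
    """A toy memory: each node holds a token value and a next-node address."""

    def __init__(self):
        self.nodes = {}        # address -> (token_value, next_address)
        self.next_free = 1000  # arbitrary starting address (hypothetical)

    def store_list(self, token_values):
        """Store token_values as a linked list; return the first node's address."""
        head = NULL
        # Build from the back so each node can record its successor's address.
        for value in reversed(token_values):
            address = self.next_free
            self.next_free += 1
            self.nodes[address] = (value, head)
            head = address
        return head

    def walk(self, head):
        """Follow next-node addresses from head until the null terminator."""
        values = []
        address = head
        while address is not NULL:
            value, address = self.nodes[address]
            values.append(value)
        return values


memory = TokenMemory()
head = memory.store_list([2881, 19491, 3121])  # sample token values
print(memory.walk(head))  # [2881, 19491, 3121]
```

The point of the sketch is that the order of tokens is recorded only through the next-node addresses: the nodes themselves can live anywhere in memory.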
Understanding the nodes

Within the diagram above you’ll see that each node contains two data items called the next node and the current node. These are just integer values that represent memory locations inside TeX: locations where other nodes are stored. The values of next node and current node are not important; they simply store the locations (memory addresses) which allow nodes to be linked together in a list.

Back to the example

In the node diagram, the token list formed from A#1\fake{123 #1} contains several of the “special tokens” introduced at the start of this article. In addition, the node list representing our macro starts with a “special first node”: we’ll explore what these are and what they do.

The very first item in a macro token list (and some other token list types) does not store a token value but a data item called the macro’s reference count, which TeX uses to track the use of the macro. The first token of the <parameter text> is stored in the node that follows immediately after the reference count: you can see it is a token representing the letter A with category code 11. From discussions in Parts 2 and 3 we know that a character token is calculated using $\text{token value}=256\times \text{category code} + \text{character code}$ which, for a letter A with category code 11, is $\text{token value}=256\times 11 + 65$, giving the value 2881, as shown in the node.

The “command” \fake used in \foo

Within our macro definition \def\foo A#1\fake{123 #1} one of the delimiters is an undefined command, \fake, which is stored within the token list as part of the <parameter text> section. As you can see, within the overall macro token list \fake is a token whose value is 19491, an integer value calculated by TeX using the formula discussed in Part 3. When TeX attempts to execute \foo it will expect to find the \fake token value at the end of the <parameter text> section. TeX will not try to execute the \fake command because its role is merely to provide a form of “punctuation” within the “token template”.
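The character-token formula used above for the letter A is easy to check with a short Python sketch. The function names `char_token` and `decode_char_token` are hypothetical helpers introduced here for illustration; they are not part of TeX.

```python
def char_token(category_code, character_code):
    """Token value for a character token: 256 * category code + character code."""
    return 256 * category_code + character_code


def decode_char_token(token_value):
    """Recover (category code, character code) from a character token value."""
    return divmod(token_value, 256)


print(char_token(11, ord("A")))  # 2881
print(decode_char_token(2881))   # (11, 65)
```

Note that the decoding step is only meaningful for character tokens. A command token such as the one for \fake is calculated with a different formula (discussed in Part 3), so splitting its value this way would not yield a real category code and character code.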
Special tokens in the <parameter text> token list

The “end match” token

When calling a macro, TeX’s first task is to scan the macro as typed by the user and compare the tokens present in the user’s <parameter text> section to the tokens contained within the <parameter text> template stored in memory (created at the time the macro was defined). Because the macro’s full definition, constructed from <parameter text> and <replacement text>, is stored as one long consecutive list of tokens, TeX needs to know where, in that token list, <parameter text> stops and where <replacement text> starts. To achieve this, when TeX is defining the macro (building the token list) it inserts a special terminator token called an end match token as the very last token in the set of tokens generated from <parameter text>. The end match token cannot be generated from user commands; only TeX itself can create it, hence TeX is certain to detect the end of the <parameter text>.

In the node diagram, we can see that the first token following end match is a token representing the digit 1 with category code 12. This should be expected because the <replacement text> for our macro \foo is 123 #1, i.e., it starts with the token representing the digit 1 (with category code 12). From the discussion in Parts 2 and 3 we know that a character token is calculated using $\text{token value}=256\times \text{category code} + \text{character code}$ which, for a digit 1 with category code 12, is $\text{token value}=256\times 12 + 49$, giving the token value 3121, as shown in the node.

“match parameter” tokens

When TeX stores the macro definition, it converts any parameter tokens (#1, #2, ... #9) within <parameter text> into a special token called a match parameter token. These tokens tell TeX that it needs to start looking for tokens, within the user’s macro call, that are the arguments of the macro.

Special tokens in the <replacement text> token list

“output parameter” tokens

When TeX has processed everything and is ready to actually run (expand) the macro, the output parameter tokens tell TeX the locations within the <replacement text> where it needs to feed in the tokens representing the arguments provided by the user when the macro was called.
In effect, “At this location, insert the tokens representing the user’s argument n, where n=1...9”. Within the <replacement text> section of the stored macro-definition token list there will be an output parameter token corresponding to each #1, #2, ... #9 present in the original definition. If we look at our definition of \foo (\def\foo A#1\fake{123 #1}) we see there is only 1 macro parameter (#1) in the <parameter text> (A#1\fake) and, subsequently, only 1 macro parameter (#1) appears in the <replacement text> (123 #1): this results in just 1 output parameter token in the token list representing the <replacement text>.

Note the following in the node list representing \foo’s <replacement text>:

the token immediately before the output parameter token represents a space character (category code 10, character code 32) because there is a space between the 123 and the macro parameter (#1) in the original definition of \foo;
the output parameter is the last token in the list: its next node has a special value of “null” (meaning “empty”) which is used to terminate the list. There are no more nodes after output parameter because it is the final token, indicating the end of the <replacement text> and thus the end of the macro definition.

Part 6

In Part 6 we use some detailed graphics to explain and explore the exact meaning of macro expansion and the consequences of TeX’s tokenization of macro arguments prior to feeding them into a macro’s <replacement text>.
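As a recap of the structure described in this article, the following Python sketch models the stored form of \def\foo A#1\fake{123 #1} and the roles of the three special tokens. It is a simplified illustration under stated assumptions: tokens are represented as symbolic tuples rather than the integer values TeX really uses, the reference-count node is omitted, and the helper names (`char`, `cs`, `out_param`, `split_definition`, `substitute`) are hypothetical.

```python
# Symbolic stand-ins for tokens; TeX itself stores each token as an integer.
MATCH_PARAM = ("match_param",)  # marks where an argument is expected
END_MATCH = ("end_match",)      # separates the two sections of the definition


def char(c, catcode):
    return ("char", c, catcode)


def cs(name):
    return ("cs", name)


def out_param(n):
    return ("out_param", n)


# Stored form of: \def\foo A#1\fake{123 #1}  (reference-count node omitted)
foo_tokens = [
    char("A", 11), MATCH_PARAM, cs("fake"), END_MATCH,
    char("1", 12), char("2", 12), char("3", 12), char(" ", 10), out_param(1),
]


def split_definition(tokens):
    """Split the stored list at end match into the two sections."""
    i = tokens.index(END_MATCH)
    return tokens[:i], tokens[i + 1:]


def substitute(replacement, arguments):
    """Replace each output parameter n with the tokens of argument n."""
    result = []
    for tok in replacement:
        if tok[0] == "out_param":
            result.extend(arguments[tok[1] - 1])
        else:
            result.append(tok)
    return result


parameter_text, replacement_text = split_definition(foo_tokens)
# Suppose matching the user's call has produced the argument "xy" for #1.
expanded = substitute(replacement_text, [[char("x", 11), char("y", 11)]])
print("".join(tok[1] for tok in expanded))  # 123 xy
```

The sketch deliberately skips the matching step, in which TeX compares the user's call against the template to work out the arguments; it shows only how the end match token divides the list and how an output parameter token marks the insertion point.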