General-Purpose Register

Cortex-M3 Basics

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (Second Edition), 2010

3.1 Registers

As we've seen, the Cortex™-M3 processor has registers R0 through R15 and a number of special registers. R0 through R12 are general purpose, but some of the 16-bit Thumb® instructions can only access R0 through R7 (low registers), whereas 32-bit Thumb-2 instructions can access all these registers. Special registers have predefined functions and can only be accessed by special register access instructions.

3.1.1 General Purpose Registers R0 through R7

The R0 through R7 general purpose registers are also called low registers. They can be accessed by all 16-bit Thumb instructions and all 32-bit Thumb-2 instructions. They are all 32 bits; the reset value is unpredictable.

3.1.2 General Purpose Registers R8 through R12

The R8 through R12 registers are also called high registers. They are accessible by all Thumb-2 instructions but not by all 16-bit Thumb instructions. These registers are all 32 bits; the reset value is unpredictable (see Figure 3.1).

FIGURE 3.1. Registers in the Cortex-M3.

3.1.3 Stack Pointer R13

R13 is the stack pointer (SP). In the Cortex-M3 processor, there are two SPs. This duality allows two separate stack memories to be set up. When using the register name R13, you can only access the current SP; the other one is inaccessible unless you use the special instructions to move to special register from general-purpose register (MSR) and move special register to general-purpose register (MRS). The two SPs are as follows:

Main Stack Pointer (MSP) or SP_main in ARM documentation: This is the default SP; it is used by the operating system (OS) kernel, exception handlers, and all application codes that require privileged access.

Process Stack Pointer (PSP) or SP_process in ARM documentation: This is used by the base-level application code (when not running an exception handler).

Stack PUSH and POP

A stack is a memory usage model. It is simply part of the system memory, and a pointer register (inside the processor) is used to make it work as a first-in/last-out buffer. The common use of a stack is to save register contents before some data processing and then restore those contents from the stack after the processing task is done.

FIGURE 3.2. Basic Concept of Stack Memory.

When doing PUSH and POP operations, the pointer register, commonly called the stack pointer, is adjusted automatically to prevent the next stack operations from corrupting previously stacked data. More details on stack operations are provided in a later part of this chapter.

It is not necessary to use both SPs. Simple applications can rely purely on the MSP. The SPs are used for accessing stack memory processes such as PUSH and POP.

In the Cortex-M3, the instructions for accessing stack memory are PUSH and POP. The assembly language syntax is as follows (text after each semicolon [;] is a comment):

PUSH   {R0}   ; R13 = R13 - 4, then Memory[R13] = R0
POP    {R0}   ; R0 = Memory[R13], then R13 = R13 + 4

The Cortex-M3 uses a full-descending stack arrangement. (More detail on this subject can be found in the "Stack Memory Operations" section of this chapter.) Therefore, the SP decrements when new data is stored in the stack. PUSH and POP are usually used to save register contents to stack memory at the start of a subroutine and then restore the registers from stack at the end of the subroutine. You can PUSH or POP multiple registers in one instruction:

subroutine_1
  PUSH   {R0-R7, R12, R14} ; Save registers
  ...                      ; Do your processing
  POP    {R0-R7, R12, R14} ; Restore registers
  BX     R14               ; Return to calling function

Instead of using R13, you can use SP (for stack pointer) in your program code. It means the same thing. Inside program code, both the MSP and the PSP can be called R13/SP. However, you can access a particular one using special register access instructions (MRS/MSR).

The MSP, also called SP_main in ARM documentation, is the default SP after power-up; it is used by kernel code and exception handlers. The PSP, or SP_process in ARM documentation, is typically used by thread processes in systems with an embedded OS running.

Because register PUSH and POP operations are always word aligned (their addresses must be 0x0, 0x4, 0x8, ...), the SP/R13 bit 0 and bit 1 are hardwired to 0 and always read as zero (RAZ).
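The PUSH and POP behavior described above can be mimicked in a few lines of Python. This is only a toy model under stated assumptions: the class name, the sparse dictionary standing in for RAM, and the starting address are hypothetical and are not taken from the book.

```python
class FullDescendingStack:
    """Toy model of a full-descending stack: SP points at the last pushed word."""

    def __init__(self, initial_sp):
        # Bits 0 and 1 of SP read as zero, so SP is always word aligned.
        self.sp = initial_sp & ~0x3
        self.memory = {}  # address -> 32-bit word (sparse stand-in for RAM)

    def push(self, value):
        self.sp -= 4                               # R13 = R13 - 4
        self.memory[self.sp] = value & 0xFFFFFFFF  # Memory[R13] = value

    def pop(self):
        value = self.memory[self.sp]               # value = Memory[R13]
        self.sp += 4                               # R13 = R13 + 4
        return value


stack = FullDescendingStack(0x20008002)  # hypothetical address; low bits are dropped
r0_before = 0x12345678
stack.push(r0_before)
sp_after_push = stack.sp
r0_after = stack.pop()
```

Pushing moves the pointer down by one word before the store, and popping reads before moving the pointer back up, so a matched PUSH/POP pair leaves SP where it started.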

3.1.4 Link Register R14

R14 is the link register (LR). Inside an assembly program, you can write it as either R14 or LR. LR is used to store the return program counter (PC) when a subroutine or function is called, for example, when you're using the branch and link (BL) instruction:

main   ; Main program
  ...
  BL function1 ; Call function1 using Branch with Link instruction.
               ; PC = function1 and
               ; LR = the next instruction in main
  ...
function1
  ...     ; Program code for function 1
  BX LR   ; Return

Despite the fact that bit 0 of the PC is always 0 (because instructions are word aligned or half word aligned), the LR bit 0 is readable and writable. This is because in the Thumb instruction set, bit 0 is often used to indicate ARM/Thumb states. To allow the Thumb-2 program for the Cortex-M3 to work with other ARM processors that support the Thumb-2 technology, this least significant bit (LSB) is writable and readable.

3.1.5 Program Counter R15

R15 is the PC. You can access it in assembler code by either R15 or PC. Because of the pipelined nature of the Cortex-M3 processor, when you read this register, you will find that the value is different than the location of the executing instruction, normally by 4. For example:

0x1000 :   MOV   R0, PC   ; R0 = 0x1004

In other instructions like literal load (reading of a memory location related to current PC value), the effective value of PC might not be instruction address plus 4 due to alignment in address calculation. But the PC value is still at least 2 bytes ahead of the instruction address during execution.

Writing to the PC will cause a branch (but the LR does not get updated). Because an instruction address must be half word aligned, the LSB (bit 0) of the PC read value is always 0. However, in branching, either by writing to PC or using branch instructions, the LSB of the target address should be set to 1 because it is used to indicate the Thumb state operations. If it is 0, it can imply trying to switch to the ARM state and will result in a fault exception in the Cortex-M3.
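The three PC rules in this section can be written down as small helper functions. This is a rough sketch rather than a description of the hardware: the function names are hypothetical, and the word-alignment step for literal loads is an assumption based on the usual ARM rule that the PC value is rounded down to a multiple of 4 for PC-relative loads.

```python
def pc_read_value(instruction_address):
    """Value seen when an instruction reads the PC: its own address plus 4."""
    return instruction_address + 4


def literal_load_base(instruction_address):
    """Base for a PC-relative literal load: (address + 4) rounded down to a word boundary."""
    return (instruction_address + 4) & ~0x3


def branch_target_is_thumb(target):
    """Bit 0 of a value written to the PC selects Thumb state; 0 would fault on the Cortex-M3."""
    return (target & 1) == 1
```

For an instruction at the half-word-aligned address 0x1002, literal_load_base returns 0x1004, which is only 2 bytes ahead of the instruction and matches the "at least 2 bytes" remark above.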

URL:

https://www.sciencedirect.com/science/article/pii/B9781856179638000065

INTRODUCTION TO THE ARM INSTRUCTION SET

ANDREW N. SLOSS, ... CHRIS WRIGHT, in ARM System Developer's Guide, 2004

3.5 PROGRAM STATUS REGISTER INSTRUCTIONS

The ARM instruction set provides two instructions to directly control a program status register (psr). The MRS instruction transfers the contents of either the cpsr or spsr into a register; in the reverse direction, the MSR instruction transfers the contents of a register into the cpsr or spsr. Together these instructions are used to read and write the cpsr and spsr.

In the syntax you can see a label called fields. This can be any combination of control (c), extension (x), status (s), and flags (f). These fields relate to particular byte regions in a psr, as shown in Figure 3.9.

Figure 3.9. psr byte fields.

MRS   copy program status register to a general-purpose register    Rd = psr
MSR   move a general-purpose register to a program status register  psr[field] = Rm
MSR   move an immediate value to a program status register          psr[field] = immediate

The c field controls the interrupt masks, Thumb state, and processor mode. Example 3.26 shows how to enable IRQ interrupts by clearing the I mask. This operation involves using both the MRS and MSR instructions to read from and then write to the cpsr.

EXAMPLE 3.26

The MRS first copies the cpsr into register r1. The BIC instruction clears bit 7 of r1. Register r1 is then copied back into the cpsr, which enables IRQ interrupts. You can see from this example that this code preserves all the other settings in the cpsr and only modifies the I bit in the control field.

This example is in SVC mode. In user mode you can read all cpsr bits, but you can only update the condition flag field f.
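The read-modify-write sequence described in Example 3.26 can be modeled in Python. This is a sketch under assumptions: the listing for the example is not reproduced here, so the instruction names in the comments are a plausible reading of the prose, and the helper names and starting cpsr value are hypothetical. The byte masks follow the usual layout in which c, x, s, and f select bits 7:0, 15:8, 23:16, and 31:24.

```python
I_BIT = 1 << 7  # IRQ mask bit in the control field of the cpsr

# Byte lanes selected by the c, x, s, and f field letters.
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00, "s": 0x00FF0000, "f": 0xFF000000}


def msr(psr, fields, value):
    """Model of MSR: replace only the byte fields named in `fields`."""
    mask = 0
    for letter in fields:
        mask |= FIELD_MASKS[letter]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)


def enable_irq(cpsr):
    r1 = cpsr                  # MRS: copy the cpsr into r1
    r1 &= ~I_BIT               # BIC: clear bit 7 (the I mask)
    return msr(cpsr, "c", r1)  # MSR: write r1 back to the control field only


cpsr_before = 0x000000D3  # hypothetical value: SVC mode with I and F set
cpsr_after = enable_irq(cpsr_before)
```

Because only the c field is written back, the flag, status, and extension bytes of the cpsr pass through unchanged, which is the preservation property the example points out.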

3.5.1 COPROCESSOR INSTRUCTIONS

Coprocessor instructions are used to extend the instruction set. A coprocessor can either provide additional computation capability or be used to control the memory subsystem including caches and memory management. The coprocessor instructions include data processing, register transfer, and memory transfer instructions. We will provide only a short overview since these instructions are coprocessor specific. Note that these instructions are only used by cores with a coprocessor.

CDP       coprocessor data processing: perform an operation in a coprocessor
MRC MCR   coprocessor register transfer: move data to/from coprocessor registers
LDC STC   coprocessor memory transfer: load and store blocks of memory to/from a coprocessor

In the syntax of the coprocessor instructions, the cp field represents the coprocessor number between p0 and p15. The opcode fields describe the operation to take place on the coprocessor. The Cn, Cm, and Cd fields describe registers within the coprocessor. The coprocessor operations and registers depend on the specific coprocessor you are using. Coprocessor 15 (CP15) is reserved for system control purposes, such as memory management, write buffer control, cache control, and identification registers.

EXAMPLE 3.27

This example shows a CP15 register being copied into a general-purpose register.

Here CP15 register-0 contains the processor identification number. This register is copied into the general-purpose register r10.

3.5.2 COPROCESSOR 15 INSTRUCTION SYNTAX

CP15 configures the processor core and has a set of dedicated registers to store configuration information, as shown in Example 3.27. A value written into a register sets a configuration attribute, for example, switching on the cache.

CP15 is called the system control coprocessor. Both MRC and MCR instructions are used to read and write to CP15, where register Rd is the core destination register, Cn is the primary register, Cm is the secondary register, and opcode2 is a secondary register modifier. You may occasionally hear secondary registers called "extended registers."

As an example, here is the instruction to move the contents of CP15 control register c1 into register r1 of the processor core:

We use a shorthand notation for CP15 reference that makes referring to configuration registers easier to follow. The reference notation uses the following format:

The first term, CP15, defines it as coprocessor 15. The second term, after the separating colon, is the primary register. The primary register X can have a value between 0 and 15. The third term is the secondary or extended register. The secondary register Y can have a value between 0 and 15. The last term, opcode2, is an instruction modifier and can have a value between 0 and 7. Some operations may also use a nonzero value w of opcode1. We write these as CP15:w:cX:cY:Z.
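To make the shorthand concrete, the sketch below converts a reference string into an MRC instruction string. It is illustrative only: the function name is hypothetical, the short form CP15:cX:cY:Z is inferred from the CP15:w:cX:cY:Z form given above, and the operand order follows the standard MRC syntax (coprocessor, opcode1, Rd, Cn, Cm, opcode2).

```python
import re

# Accepts CP15:cX:cY:Z and the longer CP15:w:cX:cY:Z form.
_PATTERN = re.compile(r"^CP15:(?:(\d+):)?c(\d+):c(\d+):(\d+)$")


def cp15_to_mrc(reference, rd="r1"):
    """Turn a shorthand CP15 reference into an MRC (coprocessor-to-core) instruction string."""
    match = _PATTERN.match(reference)
    if match is None:
        raise ValueError(f"not a CP15 reference: {reference!r}")
    opcode1 = int(match.group(1) or 0)  # opcode1 defaults to 0 when omitted
    primary, secondary, opcode2 = (int(g) for g in match.group(2, 3, 4))
    if opcode1 > 7 or primary > 15 or secondary > 15 or opcode2 > 7:
        raise ValueError(f"field out of range in {reference!r}")
    return f"MRC p15, {opcode1}, {rd}, c{primary}, c{secondary}, {opcode2}"
```

For instance, cp15_to_mrc("CP15:c0:c0:0", rd="r10") yields an instruction that reads the identification register into r10, in the spirit of Example 3.27.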

URL:

https://www.sciencedirect.com/science/article/pii/B9781558608740500046

Overview of the Cortex-M3

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (Second Edition), 2010

2.2 Registers

The Cortex-M3 processor has registers R0 through R15 (see Figure 2.2). R13 (the stack pointer) is banked, with only one copy of the R13 visible at a time.

Figure 2.2. Registers in the Cortex-M3.

2.2.1 R0–R12: General-Purpose Registers

R0–R12 are 32-bit general-purpose registers for data operations. Some 16-bit Thumb® instructions can only access a subset of these registers (low registers, R0–R7).

2.2.2 R13: Stack Pointers

The Cortex-M3 contains two stack pointers (R13). They are banked so that only one is visible at a time. The two stack pointers are as follows:

Main Stack Pointer (MSP): The default stack pointer, used by the operating system (OS) kernel and exception handlers

Process Stack Pointer (PSP): Used by user application code

The lowest 2 bits of the stack pointers are always 0, which means they are always word aligned.

2.2.3 R14: The Link Register

When a subroutine is called, the return address is stored in the link register.

2.2.4 R15: The Program Counter

The program counter is the current program address. This register can be written to control the program flow.

2.2.5 Special Registers

The Cortex-M3 processor also has a number of special registers (see Figure 2.3). They are as follows:

Program Status registers (PSRs)

Interrupt Mask registers (PRIMASK, FAULTMASK, and BASEPRI)

Control register (CONTROL)

FIGURE 2.3. Special Registers in the Cortex-M3.

These registers have special functions and can be accessed only by special instructions. They cannot be used for normal data processing (see Table 2.1).

Table 2.1. Special Registers and Their Functions

Register    Function
xPSR        Provide arithmetic and logic processing flags (zero flag and carry flag), execution status, and current executing interrupt number
PRIMASK     Disable all interrupts except the nonmaskable interrupt (NMI) and hard fault
FAULTMASK   Disable all interrupts except the NMI
BASEPRI     Disable all interrupts of specific priority level or lower priority level
CONTROL     Define privileged status and stack pointer selection

For more information on these registers, see Chapter 3.
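The effect of the three interrupt mask registers in Table 2.1 can be summarized as one predicate. This is a simplified sketch, not a full model of the exception logic: the function name and the example priority numbers are hypothetical, and it assumes the usual Cortex-M convention that a lower number means a more urgent exception, with the NMI at -2 and hard fault at -1.

```python
NMI_PRIORITY = -2
HARD_FAULT_PRIORITY = -1


def is_blocked(priority, primask=0, faultmask=0, basepri=0):
    """Return True if an exception at `priority` is masked (lower number = more urgent)."""
    if faultmask:
        return priority >= HARD_FAULT_PRIORITY  # only the NMI gets through
    if primask:
        return priority >= 0                    # NMI and hard fault get through
    if basepri:
        return priority >= basepri              # same or lower urgency is masked
    return False                                # BASEPRI of 0 means no masking
```

With basepri=0x60, an interrupt at priority 0x40 is still taken, while interrupts at 0x60 and 0x80 are held off, which is the "specific priority level or lower" behavior listed for BASEPRI.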

URL:

https://www.sciencedirect.com/science/article/pii/B9781856179638000053

Early Intel® Architecture

In Power and Performance, 2015

1.1.2 Registers

Aside from the four segment registers introduced in the previous section, the 8086 has eight general purpose registers, and two status registers.

The general purpose registers are divided into two categories. Four registers, AX, BX, CX, and DX, are classified as data registers. These data registers are accessible as either the full 16-bit register, represented with the X suffix, the low byte of the full 16-bit register, designated with an L suffix, or the high byte of the 16-bit register, delineated with an H suffix. For instance, AX would access the full 16-bit register, whereas AL and AH would access the register's low and high bytes, respectively.
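The relationship between the X, L, and H views can be sketched as a small Python class. The class and attribute names are hypothetical; only the 16-bit width and the low/high byte split come from the text.

```python
class DataRegister:
    """A 16-bit data register with full (X), low-byte (L), and high-byte (H) views."""

    def __init__(self, value=0):
        self.full = value & 0xFFFF

    @property
    def low(self):
        return self.full & 0x00FF

    @low.setter
    def low(self, value):
        # Writing the low byte leaves the high byte untouched.
        self.full = (self.full & 0xFF00) | (value & 0xFF)

    @property
    def high(self):
        return (self.full >> 8) & 0xFF

    @high.setter
    def high(self, value):
        # Writing the high byte leaves the low byte untouched.
        self.full = (self.full & 0x00FF) | ((value & 0xFF) << 8)


ax = DataRegister(0x1234)
ax.low = 0xFF   # like writing AL
ax.high = 0xAB  # like writing AH
```

All three views share the same storage, so a write through one view is immediately visible through the others.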

The second classification of registers is the pointer/index registers. This includes the following four registers: SP, BP, SI, and DI. The SP register, the stack pointer, is reserved for usage as a pointer to the top of the stack. The SI and DI registers are typically used implicitly as the source and destination pointers, respectively. Unlike the data registers, the pointer/index registers are only accessible as full 16-bit registers.

As this categorization may indicate, the general purpose registers come with some guidance for their intended usage. This guidance is reflected in the instruction forms with implicit operands. Instructions with implicit operands, that is, operands which are assumed to be a certain register and therefore don't require that operand to be encoded, allow for shorter encodings for common usages. For convenience, instructions with implicit forms typically also have explicit forms, which require more bytes to encode. The recommended uses for the registers are as follows:

AX Accumulator

BX Data (relative to DS)

CX Loop counter

DX Data

SI Source pointer (relative to DS)

DI Destination pointer (relative to ES)

SP Stack pointer (relative to SS)

BP Base pointer of stack frame (relative to SS)

Aside from allowing for shorter instruction encodings, this guidance is also an aid to the programmer who, once familiar with the various register meanings, will be able to deduce the meaning of assembly, assuming it conforms to the guidelines, much faster. This parallels, to some degree, how variable names help the programmer reason about their contents. It's important to note that these are just suggestions, not rules.

Additionally, there are two status registers, the instruction pointer and the flags register.

The instruction pointer, IP, is also often referred to as the program counter. This register contains the memory address of the next instruction to be executed. Until 64-bit mode was introduced, the instruction pointer was not directly accessible to the programmer, that is, it wasn't possible to access it like the other general purpose registers. Despite this, the instruction pointer was indirectly accessible. Whereas the instruction pointer couldn't be modified through a MOV instruction, it could be modified by any instruction that alters the program flow, such as the CALL or JMP instructions.

Reading the contents of the instruction pointer was also possible by taking advantage of how x86 handles function calls. Transfer from one function to another occurs through the CALL and RET instructions. The CALL instruction preserves the current value of the instruction pointer, pushing it onto the stack in order to support nested function calls, and then loads the instruction pointer with the new address, provided as an operand to the instruction. This value on the stack is referred to as the return address. Whenever the function has finished executing, the RET instruction pops the return address off of the stack and restores it into the instruction pointer, thus transferring control back to the function that initiated the function call. Leveraging this, the programmer can create a special thunk function that would simply copy the return address off of the stack, load it into one of the registers, and then return. For example, when compiling Position-Independent-Code (PIC), which is discussed in Chapter 12, the compiler will automatically add functions that use this technique to obtain the instruction pointer. These functions are usually called __x86.get_pc_thunk.bx(), __x86.get_pc_thunk.cx(), __x86.get_pc_thunk.dx(), and so on, depending on which register the instruction pointer is loaded into.
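The thunk technique can be traced with a very small simulation. Everything here is a simplification for illustration: the TinyCpu class, the addresses, and the 3-byte instruction length are hypothetical, and a Python list stands in for the hardware stack.

```python
class TinyCpu:
    """Minimal model of CALL/RET and a get-pc thunk; not a real x86 emulator."""

    def __init__(self, ip):
        self.ip = ip
        self.stack = []
        self.regs = {"ebx": 0}

    def call(self, target, instruction_length):
        # CALL pushes the address of the instruction that follows it, then jumps.
        self.stack.append(self.ip + instruction_length)
        self.ip = target

    def ret(self):
        # RET pops the return address back into the instruction pointer.
        self.ip = self.stack.pop()

    def get_pc_thunk_bx(self):
        # Copy the return address from the top of the stack, then return.
        self.regs["ebx"] = self.stack[-1]
        self.ret()


cpu = TinyCpu(ip=0x1000)
cpu.call(target=0x2000, instruction_length=3)  # hypothetical addresses and length
cpu.get_pc_thunk_bx()
```

After the thunk returns, the register holds the address of the instruction that follows the CALL, which is exactly where execution has resumed.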

The second status register, the EFLAGS register, is comprised of 1-bit status and control flags. These bits are set by various instructions, typically arithmetic or logic instructions, to signal certain conditions. These status flags can then be checked in order to make decisions. For a list of the flags modified by each instruction, see the Intel SDM. The 8086 defined the following status and control bits in EFLAGS:

Zero Flag (ZF) Set if the result of the instruction is zero.

Sign Flag (SF) Set if the result of the instruction is negative.

Overflow Flag (OF) Set if the result of the instruction overflowed.

Parity Flag (PF) Set if the result has an even number of bits set.

Carry Flag (CF) Used for storing the carry bit in instructions that perform arithmetic with carry (for implementing extended precision).

Adjust Flag (AF) Similar to the Carry Flag. In the parlance of the 8086 documentation, this was referred to as the Auxiliary Carry Flag.

Direction Flag (DF) For instructions that either autoincrement or autodecrement a pointer, this flag chooses which to perform. If set, autodecrement, otherwise autoincrement.

Interrupt Enable Flag (IF) Determines whether maskable interrupts are enabled.

Trap Flag (TF) If set, the CPU operates in single-step debugging mode.
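As a worked illustration of the status flags, the sketch below computes them for a 16-bit addition. The function name is hypothetical, and two details go beyond the list above and are stated here as assumptions from the x86 architecture: PF is computed from the low byte of the result only, and AF is the carry out of bit 3.

```python
def add16_flags(a, b):
    """Add two 16-bit values and return (result, flags) in the style of a 16-bit ADD."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    flags = {
        "ZF": result == 0,
        "SF": bool(result & 0x8000),
        "CF": total > 0xFFFF,
        # Signed overflow: both operands share a sign that the result does not.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x8000),
        # PF reflects the low byte of the result only (assumption noted above).
        "PF": bin(result & 0xFF).count("1") % 2 == 0,
        # AF is the carry out of bit 3.
        "AF": ((a & 0xF) + (b & 0xF)) > 0xF,
    }
    return result, flags
```

Adding 1 to 0x7FFF sets SF and OF without setting CF, while adding 1 to 0xFFFF sets ZF and CF without setting OF, which shows why signed and unsigned overflow need separate flags.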

URL:

https://www.sciencedirect.com/science/article/pii/B978012800726600001X

Intel® Pentium® Processors

In Power and Performance, 2015

2.2.3 Out-of-Order Execution

As discussed in Section 2.1.1, prior to the 80486, the processor handled one instruction at a time. As a result, the processor's resources remained idle while the currently executing instruction was not utilizing them. With the introduction of pipelining, the pipeline was partitioned to allow multiple instructions to coexist simultaneously. Therefore, when the currently executing instruction had finished with some of the processor's resources, the next instruction could begin utilizing them before the first instruction had completely finished executing. The introduction of μops expanded significantly on this concept, splitting instruction execution into smaller steps.

Each type of μop has a corresponding type of execution unit. The Pentium Pro has five execution units: two for handling integer μops, two for handling floating point μops, and one for handling memory μops. Therefore, up to five μops can execute in parallel. An instruction, divided into one or more μops, is not done executing until all of its corresponding μops have finished. Obviously, μops from the same instruction have dependencies upon one another so they can't all execute simultaneously. Therefore, μops from multiple instructions are dispatched to the execution units.

Taking advantage of the fine granularity of μops, out-of-order execution significantly improves utilization of the execution units. Up until the Pentium Pro, Intel processors executed in-order, meaning that instructions were executed in the same sequence as they were organized in memory. With out-of-order execution, μops are scheduled based on the available resources, as opposed to their ordering. As instructions are fetched and decoded, the resulting μops are stored in the Reorder Buffer. As execution units and other resources become available, the Reservation Station dispatches the corresponding μop to one of the execution units. Once the μop has finished executing, the result is stored back into the Reorder Buffer. Once all of the μops associated with an instruction have completed execution, the μops retire, that is, they are removed from the Reorder Buffer and any results or side-effects are made visible to the rest of the system. While instructions can execute in any order, instructions always retire in-order, ensuring that the programmer does not need to worry about handling out-of-order execution.

To illustrate the problem with in-order execution and the benefit of out-of-order execution, consider the following hypothetical situation. Assume that a processor has two execution units capable of handling integer μops and one capable of handling floating point μops. With in-order scheduling, the most efficient usage of this processor would be to intermix integer and floating point instructions following the 2-to-1 ratio. This would involve carefully scheduling instructions based on their instruction latencies, along with the latencies for fetching any memory resources, to ensure that when an execution unit becomes available, the next μop in the queue would be executable with that unit.

For example, consider four instructions scheduled on this example processor, three integer instructions followed by a floating point instruction. Assume that each instruction corresponds to one μop, that these instructions have no interdependencies, and that all three execution units are currently available. The first two integer instructions would be dispatched to the two available integer execution units, but the floating point instruction would not be dispatched, even though the floating point execution unit was available. This is because the third integer instruction, waiting for one of the two integer execution units to become available, must be issued first. This underutilizes the processor's resources. With out-of-order execution, the first two integer instructions and the floating point instruction would be dispatched together.

In other words, out-of-order execution improves the utilization of the processor's resources. Additionally, because μops are scheduled based on available resources, some instruction latencies, such as an expensive load from memory, may be partially or completely masked if other work can be scheduled instead.
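The four-instruction scenario above can be replayed with a short simulation of a single dispatch cycle. This is a deliberately simplified sketch: the function name and data layout are hypothetical, each instruction is treated as one μop with no dependencies (as the example assumes), and only the first cycle is modeled.

```python
def dispatch_one_cycle(queue, free_units, in_order):
    """Return the queue positions of the μops dispatched in one cycle."""
    free = dict(free_units)  # copy so the caller's unit counts are untouched
    dispatched = []
    for index, kind in enumerate(queue):
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            dispatched.append(index)
        elif in_order:
            break  # an older μop is stalled, so younger ones must wait
    return dispatched


queue = ["int", "int", "int", "fp"]  # three integer μops, then one floating point μop
units = {"int": 2, "fp": 1}
in_order_result = dispatch_one_cycle(queue, units, in_order=True)
out_of_order_result = dispatch_one_cycle(queue, units, in_order=False)
```

The in-order run stops at the third integer μop and leaves the floating point unit idle, while the out-of-order run skips past the stalled μop and fills all three units.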

Register Renaming

From the instruction set perspective, Intel processors have eight general purpose registers in 32-bit mode, and 16 general purpose registers in 64-bit mode; however, from the internal hardware perspective, Intel processors have many more registers. For example, the Pentium Pro has 40 registers, organized in a structure referred to as a Physical Register File.

While this many extra registers might seem like a performance boon, especially if the reader is familiar with the performance gain received from the eight extra registers in 64-bit mode, these registers serve a different purpose. Rather than providing the process with more registers, these extra registers serve to handle data dependencies in the out-of-order execution engine.

When a value is stored into a register, a new register file entry is assigned to contain that value. Once another value is stored into that register, a different register file entry is assigned to contain this new value. Internal to the processor core, each data dependency on the first value will reference the first entry, and each data dependency on the second value will reference the second entry. Therefore, the out-of-order engine is able to execute instructions in an order that would otherwise be impossible due to false data dependencies.
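The renaming idea can be sketched as a table that hands out a fresh physical entry on every write. This is an illustration only: the class, the simple free list, and the instruction sequence in the comments are hypothetical, and releasing entries at retirement is left out.

```python
class RenameTable:
    """Maps architectural register names to physical register file entries."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))
        self.mapping = {}  # architectural name -> physical entry

    def write(self, arch_reg):
        # Every write gets a fresh entry, so older readers keep their own copy.
        entry = self.free.pop(0)
        self.mapping[arch_reg] = entry
        return entry

    def read(self, arch_reg):
        # A read depends on whichever entry was most recently assigned.
        return self.mapping[arch_reg]


table = RenameTable(num_physical=40)
first_write = table.write("eax")   # first value stored into the register
first_reader = table.read("eax")   # depends on the first value
second_write = table.write("eax")  # second value reuses the name, not the storage
second_reader = table.read("eax")  # depends on the second value
```

Because the two writes land in different entries, the second write does not have to wait for readers of the first value, which removes the false dependency between them.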

URL:

https://www.sciencedirect.com/science/article/pii/B9780128007266000021

Load/store and branch instructions

Larry D. Pyeatt, William Ughetta, in ARM 64-Bit Assembly Language, 2020

3.2 AArch64 user registers

As shown in Fig. 3.2, the AArch64 ISA provides 31 general-purpose registers, which are called [Image 2] through [Image 3]. These registers can each store 64 bits of data. To use all 64 bits, they are referred to as [Image 4] through [Image 5] (capitalization is optional). To use only the lower (least significant) 32 bits, they are referred to as [Image 6]. Since each register has a 64-bit name and a 32-bit name, we use [Image 7] through [Image 8] to specify a register without specifying the number of bits. For example, when we refer to [Image 9], we are really referring to either [Image 10] or [Image 11].
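The two views of one register can be modeled as follows. This is a sketch under assumptions that are not spelled out in the text above: it uses the standard AArch64 names, in which the 64-bit views are X0 to X30 and the 32-bit views are W0 to W30, and the architectural rule that writing a W name clears the upper 32 bits. The class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


class Register64:
    """One general-purpose register with a 64-bit (X) and a 32-bit (W) view."""

    def __init__(self):
        self.bits = 0

    def write_x(self, value):
        self.bits = value & MASK64

    def read_x(self):
        return self.bits

    def write_w(self, value):
        # A 32-bit write zero-extends: the upper half of the register is cleared.
        self.bits = value & MASK32

    def read_w(self):
        # The 32-bit view is the least significant half of the register.
        return self.bits & MASK32


reg = Register64()
reg.write_x(0x1122334455667788)
low_half = reg.read_w()
reg.write_w(0xDEADBEEF)
after_w_write = reg.read_x()
```

The W view is not a separate register, so code that mixes the two names for the same register number is reading and writing the same storage.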

Figure 3.2. AArch64 general purpose registers ([Image 1]) and special registers.

3.2.1 General purpose registers

The general-purpose registers are each used according to specific conventions. These rules are defined in the application binary interface (ABI). The AArch64 ABI is called AAPCS64. The difference between callee saved and caller saved registers will also be explained in Section 5.4.4.

Registers [Image 12] are used for passing arguments when calling a procedure or function. Registers [Image 13] are scratch registers and can be used at any time because no assumptions are made about what they contain. They are called scratch registers because they are useful for holding temporary results of calculations. Registers [Image 14] can also be used as scratch registers, but their contents must be saved before they are used, and restored to their original contents before the procedure exits.
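For reference, the register roles can be captured in a small lookup function. The ranges below are taken from the published AAPCS64 procedure call standard rather than from the text above (whose register names did not survive extraction), so treat them as an assumption; the function name and role labels are hypothetical.

```python
def aapcs64_role(n):
    """Return the conventional AAPCS64 role of general-purpose register number n."""
    if not 0 <= n <= 30:
        raise ValueError("general-purpose registers are numbered 0 to 30")
    if n <= 7:
        return "argument/result"
    if n == 8:
        return "indirect result"
    if n <= 15:
        return "caller-saved temporary"
    if n <= 17:
        return "intra-procedure-call scratch"
    if n == 18:
        return "platform register"
    if n <= 28:
        return "callee-saved"
    if n == 29:
        return "frame pointer"
    return "link register"
```

A function that wants to use a callee-saved register must store its old contents first and restore them before returning, whereas a caller-saved temporary can be overwritten freely.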

Some of the registers have alternate names. For example, [Image 15] is also known as [Image 16]. Most of these alternate names are only of interest to people writing compilers and operating systems. However, two of these registers are of interest to all AArch64 programmers.

3.2.2 Frame pointer

The frame pointer, [Image 17], is used by high-level language compilers to track the current stack frame. This register can be helpful when the program is running under a debugger, and can sometimes help the compiler to generate more efficient code for returning from a subroutine. The GNU C compiler can be instructed to use [Image 17] as a general-purpose register by using the -fomit-frame-pointer command line option. The use of [Image 17] as the frame pointer is a programming convention. Some instructions (e.g. branches) implicitly modify the program counter, the link register, and even the stack pointer, so they are considered to be hardware special registers. As far as the hardware is concerned, the frame pointer is exactly the same as the other general-purpose registers, but AArch64 programmers use it for the frame pointer because of the ABI.

3.2.3 PSTATE register

The PSTATE register contains bits that indicate the status of the current process, including information about the results of previous operations. Fig. 3.3 shows all of its bits. The dashed lines indicate unused space that may be reserved for future AArch64 architectural extensions. The PSTATE register is actually a collection of independent fields, most of which are only used by the operating system. User programs make use of the first four bits, N, Z, C, and V. These are referred to as the condition flags field. Most instructions can change these flags, and later instructions can use the flags to control their operation. Their meaning is as follows:

Negative:

This bit is set to one if the signed result of an operation is negative, and set to zero if the result is positive or zero.

Zero:

This bit is set to one if the result of an operation is zero, and set to zero if the result is non-zero.

Carry:

This bit is set to one if an add operation results in a carry out of the most significant bit, or if a subtract operation does not result in a borrow. For shift operations, this flag is set to the last bit shifted out by the shifter.

oVerflow:

For addition and subtraction, this flag is set if a signed overflow occurred.
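The four condition flags can be computed for 64-bit addition and subtraction as shown below. This is a sketch, not the architecture's pseudocode: the function names are hypothetical, and it relies on the ARM convention that a subtraction is performed as a + NOT(b) + 1, so the carry flag ends up set when no borrow occurs.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def adds(a, b, carry_in=0):
    """64-bit add that returns (result, flags) with N, Z, C, and V."""
    a &= MASK64
    b &= MASK64
    total = a + b + carry_in
    result = total & MASK64
    flags = {
        "N": bool(result & SIGN64),
        "Z": result == 0,
        "C": total > MASK64,  # carry out of the most significant bit
        # Signed overflow: operands share a sign that the result does not.
        "V": bool(~(a ^ b) & (a ^ result) & SIGN64),
    }
    return result, flags


def subs(a, b):
    """64-bit subtract, computed as a + NOT(b) + 1 (C set means no borrow)."""
    return adds(a, ~b & MASK64, carry_in=1)
```

With this convention, subs(5, 3) leaves C set because no borrow was needed, while subs(3, 5) clears C and sets N.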

Figure 3.3. Fields in the PSTATE register.

3.2.4 Link register

The procedure link register, [Image 5], is used to hold the return address for subroutines. Certain instructions cause the program counter to be copied to the link register, then the program counter is loaded with a new address. These branch-and-link instructions are briefly covered in Section 3.5 and in more detail in Section 5.4. The link register could theoretically be used as a scratch register, but its contents are modified by hardware when a subroutine is called, in order to save the correct return address. Using [Image 5] as a general-purpose register is dangerous and is strongly discouraged.

3.2.5 Stack pointer

The program stack was introduced in Section 1.4. The stack pointer, [Image 19], is used to hold the address where the stack ends. This is commonly referred to as the top of the stack, although on most systems the stack grows down and the stack pointer really refers to the lowest address in the stack. The address where the stack ends may change when registers are pushed onto the stack, or when temporary local variables (automatic variables) are allocated or deleted. The use of the stack for storing automatic variables is described in Chapter 5. The stack pointer can only be modified or read by a small set of instructions.

three.2.6 Goose egg register

The zero register can be referred to as a 64-bit register, xzr, or a 32-bit register, wzr. It always has the value zero. Most instructions can use the zero register as an operand, even as a destination register. If this is the case, the instruction will not change the destination register. However, it can still have side effects, including updating the PSTATE flags based on the ALU operation and incrementing a register in pre-indexed or post-indexed addressing. The zero register cannot always be used as an operand. It shares the same binary encoding with the stack pointer register, sp, which is the value 31. Some instructions can access the zero register, while others can access the stack pointer.
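The shared encoding can be modeled as a register file in which index 31 means either the zero register or the stack pointer, depending on the instruction. The sketch below is a simplified Python illustration: the class `RegisterFile` and the `r31_is_sp` parameter are hypothetical names, not part of any real toolchain.

```python
MASK64 = (1 << 64) - 1

class RegisterFile:
    """Toy register file: indices 0..30 are general registers, 31 is shared."""

    def __init__(self):
        self.x = [0] * 31     # general-purpose registers 0 through 30
        self.sp = 0           # stack pointer, also reached through index 31

    def read(self, index, r31_is_sp=False):
        if index == 31:
            # The instruction decides which register index 31 names.
            return self.sp if r31_is_sp else 0
        return self.x[index]

    def write(self, index, value, r31_is_sp=False):
        value &= MASK64
        if index == 31:
            if r31_is_sp:
                self.sp = value
            return            # a write to the zero register is discarded
        self.x[index] = value

rf = RegisterFile()
rf.write(31, 123)                        # destination is the zero register
rf.write(31, 0x1000, r31_is_sp=True)     # destination is the stack pointer
print(rf.read(31), hex(rf.read(31, r31_is_sp=True)))
```

Reading index 31 as the zero register still yields zero after both writes, while the same index read as the stack pointer yields the stored address.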

3.2.7 Program counter

The program counter, pc, always contains the address of the next instruction that will be executed. The processor increments this register by four, automatically, after each instruction is fetched from memory. By moving an address into this register, the programmer can cause the processor to fetch the next instruction from the new address. This gives the programmer the ability to jump to any address and begin executing code there. Only a small number of instructions can access the pc directly. For example, instructions that create a PC-relative address, such as adr, and instructions which load a register, such as ldr, are able to access the program counter directly.
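The fetch behavior described above can be sketched as a loop that advances the program counter by four after each fetch and overwrites it on a branch. This is a hypothetical Python model: the function `run` and the three-instruction toy set (`nop`, `b`, `halt`) are assumptions for illustration, not a real instruction encoding.

```python
def run(program, start=0, max_steps=100):
    """Return the list of addresses fetched.

    program maps an address to ("nop",), ("b", target) or ("halt",).
    """
    pc = start
    trace = []
    for _ in range(max_steps):
        instr = program[pc]
        trace.append(pc)
        pc += 4                   # incremented automatically after the fetch
        if instr[0] == "b":
            pc = instr[1]         # a branch loads a new address into pc
        elif instr[0] == "halt":
            break
    return trace

program = {
    0: ("nop",),
    4: ("b", 16),
    8: ("nop",),
    12: ("nop",),
    16: ("halt",),
}
print(run(program))   # [0, 4, 16]
```

The instructions at addresses 8 and 12 are never fetched because the branch replaces the incremented value of the program counter.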

URL:

https://www.sciencedirect.com/science/article/pii/B9780128192214000109

Knights Landing architecture

Jim Jeffers , ... Avinash Sodani , in Intel Xeon Phi Processor High Performance Programming (Second Edition), 2016

Integer execution unit

The IEU executes integer μops, which are defined as those that operate on general-purpose registers R0–R15 (i.e., RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8…R15). There are two IEUs in the core. Each IEU contains a 12-entry RS that issues one μop per cycle. The Integer RSes are fully out-of-order in their scheduling. Most operations have 1-cycle latency and are supported by both IEUs, but a few operations have 3- or 5-cycle latency (e.g., multiplies) and are only supported by one of the IEUs.

URL:

https://www.sciencedirect.com/science/article/pii/B9780128091944000041

Computer Data Processing Hardware Architecture

Paul J. Fortier , Howard E. Michel , in Computer Systems Performance Evaluation and Prediction, 2003

2.3.1 Instruction types

Based on the number of registers available and the configuration of these registers, several types of instruction are possible. For example, if many registers are available, as would be the case in a stack computer, no address computations are needed and the instruction, therefore, can be much shorter both in format and execution time required. On the other hand, if there are no general registers and all computations are performed by memory movements of data, then instructions will be longer and require more time due to operand fetching and storage. The following are representative of instruction types:

0-address instructions: This type of instruction is found in machines where many general-purpose registers are available. This is the case in stack machines and in some reduced instruction set machines. Instructions of this type perform their function totally using registers. If we have three general registers, A, B, and C, a typical format would have the form:

(2.1) R[A] ← R[B] operator R[C]

which indicates that the contents of registers B and C have the operator (such as add, subtract, multiply, etc.) performed on them, with the result stored in general register A. Similarly, we could describe instructions that use only one or two registers as follows:

(2.2) R[B] ← R[B] operator R[C]

or

(2.3) operator R[C]

which represent two-register and one-register instructions, respectively. In the two-register case one of the operand registers is also used as the result register. In the single-register case the operand register is also the result register. The increment instruction is an example of a one-register instruction. This type of instruction is found in all machines.

1-address instructions: In this type of instruction a single memory address is found in the instruction. If another operand is used, it is typically an accumulator or the top of a stack in a stack computer. The typical format of these instructions has the form:

(2.4) operator M[address]

where the contents of the named memory address have the named operator performed on them in conjunction with an implied special register. An example of such an instruction could be as follows:

(2.5) Move M[100]

or

(2.6) Add M[100]

which moves the contents of memory location 100 into the ALU's accumulator or adds the contents of memory address 100 with the accumulator and stores the result in the accumulator. If the result must be stored in memory, we would need a store instruction:

(2.7) Store M[100]

1-and-1/2-address instructions: Once we have an architecture that has some general-purpose registers, we can provide more advanced operations combining memory contents and the general registers. The typical instruction performs an operation on a memory location's contents with that of a general register. For example, we could add the contents of a memory location with the contents of a general register, A, as shown:

(2.8) Add R[A], M[100]

This instruction typically stores the result in the first named location or register in the instruction. In this example it is register A.

2-address instructions: Two-address instructions use two memory locations to perform an instruction, for example, a block move of N words from one location in memory to another, or a block add. The move may appear as follows:

(2.9) Move N, M[100], M[1000]

2-and-1/2-address instructions: This format uses two memory locations and a general register in the instruction. Typical of this type of instruction is an operation involving two memory locations storing the result in a register, or an operation with a general register and a memory location storing the result in another memory location, as shown:

(2.10) R[A] ← M[100] operator M[1000]
       M[1000] ← M[100] operator R[A]

3-address instructions: Another less common form of instruction format is the three-address instruction. These instructions involve three memory locations: two used for operands and one as the results location. A typical format is shown:

(2.11) M[200] ← M[100] operator M[300]
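Several of the formats above can be compared side by side in a small interpreter. The following is a minimal Python sketch under stated assumptions: the class `Machine`, its method names, and the use of dictionaries for registers and memory are all hypothetical and serve only to mirror equations (2.1), (2.5) through (2.8), and (2.11).

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

class Machine:
    """Toy machine with three registers, an accumulator, and word memory."""

    def __init__(self):
        self.R = {"A": 0, "B": 0, "C": 0}   # general registers
        self.M = {}                          # memory: address -> value
        self.acc = 0                         # implied accumulator

    def reg3(self, op, dst, s1, s2):
        """Register-only form (2.1): R[dst] <- R[s1] op R[s2]."""
        self.R[dst] = OPS[op](self.R[s1], self.R[s2])

    def move(self, addr):
        """1-address form (2.5): accumulator <- M[addr]."""
        self.acc = self.M[addr]

    def add(self, addr):
        """1-address form (2.6): accumulator <- accumulator + M[addr]."""
        self.acc += self.M[addr]

    def store(self, addr):
        """1-address form (2.7): M[addr] <- accumulator."""
        self.M[addr] = self.acc

    def add_reg_mem(self, reg, addr):
        """1-and-1/2-address form (2.8): R[reg] <- R[reg] + M[addr]."""
        self.R[reg] += self.M[addr]

    def mem3(self, op, dst, s1, s2):
        """3-address form (2.11): M[dst] <- M[s1] op M[s2]."""
        self.M[dst] = OPS[op](self.M[s1], self.M[s2])

m = Machine()
m.M[100], m.M[300] = 7, 5
m.R["B"], m.R["C"] = 2, 3
m.reg3("add", "A", "B", "C")     # R[A] = 2 + 3
m.move(100)                      # acc = 7
m.add(100)                       # acc = 14
m.store(100)                     # M[100] = 14
m.add_reg_mem("A", 100)          # R[A] = 5 + 14
m.mem3("sub", 200, 100, 300)     # M[200] = 14 - 5
print(m.R["A"], m.M[100], m.M[200])
```

The register-only form never touches memory, while the three-address form reads two memory operands and writes a third. This is the trade-off between instruction length and operand fetch time that the section describes.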

URL:

https://www.sciencedirect.com/science/article/pii/B9781555582609500023

Advanced Encryption Standard

Tom St Denis , Simon Johnson , in Cryptography for Developers, 2007

x86 Performance

The AMD Opteron achieves a nice boost due to the addition of the eight new general-purpose registers. If we examine the GCC output for x86_64 and x86_32 platforms, we can see a nice difference between the two (Table 4.2).

Table 4.2. First Quarter of an AES Round

Both snippets accomplish (at least) the first MixColumns step of the first round in the loop. Note that the compiler has scheduled part of the second MixColumns during the first to achieve higher parallelism. Even though in Table 4.2 the x86_64 code looks longer, it executes faster, partially because it processes more of the second MixColumns in roughly the same time and makes good use of the extra registers.

From the x86_32 side, we can clearly see various spills to the stack (in bold). Each of those costs us three cycles (at a minimum) on the AMD processors (two cycles on most Intel processors). The 64-bit code was compiled to have zero stack spills during the main loop of rounds. The 32-bit code has about 15 stack spills during each round, which incurs a penalty of at least 45 cycles per round or 405 cycles over the course of the nine full rounds.

Of course, we do not see the full penalty of 405 cycles, as more than one opcode is being executed at the same time. The penalty is also masked by parallel loads that are also on the critical path (such as loads from the Te tables or round key). Those delays occur anyway, so the fact that we are also loading (or storing to) the stack at the same time does not add to the cycle count.

In either case, we can improve upon the code that GCC (4.1.1 in this case) emits. In the 64-bit code, we see a pairing of "shrq $24, %rdx" and "andl $255,%edx". The andl operation is not required since only the lower 32 bits of %rdx are guaranteed to have anything in them. This potentially saves up to 36 cycles over the course of nine rounds (depending on how the andl operation pairs up with other opcodes).
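The redundancy of the mask can be checked with integer arithmetic. This is a small Python sketch, not the authors' code: the helper names `shrq`, `andl`, and `and_is_redundant` are hypothetical and only model the two operations on a 64-bit register value.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def shrq(value, count):
    """Logical right shift of a 64-bit register."""
    return (value & MASK64) >> count

def andl(value, mask):
    """32-bit AND; the result is zero-extended into the 64-bit register."""
    return (value & MASK32) & mask

def and_is_redundant(rdx):
    """True when masking with 255 after the shift changes nothing."""
    shifted = shrq(rdx, 24)
    return andl(shifted, 255) == shifted

print(and_is_redundant(0xDEADBEEF))      # True: upper 32 bits are zero
print(and_is_redundant(0x1_0000_0000))   # False: bit 32 survives the shift
```

When the upper 32 bits of the register are zero, shifting right by 24 leaves at most eight significant bits, so the mask has nothing left to clear. The second call shows why the guarantee about the upper bits matters.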

With the 32-bit code, the double loads from (%esp) (lines 2 and 3) incur a needless three-cycle penalty. In the case of the AMD Athlon (and Opterons), the load store unit will short the load operation (in certain circumstances), but the load will always take at least three cycles. Changing the second load to "movl %edx,%ebx" means that we stall waiting for %edx, but the penalty is only one cycle, not three. That change alone will free up at most 9*2*4 = 72 cycles from the nine rounds.
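The cycle figures quoted in this section follow from simple products, reproduced here as a Python sketch. The variable names are illustrative. Reading the factor of 4 in 9*2*4 as four occurrences of the double load per round is an assumption, based on the table showing only the first quarter of a round.

```python
ROUNDS = 9                 # full AES rounds in the main loop
SPILLS_PER_ROUND = 15      # approximate stack spills per round (32-bit code)
SPILL_COST = 3             # minimum cycles per spill on the AMD processors

penalty_per_round = SPILLS_PER_ROUND * SPILL_COST
penalty_total = penalty_per_round * ROUNDS

LOAD_COST = 3              # cycles for the redundant second load
MOVE_COST = 1              # cycles for the register-to-register move
OCCURRENCES_PER_ROUND = 4  # assumption: once per quarter of a round

savings_total = ROUNDS * (LOAD_COST - MOVE_COST) * OCCURRENCES_PER_ROUND

print(penalty_per_round, penalty_total, savings_total)   # 45 405 72
```

Both totals are upper bounds, since the text notes that parallel execution hides part of each penalty.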

URL:

https://www.sciencedirect.com/science/article/pii/B9781597491044500078

Embedded Processor Architecture

Peter Barry , Patrick Crowley , in Modern Embedded Computing, 2012

Register Operands

Source and destination operands can be any of the following registers depending on the instruction being executed:

32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP)

16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, or BP)

8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, DL)

Segment registers

EFLAGS register

MMX

Control (CR0 through CR4)

System Table registers (such as the Interrupt Descriptor Table register)

Debug registers

Machine-specific registers

On RISC embedded processors, there are generally fewer limitations in the registers that can be used by instructions. IA-32 often reduces the registers that can be used as operands for certain instructions.

URL:

https://www.sciencedirect.com/science/article/pii/B9780123914903000059