Hamburger Helper Modification Thread (Souping Up Hamburger Helper) (Supercharged)

theres copy instructions

represents the loop body and thus does not include the back edge B4 -> B2. The order of processing these regions will be the only topological order: R2, R3, R4.

First, R2 has no predecessors within R6; remember that the edge B4 -> B2 goes outside R6. Thus, fR6,IN[B2] is the identity function,* and fR6,OUT[B2] is the transfer function for block B2 itself. The header of region R3 has one predecessor within R6, namely R2. The transfer function to its entry is simply the transfer function to the exit of B2, fR6,OUT[B2], which has already been computed. We compose this function with the transfer function of B3 within its own region to compute the transfer function to the exit of B3. Last, for the transfer function to the entry of R4, we must compute

fR6,IN[B4] = fR6,OUT[B2] ∧ fR6,OUT[B3]

because both B2 and B3 are predecessors of B4, the header of R4. This transfer function is composed with the transfer function fR4,OUT[B4] to get the desired function fR6,OUT[B4]. Notice, for example, that d3 is not killed in this transfer function, because the path B2 -> B4 does not redefine variable a.

Now, consider loop region R7. It contains only one subregion, R6, which represents its loop body. Since there is only one back edge, B4 -> B2, to the header of R6, the transfer function representing the execution of the loop body 0 or more times is just fR6,OUT[B4]: the gen set is {d4, d5, d6} and the kill set is ∅. There are two exits out of region R7, blocks B3 and B4. Thus, this transfer function is composed with each of the transfer functions of R6 to get the corresponding transfer functions of R7. Notice, for instance, how d6 is in the gen set for fR7,OUT[B3] because of paths like B2 -> B4 -> B2 -> B3, or even B2 -> B3 -> B4 -> B2 -> B3.

Finally, consider R8, the entire flow graph. Its subregions are R1, R7, and R5, which we shall consider in that topological order. As before, the transfer function fR8,IN[B1] is simply the identity function, and the transfer function fR8,OUT[B1] is just fR1,OUT[B1], which in turn is fB1. The header of R7, which is B2, has only one predecessor, B1, so the transfer function to its entry is simply the transfer function out of B1 in region R8. We compose fR8,OUT[B1] with the transfer functions to the exits of B3 and B4 within R7 to obtain their corresponding transfer functions within R8. Lastly, we consider R5. Its header, B5, has two predecessors within R8, namely B3 and B4. Therefore, we compute

fR8,IN[B5] = fR8,OUT[B3] ∧ fR8,OUT[B4].

Since the transfer function of block B5 is the identity function, fR8,OUT[B5] = fR8,IN[B5].

Step 3 computes the actual reaching definitions from the transfer functions. In step 3(a), IN[R8] = ∅ since there are no reaching definitions at the beginning of the program. Figure 9.52 shows how step 3(b) computes the rest of the data-flow values. The step starts with the subregions of R8. Since the transfer function from the start of R8 to the start of each of its subregions has been computed, a single application of the transfer function finds the data-flow value at the start of each subregion. We repeat the steps until we get the data-flow values of the leaf regions, which are simply the individual basic blocks. Note that the data-flow values shown in Figure 9.52 are exactly what we would get had we applied iterative data-flow analysis to the same flow graph, as must be the case.

*Strictly speaking, we mean fR6,IN[R2], but when a region like R2 is a single block, it is often clearer to use the block name rather than the region name in this context.
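Not from the book, but here is a minimal runnable sketch of the gen-kill algebra that passage keeps applying (all names are made up): composing transfer functions for "this region, then that block", taking the meet over two predecessors, and closing a loop body over 0 or more iterations.

```python
# Hypothetical gen-kill transfer functions for reaching definitions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Transfer:
    gen: frozenset   # definitions generated on the way through
    kill: frozenset  # definitions killed on the way through

    def apply(self, x):
        # f(x) = gen ∪ (x − kill)
        return self.gen | (x - self.kill)

def compose(f1, f2):
    """Run f1 then f2, e.g. 'to the exit of B2' followed by 'through B3'."""
    return Transfer(gen=f2.gen | (f1.gen - f2.kill),
                    kill=f1.kill | f2.kill)

def meet(f1, f2):
    """Merge paths from two predecessors; for reaching definitions the
    confluence operator is union, so only jointly killed defs stay killed."""
    return Transfer(gen=f1.gen | f2.gen,
                    kill=f1.kill & f2.kill)

def closure(f):
    """Loop body executed 0 or more times: the zero-iteration path kills
    nothing, so the kill set is empty, matching the excerpt's loop function."""
    return Transfer(gen=f.gen, kill=frozenset())
```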

Although removing one reference may render a large number of objects unreachable, the operation of recursively modifying reference counts can easily be deferred and performed piecemeal across time. Thus, reference counting is a particularly attractive algorithm when timing deadlines must be met, as well as for interactive applications where long, sudden pauses are unacceptable. Another advantage is that garbage is collected immediately, keeping space usage low.

just unreadable

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.4977&rep=rep1&type=pdf

Simple trace-based collectors do stop-the-world-style garbage collection, which may introduce long pauses into the execution of user programs. We can reduce the length of the pauses by performing garbage collection one part at a time. We can divide the work in time, by interleaving garbage collection with the mutation, or we can divide the work in space by collecting a subset of the garbage at a time. The former is known as incremental collection and the latter is known as partial collection.

An incremental collector breaks up the reachability analysis into smaller units, allowing the mutator to run between these execution units. The reachable set changes as the mutator executes, so incremental collection is complex. As we shall see in Section 7.7.1, finding a slightly conservative answer can make tracing more efficient.

The best known of the partial-collection algorithms is generational garbage collection; it partitions objects according to how long they have been allocated and collects the newly created objects more often because they tend to have a shorter lifetime. An alternative algorithm, the train algorithm, also collects a subset of garbage at a time, and is best applied to more mature objects. These two algorithms can be used together to create a partial collector that handles younger and older objects differently. We discuss the basic algorithm behind partial collection in Section 7.7.3, and then describe in more detail how the generational and train algorithms work.

Ideas from both incremental and partial collection can be adapted to create an algorithm that collects objects in parallel on a multiprocessor; see Section 7.8.1.
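To make "divide the work in time" concrete, here is a toy sketch (my names, not the book's) of one tracing increment running under a step budget; a real incremental collector also needs the write-barrier machinery of Section 7.7.2 to stay sound.

```python
# Hypothetical incremental tracing step: scan at most `budget` objects,
# then hand control back to the mutator. `children(obj)` (assumed helper)
# yields the objects that obj references.

def trace_increment(worklist, reached, children, budget=64):
    while worklist and budget > 0:
        obj = worklist.pop()
        if obj not in reached:
            reached.add(obj)
            worklist.extend(children(obj))
        budget -= 1
    return not worklist  # True once the (slightly conservative) trace is done
```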

i dont know anything about GC algorithms

We can implement write barriers in two ways. The first approach is to remember, during a mutation phase, all new references written into the Scanned objects. We can place all these references in a list; the size of the list is proportional to the number of write operations to Scanned objects, unless duplicates are removed from the list. Note that references on the list may later be overwritten themselves and potentially could be ignored. The second, more efficient approach is to remember the locations where the writes occur. We may remember them as a list of locations written, possibly with duplicates eliminated. Note it is not important that we pinpoint the exact locations written, as long as all the locations that have been written are rescanned. Thus, there are several techniques that allow us to remember less detail about exactly where the rewritten locations are.
Instead of remembering the exact address or the object and field that is written, we can remember just the objects that hold the written fields.
We can divide the address space into fixed-size blocks, known as cards, and use a bit array to remember the cards that have been written into.
We can choose to remember the pages that contain the written locations. We can simply protect the pages containing Scanned objects. Then, any writes into Scanned objects will be detected without executing any explicit instructions, because they will cause a protection violation, and the operating system will raise a program exception.
In general, by coarsening the granularity at which we remember the written locations, less storage is needed, at the expense of increasing the amount of rescanning performed. In the first scheme, all references in the modified objects will have to be rescanned, regardless of which reference was actually modified. In the last two schemes, all reachable objects in the modified cards or modified pages need to be rescanned at the end of the tracing process.
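As a rough illustration of the card scheme above (the card size and layout here are assumptions, not from the book): one dirty flag per fixed-size block, a write barrier that marks the card containing the written slot, and a scan over dirty cards at the end of tracing.

```python
# Toy card-marking write barrier. 512-byte cards are an assumed size.
CARD_SHIFT = 9                      # card = 512 bytes
HEAP_BYTES = 1 << 20                # 1 MiB toy heap

heap = bytearray(HEAP_BYTES)
card_table = bytearray(HEAP_BYTES >> CARD_SHIFT)  # one dirty byte per card

def write_ref(addr, byte_value):
    """Store, then coarsely remember *where* the write happened."""
    heap[addr] = byte_value
    card_table[addr >> CARD_SHIFT] = 1

def cards_to_rescan():
    """All reachable objects on these cards get rescanned after tracing."""
    return [c for c, dirty in enumerate(card_table) if dirty]
```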
Combining Incremental and Copying Techniques
The above methods are sufficient for mark-and-sweep garbage collection. Copying collection is slightly more complicated, because of its interaction with the mutator. Objects in the Scanned or Unscanned states have two addresses, one in the From semispace and one in the To semispace. As in Algorithm 7.16, we must keep a mapping from the old address of an object to its relocated address.

There are two choices for how we update the references. First, we can have the mutator make all the changes in the From space, and only at the end of garbage collection do we update all the pointers and copy all the contents over to the To space. Second, we can instead make changes to the representation in the To space. Whenever the mutator dereferences a pointer to the From space, the pointer is translated to a new location in the To space if one exists. All the pointers need to be translated to point to the To space in the end.
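A minimal sketch of the second choice, assuming the Algorithm 7.16-style mapping is just a table from old addresses to relocated ones; all names here are illustrative, not the book's.

```python
# Hypothetical From->To forwarding map maintained by the copying collector.
forwarding = {}          # old From-space address -> new To-space address

def load(ptr, memory):
    """Dereference: translate a From-space pointer to its To-space copy
    if the object has already been moved; otherwise use it as-is."""
    return memory[forwarding.get(ptr, ptr)]

def finish_collection(all_pointer_slots, memory):
    """At the end, every remaining pointer must be rewritten into To space."""
    for slot in all_pointer_slots:
        memory[slot] = forwarding.get(memory[slot], memory[slot])
```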
7.7.3 Partial-Collection Basics
The fundamental fact is that objects typically "die young." It has been found that usually between 80% and 98% of all newly allocated objects die within a few million instructions, or before another megabyte has been allocated. That is, objects often become unreachable before any garbage collection is invoked. Thus, it is quite cost effective to garbage collect new objects frequently.

Yet objects that survive a collection once are likely to survive many more collections. With the garbage collectors described so far, the same mature objects will be found to be reachable over and over again and, in the case of copying collectors, copied over and over again, in every round of garbage collection. Generational garbage collection works most frequently on the area of the heap that contains the youngest objects, so it tends to collect a lot of garbage for relatively little work. The train algorithm, on the other hand, does not spend a large proportion of time on young objects, but it does limit the pauses due to garbage collection. Thus, a good combination of strategies is to use generational collection for young objects, and once an object becomes
sufficiently mature, to "promote" it to a separate heap that is managed by the train algorithm.

We refer to the set of objects to be collected on one round of partial collection as the target set and the rest of the objects as the stable set. Ideally, a partial collector should reclaim all objects in the target set that are unreachable from the program's root set. However, doing so would require tracing all objects, which is what we try to avoid in the first place. Instead, partial collectors conservatively reclaim only those objects that cannot be reached through either the root set of the program or the stable set. Since some objects in the stable set may themselves be unreachable, it is possible that we shall treat as reachable some objects in the target set that really have no path from the root set.

We can adapt the garbage collectors described in Sections 7.6.1 and 7.6.4 to work in a partial manner by changing the definition of the "root set." Instead of referring to just the objects held in the registers, stack, and global variables, the root set now also includes all the objects in the stable set that point to objects in the target set. References from target objects to other target objects are traced as before to find all the reachable objects. We can ignore all pointers to stable objects, because these objects are all considered reachable in this round of partial collection.

To identify those stable objects that reference target objects, we can adopt techniques similar to those used in incremental garbage collection. In incremental collection, we need to remember all the writes of references from scanned objects to unreached objects during the tracing process. Here we need to remember all the writes of references from the stable objects to the target objects throughout the mutator's execution. Whenever the mutator stores into a stable object a reference to an object in the target set, we remember either the reference or the location of the write. We refer to the set of objects holding references from the stable to the target objects as the remembered set for this set of target objects. As discussed in Section 7.7.2, we can compress the representation of a remembered set by recording only the card or page in which the written object is found.

Partial garbage collectors are often implemented as copying garbage collectors. Noncopying collectors can also be implemented by using linked lists to keep track of the reachable objects. The "generational" scheme described below is an example of how copying may be combined with partial collection.
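Not the book's code, but a sketch of the conservative partial trace just described: the roots are extended with the remembered references, pointers into the stable set are treated as terminal, and whatever stays unreached in the target set is reclaimed. The arguments and the `children` helper are assumptions.

```python
# Hypothetical partial trace. `root_refs` are objects referenced from
# registers/stack/globals; `remembered_refs` are target-set objects
# referenced from stable objects; `target_set` is a set of objects;
# `children(obj)` yields the objects that obj references.

def partial_trace(root_refs, remembered_refs, target_set, children):
    reached = set()
    worklist = [o for o in (*root_refs, *remembered_refs) if o in target_set]
    while worklist:
        obj = worklist.pop()
        if obj in reached:
            continue
        reached.add(obj)
        for child in children(obj):
            if child in target_set:   # pointers to stable objects are ignored
                worklist.append(child)
    return target_set - reached       # conservatively reclaimable objects
```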
7.7.4 Generational Garbage Collection
Generational garbage collection is an effective way to exploit the property that most objects die young. The heap storage in generational garbage collection is separated into a series of partitions. We shall use the convention of numbering them 0, 1, 2, ..., n, with the lower-numbered partitions holding the younger objects. Objects are first created in partition 0. When this partition fills up, it is garbage collected, and its reachable objects are moved into partition 1. Now, with partition 0 empty again, we resume allocating new objects in that
partition. When partition 0 again fills, it is garbage collected and its reachable objects copied into partition 1, where they join the previously copied objects. This pattern repeats until partition 1 also fills up, at which point garbage collection is applied to partitions 0 and 1.

In general, each round of garbage collection is applied to all partitions numbered i or below, for some i; the proper i to choose is the highest-numbered partition that is currently full. Each time an object survives a collection (i.e., it is found to be reachable), it is promoted to the next higher partition from the one it occupies, until it reaches the oldest partition, the one numbered n.

Using the terminology introduced in Section 7.7.3, when partitions i and below are garbage collected, the partitions from 0 through i make up the target set, and all partitions above i comprise the stable set. To support finding root sets for all possible partial collections, we keep for each partition i a remembered set, consisting of all the objects in partitions above i that point to objects in set i. The root set for a partial collection invoked on set i includes the remembered sets for partition i and below.

In this scheme, all partitions below i are collected whenever we collect i. There are two reasons for this policy:

  1. Since younger generations contain more garbage and are collected more often anyway, we may as well collect them along with an older generation.
  2. Following this strategy, we need to remember only the references pointing from an older generation to a newer generation. That is, neither writes to objects in the youngest generation nor promoting objects to the next generation causes updates to any remembered set. If we were to collect a partition without a younger one, the younger generation would become part of the stable set, and we would have to remember references that point from younger to older generations as well.
In summary, this scheme collects younger generations more often, and collections of these generations are particularly cost effective, since "objects die young." Garbage collection of older generations takes more time, since it includes the collection of all the younger generations and contains proportionally less garbage. Nonetheless, older generations do need to be collected once in a while to remove unreachable objects. The oldest generation holds the most mature objects; its collection is expensive because it is equivalent to a full collection. That is, generational collectors occasionally require that the full tracing step be performed and therefore can introduce long pauses into a program's execution. An alternative for handling mature objects only is discussed next.
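A toy sketch of this policy (every helper name here is hypothetical): pick the highest-numbered full partition i, collect partitions 0 through i together, and promote survivors one partition up, capped at the oldest partition n.

```python
# Hypothetical generational heap: partitions[0] is youngest, partitions[n]
# oldest. `reachable_in(j)` (assumed helper) returns the survivors of
# partition j, traced from the root set plus the remembered sets for j
# and below.

def collect(partitions, reachable_in):
    n = len(partitions) - 1
    full = [j for j in range(n + 1) if partitions[j].is_full()]
    i = max(full, default=0)            # highest-numbered full partition
    # Partitions 0..i form the target set; everything above i is stable.
    for j in range(i, -1, -1):
        survivors = reachable_in(j)
        if j == n:
            partitions[j].reset_to(survivors)      # oldest: survivors stay put
        else:
            partitions[j + 1].add_all(survivors)   # promote one partition up
            partitions[j].clear()
```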

what do they have on register allocation

i see that the book was published in 2006
and iirc they proved that you could do graph coloring on registers in polynomial time in like 2004

"chordal coloring" or something to that sort

piler, along with relevant symbol table information, and produces as output a semantically equivalent target program, as shown in Fig. 8.1.

[Figure 8.1 (Position of code generator): source program -> Front End -> intermediate code -> Code Optimizer -> intermediate code -> Code Generator -> target program]

The requirements imposed on a code generator are severe. The target program must preserve the semantic meaning of the source program and be of high quality; that is, it must make effective use of the available resources of the target machine. Moreover, the code generator itself must run efficiently.

The challenge is that, mathematically, the problem of generating an optimal target program for a given source program is undecidable; many of the subproblems encountered in code generation, such as register allocation, are computationally intractable. In practice, we must be content with heuristic techniques that generate good, but not necessarily optimal, code. Fortunately, heuristics have matured enough that a carefully designed code generator can produce code that is several times faster than code produced by a naive one.

Compilers that need to produce efficient target programs include an optimization phase prior to code generation. The optimizer maps the IR into IR from which more efficient code can be generated. In general, the code-optimization and code-generation phases of a compiler, often referred to as the back end, may make multiple passes over the IR before generating the target program. Code optimization is discussed in detail in Chapter 9. The techniques presented in this chapter can be used whether or not an optimization phase occurs before code generation.

A code generator has three primary tasks: instruction selection, register allocation and assignment, and instruction ordering. The importance of these tasks is outlined in Section 8.1. Instruction selection involves choosing appropriate target-machine instructions to implement the IR statements. Register allocation and assignment involves deciding what values to keep in which registers. Instruction ordering involves deciding in what order to schedule the execution of instructions.

This chapter presents algorithms that code generators can use to translate the IR into a sequence of target language instructions for simple register machines. The algorithms will be illustrated by using the machine model in Section 8.2. Chapter 10 covers the problem of code generation for complex modern machines that support a great deal of parallelism within a single instruction.

After discussing the broad issues in the design of a code generator, we show what kind of target code a compiler needs to generate to support the abstractions embodied in a typical source language. In Section 8.3, we outline implementations of static and stack allocation of data areas, and show how names in the IR can be converted into addresses in the target code.

Many code generators partition IR instructions into "basic blocks," which consist of sequences of instructions that are always executed together. The partitioning of the IR into basic blocks is the subject of Section 8.4. The following section presents simple local transformations that can be used to transform basic blocks into modified basic blocks from which more efficient code can be generated. These transformations are a rudimentary form of code optimization, although the deeper theory of code optimization will not be taken up until Chapter 9. An example of a useful, local transformation is the discovery of common subexpressions at the level of intermediate code and the resultant replacement of arithmetic operations by simpler copy operations.

Section 8.6 presents a simple code-generation algorithm that generates code for each statement in turn, keeping operands in registers as long as possible. The output of this kind of code generator can be readily improved by peephole optimization techniques such as those discussed in Section 8.7. The remaining sections explore instruction selection and register allocation.
8.1 Issues in the Design of a Code Generator
While the details are dependent on the specifics of the intermediate representation, the target language, and the run-time system, tasks such as instruction selection, register allocation and assignment, and instruction ordering are encountered in the design of almost all code generators.

The most important criterion for a code generator is that it produce correct code. Correctness takes on special significance because of the number of special cases that a code generator might face. Given the premium on correctness, designing a code generator so it can be easily implemented, tested, and maintained is an important design goal.
8.1.1 Input to the Code Generator
The input to the code generator is the intermediate representation of the source program produced by the front end, along with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR.

The many choices for the IR include three-address representations such as quadruples, triples, and indirect triples; virtual machine representations such as bytecodes and stack-machine code; linear representations such as postfix notation; and graphical representations such as syntax trees and DAG's. Many of the algorithms in this chapter are couched in terms of the representations considered in Chapter 6: three-address code, trees, and DAG's. The techniques we discuss can be applied, however, to the other intermediate representations as well.

In this chapter, we assume that the front end has scanned, parsed, and translated the source program into a relatively low-level IR, so that the values of the names appearing in the IR can be represented by quantities that the target machine can directly manipulate, such as integers and floating-point numbers. We also assume that all syntactic and static semantic errors have been detected, that the necessary type checking has taken place, and that type-conversion operators have been inserted wherever necessary. The code generator can therefore proceed on the assumption that its input is free of these kinds of errors.
8.1.2 The Target Program
The instruction-set architecture of the target machine has a significant impact on the difficulty of constructing a good code generator that produces high-quality machine code. The most common target-machine architectures are RISC (reduced instruction set computer), CISC (complex instruction set computer), and stack based.

A RISC machine typically has many registers, three-address instructions, simple addressing modes, and a relatively simple instruction-set architecture. In contrast, a CISC machine typically has few registers, two-address instructions, a variety of addressing modes, several register classes, variable-length instructions, and instructions with side effects.

In a stack-based machine, operations are done by pushing operands onto a stack and then performing the operations on the operands at the top of the stack. To achieve high performance the top of the stack is typically kept in registers. Stack-based machines almost disappeared because it was felt that the stack organization was too limiting and required too many swap and copy operations.

However, stack-based architectures were revived with the introduction of the Java Virtual Machine (JVM). The JVM is a software interpreter for Java bytecodes, an intermediate language produced by Java compilers.
The interpreter provides software compatibility across multiple platforms, a major factor in the success of Java. To overcome the high performance penalty of interpretation, which can be on the order of a factor of 10, just-in-time (JIT) Java compilers have been created. These JIT compilers translate bytecodes during run time to the native hardware instruction set of the target machine. Another approach to improving Java performance is to build a compiler that compiles directly into the machine instructions of the target machine, bypassing the Java bytecodes.

UNLESS YOU ARE PAYING ME TO LEARN THIS I DONT CARE

When you're looking back at this time in your life you will laugh--just give it a few years.

seriously ian shut the fuck up stop baiting me into happiness and telling me not to kill myself without talking to you you're so fucking stupid

leave me alone

:crab:

I'm 29, I'm from Hungary, and I love gaming.
Wow that's a lot of info, I'll let it sink in.
If you want to know more just ask me in chat :D