1819-87R-SDSA: Practice: Data Flow Analysis with IBM WALA

Setup instructions

• We will use the IBM WALA analysis framework

• You will need Eclipse with the Plugin development tools (PDE) installed. Eclipse is available from www.eclipse.org and you can check out PDE from the Eclipse MarketPlace (Help->EclipseMarketPlace)

• Install IBM WALA, as explained at http://wala.sourceforge.net/wiki/index.php/UserGuide:Getting_Started

NB: for our exercise, it is enough to follow the UserGuide:Getting_Started when it says “To get started, we concentrate on a core subset of WALA for standard Java analysis: […] When you initially import the WALA projects into Eclipse, you may see some compile errors (due to missing external dependencies), but the above projects should compile, which is enough to get started."

• You have to set the two property files as explained in the guide

• Recall to set the correct Java Runtime in file com.ibm.wala.core/dat/wala.properties (e.g., in my case, mac os, it is: /System/Library/Frameworks/JavaVM.framework/Classes)

• Modify com.ibm.wala.core.tests/dat/wala.testdata.txt as follows

Primordial,Java,sodlib,none

Primordial,Java,jarFile,primordial.jar.model

Application,Java,classFile,dataflow/StaticDataflow.class

• To test your installation, run the JUnit test cases in project: com.ibm.wala.core.tests, class: com.ibm.wala.examples.analysis.dataflow.DataflowTest (If all 4 test cases succeed, it works!)

Tutorial and basic documentation

See the PLDI 2010 Tutorial slides

Exercise 1: Study and adapt the implementation of the intra procedural data flow analysis

project: com.ibm.wala.core.tests

1.1) Execute the analysis by running the test cases:

com.ibm.wala.examples.analysis.dataflow.DataflowTest.testIntraproc1()

com.ibm.wala.examples.analysis.dataflow.DataflowTest.testIntraproc2()

NB: The test cases will make the analysis run on the methods of the class dataflow.StaticDataflow in the project com.ibm.wala.testdata

1.2) Understand the implementation by studying the output (on the console) produced by the two test cases with reference to the program under analysis.

1.3) Adapt the implementation of the class com.ibm.wala.examples.analysis.dataflow.IntraprocReachingDefs such that the analysis will handle an initial definition for the static fields in the program, to account for the fact that the variable can have (previously assigned) initial values.

Suggestions:

- adapt method IntraprocReachingDefs.numberPutStatics to set data flow facts for the initial definitions of the fields

- adapt the transfer function (getNodeTransferFunction) to start propagating the initial definition from the entry node (node.isEntryBlock()) of the control flow graph.

Refer to this sample code to test your implementation:

public static void test2bis() {

if (f == 5) {

g = 2;

f = 4;

} else {

g = 7;

}

NB: You expect that the initial definition of field f can be propagated until the exit point through the else branch, while the initial definition of the field g will not propagate because it is killed in both branches.

Exercise 2: Study and experience with the implementation of the context-insensitive inter-procedural data flow analysis

project: com.ibm.wala.core.tests

2.1) Execute the analysis by running the test cases:

com.ibm.wala.examples.analysis.dataflow.DataflowTest.testContextInsensitive()

2.2) Understand the implementation by studying the output (on the console) produced by the test case with reference to the program under analysis.

NB: You may want to patch class ContextInsensitiveReachingDefs according to the attached file that restricts the output to the most interesting classes of the program under analysis.

2.3) [This task is left as assignment] Adapt the methods of the program under analysis to reproduce the imprecision scenarios showed in the initial part of the projected slides.

Exercise 3: Study the implementation of the context-sensitive inter-procedural data flow analysis

project: com.ibm.wala.core.tests

3.1) Execute the analysis by running the test cases:

com.ibm.wala.examples.analysis.dataflow.DataflowTest.testContextSensitive()

NB: You may want to patch classes ContextSensitiveReachingDefs and DataflowTest according to the attached files (<ContextSensitiveReachingDefs.java>, <DataflowTest.java>) that restrict the output to the most interesting classes of the program under analysis.

3.2) Understand the implementation by comparing the code of class ContextSensitiveReachingDefs with the presentation in the paper

— T Reps, S Horwitz, M Sagiv. Precise interprocedural dataflow analysis via graph reachability. POPL 1995 —

according to the following mapping guidelines:

- Architecture of the implementation in WALA: The *tabulation* algorithm (Ref. to the beginning of RHS.sec4) is provided as part of the WALA framework. It has to be injected with:

- supergraph, a graph of graphs. This is the inter-procedural call graph - Ref. to RHS.def2.1

The supegraph is extracted with WALA and just forwarded to the algorithm.

- reaching-def-domain, (also tabulation domain): maps distinct df-facts to distinct integers, to represent set D of RHS.def2.4. - Ref. to RHS.def2.4

The df-facts are collected by a visit of the flow graphs and added to the tabulation domain in method collectInitialSeeds.

- flow-functions: define how each df-fact propagates, is killed or is generated when passing through a edge.

Thus, each control flow edge is associated with a flow function: 2^D —> 2^D, where D is a finite set of data flow facts. - Ref. Example in RHS.sec3.1 and RHS.def3.2

The concept of *exploded* super graph is introduced at the beginning of RHS.sec3.2. and Figure 2

The flow-functions are set by overriding method IUnaryFlowFunction.getTargets that accounts for the df-facts that are generated and propagated (that is not killed) through each control flow edge, while the df-facts that refer to the field of the generated definition must be killed.

- initial df-facts, the (finite) set of df-facts (called path-edges) that will be propagated. The path-edges are collected by a visit of the flow graph in method collectInitialSeeds and injected in the analysis.

Ref. to the definition of path-edges in the initial part of RHS.sec4

3.3) Based on your understanding, try to unswear the following QUESTION:

The implementation of the tabulation algorithm in WALA differs from RHS.sec4. The difference is described as a comment in the implementation, where it is said that it is un “unbalanced” (or *partially balanced*) version of the tabulation algorithm. After reading the comment, and based on what is written the paper, can you understand this difference?

Ultime modifiche: giovedì, 5 settembre 2019, 12:04