Company Information

    The Software Revolution Inc

    The Software Revolution, Inc. (TSRI) provides automated legacy computer system modernization services to both government and industry. Our low-cost and low-risk services are derived from a highly advanced artificial intelligence-based software re-engineering toolset called JANUS.
    www.softwarerevolution.com
    The Software Revolution, Inc.
    11410 NE 122nd Way, Suite 304
    Kirkland, WA 98034

    Greg Tadlock
    VP of Sales
    gtadlock@softwarerevolution.com
    (425) 284-2782
    (425) 284-2785 (fax)
    Randy A. Doblar
    Chief Operating Officer
    rdoblar@softwarerevolution.com
    (425) 284-2800
    (425) 284-2785 (fax)
    Philip Newcomb
    CEO & Founder
    salesteam@softwarerevolution.com
    (425) 284-2790
    (425) 284-2785 (Fax)


TSRI - "Legacy System Modularization Analysis & Partitioning"

 

 

 

 

Legacy System Modularization

 

Analysis & Partitioning

 

 

The Software Revolution, Inc

 

 

 

Introduction

 

Many large companies are facing the problem of their legacy systems becoming an inhibitor to their business growth and change.  One way to avoid this problem is to simply get rid of the legacy system and replace it with a new system, which has been developed based on new requirements.  This option is rarely exercised in practice however, for many reasons, with the most significant reason being the low success rate and high cost associated with traditional manual replacement projects.  

 

Automated modernization, which employs artificial intelligence-based program transformation technology, has been proven to be a lower-risk and less expensive option than nearly any other approach.  While there is a growing body of evidence that automated transformation is the most dependable approach to legacy system modernization, and a vastly faster and cheaper one, many companies have adopted a more cautious approach to the modernization of their mission-critical systems.  A growing number of organizations undertake various kinds of structural improvements to their systems, such as modularization, before undertaking other approaches to modernization that are generally perceived as more radical.

 

Incremental Modernization

 

Increasingly, organizations approach legacy system modernization as a multi-phase, incremental activity.  Each phase is expected to produce clearly defined and demonstrable benefits that can be achieved within the established schedule and budget constraints, while complementing the objectives of the other projects in a series of interrelated projects.

 

Projects to improve the structure of the legacy system before more radical change is undertaken, such as replacing the language it is written in, the database it uses, or the platform (J2EE or .NET) it runs on, are becoming more commonplace.  Key reasons for this approach are that smaller projects can be undertaken with less schedule, cost and technical risk than larger projects; their expenditures can be better contained; and their benefits can be more carefully articulated and measured.

 

Employing an automated approach to legacy system modernization, however, allows the entire system to be addressed as a single project at low cost and low technical risk, over a relatively short project schedule, while still maintaining a very high Return-On-Investment (ROI).

 

Legacy Modularization Definition

 

Modularization is a form of program restructuring that replaces higher-complexity programs with collections of lower-complexity modules. 

 

Human factors studies have consistently shown that higher-complexity programs cost more to maintain than lower-complexity programs.  Program restructurings, such as modularization, are generally undertaken only when an information system has grown to such complexity that its maintenance has become a major consumer of organizational resources.  While the number of modules is greater than the original number of programs, the purpose of each module is simpler and more precisely documented, and its function and data implementation are less complex.  Some additional code is introduced to implement the interfaces between modules and the module-specific data definitions.  This potential increase in code quantity is more than offset by reducing the complexity of individual modules below a target complexity threshold, which greatly reduces the effort associated with analysis, code modification, debugging and testing during maintenance.

 

Organizations may undertake modularization as an alternative to, or in concert with, more radical forms of modernization such as system rewrite, COTS replacement, and language transformation, or with other forms of program restructuring such as functional-area partitioning, data-centric modularization, slice-based segmentation, and business-rule extraction, which will not be discussed in this paper.

 

Modularization may be undertaken as a relatively “safe” form of restructuring activity to prepare a major legacy system for subsequent modernization phases.  Modularization may be undertaken in conjunction with other forms of re-factoring.  Undertaken in concert with a project to capture domain knowledge from the existing programming staff in the form of human prepared module descriptions, modularization may accelerate the capture and improve the accuracy of business models and business process descriptions for the legacy system.

 

Legacy Re-factoring

 

Organizations wishing to improve the structure of their legacy code may consider various other kinds of  “safe” re-factoring operations.  These will be described in greater detail in sections below.   But to summarize, TSRI can provide automated services for legacy language re-factoring that include: Dead Code Removal, Dead Data Removal, Redundant Code Removal, Redundant Data Removal, Duplicate Code Consolidation, Duplicate Data Consolidation, Similar Code Consolidation, Similar Data Consolidation, Data Name Standardization, Program Name Standardization, Copy Book Name Standardization, Procedure Name Standardization.  

 

These forms of legacy re-factoring can provide significant advantages to the organization employing them by reducing the complexity and improving the maintainability of the legacy software to which they are applied.  The main objective of this paper is to discuss modularization, so these services will be described only briefly within the context of this paper.

 

Automated Modularization vs. Manual Modularization

 

When supported by automated tools, legacy system modularization has been shown to be a low-risk, highly disciplined, and inexpensive restructuring activity, with rapid return on investment through reduced maintenance costs and continued return on investment during subsequent legacy system modernization activities.   Without automation, modularization is a very time-consuming, risk-prone and difficult undertaking.  Manual modularization is a complex task with much hand checking needed to validate the results.  Typically it takes several person-months for a team of programmers to modularize a single 15,000-LOC unit of code, while higher-complexity programs of 40,000 lines of code or more can require 10 months to a year for a team of skilled programmers to modularize by hand.

 

Modularization Business Case

 

The business case for modularization is typically based upon the anticipated reduction in maintenance costs, the ease of reuse of well-documented modules, and productivity gains during major upgrades.  Modularization may be undertaken to reduce the costs of in-house maintenance, to improve IT department productivity, and to avoid the risks and uncertainty associated with outsourcing of internal maintenance activities.  The benefits achieved by modularizing a legacy system are preserved even if the legacy system is later modernized into a new language, later mined for business rules, or integrated with 3rd party COTS packages. 

 

Modularization can be done independently of, or as a part of a broader set of modernization objectives.  None of the structural improvements achieved by modularization are sacrificed, and the final product may ultimately be improved by first modularizing the legacy system, while keeping it in the legacy language, and then using in-house staff with domain expertise to augment machine-generated functional-level documentation with business process model descriptions for each of the resultant modules.  If undertaken before commencing other modernization activities, the knowledge captured during modularization can provide insight that may be used to guide subsequent modernization phases.

 

While more modern languages unquestionably provide greater flexibility in their support for many other forms of program restructuring, functionally-centric modularization, which is the main focus of this paper, is well supported by most dialects of COBOL and can be undertaken for COBOL with a high level of confidence that its benefits and business case can be demonstrated. 

 

Incorporation of Legacy Modularization into the Modernization Process

 

Some TSRI customers may wish to prepare a major legacy system for subsequent modernization phases by first Modularizing and/or Re-factoring the legacy system before transforming it.  A hybrid six-phase variant of our standard four-phase modernization process, which incorporates Legacy Modularization and Legacy Re-factoring, is shown below.

 

Standard Modernization Process     Hybrid Modernization Process

Assessment                         Assessment
(none)                             Legacy Modularization
(none)                             Legacy Re-factoring
Transformation                     Transformation
Re-factoring                       Re-factoring
Web-Enablement                     Web-Enablement

 

This standard process may be modified to include modularization as shown below.

 

Assessment:  The Assessment process captures the legacy system software’s “As-Is” state, derives the existing system’s design, and identifies transformation baseline metrics to guide the subsequent transformation, re-factoring and web-enablement processes.

 

Legacy Modularization:  Legacy Modularization is a relatively “safe” form of restructuring activity that replaces higher-complexity programs with collections of lower-complexity modules.  It can also accelerate the capture and improve the accuracy of business models and business process descriptions for the legacy system.

 

Transformation:  The Transformation process automatically rewrites legacy software into the target language, couples the target language to relational or object-oriented databases, transforms the customer’s databases, and directs the application to new target platforms.

 

Re-factoring:  Re-factoring or re-engineering the resulting target language is a two-step process to improve a system’s maintainability and potentially its performance.  TSRI’s re-factoring steps are as follows:

 

·        Automatic Re-factoring: The identification and removal of dead and redundant code by TSRI to improve maintainability without changing the system’s functionality.

·        Semi-Automatic Re-factoring: The identification of situations within the code where Domain Experts could opt to make engineering changes to improve system maintainability without impacting system functionality.

Web-Enablement:  The Web-Enablement process facilitates legacy system migration to a web environment by transforming the code into Java for the J2SE/J2EE frameworks or into C# for the Microsoft .NET framework.

 

Automated Modularization Case Study

 

Industrial case studies have shown that automated modularization produces a very high level of ROI compared to more manual methods, with very high levels of automation and even full automation.  A landmark paper, “Automating the Modularization of Large (COBOL) Programs: Application of an Enabling Technology For Reengineering”, presented at the 1st Working Conference on Reverse Engineering, May 21-23, 1993[1] by the Software Revolution founder, Philip Newcomb, documented the efficacy of automated reengineering technology for same-language software modularization.

 

This much-cited industrial research paper was later republished in a slightly different form in the CACM in 1994.  It documented the development and application of a research-prototype modularization tool that was applied to several monolithic COBOL programs, ranging in size from 10,000 to 40,000 lines of code, from the Boeing Payroll System, a 600,000-line mainframe COBOL application.  Developed in a little over 4-1/2 person-months, the research prototype achieved in 4 hours, on semi-automated modularization tasks, results nearly identical to those achieved by a team of professional programmers in 10 months.    Direct comparison of machine and human performance on modularization tasks was possible because Boeing had kept detailed project performance data during the manual modularization of 18 large Boeing Payroll System programs.

 

Human factors studies, which are not publicly available, documented the productivity improvements that were achieved by maintenance programmers using the COBOL programs following modularization.   This industrial research provided compelling justification for the value of automated software transformation, and contributed to Boeing’s decision to invest more than $10 million in automated software analysis and transformation research over a 6-year period from 1994 through 1998.

 

Current Status of Modularization

 

At the time Boeing’s modularization pilot was conducted in 1991, Viasoft sold a mainframe tool suite, Renaisance™, that included a modularization option.  In some trials Renaisance™ expanded the number of lines of code as much as 50-fold over the original.  Boeing investigated Renaisance™ for the Payroll Pilot but deemed it unsuitable because of this code bloat.

 

Andersen Consulting’s Center for Strategic Technology Research laboratory (CSTaR) investigated techniques for program segmentation, a generalized form of code extraction and repackaging, as a part of its SRE/COBOL™ workbench [2], but SRE/COBOL™ was never commercialized.  CSTaR’s SRE/COBOL™ was based on Refine/COBOL™, a COBOL reverse-engineering tool that originated from collaborative research by Boeing and Reasoning while TSRI’s founder was at Boeing from 1989 through 1994.  The modularization prototype was built using a precursor to Refine/COBOL.  In 2000 Reasoning discontinued support for Refine/COBOL.

 

Neither Boeing nor Reasoning commercialized the modularization research prototype, and the technological foundation upon which the modularization prototype was developed has since been superseded by more advanced technologies.  Commencing in 2003, TSRI replaced Reasoning’s Lisp-based Refine™ technology with a C++-based technology that provides superior support for TSRI’s automated transformation services.  Those services have focused primarily upon automated transformation of legacy languages into modern languages.

 

We have found more than 100 citations to our 1993 paper, but we are not aware of any commercial products that have achieved results comparable to those we reported in 1993.

 

Same-Language Modularization vs. Language Transformation

 

Fully automated transformation from legacy languages into modern languages is, in general, more technologically difficult than is legacy system restructuring.  Both involve application of parsing and transformation technologies.  The chief challenge of legacy to modern language transformation is the achievement of valid syntactic and semantic mappings, while the primary challenge of same-language modularization is the preservation of functional equivalence while composition improvements are introduced to reduce control-flow complexity of the program.   Minimizing the overhead associated with the introduction of inter-module interfaces, avoiding code duplication and preserving state or working storage are the major challenges.

 

Functionally Centric Modularization

 

Functionally-centric Modularization is a form of program restructuring that produces modularized programs in much the same style and format as would be produced if a team of programmers were to systematically recompose a complex program into smaller, more manageable components.  Modularization seeks a natural repartitioning of the system that preserves the functionality of the original system while improving its maintainability. The new modules must be constructed in such a way that:

 

1.      Data declarations are properly assigned to linkage and storage sections, with their internal structure and alignment preserved,

2.      CALLs and entry points for modules are created with correct parameters,

3.      There are no cross-module GOTOs or unmatched error-handling constructs,

4.      Array index registers are properly treated, and

5.      Modularization steps are documented to facilitate future maintenance.

 

Modularization involves a broad range of non-localized changes across thousands of lines of code.  These changes must preserve the functionality of the original system while improving its maintainability. 

 

Modularization Automation Requirements

 

Automation introduced to support the modularization process should:

 

1.      Support and facilitate the human decision-making or human review process to assure that the partitioning makes sense from a human perspective,

2.      Produce correctly modularized programs in the same style and format as would be produced manually,

3.      Support unit and integration testing of the modularized system,

4.      Integrate with the existing mainframe environment,

5.      Not disrupt or be an impediment to the conduct of on-going maintenance tasks.

 

The Modularization Process

 

The goal of the modularization process is to break a large program into several programs of manageable size by repeatedly selecting paragraphs in an existing program, called the “source”, and transferring them and related code into another program, called the “target”.   Some aspects of this modularization are necessary for semantically correct modularization, and others are required by related maintenance tasks, independently of whether the modularization process is applied manually or automated.

 

This section first gives an overview of the modularization process, and then discusses two key technical problems encountered in this process: creating CALL parameters, and creating the modularized program’s data division.

 

The process of selecting paragraphs and transferring them to a target program is called a “modularization step”.  The input to a modularization step is a COBOL program (the source) and the set of selected paragraphs in the source.  The outputs of a modularization step are the target program and the modified source program.

 

Each modularization step removes some paragraphs from the source.   The modified source can then serve as input to the next step.  The steps are repeated until the source is reduced to manageable size and the paragraphs in the original source are distributed among a collection of smaller programs.   A COBOL program with tens of thousands of lines of code may require many modularization steps.
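
The iteration described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the prototype's implementation: the Program class, the line-count model of a paragraph, and the largest_paragraph selection policy are all hypothetical, and the sketch ignores the CALL stubs and data divisions that a real step must also generate.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    """Hypothetical model: a COBOL program as paragraph names mapped to line counts."""
    name: str
    paragraphs: dict = field(default_factory=dict)

    @property
    def size(self) -> int:
        return sum(self.paragraphs.values())


def modularization_step(source, selected, target_name):
    """One step: return (target, modified source) for the selected paragraphs."""
    target = Program(target_name, {p: source.paragraphs[p] for p in selected})
    modified = Program(
        source.name,
        {p: n for p, n in source.paragraphs.items() if p not in selected},
    )
    return target, modified


def modularize(source, max_size, select):
    """Repeat steps until the source has at most max_size lines."""
    modules = []
    while source.size > max_size:
        selected = select(source)
        # Stop if the selector cannot make progress.
        if not selected or selected >= set(source.paragraphs):
            break
        target, source = modularization_step(
            source, selected, f"{source.name}-M{len(modules) + 1}"
        )
        modules.append(target)
    return source, modules


def largest_paragraph(program):
    """Placeholder selection policy (an assumption): pick the biggest paragraph."""
    return {max(program.paragraphs, key=program.paragraphs.get)}


payroll = Program("PAYROLL", {"A": 500, "B": 300, "C": 200, "D": 100})
reduced, modules = modularize(payroll, max_size=350, select=largest_paragraph)
print(reduced.size, [m.name for m in modules])
```

The modified source returned by one step is fed back in as the input of the next, which mirrors the repetition described in the text.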

 

The modularization process includes analyzing the source to make a suitable selection of paragraphs for each modularization step.  Since this selection analysis is relatively straightforward compared to implementing the modularization steps themselves, it was not implemented in the original modularization prototype.

 

The commercial grade modularization tool will support automated or semi-automated paragraph selection, using a combination of metrics to measure the quality of the source and target code.  McCabe’s complexity metrics will be used for measuring control complexity, and line-of-code metrics for measuring code size.  Additional inputs to the selection of paragraphs for each modularization step will be the number of paragraphs, call depth, call fan-in and fan-out, the number of parameters in linkage sections, the number of data elements in working storage sections, the composition of language constructs, and the subject area of variables and paragraphs.  Following each modularization step these metrics will be regenerated for each target module and for the source program, for use as input to the next modularization step.
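
Two of the metrics named above are easy to state precisely.  McCabe's cyclomatic complexity of a control-flow graph is V(G) = E - N + 2P (edges, nodes, connected components), and fan-in and fan-out count the callers and callees of a paragraph.  The Python sketch below is illustrative only; the graph encodings and paragraph names are assumptions, not the tool's data model.

```python
def cyclomatic_complexity(nodes, edges, components=1):
    """McCabe's V(G) = E - N + 2P for a control-flow graph."""
    return len(edges) - len(nodes) + 2 * components


def fan_in_out(call_graph):
    """Return (fan_in, fan_out) dictionaries for a caller -> callees mapping."""
    fan_out = {p: len(set(callees)) for p, callees in call_graph.items()}
    fan_in = {p: 0 for p in call_graph}
    for callees in call_graph.values():
        for callee in set(callees):
            fan_in[callee] = fan_in.get(callee, 0) + 1
    return fan_in, fan_out


# Control-flow graph of a single IF/ELSE: one decision, so V(G) should be 2.
cfg_nodes = {"entry", "then", "else", "exit"}
cfg_edges = {("entry", "then"), ("entry", "else"), ("then", "exit"), ("else", "exit")}
complexity = cyclomatic_complexity(cfg_nodes, cfg_edges)

# Hypothetical PERFORM call graph.
perform_graph = {
    "MAIN": ["READ-REC", "CALC-PAY"],
    "CALC-PAY": ["CALC-TAX"],
    "READ-REC": [],
    "CALC-TAX": [],
}
fan_in, fan_out = fan_in_out(perform_graph)
print(complexity, fan_in, fan_out)
```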

 

In the modularization prototype, during each modularization step each selected paragraph par is replaced by a new paragraph Z-CALL-target-entry, where entry is the name of an entry created for par in the target program target.  This Z-CALL-target-entry paragraph contains a CALL entry statement that invokes par in the target program.  Also, each PERFORM par statement in the source program is replaced by a PERFORM Z-CALL-target-entry statement.   This approach to CALL introduction is called segregated CALL introduction.  The modularization prototype used CALL segregation to encapsulate the CALL statements for subsequent manual review and to isolate CALL statements with long sequences of parameters from the main body of code.

 

CALL segregation allowed analysts already familiar with the legacy systems to continue using the code without incurring the “mental” overhead of reading CALL USING parameter sequences; it isolated the complexities of parameter sequences in a separate section of the program; it made it easy to distinguish CALLs introduced by the modularization process from CALLs that were already present; and it allowed programmers scanning the source code to instantly recognize CALLs that were introduced by modularization.
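
A minimal Python sketch of segregated CALL introduction follows.  It assumes a toy representation in which a program is a mapping from paragraph names to lists of statement strings; the function name, the paragraph names and the parameter list are hypothetical, and PERFORM ... THRU ranges are not handled.

```python
import re


def introduce_segregated_call(paragraphs, par, target, entry, params):
    """Replace paragraph `par` by a Z-CALL stub and redirect its PERFORMs."""
    stub = f"Z-CALL-{target}-{entry}"
    # Match "PERFORM par" but not a longer name such as "par-YTD".
    perform = re.compile(rf"\bPERFORM\s+{re.escape(par)}(?![\w-])")
    rewritten = {}
    for name, statements in paragraphs.items():
        if name == par:
            using = f" USING {' '.join(params)}" if params else ""
            rewritten[stub] = [f"CALL '{entry}'{using}"]
        else:
            rewritten[name] = [perform.sub(f"PERFORM {stub}", s) for s in statements]
    return rewritten


source = {
    "MAIN": ["PERFORM READ-REC", "PERFORM CALC-TAX", "PERFORM CALC-TAX-YTD"],
    "READ-REC": ["READ PAY-FILE"],
    "CALC-TAX": ["COMPUTE TAX = GROSS * RATE"],
    "CALC-TAX-YTD": ["ADD TAX TO TAX-YTD"],
}
modified = introduce_segregated_call(
    source, "CALC-TAX", "PAYMOD1", "CALCTAX", ["GROSS", "RATE", "TAX"]
)
for name, statements in modified.items():
    print(name, statements)
```

Because every introduced CALL lives in a paragraph whose name starts with Z-CALL, a maintainer can find all of them with a single search, which is the recognizability benefit described above.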

 

The commercial grade modularization tool will simplify the above process by permitting either segregated CALL target-entry introduction, or direct CALL target-entry introduction at the point in the code where the PERFORM statement was replaced.  The user will choose between direct and segregated CALL introduction at the time of paragraph selection; in the absence of user direction, the tool will decide using default or user-defined parameter length restrictions.

 

After completion of a modularization step the target program (the new module) has:

 

1.      A procedure division that contains:

a.       The selected paragraphs, plus all paragraphs in the PERFORM call graph below the selected paragraphs, and

b.      Appropriate entries for the paragraphs that are called with direct or segregated CALLS from the modified source program.

2.      A data division that contains:

a.       A linkage section for data elements that are entry parameters or fields of entry parameters, and

b.      A working storage section for data elements that are local to the target program.

3.      Identification and environment divisions the same as in the source except for a new program id and the addition of comments documenting the modularization step.

 

The modularization process requires all paragraphs in the PERFORM call graph below the selected paragraphs to be transferred to the target program because COBOL does not allow calls from a called program to the calling program.
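
Computing the set of paragraphs that must move is a reachability problem on the PERFORM call graph.  The sketch below is a plain depth-first traversal in Python; the graph encoding and paragraph names are assumptions made for illustration.

```python
def paragraphs_to_transfer(perform_graph, selected):
    """Return the selected paragraphs plus everything they PERFORM, transitively."""
    to_move, stack = set(), list(selected)
    while stack:
        par = stack.pop()
        if par not in to_move:
            to_move.add(par)
            stack.extend(perform_graph.get(par, ()))
    return to_move


# Hypothetical PERFORM call graph: paragraph -> paragraphs it performs.
graph = {
    "MAIN": ["READ-REC", "CALC-PAY"],
    "CALC-PAY": ["CALC-TAX", "ROUND-AMT"],
    "CALC-TAX": ["ROUND-AMT"],
    "READ-REC": [],
    "ROUND-AMT": [],
}
moved = paragraphs_to_transfer(graph, {"CALC-PAY"})
print(sorted(moved))
```

In this example ROUND-AMT is reachable only from paragraphs that move, so the source loses nothing it still needs; a paragraph performed from both sides of the partition would need separate treatment that this sketch does not attempt.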

 

The modified source program has:

 

1.      A procedure division with:

a.       The transferred paragraphs removed, and

b.      Calls to the entries in the target programs;

2.      Identification, environment and data divisions the same as in the original source.

 

One of the key technical problems in implementing a modularization step is determining the parameters of the CALL statement created for a PERFORM.  The PERFORM statement does not have parameters.  Thus data flow must be analyzed to determine the CALL parameters.  Data flow analysis of a single compilation unit may be insufficient to determine the complete set of parameters for each CALL; other analysis techniques may be needed.

 

The next two sections focus on this problem and on the related problem of creating the data division of the target program.  Other modifications of the source and creation of other divisions of the target are more straightforward and are not treated in this paper.

 

Generating CALL Parameters

 

Since PERFORM statements do not have parameters, the parameters for the CALLs that are generated for them must be determined by analyzing the data flow in the source program.  Some data elements referenced in a transferred paragraph are determined to be input or output parameters of the PERFORM of that paragraph.  The rest are local to that paragraph.

 

An input parameter of a PERFORM statement is a data element that is set prior to the PERFORM statement and used in the performed paragraph.  An output parameter is a data element that is set in the performed paragraph and used following the PERFORM statement.
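
These two definitions can be made concrete for a single control-flow path.  The Python sketch below is a simplification under stated assumptions: each statement is summarized by the data elements it sets and uses, only one path through the PERFORM is considered, and aliasing is ignored.  The Stmt type and the data names are hypothetical.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    """Data elements a statement sets and uses (hypothetical summary form)."""
    sets: frozenset
    uses: frozenset


def stmt(sets="", uses=""):
    """Build a Stmt from space-separated data names."""
    return Stmt(frozenset(sets.split()), frozenset(uses.split()))


def union(stmts, attr):
    """Collect all names found in the given attribute across statements."""
    names = set()
    for s in stmts:
        names |= getattr(s, attr)
    return names


def call_parameters(prior, performed, following):
    """Classify data elements of the performed paragraph on one path."""
    set_inside = union(performed, "sets")
    used_inside = union(performed, "uses")
    inputs = union(prior, "sets") & used_inside        # set prior, used inside
    outputs = set_inside & union(following, "uses")    # set inside, used following
    local = (set_inside | used_inside) - inputs - outputs
    return inputs, outputs, local


prior = [stmt(sets="GROSS"), stmt(sets="RATE")]
performed = [stmt(sets="TAX", uses="GROSS RATE"), stmt(sets="WS-TEMP", uses="TAX")]
following = [stmt(sets="NET", uses="GROSS TAX")]
inputs, outputs, local = call_parameters(prior, performed, following)
print(sorted(inputs), sorted(outputs), sorted(local))
```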

 

“Prior” and “following” are with reference to a control flow path through the PERFORM statement.  Manual determination of the possible control flow paths through a particular point in a large COBOL program is a difficult task.  Data flow analysis must determine the prior and following references between the variables referenced within the paragraphs to be included in the target program and the variables referenced within the paragraphs remaining in the source program.  This is a computationally intensive operation that consumes large amounts of machine resources.

 

During manual modularization the analysis of a single variable’s linkage requirements required hours of human analysis, and selection of one partition required weeks of human analysis.

 

Determination of data flow within paths is complicated by the existence of many aliases in most COBOL programs.  For example, if a data element A is set prior to a PERFORM, and the performed paragraph uses B, which is a field of A, then A or B should be included as a parameter in the CALL generated for the PERFORM.   A search of control flow paths may not reveal any place where B is explicitly set; the maintainer must recognize that A is an alias of B, and search for the statement that sets A.  The alias problem is non-trivial; industry analyses show that in a typical COBOL program, lowest-level data elements have on average 20 aliases.[3]
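
The group-and-field case in the example above can be sketched as an overlap test on the data hierarchy.  This Python fragment is illustrative only: it treats two data elements as aliases when one is a superior of the other, and it does not model other alias sources such as REDEFINES.  The record layout is hypothetical.

```python
def superiors(item, parent):
    """Return the chain of superior (containing) items of a data element."""
    chain = []
    while item in parent:
        item = parent[item]
        chain.append(item)
    return chain


def overlaps(a, b, parent):
    """True if a and b share storage because one contains the other."""
    return a == b or a in superiors(b, parent) or b in superiors(a, parent)


def aliased_inputs(set_prior, used_inside, parent):
    """Pairs (set element, used element) that force a CALL parameter."""
    return {(s, u) for s in set_prior for u in used_inside if overlaps(s, u, parent)}


# Hypothetical layout: EMP-REC contains EMP-NAME (which contains EMP-LAST) and EMP-PAY.
parent = {"EMP-NAME": "EMP-REC", "EMP-LAST": "EMP-NAME", "EMP-PAY": "EMP-REC"}
pairs = aliased_inputs({"EMP-REC"}, {"EMP-LAST", "WS-COUNT"}, parent)
print(pairs)
```

A name-based search for statements that set EMP-LAST would find nothing here, yet the pair (EMP-REC, EMP-LAST) shows that a parameter is required.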

 

Determination of data flow between programs that CALL each other is another complication.  If a data element A in program U is passed through a CALL statement to another program V, then the data flow of A within program V must also be analyzed in order to correctly partition program U.

 

Data-flow analysis of the source program by itself is not always sufficient to generate all the required CALL parameters.  The control flow graph for a program may have no path from a set of a data element A to a use of A.  However, if A is in the linkage section, its value may persist between two CALLs, the first of which causes the set and the second the use.   Maintainers apply their knowledge of the COBOL application as a whole and their knowledge of COBOL programming practice to help identify such parameters.

 

Generating The Target Program Data Division

 

The data division has both linkage and working storage sections.  Declarations for data elements identified as parameters are included in the target program linkage section.  Declarations for other data elements that are referenced in the target program are included in the target program’s working storage section.  The modularization process includes rules for determining how the data declarations in the target program are generated from the data declarations in the source program.  Here are some of the rules:

 

1.      If a data element in the source data declarations has a superior item with an OCCURS clause, the superior item is also included in the target program data division.

2.      If a data element in the target has an OCCURS clause (including one added by the previous rule), its immediate superior’s data declaration is included in the target program data division.

3.      If a data element in the target has a REDEFINES clause, the data declaration for the element it redefines is also included in the target program data division.

4.      If a condition-name (88-level data element) is in the target program, its conditional data element is also included in the target program data division.
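
The four rules above amount to a closure computation over the source declarations.  The Python sketch below applies them with a worklist until no new declarations are added.  The Decl record and the sample layout are assumptions made for illustration, and the sketch covers only these four rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decl:
    """Hypothetical summary of one data declaration."""
    name: str
    parent: Optional[str] = None        # immediate superior item
    occurs: bool = False                # has an OCCURS clause
    redefines: Optional[str] = None     # item named in a REDEFINES clause
    condition_of: Optional[str] = None  # for 88-levels: the conditional data element


def required_declarations(decls, referenced):
    """Close the referenced names under rules 1 to 4."""
    required, work = set(), list(referenced)
    while work:
        name = work.pop()
        if name in required:
            continue
        required.add(name)
        d = decls[name]
        sup = d.parent
        while sup is not None:          # Rule 1: superiors with OCCURS
            if decls[sup].occurs:
                work.append(sup)
            sup = decls[sup].parent
        if d.occurs and d.parent:       # Rule 2: immediate superior of an OCCURS item
            work.append(d.parent)
        if d.redefines:                 # Rule 3: the redefined item
            work.append(d.redefines)
        if d.condition_of:              # Rule 4: conditional data element of an 88-level
            work.append(d.condition_of)
    return required


decls = {d.name: d for d in [
    Decl("PAY-REC"),
    Decl("PAY-TABLE", parent="PAY-REC"),
    Decl("PAY-ENTRY", parent="PAY-TABLE", occurs=True),
    Decl("PAY-AMT", parent="PAY-ENTRY"),
    Decl("PAY-STATUS", parent="PAY-REC"),
    Decl("PAY-CLOSED", parent="PAY-STATUS", condition_of="PAY-STATUS"),
    Decl("PAY-DATE", parent="PAY-REC"),
    Decl("PAY-DATE-X", parent="PAY-REC", redefines="PAY-DATE"),
]}
table_part = required_declarations(decls, {"PAY-AMT"})
other_part = required_declarations(decls, {"PAY-CLOSED", "PAY-DATE-X"})
print(sorted(table_part), sorted(other_part))
```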

Technical Approach

 

This section describes the technical approach that will be used to build the commercial grade modularization tool.  The key consideration for the technical approach is that it enable the automation of modularization processes that have already been proven, through prior industrial research, to be of value within an industrial environment.  The code produced must be in the same style and format as would be produced through a highly disciplined manual process, and the target code must integrate with the mainframe environment.  Also, the process must be extensible to facilitate automation of additional, related maintenance or modernization activities that could take place following the modularization step.

 

Enabling Technology

 

A new enabling technology for reengineering will be used to build the commercial grade COBOL modularization tool.  The central technical ideas underlying the technology are:

 

1.      Represent software in the form of abstract syntax tree models in an object-oriented database.

2.      Use high-level specification languages to operate upon code captured in this form.

 

Janus is a software-reengineering tools framework developed by The Software Revolution for this purpose.  JANUS provides capabilities for assessment, transformation, re-factoring, and web-enablement of legacy systems into modern languages and architectural frameworks.   The Janus Tools provide unprecedented analysis and modeling capabilities for legacy languages and need as little as 8 to 12 weeks to be modified for a typical source or target language.

 

 

The Janus Tool framework provides high-level specification languages and model support for:

 

  1. Abstract Syntax Tree Model Construction
  2. Software Model Analysis
  3. Language Neutral Modeling
  4. Software Model Transformation

 

These facets of the Janus Software Reengineering environment will be described in greater detail below.

 

Abstract Syntax Tree Model Construction

 

Abstract Syntax Tree Model Construction for a software language involves creation of a grammar system for the language.  The grammar system is defined in a syntax file.  The syntax file contains the extended BNF specification of the grammar rules that are used to parse or print the language, mapping between its syntax-free abstract model form and its syntactic form.  An abstract syntax tree (AST) is a set of syntax-free model structures that represent the abstract structure and abstract semantics of the construct types of the software language; that is, an AST is a syntax-independent representation of a language construct.  A familiar language construct, an IF statement in a programming language, is illustrated below.  A statement such as:

 

‘If  a < 10 then b = 5 else b = 10’

 

is represented as an abstract structured collection of objects in an AST, something like:

 

<IF-THEN-ELSE <CONDITION <LT-EXPRESSION>> <THEN<STATEMENT-BLOCK>> <ELSE<STATEMENT-BLOCK>>>

 

These AST structures are represented above in a simplified form.  In reality ASTs are much more complex than shown in the example above.  AST structures are not intended to be read by humans.  They are analyzed by and transformed by specially created specification languages. 
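
To make the idea of a syntax-independent representation concrete, the following Python sketch builds a toy AST for the IF statement above and prints it back in two different surface syntaxes.  The class and function names are hypothetical and do not reflect the Janus model or JPGEN; the point is only that one abstract structure can be rendered by more than one printer.

```python
from dataclasses import dataclass


@dataclass
class LtExpression:
    left: str
    right: int


@dataclass
class Assign:
    target: str
    value: int


@dataclass
class IfThenElse:
    condition: LtExpression
    then_block: list
    else_block: list


def print_generic(node):
    """Print the AST in the pseudo-syntax used in the text."""
    if isinstance(node, LtExpression):
        return f"{node.left} < {node.right}"
    if isinstance(node, Assign):
        return f"{node.target} = {node.value}"
    then_part = "; ".join(print_generic(s) for s in node.then_block)
    else_part = "; ".join(print_generic(s) for s in node.else_block)
    return f"If {print_generic(node.condition)} then {then_part} else {else_part}"


def print_cobol(node):
    """Print the same AST in a COBOL-like syntax."""
    if isinstance(node, LtExpression):
        return f"{node.left} < {node.right}"
    if isinstance(node, Assign):
        return f"MOVE {node.value} TO {node.target}"
    then_part = " ".join(print_cobol(s) for s in node.then_block)
    else_part = " ".join(print_cobol(s) for s in node.else_block)
    return f"IF {print_cobol(node.condition)} {then_part} ELSE {else_part} END-IF"


ast = IfThenElse(LtExpression("a", 10), [Assign("b", 5)], [Assign("b", 10)])
print(print_generic(ast))
print(print_cobol(ast))
```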

 

JPGEN™ is the specification language and tool used for defining language specifications, also known as grammar systems, in the Janus Tools Framework.  JPGEN is used for defining the mapping from syntactical forms of languages into AST structures.  JPGEN uses an extended BNF, a special form of grammar specification, to define the mapping from every syntactical construct of a language into an equivalent abstract structure that is devoid of “surface syntax”.  JPGEN language specifications are consumed by the TSRI parser generator, JPGEN™, and used to construct parsers, printers and other tools that perform the actual mapping between the syntactic form (CODE) and the abstract form (AST) of the software.  The JPGEN ‘parser’ generated from the grammar system for a language produces the AST from the CODE, and the corresponding JPGEN ‘printer’ produces the CODE from the AST.

 

Software Model Analysis

 

AST analysis augments an AST, which initially represents just the syntactical form of the language, with additional KB structures.  These structures capture the results of analysis as model structures that are amenable to subsequent analysis or transformation.  TSRI employs two high-level specification languages for this purpose.

 

JTGEN™ is a declarative transformation generator, developed by TSRI, that takes as input a Transformation Specification.  JTGEN processes the Transformation Specification to generate a transformation program that takes as input a set of KB structures and generates as output another set of KB structures.  JTGEN can also be used to modify KB structures in place by producing as outputs the modified forms of the structures given to it as inputs.

 

Refine++™ is a TSRI-developed variation of the Refine™ wide-spectrum language, originally developed by the Kestrel Institute in the mid-80s.  Refine was used for the USAF-sponsored Knowledge-Based Software Assistant (KBSA) in the late 80s and early 90s and became one of the most widely used tools in industry for software analysis and transformation during the 90s, until Reasoning Systems dropped support for the Refine language in 2000.  TSRI Refine++ is a highly modified extension of Refine that has been augmented with class constructors, grammar-based model construction, and bindings to native C++ and to the C++ programs produced by the JTGEN and JPGEN generators.  The Refine++ specification language is transformed into C++ for efficiency and portability and executes at several hundred times the speed of the LISP-based Refine implemented by Reasoning Systems.

 

TSRI uses Refine++ and JTGEN for AST analysis.  An AST is typically analyzed by augmentation with additional KB structures to capture the results of various algorithms that are applied to the structures from which the AST is composed.  Semantic augmentations are typically defined in a JTGEN rule file or in a Refine++ file. 
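To make the idea of semantic augmentation concrete, the sketch below attaches two derived annotations, the variables a construct defines and the variables it uses, to every node of a small tree.  It is a Python illustration only: the dictionary node layout, the annotate function, and the choice of defs/uses as the analysis are assumptions, not the contents of a JTGEN rule file or a Refine++ file.

```python
def var(name):
    return {"kind": "VAR", "name": name}


def num(value):
    return {"kind": "NUM", "value": value}


def annotate(node):
    """Attach 'defs' and 'uses' sets to every node, bottom-up, and return the node."""
    kind = node["kind"]
    if kind == "VAR":
        node["defs"], node["uses"] = set(), {node["name"]}
    elif kind == "NUM":
        node["defs"], node["uses"] = set(), set()
    elif kind == "ASSIGN":
        annotate(node["value"])
        node["defs"] = {node["target"]}
        node["uses"] = set(node["value"]["uses"])
    else:
        # Composite nodes combine the annotations of their children.
        children = node["children"]
        for child in children:
            annotate(child)
        node["defs"] = set().union(*(child["defs"] for child in children))
        node["uses"] = set().union(*(child["uses"] for child in children))
    return node


# If a < 10 then b = 5 else b = 10
tree = {
    "kind": "IF-THEN-ELSE",
    "children": [
        {"kind": "LT-EXPRESSION", "children": [var("a"), num(10)]},
        {"kind": "ASSIGN", "target": "b", "value": num(5)},
        {"kind": "ASSIGN", "target": "b", "value": num(10)},
    ],
}

annotate(tree)
print("defines:", tree["defs"], "uses:", tree["uses"])
```

The original syntactic structure is left intact; the analysis results sit alongside it, where later analyses or transformations can read them.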

 

To provide semantic analyses that are common across languages, TSRI provides mechanisms for mapping software languages into a language-independent modeling framework, which is described in greater detail below.

 

Language Independent Modeling

 

To provide a means of performing analysis and transformation efficiently across all languages, TSRI developed a Language Independent Modeling (LIM) framework.  The LIM is a complete high-level language and modeling framework that TSRI developed and incrementally extends as needed to provide language-neutral syntactic and semantic model representation.  Utilization of the LIM requires mapping the language-specific AST models of source and target languages into and from the LIM AST model.  The LIM possesses a language-independent syntactical form called the High-Level Intermediate Language (HLIL).

 

 

The LIM is augmented by a set of semantic analyses that are universal across all languages.  In practice, a combination of language-independent and language-specific semantic analyses is performed to fully capture the semantic annotations for a language.  Semantic annotations that are fully language-independent are defined within various language-independent semantic rule files, while semantic annotations that are language-specific are defined in language-specific annotation rule files.

 

In summary, model construction involves defining a grammar system, which is used to generate a ‘parser’ and a ‘printer’ for a language, and a set of annotation rules and Refine++ specifications, which are used to augment the abstract syntax trees created by the parser with semantic annotations.  Transformation from a language-specific syntax into a language-specific but syntax-free AST proceeds by parsing the software for the application to construct the AST.  The abstract syntax tree is subsequently analyzed to augment it with semantic annotations before it is converted into a language-independent and syntax-free AST of the TSRI high-level intermediate language.

 

Software Model Transformation

 

We apply a series of rewrite rules to the semantically augmented AST structures of the legacy language to transform them into the language-neutral AST structures of the syntax-independent LIM.  We then optionally apply additional rewrite rules to transform the language-independent model into the AST structures of some other target language.  For instance, the rewrite rules that define the transformation from COBOL into the LIM are defined in the COBOL2LIM.rul file, and conversely the rewrite rules that define the transformation from the LIM back into the COBOL language are defined in the LIM2COBOL.rul file.

 

To transform syntax-neutral COBOL ASTs into the syntax-neutral AST structures of the LIM, a transformation generator (JTGEN™) transforms the rewrite rules of the COBOL2LIM.rul file into a C++ program that is compiled for efficiency, integrated into the TSRI Transformation Framework, and invoked upon collections of COBOL ASTs to convert them into LIM ASTs.  All of the intermediate data structures used for representing these abstract structures are modeled as highly optimized C++ structures.  Manipulation of these structures occurs in memory for speed and efficiency.
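The effect of applying rewrite rules to a language-specific AST can be sketched as follows.  This is an interpreted Python illustration, not generated C++, and it does not reproduce the .rul rule syntax; the node names (COBOL-MOVE, COBOL-ADD, LIM-ASSIGN, LIM-PLUS and so on) and the three rules are hypothetical examples.

```python
def rule_move(node):
    # COBOL: MOVE <source> TO <target>  ->  LIM: <target> := <source>
    if node[0] == "COBOL-MOVE":
        _, source, target = node
        return ("LIM-ASSIGN", target, source)
    return None


def rule_add(node):
    # COBOL: ADD <amount> TO <target>  ->  LIM: <target> := <target> + <amount>
    if node[0] == "COBOL-ADD":
        _, amount, target = node
        return ("LIM-ASSIGN", target, ("LIM-PLUS", target, amount))
    return None


RENAMES = {"COBOL-IF": "LIM-IF", "COBOL-LT": "LIM-LT"}


def rule_rename(node):
    # Constructs with the same shape in both models only change their node type.
    if node[0] in RENAMES:
        return (RENAMES[node[0]],) + node[1:]
    return None


RULES = [rule_move, rule_add, rule_rename]


def rewrite(node):
    """Rewrite children first, then apply the first rule that matches this node."""
    if not isinstance(node, tuple) or not node:
        return node
    node = tuple(rewrite(child) for child in node)
    for rule in RULES:
        result = rule(node)
        if result is not None:
            return result
    return node


# IF A < 10 MOVE 5 TO B ELSE ADD 1 TO B
cobol_ast = (
    "COBOL-IF",
    ("COBOL-LT", "A", 10),
    ("COBOL-MOVE", 5, "B"),
    ("COBOL-ADD", 1, "B"),
)

lim_ast = rewrite(cobol_ast)
print(lim_ast)
```

Each rule is a small, independent mapping from one abstract structure to another, so supporting a further construct means adding a rule rather than changing the traversal.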

 

The transformation generator is quite versatile and the tool framework is readily extensible.  It takes as input any .rul file and generates a C++ program, which is compiled into an efficient executable that transforms the abstract structures of one language into the abstract structures of some other language.  The inputs to this process are simply the .syn files that define the language specifications for each of the languages and the .rul files that define the rewrite rules that transform the abstract structures of one language into the abstract structures of the other.  The construction of the .syn files and the .rul files requires a high level of technical expertise in computer science and language theory.

 

 

 

The two-step transformation process is an efficient and highly scalable approach to software modeling and analysis, as well as to transformation and compilation.  It eliminates the need to build separate analysis tools, separate development environments, and separate maintenance environments for each source legacy language.
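One way to see why routing every language through a shared model scales well is to count the translators that must be written.  The short Python sketch below compares direct source-to-target translation with translation through an intermediate model; the function names and the language counts are illustrative assumptions, not TSRI figures.

```python
def direct_translator_count(sources: int, targets: int) -> int:
    """Translators needed when every source language maps directly to every target."""
    return sources * targets


def lim_translator_count(sources: int, targets: int) -> int:
    """Translators needed when each language maps only into or out of the shared model."""
    return sources + targets


if __name__ == "__main__":
    for sources, targets in [(2, 2), (5, 3), (10, 5)]:
        print(
            f"{sources} sources, {targets} targets:",
            f"direct={direct_translator_count(sources, targets)},",
            f"via shared model={lim_translator_count(sources, targets)}",
        )
```

The same reasoning applies to analysis tools: an analysis written once against the shared model serves every language that has been mapped into it.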

The TSRI-Janus Tool Framework is a powerful family of syntax specification and transformation specification tools that is also suitable as the foundation for an industrial standard for the representation, modeling, and transformation of software.

 

Legacy Re-factoring

 

TSRI provides several forms of legacy system re-factoring.  The options already available are summarized in Table 1.

 

Re-Factoring Option | Description
------------------- | -----------
Dead Code Removal | Code that is never called or used and can be removed.
Dead Data Removal | Data that is never referenced and can be removed.
Redundant Code Removal | Code that occurs more than once in a software configuration and can be removed without changing functionality.
Redundant Data Removal | Data definitions that are defined more than once in a software configuration and can be removed without changing functionality.
Duplicate Code Consolidation | Code that occurs more than once and can be merged into one or more reusable units of code.
Duplicate Data Consolidation | Data definitions that occur more than once and can be merged into one or more reusable units of data.
Similar Code Consolidation | Code that can be merged into one or more reusable units of code by parameterizing the differences between the similar code units.
Similar Data Consolidation | Data that can be merged into one or more reusable units of data by standardizing the data names and using data name qualification to provide unambiguous access to the data.
Data Name Standardization | Allows the user to specify a standard data name that takes the place of an existing data name in the program.
Program Name Standardization | Allows the user to specify a replacement program name that takes the place of existing program names in the program.
Copy Book Name Standardization | Allows the user to specify a replacement copybook name that takes the place of existing copybook names in the program.
Procedure Name Standardization | Allows the user to specify a standard name for procedures that takes the place of existing procedure names in the program.

 

Table 1:  Available Re-Factoring Options
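As an illustration of the first option in Table 1, dead code can be found by computing which procedures are reachable from a program's entry points and reporting the rest as removal candidates.  The Python sketch below works on a plain call graph; the find_dead_code function, the dictionary format, and the procedure names are assumptions for the example, and a real analysis would also have to account for fall-through control flow and dynamic calls.

```python
def find_dead_code(call_graph, entry_points):
    """Return the procedures that cannot be reached from any entry point."""
    reachable = set()
    stack = list(entry_points)
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        # Follow every call made by this procedure.
        stack.extend(call_graph.get(name, ()))
    return sorted(set(call_graph) - reachable)


# Hypothetical call graph: procedure name -> procedures it calls.
call_graph = {
    "MAIN": ["READ-INPUT", "PROCESS"],
    "READ-INPUT": [],
    "PROCESS": ["WRITE-REPORT"],
    "WRITE-REPORT": [],
    "OLD-REPORT": ["FORMAT-OLD"],
    "FORMAT-OLD": [],
}

dead = find_dead_code(call_graph, ["MAIN"])
print(dead)
```

Note that FORMAT-OLD is reported even though OLD-REPORT calls it, because its only caller is itself unreachable.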

 

The specification of these kinds of re-factoring operations may involve many decisions by end-users.  To facilitate and manage this decision process, TSRI provides the end-user with a summary of the set of possible re-factoring actions.  One or more Domain Experts familiar with the legacy application fill in entries or make choices in a form associated with each type of re-factoring operation to specify their preference of re-factoring operations.  

 

The TSRI tools read the re-factoring specification database and automatically carry out the re-factoring operations.  Re-factoring operations that carry out code removal or code consolidation require the Domain Expert to review each code situation and select a check box to remove or merge the code.  Re-factoring operations that carry out code name standardization require the Domain Expert to review each name and specify a substitute name to take the place of the original name. 
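A name standardization pass driven by such a specification might look like the sketch below.  It is written in Python over a toy tuple-based tree; the specification layout, the standardize_names function, and the data names are hypothetical and do not describe the format of the TSRI re-factoring specification database.

```python
def standardize_names(node, renames):
    """Return a copy of the tree with every VAR name replaced per the rename map."""
    if not isinstance(node, tuple) or not node:
        return node
    if node[0] == "VAR":
        # Names without an entry in the specification are left unchanged.
        return ("VAR", renames.get(node[1], node[1]))
    return (node[0],) + tuple(standardize_names(child, renames) for child in node[1:])


# Choices a Domain Expert might record: original name -> standard name.
specification = {
    "renames": {
        "WS-CUST-NM": "CUSTOMER-NAME",
        "WS-CUST-NO": "CUSTOMER-NUMBER",
    }
}

program = (
    "BLOCK",
    ("ASSIGN", ("VAR", "WS-CUST-NM"), ("VAR", "IN-NAME")),
    ("ASSIGN", ("VAR", "WS-CUST-NO"), ("NUM", 0)),
)

standardized = standardize_names(program, specification["renames"])
print(standardized)
```

Because the renaming is applied to the tree rather than to source text, every reference to a name is changed consistently, and the original tree is left untouched for comparison.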

 

The TSRI re-factoring tools safely and accurately carry out the specified re-factoring operations and generate efficient and accurate code in the customer’s target language.   To carry out the re-factoring operations, transformations are performed against the abstract structures of the IOM.  These re-factoring operations are actually implemented when the IOM is transformed into the selected modern or legacy target language.

 

Legacy Modularization Benefits Summary

 

Modularization of large legacy systems allows the organization to reduce its ongoing operational and maintenance costs, improve reliability, and lower the future costs associated with expanding the system’s functionality.  Automated modularization performed together with Legacy System Assessment and Re-factoring can provide even greater benefits during the early phases of a software modernization project.

 

Early phases of modernization projects usually include objectives such as proof-of-concept or risk reduction projects to demonstrate the effectiveness of the modernization technology and the selected approach to the company’s needs and culture.  Towards this end, modularization can be a highly automated, high ROI activity, with clearly demonstrable and persistent benefits that can effectively illustrate the power and effectiveness of the automated information system modernization approach.

 



[1] Newcomb, P. H., Markosian, L., Automating the Modularization of Large COBOL Programs: Application of an Enabling Technology for Reengineering, Working Conference on Reverse Engineering, IEEE, 1993.

[2] Ning, J. Q., Engberts, A., Kozaczynski, W., Recovering Reusable Components from Legacy Systems by Program Segmentation, IEEE, 1993.

[3] Vesely, E. G., COBOL: A Guide to Structured, Portable, Maintainable, and Efficient Program Design, Prentice-Hall, 1989.

Published Wednesday, June 04, 2008 4:24 PM by MikeJones
