The Minimum You Need to Know Before Reading COBOL Source Code

Tags: COBOL, Legacy Technology, Business Systems, Maintenance, Mainframe

A handover, an incident response, maintenance of a vendor package: in situations like these, a pile of COBOL source code lands on your desk one day.

  • The file names end in .cbl or .cpy
  • The variable names are all uppercase
  • Rows of 01, 05, 77, 88
  • Things like PIC S9(7)V99 COMP-3 appear — somewhere between an incantation and accounting software
  • And it is full of COPY, so the file you opened does not even show you the whole picture

Around this point your brain turns slightly to powder.

But the map you need to read it is not that big. COBOL varies between compilers and products, yet the skeleton you should grasp first when reading an existing business system is largely the same everywhere. With IBM-style and typical business COBOL in mind, this article lays out the minimum set for people who suddenly have to read COBOL source code.

1. The Conclusion First (In One Breath)

Putting it rather crudely up front, but in a way that actually helps in practice:

  • Before being a language of logic, COBOL is very much a language of record definitions
  • Reading only the PROCEDURE DIVISION gives you half the story. Look at the DATA DIVISION first
  • PIC is the shape of an item; USAGE is how it is represented
  • COMP-3 is packed decimal. It shows up constantly in the world of amounts and counts
  • 88 is not a separate variable; it is a condition-name attached to the values of the preceding item
  • REDEFINES is a mechanism for viewing the same memory in a different shape. It is not a copy
  • If there is a COPY, the source you are looking at is not yet complete. You cannot see the whole until you open the copybooks
  • If you can follow PERFORM, IF, EVALUATE, READ, WRITE, and CALL, you can grasp most of the flow
  • Old source is fixed format, where column positions carry meaning. The whitespace you see is not just decoration [1]

In short: DIVISION, PIC, USAGE, COMP-3, REDEFINES, OCCURS, 88, COPY, PERFORM. Once you can read these, your odds of getting lost drop considerably.

2. Think of COBOL First as a Language About the Shape of Data

If you read it with C# or Java instincts, you will first want to chase the ifs, fors, and function calls. But with COBOL, before going there, it is faster to grasp “what records does this program receive, what records does it produce, and what buffers does it hold?”

A typical business COBOL program flows roughly like this:

  1. Read records from a file or DB
  2. Move them into WORKING-STORAGE items
  3. Branch on conditions
  4. Repack them into another record
  5. Write them out

In other words, layout tends to come before algorithm.

For example, here is a typical skeleton.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. SAMPLE01.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SALES-FILE ASSIGN TO ...

       DATA DIVISION.
       FILE SECTION.
       FD  SALES-FILE.
       01  SALES-REC.
           05  SALE-ID       PIC 9(8).
           05  SALE-AMOUNT   PIC S9(7)V99 COMP-3.

       WORKING-STORAGE SECTION.
       01  WS-EOF            PIC X VALUE 'N'.
           88  EOF           VALUE 'Y'.

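       PROCEDURE DIVISION.
      * A file must be opened before the first READ.
           OPEN INPUT SALES-FILE
           PERFORM UNTIL EOF
               READ SALES-FILE
                   AT END
                       SET EOF TO TRUE
                   NOT AT END
                       PERFORM PROCESS-SALE
               END-READ
           END-PERFORM
           CLOSE SALES-FILE
           STOP RUN.

       PROCESS-SALE.
      * Per-record processing goes here.
           CONTINUE.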

When reading this code, the first things to look at are the type of SALE-AMOUNT and the meaning of EOF — before the PERFORM. Read COBOL in that order and it suddenly goes quiet.

3. Look at the Four DIVISIONs First

COBOL source is first divided into four large DIVISIONs.

DIVISION                | What to look at first
----------------------- | ---------------------------------------------
IDENTIFICATION DIVISION | Program name, old comments, provenance
ENVIRONMENT DIVISION    | Files, external resources, I/O assumptions
DATA DIVISION           | Record definitions, working areas, parameters
PROCEDURE DIVISION      | The actual processing steps

The especially important parts are these.

  • FILE SECTION: contains the record definitions for input/output files
  • WORKING-STORAGE SECTION: contains everyday variables, flags, counters, work buffers
  • LOCAL-STORAGE SECTION: may contain areas re-initialized on each invocation
  • LINKAGE SECTION: may contain parameters passed in from outside, the receiving end of a subprogram

If you see a LINKAGE SECTION and PROCEDURE DIVISION USING ..., there is a strong chance the program is not self-contained — it runs on data received from outside.

4. Do Not Be Intimidated by the Look of Fixed Format

In old COBOL, the column positions themselves in a source line carry meaning. If you look at it without knowing this, you will never figure out “why is there this weird margin on the left?” [1]

In fixed format, roughly:

  • Columns 1 - 6: sequence number
  • Column 7: indicator
  • Columns 8 - 11: Area A
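  • Columns 12 - 72: Area B
  • Columns 73 - 80: identification area, ignored by the compiler, so code that runs past column 72 is silently cut off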

Column 7 is especially important.

  • * or / : comment line
  • - : continuation line
  • D : debugging line
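  • *> : strictly speaking not a column 7 indicator; on compilers that support it, this marker starts a comment anywhere on the line, including mid-line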

To lower the visual pressure, a very rough sketch looks like this.

123456789012345678901234567890
SEQNUMIAAAABBBBBBBBBBBBBBBBBBB
      * comment
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SAMPLE01.

The whitespace here is not “formatting” in the modern sense; parts of it are syntax. Converting tabs in an editor, shifting things left, or copy-pasting carelessly will simply break it. When looking at old source, first question whether the file is fixed format or free format. Run a modern formatter over fixed-format code and it blows up quite spectacularly.
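The column rules above can be sketched as a tiny complete program. This is a minimal illustration, not taken from any real system: the names COLDEMO and MAIN-PROC are made up, and the D line is only compiled when debugging mode is switched on (for example via WITH DEBUGGING MODE or a compiler option); otherwise it is treated as a comment.

```cobol
      * Column 7 is '*': the compiler ignores this whole line.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COLDEMO.
       PROCEDURE DIVISION.
      * Paragraph names start in Area A (column 8).
       MAIN-PROC.
      * Column 7 is 'D': a debugging line, normally skipped.
      D    DISPLAY 'ONLY COMPILED IN DEBUGGING MODE'.
      * Ordinary statements start in Area B (column 12).
           DISPLAY 'HELLO FROM AREA B'.
           STOP RUN.
```

Shifting any of these lines one column to the left changes what the compiler sees, which is why tab conversion and reformatting are dangerous here.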

5. The Bare Minimum of the DATA DIVISION

5.1 Level Numbers

COBOL data definitions build their hierarchy with level numbers, not indentation [2].

       01  WS-ORDER.
           05  WS-ORDER-ID    PIC 9(8).
           05  WS-AMOUNT      PIC S9(7)V99 COMP-3.
           05  WS-STATUS      PIC X.
               88  WS-OK      VALUE '0'.
               88  WS-ERROR   VALUE '9'.

       77  WS-COUNT           PIC 9(4).

At minimum, remembering just this much is enough.

  • 01 : top-level record or group forming one unit
  • 02 - 49 : the levels below it
  • 77 : an independent elementary item
  • 88 : condition-name. Attaches a name to a value of the preceding item [3]
  • 66 : for RENAMES. You will not run into it often, but it exists

What matters is not to think of 88 as a separate bool variable. There is no separate storage area called WS-OK; rather, when WS-STATUS is '0', it can be read under the name WS-OK — that is the feel of it.
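A minimal sketch of how an 88 behaves, reusing the WS-STATUS item from above as a standalone 01 item. The program name COND88 and the initial value '9' are assumptions made for the illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COND88.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-STATUS          PIC X VALUE '9'.
           88  WS-OK          VALUE '0'.
           88  WS-ERROR       VALUE '9'.
       PROCEDURE DIVISION.
       MAIN-PROC.
      * Same test as IF WS-STATUS = '9'.
           IF WS-ERROR
               DISPLAY 'ERROR, STATUS=' WS-STATUS
           END-IF
      * Stores '0' into WS-STATUS; WS-OK has no storage of its own.
           SET WS-OK TO TRUE
           DISPLAY 'STATUS=' WS-STATUS
           STOP RUN.
```

Run as written, this should display ERROR, STATUS=9 followed by STATUS=0: the only byte that ever changes is WS-STATUS.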

One more important thing: it is the level numbers, not the whitespace, that determine the hierarchy. The visual indentation is a useful hint, but what you should ultimately trust is the 01 / 05 / 10 / 88 [2].

5.2 PICTURE

PIC expresses the shape of an item. The ones you will see most often are these.

Notation | Rough meaning
-------- | -------------------------------------------
X        | Character
9        | Digit
S        | Signed
V        | Decimal point exists only logically
X(10)    | 10 characters
9(5)     | 5-digit number
S9(7)V99 | Signed, 7 integer digits + 2 decimal digits

For example:

  • PIC X(10) → 10 characters
  • PIC 9(5)V99 → 5 integer digits + 2 decimal digits
  • PIC S9(7)V99 → signed, 7 integer digits + 2 decimal digits

The especially important one here is V. V holds no actual . character. PIC 9(5)V99 is treated as “a number with 2 decimal places,” but there is no dot character in the data. So if you interpret a file or a dump as “the string you see,” you will almost always trip.
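To make the implied decimal point visible, here is a small sketch that looks at the same item twice: once as raw characters and once moved into an edited item that really does contain a dot. The names VDEMO, WS-PRICE, WS-PRICE-RAW, and WS-PRICE-EDIT and the value 123.45 are assumptions for the example; REDEFINES is used only to peek at the bytes.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. VDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Seven digit characters; the decimal point is only implied.
       01  WS-PRICE           PIC 9(5)V99 VALUE 123.45.
      * The same seven bytes viewed as plain characters.
       01  WS-PRICE-RAW REDEFINES WS-PRICE PIC X(7).
      * An edited item, which does hold a real '.' character.
       01  WS-PRICE-EDIT      PIC ZZZZ9.99.
       PROCEDURE DIVISION.
       MAIN-PROC.
           DISPLAY 'RAW BYTES: [' WS-PRICE-RAW ']'
           MOVE WS-PRICE TO WS-PRICE-EDIT
           DISPLAY 'EDITED   : [' WS-PRICE-EDIT ']'
           STOP RUN.
```

The raw view should show [0012345], with no dot anywhere, while the edited view shows [  123.45]. A file containing the first form carries no hint of where the decimal point belongs; only the PIC clause knows.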

5.3 USAGE / DISPLAY / COMP / COMP-3

If PIC is the shape, USAGE is the representation in which the item is held. At minimum, grasping just the following gets you a long way [4][5].

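Notation                | Rough meaning                          | Caution when reading
----------------------- | -------------------------------------- | ---------------------------------------------------------------------------
DISPLAY                 | External decimal, visible as characters | On a mainframe this may assume EBCDIC [6]
COMP / BINARY           | Binary                                 | The visible digit count and the internal representation are different things
COMP-3 / PACKED-DECIMAL | Packed decimal                         | Looks broken if you read it as characters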

For example:

       01  WS-AMOUNT-DISP   PIC S9(7)V99.
       01  WS-AMOUNT-BIN    PIC S9(7) COMP.
       01  WS-AMOUNT-PACK   PIC S9(7)V99 COMP-3.

All three are “numbers,” but they hold their contents differently.

What pays off most in practice is your reflex the instant you see COMP-3.

  • That is packed decimal
  • Probably an amount, a tax figure, a count, or a rate
  • It is supposed to look broken when viewed as text
  • Eyeballing it in a CSV/UTF-8 frame of mind will cause an accident

Holding on to that understanding makes you much less likely to panic needlessly at how dumps and binary files appear.

One more small note: DISPLAY does not necessarily mean an ASCII string. On z/OS systems EBCDIC is the assumption, so even when digits appear as characters, the byte values can differ from ASCII '0' - '9' [6].
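As a rough sketch of how differently the three items above are stored, the following program prints the byte length of each. The program name USGDEMO and the VALUE clauses are assumptions added for the illustration, and the size of a COMP item depends on the compiler and its options, so treat the binary figure as typical rather than guaranteed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. USGDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * One character per digit, sign shares the last byte: 9 bytes.
       01  WS-AMOUNT-DISP   PIC S9(7)V99 VALUE +1234567.89.
      * Binary; 5 to 9 digits typically occupy a 4-byte fullword.
       01  WS-AMOUNT-BIN    PIC S9(7) COMP VALUE +1234567.
      * Two digits per byte plus a sign nibble: 5 bytes,
      * typically X'123456789C' here (C = positive).
       01  WS-AMOUNT-PACK   PIC S9(7)V99 COMP-3 VALUE +1234567.89.
       PROCEDURE DIVISION.
       MAIN-PROC.
           DISPLAY 'DISPLAY BYTES: ' FUNCTION LENGTH(WS-AMOUNT-DISP)
           DISPLAY 'COMP    BYTES: ' FUNCTION LENGTH(WS-AMOUNT-BIN)
           DISPLAY 'COMP-3  BYTES: ' FUNCTION LENGTH(WS-AMOUNT-PACK)
           STOP RUN.
```

The same nine digits take 9, about 4, and 5 bytes respectively, which is why a record length computed by counting the 9s in the PIC clauses usually comes out wrong.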

5.4 REDEFINES / OCCURS / COPY / FILLER

These four are the places where readers get stuck.

REDEFINES

REDEFINES is a mechanism for viewing the same storage area in a different shape. It is not a copy [7].

       01  REC-BUF.
           05  REC-TYPE      PIC X.
           05  REC-DATA      PIC X(99).

       01  HEADER-REC REDEFINES REC-BUF.
           05  HDR-TYPE      PIC X.
           05  HDR-DATE      PIC 9(8).
           05  FILLER        PIC X(91).

This is close to the feel of a union in C-family languages. It often appears in the style of “interpret one 100-byte area as different record types.”
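The overlay can be observed directly. The sketch below wraps the definitions above in a runnable program; the program name REDEFDEM and the sample record 'H20240131' are invented for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REDEFDEM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  REC-BUF.
           05  REC-TYPE      PIC X.
           05  REC-DATA      PIC X(99).
      * No new storage: HEADER-REC overlays the 100 bytes of REC-BUF.
       01  HEADER-REC REDEFINES REC-BUF.
           05  HDR-TYPE      PIC X.
           05  HDR-DATE      PIC 9(8).
           05  FILLER        PIC X(91).
       PROCEDURE DIVISION.
       MAIN-PROC.
      * Fill the buffer through one view,
           MOVE 'H20240131' TO REC-BUF
      * then read the same bytes through the other.
           IF HDR-TYPE = 'H'
               DISPLAY 'HEADER DATE=' HDR-DATE
           END-IF
      * A change made through REC-TYPE is visible as HDR-TYPE.
           MOVE 'D' TO REC-TYPE
           DISPLAY 'HDR-TYPE IS NOW ' HDR-TYPE
           STOP RUN.
```

This should display HEADER DATE=20240131 and then HDR-TYPE IS NOW D, even though nothing was ever moved into HEADER-REC by name.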

OCCURS

OCCURS is an array. In COBOL it tends to be called a table.

       05  WS-ITEM OCCURS 12 TIMES.
           10  WS-PRICE    PIC 9(5).

If you further encounter OCCURS DEPENDING ON, it is a variable-length table. In that case it can affect the positions of the items that follow, so following it with a fixed-length mindset will make you lose your footing [8].
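For the plain fixed-size form, a minimal sketch of how a table is walked looks like this. The names OCCDEMO, WS-TABLE, WS-IDX, and WS-TOTAL are assumptions, and the stored prices are dummy values; the point to notice is that subscripts start at 1, not 0.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OCCDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TABLE.
           05  WS-ITEM OCCURS 12 TIMES.
               10  WS-PRICE    PIC 9(5).
       01  WS-IDX              PIC 99.
       01  WS-TOTAL            PIC 9(7) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PROC.
      * Subscripts run from 1 to 12; there is no element 0.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 12
               MOVE WS-IDX TO WS-PRICE (WS-IDX)
               ADD WS-PRICE (WS-IDX) TO WS-TOTAL
           END-PERFORM
           DISPLAY 'TOTAL=' WS-TOTAL
           STOP RUN.
```

With the dummy values 1 through 12 the total should display as TOTAL=0000078. The whole table occupies 12 x 5 = 60 contiguous bytes inside WS-TABLE.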

COPY

COPY is a compile-time include. In other words, the source you have open may not be the finished form yet [9].

       COPY CUSTOMER-REC.
       COPY ERROR-MAP.

It is entirely normal for record definitions, shared flags, host variables for SQL, and external interfaces to be stuffed into copybooks.

When heavy use of COPY makes the source hard to read, it is faster to check whether you can get at the expanded source or a compiler listing. IBM Enterprise COBOL even has an option called MDECK for writing out the input source after library processing [10].

FILLER

FILLER is an item with no name. But “unreferenced, therefore meaningless” is wrong.

It routinely serves as:

  • Reserved space
  • A compatibility gap for an old specification
  • Padding to match a record length
  • Slack for a REDEFINES

FILLER merely lacks a name — it still exists as bytes. Forget this and your mapping against an external file drifts out of alignment one byte at a time.

6. The Bare Minimum of the PROCEDURE DIVISION

If the DATA DIVISION is the map, the PROCEDURE DIVISION is the route you travel.

6.1 PERFORM

PERFORM is COBOL’s basic control transfer. Roughly speaking, it means call a piece of processing and come back [11].

The forms you will see most often are these.

       PERFORM INIT-PROC
       PERFORM UNTIL EOF
           PERFORM READ-PROC
           IF NOT EOF
               PERFORM EDIT-PROC
               PERFORM WRITE-PROC
           END-IF
       END-PERFORM

PERFORM comes in two broad flavors.

  • Out-of-line PERFORM, which names a paragraph or section
  • Inline PERFORM ... END-PERFORM, which writes a block in place

In older code you will also routinely see range forms like PERFORM A-100 THRU A-199. Convenient, but adding a paragraph in the middle easily drags it into the range by accident, so when reading, check carefully where the range ends.
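The out-of-line and range forms can be sketched in one small program. All paragraph names here (MAIN-PROC, INIT-PROC, A-100, A-150, A-199, B-100) and the program name PERFDEMO are hypothetical; the layout mimics the numbering style common in older code.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PERFDEMO.
       PROCEDURE DIVISION.
       MAIN-PROC.
      * Out-of-line: run INIT-PROC, then come back here.
           PERFORM INIT-PROC
      * Range form: run every paragraph from A-100 through A-199.
           PERFORM A-100 THRU A-199
           STOP RUN.
       INIT-PROC.
           DISPLAY 'INIT-PROC'.
       A-100.
           DISPLAY 'A-100'.
      * Anything added between A-100 and A-199 joins the range.
       A-150.
           DISPLAY 'A-150 (INSIDE THE RANGE)'.
       A-199.
           EXIT.
      * Outside the range, and after STOP RUN: never executed here.
       B-100.
           DISPLAY 'B-100'.
```

The expected output is INIT-PROC, A-100, and A-150 (INSIDE THE RANGE). Nothing in the PERFORM statement mentions A-150; it runs purely because of where it sits in the source.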

6.2 IF / EVALUATE / Scope

For conditional branching, IF is the basic tool. Thinking of EVALUATE as roughly a switch/case is mostly correct.

What you need to watch is how scopes end [12].

Code with explicit terminators such as

  • END-IF
  • END-PERFORM
  • END-READ

is still the readable kind.

The problem is old code. In COBOL, . acts as an implicit scope terminator and closes all the still-open statements at once [12].

That means a single period changes:

  • How far the IF extends
  • How far the PERFORM extends
  • Where the next sentence begins

Furthermore, NEXT SENTENCE is not the same as CONTINUE. NEXT SENTENCE jumps to the point after the next period, so its destination shifts depending on where the following . happens to be [12].

When reading old COBOL, “watch the periods, not the line endings” is about the right calibration.
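A small sketch of how a period overrides indentation. The program name DOTDEMO, the item WS-A, and its value 2 are assumptions chosen so that neither IF condition is true.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DOTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-A               PIC 9 VALUE 2.
       PROCEDURE DIVISION.
       MAIN-PROC.
      * The period after the first DISPLAY ends the IF, so the
      * second DISPLAY runs unconditionally despite its indentation.
           IF WS-A = 1
               DISPLAY 'A IS 1'.
               DISPLAY 'LOOKS CONDITIONAL, RUNS ANYWAY'.
      * With END-IF the extent of the IF is explicit.
           IF WS-A = 1
               DISPLAY 'A IS 1'
               DISPLAY 'REALLY CONDITIONAL'
           END-IF
           STOP RUN.
```

With WS-A set to 2, the only line displayed should be LOOKS CONDITIONAL, RUNS ANYWAY. The two IF blocks look almost identical on screen; the single period is the whole difference.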

6.3 READ / WRITE / CALL

The frequent fliers in business COBOL are these.

  • READ
  • WRITE
  • REWRITE
  • START
  • CALL

READ ... AT END ... in particular is the classic pattern.

       READ IN-FILE
           AT END
               SET EOF TO TRUE
           NOT AT END
               PERFORM PROCESS-REC
       END-READ

If there is a CALL 'SUBPGM' USING ..., control jumps to another program. In that case, look at the callee’s LINKAGE SECTION and PROCEDURE DIVISION USING — the shape of the handoff becomes quite visible.
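The handoff can be sketched with a caller and a callee placed in one source file. Everything except the name SUBPGM is invented for the example (CALLER, WS-PARM, LK-PARM and their fields), and how a CALL target is actually located (static link, dynamic load, load library) depends on the compiler and environment.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CALLER.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PARM.
           05  WS-IN-QTY      PIC 9(3) VALUE 7.
           05  WS-OUT-DOUBLE  PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PROC.
      * Passed by reference by default: SUBPGM sees these bytes.
           CALL 'SUBPGM' USING WS-PARM
           DISPLAY 'DOUBLE=' WS-OUT-DOUBLE
           STOP RUN.
       END PROGRAM CALLER.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. SUBPGM.
       DATA DIVISION.
      * LINKAGE items describe the caller's storage; none is
      * allocated here. The layout must match WS-PARM byte for byte.
       LINKAGE SECTION.
       01  LK-PARM.
           05  LK-IN-QTY      PIC 9(3).
           05  LK-OUT-DOUBLE  PIC 9(4).
       PROCEDURE DIVISION USING LK-PARM.
       SUB-MAIN.
           COMPUTE LK-OUT-DOUBLE = LK-IN-QTY * 2
           GOBACK.
       END PROGRAM SUBPGM.
```

This should display DOUBLE=0014. In real systems the two layouts usually come from the same copybook, which is one more reason to open the copybooks before reading the logic.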

7. What Lives Outside COBOL

Quite often, COBOL’s world is not self-contained in the source.

  • File definitions
  • The execution environment
  • DB connections
  • The transaction environment
  • Job control

all live outside it.

At minimum, grasping the following makes reading much easier.

Files and FILE STATUS

Read the FILE-CONTROL in the ENVIRONMENT DIVISION together with the FILE SECTION / FD in the DATA DIVISION; they come as a pair [13].

       SELECT IN-FILE ASSIGN TO ...
           FILE STATUS IS WS-FS.

       FD  IN-FILE.
       01  IN-REC.
           05 ...

If there is a FILE STATUS clause, the item it names receives a two-character result code after each I/O statement. When reading file-related failures or EOF handling, you cannot even begin without looking at this [14].
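A minimal sketch of checking the status after OPEN and READ. The assumptions are loud here: the file name 'INFILE.DAT', the 80-byte record, the program name FSDEMO, and the 88 names are all invented, and on z/OS the ASSIGN target would normally be a name resolved through JCL rather than a literal path. The status values '00' (success), '10' (end of file), and '35' (file not found on OPEN) are the standard ones.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FSDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT IN-FILE ASSIGN TO 'INFILE.DAT'
               FILE STATUS IS WS-FS.
       DATA DIVISION.
       FILE SECTION.
       FD  IN-FILE.
       01  IN-REC             PIC X(80).
       WORKING-STORAGE SECTION.
       01  WS-FS              PIC XX.
           88  FS-OK          VALUE '00'.
           88  FS-EOF         VALUE '10'.
       01  WS-DONE-FLAG       PIC X VALUE 'N'.
           88  WS-DONE        VALUE 'Y'.
       PROCEDURE DIVISION.
       MAIN-PROC.
           OPEN INPUT IN-FILE
      * For example '35' when the file does not exist.
           IF NOT FS-OK
               DISPLAY 'OPEN FAILED, FILE STATUS=' WS-FS
               STOP RUN
           END-IF
           PERFORM UNTIL WS-DONE
               READ IN-FILE
               EVALUATE TRUE
                   WHEN FS-OK
                       DISPLAY IN-REC
                   WHEN FS-EOF
                       SET WS-DONE TO TRUE
                   WHEN OTHER
                       DISPLAY 'READ FAILED, FILE STATUS=' WS-FS
                       SET WS-DONE TO TRUE
               END-EVALUATE
           END-PERFORM
           CLOSE IN-FILE
           STOP RUN.
```

Existing code often tests the status with literal comparisons such as IF WS-FS NOT = '00' instead of 88s; the meaning is the same. The EVALUATE TRUE form is also a common idiom worth recognizing on sight.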

EXEC SQL

If this appears, it is embedded SQL.

       EXEC SQL
           SELECT ...
       END-EXEC.

In this case the COBOL is a “vessel for host variables,” and the actual selection criteria and update targets are on the SQL side. So the shortcut is to read the contents of EXEC SQL as ordinary SQL.

EXEC CICS

If this appears, you are in a CICS transaction context [15].

       EXEC CICS
           RECEIVE MAP(...)
       END-EXEC.

At that instant, this stops being a plain batch-reading exercise. You need to read it together with the external context: screens, transactions, response codes, the COMMAREA, and so on.

JCL and Execution Definitions

In mainframe batch, it is not unusual for which datasets actually get allocated and in what order the jobs flow to live outside the COBOL source. When you look at the source alone and cannot tell “where is this file?”, it routinely turns out that the code is not at fault — you just have not widened your view far enough yet.

8. The Minimum Reading Order

When you suddenly have to read COBOL, the following order is the safe one.

  1. Sweep up every COPY. Open the copybooks if you can. If not, look for a listing or the expanded source
  2. Pick out the 01-level record definitions. List the top-level items in the FILE SECTION, WORKING-STORAGE, and LINKAGE SECTION
  3. Read the PICs and USAGEs. Identify amounts, dates, counts, codes, flags
  4. Search for READ / WRITE / REWRITE / CALL / EXEC SQL / EXEC CICS. Grasp the I/O and the external boundaries first
  5. Follow only the first main path. Trace the chain of PERFORMs from the top of the PROCEDURE DIVISION
  6. Look at the 88s and status items. The meanings of EOF, success/failure, and type codes become much easier to read
  7. Mark every REDEFINES / OCCURS DEPENDING ON / COMP-3. They will matter later without fail, so flag them as hazardous material up front
  8. For files, look at the FILE STATUS. This eliminates a lot of misreadings around I/O errors

In this order, you avoid having to close-read the whole thing from the start. With COBOL, rather than trying to understand 100% from the beginning, it is far easier to nail down the three points — records, external boundaries, main path — and then go into the details.

9. Common Stumbling Points

Finally, here are the places where beginners get caught with very high probability.

Thinking REDEFINES is “a different variable”

It is not. It is the same storage area read in a different shape. Modify one side, and the other side’s view changes too [7].

Thinking 88 is “an independent bool”

It is not. It is just a name attached to a value of the preceding item. Behind the scenes, SET WS-OK TO TRUE stores the corresponding value into the underlying item [3].

Ignoring COPY and reading only the main body

That is walking into the mountains with half the map still folded. It is entirely normal for field definitions, shared flags, and host variables to live wholesale outside the file [9].

Thinking MOVE is plain assignment

MOVE is not just a memcpy. Depending on the receiving item’s type, it can involve conversion, digit alignment, zero filling, truncation, and editing/de-editing.
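Three of these effects can be sketched with elementary DISPLAY items. All names and values below (MOVEDEMO, WS-NUM-5, WS-TXT-5, WS-AMT, and so on) are assumptions for the illustration; binary items can behave differently depending on compiler options, so the sketch sticks to the simplest case.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MOVEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-NUM-5           PIC 9(5)    VALUE 12345.
       01  WS-NUM-3           PIC 9(3).
       01  WS-TXT-5           PIC X(5)    VALUE 'ABCDE'.
       01  WS-TXT-3           PIC X(3).
       01  WS-AMT             PIC 9(4)V99 VALUE 1234.5.
       01  WS-AMT-EDIT        PIC Z,ZZ9.99.
       PROCEDURE DIVISION.
       MAIN-PROC.
      * Numeric move: aligned on the decimal point, so the
      * high-order digits are lost and 345 is stored.
           MOVE WS-NUM-5 TO WS-NUM-3
           DISPLAY 'NUMERIC: [' WS-NUM-3 ']'
      * Alphanumeric move: left-justified, so 'ABC' is stored.
           MOVE WS-TXT-5 TO WS-TXT-3
           DISPLAY 'TEXT   : [' WS-TXT-3 ']'
      * Move to an edited item: inserts ',' and a real '.'.
           MOVE WS-AMT TO WS-AMT-EDIT
           DISPLAY 'EDITED : [' WS-AMT-EDIT ']'
           STOP RUN.
```

The expected output is [345], [ABC], and [1,234.50]. Numbers lose their left end and text loses its right end, and neither move raises an error, so silent truncation is worth suspecting whenever a receiving item is shorter than its source.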

Underestimating the effect of .

COBOL’s . is heavier than you imagine. In old code with no explicit terminators, misjudging how much this period closes means misreading the control flow [12].

Thinking packed decimal or EBCDIC is “mojibake”

It is not necessarily broken. Quite often it simply was never a string to begin with, or is just not ASCII [4][6].

Assuming what follows OCCURS DEPENDING ON sits at a fixed position

The items after a variable-length table can move position depending on the value. Read it with a fixed-length mindset and all your offset calculations go wrong [8].

10. Quick Reference: What to Think First

Word you found           | First thing to think
------------------------ | -------------------------------------------------------------------
01                       | Top level of a record or group. Grasp the big picture from here
88                       | Named meaning of a flag or status code. The key to reading branches
PIC X(...)               | Character item
PIC 9(...) / S9(...)V... | Numeric item. Check digit count and decimal position
COMP                     | Binary
COMP-3                   | Packed decimal. Likely an amount or a count
REDEFINES                | The same area being reinterpreted differently
OCCURS                   | Array / table
OCCURS DEPENDING ON      | Variable length. Watch the positions that follow too
FILLER                   | No name, but it has length
COPY                     | You cannot see the finished form without the copybook
PERFORM                  | The skeleton of the main path
READ / WRITE / REWRITE   | File I/O
EXEC SQL                 | DB processing
EXEC CICS                | Transaction processing
FILE STATUS              | I/O result code

11. Summary

COBOL is not hard because it is old. It is just that data definitions, external files, and the execution context are tightly intertwined, which makes the initial entry point hard to see.

To restate the minimum set for reading it:

  • Grasp the map via the DIVISIONs
  • Read the DATA DIVISION first
  • Read the shape of each item via PIC and USAGE
  • Mark every COMP-3, REDEFINES, OCCURS, 88, and COPY
  • Follow PERFORM, READ, WRITE, and CALL
  • Nail down the external boundaries via FILE STATUS, EXEC SQL, and EXEC CICS
  • Do not underestimate how . behaves

Once you can see all this, COBOL turns from “mysterious ancient magic” into “a record-processing language.” Legacy technology is not scary because the name is old; it is just that picking the wrong scale for your first look suddenly makes it hard to understand. Get the map scale right, and it reads surprisingly normally.

12. References

The main sources referenced in this article.

Author Profile

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
