LT099 Page 1

Using Callable PDP-11 SORT/MERGE:
Things You Might Not Have Thought of Doing

Bruce L. Gaarder
The Gaarder Group, Inc.
1711 Highland Parkway
St. Paul, MN 55116

----------------------------------------------------------------------

Overview

o Oriented toward the use of SORT/MERGE routines in application
  programs.

o The rationale applies to VMS SORT/MERGE as well, but VMS may have
  additional capabilities or different routine names or parameters.

o "Sort" is used generically for sort/merge in most discussions, for
  brevity.

o Mostly ideas about things to do, rather than sample code, because
  the routines are quite simple to use.

o A high number of sorts in an application system used to be a
  measure of how fully you were utilizing your data.

o I have heard people say that sorting is old hat, and that if you
  need to get records from a file in a different order, you should
  define an alternate key. This is fine in some circumstances, but
  the number of keys can get out of hand, and RMS slows down on
  updating indices when there are many keys.

----------------------------------------------------------------------

Callable SORT/MERGE Fundamentals

o Languages: BASIC-PLUS-2, PDP-11 FORTRAN-77, MACRO-11, PDP-11
  FORTRAN-IV, PDP-11 PASCAL, COBOL-81.

o There are two interfaces to the SORT/MERGE routines. If you wish to
  manipulate individual records, you use the record interface; if
  not, you use the file interface. You can use either type of
  interface on input or on output. If you use one on input and the
  other on output, it is called a mixed-mode interface.

o If you use the file interface for both input and output, the result
  is the same as a stand-alone sort or merge, but under program
  control.

o The record interface on input means that you supply the SORT/MERGE
  routines with the records to be sorted, one at a time, and you are
  in control of where the data comes from. In this case, SORT/MERGE
  has no input file. Indeed, there may be no input file of any kind.

o The record interface on output means that you receive the ordered
  records one at a time, and you are in control of what you do with
  the data. In this case, SORT/MERGE has no output file. Indeed,
  there may be no output file of any kind.

o The file interface on input means that SORT/MERGE directly reads
  the input file(s) without your intervention.

o The file interface on output means that SORT/MERGE directly writes
  the output file without your intervention.

o Rather than using the defaults, you can supply your own routines to
  replace three routines in both sort and merge, and one additional
  routine in merge.

  o The first is a routine to handle warning conditions and decide
    whether to allow continuation.

  o The second is a routine to handle key comparisons.

  o The third is a routine to handle keys that collate as equal.

  o The fourth is a routine to provide input records to MERGE. This
    is not needed in SORT because there is only one input file open
    at a time.

o You must start each sort with a call to SRTINI and end it with a
  call to SRTEND. You can't call SRTINI again until SRTEND has been
  called.

o You must start each merge with a call to MRGINI and end it with a
  call to MRGEND. You can't call MRGINI again until MRGEND has been
  called.

o It isn't explicitly stated in the manual, but I wouldn't expect to
  be able to do a sort and a merge at the same time.

----------------------------------------------------------------------

SORT Subroutines for File Interface on Input and Output

o SRTINI initializes the sort operation by passing key information,
  sort options, and file names needed for the file interface.

o SRTSRT reads the input file(s), sorts the records, and writes the
  output file.

o SRTEND performs cleanup functions, such as closing files and
  releasing memory.
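
The three-call sequence above, together with the rule that SRTINI
cannot be called again until SRTEND has been called, can be modeled in
a few lines. What follows is only a Python sketch of the protocol, not
the PDP-11 routines: the class SortSession, its method names, and the
key_start/key_len arguments are invented for illustration, and the
real parameter lists are in the manual.

```python
import os
import tempfile


class SortSession:
    """Toy model of the file interface: ini -> srt -> end (names hypothetical)."""

    def __init__(self):
        self.active = False

    def ini(self, in_name, out_name, key_start, key_len):
        # Mirrors the rule that SRTINI may not be called again before SRTEND.
        if self.active:
            raise RuntimeError("previous sort not ended")
        self.in_name, self.out_name = in_name, out_name
        self.key = slice(key_start, key_start + key_len)
        self.active = True

    def srt(self):
        # File interface on both sides: the sort does all of the I/O itself.
        if not self.active:
            raise RuntimeError("sort not initialized")
        with open(self.in_name) as f:
            records = f.read().splitlines()
        records.sort(key=lambda rec: rec[self.key])
        with open(self.out_name, "w") as f:
            f.writelines(rec + "\n" for rec in records)

    def end(self):
        # Cleanup; a new sort may be started after this call.
        self.active = False


def demo():
    # Made-up records: a 3-character key followed by a name.
    workdir = tempfile.mkdtemp()
    in_name = os.path.join(workdir, "in.dat")
    out_name = os.path.join(workdir, "out.dat")
    with open(in_name, "w") as f:
        f.write("003SMITH\n001JONES\n002ADAMS\n")
    session = SortSession()
    session.ini(in_name, out_name, key_start=0, key_len=3)
    session.srt()
    session.end()
    with open(out_name) as f:
        return f.read().splitlines()
```

Calling demo() sorts the three sample records by their leading key.
The program never touches an individual record, which is why this
combination behaves like a stand-alone sort run under program control.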

----------------------------------------------------------------------

SORT Subroutines for File Interface on Input and Record Interface on
Output

o SRTINI initializes the sort operation by passing key information,
  sort options, and file name(s) needed for the file interface.

o SRTSRT reads the input file(s) and sorts the records.

o SRTRTN returns one ordered record to your program. Must be called
  until there are no more ordered records.

o SRTEND performs cleanup functions, such as closing files and
  releasing memory.

----------------------------------------------------------------------

SORT Subroutines for Record Interface on Input and File Interface on
Output

o SRTINI initializes the sort operation by passing key information,
  sort options, and the file name needed for the file interface.

o SRTRLS passes one record to the sort. Must be called until there
  are no more input records.

o SRTSRT sorts the records and writes the output file.

o SRTEND performs cleanup functions, such as closing files and
  releasing memory.

----------------------------------------------------------------------

SORT Subroutines for Record Interface on Input and Output

o SRTINI initializes the sort operation by passing key information
  and sort options.

o SRTRLS passes one record to the sort. Must be called until there
  are no more input records.

o SRTRTN returns one ordered record to your program. Must be called
  until there are no more ordered records.

o SRTEND performs cleanup functions, such as closing files and
  releasing memory.

----------------------------------------------------------------------

MERGE Subroutines for File Interface on Input and Output

o MRGINI initializes the merge operation by passing key information,
  merge options, and file names needed for the file interface.

o MRGMRG reads the input file(s), merges the records, and writes the
  output file.

o MRGEND performs cleanup functions, such as closing files and
  releasing memory.
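
A merge differs from a sort in that each input is expected to be in
key order already, so the merge only has to interleave them. The
effect of a file-to-file merge can be sketched in Python with the
standard heapq.merge function. This is an illustration of the idea
only: merge_files, read_records, and the key_start/key_len arguments
are hypothetical names, and the records are assumed to be text lines.

```python
import heapq
import os
import tempfile


def read_records(f):
    # Yield one record per line, without the line terminator.
    for line in f:
        yield line.rstrip("\n")


def merge_files(in_names, out_name, key_start, key_len):
    """Merge already-sorted input files into one ordered output file."""
    key = slice(key_start, key_start + key_len)
    files = [open(name) for name in in_names]
    try:
        # heapq.merge holds only one pending record per input at a time.
        merged = heapq.merge(*(read_records(f) for f in files),
                             key=lambda rec: rec[key])
        with open(out_name, "w") as out:
            for rec in merged:
                out.write(rec + "\n")
    finally:
        for f in files:
            f.close()


def demo():
    # Two made-up input files, each already in key order.
    workdir = tempfile.mkdtemp()
    name_a = os.path.join(workdir, "a.dat")
    name_b = os.path.join(workdir, "b.dat")
    out_name = os.path.join(workdir, "out.dat")
    with open(name_a, "w") as f:
        f.write("001JONES\n004BAKER\n")
    with open(name_b, "w") as f:
        f.write("002ADAMS\n003SMITH\n")
    merge_files([name_a, name_b], out_name, key_start=0, key_len=3)
    with open(out_name) as f:
        return f.read().splitlines()
```

Because all of the input files are open at once during a merge, a
record interface on input needs a routine that can say which input a
record belongs to, which is the job of the user-supplied input
routine mentioned earlier.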

----------------------------------------------------------------------

MERGE Subroutines for File Interface on Input and Record Interface on
Output

o MRGINI initializes the merge operation by passing key information,
  merge options, and file names needed for the file interface.

o MRGMRG reads the input files and merges the records.

o MRGRTN returns one ordered record to your program.

o MRGMRG and MRGRTN must be called as a pair until there are no more
  output records.

o MRGEND performs cleanup functions, such as closing files and
  releasing memory.

----------------------------------------------------------------------

MERGE Subroutines for Record Interface on Input and File Interface on
Output

o MRGINI initializes the merge operation by passing key information,
  merge options, and file name needed for the file interface.

o Remember: you must supply an input routine.

o MRGMRG merges the records, and writes the output file. It is called
  only once per merge.

o MRGEND performs cleanup functions, such as closing files and
  releasing memory.

----------------------------------------------------------------------

MERGE Subroutines for Record Interface on Input and Output

o MRGINI initializes the merge operation by passing key information
  and merge options.

o Remember: you must supply an input routine.

o MRGRTN returns one ordered record to your program. Must be called
  until there are no more ordered records.

o MRGEND performs cleanup functions, such as closing files and
  releasing memory.

----------------------------------------------------------------------

Some xxxINI Parameters of Interest

o Command line buffer holds a command line in MCR/CCL format. You can
  specify anything that makes sense given the interfaces which you
  are using.

o Specification file buffer holds specification file text, if you
  don't want to have an external specification file (the file name
  would be in the command line if you did).

o Merge order is the number of input files for a merge using the
  record interface on input.

o Other parameters are explained in the manual and don't need extra
  comment.

----------------------------------------------------------------------

Comparison Routine

o A user-written comparison routine can consider more than just the
  information in the key fields, or compare the data in a different
  manner than the standard routines.

o Other files could also be accessed to help compare the key fields.

----------------------------------------------------------------------

Equal Key Routine

o A user-written equal key routine can choose to delete both records,
  keep the first and delete the second, delete the first and keep the
  second, or keep both records.

o You might want to modify some field in one or both records, combine
  the two records into one, or keep track of the duplicates in some
  other file or in a program counter.

o To use an example from the manual: if you wished to produce a list,
  sorted by employee, of the total amount paid to each employee for
  the year, the equal key routine could add the amount from each
  weekly check into the second duplicate record and delete the first.
  You would then have one record per employee coming out of the sort,
  rather than as many as 52.

----------------------------------------------------------------------

Warning Routine

o This is useful for giving "user-friendly" error messages.

o You can also take other actions, such as tracking certain types of
  errors, or errors in sorting certain files.

o Your routine passes back a continue or terminate status code.

----------------------------------------------------------------------

What Can You Do With Callable SORT/MERGE?

o Calling the SORT/MERGE routines can be considerably faster than
  running an external SORT/MERGE. With an external sort you might
  need to create a temporary file, sort it, and then read the sorted
  file back; with the callable routines you can avoid both creating
  and reading the temporary file.

o Consider sorting a file with 150,000 records: an external sort
  would require 150,000 writes to create the temporary file and
  150,000 reads to get the sorted records back.

o You would still want to create the temporary file if several
  different programs will be accessing the data in the same order at
  different times.

o The callable routines can also be slower than an external sort,
  depending on how much memory is available.

o Using callable SORT/MERGE lets you perform many sorts without
  dropping into DCL.

o You can modify the sort order and other options on the fly by
  changing the command line buffer.

o You can have a preferred device for the work files and shift to
  another device if the sort/merge fails because there is not enough
  room on that device.

o In general, you can decide whether to retry a sort, based on the
  error code returned.

----------------------------------------------------------------------

An Example From a Long Time Ago

In a large securities accounting system, the reporting system worked
in phases, with a sort in the middle.

o There were two subroutines per report: an input processing and
  formatting routine, and an output processing and formatting
  routine.

o As the input file was read, every input subroutine was called for
  each detail line to decide whether its report would contain that
  line. If so, it built a fixed control area with the report id as
  the primary key, other key fields to sort the record properly
  within that report, and data fields for totaling, etc. The fixed
  control area was followed by a print line image. The record was
  then released to the sort.

o The data was sorted.

o As each record was returned, the output subroutine whose report id
  matched the record was called. It would then further process the
  control fields, print the print line image from the record, and
  print subtotal lines and total lines as appropriate. Obviously, you
  could also update files at this point, if it made sense.

o A sample of the record layout:

     001 - 006   Report Id
     007 - 100   Balance of Sort Key
     101 - 200   Data Fields
     201 - 203   Length of Print Line Image
     204 - 336   Print Line Image

o Some specific layouts:

     001 - 006   "RPTAA1"
     007 - 012   Customer Number
     013 - 018   Date of Transaction
     019 - 030   Security Id Number
     031 - 100   Blanks
     101 - 110   Number of Shares
     111 - 116   Transfer Agent
     117 - 122   Clearing Date
     123 - 200   Blanks
     201 - 203   Length of Print Line Image
     204 - 336   Print Line Image

     001 - 006   "RPTAA2"
     007 - 012   Date of Transaction
     013 - 018   Customer Number
     019 - 024   Transfer Agent
     025 - 100   Blanks
     101 - 130   Registration Number
     131 - 140   Number of Shares
     141 - 146   Clearing Date
     147 - 200   Blanks
     201 - 203   Length of Print Line Image
     204 - 336   Print Line Image
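
The three phases of that reporting system can be sketched in Python
using the layouts above. This is a reconstruction for illustration,
not the original code: the function names, the dictionary field names,
the sample transactions, and the format of the print lines are all
made up, and an in-memory list sort stands in for the release and
return calls to the sort.

```python
KEY_LEN, DATA_LEN, LINE_LEN = 100, 100, 133  # columns 1-100, 101-200, 204-336


def build_record(report_id, sort_fields, data_fields, print_line):
    """Lay out one 336-character record: key, data, line length, line image."""
    key = (report_id + "".join(sort_fields)).ljust(KEY_LEN)
    data = "".join(data_fields).ljust(DATA_LEN)
    return key + data + f"{len(print_line):03d}" + print_line.ljust(LINE_LEN)


def rptaa1_input(t):
    # Key order: customer number, transaction date, security id.
    line = f"{t['cust']}  {t['date']}  {t['sec']}  {t['shares']:>10}"
    return build_record("RPTAA1", [t["cust"], t["date"], t["sec"]],
                        [f"{t['shares']:010d}", t["agent"], t["clear"]], line)


def rptaa2_input(t):
    # Key order: transaction date, customer number, transfer agent.
    line = f"{t['date']}  {t['cust']}  {t['agent']}  {t['shares']:>10}"
    return build_record("RPTAA2", [t["date"], t["cust"], t["agent"]],
                        [t["reg"].ljust(30), f"{t['shares']:010d}", t["clear"]],
                        line)


# Where each report keeps its share count (0-based slices of the record).
SHARES_AT = {"RPTAA1": slice(100, 110), "RPTAA2": slice(130, 140)}


def run_reports(transactions):
    # Phase 1: every input routine sees every transaction and releases a record.
    records = [make(t) for t in transactions
               for make in (rptaa1_input, rptaa2_input)]
    # Phase 2: one sort on columns 1-100 groups by report id, then orders
    # the records within each report.
    records.sort(key=lambda rec: rec[:KEY_LEN])
    # Phase 3: dispatch each returned record on the report id in columns 1-6.
    reports = {}
    for rec in records:
        rid = rec[:6]
        length = int(rec[200:203])
        lines, total = reports.get(rid, ([], 0))
        lines.append(rec[203:203 + length])
        reports[rid] = (lines, total + int(rec[SHARES_AT[rid]]))
    return reports


SAMPLE = [  # made-up transactions, for illustration only
    {"cust": "000200", "date": "860301", "sec": "SEC000000001", "shares": 50,
     "agent": "AGT001", "clear": "860305", "reg": "REG-1"},
    {"cust": "000100", "date": "860302", "sec": "SEC000000002", "shares": 25,
     "agent": "AGT002", "clear": "860306", "reg": "REG-2"},
]
```

Calling run_reports(SAMPLE) gives, for each report id, the print lines
in that report's own order plus a running share total. The point of
the design is that a single sort serves every report: the report id
leads the key, so each report's records come back together, and the
rest of the key can mean something different for each report.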