Micro Focus File Formats
Data Management Series
This document describes the various data file formats (or structures) used by Micro Focus. The intent is to provide an overview of the different file structures. For a detailed description of the various file structures or file systems refer to the Micro Focus documentation.
This document is intended to provide information to individuals that are migrating an application, migrating data files or sharing data files between an IBM Mainframe and a Linux, UNIX or Windows system using Micro Focus technologies. This includes COBOL, Assembler, JCL and other utility programs such as IDCAMS and SORT. Therefore, we have included information about the COBOL compiler directives and run time configuration specifications used for programs that process the data files.
A special "Thank you" to Larry Simmons of Micro Focus for providing much of the information that is presented in this series of white papers and sample programs.
We have made a significant effort to ensure the documents and software technologies are correct and accurate. We reserve the right to make changes without notice at any time. The function delivered in this version is based upon the enhancement requests from a specific group of users. The intent is to provide changes as the need arises and in a timeframe that is dependent upon the availability of resources.
Copyright © 1987-2019
SimoTime Technologies and Services
All Rights Reserved
This section describes various file formats and the file handling environments used by Micro Focus. For the development environment the base file handler may be the preferred environment. For the Test and Production environment the Micro Focus File Share running on a separate server may be the preferred environment for Indexed Files.
Microsoft supports several disk formats. FAT (File Allocation Table) is the older technology used prior to Windows 2000, and many external storage devices still ship formatted as FAT. The FAT format has a limit of 2 gigabytes per file. The FAT32 format raised the limit to 4 gigabytes per file. The NTFS format removed the 4-gigabyte limit.
Many external USB storage devices ship with FAT. This is the lowest common denominator for moving data between platforms and is supported by Windows and UNIX systems. Micro Focus files that are smaller than 2 gigabytes may easily be moved across the three disk formats. Files greater than 2 gigabytes and less than 4 gigabytes in size may easily be moved between FAT32 and NTFS. Files that are larger than 4 gigabytes require the NTFS format.
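The size rules above can be expressed as a small lookup. The following is a minimal sketch only; the function name `usable_formats` is hypothetical, and it assumes that "gigabyte" means 2**30 bytes and that a file exactly at a limit does not fit.

```python
GIB = 1024 ** 3  # assumption: "gigabyte" is read as 2**30 bytes

# Per-file size limits as stated in the text (None means no such limit).
FORMAT_LIMITS = [("FAT", 2 * GIB), ("FAT32", 4 * GIB), ("NTFS", None)]


def usable_formats(file_size: int) -> list[str]:
    """Return the disk formats that can hold a file of file_size bytes."""
    return [
        name
        for name, limit in FORMAT_LIMITS
        if limit is None or file_size < limit
    ]


if __name__ == "__main__":
    for size in (1 * GIB, 3 * GIB, 5 * GIB):
        print(size, usable_formats(size))
```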
Refer to Supporting Large Files in a Micro Focus Environment for additional information about large files in the Micro Focus environment.
The sequential files may be divided into two groups, Line Sequential and Record Sequential. The Line Sequential files are associated with ASCII/Text files and usually contain variable length records with a record separator value (this may be a one or two byte value) between each record. Depending on how a Line Sequential file is created it could have fixed length records with spaces as the trailing pad characters.
A Record Sequential file may contain fixed-length records or variable-length records. A record sequential file with fixed-length records is a series of concatenated records or data strings of a predefined length. A record sequential file with variable-length records is a series of concatenated records or data strings of varying lengths, each preceded by a record descriptor word (RDW) that defines the length of the record. For a file with variable-length records, a header record is placed at the start of the file when the file is created.
The two types of sequential files are discussed in more detail in the following sections of this document.
Line Sequential files are ASCII/Text files. Depending on how the file is created it may have fixed or variable length records. The separation of the individual records is maintained by the use of delimiter bytes between the records. For Windows this is usually a two byte value consisting of a Carriage-Return and Line-Feed (or CRLF); the hexadecimal notation is x'0D' and x'0A'. For UNIX systems this is usually a one byte value consisting of a Line-Feed (or LF); the hexadecimal notation is x'0A'.
The record content for Line Sequential files should be display or print text using the ASCII encoding format. Hence the name ASCII/Text files. These files should not contain packed or binary data strings.
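As an illustration of the delimiter handling described above, the following minimal sketch splits the content of a Line Sequential file into records, accepting either the CRLF or the LF delimiter. The function name `split_line_sequential` is hypothetical and the sketch assumes the content is pure ASCII text.

```python
def split_line_sequential(data: bytes) -> list[str]:
    """Split Line Sequential file content into records.

    Accepts either the Windows (CRLF, x'0D0A') or the UNIX (LF, x'0A')
    record delimiter and strips it from each record.
    """
    records = []
    for raw in data.split(b"\n"):
        if raw.endswith(b"\r"):
            raw = raw[:-1]  # drop the Carriage-Return of a CRLF pair
        records.append(raw.decode("ascii"))
    # A trailing delimiter leaves one empty element after the last record.
    if records and records[-1] == "":
        records.pop()
    return records


if __name__ == "__main__":
    print(split_line_sequential(b"RECORD ONE\r\nRECORD TWO\r\n"))
    print(split_line_sequential(b"RECORD ONE\nRECORD TWO\n"))
```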
A Record Sequential file may contain fixed-length records or variable-length records. The record sequential files with fixed-length records are a series of concatenated records or data strings of a predefined length without record separator values between each record. The first byte of the first record starts at the first byte of the file.
A record sequential file with variable-length records is a series of concatenated records or data strings of varying lengths without record separator values between each record. Each record is preceded by a record descriptor word (RDW) that defines the length of the record that follows. A Micro Focus header record is placed at the start of the file when the file is created.
Data is stored in records with predefined, fixed lengths concatenated into a contiguous string of data from the beginning to the end of the file. The file size must be a multiple of the record length. Record separator bytes are not used. There is no Micro Focus header record. The record length is determined by the record definition in the COBOL program. When using the MFE or ES/MTO Batch Facility and the file utility programs, the record length is determined from the catalog or the LRECL value included with the JCL. The Data File Converter (DFCONV) uses a filename.PRO file to store the file information. If the .PRO file does not exist the user is prompted for the file information and this information is then stored in a .PRO file.
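Because there are no separators and no header, reading such a file only requires the record length. The following is a minimal sketch, not a Micro Focus utility; the function name `read_fixed_records` is hypothetical, and the record length is assumed to be supplied by the caller (for example from the COBOL record definition or the LRECL value).

```python
def read_fixed_records(data: bytes, record_length: int) -> list[bytes]:
    """Slice the content of a fixed-length Record Sequential file into records."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    # The file size must be a multiple of the record length.
    if len(data) % record_length != 0:
        raise ValueError(
            f"file size {len(data)} is not a multiple of {record_length}"
        )
    return [
        data[offset:offset + record_length]
        for offset in range(0, len(data), record_length)
    ]


if __name__ == "__main__":
    for record in read_fixed_records(b"AAAABBBBCCCC", 4):
        print(record)
```

The records are returned as raw bytes because they may contain EBCDIC text or packed and binary fields that must not be treated as ASCII text.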
The record content for Record Sequential files may contain text strings that are ASCII or EBCDIC-encoded and numeric values that use a packed-decimal, zoned-decimal or binary format.
Data is stored in records with variable lengths. A Record Descriptor Word (RDW) of two (2) bytes prefixes each record. A 128-byte header record is at the beginning of the file. This header record is followed by a combination of the RDW and record that may have trailing low-values to keep records aligned on a word boundary. The records are then concatenated into a contiguous string of data from the beginning to the end of the file.
The mainframe variable length files that are transferred via FTP from the mainframe need special handling in order to get the RDW information to precede the variable length records. The RDW for mainframe variable length files is four (4) bytes with the first two (2) bytes being the record length. An optional Block Descriptor Word (BDW) may also be present in a mainframe variable length file. The BDW is four (4) bytes in length. If the same FTP syntax that is used for fixed-length records is used to transfer variable length records the file will be transferred without the RDW information. Special FTP statements are needed to have the RDW information included.
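Once a mainframe file has been transferred with its RDW information intact, the records can be separated by walking the RDWs. The following is a minimal sketch under stated assumptions: the function name `split_rdw_records` is hypothetical, no BDW is present, the two length bytes are big-endian, and the length counts the four RDW bytes themselves (the usual mainframe convention, which is not stated in this document).

```python
import struct


def split_rdw_records(data: bytes) -> list[bytes]:
    """Split a string of RDW-prefixed mainframe records into record payloads.

    Assumptions: 4-byte RDW, first 2 bytes are a big-endian length that
    includes the 4 RDW bytes, no Block Descriptor Word (BDW) present.
    """
    records = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {offset}")
        (length,) = struct.unpack_from(">H", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError(f"invalid record length {length} at offset {offset}")
        records.append(data[offset + 4:offset + length])
        offset += length
    return records


if __name__ == "__main__":
    sample = b"\x00\x07\x00\x00ABC" + b"\x00\x05\x00\x00Z"
    print(split_rdw_records(sample))
```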
The record content for Record Sequential files may contain text strings that are ASCII or EBCDIC-encoded and numeric values that use a packed-decimal, zoned-decimal or binary format.
|Micro Focus File Formats|
The IDXFORMAT(8) is another popular format and is required if the files are larger than 2 gigabytes. With the IDXFORMAT(8) a single file contains both the data and the indices (primary key and alternate keys). The primary key must be unique and the alternate keys may contain duplicates.
This type of file may be EBCDIC or ASCII encoded and may contain packed, binary or floating-point fields. When converting this type of file between ASCII and EBCDIC it is necessary to read the original file and create a new file since the primary key value will be changing. The alternate indices for the new file should be rebuilt after the file has been created in its ASCII-encoded format. Additional time should be allocated to provide for configuring and managing index files under the control of Micro Focus File Share. Additional information for index files may be found in the Micro Focus documentation.
Support for VSAM Data Sets is provided using the various physical file technologies provided by Micro Focus.
The VSAM, Entry-Sequenced-Data-Set (or ESDS) is supported in the Linux, Unix and Windows environments. The underlying physical file technology used for this support is dependent on the existence and use of an alternate index.
When no alternate index is used, the VSAM, Entry-Sequenced-Data-Set (or ESDS) is supported in the Linux, Unix and Windows environments using the Micro Focus Sequential File Support.
When an alternate index is used, the VSAM, Entry-Sequenced-Data-Set (or ESDS) is supported in the Linux, Unix and Windows environments using the Micro Focus Indexed File Support.
The VSAM, Key-Sequenced-Data-Set (or KSDS) is supported in the Linux, Unix and Windows environments using the Micro Focus Indexed File Support. This support includes a primary key (or index) without duplicate keys. Alternate indices may be defined with or without duplicate keys.
The VSAM, Linear-Data-Set (or LDS) is not supported in a Micro Focus environment. Since this format is used for "System-oriented" functions that require access via assembler programs it is rarely found in an application environment. We have never encountered this format in an application migration project.
The VSAM, Relative-Record-Data-Set (or RRDS) is supported in the Linux, Unix and Windows environments using the Micro Focus Relative File Support.
Generation Data Groups (or GDG's) are sequential files. At the start of job execution the system resolves all the relative GDG references in the job stream. For example, a generation that is created as (+1) in one step must still be referenced as (+1) in subsequent steps of that job.
However, there is no renaming of the actual dataset names done at EOJ. Given this scenario:
|Sample Scenario for GDG Processing|
At EOJ, the G0001V00 dataset will "roll off" and be disassociated from the GDG. It may be deleted as well depending on whether SCRATCH or NOSCRATCH has been specified for the GDG. So the datasets associated with the GDG at EOJ will actually be G0002V00, G0003V00, and G0004V00 (keeping in mind that G0001V00 may still exist and be in the catalog). The highest numbered G????V00 datasets that exist at EOJ will remain associated with the GDG as governed by the LIMIT.
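The roll-off behavior described above can be modeled in a few lines. This is an illustrative sketch only; the function names are hypothetical, a LIMIT of 3 is assumed to match the dataset names mentioned in the text, and the wrap-around of generation numbers is ignored.

```python
def generation_name(number: int) -> str:
    """Format a generation number as the G????V00 dataset name suffix."""
    return f"G{number:04d}V00"


def add_generation(active: list[int], limit: int) -> tuple[list[int], list[int]]:
    """Create the (+1) generation and apply the LIMIT at end of job.

    Returns (generations still associated with the GDG, generations rolled off).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    new_number = max(active) + 1 if active else 1
    generations = sorted(active) + [new_number]
    # The highest numbered generations stay associated, as governed by LIMIT.
    return generations[-limit:], generations[:-limit]


if __name__ == "__main__":
    kept, rolled = add_generation([1, 2, 3], limit=3)
    print("associated:", [generation_name(n) for n in kept])
    print("rolled off:", [generation_name(n) for n in rolled])
```

Whether a rolled-off dataset is also deleted depends on SCRATCH or NOSCRATCH, which this sketch does not model.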
For more information about GDG's refer to the Downloads and Links to Similar Pages section of this document.
A Partitioned Data Set (or PDS) is a data structure used in a mainframe environment. A PDS contains one or more sequential files that are called members. A PDS may be referred to as a library and they are normally used to group together members that serve a common purpose. For example, a PDS that contains JCL source members may be referred to as a JCL Library.
For more information about PDS's refer to the Downloads and Links to Similar Pages section of this document.
Creating the catalog is accomplished the first time a server is started. Populating the catalog (or creating catalog entries) for a new Enterprise Server, Mainframe Sub-System environment is a time sensitive, critical task. This task needs to be implemented at the beginning of projects that require the moving or migration of non-relational data files (this includes VSAM Data Sets). The process for creating and populating a catalog should be setup as a repeatable process from the beginning.
The Micro Focus Catalog serves as a repository for the following items.
|Functions of the Micro Focus Catalog|
The recommended sequence of events for creating catalog entries is as follows.
|Recommended Sequence of Events for Creating Catalog Entries|
The Micro Focus FILETYPE(n) compiler directive is used to define the file type to be processed by a COBOL program. The following matches the integer to the file format description.
|FILETYPE Compiler Directive|
Micro Focus provides a wide range of directives to support many of the mainframe file, record and field formats along with processing techniques that coincide with mainframe behavior.
A typical report file on the Mainframe is an EBCDIC-encoded, record sequential file of 133 byte records with the first byte being used for carriage control. With Micro Focus Mainframe Express (MFE) this format is maintained.
With Micro Focus Net Express a typical report file is an ASCII-encoded, line sequential file with embedded carriage control characters.
Since the files are all printable text it is easy to convert the encoding between EBCDIC and ASCII before attempting a file compare. However, the difference in the file formats will make it difficult to do a simple file compare. To solve this problem the report files need to be created in the mainframe format. To do this it is necessary to use the FILETYPE(11) and ADV compiler directives. This will cause Net Express to create the report files using the mainframe format. After converting one of the files between EBCDIC and ASCII the files may be easily compared.
Now, we must address the task of actual printing or storing in a repository. If the files are to be printed then we must recompile the programs without the FILETYPE(11) and ADV directives or obtain a utility program from Micro Focus that will read a mainframe formatted file and write a PC or UNIX line sequential file with embedded carriage control.
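To give a sense of what such a conversion utility does, the following minimal sketch turns mainframe-format report records into line sequential text with embedded control characters. It is not the Micro Focus utility. The function name is hypothetical, and the sketch assumes the records have already been converted to ASCII and that the first byte follows the ASA carriage control convention (space, 0, -, + and 1), which this document does not state.

```python
def asa_to_line_sequential(records: list[str]) -> str:
    """Convert records with an ASA control character in column 1 to text.

    Assumed ASA codes: ' ' single space, '0' double space, '-' triple space,
    '+' overprint the previous line, '1' skip to a new page.
    """
    out = ""
    for record in records:
        control, text = record[:1], record[1:].rstrip(" ")
        if control == "1":
            out += "\f"          # form feed before the line
        elif control == "0":
            out += "\n"          # one blank line before the line
        elif control == "-":
            out += "\n\n"        # two blank lines before the line
        elif control == "+" and out.endswith("\n"):
            out = out[:-1] + "\r"  # return without advancing
        out += text + "\n"
    return out


if __name__ == "__main__":
    report = ["1MONTHLY REPORT   ", " DETAIL LINE 1", "0TOTAL LINE"]
    print(repr(asa_to_line_sequential(report)))
```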
On the mainframe many reports are never printed (or rarely printed) but are placed in a repository for online viewing. This repository is usually managed by a separate, third-party software package. Many of the report management software vendors have a Windows or UNIX version of their software.
Note: this section addresses batch printing. If a terminal printer under the control of CICS is used then the print format maps to the terminal printer specifications and may require additional attention when migrating between the EBCDIC and ASCII environments.
A Comma-Separated-Values (or CSV) file is usually a sequential file with a specific record structure. For the Linux, UNIX and Windows (or LUW) environments a CSV file is usually an ASCII/Text file with variable length records.
The record structure is a number of text strings (or alphanumeric or numeric values) with each text string separated by a delimiter byte. For alphanumeric values the trailing spaces are normally truncated; for numeric values the leading zeroes are normally truncated. The delimiter byte by default is a comma, hence the name Comma-Separated-Values.
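The truncation rules above can be sketched as follows. This is a simplified illustration with hypothetical function names; it assumes unsigned integer numeric values and does not quote values that themselves contain the delimiter.

```python
def trim_alphanumeric(value: str) -> str:
    """Truncate the trailing spaces of an alphanumeric value."""
    return value.rstrip(" ")


def trim_numeric(value: str) -> str:
    """Truncate the leading zeroes of an unsigned integer value."""
    stripped = value.lstrip("0")
    return stripped if stripped else "0"  # keep a single zero for all-zero fields


def build_csv_record(fields: list[tuple[str, bool]], delimiter: str = ",") -> str:
    """Build one CSV record from (value, is_numeric) pairs."""
    parts = [
        trim_numeric(value) if is_numeric else trim_alphanumeric(value)
        for value, is_numeric in fields
    ]
    return delimiter.join(parts)


if __name__ == "__main__":
    fixed_fields = [("SMITH     ", False), ("0001250", True), ("0000000", True)]
    print(build_csv_record(fixed_fields))
```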
For more information about CSV files refer to the Downloads and Links to Similar Pages section of this document.
During a data migration from a mainframe system to a Windows or UNIX system many of the files may contain records with a variety of fields (or data strings) of various numeric formats that are used by the mainframe or the COBOL programming language. The following links provide additional information about numeric field formats.
The following list provides external links to reference material and examples for the various types of numeric formats used with the COBOL programming language and/or the Mainframe System.
|Additional Information for the various Types of Numeric Formats used with COBOL|
This section describes various Micro Focus compiler directives that may be required to control program behavior in the Linux, UNIX or Windows environments in a manner compliant with the compiler options and subsequent execution on the Mainframe System. The following directives will affect the way programs process and format numeric fields. Once the data is stored in a data file on a permanent storage medium the format of the records and their content structure must be maintained.
If a program attempts to access the record buffers defined in the FD section of a COBOL program before the file is opened, the result is a 114 error return code. Normally, this memory area is not allocated or available until after the file is opened. The resulting 114 error can be very time consuming to diagnose. To make the record buffers available before a file is opened, the NOHOSTFD compiler directive must be used.
Management (i.e. processing, storage and retrieval) of the various numeric formats has been and continues to be a challenge on the mainframe. When transferring data files that contain the various numeric formats from the Mainframe to a Windows or UNIX platform the challenges are transferred along with the files. Micro Focus (on the Windows and UNIX platforms) offers a number of COBOL compiler directives to help deal with the challenges of managing the various numeric formats.
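As background for why these formats need care, the following minimal sketch decodes the two most common mainframe numeric formats from raw bytes. The function names are hypothetical. The sketch assumes the standard sign nibbles (x'B' and x'D' negative, other sign values positive), treats every value as an integer, and leaves the implied decimal position of the COBOL PICTURE clause to the caller.

```python
def unpack_packed_decimal(data: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field: two digits per byte, sign last."""
    if not data:
        raise ValueError("empty field")
    digits = []
    for byte in data[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(data[-1] >> 4)
    sign = data[-1] & 0x0F
    if any(digit > 9 for digit in digits) or sign < 0x0A:
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(str(digit) for digit in digits))
    return -value if sign in (0x0B, 0x0D) else value


def unpack_zoned_decimal(data: bytes) -> int:
    """Decode a zoned-decimal field: one digit per byte, sign in the last zone."""
    if not data:
        raise ValueError("empty field")
    value = 0
    for byte in data:
        digit = byte & 0x0F
        if digit > 9:
            raise ValueError("not a valid zoned-decimal field")
        value = value * 10 + digit
    sign = data[-1] >> 4
    return -value if sign in (0x0B, 0x0D) else value


if __name__ == "__main__":
    print(unpack_packed_decimal(bytes.fromhex("12345D")))
    print(unpack_zoned_decimal(bytes.fromhex("F1F2D3")))
```

Fields like these must be left untouched when the text portions of a record are converted between EBCDIC and ASCII, because a byte-for-byte character translation would corrupt them.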
NUMPROC is a mainframe compiler option. When NUMPROC(MIG) is in effect, the compiler generates code that is similar to that produced by OS/VS COBOL. This option can be especially useful if you migrate OS/VS COBOL programs to IBM Enterprise COBOL for z/OS.
Use NUMPROC(MIG) to aid in migrating OS/VS COBOL programs to Enterprise COBOL. When NUMPROC(MIG) is in effect, the following processing occurs at the mainframe:
|When NUMPROC(MIG) is in effect, the preceding processing occurs at the mainframe|
For Micro Focus, specifying a mainframe dialect such as DIALECT(ENTCOBOL) sets other compiler directives for mainframe compatibility. For example, the IBMCOMP and NOTRUNC directives are included when a mainframe dialect is used. This maintains numeric integrity and size for COMP or BINARY fields. In addition, the directives HOSTNUMMOVE, HOSTNUMCOMPARE, SIGNFIXUP, HOSTARITHMETIC and CHECKNUM will emulate the mainframe NUMPROC(NOPFD) behavior.
The sequence in which the directives are specified is also important since some directives will set other directives. For example, the DIALECT directive that specifies a mainframe dialect will set CHARSET(EBCDIC). If the desired encoding is ASCII then the CHARSET(ASCII) directive must follow the DIALECT directive.
|An Overview of Compiler Directives for Numeric Processing|
The file extfh.cfg is the file handler configuration file. By default, the file handler looks for it in the current directory. You can use the EXTFH environment variable to specify its path and/or name explicitly. For example: the following will explicitly define the location of the configuration file.
The following shows the content of a sample EXTFH configuration file.
[XFH-DEFAULT]
filemaxsize=8
idxformat=8
filepointersize=8
INDEXCOUNT=32
IGNORELOCK=ON
READSEMA=OFF
USEVSAMKEYDEFS=OFF
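When reviewing or documenting a configuration, it can help to load the settings into a program. The following is a minimal sketch that reads the sample content with the Python standard library. It only inspects the text, it is not how the Micro Focus file handler itself parses the file, and the function name `load_extfh_settings` is hypothetical.

```python
import configparser

# The sample configuration shown above, one setting per line.
SAMPLE_CONFIG = """\
[XFH-DEFAULT]
filemaxsize=8
idxformat=8
filepointersize=8
INDEXCOUNT=32
IGNORELOCK=ON
READSEMA=OFF
USEVSAMKEYDEFS=OFF
"""


def load_extfh_settings(text: str) -> dict[str, dict[str, str]]:
    """Parse extfh.cfg style text into {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    for section, options in load_extfh_settings(SAMPLE_CONFIG).items():
        print(section)
        for name, value in options.items():
            print(f"  {name} = {value}")
```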
This document describes the various data file formats (or structures) used by Micro Focus. The intent is to provide an overview of the different file structures. This document may be used as a tutorial for new programmers or as a quick reference for experienced programmers.
In the world of programming there are many ways to solve a problem. This document and the links to other documents are intended to provide a greater awareness of the Data Management and Application Processing alternatives.
The documentation and software were developed and tested on systems that are configured for a SIMOTIME environment based on the hardware, operating systems, user requirements and security requirements. Therefore, adjustments may be needed to execute the jobs and programs when transferred to a system of a different architecture or configuration.
SIMOTIME Services has experience in moving or sharing data or application processing across a variety of systems. For additional information about SIMOTIME Services or Technologies please send an e-mail to: email@example.com or call 415 883-6565. We appreciate hearing from you.
Permission to use, copy, modify and distribute this software, documentation or training material for any purpose requires a fee to be paid to SimoTime Technologies. Once the fee is received by SimoTime the latest version of the software, documentation or training material will be delivered and a license will be granted for use within an enterprise, provided the SimoTime copyright notice appear on all copies of the software. The SimoTime name or Logo may not be used in any advertising or publicity pertaining to the use of the software without the written permission of SimoTime Technologies.
SimoTime Technologies makes no warranty or representations about the suitability of the software, documentation or learning material for any purpose. It is provided "AS IS" without any expressed or implied warranty, including the implied warranties of merchantability, fitness for a particular purpose and non-infringement. SimoTime Technologies shall not be liable for any direct, indirect, special or consequential damages resulting from the loss of use, data or projects, whether in an action of contract or tort, arising out of or in connection with the use or performance of this software, documentation or training material.
This section includes links to documents with additional information that are beyond the scope and purpose of this document. The first group of documents may be available from a local system or via an internet connection, the second group of documents will require an internet connection.
Note: A SimoTime License is required for the items to be made available on a local system or server.
The following links may be to the current server or to the Internet.
Note: The latest versions of the SimoTime Documents and Program Suites are available on the Internet and may be accessed using the icon. If a user has a SimoTime Enterprise License the Documents and Program Suites may be available on a local server and accessed using the icon.
Explore the non-Relational Data Connection for more examples of accessing methodologies and coding techniques for Data Files and VSAM Data Sets.
Explore How to Create Catalog Entries for Generation Data Groups (GDG's) and Document the Process.
Explore How to Create Catalog Entries for Partitioned Data Sets (PDS's) and Document the Process. This suite of scripts and documentation is for a Micro Focus Enterprise Server running on a Windows System.
Explore how to Create a File with CSV Formatted Records. The conversion from a Fixed-Field-Length (FFL) format to a Comma-Separated-Values (CSV) format is included in this example.
Explore how to Read and Parse a File with CSV Formatted Records. The conversion from a Comma-Separated-Values (CSV) format to a Fixed-Field-Length (FFL) format is included in this example.
Explore The ASCII and EBCDIC Translation Tables. These tables are provided for individuals that need to better understand the bit structures and differences of the encoding formats.
Explore The File Status Return Codes to interpret the results of accessing VSAM data sets and/or QSAM files.
The following links will require an Internet connection.
A good place to start is The SimoTime Home Page for access to white papers, program examples and product information. This link requires an Internet Connection.
Explore The Micro Focus Web Site for more information about products and services available from Micro Focus. This link requires an Internet Connection.
Explore the Glossary of Terms for a list of terms and definitions used in this suite of documents and white papers.
This document was created and is maintained by SimoTime Technologies. If you have any questions, suggestions, comments or feedback please use the following contact information.
1. Send an e-mail to our helpdesk.
2. Our telephone numbers are as follows.
2.1. 1 415 763-9430 office-helpdesk
2.2. 1 415 827-7045 mobile
We appreciate hearing from you.
SimoTime Technologies was founded in 1987 and is a privately owned company. We specialize in the creation and deployment of business applications using new or existing technologies and services. We have a team of individuals that understand the broad range of technologies being used in today's environments. Our customers include small businesses using Internet technologies to corporations using very large mainframe systems.
Quite often, reaching larger markets or providing a higher level of service to existing customers requires the newer Internet technologies to work in a complementary manner with existing corporate mainframe systems. We specialize in preparing applications and the associated data that are currently residing on a single platform to be distributed across a variety of platforms.
Preparing the application programs will require the transfer of source members that will be compiled and deployed on the target platform. The data will need to be transferred between the systems and may need to be converted and validated at various stages within the process. SimoTime has the technology, services and experience to assist in the application and data management tasks involved with doing business in a multi-system environment.
Whether you want to use the Internet to expand into new market segments or as a delivery vehicle for existing business functions, simply give us a call or check the web site at http://www.simotime.com
|File Formats for The Micro Focus Environment - Data Management Series by SimoTime Enterprises|