# SAS to Python: Read Binary Numerical Data

In this article we will discuss the difference in reading binary numerical data between SAS and Python.

## Byte Order

Integer values for integer binary data are typically stored in one of three sizes: 1 byte, 2 bytes, or 4 bytes. The ordering of the bytes for the integer varies depending on the platform (operating environment) on which the integers were produced. There are two types of byte orders, “big-endian” and “little-endian”. In short, the main difference lies in where most significant byte is stored. For example, if an integer is stored in 4 bytes, then 2 is stored as 10 00 00 00 in little-endian and as 00 00 00 10 in big-endian. In this article we will just focus on the little-endian, since it is used in Windows and Linux.

## SAS

SAS uses informats to read the binary data and the corresponding formats for writing the binary data. The common informats are IB*w.d*, PIB*w.d*.

**IB w.d**: for integer binary values.

*w*specifies the width of the input fields and

*d*is a scaling factor and specifies the power of 10 that divides the integer.

**PIB w.d**: for positive integer binary values.

Note that when using

`infile`

statement to open a binary file, one has to specify option`recfm`

to value`n`

so that SAS knows the file is a binary file.

## Python

Use `open(file_name, 'rb')`

to open the binary file, remember to include `b`

to indicate a binary file. There are two ways commonly used in Python to read binary numerical data. The first one is to use function `int.from_bytes()`

. It is simple and quick. The following example converts the bytes b’\xf8’ to the integer 248.

However, it needs a loop if one is converting a series of bytes. In such case, it is easier to use the built in package `struct`

. The specific function used to read binary data is `struct.unpack`

. Let’s say there is a stream of bytes containing 4 integers each of which is stored in 4 bytes. The following one command can read all 4 integers at the same time.

It returns a tuple of 4 integers. Here `iiii`

is the format string similar to informats/formats in SAS, which tells Python how the integers are stored. It can also be written as `'i'*4`

. The following table compares some formats with SAS informats.

Format | Python Type | Standard Size | SAS Informat/Format |
---|---|---|---|

b | integer | 1 | IB1. |

B | integer | 1 | PIB1. |

h | integer | 2 | IB2. |

H | integer | 2 | PIB2. |

i | integer | 4 | IB4. |

I | integer | 4 | PIB4. |

q | integer | 8 | IB8. |

Q | integer | 8 | PIB8. |

## Example

We will be using this binary file. Feel free to download the file for your own test. The file has 21 bytes or 9 integers. The first 4 integers are stored in 4 bytes each and the last 5 integers are store in 1 byte each.

- SAS

- Python