@c PSPP - a program for statistical analysis.
@c Copyright (C) 2019 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".

@node Encrypted File Wrappers
@chapter Encrypted File Wrappers

SPSS 21 and later can package multiple kinds of files inside an
encrypted wrapper. The wrapper has a common format, regardless of the
kind of file that it contains.

The SPSS encryption wrapper is poorly designed. The key is derived
from the password with a single CMAC computation over a fixed
constant, with no per-file salt and no iteration count, and only the
first 10 bytes of the password matter. As a result, decrypting a file
without knowing its password is much cheaper and faster than it would
be with a well-designed alternative. If you must use this format, use
a randomly generated 10-byte password.

@menu
* Common Wrapper Format::
* Password Encoding::
@end menu

@node Common Wrapper Format
@section Common Wrapper Format

An encrypted file wrapper begins with the following 36-byte header,
where @i{xxx} identifies the type of file encapsulated: @code{SAV} for
a system file, @code{SPS} for a syntax file, or @code{SPV} for a
viewer file. PSPP code for identifying these files checks only for
the @code{ENCRYPTED} keyword at offset 8, but the other bytes are also
fixed as shown.

@example
0000 1c 00 00 00 00 00 00 00 45 4e 43 52 59 50 54 45 |........ENCRYPTE|
0010 44 @i{xx} @i{xx} @i{xx} 15 00 00 00 00 00 00 00 00 00 00 00 |D@i{xxx}............|
0020 00 00 00 00 |....|
@end example
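
The identification check can be sketched in Python as follows. This is a minimal sketch that assumes the first 36 bytes of the file have already been read. Like PSPP, it looks for @code{ENCRYPTED} at offset 8. It then reads the three type letters at offsets 17 through 19. The names @code{wrapped_file_type} and @code{KNOWN_TYPES} are hypothetical.

```python
HEADER_SIZE = 36  # length of the fixed wrapper header, in bytes
KNOWN_TYPES = ("SAV", "SPS", "SPV")  # system, syntax, and viewer files


def wrapped_file_type(header: bytes):
    """Return "SAV", "SPS" or "SPV" for an encrypted wrapper header.

    Returns None if the ENCRYPTED keyword is missing from offset 8, or if
    the three type letters at offsets 17..19 are not one of KNOWN_TYPES.
    """
    if len(header) < HEADER_SIZE or header[8:17] != b"ENCRYPTED":
        return None
    kind = header[17:20].decode("ascii", errors="replace")
    return kind if kind in KNOWN_TYPES else None
```

For example, a header consisting of 0x1c, seven zero bytes, @code{ENCRYPTEDSAV}, 0x15 and fifteen zero bytes is reported as @code{"SAV"}.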

The fixed header is followed by what is essentially the regular
contents of the encapsulated file in its usual format, with each
16-byte block encrypted independently with AES-256 in ECB mode.

To make the plaintext an exact multiple of 16 bytes in length, the
encryption process appends PKCS #7 padding, as specified in RFC 5652
section 6.3. Padding appends 1 to 16 bytes to the plaintext, and each
padding byte has the value of the number of padding bytes added. For
example, if the plaintext is 2 bytes short of a multiple of 16, the
padding is 2 bytes with value 0x02. If the plaintext is already a
multiple of 16 bytes in length, the padding is 16 bytes with value
0x10.
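
The padding rule can be sketched in Python as follows. @code{pkcs7_pad} and @code{pkcs7_unpad} are hypothetical helper names. The validity check in @code{pkcs7_unpad} is the same well-formedness test that the Checking Passwords section discusses.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes) -> bytes:
    """Append 1 to 16 padding bytes, each equal to the number of bytes added."""
    n = BLOCK_SIZE - len(data) % BLOCK_SIZE  # 1..16; a full block if already aligned
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes) -> bytes:
    """Remove PKCS #7 padding, raising ValueError if it is not well formed."""
    if not data or len(data) % BLOCK_SIZE:
        raise ValueError("length is not a positive multiple of 16")
    n = data[-1]
    if not 1 <= n <= BLOCK_SIZE or data[-n:] != bytes([n]) * n:
        raise ValueError("malformed PKCS #7 padding")
    return data[:-n]
```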

The AES-256 key is derived from a password in the following way:

@enumerate
@item
Start from the literal password typed by the user. Truncate it to at
most 10 bytes, then append as many null bytes as necessary until there
are exactly 32 bytes. Call this @var{password}.

@item
Let @var{constant} be the following 73-byte constant:

@example
0000 00 00 00 01 35 27 13 cc 53 a7 78 89 87 53 22 11
0010 d6 5b 31 58 dc fe 2e 7e 94 da 2f 00 cc 15 71 80
0020 0a 6c 63 53 00 38 c3 38 ac 22 f3 63 62 0e ce 85
0030 3f b8 07 4c 4e 2b 77 c7 21 f5 1a 80 1d 67 fb e1
0040 e1 83 07 d8 0d 00 00 01 00
@end example

@item
Compute CMAC-AES-256(@var{password}, @var{constant}), that is, the
AES-256 CMAC of @var{constant} using @var{password} as the key. Call
the 16-byte result @var{cmac}.

@item
The 32-byte AES-256 key is @var{cmac} || @var{cmac}, that is,
@var{cmac} repeated twice.
@end enumerate

For example, consider the password @samp{pspp}. @var{password} is:

@example
0000 70 73 70 70 00 00 00 00 00 00 00 00 00 00 00 00 |pspp............|
0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
@end example

@noindent
@var{cmac} is:

@example
0000 3e da 09 8e 66 04 d4 fd f9 63 0c 2c a8 6f b0 45
@end example

@noindent
The AES-256 key is @var{cmac} repeated twice:

@example
0000 3e da 09 8e 66 04 d4 fd f9 63 0c 2c a8 6f b0 45
0010 3e da 09 8e 66 04 d4 fd f9 63 0c 2c a8 6f b0 45
@end example

@menu
* Checking Passwords::
@end menu

@node Checking Passwords
@subsection Checking Passwords

A program reading an encrypted file may wish to verify that the
password it was given is the correct one. One way is to verify that
the PKCS #7 padding at the end of the file is well formed. However,
any plaintext that ends in byte 0x01 is well-formed PKCS #7. A wrong
key yields an essentially random final byte, so about 1 in 256 wrong
keys will falsely pass this test. This might be acceptable for
interactive use, but the false positive rate is too high for a
brute-force search of the password space.

A better test requires some knowledge of the file format being
wrapped, to obtain a ``magic number'' for the beginning of the file.

@itemize @bullet
@item
The plaintext of system files begins with @code{$FL2@@(#)} or
@code{$FL3@@(#)}.

@item
Before encryption, a syntax file is prefixed with a line of the form
@code{* Encoding: @var{encoding}.}, where @var{encoding} is the
encoding used for the rest of the file, e.g.@: @code{windows-1252}.
Thus, @code{* Encoding} may be used as a magic number for syntax
files.

@item
The plaintext of viewer files begins with 50 4b 03 04 14 00 08 (50 4b
is @code{PK}).
@end itemize
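
The magic-number test can be sketched in Python as follows. This minimal sketch assumes the caller has already decrypted the first 16-byte block of the encapsulated file with the candidate key. Every magic number above fits within that block. The name @code{plausible_plaintext} and the three prefix constants are hypothetical.

```python
SYSTEM_FILE_MAGIC = (b"$FL2\x40(#)", b"$FL3\x40(#)")  # \x40 is an at sign
SYNTAX_FILE_MAGIC = (b"* Encoding",)
VIEWER_FILE_MAGIC = (bytes.fromhex("50 4b 03 04 14 00 08"),)  # starts with "PK"


def plausible_plaintext(kind: str, first_block: bytes) -> bool:
    """Return True if a decrypted first block starts with the magic number
    expected for the wrapped file type: "SAV", "SPS" or "SPV"."""
    if kind == "SAV":
        prefixes = SYSTEM_FILE_MAGIC
    elif kind == "SPS":
        prefixes = SYNTAX_FILE_MAGIC
    elif kind == "SPV":
        prefixes = VIEWER_FILE_MAGIC
    else:
        raise ValueError("unknown wrapped file type: " + kind)
    return first_block.startswith(prefixes)
```

If a wrong key produces an essentially random first block, then even the shortest prefix (the 7-byte viewer-file magic) matches by chance only about once in 2@sup{56} tries. That is far less often than the padding check.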

@node Password Encoding
@section Password Encoding

SPSS also supports what it calls ``encrypted passwords.'' These are
not actually encrypted, only encoded with a simple, fixed scheme. An
encoded password is always a multiple of 2 characters long and never
longer than 20 characters, so it decodes to at most 10 bytes. The
characters in an encoded password are always in the graphic ASCII
range 33 through 126. Each successive pair of characters in the
password encodes a single byte in the plaintext password.

Use the following algorithm to decode a pair of characters:

@enumerate
@item
Let @var{a} be the ASCII code of the first character, and @var{b} be
the ASCII code of the second character.

@item
Let @var{ah} be the most significant 4 bits of @var{a}. Find the line
in the table below that has @var{ah} on the left side. The left side
lists, as hexadecimal digits, each value that selects that line. The
right side of the line is a set of possible values for the most
significant 4 bits of the decoded byte.

@display
@t{2 } @result{} @t{2367}
@t{3 } @result{} @t{0145}
@t{47} @result{} @t{89cd}
@t{56} @result{} @t{abef}
@end display

@item
Let @var{bh} be the most significant 4 bits of @var{b}. Find the line
in the table below that has @var{bh} on the left side. The right side
of the line is a set of possible values for the most significant 4
bits of the decoded byte. Together with the result of the previous
step, this leaves only a single possibility.

@display
@t{2 } @result{} @t{139b}
@t{3 } @result{} @t{028a}
@t{47} @result{} @t{46ce}
@t{56} @result{} @t{57df}
@end display

@item
Let @var{al} be the least significant 4 bits of @var{a}. Find the
line in the table below that has @var{al} on the left side. The right
side of the line is a set of possible values for the least significant
4 bits of the decoded byte.

@display
@t{03cf} @result{} @t{0145}
@t{12de} @result{} @t{2367}
@t{478b} @result{} @t{89cd}
@t{569a} @result{} @t{abef}
@end display

@item
Let @var{bl} be the least significant 4 bits of @var{b}. Find the
line in the table below that has @var{bl} on the left side. The right
side of the line is a set of possible values for the least significant
4 bits of the decoded byte. Together with the result of the previous
step, this leaves only a single possibility.

@display
@t{03cf} @result{} @t{028a}
@t{12de} @result{} @t{139b}
@t{478b} @result{} @t{46ce}
@t{569a} @result{} @t{57df}
@end display
@end enumerate
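
The four table lookups translate directly into code. The following minimal Python sketch stores each table as (left side, right side) pairs of hexadecimal digit strings, exactly as printed above. The names @code{decode_pair}, @code{decode_password} and the table constants are hypothetical.

```python
# Each table: (hex digits selecting the line, candidate nibbles for the result).
HIGH_A = (("2", "2367"), ("3", "0145"), ("47", "89cd"), ("56", "abef"))
HIGH_B = (("2", "139b"), ("3", "028a"), ("47", "46ce"), ("56", "57df"))
LOW_A = (("03cf", "0145"), ("12de", "2367"), ("478b", "89cd"), ("569a", "abef"))
LOW_B = (("03cf", "028a"), ("12de", "139b"), ("478b", "46ce"), ("569a", "57df"))


def _candidates(table, nibble: int) -> set:
    """Find the line whose left side contains nibble; return its right side."""
    digit = "%x" % nibble
    for left, right in table:
        if digit in left:
            return set(int(c, 16) for c in right)
    raise ValueError("character outside the encoded-password range")


def decode_pair(first: str, second: str) -> int:
    """Decode one pair of encoded-password characters into a single byte."""
    a, b = ord(first), ord(second)
    high = _candidates(HIGH_A, a >> 4) & _candidates(HIGH_B, b >> 4)
    low = _candidates(LOW_A, a & 0xF) & _candidates(LOW_B, b & 0xF)
    if len(high) != 1 or len(low) != 1:
        raise ValueError("ambiguous or invalid character pair")
    return (high.pop() << 4) | low.pop()


def decode_password(encoded: str) -> bytes:
    """Decode a whole encoded password, two characters per byte."""
    if len(encoded) % 2 or len(encoded) > 20:
        raise ValueError("encoded passwords have an even length of at most 20")
    return bytes(decode_pair(encoded[i], encoded[i + 1])
                 for i in range(0, len(encoded), 2))
```

Decoding @samp{-|} with this sketch gives 0x62, which matches the worked example that follows.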

For example, consider the encoded character pair @samp{-|}. @var{a}
is 0x2d and @var{b} is 0x7c, so @var{ah} is 2, @var{bh} is 7, @var{al}
is 0xd, and @var{bl} is 0xc. @var{ah} means that the most significant
four bits of the decoded character are 2, 3, 6, or 7, and @var{bh}
means that they are 4, 6, 0xc, or 0xe. The single possibility in
common is 6, so the most significant four bits are 6. Similarly,
@var{al} means that the least significant four bits are 2, 3, 6, or 7,
and @var{bl} means they are 0, 2, 8, or 0xa, so the least significant
four bits are 2. The decoded character is therefore 0x62, the letter
@samp{b}.