From: Ben Pfaff Date: Sun, 1 Dec 2019 23:45:45 +0000 (+0000) Subject: pspp-convert: Support decrypting SPV files. X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=f0827ef96044219fea423d73147937c4c266827b;p=pspp pspp-convert: Support decrypting SPV files. Also, now properly understands and documents the PKCS #7 padding used for all encrypted files. Special thanks to Alan Mead for assistance. --- diff --git a/NEWS b/NEWS index 1cad23fea6..020bc6d148 100644 --- a/NEWS +++ b/NEWS @@ -14,8 +14,15 @@ Changes from 1.2.0 to 1.3.0: * Plain text output is no longer divided into pages, since it is now rarely printed on paper. - * pspp-convert: New "-a", "-l", "--password-list" options to search - for an encrypted file's password. + * pspp-convert: + + - New support to decrypt encrypted viewer (SPV) files. The + encrypted viewer file format is unacceptably insecure, so to + discourage its use PSPP and PSPPIRE do not directly read or write + this format. + + - New "-a", "-l", "--password-list" options to search for an + encrypted file's password. * Improvements to SAVE DATA COLLECTION support for MDD files. diff --git a/doc/dev/encrypted-file-wrappers.texi b/doc/dev/encrypted-file-wrappers.texi index d4fcac207b..c445eec023 100644 --- a/doc/dev/encrypted-file-wrappers.texi +++ b/doc/dev/encrypted-file-wrappers.texi @@ -16,10 +16,10 @@ encrypted wrapper. The wrapper has a common format, regardless of the kind of the file that it contains. @quotation Warning -The SPSS encryption wrapper is poorly designed. It is much cheaper -and faster to decrypt a file encrypted this way than if a well -designed alternative were used. If you must use this format, use a -10-byte randomly generated password. +The SPSS encryption wrapper is poorly designed. When the password is +unknown, it is much cheaper and faster to decrypt a file encrypted +this way than if a well designed alternative were used. If you must +use this format, use a 10-byte randomly generated password. 
@end quotation @menu @@ -30,13 +30,12 @@ designed alternative were used. If you must use this format, use a @node Common Wrapper Format @section Common Wrapper Format -This section describes the general format of an SPSS encrypted file -wrapper. The following sections describe the details for each kind of -encapsulated file. - An encrypted file wrapper begins with the following 36-byte header, -where @i{xxx} identifies the type of file encapsulated, as described -in the following sections: +where @i{xxx} identifies the type of file encapsulated: @code{SAV} for +a system file, @code{SPS} for a syntax file, @code{SPV} for a viewer +file. PSPP code for identifying these files just checks for the +@code{ENCRYPTED} keyword at offset 8, but the other bytes are also +fixed in practice: @example 0000 1c 00 00 00 00 00 00 00 45 4e 43 52 59 50 54 45 |........ENCRYPTE| @@ -46,10 +45,17 @@ in the following sections: Following the fixed header is essentially the regular contents of the encapsulated file in its usual format, with each 16-byte block -encrypted with AES-256 in ECB mode. Each type of encapsulated file is -processed in a slightly different way before encryption, as described -in the following sections. The AES-256 key is derived from a password -in the following way: +encrypted with AES-256 in ECB mode. + +To make the plaintext an exact multiple of 16 bytes in length, the +encryption process appends PKCS #7 padding, as specified in RFC 5652 +section 6.3. Padding appends 1 to 16 bytes to the plaintext, and the +value of each padding byte is the number of padding bytes added. If the +plaintext is, for example, 2 bytes short of a multiple of 16, the +padding is 2 bytes with value 0x02; if the plaintext is a multiple of 16 +bytes in length, the padding is 16 bytes with value 0x10.
+ +The AES-256 key is derived from a password in the following way: @enumerate @item @@ -102,35 +108,39 @@ The AES-256 key is: @end example @menu -* Encrypted System Files:: -* Encrypted Syntax Files:: +* Checking Passwords:: @end menu -@node Encrypted System Files -@subsection Encrypted System Files - -An encrypted system file uses @code{SAV} as the identifier in its -header. +@node Checking Passwords +@subsection Checking Passwords -Before encryption, a system file is appended with as many null bytes -as needed (possibly zero) to make it a multiple of 16 bytes in length, -so that it fits exactly in a series of AES blocks. (This implies that -encrypted system files must always be compressed, because otherwise a -system file with only a single variable might appear to have an extra -case.) +A program reading an encrypted file may wish to verify that the +password it was given is the correct one. One way is to verify that +the PKCS #7 padding at the end of the file is well formed. However, +any plaintext that ends in byte 01 is well formed PKCS #7, meaning +that about 1 in 256 keys will falsely pass this test. This might be +acceptable for interactive use, but the false positive rate is too +high for a brute-force search of the password space. -@node Encrypted Syntax Files -@subsection Encrypted Syntax Files +A better test requires some knowledge of the file format being +wrapped, to obtain a ``magic number'' for the beginning of the file. -An encrypted syntax file uses @code{SPS} as the identifier in its -header. +@itemize @bullet +@item +The plaintext of system files begins with @code{$FL2@@(#)} or +@code{$FL3@@(#)}. +@item Before encryption, a syntax file is prefixed with a line at the beginning of the form @code{* Encoding: @var{encoding}.}, where @var{encoding} is the encoding used for the rest of the file, -e.g. @code{windows-1252}. 
The syntax file is then appended with as -many bytes with value 04 as needed (possibly zero) to make it a -multiple of 16 bytes in length. +e.g.@: @code{windows-1252}. Thus, @code{* Encoding} may be used as a +magic number for syntax files. + +@item +The plaintext of viewer files begins with 50 4b 03 04 14 00 08 (50 4b +is @code{PK}). +@end itemize @node Password Encoding @section Password Encoding diff --git a/doc/pspp-convert.texi b/doc/pspp-convert.texi index 7af90a113c..8c4941f095 100644 --- a/doc/pspp-convert.texi +++ b/doc/pspp-convert.texi @@ -55,11 +55,11 @@ this format.) @end table @command{pspp-convert} can convert most input formats to most output -formats. Encrypted system file and syntax files are exceptions: if -the input file is in an encrypted format, then the output file must be -the same format (decrypted). To decrypt such a file, specify the -encrypted file as @var{input}. The output will be the equivalent -plaintext file. +formats. Encrypted SPSS file formats are exceptions: if the input +file is in an encrypted format, then the output file will be the same +format (decrypted). To decrypt such a file, specify the encrypted +file as @var{input}. The output will be the equivalent plaintext +file. Options for the output format are ignored in this case. The password for encrypted files can be specified a few different ways.
If the password is known, use the @option{-p} option diff --git a/src/data/encrypted-file.c b/src/data/encrypted-file.c index e340d04fce..c0124cbee9 100644 --- a/src/data/encrypted-file.c +++ b/src/data/encrypted-file.c @@ -27,6 +27,7 @@ #include "libpspp/cast.h" #include "libpspp/cmac-aes256.h" #include "libpspp/message.h" +#include "libpspp/str.h" #include "gl/minmax.h" #include "gl/rijndael-alg-fst.h" @@ -37,20 +38,20 @@ struct encrypted_file { + const struct file_handle *fh; FILE *file; - enum { SYSTEM, SYNTAX } type; int error; - uint8_t ciphertext[16]; - uint8_t plaintext[16]; - unsigned int ofs, n; + uint8_t ciphertext[256]; + uint8_t plaintext[256]; + unsigned int ofs, n, readable; uint32_t rk[4 * (RIJNDAEL_MAXNR + 1)]; int Nr; }; static bool decode_password (const char *input, char output[11]); -static bool fill_buffer (struct encrypted_file *); +static void fill_buffer (struct encrypted_file *); /* If FILENAME names an encrypted SPSS file, returns 1 and initializes *FP for further use by the caller. @@ -63,12 +64,14 @@ int encrypted_file_open (struct encrypted_file **fp, const struct file_handle *fh) { struct encrypted_file *f; - char header[36 + 16]; + enum { HEADER_SIZE = 36 }; + char data[HEADER_SIZE + sizeof f->ciphertext]; int retval; int n; f = xmalloc (sizeof *f); f->error = 0; + f->fh = fh; f->file = fn_open (fh, "rb"); if (f->file == NULL) { @@ -78,8 +81,8 @@ encrypted_file_open (struct encrypted_file **fp, const struct file_handle *fh) goto error; } - n = fread (header, 1, sizeof header, f->file); - if (n != sizeof header) + n = fread (data, 1, sizeof data, f->file); + if (n < HEADER_SIZE + 2 * 16) { int error = feof (f->file) ? 
0 : errno; if (error) @@ -89,19 +92,16 @@ encrypted_file_open (struct encrypted_file **fp, const struct file_handle *fh) goto error; } - if (!memcmp (header + 8, "ENCRYPTEDSAV", 12)) - f->type = SYSTEM; - else if (!memcmp (header + 8, "ENCRYPTEDSPS", 12)) - f->type = SYNTAX; - else + if (memcmp (data + 8, "ENCRYPTED", 9)) { retval = 0; goto error; } - memcpy (f->ciphertext, header + 36, 16); - f->n = 16; + f->n = n - HEADER_SIZE; + memcpy (f->ciphertext, data + HEADER_SIZE, f->n); f->ofs = 0; + f->readable = 0; *fp = f; return 1; @@ -138,12 +138,9 @@ encrypted_file_read (struct encrypted_file *f, void *buf_, size_t n) uint8_t *buf = buf_; size_t ofs = 0; - if (f->error) - return 0; - while (ofs < n) { - unsigned int chunk = MIN (n - ofs, f->n - f->ofs); + unsigned int chunk = MIN (n - ofs, f->readable - f->ofs); if (chunk > 0) { memcpy (buf + ofs, &f->plaintext[f->ofs], chunk); @@ -152,8 +149,9 @@ encrypted_file_read (struct encrypted_file *f, void *buf_, size_t n) } else { - if (!fill_buffer (f)) - return ofs; + fill_buffer (f); + if (!f->readable) + break; } } @@ -165,21 +163,13 @@ encrypted_file_read (struct encrypted_file *f, void *buf_, size_t n) int encrypted_file_close (struct encrypted_file *f) { - int error = f->error; + int error = f->error > 0 ? f->error : 0; if (fclose (f->file) == EOF && !error) error = errno; free (f); return error; } - -/* Returns true if F is an encrypted system file, - false if it is an encrypted syntax file. */ -bool -encrypted_file_is_sav (const struct encrypted_file *f) -{ - return f->type == SYSTEM; -} #define b(x) (1 << (x)) @@ -286,6 +276,26 @@ decode_password (const char *input, char output[11]) return true; } +/* Check for magic number at beginning of plaintext decrypted from F. 
*/ +static bool +is_good_magic (const struct encrypted_file *f) +{ + char plaintext[16]; + rijndaelDecrypt (f->rk, f->Nr, CHAR_CAST (const char *, f->ciphertext), + plaintext); + + const struct substring magic[] = { + ss_cstr ("$FL2@(#)"), + ss_cstr ("$FL3@(#)"), + ss_cstr ("* Encoding"), + ss_buffer ("PK\3\4\x14\0\x8", 7) + }; + for (size_t i = 0; i < sizeof magic / sizeof *magic; i++) + if (ss_equals (ss_buffer (plaintext, magic[i].length), magic[i])) + return true; + return false; +} + /* Attempts to use plaintext password PASSWORD to unlock F. Returns true if successful, otherwise false. */ bool @@ -341,40 +351,107 @@ encrypted_file_unlock__ (struct encrypted_file *f, const char *password) assert (sizeof key == 32); f->Nr = rijndaelKeySetupDec (f->rk, CHAR_CAST (const char *, key), 256); - /* Check for magic number at beginning of plaintext. */ - rijndaelDecrypt (f->rk, f->Nr, - CHAR_CAST (const char *, f->ciphertext), - CHAR_CAST (char *, f->plaintext)); + if (!is_good_magic (f)) + return false; - const char *magic = f->type == SYSTEM ? "$FL?@(#)" : "* Encoding"; - for (int i = 0; magic[i]; i++) - if (magic[i] != '?' && f->plaintext[i] != magic[i]) - return false; + fill_buffer (f); return true; } -static bool +/* Checks the 16 bytes of PLAINTEXT for PKCS#7 padding bytes. Returns the + number of padding bytes (between 1 and 16, inclusive), if well formed, + otherwise 0. */ +static int +check_padding (const uint8_t *plaintext) +{ + uint8_t pad = plaintext[15]; + if (pad < 1 || pad > 16) + return 0; + + for (size_t i = 1; i < pad; i++) + if (plaintext[15 - i] != pad) + return 0; + + return pad; +} + +static void fill_buffer (struct encrypted_file *f) { - f->n = fread (f->ciphertext, 1, sizeof f->ciphertext, f->file); + /* Move bytes between f->ciphertext[f->readable] and f->ciphertext[f->n] to + the beginning of f->ciphertext. + + The first time this is called for a given file, it does nothing because + f->readable is initially 0. 
After that, in steady state f->readable is 16 + less than f->n, so the final 16 bytes of ciphertext become the first 16 + bytes. This is necessary because we don't know until we hit end-of-file + whether padding in the last 16 bytes will require us to discard up to 16 + bytes of data. */ + memmove (f->ciphertext, f->ciphertext + f->readable, f->n - f->readable); + f->n -= f->readable; + f->readable = 0; f->ofs = 0; - if (f->n == sizeof f->ciphertext) + + if (f->error) /* or assert(!f->error)? */ + return; + + /* Read new ciphertext, extending f->n, until we've filled up f->ciphertext + or until we reach end-of-file or encounter an error. + + Afterward, f->error indicates what happened. */ + while (f->n < sizeof f->ciphertext) { - rijndaelDecrypt (f->rk, f->Nr, - CHAR_CAST (const char *, f->ciphertext), - CHAR_CAST (char *, f->plaintext)); - if (f->type == SYNTAX) + size_t retval = fread (f->ciphertext + f->n, 1, + sizeof f->ciphertext - f->n, f->file); + if (!retval) { - const char *eof = memchr (f->plaintext, '\04', sizeof f->plaintext); - if (eof) - f->n = CHAR_CAST (const uint8_t *, eof) - f->plaintext; + f->error = ferror (f->file) ? errno : EOF; + break; } - return true; + f->n += retval; + } + + /* Calculate the number of readable bytes. If we're at the end of the file, + then we can read everything, otherwise we hold back the last 16 bytes + because they might contain padding. */ + if (!f->error) + { + assert (f->n == sizeof f->ciphertext); + f->readable = f->n - 16; } else + f->readable = f->n; + + /* If we have an incomplete block then trim it off and complain. */ + unsigned int overhang = f->readable % 16; + if (overhang) { - if (ferror (f->file)) - f->error = errno; - return false; + assert (f->error); + msg (ME, _("%s: encrypted file corrupted (ends in incomplete %u-byte " + "ciphertext block)"), + fh_get_file_name (f->fh), overhang); + f->error = EIO; + f->readable -= overhang; + } + + /* Decrypt all the blocks we have.
*/ + for (size_t ofs = 0; ofs < f->readable; ofs += 16) + rijndaelDecrypt (f->rk, f->Nr, + CHAR_CAST (const char *, f->ciphertext + ofs), + CHAR_CAST (char *, f->plaintext + ofs)); + + /* If we're at end of file then check the padding and trim it off. */ + if (f->error == EOF) + { + unsigned int pad = check_padding (&f->plaintext[f->n - 16]); + if (!pad) + { + msg (ME, _("%s: encrypted file corrupted (ends with bad padding)"), + fh_get_file_name (f->fh)); + f->error = EIO; + return; + } + + f->readable -= pad; } } diff --git a/src/data/encrypted-file.h b/src/data/encrypted-file.h index 4a2d6b390a..7add68042d 100644 --- a/src/data/encrypted-file.h +++ b/src/data/encrypted-file.h @@ -31,6 +31,4 @@ bool encrypted_file_unlock__ (struct encrypted_file *, const char *password); size_t encrypted_file_read (struct encrypted_file *, void *, size_t); int encrypted_file_close (struct encrypted_file *); -bool encrypted_file_is_sav (const struct encrypted_file *); - #endif /* encrypted-file.h */ diff --git a/tests/automake.mk b/tests/automake.mk index 0a8b8e3657..1dba99efa1 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -275,6 +275,8 @@ EXTRA_DIST += \ tests/data/v13.sav \ tests/data/v14.sav \ tests/data/test-encrypted.sps \ + tests/data/test-decrypted.spv \ + tests/data/test-encrypted.spv \ tests/language/mann-whitney.txt \ tests/language/data-io/Book1.gnm.unzipped \ tests/language/data-io/test.ods \ diff --git a/tests/data/encrypted-file.at b/tests/data/encrypted-file.at index 7f0a1ae197..e3ad1bf715 100644 --- a/tests/data/encrypted-file.at +++ b/tests/data/encrypted-file.at @@ -65,3 +65,8 @@ DESCRIPTIVES /quantity ]) AT_CLEANUP +AT_SETUP([decrypt an encrypted viewer file]) +AT_KEYWORDS([syntax file decrypt pspp-convert spv]) +AT_CHECK([pspp-convert $srcdir/data/test-encrypted.spv test.spv -p Password1]) +AT_CHECK([cmp $srcdir/data/test-decrypted.spv test.spv]) +AT_CLEANUP diff --git a/tests/data/test-decrypted.spv b/tests/data/test-decrypted.spv new file mode 
100644 index 0000000000..891263dccd Binary files /dev/null and b/tests/data/test-decrypted.spv differ diff --git a/tests/data/test-encrypted.spv b/tests/data/test-encrypted.spv new file mode 100644 index 0000000000..da8be2c80f Binary files /dev/null and b/tests/data/test-encrypted.spv differ diff --git a/utilities/pspp-convert.1 b/utilities/pspp-convert.1 index 6e2e9f5224..9c96dead1e 100644 --- a/utilities/pspp-convert.1 +++ b/utilities/pspp-convert.1 @@ -55,9 +55,10 @@ specify the format for unrecognized extensions. . .PP \fBpspp\-convert\fR can convert most input formats to most output -formats. Encrypted system file and syntax files are exceptions: if -the input file is in an encrypted format, then the output file must -be the same format (decrypted). +formats. Encrypted SPSS file formats are exceptions: if the input +file is in an encrypted format, then the output file will be the same +format (decrypted). Options for the output format are ignored in this +case. . .SH "OPTIONS" .SS "General Options" diff --git a/utilities/pspp-convert.c b/utilities/pspp-convert.c index 4a2f0f0291..bb04abbfb2 100644 --- a/utilities/pspp-convert.c +++ b/utilities/pspp-convert.c @@ -184,24 +184,11 @@ main (int argc, char *argv[]) output_fh = fh_create_file (NULL, output_filename, NULL, fh_default_properties ()); if (encrypted_file_open (&enc, input_fh) > 0) { - if (encrypted_file_is_sav (enc)) - { - if (strcmp (output_format, "sav") && strcmp (output_format, "sys")) - error (1, 0, _("can only convert encrypted data file to sav or " - "sys format")); - } - else - { - if (strcmp (output_format, "sps")) - error (1, 0, _("can only convert encrypted syntax file to sps " - "format")); - } - - if (!decrypt_file (enc, input_fh, output_fh, password, + if (decrypt_file (enc, input_fh, output_fh, password, ds_cstr (&alphabet), length, password_list)) + goto exit; + else goto error; - - goto exit; }
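The header check described in encrypted-file-wrappers.texi can be sketched as a standalone C function. This is a minimal illustration, not PSPP's code. The name `classify_wrapper` is hypothetical, and the sketch assumes that the three-letter type code immediately follows `ENCRYPTED` at offset 17. That assumption is inferred from the removed `encrypted_file_open` code, which compared 12 bytes at offset 8 against `ENCRYPTEDSAV`. Like the patched code, it treats only the `ENCRYPTED` keyword as significant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of the fixed header that precedes the encrypted blocks. */
enum { WRAPPER_HEADER_SIZE = 36 };

/* Hypothetical helper: examines the first LEN bytes of a file, in HEADER.
   Returns 1 and stores the NUL-terminated type code (e.g. "SAV", "SPS", or
   "SPV") in TYPE if they begin with an encrypted-file wrapper header,
   otherwise returns 0. */
static int
classify_wrapper (const unsigned char *header, size_t len, char type[4])
{
  assert (header != NULL && type != NULL);

  /* Like PSPP, look only for the ENCRYPTED keyword at offset 8. */
  if (len < WRAPPER_HEADER_SIZE || memcmp (header + 8, "ENCRYPTED", 9) != 0)
    return 0;

  /* Assumption: the type code immediately follows the keyword. */
  memcpy (type, header + 17, 3);
  type[3] = '\0';
  return 1;
}
```

The patched `encrypted_file_open` is stricter about length than this sketch. It also rejects files that are shorter than the header plus two 16-byte ciphertext blocks.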
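The PKCS #7 rule that the patch documents for 16-byte AES blocks (RFC 5652 section 6.3) can be illustrated with a pair of helpers. `pkcs7_pad` and `pkcs7_unpad` are hypothetical names. The unpadding check follows the same logic as the patch's `check_padding`, but it operates on a whole buffer rather than a single block. Encryption itself is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { AES_BLOCK = 16 };

/* Hypothetical helper: appends PKCS #7 padding to the LEN bytes in BUF, which
   must have room for LEN + AES_BLOCK bytes.  Returns the padded length. */
static size_t
pkcs7_pad (uint8_t *buf, size_t len)
{
  /* Always 1 to 16: a plaintext that is already a multiple of 16 bytes
     gets a full block of 0x10 bytes. */
  size_t pad = AES_BLOCK - len % AES_BLOCK;
  assert (pad >= 1 && pad <= AES_BLOCK);
  memset (buf + len, (int) pad, pad);
  return len + pad;
}

/* Hypothetical helper: returns the plaintext length after removing PKCS #7
   padding from the LEN bytes in BUF, or (size_t) -1 if LEN is not a positive
   multiple of 16 or the padding is malformed. */
static size_t
pkcs7_unpad (const uint8_t *buf, size_t len)
{
  if (len == 0 || len % AES_BLOCK != 0)
    return (size_t) -1;

  uint8_t pad = buf[len - 1];
  if (pad < 1 || pad > AES_BLOCK)
    return (size_t) -1;
  for (size_t i = 1; i <= pad; i++)
    if (buf[len - i] != pad)
      return (size_t) -1;
  return len - pad;
}
```

A 14-byte plaintext gains two 0x02 bytes. A 16-byte plaintext gains a full block of 0x10 bytes. These are the two cases the documentation describes.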
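The "Checking Passwords" estimate that about 1 in 256 wrong keys pass a padding-only check can be made more precise. The argument rests on an idealizing assumption: a wrong key yields a final block whose bytes are independent and uniformly random. Under that assumption, padding of length P is well formed exactly when the last P bytes all equal P, which happens with probability 256^-P. These cases are disjoint because the last byte determines P. The hypothetical helper below adds them up.

```c
#include <assert.h>

/* Hypothetical helper: probability that a uniformly random 16-byte final
   block passes a PKCS #7 padding check.  Padding of length P is well formed
   when the last P bytes all equal P, with probability 256^-P; the cases are
   disjoint because the last byte determines P. */
static double
pkcs7_false_positive_rate (void)
{
  double rate = 0.0;
  double term = 1.0;
  for (int p = 1; p <= 16; p++)
    {
      term /= 256.0;            /* Now term == 256^-p. */
      rate += term;
    }
  assert (rate > 0.0 && rate < 1.0);
  return rate;
}
```

The sum is dominated by the P = 1 term and comes to roughly 1/255. This agrees with the documentation's figure.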
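The magic-number test listed under "Checking Passwords" can be sketched without PSPP's substring library. It follows the same idea as the patch's `is_good_magic`. This sketch assumes that the caller has already decrypted the first 16-byte ciphertext block with the candidate key; the AES step is omitted. The name `has_known_magic` is hypothetical. The signatures are the ones given in the documentation and in `is_good_magic`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A file signature: LEN bytes starting at BYTES. */
struct magic
{
  const char *bytes;
  size_t len;
};

/* Hypothetical helper: returns true if BLOCK, the first 16 bytes of
   plaintext decrypted with a candidate key, starts with one of the magic
   numbers listed under "Checking Passwords". */
static bool
has_known_magic (const uint8_t block[16])
{
  static const struct magic magics[] = {
    { "$FL2@(#)", 8 },          /* System file. */
    { "$FL3@(#)", 8 },          /* System file. */
    { "* Encoding", 10 },       /* Syntax file. */
    { "PK\3\4\x14\0\x8", 7 },   /* Viewer file (50 4b 03 04 14 00 08). */
  };

  assert (block != NULL);
  for (size_t i = 0; i < sizeof magics / sizeof *magics; i++)
    if (memcmp (block, magics[i].bytes, magics[i].len) == 0)
      return true;
  return false;
}
```

Each signature is at least 7 fixed bytes long. So under the same random-block assumption, a wrong key is far less likely to pass this test than the padding check.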