Plaintext
In cryptography, the term plaintext (occasionally, cleartext) refers to information input to an encryption algorithm. This could be, for example, a diplomatic message, a bank transaction, an email, or a diary: any information that someone might want to prevent others from reading. During encryption, if the plaintext is transformed by a cipher algorithm, the result is ciphertext; this is the most common case in modern cryptography. If it is transformed by a code, the result is codetext.
Secure handling of plaintext
In any cryptosystem, plaintext must be handled properly lest an attacker gain considerable advantage.
First and most obviously, it should be stored carefully. If the information is important enough to entrust to a cryptosystem for protection, it is probably sufficiently important not to lose track of it in other ways.
If printed out, it must be stored securely. Most file cabinets, locked office desk drawers, and many safes are (laughably) easily opened. Offices themselves are not always secured sensibly after hours, or even during hours in too many cases. Since dumpster diving is widely possible, and even shredded sheets can be reconstructed by those sufficiently committed to their recovery, discarded printed plaintexts must be thoroughly crosscut shredded, burned, or otherwise made unrecoverable.
If plaintext is kept in a computer file, the disk (or perhaps the entire computer) and its components must be secure. In the case of securing a computer, that security must be physical as well as virtual (e.g., against software bugs, network access, and Trojan horse programs). Keeping the plaintext on a removable disk (or extractable disk drive) is an obvious possibility, in which case physical security of the removed item is probably most important.
Laptop computers are a special problem. The US State Department, the British Secret Service, and the US Department of Defense have all had laptops containing secret information, presumably in readable text form, vanish in recent years. Discarded computers (and disks and disk drives) are also a source of plaintexts. Unerased files (including any plaintexts which may have been present) will still be readable; several enterprising projects have demonstrated this recently. Perhaps the most famous is an MIT student project which found a wide variety of personal/proprietary/confidential information on discarded, and recycled, computer equipment.
Erased files may be accessible as well. Most operating systems do not actually erase anything; they simply mark the disk space formerly occupied by the 'erased' file as 'available for use'. The information in an 'erased' file remains fully present until it is overwritten at some later time, when the operating system reuses the disk space. On modern large disks, this may take months, or may never happen. Even overwriting the part of a disk occupied by a file before erasing it is insufficient in many cases. Peter Gutmann of the University of Auckland wrote a celebrated 1996 paper about recovering overwritten information from magnetic disks. Some government agencies (e.g., the NSA) require that all disk drives be physically pulverized when they are discarded.
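The overwrite-before-erase step described above can be sketched as follows. This is a minimal illustration, not a secure-deletion tool: the function names `overwrite_in_place` and `overwrite_and_delete` and the chunk size are assumptions made for the example. As the paragraph notes, overwriting is insufficient in many cases; on journaling or copy-on-write file systems, and on drives that remap blocks, the new bytes may not even land on the sectors that held the old ones.

```python
import os

CHUNK = 1 << 20  # write random data in 1 MiB pieces (arbitrary choice)


def overwrite_in_place(path: str, passes: int = 1) -> None:
    """Overwrite the file's current contents with random bytes, keeping its size."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:  # r+b: update in place, do not truncate
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(CHUNK, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device


def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Overwrite, then remove the directory entry (the ordinary 'erase')."""
    overwrite_in_place(path, passes)
    os.remove(path)
```

A plain `os.remove(path)` corresponds to the ordinary 'erase' that only marks the space as available; the extra pass merely tries to replace the old bytes first.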
Second, possession of any plaintext whatsoever, whether it is itself meaningful (and perhaps sensitive) or merely some administrivia in a heading, makes several cryptanalytic attacks either possible or easier. This implies that it is best to process the information being sent in some way unhelpful to the attacker before it becomes the actual plaintext input for encryption. For instance, it is common in well-designed cryptosystems to run all messages through a data compression algorithm before submitting the result (the actual plaintext for encryption) to the cryptosystem. This provides at least some masking for stereotyped headings and introductions in the original message. However, some compression algorithms themselves generate stereotyped (and so predictable) structures, in which they store the redundant data that allows decompression. They must, thus, be chosen with care.
If the compressed form, which is the actual input to encryption, is not retained (but consider the difficulty of erasing files, discussed above), then the attacker will have no copy of the actual plaintext at all.
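The caution about stereotyped structures in compressed output can be illustrated with Python's standard `zlib` module. This is a sketch only: the function names are hypothetical, and the choice of zlib is an assumption made for illustration, not a recommendation. A standard zlib stream begins with a fixed two-byte header that depends on the compression settings rather than on the message, so every message compressed the same way starts identically. Requesting a raw deflate stream (negative `wbits`) omits that header and the checksum trailer, although the deflate data itself still has some internal structure.

```python
import zlib


def compress_with_header(message: bytes) -> bytes:
    """Standard zlib stream: 2-byte header, deflate data, Adler-32 trailer."""
    return zlib.compress(message)


def compress_raw(message: bytes) -> bytes:
    """Raw deflate stream: negative wbits omits the zlib header and trailer."""
    c = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return c.compress(message) + c.flush()


def decompress_raw(data: bytes) -> bytes:
    """Inverse of compress_raw."""
    return zlib.decompress(data, -15)


if __name__ == "__main__":
    m1 = b"FROM: EMBASSY\nTO: HQ\nSUBJECT: ROUTINE\n\nNothing to report."
    m2 = b"Meet at the usual place at noon."
    a, b = compress_with_header(m1), compress_with_header(m2)
    # Different messages, identical leading bytes: a small known-plaintext gift.
    print(a[:2].hex(), b[:2].hex())
    assert decompress_raw(compress_raw(m1)) == m1
```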
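Russian copulation (cutting a message at some point and exchanging the two parts, so that stereotyped beginnings and endings are buried in the middle) has also been used to obscure headings and introductions. In modern contexts, however, where message material may not be readily 'decopulated' on simple inspection, it has become less useful in practice.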
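Russian copulation amounts to cutting the message at some point and exchanging the two parts, which can be sketched in a few lines. The function names and the returned cut position are assumptions for illustration: historically the recipient found the join by reading the text, which is exactly what becomes impractical for material that cannot be understood on simple inspection.

```python
import secrets
from typing import Tuple


def copulate(message: str) -> Tuple[str, int]:
    """Cut at a random interior point and swap the two parts.

    Returns the rearranged text and the cut position (illustrative only).
    """
    if len(message) < 2:
        return message, 0
    cut = 1 + secrets.randbelow(len(message) - 1)  # 1 <= cut <= len - 1
    return message[cut:] + message[:cut], cut


def decopulate(text: str, cut: int) -> str:
    """Undo copulate() when the cut position is known."""
    split = len(text) - cut
    return text[split:] + text[:split]


if __name__ == "__main__":
    original = "TO HEADQUARTERS STOP WEATHER REPORT FOLLOWS STOP ALL CLEAR"
    mixed, cut = copulate(original)
    print(mixed)  # the stereotyped opening now sits somewhere in the middle
    assert decopulate(mixed, cut) == original
```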
Plaintext (or, more commonly, plain text, in contradistinction to formatted text) is also used to refer to computer files in ASCII or another form that is human-readable using a simple editor. This usually excludes files stored with included formatting, such as Microsoft Word '.doc' files, or WordPerfect '.doc' files. Note that the included formatting is different in these two, homonymic, file types. Plain text files include, somewhat circularly, any file that can be opened, read, and edited with a text editor which handles such files. Examples include Notepad (on Microsoft Windows), edlin (on MS-DOS), ed/vi/Emacs (on Unix and elsewhere), pico, nano, SimpleText (on Mac OS), and TextEdit (on Mac OS X). In the Windows world, 'Plain Text' and 'Plain Text with Line Breaks' are the same thing save for the inclusion of characters meaning 'end of line' in the latter.
Most programming language compilers require their source files to be in plain text form, as do HTML, XML, LaTeX, TeX, PostScript, etc.
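The distinction between plain text and formatted files drawn above can be approximated by a simple check on a file's bytes. This is a rough heuristic under stated assumptions, not a definition: the function name is hypothetical, and it accepts only printable ASCII plus tab and line-ending characters, so it rejects plain text in other encodings.

```python
def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic: True if data is printable ASCII plus tab, LF, and CR."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False  # bytes above 0x7F: not ASCII
    allowed_controls = "\t\n\r"
    return all(ch.isprintable() or ch in allowed_controls for ch in text)


if __name__ == "__main__":
    print(looks_like_plain_text(b"Hello,\r\nworld.\n"))
    # Signature bytes that begin legacy Microsoft Word '.doc' compound files.
    print(looks_like_plain_text(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"))
```

The first call reports plain text and the second does not, since the signature contains bytes outside the ASCII range.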
- See also: Binary and text files
- See also: Editor wars