make-temp-name

18/07/2017

For someone not terribly experienced in writing safe programs, one can only hope that building blocks like make-temp-file are doing the right thing and cannot be subverted by a malicious third party. The general advice here is that it’s preferable to use the primitive for creating the temporary file instead of the primitive to generate its name. Now, does Emacs reuse mkstemp(3) for this? Or at least tmpnam(3)? Of course not! Where we go, we can just invent our own source of randomness:

make-temp-file looks as follows:

static const char make_temp_name_tbl[64] =
{
  'A','B','C','D','E','F','G','H',
  'I','J','K','L','M','N','O','P',
  'Q','R','S','T','U','V','W','X',
  'Y','Z','a','b','c','d','e','f',
  'g','h','i','j','k','l','m','n',
  'o','p','q','r','s','t','u','v',
  'w','x','y','z','0','1','2','3',
  '4','5','6','7','8','9','-','_'
};

static unsigned make_temp_name_count, make_temp_name_count_initialized_p;

/* Value is a temporary file name starting with PREFIX, a string.

   The Emacs process number forms part of the result, so there is
   no danger of generating a name being used by another process.
   In addition, this function makes an attempt to choose a name
   which has no existing file.  To make this work, PREFIX should be
   an absolute file name.

   BASE64_P means add the pid as 3 characters in base64
   encoding.  In this case, 6 characters will be added to PREFIX to
   form the file name.  Otherwise, if Emacs is running on a system
   with long file names, add the pid as a decimal number.

   This function signals an error if no unique file name could be
   generated.  */

Lisp_Object
make_temp_name (Lisp_Object prefix, bool base64_p)
{
  Lisp_Object val, encoded_prefix;
  ptrdiff_t len;
  printmax_t pid;
  char *p, *data;
  char pidbuf[INT_BUFSIZE_BOUND (printmax_t)];
  int pidlen;

  CHECK_STRING (prefix);

  /* VAL is created by adding 6 characters to PREFIX.  The first
     three are the PID of this process, in base 64, and the second
     three are incremented if the file already exists.  This ensures
     262144 unique file names per PID per PREFIX.  */

  pid = getpid ();

  if (base64_p)
    {
      pidbuf[0] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidbuf[1] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidbuf[2] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidlen = 3;
    }
  else
    {
#ifdef HAVE_LONG_FILE_NAMES
      pidlen = sprintf (pidbuf, "%"pMd, pid);
#else
      pidbuf[0] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidbuf[1] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidbuf[2] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidlen = 3;
#endif
    }

  encoded_prefix = ENCODE_FILE (prefix);
  len = SBYTES (encoded_prefix);
  val = make_uninit_string (len + 3 + pidlen);
  data = SSDATA (val);
  memcpy (data, SSDATA (encoded_prefix), len);
  p = data + len;

  memcpy (p, pidbuf, pidlen);
  p += pidlen;

  /* Here we try to minimize useless stat'ing when this function is
     invoked many times successively with the same PREFIX.  We achieve
     this by initializing count to a random value, and incrementing it
     afterwards.

     We don't want make-temp-name to be called while dumping,
     because then make_temp_name_count_initialized_p would get set
     and then make_temp_name_count would not be set when Emacs starts.  */

  if (!make_temp_name_count_initialized_p)
    {
      make_temp_name_count = time (NULL);
      make_temp_name_count_initialized_p = 1;
    }

  while (1)
    {
      unsigned num = make_temp_name_count;

      p[0] = make_temp_name_tbl[num & 63], num >>= 6;
      p[1] = make_temp_name_tbl[num & 63], num >>= 6;
      p[2] = make_temp_name_tbl[num & 63], num >>= 6;

      /* Poor man's congruential RN generator.  Replace with
         ++make_temp_name_count for debugging.  */
      make_temp_name_count += 25229;
      make_temp_name_count %= 225307;

      if (!check_existing (data))
        {
          /* We want to return only if errno is ENOENT.  */
          if (errno == ENOENT)
            return DECODE_FILE (val);
          else
            /* The error here is dubious, but there is little else we
               can do.  The alternatives are to return nil, which is
               as bad as (and in many cases worse than) throwing the
               error, or to ignore the error, which will likely result
               in looping through 225307 stat's, which is not only
               dog-slow, but also useless since eventually nil would
               have to be returned anyway.  */
            report_file_error ("Cannot create temporary name for prefix",
                               prefix);
          /* not reached */
        }
    }
}

DEFUN ("make-temp-name", Fmake_temp_name, Smake_temp_name, 1, 1, 0,
       doc: /* Generate temporary file name (string) starting with PREFIX (a string).
The Emacs process number forms part of the result, so there is no
danger of generating a name being used by another Emacs process
\(so long as only a single host can access the containing directory...).

This function tries to choose a name that has no existing file.
For this to work, PREFIX should be an absolute file name.

There is a race condition between calling `make-temp-name' and creating the
file, which opens all kinds of security holes.  For that reason, you should
normally use `make-temp-file' instead.  */)
  (Lisp_Object prefix)
{
  return make_temp_name (prefix, 0);
}

The generated file name is therefore a combination of the prefix, the Emacs PID and three characters from the above table. This makes about 200.000 possible temporary files that can be generated with a given prefix in an Emacs session. This range can be traversed in a negligible amount of time to recreate the state of the RNG and accurately predict the next temporary file name.

(defun make-temp-file (prefix &optional dir-flag suffix)
  "Create a temporary file.
The returned file name (created by appending some random characters at the end
of PREFIX, and expanding against `temporary-file-directory' if necessary),
is guaranteed to point to a newly created empty file.
You can then use `write-region' to write new data into the file.

If DIR-FLAG is non-nil, create a new empty directory instead of a file.

If SUFFIX is non-nil, add that at the end of the file name."
  ;; Create temp files with strict access rights.  It's easy to
  ;; loosen them later, whereas it's impossible to close the
  ;; time-window of loose permissions otherwise.
  (with-file-modes ?\700
    (let (file)
      (while (condition-case ()
                 (progn
                   (setq file
                         (make-temp-name
                          (if (zerop (length prefix))
                              (file-name-as-directory
                               temporary-file-directory)
                            (expand-file-name prefix
                                              temporary-file-directory))))
                   (if suffix
                       (setq file (concat file suffix)))
                   (if dir-flag
                       (make-directory file)
                     (write-region "" nil file nil 'silent nil 'excl))
                   nil)
               (file-already-exists t))
        ;; the file was somehow created by someone else between
        ;; `make-temp-name' and `write-region', let's try again.
        nil)
      file)))

It’s interesting that the docstring of this function states that the return value “is guaranteed to point to a newly created empty file.”. If there were to exist a file for every possible combination for a prefix, this function would just fall into an infinite loop and block Emacs for no apparent reason. Both of these issues have been solved in a better way in glibc.

At least the impact of predicting the name is lessened if one uses make-temp-file instead of make-temp-name on its own. An attacker cannot create a symlink pointing to a rogue location with the predicted name as that would trigger a file-already-exists error and make the function use the next random name. All they could do is read out the file afterwards iff they have the same permission as the user Emacs runs with. A symlink attack can only be executed successfully with a careless make-temp-name user, thankfully I’ve not been able to find one worth subverting on GitHub yet.

Thanks to dale on #emacs for bringing this to my attention!


The Dude Abides

15/05/2017

While crafting another patch for cc-mode, I stumbled upon the following:

;; RMS says don't make these the default.
;; (April 2006): RMS has now approved these commands as defaults.
(unless (memq 'argumentative-bod-function c-emacs-features)
  (define-key c-mode-base-map "\e\C-a"    'c-beginning-of-defun)
  (define-key c-mode-base-map "\e\C-e"    'c-end-of-defun))

I’ve seen this kind of thing before, but it keeps puzzling me. Why would you:

  • Put a comment there about not making the following a default
  • Put a clarifying comment how these have been approved as defaults now
  • Keep both comments to trip up future readers

Any other project I’m involved in would immediately remove that commentary. Emacs is the only one where suggesting such drastic action would yield a lengthy bikeshedding discussion about the merits of pre-VCS traditions and how keeping them will toughen up whoever may join in the collaborative development effort. Yes, seriously.


Less Than Exceptional

26/02/2017

As of a few weeks ago, I’ve become part of Evil’s new maintainer team. Most of the bugs we get are not quite Evil’s fault, but rather that of other components it relies upon, such as Emacs. #785 is a perfect example of this observation. Take a look at the sources of the function responsible for the reported behavior and see whether you can find what’s wrong about it:

DEFUN ("lookup-key", Flookup_key, Slookup_key, 2, 3, 0,
       doc: ...)
  (Lisp_Object keymap, Lisp_Object key, Lisp_Object accept_default)
{
  ptrdiff_t idx;
  Lisp_Object cmd;
  Lisp_Object c;
  ptrdiff_t length;
  bool t_ok = !NILP (accept_default);

  keymap = get_keymap (keymap, 1, 1);

  length = CHECK_VECTOR_OR_STRING (key);
  if (length == 0)
    return keymap;

  idx = 0;
  while (1)
    {
      c = Faref (key, make_number (idx++));

      if (CONSP (c) && lucid_event_type_list_p (c))
        c = Fevent_convert_list (c);

      /* Turn the 8th bit of string chars into a meta modifier.  */
      if (STRINGP (key) && XINT (c) & 0x80 && !STRING_MULTIBYTE (key))
        XSETINT (c, (XINT (c) | meta_modifier) & ~0x80);

      /* Allow string since binding for `menu-bar-select-buffer'
         includes the buffer name in the key sequence.  */
      if (!INTEGERP (c) && !SYMBOLP (c) && !CONSP (c) && !STRINGP (c))
        message_with_string ("Key sequence contains invalid event %s", c, 1);

      cmd = access_keymap (keymap, c, t_ok, 0, 1);
      if (idx == length)
        return cmd;

      keymap = get_keymap (cmd, 0, 1);
      if (!CONSP (keymap))
        return make_number (idx);

      maybe_quit ();
    }
}

If you haven’t spotted it, the function never signals an exception for the exceptional case of an invalid key sequence. Instead, it returns one of the few types that aren’t allowed in a definition, namely an integer. It’s up to the caller to handle this case on their own, something the majority of Emacs packages rightfully neglect to do as even + informs you about invalid input. This design decision reeks of the many ways of signalling errors in C and worse, cannot be properly fixed as there’s code inside Emacs relying on the integer return type. There is even a lookup-key-ignore-too-long in Emacs core that looks a lot like what I wrote, but it’s part of menu-bar.el for whatever reason and not advertised at all.

tl;dr: In-band signalling sucks big time.


Strange Code

08/02/2017
(defface w3m-history-current-url
  ;; The following strange code compounds the attributes of the
  ;; `secondary-selection' face and the `w3m-arrived-anchor' face,
  ;; and generates the new attributes for this face.
  (let ((base 'secondary-selection)
        (fn (if (featurep 'xemacs)
                'face-custom-attributes-get
              'custom-face-attributes-get));; What a perverseness it is.
        ;; Both `face-custom-attributes-get' in XEmacs and
        ;; `custom-face-attributes-get' in CUSTOM 1.9962 attempt to
        ;; require `font' in Emacs/w3 and `cl' arbitrarily. :-/
        (features (cons 'font features))
        base-attributes attributes attribute)
    (setq base-attributes (funcall fn base nil)
          attributes (funcall fn 'w3m-arrived-anchor nil))
    (while base-attributes
      (setq attribute (car base-attributes))
      (unless (memq attribute '(:foreground :underline))
        (setq attributes (plist-put attributes attribute
                                    (cadr base-attributes))))
      (setq base-attributes (cddr base-attributes)))
    (list (list t attributes)))
  "Face used to highlight the current url in the \"about://history/\" page."
  :group 'w3m-face)

These days one would simply use :inherit for both w3m-arrived-anchor and secondary-selection.


No Obvious Way To Do It Properly

05/01/2017

You can often catch me griping way more about an “Emacs API” than Emacs Lisp, often without any explanation how the interfaces Emacs provides could be possibly worse than the language used for this. Here comes a not too obvious example of the issue at hand that will not go away if you were to rewrite the code in question in <insert favorite toy language>:

(defvar flyspell-prog-text-faces
  '(font-lock-string-face font-lock-comment-face font-lock-doc-face)
  "Faces corresponding to text in programming-mode buffers.")

(defun flyspell-generic-progmode-verify ()
  "Used for `flyspell-generic-check-word-predicate' in programming modes."
  ;; (point) is next char after the word. Must check one char before.
  (let ((f (get-text-property (- (point) 1) 'face)))
    (memq f flyspell-prog-text-faces)))

The above is helper code for Flyspell to detect whether point is in a string or comment. While this is most certainly the easiest way to achieve this goal, it isn’t what the manual recommends. What you’re supposed to do instead is to use the syntax table parser by accessing its state via syntax-ppss and checking slots 3 and 4[1]. The issue with that approach is that it’s not that simple, see the sources of Smartparens for the ugly details, more specifically the sp-point-in-string-or-comment function.

Let’s take a step back. Why exactly do we need to check the syntactical state of Emacs? To understand what’s going on, you’ll need to know that the first step when building a major mode for a programming language is to set up the syntax tables so that Emacs can recognize strings and comments properly. Once this has been done, both syntax highlighting and movement work correctly. This hack has been done back then to have that special case dealt with in a performant manner, albeit it only works properly for languages looking like C or Lisp. Single characters can be marked as having a special syntax, with extra slots for two-character sequences. In Lisp, a semicolon would start the comment and every newline would terminate one, in C the comment starters and enders would be /* and */. If you’re using a language with more complicated rules than that, there is the crutch of writing a custom syntax-propertize-function that walks over an arbitrary buffer region and applies syntax properties manually. This is how triple-quoted strings are implemented in python.el, the same technique is used to highlight a S-expression comment in scheme.el which starts with #; and extends to the following S-expression.

Unfortunately, not all modes play well. Some resort to just painting over the thing to be a comment or string by applying the right face to it and sacrifice proper movement over symbols and S-expressions. It shouldn’t be surprising that given the choice to fix faulty modes or circumvent a cumbersome API, plenty Emacs hackers pick the latter and get more robust handling for free. After all, practicality beats purity, right?

[1]I’ve seen at least one mode checking slot 8 instead. A detailed analysis of the difference is left to the reader as exercise.