No Obvious Way To Do It Properly


You can often catch me griping way more about an “Emacs API” than Emacs Lisp, often without any explanation how the interfaces Emacs provides could be possibly worse than the language used for this. Here comes a not too obvious example of the issue at hand that will not go away if you were to rewrite the code in question in <insert favorite toy language>:

(defvar flyspell-prog-text-faces
  '(font-lock-string-face font-lock-comment-face font-lock-doc-face)
  "Faces corresponding to text in programming-mode buffers.")

(defun flyspell-generic-progmode-verify ()
  "Used for `flyspell-generic-check-word-predicate' in programming modes."
  ;; (point) is next char after the word. Must check one char before.
  (let ((f (get-text-property (- (point) 1) 'face)))
    (memq f flyspell-prog-text-faces)))

The above is helper code for Flyspell to detect whether point is in a string or comment. While this is most certainly the easiest way to achieve this goal, it isn’t what the manual recommends. What you’re supposed to do instead is to use the syntax table parser by accessing its state via syntax-ppss and checking slots 3 and 4[1]. The issue with that approach is that it’s not that simple, see the sources of Smartparens for the ugly details, more specifically the sp-point-in-string-or-comment function.

Let’s take a step back. Why exactly do we need to check the syntactical state of Emacs? To understand what’s going on, you’ll need to know that the first step when building a major mode for a programming language is to set up the syntax tables so that Emacs can recognize strings and comments properly. Once this has been done, both syntax highlighting and movement work correctly. This hack has been done back then to have that special case dealt with in a performant manner, albeit it only works properly for languages looking like C or Lisp. Single characters can be marked as having a special syntax, with extra slots for two-character sequences. In Lisp, a semicolon would start the comment and every newline would terminate one, in C the comment starters and enders would be /* and */. If you’re using a language with more complicated rules than that, there is the crutch of writing a custom syntax-propertize-function that walks over an arbitrary buffer region and applies syntax properties manually. This is how triple-quoted strings are implemented in python.el, the same technique is used to highlight a S-expression comment in scheme.el which starts with #; and extends to the following S-expression.

Unfortunately, not all modes play well. Some resort to just painting over the thing to be a comment or string by applying the right face to it and sacrifice proper movement over symbols and S-expressions. It shouldn’t be surprising that given the choice to fix faulty modes or circumvent a cumbersome API, plenty Emacs hackers pick the latter and get more robust handling for free. After all, practicality beats purity, right?

[1]I’ve seen at least one mode checking slot 8 instead. A detailed analysis of the difference is left to the reader as exercise.

When Data Becomes Code


If you’ve hung out on #emacs for a while, chances are you’ve been recommended ibuffer for advanced buffer management (like, killing many buffers at once). There is a horror lurking beneath its pretty interface though and it starts out with a customizable:

(defcustom ibuffer-always-compile-formats (featurep 'bytecomp)
  "If non-nil, then use the byte-compiler to optimize `ibuffer-formats'.
This will increase the redisplay speed, at the cost of loading the
elisp byte-compiler."
  :type 'boolean
  :group 'ibuffer)


(defun ibuffer-compile-make-eliding-form (strvar elide from-end-p)
  (let ((ellipsis (propertize ibuffer-eliding-string 'font-lock-face 'bold)))
    (if (or elide (with-no-warnings ibuffer-elide-long-columns))
        `(if (> strlen 5)
             ,(if from-end-p
                  ;; FIXME: this should probably also be using
                  ;; `truncate-string-to-width' (Bug#24972)
                  `(concat ,ellipsis
                           (substring ,strvar
                                      (string-width ibuffer-eliding-string)))
                   ,strvar (- strlen (string-width ,ellipsis)) nil ?.)

(defun ibuffer-compile-make-substring-form (strvar maxvar from-end-p)
  (if from-end-p
      ;; FIXME: not sure if this case is correct (Bug#24972)
      `(truncate-string-to-width str strlen (- strlen ,maxvar) nil ?\s)
    `(truncate-string-to-width ,strvar ,maxvar nil ?\s)))

(defun ibuffer-compile-make-format-form (strvar widthform alignment)
  (let* ((left `(make-string tmp2 ?\s))
         (right `(make-string (- tmp1 tmp2) ?\s)))
       (setq tmp1 ,widthform
             tmp2 (/ tmp1 2))
       ,(pcase alignment
          (:right `(concat ,left ,right ,strvar))
          (:center `(concat ,left ,strvar ,right))
          (:left `(concat ,strvar ,left ,right))
          (_ (error "Invalid alignment %s" alignment))))))

(defun ibuffer-compile-format (format)
  (let ((result nil)
        ;; We use these variables to keep track of which variables
        ;; inside the generated function we need to bind, since
        ;; binding variables in Emacs takes time.
        (vars-used ()))
    (dolist (form format)
       ;; Generate a form based on a particular format entry, like
       ;; " ", mark, or (mode 16 16 :right).
       (if (stringp form)
           ;; It's a string; all we need to do is insert it.
           `(insert ,form)
         (let* ((form (ibuffer-expand-format-entry form))
                (sym (nth 0 form))
                (min (nth 1 form))
                (max (nth 2 form))
                (align (nth 3 form))
                (elide (nth 4 form)))
           (let* ((from-end-p (when (cl-minusp min)
                                (setq min (- min))
                  (letbindings nil)
                  (outforms nil)
                  min-used max-used strlen-used)
             (when (or (not (integerp min)) (>= min 0))
               ;; This is a complex case; they want it limited to a
               ;; minimum size.
               (setq min-used t)
               (setq strlen-used t)
               (setq vars-used '(str strlen tmp1 tmp2))
               ;; Generate code to limit the string to a minimum size.
               (setq minform `(progn
                                (setq str
                                        `(- ,(if (integerp min)
             (when (or (not (integerp max)) (> max 0))
               (setq max-used t)
               (cl-pushnew 'str vars-used)
               ;; Generate code to limit the string to a maximum size.
               (setq maxform `(progn
                                (setq str
                                        (if (integerp max)
                                (setq strlen (string-width str))
                                (setq str
                                        'str elide from-end-p)))))
             ;; Now, put these forms together with the rest of the code.
             (let ((callform
                    ;; Is this an "inline" column?  This means we have
                    ;; to get the code from the
                    ;; `ibuffer-inline-columns' alist and insert it
                    ;; into our generated code.  Otherwise, we just
                    ;; generate a call to the column function.
                    (ibuffer-aif (assq sym ibuffer-inline-columns)
                        (nth 1 it)
                      `(,sym buffer mark)))
                   ;; You're not expected to understand this.  Hell, I
                   ;; don't even understand it, and I wrote it five
                   ;; minutes ago.
                    (if (get sym 'ibuffer-column-summarizer)
                        ;; I really, really wish Emacs Lisp had closures.
                        ;; FIXME: Elisp does have them now.
                        (lambda (arg sym)
                            (let ((ret ,arg))
                              (put ',sym 'ibuffer-column-summary
                                   (cons ret (get ',sym
                      (lambda (arg _sym)
                        `(insert ,arg))))
                   (mincompform `(< strlen ,(if (integerp min)
                   (maxcompform `(> strlen ,(if (integerp max)
               (if (or min-used max-used)
                   ;; The complex case, where we have to limit the
                   ;; form to a maximum or minimum size.
                     (when (and min-used (not (integerp min)))
                       (push `(min ,min) letbindings))
                     (when (and max-used (not (integerp max)))
                       (push `(max ,max) letbindings))
                      (if (and min-used max-used)
                          `(if ,mincompform
                             (if ,maxcompform
                        (if min-used
                            `(when ,mincompform
                          `(when ,maxcompform
                     (push `(setq str ,callform
                                  ,@(when strlen-used
                                      `(strlen (string-width str))))
                     (setq outforms
                           (append outforms
                                   (list (funcall insertgenfn 'str sym)))))
                 ;; The simple case; just insert the string.
                 (push (funcall insertgenfn callform sym) outforms))
               ;; Finally, return a `let' form which binds the
               ;; variables in `letbindings', and contains all the
               ;; code in `outforms'.
               `(let ,letbindings
    ;; We don't want to unconditionally load the byte-compiler.
    (funcall (if (or ibuffer-always-compile-formats
                     (featurep 'bytecomp))
             ;; Here, we actually create a lambda form which
             ;; inserts all the generated forms for each entry
             ;; in the format string.
             `(lambda (buffer mark)
                (let ,vars-used
                  ,@(nreverse result))))))

(defun ibuffer-recompile-formats ()
  "Recompile `ibuffer-formats'."
  (setq ibuffer-compiled-formats
        (mapcar #'ibuffer-compile-format ibuffer-formats))
  (when (boundp 'ibuffer-filter-format-alist)
    (setq ibuffer-compiled-filter-formats
          (mapcar (lambda (entry)
                    (cons (car entry)
                          (mapcar (lambda (formats)
                                    (mapcar #'ibuffer-compile-format formats))
                                  (cdr entry))))

(defun ibuffer-clear-summary-columns (format)
  (dolist (form format)
    (when (and (consp form)
               (get (car form) 'ibuffer-column-summarizer))
      (put (car form) 'ibuffer-column-summary nil))))

(defun ibuffer-check-formats ()
  (when (null ibuffer-formats)
    (error "No formats!"))
  (let ((ext-loaded (featurep 'ibuf-ext)))
    (when (or (null ibuffer-compiled-formats)
              (null ibuffer-cached-formats)
              (not (eq ibuffer-cached-formats ibuffer-formats))
              (null ibuffer-cached-eliding-string)
              (not (equal ibuffer-cached-eliding-string ibuffer-eliding-string))
              (eql 0 ibuffer-cached-elide-long-columns)
              (not (eql ibuffer-cached-elide-long-columns
                        (with-no-warnings ibuffer-elide-long-columns)))
              (and ext-loaded
                   (not (eq ibuffer-cached-filter-formats
                   (and ibuffer-filter-format-alist
                        (null ibuffer-compiled-filter-formats))))
      (message "Formats have changed, recompiling...")
      (setq ibuffer-cached-formats ibuffer-formats
            ibuffer-cached-eliding-string ibuffer-eliding-string
            ibuffer-cached-elide-long-columns (with-no-warnings ibuffer-elide-long-columns))
      (when ext-loaded
        (setq ibuffer-cached-filter-formats ibuffer-filter-format-alist))
      (message "Formats have changed, recompiling...done"))))

Another weird one is that the extracted autoloads for ibuffer-ext.el reside in ibuffer.el, but that’s the lesser evil of the two.

Credits go to holomorph for discovering that maintenance nightmare.

PSA: Emacs is not a proper GTK application


Update: I’ve been pointed to an emacs-devel discussion about giving Emacs a proper GTK frontend, most comparable to the Win32 and NS frontends (which are free from X11isms). This would be a better approach than what’s been outlined below, with the Cairo code being repurposed for the drawing bits.

Daniel Colascione’s excellent write-up on bringing double-buffered rendering to Emacs has prompted me to do the same on a set of questions that can be occasionally spotted on #emacs:

  • Which GUI build of Emacs shall I choose?
  • What’s the difference between the GTK build and the other builds of Emacs on Linux?
  • Does the GTK build run on Wayland?
  • Does the GTK build run on Broadway?
  • Why does Emacs not render as nicely as <insert more modern text editor>?

If you’ve ever programmed a GUI application with a modern/popular GUI toolkit, you’ll have noticed that while it gives you loads of desirable features, it forces you to structure your application around its idea of an event loop. In other words, your application is forced to react asynchronously to user events. Games are pretty much the only major kind of graphical application that can get away with doing their own event loop, but end up doing their own GUI instead.

Now, the issue with Emacs is that it does its own event loop, pretending that the frontend is a textual or graphical terminal. It’s pretty much the graphical equivalent of a REPL and at the time it only had a text terminal frontend, this way of doing things worked out fairly well. However, by the time users demanded having pretty GTK widgets in Emacs, it became clear that more involved hacks were needed to make that happen. This is why Emacs runs the GTK event loop one iteration at a time, pushes its own user events into it (to make widgets react) and a plethora of more hacks to reconcile their rendering with everything done by X11.

In other words, Emacs is more of a X11 application plus your favorite widgets. The choice of GUI toolkit to build it with is mostly irrelevant, save an infamous bug with the GTK3 frontend that can crash the daemon. Emacs will therefore not run on a pure Wayland system or under Broadway in the browser. If anyone would want to make that happen, either the GTK frontend would need to yank out everything X11 (unlikely as it’s deeply entrenched) or to create a new frontend doing platform-agnostic drawing (the Cairo feature being a prime candidate for that).

Further reading material:

Honesty is the Best Policy

;;; Please do not try to understand this code unless you have a VERY
;;; good reason to do so.  I gave up trying to figure it out well
;;; enough to explain it, long ago.

This precedes paredit-forward-sexps-to-kill which appears to be code that deals with the problem that commands operating on S-expressions may fail mysteriously if there’s no trailing newline afterwards.

Credits go to contrapunctus.

Brutal Workarounds


Today I took another stab at my abandoned Emacs Lisp IRC bot and thought to myself that it would be nice if it were able to notify me on RSS and Atom feed updates. Now, I’m not terribly fond of reinventing the wheel, so like every good programmer I dared taking a look at existing solutions, like News Ticker:

(defun newsticker--do-xml-workarounds ()
  "Fix all issues which `xml-parse-region' could be choking on."

  ;; a very very dirty workaround to overcome the
  ;; problems with the newest (20030621) xml.el:
  ;; remove all unnecessary whitespace
  (goto-char (point-min))
  (while (re-search-forward ">[ \t\r\n]+<" nil t)
    (replace-match "><" nil t))
  ;; and another brutal workaround (20031105)!  For some
  ;; reason the xml parser does not like the colon in the
  ;; doctype name "rdf:RDF"
  (goto-char (point-min))
  (if (re-search-forward "<!DOCTYPE[ \t\n]+rdf:RDF" nil t)
      (replace-match "<!DOCTYPE rdfColonRDF" nil t))
  ;; finally.... ~##^°!!!!!
  (goto-char (point-min))
  (while (search-forward "\r\n" nil t)
    (replace-match "\n" nil t))
  ;; still more brutal workarounds (20040309)!  The xml
  ;; parser does not like doctype rss
  (goto-char (point-min))
  (if (re-search-forward "<!DOCTYPE[ \t\n]+rss[ \t\n]*>" nil t)
      (replace-match "" nil t))
  ;; And another one (20050618)! (Fixed in GNU Emacs
  ;; Remove comments to avoid this xml-parsing bug:
  ;; "XML files can have only one toplevel tag"
  (goto-char (point-min))
  (while (search-forward "<!--" nil t)
    (let ((start (match-beginning 0)))
      (unless (search-forward "-->" nil t)
        (error "Can't find end of comment"))
      (delete-region start (point))))
  ;; And another one (20050702)! If description is HTML
  ;; encoded and starts with a `<', wrap the whole
  ;; description in a CDATA expression.  This happened for
  (goto-char (point-min))
  (while (re-search-forward
          "<description>\\(<img.*?\\)</description>" nil t)
     "<description><![CDATA[ \\1 ]]></description>"))
  ;; And another one (20051123)! XML parser does not
  ;; like this: <yweather:location city="Frankfurt/Main"
  ;; region="" country="GM" />
  ;; try to "fix" empty attributes
  ;; This happened for
  (goto-char (point-min))
  (while (re-search-forward "\\(<[^>]*\\)=\"\"" nil t)
    (replace-match "\\1=\" \""))
  (set-buffer-modified-p nil))

I guess I’ll not be using this and go for a special-purpose solution instead without over a decade old workarounds. After all, I’m not stuck in 2005 with Emacs 21…