Brutal Workarounds


Today I took another stab at my abandoned Emacs Lisp IRC bot and thought that it would be nice if it could notify me about RSS and Atom feed updates. Now, I’m not terribly fond of reinventing the wheel, so like every good programmer I dared to take a look at existing solutions, like News Ticker:

(defun newsticker--do-xml-workarounds ()
  "Fix all issues which `xml-parse-region' could be choking on."

  ;; a very very dirty workaround to overcome the
  ;; problems with the newest (20030621) xml.el:
  ;; remove all unnecessary whitespace
  (goto-char (point-min))
  (while (re-search-forward ">[ \t\r\n]+<" nil t)
    (replace-match "><" nil t))
  ;; and another brutal workaround (20031105)!  For some
  ;; reason the xml parser does not like the colon in the
  ;; doctype name "rdf:RDF"
  (goto-char (point-min))
  (if (re-search-forward "<!DOCTYPE[ \t\n]+rdf:RDF" nil t)
      (replace-match "<!DOCTYPE rdfColonRDF" nil t))
  ;; finally.... ~##^°!!!!!
  (goto-char (point-min))
  (while (search-forward "\r\n" nil t)
    (replace-match "\n" nil t))
  ;; still more brutal workarounds (20040309)!  The xml
  ;; parser does not like doctype rss
  (goto-char (point-min))
  (if (re-search-forward "<!DOCTYPE[ \t\n]+rss[ \t\n]*>" nil t)
      (replace-match "" nil t))
  ;; And another one (20050618)! (Fixed in GNU Emacs
  ;; Remove comments to avoid this xml-parsing bug:
  ;; "XML files can have only one toplevel tag"
  (goto-char (point-min))
  (while (search-forward "<!--" nil t)
    (let ((start (match-beginning 0)))
      (unless (search-forward "-->" nil t)
        (error "Can't find end of comment"))
      (delete-region start (point))))
  ;; And another one (20050702)! If description is HTML
  ;; encoded and starts with a `<', wrap the whole
  ;; description in a CDATA expression.  This happened for
  (goto-char (point-min))
  (while (re-search-forward
          "<description>\\(<img.*?\\)</description>" nil t)
    (replace-match
     "<description><![CDATA[ \\1 ]]></description>"))
  ;; And another one (20051123)! XML parser does not
  ;; like this: <yweather:location city="Frankfurt/Main"
  ;; region="" country="GM" />
  ;; try to "fix" empty attributes
  ;; This happened for
  (goto-char (point-min))
  (while (re-search-forward "\\(<[^>]*\\)=\"\"" nil t)
    (replace-match "\\1=\" \""))
  (set-buffer-modified-p nil))

I guess I won’t be using this and will go for a special-purpose solution instead, one without workarounds that are over a decade old. After all, I’m not stuck in 2005 with Emacs 21…
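To get a feeling for how invasive these fixes are, here is a minimal sketch that applies only the last workaround to a string. The helper name `my-demo-fix-empty-attributes` is made up for this illustration and is not part of News Ticker; the input is the example from the comment above.

```elisp
;; Hypothetical helper, not part of News Ticker: runs the
;; empty-attribute workaround on a string in a temporary buffer.
(defun my-demo-fix-empty-attributes (xml)
  "Apply the empty-attribute workaround to XML and return the result."
  (with-temp-buffer
    (insert xml)
    (goto-char (point-min))
    (while (re-search-forward "\\(<[^>]*\\)=\"\"" nil t)
      ;; FIXEDCASE is t here so the replacement is inserted as written.
      (replace-match "\\1=\" \"" t))
    (buffer-string)))

(my-demo-fix-empty-attributes
 "<yweather:location city=\"Frankfurt/Main\" region=\"\" country=\"GM\" />")
;; The empty region attribute comes back as region=" ".
```

In other words, the parser is appeased by changing the data: an empty attribute value silently turns into a single space.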

Close Enough


byte-opt.el is one of those files Jamie Zawinski laid his golden hands on. It seems that back in the day, there wasn’t much concern about Emacs Lisp execution speed until he got annoyed enough to bolt on an optimizer. Its source starts with a wonderful quote:

“No matter how hard you try, you can’t make a racehorse out of a pig. You can, however, make a faster pig.”

I recommend reading it to get an idea of what compiler jargon like “peephole optimizer” could possibly mean. The last time I studied it, I found this curious piece of code:

(defun byte-optimize-approx-equal (x y)
  (<= (* (abs (- x y)) 100) (abs (+ x y))))

So, according to this, 99 and 100 are equal: their difference times 100 is 100, which does not exceed their sum of 199. Awesome!
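To see where the cut-off lies, here is a small sketch that evaluates the same expression under a made-up name (`my-demo-approx-equal` is my own copy for illustration, so the real function stays untouched). Two numbers count as equal when their difference is at most one percent of their sum.

```elisp
;; Local copy under a hypothetical name; the body is the one shown above.
(defun my-demo-approx-equal (x y)
  (<= (* (abs (- x y)) 100) (abs (+ x y))))

(my-demo-approx-equal 99 100)  ; 100 <= 199, so non-nil
(my-demo-approx-equal 99 101)  ; 200 <= 200, still non-nil
(my-demo-approx-equal 98 100)  ; 200 <= 198 fails, so nil
(my-demo-approx-equal 1 2)     ; 100 <= 3 fails, so nil
```

So the tolerance is relative: small numbers have to be really close, big ones get away with a lot more.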

Idiomatic Emacs Lisp

<JordiGH> Strictly speaking, isn’t “idiomatic lisp” whatever rms writes?

I’m afraid this is not the case. See this snippet.

;;Function that handles term messages: code by rms (and you can see the
;;difference ;-) -mm

(defun term-handle-ansi-terminal-messages (message)
  ;; Is there a command here?
  (while (string-match "\eAnSiT.+\n" message)
    ;; Extract the command code and the argument.
    (let* ((start (match-beginning 0))
           (command-code (aref message (+ start 6)))
           (argument
            (save-match-data
              (substring message
                         (+ start 8)
                         (string-match "\r?\n" message
                                       (+ start 8)))))
           ignore)
      ;; Delete this command from MESSAGE.
      (setq message (replace-match "" t t message))

      ;; If we recognize the type of command, set the appropriate variable.
      (cond ((= command-code ?c)
             (setq term-ansi-at-dir argument))
            ((= command-code ?h)
             (setq term-ansi-at-host argument))
            ((= command-code ?u)
             (setq term-ansi-at-user argument))
            ;; Otherwise ignore this one.
            (t
             (setq ignore t)))

      ;; Update default-directory based on the changes this command made.
      (if ignore
          nil
        (setq default-directory
              (file-name-as-directory
               (if (and (string= term-ansi-at-host (system-name))
                        (string= term-ansi-at-user (user-real-login-name)))
                   (expand-file-name term-ansi-at-dir)
                 (if (string= term-ansi-at-user (user-real-login-name))
                     (concat "/" term-ansi-at-host ":" term-ansi-at-dir)
                   (concat "/" term-ansi-at-user "@" term-ansi-at-host ":"
                           term-ansi-at-dir)))))

        ;; I'm not sure this is necessary,
        ;; but it's best to be on the safe side.
        (if (string= term-ansi-at-host (system-name))
            (progn
              (setq ange-ftp-default-user term-ansi-at-save-user)
              (setq ange-ftp-default-password term-ansi-at-save-pwd)
              (setq ange-ftp-generate-anonymous-password term-ansi-at-save-anon))
          (setq term-ansi-at-save-user ange-ftp-default-user)
          (setq term-ansi-at-save-pwd ange-ftp-default-password)
          (setq term-ansi-at-save-anon ange-ftp-generate-anonymous-password)
          (setq ange-ftp-default-user nil)
          (setq ange-ftp-default-password nil)
          (setq ange-ftp-generate-anonymous-password nil)))))
  message)

This isn’t bad code by any means, just clumsy and careful as opposed to the highly compressed nature of the surrounding code. The “I’m not sure this is necessary, but it’s best to be on the safe side.” comment reminds me of The Daily WTF.



This is a codeless post that will instead focus on a design issue present in all (at the time of writing) stable releases of Emacs. Be assured that you will not have to work around it in the upcoming Emacs 25 release.

Have you ever wondered why some commands deactivate the region afterwards, although there’s no explicit call to the deactivate-mark function? It turns out that this is intentional behavior as can be seen in the documentation of the deactivate-mark variable:

If an editing command sets this to t, deactivate the mark afterward.
The command loop sets this to nil before each command,
and tests the value when the command returns.
Buffer modification stores t in this variable.

So, any command modifying a buffer will deactivate the region. That makes sense, and if you need the region again for some reason, it’s a C-x C-x away. There is a major problem with this, though: it doesn’t matter which buffer is modified…

This bit me hard with eyebrowse. I am using a modeline indicator to visualize its state, and that indicator relies on the built-in format-spec package. As that package uses a temporary buffer to turn a format string into a formatted string and the modeline indicator is recalculated very often, the region ended up being deactivated on any command. It took me quite a while to figure this one out. I consider it madness for anyone to expect this behavior when writing functions that should not interfere with the region, so I’m glad it has been fixed in Emacs 25 by making the variable buffer-local.

Let's consider an obsolete replacement


I’m currently writing my second mode, this time for textual markup. As I still don’t have much experience with it, I did look at other modes of that kind, ultimately ending up with rst.el.

It’s not unusual for older code to redefine things that might not be supported by all Emacs versions out there. What I did not expect, however, was an implementation of symbolic regular expressions:

(defvar rst-re-alist) ; Forward declare to use it in `rst-re'.

;; FIXME: Use `sregex' or `rx' instead of re-inventing the wheel.
(rst-testcover-add-compose 'rst-re)
;; testcover: ok.
(defun rst-re (&rest args)
  "Interpret ARGS as regular expressions and return a regex string.
Each element of ARGS may be one of the following:

A string which is inserted unchanged.

A character which is resolved to a quoted regex.

A symbol which is resolved to a string using `rst-re-alist-def'.

A list with a keyword in the car.  Each element of the cdr of such
a list is recursively interpreted as ARGS.  The results of this
interpretation are concatenated according to the keyword.

For the keyword `:seq' the results are simply concatenated.

For the keyword `:shy' the results are concatenated and
surrounded by a shy-group (\"\\(?:...\\)\").

For the keyword `:alt' the results form an alternative (\"\\|\")
which is shy-grouped (\"\\(?:...\\)\").

For the keyword `:grp' the results are concatenated and form a
referenceable group (\"\\(...\\)\").

After interpretation of ARGS the results are concatenated as for
`:seq'."
  (apply 'concat
         (mapcar
          (lambda (re)
            (cond
             ((stringp re)
              re)
             ((symbolp re)
              (cadr (assoc re rst-re-alist)))
             ((characterp re)
              (regexp-quote (char-to-string re)))
             ((listp re)
              (let ((nested
                     (mapcar (lambda (elt)
                               (rst-re elt))
                             (cdr re))))
                (cond
                 ((eq (car re) :seq)
                  (mapconcat 'identity nested ""))
                 ((eq (car re) :shy)
                  (concat "\\(?:" (mapconcat 'identity nested "") "\\)"))
                 ((eq (car re) :grp)
                  (concat "\\(" (mapconcat 'identity nested "") "\\)"))
                 ((eq (car re) :alt)
                  (concat "\\(?:" (mapconcat 'identity nested "\\|") "\\)"))
                 (t
                  (error "Unknown list car: %s" (car re))))))
             (t
              (error "Unknown object type for building regex: %s" re))))
          args)))

;; FIXME: Remove circular dependency between `rst-re' and `rst-re-alist'.
(with-no-warnings ; Silence byte-compiler about this construction.
  (defconst rst-re-alist
    ;; Shadow global value we are just defining so we can construct it step by
    ;; step.
    (let (rst-re-alist)
      (dolist (re rst-re-alist-def rst-re-alist)
        (setq rst-re-alist
              (nconc rst-re-alist
                     (list (list (car re) (apply 'rst-re (cdr re))))))))
    "Alist mapping symbols from `rst-re-alist-def' to regex strings."))

I find it hilarious that they appear to be aware of a now obsolete alternative and a more powerful, officially supported one, yet decided to do their own thang. At least there’s not much code around that could be yucky, if you ignore that one circular dependency mentioned at the bottom between the function and its look-up alist.
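For comparison, here is a rough sketch of what the same kind of construction looks like with rx. The pattern and the name `my-demo-re` are invented for this example and have nothing to do with rst.el; as far as I can tell, `seq`, `or` and `group` cover `:seq`, `:alt` and `:grp`, while rx adds shy groups on its own where it needs them.

```elisp
(require 'rx)

;; Hypothetical pattern: a referenceable group holding an alternative,
;; followed by one or more digits.
(defconst my-demo-re
  (rx (group (or "foo" "bar")) (one-or-more digit)))

;; Match against a sample string and pull out the first group.
(let ((text "bar42"))
  (when (string-match my-demo-re text)
    (match-string 1 text)))
```

The exact regex string rx produces can differ between Emacs versions, so I would only rely on what it matches, not on how it is spelled.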