I have a text with a lot of lines. How can I delete the repeated lines in Emacs, using built-in commands or elisp packages, without external utilities?

For example:

this is line a
this is line b
this is line a

I want to remove the 3rd line (the same as the 1st line), giving:

this is line a
this is line b

Put this code in your .emacs:

(defun uniq-lines (beg end)
  "Remove duplicate lines in region, keeping the first occurrence.
Called from a program, there are two arguments:
BEG and END (the region to deduplicate)."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (not (eobp))
        (kill-line 1)
        (yank)
        (let ((next-line (point)))
          (while
              (re-search-forward
               (format "^%s" (regexp-quote (car kill-ring))) nil t)
            (replace-match "" nil nil))
          (goto-char next-line))))))

Usage:

M-x uniq-lines

If you have Emacs 24.4 or newer, the cleanest way to do it would be the new delete-duplicate-lines function. Note that

  • this works on a region, not a buffer, so select the desired text first
  • it maintains the relative order of the originals, deleting the duplicates

For example, if your input is

test
dup
dup
one
two
one
three
one
test
five

M-x delete-duplicate-lines would make it

test
dup
one
two
three
five

You can also search backwards, keeping the last occurrence of each line instead, by prefixing it with the universal argument (C-u). The result would then be

dup
two
three
one
test
five

Credit goes to emacsredux.com.
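The same command can also be called from Lisp. Below is a minimal sketch, assuming Emacs 24.4 or newer; the helper name my-dedup-string is hypothetical and only wraps the built-in delete-duplicate-lines so the example input above can be tried without selecting a region by hand.

```elisp
;; Requires Emacs 24.4 or newer, where `delete-duplicate-lines' is built in.
(defun my-dedup-string (text)
  "Return TEXT with duplicate lines removed, keeping first occurrences.
`my-dedup-string' is a hypothetical helper name used for illustration."
  (with-temp-buffer
    (insert text)
    ;; BEG and END cover the whole temporary buffer.
    (delete-duplicate-lines (point-min) (point-max))
    (buffer-string)))

(my-dedup-string "test\ndup\ndup\none\ntwo\none\nthree\none\ntest\nfive\n")
;; => "test\ndup\none\ntwo\nthree\nfive\n"
```

According to the function's docstring, a non-nil third argument (REVERSE) corresponds to the C-u behaviour described above.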

Other roundabout options, not giving quite the same result, available via Eshell:

  1. sort -u; doesn't maintain the relative order of the originals
  2. uniq; worse, it only removes adjacent duplicates, so its input needs to be sorted first
  • sort -u may not be a stable sort, but sort -u -s is – Squidly Feb 10 '15 at 14:34
  • Yes, that's true. Fixed now! Also, running it from Eshell seems to be a less clean solution than using the built-in feature. – legends2k Feb 10 '15 at 14:36
  • @Squid I think I gave the last comment without verifying yours properly. Try feeding the input data to both sort -u and sort -us; you would get the same result, which is not the same as delete-duplicate-lines's. More to the point, we are not talking about a stable sort, which means the relative order of identical elements is maintained. Since we're removing duplicates, identical elements are lost anyway. delete-duplicate-lines maintains the order of the originals, not the duplicates, so one wouldn't be able to get the same result with sort. – legends2k May 5 '17 at 20:26
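The difference discussed in these comments can be seen on a plain list of lines. The following is only a sketch: the names my-example-lines, my-sorted-unique and my-first-occurrences are hypothetical, and string< compares by character code, so it only approximates what sort -u does under a given locale.

```elisp
;; Lines from the example input above, in their original order.
(defvar my-example-lines
  '("test" "dup" "dup" "one" "two" "one" "three" "one" "test" "five")
  "Hypothetical sample data for this sketch.")

(defun my-sorted-unique (lines)
  "Return LINES sorted and deduplicated, roughly what `sort -u' does."
  ;; `sort' and `delete-dups' are destructive, so work on a copy.
  (delete-dups (sort (copy-sequence lines) #'string<)))

(defun my-first-occurrences (lines)
  "Return LINES deduplicated, keeping each first occurrence in place."
  (delete-dups (copy-sequence lines)))

(my-sorted-unique my-example-lines)
;; => ("dup" "five" "one" "test" "three" "two")
(my-first-occurrences my-example-lines)
;; => ("test" "dup" "one" "two" "three" "five")
```

Only the second result matches what delete-duplicate-lines produces for the same input; sorting discards the original order no matter whether the sort is stable.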

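On Linux, select the region and type

M-| uniq <RETURN>

The result, without duplicates, is shown in a new buffer. Note that uniq only removes adjacent duplicate lines, so the region has to be sorted for this to catch every repeat.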

(require 'cl-lib)                       ; for `cl-position-if' and `cl-incf'

(defun unique-lines (start end)
  "Remove all duplicate lines in the region between START and END.
Note that empty lines count as duplicates of the empty line!  All empty
lines are removed except the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string (buffer-substring-no-properties start end) "$" t)
               ;; Result form of `dolist': rebuild the region from the
               ;; hash table, whose values record first-seen order.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (insert (mapconcat #'identity lines "\n"))))
      (setq s                           ; strip the leading newline that
                                        ; splitting at "$" leaves behind
            (substring
             s (or (cl-position-if
                    (lambda (x)
                      (not (or (char-equal ?\n x) (char-equal ?\r x)))) s)
                   (length s))))        ; only newlines: treat as empty line
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))

This alternative:

  • Will not use the undo history to store matches.
  • Will in general be faster (but if you are after ultimate speed, build a prefix tree).
  • Has the effect of replacing all former newline characters, whatever they were, with \n (UNIX-style), which may be a bonus or a disadvantage, depending on your situation.
  • Could be made a little faster if you re-implement split-string so that it accepts characters instead of a regular expression.

A somewhat longer, but perhaps slightly more efficient, variant:

(defun split-string-chars (string chars &optional omit-nulls)
  "Split STRING at each character in the list CHARS.
If OMIT-NULLS is non-nil, drop empty substrings from the result."
  (let ((separators (make-hash-table))
        (len (length string))
        (last 0)
        result)
    (dolist (c chars) (puthash c t separators))
    (dotimes (i len)
      (when (gethash (aref string i) separators)
        ;; Keep the piece unless it is empty and OMIT-NULLS is set.
        (when (or (not omit-nulls) (/= last i))
          (push (substring string last i) result))
        (setq last (1+ i))))
    ;; Add the text after the final separator, applying the same rule.
    (when (or (not omit-nulls) (< last len))
      (push (substring string last) result))
    (nreverse result)))

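(require 'cl-lib)                       ; for `cl-incf'

(defun unique-lines (start end)
  "Remove all duplicate lines in the region between START and END.
Note that empty lines are dropped entirely, because the region is split
with OMIT-NULLS set, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string-chars
                (buffer-substring-no-properties start end) '(?\n) t)
               ;; Result form of `dolist': rebuild the region from the
               ;; hash table, whose values record first-seen order.
               (let ((lines (make-vector (1+ i) nil))
                     ;; Remember whether the region ended with a newline,
                     ;; so that it can be restored after the rewrite.
                     (final-newline (and (< start end)
                                         (eq (char-before end) ?\n))))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (insert (mapconcat #'identity lines "\n")
                         (if final-newline "\n" ""))))
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))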
  • Lines in Emacs buffers are always delimited by \n (regardless of what delimiter is used in the corresponding file). The use of \r for that was only ever for the old selective-display, which was made obsolete many years ago by the invisible property of overlays and text properties. – Stefan Oct 24 '12 at 12:37
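Taking that comment at face value, splitting on \n alone is enough. Here is a compact sketch along those lines; the name my-unique-lines is hypothetical and this is not code from any of the answers. It relies on the built-in delete-dups, which keeps the first of several equal elements, drops empty lines, and assumes START is not after END (as is the case for an interactive call).

```elisp
(defun my-unique-lines (start end)
  "Remove duplicate lines between START and END, keeping first occurrences.
Empty lines are dropped.  `my-unique-lines' is a hypothetical name."
  (interactive "r")
  (save-excursion
    (let* ((text (buffer-substring-no-properties start end))
           ;; Remember a final newline so the following line is not joined.
           (final-newline (string-suffix-p "\n" text))
           ;; Buffer lines are delimited by \n only, so split on that.
           (lines (delete-dups (split-string text "\n" t))))
      (delete-region start end)
      (goto-char start)
      (insert (mapconcat #'identity lines "\n")
              (if final-newline "\n" "")))))
```

Using delete-region rather than kill-region keeps the kill ring untouched.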
