Re: URLs vs. line breaks
- From: Katsumi Yamaoka <yamaoka@xxxxxxx>
- Date: Wed, 20 Nov 2013 15:58:52 +0900
- X-ml-name: emacs-w3m
- X-mail-count: 12218
- References: <b4mpppwheo3.fsf@xxxxxxx> <87siussmnn.fsf@xxxxxxxxxxx>
In [emacs-w3m : No.12217] jidanni@xxxxxxxxxxx wrote:
>>>>>> "KY" == Katsumi Yamaoka <yamaoka@xxxxxxx> writes:
KY> Do you have an idea to detect absolutely correctly such a broken
KY> url? (I mean it will be apt to gather non-url words.)
> All that needs to be done is parsing the URLs before folding lines
> instead of after.
Indeed. I realized this is an emacs-w3m matter, not a Gnus one.
w3m tries to fold any long word, whether or not it looks like
a url, so I now wrap url-like words in <nobr>...</nobr>
before passing the text to w3m. I think this would be useful not
only for Gnus articles. The regexp used to look for url-like things
is a copy of `gnus-button-url-regexp'. A patch follows:
--- w3m.el~ 2013-10-17 01:33:17.000000000 +0000
+++ w3m.el 2013-11-20 06:52:10.389931200 +0000
@@ -6167,6 +6167,58 @@
(frame-char-width)))))
(list "-o" "display_image=off")))))))))
+(defvar gnus-button-url-regexp)
+
+(defun w3m-markup-urls-nobreak ()
+ "Make things that look like urls unbreakable.
+This function prevents non-link long urls from being broken (w3m tries
+to fold them)."
+ (let ((case-fold-search t)
+ (regexp
+ (eval-when-compile
+ ;; A copy of `gnus-button-url-regexp'.
+ (concat
+ "\\b\\(\\(www\\.\\|\\(s?https?\\|ftp\\|file\\|gopher\\|"
+ "nntp\\|news\\|telnet\\|wais\\|mailto\\|info\\):\\)"
+ "\\(//[-a-z0-9_.]+:[0-9]*\\)?"
+ (if (string-match "[[:digit:]]" "1") ;; Support POSIX?
+ (let ((chars "-a-z0-9_=#$@~%&*+\\/[:word:]")
+ (punct "!?:;.,"))
+ (concat
+ "\\(?:"
+ ;; Match paired parentheses, e.g. in Wikipedia URLs:
+ ;; http://thread.gmane.org/47B4E3B2.3050402@xxxxxxxxx
+ "[" chars punct "]+" "(" "[" chars punct "]+" "[" chars "]*)"
+ "\\(?:" "[" chars punct "]+" "[" chars "]" "\\)?"
+ "\\|"
+ "[" chars punct "]+" "[" chars "]"
+ "\\)"))
+ (concat ;; XEmacs 21.4 doesn't support POSIX.
+ "\\([-a-z0-9_=!?#$@~%&*+\\/:;.,]\\|\\w\\)+"
+ "\\([-a-z0-9_=#$@~%&*+\\/]\\|\\w\\)"))
+ "\\)")))
+ (nd (make-marker))
+ st)
+ (goto-char (point-min))
+ (while (re-search-forward regexp nil t)
+ (set-marker nd (match-end 0))
+ (setq st (goto-char (match-beginning 0)))
+ (if (and (re-search-backward "\\(<\\)\\|>" nil t)
+ (match-beginning 1))
+ (goto-char nd)
+ (goto-char st)
+ (skip-chars-backward "\t\f ")
+ (when (string-match "&lt;" (buffer-substring (max (- (point) 4)
+ (point-min))
+ (point)))
+ (forward-char -4))
+ (insert "<nobr>")
+ (goto-char nd)
+ (when (looking-at "[\t\f ]*&gt;")
+ (goto-char (match-end 0)))
+ (insert "</nobr>")))
+ (set-marker nd nil)))
+
(defun w3m-rendering-buffer (&optional charset)
"Do rendering of contents in the currenr buffer as HTML and return title."
(w3m-message "Rendering...")
@@ -6177,6 +6229,7 @@
(unless (eq w3m-type 'w3m-m17n)
(w3m-remove-meta-charset-tags))
(w3m-fix-illegal-blocks)
+ (w3m-markup-urls-nobreak)
(w3m-rendering-half-dump charset)
(w3m-message "Rendering...done")
(w3m-rendering-extract-title))
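
To see what the new function does without going through a full render,
something like the following could be evaluated in *scratch*. This is
only a sketch: it assumes a w3m.el with the patch above applied is on
`load-path', and `my-demo-nobreak' is a made-up helper name, not part
of emacs-w3m.

```elisp
;; Assumption: the patched w3m.el is installed and loadable.
(require 'w3m)

(defun my-demo-nobreak (html)
  "Return HTML after `w3m-markup-urls-nobreak' has processed it.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert html)
    (w3m-markup-urls-nobreak)
    (buffer-string)))

;; A bare url in running text should come back wrapped in
;; <nobr>...</nobr>.
(my-demo-nobreak
 "<p>See http://example.com/a/very/long/path for details.</p>")

;; A url inside a tag (here an href value) should be left alone,
;; because the backward search reaches "<" before it reaches ">".
(my-demo-nobreak "<a href=\"http://example.com/\">link</a>")
```

Whitespace just before a url is skipped backward before <nobr> is
inserted, so a leading space is expected to end up inside the
<nobr> element.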