Re: Offline mode for shimbun retrieval
- From: David Engster <deng@xxxxxxxxxxxxxxx>
- Date: Wed, 26 Nov 2008 14:25:42 +0100
- X-ml-name: emacs-w3m
- X-mail-count: 10457
- References: <87d4gjilf4.fsf@xxxxxxxxxxx> <b4mskpex1rm.fsf@xxxxxxx>
Katsumi Yamaoka <yamaoka@xxxxxxx> writes:
>> The filenames used for saving the shimbuns are generated through a md5
>> of the URL (truncated to the first 10 chars).
>
> Isn't the truncation unnecessary? I'm worried about file name
> collisions.
The truncation is purely cosmetic, so just remove the two substring
calls if you don't like it. But if I remember correctly, with a 40-bit
md5 prefix (10 hex characters) we can roughly expect one collision in
2^20 items, so I figured this should be more than enough for the purpose.
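To make the estimate concrete, here is a minimal sketch in Emacs Lisp. The function names `my-shimbun-local-file-name' and `my-expected-collisions' are made up for illustration and are not part of the patch; the estimate uses the usual birthday approximation n(n-1)/2 / 2^bits for the expected number of colliding pairs.

```elisp
;; Hypothetical helpers for illustration; neither is part of the patch.

(defun my-shimbun-local-file-name (url directory)
  "Return the cache file name for URL below DIRECTORY.
Mirrors the naming scheme in the patch: the first 10 hex
characters of the md5 of URL, followed by \"_shimbun\"."
  (concat (file-name-as-directory (expand-file-name directory))
          (substring (md5 url) 0 10)
          "_shimbun"))

(defun my-expected-collisions (items bits)
  "Approximate expected number of colliding pairs.
ITEMS values are drawn uniformly from a space of 2^BITS values."
  (/ (* 0.5 items (1- items)) (expt 2.0 bits)))

;; 10 hex characters carry 4 bits each, so 40 bits in total.
(my-expected-collisions (expt 2 20) 40)   ; about 0.5
```

With 2^20 items and 40 bits this gives about 0.5 expected colliding pairs, which is consistent with the rough "one collision in 2^20 items" figure; for a few hundred subscribed feeds the value is negligible.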
>> +(defcustom shimbun-local-path temporary-file-directory
>> + "Directory where local shimbun files are stored.
>> +Default is the system's temporary directory."
>> + :group 'shimbun
>> + :type 'directory)
>
> `temporary-file-directory' is available only in Emacs. For XEmacs
> users, it should be the return value of the `(temp-directory)'
> like this:
>
> (defcustom shimbun-local-path (if (featurep 'xemacs)
> (temp-directory)
> temporary-file-directory)
> ...)
>
> But how about making it default to the value of
> `w3m-default-save-directory'?
Yes, I hadn't thought of that one. I changed it to
w3m-default-save-directory; I had to change the location of (require
'w3m) for that, though. I also removed the 'umask' command in the script
generation, since it only really makes sense when using /tmp.
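Assuming the patch is applied, offline mode could be enabled along these lines. This is only a sketch; the directory name is a placeholder and not a default of the patch.

```elisp
;; Sketch only: assumes the patch from this message has been applied.
;; "~/.shimbun-cache/" is a placeholder directory chosen for illustration.
(setq shimbun-use-local t
      shimbun-local-path (expand-file-name "~/.shimbun-cache/"))
```

`M-x nnshimbun-generate-download-script' then puts the script in a new buffer; with a prefix argument, each w3m command is followed by '&'. The target directory has to exist before the script is run.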
>> +(defun nnshimbun-generate-download-script (&optional async)
>> + "Generate download script for all subscribed shimbuns.
>> +Output will be put in a new buffer. If called with a prefix,
>> +puts a '&' after each curl command."
> ^^^^
> Is curl faster than w3m? ;-) I guess it's true because curl is
> much smaller than w3m. (If you make it customizable like mm-url.el
> does, it seems to be better to do it in shimbun.el since Wanderlust
> users and Mew users will use it in the future.)
Ah, I forgot to change the doc string... Yes, my first version was with
curl, but I switched to w3m to avoid another dependency. I don't think
it makes a big speed difference, though. I don't plan to make this
customizable, since w3m, curl and wget differ in how they include
information from the HEAD request in the file (for extraction of the
Content-Type/Charset). In curl, you can do this via "-w
%{content_type}", but it's appended at the end of the file. Besides, I
think w3m does its job just fine. :-)
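The file layout the patch relies on (header lines, one empty line, then the body, as saved with -dump_both) can be illustrated with a small stand-alone sketch. The function name `my-split-dump-both' and the sample data are hypothetical, and the code assumes LF line endings; it is not the code from the patch, only an illustration of the same idea.

```elisp
;; Hypothetical helper; illustrates the "headers, blank line, body" layout.
(defun my-split-dump-both (text)
  "Split TEXT into (CONTENT-TYPE CHARSET BODY).
TEXT is expected to hold header lines, an empty line and the body.
CONTENT-TYPE and CHARSET are nil when no such header is found."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t)
          type charset)
      ;; The first empty line separates the headers from the body.
      (when (re-search-forward "^$" nil t)
        (let ((header-end (point)))
          (goto-char (point-min))
          ;; Only look for Content-Type inside the header block.
          (when (re-search-forward
                 "^Content-Type:[ \t]*\\([^; \t\r\n]+\\)\\(?:[ \t]*;[ \t]*charset=\"?\\([^\"; \t\r\n]+\\)\\)?"
                 header-end t)
            (setq type (match-string 1)
                  charset (match-string 2)))
          ;; Drop the headers and the empty separator line.
          (delete-region (point-min)
                         (min (point-max) (1+ header-end)))))
      (list type charset (buffer-string)))))

(my-split-dump-both
 "HTTP/1.1 200 OK\nContent-Type: text/xml; charset=utf-8\n\n<rss/>\n")
```

Because the headers come first, the content type and charset are known before the body is decoded, which is harder when the information is appended at the end of the file.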
Regarding speed: Now that retrieving the feeds isn't the bottleneck
anymore, byte-compilation makes some difference, and it seems the
Makefile doesn't compile the shimbuns. For example, shimbun-rss-find-el
in sb-rss.el can take some time on bigger feeds, and byte compilation
makes it about twice as fast. Otherwise, most of the time is spent in
xml-parse-tag, which we cannot really do much about.
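One way to check the effect of byte compilation on a given machine is to time a function before and after compiling it. The sketch below uses a stand-in function (`my-count-matching', hypothetical) rather than shimbun-rss-find-el, since the latter needs a loaded shimbun and a parsed feed. It assumes the forms are evaluated from source, so that the first timing measures the interpreted definition.

```elisp
(require 'benchmark)

;; Hypothetical stand-in for a list-walking function such as
;; shimbun-rss-find-el; it is not part of emacs-w3m.
(defun my-count-matching (items wanted)
  "Count the elements of ITEMS that are `eq' to WANTED."
  (let ((n 0))
    (dolist (item items n)
      (when (eq item wanted)
        (setq n (1+ n))))))

(let* ((data (make-list 200000 'item))
       ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
       (interpreted (car (benchmark-run 5 (my-count-matching data 'item)))))
  ;; Replace the interpreted definition with a byte-compiled one.
  (byte-compile 'my-count-matching)
  (let ((compiled (car (benchmark-run 5 (my-count-matching data 'item)))))
    (message "interpreted: %.3fs  compiled: %.3fs" interpreted compiled)))
```

The actual ratio depends on the Emacs version and on the function being measured, so the numbers printed here say nothing definite about sb-rss.el itself.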
I've attached a new version of the patch. BTW, if you plan on including
this, please use dengste@xxxxxx as address in the ChangeLog. I can also
write something up for the documentation.
Regards,
David
Index: shimbun.el
===================================================================
RCS file: /storage/cvsroot/emacs-w3m/shimbun/shimbun.el,v
retrieving revision 1.194
diff -u -r1.194 shimbun.el
--- shimbun.el 23 Jul 2008 08:25:51 -0000 1.194
+++ shimbun.el 26 Nov 2008 13:15:15 -0000
@@ -78,6 +78,7 @@
(require 'eword-encode)
(require 'luna)
(require 'std11)
+(require 'w3m)
(eval-and-compile
(luna-define-class shimbun ()
@@ -185,6 +186,21 @@
:match (lambda (widget value) (natnump value))
:value 1)))
+(defcustom shimbun-use-local nil
+ "Specifies if local files should be used (\"offline\" mode).
+This way, you can use an external script to retrieve the
+necessary HTML/XML files. For an example, see
+`nnshimbun-generate-download-script'. If a local file for a URL
+cannot be found, it will silently be retrieved as usual."
+ :group 'shimbun
+ :type 'boolean)
+
+(defcustom shimbun-local-path w3m-default-save-directory
+ "Directory where local shimbun files are stored.
+Default is the value of `w3m-default-save-directory'."
+ :group 'shimbun
+ :type 'directory)
+
(defun shimbun-servers-list ()
"Return a list of shimbun servers."
(let (servers)
@@ -219,20 +235,43 @@
(shimbun-mua-shimbun-internal mua))
;;; emacs-w3m implementation of url retrieval and entity decoding.
-(require 'w3m)
(defun shimbun-retrieve-url (url &optional no-cache no-decode
referer url-coding-system)
"Rertrieve URL contents and insert to current buffer.
Return content-type of URL as string when retrieval succeeded.
Non-ASCII characters `url' are escaped based on `url-coding-system'."
- (let (type)
- (if (and url
- (setq type (w3m-retrieve
- (w3m-url-transfer-encode-string url url-coding-system)
- nil no-cache nil referer)))
+ (let (type charset fname)
+ (if (and url
+ shimbun-use-local
+ shimbun-local-path
+ (file-regular-p
+ (setq fname (concat (file-name-as-directory
+ (expand-file-name shimbun-local-path))
+ (substring (md5 url) 0 10)
+ "_shimbun"))))
+ ;; get local file contents
+ (progn
+ (let ((coding-system-for-read 'no-conversion))
+ (insert-file-contents fname))
+ (when (re-search-forward "^$" nil t)
+ (let ((pos (match-beginning 0)))
+ (re-search-backward
+ "^Content-Type: \\(.*?\\)\\(?:[ ;]+\\|$\\)\\(charset=\\(.*\\)\\)?"
+ nil t)
+ (setq type (match-string 1)
+ charset (match-string 3))
+ (delete-region (point-min) pos))))
+ ;; retrieve URL
+ (when url
+ (setq type (w3m-retrieve
+ (w3m-url-transfer-encode-string url url-coding-system)
+ nil no-cache nil referer))))
+ (if type
(progn
(unless no-decode
- (w3m-decode-buffer url)
+ (if charset
+ (w3m-decode-buffer url charset type)
+ (w3m-decode-buffer url))
(goto-char (point-min)))
type)
(unless no-decode
Index: nnshimbun.el
===================================================================
RCS file: /storage/cvsroot/emacs-w3m/shimbun/nnshimbun.el,v
retrieving revision 1.62
diff -u -r1.62 nnshimbun.el
--- nnshimbun.el 17 Oct 2007 11:15:58 -0000 1.62
+++ nnshimbun.el 26 Nov 2008 13:15:15 -0000
@@ -993,6 +993,34 @@
(gnus-group-make-group grp (list 'nnshimbun server)))))
(message "No group is found in nnshimbun+%s:" server)))))
+(defun nnshimbun-generate-download-script (&optional async)
+ "Generate download script for all subscribed shimbuns.
+Output will be put in a new buffer. If called with a prefix,
+puts a '&' after each w3m command."
+ (interactive "P")
+ (switch-to-buffer
+ (get-buffer-create "*shimbun download script*"))
+ (erase-buffer)
+ (insert
+ (concat "#!/bin/sh\n# shimbun download script\n\n"
+ "W3M=" (or w3m-command "/usr/bin/w3m")
+ "\nOPTS=\"-no-cookie -o accept_encoding=identity -dump_both\"\n\n"))
+ (let ((path (file-name-as-directory
+ (expand-file-name shimbun-local-path)))
+ url fname)
+ ;; get all subscribed shimbun groups
+ (dolist (cur gnus-newsrc-alist)
+ (when (and (eq (car-safe (nth 4 cur)) 'nnshimbun)
+ (<= (nth 1 cur) gnus-level-subscribed))
+ (when (string-match "nnshimbun\\+\\(.+\\):\\(.+\\)" (car cur))
+ (nnshimbun-possibly-change-group (match-string 2 (car cur))
+ (match-string 1 (car cur)))
+ (setq url (shimbun-index-url nnshimbun-shimbun))
+ (setq fname (concat path (substring (md5 url) 0 10) "_shimbun"))
+ (insert
+ (concat "$W3M $OPTS " (shell-quote-argument url) " > " (shell-quote-argument fname)
+ (if async " &\n" "\n"))))))))
+
(provide 'nnshimbun)
;;; nnshimbun.el ends here