
Re: Offline mode for shimbun retrieval

I've installed the shimbun offline feature in the emacs-w3m trunk.
That's very nice!  Currently only Gnus users can use it.  Wanderlust
and Mew experts, would you mind writing the code for them?

(For Gnus users) it's easy to get started.  To do that, set the
`shimbun-use-local' variable to a non-nil value, generate the shell
script with the `nnshimbun-generate-download-script' command,
execute the script, and then check for new articles in the shimbun
groups as usual.
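For example, roughly (a minimal sketch; the script file name here is
just an illustration, save the generated buffer wherever you like):

```elisp
;; Enable offline retrieval for nnshimbun.
(setq shimbun-use-local t)

;; Then, in Emacs:
;;   M-x nnshimbun-generate-download-script RET
;; puts a shell script into a new buffer.  Save it and run it, e.g.
;;   $ sh download-shimbun.sh
;; and finally check for new articles in the shimbun groups as
;; usual (e.g. the `g' command in the Gnus group buffer).
```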

(The feature does not work for some shimbun modules, such as
 sb-gendai-net.el, because in those cases the index url only
 provides the real url, so the module always reads it online.)

>>>>> In [emacs-w3m : No.10457] David Engster wrote:
> Katsumi Yamaoka <yamaoka@xxxxxxx> writes:
>>> The filenames used for saving the shimbuns are generated through a md5
>>> of the URL (truncated to the first 10 chars).
>> Isn't the truncation unnecessary?  I'm worried about file name
>> collisions.

> The truncation is purely cosmetic, so just remove the two substring
> commands if you don't like it. But if I remember correctly, with a 40bit
> md5 we can roughly expect one collision in 2^20 items, so I figured this
> should be more than enough for the purpose.

I see.  So let's leave it until someone reports the problem.
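For reference, the naming scheme under discussion can be sketched in
Emacs Lisp like this (the function name is hypothetical, for
illustration only):

```elisp
;; Sketch of the discussed scheme: name the cached file after the
;; md5 hash of the URL, truncated to the first 10 hex characters.
;; 10 hex chars = 40 bits, so by the birthday bound a collision is
;; roughly expected only after about 2^20 different URLs.
(defun my-shimbun-cache-name (url)
  "Return a short cache file name for URL (illustration only)."
  (substring (md5 url) 0 10))
```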

>>> +(defcustom shimbun-local-path temporary-file-directory
>> But how about making it default to the value of
>> `w3m-default-save-directory'?

> Yes, I haven't thought of that one. I changed it to
> w3m-default-save-directory; I had to change the location of (require
> 'w3m) for that, though. I also removed the 'umask' command in the script
> generation, since it only really makes sense when using /tmp.

>>> +(defun nnshimbun-generate-download-script (&optional async)
>>> +  "Generate download script for all subscribed schimbuns.
>>> +Output will be put in a new buffer.  If called with a prefix,
>>> +puts a '&' after each curl command."
>>                          ^^^^
>> Is curl faster than w3m? ;-)  I guess it's true because curl is
>> much smaller than w3m.

> Ah, I forgot to change the doc string... Yes, my first version was with
> curl, but I switched to w3m to avoid another dependency. I don't think
> it makes a big speed difference, though. I don't plan to make this
> customizable, since w3m, curl and wget differ in how to include
> information from the HEAD request in the file (for extraction of the
> Content-Type/Charset). In curl, you can do this via "-w
> '%{content_type}'", but it's appended at the end of the file. Besides, I
> think w3m does its job just fine. :-)

> Regarding speed: Now that retrieving the feeds isn't the bottleneck
> anymore, byte-compilation makes some difference, and it seems the
> Makefile doesn't compile the shimbuns. For example, shimbun-rss-find-el
> in sb-rss.el can take some time on bigger feeds, and byte compilation
> makes it about twice as fast. Otherwise, most of the time is spent with
> xml-parse-tag, which we cannot really do much about.

I see your point now, thanks.  In my case, though, most of the time
seems to be spent on retrieving rather than on loading the w3m
executable, so the offline retrieving of indices is useful enough.

> I've attached a new version of the patch. BTW, if you plan on including
> this, please use dengste@xxxxxx as address in the ChangeLog.


> I can also write something up for the documentation.

Great.  Please write the Info entry when you have time.

BTW, I've slightly modified `nnshimbun-generate-download-script'
so that it quotes urls in the script, since some urls contain SPC,
"&", etc.  I've also made it not signal an error when there are old
groups that shimbun modules no longer support.  (I hadn't noticed
I had such groups because I usually use `1 g' or `2 g' instead of
the `g' group command, and those groups are at group level 4.)
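The quoting can be done with the standard `shell-quote-argument'
function; roughly like this (a sketch, not the actual patch, and the
exact w3m command line in the generated script may differ):

```elisp
;; Quote each url before writing it into the download script, so
;; that SPC, "&", etc. don't break the shell command line.
(insert "w3m -dump_source "
        (shell-quote-argument url)
        " > "
        (shell-quote-argument file)
        "\n")
```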