Re: Shimbun for RSS feeds without published content
- From: Katsumi Yamaoka <yamaoka@xxxxxxx>
- Date: Fri, 28 Nov 2008 16:10:28 +0900
- X-ml-name: emacs-w3m
- X-mail-count: 10462
- References: <87tz9sg4lx.fsf@xxxxxxxx>
>>>>> In [emacs-w3m : No.10461] David Engster wrote:
> In my ongoing quest to retire nnrss in favor of nnshimbun in Gnus, I
> also made a generic RSS shimbun for feeds which do not contain the full
> content (which seems to be becoming more and more common these days). It is
> similar to what rss-hash does for feeds with published content.[1]
> I called this shimbun rss-blogs, since it already tries to detect some
> popular blogging engines (Google Blogger a.k.a. Blogspot, WordPress and
> TypePad) to extract the post contents. So far this has worked pretty
> well for me, but if you encounter a site that uses one of those engines
> and where the content extraction fails, please let me know and I'll try
> to adapt the regexps.
> However, rss-blogs is of course not restricted to those engines. You can
> use it with any feed and manually specify regexps for content-start and
> content-end, just like with rss-hash. Please look into the variable
> shimbun-rss-blogs-group-url-regexp for details.
> Regards,
> David
> [1] By the way, at first I could not get rss-hash to work, since it is
> explicitly excluded in the function shimbun-servers-list (and the same
> applies to atom-hash). Is this a bug?
Sorry, I have no time to try sb-rss-blogs.el right now, but I've
installed it on my system anyway. When I byte-compiled it in the
emacs-w3m source tree, I got the following warnings:
While compiling sb-rss-blogs-guess-type-from-rss in file emacs-w3m/shimbun/sb-rss-blogs.el:
** assignment to free variable shimbun-rss-blogs-current-type
While compiling toplevel forms in file emacs-w3m/shimbun/sb-rss-blogs.el:
** reference to free variable group
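For reference, here is a minimal sketch of what usually triggers these two warnings and how they can be avoided. Every `my-blog-' name is hypothetical and does not come from sb-rss-blogs.el, and the (GROUP URL START END) entry layout is only an assumption made for illustration:

```elisp
;; Sketch only: every `my-blog-' name below is hypothetical and does not
;; appear in sb-rss-blogs.el.

;; "assignment to free variable": the compiler sees `setq' on a symbol
;; that no `defvar' or `let' has introduced.  One fix is to stop
;; assigning and simply return the value to the caller.
(defun my-blog-guess-type (generator)
  "Return a symbol for the blog engine named in GENERATOR, or nil."
  (cond ((string-match "blogger" generator) 'blogger)
        ((string-match "WordPress" generator) 'wordpress)
        ((string-match "TypePad" generator) 'typepad)))

;; If the global really is needed, declaring it first also silences
;; the warning.
(defvar my-blog-current-type nil
  "Engine guessed for the feed examined most recently.")

(defun my-blog-remember-type (generator)
  "Store and return the engine guessed from GENERATOR."
  (setq my-blog-current-type (my-blog-guess-type generator)))

;; "reference to free variable": the function reads a symbol it never
;; binds.  Taking the value as an argument makes the dependency explicit.
;; The (GROUP URL START END) entry layout is an assumption.
(defun my-blog-lookup-startend (group alist)
  "Return what follows the URL in GROUP's entry of ALIST, or nil."
  (cdr-safe (cdr (assoc group alist))))
```

Returning the value or passing it as an argument keeps the function free of hidden state; the `defvar' route is only worth it when other code really has to read the variable later.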
Is the attached patch ok? If so, I'll install it in the emacs-w3m
CVS trunk.
--- sb-rss-blogs.el~ 2008-11-28 06:47:43 +0000
+++ sb-rss-blogs.el 2008-11-28 07:07:57 +0000
@@ -99,25 +99,22 @@
   "Analyze 'generator' tag in RSS feed for known CMS."
   (save-excursion
     (goto-char (point-min))
-    (setq
-     shimbun-rss-blogs-current-type
-     (if (or (save-excursion
-               (re-search-forward
-                "<[ ]*generator[ ]*>\\(.+\\)<[ ]*/generator[ ]*>" nil t))
-             (save-excursion
-               (re-search-forward
-                "generator=[\"']\\(.+?\\)[\"']" nil t)))
-         (let ((type (match-string 1)))
-           (cond
-            ((string-match "blogger" type)
-             'blogger)
-            ((string-match "WordPress" type)
-             'wordpress)
-            ((string-match "TypePad" type)
-             'typepad)
-            (t
-             nil)))
-       nil))))
+    (when (or (save-excursion
+                (re-search-forward
+                 "<[ ]*generator[ ]*>\\(.+\\)<[ ]*/generator[ ]*>" nil t))
+              (save-excursion
+                (re-search-forward
+                 "generator=[\"']\\(.+?\\)[\"']" nil t)))
+      (let ((type (match-string 1)))
+        (cond
+         ((string-match "blogger" type)
+          'blogger)
+         ((string-match "WordPress" type)
+          'wordpress)
+         ((string-match "TypePad" type)
+          'typepad)
+         (t
+          nil))))))
 
 (defun shimbun-rss-blogs-guess-type-from-html ()
   "Analyze 'generator' tag in HTML page for known CMS."
@@ -144,7 +141,8 @@
          (url (shimbun-header-xref header))
          (startend
           (cdr-safe
-           (cdr (assoc group shimbun-rss-blogs-group-url-regexp))))
+           (cdr (assoc (shimbun-current-group-internal shimbun)
+                       shimbun-rss-blogs-group-url-regexp))))
          content-start content-end)
     (unless (eq (car-safe startend) 'none)
       (cond