Re: Shimbun for RSS feeds without published content



>>>>> In [emacs-w3m : No.10461] David Engster wrote:
> In my ongoing quest to retire nnrss in favor of nnshimbun in Gnus, I
> also made a generic RSS shimbun for feeds which do not contain the full
> content (which seems to become more and more common these days). It is
> similar to what rss-hash does for feeds with published content.[1]

> I called this shimbun rss-blogs, since it already tries to detect some
> popular blogging engines (Google Blogger a.k.a. Blogspot, WordPress and
> TypePad) to extract the post contents. So far this has worked pretty
> well for me, but if you encounter a site that uses one of those engines
> and where the content extraction fails, please let me know and I'll try
> to adapt the regexps.

> However, rss-blogs is of course not restricted to those engines. You can
> use it with any feed and manually specify regexps for content-start and
> content-end, just like with rss-hash. Please look into the variable
> shimbun-rss-blogs-group-url-regexp for details.

> Regards,
> David

> [1] By the way, I first could not get rss-hash to work, since it is
> explicitly excluded in the function shimbun-servers-list (and the same
> applies to atom-hash). Is this a bug?

Sorry, I have no time to try sb-rss-blogs.el right now, but I've
installed it on my system anyway.  Compiling it in the emacs-w3m
source tree gave the following warnings:

While compiling sb-rss-blogs-guess-type-from-rss in file emacs-w3m/shimbun/sb-rss-blogs.el:
  ** assignment to free variable shimbun-rss-blogs-current-type
While compiling toplevel forms in file emacs-w3m/shimbun/sb-rss-blogs.el:
  ** reference to free variable group

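For context, the byte compiler emits warnings of this kind when a
function assigns or reads a symbol that has no `defvar' or local
binding visible at compile time.  A minimal sketch of the situation
and two common ways out of it, assuming nothing about
sb-rss-blogs.el itself (every `my-' name below is hypothetical):

```elisp
;; Hypothetical names throughout; nothing here is taken from
;; sb-rss-blogs.el.

;; Byte-compiling this definition on its own produces an
;; "assignment to free variable" warning for my-current-type,
;; because no defvar for that symbol is visible to the compiler.
(defun my-guess-type-free ()
  (setq my-current-type 'wordpress))

;; Option 1: declare the variable before it is used.
(defvar my-declared-type nil
  "Type most recently stored by `my-guess-type-declared'.")

(defun my-guess-type-declared ()
  (setq my-declared-type 'wordpress))

;; Option 2: return the value and let the caller decide where to
;; keep it, so no global variable is needed at all.
(defun my-guess-type-returning ()
  'wordpress)
```

Option 2 is the shape used for the first hunk of the patch below:
the `setq' is dropped and the function just returns its result.
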
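Is the attached patch ok?  It makes the RSS type-guessing function
return the detected type instead of assigning it to
shimbun-rss-blogs-current-type, and it looks up the group with
shimbun-current-group-internal instead of referring to the free
variable group.  If so, I'll install it in the emacs-w3m CVS trunk.
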
--- sb-rss-blogs.el~	2008-11-28 06:47:43 +0000
+++ sb-rss-blogs.el	2008-11-28 07:07:57 +0000
@@ -99,25 +99,22 @@
   "Analyze 'generator' tag in RSS feed for known CMS." 
   (save-excursion
     (goto-char (point-min))
-    (setq
-     shimbun-rss-blogs-current-type
-     (if (or (save-excursion
-	       (re-search-forward
-		"<[ ]*generator[ ]*>\\(.+\\)<[ ]*/generator[ ]*>" nil t))
-	     (save-excursion
-	       (re-search-forward
-		"generator=[\"']\\(.+?\\)[\"']" nil t)))
-	 (let ((type (match-string 1)))
-	   (cond
-	    ((string-match "blogger" type)
-	     'blogger)
-	    ((string-match "WordPress" type)
-	     'wordpress)
-	    ((string-match "TypePad" type)
-	     'typepad)
-	    (t
-	     nil)))
-       nil))))
+    (when (or (save-excursion
+		(re-search-forward
+		 "<[ ]*generator[ ]*>\\(.+\\)<[ ]*/generator[ ]*>" nil t))
+	      (save-excursion
+		(re-search-forward
+		 "generator=[\"']\\(.+?\\)[\"']" nil t)))
+      (let ((type (match-string 1)))
+	(cond
+	 ((string-match "blogger" type)
+	  'blogger)
+	 ((string-match "WordPress" type)
+	  'wordpress)
+	 ((string-match "TypePad" type)
+	  'typepad)
+	 (t
+	  nil))))))
 
 (defun shimbun-rss-blogs-guess-type-from-html ()
   "Analyze 'generator' tag in HTML page for known CMS." 
@@ -144,7 +141,8 @@
 	(url (shimbun-header-xref header))
 	(startend 
 	 (cdr-safe
-	  (cdr (assoc group shimbun-rss-blogs-group-url-regexp))))
+	  (cdr (assoc (shimbun-current-group-internal shimbun)
+		      shimbun-rss-blogs-group-url-regexp))))
 	content-start content-end)
     (unless (eq (car-safe startend) 'none)
       (cond
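
To try the generator detection outside of shimbun, the patched logic
can be wrapped so that it runs on a string.  This is only a sketch
modelled on the first hunk above: the names `my-guess-blog-type' and
`my-guess-blog-type-in-string' are hypothetical and not part of
sb-rss-blogs.el, and the regexps are copied from the patch unchanged.

```elisp
;; Hypothetical helpers; only the regexps and the cond branches are
;; taken from the patch above.
(defun my-guess-blog-type ()
  "Return `blogger', `wordpress', `typepad' or nil for the current buffer."
  (save-excursion
    (goto-char (point-min))
    (when (or (save-excursion
                (re-search-forward
                 "<[ ]*generator[ ]*>\\(.+\\)<[ ]*/generator[ ]*>" nil t))
              (save-excursion
                (re-search-forward
                 "generator=[\"']\\(.+?\\)[\"']" nil t)))
      ;; The match data belongs to whichever search succeeded.
      (let ((type (match-string 1)))
        (cond
         ((string-match "blogger" type) 'blogger)
         ((string-match "WordPress" type) 'wordpress)
         ((string-match "TypePad" type) 'typepad))))))

(defun my-guess-blog-type-in-string (xml)
  "Run `my-guess-blog-type' on the string XML in a temporary buffer."
  (with-temp-buffer
    (insert xml)
    (my-guess-blog-type)))
```

For example, evaluating
(my-guess-blog-type-in-string "<generator>WordPress 2.6</generator>")
gives wordpress, and a string without any generator information
gives nil.
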