[Date Prev][Date Next][Thread Prev][][Date Index][Thread Index]

Re: sb-atom and sb-rss fetching already deleted articles



Katsumi Yamaoka <yamaoka@xxxxxxx> writes:
>>>>>> In [emacs-w3m : No.11159] David Engster wrote:
>> That's perfectly fine for me.  Another solution might be to compare the
>> date headers of the entries: get all entries of the feed which are not
>> present in the group, but also check their date header if they are
>> actually newer than the ones which are already there. If an entry is
>> older, we can assume it was already expired?
>
> I'm not familiar with wl/elmo code nowadays, but AFAICT, wl/elmo
> doesn't seem to have the one like nov that keeps date information.
> So, I've made changes in sb-rss.el and sb-rss-blogs.el in the way
> I wrote last.  sb-atom.el is unchanged.

I've implemented the approach I quoted above and attached a patch
against current CVS.  The advantage of this approach is that it works
more generally, without the need for an extra parameter.  So far it has
worked for me, but I obviously couldn't test some of the special cases
for 'hankaku' which are implemented there.

Maybe you could take a look at the patch and see if it works for you.  I
could implement the same for sb-atom.el if you should decide to apply
it.

Regards,
David
? sb-slashdot-patch.diff
Index: sb-rss-blogs.el
===================================================================
RCS file: /storage/cvsroot/emacs-w3m/shimbun/sb-rss-blogs.el,v
retrieving revision 1.3
diff -u -r1.3 sb-rss-blogs.el
--- sb-rss-blogs.el	25 Feb 2010 02:11:10 -0000	1.3
+++ sb-rss-blogs.el	25 Feb 2010 20:53:09 -0000
@@ -80,7 +80,7 @@
 						  shimbun-rss-blogs)
 						 &optional range)
   (let ((group (shimbun-current-group-internal shimbun))
-	(headers (shimbun-rss-get-headers shimbun range t nil t))
+	(headers (shimbun-rss-get-headers shimbun range t))
 	(type (sb-rss-blogs-guess-type-from-rss))
 	from)
     (cond
Index: sb-rss.el
===================================================================
RCS file: /storage/cvsroot/emacs-w3m/shimbun/sb-rss.el,v
retrieving revision 1.50
diff -u -r1.50 sb-rss.el
--- sb-rss.el	25 Feb 2010 02:11:10 -0000	1.50
+++ sb-rss.el	25 Feb 2010 20:53:09 -0000
@@ -173,8 +173,7 @@
   (shimbun-rss-get-headers shimbun range t))
 
 (defun shimbun-rss-get-headers (shimbun &optional range
-					need-descriptions need-all-items
-					quit-immediately)
+					need-descriptions need-all-items)
   (let ((xml (condition-case err
 		 (xml-parse-region (point-min) (point-max))
 	       (error
@@ -182,14 +181,55 @@
 			 (shimbun-index-url shimbun)
 			 (error-message-string err))
 		nil)))
-	(ignored-subject (luna-slot-value shimbun 'ignored-subject))
-	dc-ns rss-ns author hankaku headers)
-    (when xml
-      (setq dc-ns (shimbun-rss-get-namespace-prefix
-		   xml "http://purl.org/dc/elements/1.1/")
-	    rss-ns (shimbun-rss-get-namespace-prefix
-		    xml "http://purl.org/rss/1.0/")
-	    author
+	headers newheaders newestdate)
+    (setq headers
+	  (shimbun-rss-get-headers-1 xml shimbun need-descriptions))
+    (when headers
+      (if need-all-items
+	  headers
+	;; Search for the newest item we already have in the shimbun.
+	(dolist (header headers newestdate)
+	  (let* ((date (shimbun-header-date header))
+		 (id (shimbun-header-id header))
+		 datesecs)
+	    (when (and (stringp date)
+		       (> (length date) 0))
+	      (setq datesecs (float-time (date-to-time date))))
+	    (if (and datesecs
+		     (shimbun-search-id shimbun id)
+		     (or (null newestdate)
+			 (< newestdate datesecs)))
+		(setq newestdate datesecs)
+	      (push header newheaders))))
+	(if (null newestdate)
+	    ;; Feed does not have any date information, so we return everything.
+	    newheaders
+	  ;; Return only those items which are newer.
+	  (delq nil
+		(mapcar
+		 (lambda (header)
+		   (let ((date (shimbun-header-date header))
+			 datesecs)
+		     (when (and (stringp date)
+				(> (length date) 0))
+		       (setq datesecs (float-time (date-to-time date))))
+		     (when (or (null datesecs)
+			       (<= newestdate datesecs))
+		       header)))
+		 newheaders)))))))
+
+
+(defun shimbun-rss-get-headers-1 (xml shimbun need-descriptions)
+  "Retrieve all items found in XML for SHIMBUN and return headers.
+If NEED-DESCRIPTIONS, include node text as description."
+  (when xml
+    (let  ((dc-ns (shimbun-rss-get-namespace-prefix
+		   xml "http://purl.org/dc/elements/1.1/"))
+	   (rss-ns (shimbun-rss-get-namespace-prefix
+		    xml "http://purl.org/rss/1.0/"))
+	   (ignored-subject (luna-slot-value shimbun 'ignored-subject))
+	   author hankaku headers)
+      (setq author
 	    (catch 'found-author
 	      (dolist (channel
 		       (shimbun-rss-find-el (intern (concat rss-ns "channel"))
@@ -203,29 +243,21 @@
 				  '(body nil))
 		      (generate-new-buffer " *temp*")))
       (unwind-protect
-	  (catch 'done
-	    (dolist (item (shimbun-rss-find-el (intern (concat rss-ns "item"))
-					       xml)
-			  headers)
-	      (let ((url (and (listp item)
-			      (eq (intern (concat rss-ns "item")) (car item))
-			      (shimbun-rss-node-text rss-ns 'link (cddr item)))))
-		(when url
-		  (let* ((date (or (shimbun-rss-get-date shimbun url)
-				   (shimbun-rss-node-text dc-ns 'date item)
-				   (shimbun-rss-node-text rss-ns 'pubDate item)))
-			 (id (shimbun-rss-build-message-id shimbun url date))
-			 (subject (shimbun-rss-node-text rss-ns 'title item)))
-		    (when (and id
-			       (or need-all-items
-				   (if (shimbun-search-id shimbun id)
-				       (if quit-immediately
-					   (throw 'done headers)
-					 nil)
-				     t))
-			       (if (and ignored-subject subject)
-				   (not (string-match ignored-subject subject))
-				 t))
+	  (dolist (item (shimbun-rss-find-el (intern (concat rss-ns "item"))
+					     xml)
+			headers)
+	    (let ((url (and (listp item)
+			    (eq (intern (concat rss-ns "item")) (car item))
+			    (shimbun-rss-node-text rss-ns 'link (cddr item)))))
+	      (when url
+		(let* ((date (or (shimbun-rss-get-date shimbun url)
+				 (shimbun-rss-node-text dc-ns 'date item)
+				 (shimbun-rss-node-text rss-ns 'pubDate item)))
+		       (id (shimbun-rss-build-message-id shimbun url date))
+		       (subject (shimbun-rss-node-text rss-ns 'title item)))
+		  (when id
+		    (unless (and ignored-subject subject
+				 (string-match ignored-subject subject))
 		      (push
 		       (shimbun-create-header
 			0