
Re: sb-zeit-de sometimes gets only part of an article



Andreas Seltenreich writes:

> Elias Oltmanns writes:
>
>> since there is going to be another release soon, I just wanted to
>> point out that sb-zeit-de sometimes gets only the first part of an
>> article. Unfortunately, I haven't quite worked out the pattern yet.
>
> I suspect this is happening when an advertisement is placed in the
> middle of an article. I'll take a closer look at it...

Right, adjusting shimbun-zeit-de-content-end fixed the problem with
articles containing advertisements. I also added an :after method for
shimbun-clear-contents that strips advertisements and web bugs. Patch
attached.
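
For illustration only, here is a minimal sketch (not part of the patch)
of how a too-generic end marker can cut an article short. The two
regexps mirror the old and new values of shimbun-zeit-de-content-end;
the sample HTML, the my-* names, and the assumption that the
advertisement is embedded through a plain <script> tag are all made up
for this example.

```elisp
;; Sketch only: `my-old-end', `my-new-end', `my-sample-html' and
;; `my-zeit-de-body-end' are hypothetical names, not part of shimbun.
(defvar my-old-end
  "navigation[^>]*>[^A]\\|</p></p></td>\\|<script\\|</body>\\|</html>")

(defvar my-new-end
  (concat
   "</body>\\|</html>\\|navigation[^><]*>[^A]\\|"
   "<script language=\"JavaScript1\\.2\" type=\"text/javascript\">"))

;; Invented page fragment: an ad script sits between two paragraphs.
(defvar my-sample-html
  (concat "<p>first part</p>"
          "<script src=\"http://ad.example/ad.js\"></script>"
          "<p>second part</p>"
          "</body>"))

(defun my-zeit-de-body-end (end-regexp html)
  "Return the index in HTML where END-REGEXP first matches, or nil."
  (string-match end-regexp html))

;; Returns a list of two positions: where the old and the new regexp
;; would end the article body in the sample.
(list (my-zeit-de-body-end my-old-end my-sample-html)
      (my-zeit-de-body-end my-new-end my-sample-html))
```

In this sample the old regexp already matches at the ad's <script> tag,
so everything after the first paragraph would be lost, whereas the new
one only matches at </body>.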

regards,
Andreas

Changes:
            * sb-zeit-de.el (shimbun-zeit-de-content-end): Don't match
            on advertisements.
            (shimbun-clear-contents): New :after method to get rid of
            web bugs and advertisements.
Index: sb-zeit-de.el
===================================================================
RCS file: /storage/cvsroot/emacs-w3m/shimbun/sb-zeit-de.el,v
retrieving revision 1.2
diff -c -r1.2 sb-zeit-de.el
*** sb-zeit-de.el	9 Mar 2005 01:44:35 -0000	1.2
--- sb-zeit-de.el	23 Mar 2005 23:58:07 -0000
***************
*** 33,39 ****
  
  (defvar shimbun-zeit-de-content-start "title\">")
  (defvar shimbun-zeit-de-content-end
!   "navigation[^>]*>[^A]\\|</p></p></td>\\|\<script\\|</body>\\|</html>")
  (defvar shimbun-zeit-de-from-address "DieZeit@zeit.de")
  
  (luna-define-method shimbun-headers :before ((shimbun shimbun-zeit-de)
--- 33,42 ----
  
  (defvar shimbun-zeit-de-content-start "title\">")
  (defvar shimbun-zeit-de-content-end
!   (concat
!    "</body>\\|</html>\\|navigation[^><]*>[^A]\\|"
!    "<script language=\"JavaScript1\\.2\" type=\"text/javascript\">"))
! 
  (defvar shimbun-zeit-de-from-address "DieZeit@zeit.de")
  
  (luna-define-method shimbun-headers :before ((shimbun shimbun-zeit-de)
***************
*** 79,84 ****
--- 82,97 ----
  
  (luna-define-method shimbun-index-url ((shimbun shimbun-zeit-de))
    "http://newsfeed.zeit.de/")
+ 
+ (luna-define-method shimbun-clear-contents :after ((shimbun shimbun-zeit-de)
+ 						    header)
+ 
+   ;;  remove advertisements and 1-pixel-images aka webbugs
+   (shimbun-remove-tags "<a[^>]*doubleclick.net" "</a>")
+   (shimbun-remove-tags "<IFRAME[^>]*doubleclick.net[^>]*>")
+   (shimbun-remove-tags "<img[^>]*doubleclick.net[^>]*>")
+   (shimbun-remove-tags "<img[^>]*\\(width\\|height\\)=\"1px\"[^>]*>")
+   (shimbun-remove-tags "<tr><td[^>]*>Anzeige</td></tr>"))
  
  (provide 'sb-zeit-de)