
Re: youtube support [CORRECTION]



Apologies. I somehow posted the wrong version of my solution. The
correction is included in-line below.

On 2017-09-19 13:37, Boruch Baum wrote:
> I got it into my head to use emacs-w3m for searching youtube videos, and
> decided to share my results, if it's of interest to the project or to
> other users.
>
> 1] Initial configuration
>
>    This part was straightforward:
>
>    #+BEGIN_SRC emacs-lisp
>    (add-to-list 'w3m-uri-replace-alist
>      '("\\`yt:" w3m-search-uri-replace "youtube"))
>
>    (add-to-list 'w3m-search-engine-alist
>      '("youtube" "https://www.youtube.com/search?q=%s"))
>    #+END_SRC
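
   A rough way to see what the "%s" template expands to, without loading
   emacs-w3m at all. The helper name below is hypothetical, and
   url-hexify-string is only a stand-in for whatever encoding emacs-w3m
   applies to the query, so the exact escaping may differ:

   #+BEGIN_SRC emacs-lisp
   (require 'url-util)

   ;; Hypothetical helper, not part of emacs-w3m: it only mimics the
   ;; substitution of the search terms into the "%s" slot.
   (defun my-yt-search-url (query)
     "Return a YouTube search URL for QUERY, percent-encoded."
     (format "https://www.youtube.com/search?q=%s"
             (url-hexify-string query)))

   (my-yt-search-url "emacs w3m")
   #+END_SRC

   With the configuration above, typing something like "yt:emacs w3m" at
   the emacs-w3m URL prompt should end up at a URL of this general shape.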
>
>
> 2] Initial result
>
>    The initial result was surprisingly satisfying because many of the site's
>    features don't require javascript. Twenty results appear per page,
>    navigable using links at the footer.
>
> 3] Thumbnail correction
>
>    YouTube seems to have some form of server bandwidth optimization that
>    limits thumbnail images to the first six results. I suspect
>    this is because on javascript-enabled browsers the results appear in
>    a single, infinitely-scrolling page.
>
>    This turned out to be easy to fix, using a simple emacs-w3m 'filter' (but
>    wait for more below):
>
>    #+BEGIN_SRC emacs-lisp
     (goto-char (point-min))
     (while (search-forward "<img" nil t)
       (let ((p1 (match-end 0)) (p2 (search-forward ">" nil t)))
        (goto-char p1)
        (when (search-forward "data-thumb=" p2 t)
          (goto-char p1)
          (when (re-search-forward "src=\"[^\"]*\"" p2 t)
            (replace-match "")
            (goto-char p1)
            (re-search-forward "data-thumb=" p2 t)
            (replace-match "src=")))))
>    #+END_SRC
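
   To try the thumbnail rewrite outside of a w3m buffer, the same loop can
   be wrapped in a function that works on a string. This is only a sketch:
   the function name and the sample markup are made up for illustration,
   and unlike the snippet above it keeps the end of the tag in a marker,
   so the search bound stays correct after the old src attribute is
   deleted.

   #+BEGIN_SRC emacs-lisp
   ;; Hypothetical helper, not part of emacs-w3m.
   (defun my-yt-fix-thumbnails (html)
     "Return HTML with each img's src replaced by its data-thumb value."
     (with-temp-buffer
       (insert html)
       (goto-char (point-min))
       (while (search-forward "<img" nil t)
         (let ((p1 (match-end 0))
               ;; Marker at the end of this tag; it moves with edits.
               (p2 (copy-marker (or (search-forward ">" nil t)
                                    (point-max)))))
           (goto-char p1)
           ;; Only touch tags that carry a data-thumb attribute.
           (when (search-forward "data-thumb=" p2 t)
             (goto-char p1)
             (when (re-search-forward "src=\"[^\"]*\"" p2 t)
               (replace-match "")       ; drop the placeholder src
               (goto-char p1)
               (re-search-forward "data-thumb=" p2 t)
               (replace-match "src="))) ; promote data-thumb to src
           (goto-char p2)
           (set-marker p2 nil)))
       (buffer-string)))

   ;; Sample markup is invented; the second tag should be left alone.
   (my-yt-fix-thumbnails
    "<img src=\"pixel.gif\" data-thumb=\"real.jpg\"><img src=\"ok.jpg\">")
   #+END_SRC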
>
> 4] My current filter
>
>    Once I got started messing with the page, I made a bunch of other
>    changes to make it appear more compact and to remove what I
>    considered unnecessary stuff.
>
>    #+BEGIN_SRC emacs-lisp
>    ; ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
>    ; ┃ youtube filter                                             ┃
>    ; ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
     (defun w3m-filter-youtube (url)
       (w3m-filter-delete-regions url "<head>" "<title>" t t)
       (w3m-filter-delete-regions url "</title>" "</head>" t t)
       (insert "<body>")
       (w3m-filter-delete-regions url
         "<body" "<p class=\"num-results" nil t (point))
       (w3m-filter-delete-regions url
         "<div id=\"footer-container\"" "</body>" nil t)
       (w3m-filter-replace-regexp url "</?h4[^>]*>" "")
       (goto-char (point-min))
       (let ((p1 (point)) (p2 (search-forward "<ol" nil t)))
        (w3m-filter-replace-regexp url "<li>" " | " p1 p2)
        (w3m-filter-replace-regexp url "<ul>" "" p1 p2)
        (w3m-filter-replace-regexp url
          "<li><div class=\"yt-lockup[^>]*>" "<p><li>" p2)
        (w3m-filter-replace-regexp url "<button.*</button>" "")
        (w3m-filter-replace-regexp url "<a aria-hidden[^>]*>" "")
        (w3m-filter-replace-regexp url "</?h3[^>]*>" "")
        (goto-char (point-min))
        (while (search-forward "<ul class=\"yt-lockup-meta-info\">" nil t)
          (delete-region (match-beginning 0) (match-end 0))
          (setq p1 (point) p2 (search-forward "</ul>" nil t))
          (w3m-filter-replace-regexp url "</?li>" " " p1 p2)
          (w3m-filter-replace-regexp url "</ul>" "" p1 nil 1))
        (goto-char (point-min))
        (while (search-forward "<ul class=\"yt-badge-list \">" nil t)
          (replace-match "")
          (setq p1 (point) p2 (search-forward "</ul>" nil t))
          (insert " ")
          (w3m-filter-replace-regexp url "</?li[^>]*>" " " p1 p2)
          (w3m-filter-replace-regexp url "</ul>" "" p1 nil 1))
        (w3m-filter-replace-regexp url
          "</div><div class=\"yt-lockup-meta \">" "")
        (goto-char (point-min))
        (while (search-forward "<img" nil t)
          (let ((p1 (match-end 0)) (p2 (search-forward ">" nil t)))
           (goto-char p1)
           (when (search-forward "data-thumb=" p2 t)
             (goto-char p1)
             (when (re-search-forward "src=\"[^\"]*\"" p2 t)
               (replace-match "")
               (goto-char p1)
               (re-search-forward "data-thumb=" p2 t)
               (replace-match "src=")))))))
>
>    (add-to-list 'w3m-filter-configuration
>      '(t
>        "filter for youtube.com"
>        "\\`https?://www\\.youtube\\.com/"
>        w3m-filter-youtube))
>    #+END_SRC
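
   A quick check of which URLs a pattern like the one in the filter entry
   would match. I am assuming here that emacs-w3m tests the pattern
   against the page URL with an ordinary regexp match. The variable name
   is hypothetical, and this variant escapes every literal dot:

   #+BEGIN_SRC emacs-lisp
   ;; Hypothetical variable, used only for this check.
   (defvar my-yt-url-regexp "\\`https?://www\\.youtube\\.com/"
     "Regexp intended to match YouTube page URLs from their start.")

   ;; Returns 0 (match at the start of the string).
   (string-match-p my-yt-url-regexp
                   "https://www.youtube.com/search?q=emacs")

   ;; Returns nil: the host only appears later in the URL.
   (string-match-p my-yt-url-regexp
                   "https://example.org/www.youtube.com/")
   #+END_SRC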
>
> 5] TODO, maybe
>
>    There is a minor bug at the bottom of the page that I'm not terribly
>    motivated to try to fix, unless maybe someone makes a complaint. The
>    search navigation links should read something like
>
>                     "1 2 3 4 5 6 7 Next »"
>
>    where all the items are links except for the indicator of the current
>    page. However, the current code deletes the element indicating the
>    current page, so for example, at page three what appears is
>
>                     "1 2 4 5 6 7 Next »"
>
>    This is because of how the filter is currently deleting "button"s.
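
   One possible direction, if anyone does want to pick this up: unwrap the
   button around the current page number instead of deleting it, and only
   then delete the remaining buttons. The sketch below works on a string
   and assumes the current-page indicator is a plain
   "<button ...>N</button>" with no nested tags. I have not verified that
   against what YouTube actually serves, and both the function name and
   the sample markup are hypothetical.

   #+BEGIN_SRC emacs-lisp
   ;; Hypothetical helper, not part of emacs-w3m.
   (defun my-yt-unwrap-buttons (html)
     "Return HTML with each simple <button> replaced by its text."
     (replace-regexp-in-string
      "<button[^>]*>\\([^<]*\\)</button>" "\\1" html t))

   ;; Invented pager fragment: page 3 is the current page.
   (my-yt-unwrap-buttons
    (concat "<a href=\"?p=2\">2</a> "
            "<button class=\"cur\" disabled>3</button> "
            "<a href=\"?p=4\">4</a>"))
   #+END_SRC

   In the filter itself this would have to be limited to the pager region,
   since the other buttons on the page are meant to disappear entirely.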
>

-- 
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0