
YouTube support



I got it into my head to use emacs-w3m for searching YouTube videos, and
decided to share my results in case they are of interest to the project
or to other users.

1] Initial configuration

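   This part was straightforward: register a "youtube" search engine and
   map the "yt:" URI prefix to it, so that entering "yt:QUERY" as a URL
   runs a YouTube search for QUERY: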

   #+BEGIN_SRC emacs-lisp
   (add-to-list 'w3m-uri-replace-alist
     '("\\`yt:" w3m-search-uri-replace "youtube"))

   (add-to-list 'w3m-search-engine-alist
     '("youtube" "https://www.youtube.com/search?q=%s"))
   #+END_SRC
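
   A usage sketch: the same engine can also be wrapped in a command of its
   own. The name `my-w3m-youtube-search' is hypothetical; `w3m-search' is
   emacs-w3m's search command, which takes an engine name and a query, and
   the sketch assumes the "youtube" engine above is registered in
   `w3m-search-engine-alist':

   #+BEGIN_SRC emacs-lisp
   (require 'w3m-search)

   ;; Hypothetical convenience command (not part of emacs-w3m): prompt
   ;; for a query and run it on the "youtube" engine registered above.
   (defun my-w3m-youtube-search (query)
     "Search YouTube for QUERY with emacs-w3m."
     (interactive "sYouTube search: ")
     (w3m-search "youtube" query))
   #+END_SRC

   M-x my-w3m-youtube-search then prompts for the query.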


2] Initial result

   The initial result was surprisingly satisfying, because many of the
   site's features don't require JavaScript. Twenty results appear per
   page, with navigation links in the page footer.

3] Thumbnail correction

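   YouTube seems to use some form of bandwidth optimization that limits
   thumbnail images to the first six results. I suspect this is because
   in JavaScript-enabled browsers the results appear in a single,
   infinitely scrolling page. The other results carry their thumbnail URL
   in a "data-thumb" attribute instead of "src", presumably for a script
   to load later, and emacs-w3m runs no scripts.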

   This turned out to be easy to fix with a one-line emacs-w3m filter that
   renames those attributes (but wait for more below):

   #+BEGIN_SRC emacs-lisp
   (w3m-filter-replace-regexp url
     "data-thumb=" "src=")
   #+END_SRC
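
   To run that line on its own, it has to live in a filter function
   registered in `w3m-filter-configuration'. Here is a minimal sketch for
   readers who want only the thumbnail fix. The function name is
   hypothetical, the entry uses the same (FLAG DESCRIPTION REGEXP
   FUNCTION) shape as the configuration in the next section, and it
   assumes filtering is enabled (`w3m-use-filter' non-nil):

   #+BEGIN_SRC emacs-lisp
   ;; Hypothetical filter: only restores the missing thumbnails.
   (defun my-w3m-filter-youtube-thumbs (url)
     "Use each image's data-thumb URL as its src on the page at URL."
     (w3m-filter-replace-regexp url "data-thumb=" "src="))

   ;; Register the filter once w3m-filter.el has been loaded.
   (with-eval-after-load 'w3m-filter
     (add-to-list 'w3m-filter-configuration
                  '(t
                    "show all thumbnails on youtube.com"
                    "\\`https?://www\\.youtube\\.com/"
                    my-w3m-filter-youtube-thumbs)))
   #+END_SRC

   Use it instead of, not alongside, the fuller filter below, which
   already includes the same replacement.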

4] My current filter

   Once I got started messing with the page, I made a bunch of other
   changes to make it appear more compact and to remove what I
   considered unnecessary stuff.

   #+BEGIN_SRC emacs-lisp
   ; ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
   ; ┃ youtube filter                                             ┃
   ; ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
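   (defun w3m-filter-youtube (url)
     "Filter the YouTube page at URL for compact display in emacs-w3m."
     ;; Keep only the <title> element of the <head> section.
     (w3m-filter-delete-regions url "<head>" "<title>" t t)
     (w3m-filter-delete-regions url "</title>" "</head>" t t)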
     (insert "<body>")
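     ;; Delete from the page's own <body ...> tag up to the result count.
     ;; START-POS is (point), so the search begins after the <body> just
     ;; inserted, which therefore survives.
     (w3m-filter-delete-regions url
       "<body" "<p class=\"num-results" nil t nil (point))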
     ;; Drop the footer and unwrap <h4> elements (their text is kept).
     (w3m-filter-delete-regions url
       "<div id=\"footer-container\"" "</body>" nil t)
     (w3m-filter-replace-regexp url "</?h4[^>]*>" "")
     ;; Flatten the lists that precede the first <ol> into a single
     ;; " | "-separated line.  P2 is a marker, so it keeps pointing at the
     ;; same text while the replacements change the buffer length.
     (goto-char (point-min))
     (let ((p1 (point))
           (p2 (and (search-forward "<ol" nil t) (point-marker))))
       (w3m-filter-replace-regexp url "<li>" " | " p1 p2)
       (w3m-filter-replace-regexp url "<ul>" "" p1 p2)
       ;; Start each search result in a new paragraph.
       (w3m-filter-replace-regexp url
         "<li><div class=\"yt-lockup[^>]*>" "<p><li>" p2)
       ;; Drop buttons, the opening tags of aria-hidden links, and <h3> tags.
       (w3m-filter-replace-regexp url "<button.*</button>" "")
       (w3m-filter-replace-regexp url "<a aria-hidden[^>]*>" "")
       (w3m-filter-replace-regexp url "</?h3[^>]*>" "")
       ;; Put each result's meta-info list on one line.
       (goto-char (point-min))
       (while (search-forward "<ul class=\"yt-lockup-meta-info\">" nil t)
         (delete-region (match-beginning 0) (match-end 0))
         (setq p1 (point)
               p2 (and (search-forward "</ul>" nil t) (point-marker)))
         (w3m-filter-replace-regexp url "</?li>" " " p1 p2)
         (w3m-filter-replace-regexp url "</ul>" "" p1 nil 1))
       ;; Likewise for each result's badge list.
       (goto-char (point-min))
       (while (search-forward "<ul class=\"yt-badge-list \">" nil t)
         (delete-region (match-beginning 0) (match-end 0))
         (setq p1 (point)
               p2 (and (search-forward "</ul>" nil t) (point-marker)))
         (insert " ")
         (w3m-filter-replace-regexp url "</?li[^>]*>" " " p1 p2)
         (w3m-filter-replace-regexp url "</ul>" "" p1 nil 1))
       (w3m-filter-replace-regexp url
         "</div><div class=\"yt-lockup-meta \">" "")
       ;; Use the data-thumb URL as the image source so every thumbnail shows.
       (w3m-filter-replace-regexp url "data-thumb=" "src=")))

   (add-to-list 'w3m-filter-configuration
     '(t
       "filter for youtube.com"
       "\\`https?://www\\.youtube\\.com/"
       w3m-filter-youtube))
   #+END_SRC

5] TODO, maybe

   There is a minor bug at the bottom of the page that I'm not terribly
   motivated to fix unless someone complains. The search navigation links
   should read something like

                    "1 2 3 4 5 6 7 Next »"

   where every item except the current-page indicator is a link. However,
   the current code deletes the current-page indicator, so on page three,
   for example, what appears is

                    "1 2 4 5 6 7 Next »"

   This happens because the filter deletes every <button> element, and the
   current-page indicator is one of them.
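
   One possible fix, as an untested sketch: if the current-page indicator
   is a <button> whose label is only the page number (optionally wrapped
   in a single <span>), it can be replaced by its bare number before the
   buttons are deleted, so the number survives as plain text. Both the
   helper name and that HTML shape are assumptions, not checked against
   live YouTube pages:

   #+BEGIN_SRC emacs-lisp
   ;; Hypothetical helper.  Assumes the current-page indicator looks like
   ;;   <button ...><span ...>3</span></button>
   ;; where the <span> wrapper is optional.
   (defun my-w3m-filter-keep-page-number (_url)
     "Unwrap <button> elements whose label is only a page number."
     (goto-char (point-min))
     (while (re-search-forward
             (concat "<button[^>]*>"
                     "\\(?:<span[^>]*>\\)?"
                     "\\([0-9]+\\)"
                     "\\(?:</span>\\)?"
                     "</button>")
             nil t)
       ;; Replace the whole button with the digits captured by group 1.
       (replace-match "\\1" t)))
   #+END_SRC

   To try it, add (my-w3m-filter-keep-page-number url) to
   `w3m-filter-youtube' just before the "<button.*</button>" replacement.
   It would also unwrap any other button labelled with a bare number.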

-- 
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0