youtube support
- From: Boruch Baum <boruch_baum@xxxxxxx>
- Date: Tue, 19 Sep 2017 13:37:53 -0400
- X-ml-name: emacs-w3m
- X-mail-count: 12799
I got it into my head to use emacs-w3m for searching YouTube videos, and
decided to share my results in case they are of interest to the project
or to other users.
1] Initial configuration
This part was straightforward:
#+BEGIN_SRC emacs-lisp
(add-to-list 'w3m-uri-replace-alist
             '("\\`yt:" w3m-search-uri-replace "youtube"))
;; Search engines are looked up in `w3m-search-engine-alist'.
(add-to-list 'w3m-search-engine-alist
             '("youtube" "https://www.youtube.com/search?q=%s"))
#+END_SRC
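With a "yt:" prefix configured this way, entering something like
"yt:emacs tutorial" at the emacs-w3m URL prompt should run the search.
As an alternative entry point, here is a minimal sketch of a dedicated
command. It assumes a "youtube" engine is registered in
w3m-search-engine-alist; the command name my-w3m-youtube-search is
made up for illustration and is not part of emacs-w3m.
#+BEGIN_SRC emacs-lisp
(require 'w3m-search)

;; Hypothetical convenience command (the name is an assumption).
(defun my-w3m-youtube-search (query)
  "Search YouTube for QUERY via the \"youtube\" emacs-w3m search engine."
  (interactive "sYouTube search: ")
  (w3m-search "youtube" query))
#+END_SRC
It can then be called with M-x my-w3m-youtube-search or bound to a key.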
2] Initial result
The initial result was surprisingly satisfying because many of the site
features don't require javascript. Twenty results appear per page,
navigable using links at the footer.
3] Thumbnail correction
YouTube seems to apply some form of server bandwidth optimization that
limits thumbnail images to the first six results; the remaining results
apparently carry their image URL in a "data-thumb" attribute instead of
"src". I suspect this is because on javascript-enabled browsers the
results appear in a single, infinitely-scrolling page.
This turned out to be easy to fix, using a simple emacs-w3m 'filter' (but
wait for more below):
#+BEGIN_SRC emacs-lisp
(w3m-filter-replace-regexp url "data-thumb=" "src=")
#+END_SRC
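To see what that substitution does without loading a page, here is a
small stand-alone sketch that uses only core Emacs functions. The
function name my-demo-promote-thumbs and the sample markup are
assumptions for illustration; the real filter works on the page buffer
through w3m-filter-replace-regexp.
#+BEGIN_SRC emacs-lisp
;; Hypothetical helper, not part of emacs-w3m.
(defun my-demo-promote-thumbs (html)
  "Return HTML with every \"data-thumb=\" attribute renamed to \"src=\"."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (while (re-search-forward "data-thumb=" nil t)
      ;; FIXEDCASE and LITERAL are both t so the replacement is verbatim.
      (replace-match "src=" t t))
    (buffer-string)))

;; Example call on made-up markup:
(my-demo-promote-thumbs "<img data-thumb=\"https://example.invalid/a.jpg\">")
#+END_SRC
If an image already has a placeholder "src" attribute, this naive rename
would leave it with two; whether that matters depends on how w3m picks
between them, which I have not checked.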
4] My current filter
Once I got started messing with the page, I made a bunch of other
changes to make it appear more compact and to remove what I
considered unnecessary stuff.
#+BEGIN_SRC emacs-lisp
; ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
; ┃ youtube filter ┃
; ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
(defun w3m-filter-youtube (url)
  "Compact the YouTube search results page at URL for emacs-w3m."
  ;; Keep only the <title> element inside <head>.
  (w3m-filter-delete-regions url "<head>" "<title>" t t)
  (w3m-filter-delete-regions url "</title>" "</head>" t t)
  ;; Insert a bare <body>, then delete from the original <body up to
  ;; the result count, and finally the footer.
  (insert "<body>")
  (w3m-filter-delete-regions url
                             "<body" "<p class=\"num-results" nil t (point))
  (w3m-filter-delete-regions url
                             "<div id=\"footer-container\"" "</body>" nil t)
  (w3m-filter-replace-regexp url "</?h4[^>]*>" "")
  (goto-char (point-min))
  (let ((p1 (point)) (p2 (search-forward "<ol" nil t)))
    ;; Before the results list (<ol), turn list items into
    ;; " | "-separated text.
    (w3m-filter-replace-regexp url "<li>" " | " p1 p2)
    (w3m-filter-replace-regexp url "<ul>" "" p1 p2)
    (w3m-filter-replace-regexp url
                               "<li><div class=\"yt-lockup[^>]*>" "<p><li>" p2)
    (w3m-filter-replace-regexp url "<button.*</button>" "")
    (w3m-filter-replace-regexp url "<a aria-hidden[^>]*>" "")
    (w3m-filter-replace-regexp url "</?h3[^>]*>" "")
    ;; Flatten each result's meta-info list.
    (goto-char (point-min))
    (while (search-forward "<ul class=\"yt-lockup-meta-info\">" nil t)
      (delete-region (match-beginning 0) (match-end 0))
      (setq p1 (point) p2 (search-forward "</ul>" nil t))
      (w3m-filter-replace-regexp url "</?li>" " " p1 p2)
      (w3m-filter-replace-regexp url "</ul>" "" p1 nil 1))
    ;; Flatten each result's badge list.
    (goto-char (point-min))
    (while (search-forward "<ul class=\"yt-badge-list \">" nil t)
      (delete-region (match-beginning 0) (match-end 0))
      (setq p1 (point) p2 (search-forward "</ul>" nil t))
      (insert " ")
      (w3m-filter-replace-regexp url "</?li[^>]*>" " " p1 p2)
      (w3m-filter-replace-regexp url "</ul>" "" p1 nil 1))
    (w3m-filter-replace-regexp url
                               "</div><div class=\"yt-lockup-meta \">" "")
    ;; Thumbnail correction from section 3.
    (w3m-filter-replace-regexp url "data-thumb=" "src=")))
(add-to-list 'w3m-filter-configuration
             '(t
               "filter for youtube.com"
               "\\`https?://www\\.youtube\\.com/"
               w3m-filter-youtube))
#+END_SRC
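One thing to check if the filter seems to have no effect: as far as I
know, emacs-w3m only runs its filters when w3m-use-filter is non-nil.
Depending on the emacs-w3m version that may already be the default, so
treat the following as a config fragment to try rather than a required
step.
#+BEGIN_SRC emacs-lisp
;; Enable emacs-w3m's page filtering so entries in
;; `w3m-filter-configuration' are applied.
(setq w3m-use-filter t)
#+END_SRC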
5] TODO, maybe
There is a minor bug at the bottom of the page that I'm not terribly
motivated to try to fix, unless maybe someone makes a complaint. The
search navigation links should read something like
"1 2 3 4 5 6 7 Next »"
where every item is a link except for the indicator of the current
page. However, the current code deletes the element indicating the
current page, so, for example, on page three what appears is
"1 2 4 5 6 7 Next »"
This is because of how the filter is currently deleting "button"s.
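For anyone who does want to look at it, here is a rough, untested
against the live site, sketch of one possible approach: handle each
button separately and keep its label when the label is just a number.
It assumes the current-page indicator is a <button> element whose
visible text is the page number, which I have not verified. The
function name my-demo-strip-buttons and the sample markup are made up,
and the sketch works on a string with core Emacs functions rather than
on the w3m page buffer.
#+BEGIN_SRC emacs-lisp
;; Hypothetical helper, not part of emacs-w3m.
(defun my-demo-strip-buttons (html)
  "Return HTML without <button> elements, keeping purely numeric labels."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (while (search-forward "<button" nil t)
      (let ((beg (match-beginning 0)))
        (when (search-forward "</button>" nil t)
          (let* ((end (point))
                 ;; Visible text of the button, with all tags removed.
                 (label (replace-regexp-in-string
                         "<[^>]*>" "" (buffer-substring beg end)))
                 ;; Non-nil only when the label is a bare number.
                 (keep (and (string-match
                             "\\`[ \t\n]*\\([0-9]+\\)[ \t\n]*\\'" label)
                            (match-string 1 label))))
            (delete-region beg end)
            (when keep
              (insert " " keep " "))))))
    (buffer-string)))

;; Example call on made-up markup:
(my-demo-strip-buttons
 "<a>2</a><button disabled><span>3</span></button><a>4</a>")
#+END_SRC
In the real filter the same idea would have to replace the single
"<button.*</button>" substitution and operate on the page buffer in
place.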
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0