Splunk/custom-search-youtube

From aldeid

Description

What it does

Streaming custom search command that shows information (video length, video title) about YouTube videos based on Squid proxy logs. Clicking a row opens the corresponding YouTube video in a new tab.

Screenshot

Splunk-custom-search-youtube-dashboard.png

Download

Download the app here. You can install it from the Manage apps > Install app from file menu.

Source

Code

This is the main code that should be copied to $SPLUNK_HOME/etc/apps/youtube/bin/youtube.py:
#!/usr/bin/env python
#
# Author: Sebastien Damaye
# Description: streaming custom search command that shows information about youtube
#   videos based on squid proxy logs
# Use as follows:
# source="*squid*" uri="*www.youtube.com/watch?v=*" | sort _time | youtube uri | table _time clientip uri youtube
#

import splunk.Intersplunk 
import re
import urllib2
import urlparse
import time

def getYoutube(uri):
    # dots are escaped so they only match literal dots; http and https are both accepted
    m = re.match(r'^https?://www\.youtube\.com/watch\?v=([a-zA-Z0-9_-]+)(&.+)?$', uri)
    if m:
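        try:
            response = urllib2.urlopen('http://youtube.com/get_video_info?video_id=%s' % m.group(1))
            html = response.read()
        except urllib2.URLError:
            # a failed request should not abort the whole search
            return "Error while retrieving info"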
        qs = urlparse.parse_qs(html)
        if 'title' in qs and 'length_seconds' in qs:
            title = qs['title'][0]
            length = time.strftime("%H:%M:%S", time.gmtime(int(qs['length_seconds'][0])))
            return "(%s) %s" % (length, title)
        else:
            return "Error while retrieving info"
    else:
        return "Regexp not recognized"

# get the previous search results
results,unused1,unused2 = splunk.Intersplunk.getOrganizedResults()

# for each result, add a 'youtube' attribute, calculated from the uri field
for result in results:
    result["youtube"] = getYoutube(result["uri"])

# output results
splunk.Intersplunk.outputResults(results)
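
The parsing and formatting steps of the script above can be tried outside Splunk. The following is only a minimal sketch under stated assumptions: it is written for Python 3 (urllib.parse instead of the urllib2 and urlparse modules used above), it reads the v parameter from the query string instead of using the regexp, and the function names extract_video_id and format_video_info as well as the sample response string are made up for illustration. It makes no network request.

```python
import time
from urllib.parse import parse_qs, urlparse

# Same message as the one returned by youtube.py
ERROR_MESSAGE = "Error while retrieving info"


def extract_video_id(uri):
    """Return the value of the 'v' query parameter, or None if it is absent.

    Hypothetical helper: the original script uses a regexp instead.
    """
    values = parse_qs(urlparse(uri).query).get("v")
    return values[0] if values else None


def format_video_info(body):
    """Turn a URL-encoded response body into '(HH:MM:SS) title'.

    Hypothetical helper mirroring the formatting done in getYoutube().
    """
    qs = parse_qs(body)
    if "title" not in qs or "length_seconds" not in qs:
        return ERROR_MESSAGE
    seconds = int(qs["length_seconds"][0])
    length = time.strftime("%H:%M:%S", time.gmtime(seconds))
    return "(%s) %s" % (length, qs["title"][0])


if __name__ == "__main__":
    # Made-up sample body, not a real web service response
    sample = "title=Some+Video&length_seconds=3725"
    print(extract_video_id("https://www.youtube.com/watch?v=abc123&t=5"))
    print(format_video_info(sample))
```

Keeping the formatting separate from the HTTP request makes it possible to check the output format without depending on the web service.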

Dashboard

<form script="table_drilldown_url_field.js">
  <label>youtube videos</label>
  <fieldset submitButton="false" autoRun="true">
    <input type="time" token="TimeRangePicker" searchWhenChanged="true">
      <label>TimeRange</label>
      <default>
        <earliest>@d</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table id="link">
        <search>
          <query>source="*squid*" uri="*www.youtube.com/watch?v=*" | sort _time | youtube uri | table _time clientip uri youtube v</query>
          <earliest>$TimeRangePicker.earliest$</earliest>
          <latest>$TimeRangePicker.latest$</latest>
        </search>
        <drilldown target="_blank">
          <link>
            <![CDATA[
            https://www.youtube.com/watch?v=$row.v$
            ]]>
          </link>
        </drilldown>
        <option name="wrap">true</option>
        <option name="rowNumbers">false</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">row</option>
        <option name="count">30</option>
        <fields>["_time","clientip","uri","youtube"]</fields>
      </table>
    </panel>
  </row>
</form>

Known limitations

  • The table takes time to render because it depends on the YouTube web service (http://youtube.com/get_video_info), and it is not displayed until the info has been gathered for all entries. Beware that the script could take a very long time, depending on the number of videos to analyze.
  • In some cases, the Google web service does not return results (for legal reasons, I guess). In these cases, the script displays
    Error while retrieving info
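
One possible way to reduce the rendering time mentioned in the first limitation is to query the web service only once per distinct video. This is not part of the app: the sketch below is an assumption, written for Python 3, and make_cached_lookup and fake_fetch are hypothetical names. The fetch argument stands for any function that takes a video id and returns the string to display.

```python
from functools import lru_cache

# Same message as the one returned by youtube.py
ERROR_MESSAGE = "Error while retrieving info"


def make_cached_lookup(fetch, maxsize=1024):
    """Wrap fetch(video_id) so each distinct id is only fetched once.

    Hypothetical helper. Failures are turned into the error message
    and are cached as well, so a failing id is not retried.
    """
    @lru_cache(maxsize=maxsize)
    def lookup(video_id):
        try:
            return fetch(video_id)
        except Exception:
            return ERROR_MESSAGE
    return lookup


if __name__ == "__main__":
    calls = []

    def fake_fetch(video_id):
        # Stand-in for the real web service request
        calls.append(video_id)
        return "(00:01:00) video %s" % video_id

    lookup = make_cached_lookup(fake_fetch)
    for video_id in ["a", "b", "a", "a"]:
        lookup(video_id)
    print(len(calls))  # number of calls actually made to fake_fetch
```

With many rows pointing at the same few videos, the number of requests then depends on the number of distinct videos instead of the number of rows.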


Keywords: splunk youtube