Sourcebuster JS: a JavaScript module for determining the sources of visitors to a website

by admin

This story started about six months ago. That’s when I wrote my first meaningful module for Rails, Sourcebuster, and at the same time got an invite to Habr for the post about that module. Most of the theory is already laid out in that post, and I don’t want to repeat it here, so I suggest reading the previous post before this one.
For the lazy, here is a one-paragraph summary of the previous episode. About six months ago I wrote a Ruby on Rails module that helps determine the sources of site visitors and use this data for all sorts of marketing and analytics amusements. Now I’ve decided to take a closer look at JS and ported the module to JavaScript, which is what this post is about.

Key points

  • the script determines the parameters of the visitors’ sources and stores the data in cookies
  • the logic of determining and overwriting sources is exactly the same as in Google Analytics
  • the script is completely autonomous in terms of data retrieval and has no dependencies on third-party stuff (like _utmz cookies)
  • it is written in pure JavaScript, with no dependencies on third-party libraries
  • you can use the data you get for:
      • swapping the phone number shown on the site
      • swapping content on the site (e.g., headers)
      • saving it along with the forms submitted from the site
      • exporting it to your CRM or analytics systems

Links

Github · Download from GitHub · Changelog · Test page
From now on, the documentation will be maintained at sbjs.rocks

Installation and adjustment

Because the module is written in pure JavaScript, has no dependencies on third-party libraries, and doesn’t care about the actual content of the DOM, it can be called as early as you see fit. The higher you put it in the <head> of the page, the earlier you get the cookies, whose data can then be used to manipulate DOM objects.

Easy installation

Insert into your pages:
<script src="/path/to/sourcebuster.min.js" id="sbjs"></script>

Suitable for those who:

  • do not use subdomains on the site
  • count only transitions from Yandex and Google as organic traffic (the rest is unimportant / error / insert your variant)

What you get out of the box

  • By default, only transitions from Yandex and Google are counted as organic traffic.
  • Duration of a user session: 30 minutes.
  • The default logic assumes that the site does not use subdomains. See below for a detailed explanation of configuring the script to work with subdomains.
  • By default, the user’s IP address is not saved.

Installation "for advanced PC users"

<script>
  var _sbjs = _sbjs || [];
  _sbjs.push(['_setSessionLength', 15]);
  _sbjs.push(['_setBaseHost', 'statica.alexfedoseev.com']);
  _sbjs.push(['_setTimeZoneOffset', 4]);
  _sbjs.push(['_addOrganicSource', 'yahoo.com', 'p']);
  _sbjs.push(['_addOrganicSource', 'bing.com', 'q', 'bing']);
  _sbjs.push(['_addReferralSource', 'facebook.com', 'social']);
  _sbjs.push(['_addReferralSource', 't.co', 'social', 'twitter.com']);
  _sbjs.push(['_addReferralSource', 'plus.url.google.com', 'social', 'plus.google.com']);
</script>
<script src="/path/to/sourcebuster.min.js" id="sbjs"></script>

Push your settings into the _sbjs array via _sbjs.push, then load the main script right after that.
There are 7 user settings in total:

  • _setSessionLength
  • _setBaseHost
  • _setTimeZoneOffset
  • _setCampaignParam
  • _addOrganicSource
  • _addReferralSource
  • _setUserIP

Let’s go in order.

_setSessionLength

_sbjs.push(['_setSessionLength', 15]);

Set the duration of a user session in minutes.
Within this module, this value only affects whether a referral source is overwritten or not.
A few words about source overwriting. When a user comes to the site for the first time, we get the data about the source. The same user may later return to the site from another source, and we have to decide whether to overwrite the current source. The overwriting logic is exactly the same as in Google Analytics:

  • Transitions with utm markup overwrite anything and everything (even themselves).
  • Transitions from organic search results behave the same way: they always overwrite everything.
  • Direct visits never overwrite anything. They are recorded only on the very first visit to the site, provided that no other source has been recorded before.
  • Referral transitions within the current session do not overwrite anything; the source is overwritten only if the user has no open session. Here is why, by example: a visitor in the middle of a visit often arrives from a third-party resource that is not the real source, such as a mail service where they clicked a link to activate their registration.
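
The four rules above can be sketched as a single decision function. This is only an illustration of the described logic, not the module’s actual code: the function name shouldOverwriteSource and its arguments are hypothetical, and the traffic type names are the ones stored in the typ cookie parameter.

```javascript
// Hypothetical helper illustrating the overwrite rules; not part of Sourcebuster's API.
// newType:         'utm' | 'organic' | 'referral' | 'typein'
// hasStoredSource: true if a source was already recorded for this visitor
// hasSession:      true if the visitor still has an open session
function shouldOverwriteSource(newType, hasStoredSource, hasSession) {
  switch (newType) {
    case 'utm':
    case 'organic':
      // utm and organic transitions always overwrite the stored source
      return true;
    case 'referral':
      // referrals are recorded only when there is no open session
      return !hasSession;
    case 'typein':
      // direct visits are recorded only if nothing was recorded before
      return !hasStoredSource;
    default:
      return false;
  }
}

// A returning visitor clicks an activation link in a mail service mid-session:
console.log(shouldOverwriteSource('referral', true, true)); // false
```

A first visit never has an open session, so under these assumptions a first-time referral is always recorded.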

_setBaseHost

_sbjs.push(['_setBaseHost', 'alexfedoseev.com']);

Set the base host, within which all transitions will be considered internal (not referral) traffic. This setting is only relevant if you use subdomains on your site.
Scenario 1
Suppose you have a website, site.com, and the site has a blog, blog.site.com. You want transitions from the site to the blog and back to be counted as internal traffic, i.e. the source blog.site.com should not be captured as a referral and should not overwrite other sources in a new session. To do this, add the line:

_sbjs.push(['_setBaseHost', 'site.com']);

With this setting, if the user goes from blog.site.com to site.com (or from alex.blog.site.com to site.com), the source will not be overwritten, and such a transition is equivalent to going from site.com/about to site.com/contacts.
Scenario 2
Now consider the opposite scenario: you want to separate traffic between subdomains and treat it as referral traffic. There is a main site (site.com) and a blog (blog.site.com), which has subdomains for users (alex.blog.site.com). You want transitions between blog.site.com and alex.blog.site.com to count as internal traffic, and transitions between these subdomains and the main site to count as referral traffic. To do this:

// on the pages of the main site
_sbjs.push(['_setBaseHost', 'site.com', false]);
// on the pages of the blog.site.com and alex.blog.site.com subdomains
_sbjs.push(['_setBaseHost', 'blog.site.com']);

Notice the third parameter, false, in the setting for the main site. Set it if and only if traffic should count as non-referral only within the exact domain specified. All other traffic, including transitions from subdomains of that domain, will be considered referral traffic.
In our example, with this setting, all transitions between the main site and the blogs are considered referral traffic. If a user reaches the main site for the first time by clicking a link on a user’s blog, the source will be alex.blog.site.com (traffic type: referral).
When you use the parameter false, double-check two things.
First, the domain of the page where _setBaseHost is set with the parameter false must match the host specified in the setting.

// Correct: on the pages of site.com
_sbjs.push(['_setBaseHost', 'site.com', false]);
// DOESN'T MAKE SENSE: on the pages of blog.site.com
_sbjs.push(['_setBaseHost', 'site.com', false]);

Second, the specified host must not have any subdomains whose traffic you want to be considered non-referral.

_sbjs.push(['_setBaseHost', 'site.com', false]);
//=> traffic from ALL subdomains to site.com will be referral
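
To make the two scenarios concrete, here is a minimal sketch of how a referrer host could be checked against the base host. The function isInternalReferrer and its arguments are hypothetical names chosen for illustration; the module’s real implementation may differ.

```javascript
// Hypothetical helper; not part of Sourcebuster's API.
// referrerHost:      host the visitor came from, e.g. 'blog.site.com'
// baseHost:          the value passed to _setBaseHost
// includeSubdomains: false mirrors the optional third parameter of _setBaseHost
function isInternalReferrer(referrerHost, baseHost, includeSubdomains) {
  if (includeSubdomains === undefined) includeSubdomains = true;
  var ref = referrerHost.toLowerCase();
  var base = baseHost.toLowerCase();

  // An exact host match is always internal traffic
  if (ref === base) return true;
  if (!includeSubdomains) return false;

  // Otherwise the referrer must be a subdomain: it ends with '.' + base
  var suffix = '.' + base;
  return ref.length > suffix.length && ref.slice(-suffix.length) === suffix;
}

// Scenario 1: blog and user blogs are internal to site.com
console.log(isInternalReferrer('alex.blog.site.com', 'site.com'));   // true
// Scenario 2, main site: subdomains are treated as referrals
console.log(isInternalReferrer('blog.site.com', 'site.com', false)); // false
```

Comparing against '.' + base rather than just the base keeps an unrelated host such as notsite.com from matching site.com.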

_setTimeZoneOffset

_sbjs.push(['_setTimeZoneOffset', 4]);

Set the time zone offset.
By default, the date is saved in UTC. This setting lets you save it in a different time zone instead.
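
As an illustration of what such an offset does to the stored date, here is a small sketch that shifts a UTC timestamp and prints it in the yyyy-mm-dd hh:mm:ss form used by the sbjs_first_add cookie. It assumes the setting’s value is an offset in hours from UTC, which the examples suggest but do not state; formatVisitDate and pad2 are hypothetical helper names.

```javascript
// Hypothetical helpers; the hour-based offset is an assumption.
function pad2(n) {
  return (n < 10 ? '0' : '') + n;
}

function formatVisitDate(date, offsetHours) {
  // Shift the timestamp by the offset, then read it back with the UTC getters
  var shifted = new Date(date.getTime() + (offsetHours || 0) * 60 * 60 * 1000);
  return shifted.getUTCFullYear() + '-' +
    pad2(shifted.getUTCMonth() + 1) + '-' +
    pad2(shifted.getUTCDate()) + ' ' +
    pad2(shifted.getUTCHours()) + ':' +
    pad2(shifted.getUTCMinutes()) + ':' +
    pad2(shifted.getUTCSeconds());
}

var visit = new Date(Date.UTC(2014, 5, 11, 17, 28, 26));
console.log(formatVisitDate(visit, 0)); // 2014-06-11 17:28:26
console.log(formatVisitDate(visit, 4)); // 2014-06-11 21:28:26
```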

_setCampaignParam

_sbjs.push(['_setCampaignParam', 'custom_campaign']);

Set the GET parameter whose value will be saved as utm_campaign (if the request has no original utm_campaign parameter). This setting was added mainly because of the Google AdWords gclid tag.
Example of usage
If you have traffic from Google AdWords and you use gclid, you can shorten your urls by removing the utm markup. Sourcebuster will still determine that it is utm traffic from Google AdWords.
If the url contains only the gclid tag:
http://statica.alexfedoseev.com/sourcebuster-js/?gclid=sMtH
this will give the following result:

  • Traffic type: utm
  • utm_source: google
  • utm_medium: cpc
  • utm_campaign: google_cpc
  • utm_content: (none)
  • utm_term: (none)

You can change the value of utm_campaign via _setCampaignParam :
http://statica.alexfedoseev.com/sourcebuster-js/?gclid=sMtH&custom_campaign=test_custom
Then the result will be :

  • Traffic type: utm
  • utm_source: google
  • utm_medium: cpc
  • utm_campaign: test_custom
  • utm_content: (none)
  • utm_term: (none)

IMPORTANT

  • If the url contains the original utm tags (utm_source, utm_medium, utm_campaign), the gclid tag and the parameter specified via _setCampaignParam will be ignored.
  • If the url contains only the parameter specified via _setCampaignParam, Sourcebuster will treat this transition as utm traffic.
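
The precedence described above can be sketched as follows. This is a hedged illustration, not the module’s code: resolveUtm is a hypothetical name, the check treats the presence of any one of the three original utm tags as "contains the original utm tags", and the source and medium used when only the custom parameter is present are placeholders, since the text does not specify them.

```javascript
// Hypothetical helper illustrating tag precedence; not part of Sourcebuster's API.
function resolveUtm(pageUrl, campaignParam) {
  var params = new URL(pageUrl).searchParams;
  var NONE = '(none)';

  // 1. Original utm tags win; gclid and the custom parameter are ignored.
  //    Assumption: any one of the three tags is enough.
  if (params.has('utm_source') || params.has('utm_medium') || params.has('utm_campaign')) {
    return {
      typ: 'utm',
      src: params.get('utm_source') || NONE,
      mdm: params.get('utm_medium') || NONE,
      cmp: params.get('utm_campaign') || NONE
    };
  }

  var custom = campaignParam ? params.get(campaignParam) : null;

  // 2. gclid alone means Google AdWords; the custom parameter may rename the campaign
  if (params.has('gclid')) {
    return { typ: 'utm', src: 'google', mdm: 'cpc', cmp: custom || 'google_cpc' };
  }

  // 3. Only the custom parameter: still utm traffic (src and mdm here are placeholders)
  if (custom) {
    return { typ: 'utm', src: NONE, mdm: NONE, cmp: custom };
  }

  return null; // not utm traffic
}

var base = 'http://statica.alexfedoseev.com/sourcebuster-js/';
console.log(resolveUtm(base + '?gclid=sMtH', 'custom_campaign').cmp);                             // google_cpc
console.log(resolveUtm(base + '?gclid=sMtH&custom_campaign=test_custom', 'custom_campaign').cmp); // test_custom
```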

_addOrganicSource

_sbjs.push(['_addOrganicSource', 'yahoo.com', 'p']);
_sbjs.push(['_addOrganicSource', 'bing.com', 'q', 'bing']);

Add a source of organic traffic.
Suppose you want transitions from bing.com search to count as organic traffic. To do this, specify the base host, 'bing.com', and the keyword parameter, 'q'. Both parameters are mandatory. You can also specify an alias for the source via an optional third parameter ('bing').
To find the keyword parameter, go to bing.com and type a query into the search box (e.g., "apple"). This will take you to a page with an address like:
www.bing.com/search?q=apple&go=&qs=n&form=QBLH&pq=apple&sc=8-5&sp=-1&sk=&cvid=718ad07527244c319ecebf44aa261f64
The keyword parameter, 'q', is the string between "?" (or "&" if the parameter is not the first one after the question mark) and "=apple" in the url of the search results page.
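
Here is a minimal sketch of how a keyword could be pulled out of a search referrer given such a host and parameter pair. The function extractKeyword and the shape of the rules array are hypothetical, and falling back to the host when no alias is given is an assumption; this shows the idea rather than the module’s actual implementation.

```javascript
// Hypothetical helper; not part of Sourcebuster's API.
// organicSources: [{ host: 'bing.com', param: 'q', alias: 'bing' }, ...]
function extractKeyword(referrer, organicSources) {
  var url = new URL(referrer);
  var host = url.hostname.toLowerCase();

  for (var i = 0; i < organicSources.length; i++) {
    var rule = organicSources[i];
    // Match the base host itself or any of its subdomains (www.bing.com, search.yahoo.com)
    var matches = host === rule.host ||
      host.slice(-(rule.host.length + 1)) === '.' + rule.host;
    if (!matches) continue;

    var keyword = url.searchParams.get(rule.param);
    if (keyword !== null) {
      // Assumption: without an alias, the host itself is used as the source name
      return { src: rule.alias || rule.host, trm: keyword };
    }
  }
  return null; // not a known organic source
}

var rules = [
  { host: 'yahoo.com', param: 'p' },
  { host: 'bing.com', param: 'q', alias: 'bing' }
];
console.log(extractKeyword('http://www.bing.com/search?q=apple&go=&qs=n&form=QBLH', rules));
// { src: 'bing', trm: 'apple' }
```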

_addReferralSource

_sbjs.push(['_addReferralSource', 'facebook.com', 'social']);
_sbjs.push(['_addReferralSource', 't.co', 'social', 'twitter.com']);

Add a source of referral traffic. In general, if you are fine with utm_medium having the value referral for transitions from, e.g., facebook.com, you don’t need to set anything. But if you want to assign such transitions a custom channel (e.g., utm_medium=social), you can add a setting via _addReferralSource. The first parameter is the base host, the second is the desired value of utm_medium.
In addition, some resources send a referrer that differs from their main domain (for example, Twitter’s referrer host is t.co). In such cases, you can use the optional third parameter to assign an alias to the source. You can also use it to group multiple sites with different referrers into one source.

_setUserIP

_sbjs.push(['_setUserIP', '<%= request.remote_ip %>']);

By default, the script does not save the visitor’s IP address. If you want to save it in a cookie, obtain it on the backend and pass it in via _setUserIP. The example shows how to do this in a Rails template; note the quotes, since the address must reach the script as a string.

Usage

Cookies

So, the script is installed and configured. Now visitors have the following cookies when they go to the site :

  • sbjs_current
  • sbjs_first
  • sbjs_first_add
  • sbjs_session
  • sbjs_referer
  • sbjs_udata

sbjs_current

Parameters of the most recent source.
If the user’s source has changed (once, twice, many times), this cookie holds the latest one.
Content format

typ=organic|src=google|mdm=organic|cmp=(none)|cnt=(none)|trm=(none)

Parameters

  • typ
    Traffic type. Possible values: utm, organic, referral, typein. There is no fifth.
  • src
    Source. In effect, the value of utm_source.
  • mdm
    Channel. The value of utm_medium. Can be configured through utm markup and _addReferralSource.
  • cmp
    Advertising campaign. The value of utm_campaign.
  • cnt
    The version of the ad or banner. The value of utm_content.
  • trm
    Search keyword. The value of utm_term.

Examples of content

# transition from a tagged advertisement
typ=utm|src=yandex|mdm=cpc|cmp=my_adv_campaign|cnt=banner_1|trm=buy_my_stuff
# transition from organic search
typ=organic|src=google|mdm=organic|cmp=(none)|cnt=(none)|trm=(none)
# referral from a third-party site
typ=referral|src=site.com|mdm=referral|cmp=(none)|cnt=(none)|trm=(none)
# transition from facebook with _addReferralSource
typ=referral|src=facebook.com|mdm=social|cmp=(none)|cnt=(none)|trm=(none)
# direct visit
typ=typein|src=typein|mdm=typein|cmp=(none)|cnt=(none)|trm=(none)

sbjs_first

The format of this cookie is exactly the same as that of sbjs_current, but it stores the parameters of the visitor’s very first source. This cookie is set once and is never overwritten.

sbjs_first_add

Additional data about the user’s first visit: date/time and entry point.
Content format

fd=2014-06-11 17:28:26|ep=http://statica.alexfedoseev.com/sourcebuster-js/

Parameters

  • fd
    Date and time of the very first visit to the site by a particular user. Saved in the format yyyy-mm-dd hh:mm:ss. The default is UTC. You can change the time zone via _setTimeZoneOffset.
  • ep
    Site entry point.

sbjs_session

A flag cookie indicating that the user has an open session. Lifetime: 30 minutes since the last activity, or the value you set via _setSessionLength.

sbjs_referer

The referrer at the moment the source was written or overwritten.
Content format

ref=http://habrahabr.ru

Parameters

  • ref
    The page of the "third-party" site from which the visitor came to the site.

sbjs_udata

Additional data about the user : ip and user-agent.
Content format

uip=80.20.123.77|uag=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36

Parameters

  • uip
    Current ip address of the visitor.
  • uag
    The current user-agent (browser) of the visitor.

Data retrieval

Data is available via get_sbjs :

// pattern: get_sbjs.cookie_name_without_sbjs_prefix.parameter_name_in_cookie
// for example, to get the current utm_source
get_sbjs.current.src
// the first utm_medium
get_sbjs.first.mdm
// entry point
get_sbjs.first_add.ep
// user-agent of the user
get_sbjs.udata.uag
// etc.

And of course you can parse the cookies yourself on the backend.
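
If you do parse them yourself, the cookie format shown above (key=value pairs separated by "|") is easy to split. Here is a minimal sketch, e.g. for a Node.js backend; parseSbjsCookie is a hypothetical name, and the value is assumed to be already URL-decoded.

```javascript
// Hypothetical helper; assumes the cookie value has already been URL-decoded.
function parseSbjsCookie(value) {
  var result = {};
  var pairs = value.split('|');
  for (var i = 0; i < pairs.length; i++) {
    // Split on the first '=' only: values such as the entry point url may contain '='
    var eq = pairs[i].indexOf('=');
    if (eq === -1) continue;
    result[pairs[i].slice(0, eq)] = pairs[i].slice(eq + 1);
  }
  return result;
}

var current = parseSbjsCookie('typ=utm|src=yandex|mdm=cpc|cmp=my_adv_campaign|cnt=banner_1|trm=buy_my_stuff');
console.log(current.src); // yandex
console.log(current.trm); // buy_my_stuff
```

Because "|" is the separator, a value that itself contains "|" will be split in the wrong place.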

Data usage

Let’s display the visitor’s current source on the page. For this, the get_sbjs variable must be defined. If get_sbjs has not been loaded yet at the moment we call the function that outputs the current source, we have to wait for the sourcebuster script to load and only then run that function. IE8 and below also need a small fix to run the function at the right time (after sourcebuster is loaded).

<!-- here we put the name of the current source -->
<div id="data-box"></div>
<script type="text/javascript">
  // the sourcebuster script element (the one with id="sbjs")
  var sbjs_script = document.getElementById('sbjs');

  // a fix for old IE that runs the callback only after the main script is loaded;
  // in our case, the callback outputs the current source and the main script is sourcebuster
  function ie_load_bug_fix(script, callback) {
    // old IE reports 'loaded' or 'complete' once the script is ready
    if (script.readyState == 'loaded' || script.readyState == 'complete') {
      callback();
    } else {
      setTimeout(function() { ie_load_bug_fix(script, callback); }, 100);
    }
  }

  // function that outputs the current source
  function place_data() {
    document.getElementById('data-box').innerHTML = get_sbjs.current.src;
  }

  // output:
  // if get_sbjs is already defined, output the current source right away;
  // if not, check the browser type and output only after sourcebuster is loaded
  if (typeof get_sbjs !== 'undefined') {
    place_data();
  } else {
    if (window.addEventListener) {
      sbjs_script.addEventListener('load', place_data, false);
    } else if (window.attachEvent) {
      ie_load_bug_fix(sbjs_script, place_data);
    }
  }
</script>

Restrictions

Transitions from https to http

By the standard, no referrer is sent with the request when the transition goes from https to http, so the module classifies such transitions as typein (i.e. direct visits).

The "|" symbol in the utm markup

If you use it, get_sbjs will most likely not work correctly. Sorry.

Afterword

I’m just starting to test the module on live projects. If anyone wants to join in, you are welcome. It helps if you already have a working solution to compare the results against. If you catch a bug, please report it as an issue on GitHub.
That’s it from me. Thanks for your attention, and good luck.
