Opera VPN behind the curtains is just a proxy

When setting up (that’s immediately when user enables it in settings) Opera VPN sends few API requests to https://api.surfeasy.com to obtain credentials and proxy IPs, see below.

The browser then talks to a proxy de0.opera-proxy.net (when VPN location is set to Germany), it’s IP address can only be resolved from within Opera when VPN is on, it’s 185.108.219.42 (or similar, see below). It’s an HTTP/S proxy which requires auth.

When loading a page with Opera VPN enabled, the browser sends a lot of requests to de0.opera-proxy.net with Proxy-Authorization request header.

The Proxy-Authorization header decoded: CC68FE24C34B5B2414FB1DC116342EADA7D5C46B:9B9BE3FAE674A33D1820315F4CC94372926C8210B6AEC0B662EC7CAD611D86A3
(that’s sha1(device_id):device_password, where device_id and device_password come from the POST /v2/register_device API call, please note that this decoded header is from another Opera installation and thus contains different device_id and device_password than what is shown below)

These creds can be used with the de0.opera-proxy.net even when connecting from a different machine, it’s just an HTTP proxy anyway.

When you use the proxy on a different machine (with no Opera installed), you’ll get the same IP as when using Opera’s VPN, of course.

This Opera “VPN” is just a preconfigured HTTP/S proxy protecting just the traffic between Opera and the proxy, nothing else. It’s not a VPN.

They even call it Secure proxy (besides calling it VPN, sure) in Opera settings.

The API calls are:

  1. https://api.surfeasy.com/v2/register_subscriber
  2. https://api.surfeasy.com/v2/register_device
  3. https://api.surfeasy.com/v2/geo_list
  4. https://api.surfeasy.com/v2/discover


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/lzc5rsFmddw/558b7c4cd81afa7c857381254ae7bd10

Original article

How Big Data Creates False Confidence

Photograph by Juergen Faelchle / Shutterstock

If I claimed that Americans have gotten more self-centered lately, you might just chalk me up as a curmudgeon, prone to good-ol’-days whining. But what if I said I could back that claim up by analyzing 150 billion words of text? A few decades ago, evidence on such a scale was a pipe dream. Today, though, 150 billion data points is practically passé. A feverish push for “big data” analysis has swept through biology, linguistics, finance, and every field in between.

Although no one can quite agree how to define it, the general idea is to find datasets so enormous that they can reveal patterns invisible to conventional inquiry. The data are often generated by millions of real-world user actions, such as tweets or credit-card purchases, and they can take thousands of computers to collect, store, and analyze. To many companies and researchers, though, the investment is worth it because the patterns can unlock information about anything from genetic disorders to tomorrow’s stock prices.

But there’s a problem: It’s tempting to think that with such an incredible volume of data behind them, studies relying on big data couldn’t be wrong. But the bigness of the data can imbue the results with a false sense of certainty. Many of them are probably bogus—and the reasons why should give us pause about any research that blindly trusts big data.

In the case of language and culture, big data showed up in a big way in 2011, when Google released its Ngrams tool. Announced with fanfare in the journal Science, Google Ngrams allowed users to search for short phrases in Google’s database of scanned books—about 4 percent of all books ever published!—and see how the frequency of those phrases has shifted over time. The paper’s authors heralded the advent of “culturomics,” the study of culture based on reams of data and, since then, Google Ngrams has been, well, largely an endless source of entertainment—but also a goldmine for linguists, psychologists, and sociologists. They’ve scoured its millions of books to show that, for instance, yes, Americans are becoming more individualistic; that we’re “forgetting our past faster with each passing year”; and that moral ideals are disappearing from our cultural consciousness.

We’re Losing Hope: An Ngrams chart for the word “hope,” one of many intriguing plots found by xkcd author Randall Munroe. If Ngrams really does reflect our culture, we may be headed for a dark place.

The problems start with the way the Ngrams corpus was constructed. In a study published last October, three University of Vermont researchers pointed out that, in general, Google Books includes one copy of every book. This makes perfect sense for its original purpose: to expose the contents of those books to Google’s powerful search technology. From the angle of sociological research, though, it makes the corpus dangerously skewed.

Some books, for example, end up punching below their true cultural weight: The Lord of the Rings gets no more influence than, say, Witchcraft Persecutions in Bavaria. Conversely, some authors become larger than life. From the data on English fiction, for example, you might conclude that for 20 years in the 1900s, every character and his brother was named Lanny. In fact, the data reflect how immensely prolific (but not necessarily popular) the author Upton Sinclair was: He churned out 11 novels about one Lanny Budd.

Who’s named Lanny?: A Google Ngrams plot of “Lanny” vs. more common names in English fiction.

Still more damning is the fact that Ngrams isn’t a consistent, well-balanced slice of what was being published. The same UVM study demonstrated that, among other changes in composition, there’s a marked increase in scientific articles starting in the 1960s. All this makes it hard to trust that Google Ngrams accurately reflects the shifts over time in words’ cultural popularity. 

Go Figure: “Figure” with a capital F, used mainly in captions, rose sharply in frequency through the 20th Century, suggesting that the corpus includes more technical literature over time. That may say something about society, but not much about how most of society uses words.

Even once you get past the data sources, there’s still the thorny issue of interpretation. Sure, words like “character” and “dignity” might decline over the decades. But does that mean that people care about morality less? Not so fast, cautions Ted Underwood, an English professor at the University of Illinois, Urbana-Champaign. Conceptions of morality at the turn of the last century likely differed sharply from ours, he argues, and “dignity” might have been popular for non-moral reasons. So any conclusions we draw by projecting current associations backward are suspect.

Of course, none of this is news to statisticians and linguists. Data and interpretation are their bread and butter. What’s different about Google Ngrams, though, is the temptation to let the sheer volume of data blind us to the ways we can be misled.

This temptation isn’t unique to Ngrams studies; similar errors undermine all sorts of big data projects. Consider, for instance, the case of Google Flu Trends (GFT). Released in 2008, GFT would count words like “fever” and “cough” in millions of Google search queries, using them to “nowcast” how many people had the flu. With those estimates, public health officials could act two weeks before the Centers for Disease Control could calculate the true numbers from doctors’ reports.

When big data isn’t seen as a panacea, it can be transformative.

Initially, GFT was claimed to be 97 percent accurate. But as a study out of Northeastern University documents, that accuracy was a fluke. First, GFT completely missed the “swine flu” pandemic in the spring and summer of 2009. (It turned out that GFT was largely predicting winter.) Then, the system began to overestimate flu cases. In fact, it overshot the peak 2013 numbers by a whopping 140 percent. Eventually, Google just retired the program altogether.

So what went wrong? As with Ngrams, people didn’t carefully consider the sources and interpretation of their data. The data source, Google searches, was not a static beast. When Google started auto-completing queries, users started just accepting the suggested keywords, distorting the searches GFT saw. On the interpretation side, GFT’s engineers initially let GFT take the data at face value; almost any search term was treated as a potential flu indicator. With millions of search terms, GFT was practically guaranteed to over-interpret seasonal words like “snow” as evidence of flu.

But when big data isn’t seen as a panacea, it can be transformative. Several groups, like Columbia University researcher Jeffrey Shaman’s, for example, have outperformed the flu predictions of both the CDC and GFT by using the former to compensate for the skew of the latter. “Shaman’s team tested their model against actual flu activity that had already occurred during the season,” according to the CDC. By taking the immediate past into consideration, Shaman and his team fine-tuned their mathematical model to better predict the future. All it takes is for teams to critically assess their assumptions about their data.

Lest I sound like a Google-hater, I hasten to add that the company is far from the only culprit. My wife, an economist, used to work for a company that scraped the entire Internet for job postings and aggregate them into statistics for state labor agencies. The company’s managers boasted that they analyzed 80 percent of the jobs in the country, but once again, the quantity of data blinded them to the ways it could be misread. A local Walmart, for example, might post one sales associate job when it actually wants to fill ten, or it might leave a posting up for weeks after it was filled.

So rather than succumb to “big data hubris,” the rest of us would do well to keep our skeptic hats on—even when someone points to billions of words.

Jesse Dunietz, a Ph.D. student in computer science at Carnegie Mellon University, has written for Motherboard and Scientific American Guest Blogs, among others. Follow him on Twitter @jdunietz.


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/nOx-ucmWtlQ/how-big-data-creates-false-confidence

Original article

Capture, edit and share full web pages with Open Screenshot

Desktop screenshot tools support many capture types, and sometimes this includes taking an image of a full web page, even when it doesn’t fit on the screen. Sounds great, but it’s extremely difficult for a third-party tool to make this happen consistently with all browsers and web pages, and often it just won’t work. Switching to a browser extension like Chrome’s Open Screenshot can be a smarter solution, because it has more access to its host’s web content, and doesn’t need to try and support everything else. Trying it out is as easy as clicking Open Screenshot’s address bar button, selecting… [Continue Reading]


Original URL: http://feeds.betanews.com/~r/bn/~3/32xlqD9GbIw/

Original article

Simple Alerting for the ELK Stack

README.md

Stories in Ready
Stories in In Progress
Build Status
Join the chat at https://gitter.im/Yelp/elastalert

Easy & Flexible Alerting With ElasticSearch

ElastAlert is a simple framework for alerting on anomalies, spikes, or other patterns of interest from data in Elasticsearch.

At Yelp, we use Elasticsearch, Logstash and Kibana for managing our ever increasing amount of data and logs.
Kibana is great for visualizing and querying data, but we quickly realized that it needed a companion tool for alerting
on inconsistencies in our data. Out of this need, ElastAlert was created.

If you have data being written into Elasticsearch in near real time and want to be alerted when that data matches certain patterns, ElastAlert is the tool for you. If you can see it in Kibana, ElastAlert can alert on it.

Overview

We designed ElastAlert to be reliable, highly modular, and easy to set up and configure.

It works by combining Elasticsearch with two types of components, rule types and alerts.
Elasticsearch is periodically queried and the data is passed to the rule type, which determines when
a match is found. When a match occurs, it is given to one or more alerts, which take action based on the match.

This is configured by a set of rules, each of which defines a query, a rule type, and a set of alerts.

Several rule types with common monitoring paradigms are included with ElastAlert:

  • “Match where there are X events in Y time” (frequency type)
  • “Match when the rate of events increases or decreases” (spike type)
  • “Match when there are less than X events in Y time” (flatline type)
  • “Match when a certain field matches a blacklist/whitelist” (blacklist and whitelist type)
  • “Match on any event matching a given filter” (any type)
  • “Match when a field has two different values within some time” (change type)
  • “Match when a never before seen term appears in a field” (new_term type)
  • “Match when the number of unique values for a field is above or below a threshold (cardinality type)

Currently, we have support built in for the following alert types:

  • Email
  • JIRA
  • OpsGenie
  • Commands
  • HipChat
  • Slack
  • Telegram
  • AWS SNS
  • VictorOps
  • PagerDuty
  • Gitter

Additional rule types and alerts can be easily imported or written.

In addition to this basic usage, there are many other features that make alerts more useful:

  • Alerts link to Kibana dashboards
  • Aggregate counts for arbitrary fields
  • Combine alerts into periodic reports
  • Separate alerts by using a unique key field
  • Intercept and enhance match data

To get started, check out Running ElastAlert For The First Time in the documentation.

Running ElastAlert

$ python elastalert/elastalert.py [--debug] [--verbose] [--start ] [--end ] [--rule ] [--config ]

--debug will print additional information to the screen as well as suppresses alerts and instead prints the alert body.

--verbose will print additional information without supressing alerts.

--start will begin querying at the given timestamp. By default, ElastAlert will begin querying from the present.
Timestamp format is YYYY-MM-DDTHH-MM-SS[-/+HH:MM] (Note the T between date and hour).
Eg: --start 2014-09-26T12:00:00 (UTC) or --start 2014-10-01T07:30:00-05:00

--end will cause ElastAlert to stop querying at the given timestamp. By default, ElastAlert will continue
to query indefinitely.

--rule will allow you to run only one rule. It must still be in the rules folder.
Eg: --rule this_rule.yaml

--config allows you to specify the location of the configuration. By default, it is will look for config.yaml in the current directory.

Documentation

Read the documentation at Read the Docs.

Configuration

See config.yaml.example for details on configuration.

Example rules

Examples of different types of rules can be found in example_rules/.

  • example_spike.yaml is an example of the “spike” rule type, which allows you to alert when the rate of events, averaged over a time period,
    increases by a given factor. This example will send an email alert when there are 3 times more events matching a filter occurring within the
    last 2 hours than the number of events in the previous 2 hours.

  • example_frequency.yaml is an example of the “frequency” rule type, which will alert when there are a given number of events occuring
    within a time period. This example will send an email when 50 documents matching a given filter occur within a 4 hour timeframe.

  • example_change.yaml is an example of the “change” rule type, which will alert when a certain field in two documents changes. In this example,
    the alert email is sent when two documents with the same ‘username’ field but a different value of the ‘country_name’ field occur within 24 hours
    of each other.

  • example_new_term.yaml is an example of the “new term” rule type, which alerts when a new value appears in a field or fields. In this example,
    an email is sent when a new value of (“username”, “computer”) is encountered in example login logs.

Frequently Asked Questions

My rule is not getting any hits?

So you’ve managed to set up ElastAlert, write a rule, and run it, but nothing happens, or it says 0 query hits. First of all, we recommend using the command elastalert-test-rule rule.yaml to debug. It will show you how many documents match your filters for the last 24 hours (or more, see --help), and then shows you if any alerts would have fired. If you have a filter in your rule, remove it and try again. This will show you if the index is correct and that you have at least some documents. If you have a filter in Kibana and want to recreate it in ElastAlert, you probably want to use a query string. Your filter will look like

filter:
- query:
    query_string:
      query: "foo: bar AND baz: abc*"

If you receive an error that Elasticsearch is unable to parse it, it’s likely the YAML is not spaced correctly, and the filter is not in the right format. If you are using other types of filters, like term, a common pitfall is not realizing that you may need to use the analyzed token. This is the default if you are using Logstash. For example,

filter:
- term:
    foo: "Test Document"

will not match even if the original value for foo was exactly “Test Document”. Instead, you want to use foo.raw. If you are still having trouble troubleshooting why your documents do not match, try running ElastAlert with --es_debug_trace /path/to/file.log. This will log the queries made to Elasticsearch in full so that you can see exactly what is happening.

I got hits, why didn’t I get an alert?

If you got logs that had X query hits, 0 matches, 0 alerts sent, it depends on the type why you didn’t get any alerts. If type: any, a match will occur for every hit. If you are using type: frequency, num_events must occur within timeframe of each other for a match to occur. Different rules apply for different rule types.

If you see X matches, 0 alerts sent, this may occur for several reasons. If you set aggregation, the alert will not be sent until after that time has elapsed. If you have gotten an alert for this same rule before, that rule may be silenced for a period of time. The default is one minute between alerts. If a rule is silenced, you will see Ignoring match for silenced rule in the logs.

If you see X alerts sent but didn’t get any alert, it’s probably related to the alert configuration. If you are using the --debug flag, you will not receive any alerts. Instead, the alert text will be written to the console. Use --verbose to achieve the same affects without preventing alerts. If you are using email alert, make sure you have it configured for an SMTP server. By default, it will connect to localhost on port 25. It will also use the word “elastalert” as the “From:” address. Some SMTP servers will reject this because it does not have a domain while others will add their own domain automatically. See the email section in the documentation for how to configure this.

Why did I only get one alert when I expected to get several?

There is a setting called realert which is the minimum time between two alerts for the same rule. Any alert that occurs within this time will simply be dropped. The default value for this is one minute. If you want to receive an alert for every single match, even if they occur right after each other, use

realert:
  minutes: 0

You can of course set it higher as well.

How can I prevent duplicate alerts?

By setting realert, you will prevent the same rule from alerting twice in an amount of time.

realert:
  days: 1

You can also prevent duplicates based on a certain field by using query_key. For example, to prevent multiple alerts for the same user, you might use

realert:
  hours: 8
query_key: user

Note that this will also affect the way many rule types work. If you are using type: frequency for example, num_events for a single value of query_key must occur before an alert will be sent. You can also use a compound of multiple fields for this key. For example, if you only wanted to receieve an alert once for a specific error and hostname, you could use

query_key: [error, hostname]

Internally, this works by creating a new field for each document called field1,field2 with a value of value1,value2 and using that as the query_key.

The data for when an alert will fire again is stored in Elasticsearch in the elastalert_status index, with a _type of silence and also cached in memory.

How can I change what’s in the alert?

You can use the field alert_text to add custom text to an alert. By setting alert_text_type: alert_text_only, it will be the entirety of the alert. You can also add different fields from the alert by using Python style string formatting and alert_text_args. For example

alert_text: "Something happened with {0} at {1}"
alert_text_type: alert_text_only
alert_text_args: ["username", "@timestamp"]

You can also limit the alert to only containing certain fields from the document by using include.

include: ["ip_address", "hostname", "status"]

My alert only contains data for one event, how can I see more?

If you are using type: frequency, you can set the option attach_related: true and every document will be included in the alert. An alternative, which works for every type, is top_count_keys. This will show the top counts for each value for certain fields. For example, if you have

top_count_keys: ["ip_address", "status"]

and 10 documents matched your alert, it may contain something like

ip_address:
127.0.0.1: 7
10.0.0.1: 2
192.168.0.1: 1

status:
200: 9
500: 1

How can I make the alert come at a certain time?

The aggregation feature will take every alert that has occured over a period of time and send them together in one alert. You can use cron style syntax to send all alerts that have occured since the last once by using

aggregation:
  schedule: '2 4 * * mon,fri'

I have lots of documents and it’s really slow, how can I speed it up?

There are several ways to potentially speed up queries. If you are using index: logstash-*, Elasticsearch will query all shards, even if they do not possibly contain data with the correct timestamp. Instead, you can use Python time format strings and set use_strftime_index

index: logstash-%Y.%m
use_strftime_index: true

Another thing you could change is buffer_time. By default, ElastAlert will query large overlapping windows in order to ensure that it does not miss any events, even if they are indexed in real time. In config.yaml, you can adjust buffer_time to a smaller number to only query the most recent few minutes.

buffer_time:
  minutes: 5

By default, ElastAlert will download every document in full before processing them. Instead, you can have ElastAlert simply get a count of the number of documents that have occured in between each query. To do this, set use_count_query: true. This cannot be used if you use query_key, because ElastAlert will not know the contents of each documents, just the total number of them. This also reduces the precision of alerts, because all events that occur between each query will be rounded to a single timestamp.

If you are using query_key (a single key, not multiple keys) you can use use_terms_query. This will make ElastAlert perform a terms aggregation to get the counts for each value of a certain field. Both use_terms_query and use_count_query also require doc_type to be set to the _type of the documents. They may not be compatible with all rule types.

Can I perform aggregations?

The only aggregation supported currently is a terms aggregation, by setting use_terms_query.

I’m not using @timestamp, what do I do?

You can use timestamp_field to change which field ElastAlert will use as the timestamp. You can use timestamp_type to change it between ISO 8601 and unix timestamps. You must have some kind of timestamp for ElastAlert to work. If your events are not in real time, you can use query_delay and buffer_time to adjust when ElastAlert will look for documents.

I’m using flatline but I don’t see any alerts

When using type: flatline, ElastAlert must see at least one document before it will alert you that it has stopped seeing them.

How can I get a “resolve” event?

ElastAlert does not currently support stateful alerts or resolve events.

Can I set a warning threshold?

Currently, the only way to set a warning threshold is by creating a second rule with a lower threshold.

License

ElastAlert is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Read the documentation at Read the Docs.

Questions? Drop by #elastalert on Freenode IRC.


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/mfnTw0cSBig/elastalert

Original article

Proudly powered by WordPress | Theme: Baskerville 2 by Anders Noren.

Up ↑

%d bloggers like this: