Docker’s no longer all about test-and-dev, says Docker CEO

Improvements to Docker over the last year have made it enterprise-ready for an ever-growing number of enterprises. We get specifics from Docker CEO Ben Golub.


In 1999, my company, UserLand Software, released a product called Manila.

There was a lot to the product. It was a content management system, or CMS, built around the idea of editorial roles and a discussion group. And templates: full control over appearance, with templates for everything.

The idea was that a publication would consist of a group of editorial people collaborating on a flow of news stories.

But it was also for blogging. We ran with Manila. A lot of people used it.

One of the big innovations of Manila was that every bit of content you could edit had a big Edit This Page button on it. Click the button, make a change, click Submit. This was a huge innovation. Made it a lot easier.

Anyway, fast forward to 2016, and I’m doing this year’s version of Manila. It’s called 1999. Because when you’re using it, you’re blogging like it’s 1999. With an H/T to Prince. 

The software is getting very close to being finished now.

And I wanted to sneak out a preview of what editing is like in 1999.

Believe it or not, it’s even easier than Edit This Page. 

Here’s the demo


Show HN: Micro – a microservice toolkit

By now you may have heard of this new phenomenon, microservices.
If you’re not yet familiar and interested in learning more, check out our introductory post.

In this blog post we’re going to discuss Micro, an open source microservices toolkit.
Micro provides the core requirements for building and managing microservices. It consists of a set of libraries and tools
which are primarily geared towards development in the Go programming language, but it also looks to
solve for other languages via HTTP using a Sidecar.

Before we get to the details of Micro, let’s talk about why we decided to dedicate our time to it.

Development or Deployment

It’s clear from our past experiences and what we’re seeing in the industry that there’s a need for a focus on development
rather than deployment. PaaS solutions are readily available. Companies like AWS, Google and Microsoft
are providing feature rich platforms while also rapidly moving towards supporting container orchestration if not already
doing so. All of this gives us access to large scale compute with the click of a few buttons.

This new world sounds great. You might say this solves all your problems, right? Well, while we now have access to
massive scale computing power, we’re still lacking the tools that enable us to write software that’s
capable of leveraging it. Not only that, in this new world, containers are likely more ephemeral, coming and going
as the runtime reschedules them or machines they’re running on fail.

Scaling challenges

The other issue we’ve seen time and time again is the way in which organisations fall victim to their monolithic
architectures. With a need to grow at a blistering pace, there’s the tendency to cram features into the
existing system and incur technical debt which snowballs into an unmanageable situation. Aside from this,
as the organisation attempts to grow the engineering team, it becomes infinitely more difficult for developers
to collaborate on a single code base or do feature development in parallel without being blocked at time
of release.

There’s an inevitable need to rearchitect and an eventual path to SOA or microservice based
architectures. Companies end up taking on an R&D effort in house, learning by trial and error. If only
there were tools to help create scalable systems, reduce the likelihood of R&D and provide domain expertise
from those with past experience.

Enter Micro

At Micro we’re building a microservice ecosystem that includes tools, services and solutions for microservice
development. We’re building the foundation of that ecosystem with a tool by the same name.
Micro is a microservice toolkit, which will enable the creation of scalable architectures and increase speed of execution.

Let’s dig into the features of Micro.

Go Micro

Go Micro is a pluggable RPC framework used to build microservices in Go.
It delivers the essential features required to create, discover and communicate with services. The core of any good
microservice architecture begins by addressing service discovery, synchronous and asynchronous communication.

Included packages and features:

  • Registry – Client side service discovery
  • Transport – Synchronous communication
  • Broker – Asynchronous communication
  • Selector – Node filtering and load balancing
  • Codec – Message encoding/decoding
  • Server – RPC server building on the above
  • Client – RPC client building on the above
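As a language-neutral sketch of what the Selector package is responsible for, here is a hypothetical round-robin node selector in Python. The names and structure are illustrative only, not Go Micro's actual API:

```python
import itertools

class RoundRobinSelector:
    """Pick nodes for a service in round-robin order.

    `registry` is a plain dict of service name -> list of node addresses,
    standing in for a real service discovery backend such as Consul.
    """
    def __init__(self, registry):
        self.registry = registry
        self._cursors = {}

    def select(self, service):
        nodes = self.registry.get(service, [])
        if not nodes:
            raise LookupError("no nodes found for " + service)
        # Note: itertools.cycle snapshots the node list on first use; a real
        # selector would re-read the registry to pick up node changes.
        cursor = self._cursors.setdefault(service, itertools.cycle(nodes))
        return next(cursor)
```

A client would call select() before each request, spreading load across the nodes the registry returned for that service.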

Where Go Micro differs from most libraries is its pluggable architecture. This allows the implementation and backend
system for every package to be swapped out. For example; the default service discovery mechanism for the registry is
Consul but this can easily be swapped with a plugin for etcd, zookeeper or anything else
you choose to implement. Plugins we’re implementing can be found at

The value in a pluggable system is the ability to choose the platform used to support your microservices without
having to rewrite any code. Go Micro requires zero code changes, just a mere import of a plugin and you’re done.

Go Micro is the starting point for writing microservices. The readme provides
an overview of how to write, run and query a service. There’s a greeter example here micro/examples/greeter
and more example services throughout the repo

[image: go micro]


So Go Micro provides a way to write services in Go, but what about other languages? How do we create a polyglot ecosystem
where anyone can leverage the advantages of Micro? While Micro is written in Go, we wanted to allow a quick and easy
way to integrate applications written in any language.

Sidecar

Enter the Sidecar, a lightweight companion service which is
conceptually “attached” to the main (aka parent) application and complements it by providing the features of the
Micro system that are otherwise available using the Go Micro library. The sidecar is a process that runs alongside
your application, delivering the features of Go Micro via an HTTP interface.

Features of the sidecar:

  • Registration with discovery system
  • Host discovery of other services
  • Healthchecking of the main application
  • A proxy to make RPC requests
  • PubSub via WebSockets

Examples of using the Sidecar with Ruby or Python can be found here: micro/examples/greeter.
We’ll look to add more sample code in the near future to help with understanding how to integrate the sidecar.



API

Making RPC requests from one service to another is pretty straightforward with Go Micro but not ideal for external
access. Instances of a service can fail, be rescheduled elsewhere or end up binding to a random port.
The API provides a single entry point to query microservices
and should be used as the gateway for external access.

The API provides a few different types of request handlers.

1. /rpc

Individual services can be queried via RPC using the /rpc endpoint. Example:

	curl http://localhost:8080/rpc \
	-d "service=go.micro.srv.greeter" \
	-d "method=Say.Hello" \
	-d 'request={"name": "John"}'

{"msg":"Hello John"}
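The same query can be made from any language that speaks HTTP. Here is a hedged Python sketch; the localhost:8080 address is an assumption about where the micro API is listening:

```python
import json
from urllib import parse, request

def rpc_payload(service, method, req):
    """Build the form-encoded body the /rpc endpoint expects."""
    return parse.urlencode({
        "service": service,
        "method": method,
        "request": json.dumps(req),
    }).encode()

def rpc(service, method, req, api="http://localhost:8080/rpc"):
    """POST an RPC query to the micro API and decode the JSON response."""
    with request.urlopen(api, data=rpc_payload(service, method, req)) as resp:
        return json.loads(resp.read().decode())
```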
2. api.Request

The API can be used to break down URLs to be served by individual microservices. This is a powerful method of API composition.
Here the API uses the first part of the request path along with a namespace component to determine
the service to route requests to. HTTP requests are then converted to an api.Request
and forwarded appropriately.

At Micro we use a pattern of creating API microservices to serve requests at the edge, separating
the responsibility of backend versus frontend services.

An example of API request handling:


GET /greeter/say/hello?name=John


service: go.micro.api.greeter (default namespace go.micro.api is applied)
method: Say.Hello
request {
	"method": "GET",
	"path": "/greeter/say/hello",
	"get": {
		"name": "John"
	}
}
The structure of an api.Request and api.Response:

syntax = "proto3";

message Pair {
	optional string key = 1;
	repeated string values = 2;
}

message Request {
	optional string method = 1;   // GET, POST, etc
	optional string path = 2;     // e.g /greeter/say/hello
	map<string, Pair> header = 3;
	map<string, Pair> get = 4;    // The URI query params
	map<string, Pair> post = 5;   // The post body params
	optional string body = 6;     // raw request body; if not application/x-www-form-urlencoded
}

message Response {
	optional int32 statusCode = 1;
	map<string, Pair> header = 2;
	optional string body = 3;
}

An example of how to create an API service can be found here: Greeter API.

3. proxy

The final method of request handling for the API is a reverse proxy. Just as above, the API uses the request
path and a namespace component to determine the service to route requests to. By providing reverse proxying
and microservice request routing we’re able to support REST, a widely sought after requirement.

The proxy can be enabled by passing the --api_handler=proxy flag.

An example of how to build a RESTful API can be found here micro/examples/greeter/api.


Web UI

The web UI provides a simple dashboard for observing and interacting with a running system. Not only that but it also
provides a reverse proxy much like the API. Our goal with a “web proxy” is to enable the development of
web apps as microservices. Again, just like the API, the request path is used along with a namespace to determine
the service to route requests to. The web proxy also supports web sockets as we see realtime being a core part
of delivering web apps.



CLI

The CLI is a command line tool which provides a way to observe, interact with and manage services in a running environment.
The current feature set allows you to inspect the registry, check basic health of services and execute queries against
services themselves.

The other nifty feature is the ability to use the Sidecar as a proxy for the CLI. It’s as simple as specifying
the address of the sidecar as a flag when executing the CLI.


Putting it all together

We’ve written an example of full end-to-end flow through the system using a simple greeter service.

The flow is as follows:

  1. HTTP GET request is made to the micro API at path /greeter/say/hello with the query name=John.
  2. The API translates this using the default namespace to the api service go.micro.api.greeter and method Say.Hello.
    The request is structured as an api.Request.
  3. The API, using Go Micro, queries the registry to find all the nodes for the service go.micro.api.greeter and
    forwards the request to one of the nodes.
  4. The greeter api parses the request, generates a hello.Request and makes a request to the rpc service go.micro.srv.greeter.
    Again the same registry/discovery mechanism is used to find the nodes for the service.
  5. The greeter rpc service responds with a hello.Response.
  6. The greeter api translates the response into an api.Response and passes it back to the API.
  7. The micro API parses the response and responds to the client’s HTTP request.

In a more complex example, an API service may call out to many other RPC services, aggregate and transform
the data and then pass back a final summarised result to the client. This allows you to maintain a
consistent external entry point and change services in the background without the knowledge of the client.

[image: greeter service]


If you want to kick the tyres on a running system, check out our demo at

We’re running Micro on Kubernetes using Google Container Engine. The demo is open source if you want to run it yourself.
You can find the k8s config here


Summary

Micro provides the fundamental building blocks for writing and managing microservices. Go Micro includes
the core requirements; discovery, client/server and pub/sub. The CLI lets you interact with your
environment and services. The sidecar enables integration of any non-Micro application. The API is a
single entry point for RPC requests and enables the creation of REST endpoints. With pluggable interfaces
you can pick and choose the systems you want to leverage in building your architecture.

Our goal at Micro is to enable development at scale, increase speed of execution and provide value starting from
the very beginning of the developer lifecycle. We feel Micro is the best way to do all those things.
Over time the ecosystem of tools will grow to include more feature rich services for discovery, routing
and observability.

If you want to learn more about the services we offer or microservices, check out the website or
the github repo.

Follow us on Twitter at @MicroHQ or join the Slack
community here.



Show HN: Python 3.5 Async Web Crawler Example

Python3.5 Async Crawler Example with aiohttp and asyncio

Installation Python 3.5

sudo add-apt-repository ppa:fkrull/deadsnakes

sudo apt-get update

sudo apt-get install python3.5

Make python3 point to python3.5

sudo cp /usr/bin/python3 /usr/bin/python3-backup

sudo rm /usr/bin/python3

sudo ln -s /usr/bin/python3.5 /usr/bin/python3


sudo apt-get install python3-pip

sudo pip3 install aiohttp
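With Python 3.5 and aiohttp installed, a minimal crawler sketch using asyncio might look like the following. The start URL and page limit are placeholders, and the link extraction uses only the standard library:

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect absolute link targets from anchor tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def extract_links(base_url, html):
    """Return all link targets in html, resolved against base_url."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links

async def crawl(start_url, max_pages=10):
    import aiohttp  # third-party: sudo pip3 install aiohttp
    seen, queue = set(), [start_url]
    async with aiohttp.ClientSession() as session:
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            async with session.get(url) as resp:
                page = await resp.text()
            queue.extend(extract_links(url, page))
    return seen
```

To run it: asyncio.get_event_loop().run_until_complete(crawl("http://example.com/")).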


Rspamd 1.2: new major release of the fast spam filtering engine

The next major release of rspamd, 1.2.0, is now available.

Key features:

  • Dynamic rules updates
  • Regular expressions maps support
  • Better performance: pcre2 support, faster fuzzy hashes, faster IP lookups
  • Improved stability: fixed many important bugs and memory leaks

This version is a gradual improvement over the previous 1.1 branch. It is the first release with rules updates support. I believe that it will be easier to backport new rules or critical score changes from the experimental line to the stable one. Rules updates are signed to protect integrity and authenticate the updates source.

Among other features introduced by this version is support for regular expression maps (with hyperscan acceleration if available). This sort of map can be used to match many regular expressions at the same time to detect certain patterns in the messages being scanned.

Rspamd 1.2 has a couple of performance improvements: it now supports the PCRE 2 regular expressions library, which is usually faster than PCRE 1. Fuzzy hashing gets further improvements by utilizing AVX2 instructions available from the Intel Haswell CPU family onwards. From version 1.2, rspamd uses a better algorithm to store IP addresses, allowing lookups among millions of IPv4 and IPv6 records in almost zero time.

The new release is scanned with Coverity Scan and other static analysis tools, which helped to fix many potential bugs and leaks. I believe that rspamd 1.2 is stable, solid and completely production ready.

The complete log of changes can be found here:

There are many important additions to the documentation shipped with rspamd. There is now a frequently asked questions article that describes many aspects of using rspamd in practice. The quick start guide has also been updated to improve new users’ experience when installing and running rspamd.

Rmilter has also been upgraded to version 1.7.5, which fixes important greylisting and clamav issues. The Rmilter changelog is available here:


Machine Learning: An In-Depth, Non-Technical Guide – Part 5


Originally published on March 18, 2016.


  1. Overview, goals, learning types, and algorithms
  2. Data selection, preparation, and modeling
  3. Model evaluation, validation, complexity, and improvement
  4. Model performance and error analysis
  5. Unsupervised learning, related fields, and machine learning in practice


Welcome to the fifth and final chapter in a five-part series about machine learning.

In this final chapter, we will revisit unsupervised learning in greater depth, briefly discuss other fields related to machine learning, and finish the series with some examples of real-world machine learning applications.

Unsupervised Learning

Recall that unsupervised learning involves learning from data, but without the goal of prediction. This is because the data is either not given with a target response variable (label), or one chooses not to designate a response. It can also be used as a pre-processing step for supervised learning.

In the unsupervised case, the goal is to discover patterns, gain deep insights, understand variation, find unknown subgroups (amongst the variables or observations), and so on in the data. Unsupervised learning can be quite subjective compared to supervised learning.

The two most commonly used techniques in unsupervised learning are principal component analysis (PCA) and clustering. PCA is one approach to learning what is called a latent variable model, and is a particular version of a blind signal separation technique. Other notable latent variable modeling approaches include the expectation-maximization (EM) algorithm and the method of moments.


PCA produces a low-dimensional representation of a dataset by finding a sequence of linear combinations of the variables that have maximal variance, and are mutually uncorrelated. Another way to describe PCA is that it is a transformation of possibly correlated variables into a set of linearly uncorrelated variables known as principal components.

Each of the components are mathematically determined and ordered by the amount of variability or variance that each is able to explain from the data. Given that, the first principal component accounts for the largest amount of variance, the second principal component the next largest, and so on.

Each component is also orthogonal to all others, which is just a fancy way of saying that they’re perpendicular to each other. Think of the X and Y axes in a two dimensional plot. Both axes are perpendicular to each other, and are therefore orthogonal. While not easy to visualize, think of having many principal components as having many axes that are all perpendicular to each other.

While much of the above description of principal component analysis may be a bit technical sounding, it is actually a relatively simple concept from a high level. Think of having a bunch of data in any amount of dimensions, although you may want to picture two or three dimensions for ease of understanding.

Each principal component can be thought of as an axis of an ellipse that is being built (think cloud) to contain the data (aka fit to the data), like a net catching butterflies. The first few principal components should be able to explain (capture) most of the data, with the addition of more principal components eventually leading to diminishing returns.

One of the tricks of PCA is knowing how many components are needed to summarize the data, which involves estimating when most of the variance is explained by a given number of components. Another consideration is that PCA is sensitive to feature scaling, which was discussed earlier in this series.

PCA is also used for exploratory data analysis and data visualization. Exploratory data analysis involves summarizing a dataset through specific types of analysis, including data visualization, and is often an initial step in analytics that leads to predictive modeling, data mining, and so on.

Further discussion of PCA and similar techniques is out of scope of this series, but the reader is encouraged to refer to external sources for more information.
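To make the idea concrete, here is a small standard-library Python sketch of PCA in two dimensions: it builds the covariance matrix of the data and extracts the first principal component (the axis of maximal variance) analytically. This is an illustration of the concept, not a production implementation:

```python
import math

def first_principal_component(points):
    """Return (variance_explained, unit_vector) for a list of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of the covariance matrix = variance explained
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))
    # Corresponding eigenvector = direction of maximal variance
    if sxy:
        v = (lam - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)
```

For points lying along a line, the returned vector points along that line, and the eigenvalue is the variance of the data along it; the second component would be the perpendicular axis.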


Clustering refers to a set of techniques and algorithms used to find clusters (subgroups) in a dataset, and involves partitioning the data into groups of similar observations. The concept of ‘similar observations’ is a bit relative and subjective, but it essentially means that the data points in a given group are more similar to each other than they are to data points in a different group.

Similarity between observations is a domain specific problem and must be addressed accordingly. A clustering example involving the NFL’s Chicago Bears (go Bears!) was given in chapter 1 of this series.

Clustering is not a technique limited only to machine learning. It is a widely used technique in data mining, statistical analysis, pattern recognition, image analysis, and so on. Given the subjective and unsupervised nature of clustering, often data preprocessing, model/algorithm selection, and model tuning are the best tools to use to achieve the desired results and/or solution to a problem.

There are many types of clustering algorithms and models, which all use their own technique of dividing the data into a certain number of groups of similar data. Due to the significant difference in these approaches, the results can be largely affected, and therefore one must understand these different algorithms to some extent to choose the most applicable approach to use.

K-means and hierarchical clustering are two widely used unsupervised clustering techniques. The difference is that for k-means, a predetermined number of clusters (k) is used to partition the observations, whereas the number of clusters in hierarchical clustering is not known in advance.
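As an illustration of the k-means idea — alternate an assignment step and a re-centering step until the groups settle — here is a short standard-library Python sketch for 2-D points; real work would use a library such as scikit-learn:

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Partition 2-D points into k clusters via Lloyd's algorithm."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)  # initialize centers from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: math.hypot(p[0] - centers[i][0],
                                                   p[1] - centers[i][1]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        for i, members in enumerate(clusters):
            if members:
                centers[i] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, clusters
```

With k predetermined, two well-separated blobs of points end up with one center per blob after a handful of iterations.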

Hierarchical clustering helps address the potential disadvantage of having to know or pre-determine k in the case of k-means. There are two primary types of hierarchical clustering: agglomerative (bottom-up) and divisive (top-down).

Here is a visualization, courtesy of Wikipedia, of the results of running the k-means clustering algorithm on a set of data with k equal to three. Note the lines, which represent the boundaries between the groups of data.

There are two types of clustering, which define the degree of grouping or containment of data. The first is called hard clustering, where every data point belongs to only one cluster and not the others. Soft clustering, or fuzzy clustering on the other hand refers to the case where a data point belongs to a cluster to a certain degree, or is assigned a likelihood (probability) of belonging to a certain cluster.

Method comparison and general considerations

What is the difference then between PCA and clustering? As mentioned, PCA looks for a low-dimensional representation of the observations that explains a good fraction of the variance, while clustering looks for homogeneous subgroups among the observations.

An interesting point to note is that in the absence of a target response, there is no way to evaluate solution performance or errors as one does in the supervised case. In other words, there is no objective way to determine if you’ve found a solution. This is a significant differentiator between supervised and unsupervised learning methods.

Predictive Analytics, Artificial Intelligence, and Data Mining, Oh My!

Machine learning is often interchanged with terms like predictive analytics, artificial intelligence, data mining, and so on. While machine learning is certainly related to these fields, there are some notable differences.

Predictive analytics is a subcategory of a broader field known as analytics in general. Analytics is usually broken into three sub-categories: descriptive, predictive, and prescriptive.

Descriptive analytics involves analytics applied to understanding and describing data. Predictive analytics deals with modeling, and making predictions or assigning classifications from data observations. Prescriptive analytics deals with making data-driven, actionable recommendations or decisions.

Artificial intelligence (AI) is a super exciting field, and machine learning is essentially a sub-field of AI due to the automated nature of the learning algorithms involved. According to Wikipedia, AI has been defined as the science and engineering of making intelligent machines, but also as the study and design of intelligent agents, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.

Statistical learning is becoming popularized due to Stanford’s related online course and its associated books: An Introduction to Statistical Learning, and The Elements of Statistical Learning.

Machine learning arose as a subfield of artificial intelligence, while statistical learning arose as a subfield of statistics. Both fields are very similar, overlap in many ways, and the distinction is becoming less clear over time. They differ in that machine learning has a greater emphasis on prediction accuracy and large scale applications, whereas statistical learning emphasizes models and their related interpretability, precision, and uncertainty.

Lastly, data mining is a field that’s also often confused with machine learning. Data mining leverages machine learning algorithms and techniques, but also spans many other fields such as data science, AI, statistics, and so on.

The overall goal of the data mining process is to extract patterns and knowledge from a data set, and transform it into an understandable structure for further use. Data mining often deals with large amounts of data, or big data.

Machine Learning in Practice

As discussed throughout this series, machine learning can be used to create predictive models, assign classifications, make recommendations, and find patterns and insights in an unlabeled dataset. All of these tasks can be done without requiring explicit programming.

Machine learning has been successfully used in the following non-exhaustive example applications:

  • Spam filtering
  • Optical character recognition (OCR)
  • Search engines
  • Computer vision
  • Recommendation engines, such as those used by Netflix and Amazon
  • Classifying DNA sequences
  • Detecting fraud, e.g., credit card and internet
  • Medical diagnosis
  • Natural language processing
  • Speech and handwriting recognition
  • Economics and finance
  • Virtually anything else you can think of that involves data

In order to apply machine learning to solve a given problem, the following steps (or a variation) should be taken, using the machine learning elements discussed throughout this series.

  1. Define the problem to be solved and the project’s objective. Ask lots of questions along the way!
  2. Determine the type of problem and type of solution required.
  3. Collect and prepare the data.
  4. Create, validate, tune, test, assess, and improve your model and/or solution. This process should be driven by a combination of technical (stats, math, programming), domain, and business expertise.
  5. Discover any other insights and patterns as applicable.
  6. Deploy your solution for real-world use.
  7. Report on and/or present results.

If you encounter a situation where you or your company can benefit from a machine learning-based solution, simply approach it using these steps and see what you come up with. You may very well wind up with a super powerful and scalable solution!


Congratulations to those that have read all five chapters in full! I would like to thank you very much for spending your precious time joining me on this machine learning adventure.

This series took me a significant amount of time to write, so I hope that this time has been translated into something useful for as many people as possible.

At this point, we have covered virtually all major aspects of the entire machine learning process at a high level, and at times even went a little deeper.

If you were able to understand and retain the content in this series, then you should have absolutely no problem participating in any conversation involving machine learning and its applications. You may even have some very good opinions and suggestions about different applications, methods, and so on.

Despite all of the information covered in this series, and the details that were out of scope, machine learning and its related fields are in practice also somewhat of an art. There are many decisions that need to be made along the way, customized techniques to employ, and creative strategies to use in order to best solve a given problem.

A high quality practitioner should also have a strong business acumen and expert-level domain knowledge. Problems involving machine learning are just as much about asking questions as they are about finding solutions. If the question is wrong, then the solution will be as well.

Thank you again, and happy learning (with machines)!

About the Author: Alex Castrounis founded InnoArchiTech. Sign up for the InnoArchiTech newsletter and follow InnoArchiTech on Twitter at @innoarchitech for the latest content updates.


Disrupting Law School

In a new whitepaper, Disrupting Law School, Michael B. Horn and I explore various aspects of disruption in the legal services sector with an eye toward how law schools can respond proactively. As we state in the whitepaper, it is clear to us that law schools need to change. But many in the academy believe […]


Extracting image metadata at scale

We have a collection of nearly two million images that play very prominent roles in helping members pick what to watch. This blog describes how we use computer vision algorithms to address the challenges of focal point, text placement and image clustering at a large scale.

Focal point
All images have a region that is the most interesting (e.g. a character’s face, sharpest region, etc.) part of the image. In order to effectively render an image on a variety of canvases like a phone screen or TV, it is often required to display only the interesting region of the image and dynamically crop the rest of an image depending on the available real-estate and desired user experience. The goal of the focal point algorithm is to use a series of signals to identify the most interesting region of an image, then use that information to dynamically display it.
[Examples of face and full-body features to determine the focal point of the image]

We first try to identify all the people and their body positioning using Haar-cascade-like features. We also built Haar-based features to identify whether it is a close-up, upper-body or full-body shot of the person(s). With this information, we were able to build an algorithm that auto-selects what is considered the “best” or “most interesting” person and then focuses in on that specific location.

However, not all images have humans in them. So, to identify interesting regions in those cases, we created a different signal – edges. We heuristically identify the focus of an image by first applying a Gaussian blur and then calculating edges for the image.

Here is one example of applying such a transformation:


/// Remove noise by blurring with a Gaussian filter
GaussianBlur( src, src, Size(n, n), 0, 0, BORDER_DEFAULT );
/// Convert the image to grayscale
cvtColor( src, src_gray, CV_BGR2GRAY );

/// Apply the Laplace function
Laplacian( src_gray, dst, ddepth, kernel_size, scale, delta, BORDER_CONSTANT );
convertScaleAbs( dst, abs_dst );

Below are a few examples of dynamically cropped images based on focal point for different canvases:


Text Placement
Another interesting challenge is determining what would be the best place to put text on an image. Examples of this are the ‘New Episode’ Badge and placement of subtitles in a video frame.

[Example of “New Episode” badge hiding the title of the show]

In both cases, we’d like to avoid placing new text on top of existing text on these images.

Using a text detection algorithm allows us to automatically detect and correct such cases. However, text detection algorithms have many false positives, so we apply several transformations like watershed and thresholding before applying text detection. With such transformations, we can get a fairly accurate probability of text in a region of interest for an image in a large corpus of images.

[Results of text detection on some of the transformations of the same image]

Image Clustering
Images play an important role in a member’s decision to watch a particular video. We constantly test various flavors of artwork for different titles to decide which one performs the best. To learn which image is most effective, we look at how each image performs in a given region; getting an overall global view of how well a particular set of visually similar images performed requires grouping them together based on their visual similarity.

We have several derivatives of the same image to display for different users. Although visually similar, not all of these images come from the same source. These images have varying degrees of image cropping, resizing, color correction and title treatment to serve a global audience.

As a global company that is constantly testing and experimenting with imagery, we have a collection of millions of images that we are continuously shifting and evolving. Manually grouping these images and maintaining those images can be expensive and time consuming, so we wanted to create a process that was smarter and more efficient.

[An example of two images with slight color correction, cropping and localized title treatment]

These images are often transformed and color corrected, so a traditional color-histogram-based comparison does not always work for automated grouping. We therefore came up with an algorithm that combines several parameters into a similarity index: a measurement of visual similarity among a group of images.

We calculate the similarity index based on the following four parameters:
  1. Histogram based distance
  2. Structural similarity between two images
  3. Feature matching between two images
  4. Earth mover’s distance algorithm to measure overall color similarity

Using all four methods, we can get a numerical value of similarity between two images in a relatively fast comparison.
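As a rough sketch of how such a combined score might be computed, the Python below implements only the first two terms (histogram distance and a single-window structural similarity) and averages them with assumed equal weights; feature matching and earth mover's distance are omitted, and none of this is the production algorithm:

```python
def hist_distance(a, b, bins=8):
    """Histogram-intersection distance between two grayscale images
    given as flat lists of 0-255 pixel values (0.0 = identical)."""
    def hist(img):
        h = [0] * bins
        for p in img:
            h[min(p * bins // 256, bins - 1)] += 1
        total = sum(h)
        return [c / total for c in h]
    ha, hb = hist(a), hist(b)
    return 1.0 - sum(min(x, y) for x, y in zip(ha, hb))

def structural_similarity(a, b):
    """Global (single-window) SSIM over two equal-length pixel lists."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    va = sum((x - mu_a) ** 2 for x in a) / n
    vb = sum((x - mu_b) ** 2 for x in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2  # standard SSIM constants
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2))

def similarity_index(a, b):
    """Equal-weight blend of the two signals; identical images score 1.0."""
    return 0.5 * (1.0 - hist_distance(a, b)) + 0.5 * structural_similarity(a, b)
```

In practice the pairwise scores would feed a clustering step that groups images whose similarity index exceeds a chosen cutoff.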

Below is an example of images grouped by a similarity index that is invariant to color correction, title treatment, cropping, and other transformations:
[Final result with similarity index values for group of images]

Images play a crucial role in the first impression of a large collection of videos. We are just scratching the surface of what we can learn from media, and we have many more ambitious and interesting problems to tackle in the road ahead.

If you are excited and passionate about solving big problems, we are hiring. Contact us

Original URL:  

Original article

Node.js on Google App Engine Goes Beta

We’re excited to announce that the Node.js runtime on Google App Engine is going beta. Node.js makes it easy for developers to build performant web applications and mobile backends with JavaScript. App Engine provides an easy to use platform for developers to build, deploy, manage and automatically scale services on Google’s infrastructure. Combining Node.js and App Engine provides developers with a great platform for building web applications and services that need to operate at Google scale.

Getting started

Getting started with Node.js on App Engine is easy. We’ve built a collection of getting started guides, samples, and interactive tutorials that walk you through creating your code, using our APIs and services and deploying to production.

When running Node.js on App Engine, you can use the tools and databases you already know and love. Use Express, Hapi, Parse-server or any other web server to build your app. Use MongoDB, Redis, or Google Cloud Datastore to store your data. The runtime is flexible enough to manage most applications and services, but if you want more control over the underlying infrastructure, you can easily migrate to Google Container Engine or Google Compute Engine for full flexibility and control.

Using the gcloud npm module, you can take advantage of Google’s advanced APIs and services, including Google BigQuery, Google Cloud Pub/Sub, and the Google Cloud Vision API:

var gcloud = require('gcloud')({
  projectId: 'my-project',
  keyFilename: 'keyfile.json'
});

var vision = gcloud.vision();
vision.detectText('./image.jpg', function(err, text) {
  if (text.length > 0) {
    console.log('We found text on this image...');
  }
});
Services like the Vision API allow you to take advantage of Google’s unique technology in the cloud to bring life to your applications.

Advanced diagnostic tooling

Deploying Node.js applications to Cloud Platform is just the first step. During the lifespan of any application, you’ll need the ability to diagnose issues in production. Google Cloud Debugger lets you inspect the state of Node.js applications at any code location without stopping or slowing them down. You can set breakpoints, and analyze the state of your application in real time:

When you’re ready to address performance, Google Cloud Trace will help you analyze performance by collecting end-to-end latency data for requests to App Engine URIs and additional data for round-trip RPC calls to App Engine services like Datastore and Memcache.

NodeSource partnership

Along with the Cloud Debug and Trace tools, we’re also announcing a partnership with NodeSource. NodeSource delivers enterprise-grade tools and software targeting the unique needs of running server-side JavaScript at scale. The N|Solid™ platform extends the capabilities of Node.js to provide increased developer productivity, protection of critical applications and peak application performance. N|Solid and Cloud Platform make a great match for running enterprise Node.js applications. You can learn more about using N|Solid on Cloud Platform from the NodeSource blog.

Commitment to Node.js and open source

At Google, we’re committed to open source. The new core Node.js Docker runtime, debug module, trace tools, and gcloud npm module are all open source:

We’re thrilled to welcome Node.js developers to Cloud Platform, and we’re committed to making further investments to help make you as productive as possible. This is just the start: keep your ear to the ground to catch the next wave of Node.js support on Cloud Platform.

We can’t wait to hear what you think. Feel free to reach out to us on Twitter @googlecloud, or request an invite to the Google Cloud Slack community and join the #nodejs channel.

Posted by Justin Beckwith, Product Manager, Google Cloud Platform

Original URL:  

Original article
