NVIDIA Shows New Doom Demo On GeForce GTX 1080

MojoKid shares a video showing the upcoming Doom game on NVIDIA’s new GeForce GTX 1080 graphics card using the Vulkan API, quoting this report from HotHardware:
At a private briefing with NVIDIA, representatives from id Software came out on stage to show off the upcoming game…the first public demonstration of the game using both NVIDIA’s new flagship and the next-gen API, which is a low-overhead, cross-platform graphics and compute API akin to DirectX 12 and AMD’s Mantle. In the initial part of the demo, the game runs smoothly, but its frame rate is capped at 60 frames per second. A few minutes in, however, at about the 53-second mark…the rep from id says, “We’re going to uncap the frame rate and see what Vulkan and Pascal can do.”

With the frame rate cap removed, the frame rate jumps into triple-digit territory, bouncing between 120 and 170 frames per second, give or take. Note that the game was running on a projector at a resolution of 1080p with all in-game image quality options set to their maximum values. The game is very reminiscent of previous Doom titles, and the action is non-stop.


Read more of this story at Slashdot.

Original URL: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/LvhO0xzncTw/nvidia-shows-new-doom-demo-on-geforce-gtx-1080  


Dropbox Cuts Several Employee Perks as Silicon Valley Startups Brace For Cold

Not everything is working out at Dropbox, the popular cloud storage and sharing service last valued at $10 billion. Business Insider is reporting major cost cutting at the San Francisco-based company. As part of it, the publication reports, Dropbox has cancelled its free shuttle in San Francisco and its gym washing service, pushed back dinner time by an hour, and capped the number of guests at five per month (previously it was unlimited). These cuts will directly improve Dropbox’s profitability. According to a leaked memo obtained by BI, employee perks alone cost the company at least $25,000 a year per employee. (Dropbox has nearly 1,500 employees.) From the report: Dropbox isn’t the only high-profile startup to unleash a company-wide cost-cutting campaign lately. A number of unicorn startups worth over $1 billion, including Evernote, Jawbone, and Tango, have all gone through some form of cost cuts, whether layoffs, office closures, or reduced employee perks. […] A lot of this has to do with the slowing venture funding environment in Silicon Valley. Investors have become much more conservative with their money lately and are losing patience with startups that have failed to generate returns after years of free spending. For Dropbox, the cost cuts may have less to do with the state of the VC market than with its own ambitions. Dropbox CEO Drew Houston has repeatedly said that he doesn’t need to raise capital in the private market anymore. Instead, Dropbox may want to show investors that its business is strong enough to IPO.


Original URL: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/xvTY2ePtSnc/dropbox-cuts-several-employee-perks-as-silicon-valley-startups-brace-for-cold  


Using LLVM to Accelerate Application Performance on GPUs

This post can also be found on Nvidia’s site.

At MapD our goal is to build the world’s fastest big data analytics and visualization platform that enables lag-free interactive exploration of multi-billion row datasets. MapD supports standard SQL queries as well as a visualization API that maps OpenGL primitives onto SQL result sets.

Although MapD is fast running on x86-64 CPUs, our real advantage stems from our ability to leverage the massive parallelism and memory bandwidth of GPUs. The most powerful GPU currently available is the NVIDIA Tesla K80 Accelerator, with up to 8.74 teraflops of compute performance and nearly 500 GB/s of memory bandwidth. By supporting up to eight of these cards per server, we see orders-of-magnitude better performance on standard data analytics tasks, enabling a user to visually filter and aggregate billions of rows in tens of milliseconds, all without indexing. The following video shows the MapD dashboard animating 750 million tweets in real time. Nothing in this demo is pre-computed or canned; our big data visual analytics platform is running on 8 NVIDIA Tesla K40 GPUs in a single server to power the dashboard.

Video: MapD GPU database demonstration (http://www.youtube.com/watch?v=iBJomD2mQf0)

Fast hardware is only half of the story, so at MapD we have invested heavily into optimizing our code such that a wide range of analytic workloads run optimally on GPUs. In particular, we have worked hard so that common SQL analytic operations, such as filtering (WHERE) and GROUP BY, run as fast as possible. One of the biggest payoffs in this regard has been moving from the query interpreter that we used in our prototype to a JIT (Just-In-Time) compilation framework built on LLVM. LLVM allows us to transform query plans into architecture-independent intermediate code (LLVM IR) and then use any of the LLVM architecture-specific “backends” to compile that IR code for the needed target, such as NVIDIA GPUs, x64 CPUs, and ARM CPUs.

Query compilation has the following advantages over an interpreter:

  1. Since it is inefficient to evaluate a query plan for a single row at a time (in one “dispatch”), an interpreter requires the use of extra buffers to store the intermediate results of evaluating an expression. For example, to evaluate the expression x * 2 + 3, an interpreter-based query engine would first evaluate x * 2 for a number of rows, storing that to an intermediate buffer. The intermediate results stored in that buffer would then be read and summed with 3 to get the final result. Writing and reading these intermediate results to memory wastes memory bandwidth and/or valuable cache space. Compare this to a compiled query, which can simply store the result of the first subexpression (x * 2) into a register before computing the final result, allowing the cache to be used for other purposes, for example to create the hash table necessary for a query’s GROUP BY clause. This is related to the loop fusion and kernel fusion compiler optimizations.

  2. An efficient interpreter would likely involve executing instructions represented by vectors of opcodes/byte-codes. Decoding the byte-code to get the required operations and then branching to the correct operation requires a significant amount of extra cycles. On the other hand, pre-generating compiled code for the query avoids the inefficiencies of this virtual machine approach.

  3. Depending on the number and range of the columns used in a GROUP BY clause, different hash strategies are optimal. Some of them rely on generating collision-free hash functions based on the range of the data, which is only known at runtime. Reproducing such functionality efficiently with an interpreter, particularly when the number and types of columns can vary, is difficult.
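To make the first point concrete, here is a minimal CPU-side sketch of the two evaluation strategies for x * 2 + 3. This is our own illustration, not MapD code; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interpreter style: evaluate x * 2 for all rows into a temporary
// buffer, then add 3 in a second pass. The intermediate vector is
// written to memory and read back, wasting bandwidth and cache.
std::vector<int64_t> eval_interpreted(const std::vector<int64_t>& x) {
  std::vector<int64_t> tmp(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = x[i] * 2;    // pass 1
  std::vector<int64_t> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = tmp[i] + 3;  // pass 2
  return out;
}

// Compiled style: the JIT emits one fused loop; the subexpression
// x * 2 lives in a register and never touches memory.
std::vector<int64_t> eval_compiled(const std::vector<int64_t>& x) {
  std::vector<int64_t> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    int64_t t = x[i] * 2;  // stays in a register
    out[i] = t + 3;
  }
  return out;
}
```

Both functions compute the same result; the difference is purely in how many times the data crosses the memory hierarchy.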

Of course, LLVM is not the only way to build a JIT query compiler. Some databases employ source-to-source compilers to convert SQL to another source language like C++, which they then compile using regular compilers like gcc. We think that an LLVM-based compiler has significant advantages over a transpiler, including:

  1. Compilation times are much quicker using LLVM. We can compile our query plans in tens of milliseconds, whereas source-to-source compilation often requires multiple seconds to compile a plan. Since our platform is built for interactive data exploration, minimizing query compilation time is critical.

  2. LLVM IR is quite portable over the various architectures we run on (GPU, x86-64, ARM). In contrast, source language generation requires more attention to syntactic differences, particularly in divergent cases like CUDA vs. OpenCL (both can be targeted with LLVM quite easily).

  3. LLVM comes with built-in code validation APIs and tools. For example, comparison and arithmetic operations on integers will fail (with a useful error message) if the operand widths are different. Once a function is generated, llvm::verifyFunction performs additional sanity checks, ensuring (among other things) that the control flow graph of our query is well-formed.

How MapD Uses NVVM

LLVM is powerful and battle-proven for CPUs, but our product focuses on GPUs. If we could use LLVM for GPU code compilation we’d get all the benefits we’ve mentioned while also being able to run on a CPU when needed. Fortunately, the NVIDIA Compiler SDK made this a reality long before we started to build our product.

Figure 1: The MapD dashboard showing airline data using the Crossfilter interface.

The NVIDIA Compiler SDK includes libNVVM, an LLVM-based compiler backend and NVVM IR, a rather extensive subset of LLVM IR. Thanks to our choice of LLVM and libNVVM, our system runs on NVIDIA GPUs, GPU-less ultrabooks, and even on the 32-bit ARM CPU on the Jetson TK1, all using the same code base.

MapD does not need to generate all code directly. We offload some of the functionality to a runtime written in C++ whenever code generation would be tedious and error-prone without any performance benefit. This approach is a great fit for things like aggregate functions, handling arithmetic on columns with SQL null values, hash dictionaries, and more. The LLVM-based C++ compiler, Clang, generates the corresponding LLVM IR, and we combine it with our explicitly generated IR.

As is always the case when compilation is involved, the time required to generate native code is an important consideration. An interactive system sees new queries all the time as the user refines them in search of insight. We’re able to keep code generation consistently under 30 ms for entirely new queries, which is good enough to be unnoticeable in the console, especially for massive datasets. However, for “mere billions” of rows, our UI is able to show smooth animations over multiple correlated charts. Since the actual execution is so fast in this case, 30 ms can matter a lot.

Fortunately, these queries are structurally identical and only differ in the value of literals as the filter window moves across the time range or the user selects the tail of a histogram. With caching in place, compilation time becomes a non-issue. We keep it simple and still generate the IR, then use it as a key in the native code cache. The LLVM API offers an easy way to serialize source level entities (functions in our case), shown below.

std::string serialize_function(const llvm::Function* f) {
  std::stringstream ss;
  llvm::raw_os_ostream os(ss);
  f->print(os);  // write the function's IR as text to the stream
  os.flush();    // make sure everything reaches the stringstream
  return ss.str();
}
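For illustration, a native-code cache keyed on the serialized IR might look like the following sketch. `CodeCache` and `CompiledKernel` are hypothetical names for this example, not MapD’s actual classes, and compilation itself is stubbed out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in for a handle to JIT-ed machine code (illustrative only).
struct CompiledKernel { int id; };

class CodeCache {
 public:
  // Returns the cached kernel for this IR, or "compiles" and caches it.
  const CompiledKernel& get_or_compile(const std::string& ir) {
    auto it = cache_.find(ir);
    if (it != cache_.end()) return it->second;  // hit: skip compilation
    CompiledKernel k{next_id_++};               // miss: compile (stubbed)
    return cache_.emplace(ir, k).first->second;
  }
  std::size_t size() const { return cache_.size(); }

 private:
  std::unordered_map<std::string, CompiledKernel> cache_;
  int next_id_ = 0;
};
```

Since structurally identical queries serialize to identical IR once literals are handled, repeated queries become cache hits and pay no compilation cost.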
Performance Measurements

Ideas are great in performance-focused systems, but the proof is in the pudding. As it turns out, MapD extracts a lot of performance out of GPUs.

Queries using filter and aggregate routinely hit more than 80% of the available bandwidth. We’ve measured more than 240 GB/s on a single K40 (vs. a theoretical max of 288 GB/s) for a filter-and-count query touching a single column. When grouping by a single column with 20 possible values and some skew (the carrier in the airline data set in Figure 1), MapD can only reach slightly more than 100 GB/s on the K40. On the new Titan X GPU, based on the Maxwell architecture, we are able to get more than 200 GB/s on the same query, on a single card. Maxwell handles contention in shared memory atomics significantly better than the Kepler architecture, which explains this great result on skewed inputs. We’re looking forward to this improvement being carried over to future generations of Tesla cards as well.
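As a sanity check on figures like these, effective bandwidth is just bytes touched divided by elapsed time. A quick helper (our own, not MapD’s code):

```cpp
#include <cassert>
#include <cstdint>

// Effective bandwidth in GB/s (decimal GB, as bandwidth is usually quoted)
// for a scan touching `rows` rows of `bytes_per_row` bytes in `seconds`.
double effective_gb_per_sec(std::uint64_t rows, std::uint64_t bytes_per_row,
                            double seconds) {
  return static_cast<double>(rows * bytes_per_row) / seconds / 1e9;
}
// e.g. scanning one 8-byte column of a 3-billion-row table in 0.1 s
// works out to 24e9 bytes / 0.1 s = 240 GB/s.
```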

Figure 2: MapD performance compared to leading in-memory database on 2-socket, 8-GPU system (group-by and filter query)

MapD is easily able to get a 40-50x speedup on a multi-GPU system, even when compared to our own code running on a high end dual-socket CPU system, and there are even queries for which the gap is two orders of magnitude (this is often code with lots of divisions, which tend to be slow on x86-64). Compared to other leading in-memory CPU-based databases, which typically use interpreters or source-to-source compilers, the speedup can easily be three orders of magnitude, as Figure 2 shows.

LLVM JIT Compilation for GPUs: Tips and Tricks

We’ve learned a lot about LLVM and JIT compilation for GPUs while building MapD’s interactive query engine, and we’d like to share some of that experience with you.

Most MapD runtime functions are marked always_inline, which forces the LLVM AlwaysInliner optimization pass to inline them, eliminating function call overhead and increasing the scope for other optimization passes. For example, the following is a reasonable way of implementing a max aggregate.

#include <algorithm>
#include <cstdint>

extern "C" __attribute__((always_inline))
void agg_max(int64_t* agg, const int64_t val) {
  *agg = std::max(*agg, val);
}
Note that the function is not marked __device__, since this is not CUDA C++ code. Any explicit call to this function will eventually be inlined, and the result can run unmodified on the GPU. Also, if agg points to a value allocated on the stack (as is the case for queries without a GROUP BY clause), the PromoteMemoryToRegister pass will place it in a register for the inner loop of the query. The runtime functions which need GPU-specific implementations are part of a regular CUDA C++ library we can call from the query.
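To illustrate why inlining matters here, this is roughly the shape of the row loop a query like SELECT MAX(x) FROM t WHERE x > lower compiles to. It is a sketch with hypothetical names, not MapD’s actual generated code; after the AlwaysInliner pass runs, the agg_max call collapses into the loop body.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

extern "C" __attribute__((always_inline))
void agg_max(int64_t* agg, const int64_t val) {
  *agg = std::max(*agg, val);
}

// Illustrative fused filter + aggregate loop; `agg` lives on the stack,
// so PromoteMemoryToRegister keeps it in a register for the whole scan.
int64_t max_where_gt(const std::vector<int64_t>& x, int64_t lower) {
  int64_t agg = std::numeric_limits<int64_t>::min();
  for (int64_t v : x) {
    if (v > lower) agg_max(&agg, v);  // inlined: no call overhead
  }
  return agg;
}
```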

We’ve said that NVVM generates native code, but there actually is an additional step we haven’t discussed.

From the IR we generate, NVVM generates PTX, which in turn is compiled to native code for the GPU. Especially if you’re bundling a CUDA C++ library with the generated code, like we do, caching the result of this last step is very important. Make sure the compute cache directory is writable by your application or else it will silently fail and recompile every time. The code snippet below shows how we bundle a library with the PTX we generate.

checkCudaErrors(cuLinkCreate(num_options, &option_keys[0],
                             &option_values[0], &link_state_));
if (!lib_path.empty()) {
  // To create a static CUDA library:
  // 1. nvcc -std=c++11 -arch=sm_30 --device-link
  //    -c [list of .cu files]
  // 2. nvcc -std=c++11 -arch=sm_30
  //    -lib [list of .o files generated by step 1]
  //    -o [library_name.a]
  checkCudaErrors(cuLinkAddFile(link_state_, CU_JIT_INPUT_LIBRARY,
                                lib_path.c_str(), num_options,
                                &option_keys[0], &option_values[0]));
}
checkCudaErrors(cuLinkAddData(link_state_, CU_JIT_INPUT_PTX, ptx,
                              strlen(ptx) + 1, 0, num_options,
                              &option_keys[0], &option_values[0]));
void* cubin;
size_t cubin_size;
checkCudaErrors(cuLinkComplete(link_state_, &cubin, &cubin_size));
checkCudaErrors(cuModuleLoadDataEx(&module_, cubin, num_options,
                                   &option_keys[0], &option_values[0]));
checkCudaErrors(cuModuleGetFunction(&kernel_, module_, func_name.c_str()));

There is an upper bound on the number of registers a block can use, so the CU_JIT_THREADS_PER_BLOCK option should be set to the block size. Failing to do so can make the translation to native code fail. We’ve had this issue for queries with many projected columns and a lot of threads per block before setting this option.

Speaking of libraries, not all POSIX C functions are included in the CUDA C++ runtime libraries. In our case, we needed gmtime_r for the EXTRACT family of SQL functions. Fortunately, we’ve been able to port it from newlib and compile it with NVCC.

Just a word of caution: despite sharing the IR specification, NVVM and LLVM are ultimately different code-bases. Going with an older version of LLVM, preferably the one NVVM is based on, can help. We decided against that approach since the LLVM API offers a wide range of “IR surgery” features and we were able to fix up these mismatches, but your mileage may vary.

Also, unlike LLVM IR, unaligned loads are not allowed in NVVM IR. The address of a load must be a multiple of the size of the type; otherwise, the query would crash with an invalid memory access error on the GPU, even if the load is not annotated as aligned.
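A simple host-side guard when emitting loads can catch this class of bug before the GPU does; the helper below is our own sketch, not MapD’s code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// NVVM IR requires naturally aligned loads: the address must be a
// multiple of the loaded type's size. Check before emitting the load.
bool naturally_aligned(std::uintptr_t addr, std::size_t type_size) {
  return addr % type_size == 0;
}
```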

Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/lVZoQXq4vmc/  


Helix conducts research as you write

Researchers often need to go beyond Google to find the kind of medical journal articles and flat data files necessary for their work. But many journal articles are locked away in databases like JSTOR or PubMed, which don’t have the reliable search capabilities of an engine like Google — so researchers have to waste time tracking them down.

Original URL: http://feedproxy.google.com/~r/Techcrunch/~3/gUKpTUAgcgc/  


Are US Courts ‘Going Dark’?

An anonymous reader writes: Judge Stephen Wm. Smith argues that questions about the government’s “golden age of surveillance” miss an equally significant trend: that the U.S. courts are “going dark”. In a new editorial, he writes that “Before the digital age, executed search warrants were routinely placed on the court docket available for public inspection,” but after the Electronic Communications Privacy Act of 1986, more than 30,000 secret court surveillance orders were issued in 2006 alone. He predicts that today’s figure is more than double, “And those figures do not include surveillance orders obtained by state and local authorities, who handle more than 15 times the number of felony investigations that the feds do. Based on that ratio, the annual rate of secret surveillance orders by federal and state courts combined could easily exceed half a million.”

Judge Smith also cites an increase in cases — even civil cases — that are completely sealed, as well as an increase in “private arbitration” and other ways of resolving disputes which are shielded from the public eye. “Employers, Internet service providers, and consumer lenders have led a mass exodus from the court system. By the click of a mouse or tick of a box, the American public is constantly inveigled to divert the enforcement of its legal rights to venues closed off from public scrutiny. Justice is becoming privatized, like so many other formerly public goods turned over to invisible hands — electricity, water, education, prisons, highways, the military.” The judge’s conclusion? “Over the last 40 years, secrecy in all aspects of the judicial process has risen to literally unprecedented levels.”


Original URL: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/VC6z1EKxKfs/are-us-courts-going-dark  


Upstreaming in Node?

A Node.js question —

We had a feature in Radio UserLand called upstreaming.

Radio watched a folder structure where it stored all of a user’s public data. When a file appeared in the structure, or an existing file changed, Radio copied the file to a user-specified public server location.

It’s how publishing worked, also how it shared data.

I was wondering if there was a Node.js app that does this.

If not, I’m probably going to write it. It should be simpler in Node, and since it performed fast even back in 2002, it would be amazing today.

Let me know. I’m @davewiner on Twitter or comment on Facebook.

Original URL: http://scripting.com/2016/05/08/1243.html  


Nvidia’s new graphics cards are a big deal

The hotly anticipated Nvidia GeForce GTX 1080 is now official, marking a major leap in performance and efficiency with the introduction of the new Pascal architecture. How major? Take the ultra beefy and expensive Titan X graphics card from last year, slice its power demands from 250W to 180W, cut its price from $999 to $599, and then throw in a bit more performance on top. Virtual reality and multi-monitor gaming have also been foremost concerns in Nvidia’s Pascal development, and the graphics company has a super cool new technology to make both better, called simultaneous multi-projection.


GeForce GTX 1080 full specifications

In Nvidia CEO Jen-Hsun Huang’s words, “The GeForce GTX 1080 is almost irresponsible amounts of performance.” Claiming twice the VR performance and three times the power efficiency of the Titan X, Nvidia is using its new GTX 1000 series to bring yesteryear’s insane high end into 2016’s mainstream. In fact, the more significant GPU announced today might well be the GTX 1070, a $379 card that also outperforms the Titan X. Built on a 16nm FinFET process, these Pascal GPUs are evidently extraordinarily efficient, and once they start trickling down into more affordable models, they promise to unlock VR gaming for a much wider audience.

The GTX 980, the graphics card that’s most often recommended as a good baseline specification for a VR-capable PC, is immediately going to be pushed down in price. Not coincidentally, there are some discount deals on GTX 980 Ti cards this weekend.

The simultaneous multi-projection tech that’s part of Pascal shouldn’t be overlooked. Its main purpose is to correct all the distortions that arise when using dual- or triple-monitor setups, turning your display array into a true window unto a 3D world. I saw a gorgeous triple-display rig at Acer’s stand at Computex last year, showing off The Witcher 3 — but only the middle monitor’s images looked beautiful, with the two peripheral ones suffering from horrible warping. It was a known issue then, mostly because the game didn’t support the wild 10,320 x 1,440 resolution, but Nvidia is wisely addressing the issue directly now. The company knows the primary reasons to buy its top-end GPUs will be multi-display and VR rigs.

Conscious of the need to provide more power-intensive applications to fuel demand for its GPUs, Nvidia has also unveiled a VR Funhouse experience, which showcases its various VRWorks software development tools. VR Funhouse is compatible with the HTC Vive and will soon be made available through the Steam store, integrating visual, audio, and touch technologies that Nvidia has developed for creating more immersive VR content. And if all that wasn’t enough, there’s now a new Ansel in-game photography tool, which adds visual filters, 360-degree captures, and a whole lot more to the traditional screenshot tool.

The Nvidia GeForce GTX 1080 will be available from May 27th at a retail price of $599. A so-called founders edition, featuring speed-binned chips (i.e. the ones that outperform the base spec the most and offer the most overclocking headroom), will be sold by Nvidia for $699. The GTX 1070 follows two weeks later, on June 10th, at a price of $379 from Nvidia’s partners or $449 for the founders edition.

Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/SZpcUvppH8Q/nvidia-gtx-1080-1070-pascal-specs-price-release-date  


Linux Mint 18 Will Ship Without Multimedia Support

An anonymous reader quotes this report from Distrowatch: Linux Mint 18 will no longer provide separate, codec-free installation media for OEM and magazine distribution. Instead, the distribution will ship without multimedia support while making it easy for users to acquire media codecs during the initial installation of the operating system. “OEM installation disks and NoCodec images will no longer be released. Instead, similar to other distributions, images will ship without codecs and will support both traditional and OEM installations. This will reduce our release cycle to 4 separate events and the production and testing of 12 ISO images. Multimedia codecs can be installed easily: From the welcome screen, by clicking on “Multimedia Codecs”, or from the main menu, by clicking on “Menu”->”Sound and Video”->”Install Multimedia Codecs”, or during the installation process, by clicking a checkbox option.” Additional information on the upcoming release of Linux Mint 18 can be found in the project’s monthly newsletter.

Softpedia points out that they’re using Ubuntu 16.04 LTS as the package base, meaning “more hardware devices and components are now supported.”


Original URL: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/Tm7NSQwIw38/linux-mint-18-will-ship-without-multimedia-support  

