Linux-insides: Inline assembly

Introduction

While reading the source code of the Linux kernel, I often see statements like this:

__asm__("andq %%rsp,%0; ":"=r" (ti) : "0" (CURRENT_MASK));

Yes, this is inline assembly, in other words assembler code which is integrated into a high-level programming language. In my case this high-level programming language is C. Yeah, the C programming language is not very high-level, but still.

If you are familiar with the assembly programming language, you may notice that inline assembly is not very different from the usual kind. Moreover, the special form of inline assembly which is called the basic form is exactly the same. For example:

__asm__("movq %rax, %rsp");

You might see the same code (without the __asm__ prefix, of course) in plain assembly code. Yes, this is very similar, but not as simple as it might seem at first glance. Actually, GCC supports two forms of inline assembly statements: basic and extended.

The basic form consists of only two things: the __asm__ keyword and a string with valid assembler instructions. For example, it may look something like this:

__asm__("movq    $3, %rax\t\n"
        "movq    %rsi, %rdi");

Instead of the __asm__ keyword, the asm keyword may also be used. Both are GNU extensions, but asm is not recognized when GCC is run with strict ISO C options such as -ansi or -std=c99, whereas __asm__ is always available. From here on I will use only the __asm__ variant in my examples.

If you know the assembly programming language, this looks pretty easy. The main problem is the second form of inline assembly statements: extended. This form allows us to pass parameters to an assembly statement, perform jumps and so on. It is not so hard, but it requires knowing some additional rules on top of the assembly language itself. Every time I see yet another piece of inline assembly code in the Linux kernel, I need to refer to the official GCC documentation to remember how a particular qualifier behaves or what the meaning of =&r is, for example.

I’ve decided to write this part to consolidate my knowledge of inline assembly. As inline assembly statements are quite common in the Linux kernel and we may see them in linux-insides parts from time to time, I thought it would be useful to have a special part which describes the most important aspects of inline assembly. Of course, you may find comprehensive information about inline assembly in the official documentation, but I like to have everything in one place.

Note: This part will not provide a guide for assembly programming. It is not intended to teach you how to write programs in assembler or to explain what one or another assembler instruction means. It is just a little memo for extended asm.

Introduction to extended inline assembly

So, let’s start. As I already wrote above, the basic assembly statement consists of the asm or __asm__ keyword and a set of assembly instructions. If you are familiar with the assembly programming language, there is no need to write anything more about it. The most interesting part is inline assembler with operands, or extended assembler. An extended assembly statement looks a little more complicated and consists of more than two parts:

__asm__ [volatile] [goto] (AssemblerTemplate
                           [ : OutputOperands ]
                           [ : InputOperands  ]
                           [ : Clobbers       ]
                           [ : GotoLabels     ]);

All parameters which are marked with square brackets are optional. You may notice that if we skip all the optional parameters and also the volatile and goto qualifiers, we get the basic form. Let’s consider them in order. The first optional qualifier is volatile. This qualifier tells the compiler that an assembly statement may produce side effects, so we need to prevent compiler optimizations which would remove or rearrange the statement. In simple words, the volatile qualifier tells the compiler not to touch this statement and to put it in the same place where it was in the original code. For example, let’s look at the following function from the Linux kernel:

static inline void native_load_gdt(const struct desc_ptr *dtr)
{
    asm volatile("lgdt %0"::"m" (*dtr));
}

Here we see the native_load_gdt function, which loads the base address of the Global Descriptor Table into the GDTR register with the lgdt instruction. This assembly statement is marked with the volatile qualifier. It is very important that the compiler does not change the original place of this assembly statement in the resulting code. Otherwise the GDTR register may contain a wrong address for the Global Descriptor Table, or the address may be correct but the structure may not be filled in yet. In that case an exception will be generated and the kernel will not boot correctly.
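To see the effect of volatile outside of the kernel, here is a minimal sketch of my own (it is not kernel code). It reads the time stamp counter with the rdtsc instruction. The read_tsc name is hypothetical, and the sketch assumes x86_64 with GCC or Clang; on other architectures it falls back to a plain counter. The statement has no inputs, so without volatile the compiler would be allowed to treat two calls as the same computation and merge them:

```c
#include <stdint.h>

/* Hypothetical helper: returns the current value of the time stamp counter. */
uint64_t read_tsc(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;

    /* rdtsc leaves the low half in %eax and the high half in %edx.
     * volatile keeps every call as a separate, real read. */
    __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Portable stand-in (an assumption of this sketch, not a real TSC). */
    static uint64_t fake_counter;
    return ++fake_counter;
#endif
}
```

Calling read_tsc() twice and subtracting the results gives a rough cycle count for the code in between, which only makes sense if both reads really happen.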

The second optional qualifier is goto. This qualifier tells the compiler that the given assembly statement may perform a jump to one of the labels which are listed in GotoLabels. For example:

__asm__ goto("jmp %l[label]" : : : : label);
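A slightly larger sketch may help to show where the labels go. The is_nonzero function below is a hypothetical helper, written under the assumption of x86_64 and a compiler with asm goto support (GCC, or a recent Clang); elsewhere it falls back to plain C. Note that the label list comes after the fourth colon:

```c
/* Hypothetical helper: returns 1 when x is non-zero, 0 otherwise. */
int is_nonzero(int x)
{
#if defined(__x86_64__)
    __asm__ goto("testl %0, %0\n\t"     /* sets ZF when x == 0 */
                 "jnz %l[nonzero]"      /* otherwise jump to the C label */
                 : /* no outputs */
                 : "r" (x)
                 : "cc"
                 : nonzero);
    return 0;                           /* fall-through path: x was zero */
nonzero:
    return 1;
#else
    return x != 0;                      /* portable fallback */
#endif
}
```

With this definition is_nonzero(7) returns 1 and is_nonzero(0) returns 0.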

As we have finished with these two qualifiers, let’s consider the main part of an assembly statement body. As we may see, the main part of an assembly statement consists of the following four parts:

  • set of assembly instructions;
  • output parameters;
  • input parameters;
  • clobbers.

The first is a string which contains a set of valid assembly instructions, which may be separated by the \t\n sequence. In the extended form, names of processor registers must be prefixed with the %% sequence (a single % introduces an operand such as %0), and other symbols like immediates must start with the $ symbol. The OutputOperands and InputOperands are comma-separated lists of C variables which may be provided with constraints, and the Clobbers is a list of registers or other values which are changed by the assembler instructions from the AssemblerTemplate beyond those listed in the OutputOperands.

Now that we have considered the format of an extended assembly statement, we may look at the first example. But before this we must know about constraints. A constraint is a string which specifies the placement of an operand. For example, the value of an operand may be written to a processor register, or it may be read from memory, and so on.

Now let’s consider the following simple example:
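The template rules are easier to remember with a tiny sketch. The load_constant function is hypothetical and assumes x86_64 with GCC or Clang (with a plain C fallback elsewhere). It shows an immediate with $, a literal register name with %%, an operand reference with a single %, and two instructions separated by \n\t:

```c
/* Hypothetical helper: returns 42 by way of the eax register. */
int load_constant(void)
{
    int out;
#if defined(__x86_64__)
    __asm__("movl $42, %%eax\n\t"   /* $ marks an immediate, %% a literal register */
            "movl %%eax, %0"        /* %0 is replaced by the first operand (out) */
            : "=r" (out)
            : /* no inputs */
            : "eax");               /* eax is changed outside of the operands */
#else
    out = 42;                       /* portable fallback */
#endif
    return out;
}
```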

#include <stdio.h>

int main(void)
{
        int a = 5;
        int b = 10;
        int sum = 0;

        __asm__("addl %1,%2" : "=r" (sum) : "r" (a), "0" (b));
        printf("a + b = %d\n", sum);
        return 0;
}

Before we consider this example, let’s compile and run it to be sure that it works as expected:

$ gcc test.c -o test
$ ./test
a + b = 15

OK, great. It works. Now let’s look at this example in detail. Here we see a simple C program which calculates the sum of two variables and puts the result into the sum variable. In the end we just print the result. The assembly statement consists of three parts. The first is the assembly template with the add instruction, which adds the value of the source operand to the value of the destination operand and stores the result in the destination operand. In our case, addl %1,%2 will be expanded so that %1 refers to the register which holds a and %2 refers to the register which holds b.

Variables and expressions which are listed in the OutputOperands and InputOperands may be referenced in the AssemblerTemplate. An input or output operand is designated as %N, where N is the number of the operand from left to right, beginning from zero. The second part of our assembly statement is located after the first : symbol and represents the definition of the output value, "=r" (sum).

Notice that sum is marked with two special symbols: =r. This is the first constraint that we have encountered. Actually, the constraint here is only r. The = symbol is a modifier which denotes an output value. It tells the compiler that the previous value will be discarded and replaced by new data. Besides the = modifier, GCC provides support for the following three modifiers:

  • + – an operand is read and written by an instruction;
  • & – output register shouldn’t overlap an input register and should be used only for output;
  • % – tells the compiler that operands may be commutative.
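To make the & modifier from the list above a bit more tangible, here is a minimal sketch. The double_then_add function is hypothetical and assumes x86_64 with GCC or Clang (plain C is used elsewhere). The output register is written by the first instruction while the input b is still needed by the last one, so the output must not share a register with any input:

```c
/* Hypothetical helper: computes 2 * a + b. */
int double_then_add(int a, int b)
{
    int out;
#if defined(__x86_64__)
    __asm__("movl %1, %0\n\t"   /* out = a: the output is written early */
            "addl %0, %0\n\t"   /* out = 2 * a */
            "addl %2, %0"       /* b is read only now, after out was written */
            : "=&r" (out)       /* & keeps out away from the input registers */
            : "r" (a), "r" (b)
            : "cc");
#else
    out = 2 * a + b;            /* portable fallback */
#endif
    return out;
}
```

Without the & the compiler would be free to put out and b into the same register, and the first movl would destroy b before it is used.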

Now let’s get back to the r constraint. As I already wrote above, a constraint denotes the placement of an operand. The r symbol means that the value will be stored in one of the general purpose registers.

The last part of our assembly statement, "r" (a), "0" (b), lists the input operands: the a and b variables. We already know what the r constraint means. Now we may notice a new constraint before the b variable. The 0, or any other digit from 1 to 9, is called a matching constraint. With it, a single operand may fill two roles: the input is placed in the same location as the output operand with that number. As you may guess, the value of the constraint is the ordinal number of the operand. In our case 0 will match sum. If we look at the assembly output of our program:

0000000000400400 :
  400401:       ba 05 00 00 00          mov    $0x5,%edx
  400406:       b8 0a 00 00 00          mov    $0xa,%eax
  40040b:       01 d0                   add    %edx,%eax

we will see that only two general purpose registers are used: %edx and %eax. The %eax register is used both for storing the value of the b variable and for storing the result of the calculation. We have considered the input and output parameters of an inline assembly statement. Before we meet the other constraints supported by gcc, there is still one possible part of an inline assembly statement left to consider: clobbers.

Clobbers

As I wrote above, the clobbers part should contain a comma-separated list of registers which will be changed by the AssemblerTemplate. This is useful when our assembly expression needs an additional register for its calculation, a register which is not listed among the operands. If we add the clobbered register to the inline assembly statement, the compiler will take this into account and will not use that register for anything else across the statement.

Let’s consider the same example, but add one more simple assembler instruction:

__asm__("movq $100, %%rdx\t\n"
        "addl %1,%2" : "=r" (sum) : "r" (a), "0" (b));

If we look at the assembly output:

0000000000400400 :
  400400:       ba 05 00 00 00          mov    $0x5,%edx
  400405:       b8 0a 00 00 00          mov    $0xa,%eax
  40040a:       48 c7 c2 64 00 00 00    mov    $0x64,%rdx
  400411:       01 d0                   add    %edx,%eax

We will see that the %edx register is overwritten with the value 0x64 (100), so the result will be 115 instead of 15. Now if we add the %rdx register to the list of clobbered registers:

__asm__("movq $100, %%rdx\t\n"
        "addl %1,%2" : "=r" (sum) : "r" (a), "0" (b) : "%rdx");

and look at the assembler output again:

0000000000400400 :
  400400:       b9 05 00 00 00          mov    $0x5,%ecx
  400405:       b8 0a 00 00 00          mov    $0xa,%eax
  40040a:       48 c7 c2 64 00 00 00    mov    $0x64,%rdx
  400411:       01 c8                   add    %ecx,%eax

Now we may see that the %ecx register is used for the sum calculation. Besides general purpose registers, we may pass two special values in the clobbers list: cc and memory.

The first, cc, indicates that the assembler code modifies the flags register. It is common to pass cc in the clobbers list for arithmetic or logic instructions:

__asm__("incq %0" : "+r" (variable) : : "cc");
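Both kinds of clobbers that we have seen so far, a register and cc, can be put together in a short sketch. The add_with_scratch and increment functions are hypothetical names of mine; the code assumes x86_64 with GCC or Clang and uses plain C elsewhere:

```c
/* Hypothetical helper: a + b, while the template deliberately overwrites rdx. */
int add_with_scratch(int a, int b)
{
    int sum;
#if defined(__x86_64__)
    __asm__("movq $100, %%rdx\n\t"  /* scribble over rdx on purpose */
            "addl %1, %0"           /* sum = b + a, the flags change too */
            : "=r" (sum)
            : "r" (a), "0" (b)      /* "0": b starts in the register of sum */
            : "rdx", "cc");         /* so neither a nor sum may live in rdx */
#else
    sum = a + b;                    /* portable fallback */
#endif
    return sum;
}

/* Hypothetical helper: value + 1 with a read-write operand. */
long increment(long value)
{
#if defined(__x86_64__)
    __asm__("incq %0" : "+r" (value) : : "cc");
#else
    value++;                        /* portable fallback */
#endif
    return value;
}
```

With these definitions add_with_scratch(5, 10) gives 15 again, even though the template writes 100 into %rdx.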

The second, memory, tells the compiler that the given inline assembly statement performs arbitrary reads or writes to memory which is not covered by the operands listed in the input and output lists. This prevents the compiler from keeping values loaded from memory cached in registers across the statement. Let’s take a look at the following example:

#include <stdio.h>

int main(void)
{
        int a[3] = {10,20,30};
        int b = 5;

        __asm__ volatile("incl %0" :: "m" (a[0]));
        printf("a[0] - b = %d\n", a[0] - b);
        return 0;
}

Of course, this example may seem artificial. OK, in fact it is. But it shows us the main concept. Here we have an array of integer numbers and one integer variable. The example is pretty simple: we take the first element of the a array and increment its value. After this we subtract the value of the b variable from the just incremented value of the first element of the a array. In the end we just print the result. If we compile and run this simple example, the result may surprise us:

~$ gcc -O3  test.c -o test
~$ ./test
a[0] - b = 5

The result is 5 here, but why? We incremented the value of the first element of the a array, so the result must be 6. Let’s look at the assembler output of this example:

00000000004004f6 :
  4004f6:       c7 44 24 f0 0a 00 00    movl   $0xa,-0x10(%rsp)
  4004fd:       00 
  4004fe:       c7 44 24 f4 14 00 00    movl   $0x14,-0xc(%rsp)
  400505:       00 
  400506:       c7 44 24 f8 1e 00 00    movl   $0x1e,-0x8(%rsp)
  40050d:       00 
  40050e:       ff 44 24 f0             incl   -0x10(%rsp)
  400512:       b8 05 00 00 00          mov    $0x5,%eax

On the first line we may see that the first element of the a array is set to 0xa, or 10. The last two lines of code are the actual calculations. We increment the value of the first element of our array with the incl instruction and then just put 5 into the %eax register. This looks strange. We passed the -O3 flag to gcc, so the compiler removed the calculation. The problem here is that a[0] is passed to the assembly statement only as an input operand, so GCC assumes that the statement does not change it. GCC therefore does not associate the later C calculation with the increment in the assembly statement and puts the precomputed result of a[0] - b (10 - 5) directly into the %eax register.

Let’s now add memory to the clobbers list:

__asm__ volatile("incl %0" :: "m" (a[0]) : "memory");

and the new result will be:

~$ gcc -O3  test.c -o test
~$ ./test
a[0] - b = 6

Now the result is correct. If we look at the assembly output now:

00000000004004f6 :
  4004f6:       c7 44 24 f0 0a 00 00    movl   $0xa,-0x10(%rsp)
  4004fd:       00 
  4004fe:       c7 44 24 f4 14 00 00    movl   $0x14,-0xc(%rsp)
  400505:       00 
  400506:       c7 44 24 f8 1e 00 00    movl   $0x1e,-0x8(%rsp)
  40050d:       00 
  40050e:       ff 44 24 f0             incl   -0x10(%rsp)
  400512:       8b 44 24 f0             mov    -0x10(%rsp),%eax
  400516:       83 e8 05                sub    $0x5,%eax
  400519:       c3                      retq

we will see one difference here. The difference is in the following piece of code:

  400512:       8b 44 24 f0             mov    -0x10(%rsp),%eax
  400516:       83 e8 05                sub    $0x5,%eax

Instead of using a precomputed value, GCC now takes the assembly statement into account and loads the value of a[0] from memory into the %eax register after it. In the end it just subtracts the value of the b variable. Besides the memory clobber, we may see a new constraint here: m. This constraint tells the compiler to use the memory location of a[0] as the operand instead of loading its value into a register. So, we have finished with clobbers, and now we may continue with the other constraints supported by GCC besides the r and m that we have already seen.
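The memory clobber is not the only way to get the right answer in this example. As an alternative which is not used in the text above, the element can be declared as a read-write operand with the + modifier, so the compiler knows exactly which object is changed. A minimal sketch, assuming x86_64 with GCC or Clang (inc_then_sub is a hypothetical name):

```c
/* Hypothetical helper: increments a[0] in memory and returns a[0] - b. */
int inc_then_sub(void)
{
    int a[3] = {10, 20, 30};
    int b = 5;
#if defined(__x86_64__)
    /* "+m": a[0] is both read and written by the instruction, in memory. */
    __asm__ volatile("incl %0" : "+m" (a[0]) : : "cc");
#else
    a[0]++;                         /* portable fallback */
#endif
    return a[0] - b;
}
```

The memory clobber is the heavier tool: it forces the compiler to forget everything it knows about memory, while "+m" only affects the one operand.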

Constraints

Now that we have finished with all three possible parts of an inline assembly statement, let’s return to constraints. We have already seen some constraints in this part: the r constraint, which represents a register operand, the m constraint, which represents a memory operand, and the 0-9 constraints, which represent an operand that matches the operand with the specified number in the inline assembly statement. Besides these constraints, GCC provides support for others. For example, the i constraint represents an immediate integer operand with a known value:

#include <stdio.h>

int main(void)
{
        int a = 0;

        __asm__("movl %1, %0" : "=r"(a) : "i"(100));
        printf("a = %d\n", a);
        return 0;
}

The result is:

~$ gcc test.c -o test
~$ ./test
a = 100

Or, for example, the I constraint, which on x86 represents an integer constant in the range 0...31 (suitable for 32-bit shifts). The difference between the i and I constraints is that i is more general and accepts any immediate integer, whereas I is strictly limited to that range. For example, if you try to compile the following example:

int test_asm(int nr)
{
        unsigned long a = 0;

        __asm__("movq %1, %0" : "=r"(a) : "I"(0xffffffffffff));
        return a;
}

you will get the following error:

$ gcc -O3 test.c -o test
test.c: In function ‘test_asm’:
test.c:7:9: warning: asm operand 1 probably doesn’t match constraints
         __asm__("movq %1, %0" : "=r"(a) : "I"(0xffffffffffff));
         ^
test.c:7:9: error: impossible constraint in ‘asm’

whereas:

int test_asm(int nr)
{
        unsigned long a = 0;

        __asm__("movq %1, %0" : "=r"(a) : "i"(0xffffffffffff));
        return a;
}

works perfectly:

~$ gcc -O3 test.c -o test
~$ echo $?
0
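Since the I constraint on x86 is meant for shift counts in the range 0...31, a natural use for it is an immediate shift. Here is a minimal sketch with a hypothetical times_eight function, assuming x86_64 with GCC or Clang (plain C is used elsewhere):

```c
/* Hypothetical helper: multiplies x by 8 with an immediate shift. */
unsigned int times_eight(unsigned int x)
{
#if defined(__x86_64__)
    /* "I" (3) is emitted as the immediate $3, giving: shll $3, <register> */
    __asm__("shll %1, %0" : "+r" (x) : "I" (3) : "cc");
#else
    x <<= 3;                        /* portable fallback */
#endif
    return x;
}
```

A constant outside of the 0...31 range in place of 3 would be rejected at compile time, just like in the failing example above.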

GCC also supports the J, K and N constraints for integer constants in the range 0...63 (for 64-bit shifts), signed 8-bit integer constants and unsigned 8-bit integer constants respectively. The o constraint represents a memory operand with an offsettable address, which means that adding a small offset to the address still gives a valid address. For example:

#include <stdio.h>

int main(void)
{
        static unsigned long arr[3] = {0, 1, 2};
        static unsigned long element;

        __asm__ volatile("movq 16+%1, %0" : "=r"(element) : "o"(arr));
        printf("%lu\n", element);
        return 0;
}

The result is as expected:

~$ gcc -O3 test.c -o test
~$ ./test
2

All of these constraints may be combined (of course, actually not all of them). In this case the compiler will choose the best one for the given situation. For example:

#include <stdio.h>

int a = 1;

int main(void)
{
        int b;
        __asm__ ("movl %1,%0" : "=r"(b) : "rm"(a));
        return b;
}

will use a memory operand:

0000000000400400 :
  400400:       8b 05 26 0c 20 00       mov    0x200c26(%rip),%eax        # 60102c 
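The same idea can be written as a small reusable function. In the sketch below (copy_value is a hypothetical name, and x86_64 with GCC or Clang is assumed), the combined rm constraint lets the compiler pass the input either in a register or directly as a memory operand, whichever is cheaper at the call site:

```c
/* Hypothetical helper: copies a into the result with a single movl. */
int copy_value(int a)
{
    int b;
#if defined(__x86_64__)
    /* "rm": the compiler may pick a register or a memory operand for a. */
    __asm__("movl %1, %0" : "=r" (b) : "rm" (a));
#else
    b = a;                          /* portable fallback */
#endif
    return b;
}
```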

That’s all about the commonly used constraints in inline assembly statements. You may find more in the official documentation.

Architecture specific constraints

Before this part is finished, let’s look at a set of special constraints. These constraints are architecture specific, and as this book is specific to the x86_64 architecture, we will consider the constraints related to it. First of all, the a, b, c and d constraints, and also the S and D constraints, represent general purpose registers. For instance, the a constraint corresponds to the %al, %ax, %eax or %rax register, depending on the instruction size. The S and D constraints are the %si and %di registers respectively. For example, let’s take our previous example. We may see in its assembly output that the value of the a variable ends up in the %eax register. Now let’s look at the assembly output of the same example, but with another constraint:

#include <stdio.h>

int a = 1;

int main(void)
{
        int b;
        __asm__ ("movl %1,%0" : "=r"(b) : "d"(a));
        return b;
}

Now we may see that the value of the a variable is stored in the %edx register:

0000000000400400 :
  400400:       8b 15 26 0c 20 00       mov    0x200c26(%rip),%edx        # 60102c 
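These register constraints are most useful with instructions which have fixed registers. Two short sketches, both with hypothetical names and both assuming x86_64 with GCC or Clang (plain C is used elsewhere): mul32 uses a and d for the mull instruction, which always multiplies by %eax and leaves the result in %edx:%eax, and copy_bytes uses S, D and c for rep movsb, which takes its source, destination and count from %rsi, %rdi and %rcx:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: full 64-bit product of two 32-bit values. */
uint64_t mul32(uint32_t x, uint32_t y)
{
#if defined(__x86_64__)
    uint32_t lo, hi;

    __asm__("mull %3"                   /* edx:eax = eax * y */
            : "=a" (lo), "=d" (hi)
            : "0" (x), "rm" (y)         /* "0": x starts in the register of lo */
            : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;             /* portable fallback */
#endif
}

/* Hypothetical helper: copies n bytes from src to dst. */
void copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__)
    /* All three registers are advanced by the instruction, hence "+". */
    __asm__ volatile("rep movsb"
                     : "+D" (dst), "+S" (src), "+c" (n)
                     : /* no pure inputs */
                     : "memory");       /* memory behind the pointers changes */
#else
    memcpy(dst, src, n);                /* portable fallback */
#endif
}
```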

The f and t constraints represent any floating point stack register (%st) and the top of the floating point stack respectively. The u constraint represents the second value from the top of the floating point stack.

That’s all. You may find more details about the constraints which are specific to x86_64 and other architectures in the official documentation.
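As a small illustration of the t constraint, the sketch below negates a double on the x87 stack with the fchs instruction. The negate_x87 name is hypothetical, and the code assumes x86_64 with GCC or Clang and x87 support enabled, which is the default (plain C is used elsewhere):

```c
/* Hypothetical helper: returns -x, computed on the x87 stack. */
double negate_x87(double x)
{
    double result;
#if defined(__x86_64__)
    /* "=t": the result is the top of the x87 stack;
     * "0":  the input is loaded into that same place first. */
    __asm__("fchs" : "=t" (result) : "0" (x));
#else
    result = -x;                    /* portable fallback */
#endif
    return result;
}
```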

Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/A8fS6UGMRWQ/asm.md

How I turned my Raspberry Pi into a Chromebox

TeleRead readers will recall that I posted details of a new hack for the Raspberry Pi 2 and Pi 3 for Chromebook-using teachers and students, turning the ultra-cheap minicomputer into a Chromebox. With my own Raspberry Pi 2 lying idle at home, I decided to try this myself: and here are the results.

Installation of the new OS was as easy as expected. I unpacked the downloaded file with 7-zip and wrote it onto the Raspberry’s micro SD card with Win32diskimager; with this done, I set up my Raspberry with mouse, keyboard, HDMI cable and Ethernet plugged in, then powered it up from the monitor’s USB port. It started up first time with no problems.

How well does it work though? Startup is slow, but no slower than many Linux OS versions running on the Raspberry Pi 2. The display looks fine across a big screen, and with a Bluetooth USB adapter plugged in, the device can work fine with wireless mouse and keyboard. For web browsing, document editing on Google Docs, audio playback on YouTube, email composition on Gmail, and even Facebook, it should be enough. However, video playback is clunky and often plain inadequate, and the thing often locks up on complex web pages like Twitter. The Raspberry Chromebox does record most account settings for restart, but wifi doesn’t work, at least with the USB adapters I have – a known issue that may get fixed later.

How useful is it? For general purpose computing on the Raspberry Pi 2, it’s faster and more user-friendly than the Linux versions I’ve run on this device. I plan to use it for text editing and browsing, and am already listening to audiobooks on it. It may not be good for much more, but you get what you pay for, and with the Raspberry Pi 2, you’re paying almost nothing.

The post How I turned my Raspberry Pi into a Chromebox appeared first on TeleRead News: E-books, publishing, tech and beyond.


Original URL: http://www.teleread.com/turned-raspberry-pi-chromebox/

Automatic Image Colorization of Greyscale Images Using CNN

Abstract:

We present a novel technique to automatically colorize
grayscale images that combines both global priors and local
image features. Based on Convolutional Neural Networks, our
deep network features a fusion layer that allows us to
elegantly merge local information dependent on small image
patches with global priors computed using the entire image. The
entire framework, including the global and local priors as well
as the colorization model, is trained in an end-to-end fashion.
Furthermore, our architecture can process images of any
resolution, unlike most existing approaches based on CNN. We
leverage an existing large-scale scene classification database
to train our model, exploiting the class labels of the dataset
to more efficiently and discriminatively learn the global
priors. We validate our approach with a user study and compare
against the state of the art, where we show significant
improvements. Furthermore, we demonstrate our method
extensively on many different types of images, including
black-and-white photography from over a hundred years ago, and
show realistic colorizations.

Colorization Architecture:

Our model consists of four main components: a low-level
features network, a mid-level features network, a global
features network, and a colorization network. The components
are all tightly coupled and trained in an end-to-end fashion.
The output of our model is the chrominance of the image which
is fused with the luminance to form the output image.

Publications:

Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa.
“Let there be Color!: Joint End-to-end Learning of Global and
Local Image Priors for Automatic Image Colorization with
Simultaneous Classification”.
ACM Transactions on Graphics (Proc. of SIGGRAPH 2016), to
appear.
@Article{IizukaSIGGRAPH2016,
  author = {Satoshi Iizuka and Edgar Simo-Serra and Hiroshi Ishikawa},
  title = {{Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification}},
  journal = "ACM Transactions on Graphics (Proc. of SIGGRAPH 2016)",
  year = 2016,
  volume = 35,
  number = 4,
}
This work was partially supported by JST CREST.


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/acZAOesve3M/

Cisco Finds Backdoor Installed on 12M PCs

UPDATED. Cisco’s Talos security intelligence and research group has come across a piece of software that installed backdoors on 12 million computers around the world.

The software, which exhibits adware and spyware capabilities, was developed by a French online advertising company called Tuto4PC. The firm, previously known as Eorezo Group and apparently linked to another company called Wizzlabs, has been targeted by French authorities over its questionable practices regarding the installation of unwanted software and harvesting of users’ personal details.

Cisco started analyzing Tuto4PC’s OneSoftPerDay application after its systems detected an increase in “Generic Trojans” (i.e. threats not associated with any known family). An investigation uncovered roughly 7,000 unique samples with names containing the string “Wizz,” including “Wizzupdater.exe,” “Wizzremote.exe” and “WizzInstaller.exe.” The string also showed up in some of the domains the samples had been communicating with.

Researchers determined that the application, installed with administrator rights, was capable not only of downloading and installing other software, such as a known scareware called System Healer, but also of harvesting personal information. Furthermore, experts found that the software is designed to detect the presence of sandboxes, antiviruses, security tools, forensic software and remote access doors.

These “features” have led Cisco Talos to classify the Tuto4PC software as a “full backdoor capable of a multitude of undesirable functions on the victim machine.”

According to Tuto4PC’s website, the company offers hundreds of tutorials that users can access for free by installing a piece of software that displays ads. However, based on Cisco’s research, it appears the company is doing more than just displaying ads.

Tuto4PC said its network consisted of nearly 12 million PCs in 2014, which could explain why Cisco’s systems detected the backdoor on 12 million devices. An analysis of a sample set revealed infections in the United States, Australia, Japan, Spain, the UK, France and New Zealand.

“Based on the overall research, we feel that there is an obvious case for this software to be classified as a backdoor. At minimum it is a potentially unwanted program (PUP). There is a very good argument that it meets and exceeds the definition of a backdoor,” Cisco Talos researchers said in a blog post.

“The creation of a legitimate business, multiple subsidiaries, domains, software and being a publicly listed company do not stop this adware juggernaut from slowing down their attempts to push their backdoors out to the public,” they added.

In response to Cisco’s blog post, Tuto4PC Group CEO Franck Rosset clarified that its antivirus bypass technology is not used for malicious purposes; he says it’s designed to make it easier for users to install its applications, which have been blocked by antiviruses. The company has provided the following statement to SecurityWeek:

“The Talos blogpost is inaccurate in describing Tuto4PC as a shady malware distribution enterprise. We are currently working with our lawyers in order to evaluate the action we can take against Talos’ inexact (negative) presentation of our business.

We are a listed company on the French stock exchange. Since 2004, our business model is to create widgets, tutorials etc. for free download on download websites. The download of our programs is for free subject to agreement for accepting advertising from an adware attached in the download.

Contrary to Talos’ wrongful allegations, our business has been approved by French regulators and we have never been indicted or sued for any malware distribution!!!!

We have a technology subsidiary (Cloud 4PC) with some developments in cybersecurity. Due to some undue blocking by antiviruses that recently blocked Tuto4PC adware (some of them have also an adware business model), we are using a bypass technology so that people can easily download our programs (and adware). Although the bypass software is extremely efficient, it has no other purpose or use that helping the Tuto4PC adware download.

There is no malware activity and Talos cannot prove or show any malware use of the program — with more than 10 million installed, if there was to be any malware activity, obviously there should be some user complaints.

As you can see, we are a French company — very easy to reach, we are not hiding in some rogue country — we do not understand why Talos has not contacted us prior to their post.

In any case, our subsidiary Cloud 4PC is going to launch soon “AV Booster,” an antivirus booster that will help stop any real malware that use bypass techniques like the ones we developed.”

*Updated with statement from Tuto4PC

Previous Columns by Eduard Kovacs:


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/VwN6UY_xwwc/cisco-finds-backdoor-installed-12-million-pcs
