Understand EVM bytecode – Part 1

If you have started reading this article, I guess you already know what EVM stands for, so I won’t spend much time on the background of Ethereum. If you need some basics, please search for “Ethereum Virtual Machine”. The main goal of this series of articles is to help you understand EVM bytecode thoroughly, in case you are involved in bytecode-level contract auditing or in developing an EVM bytecode decompiler.

Now let’s start with the very basics of EVM bytecode. The EVM is a stack-based virtual machine. If you have experience with similar VMs (such as the Java VM or the .NET CLR), you won’t have much difficulty understanding the basic idea. EVM bytecode is the machine language of this VM, and like low-level machine code it is not meant for humans to read. It is compiled from high-level languages that target the EVM, the most popular of which is currently Solidity. To understand EVM bytecode better, I will use a lot of simple Solidity samples as demos. So let’s start with our very first example:

pragma solidity 0.4.25;

contract Demo1 {
    uint public balance;

    function add(uint value) public returns (uint256) {
        balance = balance + value;
        return balance;
    }
}

You may ask why I didn’t use the usual HelloWorld as the first example. A HelloWorld example typically uses a string, and at the EVM bytecode level a string is a dynamically sized variable, which we will cover in a separate article. So let’s start with a simple add operation for the very first demo.

To compile this Solidity program, we need a compiler. I highly recommend Remix for this job: it is not just an online compiler, but also offers many other useful features. Visit the following link to start using it:

https://remix.ethereum.org

The main Remix GUI is shown in the following screenshot:

The portal is straightforward to use. In the right-hand column of the page there are tabs you can select depending on what you want to do.

After adding a new file demo1.sol in the Remix portal, choose the right compiler version in the “Compile” tab. Here we are using “0.4.25”. When compilation finishes without errors, click “Details” and copy the EVM bytecode from the value of “object” in the “BYTECODE” section of the pop-up window.

The complete bytecode string is:

608060405234801561001057600080fd5b5060c78061001f6000396000f3
0060806040526004361060485763ffffffff7c0100000000000000000000
0000000000000000000000000000000000006000350416631003e2d28114
604d578063b69ef8a8146074575b600080fd5b348015605857600080fd5b
5060626004356086565b60408051918252519081900360200190f35b3480
15607f57600080fd5b5060626095565b6000805482019081905591905056
5b600054815600a165627a7a7230582063aa00920d824233ab5307ef3a37
9c757bdbee62fe00fe36a5d852c766e58fef0029

At first glance, this string may look completely opaque. But don’t worry: we will go through the whole binary string to understand it inside and out.

If you look at the string closely, you will notice that it is a hexadecimal encoding of binary data. Real EVM bytecode is raw binary, but it is almost always displayed in hex for readability, with every two hex characters representing one byte. To understand every byte in the string, we first need some basics of EVM opcodes.

An opcode is a single EVM instruction, and every opcode is encoded as one byte (an 8-bit unsigned integer). For example, 0x00 means STOP and 0x01 means ADD. For the meaning of every opcode, please refer to the Ethereum Yellow Paper at:

https://ethereum.github.io/yellowpaper/paper.pdf

For now, we won’t go through the meaning of every opcode. We only need the basics, and we will explain each new opcode as we encounter it. So let’s start with the first part of the EVM bytecode we got from Remix:

6080604052

If we map each opcode to a readable instruction, we get the following code:

00: 6080 PUSH1 0x80
02: 6040 PUSH1 0x40
04: 52   MSTORE

The snippet above contains two opcodes, PUSH1 and MSTORE. PUSH1 pushes a 1-byte value onto the stack for later use. There are also PUSH2, PUSH3 … up to PUSH32, which push immediates of 2 to 32 bytes; every item on the stack is a 32-byte (256-bit) word. The PUSH family are the only opcodes that carry operands in the bytecode itself; all other opcodes take their inputs from the stack. In this example, the two PUSH1 instructions push 0x80 and then 0x40, leaving 0x40 on top of the stack. MSTORE then pops the top item as the memory offset and the next one as the value to write. So the snippet is equivalent to the following EVM assembly:

mstore(0x40,0x80)

MSTORE pops both of these items off the stack. Most opcodes push their result onto the stack for later use, but MSTORE has no return value, so it pushes nothing.
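The decoding rule above (one byte per opcode, with PUSH1 to PUSH32 followed by 1 to 32 immediate bytes) is simple enough to automate. Below is a minimal Python sketch of a linear-sweep disassembler. It assumes a hand-picked opcode table covering only the instructions that appear in this article plus the numbered PUSH, DUP and SWAP families; any other byte is printed as UNKNOWN. It is an illustration, not a complete disassembler, and a linear sweep will also "decode" non-code data such as the trailing metadata.

```python
# Hand-picked subset of the EVM opcode table (see the Yellow Paper for the full list).
BASE_OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x03: "SUB", 0x04: "DIV",
    0x10: "LT", 0x14: "EQ", 0x15: "ISZERO", 0x16: "AND",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE", 0x39: "CODECOPY",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF3: "RETURN", 0xFD: "REVERT",
}


def opcode_name(op):
    if 0x60 <= op <= 0x7F:          # PUSH1 (0x60) .. PUSH32 (0x7F)
        return f"PUSH{op - 0x5F}"
    if 0x80 <= op <= 0x8F:          # DUP1 (0x80) .. DUP16 (0x8F)
        return f"DUP{op - 0x7F}"
    if 0x90 <= op <= 0x9F:          # SWAP1 (0x90) .. SWAP16 (0x9F)
        return f"SWAP{op - 0x8F}"
    return BASE_OPCODES.get(op, f"UNKNOWN_0x{op:02x}")


def disassemble(hex_code):
    """Linear sweep: returns a list of (offset, mnemonic, immediate-or-None) tuples."""
    code = bytes.fromhex(hex_code)
    pc, out = 0, []
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            size = op - 0x5F                          # PUSHn carries n immediate bytes
            imm = code[pc + 1:pc + 1 + size]
            out.append((pc, opcode_name(op), "0x" + imm.hex()))
            pc += 1 + size
        else:
            out.append((pc, opcode_name(op), None))   # every other opcode is one byte
            pc += 1
    return out


for pc, name, imm in disassemble("6080604052"):
    print(f"{pc:02X}: {name} {imm or ''}".rstrip())
# 00: PUSH1 0x80
# 02: PUSH1 0x40
# 04: MSTORE
```

Feeding it the first 31 bytes of the Remix output reproduces the listing for offsets 0x00 to 0x1E discussed next.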

Continuing this way through the whole bytecode Remix returned, you get the full list of opcodes. But before we explore more opcodes, let’s talk about two more concepts in the EVM environment: memory and storage.

Memory is a readable and writable byte array, used for example to prepare data for hash calculation and to pass data to and from external calls and returns. Like the stack, memory starts out empty (all zeros) whenever the EVM starts executing a call. Unlike the stack, memory is accessed by address. In the earlier example, MSTORE writes the value 0x80 as a 32-byte word at address 0x40. You might wonder what this is for. By Solidity convention, memory address 0x40 holds the “free memory pointer”: when compiled code needs some memory, it reads the free memory pointer from 0x40, and after claiming that memory it must advance the pointer so that later operations do not overwrite the same region. The pointer starts at 0x80 because Solidity reserves the first 128 bytes (0x00-0x7f) for scratch space, the free memory pointer itself, and a zero slot.
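To make the memory model concrete, here is a minimal Python sketch, assuming a simplified memory that only tracks bytes (gas costs are ignored). It shows that MSTORE writes a full 32-byte big-endian word, so 0x80 lands in the last byte of the range 0x40-0x5F, and how compiled code bumps the free memory pointer. The `Memory` class and the `allocate` helper are hypothetical names for illustration, not part of any EVM library.

```python
WORD = 32            # EVM words are 32 bytes (256 bits)
FREE_MEM_PTR = 0x40  # Solidity's slot for the free memory pointer


class Memory:
    def __init__(self):
        self.data = bytearray()   # starts empty: every unwritten byte reads as zero

    def _expand(self, end):
        # Grow to cover `end`, rounded up to a whole number of 32-byte words.
        if end > len(self.data):
            new_size = -(-end // WORD) * WORD
            self.data.extend(bytes(new_size - len(self.data)))

    def mstore(self, offset, value):
        self._expand(offset + WORD)
        self.data[offset:offset + WORD] = value.to_bytes(WORD, "big")

    def mload(self, offset):
        self._expand(offset + WORD)
        return int.from_bytes(self.data[offset:offset + WORD], "big")


def allocate(mem, size):
    """Hypothetical helper: hand out `size` bytes and advance the free memory pointer."""
    ptr = mem.mload(FREE_MEM_PTR)
    mem.mstore(FREE_MEM_PTR, ptr + size)
    return ptr


mem = Memory()
mem.mstore(FREE_MEM_PTR, 0x80)         # the effect of 6080604052
print(mem.data[0x40:0x60].hex())       # 31 zero bytes, then 80
first = allocate(mem, 0x20)            # gets 0x80
second = allocate(mem, 0x20)           # gets 0xa0, so it cannot overlap `first`
print(hex(first), hex(second), hex(mem.mload(FREE_MEM_PTR)))  # 0x80 0xa0 0xc0
```

If `allocate` forgot to write the new pointer back, both calls would return 0x80 and the second buffer would overwrite the first, which is exactly the problem the free memory pointer prevents.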

Unlike memory and the stack, storage holds the contract’s state, so it is not reset each time the EVM starts executing. You can think of storage as a dictionary (hash table) that maps 32-byte keys to 32-byte values. Every change to storage is recorded in the world state of Ethereum. The storage-related opcodes are SLOAD and SSTORE. We will talk more about storage variables when analyzing more complicated structures such as mappings and arrays.
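As a rough model of storage, the Python sketch below treats it as a dictionary whose missing keys read as zero, and uses it to mimic Demo1’s `add` function. It assumes `balance` lives in slot 0 because it is the first state variable (the Demo2 disassembly later in this article writes to storage slot 0x0 as well). `Storage` and `demo1_add` are hypothetical names. Because EVM ADD wraps modulo 2**256 and Solidity 0.4.x adds no overflow check, the sketch wraps the sum too.

```python
UINT256_MOD = 2 ** 256
BALANCE_SLOT = 0  # assumption: `balance` is Demo1's first state variable, so it occupies slot 0


class Storage:
    """Persistent key-value store: 256-bit slot -> 256-bit value; unset slots read as 0."""

    def __init__(self):
        self.slots = {}

    def sload(self, key):
        return self.slots.get(key, 0)

    def sstore(self, key, value):
        self.slots[key] = value


def demo1_add(storage, value):
    """Mimics Demo1.add: balance = balance + value; return balance."""
    # EVM ADD wraps modulo 2**256, and Solidity 0.4.x adds no overflow check on top.
    new_balance = (storage.sload(BALANCE_SLOT) + value) % UINT256_MOD
    storage.sstore(BALANCE_SLOT, new_balance)
    return storage.sload(BALANCE_SLOT)


state = Storage()  # persists across "transactions", unlike the stack and memory
print(demo1_add(state, 5), demo1_add(state, 7))   # 5 12

overflow = Storage()
overflow.sstore(BALANCE_SLOT, UINT256_MOD - 1)
print(demo1_add(overflow, 1))                     # 0: the addition wrapped around
```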

With this background, let’s continue with the bytecode string.

05: 34     CALLVALUE
06: 80     DUP1
07: 15     ISZERO
08: 610010 PUSH2 0x0010
0B: 57     JUMPI
0C: 6000   PUSH1 0x00
0E: 80     DUP1
0F: FD     REVERT
10: 5B     JUMPDEST
11: 50     POP
12: 60C7   PUSH1 0xc7
14: 80     DUP1
15: 61001F PUSH2 0x001f
18: 6000   PUSH1 0x00
1A: 39     CODECOPY
1B: 6000   PUSH1 0x00
1D: F3     RETURN
1E: 00     STOP

This code snippet is a bit long, but don’t worry; let’s go through it step by step. CALLVALUE pushes msg.value onto the stack, DUP1 duplicates it, and ISZERO pops the copy and pushes 1 (true) if it is 0, or 0 otherwise. The next PUSH2 pushes the code address 0x0010 as the jump target for JUMPI. JUMPI is a conditional jump that pops two items from the stack: the jump destination (on top) and the condition. If the condition, here ISZERO(msg.value), is true, execution jumps to 0x0010; otherwise it falls through to 0x0C-0x0F and ends with REVERT(0,0). At 0x10, JUMPDEST marks a valid jump target and POP discards the leftover copy of msg.value. So the bytecode from 0x05 to 0x0F is equivalent to the following Solidity code:

if(msg.value != 0) revert();

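We don’t see this line in our original Solidity code because the compiler injected it. Demo1 has no explicit constructor, and the implicit constructor is not payable, so the creation code rejects any deployment that sends Ether. The compiler inserts the same check at the start of every non-payable function.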
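The following Python sketch replays the guard at 0x05-0x11 with an explicit list as the stack, which makes the pop order of JUMPI (destination first, then condition) visible. It is a hand-written trace of this one snippet under the assumption that the stack is empty at 0x05, not a general interpreter; `callvalue_guard` is a hypothetical name.

```python
def callvalue_guard(callvalue):
    """Trace of creation-code bytes 0x05-0x11; the end of the list is the stack top."""
    stack = [callvalue]                      # 0x05 CALLVALUE: push msg.value
    stack.append(stack[-1])                  # 0x06 DUP1: copy it
    stack.append(int(stack.pop() == 0))      # 0x07 ISZERO: 1 if msg.value == 0, else 0
    stack.append(0x0010)                     # 0x08 PUSH2 0x0010: jump target
    dest, cond = stack.pop(), stack.pop()    # 0x0B JUMPI: pops destination first, then condition
    if not cond:
        return "REVERT(0, 0)"                # 0x0C-0x0F: PUSH1 0x00, DUP1, REVERT
    stack.pop()                              # 0x10 JUMPDEST, then 0x11 POP drops the msg.value copy
    return f"jump to 0x{dest:02X}, stack now {stack}"


print(callvalue_guard(0))        # jump to 0x10, stack now []
print(callvalue_guard(10**18))   # REVERT(0, 0)
```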

Moving on to the rest of the bytecode: if you track the stack by hand, you will see the call CODECOPY(0x00, 0x1F, 0xC7). It copies 0xC7 bytes of code, starting at code offset 0x1F, into memory range [0x00, 0xC7). The second copy of 0xC7 left on the stack by DUP1 then becomes the size argument of RETURN(0x00, 0xC7), which hands the copied data back to the EVM. By now you might have guessed what this piece of bytecode is for.

So the bytecode produced by the Remix compiler has multiple parts. Bytes 0x00-0x1E are the creation part of the contract. This code runs only once, during contract creation: it executes the constructor logic (if any) and returns the runtime part of the code to the EVM, which stores it as the new contract’s code. After the contract account is created, the runtime code from 0x1F up to 0x1F+0xC7 = 0xE6 is what runs for every later transaction on the contract, and the constructor is never called again. Notice also that the creation code contains no JUMP or JUMPI that transfers execution into the runtime part; it only copies that part and returns it.
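The split can be checked mechanically. This Python sketch assumes the Demo1 hex string above is copied exactly; it reads the PUSH operands 0xC7 (at 0x13) and 0x001F (at 0x16-0x17) out of the creation code itself and slices the bytecode into its creation and runtime parts.

```python
DEMO1_HEX = (
    "608060405234801561001057600080fd5b5060c78061001f6000396000f3"
    "0060806040526004361060485763ffffffff7c0100000000000000000000"
    "0000000000000000000000000000000000006000350416631003e2d28114"
    "604d578063b69ef8a8146074575b600080fd5b348015605857600080fd5b"
    "5060626004356086565b60408051918252519081900360200190f35b3480"
    "15607f57600080fd5b5060626095565b6000805482019081905591905056"
    "5b600054815600a165627a7a7230582063aa00920d824233ab5307ef3a379"
    "c757bdbee62fe00fe36a5d852c766e58fef0029"
)
code = bytes.fromhex(DEMO1_HEX)

# The CODECOPY arguments, read straight out of the creation code:
runtime_size = code[0x13]                                # immediate of PUSH1 0xc7 at 0x12
runtime_offset = int.from_bytes(code[0x16:0x18], "big")  # immediate of PUSH2 0x001f at 0x15

creation = code[:runtime_offset]                         # runs once, at deployment
runtime = code[runtime_offset:runtime_offset + runtime_size]  # stored as the contract's code

print(hex(len(code)), hex(len(creation)), hex(len(runtime)))  # 0xe6 0x1f 0xc7
print(runtime[:5].hex())                                      # 6080604052
```

The runtime part starts with its own 6080604052 prologue, since it runs in a fresh memory on every call. Its last bytes are a metadata trailer appended by the compiler rather than executable code.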

To confirm this guess, let’s write another Solidity contract with a constructor:

pragma solidity 0.4.25;

contract Demo2 {
    uint public balance;

    function add(uint value) public returns (uint256) {
        balance = balance + value;
        return balance;
    }

    constructor (uint value) public {
        balance = value;
    }
}

After compiling it with Remix, we get the following creation part of the bytecode:

608060405234801561001057600080fd5b506040516020806100fa833981016040
525160005560c7806100336000396000f300

As expected, the code is longer than the previous one because we defined a constructor. Let’s disassemble it into more readable instructions:

0000 60 PUSH1 0x80
0002 60 PUSH1 0x40
0004 52 MSTORE
0005 34 CALLVALUE
0006 80 DUP1
0007 15 ISZERO
0008 61 PUSH2 0x0010
000B 57 JUMPI
000C 60 PUSH1 0x00
000E 80 DUP1
000F FD REVERT
0010 5B JUMPDEST
0011 50 POP
0012 60 PUSH1 0x40
0014 51 MLOAD
0015 60 PUSH1 0x20
0017 80 DUP1
0018 61 PUSH2 0x00fa
001B 83 DUP4
001C 39 CODECOPY
001D 81 DUP2
001E 01 ADD
001F 60 PUSH1 0x40
0021 52 MSTORE
0022 51 MLOAD
0023 60 PUSH1 0x00
0025 55 SSTORE

0026 60 PUSH1 0xc7
0028 80 DUP1
0029 61 PUSH2 0x0033
002C 60 PUSH1 0x00
002E 39 CODECOPY
002F 60 PUSH1 0x00
0031 F3 RETURN
0032 00 STOP

The beginning and the end look similar to the previous example, but the code between 0x12 and 0x25 is new, so let’s focus on it. First, the instructions at 0x12 and 0x14 call MLOAD(0x40) to read memory at address 0x40. As explained earlier, address 0x40 holds the free memory pointer, which is 0x80 at this point. After the following PUSH and DUP instructions, the stack is [… 0x20, 0x00FA, 0x80] (top on the right) just before CODECOPY, so the code calls CODECOPY(0x80, 0x00FA, 0x20). This step did not appear in the previous demo, so it must be related to the constructor we added. It copies the last 32 bytes of the code into free memory: offset 0xFA is exactly the 0x33 bytes of creation code plus the 0xC7 bytes of runtime code. These 32 bytes are most likely the parameter passed to the constructor at deployment. Let’s keep going through the rest of the bytecode.

The instructions at 0x1D-0x21 add 0x20 to the current free memory pointer 0x80 and save the result back to address 0x40 with MSTORE(0x40, 0x80+0x20), so the pointer now reads 0xA0.

Then the instruction at 0x22 pops the remaining 0x80 and pushes MLOAD(0x80), which is the 32-byte value just copied from the code. The instructions at 0x23 and 0x25 save this value to storage slot 0x0 with SSTORE(0x0, MLOAD(0x80)). In summary, the instructions between 0x12 and 0x25 do roughly the following:

codecopy(0x80, 0xfa, 0x20)
mstore(0x40, add(0x80, 0x20))
sstore(0x0, mload(0x80))

In other words, when a new contract is deployed, the constructor parameters are ABI-encoded and appended to the end of the EVM bytecode in the transaction’s data payload. During creation, the constructor then reads them with CODECOPY.
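To check this reading of the constructor, the Python sketch below builds a deployment payload and runs the Demo2 creation code with a tiny interpreter that supports only the opcodes used up to 0x25. Several things are assumptions: Demo2’s runtime code is replaced by 0xC7 zero bytes (the article does not list it, and only its length matters for the offsets), the constructor argument 12345 is arbitrary, and gas, memory word rounding and JUMPDEST validation are ignored. `run` is a hypothetical helper, not an existing library API.

```python
UINT256 = 2 ** 256


def run(code, stop_at, callvalue=0):
    """Execute `code` from pc=0 until pc reaches `stop_at`; returns (stack, memory, storage)."""
    stack, memory, storage = [], bytearray(), {}

    def expand(end):
        # Zero-fill memory up to `end` (word rounding and gas are ignored here).
        if end > len(memory):
            memory.extend(bytes(end - len(memory)))

    pc = 0
    while pc < stop_at:
        op = code[pc]
        if 0x60 <= op <= 0x7F:                   # PUSH1..PUSH32: immediate follows the opcode
            n = op - 0x5F
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n], "big"))
            pc += 1 + n
            continue
        if 0x80 <= op <= 0x8F:                   # DUP1..DUP16
            stack.append(stack[-(op - 0x7F)])
        elif op == 0x01:                         # ADD (wraps modulo 2**256)
            stack.append((stack.pop() + stack.pop()) % UINT256)
        elif op == 0x15:                         # ISZERO
            stack.append(int(stack.pop() == 0))
        elif op == 0x34:                         # CALLVALUE
            stack.append(callvalue)
        elif op == 0x39:                         # CODECOPY(destOffset, offset, size)
            dest, off, size = stack.pop(), stack.pop(), stack.pop()
            expand(dest + size)
            memory[dest:dest + size] = code[off:off + size].ljust(size, b"\x00")
        elif op == 0x50:                         # POP
            stack.pop()
        elif op == 0x51:                         # MLOAD(offset)
            off = stack.pop()
            expand(off + 32)
            stack.append(int.from_bytes(memory[off:off + 32], "big"))
        elif op == 0x52:                         # MSTORE(offset, value)
            off, value = stack.pop(), stack.pop()
            expand(off + 32)
            memory[off:off + 32] = value.to_bytes(32, "big")
        elif op == 0x55:                         # SSTORE(key, value)
            key, value = stack.pop(), stack.pop()
            storage[key] = value
        elif op == 0x57:                         # JUMPI(dest, cond)
            dest, cond = stack.pop(), stack.pop()
            if cond:
                pc = dest                        # a real EVM also checks dest is a JUMPDEST
                continue
        elif op == 0x5B:                         # JUMPDEST: no-op marker
            pass
        elif op == 0xFD:                         # REVERT
            raise RuntimeError(f"REVERT at pc=0x{pc:02x}")
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} at pc=0x{pc:02x}")
        pc += 1
    return stack, memory, storage


CREATION_HEX = ("608060405234801561001057600080fd5b506040516020806100fa833981016040"
                "525160005560c7806100336000396000f300")
creation = bytes.fromhex(CREATION_HEX)
runtime_stub = bytes(0xC7)   # placeholder for Demo2's runtime code (same length, dummy content)
ctor_arg = 12345             # hypothetical constructor argument, ABI-encoded as one 32-byte word
payload = creation + runtime_stub + ctor_arg.to_bytes(32, "big")

stack, memory, storage = run(payload, stop_at=0x26)   # stop just before the runtime copy
print(storage)                                        # {0: 12345}
print(hex(int.from_bytes(memory[0x40:0x60], "big")))  # 0xa0
```

Execution stops at 0x26, just before the code copies and returns the runtime part. Passing `callvalue=1` instead makes JUMPI fall through to the REVERT at 0x0F, matching the non-payable constructor check.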

So far, we have covered the basics of EVM bytecode, including the three kinds of data areas in the EVM (stack, memory and storage), some common opcodes involved in smart contract creation, how constructor parameters are passed, and the structure of compiled EVM bytecode. In the next part, we will talk about the runtime part of the bytecode.

Understand EVM bytecode – Part 2

P.S. We have made our online EVM decompiler publicly available. Please feel free to use it; any comments are welcome.

https://www.trustlook.com/products/smartcontractguardian

Trustlook Provides Protection Services to Amber Mobile

San Jose, Calif., Dec. 14, 2018, Trustlook, the global leader of AI-powered cybersecurity, today announced a partnership with Amber Mobile, developer of a global weather forecasting application. Trustlook will provide its extensive portfolio of security products to Amber, allowing Amber to create a safer internet experience for its users.

Trustlook is the global leader in next-generation cybersecurity products which focus on advanced zero-day prevention. Over the years, Trustlook has partnered with industry-leading enterprises such as Huawei, Amazon, and Qualcomm. Its AI-based mobile security engine boasts a malware detection rate exceeding 98.0 percent and is currently disrupting an industry full of traditional cybersecurity vendors.

Amber Mobile is a mobile application developer. Its most popular offering is the Amber Weather application, which provides users with global weather forecasting. Amber Weather is available in over 30 languages and has been downloaded over 10 million times. Amber also provides its weather forecasting data API to other applications, serving more than 1 million calls daily.

The founder of Amber Mobile, Rui Song, recognizing the grave cybersecurity threats facing organizations today, said, “User security issues are growing more and more important as we continue to gain corporate and individual customers worldwide. Trustlook’s reliable cybersecurity technology can solve this problem and build a safer internet for Amber Mobile’s users.”

“As malware attacks are ever-growing, we are glad to see that Amber Mobile has paid attention to the current cybersecurity climate, and we believe that our technology is up to the task of defending Amber’s users,” said Trustlook CEO Allan Zhang.

Through the partnership, Amber Mobile will provide a safer internet environment for its users. Additionally, by collecting malware-related data from Amber Mobile, Trustlook will be able to further enhance its already formidable AI technology.

About Trustlook

Trustlook was founded in 2013 with the goal of providing security solutions that go beyond the existing tools available today by detecting and addressing zero-day vulnerabilities and advanced malware. Their innovative SECUREai engine delivers the performance and scalability needed to provide total threat protection against malware and other forms of attack. Trustlook’s solutions protect mobile devices, network appliances and the IoT. The company is managed by leading security experts from Palo Alto Networks, FireEye, Google and Yahoo.

Trustlook Offers Zero-Day Protection Services to NewBornTown

San Jose, Calif., Dec. 12, 2018, Trustlook, the global leader of AI-powered cybersecurity, today announced a partnership with global AI service provider NewBornTown. Trustlook will provide its extensive portfolio of security products to NewBornTown, allowing NewBornTown to create a safer internet experience for its users.

Trustlook provides cybersecurity support for over 150 million mobile devices worldwide, most of them from ubiquitous brands such as Huawei and Oppo. Having such a widespread presence provides Trustlook a global perspective on the state of mobile security.

NewBornTown is a global AI service provider. In 2013, NewBornTown released the Solo application launcher and was awarded Top Developer and Best App on Google Play. In the past 5 years, NewBornTown has also released a series of mobile apps built upon its SoloAware AI engine. Boasting over 600 million users worldwide, NewBornTown provides apps in categories including entertainment, fitness, beauty, photography, leisure, and gaming.

“NewBornTown’s products cover several categories, but the common mission of those products is to protect users’ internet security,” said Trustlook CEO Allan Zhang. “We are happy that NewBornTown will let Trustlook guard their 600 million users, and Trustlook can definitely cope with the diverse usage scenarios and protection demands with our advanced capabilities in zero-day attack detection and protection.”

Through the partnership, NewBornTown will provide a safer internet environment to its users. In turn, Trustlook will be able to further develop its AI technology using the most up-to-date data, further enhancing its already formidable performance.


Do You Know Where the Internet is Most Dangerous?

Trustlook, the global leader of AI-powered cybersecurity, has published an internet security map based on the data it collected.

Trustlook provides cybersecurity support for over 150 million mobile devices worldwide, most of them from ubiquitous brands such as Huawei and Oppo. Having such a widespread presence provides Trustlook a global perspective on the state of mobile security.

Based on data collected during September 2018, Trustlook has discovered that China has the largest quantity of malware in the world, and that regions such as Africa and Oceania have the highest mobile infection rates.

China has the largest quantity of mobile malware in the world.

Trustlook collects mobile security data while protecting user devices, scanning phones and IoT devices for malicious applications and files. For data collection, different applications count as different samples, but the same application found on different devices counts as a single sample; a region’s “malware count” is therefore the number of unique malware samples found there.

According to the data, China’s malware count is the highest, followed by the United States, Canada, Indonesia, and Brazil. In the following table, countries and regions are sorted by their malware counts.

The most obvious caveat is that these regions may have higher counts simply because they have more users and applications. Markets such as China and the United States are much larger than the other sampled regions, which motivates more malware diversity and development.

Therefore, the raw malware count of each region is not as meaningful as it first appears. A deeper analysis of the data was required to see whether the regions with the top malware counts are actually as dangerous as they seem.

Africa and Oceania have the highest malware concentration.

When discussing whether a region’s internet is safe, it makes more sense to measure the ratio of malware count to data samples rather than the raw malware count. This better quantifies the malware concentration of a particular region.

According to Trustlook’s analyses, the malware-to-sample ratio is highest in the Solomon Islands, followed by Palau, Haiti, and Burundi.

Surprisingly, China, which has the highest malware count, isn’t even in the top 30 by this metric. The above table also shows that there are no North American or European countries in the top 30, and only one Asian country, Afghanistan.

Beijing, Chengdu and Guangzhou cultivate the most malware samples

There are big differences in malware counts between cities in China. Beijing, Chengdu, Guangzhou, and Shanghai lead the pack in the amount of malware found on their residents’ mobile devices.

Unlike the wider internet, which is divided by different languages and cultures, a country’s internet has no internal boundaries, so it is hard to say that these cities themselves are more dangerous. Their higher malware counts may instead come from common behaviors among their residents, which point to typical groups of users that both users and developers should pay attention to.

Trustlook’s mission is to defend every mobile device and everyone’s cybersecurity.

PolySwarm Marketplace Partners With Trustlook to Offer New Zero-Day Protection Services

San Jose, Calif., Nov. 28, 2018, Trustlook, the global leader of AI-powered cybersecurity, today announced a partnership with decentralized threat intelligence marketplace PolySwarm. Trustlook will provide additional security services to PolySwarm’s platform, which will strengthen its ability to detect and prevent zero-day attacks.

PolySwarm is a decentralized security marketplace that provides tools and services experts use to tailor-make anti-malware engines. PolySwarm incentivizes a global community of information security experts to disrupt the $8.5 billion cyber threat intelligence industry, providing enterprises and consumers with unprecedented speed and accuracy in threat detection.

Trustlook is the global leader in next-generation cybersecurity products which focus on advanced zero-day prevention. Over the years, Trustlook has partnered with first-tier enterprises such as Huawei, Amazon and Qualcomm. Its AI-based mobile security engine boasts a malware detection rate of over 98.0 percent.

“As malware attacks are ever-growing, PolySwarm’s decentralized platform demonstrates a new way to protect the internet,” CEO of Trustlook Allan Zhang said, “Trustlook is happy to support PolySwarm’s growth with our advanced capabilities in zero-day attack detection and protection.”

By joining the PolySwarm platform, Trustlook will be able to train AI models using the most up-to-date attack behavior, further enhancing its already formidable performance. In turn, PolySwarm will gain the capabilities and expertise of a reputable and battle-proven vendor like Trustlook.

“We are very excited to have Trustlook join the growing network of PolySwarm’s micro-engines,” said Steve Bassi, PolySwarm CEO. “With a continuous stream of high-powered security engines joining the PolySwarm network, our ability to combat threats and ensure enterprises are properly fortified against evolving malware keeps getting stronger.”


About PolySwarm
PolySwarm is the first decentralized marketplace allowing security experts to build anti-malware engines that compete to protect consumers, providing enterprises and consumers with unprecedented speed and accuracy in threat detection. The PolySwarm market runs on Nectar (NCT), an ERC20-compatible utility token. For more information, please visit PolySwarm.io.


Trustlook Announces New Security Solution For Zero-Day Attacks

San Jose, Calif., Nov. 12, 2018, Trustlook, the global leader of AI-powered cybersecurity, today announced the release of Revere, a new kernel-level security solution which provides efficient and reliable security protection for Internet of Things (IoT) devices.

Today’s IoT devices, such as smart door locks, webcams, smart speakers, drones, and cars running Linux or Android, are vulnerable to zero-day attacks that let hackers easily compromise users’ privacy and even their physical safety.

Current evidence shows that the number of attacks on IoT devices is growing rapidly. According to a Kaspersky Lab IoT report, the number of IoT malware detections in the first half of 2018 was more than triple the number seen in all of 2017, and 2017 in turn saw ten times more than 2016. A recent F5 Networks report suggests that IoT devices have become the number one attack target on the Internet, receiving more attacks than web and application servers, email servers, and databases combined.

The most reliable security solutions are built into the operating system. “Trustlook has discovered in practice that putting the security module in the kernel is faster and more responsive than not using the kernel. It is difficult to hide things from the kernel,” said Trustlook CEO Allan Zhang.

The new Revere solution can protect the system from the foundational layer: when a program makes a system call to the kernel, the Revere module can collect the program’s behavior data. Using this newly collected data, a built-in AI model, well trained on a large number of data samples, makes accurate predictions about various types of abnormal behavior, such as privilege escalation, malware downloads, DoS/DDoS network attacks, brute-force password cracking, system file tampering, and private data theft, thereby preventing various types of zero-day attacks.

Key benefits of the new Revere solution include:

  • Secure and fast: Revere is more secure and responds faster than traditional security engines, especially for time-sensitive applications such as smart speakers that contain sensitive data or cars that involve personal safety.
  • Compatible: Revere works on most Linux-based IoT devices because its security checks are performed in the kernel.
  • Intelligent: Trustlook Security Lab collects all types of IoT device attack behavior data to train AI models and upgrade remotely to maintain its predictive protection against the latest attacks. Revere’s zero-day attack detection and prevention is beyond the capability of most traditional signature-based security engines.
  • Efficient: Revere’s on-device detection model consumes a relatively small amount of resources and delivers stable performance. For example, on an IP camera running embedded Linux, Revere consumes less than 1% of CPU capacity in standby mode, less than 3% during most active operations, and occupies at most 5MB of memory.

Trustlook currently provides an SDK-based solution for Revere, while developing a cloud service platform, which allows vendors to monitor the system security in real time. In the future, Trustlook will provide customers with a full-stack IoT security solution from devices to the cloud.

About Trustlook:

Trustlook is the global leader in next-generation cybersecurity products based on artificial intelligence. The company’s innovative SECUREai engine delivers the performance and scalability needed to provide total threat protection against malware and other forms of attack. Trustlook’s solutions protect mobile devices, network appliances, and IoT devices. For many years, Trustlook has served Huawei, Amazon, Qualcomm and other leading hardware and software vendors.

Find out more at: trustlook.com

Black Hat 2018 is a Wrap!

Black Hat Las Vegas seems to get bigger and better every year. This year was no different. Trustlook was thrilled to be a part of the show, and would like to say thanks to all those who stopped by our booth at Innovation City. There were some great conversations and a lot of shared learnings on the future of cybersecurity for IoT devices.


Black Hat was also an opportunity for Trustlook to announce our latest product, SECUREai Core Detect. This product allows IT administrators to quickly see what IoT devices are on their network. In addition, sophisticated algorithms continually analyze communication to and from every device, instantly identifying anomalies and suspicious network behavior.

To learn more about SECUREai Core Detect, please click here. You can also contact bd@trustlook.com to schedule a demo.