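VirusTotal (VT for short) is a Google-owned website that offers free scanning of suspicious files. More than 50 antivirus engines on VT provide real-time scanning. Every day we collect the APKs that users upload to VT ...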
More and more unethical hackers are tampering with legitimate mobile applications. By injecting ads or malicious code and then publishing the repackaged applications on third-party app stores, they can do great damage to users and app developers. As a result, the market for mobile application protection (also known as packers, or jiagu) is booming, and more and more developers use the packing services offered by app protection vendors to protect their applications. But are these “App Protection” services really as trustworthy as their claim of “Protecting Mobile Security for Users” suggests?
One day, during routine sample audits, we happened to notice some abnormal network traffic:
After an initial investigation, we found that this traffic was being sent to the server of a well-known app protection company. We were surprised, because we had never been told that this company would collect our data. We then realized that the app protection provider, which is supposed to provide security services to developers and users, was running an “inside job”: secretly stealing users’ information.
Most users cannot tell whether an app they installed uses a protection service, and most are not even aware that their personal data has been taken. App developers may also know little about the protection service they choose, or may overlook the end-user privacy terms in its agreement. This lack of awareness on both sides, sometimes combined with forced authorization, leaves users’ privacy exposed.
To better understand app protection services, we studied six popular app protection providers on the market. We wrote a “hello world” application that is granted all permissions but has no functionality that uses them. We then submitted it to the six providers and observed the behavior of the packed application. In the end, we found that at least two of them collect users’ private data.
When we submitted our application to the first provider’s protection service, we noticed a toggle for a data collection service. The page said that some data would be collected and analyzed, but gave no details. Its user agreement, however, states that it will collect, including but not limited to, “SDK or API version, platform, timestamp, application ID, application version, open UDID, iOS IDFA, MAC address, IMEI, IMSI, manufacturer, OS version, session start/stop time, locale, timezone, network status, location, gender, age, browser pages, errors, IP address, and so on”. (Incidentally, the provider removed the data collection service from its website after we published this post.)
We analyzed the network traffic and found that it collected the following device information: Wifi_ssid, link speed, signal strength, network status, IP address, Wi-Fi adapter MAC address, OS version, device model, longitude and latitude, android_id, IMEI, IMSI, phone number, CPU model, CPU load, remaining battery power, screen DPI, memory usage, etc.
We were surprised by the amount of data it collects, and even more surprised to find that most of it was transmitted in plain text. This means that not only the packing provider can read the information, but so can any man-in-the-middle attacker, including operators of malicious Wi-Fi hotspots and phishers.
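Reading a plain-text upload takes no special tools. The Python sketch below parses a captured HTTP request body and pulls out identifying fields. The request path, host, form encoding, and field names (`imei`, `android_id`, `wifi_ssid`, `lat`, `lng`) are hypothetical illustrations, not the provider’s actual protocol.

```python
from urllib.parse import parse_qs

# Hypothetical captured request: the path, host, and field names are
# illustrative assumptions, not the provider's real protocol.
CAPTURED = (
    "POST /collect HTTP/1.1\r\n"
    "Host: analytics.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "imei=861234567890123&android_id=a1b2c3d4e5f60718"
    "&wifi_ssid=HomeNet&lat=31.23&lng=121.47"
)

# Fields an eavesdropper would care about (assumed names).
SENSITIVE = {"imei", "android_id", "wifi_ssid", "lat", "lng"}


def extract_sensitive_fields(raw_request: str) -> dict:
    """Return the sensitive fields found in a plain-text form-encoded request."""
    # HTTP separates the headers from the body with an empty line (CRLF CRLF).
    _headers, _, body = raw_request.partition("\r\n\r\n")
    params = parse_qs(body)
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: values[0] for key, values in params.items() if key in SENSITIVE}
```

Anyone sharing the network path, such as the operator of a rogue hotspot, could run the same few lines against traffic they capture.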
What about the second protection service provider? On its website, the “jiagu data analyzing” and “crash log analyzing” options are checked by default. The only place we could find what information they collect was the user agreement, which says that “application anti-piracy” is part of the “basic service” and cannot be unchecked. The agreement also vaguely states that some data will be collected for this function.
During the analysis, we ran into many anti-reverse-engineering obstacles.
We decompiled the manifest file and found that the entry point had been changed to the packer’s entry:
As we can see, the packer hides the original code, so we cannot easily recover it. The abnormal network traffic we observed is not sent by the decompiled code, which only contains a short routine that loads a .so library matching the CPU architecture. We therefore had to look into the dynamic library “libjiagu.so”.
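One way to spot an entry-point swap like this is to read the `android:name` attribute of the `<application>` element in the decoded manifest (for example, as produced by a tool such as apktool) and compare it with the class the developer actually wrote. The Python sketch below does this with the standard library. The package name and `com.example.packer.StubApplication` are hypothetical placeholders, not the packer’s real class name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Attributes in AndroidManifest.xml live in the "android" XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical decoded manifest of a packed app; all names are placeholders.
PACKED_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.helloworld">
    <application android:name="com.example.packer.StubApplication"
                 android:label="HelloWorld" />
</manifest>"""


def application_entry(manifest_xml: str) -> Optional[str]:
    """Return the Application subclass declared in the manifest, or None."""
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    if app is None:
        return None
    return app.get(ANDROID_NS + "name")


def entry_was_replaced(manifest_xml: str, expected: Optional[str]) -> bool:
    """True if the declared entry differs from the developer's own class."""
    return application_entry(manifest_xml) != expected
```

A plain “hello world” app normally declares no custom Application class, so `expected` would be `None`; a packed build that suddenly names an unfamiliar class is worth a closer look.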
In “libjiagu.so”, we still could not find any suspicious logic: the library goes to great lengths to hide what it does, as the control flow graph below shows.
When we dug into the code, we noticed that the data segment was unusually large compared with the code segment. We then turned to a debugger for further analysis, but progress was far from smooth: the application kept exiting as soon as we attached a debugger, which meant there had to be anti-debugging checks. Eventually, we found that the following anti-debugging techniques were applied:
- Detection of IDA ARM debugger’s default port: 23946
- Code execution time check (anti-breakpoint)
- Debugger attachment detection
- Parent process detection
- File dump detection
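Checks like these are usually written in native code. To make the logic easy to follow, the Python sketch below reimplements four of them in a common textbook form, operating on the text of `/proc` files. It looks for a socket listening on port 23946 in `/proc/net/tcp`, reads `TracerPid` from `/proc/self/status`, checks that the parent process is zygote, and times a code section to catch breakpoints or single-stepping. These are assumptions about typical implementations, not the packer’s actual code. The zygote rule and the 0.5-second threshold are our own illustrative choices.

```python
import time

IDA_DEFAULT_PORT = 23946  # default port of IDA's remote debug server
TCP_LISTEN = "0A"         # socket state code for LISTEN in /proc/net/tcp


def ida_port_listening(proc_net_tcp: str) -> bool:
    """Return True if any socket in /proc/net/tcp listens on port 23946."""
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        # local_address is "HEXIP:HEXPORT", e.g. "00000000:5D8A".
        port = int(local_address.rsplit(":", 1)[1], 16)
        if port == IDA_DEFAULT_PORT and state == TCP_LISTEN:
            return True
    return False


def tracer_pid(proc_status: str) -> int:
    """Return TracerPid from /proc/<pid>/status (0 means not traced)."""
    for line in proc_status.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1])
    return 0


def parent_looks_legit(parent_cmdline: str) -> bool:
    """Hypothetical rule: an Android app process should be forked by zygote."""
    # /proc/<ppid>/cmdline is NUL-separated; the first item is the name.
    name = parent_cmdline.split("\0", 1)[0]
    return name in ("zygote", "zygote64")


def timed_section_was_slow(work, threshold_s: float = 0.5) -> bool:
    """Run `work` and report whether it took suspiciously long.

    A breakpoint or single-stepping inside `work` stretches the elapsed
    time far beyond normal; the threshold is an arbitrary assumption.
    """
    start = time.monotonic()
    work()
    return time.monotonic() - start > threshold_s
```

On a Linux or Android device, the inputs would come from reading the real files, for example `open("/proc/net/tcp").read()`; taking text as input keeps the checks testable anywhere.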
After bypassing these checks, we found that the data segment is decrypted and decompressed into another dynamic library. This library is loaded by a customized linker that hides its linker-related information, so the loaded library does not show up in the module list.
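We do not reproduce the packer’s actual cipher here. As a stand-in, the Python sketch below assumes zlib compression followed by repeating-key XOR. This is enough to show the two-stage “decrypt, then decompress” unpacking and to check that the result begins with the ELF magic bytes. The key, the XOR scheme, and the use of zlib are all assumptions for illustration.

```python
import zlib
from itertools import cycle

ELF_MAGIC = b"\x7fELF"


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR (a stand-in cipher); applying it twice undoes it."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def pack(library: bytes, key: bytes) -> bytes:
    """Hypothetical packing step: compress first, then 'encrypt'."""
    return xor_bytes(zlib.compress(library), key)


def unpack(blob: bytes, key: bytes) -> bytes:
    """Reverse of pack(): decrypt first, then decompress, then sanity-check."""
    library = zlib.decompress(xor_bytes(blob, key))
    if not library.startswith(ELF_MAGIC):
        raise ValueError("unpacked data is not an ELF image")
    return library
```

The order matters. Because the data was compressed before it was encrypted, the loader must decrypt it before it can decompress it.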
We then dumped the library from memory. However, IDA could not load it, because the original soinfo data had been modified.
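Because the custom linker keeps the library out of the module list, it has to be found in a memory dump by its contents. A common approach is to scan the dump for the ELF magic bytes and sanity-check the header that follows. The Python sketch below does this for 32-bit little-endian ARM shared objects. The target architecture is our assumption. A header that parses correctly also does not guarantee that IDA can load the image.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EM_ARM = 40  # e_machine value for 32-bit ARM
ET_DYN = 3   # e_type value for shared objects
# ELF32 header: e_ident[16], then type, machine, version, entry, phoff,
# shoff, flags, ehsize, phentsize, phnum, shentsize, shnum, shstrndx.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")


def find_elf_images(dump: bytes):
    """Yield (offset, header_fields) for plausible ELF32 ARM shared objects."""
    offset = dump.find(ELF_MAGIC)
    while offset != -1:
        raw = dump[offset:offset + ELF32_HEADER.size]
        if len(raw) == ELF32_HEADER.size:
            fields = ELF32_HEADER.unpack(raw)
            ident, e_type, e_machine = fields[0], fields[1], fields[2]
            # EI_CLASS (byte 4) == 1 means 32-bit; EI_DATA (byte 5) == 1
            # means little-endian.
            if (ident[4] == 1 and ident[5] == 1
                    and e_type == ET_DYN and e_machine == EM_ARM):
                yield offset, fields
        offset = dump.find(ELF_MAGIC, offset + 1)
```

Each hit gives a starting offset to carve out and the header fields (such as program header offset and count) needed to inspect the image further.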
After reverse engineering the self-loader, we finally recovered the real soinfo and put it back into the dumped library, which IDA could then load. From there, we found the logic the library uses to send out requests containing user information:
The request is sent over HTTP, but not in plain text: it is encrypted with RSA, so a man in the middle cannot decrypt it directly from the traffic. However, we can still see in the code what data they collect:
The picture above clearly shows the data they are trying to get, but the data they actually collect goes well beyond what we observed. After setting up some hooks in the system, we found a wider range of collected user data, in violation of their own user agreement. The agreement says they only collect MD5 hashes of some data, but the data actually collected was: IMEI, IMSI, android_id, phone number, ICCID, DNS_IP, installed applications and their version numbers, OS version, device model, WiFi MAC address, build version, kernel version, screen DPI, wifi_ssid, link speed, signal strength, IP address, etc.
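RSA explains both observations: a man in the middle cannot read the upload, yet hooks inside the app can. The toy textbook-RSA sketch below uses the classic p = 61, q = 53 parameters. Encryption needs only the public pair (n, e), which ships with the client. Decryption needs the private exponent d, which in this sketch stays on the server. The client also holds the plaintext before it encrypts it, so hooking the app before the encryption step exposes the raw data. Real deployments use keys of 2048 bits or more with padding such as OAEP, and this sketch makes no claim about the key size or padding the packer actually uses. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Textbook RSA with toy parameters. Insecure; for illustration only.
p, q = 61, 53
n = p * q                # public modulus: 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent (embedded in the client)
d = pow(e, -1, phi)      # private exponent (kept on the server): 2753


def encrypt(m: int) -> int:
    """Client side: anyone holding (n, e) can encrypt."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Server side: only the holder of d can decrypt."""
    return pow(c, d, n)


message = 65  # stands in for one chunk of collected data
ciphertext = encrypt(message)  # 2790: all an eavesdropper sees
assert decrypt(ciphertext) == message
```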
In addition, each time the application is launched, under certain conditions, the package names, version numbers, and other information of all applications installed on the user’s phone are uploaded:
This concluded our audit of the demo samples. We then collected more samples packed by these two vendors. Based on the download counts of these samples, we estimate that at least 25 million users are exposed to this potential privacy leak. The top 10 countries by number of affected users are:
- China: 25,547,405
- United States: 77,488
- Australia: 20,675
- Canada: 20,186
- India: 19,159
- United Kingdom: 16,640
- Japan: 15,519
- Germany: 13,566
- Italy: 12,215
- Thailand: 11,320
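Summing the list is a quick consistency check on the headline estimate. The Python snippet below adds up the top 10 countries and computes China’s share of that total, using only the figures listed above.

```python
# Affected users in the top 10 countries, as listed above.
affected = {
    "China": 25547405,
    "United States": 77488,
    "Australia": 20675,
    "Canada": 20186,
    "India": 19159,
    "United Kingdom": 16640,
    "Japan": 15519,
    "Germany": 13566,
    "Italy": 12215,
    "Thailand": 11320,
}

total = sum(affected.values())
outside_china = total - affected["China"]
china_share = affected["China"] / total

print(f"Top-10 total: {total:,}")         # prints "Top-10 total: 25,754,173"
print(f"Outside China: {outside_china:,}")  # prints "Outside China: 206,768"
print(f"China's share: {china_share:.1%}")  # prints "China's share: 99.2%"
```

The top 10 countries alone account for 25,754,173 users, consistent with the “at least 25 million” estimate. The nine countries other than China contribute 206,768 of them, about 0.8%.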
We also plotted the distribution of affected users on the world map below:
We believe that any organization or individual collecting personal data should do so legally and reasonably, and only with the user’s consent. Beyond that, users’ personal information should be kept secure and encrypted in transit. We also urge all developers to strengthen their security practices. Before using any third-party SDK, code, or plugin in their applications, developers should research and understand what data it collects. They should choose reliable third-party service providers and take full responsibility for their users’ data and privacy.