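VirusTotal (VT) is a Google-owned website that offers a free scanning service for suspicious files. More than 50 antivirus engines provide real-time scanning on VT. Every day we collect the APKs that users upload to VT ...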
In recent years, data breaches have hit even very well-known enterprises, and the security of personal privacy has drawn more and more attention. Protecting personal information effectively has become a serious problem for security engineers. End users should also learn some basic concepts so they can protect their own privacy.
At the 3.15 Gala in 2019, China's CCTV reported on phishing through free WiFi hotspots to steal users' private data. According to the report, as long as a phone has WiFi enabled and scans for nearby wireless networks, "some means" can turn what it sends into a user profile: your name, mobile number, age, income range and other personal information end up collected by a third party. Is this claim sensationalized? And how could a WiFi hotspot reveal personal information at all?
When a phone performs an active WiFi scan, it broadcasts its own device information to ask whether any WiFi hotspot is nearby. This information contains a unique device identifier, the MAC address of the phone's WiFi network adapter. The identifier is assigned by the network adapter manufacturer. However, a MAC address alone is not enough to recover the user's identity. So who provides the mapping between MAC addresses and users' personal information?
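To make the structure of the identifier concrete, here is a minimal sketch that splits a MAC address into its two halves. The first three bytes form the vendor prefix (the OUI, which the IEEE assigns to the manufacturer), and the last three bytes are chosen by that manufacturer. The function name `split_mac` and the sample address are made up for illustration and do not come from our analysis tool.

```python
from typing import Tuple


def split_mac(mac: str) -> Tuple[str, str]:
    """Return (vendor_prefix, device_part) as lowercase hex strings."""
    digits = mac.replace(":", "").replace("-", "").lower()
    # A MAC address is 6 bytes, i.e. exactly 12 hex digits.
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits[:6], digits[6:]


# Hypothetical address, used only as sample input.
vendor_prefix, device_part = split_mac("00:1A:2B:3C:4D:5E")
print(vendor_prefix, device_part)
```

The vendor prefix only narrows a device down to a manufacturer. It is the full 6-byte value, seen repeatedly, that makes a device recognizable.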
Today we set up a network experiment that collects and analyzes the packets passing through a router, to see what data can be gathered in an open network environment. We developed a custom packet analysis tool to help with the analysis.
We first create a WiFi hotspot that shares its network connection with the target device. The target device itself is not modified in any way.
Then, on the target device (a desktop), we start the chat software QQ and try to log in with a non-existent account.
After we click Login, our analysis tool obtains the QQ account number that the target tried to log in with.
The QQ account number can be recovered from the traffic only because the traffic is either not encrypted or encrypted with a symmetric algorithm; in the latter case the key is commonly hard-coded in the software itself.
Similarly, we open the QQ app on an Android phone and repeat the operation above.
We again obtain the target's QQ number, since the traffic between client and server is the same.
IM software is not the only source of leaks. When you use a browser to visit your favorite websites, such as Baidu, the sites you visit can also be recovered from the traffic.
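As a rough illustration of why this works for unencrypted HTTP, the sketch below pulls the `Host` header out of the raw bytes of a captured request. It is not our analysis tool: the function name `http_host` and the sample request are assumptions made for this example, and a real capture would first need the TCP payload reassembled.

```python
from typing import Optional


def http_host(payload: bytes) -> Optional[str]:
    """Return the Host header of a plaintext HTTP request, or None."""
    # Headers end at the first blank line; the body (if any) is ignored.
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    # lines[0] is the request line ("GET / HTTP/1.1"), so skip it.
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None


# Hand-written sample request, not captured traffic.
sample = b"GET / HTTP/1.1\r\nHost: www.baidu.com\r\nUser-Agent: demo\r\n\r\n"
print(http_host(sample))
```

Anyone who can see the packets, including the hotspot owner, can run the same kind of extraction.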
If you are familiar with sniffing tools such as Wireshark, these results will not surprise you. However, while testing some of the apps on our phone, some interesting traffic caught our attention. For various reasons, we have removed the app names from the following discussion.
This data comes from an app on our phone; one of its network requests caught our attention.
Although the HTTP request body is encrypted, the request itself is not sent over an SSL connection. We analyzed the app and learned that the request is initiated by an SDK for ad analytics. The encryption uses the symmetric AES algorithm, and the key is hard-coded in the app. From the app's code we can see that the secret key is stored as a Base64-encoded string.
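A hard-coded key of this kind is usually easy to spot. The sketch below is one possible way to look for it, not the method we used on this app: it scans strings extracted from an app for Base64 text that decodes to 16, 24 or 32 bytes, the three valid AES key lengths. The names `key_candidates`, `demo_key` and `demo_strings` are hypothetical, and the demo key is generated on the spot instead of being taken from any real SDK.

```python
import base64
import binascii
import re
from typing import List, Tuple

AES_KEY_SIZES = {16, 24, 32}  # AES-128, AES-192, AES-256
B64_RE = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")


def key_candidates(strings: List[str]) -> List[Tuple[str, bytes]]:
    """Return (base64_text, raw_bytes) pairs that could be AES keys."""
    found = []
    for s in strings:
        for match in B64_RE.findall(s):
            try:
                raw = base64.b64decode(match, validate=True)
            except (binascii.Error, ValueError):
                continue  # looked like Base64 but is not valid
            if len(raw) in AES_KEY_SIZES:
                found.append((match, raw))
    return found


# Made-up input standing in for strings dumped from an app.
demo_key = base64.b64encode(bytes(range(16))).decode("ascii")
demo_strings = [
    "https://ads.example.invalid/collect",
    f'KEY = "{demo_key}"',
    "hello world",
]
for text, raw in key_candidates(demo_strings):
    print(text, len(raw))
```

A length check produces false positives, so each candidate still has to be confirmed by trying it against captured traffic.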
We can then use the same secret key to decrypt the captured traffic, and the payload sent to the server becomes readable in plain text.
The decrypted data clearly includes, among other things, the user's IP address, country, language, phone model, carrier and various device identifiers.
This means that every user of an app that integrates this SDK uploads this information to the SDK's server. We do not judge the legality of the data collection here. However, it is worth noting that the request is sent on first launch, and nothing tells the end user that the data has been uploaded.
Now suppose this happens on a public network. Because the data collector does not use HTTPS to protect the traffic, the data is readable by anyone on the same network who has extracted the hard-coded key. The WiFi owner or an attacker can easily obtain it with the same tricks. The collected data can be stored, analyzed, or even sold to third parties for profit.
In fact, the same data collection behavior can be found in many advertising, behavior-tracking and statistical analysis SDKs. They collect personal information all the time: when you open an app that includes one of these SDKs, private information describing the user is uploaded to their servers for analysis.
As privacy issues receive more attention, more privacy-related laws and regulations have been introduced, such as the EU General Data Protection Regulation (GDPR). Platform vendors have also added features that enhance privacy protection. For example, recent versions of iOS use random device information (a randomized MAC address) when the phone actively scans for WiFi, and on recent versions of Android (starting with 8.0) manufacturers can enable this feature as well. More and more websites are adopting secure HTTPS connections, so the transmitted content is encrypted and cannot be read by someone monitoring the network. We should also take care not to connect to unknown public WiFi hotspots, to avoid unnecessary disclosure of personal information.
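A randomized MAC address can usually be told apart from a factory-assigned one by a single bit. Randomized addresses typically have the "locally administered" bit set, which is the second-lowest bit of the first byte. The sketch below checks that bit; the function name and both sample addresses are made up for illustration.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the MAC has the locally administered bit set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    # Bit 0x02 of the first byte marks a locally administered address.
    return bool(first_octet & 0x02)


# Hypothetical addresses used only as sample input.
print(is_locally_administered("da:a1:19:00:00:01"))  # bit set
print(is_locally_administered("00:1a:2b:3c:4d:5e"))  # bit clear
```

A hotspot that sees such an address during scanning cannot easily link it to the same phone on a later visit, which is the point of the feature.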