Residential proxy services are fast becoming a hot topic in the cybersecurity space. As the need for privacy and anonymity continues to grow, residential proxies are becoming an important tool for bypassing restrictions, data scraping, and other network activities. This paper provides insights into how host analysis can be used to understand the operational mechanisms of these services. However, the IPs behind residential proxies are often dynamically assigned, which makes accurate host analysis a challenge.
1. Host Analysis Challenges for Residential Proxy Services
One of the biggest challenges in performing host analysis for residential proxy services is the dynamic nature of their IPs. Residential IPs are usually reassigned frequently, so once a residential proxy IP is captured, the analysis must finish before the host moves to another IP; otherwise the results describe a different host and become invalid. To address this problem, we design a real-time analysis system that fingerprints the host immediately after a new residential proxy IP is captured, measures its relay time (the period during which the host serves as a residential proxy), and detects when the host goes offline or its IP changes.
2. How the real-time analysis system works
As shown in Fig. 2, our real-time analysis system consists of three main modules: a host fingerprinter, an IP activity checker, and a relay time analyzer. These three modules work together to analyze each captured residential proxy.
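The interplay between the IP activity checker and the relay time analyzer can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `RelaySession` class, its fields, and the idea that each periodic check reports the IP the host currently uses (or `None` when it is unreachable) are hypothetical and are not taken from the original system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelaySession:
    """Hypothetical record of one host relaying traffic on one IP."""
    ip: str
    first_seen: float   # timestamp (seconds) when the IP was captured
    last_seen: float    # timestamp of the last check that still saw this IP
    ended: bool = False

    def observe(self, current_ip: Optional[str], now: float) -> None:
        """Record one activity check.

        current_ip is the IP the host is seen on at time `now`,
        or None if the host did not respond (assumed to be offline).
        """
        if self.ended:
            return  # the session is closed; later checks are ignored
        if current_ip == self.ip:
            self.last_seen = now
        else:
            # Offline or moved to another IP: the session is over.
            self.ended = True

    @property
    def relay_time(self) -> float:
        """Observed relay time in seconds (a lower bound on the true value)."""
        return self.last_seen - self.first_seen


session = RelaySession(ip="203.0.113.7", first_seen=0.0, last_seen=0.0)
session.observe("203.0.113.7", now=30.0)
session.observe("203.0.113.7", now=60.0)
session.observe(None, now=90.0)  # host no longer reachable
```

Because the host is only sampled at discrete checks, the relay time measured this way is a lower bound: the host may have kept relaying for part of the interval before the failed check.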
2.1 Host Fingerprinter
The host fingerprinter is the key module in the real-time analysis system. It identifies device type and vendor information by sending probe requests to each captured residential proxy IP. The probes target common TCP/UDP ports, including HTTP (port 80), SSH (port 22), Telnet (port 23), HTTPS (port 443), RTSP (port 554), and UPnP (port 5000). Once a response is received and its banner is captured, the system matches the banner against the Nmap Service Detection probe list to identify the device type and vendor.
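The probe-then-match step can be sketched roughly as follows. This is not the system's actual implementation: the function names `grab_banner` and `classify` are hypothetical, and the two signatures are made-up placeholders standing in for the much richer Nmap service detection rules.

```python
import re
import socket
from typing import Optional, Tuple

# Placeholder signatures (assumption): invented patterns and vendors,
# standing in for entries of a real service-detection database.
SIGNATURES = [
    (re.compile(rb"Server: ExampleCam-httpd", re.I), ("webcam", "ExampleCam")),
    (re.compile(rb"ExampleRouter login:", re.I), ("router", "ExampleRouter")),
]


def grab_banner(host: str, port: int, probe: bytes = b"",
                timeout: float = 3.0) -> bytes:
    """Connect over TCP, optionally send a probe, and return up to 1 KiB of reply.

    Returns b"" if the port is closed, filtered, or silent until the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if probe:
                sock.sendall(probe)
            return sock.recv(1024)
    except OSError:
        return b""


def classify(banner: bytes) -> Optional[Tuple[str, str]]:
    """Return (device_type, vendor) for the first matching signature, else None."""
    for pattern, label in SIGNATURES:
        if pattern.search(banner):
            return label
    return None


# Classifying an already captured banner (no network access needed):
sample = b"HTTP/1.1 200 OK\r\nServer: ExampleCam-httpd/1.2\r\n\r\n"
result = classify(sample)
```

For an HTTP port, a call such as `grab_banner(ip, 80, b"GET / HTTP/1.0\r\n\r\n")` would send a minimal request, while services like SSH or Telnet typically announce themselves without any probe. Only hosts you are authorized to scan should be probed.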
3. The use of sticky and semi-sticky gateways
In practice, residential proxy service providers typically offer sticky and semi-sticky gateways so that clients can keep using the same residential IP address across requests. Our real-time analysis system takes advantage of this feature to perform detection through both external fingerprinting (outsideFP) and internal fingerprinting (insideFP).
3.1 External fingerprinting (outsideFP)
External fingerprinting confirms the identity of a host by sending a probe request directly to its IP and capturing the response banner. If the system sees the same IP again through the gateway after the probe completes, we can be sure that the banner belongs to the same residential proxy host rather than to a device that inherited the IP in the meantime. This check is what makes the method reliable even when IPs are frequently reassigned.
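The confirmation logic described above can be sketched as a small function. The names `outside_fp`, `get_exit_ip`, and `probe` are hypothetical: `get_exit_ip` stands for whatever mechanism reveals the residential IP currently behind a sticky gateway session, and `probe` for a banner-grabbing routine. Neither is specified in the source.

```python
from typing import Callable, Optional


def outside_fp(get_exit_ip: Callable[[], Optional[str]],
               probe: Callable[[str], bytes]) -> Optional[bytes]:
    """Return a banner only if the host kept the same IP around the probe.

    get_exit_ip: returns the residential IP seen through the sticky
                 gateway session, or None if the session is gone (assumed).
    probe:       sends probe requests to an IP and returns its banner (assumed).
    """
    ip_before = get_exit_ip()
    if ip_before is None:
        return None
    banner = probe(ip_before)
    ip_after = get_exit_ip()
    # Attribute the banner to the proxy host only when the IP is unchanged.
    if banner and ip_after == ip_before:
        return banner
    return None


# Stubbed example: the gateway reports the same IP before and after the probe.
stable_ips = iter(["198.51.100.4", "198.51.100.4"])
confirmed = outside_fp(lambda: next(stable_ips), lambda ip: b"example banner")
```

If the second lookup returns a different IP, the banner is discarded, since it may describe whichever device received the address after reassignment.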
3.2 Internal fingerprinting (insideFP)
Internal fingerprinting goes a step further. It takes advantage of the fact that some residential proxy service providers do not filter the target IPs that clients may access: by sending probe requests through the proxy to the loopback address 127.0.0.1, the system reaches services running on the proxy host itself and can identify it directly. This approach is particularly effective for residential proxy hosts that sit inside private networks and cannot be reached from outside. Three residential proxy service providers, Proxies Online, Geosurf, and ProxyRack, were found to allow this type of probing.
4. High-performance host analysis systems
To enable efficient host analysis across a large number of IPs, our system processes residential proxies hierarchically. It does not initiate internal fingerprinting (insideFP) unless external fingerprinting (outsideFP) indicates that the probed device is a router or NAT, in which case the actual proxy host is hidden behind it. This strategy effectively reduces the time and resources the analysis consumes.
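The hierarchical rule can be expressed as a simple filter over outsideFP results. This is a sketch under assumptions: the device-type labels `"router"` and `"nat"` and the function names are hypothetical, chosen only to mirror the rule stated above.

```python
from typing import Dict, List, Optional

# Assumed labels for outsideFP results that signal a hidden host.
NAT_LIKE_TYPES = {"router", "nat"}


def needs_inside_fp(outside_device_type: Optional[str]) -> bool:
    """insideFP is started only when outsideFP reports a router or NAT."""
    if outside_device_type is None:
        return False
    return outside_device_type.lower() in NAT_LIKE_TYPES


def plan_inside_fp(outside_results: Dict[str, Optional[str]]) -> List[str]:
    """Return the IPs (in input order) that should go on to insideFP."""
    return [ip for ip, dtype in outside_results.items() if needs_inside_fp(dtype)]


outside_results = {
    "198.51.100.4": "webcam",   # identified directly, no second stage
    "198.51.100.5": "Router",   # host is behind this device
    "198.51.100.6": None,       # no outsideFP result
}
queue = plan_inside_fp(outside_results)
```

Only the second address is queued, so the more expensive through-proxy probing is spent on hosts that outsideFP alone cannot identify.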
In terms of performance, our system runs on an Amazon EC2 instance with 60 Mbps of bandwidth, 1 GB of RAM, and a single 2.40 GHz CPU core. It is able to analyze 800,000 IPs per hour with a fingerprinting time of 63.57 seconds per IP, which is possible only because many IPs are fingerprinted in parallel. Overall, we successfully obtained banners from 728,528 IPs and identified device type and vendor information for 547,497 of them.
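A quick back-of-the-envelope check helps relate these figures. The calculation below is our own reading of the reported numbers, not something stated in the source: it assumes a steady state, in which the average number of IPs being fingerprinted at once equals the arrival rate multiplied by the time spent per IP (Little's law).

```python
# Figures reported in the text.
ips_per_hour = 800_000
seconds_per_ip = 63.57
banners_obtained = 728_528
devices_identified = 547_497

# Derived quantities (steady-state assumption, see lead-in).
arrival_rate = ips_per_hour / 3600           # IPs entering analysis per second
in_flight = arrival_rate * seconds_per_ip    # average concurrent fingerprints
identified_share = devices_identified / banners_obtained

print(f"{arrival_rate:.1f} IPs/s, about {in_flight:.0f} in flight, "
      f"{identified_share:.1%} of banners identified")
```

Under this assumption the system handles roughly 222 IPs per second with on the order of 14,000 fingerprinting tasks in flight, and about three quarters of the captured banners yield a device type and vendor.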
5. Datasets and findings
To support our study, we utilized a variety of datasets, including PUP traffic data, passive DNS data, IP geolocation data, and publicly available proxies. With these datasets, we are able to more fully characterize the ecosystem of residential proxies and identify a large number of residential proxy hosts associated with IoT devices such as webcams, DVRs, and printers.
5.1 PUP Traffic Data
Our potentially unwanted program (PUP) traffic data comes from a leading IT company and covers suspicious traffic collected from customer devices between June 2017 and November 2017. It includes traffic from a number of residential proxies. This data not only helped us identify the use of PUPs as residential proxies, but also revealed hidden infrastructure components within residential proxy services.
5.2 Passive DNS and IP geolocation data
We also used passive DNS data from 360 Netlab to identify fast-flux activity on residential proxy IPs. Meanwhile, IP geolocation data provided by IP2Location helped us obtain the location of each residential proxy, including country, city, and ISP.
6. Conclusion and future outlook
By analyzing the hosts of residential proxy services, we have uncovered the complex operational mechanisms behind these services and revealed their potential impact in the cybersecurity domain. Although our research has produced many important findings, many questions still need further exploration. For example, how to identify hard-to-detect residential proxy hosts more effectively, and how to cope with the dynamics of sticky and semi-sticky gateways, are important directions for future research.
In conclusion, understanding and analyzing residential proxies will become increasingly important as cyber attacks become more sophisticated. Our research provides valuable insights into this area and lays the foundation for future security measures.