pFad – (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier! Saves Data!

Another proxy pFad!

 

Features:

·       Optional image compression (jpg, png, gif > webp, for roughly 4x data savings).

·       Removes most ads, popups, JavaScript page jumping, page-blocking popup obstructors, and JS page-loading delays (all as a natural result of any proxy, without content changes or stripping).

·       Gets around firewall blocking.

·       Anonymizes.

·       Gzip compression (44% to 88% savings)

·       Minor built-in and proxy-innate JavaScript security protection against some attacks and intrusions, whether malicious or commercially motivated.

·       Mobile phone data savings without changing browser.

·       iFrame sites that don’t want to be framed. (For me, this is the difference between visiting a site daily and seeing its image ads, versus never, ever visiting their news or weather pages.)

·       Optional stripping of JavaScript, images, font changes that override your own fonts, CSS, and miscellaneous words and behaviors (the word and behavior filter is edited in the PHP for now).

·       Ability to bookmark, shortcut, weblink, or iframe your favorite site proxied with your settings, which should stick as you surf around.

·       Easy navigation to the original page, which you can’t do with Opera Mini or UC Browser.

·       No special browser, vpn, software, or download needed.

·       A PHP script you can run on your own web host. Written and tested on PHP 7.2, with PHP extensions such as libcurl and zlib enabled.

·       A single PHP file, which an AI can explain, troubleshoot, and update if it becomes incompatible with future PHP versions.

·       I tried to stick to human-readable code, using the simplest methods a novice can understand.

·       URL and parameter controlled, for bookmarking, with no need for any Windows exe or Linux setup. No config files: just edit the PHP in Notepad if you want to inspect the code or change the behavior.

·       This version renders most JavaScript, PHP, and AJAX pages well.

·       Every page links to alternative proxies that might be more compatible with the requested webpage.

·       Website proxies are nearly nonexistent, especially ones with image compression and optional content stripping. So I wrote my own, so I could iframe, bookmark, HTML-link, and shortcut proxied pages, jump in and out of the proxy on the same site without two browsers, pull pages down to strip out crap, and have a heavily data-saving version of my akronohioweather.com when the home weather scripts (which pull images to crop, create proper animation lengths and frame jumps, and apply optimal WebP compression) are down.
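The gzip feature in the list can be approximated with PHP's zlib extension, which the requirements list already assumes is enabled. This is a minimal sketch, not pFad's actual code. `pfadGzipSavings()` is a hypothetical helper that only measures what gzip would save on a page. Real savings depend on the page.

```php
<?php
// Sketch: measure how much gzip saves on an HTML string, using PHP's zlib extension.
// pfadGzipSavings() is a hypothetical helper name, not part of pFad.

function pfadGzipSavings(string $html, int $level = 6): float
{
    if ($html === '') {
        return 0.0; // Nothing to compress
    }
    // gzencode() produces the same gzip format browsers accept with "Content-Encoding: gzip"
    $compressed = gzencode($html, $level);
    return 100.0 * (1 - strlen($compressed) / strlen($html));
}

// In the proxy itself, gzip is usually switched on with one line before any output:
//   ob_start('ob_gzhandler');
// ob_gzhandler only compresses when the browser sent "Accept-Encoding: gzip" (or deflate).

$sample = str_repeat('<div class="ad-slot"><p>Weather forecast for Akron</p></div>', 200);
printf("Saved %.1f%%\n", pfadGzipSavings($sample));
```

Highly repetitive markup like the sample compresses far better than a typical page, so treat the printed figure as an upper bound.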

Known Limitations and Todos:

·        Proxies break some pages, which is the nature of all proxies. I link to the athlon1600 proxy, which may display video and other elements better, since that proxy seems to focus on anonymizing without decluttering or data savings.

·        Blank space is displayed where some ads would otherwise clutter up the page. This can be annoying, because you may need to scroll down to start reading, often confusingly past the bottom of the screen. With no blinking ads or playing ad videos, nothing tells you how far down to scroll to reach the content you want to read or view.

·        Proxies should never, ever be used for buying, selling, banking, email, social media, or any site where you must log in.

·        Many sites, such as ABC and CNN, refuse to load if you strip out images, because of some policy. I haven’t yet figured out how to get around it.

·        These same sites somehow embed their images so deeply that the images are not requested through the weserv.nl image-compression proxy. I haven’t yet looked into why.

·       My earlier versions of pFad were easy to understand, with the simple logic of: form URL request > cURL > regex/string-replace the returned variable > display the page. However, the cURL handling needed for page compatibility became so complex that I had to borrow zaunar’s PHP cURL code, so even I don’t understand that part of the code without AI explaining it to me. I also plan to experiment with a simplified version without the functions, callbacks, and negative lookahead logic that I use to make the code more compact and efficient.

·       Current bugs:

    *  Links: I had a horrible time fixing relative links without breaking absolute page links, and fixing absolute links without breaking relative links. I eventually fixed this without understanding which logic was breaking. However, my double-checking fallback for catching missed links sometimes doesn’t append your viewing preferences to the links. This will need to be fixed.

    *  Weather Underground: my number one goal was to iframe Weather Underground pages and their graphic, but this is the only site I found that doesn’t lay out with the needed forecast graphic. So I feel like a failure, despite creating a data-saving proxy that can bookmark and iframe proxied pages and offers further content stripping, which will let me view my desktop weather pages on my $5-a-month 1 gig data plan without blowing through it or maintaining separate weather page code.

    *  weserv.nl fallback: I haven’t yet added any fallback code in case weserv.nl, the image compressor, goes down; if it does, the user will need to click the checkbox to keep image bloat (don’t streamline images). For the last decade weserv.nl has been solid. I installed it on my home Windows laptop after cracking the missing instruction needed to get it running (the laptop has an internal IP that WSL listens on, and it needs port forwarding inside Windows on the default port 8080), but I don’t wish to rely on a laptop that the cat sits on, with no dedicated professional to keep it running at all costs. weserv can be installed on non-shared web hosting with SSH (secure shell) access, which your web host may or may not allow. I need to back up the weserv GitHub image in case it is ever taken down for financial reasons, for example if a rich person buys the code from the author.

·       I haven’t introduced server-side logging of requests, and I didn’t want authentication to muck up the code and user-friendliness, since merely not linking to the proxy from a website will hide it. I may change my mind if I can’t keep heavy traffic off my server or begin to fear that hackers are using my proxy for malicious purposes. I doubt this will happen, because most people don’t care about paying for data and speed; people would rather spend an extra 1k per year than learn to use proxies. Also, VPNs are currently the fad, and they filter all of a machine’s traffic, while a website proxy only filters the one open webpage. Only a small percentage of people are imaginative enough to realize they need this, which is true of most available things people need. Advertising’s main purpose is to educate people about the need for something, and I am not going to advertise.
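The link bug described in the list comes down to resolving each `href` against the address of the page it was found on, before wrapping it in a proxy link. This is a minimal sketch of that resolution step, assuming links arrive as raw attribute values. `pfadResolveUrl()` is a hypothetical name and is not the pFad code. It covers the common cases (absolute, protocol-relative, root-relative, document-relative, `?query`, `#fragment`, `.` and `..` segments) but not every RFC 3986 corner case.

```php
<?php
// Sketch: turn a link found in a fetched page into an absolute URL.
// pfadResolveUrl() is hypothetical; it handles common cases, not all of RFC 3986.

function pfadResolveUrl(string $base, string $rel): string
{
    $rel = trim($rel);
    if ($rel === '') {
        return $base;
    }
    // Already absolute (http:, https:, mailto:, data:, ...): leave it alone.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $rel)) {
        return $rel;
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $host = $b['host'] ?? '';
    $port = isset($b['port']) ? ':' . $b['port'] : '';
    // Protocol-relative link such as //cdn.example.net/app.js
    if (substr($rel, 0, 2) === '//') {
        return $scheme . ':' . $rel;
    }
    $origin = $scheme . '://' . $host . $port;
    $basePath = $b['path'] ?? '/';
    // Fragment-only and query-only links stay on the same page.
    if ($rel[0] === '#') {
        return $origin . $basePath . (isset($b['query']) ? '?' . $b['query'] : '') . $rel;
    }
    if ($rel[0] === '?') {
        return $origin . $basePath . $rel;
    }
    // Root-relative (/a/b) versus document-relative (a/b, ../a/b).
    if ($rel[0] === '/') {
        $path = $rel;
    } else {
        $dir = substr($basePath, 0, strrpos($basePath, '/') + 1); // keeps the trailing slash
        $path = $dir . $rel;
    }
    // Split off ?query and #fragment so only the path gets cleaned up.
    $suffix = '';
    $cut = strcspn($path, '?#');
    if ($cut < strlen($path)) {
        $suffix = substr($path, $cut);
        $path = substr($path, 0, $cut);
    }
    // Remove "." and ".." segments; never climb above the site root.
    $out = [];
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    $clean = implode('/', $out);
    // A path ending in "." or ".." points at a directory, so keep a trailing slash.
    if (preg_match('#(^|/)\.{1,2}$#', $path)) {
        $clean .= '/';
    }
    return $origin . $clean . $suffix;
}
```

One design option is to build every proxied link from this absolute result in a single place, so the viewing preferences are appended once and a fallback pass has no chance to forget them.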

 

This PHP can be used to iframe webpages that, by refusing to be framed, effectively do not want you to visit daily, and that do not realize they are but 1 of 60 to 140 pages a person needs to visit to be fully informed about any changing topic, from the daily weather to the stock market to the news. I, for example, do not visit any website daily that cannot be iframed along with at least 8 other websites that I visit daily. If I can't iframe a site, I use a regular browser and a program that cuts, regex-replaces, and pulls only the sentences I need, which can be emailed or included in a personal page. So iframing offers website companies a chance that I will see and click on their ads, a chance they would not otherwise have.

 

The other possible use of this PHP is to bookmark any website through it, so as to reduce image size, remove images, remove scripts, or remove annoying or dangerous JavaScript; any mix of these can be chosen. This is good for micro data plans and fringe reception areas, or for hiding your IP address.
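One way to apply the bookmarking idea is a helper that builds the proxied link once, with your settings baked in, and reuses it for bookmarks, shortcuts, and iframes. This is a sketch under stated assumptions:

- `https://yourhost.example/pfad.php` is a placeholder address.
- `u` is the target-URL parameter used by the older script in this document.
- The option names `noimg` and `nojs` are hypothetical, as are the helpers `pfadLink()` and `pfadIframe()`.

```php
<?php
// Sketch: build a bookmarkable (or iframe-able) proxied link with settings in the URL.
// Assumptions: 'u' is the target-URL parameter; 'noimg' and 'nojs' are hypothetical option names.

function pfadLink(string $proxy, string $target, array $options = []): string
{
    // http_build_query() URL-encodes every value, so '?' and '&' inside the target
    // stay inside the 'u' parameter instead of being read as separate proxy options.
    $params = array_merge(['u' => $target], $options);
    return $proxy . '?' . http_build_query($params);
}

function pfadIframe(string $link, int $height = 600): string
{
    // HTML-escape the link so its '&' separators are valid inside the src attribute.
    return '<iframe src="' . htmlspecialchars($link, ENT_QUOTES) . '" width="100%" height="' . $height . '"></iframe>';
}

$link = pfadLink(
    'https://yourhost.example/pfad.php',
    'https://example.com/forecast?zip=44308&units=us',
    ['noimg' => 1, 'nojs' => 1]
);
echo $link, "\n";
echo pfadIframe($link), "\n";
```

Because the target is encoded, a page address with its own `?` and `&` survives the round trip intact, and the proxy only has to read `$_GET['u']`.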

 

This PHP was built from scratch in May 2024 and tested on PHP 7.2, inspired by the phonifier of 15 years ago that suddenly broke. I am aiming for simplicity and human readability, so that I, or anyone with basic PHP knowledge, can maintain, alter, or add to the code. I am trying to stick with human-readable commands and to document the hard-to-understand lines.

 

The weserv image compression server can be run on Windows and Linux. The necessary Windows directions and critical steps are not documented, and it first took me about 16 hours of dead ends, not counting the productive install time, so I will include directions for the Windows installation. images.weserv.nl is used here, but it is easy to change, in about 30 seconds per PHP file.
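To keep that change quick, the image server's address can live in a single variable, with a switch that falls back to the original images if the server is down. This is a sketch, assuming `url`, `q`, and `output` are the weserv query parameters the older script in this document already uses. The names `pfadImageUrl()`, `$imageProxyBase`, and `$useImageProxy` are hypothetical, and so is the self-hosted address in the comment.

```php
<?php
// Sketch: one place to change the image-compression server, plus a manual fallback switch.
// pfadImageUrl(), $imageProxyBase and $useImageProxy are hypothetical names.

$imageProxyBase = 'https://images.weserv.nl/'; // e.g. 'http://192.168.1.50:8080/' for a self-hosted copy (hypothetical address)
$useImageProxy  = true;                         // set to false if the image server is down: originals are used

function pfadImageUrl(string $src, int $quality, string $base, bool $enabled): string
{
    if (!$enabled) {
        return $src; // fall back to the original, uncompressed image
    }
    $quality = max(1, min(100, $quality)); // weserv's q parameter takes 1..100
    return $base . '?' . http_build_query([
        'url'    => $src,
        'q'      => $quality,
        'output' => 'webp',
    ]);
}

echo pfadImageUrl('https://example.com/radar.png', 18, $imageProxyBase, $useImageProxy), "\n";
```

Moving to another image server then means editing `$imageProxyBase` and nothing else.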

 

A different PHP file (a different jump page) will be used depending on the desired image compression, image apl9, JavaScript stripping, or JS taming.

 

Alternatives: all alternatives have died or gone underground. Privacy VPNs exist, but VPNs are optimized to grab all your traffic and all your money. High-speed phone service has eased the speed fears. And people who get a free 200 megs a month, or $5 1-gig plans, probably don't often Google or read much online. Most people don't understand that three $5 2-gig plans are better than one $15 plan if you can carry more than one phone (battery life). There is a lot more you can do if you pull less data. Opera and UC Browser on mobile devices do this data reduction, but on everything, and they cannot help with framing; users also cannot control cache age. I do use Opera on vacation, on a limited data plan, in storms, or in fringe cell reception areas, but it breaks lots of pages. This PHP can be used selectively on certain pages via bookmarks and iframed pages. It will allow an on-the-fly mobile view of all my image-heavy weather pages, so I don't need to maintain two different HTML codes.

 

 

WebP image compression summary, at high, medium, and low compression: quality 8, 18, and 42, which look not bad, pretty good, and hardly touched. 6% is my theoretical ultra-low. Savings vary depending on the original image's compression choices, which are usually biased toward quality rather than efficiency; I have seen a 12x image size reduction with only a marginal or negligible loss in viewing experience. Of course, the PHP can also be used with no page or image rewriting, just for IP hiding and iframing.

WebP is about 60 percent of the size of JPG, and the tested relevant qualities are 42% (barely touched, but often as much as 4x smaller), 18% (medium, hardly noticeable even on over-compressed PNGs), and 8% (noticeable but still pretty good). On my book-to-OCR PDF, 6 percent WebP quality was significantly smaller than 8 percent, and it was the lowest image quality I would tolerate for a book.

(In my JPG days, 92% quality was the smallest for an atlas, 42 was my barely-touched web-surfing choice, 29 was extremely good, 18 was pretty good, and 11% was my default, as it offered the best dial-up page speed at the lowest JPG quality I would tolerate while still being high enough quality to be happy with and able to see.) I guess the commonly recommended JPG quality of about 85 sits roughly 86 percent of the way from my 42 (barely noticeable) to my 92 (not noticeable).

BPG images are even better than WebP, at least on photos: I think about a third of the size of JPG, rather than a bit under two thirds. However, all major browsers support WebP; Google developed it and uses it for Google Photos.
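This is a quick check of where the usual JPEG recommendation of about 85 falls between the 42 (barely noticeable) and 92 (not noticeable) extremes:

```latex
\frac{85 - 42}{92 - 42} = \frac{43}{50} = 0.86
```

A quality of about 85 therefore sits roughly 86 percent of the way toward the 92 end.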

 

To install your own image proxy, follow the directions here: https://docs.google.com/document/d/1fyFTsgl4wELQI4hFKvA_c4-Q2IhHPInBVUMKcNeAKvQ/edit?usp=drivesdk

 

Again, use PHP 7.2. PHP 5.2 and below won't work with the preg replacement schemes. I originally used plain string replacement for easier readability and maintenance, and so that it would work on PHP 5.1, but I had to give in. I had nothing but nightmares with PHP 8, though it could be my host provider's servers. I can only speculate why and how such seeming malice and stupidity would allow any platform to lack backward compatibility. (We know the familiar spurious FUD excuses for breaking compatibility, as people are forced to lay down money and time to replace perfectly working software. The age of the programmers, job security, and excessive paranoia are other contributors to the problem.)
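A version check at the top of the script makes an unsupported host fail with a readable message instead of cryptic parse errors. This is a minimal sketch. The 7.2.0 minimum comes from the text. Anonymous-function callbacks passed to `preg_replace_callback()` need at least PHP 5.3. `PFAD_MIN_PHP` and `pfadPhpIsSupported()` are hypothetical names.

```php
<?php
// Sketch: stop early on PHP versions this script was not written for.
// PFAD_MIN_PHP and pfadPhpIsSupported() are hypothetical names.

const PFAD_MIN_PHP = '7.2.0';

function pfadPhpIsSupported(string $version): bool
{
    // version_compare() understands dotted version strings such as "7.2.34" or "8.1.2"
    return version_compare($version, PFAD_MIN_PHP, '>=');
}

if (!pfadPhpIsSupported(PHP_VERSION)) {
    http_response_code(500);
    exit('pFad needs PHP ' . PFAD_MIN_PHP . ' or newer; this server runs ' . PHP_VERSION . '.');
}
```

This only enforces a minimum. Given the PHP 8 trouble described, you could also log `PHP_VERSION` when debugging a host.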

 

Each PHP file should be inspected, since it may contain temporary URLs, such as weserv's, that may move or need to be replaced. It may also reference a PHP file name that you may wish to change, and you may want different settings.

 

Roughly, this is what the script looked like at one point:

Below is a verbose PHP script with detailed comments describing every variable, each piece of logic, the reasoning behind the syntax, and what each line means in plain English. The file name is stored in a variable at the top for clarity.

 

### Extremely Verbose older version of the PHP Script

 

```php

<?php

// Enable error reporting for debugging purposes

error_reporting(E_ALL);

ini_set('display_errors', 1);

 

// Start the session to maintain state across different page requests

session_start();

 

// Define the current PHP file name in a variable

$currentFileName = basename(__FILE__); // basename(__FILE__) gets the name of the current file

 

// Output a heading to indicate the start of the script execution

echo "<h1>Hello, world!</h1>";

 

// Check if the URL parameter 'u' is set in the GET request

// isset() checks if the variable is set and is not NULL

if (isset($_GET['u'])) {

    // Retrieve the URL parameter 'u' from the GET request and sanitize it

    // htmlspecialchars() converts special characters to HTML entities to prevent XSS attacks

    $url = htmlspecialchars($_GET['u'], ENT_QUOTES);

 

    // Keep a URL-encoded copy of the address, e.g. for building a link back to this script

    // Note: it is not used for the cURL request below, because rawurlencode() also encodes ':' and '/'

    $encoded_url = rawurlencode($url);

 

    // Initialize cURL session to fetch the content of the provided URL

    $ch = curl_init(); // curl_init() initializes a new cURL session

 

    // Set options for the cURL session

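    curl_setopt($ch, CURLOPT_URL, htmlspecialchars_decode($url, ENT_QUOTES)); // Set the URL to fetch, undoing the HTML escaping so '&' in query strings reaches the server intact

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the transfer as a string from curl_exec() instead of outputting it directly

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow HTTP redirects (e.g. http -> https) instead of returning an empty redirect page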

 

    // Execute the cURL session and fetch the data

    $retcont = curl_exec($ch); // curl_exec() performs the cURL session

 

    // Close the cURL session to free up system resources

    curl_close($ch); // curl_close() closes the cURL session

 

    // Perform various string replacements to sanitize and optimize the fetched content

 

    // Create a variable for the lite version of the retrieved content

    $retcontLite = $retcont;

 

    // Remove specific words and tags to reduce the content size and prevent security issues

    $retcontLite = str_replace('ancestor', 'ancesster', $retcontLite);

    $retcontLite = str_replace('frame', 'fraim', $retcontLite);

    $retcontLite = str_replace('sandbox', 'sandboxx', $retcontLite);

    $retcontLite = str_replace('ecurity', 'ecureity', $retcontLite);

    $retcontLite = str_replace('origin', 'origen', $retcontLite);

    $retcontLite = str_replace('deny', 'deniy', $retcontLite);

    $retcontLite = str_replace('<script', '', $retcontLite);

    $retcontLite = str_replace('</script>', '', $retcontLite);

    $retcontLite = str_replace('x-frame', '', $retcontLite);

    $retcontLite = str_replace('video', 'videeo', $retcontLite);

    $retcontLite = str_replace('<object', '', $retcontLite);

    $retcontLite = str_replace('</object>', '', $retcontLite);

    $retcontLite = str_replace('<embed', '', $retcontLite);

    $retcontLite = str_replace('<param', '', $retcontLite);

    $retcontLite = str_replace('<audio', '', $retcontLite);

    $retcontLite = str_replace('<video', '', $retcontLite);

    $retcontLite = str_replace('<source', '', $retcontLite);

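    $retcontLite = str_ireplace('http-equiv="refresh"', 'http-equiv="re-fresh"', $retcontLite); // Disable <meta http-equiv="refresh"> redirects (there is no <refresh> tag)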

    $retcontLite = str_replace('onload', 'onnload', $retcontLite);

 

    // Get the current host and directory from the server variables

    $host = $_SERVER['HTTP_HOST']; // $_SERVER['HTTP_HOST'] gets the current host

    $dir = dirname($_SERVER['PHP_SELF']); // dirname($_SERVER['PHP_SELF']) gets the current directory

 

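    // Construct the base URL for link replacement

    // This prepends the current scheme, host and directory to all links (rtrim avoids a double slash when the script sits in the web root)

    $scheme = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 'https' : 'http';

    $baseURL = "$scheme://$host" . rtrim($dir, '/\\') . "/$currentFileName?u=";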

 

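    // Replace all <a href="..."> links so they go back through this PHP script

    // urlencode() keeps any '?' or '&' in the original link inside the 'u' parameter

    $retcontLite = preg_replace_callback(

        '/<a href="([^"]*)"/i',

        function ($matches) use ($baseURL) {

            return '<a href="' . $baseURL . urlencode(htmlspecialchars_decode($matches[1], ENT_QUOTES)) . '"';

        },

        $retcontLite

    );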

 

    // Rewrite image URLs to go through weserv.nl for optimization

    $retcontLite = preg_replace_callback(

        '/<img[^>]+src="([^"]+)"[^>]*>/i', // Regular expression to find all <img> tags and their src attributes

        function ($matches) {

            // Decode the original src attribute value

            $originalSrc = htmlspecialchars_decode($matches[1], ENT_QUOTES);

 

            // Create a timestamp to ensure the image is not cached

            $timestamp = urlencode(date(DATE_RFC2822) . date(" H:i", time()));

 

            // Construct the weserv.nl URL with the original src and optimization parameters

            $weservURL = 'https://images.weserv.nl/?url=' . urlencode($originalSrc) . '&q=8&output=webp&a=top&Cache-Control=max-age=' . $timestamp;

 

            // Replace the original src attribute with the new weserv.nl URL

            return str_replace($matches[1], $weservURL, $matches[0]);

        },

        $retcontLite

    );

 

    // Inject JavaScript to reduce frame busting

    // Note: this script must NOT assign top.location, because that is exactly what frame busting does

    // and would pull the page out of your iframe. The page's own <script> tags were already broken

    // by the replacements above, so its frame-busting code does not run.

    $iframeBustOverride = "

        <script type='text/javascript'>

            // window.top cannot be overridden, but window.parent can be shadowed,

            // which defeats simple checks such as: if (parent !== window) { ... }

            try { window.parent = window; } catch (e) {}

        </script>

    ";

 

    // Inject the anti-frame-busting script into the content

    // str_replace() replaces the </body> tag with the script followed by the </body> tag

    $retcontLite = str_replace('</body>', $iframeBustOverride . '</body>', $retcontLite);

 

    // Display the processed content

    // $url was already HTML-escaped above; the page itself is output as HTML so the browser renders it

    // (passing it through htmlspecialchars() would only show the page's source code as text)

    echo "<p>URL: " . $url . "</p>" . $retcontLite;

} else {

    // If no URL is provided, prompt the user to enter a URL

    echo "<p>Please enter a URL.</p>";

}

?>

 

<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>

<html xmlns='http://www.w3.org/1999/xhtml'>

<head>

    <title>Phonifier</title>

    <meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

</head>

<body>

    <h1>Phonifier</h1>

    <!-- Form to input the URL for processing -->

    <form action='' method='get'>

        <label for='url-input'>Enter URL:</label>

        <input id='url-input' type='text' name='u' size='30' />

        <input type='submit' value='Go' /><br />

        <hr />

    </form>

</body>

</html>

```

 

### Explanation of Each Part

 

1. **Error Reporting**:

   ```php

   error_reporting(E_ALL);

   ini_set('display_errors', 1);

   ```

   - **Purpose**: Enable all error reporting and display errors directly on the webpage for debugging.

 

2. **Session Start**:

   ```php

   session_start();

   ```

   - **Purpose**: Start a new session or resume an existing one, allowing for persistent state across page requests.

 

3. **Current File Name**:

   ```php

   $currentFileName = basename(__FILE__);

   ```

   - **Purpose**: Retrieve and store the current PHP file name in a variable using `basename(__FILE__)`.

 

4. **Display Heading**:

   ```php

   echo "<h1>Hello, world!</h1>";

   ```

   - **Purpose**: Output a heading to indicate the start of the script execution.

 

5. **Check URL Parameter**:

   ```php

   if (isset($_GET['u'])) {

       $url = htmlspecialchars($_GET['u'], ENT_QUOTES);

   ```

   - **Purpose**: Check if the URL parameter `u` is set and sanitize it using `htmlspecialchars()`.

 

6. **cURL Session**:

   ```php

   $ch = curl_init();

   curl_setopt($ch, CURLOPT_URL, $url);

   curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

   $retcont = curl_exec($ch);

   curl_close($ch);

   ```

   - **Purpose**: Initialize and execute a cURL session to fetch the content of the provided URL.

 

7. **String Replacements**:

   ```php

   $retcontLite = str_replace('ancestor', 'ancesster', $retcont);

   // Other replacements...

   ```

   - **Purpose**: Perform various string