IMO it'd be a really smart move for Google to hire Vitaly to help with the launch of this feature and everything around it. He has done a great job with PhantomJS.
Even an acquisition[1] of PhantomJS would totally make sense, then let him keep working on it but based on headless Chrome and with real resources.
I played around with PhantomJS for something at my current job - it ultimately didn't work out for us (and we went a different route), but it was interesting and fun to learn about.
Yeah, I had the same thought. Just because you have a ton of experience that suggests you'd be good at doing a job doesn't mean a tech company would do something silly like hire you for that job.
Wow, I'm very impressed. At this stage it is a very wise decision to step down and to focus on something else, rather than to hold on to a project that will eventually disappear. It takes a lot of courage to move on from a project that had to be maintained for several years and that had such a reach.
We can only be thankful for all the good work that went into PhantomJS, and wish the maintainers the best of luck in their next endeavors.
Some context: he's essentially been maintaining a web browser (a project on the level of an operating system) on his own.
Phantom 2 switched to QtWebKit, which I'm sure was a tremendous amount of work. At the end of that he was probably hoping things would get "easier", and it sounds like they haven't. It's just too much work for one person, and if companies aren't willing to pay people to do it, I'd quit too.
He says in his message it's been a slog for some time, looks like a good time to be done with it. Open source is great, we all like it, but demanding unpaid jobs can get old, too.
And it makes sense. You want Chrome in your tests if your users are using Chrome. Very few (if any) of your users will ever visit your app with a headless PhantomJS browser, so it's not a platform that you should go out of your way to support.
I've been using Selenium Driver with Chrome in xvfb for my headless testing needs, and I've used PhantomJS for some automation things in the past where it was great, but since I switched I really haven't looked back!
I had things breaking subtly that I couldn't fix, and they did not manifest as problems in the Chrome browser or Selenium. I still don't know what was wrong, I just know that my Rails app won't pass its JavaScript functional tests if I use PhantomJS. When I did the evaluation of 3 test drivers, I found that the one with the actual browser in it was the one that worked most reliably.
Thanks so much to the PhantomJS maintainer for his hard work over the years! To me, it feels like his decision is the correct one here.
After we realised that we hadn't seen a "one-browser" bug in 2 years in our massive AngularJS app, we got rid of all browsers but PhantomJS in our Karma suite. PhantomJS's slowness, its lag behind web standards, and my general gut feeling (the facts above made me question the point of running JavaScript tests in an actual browser at all) eventually made me port our Karma test suite to Jest with jsdom, and I haven't been happier since we, years prior, got rid of our gnarly Selenium test suite that caught 0 bugs but was the major cause of maintenance headaches.
I actually started using WebdriverIO + ChromeDriver after fighting too much with CasperJS. While WebDriver (and Selenium) seem to have much more momentum, there are still some things I really miss that PhantomJS gave me when using Capybara. I was a very happy Capy+Phantom user when testing my Rails apps.
Things like reading the HTTP response code, detecting 404s on assets and catching JS errors in the console are all impossible with Selenium/WebDriver, and I relied heavily on those capabilities in my Capybara tests.
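For anyone who hasn't seen them, those capabilities are plain page callbacks in the PhantomJS API. A minimal sketch (the URL is a placeholder):

    var page = require('webpage').create();

    // Fires for every resource the page loads; lets you spot 404s on assets.
    page.onResourceReceived = function (response) {
        if (response.stage === 'end' && response.status === 404) {
            console.log('404: ' + response.url);
        }
    };

    // Uncaught JS exceptions from the page surface here.
    page.onError = function (msg, trace) {
        console.log('Page error: ' + msg);
    };

    // And console output from the page, if you want that too.
    page.onConsoleMessage = function (msg) {
        console.log('Console: ' + msg);
    };

    page.open('http://example.com/', function (status) {
        console.log('Load finished: ' + status);
        phantom.exit();
    });

None of this maps cleanly onto the WebDriver protocol, which is exactly the gap being described here.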
While headless chrome might be able to replace PhantomJS for many use cases that doesn't necessarily mean the APIs will be comparable. In fact I'd more likely expect the Chrome folks to say "the webdriver API is it, because it's a standard." [1] Sadly IMO it's lacking compared to what PhantomJS was capable of.
You can use PhantomJS as the backend on a Selenium script, and this news clearly demonstrates the utility of a higher-level API than using PhantomJS directly. If your tests are in Selenium, changing backends is generally a small matter.
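To make that concrete, with the Selenium Node bindings the backend is a single string in the builder (a sketch; PhantomJS's WebDriver support came via GhostDriver):

    var webdriver = require('selenium-webdriver');

    // Tests written against the WebDriver API don't care what's behind it.
    var driver = new webdriver.Builder().forBrowser('phantomjs').build();

    // Migrating off PhantomJS is then, ideally, a one-line change:
    // var driver = new webdriver.Builder().forBrowser('chrome').build();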
I started writing in Selenium instead of CasperJS (a PhantomJS frontend) because PhantomJS experienced intermittent bugs on the page I was trying to access. I think you're right that "real browsers" are still much more reliable for complex use cases, but the low profile of PhantomJS is definitely nice when it works.
Open source software has to be one of the least efficient markets out there.
If you sum up the very real value PhantomJS has delivered to very real companies over the last several years, napkin math tells me it shouldn't end with the project being abandoned for being a "bloody hell" to work on.
The main problem for developers is that getting your company to pay for something like developer tools can be very hard and a long process. Ideally every team gets a budget and a credit card to use as they see fit, but in practice a lot of (especially bigger) companies have a whole acquisition process. It's not uncommon for the hours spent getting a license for e.g. an editor to cost far more than the product itself. This tends to get highlighted whenever Sublime Text is in the news.
This as opposed to open source tooling which has no such hurdles.
Why do you think open source tooling has no such hurdles? At these same "larger" companies, they're going to have an open source review board, and legal involved for each open source product you want to use.
Just because you can download and use it on your local machine, doesn't mean you're not violating your corp policy and procedure.
What's more surprising is that the company I work for does have a support agreement with Oracle, yet when we've run into problems that genuinely need an Oracle technical expert, such as crash dumps and stack traces, we're told to figure it out ourselves.
Kind of defeats the whole reason to have a closed source system when you're still on the hook and charging out $500 per hour to clients.
This problem was well demonstrated at my shop the other day. There's a lot of fear of open source here. We needed to run a simple FTP process. I said I'd write a short script.
"You've gotta be careful with those free tools though... Never know what will happen when they break. You can't get any support. Besides, we need logging and alerting too."
"Oh, $currentTool does logging and alerting?"
"Well, yes... But it's currently not working. We've got a ticket in to fix it."
That was months ago. Yesterday it broke and there was no alert. There's also still no logs.
Gotta be careful about that "open source" stuff though.
Open source doesn't allow you to pass the buck. Commercial software with support contracts does. There is also the "look how much we're paying! it must be good" factor.
A good way to address this is have a tools budget.
The product manager gets $X per team member and is encouraged to coordinate and pool purchases together for licenses. Licenses can be proprietary, or open source under approved licenses X, Y and Z.
It's a classic "public good" - an economic activity whose benefits are impractical or unworthwhile to deny to those who don't pay. Things like emergency services, last-mile road infrastructure, and environmental protection work. A special-case of positive externalities. These are a classic example used by economists of a natural place for government in the economy.
At some point the falsifiable hypothesis "government cannot provide goods or services better than private actors" mutated into the dogma "government shall not provide goods or services better than private actors."
The Dutch government has explicitly funded quite a few projects, e.g. LibreSSL and LibreOffice, but compared to budgets for closed source it's still very little.
Here's a nice blog about how the UK deals with this.
As for infrastructure work, the easiest way to help and still profit is to hire one or more developers who explicitly work on a set of FOSS libraries. That way you have that knowledge in-house and a connection into the community. You'll also have some highly motivated employees.
I think that in the US, we've been fighting for so long about how much to cut government that the idea of proposing a new category of spending just doesn't occur to Democrats. The closest they can come to imagining something new is free college (which is the same as what we have now [free high school, college scholarships] but more so) and free daycare (which is like free preschool but younger).
You're kind of in a bubble: there is a lot of U.S. government support for open source, and most agencies use it. See https://code.gov for a starting point. There's never going to be a huge multi-billion-dollar grant program, because it would be unfair to closed-source companies (who pay taxes), but you are imagining a debate or hangup that doesn't exist.
Governments do fund some things directly – most commonly grants to specific interests – but one other area which helps is allowing staff to work on open-source projects. At least in the United States civil servants’ work is generally considered public domain so we don't have to deal with the IP concerns which many companies still obsess over, which is nice.
If you poke around https://government.github.com/community you'll find a lot of government-created projects, but checking those organizations/contributors will often turn up a ton of forks of popular tools. One common thing is improving security defaults or accessibility, which are tedious but mandatory for government.
If you value this, make sure to let your elected representatives know: I'm sure they hear from the major contractors regularly.
The question is reversed: open source software _is_ a positive externality, but happened almost always _without_ any government involvement (save for some rare exceptions like SELinux).
This is probably the reason it worked so well.
So, do we necessarily need governments for other positive externalities in the list?
> It's a classic "public good" - an economic activity whose benefits are impractical or unworthwhile to deny to those who don't pay.
Open source software is not a public good, in the economic sense. There are two criteria for being a public good: non-rivalry and non-excludability. Open-source software satisfies the first criterion (my using it doesn't prevent you from using it), but it's fairly excludable (I can prevent you from using it legally).
As developers, our instinct might tell us that it's not excludable because "if the source is there, nothing prevents me from using it", but when we're talking about goods which fall under copyright law, the legal aspect matters as well as the practicality. And in fact, open-source licenses (such as the GPL and the Apache licenses) can contain provisions which prevent people from using the licensed software under certain circumstances, while still being considered both free and open-source by the FSF and OSI respectively[0].
The real classic example of a public good is national security. Practically, there is literally no way that national security can be applied to people within a country on an individual basis, as opposed to a geographic one. For most threat models (e.g. espionage, (counter-)terrorism), the mitigations are things like "prevent terrorist attacks from happening". You can't apply the benefits of that only to people who have paid for the service - a terrorist attack either happens or it doesn't, and you can't choose who's a victim of it.
> These are a classic example used by economists of a natural place for government in the economy.
Even for things which are actually public goods, like national security, that's overstating the case greatly. Public goods are used as an example of a good for which an individual market cannot exist, but that doesn't mean that the only alternative is a government one.
The so-called "tragedy of the commons" is an appropriate (and ironic) example - despite the way that most people use the term, the town commons was actually something for which there were plenty of well-established codified rights, and these were not always negotiated or enforced by a government entity.
[0] For example, the Apache license contains a patent retaliation clause, which terminates your right to use the software in the event of a patent lawsuit. (Technically it doesn't revoke your right to the copyrighted code, but it does revoke your right to the underlying patents, which amounts to the same thing, because presumably the copyrighted code utilizes the underlying patents, or else it wouldn't be covered by the license in the first place).
No, companies that make billions of dollars in an ecosystem that benefits from said tools and don't pay a dime for the public good are to blame. It's not a problem with open source, it's a problem with a culture that takes and doesn't give back enough.
> Open source software has to be one of the least efficient markets out there.
The regular market rules don't really apply to Open source software. A lot of viable (or even thriving) open source projects would be dismal failures as stand-alone businesses or startups. Paradoxically, the only way they can provide real value is in their current form of open source projects.
I would have given CyanogenMod as an example, but the amount of inept management at the startup there would cloud the issue.
Yeah, I really think that open-source software shot itself in the foot by incorporating unlimited free sharing for every recipient into its mantra. Now everyone thinks that open-source has to mean impoverished, because despite all the happy vibes, very few people will pay for software that they could otherwise get for free.
You can make your software "source available", i.e., not open-source under activist definitions but still have a GitHub repo and all that, and restrict [heavy?] commercial use. I think it'd be interesting to see more open-source devs take that route and stop giving away the farm.
This will still allow people to use your stuff, developers will get familiar with the tooling and expect to be able to use it at work, and companies that have the dough can be compelled to pony up for a license.
On Windows, there is still an underappreciated market for cheap early-90s-shareware-style applications that are < $100 a pop, but I think most of them think that sharing the source means they have to enter the poor house, which is sad. We should show people that there's a way to share your source without bankrupting yourself in the meantime.
The GPL almost gets there, as it makes large-scale commercial use undesirable due to its infectious nature, which allows for dual-licensing, but with everything server-side nowadays, those stipulations are much less effective (have to go AGPL).
There's no shared-source license which bans commercial usage, but I think the software industry desperately needs one. I'm selling the software and want to enable users to make modifications for private use, and even share those modifications if they so desire; today that means I have to hire a lawyer and hope he comes up with something that stands the test of a trial.
Not giving away something for free (as in freedom) but asking for free legal advice may sound ironic but it's not about the money, it's about having something reliable for a very common use case.
Curious what "moloch" means in this context. I can google it, of course, but I get "Biblical name relating to a Canaanite god associated with child sacrifice". Which doesn't help me much. End users are children that Google is sacrificing? Or?
To give a shorter answer: Scott Alexander (at the slatestarcodex link), through the poem, associates Moloch with negative-sum games, where no one comes out better than they went in. In extreme cases they force us to sacrifice the things we love in order to survive. You throw your children to Moloch to help you defeat enemies; otherwise, you die. Your enemies do the same thing. It would be better if nobody sacrificed their children, but nobody is in a position to bring that outcome to pass.
In this context, I would interpret "google's Moloch" along the lines of: Google is net-bad for the world, because of privacy issues and problems with centralisation and so on. Using Google's software (and services) makes them more powerful, so people don't want to use Google's software. But because everyone else is using Google's software, the world is optimized for Google users in a way that it isn't optimized for non-Google users, and so it's difficult to escape. And so Google grows yet stronger, and it becomes more difficult to escape.
(To clarify: this is my interpretation of grandparent's use of the phrase. It's not my own position, and there's a decent chance that I'm completely off-base and it's got nothing to do with grandparent's position either.)
And while people wait, you can already do a "poor man's" headless Firefox thanks to SlimerJS and xvfb.
PhantomJS is less resource-heavy if you're constantly spooling up and down lots of instances, but I prefer SlimerJS with Firefox since it lets you keep up to date with a modern version of Firefox (rather than relying on sporadic QtWebKit updates from PhantomJS).
If you're using CasperJS, SlimerJS is virtually a drop-in replacement for PhantomJS (though I worry about how long/well CasperJS will continue to be maintained).
One can use Xvfb to run normal Chrome as well (with a few gotchas like --no-sandbox, --disable-gpu and a dependency on dbus-x11). A test I'm working on at the moment takes 28 seconds in PhantomJS and 6 in Chrome under Docker and Xvfb.
PhantomJS enabled us to bootstrap a big project at work where, at the end of the workflow, the app had to turn HTML orders into PDFs on the fly. We eventually moved to wkhtmltopdf (https://wkhtmltopdf.org/), which is much less resource-hungry, but PhantomJS nonetheless played a huge role during the early days of the project and was easy to set up. If I remember correctly, the only downside was finding the correct format for our HTML template so PhantomJS would render proper page breaks and repeat the header for super long orders.
I can understand why stepping down is the right decision, maintaining such project by himself is an amazing feat on its own and even more when it proves to be useful for so many companies. Sadly when it becomes your second job you might always be on the lookout for a clean exit and such opportunity just became a reality.
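For anyone curious, the PhantomJS side of that kind of workflow is only a few lines. A rough sketch (URL, sizes and header markup are invented); the repeating header is done through paperSize.header:

    var page = require('webpage').create();

    page.paperSize = {
        format: 'A4',
        orientation: 'portrait',
        margin: '1cm',
        // Rendered at the top of every page of the PDF.
        header: {
            height: '1cm',
            contents: phantom.callback(function (pageNum, numPages) {
                return '<div>Order, page ' + pageNum + ' of ' + numPages + '</div>';
            })
        }
    };

    page.open('http://example.com/orders/42', function (status) {
        page.render('order.pdf'); // the .pdf extension selects PDF output
        phantom.exit();
    });

Page breaks inside the document are then controlled from the template's CSS (e.g. page-break-before), which is presumably the part that took tuning.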
We have used both PhantomJS and wkhtmltopdf. PhantomJS hogs a lot of resources but is very good when you want to print large PDFs (500+ pages); wkhtmltopdf struggles with larger HTML documents.
I did too, then found out you also needed a dev license for users to be able to run your app. Supporting Mac/OSX is damn expensive if your app is free.
For iOS development I believe they got rid of that requirement, you can develop iOS apps (at least) using a personal dev account; only when you want to go to the app store do they ask for the license fee.
Is there any way to donate to the PhantomJS project? This seems like a good time to throw some money their way, in thanks for what the maintainers (mostly Vitaly over the past few months, at least, it looks like) have done.
To me, it's always a sad occasion to see diversity diminished. Nothing against Chromium, but I hope it won't be the one browser to rule them all. It's always good to have alternatives.
> Chrome is faster and more stable than PhantomJS. And it doesn't eat memory like crazy.
Wait, can someone tell me where to download this doesn't-eat-memory-like-crazy version of Chrome? Activity Monitor is showing me 2GB of Chrome processes right now and that's even with The Great Suspender having paused almost all my tabs.
I saw a trick where you can run Chrome and give it less memory, and it uses less memory. This is done using cgroups.
The blog post is somewhat old[1] and not in sync with the version that is in Git[2]; you might find a way to do this without Docker. (I was using an old version of Docker and a kernel with which I couldn't get it to work, but I need the old version of Docker for reasons.)
Chrome will aggressively consume any memory you give it (up to a point?) to "make your browsing experience better" somehow. You're not wrong. But there is modern technology that can make it better. If you have a fast SSD, then Chrome can still use Swap to make your experience better. The later version in the Dockerfile linked on Git also leverages swapaccount with the seccomp setting.
This may be one great use of Docker for people that wouldn't yet have been convinced to use Docker for any serious reason.
I'm not sure that the results should be expected to be the same in GUI and in headless mode. I don't know - I'm just saying this is not clear without a test or clarification from someone who knows how it all works.
Many thanks to the maintainer for his work. I think this isn't unexpected, and I'd actually encourage other unpaid maintainers to follow. The reason I think this is that the current state of voluntary support is unsustainable anyway, and by letting it go we might make the market for dev tools economically viable again.
It's a very good question. One might imagine that the browser renders everything into a buffer at some point, and you could simply ask the engine to give you a pointer to that data.
The reality is very different. WebKit/Blink rendering is intimately tied with the graphics system of each platform, in particular through the use of native widgets and native window system compositors.
For example, on the Mac, a lot of compositing within the browser window is done using Core Animation layers. This is a really good idea for performance, because it leverages the work done by Apple to improve their GUI performance.
The downside is that capturing the output becomes very tricky when the browser doesn't do the final compositing. Previously this didn't really matter because 99.99% of browser rendering is for end users and they don't need to capture the output (or if they do, they would just use platform GUI functionality like screen capture).
An increasing demand for headless rendering has effectively forced browser engine teams to rethink some of the internal APIs so that a pipeline can be built to capture the final rendering.
It's the same order of magnitude of work as maintaining a fully-fledged browser. It's sad to see open-source projects shut down, but being a sole developer is a lot different from having the resources of a giant like Google, for example.
Question for those of you more involved with such headless tasks: do you think Chromium and Firefox supporting headless mode will induce a surge in bots crawling the open web from now on?
Right off the bat: No. The reason for that is that crawling using a proper browser (i.e. Chrome) is a lot more resource-intensive than using a dedicated tool which only gets the top resource and maybe tries to parse some additional resources. With these kinds of tools you're limited by available bandwidth and IO speeds if you want to store things. If you're looking at a browser, you'll be limited more by things like memory consumption and CPU time, so you'd need a bigger box or more of them to drive the same amount of traffic. There is also not the same amount of ready made applications which take care of crawling, storing and maybe even indexing your data, so not something you can do without actually implementing a lot of things yourself.
Of course, that is only talking about wide-scale scanning. If you're only looking to scrape a single target, for whatever reason, then having an instrumented headless browser will greatly simplify things. Headless Chrome should be more efficient than running it in a (virtual) framebuffer. Plus the whole setup for a powerful crawler is reduced to "install Chrome, start it, point $crawler at the API endpoint". My guess is that we might see turnkey crawling / automation tools appear where you supply a list of URLs and the library + Chrome does the rest. Then, browser-based large scale scanning will be within everyone's reach, only limited by their resources.
Background: I created https://urlscan.io which will simply visit a website and record HTTP interactions (annotating the resources with some helpful meta data). I've been preaching the power of headless instrumented browsers for the better part of a year now ;)
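For a sense of how low the barrier already is: with Chrome started with --remote-debugging-port=9222, a single visit-and-log step via the chrome-remote-interface npm package looks roughly like this (URL invented):

    const CDP = require('chrome-remote-interface');

    CDP(async (client) => {
        const { Network, Page } = client;
        // Log every request the page makes, assets included.
        Network.requestWillBeSent((params) => {
            console.log(params.request.url);
        });
        await Network.enable();
        await Page.enable();
        await Page.navigate({ url: 'https://example.com/' });
        await Page.loadEventFired();
        await client.close();
    }).on('error', (err) => {
        console.error('Cannot connect to Chrome:', err);
    });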
I am happy for the guy, as he seems to be able to let go without letting anybody down (which seems to be important to him). At the same time, it is sad when people come under such pressure from something they probably started as a fun project.
Thanks Vitaly for your work! SEO4Ajax would certainly not exist without PhantomJS. It helped us to deliver the service efficiently at the time.
Unfortunately, we had quite a few compatibility issues with it, leading us to migrate to Chrome (with xvfb) one year ago. Since then, we must confess that we are very happy with this choice. Chrome is indeed very stable, fast and, more importantly for us, always up-to-date.
That's true, but in my experience the instability of Chrome comes from opening and closing its windows repeatedly, as a large test suite often does; occasionally it doesn't seem to like opening a new instance while another is closing. Headless mode should resolve that problem.
There was also the issue a few years ago (last time I wrote automated scripts) where the Chrome driver for Selenium would go too fast for the browser to keep up, causing false failures.
I had to implement a "wait between actions" feature to handle it, while Phantom had no such problems. I'm assuming this will not be an issue with headless Chrome, since I think half of the problem was due to graphical rendering.
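For what it's worth, the usual fix for those races nowadays is an explicit wait rather than a fixed sleep. A sketch with the Selenium Node bindings (selector invented):

    const { Builder, By, until } = require('selenium-webdriver');

    (async () => {
        const driver = await new Builder().forBrowser('chrome').build();
        try {
            await driver.get('https://example.com/app');
            // Block until the element actually exists instead of racing the render.
            const button = await driver.wait(
                until.elementLocated(By.css('#submit')), 10000);
            await button.click();
        } finally {
            await driver.quit();
        }
    })();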
It's both sad to see an incredibly useful project be sunsetted and exciting that it's no longer needed. I remember a project that used PhantomJS to scrape an old government camping site to build a compatibility layer on top.
Besides testing, my team uses PhantomJS to convert pages to PDF and to convert JavaScript-generated charts to images. This is sad news; I don't think Chrome will eventually add support for those use cases.
While it's much better than PhantomJS, it still causes a lot of issues. We run a paid pet project [1] for HTML-to-PDF conversion, and most of our customers have stories that begin with PhantomJS, then move to wkhtmltopdf, then finally go to something else due to issues with both.
Headless Chrome might solve this issue once and for all though.
And there's its sister project wkhtmltoimage for rendering to images as well :) Sadly, though, the browser seems way behind the times so I think Chrome will win the next round. I built a prototype library for rendering pages to animated GIFs using the Chrome debugging protocol, but headless will make it even easier: https://github.com/peterc/chrome2gif
We are using it for reports, and we are tailoring our HTML specifically to the reports, so we haven't had that issue. If I needed JavaScript it might not be the best solution.
I like that it generates full PDFs (with real text objects) though, not just a static image. I'm not sure if you can generate a PDF like that with PhantomJS. I haven't tried.
You can use WebDriver to take screenshots. OTOH I don't think there's any way to do PDF generation without fucking around with injecting window.print() and trying to go from there.
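The screenshot half, at least, is a one-liner in WebDriver. A sketch with the Node bindings:

    const fs = require('fs');
    const { Builder } = require('selenium-webdriver');

    (async () => {
        const driver = await new Builder().forBrowser('chrome').build();
        try {
            await driver.get('https://example.com/');
            // takeScreenshot() resolves to a base64-encoded PNG.
            const png = await driver.takeScreenshot();
            fs.writeFileSync('page.png', png, 'base64');
        } finally {
            await driver.quit();
        }
    })();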
Anybody know how to port code from PhantomJS to headless Chrome? I have been using CasperJS, which wraps PhantomJS. PhantomJS has its own set of commands; headless Chrome will have to be different one way or another.
This is sad. PhantomJS is better stripped down than headless Chromium: if you ever try to install Chromium on a server without X, it requires a shit ton of dependencies, while PhantomJS was properly pared down and requires only minimal libraries.
I think it's a wise decision too. As an OSS collaborator, it's hard to explain how important and demanding this work is. I really understand his feeling, and I hope that more people like him can collaborate on OSS projects. Thanks for everything!
PhantomJS is a great tool; I used it to build a PDF report generation system. But will Chromium be able to replace it in this regard? Will Chromium have paging features? Will it be able to repeat table headers when a table body's content extends to the next page?
Hey dude... thanks from a grateful dev in Scottsdale, AZ. Your hard work enabled a lot of really cool stuff for us! Good luck in your future adventures!
A great example would be PDF generation for things like invoices. Rather than generating a PDF with something like PHP or Java, render a regular HTML page with all the CSS you want (super easy compared to drawing a PDF in PHP) and then use a PDF printer on that page.
You could run such a thing as a microservice using a headless browser or PhantomJS. There are probably better ways to do this but that's one of the first things that popped into my head!
Because you probably already did the layout work in HTML to display on screen to the user and now just want a PDF version of it.
Or you can redo the layout in latex and maintain two layouts.
The full print css is actually pretty complete, problem is the only browser that fully supports it is PrinceXML. None of the major browsers seem to care much about print layout.
But HTML layout is very different to page based layout. HTML is responsive and has no concept of pagination. PDF is paginated and has no concept of responsiveness.
A few little personal things I've done with PhantomJS:
• A script that would go to Comcast's TV schedule for my area and make a list of all movies upcoming in the next two weeks on all channels that are included in my subscription. I could then grep that for a list of movies I've been looking for.
I couldn't just grab the page with curl and parse it, because JavaScript does most of the work. JavaScript fetches the listings, and when you advance the listings it fetches the new ones and replaces the old ones on the page.
• A script that goes to the FCC's license information site and gets a list of all ham radio callsigns issued recently [1].
• A script that given a URL to a tactics problem on lichess gets the FEN for the position. I'd use this if I was doing tactics training there on my iPad and did not understand why my answer was wrong or why their answer was right.
I'd mail myself a link to the problem, and then later on my desktop I'd give that URL to this script. It would go to the problem page on lichess, then from there to the board editor page for that position, and grab the FEN for me, which I could then use to set up the position in Stockfish for analysis.
(This is no longer useful. They have made some changes at lichess and now they have a browser-based version of Stockfish on the problem pages, so I can answer my questions right there).
• A script that goes to everquest.com and gets the server population levels from the server population display on that page.
I don't think that there was anything in this one that actually needed a headless browser. As far as I recall it could have all been done with getting the page with curl and parsing it. It was just easier to do it in JavaScript using the DOM. (The lichess one may also have been that way).
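All of these followed roughly the same pattern: load the page, give its JavaScript time to run, then read the rendered DOM. A generic sketch (URL, selector and delay are invented):

    var page = require('webpage').create();

    page.open('http://example.com/listings', function (status) {
        // Give the page's own JavaScript time to fetch and render its data.
        window.setTimeout(function () {
            var titles = page.evaluate(function () {
                // Runs inside the page; only serializable data comes back out.
                return [].map.call(
                    document.querySelectorAll('.listing-title'),
                    function (el) { return el.textContent; });
            });
            console.log(titles.join('\n'));
            phantom.exit();
        }, 3000);
    });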
Web scraping for sites that discourage simple robots by checking for JS or serving content via JS.
Also, the darker stuff like click fraud and all the other kinds of fraud where you pretend there are humans doing something when in fact there's just a bot.
At work I was given permission by a vendor to screen scrape their site while they worked on building a real API. This site was extremely dependent on javascript. Including doing some really complex token passing between multiple domains that the company owned. Not to mention all of their js was minified and uglified so I had a very hard time understanding what it was doing.
It was the first time I wasn't able to successfully reverse engineer a site enough to scrape what I needed with just requests/beautifulsoup. I was however able to get it working just fine using phantomjs via selenium via splinter. It was a fun exercise, but part of me still feels like it was cheating.
I wrote a PhantomJS script to download data from my bank accounts. They offer no API and dysfunctional text-based exports, and their websites are ridden with "good" (=terrible) web & security practices, like 3-characters-of-the-whole-password authentication, single-tab sessions, frames, etc., which makes it pretty much impossible to scrape with Python but relatively easy with a fully-fledged browser (although that still requires a lot of bank-specific boilerplate code).
That actually sounds like it could run you into legal issues (or worse), depending on your location (ie - access to a computer system without permission; they give you permission to use the app on the phone, but maybe not to use the API directly). YMMV.
I can't help you with Halifax or AMEX (yet), but my company (https://teller.io/) has a Natwest API in production (private beta). If you would like access, please ping me. sg -at- teller.io
We used it a lot for full automation tests for the UI. It's nice being able to interface with a full-featured browser that can run javascript, etc. And take screenshots when things go wrong.
We have a complex matrix of layouts and styles a user can choose from, and we need to test them across all browsers to make sure any improvement doesn't mess with the others.
It's way cheaper to launch a headless browser at all the resolutions we need and grab screenshots to compare visually at a glance, instead of going through them one by one by hand.
While we're on the topic, does anyone know where one might find scripts to scrape bank statements so you don't have to download them manually every month? (This is one thing I would find headless browsers useful for...)
It uses screenshots and OCR to automate web browsing and scraping, so you do not even have to "touch" the DOM. You simply draw a frame around the areas that you need extracted and OCR'ed. It also works with PDFs.
If your authentication is basic enough, it shouldn't be a problem to write one. The problem with a lot of banks is things like two-factor auth to do anything. If I didn't have a mortgage locked in at a crazily great rate, I'd consider changing banks just to get one that'll let me automate statement downloads. The alternative is to build a device to press buttons on their 2FA device... Come to think of it, that might be a fun hack.
2FA isn't my issue here. Actually downloading the statements is. Lots of banks use JS, some go through really weird hoops getting you the PDF that are difficult for non-experienced people to automate. Stuff like JS in embedded iframes that generate the link on the fly and open a new tab that you have to navigate. It's hard to accurately detect all the links and handle things like "Next Page" and so on, especially for more than one bank. It's quite nontrivial.
If you don't have 2FA issues, then while I agree it's non-trivial, it's certainly doable with a headless browser. But yes, I'd love for there to be simpler ways to do this in general.
They generally do. But many UK banks make it hard to automate things. Either insisting on 2FA in all cases, or having a secondary login without 2FA that only gives very limited access.
Some way of authorising API access to read-only access to things like statements would be fantastic, to the extent that I'd consider changing banks over it, if you know of any UK banks that offer it.
As well, most bank OFX/CSV exports I've dealt with are truncated in some way (e.g. truncated labels), which makes them harder to really leverage sometimes.
The format isn't the issue. The problem is I want it to be API-friendly so I don't even have to think about it; my system should download it automatically.
I have used them in the past to convert graphs and reports to JPGs and PDFs so that I can automatically email them to people in the company who are unable (or unwilling) to use a web page.
Selenium won't really be impacted, as it's higher-level than PhantomJS.
This is also a great example of why it's smart to use Selenium or something like it for scripting the tests. You can easily swap out for another backend in Selenium, but if you wrote tests in pure PhantomJS, you're now stuck with a codebase that depends on unmaintained software.
That sucks for any scraping use case. I have to imagine Google has built in some way to detect headless browser mode server-side, even if only they can access it.
This is a great move. I appreciate what Phantom (and the maintainer) were trying to do, but I have always loathed PhantomJS. It has never worked well. In fact, I'd been away from it for some time, but just last night needed to install it to run some tests and it caused massive frustration.
I pulled a Node repo that ran tests using Karma (why people use Karma is a complete mystery to me). I pulled the repo, ran `npm install` and then `npm test`. Sure enough Karma explodes out of the gate.
Phantom can't start. I'm on Windows 8.1. I debug for an hour, eventually finding a magic custom binary Ariya created. I then have to copy this binary to the `/node_modules/karma-phantom-launcher/node_modules/phantomjs2-ext/bin` directory.
All this to run some Jasmine specs.
If Chrome headless support is really as good as "works just like Chrome without the GUI" then I will be one happy camper.
[1] Careful how you spin it for this route, learn from TJ/Express https://medium.com/@tjholowaychuk/strongloop-express-40b8bcb...