The elusive universal Web bytecode (mozakai.blogspot.com)
64 points by espadrine on May 14, 2013 | hide | past | favorite | 60 comments


> The main problem is that we just don't know how to create a perfect "one bytecode to rule them all"

The prevalence of the assumption that there should be a single bytecode boggles me. Not all tasks are the same. In real-world computing an ATMega is sometimes a better choice than an ARM or an x86.

When Notch released the DCPU16 specification there were multiple compatible emulators running within 24 hours. After 3 days there were compilers. Supporting a bytecode can be easy. Optimizing for speed makes for a much more difficult task, but not all architectures must target the same goals.

The article lists the goals of Fast, Portable, and Safe. I would add to that list more goals. Deterministic and Efficient are two that spring to mind. I would advocate multiple bytecodes that favour some goals over others. The 8 bit AVR of the Arduino would be a good pick for a bytecode that has a light footprint that would be ideal for small tasks.

I wouldn't want a free-for-all with a massive proliferation of architectures, but at least three would mean there is a much better chance of having the right tool for the right job.


An ATMega is sometimes a better choice because it's cheaper and uses less power, not because it has a better instruction format. An 8-bit bytecode would be ideal for very little on a 64-bit computer (you'd end up simulating larger integer arithmetic that the machine can do in a single instruction, for no benefit). More generally, since the details of the target machine are going to vary anyway, there is no point in having multiple bytecodes with the same underlying memory and execution model that can be easily compiled between.
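As a rough, hypothetical sketch of the overhead in question: adding two 32-bit values on an 8-bit-style machine means chaining byte-wide adds with carry propagation, where a wider machine needs a single instruction.

```javascript
// Hypothetical illustration: a 32-bit add decomposed into four 8-bit
// adds with carry propagation, as an 8-bit architecture would require.
function add32via8(a, b) {
  let result = 0;
  let carry = 0;
  for (let i = 0; i < 4; i++) {
    const byteA = (a >>> (8 * i)) & 0xff; // extract byte i of each operand
    const byteB = (b >>> (8 * i)) & 0xff;
    const sum = byteA + byteB + carry;    // at most 0x1ff
    result |= (sum & 0xff) << (8 * i);    // keep the low byte
    carry = sum >> 8;                     // propagate the carry bit
  }
  return result >>> 0; // reinterpret as unsigned 32-bit
}
```

Four shifts, four masked adds, and carry bookkeeping for what a 64-bit machine does in one instruction; that is the kind of cost an 8-bit bytecode would impose on everything.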

Having a C-like bytecode and a Java-like bytecode and a Haskell-like bytecode would be more reasonable, but it would also be a huge pain for browser makers. One highly optimized VM is complex enough...


Having written assembly for x86, ARM, 8-bit AVR and indeed DCPU16, I disagree. Each performs particular functions better. More importantly, we should be able to disagree. You don't have to use my AVR-like bytecode and I don't have to use your Java-like bytecode.

Having a single "this works for me so it should work for everyone" is dumb. Like supporting PNG as your only image format.


I am ranting slightly outside the scope of my expertise, but I think that 'web bytecode' is putting lipstick on a pig. In fact, I think the entire web stack is upside down: it was intended to serve mostly static webpages, perhaps with a counter or a mouse-over effect, and it does that well. But starting with the browser: having a nice HTML parser, a DOM tree, etc. is nice to have, sometimes. Similarly with HTTP: it is a stateless protocol, which again is nice, except when you want state. And on the other side of the connection is the webserver, which today is mostly a glorified front end for a database.

So every time I think about the web, my sense of software design rebels. It should be constructed the other way around, with a nice VM on the client that contains a browser if it is supposed to present structured hypertext, that communicates with a server over TCP/IP without reinventing TCP atop HTTP, and a server that is actually tailored to whatever it is supposed to do. (Before anybody accuses me of advocating Java: I want all of this nicely implemented.) But unfortunately, it is probably a billion users too late to start again from scratch.


I've often had this thought. But I keep coming back to this idea: HTML and CSS are actually a really nice way to describe an interface, for a few reasons:

1. Reusable styles plus case-by-case overrides--very important in a world where graphic design is so central to what we do. To me, CSS beats your typical GUI builder any day.

2. Flow. HTML/CSS have a sophisticated model for automatically flowing text and sizing elements to fit. The content below gets pushed down by the content above, automatically and as much as necessary. Other GUI models have similar things, but I prefer the web's take on it.

3. URLs. These are a beautiful concept. A string that uniquely identifies a certain resource, be that resource a document, a certain screen in an interactive interface, a record, whatever.

Now, you might say you could build all these things into your ideal client. But if you did, what would you have? Seems to me you'd have a web browser.

I do agree with your argument about the network protocols, though. We started with TCP, then we put HTTP on top of it to transfer documents. Then we started sending lots of HTTP requests back and forth for interactivity, and even having the client poll the server as a hack for pushing. Like you said, it started to look like TCP on top of HTTP on top of TCP.

But now we have websockets, which in my opinion is a pretty lean and mean protocol on top of TCP. I like to think of it as regular TCP with a few conveniences like negotiation. Granted, it is not in use everywhere, and I still use the TCP-on-top-of-HTTP-on-top-of-TCP approach for most of my stuff, mostly out of technical conservatism. But I think it'll get there.

As to the statelessness of regular HTTP, it still has its place. Sometimes stateless connections can be very beneficial. Imagine a site like Wikipedia. You wouldn't want everyone who's currently reading a page to have an open socket. With HTTP, you open the socket, you transfer the document, and you close the socket. Done. It's unfortunate that we have to open a new socket for every image, CSS file, etc., but I also expect that situation to improve over time.


Actually I agree, we could do a lot worse than HTML, CSS and JS on the client, and more generally I think a lot of the developments of the Internet were really fortunate accidents. But there are a lot of potentially useful services which do not need a GUI, for example an rsync client. Or which would profit from a different metaphor than the newspaper-with-moving-pictures metaphor HTML imposes, for example games.


Check out Atlantis/Embassies; they've proposed a very similar thing: http://research.microsoft.com/apps/pubs/default.aspx?id=1799... (not sure why they call a lightweight VM a "pico-datacenter", though)

There may be a way to incrementally approach this architecture by reimplementing existing functionality as modules that run inside some kind of VM and then deprecate the old hardcoded version.


> There may be a way to incrementally approach this architecture by reimplementing existing functionality as modules that run inside some kind of VM and then deprecate the old hardcoded version.

Hopefully, but if the history of computing is any indication: Layers of abstraction can only be added, not removed.


> But unfortunately, it is probably a billion users too late to start again from scratch.

A potentially interesting way around this is to make the client open source and get it shipped by all the Linux distributions. That would give application developers an initial audience which would help solve the chicken-and-egg problem.


> Before anybody accuses me of advocating Java, I want all of this nicely implemented.

So... I take it that in your opinion the reasons why Java (and to a lesser extent Flash) didn't supersede the web have to do with its implementation flaws, not its concept?


That would be my take, anyway. A big issue with Java is that it required installing a huge standard library; that should just be downloaded as required. And Flash was never really intended for applications; it was always about "rich content". AIR (the application framework built on Flash) has the problem that it's proprietary, so you have to trust Adobe.


I think that the evolution from 1998 hypertext to today's web apps makes sense, in that every step along the way was reasonable. And by the time one could actually leverage the advantages of a VM concept, the browser was the entrenched incumbent.


I am reminded of "Worse Is Better" [1]:

  The good news is that in 1995 we will have a good operating system and
  programming language; the bad news is that they will be Unix and C++.
Javascript seems to exhibit much of Gabriel's "New Jersey approach".

[1] http://www.jwz.org/doc/worse-is-better.html


I like asm.js and have used it (http://wry.me/hacking/Turing-Drawings/ ). But I understand the basic case for 'web bytecode' to be this: software fault isolation and portable low-level distribution formats have both been demonstrated showing considerably less overhead than the roughly 2x of current asm.js, going back to the 90s (e.g. http://www.eecs.harvard.edu/~greg/cs255sp2004/wahbe93efficie... and http://en.wikipedia.org/wiki/Architecture_Neutral_Distributi... ). asm.js will improve, and it has a great adoption path, but it hasn't yet been shown to run as fast as that old work claimed to have done.


To be fair, the 2x figure was from a few months ago, and was from the very first prototype of OdinMonkey (asm.js optimizations) in Firefox. Things have improved since then and will continue to do so, see

http://arewefastyet.com/#machine=12&view=breakdown&s... http://arewefastyet.com/#machine=12&view=breakdown&s...

for more current numbers. Many are better than 2x slower than native.

The first prototype was basically a few months of work by 1 engineer specifically on OdinMonkey, building on a few years of work on the more general IonMonkey optimizing compiler. Those are far far smaller amounts of time than have been spent on compilers like gcc and clang, so it is not surprising there is a performance difference. But it will get much smaller.
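For a concrete picture of what "structured for a JS engine to optimize" means, here is a minimal hypothetical asm.js-style module (not taken from the benchmarks above, and a real validated module has further requirements, e.g. around the heap). The "|0" coercions are what let the engine treat every value as a 32-bit integer:

```javascript
function AsmAddModule(stdlib, foreign, heap) {
  "use asm"; // tells the engine this module follows the asm.js rules
  function add(a, b) {
    a = a | 0;          // parameter type annotation: signed 32-bit int
    b = b | 0;
    return (a + b) | 0; // the result is coerced back to int as well
  }
  return { add: add };
}
```

Because it is still plain JavaScript, engines without asm.js support just run it normally: AsmAddModule(globalThis, {}, null).add(2, 3) returns 5 either way.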


> It turns out that C++ compiled to JavaScript can run at about half the speed of native code, which in some cases outperforms Java, and is expected to get better still. Those numbers are when using the asm.js subset of JavaScript, which basically structures the compiler output into something that is easier for a JS engine to optimize. It's still JavaScript, so it runs everywhere and has full backwards compatibility, but it can be run at near-native speed already today.

The problem that many people have with web browsers is that they have evolved from retrieving text into bloated, Frankensteinish wannabe operating systems that perform redundant operations at slower speeds. It is the job of the underlying operating system to execute programs, not the web browser's. Constructing a redundant operating system and executing code at half the possible speed (at best!) is not progress.


The browser I'm using at the moment is faster than the one I was using three years ago, despite also having a very large number of new abilities.

Further, a significant number of my current OS processes are themselves executing programs (with various degrees of dynamic compilation). I'm not clear on when it's okay for a process to itself execute a program. Is interpretation ever okay? Or does it always need to be dynamically compiled? Also, what's the rule on compilation of Turing-complete languages like PostScript?


I would say it is progress, if

1. The browser executes programs in a totally platform-independent way, which the underlying OS cannot (a website should be able to run on all browsers and on all OSes). So having the browser run code is useful too.

2. That "half speed" number was the status as of a few months ago, and is still improving.


> The browser executes programs in a totally platform-independent way

JS has different engines and different dialects, so it isn't really platform-independent in the sense that the same code will execute the same way on every computer. In fact, it's possible for JS to execute in different ways on the same computer when you run the code on browsers with different engines. There are plenty of instances where a page will not render the same way in different browsers (although this is usually because HTML use isn't consistent).

> That "half speed" number was the status as of a few months ago, and is still improving.

It's still an effort to reinvent the wheel. When a 1:1 ratio is reached, you (not you in particular) will still have spent time implementing a feature that already exists on your OS, instead of improving the original program. For example, browsers have PDF readers baked in. Now you have a dedicated PDF reader on the desktop, and a slower, buggier PDF reader in the browser.

The problem is that programs on the desktop don't communicate well with each other to get the same kind of integration you get on a browser, or the underlying OS doesn't include programs to implement features that you want to include on your web page. I understand that it may be simpler to turn a web browser into a mini OS rather than get OS authors to change their OS (the Chromebook is a misguided attempt to take this to its logical end), but it is still an ugly and redundant solution.


JS, for the most part, does run identically on different browsers. With very few cases of truly undefined behavior, any other difference is either DOM stuff, or a bug.

I've ported lots of apps to JS, and generally they just work across browsers. Even the Unreal Engine 3 demo runs on Chrome, JS was not what was holding it back.

> It's still an effort to reinvent the wheel. When a 1:1 ratio is reached, you (not you in particular) will still have ended up spending time implementing a feature that already exists on your OS

The OS can run it, but native apps are not portable, as I said before. That is what makes it worth reinventing some parts of OSes in browsers.


IMHO, the whole reason JavaScript is the de facto language of the web is that Microsoft made the mistake of shipping a Turing-complete language runtime that was not dependent on the Win32 API, and the language happened to be JavaScript.

IMHO, any kind of new web bytecode standard will be immediately broken by Microsoft into an incompatible variant that we won't be able to nicely hack around and will thus fragment the market in their favor. The code base and runtime installed base of Javascript is so massive now that they can't easily fragment it.


Netscape shipped Javascript. Microsoft shipped JScript in a desperate attempt to peel away their near-monopoly market share, and they (correctly, IMHO) judged that impossible if they were entirely incompatible with Navigator.

Microsoft is no longer in a position to unilaterally destroy the web, and I see no high probability of them regaining that position in the foreseeable future. (Too much stuff is going mobile, and they're still not strong enough in that space to dictate... to put it lightly.)


Microsoft may still be able to hinder the web by using its influence in W3C to push for detrimental standards like Encrypted Media Extensions, and by refusing to implement things like WebGL.


They can't stall forever on new web standards if they prove useful. I'm not sure WebGL will, but if it does, the "graceful degradation" culture in web development means that IE users will get some sort of slow, limited approximation of the intended experience. And enough of them will know and start installing Chrome or Firefox in larger numbers again.

Microsoft can slow the adoption of web standards (and just via the desktop), but they don't have any real control now. And the only reason WebGL isn't in IE 10 is because they didn't want to offend the DX/D3D team. Once IE supports WebGL, D3D will die.


WebGL is useful for more than just games. For example, I'm using a fragment shader to implement client-side demosaicing[0] in the web interface for my Kinect-enabled home automation hardware (link in my profile if you're curious). CSS Shaders/custom filters might serve as a partial substitute for WebGL, but IE doesn't support those either.

On the bright side, I've seen rumors that IE11 will support WebGL, but that is of little use to people stuck on XP, Vista, or W7 due to compatibility requirements of other legacy apps. As you say, IE doesn't have the dominance it once did, but for some reason it's still used in certain market segments.

[0] https://en.wikipedia.org/wiki/Demosaicing


Please. JavaScript is a far cry from the ideal byte code / intermediate representation for all future applications to be built in. It's bad enough that people think every new app should run inside a web browser ... but saying that it MUST be written in JavaScript (or something that can generate JavaScript) is manifestly irrational and ridiculous.


What do you think is needed beyond something like asm.js?

For me the limitation that stands out is that JavaScript code cannot be pre-empted.

while(true){};

locks things up until some magical supervisor (which _can_ pre-empt) notices and tells you something went wrong.

You can use workers for long operations but you can't communicate with them while they do it.

Look at the example for Web Workers:

http://www.whatwg.org/specs/web-apps/current-work/multipage/...

It pumps out a series of primes, but you can't tell it to stop or switch to Fibonacci without killing it and making a brand new one.
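One workaround (a sketch only, not real pre-emption) is to write the long computation as an ES6-style generator so control returns to the caller between steps; inside a worker the equivalent is chunking the work and checking a flag set by onmessage:

```javascript
// Cooperative version of the prime pump: the generator yields each
// prime, so the caller regains control between results.
function* primes() {
  for (let n = 2; ; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break; }
    }
    if (isPrime) yield n; // hand control back to the caller
  }
}

// The caller can now stop it, or switch to Fibonacci, between primes:
const found = [];
for (const p of primes()) {
  found.push(p);
  if (found.length === 5) break; // stop without killing anything
}
// found is [2, 3, 5, 7, 11]
```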


Just calling JS "irrational and ridiculous" doesn't advance the conversation; we need details.


"saying that it MUST be written in JavaScript ... is manifestly irrational and ridiculous"

You misread.


> Please. JavaScript is a far cry from the ideal byte code / intermediate representation for all future applications to be built in.

Indeed. But it's what we actually have today, not what we wish to have tomorrow.


Perhaps the most significant thing is what he glosses over in this parenthetical paragraph:

(Of course there is one way to support all the things at maximal speed: Use a native platform as your VM. x86 can run Java, LuaJIT and JS all at maximal speed almost by definition. It can even be sandboxed in various ways. But it has lost the third property of being platform-independent.)

The argument is made that any bytecode is probably going to have most of the flaws of JavaScript anyhow. However, it's a long-standing truism that most problems in comp sci can be solved with another level of indirection. The TAOS VM was a virtual instruction set architecture which could be just-in-time translated to a real instruction set architecture and run as fast as it could be read off disk. Since it is not a real ISA, it would also be platform-independent.

http://www.uruk.org/emu/Taos.html

Using such a VM would truly allow one to support any language on the web, at something like 80% of "native" speed. NaCl, LLVM, or even asm.js could be grown into such a VM.

Additional levels of indirection are apt to create problems as well as solve them, however. Such an undertaking as an overarching standard would be very complex -- so much so that it may never come to pass -- so asm.js may still be preferable, since it's much closer to actually existing.


I couldn't figure out from the link how TAOS differs from the JVM/CLR. It seems to me that PNaCl and asm.js are already 99% of that; they're only missing all the libraries.


The TAOS VM is like PNaCl, in that it's more of a target for implementing a VM than a bytecode compiler target. And yes, asm.js is already most of this.


I've basically resigned myself to using JavaScript for new client-side development, since it's the native language of the browser, which is the most restrictive of all platforms.

I considered C# a few months ago; I know that things like Script# and JSIL can translate C# or CLI bytecode to JS. But I figured there would be subtle semantic mismatches that would require anyone working with such code to know both C# and JS anyway. So arguably JS requires less working knowledge.

The most obvious semantic mismatch between JS and other mainstream languages is that the other languages usually (if not always) support multiple threads with shared state. Consequently, the standard libraries of these other languages tend to have blocking APIs. This seems to me like a good reason to use JS for new client-side development, rather than trying to make another language compile to JS. One can convincingly argue that not supporting multiple threads with shared state is a good thing, anyway.

The only limitation in JS that really bothers me, if it still even exists, is that AFAIK the largest integer that can be represented exactly is 2^53.
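That limit is easy to demonstrate, since JS numbers are IEEE-754 doubles and integer arithmetic is only exact up to 2^53:

```javascript
// Above 2^53, adjacent integers become indistinguishable.
const max = Math.pow(2, 53);       // 9007199254740992
console.log(max - 1 + 1 === max);  // true: still exact at the boundary
console.log(max + 1 === max);      // true: 2^53 + 1 is not representable
```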


JavaScript still has no integer type. But you can use a bignum library. Hopefully you'll get integer-level performance if the JavaScript VM sees that you're only doing integer operations (that includes dividing and then coercing to an "integer" by using "|0" or similar).
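For reference, a quick sketch of what that "|0" coercion does: it truncates to a signed 32-bit integer, which is the hint engines key on.

```javascript
// "|0" truncates to a signed 32-bit integer; it is the standard idiom
// for integer division and int-typed values in compiled-to-JS code.
console.log(7.9 | 0);             // 7  (fractional part dropped)
console.log(-7.9 | 0);            // -7 (truncation toward zero)
console.log((7 / 2) | 0);         // 3  (integer division)
console.log(Math.pow(2, 31) | 0); // -2147483648 (wraps to signed 32-bit)
```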


Years of effort have gone into making javascript VMs faster than they have any right to be. Even an ideal bytecode would likely be slower than javascript for years. It seems unlikely that many people would be willing to give up performance now for theoretical future performance, and without a reasonable amount of use, there's no good reason for browsers to implement the new bytecode.

For better or for worse, we're stuck with javascript for the foreseeable future.


"Even an ideal bytecode would likely be slower than javascript for years" -- no, statically-typed bytecode should be significantly faster than dynamically-typed JavaScript (both in execution and parsing time), IMHO.


I agree that makes intuitive sense, but the article mentions numbers showing the opposite is true in some cases.


Isn't LLVM already faster than asm.js in most cases?


The question makes no sense. Emscripten even uses LLVM, so this comparison is recursive.


What about size? asm.js is fast, but parsing it is slow and it is super bloated. A real binary bytecode is obviously a better choice, but there don't even seem to be any good options for this on the horizon. I would take asm.js if we get it, but it is far from the ideal solution.


It is actually not as bad as you would think. Much better than LLVM, in fact.

http://mozakai.blogspot.com/2011/11/code-size-when-compiling...


Oh, it was already submitted. I hate these country-specific Blogspot addresses. Deleted mine.


I thought this was going to be a comparison of asm.js and PNaCl, but the answer is, JS itself?

What we really need are toolchains that cross-compile (the same codebase) to both asm.js and PNaCl. What's nice is the asm.js also works on browsers that support neither.


It is possible to have the same codebase on Emscripten and NaCl if you use SDL. Emscripten implements a subset of SDL, and SDL (the actual library) is already available for NaCl: https://developers.google.com/native-client/community/portin...


We already have bytecode options: the Flash, Silverlight and Java plug-ins. If mobile browsers all supported Flash and Silverlight...

At this point our hope is improved Javascript features in the next release...


>Be a convenient compiler target: First of all, the long list of languages from before shows that many people have successfully targeted JavaScript.

A counterexample to your argument: many people deploy products in PHP as well, though we've all decided PHP is something we need to move away from.


This is missing a discussion of debugging. I don't think any of the emscripten/whatever magic has that down yet.


Source maps. Soon.


Definitely source maps. They should basically let stuff compiled to JS be as easy to debug as stuff compiled to x86.

But JavaScript actually has some advantages for debugging. The generated code is much more readable than x86 or ARM, and it's trivial to instrument code while debugging, http://mozakai.blogspot.com/2012/06/debugging-javascript-is-...


Source maps are a partial solution. We also need a way to translate the data (fields and data structures) back to its source representation.


Isn't that part of source maps?


Nope. It's just about mapping JavaScript code back to source files. We'll need to come up with another standard for data.


We have an existing, widely deployed VM that is portable and sandboxed, fully described by open standards, and has several competing but fully compatible implementations -- most of which are already very fast -- and people want to throw it out and replace it because they don't like the file format...


I would like some of what the author is smoking. In addition, I would like to see them explain exactly why they think that a full language with flexible syntax qualifies as bytecode. There are so many useless assertions and falsities in this post that I'm not sure where to start, and I'm not sure that a piece-by-piece refutation is worthwhile.


I think the word "bytecode" brings more confusion than clarity at this point. What we need is not necessarily a super-optimized, human-unreadable binary format for distributing programs. We just need a format that's standard and portable, and with underlying semantics that are amenable to a fast enough implementation. JavaScript is such a format; the fact that it's a programming language with C-ish syntax, rather than a binary bytecode format, is irrelevant at worst.


I think sooner or later a more agnostic VM in the browser will come. HTML was for hypertext; now it is (almost) for general UIs.

The problem is how to make a standard if organizations like Mozilla are not part of it. I think it is not so difficult: don't have the VM? Emulate it with asm.js. Do you have the VM? Run your application at full speed in more advanced browsers. When Google Chrome came out with a faster JavaScript engine, others needed to catch up.


The problem with this idea is that a new, "better" VM doesn't offer any substantial benefits over using JavaScript-as-a-VM. The limitations of JSaaVM can be addressed more easily by simply improving on the existing infrastructure. There's no good reason to throw out what we have when incremental improvement gets us to pretty much the same point more quickly and with less pain.


Native client is faster than current solutions.


"I'm not sure that a piece-by-piece refutation is worthwhile" -- it would be worthwhile, because otherwise your comment doesn't really say anything useful.



