NPM - An intervention
Posted on 22/2/12 by Felix Geisendörfer
Update: Isaac commented and explained why fuzzy version specifiers are here to stay. I'll be ok with it and will adapt my workflow accordingly.
Update 2: I did not give up on the bug that is part of the story below; a test case and fix have been submitted and merged!
Update 3: NPM Shrinkwrap is now a real thing.
NPM is the official node.js package manager. Unlike many package managers that came before, it is actually incredibly awesome, and has helped to create one of the most vibrant communities in the history of open source.
However, today I want to talk about a few aspects of npm that concern me. In particular I want to talk about stuff where I feel that NPM is making bad things easy, and good things hard.
NPM module versions are broken
Today, I tried to contribute to the forever module. The company I am helping had to patch their version of it because of a hard-to-reproduce bug in production and asked me to help submit their fix upstream. Being the scientific type, I set out to write a test case against the forever version their patch is based on:
$ npm install forever@0.7.2
Fantastic, NPM lets me specify which version of forever I want to install. Now let's verify the installed version works:
$ ./node_modules/forever/bin/forever

node.js:134
        throw e; // process.nextTick error, or 'error' event on first tick
        ^
TypeError: undefined is not a function
    at CALL_NON_FUNCTION_AS_CONSTRUCTOR (native)
    at Object.
Oh no, what happened? Mind you, except for an unrelated patch, this version of forever is running perfectly fine in production.
Well, as it turns out, you have been lied to. There is no such thing as forever v0.7.2. At least not a single one. It depends on an implicit and unchangeable second parameter: time.
Why is that? Well, it is because forever v0.7.2 depends on this:
And as it turns out, nconf has released newer versions matching this selector, featuring a different API.
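To make the time dependence concrete, here is a minimal sketch of how a fuzzy specifier can resolve to different versions as new releases appear. It is a simplified stand-in, not npm's actual semver code: it only understands `N.x.x`-style ranges and exact versions, and the release history shown is made up for illustration.

```javascript
// Simplified illustration only: this is NOT npm's real semver implementation.
// It handles just "N.x.x"-style ranges and exact "N.N.N" versions.

// Does `version` (e.g. "0.1.1") satisfy `range` (e.g. "0.x.x")?
function satisfies(version, range) {
  const v = version.split('.');
  return range.split('.').every((part, i) => part === 'x' || part === v[i]);
}

// Compare two "a.b.c" versions numerically.
function compare(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Pick the highest published version that satisfies the range.
function resolve(published, range) {
  const matches = published.filter((v) => satisfies(v, range)).sort(compare);
  return matches.length ? matches[matches.length - 1] : null;
}

// Hypothetical release history of a dependency (version numbers are made up).
const publishedThen = ['0.1.0', '0.1.1'];
const publishedNow = ['0.1.0', '0.1.1', '0.2.0'];

console.log(resolve(publishedThen, '0.x.x')); // 0.1.1
console.log(resolve(publishedNow, '0.x.x'));  // 0.2.0
console.log(resolve(publishedNow, '0.1.1'));  // 0.1.1
```

The same specifier gives a different answer once a newer matching release exists, while the exact version keeps resolving to the same thing.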
You are doing it wrong
"Hah!", you might say. "That's why you should check your node_modules into git!".
I am sorry, but that is not helpful. While this will allow me to pin down the node modules used by my app exactly, it does not help me here. What I want to do is to reproduce this bug in a standalone copy of forever v0.7.2, then check if it exists in the latest version, and if so submit the test case and fix for it upstream.
However, I can't. Not without manually resolving all forever dependencies the way NPM resolved them when v0.7.2 was released. (The fact that forever is a bit of a spaceship when it comes to dependencies does not help either).
Discouraging Open Source
Speaking of Mikeal's article: I felt that something was wrong about checking your node_modules into git when reading it, but it is only now that I can point out what:
In the article, Mikeal argues that module authors should not try to exactly reference their dependency versions, so that users get more frequent updates of those dependencies and help test them.
However, he says doing so for your app is a good thing.
I disagree. To me, this approach discourages open source for two reasons:
a) Bug reports:
I currently maintain 44 NPM modules. It is very hard to keep up with that.
If you are asking me to support multiple versions of all my dependencies, I will have to stop helping people with bug reports for my modules.
When somebody reports a bug for a given version of my module, I want to know exactly what version he used. Figuring out when he installed my module to rule out dependency issues for every bug report is not an option for me.
Ask yourself what is easier: adding a quick patch to a node module you already track in the git repo of your app, --or-- creating a fork of it, fixing the problem in the fork, pushing that fork to GitHub, changing your package.json to point to your fork, and submitting a pull request.
I know people cannot be forced to contribute back, nor should they be. But as things stand right now, checking in all node_modules of an app into git is the only sane option, as the version numbers in your package.json are essentially meaningless.
This means that contributing back to open source is made difficult by default, while keeping your patches to yourself is made easy. I would like this to be the other way around.
I propose to gradually drop all support for fuzzy version specifiers from NPM.
To me, fuzzy version specifiers are entirely evil. They make things more complex. They force me to manually snapshot the packages I depend on for my apps. They prevent me from supporting and contributing to open source.
So rather than throwing more complexity at this problem, let's just remove this feature altogether.
If you agree, please re-tweet this article or leave a comment.
There is another issue with fuzzy version specifiers (although that's not NPM's fault): Not all module authors actually follow conventions like major and minor release numbers. So I can have 0.2.x in my package.json, but the module author can actually potentially land a major API change as a point release. Because of this, I have stopped using fuzzy specifiers.
Completely agree, Rasmus retweeted (and the tweet that led me here of course)
fuzzy is certainly more convenient when it works, but other than that yeah I agree haha, consistency > convenience
I personally like the approach Bundler (for Ruby) is doing by having a Gemfile.lock which describes the dependency chain with all versions. This is checked into the repository to ensure everyone has the same version of dependencies installed.
Thomas / Dom Udall: Sorry, the link to the tweet was a copy & paste fail. It's fixed now : ).
Sebastian Cohnen: I'd rather make things simpler than making them more complicated. But if fuzzy version specifiers are here to stay, yeah, that would be the next best thing.
I agree with you! It must change: every time I run an npm update, I pray, because I often end up with broken packages :(
I just made a post to nodeJS News to make people aware.
You don't need to figure out when they installed your module. You only need to have them run `npm ls` and it'll tell you the version of everything that's installed. I maintain 45 published modules, one of which is npm itself. I've not had a problem with this.
Locking down node module versions will not prevent software from having bugs. It will reduce one vector of divergence, but only that one vector. It is a myth that all divergence in software is necessarily harmful. Flexible version dependencies make fixing bugs easier as well, in many cases.
In the case you mention, the `forever` program depends on nconf 0.x.x. Using a module involves a certain amount of trust in the versioning semantics of the module's author. Maybe forever's author should contact nconf's author and figure out something that can work for them ;) In any event, all he has to do at that point is either publish a new version of forever which fixes the dependency, or a new version of nconf that maintains backwards compatibility. (Indeed, it's what has happened, but your insistence on using 0.7.2 instead of 0.7.4 or 0.7.5 has exposed you to this bug that was already fixed!)
The cost of this bug is very low. Fixing it is easy. Getting the relevant information to reproduce the state is a single command-line invocation and a paste into a gist.
It's interesting to note that the `forever` package, as it is primarily a downloadable command line utility, really ought to have its dependencies bundled (at least, according to me, and several others). Yes, this means that, after updating nconf to 0.5.1 to support whatever Broadway needs, Charlie will have to update the copy that forever is using, and re-publish with a new version. That's what I do for npm. It's not really any more work than explicit version numbers, and in fact, is easiest to accomplish when version numbers in package.json are vague.
In short, I think that the forever/nconf example is a bit of a straw man. npm ls will tell you which versions of everything are installed, and forever ought to be checking in its entire dependency tree anyway. It's not worth giving up the added real-world coverage, of allowing one's reusable module to be installed in multiple different configurations. It's not worth giving up the convenience of releasing bugfixes by pushing one thing vs having to push several.
You're trying to optimize for the deployment case, at the expense of the reusable module development case, and not realizing that the reason there is so much activity here is because we work so hard to make it convenient to do so.
`npm shrinkwrap` is coming very soon (like `bundler lock`). This is a compromise between the convenience of easy upgrades and checking everything into git/bundling in the package tarball. Even in this case, though, I really don't get why you think that checking deps into git for deployed apps is much less convenient than fixed dependency versions.
I maintain a lot of reusable libs, many of which depend on one another. Having to update them all every time one of them changes would mean that the vast majority don't get bugfixes, or my modules would have to be more monolithic, or that there'd be many different copies installed unnecessarily.
You can't get away from having to figure out a reasonable compromise between module author and module user. If you find that you prefer explicit dependency versions, fine. No one's stopping you. In fact, we're adding the shrinkwrap command so that you can have control over this all the way through the tree. But I think that you're failing to see the real advantages of flexible dep versions in other areas.
This feature will never be removed, gradually or otherwise.
I think you misunderstand my statements about checking modules in to git.
A library like forever should NEVER check its dependencies in to git. I explicitly state:
Only applications that are deployed should checkin node_modules. Package maintainers should continue to define what they think are acceptable version ranges, it’s the only way we can keep the community up with the rate of change and improvement we see in node.js.
An application you are deploying has different requirements and concerns than modules you publish to npm for others to use.
I don't buy the argument that it discourages bug reports and contributions. Sure, it might be easiest to patch a bug locally and then push it to production, and when you're running an application that is breaking because of a bug in a dependency that is what you should do. Waiting around for your pull request to get accepted is not an acceptable excuse for your site being down.
You're missing the simple fact that using `npm install` will overwrite all those local changes, so if you aren't also going through the work to get contributions in upstream you'll never be able to upgrade that dependency, which will likely leave you with more bugs in the long run than you have currently.
> A library like forever should NEVER check its dependencies in to git.
No, if anything, I misunderstand the forever module ;)
I thought it was primarily a command-line util that keeps your thing running. Yes, if it's a reusable library, then it shouldn't check its deps in.
Disregarding the rest of the discussions (there are valid points on both sides): There's also the inverted case of an external change (to e.g. node itself) breaking something in your module, which is in turn used by another (potentially more used) mod.
Concretely, I maintain a websocket implementation called 'ws'. It's used by socket.io, websocket.io, zombie and other modules which are more commonly installed than ws itself. Yesterday I noticed that the prebuilt 0.6.11 node package for OS X now targets ia32, which causes x64 Mac users to end up with a mismatch between node itself and naively built native extensions. Consequently ws was broken, zombie was broken, websocket.io was broken.
Seeing as most modules depend on version 0.4.x or ~0.4.0 of ws, I was able to 'fix' all of these other modules in one fell swoop by updating my build scripts and pushing 0.4.7.
> Getting the relevant information to reproduce the state is a single command-line invocation and a paste into a gist.
That assumes I had the right "v0.7.2" already installed on my system. I didn't.
> In short, I think that the forever/nconf example is a bit of a straw man.
Absolutely. I wanted to paint my angle on this issue in black and white in order to simplify and focus my arguments.
I certainly don't run into this issue every day, and I certainly don't think this issue will actually stop me from anything.
However, I stand by one thing: I feel like npm is making it hard to do what I want here, and I think my needs are reasonable, even if not shared by everybody. (more on this below)
> Even in this case, though, I really don't get why you think that checking deps into git for deployed apps is much less convenient than fixed dependency versions.
The opposite. I think checking dependencies into git is very convenient. But I really dislike how it encourages keeping your patches private by making that option much easier than contributing. Now I certainly know that this does not apply to Mikeal or you at all, but from my experience helping companies with their projects, I'm not happy with this.
> npm shrinkwrap is coming very soon (like bundler lock).
Awesome. I'd still prefer to get rid of fuzzy versions, but this is the next best thing. So thank you so much for that, as well as your amazing work on npm. I really hope I got the tone of my post right, I certainly have nothing but good things to say about npm 99.9% of the time and I'd be concerned if you tried to please everybody 100% : ).
> But I think that you're failing to see the real advantages of flexible dep versions in other areas.
I see the benefits, but I'd rather have simplicity than flexibility here. I've seen many people caught by surprise when finding out that the fixed version numbers in their package.json only apply to the first level. It's not intuitive to me either.
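A small sketch of that "first level only" surprise, using a toy resolver rather than npm's real one. The packages `app-lib` and `some-dep`, their versions, and the registry shape are all hypothetical; the only point is that an exact version at the top does not pin what that package's own fuzzy ranges resolve to.

```javascript
// Toy model only; npm's real resolver is more involved.
// `app-lib` and `some-dep` are hypothetical packages.

function satisfies(version, range) {
  const v = version.split('.');
  return range.split('.').every((part, i) => part === 'x' || part === v[i]);
}

function compare(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Resolve `name@range` against `registry`, then resolve its dependencies
// recursively using the ranges that the chosen version declares.
function install(registry, name, range) {
  const matches = Object.keys(registry[name])
    .filter((v) => satisfies(v, range))
    .sort(compare);
  if (matches.length === 0) throw new Error(`no match for ${name}@${range}`);
  const version = matches[matches.length - 1];
  const declared = registry[name][version];
  const dependencies = {};
  for (const dep of Object.keys(declared)) {
    dependencies[dep] = install(registry, dep, declared[dep]);
  }
  return { version, dependencies };
}

const registryBefore = {
  'app-lib': { '1.0.0': { 'some-dep': '0.x.x' } },
  'some-dep': { '0.1.0': {} },
};
// Later, some-dep publishes 0.2.0; app-lib 1.0.0 itself is unchanged.
const registryAfter = {
  'app-lib': registryBefore['app-lib'],
  'some-dep': { '0.1.0': {}, '0.2.0': {} },
};

// The app pins app-lib to exactly 1.0.0 in both cases.
const before = install(registryBefore, 'app-lib', '1.0.0');
const after = install(registryAfter, 'app-lib', '1.0.0');
console.log(before.dependencies['some-dep'].version); // 0.1.0
console.log(after.dependencies['some-dep'].version);  // 0.2.0
```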
> This feature will never be removed, gradually or otherwise.
I'm cool with that. You know better what's good for npm than anybody, so I'll adapt to the way things work, as well as encourage people to check in their npm modules. And I'll simply be conservative when it comes to depending on modules with fuzzy version specifiers for my own modules.
Thanks for taking the time to reply, I'll update the post now to reflect the outcome of this discussion.
the comment was pointed at Felix, not you. although i disagree that forever's stuff should be checked in, but that's a conversation for another post :)
> the comment was pointed at Felix, not you.
Oh? I thought I had this right in my article. If you have an idea for changing my phrasing, I'll be happy to edit.
The root of the problem was that you couldn't get the right version of forever because of a dep issue. Checking that dep in doesn't help you get a different version from `npm install` and forever shouldn't be checking in its deps because it is a library. The version problem you have is unrelated to checking in deps unless forever checks in its deps, which I expressly discourage.
> The version problem you have is unrelated to checking in deps
Kind of. My overarching point was against fuzzy version specifiers, not checking in node_modules. I only brought up checking in node_modules because it is another thing I am currently forced to do because of fuzzy version specifiers. But as mentioned by isaac, there will be an alternative option using npm shrinkwrap soon.
> (Indeed, it's what has happened, but your insistence on using 0.7.2 instead of 0.7.4 or 0.7.5 has exposed you to this bug that was already fixed!)
First: email@example.com works just fine on firstname.lastname@example.org and email@example.com. Please upgrade
This kind of situation is why I started to unpublish versions of modules that are no longer maintained. That practice caused a bunch of hoopla around NKO last year (again, about forever) so I stopped doing it. Yes, using firstname.lastname@example.org was a bad call: I expected the nconf API to be more stable than it was. I did however, publish multiple versions after that and expect people to upgrade quickly. This has been the common wisdom in the node community: upgrade early and often.
That being said, we've begun to freeze dependencies at hard versions for complex dependency trees. A good example of this is flatiron: https://github.com/flatiron/flatiron/blob/master/package.json#L13-20
Looking forward to what `npm shrinkwrap` offers.
> First: email@example.com works just fine on firstname.lastname@example.org and email@example.com. Please upgrade
The problem also exists in the latest version of forever, see: https://github.com/nodejitsu/forever/pull/246
Anyway, I did not mean to criticize forever with this (other than the fact that I do think it's quite heavy on dependencies). In fact, the only reason why I went through the trouble of installing `v0.7.2` was so I could verify if the bug would go away in the latest version or not. It didn't, so I wrote a hopefully good test case / fix and submitted it to you guys.
That being said, what's your take on semver?
I like semver and I think that package@0.N.x (where N is a hard number, eg. 0.3.x) should be considered safe for module authors. The only times when semver has bit me in the ass is when I've done firstname.lastname@example.org.
For application developers the standard should be hard dependencies; always.
`forever` in particular walks the line between application and module because it performs both tasks. I've been considering a module called `forever-core` with functionality pulled out from `forever` with just the core Objects (i.e. no CLI functionality) and then simply having the `forever` CLI app depend on that. In that world:
* `forever-core` would have @0.N.x style dependencies
* `forever` would have hard dependencies, e.g. @0.5.1
> I like semver and I think that package@0.N.x (where N is a hard number, eg. 0.3.x) should be considered safe for module authors.
Would love to chat about this in IRC at some point. IMO semver only works / has merit if we follow it closely, which also means that < 1.0.0 the version number makes no promises on stability. But I'd love to hear your thoughts.
Also I agree on forever, it's a more difficult scenario. IMO the version you install globally should have fixed dependencies as npm currently does not allow you to reference paths as version specifiers.
Although in breach with semver, my idea of an ideal approach for small to medium sized modules:
For any version x.y.z of a reasonably well tested module, with 'reasonably' meaning at least a spec for what input works with the public interface, z should be upped if and only if the test set for z runs successfully for z+1. Otherwise, y++.
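The rule above can be sketched as a tiny function. This is only one reading of it: `oldTestsPass` is a hypothetical flag meaning "the test set written for x.y.z still runs successfully against the new code", and resetting z to 0 when y is bumped is an assumption the comment does not spell out.

```javascript
// Sketch of the proposed bump rule. `oldTestsPass` is a hypothetical input:
// true when the test set of the current version passes against the new code.
function nextVersion(current, oldTestsPass) {
  const [x, y, z] = current.split('.').map(Number);
  if (oldTestsPass) {
    return `${x}.${y}.${z + 1}`; // compatible: bump z only
  }
  // Assumption: z resets to 0 when y is bumped.
  return `${x}.${y + 1}.0`;
}

console.log(nextVersion('0.4.6', true));  // 0.4.7
console.log(nextVersion('0.4.6', false)); // 0.5.0
```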
I agree with you einaros, we should have a way to handle dependency, with numbers or something else. As Sebastian Cohnen said bundler on Ruby works fine! Perhaps using the same process as they used for Ruby and not "reinvent the wheel" ...
I've had a similar thought around binary modules and global dependencies. A number of forever's dependencies are only necessary for the application, not the library. It could be neat to have a `binaryDependencies` or something which delineates those modules used in `bin/*` scripts.
Of course, I don't want to come across as too much of a devil due to my part as devil's advocate here.
There is a reason why npm supports listing a dependency explicitly. It's easy to forget the history now, but this was once a contentious issue. There were factions in the package.json spec debates back in the day claiming that `"foo":"1.2.3"` should be interpreted as `"foo":">=1.2.3 <1.3.0"`, which is clearly completely insane, and that's why npm deviated from the semver.org spec somewhat, by introducing explicit ranges and so on.
npm has always been about giving you the tools to do what you need to do. I think that shrinkwrap will be useful to you in situations like this. I also think that checking your modules into git would be a good idea if you're deploying them. The unfortunate crisis of choice is the price we pay for such abundance :)
Isn't this easy to solve?
* When you npm publish it keeps a strict.txt file with deps:
├── email@example.com (firstname.lastname@example.org)
├── email@example.com (firstname.lastname@example.org)
├── email@example.com (firstname.lastname@example.org email@example.com)
├── firstname.lastname@example.org (email@example.com)
├── firstname.lastname@example.org (email@example.com firstname.lastname@example.org email@example.com)
├── firstname.lastname@example.org (email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org)
├── email@example.com (firstname.lastname@example.org email@example.com)
├── firstname.lastname@example.org (email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org email@example.com)
├── firstname.lastname@example.org (email@example.com firstname.lastname@example.org email@example.com)
├── firstname.lastname@example.org (email@example.com firstname.lastname@example.org email@example.com)
* When you npm install you can npm install --strict that uses this file to match the exact thing that was published.
Why is this a hard problem?
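A rough sketch of what that proposal could look like, under heavy assumptions: the snapshot format, the function names, and the tree shape are all invented here for illustration, and neither `strict.txt` nor `--strict` is an existing npm feature as far as this thread shows.

```javascript
// Hypothetical sketch of the "strict.txt" idea from the comment above.

// Publish time: flatten an already-resolved dependency tree into
// "path => exact version" entries.
function snapshot(tree, prefix = '', out = {}) {
  for (const name of Object.keys(tree.dependencies)) {
    const node = tree.dependencies[name];
    const path = prefix ? `${prefix}/${name}` : name;
    out[path] = node.version;
    snapshot(node, path, out);
  }
  return out;
}

// Install time: a version recorded in the snapshot wins over whatever
// the fuzzy range would resolve to today.
function pickVersion(path, fuzzyResolve, strict) {
  return (strict && strict[path]) || fuzzyResolve();
}

// Made-up tree as it was resolved when the package was published.
const resolvedAtPublish = {
  version: '1.0.0',
  dependencies: {
    a: {
      version: '0.1.1',
      dependencies: { b: { version: '2.0.3', dependencies: {} } },
    },
  },
};

const strict = snapshot(resolvedAtPublish);
console.log(JSON.stringify(strict)); // {"a":"0.1.1","a/b":"2.0.3"}

const today = () => '0.2.0'; // what the fuzzy range would pick today
console.log(pickVersion('a', today, strict)); // 0.1.1
console.log(pickVersion('a', today, null));   // 0.2.0
```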
My two cents: I think it's important to look at this from the perspective of which use cases does NPM want to support.
If NPM wants to position itself as a tool that can be used in deployments, then it seems reasonable (and necessary) to grant developers control over the entire dependency tree. Personally, I would love for NPM to be useful in this way, given that it is already capable in every regard, it just hasn't been rendered practical for this purpose at present time. If I spin up a new server, I need to know exactly what code is running on it - it's simply not an option in QA/staging/prod environments to presume upon fuzzy versioning to not break stuff.
However, if NPM does not want to support this use case, then this is less of an issue. I say "less", because it's still inconvenient in situations like the one Felix described (which, incidentally, I've encountered as well). While the reasoning behind fuzzy versioning is sound and works 99% of the time, it is mildly frustrating when I have to jump through 6 or 7 hoops to simply say "you there, run at this version!".
Oh so this is coming :) Nevermind. Next time read comments first :)
Ahh, I missed the note on `npm shrinkwrap` as well. This sounds like a reasonable feature for we anal retentive types when it comes to NPM-enabled deployments. Would an install option for providing "hints" for fuzzy versioning situations ever be considered? This is a miserable example because it would be ambiguous in a whole slew of situations, but the idea would be something like `npm install --hint foo 0.1.4`.
I agree with you, but do I really have to retweet Rasmus Lerdorf going to Etsy?